These release notes cover new developer and user-facing incompatibilities, important issues, features, and major improvements.
This JIRA makes the following change: the Router metrics context is changed from ‘router’ to ‘dfs’.
Mount table entries now support ACLs. Users can no longer modify entries they do not own: pre-existing entries (which previously had no permissions) are assigned owner:superuser, group:supergroup, and permission 755 as defaults. To modify such entries, log in as the superuser.
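For example (a hedged sketch; the mount point, nameservice, and superuser account are illustrative):

```bash
# Run as the HDFS superuser: non-superusers can no longer modify entries
# that defaulted to owner:superuser, group:supergroup, permission 755.
sudo -u hdfs hdfs dfsrouteradmin -update /data ns1 /data
```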
Supports multi-threaded pre-read in AliyunOSSInputStream to improve sequential read performance when reading from Aliyun OSS.
MapReduce jobs that output to filesystems without direct support for recursive delete can set mapreduce.fileoutputcommitter.task.cleanup.enabled=true to have each task delete its intermediate work directory rather than waiting for the ApplicationMaster to clean up at the end of the job. This can significantly speed up the cleanup phase for large jobs on such filesystems.
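For example (a hedged sketch; the example job and s3a paths are illustrative):

```bash
# Each task deletes its own intermediate work directory instead of
# leaving the recursive delete to the ApplicationMaster.
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.fileoutputcommitter.task.cleanup.enabled=true \
  s3a://bucket/input s3a://bucket/output
```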
Added an option to keep short-circuit reads enabled on failures, by setting dfs.domain.socket.disable.interval.seconds to 0.
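A minimal hdfs-site.xml sketch:

```xml
<!-- 0 keeps short-circuit reads enabled even after a read failure -->
<property>
  <name>dfs.domain.socket.disable.interval.seconds</name>
  <value>0</value>
</property>
```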
Fixes a documentation error in the HDFS Router-based Federation setup guide.
Changes the default State Store from a local file to ZooKeeper. This will require an additional ZooKeeper address to be configured.
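A minimal sketch of the extra configuration (the ZooKeeper quorum address is illustrative):

```xml
<!-- The Router State Store now defaults to ZooKeeper, so a reachable
     ZooKeeper quorum must be configured. -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```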
The HBase integration module had mixed hbase-server and hbase-client dependencies. This JIRA splits it into submodules so that hbase-client-dependent and hbase-server-dependent modules are separated. This allows conditional compilation against different versions of HBase.
Use the environment variable HTTPFS_HTTP_HOSTNAME to limit the IP addresses the HttpFS server binds to. By default, the HttpFS server binds to all IP addresses on the host.
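For example (hostname illustrative; assumes the Hadoop 3 daemon scripts):

```bash
# Bind the HttpFS server to a single address instead of all interfaces
export HTTPFS_HTTP_HOSTNAME=httpfs.example.com
hdfs --daemon start httpfs
```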
WASB: Bug fix to support non-sequential page blob reads. Required for HBASE replication.
WASB: Bug fix for recent regression in hflush() and hsync().
WASB: Fix Spark process hang at shutdown due to use of non-daemon threads by updating Azure Storage Java SDK to 7.0.
Federation now supports and controls global quota at the mount table level. In a federated environment, a folder can be spread across multiple subclusters; the Router aggregates the quota usage queried from these subclusters and uses it for quota verification.
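For example, a global quota can be set on a mount table entry as below (path and limits illustrative):

```bash
# Allow at most 100000 names and 1 GB of space across all subclusters
hdfs dfsrouteradmin -setQuota /data -nsQuota 100000 -ssQuota 1073741824
```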
WASB: listStatus 10x performance improvement for listing 700,000 files.
This change was required to address license compatibility issues with the JSON parser in the older AWS SDKs.
As a consequence, where needed, the applied patch also contains HADOOP-12705 (Upgrade Jackson 2.2.3 to 2.7.8).
This patch changed the default build and test environment from Ubuntu “Trusty” 14.04 to Ubuntu “Xenial” 16.04.
This change allows the inode and inode directory sections of the fsimage to be loaded in parallel. Tests on large images have shown this change to reduce the image load time to about 50% of the pre-change run time.
It works by writing sub-section entries to the image index, effectively splitting each image section into many sub-sections which can be processed in parallel. By default 12 sub-sections per image section are created when the image is saved, and 4 threads are used to load the image at startup.
The feature is disabled by default and can be enabled by setting dfs.image.parallel.load to true; even when enabled, it only applies to images with more than 1M inodes (dfs.image.parallel.inode.threshold). When the feature is enabled, the next HDFS checkpoint will write the image sub-sections, and subsequent namenode restarts can load the image in parallel.
An image with the parallel sections can be read even if the feature is disabled, but HDFS versions without this Jira cannot load an image with parallel sections. OIV can process a parallel-enabled image without issues.
Key configuration parameters are:
dfs.image.parallel.load=false - enable or disable the feature
dfs.image.parallel.target.sections = 12 - The target number of sub-sections. Aim for 2 to 3 times the number of dfs.image.parallel.threads.
dfs.image.parallel.inode.threshold = 1000000 - Only save and load in parallel if the image has more than this number of inodes.
dfs.image.parallel.threads = 4 - The number of threads used to load the image. Testing has shown 4 to be optimal, but this may depend on the environment.
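A minimal hdfs-site.xml sketch that enables the feature (values mirror the defaults described above; tune for your environment):

```xml
<property>
  <name>dfs.image.parallel.load</name>
  <value>true</value>
</property>
<property>
  <name>dfs.image.parallel.threads</name>
  <value>4</value>
</property>
<property>
  <name>dfs.image.parallel.target.sections</name>
  <value>12</value>
</property>
```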
UPGRADE WARNING: 1. Upgrading from 2.10 to 3.* is smooth if this feature has never been enabled. 2. If the fsimage parallel loading feature is enabled, the only supported upgrade path is currently from 2.10 to 3.3. 3. To upgrade from 2.10 to an earlier 3.* release (3.1.*/3.2.*), make sure at least one fsimage file has been saved after disabling the feature: set dfs.image.parallel.load=false, restart the namenode, and only then perform the upgrade.
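A hedged sketch of step 3 (the restart mechanism depends on your distribution's daemon scripts):

```bash
# 1. Set dfs.image.parallel.load=false in hdfs-site.xml, then restart the NameNode.
# 2. Save at least one fsimage without parallel sub-sections before upgrading:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
```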