Bulk loading in secure mode is a bit more involved than the normal setup, since the client has to transfer the ownership of the files generated by the MapReduce job to HBase. Secure bulk loading is implemented by a coprocessor named SecureBulkLoadEndpoint.
SecureBulkLoadEndpoint uses a staging directory, configured by the hbase.bulkload.staging.dir property, which defaults to /tmp/hbase-staging/. The algorithm is as follows:
1. Create an hbase-owned staging directory which is world-traversable (-rwx--x--x, 711): /tmp/hbase-staging
2. A user writes out data to their secure output directory: /user/foo/data
3. A call is made to HBase to create a secret staging directory which is globally readable and writable (-rwxrwxrwx, 777): /tmp/hbase-staging/averylongandrandomdirectoryname
4. The user makes the data world-readable and world-writable, moves it into the random staging directory, and then calls bulkLoadHFiles().
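In practice this handoff is usually driven by the LoadIncrementalHFiles tool rather than by calling the endpoint directly. The following is a minimal client-side sketch assuming the pre-1.0 Java API; the table name "mytable" and the HFile path are hypothetical placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class SecureBulkLoadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // HFiles previously written by the MapReduce job (path is hypothetical).
    Path hfileDir = new Path("/user/foo/data");

    HTable table = new HTable(conf, "mytable");
    try {
      // When SecureBulkLoadEndpoint is installed on the RegionServers, the
      // loader stages the files under hbase.bulkload.staging.dir and invokes
      // the endpoint's bulkLoadHFiles() instead of the insecure code path.
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(hfileDir, table);
    } finally {
      table.close();
    }
  }
}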
As with delegation tokens, the strength of the security lies in the length and randomness of the secret directory name.
You have to enable secure bulk load for it to work properly. To do so, modify the hbase-site.xml file on every server machine in the cluster and add the SecureBulkLoadEndpoint class to the list of RegionServer coprocessors:
<property>
  <name>hbase.bulkload.staging.dir</name>
  <value>/tmp/hbase-staging</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,
    org.apache.hadoop.hbase.security.access.AccessController,
    org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>
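After adding these properties, restart the RegionServers so that the SecureBulkLoadEndpoint coprocessor is loaded. The client-side job itself should not need changes; LoadIncrementalHFiles detects that security is enabled and takes the secure code path automatically.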