public final class TableSnapshotInputFormat extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Usage is similar to TableInputFormat. TableMapReduceUtil.initTableSnapshotMapperJob(String, Scan, Class, Class, Class, Job, boolean, Path) can be used to configure the job:

```java
Job job = new Job(conf);
Scan scan = new Scan();
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan,
    MyTableMapper.class, MyMapKeyOutput.class,
    MyMapOutputValueWritable.class, job, true, tmpDir);
```
Internally, this input format restores the snapshot into the given tmp directory. As with TableInputFormat, one InputSplit is created per region. Each RecordReader opens its region for reading, and an internal RegionScanner executes the Scan obtained from the user.
HBase owns all the data and snapshot files on the filesystem. Only the HBase user can read from snapshot files and data files. HBase also enforces security because all requests are handled by the server layer, and users cannot read from the data files directly. To read snapshot files directly from the filesystem, the user running the MR job must have sufficient permissions to access snapshot and reference files. This means that the MapReduce job must run as the HBase user, or the user must have group or other privileges in the filesystem (see HBASE-8369). Note that granting other users read access to snapshot/data files completely circumvents the access control enforced by HBase.
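As an illustration of the filesystem-level access described above, group read permission might be granted like this. This is only a sketch: it assumes the default `hbase.rootdir` of `/hbase`, and, per the warning above, loosening these permissions bypasses HBase's own access control entirely.

```shell
# Assumption: hbase.rootdir is /hbase; adjust for your cluster.
# Snapshot manifests live under /hbase/.hbase-snapshot; the HFiles they
# reference live under the data and archive directories.
hdfs dfs -chmod -R g+rx /hbase/.hbase-snapshot
hdfs dfs -chmod -R g+rx /hbase/data
hdfs dfs -chmod -R g+rx /hbase/archive
```

The MR job user must then belong to the HBase group for the group bits to take effect.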
| Modifier and Type | Class and Description |
|---|---|
| static class | TableSnapshotInputFormat.TableSnapshotRegionRecordReader: Snapshot region record reader. |
| static class | TableSnapshotInputFormat.TableSnapshotRegionSplit: Snapshot region split. |
| Constructor and Description |
|---|
| TableSnapshotInputFormat() |
| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext job) |
| static void | setInput(org.apache.hadoop.mapreduce.Job job, String snapshotName, org.apache.hadoop.fs.Path restoreDir): Set job input. |
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException

Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Throws: IOException
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext job) throws IOException, InterruptedException

Specified by: getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Throws: IOException, InterruptedException
public static void setInput(org.apache.hadoop.mapreduce.Job job, String snapshotName, org.apache.hadoop.fs.Path restoreDir) throws IOException

Set job input.

Parameters:
job - The job
snapshotName - The snapshot name
restoreDir - The directory where the temp table will be created
Throws:
IOException - on error

Copyright © 2014 The Apache Software Foundation. All Rights Reserved.
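For reference, a job can also be wired up against this input format directly via setInput, without going through TableMapReduceUtil. The sketch below assumes HBase and Hadoop jars on the classpath and a cluster to run against; the snapshot name, restore path, and the row-counting mapper are illustrative placeholders, not part of this API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanDriver {

  // Hypothetical mapper: receives one (row key, Result) pair per row
  // served by the internal RegionScanner, and just counts rows.
  static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
      ctx.getCounter("snapshot", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "snapshot-scan"); // Job(conf, name) matches this 2014-era API
    job.setJarByClass(SnapshotScanDriver.class);

    // Point the job at the snapshot; it is restored into restoreDir
    // before splits are computed (one InputSplit per region, as above).
    TableSnapshotInputFormat.setInput(job, "my_snapshot",
        new Path("/tmp/snapshot_restore")); // placeholder snapshot name and path
    job.setInputFormatClass(TableSnapshotInputFormat.class);

    job.setMapperClass(RowCountMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class); // map-only, no output files
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because the scan bypasses the region servers, no HBase cluster load is generated; the caveat is the filesystem permission requirement discussed earlier.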