org.apache.hadoop.hbase.mapred
Class MultiTableSnapshotInputFormat
java.lang.Object
org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat
org.apache.hadoop.hbase.mapred.MultiTableSnapshotInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class MultiTableSnapshotInputFormat
- extends TableSnapshotInputFormat
- implements org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat,
allowing a MapReduce job to run over one or more table snapshots, with one or more scans
configured for each.
Internally, the input format delegates to TableSnapshotInputFormat
and thus has the same performance advantages; see TableSnapshotInputFormat for more details.
Usage is similar to TableSnapshotInputFormat, with the following exception:
initMultiTableSnapshotMapperJob takes a map from snapshot name to a collection of scans.
For each snapshot in the map, each corresponding scan is applied;
the overall dataset for the job is the concatenation of the regions and tables
included in each snapshot/scan pair.
TableMapReduceUtil.initMultiTableSnapshotMapperJob(Map, Class, Class, Class, JobConf, boolean, Path)
can be used to configure the job:
// The mapred API configures jobs through JobConf.
JobConf job = new JobConf(conf);
// Map each snapshot name to the scans that should run against it.
Map<String, Collection<Scan>> snapshotScans = ImmutableMap.of(
    "snapshot1", ImmutableList.of(new Scan(Bytes.toBytes("a"), Bytes.toBytes("b"))),
    "snapshot2", ImmutableList.of(new Scan(Bytes.toBytes("1"), Bytes.toBytes("2")))
);
// Each snapshot is restored into a subdirectory of this directory.
Path restoreDir = new Path("/tmp/snapshot_restore_dir");
TableMapReduceUtil.initMultiTableSnapshotMapperJob(
    snapshotScans, MyTableMapper.class, MyMapKeyOutput.class,
    MyMapOutputValueWritable.class, job, true, restoreDir);
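The way the snapshot-to-scans map defines the job's dataset can be illustrated with a small, JDK-only sketch. It is not HBase code: `RowRange` is a hypothetical stand-in for `Scan` (a start row and a stop row), and `expandPairs` is an illustrative helper that simply lists every snapshot/scan pair the job would cover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnapshotScanPairs {
    /** Hypothetical stand-in for an HBase Scan: a start row and a stop row. */
    static final class RowRange {
        final String startRow;
        final String stopRow;

        RowRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + stopRow + ")";
        }
    }

    /** Lists one "snapshot:range" label per snapshot/scan pair, in map iteration order. */
    static List<String> expandPairs(Map<String, Collection<RowRange>> snapshotScans) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, Collection<RowRange>> entry : snapshotScans.entrySet()) {
            for (RowRange range : entry.getValue()) {
                pairs.add(entry.getKey() + ":" + range);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order so the listing is deterministic.
        Map<String, Collection<RowRange>> snapshotScans = new LinkedHashMap<>();
        snapshotScans.put("snapshot1",
            Arrays.asList(new RowRange("a", "b"), new RowRange("m", "n")));
        snapshotScans.put("snapshot2",
            Arrays.asList(new RowRange("1", "2")));

        for (String pair : expandPairs(snapshotScans)) {
            System.out.println(pair);
        }
    }
}
```

In this model a snapshot with two scans contributes two pairs, so the map above yields three pairs in total. In the real input format, each pair is further divided into per-region input splits.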
Internally, this input format restores each snapshot into a subdirectory of the given tmp
directory. Input splits and record readers are created as described in
TableSnapshotInputFormat (one per region).
See TableSnapshotInputFormat for more notes on permissioning; the same caveats apply here.
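The restore layout described above can be pictured with a JDK-only sketch. This is an assumption-laden model, not the HBase implementation: `assignRestoreDirs` is a hypothetical helper, `java.nio.file.Path` stands in for `org.apache.hadoop.fs.Path`, and the `name__UUID` naming scheme is invented for illustration, since this page does not state how subdirectories are named.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class RestoreDirLayout {
    /**
     * Gives each snapshot its own subdirectory of baseRestoreDir.
     * The "name__UUID" naming scheme is an assumption made for this sketch.
     */
    static Map<String, Path> assignRestoreDirs(Collection<String> snapshotNames,
                                               Path baseRestoreDir) {
        Map<String, Path> dirs = new LinkedHashMap<>();
        for (String name : snapshotNames) {
            // A random suffix keeps the directories distinct from one another.
            dirs.put(name, baseRestoreDir.resolve(name + "__" + UUID.randomUUID()));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/tmp/snapshot_restore_dir");
        Map<String, Path> dirs =
            assignRestoreDirs(Arrays.asList("snapshot1", "snapshot2"), base);
        for (Map.Entry<String, Path> entry : dirs.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Giving every snapshot a separate directory means restored files from different snapshots never collide, which is why a single restoreDir argument is enough for a multi-snapshot job.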
- See Also:
TableSnapshotInputFormat, TableSnapshotScanner
Method Summary
Modifier and Type | Method and Description
org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result> | getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[] | getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
static void | setInput(org.apache.hadoop.conf.Configuration conf, Map<String,Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path restoreDir): Configure conf to read from snapshotScans, with snapshots restored to a subdirectory of restoreDir.
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
MultiTableSnapshotInputFormat
public MultiTableSnapshotInputFormat()
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
throws IOException
- Specified by:
getSplits
in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
- Overrides:
getSplits
in class TableSnapshotInputFormat
- Throws:
IOException
getRecordReader
public org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
- Overrides:
getRecordReader
in class TableSnapshotInputFormat
- Throws:
IOException
setInput
public static void setInput(org.apache.hadoop.conf.Configuration conf,
Map<String,Collection<Scan>> snapshotScans,
org.apache.hadoop.fs.Path restoreDir)
throws IOException
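- Configure conf to read from snapshotScans, with snapshots restored to a subdirectory of
restoreDir.
Sets: MultiTableSnapshotInputFormatImpl#RESTORE_DIRS_KEY, MultiTableSnapshotInputFormatImpl#SNAPSHOT_TO_SCANS_KEY
- Parameters:
conf - the configuration to update
snapshotScans - map from snapshot name to the collection of scans to apply to that snapshot
restoreDir - directory under which each snapshot is restored into its own subdirectory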
- Throws:
IOException
Copyright © 2007–2016 The Apache Software Foundation. All rights reserved.