org.apache.hadoop.hbase.mapreduce
Class MultiTableSnapshotInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
      extended by org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
          extended by org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat

@InterfaceAudience.Public
@InterfaceStability.Evolving
public class MultiTableSnapshotInputFormat
extends TableSnapshotInputFormat

MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat, allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each. Internally, the input format delegates to TableSnapshotInputFormat and thus has the same performance advantages; see TableSnapshotInputFormat for more details. Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes in a map from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan will be applied; the overall dataset for the job is defined by the concatenation of the regions and tables included in each snapshot/scan pair. TableMapReduceUtil.initMultiTableSnapshotMapperJob(java.util.Map, Class, Class, Class, org.apache.hadoop.mapreduce.Job, boolean, org.apache.hadoop.fs.Path) can be used to configure the job.

Job job = new Job(conf);
 // Map each snapshot name to the scan(s) to run against it
 Map<String, Collection<Scan>> snapshotScans = ImmutableMap.<String, Collection<Scan>>of(
     "snapshot1", ImmutableList.of(new Scan(Bytes.toBytes("a"), Bytes.toBytes("b"))),
     "snapshot2", ImmutableList.of(new Scan(Bytes.toBytes("1"), Bytes.toBytes("2")))
 );
 // Temporary directory the snapshots are restored into
 Path restoreDir = new Path("/tmp/snapshot_restore_dir");
 TableMapReduceUtil.initMultiTableSnapshotMapperJob(
     snapshotScans, MyTableMapper.class, MyMapKeyOutput.class,
     MyMapOutputValueWritable.class, job, true, restoreDir);
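
The mapper and output types referenced above (MyTableMapper, MyMapKeyOutput, MyMapOutputValueWritable) are placeholders supplied by the job author. A minimal sketch of such a mapper, assuming Text/IntWritable output types in place of the placeholder classes:

 // Hypothetical mapper: emits one count per row read from the restored
 // snapshot regions. Text/IntWritable stand in for MyMapKeyOutput and
 // MyMapOutputValueWritable above.
 public static class MyTableMapper extends TableMapper<Text, IntWritable> {
   private static final IntWritable ONE = new IntWritable(1);

   @Override
   protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
       throws IOException, InterruptedException {
     context.write(new Text(Bytes.toString(rowKey.get())), ONE);
   }
 }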
 
 
Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and record readers are created as described in TableSnapshotInputFormat (one per region). See TableSnapshotInputFormat for more notes on permissioning; the same caveats apply here.
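
TableMapReduceUtil.initMultiTableSnapshotMapperJob wraps this wiring for you. For jobs configured by hand, a sketch of the core steps, reusing the snapshotScans map and restoreDir path from the example above (initMultiTableSnapshotMapperJob additionally handles details such as output types and dependency jars):

 // Write the snapshot-to-scans mapping and restore directories into the
 // job configuration, then select this class as the input format.
 MultiTableSnapshotInputFormat.setInput(job.getConfiguration(), snapshotScans, restoreDir);
 job.setInputFormatClass(MultiTableSnapshotInputFormat.class);
 // Mapper and output key/value classes must still be set separately.
 job.setMapperClass(MyTableMapper.class);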

See Also:
TableSnapshotInputFormat, TableSnapshotScanner

Constructor Summary
MultiTableSnapshotInputFormat()
           
 
Method Summary
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
           
static void setInput(org.apache.hadoop.conf.Configuration configuration, Map<String,Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path tmpRestoreDir)
           Configure configuration to read from snapshotScans, with each snapshot restored into a subdirectory of tmpRestoreDir.
 
Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
createRecordReader, setInput
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MultiTableSnapshotInputFormat

public MultiTableSnapshotInputFormat()
Method Detail

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                       throws IOException,
                                                              InterruptedException
Overrides:
getSplits in class TableSnapshotInputFormat
Throws:
IOException
InterruptedException

setInput

public static void setInput(org.apache.hadoop.conf.Configuration configuration,
                            Map<String,Collection<Scan>> snapshotScans,
                            org.apache.hadoop.fs.Path tmpRestoreDir)
                     throws IOException
Configure configuration to read from snapshotScans, with each snapshot restored into a subdirectory of tmpRestoreDir.
Throws:
IOException


Copyright © 2007–2015 The Apache Software Foundation. All rights reserved.