org.apache.hadoop.hbase.mapreduce
Class TableSnapshotInputFormatImpl

java.lang.Object
  extended by org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl

@InterfaceAudience.Private
@InterfaceStability.Evolving
public class TableSnapshotInputFormatImpl
extends Object

API-agnostic implementation for mapreduce over table snapshots.


Nested Class Summary
static class TableSnapshotInputFormatImpl.InputSplit
          Implementation class for InputSplit logic common between mapred and mapreduce.
static class TableSnapshotInputFormatImpl.RecordReader
          Implementation class for RecordReader logic common between mapred and mapreduce.
 
Field Summary
static org.apache.commons.logging.Log LOG
           
protected static String RESTORE_DIR_KEY
           
 
Constructor Summary
TableSnapshotInputFormatImpl()
           
 
Method Summary
static Scan extractScanFromConf(org.apache.hadoop.conf.Configuration conf)
           
static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution)
          This computes the locations to be passed from the InputSplit.
static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest)
           
static SnapshotManifest getSnapshotManifest(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem fs)
           
static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf)
           
static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf)
           
static void setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir)
          Configures the job to use TableSnapshotInputFormat to read from a snapshot.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

RESTORE_DIR_KEY

protected static final String RESTORE_DIR_KEY
See Also:
Constant Field Values
Constructor Detail

TableSnapshotInputFormatImpl

public TableSnapshotInputFormatImpl()
Method Detail

getSplits

public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf)
                                                               throws IOException
Throws:
IOException

getRegionInfosFromManifest

public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest)

getSnapshotManifest

public static SnapshotManifest getSnapshotManifest(org.apache.hadoop.conf.Configuration conf,
                                                   String snapshotName,
                                                   org.apache.hadoop.fs.Path rootDir,
                                                   org.apache.hadoop.fs.FileSystem fs)
                                            throws IOException
Throws:
IOException

extractScanFromConf

public static Scan extractScanFromConf(org.apache.hadoop.conf.Configuration conf)
                                throws IOException
Throws:
IOException

getSplits

public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(Scan scan,
                                                                      SnapshotManifest manifest,
                                                                      List<HRegionInfo> regionManifests,
                                                                      org.apache.hadoop.fs.Path restoreDir,
                                                                      org.apache.hadoop.conf.Configuration conf)
                                                               throws IOException
Throws:
IOException

getBestLocations

public static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf,
                                            HDFSBlocksDistribution blockDistribution)
This computes the locations to be passed from the InputSplit. MR/YARN schedulers do not take weights into account, so they treat every location passed from the input split as equal. We do not want to blindly pass all the locations: we create one split per region, and the region's blocks are distributed throughout the cluster unless favored node assignment is used. In the expected stable case, only one location will hold most of the blocks locally. With favored node assignment, on the other hand, 3 nodes will hold highly local blocks. We therefore apply a simple heuristic: pass all hosts that have at least 80% (hbase.tablesnapshotinputformat.locality.cutoff.multiplier) as much block locality as the host with the best locality.
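The cutoff heuristic described above can be illustrated with a minimal, self-contained sketch. It is not the HBase implementation: the class name `LocalityCutoffSketch`, the method `bestLocations`, and the plain `Map` of per-host block bytes are hypothetical stand-ins for `HDFSBlocksDistribution`, and the 0.8 default is taken only from the "80%" figure in the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the HBase API.
final class LocalityCutoffSketch {
    // Assumed default, mirroring the "80%" mentioned in the description.
    static final double DEFAULT_CUTOFF_MULTIPLIER = 0.8;

    /**
     * Returns the hosts whose local block bytes are at least
     * cutoffMultiplier times those of the best host, best host first.
     */
    static List<String> bestLocations(Map<String, Long> blockBytesByHost,
                                      double cutoffMultiplier) {
        long top = 0L;
        for (long weight : blockBytesByHost.values()) {
            top = Math.max(top, weight);
        }
        List<String> result = new ArrayList<>();
        if (top <= 0L) {
            return result; // no locality information: pass no locations
        }
        double threshold = top * cutoffMultiplier;

        // Sort hosts by weight, descending (stable, so ties keep map order).
        List<Map.Entry<String, Long>> entries =
            new ArrayList<>(blockBytesByHost.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        for (Map.Entry<String, Long> entry : entries) {
            if (entry.getValue() < threshold) {
                break; // every remaining host is below the cutoff
            }
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stable case: one host holds most of the region's blocks.
        Map<String, Long> stable = new LinkedHashMap<>();
        stable.put("host-a", 1000L);
        stable.put("host-b", 300L);
        stable.put("host-c", 250L);
        System.out.println(bestLocations(stable, DEFAULT_CUTOFF_MULTIPLIER));

        // Favored-node case: three hosts hold highly local blocks.
        Map<String, Long> favored = new LinkedHashMap<>();
        favored.put("host-a", 1000L);
        favored.put("host-b", 950L);
        favored.put("host-c", 900L);
        favored.put("host-d", 100L);
        System.out.println(bestLocations(favored, DEFAULT_CUTOFF_MULTIPLIER));
    }
}
```

With these made-up weights, the stable case yields a single location and the favored-node case yields three, which matches the two situations the description distinguishes.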


setInput

public static void setInput(org.apache.hadoop.conf.Configuration conf,
                            String snapshotName,
                            org.apache.hadoop.fs.Path restoreDir)
                     throws IOException
Configures the job to use TableSnapshotInputFormat to read from a snapshot.

Parameters:
conf - the job to configure
snapshotName - the name of the snapshot to read from
restoreDir - a temporary directory to restore the snapshot into. The current user should have write permissions to this directory, and it should not be a subdirectory of rootdir. After the job is finished, restoreDir can be deleted.
Throws:
IOException - if an error occurs
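The restoreDir constraint above (it should not be a subdirectory of rootdir) can be checked before calling setInput. The following is only a sketch under stated assumptions: it works on plain path strings with `java.nio.file` rather than Hadoop `Path` objects, the class and method names are hypothetical, and it does not show whatever validation HBase itself performs.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical pre-flight check; not part of the HBase API.
final class RestoreDirCheckSketch {
    /**
     * Returns true if restoreDir is rootDir itself or lies beneath it.
     * Comparison is by path component, so "/hbase-restore" is NOT
     * considered to be under "/hbase".
     */
    static boolean isUnderRoot(String rootDir, String restoreDir) {
        Path root = Paths.get(rootDir).normalize();
        Path restore = Paths.get(restoreDir).normalize();
        return restore.startsWith(root);
    }

    /** Throws if restoreDir would violate the documented constraint. */
    static void validateRestoreDir(String rootDir, String restoreDir) {
        if (isUnderRoot(rootDir, restoreDir)) {
            throw new IllegalArgumentException(
                "restoreDir " + restoreDir + " must not be under rootdir " + rootDir);
        }
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        System.out.println(isUnderRoot("/hbase", "/hbase/.tmp/restore"));
        System.out.println(isUnderRoot("/hbase", "/tmp/snapshot-restore"));
    }
}
```

Comparing by path component rather than by string prefix avoids rejecting a sibling directory whose name merely begins with the same characters as rootdir.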


Copyright © 2007–2016 The Apache Software Foundation. All rights reserved.