org.apache.hadoop.hbase.mapreduce
Class TableSnapshotInputFormatImpl
java.lang.Object
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class TableSnapshotInputFormatImpl
extends Object
API-agnostic implementation for mapreduce over table snapshots.
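In rough terms, the class turns a snapshot manifest into one input split per snapshot region. The sketch below is illustrative only: the driver class, the snapshot name "my_snapshot", and the restore path are hypothetical, and it assumes hbase.rootdir is set in the client configuration. It simply chains the static helpers documented on this page.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl;
import org.apache.hadoop.hbase.snapshot.SnapshotManifest;

public class SnapshotSplitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Hypothetical scratch directory; must be writable and must not live under hbase.rootdir.
    Path restoreDir = new Path("/tmp/snapshot-restore");

    // Configure the job to read from the snapshot (see setInput below).
    TableSnapshotInputFormatImpl.setInput(conf, "my_snapshot", restoreDir);

    // The lower-level helpers can also be called directly; this assumes
    // hbase.rootdir is present in the configuration.
    Path rootDir = new Path(conf.get("hbase.rootdir"));
    FileSystem fs = rootDir.getFileSystem(conf);
    SnapshotManifest manifest =
        TableSnapshotInputFormatImpl.getSnapshotManifest(conf, "my_snapshot", rootDir, fs);
    List<HRegionInfo> regions =
        TableSnapshotInputFormatImpl.getRegionInfosFromManifest(manifest);
    Scan scan = TableSnapshotInputFormatImpl.extractScanFromConf(conf);

    List<TableSnapshotInputFormatImpl.InputSplit> splits =
        TableSnapshotInputFormatImpl.getSplits(scan, manifest, regions, restoreDir, conf);
    System.out.println(splits.size() + " splits, one per snapshot region");
  }
}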
Method Summary

static Scan
    extractScanFromConf(org.apache.hadoop.conf.Configuration conf)
static List<String>
    getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution)
    This computes the locations to be passed from the InputSplit.
static List<HRegionInfo>
    getRegionInfosFromManifest(SnapshotManifest manifest)
static SnapshotManifest
    getSnapshotManifest(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem fs)
static List<TableSnapshotInputFormatImpl.InputSplit>
    getSplits(org.apache.hadoop.conf.Configuration conf)
static List<TableSnapshotInputFormatImpl.InputSplit>
    getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf)
static void
    setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir)
    Configures the job to use TableSnapshotInputFormat to read from a snapshot.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
RESTORE_DIR_KEY
protected static final String RESTORE_DIR_KEY
TableSnapshotInputFormatImpl
public TableSnapshotInputFormatImpl()
getSplits
public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf)
throws IOException
- Throws:
IOException
getRegionInfosFromManifest
public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest)
getSnapshotManifest
public static SnapshotManifest getSnapshotManifest(org.apache.hadoop.conf.Configuration conf,
String snapshotName,
org.apache.hadoop.fs.Path rootDir,
org.apache.hadoop.fs.FileSystem fs)
throws IOException
- Throws:
IOException
extractScanFromConf
public static Scan extractScanFromConf(org.apache.hadoop.conf.Configuration conf)
throws IOException
- Throws:
IOException
getSplits
public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(Scan scan,
SnapshotManifest manifest,
List<HRegionInfo> regionManifests,
org.apache.hadoop.fs.Path restoreDir,
org.apache.hadoop.conf.Configuration conf)
throws IOException
- Throws:
IOException
getBestLocations
public static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf,
HDFSBlocksDistribution blockDistribution)
- This computes the locations to be passed from the InputSplit. MR/YARN schedulers do not take
weights into account, so they treat every location passed from the input split as equal. We do
not want to blindly pass all the locations, since we create one split per region and the
region's blocks are distributed throughout the cluster unless favored node assignment is used.
In the expected stable case, only one location will hold most of the blocks locally; with
favored node assignment, three nodes will hold highly local blocks. Here we apply a simple
heuristic: we pass all hosts whose block locality is at least 80%
(hbase.tablesnapshotinputformat.locality.cutoff.multiplier) of that of the host with the best
locality.
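As an illustration only (not the actual implementation, which works on HDFSBlocksDistribution), the cutoff heuristic can be sketched against a plain host-to-weight map; the 0.8 constant mirrors the 80% default named above.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LocalityCutoffSketch {
  // Mirrors the 80% default of hbase.tablesnapshotinputformat.locality.cutoff.multiplier.
  private static final float CUTOFF_MULTIPLIER = 0.8f;

  // Keep every host whose local block weight is at least 80% of the best host's weight.
  public static List<String> bestLocations(Map<String, Long> hostWeights) {
    List<String> locations = new ArrayList<>();
    long topWeight = 0;
    for (long w : hostWeights.values()) {
      topWeight = Math.max(topWeight, w);
    }
    if (topWeight == 0) {
      return locations; // no local blocks anywhere
    }
    for (Map.Entry<String, Long> e : hostWeights.entrySet()) {
      if (e.getValue() >= topWeight * CUTOFF_MULTIPLIER) {
        locations.add(e.getKey());
      }
    }
    return locations;
  }
}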
setInput
public static void setInput(org.apache.hadoop.conf.Configuration conf,
String snapshotName,
org.apache.hadoop.fs.Path restoreDir)
throws IOException
- Configures the job to use TableSnapshotInputFormat to read from a snapshot.
- Parameters:
conf - the job to configure
snapshotName - the name of the snapshot to read from
restoreDir - a temporary directory to restore the snapshot into. Current user should have
write permissions to this directory, and this should not be a subdirectory of rootdir. After
the job is finished, restoreDir can be deleted.
- Throws:
IOException - if an error occurs
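A minimal sketch of the restoreDir lifecycle described above; the directory path and helper class are hypothetical, and the job-submission step is elided.

import java.io.IOException;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl;

public class RestoreDirLifecycle {
  public static void runOverSnapshot(String snapshotName) throws IOException {
    Configuration conf = HBaseConfiguration.create();

    // A scratch directory the current user can write to; deliberately NOT a
    // subdirectory of hbase.rootdir, and unique per job run.
    Path restoreDir = new Path("/user/hadoop/snapshot-restore/" + UUID.randomUUID());

    TableSnapshotInputFormatImpl.setInput(conf, snapshotName, restoreDir);
    try {
      // ... submit the MapReduce job built on this Configuration ...
    } finally {
      // Once the job has finished, the restored files are no longer needed.
      FileSystem fs = restoreDir.getFileSystem(conf);
      fs.delete(restoreDir, true);
    }
  }
}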