org.apache.hadoop.hbase.mapreduce
Class TableSnapshotInputFormatImpl

java.lang.Object
  extended by org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl

@InterfaceAudience.Private
@InterfaceStability.Evolving
public class TableSnapshotInputFormatImpl
extends Object

API-agnostic implementation for mapreduce over table snapshots.


Nested Class Summary
static class TableSnapshotInputFormatImpl.InputSplit
          Implementation class for InputSplit logic common between mapred and mapreduce.
static class TableSnapshotInputFormatImpl.RecordReader
          Implementation class for RecordReader logic common between mapred and mapreduce.
 
Constructor Summary
TableSnapshotInputFormatImpl()
           
 
Method Summary
static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution)
          This computes the locations to be passed from the InputSplit.
static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf)
           
static void setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir)
          Configures the job to use TableSnapshotInputFormat to read from a snapshot.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TableSnapshotInputFormatImpl

public TableSnapshotInputFormatImpl()

Method Detail

getSplits

public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf)
                                                               throws IOException
Throws:
IOException

getBestLocations

public static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf,
                                            HDFSBlocksDistribution blockDistribution)
This computes the locations to be passed from the InputSplit. MR/YARN schedulers do not take weights into account, so they treat every location passed from the input split as equal. We do not want to blindly pass all the locations, since we create one split per region and a region's blocks are distributed throughout the cluster unless favored node assignment is used. In the expected stable case, only one location will hold most of the blocks locally. With favored node assignment, on the other hand, 3 nodes will hold highly local blocks. A simple heuristic is applied here: pass every host that has at least 80% (hbase.tablesnapshotinputformat.locality.cutoff.multiplier) as much block locality as the top host with the best locality.
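
The cutoff heuristic described above can be illustrated with a minimal, self-contained sketch. It is not the HBase implementation: the class LocalityCutoffSketch, the method bestLocations, and the use of a plain Map of host name to locally stored bytes (standing in for HDFSBlocksDistribution) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the locality cutoff heuristic; not HBase code.
final class LocalityCutoffSketch {

    // Default multiplier taken from the description above (80% of the top host).
    static final double DEFAULT_CUTOFF_MULTIPLIER = 0.8;

    /**
     * Returns the hosts whose local weight is at least cutoffMultiplier times
     * the weight of the best host, ordered from most to least local.
     *
     * @param localBytesByHost assumed stand-in for a block distribution:
     *                         host name mapped to bytes stored locally
     */
    static List<String> bestLocations(Map<String, Long> localBytesByHost,
                                      double cutoffMultiplier) {
        List<Map.Entry<String, Long>> hosts =
            new ArrayList<>(localBytesByHost.entrySet());

        // Heaviest host first; ties are broken by host name so the result is stable.
        hosts.sort((a, b) -> {
            int byWeight = Long.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });

        List<String> locations = new ArrayList<>();
        if (hosts.isEmpty() || hosts.get(0).getValue() <= 0) {
            return locations; // no locality information, so pass no locations
        }

        double cutoff = hosts.get(0).getValue() * cutoffMultiplier;
        for (Map.Entry<String, Long> host : hosts) {
            if (host.getValue() < cutoff) {
                break; // list is sorted, so every remaining host is below the cutoff
            }
            locations.add(host.getKey());
        }
        return locations;
    }
}
```

With weights of 100, 85, 79 and 10 and a multiplier of 0.8, only the first two hosts pass the cutoff. With three hosts of nearly equal weight, as favored node assignment would produce, all three are returned.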


setInput

public static void setInput(org.apache.hadoop.conf.Configuration conf,
                            String snapshotName,
                            org.apache.hadoop.fs.Path restoreDir)
                     throws IOException
Configures the job to use TableSnapshotInputFormat to read from a snapshot.

Parameters:
conf - the job to configure
snapshotName - the name of the snapshot to read from
restoreDir - a temporary directory to restore the snapshot into. The current user should have write permission on this directory, and it should not be a subdirectory of rootdir. After the job is finished, restoreDir can be deleted.
Throws:
IOException - if an error occurs
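
The restoreDir constraint above (not a subdirectory of rootdir) can be checked before a job is configured. The following is a hedged sketch of such a pre-check using only the JDK. RestoreDirCheckSketch and isInsideRoot are hypothetical names, not part of HBase, and the sketch assumes both paths live on the same filesystem, so it compares only the path component and ignores scheme and authority.

```java
import java.net.URI;

// Hypothetical pre-check for the restoreDir constraint; not HBase code.
final class RestoreDirCheckSketch {

    /**
     * Returns true if restoreDir is rootDir itself or lies below it.
     * Assumes both arguments are valid URI strings on the same filesystem.
     */
    static boolean isInsideRoot(String rootDir, String restoreDir) {
        String root = normalizedPath(rootDir);
        String candidate = normalizedPath(restoreDir);
        if (candidate.equals(root)) {
            return true;
        }
        // Append a separator so "/hbase-restore" is not mistaken for a child of "/hbase".
        String prefix = root.endsWith("/") ? root : root + "/";
        return candidate.startsWith(prefix);
    }

    // Resolves "." and ".." segments and strips trailing slashes (except for "/").
    private static String normalizedPath(String location) {
        String path = URI.create(location).normalize().getPath();
        while (path.length() > 1 && path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path;
    }
}
```

A caller could reject a restoreDir for which isInsideRoot returns true before passing it to setInput.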


Copyright © 2015 The Apache Software Foundation. All rights reserved.