org.apache.hadoop.hbase.mapreduce
Class TableInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
      extended by org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
          extended by org.apache.hadoop.hbase.mapreduce.TableInputFormat
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable

@InterfaceAudience.Public
@InterfaceStability.Stable
public class TableInputFormat
extends TableInputFormatBase
implements org.apache.hadoop.conf.Configurable

Convert HBase tabular data into a format that is consumable by Map/Reduce.


Field Summary
static String INPUT_TABLE
          Job parameter that specifies the input table.
static String SCAN
          Base-64 encoded scanner.
static String SCAN_BATCHSIZE
          Set the maximum number of values to return for each call to next().
static String SCAN_CACHEBLOCKS
          Set to false to disable server-side caching of blocks for this scan.
static String SCAN_CACHEDROWS
          The number of rows that scanners fetch and cache per call to the server.
static String SCAN_COLUMN_FAMILY
          Column Family to Scan
static String SCAN_COLUMNS
          Space-delimited list of columns and column families to scan.
static String SCAN_MAXVERSIONS
          The maximum number of versions to return.
static String SCAN_ROW_START
          Scan start row
static String SCAN_ROW_STOP
          Scan stop row
static String SCAN_TIMERANGE_END
          The ending timestamp used to filter columns with a specific range of versions.
static String SCAN_TIMERANGE_START
          The starting timestamp used to filter columns with a specific range of versions.
static String SCAN_TIMESTAMP
          The timestamp used to filter columns with a specific timestamp.
 
Fields inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
INPUT_AUTOBALANCE_MAXSKEWRATIO, MAPREDUCE_INPUT_AUTOBALANCE, TABLE_ROW_TEXTKEY
 
Constructor Summary
TableInputFormat()
           
 
Method Summary
static void addColumns(Scan scan, byte[][] columns)
          Adds an array of columns specified using the old format, family:qualifier.
static void configureSplitTable(org.apache.hadoop.mapreduce.Job job, TableName tableName)
          Sets the table whose region boundaries are used to compute the input splits of a map-reduce job.
 org.apache.hadoop.conf.Configuration getConf()
          Returns the current configuration.
protected  Pair<byte[][],byte[][]> getStartEndKeys()
           
 void setConf(org.apache.hadoop.conf.Configuration configuration)
          Sets the configuration.
 
Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
calculateRebalancedSplits, createRecordReader, getHTable, getScan, getSplitKey, getSplits, includeRegionInSplit, reverseDNS, setHTable, setScan, setTableRecordReader
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INPUT_TABLE

public static final String INPUT_TABLE
Job parameter that specifies the input table.

See Also:
Constant Field Values

SCAN

public static final String SCAN
Base-64 encoded scanner. All other SCAN_* configuration properties are ignored if this is specified. See TableMapReduceUtil.convertScanToString(Scan) for more details.

See Also:
Constant Field Values
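The precedence rule above can be modeled with a plain Map standing in for the job Configuration. This is a minimal sketch, not HBase code: ScanSettingsSketch and effectiveSettings are hypothetical names, and the map is keyed by the field names (SCAN, SCAN_ROW_START, and so on) rather than by the property strings listed under Constant Field Values.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of which scan settings take effect.
// Keys are the field names, not the real property strings.
final class ScanSettingsSketch {
    private ScanSettingsSketch() {}

    // If SCAN is present, only the serialized scan is kept and every
    // SCAN_* entry is ignored; otherwise all SCAN_* entries are kept.
    static Map<String, String> effectiveSettings(Map<String, String> conf) {
        Map<String, String> result = new TreeMap<>();
        String serialized = conf.get("SCAN");
        if (serialized != null) {
            result.put("SCAN", serialized);
            return result;
        }
        for (Map.Entry<String, String> e : conf.entrySet()) {
            if (e.getKey().startsWith("SCAN_")) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

In a real job the serialized value would be produced by TableMapReduceUtil.convertScanToString(Scan) and stored under TableInputFormat.SCAN; the sketch only shows which settings win.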

SCAN_ROW_START

public static final String SCAN_ROW_START
Scan start row

See Also:
Constant Field Values

SCAN_ROW_STOP

public static final String SCAN_ROW_STOP
Scan stop row

See Also:
Constant Field Values
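Assuming the usual HBase scan semantics (start row inclusive, stop row exclusive, an unset bound leaves that side open), the row filter implied by SCAN_ROW_START and SCAN_ROW_STOP can be sketched as below. RowRangeSketch is hypothetical, and String ordering stands in for HBase's unsigned byte ordering of row keys; the two agree for ASCII keys.

```java
// Hypothetical helper; Strings stand in for HBase's byte[] row keys.
final class RowRangeSketch {
    private RowRangeSketch() {}

    // Start row is inclusive, stop row is exclusive; an empty bound
    // means the range is open on that side.
    static boolean inRange(String row, String start, String stop) {
        boolean afterStart = start.isEmpty() || row.compareTo(start) >= 0;
        boolean beforeStop = stop.isEmpty() || row.compareTo(stop) < 0;
        return afterStart && beforeStop;
    }
}
```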

SCAN_COLUMN_FAMILY

public static final String SCAN_COLUMN_FAMILY
Column Family to Scan

See Also:
Constant Field Values

SCAN_COLUMNS

public static final String SCAN_COLUMNS
Space-delimited list of columns and column families to scan. Each entry is either a column family or a column in family:qualifier form.

See Also:
Constant Field Values
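A sketch of how a SCAN_COLUMNS value breaks down into entries, assuming each entry is either a bare family or family:qualifier split at the first colon. ColumnListSketch is a hypothetical helper, not part of HBase; unlike a single-space split, it also tolerates runs of whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for a SCAN_COLUMNS-style value.
final class ColumnListSketch {
    private ColumnListSketch() {}

    // Returns {family} or {family, qualifier} for each entry.
    static List<String[]> parse(String value) {
        List<String[]> out = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // the whole value was blank
            }
            int colon = token.indexOf(':'); // split at the first colon only
            if (colon < 0) {
                out.add(new String[] {token});
            } else {
                out.add(new String[] {token.substring(0, colon), token.substring(colon + 1)});
            }
        }
        return out;
    }
}
```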

SCAN_TIMESTAMP

public static final String SCAN_TIMESTAMP
The timestamp used to filter columns with a specific timestamp.

See Also:
Constant Field Values

SCAN_TIMERANGE_START

public static final String SCAN_TIMERANGE_START
The starting timestamp used to filter columns with a specific range of versions.

See Also:
Constant Field Values

SCAN_TIMERANGE_END

public static final String SCAN_TIMERANGE_END
The ending timestamp used to filter columns with a specific range of versions.

See Also:
Constant Field Values
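The timestamp properties can be sketched under the assumption that they follow HBase's half-open time ranges, [start, end), where a single SCAN_TIMESTAMP behaves like the range [ts, ts + 1). TimeRangeSketch is a hypothetical model, not HBase code.

```java
// Hypothetical model of the timestamp filters: start inclusive, end exclusive.
final class TimeRangeSketch {
    private TimeRangeSketch() {}

    // SCAN_TIMERANGE_START / SCAN_TIMERANGE_END: keep versions in [start, end).
    static boolean inTimeRange(long ts, long start, long end) {
        return ts >= start && ts < end;
    }

    // SCAN_TIMESTAMP: a single timestamp behaves like the range [ts, ts + 1).
    static boolean matchesTimestamp(long versionTs, long ts) {
        return inTimeRange(versionTs, ts, ts + 1);
    }
}
```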

SCAN_MAXVERSIONS

public static final String SCAN_MAXVERSIONS
The maximum number of versions to return.

See Also:
Constant Field Values
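A minimal model of SCAN_MAXVERSIONS, assuming the versions of a cell are returned newest first and the limit keeps the first N of them. MaxVersionsSketch is hypothetical; it ignores the column family's own VERSIONS setting and any time-range filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: keep the newest maxVersions timestamps of one cell.
final class MaxVersionsSketch {
    private MaxVersionsSketch() {}

    static List<Long> newestVersions(List<Long> timestamps, int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder()); // newest (largest) first
        return new ArrayList<>(sorted.subList(0, Math.min(maxVersions, sorted.size())));
    }
}
```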

SCAN_CACHEBLOCKS

public static final String SCAN_CACHEBLOCKS
Set to false to disable server-side caching of blocks for this scan.

See Also:
Constant Field Values

SCAN_CACHEDROWS

public static final String SCAN_CACHEDROWS
The number of rows that scanners fetch and cache per call to the server.

See Also:
Constant Field Values

SCAN_BATCHSIZE

public static final String SCAN_BATCHSIZE
Set the maximum number of values to return for each call to next().

See Also:
Constant Field Values
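SCAN_CACHEDROWS and SCAN_BATCHSIZE trade round trips against memory. The arithmetic below is a back-of-the-envelope sketch under simplifying assumptions: caching is counted in rows per fetch with no batching, and size-based limits on a fetch are ignored, so real counts can be higher. ScanTuningSketch is a hypothetical name.

```java
// Rough model of the effect of SCAN_CACHEDROWS and SCAN_BATCHSIZE.
final class ScanTuningSketch {
    private ScanTuningSketch() {}

    // Ceiling division for positive divisors.
    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // With caching of N rows per fetch, scanning R rows needs about ceil(R / N) fetches.
    static long fetchesForRows(long rows, int cachedRows) {
        if (cachedRows < 1) throw new IllegalArgumentException("cachedRows must be >= 1");
        return ceilDiv(rows, cachedRows);
    }

    // With a batch size of B, a row holding C cells comes back as about ceil(C / B) partial results.
    static long resultsForRow(long cells, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        return ceilDiv(cells, batchSize);
    }
}
```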

Constructor Detail

TableInputFormat

public TableInputFormat()

Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
Returns the current configuration.

Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
Returns:
The current configuration.
See Also:
Configurable.getConf()

setConf

public void setConf(org.apache.hadoop.conf.Configuration configuration)
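Sets the configuration. This is used to set up the table to be scanned (INPUT_TABLE) and the scan itself, which is read from SCAN if present and otherwise built from the individual SCAN_* properties.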

Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
Parameters:
configuration - The configuration to set.
See Also:
Configurable.setConf(org.apache.hadoop.conf.Configuration)

addColumns

public static void addColumns(Scan scan,
                              byte[][] columns)
Adds an array of columns specified using the old format, family:qualifier.

Overrides previous calls to Scan.addColumn(byte[], byte[]) for any families in the input.

Parameters:
scan - The Scan to update.
columns - array of columns, formatted as family:qualifier
See Also:
Scan.addColumn(byte[], byte[])
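The override described above can be illustrated with a toy model of a scan's family map, where a null entry means every column of the family. This is an assumption-laden sketch, not the Scan class: FamilyMapSketch is hypothetical and uses String in place of byte[]. A family-only entry replaces any qualifiers added earlier for that family, while a family:qualifier entry adds one column.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a scan's family map: a null value means
// "all columns of the family", otherwise only the listed qualifiers.
final class FamilyMapSketch {
    final Map<String, Set<String>> familyMap = new LinkedHashMap<>();

    // Family-only entry: select the whole family, replacing any qualifiers.
    void addFamily(String family) {
        familyMap.put(family, null);
    }

    // Single column: add the qualifier to the family's explicit set,
    // creating one if the family has none or currently selects all columns.
    void addColumn(String family, String qualifier) {
        Set<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
    }

    // Applies "family" or "family:qualifier" entries, splitting at the first colon.
    void addColumns(String... columns) {
        for (String column : columns) {
            int colon = column.indexOf(':');
            if (colon < 0) {
                addFamily(column);
            } else {
                addColumn(column.substring(0, colon), column.substring(colon + 1));
            }
        }
    }
}
```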

getStartEndKeys

protected Pair<byte[][],byte[][]> getStartEndKeys()
                                           throws IOException
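Returns the start and end keys of the regions used to compute input splits. If a split table has been set with configureSplitTable, the region boundaries of that table are returned; otherwise those of the input table.

Overrides: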
getStartEndKeys in class TableInputFormatBase
Throws:
IOException
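TableInputFormatBase.getSplits uses the key pairs returned by getStartEndKeys() to build one split per region that overlaps the scan, clipped to the scan's row range. The clipping step can be sketched as follows; SplitSketch is hypothetical, empty strings stand for unbounded keys, String ordering stands in for unsigned byte ordering, and the auto-balancing controlled by MAPREDUCE_INPUT_AUTOBALANCE is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: intersect a scan's [startRow, stopRow) with each
// region's [startKey, endKey). Empty strings mean "unbounded".
final class SplitSketch {
    private SplitSketch() {}

    static List<String[]> splits(String[] startKeys, String[] endKeys,
                                 String startRow, String stopRow) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            String regionStart = startKeys[i];
            String regionEnd = endKeys[i];
            // Skip regions that do not overlap the scan range.
            boolean overlaps =
                (startRow.isEmpty() || regionEnd.isEmpty() || startRow.compareTo(regionEnd) < 0)
                && (stopRow.isEmpty() || stopRow.compareTo(regionStart) > 0);
            if (!overlaps) {
                continue;
            }
            // Clip the region to the scan range.
            String splitStart = (startRow.isEmpty() || regionStart.compareTo(startRow) >= 0)
                ? regionStart : startRow;
            String splitStop = (!regionEnd.isEmpty()
                    && (stopRow.isEmpty() || regionEnd.compareTo(stopRow) <= 0))
                ? regionEnd : stopRow;
            out.add(new String[] {splitStart, splitStop});
        }
        return out;
    }
}
```

For two regions ["", "m") and ["m", ""), a scan from "c" to "x" yields the splits [c, m) and [m, x), while a scan from "c" to "f" touches only the first region and yields [c, f).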

configureSplitTable

public static void configureSplitTable(org.apache.hadoop.mapreduce.Job job,
                                       TableName tableName)
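Sets the table whose region boundaries are used to compute the input splits of the map-reduce job, in place of the input table's own regions. Records are still read from the input table.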



Copyright © 2007–2016 The Apache Software Foundation. All rights reserved.