org.apache.hadoop.hbase.mapred
Class TableInputFormat

java.lang.Object
  extended by org.apache.hadoop.hbase.mapred.TableInputFormat
All Implemented Interfaces:
InputFormat, JobConfigurable

public class TableInputFormat
extends Object
implements InputFormat, JobConfigurable

Converts HBase tabular data into a format that is consumable by Map/Reduce.
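
A minimal driver sketch (not part of the original API documentation) showing how this
input format is typically wired into an old-style Map/Reduce job. The table name, the
column names, and the use of the job's input path to identify the table are assumptions
made for illustration; only TableInputFormat, COLUMN_LIST and the standard
org.apache.hadoop.mapred calls shown here come from documented APIs.

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.mapred.TableInputFormat;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyTableDriver {                                     // hypothetical driver class
      public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(MyTableDriver.class);
        job.setJobName("scan-webtable");                             // "webtable" is a hypothetical table
        job.setInputFormat(TableInputFormat.class);                  // read job input from an HBase table
        job.set(TableInputFormat.COLUMN_LIST, "contents: anchor:");  // space-delimited columns (assumed names)
        FileInputFormat.setInputPaths(job, new Path("webtable"));    // assumption: the input path names the table
        // ... set the mapper, reducer, output format and output path as usual ...
        JobClient.runJob(job);
      }
    }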


Field Summary
static String COLUMN_LIST
          Space-delimited list of columns.
 
Constructor Summary
TableInputFormat()
           
 
Method Summary
 void configure(JobConf job)
          Initializes a new instance from a JobConf.
 RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for a FileSplit.
 InputSplit[] getSplits(JobConf job, int numSplits)
          A split will be created for each HRegion of the input table.
 void validateInput(JobConf job)
          Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COLUMN_LIST

public static final String COLUMN_LIST
Space-delimited list of columns.

See Also:
for column name wildcards, Constant Field Values
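
A hedged illustration of the value this constant keys in a JobConf. The family and column
names are invented, and the trailing-colon form acting as a whole-family wildcard is an
assumption suggested by the See Also note above, not something this page specifies.

    // Two fully qualified columns plus (assumed) a whole-family wildcard, space delimited.
    job.set(TableInputFormat.COLUMN_LIST, "contents:page contents:raw anchor:");
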
Constructor Detail

TableInputFormat

public TableInputFormat()
Method Detail

getRecordReader

public RecordReader getRecordReader(InputSplit split,
                                    JobConf job,
                                    Reporter reporter)
                             throws IOException
Description copied from interface: InputFormat
Construct a RecordReader for a FileSplit.

Specified by:
getRecordReader in interface InputFormat
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
IOException

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
A split will be created for each HRegion of the input table.

Specified by:
getSplits in interface InputFormat
Parameters:
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
See Also:
InputFormat.getSplits(org.apache.hadoop.mapred.JobConf, int)
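
A hedged sketch, continuing inside the hypothetical driver's main method from the sketch
near the top of this page, of driving the input format by hand to inspect the
region-aligned splits; when a job is submitted the framework makes these calls itself.
The numSplits hint is passed through, but per the description above one split is produced
for each HRegion of the table regardless of the hint.

    // Requires import org.apache.hadoop.mapred.InputSplit in the driver class.
    TableInputFormat inputFormat = new TableInputFormat();
    inputFormat.configure(job);                            // JobConfigurable hook: pick up table and column settings
    InputSplit[] splits = inputFormat.getSplits(job, 1);   // one split per HRegion of the input table
    for (InputSplit split : splits) {
      System.out.println(split);                           // each split corresponds to a single region
    }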

configure

public void configure(JobConf job)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration

validateInput

public void validateInput(JobConf job)
                   throws IOException
Description copied from interface: InputFormat
Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.

Specified by:
validateInput in interface InputFormat
Parameters:
job - the job to check
Throws:
InvalidInputException - if the job does not have valid input
IOException
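
A hedged sketch, again inside the hypothetical driver's main method, of calling
validateInput explicitly before submission so that a missing or misconfigured input table
surfaces immediately; in practice the framework performs this check itself when the job is
submitted. The error handling shown is illustrative only.

    TableInputFormat inputFormat = new TableInputFormat();
    inputFormat.configure(job);                // read the table and column settings from the JobConf
    try {
      inputFormat.validateInput(job);          // throws InvalidInputException if the input is not valid
    } catch (org.apache.hadoop.mapred.InvalidInputException e) {
      System.err.println("Input table is not usable: " + e.getMessage());
      return;                                  // fail early instead of submitting the job
    }
    JobClient.runJob(job);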


Copyright © 2006 The Apache Software Foundation