org.apache.hadoop.hbase.mapred
Class TableInputFormat

java.lang.Object
  extended by org.apache.hadoop.hbase.mapred.TableInputFormat
All Implemented Interfaces:
InputFormat<HStoreKey,MapWritable>, JobConfigurable

public class TableInputFormat
extends Object
implements InputFormat<HStoreKey,MapWritable>, JobConfigurable

Converts HBase tabular data into a format that is consumable by Map/Reduce.
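A typical driver wires this class into a job roughly as follows. This is a sketch only, against the old Hadoop mapred API: the table name "mytable" and the column names are illustrative, and the exact setup (e.g. naming the table via the job's input path) may differ between HBase releases, so consult the release's own documentation.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.hbase.mapred.TableInputFormat;

// Sketch only: assumes an HBase 0.x / Hadoop 0.x classpath and a running cluster.
JobConf job = new JobConf();
// In this old API the input "path" names the HBase table to read (illustrative name).
job.setInputPath(new Path("mytable"));
// COLUMN_LIST holds a space-delimited list of columns to scan.
job.set(TableInputFormat.COLUMN_LIST, "info: contents:");
job.setInputFormat(TableInputFormat.class);
```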


Field Summary
static String COLUMN_LIST
          Space-delimited list of columns.
 
Constructor Summary
TableInputFormat()
           
 
Method Summary
 void configure(JobConf job)
          Initializes a new instance from a JobConf.
 RecordReader<HStoreKey,MapWritable> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Get the RecordReader for the given InputSplit.
 InputSplit[] getSplits(JobConf job, int numSplits)
          A split will be created for each HRegion of the input table.
 void validateInput(JobConf job)
          Check for validity of the input-specification for the job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COLUMN_LIST

public static final String COLUMN_LIST
Space-delimited list of columns.

See Also:
the scanner documentation for column name wildcards, Constant Field Values
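The field's value format can be illustrated with plain Java (no HBase dependency; the column names below are made up — in the 0.x API a column family name ends with ':'):

```java
public class ColumnListDemo {
    public static void main(String[] args) {
        // A COLUMN_LIST value is a single space-delimited string of columns.
        String columnList = "info: contents:";   // illustrative column names
        String[] columns = columnList.split(" ");
        System.out.println(columns.length);      // prints 2
        System.out.println(columns[0]);          // prints info:
    }
}
```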
Constructor Detail

TableInputFormat

public TableInputFormat()
Method Detail

getRecordReader

public RecordReader<HStoreKey,MapWritable> getRecordReader(InputSplit split,
                                                           JobConf job,
                                                           Reporter reporter)
                                                    throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.

Specified by:
getRecordReader in interface InputFormat<HStoreKey,MapWritable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - used to report progress
Returns:
a RecordReader
Throws:
IOException

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
A split will be created for each HRegion of the input table.

Specified by:
getSplits in interface InputFormat<HStoreKey,MapWritable>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
See Also:
InputFormat.getSplits(org.apache.hadoop.mapred.JobConf, int)
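Because a split is created per region, the numSplits hint does not control parallelism here. A sketch of the driver-side view (assumes a JobConf already configured for an HBase table, held in a hypothetical variable `job`, and a reachable cluster — this cannot run standalone):

```java
// Sketch only: requires an HBase 0.x classpath and a live table.
TableInputFormat inputFormat = new TableInputFormat();
inputFormat.configure(job);        // reads the table name and COLUMN_LIST from the JobConf
inputFormat.validateInput(job);    // fail early if the input specification is invalid
InputSplit[] splits = inputFormat.getSplits(job, 1);
// splits.length equals the number of HRegions, regardless of the hint (1 here).
```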

configure

public void configure(JobConf job)
Description copied from interface: JobConfigurable
Initializes a new instance from a JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration

validateInput

public void validateInput(JobConf job)
                   throws IOException
Description copied from interface: InputFormat
Check for validity of the input-specification for the job.

This method is used to validate the input directories when a job is submitted so that the JobClient can fail early, with a useful error message, in case of errors — for example, when an input directory does not exist.

Specified by:
validateInput in interface InputFormat<HStoreKey,MapWritable>
Parameters:
job - job configuration.
Throws:
InvalidInputException - if the job does not have valid input
IOException


Copyright © 2006 The Apache Software Foundation