org.apache.hadoop.hbase.mapred
Class TableInputFormat

java.lang.Object
  extended by org.apache.hadoop.hbase.mapred.TableInputFormat
All Implemented Interfaces:
InputFormat<HStoreKey,MapWritable>, JobConfigurable

public class TableInputFormat
extends Object
implements InputFormat<HStoreKey,MapWritable>, JobConfigurable

Converts HBase tabular data into a format that is consumable by Map/Reduce.
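As a rough sketch of how this class might be wired into a job, a driver could set it as the job's input format and supply the column list; this fragment is an assumption-laden illustration, not verified driver code (the column names are made up, and how the table name itself is supplied, commonly via the job's input path in this era of the API, may differ across versions):

```java
// Sketch only: wiring TableInputFormat into a JobConf.
// "info:" and "anchor:" are hypothetical column names.
JobConf job = new JobConf();
job.setInputFormat(TableInputFormat.class);             // records arrive as <HStoreKey, MapWritable>
job.set(TableInputFormat.COLUMN_LIST, "info: anchor:"); // space-delimited column list
```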


Field Summary
static String COLUMN_LIST
          Space-delimited list of columns.
 
Constructor Summary
TableInputFormat()
           
 
Method Summary
 void configure(JobConf job)
          Initializes this instance from the given JobConf.
 RecordReader<HStoreKey,MapWritable> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Get the RecordReader for the given InputSplit.
 InputSplit[] getSplits(JobConf job, int numSplits)
          A split will be created for each HRegion of the input table.
 void validateInput(JobConf job)
          Check for validity of the input-specification for the job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COLUMN_LIST

public static final String COLUMN_LIST
Space-delimited list of columns.

See Also:
for column name wildcards, Constant Field Values
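Because the value is plain whitespace-separated text, splitting it is straightforward. The following self-contained sketch only illustrates the expected format of the list (the column names are made-up examples, and this is not TableInputFormat's internal parsing code):

```java
public class ColumnListDemo {
    // Split a space-delimited column list into individual column names.
    public static String[] parseColumns(String columnList) {
        return columnList.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical column names in the documented space-delimited format.
        String[] cols = parseColumns("anchor: info:regioninfo");
        System.out.println(cols.length); // 2
        System.out.println(cols[0]);     // anchor:
    }
}
```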
Constructor Detail

TableInputFormat

public TableInputFormat()
Method Detail

getRecordReader

public RecordReader<HStoreKey,MapWritable> getRecordReader(InputSplit split,
                                                           JobConf job,
                                                           Reporter reporter)
                                                    throws IOException
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.

Specified by:
getRecordReader in interface InputFormat<HStoreKey,MapWritable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - used to report progress
Returns:
a RecordReader
Throws:
IOException
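The record-oriented contract described above can be illustrated with a tiny, self-contained reader. MiniRecordReader below is a made-up stand-in, not an HBase or Hadoop class; it simply presents a record-at-a-time view of its "split":

```java
import java.util.Iterator;
import java.util.List;

public class MiniRecordReader implements Iterator<String> {
    // Made-up minimal reader: each next() yields exactly one whole record,
    // so the consuming task never sees a partial record.
    private final Iterator<String> rows;

    public MiniRecordReader(List<String> split) {
        this.rows = split.iterator();
    }

    @Override public boolean hasNext() { return rows.hasNext(); }
    @Override public String next()    { return rows.next(); }

    public static void main(String[] args) {
        MiniRecordReader r = new MiniRecordReader(List.of("row1", "row2"));
        int count = 0;
        while (r.hasNext()) { r.next(); count++; }
        System.out.println(count); // 2
    }
}
```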

getSplits

public InputSplit[] getSplits(JobConf job,
                              int numSplits)
                       throws IOException
A split will be created for each HRegion of the input table.

Specified by:
getSplits in interface InputFormat<HStoreKey,MapWritable>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
See Also:
InputFormat.getSplits(org.apache.hadoop.mapred.JobConf, int)
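The one-split-per-region behavior can be sketched in isolation. RegionSpan below is a made-up stand-in for an HRegion's key range, not an HBase type; the point is that the number of splits tracks the number of regions, while numSplits remains only a hint:

```java
import java.util.ArrayList;
import java.util.List;

public class RegionSplitDemo {
    // Made-up stand-in for a region's [startKey, endKey) range.
    public record RegionSpan(String startKey, String endKey) {}

    // One "split" per region, mirroring getSplits' documented behavior.
    // numSplits is a hint only and is ignored by region-based splitting.
    public static List<RegionSpan> splitsForRegions(List<String> regionStartKeys, int numSplits) {
        List<RegionSpan> splits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            // The last region's end key is empty, meaning "to end of table".
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : "";
            splits.add(new RegionSpan(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<RegionSpan> s = splitsForRegions(List.of("", "m", "t"), 10);
        System.out.println(s.size()); // 3: one split per region, hint ignored
    }
}
```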

configure

public void configure(JobConf job)
Initializes this instance from the given JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration

validateInput

public void validateInput(JobConf job)
                   throws IOException
Check for validity of the input-specification for the job.

This method is used to validate the input specification when a job is submitted so that the JobClient can fail early, with a useful error message, in case of errors, e.g. when an input directory does not exist.

Specified by:
validateInput in interface InputFormat<HStoreKey,MapWritable>
Parameters:
job - job configuration.
Throws:
InvalidInputException - if the job does not have valid input
IOException
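The fail-early pattern that validateInput implements can be sketched in isolation. The check below, requiring a non-empty column list, is a plausible example of such a precondition rather than TableInputFormat's actual code, and the configuration key name is hypothetical:

```java
import java.io.IOException;
import java.util.Map;

public class ValidateInputDemo {
    // Fail early, with a useful message, before any tasks are launched.
    // "hbase.mapred.tablecolumns" is a hypothetical key for this sketch.
    public static void validateInput(Map<String, String> conf) throws IOException {
        String columns = conf.get("hbase.mapred.tablecolumns");
        if (columns == null || columns.trim().isEmpty()) {
            throw new IOException("expecting at least one column");
        }
    }

    public static void main(String[] args) throws IOException {
        validateInput(Map.of("hbase.mapred.tablecolumns", "info:"));
        System.out.println("input OK");
    }
}
```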


Copyright © 2006 The Apache Software Foundation