org.apache.hadoop.hbase.mapreduce
Class TableInputFormatBase

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
      extended by org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
Direct Known Subclasses:
TableInputFormat

public abstract class TableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

A base for TableInputFormats. Receives an HTable and a Scan instance that defines the input columns, etc. Subclasses may use other TableRecordReader implementations.

An example of a subclass:

   class ExampleTIF extends TableInputFormatBase implements JobConfigurable {

     public void configure(JobConf job) {
       HTable exampleTable = new HTable(new HBaseConfiguration(job),
         Bytes.toBytes("exampleTable"));
       // mandatory
       setHTable(exampleTable);
       byte[][] inputColumns = new byte[][] { Bytes.toBytes("columnA"),
         Bytes.toBytes("columnB") };
       Scan scan = new Scan();
       for (byte[] family : inputColumns) {
         scan.addFamily(family);
       }
       // optional
       Filter exampleFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
         new RegexStringComparator("keyPrefix.*"));
       scan.setFilter(exampleFilter);
       // mandatory
       setScan(scan);
     }
  }
 

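In the common case no subclass is needed at all: the concrete TableInputFormat can be wired into a job through TableMapReduceUtil. The driver sketch below uses the shipped IdentityTableMapper; the table name "exampleTable" and column family "columnA" are placeholders, and running it requires a live HBase cluster, so treat it as a job-configuration sketch rather than a tested program.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class ExampleDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "example-table-scan");
    // The Scan carries the input details: families, columns, filters, key ranges.
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("columnA")); // placeholder family
    // Wires TableInputFormat, the table name, the serialized Scan, and the
    // map output types into the job configuration.
    TableMapReduceUtil.initTableMapperJob("exampleTable", scan,
        IdentityTableMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setNumReduceTasks(0); // map-only scan
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```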

Nested Class Summary
protected  class TableInputFormatBase.TableRecordReader
          Iterates over HBase table data and returns (ImmutableBytesWritable, Result) pairs.
 
Constructor Summary
TableInputFormatBase()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Builds a TableRecordReader.
protected  HTable getHTable()
          Allows subclasses to get the HTable.
 Scan getScan()
          Gets the Scan that defines the input details, such as the columns to read.
 List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
          Calculates the splits that will serve as input for the map tasks.
protected  void setHTable(HTable table)
          Allows subclasses to set the HTable.
 void setScan(Scan scan)
          Sets the Scan that defines the input details, such as the columns to read.
protected  void setTableRecordReader(TableInputFormatBase.TableRecordReader tableRecordReader)
          Allows subclasses to set the TableInputFormatBase.TableRecordReader.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TableInputFormatBase

public TableInputFormatBase()
Method Detail

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                  org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                           throws IOException
Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Returns:
The newly created record reader.
Throws:
IOException - When creating the reader fails.
See Also:
InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)

getSplits

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException
Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in the table.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Returns:
The list of input splits.
Throws:
IOException - When creating the list of splits fails.
See Also:
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
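The region-to-split mapping can be illustrated with a small, self-contained sketch. This is plain Java using String keys instead of byte[] (a simplification, not HBase code): each region contributes one split, with its key range trimmed to the scan's [startRow, stopRow) interval, and an empty string stands for an unbounded key.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // One split per region that overlaps the scan range; each split's
    // boundaries are the intersection of the region's and the scan's ranges.
    static List<String[]> calculateSplits(String[][] regions,
                                          String scanStart, String scanStop) {
        List<String[]> splits = new ArrayList<String[]>();
        for (String[] region : regions) {
            String regionStart = region[0]; // "" = unbounded low end
            String regionEnd = region[1];   // "" = unbounded high end
            // Skip regions entirely outside the scan range.
            if (!scanStop.isEmpty() && !regionStart.isEmpty()
                    && regionStart.compareTo(scanStop) >= 0) continue;
            if (!scanStart.isEmpty() && !regionEnd.isEmpty()
                    && regionEnd.compareTo(scanStart) <= 0) continue;
            // Trim the region's range to the scan's range.
            String start = (regionStart.isEmpty()
                    || scanStart.compareTo(regionStart) > 0) ? scanStart : regionStart;
            String stop = (regionEnd.isEmpty()
                    || (!scanStop.isEmpty() && scanStop.compareTo(regionEnd) < 0))
                    ? scanStop : regionEnd;
            splits.add(new String[] { start, stop });
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three regions: (-inf, "m"), ["m", "t"), ["t", +inf)
        String[][] regions = { { "", "m" }, { "m", "t" }, { "t", "" } };
        // Full-table scan: one split per region.
        System.out.println(calculateSplits(regions, "", "").size());       // 3
        // Scan limited to ["k", "p"): only the first two regions overlap.
        List<String[]> trimmed = calculateSplits(regions, "k", "p");
        System.out.println(trimmed.size());                                // 2
        System.out.println(trimmed.get(0)[0] + "," + trimmed.get(0)[1]);   // k,m
        System.out.println(trimmed.get(1)[0] + "," + trimmed.get(1)[1]);   // m,p
    }
}
```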

getHTable

protected HTable getHTable()
Allows subclasses to get the HTable.


setHTable

protected void setHTable(HTable table)
Allows subclasses to set the HTable.

Parameters:
table - The table to get the data from.

getScan

public Scan getScan()
Gets the Scan that defines the input details, such as the columns to read.

Returns:
The internal scan instance.

setScan

public void setScan(Scan scan)
Sets the Scan that defines the input details, such as the columns to read.

Parameters:
scan - The scan to set.

setTableRecordReader

protected void setTableRecordReader(TableInputFormatBase.TableRecordReader tableRecordReader)
Allows subclasses to set the TableInputFormatBase.TableRecordReader.

Parameters:
tableRecordReader - A different TableInputFormatBase.TableRecordReader implementation.


Copyright © 2010 The Apache Software Foundation