|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectorg.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
public abstract class TableInputFormatBase
A base for TableInputFormat
s. Receives a HTable
, an
Scan
instance that defines the input columns etc. Subclasses may use
other TableRecordReader implementations.
An example of a subclass:
class ExampleTIF extends TableInputFormatBase implements JobConfigurable { public void configure(JobConf job) { HTable exampleTable = new HTable(HBaseConfiguration.create(job), Bytes.toBytes("exampleTable")); // mandatory setHTable(exampleTable); Text[] inputColumns = new byte [][] { Bytes.toBytes("columnA"), Bytes.toBytes("columnB") }; // mandatory setInputColumns(inputColumns); RowFilterInterface exampleFilter = new RegExpRowFilter("keyPrefix.*"); // optional setRowFilter(exampleFilter); } public void validateInput(JobConf job) throws IOException { } }
Constructor Summary | |
---|---|
TableInputFormatBase()
|
Method Summary | |
---|---|
org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> |
createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
org.apache.hadoop.mapreduce.TaskAttemptContext context)
Builds a TableRecordReader. |
protected HTable |
getHTable()
Allows subclasses to get the HTable . |
Scan |
getScan()
Gets the scan defining the actual details like columns etc. |
List<org.apache.hadoop.mapreduce.InputSplit> |
getSplits(org.apache.hadoop.mapreduce.JobContext context)
Calculates the splits that will serve as input for the map tasks. |
protected boolean |
includeRegionInSplit(byte[] startKey,
byte[] endKey)
Test if the given region is to be included in the InputSplit while splitting the regions of a table. |
protected void |
setHTable(HTable table)
Allows subclasses to set the HTable . |
void |
setScan(Scan scan)
Sets the scan defining the actual details like columns etc. |
protected void |
setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader . |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TableInputFormatBase()
Method Detail |
---|
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
createRecordReader
in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
split
- The split to work with.context
- The current context.
IOException
- When creating the reader fails.InputFormat.createRecordReader(
org.apache.hadoop.mapreduce.InputSplit,
org.apache.hadoop.mapreduce.TaskAttemptContext)
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
getSplits
in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
context
- The current job context.
IOException
- When creating the list of splits fails.InputFormat.getSplits(
org.apache.hadoop.mapreduce.JobContext)
protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job,
(and hence, not contributing to the InputSplit), given the start and end keys of the same.
Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing,
continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys.
Note: It is possible that endKey.length() == 0
, for the last (recent) region.
Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included).
startKey
- Start key of the regionendKey
- End key of the region
protected HTable getHTable()
HTable
.
protected void setHTable(HTable table)
HTable
.
table
- The table to get the data from.public Scan getScan()
public void setScan(Scan scan)
scan
- The scan to set.protected void setTableRecordReader(TableRecordReader tableRecordReader)
TableRecordReader
.
tableRecordReader
- A different TableRecordReader
implementation.
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |