7.2. HBase Input MapReduce Example

To use HBase as a MapReduce source, the job would be configured via TableMapReduceUtil in the following manner...

Job job = ...;	
Scan scan = new Scan();
scan.setCaching(500);  // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  
// Now set other scan attrs
...
  
TableMapReduceUtil.initTableMapperJob(
  tableName,   		// input HBase table name
  scan, 			// Scan instance to control CF and attribute selection
  MyMapper.class,		// mapper
  Text.class,		// reducer key 
  LongWritable.class,	// reducer value
  job			// job instance
  );

...and the mapper instance would extend TableMapper...

public class MyMapper extends TableMapper<Text, LongWritable> {
public void map(ImmutableBytesWritable row, Result value, Context context) 
throws InterruptedException, IOException {
// process data for the row from the Result instance.