To use HBase as a MapReduce source, the job would be configured via TableMapReduceUtil in the following manner...
Job job = ...;
Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// Now set other scan attrs
...
TableMapReduceUtil.initTableMapperJob(
  tableName,           // input HBase table name
  scan,                // Scan instance to control CF and attribute selection
  MyMapper.class,      // mapper
  Text.class,          // mapper output key
  LongWritable.class,  // mapper output value
  job);                // job instance
...and the mapper instance would extend TableMapper...
public class MyMapper extends TableMapper<Text, LongWritable> {

  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws InterruptedException, IOException {
    // process data for the row from the Result instance.
  }
}