org.apache.hadoop.hbase.mapreduce
Class TableMapReduceUtil

java.lang.Object
  extended by org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil

public class TableMapReduceUtil
extends Object

Utility methods for setting up TableMapper and TableReducer MapReduce jobs.


Constructor Summary
TableMapReduceUtil()
           
 
Method Summary
static void addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class... classes)
          Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.
static void addDependencyJars(org.apache.hadoop.mapreduce.Job job)
          Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<? extends org.apache.hadoop.io.WritableComparable> outputKeyClass, Class<? extends org.apache.hadoop.io.Writable> outputValueClass, org.apache.hadoop.mapreduce.Job job)
          Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<? extends org.apache.hadoop.io.WritableComparable> outputKeyClass, Class<? extends org.apache.hadoop.io.Writable> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
          Use this before submitting a TableMap job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job)
          Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner)
          Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)
          Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars)
          Use this before submitting a TableReduce job.
static void limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
          Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.
static void setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
          Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
static void setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)
          Sets the number of rows to return and cache with each scanner iteration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TableMapReduceUtil

public TableMapReduceUtil()

Method Detail

initTableMapperJob

public static void initTableMapperJob(String table,
                                      Scan scan,
                                      Class<? extends TableMapper> mapper,
                                      Class<? extends org.apache.hadoop.io.WritableComparable> outputKeyClass,
                                      Class<? extends org.apache.hadoop.io.Writable> outputValueClass,
                                      org.apache.hadoop.mapreduce.Job job)
                               throws IOException
Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
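
For example, a minimal map-only job wired up with this method. This is a sketch, not canonical usage: the table name "mytable", family "cf", and class names are placeholders, and the new Job(conf, name) constructor matches the Hadoop 0.20-era API this release builds against.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  public class RowCountSetup {

    // Emits one count per scanned row.
    static class CountMapper extends TableMapper<Text, IntWritable> {
      private static final Text KEY = new Text("rows");
      private static final IntWritable ONE = new IntWritable(1);

      @Override
      protected void map(ImmutableBytesWritable row, Result value, Context context)
          throws IOException, InterruptedException {
        context.write(KEY, ONE);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
      Job job = new Job(conf, "example-table-map");
      job.setJarByClass(RowCountSetup.class);

      Scan scan = new Scan();
      scan.addFamily(Bytes.toBytes("cf")); // restrict the scan to one family

      // One call wires up TableInputFormat, the mapper, and the output types.
      TableMapReduceUtil.initTableMapperJob("mytable", scan, CountMapper.class,
          Text.class, IntWritable.class, job);

      job.setNumReduceTasks(0); // map-only in this sketch
      job.setOutputFormatClass(NullOutputFormat.class);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }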

initTableMapperJob

public static void initTableMapperJob(String table,
                                      Scan scan,
                                      Class<? extends TableMapper> mapper,
                                      Class<? extends org.apache.hadoop.io.WritableComparable> outputKeyClass,
                                      Class<? extends org.apache.hadoop.io.Writable> outputValueClass,
                                      org.apache.hadoop.mapreduce.Job job,
                                      boolean addDependencyJars)
                               throws IOException
Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
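
Continuing the sketch above, passing false skips shipping jars via the distributed cache, for instance when every node already has the HBase and job jars on its classpath:

  // Same wiring as before, but do not populate tmpjars; assumes the
  // cluster nodes already carry the required jars locally.
  TableMapReduceUtil.initTableMapperJob("mytable", scan, CountMapper.class,
      Text.class, IntWritable.class, job, false);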

initTableReducerJob

public static void initTableReducerJob(String table,
                                       Class<? extends TableReducer> reducer,
                                       org.apache.hadoop.mapreduce.Job job)
                                throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the job.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
Throws:
IOException - When determining the region count fails.
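
For example, a sketch of the reduce side; the table name "target_table", family "cf", and class names are placeholders, and the mapper setup is elided:

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableReducer;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;

  public class SumReducerSetup {

    // Sums the integer values for each key and writes one Put per key.
    static class SumReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("sum"), Bytes.toBytes(sum));
        context.write(new ImmutableBytesWritable(put.getRow()), put);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "example-table-reduce");
      job.setJarByClass(SumReducerSetup.class);
      // ... mapper setup elided; see initTableMapperJob above ...

      // Wires up TableOutputFormat against the target table and sets the reducer.
      TableMapReduceUtil.initTableReducerJob("target_table", SumReducer.class, job);
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }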

initTableReducerJob

public static void initTableReducerJob(String table,
                                       Class<? extends TableReducer> reducer,
                                       org.apache.hadoop.mapreduce.Job job,
                                       Class partitioner)
                                throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the job.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
partitioner - Partitioner to use. Pass null to use default partitioner.
Throws:
IOException - When determining the region count fails.
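
Continuing the reducer sketch above, the bundled HRegionPartitioner can be passed here so each reduce partition corresponds to one region of the output table:

  import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;

  // Route each output row to the reducer responsible for the region that
  // will host it; pass null instead to keep Hadoop's default partitioner.
  TableMapReduceUtil.initTableReducerJob("target_table", SumReducer.class, job,
      HRegionPartitioner.class);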

initTableReducerJob

public static void initTableReducerJob(String table,
                                       Class<? extends TableReducer> reducer,
                                       org.apache.hadoop.mapreduce.Job job,
                                       Class partitioner,
                                       String quorumAddress,
                                       String serverClass,
                                       String serverImpl)
                                throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the job.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when the reduce should write to a cluster other than the default, e.g. when copying tables between clusters: the source is designated by hbase-site.xml and this param carries the ensemble address of the remote target cluster. The format is particular: pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
Throws:
IOException - When determining the region count fails.
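
A sketch of cross-cluster output, continuing the reducer example above; the ensemble host names are placeholders:

  // Write the reduce output to a remote cluster instead of the one in
  // hbase-site.xml, e.g. when copying a table between clusters. The string
  // follows <quorum>:<client port>:<znode parent> as described above.
  TableMapReduceUtil.initTableReducerJob("target_table", SumReducer.class, job,
      null, "zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase",
      null, null);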

initTableReducerJob

public static void initTableReducerJob(String table,
                                       Class<? extends TableReducer> reducer,
                                       org.apache.hadoop.mapreduce.Job job,
                                       Class partitioner,
                                       String quorumAddress,
                                       String serverClass,
                                       String serverImpl,
                                       boolean addDependencyJars)
                                throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the job.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when the reduce should write to a cluster other than the default, e.g. when copying tables between clusters: the source is designated by hbase-site.xml and this param carries the ensemble address of the remote target cluster. The format is particular: pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When determining the region count fails.
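
The same call with jar shipping disabled, continuing the sketch above:

  // As before, plus skip populating tmpjars; assumes each node already
  // has the HBase and job jars installed locally.
  TableMapReduceUtil.initTableReducerJob("target_table", SumReducer.class, job,
      null, "zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase",
      null, null, false);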

limitNumReduceTasks

public static void limitNumReduceTasks(String table,
                                       org.apache.hadoop.mapreduce.Job job)
                                throws IOException
Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
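
For example, continuing the reducer sketch (32 is an arbitrary figure): if the table has fewer than 32 regions, the call trims the count down so no reducer sits idle.

  job.setNumReduceTasks(32);
  // If "target_table" has fewer than 32 regions, lower the count to match.
  TableMapReduceUtil.limitNumReduceTasks("target_table", job);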

setNumReduceTasks

public static void setNumReduceTasks(String table,
                                     org.apache.hadoop.mapreduce.Job job)
                              throws IOException
Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
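
Continuing the sketch, this sets the count outright rather than capping it:

  // Run exactly one reducer per region of the output table.
  TableMapReduceUtil.setNumReduceTasks("target_table", job);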

setScannerCaching

public static void setScannerCaching(org.apache.hadoop.mapreduce.Job job,
                                     int batchSize)
Sets the number of rows to return and cache with each scanner iteration. Higher caching values will enable faster mapreduce jobs at the expense of requiring more heap to contain the cached rows.

Parameters:
job - The current job to adjust.
batchSize - The number of rows to fetch and cache per scanner iteration.
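
A brief example, continuing the sketches above; 500 is an illustrative figure, not a recommendation:

  // Fetch 500 rows per round trip to the region servers; larger values mean
  // fewer RPCs per map task at the cost of more heap for the cached rows.
  TableMapReduceUtil.setScannerCaching(job, 500);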

addDependencyJars

public static void addDependencyJars(org.apache.hadoop.mapreduce.Job job)
                              throws IOException
Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
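
For example, on a job whose mapper, reducer, and formats are already configured:

  // Populate tmpjars with the HBase jars plus the jars containing the
  // job's configured mapper, reducer, input/output format, and key/value
  // classes. The init* methods above do this for you unless passed
  // addDependencyJars=false.
  TableMapReduceUtil.addDependencyJars(job);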

addDependencyJars

public static void addDependencyJars(org.apache.hadoop.conf.Configuration conf,
                                     Class... classes)
                              throws IOException
Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
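
A sketch of shipping specific jars; MyCustomFilter is a hypothetical placeholder, not a real HBase class:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;

  // Ship the jars that contain these specific classes, e.g. a custom
  // filter or a third-party library used inside the mapper.
  Configuration conf = HBaseConfiguration.create();
  TableMapReduceUtil.addDependencyJars(conf, MyCustomFilter.class);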


Copyright © 2011 The Apache Software Foundation. All Rights Reserved.