org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class InputSizeReducerEstimator
java.lang.Object
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
- All Implemented Interfaces:
- PigReducerEstimator
public class InputSizeReducerEstimator
- extends Object
- implements PigReducerEstimator
Class that estimates the number of reducers based on input size.
Number of reducers is based on two properties:
- pig.exec.reducers.bytes.per.reducer -
how many bytes of input per reducer (default is 1000*1000*1000)
- pig.exec.reducers.max -
constrains the maximum number of reducer tasks (default is 999)
If the loader implements LoadMetadata, the input size it reports is used; otherwise
an attempt is made to determine the size from the filesystem.
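Both properties can be overridden per script with Pig's SET command; the values below are illustrative, not recommendations:

```
SET pig.exec.reducers.bytes.per.reducer 500000000;
SET pig.exec.reducers.max 100;
```

With these settings, one reducer is requested per 500 MB of input, up to at most 100 reducers.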
For example, given the following Pig script:
a = load '/data/a';
b = load '/data/b';
c = join a by $0, b by $0;
store c into '/tmp';
if the size of /data/a is 1000*1000*1000 and the size of /data/b is
2*1000*1000*1000, then the estimated number of reducers to use will be
(1000*1000*1000 + 2*1000*1000*1000) / (1000*1000*1000) = 3.
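The arithmetic above can be sketched in Java as follows. This is a minimal illustration of the documented behavior (ceiling division of total input bytes by bytes-per-reducer, capped at the maximum), not the actual Pig implementation; the class and method names are hypothetical.

```java
// Sketch of the reducer estimation arithmetic described above.
// Assumption: Pig rounds up and clamps the result to [1, maxReducers].
public class ReducerEstimateSketch {
    // Defaults documented for pig.exec.reducers.bytes.per.reducer
    // and pig.exec.reducers.max.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // Ceiling division: one reducer per bytesPerReducer of input.
        long n = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        n = Math.max(1, n);                    // always use at least one reducer
        return (int) Math.min(n, maxReducers); // never exceed the configured cap
    }

    public static void main(String[] args) {
        // /data/a = 1 GB, /data/b = 2 GB -> (1 GB + 2 GB) / 1 GB = 3 reducers
        long total = 1000L * 1000 * 1000 + 2L * 1000 * 1000 * 1000;
        System.out.println(estimate(total, DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS));
    }
}
```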
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
InputSizeReducerEstimator
public InputSizeReducerEstimator()
estimateNumberOfReducers
public int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job,
MapReduceOper mapReduceOper)
throws IOException
- Determines the number of reducers to be used.
- Specified by:
estimateNumberOfReducers
in interface PigReducerEstimator
- Parameters:
job - job instance
mapReduceOper -
- Returns:
- the number of reducers to use, or -1 if the count couldn't be estimated
- Throws:
IOException
Copyright © 2007-2012 The Apache Software Foundation