java.lang.Object
  org.apache.pig.impl.plan.PlanVisitor<PhysicalOperator,PhysicalPlan>
    org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhyPlanVisitor
      org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
public class MRCompiler
The compiler that compiles a given physical plan into a DAG of MapReduce operators, which can then be converted into the JobControl structure. It is implemented as a visitor of the PhysicalPlan it is compiling. Currently supports all operators except the MR Sort operator.
Uses a predecessor-based depth-first traversal. To compile an operator, it first compiles the predecessors into MapReduce operators and then tries to merge the current operator into one of them, the goal being to keep the number of MROpers to a minimum.
It also merges multiple map jobs, created by compiling the inputs individually, into a single job. In that case a new map job is created and the contents of the previous map plans are added to it. Any other state held in the previous map plans must be moved over manually, so take care of this when adding something new; requestedParallelism is an example.
A new MapReduce operator is started only for blocking operators and splits, using a store-load combination to connect the two operators. Whenever this happens, care is taken to add the MROper to the MRPlan and connect it appropriately.
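The traversal described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the MRCompiler implementation: `Op` and `Job` are hypothetical stand-ins for physical operators and MapReduce operators, the plan is assumed to be tree-shaped, and the map/reduce phase distinction and the store-load pair at a job boundary are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PredecessorFirstCompileSketch {

    /** Hypothetical stand-in for a physical operator; not a Pig class. */
    static final class Op {
        final String name;
        final boolean blocking;      // e.g. a sort: forces a new job
        final List<Op> predecessors;

        Op(String name, boolean blocking, Op... predecessors) {
            this.name = name;
            this.blocking = blocking;
            this.predecessors = Arrays.asList(predecessors);
        }
    }

    /** Hypothetical stand-in for a MapReduce operator: the operator names it holds. */
    static final class Job {
        final List<String> ops = new ArrayList<>();
    }

    private final List<Job> jobs = new ArrayList<>();

    /** Compiles the predecessors first, then places op; returns the job that holds it. */
    Job compile(Op op) {
        List<Job> predJobs = new ArrayList<>();
        for (Op pred : op.predecessors) {
            predJobs.add(compile(pred));
        }
        Job target;
        if (predJobs.size() == 1 && !op.blocking) {
            // Merge the operator into its single predecessor job.
            target = predJobs.get(0);
        } else {
            // Leaf, blocking operator, or several inputs: start a new job.
            target = new Job();
            jobs.add(target);
            if (!op.blocking) {
                // Several inputs: fold the previous jobs into the new one.
                for (Job old : predJobs) {
                    target.ops.addAll(old.ops);
                    jobs.remove(old);
                }
            }
        }
        target.ops.add(op.name);
        return target;
    }

    /** Builds a small plan and returns the operator names grouped by job. */
    public static List<List<String>> demo() {
        Op loadA = new Op("loadA", false);
        Op filter = new Op("filter", false, loadA);
        Op loadB = new Op("loadB", false);
        Op union = new Op("union", false, filter, loadB);
        Op sort = new Op("sort", true, union);
        Op store = new Op("store", false, sort);

        PredecessorFirstCompileSketch compiler = new PredecessorFirstCompileSketch();
        compiler.compile(store);

        List<List<String>> result = new ArrayList<>();
        for (Job job : compiler.jobs) {
            result.add(job.ops);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[loadA, filter, loadB, union], [sort, store]]
        System.out.println(demo());
    }
}
```

In this toy plan the two load branches end up in one job via the union, and the blocking sort opens the second job, so six operators compile into two jobs.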
Field Summary | |
---|---|
static String | USER_COMPARATOR_MARKER |
Fields inherited from class org.apache.pig.impl.plan.PlanVisitor |
---|
mCurrentWalker, mPlan |
Constructor Summary | |
---|---|
MRCompiler(PhysicalPlan plan) | |
MRCompiler(PhysicalPlan plan, PigContext pigContext) | |
Method Summary | | |
---|---|---|
MROperPlan | compile() | The front-end method that the user calls to compile the plan. |
void | connectMapToReduceLimitedSort(MapReduceOper mro, MapReduceOper sortMROp) | |
CompilationMessageCollector | getMessageCollector() | |
MROperPlan | getMRPlan() | Used to get the compiled plan. |
POForEach | getPlainForEachOP() | |
PhysicalPlan | getPlan() | Used to get the plan that was compiled. |
Pair<MapReduceOper,Integer> | getQuantileJob(POSort inpSort, MapReduceOper prevJob, FileSpec lFile, FileSpec quantFile, int rp, Pair<Integer,Byte>[] fields) | |
protected Pair<MapReduceOper,Integer> | getSamplingJob(POSort sort, MapReduceOper prevJob, List<PhysicalPlan> transformPlans, FileSpec lFile, FileSpec sampleFile, int rp, List<PhysicalPlan> sortKeyPlans, String udfClassName, String[] udfArgs, String sampleLdrClassName) | Create a sampling job to collect statistics by sampling an input file. |
Pair<MapReduceOper,Integer> | getSkewedJoinSampleJob(POSkewedJoin op, MapReduceOper prevJob, FileSpec lFile, FileSpec sampleFile, int rp) | Create a sampling job for skewed join. |
MapReduceOper | getSortJob(POSort sort, MapReduceOper quantJob, FileSpec lFile, FileSpec quantFile, int rp, Pair<Integer,Byte>[] fields) | |
void | randomizeFileLocalizer() | |
void | simpleConnectMapToReduce(MapReduceOper mro) | |
void | visitDistinct(PODistinct op) | |
void | visitFilter(POFilter op) | |
void | visitFRJoin(POFRJoin op) | This operator has multiple inputs (equal to the number of join inputs). The compiler prunes off all inputs except the fragment input, creates a separate MR job for each replicated input, and uses the outputs of these jobs as the replicated files configured in the POFRJoin operator. |
void | visitGlobalRearrange(POGlobalRearrange op) | |
void | visitLimit(POLimit op) | |
void | visitLoad(POLoad op) | |
void | visitLocalRearrange(POLocalRearrange op) | |
void | visitMergeJoin(POMergeJoin joinOp) | Since merge-join works on two inputs, there are exactly two MROper predecessors, identified as left and right. |
void | visitPackage(POPackage op) | |
void | visitPOForEach(POForEach op) | |
void | visitSkewedJoin(POSkewedJoin op) | |
void | visitSort(POSort op) | |
void | visitSplit(POSplit op) | Compiles a split operator. |
void | visitStore(POStore op) | |
void | visitStream(POStream op) | |
void | visitUnion(POUnion op) | |
Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhyPlanVisitor |
---|
visitAdd, visitAnd, visitBinCond, visitCast, visitCogroup, visitCombinerPackage, visitComparisonFunc, visitConstant, visitCross, visitDemux, visitDivide, visitEqualTo, visitGreaterThan, visitGTOrEqual, visitIsNull, visitJoinPackage, visitLessThan, visitLocalRearrangeForIllustrate, visitLTOrEqual, visitMapLookUp, visitMod, visitMultiply, visitMultiQueryPackage, visitNegative, visitNot, visitNotEqualTo, visitOr, visitPartitionRearrange, visitPOOptimizedForEach, visitPreCombinerLocalRearrange, visitProject, visitRead, visitRegexp, visitSplit, visitSubtract, visitUserFunc |
Methods inherited from class org.apache.pig.impl.plan.PlanVisitor |
---|
popWalker, pushWalker, visit |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static String USER_COMPARATOR_MARKER
Constructor Detail |
---|
public MRCompiler(PhysicalPlan plan) throws MRCompilerException
Throws: MRCompilerException
public MRCompiler(PhysicalPlan plan, PigContext pigContext) throws MRCompilerException
Throws: MRCompilerException
Method Detail |
---|
public void randomizeFileLocalizer()
public MROperPlan getMRPlan()
public PhysicalPlan getPlan()
Overrides: getPlan in class PlanVisitor<PhysicalOperator,PhysicalPlan>
public CompilationMessageCollector getMessageCollector()
public MROperPlan compile() throws IOException, PlanException, VisitorException
Throws: IOException, PlanException, VisitorException
public void visitSplit(POSplit op) throws VisitorException
Overrides: visitSplit in class PhyPlanVisitor
Parameters: op - The split operator
Throws: VisitorException
public void visitLoad(POLoad op) throws VisitorException
Overrides: visitLoad in class PhyPlanVisitor
Throws: VisitorException
public void visitStore(POStore op) throws VisitorException
Overrides: visitStore in class PhyPlanVisitor
Throws: VisitorException
public void visitFilter(POFilter op) throws VisitorException
Overrides: visitFilter in class PhyPlanVisitor
Throws: VisitorException
public void visitStream(POStream op) throws VisitorException
Overrides: visitStream in class PhyPlanVisitor
Throws: VisitorException
public void connectMapToReduceLimitedSort(MapReduceOper mro, MapReduceOper sortMROp) throws PlanException, VisitorException
Throws: PlanException, VisitorException
public void simpleConnectMapToReduce(MapReduceOper mro) throws PlanException
Throws: PlanException
public POForEach getPlainForEachOP()
public void visitLimit(POLimit op) throws VisitorException
Overrides: visitLimit in class PhyPlanVisitor
Throws: VisitorException
public void visitLocalRearrange(POLocalRearrange op) throws VisitorException
Overrides: visitLocalRearrange in class PhyPlanVisitor
Throws: VisitorException
public void visitPOForEach(POForEach op) throws VisitorException
Overrides: visitPOForEach in class PhyPlanVisitor
Throws: VisitorException
public void visitGlobalRearrange(POGlobalRearrange op) throws VisitorException
Overrides: visitGlobalRearrange in class PhyPlanVisitor
Throws: VisitorException
public void visitPackage(POPackage op) throws VisitorException
Overrides: visitPackage in class PhyPlanVisitor
Throws: VisitorException
public void visitUnion(POUnion op) throws VisitorException
Overrides: visitUnion in class PhyPlanVisitor
Throws: VisitorException
public void visitFRJoin(POFRJoin op) throws VisitorException
Overrides: visitFRJoin in class PhyPlanVisitor
Throws: VisitorException
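The fragment-replicate idea behind visitFRJoin can be pictured with a tiny in-memory model. The sketch below assumes the usual replicated-join technique: the replicated input is small enough to load into a hash table, and the fragment input is streamed past it. That is an assumption about the runtime behaviour, not something this page states; the class and method names are hypothetical and rows are modelled as `{key, value}` string pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FragmentReplicateJoinSketch {

    /**
     * Joins rows of the form {key, value}.
     * Returns one "key,fragmentValue,replicatedValue" string per matching pair.
     */
    public static List<String> join(List<String[]> fragment, List<String[]> replicated) {
        // Build phase: index the (assumed small) replicated input by key.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : replicated) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        // Probe phase: stream the fragment input and look up each key.
        List<String> out = new ArrayList<>();
        for (String[] row : fragment) {
            List<String> matches = table.getOrDefault(row[0], Collections.<String>emptyList());
            for (String match : matches) {
                out.add(row[0] + "," + row[1] + "," + match);
            }
        }
        return out;
    }

    /** Small fixed example used by main. */
    public static List<String> demo() {
        List<String[]> fragment = Arrays.asList(
                new String[] {"1", "a"}, new String[] {"2", "b"}, new String[] {"3", "c"});
        List<String[]> replicated = Arrays.asList(
                new String[] {"1", "x"}, new String[] {"3", "y"}, new String[] {"3", "z"});
        return join(fragment, replicated);
    }

    public static void main(String[] args) {
        // prints [1,a,x, 3,c,y, 3,c,z]
        System.out.println(demo());
    }
}
```

Because only the fragment input is streamed, no shuffle on the join key is needed in this model, which is consistent with the compiler keeping just the fragment input in the joining job.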
public void visitMergeJoin(POMergeJoin joinOp) throws VisitorException
Overrides: visitMergeJoin in class PhyPlanVisitor
Throws: VisitorException
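A merge join over a left and a right input can be sketched as a single forward pass over both. The example assumes both inputs are already sorted by key, which is the general premise of merge joins rather than something stated on this page. It does not reproduce how POMergeJoin reads its inputs; the class name and row layout (`{key, value}` string pairs) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {

    /**
     * Joins two inputs that are both sorted by key (String order).
     * Returns one "key,leftValue,rightValue" string per matching pair.
     */
    public static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String key = left.get(i)[0];
            int cmp = key.compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;                       // left key is smaller: advance left
            } else if (cmp > 0) {
                j++;                       // right key is smaller: advance right
            } else {
                // Find the end of the run of equal keys on the right.
                int runEnd = j;
                while (runEnd < right.size() && right.get(runEnd)[0].equals(key)) {
                    runEnd++;
                }
                // Pair every left row carrying this key with the whole right run.
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    for (int k = j; k < runEnd; k++) {
                        out.add(key + "," + left.get(i)[1] + "," + right.get(k)[1]);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }

    /** Small fixed example used by main. */
    public static List<String> demo() {
        List<String[]> left = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"b", "2"},
                new String[] {"b", "3"}, new String[] {"d", "4"});
        List<String[]> right = Arrays.asList(
                new String[] {"b", "x"}, new String[] {"b", "y"},
                new String[] {"c", "z"}, new String[] {"d", "w"});
        return join(left, right);
    }

    public static void main(String[] args) {
        // prints [b,2,x, b,2,y, b,3,x, b,3,y, d,4,w]
        System.out.println(demo());
    }
}
```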
public void visitDistinct(PODistinct op) throws VisitorException
Overrides: visitDistinct in class PhyPlanVisitor
Throws: VisitorException
public void visitSkewedJoin(POSkewedJoin op) throws VisitorException
Overrides: visitSkewedJoin in class PhyPlanVisitor
Throws: VisitorException
public void visitSort(POSort op) throws VisitorException
Overrides: visitSort in class PhyPlanVisitor
Throws: VisitorException
public MapReduceOper getSortJob(POSort sort, MapReduceOper quantJob, FileSpec lFile, FileSpec quantFile, int rp, Pair<Integer,Byte>[] fields) throws PlanException
Throws: PlanException
public Pair<MapReduceOper,Integer> getQuantileJob(POSort inpSort, MapReduceOper prevJob, FileSpec lFile, FileSpec quantFile, int rp, Pair<Integer,Byte>[] fields) throws PlanException, VisitorException
Throws: PlanException, VisitorException
public Pair<MapReduceOper,Integer> getSkewedJoinSampleJob(POSkewedJoin op, MapReduceOper prevJob, FileSpec lFile, FileSpec sampleFile, int rp) throws PlanException, VisitorException
Throws: PlanException, VisitorException
protected Pair<MapReduceOper,Integer> getSamplingJob(POSort sort, MapReduceOper prevJob, List<PhysicalPlan> transformPlans, FileSpec lFile, FileSpec sampleFile, int rp, List<PhysicalPlan> sortKeyPlans, String udfClassName, String[] udfArgs, String sampleLdrClassName) throws PlanException, VisitorException
Parameters:
sort - the POSort operator used to sort the bag
prevJob - previous job of current sampling job
transformPlans - PhysicalPlans to transform input samples
lFile - path of input file
sampleFile - path of output file
rp - configured parallelism
sortKeyPlans - PhysicalPlans to be set into POSort operator to get sorting keys
udfClassName - the class name of UDF
udfArgs - the arguments of UDF
sampleLdrClassName - class name for the sample loader
Throws: PlanException, VisitorException
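To make the purpose of a sampling job concrete, the sketch below shows one common way sampled keys can be turned into partition boundaries for a parallel sort with `rp` partitions. This page does not describe the statistics the sampling job actually computes, so the quantile approach is an assumption for illustration only. The class and method names are hypothetical and keys are modelled as integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class QuantilePartitionSketch {

    /** Picks rp - 1 cut points from a sample so that rp partitions get similar shares. */
    public static List<Integer> boundaries(List<Integer> sample, int rp) {
        if (sample.isEmpty() || rp < 1) {
            throw new IllegalArgumentException("need a non-empty sample and rp >= 1");
        }
        List<Integer> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<Integer> cuts = new ArrayList<>();
        for (int i = 1; i < rp; i++) {
            // Index of the i-th rp-quantile in the sorted sample.
            cuts.add(sorted.get(i * sorted.size() / rp));
        }
        return cuts;
    }

    /** Returns the partition index (0 .. cuts.size()) that a key falls into. */
    public static int partitionFor(List<Integer> cuts, int key) {
        int p = 0;
        while (p < cuts.size() && key >= cuts.get(p)) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(9, 1, 7, 3, 5, 2, 8, 4, 6);
        List<Integer> cuts = boundaries(sample, 3);
        System.out.println(cuts);                   // prints [4, 7]
        System.out.println(partitionFor(cuts, 3));  // prints 0
        System.out.println(partitionFor(cuts, 6));  // prints 1
        System.out.println(partitionFor(cuts, 7));  // prints 2
    }
}
```

With cut points chosen this way, each partition holds a contiguous key range, so concatenating the sorted partitions in order yields a globally sorted result.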