org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class CombinerOptimizer

java.lang.Object
  extended by org.apache.pig.impl.plan.PlanVisitor<MapReduceOper,MROperPlan>
      extended by org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.MROpPlanVisitor
          extended by org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer

public class CombinerOptimizer
extends MROpPlanVisitor

Optimize map reduce plans to use the combiner where possible. Currently, a foreach is copied to the combiner phase if it does not contain a nested plan and all UDFs in the generate statement are algebraic. The copy of the foreach in the combiner stage uses the initial function, and the copy in the reduce stage is changed to use the final function.

Major areas for enhancement:

1) Currently, scripts such as:

    B = group A by $0;
    C = foreach B { C1 = distinct A; generate group, COUNT(C1); }

do not use the combiner. The issue is properly decomposing the expression in the UDF's plan. The current code just takes whatever is the argument to the algebraic UDF and replaces it with a project. This works for things like:

    generate group, SUM(A.$1 + 1)

but it fails for scripts like the one above. Certain types of inner plans will never be movable (like filters), but distinct or order by in the inner plan should be movable. Things like:

    C = cogroup A by $0, B by $0;
    D = foreach C { D1 = distinct A; D2 = distinct B; generate UDF(D1 + D2); }

make it even harder. The first step is probably just to handle queries like the first one above, as they will probably be the most common.

2) Scripts such as:

    B = group A by $0;
    C = foreach B generate algebraic(A), nonalgebraic(A);

currently aren't moved into the combiner, even though they could be. Again, the trick here is properly decomposing the plan, since A may be more than a simple projection.

#2 should probably be the next area of focus.
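The split between the initial function in the combiner stage and the final function in the reduce stage can be illustrated with a plain-Java sketch of an algebraic COUNT. This is only a simulation under stated assumptions: the class AlgebraicCountSketch and the methods initial, finalStep and countWithCombiner are hypothetical names invented for illustration and are not part of Pig's UDF API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an algebraic COUNT; this is not Pig's UDF API.
class AlgebraicCountSketch {

    // Initial function: applied in the combiner stage to the raw tuples
    // that one map task produced for a group. Emits a partial count.
    static long initial(List<String> tuples) {
        return tuples.size();
    }

    // Final function: applied in the reduce stage. Its inputs are partial
    // counts, so it must sum them rather than count them.
    static long finalStep(List<Long> partialCounts) {
        long total = 0;
        for (long partial : partialCounts) {
            total += partial;
        }
        return total;
    }

    // Simulates one group whose tuples are spread over several map tasks.
    static long countWithCombiner(List<List<String>> tuplesPerMapTask) {
        List<Long> partials = new ArrayList<>();
        for (List<String> tuples : tuplesPerMapTask) {
            partials.add(initial(tuples));   // combiner stage
        }
        return finalStep(partials);          // reduce stage
    }

    public static void main(String[] args) {
        List<List<String>> tuplesPerMapTask = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f"));
        System.out.println(countWithCombiner(tuplesPerMapTask)); // prints 6
    }
}
```

The sketch shows why the reduce-stage foreach has to be rewritten: once the combiner has run, the reducer receives partial counts instead of tuples, so counting its inputs would return the number of map tasks, not the number of tuples.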


Field Summary
 
Fields inherited from class org.apache.pig.impl.plan.PlanVisitor
mCurrentWalker, mPlan
 
Constructor Summary
CombinerOptimizer(MROperPlan plan, String chunkSize)
           
CombinerOptimizer(MROperPlan plan, String chunkSize, CompilationMessageCollector messageCollector)
           
 
Method Summary
 CompilationMessageCollector getMessageCollector()
           
 void visitMROp(MapReduceOper mr)
           
 
Methods inherited from class org.apache.pig.impl.plan.PlanVisitor
getPlan, popWalker, pushWalker, visit
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CombinerOptimizer

public CombinerOptimizer(MROperPlan plan,
                         String chunkSize)

CombinerOptimizer

public CombinerOptimizer(MROperPlan plan,
                         String chunkSize,
                         CompilationMessageCollector messageCollector)
Method Detail

getMessageCollector

public CompilationMessageCollector getMessageCollector()

visitMROp

public void visitMROp(MapReduceOper mr)
               throws VisitorException
Overrides:
visitMROp in class MROpPlanVisitor
Throws:
VisitorException
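
The class description gives the condition under which a foreach is copied to the combiner: no nested plan, and every UDF in the generate statement is algebraic. A minimal sketch of that condition follows. It assumes the check is made while each map reduce operator is visited, which this page does not state, and it uses hypothetical stand-in types (UdfCall, Foreach, canUseCombiner) rather than Pig's real plan classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the eligibility rule; not Pig's implementation.
class CombinerEligibilitySketch {

    // Stand-in for one UDF call in a generate statement.
    static final class UdfCall {
        final String name;
        final boolean algebraic;

        UdfCall(String name, boolean algebraic) {
            this.name = name;
            this.algebraic = algebraic;
        }
    }

    // Stand-in for a foreach found in the reduce plan.
    static final class Foreach {
        final boolean hasNestedPlan;
        final List<UdfCall> udfs;

        Foreach(boolean hasNestedPlan, List<UdfCall> udfs) {
            this.hasNestedPlan = hasNestedPlan;
            this.udfs = udfs;
        }
    }

    // True only if the foreach has no nested plan and all UDFs are algebraic.
    // The description does not cover a foreach without UDFs; this sketch
    // simply applies the stated rule to whatever list it is given.
    static boolean canUseCombiner(Foreach foreach) {
        if (foreach.hasNestedPlan) {
            return false;
        }
        for (UdfCall udf : foreach.udfs) {
            if (!udf.algebraic) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Foreach sumOnly = new Foreach(false,
                Arrays.asList(new UdfCall("SUM", true)));
        Foreach mixed = new Foreach(false,
                Arrays.asList(new UdfCall("algebraic", true),
                              new UdfCall("nonalgebraic", false)));
        Foreach nested = new Foreach(true,
                Arrays.asList(new UdfCall("COUNT", true)));

        System.out.println(canUseCombiner(sumOnly)); // prints true
        System.out.println(canUseCombiner(mixed));   // prints false
        System.out.println(canUseCombiner(nested));  // prints false
    }
}
```

The mixed and nested cases correspond to enhancement areas 2 and 1 in the class description: both are rejected by the current rule even though part of the work could in principle move to the combiner.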


Copyright © The Apache Software Foundation