org.apache.hadoop.hive.ql.exec
Class Operator<T extends Serializable>

java.lang.Object
  extended by org.apache.hadoop.hive.ql.exec.Operator<T>
All Implemented Interfaces:
Serializable, Node
Direct Known Subclasses:
CollectOperator, CommonJoinOperator, ExtractOperator, FilterOperator, ForwardOperator, GroupByOperator, LateralViewForwardOperator, LateralViewJoinOperator, LimitOperator, MapOperator, ScriptOperator, SelectOperator, TableScanOperator, TerminalOperator, UDTFOperator, UnionOperator

public abstract class Operator<T extends Serializable>
extends Object
implements Serializable, Node

Base operator implementation.

See Also:
Serialized Form

Nested Class Summary
static interface Operator.OperatorFunc
          OperatorFunc.
static class Operator.ProgressCounter
          TODO This is a hack for hadoop 0.17 which only supports enum counters.
static class Operator.State
          State.
 
Field Summary
protected  String alias
           
protected  long beginTime
           
protected  List<Operator<? extends Serializable>> childOperators
           
protected  Operator<? extends Serializable>[] childOperatorsArray
          Cache childOperators in an array for faster access.
protected  int[] childOperatorsTag
           
protected  Map<String,ExprNodeDesc> colExprMap
          A map from output column name to input expression.
protected  T conf
           
protected  ArrayList<String> counterNames
          List of counter names associated with the operator.
protected  HashMap<String,Operator.ProgressCounter> counterNameToEnum
          Each operator has its own map from counter name to a disjoint ProgressCounter; it is populated at compile time and read at run time while extracting the operator-specific counts.
protected  HashMap<String,Long> counters
          Populated in the client at run time from the Hadoop counters.
protected  boolean done
           
protected static String fatalErrorCntr
           
protected  Object groupKeyObject
           
protected  String id
           
protected  ObjectInspector[] inputObjInspectors
           
protected  long inputRows
           
protected  org.apache.commons.logging.Log LOG
           
protected static String numInputRowsCntr
           
protected static String numOutputRowsCntr
           
protected  String operatorId
           
protected  org.apache.hadoop.mapred.OutputCollector out
           
protected  ObjectInspector outputObjInspector
           
protected  long outputRows
           
protected  List<Operator<? extends Serializable>> parentOperators
           
protected  org.apache.hadoop.mapred.Reporter reporter
           
protected  Operator.State state
           
protected  HashMap<Enum<?>,org.apache.hadoop.io.LongWritable> statsMap
           
protected static String timeTakenCntr
           
protected  long totalTime
           
 
Constructor Summary
Operator()
           
Operator(org.apache.hadoop.mapred.Reporter reporter)
          Create an operator with a reporter.
 
Method Summary
protected  boolean allInitializedParentsAreClosed()
           
protected  boolean areAllParentsInitialized()
          Checks whether all parent operators are initialized.
 void assignCounterNameToEnum()
          Called only in SemanticAnalyzer after all operators have added their own set of counter names.
 void augmentPlan()
          Called during semantic analysis as operators are being added in order to give them a chance to compute any additional plan information needed.
 boolean checkFatalErrors(org.apache.hadoop.mapred.Counters ctrs, StringBuilder errMsg)
          Recursively check this operator and its descendants to see if the fatal error counter is set to non-zero.
 void close(boolean abort)
           
protected  void closeOp(boolean abort)
          Operator specific close routine.
 String dump(int level)
           
 String dump(int level, HashSet<Integer> seenOpts)
           
 void endGroup()
           
protected  void fatalErrorMessage(StringBuilder errMsg, long counterValue)
          Get the fatal error message based on counter's code.
protected  void forward(Object row, ObjectInspector rowInspector)
           
 List<Operator<? extends Serializable>> getChildOperators()
           
 ArrayList<Node> getChildren()
          Implements the getChildren function for the Node Interface.
 Map<String,ExprNodeDesc> getColumnExprMap()
          Returns a map from output column name to input expression. Note that currently it returns only key columns for the ReduceSink and GroupBy operators.
 T getConf()
           
 ArrayList<String> getCounterNames()
           
 HashMap<String,Operator.ProgressCounter> getCounterNameToEnum()
           
 HashMap<String,Long> getCounters()
           
 boolean getDone()
           
 ExecMapperContext getExecContext()
           
 Object getGroupKeyObject()
           
 String getIdentifier()
          This function is not named getId(), to make sure java serialization does NOT serialize it.
 String getName()
          Implements the getName function for the Node Interface.
 String getOperatorId()
           
 List<Operator<? extends Serializable>> getParentOperators()
           
 RowSchema getSchema()
           
 Map<Enum<?>,Long> getStats()
           
 int getType()
          Should be overridden to return the type of the specific operator among the types in OperatorType.
protected  void incrCounter(String name, long amount)
          This is called by operators running in map or reduce tasks.
protected static ObjectInspector[] initEvaluators(ExprNodeEvaluator[] evals, ObjectInspector rowInspector)
          Initialize an array of ExprNodeEvaluator and return the result ObjectInspectors.
protected static StructObjectInspector initEvaluatorsAndReturnStruct(ExprNodeEvaluator[] evals, List<String> outputColName, ObjectInspector rowInspector)
          Initialize an array of ExprNodeEvaluator and put the return values into a StructObjectInspector with integer field names.
 void initialize(org.apache.hadoop.conf.Configuration hconf, ObjectInspector[] inputOIs)
          Initializes the operator only if all parents have been initialized.
protected  void initializeChildren(org.apache.hadoop.conf.Configuration hconf)
          Calls initialize on each of the children with outputObjInspector as the output row format.
 void initializeCounters()
           
 void initializeLocalWork(org.apache.hadoop.conf.Configuration hconf)
           
protected  void initializeOp(org.apache.hadoop.conf.Configuration hconf)
          Operator specific initialization.
 void initOperatorId()
           
 void jobClose(org.apache.hadoop.conf.Configuration conf, boolean success, JobCloseFeedBack feedBack)
          Unlike other operator interfaces which are called from map or reduce task, jobClose is called from the jobclient side once the job has completed.
 void logStats()
           
 void passExecContext(ExecMapperContext execContext)
          Passes the execContext reference to every child operator.
 void preorderMap(Operator.OperatorFunc opFunc)
           
 void process(Object row, int tag)
          Process the row.
abstract  void processOp(Object row, int tag)
          Process the row.
 void removeChild(Operator<? extends Serializable> child)
           
 void replaceChild(Operator<? extends Serializable> child, Operator<? extends Serializable> newChild)
          Replace one child with another at the same position.
 void replaceParent(Operator<? extends Serializable> parent, Operator<? extends Serializable> newParent)
          Replace one parent with another at the same position.
static void resetId()
           
static void resetLastEnumUsed()
           
 void resetStats()
           
 void setAlias(String alias)
          Store the alias this operator is working on behalf of.
 void setChildOperators(List<Operator<? extends Serializable>> childOperators)
           
 void setColumnExprMap(Map<String,ExprNodeDesc> colExprMap)
           
 void setConf(T conf)
           
 void setCounterNames(ArrayList<String> counterNames)
           
 void setCounterNameToEnum(HashMap<String,Operator.ProgressCounter> counterNameToEnum)
           
 void setDone(boolean done)
           
 void setExecContext(ExecMapperContext execContext)
           
 void setGroupKeyObject(Object keyObject)
           
 void setId(String id)
           
 void setOperatorId(String operatorId)
           
 void setOutputCollector(org.apache.hadoop.mapred.OutputCollector out)
           
 void setParentOperators(List<Operator<? extends Serializable>> parentOperators)
           
 void setReporter(org.apache.hadoop.mapred.Reporter rep)
           
 void setSchema(RowSchema rowSchema)
           
 void startGroup()
           
 void updateCounters(org.apache.hadoop.mapred.Counters ctrs)
          Called periodically from ExecDriver.progress.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

childOperators

protected List<Operator<? extends Serializable>> childOperators

parentOperators

protected List<Operator<? extends Serializable>> parentOperators

operatorId

protected String operatorId

counterNames

protected ArrayList<String> counterNames
List of counter names associated with the operator. It contains the following default counters: NUM_INPUT_ROWS, NUM_OUTPUT_ROWS, and TIME_TAKEN. Individual operators can add to this list via the addToCounterNames methods.


counterNameToEnum

protected HashMap<String,Operator.ProgressCounter> counterNameToEnum
Each operator has its own map from counter name to a disjoint ProgressCounter; it is populated at compile time and read at run time while extracting the operator-specific counts.


state

protected transient Operator.State state

conf

protected T conf

done

protected boolean done

statsMap

protected transient HashMap<Enum<?>,org.apache.hadoop.io.LongWritable> statsMap

out

protected transient org.apache.hadoop.mapred.OutputCollector out

LOG

protected transient org.apache.commons.logging.Log LOG

alias

protected transient String alias

reporter

protected transient org.apache.hadoop.mapred.Reporter reporter

id

protected transient String id

inputObjInspectors

protected transient ObjectInspector[] inputObjInspectors

outputObjInspector

protected transient ObjectInspector outputObjInspector

colExprMap

protected transient Map<String,ExprNodeDesc> colExprMap
A map from output column name to input expression. This is used by the optimizer and built during semantic analysis; it contains only key elements for the ReduceSink and GroupBy operators.


childOperatorsArray

protected transient Operator<? extends Serializable>[] childOperatorsArray
Cache childOperators in an array for faster access. childOperatorsArray is accessed per row, so it's important to make the access efficient.


childOperatorsTag

protected transient int[] childOperatorsTag

counters

protected transient HashMap<String,Long> counters
Populated in the client at run time from the Hadoop counters.


inputRows

protected transient long inputRows

outputRows

protected transient long outputRows

beginTime

protected transient long beginTime

totalTime

protected transient long totalTime

groupKeyObject

protected transient Object groupKeyObject

numInputRowsCntr

protected static String numInputRowsCntr

numOutputRowsCntr

protected static String numOutputRowsCntr

timeTakenCntr

protected static String timeTakenCntr

fatalErrorCntr

protected static String fatalErrorCntr
Constructor Detail

Operator

public Operator()

Operator

public Operator(org.apache.hadoop.mapred.Reporter reporter)
Create an operator with a reporter.

Parameters:
reporter - Used to report progress of certain operators.
Method Detail

resetId

public static void resetId()

setChildOperators

public void setChildOperators(List<Operator<? extends Serializable>> childOperators)

getChildOperators

public List<Operator<? extends Serializable>> getChildOperators()

getChildren

public ArrayList<Node> getChildren()
Implements the getChildren function for the Node Interface.

Specified by:
getChildren in interface Node
Returns:
List

setParentOperators

public void setParentOperators(List<Operator<? extends Serializable>> parentOperators)

getParentOperators

public List<Operator<? extends Serializable>> getParentOperators()

setConf

public void setConf(T conf)

getConf

public T getConf()

getDone

public boolean getDone()

setDone

public void setDone(boolean done)

setSchema

public void setSchema(RowSchema rowSchema)

getSchema

public RowSchema getSchema()

setId

public void setId(String id)

getIdentifier

public String getIdentifier()
This function is not named getId(), to make sure java serialization does NOT serialize it. Some TestParse tests will fail if we serialize this field, since the Operator ID will change based on the number of query tests.


setReporter

public void setReporter(org.apache.hadoop.mapred.Reporter rep)

setOutputCollector

public void setOutputCollector(org.apache.hadoop.mapred.OutputCollector out)

setAlias

public void setAlias(String alias)
Store the alias this operator is working on behalf of.


getStats

public Map<Enum<?>,Long> getStats()

areAllParentsInitialized

protected boolean areAllParentsInitialized()
Checks whether all parent operators are initialized.

Returns:
true if there are no parents or all parents are initialized; false otherwise

initialize

public void initialize(org.apache.hadoop.conf.Configuration hconf,
                       ObjectInspector[] inputOIs)
                throws HiveException
Initializes the operator only if all of its parents have been initialized, then calls the operator-specific initializer, which in turn initializes the child operators.

Parameters:
hconf -
inputOIs - input object inspector array indexed by tag id; a null value is ignored.
Throws:
HiveException
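
For illustration only, task-side setup code might initialize the root of an operator tree roughly as follows. The surrounding helper class, variable names, and single-input setup are assumptions for this sketch, not part of the documented contract.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;

// Hypothetical task-side setup helper.
public final class OperatorTreeSetup {
  public static void initRoot(Operator<?> rootOp,
                              ObjectInspector rowInspector,
                              Configuration hconf) throws HiveException {
    // One inspector per tag; a single-input root only needs tag 0.
    // initialize() will in turn initialize the child operators.
    rootOp.initialize(hconf, new ObjectInspector[] { rowInspector });
  }
}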

initializeLocalWork

public void initializeLocalWork(org.apache.hadoop.conf.Configuration hconf)
                         throws HiveException
Throws:
HiveException

initializeOp

protected void initializeOp(org.apache.hadoop.conf.Configuration hconf)
                     throws HiveException
Operator specific initialization.

Throws:
HiveException

initializeChildren

protected void initializeChildren(org.apache.hadoop.conf.Configuration hconf)
                           throws HiveException
Calls initialize on each of the children with outputObjInspector as the output row format.

Throws:
HiveException

passExecContext

public void passExecContext(ExecMapperContext execContext)
Passes the execContext reference to every child operator.


processOp

public abstract void processOp(Object row,
                               int tag)
                        throws HiveException
Process the row.

Parameters:
row - The object representing the row.
tag - The tag of the row, which usually indicates which parent the row comes from. Rows with the same tag should always have exactly the same rowInspector.
Throws:
HiveException
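
To make the subclassing contract concrete, here is a hedged sketch of a minimal concrete operator. The class name, the choice of Serializable as the descriptor type, and the pass-through behavior are all hypothetical illustrations, not part of Hive: the operator reuses the input row format, forwards every row to its children, and needs no cleanup.

import java.io.Serializable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.metadata.HiveException;

// Hypothetical operator that forwards every row unchanged to its children.
// Serializable is used as the descriptor type purely to keep the sketch small.
public class PassThroughOperator extends Operator<Serializable> {
  private static final long serialVersionUID = 1L;

  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
    // No columns are added or removed, so the input row format is reused as-is
    // (an assumption of this sketch) before the children are initialized.
    outputObjInspector = inputObjInspectors[0];
    initializeChildren(hconf);
  }

  @Override
  public void processOp(Object row, int tag) throws HiveException {
    // forward() hands the row, with its inspector, to all child operators.
    forward(row, inputObjInspectors[tag]);
  }

  @Override
  protected void closeOp(boolean abort) throws HiveException {
    // Nothing to clean up for a pass-through operator.
  }

  @Override
  public String getName() {
    return "PASSTHROUGH";
  }
}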

process

public void process(Object row,
                    int tag)
             throws HiveException
Process the row.

Parameters:
row - The object representing the row.
tag - The tag of the row, which usually indicates which parent the row comes from. Rows with the same tag should always have exactly the same rowInspector.
Throws:
HiveException

startGroup

public void startGroup()
                throws HiveException
Throws:
HiveException

endGroup

public void endGroup()
              throws HiveException
Throws:
HiveException

allInitializedParentsAreClosed

protected boolean allInitializedParentsAreClosed()

close

public void close(boolean abort)
           throws HiveException
Throws:
HiveException

closeOp

protected void closeOp(boolean abort)
                throws HiveException
Operator-specific close routine. Operators which inherit from this class should override this function for their specific cleanup routine.

Throws:
HiveException

jobClose

public void jobClose(org.apache.hadoop.conf.Configuration conf,
                     boolean success,
                     JobCloseFeedBack feedBack)
              throws HiveException
Unlike other operator interfaces, which are called from map or reduce tasks, jobClose is called from the job client side once the job has completed.

Parameters:
conf - Configuration with which the job was submitted
success - whether the job was completed successfully or not
Throws:
HiveException
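
As an illustrative override (a fragment from a hypothetical subclass such as the pass-through sketch above; whether any client-side cleanup is needed is operator-specific), a subclass that produces output might hook job completion like this and then delegate back to the base class.

// Fragment from a hypothetical Operator subclass.
@Override
public void jobClose(Configuration conf, boolean success, JobCloseFeedBack feedBack)
    throws HiveException {
  if (success) {
    // Operator-specific client-side finalization would go here,
    // for example committing temporary output (illustrative only).
  }
  // Let the base class run its own jobClose logic.
  super.jobClose(conf, success, feedBack);
}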

replaceChild

public void replaceChild(Operator<? extends Serializable> child,
                         Operator<? extends Serializable> newChild)
Replace one child with another at the same position. The parent of the child is not changed.

Parameters:
child - the old child
newChild - the new child
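
As an illustrative optimizer-style rewrite (the helper class and method names are hypothetical), splicing a new operator between a parent and one of its children can be done with replaceChild on the parent plus explicit fix-up of the other lists, since, per the contract above, replaceChild does not touch the child's parent list.

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.ql.exec.Operator;

// Hypothetical helper: insert 'newOp' between 'parent' and 'child'.
public final class OperatorTreeUtils {
  public static void insertBetween(Operator<? extends Serializable> parent,
                                   Operator<? extends Serializable> child,
                                   Operator<? extends Serializable> newOp) {
    // Re-point the parent's child slot at newOp; replaceChild keeps the position.
    parent.replaceChild(child, newOp);

    // replaceChild does not update the child's parent list, so do it explicitly.
    child.replaceParent(parent, newOp);

    // Wire up the new operator's own parent and child lists.
    List<Operator<? extends Serializable>> newParents =
        new ArrayList<Operator<? extends Serializable>>();
    newParents.add(parent);
    newOp.setParentOperators(newParents);

    List<Operator<? extends Serializable>> newChildren =
        new ArrayList<Operator<? extends Serializable>>();
    newChildren.add(child);
    newOp.setChildOperators(newChildren);
  }
}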

removeChild

public void removeChild(Operator<? extends Serializable> child)

replaceParent

public void replaceParent(Operator<? extends Serializable> parent,
                          Operator<? extends Serializable> newParent)
Replace one parent with another at the same position. Children of the new parent are not updated.

Parameters:
parent - the old parent
newParent - the new parent

forward

protected void forward(Object row,
                       ObjectInspector rowInspector)
                throws HiveException
Throws:
HiveException

resetStats

public void resetStats()

preorderMap

public void preorderMap(Operator.OperatorFunc opFunc)

logStats

public void logStats()

getName

public String getName()
Implements the getName function for the Node Interface.

Specified by:
getName in interface Node
Returns:
the name of the operator

getColumnExprMap

public Map<String,ExprNodeDesc> getColumnExprMap()
Returns a map from output column name to input expression. Note that currently it returns only key columns for the ReduceSink and GroupBy operators.

Returns:
null if the operator doesn't change columns

setColumnExprMap

public void setColumnExprMap(Map<String,ExprNodeDesc> colExprMap)

dump

public String dump(int level)

dump

public String dump(int level,
                   HashSet<Integer> seenOpts)

initEvaluators

protected static ObjectInspector[] initEvaluators(ExprNodeEvaluator[] evals,
                                                  ObjectInspector rowInspector)
                                           throws HiveException
Initialize an array of ExprNodeEvaluator and return the result ObjectInspectors.

Throws:
HiveException

initEvaluatorsAndReturnStruct

protected static StructObjectInspector initEvaluatorsAndReturnStruct(ExprNodeEvaluator[] evals,
                                                                     List<String> outputColName,
                                                                     ObjectInspector rowInspector)
                                                              throws HiveException
Initialize an array of ExprNodeEvaluator and put the return values into a StructObjectInspector with integer field names.

Throws:
HiveException
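
A hedged sketch of how a projection-style operator might use this helper inside its initializeOp. The class shown is hypothetical, and the evaluator array and column-name list are assumed to have been built elsewhere from the operator's descriptor.

import java.io.Serializable;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.hive.ql.metadata.HiveException;

// Hypothetical projection-like operator fragment (processOp omitted).
public abstract class ProjectionLikeOperator extends Operator<Serializable> {
  private static final long serialVersionUID = 1L;

  // Assumed to be populated from the operator descriptor before initialization.
  protected transient ExprNodeEvaluator[] evals;
  protected transient List<String> outputColNames;

  @Override
  protected void initializeOp(Configuration hconf) throws HiveException {
    // Bind each evaluator to the input row shape and expose the results as a
    // struct, which becomes this operator's output row format.
    outputObjInspector =
        initEvaluatorsAndReturnStruct(evals, outputColNames, inputObjInspectors[0]);
    initializeChildren(hconf);
  }
}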

incrCounter

protected void incrCounter(String name,
                           long amount)
This is called by operators running in map or reduce tasks.

Parameters:
name -
amount -
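
Building on the hypothetical pass-through operator sketched under processOp, an operator running inside a map or reduce task might bump an operator-specific counter from its processOp. The counter name below is made up and would need to have been added to counterNames at compile time so that counterNameToEnum can resolve it.

// Fragment from a hypothetical Operator subclass (see the pass-through sketch above).
@Override
public void processOp(Object row, int tag) throws HiveException {
  // Count every row this operator sees under a made-up counter name.
  incrCounter("MY_OP_ROWS_SEEN", 1);
  forward(row, inputObjInspectors[tag]);
}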

getCounterNames

public ArrayList<String> getCounterNames()

setCounterNames

public void setCounterNames(ArrayList<String> counterNames)

getOperatorId

public String getOperatorId()

initOperatorId

public void initOperatorId()

setOperatorId

public void setOperatorId(String operatorId)

getCounters

public HashMap<String,Long> getCounters()

updateCounters

public void updateCounters(org.apache.hadoop.mapred.Counters ctrs)
Called periodically from ExecDriver.progress.

Parameters:
ctrs - counters from the running job

checkFatalErrors

public boolean checkFatalErrors(org.apache.hadoop.mapred.Counters ctrs,
                                StringBuilder errMsg)
Recursively check this operator and its descendants to see if the fatal error counter is set to non-zero.

Parameters:
ctrs -
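
A hedged sketch of client-side progress handling: after fetching counters from the running job, the client pushes them into the operator tree with updateCounters and then uses checkFatalErrors to detect a fatal condition early. The helper class, the RunningJob handle, and the abort behavior are assumptions for this illustration.

import java.io.IOException;
import java.io.Serializable;
import org.apache.hadoop.hive.ql.exec.Operator;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical client-side poll step.
public final class ProgressPoller {
  public static boolean pollOnce(RunningJob rj,
                                 Operator<? extends Serializable> reducerOp)
      throws IOException {
    Counters ctrs = rj.getCounters();
    if (ctrs == null) {
      return true; // counters may not be available yet
    }
    // Refresh the per-operator counter map (normally done in ExecDriver.progress).
    reducerOp.updateCounters(ctrs);

    // Recursively look for a non-zero fatal error counter in the operator tree.
    StringBuilder errMsg = new StringBuilder();
    if (reducerOp.checkFatalErrors(ctrs, errMsg)) {
      System.err.println("Fatal error detected: " + errMsg);
      return false; // caller is expected to abort the job
    }
    return true;
  }
}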

fatalErrorMessage

protected void fatalErrorMessage(StringBuilder errMsg,
                                 long counterValue)
Get the fatal error message based on counter's code.

Parameters:
errMsg - output parameter to which the error message should be appended.
counterValue - input counter code.

resetLastEnumUsed

public static void resetLastEnumUsed()

assignCounterNameToEnum

public void assignCounterNameToEnum()
Called only in SemanticAnalyzer after all operators have added their own set of counter names.


initializeCounters

public void initializeCounters()

getCounterNameToEnum

public HashMap<String,Operator.ProgressCounter> getCounterNameToEnum()

setCounterNameToEnum

public void setCounterNameToEnum(HashMap<String,Operator.ProgressCounter> counterNameToEnum)

getType

public int getType()
Should be overridden to return the type of the specific operator among the types in OperatorType.

Returns:
OperatorType.* or -1 if not overridden

setGroupKeyObject

public void setGroupKeyObject(Object keyObject)

getGroupKeyObject

public Object getGroupKeyObject()

augmentPlan

public void augmentPlan()
Called during semantic analysis as operators are being added in order to give them a chance to compute any additional plan information needed. Does nothing by default.


getExecContext

public ExecMapperContext getExecContext()

setExecContext

public void setExecContext(ExecMapperContext execContext)


Copyright © 2010 The Apache Software Foundation