org.apache.hadoop.hive.ql.udf.generic
Class GenericUDAFEvaluator

java.lang.Object
  extended by org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator
Direct Known Subclasses:
GenericUDAFAverage.GenericUDAFAverageEvaluator, GenericUDAFBridge.GenericUDAFBridgeEvaluator, GenericUDAFCount.GenericUDAFCountEvaluator, GenericUDAFHistogramNumeric.GenericUDAFHistogramNumericEvaluator, GenericUDAFMax.GenericUDAFMaxEvaluator, GenericUDAFMin.GenericUDAFMinEvaluator, GenericUDAFSum.GenericUDAFSumDouble, GenericUDAFSum.GenericUDAFSumLong, GenericUDAFVariance.GenericUDAFVarianceEvaluator

public abstract class GenericUDAFEvaluator
extends Object

A generic user-defined aggregation function (GenericUDAF) for use with Hive. New GenericUDAF classes need to inherit from this class. GenericUDAFs are superior to normal UDAFs in the following ways: 1. They can accept arguments of complex types and return complex types. 2. They can accept a variable number of arguments. 3. They can accept an unlimited number of function signatures; for example, it is easy to write a GenericUDAF that accepts array<int>, array<array<int>>, and so on (arbitrary levels of nesting).


Nested Class Summary
static interface GenericUDAFEvaluator.AggregationBuffer
          The interface for a class that is used to store the aggregation result during the process of aggregation.
static class GenericUDAFEvaluator.Mode
          The aggregation mode: PARTIAL1, PARTIAL2, FINAL, or COMPLETE.
 
Constructor Summary
GenericUDAFEvaluator()
          The constructor.
 
Method Summary
 void aggregate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          This function will be called by GroupByOperator when it sees a new input row.
 Object evaluate(GenericUDAFEvaluator.AggregationBuffer agg)
          This function will be called by GroupByOperator to retrieve the aggregation result (partial or final, depending on the mode).
abstract  GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
          Get a new aggregation object.
 ObjectInspector init(GenericUDAFEvaluator.Mode m, ObjectInspector[] parameters)
          Initialize the evaluator.
abstract  void iterate(GenericUDAFEvaluator.AggregationBuffer agg, Object[] parameters)
          Iterate through original data.
abstract  void merge(GenericUDAFEvaluator.AggregationBuffer agg, Object partial)
          Merge with partial aggregation result.
abstract  void reset(GenericUDAFEvaluator.AggregationBuffer agg)
          Reset the aggregation.
abstract  Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
          Get final aggregation result.
abstract  Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
          Get partial aggregation result.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenericUDAFEvaluator

public GenericUDAFEvaluator()
The constructor.

Method Detail

init

public ObjectInspector init(GenericUDAFEvaluator.Mode m,
                            ObjectInspector[] parameters)
                     throws HiveException
Initialize the evaluator.

Parameters:
m - The mode of aggregation.
parameters - The ObjectInspectors for the parameters. In PARTIAL1 and COMPLETE mode, the parameters are the original data; in PARTIAL2 and FINAL mode, the parameters are partial aggregations (in which case the array always has a single element).
Returns:
The ObjectInspector for the return value. In PARTIAL1 and PARTIAL2 mode, the ObjectInspector for the return value of terminatePartial() call; in FINAL and COMPLETE mode, the ObjectInspector for the return value of terminate() call. NOTE: We need ObjectInspector[] (in addition to the TypeInfo[] in GenericUDAFResolver) for 2 reasons: 1. ObjectInspector contains more information than TypeInfo; and 2. We call GenericUDAFResolver.getEvaluator at compilation time, and GenericUDAFEvaluator.init at execution time.
Throws:
HiveException
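
The mode-dependent contract of init() can be summarized in a small stand-alone sketch. It does not use any Hive classes: SketchMode, ModeSketch, inputKind, and outputMethod are hypothetical names chosen for illustration, and the enum only mirrors the four modes named in this section.

```java
// Stand-alone illustration; does not depend on Hive. All names are hypothetical.
enum SketchMode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

final class ModeSketch {
    /** What the parameters passed to init() describe in the given mode. */
    static String inputKind(SketchMode m) {
        switch (m) {
            case PARTIAL1:
            case COMPLETE:
                return "original data";
            default: // PARTIAL2 and FINAL
                return "partial aggregation";
        }
    }

    /** Which method's return value the ObjectInspector returned by init() describes. */
    static String outputMethod(SketchMode m) {
        switch (m) {
            case PARTIAL1:
            case PARTIAL2:
                return "terminatePartial";
            default: // FINAL and COMPLETE
                return "terminate";
        }
    }

    public static void main(String[] args) {
        for (SketchMode m : SketchMode.values()) {
            System.out.println(m + ": input=" + inputKind(m)
                    + ", output of " + outputMethod(m) + "()");
        }
    }
}
```

In a real evaluator, init() would typically store the input ObjectInspectors for later use and return an ObjectInspector matching the selected output method.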

getNewAggregationBuffer

public abstract GenericUDAFEvaluator.AggregationBuffer getNewAggregationBuffer()
                                                                        throws HiveException
Get a new aggregation object.

Throws:
HiveException

reset

public abstract void reset(GenericUDAFEvaluator.AggregationBuffer agg)
                    throws HiveException
Reset the aggregation. This is useful if we want to reuse the same aggregation.

Throws:
HiveException
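
As a rough illustration of how getNewAggregationBuffer() and reset() relate, the following stand-alone sketch models a buffer for a running sum. It is an assumption-laden stand-in, not Hive code: SumBufferSketch, SumBuffer, newBuffer, and add are hypothetical names, and SumBuffer merely plays the role of an AggregationBuffer implementation.

```java
// Stand-alone illustration; does not depend on Hive. All names are hypothetical.
final class SumBufferSketch {
    /** Stand-in for an AggregationBuffer implementation holding a running sum. */
    static final class SumBuffer {
        boolean empty = true;
        long sum = 0L;
    }

    /** Analogue of getNewAggregationBuffer(): allocate a buffer in its initial state. */
    static SumBuffer newBuffer() {
        SumBuffer b = new SumBuffer();
        reset(b);
        return b;
    }

    /** Analogue of reset(): return an existing buffer to its initial state for reuse. */
    static void reset(SumBuffer b) {
        b.empty = true;
        b.sum = 0L;
    }

    /** Helper that accumulates one value, so the effect of reset() is observable. */
    static void add(SumBuffer b, long v) {
        b.empty = false;
        b.sum += v;
    }
}
```

Delegating from newBuffer() to reset() keeps the initial state defined in one place, so a reused buffer and a fresh one cannot drift apart.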

aggregate

public void aggregate(GenericUDAFEvaluator.AggregationBuffer agg,
                      Object[] parameters)
               throws HiveException
This function will be called by GroupByOperator when it sees a new input row.

Parameters:
agg - The object to store the aggregation result.
parameters - The row, which can be inspected with the ObjectInspectors passed to init().
Throws:
HiveException

evaluate

public Object evaluate(GenericUDAFEvaluator.AggregationBuffer agg)
                throws HiveException
This function will be called by GroupByOperator to retrieve the aggregation result (partial or final, depending on the mode).

Parameters:
agg - The object to store the aggregation result.
Throws:
HiveException
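
One plausible way the two non-abstract methods, aggregate() and evaluate(), could relate to the abstract ones is sketched below. The dispatch is inferred from the mode contract documented for init() and should be read as an assumption, not as a copy of the Hive source. DispatchSketch and lastCall are hypothetical names, and the hook methods only record that they were invoked.

```java
// Stand-alone illustration; does not depend on Hive. All names are hypothetical.
final class DispatchSketch {
    enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    private final Mode mode;
    String lastCall = ""; // name of the most recently invoked hook

    DispatchSketch(Mode mode) {
        this.mode = mode;
    }

    // Stand-ins for the abstract hooks; they only record that they were called.
    void iterate(Object[] parameters) { lastCall = "iterate"; }
    void merge(Object partial)        { lastCall = "merge"; }
    Object terminatePartial()         { lastCall = "terminatePartial"; return null; }
    Object terminate()                { lastCall = "terminate"; return null; }

    /** Assumed dispatch: original rows go to iterate(), partial results to merge(). */
    void aggregate(Object[] parameters) {
        if (mode == Mode.PARTIAL1 || mode == Mode.COMPLETE) {
            iterate(parameters);
        } else {
            merge(parameters[0]); // single-element array in PARTIAL2 and FINAL
        }
    }

    /** Assumed dispatch: partial modes emit a partial result, the others a final one. */
    Object evaluate() {
        if (mode == Mode.PARTIAL1 || mode == Mode.PARTIAL2) {
            return terminatePartial();
        }
        return terminate();
    }
}
```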

iterate

public abstract void iterate(GenericUDAFEvaluator.AggregationBuffer agg,
                             Object[] parameters)
                      throws HiveException
Iterate through original data.

Parameters:
parameters - The objects of parameters.
Throws:
HiveException

terminatePartial

public abstract Object terminatePartial(GenericUDAFEvaluator.AggregationBuffer agg)
                                 throws HiveException
Get partial aggregation result.

Returns:
partial aggregation result.
Throws:
HiveException

merge

public abstract void merge(GenericUDAFEvaluator.AggregationBuffer agg,
                           Object partial)
                    throws HiveException
Merge with partial aggregation result. NOTE: null might be passed in case there is no input data.

Parameters:
partial - The partial aggregation result.
Throws:
HiveException
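
Because null may be passed as the partial result, a merge() implementation generally has to guard against it. The sketch below shows one way to do that for a running sum. It is stand-alone and does not use Hive classes: MergeSketch and SumBuffer are hypothetical names, and the partial result is assumed to arrive as a java.lang.Long rather than through an ObjectInspector.

```java
// Stand-alone illustration; does not depend on Hive. All names are hypothetical.
final class MergeSketch {
    /** Stand-in for an AggregationBuffer implementation holding a running sum. */
    static final class SumBuffer {
        boolean empty = true;
        long sum = 0L;
    }

    /** Analogue of merge(): fold a partial sum into the buffer, ignoring null. */
    static void merge(SumBuffer agg, Object partial) {
        if (partial == null) {
            return; // no input data was seen upstream; leave the buffer untouched
        }
        agg.empty = false;
        agg.sum += ((Long) partial).longValue();
    }
}
```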

terminate

public abstract Object terminate(GenericUDAFEvaluator.AggregationBuffer agg)
                          throws HiveException
Get final aggregation result.

Returns:
final aggregation result.
Throws:
HiveException
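
To tie the abstract methods together, here is a minimal stand-alone simulation of a two-stage sum: each input split is aggregated with iterate() and terminatePartial() (as in PARTIAL1 mode), and the partial results are combined with merge() and terminate() (as in FINAL mode). It is a sketch under stated assumptions, not Hive code: LifecycleSketch, SumBuffer, and sum are hypothetical names, values are plain Long objects instead of ObjectInspector-described data, and returning null for empty input is a choice made for this example.

```java
// Stand-alone illustration; does not depend on Hive. All names are hypothetical.
final class LifecycleSketch {
    /** Stand-in for an AggregationBuffer implementation holding a running sum. */
    static final class SumBuffer {
        boolean empty = true;
        long sum = 0L;
    }

    /** Analogue of iterate(): consume one original row (a single Long column). */
    static void iterate(SumBuffer agg, Object[] parameters) {
        if (parameters[0] != null) {
            agg.empty = false;
            agg.sum += (Long) parameters[0];
        }
    }

    /** Analogue of terminatePartial(): emit the partial sum, or null if no rows were seen. */
    static Object terminatePartial(SumBuffer agg) {
        return agg.empty ? null : Long.valueOf(agg.sum);
    }

    /** Analogue of merge(): fold in a partial sum; null means no input data. */
    static void merge(SumBuffer agg, Object partial) {
        if (partial != null) {
            agg.empty = false;
            agg.sum += (Long) partial;
        }
    }

    /** Analogue of terminate(): emit the final sum, or null if nothing was aggregated. */
    static Object terminate(SumBuffer agg) {
        return agg.empty ? null : Long.valueOf(agg.sum);
    }

    /** Simulates PARTIAL1 on each split, then FINAL over the partial results. */
    static Object sum(long[][] splits) {
        SumBuffer finalBuf = new SumBuffer();
        for (long[] split : splits) {
            SumBuffer partialBuf = new SumBuffer();
            for (long v : split) {
                iterate(partialBuf, new Object[] { v });
            }
            merge(finalBuf, terminatePartial(partialBuf));
        }
        return terminate(finalBuf);
    }

    public static void main(String[] args) {
        // Three splits, one of them empty; prints 6.
        System.out.println(sum(new long[][] { { 1, 2 }, { 3 }, { } }));
    }
}
```

The empty split yields a null partial result, which exercises the null case that merge() is documented to receive.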


Copyright © 2010 The Apache Software Foundation