org.apache.pig
Interface Algebraic
- All Known Implementing Classes:
- AVG, COR, COUNT, COUNT_STAR, COV, Distinct, DoubleAvg, DoubleMax, DoubleMin, DoubleSum, FloatAvg, FloatMax, FloatMin, FloatSum, IntAvg, IntMax, IntMin, IntSum, LongAvg, LongMax, LongMin, LongSum, MAX, MaxTupleBy1stField, MIN, StringMax, StringMin, SUM, Top
public interface Algebraic
An interface to declare that an EvalFunc's calculation can be decomposed into initial, intermediate, and final steps.
More formally, suppose we have to compute a function f over a bag X. In general, we need to know the entire X
before we can make any progress on f. However, some functions are algebraic, e.g., SUM. In
these cases, you can apply an initial function f_init to subsets of X to get partial results.
You can then combine partial results from different subsets of X using an intermediate function
f_intermed. To get the final answer, the partial results are combined by a final function
f_final, whose output equals f(X). For the function SUM, f_init, f_intermed, and f_final are all SUM.
See the source code of the built-in AVG function for a fuller example of how Algebraic works.
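AVG is a case where the three steps are not all the same function. The sketch below assumes the partial result is a (sum, count) pair, a common way to make an average algebraic; it is an illustration in plain Java with hypothetical names (`AlgebraicAvgSketch`, `Partial`, `decomposedAvg`), not a reproduction of the built-in AVG's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in plain Java (not Pig's API): AVG made algebraic by
// carrying a (sum, count) pair as the partial result.
class AlgebraicAvgSketch {

    // Partial result: running sum and number of values seen so far.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // f_init: turns one subset of X into a (sum, count) pair.
    static Partial init(List<Double> chunk) {
        double sum = 0;
        for (double v : chunk) sum += v;
        return new Partial(sum, chunk.size());
    }

    // f_intermed: merges pairs by adding sums and counts; the result is still a pair.
    static Partial intermed(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // f_final: merges the last pairs and divides once, at the very end.
    // Returning NaN for an empty bag is this sketch's own choice.
    static double fin(List<Partial> partials) {
        Partial total = intermed(partials);
        return total.count == 0 ? Double.NaN : total.sum / total.count;
    }

    static double decomposedAvg(List<Double> x, int chunkSize) {
        List<Partial> partials = new ArrayList<>();
        for (int i = 0; i < x.size(); i += chunkSize) {
            partials.add(init(x.subList(i, Math.min(i + chunkSize, x.size()))));
        }
        return fin(List.of(intermed(partials)));
    }

    public static void main(String[] args) {
        List<Double> x = List.of(1.0, 2.0, 3.0, 4.0, 10.0);
        System.out.println(decomposedAvg(x, 4)); // prints 4.0
    }
}
```

Deferring the division to f_final is what makes this correct: with chunks [1, 2, 3, 4] and [10], averaging the chunk averages 2.5 and 10 would give 6.25 instead of 4.0.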
When an eval function implements this interface, it is a hint to the system to try to compute
partial results early, which can make queries run faster.
getInitial
String getInitial()
- Returns:
- A string to instantiate f_init. f_init should be an eval func
getIntermed
String getIntermed()
- Returns:
- A string to instantiate f_intermed. f_intermed should be an eval func
getFinal
String getFinal()
- Returns:
- A string to instantiate f_final. f_final should be an eval func parametrized by
the same datum as the eval func implementing this interface
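Each method returns a string from which the system can instantiate the corresponding step. The sketch below mimics that with Java reflection, treating the string as a class name. `Step` (a stand-in for an eval func), `AlgebraicLike`, `CountSketch`, `instantiate` and `run` are all hypothetical and are not Pig's real classes, although the three method signatures match the ones documented above. COUNT is used as an illustrative case where the steps differ: f_init counts values, while f_intermed and f_final add up partial counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Step and AlgebraicLike are stand-ins, NOT Pig's real types.
class AlgebraicReflectionSketch {

    // Plays the role of an eval func that maps a list of values to one value.
    interface Step {
        long exec(List<Long> input);
    }

    // Mirrors the three methods of the interface documented above.
    interface AlgebraicLike {
        String getInitial();
        String getIntermed();
        String getFinal();
    }

    // COUNT as an example where the steps differ.
    static class CountSketch implements AlgebraicLike {
        // f_init: counts the values in one subset of X.
        public static class Initial implements Step {
            public long exec(List<Long> input) { return input.size(); }
        }

        // f_intermed: adds up partial counts.
        public static class Intermed implements Step {
            public long exec(List<Long> partials) {
                long total = 0;
                for (long p : partials) total += p;
                return total;
            }
        }

        // f_final: also adds up partial counts.
        public static class Final extends Intermed {}

        // Each method returns the binary class name of its step.
        public String getInitial()  { return Initial.class.getName(); }
        public String getIntermed() { return Intermed.class.getName(); }
        public String getFinal()    { return Final.class.getName(); }
    }

    // Instantiates a step from a string returned by getInitial/getIntermed/getFinal.
    static Step instantiate(String className) {
        try {
            return (Step) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    // Runs f_init per chunk, f_intermed over all partial results, then f_final.
    static long run(AlgebraicLike f, List<List<Long>> chunks) {
        Step init = instantiate(f.getInitial());
        Step intermed = instantiate(f.getIntermed());
        Step fin = instantiate(f.getFinal());
        List<Long> partials = new ArrayList<>();
        for (List<Long> chunk : chunks) partials.add(init.exec(chunk));
        return fin.exec(List.of(intermed.exec(partials)));
    }

    public static void main(String[] args) {
        List<List<Long>> chunks = List.of(List.of(5L, 7L), List.of(1L, 1L, 9L));
        System.out.println(run(new CountSketch(), chunks)); // prints 5
    }
}
```

Returning names rather than instances keeps the interface cheap to call: the caller decides when and where (for example, on which worker) each step is actually constructed.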
Copyright © ${year} The Apache Software Foundation