org.apache.pig.piggybank.evaluation.stats
Class COV
java.lang.Object
org.apache.pig.EvalFunc<DataBag>
org.apache.pig.piggybank.evaluation.stats.COV
- All Implemented Interfaces:
- Serializable, Algebraic
public class COV
- extends EvalFunc<DataBag>
- implements Algebraic, Serializable
Computes the covariance between sets of data. The returned value
is a bag containing one tuple for each combination of input
schemas; each tuple holds the two schema names and the covariance between
those two schemas.
- Parameters:
data sets
- a tuple containing one DataBag per data set; each DataBag contains
one tuple per data atom, e.g. ({(1),(2)},{(3),(4)}).
- Return Value:
DataBag containing, for every combination of input schemas, the
covariance between the corresponding data sets
- Return Schema:
- covariance
- Example:
register statistics.jar;
A = load 'input.xml' using PigStorage(':');
B = group A all;
define c COV('a','b','c');
D = foreach B generate group,c(A.$0,A.$1,A.$2);
- Author:
- ajay garg
- See Also:
- http://en.wikipedia.org/wiki/Covariance, Serialized Form
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn
schemaName
protected Vector<String> schemaName
COV
public COV()
COV
public COV(String... schemaName)
exec
public DataBag exec(Tuple input)
throws IOException
- Function to compute covariance between data sets.
- Specified by:
exec
in class EvalFunc<DataBag>
- Parameters:
input
- input tuple which contains the data sets.
output
- output DataBag which contains the covariance between each pair of data sets.
- Returns:
- the result, of type DataBag.
- Throws:
IOException
toString
public String toString()
- Returns the constructor arguments as a string, enclosed in ( and ).
If the default constructor was used, it returns an empty string.
- Overrides:
toString
in class Object
- Returns:
- argument of constructor
getInitial
public String getInitial()
- Specified by:
getInitial
in interface Algebraic
- Returns:
- A string to instantiate f_init. f_init should be an eval func
getIntermed
public String getIntermed()
- Specified by:
getIntermed
in interface Algebraic
- Returns:
- A string to instantiate f_intermed. f_intermed should be an eval func
getFinal
public String getFinal()
- Specified by:
getFinal
in interface Algebraic
- Returns:
- A string to instantiate f_final. f_final should be an eval func parametrized by
the same datum as the eval func implementing this interface
combine
protected static Tuple combine(DataBag values)
throws IOException
- Combines the results of different data chunks.
- Parameters:
values
- DataBag containing partial results computed on different data chunks
output
- Tuple containing combined data
- Throws:
IOException
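The combining step can be illustrated without any Pig types. The following is a minimal sketch, not the class's actual implementation: it assumes each partial result is laid out as {sum(XY), sum(X), sum(Y), count}, which this page does not confirm, and the class and method names are hypothetical. Under that assumption, partial results from different chunks are merged by element-wise addition.

```java
import java.util.Arrays;
import java.util.List;

final class PartialSumsCombineSketch {
    // Assumed (hypothetical) layout of one partial result:
    // {sum(XY), sum(X), sum(Y), count}. Plain arrays stand in for Pig Tuples.
    static double[] combine(List<double[]> partials) {
        double[] total = new double[4];
        for (double[] p : partials) {
            // Sums and counts are additive, so chunks merge element by element.
            for (int i = 0; i < total.length; i++) {
                total[i] += p[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two chunks of the paired data x = {1,2,3,4}, y = {2,4,6,9}.
        double[] chunk1 = {10, 3, 6, 2};   // from x = {1,2}, y = {2,4}
        double[] chunk2 = {54, 7, 15, 2};  // from x = {3,4}, y = {6,9}
        double[] total = combine(Arrays.asList(chunk1, chunk2));
        System.out.println(Arrays.toString(total)); // prints [64.0, 10.0, 21.0, 4.0]
    }
}
```

Because addition is associative and commutative, the chunks can be merged in any order, which is what allows a function to be evaluated in initial, intermediate, and final stages.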
computeAll
protected static Tuple computeAll(DataBag first,
DataBag second)
throws IOException
- Computes sum(XY), sum(X), sum(Y) from the given data sets.
- Parameters:
first
- DataBag containing first data set
second
- DataBag containing second data set
- Returns:
- tuple containing sum(XY), sum(X), sum(Y)
- Throws:
IOException
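To show why sum(XY), sum(X) and sum(Y) are sufficient, here is a minimal sketch in plain Java of how a covariance can be derived from them. It assumes the population formula cov(X,Y) = (sum(XY) - sum(X)*sum(Y)/n) / n; this page does not state whether the class divides by n or by n-1. The class and method names are hypothetical, and arrays stand in for DataBags.

```java
final class CovarianceFromSumsSketch {
    // Returns {sum(XY), sum(X), sum(Y), n} for two equally long data sets.
    // The trailing count n is an assumption of this sketch.
    static double[] computeSums(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("data sets must have equal length");
        }
        double sumXY = 0.0, sumX = 0.0, sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumXY += x[i] * y[i];
            sumX += x[i];
            sumY += y[i];
        }
        return new double[] {sumXY, sumX, sumY, x.length};
    }

    // Population covariance from the sums: E[XY] - E[X]E[Y].
    static double covariance(double[] sums) {
        double n = sums[3];
        return (sums[0] - sums[1] * sums[2] / n) / n;
    }

    public static void main(String[] args) {
        // Same data as the ({(1),(2)},{(3),(4)}) example input.
        double[] sums = computeSums(new double[] {1, 2}, new double[] {3, 4});
        System.out.println(covariance(sums)); // prints 0.25
    }
}
```

Since only running sums are kept, the data never has to be held in memory at once, and sums computed on separate chunks can be added together before the final division.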
outputSchema
public Schema outputSchema(Schema input)
- Overrides:
outputSchema
in class EvalFunc<DataBag>
- Parameters:
input
- Schema of the input
- Returns:
- Schema of the output
Copyright © ${year} The Apache Software Foundation