Package org.apache.pig.builtin

This package contains builtin Pig UDFs.

Class Summary
ABS ABS implements a binding to the Java function Math.abs(double) for computing the absolute value of the argument.
ACOS ACOS implements a binding to the Java function Math.acos(double) for computing the arc cosine of the argument.
AddDuration AddDuration returns the result of a DateTime object plus a Duration object.
AlgebraicByteArrayMathBase Core logic for applying an accumulative/algebraic math function to a bag of doubles.
AlgebraicByteArrayMathBase.Final  
AlgebraicByteArrayMathBase.Initial  
AlgebraicByteArrayMathBase.Intermediate  
AlgebraicDoubleMathBase Core logic for applying an accumulative/algebraic math function to a bag of doubles.
AlgebraicDoubleMathBase.Final  
AlgebraicDoubleMathBase.Intermediate  
AlgebraicFloatMathBase Core logic for applying an accumulative/algebraic math function to a bag of Floats.
AlgebraicFloatMathBase.Final  
AlgebraicFloatMathBase.Intermediate  
AlgebraicIntMathBase Core logic for applying an accumulative/algebraic math function to a bag of Integers.
AlgebraicIntMathBase.Final  
AlgebraicIntMathBase.Intermediate  
AlgebraicLongMathBase Core logic for applying an accumulative/algebraic math function to a bag of Longs.
AlgebraicLongMathBase.Final  
AlgebraicLongMathBase.Intermediate  
ARITY Deprecated. Use SIZE instead.
ASIN ASIN implements a binding to the Java function Math.asin(double) for computing the arc sine of the argument.
ATAN ATAN implements a binding to the Java function Math.atan(double) for computing the arc tangent of the argument.
AVG Generates the average of a set of values.
AVG.Final  
AVG.Initial  
AVG.Intermediate  
BagSize This method should never be used directly, use SIZE.
BagToString Flatten a bag into a string.
BagToTuple Flatten a bag into a tuple.
Base Base class for math UDFs.
BinStorage Load and store data in a binary format.
Bloom Use a Bloom filter built previously by BuildBloom.
BuildBloom Build a bloom filter for use later in Bloom.
BuildBloom.Final  
BuildBloom.Initial  
BuildBloom.Intermediate  
BuildBloomBase<T> A base class for BuildBloom and its Algebraic implementations.
CBRT CBRT implements a binding to the Java function Math.cbrt(double) for computing the cube root of the argument.
CEIL CEIL implements a binding to the Java function Math.ceil(double).
CONCAT Generates the concatenation of the first two arguments.
ConstantSize This method should never be used directly, use SIZE.
COR Computes the correlation between sets of data.
COR.Final  
COR.Initial  
COR.Intermed  
COS COS implements a binding to the Java function Math.cos(double).
COSH COSH implements a binding to the Java function Math.cosh(double).
COUNT Generates the count of the number of values in a bag.
COUNT_STAR Generates the count of the values of the first field of a tuple.
COUNT_STAR.Final  
COUNT_STAR.Initial  
COUNT_STAR.Intermediate  
COUNT.Final  
COUNT.Initial  
COUNT.Intermediate  
COV Computes the covariance between sets of data.
COV.Final  
COV.Initial  
COV.Intermed  
CubeDimensions Produces a DataBag with all combinations of the argument tuple members as in a data cube.
CurrentTime CurrentTime generates the DateTime object of the current time.
DaysBetween DaysBetween returns the number of days between two DateTime objects.
DIFF DIFF takes two bags as arguments and compares them.
Distinct Find the distinct set of tuples in a bag.
Distinct.Final  
Distinct.Initial  
Distinct.Intermediate  
DoubleAbs  
DoubleAvg This method should never be used directly, use AVG.
DoubleAvg.Final  
DoubleAvg.Initial  
DoubleAvg.Intermediate  
DoubleBase Base class for math UDFs that return a Double value.
DoubleMax This method should never be used directly, use MAX.
DoubleMax.Final  
DoubleMax.Intermediate  
DoubleMin This method should never be used directly, use MIN.
DoubleMin.Final  
DoubleMin.Intermediate  
DoubleRound Given a single data atom, it returns the closest long to the argument.
DoubleSum This method should never be used directly, use SUM.
DoubleSum.Final  
DoubleSum.Intermediate  
EXP Given a single data atom, it returns Euler's number e raised to the power of the input.
FloatAbs  
FloatAvg This method should never be used directly, use AVG.
FloatAvg.Final  
FloatAvg.Initial  
FloatAvg.Intermediate  
FloatMax This method should never be used directly, use MAX.
FloatMax.Final  
FloatMax.Intermediate  
FloatMin This method should never be used directly, use MIN.
FloatMin.Final  
FloatMin.Intermediate  
FloatRound ROUND implements a binding to the Java function Math.round(float).
FloatSum This method should never be used directly, use SUM.
FLOOR FLOOR implements a binding to the Java function Math.floor(double).
FunctionWrapperEvalFunc EvalFunc that wraps an implementation of the Function interface, which is passed as a String in the constructor.
GenericInvoker<T> The generic Invoker class does all the common grunt work of setting up an invoker.
GetDay GetDay extracts the day of a month from a DateTime object.
GetHour GetHour extracts the hour of a day from a DateTime object.
GetMilliSecond GetMilliSecond extracts the millisecond of a second from a DateTime object.
GetMinute GetMinute extracts the minute of an hour from a DateTime object.
GetMonth GetMonth extracts the month of a year from a DateTime object.
GetSecond GetSecond extracts the second of a minute from a DateTime object.
GetWeek GetWeek extracts the week of a week year from a DateTime object.
GetWeekYear GetWeekYear extracts the week year from a DateTime object.
GetYear GetYear extracts the year from a DateTime object.
HoursBetween HoursBetween returns the number of hours between two DateTime objects.
INDEXOF INDEXOF implements an eval function to search for a string. Example: A = load 'mydata' as (name); B = foreach A generate INDEXOF(name, ',');
IntAbs ABS implements a binding to the Java function Math.abs(int) for computing the absolute value of the argument.
IntAvg This method should never be used directly, use AVG.
IntAvg.Final  
IntAvg.Initial  
IntAvg.Intermediate  
IntMax This method should never be used directly, use MAX.
IntMax.Final  
IntMax.Intermediate  
IntMin This method should never be used directly, use MIN.
IntMin.Final  
IntMin.Intermediate  
IntSum This method should never be used directly, use SUM.
INVERSEMAP This UDF accepts a Map as input with values of any primitive data type.
InvokeForDouble  
InvokeForFloat  
InvokeForInt  
InvokeForLong  
InvokeForString  
Invoker<T>  
IsEmpty Determine whether a bag or map is empty.
JsonLoader A loader for data stored using JsonStorage.
JsonMetadata Reads and writes metadata using JSON in metafiles next to the data.
JsonStorage A JSON Pig store function.
KEYSET This UDF takes a Map and returns a Bag containing the keyset.
LAST_INDEX_OF LAST_INDEX_OF implements an eval function to search for the last occurrence of a string. Returns null on error. Example: A = load 'mydata' as (name); B = foreach A generate LAST_INDEX_OF(name, ',');
LCFIRST Lower-cases the first character of a string.
LOG LOG implements a binding to the Java function Math.log(double).
LOG10 LOG10 implements a binding to the Java function Math.log10(double).
LongAbs  
LongAvg This method should never be used directly, use AVG.
LongAvg.Final  
LongAvg.Initial  
LongAvg.Intermediate  
LongMax This method should never be used directly, use MAX.
LongMax.Final  
LongMax.Intermediate  
LongMin This method should never be used directly, use MIN.
LongMin.Final  
LongMin.Intermediate  
LongSum This method should never be used directly, use SUM.
LongSum.Final  
LongSum.Intermediate  
LOWER LOWER implements an eval function to convert a string to lower case. Example: A = load 'mydata' as (name); B = foreach A generate LOWER(name);
MapSize This method should never be used directly, use SIZE.
MAX Generates the maximum of a set of values.
MAX.Final  
MAX.Intermediate  
MilliSecondsBetween MilliSecondsBetween returns the number of milliseconds between two DateTime objects.
MIN Generates the minimum of a set of values.
MIN.Final  
MIN.Intermediate  
MinutesBetween MinutesBetween returns the number of minutes between two DateTime objects.
MonthsBetween MonthsBetween returns the number of months between two DateTime objects.
PigStorage A load function that parses a line of input into fields using a character delimiter.
PigStreaming The default implementation of PigToStream and StreamToPig interfaces.
RANDOM Return a random double value.
REGEX_EXTRACT Syntax: String RegexExtract(String expression, String regex, int match_index). Input: expression is the source string, regex is the regular expression, and match_index is the index of the group to extract. Output: the extracted group, or null on failure. Matching strategy: tries to match only the first sequence by using Matcher.find() instead of Matcher.matches() (default useMatches=false). DEFINE NON_GREEDY_EXTRACT REGEX_EXTRACT(true);
REGEX_EXTRACT_ALL Syntax: String RegexExtractAll(String expression, String regex). Input: expression is the source string and regex is the regular expression. Output: a tuple of matched strings. Matching strategy: tries to match the entire input by using Matcher.matches() instead of Matcher.find() (default useMatches=true). DEFINE GREEDY_EXTRACT REGEX_EXTRACT(false);
REPLACE REPLACE implements eval function to replace part of a string.
RollupDimensions Produces a DataBag with a hierarchy of values (from the most detailed level of aggregation to the most general level of aggregation) of the specified dimensions. For example, (a, b, c) will produce the following bag: { (a, b, c), (a, b, null), (a, null, null), (null, null, null) }
ROUND ROUND implements a binding to the Java function Math.round(double).
SecondsBetween SecondsBetween returns the number of seconds between two DateTime objects.
SIN SIN implements a binding to the Java function Math.sin(double).
SINH SINH implements a binding to the Java function Math.sinh(double).
SIZE Generates the size of the argument passed to it.
SQRT SQRT implements a binding to the Java function Math.sqrt(double).
STARTSWITH Pig UDF to test input tuple.get(0) against tuple.get(1) to determine if the first argument starts with the string in the second.
StringConcat This method should never be used directly, use CONCAT.
StringMax This method should never be used directly, use MAX.
StringMax.Final  
StringMax.Initial  
StringMax.Intermediate  
StringMin This method should never be used directly, use MIN.
StringMin.Final  
StringMin.Initial  
StringMin.Intermediate  
StringSize This method should never be used directly, use SIZE.
STRSPLIT Wrapper around Java's String.split. Input tuple: the first column is assumed to hold the string to split; the optional second column is assumed to hold the delimiter or regex to split on (if not provided, '\s' (space) is assumed); the optional third column may provide a limit on the number of results. If no limit is provided, 0 is assumed, as per Java's split().
SUBSTRING SUBSTRING implements eval function to get a part of a string.
SubtractDuration SubtractDuration returns the result of a DateTime object minus a Duration object.
SUM Generates the sum of a set of values.
SUM.Final  
SUM.Intermediate  
TAN TAN implements a binding to the Java function Math.tan(double).
TANH TANH implements a binding to the Java function Math.tanh(double).
TextLoader This load function simply creates a tuple for each line of text that has a single chararray field that contains the line of text.
TOBAG This class takes a list of items and puts them into a bag. Example: T = foreach U generate TOBAG($0, $1, $2); This is equivalent to: T = foreach U generate {($0), ($1), ($2)}; All arguments that are not of tuple type are inserted into a tuple before being added to the bag.
ToDate ToDate converts an ISO string, a custom-format string, or a Unix timestamp to a DateTime object.
ToDate2ARGS This method should never be used directly, use ToDate.
ToDate3ARGS This method should never be used directly, use ToDate.
ToDateISO This method should never be used directly, use ToDate.
TOKENIZE Given a chararray as an argument, this method will split the chararray and return a bag with a tuple for each chararray that results from the split.
TOMAP This class makes a map out of the parameters passed to it. Example: T = foreach U generate TOMAP($0, $1, $2, $3); It generates a map $0->$1, $2->$3.
ToMilliSeconds ToMilliSeconds converts the DateTime to the number of milliseconds that have passed since January 1, 1970 00:00:00.000 GMT.
TOP Top UDF accepts a bag of tuples and returns top-n tuples depending upon the tuple field value of type long.
TOP.Final  
TOP.Initial  
TOP.Intermed  
ToString ToString converts the DateTime object to an ISO or custom-format string.
TOTUPLE This class makes a tuple out of the parameters passed to it. Example: T = foreach U generate TOTUPLE($0, $1, $2); It generates a tuple containing $0, $1, and $2.
ToUnixTime ToUnixTime converts the DateTime to Unix time, returned as a long.
TRIM Returns a string, with leading and trailing whitespace omitted.
TupleSize This method should never be used directly, use SIZE.
UCFIRST Upper-cases the first character of a string.
UPPER UPPER implements an eval function to convert a string to upper case. Example: A = load 'mydata' as (name); B = foreach A generate UPPER(name);
Utf8StorageConverter This abstract class provides standard conversions between utf8 encoded data and pig data types.
VALUELIST This UDF takes a Map and returns a Bag containing the values from map.
VALUESET This UDF takes a Map and returns a Tuple containing the value set.
WeeksBetween WeeksBetween returns the number of weeks between two DateTime objects.
YearsBetween YearsBetween returns the number of years between two DateTime objects.
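
The REGEX_EXTRACT and REGEX_EXTRACT_ALL entries above differ mainly in whether they call Matcher.find() or Matcher.matches(). The following is a minimal sketch of that difference using only java.util.regex. It is not Pig's implementation: the class RegexStrategySketch and the helper extract are hypothetical names, and only the useMatches flag is taken from the summary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of org.apache.pig.builtin.
final class RegexStrategySketch {

    // Returns the requested capture group, or null when nothing matches.
    // useMatches=false uses Matcher.find(): the pattern may match anywhere in the input.
    // useMatches=true uses Matcher.matches(): the pattern must cover the entire input.
    static String extract(String input, String regex, int group, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matched = useMatches ? m.matches() : m.find();
        if (!matched || group < 0 || group > m.groupCount()) {
            return null;
        }
        return m.group(group);
    }

    public static void main(String[] args) {
        String line = "host=alpha port=8080";
        // find(): the pattern is located inside the line.
        System.out.println(extract(line, "port=(\\d+)", 1, false));
        // matches(): the pattern does not cover the whole line, so the result is null.
        System.out.println(extract(line, "port=(\\d+)", 1, true));
        // matches(): succeeds once the pattern accounts for the whole line.
        System.out.println(extract(line, ".*port=(\\d+)", 1, true));
    }
}
```

With find(), a short pattern is enough to pull a field out of a longer line; with matches(), the pattern has to describe the entire input, which is why the second call returns null.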
 
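
The STRSPLIT entry above describes itself as a wrapper around Java's String.split with an optional delimiter and an optional limit. The sketch below shows what those defaults mean in plain Java. The class SplitSketch and the helper split are hypothetical names, and the code only models the defaults described in the summary ('\s' as delimiter, 0 as limit); it does not reproduce how Pig builds the result tuple.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of org.apache.pig.builtin.
final class SplitSketch {

    // Applies the defaults described for STRSPLIT: regex "\\s" and limit 0 when absent.
    static String[] split(String source, String regex, Integer limit) {
        if (source == null) {
            return null;
        }
        String delimiter = (regex == null) ? "\\s" : regex;
        int max = (limit == null) ? 0 : limit;
        return source.split(delimiter, max);
    }

    public static void main(String[] args) {
        // No delimiter, no limit: split on single whitespace characters.
        System.out.println(Arrays.toString(split("a b c", null, null)));
        // Limit 0: trailing empty strings are discarded by String.split.
        System.out.println(Arrays.toString(split("a,b,,", ",", null)));
        // Limit 2: at most two results, the remainder stays unsplit.
        System.out.println(Arrays.toString(split("a,b,c,d", ",", 2)));
        // Negative limit: trailing empty strings are kept.
        System.out.println(Arrays.toString(split("a,b,,", ",", -1)));
    }
}
```

The limit of 0 matters mostly for inputs with trailing delimiters: Java drops the trailing empty strings in that case, so a caller who needs them has to pass a negative limit.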

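The RollupDimensions entry above describes a hierarchy that runs from the most detailed level of aggregation to the most general. A minimal sketch of that idea in plain Java follows, assuming rolled-up dimensions are represented as null. RollupSketch and rollup are hypothetical names, and lists stand in for Pig's tuples and bags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of org.apache.pig.builtin.
final class RollupSketch {

    // Builds one row per aggregation level: first all dimensions, then the
    // rightmost dimension nulled out, and so on until every dimension is null.
    static List<List<String>> rollup(List<String> dims) {
        List<List<String>> levels = new ArrayList<>();
        for (int keep = dims.size(); keep >= 0; keep--) {
            List<String> row = new ArrayList<>(dims.size());
            for (int i = 0; i < dims.size(); i++) {
                row.add(i < keep ? dims.get(i) : null);
            }
            levels.add(row);
        }
        return levels;
    }

    public static void main(String[] args) {
        // prints [[a, b, c], [a, b, null], [a, null, null], [null, null, null]]
        System.out.println(rollup(Arrays.asList("a", "b", "c")));
    }
}
```

For n dimensions this yields n + 1 rows, in contrast to the full set of combinations that the CubeDimensions entry describes.
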
Annotation Types Summary
MonitoredUDF Describes how the execution of a UDF should be monitored, and what to do if it times out.
Nondeterministic A non-deterministic UDF is one that can produce different results when invoked on the same input.
OutputSchema An EvalFunc can be annotated with an OutputSchema to tell Pig what the expected output is.
 

Package org.apache.pig.builtin Description

This package contains builtin Pig UDFs. This includes EvalFuncs, LoadFuncs and StoreFuncs.



Copyright © 2007-2012 The Apache Software Foundation