java.lang.Object
  org.apache.pig.LoadFunc
    org.apache.pig.FileInputLoadFunc
      org.apache.pig.piggybank.storage.HiveColumnarLoader
public class HiveColumnarLoader
Loader for Hive RC Columnar files.
Supports the following types:
Hive Type | Pig Type from DataType |
---|---|
string | CHARARRAY |
int | INTEGER |
bigint or long | LONG |
float | FLOAT |
double | DOUBLE |
boolean | BOOLEAN |
byte | BYTE |
array | TUPLE |
map | MAP |
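The table above can be read as a simple lookup. The following is a minimal sketch of that lookup, assuming type names are matched case-insensitively as the constructor documentation states. The class HiveTypeTableSketch and the method toPigType are hypothetical names used for illustration only; they are not part of HiveColumnarLoader, and the values are the names of the Pig DataType constants rather than the constants themselves.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: mirrors the type table above, not the loader's internal code.
final class HiveTypeTableSketch {

    // Hive type name (lower case) -> name of the corresponding Pig DataType constant.
    private static final Map<String, String> HIVE_TO_PIG = Map.of(
            "string", "CHARARRAY",
            "int", "INTEGER",
            "bigint", "LONG",
            "long", "LONG",
            "float", "FLOAT",
            "double", "DOUBLE",
            "boolean", "BOOLEAN",
            "byte", "BYTE",
            "array", "TUPLE",
            "map", "MAP");

    // Looks up a Hive type name, ignoring case and surrounding whitespace.
    static String toPigType(String hiveType) {
        String pigType = HIVE_TO_PIG.get(hiveType.trim().toLowerCase(Locale.ROOT));
        if (pigType == null) {
            throw new IllegalArgumentException("Unsupported Hive type: " + hiveType);
        }
        return pigType;
    }

    public static void main(String[] args) {
        System.out.println(toPigType("BIGINT"));
        System.out.println(toPigType("string"));
    }
}
```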
Usage 1:
a = LOAD 'file' USING HiveColumnarLoader('uid bigint, ts long, arr array, m map');
-- to reference the fields
b = FOREACH a GENERATE uid, ts, arr, m;
Usage 2:
a = LOAD 'file' USING HiveColumnarLoader('uid bigint, ts long, arr array, m map', '2009-10-01:2009-10-02');
-- to reference the fields
b = FOREACH a GENERATE uid, ts, arr, m;
Usage 3:
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.apache.pig.LoadPushDown |
---|
LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse |
Field Summary | |
---|---|
static String | DAY_DATE_COLUMN |
protected static org.apache.commons.logging.Log | LOG |
protected static Pattern | pcols: Regex to filter out column names |
protected TupleFactory | tupleFactory |
Constructor Summary | |
---|---|
HiveColumnarLoader(String table_schema): The table schema is a comma-separated list of "name type" pairs describing the Hive schema. For example, "uid BIGINT, pid long" declares a column uid of type BIGINT and a column pid of type LONG. Type names are not case sensitive. |
HiveColumnarLoader(String table_schema, String dateRange): The table schema is a comma-separated list of "name type" pairs describing the Hive schema. For example, "uid BIGINT, pid long" declares a column uid of type BIGINT and a column pid of type LONG. Type names are not case sensitive. |
HiveColumnarLoader(String table_schema, String dateRange, String columns) |
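To make the schema string format concrete, here is a minimal sketch that splits a string such as "uid BIGINT, pid long" into name and type pairs. It is an assumption-laden illustration, not the loader's parser: the class TableSchemaSketch and its parse method are hypothetical, and the naive comma split only handles simple types, so a parameterized type containing a comma would need a real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a naive reading of the "name type, name type" schema string.
final class TableSchemaSketch {

    // Returns one {name, type} pair per column; types are lower-cased
    // because the documentation says type names are not case sensitive.
    static List<String[]> parse(String tableSchema) {
        if (tableSchema == null) {
            throw new IllegalArgumentException("table_schema cannot be null");
        }
        List<String[]> columns = new ArrayList<>();
        for (String column : tableSchema.split(",")) {
            String trimmed = column.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            // Split into at most two parts: the column name and the rest (the type).
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'name type' but got: " + trimmed);
            }
            columns.add(new String[] {parts[0], parts[1].toLowerCase(Locale.ROOT)});
        }
        return columns;
    }

    public static void main(String[] args) {
        for (String[] column : parse("uid BIGINT, pid long")) {
            System.out.println(column[0] + " -> " + column[1]);
        }
    }
}
```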
Method Summary | |
---|---|
List<LoadPushDown.OperatorSet> | getFeatures(): Determine the operators that can be pushed to the loader. |
org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable> | getInputFormat(): This will be called during planning on the front end. |
Tuple | getNext(): Retrieves the next tuple to be processed. |
String[] | getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job): Find what columns are partition keys for this input. |
ResourceSchema | getSchema(String location, org.apache.hadoop.mapreduce.Job job): Get a schema for the data to be loaded. |
ResourceStatistics | getStatistics(String location, org.apache.hadoop.mapreduce.Job job): Get statistics about the data to be loaded. |
void | prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split): Initializes LoadFunc for reading data. |
LoadPushDown.RequiredFieldResponse | pushProjection(LoadPushDown.RequiredFieldList requiredFieldList): Indicate to the loader fields that will be needed. |
void | setLocation(String locationStr, org.apache.hadoop.mapreduce.Job job): Communicate to the loader the location of the object(s) being loaded. |
void | setPartitionFilter(Expression partitionFilter): Set the filter for partitioning. |
Methods inherited from class org.apache.pig.FileInputLoadFunc |
---|
getSplitComparable |
Methods inherited from class org.apache.pig.LoadFunc |
---|
getAbsolutePath, getLoadCaster, getPathStrings, join, relativeToAbsolutePath, setUDFContextSignature |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String DAY_DATE_COLUMN
protected static final Pattern pcols
protected static final org.apache.commons.logging.Log LOG
protected TupleFactory tupleFactory
Constructor Detail |
---|
public HiveColumnarLoader(String table_schema)
Parameters:
table_schema - This property cannot be null

public HiveColumnarLoader(String table_schema, String dateRange)
Parameters:
table_schema - This property cannot be null
dateRange - must have the format yyyy-MM-dd:yyyy-MM-dd; only dates between these two dates, inclusive, will be considered.

public HiveColumnarLoader(String table_schema, String dateRange, String columns)
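The dateRange format described above (two yyyy-MM-dd dates separated by a colon, both ends inclusive) can be sketched as follows. This is only an illustration of the documented format using java.time; the class DateRangeSketch and its contains method are hypothetical and do not reflect how the loader itself parses or applies the range.

```java
import java.time.LocalDate;

// Illustrative only: parses "yyyy-MM-dd:yyyy-MM-dd" and tests inclusive membership.
final class DateRangeSketch {
    private final LocalDate start;
    private final LocalDate end;

    DateRangeSketch(String dateRange) {
        String[] parts = dateRange.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected yyyy-MM-dd:yyyy-MM-dd but got: " + dateRange);
        }
        // LocalDate.parse uses the ISO local date format, which is yyyy-MM-dd.
        this.start = LocalDate.parse(parts[0].trim());
        this.end = LocalDate.parse(parts[1].trim());
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("End date is before start date: " + dateRange);
        }
    }

    // True when the day lies between start and end, with both ends included.
    boolean contains(LocalDate day) {
        return !day.isBefore(start) && !day.isAfter(end);
    }

    public static void main(String[] args) {
        DateRangeSketch range = new DateRangeSketch("2009-10-01:2009-10-02");
        System.out.println(range.contains(LocalDate.parse("2009-10-02")));
        System.out.println(range.contains(LocalDate.parse("2009-10-03")));
    }
}
```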
Method Detail |
---|
public org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable> getInputFormat() throws IOException
Description copied from class: LoadFunc
This will be called during planning on the front end.
Specified by:
getInputFormat in class LoadFunc
Throws:
IOException - if there is an exception during InputFormat construction

public Tuple getNext() throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed.
Specified by:
getNext in class LoadFunc
Throws:
IOException - if there is an exception while retrieving the next tuple

public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split) throws IOException
Description copied from class: LoadFunc
Initializes LoadFunc for reading data.
Specified by:
prepareToRead in class LoadFunc
Parameters:
reader - RecordReader to be used by this instance of the LoadFunc
split - The input PigSplit to process
Throws:
IOException - if there is an exception during initialization

public void setLocation(String locationStr, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from class: LoadFunc
Communicate to the loader the location of the object(s) being loaded. The location string passed here is the value returned by LoadFunc.relativeToAbsolutePath(String, Path). Implementations should use this method to communicate the location (and any other information) to the underlying InputFormat through the Job object. This method is called multiple times in the backend, so implementations should ensure that repeated calls have no inconsistent side effects.
Specified by:
setLocation in class LoadFunc
Parameters:
locationStr - Location as returned by LoadFunc.relativeToAbsolutePath(String, Path)
job - the Job object, used to store or retrieve earlier stored information from the UDFContext
Throws:
IOException - if the location is not valid.

public String[] getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving partition keys

public ResourceSchema getSchema(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Get a schema for the data to be loaded.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while determining the schema

public ResourceStatistics getStatistics(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Get statistics about the data to be loaded.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration(), not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving statistics

public void setPartitionFilter(Expression partitionFilter) throws IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null from LoadMetadata.getPartitionKeys(String, Job), then this method is not called by the Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - the expression that describes the filter for partitioning
Throws:
IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.

public List<LoadPushDown.OperatorSet> getFeatures()
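Description copied from interface: LoadPushDown
Determine the operators that can be pushed to the loader.
Specified by:
getFeatures in interface LoadPushDown

public LoadPushDown.RequiredFieldResponse pushProjection(LoadPushDown.RequiredFieldList requiredFieldList) throws FrontendException
Description copied from interface: LoadPushDown
Indicate to the loader fields that will be needed.
Specified by:
pushProjection in interface LoadPushDown
Parameters:
requiredFieldList - RequiredFieldList indicating which columns will be needed.
Throws:
FrontendException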