java.lang.Object
  org.apache.pig.LoadFunc
    org.apache.pig.FileInputLoadFunc
      org.apache.pig.builtin.PigStorage
        org.apache.pig.piggybank.storage.PigStorageSchema
public class PigStorageSchema extends PigStorage implements LoadMetadata, StoreMetadata
This Load/Store Func reads and writes metafiles that allow the schema and aliases to be determined at load time, so you do not have to enter schemas manually for Pig-generated datasets. It also creates a ".pig_headers" file that simply lists the delimited aliases. This is intended to make export easier to tools that can read files with header lines (just cat the header to your data). Due to StoreFunc limitations, you can only write the metafiles in MapReduce mode. You can read them in Local or MapReduce mode.
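The "cat the header to your data" step mentioned above can also be done programmatically. The following is a minimal sketch, not part of Pig or Piggybank: the class name `HeaderPrepender`, its method, and the file paths passed to it are hypothetical, and it assumes the header file holds a single line of delimited aliases.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration only; not a Pig or Piggybank class. */
class HeaderPrepender {

    /**
     * Writes the header line followed by the contents of each data file, in the given order.
     * Assumption: headerFile contains one line of delimited aliases.
     */
    static void prependHeader(Path headerFile, List<Path> dataFiles, Path output)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            String header = new String(Files.readAllBytes(headerFile), StandardCharsets.UTF_8);
            // Strip only trailing line terminators so delimiters inside the header survive.
            header = header.replaceAll("[\\r\\n]+$", "");
            out.write((header + "\n").getBytes(StandardCharsets.UTF_8));
            // Append each data file byte-for-byte.
            for (Path data : dataFiles) {
                Files.copy(data, out);
            }
        }
    }
}
```

A caller would pass the path of the header file, the list of output part files, and the destination of the combined export. Where those files live depends on the store location used in the Pig script.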
Nested Class Summary |
---|
Nested classes/interfaces inherited from interface org.apache.pig.LoadPushDown |
---|
LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse |
Field Summary |
---|
Fields inherited from class org.apache.pig.builtin.PigStorage |
---|
in, mLog, mRequiredColumns, signature, writer |
Constructor Summary |
---|
PigStorageSchema() |
PigStorageSchema(String delim) |
Method Summary | |
---|---|
Tuple | getNext(): Retrieves the next tuple to be processed. |
String[] | getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job): Find what columns are partition keys for this input. |
ResourceSchema | getSchema(String location, org.apache.hadoop.mapreduce.Job job): Get a schema for the data to be loaded. |
ResourceStatistics | getStatistics(String location, org.apache.hadoop.mapreduce.Job job): Get statistics about the data to be loaded. |
void | setPartitionFilter(Expression partitionFilter): Set the filter for partitioning. |
void | storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job): Store schema of the data being written. |
void | storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job): Store statistics about the data being written. |
Methods inherited from class org.apache.pig.builtin.PigStorage |
---|
checkSchema, cleanupOnFailure, equals, equals, getFeatures, getInputFormat, getOutputFormat, hashCode, prepareToRead, prepareToWrite, pushProjection, putNext, relToAbsPathForStoreLocation, setLocation, setStoreFuncUDFContextSignature, setStoreLocation, setUDFContextSignature |
Methods inherited from class org.apache.pig.FileInputLoadFunc |
---|
getSplitComparable |
Methods inherited from class org.apache.pig.LoadFunc |
---|
getAbsolutePath, getLoadCaster, getPathStrings, join, relativeToAbsolutePath |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public PigStorageSchema()
public PigStorageSchema(String delim)
Method Detail |
---|
public Tuple getNext() throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed.
Overrides:
getNext in class PigStorage
Throws:
IOException - if there is an exception while retrieving the next tuple
public ResourceSchema getSchema(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Get a schema for the data to be loaded.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
IOException - if an exception occurs while determining the schema
public ResourceStatistics getStatistics(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Get statistics about the data to be loaded.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving statistics
public void setPartitionFilter(Expression partitionFilter) throws IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null in LoadMetadata.getPartitionKeys(String, Job), then this method is not called by the Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - the expression that describes the filter for partitioning
Throws:
IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
public String[] getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving partition keys
public void storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store schema of the data being written.
Specified by:
storeSchema in interface StoreMetadata
Parameters:
schema - Schema to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
IOException
public void storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store statistics about the data being written.
Specified by:
storeStatistics in interface StoreMetadata
Parameters:
stats - statistics to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object. This should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
IOException