org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class PigOutputFormat
java.lang.Object
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
public class PigOutputFormat
- extends Object
- implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
The better half of PigInputFormat, responsible
for the Store functionality. It is the exact mirror
image of PigInputFormat, having a RecordWriter instead
of a RecordReader.
Method Summary

void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job)

PigOutputFormat.PigRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path outputDir, String name, org.apache.hadoop.util.Progressable progress)

org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.WritableComparable,Tuple> getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
In general, the mechanism for an OutputFormat in Pig to get hold of the StoreFunc and the metadata information (for now, the schema and location provided for the store in the Pig script) is through the following utility static methods: MapRedUtil.getStoreFunc(JobConf), which gets the StoreFunc reference to use in RecordWriter.write(), and MapRedUtil.getStoreConfig(JobConf), which gets the StoreConfig reference holding metadata such as the location (the string supplied with the store statement in the script) and the Schema of the data.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
PIG_OUTPUT_FUNC
public static final String PIG_OUTPUT_FUNC
- See Also:
- Constant Field Values
PigOutputFormat
public PigOutputFormat()
getRecordWriter
public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.WritableComparable,Tuple> getRecordWriter(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.mapred.JobConf job,
String name,
org.apache.hadoop.util.Progressable progress)
throws IOException
- In general, the mechanism for an OutputFormat in Pig to get hold of the StoreFunc
and the metadata information (for now, the schema and location provided for the store in
the Pig script) is through the following utility static methods:
MapRedUtil.getStoreFunc(JobConf)
- gets the StoreFunc reference to use in RecordWriter.write()
MapRedUtil.getStoreConfig(JobConf)
- gets the StoreConfig reference, which holds metadata such as the location
(the string supplied with the store statement in the script) and the Schema of the data.
The OutputFormat should NOT use the location in the StoreConfig to write the output if the
location represents a Hadoop DFS path. This is because when "speculative execution" is
turned on in Hadoop, multiple attempts for the same task (for a given partition) may be
running at the same time, so writing directly to the location would mean that these
different attempts overwrite each other's output.
Instead, the OutputFormat should use
FileOutputFormat.getWorkOutputPath(JobConf),
which provides a safe output directory into which the OutputFormat should write
the part file (given by the name argument in the getRecordWriter() call).
- Specified by:
getRecordWriter
in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
- Throws:
IOException
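The contract above can be sketched as a custom OutputFormat that obtains the StoreFunc via MapRedUtil and writes its part file under the task's safe work directory rather than the StoreConfig location. This is an illustrative sketch only, assuming the old Pig StoreFunc interface (bindTo/putNext/finish) and the MapRedUtil package location; the class name SketchOutputFormat and the wiring details are hypothetical, not Pig's actual implementation.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputFormat;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;
import org.apache.pig.StoreFunc;
import org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil;
import org.apache.pig.data.Tuple;

// Hypothetical example class, not part of Pig itself.
public class SketchOutputFormat implements OutputFormat<WritableComparable, Tuple> {

    public RecordWriter<WritableComparable, Tuple> getRecordWriter(
            FileSystem fs, JobConf job, String name, Progressable progress)
            throws IOException {
        // Use the per-attempt work directory, NOT StoreConfig's location:
        // with speculative execution, parallel attempts for the same task
        // would otherwise clobber each other's output.
        Path workDir = FileOutputFormat.getWorkOutputPath(job);
        Path partFile = new Path(workDir, name);
        final OutputStream os = partFile.getFileSystem(job).create(partFile);

        // StoreFunc configured by the store statement in the Pig script.
        final StoreFunc storer = MapRedUtil.getStoreFunc(job);
        storer.bindTo(os);

        return new RecordWriter<WritableComparable, Tuple>() {
            public void write(WritableComparable key, Tuple value) throws IOException {
                // Delegate serialization of each tuple to the StoreFunc.
                storer.putNext(value);
            }
            public void close(Reporter reporter) throws IOException {
                storer.finish();
                os.close();
            }
        };
    }

    public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
        // No validation in this sketch.
    }
}
```

The key design point is that the framework later promotes files from the work directory to the final output location only for the attempt that completes successfully.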
getRecordWriter
public PigOutputFormat.PigRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.fs.Path outputDir,
String name,
org.apache.hadoop.util.Progressable progress)
throws IOException
- Throws:
IOException
checkOutputSpecs
public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.mapred.JobConf job)
throws IOException
- Specified by:
checkOutputSpecs
in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
- Throws:
IOException
Copyright © The Apache Software Foundation