org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class PigOutputFormat

java.lang.Object
  extended by org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>

public class PigOutputFormat
extends Object
implements org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>

The counterpart of PigInputFormat, responsible for the Store functionality. It mirrors PigInputFormat, supplying a RecordWriter instead of a RecordReader.


Nested Class Summary
static class PigOutputFormat.PigRecordWriter
           
 
Field Summary
static String PIG_OUTPUT_FUNC
           
 
Constructor Summary
PigOutputFormat()
           
 
Method Summary
 void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job)
           
 PigOutputFormat.PigRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path outputDir, String name, org.apache.hadoop.util.Progressable progress)
           
 org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.WritableComparable,Tuple> getRecordWriter(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.mapred.JobConf job, String name, org.apache.hadoop.util.Progressable progress)
          In general, an OutputFormat in Pig obtains the StoreFunc and the store metadata (currently the schema and the location given for the store in the Pig script) through two static utility methods: MapRedUtil.getStoreFunc(JobConf), which returns the StoreFunc reference to use in RecordWriter.write(), and MapRedUtil.getStoreConfig(JobConf), which returns the StoreConfig reference holding metadata such as the location (the string supplied with the store statement in the script) and the Schema of the data.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PIG_OUTPUT_FUNC

public static final String PIG_OUTPUT_FUNC
See Also:
Constant Field Values
Constructor Detail

PigOutputFormat

public PigOutputFormat()
Method Detail

getRecordWriter

public org.apache.hadoop.mapred.RecordWriter<org.apache.hadoop.io.WritableComparable,Tuple> getRecordWriter(org.apache.hadoop.fs.FileSystem fs,
                                                                                                            org.apache.hadoop.mapred.JobConf job,
                                                                                                            String name,
                                                                                                            org.apache.hadoop.util.Progressable progress)
                                                                                                     throws IOException
In general, an OutputFormat in Pig obtains the StoreFunc and the store metadata (currently the schema and the location given for the store in the Pig script) through the following static utility methods:

MapRedUtil.getStoreFunc(JobConf) - returns the StoreFunc reference to use in RecordWriter.write().
MapRedUtil.getStoreConfig(JobConf) - returns the StoreConfig reference, which holds metadata such as the location (the string supplied with the store statement in the script) and the Schema of the data.

The OutputFormat should NOT use the location in the StoreConfig to write its output if the location represents a Hadoop DFS path. When "speculative execution" is turned on in Hadoop, multiple attempts of the same task (for a given partition) may run at the same time, so writing to the location directly would let these attempts overwrite each other's output. Instead, the OutputFormat should call FileOutputFormat.getWorkOutputPath(JobConf), which provides a safe output directory in which to write the part file (named by the name argument of the getRecordWriter() call).
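
The reason for writing into a per-attempt work directory can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: it uses java.nio.file rather than Hadoop, and the class WorkDirSketch, its methods, the "_temporary" directory name, and the attempt ids are all hypothetical stand-ins, not Pig or Hadoop APIs. In a real OutputFormat the work directory would come from FileOutputFormat.getWorkOutputPath(JobConf), and promoting the winning attempt's output is handled by Hadoop rather than by the OutputFormat.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Plain-Java simulation only; none of these names are Pig or Hadoop APIs.
class WorkDirSketch {

    // Hypothetical stand-in for the per-attempt directory that
    // FileOutputFormat.getWorkOutputPath(JobConf) would provide.
    static Path workOutputPath(Path outputDir, String attemptId) throws IOException {
        return Files.createDirectories(outputDir.resolve("_temporary").resolve(attemptId));
    }

    // Each attempt writes its part file only inside its own work directory,
    // never directly under the store location.
    static Path writePart(Path outputDir, String attemptId, String name, String data)
            throws IOException {
        Path part = workOutputPath(outputDir, attemptId).resolve(name);
        Files.write(part, data.getBytes(StandardCharsets.UTF_8));
        return part;
    }

    // Only the winning attempt's file is promoted to the final location.
    static Path commit(Path outputDir, String attemptId, String name) throws IOException {
        Path source = outputDir.resolve("_temporary").resolve(attemptId).resolve(name);
        return Files.move(source, outputDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("store-location");

        // Two speculative attempts for the same partition run side by side.
        Path first = writePart(outputDir, "attempt_0", "part-00000", "complete output\n");
        Path second = writePart(outputDir, "attempt_1", "part-00000", "partial");

        // The two attempts wrote to different files, so neither clobbered the other.
        System.out.println("same file? " + first.equals(second));

        // Promote the attempt that finished successfully.
        Path committed = commit(outputDir, "attempt_0", "part-00000");
        System.out.print(new String(Files.readAllBytes(committed), StandardCharsets.UTF_8));
    }
}
```

Had both attempts written straight to the store location under the same part-file name, the second write would have replaced the first, which is the overwrite problem described above.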

Specified by:
getRecordWriter in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
Throws:
IOException

getRecordWriter

public PigOutputFormat.PigRecordWriter getRecordWriter(org.apache.hadoop.fs.FileSystem fs,
                                                       org.apache.hadoop.mapred.JobConf job,
                                                       org.apache.hadoop.fs.Path outputDir,
                                                       String name,
                                                       org.apache.hadoop.util.Progressable progress)
                                                throws IOException
Throws:
IOException

checkOutputSpecs

public void checkOutputSpecs(org.apache.hadoop.fs.FileSystem fs,
                             org.apache.hadoop.mapred.JobConf job)
                      throws IOException
Specified by:
checkOutputSpecs in interface org.apache.hadoop.mapred.OutputFormat<org.apache.hadoop.io.WritableComparable,Tuple>
Throws:
IOException


Copyright © ${year} The Apache Software Foundation