org.apache.hcatalog.mapreduce
Class HCatOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.WritableComparable<?>,HCatRecord>
      extended by org.apache.hcatalog.mapreduce.HCatBaseOutputFormat
          extended by org.apache.hcatalog.mapreduce.HCatOutputFormat

public class HCatOutputFormat
extends HCatBaseOutputFormat

The OutputFormat to use to write data to HCat. The key value is ignored and should be given as null. The value is the HCatRecord to write.
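A minimal driver sketch showing how this class is typically wired into a job. The metastore URI, database name ("mydb"), and table name ("mytable") are placeholder assumptions, and the exact `HCatTableInfo.getOutputTableInfo` signature may differ across HCatalog releases; this is an illustration, not a definitive recipe.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.DefaultHCatRecord;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.HCatTableInfo;

public class HCatWriteDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "hcat-write");
        job.setJarByClass(HCatWriteDriver.class);

        // Describe the target table; server URI, db, and table name
        // are hypothetical. Pass partition values instead of null to
        // write to a specific static partition.
        HCatTableInfo outputInfo = HCatTableInfo.getOutputTableInfo(
                "thrift://metastore-host:9083", null, "mydb", "mytable", null);
        HCatOutputFormat.setOutput(job, outputInfo);

        // Write records using the table's own schema (also the default
        // if setSchema is never called).
        HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job));

        job.setOutputFormatClass(HCatOutputFormat.class);
        job.setOutputKeyClass(WritableComparable.class);
        job.setOutputValueClass(DefaultHCatRecord.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that `setOutput` must be called before `setSchema`, since the schema defaults are derived from the table info stored in the job configuration.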


Field Summary
protected static java.lang.String DYNTEMP_DIR_NAME
           
protected static java.lang.String TEMP_DIR_NAME
          The directory under which data is initially written for a non-partitioned table
 
Constructor Summary
HCatOutputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Get the output committer for this output format.
 org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.WritableComparable<?>,HCatRecord> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Get the record writer for the job.
static void prepareOutputLocation(HCatOutputStorageDriver osd, org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Performs any initialization of file paths and sets permissions and group on freshly created files. This is called at RecordWriter instantiation time, which can be at write time for a dynamic-partitioning use case.
static void setOutput(org.apache.hadoop.mapreduce.Job job, HCatTableInfo outputInfo)
          Set the info about the output to write for the Job.
static void setSchema(org.apache.hadoop.mapreduce.Job job, HCatSchema schema)
          Set the schema for the data being written out to the partition.
 
Methods inherited from class org.apache.hcatalog.mapreduce.HCatBaseOutputFormat
checkOutputSpecs, getJobInfo, getOutputDriverInstance, getOutputFormat, getTableSchema, setPartDetails
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TEMP_DIR_NAME

protected static final java.lang.String TEMP_DIR_NAME
The directory under which data is initially written for a non-partitioned table

See Also:
Constant Field Values

DYNTEMP_DIR_NAME

protected static final java.lang.String DYNTEMP_DIR_NAME
See Also:
Constant Field Values
Constructor Detail

HCatOutputFormat

public HCatOutputFormat()
Method Detail

setOutput

public static void setOutput(org.apache.hadoop.mapreduce.Job job,
                             HCatTableInfo outputInfo)
                      throws java.io.IOException
Set the info about the output to write for the Job. This queries the metadata server to find the StorageDriver to use for the table. Throws an error if the partition is already published.

Parameters:
job - the job object
outputInfo - the table output info
Throws:
java.io.IOException - the exception in communicating with the metadata server

setSchema

public static void setSchema(org.apache.hadoop.mapreduce.Job job,
                             HCatSchema schema)
                      throws java.io.IOException
Set the schema for the data being written out to the partition. The table schema is used by default for the partition if this is not called.

Parameters:
job - the job object
schema - the schema for the data
Throws:
java.io.IOException
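A short sketch of the common pattern: fetch the table schema (via the inherited `getTableSchema`) and pass it to `setSchema`, optionally dropping a dynamic-partition column first. The partition column name "ds" is an assumed example, and the availability of `HCatSchema.remove` should be checked against your HCatalog version.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;

// Assumes setOutput(job, outputInfo) has already been called.
void configureSchema(Job job) throws java.io.IOException {
    HCatSchema schema = HCatOutputFormat.getTableSchema(job);

    // Hypothetical example: for dynamic partitioning, the records may
    // not carry the partition column "ds", so remove it from the schema.
    // schema.remove(schema.get("ds"));

    HCatOutputFormat.setSchema(job, schema);
}
```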

getRecordWriter

public org.apache.hadoop.mapreduce.RecordWriter<org.apache.hadoop.io.WritableComparable<?>,HCatRecord> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                                                                                throws java.io.IOException,
                                                                                                                       java.lang.InterruptedException
Get the record writer for the job. Uses the table's default OutputStorageDriver to get the record writer.

Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.WritableComparable<?>,HCatRecord>
Parameters:
context - the information about the current task.
Returns:
a RecordWriter to write the output for the job.
Throws:
java.io.IOException
java.lang.InterruptedException
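The record writer returned here is what a task writes to via `context.write`. Below is a sketch of a mapper emitting HCatRecords; the CSV input format and the two-column layout (a string followed by an int) are assumptions for illustration. Per the class description, the key is ignored and passed as null.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hcatalog.data.DefaultHCatRecord;
import org.apache.hcatalog.data.HCatRecord;

public class WriteMapper
        extends Mapper<LongWritable, Text, WritableComparable<?>, HCatRecord> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed input: comma-separated "name,count" lines.
        String[] parts = value.toString().split(",");

        HCatRecord record = new DefaultHCatRecord(2);
        record.set(0, parts[0]);                   // column 0: string
        record.set(1, Integer.parseInt(parts[1])); // column 1: int

        // HCatOutputFormat ignores the key, so pass null.
        context.write(null, record);
    }
}
```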

getOutputCommitter

public org.apache.hadoop.mapreduce.OutputCommitter getOutputCommitter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                                               throws java.io.IOException,
                                                                      java.lang.InterruptedException
Get the output committer for this output format. This is responsible for ensuring the output is committed correctly.

Specified by:
getOutputCommitter in class org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.WritableComparable<?>,HCatRecord>
Parameters:
context - the task context
Returns:
an output committer
Throws:
java.io.IOException
java.lang.InterruptedException

prepareOutputLocation

public static void prepareOutputLocation(HCatOutputStorageDriver osd,
                                         org.apache.hadoop.mapreduce.TaskAttemptContext context)
                                  throws java.io.IOException
Performs any initialization of file paths and sets permissions and group on freshly created files. This is called at RecordWriter instantiation time, which can be at write time for a dynamic-partitioning use case.

Parameters:
osd - the output storage driver for the table
context - the task attempt context
Throws:
java.io.IOException