org.apache.hcatalog.mapreduce
Class HCatOutputStorageDriver

java.lang.Object
  extended by org.apache.hcatalog.mapreduce.HCatOutputStorageDriver
Direct Known Subclasses:
RCFileOutputDriver

public abstract class HCatOutputStorageDriver
extends java.lang.Object

The abstract class to be implemented by underlying storage drivers to enable writing data to HCat through HCatOutputFormat.
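
For example, a minimal concrete driver might wrap a standard Hadoop OutputFormat as sketched below. This is an illustration only: the class name TextOutputDriver, the choice of TextOutputFormat, and the "mapred.output.dir" configuration key are assumptions made for the example, not part of the HCatalog API.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;

public class TextOutputDriver extends HCatOutputStorageDriver {

    private HCatSchema schema;                   // schema of the rows being written
    private Map<String, String> partitionValues; // values of the partition keys

    @Override
    public OutputFormat<? super WritableComparable<?>, ? super Writable> getOutputFormat()
            throws IOException {
        // Any standard Hadoop OutputFormat can be wrapped; TextOutputFormat
        // is used here purely for illustration.
        return new TextOutputFormat<WritableComparable<?>, Writable>();
    }

    @Override
    public void setOutputPath(JobContext jobContext, String location) throws IOException {
        // For a FileOutputFormat-style driver, publishing the location under
        // the key FileOutputFormat reads is one plausible implementation.
        jobContext.getConfiguration().set("mapred.output.dir", location);
    }

    @Override
    public void setSchema(JobContext jobContext, HCatSchema schema) throws IOException {
        this.schema = schema;
    }

    @Override
    public void setPartitionValues(JobContext jobContext,
                                   Map<String, String> partitionValues) throws IOException {
        this.partitionValues = partitionValues;
    }

    @Override
    public WritableComparable<?> generateKey(HCatRecord value) throws IOException {
        // The key handed to HCatOutputFormat is ignored, so a null key suffices.
        return NullWritable.get();
    }

    @Override
    public Writable convertValue(HCatRecord value) throws IOException {
        // Naive text rendering of the record; a real driver would honor the schema.
        return new Text(value.toString());
    }
}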


Constructor Summary
HCatOutputStorageDriver()
           
 
Method Summary
 void abortOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.JobStatus.State state)
          Implementation that calls the underlying output committer's abortJob; used in lieu of the underlying committer's abortJob when using dynamic partitioning. It must be safe to call after multiple underlying output committers have written to task directories inside it.
 void cleanupOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Implementation that calls the underlying output committer's cleanupJob; used in lieu of the underlying committer's cleanupJob when using dynamic partitioning. It must be safe to call after multiple underlying output committers have written to task directories inside it.
abstract  org.apache.hadoop.io.Writable convertValue(HCatRecord value)
          Convert the given HCatRecord value to the actual value type.
abstract  org.apache.hadoop.io.WritableComparable<?> generateKey(HCatRecord value)
          Generate the key for the underlying OutputFormat.
abstract  org.apache.hadoop.mapreduce.OutputFormat<? super org.apache.hadoop.io.WritableComparable<?>,? super org.apache.hadoop.io.Writable> getOutputFormat()
          Returns the OutputFormat to use with this Storage Driver.
 java.lang.String getOutputLocation(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String tableLocation, java.util.List<java.lang.String> partitionCols, java.util.Map<java.lang.String,java.lang.String> partitionValues, java.lang.String dynHash)
          Gets the location to use for the specified partition values.
 org.apache.hadoop.fs.Path getWorkFilePath(org.apache.hadoop.mapreduce.TaskAttemptContext context, java.lang.String outputLoc)
          Default implementation assumes FileOutputFormat.
 void initialize(org.apache.hadoop.mapreduce.JobContext context, java.util.Properties hcatProperties)
          Initialize the storage driver with the specified properties; the default implementation does nothing.
abstract  void setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String location)
          Set the data location for the output.
abstract  void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext, java.util.Map<java.lang.String,java.lang.String> partitionValues)
          Sets the partition key values for the partition being written.
abstract  void setSchema(org.apache.hadoop.mapreduce.JobContext jobContext, HCatSchema schema)
          Set the schema for the data being written out.
 void setupOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context)
          Implementation that calls the underlying output committer's setupJob; used in lieu of the underlying committer's setupJob when using dynamic partitioning. The default implementation should be overridden by storage drivers that do not use FileOutputCommitter.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HCatOutputStorageDriver

public HCatOutputStorageDriver()

Method Detail

initialize

public void initialize(org.apache.hadoop.mapreduce.JobContext context,
                       java.util.Properties hcatProperties)
                throws java.io.IOException
Initialize the storage driver with the specified properties; the default implementation does nothing.
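
For instance, a driver could pick up a field delimiter from the storage driver properties. This is a hedged sketch inside the illustrative subclass from the class description; the "field.delim" property name and the delimiter field are hypothetical:

@Override
public void initialize(JobContext context, Properties hcatProperties) throws IOException {
    super.initialize(context, hcatProperties);
    // "field.delim" is a hypothetical property name; assumes a String
    // delimiter field declared on the driver subclass.
    this.delimiter = hcatProperties.getProperty("field.delim", "\t");
}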

Parameters:
context - the job context object
hcatProperties - the properties for the storage driver
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

getOutputFormat

public abstract org.apache.hadoop.mapreduce.OutputFormat<? super org.apache.hadoop.io.WritableComparable<?>,? super org.apache.hadoop.io.Writable> getOutputFormat()
                                                                                                                                                            throws java.io.IOException
Returns the OutputFormat to use with this Storage Driver.

Returns:
the OutputFormat instance
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setOutputPath

public abstract void setOutputPath(org.apache.hadoop.mapreduce.JobContext jobContext,
                                   java.lang.String location)
                            throws java.io.IOException
Set the data location for the output.

Parameters:
jobContext - the job context object
location - the data location
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setSchema

public abstract void setSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                               HCatSchema schema)
                        throws java.io.IOException
Set the schema for the data being written out.

Parameters:
jobContext - the job context object
schema - the data schema
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setPartitionValues

public abstract void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext,
                                        java.util.Map<java.lang.String,java.lang.String> partitionValues)
                                 throws java.io.IOException
Sets the partition key values for the partition being written.

Parameters:
jobContext - the job context object
partitionValues - the partition values
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

generateKey

public abstract org.apache.hadoop.io.WritableComparable<?> generateKey(HCatRecord value)
                                                                throws java.io.IOException
Generate the key for the underlying OutputFormat. The value given to HCatOutputFormat is passed as the argument. The key given to HCatOutputFormat is ignored.

Parameters:
value - the value given to HCatOutputFormat
Returns:
a key instance
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

convertValue

public abstract org.apache.hadoop.io.Writable convertValue(HCatRecord value)
                                                    throws java.io.IOException
Convert the given HCatRecord value to the actual value type.
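
A slightly richer sketch than the stub in the class description, joining the record's fields with the hypothetical delimiter read in initialize above; HCatRecord's positional size()/get(int) accessors are assumed here:

@Override
public Writable convertValue(HCatRecord value) throws IOException {
    // Join the record's fields into one delimited line of text.
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < value.size(); i++) {
        if (i > 0) {
            line.append(delimiter); // hypothetical delimiter field set in initialize
        }
        line.append(value.get(i));
    }
    return new Text(line.toString());
}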

Parameters:
value - the HCatRecord value to convert
Returns:
a value instance
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

getOutputLocation

public java.lang.String getOutputLocation(org.apache.hadoop.mapreduce.JobContext jobContext,
                                          java.lang.String tableLocation,
                                          java.util.List<java.lang.String> partitionCols,
                                          java.util.Map<java.lang.String,java.lang.String> partitionValues,
                                          java.lang.String dynHash)
                                   throws java.io.IOException
Gets the location to use for the specified partition values. The storage driver can override as required.
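
As an illustration, an override could lay partitions out in the usual col=value directory form, appending the dynamic-partitioning hash when one is supplied. This layout is an example, not the documented default; imports from the class sketch above plus java.util.List are assumed:

@Override
public String getOutputLocation(JobContext jobContext, String tableLocation,
                                List<String> partitionCols,
                                Map<String, String> partitionValues,
                                String dynHash) throws IOException {
    StringBuilder location = new StringBuilder(tableLocation);
    for (String col : partitionCols) {
        // Hive-style partition directory: .../col=value
        location.append('/').append(col).append('=').append(partitionValues.get(col));
    }
    if (dynHash != null) {
        // Keep concurrent dynamic-partitioning jobs from colliding.
        location.append('/').append(dynHash);
    }
    return location.toString();
}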

Parameters:
jobContext - the job context object
tableLocation - the location of the table
partitionCols - the partition columns
partitionValues - the partition values
dynHash - a unique hash value that identifies the dynamic partitioning job in use
Returns:
the location String.
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

getWorkFilePath

public org.apache.hadoop.fs.Path getWorkFilePath(org.apache.hadoop.mapreduce.TaskAttemptContext context,
                                                 java.lang.String outputLoc)
                                          throws java.io.IOException
Default implementation assumes FileOutputFormat. Storage drivers wrapping other OutputFormats should override this method.
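
For example, a driver wrapping a non-file OutputFormat might derive one work file per task attempt under the given location. The part-file naming below is purely illustrative; org.apache.hadoop.fs.Path and org.apache.hadoop.mapreduce.TaskAttemptContext imports are assumed:

@Override
public Path getWorkFilePath(TaskAttemptContext context, String outputLoc)
        throws IOException {
    // One file per task, named after the numeric task id (illustrative naming).
    return new Path(outputLoc, "part-" + context.getTaskAttemptID().getTaskID().getId());
}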

Parameters:
context - the task attempt context
outputLoc - the output location
Returns:
the work file path for the task
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setupOutputCommitterJob

public void setupOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                             throws java.io.IOException,
                                    java.lang.InterruptedException
Implementation that calls the underlying output committer's setupJob; used in lieu of the underlying committer's setupJob when using dynamic partitioning. The default implementation should be overridden by storage drivers that do not use FileOutputCommitter. This method exists so that a storage driver implementor can override the underlying OutputCommitter's setupJob to make it idempotent, allowing it to be called multiple times in a job. It must be written so that individual tasks can call it multiple times without stepping on each other's toes.
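
One way to sketch such an idempotent override, assuming a FileOutputFormat-style driver, the "mapred.output.dir" key from the earlier example, and an org.apache.hadoop.fs.FileSystem import; guarding on the output directory keeps repeated calls from concurrent tasks harmless:

@Override
public void setupOutputCommitterJob(TaskAttemptContext context)
        throws IOException, InterruptedException {
    Path outDir = new Path(context.getConfiguration().get("mapred.output.dir"));
    FileSystem fs = outDir.getFileSystem(context.getConfiguration());
    if (!fs.exists(outDir)) {
        // FileOutputCommitter's setupJob boils down to mkdirs, which is
        // itself idempotent, so a lost race here is still safe.
        getOutputFormat().getOutputCommitter(context).setupJob(context);
    }
}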

Parameters:
context - the task attempt context
Throws:
java.lang.InterruptedException
java.io.IOException

cleanupOutputCommitterJob

public void cleanupOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context)
                               throws java.io.IOException,
                                      java.lang.InterruptedException
Implementation that calls the underlying output committer's cleanupJob; used in lieu of the underlying committer's cleanupJob when using dynamic partitioning. It must be safe to call after multiple underlying output committers have written to task directories inside it. While the base MapReduce cleanupJob would normally suffice, this method is provided so that drivers that implement setupOutputCommitterJob can clean up properly.

Parameters:
context - the task attempt context
Throws:
java.io.IOException
java.lang.InterruptedException

abortOutputCommitterJob

public void abortOutputCommitterJob(org.apache.hadoop.mapreduce.TaskAttemptContext context,
                                    org.apache.hadoop.mapreduce.JobStatus.State state)
                             throws java.io.IOException,
                                    java.lang.InterruptedException
Implementation that calls the underlying output committer's abortJob; used in lieu of the underlying committer's abortJob when using dynamic partitioning. It must be safe to call after multiple underlying output committers have written to task directories inside it. While the base MapReduce abortJob would normally suffice, this method is provided so that drivers that implement setupOutputCommitterJob can abort properly.

Parameters:
context - the task attempt context
state - the job state at the time of the abort
Throws:
java.io.IOException
java.lang.InterruptedException