org.apache.hcatalog.mapreduce
Class HCatInputStorageDriver

java.lang.Object
  extended by org.apache.hcatalog.mapreduce.HCatInputStorageDriver
Direct Known Subclasses:
LoadFuncBasedInputDriver, RCFileInputDriver

public abstract class HCatInputStorageDriver
extends java.lang.Object

The abstract class to be implemented by underlying storage drivers to enable data access from HCat through HCatInputFormat.
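
A storage driver pairs an underlying Hadoop InputFormat with a conversion of its key/value pairs into HCatRecord. As an orientation sketch only, a hypothetical driver over plain text files might look like the skeleton below (the class name TextLineInputDriver and its fields are illustrative, not part of HCatalog); the method sketches later on this page fill in the bodies.

import java.io.IOException;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputStorageDriver;

// Hypothetical driver over newline-delimited text files.
public class TextLineInputDriver extends HCatInputStorageDriver {

    private HCatSchema originalSchema;       // set via setOriginalSchema
    private HCatSchema outputSchema;         // set via setOutputSchema
    private Map<String, String> partValues;  // set via setPartitionValues

    @Override
    public InputFormat<? extends WritableComparable, ? extends Writable>
            getInputFormat(Properties hcatProperties) {
        return new TextInputFormat();  // LongWritable keys, Text values
    }

    @Override
    public HCatRecord convertToHCatRecord(WritableComparable baseKey, Writable baseValue)
            throws IOException {
        return null;  // see the convertToHCatRecord sketch below
    }

    @Override
    public void setOriginalSchema(JobContext jobContext, HCatSchema hcatSchema)
            throws IOException {
        this.originalSchema = hcatSchema;  // or validate it, see below
    }

    @Override
    public void setOutputSchema(JobContext jobContext, HCatSchema hcatSchema)
            throws IOException {
        this.outputSchema = hcatSchema;
    }

    @Override
    public void setPartitionValues(JobContext jobContext,
                                   Map<String, String> partitionValues)
            throws IOException {
        this.partValues = partitionValues;
    }
}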


Constructor Summary
HCatInputStorageDriver()
           
 
Method Summary
abstract  HCatRecord convertToHCatRecord(org.apache.hadoop.io.WritableComparable baseKey, org.apache.hadoop.io.Writable baseValue)
          Converts the underlying key/value pair into an HCatRecord, which HCatInputFormat then converts to the required value type.
abstract  org.apache.hadoop.mapreduce.InputFormat<? extends org.apache.hadoop.io.WritableComparable,? extends org.apache.hadoop.io.Writable> getInputFormat(java.util.Properties hcatProperties)
          Returns the InputFormat to use with this Storage Driver.
 void initialize(org.apache.hadoop.mapreduce.JobContext context, java.util.Properties storageDriverArgs)
           
 void setInputPath(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String location)
          Set the data location for the input.
abstract  void setOriginalSchema(org.apache.hadoop.mapreduce.JobContext jobContext, HCatSchema hcatSchema)
          Set the schema of the data as originally published in HCat.
abstract  void setOutputSchema(org.apache.hadoop.mapreduce.JobContext jobContext, HCatSchema hcatSchema)
          Set the consolidated schema for the HCatRecord data returned by the storage driver.
abstract  void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext, java.util.Map<java.lang.String,java.lang.String> partitionValues)
          Sets the partition key values for the current partition.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HCatInputStorageDriver

public HCatInputStorageDriver()
Method Detail

initialize

public void initialize(org.apache.hadoop.mapreduce.JobContext context,
                       java.util.Properties storageDriverArgs)
                throws java.io.IOException
Throws:
java.io.IOException

getInputFormat

public abstract org.apache.hadoop.mapreduce.InputFormat<? extends org.apache.hadoop.io.WritableComparable,? extends org.apache.hadoop.io.Writable> getInputFormat(java.util.Properties hcatProperties)
Returns the InputFormat to use with this Storage Driver.

Parameters:
hcatProperties - the properties containing parameters required for initialization of InputFormat
Returns:
the InputFormat instance
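
Continuing the hypothetical TextLineInputDriver sketched above, a driver over plain text files can simply hand back Hadoop's TextInputFormat; a driver with a configurable underlying format would first pull its settings out of hcatProperties.

@Override
public InputFormat<? extends WritableComparable, ? extends Writable>
        getInputFormat(Properties hcatProperties) {
    // Sketch: hcatProperties is unused here; a configurable format
    // would read its initialization parameters from it.
    return new TextInputFormat();
}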

convertToHCatRecord

public abstract HCatRecord convertToHCatRecord(org.apache.hadoop.io.WritableComparable baseKey,
                                               org.apache.hadoop.io.Writable baseValue)
                                        throws java.io.IOException
Converts the underlying key/value pair into an HCatRecord, which HCatInputFormat then converts to the required value type. Implementers of a storage driver should override this method to convert their value type to HCatRecord. A default implementation is provided for storage drivers built on top of an underlying InputFormat that already uses HCatRecord as its tuple type.

Parameters:
baseKey - the underlying key to convert
baseValue - the underlying value to convert to HCatRecord
Returns:
the converted HCatRecord
Throws:
java.io.IOException
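
For the hypothetical TextLineInputDriver above, and assuming comma-delimited lines with string-typed columns only, the conversion might split each Text line into fields and wrap them in a DefaultHCatRecord (additional imports: java.util.ArrayList, java.util.List, org.apache.hcatalog.data.DefaultHCatRecord):

@Override
public HCatRecord convertToHCatRecord(WritableComparable baseKey, Writable baseValue)
        throws IOException {
    // Sketch only: treats every column as a string; a real driver
    // must coerce each field to the type declared in the schema.
    String[] parts = baseValue.toString().split(",");
    List<Object> fields = new ArrayList<Object>(parts.length);
    for (String part : parts) {
        fields.add(part);
    }
    return new DefaultHCatRecord(fields);
}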

setInputPath

public void setInputPath(org.apache.hadoop.mapreduce.JobContext jobContext,
                         java.lang.String location)
                  throws java.io.IOException
Set the data location for the input. The default implementation works for FileInputFormat-based input formats; override this method for other input formats.

Parameters:
jobContext - the job context object
location - the data location
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
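
A driver wrapping a source that is not file-based would override this method; as a sketch, it might record the location under a driver-specific configuration key (the key name below is hypothetical):

@Override
public void setInputPath(JobContext jobContext, String location) throws IOException {
    // Hypothetical override for a non-file-backed format: stash the
    // location where the underlying InputFormat will look for it.
    jobContext.getConfiguration().set("example.driver.input.location", location);
}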

setOriginalSchema

public abstract void setOriginalSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                                       HCatSchema hcatSchema)
                                throws java.io.IOException
Set the schema of the data as originally published in HCat. The storage driver might validate that this matches the schema it has (as Zebra does), or it might use it to create an HCatRecord matching the output schema.

Parameters:
jobContext - the job context object
hcatSchema - the schema published in HCat for this data
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
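
As a sketch of the validating case for the hypothetical TextLineInputDriver (readSchemaFromStorage is a hypothetical helper; only the column count is checked here):

@Override
public void setOriginalSchema(JobContext jobContext, HCatSchema hcatSchema)
        throws IOException {
    // Sketch: a self-describing format can cross-check the published
    // schema against the schema stored alongside the data.
    HCatSchema onDisk = readSchemaFromStorage(jobContext);  // hypothetical helper
    if (onDisk.getFields().size() != hcatSchema.getFields().size()) {
        throw new IOException("schema published in HCat does not match the stored data");
    }
    this.originalSchema = hcatSchema;
}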

setOutputSchema

public abstract void setOutputSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                                     HCatSchema hcatSchema)
                              throws java.io.IOException
Set the consolidated schema for the HCatRecord data returned by the storage driver. All tuples returned by the RecordReader should have this schema. Nulls should be inserted for columns not present in the data.

Parameters:
jobContext - the job context object
hcatSchema - the schema to use as the consolidated schema
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
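
To honor the null-insertion rule, the hypothetical driver's convertToHCatRecord can size each record to the output schema and leave missing columns as null, along these lines:

// Inside convertToHCatRecord, once setOutputSchema has been called:
int width = outputSchema.getFields().size();
List<Object> fields = new ArrayList<Object>(width);
for (int i = 0; i < width; i++) {
    fields.add(null);  // null for columns absent from the data
}
// ... copy each column actually present in the data into its
// position in fields, then:
return new DefaultHCatRecord(fields);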

setPartitionValues

public abstract void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext,
                                        java.util.Map<java.lang.String,java.lang.String> partitionValues)
                                 throws java.io.IOException
Sets the partition key values for the current partition. These are passed to the storage driver so that it can add the partition key values to the output HCatRecord when they are not present on disk.

Parameters:
jobContext - the job context object
partitionValues - a map with the partition key name as key and the partition key value as value
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
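
As a final sketch for the hypothetical TextLineInputDriver, the driver can stash the map and, when emitting each record, fill in partition columns that are absent from the files (the HCatSchema.getPosition lookup against the output schema is an assumption of this sketch):

@Override
public void setPartitionValues(JobContext jobContext,
                               Map<String, String> partitionValues)
        throws IOException {
    this.partValues = partitionValues;
}

// Later, inside convertToHCatRecord (sketch): fill partition columns
// that are not stored on disk from the stashed map.
for (Map.Entry<String, String> e : partValues.entrySet()) {
    Integer pos = outputSchema.getPosition(e.getKey());
    if (pos != null) {
        fields.set(pos, e.getValue());
    }
}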