org.apache.hcatalog.pig.drivers
Class LoadFuncBasedInputDriver

java.lang.Object
  extended by org.apache.hcatalog.mapreduce.HCatInputStorageDriver
      extended by org.apache.hcatalog.pig.drivers.LoadFuncBasedInputDriver
Direct Known Subclasses:
PigStorageInputDriver

public abstract class LoadFuncBasedInputDriver
extends HCatInputStorageDriver

This is a base class that wraps a Pig LoadFunc in an HCatInputStorageDriver. If you already have a LoadFunc, this class together with LoadFuncBasedInputFormat does all the heavy lifting. To write a new HCat input storage driver, simply extend this class and override initialize(). PigStorageInputDriver illustrates this well.
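For illustration, a subclass might look like the following sketch. It is hypothetical and not compilable on its own (it assumes the HCatalog, Pig, and Hadoop jars on the classpath); the class name MyStorageInputDriver and the delimiter argument are invented for this example:

```java
import java.io.IOException;
import java.util.Properties;

import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hcatalog.pig.drivers.LoadFuncBasedInputDriver;

// Hypothetical driver wrapping Pig's built-in PigStorage LoadFunc,
// following the pattern that PigStorageInputDriver illustrates.
public class MyStorageInputDriver extends LoadFuncBasedInputDriver {

    @Override
    public void initialize(JobContext context, Properties storageDriverArgs)
            throws IOException {
        // Assign the protected LoadFunc field before delegating to the
        // base class, which handles the rest of the setup.
        lf = new org.apache.pig.builtin.PigStorage(",");
        super.initialize(context, storageDriverArgs);
    }
}
```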


Field Summary
protected  org.apache.pig.LoadFunc lf
           
 
Constructor Summary
LoadFuncBasedInputDriver()
           
 
Method Summary
 HCatRecord convertToHCatRecord(org.apache.hadoop.io.WritableComparable baseKey, org.apache.hadoop.io.Writable baseValue)
          Converts the underlying key/value pair into the HCatRecord format used by HCatInputFormat.
 org.apache.hadoop.mapreduce.InputFormat<? extends org.apache.hadoop.io.WritableComparable,? extends org.apache.hadoop.io.Writable> getInputFormat(java.util.Properties hcatProperties)
          Returns the InputFormat to use with this Storage Driver.
 void initialize(org.apache.hadoop.mapreduce.JobContext context, java.util.Properties storageDriverArgs)
           
 void setInputPath(org.apache.hadoop.mapreduce.JobContext jobContext, java.lang.String location)
          Set the data location for the input.
 void setOriginalSchema(org.apache.hadoop.mapreduce.JobContext jobContext, HCatSchema hcatSchema)
          Set the schema of the data as originally published in HCat.
 void setOutputSchema(org.apache.hadoop.mapreduce.JobContext jobContext, HCatSchema hcatSchema)
          Set the consolidated schema for the HCatRecord data returned by the storage driver.
 void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext, java.util.Map<java.lang.String,java.lang.String> partitionValues)
          Sets the partition key values for the current partition.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

lf

protected org.apache.pig.LoadFunc lf
Constructor Detail

LoadFuncBasedInputDriver

public LoadFuncBasedInputDriver()
Method Detail

convertToHCatRecord

public HCatRecord convertToHCatRecord(org.apache.hadoop.io.WritableComparable baseKey,
                                      org.apache.hadoop.io.Writable baseValue)
                               throws java.io.IOException
Description copied from class: HCatInputStorageDriver
Converts the underlying key/value pair into the HCatRecord format used by HCatInputFormat to produce the required value type. Implementers of a storage driver should override this method to convert their value type to HCatRecord. A default implementation is provided for storage drivers built on top of an underlying InputFormat that already uses HCatRecord as its tuple type.

Specified by:
convertToHCatRecord in class HCatInputStorageDriver
Parameters:
baseValue - the underlying value to convert to HCatRecord
Throws:
java.io.IOException

getInputFormat

public org.apache.hadoop.mapreduce.InputFormat<? extends org.apache.hadoop.io.WritableComparable,? extends org.apache.hadoop.io.Writable> getInputFormat(java.util.Properties hcatProperties)
Description copied from class: HCatInputStorageDriver
Returns the InputFormat to use with this Storage Driver.

Specified by:
getInputFormat in class HCatInputStorageDriver
Parameters:
hcatProperties - the properties containing parameters required for initialization of InputFormat
Returns:
the InputFormat instance

setOriginalSchema

public void setOriginalSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                              HCatSchema hcatSchema)
                       throws java.io.IOException
Description copied from class: HCatInputStorageDriver
Set the schema of the data as originally published in HCat. The storage driver might validate that this matches the schema it has (as Zebra does), or it might use it to create an HCatRecord matching the output schema.

Specified by:
setOriginalSchema in class HCatInputStorageDriver
Parameters:
jobContext - the job context object
hcatSchema - the schema published in HCat for this data
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setOutputSchema

public void setOutputSchema(org.apache.hadoop.mapreduce.JobContext jobContext,
                            HCatSchema hcatSchema)
                     throws java.io.IOException
Description copied from class: HCatInputStorageDriver
Set the consolidated schema for the HCatRecord data returned by the storage driver. All tuples returned by the RecordReader should have this schema. Nulls should be inserted for columns not present in the data.

Specified by:
setOutputSchema in class HCatInputStorageDriver
Parameters:
jobContext - the job context object
hcatSchema - the schema to use as the consolidated schema
Throws:
java.io.IOException - Signals that an I/O exception has occurred.

setPartitionValues

public void setPartitionValues(org.apache.hadoop.mapreduce.JobContext jobContext,
                               java.util.Map<java.lang.String,java.lang.String> partitionValues)
                        throws java.io.IOException
Description copied from class: HCatInputStorageDriver
Sets the partition key values for the current partition. The storage driver is passed this so that the storage driver can add the partition key values to the output HCatRecord if the partition key values are not present on disk.

Specified by:
setPartitionValues in class HCatInputStorageDriver
Parameters:
jobContext - the job context object
partitionValues - the partition values, a map with the partition key name as key and the corresponding partition value as value
Throws:
java.io.IOException - Signals that an I/O exception has occurred.
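The two contracts described above (nulls inserted for columns absent from the data, and partition key values injected when not stored on disk) can be modeled with a small self-contained sketch. The consolidate helper and the column names here are invented for illustration and are not part of the HCatalog API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordConsolidation {
    // Toy model of the storage driver contract: for each column in the
    // output schema, take the on-disk value if present, else the
    // partition key value, else null.
    static List<Object> consolidate(List<String> outputSchema,
                                    Map<String, Object> onDiskColumns,
                                    Map<String, String> partitionValues) {
        List<Object> record = new ArrayList<Object>();
        for (String col : outputSchema) {
            if (onDiskColumns.containsKey(col)) {
                record.add(onDiskColumns.get(col));
            } else if (partitionValues.containsKey(col)) {
                record.add(partitionValues.get(col));
            } else {
                record.add(null); // column missing everywhere -> null
            }
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> onDisk = new LinkedHashMap<String, Object>();
        onDisk.put("a", 1);
        onDisk.put("b", "x");
        Map<String, String> parts = new LinkedHashMap<String, String>();
        parts.put("datestamp", "20110101"); // partition key not stored on disk
        System.out.println(consolidate(
                Arrays.asList("a", "b", "c", "datestamp"), onDisk, parts));
        // prints [1, x, null, 20110101]
    }
}
```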

initialize

public void initialize(org.apache.hadoop.mapreduce.JobContext context,
                       java.util.Properties storageDriverArgs)
                throws java.io.IOException
Overrides:
initialize in class HCatInputStorageDriver
Throws:
java.io.IOException

setInputPath

public void setInputPath(org.apache.hadoop.mapreduce.JobContext jobContext,
                         java.lang.String location)
                  throws java.io.IOException
Description copied from class: HCatInputStorageDriver
Set the data location for the input. The default implementation works for FileInputFormat-based input formats; override this for other input formats.

Overrides:
setInputPath in class HCatInputStorageDriver
Parameters:
jobContext - the job context object
location - the data location
Throws:
java.io.IOException - Signals that an I/O exception has occurred.