org.apache.hcatalog.mapreduce
Class HCatBaseInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.WritableComparable,HCatRecord>
      extended by org.apache.hcatalog.mapreduce.HCatBaseInputFormat
Direct Known Subclasses:
HCatEximInputFormat, HCatInputFormat

public abstract class HCatBaseInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.WritableComparable,HCatRecord>


Constructor Summary
HCatBaseInputFormat()
           
 
Method Summary
 org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.WritableComparable,HCatRecord> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
          Create the RecordReader for the given InputSplit.
static HCatSchema getOutputSchema(org.apache.hadoop.mapreduce.JobContext context)
          Get the schema for the HCatRecord data returned by HCatInputFormat.
 java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
          Logically split the set of input files for the job.
static HCatSchema getTableSchema(org.apache.hadoop.mapreduce.JobContext context)
          Gets the HCatTable schema for the table specified in the HCatInputFormat.setInput call on the specified job context.
static void setOutputSchema(org.apache.hadoop.mapreduce.Job job, HCatSchema hcatSchema)
          Set the schema for the HCatRecord data returned by HCatInputFormat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HCatBaseInputFormat

public HCatBaseInputFormat()
Method Detail

getOutputSchema

public static HCatSchema getOutputSchema(org.apache.hadoop.mapreduce.JobContext context)
                                  throws java.lang.Exception
Get the schema for the HCatRecord data returned by HCatInputFormat. This is the projection schema set via setOutputSchema if one was set, otherwise the full table schema.

Parameters:
context - the job context
Throws:
java.lang.IllegalArgumentException
java.lang.Exception
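
Since getOutputSchema takes a JobContext, it is typically called from a task-side hook such as Mapper.setup. The following is a minimal sketch; the mapper class, the table column name user_id, and the output key/value types are assumptions for illustration, not part of this API.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hcatalog.data.HCatRecord;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatBaseInputFormat;

// Hypothetical mapper that resolves a column position by name once in
// setup(), using the schema of the records this job will receive.
public class ReadMapper
        extends Mapper<WritableComparable, HCatRecord, Text, Text> {

    private int userCol;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            // Schema of the HCatRecords handed to map(): the projection
            // set via setOutputSchema, or the table schema otherwise.
            HCatSchema schema = HCatBaseInputFormat.getOutputSchema(context);
            userCol = schema.getPosition("user_id"); // assumed column name
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void map(WritableComparable key, HCatRecord value,
                       Context context)
            throws IOException, InterruptedException {
        context.write(new Text(String.valueOf(value.get(userCol))),
                      new Text("seen"));
    }
}
```

Resolving the position once in setup avoids a name lookup on every record.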

setOutputSchema

public static void setOutputSchema(org.apache.hadoop.mapreduce.Job job,
                                   HCatSchema hcatSchema)
                            throws java.lang.Exception
Set the schema for the HCatRecord data returned by HCatInputFormat.

Parameters:
job - the job object
hcatSchema - the schema to use as the consolidated schema
Throws:
java.lang.Exception
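
A driver-side sketch of using setOutputSchema to project a table down to a subset of columns before submitting the job. The database name, table name, and column names are assumptions for illustration, and the exact InputJobInfo.create signature varies between HCatalog releases.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatBaseInputFormat;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class ProjectionSetup {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-projection");
        // Assumed database/table; a null filter reads all partitions.
        HCatInputFormat.setInput(job,
                InputJobInfo.create("default", "web_logs", null));

        // Build a projection containing two of the table's columns, so
        // map() sees narrower HCatRecords than the full table rows.
        HCatSchema tableSchema = HCatBaseInputFormat.getTableSchema(job);
        List<HCatFieldSchema> fields = new ArrayList<HCatFieldSchema>();
        fields.add(tableSchema.get("user_id")); // assumed column names
        fields.add(tableSchema.get("url"));
        HCatBaseInputFormat.setOutputSchema(job, new HCatSchema(fields));
    }
}
```

Note that setInput must already have been called, since the projection is validated against the table schema it records in the job context.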

getSplits

public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
                                                                 throws java.io.IOException,
                                                                        java.lang.InterruptedException
Logically split the set of input files for the job. Returns the underlying InputFormat's splits.

Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.WritableComparable,HCatRecord>
Parameters:
jobContext - the job context object
Returns:
the splits, an HCatInputSplit wrapper over the storage driver InputSplits
Throws:
java.io.IOException
java.lang.InterruptedException

createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.WritableComparable,HCatRecord> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split,
                                                                                                                       org.apache.hadoop.mapreduce.TaskAttemptContext taskContext)
                                                                                                                throws java.io.IOException,
                                                                                                                       java.lang.InterruptedException
Create the RecordReader for the given InputSplit. Returns the underlying RecordReader if the required operations are supported and the schema matches the HCatTable schema. Returns an HCatRecordReader if the operations need to be implemented in HCat.

Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.WritableComparable,HCatRecord>
Parameters:
split - the split
taskContext - the task attempt context
Returns:
the record reader instance, either an HCatRecordReader or the underlying storage driver's RecordReader
Throws:
java.io.IOException
java.lang.InterruptedException

getTableSchema

public static HCatSchema getTableSchema(org.apache.hadoop.mapreduce.JobContext context)
                                 throws java.lang.Exception
Gets the HCatTable schema for the table specified in the HCatInputFormat.setInput call on the specified job context. This information is available only after HCatInputFormat.setInput has been called for a JobContext.

Parameters:
context - the context
Returns:
the table schema
Throws:
java.lang.Exception - if HCatInputFormat.setInput has not been called for the current context
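
A short sketch of inspecting a table's schema from the driver, after the required HCatInputFormat.setInput call. The database and table names are assumptions for illustration, and the exact InputJobInfo.create signature varies between HCatalog releases.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatBaseInputFormat;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class SchemaDump {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "schema-dump");
        // setInput must run first; getTableSchema throws otherwise.
        HCatInputFormat.setInput(job,
                InputJobInfo.create("default", "web_logs", null));

        // Print each column's name and type, one per line.
        HCatSchema schema = HCatBaseInputFormat.getTableSchema(job);
        for (HCatFieldSchema field : schema.getFields()) {
            System.out.println(field.getName() + "\t"
                    + field.getTypeString());
        }
    }
}
```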