org.apache.hadoop.vertica
Class VerticaInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,VerticaRecord>
      extended by org.apache.hadoop.vertica.VerticaInputFormat

public class VerticaInputFormat
extends InputFormat<org.apache.hadoop.io.LongWritable,VerticaRecord>

Input formatter that returns the results of a query executed against Vertica. The key is a record number within the result set of each mapper. The value is a VerticaRecord, which provides an interface similar to a JDBC ResultSet for retrieving values.
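A minimal job-configuration sketch (the table and column names are hypothetical, and the usual Vertica connection properties are assumed to be set on the Configuration already):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.vertica.VerticaInputFormat;

public class VerticaJobSetup {
    public static Job buildJob(Configuration conf) throws IOException {
        // Pre-Hadoop-2.x Job constructor, matching the era of this API.
        Job job = new Job(conf, "vertica-input-example");
        job.setInputFormatClass(VerticaInputFormat.class);
        // Parameterized input query plus a query that supplies its "?" values;
        // example_table is a hypothetical table name.
        VerticaInputFormat.setInput(job,
            "SELECT * FROM example_table WHERE key = ?",
            "SELECT DISTINCT key FROM example_table");
        return job;
    }
}
```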


Constructor Summary
VerticaInputFormat()
           
 
Method Summary
 RecordReader<org.apache.hadoop.io.LongWritable,VerticaRecord> createRecordReader(InputSplit split, TaskAttemptContext context)
          Create a record reader for a given split.
 List<InputSplit> getSplits(JobContext context)
          Logically split the set of input files for the job.
static void setInput(Job job, String inputQuery)
          Set the input query for a job.
static void setInput(Job job, String inputQuery, Collection<List<Object>> segmentParams)
          Set the input query and a collection of ordered parameter lists.
static void setInput(Job job, String inputQuery, String... segmentParams)
          Set the input query and any number of comma-delimited strings of literal parameters.
static void setInput(Job job, String inputQuery, String segmentParamsQuery)
          Set a parameterized input query for a job and the query that returns the parameters.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VerticaInputFormat

public VerticaInputFormat()
Method Detail

setInput

public static void setInput(Job job,
                            String inputQuery)
Set the input query for a job

Parameters:
job -
inputQuery - query to run against Vertica

setInput

public static void setInput(Job job,
                            String inputQuery,
                            String segmentParamsQuery)
Set a parameterized input query for a job and the query that returns the parameters.

Parameters:
job -
inputQuery - SQL query that has parameters specified by question marks ("?")
segmentParamsQuery - SQL query that returns parameters for the input query

setInput

public static void setInput(Job job,
                            String inputQuery,
                            String... segmentParams)
                     throws IOException
Set the input query and any number of comma-delimited strings of literal parameters.

Parameters:
job -
inputQuery - SQL query that has parameters specified by question marks ("?")
segmentParams - any number of comma-delimited strings of literal parameters to substitute into the input query
Throws:
IOException
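The comma-delimited form can be pictured as turning each string into one ordered parameter list, mirroring the Collection<List<Object>> overload below. This is a hypothetical illustration of that mapping, not the connector's actual parsing code:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SegmentParams {
    // Hypothetical sketch: each comma-delimited string becomes one
    // ordered list of literal parameters (one list per input split).
    static Collection<List<Object>> parse(String... segmentParams) {
        List<List<Object>> out = new ArrayList<>();
        for (String s : segmentParams) {
            List<Object> params = new ArrayList<>();
            for (String p : s.split(",")) {
                params.add(p.trim());
            }
            out.add(params);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two comma-delimited strings become two parameter lists.
        System.out.println(parse("1,A", "2,B")); // prints [[1, A], [2, B]]
    }
}
```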

setInput

public static void setInput(Job job,
                            String inputQuery,
                            Collection<List<Object>> segmentParams)
                     throws IOException
Set the input query and a collection of ordered parameter lists.

Parameters:
job -
inputQuery - SQL query that has parameters specified by question marks ("?")
segmentParams - collection of ordered lists to substitute into the input query
Throws:
IOException
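Conceptually, each ordered parameter list yields one concrete query, with the list's values substituted for the "?" placeholders in order. A hypothetical sketch of that expansion (not the connector's actual implementation, which runs the queries against Vertica):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class QueryExpand {
    // Hypothetical sketch: expand a "?"-parameterized query into one
    // concrete query per ordered parameter list.
    static List<String> expand(String inputQuery, Collection<List<Object>> segmentParams) {
        List<String> queries = new ArrayList<>();
        for (List<Object> params : segmentParams) {
            String q = inputQuery;
            for (Object p : params) {
                // Replace the next "?" placeholder with the literal value.
                q = q.replaceFirst("\\?", p.toString());
            }
            queries.add(q);
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> qs = expand("SELECT a FROM t WHERE n = ?",
            Arrays.asList(Arrays.asList((Object) 1), Arrays.asList((Object) 2)));
        for (String q : qs) {
            System.out.println(q);
        }
    }
}
```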

createRecordReader

public RecordReader<org.apache.hadoop.io.LongWritable,VerticaRecord> createRecordReader(InputSplit split,
                                                                                        TaskAttemptContext context)
                                                                                 throws IOException
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.

Specified by:
createRecordReader in class InputFormat<org.apache.hadoop.io.LongWritable,VerticaRecord>
Parameters:
split - the split to be read
context - the information about the task
Returns:
a new record reader
Throws:
IOException

getSplits

public List<InputSplit> getSplits(JobContext context)
                           throws IOException
Logically split the set of input files for the job.

Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.

Specified by:
getSplits in class InputFormat<org.apache.hadoop.io.LongWritable,VerticaRecord>
Parameters:
context - job configuration.
Returns:
a list of InputSplits for the job.
Throws:
IOException


Copyright © 2009 The Apache Software Foundation