org.apache.hadoop.vertica
Class VerticaOutputFormat

java.lang.Object
  extended by org.apache.hadoop.mapreduce.OutputFormat<org.apache.hadoop.io.Text,VerticaRecord>
      extended by org.apache.hadoop.vertica.VerticaOutputFormat

public class VerticaOutputFormat
extends OutputFormat<org.apache.hadoop.io.Text,VerticaRecord>

Output format for loading reducer output into Vertica.


Constructor Summary
VerticaOutputFormat()
           
 
Method Summary
 void checkOutputSpecs(JobContext context)
          Check for validity of the output-specification for the job.
 void checkOutputSpecs(JobContext context, boolean test)
          Check the output specification, optionally in test mode (no database connection is made).
 OutputCommitter getOutputCommitter(TaskAttemptContext context)
          Get the output committer for this output format.
 RecordWriter<org.apache.hadoop.io.Text,VerticaRecord> getRecordWriter(TaskAttemptContext context)
          Get the RecordWriter for the given task.
static VerticaRecord getValue(org.apache.hadoop.conf.Configuration conf)
           
static void optimize(org.apache.hadoop.conf.Configuration conf)
          Optionally called at the end of a job to optimize any newly created and loaded tables.
static void setOutput(Job job, String tableName)
          Set the output table
static void setOutput(Job job, String tableName, boolean dropTable)
          Set the output table and whether to drop it before loading
static void setOutput(Job job, String tableName, boolean dropTable, String... tableDef)
          Set the output table, whether to drop it before loading, and the column definitions used to create it if it doesn't exist
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VerticaOutputFormat

public VerticaOutputFormat()
Method Detail

setOutput

public static void setOutput(Job job,
                             String tableName)
Set the output table

Parameters:
job - the job to configure
tableName - name of the output table

setOutput

public static void setOutput(Job job,
                             String tableName,
                             boolean dropTable)
Set the output table and whether to drop it before loading

Parameters:
job - the job to configure
tableName - name of the output table
dropTable - whether to drop the table before loading

setOutput

public static void setOutput(Job job,
                             String tableName,
                             boolean dropTable,
                             String... tableDef)
Set the output table, whether to drop it before loading, and the column definitions used to create it if it doesn't exist

Parameters:
job - the job to configure
tableName - name of the output table
dropTable - whether to drop the table before loading
tableDef - list of column definitions such as "foo int", "bar varchar(10)"
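
The tableDef argument is a varargs list with one column definition per element. As an illustration only, the sketch below shows how such a list could be joined into a CREATE TABLE statement. TableDefSketch and createTableSql are hypothetical names that are not part of org.apache.hadoop.vertica, and the SQL that VerticaOutputFormat actually generates may differ.

```java
import java.util.StringJoiner;

/** Hypothetical helper for illustration; not part of org.apache.hadoop.vertica. */
final class TableDefSketch {
    private TableDefSketch() {}

    /**
     * Joins column definitions into a CREATE TABLE statement.
     * Assumption: each element of tableDef is one complete column
     * definition such as "foo int" or "bar varchar(10)".
     */
    static String createTableSql(String tableName, String... tableDef) {
        if (tableName == null || tableName.isEmpty()) {
            throw new IllegalArgumentException("tableName must not be empty");
        }
        if (tableDef == null || tableDef.length == 0) {
            throw new IllegalArgumentException("at least one column definition is required");
        }
        // Comma-separate the column definitions and wrap them in parentheses.
        StringJoiner columns = new StringJoiner(", ", "(", ")");
        for (String column : tableDef) {
            columns.add(column.trim());
        }
        return "CREATE TABLE " + tableName + " " + columns;
    }

    public static void main(String[] args) {
        System.out.println(createTableSql("mytable", "foo int", "bar varchar(10)"));
    }
}
```

With the real API, the same column definitions would be passed directly, for example setOutput(job, "mytable", true, "foo int", "bar varchar(10)"), where "mytable" is a placeholder table name.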

checkOutputSpecs

public void checkOutputSpecs(JobContext context)
                      throws IOException
Check for validity of the output-specification for the job.

This validates the output specification for the job when the job is submitted. Typically it checks that the output does not already exist and throws an exception if it does, so that existing output is not overwritten.

Specified by:
checkOutputSpecs in class OutputFormat<org.apache.hadoop.io.Text,VerticaRecord>
Parameters:
context - information about the job
Throws:
IOException - when output should not be attempted

checkOutputSpecs

public void checkOutputSpecs(JobContext context,
                             boolean test)
                      throws IOException
Check the output specification, optionally in test mode (no database connection is made).

Parameters:
context - information about the job
test - true if testing, in which case the database is not contacted
Throws:
IOException

getRecordWriter

public RecordWriter<org.apache.hadoop.io.Text,VerticaRecord> getRecordWriter(TaskAttemptContext context)
                                                                      throws IOException
Get the RecordWriter for the given task.

Specified by:
getRecordWriter in class OutputFormat<org.apache.hadoop.io.Text,VerticaRecord>
Parameters:
context - the information about the current task.
Returns:
a RecordWriter to write the output for the job.
Throws:
IOException

getValue

public static VerticaRecord getValue(org.apache.hadoop.conf.Configuration conf)
                              throws Exception
Throws:
Exception

optimize

public static void optimize(org.apache.hadoop.conf.Configuration conf)
                     throws Exception
Optionally called at the end of a job to optimize any newly created and loaded tables. Useful for new tables with more than 100k records.

Parameters:
conf - the job configuration
Throws:
Exception

getOutputCommitter

public OutputCommitter getOutputCommitter(TaskAttemptContext context)
                                   throws IOException,
                                          InterruptedException
Get the output committer for this output format.

Specified by:
getOutputCommitter in class OutputFormat<org.apache.hadoop.io.Text,VerticaRecord>
Parameters:
context - the task context
Returns:
an output committer
Throws:
IOException
InterruptedException


Copyright © 2009 The Apache Software Foundation