org.apache.hadoop.hive.ql.io
Class HiveFileFormatUtils

java.lang.Object
  extended by org.apache.hadoop.hive.ql.io.HiveFileFormatUtils

public class HiveFileFormatUtils
extends Object

A utility class for various Hive file format tasks. registerOutputFormatSubstitute(Class, Class) and getOutputFormatSubstitute(Class) are provided for backward compatibility: they map older OutputFormat classes to the newly added HiveOutputFormat classes that replace them.


Field Summary
static String READ_COLUMN_IDS_CONF_STR
           
 
Constructor Summary
HiveFileFormatUtils()
           
 
Method Summary
static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs, HiveConf conf, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls, ArrayList<org.apache.hadoop.fs.FileStatus> files)
          Checks whether files are in the same format as the given input format.
static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
          Gets an InputFormatChecker for a file format.
static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent, org.apache.hadoop.mapred.JobConf jc, HiveOutputFormat<?,?> hiveOutputFormat, boolean isCompressed, org.apache.hadoop.fs.Path defaultFinalPath)
          Gets the final output path of a given FileOutputFormat.
static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
          Gets an OutputFormat's substitute HiveOutputFormat.
static ArrayList<Integer> getReadColumnIDs(org.apache.hadoop.conf.Configuration conf)
          Returns a list of column IDs (starting from zero) set in the given conf.
static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format, Class<? extends InputFormatChecker> checker)
          Registers an InputFormatChecker for a given InputFormat.
static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin, Class<? extends HiveOutputFormat> substitute)
          Registers a substitute HiveOutputFormat for an OutputFormat.
static void setFullyReadColumns(org.apache.hadoop.conf.Configuration conf)
          Clears the read column IDs set in the conf so that all columns are read.
static void setReadColumnIDs(org.apache.hadoop.conf.Configuration conf, ArrayList<Integer> ids)
          Sets the IDs (starting from zero) of the columns that RCFile's reader should read.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

READ_COLUMN_IDS_CONF_STR

public static String READ_COLUMN_IDS_CONF_STR

Constructor Detail

HiveFileFormatUtils

public HiveFileFormatUtils()

Method Detail

registerOutputFormatSubstitute

public static void registerOutputFormatSubstitute(Class<? extends org.apache.hadoop.mapred.OutputFormat> origin,
                                                  Class<? extends HiveOutputFormat> substitute)
Registers a substitute HiveOutputFormat for the given OutputFormat class.

Parameters:
origin - the class that needs to be substituted
substitute - the HiveOutputFormat class to use in place of origin

getOutputFormatSubstitute

public static Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin)
Gets the substitute HiveOutputFormat for the given OutputFormat class.
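The substitution mechanism described in the class overview can be pictured as a class-to-class lookup table. The sketch below is runnable without Hadoop or Hive: OutputFormat and HiveOutputFormat are simplified stand-in interfaces named after the real types, and LegacyTextOutputFormat and HiveTextOutputFormat are hypothetical classes. Two behaviors are assumptions, not documented on this page: a class that already is a HiveOutputFormat is returned unchanged, and an unregistered class yields null.

```java
import java.util.HashMap;
import java.util.Map;

class OutputFormatSubstituteSketch {
    // Simplified stand-ins for Hadoop's OutputFormat and Hive's HiveOutputFormat.
    interface OutputFormat {}
    interface HiveOutputFormat extends OutputFormat {}

    // Hypothetical formats used only for this illustration.
    static class LegacyTextOutputFormat implements OutputFormat {}
    static class HiveTextOutputFormat implements HiveOutputFormat {}

    // Maps an older OutputFormat class to the HiveOutputFormat class that replaces it.
    private static final Map<Class<?>, Class<? extends HiveOutputFormat>> SUBSTITUTES = new HashMap<>();

    static synchronized void registerOutputFormatSubstitute(
            Class<? extends OutputFormat> origin,
            Class<? extends HiveOutputFormat> substitute) {
        SUBSTITUTES.put(origin, substitute);
    }

    @SuppressWarnings("unchecked")
    static synchronized Class<? extends HiveOutputFormat> getOutputFormatSubstitute(Class<?> origin) {
        // Assumption: a class that is already a HiveOutputFormat needs no substitute.
        if (HiveOutputFormat.class.isAssignableFrom(origin)) {
            return (Class<? extends HiveOutputFormat>) origin;
        }
        // Assumption: null when nothing has been registered for origin.
        return SUBSTITUTES.get(origin);
    }

    public static void main(String[] args) {
        registerOutputFormatSubstitute(LegacyTextOutputFormat.class, HiveTextOutputFormat.class);
        // prints "HiveTextOutputFormat"
        System.out.println(getOutputFormatSubstitute(LegacyTextOutputFormat.class).getSimpleName());
    }
}
```

Keying the map on Class objects keeps registration cheap and lets callers ask for a substitute without instantiating the older format.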


getOutputFormatFinalPath

public static org.apache.hadoop.fs.Path getOutputFormatFinalPath(org.apache.hadoop.fs.Path parent,
                                                                 org.apache.hadoop.mapred.JobConf jc,
                                                                 HiveOutputFormat<?,?> hiveOutputFormat,
                                                                 boolean isCompressed,
                                                                 org.apache.hadoop.fs.Path defaultFinalPath)
                                                          throws IOException
Gets the final output path of the given FileOutputFormat.

Parameters:
parent - parent directory of the expected final output path
jc - job configuration
Throws:
IOException
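This page does not spell out how the final path is chosen, only that it depends on the parent directory, the job configuration, the output format, the compression flag and a default path. The following is a hypothetical sketch of one plausible rule, using java.nio.file.Path instead of Hadoop's Path: certain formats get a task-named file under parent, and all others keep defaultFinalPath. The boolean isTaskNamedFormat (standing in for a check on the HiveOutputFormat), the taskId argument (standing in for information read from the JobConf) and the ".gz" suffix are all assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FinalPathSketch {
    // Hypothetical simplification of getOutputFormatFinalPath:
    //  - isTaskNamedFormat replaces inspecting the HiveOutputFormat argument
    //  - taskId replaces information Hive would obtain from the JobConf
    static Path getOutputFormatFinalPath(Path parent, String taskId, boolean isTaskNamedFormat,
                                         boolean isCompressed, Path defaultFinalPath) {
        if (isTaskNamedFormat) {
            // Assumed naming scheme: the task id, plus ".gz" when the output is compressed.
            return parent.resolve(taskId + (isCompressed ? ".gz" : ""));
        }
        // Any other format keeps the caller-supplied default path.
        return defaultFinalPath;
    }

    public static void main(String[] args) {
        Path parent = Paths.get("/tmp/out");
        Path fallback = Paths.get("/tmp/out/part-00000");
        System.out.println(getOutputFormatFinalPath(parent, "task_0001", true, true, fallback));
        System.out.println(getOutputFormatFinalPath(parent, "task_0001", false, true, fallback));
    }
}
```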

registerInputFormatChecker

public static void registerInputFormatChecker(Class<? extends org.apache.hadoop.mapred.InputFormat> format,
                                              Class<? extends InputFormatChecker> checker)
Registers an InputFormatChecker for the given InputFormat.

Parameters:
format - the InputFormat class to register the checker for
checker - the InputFormatChecker class to use for format

getInputFormatChecker

public static Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat)
Gets the InputFormatChecker for the given input format.


checkInputFormat

public static boolean checkInputFormat(org.apache.hadoop.fs.FileSystem fs,
                                       HiveConf conf,
                                       Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormatCls,
                                       ArrayList<org.apache.hadoop.fs.FileStatus> files)
                                throws HiveException
Checks whether files are in the same format as the given input format.

Throws:
HiveException
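registerInputFormatChecker, getInputFormatChecker and checkInputFormat fit together as "look up the checker registered for a format, then ask it to validate the files". The sketch below is a minimal, Hadoop-free illustration under stated assumptions: the InputFormatChecker interface here is a simplified stand-in that takes file names (the real one works on Hadoop and Hive objects), ToySeqInputFormat and ToySeqChecker are hypothetical, and accepting the files when no checker is registered is an assumed default.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CheckInputFormatSketch {
    // Simplified stand-in for Hive's InputFormatChecker; it takes file names
    // instead of FileSystem/HiveConf/FileStatus arguments.
    interface InputFormatChecker {
        boolean validateInput(List<String> fileNames);
    }

    // Hypothetical format and checker used only for illustration.
    static class ToySeqInputFormat {}
    static class ToySeqChecker implements InputFormatChecker {
        // Toy rule: a file "matches" if its name ends in ".seq".
        public boolean validateInput(List<String> fileNames) {
            return fileNames.stream().allMatch(f -> f.endsWith(".seq"));
        }
    }

    private static final Map<Class<?>, Class<? extends InputFormatChecker>> CHECKERS = new HashMap<>();

    static synchronized void registerInputFormatChecker(Class<?> format,
                                                        Class<? extends InputFormatChecker> checker) {
        CHECKERS.put(format, checker);
    }

    static synchronized Class<? extends InputFormatChecker> getInputFormatChecker(Class<?> inputFormat) {
        return CHECKERS.get(inputFormat); // null if nothing was registered
    }

    static boolean checkInputFormat(Class<?> inputFormatCls, List<String> fileNames) {
        Class<? extends InputFormatChecker> checkerCls = getInputFormatChecker(inputFormatCls);
        if (checkerCls == null) {
            // Assumption: without a registered checker, the files are accepted.
            return true;
        }
        try {
            // Checkers are registered as classes, so an instance is created on demand.
            InputFormatChecker checker = checkerCls.getDeclaredConstructor().newInstance();
            return checker.validateInput(fileNames);
        } catch (ReflectiveOperationException e) {
            // The real method declares HiveException; this sketch uses an unchecked exception.
            throw new IllegalStateException("Cannot instantiate " + checkerCls.getName(), e);
        }
    }

    public static void main(String[] args) {
        registerInputFormatChecker(ToySeqInputFormat.class, ToySeqChecker.class);
        System.out.println(checkInputFormat(ToySeqInputFormat.class, Arrays.asList("a.seq", "b.seq"))); // true
        System.out.println(checkInputFormat(ToySeqInputFormat.class, Arrays.asList("a.seq", "b.txt"))); // false
    }
}
```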

setReadColumnIDs

public static void setReadColumnIDs(org.apache.hadoop.conf.Configuration conf,
                                    ArrayList<Integer> ids)
Sets the IDs (starting from zero) of the columns that RCFile's reader should read. Once a column is included in the list, the reader will not skip its value.


getReadColumnIDs

public static ArrayList<Integer> getReadColumnIDs(org.apache.hadoop.conf.Configuration conf)
Returns the list of column IDs (starting from zero) set in the given conf.


setFullyReadColumns

public static void setFullyReadColumns(org.apache.hadoop.conf.Configuration conf)
Clears the read column IDs set in the conf so that all columns are read.
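The three column-projection methods can be pictured with a plain Map standing in for Hadoop's Configuration. This is a minimal sketch under assumptions not stated on this page: the IDs are stored as a comma-separated string, an empty value means "read all columns", and the key "example.read.column.ids" is a placeholder, since the actual value of READ_COLUMN_IDS_CONF_STR is not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ReadColumnIdsSketch {
    // Placeholder key; the real READ_COLUMN_IDS_CONF_STR value may differ.
    static final String READ_COLUMN_IDS_CONF_STR = "example.read.column.ids";

    // Assumption: the IDs are stored as a comma-separated string, replacing any previous value.
    static void setReadColumnIDs(Map<String, String> conf, List<Integer> ids) {
        conf.put(READ_COLUMN_IDS_CONF_STR,
                 ids.stream().map(String::valueOf).collect(Collectors.joining(",")));
    }

    // Parses the stored string back into zero-based column IDs.
    static ArrayList<Integer> getReadColumnIDs(Map<String, String> conf) {
        ArrayList<Integer> result = new ArrayList<>();
        String value = conf.getOrDefault(READ_COLUMN_IDS_CONF_STR, "");
        if (value.isEmpty()) {
            return result; // nothing set: no projection, so every column is read
        }
        for (String part : value.split(",")) {
            result.add(Integer.parseInt(part.trim()));
        }
        return result;
    }

    // Assumption: an empty value is how "read all columns" is expressed.
    static void setFullyReadColumns(Map<String, String> conf) {
        conf.put(READ_COLUMN_IDS_CONF_STR, "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setReadColumnIDs(conf, Arrays.asList(0, 2, 5));
        System.out.println(getReadColumnIDs(conf)); // [0, 2, 5]
        setFullyReadColumns(conf);
        System.out.println(getReadColumnIDs(conf)); // []
    }
}
```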



Copyright © 2009 The Apache Software Foundation