java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<K,V>
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable>
      org.apache.pig.piggybank.storage.hiverc.HiveRCInputFormat
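public class HiveRCInputFormat
extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable>

HiveRCInputFormat is used by HiveColumnarLoader as its InputFormat.
Reasons for implementing a new InputFormat subclass: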
Constructor Summary
Constructor | Description |
---|---|
HiveRCInputFormat() | No date partitioning is applied. |
HiveRCInputFormat(String dateRange) | Date partitioning will be applied to the input path. The path must be partitioned as input-path/daydate=yyyy-MM-dd. |
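The constructor's date partitioning can be illustrated with a small, self-contained sketch. DateRangeSketch and its accepts method are hypothetical and are not part of piggybank; the actual filtering is done by HiveRCDateSplitter, whose logic may differ. The sketch assumes a dateRange of the form yyyy-MM-dd:yyyy-MM-dd (the format documented for the dateRange parameter), a range that is inclusive at both ends, and a partition directory that appears in the path as daydate=yyyy-MM-dd.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of piggybank: parses a dateRange argument of the
 * form yyyy-MM-dd:yyyy-MM-dd and tests whether a partition path of the form
 * input-path/daydate=yyyy-MM-dd falls inside it (assumed inclusive).
 */
final class DateRangeSketch {
    // Matches the partition directory segment, e.g. ".../daydate=2011-03-15"
    private static final Pattern DAYDATE = Pattern.compile("daydate=(\\d{4}-\\d{2}-\\d{2})");

    final LocalDate start;
    final LocalDate end;

    DateRangeSketch(String dateRange) {
        String[] parts = dateRange.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected yyyy-MM-dd:yyyy-MM-dd but got " + dateRange);
        }
        // LocalDate.parse uses ISO_LOCAL_DATE, i.e. yyyy-MM-dd
        this.start = LocalDate.parse(parts[0]);
        this.end = LocalDate.parse(parts[1]);
        // The leftmost date is the start of the range
        if (start.isAfter(end)) {
            throw new IllegalArgumentException("Range start must not be after range end: " + dateRange);
        }
    }

    /** Returns true if the path has a daydate partition inside [start, end]. */
    boolean accepts(String path) {
        Matcher m = DAYDATE.matcher(path);
        if (!m.find()) {
            return false; // not a daydate partition
        }
        LocalDate day = LocalDate.parse(m.group(1));
        return !day.isBefore(start) && !day.isAfter(end);
    }

    public static void main(String[] args) {
        DateRangeSketch range = new DateRangeSketch("2011-03-01:2011-03-31");
        System.out.println(range.accepts("/data/events/daydate=2011-03-15")); // true
        System.out.println(range.accepts("/data/events/daydate=2011-04-01")); // false
    }
}
```

Parsing with java.time rejects impossible dates such as 2011-13-01 with a DateTimeParseException, which seems preferable to silently matching no partitions.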
Method Summary
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx): Initialises an instance of HiveRCRecordReader. |
protected long | getFormatMinSplitSize(): The input split size should never be smaller than RCFile.SYNC_INTERVAL. |
protected List<org.apache.hadoop.fs.FileStatus> | listStatus(org.apache.hadoop.mapreduce.JobContext ctx): Called by FileInputFormat to find the input paths for which splits should be calculated. If applyDateRanges is true, HiveRCDateSplitter is used to filter the input files; otherwise, the default FileInputFormat.listStatus implementation is used. |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
---|
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HiveRCInputFormat()
public HiveRCInputFormat(String dateRange)
Parameters:
dateRange - must have the format yyyy-MM-dd:yyyy-MM-dd, with the leftmost date being the start of the range.

Method Detail |
---|
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext ctx) throws IOException, InterruptedException
Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable>
Throws:
IOException
InterruptedException
protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext ctx) throws IOException
Overrides:
listStatus in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable>
Throws:
IOException
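The listStatus branching can be sketched without Hadoop types. ListStatusSketch is hypothetical: it uses plain path strings in place of FileStatus objects and a Predicate in place of HiveRCDateSplitter. It also assumes that applyDateRanges is true exactly when a date filter was supplied (that is, when the HiveRCInputFormat(String dateRange) constructor was used) and that the filter is applied to the files the default listing would return; the real class may organise this differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the listStatus decision: no date range means the
 * default listing is returned unchanged; otherwise each file is filtered.
 */
final class ListStatusSketch {
    private final boolean applyDateRanges;
    private final Predicate<String> dateFilter; // stands in for HiveRCDateSplitter

    ListStatusSketch(Predicate<String> dateFilter) {
        // Assumption: date ranges apply only when a filter was configured
        this.applyDateRanges = dateFilter != null;
        this.dateFilter = dateFilter;
    }

    List<String> listStatus(List<String> defaultListing) {
        if (!applyDateRanges) {
            return defaultListing; // behave like the default FileInputFormat listing
        }
        List<String> kept = new ArrayList<>();
        for (String path : defaultListing) {
            if (dateFilter.test(path)) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> all = List.of("/in/daydate=2011-03-01/part-0", "/in/daydate=2011-05-01/part-0");
        ListStatusSketch marchOnly = new ListStatusSketch(p -> p.contains("daydate=2011-03"));
        System.out.println(marchOnly.listStatus(all)); // [/in/daydate=2011-03-01/part-0]
    }
}
```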
protected long getFormatMinSplitSize()
Overrides:
getFormatMinSplitSize in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,BytesRefArrayWritable>
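Overriding getFormatMinSplitSize() matters because of how FileInputFormat sizes splits: getSplits takes the larger of getFormatMinSplitSize() and the job's configured minimum split size, and computeSplitSize returns max(minSize, min(maxSize, blockSize)). The sketch below reproduces that arithmetic in plain Java. SplitSizeSketch is hypothetical, and SYNC_INTERVAL_ASSUMED = 2000 is an assumed stand-in for RCFile.SYNC_INTERVAL, not a value taken from this page.

```java
/**
 * Hypothetical sketch of FileInputFormat's split-size arithmetic, showing how a
 * format-level minimum (here an assumed RCFile sync interval) puts a floor under
 * every split regardless of the job's maximum split size.
 */
final class SplitSizeSketch {
    // Assumed value standing in for RCFile.SYNC_INTERVAL
    static final long SYNC_INTERVAL_ASSUMED = 2000L;

    // Same formula as FileInputFormat.computeSplitSize(blockSize, minSize, maxSize)
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // getSplits uses the larger of the format's floor and the job-configured minimum
    static long effectiveMinSize(long formatMinSplitSize, long jobMinSplitSize) {
        return Math.max(formatMinSplitSize, jobMinSplitSize);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;
        // A job that configures a tiny maximum split size of 512 bytes
        long minSize = effectiveMinSize(SYNC_INTERVAL_ASSUMED, 1L);
        long split = computeSplitSize(blockSize, minSize, 512L);
        System.out.println(split); // 2000: the format's floor wins
    }
}
```

With the default FileInputFormat floor of 1 byte, the same job would get 512-byte splits; the override is what lifts them to the sync interval.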