org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class PigInputFormat
java.lang.Object
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Tuple>, org.apache.hadoop.mapred.JobConfigurable
public class PigInputFormat
- extends Object
- implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Tuple>, org.apache.hadoop.mapred.JobConfigurable
Field Summary
- static org.apache.commons.logging.Log LOG
- static org.apache.hadoop.mapred.JobConf sJob
Method Summary
- void configure(org.apache.hadoop.mapred.JobConf conf)
- static SliceWrapper getActiveSplit()
- org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
- org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
  Creates one input split per input, sliced per DFS block of the input file.
- protected boolean isSplitable(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path filename)
  Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be.
- protected org.apache.hadoop.fs.Path[] listPaths(org.apache.hadoop.mapred.JobConf job)
  List input directories.
- void validateInput(org.apache.hadoop.mapred.JobConf job)
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final org.apache.commons.logging.Log LOG
sJob
public static org.apache.hadoop.mapred.JobConf sJob
PigInputFormat
public PigInputFormat()
isSplitable
protected boolean isSplitable(org.apache.hadoop.fs.FileSystem fs,
org.apache.hadoop.fs.Path filename)
- Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
- Parameters:
  fs - the file system that the file is on
  filename - the file name to check
- Returns:
  - is this file splitable?
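The compression rule described above can be sketched in plain Java, without Hadoop on the classpath. The class name, helper name, and suffix list below are illustrative assumptions; the real FileInputFormat consults the configured compression codecs rather than file-name extensions.

```java
import java.util.Arrays;
import java.util.List;

public class SplitCheck {
    // Hypothetical suffix list standing in for Hadoop's codec lookup.
    private static final List<String> STREAM_COMPRESSED_SUFFIXES =
            Arrays.asList(".gz", ".bz2");

    // Mirrors the isSplitable contract: usually true, but false for
    // stream-compressed files so a single Mapper reads the whole file.
    public static boolean isSplitable(String filename) {
        for (String suffix : STREAM_COMPRESSED_SUFFIXES) {
            if (filename.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }
}
```

Returning false here is what forces "Mappers process entire files": a non-splitable file is handed to one task regardless of how many DFS blocks it spans.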
listPaths
protected org.apache.hadoop.fs.Path[] listPaths(org.apache.hadoop.mapred.JobConf job)
throws IOException
- List input directories. Subclasses may override to, e.g., select only
files matching a regular expression.
- Parameters:
job
- the job to list input paths for
- Returns:
- array of Path objects
- Throws:
IOException
- if there are zero input items.
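A subclass override of the kind described, selecting only files that match a regular expression, might look like the following plain-Java sketch. The class and method names are assumptions, and for brevity the sketch throws an unchecked UncheckedIOException where the real listPaths declares IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PathFilterSketch {
    // Keep only paths matching the pattern; fail if nothing survives,
    // mirroring listPaths' "IOException if zero items" contract.
    public static List<String> filterPaths(List<String> paths, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (p.matcher(path).matches()) {
                kept.add(path);
            }
        }
        if (kept.isEmpty()) {
            throw new UncheckedIOException(
                    new IOException("No input paths matched " + regex));
        }
        return kept;
    }
}
```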
validateInput
public void validateInput(org.apache.hadoop.mapred.JobConf job)
throws IOException
- Specified by:
validateInput
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Tuple>
- Throws:
IOException
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job,
int numSplits)
throws IOException
- Creates one input split per input, sliced per DFS block of the input file. Configures each PigSlice and returns the list of PigSlices as an array.
- Specified by:
getSplits
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Tuple>
- Throws:
IOException
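The per-block slicing described above amounts to a ceiling division of file length by DFS block size. The following arithmetic sketch (class and method names are assumptions, not Pig API) shows how many slices a single input file of a given length would yield:

```java
public class SliceMath {
    // One slice per DFS block: ceil(fileLength / blockSize),
    // computed with integer arithmetic to avoid floating point.
    // Treating an empty file as one (empty) slice is an assumption
    // of this sketch, not documented Pig behavior.
    public static long slicesForFile(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 1;
        }
        return (fileLength + blockSize - 1) / blockSize;
    }
}
```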
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
org.apache.hadoop.mapred.JobConf job,
org.apache.hadoop.mapred.Reporter reporter)
throws IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,Tuple>
- Throws:
IOException
configure
public void configure(org.apache.hadoop.mapred.JobConf conf)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
getActiveSplit
public static SliceWrapper getActiveSplit()
Copyright © ${year} The Apache Software Foundation