org.apache.hadoop.mapred
Class InputFormatBase

java.lang.Object
  extended by org.apache.hadoop.mapred.InputFormatBase
All Implemented Interfaces:
InputFormat
Direct Known Subclasses:
SequenceFileInputFormat, TextInputFormat

public abstract class InputFormatBase
extends Object
implements InputFormat

A base class for InputFormat implementations that read their input from files in a FileSystem.


Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
InputFormatBase()
           
 
Method Summary
 boolean[] areValidInputDirectories(FileSystem fileSys, Path[] inputDirs)
          Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when an input directory does not exist.
abstract  RecordReader getRecordReader(FileSystem fs, FileSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for a FileSplit.
 FileSplit[] getSplits(FileSystem fs, JobConf job, int numSplits)
          Splits the files returned by listPaths(FileSystem,JobConf) into FileSplits, dividing any file that is too large to process as a single split.
protected  boolean isSplitable(FileSystem fs, Path filename)
          Is the given file splittable? Usually true, but not if the file is stream-compressed.
protected  File[] listFiles(FileSystem fs, JobConf job)
          Deprecated. Call listPaths(FileSystem,JobConf) instead.
protected  Path[] listPaths(FileSystem fs, JobConf job)
          List input directories.
protected  void setMinSplitSize(long minSplitSize)
          Sets the minimum size, in bytes, of the splits created by getSplits.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG
Constructor Detail

InputFormatBase

public InputFormatBase()
Method Detail

setMinSplitSize

protected void setMinSplitSize(long minSplitSize)
Sets the minimum size, in bytes, of the splits created by getSplits(FileSystem,JobConf,int).

isSplitable

protected boolean isSplitable(FileSystem fs,
                              Path filename)
Is the given file splittable? Usually true, but not if the file is stream-compressed.

Parameters:
fs - the file system that the file is on
filename - the file name to check
Returns:
true if this file can be split
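A stream-compressed file such as a gzip file can only be decoded from its beginning, so a reader cannot start at an arbitrary byte offset inside it. The Hadoop-free sketch below illustrates that decision. It assumes, purely for illustration, that compression is recognized from the file-name suffix; the class `SplitableCheck` and its suffix list are hypothetical and not part of this API, and real subclasses may decide differently.

```java
import java.util.List;

// Hypothetical, Hadoop-free stand-in for isSplitable(FileSystem, Path).
// Assumption: stream compression is detected from the file-name suffix only.
class SplitableCheck {
    // Suffixes this sketch treats as stream-compressed (illustrative only).
    private static final List<String> STREAM_COMPRESSED_SUFFIXES = List.of(".gz", ".deflate");

    // Returns false for names ending in a stream-compressed suffix, true otherwise.
    static boolean isSplitable(String filename) {
        for (String suffix : STREAM_COMPRESSED_SUFFIXES) {
            if (filename.endsWith(suffix)) {
                return false;
            }
        }
        return true;
    }
}
```

Defaulting to true matches the "usually true" wording above: only files known to be stream-compressed are read whole.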

getRecordReader

public abstract RecordReader getRecordReader(FileSystem fs,
                                             FileSplit split,
                                             JobConf job,
                                             Reporter reporter)
                                      throws IOException
Description copied from interface: InputFormat
Construct a RecordReader for a FileSplit.

Specified by:
getRecordReader in interface InputFormat
Parameters:
fs - the FileSystem
split - the FileSplit
job - the job that this split belongs to
reporter - the Reporter used to report progress while reading
Returns:
a RecordReader
Throws:
IOException

listFiles

protected File[] listFiles(FileSystem fs,
                           JobConf job)
                    throws IOException
Deprecated. Call listPaths(FileSystem,JobConf) instead.

Throws:
IOException

listPaths

protected Path[] listPaths(FileSystem fs,
                           JobConf job)
                    throws IOException
List input directories. Subclasses may override to, e.g., select only files matching a regular expression.

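If the property mapred.input.subdir is set, it names a subdirectory that is appended to each input directory specified by the job; if the given file system can list those subdirectories as well, their entries are also added to the returned array of Path.

Parameters:
fs - the file system containing the input directories
job - the job whose input directories are listed
Returns:
array of Path objects, never zero length.
Throws:
IOException - if no input paths are found.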
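The listing contract can be modeled against the local file system with `java.nio.file`. This is a minimal sketch under stated assumptions, not the Hadoop implementation: `InputLister` is hypothetical, the `subdir` argument stands in for the mapred.input.subdir property (null meaning unset), and it follows one literal reading of the description above, listing each input directory and, when `subdir` is set, also the entries of `<inputDir>/<subdir>`. As documented, an empty result is reported as an IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical, Hadoop-free model of listPaths(FileSystem, JobConf).
class InputLister {
    // subdir plays the role of mapred.input.subdir; null means "not set".
    static List<Path> listPaths(List<Path> inputDirs, String subdir) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path dir : inputDirs) {
            addEntries(dir, result);
            if (subdir != null) {
                // Assumed reading: also list <inputDir>/<subdir> if it exists.
                addEntries(dir.resolve(subdir), result);
            }
        }
        if (result.isEmpty()) {
            // Mirrors the documented contract: never return a zero-length result.
            throw new IOException("No input paths found in " + inputDirs);
        }
        return result;
    }

    // Adds every entry of dir in sorted order; silently skips non-directories.
    private static void addEntries(Path dir, List<Path> out) throws IOException {
        if (!Files.isDirectory(dir)) {
            return;
        }
        try (Stream<Path> entries = Files.list(dir)) {
            entries.sorted().forEach(out::add);
        }
    }
}
```

Sorting the entries only makes the sketch deterministic; nothing in the description above requires a particular order.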

areValidInputDirectories

public boolean[] areValidInputDirectories(FileSystem fileSys,
                                          Path[] inputDirs)
                                   throws IOException
Description copied from interface: InputFormat
Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when an input directory does not exist.

Specified by:
areValidInputDirectories in interface InputFormat
Parameters:
fileSys - the file system to check for the directories
inputDirs - the list of input directories
Returns:
an array with one entry per element of inputDirs, true if that directory is valid
Throws:
IOException
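The fail-early check at job submission can be pictured as follows. In this sketch `InputDirValidator` and `requireValid` are hypothetical, `java.nio.file` stands in for FileSystem, and "valid" is assumed to mean "exists and is a directory".

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, Hadoop-free model of areValidInputDirectories.
class InputDirValidator {
    // Assumption: a directory is "valid" if it exists and is a directory.
    static boolean[] areValidInputDirectories(Path[] inputDirs) {
        boolean[] valid = new boolean[inputDirs.length];
        for (int i = 0; i < inputDirs.length; i++) {
            valid[i] = Files.isDirectory(inputDirs[i]);
        }
        return valid;
    }

    // Fails early, naming the first invalid directory in the error message.
    static void requireValid(Path[] inputDirs) {
        boolean[] valid = areValidInputDirectories(inputDirs);
        for (int i = 0; i < valid.length; i++) {
            if (!valid[i]) {
                throw new IllegalArgumentException(
                    "Input directory " + inputDirs[i] + " is missing or not a directory");
            }
        }
    }
}
```

Returning one boolean per directory, rather than a single flag, is what lets the caller name the offending directory in its error message.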

getSplits

public FileSplit[] getSplits(FileSystem fs,
                             JobConf job,
                             int numSplits)
                      throws IOException
Splits the files returned by listPaths(FileSystem,JobConf) into FileSplits, dividing any file that is too large to process as a single split.

Specified by:
getSplits in interface InputFormat
Parameters:
fs - the filesystem containing the files to be split
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
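The splitting step can be approximated with plain arithmetic. This sketch assumes a policy in which the target split size is the total input size divided by numSplits, capped at the file system block size and raised to at least the minimum split size (compare setMinSplitSize). Files for which an isSplitable-style predicate returns false become a single split. `FileRange` and `SplitPlanner` are hypothetical, and the exact rules used by this class may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical byte range within one file; a stand-in for FileSplit.
final class FileRange {
    final String file;
    final long start;
    final long length;

    FileRange(String file, long start, long length) {
        this.file = file;
        this.start = start;
        this.length = length;
    }
}

// Hypothetical, Hadoop-free model of a getSplits-style planner.
class SplitPlanner {
    // Assumed policy: aim for totalSize / numSplits bytes, cap at blockSize,
    // raise to at least minSplitSize, and never return less than 1 byte.
    static long splitSize(long totalSize, int numSplits, long blockSize, long minSplitSize) {
        long goal = totalSize / Math.max(1, numSplits);
        return Math.max(1L, Math.max(minSplitSize, Math.min(goal, blockSize)));
    }

    // fileLengths maps each file name to its length in bytes; the map's
    // iteration order decides the order of the returned ranges.
    static List<FileRange> getSplits(Map<String, Long> fileLengths, int numSplits,
                                     long blockSize, long minSplitSize,
                                     Predicate<String> isSplitable) {
        long total = 0;
        for (long len : fileLengths.values()) {
            total += len;
        }
        long size = splitSize(total, numSplits, blockSize, minSplitSize);

        List<FileRange> splits = new ArrayList<>();
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            String name = e.getKey();
            long length = e.getValue();
            if (!isSplitable.test(name)) {
                // A non-splittable file (e.g. stream-compressed) is read whole.
                splits.add(new FileRange(name, 0, length));
                continue;
            }
            // Cut the file into consecutive ranges of at most `size` bytes.
            for (long start = 0; start < length; start += size) {
                splits.add(new FileRange(name, start, Math.min(size, length - start)));
            }
        }
        return splits;
    }
}
```

For two files of 250 and 50 bytes with numSplits = 3, a large block size, and a minimum of 1, the target size is 100 bytes, so the first file yields three ranges (100, 100, 50) and the second yields one. A plausible motivation for the block-size cap is locality (a split that stays within one block can be read from a single host), while the floor keeps tiny inputs from producing a flood of very small tasks.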


Copyright © 2006 The Apache Software Foundation