org.apache.hadoop.mapred
Interface InputFormat

All Known Implementing Classes:
InputFormatBase, SequenceFileInputFormat, TextInputFormat

public interface InputFormat

An input data format. Input files are stored in a FileSystem. The processing of an input file may be split across multiple machines. Files are processed as sequences of records, which are read through a RecordReader. Files must therefore be split on record boundaries.
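To make the record-boundary requirement concrete, here is a minimal sketch in plain Java (no Hadoop classes) for newline-delimited records. It assumes split offsets are chosen by byte count, so a split may start or end in the middle of a record. It applies one common convention for line-oriented input: a record belongs to the split that contains its first byte. The class LineSplitSketch and the method readRecords are hypothetical names for illustration, not part of this API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineSplitSketch {
    /**
     * Returns the newline-terminated records owned by the byte range [start, start + length).
     * Convention assumed here: a record belongs to the split containing its first byte.
     */
    static List<String> readRecords(byte[] data, int start, int length) {
        List<String> records = new ArrayList<>();
        int end = Math.min(start + length, data.length);
        int pos = start;
        // Starting mid-record: that record belongs to the previous split, so skip past it.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline
        }
        // Read every record that starts before end, even if it runs past end.
        while (pos < end) {
            int recordStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, recordStart, pos - recordStart, StandardCharsets.UTF_8));
            pos++; // step over the newline (or past the end of the data)
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int splitSize = 7; // offsets 7, 14 and 21 fall in the middle of records
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            all.addAll(readRecords(data, start, splitSize));
        }
        System.out.println(all); // [alpha, beta, gamma, delta]
    }
}
```

Because every split applies the same ownership rule, each record is read exactly once, whatever split size is chosen.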


Method Summary
 boolean[] areValidInputDirectories(FileSystem fileSys, Path[] inputDirs)
          Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.
 RecordReader getRecordReader(FileSystem fs, FileSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for a FileSplit.
 FileSplit[] getSplits(FileSystem fs, JobConf job, int numSplits)
          Splits a set of input files.
 

Method Detail

areValidInputDirectories

boolean[] areValidInputDirectories(FileSystem fileSys,
                                   Path[] inputDirs)
                                   throws IOException
Are the input directories valid? This method is used to test the input directories when a job is submitted so that the framework can fail early with a useful error message when the input directory does not exist.

Parameters:
fileSys - the file system to check for the directories
inputDirs - the list of input directories
Returns:
an array with one entry per element of inputDirs, true if that directory is valid
Throws:
IOException
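As a rough illustration of the fail-early check, the sketch below uses the JDK's java.nio.file API on the local disk in place of a Hadoop FileSystem, and assumes that "valid" means "exists and is a directory". The class InputDirCheckSketch and the helper checkOrFail are hypothetical; real implementations may apply other criteria.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class InputDirCheckSketch {
    /** Returns one flag per input directory: true if it exists and is a directory. */
    static boolean[] areValidInputDirectories(Path[] inputDirs) {
        boolean[] valid = new boolean[inputDirs.length];
        for (int i = 0; i < inputDirs.length; i++) {
            valid[i] = Files.isDirectory(inputDirs[i]);
        }
        return valid;
    }

    /** Fails early, naming the first invalid directory, as a job-submission check might. */
    static void checkOrFail(Path[] inputDirs) throws IOException {
        boolean[] valid = areValidInputDirectories(inputDirs);
        for (int i = 0; i < valid.length; i++) {
            if (!valid[i]) {
                throw new IOException("Input directory " + inputDirs[i] + " is invalid.");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("input");
        Path missing = existing.resolve("does-not-exist");
        Path[] dirs = {existing, missing};
        System.out.println(Arrays.toString(areValidInputDirectories(dirs))); // [true, false]
        try {
            checkOrFail(dirs);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // names the missing directory
        } finally {
            Files.delete(existing);
        }
    }
}
```

Returning one flag per directory, rather than a single boolean, lets the caller report exactly which input path is wrong.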

getSplits

FileSplit[] getSplits(FileSystem fs,
                      JobConf job,
                      int numSplits)
                      throws IOException
Splits a set of input files. One split is created per map task.

Parameters:
fs - the filesystem containing the files to be split
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
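One simple way to treat numSplits as a target is sketched below: compute a goal size of about totalBytes / numSplits and cut each file into byte ranges of at most that size. This is an assumed strategy for illustration, not necessarily how InputFormatBase computes splits. Split is a hypothetical stand-in for FileSplit, and file lengths are passed in as a map rather than read from a FileSystem.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitSketch {
    /** Hypothetical stand-in for FileSplit: a byte range within one file. */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + ":" + start + "+" + length;
        }
    }

    /** Cuts each file into byte ranges of roughly totalBytes / numSplits bytes. */
    static List<Split> getSplits(Map<String, Long> fileLengths, int numSplits) {
        long total = 0;
        for (long len : fileLengths.values()) {
            total += len;
        }
        int n = Math.max(1, numSplits);
        // Ceiling division, and at least 1 byte so the inner loop always advances.
        long goalSize = Math.max(1, (total + n - 1) / n);
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            long length = e.getValue();
            for (long start = 0; start < length; start += goalSize) {
                splits.add(new Split(e.getKey(), start, Math.min(goalSize, length - start)));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("a.txt", 100L);
        files.put("b.txt", 50L);
        System.out.println(getSplits(files, 3).size()); // 3
        System.out.println(getSplits(files, 4).size()); // 5
        System.out.println(getSplits(files, 4));
    }
}
```

With numSplits = 4 this sketch yields 5 splits, because a range never spans two files; here the requested count is a target rather than a guarantee. Since the cuts fall at arbitrary byte offsets, the RecordReader for each split has to cope with records that straddle a split boundary.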

getRecordReader

RecordReader getRecordReader(FileSystem fs,
                             FileSplit split,
                             JobConf job,
                             Reporter reporter)
                             throws IOException
Construct a RecordReader for a FileSplit.

Parameters:
fs - the FileSystem
split - the FileSplit
job - the job that this split belongs to
reporter - a Reporter through which progress and status can be reported
Returns:
a RecordReader
Throws:
IOException
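The sketch below shows the general shape of a per-split reader: one reader is constructed for each split, it yields only the records that start inside that split, and it passes status updates to a reporter while it reads. It is plain Java, with fixed-length records chosen for simplicity. ProgressReporter, FixedLengthReader, and its next and getPos methods are hypothetical stand-ins for Reporter and RecordReader and are not meant to reproduce their actual signatures.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedRecordReaderSketch {
    /** Hypothetical stand-in for Reporter: receives status updates while records are read. */
    interface ProgressReporter {
        void setStatus(String status);
    }

    /** Hypothetical reader over fixed-length ASCII records inside one split's byte range. */
    static final class FixedLengthReader {
        private final byte[] data;
        private final int recordLength;
        private final long end;
        private final ProgressReporter reporter;
        private long pos;

        FixedLengthReader(byte[] data, int recordLength, long start, long length,
                          ProgressReporter reporter) {
            this.data = data;
            this.recordLength = recordLength;
            this.end = Math.min(start + length, data.length);
            this.reporter = reporter;
            // Round up to the next record boundary: a record belongs to the split it starts in.
            this.pos = (start + recordLength - 1) / recordLength * recordLength;
        }

        /** Copies the next record into value and returns true, or returns false when the split is done. */
        boolean next(StringBuilder value) {
            if (pos >= end || pos + recordLength > data.length) {
                return false;
            }
            value.setLength(0);
            value.append(new String(data, (int) pos, recordLength, StandardCharsets.US_ASCII));
            pos += recordLength;
            reporter.setStatus("read up to byte " + pos);
            return true;
        }

        long getPos() {
            return pos;
        }
    }

    /** Builds one reader for the split [start, start + length) and drains it. */
    static List<String> readSplit(byte[] data, int recordLength, long start, long length,
                                  ProgressReporter reporter) {
        FixedLengthReader reader = new FixedLengthReader(data, recordLength, start, length, reporter);
        List<String> records = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        while (reader.next(value)) {
            records.add(value.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "AAAABBBBCCCCDDDD".getBytes(StandardCharsets.US_ASCII);
        // Split size 6 deliberately does not line up with the 4-byte records.
        for (long start = 0; start < data.length; start += 6) {
            System.out.println(readSplit(data, 4, start, 6, status -> { }));
        }
        // prints [AAAA, BBBB], then [CCCC], then [DDDD]
    }
}
```

Reusing a caller-supplied value object in next, instead of allocating a new record each call, mirrors the fill-in-place style common in record readers and keeps per-record allocation low.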


Copyright © 2006 The Apache Software Foundation