org.apache.hadoop.mapred
Interface InputFormat

All Known Implementing Classes:
InputFormatBase

public interface InputFormat

An input data format. Input files are stored in a FileSystem. The processing of an input file may be split across multiple machines. Each file is processed as a sequence of records, which are read through a RecordReader, so files must be split on record boundaries.


Method Summary
 RecordReader getRecordReader(FileSystem fs, FileSplit split, JobConf job, Reporter reporter)
          Construct a RecordReader for a FileSplit.
 FileSplit[] getSplits(FileSystem fs, JobConf job, int numSplits)
          Splits a set of input files.
 

Method Detail

getSplits

public FileSplit[] getSplits(FileSystem fs,
                             JobConf job,
                             int numSplits)
                      throws IOException
Splits a set of input files. One split is created per map task.

Parameters:
fs - the filesystem containing the files to be split
job - the job whose input files are to be split
numSplits - the desired number of splits
Returns:
the splits
Throws:
IOException
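
Example (illustrative sketch only): the code below shows one way an implementation might turn a desired split count into byte ranges. It does not use the Hadoop classes; SplitSketch, Range and computeSplits are hypothetical names, and the sizing rule (total bytes divided by numSplits) is an assumption made for illustration, not the documented behavior of any implementing class.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    /** Hypothetical stand-in for a file split: a byte range within one file. */
    static final class Range {
        final String file;
        final long start;
        final long length;

        Range(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + ":" + start + "+" + length;
        }
    }

    /**
     * Cuts each file into chunks of about (total bytes / numSplits) bytes.
     * numSplits is treated as a hint: chunks never span two files, so the
     * result may contain more ranges than requested.
     */
    static List<Range> computeSplits(String[] files, long[] lengths, int numSplits) {
        long total = 0;
        for (long len : lengths) {
            total += len;
        }
        long target = Math.max(1, total / Math.max(1, numSplits));

        List<Range> out = new ArrayList<>();
        for (int i = 0; i < files.length; i++) {
            long offset = 0;
            long remaining = lengths[i];
            while (remaining > 0) {
                long size = Math.min(target, remaining);
                out.add(new Range(files[i], offset, size));
                offset += size;
                remaining -= size;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> splits = computeSplits(
                new String[] {"a.txt", "b.txt"}, new long[] {100, 50}, 3);
        for (Range r : splits) {
            System.out.println(r); // a.txt:0+50, a.txt:50+50, b.txt:0+50
        }
    }
}
```

These ranges are cut at arbitrary byte offsets. Aligning them to record boundaries is left to the reader that consumes each range.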

getRecordReader

public RecordReader getRecordReader(FileSystem fs,
                                    FileSplit split,
                                    JobConf job,
                                    Reporter reporter)
                             throws IOException
Construct a RecordReader for a FileSplit.

Parameters:
fs - the FileSystem containing the split's file
split - the FileSplit to read records from
job - the job that this split belongs to
reporter - the Reporter used to report status
Returns:
a RecordReader
Throws:
IOException
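
Example (illustrative sketch only): the code below shows one way a reader can respect record boundaries when its split starts or ends in the middle of a record. It assumes newline-terminated records held in a byte array and does not use the Hadoop classes; BoundaryReaderSketch and readSplit are hypothetical names. The rule applied is that a record belongs to the split in which its first byte falls, so every record is read exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BoundaryReaderSketch {
    /** Returns the records whose first byte lies in [start, start + length). */
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(data.length, start + length);
        int pos = start;
        if (start > 0) {
            // Back up one byte and skip through the next newline. If the
            // split begins exactly at a record start, nothing is lost;
            // otherwise the partial record is left to the previous split.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }

        List<String> records = new ArrayList<>();
        while (pos < end) {
            // A record that starts before 'end' is read in full, even if
            // it runs past the end of the split.
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++;
            }
            records.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        // The cut at byte 8 falls in the middle of "beta".
        System.out.println(readSplit(data, 0, 8));               // [alpha, beta]
        System.out.println(readSplit(data, 8, data.length - 8)); // [gamma]
    }
}
```

The first range reads "beta" to completion although the record crosses the cut, and the second range skips the tail of "beta" before reading "gamma".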


Copyright © 2006 The Apache Software Foundation