Package net.nutch.mapReduce

A system for scalable, fault-tolerant, distributed computation over large data collections.


Interface Summary
InputFormat An input data format.
InputFormat.Split A section of an input file.
Mapper Maps input key/value pairs to a set of intermediate key/value pairs.
OutputCollector Passed to Mapper and Reducer implementations to collect output data.
OutputFormat An output data format.
Partitioner Partitions the key space.
RecordReader Reads key/value pairs from an InputFormat.Split of an input file.
RecordWriter Writes key/value pairs to an output file.
Reducer Reduces a set of intermediate values which share a key to a smaller set of values.
 

Class Summary
DefaultMapper The default Mapper.
DefaultPartitioner The default Partitioner.
DefaultReducer The default Reducer.
FileSplit An InputFormat.Split implementation for sections of files.
MapReduceJob Specifies a map/reduce job.
TextInputFormat An InputFormat for plain text files.
 

Package net.nutch.mapReduce Description

Applications implement the Mapper and Reducer interfaces. The implementations are submitted as a MapReduceJob and applied to data stored in a NutchFileSystem.
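
The division of work between a Mapper and a Reducer can be illustrated with a small word-count sketch. This page does not list method signatures, so the OutputCollector, Mapper, and Reducer interfaces declared below are hypothetical, simplified stand-ins (written with generics and lambdas for brevity), not the package's actual API. The run method is likewise an assumed single-process driver that only imitates the grouping a real job would perform across machines; MapReduceJob and NutchFileSystem are not used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the package's interfaces.
// The real net.nutch.mapReduce signatures may differ.
interface OutputCollector<K, V> {
    void collect(K key, V value);
}

interface Mapper<K1, V1, K2, V2> {
    void map(K1 key, V1 value, OutputCollector<K2, V2> output);
}

interface Reducer<K2, V2, V3> {
    void reduce(K2 key, Iterator<V2> values, OutputCollector<K2, V3> output);
}

class WordCountSketch {
    // Map step: emit (word, 1) for every whitespace-separated token in a line.
    static final Mapper<Long, String, String, Integer> MAPPER =
        (offset, line, out) -> {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.collect(word, 1);
                }
            }
        };

    // Reduce step: collapse all counts that share a word into a single sum.
    static final Reducer<String, Integer, Integer> REDUCER =
        (word, counts, out) -> {
            int sum = 0;
            while (counts.hasNext()) {
                sum += counts.next();
            }
            out.collect(word, sum);
        };

    // Assumed in-memory driver: maps every line, groups intermediate
    // values by key, then reduces each group. A real job would do this
    // across many machines; this only shows the data flow.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            MAPPER.map(offset, line,
                (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
            offset += line.length() + 1;
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            REDUCER.reduce(e.getKey(), e.getValue().iterator(), result::put);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(run(Arrays.asList("the quick fox", "the lazy dog")));
    }
}
```

In this sketch the mapper never sees more than one line and the reducer never sees more than one key, which is the property that lets a framework distribute both steps independently.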

See Google's original Map/Reduce paper for background information.



Copyright © 2005 The Nutch Organization.