org.apache.hadoop.examples
Class RandomWriter

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.hadoop.examples.RandomWriter
All Implemented Interfaces:
Closeable, JobConfigurable, Reducer

public class RandomWriter
extends MapReduceBase
implements Reducer

This program uses map/reduce to run a distributed job in which there is no interaction between the tasks and each task writes a large unsorted random binary sequence file of BytesWritable.
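
For illustration only (this is not RandomWriter's own code), each map conceptually fills a sequence file with random BytesWritable records, along the following lines. The helper class name and the record sizes are hypothetical, and the SequenceFile.Writer constructor shown here may differ across Hadoop releases:

 import java.io.IOException;
 import java.util.Random;

 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.BytesWritable;
 import org.apache.hadoop.io.SequenceFile;

 // Hypothetical sketch, not part of RandomWriter: writes roughly
 // totalBytes of random key/value pairs to one sequence file.
 public class RandomFileSketch {
   public static void writeRandomFile(FileSystem fs, Path file,
                                      long totalBytes) throws IOException {
     SequenceFile.Writer writer =
         new SequenceFile.Writer(fs, file,
                                 BytesWritable.class, BytesWritable.class);
     Random random = new Random();
     byte[] key = new byte[10];       // record sizes here are arbitrary,
     byte[] value = new byte[10000];  // chosen only for the example
     try {
       for (long written = 0; written < totalBytes;
            written += key.length + value.length) {
         random.nextBytes(key);
         random.nextBytes(value);
         writer.append(new BytesWritable(key), new BytesWritable(value));
       }
     } finally {
       writer.close();
     }
   }
 }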

Author:
Owen O'Malley

Nested Class Summary
static class RandomWriter.Map

Constructor Summary
RandomWriter()

Method Summary
static void main(String[] args)
          This is the main routine for launching a distributed random write job.
 void reduce(WritableComparable key, Iterator values, OutputCollector output, Reporter reporter)
          Combines values for a given key.
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close, configure
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.mapred.JobConfigurable
configure
 
Methods inherited from interface org.apache.hadoop.io.Closeable
close
 

Constructor Detail

RandomWriter

public RandomWriter()

Method Detail

reduce

public void reduce(WritableComparable key,
                   Iterator values,
                   OutputCollector output,
                   Reporter reporter)
            throws IOException
Description copied from interface: Reducer
Combines values for a given key. Output values must be of the same type as input values. Input keys must not be altered. Typically all values are combined into zero or one value. Output pairs are collected with calls to OutputCollector.collect(WritableComparable,Writable).
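
RandomWriter's own reduce does nothing (see main below); for illustration only, a typical implementation of this contract combines all values for a key into a single output pair. The class name and the choice of IntWritable are hypothetical:

 import java.io.IOException;
 import java.util.Iterator;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.WritableComparable;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reducer;
 import org.apache.hadoop.mapred.Reporter;

 // Hypothetical example, not part of RandomWriter.
 public class SumReducer extends MapReduceBase implements Reducer {
   public void reduce(WritableComparable key, Iterator values,
                      OutputCollector output, Reporter reporter)
       throws IOException {
     int sum = 0;
     while (values.hasNext()) {
       // Output values must be of the same type as input values
       // (IntWritable in this example), and the key is never altered.
       sum += ((IntWritable) values.next()).get();
     }
     // All values for the key collapse into a single output pair.
     output.collect(key, new IntWritable(sum));
   }
 }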

Specified by:
reduce in interface Reducer
Parameters:
key - the key
values - the values to combine
output - to collect combined values
reporter - facility to report progress
Throws:
IOException

main

public static void main(String[] args)
                 throws IOException
This is the main routine for launching a distributed random write job. It runs 10 maps per node, and each node writes 1 GB of data to a DFS file. The reduce does nothing.

This program uses a useful pattern for dealing with Hadoop's constraints on InputSplits: an input split can only consist of a file and a byte range, we want to control exactly how many maps there are, and we have no real inputs. So we create a directory with a set of artificial files, each containing the filename that a given map should write to. Then, using the text line reader over this "fake" input directory, we generate exactly the right number of maps, and each map gets a single record: the filename it is supposed to write its output to.
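
A minimal sketch of the control-file setup described above; the class, method, and file names are hypothetical, not RandomWriter's actual code:

 import java.io.IOException;

 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 // Hypothetical helper: creates one single-line "control" file per desired
 // map task. Pointing the job's text-line input format at controlDir then
 // yields exactly numMaps maps, each reading the one filename it must write.
 public class FakeInputSketch {
   public static void createControlFiles(FileSystem fs, Path controlDir,
                                         Path outDir, int numMaps)
       throws IOException {
     fs.mkdirs(controlDir);
     for (int i = 0; i < numMaps; i++) {
       FSDataOutputStream out = fs.create(new Path(controlDir, "map-" + i));
       try {
         // The single record this map will see: its target output filename.
         out.writeBytes(new Path(outDir, "part-" + i).toString() + "\n");
       } finally {
         out.close();
       }
     }
   }
 }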

Throws:
IOException


Copyright © 2006 The Apache Software Foundation