org.apache.mahout.df.mapred.partial
Class Step0Job

java.lang.Object
  extended by org.apache.mahout.df.mapred.partial.Step0Job

public class Step0Job
extends java.lang.Object

Preparation step of the partial MapReduce builder. Computes per-partition statistics, notably each partition's first id, that the builder uses in later steps.


Nested Class Summary
static class Step0Job.Step0Output
          Output of the Step0 mappers
 
Constructor Summary
Step0Job(org.apache.hadoop.fs.Path base, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath)
           
 
Method Summary
protected  Step0Job.Step0Output[] parseOutput(org.apache.hadoop.mapred.JobConf job)
          Extracts the output and processes it
protected static Step0Job.Step0Output[] processOutput(int[] keys, Step0Job.Step0Output[] values)
          Replaces the first id for each partition in Hadoop's order
 Step0Job.Step0Output[] run(org.apache.hadoop.conf.Configuration conf)
          Computes the partitions' first ids in Hadoop's order
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Step0Job

public Step0Job(org.apache.hadoop.fs.Path base,
                org.apache.hadoop.fs.Path dataPath,
                org.apache.hadoop.fs.Path datasetPath)
Parameters:
base - base directory
dataPath - data used in the first step
datasetPath - dataset path
Method Detail

run

public Step0Job.Step0Output[] run(org.apache.hadoop.conf.Configuration conf)
                           throws java.io.IOException
Computes the partitions' first ids in Hadoop's order

Parameters:
conf - configuration
Returns:
first ids for all the partitions
Throws:
java.io.IOException

parseOutput

protected Step0Job.Step0Output[] parseOutput(org.apache.hadoop.mapred.JobConf job)
                                      throws java.io.IOException
Extracts the output and processes it

Parameters:
job - job configuration
Returns:
first ids for each partition, in Hadoop's order
Throws:
java.io.IOException

processOutput

protected static Step0Job.Step0Output[] processOutput(int[] keys,
                                                      Step0Job.Step0Output[] values)
Replaces the first id for each partition in Hadoop's order

Parameters:
keys - partition key of each output
values - mapper outputs, one per key
Returns:
the outputs with their first ids replaced, in Hadoop's order
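
One plausible reading of "replaces the first id for each partition in Hadoop's order" is sketched below. It is an illustration under stated assumptions, not the Mahout implementation: PartitionStat is a hypothetical stand-in for Step0Job.Step0Output, and its firstId and size fields, as well as the idea that the raw first id grows with file position, are assumptions. The sketch recovers file order from the raw first ids, assigns each partition a cumulative instance index as its new first id, and stores each result at the index given by its key.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for Step0Job.Step0Output; the field names are assumptions.
final class PartitionStat {
    final long firstId; // first id reported for the partition
    final int size;     // number of instances in the partition

    PartitionStat(long firstId, int size) {
        this.firstId = firstId;
        this.size = size;
    }
}

final class FirstIdSketch {

    // keys[i] is the partition index (Hadoop's order) of values[i].
    static PartitionStat[] processOutput(int[] keys, PartitionStat[] values) {
        int n = values.length;

        // Sort positions by raw first id to recover the order of partitions in the file.
        Integer[] byFileOrder = new Integer[n];
        for (int i = 0; i < n; i++) {
            byFileOrder[i] = i;
        }
        Arrays.sort(byFileOrder, Comparator.comparingLong(i -> values[i].firstId));

        // The new first id of a partition is the number of instances that precede it.
        long[] newFirstId = new long[n];
        long next = 0;
        for (int idx : byFileOrder) {
            newFirstId[idx] = next;
            next += values[idx].size;
        }

        // Place each updated value at the slot given by its partition key.
        PartitionStat[] reordered = new PartitionStat[n];
        for (int i = 0; i < n; i++) {
            reordered[keys[i]] = new PartitionStat(newFirstId[i], values[i].size);
        }
        return reordered;
    }

    public static void main(String[] args) {
        int[] keys = {2, 0, 1};
        PartitionStat[] values = {
            new PartitionStat(900, 4), // belongs to partition 2
            new PartitionStat(0, 3),   // belongs to partition 0
            new PartitionStat(400, 5)  // belongs to partition 1
        };
        // Prints "0 3", "3 5", "8 4" on separate lines.
        for (PartitionStat s : processOutput(keys, values)) {
            System.out.println(s.firstId + " " + s.size);
        }
    }
}
```

Unlike the signature above, which may update the values it receives, the sketch returns new objects so that the input array is left untouched.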


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.