org.apache.mahout.df.mapred.partial
Class PartialBuilder

java.lang.Object
  extended by org.apache.mahout.df.mapred.Builder
      extended by org.apache.mahout.df.mapred.partial.PartialBuilder

public class PartialBuilder
extends Builder

Builds a random forest using partial data. Each mapper uses only the data given by its InputSplit.
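
The idea of building from partial data can be illustrated without Hadoop. The following is a minimal sketch, not Mahout's implementation: it assumes each partition is a contiguous index range `[from, to)` and that a tree's training sample is a bootstrap drawn only from that range. The class name `PartialSamplingSketch` and the method `bootstrap` are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; none of these names come from Mahout.
final class PartialSamplingSketch {

    /**
     * Draws a bootstrap sample (with replacement) of instance indices,
     * restricted to the partition [from, to). Indices outside the
     * partition are never returned, mimicking a mapper that sees only
     * the data of its own split.
     */
    static int[] bootstrap(int from, int to, Random rng) {
        int size = to - from;
        int[] sample = new int[size];
        for (int i = 0; i < size; i++) {
            sample[i] = from + rng.nextInt(size);
        }
        return sample;
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        // Three assumed partitions covering instances 0..9.
        int[][] splits = { {0, 4}, {4, 7}, {7, 10} };
        for (int[] s : splits) {
            System.out.println(Arrays.toString(bootstrap(s[0], s[1], rng)));
        }
    }
}
```

Because each sample is confined to one partition, a tree grown from it never sees instances held by the other partitions.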


Constructor Summary
PartialBuilder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, java.lang.Long seed)
           
PartialBuilder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, java.lang.Long seed, org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
protected  void configureJob(org.apache.hadoop.mapred.JobConf job, int nbTrees, boolean oobEstimate)
          Used by the inheriting classes to configure the job
protected static boolean isStep2(org.apache.hadoop.conf.Configuration conf)
          Indicates whether the second step of the builder should run.
This parameter is meant only for debugging, so it is kept protected.
protected  DecisionForest parseOutput(org.apache.hadoop.mapred.JobConf job, PredictionCallback callback)
          Parse the output files to extract the trees and pass the predictions to the callback
protected static void processOutput(org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.fs.Path outputPath, int[] firstIds, TreeID[] keys, Node[] trees, PredictionCallback callback)
          Processes the output from the output path.
protected static void setStep2(org.apache.hadoop.conf.Configuration conf, boolean value)
          Sets whether the second step of the builder should run.
 
Methods inherited from class org.apache.mahout.df.mapred.Builder
build, getConf, getDataPath, getDatasetPath, getDistributedCacheFile, getNbTrees, getOutputPath, getRandomSeed, getSeed, getTreeBuilder, getTreeBuilder, isOobEstimate, isOutput, loadDataset, runJob, setNbTrees, setOutputDirName, sortSplits
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PartialBuilder

public PartialBuilder(TreeBuilder treeBuilder,
                      org.apache.hadoop.fs.Path dataPath,
                      org.apache.hadoop.fs.Path datasetPath,
                      java.lang.Long seed)

PartialBuilder

public PartialBuilder(TreeBuilder treeBuilder,
                      org.apache.hadoop.fs.Path dataPath,
                      org.apache.hadoop.fs.Path datasetPath,
                      java.lang.Long seed,
                      org.apache.hadoop.conf.Configuration conf)
Method Detail

isStep2

protected static boolean isStep2(org.apache.hadoop.conf.Configuration conf)
Indicates whether the second step of the builder should run.
This parameter is meant only for debugging, so it is kept protected.

Parameters:
conf - configuration to read the flag from
Returns:
true if the second step of the builder should run

setStep2

protected static void setStep2(org.apache.hadoop.conf.Configuration conf,
                               boolean value)
Sets whether the second step of the builder should run.

Parameters:
conf - configuration to store the flag in
value - true to indicate that the second step will be launched
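
The isStep2/setStep2 pair follows the common pattern of a boolean flag stored in a configuration object. The sketch below shows that pattern using `java.util.Properties` as a stand-in for the Hadoop `Configuration`. The key name `example.partial.step2` and the default of `true` are assumptions made for illustration and are not taken from this page.

```java
import java.util.Properties;

// Hypothetical illustration of a boolean debug flag kept in a configuration.
final class Step2FlagSketch {

    // Assumed key name; the real key used by PartialBuilder is not documented here.
    static final String STEP2_KEY = "example.partial.step2";

    /** Returns the flag, falling back to an assumed default of true when unset. */
    static boolean isStep2(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(STEP2_KEY, "true"));
    }

    /** Stores the flag so that later readers of the same configuration see it. */
    static void setStep2(Properties conf, boolean value) {
        conf.setProperty(STEP2_KEY, Boolean.toString(value));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setStep2(conf, false);
        System.out.println(isStep2(conf));
    }
}
```

With a real Hadoop `Configuration`, the equivalent calls would be `getBoolean(name, defaultValue)` and `setBoolean(name, value)`.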

configureJob

protected void configureJob(org.apache.hadoop.mapred.JobConf job,
                            int nbTrees,
                            boolean oobEstimate)
                     throws java.io.IOException
Description copied from class: Builder
Used by the inheriting classes to configure the job

Specified by:
configureJob in class Builder
Parameters:
job - the job to configure
nbTrees - number of trees to grow
oobEstimate - true if the out-of-bag (oob) error should be estimated
Throws:
java.io.IOException

parseOutput

protected DecisionForest parseOutput(org.apache.hadoop.mapred.JobConf job,
                                     PredictionCallback callback)
                              throws java.io.IOException
Description copied from class: Builder
Parse the output files to extract the trees and pass the predictions to the callback

Specified by:
parseOutput in class Builder
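Parameters:
job - the job whose output files are parsed
callback - receives the predictions; can be null
Returns:
the decision forest built from the extracted trees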
Throws:
java.io.IOException

processOutput

protected static void processOutput(org.apache.hadoop.mapred.JobConf job,
                                    org.apache.hadoop.fs.Path outputPath,
                                    int[] firstIds,
                                    TreeID[] keys,
                                    Node[] trees,
                                    PredictionCallback callback)
                             throws java.io.IOException
Processes the output from the output path.

Parameters:
job - the job configuration
outputPath - directory that contains the output of the job
firstIds - first ids of the partitions, in Hadoop's order
keys -
callback - can be null
Throws:
java.io.IOException
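
One plausible reading of firstIds is that entry p holds the dataset-wide index of the first instance in partition p, so that a partition-local index can be converted to a global one. This page does not confirm that interpretation. Under that assumption, the array is a running sum of the partition sizes, as in this sketch (the class and method names are hypothetical):

```java
import java.util.Arrays;

// Hypothetical illustration; assumes firstIds are cumulative partition offsets.
final class FirstIdsSketch {

    /**
     * Given the number of instances in each partition, in order, returns
     * the global index of the first instance of each partition.
     */
    static int[] firstIds(int[] sizes) {
        int[] ids = new int[sizes.length];
        int next = 0;
        for (int p = 0; p < sizes.length; p++) {
            ids[p] = next;      // partition p starts where the previous one ended
            next += sizes[p];
        }
        return ids;
    }

    public static void main(String[] args) {
        // Partitions of 4, 3 and 3 instances start at indices 0, 4 and 7.
        System.out.println(Arrays.toString(firstIds(new int[] {4, 3, 3})));
    }
}
```

A partition-local index `i` in partition `p` then maps to the global index `firstIds[p] + i`.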


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.