org.apache.mahout.df.mapred
Class Builder

java.lang.Object
  extended by org.apache.mahout.df.mapred.Builder
Direct Known Subclasses:
InMemBuilder, PartialBuilder

public abstract class Builder
extends java.lang.Object

Base class for Mapred DecisionForest builders. Takes care of storing the parameters common to the mapred implementations.
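Child classes must implement at least configureJob, to configure the job, and parseOutput, to extract the trees from the job output and pass the predictions to the callback.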


Constructor Summary
protected Builder(TreeBuilder treeBuilder, org.apache.hadoop.fs.Path dataPath, org.apache.hadoop.fs.Path datasetPath, java.lang.Long seed, org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 DecisionForest build(int nbTrees, PredictionCallback callback)
           
protected abstract  void configureJob(org.apache.hadoop.mapred.JobConf conf, int nbTrees, boolean oobEstimate)
          Used by the inheriting classes to configure the job
protected  org.apache.hadoop.conf.Configuration getConf()
           
protected  org.apache.hadoop.fs.Path getDataPath()
           
protected  org.apache.hadoop.fs.Path getDatasetPath()
           
static org.apache.hadoop.fs.Path getDistributedCacheFile(org.apache.hadoop.conf.Configuration job, int index)
          Helper method.
static int getNbTrees(org.apache.hadoop.conf.Configuration conf)
          Get the number of trees for the map-reduce job.
 org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.conf.Configuration conf)
          Returns the path of the output directory.
static java.lang.Long getRandomSeed(org.apache.hadoop.conf.Configuration conf)
          Returns the random seed
protected  java.lang.Long getSeed()
           
protected  TreeBuilder getTreeBuilder()
           
static TreeBuilder getTreeBuilder(org.apache.hadoop.conf.Configuration conf)
           
protected static boolean isOobEstimate(org.apache.hadoop.conf.Configuration conf)
           
protected static boolean isOutput(org.apache.hadoop.conf.Configuration conf)
          Used only for DEBUG purposes.
static Dataset loadDataset(org.apache.hadoop.mapred.JobConf job)
          Helper method.
protected abstract  DecisionForest parseOutput(org.apache.hadoop.mapred.JobConf job, PredictionCallback callback)
          Parse the output files to extract the trees and pass the predictions to the callback
protected  void runJob(org.apache.hadoop.mapred.JobConf job)
          Sequential implementations should override this method to simulate the job execution
static void setNbTrees(org.apache.hadoop.conf.Configuration conf, int nbTrees)
          Set the number of trees to grow for the map-reduce job
 void setOutputDirName(java.lang.String name)
          Sets the output directory name; the directory will be created in the working directory
static void sortSplits(org.apache.hadoop.mapred.InputSplit[] splits)
          Sorts the splits by size in descending order, so that the biggest go first.
This is the same code used by Hadoop's JobClient.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Builder

protected Builder(TreeBuilder treeBuilder,
                  org.apache.hadoop.fs.Path dataPath,
                  org.apache.hadoop.fs.Path datasetPath,
                  java.lang.Long seed,
                  org.apache.hadoop.conf.Configuration conf)
Method Detail

getTreeBuilder

protected TreeBuilder getTreeBuilder()

getDataPath

protected org.apache.hadoop.fs.Path getDataPath()

getDatasetPath

protected org.apache.hadoop.fs.Path getDatasetPath()

getSeed

protected java.lang.Long getSeed()

getConf

protected org.apache.hadoop.conf.Configuration getConf()

isOutput

protected static boolean isOutput(org.apache.hadoop.conf.Configuration conf)
Used only for DEBUG purposes. If false, the mappers do not output anything, so the builder has nothing to process.

Parameters:
conf -
Returns:

isOobEstimate

protected static boolean isOobEstimate(org.apache.hadoop.conf.Configuration conf)

getRandomSeed

public static java.lang.Long getRandomSeed(org.apache.hadoop.conf.Configuration conf)
Returns the random seed

Parameters:
conf -
Returns:
null if no seed is available
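
Because getRandomSeed can return null, callers need a null check before using the value. The following is a minimal sketch of one way to handle that, assuming the caller wants an unseeded generator when no seed is configured. The class and method names (SeedSketch, newRandom) are hypothetical and not part of Mahout.

```java
import java.util.Random;

// Hypothetical helper, not part of Mahout: turns a possibly-null seed into a Random.
class SeedSketch {
  // A null seed (no seed configured) falls back to an unseeded generator;
  // a non-null seed gives a reproducible sequence.
  static Random newRandom(Long seed) {
    return seed == null ? new Random() : new Random(seed);
  }
}
```

With a non-null seed, two generators created this way produce the same sequence, which is usually what a fixed seed is meant to guarantee.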

getTreeBuilder

public static TreeBuilder getTreeBuilder(org.apache.hadoop.conf.Configuration conf)

getNbTrees

public static int getNbTrees(org.apache.hadoop.conf.Configuration conf)
Get the number of trees for the map-reduce job. The default value is 100

Parameters:
conf -
Returns:

setNbTrees

public static void setNbTrees(org.apache.hadoop.conf.Configuration conf,
                              int nbTrees)
Set the number of trees to grow for the map-reduce job

Parameters:
conf -
nbTrees -
Throws:
java.lang.IllegalArgumentException - if (nbTrees <= 0)
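
The contract of getNbTrees and setNbTrees (a default of 100, and rejection of non-positive values) can be sketched without Hadoop on the classpath. In the sketch below a plain java.util.Map stands in for org.apache.hadoop.conf.Configuration, and the key name "sketch.nbtrees" and the class name NbTreesSketch are assumptions made for illustration; the key actually used by Builder is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the nbTrees accessors; not the Mahout implementation.
class NbTreesSketch {
  // Assumed key name, chosen for illustration only.
  static final String NB_TREES_KEY = "sketch.nbtrees";
  // Default value documented for getNbTrees.
  static final int DEFAULT_NB_TREES = 100;

  // Returns the configured number of trees, or the default if none was set.
  static int getNbTrees(Map<String, String> conf) {
    String value = conf.get(NB_TREES_KEY);
    return value == null ? DEFAULT_NB_TREES : Integer.parseInt(value);
  }

  // Stores the number of trees, rejecting non-positive values as documented.
  static void setNbTrees(Map<String, String> conf, int nbTrees) {
    if (nbTrees <= 0) {
      throw new IllegalArgumentException("nbTrees should be greater than 0");
    }
    conf.put(NB_TREES_KEY, Integer.toString(nbTrees));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(getNbTrees(conf)); // nothing set yet, so the default applies
    setNbTrees(conf, 25);
    System.out.println(getNbTrees(conf));
  }
}
```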

setOutputDirName

public void setOutputDirName(java.lang.String name)
Sets the output directory name; the directory will be created in the working directory

Parameters:
name -

getOutputPath

public org.apache.hadoop.fs.Path getOutputPath(org.apache.hadoop.conf.Configuration conf)
                                        throws java.io.IOException
Returns the path of the output directory.

Parameters:
conf -
Returns:
Throws:
java.io.IOException

getDistributedCacheFile

public static org.apache.hadoop.fs.Path getDistributedCacheFile(org.apache.hadoop.conf.Configuration job,
                                                                int index)
                                                         throws java.io.IOException
Helper method. Get a path from the DistributedCache

Parameters:
job -
index - index of the path in the DistributedCache files
Returns:
Throws:
java.io.IOException

loadDataset

public static Dataset loadDataset(org.apache.hadoop.mapred.JobConf job)
                           throws java.io.IOException
Helper method. Load a Dataset stored in the DistributedCache

Parameters:
job -
Returns:
Throws:
java.io.IOException

configureJob

protected abstract void configureJob(org.apache.hadoop.mapred.JobConf conf,
                                     int nbTrees,
                                     boolean oobEstimate)
                              throws java.io.IOException
Used by the inheriting classes to configure the job

Parameters:
conf -
nbTrees - number of trees to grow
oobEstimate - true, if oob error should be estimated
Throws:
java.io.IOException

runJob

protected void runJob(org.apache.hadoop.mapred.JobConf job)
               throws java.io.IOException
Sequential implementations should override this method to simulate the job execution

Throws:
java.io.IOException

parseOutput

protected abstract DecisionForest parseOutput(org.apache.hadoop.mapred.JobConf job,
                                              PredictionCallback callback)
                                       throws java.io.IOException
Parse the output files to extract the trees and pass the predictions to the callback

Parameters:
job -
callback - can be null
Returns:
Throws:
java.io.IOException

build

public DecisionForest build(int nbTrees,
                            PredictionCallback callback)
                     throws java.io.IOException
Throws:
java.io.IOException
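
This page does not describe what build does internally. One plausible reading, stated here as an assumption, is a template method that calls configureJob, then runJob, then parseOutput. The sketch below illustrates only that shape, with simplified stand-in types: a String replaces DecisionForest, no Hadoop classes are used, and the names BuildFlowSketch and InMemoryFlowSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical template-method skeleton; types are simplified stand-ins for the real ones.
abstract class BuildFlowSketch {
  // Records the order in which the steps run, for illustration.
  final List<String> steps = new ArrayList<>();

  // Subclass hook: configure the job before it is launched.
  protected abstract void configureJob(int nbTrees, boolean oobEstimate);

  // Default job execution; a sequential implementation could override this.
  protected void runJob() {
    steps.add("runJob");
  }

  // Subclass hook: turn the job output into a forest (a String in this sketch).
  protected abstract String parseOutput();

  // Assumed call order: configure, run, then parse.
  public String build(int nbTrees, boolean oobEstimate) {
    configureJob(nbTrees, oobEstimate);
    runJob();
    return parseOutput();
  }
}

// Hypothetical concrete subclass supplying the two abstract steps.
class InMemoryFlowSketch extends BuildFlowSketch {
  @Override
  protected void configureJob(int nbTrees, boolean oobEstimate) {
    steps.add("configureJob(" + nbTrees + ")");
  }

  @Override
  protected String parseOutput() {
    steps.add("parseOutput");
    return "forest";
  }
}
```

Keeping the call order in the base class means a subclass only supplies the two abstract steps, which matches the split between abstract and concrete methods listed in the Method Summary.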

sortSplits

public static void sortSplits(org.apache.hadoop.mapred.InputSplit[] splits)
Sorts the splits by size in descending order, so that the biggest go first.
This is the same code used by Hadoop's JobClient.

Parameters:
splits -
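
The ordering described for sortSplits (largest first) can be sketched with a descending comparator on split length. This is an illustration rather than the Hadoop or Mahout code: SplitSketch is a hypothetical stand-in for org.apache.hadoop.mapred.InputSplit, reduced to a name and a length.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for an input split: only a name and a size in bytes.
class SplitSketch {
  final String name;
  final long length;

  SplitSketch(String name, long length) {
    this.name = name;
    this.length = length;
  }
}

class SortSplitsSketch {
  // Sorts the array in place so that the largest splits come first.
  static void sortSplits(SplitSketch[] splits) {
    Arrays.sort(splits,
        Comparator.comparingLong((SplitSketch s) -> s.length).reversed());
  }

  public static void main(String[] args) {
    SplitSketch[] splits = {
      new SplitSketch("a", 10L),
      new SplitSketch("b", 300L),
      new SplitSketch("c", 50L)
    };
    sortSplits(splits);
    for (SplitSketch s : splits) {
      System.out.println(s.name + " " + s.length);
    }
  }
}
```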


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.