Package org.apache.mahout.df.mapreduce.partial

Partial-data mapreduce implementation of Random Decision Forests
 
The builder splits the data, using a FileInputSplit, among the mappers.
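As a rough illustration of dividing the data among the mappers, the sketch below gives each mapper a contiguous range of instance indices. It is a simplification: Hadoop file splits are defined by byte offsets within a file, not by record counts, and the names `PartitionSketch` and `boundaries` are hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: split numInstances records into numMaps contiguous
 * partitions. Partition p covers the half-open index range [b[p], b[p + 1]).
 * Real file splits are byte ranges; record counts are used here for clarity.
 */
public final class PartitionSketch {

    public static int[] boundaries(int numInstances, int numMaps) {
        if (numMaps <= 0 || numInstances < 0) {
            throw new IllegalArgumentException("numMaps must be positive and numInstances non-negative");
        }
        int[] b = new int[numMaps + 1];
        int base = numInstances / numMaps;  // every partition gets at least this many
        int extra = numInstances % numMaps; // the first `extra` partitions get one more
        for (int p = 0; p < numMaps; p++) {
            b[p + 1] = b[p] + base + (p < extra ? 1 : 0);
        }
        return b;
    }

    public static void main(String[] args) {
        // 10 instances over 3 mappers; prints [0, 4, 7, 10]
        System.out.println(Arrays.toString(boundaries(10, 3)));
    }
}
```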

Class Summary
InterResults: Stores/loads the intermediate results of step 1 needed by step 2. This class should not be needed outside of the partial package, so all its methods are protected.
PartialBuilder: Builds a random forest using partial data.
Step0Job: Preparation step of the partial mapreduce builder.
Step0Job.Step0Output: Output of the Step0 mappers.
Step1Mapper: First step of the Partial Data Builder.
Step2Job: Second step of the partial mapreduce builder.
Step2Mapper: Second step of PartialBuilder.
TreeID: Indicates both the tree and the data partition used to grow the tree.
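TreeID has to carry two pieces of information in one key. One way to do that, shown here as a hypothetical sketch, is to pack the partition and the tree index into a single `long`, assuming an upper bound on the tree index (`MAX_TREE_ID`). The class name, method names, and the bound are illustrative, not taken from Mahout.

```java
/**
 * Hypothetical sketch of a combined (partition, tree index) key in the
 * spirit of TreeID. MAX_TREE_ID is an assumed bound, not Mahout's value.
 */
public final class TreeIdSketch {

    // Assumed upper bound: every tree index must be smaller than this.
    public static final long MAX_TREE_ID = 100_000L;

    /** Packs both values into one long: partition * MAX_TREE_ID + treeId. */
    public static long encode(int partition, int treeId) {
        if (partition < 0 || treeId < 0 || treeId >= MAX_TREE_ID) {
            throw new IllegalArgumentException("partition or treeId out of range");
        }
        return partition * MAX_TREE_ID + treeId;
    }

    /** Recovers the data partition used to grow the tree. */
    public static int partition(long key) {
        return (int) (key / MAX_TREE_ID);
    }

    /** Recovers the tree index. */
    public static int treeId(long key) {
        return (int) (key % MAX_TREE_ID);
    }

    public static void main(String[] args) {
        long key = encode(3, 42);
        System.out.println(key);                                // prints 300042
        System.out.println(partition(key) + " " + treeId(key)); // prints 3 42
    }
}
```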
 

Package org.apache.mahout.df.mapreduce.partial Description

The builder splits the data among the mappers using a FileInputSplit. Building the forest and estimating the out-of-bag (OOB) error take two job steps.

In the first step, each mapper is responsible for growing a number of trees with its partition's data: it loads the data instances in its map() function, then builds the trees in the close() method. It uses the reference implementation's code to build each tree and estimate the OOB error.
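The life cycle of a step-1 mapper can be sketched in plain Java without Hadoop. Everything here is hypothetical: `Step1Sketch`, its `Mapper` class, and the policy of giving the remainder of `numTrees / numMaps` to partition 0 are assumptions, and `growTree` is a placeholder for the reference implementation's tree builder. The point is the shape of the computation: `map()` only buffers instances, and `close()` grows the trees once the whole partition is available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical sketch of the step-1 mapper life cycle. Names and the
 * tree-allocation policy are assumptions, not Mahout's API.
 */
public final class Step1Sketch {

    /** Trees grown for one partition; partition 0 also takes the remainder (assumed). */
    public static int nbTrees(int numMaps, int numTrees, int partition) {
        int nb = numTrees / numMaps;
        if (partition == 0) {
            nb += numTrees - nb * numMaps;
        }
        return nb;
    }

    /** Buffers instances in map() and grows all of its trees in close(). */
    public static final class Mapper<I, T> {
        private final List<I> instances = new ArrayList<>();
        private final int treesToGrow;
        // (partition data, tree index) -> tree; stands in for the reference builder
        private final BiFunction<List<I>, Integer, T> growTree;

        public Mapper(int numMaps, int numTrees, int partition,
                      BiFunction<List<I>, Integer, T> growTree) {
            this.treesToGrow = nbTrees(numMaps, numTrees, partition);
            this.growTree = growTree;
        }

        /** Called once per input instance: only stores it. */
        public void map(I instance) {
            instances.add(instance);
        }

        /** Called after the last instance, when the whole partition is in memory. */
        public List<T> close() {
            List<T> trees = new ArrayList<>();
            for (int i = 0; i < treesToGrow; i++) {
                trees.add(growTree.apply(instances, i));
            }
            return trees;
        }
    }

    public static void main(String[] args) {
        // Partition 0 of 3 mappers, 10 trees in total: 10 / 3 = 3 trees, plus the remainder 1.
        Mapper<Integer, String> mapper = new Mapper<>(3, 10, 0,
            (data, i) -> "tree " + i + " grown on " + data.size() + " instances");
        for (int x = 0; x < 5; x++) {
            mapper.map(x);
        }
        mapper.close().forEach(System.out::println);
    }
}
```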

The second step is needed when estimating the OOB error. Each mapper loads all the trees that do not belong to its own partition (that is, trees that were not built using the partition's data) and uses them to classify the partition's data instances. The data instances are loaded in the map() method and the classification is performed in the close() method.
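The cross-partition classification of step 2 can be sketched as a majority vote that skips the trees grown on the mapper's own partition. This is a hypothetical, simplified model: `Step2Sketch` and its methods are illustrative names, trees are plain functions from an integer feature to a non-negative integer label, ties go to the smaller label, and only the step-2 votes are counted; any combination with the estimates produced in step 1 is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/**
 * Hypothetical sketch of step 2: classify a partition's instances using only
 * trees grown on other partitions, then count majority-vote errors.
 */
public final class Step2Sketch {

    /** A tree tagged with the partition whose data grew it. */
    public static final class Tree {
        final int partition;
        final IntUnaryOperator classify; // feature -> predicted label (assumed non-negative)

        public Tree(int partition, IntUnaryOperator classify) {
            this.partition = partition;
            this.classify = classify;
        }
    }

    /** Majority vote of the trees NOT grown on `partition`; -1 if no tree is eligible. */
    public static int vote(List<Tree> trees, int partition, int feature) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Tree t : trees) {
            if (t.partition == partition) {
                continue; // this tree was built with the partition's own data
            }
            counts.merge(t.classify.applyAsInt(feature), 1, Integer::sum);
        }
        int best = -1;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int label = e.getKey();
            int count = e.getValue();
            // Higher count wins; on a tie the smaller label wins, for determinism.
            if (count > bestCount || (count == bestCount && label < best)) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }

    /** Fraction of the partition's instances misclassified by the cross-partition vote. */
    public static double oobError(List<Tree> trees, int partition, int[] features, int[] labels) {
        int errors = 0;
        int counted = 0;
        for (int i = 0; i < features.length; i++) {
            int predicted = vote(trees, partition, features[i]);
            if (predicted == -1) {
                continue; // no eligible tree voted on this instance
            }
            counted++;
            if (predicted != labels[i]) {
                errors++;
            }
        }
        return counted == 0 ? 0.0 : (double) errors / counted;
    }

    public static void main(String[] args) {
        List<Tree> trees = List.of(
            new Tree(0, x -> 0),
            new Tree(1, x -> x % 2),
            new Tree(2, x -> x % 2));
        // Partition 0's instances are classified only by the trees of partitions 1 and 2.
        System.out.println(oobError(trees, 0, new int[] {1, 2, 3}, new int[] {1, 0, 0}));
    }
}
```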

 
Copyright © 2009 Apache Software Foundation - Mahout



Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.