Class Summary
Class | Description
InterResults | Stores and loads the intermediate results of step 1 that are needed by step 2. This class should not be needed outside of the partial package, so all its methods are protected.
PartialBuilder | Builds a random forest using partial data.
Step0Job | Preparation step of the partial MapReduce builder.
Step0Job.Step0Mapper | Outputs the first key and the size of the partition.
Step0Job.Step0Output | Output of the step 0 mappers.
Step1Mapper | First step of the Partial Data Builder.
Step2Job | Second step of the partial MapReduce builder.
Step2Mapper | Second step of PartialBuilder.
TreeID | Indicates both the tree and the data partition used to grow the tree.
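Step0Job.Step0Mapper is described as emitting the first key and the size of its partition. The following is a minimal sketch of why that information can be useful. It assumes that partitions are contiguous ranges of the input and that a partition's first key is the byte offset where it starts. Under those assumptions, sorting the per-partition results by first key and keeping a running sum of sizes gives the global index of each partition's first instance. The names PartitionOffsets, Step0Result, and firstIndices are hypothetical and are not part of this package.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: turns per-partition (firstKey, size) pairs into global start indices. */
public final class PartitionOffsets {

    /** Hypothetical stand-in for what a step-0 mapper emits for one partition. */
    record Step0Result(int partition, long firstKey, int size) {}

    /**
     * Returns, indexed by partition id (assumed to be 0..n-1), the global index of the
     * partition's first instance. Assumes partitions are contiguous, so ordering them
     * by first key restores input order.
     */
    static int[] firstIndices(Step0Result[] results) {
        Step0Result[] sorted = results.clone();
        Arrays.sort(sorted, Comparator.comparingLong(Step0Result::firstKey));
        int[] first = new int[results.length];
        int running = 0;
        for (Step0Result r : sorted) {
            first[r.partition()] = running; // number of instances in all earlier partitions
            running += r.size();
        }
        return first;
    }

    public static void main(String[] args) {
        Step0Result[] results = {
            new Step0Result(0, 0L, 100),
            new Step0Result(2, 9000L, 50),
            new Step0Result(1, 4200L, 70),
        };
        System.out.println(Arrays.toString(firstIndices(results))); // prints [0, 100, 170]
    }
}
```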
Package org.apache.mahout.df.mapreduce.partial Description
Partial-data mapreduce implementation of Random Decision Forests
The builder splits the data among the mappers using a FileInputSplit. Building the forest and estimating the out-of-bag (OOB) error takes two job steps.
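Each tree is grown from exactly one partition, so any tree passed from the first step to the second must carry the identity of that partition. The TreeID class listed above plays this role. The following hypothetical sketch shows one simple way to do it and is not TreeID's actual encoding: it packs the partition and tree index into a single long. The bound MAX_TREE_ID on the tree index is an assumed value, and PackedTreeId is an illustrative name.

```java
/**
 * Hypothetical sketch of a key that records both a partition and a tree index.
 * MAX_TREE_ID is an assumed upper bound on tree indices, not a value taken from this package.
 */
public final class PackedTreeId {
    static final long MAX_TREE_ID = 100_000L;

    private final long value;

    PackedTreeId(int partition, int treeId) {
        if (partition < 0 || treeId < 0 || treeId >= MAX_TREE_ID) {
            throw new IllegalArgumentException("partition or tree id out of range");
        }
        // partition is widened to long before multiplying, so no int overflow here
        this.value = partition * MAX_TREE_ID + treeId;
    }

    long value() { return value; }

    /** Recovers the partition whose data grew the tree. */
    int partition() { return (int) (value / MAX_TREE_ID); }

    /** Recovers the tree index. */
    int treeId() { return (int) (value % MAX_TREE_ID); }

    public static void main(String[] args) {
        PackedTreeId id = new PackedTreeId(3, 42);
        // prints "300042 -> partition 3, tree 42"
        System.out.println(id.value() + " -> partition " + id.partition() + ", tree " + id.treeId());
    }
}
```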
In the first step, each mapper grows a number of trees using its partition's data: it loads the data instances in its map() function, then builds the trees in the close() method. It uses the reference implementation's code to build each tree and estimate the OOB error.
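The map()/close() split described above can be sketched without Hadoop as a class that only buffers instances in map() and does all the tree growing in close(). The sketch makes two assumptions that the text does not state. First, it divides the requested total of trees as evenly as possible among the mappers, with the first mappers taking the remainder. Second, it grows each tree on a bootstrap sample (bagging) of the partition. The caller-supplied builder function stands in for the reference implementation's tree builder. Step1Sketch and treesForPartition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/**
 * Hypothetical sketch of the step-1 pattern: buffer instances in map(), grow trees in close().
 * The builder function is a stand-in for the real tree-building code.
 */
public final class Step1Sketch<I, T> {
    private final List<I> instances = new ArrayList<>();
    private final int numTrees;
    private final Function<List<I>, T> builder;
    private final Random rng;

    Step1Sketch(int numMappers, int totalTrees, int partition,
                Function<List<I>, T> builder, long seed) {
        this.numTrees = treesForPartition(numMappers, totalTrees, partition);
        this.builder = builder;
        this.rng = new Random(seed);
    }

    /** Assumed scheme: the first (totalTrees % numMappers) mappers each grow one extra tree. */
    static int treesForPartition(int numMappers, int totalTrees, int partition) {
        int base = totalTrees / numMappers;
        return base + (partition < totalTrees % numMappers ? 1 : 0);
    }

    /** Called once per input instance; it only buffers the instance. */
    void map(I instance) {
        instances.add(instance);
    }

    /** Called after the last instance; grows each tree on a bootstrap sample of the partition. */
    List<T> close() {
        List<T> trees = new ArrayList<>();
        for (int t = 0; t < numTrees; t++) {
            List<I> bag = new ArrayList<>(instances.size());
            for (int i = 0; i < instances.size(); i++) {
                bag.add(instances.get(rng.nextInt(instances.size()))); // sample with replacement
            }
            trees.add(builder.apply(bag));
        }
        return trees;
    }

    public static void main(String[] args) {
        // Stand-in "tree": just the size of the bag it was grown on.
        Step1Sketch<Integer, Integer> mapper = new Step1Sketch<>(4, 10, 0, List::size, 42L);
        for (int x = 0; x < 20; x++) {
            mapper.map(x);
        }
        System.out.println(mapper.close()); // prints [20, 20, 20]: partition 0 of 4 grows 3 of 10 trees
    }
}
```

Deferring all the work to close() means each mapper sees its whole partition before it grows any tree. That is why the partition has to fit in the mapper's memory under this design.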
The second step is needed when estimating the OOB error. Each mapper loads all the trees that do not belong to its own partition (that is, trees that were not built from the partition's data) and uses them to classify the partition's data instances. The data instances are loaded in the map() method, and the classification is performed in the close() method.
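The exclusion rule above can be sketched as follows. Each tree is tagged with the partition that grew it, which is the information TreeID carries. A mapper then counts how many of its own instances a majority vote over the trees from the other partitions misclassifies. The sketch assumes integer class labels 0..numLabels-1 and breaks ties toward the lowest label. It also replaces the map()/close() lifecycle with a single method call. Step2Sketch, TaggedTree, and oobErrors are hypothetical names.

```java
import java.util.List;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch of the step-2 idea: classify a partition's own instances
 * using only the trees that were grown on other partitions.
 */
public final class Step2Sketch {

    /** A tree tagged with the partition whose data grew it. */
    record TaggedTree(int partition, ToIntFunction<double[]> classifier) {}

    /**
     * Counts the instances of {@code partition} that a majority vote over the
     * out-of-partition trees misclassifies. Ties (including no votes) go to the lowest label.
     */
    static int oobErrors(int partition, List<TaggedTree> forest,
                         List<double[]> instances, int[] labels, int numLabels) {
        int errors = 0;
        for (int i = 0; i < instances.size(); i++) {
            int[] votes = new int[numLabels];
            for (TaggedTree tree : forest) {
                if (tree.partition() != partition) { // skip trees grown on this partition's data
                    votes[tree.classifier().applyAsInt(instances.get(i))]++;
                }
            }
            int predicted = 0;
            for (int l = 1; l < numLabels; l++) {
                if (votes[l] > votes[predicted]) {
                    predicted = l;
                }
            }
            if (predicted != labels[i]) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<TaggedTree> forest = List.of(
            new TaggedTree(0, x -> x[0] > 0.5 ? 1 : 0),
            new TaggedTree(1, x -> x[0] > 0.2 ? 1 : 0),
            new TaggedTree(1, x -> x[0] > 0.8 ? 1 : 0),
            new TaggedTree(2, x -> 1));
        List<double[]> data = List.of(new double[] {0.1}, new double[] {0.6}, new double[] {0.9});
        int[] labels = {0, 0, 1};
        // prints "OOB errors for partition 0: 1 / 3"
        System.out.println("OOB errors for partition 0: "
            + oobErrors(0, forest, data, labels, 2) + " / " + data.size());
    }
}
```

Under these assumptions, you would get an overall OOB error rate by summing the per-partition counts and dividing by the total number of instances.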
Copyright © 2008-2010
The Apache Software Foundation. All Rights Reserved.