Class Summary | |
---|---|
TestClassifier | Tests the Naive Bayes classifier with improved weighting. To run the Twenty Newsgroups example, see http://cwiki.apache.org/MAHOUT/twentynewsgroups.html |
TrainClassifier | Trains the Naive Bayes classifier with improved weighting. To run the Twenty Newsgroups example, see http://cwiki.apache.org/MAHOUT/twentynewsgroups.html |
The implementation is divided into three parts: the trainer, the model, and the classifier.
The trainer is implemented by several classes:
org.apache.mahout.classifier.bayes.BayesDriver -- Creates the Hadoop Naive Bayes job and outputs the model. This driver encapsulates many intermediate Map-Reduce classes.
org.apache.mahout.classifier.bayes.common.BayesFeatureDriver
org.apache.mahout.classifier.bayes.common.BayesTfIdfDriver
org.apache.mahout.classifier.bayes.common.BayesWeightSummerDriver
org.apache.mahout.classifier.bayes.BayesThetaNormalizerDriver
The trainer expects its input in KeyValueTextInputFormat, i.e. the first token of each line is the label, separated from the remaining tokens on the line by a tab. The remaining tokens are the unique features (words). Thus, input documents might look like:
    hockey	puck stick goalie forward defenseman referee ice checking slapshot helmet
    football	field football pigskin referee helmet turf tackle
where hockey and football are the labels and the remaining words are the features associated with those particular labels.
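The line format described above can be illustrated with a small parser. This is a minimal sketch using only the JDK; the class LabeledLine and its parse method are hypothetical names introduced for illustration and are not part of Mahout or Hadoop. It assumes the first tab on the line separates the label from the features and that features are separated by whitespace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Mahout class): one training line split into label and features. */
final class LabeledLine {
    final String label;
    final List<String> features;

    private LabeledLine(String label, List<String> features) {
        this.label = label;
        this.features = features;
    }

    /** Parses "label<TAB>feature feature ..."; the first tab separates the label from the features. */
    static LabeledLine parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("Expected a tab between label and features: " + line);
        }
        String label = line.substring(0, tab).trim();
        String rest = line.substring(tab + 1).trim();
        // Assumption: features are whitespace-separated; an empty remainder yields no features.
        List<String> features = rest.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(Arrays.asList(rest.split("\\s+")));
        return new LabeledLine(label, features);
    }
}
```

For example, LabeledLine.parse("hockey\tpuck stick goalie") yields the label hockey and the three features puck, stick, and goalie. A line without a tab is rejected rather than silently treated as an unlabeled document.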
The output from the trainer is a SequenceFile.
The org.apache.mahout.classifier.bayes.BayesModel is the data structure used to represent the results of the training for use by the org.apache.mahout.classifier.bayes.BayesClassifier. A Model can be created by hand, or, if using the org.apache.mahout.classifier.bayes.BayesDriver, it can be created from the SequenceFile that is output. To create it from the SequenceFile, use the SequenceFileModelReader located in the io subpackage.
The org.apache.mahout.classifier.bayes.BayesClassifier is responsible for using an org.apache.mahout.classifier.bayes.BayesModel to classify documents into categories.
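To make the relationship between a trained model and a classifier concrete, here is a toy in-memory sketch. It is not Mahout's BayesModel or BayesClassifier and does not reproduce the improved weighting mentioned in the class summary; it assumes a plain multinomial Naive Bayes with add-one smoothing, and the class name ToyNaiveBayes and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy multinomial Naive Bayes; illustrative only, not Mahout's implementation. */
final class ToyNaiveBayes {
    // The "model": per-label feature counts plus the totals needed for smoothing and priors.
    private final Map<String, Map<String, Integer>> featureCounts = new HashMap<>();
    private final Map<String, Integer> totalFeatures = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labeled document (a list of feature tokens) to the model. */
    void train(String label, List<String> features) {
        Map<String, Integer> counts = featureCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features) {
            counts.merge(f, 1, Integer::sum);
            totalFeatures.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    /** Returns the label with the highest log-probability, or null if nothing was trained. */
    String classify(List<String> features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training documents carrying this label.
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = featureCounts.get(label);
            int total = totalFeatures.getOrDefault(label, 0);
            for (String f : features) {
                int c = counts.getOrDefault(f, 0);
                // Add-one (Laplace) smoothing so unseen features do not zero out the score.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Trained on the two example documents above (hockey and football), this sketch assigns a document containing puck and ice to hockey, and one containing turf and tackle to football. Working in log space avoids numeric underflow when documents contain many features.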