org.apache.mahout.classifier.bayes.mapreduce.common
Class BayesFeatureMapper
java.lang.Object
org.apache.hadoop.mapred.MapReduceBase
org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureMapper
- All Implemented Interfaces:
- java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>
public class BayesFeatureMapper
- extends org.apache.hadoop.mapred.MapReduceBase
- implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>
Reads the input training set (preprocessed using the BayesFileFormatter).
Method Summary
void configure(org.apache.hadoop.mapred.JobConf job)
void map(org.apache.hadoop.io.Text key,
org.apache.hadoop.io.Text value,
org.apache.hadoop.mapred.OutputCollector<StringTuple,org.apache.hadoop.io.DoubleWritable> output,
org.apache.hadoop.mapred.Reporter reporter)
- We need to count the number of times we've seen a term with a given label and we need to output that.
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.io.Closeable
close
BayesFeatureMapper
public BayesFeatureMapper()
map
public void map(org.apache.hadoop.io.Text key,
org.apache.hadoop.io.Text value,
org.apache.hadoop.mapred.OutputCollector<StringTuple,org.apache.hadoop.io.DoubleWritable> output,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- We need to count the number of times we've seen a term with a given label and we need to output that.
However, this Mapper does more than just output the count. First, it performs weight normalisation.
Second, for each unique word in a document it outputs the value 1, which is summed up as the term
document frequency and later used to calculate the IDF. Third, for each label it outputs the number
of times a document was seen (also used in the IDF calculation).
- Specified by:
map
in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>
- Parameters:
key - The label
value - The features (all unique) associated with this label, in StringTuple format
output - The OutputCollector to write the results to
reporter - Not used
- Throws:
java.io.IOException
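The three kinds of output described for map can be illustrated with a minimal, Hadoop-free sketch. It is not the Mahout implementation: the class name FeatureMapperSketch, the method mapDocument, the comma-joined string keys (standing in for StringTuple) and the key prefixes WEIGHT, DOCUMENT_FREQUENCY and LABEL_COUNT are all hypothetical. The normalisation formula, log(1 + count) divided by the Euclidean length of the document's count vector, is likewise an assumption chosen only to make "weight normalisation" concrete.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureMapperSketch {

    // Hypothetical helper: builds one (key, value) output pair.
    private static Map.Entry<String, Double> pair(String key, double value) {
        return new AbstractMap.SimpleImmutableEntry<String, Double>(key, value);
    }

    // Emits the pairs a mapper of this kind might produce for one labelled
    // document. Keys are plain strings here, not Mahout's StringTuple.
    public static List<Map.Entry<String, Double>> mapDocument(String label, String[] tokens) {
        // Count how often each token occurs in this document.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Assumed normalisation: Euclidean length of the raw count vector.
        double sumOfSquares = 0.0;
        for (int c : counts.values()) {
            sumOfSquares += (double) c * c;
        }
        double norm = Math.sqrt(sumOfSquares);

        List<Map.Entry<String, Double>> out = new ArrayList<>();

        // 1) One normalised weight per (label, term).
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double weight = Math.log(1.0 + e.getValue()) / norm;
            out.add(pair("WEIGHT," + label + "," + e.getKey(), weight));
        }

        // 2) The value 1 per unique term, to be summed into a document frequency.
        for (String term : counts.keySet()) {
            out.add(pair("DOCUMENT_FREQUENCY," + label + "," + term, 1.0));
        }

        // 3) The value 1 per document, to be summed into a per-label document count.
        out.add(pair("LABEL_COUNT," + label, 1.0));
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"ball", "goal", "ball"};
        for (Map.Entry<String, Double> e : mapDocument("sports", tokens)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A downstream reducer would then sum the values that share a key, which is why the document-frequency and label-count outputs are emitted as 1 rather than as totals.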
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
- Overrides:
configure
in class org.apache.hadoop.mapred.MapReduceBase
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.