org.apache.mahout.classifier.bayes.mapreduce.common
Class BayesFeatureMapper

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureMapper
All Implemented Interfaces:
Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>

public class BayesFeatureMapper
extends org.apache.hadoop.mapred.MapReduceBase
implements org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>

Reads the input training set (preprocessed using the BayesFileFormatter).


Constructor Summary
BayesFeatureMapper()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
           
 void map(org.apache.hadoop.io.Text key, org.apache.hadoop.io.Text value, org.apache.hadoop.mapred.OutputCollector<StringTuple,org.apache.hadoop.io.DoubleWritable> output, org.apache.hadoop.mapred.Reporter reporter)
          Counts the number of times a term has been seen with a given label and outputs that count.
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.io.Closeable
close
 

Constructor Detail

BayesFeatureMapper

public BayesFeatureMapper()

Method Detail

map

public void map(org.apache.hadoop.io.Text key,
                org.apache.hadoop.io.Text value,
                org.apache.hadoop.mapred.OutputCollector<StringTuple,org.apache.hadoop.io.DoubleWritable> output,
                org.apache.hadoop.mapred.Reporter reporter)
         throws IOException
Counts the number of times a term has been seen with a given label and outputs that count. This Mapper does more than output the count, however. First, it performs weight normalisation. Second, for each unique word in a document it outputs the value 1, to be summed into the term document frequency, which is later used to calculate the IDF. Third, for each label it outputs the number of times a document was seen (also used in the IDF calculation).

Specified by:
map in interface org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,StringTuple,org.apache.hadoop.io.DoubleWritable>
Parameters:
key - The label
value - The features (all unique) associated with this label, in StringTuple format
output - The OutputCollector to write the results to
reporter - Not used
Throws:
IOException
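
The three kinds of output described for map can be illustrated with a small standalone sketch that does not depend on Hadoop or Mahout. It is not the actual implementation. The class name BayesFeatureSketch, the tag strings, the "|"-joined key layout, and the use of whitespace-free token lists are all hypothetical. The sketch also assumes that "weight normalisation" means dividing each term count by the Euclidean length of the document's count vector; the formula Mahout really applies may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BayesFeatureSketch {

    // Hypothetical record tags; the real mapper's key layout is not shown on this page.
    static final String WEIGHT = "WEIGHT";
    static final String DOCUMENT_FREQUENCY = "DOCUMENT_FREQUENCY";
    static final String LABEL_COUNT = "LABEL_COUNT";

    /** Returns the records one document would emit, keyed by "TAG|label|term". */
    static Map<String, Double> emit(String label, List<String> tokens) {
        // Count how often each term occurs in this document.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }

        // Assumed normalisation: Euclidean length of the count vector.
        double sumOfSquares = 0.0;
        for (int count : counts.values()) {
            sumOfSquares += (double) count * count;
        }
        double norm = Math.sqrt(sumOfSquares);

        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String suffix = "|" + label + "|" + e.getKey();
            // 1) Normalised weight of the term under this label.
            out.put(WEIGHT + suffix, e.getValue() / norm);
            // 2) One per unique term, summed downstream into a document frequency.
            out.put(DOCUMENT_FREQUENCY + suffix, 1.0);
        }
        // 3) One per document, summed downstream into a per-label document count.
        out.put(LABEL_COUNT + "|" + label, 1.0);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> out = emit("sports", Arrays.asList("ball", "goal", "ball"));
        out.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

In a real job each entry would be written to the OutputCollector as a StringTuple key with a DoubleWritable value, and a reducer would sum the values that share a key.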

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class org.apache.hadoop.mapred.MapReduceBase


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.