org.apache.mahout.utils.nlp.collocations.llr
Class CollocReducer
java.lang.Object
org.apache.hadoop.mapred.MapReduceBase
org.apache.mahout.utils.nlp.collocations.llr.CollocReducer
- All Implemented Interfaces:
- java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<GramKey,Gram,Gram,Gram>
public class CollocReducer
- extends org.apache.hadoop.mapred.MapReduceBase
- implements org.apache.hadoop.mapred.Reducer<GramKey,Gram,Gram,Gram>
Reducer for Pass 1 of the collocation identification job. Generates counts for ngrams and subgrams.
Method Summary
- void configure(org.apache.hadoop.mapred.JobConf job)
- protected void processSubgram(GramKey key, java.util.Iterator<Gram> values, org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output, org.apache.hadoop.mapred.Reporter reporter)
  Sums frequencies for the subgram and its ngrams, and delivers (ngram, subgram) pairs to the collector.
- protected void processUnigram(GramKey key, java.util.Iterator<Gram> values, org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output, org.apache.hadoop.mapred.Reporter reporter)
  Sums frequencies for unigrams and delivers them to the collector.
- void reduce(GramKey key, java.util.Iterator<Gram> values, org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output, org.apache.hadoop.mapred.Reporter reporter)
  Collocation finder, pass 1 reduce phase: sums the gram frequencies received from the mapper and outputs them for the LLR calculation.
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.io.Closeable
close
MIN_SUPPORT
public static final java.lang.String MIN_SUPPORT
- See Also:
- Constant Field Values
DEFAULT_MIN_SUPPORT
public static final int DEFAULT_MIN_SUPPORT
- See Also:
- Constant Field Values
CollocReducer
public CollocReducer()
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
- Overrides:
configure
in class org.apache.hadoop.mapred.MapReduceBase
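This page does not say how configure uses MIN_SUPPORT and DEFAULT_MIN_SUPPORT, nor does it show their values. Their names suggest a minimum-frequency threshold read from the job configuration, with a fallback default. The sketch below works under that assumption only: java.util.Properties stands in for JobConf (whose Hadoop Configuration base class offers getInt(name, defaultValue) for the same purpose), and the key "example.min.support", the class MinSupportConfig and both helper methods are hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of reading a minimum-support threshold during configure().
 * java.util.Properties stands in for JobConf; the key used by callers is a placeholder.
 */
final class MinSupportConfig {

    private MinSupportConfig() {
        // static helpers only
    }

    /**
     * Returns the integer stored under key, or defaultValue when the key is absent.
     * A present but non-numeric value raises NumberFormatException.
     */
    static int readMinSupport(Properties conf, String key, int defaultValue) {
        String raw = conf.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    /** Assumed use of the threshold: keep a gram only if its summed frequency reaches it. */
    static boolean meetsMinSupport(long frequency, int minSupport) {
        return frequency >= minSupport;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("example.min.support", "3");
        int minSupport = readMinSupport(conf, "example.min.support", 2);
        System.out.println(minSupport);                     // prints 3
        System.out.println(meetsMinSupport(2, minSupport)); // prints false
    }
}
```

Falling back to the default only when the key is missing, and failing loudly on a malformed value, keeps a typo in the job configuration from silently changing the threshold.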
reduce
public void reduce(GramKey key,
java.util.Iterator<Gram> values,
org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Collocation finder, pass 1 reduce phase.
Given the following input from the mapper:
  k:head_subgram,ngram v:ngram:partial freq
  k:head_subgram v:head_subgram:partial freq
  k:tail_subgram,ngram v:ngram:partial freq
  k:tail_subgram v:tail_subgram:partial freq
  k:unigram v:unigram:partial freq
the reducer sums the gram frequencies and outputs them for the LLR calculation. The output is:
  k:ngram:ngramfreq v:head_subgram:head_subgramfreq
  k:ngram:ngramfreq v:tail_subgram:tail_subgramfreq
  k:unigram:unigramfreq v:unigram:unigramfreq
Each ngram's frequency is essentially counted twice, once for the head and once for the tail, even though the frequency is the same for both. Could this be fixed by counting only for the head and moving the count into the value?
- Specified by:
reduce
in interface org.apache.hadoop.mapred.Reducer<GramKey,Gram,Gram,Gram>
- Throws:
java.io.IOException
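The input/output contract of reduce can be checked against a small in-memory reference. This is a hypothetical sketch, not Mahout code: it assumes every ngram occurrence contributes one count to the ngram, one to its head subgram and one to its tail subgram, takes each occurrence as a {head, tail} pair, and renders records as strings in the k:/v: notation above instead of Gram objects. Unigram records are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory reference for the ngram records pass 1 is documented to emit.
 * Each occurrence is {headSubgram, tailSubgram}; the ngram text is their space-joined form.
 */
final class Pass1Reference {

    private Pass1Reference() {
    }

    static List<String> expectedNgramRecords(List<String[]> occurrences) {
        Map<String, Long> ngramFreq = new LinkedHashMap<>();
        Map<String, Long> headFreq = new LinkedHashMap<>();
        Map<String, Long> tailFreq = new LinkedHashMap<>();
        Map<String, String[]> partsOf = new LinkedHashMap<>();

        // Count every occurrence once for the ngram, once for its head and once for its tail.
        for (String[] occurrence : occurrences) {
            String head = occurrence[0];
            String tail = occurrence[1];
            String ngram = head + " " + tail;
            ngramFreq.merge(ngram, 1L, Long::sum);
            headFreq.merge(head, 1L, Long::sum);
            tailFreq.merge(tail, 1L, Long::sum);
            partsOf.putIfAbsent(ngram, occurrence);
        }

        List<String> records = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ngramFreq.entrySet()) {
            String[] parts = partsOf.get(entry.getKey());
            // The same "ngram:ngramfreq" key appears in both records: the double count described for reduce().
            String key = "k:" + entry.getKey() + ":" + entry.getValue();
            records.add(key + " v:" + parts[0] + ":" + headFreq.get(parts[0]));
            records.add(key + " v:" + parts[1] + ":" + tailFreq.get(parts[1]));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> occurrences = List.of(
                new String[] {"new", "york"},
                new String[] {"new", "york"},
                new String[] {"new", "deal"});
        expectedNgramRecords(occurrences).forEach(System.out::println);
    }
}
```

For the three occurrences in main, this prints k:new york:2 v:new:3, k:new york:2 v:york:2, k:new deal:1 v:new:3 and k:new deal:1 v:deal:1, one per line, which makes the repeated ngram frequency easy to see.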
processUnigram
protected void processUnigram(GramKey key,
java.util.Iterator<Gram> values,
org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Sums frequencies for unigrams and delivers them to the collector.
- Throws:
java.io.IOException
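A minimal sketch of the unigram case, assuming the values carry plain long partial frequencies instead of Gram objects. The class UnigramSum and its methods are hypothetical, and the string record only mirrors the k:unigram:unigramfreq v:unigram:unigramfreq notation from the reduce description.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the unigram case, with long partial counts standing in for Gram values. */
final class UnigramSum {

    private UnigramSum() {
    }

    /** Adds up every partial frequency delivered under one unigram key. */
    static long sumFrequencies(Iterator<Long> partialFrequencies) {
        long total = 0;
        while (partialFrequencies.hasNext()) {
            total += partialFrequencies.next();
        }
        return total;
    }

    /** Renders the output record in the "k:unigram:unigramfreq v:unigram:unigramfreq" notation. */
    static String outputRecord(String unigram, long frequency) {
        return "k:" + unigram + ":" + frequency + " v:" + unigram + ":" + frequency;
    }

    public static void main(String[] args) {
        long frequency = sumFrequencies(List.of(3L, 1L, 2L).iterator());
        System.out.println(outputRecord("york", frequency)); // prints k:york:6 v:york:6
    }
}
```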
processSubgram
protected void processSubgram(GramKey key,
java.util.Iterator<Gram> values,
org.apache.hadoop.mapred.OutputCollector<Gram,Gram> output,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Sums frequencies for the subgram and its ngrams, and delivers (ngram, subgram) pairs to the collector.
The sort order guarantees that the subgram/subgram pairs are seen first, followed by the subgram/ngram1 pairs, subgram/ngram2 pairs, ..., subgram/ngramN pairs, so ngram frequencies can be calculated here as well.
As a result, ngram frequencies are calculated once for each subgram (head and tail), which is some extra work.
- Throws:
java.io.IOException
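The single pass that this sort-order guarantee allows can be sketched as follows. Value is a hypothetical stand-in for Gram, output lines are strings rather than collector calls, and the sketch assumes, as the description implies, that all partial counts for one ngram arrive in one contiguous run after the subgram's own counts.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical single-pass sketch of subgram processing.
 * Assumes values arrive sorted: the subgram's own partial counts first,
 * then each ngram's partial counts in one contiguous run.
 */
final class SubgramStream {

    /** Stand-in for Gram: a text, whether it is the subgram itself, and a partial count. */
    static final class Value {
        final String text;
        final boolean isSubgram;
        final long frequency;

        Value(String text, boolean isSubgram, long frequency) {
            this.text = text;
            this.isSubgram = isSubgram;
            this.frequency = frequency;
        }
    }

    private SubgramStream() {
    }

    /** Returns one "ngram:ngramFreq -> subgram:subgramFreq" line per distinct ngram. */
    static List<String> process(Iterator<Value> values) {
        List<String> out = new ArrayList<>();
        String subgram = null;
        long subgramFreq = 0;
        String currentNgram = null;
        long ngramFreq = 0;

        while (values.hasNext()) {
            Value v = values.next();
            if (v.isSubgram) {
                // Sort order puts these before any ngram, so the total is final before the first emit.
                subgram = v.text;
                subgramFreq += v.frequency;
            } else if (v.text.equals(currentNgram)) {
                // Another partial count for the ngram being summed.
                ngramFreq += v.frequency;
            } else {
                // A new ngram begins, so the previous ngram's count is complete: emit it.
                if (currentNgram != null) {
                    out.add(pair(currentNgram, ngramFreq, subgram, subgramFreq));
                }
                currentNgram = v.text;
                ngramFreq = v.frequency;
            }
        }
        // Emit the last ngram, if any.
        if (currentNgram != null) {
            out.add(pair(currentNgram, ngramFreq, subgram, subgramFreq));
        }
        return out;
    }

    private static String pair(String ngram, long ngramFreq, String subgram, long subgramFreq) {
        return ngram + ":" + ngramFreq + " -> " + subgram + ":" + subgramFreq;
    }

    public static void main(String[] args) {
        List<Value> values = List.of(
                new Value("new", true, 3),
                new Value("new", true, 1),
                new Value("new deal", false, 1),
                new Value("new york", false, 2),
                new Value("new york", false, 1));
        process(values.iterator()).forEach(System.out::println);
    }
}
```

Because the subgram total is complete before the first ngram arrives, each ngram can be emitted as soon as the next one begins, so only one ngram's running count is held at a time. For the values in main, this prints new deal:1 -> new:4 and then new york:3 -> new:4.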
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.