org.apache.mahout.utils.nlp.collocations.llr
Class LLRReducer
java.lang.Object
org.apache.hadoop.mapred.MapReduceBase
org.apache.mahout.utils.nlp.collocations.llr.LLRReducer
- All Implemented Interfaces:
- java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
public class LLRReducer
- extends org.apache.hadoop.mapred.MapReduceBase
- implements org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
Reducer for pass 2 of the collocation discovery job. Collects ngram and sub-ngram frequencies and performs
the log-likelihood ratio (LLR) calculation.
Nested Class Summary
static class
LLRReducer.ConcreteLLCallback
Concrete implementation that delegates to the LogLikelihood class.
static interface
LLRReducer.LLCallback
Interface that allows the input to the LLR calculation to be captured for validation in unit tests.
static class
LLRReducer.Skipped
Counter to track why a particular entry was skipped.
Method Summary
void
configure(org.apache.hadoop.mapred.JobConf job)
void
reduce(Gram ngram,
java.util.Iterator<Gram> values,
org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable> output,
org.apache.hadoop.mapred.Reporter reporter)
Performs the LLR calculation. Input is: k:ngram:ngramFreq, v:(h_|t_)subgram:subgramFreq, N = ngram total.
Each ngram has 2 subgrams, a head and a tail, referred to as A and B respectively below.
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.io.Closeable
close
NGRAM_TOTAL
public static final java.lang.String NGRAM_TOTAL
- See Also:
- Constant Field Values
MIN_LLR
public static final java.lang.String MIN_LLR
- See Also:
- Constant Field Values
DEFAULT_MIN_LLR
public static final float DEFAULT_MIN_LLR
- See Also:
- Constant Field Values
LLRReducer
public LLRReducer()
configure
public void configure(org.apache.hadoop.mapred.JobConf job)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
- Overrides:
configure
in class org.apache.hadoop.mapred.MapReduceBase
reduce
public void reduce(Gram ngram,
java.util.Iterator<Gram> values,
org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable> output,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Performs the LLR calculation. Input is: k:ngram:ngramFreq, v:(h_|t_)subgram:subgramFreq, N = ngram total.
Each ngram has 2 subgrams, a head and a tail, referred to as A and B respectively below.
A+ B: number of times A and B appear together: ngramFreq
A+!B: number of times A appears without B: hSubgramFreq - ngramFreq
!A+ B: number of times B appears without A: tSubgramFreq - ngramFreq
!A+!B: number of times neither A nor B appears (in that order): N - (hSubgramFreq + tSubgramFreq - ngramFreq)
- Specified by:
reduce
in interface org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
- Throws:
java.io.IOException
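The four counts listed in the reduce description form a 2x2 contingency table. The sketch below shows one way that table could be built from ngramFreq, the head and tail subgram frequencies, and N, and then scored. It is illustrative only: the class LlrSketch and its methods are hypothetical names, and the entropy-based form of the log-likelihood ratio (natural logarithms) is a common formulation that is assumed here, not taken from LLRReducer or LogLikelihood.

```java
// Hypothetical standalone sketch; not part of Mahout.
final class LlrSketch {

    // x * ln(x), with the usual convention that 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: sum*ln(sum) - sum of c*ln(c).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // Builds {A+B, A+!B, !A+B, !A+!B} from the frequencies named in the description.
    static long[] contingency(long ngramFreq, long hSubgramFreq, long tSubgramFreq, long n) {
        long k11 = ngramFreq;                                     // A and B together
        long k12 = hSubgramFreq - ngramFreq;                      // A without B
        long k21 = tSubgramFreq - ngramFreq;                      // B without A
        long k22 = n - (hSubgramFreq + tSubgramFreq - ngramFreq); // neither A nor B
        if (k11 < 0 || k12 < 0 || k21 < 0 || k22 < 0) {
            throw new IllegalArgumentException("inconsistent frequencies");
        }
        return new long[] {k11, k12, k21, k22};
    }

    // Log-likelihood ratio of a 2x2 table, clamped at 0 to absorb rounding error.
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Made-up counts: the bigram occurs 5 times, its head 20, its tail 10, N = 100.
        long[] t = contingency(5, 20, 10, 100);
        System.out.println(llr(t[0], t[1], t[2], t[3]));
    }
}
```

A score near zero indicates that the head and tail co-occur about as often as independence would predict. Larger scores indicate a stronger association, which is presumably what the MIN_LLR threshold filters on.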
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.