org.apache.mahout.utils.nlp.collocations.llr
Class LLRReducer

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by org.apache.mahout.utils.nlp.collocations.llr.LLRReducer
All Implemented Interfaces:
java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>

public class LLRReducer
extends org.apache.hadoop.mapred.MapReduceBase
implements org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>

Reducer for pass 2 of the collocation discovery job. Collects ngram and sub-ngram frequencies and performs the log-likelihood ratio (LLR) calculation.


Nested Class Summary
static class LLRReducer.ConcreteLLCallback
          Concrete implementation that delegates to the LogLikelihood class.
static interface LLRReducer.LLCallback
          Interface that allows the input to the LLR calculation to be captured for validation in unit tests.
static class LLRReducer.Skipped
          Counter to track why a particular entry was skipped.
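
The callback arrangement summarized above can be illustrated with a minimal, self-contained sketch. Every name in it (CallbackSketch, LlrCallback, CapturingCallback, Scorer) is hypothetical and chosen for illustration; the sketch does not reproduce the actual signatures of LLRReducer.LLCallback or LLRReducer.ConcreteLLCallback, and it assumes the callback receives the four contingency counts as long values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CallbackSketch {

    /** Hypothetical callback: receives the four contingency counts and returns a score. */
    interface LlrCallback {
        double logLikelihoodRatio(long k11, long k12, long k21, long k22);
    }

    /** Test double: records every set of inputs it receives and returns a fixed score. */
    static final class CapturingCallback implements LlrCallback {
        final List<long[]> calls = new ArrayList<>();

        @Override
        public double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
            calls.add(new long[] {k11, k12, k21, k22});
            return 1.0; // placeholder score, not a real LLR value
        }
    }

    /** Hypothetical caller: never computes the score itself, only delegates to the callback. */
    static final class Scorer {
        private final LlrCallback callback;

        Scorer(LlrCallback callback) {
            this.callback = callback;
        }

        double score(long k11, long k12, long k21, long k22) {
            return callback.logLikelihoodRatio(k11, k12, k21, k22);
        }
    }

    public static void main(String[] args) {
        CapturingCallback captured = new CapturingCallback();
        new Scorer(captured).score(10, 20, 10, 960);
        // A unit test can now inspect exactly what was handed to the calculation.
        System.out.println(Arrays.toString(captured.calls.get(0)));
    }
}
```

Injecting the callback keeps the scoring math out of the caller, so a test can assert on the captured counts without depending on floating-point results.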
 
Field Summary
static float DEFAULT_MIN_LLR
           
static java.lang.String MIN_LLR
           
static java.lang.String NGRAM_TOTAL
           
 
Constructor Summary
LLRReducer()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
           
 void reduce(Gram ngram, java.util.Iterator<Gram> values, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable> output, org.apache.hadoop.mapred.Reporter reporter)
          Performs the LLR calculation. Input is k:ngram:ngramFreq and v:(h_|t_)subgram:subgramFreq, with N = ngram total. Each ngram has two subgrams, a head and a tail, referred to as A and B respectively.
 
Methods inherited from class org.apache.hadoop.mapred.MapReduceBase
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface java.io.Closeable
close
 

Field Detail

NGRAM_TOTAL

public static final java.lang.String NGRAM_TOTAL
See Also:
Constant Field Values

MIN_LLR

public static final java.lang.String MIN_LLR
See Also:
Constant Field Values

DEFAULT_MIN_LLR

public static final float DEFAULT_MIN_LLR
See Also:
Constant Field Values
Constructor Detail

LLRReducer

public LLRReducer()
Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class org.apache.hadoop.mapred.MapReduceBase

reduce

public void reduce(Gram ngram,
                   java.util.Iterator<Gram> values,
                   org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable> output,
                   org.apache.hadoop.mapred.Reporter reporter)
            throws java.io.IOException
Performs the LLR calculation. The input is:

k:ngram:ngramFreq
v:(h_|t_)subgram:subgramFreq
N = ngram total

Each ngram has two subgrams, a head and a tail, referred to as A and B respectively below.

A+B: number of times A and B appear together: ngramFreq
A+!B: number of times A appears without B: hSubgramFreq - ngramFreq
!A+B: number of times B appears without A: tSubgramFreq - ngramFreq
!A+!B: number of times neither A nor B appears (in that order): N - (hSubgramFreq + tSubgramFreq - ngramFreq)

Specified by:
reduce in interface org.apache.hadoop.mapred.Reducer<Gram,Gram,org.apache.hadoop.io.Text,org.apache.hadoop.io.DoubleWritable>
Throws:
java.io.IOException
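
As an illustration of the contingency counts described for reduce, the following sketch derives the four cells from ngramFreq, the head and tail subgram frequencies, and N, then scores them with the standard G-squared form of the log-likelihood ratio. It is a standalone approximation under stated assumptions, not the code of LLRReducer or of the LogLikelihood class: the class and method names (LlrSketch, contingency, llr) are hypothetical, and rejecting inconsistent counts with an exception is a choice made for the sketch only.

```java
import java.util.Arrays;

final class LlrSketch {

    /** Builds {A+B, A+!B, !A+B, !A+!B} from the frequencies described above. */
    static long[] contingency(long ngramFreq, long headFreq, long tailFreq, long ngramTotal) {
        long k11 = ngramFreq;                                       // A+B
        long k12 = headFreq - ngramFreq;                            // A+!B
        long k21 = tailFreq - ngramFreq;                            // !A+B
        long k22 = ngramTotal - (headFreq + tailFreq - ngramFreq);  // !A+!B
        if (k11 < 0 || k12 < 0 || k21 < 0 || k22 < 0) {
            // Assumption for this sketch: inconsistent counts are rejected outright.
            throw new IllegalArgumentException("negative cell in contingency table");
        }
        return new long[] {k11, k12, k21, k22};
    }

    /** One cell's contribution: k * ln(k * n / (row * col)), taken as 0 when k is 0. */
    private static double term(long k, long row, long col, long n) {
        if (k == 0) {
            return 0.0;
        }
        return k * Math.log(((double) k * n) / ((double) row * col));
    }

    /** G-squared statistic: twice the sum of the four cell contributions. */
    static double llr(long k11, long k12, long k21, long k22) {
        long row1 = k11 + k12;
        long row2 = k21 + k22;
        long col1 = k11 + k21;
        long col2 = k12 + k22;
        long n = row1 + row2;
        return 2.0 * (term(k11, row1, col1, n) + term(k12, row1, col2, n)
                + term(k21, row2, col1, n) + term(k22, row2, col2, n));
    }

    public static void main(String[] args) {
        // Example: the ngram occurs 10 times, its head 30 times, its tail 20 times, N = 1000.
        long[] k = contingency(10, 30, 20, 1000);
        System.out.println(Arrays.toString(k));
        System.out.println(llr(k[0], k[1], k[2], k[3]));
    }
}
```

A table whose cells match what independence predicts, such as four equal counts, scores 0; higher scores indicate that the head and tail co-occur more (or less) often than chance would suggest.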


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.