Package org.apache.mahout.math.stats.entropy

Class Summary
CalculateEntropyMapper Calculates the partial entropy term x * log(x) for each value x.
CalculateEntropyReducer Subtracts the partial entropy.
CalculateSpecificConditionalEntropyMapper Drops the key.
ConditionalEntropy A Hadoop job to compute the conditional entropy H(Value|Key) for a sequence file.
DoubleSumReducer Analog of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer which sums the double values.
Entropy A Hadoop job to compute the entropy of keys or values in a SequenceFile.
GroupAndCountByKeyAndValueMapper Groups the input by key and value.
InformationGain Calculates the information gain for a SequenceFile.
InformationGainRatio A job to calculate the normalized information gain.
KeyCounterMapper Emits the key and the count of 1 as VarIntWritable.
SpecificConditionalEntropyMapper Converts the key from StringTuple with values [key, value] to Text with value key.
SpecificConditionalEntropyReducer Does the weighted conditional entropy calculation with

H(values|key) = -p(key) * sum_i(p(values_i|key) * log_2(p(values_i|key)))
              = p(key) * (log_2(|key|) - sum_i(values_i * log_2(values_i)) / |key|)
              = (sum * log_2(sum) - sum_i(values_i * log_2(values_i))) / n
              = (sum * log(sum) - sum_i(values_i * log(values_i))) / (n * log(2))

WITH sum = sum_i(values_i) = |key|, p(key) = sum / n and p(values_i|key) = values_i / sum

ValueCounterMapper Emits the value and the count of 1 as VarIntWritable.
VarIntSumReducer The analog of org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer which uses VarIntWritable.
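
The formulas in the summary above can be checked on small in-memory counts. The following is a minimal sketch in plain Java, not the Mahout implementation: the class EntropySketch and its methods are hypothetical names, it uses no Hadoop types, and it assumes that n is the total number of entries and that the information gain is the value entropy minus the conditional entropy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a class in org.apache.mahout.math.stats.entropy.
final class EntropySketch {

  private static final double LOG_2 = Math.log(2.0);

  // Partial term x * log(x), using the convention 0 * log(0) = 0.
  static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Entropy in bits from raw counts c_i with n = sum_i(c_i):
  // H = (log(n) - sum_i(c_i * log(c_i)) / n) / log(2)
  static double entropy(long[] counts) {
    long n = 0;
    double partial = 0.0;
    for (long c : counts) {
      n += c;
      partial += xLogX(c);
    }
    if (n == 0) {
      return 0.0; // no observations: define the entropy as 0
    }
    return (Math.log(n) - partial / n) / LOG_2;
  }

  // Conditional entropy H(value|key) in bits. Each key contributes
  // (sum * log(sum) - sum_i(values_i * log(values_i))) / (n * log(2)),
  // where sum is the number of entries for that key and n the overall total.
  static double conditionalEntropy(Map<String, long[]> valueCountsByKey) {
    long n = 0;
    for (long[] counts : valueCountsByKey.values()) {
      for (long c : counts) {
        n += c;
      }
    }
    if (n == 0) {
      return 0.0;
    }
    double total = 0.0;
    for (long[] counts : valueCountsByKey.values()) {
      long sum = 0;
      double partial = 0.0;
      for (long c : counts) {
        sum += c;
        partial += xLogX(c);
      }
      total += (xLogX(sum) - partial) / (n * LOG_2);
    }
    return total;
  }

  public static void main(String[] args) {
    // Assumed toy data: key "a" has two values seen twice each,
    // key "b" has a single value seen four times.
    Map<String, long[]> countsByKey = new HashMap<>();
    countsByKey.put("a", new long[] {2, 2});
    countsByKey.put("b", new long[] {4});

    // Totals per value across all keys (first value: 2 + 4, second value: 2).
    long[] valueCounts = {6, 2};

    double h = entropy(valueCounts);
    double hCond = conditionalEntropy(countsByKey);
    System.out.println("H(value)     = " + h);
    System.out.println("H(value|key) = " + hCond);
    System.out.println("gain         = " + (h - hCond));
  }
}
```

For the toy data, key "a" contributes 0.5 * 1 bit and key "b" contributes 0.5 * 0 bits, so H(value|key) is 0.5 bits up to floating-point rounding.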

Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.