org.apache.mahout.math.hadoop.similarity.vector
Class AbstractDistributedVectorSimilarity

java.lang.Object
  extended by org.apache.mahout.math.hadoop.similarity.vector.AbstractDistributedVectorSimilarity
All Implemented Interfaces:
DistributedVectorSimilarity
Direct Known Subclasses:
DistributedEuclideanDistanceVectorSimilarity, DistributedLoglikelihoodVectorSimilarity, DistributedPearsonCorrelationVectorSimilarity, DistributedTanimotoCoefficientVectorSimilarity, DistributedUncenteredCosineVectorSimilarity, DistributedUncenteredZeroAssumingCosineVectorSimilarity

public abstract class AbstractDistributedVectorSimilarity
extends java.lang.Object
implements DistributedVectorSimilarity

Abstract base implementation of DistributedVectorSimilarity.


Constructor Summary
AbstractDistributedVectorSimilarity()
           
 
Method Summary
protected static int countElements(java.lang.Iterable<?> iterable)
          computes the number of elements in the Iterable
protected static int countElements(java.util.Iterator<?> iterator)
          computes the number of elements in the Iterator
protected abstract  double doComputeResult(int rowA, int rowB, java.lang.Iterable<Cooccurrence> cooccurrences, double weightOfVectorA, double weightOfVectorB, int numberOfColumns)
          do the actual similarity computation
 double similarity(int rowA, int rowB, java.lang.Iterable<Cooccurrence> cooccurrences, double weightOfVectorA, double weightOfVectorB, int numberOfColumns)
          ensures that the computed similarity is in [-1,1]
 double weight(Vector v)
          returns NaN by default, meaning that vectors have no weight; subclasses may override this
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AbstractDistributedVectorSimilarity

public AbstractDistributedVectorSimilarity()
Method Detail

similarity

public final double similarity(int rowA,
                               int rowB,
                               java.lang.Iterable<Cooccurrence> cooccurrences,
                               double weightOfVectorA,
                               double weightOfVectorB,
                               int numberOfColumns)
ensures that the computed similarity is in [-1,1]

Specified by:
similarity in interface DistributedVectorSimilarity
Parameters:
rowA - offset of the first row
rowB - offset of the second row
cooccurrences - all column entries where both vectors have a nonzero entry
weightOfVectorA - the result of DistributedVectorSimilarity.weight(Vector) for the first row vector
weightOfVectorB - the result of DistributedVectorSimilarity.weight(Vector) for the second row vector
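
The contract described above (a final similarity method that guarantees a result in [-1,1], backed by an abstract doComputeResult) can be pictured as a template method. The following is a minimal sketch, not the Mahout source: the class names are hypothetical, the signature is reduced to the two row offsets, and clamping out-of-range values (rather than, say, rejecting them) is an assumption.

```java
// Hypothetical sketch, not the Mahout source: a final template method that
// bounds whatever value the subclass computes. Clamping is an assumption.
abstract class BoundedSimilaritySketch {

  // Subclasses supply the raw, possibly out-of-range, similarity value.
  protected abstract double doComputeResult(int rowA, int rowB);

  // Final, so that no subclass can bypass the range check.
  public final double similarity(int rowA, int rowB) {
    double result = doComputeResult(rowA, rowB);
    // Clamp to [-1,1]; a NaN result fails both comparisons and passes through.
    if (result < -1.0) {
      return -1.0;
    }
    if (result > 1.0) {
      return 1.0;
    }
    return result;
  }
}

// Hypothetical subclass whose raw result slightly overshoots 1.0,
// as can happen through floating-point rounding.
class OvershootingSimilaritySketch extends BoundedSimilaritySketch {
  @Override
  protected double doComputeResult(int rowA, int rowB) {
    return 1.0000000002;
  }
}
```

Keeping the public method final and the computation in a protected abstract method means every subclass gets the same range guarantee without having to repeat the check.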

countElements

protected static int countElements(java.lang.Iterable<?> iterable)
computes the number of elements in the Iterable


countElements

protected static int countElements(java.util.Iterator<?> iterator)
computes the number of elements in the Iterator
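
As an illustration of what the two countElements overloads describe, here is a minimal sketch under assumptions. The class name CountElementsSketch is hypothetical, and the actual Mahout implementation may differ; the sketch only shows one straightforward way to count by iterating.

```java
// Hypothetical sketch, not the Mahout source.
final class CountElementsSketch {

  private CountElementsSketch() {}

  // Counts by obtaining a fresh iterator from the Iterable.
  static int countElements(java.lang.Iterable<?> iterable) {
    return countElements(iterable.iterator());
  }

  // Counts by exhausting the iterator; it cannot be reused afterwards.
  static int countElements(java.util.Iterator<?> iterator) {
    int count = 0;
    while (iterator.hasNext()) {
      iterator.next();
      count++;
    }
    return count;
  }
}
```

Note that counting an Iterator consumes it, so a caller that also needs the elements should count via the Iterable or obtain a second iterator.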


doComputeResult

protected abstract double doComputeResult(int rowA,
                                          int rowB,
                                          java.lang.Iterable<Cooccurrence> cooccurrences,
                                          double weightOfVectorA,
                                          double weightOfVectorB,
                                          int numberOfColumns)
Performs the actual similarity computation; concrete subclasses implement this method.

See Also:
DistributedVectorSimilarity.similarity(int, int, Iterable, double, double, int)

weight

public double weight(Vector v)
Returns NaN by default, meaning that vectors have no weight; subclasses may override this.

Specified by:
weight in interface DistributedVectorSimilarity
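
To illustrate the default-and-override behaviour of weight, the sketch below uses a plain double[] in place of Mahout's Vector type. Both class names are hypothetical, and the choice of the Euclidean norm as an overriding weight is an assumption made for the example, not a statement about any particular Mahout subclass.

```java
// Hypothetical sketch: double[] stands in for Mahout's Vector type.
class DefaultWeightSketch {

  // Default behaviour: vectors have no weight, signalled by NaN.
  public double weight(double[] v) {
    return Double.NaN;
  }
}

// Hypothetical subclass that overrides the default with the Euclidean norm.
class NormWeightSketch extends DefaultWeightSketch {

  @Override
  public double weight(double[] v) {
    double sumOfSquares = 0.0;
    for (double x : v) {
      sumOfSquares += x * x;
    }
    return Math.sqrt(sumOfSquares);
  }
}
```

Because NaN never compares equal to itself, code that needs to detect the default should use Double.isNaN(weight) rather than ==.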


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.