org.apache.mahout.clustering.meanshift
Class MeanShiftCanopyClusterer

java.lang.Object
  extended by org.apache.mahout.clustering.meanshift.MeanShiftCanopyClusterer

public class MeanShiftCanopyClusterer
extends java.lang.Object


Constructor Summary
MeanShiftCanopyClusterer(DistanceMeasure aMeasure, double aT1, double aT2, double aDelta)
           
MeanShiftCanopyClusterer(org.apache.hadoop.mapred.JobConf job)
           
 
Method Summary
 boolean closelyBound(MeanShiftCanopy canopy, Vector point)
          Return whether the point is closely covered by the canopy.
static java.util.List<MeanShiftCanopy> clusterPoints(java.util.List<Vector> points, DistanceMeasure measure, double convergenceThreshold, double t1, double t2, int numIter)
          This is the reference mean-shift implementation.
 void config(DistanceMeasure aMeasure, double aT1, double aT2, double aDelta)
          Configure the Canopy for unit tests
 void configure(org.apache.hadoop.mapred.JobConf job)
          Configure the Canopy and its distance measure
 double getT1()
           
 double getT2()
           
 void mergeCanopy(MeanShiftCanopy aCanopy, java.util.List<MeanShiftCanopy> canopies)
          Merge the given canopy into the canopies list.
static boolean runMeanShiftCanopyIteration(java.util.List<MeanShiftCanopy> canopies, MeanShiftCanopyClusterer clusterer)
          Perform a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.
 boolean shiftToMean(MeanShiftCanopy canopy)
          Shift the center to the new centroid of the cluster
 void testReferenceImplementation()
          Story: User can exercise the reference implementation to verify that the test data points are clustered in a reasonable manner.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MeanShiftCanopyClusterer

public MeanShiftCanopyClusterer(org.apache.hadoop.mapred.JobConf job)

MeanShiftCanopyClusterer

public MeanShiftCanopyClusterer(DistanceMeasure aMeasure,
                                double aT1,
                                double aT2,
                                double aDelta)
Method Detail

getT1

public double getT1()

getT2

public double getT2()

configure

public void configure(org.apache.hadoop.mapred.JobConf job)
Configure the Canopy and its distance measure

Parameters:
job - the JobConf for this job

config

public void config(DistanceMeasure aMeasure,
                   double aT1,
                   double aT2,
                   double aDelta)
Configure the Canopy for unit tests

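Parameters:
aMeasure - the DistanceMeasure to use
aT1 - the T1 distance threshold
aT2 - the T2 distance threshold
aDelta - the convergence criterion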
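For illustration, the four config arguments can be modeled as one immutable settings object. This is a minimal sketch, not Mahout's API: the class MeanShiftSettings, the use of double[] in place of Vector, the Euclidean helper standing in for a DistanceMeasure, and the check that T1 is larger than T2 (a convention borrowed from canopy clustering) are all assumptions.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical holder for the arguments of config(measure, t1, t2, delta); not Mahout's API. */
final class MeanShiftSettings {
    final ToDoubleBiFunction<double[], double[]> measure; // stands in for DistanceMeasure
    final double t1;    // larger distance threshold (assumed)
    final double t2;    // smaller distance threshold (assumed)
    final double delta; // convergence criterion

    MeanShiftSettings(ToDoubleBiFunction<double[], double[]> measure,
                      double t1, double t2, double delta) {
        if (measure == null) {
            throw new IllegalArgumentException("measure is required");
        }
        // Assumption borrowed from canopy clustering: 0 < T2 < T1.
        if (!(t2 > 0 && t1 > t2)) {
            throw new IllegalArgumentException("expected 0 < T2 < T1");
        }
        if (!(delta > 0)) {
            throw new IllegalArgumentException("delta must be positive");
        }
        this.measure = measure;
        this.t1 = t1;
        this.t2 = t2;
        this.delta = delta;
    }

    /** Euclidean distance, used here only as an example measure. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Usage might look like new MeanShiftSettings(MeanShiftSettings::euclidean, 4.0, 1.0, 0.5); swapping the T1 and T2 values makes the constructor throw an IllegalArgumentException.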

mergeCanopy

public void mergeCanopy(MeanShiftCanopy aCanopy,
                        java.util.List<MeanShiftCanopy> canopies)
Merge the given canopy into the canopies list. If it touches any existing canopy (norm
Parameters:
aCanopy - a MeanShiftCanopy to be merged
canopies - the List<MeanShiftCanopy> to be appended to
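A minimal sketch of a merge step, assuming that two canopies "touch" when their centers are closer than T1 (each then records the other's center for its next mean shift) and that a canopy closer than T2 to existing canopies is folded into the closest of them; otherwise it is appended to the list. SketchCanopy, CanopyMerger, the Euclidean distance, and the double[] centers are hypothetical stand-ins for MeanShiftCanopy, DistanceMeasure, and Vector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified canopy: a center plus the ids of the points it holds. */
final class SketchCanopy {
    final double[] center;
    final List<Integer> boundPoints = new ArrayList<>();
    final List<double[]> touched = new ArrayList<>(); // centers seen within T1 during this pass

    SketchCanopy(double[] center, int pointId) {
        this.center = center.clone();
        boundPoints.add(pointId);
    }
}

/** Hypothetical merge step using Euclidean distance and thresholds t1 > t2. */
final class CanopyMerger {
    private final double t1;
    private final double t2;

    CanopyMerger(double t1, double t2) {
        this.t1 = t1;
        this.t2 = t2;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    void mergeCanopy(SketchCanopy aCanopy, List<SketchCanopy> canopies) {
        SketchCanopy closest = null;
        double closestNorm = Double.MAX_VALUE;
        for (SketchCanopy canopy : canopies) {
            double norm = distance(canopy.center, aCanopy.center);
            if (norm < t1) {
                // Touching: each canopy records the other's center for its next mean shift.
                aCanopy.touched.add(canopy.center.clone());
                canopy.touched.add(aCanopy.center.clone());
            }
            if (norm < t2 && norm < closestNorm) {
                // Covered: remember the closest covering canopy.
                closest = canopy;
                closestNorm = norm;
            }
        }
        if (closest == null) {
            canopies.add(aCanopy); // not covered by any canopy: keep it as a new one
        } else {
            closest.boundPoints.addAll(aCanopy.boundPoints); // fold its points into the closest
        }
    }
}
```

Feeding points at (0, 0), (0.5, 0) and (2, 0) with T1 = 3 and T2 = 1 leaves two canopies: the second point is merged into the first, while the third only touches it.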

shiftToMean

public boolean shiftToMean(MeanShiftCanopy canopy)
Shift the center to the new centroid of the cluster

Parameters:
canopy - the canopy to shift.
Returns:
true if the cluster has converged
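One way to picture shiftToMean, as a sketch under stated assumptions: the new center is the plain average of the neighboring centers gathered during the pass, and the cluster counts as converged when the center moves less than delta (the convergence criterion passed to config). MeanShiftStep and its signature are hypothetical, and Mahout's canopies may weight the centers differently.

```java
import java.util.List;

/** Hypothetical mean-shift step on double[] centers; not Mahout's API. */
final class MeanShiftStep {
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Moves center (in place) to the average of neighborCenters, which the caller
     * is assumed to have collected (including the center itself), and returns
     * true if the center moved less than delta.
     */
    static boolean shiftToMean(double[] center, List<double[]> neighborCenters, double delta) {
        if (neighborCenters.isEmpty()) {
            return true; // nothing to average, so the center does not move
        }
        double[] mean = new double[center.length];
        for (double[] c : neighborCenters) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += c[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= neighborCenters.size();
        }
        double moved = distance(center, mean);
        System.arraycopy(mean, 0, center, 0, center.length);
        return moved < delta;
    }
}
```

A center at (0, 0) with neighbors at (0, 0), (2, 0) and (4, 0) moves to (2, 0); because it moved 2.0, it is not converged for delta = 0.5.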

closelyBound

public boolean closelyBound(MeanShiftCanopy canopy,
                            Vector point)
Return whether the point is closely covered by the canopy.

Parameters:
canopy - a canopy.
point - a Vector point
Returns:
true if the point is covered
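As a sketch, closelyBound can be read as a strict distance test against a threshold. Using T2 as that threshold, the Euclidean distance, and the CoverageCheck class are assumptions made for illustration only.

```java
/** Hypothetical coverage test; assumes T2 is the "closely bound" radius. */
final class CoverageCheck {
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static boolean closelyBound(double[] center, double[] point, double t2) {
        return euclidean(center, point) < t2; // strict: a point exactly at T2 is not covered
    }
}
```

With the center at the origin, the point (3, 4) lies at distance 5, so it is covered for T2 = 6 but not for T2 = 5.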

testReferenceImplementation

public void testReferenceImplementation()
Story: User can exercise the reference implementation to verify that the test data points are clustered in a reasonable manner.


clusterPoints

public static java.util.List<MeanShiftCanopy> clusterPoints(java.util.List<Vector> points,
                                                            DistanceMeasure measure,
                                                            double convergenceThreshold,
                                                            double t1,
                                                            double t2,
                                                            int numIter)
This is the reference mean-shift implementation. Given its inputs, it iterates over the points and clusters until their centers converge or until the maximum number of iterations is exceeded.

Parameters:
points - the input List of points
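measure - the DistanceMeasure to use
convergenceThreshold - the convergence criterion for the cluster centers
t1 - the T1 distance threshold
t2 - the T2 distance threshold
numIter - the maximum number of iterations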
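The control flow described for clusterPoints can be sketched as a driver that seeds one center per input point and repeats a single iteration until it reports convergence or numIter passes have run. ReferenceDriver, its Iteration interface, and the Result holder are hypothetical; the iteration itself is passed in so that the loop logic stands on its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver for the reference loop; not Mahout's API. */
final class ReferenceDriver {
    /** One pass over the centers; returns true once every center has converged. */
    interface Iteration {
        boolean run(List<double[]> centers);
    }

    /** Final centers, the number of passes actually run, and whether they converged. */
    static final class Result {
        final List<double[]> centers;
        final int iterations;
        final boolean converged;

        Result(List<double[]> centers, int iterations, boolean converged) {
            this.centers = centers;
            this.iterations = iterations;
            this.converged = converged;
        }
    }

    static Result clusterPoints(List<double[]> points, int numIter, Iteration iteration) {
        List<double[]> centers = new ArrayList<>();
        for (double[] p : points) {
            centers.add(p.clone()); // each input point seeds its own center
        }
        int iterations = 0;
        boolean converged = false;
        // Stop on convergence or once the iteration budget is used up.
        while (!converged && iterations < numIter) {
            converged = iteration.run(centers);
            iterations++;
        }
        return new Result(centers, iterations, converged);
    }
}
```

Keeping the stopping rule in the driver means the same loop serves any single-iteration step, and the returned Result makes it visible whether a run stopped because it converged or because it hit numIter.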

runMeanShiftCanopyIteration

public static boolean runMeanShiftCanopyIteration(java.util.List<MeanShiftCanopy> canopies,
                                                  MeanShiftCanopyClusterer clusterer)
Perform a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.

Parameters:
canopies - the List of MeanShiftCanopy clusters
clusterer - the MeanShiftCanopyClusterer to use
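A simplified, self-contained sketch of one iteration, assuming a flat kernel: every center moves to the unweighted mean of all centers within T1, the pass reports convergence only if no center moved by delta or more, and centers that end up within T2 of an already-kept canopy are merged into the closest one. SingleIteration and its Canopy class are hypothetical, and the details may differ from Mahout's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single pass of a simplified mean-shift canopy algorithm. */
final class SingleIteration {
    static final class Canopy {
        double[] center;
        final List<Integer> points = new ArrayList<>();

        Canopy(double[] center, int pointId) {
            this.center = center.clone();
            points.add(pointId);
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Shifts, then merges; returns true if no center moved by delta or more. Assumes t1 > 0. */
    static boolean runIteration(List<Canopy> canopies, double t1, double t2, double delta) {
        boolean converged = true;
        // 1. Compute every new center from the old centers (flat kernel of radius T1).
        List<double[]> shifted = new ArrayList<>();
        for (Canopy c : canopies) {
            double[] mean = new double[c.center.length];
            int n = 0;
            for (Canopy other : canopies) {
                if (distance(c.center, other.center) < t1) { // includes c itself
                    for (int i = 0; i < mean.length; i++) {
                        mean[i] += other.center[i];
                    }
                    n++;
                }
            }
            for (int i = 0; i < mean.length; i++) {
                mean[i] /= n;
            }
            if (distance(c.center, mean) >= delta) {
                converged = false;
            }
            shifted.add(mean);
        }
        // 2. Apply the shifts and merge each canopy into the closest kept one within T2.
        List<Canopy> kept = new ArrayList<>();
        for (int k = 0; k < canopies.size(); k++) {
            Canopy c = canopies.get(k);
            c.center = shifted.get(k);
            Canopy closest = null;
            double best = Double.MAX_VALUE;
            for (Canopy candidate : kept) {
                double d = distance(candidate.center, c.center);
                if (d < t2 && d < best) {
                    best = d;
                    closest = candidate;
                }
            }
            if (closest == null) {
                kept.add(c);
            } else {
                closest.points.addAll(c.points);
            }
        }
        canopies.clear();
        canopies.addAll(kept);
        return converged;
    }
}
```

Computing all shifts from the old centers before merging keeps the result independent of list order during the shift phase; a driver would call runIteration repeatedly until it returns true.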


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.