org.apache.mahout.clustering.canopy
Class CanopyClusterer

java.lang.Object
  extended by org.apache.mahout.clustering.canopy.CanopyClusterer

public class CanopyClusterer
extends java.lang.Object


Constructor Summary
CanopyClusterer(org.apache.hadoop.conf.Configuration config)
           
CanopyClusterer(DistanceMeasure measure, double t1, double t2)
           
 
Method Summary
 void addPointToCanopies(Vector point, java.util.Collection<Canopy> canopies)
          Adds a point to the existing canopies, using the same algorithm as the reference but inverted to iterate over the existing canopies instead of the points.
 boolean canopyCovers(Canopy canopy, Vector point)
          Returns true if the point is covered by the canopy
 void config(DistanceMeasure aMeasure, double aT1, double aT2)
          Configure the Canopy for unit tests
 void configure(org.apache.hadoop.conf.Configuration configuration)
          Configure the Canopy and its distance measure
static java.util.List<Canopy> createCanopies(java.util.List<Vector> points, DistanceMeasure measure, double t1, double t2)
          Iterate through the points, adding new canopies.
 void emitPointToClosestCanopy(Vector point, java.lang.Iterable<Canopy> canopies, org.apache.hadoop.mapreduce.Mapper.Context context)
          Emit the point to the closest Canopy
protected  Canopy findClosestCanopy(Vector point, java.lang.Iterable<Canopy> canopies)
           
static java.util.List<Vector> getCenters(java.lang.Iterable<Canopy> canopies)
          Iterate through the canopies, adding their centroids to a list
static void updateCentroids(java.lang.Iterable<Canopy> canopies)
          Iterate through the canopies, resetting their center to their centroids
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CanopyClusterer

public CanopyClusterer(DistanceMeasure measure,
                       double t1,
                       double t2)

CanopyClusterer

public CanopyClusterer(org.apache.hadoop.conf.Configuration config)
Method Detail

configure

public void configure(org.apache.hadoop.conf.Configuration configuration)
Configure the Canopy and its distance measure

Parameters:
configuration - the Hadoop Configuration for this job

config

public void config(DistanceMeasure aMeasure,
                   double aT1,
                   double aT2)
Configure the Canopy for unit tests


addPointToCanopies

public void addPointToCanopies(Vector point,
                               java.util.Collection<Canopy> canopies)
This is the same algorithm as the reference, but inverted to iterate over the existing canopies instead of the points. Because of this, it does not need to store the points themselves. Each canopy instead stores a running total vector of its points and the number of points, from which its centroid can be computed.

This method is used by the CanopyMapper, CanopyReducer and CanopyDriver.

Parameters:
point - the point to be added
canopies - the Collection of canopies to be appended to
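
The inverted approach described above can be illustrated with a minimal, self-contained sketch. It does not use the Mahout classes: InvertedCanopySketch, SketchCanopy, observe, centroid and the Euclidean distance helper are hypothetical stand-ins, and the sketch assumes the usual canopy convention that T1 is the loose threshold and T2 the tight one (T1 > T2). The actual method may differ in detail.

```java
import java.util.List;

class InvertedCanopySketch {
    // Hypothetical stand-in for a canopy: a fixed center plus a running total and count.
    static final class SketchCanopy {
        final double[] center;
        final double[] pointTotal;
        int numPoints;

        SketchCanopy(double[] firstPoint) {
            center = firstPoint.clone();
            pointTotal = firstPoint.clone();
            numPoints = 1;
        }

        // Fold a point into the running total instead of storing it.
        void observe(double[] p) {
            for (int i = 0; i < p.length; i++) {
                pointTotal[i] += p[i];
            }
            numPoints++;
        }

        // The centroid is recovered from the total and the count.
        double[] centroid() {
            double[] c = new double[pointTotal.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = pointTotal[i] / numPoints;
            }
            return c;
        }
    }

    // Assumption: plain Euclidean distance stands in for a DistanceMeasure.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Iterate over the existing canopies for one incoming point.
    static void addPointToCanopies(double[] point, List<SketchCanopy> canopies,
                                   double t1, double t2) {
        boolean stronglyBound = false;
        for (SketchCanopy canopy : canopies) {
            double dist = euclidean(canopy.center, point);
            if (dist < t1) {
                canopy.observe(point);   // within the loose threshold: counts toward this canopy
            }
            if (dist < t2) {
                stronglyBound = true;    // within the tight threshold: no new canopy needed
            }
        }
        if (!stronglyBound) {
            canopies.add(new SketchCanopy(point)); // the point seeds a new canopy
        }
    }
}
```

Calling the sketch once per point, in arrival order, builds the canopy collection in a single pass over the data, which is why no point needs to be retained after it has been observed.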

emitPointToClosestCanopy

public void emitPointToClosestCanopy(Vector point,
                                     java.lang.Iterable<Canopy> canopies,
                                     org.apache.hadoop.mapreduce.Mapper.Context context)
                              throws java.io.IOException,
                                     java.lang.InterruptedException
Emit the point to the closest Canopy

Throws:
java.io.IOException
java.lang.InterruptedException

findClosestCanopy

protected Canopy findClosestCanopy(Vector point,
                                   java.lang.Iterable<Canopy> canopies)
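
Neither emitPointToClosestCanopy nor findClosestCanopy documents how "closest" is determined. As a hedged illustration only, the sketch below assumes it means the canopy whose center has the smallest distance to the point. ClosestCanopySketch, findClosest and the Euclidean helper are hypothetical names, centers are plain double arrays rather than Canopy objects, and the emit step (writing to the Mapper.Context) is left out because the output key and value types are not documented here.

```java
import java.util.List;

class ClosestCanopySketch {
    // Assumption: plain Euclidean distance stands in for a DistanceMeasure.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the nearest center, or -1 if there are no centers.
    static int findClosest(double[] point, List<double[]> centers) {
        int best = -1;
        double minDist = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double dist = euclidean(centers.get(i), point);
            if (dist < minDist) {
                minDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

Under this reading every point is assigned to exactly one canopy, even when it lies within the T1 distance of several.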

canopyCovers

public boolean canopyCovers(Canopy canopy,
                            Vector point)
Returns true if the point is covered by the canopy.

Parameters:
canopy - a canopy
point - a point
Returns:
true if the point is covered by the canopy

createCanopies

public static java.util.List<Canopy> createCanopies(java.util.List<Vector> points,
                                                    DistanceMeasure measure,
                                                    double t1,
                                                    double t2)
Iterate through the points, adding new canopies. Return the canopies.

Parameters:
points - a list defining the points to be clustered
measure - a DistanceMeasure to use
t1 - the T1 distance threshold
t2 - the T2 distance threshold
Returns:
the List of canopies created
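
The point-driven iteration can be sketched as follows. This is a self-contained illustration, not the Mahout implementation: ReferenceCanopySketch and its Euclidean helper are hypothetical, canopies are represented as plain lists of member points with the seeding point first, and the sketch assumes T1 is the loose threshold and T2 the tight one (T1 > T2). It works on a copy of the input list, which is a choice made for the sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ReferenceCanopySketch {
    // Assumption: plain Euclidean distance stands in for a DistanceMeasure.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Each returned canopy is a list of member points; element 0 is the seeding point.
    static List<List<double[]>> createCanopies(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points); // leave the caller's list untouched
        List<List<double[]>> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next remaining point as the center of a new canopy.
            double[] center = remaining.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] candidate = it.next();
                double dist = euclidean(center, candidate);
                if (dist < t1) {
                    members.add(candidate); // loosely bound: joins this canopy
                }
                if (dist < t2) {
                    it.remove();            // tightly bound: cannot seed a later canopy
                }
            }
            canopies.add(members);
        }
        return canopies;
    }
}
```

A point that falls between T2 and T1 of a center joins that canopy but stays in the remaining list, so it can still seed or join a later canopy. This is what allows canopies to overlap.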

getCenters

public static java.util.List<Vector> getCenters(java.lang.Iterable<Canopy> canopies)
Iterate through the canopies, adding their centroids to a list

Parameters:
canopies - an Iterable of canopies
Returns:
the List of centroids

updateCentroids

public static void updateCentroids(java.lang.Iterable<Canopy> canopies)
Iterate through the canopies, resetting their center to their centroids

Parameters:
canopies - an Iterable of canopies
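
The difference between collecting centroids (getCenters) and resetting centers to centroids (updateCentroids) can be shown with a small self-contained sketch. CentroidSketch and its Cluster class are hypothetical stand-ins for Canopy, assuming each canopy keeps a current center together with a running total vector and a point count.

```java
import java.util.ArrayList;
import java.util.List;

class CentroidSketch {
    // Hypothetical canopy stand-in: a current center plus a running total and count.
    static final class Cluster {
        double[] center;
        final double[] pointTotal;
        final int numPoints;

        Cluster(double[] center, double[] pointTotal, int numPoints) {
            this.center = center;
            this.pointTotal = pointTotal;
            this.numPoints = numPoints;
        }

        // Centroid = running total divided by the number of points.
        double[] computeCentroid() {
            double[] c = new double[pointTotal.length];
            for (int i = 0; i < c.length; i++) {
                c[i] = pointTotal[i] / numPoints;
            }
            return c;
        }
    }

    // Collects the centroids without modifying the clusters.
    static List<double[]> getCenters(Iterable<Cluster> clusters) {
        List<double[]> result = new ArrayList<>();
        for (Cluster cluster : clusters) {
            result.add(cluster.computeCentroid());
        }
        return result;
    }

    // Replaces each cluster's center with its centroid, in place.
    static void updateCentroids(Iterable<Cluster> clusters) {
        for (Cluster cluster : clusters) {
            cluster.center = cluster.computeCentroid();
        }
    }
}
```

In this sketch the first method is read-only, while the second mutates every cluster it visits.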


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.