java.lang.Object
  org.apache.mahout.clustering.kmeans.KMeansClusterer
public class KMeansClusterer
This class implements the k-means clustering algorithm. It uses Cluster as its cluster representation. The class can be used as part of a clustering job that is started as a map/reduce job.
Constructor Summary | |
---|---|
KMeansClusterer(DistanceMeasure measure)
Initializes the k-means clusterer with the distance measure to use for comparison. |
Method Summary | |
---|---|
protected void |
addPointToNearestCluster(Vector point,
Iterable<Cluster> clusters)
Sequential implementation that adds a point to the nearest cluster. |
static List<List<Cluster>> |
clusterPoints(Iterable<Vector> points,
List<Cluster> clusters,
DistanceMeasure measure,
int maxIter,
double distanceThreshold)
This is the reference k-means implementation. |
boolean |
computeConvergence(Cluster cluster,
double distanceThreshold)
|
void |
emitPointToNearestCluster(Vector point,
Iterable<Cluster> clusters,
org.apache.hadoop.mapreduce.Mapper.Context context)
Iterates over all clusters and identifies the one closest to the given point. |
protected void |
emitPointToNearestCluster(Vector point,
Iterable<Cluster> clusters,
org.apache.hadoop.io.SequenceFile.Writer writer)
Iterates over all clusters and identifies the one closest to the given point. |
void |
outputPointWithClusterInfo(Vector vector,
Iterable<Cluster> clusters,
org.apache.hadoop.mapreduce.Mapper.Context context)
|
protected static boolean |
runKMeansIteration(Iterable<Vector> points,
Iterable<Cluster> clusters,
DistanceMeasure measure,
double distanceThreshold)
Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete. |
protected boolean |
testConvergence(Iterable<Cluster> clusters,
double distanceThreshold)
Sequential implementation that tests convergence and updates cluster centers. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public KMeansClusterer(DistanceMeasure measure)
Parameters:
measure - The distance measure to use for comparing clusters against points.
Method Detail |
---|
public void emitPointToNearestCluster(Vector point, Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
Parameters:
point - a point to find a cluster for.
clusters - a List of clusters
Throws:
IOException
InterruptedException
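As a rough illustration of the nearest-cluster search that emitPointToNearestCluster describes, here is a minimal, self-contained sketch. It does not use Mahout: points and cluster centers are plain double[] arrays, Euclidean distance stands in for the configured DistanceMeasure, and the names NearestClusterSketch, distance and nearestIndex are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Mahout.
class NearestClusterSketch {

    /** Euclidean distance between two points of equal dimension (assumed measure). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the center closest to the point, or -1 if there are no centers. */
    static int nearestIndex(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(point, centers.get(i));
            if (d < bestDistance) { // strict '<' keeps the first center on ties
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(
            new double[] {0.0, 0.0},
            new double[] {10.0, 10.0});
        // The point (9, 8) is closer to the second center, so this prints 1.
        System.out.println(nearestIndex(new double[] {9.0, 8.0}, centers));
    }
}
```

In the documented class the winning cluster is then written to the Mapper.Context or the SequenceFile.Writer; the sketch stops at selecting the index.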
protected void addPointToNearestCluster(Vector point, Iterable<Cluster> clusters)
Parameters:
point
clusters
protected boolean testConvergence(Iterable<Cluster> clusters, double distanceThreshold)
public void outputPointWithClusterInfo(Vector vector, Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
Throws:
IOException
InterruptedException
protected void emitPointToNearestCluster(Vector point, Iterable<Cluster> clusters, org.apache.hadoop.io.SequenceFile.Writer writer) throws IOException
Parameters:
point - a point to find a cluster for.
clusters - a List of clusters
Throws:
IOException
public static List<List<Cluster>> clusterPoints(Iterable<Vector> points, List<Cluster> clusters, DistanceMeasure measure, int maxIter, double distanceThreshold)
Parameters:
points - the input List of points
clusters - the List of clusters
measure - the DistanceMeasure to use
maxIter - the maximum number of iterations
protected static boolean runKMeansIteration(Iterable<Vector> points, Iterable<Cluster> clusters, DistanceMeasure measure, double distanceThreshold)
Parameters:
points - the List of points
clusters - the List of clusters
measure - a DistanceMeasure to use
public boolean computeConvergence(Cluster cluster, double distanceThreshold)
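The signatures of clusterPoints, runKMeansIteration and computeConvergence suggest a loop that repeats single iterations until the clusters converge or maxIter is reached. The sketch below shows that general k-means pattern under stated assumptions and is not the Mahout implementation: it uses plain double[] arrays with Euclidean distance, treats a cluster as converged when its center moves no farther than distanceThreshold, and the names KMeansLoopSketch, runIteration and cluster are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Mahout.
class KMeansLoopSketch {

    /** Euclidean distance between two points of equal dimension (assumed measure). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * One pass: assigns each point to its nearest center, then moves every center to the
     * mean of its assigned points. Returns true when no center moved farther than
     * distanceThreshold (the assumed convergence test).
     */
    static boolean runIteration(List<double[]> points, List<double[]> centers,
                                double distanceThreshold) {
        int k = centers.size();
        int dim = centers.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int best = 0;
            for (int c = 1; c < k; c++) {
                if (distance(p, centers.get(c)) < distance(p, centers.get(best))) {
                    best = c;
                }
            }
            counts[best]++;
            for (int d = 0; d < dim; d++) {
                sums[best][d] += p[d];
            }
        }
        boolean converged = true;
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                continue; // an empty cluster keeps its previous center
            }
            double[] updated = new double[dim];
            for (int d = 0; d < dim; d++) {
                updated[d] = sums[c][d] / counts[c];
            }
            if (distance(updated, centers.get(c)) > distanceThreshold) {
                converged = false;
            }
            centers.set(c, updated);
        }
        return converged;
    }

    /** Repeats runIteration until convergence or maxIter; returns the number of passes run. */
    static int cluster(List<double[]> points, List<double[]> centers,
                       int maxIter, double distanceThreshold) {
        int iterations = 0;
        boolean converged = false;
        while (!converged && iterations < maxIter) {
            converged = runIteration(points, centers, distanceThreshold);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {1.0, 1.0}, new double[] {1.0, 3.0},
            new double[] {9.0, 9.0}, new double[] {11.0, 9.0});
        List<double[]> centers = new ArrayList<>(Arrays.asList(
            new double[] {0.0, 0.0}, new double[] {10.0, 10.0}));
        int iterations = cluster(points, centers, 10, 0.001);
        System.out.println(iterations + " iterations");
        for (double[] c : centers) {
            System.out.println(Arrays.toString(c));
        }
    }
}
```

The sketch updates the centers in place and returns only an iteration count, whereas clusterPoints returns a List<List<Cluster>>. A smaller distanceThreshold demands more stable centers before stopping, and maxIter bounds the run time when the centers keep moving.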