java.lang.Object
  extended by org.apache.mahout.clustering.kmeans.KMeansClusterer
public class KMeansClusterer
This class implements the k-means clustering algorithm. It uses Cluster as its cluster representation. The class can be used as part of a clustering job that runs as a map/reduce job.
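For reference, the standard k-means (Lloyd) iteration can be written as follows, assuming Euclidean distance $d$, points $x_1, \dots, x_n$ and cluster centers $\mu_1, \dots, \mu_k$. Under that assumption each step does not increase the within-cluster sum of squares $J = \sum_{j} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2$. The stopping rule written in terms of `maxIter` and `distanceThreshold` is an assumption inferred from the parameter names of `clusterPoints`, not a statement of exactly how this class tests convergence.

```latex
\begin{aligned}
C_j^{(t)} &= \{\, x_i : j = \arg\min_{l} d(x_i, \mu_l^{(t)}) \,\}
  && \text{(assignment step)} \\
\mu_j^{(t+1)} &= \frac{1}{\lvert C_j^{(t)} \rvert} \sum_{x_i \in C_j^{(t)}} x_i
  && \text{(update step, non-empty } C_j^{(t)}\text{)} \\
\text{stop when } & \max_{j} d\bigl(\mu_j^{(t+1)}, \mu_j^{(t)}\bigr) \le \texttt{distanceThreshold}
  \ \text{ or } \ t + 1 = \texttt{maxIter}
\end{aligned}
```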
Constructor Summary
Constructor | Description
---|---
KMeansClusterer(org.apache.hadoop.conf.Configuration conf) |
KMeansClusterer(DistanceMeasure measure) | Initializes the k-means clusterer with the distance measure to use for comparison.
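The `DistanceMeasure` constructor makes the comparison metric pluggable. The sketch below illustrates that design with plain `double[]` points; `PointDistance`, `EuclideanPointDistance`, `ManhattanPointDistance` and `MeasureConfiguredClusterer` are hypothetical names for illustration, not Mahout classes.

```java
/** Hypothetical stand-in for a pluggable distance measure (not Mahout's DistanceMeasure). */
interface PointDistance {
    double distance(double[] a, double[] b);
}

/** Straight-line (L2) distance. */
final class EuclideanPointDistance implements PointDistance {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** City-block (L1) distance. */
final class ManhattanPointDistance implements PointDistance {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/** Hypothetical clusterer configured once with the measure it compares with. */
final class MeasureConfiguredClusterer {
    private final PointDistance measure;

    MeasureConfiguredClusterer(PointDistance measure) {
        this.measure = measure;
    }

    /** Index of the center nearest to point under the configured measure (-1 if none). */
    int nearestCenter(double[] point, double[][] centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = measure.distance(point, centers[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

With centers (3, 3) and (0, 4.5), the origin is nearest to the first center under Euclidean distance (about 4.24 versus 4.5) but nearest to the second under Manhattan distance (6 versus 4.5), so the choice of measure can change cluster assignments.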
Method Summary
Modifier and Type | Method | Description
---|---|---
protected void | addPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters) | Sequential implementation to add a point to the nearest cluster.
static java.util.List<java.util.List<Cluster>> | clusterPoints(java.lang.Iterable<Vector> points, java.util.List<Cluster> clusters, DistanceMeasure measure, int maxIter, double distanceThreshold) | This is the reference k-means implementation.
boolean | computeConvergence(Cluster cluster) |
void | emitPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) | Iterates over all clusters and identifies the one closest to the given point.
protected void | emitPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.io.SequenceFile.Writer writer) | Iterates over all clusters and identifies the one closest to the given point.
void | outputPointWithClusterInfo(Vector vector, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) |
protected static boolean | runKMeansIteration(java.lang.Iterable<Vector> points, java.lang.Iterable<Cluster> clusters, DistanceMeasure measure, double distanceThreshold) | Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations have completed.
protected boolean | testConvergence(java.lang.Iterable<Cluster> clusters, double distanceThreshold) | Sequential implementation to test convergence and update cluster centers.
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail

public KMeansClusterer(DistanceMeasure measure)
Parameters:
measure - the distance measure to use for comparing clusters against points.

public KMeansClusterer(org.apache.hadoop.conf.Configuration conf) throws java.lang.ClassNotFoundException, java.lang.InstantiationException, java.lang.IllegalAccessException
Throws:
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
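The three checked exceptions declared by the `Configuration` constructor are exactly those thrown by `Class.forName` and `Class.newInstance`, which suggests (this page does not confirm it) that the constructor instantiates a class, such as the distance measure, named in the configuration. A sketch of that pattern, assuming `java.util.Properties` in place of Hadoop's `Configuration`; the key `example.distance.measure` and the types `Measure`, `SquaredEuclideanMeasure` and `ConfiguredClusterer` are hypothetical.

```java
import java.util.Properties;

/** Hypothetical stand-in for a distance measure interface. */
interface Measure {
    double distance(double[] a, double[] b);
}

/** Concrete measure; reflection needs an accessible no-arg constructor. */
final class SquaredEuclideanMeasure implements Measure {
    public SquaredEuclideanMeasure() {
    }

    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

/** Hypothetical clusterer that picks its measure from configuration. */
final class ConfiguredClusterer {
    /** Hypothetical configuration key holding the measure's class name. */
    static final String MEASURE_KEY = "example.distance.measure";

    final Measure measure;

    @SuppressWarnings("deprecation") // Class.newInstance is deprecated since Java 9
    ConfiguredClusterer(Properties conf)
            throws ClassNotFoundException, InstantiationException, IllegalAccessException {
        String className = conf.getProperty(MEASURE_KEY);
        if (className == null) {
            throw new IllegalArgumentException("missing " + MEASURE_KEY);
        }
        // Class.forName throws ClassNotFoundException for an unknown name;
        // newInstance throws InstantiationException or IllegalAccessException.
        Class<?> cls = Class.forName(className);
        // Throws ClassCastException if the class does not implement Measure.
        this.measure = (Measure) cls.newInstance();
    }
}
```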
Method Detail

public void emitPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) throws java.io.IOException, java.lang.InterruptedException
Parameters:
point - a point to find a cluster for.
clusters - a List<Cluster>.
Throws:
java.io.IOException
java.lang.InterruptedException
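A minimal sketch of the nearest-cluster search, assuming Euclidean distance and a hypothetical `SimpleCluster` record. A `BiConsumer` stands in for the `Mapper.Context` (or `SequenceFile.Writer`) the real method writes to; the (cluster id, point) record shape is an assumption, not the format this class actually emits.

```java
import java.util.function.BiConsumer;

/** Hypothetical cluster record: an id plus a center (not Mahout's Cluster). */
final class SimpleCluster {
    final String id;
    final double[] center;

    SimpleCluster(String id, double[] center) {
        this.id = id;
        this.center = center;
    }
}

final class NearestEmitter {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Scans every cluster, remembers the closest one, and passes
     * (cluster id, point) to the sink. Emits nothing if clusters is empty.
     */
    static void emitPointToNearestCluster(double[] point, Iterable<SimpleCluster> clusters,
                                          BiConsumer<String, double[]> sink) {
        SimpleCluster nearest = null;
        double nearestDistance = Double.POSITIVE_INFINITY;
        for (SimpleCluster cluster : clusters) {
            double d = euclidean(point, cluster.center);
            if (d < nearestDistance) { // strict comparison: the first cluster wins ties
                nearestDistance = d;
                nearest = cluster;
            }
        }
        if (nearest != null) {
            sink.accept(nearest.id, point);
        }
    }
}
```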

protected void addPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters)
Parameters:
point
clusters

protected boolean testConvergence(java.lang.Iterable<Cluster> clusters, double distanceThreshold)
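`testConvergence` is described as a sequential step that tests convergence and updates cluster centers. The sketch below shows one common way to do that, assuming each hypothetical `AccumulatingCluster` has been collecting the points assigned to it (as `addPointToNearestCluster` would) and that a cluster counts as converged when its center moves no more than `distanceThreshold` in Euclidean distance.

```java
import java.util.Arrays;

/** Hypothetical cluster that accumulates assigned points between iterations. */
final class AccumulatingCluster {
    double[] center;
    final double[] pointSum;
    int pointCount;

    AccumulatingCluster(double[] center) {
        this.center = center.clone();
        this.pointSum = new double[center.length];
    }

    /** Records a point assigned to this cluster in the current iteration. */
    void addPoint(double[] p) {
        for (int i = 0; i < p.length; i++) {
            pointSum[i] += p[i];
        }
        pointCount++;
    }
}

final class ConvergenceSketch {
    /**
     * For every cluster: computes the mean of its assigned points, checks whether
     * the center moved no more than distanceThreshold, moves the center to the
     * mean, and clears the accumulators. Every cluster is updated (no early exit);
     * returns true only if all of them converged. An empty cluster keeps its
     * center and counts as converged.
     */
    static boolean testConvergence(Iterable<AccumulatingCluster> clusters, double distanceThreshold) {
        boolean allConverged = true;
        for (AccumulatingCluster c : clusters) {
            if (c.pointCount > 0) {
                double[] mean = new double[c.center.length];
                double squaredShift = 0.0;
                for (int i = 0; i < mean.length; i++) {
                    mean[i] = c.pointSum[i] / c.pointCount;
                    double d = mean[i] - c.center[i];
                    squaredShift += d * d;
                }
                if (Math.sqrt(squaredShift) > distanceThreshold) {
                    allConverged = false;
                }
                c.center = mean;
            }
            // Reset the accumulators for the next iteration.
            Arrays.fill(c.pointSum, 0.0);
            c.pointCount = 0;
        }
        return allConverged;
    }
}
```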

public void outputPointWithClusterInfo(Vector vector, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context) throws java.io.IOException, java.lang.InterruptedException
Throws:
java.io.IOException
java.lang.InterruptedException

protected void emitPointToNearestCluster(Vector point, java.lang.Iterable<Cluster> clusters, org.apache.hadoop.io.SequenceFile.Writer writer) throws java.io.IOException, java.lang.InterruptedException
Parameters:
point - a point to find a cluster for.
clusters - a List<Cluster>.
Throws:
java.io.IOException
java.lang.InterruptedException
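Since the class can run as part of a map/reduce job, it may help to see how one k-means iteration splits across the two phases: the map side assigns each point to its nearest cluster (the role the `emitPointToNearestCluster` overloads describe) and the reduce side recomputes each center from the points grouped under its key. The sketch simulates that in memory, assuming Euclidean distance and using a `TreeMap` as the shuffle; it uses no Hadoop APIs, and `MapReduceRoundSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory simulation of one map/reduce k-means round (no Hadoop involved). */
final class MapReduceRoundSketch {

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = 0.0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centers[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** Map: key each point by its nearest center. Shuffle: group points by key. */
    static Map<Integer, List<double[]>> mapAndShuffle(List<double[]> points, double[][] centers) {
        Map<Integer, List<double[]>> grouped = new TreeMap<>();
        for (double[] p : points) {
            grouped.computeIfAbsent(nearest(p, centers), k -> new ArrayList<>()).add(p);
        }
        return grouped;
    }

    /** Reduce: average the points that share a key into that cluster's new center. */
    static double[] reduce(List<double[]> assigned) {
        double[] mean = new double[assigned.get(0).length];
        for (double[] p : assigned) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] += p[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= assigned.size();
        }
        return mean;
    }

    /** One full round; centers with no assigned points keep their old position. */
    static double[][] round(List<double[]> points, double[][] centers) {
        double[][] next = new double[centers.length][];
        for (int c = 0; c < centers.length; c++) {
            next[c] = centers[c].clone();
        }
        for (Map.Entry<Integer, List<double[]>> e : mapAndShuffle(points, centers).entrySet()) {
            next[e.getKey()] = reduce(e.getValue());
        }
        return next;
    }
}
```

In a real job the reduce side would typically receive partial sums rather than raw points so that a combiner can shrink the shuffle; the sketch keeps raw points for clarity.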

public static java.util.List<java.util.List<Cluster>> clusterPoints(java.lang.Iterable<Vector> points, java.util.List<Cluster> clusters, DistanceMeasure measure, int maxIter, double distanceThreshold)
Parameters:
points - the input List<Vector>
clusters - the List<Cluster>
measure - the DistanceMeasure to use
maxIter - the maximum number of iterations

protected static boolean runKMeansIteration(java.lang.Iterable<Vector> points, java.lang.Iterable<Cluster> clusters, DistanceMeasure measure, double distanceThreshold)
Parameters:
points - the List<Vector>
clusters - the List<Cluster>
measure - a DistanceMeasure to use

public boolean computeConvergence(Cluster cluster)
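`clusterPoints` is described as the reference implementation, and `runKMeansIteration` as a single iteration that reports whether the iterations are complete. The sketch below mirrors that split with plain arrays, assuming Euclidean distance. The `List<List<Cluster>>` return type suggests one snapshot of the clusters per iteration; the sketch assumes that reading and returns a `List<double[][]>` of center snapshots. `ClusterPointsSketch` and `runIteration` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a clusterPoints-style driver; plain arrays stand in for Mahout types. */
final class ClusterPointsSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * One iteration: assigns every point to its nearest center, replaces each
     * non-empty cluster's center (in place) by the mean of its points, and
     * returns true if every center moved at most distanceThreshold.
     */
    static boolean runIteration(double[][] points, double[][] centers, double distanceThreshold) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < k; c++) {
                double d = squaredDistance(p, centers[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            counts[best]++;
            for (int i = 0; i < dim; i++) {
                sums[best][i] += p[i];
            }
        }
        boolean converged = true;
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                continue; // an empty cluster keeps its center
            }
            double[] mean = new double[dim];
            for (int i = 0; i < dim; i++) {
                mean[i] = sums[c][i] / counts[c];
            }
            if (Math.sqrt(squaredDistance(mean, centers[c])) > distanceThreshold) {
                converged = false;
            }
            centers[c] = mean;
        }
        return converged;
    }

    /** Runs up to maxIter iterations, recording a copy of the centers after each one. */
    static List<double[][]> clusterPoints(double[][] points, double[][] initialCenters,
                                          int maxIter, double distanceThreshold) {
        double[][] centers = new double[initialCenters.length][];
        for (int c = 0; c < centers.length; c++) {
            centers[c] = initialCenters[c].clone();
        }
        List<double[][]> history = new ArrayList<>();
        for (int iter = 0; iter < maxIter; iter++) {
            boolean converged = runIteration(points, centers, distanceThreshold);
            double[][] snapshot = new double[centers.length][];
            for (int c = 0; c < centers.length; c++) {
                snapshot[c] = centers[c].clone();
            }
            history.add(snapshot);
            if (converged) {
                break;
            }
        }
        return history;
    }
}
```

For points 0, 2, 10 and 12 with initial centers 0 and 12, the first iteration moves the centers to 1 and 11 and the second leaves them in place, so two snapshots are returned.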