java.lang.Object
  org.apache.mahout.clustering.kmeans.KMeansClusterer
public class KMeansClusterer
This class implements the k-means clustering algorithm. It uses Cluster as a cluster representation. The class can be used as part of a clustering job that is started as a map/reduce job.
Constructor Summary | |
---|---|
KMeansClusterer(DistanceMeasure measure)
Initializes the k-means clusterer with the distance measure to use for comparison. |
Method Summary | |
---|---|
static java.util.List<java.util.List<Cluster>> |
clusterPoints(java.util.List<Vector> points,
java.util.List<Cluster> clusters,
DistanceMeasure measure,
int maxIter,
double distanceThreshold)
This is the reference k-means implementation. |
void |
emitPointToNearestCluster(Vector point,
java.util.List<Cluster> clusters,
org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,KMeansInfo> output)
Iterates over all clusters and identifies the one closest to the given point. |
void |
outputPointWithClusterInfo(Vector point,
java.util.List<Cluster> clusters,
org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> output)
|
static boolean |
runKMeansIteration(java.util.List<Vector> points,
java.util.List<Cluster> clusters,
DistanceMeasure measure,
double distanceThreshold)
Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public KMeansClusterer(DistanceMeasure measure)
Parameters:
measure - The distance measure to use for comparing clusters against points.

Method Detail |
---|
public void emitPointToNearestCluster(Vector point, java.util.List<Cluster> clusters, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,KMeansInfo> output) throws java.io.IOException
Parameters:
point - a point to find a cluster for.
clusters - a List<Cluster>
Throws:
java.io.IOException
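
The nearest-cluster search described for emitPointToNearestCluster can be illustrated with a minimal, self-contained sketch. This is not Mahout's implementation: the class NearestClusterSketch, the use of double[] in place of Vector and Cluster, and the squared Euclidean distance standing in for a DistanceMeasure are all assumptions made for illustration, and the Hadoop OutputCollector step is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; plain arrays stand in for Mahout's Vector and Cluster.
class NearestClusterSketch {

    // Squared Euclidean distance, an assumed stand-in for a DistanceMeasure.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Iterates over all centers and returns the index of the one closest to the point.
    static int nearest(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double distance = squaredDistance(point, centers.get(i));
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {10.0, 10.0});
        // The point (9, 8) is closer to the second center, so this prints 1.
        System.out.println(nearest(new double[] {9.0, 8.0}, centers));
    }
}
```

In a map/reduce setting, the index found here would presumably determine the key under which the point is emitted; that step depends on Hadoop types and is not shown.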
public void outputPointWithClusterInfo(Vector point, java.util.List<Cluster> clusters, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text> output) throws java.io.IOException
Throws:
java.io.IOException
public static java.util.List<java.util.List<Cluster>> clusterPoints(java.util.List<Vector> points, java.util.List<Cluster> clusters, DistanceMeasure measure, int maxIter, double distanceThreshold)
Parameters:
points - the input List<Vector>
clusters - the List<Cluster>
measure - the DistanceMeasure to use
maxIter - the maximum number of iterations

public static boolean runKMeansIteration(java.util.List<Vector> points, java.util.List<Cluster> clusters, DistanceMeasure measure, double distanceThreshold)
Parameters:
points - the List<Vector>
clusters - the List<Cluster>
measure - a DistanceMeasure to use
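
The relationship between clusterPoints and runKMeansIteration, as their signatures suggest, is a loop that repeats single iterations until convergence or until maxIter is reached. The sketch below shows that general pattern under stated assumptions and is not Mahout's code: KMeansLoopSketch, runIteration, and cluster are hypothetical names, double[] replaces Vector and Cluster, Euclidean distance replaces the DistanceMeasure, convergence is taken to mean that no center moved farther than the threshold, and the loop returns the number of iterations rather than a List<List<Cluster>>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch class; plain arrays stand in for Mahout's Vector and Cluster.
class KMeansLoopSketch {

    // Euclidean distance, an assumed stand-in for a DistanceMeasure.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // One pass: assign every point to its nearest center, recompute the centers,
    // and report whether every center moved by at most the threshold.
    static boolean runIteration(List<double[]> points, List<double[]> centers, double threshold) {
        int dims = centers.get(0).length;
        double[][] sums = new double[centers.size()][dims];
        int[] counts = new int[centers.size()];
        for (double[] point : points) {
            int best = 0;
            double bestDistance = distance(point, centers.get(0));
            for (int c = 1; c < centers.size(); c++) {
                double d = distance(point, centers.get(c));
                if (d < bestDistance) {
                    bestDistance = d;
                    best = c;
                }
            }
            counts[best]++;
            for (int i = 0; i < dims; i++) {
                sums[best][i] += point[i];
            }
        }
        boolean converged = true;
        for (int c = 0; c < centers.size(); c++) {
            if (counts[c] == 0) {
                continue; // an empty cluster keeps its previous center
            }
            double[] updated = new double[dims];
            for (int i = 0; i < dims; i++) {
                updated[i] = sums[c][i] / counts[c];
            }
            if (distance(updated, centers.get(c)) > threshold) {
                converged = false;
            }
            centers.set(c, updated);
        }
        return converged;
    }

    // Repeats single iterations until convergence or maxIter; returns the iteration count.
    static int cluster(List<double[]> points, List<double[]> centers, int maxIter, double threshold) {
        int iterations = 0;
        boolean converged = false;
        while (!converged && iterations < maxIter) {
            converged = runIteration(points, centers, threshold);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {0, 2},
                new double[] {10, 10}, new double[] {10, 12});
        List<double[]> centers = new ArrayList<>(Arrays.asList(
                new double[] {0, 0}, new double[] {10, 10}));
        int iterations = cluster(points, centers, 10, 0.001);
        // The first pass moves the centers to (0, 1) and (10, 11); the second confirms convergence.
        System.out.println(iterations + " iterations, centers "
                + Arrays.toString(centers.get(0)) + " " + Arrays.toString(centers.get(1)));
    }
}
```

The centers list is updated in place, so the caller must pass a mutable list. How Mahout itself interprets distanceThreshold is not stated on this page; the per-center movement test used here is only one common choice.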