org.apache.mahout.clustering.fuzzykmeans
Class FuzzyKMeansClusterer

java.lang.Object
  extended by org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansClusterer

public class FuzzyKMeansClusterer
extends java.lang.Object


Constructor Summary
FuzzyKMeansClusterer(DistanceMeasure measure, double convergenceDelta, double m)
          Initializes the fuzzy k-means clusterer with the distance measure to use for comparison, the convergence delta, and the fuzziness factor m.
FuzzyKMeansClusterer(org.apache.hadoop.mapred.JobConf job)
          Initializes the fuzzy k-means clusterer from the settings in a Hadoop job configuration.
 
Method Summary
static java.util.List<java.util.List<SoftCluster>> clusterPoints(java.util.List<Vector> points, java.util.List<SoftCluster> clusters, DistanceMeasure measure, double threshold, double m, int numIter)
          This is the reference fuzzy k-means implementation.
 boolean computeConvergence(SoftCluster cluster)
          Returns whether the cluster has converged, determined by comparing its center and centroid.
 double computeProbWeight(double clusterDistance, java.util.List<java.lang.Double> clusterDistanceList)
          Computes the probability of a point belonging to a cluster.
 void emitPointProbToCluster(Vector point, java.util.List<SoftCluster> clusters, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,FuzzyKMeansInfo> output)
          Emits the point and its probability of belonging to each cluster.
 double getM()
          Returns the fuzziness factor m.
 DistanceMeasure getMeasure()
          Returns the distance measure used by this clusterer.
 void outputPointWithClusterProbabilities(java.lang.String key, Vector point, java.util.List<SoftCluster> clusters, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,FuzzyKMeansOutput> output)
          Outputs the point with its cluster information (cluster and probability).
static boolean runFuzzyKMeansIteration(java.util.List<Vector> points, java.util.List<SoftCluster> clusterList, FuzzyKMeansClusterer clusterer)
          Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FuzzyKMeansClusterer

public FuzzyKMeansClusterer(DistanceMeasure measure,
                            double convergenceDelta,
                            double m)
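Initializes the fuzzy k-means clusterer with the distance measure to use for comparison.

Parameters:
measure - The distance measure to use for comparing clusters against points.
convergenceDelta - The convergence threshold: a cluster is considered converged when the distance between its center and its centroid is within this value.
m - The fuzziness factor (m > 1); larger values produce softer, more evenly shared cluster memberships.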

FuzzyKMeansClusterer

public FuzzyKMeansClusterer(org.apache.hadoop.mapred.JobConf job)
Initializes the fuzzy k-means clusterer from the settings in a Hadoop job configuration.

Method Detail

emitPointProbToCluster

public void emitPointProbToCluster(Vector point,
                                   java.util.List<SoftCluster> clusters,
                                   org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,FuzzyKMeansInfo> output)
                            throws java.io.IOException
Emits the point and its probability of belonging to each cluster.

Parameters:
point - the point to assign
clusters - the List of SoftClusters to compute probabilities against
output - the OutputCollector to emit into
Throws:
java.io.IOException

outputPointWithClusterProbabilities

public void outputPointWithClusterProbabilities(java.lang.String key,
                                                Vector point,
                                                java.util.List<SoftCluster> clusters,
                                                org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.Text,FuzzyKMeansOutput> output)
                                         throws java.io.IOException
Outputs the point with its cluster information (cluster and probability).

Parameters:
point - the point to output
clusters - the List of SoftClusters to compute probabilities against
output - the OutputCollector to emit into
Throws:
java.io.IOException

computeProbWeight

public double computeProbWeight(double clusterDistance,
                                java.util.List<java.lang.Double> clusterDistanceList)
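Computes the probability of a point belonging to a cluster, given the point's distance to that cluster and its distances to all clusters.

Parameters:
clusterDistance - the distance from the point to the cluster of interest
clusterDistanceList - the distances from the point to each cluster
Returns:
the probability that the point belongs to the cluster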
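A minimal sketch of such a membership computation, assuming the standard fuzzy c-means formula u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)). The class name ProbWeightSketch and the MINIMAL_DISTANCE floor (used to avoid dividing by a zero distance) are hypothetical; this is an illustration, not Mahout's source.

```java
import java.util.List;

/**
 * Hypothetical sketch of a fuzzy k-means membership weight (not Mahout's source).
 * Assumes the standard fuzzy c-means formula u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
 */
final class ProbWeightSketch {
  /** Hypothetical floor that keeps a zero distance from causing division by zero. */
  static final double MINIMAL_DISTANCE = 1e-10;

  private final double m; // fuzziness factor, assumed to be > 1

  ProbWeightSketch(double m) {
    this.m = m;
  }

  /** Membership of a point in one cluster, given its distances to every cluster. */
  double computeProbWeight(double clusterDistance, List<Double> clusterDistanceList) {
    double di = Math.max(clusterDistance, MINIMAL_DISTANCE);
    double denom = 0.0;
    for (double dj : clusterDistanceList) {
      denom += Math.pow(di / Math.max(dj, MINIMAL_DISTANCE), 2.0 / (m - 1.0));
    }
    return 1.0 / denom;
  }
}
```

For a point at distances 1 and 3 from two clusters, m = 2 gives memberships 0.9 and 0.1, while m = 3 gives 0.75 and 0.25. In both cases the memberships sum to 1, and the larger m shares the point more evenly.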


computeConvergence

public boolean computeConvergence(SoftCluster cluster)
Returns whether the cluster has converged, determined by comparing its center and centroid.

Parameters:
cluster - the SoftCluster to test
Returns:
true if the cluster has converged, false otherwise
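A minimal sketch of a center-versus-centroid convergence test follows. It makes three assumptions. Points are plain double[] arrays rather than Mahout Vectors. The distance is Euclidean. "Converged" means the distance from the current center to the newly computed centroid does not exceed convergenceDelta. The class name ConvergenceSketch is hypothetical.

```java
/**
 * Hypothetical sketch of a center-versus-centroid convergence test (not Mahout's source).
 * Assumes Euclidean distance and treats "converged" as distance <= convergenceDelta.
 */
final class ConvergenceSketch {
  private ConvergenceSketch() {}

  /** Euclidean distance between two points of equal dimension. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /** Returns true if the centroid lies within convergenceDelta of the current center. */
  static boolean computeConvergence(double[] center, double[] centroid, double convergenceDelta) {
    return euclidean(center, centroid) <= convergenceDelta;
  }
}
```

With the center at (0, 0) and the centroid at (3, 4), the distance is 5. The cluster therefore counts as converged for convergenceDelta = 5.0 but not for 4.9.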

getM

public double getM()
Returns the fuzziness factor m.

getMeasure

public DistanceMeasure getMeasure()
Returns the distance measure used by this clusterer.

clusterPoints

public static java.util.List<java.util.List<SoftCluster>> clusterPoints(java.util.List<Vector> points,
                                                                        java.util.List<SoftCluster> clusters,
                                                                        DistanceMeasure measure,
                                                                        double threshold,
                                                                        double m,
                                                                        int numIter)
This is the reference fuzzy k-means implementation. Given its inputs, it iterates over the points and clusters until their centers converge or until the maximum number of iterations is reached.

Parameters:
points - the input List of points
clusters - the initial List of clusters
measure - the DistanceMeasure to use
threshold - the convergence delta used to decide whether a cluster has converged
m - the fuzziness factor
numIter - the maximum number of iterations
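The stopping rule described above stops the loop when the centers converge or when numIter iterations have run. It can be sketched as a generic loop. In this hypothetical illustration, the iteration step is passed in as a Predicate. That Predicate stands in for a per-iteration routine such as runFuzzyKMeansIteration: it updates the state in place and returns whether that pass converged. The class and method names are assumptions, not Mahout's API.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of the outer loop of a reference clustering driver (not Mahout's source). */
final class IterateUntilConvergedSketch {
  private IterateUntilConvergedSketch() {}

  /**
   * Runs `iteration` on `state` until it reports convergence or `numIter` iterations have run.
   * Returns the number of iterations actually performed.
   */
  static <S> int iterateUntilConverged(S state, Predicate<S> iteration, int numIter) {
    int iter = 0;
    boolean converged = false;
    while (!converged && iter < numIter) {
      converged = iteration.test(state); // one pass over all points and clusters
      iter++;
    }
    return iter;
  }
}
```

This sketch only counts iterations. clusterPoints itself returns a List of cluster lists, so a fuller version would also keep the clusters produced along the way.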

runFuzzyKMeansIteration

public static boolean runFuzzyKMeansIteration(java.util.List<Vector> points,
                                              java.util.List<SoftCluster> clusterList,
                                              FuzzyKMeansClusterer clusterer)
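Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.

Parameters:
points - the List of input points
clusterList - the List of clusters
clusterer - the FuzzyKMeansClusterer to use for the iteration
Returns:
true if the iterations are complete, false otherwise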
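A hypothetical, self-contained sketch of one fuzzy k-means iteration follows. Mahout's method operates on Vector and SoftCluster objects; this sketch uses plain double[] arrays and makes several assumptions:
- Euclidean distance.
- The standard fuzzy c-means membership formula.
- Centroids weighted by membership raised to the power m.
- A cluster counts as converged when its new centroid lies within convergenceDelta of its old center.

The class name, the method name, and the MINIMAL_DISTANCE floor are all illustrative.

```java
import java.util.List;

/** Hypothetical sketch of one fuzzy k-means iteration (not Mahout's source). */
final class FuzzyIterationSketch {
  /** Hypothetical floor that keeps a zero distance from causing division by zero. */
  static final double MINIMAL_DISTANCE = 1e-10;

  private FuzzyIterationSketch() {}

  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  /**
   * Replaces each center with its membership-weighted centroid and returns true
   * if every center moved by at most convergenceDelta.
   */
  static boolean runIteration(List<double[]> points, double[][] centers, double m, double convergenceDelta) {
    int k = centers.length;
    int dim = centers[0].length;
    double[][] weightedSums = new double[k][dim];
    double[] weightTotals = new double[k];
    double[] dist = new double[k];
    for (double[] p : points) {
      for (int i = 0; i < k; i++) {
        dist[i] = Math.max(euclidean(p, centers[i]), MINIMAL_DISTANCE);
      }
      for (int i = 0; i < k; i++) {
        // Membership u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        double denom = 0.0;
        for (int j = 0; j < k; j++) {
          denom += Math.pow(dist[i] / dist[j], 2.0 / (m - 1.0));
        }
        // Every point contributes to every centroid, weighted by membership^m.
        double w = Math.pow(1.0 / denom, m);
        weightTotals[i] += w;
        for (int d = 0; d < dim; d++) {
          weightedSums[i][d] += w * p[d];
        }
      }
    }
    boolean converged = true;
    for (int i = 0; i < k; i++) {
      if (weightTotals[i] == 0.0) {
        continue; // no weight assigned (e.g. no points): leave this center unchanged
      }
      double[] centroid = new double[dim];
      for (int d = 0; d < dim; d++) {
        centroid[d] = weightedSums[i][d] / weightTotals[i];
      }
      if (euclidean(centers[i], centroid) > convergenceDelta) {
        converged = false;
      }
      centers[i] = centroid;
    }
    return converged;
  }
}
```

Consider the one-dimensional points 0, 1, 9, and 10 with centers starting at 0.5 and 9.5, m = 2, and convergenceDelta = 0.01. A single pass moves each center by well under 0.01, so the method returns true. Starting instead from centers 0 and 1, both centers move by more than 2 and the method returns false.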

Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.