org.apache.mahout.clustering.fuzzykmeans
Class FuzzyKMeansClusterer

java.lang.Object
  extended by org.apache.mahout.clustering.fuzzykmeans.FuzzyKMeansClusterer

public class FuzzyKMeansClusterer
extends java.lang.Object


Constructor Summary
FuzzyKMeansClusterer()
           
FuzzyKMeansClusterer(org.apache.hadoop.conf.Configuration conf)
           
FuzzyKMeansClusterer(DistanceMeasure measure, double convergenceDelta, double m)
          Initializes the fuzzy k-means clusterer with the distance measure to use for comparison, the convergence delta, and the fuzziness factor m.
 
Method Summary
protected  void addPointToClusters(java.util.List<SoftCluster> clusterList, Vector point)
           
static java.util.List<java.util.List<SoftCluster>> clusterPoints(java.lang.Iterable<Vector> points, java.util.List<SoftCluster> clusters, DistanceMeasure measure, double threshold, double m, int numIter)
          This is the reference fuzzy k-means implementation.
 boolean computeConvergence(Cluster cluster)
          Returns whether the cluster has converged by comparing its center and centroid.
 Vector computePi(java.util.List<SoftCluster> clusters, java.util.List<java.lang.Double> clusterDistanceList)
           
 double computeProbWeight(double clusterDistance, java.lang.Iterable<java.lang.Double> clusterDistanceList)
          Computes the probability of a point belonging to a cluster
 void emitPointProbToCluster(Vector point, java.util.List<SoftCluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context)
          Emits the point and its probability of membership in each cluster.
 void emitPointToClusters(VectorWritable point, java.util.List<SoftCluster> clusters, org.apache.hadoop.mapreduce.Mapper.Context context)
           
 void emitPointToClusters(VectorWritable point, java.util.List<SoftCluster> clusters, org.apache.hadoop.io.SequenceFile.Writer writer)
           
 double getM()
           
 DistanceMeasure getMeasure()
           
protected static boolean runFuzzyKMeansIteration(java.lang.Iterable<Vector> points, java.util.List<SoftCluster> clusterList, FuzzyKMeansClusterer clusterer)
          Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.
protected  boolean testConvergence(java.lang.Iterable<SoftCluster> clusters)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FuzzyKMeansClusterer

public FuzzyKMeansClusterer(DistanceMeasure measure,
                            double convergenceDelta,
                            double m)
Initializes the fuzzy k-means clusterer with the distance measure to use for comparison, the convergence delta, and the fuzziness factor m.


FuzzyKMeansClusterer

public FuzzyKMeansClusterer(org.apache.hadoop.conf.Configuration conf)

FuzzyKMeansClusterer

public FuzzyKMeansClusterer()
Method Detail

clusterPoints

public static java.util.List<java.util.List<SoftCluster>> clusterPoints(java.lang.Iterable<Vector> points,
                                                                        java.util.List<SoftCluster> clusters,
                                                                        DistanceMeasure measure,
                                                                        double threshold,
                                                                        double m,
                                                                        int numIter)
This is the reference fuzzy k-means implementation. Given its inputs, it iterates over the points and clusters until the cluster centers converge or the maximum number of iterations is reached.

Parameters:
points - the input points (an Iterable of Vector)
clusters - the initial List of clusters
measure - the DistanceMeasure to use
threshold - the double convergence threshold
m - the double "fuzziness" argument (>1)
numIter - the maximum number of iterations
Returns:
a List<List<SoftCluster>> of clusters produced per iteration
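
The loop described above (iterate until the centers converge or the iteration limit is hit, keeping the clusters from every iteration) can be illustrated with a small self-contained sketch. It does not use the Mahout classes: points and centers are plain one-dimensional doubles, the distance is the absolute difference, and the class name, method names, and the choice to record the initial centers as the first history entry are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the Mahout implementation.
class FuzzyKMeansLoopSketch {

  // One fuzzy k-means pass over 1-D points: returns the updated centers.
  static double[] iterate(double[] points, double[] centers, double m) {
    int k = centers.length;
    double[] weightedSum = new double[k];
    double[] weightTotal = new double[k];
    for (double p : points) {
      double[] dist = new double[k];
      for (int j = 0; j < k; j++) {
        // Clamp to a tiny value so a point sitting on a center does not divide by zero.
        dist[j] = Math.max(Math.abs(p - centers[j]), 1e-10);
      }
      for (int j = 0; j < k; j++) {
        double denom = 0.0;
        for (int l = 0; l < k; l++) {
          denom += Math.pow(dist[j] / dist[l], 2.0 / (m - 1.0));
        }
        double w = Math.pow(1.0 / denom, m); // membership raised to the power m
        weightedSum[j] += w * p;
        weightTotal[j] += w;
      }
    }
    double[] next = new double[k];
    for (int j = 0; j < k; j++) {
      next[j] = weightedSum[j] / weightTotal[j];
    }
    return next;
  }

  // Repeats iterate() until no center moves more than threshold, or numIter passes have run.
  static List<double[]> cluster(double[] points, double[] initial,
                                double threshold, double m, int numIter) {
    List<double[]> history = new ArrayList<>();
    history.add(initial.clone()); // assumption: history starts with the initial centers
    double[] centers = initial;
    for (int iter = 0; iter < numIter; iter++) {
      double[] next = iterate(points, centers, m);
      history.add(next);
      boolean converged = true;
      for (int j = 0; j < next.length; j++) {
        if (Math.abs(next[j] - centers[j]) > threshold) {
          converged = false;
        }
      }
      centers = next;
      if (converged) {
        break;
      }
    }
    return history;
  }

  public static void main(String[] args) {
    double[] points = {1.0, 1.2, 0.8, 9.0, 9.2, 8.8};
    List<double[]> history = cluster(points, new double[] {0.0, 10.0}, 0.001, 2.0, 50);
    System.out.println("passes run: " + (history.size() - 1));
    System.out.println("final centers: " + Arrays.toString(history.get(history.size() - 1)));
  }
}
```

With two well-separated groups of points, the final centers in this sketch settle near the middle of each group after a handful of passes.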

runFuzzyKMeansIteration

protected static boolean runFuzzyKMeansIteration(java.lang.Iterable<Vector> points,
                                                 java.util.List<SoftCluster> clusterList,
                                                 FuzzyKMeansClusterer clusterer)
Performs a single iteration over the points and clusters, assigning points to clusters and returning whether the iterations are complete.

Parameters:
points - the input points
clusterList - the List of clusters
clusterer - the FuzzyKMeansClusterer to use

emitPointProbToCluster

public void emitPointProbToCluster(Vector point,
                                   java.util.List<SoftCluster> clusters,
                                   org.apache.hadoop.mapreduce.Mapper.Context context)
                            throws java.io.IOException,
                                   java.lang.InterruptedException
Emits the point and its probability of membership in each cluster.

Parameters:
point - a point
clusters - a List<SoftCluster>
context - the Context to emit into
Throws:
java.io.IOException
java.lang.InterruptedException

computeProbWeight

public double computeProbWeight(double clusterDistance,
                                java.lang.Iterable<java.lang.Double> clusterDistanceList)
Computes the probability of a point belonging to a cluster
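
As an illustration of what such a weight can look like, the sketch below uses the textbook fuzzy c-means membership formula: the weight for one cluster is 1 / sum over all clusters of (d / d_k)^(2 / (m - 1)), where d is the distance to that cluster and d_k ranges over the distances to every cluster. This page does not state the formula the method uses, so treat this as an assumption; the class name, the method name, and the zero-distance guard value are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not the Mahout implementation.
class FuzzyMembershipSketch {

  // Textbook fuzzy c-means membership of a point in one cluster.
  // clusterDistance: distance from the point to that cluster.
  // allDistances: distances from the point to every cluster (including that one).
  // m: fuzziness factor, must be greater than 1.
  static double membership(double clusterDistance, List<Double> allDistances, double m) {
    final double tiny = 1e-10; // assumed guard against division by zero
    double d = Math.max(clusterDistance, tiny);
    double denom = 0.0;
    for (double other : allDistances) {
      denom += Math.pow(d / Math.max(other, tiny), 2.0 / (m - 1.0));
    }
    return 1.0 / denom;
  }

  public static void main(String[] args) {
    List<Double> distances = Arrays.asList(1.0, 3.0);
    double m = 2.0;
    // The nearer cluster gets a weight of about 0.9, the farther one about 0.1.
    for (double d : distances) {
      System.out.println(membership(d, distances, m));
    }
  }
}
```

Under this formula the weights for a single point sum to 1 across clusters, and values of m closer to 1 push the weights toward a hard assignment.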


computeConvergence

public boolean computeConvergence(Cluster cluster)
Returns whether the cluster has converged by comparing its center and centroid.

Returns:
true if the cluster has converged
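
A minimal sketch of a center-versus-centroid comparison follows. It assumes a cluster counts as converged when the distance between its previous center and its newly computed centroid is at most the convergence delta, and it uses Euclidean distance in place of a configurable DistanceMeasure. The class and method names are hypothetical.

```java
// Illustrative sketch only; not the Mahout implementation.
class ConvergenceSketch {

  // Euclidean distance between two vectors of equal length.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  // Assumed rule: converged when the center moved by no more than convergenceDelta.
  static boolean hasConverged(double[] center, double[] centroid, double convergenceDelta) {
    return euclidean(center, centroid) <= convergenceDelta;
  }

  public static void main(String[] args) {
    double[] center = {0.0, 0.0};
    System.out.println(hasConverged(center, new double[] {0.003, 0.004}, 0.01)); // moved 0.005
    System.out.println(hasConverged(center, new double[] {3.0, 4.0}, 0.01));     // moved 5.0
  }
}
```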

getM

public double getM()

getMeasure

public DistanceMeasure getMeasure()

emitPointToClusters

public void emitPointToClusters(VectorWritable point,
                                java.util.List<SoftCluster> clusters,
                                org.apache.hadoop.mapreduce.Mapper.Context context)
                         throws java.io.IOException,
                                java.lang.InterruptedException
Throws:
java.io.IOException
java.lang.InterruptedException

computePi

public Vector computePi(java.util.List<SoftCluster> clusters,
                        java.util.List<java.lang.Double> clusterDistanceList)

addPointToClusters

protected void addPointToClusters(java.util.List<SoftCluster> clusterList,
                                  Vector point)

testConvergence

protected boolean testConvergence(java.lang.Iterable<SoftCluster> clusters)

emitPointToClusters

public void emitPointToClusters(VectorWritable point,
                                java.util.List<SoftCluster> clusters,
                                org.apache.hadoop.io.SequenceFile.Writer writer)
                         throws java.io.IOException
Throws:
java.io.IOException


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.