org.apache.mahout.clustering.cdbw
Class CDbwEvaluator

java.lang.Object
  extended by org.apache.mahout.clustering.cdbw.CDbwEvaluator

public class CDbwEvaluator
extends Object

This class calculates the CDbw metric as defined in http://www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf


Constructor Summary
CDbwEvaluator(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path clustersIn)
          Initialize a new instance from job information
CDbwEvaluator(Map<Integer,List<VectorWritable>> representativePoints, List<Cluster> clusters, DistanceMeasure measure)
          For testing only
 
Method Summary
 double getCDbw()
          Compute the CDbw validity metric (eqn 8).
 double interClusterDensity()
          This function evaluates the average density of points in the regions between clusters (eqn 1).
 double intraClusterDensity()
          The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers.
 double separation()
          Calculate the separation of clusters (eqn 4) taking into account both the distances between the clusters' closest points and the Inter-cluster density.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CDbwEvaluator

public CDbwEvaluator(Map<Integer,List<VectorWritable>> representativePoints,
                     List<Cluster> clusters,
                     DistanceMeasure measure)
For testing only

Parameters:
representativePoints - a Map<Integer,List<VectorWritable>> of representative points keyed by clusterId
clusters - a List<Cluster> of the clusters
measure - an appropriate DistanceMeasure

CDbwEvaluator

public CDbwEvaluator(org.apache.hadoop.conf.Configuration conf,
                     org.apache.hadoop.fs.Path clustersIn)
              throws ClassNotFoundException,
                     InstantiationException,
                     IllegalAccessException,
                     IOException
Initialize a new instance from job information

Parameters:
conf - a Hadoop Configuration with appropriate parameters
clustersIn - a Path to the input clusters directory
Throws:
ClassNotFoundException
InstantiationException
IllegalAccessException
IOException
Method Detail

getCDbw

public double getCDbw()
Compute the CDbw validity metric (eqn 8). This metric rewards clusterings that have both a high intraClusterDensity and a high cluster separation.

Returns:
the CDbw validity metric as a double
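
As a rough illustration of how the two components combine, the sketch below assumes that eqn 8 is the product of the intra-cluster density and the separation. The names `CDbwSketch` and `cdbw` are hypothetical and are not part of Mahout; the input values are made up for the example.

```java
class CDbwSketch {

    /**
     * Hypothetical helper: combines the two components as a product.
     * The product form is an assumption about eqn 8, not taken from this page.
     */
    static double cdbw(double intraClusterDensity, double separation) {
        return intraClusterDensity * separation;
    }

    public static void main(String[] args) {
        // A dense, well-separated clustering should score higher
        // than a sparse, poorly separated one.
        double good = cdbw(0.9, 4.0);
        double poor = cdbw(0.3, 1.5);
        System.out.println("good=" + good + " poor=" + poor);
    }
}
```

Under this assumption, a clustering scores well only when both factors are large, which matches the stated goal of the metric.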

intraClusterDensity

public double intraClusterDensity()
The average density within clusters is defined as the percentage of representative points that reside in the neighborhood of the clusters' centers (eqn 5). The goal is for the density within clusters to be high.

Returns:
the average intra-cluster density as a double
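
The description above can be read as a simple counting rule. The sketch below is one simplified interpretation for a single cluster, not Mahout's implementation: it assumes Euclidean distance and a caller-supplied `radius` that stands in for the "neighborhood" of the center. The class `IntraDensitySketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class IntraDensitySketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Fraction of representative points lying within radius of center.
     * The radius is an assumption of this sketch; the page does not say
     * how the neighborhood is sized.
     */
    static double fractionNearCenter(double[] center, List<double[]> reps, double radius) {
        if (reps.isEmpty()) {
            return 0.0;
        }
        int inside = 0;
        for (double[] rep : reps) {
            if (distance(center, rep) <= radius) {
                inside++;
            }
        }
        return (double) inside / reps.size();
    }

    public static void main(String[] args) {
        double[] center = {0.0, 0.0};
        List<double[]> reps = Arrays.asList(
            new double[] {0.0, 1.0},
            new double[] {3.0, 4.0},
            new double[] {0.5, 0.0},
            new double[] {10.0, 10.0});
        // Two of the four representative points lie within radius 1.0.
        System.out.println(fractionNearCenter(center, reps, 1.0));
    }
}
```

Averaging this fraction over all clusters would give one plausible reading of the "average density within clusters".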

separation

public double separation()
Calculate the separation of clusters (eqn 4) taking into account both the distances between the clusters' closest points and the Inter-cluster density. The goal is for the distances between clusters to be high while the representative point density in the areas between them is low.

Returns:
the cluster separation as a double
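
To show how the two inputs pull in opposite directions, the sketch below assumes that eqn 4 divides the summed distances between the clusters' closest points by one plus the inter-cluster density. That form is an assumption made for illustration; `SeparationSketch` and its parameters are hypothetical and not Mahout API.

```java
class SeparationSketch {

    /**
     * Hypothetical helper: sums the closest-point distances for each pair of
     * clusters and damps the total by the inter-cluster density.
     * The (1 + density) denominator is an assumed form of eqn 4.
     */
    static double separation(double[] closestPairDistances, double interClusterDensity) {
        double total = 0.0;
        for (double d : closestPairDistances) {
            total += d;
        }
        return total / (1.0 + interClusterDensity);
    }

    public static void main(String[] args) {
        double[] distances = {2.0, 4.0};
        // Same distances, but a denser region between clusters lowers the result.
        System.out.println(separation(distances, 0.0));
        System.out.println(separation(distances, 1.0));
    }
}
```

With this form, larger gaps between clusters raise the result, and points crowding the region between clusters lower it.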

interClusterDensity

public double interClusterDensity()
This function evaluates the average density of points in the regions between clusters (eqn 1). The goal is for the density in the area between clusters to be significantly low.

Returns:
the average inter-cluster density as a double
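
One simplified way to picture "density in the region between clusters" is to count how many points fall near the midpoint between the closest representatives of two clusters. The sketch below does only that for a single pair; it omits any weighting that eqn 1 may apply, and the `radius` parameter, the class `InterDensitySketch`, and its methods are hypothetical choices for this example.

```java
import java.util.Arrays;
import java.util.List;

class InterDensitySketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Midpoint of the segment joining a and b. */
    static double[] midpoint(double[] a, double[] b) {
        double[] mid = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            mid[i] = (a[i] + b[i]) / 2.0;
        }
        return mid;
    }

    /**
     * Fraction of the supplied points within radius of the midpoint between
     * two representatives. The radius is an assumption of this sketch.
     */
    static double densityBetween(double[] repA, double[] repB, List<double[]> points, double radius) {
        if (points.isEmpty()) {
            return 0.0;
        }
        double[] mid = midpoint(repA, repB);
        int inside = 0;
        for (double[] p : points) {
            if (distance(mid, p) <= radius) {
                inside++;
            }
        }
        return (double) inside / points.size();
    }

    public static void main(String[] args) {
        double[] repA = {0.0, 0.0};
        double[] repB = {4.0, 0.0};
        List<double[]> points = Arrays.asList(
            new double[] {2.0, 0.5},
            new double[] {0.0, 0.0},
            new double[] {4.0, 0.0},
            new double[] {2.0, -1.0});
        System.out.println(densityBetween(repA, repB, points, 1.0));
    }
}
```

A low value here means few points occupy the gap between the two clusters, which is the outcome the metric favors.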


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.