org.apache.mahout.clustering.kmeans
Class KMeansDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.kmeans.KMeansDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class KMeansDriver
extends AbstractJob


Constructor Summary
KMeansDriver()
           
 
Method Summary
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, int maxIterations, java.lang.String delta, boolean runSequential)
          Iterate over the input vectors to produce cluster directories for each iteration
static void clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, java.lang.String convergenceDelta, boolean runSequential)
          Run the job that assigns the input points to the supplied clusters
static void main(java.lang.String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double convergenceDelta, int maxIterations, boolean runClustering, boolean runSequential)
          Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double convergenceDelta, int maxIterations, boolean runClustering, boolean runSequential)
          Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.
 int run(java.lang.String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Constructor Detail

KMeansDriver

public KMeansDriver()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Throws:
java.lang.Exception

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double convergenceDelta,
                       int maxIterations,
                       boolean runClustering,
                       boolean runSequential)
                throws java.io.IOException,
                       java.lang.InterruptedException,
                       java.lang.ClassNotFoundException,
                       java.lang.InstantiationException,
                       java.lang.IllegalAccessException
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:
conf - the Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
measure - the DistanceMeasure to use
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
runClustering - true if points are to be clustered after iterations are completed
runSequential - if true execute sequential algorithm
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException

run

public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path clustersIn,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double convergenceDelta,
                       int maxIterations,
                       boolean runClustering,
                       boolean runSequential)
                throws java.io.IOException,
                       java.lang.InterruptedException,
                       java.lang.ClassNotFoundException,
                       java.lang.InstantiationException,
                       java.lang.IllegalAccessException
Iterate over the input vectors to produce clusters and, if requested, use the results of the final iteration to cluster the input vectors.

Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
measure - the DistanceMeasure to use
convergenceDelta - the convergence delta value
maxIterations - the maximum number of iterations
runClustering - true if points are to be clustered after iterations are completed
runSequential - if true execute sequential algorithm
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path clustersIn,
                                                      org.apache.hadoop.fs.Path output,
                                                      DistanceMeasure measure,
                                                      int maxIterations,
                                                      java.lang.String delta,
                                                      boolean runSequential)
                                               throws java.io.IOException,
                                                      java.lang.InterruptedException,
                                                      java.lang.ClassNotFoundException,
                                                      java.lang.InstantiationException,
                                                      java.lang.IllegalAccessException
Iterate over the input vectors to produce cluster directories for each iteration

Parameters:
conf - the Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for initial & computed clusters
output - the directory pathname for output points
measure - the DistanceMeasure to use
maxIterations - the maximum number of iterations
delta - the convergence delta value, passed as a String
runSequential - if true execute sequential algorithm
Returns:
the Path of the final clusters directory
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
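
The role of maxIterations and the convergence delta can be illustrated with a small plain-Java sketch. This is not Mahout code: the class KMeansLoopSketch and its methods are hypothetical, squared Euclidean distance stands in for a DistanceMeasure, points are in-memory arrays rather than Hadoop paths, and the stopping rule (stop once no centroid moves farther than the delta) is an assumed, conventional reading of the parameters rather than something this page specifies.

```java
import java.util.Arrays;

/** Plain-Java illustration only; this is not Mahout code. */
final class KMeansLoopSketch {

    /** Squared Euclidean distance, standing in for a DistanceMeasure. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs at most maxIterations passes over a non-empty point set and stops
     * early once no centroid moves farther than convergenceDelta.
     */
    static double[][] iterate(double[][] points, double[][] initial,
                              double convergenceDelta, int maxIterations) {
        double[][] centroids = new double[initial.length][];
        for (int c = 0; c < initial.length; c++) {
            centroids[c] = initial[c].clone();
        }
        int dim = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Accumulate the points assigned to each centroid.
            double[][] sums = new double[centroids.length][dim];
            int[] counts = new int[centroids.length];
            for (double[] p : points) {
                int c = nearest(p, centroids);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Move each centroid to the mean of its points and check convergence.
            boolean converged = true;
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double[] updated = new double[dim];
                for (int i = 0; i < dim; i++) {
                    updated[i] = sums[c][i] / counts[c];
                }
                if (Math.sqrt(squaredDistance(updated, centroids[c])) > convergenceDelta) {
                    converged = false;
                }
                centroids[c] = updated;
            }
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] initial = {{0, 0}, {10, 0}};
        double[][] result = iterate(points, initial, 0.001, 10);
        System.out.println(Arrays.deepToString(result)); // prints [[0.0, 1.0], [10.0, 1.0]]
    }
}
```

In this sketch a larger delta or a smaller maxIterations ends the loop sooner at the cost of less settled centroids; with maxIterations set to 0 the initial centroids are returned unchanged.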

clusterData

public static void clusterData(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path input,
                               org.apache.hadoop.fs.Path clustersIn,
                               org.apache.hadoop.fs.Path output,
                               DistanceMeasure measure,
                               java.lang.String convergenceDelta,
                               boolean runSequential)
                        throws java.io.IOException,
                               java.lang.InterruptedException,
                               java.lang.ClassNotFoundException,
                               java.lang.InstantiationException,
                               java.lang.IllegalAccessException
Run the job that assigns the input points to the supplied clusters

Parameters:
conf - the Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output points
measure - the DistanceMeasure to use
convergenceDelta - the convergence delta value, passed as a String
runSequential - if true execute sequential algorithm
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
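
The clustering step can be pictured as assigning each input point to its nearest cluster. The following plain-Java sketch shows that idea only: ClusterAssignmentSketch and its methods are hypothetical names, Euclidean distance stands in for a DistanceMeasure, and in-memory arrays replace the input and clustersIn directories. It makes no claim about how Mahout stores or writes the clustered points.

```java
import java.util.Arrays;

/** Plain-Java illustration only; this is not Mahout code. */
final class ClusterAssignmentSketch {

    /** Euclidean distance, standing in for a DistanceMeasure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns, for each point, the index of the closest cluster center. */
    static int[] assign(double[][] points, double[][] clusterCenters) {
        int[] assignment = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDistance = distance(points[p], clusterCenters[0]);
            for (int c = 1; c < clusterCenters.length; c++) {
                double d = distance(points[p], clusterCenters[c]);
                if (d < bestDistance) {
                    best = c;
                    bestDistance = d;
                }
            }
            assignment[p] = best;
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {9, 1}, {0, 0}};
        double[][] centers = {{0, 1}, {10, 1}};
        System.out.println(Arrays.toString(assign(points, centers))); // prints [0, 1, 0]
    }
}
```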


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.