org.apache.mahout.clustering.meanshift
Class MeanShiftCanopyDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class MeanShiftCanopyDriver
extends AbstractJob


Field Summary
static java.lang.String INPUT_IS_CANOPIES_OPTION
           
static java.lang.String STATE_IN_KEY
           
 
Constructor Summary
MeanShiftCanopyDriver()
           
 
Method Summary
 org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, double convergenceDelta, int maxIterations, boolean runSequential)
          Iterate over the input clusters to produce the next cluster directories for each iteration
static void clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, boolean runSequential)
          Cluster the input points using the supplied clusters and write the clustered points to the output directory
static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, boolean runSequential)
          Convert input vectors to MeanShiftCanopies for further processing
static void main(java.lang.String[] args)
           
 void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, double convergenceDelta, int maxIterations, boolean inputIsCanopies, boolean runClustering, boolean runSequential)
          Run the job where the input format can be either Vectors or Canopies.
 int run(java.lang.String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

INPUT_IS_CANOPIES_OPTION

public static final java.lang.String INPUT_IS_CANOPIES_OPTION
See Also:
Constant Field Values

STATE_IN_KEY

public static final java.lang.String STATE_IN_KEY
See Also:
Constant Field Values
Constructor Detail

MeanShiftCanopyDriver

public MeanShiftCanopyDriver()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Throws:
java.lang.Exception

run

public void run(org.apache.hadoop.conf.Configuration conf,
                org.apache.hadoop.fs.Path input,
                org.apache.hadoop.fs.Path output,
                DistanceMeasure measure,
                double t1,
                double t2,
                double convergenceDelta,
                int maxIterations,
                boolean inputIsCanopies,
                boolean runClustering,
                boolean runSequential)
         throws java.io.IOException,
                java.lang.InterruptedException,
                java.lang.ClassNotFoundException,
                java.lang.InstantiationException,
                java.lang.IllegalAccessException
Run the job where the input format can be either Vectors or Canopies. If requested, cluster the input data using the computed Canopies.

Parameters:
conf - the Configuration to use
input - the input Path
output - the output Path
measure - the DistanceMeasure
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the double convergence criterion
maxIterations - the maximum number of iterations, as an int
inputIsCanopies - true if the input path already contains MeanShiftCanopies and does not need to be converted from Vectors
runClustering - true if the input points are to be clustered once the iterations complete
runSequential - if true run in sequential execution mode
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
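
The parameter descriptions above imply an order of phases for this overload. The following is a minimal sketch of that control flow in plain Java, with no Mahout or Hadoop dependency. The class RunFlowSketch and its phases method are hypothetical, and the ordering is inferred from the inputIsCanopies and runClustering descriptions rather than taken from the driver's source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in: records which phases a call like run(...) would execute. */
final class RunFlowSketch {

  /** Returns the phase names, in order, implied by the two flags. */
  static List<String> phases(boolean inputIsCanopies, boolean runClustering) {
    List<String> steps = new ArrayList<>();
    if (!inputIsCanopies) {
      // The input holds Vectors, so they are first converted to MeanShiftCanopies.
      steps.add("createCanopyFromVectors");
    }
    // Iterate over the canopies (assumed to stop on convergence or at maxIterations).
    steps.add("buildClusters");
    if (runClustering) {
      // Optionally cluster the input points using the computed canopies.
      steps.add("clusterData");
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println(phases(false, true)); // [createCanopyFromVectors, buildClusters, clusterData]
    System.out.println(phases(true, false)); // [buildClusters]
  }
}
```

With inputIsCanopies set to true the conversion phase is skipped, which matches the description of that parameter.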

createCanopyFromVectors

public static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
                                           org.apache.hadoop.fs.Path input,
                                           org.apache.hadoop.fs.Path output,
                                           DistanceMeasure measure,
                                           boolean runSequential)
                                    throws java.io.IOException,
                                           java.lang.InterruptedException,
                                           java.lang.ClassNotFoundException,
                                           java.lang.InstantiationException,
                                           java.lang.IllegalAccessException
Convert input vectors to MeanShiftCanopies for further processing

Parameters:
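conf - the Configuration to use
input - the input Path containing the Vectors
output - the output Path for the MeanShiftCanopies
measure - the DistanceMeasure
runSequential - if true run in sequential execution mode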
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException

buildClusters

public org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                               org.apache.hadoop.fs.Path clustersIn,
                                               org.apache.hadoop.fs.Path output,
                                               DistanceMeasure measure,
                                               double t1,
                                               double t2,
                                               double convergenceDelta,
                                               int maxIterations,
                                               boolean runSequential)
                                        throws java.io.IOException,
                                               java.lang.InterruptedException,
                                               java.lang.ClassNotFoundException,
                                               java.lang.InstantiationException,
                                               java.lang.IllegalAccessException
Iterate over the input clusters to produce the next cluster directories for each iteration

Parameters:
conf - the Configuration to use
clustersIn - the input directory Path
output - the output Path
measure - the DistanceMeasure
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the double convergence criterion
maxIterations - the maximum number of iterations, as an int
runSequential - if true run in sequential execution mode
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
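
To illustrate how t1, t2, convergenceDelta and maxIterations could interact, here is a simplified in-memory mean shift sketch in plain Java. It is not the Mahout implementation: the class MeanShiftSketch is hypothetical, Euclidean distance stands in for the DistanceMeasure, canopies are reduced to bare centers without weights, and reading T1 as the neighborhood radius and T2 as the merge radius is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified mean shift over in-memory points. */
final class MeanShiftSketch {

  /** Euclidean distance, standing in for a DistanceMeasure. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  static List<double[]> buildClusters(List<double[]> points, double t1, double t2,
                                      double convergenceDelta, int maxIterations) {
    if (t1 < 0 || t2 < 0) {
      throw new IllegalArgumentException("thresholds must be non-negative");
    }
    List<double[]> centers = new ArrayList<>(points);
    for (int iteration = 0; iteration < maxIterations; iteration++) {
      List<double[]> next = new ArrayList<>();
      double maxShift = 0.0;
      for (double[] center : centers) {
        // Shift the center to the mean of all centers within t1 (itself included).
        double[] mean = new double[center.length];
        int count = 0;
        for (double[] other : centers) {
          if (distance(center, other) <= t1) {
            for (int i = 0; i < mean.length; i++) {
              mean[i] += other[i];
            }
            count++;
          }
        }
        for (int i = 0; i < mean.length; i++) {
          mean[i] /= count;
        }
        maxShift = Math.max(maxShift, distance(center, mean));
        // Merge: drop the shifted center if it lies within t2 of one already kept.
        boolean merged = false;
        for (double[] kept : next) {
          if (distance(mean, kept) <= t2) {
            merged = true;
            break;
          }
        }
        if (!merged) {
          next.add(mean);
        }
      }
      centers = next;
      // Stop early once no center moved by convergenceDelta or more.
      if (maxShift < convergenceDelta) {
        break;
      }
    }
    return centers;
  }
}
```

In this sketch the loop ends either when the largest shift in a pass falls below convergenceDelta or when maxIterations passes have run, whichever comes first.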

clusterData

public static void clusterData(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path input,
                               org.apache.hadoop.fs.Path clustersIn,
                               org.apache.hadoop.fs.Path output,
                               boolean runSequential)
                        throws java.io.IOException,
                               java.lang.InterruptedException,
                               java.lang.ClassNotFoundException,
                               java.lang.InstantiationException,
                               java.lang.IllegalAccessException
Cluster the input points using the supplied clusters and write the clustered points to the output directory

Parameters:
conf - the Configuration to use
input - the directory Path for input points
clustersIn - the directory Path for input clusters
output - the directory Path for output clustered points
runSequential - if true run in sequential execution mode
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
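
As a rough picture of what clustering points against a set of clusters involves, the sketch below labels each point with the index of its nearest center. It is plain Java and only an assumption about the assignment rule: the class ClusterDataSketch is hypothetical, squared Euclidean distance is used for simplicity, and the real method reads and writes Hadoop Paths rather than arrays.

```java
/** Hypothetical nearest-center assignment over in-memory arrays. */
final class ClusterDataSketch {

  /** Returns, for each point, the index of the closest center. */
  static int[] assign(double[][] points, double[][] centers) {
    if (centers.length == 0) {
      throw new IllegalArgumentException("at least one center is required");
    }
    int[] labels = new int[points.length];
    for (int p = 0; p < points.length; p++) {
      int best = 0;
      double bestDistance = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centers.length; c++) {
        // Squared Euclidean distance is enough for comparing candidates.
        double sum = 0.0;
        for (int i = 0; i < points[p].length; i++) {
          double d = points[p][i] - centers[c][i];
          sum += d * d;
        }
        if (sum < bestDistance) {
          bestDistance = sum;
          best = c;
        }
      }
      labels[p] = best;
    }
    return labels;
  }
}
```

For example, with centers (0.3, 0.3) and (10, 10.5), the points (0, 0), (9, 9) and (1, 1) receive the labels 0, 1 and 0.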


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.