org.apache.mahout.clustering.meanshift
Class MeanShiftCanopyDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class MeanShiftCanopyDriver
extends AbstractJob

This class implements the driver for Mean Shift Canopy clustering.


Field Summary
static String INPUT_IS_CANOPIES_OPTION
           
static String MAPRED_REDUCE_TASKS
           
static String STATE_IN_KEY
           
 
Constructor Summary
MeanShiftCanopyDriver()
           
 
Method Summary
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, DistanceMeasure measure, IKernelProfile kernelProfile, double t1, double t2, double convergenceDelta, int maxIterations, boolean runSequential, boolean runClustering)
          Iterate over the input clusters to produce the next cluster directories for each iteration
static void clusterData(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path clustersIn, org.apache.hadoop.fs.Path output, boolean runSequential)
          Run the job using supplied arguments
static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, boolean runSequential)
          Convert input vectors to MeanShiftCanopies for further processing
static void main(String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, IKernelProfile kernelProfile, double t1, double t2, double convergenceDelta, int maxIterations, boolean inputIsCanopies, boolean runClustering, boolean runSequential)
          Run the job where the input format can be either Vectors or Canopies.
 int run(String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getGroup, getInputPath, getOption, getOption, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setS3SafeCombinedInputPath, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

MAPRED_REDUCE_TASKS

public static final String MAPRED_REDUCE_TASKS
See Also:
Constant Field Values

INPUT_IS_CANOPIES_OPTION

public static final String INPUT_IS_CANOPIES_OPTION
See Also:
Constant Field Values

STATE_IN_KEY

public static final String STATE_IN_KEY
See Also:
Constant Field Values

Constructor Detail

MeanShiftCanopyDriver

public MeanShiftCanopyDriver()

Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

run

public int run(String[] args)
        throws Exception
Throws:
Exception

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       IKernelProfile kernelProfile,
                       double t1,
                       double t2,
                       double convergenceDelta,
                       int maxIterations,
                       boolean inputIsCanopies,
                       boolean runClustering,
                       boolean runSequential)
                throws IOException,
                       InterruptedException,
                       ClassNotFoundException
Run the job where the input format can be either Vectors or Canopies. If requested, cluster the input data using the computed Canopies.

Parameters:
conf - the Configuration to use
input - the input Path
output - the output Path
measure - the DistanceMeasure
kernelProfile - the IKernelProfile
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the convergence criterion, as a double
maxIterations - the maximum number of iterations, as an int
inputIsCanopies - true if the input path already contains MeanShiftCanopies and does not need to be converted from Vectors
runClustering - true if the input points are to be clustered once the iterations complete
runSequential - if true run in sequential execution mode
Throws:
IOException
InterruptedException
ClassNotFoundException
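
To make the roles of t1 and t2 more concrete, the following is a minimal, in-memory sketch of one mean shift iteration. It is not Mahout's implementation: the class MeanShiftSketch and its methods are hypothetical, Euclidean distance stands in for the DistanceMeasure, a flat kernel stands in for the IKernelProfile, and the reading of t1 as the neighbourhood radius and t2 as the merge threshold is an assumption that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; hypothetical names, not part of Mahout. */
final class MeanShiftSketch {

  /** Euclidean distance, standing in for a DistanceMeasure. */
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Moves one center to the mean of all centers within t1 of it (flat kernel, assumed). */
  static double[] shift(double[] center, List<double[]> all, double t1) {
    double[] mean = new double[center.length];
    int n = 0;
    for (double[] other : all) {
      if (distance(center, other) <= t1) {
        for (int i = 0; i < mean.length; i++) {
          mean[i] += other[i];
        }
        n++;
      }
    }
    if (n == 0) {
      return center.clone(); // no neighbours at all: leave the center where it is
    }
    for (int i = 0; i < mean.length; i++) {
      mean[i] /= n;
    }
    return mean;
  }

  /** Greedily drops any center that lies within t2 of a center already kept (assumed merge rule). */
  static List<double[]> merge(List<double[]> centers, double t2) {
    List<double[]> kept = new ArrayList<>();
    for (double[] c : centers) {
      boolean absorbed = false;
      for (double[] k : kept) {
        if (distance(c, k) <= t2) {
          absorbed = true;
          break;
        }
      }
      if (!absorbed) {
        kept.add(c);
      }
    }
    return kept;
  }

  /** One full iteration: shift every center, then merge the shifted centers. */
  static List<double[]> iterate(List<double[]> centers, double t1, double t2) {
    List<double[]> shifted = new ArrayList<>();
    for (double[] c : centers) {
      shifted.add(shift(c, centers, t1));
    }
    return merge(shifted, t2);
  }
}
```

With one-dimensional centers 0, 1 and 10, t1 = 2.0 and t2 = 0.1, a single call to iterate leaves two centers, 0.5 and 10: the first two shift onto their common mean and are merged, while the third has no neighbour within t1.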

createCanopyFromVectors

public static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
                                           org.apache.hadoop.fs.Path input,
                                           org.apache.hadoop.fs.Path output,
                                           DistanceMeasure measure,
                                           boolean runSequential)
                                    throws IOException,
                                           InterruptedException,
                                           ClassNotFoundException
Convert input vectors to MeanShiftCanopies for further processing

Throws:
IOException
InterruptedException
ClassNotFoundException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path clustersIn,
                                                      org.apache.hadoop.fs.Path output,
                                                      DistanceMeasure measure,
                                                      IKernelProfile kernelProfile,
                                                      double t1,
                                                      double t2,
                                                      double convergenceDelta,
                                                      int maxIterations,
                                                      boolean runSequential,
                                                      boolean runClustering)
                                               throws IOException,
                                                      InterruptedException,
                                                      ClassNotFoundException
Iterate over the input clusters to produce the next cluster directories for each iteration

Parameters:
conf - the Configuration to use
clustersIn - the input directory Path
output - the output Path
measure - the DistanceMeasure
kernelProfile - the IKernelProfile
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the convergence criterion, as a double
maxIterations - the maximum number of iterations, as an int
runSequential - if true run in sequential execution mode
runClustering - if true accumulate merged clusters for subsequent clustering step
Throws:
IOException
InterruptedException
ClassNotFoundException
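
The interplay of convergenceDelta and maxIterations can be sketched as a plain loop that stops at whichever limit is reached first and reports the last directory written. This is an assumption-laden illustration, not the driver's code: the class IterationLoopSketch, the "clusters-N" directory names, the one-dimensional centers, and the "largest movement <= convergenceDelta" test are all hypothetical.

```java
import java.util.function.UnaryOperator;

/** Toy illustration only; hypothetical names, not part of Mahout. */
final class IterationLoopSketch {

  /**
   * Applies step repeatedly and returns the (hypothetical) name of the last
   * directory produced. Stops when no center moved by more than
   * convergenceDelta, or after maxIterations steps, whichever comes first.
   * Assumes step returns an array of the same length as its input.
   */
  static String iterate(double[] centers, UnaryOperator<double[]> step,
                        double convergenceDelta, int maxIterations) {
    String lastDir = "clusters-0"; // hypothetical name for the initial state
    double[] current = centers;
    for (int i = 1; i <= maxIterations; i++) {
      double[] next = step.apply(current);
      lastDir = "clusters-" + i; // one directory per iteration (assumed naming)
      double maxMove = 0.0;
      for (int j = 0; j < next.length; j++) {
        maxMove = Math.max(maxMove, Math.abs(next[j] - current[j]));
      }
      current = next;
      if (maxMove <= convergenceDelta) {
        break; // converged before reaching maxIterations
      }
    }
    return lastDir;
  }
}
```

For example, with a step that halves every value, a single starting center at 8.0, convergenceDelta = 1.0 and maxIterations = 10, the loop stops after the third step (movements of 4, 2, then 1) and returns "clusters-3". With maxIterations = 2 it returns "clusters-2" without having converged.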

clusterData

public static void clusterData(org.apache.hadoop.fs.Path input,
                               org.apache.hadoop.fs.Path clustersIn,
                               org.apache.hadoop.fs.Path output,
                               boolean runSequential)
                        throws IOException,
                               InterruptedException,
                               ClassNotFoundException
Run the job using supplied arguments

Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output clustered points
runSequential - if true run in sequential execution mode
Throws:
IOException
InterruptedException
ClassNotFoundException
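
As a rough picture of what a final clustering pass over the input points could look like, the sketch below labels each point with the index of its nearest cluster center. Nearest-center assignment is a stand-in assumption: this page does not say how the driver maps points to clusters, and the class AssignPointsSketch is hypothetical.

```java
/** Toy illustration only; hypothetical names, not part of Mahout. */
final class AssignPointsSketch {

  /** Returns, for each point, the index of the closest center (squared Euclidean distance). */
  static int[] assign(double[][] points, double[][] centers) {
    int[] labels = new int[points.length];
    for (int p = 0; p < points.length; p++) {
      int best = -1; // stays -1 only if there are no centers
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centers.length; c++) {
        double dist = 0.0;
        for (int i = 0; i < points[p].length; i++) {
          double d = points[p][i] - centers[c][i];
          dist += d * d;
        }
        if (dist < bestDist) {
          bestDist = dist;
          best = c;
        }
      }
      labels[p] = best;
    }
    return labels;
  }
}
```

Given points 0, 0.4 and 9 and centers 0.5 and 10, the first two points receive label 0 and the third receives label 1.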


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.