org.apache.mahout.clustering.canopy
Class CanopyDriver

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.mahout.common.AbstractJob
          extended by org.apache.mahout.clustering.canopy.CanopyDriver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class CanopyDriver
extends AbstractJob


Field Summary
static java.lang.String DEFAULT_CLUSTERED_POINTS_DIRECTORY
           
 
Constructor Summary
CanopyDriver()
           
 
Method Summary
static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runSequential)
          Build a directory of Canopy clusters from the input vectors and other arguments.
static void clusterData(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path points, org.apache.hadoop.fs.Path canopies, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runSequential)
           
static void main(java.lang.String[] args)
           
static void run(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runClustering, boolean runSequential)
          Build a directory of Canopy clusters from the input arguments and, if requested, cluster the input vectors using these clusters
static void run(org.apache.hadoop.fs.Path input, org.apache.hadoop.fs.Path output, DistanceMeasure measure, double t1, double t2, boolean runClustering, boolean runSequential)
          Convenience method that creates a new Configuration(), then builds a directory of Canopy clusters from the input arguments and, if requested, clusters the input vectors using these clusters.
 int run(java.lang.String[] args)
           
 
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

DEFAULT_CLUSTERED_POINTS_DIRECTORY

public static final java.lang.String DEFAULT_CLUSTERED_POINTS_DIRECTORY
See Also:
Constant Field Values
Constructor Detail

CanopyDriver

public CanopyDriver()
Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Throws:
java.lang.Exception

run

public static void run(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       boolean runClustering,
                       boolean runSequential)
                throws java.io.IOException,
                       java.lang.InterruptedException,
                       java.lang.ClassNotFoundException,
                       java.lang.InstantiationException,
                       java.lang.IllegalAccessException
Builds a directory of Canopy clusters from the input arguments and, if requested, clusters the input vectors using these clusters.

Parameters:
conf - the Configuration to use
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
runClustering - cluster the input vectors if true
runSequential - execute sequentially if true
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException

run

public static void run(org.apache.hadoop.fs.Path input,
                       org.apache.hadoop.fs.Path output,
                       DistanceMeasure measure,
                       double t1,
                       double t2,
                       boolean runClustering,
                       boolean runSequential)
                throws java.io.IOException,
                       java.lang.InterruptedException,
                       java.lang.ClassNotFoundException,
                       java.lang.InstantiationException,
                       java.lang.IllegalAccessException
Convenience method that creates a new Configuration(), then builds a directory of Canopy clusters from the input arguments and, if requested, clusters the input vectors using these clusters.

Parameters:
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
runClustering - cluster the input vectors if true
runSequential - execute sequentially if true
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException

buildClusters

public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
                                                      org.apache.hadoop.fs.Path input,
                                                      org.apache.hadoop.fs.Path output,
                                                      DistanceMeasure measure,
                                                      double t1,
                                                      double t2,
                                                      boolean runSequential)
                                               throws java.lang.InstantiationException,
                                                      java.lang.IllegalAccessException,
                                                      java.io.IOException,
                                                      java.lang.InterruptedException,
                                                      java.lang.ClassNotFoundException
Builds a directory of Canopy clusters from the input vectors and other arguments. Runs either the sequential or the MapReduce implementation, as requested.

Parameters:
conf - the Configuration to use
input - the Path to the directory containing input vectors
output - the Path for all output directories
measure - the DistanceMeasure
t1 - the double T1 distance threshold
t2 - the double T2 distance threshold
runSequential - if true, run the sequential (reference) algorithm instead of MapReduce
Returns:
the canopy output directory Path
Throws:
java.lang.InstantiationException
java.lang.IllegalAccessException
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
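
The role of t1 and t2 can be illustrated with a minimal, in-memory sketch of one-pass canopy generation. This is not the Mahout implementation: the class CanopySketch, its nested Canopy type, and the Euclidean distance helper (standing in for a DistanceMeasure) are hypothetical names introduced only for this example. The sketch assumes the usual convention that t1 is greater than t2. A point within t1 of a canopy center joins that canopy, and a point within t2 of any center does not start a new canopy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of org.apache.mahout.
public class CanopySketch {

    /** A canopy: its center plus every point seen within t1 of that center. */
    public static final class Canopy {
        public final double[] center;
        public final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
        }
    }

    /** Euclidean distance, standing in for a DistanceMeasure (assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Builds canopies in a single pass over the points. */
    public static List<Canopy> buildCanopies(List<double[]> points, double t1, double t2) {
        // This check is the sketch's own assumption, not documented driver behavior.
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<Canopy> canopies = new ArrayList<>();
        for (double[] p : points) {
            boolean stronglyBound = false;
            for (Canopy c : canopies) {
                double d = distance(p, c.center);
                if (d < t1) {
                    c.members.add(p);      // loosely covered by this canopy
                }
                if (d < t2) {
                    stronglyBound = true;  // too close to start a new canopy
                }
            }
            if (!stronglyBound) {
                Canopy created = new Canopy(p);
                created.members.add(p);
                canopies.add(created);
            }
        }
        return canopies;
    }
}
```

Because membership uses the looser t1 threshold, a point may belong to more than one canopy. Raising t2 yields fewer canopies, while lowering it yields more.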

clusterData

public static void clusterData(org.apache.hadoop.conf.Configuration conf,
                               org.apache.hadoop.fs.Path points,
                               org.apache.hadoop.fs.Path canopies,
                               org.apache.hadoop.fs.Path output,
                               DistanceMeasure measure,
                               double t1,
                               double t2,
                               boolean runSequential)
                        throws java.lang.InstantiationException,
                               java.lang.IllegalAccessException,
                               java.io.IOException,
                               java.lang.InterruptedException,
                               java.lang.ClassNotFoundException
Throws:
java.lang.InstantiationException
java.lang.IllegalAccessException
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
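
This page gives no description for clusterData. Going by the description of run ("cluster the input vectors using these clusters"), one plausible reading is that each input point is assigned to a canopy. The sketch below shows that idea by picking the nearest canopy center for a point. It is an assumption about the general technique, not a statement of what clusterData does internally. NearestCanopySketch and closestCanopy are hypothetical names, and Euclidean distance again stands in for a DistanceMeasure.

```java
import java.util.List;

// Hypothetical illustration only; not part of org.apache.mahout.
public class NearestCanopySketch {

    /** Euclidean distance, standing in for a DistanceMeasure (assumption). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Returns the index of the canopy center closest to the given point. */
    public static int closestCanopy(double[] point, List<double[]> centers) {
        if (centers.isEmpty()) {
            throw new IllegalArgumentException("no canopy centers");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```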


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.