org.apache.mahout.clustering.meanshift
Class MeanShiftCanopyDriver
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class MeanShiftCanopyDriver
- extends AbstractJob
Method Summary
org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean runSequential)
- Iterate over the input clusters to produce the next cluster directories for each iteration
static void clusterData(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
boolean runSequential)
- Run the job using supplied arguments
static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
boolean runSequential)
- Convert input vectors to MeanShiftCanopies for further processing
static void main(java.lang.String[] args)
void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean inputIsCanopies,
boolean runClustering,
boolean runSequential)
- Run the job where the input format can be either Vectors or Canopies.
int run(java.lang.String[] args)
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
INPUT_IS_CANOPIES_OPTION
public static final java.lang.String INPUT_IS_CANOPIES_OPTION
- See Also:
- Constant Field Values
STATE_IN_KEY
public static final java.lang.String STATE_IN_KEY
- See Also:
- Constant Field Values
MeanShiftCanopyDriver
public MeanShiftCanopyDriver()
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
run
public int run(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
run
public void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean inputIsCanopies,
boolean runClustering,
boolean runSequential)
throws java.io.IOException,
java.lang.InterruptedException,
java.lang.ClassNotFoundException,
java.lang.InstantiationException,
java.lang.IllegalAccessException
- Run the job where the input format can be either Vectors or Canopies. If requested, cluster the input data using the computed Canopies.
- Parameters:
conf - the Configuration to use
input - the input Path
output - the output Path
measure - the DistanceMeasure
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the convergence criterion, a double
maxIterations - the maximum number of iterations, an int
inputIsCanopies - true if the input path already contains MeanShiftCanopies and does not need to be converted from Vectors
runClustering - true if the input points are to be clustered once the iterations complete
runSequential - if true, run in sequential execution mode
- Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
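The flag descriptions above suggest a flow of up to three stages. The following minimal sketch only illustrates which stages the two flags would appear to select. `RunFlowSketch` and `plan` are hypothetical names, the stage order is an assumption inferred from the parameter descriptions, and no Mahout or Hadoop code is invoked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: lists the stages that the documented flags
// appear to select. It does not call Mahout or Hadoop.
class RunFlowSketch {

    static List<String> plan(boolean inputIsCanopies, boolean runClustering) {
        List<String> stages = new ArrayList<>();
        if (!inputIsCanopies) {
            // Input holds Vectors, so they are assumed to be converted first.
            stages.add("createCanopyFromVectors");
        }
        // The iterative canopy-building stage is assumed to run in every case.
        stages.add("buildClusters");
        if (runClustering) {
            // Optional final pass that clusters the input points.
            stages.add("clusterData");
        }
        return stages;
    }

    public static void main(String[] args) {
        System.out.println(plan(false, true));
        System.out.println(plan(true, false));
    }
}
```

With Vector input and clustering requested, the sketch lists all three stages; with canopy input and no clustering, only `buildClusters` remains.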
createCanopyFromVectors
public static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
boolean runSequential)
throws java.io.IOException,
java.lang.InterruptedException,
java.lang.ClassNotFoundException,
java.lang.InstantiationException,
java.lang.IllegalAccessException
- Convert input vectors to MeanShiftCanopies for further processing
- Parameters:
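conf - the Configuration to use
input - the input Path containing the Vectors
output - the output Path for the MeanShiftCanopies
measure - the DistanceMeasure
runSequential - if true, run in sequential execution mode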
- Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
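As a rough picture of what the conversion step might do, the sketch below seeds one canopy per input vector. This is an assumption about the conversion, not a description of Mahout's code: `CanopySketch`, its fields, and `fromVectors` are hypothetical stand-ins, and plain `double[]` arrays replace Mahout vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a canopy: an id, a center, and a count of bound points.
class CanopySketch {
    final int id;
    final double[] center;
    final int boundPoints;

    CanopySketch(int id, double[] center, int boundPoints) {
        this.id = id;
        this.center = center;
        this.boundPoints = boundPoints;
    }

    // Assumption: every input vector seeds exactly one canopy centered on itself.
    static List<CanopySketch> fromVectors(List<double[]> vectors) {
        List<CanopySketch> canopies = new ArrayList<>();
        int nextId = 0;
        for (double[] v : vectors) {
            // Copy the vector so later shifts of the center do not alter the input.
            canopies.add(new CanopySketch(nextId++, v.clone(), 1));
        }
        return canopies;
    }
}
```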
buildClusters
public org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean runSequential)
throws java.io.IOException,
java.lang.InterruptedException,
java.lang.ClassNotFoundException,
java.lang.InstantiationException,
java.lang.IllegalAccessException
- Iterate over the input clusters to produce the next cluster directories for each iteration
- Parameters:
conf - the Configuration to use
clustersIn - the input directory Path
output - the output Path
measure - the DistanceMeasure
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the convergence criterion, a double
maxIterations - the maximum number of iterations, an int
runSequential - if true, run in sequential execution mode
- Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
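To show how `t1`, `t2`, `convergenceDelta`, and `maxIterations` could interact, here is a simplified, in-memory sketch of a mean shift loop on one-dimensional centers. It is an illustration under stated assumptions, not Mahout's implementation: `MeanShiftLoopSketch` and its methods are hypothetical, T1 is assumed to be the averaging window, T2 the merge radius, centers are unweighted, and nothing is written to cluster directories.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, unweighted 1-D mean shift loop; not Mahout's implementation.
class MeanShiftLoopSketch {

    // One pass: shift each center to the mean of the centers within t1,
    // then merge shifted centers that lie within t2 of an already kept one.
    static List<Double> iterate(List<Double> centers, double t1, double t2, double[] maxShiftOut) {
        List<Double> shifted = new ArrayList<>();
        double maxShift = 0.0;
        for (double c : centers) {
            double sum = 0.0;
            int n = 0;
            for (double other : centers) {
                if (Math.abs(other - c) <= t1) {
                    sum += other;
                    n++;
                }
            }
            double next = sum / n; // n >= 1 because c is within t1 of itself
            maxShift = Math.max(maxShift, Math.abs(next - c));
            shifted.add(next);
        }
        List<Double> merged = new ArrayList<>();
        for (double c : shifted) {
            boolean absorbed = false;
            for (double kept : merged) {
                if (Math.abs(kept - c) <= t2) {
                    absorbed = true;
                    break;
                }
            }
            if (!absorbed) {
                merged.add(c);
            }
        }
        maxShiftOut[0] = maxShift;
        return merged;
    }

    // Repeat passes until the largest shift is at most convergenceDelta
    // or maxIterations passes have run.
    static List<Double> buildClusters(List<Double> centers, double t1, double t2,
                                      double convergenceDelta, int maxIterations) {
        List<Double> current = centers;
        double[] maxShift = new double[1];
        for (int i = 0; i < maxIterations; i++) {
            current = iterate(current, t1, t2, maxShift);
            if (maxShift[0] <= convergenceDelta) {
                break;
            }
        }
        return current;
    }
}
```

In this sketch, a larger `t1` pulls more distant centers into each average, while a larger `t2` merges centers more aggressively; `maxIterations` only caps the loop when the shifts never fall below `convergenceDelta`.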
clusterData
public static void clusterData(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
boolean runSequential)
throws java.io.IOException,
java.lang.InterruptedException,
java.lang.ClassNotFoundException,
java.lang.InstantiationException,
java.lang.IllegalAccessException
- Run the job using supplied arguments
- Parameters:
conf - the Configuration to use
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output clustered points
runSequential - if true, run in sequential execution mode
- Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
java.lang.InstantiationException
java.lang.IllegalAccessException
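The page does not say how points are matched to clusters, and `clusterData` takes no DistanceMeasure. Purely as an illustration of the input points, input clusters, and clustered-points output, the sketch below labels each point with the index of its nearest center. Nearest-center assignment and Euclidean distance are assumptions, and `ClusterDataSketch`, `distance`, and `assign` are hypothetical names.

```java
// Hypothetical illustration: label each point with the index of its nearest center.
class ClusterDataSketch {

    // Euclidean distance between two vectors of equal length (an assumed measure).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns, for each point, the index of the closest center.
    // Assumes centers is non-empty.
    static int[] assign(double[][] points, double[][] centers) {
        int[] labels = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.length; c++) {
                double d = distance(points[p], centers[c]);
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            labels[p] = best;
        }
        return labels;
    }
}
```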
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.