org.apache.mahout.clustering.meanshift
Class MeanShiftCanopyDriver
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class MeanShiftCanopyDriver
- extends AbstractJob
This class implements the driver for Mean Shift Canopy clustering.
Method Summary
static org.apache.hadoop.fs.Path
buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
IKernelProfile kernelProfile,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean runSequential,
boolean runClustering)
Iterate over the input clusters to produce the next cluster directories for
each iteration
static void
clusterData(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
boolean runSequential)
Run the job using supplied arguments
static void
createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
boolean runSequential)
Convert input vectors to MeanShiftCanopies for further processing
static void
main(String[] args)
static void
run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
IKernelProfile kernelProfile,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean inputIsCanopies,
boolean runClustering,
boolean runSequential)
Run the job where the input format can be either Vectors or Canopies.
int
run(String[] args)
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getGroup, getInputPath, getOption, getOption, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
MAPRED_REDUCE_TASKS
public static final String MAPRED_REDUCE_TASKS
- See Also:
- Constant Field Values
INPUT_IS_CANOPIES_OPTION
public static final String INPUT_IS_CANOPIES_OPTION
- See Also:
- Constant Field Values
STATE_IN_KEY
public static final String STATE_IN_KEY
- See Also:
- Constant Field Values
MeanShiftCanopyDriver
public MeanShiftCanopyDriver()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Throws:
Exception
run
public static void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
IKernelProfile kernelProfile,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean inputIsCanopies,
boolean runClustering,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Run the job where the input format can be either Vectors or Canopies. If
requested, cluster the input data using the computed Canopies.
- Parameters:
conf - the Configuration to use
input - the input Path
output - the output Path
measure - the DistanceMeasure
kernelProfile - the IKernelProfile
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the double convergence criteria
maxIterations - an int maximum number of iterations
inputIsCanopies - true if the input path already contains MeanShiftCanopies and does
not need to be converted from Vectors
runClustering - true if the input points are to be clustered once the iterations
complete
runSequential - if true run in sequential execution mode
- Throws:
IOException
InterruptedException
ClassNotFoundException
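To make the roles of t1, t2, convergenceDelta and maxIterations more concrete, the following is a minimal in-memory sketch in plain Java. It does not call Mahout or Hadoop and is not the library's implementation. It assumes 1-D points, a flat kernel, that t1 acts as the window radius, and that t2 acts as the merge distance. The class MeanShiftSketch and its cluster method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MeanShiftSketch {

    /** Runs a simplified 1-D mean shift and returns the final centers. */
    static List<Double> cluster(double[] points, double t1, double t2,
                                double convergenceDelta, int maxIterations) {
        List<Double> centers = new ArrayList<>();
        for (double p : points) {
            centers.add(p); // every point starts as its own canopy
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double maxShift = 0.0;
            List<Double> shifted = new ArrayList<>();
            for (double c : centers) {
                // Assumption: flat kernel, average all points within t1 of the center.
                double sum = 0.0;
                int count = 0;
                for (double p : points) {
                    if (Math.abs(p - c) <= t1) {
                        sum += p;
                        count++;
                    }
                }
                double next = count == 0 ? c : sum / count;
                maxShift = Math.max(maxShift, Math.abs(next - c));
                shifted.add(next);
            }
            // Assumption: a center within t2 of an already kept center is merged into it.
            List<Double> merged = new ArrayList<>();
            for (double c : shifted) {
                boolean absorbed = false;
                for (double m : merged) {
                    if (Math.abs(m - c) <= t2) {
                        absorbed = true;
                        break;
                    }
                }
                if (!absorbed) {
                    merged.add(c);
                }
            }
            centers = merged;
            // Stop early once no center moved by convergenceDelta or more.
            if (maxShift < convergenceDelta) {
                break;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.2, 0.8, 9.0, 9.1, 8.9};
        System.out.println(cluster(points, 2.0, 0.5, 0.001, 10));
    }
}
```

With the sample points above, the sketch ends with two centers, one close to 1.0 and one close to 9.0. A larger t1 widens the window each center averages over, and a larger t2 merges centers more aggressively.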
createCanopyFromVectors
public static void createCanopyFromVectors(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Convert input vectors to MeanShiftCanopies for further processing
- Throws:
IOException
InterruptedException
ClassNotFoundException
buildClusters
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
DistanceMeasure measure,
IKernelProfile kernelProfile,
double t1,
double t2,
double convergenceDelta,
int maxIterations,
boolean runSequential,
boolean runClustering)
throws IOException,
InterruptedException,
ClassNotFoundException
- Iterate over the input clusters to produce the next cluster directories for
each iteration
- Parameters:
conf - the Configuration to use
clustersIn - the input directory Path
output - the output Path
measure - the DistanceMeasure
kernelProfile - the IKernelProfile
t1 - the T1 distance threshold
t2 - the T2 distance threshold
convergenceDelta - the double convergence criteria
maxIterations - an int number of iterations
runSequential - if true run in sequential execution mode
runClustering - if true accumulate merged clusters for subsequent clustering step
- Throws:
IOException
InterruptedException
ClassNotFoundException
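The iteration described above, where each pass reads one cluster directory and writes the next, can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the driver's code: the "clusters-N" directory naming, the class IterationDirsSketch, the iterate method, and the convergedAfter callback are all hypothetical, and plain strings stand in for org.apache.hadoop.fs.Path.

```java
import java.util.function.IntPredicate;

public class IterationDirsSketch {

    /**
     * Returns the directory written by the last iteration that ran, or the
     * original clustersIn if no iteration ran. convergedAfter is a
     * hypothetical stand-in for a real convergence check.
     */
    static String iterate(String clustersIn, String output, int maxIterations,
                          IntPredicate convergedAfter) {
        boolean converged = false;
        int iteration = 1;
        while (!converged && iteration <= maxIterations) {
            // Assumed naming scheme: one "clusters-N" directory per iteration.
            String clustersOut = output + "/clusters-" + iteration;
            // A real driver would run one pass from clustersIn to clustersOut here.
            converged = convergedAfter.test(iteration);
            clustersIn = clustersOut; // this iteration's output feeds the next one
            iteration++;
        }
        return clustersIn;
    }

    public static void main(String[] args) {
        // Pretend the clusters converge after the third pass.
        System.out.println(iterate("input/canopies", "output", 10, i -> i >= 3));
    }
}
```

In this sketch the loop stops at whichever comes first, convergence or maxIterations, and the returned value names the directory holding the most recent clusters.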
clusterData
public static void clusterData(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path clustersIn,
org.apache.hadoop.fs.Path output,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Run the clustering job using the supplied arguments: read the input points and
input clusters, and write the clustered points to the output directory.
- Parameters:
input - the directory pathname for input points
clustersIn - the directory pathname for input clusters
output - the directory pathname for output clustered points
runSequential - if true run in sequential execution mode
- Throws:
IOException
InterruptedException
ClassNotFoundException
Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.