org.apache.mahout.clustering.dirichlet
Class DirichletDriver
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.dirichlet.DirichletDriver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class DirichletDriver
- extends AbstractJob
Method Summary |
static org.apache.hadoop.fs.Path |
buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numClusters,
int maxIterations,
double alpha0,
boolean runSequential)
Iterate over the input vectors to produce cluster directories for each iteration |
static void |
clusterData(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path stateIn,
org.apache.hadoop.fs.Path output,
boolean emitMostLikely,
double threshold,
boolean runSequential)
Run the job using supplied arguments |
static void |
main(String[] args)
|
static int |
readPrototypeSize(org.apache.hadoop.fs.Path input)
Read the first input vector to determine the prototype size for the modelPrototype |
static void |
run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numModels,
int maxIterations,
double alpha0,
boolean runClustering,
boolean emitMostLikely,
double threshold,
boolean runSequential)
Iterate over the input vectors to produce clusters and, if requested, use the
results of the final iteration to cluster the input vectors. |
static void |
run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numClusters,
int maxIterations,
double alpha0,
boolean runClustering,
boolean emitMostLikely,
double threshold,
boolean runSequential)
Convenience method that provides a default Configuration.
Iterate over the input vectors to produce clusters and, if requested, use the
results of the final iteration to cluster the input vectors. |
int |
run(String[] args)
|
Methods inherited from class org.apache.mahout.common.AbstractJob |
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getGroup, getInputPath, getOption, getOption, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setS3SafeCombinedInputPath, shouldRunNextPhase |
Methods inherited from class org.apache.hadoop.conf.Configured |
getConf, setConf |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
getConf, setConf |
STATE_IN_KEY
public static final String STATE_IN_KEY
- See Also:
- Constant Field Values
MODEL_DISTRIBUTION_KEY
public static final String MODEL_DISTRIBUTION_KEY
- See Also:
- Constant Field Values
NUM_CLUSTERS_KEY
public static final String NUM_CLUSTERS_KEY
- See Also:
- Constant Field Values
ALPHA_0_KEY
public static final String ALPHA_0_KEY
- See Also:
- Constant Field Values
EMIT_MOST_LIKELY_KEY
public static final String EMIT_MOST_LIKELY_KEY
- See Also:
- Constant Field Values
THRESHOLD_KEY
public static final String THRESHOLD_KEY
- See Also:
- Constant Field Values
MODEL_PROTOTYPE_CLASS_OPTION
public static final String MODEL_PROTOTYPE_CLASS_OPTION
- See Also:
- Constant Field Values
MODEL_DISTRIBUTION_CLASS_OPTION
public static final String MODEL_DISTRIBUTION_CLASS_OPTION
- See Also:
- Constant Field Values
ALPHA_OPTION
public static final String ALPHA_OPTION
- See Also:
- Constant Field Values
DirichletDriver
public DirichletDriver()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
run
public int run(String[] args)
throws Exception
- Throws:
Exception
run
public static void run(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numModels,
int maxIterations,
double alpha0,
boolean runClustering,
boolean emitMostLikely,
double threshold,
boolean runSequential)
throws IOException,
ClassNotFoundException,
InterruptedException
- Iterate over the input vectors to produce clusters and, if requested, use the
results of the final iteration to cluster the input vectors.
- Parameters:
conf - the Configuration to use
input - the directory Path for input points
output - the directory Path for output points
description - model distribution parameters
numModels - the number of models to iterate over
maxIterations - the maximum number of iterations
alpha0 - the alpha_0 value for the DirichletDistribution
runClustering - true if the points are to be clustered after the iterations
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit all clusters whose pdf is greater than this value
runSequential - execute sequentially if true
- Throws:
IOException
ClassNotFoundException
InterruptedException
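The two-phase flow described above (build clusters, then optionally cluster the input points from the final iteration's state) can be sketched without any Mahout or Hadoop dependency. This is only an illustration of the control flow: the Steps interface, the class name RunFlowSketch, the use of String in place of Path, and the "clusteredPoints" subdirectory name are all assumptions made for the example, not the driver's documented behaviour.

```java
/** Hypothetical illustration only; not part of Mahout. */
final class RunFlowSketch {

  /** Hypothetical stand-ins for the driver's buildClusters and clusterData steps. */
  interface Steps {
    /** Returns the location of the final clusters directory. */
    String buildClusters(String input, String output);

    void clusterData(String input, String stateIn, String output);
  }

  /** Builds clusters first, then clusters the points only if runClustering is true. */
  static void run(Steps steps, String input, String output, boolean runClustering) {
    String finalClusters = steps.buildClusters(input, output);
    if (runClustering) {
      // Assumption: clustered points go to a subdirectory of the output directory.
      steps.clusterData(input, finalClusters, output + "/clusteredPoints");
    }
  }
}
```

The point of the sketch is that the state produced by the final iteration is the only link between the two phases, which is why clusterData takes a separate stateIn argument.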
run
public static void run(org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numClusters,
int maxIterations,
double alpha0,
boolean runClustering,
boolean emitMostLikely,
double threshold,
boolean runSequential)
throws IOException,
ClassNotFoundException,
InterruptedException
- Convenience method that provides a default Configuration.
Iterate over the input vectors to produce clusters and, if requested, use the
results of the final iteration to cluster the input vectors.
- Parameters:
input - the directory Path for input points
output - the directory Path for output points
description - model distribution parameters
numClusters - the number of models to iterate over
maxIterations - the maximum number of iterations
alpha0 - the alpha_0 value for the DirichletDistribution
runClustering - true if the points are to be clustered after the iterations
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit all clusters whose pdf is greater than this value
runSequential - execute sequentially if true
- Throws:
IOException
ClassNotFoundException
InterruptedException
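As a rough intuition for the alpha0 parameter, the following sketch computes the expected mixture weights under a symmetric Dirichlet prior whose total concentration alpha0 is spread evenly over the models. This is a textbook illustration, not the driver's implementation: the class Alpha0Sketch, the method expectedMixture, and the idea of summarising the distribution by its mean (rather than sampling from it) are assumptions made for the example.

```java
/** Hypothetical illustration only; not part of Mahout. */
final class Alpha0Sketch {

  /**
   * Expected mixture weights given per-model point counts and a symmetric
   * Dirichlet prior with total concentration alpha0 (alpha0 / k per model).
   */
  static double[] expectedMixture(int[] counts, double alpha0) {
    int k = counts.length;
    long total = 0;
    for (int c : counts) {
      total += c;
    }
    double[] weights = new double[k];
    for (int i = 0; i < k; i++) {
      // Each model gets its observed count plus an equal share of the prior mass.
      weights[i] = (counts[i] + alpha0 / k) / (total + alpha0);
    }
    return weights;
  }
}
```

Under this reading, a larger alpha0 pulls the weights toward uniform, so empty models keep a better chance of attracting points; a smaller alpha0 lets the observed counts dominate.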
readPrototypeSize
public static int readPrototypeSize(org.apache.hadoop.fs.Path input)
throws IOException
- Read the first input vector to determine the prototype size for the modelPrototype
- Throws:
IOException
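The idea behind readPrototypeSize can be sketched in plain Java: look only at the first vector and report its cardinality. The sketch below does not read Hadoop files; the class PrototypeSizeSketch, the use of double[] in place of a Mahout vector, and the choice to throw on empty input are assumptions made for the example.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration only; not part of Mahout. */
final class PrototypeSizeSketch {

  /** Returns the length of the first vector; later vectors are never inspected. */
  static int prototypeSize(Iterable<double[]> vectors) {
    Iterator<double[]> it = vectors.iterator();
    if (!it.hasNext()) {
      // Assumption: empty input is treated as an error in this sketch.
      throw new NoSuchElementException("no input vectors");
    }
    return it.next().length;
  }
}
```

Because only the first vector is consulted, this approach presumes that all input vectors share the same cardinality.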
buildClusters
public static org.apache.hadoop.fs.Path buildClusters(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path output,
DistributionDescription description,
int numClusters,
int maxIterations,
double alpha0,
boolean runSequential)
throws IOException,
ClassNotFoundException,
InterruptedException
- Iterate over the input vectors to produce cluster directories for each iteration
- Parameters:
conf - the Configuration to use
input - the directory Path for input points
output - the directory Path for output points
description - model distribution parameters
numClusters - the number of models to iterate over
maxIterations - the maximum number of iterations
alpha0 - the alpha_0 value for the DirichletDistribution
runSequential - execute sequentially if true
- Returns:
- the Path of the final clusters directory
- Throws:
IOException
ClassNotFoundException
InterruptedException
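To make "cluster directories for each iteration" concrete, the sketch below lists one state directory per iteration under the output directory and picks the last one as the result. It is an illustration only: the "clusters-N" naming, the extra directory for the initial state, the class IterationDirsSketch, and the use of String in place of Path are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; directory names are assumed, not documented here. */
final class IterationDirsSketch {

  /** Lists a directory for the initial state (index 0) plus one per iteration. */
  static List<String> clusterDirs(String output, int maxIterations) {
    if (maxIterations < 0) {
      throw new IllegalArgumentException("maxIterations must be non-negative");
    }
    List<String> dirs = new ArrayList<>();
    for (int i = 0; i <= maxIterations; i++) {
      dirs.add(output + "/clusters-" + i);
    }
    return dirs;
  }

  /** The last directory is what a caller would hand on as the input state for clustering. */
  static String finalClusterDir(String output, int maxIterations) {
    List<String> dirs = clusterDirs(output, maxIterations);
    return dirs.get(dirs.size() - 1);
  }
}
```

Each iteration reads the state written by the previous one, so the returned directory holds the models after the last completed iteration.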
clusterData
public static void clusterData(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path input,
org.apache.hadoop.fs.Path stateIn,
org.apache.hadoop.fs.Path output,
boolean emitMostLikely,
double threshold,
boolean runSequential)
throws IOException,
InterruptedException,
ClassNotFoundException
- Run the job using supplied arguments
- Parameters:
conf - the Configuration to use
input - the directory pathname for input points
stateIn - the directory pathname for input state
output - the directory pathname for output points
emitMostLikely - if true, emit only the most likely cluster for each point
threshold - when emitMostLikely is false, emit all clusters whose pdf is greater than this value
runSequential - execute sequentially if true
- Throws:
IOException
InterruptedException
ClassNotFoundException
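The interaction between emitMostLikely and threshold can be illustrated with a small self-contained sketch. It does not use Mahout classes: the class EmitSketch, the method selectClusters, and the pdfs array are hypothetical, and the strict comparison against the threshold is an assumption based on the phrase "greater pdf".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Mahout. */
final class EmitSketch {

  /**
   * Returns the indices of the clusters to emit for a single point.
   *
   * @param pdfs           pdfs[k] is the point's pdf under cluster k
   * @param emitMostLikely if true, only the index of the largest pdf is returned
   * @param threshold      used only when emitMostLikely is false
   */
  static List<Integer> selectClusters(double[] pdfs, boolean emitMostLikely, double threshold) {
    List<Integer> selected = new ArrayList<>();
    if (emitMostLikely) {
      int best = -1;
      for (int k = 0; k < pdfs.length; k++) {
        if (best < 0 || pdfs[k] > pdfs[best]) {
          best = k;
        }
      }
      if (best >= 0) {
        selected.add(best);
      }
    } else {
      for (int k = 0; k < pdfs.length; k++) {
        // Assumption: strictly greater than the threshold.
        if (pdfs[k] > threshold) {
          selected.add(k);
        }
      }
    }
    return selected;
  }
}
```

With emitMostLikely set to true the threshold is ignored and each point is assigned to exactly one cluster; with it set to false a point may be emitted for several clusters, or for none if no pdf exceeds the threshold.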
Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.