org.apache.mahout.clustering.iterator
Class ClusterIterator

java.lang.Object
  extended by org.apache.mahout.clustering.iterator.ClusterIterator

public class ClusterIterator
extends Object

A clustering iterator that operates on a set of Vector data and a prior ClusterClassifier that has been initialized with a set of models. The implementation is algorithm-neutral: it works for any iterative clustering algorithm (currently k-means, fuzzy k-means, and Dirichlet) that processes all the input vectors in each iteration. The ClusterClassifier is configured with a ClusteringPolicy, which selects the desired clustering algorithm.
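To illustrate how a policy can select the algorithm while the iteration loop stays the same, here is a minimal plain-Java sketch. WeightPolicy, PolicySketch, HARD and SOFT are hypothetical names invented for this example and are not part of Mahout's ClusteringPolicy API; the soft weighting is a simplified inverse-distance scheme, not the exact fuzzy k-means membership formula.

```java
import java.util.Arrays;

// Hypothetical stand-in for a clustering policy; NOT Mahout's ClusteringPolicy.
interface WeightPolicy {
    // Turn per-cluster distances into per-cluster training weights.
    double[] weights(double[] distances);
}

final class PolicySketch {
    // Hard assignment (k-means style): all weight goes to the nearest cluster.
    static final WeightPolicy HARD = distances -> {
        int best = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[best]) {
                best = i;
            }
        }
        double[] w = new double[distances.length];
        w[best] = 1.0;
        return w;
    };

    // Soft assignment (fuzzy style, simplified): weights proportional to
    // inverse distance, normalised so that they sum to 1.
    static final WeightPolicy SOFT = distances -> {
        double[] w = new double[distances.length];
        double total = 0.0;
        for (int i = 0; i < distances.length; i++) {
            w[i] = 1.0 / Math.max(distances[i], 1e-12); // guard against division by zero
            total += w[i];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= total;
        }
        return w;
    };

    public static void main(String[] args) {
        double[] distances = {1.0, 3.0};
        System.out.println(Arrays.toString(HARD.weights(distances)));
        System.out.println(Arrays.toString(SOFT.weights(distances)));
    }
}
```

A loop that only ever calls weights(...) does not need to know which of the two policies it was given, which is the sense in which such a loop is algorithm-neutral.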


Field Summary
static String PRIOR_PATH_KEY
           
 
Constructor Summary
ClusterIterator()
           
 
Method Summary
 ClusterClassifier iterate(Iterable<Vector> data, ClusterClassifier classifier, int numIterations)
          Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier.
 void iterateMR(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inPath, org.apache.hadoop.fs.Path priorPath, org.apache.hadoop.fs.Path outPath, int numIterations)
          Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier, using a MapReduce implementation.
 void iterateSeq(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inPath, org.apache.hadoop.fs.Path priorPath, org.apache.hadoop.fs.Path outPath, int numIterations)
          Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier, using a sequential implementation.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PRIOR_PATH_KEY

public static final String PRIOR_PATH_KEY
See Also:
Constant Field Values
Constructor Detail

ClusterIterator

public ClusterIterator()
Method Detail

iterate

public ClusterClassifier iterate(Iterable<Vector> data,
                                 ClusterClassifier classifier,
                                 int numIterations)
Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier.

Parameters:
data - an Iterable<Vector> of input vectors
classifier - a prior ClusterClassifier
numIterations - the int number of iterations to perform
Returns:
the posterior ClusterClassifier
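
The prior-to-posterior contract can be sketched with a one-dimensional, k-means-style analogue in plain Java. This is an illustration under stated assumptions, not Mahout's implementation: IterateSketch is a hypothetical class, a double[] of centers stands in for the ClusterClassifier, and a List<Double> stands in for the Iterable<Vector>.

```java
import java.util.Arrays;
import java.util.List;

final class IterateSketch {
    // Hypothetical 1-D analogue of iterate(data, classifier, numIterations):
    // 'prior' holds the initial cluster centers, the result is the posterior.
    static double[] iterate(List<Double> data, double[] prior, int numIterations) {
        double[] centers = prior.clone(); // leave the caller's prior untouched
        for (int it = 0; it < numIterations; it++) {
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            // Every input point is processed in every iteration.
            for (double x : data) {
                int best = 0;
                for (int k = 1; k < centers.length; k++) {
                    if (Math.abs(x - centers[k]) < Math.abs(x - centers[best])) {
                        best = k;
                    }
                }
                sum[best] += x;
                count[best]++;
            }
            // End of the iteration: recompute each center.
            // A cluster that received no points keeps its previous center.
            for (int k = 0; k < centers.length; k++) {
                if (count[k] > 0) {
                    centers[k] = sum[k] / count[k];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[] posterior =
            iterate(Arrays.asList(1.0, 2.0, 9.0, 10.0), new double[] {0.0, 5.0}, 5);
        System.out.println(Arrays.toString(posterior)); // prints [1.5, 9.5]
    }
}
```

With the points 1, 2, 9, 10 and prior centers 0 and 5, the centers settle at 1.5 and 9.5 after the first pass, and further iterations leave them unchanged.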

iterateSeq

public void iterateSeq(org.apache.hadoop.conf.Configuration conf,
                       org.apache.hadoop.fs.Path inPath,
                       org.apache.hadoop.fs.Path priorPath,
                       org.apache.hadoop.fs.Path outPath,
                       int numIterations)
                throws IOException
Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier, using a sequential implementation.

Parameters:
conf - the Configuration
inPath - a Path to input VectorWritables
priorPath - a Path to the prior classifier
outPath - a Path to the output directory
numIterations - the int number of iterations to perform
Throws:
IOException

iterateMR

public void iterateMR(org.apache.hadoop.conf.Configuration conf,
                      org.apache.hadoop.fs.Path inPath,
                      org.apache.hadoop.fs.Path priorPath,
                      org.apache.hadoop.fs.Path outPath,
                      int numIterations)
               throws IOException,
                      InterruptedException,
                      ClassNotFoundException
Iterate over the data for a given number of iterations, starting from a prior-trained ClusterClassifier, using a MapReduce implementation.

Parameters:
conf - the Configuration
inPath - a Path to input VectorWritables
priorPath - a Path to the prior classifier
outPath - a Path to the output directory
numIterations - the int number of iterations to perform
Throws:
IOException
InterruptedException
ClassNotFoundException
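
One common way to distribute a single clustering iteration is to compute per-cluster partial statistics on each input partition and then merge them. The plain-Java sketch below shows that general technique for a one-dimensional, k-means-style update. It is an assumption for illustration only: this page does not describe how iterateMR is implemented, and MergeSketch, mapPartition and reduce are hypothetical names.

```java
import java.util.Arrays;

final class MergeSketch {
    // "Map" side: per-cluster partial statistics for one partition of the
    // input, stored as {sum, count} for each cluster.
    static double[][] mapPartition(double[] partition, double[] centers) {
        double[][] stats = new double[centers.length][2];
        for (double x : partition) {
            int best = 0;
            for (int k = 1; k < centers.length; k++) {
                if (Math.abs(x - centers[k]) < Math.abs(x - centers[best])) {
                    best = k;
                }
            }
            stats[best][0] += x;   // running sum of points assigned to the cluster
            stats[best][1] += 1.0; // number of points assigned to the cluster
        }
        return stats;
    }

    // "Reduce" side: merge the partial statistics of all partitions and
    // compute the centers for the next iteration.
    static double[] reduce(double[][][] partials, double[] centers) {
        double[] next = centers.clone();
        for (int k = 0; k < centers.length; k++) {
            double sum = 0.0;
            double count = 0.0;
            for (double[][] partial : partials) {
                sum += partial[k][0];
                count += partial[k][1];
            }
            if (count > 0) {
                next[k] = sum / count; // empty clusters keep their old center
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[] centers = {0.0, 5.0};
        double[][][] partials = {
            mapPartition(new double[] {1.0, 9.0}, centers),
            mapPartition(new double[] {2.0, 10.0}, centers)
        };
        System.out.println(Arrays.toString(reduce(partials, centers))); // prints [1.5, 9.5]
    }
}
```

Because sums and counts are additive, merging the partial statistics gives the same centers as a single pass over all four points, regardless of how the input is split.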


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.