org.apache.mahout.clustering.lda
Class LDADriver

java.lang.Object
  extended by org.apache.mahout.clustering.lda.LDADriver

public final class LDADriver
extends java.lang.Object

Estimates an LDA (Latent Dirichlet Allocation) model from a corpus of documents, each represented as a SparseVector of word counts. At each phase, it outputs a matrix of log probabilities for each topic.


Method Summary
static void main(java.lang.String[] args)
           
static double runIteration(java.lang.String input, java.lang.String stateIn, java.lang.String stateOut, int numTopics, int numWords, double topicSmoothing, int numReducers)
          Run a single iteration using the supplied arguments
static void runJob(java.lang.String input, java.lang.String output, int numTopics, int numWords, double topicSmoothing, int maxIterations, int numReducers)
          Run the job using supplied arguments
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(java.lang.String[] args)
                 throws java.lang.ClassNotFoundException,
                        java.io.IOException,
                        java.lang.InterruptedException
Throws:
java.lang.ClassNotFoundException
java.io.IOException
java.lang.InterruptedException

runJob

public static void runJob(java.lang.String input,
                          java.lang.String output,
                          int numTopics,
                          int numWords,
                          double topicSmoothing,
                          int maxIterations,
                          int numReducers)
                   throws java.io.IOException,
                          java.lang.InterruptedException,
                          java.lang.ClassNotFoundException
Run the job using supplied arguments

Parameters:
input - the directory pathname for the input documents
output - the directory pathname for the output
numTopics - the number of topics
numWords - the number of words
topicSmoothing - the pseudocount for each topic, typically small (less than 0.5)
maxIterations - the maximum number of iterations
numReducers - the number of Reducers desired
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
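
To make the notion of a pseudocount concrete, the following minimal sketch shows generic additive smoothing: the same small value is added to every count before normalizing, so that zero counts still yield a finite log probability. This page does not state how LDADriver applies topicSmoothing internally, so the class name PseudocountSketch, the method smoothedLogProbabilities, and the sample counts are illustrative assumptions, not part of the Mahout API.

```java
// Illustrative only: generic additive (pseudocount) smoothing.
// Not taken from the Mahout source; all names here are hypothetical.
class PseudocountSketch {

    /**
     * Adds the same pseudocount to every entry, normalizes, and
     * returns the natural log of each resulting probability.
     */
    public static double[] smoothedLogProbabilities(double[] counts, double pseudocount) {
        double total = 0.0;
        for (double c : counts) {
            total += c + pseudocount;
        }
        double[] logProbs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            logProbs[i] = Math.log((counts[i] + pseudocount) / total);
        }
        return logProbs;
    }

    public static void main(String[] args) {
        // Assumed sample data: the third entry has a raw count of zero.
        double[] counts = {8.0, 2.0, 0.0};
        double[] logProbs = smoothedLogProbabilities(counts, 0.1);
        for (double lp : logProbs) {
            System.out.println(lp);
        }
    }
}
```

With a pseudocount of zero, the third entry would have log probability negative infinity; a small positive value keeps every entry finite while leaving the ordering of the counts unchanged.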

runIteration

public static double runIteration(java.lang.String input,
                                  java.lang.String stateIn,
                                  java.lang.String stateOut,
                                  int numTopics,
                                  int numWords,
                                  double topicSmoothing,
                                  int numReducers)
                           throws java.io.IOException,
                                  java.lang.InterruptedException,
                                  java.lang.ClassNotFoundException
Run a single iteration using the supplied arguments

Parameters:
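input - the directory pathname for the input documents
stateIn - the directory pathname for input state
stateOut - the directory pathname for output state
numTopics - the number of topics
numWords - the number of words
topicSmoothing - the pseudocount for each topic, typically small (less than 0.5)
numReducers - the number of Reducers desired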
Throws:
java.io.IOException
java.lang.InterruptedException
java.lang.ClassNotFoundException
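
The signatures suggest that a driver could call an iteration step repeatedly, passing each iteration's output state as the next iteration's input state, up to maxIterations. The sketch below illustrates that chaining under stated assumptions: the IterationStep interface is a hypothetical stand-in for a call such as runIteration, and the "state-N" directory naming is invented for illustration. It is not the Mahout implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: chaining state directories across iterations.
// IterationStep and the "state-N" naming are hypothetical.
class IterationLoopSketch {

    /**
     * Stand-in for one iteration. A real call such as
     * LDADriver.runIteration also declares checked exceptions,
     * which a caller would need to handle.
     */
    interface IterationStep {
        double run(String stateIn, String stateOut);
    }

    /**
     * Runs exactly maxIterations steps, feeding each output state
     * directory into the next step, and returns the directories written.
     */
    public static List<String> runLoop(String output, int maxIterations, IterationStep step) {
        List<String> written = new ArrayList<>();
        String stateIn = output + "/state-0"; // assumed initial state location
        for (int i = 1; i <= maxIterations; i++) {
            String stateOut = output + "/state-" + i;
            double result = step.run(stateIn, stateOut);
            System.out.println("iteration " + i + " returned " + result);
            written.add(stateOut);
            stateIn = stateOut; // this output becomes the next input
        }
        return written;
    }

    public static void main(String[] args) {
        // Dummy step that does no work and returns a placeholder value.
        List<String> dirs = runLoop("out", 3, (in, out) -> 0.0);
        System.out.println(dirs);
    }
}
```

Keeping each iteration's state in its own directory means an earlier state stays available for inspection after later iterations complete.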


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.