org.apache.mahout.clustering.lda.cvb
Class InMemoryCollapsedVariationalBayes0
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.clustering.lda.cvb.InMemoryCollapsedVariationalBayes0
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class InMemoryCollapsedVariationalBayes0
- extends AbstractJob
Runs the same algorithm as CVB0Driver, but sequentially and in memory. The current memory requirements are:
- the entire corpus, which is read into RAM;
- two copies of the model, each of size numTerms * numTopics;
- one more matrix of size numDocs * numTopics, holding p(topic|doc) for every document.
If all of this fits in memory, this class can be significantly faster than running CVB0Driver as an iterative MapReduce job.
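The following rough check estimates whether the in-memory structures described above will fit by multiplying out their sizes. It is a sketch only. It assumes dense matrices of 8-byte doubles, and it ignores both the corpus (whose footprint depends on how sparse it is) and JVM object overhead. The class and method names are hypothetical and not part of Mahout.

```java
// Back-of-the-envelope memory estimate for the in-memory CVB0 job.
// Assumption: every model/doc-topic matrix is a dense array of 8-byte doubles;
// Mahout's actual Matrix implementations may use more or less memory.
public final class Cvb0MemoryEstimate {

    private static final long BYTES_PER_DOUBLE = 8L;

    /** Two model copies, each numTerms x numTopics. */
    static long modelBytes(long numTerms, long numTopics) {
        return 2L * numTerms * numTopics * BYTES_PER_DOUBLE;
    }

    /** One numDocs x numTopics matrix holding p(topic|doc). */
    static long docTopicBytes(long numDocs, long numTopics) {
        return numDocs * numTopics * BYTES_PER_DOUBLE;
    }

    /** Total excluding the corpus, whose size depends on its (sparse) representation. */
    static long totalExcludingCorpus(long numTerms, long numDocs, long numTopics) {
        return modelBytes(numTerms, numTopics) + docTopicBytes(numDocs, numTopics);
    }

    public static void main(String[] args) {
        long numTerms = 100_000, numDocs = 1_000_000, numTopics = 100;
        long bytes = totalExcludingCorpus(numTerms, numDocs, numTopics);
        System.out.printf("~%.1f MiB plus the corpus%n", bytes / (1024.0 * 1024.0));
    }
}
```

For 100,000 terms, 1,000,000 documents and 100 topics this gives 960,000,000 bytes (about 915.5 MiB) before the corpus is counted. The numDocs * numTopics matrix accounts for most of that total.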
Constructor Summary
InMemoryCollapsedVariationalBayes0(Matrix corpus, String[] terms, int numTopics, double alpha, double eta)
InMemoryCollapsedVariationalBayes0(Matrix corpus, String[] terms, int numTopics, double alpha, double eta, int numTrainingThreads, int numUpdatingThreads, double modelCorpusFraction, long seed)
Methods inherited from class org.apache.mahout.common.AbstractJob
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, buildOption, getAnalyzerClassFromOption, getCLIOption, getCombinedTempPath, getGroup, getInputPath, getOption, getOption, getOutputPath, getOutputPath, getTempPath, getTempPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, prepareJob, prepareJob, setS3SafeCombinedInputPath, shouldRunNextPhase
Methods inherited from class org.apache.hadoop.conf.Configured
setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
setConf
Constructor Detail
InMemoryCollapsedVariationalBayes0
public InMemoryCollapsedVariationalBayes0(Matrix corpus,
String[] terms,
int numTopics,
double alpha,
double eta)
InMemoryCollapsedVariationalBayes0
public InMemoryCollapsedVariationalBayes0(Matrix corpus,
String[] terms,
int numTopics,
double alpha,
double eta,
int numTrainingThreads,
int numUpdatingThreads,
double modelCorpusFraction,
long seed)
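In standard LDA notation, alpha smooths each document's topic distribution and eta smooths each topic's term distribution. The sketch below assumes the alpha and eta arguments above follow that convention. It illustrates where the two parameters enter the textbook CVB0 update for a single (document, term) pair, in a simplified form that does not subtract the pair's own contribution from the expected counts. It is not taken from Mahout's source, and the class, method and parameter names are hypothetical.

```java
import java.util.Arrays;

// Sketch of a simplified CVB0-style update for one (document, term) pair.
// Assumptions: dense arrays of expected counts; the current token's own
// contribution is NOT excluded. This is an illustration, not Mahout's code.
public final class Cvb0UpdateSketch {

    /**
     * Returns the normalized topic distribution for term w in document d.
     *
     * @param topicTermCounts expected count of term w under each topic (length K)
     * @param topicTotals     expected total count per topic over all terms (length K)
     * @param docTopicCounts  expected count of each topic in document d (length K)
     * @param numTerms        vocabulary size W
     * @param alpha           document-topic smoothing
     * @param eta             topic-term smoothing
     */
    static double[] topicPosterior(double[] topicTermCounts, double[] topicTotals,
                                   double[] docTopicCounts, int numTerms,
                                   double alpha, double eta) {
        int k = topicTermCounts.length;
        double[] gamma = new double[k];
        double norm = 0.0;
        for (int t = 0; t < k; t++) {
            // Smoothed p(w | t) times smoothed weight of topic t in document d (unnormalized).
            gamma[t] = (topicTermCounts[t] + eta) / (topicTotals[t] + numTerms * eta)
                     * (docTopicCounts[t] + alpha);
            norm += gamma[t];
        }
        for (int t = 0; t < k; t++) {
            gamma[t] /= norm; // normalize so the topic weights sum to 1
        }
        return gamma;
    }

    public static void main(String[] args) {
        double[] gamma = topicPosterior(
            new double[] {3.0, 1.0}, new double[] {10.0, 10.0},
            new double[] {2.0, 2.0}, 5, 0.1, 0.1);
        System.out.println(Arrays.toString(gamma));
    }
}
```

In the example, both topics have the same totals and the same weight in the document, so the term's higher count under topic 0 alone tilts its posterior toward topic 0. Larger alpha or eta values pull the result toward uniform.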
Method Detail
setVerbose
public void setVerbose(boolean verbose)
trainDocuments
public void trainDocuments()
trainDocuments
public void trainDocuments(double testFraction)
iterateUntilConvergence
public double iterateUntilConvergence(double minFractionalErrorChange,
int maxIterations,
int minIter)
iterateUntilConvergence
public double iterateUntilConvergence(double minFractionalErrorChange,
int maxIterations,
int minIter,
double testFraction)
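The parameter names of iterateUntilConvergence suggest a common stopping rule: run at least minIter and at most maxIterations passes, and stop once the fractional change in error between consecutive passes falls below minFractionalErrorChange. This page documents neither the exact rule nor what the returned double measures. The sketch below is therefore a hypothetical illustration of that rule only. It takes its error values from a caller-supplied function rather than from a real model, and it assumes the return value is the last computed error.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical stopping rule shaped like
// iterateUntilConvergence(minFractionalErrorChange, maxIterations, minIter).
// The error source and the exact rule are assumptions, not Mahout's code.
public final class ConvergenceSketch {

    /**
     * Runs iterations 1..maxIterations and stops early once at least minIter
     * iterations have completed and the relative change in error between two
     * consecutive iterations is below minFractionalErrorChange.
     * Returns the last computed error.
     */
    static double iterate(IntToDoubleFunction errorAfterIteration,
                          double minFractionalErrorChange,
                          int maxIterations, int minIter) {
        double previous = Double.NaN;
        double current = Double.NaN;
        for (int iter = 1; iter <= maxIterations; iter++) {
            current = errorAfterIteration.applyAsDouble(iter);
            // Only test for convergence once a previous error exists and minIter is reached.
            if (iter > 1 && iter >= minIter) {
                double fractionalChange = Math.abs(previous - current) / Math.abs(previous);
                if (fractionalChange < minFractionalErrorChange) {
                    break;
                }
            }
            previous = current;
        }
        return current;
    }

    public static void main(String[] args) {
        double finalError = iterate(i -> 100.0 / i, 0.095, 50, 1);
        System.out.println("final error: " + finalError);
    }
}
```

With an error sequence of 100/i, the fractional change at iteration i is 1/i, so a threshold of 0.095 stops the loop at iteration 11. Raising minIter or lowering maxIterations overrides that point in either direction.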
writeModel
public void writeModel(org.apache.hadoop.fs.Path outputPath)
throws IOException
- Throws:
IOException
main2
public static int main2(String[] args,
org.apache.hadoop.conf.Configuration conf)
throws Exception
- Throws:
Exception
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
- Overrides:
getConf
in class org.apache.hadoop.conf.Configured
run
public int run(String[] strings)
throws Exception
- Throws:
Exception
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.