org.apache.mahout.math.hadoop.decomposer
Class DistributedLanczosSolver
java.lang.Object
org.apache.mahout.math.decomposer.lanczos.LanczosSolver
org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class DistributedLanczosSolver
- extends LanczosSolver
- implements org.apache.hadoop.util.Tool
Method Summary |
org.apache.hadoop.conf.Configuration |
getConf()
|
protected Vector |
getInitialVector(VectorIterable corpus)
For the distributed case, the best guess at a useful initial state for Lanczos is chosen to be
uniform over all input dimensions, L_2 normalized. |
DistributedLanczosSolver.DistributedLanczosSolverJob |
job()
|
static void |
main(java.lang.String[] args)
|
int |
run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank)
Run the solver to produce the raw eigenvectors |
int |
run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
double maxError,
double minEigenvalue,
boolean inMemory)
Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them |
int |
run(java.lang.String[] strings)
|
void |
runJob(org.apache.hadoop.conf.Configuration originalConfig,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
Matrix eigenVectors,
java.util.List<java.lang.Double> eigenValues,
java.lang.String outputEigenVectorPathString)
Factored-out invocation of the LanczosSolver, so that it can be called programmatically |
void |
serializeOutput(Matrix eigenVectors,
java.util.List<java.lang.Double> eigenValues,
org.apache.hadoop.fs.Path outputPath)
|
void |
setConf(org.apache.hadoop.conf.Configuration configuration)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
RAW_EIGENVECTORS
public static final java.lang.String RAW_EIGENVECTORS
- See Also:
- Constant Field Values
DistributedLanczosSolver
public DistributedLanczosSolver()
getInitialVector
protected Vector getInitialVector(VectorIterable corpus)
- For the distributed case, the best guess at a useful initial state for Lanczos is chosen to be
uniform over all input dimensions, L_2 normalized.
- Overrides:
getInitialVector
in class LanczosSolver
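As an illustration of what "uniform over all input dimensions, L_2 normalized" means, the sketch below builds such a vector with plain Java arrays. It is not the Mahout implementation: the class name UniformInitialVector and the helpers uniformUnitVector and l2Norm are hypothetical, and a double[] stands in for Mahout's Vector type.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; does not use or reproduce Mahout classes.
final class UniformInitialVector {

    // Returns a vector of length numCols whose entries are all equal
    // and whose Euclidean (L_2) norm is 1, i.e. every entry is 1/sqrt(numCols).
    public static double[] uniformUnitVector(int numCols) {
        if (numCols <= 0) {
            throw new IllegalArgumentException("numCols must be positive");
        }
        double[] v = new double[numCols];
        Arrays.fill(v, 1.0 / Math.sqrt(numCols));
        return v;
    }

    // Euclidean norm, used here only to check the normalization.
    public static double l2Norm(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] v = uniformUnitVector(4);
        System.out.println(Arrays.toString(v)); // prints [0.5, 0.5, 0.5, 0.5]
        System.out.println(l2Norm(v));          // prints 1.0
    }
}
```

A uniform start gives every input dimension the same weight, which is a reasonable default when nothing is known about the corpus in advance.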
runJob
public void runJob(org.apache.hadoop.conf.Configuration originalConfig,
org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
Matrix eigenVectors,
java.util.List<java.lang.Double> eigenValues,
java.lang.String outputEigenVectorPathString)
throws java.io.IOException
- Factored-out invocation of the LanczosSolver, so that it can be called programmatically
- Throws:
java.io.IOException
run
public int run(java.lang.String[] strings)
throws java.lang.Exception
- Specified by:
run
in interface org.apache.hadoop.util.Tool
- Throws:
java.lang.Exception
run
public int run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank,
double maxError,
double minEigenvalue,
boolean inMemory)
throws java.lang.Exception
- Run the solver to produce raw eigenvectors, then run the EigenVerificationJob to clean them
- Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
maxError - the maximum allowable error
minEigenvalue - the minimum usable eigenvalue
inMemory - true if the verification can be done in memory
- Returns:
- an int indicating success (0) or otherwise
- Throws:
java.lang.Exception
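The page does not say how EigenVerificationJob "cleans" the raw eigenvectors, so the following is only a plausible sketch of how maxError and minEigenvalue could act as filters. It assumes the error of a candidate v is 1 - |cos(angle between A*v and v)| and that the eigenvalue is estimated by the Rayleigh quotient; both choices, the class name EigenCleaningSketch and all helper names are hypothetical, and plain arrays stand in for Mahout's Matrix and Vector types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the EigenVerificationJob implementation.
final class EigenCleaningSketch {

    public static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Dense matrix-vector product A*v.
    public static double[] multiply(double[][] a, double[] v) {
        double[] result = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = dot(a[i], v);
        }
        return result;
    }

    // Assumed error measure: 0 when A*v is parallel to v, larger otherwise.
    public static double error(double[][] a, double[] v) {
        double[] av = multiply(a, v);
        double cos = dot(av, v) / Math.sqrt(dot(av, av) * dot(v, v));
        return 1.0 - Math.abs(cos);
    }

    // Assumed eigenvalue estimate: Rayleigh quotient (v . A*v) / (v . v).
    public static double eigenvalue(double[][] a, double[] v) {
        return dot(multiply(a, v), v) / dot(v, v);
    }

    // Keeps candidates whose error is below maxError and whose estimated
    // eigenvalue exceeds minEigenvalue. A NaN error (A*v == 0) fails the test.
    public static List<double[]> clean(double[][] a, List<double[]> candidates,
                                       double maxError, double minEigenvalue) {
        List<double[]> kept = new ArrayList<>();
        for (double[] v : candidates) {
            if (error(a, v) < maxError && eigenvalue(a, v) > minEigenvalue) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] a = {{2.0, 0.0}, {0.0, 0.5}};
        List<double[]> candidates = new ArrayList<>();
        candidates.add(new double[] {1.0, 0.0}); // exact eigenvector, eigenvalue 2
        candidates.add(new double[] {0.0, 1.0}); // exact eigenvector, eigenvalue 0.5
        candidates.add(new double[] {1.0, 1.0}); // not an eigenvector
        System.out.println(clean(a, candidates, 0.05, 1.0).size()); // prints 1
    }
}
```

In this toy run only the first candidate survives: the second is rejected by minEigenvalue and the third by maxError.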
run
public int run(org.apache.hadoop.fs.Path inputPath,
org.apache.hadoop.fs.Path outputPath,
org.apache.hadoop.fs.Path outputTmpPath,
int numRows,
int numCols,
boolean isSymmetric,
int desiredRank)
throws java.lang.Exception
- Run the solver to produce the raw eigenvectors
- Parameters:
inputPath - the Path to the input corpus
outputPath - the Path to the output
outputTmpPath - a Path to a temporary working directory
numRows - the int number of rows
numCols - the int number of columns
isSymmetric - true if the input matrix is symmetric
desiredRank - the int desired rank of eigenvectors to produce
- Returns:
- an int indicating success (0) or otherwise
- Throws:
java.lang.Exception
serializeOutput
public void serializeOutput(Matrix eigenVectors,
java.util.List<java.lang.Double> eigenValues,
org.apache.hadoop.fs.Path outputPath)
throws java.io.IOException
- Parameters:
eigenVectors - The eigenvectors to be serialized
eigenValues - The eigenvalues to be serialized
outputPath - The path (relative to the current Configuration's FileSystem) to save the output to.
- Throws:
java.io.IOException
setConf
public void setConf(org.apache.hadoop.conf.Configuration configuration)
- Specified by:
setConf
in interface org.apache.hadoop.conf.Configurable
getConf
public org.apache.hadoop.conf.Configuration getConf()
- Specified by:
getConf
in interface org.apache.hadoop.conf.Configurable
job
public DistributedLanczosSolver.DistributedLanczosSolverJob job()
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.