org.apache.mahout.math.hadoop.decomposer
Class EigenVerificationJob
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.mahout.common.AbstractJob
org.apache.mahout.math.hadoop.decomposer.EigenVerificationJob
- All Implemented Interfaces:
- org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class EigenVerificationJob
- extends AbstractJob
Takes the output of an eigendecomposition (specified as a Path location) and verifies its correctness
in the following sense: given a vector e and a matrix m, let e' = m.timesSquared(e); the error
w.r.t. eigenvector-ness is measured by the cosine of the angle between e and e':
error(e,e') = e.dot(e') / (e.norm(2)*e'.norm(2))
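The formula above can be tried out on plain arrays. The following is a minimal sketch, not the job's actual implementation: the class name EigenErrorSketch and its helper methods are hypothetical, and it assumes that timesSquared(e) means m^T (m e), so that an exact eigenvector of m^T m yields a cosine of 1.

```java
final class EigenErrorSketch {
    /** Computes m^T (m v) for a dense row-major matrix (assumed meaning of timesSquared). */
    static double[] timesSquared(double[][] m, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : m) {
            double rowDot = 0.0;
            for (int j = 0; j < v.length; j++) {
                rowDot += row[j] * v[j];
            }
            // Accumulate rowDot * row, i.e. one term of m^T (m v).
            for (int j = 0; j < v.length; j++) {
                result[j] += rowDot * row[j];
            }
        }
        return result;
    }

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Cosine of the angle between e and ePrime: e.e' / (|e| * |e'|). */
    static double cosine(double[] e, double[] ePrime) {
        return dot(e, ePrime) / (Math.sqrt(dot(e, e)) * Math.sqrt(dot(ePrime, ePrime)));
    }

    public static void main(String[] args) {
        double[][] m = {{2.0, 0.0}, {0.0, 1.0}};
        double[] eigen = {1.0, 0.0};     // exact eigenvector of m^T m
        double[] notEigen = {1.0, 1.0};  // not an eigenvector of m^T m
        System.out.println(cosine(eigen, timesSquared(m, eigen)));
        System.out.println(cosine(notEigen, timesSquared(m, notEigen)));
    }
}
```

In this toy case the exact eigenvector gives a cosine of 1, while the second vector gives a value strictly below 1, which is the kind of deviation the verification is meant to detect.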
A set of eigenvectors should also be very close to orthogonal, so this job computes all inner products
between eigenvectors and checks that the resulting matrix is close to the identity matrix.
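The orthogonality check can be illustrated in the same spirit. This sketch is an assumption about how such a check could look, not the job's code: the class OrthogonalitySketch and the method maxDeviationFromIdentity are hypothetical names, and the eigenvectors are assumed to be normalized to unit length so that the diagonal of the inner-product matrix should be 1.

```java
final class OrthogonalitySketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Returns the largest absolute difference between the matrix of all
     * pairwise inner products and the identity matrix.
     */
    static double maxDeviationFromIdentity(double[][] eigens) {
        double worst = 0.0;
        for (int i = 0; i < eigens.length; i++) {
            for (int j = 0; j < eigens.length; j++) {
                double expected = (i == j) ? 1.0 : 0.0;
                double deviation = Math.abs(dot(eigens[i], eigens[j]) - expected);
                worst = Math.max(worst, deviation);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] orthonormal = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] skewed = {{1.0, 0.0}, {0.6, 0.8}};
        System.out.println(maxDeviationFromIdentity(orthonormal));
        System.out.println(maxDeviationFromIdentity(skewed));
    }
}
```

A caller would compare the returned deviation against a small tolerance of its own choosing; the page does not state which tolerance the job applies.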
Parameters used in the cleanup (other than the input/output path options) include --minEigenvalue, which
specifies the value below which eigenvector/eigenvalue pairs will be discarded, and --maxError, which specifies
the maximum error (as defined above) to be tolerated in an eigenvector.
If all the eigenvectors fit in memory, --inMemory lets the task complete faster by holding them in memory.
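The effect of the two cleanup thresholds can be sketched as a simple filter. Everything here is illustrative: EigenCleanupSketch, Candidate, and clean are hypothetical names, and the sketch assumes the error is expressed so that smaller values are better (for example 1 - |cosine|), which this page does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

final class EigenCleanupSketch {
    /** Hypothetical holder for one eigenvalue and the measured error of its eigenvector. */
    static final class Candidate {
        final double eigenvalue;
        final double error;

        Candidate(double eigenvalue, double error) {
            this.eigenvalue = eigenvalue;
            this.error = error;
        }
    }

    /** Keeps pairs whose eigenvalue is at least minEigenvalue and whose error is at most maxError. */
    static List<Candidate> clean(List<Candidate> candidates, double minEigenvalue, double maxError) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.eigenvalue >= minEigenvalue && c.error <= maxError) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate(5.0, 0.01));    // kept
        candidates.add(new Candidate(0.001, 0.01));  // eigenvalue too small
        candidates.add(new Candidate(3.0, 0.5));     // error too large
        System.out.println(clean(candidates, 0.1, 0.05).size());
    }
}
```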
Method Summary |
org.apache.hadoop.fs.Path |
getCleanedEigensPath()
|
static void |
main(java.lang.String[] args)
|
int |
run(org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path output,
org.apache.hadoop.fs.Path tempOut,
double maxError,
double minEigenValue,
boolean inMemory,
org.apache.hadoop.mapred.JobConf config)
Run the job with the given arguments |
int |
run(java.lang.String[] args)
|
void |
runJob(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path output,
boolean inMemory,
double maxError,
double minEigenValue,
int maxEigens)
Programmatic invocation of run() |
void |
setEigensToVerify(VectorIterable eigens)
|
Methods inherited from class org.apache.mahout.common.AbstractJob |
addFlag, addInputOption, addOption, addOption, addOption, addOption, addOutputOption, getInputPath, getOption, getOutputPath, hasOption, keyFor, maybePut, parseArguments, parseDirectories, prepareJob, shouldRunNextPhase |
Methods inherited from class org.apache.hadoop.conf.Configured |
getConf, setConf |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.hadoop.conf.Configurable |
getConf, setConf |
CLEAN_EIGENVECTORS
public static final java.lang.String CLEAN_EIGENVECTORS
- See Also:
- Constant Field Values
EigenVerificationJob
public EigenVerificationJob()
setEigensToVerify
public void setEigensToVerify(VectorIterable eigens)
run
public int run(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
run
public int run(org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path output,
org.apache.hadoop.fs.Path tempOut,
double maxError,
double minEigenValue,
boolean inMemory,
org.apache.hadoop.mapred.JobConf config)
throws java.io.IOException
- Run the job with the given arguments
- Parameters:
corpusInput - the corpus input Path
eigenInput - the eigenvector input Path
output - the output Path
tempOut - temporary output Path
maxError - a double representing the maximum error
minEigenValue - a double representing the minimum eigenvalue
inMemory - a boolean requesting in-memory preparation
config - the JobConf to use, or null if a default is ok (saves referencing JobConf in calling classes unless needed)
- Throws:
java.io.IOException
getCleanedEigensPath
public org.apache.hadoop.fs.Path getCleanedEigensPath()
main
public static void main(java.lang.String[] args)
throws java.lang.Exception
- Throws:
java.lang.Exception
runJob
public void runJob(org.apache.hadoop.conf.Configuration conf,
org.apache.hadoop.fs.Path eigenInput,
org.apache.hadoop.fs.Path corpusInput,
org.apache.hadoop.fs.Path output,
boolean inMemory,
double maxError,
double minEigenValue,
int maxEigens)
throws java.io.IOException
- Programmatic invocation of run()
- Parameters:
conf - TODO
eigenInput - Output of LanczosSolver
corpusInput - Input of LanczosSolver
output -
inMemory -
maxError -
minEigenValue -
maxEigens -
- Throws:
java.io.IOException
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.