org.apache.mahout.math.hadoop.decomposer
Class DistributedLanczosSolver

java.lang.Object
  extended by org.apache.mahout.math.decomposer.lanczos.LanczosSolver
      extended by org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class DistributedLanczosSolver
extends LanczosSolver
implements org.apache.hadoop.util.Tool


Nested Class Summary
 class DistributedLanczosSolver.DistributedLanczosSolverJob
          Inner subclass of AbstractJob so we get access to AbstractJob's functionality w.r.t.
 
Nested classes/interfaces inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
LanczosSolver.TimingSection
 
Field Summary
 
Fields inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
SAFE_MAX, scaleFactor
 
Constructor Summary
DistributedLanczosSolver()
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
protected  Vector getInitialVector(VectorIterable corpus)
          For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be uniform over all input dimensions, L_2 normalized.
 DistributedLanczosSolver.DistributedLanczosSolverJob job()
           
static void main(java.lang.String[] args)
           
 int run(java.lang.String[] strings)
           
 void serializeOutput(Matrix eigenVectors, java.util.List<java.lang.Double> eigenValues, java.lang.String outputPath)
          TODO: this should be refactored so that both LanczosSolver implementations can serialize output in a generic way.
 void setConf(org.apache.hadoop.conf.Configuration configuration)
           
 
Methods inherited from class org.apache.mahout.math.decomposer.lanczos.LanczosSolver
calculateScaleFactor, getTimeMillis, solve, solve
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistributedLanczosSolver

public DistributedLanczosSolver()
Method Detail

getInitialVector

protected Vector getInitialVector(VectorIterable corpus)
For the distributed case, the best guess at a useful initialization state for Lanczos is chosen to be uniform over all input dimensions, L_2 normalized.

Overrides:
getInitialVector in class LanczosSolver
Parameters:
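corpus - The input corpus; its input dimensions determine the size of the initial vector
Returns:
The initial vector, uniform over all input dimensions and L_2 normalized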
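
The description above amounts to a vector whose n entries all equal 1/sqrt(n), so that its L_2 norm is 1. The following is a minimal, standalone sketch of that arithmetic only. It deliberately uses plain double arrays rather than Mahout's Vector and VectorIterable types, and the class name UniformInitialVector, the methods uniformUnitVector and l2Norm, and the parameter n (standing in for the number of input dimensions of the corpus) are hypothetical names chosen for illustration, not part of this class.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Mahout.
final class UniformInitialVector {

    // Builds a vector of length n whose entries are all 1/sqrt(n).
    // n is assumed to be the number of input dimensions.
    static double[] uniformUnitVector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        return v;
    }

    // Computes the L_2 (Euclidean) norm of v.
    static double l2Norm(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] v = uniformUnitVector(4);
        System.out.println(Arrays.toString(v)); // each entry is 0.5
        System.out.println(l2Norm(v));          // norm of the uniform vector
    }
}
```

Because every entry is identical, the sum of squares is n * (1/n) = 1, which is why a single scaling by 1/sqrt(n) is enough and no separate normalization pass is needed.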

run

public int run(java.lang.String[] strings)
        throws java.lang.Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
java.lang.Exception

serializeOutput

public void serializeOutput(Matrix eigenVectors,
                            java.util.List<java.lang.Double> eigenValues,
                            java.lang.String outputPath)
                     throws java.io.IOException
TODO: this should be refactored so that both LanczosSolver implementations can serialize output in a generic way.

Parameters:
eigenVectors - The eigenvectors to be serialized
eigenValues - The eigenvalues to be serialized
outputPath - The path (relative to the current Configuration's FileSystem) to save the output to.
Throws:
java.io.IOException

setConf

public void setConf(org.apache.hadoop.conf.Configuration configuration)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

job

public DistributedLanczosSolver.DistributedLanczosSolverJob job()

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.