org.apache.mahout.clustering.spectral.eigencuts
Class EigencutsSensitivityJob

java.lang.Object
  extended by org.apache.mahout.clustering.spectral.eigencuts.EigencutsSensitivityJob

public final class EigencutsSensitivityJob
extends java.lang.Object

There are quite a few operations bundled within this mapper. Gather 'round and listen, all of ye.

The input to this job is eight items:

  1. B0, which is a command-line parameter fed through the Configuration object
  2. diagonal matrix, a constant vector fed through the Hadoop cache
  3. list of eigenvalues, a constant vector fed through the Hadoop cache
  4. eigenvector, the input value to the mapper
  5. epsilon
  6. delta
  7. tau
  8. output, the Path to the output matrix of sensitivities

The first three items are constant and are used in all of the map tasks. The row index indicates which eigenvalue from the list to use, and also serves as the output identifier. The diagonal matrix and the eigenvector are of equal length n and are iterated over in a nested loop within each map task, which unfortunately gives each task a runtime of O(n^2). This is unavoidable.

For each (i, j) combination of elements within the eigenvector, a complex equation is run that explicitly computes the sensitivity to perturbation of the flow of probability within the specific edge of the graph. Each sensitivity, as it is computed, is simultaneously applied to a non-maximal suppression step: for a given sensitivity S_ij, it must be suppressed if any other S_in or S_mj has a more negative value. Thus, only the most negative S_ij within its row i or its column j is stored in the return array, leading to an output (per eigenvector!) with maximum length n, minimum length 1.

Overall, this creates an n-by-n (possibly sparse) matrix with a maximum of n^2 non-zero elements, minimum of n non-zero elements.
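
The non-maximal suppression step described above can be illustrated with a minimal, self-contained sketch. It is not the code used by this job: the class name SuppressionSketch and the method suppress are hypothetical, the sensitivities are assumed to be already computed into a dense array (the sensitivity equation itself is not reproduced here), and the rule is read as "keep S_ij only if nothing else in row i or column j is strictly more negative".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
final class SuppressionSketch {

  /**
   * Returns the (i, j) index pairs that survive suppression.
   * Assumption: s is a square n-by-n array of precomputed sensitivities.
   * An entry survives only if no other entry in its row or in its column
   * is strictly more negative; exact ties are all kept.
   */
  static List<int[]> suppress(double[][] s) {
    int n = s.length;
    double[] rowMin = new double[n];
    double[] colMin = new double[n];
    Arrays.fill(rowMin, Double.POSITIVE_INFINITY);
    Arrays.fill(colMin, Double.POSITIVE_INFINITY);

    // First pass over all (i, j): record the most negative value per row and per column.
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        rowMin[i] = Math.min(rowMin[i], s[i][j]);
        colMin[j] = Math.min(colMin[j], s[i][j]);
      }
    }

    // Second pass: keep an entry only if it is the minimum of both its row and its column.
    List<int[]> kept = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        if (s[i][j] <= rowMin[i] && s[i][j] <= colMin[j]) {
          kept.add(new int[] {i, j});
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[][] s = {
      {-1.0, -5.0,  0.0},
      {-2.0, -0.5, -3.0},
      {-0.1, -4.0, -0.2}
    };
    for (int[] ij : suppress(s)) {
      System.out.println("kept S_" + ij[0] + "," + ij[1] + " = " + s[ij[0]][ij[1]]);
    }
  }
}
```

With the sample values in main, the entries at (0, 1) and (1, 2) survive, while (2, 1) is suppressed because column 1 contains a more negative value. Both passes visit every (i, j) pair, which matches the quadratic per-task cost noted above.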


Method Summary
static void runJob(Vector eigenvalues, Vector diagonal, org.apache.hadoop.fs.Path eigenvectors, double beta, double tau, double delta, double epsilon, org.apache.hadoop.fs.Path output)
          Initializes the configuration tasks, loads the needed data into the HDFS cache, and executes the job.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

runJob

public static void runJob(Vector eigenvalues,
                          Vector diagonal,
                          org.apache.hadoop.fs.Path eigenvectors,
                          double beta,
                          double tau,
                          double delta,
                          double epsilon,
                          org.apache.hadoop.fs.Path output)
                   throws java.io.IOException,
                          java.lang.ClassNotFoundException,
                          java.lang.InterruptedException
Initializes the configuration tasks, loads the needed data into the HDFS cache, and executes the job.

Parameters:
eigenvalues - Vector of eigenvalues
diagonal - Vector representing the diagonal matrix
eigenvectors - Path to the DRM of eigenvectors
output - Path to the output matrix (will have between n and n^2 non-zero elements)
Throws:
java.io.IOException
java.lang.ClassNotFoundException
java.lang.InterruptedException


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.