org.apache.mahout.math.hadoop
Class DistributedRowMatrix

java.lang.Object
  extended by org.apache.mahout.math.hadoop.DistributedRowMatrix
All Implemented Interfaces:
java.lang.Iterable<MatrixSlice>, org.apache.hadoop.mapred.JobConfigurable, VectorIterable

public class DistributedRowMatrix
extends java.lang.Object
implements VectorIterable, org.apache.hadoop.mapred.JobConfigurable

DistributedRowMatrix is a FileSystem-backed VectorIterable in which the vectors live in a SequenceFile, and distributed operations are executed as M/R passes on Hadoop. The usage is as follows:

   // the input path must already contain a SequenceFile of row vectors
   DistributedRowMatrix m = new DistributedRowMatrix(new Path("path/to/vector/sequenceFile"), new Path("tmp/path"), 10000000, 250000);
   m.configure(new JobConf());
   // to call times() or timesSquared() with a vector, the vector's dimension must equal the
   // column dimension of this matrix (numCols, here 250000).
   Vector v = new DenseVector(250000);
   // the following operation runs as an M/R pass on Hadoop.
   Vector w = m.timesSquared(v);
 


Nested Class Summary
static class DistributedRowMatrix.DistributedMatrixIterator
           
static class DistributedRowMatrix.MatrixEntryWritable
           
 
Constructor Summary
DistributedRowMatrix(org.apache.hadoop.fs.Path inputPathString, org.apache.hadoop.fs.Path outputTmpPathString, int numRows, int numCols)
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf conf)
           
 org.apache.hadoop.fs.Path getOutputTempPath()
           
 org.apache.hadoop.fs.Path getRowPath()
           
 java.util.Iterator<MatrixSlice> iterateAll()
           
 java.util.Iterator<MatrixSlice> iterator()
           
 int numCols()
           
 int numRows()
           
 int numSlices()
           
 void setOutputTempPathString(java.lang.String outPathString)
           
 DistributedRowMatrix times(DistributedRowMatrix other)
          Computes the matrix product this.transpose().times(other).
 Vector times(Vector v)
           
 Vector timesSquared(Vector v)
           
 DistributedRowMatrix transpose()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DistributedRowMatrix

public DistributedRowMatrix(org.apache.hadoop.fs.Path inputPathString,
                            org.apache.hadoop.fs.Path outputTmpPathString,
                            int numRows,
                            int numCols)
Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf conf)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable

getRowPath

public org.apache.hadoop.fs.Path getRowPath()

getOutputTempPath

public org.apache.hadoop.fs.Path getOutputTempPath()

setOutputTempPathString

public void setOutputTempPathString(java.lang.String outPathString)

iterateAll

public java.util.Iterator<MatrixSlice> iterateAll()
Specified by:
iterateAll in interface VectorIterable

numSlices

public int numSlices()
Specified by:
numSlices in interface VectorIterable

numRows

public int numRows()
Specified by:
numRows in interface VectorIterable

numCols

public int numCols()
Specified by:
numCols in interface VectorIterable

times

public DistributedRowMatrix times(DistributedRowMatrix other)
                           throws java.io.IOException
Computes the matrix product this.transpose().times(other), not this.times(other). Both matrices must therefore have the same number of rows.

Parameters:
other - a DistributedRowMatrix
Returns:
a DistributedRowMatrix containing the product
Throws:
java.io.IOException
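
To make the this.transpose().times(other) semantics concrete, here is a minimal in-memory sketch using plain double[][] arrays. It is not Mahout's implementation and does not use Hadoop; the class name TransposeTimesSketch and the method transposeTimes are hypothetical, chosen only for illustration. It accumulates, row by row, the outer product of row r of the first matrix with row r of the second, which is one natural way to form a transpose-product when both operands are stored by row.

```java
import java.util.Arrays;

public class TransposeTimesSketch {

    /** Returns a^T * b for dense row-major arrays; a and b must have the same number of rows. */
    public static double[][] transposeTimes(double[][] a, double[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "row counts differ: " + a.length + " vs " + b.length);
        }
        int aCols = a.length == 0 ? 0 : a[0].length;
        int bCols = b.length == 0 ? 0 : b[0].length;
        double[][] result = new double[aCols][bCols];
        // Add the outer product of row r of a with row r of b into the result.
        for (int r = 0; r < a.length; r++) {
            for (int i = 0; i < aCols; i++) {
                for (int j = 0; j < bCols; j++) {
                    result[i][j] += a[r][i] * b[r][j];
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}}; // 3 x 2
        double[][] b = {{1, 0}, {0, 1}, {1, 1}}; // 3 x 2
        // prints [[6.0, 8.0], [8.0, 10.0]]
        System.out.println(Arrays.deepToString(transposeTimes(a, b)));
    }
}
```

The result has as many rows as the first matrix has columns and as many columns as the second matrix has columns, so a 3 x 2 input pair yields a 2 x 2 product.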

transpose

public DistributedRowMatrix transpose()
                               throws java.io.IOException
Throws:
java.io.IOException

times

public Vector times(Vector v)
Specified by:
times in interface VectorIterable
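
As an illustration of the dimensions involved in a matrix-vector product, the following plain-Java sketch multiplies a row-major array by a vector. It is an in-memory stand-in, not Mahout code; the class name TimesSketch and the method times are hypothetical. The input vector needs one entry per column, and the result has one entry per row.

```java
import java.util.Arrays;

public class TimesSketch {

    /** Returns rows * v; v must have one entry per column, the result has one entry per row. */
    public static double[] times(double[][] rows, double[] v) {
        double[] result = new double[rows.length];
        for (int r = 0; r < rows.length; r++) {
            if (rows[r].length != v.length) {
                throw new IllegalArgumentException(
                    "row " + r + " has " + rows[r].length + " columns, vector has " + v.length);
            }
            // Entry r of the result is the dot product of row r with v.
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += rows[r][j] * v[j];
            }
            result[r] = dot;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 4}, {5, 6}}; // 3 rows, 2 columns
        double[] v = {1, 1};                        // dimension 2 = number of columns
        // prints [3.0, 7.0, 11.0]
        System.out.println(Arrays.toString(times(rows, v)));
    }
}
```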

timesSquared

public Vector timesSquared(Vector v)
Specified by:
timesSquared in interface VectorIterable
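
The sketch below illustrates the quantity that timesSquared is understood to return, namely this.transpose().times(this.times(v)), computed in a single pass over the rows. It uses plain arrays rather than Mahout or Hadoop types, and the names TimesSquaredSketch and timesSquared(double[][], double[]) are hypothetical. For each row, it takes the dot product with v and adds that multiple of the row to the result, so the intermediate vector this.times(v) never has to be stored.

```java
import java.util.Arrays;

public class TimesSquaredSketch {

    /** Returns rows^T * (rows * v) in one pass; v must have one entry per column. */
    public static double[] timesSquared(double[][] rows, double[] v) {
        double[] result = new double[v.length];
        for (double[] row : rows) {
            if (row.length != v.length) {
                throw new IllegalArgumentException(
                    "row has " + row.length + " columns, vector has " + v.length);
            }
            // Dot product of this row with v.
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += row[j] * v[j];
            }
            // Add dot * row into the running sum.
            for (int j = 0; j < v.length; j++) {
                result[j] += dot * row[j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 4}};
        double[] v = {1, 1};
        // prints [24.0, 34.0]
        System.out.println(Arrays.toString(timesSquared(rows, v)));
    }
}
```

Both the input and the output have the column dimension of the matrix, which matches the usage example at the top of this page, where a vector of size 250000 is passed to a matrix with 250000 columns.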

iterator

public java.util.Iterator<MatrixSlice> iterator()
Specified by:
iterator in interface java.lang.Iterable<MatrixSlice>


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.