java.lang.Object
  org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver
public class SSVDSolver
Stochastic SVD solver (API class).
Implementation details are in my working notes in MAHOUT-376 (https://issues.apache.org/jira/browse/MAHOUT-376).
At the time of this writing, I don't have benchmarks comparing this method with other methods. However, the non-Hadoop differentiating characteristics of this method are thought to be:
Specifically with regard to this implementation, I think a couple of other differentiating points are:
This class is the central public API for the SSVD solver. The use pattern is as follows:
Call run().
getUPath() (if computed) returns the path to the directory containing the m x k U matrix file(s).
getVPath() (if computed) returns the path to the directory containing the n x k V matrix file(s).
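The call order above can be sketched without a Hadoop cluster. The following is only an illustration, assuming a hypothetical `Solver` interface and `FakeSolver` class that borrow a few method names from this page; they are not the Mahout classes. The `/U` and `/V` directory names, the `null` return for a matrix that was not computed, and the default for U are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class UsePatternSketch {

    /** Hypothetical stand-in with method names taken from this page; not the Mahout class. */
    interface Solver {
        void setComputeU(boolean val);
        void setComputeV(boolean val);
        void run();
        String getUPath();
        String getVPath();
    }

    /** Toy implementation that only records what was requested; no Hadoop job is started. */
    static final class FakeSolver implements Solver {
        private final String outputPath;
        private boolean computeU = true; // assumed default for the sketch
        private boolean computeV = true; // this page states that V defaults to true
        private boolean done;

        FakeSolver(String outputPath) {
            this.outputPath = outputPath;
        }

        public void setComputeU(boolean val) { computeU = val; }
        public void setComputeV(boolean val) { computeV = val; }
        public void run() { done = true; }

        // Assumption: a matrix that was not computed is reported as null.
        public String getUPath() { return done && computeU ? outputPath + "/U" : null; }
        public String getVPath() { return done && computeV ? outputPath + "/V" : null; }
    }

    /** Follows the use pattern: set options, call run(), then read the result locations. */
    static List<String> solve(Solver solver, boolean wantV) {
        solver.setComputeV(wantV);
        solver.run();
        List<String> paths = new ArrayList<>();
        if (solver.getUPath() != null) {
            paths.add(solver.getUPath());
        }
        if (solver.getVPath() != null) {
            paths.add(solver.getVPath());
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(solve(new FakeSolver("/tmp/ssvd-out"), false));
    }
}
```

With the real class, the same sequence would start from the constructor documented on this page and `run()` would submit the SSVD jobs.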
Constructor Summary | |
---|---|
SSVDSolver(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path[] inputPath, org.apache.hadoop.fs.Path outputPath, int ablockRows, int k, int p, int reduceTasks) | Creates a new SSVD solver. |
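The numeric arguments can be sanity-checked before a solver is constructed. This is a minimal sketch, assuming a hypothetical helper `projectionWidth`; the 0..2 range for q comes from the setQ summary on this page, while the other bounds are assumptions of the sketch. Whether the solver itself enforces any of these limits is not stated here.

```java
public class SsvdParameterCheck {

    /** Returns k + p, the number of singular values the solver reports, after basic range checks. */
    static int projectionWidth(int k, int p, int q) {
        if (k <= 0) {
            throw new IllegalArgumentException("k (desired rank) must be positive");
        }
        if (p < 0) {
            throw new IllegalArgumentException("p (oversampling) must not be negative");
        }
        // Range taken from the setQ summary; treated here as a hard limit by assumption.
        if (q < 0 || q > 2) {
            throw new IllegalArgumentException("q is expected to be in 0..2");
        }
        return k + p;
    }

    public static void main(String[] args) {
        System.out.println(projectionWidth(40, 15, 1)); // prints 55
    }
}
```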
Method Summary | |
---|---|
int | getAbtBlockHeight() |
int | getOuterBlockHeight() |
int | getQ() |
double[] | getSingularValues() - Returns the k+p singular values resulting from the solver run. |
String | getUPath() - Returns the U path (if computation was requested and successful). |
String | getVPath() - Returns the V path (if computation was requested and successful). |
boolean | isBroadcast() |
boolean | isOverwrite() |
static UpperTriangular | loadAndSumUpperTriangularMatrices(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) - Loads multiple upper triangular matrices and sums them up. |
static double[][] | loadDistributedRowMatrix(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) - Helper capability to load distributed row matrices into a dense matrix (mainly to support tests). |
static UpperTriangular | loadUpperTriangularMatrix(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) - Loads exactly one upper triangular matrix and raises an error if more than one is found. |
void | run() - Runs all SSVD jobs. |
void | setAbtBlockHeight(int abtBlockHeight) - Sets the block height of Y_i during power iterations. |
void | setBroadcast(boolean broadcast) - If this property is true, uses the DistributedCache mechanism to broadcast some data around. |
void | setComputeU(boolean val) - Controls whether to compute the U matrix of the low-rank SSVD. |
void | setComputeV(boolean val) - Controls whether to compute the V matrix of the low-rank SSVD. |
void | setcUHalfSigma(boolean cUHat) |
void | setcVHalfSigma(boolean cVHat) |
void | setMinSplitSize(int size) - Sometimes, if the requested A blocks become larger than a split, this setting may be needed to ensure that at least k+p rows of A get into a split. |
void | setOuterBlockHeight(int outerBlockHeight) - Sets the height of outer blocks during Q'A multiplication. |
void | setOverwrite(boolean overwrite) - If true, the driver cleans the output folder first if it exists. |
void | setQ(int q) - Sets q, the number of additional power iterations used to increase precision (0..2!). |
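Because getSingularValues() reports k+p values while the requested rank is k, a caller may want to drop the p oversampling entries. The sketch below shows one way to do that on a plain array. It assumes the values arrive in descending order, which this page does not state, and the helper name `leadingK` is hypothetical.

```java
import java.util.Arrays;

public class SingularValueTruncation {

    /** Keeps the first k entries of an array holding k+p singular values. */
    static double[] leadingK(double[] sigma, int k) {
        if (k < 0 || k > sigma.length) {
            throw new IllegalArgumentException("k must be between 0 and " + sigma.length);
        }
        return Arrays.copyOf(sigma, k);
    }

    public static void main(String[] args) {
        // Made-up values for k = 3, p = 2; assumed to be sorted in descending order.
        double[] sigma = {9.0, 4.0, 2.5, 0.3, 0.1};
        System.out.println(Arrays.toString(leadingK(sigma, 3))); // prints [9.0, 4.0, 2.5]
    }
}
```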
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public SSVDSolver(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path[] inputPath, org.apache.hadoop.fs.Path outputPath, int ablockRows, int k, int p, int reduceTasks)
conf - Hadoop configuration
inputPath - Input path (should be compatible with DistributedRowMatrix as of the time of this writing).
outputPath - Output path containing U, V and singular values vector files.
ablockRows - The vertical height of a q-block (a bigger value requires more memory in mappers and perhaps a larger minSplitSize value)
k - desired rank
p - SSVD oversampling parameter
reduceTasks - Number of reduce tasks (where applicable)
IOException - when an I/O condition occurs.
Method Detail |
---|
public void setcUHalfSigma(boolean cUHat)
public void setcVHalfSigma(boolean cVHat)
public int getQ()
public void setQ(int q)
q - the number of additional power iterations used to increase precision (0..2)
public void setComputeU(boolean val)
public void setComputeV(boolean val)
val - true if we want to output the V matrix. Default is true.
public void setMinSplitSize(int size)
size - the minimum split size to use
public double[] getSingularValues()
public String getUPath()
public String getVPath()
public boolean isOverwrite()
public void setOverwrite(boolean overwrite)
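overwrite - if true, the driver cleans the output folder first if it exists
public int getOuterBlockHeight()
public void setOuterBlockHeight(int outerBlockHeight)
outerBlockHeight - the height of outer blocks during Q'A multiplication
public int getAbtBlockHeight()
public void setAbtBlockHeight(int abtBlockHeight)
abtBlockHeight - the block height of Y_i during power iterations
public boolean isBroadcast()
public void setBroadcast(boolean broadcast)
broadcast - if true, use the DistributedCache mechanism to broadcast some data around
public void run() throws IOException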
IOException - if an I/O condition occurs.
public static double[][] loadDistributedRowMatrix(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) throws IOException
fs - filesystem
glob - FS glob
conf - configuration
IOException - when an I/O error occurs.
public static UpperTriangular loadAndSumUpperTriangularMatrices(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) throws IOException
fs - filesystem
glob - FS glob
conf - configuration
IOException
public static UpperTriangular loadUpperTriangularMatrix(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path glob, org.apache.hadoop.conf.Configuration conf) throws IOException
IOException
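The idea behind summing several upper triangular matrices can be illustrated on plain arrays. This sketch does not use Mahout's UpperTriangular class, whose storage layout is not described on this page; it assumes a row-major packed layout (diagonal included), and the names `packedLength`, `index` and `sum` are hypothetical.

```java
public class UpperTriangularSum {

    /** Number of stored entries for an n x n upper triangular matrix, diagonal included. */
    static int packedLength(int n) {
        return n * (n + 1) / 2;
    }

    /** Position of element (row, col), with row <= col, in row-major packed storage. */
    static int index(int n, int row, int col) {
        // Rows 0..row-1 hold n, n-1, ..., n-row+1 entries respectively.
        return row * n - row * (row - 1) / 2 + (col - row);
    }

    /** Adds packed upper triangular matrices of the same order element by element. */
    static double[] sum(int n, double[]... parts) {
        double[] total = new double[packedLength(n)];
        for (double[] part : parts) {
            if (part.length != total.length) {
                throw new IllegalArgumentException("every part must hold " + total.length + " entries");
            }
            for (int i = 0; i < total.length; i++) {
                total[i] += part[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two 2 x 2 upper triangular matrices stored as [a00, a01, a11].
        double[] result = sum(2, new double[] {1, 2, 3}, new double[] {10, 20, 30});
        System.out.println(java.util.Arrays.toString(result)); // prints [11.0, 22.0, 33.0]
    }
}
```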