org.apache.mahout.utils.vectors.common
Class PartialVectorMerger

java.lang.Object
  extended by org.apache.mahout.utils.vectors.common.PartialVectorMerger

public final class PartialVectorMerger
extends java.lang.Object

This class merges partial vectors. All partial vectors that share a document ID are grouped and combined into a single vector per document. The SequenceFile input should have a WritableComparable key containing the document ID and a VectorWritable value containing the term-frequency vector. This class can also normalize the merged vectors.
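The following sketch roughly illustrates the merge step described above. It groups partial vectors by document ID and combines their entries into one complete vector per document. It is not Mahout's implementation. Sparse vectors are modeled as plain `Map<Integer, Double>` (term index to weight) rather than RandomAccessSparseVector. Combining entries by addition is an assumption; if the partial vectors cover disjoint term ranges, it reduces to taking their union. `PartialMergeSketch` and `Partial` are hypothetical names used only here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge partial term-frequency vectors per document ID.
// Sparse vectors are modeled as Map<Integer, Double> (term index -> weight).
public class PartialMergeSketch {

    // One partial vector belonging to one document (hypothetical record type).
    record Partial(String docId, Map<Integer, Double> terms) {}

    static Map<String, Map<Integer, Double>> merge(List<Partial> partials) {
        Map<String, Map<Integer, Double>> merged = new HashMap<>();
        for (Partial p : partials) {
            // Group by document ID, creating an empty complete vector on first sight.
            Map<Integer, Double> doc = merged.computeIfAbsent(p.docId(), k -> new HashMap<>());
            // Add each partial entry into the complete vector (assumed: entries combine by addition).
            p.terms().forEach((index, weight) -> doc.merge(index, weight, Double::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Partial> partials = List.of(
            new Partial("doc1", Map.of(0, 2.0, 3, 1.0)),
            new Partial("doc1", Map.of(5, 4.0)),
            new Partial("doc2", Map.of(3, 1.0)));
        Map<String, Map<Integer, Double>> merged = merge(partials);
        System.out.println(merged.get("doc1").size()); // prints 3
        System.out.println(merged.get("doc1").get(5)); // prints 4.0
    }
}
```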


Field Summary
static java.lang.String DIMENSION
           
static float NO_NORMALIZING
           
static java.lang.String NORMALIZATION_POWER
           
static java.lang.String SEQUENTIAL_ACCESS
           
 
Method Summary
static void mergePartialVectors(java.util.List<org.apache.hadoop.fs.Path> partialVectorPaths, java.lang.String output, float normPower, int dimension, boolean sequentialAccess)
          Merges all the partial RandomAccessSparseVectors into the complete document RandomAccessSparseVector.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NO_NORMALIZING

public static final float NO_NORMALIZING
See Also:
Constant Field Values

NORMALIZATION_POWER

public static final java.lang.String NORMALIZATION_POWER
See Also:
Constant Field Values

DIMENSION

public static final java.lang.String DIMENSION
See Also:
Constant Field Values

SEQUENTIAL_ACCESS

public static final java.lang.String SEQUENTIAL_ACCESS
See Also:
Constant Field Values

Method Detail

mergePartialVectors

public static void mergePartialVectors(java.util.List<org.apache.hadoop.fs.Path> partialVectorPaths,
                                       java.lang.String output,
                                       float normPower,
                                       int dimension,
                                       boolean sequentialAccess)
                                throws java.io.IOException
Merges all the partial RandomAccessSparseVectors into the complete document RandomAccessSparseVector.

Parameters:
partialVectorPaths - paths of the partial vectors to merge, in SequenceFile format
output - output directory where the merged vectors are written
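normPower - the power of the p-norm used for normalization. Must be greater than or equal to 0, or equal to NO_NORMALIZING to skip normalization
dimension - the cardinality of the merged vectors
sequentialAccess - if true, the merged vectors are written as SequentialAccessSparseVectors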
Throws:
java.io.IOException
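The following sketch pictures how the normPower and sequentialAccess parameters might be applied to each merged vector. It is a hedged illustration, not the library's code. Several parts are assumptions:

- Using -1 as the "do not normalize" sentinel is an assumption; the real value of NO_NORMALIZING is listed under Constant Field Values.
- The edge cases p = 0 (count of non-zero entries) and p = infinity (largest magnitude) follow common p-norm conventions that may differ from Mahout's.
- A TreeMap stands in for an index-ordered, sequential-access vector.

`NormalizeSketch`, `norm` and `finish` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-document post-processing after merging:
// optional p-norm normalization, then optional index-ordered output.
public class NormalizeSketch {

    // Assumption: sentinel meaning "do not normalize"; the real value of
    // PartialVectorMerger.NO_NORMALIZING is listed under Constant Field Values.
    static final float NO_NORMALIZING_SENTINEL = -1.0f;

    // p-norm of a sparse vector; p == 0 counts non-zeros, p == infinity takes the max magnitude.
    static double norm(Map<Integer, Double> v, double p) {
        if (p == 0) {
            return v.values().stream().filter(x -> x != 0.0).count();
        }
        if (Double.isInfinite(p)) {
            return v.values().stream().mapToDouble(x -> Math.abs(x)).max().orElse(0.0);
        }
        double sum = 0.0;
        for (double x : v.values()) {
            sum += Math.pow(Math.abs(x), p);
        }
        return Math.pow(sum, 1.0 / p);
    }

    static Map<Integer, Double> finish(Map<Integer, Double> v, float normPower, boolean sequentialAccess) {
        // Mirrors the documented contract: normPower >= 0 or the "no normalizing" sentinel.
        if (normPower != NO_NORMALIZING_SENTINEL && normPower < 0) {
            throw new IllegalArgumentException("normPower must be >= 0 or NO_NORMALIZING");
        }
        Map<Integer, Double> out = new HashMap<>(v);
        if (normPower != NO_NORMALIZING_SENTINEL) {
            double n = norm(v, normPower);
            // Divide every entry by the norm; an all-zero vector is left unchanged.
            if (n != 0.0) {
                out.replaceAll((index, x) -> x / n);
            }
        }
        // Sequential access modeled as iteration in ascending index order.
        return sequentialAccess ? new TreeMap<>(out) : out;
    }

    public static void main(String[] args) {
        // L1 normalization (p = 1) of {0: 1.0, 1: 3.0}, returned in index order.
        System.out.println(finish(Map.of(0, 1.0, 1, 3.0), 1.0f, true)); // prints {0=0.25, 1=0.75}
    }
}
```

Skipping the division for an all-zero vector is a choice made for this sketch to avoid producing NaN entries.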


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.