org.apache.mahout.vectorizer.encoders
Class AdaptiveWordValueEncoder

java.lang.Object
  extended by org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
      extended by org.apache.mahout.vectorizer.encoders.WordValueEncoder
          extended by org.apache.mahout.vectorizer.encoders.AdaptiveWordValueEncoder

public class AdaptiveWordValueEncoder
extends WordValueEncoder

Encodes words into vectors in the same way as WordValueEncoder, while also maintaining an adaptive dictionary of the values seen so far. This allows terms to be weighted without a pre-scan of all of the data.
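The idea of weighting from an adaptive dictionary can be sketched in plain Java. The example below is an illustration only and does not use Mahout: the class AdaptiveWeightSketch, its method names, and the smoothed negative-log-frequency formula are assumptions chosen for the sketch, not something this page documents. It keeps running counts of the words seen so far and gives rarer words a larger weight, so no pre-scan of the data is needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the Mahout implementation.
class AdaptiveWeightSketch {
    // Running counts of every word observed so far (the "adaptive dictionary").
    private final Map<String, Integer> counts = new HashMap<>();
    private int total = 0;

    /** Records one occurrence of the word, then returns its current weight. */
    public double observeAndWeigh(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
        return weight(word);
    }

    /**
     * Assumed weighting: negative log of a smoothed relative frequency.
     * Each distinct word, plus one hypothetical unseen word, gets an
     * extra 0.5 count so the ratio stays strictly between 0 and 1.
     */
    public double weight(String word) {
        double thisWord = counts.getOrDefault(word, 0) + 0.5;
        double allWords = total + 0.5 * counts.size() + 0.5;
        return -Math.log(thisWord / allWords);
    }

    public static void main(String[] args) {
        AdaptiveWeightSketch sketch = new AdaptiveWeightSketch();
        for (String w : new String[] {"the", "the", "the", "cat"}) {
            System.out.println(w + " -> " + sketch.observeAndWeigh(w));
        }
    }
}
```

Because the counts change with every call, the weight assigned to a given word depends on how much data has been seen before it; that is the trade-off for avoiding a separate counting pass.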


Field Summary
 
Fields inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
CONTINUOUS_VALUE_HASH_SEED, WORD_LIKE_VALUE_HASH_SEED
 
Constructor Summary
AdaptiveWordValueEncoder(java.lang.String name)
           
 
Method Summary
 void addToVector(java.lang.String originalForm, double weight, Vector data)
          Adds a value to a vector.
 com.google.common.collect.Multiset<java.lang.String> getDictionary()
           
protected  double getWeight(byte[] originalForm, double w)
           
protected  int hashForProbe(byte[] originalForm, int dataSize, java.lang.String name, int probe)
          Provides the unique hash for a particular probe.
protected  double weight(byte[] originalForm)
           
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.WordValueEncoder
addToVector, asString
 
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AdaptiveWordValueEncoder

public AdaptiveWordValueEncoder(java.lang.String name)
Method Detail

addToVector

public void addToVector(java.lang.String originalForm,
                        double weight,
                        Vector data)
Adds a value to a vector.

Overrides:
addToVector in class FeatureVectorEncoder
Parameters:
originalForm - The original form of the value as a string.
weight - The weight to be applied to this feature.
data - The vector to which the value should be added.

hashForProbe

protected int hashForProbe(byte[] originalForm,
                           int dataSize,
                           java.lang.String name,
                           int probe)
Description copied from class: FeatureVectorEncoder
Provides the unique hash for a particular probe. For all encoders except text, this is all that is needed, and the default implementation of hashesForProbe will do the right thing. For text and similar values, hashesForProbe should be overridden and this method should not be used.

Overrides:
hashForProbe in class WordValueEncoder
Parameters:
originalForm - The original byte array value
dataSize - The length of the vector being encoded
name - The name of the variable being encoded
probe - The probe number
Returns:
The hash of the current probe
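
To make the role of the parameters concrete, here is a minimal sketch of a per-probe hash in plain Java. It is an illustration under stated assumptions: the class ProbeHashSketch and the simple multiplicative mixing are hypothetical and are not the hash function Mahout uses. It shows only the general shape, in which the variable name and probe number act as a seed, the bytes of the value are mixed in, and the result is reduced to a valid index in a vector of length dataSize.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the Mahout hash function.
class ProbeHashSketch {
    /** Maps (value, variable name, probe number) to an index in [0, dataSize). */
    public static int hashForProbe(byte[] originalForm, int dataSize, String name, int probe) {
        // Assumed seeding scheme: combine the variable name and the probe number.
        int h = 31 * name.hashCode() + probe;
        // Mix in the bytes of the value being encoded.
        h = 31 * h + Arrays.hashCode(originalForm);
        // floorMod keeps the index non-negative even when h is negative.
        return Math.floorMod(h, dataSize);
    }

    public static void main(String[] args) {
        byte[] word = "cat".getBytes(StandardCharsets.UTF_8);
        for (int probe = 0; probe < 2; probe++) {
            System.out.println("probe " + probe + " -> index "
                + hashForProbe(word, 100, "text", probe));
        }
    }
}
```

The same inputs always produce the same index, which is what lets an encoder find a feature's location again without storing a dictionary of positions.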

getWeight

protected double getWeight(byte[] originalForm,
                           double w)
Overrides:
getWeight in class WordValueEncoder

weight

protected double weight(byte[] originalForm)
Specified by:
weight in class WordValueEncoder

getDictionary

public com.google.common.collect.Multiset<java.lang.String> getDictionary()


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.