org.apache.mahout.vectorizer.encoders
Class AdaptiveWordValueEncoder
java.lang.Object
org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
org.apache.mahout.vectorizer.encoders.WordValueEncoder
org.apache.mahout.vectorizer.encoders.AdaptiveWordValueEncoder
public class AdaptiveWordValueEncoder
- extends WordValueEncoder
Encodes words into vectors in the same way as WordValueEncoder, while maintaining
an adaptive dictionary of the values seen so far. This allows terms to be weighted
without a pre-scan of all of the data.
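The adaptive weighting idea can be sketched in a few lines. This is a minimal, self-contained illustration and not the Mahout source: the class name AdaptiveWeightSketch, the method observeAndWeigh, and the smoothed negative-log-frequency formula are all assumptions chosen for the example, since this page does not state how the weight is computed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an adaptive dictionary; not part of Mahout.
class AdaptiveWeightSketch {
    // Running count of every word observed so far.
    private final Map<String, Integer> counts = new HashMap<>();
    // Total number of observations across all words.
    private long total = 0;

    // Records one occurrence of the word, then returns its weight
    // based only on the data seen up to and including this call.
    double observeAndWeigh(String word) {
        counts.merge(word, 1, Integer::sum);
        total++;
        return weight(word);
    }

    // Assumed formula: negative log of a smoothed relative frequency.
    // Each distinct word, plus one hypothetical unseen word, gets an
    // extra 0.5 count so that no weight is infinite or undefined.
    double weight(String word) {
        double thisWord = counts.getOrDefault(word, 0) + 0.5;
        double allWords = total + 0.5 * counts.size() + 0.5;
        return -Math.log(thisWord / allWords);
    }

    public static void main(String[] args) {
        AdaptiveWeightSketch sketch = new AdaptiveWeightSketch();
        for (String w : new String[] {"the", "the", "the", "zebra"}) {
            System.out.printf("%s -> %.4f%n", w, sketch.observeAndWeigh(w));
        }
    }
}
```

Under these assumptions a frequent word receives a smaller weight than a rare one, and the weights shift as more data arrives, which is why no pre-scan is needed.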
Method Summary
void addToVector(java.lang.String originalForm, double weight, Vector data)
Adds a value to a vector.
com.google.common.collect.Multiset<java.lang.String> getDictionary()
protected double getWeight(byte[] originalForm, double w)
protected int hashForProbe(byte[] originalForm, int dataSize, java.lang.String name, int probe)
Provides the unique hash for a particular probe.
protected double weight(byte[] originalForm)
Methods inherited from class org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder
addToVector, addToVector, bytesForString, getName, getProbes, hash, hash, hash, hash, hash, hashesForProbe, isTraceEnabled, setProbes, setTraceDictionary, trace, trace
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
AdaptiveWordValueEncoder
public AdaptiveWordValueEncoder(java.lang.String name)
addToVector
public void addToVector(java.lang.String originalForm,
double weight,
Vector data)
- Adds a value to a vector.
- Overrides:
addToVector
in class FeatureVectorEncoder
- Parameters:
originalForm
- The original form of the value as a string.
data
- The vector to which the value should be added.
weight
- The weight to be applied to this feature.
hashForProbe
protected int hashForProbe(byte[] originalForm,
int dataSize,
java.lang.String name,
int probe)
- Description copied from class:
FeatureVectorEncoder
- Provides the unique hash for a particular probe. For all encoders except text, this
is all that is needed and the default implementation of hashesForProbe will do the right
thing. For text and similar values, hashesForProbe should be overridden and this method
should not be used.
- Overrides:
hashForProbe
in class WordValueEncoder
- Parameters:
originalForm
- The original byte array value
dataSize
- The length of the vector being encoded
name
- The name of the variable being encoded
probe
- The probe number
- Returns:
- The hash of the current probe
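The role of the parameters above can be illustrated with a small stand-alone sketch. It assumes nothing about Mahout's actual hash function: the class ProbeHashSketch, its mixing of Arrays.hashCode with the name and probe number, and the use of a plain double[] in place of Vector are all hypothetical stand-ins chosen so the example compiles on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of probe-based feature hashing; not part of Mahout.
class ProbeHashSketch {
    // Maps (value, variable name, probe number) to an index in [0, dataSize).
    // The mixing below is a simple stand-in, not the hash Mahout uses.
    static int hashForProbe(byte[] originalForm, int dataSize, String name, int probe) {
        int h = 31 * (31 * Arrays.hashCode(originalForm) + name.hashCode()) + probe;
        return Math.floorMod(h, dataSize);
    }

    // Adds the weight at one hashed location per probe.
    static void addToVector(String originalForm, double weight, double[] data,
                            String name, int probes) {
        byte[] bytes = originalForm.getBytes(StandardCharsets.UTF_8);
        for (int probe = 0; probe < probes; probe++) {
            data[hashForProbe(bytes, data.length, name, probe)] += weight;
        }
    }

    public static void main(String[] args) {
        double[] data = new double[16];
        addToVector("word", 2.0, data, "text", 2);
        System.out.println(Arrays.toString(data));
    }
}
```

Using several probes spreads each value over more than one location, so a collision at a single index does not erase the feature entirely.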
getWeight
protected double getWeight(byte[] originalForm,
double w)
- Overrides:
getWeight
in class WordValueEncoder
weight
protected double weight(byte[] originalForm)
- Specified by:
weight
in class WordValueEncoder
getDictionary
public com.google.common.collect.Multiset<java.lang.String> getDictionary()
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.