org.apache.mahout.utils.vectors.lucene
Class ClusterLabels
java.lang.Object
org.apache.mahout.utils.vectors.lucene.ClusterLabels
public class ClusterLabels
extends java.lang.Object
Gets labels for the cluster using the Log Likelihood Ratio (LLR).
"The most useful way to think of this (LLR) is as the percentage of in-cluster documents that have the
feature (term) versus the percentage out, keeping in mind that both percentages are uncertain since we have
only a sample of all possible documents." - Ted Dunning
More about LLR can be found at: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
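To make the quoted description concrete, the following is a minimal, self-contained sketch of the LLR statistic for a 2x2 contingency table of document counts (term present or absent, inside or outside the cluster). It is an illustration of the statistic only, not the code this class runs; the class name LlrSketch and the parameter names k11, k12, k21, k22 are assumptions introduced for this example.

```java
/** Illustrative sketch of the log-likelihood ratio for a 2x2 table of counts. */
final class LlrSketch {

    // x * ln(x), using the conventional limit 0 * ln(0) = 0.
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy of a set of counts: N*ln(N) - sum(k*ln(k)), N = sum(k).
    static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }

    /**
     * Hypothetical parameter layout (an assumption of this sketch):
     * k11 = in-cluster documents that contain the term,
     * k12 = in-cluster documents that do not,
     * k21 = out-of-cluster documents that contain the term,
     * k22 = out-of-cluster documents that do not.
     */
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Term equally common inside and outside the cluster: score near zero.
        System.out.println(llr(10, 10, 10, 10));
        // Term present in every in-cluster document and no other: large score.
        System.out.println(llr(10, 0, 0, 10));
    }
}
```

A term that is spread evenly across both groups scores close to zero, while a term concentrated in the cluster scores high, which is why a high LLR suggests a good label candidate.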
Constructor Summary
ClusterLabels(java.lang.String seqFileDir, java.lang.String pointsDir, java.lang.String indexDir, java.lang.String contentField, int minNumIds, int maxLabels)
Method Summary
protected java.util.List<org.apache.mahout.utils.vectors.lucene.ClusterLabels.TermInfoClusterInOut> getClusterLabels(java.lang.String clusterID, java.util.List<java.lang.String> ids)
    Get the list of labels, sorted by best score.
java.lang.String getIdField()
void getLabels()
java.lang.String getOutput()
static void main(java.lang.String[] args)
void setIdField(java.lang.String idField)
void setOutput(java.lang.String output)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_MIN_IDS
public static final int DEFAULT_MIN_IDS
See Also: Constant Field Values
DEFAULT_MAX_LABELS
public static final int DEFAULT_MAX_LABELS
See Also: Constant Field Values
ClusterLabels
public ClusterLabels(java.lang.String seqFileDir,
java.lang.String pointsDir,
java.lang.String indexDir,
java.lang.String contentField,
int minNumIds,
int maxLabels)
throws java.io.IOException
Throws:
java.io.IOException
getLabels
public void getLabels()
throws java.io.IOException
Throws:
java.io.IOException
getClusterLabels
protected java.util.List<org.apache.mahout.utils.vectors.lucene.ClusterLabels.TermInfoClusterInOut> getClusterLabels(java.lang.String clusterID,
java.util.List<java.lang.String> ids)
throws java.io.IOException
Get the list of labels, sorted by best score.
Parameters:
clusterID
ids
Returns:
the list of labels, sorted by best score
Throws:
org.apache.lucene.index.CorruptIndexException
java.io.IOException
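As a rough illustration of what "sorted by best score" could look like, the sketch below orders scored terms from highest to lowest and keeps at most a fixed number of them. It assumes that a higher score is better and that a cap such as maxLabels limits the result; this page confirms neither. ScoredTerm and topLabels are hypothetical names for this example and do not correspond to TermInfoClusterInOut or any Mahout method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch: rank candidate labels by score and keep the top few. */
final class LabelRankingSketch {

    // Hypothetical stand-in for a scored term; not the Mahout TermInfoClusterInOut type.
    static final class ScoredTerm {
        final String term;
        final double score;

        ScoredTerm(String term, double score) {
            this.term = term;
            this.score = score;
        }
    }

    // Returns at most maxLabels terms, highest score first (assumes maxLabels >= 0).
    static List<ScoredTerm> topLabels(List<ScoredTerm> candidates, int maxLabels) {
        List<ScoredTerm> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((ScoredTerm t) -> t.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(maxLabels, sorted.size())));
    }

    public static void main(String[] args) {
        List<ScoredTerm> candidates = Arrays.asList(
                new ScoredTerm("alpha", 1.0),
                new ScoredTerm("beta", 3.0),
                new ScoredTerm("gamma", 2.0));
        for (ScoredTerm t : topLabels(candidates, 2)) {
            System.out.println(t.term + " " + t.score);
        }
    }
}
```

Copying the input before sorting leaves the caller's list untouched, which is a design choice of this sketch rather than a documented property of the class.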
getIdField
public java.lang.String getIdField()
setIdField
public void setIdField(java.lang.String idField)
getOutput
public java.lang.String getOutput()
setOutput
public void setOutput(java.lang.String output)
main
public static void main(java.lang.String[] args)
Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.