org.apache.mahout.utils.vectors.lucene
Class ClusterLabels

java.lang.Object
  extended by org.apache.mahout.utils.vectors.lucene.ClusterLabels

public class ClusterLabels
extends java.lang.Object

Get labels for the cluster using Log Likelihood Ratio (LLR).

"The most useful way to think of this (LLR) is as the percentage of in-cluster documents that have the feature (term) versus the percentage out, keeping in mind that both percentages are uncertain since we have only a sample of all possible documents." - Ted Dunning

More about LLR can be found at: http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
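
As an illustration of the in-cluster versus out-of-cluster comparison described in the quote, an LLR score for a single term can be computed from a 2x2 table of document counts: in-cluster documents with and without the term, and out-of-cluster documents with and without the term. The following is a minimal sketch of Dunning's entropy-based formulation, not necessarily the code this class runs internally. The class name LlrSketch, its method names, and the sample counts in main are assumptions made for this example.

```java
/**
 * Hypothetical helper, not part of Mahout: scores one term with the
 * log-likelihood ratio (LLR) computed from a 2x2 table of document counts.
 */
final class LlrSketch {

    private LlrSketch() {
    }

    /** x * ln(x), with the usual convention that 0 * ln(0) is 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a set of counts. */
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long count : counts) {
            parts += xLogX(count);
            sum += count;
        }
        return xLogX(sum) - parts;
    }

    /**
     * @param k11 in-cluster documents that contain the term
     * @param k12 in-cluster documents that do not contain the term
     * @param k21 out-of-cluster documents that contain the term
     * @param k22 out-of-cluster documents that do not contain the term
     * @return the LLR score; larger values mean the term's frequency inside
     *         the cluster differs more strongly from its frequency outside
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Guard against tiny negative values caused by floating-point round-off.
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    public static void main(String[] args) {
        // Sample counts are invented for illustration only.
        // Term present in 90 of 100 in-cluster docs but only 10 of 100 outside.
        System.out.println(logLikelihoodRatio(90, 10, 10, 90));
        // Term equally common inside and outside the cluster.
        System.out.println(logLikelihoodRatio(10, 10, 10, 10));
    }
}
```

A term that is equally common inside and outside the cluster scores close to zero, while a term concentrated on one side scores high. Note that LLR measures the strength of the difference, not its direction, so a term that is unusually rare in the cluster also scores high.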


Field Summary
static int DEFAULT_MAX_LABELS
           
static int DEFAULT_MIN_IDS
           
 
Constructor Summary
ClusterLabels(java.lang.String seqFileDir, java.lang.String pointsDir, java.lang.String indexDir, java.lang.String contentField, int minNumIds, int maxLabels)
           
 
Method Summary
protected  java.util.List<org.apache.mahout.utils.vectors.lucene.ClusterLabels.TermInfoClusterInOut> getClusterLabels(java.lang.String clusterID, java.util.List<java.lang.String> ids)
          Get the list of labels, sorted by best score.
 java.lang.String getIdField()
           
 void getLabels()
           
 java.lang.String getOutput()
           
static void main(java.lang.String[] args)
           
 void setIdField(java.lang.String idField)
           
 void setOutput(java.lang.String output)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_MIN_IDS

public static final int DEFAULT_MIN_IDS
See Also:
Constant Field Values

DEFAULT_MAX_LABELS

public static final int DEFAULT_MAX_LABELS
See Also:
Constant Field Values
Constructor Detail

ClusterLabels

public ClusterLabels(java.lang.String seqFileDir,
                     java.lang.String pointsDir,
                     java.lang.String indexDir,
                     java.lang.String contentField,
                     int minNumIds,
                     int maxLabels)
              throws java.io.IOException
Throws:
java.io.IOException
Method Detail

getLabels

public void getLabels()
               throws java.io.IOException
Throws:
java.io.IOException

getClusterLabels

protected java.util.List<org.apache.mahout.utils.vectors.lucene.ClusterLabels.TermInfoClusterInOut> getClusterLabels(java.lang.String clusterID,
                                                                                                                     java.util.List<java.lang.String> ids)
                                                                                                              throws java.io.IOException
Get the list of labels, sorted by best score.

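Parameters:
clusterID - the ID of the cluster to label
ids - the IDs of the documents that belong to the cluster
Returns:
the list of labels for the cluster, sorted by best score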
org.apache.lucene.index.CorruptIndexException
java.io.IOException

getIdField

public java.lang.String getIdField()

setIdField

public void setIdField(java.lang.String idField)

getOutput

public java.lang.String getOutput()

setOutput

public void setOutput(java.lang.String output)

main

public static void main(java.lang.String[] args)


Copyright © 2008-2010 The Apache Software Foundation. All Rights Reserved.