org.apache.lucene.misc
Class HighFreqTerms

java.lang.Object
  extended by org.apache.lucene.misc.HighFreqTerms

public class HighFreqTerms
extends Object

The HighFreqTerms class extracts the top n most frequent terms (by document frequency) from an existing Lucene index and reports their document frequency. If the -t flag is given, both the document frequency and the total tf (total number of occurrences) are reported, ordered by highest total tf.


Field Summary
static int DEFAULTnumTerms
           
static int numTerms
           
 
Constructor Summary
HighFreqTerms()
           
 
Method Summary
static TermStats[] getHighFreqTerms(IndexReader reader, int numTerms, String field)
           
static long getTotalTermFreq(IndexReader reader, String field, BytesRef termText)
           
static void main(String[] args)
           
static TermStats[] sortByTotalTermFreq(IndexReader reader, TermStats[] terms)
          Takes an array of TermStats.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULTnumTerms

public static final int DEFAULTnumTerms
See Also:
Constant Field Values

numTerms

public static int numTerms
Constructor Detail

HighFreqTerms

public HighFreqTerms()
Method Detail

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception

getHighFreqTerms

public static TermStats[] getHighFreqTerms(IndexReader reader,
                                           int numTerms,
                                           String field)
                                    throws Exception
Parameters:
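reader - the IndexReader to extract terms from
numTerms - the number of top terms to return
field - the field whose terms are examined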
Returns:
TermStats[] ordered by terms with highest docFreq first.
Throws:
Exception
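
The selection that getHighFreqTerms describes (keep the n terms with the highest docFreq, highest first) can be illustrated without Lucene. The following is a minimal sketch, not the library's implementation: TopTermsSketch, TermCount, and topByDocFreq are hypothetical names, and a plain Map stands in for the index. It assumes a bounded min-heap is an acceptable way to keep the top n.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopTermsSketch {

    /** Hypothetical stand-in for a term statistic; this is not Lucene's TermStats. */
    static final class TermCount {
        final String term;
        final int docFreq;

        TermCount(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    /** Returns up to n entries with the highest docFreq, highest first. */
    static List<TermCount> topByDocFreq(Map<String, Integer> docFreqs, int n) {
        List<TermCount> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the least frequent term kept so far sits at the head.
        PriorityQueue<TermCount> heap =
                new PriorityQueue<>(n, Comparator.comparingInt((TermCount t) -> t.docFreq));
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(new TermCount(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the least frequent term
            }
        }
        result.addAll(heap);
        // Heap iteration order is unspecified, so sort explicitly: highest docFreq first.
        result.sort(Comparator.comparingInt((TermCount t) -> t.docFreq).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies, for illustration only.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("index", 7);
        docFreqs.put("lucene", 12);
        docFreqs.put("term", 9);
        docFreqs.put("rare", 1);
        for (TermCount t : topByDocFreq(docFreqs, 2)) {
            System.out.println(t.term + "\t" + t.docFreq);
        }
    }
}
```

The heap never holds more than n + 1 entries, so memory stays proportional to n rather than to the number of distinct terms.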

sortByTotalTermFreq

public static TermStats[] sortByTotalTermFreq(IndexReader reader,
                                              TermStats[] terms)
                                       throws Exception
Takes an array of TermStats. For each term, looks up the tf in each document containing the term and stores the total in the output array of TermStats. The output array is sorted by total tf, highest first.

Parameters:
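reader - the IndexReader used to look up term frequencies
terms - the TermStats[] whose total tf is computed
Returns:
TermStats[] sorted by total tf, highest first.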
Throws:
Exception
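
The difference between document frequency and total tf can be seen in a small self-contained sketch. It is an illustration under stated assumptions, not the library's code: TotalTfSketch, TermTotals, and the int[] of per-document tf values are hypothetical, standing in for what would be read from an index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TotalTfSketch {

    /** Hypothetical result holder; this is not Lucene's TermStats. */
    static final class TermTotals {
        final String term;
        final int docFreq;        // number of documents containing the term
        final long totalTermFreq; // sum of tf over those documents

        TermTotals(String term, int docFreq, long totalTermFreq) {
            this.term = term;
            this.docFreq = docFreq;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Sums the per-document tf values of one term. */
    static long totalTermFreq(int[] perDocTf) {
        long total = 0;
        for (int tf : perDocTf) {
            total += tf;
        }
        return total;
    }

    /** Computes the total tf for each term and sorts by it, highest first. */
    static List<TermTotals> sortByTotalTermFreq(Map<String, int[]> perDocTfByTerm) {
        List<TermTotals> out = new ArrayList<>();
        for (Map.Entry<String, int[]> e : perDocTfByTerm.entrySet()) {
            int[] tfs = e.getValue();
            // One array element per matching document, so its length is the docFreq.
            out.add(new TermTotals(e.getKey(), tfs.length, totalTermFreq(tfs)));
        }
        out.sort(Comparator.comparingLong((TermTotals t) -> t.totalTermFreq).reversed());
        return out;
    }

    public static void main(String[] args) {
        // Made-up data: "common" occurs once in each of 3 documents,
        // "lucene" occurs 5 and 4 times in 2 documents.
        Map<String, int[]> perDocTfByTerm = new LinkedHashMap<>();
        perDocTfByTerm.put("common", new int[] {1, 1, 1});
        perDocTfByTerm.put("lucene", new int[] {5, 4});
        for (TermTotals t : sortByTotalTermFreq(perDocTfByTerm)) {
            System.out.println(t.term + "\tdocFreq=" + t.docFreq + "\ttotalTf=" + t.totalTermFreq);
        }
    }
}
```

In this made-up data, "common" has the higher docFreq (3 against 2) but "lucene" has the higher total tf (9 against 3), so the two orderings disagree.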

getTotalTermFreq

public static long getTotalTermFreq(IndexReader reader,
                                    String field,
                                    BytesRef termText)
                             throws Exception
Throws:
Exception


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.