org.apache.lucene.misc
Class HighFreqTerms
java.lang.Object
org.apache.lucene.misc.HighFreqTerms
public class HighFreqTerms
- extends Object
The HighFreqTerms class extracts the top n most frequent terms
(by document frequency) from an existing Lucene index and reports their document frequency.
If the -t flag is given, it reports both their document frequency and their total tf
(total number of occurrences), in order of highest total tf.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULTnumTerms
public static final int DEFAULTnumTerms
- See Also:
- Constant Field Values
numTerms
public static int numTerms
HighFreqTerms
public HighFreqTerms()
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
getHighFreqTerms
public static TermStats[] getHighFreqTerms(IndexReader reader,
int numTerms,
String field)
throws Exception
- Returns a TermStats[] array ordered with the highest-docFreq terms first.
- Throws:
Exception
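The ordering described for getHighFreqTerms, highest docFreq first, can be illustrated without an index. The sketch below is a minimal stand-in that uses only the Java standard library; it is not Lucene's implementation. The class TopDocFreqSketch, its Stat type, and the in-memory documents are hypothetical names introduced for illustration, and the bounded min-heap is just one common way to keep the top n terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical stand-in: not part of Lucene and not the HighFreqTerms code.
final class TopDocFreqSketch {

    /** A term paired with the number of documents that contain it. */
    static final class Stat {
        final String term;
        final int docFreq;

        Stat(String term, int docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    /** Highest docFreq first; ties are broken alphabetically for determinism. */
    static final Comparator<Stat> BEST_FIRST = (a, b) -> {
        int c = Integer.compare(b.docFreq, a.docFreq);
        return c != 0 ? c : a.term.compareTo(b.term);
    };

    /** Counts, for each term, how many documents contain it at least once. */
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            // The set drops repeats, so a document counts each term once.
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns up to n terms, ordered with the highest docFreq first. */
    static List<Stat> topByDocFreq(Map<String, Integer> df, int n) {
        // Reversed comparator: the weakest retained candidate sits at the head.
        PriorityQueue<Stat> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            heap.add(new Stat(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // evict the current weakest term
            }
        }
        List<Stat> out = new ArrayList<>(heap);
        out.sort(BEST_FIRST);
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "index", "lucene"),
                Arrays.asList("lucene", "search"),
                Arrays.asList("lucene", "index", "term"));
        for (Stat s : topByDocFreq(docFreqs(docs), 2)) {
            System.out.println(s.term + " docFreq=" + s.docFreq);
        }
        // prints "lucene docFreq=3" then "index docFreq=2"
    }
}
```

Note that "lucene" occurs twice in the first document but still contributes only 1 to its document frequency, which is the distinction between docFreq and total tf.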
sortByTotalTermFreq
public static TermStats[] sortByTotalTermFreq(IndexReader reader,
TermStats[] terms)
throws Exception
- Takes an array of TermStats. For each term, looks up the tf in each document
containing the term and stores the total in the output array of TermStats.
The output array is sorted by total tf, highest first.
- Parameters:
terms - TermStats[]
- Returns:
- TermStats[]
- Throws:
Exception
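The two steps described for sortByTotalTermFreq, summing the per-document tf of each term and then sorting by that total, can be sketched over in-memory documents. This is an illustration under stated assumptions, not Lucene's implementation: TotalTermFreqSketch, its Stat type, and the list-of-token documents are hypothetical, and only the Java standard library is used.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: not part of Lucene and not the HighFreqTerms code.
final class TotalTermFreqSketch {

    /** A term paired with its total number of occurrences across all documents. */
    static final class Stat {
        final String term;
        final long totalTermFreq;

        Stat(String term, long totalTermFreq) {
            this.term = term;
            this.totalTermFreq = totalTermFreq;
        }
    }

    /** Sums the tf of the term over every document. */
    static long totalTermFreq(List<List<String>> docs, String term) {
        long total = 0;
        for (List<String> doc : docs) {
            // Collections.frequency is the tf of the term in this document.
            total += Collections.frequency(doc, term);
        }
        return total;
    }

    /** Builds a new array with totals filled in, highest total tf first. */
    static Stat[] sortByTotalTermFreq(List<List<String>> docs, String[] terms) {
        Stat[] out = new Stat[terms.length];
        for (int i = 0; i < terms.length; i++) {
            out[i] = new Stat(terms[i], totalTermFreq(docs, terms[i]));
        }
        Arrays.sort(out, (a, b) -> {
            int c = Long.compare(b.totalTermFreq, a.totalTermFreq);
            return c != 0 ? c : a.term.compareTo(b.term); // alphabetical tie-break
        });
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("index", "lucene"),
                Arrays.asList("index"),
                Arrays.asList("index", "term", "term", "term", "term"),
                Arrays.asList("lucene"));
        // Input order is by document frequency: index (3), lucene (2), term (1).
        String[] byDocFreq = {"index", "lucene", "term"};
        for (Stat s : sortByTotalTermFreq(docs, byDocFreq)) {
            System.out.println(s.term + " totalTermFreq=" + s.totalTermFreq);
        }
        // prints "term totalTermFreq=4", "index totalTermFreq=3", "lucene totalTermFreq=2"
    }
}
```

In this sample, "term" appears in only one document yet moves to the front once the ordering switches from document frequency to total tf.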
getTotalTermFreq
public static long getTotalTermFreq(IndexReader reader,
Term term)
throws Exception
- Throws:
Exception
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.