org.apache.mahout.text
Class MailArchivesClusteringAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.mahout.text.MailArchivesClusteringAnalyzer
All Implemented Interfaces:
Closeable
public final class MailArchivesClusteringAnalyzer
extends org.apache.lucene.analysis.StopwordAnalyzerBase
Custom Lucene Analyzer designed for aggressive feature reduction
when clustering the ASF Mail Archives. It uses an extended set of
stop words, excludes non-alphanumeric tokens, and applies Porter stemming.
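The kind of feature reduction described above can be illustrated with a minimal plain-Java sketch. It is not the Mahout implementation: the class name `FeatureReductionSketch`, the method `reduce`, the whitespace tokenization, and the tiny stop list are all assumptions made for illustration, and Porter stemming is left out to keep the example short. It shows only the stop-word and non-alphanumeric filtering steps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for two of the filtering stages; not Mahout code.
public class FeatureReductionSketch {

    // Assumed, deliberately tiny stop list; the analyzer's own list is far larger.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "is", "to", "on", "re"));

    // Lowercases each whitespace-separated token, then drops tokens that are
    // not purely alphanumeric or that appear in the stop-word set.
    static List<String> reduce(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.matches("[a-z0-9]+")) {
                continue; // e.g. "re:" or "!!!" (also skips empty strings)
            }
            if (stopWords.contains(token)) {
                continue; // e.g. "the", "is", "on"
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [build, broken, trunk, r1234]
        System.out.println(reduce("Re: the build is BROKEN on trunk r1234 !!!", STOP_WORDS));
    }
}
```

In the real analyzer these steps run as Lucene token filters inside the chain built by createComponents, and a stemming step would further collapse word variants to a common stem.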
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords
Method Summary
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MailArchivesClusteringAnalyzer
public MailArchivesClusteringAnalyzer()
MailArchivesClusteringAnalyzer
public MailArchivesClusteringAnalyzer(Set<?> stopSet)
createComponents
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)
Specified by:
createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.