org.apache.mahout.text
Class MailArchivesClusteringAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.mahout.text.MailArchivesClusteringAnalyzer
- All Implemented Interfaces:
- Closeable
public final class MailArchivesClusteringAnalyzer
- extends org.apache.lucene.analysis.Analyzer
Custom Lucene Analyzer designed for aggressive feature reduction
when clustering the ASF Mail Archives. It applies an extended set of
stop words, excludes non-alphanumeric tokens, and uses Porter stemming.
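The reduction steps named in the description can be pictured with a small plain-Java sketch. This is an illustration only and not the Mahout or Lucene implementation: the class name FeatureReductionSketch, the tiny stop-word list, the whitespace tokenization, and the crudeStem helper (a simple suffix stripper standing in for real Porter stemming) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative stand-in for the analysis steps described above.
// NOT the Mahout/Lucene code; every name here is hypothetical.
final class FeatureReductionSketch {

    // Hypothetical, tiny stop-word list; the real analyzer ships its own extended set.
    static final Set<String> STOP_WORDS = new HashSet<String>(
        Arrays.asList("the", "a", "an", "and", "of", "to", "is", "re", "wrote"));

    // Crude suffix stripping used as a placeholder for Porter stemming.
    // It only removes "ing", "ed", or "s" when at least 3 characters remain.
    static String crudeStem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Lowercase, drop tokens containing non-alphanumeric characters,
    // drop stop words, then stem what is left.
    static List<String> reduce(String text, Set<String> stopWords) {
        List<String> features = new ArrayList<String>();
        for (String raw : text.split("\\s+")) {      // simplification: whitespace tokenizer
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.matches("[a-z0-9]+")) {
                continue;                            // non-alphanumeric (or empty) token
            }
            if (stopWords.contains(token)) {
                continue;                            // stop word
            }
            features.add(crudeStem(token));
        }
        return features;
    }
}
```

With this sketch, reduce("The clustering of archived messages is working -- thanks!", STOP_WORDS) yields [cluster, archiv, message, work]. The token "thanks!" is dropped entirely because the sketch splits on whitespace only; a real Lucene tokenizer would likely separate the punctuation first, so the actual analyzer's output may differ.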
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setPreviousTokenStream
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
MailArchivesClusteringAnalyzer
public MailArchivesClusteringAnalyzer()
MailArchivesClusteringAnalyzer
public MailArchivesClusteringAnalyzer(org.apache.lucene.analysis.CharArraySet stopSet)
tokenStream
public org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
- Specified by: tokenStream in class org.apache.lucene.analysis.Analyzer
Copyright © 2008-2011 The Apache Software Foundation. All Rights Reserved.