org.apache.mahout.text
Class MailArchivesClusteringAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.mahout.text.MailArchivesClusteringAnalyzer
All Implemented Interfaces:
Closeable

public final class MailArchivesClusteringAnalyzer
extends org.apache.lucene.analysis.Analyzer

Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives. It uses an extended set of stop words, excludes non-alphanumeric tokens, and applies Porter stemming.
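
The three reduction rules can be illustrated with a minimal plain-Java sketch that uses no Lucene or Mahout classes. Everything in it is an assumption for illustration: the class name FeatureReductionSketch, the tiny stop-word list (the analyzer's real list is longer), whitespace splitting in place of a real tokenizer, the ASCII-only alphanumeric pattern, and crudeStem, a toy suffix stripper that is NOT the Porter algorithm and only marks where stemming happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, dependency-free illustration of stop-word removal, alphanumeric filtering and stemming.
final class FeatureReductionSketch {
  // Hypothetical stand-in for the analyzer's extended stop-word list.
  static final Set<String> STOP_WORDS =
      new HashSet<>(Arrays.asList("the", "a", "and", "of", "to", "is", "them"));

  // Assumption: a token is kept only if it consists entirely of ASCII letters and digits.
  static final Pattern ALPHA_NUMERIC = Pattern.compile("[a-z0-9]+");

  // Toy suffix stripping; NOT Porter stemming, only a placeholder for the stemming step.
  static String crudeStem(String token) {
    for (String suffix : new String[] {"ing", "ed", "s"}) {
      if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
        return token.substring(0, token.length() - suffix.length());
      }
    }
    return token;
  }

  static List<String> reduce(String text) {
    List<String> out = new ArrayList<>();
    for (String raw : text.split("\\s+")) {
      String token = raw.toLowerCase(Locale.ROOT);
      if (token.isEmpty() || !ALPHA_NUMERIC.matcher(token).matches()) {
        continue; // drop non-alphanumeric tokens such as the ">" quote marker
      }
      if (STOP_WORDS.contains(token)) {
        continue; // drop stop words
      }
      out.add(crudeStem(token));
    }
    return out;
  }

  public static void main(String[] args) {
    // Prints [committer, attach, commit, test]
    System.out.println(reduce("> The committer attached commits and tested them"));
  }
}
```

In this sketch the `>` quote marker is discarded as non-alphanumeric, "The", "and" and "them" are discarded as stop words, and the remaining words are reduced to rough stems.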


Constructor Summary
MailArchivesClusteringAnalyzer()
           Creates an analyzer that uses the built-in extended stop-word set.
MailArchivesClusteringAnalyzer(org.apache.lucene.analysis.CharArraySet stopSet)
           Creates an analyzer that uses the supplied stop-word set.
 
Method Summary
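 org.apache.lucene.analysis.TokenStream tokenStream(String fieldName, Reader reader)
           Returns a TokenStream that applies stop-word removal, non-alphanumeric filtering, and Porter stemming to the text read from reader.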
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer()

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer(org.apache.lucene.analysis.CharArraySet stopSet)
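
The two constructors differ only in where the stop words come from. The sketch below mirrors that choice in plain Java; StopWordConfig and its default list are hypothetical, java.util.Set stands in for Lucene's CharArraySet, and set entries are assumed to be lowercase.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class mirroring the two constructors: a built-in default stop set or a
// caller-supplied one. java.util.Set stands in for Lucene's CharArraySet.
final class StopWordConfig {
  // Hypothetical default list; the real analyzer ships its own, longer list.
  private static final Set<String> DEFAULT_STOP_WORDS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("the", "and", "of", "wrote")));

  private final Set<String> stopSet;

  // Analogue of the no-argument constructor: use the built-in list.
  StopWordConfig() {
    this(DEFAULT_STOP_WORDS);
  }

  // Analogue of the stopSet constructor: use the caller's set as-is (entries assumed lowercase).
  StopWordConfig(Set<String> stopSet) {
    this.stopSet = stopSet;
  }

  // Tokens are lowercased before lookup, so "The" matches the entry "the".
  boolean isStopWord(String token) {
    return stopSet.contains(token.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    StopWordConfig defaults = new StopWordConfig();
    StopWordConfig custom = new StopWordConfig(new HashSet<>(Arrays.asList("jira", "svn")));
    System.out.println(defaults.isStopWord("The")); // true
    System.out.println(custom.isStopWord("JIRA"));  // true
    System.out.println(custom.isStopWord("the"));   // false: the custom set replaces the defaults
  }
}
```

In this sketch a supplied set replaces the defaults rather than extending them; a caller who wants both would merge the built-in words into the set before passing it in.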

Method Detail

tokenStream

public org.apache.lucene.analysis.TokenStream tokenStream(String fieldName,
                                                          Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
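
In Lucene, an analyzer's tokenStream typically wraps a tokenizer in a chain of filters, each consuming the stream produced by the stage before it. The sketch below imitates that decorator-style chaining lazily with java.util.Iterator instead of Lucene's TokenStream. TokenChainSketch, the whitespace "tokenizer", and the stage order are assumptions, not this class's actual implementation; a stemming stage is omitted and would be one more map stage at the end.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, Lucene-free imitation of a tokenizer wrapped in a chain of filters.
final class TokenChainSketch {

  // Lazily skips upstream tokens that fail the predicate, similar to a filtering TokenFilter.
  static final class FilterIterator implements Iterator<String> {
    private final Iterator<String> in;
    private final Predicate<String> keep;
    private String pending; // next accepted token, or null once the upstream is exhausted

    FilterIterator(Iterator<String> in, Predicate<String> keep) {
      this.in = in;
      this.keep = keep;
      this.pending = advance();
    }

    private String advance() {
      while (in.hasNext()) {
        String token = in.next();
        if (keep.test(token)) {
          return token;
        }
      }
      return null;
    }

    @Override
    public boolean hasNext() {
      return pending != null;
    }

    @Override
    public String next() {
      if (pending == null) {
        throw new NoSuchElementException();
      }
      String token = pending;
      pending = advance();
      return token;
    }
  }

  // Rewrites every upstream token, similar to a lowercasing or stemming filter.
  static Iterator<String> map(Iterator<String> in, Function<String, String> f) {
    return new Iterator<String>() {
      @Override
      public boolean hasNext() {
        return in.hasNext();
      }

      @Override
      public String next() {
        return f.apply(in.next());
      }
    };
  }

  // Chain: whitespace "tokenizer" -> lowercase -> alphanumeric check -> stop-word removal.
  static Iterator<String> tokenStream(String text, Set<String> stopSet) {
    Iterator<String> result = Arrays.asList(text.trim().split("\\s+")).iterator();
    result = map(result, t -> t.toLowerCase(Locale.ROOT));
    result = new FilterIterator(result, t -> t.matches("[a-z0-9]+"));
    result = new FilterIterator(result, t -> !stopSet.contains(t));
    return result;
  }

  public static void main(String[] args) {
    Set<String> stopSet = new HashSet<>(Arrays.asList("the", "and", "re"));
    Iterator<String> tokens = tokenStream("> Re: The build and tests PASS", stopSet);
    while (tokens.hasNext()) {
      System.out.println(tokens.next()); // build, tests, pass (one per line)
    }
  }
}
```

Because each stage only pulls from the one it wraps, tokens flow through the whole chain one at a time, which is the same pull-based design Lucene's filter chains follow.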


Copyright © 2008-2011 The Apache Software Foundation. All Rights Reserved.