org.apache.mahout.text
Class MailArchivesClusteringAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.lucene.analysis.ReusableAnalyzerBase
          extended by org.apache.lucene.analysis.StopwordAnalyzerBase
              extended by org.apache.mahout.text.MailArchivesClusteringAnalyzer
All Implemented Interfaces:
Closeable

public final class MailArchivesClusteringAnalyzer
extends org.apache.lucene.analysis.StopwordAnalyzerBase

Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives. It uses an extended set of stop words, excludes non-alphanumeric tokens, and applies Porter stemming.
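The three reductions named above can be illustrated without Lucene. The following is a minimal, self-contained sketch in plain Java, not this class's implementation. It works on tokens that have already been split, and the stop set, the alphanumeric rule (a token must start with a letter and contain only letters and digits), and the `stripSuffix` helper are all hypothetical. In particular, `stripSuffix` is a crude suffix stripper standing in for the Porter stemmer, which applies a much more careful multi-step rule set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class FeatureReductionSketch {
    // Hypothetical keep rule: starts with a letter, then letters/digits only.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[a-z][a-z0-9]*");

    // Hypothetical, tiny stop set; the analyzer's extended list is far larger.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "and", "hi", "re", "thanks", "the");

    // Crude suffix stripper standing in for Porter stemming (NOT the Porter algorithm).
    static String stripSuffix(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters of stem so short words survive intact.
            if (token.length() >= suffix.length() + 3 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    static List<String> reduce(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String raw : tokens) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!ALPHANUMERIC.matcher(token).matches()) {
                continue; // drops "--", "2.0", "foo@bar.org"
            }
            if (STOP_WORDS.contains(token)) {
                continue; // drops greetings and function words
            }
            kept.add(stripSuffix(token)); // collapses simple inflections
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Hi", "the", "build", "FAILED", "--", "2.0", "tests");
        System.out.println(reduce(tokens)); // prints [build, fail, test]
    }
}
```

Each step shrinks the vocabulary the clustering stage sees: punctuation and version numbers disappear, greetings and other stop words are dropped, and inflected forms such as `failed` collapse toward a shared stem.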


Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents

Field Summary
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
matchVersion, stopwords

Constructor Summary
MailArchivesClusteringAnalyzer()
MailArchivesClusteringAnalyzer(Set<?> stopSet)

Method Summary
protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader)

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.ReusableAnalyzerBase
initReader, reusableTokenStream, tokenStream

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getPreviousTokenStream, setPreviousTokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer()

MailArchivesClusteringAnalyzer

public MailArchivesClusteringAnalyzer(Set<?> stopSet)

Method Detail

createComponents

protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                 Reader reader)
Specified by:
createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
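`createComponents` builds the analysis chain and returns it as a `TokenStreamComponents` that pairs the source `Tokenizer` with the outermost filter. In Lucene 3.x, the inherited `reusableTokenStream` can then reuse that chain for new input by resetting only the source with the new `Reader`, while `tokenStream` builds a fresh chain each time. The sketch below models that source/sink pattern in plain Java so it runs without Lucene. `Stage`, `Source`, `Components`, and the two filter stages are hypothetical stand-ins, not Lucene types, and they do not reproduce this analyzer's actual filter chain.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComponentsSketch {
    /** Hypothetical stand-in for a token stream: next() returns null when exhausted. */
    interface Stage {
        String next();
    }

    /** Hypothetical stand-in for a Tokenizer: the source stage, re-pointable at new input. */
    static final class Source implements Stage {
        private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");
        private final List<String> tokens = new ArrayList<>();
        private int pos;

        void reset(Reader reader) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[256];
            try {
                int n;
                while ((n = reader.read(buf)) != -1) text.append(buf, 0, n);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            tokens.clear();
            Matcher m = WORD.matcher(text);
            while (m.find()) tokens.add(m.group());
            pos = 0;
        }

        @Override
        public String next() {
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }
    }

    /** Illustrative filter: lower-cases each token pulled from the wrapped stage. */
    static Stage lowerCase(Stage in) {
        return () -> {
            String t = in.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        };
    }

    /** Illustrative filter: skips tokens found in the stop set. */
    static Stage dropStopWords(Stage in, Set<String> stop) {
        return () -> {
            String t = in.next();
            while (t != null && stop.contains(t)) t = in.next();
            return t;
        };
    }

    /** Keeps both ends of the chain: the source (to reset) and the sink (to read). */
    static final class Components {
        final Source source;
        final Stage sink;

        Components(Source source, Stage sink) {
            this.source = source;
            this.sink = sink;
        }

        List<String> analyze(Reader reader) {
            source.reset(reader); // only the source needs the new input
            List<String> out = new ArrayList<>();
            String t;
            while ((t = sink.next()) != null) out.add(t);
            return out;
        }
    }

    /** Builds the chain once, analogous in spirit to createComponents. */
    static Components createComponents(Set<String> stop) {
        Source source = new Source();
        Stage sink = dropStopWords(lowerCase(source), stop);
        return new Components(source, sink);
    }

    public static void main(String[] args) {
        Components c = createComponents(Set.of("the", "a"));
        System.out.println(c.analyze(new StringReader("The Build FAILED on a node"))); // prints [build, failed, on, node]
        System.out.println(c.analyze(new StringReader("Re: the patch")));              // prints [re, patch]
    }
}
```

Because every filter only pulls from the stage it wraps, swapping in new input at the source is enough to reuse the whole chain. That is why the components object keeps a handle on both ends.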


Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.