Package org.apache.mahout.text

Class Summary
MailArchivesClusteringAnalyzer Custom Lucene Analyzer designed for aggressive feature reduction when clustering the ASF Mail Archives; it uses an extended set of stop words, excludes non-alphanumeric tokens, and applies Porter stemming.
PrefixAdditionFilter Default parser for parsing text into sequence files.
SequenceFilesFromDirectory Converts a directory of text documents into SequenceFiles of a specified chunkSize.
SequenceFilesFromDirectoryFilter Implement this interface if you wish to extend SequenceFilesFromDirectory with your own parsing logic.
SequenceFilesFromMailArchives Converts a directory of gzipped mail archives into SequenceFiles of a specified chunkSize.
TextParagraphSplittingJob  
TextParagraphSplittingJob.SplitMap  
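The token-level feature reduction described for MailArchivesClusteringAnalyzer can be illustrated without Lucene. The following is a minimal plain-Java sketch, not the analyzer's actual implementation: the class name TokenFilterSketch, the whitespace tokenization, and the tiny stop-word list are all hypothetical stand-ins (the real analyzer uses a much larger, extended stop-word set and Lucene's token stream machinery). Porter stemming is deliberately omitted and only marked where it would apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TokenFilterSketch {
    // Hypothetical, tiny stop-word list; NOT the extended list used by the real analyzer.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "to", "is", "in"));

    /** Lowercases, splits on whitespace, and drops stop words and non-alphanumeric tokens. */
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) continue;                // leading whitespace yields an empty token
            if (!raw.matches("[a-z0-9]+")) continue;    // exclude tokens that are not purely alphanumeric
            if (STOP_WORDS.contains(raw)) continue;     // exclude stop words
            kept.add(raw);                              // a Porter stemmer would be applied at this point
        }
        return kept;
    }

    public static void main(String[] args) {
        // prints [patch, mahout, r1234]
        System.out.println(filter("The patch to Mahout is in JIRA-123 and r1234"));
    }
}
```

Dropping a token like "jira-123" outright, rather than splitting it into pieces, is one way to read "excluding non-alphanumeric tokens"; in a Lucene-based analyzer the stemming step would typically be handled by Lucene's PorterStemFilter.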
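Both SequenceFilesFrom* converters split their output into chunks bounded by chunkSize. The sketch below shows one plausible roll-over policy in plain Java, without Hadoop: before appending a document, a new chunk is started if the current one has already grown past the limit. The policy, the class ChunkAssignmentSketch, and the method assignChunks are assumptions made for illustration; sizes are in bytes here purely for simplicity, and the real classes write Hadoop SequenceFiles rather than returning indices.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkAssignmentSketch {
    /**
     * Assigns each document (given by its serialized size in bytes) to a chunk index.
     * Assumed policy: before appending a document, roll over to a new chunk if the
     * current chunk already exceeds maxChunkBytes.
     */
    static List<Integer> assignChunks(long[] docSizes, long maxChunkBytes) {
        List<Integer> chunkOfDoc = new ArrayList<>();
        int chunk = 0;
        long currentBytes = 0;
        for (long size : docSizes) {
            if (currentBytes > maxChunkBytes) { // current chunk is full: start a new one
                chunk++;
                currentBytes = 0;
            }
            chunkOfDoc.add(chunk);
            currentBytes += size;
        }
        return chunkOfDoc;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 1, 1]
        System.out.println(assignChunks(new long[] {40, 30, 50, 10, 70}, 64));
    }
}
```

Under this policy a chunk can overshoot the limit by up to one document, which avoids having to split a single document across two output files.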

Copyright © 2008-2012 The Apache Software Foundation. All Rights Reserved.