Class Summary

Class | Description |
---|---|
MailArchivesClusteringAnalyzer | Custom Lucene Analyzer that aggressively reduces features for clustering the ASF mail archives: it uses an extended set of stop words, discards non-alphanumeric tokens, and applies Porter stemming. |
PrefixAdditionFilter | Default parser for converting text into SequenceFiles. |
SequenceFilesFromDirectory | Converts a directory of text documents into SequenceFiles of a specified chunkSize. |
SequenceFilesFromDirectoryFilter | Implement this interface if you wish to extend SequenceFilesFromDirectory with your own parsing logic. |
SequenceFilesFromMailArchives | Converts a directory of gzipped mail archives into SequenceFiles of a specified chunkSize. |
TextParagraphSplittingJob | |
TextParagraphSplittingJob.SplitMap | |
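The feature-reduction steps listed for MailArchivesClusteringAnalyzer can be illustrated with a small, dependency-free sketch. This is not the Lucene Analyzer API and not the class's actual code: the class name `FeatureReductionSketch`, the tiny stop-word list, and whitespace tokenization are all hypothetical simplifications, and Porter stemming is deliberately omitted. It only shows the filtering order: lowercase, drop non-alphanumeric tokens, drop stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of an aggressive token filter pipeline.
 * Not the Lucene Analyzer API; no stemming is performed.
 */
class FeatureReductionSketch {

    // Small illustrative stop-word list (assumption); the real analyzer
    // uses its own extended set.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "re", "wrote"));

    // Accept only tokens made entirely of ASCII letters and digits.
    private static final Pattern ALPHANUMERIC = Pattern.compile("[a-z0-9]+");

    static List<String> reduce(String text) {
        List<String> kept = new ArrayList<>();
        // Whitespace tokenization (a simplification; Lucene tokenizers
        // also split at punctuation).
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            // Discard tokens containing any non-alphanumeric character.
            if (!ALPHANUMERIC.matcher(token).matches()) {
                continue;
            }
            // Discard stop words.
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            kept.add(token);
        }
        return kept;
    }
}
```

For example, `reduce("Re: The build is broken in trunk <dev@example.org>")` returns `[build, broken, trunk]`: `re:` and the address are dropped as non-alphanumeric, and `the`, `is`, `in` are dropped as stop words.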
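SequenceFilesFromDirectory and SequenceFilesFromMailArchives both split their output by chunkSize. The sketch below shows one plausible chunking policy under stated assumptions: documents are (key, value) string pairs, a chunk is a plain `Map` standing in for one SequenceFile, and the limit is measured in UTF-8 bytes of key plus value. The class `ChunkingSketch` and its rollover rule are hypothetical, not Mahout's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: group documents into chunks of at most
 * chunkSizeBytes (key + value, UTF-8). Each Map stands in for one SequenceFile.
 */
class ChunkingSketch {

    static List<Map<String, String>> chunk(Map<String, String> docs, long chunkSizeBytes) {
        List<Map<String, String>> chunks = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        long currentBytes = 0;
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            long size = doc.getKey().getBytes(StandardCharsets.UTF_8).length
                    + doc.getValue().getBytes(StandardCharsets.UTF_8).length;
            // Roll over to a new chunk if this document would exceed the limit;
            // an oversized document still goes into an empty chunk on its own.
            if (!current.isEmpty() && currentBytes + size > chunkSizeBytes) {
                chunks.add(current);
                current = new LinkedHashMap<>();
                currentBytes = 0;
            }
            current.put(doc.getKey(), doc.getValue());
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Checking the limit before writing keeps every chunk at or under chunkSize except when a single document is larger than the limit; a writer that checks after each write would instead let a chunk overshoot slightly.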
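Before SequenceFilesFromMailArchives can write anything, each gzipped archive has to be decompressed and split into individual messages. A minimal sketch, assuming the archives are mbox files (each message starts with a line beginning `From `), uses the JDK's `java.util.zip.GZIPInputStream`. The class `MboxGzipSketch` and its helper are hypothetical; Mahout's own parsing may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: split a gzipped mbox stream into messages. */
class MboxGzipSketch {

    static List<String> splitMessages(InputStream gzipped) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // null until the first "From " line
        // Closing the reader also closes the GZIPInputStream and the source stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // In mbox, a line starting with "From " begins a new message;
                // body lines of that form are conventionally escaped as ">From ".
                if (line.startsWith("From ")) {
                    if (current != null) {
                        messages.add(current.toString());
                    }
                    current = new StringBuilder();
                }
                if (current != null) {
                    current.append(line).append('\n');
                }
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    /** Helper for trying the sketch: gzip a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

Each returned string keeps its `From ` separator line, so the sender and date remain available if the caller wants to build a SequenceFile key from them.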