Package org.apache.lucene.analysis.miscellaneous

Miscellaneous TokenStreams


Class Summary
ASCIIFoldingFilter Converts alphabetic, numeric, and symbolic Unicode characters that are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, where such equivalents exist.
CapitalizationFilter A filter to apply normal capitalization rules to Tokens.
EmptyTokenStream An always exhausted token stream.
HyphenatedWordsFilter Rejoins words that were hyphenated and broken across two lines, which often happens when plain text is extracted from documents.
KeepWordFilter A TokenFilter that only keeps tokens with text contained in the required words.
KeywordMarkerFilter Marks terms as keywords via the KeywordAttribute.
LengthFilter Removes words that are too long or too short from the stream.
LimitTokenCountAnalyzer This Analyzer limits the number of tokens while indexing.
LimitTokenCountFilter This TokenFilter limits the number of tokens while indexing.
PatternAnalyzer Deprecated. (4.0) Use the pattern-based analysis in the analysis/pattern package instead.
PerFieldAnalyzerWrapper This analyzer is used to facilitate scenarios where different fields require different analysis techniques.
PrefixAndSuffixAwareTokenFilter Links two PrefixAwareTokenFilter instances.
PrefixAwareTokenFilter Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token.
RemoveDuplicatesTokenFilter A TokenFilter that filters out Tokens that have the same position and term text as the previous token in the stream.
SingleTokenTokenStream A TokenStream containing a single token.
StemmerOverrideFilter Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming.
TrimFilter Trims leading and trailing whitespace from Tokens in the stream.
WordDelimiterFilter Splits words into subwords and performs optional transformations on subword groups.
WordDelimiterIterator A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules.
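
The idea behind ASCII folding, as summarized for ASCIIFoldingFilter above, can be illustrated without any Lucene dependency. The sketch below is not the filter's implementation and does not use its API: it swaps in a different technique, Unicode NFD decomposition followed by removal of combining marks, using only java.text.Normalizer from the JDK. The class name AsciiFoldingSketch and the method fold are hypothetical. Characters that have no canonical decomposition (for example U+00DF, the German sharp s) pass through this sketch unchanged, so it likely covers fewer characters than the real filter.

```java
import java.text.Normalizer;

// Hypothetical illustration only; not part of Lucene.
final class AsciiFoldingSketch {

    // Decomposes accented characters into a base character plus combining
    // marks (NFD), then deletes the combining marks (Unicode category M).
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        System.out.println(fold("caf\u00e9"));
    }
}
```

Under these assumptions, fold("caf\u00e9") returns "cafe", while text that is already plain ASCII is returned unchanged.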
 

Package org.apache.lucene.analysis.miscellaneous Description

Miscellaneous TokenStreams
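
Several of the filters summarized above (TrimFilter, LengthFilter, RemoveDuplicatesTokenFilter, KeepWordFilter) describe simple per-token rules. The following is a minimal sketch of those rules applied to an in-memory list, assuming a hypothetical Tok class that carries only term text and a position. It uses none of the Lucene classes, and every name in it (TokenFilterSketch, Tok, trim, lengthBetween, removeDuplicates, keepWords, demo) is an assumption made for illustration. The duplicate rule also assumes that tokens arrive ordered by position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these are not Lucene classes.
final class TokenFilterSketch {

    // Stand-in for a token: term text plus its position in the stream.
    static final class Tok {
        final String text;
        final int position;
        Tok(String text, int position) { this.text = text; this.position = position; }
        @Override public String toString() { return text + "@" + position; }
    }

    // Trims leading and trailing whitespace from each token's text.
    static List<Tok> trim(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) out.add(new Tok(t.text.trim(), t.position));
        return out;
    }

    // Keeps tokens whose length lies in [min, max]; drops the rest.
    static List<Tok> lengthBetween(List<Tok> in, int min, int max) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (t.text.length() >= min && t.text.length() <= max) out.add(t);
        }
        return out;
    }

    // Drops a token if the same text was already seen at the same position.
    static List<Tok> removeDuplicates(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        Set<String> seenAtPosition = new HashSet<>();
        int currentPosition = -1;
        for (Tok t : in) {
            if (t.position != currentPosition) {
                seenAtPosition.clear();
                currentPosition = t.position;
            }
            if (seenAtPosition.add(t.text)) out.add(t);
        }
        return out;
    }

    // Keeps only tokens whose text is contained in the given word set.
    static List<Tok> keepWords(List<Tok> in, Set<String> words) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            if (words.contains(t.text)) out.add(t);
        }
        return out;
    }

    // Chains the four rules over a small sample stream.
    static List<Tok> demo() {
        List<Tok> tokens = Arrays.asList(
            new Tok(" fox ", 0), new Tok("fox", 0), new Tok("a", 1),
            new Tok("fox", 2), new Tok("jumps", 3));
        List<Tok> step = trim(tokens);
        step = lengthBetween(step, 2, 10);
        step = removeDuplicates(step);
        return keepWords(step, new HashSet<>(Arrays.asList("fox")));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [fox@0, fox@2]
    }
}
```

In the sample stream, trimming makes the two tokens at position 0 identical, the length rule drops "a", the duplicate rule removes the second "fox" at position 0, and the keep-word rule drops "jumps". The "fox" at position 2 survives because duplicates are only compared within one position.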



Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.