public class LuceneTokenizer
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
static class | LuceneTokenizer.TokenizerType |
Constructor and Description |
---|
LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType): Creates a tokenizer based on param values |
LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, java.util.List<java.lang.String> stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType): Creates a tokenizer based on param values |
LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram): Creates a tokenizer for the ngram model based on param values |
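
The constructor that takes stopWords and addToDefault either extends the Lucene default stop set with the user's words or uses the user's words alone. The following is a minimal sketch of that documented behaviour, not the class's actual implementation: StopSetSketch, effectiveStopSet and the abbreviated DEFAULT_STOP_WORDS list are hypothetical names introduced for illustration, and no Lucene code is called.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: illustrates the documented addToDefault semantics only.
// It is not part of LuceneTokenizer and does not use Lucene.
class StopSetSketch {

    // Assumption: a short stand-in for the Lucene default stop set.
    static final List<String> DEFAULT_STOP_WORDS =
            Arrays.asList("a", "an", "and", "the", "of");

    // Returns the stop set that would apply for the given flag value.
    static Set<String> effectiveStopSet(List<String> userStopWords, boolean addToDefault) {
        Set<String> stopSet = new LinkedHashSet<>();
        if (addToDefault) {
            // true: start from the default stop words, then add the user's words
            stopSet.addAll(DEFAULT_STOP_WORDS);
        }
        // false: only the user's words end up in the set
        stopSet.addAll(userStopWords);
        return stopSet;
    }

    public static void main(String[] args) {
        List<String> user = Arrays.asList("lucene", "the");
        System.out.println(effectiveStopSet(user, true));  // [a, an, and, the, of, lucene]
        System.out.println(effectiveStopSet(user, false)); // [lucene, the]
    }
}
```

With addToDefault set to false, common words such as "of" are no longer filtered unless the caller lists them explicitly.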
Modifier and Type | Method and Description |
---|---|
TokenStream | getTokenStream(): Returns the TokenStream created by the tokenizer |
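
The n-gram constructor takes a mingram and a maxgram bound. This page does not say whether the grams are built from characters or from whole tokens, so the sketch below assumes word-level n-grams purely for illustration; the class may behave differently. NGramRangeSketch and wordNGrams are hypothetical names, and no Lucene code is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shows what a min/max n-gram range means.
// Assumption: word-level n-grams; LuceneTokenizer may produce character n-grams instead.
class NGramRangeSketch {

    // Emits every run of minGram..maxGram consecutive tokens, grouped by start position.
    static List<String> wordNGrams(List<String> tokens, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int n = minGram; n <= maxGram && start + n <= tokens.size(); n++) {
                grams.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("quick", "brown", "fox");
        // prints [quick, quick brown, brown, brown fox, fox]
        System.out.println(wordNGrams(tokens, 1, 2));
    }
}
```

Setting both bounds to the same value yields grams of a single size, for example only bigrams when both are 2.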
public LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, boolean useStopFilter, LuceneAnalyzerUtil.StemFilterType stemFilterType)
content - The text to tokenize
tokenizer - The type of tokenizer to use: CLASSIC or DEFAULT
useStopFilter - If set to true, the token stream will be filtered using the default Lucene stop set
stemFilterType - Type of stemming to perform

public LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, java.util.List<java.lang.String> stopWords, boolean addToDefault, LuceneAnalyzerUtil.StemFilterType stemFilterType)
content - The text to tokenize
tokenizer - The type of tokenizer to use: CLASSIC or DEFAULT
stopWords - A list of user-defined stop words
addToDefault - If set to true, the words in stopWords will be added to the Lucene default stop set. If false, only the user-provided words will be used as the stop set
stemFilterType - Type of stemming to perform

public LuceneTokenizer(java.lang.String content, LuceneTokenizer.TokenizerType tokenizer, LuceneAnalyzerUtil.StemFilterType stemFilterType, int mingram, int maxgram)
content - The text to tokenize
tokenizer - The type of tokenizer to use: CLASSIC or DEFAULT
stemFilterType - Type of stemming to perform
mingram - Value of mingram for tokenizing
maxgram - Value of maxgram for tokenizing

Copyright © 2019 The Apache Software Foundation