org.apache.lucene.search.spell
Class WordBreakSpellChecker

java.lang.Object
  extended by org.apache.lucene.search.spell.WordBreakSpellChecker

public class WordBreakSpellChecker
extends Object

A spell checker whose sole function is to offer suggestions by combining multiple terms into one word and/or breaking terms into multiple words.


Nested Class Summary
static class WordBreakSpellChecker.BreakSuggestionSortMethod
           Determines the order to list word break suggestions
 
Field Summary
static Term SEPARATOR_TERM
           
 
Constructor Summary
WordBreakSpellChecker()
           
 
Method Summary
 int getMaxChanges()
           
 int getMaxCombineWordLength()
           
 int getMaxEvaluations()
           
 int getMinBreakWordLength()
           
 int getMinSuggestionFrequency()
           
 void setMaxChanges(int maxChanges)
           The maximum number of changes (word breaks or combinations) to make on the original term(s).
 void setMaxCombineWordLength(int maxCombineWordLength)
           The maximum length of a suggestion made by combining 1 or more original terms.
 void setMaxEvaluations(int maxEvaluations)
           The maximum number of word combinations to evaluate.
 void setMinBreakWordLength(int minBreakWordLength)
           The minimum length to break words down to.
 void setMinSuggestionFrequency(int minSuggestionFrequency)
           The minimum frequency a term must have to be included as part of a suggestion.
 SuggestWord[][] suggestWordBreaks(Term term, int maxSuggestions, IndexReader ir, SuggestMode suggestMode, WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod)
           Generate suggestions by breaking the passed-in term into multiple words.
 CombineSuggestion[] suggestWordCombinations(Term[] terms, int maxSuggestions, IndexReader ir, SuggestMode suggestMode)
           Generate suggestions by combining one or more of the passed-in terms into single words.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SEPARATOR_TERM

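public static final Term SEPARATOR_TERM

A term that can be placed between two adjacent terms passed to suggestWordCombinations to prevent them from being combined.
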
Constructor Detail

WordBreakSpellChecker

public WordBreakSpellChecker()
Method Detail

suggestWordBreaks

public SuggestWord[][] suggestWordBreaks(Term term,
                                         int maxSuggestions,
                                         IndexReader ir,
                                         SuggestMode suggestMode,
                                         WordBreakSpellChecker.BreakSuggestionSortMethod sortMethod)
                                  throws IOException

Generate suggestions by breaking the passed-in term into multiple words. Each score returned equals the number of word breaks needed, so a lower score is generally preferred over a higher score.

Parameters:
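term - the term to break into multiple words
maxSuggestions - the maximum number of suggestions to return
ir - the IndexReader against which candidate words are checked
suggestMode - default = SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX
sortMethod - default = WordBreakSpellChecker.BreakSuggestionSortMethod.NUM_CHANGES_THEN_MAX_FREQUENCY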
Returns:
one or more arrays of words formed by breaking up the original term
Throws:
IOException
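
The following is a minimal sketch of the word-break idea described above, not Lucene's implementation. It assumes a plain Map of word to frequency in place of an IndexReader. The class WordBreakSketch, its Candidate type, and the method suggestBreaks are hypothetical names introduced only for illustration. Candidates are ordered by number of breaks and then by highest word frequency, which is one plausible reading of NUM_CHANGES_THEN_MAX_FREQUENCY.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a Map<String, Integer> stands in for the index.
final class WordBreakSketch {

    /** One suggestion: the words produced and a score equal to the number of breaks. */
    static final class Candidate {
        final List<String> words;
        final int score;    // number of breaks, i.e. words.size() - 1
        final int maxFreq;  // highest frequency among the words

        Candidate(List<String> words, int maxFreq) {
            this.words = words;
            this.score = words.size() - 1;
            this.maxFreq = maxFreq;
        }
    }

    static List<Candidate> suggestBreaks(String term, Map<String, Integer> freqs,
            int minBreakWordLength, int minSuggestionFrequency,
            int maxChanges, int maxSuggestions) {
        List<Candidate> found = new ArrayList<>();
        collect(term, new ArrayList<>(), 0, freqs, minBreakWordLength,
                minSuggestionFrequency, maxChanges, found);
        // Fewest breaks first; ties go to the candidate with the more frequent word.
        found.sort((a, b) -> a.score != b.score
                ? Integer.compare(a.score, b.score)
                : Integer.compare(b.maxFreq, a.maxFreq));
        return new ArrayList<>(found.subList(0, Math.min(maxSuggestions, found.size())));
    }

    // Tries every split point of 'rest'; each level of recursion uses up one break.
    private static void collect(String rest, List<String> prefix, int prefixMaxFreq,
            Map<String, Integer> freqs, int minLen, int minFreq,
            int changesLeft, List<Candidate> out) {
        if (changesLeft == 0) {
            return;
        }
        for (int i = minLen; i <= rest.length() - minLen; i++) {
            String left = rest.substring(0, i);
            String right = rest.substring(i);
            int leftFreq = freqs.getOrDefault(left, 0);
            if (leftFreq < minFreq) {
                continue; // the left piece is not a known word
            }
            List<String> words = new ArrayList<>(prefix);
            words.add(left);
            int maxFreq = Math.max(prefixMaxFreq, leftFreq);
            int rightFreq = freqs.getOrDefault(right, 0);
            if (rightFreq >= minFreq) {
                List<String> complete = new ArrayList<>(words);
                complete.add(right);
                out.add(new Candidate(complete, Math.max(maxFreq, rightFreq)));
            }
            // Spend another break on the remainder if any are left.
            collect(right, words, maxFreq, freqs, minLen, minFreq, changesLeft - 1, out);
        }
    }
}
```

In this sketch, raising minBreakWordLength removes short fragments from consideration, and raising maxChanges allows a term to be broken into more than two words at the cost of a higher score.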

suggestWordCombinations

public CombineSuggestion[] suggestWordCombinations(Term[] terms,
                                                   int maxSuggestions,
                                                   IndexReader ir,
                                                   SuggestMode suggestMode)
                                            throws IOException

Generate suggestions by combining one or more of the passed-in terms into single words. Each returned CombineSuggestion contains both a SuggestWord and an array detailing which passed-in terms were involved in creating the combination. Each score equals the number of word combinations needed, which is one less than the length of the array CombineSuggestion.originalTermIndexes. Generally, a suggestion with a lower score is preferred over one with a higher score.

To prevent two adjacent terms from being combined (for instance, if one is mandatory and the other is prohibited), separate the two terms with SEPARATOR_TERM.

When suggestMode equals SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, each suggestion will include at least one term not in the index.

When suggestMode equals SuggestMode.SUGGEST_MORE_POPULAR, each suggestion will have the same or better frequency than the most popular included term.

Parameters:
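terms - the terms to consider combining, in their original order; use SEPARATOR_TERM between terms that must not be combined
maxSuggestions - the maximum number of suggestions to return
ir - the IndexReader against which combined words are checked
suggestMode - the SuggestMode that determines which combinations qualify (see above)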
Returns:
an array of words generated by combining original terms
Throws:
IOException
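
The combination behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not Lucene's implementation: a Map of word to frequency stands in for the index, a string sentinel stands in for SEPARATOR_TERM, and the names WordCombineSketch, Combination, and suggestCombinations are hypothetical. SuggestMode handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a Map<String, Integer> stands in for the index.
final class WordCombineSketch {

    // Hypothetical sentinel playing the role that SEPARATOR_TERM plays in the real API.
    static final String SEPARATOR = "\0";

    /** A combined word plus the indexes of the original terms that formed it. */
    static final class Combination {
        final String word;
        final int[] originalTermIndexes;
        final int score; // number of joins, i.e. originalTermIndexes.length - 1

        Combination(String word, int[] originalTermIndexes) {
            this.word = word;
            this.originalTermIndexes = originalTermIndexes;
            this.score = originalTermIndexes.length - 1;
        }
    }

    static List<Combination> suggestCombinations(String[] terms, Map<String, Integer> freqs,
            int maxCombineWordLength, int minSuggestionFrequency, int maxChanges) {
        List<Combination> out = new ArrayList<>();
        for (int start = 0; start < terms.length; start++) {
            if (SEPARATOR.equals(terms[start])) {
                continue; // a separator never starts a combination
            }
            StringBuilder joined = new StringBuilder(terms[start]);
            // Extend the run one adjacent term at a time, up to maxChanges joins.
            for (int end = start + 1; end < terms.length && end - start <= maxChanges; end++) {
                if (SEPARATOR.equals(terms[end])) {
                    break; // do not combine across a separator
                }
                joined.append(terms[end]);
                if (joined.length() > maxCombineWordLength) {
                    break; // longer runs would only be longer still
                }
                String word = joined.toString();
                if (freqs.getOrDefault(word, 0) >= minSuggestionFrequency) {
                    int[] indexes = new int[end - start + 1];
                    for (int k = 0; k < indexes.length; k++) {
                        indexes[k] = start + k;
                    }
                    out.add(new Combination(word, indexes));
                }
            }
        }
        // Lower score (fewer joins) first; the sort is stable, so ties keep input order.
        out.sort((a, b) -> Integer.compare(a.score, b.score));
        return out;
    }
}
```

With the terms "fire", "fly", SEPARATOR, "light", "house", this sketch can propose "firefly" and "lighthouse" but never "flylight", because the separator sits between "fly" and "light".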

getMinSuggestionFrequency

public int getMinSuggestionFrequency()

getMaxCombineWordLength

public int getMaxCombineWordLength()

getMinBreakWordLength

public int getMinBreakWordLength()

getMaxChanges

public int getMaxChanges()

getMaxEvaluations

public int getMaxEvaluations()

setMinSuggestionFrequency

public void setMinSuggestionFrequency(int minSuggestionFrequency)

The minimum frequency a term must have to be included as part of a suggestion. Default=1. Not applicable when used with SuggestMode.SUGGEST_MORE_POPULAR.

Parameters:
minSuggestionFrequency -

setMaxCombineWordLength

public void setMaxCombineWordLength(int maxCombineWordLength)

The maximum length of a suggestion made by combining 1 or more original terms. Default=20

Parameters:
maxCombineWordLength -

setMinBreakWordLength

public void setMinBreakWordLength(int minBreakWordLength)

The minimum length to break words down to. Default=1

Parameters:
minBreakWordLength -

setMaxChanges

public void setMaxChanges(int maxChanges)

The maximum number of changes (word breaks or combinations) to make on the original term(s). Default=1

Parameters:
maxChanges -

setMaxEvaluations

public void setMaxEvaluations(int maxEvaluations)

The maximum number of word combinations to evaluate. Default=1000. A higher value might improve result quality. A lower value might improve performance.

Parameters:
maxEvaluations -
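
To illustrate the quality and performance trade-off, here is a minimal sketch of an evaluation budget. It is not Lucene's implementation; the class EvaluationBudgetSketch and the method evaluateSplits are hypothetical, and the sketch only enumerates two-word splits of a single term, stopping once the budget is used up.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows how a cap on evaluations cuts the search short.
final class EvaluationBudgetSketch {

    /**
     * Returns the two-word splits of 'term' that were examined before the budget ran out.
     * Assumes minBreakWordLength >= 1.
     */
    static List<String> evaluateSplits(String term, int minBreakWordLength, int maxEvaluations) {
        List<String> evaluated = new ArrayList<>();
        for (int i = minBreakWordLength; i <= term.length() - minBreakWordLength; i++) {
            if (evaluated.size() >= maxEvaluations) {
                break; // budget spent: later split points are never examined
            }
            evaluated.add(term.substring(0, i) + " " + term.substring(i));
        }
        return evaluated;
    }
}
```

A small budget returns sooner but may stop before reaching a good split; a large budget examines every split point that minBreakWordLength allows.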


Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.