org.apache.lucene.analysis.cn.smart
Class SentenceTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.cn.smart.SentenceTokenizer

public final class SentenceTokenizer
extends Tokenizer

Tokenizes input text into sentences.

The output tokens can then be broken into words with WordTokenFilter

WARNING: The status of the analyzers/smartcn analysis.cn.smart package is experimental. The APIs and file formats introduced here might change in the future and will not be supported anymore in such a case.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
SentenceTokenizer(AttributeSource.AttributeFactory factory, Reader reader)
           
SentenceTokenizer(AttributeSource source, Reader reader)
           
SentenceTokenizer(Reader reader)
           
 
Method Summary
 boolean incrementToken()
          Consumers (ie IndexWriter) use this method to advance the stream to the next token.
 void reset()
          Resets this stream to the beginning.
 void reset(Reader input)
          Expert: Reset the tokenizer to a new reader.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, getOnlyUseNewAPI, next, next, setOnlyUseNewAPI
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SentenceTokenizer

public SentenceTokenizer(Reader reader)

SentenceTokenizer

public SentenceTokenizer(AttributeSource source,
                         Reader reader)

SentenceTokenizer

public SentenceTokenizer(AttributeSource.AttributeFactory factory,
                         Reader reader)
Method Detail

incrementToken

public boolean incrementToken()
                       throws IOException
Description copied from class: TokenStream
Consumers (ie IndexWriter) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate AttributeImpls with the attributes of the next token.

This method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to AttributeSource.addAttribute(Class) and AttributeSource.getAttribute(Class) or downcasts, references to all AttributeImpls that this stream uses should be retrieved during instantiation.

To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in TokenStream.incrementToken().

Overrides:
incrementToken in class TokenStream
Returns:
false for end of stream; true otherwise

Note that this method will be defined abstract in Lucene 3.0.

Throws:
IOException

reset

public void reset()
           throws IOException
Description copied from class: TokenStream
Resets this stream to the beginning. This is an optional operation, so subclasses may or may not implement this method. TokenStream.reset() is not needed for the standard indexing process. However, if the tokens of a TokenStream are intended to be consumed more than once, it is necessary to implement TokenStream.reset(). Note that if your TokenStream caches tokens and feeds them back again after a reset, it is imperative that you clone the tokens when you store them away (on the first pass) as well as when you return them (on future passes after TokenStream.reset()).

Overrides:
reset in class TokenStream
Throws:
IOException

reset

public void reset(Reader input)
           throws IOException
Description copied from class: Tokenizer
Expert: Reset the tokenizer to a new reader. Typically, an analyzer (in its reusableTokenStream method) will use this to re-use a previously created tokenizer.

Overrides:
reset in class Tokenizer
Throws:
IOException


Copyright © 2000-2009 Apache Software Foundation. All Rights Reserved.