opennlp.tools.sentdetect
Class SentenceDetectorME

java.lang.Object
  extended by opennlp.tools.sentdetect.SentenceDetectorME
All Implemented Interfaces:
SentenceDetector

public class SentenceDetectorME
extends java.lang.Object
implements SentenceDetector

A sentence detector for splitting up raw text into sentences.

A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
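
The candidate characters named above can be located with a plain scan. The following is a minimal sketch that only illustrates which positions a model would have to judge; it does not use the OpenNLP API, and the class and method names (EosCandidates, candidatePositions) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not part of OpenNLP.
class EosCandidates {

    /** Returns the indices of every '.', '!' and '?' in the text. */
    public static List<Integer> candidatePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // The period after "Mr" is a candidate but not a sentence end;
        // deciding that is the job of the model.
        String text = "Mr. Smith arrived. Did he stay?";
        System.out.println(candidatePositions(text)); // prints [2, 17, 30]
    }
}
```

Every position found this way is only a candidate. In this class, the accept or reject decision for each one is made by the maximum entropy model, not by a fixed rule.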


Field Summary
static java.lang.String NO_SPLIT
          Constant indicating no sentence split.
static java.lang.String SPLIT
          Constant indicating a sentence split.
 
Constructor Summary
SentenceDetectorME(SentenceModel model)
          Initializes the current instance.
SentenceDetectorME(SentenceModel model, Factory factory)
           
 
Method Summary
 double[] getSentenceProbabilities()
          Returns the probabilities associated with the most recent call to sentDetect().
static void main(java.lang.String[] args)
          Trains a new sentence detection model.
 java.lang.String[] sentDetect(java.lang.String s)
          Detect sentences in a String.
 Span[] sentPosDetect(java.lang.String s)
          Detects the positions of the sentences in a String.
static SentenceModel train(java.lang.String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations)
           
static SentenceModel train(java.lang.String languageCode, ObjectStream<SentenceSample> samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SPLIT

public static final java.lang.String SPLIT
Constant indicating a sentence split.

See Also:
Constant Field Values

NO_SPLIT

public static final java.lang.String NO_SPLIT
Constant indicating no sentence split.

See Also:
Constant Field Values
Constructor Detail

SentenceDetectorME

public SentenceDetectorME(SentenceModel model)
Initializes the current instance.

Parameters:
model - the SentenceModel

SentenceDetectorME

public SentenceDetectorME(SentenceModel model,
                          Factory factory)
Method Detail

sentDetect

public java.lang.String[] sentDetect(java.lang.String s)
Detect sentences in a String.

Specified by:
sentDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
Returns:
A string array containing individual sentences as elements.

sentPosDetect

public Span[] sentPosDetect(java.lang.String s)
Detects the positions of the sentences in a String.

Specified by:
sentPosDetect in interface SentenceDetector
Parameters:
s - The string to be processed.
Returns:
An array of Span objects, one per detected sentence, giving the position of each sentence in the string.
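
To show how position information of this kind relates to the sentence strings, here is a minimal sketch that cuts a text at given offsets. It does not use the OpenNLP API: Offsets is a hypothetical stand-in for Span, and the sketch assumes the start offset is inclusive and the end offset is exclusive.

```java
// Illustrative only: Offsets is a hypothetical stand-in for opennlp.tools.util.Span.
class SpanSketch {

    /** A character range; assumed start inclusive, end exclusive. */
    public static final class Offsets {
        public final int start;
        public final int end;

        public Offsets(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the substring of text covered by each range, in order. */
    public static String[] cut(String text, Offsets[] spans) {
        String[] sentences = new String[spans.length];
        for (int i = 0; i < spans.length; i++) {
            sentences[i] = text.substring(spans[i].start, spans[i].end);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "First one. Second one.";
        // Offsets written by hand for this example, not produced by a model.
        Offsets[] spans = { new Offsets(0, 10), new Offsets(11, 22) };
        for (String sentence : cut(text, spans)) {
            System.out.println(sentence);
        }
    }
}
```

Working with offsets rather than strings keeps the link back to the original text, which is useful when later processing steps need character positions.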

getSentenceProbabilities

public double[] getSentenceProbabilities()
Returns the probabilities associated with the most recent call to sentDetect().

Returns:
The probability for each sentence returned by the most recent call to sentDetect. If not applicable, an empty array is returned.
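
Because there is one probability per returned sentence, the two arrays can be read side by side by index. The sketch below flags sentences whose probability falls under a threshold. It does not call OpenNLP: the class name, the threshold, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not part of OpenNLP.
class ProbabilityFilter {

    /** Returns the sentences whose probability (same index) is below the threshold. */
    public static List<String> lowConfidence(String[] sentences, double[] probs, double threshold) {
        if (sentences.length != probs.length) {
            throw new IllegalArgumentException("expected one probability per sentence");
        }
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            if (probs[i] < threshold) {
                flagged.add(sentences[i]);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up values standing in for sentDetect() and getSentenceProbabilities() results.
        String[] sentences = { "It rained.", "Dr. Who left." };
        double[] probs = { 0.98, 0.62 };
        System.out.println(lowConfidence(sentences, probs, 0.75));
    }
}
```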

train

public static SentenceModel train(java.lang.String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations)
                           throws java.io.IOException
Throws:
java.io.IOException

train

public static SentenceModel train(java.lang.String languageCode,
                                  ObjectStream<SentenceSample> samples,
                                  boolean useTokenEnd,
                                  Dictionary abbreviations,
                                  int cutoff,
                                  int iterations)
                           throws java.io.IOException
Throws:
java.io.IOException

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException

Trains a new sentence detection model.

Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?

Parameters:
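args - the command-line arguments described in the usage line above: data_file and new_model_name, optionally followed by iterations and cutoff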
Throws:
java.io.IOException


Copyright © 2011 The Apache Software Foundation. All Rights Reserved.