org.odftoolkit.simple.common
Class TextExtractor

java.lang.Object
  extended by org.odftoolkit.odfdom.dom.DefaultElementVisitor
      extended by org.odftoolkit.simple.common.TextExtractor
All Implemented Interfaces:
ElementVisitor
Direct Known Subclasses:
EditableTextExtractor

public class TextExtractor
extends DefaultElementVisitor

TextExtractor is a subclass of DefaultElementVisitor that extracts the display text from an ODF element. For example, to get all of the text in a slide's notes, call getOdfElement() to obtain the ODF element of the notes, pass it to newOdfTextExtractor(OdfElement) to create a TextExtractor, and then call getText(), which returns the text content as a String. Alternatively, pass the ODF element directly to the static method TextExtractor.getText(OdfElement).

If you pass the content root, which you can obtain with Document.getContentRoot(), the text of the whole document is returned without any tag information.

This extractor implements part of the ODF white space handling. It overrides visit() for text:p, text:h, text:s, text:tab and text:line-break so that white space is processed according to the ODF specification.
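The white space rules mentioned above can be illustrated without the toolkit. The following is a minimal sketch that uses only the JDK DOM API; it is not the TextExtractor implementation. The class name WhitespaceSketch and its methods are hypothetical, and the choices of one space per text:s count, a tab character for text:tab, a newline for text:line-break, and a newline between consecutive text:p or text:h blocks are assumptions made for this illustration. Collapsing of repeated white space inside text nodes is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the ODF Toolkit.
final class WhitespaceSketch {
    // Namespace URI of the ODF "text" prefix.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Parses an XML fragment and returns its display text.
    static String extractFromXml(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Node root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            walk(root, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse fragment", e);
        }
    }

    // Depth-first walk that maps white space elements to characters.
    private static void walk(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE
                && TEXT_NS.equals(node.getNamespaceURI())) {
            Element e = (Element) node;
            switch (e.getLocalName()) {
                case "s": {
                    // text:c holds the number of spaces; assumed to default to 1.
                    String c = e.getAttributeNS(TEXT_NS, "c");
                    int n = c.isEmpty() ? 1 : Integer.parseInt(c);
                    for (int i = 0; i < n; i++) {
                        out.append(' ');
                    }
                    return;
                }
                case "tab":
                    out.append('\t');
                    return;
                case "line-break":
                    out.append('\n');
                    return;
                case "p":
                case "h":
                    // Assumption: separate block elements with a newline.
                    if (out.length() > 0) {
                        out.append('\n');
                    }
                    break;
                default:
                    break;
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            walk(c, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<text:p xmlns:text=\"" + TEXT_NS + "\">"
                + "a<text:s text:c=\"2\"/>b<text:tab/>c</text:p>";
        System.out.println(extractFromXml(xml));
    }
}
```

With the real toolkit, the equivalent work is done by TextExtractor itself, so a sketch like this is only useful for understanding what the overridden visit() methods are responsible for.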

See Also:
OdfElement

Nested Class Summary
protected static class TextExtractor.ExtractorStringBuilder
          This class is used to provide the string builder functions to extractor.
 
Field Summary
protected  TextExtractor.ExtractorStringBuilder mTextBuilder
           
protected static char NewLineChar
           
protected static char TabChar
           
 
Constructor Summary
protected TextExtractor()
          Default constructor
protected TextExtractor(OdfElement element)
          Constructor with an ODF element as parameter
 
Method Summary
protected  void appendElementText(OdfElement ele)
          Append the text content of this element to the string buffer.
 String getText()
          Return the text content of the specified ODF element as a string.
static String getText(OdfElement ele)
          Return the text content of an element as a String.
static TextExtractor newOdfTextExtractor(OdfElement element)
          Create a TextExtractor instance for the specified ODF element, whose text content can then be extracted with getText().
 void visit(OdfElement element)
          End users can ignore this method unless they want to override the text content handling strategy of OdfElement.
 void visit(TextHElement ele)
          End users can ignore this method unless they want to override the text content handling strategy of text:h.
 void visit(TextLineBreakElement ele)
          End users can ignore this method unless they want to override the text content handling strategy of text:line-break.
 void visit(TextPElement ele)
          End users can ignore this method unless they want to override the text content handling strategy of text:p.
 void visit(TextSElement ele)
          End users can ignore this method unless they want to override the text content handling strategy of text:s.
 void visit(TextTabElement ele)
          End users can ignore this method unless they want to override the text content handling strategy of text:tab.
 
Methods inherited from class org.odftoolkit.odfdom.dom.DefaultElementVisitor
visit (inherited overloads for the other ODF element classes)
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NewLineChar

protected static final char NewLineChar
See Also:
Constant Field Values

TabChar

protected static final char TabChar
See Also:
Constant Field Values

mTextBuilder

protected final TextExtractor.ExtractorStringBuilder mTextBuilder
Constructor Detail

TextExtractor

protected TextExtractor()
Default constructor


TextExtractor

protected TextExtractor(OdfElement element)
Constructor with an ODF element as parameter

Parameters:
element - the ODF element whose text will be extracted.
Method Detail

getText

public static String getText(OdfElement ele)
Return the text content of an element as a String.

Parameters:
ele - the ODF element
Returns:
the text content of the element

newOdfTextExtractor

public static TextExtractor newOdfTextExtractor(OdfElement element)
Create a TextExtractor instance for the specified ODF element, whose text content can then be extracted with getText().

Parameters:
element - the ODF element whose text will be extracted.
Returns:
an instance of TextExtractor

getText

public String getText()
Return the text content of the specified ODF element as a string.

Returns:
the text content as a string

visit

public void visit(OdfElement element)
End users can ignore this method unless they want to override the text content handling strategy of OdfElement.

Specified by:
visit in interface ElementVisitor
Specified by:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.pkg.OdfElement)

visit

public void visit(TextPElement ele)
End users can ignore this method unless they want to override the text content handling strategy of text:p.

Overrides:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.dom.element.text.TextPElement)

visit

public void visit(TextHElement ele)
End users can ignore this method unless they want to override the text content handling strategy of text:h.

Overrides:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.dom.element.text.TextHElement)

visit

public void visit(TextSElement ele)
End users can ignore this method unless they want to override the text content handling strategy of text:s.

Overrides:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.dom.element.text.TextSElement)

visit

public void visit(TextTabElement ele)
End users can ignore this method unless they want to override the text content handling strategy of text:tab.

Overrides:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.dom.element.text.TextTabElement)

visit

public void visit(TextLineBreakElement ele)
End users can ignore this method unless they want to override the text content handling strategy of text:line-break.

Overrides:
visit in class DefaultElementVisitor
See Also:
DefaultElementVisitor.visit(org.odftoolkit.odfdom.dom.element.text.TextLineBreakElement)

appendElementText

protected void appendElementText(OdfElement ele)
Append the text content of this element to the string buffer.

Parameters:
ele - the ODF element whose text will be appended.
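
The visit(...) methods documented above exist so that a subclass can replace the handling strategy for a single element type. The following is a simplified, JDK-only sketch of that visitor pattern. All types in it (SketchNode, SketchText, SketchTab, SketchParagraph, SketchExtractor, FourSpaceTabExtractor) are hypothetical stand-ins and not ODF Toolkit classes, and the default behaviour shown is an assumption chosen for the illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ODF element; accept() dispatches to the
// visit overload that matches the concrete node type.
abstract class SketchNode {
    abstract void accept(SketchExtractor visitor);
}

final class SketchText extends SketchNode {
    final String value;
    SketchText(String value) { this.value = value; }
    @Override void accept(SketchExtractor visitor) { visitor.visit(this); }
}

final class SketchTab extends SketchNode {
    @Override void accept(SketchExtractor visitor) { visitor.visit(this); }
}

final class SketchParagraph extends SketchNode {
    final List<SketchNode> children;
    SketchParagraph(SketchNode... children) { this.children = Arrays.asList(children); }
    @Override void accept(SketchExtractor visitor) { visitor.visit(this); }
}

// Base extractor with one overload per node type, each with a default strategy.
class SketchExtractor {
    protected final StringBuilder builder = new StringBuilder();

    public void visit(SketchText text) { builder.append(text.value); }

    public void visit(SketchTab tab) { builder.append('\t'); }

    public void visit(SketchParagraph paragraph) {
        // Assumption: paragraphs after the first start on a new line.
        if (builder.length() > 0) {
            builder.append('\n');
        }
        for (SketchNode child : paragraph.children) {
            child.accept(this);
        }
    }

    public String getText() { return builder.toString(); }
}

// Subclass that overrides only the tab strategy; everything else is inherited.
class FourSpaceTabExtractor extends SketchExtractor {
    @Override public void visit(SketchTab tab) { builder.append("    "); }
}

final class VisitorSketchDemo {
    public static void main(String[] args) {
        SketchNode paragraph =
                new SketchParagraph(new SketchText("a"), new SketchTab(), new SketchText("b"));
        SketchExtractor extractor = new FourSpaceTabExtractor();
        paragraph.accept(extractor);
        System.out.println(extractor.getText());
    }
}
```

Overriding a single overload keeps the rest of the traversal untouched, which is why callers who only need the extracted text do not have to deal with the visit methods at all.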


Copyright © 2010-2017 The Apache Software Foundation. All Rights Reserved.