java.lang.Object
    org.apache.nutch.parse.ms.MSExtractor
Defines a Microsoft document content extractor.
Field Summary
protected static org.apache.commons.logging.Log LOG
Constructor Summary
protected MSExtractor()
    Constructs a new Microsoft document extractor.
Method Summary
protected void extract(InputStream input)
    Extracts properties and text from a Microsoft document input stream.
protected abstract String extractText(InputStream input)
    Extracts the text content from a Microsoft document input stream.
protected Properties getProperties()
    Get the Properties of the Microsoft document.
protected String getText()
    Get the content text of the Microsoft document.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
protected static final org.apache.commons.logging.Log LOG
Constructor Detail
protected MSExtractor()
Method Detail
protected void extract(InputStream input) throws Exception
Throws: Exception
protected abstract String extractText(InputStream input) throws Exception
Throws: Exception
protected String getText()
protected Properties getProperties()
Get the Properties of the Microsoft document.
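
The method signatures above suggest a template-method design: a subclass supplies extractText(InputStream), and callers read the result through getText() and getProperties() after calling extract(InputStream). The sketch below illustrates that pattern only. SketchExtractor is a hypothetical stand-in that mirrors the documented signatures, not the Nutch class itself, and the assumption that extract() delegates to extractText() and caches the result is not confirmed by this page. PlainTextExtractor and textOf are likewise made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class ExtractorSketch {

    /** Hypothetical stand-in mirroring the documented MSExtractor signatures. */
    abstract static class SketchExtractor {
        // The stand-in leaves this empty; it does not read document metadata.
        private final Properties properties = new Properties();
        private String text;

        protected SketchExtractor() {}

        // ASSUMPTION: extract() delegates to extractText() and caches the result.
        protected void extract(InputStream input) throws Exception {
            text = extractText(input);
        }

        protected abstract String extractText(InputStream input) throws Exception;

        protected String getText() {
            return text;
        }

        protected Properties getProperties() {
            return properties;
        }
    }

    /** Hypothetical subclass: treats the whole stream as UTF-8 text. */
    static class PlainTextExtractor extends SketchExtractor {
        @Override
        protected String extractText(InputStream input) throws Exception {
            return new String(input.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Runs the extractor over an in-memory string and returns the cached text. */
    static String textOf(String content) {
        PlainTextExtractor extractor = new PlainTextExtractor();
        try {
            extractor.extract(
                new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("extraction failed", e);
        }
        return extractor.getText();
    }

    public static void main(String[] args) {
        System.out.println(textOf("hello")); // prints: hello
    }
}
```

Because the documented constructor and methods are all protected, a real subclass would either live in org.apache.nutch.parse.ms or expose its own public entry point, much as textOf does here.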