public class SingleDocumentExtraction extends Object
Constructor | Description |
---|---|
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the given document source, extractor factory and output triple handler. |
SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output) | Builds an extractor from the given document source, list of extractors and output triple handler. |
SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output) | Builds an extractor from the given document source, extractor factory and output triple handler, using the DefaultConfiguration. |
Modifier and Type | Method | Description |
---|---|---|
String | getDetectedMIMEType() | Returns the detected MIME type for the given DocumentSource. |
List<Extractor> | getMatchingExtractors() | |
String | getParserEncoding() | |
boolean | hasMatchingExtractors() | Checks whether the content of the given DocumentSource activates at least one extractor. |
SingleDocumentExtractionReport | run() | Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters. |
SingleDocumentExtractionReport | run(ExtractionParameters extractionParameters) | Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters. |
void | setLocalCopyFactory(LocalCopyFactory copyFactory) | Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used. |
void | setMIMETypeDetector(MIMETypeDetector detector) | Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated. |
void | setParserEncoding(String encoding) | Sets the document parser encoding. |
public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorGroup extractors, TripleHandler output)
Builds an extractor from the given document source, list of extractors and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.

public SingleDocumentExtraction(Configuration configuration, DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from the given document source, extractor factory and output triple handler.
Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractors factory.
output - output triple handler.

public SingleDocumentExtraction(DocumentSource in, ExtractorFactory<?> factory, TripleHandler output)
Builds an extractor from the given document source, extractor factory and output triple handler, using the DefaultConfiguration.
Parameters:
in - input document source.
factory - the extractors factory.
output - output triple handler.

public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used.
Parameters:
copyFactory - local copy factory.
See Also:
DocumentSource
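The null fallback described for setLocalCopyFactory can be illustrated with a small stand-alone model. This is only a sketch: it does not use the Any23 classes, the names CopyFactory, InMemoryCopyFactory and CopyFactoryModel are hypothetical stand-ins, and the point at which the real class resolves the fallback is an assumption.

```java
// Hypothetical stand-in for LocalCopyFactory: produces a local copy of a document.
interface CopyFactory {
    String copy(String content);
}

// Hypothetical stand-in for MemCopyFactory: keeps the copy in memory.
final class InMemoryCopyFactory implements CopyFactory {
    @Override
    public String copy(String content) {
        return new String(content);
    }
}

// Minimal model of an object that holds an optional copy factory.
final class CopyFactoryModel {
    private CopyFactory copyFactory; // may be null

    void setLocalCopyFactory(CopyFactory copyFactory) {
        this.copyFactory = copyFactory;
    }

    // Assumption: the fallback is resolved when a copy factory is needed,
    // so passing null to the setter restores the in-memory default.
    CopyFactory effectiveCopyFactory() {
        return copyFactory != null ? copyFactory : new InMemoryCopyFactory();
    }

    public static void main(String[] args) {
        CopyFactoryModel model = new CopyFactoryModel();
        model.setLocalCopyFactory(null);
        System.out.println(model.effectiveCopyFactory().getClass().getSimpleName());
    }
}
```

Resolving the default in one accessor keeps the null check in a single place, so callers never see a null factory.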
public void setMIMETypeDetector(MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
Parameters:
detector - detector instance.

public SingleDocumentExtractionReport run(ExtractionParameters extractionParameters) throws ExtractionException, IOException
Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters.
Parameters:
extractionParameters - the parameters applied to the run execution.
Throws:
ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.

public SingleDocumentExtractionReport run() throws IOException, ExtractionException
Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters.
Throws:
IOException
ExtractionException

public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
Throws:
IOException - if an error occurred while accessing the data.

public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
Returns:
true if at least one extractor is activated, false otherwise.
Throws:
IOException
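The interaction between setMIMETypeDetector and hasMatchingExtractors can be sketched with a stand-alone model. The code below does not use the Any23 classes: TypeDetector, SketchExtractor and MatchingModel are hypothetical stand-ins, and matching extractors by a list of supported MIME types is an assumption made for illustration. It only mirrors the documented rule that a null detector skips detection and activates all extractors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for MIMETypeDetector.
interface TypeDetector {
    String detect(String content);
}

// Hypothetical stand-in for an Extractor that declares the MIME types it supports.
final class SketchExtractor {
    final String name;
    final List<String> supportedTypes;

    SketchExtractor(String name, List<String> supportedTypes) {
        this.name = name;
        this.supportedTypes = supportedTypes;
    }
}

final class MatchingModel {
    private final List<SketchExtractor> extractors;
    private TypeDetector detector; // null means "skip detection"

    MatchingModel(List<SketchExtractor> extractors) {
        this.extractors = extractors;
    }

    void setMIMETypeDetector(TypeDetector detector) {
        this.detector = detector;
    }

    // With no detector every extractor is activated; otherwise only the
    // extractors that support the detected type are returned.
    List<SketchExtractor> getMatchingExtractors(String content) {
        if (detector == null) {
            return new ArrayList<>(extractors);
        }
        String type = detector.detect(content);
        List<SketchExtractor> matching = new ArrayList<>();
        for (SketchExtractor extractor : extractors) {
            if (extractor.supportedTypes.contains(type)) {
                matching.add(extractor);
            }
        }
        return matching;
    }

    boolean hasMatchingExtractors(String content) {
        return !getMatchingExtractors(content).isEmpty();
    }

    public static void main(String[] args) {
        MatchingModel model = new MatchingModel(Arrays.asList(
                new SketchExtractor("html", Arrays.asList("text/html")),
                new SketchExtractor("csv", Arrays.asList("text/csv"))));
        // No detector set: all extractors are considered matching.
        System.out.println(model.getMatchingExtractors("<html/>").size());
        // Toy detector: anything starting with '<' is treated as HTML.
        model.setMIMETypeDetector(c -> c.startsWith("<") ? "text/html" : "text/plain");
        System.out.println(model.getMatchingExtractors("<html/>").size());
    }
}
```

In this model, disabling detection trades precision for coverage: every extractor runs, including those that cannot handle the input.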
public List<Extractor> getMatchingExtractors()
Returns:
the list of extractors activated for the given DocumentSource.

public String getParserEncoding()

public void setParserEncoding(String encoding)
Sets the document parser encoding.
Parameters:
encoding - parser encoding.

Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.