public abstract class MicroformatExtractor extends Object implements Extractor.TagSoupDOMExtractor
Nested classes/interfaces inherited from interface Extractor: Extractor.BlindExtractor, Extractor.ContentExtractor, Extractor.TagSoupDOMExtractor
Modifier and Type | Field and Description |
---|---|
static String | BEGIN_SCRIPT |
static String | END_SCRIPT |
protected Any23ValueFactoryWrapper | valueFactory |
Constructor and Description |
---|
MicroformatExtractor() |
Modifier and Type | Method and Description |
---|---|
protected void | addBNodeProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode) Helper method that adds a BNode property to a node. |
protected void | addBNodeProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode) Helper method that adds a BNode property to a node. |
protected void | addURIProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI object) Helper method that adds a URI property to a node. |
protected boolean | conditionallyAddLiteralProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.Literal literal) Helper method that adds a literal property to a node. |
protected boolean | conditionallyAddResourceProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI uri) Helper method that adds a URI property to a node. |
protected boolean | conditionallyAddStringProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI p, String value) Helper method that adds a literal property to a subject only if the value of the property is a valid string. |
protected abstract boolean | extract() Performs the extraction of the data and writes it to the model. |
protected org.openrdf.model.URI | fixLink(String link) |
protected org.openrdf.model.URI | fixLink(String link, String defaultSchema) |
protected ExtractionResult | getCurrentExtractionResult() Returns the ExtractionResult associated with the extraction session. |
abstract ExtractorDescription | getDescription() Returns the description of this extractor. |
org.openrdf.model.URI | getDocumentURI() |
ExtractionContext | getExtractionContext() |
HTMLDocument | getHTMLDocument() |
static boolean | includes(Class<? extends MicroformatExtractor> including, Class<? extends MicroformatExtractor> included) This method checks if there is a native nesting relationship between two MicroformatExtractor classes. |
protected ExtractionResult | openSubResult(ExtractionContext context) |
void | run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) |
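
The `conditionallyAdd*` helpers in the method summary return a boolean that reports whether the value was actually written. A minimal, library-free sketch of that pattern follows. `ConditionalAddSketch`, its `Triple` class, the use of plain strings instead of `org.openrdf.model` types, and the rule that a "valid string" means non-null and non-empty after trimming are all assumptions made for illustration, not Any23's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; not part of Any23.
final class ConditionalAddSketch {

    // Simplified statement; real code would use org.openrdf.model types.
    static final class Triple {
        final String subject;
        final String property;
        final String value;

        Triple(String subject, String property, String value) {
            this.subject = subject;
            this.property = property;
            this.value = value;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    // Adds the statement only when the value is usable and reports whether it did.
    // Assumption: "valid" means non-null and non-empty after trimming.
    boolean conditionallyAddStringProperty(String subject, String property, String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        triples.add(new Triple(subject, property, trimmed));
        return true;
    }

    // Read-only view of everything accepted so far.
    List<Triple> triples() {
        return Collections.unmodifiableList(triples);
    }

    public static void main(String[] args) {
        ConditionalAddSketch sketch = new ConditionalAddSketch();
        boolean added = sketch.conditionallyAddStringProperty("_:card", "vcard:fn", " Jane Doe ");
        boolean skipped = sketch.conditionallyAddStringProperty("_:card", "vcard:nickname", "   ");
        System.out.println(added + " " + skipped + " " + sketch.triples().size());
    }
}
```

Returning a boolean lets a caller decide whether a microformat was found at all, for example by combining the results of several conditional adds.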
public static final String BEGIN_SCRIPT
public static final String END_SCRIPT
protected final Any23ValueFactoryWrapper valueFactory
public abstract ExtractorDescription getDescription()
Specified by:
getDescription in interface Extractor<Document>
protected abstract boolean extract() throws ExtractionException
Throws:
ExtractionException
public HTMLDocument getHTMLDocument()
public ExtractionContext getExtractionContext()
public org.openrdf.model.URI getDocumentURI()
public final void run(ExtractionParameters extractionParameters, ExtractionContext extractionContext, Document in, ExtractionResult out) throws IOException, ExtractionException
Specified by:
run in interface Extractor<Document>
Throws:
IOException
ExtractionException
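
`run` is declared `final` while `extract` is `abstract` and takes no arguments, which suggests a template-method arrangement: `run` stores the per-call state and then delegates to the subclass's `extract`, which reads that state through accessors. The sketch below illustrates that shape only. `TemplateSketch`, `BaseExtractor`, `TitleExtractor`, `emit`, and the use of plain strings for the document and output are hypothetical, and the real `run` may do more than is shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; not part of Any23.
final class TemplateSketch {

    static abstract class BaseExtractor {
        private String document;
        private List<String> out;

        // Subclasses implement only the format-specific step.
        protected abstract boolean extract();

        // Accessors expose the state that run() stored for this call.
        protected String getDocument() {
            return document;
        }

        protected void emit(String statement) {
            out.add(statement);
        }

        // Fixed entry point: store the inputs, then delegate to extract().
        // This sketch ignores the boolean result of extract().
        public final void run(String document, List<String> out) {
            this.document = document;
            this.out = out;
            extract();
        }
    }

    // Toy subclass that pulls the <title> text out of a document string.
    static final class TitleExtractor extends BaseExtractor {
        @Override
        protected boolean extract() {
            String doc = getDocument();
            int start = doc.indexOf("<title>");
            int end = doc.indexOf("</title>");
            if (start < 0 || end < start) {
                return false;
            }
            emit("title=" + doc.substring(start + "<title>".length(), end));
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        new TitleExtractor().run("<html><title>Hello</title></html>", out);
        System.out.println(out);
    }
}
```

Keeping `run` final means every subclass goes through the same setup before its own `extract` logic executes.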
protected ExtractionResult getCurrentExtractionResult()
Returns the ExtractionResult associated with the extraction session.
protected ExtractionResult openSubResult(ExtractionContext context)
protected boolean conditionallyAddStringProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI p, String value)
Parameters:
n - the HTML node from which the property value has been extracted.
subject - the property subject.
p - the property URI.
value - the property value.
Returns:
true if the value has been accepted and added, false otherwise.
protected boolean conditionallyAddLiteralProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.Literal literal)
Parameters:
n - the HTML node from which the property value has been extracted.
subject - the property subject.
property - the property URI.
literal - the property value.
Returns:
true if the literal has been accepted and added, false otherwise.
protected boolean conditionallyAddResourceProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI uri)
Parameters:
subject - the property subject.
property - the property URI.
uri - the property object.
Returns:
true if the resource has been added, false otherwise.
protected void addBNodeProperty(Node n, org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode)
Parameters:
n - the HTML node used for extracting such property.
subject - the property subject.
property - the property URI.
bnode - the property value.
protected void addBNodeProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.BNode bnode)
Parameters:
subject - the property subject.
property - the property URI.
bnode - the property value.
protected void addURIProperty(org.openrdf.model.Resource subject, org.openrdf.model.URI property, org.openrdf.model.URI object)
Parameters:
subject -
property -
object -
protected org.openrdf.model.URI fixLink(String link)
public static boolean includes(Class<? extends MicroformatExtractor> including, Class<? extends MicroformatExtractor> included)
This method checks if there is a native nesting relationship between two MicroformatExtractor classes.
Parameters:
including - the including MicroformatExtractor
included - the included MicroformatExtractor
Returns:
true if there is a declared nesting relationship
See Also:
Includes
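
A declared nesting relationship of this kind can be modelled with a runtime annotation carried by the including class. The following sketch assumes such an annotation. The nested `Includes` type defined here, its `extractors` element, and the `Base`, `HCard`, `Adr`, and `Geo` classes are hypothetical stand-ins that exist only inside the example, so the real `Includes` type may differ.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in types; not part of Any23.
final class IncludesSketch {

    // Assumed annotation listing the extractor classes nested by the annotated one.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Includes {
        Class<? extends Base>[] extractors();
    }

    static abstract class Base {}

    static final class Adr extends Base {}

    static final class Geo extends Base {}

    // HCard declares that it natively nests Adr and Geo.
    @Includes(extractors = {Adr.class, Geo.class})
    static final class HCard extends Base {}

    // True only if 'including' explicitly declares 'included' in its annotation.
    static boolean includes(Class<? extends Base> including, Class<? extends Base> included) {
        Includes declared = including.getAnnotation(Includes.class);
        if (declared == null) {
            return false;
        }
        for (Class<? extends Base> candidate : declared.extractors()) {
            if (candidate.equals(included)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(includes(HCard.class, Adr.class));
        System.out.println(includes(Adr.class, HCard.class));
    }
}
```

In this sketch the relationship is directional: a class includes another only when its own annotation names it.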
Copyright © 2010-2013 The Apache Software Foundation. All Rights Reserved.