Package | Description |
---|---|
org.apache.nutch.fetcher | The Nutch robot. |
org.apache.nutch.parse | The Parse interface and related classes. |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.tika | Parse various document formats with help of Apache Tika. |
org.apache.nutch.parse.zip | Parse ZIP files: embedded files are recursively passed to appropriate parsers. |
org.apache.nutch.service.model.response | |
Modifier and Type | Method and Description |
---|---|
Outlink[] | FetchNode.getOutlinks() |
Modifier and Type | Method and Description |
---|---|
void | FetchNode.setOutlinks(Outlink[] links) |
Modifier and Type | Method and Description |
---|---|
void | FetcherThreadEvent.addOutlinksToEventData(java.util.Collection<Outlink> links): Adds the given collection of links to the outlink metadata. |
Modifier and Type | Method and Description |
---|---|
Outlink[] | ParseData.getOutlinks(): The outlinks of the page. |
static Outlink[] | OutlinkExtractor.getOutlinks(java.lang.String plainText, Configuration conf): Extracts Outlinks from the given plain text. |
static Outlink[] | OutlinkExtractor.getOutlinks(java.lang.String plainText, java.lang.String anchor, Configuration conf): Extracts Outlinks from the given plain text and adds anchor to each extracted Outlink. |
static Outlink | Outlink.read(java.io.DataInput in) |
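
The plain-text extraction described for OutlinkExtractor.getOutlinks can be illustrated with a small standalone sketch. It is not the Nutch implementation: the class PlainTextLinkSketch, the nested SimpleLink type (a stand-in for Outlink), the method extractLinks, and the deliberately simple URL pattern are all assumptions made for illustration, and no Configuration object is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Nutch: finds URLs in plain text with a regex.
class PlainTextLinkSketch {

    /** Hypothetical stand-in for Outlink: a target URL plus anchor text. */
    static final class SimpleLink {
        public final String toUrl;
        public final String anchor;

        SimpleLink(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Assumed, simplified pattern: a scheme, "://", then a run of characters
    // that are not whitespace, angle brackets, or quotes.
    private static final Pattern URL_PATTERN =
        Pattern.compile("(?:https?|ftp)://[^\\s<>\"']+");

    /** Returns one SimpleLink per URL found, each carrying the same anchor. */
    static List<SimpleLink> extractLinks(String plainText, String anchor) {
        List<SimpleLink> links = new ArrayList<>();
        if (plainText == null) {
            return links; // nothing to scan
        }
        Matcher m = URL_PATTERN.matcher(plainText);
        while (m.find()) {
            links.add(new SimpleLink(m.group(), anchor));
        }
        return links;
    }
}
```

Calling PlainTextLinkSketch.extractLinks(text, "docs") returns the matches in the order they appear in the text. The pattern does not trim trailing punctuation, so a URL followed directly by a period or comma keeps that character.
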
Modifier and Type | Method and Description |
---|---|
void | ParseData.setOutlinks(Outlink[] outlinks) |
Constructor and Description |
---|
ParseData(ParseStatus status, java.lang.String title, Outlink[] outlinks, Metadata contentMeta) |
ParseData(ParseStatus status, java.lang.String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta) |
Modifier and Type | Method and Description |
---|---|
void | DOMContentUtils.getOutlinks(java.net.URL base, java.util.ArrayList<Outlink> outlinks, org.w3c.dom.Node node): Finds all anchors below the supplied DOM node, creates an appropriate Outlink record for each (relative to the supplied base URL), and adds them to the outlinks ArrayList. |
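
The traversal described for DOMContentUtils.getOutlinks (visit every anchor below a node, resolve it against the base URL, append it to a caller-supplied list) can be sketched with the JDK's own DOM classes. This is an illustrative approximation, not the Nutch code: the class DomAnchorWalkSketch and the methods collectAnchors and anchorsInXhtml are hypothetical, resolved URLs are stored as plain strings rather than Outlink objects, and only `a` elements with an `href` attribute are considered.

```java
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of Nutch: depth-first walk over a DOM tree.
class DomAnchorWalkSketch {

    /** Appends the resolved href of every anchor at or below node to out. */
    static void collectAnchors(URL base, List<String> out, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            // getAttribute returns "" when the attribute is absent
            String href = ((Element) node).getAttribute("href");
            if (!href.isEmpty()) {
                try {
                    // Resolve relative links against the base URL
                    out.add(new URL(base, href).toString());
                } catch (MalformedURLException e) {
                    // Assumption of this sketch: unresolvable links are skipped
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectAnchors(base, out, children.item(i));
        }
    }

    /** Convenience wrapper for experiments: parses well-formed XHTML text. */
    static List<String> anchorsInXhtml(String baseUrl, String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            List<String> out = new ArrayList<>();
            collectAnchors(new URL(baseUrl), out, doc.getDocumentElement());
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }
}
```

Passing the output list in as a parameter mirrors the documented signature, where the caller owns the ArrayList and the method only appends to it. The wrapper accepts only well-formed XML; real-world HTML would need a tolerant parser in front of the walk.
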
Modifier and Type | Method and Description |
---|---|
void | DOMContentUtils.getOutlinks(java.net.URL base, java.util.ArrayList<Outlink> outlinks, java.util.List<org.apache.tika.sax.Link> tikaExtractedOutlinks) |
void | DOMContentUtils.getOutlinks(java.net.URL base, java.util.ArrayList<Outlink> outlinks, org.w3c.dom.Node node): Finds all anchors below the supplied DOM node, creates an appropriate Outlink record for each (relative to the supplied base URL), and adds them to the outlinks ArrayList. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | ZipTextExtractor.extractText(java.io.InputStream input, java.lang.String url, java.util.List<Outlink> outLinksList) |
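
The shape of ZipTextExtractor.extractText (an input stream, the archive URL, and a list that collects links) suggests a loop over archive entries. The sketch below shows such a loop using java.util.zip only. It is an assumption-laden illustration, not the Nutch implementation: the class ZipWalkSketch is hypothetical, each entry is decoded as UTF-8 text instead of being passed to an appropriate parser, and the entry URL is formed by appending "/" and the entry name to the archive URL, which is a convention chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch, not part of Nutch: walks the entries of a ZIP stream.
class ZipWalkSketch {

    /**
     * Concatenates the content of every file entry (one per line) and records
     * a synthetic URL for each entry in entryUrls.
     */
    static String extractText(InputStream input, String url, List<String> entryUrls)
            throws IOException {
        StringBuilder text = new StringBuilder();
        byte[] buf = new byte[4096];
        try (ZipInputStream zin = new ZipInputStream(input)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                // Read the current entry fully; read() returns -1 at entry end
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zin.read(buf)) != -1) {
                    bytes.write(buf, 0, n);
                }
                // Assumed URL convention for this sketch: archive URL + "/" + name
                entryUrls.add(url + "/" + entry.getName());
                // Simplification: treat every entry as UTF-8 text
                text.append(new String(bytes.toByteArray(), StandardCharsets.UTF_8))
                    .append('\n');
            }
        }
        return text.toString();
    }
}
```

A fuller version would choose a parser per entry, for example by file extension or detected content type, and would merge the outlinks that parser reports into the caller's list.
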
Modifier and Type | Method and Description |
---|---|
void | FetchNodeDbInfo.setChildNodes(Outlink[] links) |
Copyright © 2018 The Apache Software Foundation