Uses of Content in org.apache.nutch.analysis.lang |
---|
Methods in org.apache.nutch.analysis.lang with parameters of type Content | |
---|---|
ParseResult |
HTMLLanguageParser.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible indications of the content language. |
Uses of Content in org.apache.nutch.crawl |
---|
Methods in org.apache.nutch.crawl with parameters of type Content | |
---|---|
byte[] |
MD5Signature.calculate(Content content,
Parse parse)
|
byte[] |
TextProfileSignature.calculate(Content content,
Parse parse)
|
abstract byte[] |
Signature.calculate(Content content,
Parse parse)
|
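The `calculate` methods listed above each return a `byte[]` signature for a fetched page. As a rough illustration of what a digest-based signature can look like, the sketch below hashes raw content bytes with the JDK's `MessageDigest`. The class `Md5SignatureSketch` and its methods are hypothetical stand-ins, not Nutch code, and the real `MD5Signature` may hash different inputs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for a digest-based page signature; not the Nutch class.
final class Md5SignatureSketch {

    // Returns the 16-byte MD5 digest of the raw content bytes.
    static byte[] calculate(byte[] rawContent) {
        try {
            return MessageDigest.getInstance("MD5").digest(rawContent);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Renders a digest as lowercase hex for logging or comparison.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] page = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(calculate(page)));
    }
}
```

Two pages with identical bytes yield identical signatures, which is what makes such a value usable for exact-duplicate detection.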
Uses of Content in org.apache.nutch.fetcher |
---|
Methods in org.apache.nutch.fetcher that return Content | |
---|---|
Content |
FetcherOutput.getContent()
|
Constructors in org.apache.nutch.fetcher with parameters of type Content | |
---|---|
FetcherOutput(CrawlDatum crawlDatum,
Content content,
ParseImpl parse)
|
Uses of Content in org.apache.nutch.microformats.reltag |
---|
Methods in org.apache.nutch.microformats.reltag with parameters of type Content | |
---|---|
ParseResult |
RelTagParser.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking for possible rel-tags. |
Uses of Content in org.apache.nutch.parse |
---|
Methods in org.apache.nutch.parse with parameters of type Content | |
---|---|
ParseResult |
MetaTagsParser.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
ParseResult |
HtmlParseFilter.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
ParseResult |
HtmlParseFilters.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
ParseResult |
Parser.getParse(Content c)
This method parses the given content and returns a map of <key, parse> pairs. |
static boolean |
ParseSegment.isTruncated(Content content)
Checks if the page's content is truncated. |
void |
ParseSegment.map(WritableComparable key,
Content content,
OutputCollector<Text,ParseImpl> output,
Reporter reporter)
|
ParseResult |
ParseUtil.parse(Content content)
Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is
returned. |
ParseResult |
ParseUtil.parseByExtensionId(String extId,
Content content)
Parses a Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID. |
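The `ParseUtil.parse` description above amounts to a try-in-order fallback: walk a list of preferred parsers and stop at the first one that succeeds. The sketch below illustrates that pattern only. `SimpleParser`, `ParserFallbackSketch`, and the use of `Optional<String>` as a result type are assumptions made for the example, not the Nutch types or its actual error handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of "try preferred parsers until one succeeds".
final class ParserFallbackSketch {

    // Stand-in for a parser: returns an empty Optional when it cannot handle the input.
    interface SimpleParser {
        Optional<String> tryParse(String content);
    }

    // Tries each parser in preference order and returns the first successful result.
    static Optional<String> parseWithFallback(List<SimpleParser> preferred, String content) {
        for (SimpleParser parser : preferred) {
            Optional<String> result;
            try {
                result = parser.tryParse(content);
            } catch (RuntimeException e) {
                continue; // treat a crashing parser like a failed parse and move on
            }
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty(); // no parser succeeded
    }

    public static void main(String[] args) {
        SimpleParser htmlOnly = c -> c.startsWith("<") ? Optional.of("html:" + c) : Optional.empty();
        SimpleParser plainText = c -> Optional.of("text:" + c);
        System.out.println(parseWithFallback(Arrays.asList(htmlOnly, plainText), "plain words"));
    }
}
```

Ordering the list by preference keeps the selection policy in one place; each parser only needs to report whether it succeeded.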
Uses of Content in org.apache.nutch.parse.ext |
---|
Methods in org.apache.nutch.parse.ext with parameters of type Content | |
---|---|
ParseResult |
ExtParser.getParse(Content content)
|
Uses of Content in org.apache.nutch.parse.feed |
---|
Methods in org.apache.nutch.parse.feed with parameters of type Content | |
---|---|
ParseResult |
FeedParser.getParse(Content content)
Parses the given feed, then extracts and parses all linked items within the feed, using the underlying ROME feed parsing library. |
Uses of Content in org.apache.nutch.parse.headings |
---|
Methods in org.apache.nutch.parse.headings with parameters of type Content | |
---|---|
ParseResult |
HeadingsParseFilter.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Uses of Content in org.apache.nutch.parse.html |
---|
Methods in org.apache.nutch.parse.html with parameters of type Content | |
---|---|
ParseResult |
HtmlParser.getParse(Content content)
|
Uses of Content in org.apache.nutch.parse.js |
---|
Methods in org.apache.nutch.parse.js with parameters of type Content | |
---|---|
ParseResult |
JSParseFilter.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
ParseResult |
JSParseFilter.getParse(Content c)
|
Uses of Content in org.apache.nutch.parse.swf |
---|
Methods in org.apache.nutch.parse.swf with parameters of type Content | |
---|---|
ParseResult |
SWFParser.getParse(Content content)
|
Uses of Content in org.apache.nutch.parse.tika |
---|
Methods in org.apache.nutch.parse.tika with parameters of type Content | |
---|---|
ParseResult |
TikaParser.getParse(Content content)
|
Uses of Content in org.apache.nutch.parse.zip |
---|
Methods in org.apache.nutch.parse.zip with parameters of type Content | |
---|---|
ParseResult |
ZipParser.getParse(Content content)
|
Uses of Content in org.apache.nutch.protocol |
---|
Methods in org.apache.nutch.protocol that return Content | |
---|---|
Content |
ProtocolOutput.getContent()
|
static Content |
Content.read(DataInput in)
|
Methods in org.apache.nutch.protocol with parameters of type Content | |
---|---|
void |
ProtocolOutput.setContent(Content content)
|
Constructors in org.apache.nutch.protocol with parameters of type Content | |
---|---|
ProtocolOutput(Content content)
|
|
ProtocolOutput(Content content,
ProtocolStatus status)
|
Uses of Content in org.apache.nutch.protocol.file |
---|
Methods in org.apache.nutch.protocol.file that return Content | |
---|---|
Content |
FileResponse.toContent()
|
Uses of Content in org.apache.nutch.protocol.ftp |
---|
Methods in org.apache.nutch.protocol.ftp that return Content | |
---|---|
Content |
FtpResponse.toContent()
|
Uses of Content in org.apache.nutch.scoring |
---|
Methods in org.apache.nutch.scoring with parameters of type Content | |
---|---|
void |
ScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Currently a part of score distribution is performed using only data coming from the parsing process. |
void |
ScoringFilters.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
void |
ScoringFilter.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
This method takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata. |
void |
ScoringFilters.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
|
Uses of Content in org.apache.nutch.scoring.link |
---|
Methods in org.apache.nutch.scoring.link with parameters of type Content | |
---|---|
void |
LinkAnalysisScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
void |
LinkAnalysisScoringFilter.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
|
Uses of Content in org.apache.nutch.scoring.opic |
---|
Methods in org.apache.nutch.scoring.opic with parameters of type Content | |
---|---|
void |
OPICScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Copy the value from Content metadata under Fetcher.SCORE_KEY to parseData. |
void |
OPICScoringFilter.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY. |
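The two `OPICScoringFilter` methods above describe a hand-off: a score is written into the content metadata before parsing and copied into the parse data afterwards. The sketch below mimics that flow with plain maps. The class name, the `SCORE_KEY` value, and the use of `Map<String, String>` for metadata are assumptions for illustration; the real key is `Fetcher.SCORE_KEY`, whose value is not shown in this listing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of passing a score through metadata around the parse step.
final class ScorePassingSketch {

    // Placeholder key; the actual value of Fetcher.SCORE_KEY is not given in the source.
    static final String SCORE_KEY = "sketch.score";

    // Before parsing: store the datum's score in the content metadata.
    static void passScoreBeforeParsing(float datumScore, Map<String, String> contentMeta) {
        contentMeta.put(SCORE_KEY, Float.toString(datumScore));
    }

    // After parsing: copy the stored score from content metadata into the parse metadata.
    static void passScoreAfterParsing(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        String score = contentMeta.get(SCORE_KEY);
        if (score != null) {
            parseMeta.put(SCORE_KEY, score);
        }
    }

    public static void main(String[] args) {
        Map<String, String> contentMeta = new HashMap<>();
        Map<String, String> parseMeta = new HashMap<>();
        passScoreBeforeParsing(1.5f, contentMeta);
        passScoreAfterParsing(contentMeta, parseMeta);
        System.out.println(parseMeta);
    }
}
```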
Uses of Content in org.apache.nutch.scoring.tld |
---|
Methods in org.apache.nutch.scoring.tld with parameters of type Content | |
---|---|
void |
TLDScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
|
void |
TLDScoringFilter.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
|
Uses of Content in org.apache.nutch.scoring.urlmeta |
---|
Methods in org.apache.nutch.scoring.urlmeta with parameters of type Content | |
---|---|
void |
URLMetaScoringFilter.passScoreAfterParsing(Text url,
Content content,
Parse parse)
Takes the metadata, which was lumped inside the content, and replicates it within your parse data. |
void |
URLMetaScoringFilter.passScoreBeforeParsing(Text url,
CrawlDatum datum,
Content content)
Takes the metadata, specified in your "urlmeta.tags" property, from the datum object and injects it into the content. |
Uses of Content in org.apache.nutch.segment |
---|
Methods in org.apache.nutch.segment with parameters of type Content | |
---|---|
boolean |
SegmentMergeFilters.filter(WritableComparable key,
CrawlDatum generateData,
CrawlDatum fetchData,
CrawlDatum sigData,
Content content,
ParseData parseData,
ParseText parseText,
Collection<CrawlDatum> linked)
Iterates over all SegmentMergeFilter extensions and if any of them
returns false, it will return false as well. |
boolean |
SegmentMergeFilter.filter(WritableComparable key,
CrawlDatum generateData,
CrawlDatum fetchData,
CrawlDatum sigData,
Content content,
ParseData parseData,
ParseText parseText,
Collection<CrawlDatum> linked)
The filtering method which gets all information being merged for a given key (URL). |
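`SegmentMergeFilters.filter` is described above as returning false as soon as any registered `SegmentMergeFilter` returns false. A minimal sketch of that all-must-accept chain follows. The `MergeFilter` interface here takes only a URL string, which is a simplification assumed for the example; the real method receives the full set of merged records shown in the signature.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a filter chain in which any single rejection wins.
final class MergeFilterChainSketch {

    // Simplified stand-in for a merge filter: true keeps the entry, false drops it.
    interface MergeFilter {
        boolean filter(String url);
    }

    // Returns false if any filter rejects the URL; returns true otherwise.
    static boolean filter(List<MergeFilter> filters, String url) {
        for (MergeFilter f : filters) {
            if (!f.filter(url)) {
                return false; // short-circuit on the first rejection
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MergeFilter httpsOnly = url -> url.startsWith("https://");
        MergeFilter noQuery = url -> !url.contains("?");
        List<MergeFilter> chain = Arrays.asList(httpsOnly, noQuery);
        System.out.println(filter(chain, "https://example.org/page"));
        System.out.println(filter(chain, "https://example.org/page?id=1"));
    }
}
```

With an empty filter list the chain accepts everything, which matches the "reject only if some filter rejects" reading of the description.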
Uses of Content in org.apache.nutch.util |
---|
Methods in org.apache.nutch.util with parameters of type Content | |
---|---|
void |
EncodingDetector.autoDetectClues(Content content,
boolean filter)
|
String |
EncodingDetector.guessEncoding(Content content,
String defaultValue)
Guess the encoding with the previously specified list of clues. |
Uses of Content in org.creativecommons.nutch |
---|
Methods in org.creativecommons.nutch with parameters of type Content | |
---|---|
ParseResult |
CCParseFilter.filter(Content content,
ParseResult parseResult,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |