Uses of Class org.apache.nutch.crawl.CrawlDatum
===
Packages that use CrawlDatum
---

Package | Description
---|---
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.crawl | Crawl control code. |
org.apache.nutch.fetcher | The Nutch robot. |
org.apache.nutch.indexer | Maintain Lucene full-text indexes. |
org.apache.nutch.indexer.basic | A basic indexing plugin. |
org.apache.nutch.indexer.more | An indexing plugin that adds more metadata fields to documents. |
org.apache.nutch.indexer.solr | |
org.apache.nutch.microformats.reltag | A microformats Rel-Tag Parser/Indexer/Querier plugin. |
org.apache.nutch.protocol | |
org.apache.nutch.protocol.file | Protocol plugin which supports retrieving local file resources. |
org.apache.nutch.protocol.ftp | Protocol plugin which supports retrieving documents via the ftp protocol. |
org.apache.nutch.protocol.http | Protocol plugin which supports retrieving documents via the http protocol. |
org.apache.nutch.protocol.http.api | Common API used by HTTP plugins (http, httpclient). |
org.apache.nutch.protocol.httpclient | Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server. |
org.apache.nutch.scoring | |
org.apache.nutch.scoring.opic | |
org.apache.nutch.scoring.webgraph | |
org.apache.nutch.segment | |
org.apache.nutch.tools | |
org.apache.nutch.util.domain | |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Uses of CrawlDatum in org.apache.nutch.analysis.lang
---

Methods in org.apache.nutch.analysis.lang with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | LanguageIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | 
Uses of CrawlDatum in org.apache.nutch.crawl
---

Fields in org.apache.nutch.crawl declared as CrawlDatum

Modifier and Type | Field | Description
---|---|---
CrawlDatum | Generator.SelectorEntry.datum | 

Methods in org.apache.nutch.crawl that return CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap) | Resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and the page signature, so that refetching is forced.
CrawlDatum | FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap) | Resets fetchTime, fetchInterval, modifiedTime and the page signature, so that refetching is forced.
CrawlDatum | CrawlDbReader.get(String crawlDb, String url, Configuration config) | 
CrawlDatum | AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum) | Initialize fetch schedule related data.
CrawlDatum | FetchSchedule.initializeSchedule(Text url, CrawlDatum datum) | Initialize fetch schedule related data.
static CrawlDatum | CrawlDatum.read(DataInput in) | 
CrawlDatum | AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | Sets the fetchInterval and fetchTime on a successfully fetched page.
CrawlDatum | FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | Sets the fetchInterval and fetchTime on a successfully fetched page.
CrawlDatum | DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | 
CrawlDatum | AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | 
CrawlDatum | AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Specifies how to schedule refetching of pages marked as GONE.
CrawlDatum | FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Specifies how to schedule refetching of pages marked as GONE.
CrawlDatum | AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Adjusts the fetch schedule if fetching needs to be retried due to transient errors.
CrawlDatum | FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Adjusts the fetch schedule if fetching needs to be retried due to transient errors.
Methods in org.apache.nutch.crawl that return types with arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
RecordWriter<Text,CrawlDatum> | CrawlDbReader.CrawlDatumCsvOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress) | 
Methods in org.apache.nutch.crawl with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
long | AbstractFetchSchedule.calculateLastFetchTime(CrawlDatum datum) | Returns the last fetch time of the CrawlDatum.
long | FetchSchedule.calculateLastFetchTime(CrawlDatum datum) | Calculates the last fetch time of the given CrawlDatum.
int | CrawlDatum.compareTo(CrawlDatum that) | Sort by decreasing score.
CrawlDatum | AbstractFetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap) | Resets fetchTime, fetchInterval, modifiedTime, retriesSinceFetch and the page signature, so that refetching is forced.
CrawlDatum | FetchSchedule.forceRefetch(Text url, CrawlDatum datum, boolean asap) | Resets fetchTime, fetchInterval, modifiedTime and the page signature, so that refetching is forced.
static boolean | CrawlDatum.hasDbStatus(CrawlDatum datum) | 
static boolean | CrawlDatum.hasFetchStatus(CrawlDatum datum) | 
CrawlDatum | AbstractFetchSchedule.initializeSchedule(Text url, CrawlDatum datum) | Initialize fetch schedule related data.
CrawlDatum | FetchSchedule.initializeSchedule(Text url, CrawlDatum datum) | Initialize fetch schedule related data.
void | Generator.Selector.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Generator.SelectorEntry> output, Reporter reporter) | Select & invert subset due for fetch.
void | CrawlDbReader.CrawlDbTopNMapper.map(Text key, CrawlDatum value, OutputCollector<FloatWritable,Text> output, Reporter reporter) | 
void | Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDbReader.CrawlDbStatMapper.map(Text key, CrawlDatum value, OutputCollector<Text,LongWritable> output, Reporter reporter) | 
void | CrawlDatum.putAllMetaData(CrawlDatum other) | Add all metadata from the other CrawlDatum to this CrawlDatum.
void | CrawlDatum.set(CrawlDatum that) | Copy the contents of another instance into this instance.
CrawlDatum | AbstractFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | Sets the fetchInterval and fetchTime on a successfully fetched page.
CrawlDatum | FetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | Sets the fetchInterval and fetchTime on a successfully fetched page.
CrawlDatum | DefaultFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | 
CrawlDatum | AdaptiveFetchSchedule.setFetchSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime, long modifiedTime, int state) | 
CrawlDatum | AbstractFetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Specifies how to schedule refetching of pages marked as GONE.
CrawlDatum | FetchSchedule.setPageGoneSchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Specifies how to schedule refetching of pages marked as GONE.
CrawlDatum | AbstractFetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Adjusts the fetch schedule if fetching needs to be retried due to transient errors.
CrawlDatum | FetchSchedule.setPageRetrySchedule(Text url, CrawlDatum datum, long prevFetchTime, long prevModifiedTime, long fetchTime) | Adjusts the fetch schedule if fetching needs to be retried due to transient errors.
boolean | AbstractFetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime) | Indicates whether the page is suitable for selection in the current fetchlist.
boolean | FetchSchedule.shouldFetch(Text url, CrawlDatum datum, long curTime) | Indicates whether the page is suitable for selection in the current fetchlist.
void | CrawlDbReader.CrawlDatumCsvOutputFormat.LineRecordWriter.write(Text key, CrawlDatum value) | 
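For inspection and debugging, static helpers above such as hasDbStatus pair naturally with a raw scan of the CrawlDb, which is stored as Hadoop SequenceFiles of Text keys and CrawlDatum values. A sketch under that assumption; the part-file path is hypothetical:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;

public class CrawlDbDump {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical path: CrawlDb data lives under <crawldb>/current/part-*/data.
    Path part = new Path("crawl/crawldb/current/part-00000/data");

    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    try {
      Text url = new Text();
      CrawlDatum datum = new CrawlDatum();
      while (reader.next(url, datum)) {
        // hasDbStatus() distinguishes db_* entries from transient fetch_* ones.
        if (CrawlDatum.hasDbStatus(datum)) {
          System.out.println(url + "\t" + datum.getStatus() + "\t" + datum.getScore());
        }
      }
    } finally {
      reader.close();
    }
  }
}
```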
Method parameters in org.apache.nutch.crawl with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | Generator.CrawlDbUpdater.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDbFilter.map(Text key, CrawlDatum value, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | Injector.InjectMapper.map(WritableComparable key, Text value, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDbMerger.Merger.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | Injector.InjectReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | Generator.CrawlDbUpdater.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDbReducer.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | Generator.PartitionReducer.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
Uses of CrawlDatum in org.apache.nutch.fetcher
---

Methods in org.apache.nutch.fetcher that return CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | FetcherOutput.getCrawlDatum() | 

Method parameters in org.apache.nutch.fetcher with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | Fetcher.run(RecordReader<Text,CrawlDatum> input, OutputCollector<Text,NutchWritable> output, Reporter reporter) | 

Constructors in org.apache.nutch.fetcher with parameters of type CrawlDatum

Constructor | Description
---|---
FetcherOutput(CrawlDatum crawlDatum, Content content, ParseImpl parse) | 
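FetcherOutput simply bundles the per-URL fetch state with the fetched content and an optional parse. A minimal construction sketch; the nulls stand in for the Content and ParseImpl that a protocol plugin and parser would normally supply:

```java
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.fetcher.FetcherOutput;
import org.apache.nutch.parse.ParseImpl;
import org.apache.nutch.protocol.Content;

public class FetcherOutputSketch {
  public static void main(String[] args) {
    // Fetch state for one URL, marked as successfully fetched.
    CrawlDatum fetchDatum = new CrawlDatum();
    fetchDatum.setStatus(CrawlDatum.STATUS_FETCH_SUCCESS);

    // Content normally comes from Protocol.getProtocolOutput(); parse may be
    // deferred to a separate job, so both are left null in this sketch.
    FetcherOutput out = new FetcherOutput(fetchDatum, (Content) null, (ParseImpl) null);

    System.out.println("status: " + out.getCrawlDatum().getStatus());
  }
}
```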
Uses of CrawlDatum in org.apache.nutch.indexer
---

Methods in org.apache.nutch.indexer with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | IndexingFilters.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | Run all defined filters.
NutchDocument | IndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | Adds fields or otherwise modifies the document that will be indexed for a parse.
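IndexingFilter.filter is the extension point implemented by the plugin packages in the sections that follow. A minimal custom filter sketch, assuming the Nutch 1.x plugin API (the exact interface members vary slightly across versions); the fetchTime field name is illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

public class FetchTimeIndexingFilter implements IndexingFilter {
  private Configuration conf;

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    // Expose the page's last fetch time as an index field.
    doc.add("fetchTime", Long.toString(datum.getFetchTime()));
    return doc;  // returning null would drop the document from the index
  }

  public Configuration getConf() { return conf; }
  public void setConf(Configuration conf) { this.conf = conf; }
}
```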
Uses of CrawlDatum in org.apache.nutch.indexer.basic
---

Methods in org.apache.nutch.indexer.basic with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | BasicIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | 

Uses of CrawlDatum in org.apache.nutch.indexer.more
---

Methods in org.apache.nutch.indexer.more with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | MoreIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | 

Uses of CrawlDatum in org.apache.nutch.indexer.solr
---

Methods in org.apache.nutch.indexer.solr with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | SolrClean.DBFilter.map(Text key, CrawlDatum value, OutputCollector<ByteWritable,Text> output, Reporter reporter) | 

Uses of CrawlDatum in org.apache.nutch.microformats.reltag
---

Methods in org.apache.nutch.microformats.reltag with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | RelTagIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | 
Uses of CrawlDatum in org.apache.nutch.protocol
---

Methods in org.apache.nutch.protocol with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
ProtocolOutput | Protocol.getProtocolOutput(Text url, CrawlDatum datum) | Returns the Content for a fetchlist entry.
RobotRules | Protocol.getRobotRules(Text url, CrawlDatum datum) | Retrieve robot rules applicable for this url.
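A protocol plugin is usually obtained through a factory rather than instantiated directly. A sketch of a one-off fetch, assuming ProtocolFactory and NutchConfiguration from Nutch core, which this page does not list:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.protocol.Protocol;
import org.apache.nutch.protocol.ProtocolFactory;
import org.apache.nutch.protocol.ProtocolOutput;
import org.apache.nutch.util.NutchConfiguration;

public class ProtocolSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = NutchConfiguration.create();
    String url = "http://example.com/";

    // The factory resolves the plugin for the URL's scheme (http, ftp, file ...).
    Protocol protocol = new ProtocolFactory(conf).getProtocol(url);

    // A fresh datum stands in for the fetchlist entry the fetcher would pass.
    CrawlDatum datum = new CrawlDatum();
    ProtocolOutput output = protocol.getProtocolOutput(new Text(url), datum);

    System.out.println("status: " + output.getStatus());
    Content content = output.getContent();  // may be empty on failure
    if (content != null) {
      System.out.println("type: " + content.getContentType());
    }
  }
}
```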
Uses of CrawlDatum in org.apache.nutch.protocol.file
---

Methods in org.apache.nutch.protocol.file with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
ProtocolOutput | File.getProtocolOutput(Text url, CrawlDatum datum) | 
RobotRules | File.getRobotRules(Text url, CrawlDatum datum) | 

Constructors in org.apache.nutch.protocol.file with parameters of type CrawlDatum

Constructor | Description
---|---
FileResponse(URL url, CrawlDatum datum, File file, Configuration conf) | 

Uses of CrawlDatum in org.apache.nutch.protocol.ftp
---

Methods in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
ProtocolOutput | Ftp.getProtocolOutput(Text url, CrawlDatum datum) | 
RobotRules | Ftp.getRobotRules(Text url, CrawlDatum datum) | 

Constructors in org.apache.nutch.protocol.ftp with parameters of type CrawlDatum

Constructor | Description
---|---
FtpResponse(URL url, CrawlDatum datum, Ftp ftp, Configuration conf) | 

Uses of CrawlDatum in org.apache.nutch.protocol.http
---

Methods in org.apache.nutch.protocol.http with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
protected Response | Http.getResponse(URL url, CrawlDatum datum, boolean redirect) | 

Constructors in org.apache.nutch.protocol.http with parameters of type CrawlDatum

Constructor | Description
---|---
HttpResponse(HttpBase http, URL url, CrawlDatum datum) | 

Uses of CrawlDatum in org.apache.nutch.protocol.http.api
---

Methods in org.apache.nutch.protocol.http.api with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
ProtocolOutput | HttpBase.getProtocolOutput(Text url, CrawlDatum datum) | 
protected abstract Response | HttpBase.getResponse(URL url, CrawlDatum datum, boolean followRedirects) | 
RobotRules | HttpBase.getRobotRules(Text url, CrawlDatum datum) | 

Uses of CrawlDatum in org.apache.nutch.protocol.httpclient
---

Methods in org.apache.nutch.protocol.httpclient with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
protected Response | Http.getResponse(URL url, CrawlDatum datum, boolean redirect) | Fetches the url with a configured HTTP client and gets the response.
Uses of CrawlDatum in org.apache.nutch.scoring
---

Methods in org.apache.nutch.scoring that return CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | 
CrawlDatum | ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Distribute score value from the current page to all its outlinked pages.

Methods in org.apache.nutch.scoring with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | 
CrawlDatum | ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Distribute score value from the current page to all its outlinked pages.
float | ScoringFilters.generatorSortValue(Text url, CrawlDatum datum, float initSort) | Calculate a sort value for Generate.
float | ScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) | Prepares a sort value for selecting the top N scoring pages during fetchlist generation.
float | ScoringFilters.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) | 
float | ScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) | Calculates a Lucene document boost.
void | ScoringFilters.initialScore(Text url, CrawlDatum datum) | Calculate a new initial score, used when adding newly discovered pages.
void | ScoringFilter.initialScore(Text url, CrawlDatum datum) | Set an initial score for newly discovered pages.
void | ScoringFilters.injectedScore(Text url, CrawlDatum datum) | Calculate a new initial score, used when injecting new pages.
void | ScoringFilter.injectedScore(Text url, CrawlDatum datum) | Set an initial score for newly injected pages.
void | ScoringFilters.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) | 
void | ScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) | Takes all relevant score information from the current datum (coming from a generated fetchlist) and stores it into Content metadata.
void | ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) | Calculate updated page score during CrawlDb.update().
void | ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) | Calculates a new score of a CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum and on score values contributed by inlinked pages.

Method parameters in org.apache.nutch.scoring with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | ScoringFilters.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | 
CrawlDatum | ScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Distribute score value from the current page to all its outlinked pages.
void | ScoringFilters.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) | Calculate updated page score during CrawlDb.update().
void | ScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List<CrawlDatum> inlinked) | Calculates a new score of a CrawlDatum during CrawlDb update, based on the initial value of the original CrawlDatum and on score values contributed by inlinked pages.
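The ScoringFilters facade runs every activated ScoringFilter plugin in turn, which is how the OPIC implementation in the next section gets invoked. A sketch of the injection- and generation-time hooks, assuming NutchConfiguration from Nutch core:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.scoring.ScoringFilterException;
import org.apache.nutch.scoring.ScoringFilters;
import org.apache.nutch.util.NutchConfiguration;

public class ScoringSketch {
  public static void main(String[] args) throws ScoringFilterException {
    Configuration conf = NutchConfiguration.create();
    // Aggregates all activated ScoringFilter plugins behind one facade.
    ScoringFilters scfilters = new ScoringFilters(conf);

    Text url = new Text("http://example.com/");
    CrawlDatum datum = new CrawlDatum();

    // Injection-time scoring, as the Injector does for seed URLs.
    scfilters.injectedScore(url, datum);

    // Sort value the Generator uses to pick the top-N fetchlist.
    float sort = scfilters.generatorSortValue(url, datum, datum.getScore());
    System.out.println("score=" + datum.getScore() + " sort=" + sort);
  }
}
```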
Uses of CrawlDatum in org.apache.nutch.scoring.opic
---

Methods in org.apache.nutch.scoring.opic that return CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply it to each outlink.

Methods in org.apache.nutch.scoring.opic with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply it to each outlink.
float | OPICScoringFilter.generatorSortValue(Text url, CrawlDatum datum, float initSort) | Use getScore().
float | OPICScoringFilter.indexerScore(Text url, NutchDocument doc, CrawlDatum dbDatum, CrawlDatum fetchDatum, Parse parse, Inlinks inlinks, float initScore) | Dampen the boost value by scorePower.
void | OPICScoringFilter.initialScore(Text url, CrawlDatum datum) | Set to 0.0f (unknown value); inlink contributions will bring it to a correct level.
void | OPICScoringFilter.injectedScore(Text url, CrawlDatum datum) | 
void | OPICScoringFilter.passScoreBeforeParsing(Text url, CrawlDatum datum, Content content) | Store a float value of CrawlDatum.getScore() under Fetcher.SCORE_KEY.
void | OPICScoringFilter.updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked) | Increase the score by a sum of inlinked scores.

Method parameters in org.apache.nutch.scoring.opic with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
CrawlDatum | OPICScoringFilter.distributeScoreToOutlinks(Text fromUrl, ParseData parseData, Collection<Map.Entry<Text,CrawlDatum>> targets, CrawlDatum adjust, int allCount) | Get a float value from Fetcher.SCORE_KEY, divide it by the number of outlinks and apply it to each outlink.
Uses of CrawlDatum in org.apache.nutch.scoring.webgraph
---

Method parameters in org.apache.nutch.scoring.webgraph with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | ScoreUpdater.reduce(Text key, Iterator<ObjectWritable> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.
Uses of CrawlDatum in org.apache.nutch.segment
---

Methods in org.apache.nutch.segment with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
boolean | SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked) | Iterates over all SegmentMergeFilter extensions and returns false if any of them returns false.
boolean | SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked) | The filtering method, which gets all information being merged for a given key (URL).

Method parameters in org.apache.nutch.segment with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
boolean | SegmentMergeFilters.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked) | Iterates over all SegmentMergeFilter extensions and returns false if any of them returns false.
boolean | SegmentMergeFilter.filter(WritableComparable key, CrawlDatum generateData, CrawlDatum fetchData, CrawlDatum sigData, Content content, ParseData parseData, ParseText parseText, Collection<CrawlDatum> linked) | The filtering method, which gets all information being merged for a given key (URL).
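A SegmentMergeFilter receives every piece of per-URL data being merged and votes on whether to keep the entry. A sketch with a deliberately simple, illustrative policy (keep only successfully fetched pages):

```java
import java.util.Collection;
import org.apache.hadoop.io.WritableComparable;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.parse.ParseData;
import org.apache.nutch.parse.ParseText;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.segment.SegmentMergeFilter;

// Drops entries whose fetch did not succeed, so merged segments retain
// only successfully fetched pages. Purely illustrative policy.
public class SuccessfulFetchMergeFilter implements SegmentMergeFilter {
  public boolean filter(WritableComparable key, CrawlDatum generateData,
      CrawlDatum fetchData, CrawlDatum sigData, Content content,
      ParseData parseData, ParseText parseText,
      Collection<CrawlDatum> linked) {
    return fetchData != null
        && fetchData.getStatus() == CrawlDatum.STATUS_FETCH_SUCCESS;
  }
}
```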
Uses of CrawlDatum in org.apache.nutch.tools
---

Methods in org.apache.nutch.tools with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 

Method parameters in org.apache.nutch.tools with type arguments of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | CrawlDBScanner.map(Text url, CrawlDatum crawlDatum, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | CrawlDBScanner.reduce(Text key, Iterator<CrawlDatum> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
void | FreeGenerator.FG.reduce(Text key, Iterator<Generator.SelectorEntry> values, OutputCollector<Text,CrawlDatum> output, Reporter reporter) | 
Uses of CrawlDatum in org.apache.nutch.util.domain
---

Methods in org.apache.nutch.util.domain with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
void | DomainStatistics.map(Text urlText, CrawlDatum datum, OutputCollector<Text,LongWritable> output, Reporter reporter) | 
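DomainStatistics above is a counting job over the CrawlDb; the same Text/CrawlDatum mapper shape supports many ad-hoc statistics. A sketch that counts pages per host, with a hypothetical class name; pair it with a summing reducer:

```java
import java.io.IOException;
import java.net.URL;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.nutch.crawl.CrawlDatum;

// Emits (host, 1) per CrawlDb entry; a summing reducer then yields
// per-host page counts, in the spirit of DomainStatistics above.
public class HostCountMapper extends MapReduceBase
    implements Mapper<Text, CrawlDatum, Text, LongWritable> {

  private static final LongWritable ONE = new LongWritable(1);

  public void map(Text urlText, CrawlDatum datum,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    try {
      String host = new URL(urlText.toString()).getHost();
      output.collect(new Text(host), ONE);
    } catch (Exception e) {
      reporter.incrCounter("HostCount", "malformed-url", 1);
    }
  }
}
```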
Uses of CrawlDatum in org.creativecommons.nutch
---

Methods in org.creativecommons.nutch with parameters of type CrawlDatum

Modifier and Type | Method | Description
---|---|---
NutchDocument | CCIndexingFilter.filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) | 