Package | Description |
---|---|
org.apache.nutch.fetcher | The Nutch robot. |
org.apache.nutch.hostdb | |
org.apache.nutch.indexer | Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index. |
org.apache.nutch.metadata | A multi-valued Metadata container, and a set of constant fields for Nutch Metadata. |
org.apache.nutch.scoring.webgraph | |
org.apache.nutch.segment | A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links. |
org.apache.nutch.tools.arc | Tools to read the Arc file format. |
org.apache.nutch.tools.warc | Tools to import / export between Nutch segments and WARC archives. |
Modifier and Type | Method and Description |
---|---|
RecordWriter<Text,NutchWritable> | FetcherOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress) |
Modifier and Type | Method and Description |
---|---|
void | Fetcher.run(RecordReader<Text,CrawlDatum> input, OutputCollector<Text,NutchWritable> output, Reporter reporter) |
Constructor and Description |
---|
FetcherThread(Configuration conf, AtomicInteger activeThreads, FetchItemQueues fetchQueues, QueueFeeder feeder, AtomicInteger spinWaiting, AtomicLong lastRequestStart, Reporter reporter, AtomicInteger errors, String segmentName, boolean parsing, OutputCollector<Text,NutchWritable> output, boolean storingContent, AtomicInteger pages, AtomicLong bytes) |
Modifier and Type | Method and Description |
---|---|
void | UpdateHostDbMapper.map(Text key, Writable value, OutputCollector<Text,NutchWritable> output, Reporter reporter)<br>Mapper ingesting records from the HostDB, CrawlDB and plaintext host scores file. |
void | UpdateHostDbReducer.reduce(Text key, Iterator<NutchWritable> values, OutputCollector<Text,HostDatum> output, Reporter reporter) |
Modifier and Type | Method and Description |
---|---|
void | IndexerMapReduce.map(Text key, Writable value, OutputCollector<Text,NutchWritable> output, Reporter reporter) |
void | IndexerMapReduce.reduce(Text key, Iterator<NutchWritable> values, OutputCollector<Text,NutchIndexAction> output, Reporter reporter) |
Modifier and Type | Class and Description |
---|---|
class | MetaWrapper<br>A simple decorator that adds metadata to any Writable that can be serialized by NutchWritable. |
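The decorator idea behind MetaWrapper can be illustrated with a minimal sketch. This is not the Nutch class: `MetaWrapperSketch` and its methods `addMeta`, `getMeta`, and `getMetaValues` are hypothetical names chosen for illustration, and the sketch uses plain Java collections rather than Hadoop's Writable serialization. It only shows how a wrapper can carry multi-valued metadata alongside an unchanged payload.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not org.apache.nutch.segment.MetaWrapper.
final class MetaWrapperSketch<T> {
    // The decorated value is stored as-is and never modified.
    private final T wrapped;
    // Multi-valued metadata: each name maps to a list of values.
    private final Map<String, List<String>> meta = new LinkedHashMap<>();

    MetaWrapperSketch(T wrapped) {
        this.wrapped = wrapped;
    }

    // Returns the decorated value.
    T get() {
        return wrapped;
    }

    // Appends a value under the given name, keeping earlier values.
    void addMeta(String name, String value) {
        meta.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Returns the first value stored under the name, or null if none exists.
    String getMeta(String name) {
        List<String> values = meta.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    // Returns all values stored under the name (empty list if none).
    List<String> getMetaValues(String name) {
        List<String> values = meta.get(name);
        return values == null ? Collections.<String>emptyList() : values;
    }

    public static void main(String[] args) {
        MetaWrapperSketch<String> w = new MetaWrapperSketch<>("raw content");
        w.addMeta("segment", "20160101");
        System.out.println(w.get() + " -> " + w.getMeta("segment"));
    }
}
```

Keeping the metadata in the wrapper rather than in the payload means the payload type needs no changes, which is the point of the decorator approach.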
Modifier and Type | Method and Description |
---|---|
void | WebGraph.OutlinkDb.map(Text key, Writable value, OutputCollector<Text,NutchWritable> output, Reporter reporter)<br>Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from new crawls' ParseData. |
void | WebGraph.OutlinkDb.reduce(Text key, Iterator<NutchWritable> values, OutputCollector<Text,LinkDatum> output, Reporter reporter) |
Modifier and Type | Method and Description |
---|---|
void | SegmentReader.InputCompatMapper.map(WritableComparable<?> key, Writable value, OutputCollector<Text,NutchWritable> collector, Reporter reporter) |
void | SegmentReader.reduce(Text key, Iterator<NutchWritable> values, OutputCollector<Text,Text> output, Reporter reporter) |
Modifier and Type | Method and Description |
---|---|
void | ArcSegmentCreator.map(Text key, BytesWritable bytes, OutputCollector<Text,NutchWritable> output, Reporter reporter)<br>Runs the Map job to translate an arc record into output for Nutch segments. |
Modifier and Type | Method and Description |
---|---|
void | WARCExporter.WARCReducer.map(Text key, Writable value, OutputCollector<Text,NutchWritable> output, Reporter reporter) |
void | WARCExporter.WARCReducer.reduce(Text key, Iterator<NutchWritable> values, OutputCollector<NullWritable,com.martinkl.warc.WARCWritable> output, Reporter reporter) |
Copyright © 2016 The Apache Software Foundation