Package | Description |
---|---|
org.apache.nutch.segment | A segment stores all data from one generate/fetch/update cycle: fetch list, protocol status, raw content, parsed content, and extracted outgoing links. |
Modifier and Type | Method and Description |
---|---|
RecordReader<Text,MetaWrapper> | SegmentMerger.ObjectInputFormat.getRecordReader(InputSplit split, JobConf job, Reporter reporter) |
RecordWriter<Text,MetaWrapper> | SegmentMerger.SegmentOutputFormat.getRecordWriter(FileSystem fs, JobConf job, String name, Progressable progress) |
Modifier and Type | Method and Description |
---|---|
void | SegmentMerger.map(Text key, MetaWrapper value, OutputCollector<Text,MetaWrapper> output, Reporter reporter) |
Modifier and Type | Method and Description |
---|---|
void | SegmentMerger.map(Text key, MetaWrapper value, OutputCollector<Text,MetaWrapper> output, Reporter reporter) |
void | SegmentMerger.reduce(Text key, Iterator<MetaWrapper> values, OutputCollector<Text,MetaWrapper> output, Reporter reporter) NOTE: in selecting the latest version we rely exclusively on the segment name (not all segment data contain time information). |
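The note on `SegmentMerger.reduce` says the latest version of a record is chosen by segment name alone. The sketch below illustrates that idea in isolation and is not the Nutch implementation: `Tagged` is a hypothetical stand-in for a value that knows which segment it came from, and it assumes segment names are timestamp-style strings (for example `20160212101500`) whose lexicographic order matches their chronological order.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LatestBySegmentName {

    /** Hypothetical stand-in for a value tagged with its source segment. */
    static final class Tagged {
        final String segmentName; // assumed timestamp-style, e.g. "20160105093000"
        final String payload;

        Tagged(String segmentName, String payload) {
            this.segmentName = segmentName;
            this.payload = payload;
        }
    }

    /** Returns the value whose segment name sorts last, or null if there are none. */
    static Tagged pickLatest(Iterator<Tagged> values) {
        Tagged latest = null;
        while (values.hasNext()) {
            Tagged candidate = values.next();
            // Only the segment name is compared; no timestamp inside the value is consulted.
            if (latest == null || candidate.segmentName.compareTo(latest.segmentName) > 0) {
                latest = candidate;
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Tagged> versions = Arrays.asList(
            new Tagged("20160105093000", "older copy"),
            new Tagged("20160212101500", "newer copy"),
            new Tagged("20160101000000", "oldest copy"));
        System.out.println(pickLatest(versions.iterator()).payload); // prints "newer copy"
    }
}
```

Comparing names rather than embedded timestamps works for every kind of segment data, including parts that carry no time information, but it only gives a meaningful order if segment names follow a sortable naming scheme.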
Copyright © 2016 The Apache Software Foundation