Class | Description |
---|---|
CommandRunner | |
CrawlCompletionStats | Extracts some simple crawl completion stats from the crawldb. Stats will be sorted by host/domain and will be of the form: "1 www.spitzer.caltech.edu FETCHED", "50 www.spitzer.caltech.edu UNFETCHED". |
CrawlCompletionStats.CrawlCompletionStatsCombiner | |
DeflateUtils | A collection of utility methods for working on deflated data. |
DomUtil | |
DumpFileUtil | |
EncodingDetector | A simple class for detecting character encodings. |
FSUtils | Utility methods for common filesystem operations. |
GenericWritableConfigurable | A generic Writable wrapper that can inject Configuration to Configurables. |
GZIPUtils | A collection of utility methods for working on GZIPed data. |
HadoopFSUtil | |
JexlUtil | A collection of Jexl utilities. |
LockUtil | Utility methods for handling application-level locking. |
MimeUtil | |
NodeWalker | A utility class that allows the walking of any DOM tree using a stack instead of recursion. |
NutchConfiguration | Utility to create Hadoop Configurations that include Nutch-specific resources. |
NutchJob | A JobConf for Nutch jobs. |
NutchTool | |
ObjectCache | |
PrefixStringMatcher | A class for efficiently matching Strings against a set of prefixes. |
ProtocolStatusStatistics | Extracts protocol status code information from the crawl database. |
ProtocolStatusStatistics.ProtocolStatusStatisticsCombiner | |
StringUtil | A collection of String processing utility methods. |
SuffixStringMatcher | A class for efficiently matching Strings against a set of suffixes. |
TableUtil | |
TimingUtil | |
TrieStringMatcher | TrieStringMatcher is a base class for simple tree-based string matching. |
URLUtil | Utility class for URL analysis. |
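
The NodeWalker entry describes walking a DOM tree with a stack instead of recursion. The following is a minimal sketch of that general technique using only the JDK's `org.w3c.dom` API. It is not the Nutch class: `StackDomWalkSketch`, `walk`, and `parse` are hypothetical names introduced here for illustration, and the actual NodeWalker API may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical illustration only; not the Nutch NodeWalker class. */
class StackDomWalkSketch {

  /** Returns the names of all nodes under root in document (pre-order) order. */
  static List<String> walk(Node root) {
    List<String> visited = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      Node current = stack.pop();
      visited.add(current.getNodeName());
      // Push children in reverse so the first child is popped first.
      NodeList children = current.getChildNodes();
      for (int i = children.getLength() - 1; i >= 0; i--) {
        stack.push(children.item(i));
      }
    }
    return visited;
  }

  /** Parses an XML string into a DOM document (helper for the example). */
  static Document parse(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalStateException("could not parse example XML", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<a><b><c/></b><d/></a>");
    System.out.println(walk(doc.getDocumentElement())); // prints [a, b, c, d]
  }
}
```

Because the pending nodes live in an explicit `Deque` on the heap, the traversal depth is not limited by the thread's call stack, which is the usual motivation for avoiding recursion on deeply nested documents.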
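
The PrefixStringMatcher and TrieStringMatcher entries refer to tree-based matching of strings against a set of prefixes. As a rough sketch of how such a structure can work, the class below builds a character trie and checks whether an input starts with any stored prefix. `PrefixTrieSketch` and its methods are hypothetical names chosen for this example; they are assumptions, not the signatures of the Nutch classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the Nutch PrefixStringMatcher. */
class PrefixTrieSketch {

  private static final class TrieNode {
    final Map<Character, TrieNode> children = new HashMap<>();
    boolean terminal; // true if a stored prefix ends at this node
  }

  private final TrieNode root = new TrieNode();

  /** Builds the trie by inserting each prefix one character at a time. */
  PrefixTrieSketch(String... prefixes) {
    for (String prefix : prefixes) {
      TrieNode node = root;
      for (char c : prefix.toCharArray()) {
        node = node.children.computeIfAbsent(c, k -> new TrieNode());
      }
      node.terminal = true;
    }
  }

  /** Returns true if input starts with any stored prefix. */
  boolean matches(String input) {
    return longestMatch(input) != null;
  }

  /** Returns the longest stored prefix of input, or null if none matches. */
  String longestMatch(String input) {
    TrieNode node = root;
    int best = node.terminal ? 0 : -1; // an empty stored prefix matches everything
    for (int i = 0; i < input.length(); i++) {
      node = node.children.get(input.charAt(i));
      if (node == null) {
        break; // no stored prefix continues along this path
      }
      if (node.terminal) {
        best = i + 1;
      }
    }
    return best < 0 ? null : input.substring(0, best);
  }

  public static void main(String[] args) {
    PrefixTrieSketch matcher = new PrefixTrieSketch("http://", "https://", "ftp://");
    System.out.println(matcher.matches("https://nutch.apache.org/")); // prints true
    System.out.println(matcher.matches("mailto:dev@example.org"));    // prints false
  }
}
```

A lookup touches at most one trie node per input character, so its cost depends on the input length rather than on how many prefixes are stored. A suffix matcher can reuse the same idea by inserting and looking up reversed strings.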
Copyright © 2017 The Apache Software Foundation