Packages that use org.apache.nutch.net

| Package | Description |
|---|---|
| org.apache.nutch.collection | Subcollection is a subset of an index. |
| org.apache.nutch.hostdb | |
| org.apache.nutch.indexer | Index content, configure and run indexing and cleaning jobs to add, update, and delete documents from an index. |
| org.apache.nutch.net | Web-related interfaces: URL filters and normalizers. |
| org.apache.nutch.net.urlnormalizer.ajax | |
| org.apache.nutch.net.urlnormalizer.basic | URL normalizer performing basic normalizations: remove default ports and dot segments in path. |
| org.apache.nutch.net.urlnormalizer.host | URL normalizer renaming hosts to a canonical form listed in the configuration file. |
| org.apache.nutch.net.urlnormalizer.pass | Dummy URL normalizer which does not change URLs. |
| org.apache.nutch.net.urlnormalizer.protocol | |
| org.apache.nutch.net.urlnormalizer.querystring | URL normalizer which sorts the elements in the query part to avoid duplicates by permutations. |
| org.apache.nutch.net.urlnormalizer.regex | URL normalizer with configurable rules based on regular expressions (Pattern). |
| org.apache.nutch.net.urlnormalizer.slash | |
| org.apache.nutch.parse | The Parse interface and related classes. |
| org.apache.nutch.urlfilter.api | Generic URL filter library, abstracting away from regular expression implementations. |
| org.apache.nutch.urlfilter.automaton | URL filter plugin based on dk.brics.automaton Finite-State Automata for Java™. |
| org.apache.nutch.urlfilter.domain | URL filter plugin to include only URLs which match an element in a given list of domain suffixes, domain names, and/or host names. |
| org.apache.nutch.urlfilter.domainblacklist | URL filter plugin to exclude URLs by domain suffixes, domain names, and/or host names. |
| org.apache.nutch.urlfilter.ignoreexempt | URL filter plugin which identifies exemptions to external URLs when external URLs are set to be ignored. |
| org.apache.nutch.urlfilter.prefix | URL filter plugin to include only URLs which match one of a given list of URL prefixes. |
| org.apache.nutch.urlfilter.regex | URL filter plugin to include and/or exclude URLs matching Java regular expressions. |
| org.apache.nutch.urlfilter.suffix | URL filter plugin to either exclude or include only URLs which match one of the given (path) suffixes. |
| org.apache.nutch.urlfilter.validator | URL filter plugin that validates given URLs. |

Classes in org.apache.nutch.net used by org.apache.nutch.collection

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |

Classes in org.apache.nutch.net used by org.apache.nutch.hostdb

| Class | Description |
|---|---|
| URLFilters | Creates and caches URLFilter implementing plugins. |
| URLNormalizers | This class uses a "chained filter" pattern to run defined normalizers. |
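
The "chained filter" pattern described for URLNormalizers passes each stage's output to the next stage. The sketch below is a minimal illustration of that idea, not Nutch's code. ChainSketch is a hypothetical name, and stages are modelled as java.util.function.UnaryOperator. As an assumption borrowed from filter semantics, a stage that returns null rejects the URL and stops the chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a "chained filter"; not Nutch's URLNormalizers code.
final class ChainSketch {
  private final List<UnaryOperator<String>> stages;

  @SafeVarargs
  ChainSketch(UnaryOperator<String>... stages) {
    this.stages = new ArrayList<>(Arrays.asList(stages));
  }

  /** Feeds each stage's output to the next; a null result stops the chain. */
  String run(String url) {
    String current = url;
    for (UnaryOperator<String> stage : stages) {
      if (current == null) {
        return null; // an earlier stage rejected the URL
      }
      current = stage.apply(current);
    }
    return current;
  }

  public static void main(String[] args) {
    ChainSketch chain = new ChainSketch(
        String::trim,                         // example normalizing stage
        u -> u.endsWith(".gif") ? null : u);  // example rejecting stage
    System.out.println(chain.run("  http://example.org/a.html "));
    System.out.println(chain.run("http://example.org/logo.gif"));
  }
}
```

The stages live in an ordered list because normalizations generally do not commute. Running them in a fixed order makes the result reproducible.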

Classes in org.apache.nutch.net used by org.apache.nutch.indexer

| Class | Description |
|---|---|
| URLNormalizers | This class uses a "chained filter" pattern to run defined normalizers. |

Classes in org.apache.nutch.net used by org.apache.nutch.net

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
| URLFilterException | |

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.ajax

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.basic

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |
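
The basic normalizer performs two operations: removing default ports and removing dot segments in the path. Both can be approximated with java.net.URI to show what a URLNormalizer implementation does. This is a hedged sketch, not the plugin's code. BasicNormalizerSketch is a hypothetical name, and only the http (80) and https (443) default ports are handled. Dot-segment removal relies on URI.normalize().

```java
import java.net.URI;

// Hypothetical sketch of two basic normalizations; not the plugin's code.
final class BasicNormalizerSketch {
  /** Default port for the schemes this sketch knows about, else -1. */
  private static int defaultPort(String scheme) {
    if ("http".equalsIgnoreCase(scheme)) return 80;
    if ("https".equalsIgnoreCase(scheme)) return 443;
    return -1;
  }

  /** Removes dot segments and an explicit default port; unparsable input throws IllegalArgumentException. */
  static String normalize(String url) {
    URI uri = URI.create(url).normalize(); // removes "." and "segment/.." parts of the path
    if (uri.getHost() == null || uri.getScheme() == null) {
      return uri.toString(); // no server-based authority to rewrite
    }
    int port = uri.getPort();
    if (port == defaultPort(uri.getScheme())) {
      port = -1; // an explicit default port is redundant
    }
    StringBuilder sb = new StringBuilder(uri.getScheme()).append("://");
    if (uri.getRawUserInfo() != null) {
      sb.append(uri.getRawUserInfo()).append('@');
    }
    sb.append(uri.getHost());
    if (port != -1) {
      sb.append(':').append(port);
    }
    sb.append(uri.getRawPath());
    if (uri.getRawQuery() != null) {
      sb.append('?').append(uri.getRawQuery());
    }
    if (uri.getRawFragment() != null) {
      sb.append('#').append(uri.getRawFragment());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(normalize("http://example.com:80/a/./b/../c.html"));
  }
}
```

The URL is reassembled from the raw (still percent-encoded) components. This avoids decoding and re-encoding the path and query.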

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.host

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |
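
The host normalizer renames hosts to a canonical form listed in a configuration file. The sketch below assumes that file has already been loaded into a plain map from exact host names to canonical host names. HostNormalizerSketch is hypothetical, and it does not model any wildcard or pattern entries the real plugin may accept.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; assumes the configuration file is already parsed into a map.
final class HostNormalizerSketch {
  private final Map<String, String> canonicalHosts = new HashMap<>();

  HostNormalizerSketch(Map<String, String> hostToCanonical) {
    // Host names are compared case-insensitively (assumption of this sketch).
    hostToCanonical.forEach((h, c) -> canonicalHosts.put(h.toLowerCase(Locale.ROOT), c));
  }

  String normalize(String url) {
    URI uri = URI.create(url);
    String host = uri.getHost();
    String canonical = host == null ? null : canonicalHosts.get(host.toLowerCase(Locale.ROOT));
    if (canonical == null || uri.getScheme() == null) {
      return url; // host not listed: leave the URL unchanged
    }
    StringBuilder sb = new StringBuilder(uri.getScheme()).append("://");
    if (uri.getRawUserInfo() != null) sb.append(uri.getRawUserInfo()).append('@');
    sb.append(canonical);
    if (uri.getPort() != -1) sb.append(':').append(uri.getPort());
    sb.append(uri.getRawPath());
    if (uri.getRawQuery() != null) sb.append('?').append(uri.getRawQuery());
    if (uri.getRawFragment() != null) sb.append('#').append(uri.getRawFragment());
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String> config = new HashMap<>();
    config.put("www.example.org", "example.org"); // hypothetical entry
    System.out.println(new HostNormalizerSketch(config).normalize("http://www.example.org:8080/a?b=1"));
  }
}
```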

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.pass

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.protocol

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.querystring

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |
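
The querystring normalizer sorts the elements of the query part. As a result, permutations such as ?b=2&a=1 and ?a=1&b=2 collapse into a single URL. The minimal sketch below assumes that elements are separated by & and compared as plain strings. QueryStringSortSketch is hypothetical. Dropping empty elements is an additional assumption of this sketch.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of query-element sorting; not the plugin's code.
final class QueryStringSortSketch {
  static String normalize(String url) {
    int hash = url.indexOf('#');
    String fragment = hash < 0 ? "" : url.substring(hash);
    String beforeFragment = hash < 0 ? url : url.substring(0, hash);
    int q = beforeFragment.indexOf('?');
    if (q < 0) {
      return url; // no query part to sort
    }
    String sortedQuery = Arrays.stream(beforeFragment.substring(q + 1).split("&"))
        .filter(element -> !element.isEmpty()) // assumption: drop empty elements
        .sorted()                              // plain lexicographic String order
        .collect(Collectors.joining("&"));
    String base = beforeFragment.substring(0, q);
    return (sortedQuery.isEmpty() ? base : base + "?" + sortedQuery) + fragment;
  }

  public static void main(String[] args) {
    System.out.println(normalize("http://example.com/p?b=2&a=1#top"));
  }
}
```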

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.regex

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |
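
The regex normalizer applies configurable rules based on java.util.regex.Pattern. This sketch assumes each rule is a (pattern, substitution) pair applied in order with Matcher.replaceAll, so later rules see the output of earlier ones. RegexNormalizerSketch and the two example rules are hypothetical and do not reflect the plugin's configuration format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; rules are (pattern, substitution) pairs applied in order.
final class RegexNormalizerSketch {
  private final List<Pattern> patterns = new ArrayList<>();
  private final List<String> substitutions = new ArrayList<>();

  RegexNormalizerSketch addRule(String regex, String substitution) {
    patterns.add(Pattern.compile(regex));
    substitutions.add(substitution);
    return this;
  }

  String normalize(String url) {
    String result = url;
    for (int i = 0; i < patterns.size(); i++) {
      // Matcher.replaceAll treats '$' and backslash in the substitution specially ($1 = group 1).
      result = patterns.get(i).matcher(result).replaceAll(substitutions.get(i));
    }
    return result;
  }

  public static void main(String[] args) {
    RegexNormalizerSketch n = new RegexNormalizerSketch()
        .addRule(";jsessionid=[^?#]*", "")   // example: strip a session id
        .addRule("/index\\.html$", "/");     // example: drop a default page name
    System.out.println(n.normalize("http://example.com/cart;jsessionid=ABC123?item=7"));
  }
}
```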

Classes in org.apache.nutch.net used by org.apache.nutch.net.urlnormalizer.slash

| Class | Description |
|---|---|
| URLNormalizer | Interface used to convert URLs to normal form and optionally perform substitutions. |

Classes in org.apache.nutch.net used by org.apache.nutch.parse

| Class | Description |
|---|---|
| URLExemptionFilters | Creates and caches URLExemptionFilter implementing plugins. |
| URLFilters | Creates and caches URLFilter implementing plugins. |
| URLNormalizers | This class uses a "chained filter" pattern to run defined normalizers. |

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.api

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.automaton

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.domain

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
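
The domain filter accepts a URL when its host, or any dot-separated suffix of its host, appears in the configured list. That one check covers host names, domain names, and domain suffixes alike. DomainFilterSketch is hypothetical. As an assumption, it returns the URL to accept it and null to reject it. Inverting those two return values gives the exclusion behaviour described for domainblacklist.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of domain-list matching; not the plugin's code.
final class DomainFilterSketch {
  private final Set<String> entries = new HashSet<>();

  DomainFilterSketch(Iterable<String> suffixesDomainsOrHosts) {
    for (String e : suffixesDomainsOrHosts) {
      entries.add(e.toLowerCase(Locale.ROOT));
    }
  }

  /** Returns the URL if its host or any dot-separated suffix of it is listed, else null. */
  String filter(String url) {
    String host;
    try {
      host = URI.create(url).getHost();
    } catch (IllegalArgumentException e) {
      return null; // unparsable URL: reject
    }
    if (host == null) {
      return null;
    }
    String candidate = host.toLowerCase(Locale.ROOT);
    while (true) {
      if (entries.contains(candidate)) {
        return url;
      }
      int dot = candidate.indexOf('.');
      if (dot < 0) {
        return null; // no listed suffix found
      }
      candidate = candidate.substring(dot + 1); // www.example.org -> example.org -> org
    }
  }

  public static void main(String[] args) {
    DomainFilterSketch f = new DomainFilterSketch(java.util.Arrays.asList("example.org", "edu"));
    System.out.println(f.filter("http://www.example.org/a"));
    System.out.println(f.filter("http://notexample.org/"));
  }
}
```

Suffixes are matched on whole labels rather than with String.endsWith. As a result, notexample.org does not match an entry for example.org.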

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.domainblacklist

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.ignoreexempt

| Class | Description |
|---|---|
| URLExemptionFilter | Interface used to allow exemptions to external domain resources by overriding db.ignore.external.links. |
| URLFilter | Interface used to limit which URLs enter Nutch. |
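
URLExemptionFilter lets selected external outlinks through even when db.ignore.external.links is set. The sketch below rests on two assumptions. First, exemptions are regular expressions matched against the whole target URL. Second, a link counts as external when its host differs from the source host. ExternalLinkExemptionSketch and keepOutlink are hypothetical names, not the plugin's API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of exempting external links; not the plugin's code.
final class ExternalLinkExemptionSketch {
  private final List<Pattern> exemptions = new ArrayList<>();

  ExternalLinkExemptionSketch(List<String> regexes) {
    for (String r : regexes) {
      exemptions.add(Pattern.compile(r));
    }
  }

  /** True if the outlink should be kept although external links are being ignored. */
  boolean keepOutlink(String fromUrl, String toUrl) {
    String fromHost = URI.create(fromUrl).getHost();
    String toHost = URI.create(toUrl).getHost();
    if (fromHost != null && fromHost.equalsIgnoreCase(toHost)) {
      return true; // internal link: not affected by the setting
    }
    for (Pattern p : exemptions) {
      if (p.matcher(toUrl).matches()) {
        return true; // external, but exempted
      }
    }
    return false; // external and not exempted: ignore it
  }

  public static void main(String[] args) {
    ExternalLinkExemptionSketch x =
        new ExternalLinkExemptionSketch(Arrays.asList(".*\\.(png|jpe?g)$"));
    System.out.println(x.keepOutlink("http://a.com/page", "http://cdn.b.com/logo.png"));
  }
}
```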

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.prefix

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
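
The prefix filter accepts only URLs that match one of a given list of URL prefixes. This minimal sketch assumes a URLFilter-style contract: a filter returns the URL (possibly rewritten) to accept it and null to reject it. UrlFilterSketch and PrefixFilterSketch are hypothetical stand-ins, not Nutch's interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the contract assumed here:
// return the URL (possibly rewritten) to accept it, or null to reject it.
interface UrlFilterSketch {
  String filter(String url);
}

// Hypothetical prefix filter; not the plugin's code.
final class PrefixFilterSketch implements UrlFilterSketch {
  private final List<String> prefixes;

  PrefixFilterSketch(List<String> prefixes) {
    this.prefixes = new ArrayList<>(prefixes);
  }

  @Override
  public String filter(String url) {
    for (String prefix : prefixes) {
      if (url.startsWith(prefix)) {
        return url; // a matching prefix accepts the URL unchanged
      }
    }
    return null; // no prefix matched: reject
  }

  public static void main(String[] args) {
    UrlFilterSketch f = new PrefixFilterSketch(Arrays.asList("http://example.org/docs/"));
    System.out.println(f.filter("http://example.org/docs/index.html"));
    System.out.println(f.filter("http://example.org/blog/"));
  }
}
```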

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.regex

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
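
The regex filter includes and/or excludes URLs using Java regular expressions. This sketch assumes the following rule syntax:

- One rule per line, written as + or - followed by a pattern.
- Lines starting with # are comments.
- The first matching rule wins.
- A URL that matches no rule is rejected.

RegexRuleFilterSketch is hypothetical. It uses Matcher.find(), so a pattern may match anywhere in the URL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of +/- regex rules; not the plugin's code or file format.
final class RegexRuleFilterSketch {
  private final List<Boolean> accepts = new ArrayList<>();
  private final List<Pattern> patterns = new ArrayList<>();

  RegexRuleFilterSketch(List<String> lines) {
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // blank line or comment
      }
      char sign = line.charAt(0);
      if (sign != '+' && sign != '-') {
        throw new IllegalArgumentException("rule must start with + or -: " + line);
      }
      accepts.add(sign == '+');
      patterns.add(Pattern.compile(line.substring(1)));
    }
  }

  /** The first rule whose pattern is found in the URL decides; no match means reject. */
  String filter(String url) {
    for (int i = 0; i < patterns.size(); i++) {
      if (patterns.get(i).matcher(url).find()) {
        return accepts.get(i) ? url : null;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    RegexRuleFilterSketch f = new RegexRuleFilterSketch(Arrays.asList(
        "# skip common image types",
        "-\\.(gif|jpg|png)$",
        "+^https?://example\\.org/"));
    System.out.println(f.filter("https://example.org/a.html"));
    System.out.println(f.filter("http://example.org/logo.png"));
  }
}
```

Because the first match wins, exclusion rules usually go before broader inclusion rules.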

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.suffix

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
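
The suffix filter can run in two modes: it either excludes URLs whose path ends with a listed suffix or includes only those URLs. The hypothetical SuffixFilterSketch makes three assumptions:

- The mode is a constructor flag.
- The suffix is tested against the URL path only, so a query string does not hide it.
- Matching is case-insensitive.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of path-suffix filtering; not the plugin's code.
final class SuffixFilterSketch {
  private final List<String> suffixes = new ArrayList<>();
  private final boolean acceptMatches; // true: include only matches; false: exclude matches

  SuffixFilterSketch(List<String> suffixes, boolean acceptMatches) {
    for (String s : suffixes) {
      this.suffixes.add(s.toLowerCase(Locale.ROOT));
    }
    this.acceptMatches = acceptMatches;
  }

  String filter(String url) {
    String path;
    try {
      path = URI.create(url).getPath();
    } catch (IllegalArgumentException e) {
      return null; // unparsable URL: reject
    }
    String lower = path == null ? "" : path.toLowerCase(Locale.ROOT);
    boolean matched = false;
    for (String s : suffixes) {
      if (lower.endsWith(s)) {
        matched = true;
        break;
      }
    }
    return matched == acceptMatches ? url : null;
  }

  public static void main(String[] args) {
    SuffixFilterSketch exclude = new SuffixFilterSketch(Arrays.asList(".gif", ".jpg"), false);
    System.out.println(exclude.filter("http://example.com/a.html"));
    System.out.println(exclude.filter("http://example.com/logo.GIF?v=2"));
  }
}
```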

Classes in org.apache.nutch.net used by org.apache.nutch.urlfilter.validator

| Class | Description |
|---|---|
| URLFilter | Interface used to limit which URLs enter Nutch. |
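
The validator plugin's exact checks are not described on this page, so the following is only a rough illustration of URL validation. The hypothetical UrlValidatorSketch accepts a URL only if all of these hold:

- java.net.URI can parse it.
- Its scheme is http, https, or ftp.
- It has a host.

The accepted scheme list is an assumption of the sketch.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical validation sketch; not the validator plugin's rules.
final class UrlValidatorSketch {
  // Assumption: the schemes this sketch treats as valid.
  private static final List<String> SCHEMES = Arrays.asList("http", "https", "ftp");

  /** Returns the URL if it passes the checks above, else null. */
  static String filter(String url) {
    try {
      URI uri = new URI(url);
      String scheme = uri.getScheme();
      if (scheme == null || !SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
        return null; // missing or unsupported scheme
      }
      return uri.getHost() != null ? url : null; // require a parsable host
    } catch (URISyntaxException e) {
      return null; // not syntactically a URI
    }
  }

  public static void main(String[] args) {
    System.out.println(filter("http://example.com/a"));
    System.out.println(filter("not a url"));
  }
}
```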

Copyright © 2018 The Apache Software Foundation