NutchAnalyzer plugins.
item, with the supplied priority.
Configuration.
String can be decoded in reverse and the first character is represented by a terminal node.
String can be decoded and the last character is represented by a terminal node.
CircularDependencyException will be thrown if a circular dependency is detected.
OnlineClusterer extension using clustering components of the Carrot2 project (http://carrot2.sourceforge.net).
HitDetails objects) and their previously extracted summaries (Strings).
OnlineClusterer for documentation.
true if item exists in this FibonacciHeap, false otherwise.
Configuration for Nutch.
RegexRule.
application/octet-stream MimeType
priority value associated with item.
Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of Publisher.
ExtensionPoint provides meta information about an extension point.
HitSummarizer and HitContent for a set of fetched segments.
FibonacciHeap.
CrawlDatum.getScore().
analyzer implementation given a language code.
Configuration for Nutch front-end.
List of RSSChannels that the listener parsed from the RSS document.
Configurable
String description of the RSS Channel.
i-th field.
i-th hit in this list.
robotsMeta to appropriate values, based on any META tags found under the given node.
Outlink from given plain text.
Outlink from given plain text and adds anchor to the extracted Outlinks
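Extracting outlinks from plain text comes down to scanning for URL-shaped substrings. A minimal sketch, assuming a deliberately simple regular expression that only recognizes http, https and ftp URLs; the class `PlainTextUrlExtractor` and its pattern are hypothetical and are not Nutch's own implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pulls http, https and ftp URLs out of plain text. */
class PlainTextUrlExtractor {
  // Simplified pattern (an assumption): a scheme, "://", then a run of
  // characters that typically do not appear unescaped in a URL inside prose.
  private static final Pattern URL =
      Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"'()]+", Pattern.CASE_INSENSITIVE);

  static List<String> extractUrls(String text) {
    List<String> urls = new ArrayList<>();
    Matcher m = URL.matcher(text);
    while (m.find()) {
      String url = m.group();
      // Drop trailing punctuation that most likely ends the sentence, not the URL.
      while (!url.isEmpty() && ".,;:!?".indexOf(url.charAt(url.length() - 1)) >= 0) {
        url = url.substring(0, url.length() - 1);
      }
      urls.add(url);
    }
    return urls;
  }
}
```

Trimming trailing punctuation is a heuristic: it occasionally removes a character that really belonged to the URL.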
node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
Microsoft document extractor.
ParseImpl.
Parser instance with the specified extId, representing its extension ID.
Parsers for a given content type.
Plugin class.
null.
Properties of the Microsoft document.
Protocol implementation for a URL.
Content for a fetchlist entry.
Summarizer extension.
StringBuffer and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuffer.
getText(sb, node, false).
StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
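The two StringBuffer/DOM Node helpers described above amount to a depth-first walk over text nodes. A minimal sketch using only `org.w3c.dom`; `DomTextSketch` is a hypothetical name, and details such as whitespace normalization or skipping script and style content are deliberately left out:

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of DOM text extraction helpers. */
class DomTextSketch {
  /** Appends the value of every text node beneath node to sb, in document order. */
  static void getText(StringBuffer sb, Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      sb.append(node.getNodeValue());
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      getText(sb, children.item(i));
    }
  }

  /** Appends the text beneath the first title element; returns false if there is none. */
  static boolean getTitle(StringBuffer sb, Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && "title".equalsIgnoreCase(node.getNodeName())) {
      getText(sb, node);
      return true;
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      if (getTitle(sb, children.item(i))) {
        return true; // stop at the first title found
      }
    }
    return false;
  }
}
```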
i-th field.
RawCluster interface to HitsCluster interface.
HtmlParseFilter implementing plugins.
Searcher and HitDetailer for either a single merged index, or a set of indexes.
ObjectWritable, to permit merging different types in reduce.
IndexingFilter implementing plugins.
Inlinks.
false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
true if this cluster contains documents that did not fit anywhere else (presentation layer may discard such clusters).
IndexingFilter that adds a lang (language) field to the document.
s padded with leading spaces so that its length is length.
input that is matched, or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the longest suffix of input that is matched, or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
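The prefix, suffix and trie matchers in this index all walk a character trie from the root and remember where terminal nodes occur. A minimal sketch, assuming a HashMap-backed trie; `AffixMatcher` is hypothetical, and suffix patterns are stored reversed so that one walk serves both modes:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of trie-based prefix/suffix matching (not Nutch's classes). */
class AffixMatcher {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal; // true if some pattern ends at this node
  }

  private final Node root = new Node();
  private final boolean suffixMode;

  AffixMatcher(String[] patterns, boolean suffixMode) {
    this.suffixMode = suffixMode;
    for (String p : patterns) {
      // Suffix patterns are inserted reversed, so matching reads input right to left.
      String key = suffixMode ? new StringBuilder(p).reverse().toString() : p;
      Node n = root;
      for (char c : key.toCharArray()) {
        n = n.children.computeIfAbsent(c, k -> new Node());
      }
      n.terminal = true;
    }
  }

  /** Shortest matching prefix (or suffix) of input, or null if none matches. */
  String shortestMatch(String input) { return match(input, true); }

  /** Longest matching prefix (or suffix) of input, or null if none matches. */
  String longestMatch(String input) { return match(input, false); }

  private String match(String input, boolean stopAtFirst) {
    Node n = root;
    int best = -1; // length of the best match seen so far
    for (int i = 0; i < input.length(); i++) {
      char c = suffixMode ? input.charAt(input.length() - 1 - i) : input.charAt(i);
      n = n.children.get(c);
      if (n == null) break; // no pattern continues with this character
      if (n.terminal) {
        best = i + 1;
        if (stopAtFirst) break;
      }
    }
    if (best < 0) return null;
    return suffixMode ? input.substring(input.length() - best) : input.substring(0, best);
  }
}
```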
- lookingAhead -
Variable in class org.apache.nutch.analysis.NutchAnalysis
-
application/vnd.ms-excel).
application/vnd.ms-powerpoint).
application/msword).
java.util.HashMap.
MissingDependencyException will be thrown if a plugin dependency cannot be found.
TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
String is matched by a prefix in the trie
String is matched by a suffix in the trie
String is matched by a pattern in the trie
Configurations that include Nutch-specific resources.
RawDocument required for Carrot2.
summary and wrapping a details hit details.
JobConf for Nutch jobs.
OnlineClusterer extensions.
Ontology extensions.
Outlinks / URLs from plain text using regular expressions.
Plugin System.
http, httpclient)
Parsers to obtain Parse objects.
Protocol implementation.
Parser plugins.
PluginClassLoader contains only the classes of the runtime libraries set up in the plugin manifest file and the exported libraries of required plugins.
PluginDescriptor provides access to all meta information of a Nutch plugin, as well as to its internationalizable resources and its own classloader.
PluginManifestParser parses the manifest file in all plugin directories.
PluginRuntimeException will be thrown when an exception occurs in the plugin management.
Strings against a set of prefixes.
PrefixStringMatcher which will match Strings with any prefix in the supplied array.
PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
ProtocolException instead.
Protocol plugins.
Parsers until a successful parse is performed and a Parse object is returned.
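Trying each Parser in turn until one yields a Parse is a simple fallback chain. A minimal sketch with a hypothetical `SimpleParser` interface that returns a String in place of Nutch's Parser and Parse types; treating both a null result and a thrown exception as an unsuccessful parse is an assumption:

```java
import java.util.List;

/** Hypothetical sketch: try each parser in order until one succeeds. */
class ParserChainSketch {
  /** Stand-in for a parser plugin (hypothetical); null or an exception means failure. */
  interface SimpleParser {
    String parse(byte[] content) throws Exception;
  }

  static String parseWithFallback(List<SimpleParser> parsers, byte[] content) {
    for (SimpleParser p : parsers) {
      try {
        String result = p.parse(content);
        if (result != null) {
          return result; // the first successful parse wins
        }
      } catch (Exception e) {
        // Treat an exception like an unsuccessful parse and try the next parser.
      }
    }
    return null; // no parser succeeded
  }
}
```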
Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID.
Content metadata.
FibonacciHeap.popMin() would, without removing it.
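The peek/pop contract above (peek returns what popMin() would return, without removing it) can be illustrated with `java.util.PriorityQueue`. This is a binary heap standing in for a Fibonacci heap, so only the observable behavior matches, not the amortized costs; `MinHeapSketch` is a hypothetical name:

```java
import java.util.PriorityQueue;

/** Stand-in sketch: a binary heap (java.util.PriorityQueue), NOT a Fibonacci heap. */
class MinHeapSketch<T> {
  private static final class Entry<E> {
    final E item;
    final float priority;

    Entry(E item, float priority) {
      this.item = item;
      this.priority = priority;
    }
  }

  // Orders entries so the smallest priority comes out first.
  private final PriorityQueue<Entry<T>> queue =
      new PriorityQueue<>((a, b) -> Float.compare(a.priority, b.priority));

  /** Adds item with the supplied priority. */
  void add(T item, float priority) {
    queue.add(new Entry<>(item, priority));
  }

  /** Returns what popMin() would return, without removing it (null if empty). */
  T peekMin() {
    Entry<T> e = queue.peek();
    return e == null ? null : e.item;
  }

  /** Removes and returns the item with the smallest priority (null if empty). */
  T popMin() {
    Entry<T> e = queue.poll();
    return e == null ? null : e.item;
  }

  int size() {
    return queue.size();
  }
}
```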
QueryFilter implementing plugins.
Java Regex implementation.
URL filter based on regular expressions.
IndexingFilter that adds tag field(s) to the document.
"tag:" query clauses.
- RelTagQueryFilter() -
Constructor for class org.apache.nutch.microformats.reltag.RelTagQueryFilter
-
- Response - interface org.apache.nutch.net.protocols.Response.
- A response interface.
- RobotRulesParser - class org.apache.nutch.protocol.http.api.RobotRulesParser.
- This class handles the parsing of robots.txt files.
- RobotRulesParser(Configuration) -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser
-
- RobotRulesParser.RobotRuleSet - class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet.
- This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
- RobotRulesParser.RobotRuleSet() -
Constructor for class org.apache.nutch.protocol.http.api.RobotRulesParser.RobotRuleSet
-
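Testing a path against parsed robots.txt rules can be sketched as an ordered list of path prefixes. This assumes first-match-wins ordering, as in the original robots.txt convention (RFC 9309 instead uses the most specific, i.e. longest, match), and it omits parsing the file itself (user-agent groups, comments, empty Disallow lines); `RobotRulesSketch` and `addRule` are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a robots.txt rule set: ordered Allow/Disallow path prefixes. */
class RobotRulesSketch {
  private static final class Rule {
    final String prefix;
    final boolean allowed;

    Rule(String prefix, boolean allowed) {
      this.prefix = prefix;
      this.allowed = allowed;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Appends a rule; earlier rules take precedence under first-match-wins. */
  void addRule(String prefix, boolean allowed) {
    rules.add(new Rule(prefix, allowed));
  }

  /** Returns false if the first rule whose prefix matches the path disallows it, true otherwise. */
  boolean isAllowed(String path) {
    for (Rule r : rules) {
      if (path.startsWith(r.prefix)) {
        return r.allowed;
      }
    }
    return true; // paths not covered by any rule are allowed
  }
}
```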
- rdfidToLabel(String) -
Method in class org.apache.nutch.ontology.jena.OwlParser
-
- read(DataInput) -
Static method in class org.apache.nutch.crawl.CrawlDatum
-
- read(DataInput) -
Static method in class org.apache.nutch.crawl.Inlink
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseData
-
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.parse.ParseImpl
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseStatus
-
- read(DataInput) -
Static method in class org.apache.nutch.parse.ParseText
-
- read(DataInput) -
Static method in class org.apache.nutch.protocol.Content
-
- read(DataInput) -
Static method in class org.apache.nutch.protocol.ProtocolStatus
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.HitDetails
- Constructs, reads and returns an instance.
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.searcher.Query.Clause
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Query.Phrase
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Query.Term
-
- read(DataInput, Configuration) -
Static method in class org.apache.nutch.searcher.Query
-
- read(DataInput) -
Static method in class org.apache.nutch.searcher.Summary
-
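The static read(DataInput) methods above follow a common pattern: construct an empty instance, call readFields on it, and return it. A minimal sketch with a hypothetical `ExampleRecord`; it mirrors the write/readFields pair of Hadoop's Writable interface but does not implement it, so it needs only java.io:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical record illustrating write / readFields / static read(DataInput). */
class ExampleRecord {
  private String url = "";
  private float score;

  ExampleRecord() {}

  ExampleRecord(String url, float score) {
    this.url = url;
    this.score = score;
  }

  /** Serializes the fields in a fixed order. */
  void write(DataOutput out) throws IOException {
    out.writeUTF(url);
    out.writeFloat(score);
  }

  /** Overwrites this instance's fields, reading them in the order write() used. */
  void readFields(DataInput in) throws IOException {
    url = in.readUTF();
    score = in.readFloat();
  }

  /** Constructs, reads and returns an instance. */
  static ExampleRecord read(DataInput in) throws IOException {
    ExampleRecord r = new ExampleRecord();
    r.readFields(in);
    return r;
  }

  String getUrl() { return url; }

  float getScore() { return score; }
}
```

Keeping readFields as an instance method lets callers reuse one object across many records, while the static read is a convenience for one-off reads.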
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.CrawlDatum
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Generator.SelectorEntry
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Inlink
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.Inlinks
-
- readFields(DataInput) -
Method in class org.apache.nutch.crawl.MapWritable
-
- readFields(DataInput) -
Method in class org.apache.nutch.fetcher.FetcherOutput
-
- readFields(DataInput) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.HashScore
-
- readFields(DataInput) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexDoc
-
- readFields(DataInput) -
Method in class org.apache.nutch.metadata.Metadata
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.Outlink
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseData
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseImpl
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseStatus
-
- readFields(DataInput) -
Method in class org.apache.nutch.parse.ParseText
-
- readFields(DataInput) -
Method in class org.apache.nutch.protocol.ProtocolStatus
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Hit
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.HitDetails
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Hits
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Query
-
- readFields(DataInput) -
Method in class org.apache.nutch.searcher.Summary
-
- readFieldsCompressed(DataInput) -
Method in class org.apache.nutch.protocol.Content
-
- readUrl(String, String, Configuration) -
Method in class org.apache.nutch.crawl.CrawlDbReader
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbMerger.Merger
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbDumpReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbStatReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReader.CrawlDbTopNReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.CrawlDbReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.Generator.Selector
- Collect until limit is reached.
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.Injector.InjectReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.LinkDb.Merger
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.crawl.LinkDb
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.DeleteDuplicates.HashReducer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.DeleteDuplicates
- Delete docs named in values from index named in key.
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.indexer.Indexer
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.parse.ParseSegment
-
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.segment.SegmentMerger
- NOTE: in selecting the latest version we rely exclusively on the segment
name (not all segment data contain time information).
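Selecting the latest version by segment name works when names sort chronologically. A minimal sketch, assuming fixed-width timestamp names such as yyyyMMddHHmmss so that lexicographic order equals time order; `LatestSegmentSketch` is hypothetical:

```java
import java.util.Collection;

/** Hypothetical sketch: pick the most recent segment by name alone. */
class LatestSegmentSketch {
  /** Assumes fixed-width timestamp names; returns null for an empty collection. */
  static String latest(Collection<String> segmentNames) {
    String best = null;
    for (String name : segmentNames) {
      // Lexicographic comparison is chronological only under the naming assumption above.
      if (best == null || name.compareTo(best) > 0) {
        best = name;
      }
    }
    return best;
  }
}
```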
- reduce(WritableComparable, Iterator, OutputCollector, Reporter) -
Method in class org.apache.nutch.segment.SegmentReader
-
- regexNormalize(String) -
Method in class org.apache.nutch.net.RegexUrlNormalizer
- This function does the replacements by iterating through all the regex patterns.
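Iterating through all regex patterns means each substitution sees the output of the previous one, so rule order matters. A minimal sketch; `RegexNormalizerSketch`, `addRule` and the example rules are hypothetical and do not reflect Nutch's configuration format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of an ordered regex-substitution URL normalizer. */
class RegexNormalizerSketch {
  private static final class Rule {
    final Pattern pattern;
    final String substitution;

    Rule(String regex, String substitution) {
      this.pattern = Pattern.compile(regex);
      this.substitution = substitution;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  void addRule(String regex, String substitution) {
    rules.add(new Rule(regex, substitution));
  }

  /** Applies every rule in order; each rule operates on the previous rule's output. */
  String normalize(String url) {
    String result = url;
    for (Rule r : rules) {
      result = r.pattern.matcher(result).replaceAll(r.substitution);
    }
    return result;
  }
}
```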
- remove(Writable) -
Method in class org.apache.nutch.crawl.MapWritable
-
- remove(String) -
Method in class org.apache.nutch.metadata.Metadata
- Removes a metadata name and all its associated values.
- renameFile(String, String) -
Method in class org.apache.nutch.indexer.FsDirectory
-
- renderAnonymous(PrintStream, Resource, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderClassDescription(PrintStream, OntClass, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderHierarchy(PrintStream, OntClass, List, int) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderRestriction(PrintStream, Restriction) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- renderURI(PrintStream, PrefixMapping, String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- reset() -
Method in class org.apache.nutch.parse.HTMLMetaTags
- Sets all boolean values to false.
- resolveEncodingAlias(String) -
Static method in class org.apache.nutch.util.StringUtil
-
- retrieve(String) -
Static method in class org.apache.nutch.ontology.jena.OntologyImpl
-
- retrieveFile(String, OutputStream, int) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- retrieveList(String, List, int, FTPFileEntryParser) -
Method in class org.apache.nutch.protocol.ftp.Client
-
- rightPad(String, int) -
Static method in class org.apache.nutch.util.StringUtil
- Returns a copy of s padded with trailing spaces so that its length is length.
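leftPad and rightPad can be sketched with a StringBuilder. This assumes that a string already at least `length` characters long is returned unchanged rather than truncated; `PadSketch` is a hypothetical name:

```java
/** Hypothetical sketch of space padding to a minimum length. */
class PadSketch {
  /** Returns s preceded by enough spaces to make its length at least length. */
  static String leftPad(String s, int length) {
    StringBuilder sb = new StringBuilder();
    for (int i = length - s.length(); i > 0; i--) {
      sb.append(' ');
    }
    return sb.append(s).toString();
  }

  /** Returns s followed by enough spaces to make its length at least length. */
  static String rightPad(String s, int length) {
    StringBuilder sb = new StringBuilder(s);
    for (int i = length - s.length(); i > 0; i--) {
      sb.append(' ');
    }
    return sb.toString();
  }
}
```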
- root -
Variable in class org.apache.nutch.util.TrieStringMatcher
-
- rootClasses(OntModel) -
Method in class org.apache.nutch.ontology.jena.OwlParser
-
- rootClasses(OntModel) -
Method in interface org.apache.nutch.ontology.jena.Parser
-
- run(RecordReader, OutputCollector, Reporter) -
Method in class org.apache.nutch.fetcher.Fetcher
-
- run() -
Method in class org.apache.nutch.searcher.DistributedSearch.Client
-
- run() -
Method in class org.apache.nutch.tools.PruneIndexTool
- For each query, find all matching documents and delete them from all input
indexes.
ScoringFilter implementing plugins.
ObjectWritable, to permit merging different types in reduce.
ObjectWritable, to permit merging different types in reduce.
Strings against a set of suffixes.
SuffixStringMatcher which will match Strings with any suffix in the supplied array.
SuffixStringMatcher which will match Strings with any suffix in the supplied Collection.
Summarizer extensions.
baseHref.
Configurable
noCache to true.
noFollow to true.
noIndex to true.
refresh to the supplied value.
refreshHref.
refreshTime.
Hits.totalIsExact().
input that is matched, or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the shortest suffix of input that is matched, or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the shortest substring of input that is matched by a pattern in the trie, or null if no match exists.
- shutDown() -
Method in class org.apache.nutch.plugin.Plugin
- Shuts down the plugin.
- shutdown() -
Method in class org.apache.nutch.util.ThreadPool
- Turn off the pool.
- size() -
Method in class org.apache.nutch.crawl.Inlinks
-
- size() -
Method in class org.apache.nutch.crawl.MapWritable
-
- size() -
Method in class org.apache.nutch.metadata.Metadata
- Returns the number of metadata names in this metadata.
- size() -
Method in class org.apache.nutch.util.FibonacciHeap
- Returns the number of objects in the heap.
- skip(DataInput) -
Static method in class org.apache.nutch.crawl.Inlink
- Skips over one Inlink in the input.
- skip(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
- Skips over one Outlink in the input.
- skippedEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of a skipped entity.
- sort(int) -
Method in class org.apache.nutch.indexer.IndexSorter
-
- start -
Variable in class org.apache.nutch.segment.SegmentReader.SegmentReaderStats
-
- startCDATA() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of a CDATA section.
- startDTD(String, String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the start of DTD declarations, if any.
- startDocument() -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of a document.
- startElement(String, String, String, Attributes) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Receive notification of the beginning of an element.
- startEntity(String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Report the beginning of an entity.
- startPrefixMapping(String, String) -
Method in class org.apache.nutch.parse.html.DOMBuilder
- Begin the scope of a prefix-URI Namespace mapping.
- startProcessing(RequestContext) -
Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
- A callback hook that starts the processing.
- startUp() -
Method in class org.apache.nutch.plugin.Plugin
- Will be invoked when the plugin starts up.
- statNames -
Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- subclasses(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
- subclasses(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- Retrieves all subclasses of the entity(ies) hashed to searchTerm.
- synonyms(String) -
Method in interface org.apache.nutch.ontology.Ontology
-
- synonyms(String) -
Method in class org.apache.nutch.ontology.jena.OntologyImpl
- Retrieves synonyms from WordNet via sweet's web interface.
StringUtil.toHexString(byte[], String, int), where sep = null; lineLen = Integer.MAX_VALUE.
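Delegating with sep = null and lineLen = Integer.MAX_VALUE means no separator and, in practice, no line breaks. A minimal sketch of such a hex encoder, assuming lowercase digits, sep placed between bytes, a newline after every lineLen bytes, and lineLen greater than 0; `HexSketch` is hypothetical and the real formatting may differ:

```java
/** Hypothetical sketch of a configurable hex encoder. */
class HexSketch {
  private static final char[] DIGITS = "0123456789abcdef".toCharArray();

  /** Hex-encodes buf; sep (may be null) goes between bytes, a newline after every lineLen bytes. */
  static String toHexString(byte[] buf, String sep, int lineLen) {
    StringBuilder sb = new StringBuilder(buf.length * 2);
    for (int i = 0; i < buf.length; i++) {
      if (i > 0) {
        if (i % lineLen == 0) {
          sb.append('\n'); // start a new line instead of writing a separator
        } else if (sep != null) {
          sb.append(sep);
        }
      }
      // High nibble first, then low nibble; masking handles negative bytes.
      sb.append(DIGITS[(buf[i] >> 4) & 0x0f]).append(DIGITS[buf[i] & 0x0f]);
    }
    return sb.toString();
  }

  /** Convenience overload: no separator and effectively no line breaks. */
  static String toHexString(byte[] buf) {
    return toHexString(buf, null, Integer.MAX_VALUE);
  }
}
```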
Hits.getTotal() gives the exact number of hits, or false if it is only an estimate of the total number of hits.
URLFilter implementing plugins.
sizeLimit bytes, if necessary.