A

ACCESS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Access denied - authorization required, but missing/incorrect.
ACRONYM - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
AFTER_EQUALS - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
ANCHOR_ANALYZER - Static variable in class org.apache.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to analyze anchors.
APOSTROPHE - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
ATSIGN - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
AUTH_HEADER - Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
The HTTP Authentication (WWW-Authenticate) header which is returned by a webserver requiring authentication.
ArrayFile - class org.apache.nutch.io.ArrayFile.
A dense file-based mapping from integers to values.
ArrayFile() - Constructor for class org.apache.nutch.io.ArrayFile
 
ArrayFile.Reader - class org.apache.nutch.io.ArrayFile.Reader.
Provide access to an existing array file.
ArrayFile.Reader(NutchFileSystem, String) - Constructor for class org.apache.nutch.io.ArrayFile.Reader
Construct an array reader for the named file.
ArrayFile.Writer - class org.apache.nutch.io.ArrayFile.Writer.
Write a new array file.
ArrayFile.Writer(NutchFileSystem, String, Class) - Constructor for class org.apache.nutch.io.ArrayFile.Writer
Create the named file for values of the named class.
ArrayWritable - class org.apache.nutch.io.ArrayWritable.
A Writable for arrays containing instances of a class.
ArrayWritable() - Constructor for class org.apache.nutch.io.ArrayWritable
 
ArrayWritable(Class) - Constructor for class org.apache.nutch.io.ArrayWritable
 
ArrayWritable(Class, Writable[]) - Constructor for class org.apache.nutch.io.ArrayWritable
 
ArrayWritable(String[]) - Constructor for class org.apache.nutch.io.ArrayWritable
 
abandonBlock(Block, UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
The client would like to let go of the given block
add(Token) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Add ngrams from a token to this profile
add(StringBuffer) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Add ngrams from a single word to this profile
add(InputFormat) - Static method in class org.apache.nutch.mapReduce.InputFormats
Define a named InputFormat.
add(OutputFormat) - Static method in class org.apache.nutch.mapReduce.OutputFormats
Define a named OutputFormat.
add(Summary.Fragment) - Method in class org.apache.nutch.searcher.Summary
Adds a fragment to a summary.
add(Object, int) - Method in class org.apache.nutch.util.FibonacciHeap
Adds the Object item, with the supplied priority.
addAttribute(String, String) - Method in class org.apache.nutch.plugin.Extension
Adds an attribute. This method is used only until model creation at plugin system start-up.
addBlock(Block) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
addConfResource(String) - Method in class org.apache.nutch.util.NutchConf
Adds a resource name to the chain of resources read.
addConfResource(File) - Method in class org.apache.nutch.util.NutchConf
Adds a file to the chain of resources read.
addDependency(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
Adds a dependency
addEscapes(String) - Static method in class org.apache.nutch.quality.dynamic.TokenMgrError
Replaces unprintable characters with their escaped (or Unicode-escaped) equivalents in the given string.
addExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
Adds an exported library with a path relative to the plugin directory.
addExtension(Extension) - Method in class org.apache.nutch.plugin.ExtensionPoint
Install a corresponding extension to this extension point.
addExtension(Extension) - Method in class org.apache.nutch.plugin.PluginDescriptor
Adds an extension.
addExtensionPoint(ExtensionPoint) - Method in class org.apache.nutch.plugin.PluginDescriptor
Adds an extension point.
addFile(UTF8, Block[]) - Method in class org.apache.nutch.ndfs.FSDirectory
Add the given filename to the fs.
addJob(Runnable) - Method in class org.apache.nutch.util.ThreadPool
Post a Runnable to the queue.
addLink(Link) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Add a link to the link database
addLink(Link) - Method in interface org.apache.nutch.db.IWebDBWriter
addLink(Link) will add the given Link to the webdb.
addLink(Link) - Method in class org.apache.nutch.db.WebDBWriter
Add a link to the link database
addName(Class, String) - Static method in class org.apache.nutch.io.WritableName
Add an alternate name for a class.
addNotExportedLibRelative(String) - Method in class org.apache.nutch.plugin.PluginDescriptor
Adds a non-exported library with a path relative to the plugin directory.
addPage(Page) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Add a page to the page database
addPage(Page) - Method in interface org.apache.nutch.db.IWebDBWriter
addPage(Page page) will insert a Page object into the webdb.
addPage(String) - Method in class org.apache.nutch.db.WebDBInjector
Add one page to WebDB.
addPage(Page) - Method in class org.apache.nutch.db.WebDBWriter
Add a page to the page database
addPageIfNotPresent(Page) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page) - Method in interface org.apache.nutch.db.IWebDBWriter
addPageIfNotPresent(Page) works just like addPage(), except that the insertion will not take place if there is already a Page with that URL in the webdb.
addPageIfNotPresent(Page, Link) - Method in interface org.apache.nutch.db.IWebDBWriter
addPageIfNotPresent(Page, Link) works just like the above addPage(), except that a Link is also conditionally added to the webdb.
addPageIfNotPresent(Page) - Method in class org.apache.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class org.apache.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
addPageWithScore(Page) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Add a page to the page database, with a brand-new score
addPageWithScore(Page) - Method in interface org.apache.nutch.db.IWebDBWriter
addPageWithScore(Page page) inserts a Page into the webdb.
addPageWithScore(Page) - Method in class org.apache.nutch.db.WebDBWriter
Add a page to the page database, with a brand-new score
addPatternBackward(String) - Method in class org.apache.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded in reverse and the first character is represented by a terminal node.
addPatternForward(String) - Method in class org.apache.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded and the last character is represented by a terminal node.
addProhibitedPhrase(String[]) - Method in class org.apache.nutch.searcher.Query
Add a prohibited phrase in the default field.
addProhibitedPhrase(String[], String) - Method in class org.apache.nutch.searcher.Query
Add a prohibited phrase in the specified field.
addProhibitedTerm(String) - Method in class org.apache.nutch.searcher.Query
Add a prohibited term in the default field.
addProhibitedTerm(String, String) - Method in class org.apache.nutch.searcher.Query
Add a prohibited term in the specified field.
addRequiredPhrase(String[]) - Method in class org.apache.nutch.searcher.Query
Add a required phrase in the default field.
addRequiredPhrase(String[], String) - Method in class org.apache.nutch.searcher.Query
Add a required phrase in the specified field.
addRequiredTerm(String) - Method in class org.apache.nutch.searcher.Query
Add a required term in the default field.
addRequiredTerm(String, String) - Method in class org.apache.nutch.searcher.Query
Add a required term in a specified field.
addScore(float) - Method in class org.apache.nutch.util.ScoreStats
Increment the counter in the right place.
addSearchTerm(String, OntResource) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
addUrlFeatures(Document, String) - Method in class org.creativecommons.nutch.CCIndexingFilter
Add the features represented by a license URL.
add_escapes(String) - Method in class org.apache.nutch.quality.dynamic.ParseException
Used to convert raw characters to their escaped versions when the raw versions cannot be used as part of an ASCII string literal.
adjustBeginLineColumn(int, int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
Method to adjust line and column numbers for the start of a token.
analyze(StringBuffer) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Analyze a piece of text.
append(WritableComparable, Writable) - Method in class org.apache.nutch.db.EditSectionGroupWriter
Add an instruction and append it.
append(WritableComparable, Writable) - Method in class org.apache.nutch.db.EditSectionWriter
Add a key/val pair
append(Writable) - Method in class org.apache.nutch.io.ArrayFile.Writer
Append a value to the file.
append(WritableComparable, Writable) - Method in class org.apache.nutch.io.MapFile.Writer
Append a key/value pair to the map.
append(Writable, Writable) - Method in class org.apache.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(byte[], int, int, int) - Method in class org.apache.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(WritableComparable) - Method in class org.apache.nutch.io.SetFile.Writer
Append a key to a set.
append(Node) - Method in class org.apache.nutch.parse.html.DOMBuilder
Append a node to the current container.
append(String) - Method in class org.apache.nutch.parse.msword.WordTextBuffer
 
append(FetcherOutput, Content, ParseText, ParseData) - Method in class org.apache.nutch.segment.SegmentWriter
Append new values to the output segment.
appendInstructionInfo(EditSectionGroupWriter, Link, int, Writable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, int, Writable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, Link, int, Writable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Link, int, Writable) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, int, Writable) - Method in class org.apache.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, Link, int, Writable) - Method in class org.apache.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
attemptedMaps() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
attemptedReduces() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
attrName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
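The addPatternForward(String) entry above describes adding nodes to a trie so that the last character of a pattern lands on a terminal node. A minimal standalone sketch of that idea follows. It is not the TrieStringMatcher implementation: the class name PrefixTrieSketch, the first-child/next-sibling node layout, and the matchesPrefix method are all assumptions made for illustration.

```java
// Illustrative sketch only; names and layout are hypothetical, not Nutch code.
class PrefixTrieSketch {
    // One trie node: a character, a terminal flag, and a first-child/next-sibling chain.
    private static final class Node {
        final char ch;
        boolean terminal;
        Node firstChild;
        Node nextSibling;

        Node(char ch) { this.ch = ch; }

        // Returns the child holding c, or null if there is none.
        Node child(char c) {
            for (Node n = firstChild; n != null; n = n.nextSibling) {
                if (n.ch == c) return n;
            }
            return null;
        }

        // Returns the child holding c, creating it if needed.
        Node childOrCreate(char c) {
            Node n = child(c);
            if (n == null) {
                n = new Node(c);
                n.nextSibling = firstChild;
                firstChild = n;
            }
            return n;
        }
    }

    private final Node root = new Node('\0');

    // Adds the pattern left to right; the node for its last character becomes terminal.
    void addPatternForward(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.childOrCreate(pattern.charAt(i));
        }
        node.terminal = true;
    }

    // Returns true if any added pattern is a prefix of input.
    boolean matchesPrefix(String input) {
        Node node = root;
        for (int i = 0; node != null; i++) {
            if (node.terminal) return true;          // a whole pattern has been consumed
            if (i == input.length()) return false;   // input ended before any pattern did
            node = node.child(input.charAt(i));
        }
        return false;                                // no pattern continues with this character
    }
}
```

Adding a pattern in reverse order, as the addPatternBackward(String) entry describes, would be the same loop run from the last character to the first, which supports suffix matching instead.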

B

BLOCKREPORT_INTERVAL - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
BLOCK_SIZE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
BasicIndexingFilter - class org.apache.nutch.indexer.basic.BasicIndexingFilter.
Adds basic searchable fields to a document.
BasicIndexingFilter() - Constructor for class org.apache.nutch.indexer.basic.BasicIndexingFilter
 
BasicUrlNormalizer - class org.apache.nutch.net.BasicUrlNormalizer.
Converts URLs to a normal form.
BasicUrlNormalizer() - Constructor for class org.apache.nutch.net.BasicUrlNormalizer
 
BeginToken() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
Block - class org.apache.nutch.ndfs.Block.
A Block is a Nutch FS primitive, identified by a long.
Block() - Constructor for class org.apache.nutch.ndfs.Block
 
Block(long, long) - Constructor for class org.apache.nutch.ndfs.Block
 
Block(File, long) - Constructor for class org.apache.nutch.ndfs.Block
Find the blockid from the given filename
BooleanWritable - class org.apache.nutch.io.BooleanWritable.
A WritableComparable for booleans.
BooleanWritable() - Constructor for class org.apache.nutch.io.BooleanWritable
 
BooleanWritable(boolean) - Constructor for class org.apache.nutch.io.BooleanWritable
 
BooleanWritable.Comparator - class org.apache.nutch.io.BooleanWritable.Comparator.
A Comparator optimized for BooleanWritable.
BooleanWritable.Comparator() - Constructor for class org.apache.nutch.io.BooleanWritable.Comparator
 
BytesWritable - class org.apache.nutch.io.BytesWritable.
A Writable for byte arrays.
BytesWritable() - Constructor for class org.apache.nutch.io.BytesWritable
 
BytesWritable(byte[]) - Constructor for class org.apache.nutch.io.BytesWritable
 
backup(int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
beginColumn - Variable in class org.apache.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
beginLine - Variable in class org.apache.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
blockReceived(Block, UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
The given node is reporting that it received a certain block.
bufcolumn - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
buffer - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
bufline - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
bufpos - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
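The BasicUrlNormalizer entry above says only that it converts URLs to a normal form. As a rough illustration of what such a conversion can involve, the sketch below uses java.net.URI from the standard library. The specific rules shown (lowercasing scheme and host, dropping the default HTTP port, resolving "." and ".." segments, removing the fragment) are assumptions chosen for the example and are not claimed to match what BasicUrlNormalizer does; the class name UrlNormalizeSketch is hypothetical.

```java
// Illustrative sketch only; the rules below are assumptions, not Nutch's behavior.
class UrlNormalizeSketch {
    // Returns a normalized form of a hierarchical URL such as an http URL.
    static String normalize(String url) {
        // URI.create throws IllegalArgumentException on malformed input.
        java.net.URI u = java.net.URI.create(url).normalize(); // resolves "." and ".."
        if (u.getScheme() == null || u.getHost() == null) {
            return url; // not a hierarchical URL with a host; leave unchanged
        }
        String scheme = u.getScheme().toLowerCase(java.util.Locale.ROOT);
        String host = u.getHost().toLowerCase(java.util.Locale.ROOT);

        // Drop the port when it is the default for http.
        int port = u.getPort();
        if ("http".equals(scheme) && port == 80) {
            port = -1; // -1 means "no port" for java.net.URI
        }

        // Treat an empty path as the root path.
        String path = (u.getPath() == null || u.getPath().isEmpty()) ? "/" : u.getPath();

        try {
            // Passing null as the last argument drops the fragment.
            return new java.net.URI(scheme, null, host, port, path, u.getQuery(), null).toString();
        } catch (java.net.URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Under these assumed rules, two spellings of the same address that differ only in case of the host, an explicit port 80, or a trailing fragment map to one string, which is what makes a normal form useful as a database key.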

C

CCDeleteUnlicensedTool - class org.creativecommons.nutch.CCDeleteUnlicensedTool.
Deletes documents in a set of Lucene indexes that do not have a Creative Commons license.
CCDeleteUnlicensedTool(IndexReader[]) - Constructor for class org.creativecommons.nutch.CCDeleteUnlicensedTool
Constructs a duplicate detector for the provided indexes.
CCIndexingFilter - class org.creativecommons.nutch.CCIndexingFilter.
Adds basic searchable fields to a document.
CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
 
CCParseFilter - class org.creativecommons.nutch.CCParseFilter.
Adds metadata identifying the Creative Commons license used, if any.
CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
 
CCParseFilter.Walker - class org.creativecommons.nutch.CCParseFilter.Walker.
Walks DOM tree, looking for RDF in comments and licenses in anchors.
CCQueryFilter - class org.creativecommons.nutch.CCQueryFilter.
Handles "cc:" query clauses, causing them to search the "cc" field indexed by CCIndexingFilter.
CCQueryFilter() - Constructor for class org.creativecommons.nutch.CCQueryFilter
 
CHUNKED_ENCODING - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
CJK - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
COLON - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
COMPLETE_SUCCESS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
CONTENT_ANALYZER - Static variable in class org.apache.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to index textual content.
C_PLUS_PLUS - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
C_SHARP - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
Cached - class org.apache.nutch.servlet.Cached.
A servlet that serves raw Content of any mime type.
Cached() - Constructor for class org.apache.nutch.servlet.Cached
 
Client - class org.apache.nutch.ipc.Client.
A client for an IPC service.
Client(Class) - Constructor for class org.apache.nutch.ipc.Client
Construct an IPC client whose values are of the given Writable class.
Client - class org.apache.nutch.protocol.ftp.Client.
Encapsulates the functionality Nutch needs to get directory listings and retrieve files from an FTP server.
Client() - Constructor for class org.apache.nutch.protocol.ftp.Client
 
Clusterer - class org.apache.nutch.clustering.carrot2.Clusterer.
A plugin providing an implementation of the OnlineClusterer extension using clustering components of the Carrot2 project (http://carrot2.sourceforge.net).
Clusterer() - Constructor for class org.apache.nutch.clustering.carrot2.Clusterer
An empty public constructor for making new instances of the clusterer.
CommandRunner - class org.apache.nutch.util.CommandRunner.
 
CommandRunner() - Constructor for class org.apache.nutch.util.CommandRunner
 
CommonGrams - class org.apache.nutch.analysis.CommonGrams.
Construct n-grams for frequently occurring terms and phrases while indexing.
Configurable - interface org.apache.nutch.mapReduce.Configurable.
Something that may be configured.
Content - class org.apache.nutch.protocol.Content.
 
Content() - Constructor for class org.apache.nutch.protocol.Content
 
Content(String, String, byte[], String, Properties) - Constructor for class org.apache.nutch.protocol.Content
 
CrawlTool - class org.apache.nutch.tools.CrawlTool.
 
CrawlTool() - Constructor for class org.apache.nutch.tools.CrawlTool
 
calculateBoost(float, float, boolean, int) - Static method in class org.apache.nutch.indexer.IndexSegment
 
call(Writable, InetSocketAddress) - Method in class org.apache.nutch.ipc.Client
Make a call, passing param, to the IPC server running at address, returning the value.
call(Writable[], InetSocketAddress[]) - Method in class org.apache.nutch.ipc.Client
Makes a set of calls in parallel.
call(Method, Object[][], InetSocketAddress[]) - Static method in class org.apache.nutch.ipc.RPC
Expert: Make multiple, parallel calls to a set of servers.
call(Writable) - Method in class org.apache.nutch.ipc.Server
Called for each call.
call(Writable) - Method in class org.apache.nutch.ndfs.NDFS.NameNode
This method implements the call invoked by client.
canRead() - Method in class org.apache.nutch.ndfs.NDFSFile
A number of File methods are unsupported in this subclass
canWrite() - Method in class org.apache.nutch.ndfs.NDFSFile
 
cdata(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of cdata.
characters(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of character data.
charactersRaw(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
If available, when the disable-output-escaping attribute is used, output raw text without escaping.
checkObsoleteBlocks(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
If the node has not been checked in some time, go through its blocks and find which ones are neither valid nor pending.
childLen - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
children - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
childrenList - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
cleanupStorage() - Static method in class org.apache.nutch.mapReduce.MapOutputFile
Removes all contents of temporary storage.
clone() - Method in class org.apache.nutch.db.Page
 
clone() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
clone() - Method in class org.apache.nutch.searcher.Query.Clause
 
clone() - Method in class org.apache.nutch.searcher.Query
 
close() - Method in class org.apache.nutch.db.DBSectionReader
 
close() - Method in class org.apache.nutch.db.DistributedWebDBReader
Shutdown
close() - Method in class org.apache.nutch.db.DistributedWebDBWriter
Shutdown
close() - Method in class org.apache.nutch.db.EditSectionGroupWriter
Close down the writers
close() - Method in class org.apache.nutch.db.EditSectionWriter
Close down the EditSectionWriter.
close() - Method in interface org.apache.nutch.db.IWebDBReader
Done reading.
close() - Method in interface org.apache.nutch.db.IWebDBWriter
Flush and complete all writes to the db.
close() - Method in class org.apache.nutch.db.WebDBInjector
Close dbWriter and save changes
close() - Method in class org.apache.nutch.db.WebDBReader
Shutdown
close() - Method in class org.apache.nutch.db.WebDBWriter
Shutdown
close() - Method in class org.apache.nutch.fs.LocalFileSystem
Shut down the FS.
close() - Method in class org.apache.nutch.fs.NDFSFileSystem
Shut down the FS.
close() - Method in class org.apache.nutch.fs.NutchFileSystem
No more filesystem operations are needed.
close() - Method in class org.apache.nutch.indexer.DeleteDuplicates
Closes the indexes, saving changes.
close() - Method in class org.apache.nutch.io.MapFile.Reader
Close the map.
close() - Method in class org.apache.nutch.io.MapFile.Writer
Close the map.
close() - Method in class org.apache.nutch.io.SequenceFile.Reader
Close the file.
close() - Method in class org.apache.nutch.io.SequenceFile.Writer
Close the file.
close() - Method in class org.apache.nutch.mapReduce.JobClient
 
close() - Method in interface org.apache.nutch.mapReduce.RecordReader
Close this to future operations.
close() - Method in interface org.apache.nutch.mapReduce.RecordWriter
Close this to future operations.
close() - Method in class org.apache.nutch.mapReduce.TaskTracker
Close down the TaskTracker and all its components.
close() - Method in class org.apache.nutch.ndfs.FSDirectory
Shutdown the filestore
close() - Method in class org.apache.nutch.ndfs.FSNamesystem
 
close() - Method in class org.apache.nutch.ndfs.NDFSClient
 
close() - Method in class org.apache.nutch.searcher.DistributedSearch.Client
Stops the watchdog thread.
close() - Method in class org.apache.nutch.segment.SegmentReader
Close all readers.
close() - Method in class org.apache.nutch.segment.SegmentWriter
Close all writers.
close() - Method in class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
close() - Method in interface org.apache.nutch.tools.PruneIndexTool.PruneChecker
Close the checker - this could involve flushing output files or somesuch.
close() - Method in class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
 
close() - Method in class org.apache.nutch.tools.UpdateDatabaseTool
Shut everything down.
close() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Closes the indexes, saving changes.
clusterHits(HitDetails[], String[]) - Method in interface org.apache.nutch.clustering.OnlineClusterer
Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).
clusterHits(HitDetails[], String[]) - Method in class org.apache.nutch.clustering.carrot2.Clusterer
See OnlineClusterer for documentation.
collect(WritableComparable, Writable) - Method in interface org.apache.nutch.mapReduce.OutputCollector
Adds a key/value pair to the output.
column - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
comment(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
Report an XML comment anywhere in the document.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.Link.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.Link.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.Link.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.Link.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.Page.Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.Page.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.Page.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.BooleanWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.FloatWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.IntWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.LongWritable.Comparator
 
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.io.LongWritable.DecreasingComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.LongWritable.DecreasingComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.MD5Hash.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.UTF8.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.io.WritableComparator
Optimization hook.
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.io.WritableComparator
Compare two WritableComparables.
compare(Object, Object) - Method in class org.apache.nutch.io.WritableComparator
 
compare(WritableComparable, WritableComparable) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.BySegmentComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.BySegmentComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.ByUrlComparator
 
compareBytes(byte[], int, int, byte[], int, int) - Static method in class org.apache.nutch.io.WritableComparator
Lexicographic order of binary data.
compareTo(Object) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
compareTo(Object) - Method in class org.apache.nutch.db.Link
 
compareTo(Object) - Method in class org.apache.nutch.db.Page
Compare to another Page object
compareTo(Object) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
compareTo(Object) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc
 
compareTo(Object) - Method in class org.apache.nutch.io.BooleanWritable
 
compareTo(Object) - Method in class org.apache.nutch.io.FloatWritable
Compares two FloatWritables.
compareTo(Object) - Method in class org.apache.nutch.io.IntWritable
Compares two IntWritables.
compareTo(Object) - Method in class org.apache.nutch.io.LongWritable
Compares two LongWritables.
compareTo(Object) - Method in class org.apache.nutch.io.MD5Hash
Compares this object with the specified object for order.
compareTo(Object) - Method in class org.apache.nutch.io.UTF8
Compare two UTF8s.
compareTo(Object) - Method in class org.apache.nutch.ndfs.Block
 
compareTo(Object) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
compareTo(Object) - Method in class org.apache.nutch.searcher.Hit
 
compareTo(Object) - Method in class org.apache.nutch.tools.FetchListTool.SortableScore
Sort them in descending order!
compareTo(Object) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage
 
compareTo(Object) - Method in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
completeFile(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Finalize the created file and make it world-accessible.
completeLocalInput(File) - Method in class org.apache.nutch.fs.LocalFileSystem
We're done reading.
completeLocalInput(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
We're done with the local stuff, so delete it
completeLocalInput(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Called when we're all done writing to the target.
completeLocalOutput(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
It's in the right place - nothing to do.
completeLocalOutput(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Move completed local data to NDFS destination
completeLocalOutput(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
Called when we're all done writing to the target.
completeRound(File, File) - Method in class org.apache.nutch.tools.DistributedAnalysisTool
This method collates and executes all the instructions computed by the many executors of computeRound().
completedJobs() - Method in class org.apache.nutch.mapReduce.JobTracker
 
completedMaps() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
completedRatio() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
completedReduces() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
completedTask(String) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
A task assigned to this JobInProgress has reported in successfully.
compound(String) - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse a compound term that is interpreted as an implicit phrase query.
computeDomainID() - Method in class org.apache.nutch.db.Page
Compute domain ID from URL
computeRound(int, File) - Method in class org.apache.nutch.tools.DistributedAnalysisTool
This method is invoked by one of the many processes involved in LinkAnalysis.
configure(JobConf) - Method in interface org.apache.nutch.mapReduce.Configurable
Initializes a new instance from a JobConf.
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.HashPartitioner
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.IdentityMapper
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.IdentityReducer
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.InverseMapper
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.LongSumReducer
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.RegexMapper
 
configure(JobConf) - Method in class org.apache.nutch.mapReduce.lib.TokenCountMapper
 
contains(Object) - Method in class org.apache.nutch.util.FibonacciHeap
Returns true if item exists in this FibonacciHeap, false otherwise.
contentReader - Variable in class org.apache.nutch.segment.SegmentReader
 
contentWriter - Variable in class org.apache.nutch.segment.SegmentWriter
 
coord(int, int) - Method in class org.apache.nutch.indexer.NutchSimilarity
 
copy(String, String) - Method in class org.apache.nutch.fs.TestClient
Copy an NDFS file
copyContents(NutchFileSystem, File, File, boolean) - Static method in class org.apache.nutch.fs.FileUtil
Copy a file's contents to a new location.
copyFromLocalFile(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
Similar to moveFromLocalFile(), except the source is kept intact.
copyFromLocalFile(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Keep the src when finished.
copyFromLocalFile(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
The src file is on the local disk.
copyToLocalFile(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
We can't delete the src file in this case.
copyToLocalFile(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Takes a hierarchy of files from the NFS system and writes to the given local target.
copyToLocalFile(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
The src file is under NFS2, and the dst is on the local disk.
create(String, InputStream, String) - Static method in class org.apache.nutch.analysis.lang.NGramProfile
Create a new ngram profile from an input stream.
create(File) - Method in class org.apache.nutch.fs.LocalFileSystem
Create the file at f.
create(File, boolean) - Method in class org.apache.nutch.fs.LocalFileSystem
 
create(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Create the file at f.
create(File, boolean) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
create(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Opens an OutputStream at the indicated File, whether local or via NDFS.
create(File, boolean) - Method in class org.apache.nutch.fs.NutchFileSystem
 
create(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
Create an output stream that writes to all the right places.
create(UTF8, boolean) - Method in class org.apache.nutch.ndfs.NDFSClient
 
createDB(NutchFileSystem, File, int) - Static method in class org.apache.nutch.db.DistributedWebDBWriter
Method useful for the first time we create a distributed db project.
createEditGroup(NutchFileSystem, File, String, int, int) - Static method in class org.apache.nutch.db.EditSectionGroupWriter
Initialize an EditSectionGroup.
createNewFile(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Creates the given File as a brand-new zero-length file.
createNewFile() - Method in class org.apache.nutch.ndfs.NDFSFile
 
createRunner(TaskTracker) - Method in class org.apache.nutch.mapReduce.MapTask
 
createRunner(TaskTracker) - Method in class org.apache.nutch.mapReduce.ReduceTask
 
createRunner(TaskTracker) - Method in class org.apache.nutch.mapReduce.Task
Return an appropriate thread runner for this task.
createSocket(String, int, InetAddress, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
 
createSocket(String, int, InetAddress, int, HttpConnectionParams) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
Attempts to get a new socket connection to the given host within the given time limit.
createSocket(String, int) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
 
createSocket(Socket, String, int, boolean) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
 
createSocketAddr(String) - Static method in class org.apache.nutch.ndfs.NDFS
Util method to build socket addr from string
createTracker() - Static method in class org.apache.nutch.mapReduce.JobTracker
 
createTracker(InetSocketAddress) - Static method in class org.apache.nutch.mapReduce.JobTracker
 
createWebDB(NutchFileSystem, File) - Static method in class org.apache.nutch.db.WebDBWriter
Create the WebDB for the first time.
curChar - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
curChar - Variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
curTime - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
currentToken - Variable in class org.apache.nutch.quality.dynamic.ParseException
This is the last token that has been consumed successfully.

D

DATANODE_STARTUP_PERIOD - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
DATA_FILE_NAME - Static variable in class org.apache.nutch.io.MapFile
The name of the data file.
DBKeyDivision - class org.apache.nutch.db.DBKeyDivision.
DBKeyDivision exists for other DB classes to figure out how to find the right distributed-DB section.
DBKeyDivision() - Constructor for class org.apache.nutch.db.DBKeyDivision
 
DBSectionReader - class org.apache.nutch.db.DBSectionReader.
DBSectionReader reads a discrete portion of a WebDB.
DBSectionReader(NutchFileSystem, File, WritableComparator) - Constructor for class org.apache.nutch.db.DBSectionReader
Right now we assume we're getting a File that is a MapFile.Reader directory.
DEFAULT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
DEFAULT - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
DEFAULT - Static variable in class org.apache.nutch.util.mime.MimeTypes
The default application/octet-stream MimeType
DEFAULT_FIELD - Static variable in class org.apache.nutch.searcher.Query.Clause
 
DELIMITER_SEARCHTERM - Static variable in class org.apache.nutch.ontology.OntologyImpl
 
DIGIT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
DIR_NAME - Static variable in class org.apache.nutch.fetcher.FetcherOutput
 
DIR_NAME - Static variable in class org.apache.nutch.pagedb.FetchListEntry
 
DIR_NAME - Static variable in class org.apache.nutch.parse.ParseData
 
DIR_NAME - Static variable in class org.apache.nutch.parse.ParseText
 
DIR_NAME - Static variable in class org.apache.nutch.protocol.Content
 
DIR_NAME_NP - Static variable in class org.apache.nutch.fetcher.FetcherOutput
 
DOMBuilder - class org.apache.nutch.parse.html.DOMBuilder.
This class takes SAX events (in addition to some extra events that SAX doesn't handle yet) and adds the result to a document or document fragment.
DOMBuilder(Document, Node) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
DOMBuilder instance constructor...
DOMBuilder(Document, DocumentFragment) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
DOMBuilder instance constructor...
DOMBuilder(Document) - Constructor for class org.apache.nutch.parse.html.DOMBuilder
DOMBuilder instance constructor...
DOMContentUtils - class org.apache.nutch.parse.html.DOMContentUtils.
A collection of methods for extracting content from DOM trees.
DOMContentUtils() - Constructor for class org.apache.nutch.parse.html.DOMContentUtils
 
DOMContentUtils.LinkParams - class org.apache.nutch.parse.html.DOMContentUtils.LinkParams.
 
DOMContentUtils.LinkParams(String, String, int) - Constructor for class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
DONE_NAME - Static variable in class org.apache.nutch.fetcher.FetcherOutput
 
DONE_NAME - Static variable in class org.apache.nutch.indexer.IndexMerger
 
DONE_NAME - Static variable in class org.apache.nutch.indexer.IndexOptimizer
 
DONE_NAME - Static variable in class org.apache.nutch.indexer.IndexSegment
 
DOT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
Daemon - class org.apache.nutch.util.Daemon.
A thread that has called Thread.setDaemon(boolean) with true.
Daemon() - Constructor for class org.apache.nutch.util.Daemon
Construct a daemon thread.
Daemon(Runnable) - Constructor for class org.apache.nutch.util.Daemon
Construct a daemon thread.
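The description above suggests a Thread subclass that marks itself as a daemon on construction. A minimal sketch of that idea follows; DaemonThread is a hypothetical name, and this is not the source of org.apache.nutch.util.Daemon.

```java
/** Hypothetical sketch of a thread that is always a daemon; not the Nutch source. */
class DaemonThread extends Thread {
    DaemonThread() {
        super();
        setDaemon(true); // must be called before start()
    }

    DaemonThread(Runnable target) {
        super(target);
        setDaemon(true);
    }
}

class DaemonThreadDemo {
    public static void main(String[] args) {
        Thread t = new DaemonThread(() -> System.out.println("running in background"));
        System.out.println("daemon? " + t.isDaemon());
        t.start();
    }
}
```

Because the JVM exits once only daemon threads remain, a thread like this will not keep the process alive on its own.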
DataInputBuffer - class org.apache.nutch.io.DataInputBuffer.
A reusable DataInput implementation that reads from an in-memory buffer.
DataInputBuffer() - Constructor for class org.apache.nutch.io.DataInputBuffer
Constructs a new empty buffer.
DataOutputBuffer - class org.apache.nutch.io.DataOutputBuffer.
A reusable DataOutput implementation that writes to an in-memory buffer.
DataOutputBuffer() - Constructor for class org.apache.nutch.io.DataOutputBuffer
Constructs a new empty buffer.
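To show the general idea of writing to and reading from an in-memory buffer, the sketch below uses only java.io classes. InMemoryBufferDemo is a hypothetical name, and the code does not use or reproduce the Nutch DataInputBuffer and DataOutputBuffer APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch built on java.io; not the Nutch buffer classes. */
class InMemoryBufferDemo {
    /** Writes each int into one reused output buffer and reads it back. */
    static int[] roundTrip(int... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        int[] result = new int[values.length];
        try {
            for (int i = 0; i < values.length; i++) {
                bytes.reset(); // discard old contents, keep the allocated array
                out.writeInt(values[i]);
                out.flush();
                // toByteArray() copies; a dedicated buffer class could avoid that copy.
                DataInputStream in =
                    new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
                result[i] = in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (int v : roundTrip(1, -7, 42)) {
            System.out.println(v);
        }
    }
}
```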
DatanodeInfo - class org.apache.nutch.ndfs.DatanodeInfo.
DatanodeInfo tracks stats on a given node
DatanodeInfo() - Constructor for class org.apache.nutch.ndfs.DatanodeInfo
 
DatanodeInfo(UTF8) - Constructor for class org.apache.nutch.ndfs.DatanodeInfo
 
DatanodeInfo(UTF8, long, long) - Constructor for class org.apache.nutch.ndfs.DatanodeInfo
 
DateQueryFilter - class org.apache.nutch.searcher.more.DateQueryFilter.
Handles "date:" query clauses, causing them to search the field "date" indexed by MoreIndexingFilter.java
DateQueryFilter() - Constructor for class org.apache.nutch.searcher.more.DateQueryFilter
 
DeleteDuplicates - class org.apache.nutch.indexer.DeleteDuplicates.
Deletes duplicate documents in a set of Lucene indexes.
DeleteDuplicates(IndexReader[], File) - Constructor for class org.apache.nutch.indexer.DeleteDuplicates
Constructs a duplicate detector for the provided indexes.
DeleteDuplicates.IndexedDoc - class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.
The key used in sorting for duplicates.
DeleteDuplicates.IndexedDoc() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc
 
DeleteDuplicates.IndexedDoc.ByHashDoc - class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc.
Order equal hashes by decreasing index and document.
DeleteDuplicates.IndexedDoc.ByHashDoc() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
DeleteDuplicates.IndexedDoc.ByHashScore - class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore.
Order equal hashes by decreasing score and increasing urlLen.
DeleteDuplicates.IndexedDoc.ByHashScore() - Constructor for class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
DistributedAnalysisTool - class org.apache.nutch.tools.DistributedAnalysisTool.
DistributedAnalysisTool performs link analysis by reading exclusively from an IWebDBReader, and writing to an IWebDBWriter.
DistributedAnalysisTool(NutchFileSystem, File) - Constructor for class org.apache.nutch.tools.DistributedAnalysisTool
Give the pagedb and linkdb files and their cache sizes
DistributedSearch - class org.apache.nutch.searcher.DistributedSearch.
Implements the search API over IPC connections.
DistributedSearch.Client - class org.apache.nutch.searcher.DistributedSearch.Client.
The search client.
DistributedSearch.Client(File) - Constructor for class org.apache.nutch.searcher.DistributedSearch.Client
Construct a client talking to servers listed in the named file.
DistributedSearch.Client(InetSocketAddress[]) - Constructor for class org.apache.nutch.searcher.DistributedSearch.Client
Construct a client talking to the named servers.
DistributedSearch.Protocol - interface org.apache.nutch.searcher.DistributedSearch.Protocol.
The distributed search protocol.
DistributedSearch.Server - class org.apache.nutch.searcher.DistributedSearch.Server.
The search server.
DistributedWebDBReader - class org.apache.nutch.db.DistributedWebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
DistributedWebDBReader(NutchFileSystem, File) - Constructor for class org.apache.nutch.db.DistributedWebDBReader
Open a web db reader for the named directory.
DistributedWebDBWriter - class org.apache.nutch.db.DistributedWebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
DistributedWebDBWriter(NutchFileSystem, File, int) - Constructor for class org.apache.nutch.db.DistributedWebDBWriter
Open the db files.
DistributedWebDBWriter.LinkInstruction - class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.
Holds an instruction over a Link.
DistributedWebDBWriter.LinkInstruction() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction(Link, int) - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction.MD5Comparator - class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator.
Sorts the instruction first by Md5, then by opcode.
DistributedWebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
DistributedWebDBWriter.LinkInstruction.UrlComparator - class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
DistributedWebDBWriter.LinkInstruction.UrlComparator() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
DistributedWebDBWriter.LinkInstructionWriter - class org.apache.nutch.db.DistributedWebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.LinkInstructionWriter() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
 
DistributedWebDBWriter.PageInstruction - class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
DistributedWebDBWriter.PageInstruction() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, int) - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, Link, int) - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction.PageComparator - class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator.
Sorts the instruction first by Page, then by opcode.
DistributedWebDBWriter.PageInstruction.PageComparator() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
 
DistributedWebDBWriter.PageInstruction.UrlComparator - class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
DistributedWebDBWriter.PageInstruction.UrlComparator() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
 
DistributedWebDBWriter.PageInstructionWriter - class org.apache.nutch.db.DistributedWebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.PageInstructionWriter() - Constructor for class org.apache.nutch.db.DistributedWebDBWriter.PageInstructionWriter
 
Done() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
DummySSLProtocolSocketFactory - class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory.
 
DummySSLProtocolSocketFactory() - Constructor for class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
Constructor for DummySSLProtocolSocketFactory.
DummyX509TrustManager - class org.apache.nutch.protocol.httpclient.DummyX509TrustManager.
 
DummyX509TrustManager(KeyStore) - Constructor for class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
Constructor for DummyX509TrustManager.
datanodeReport() - Method in class org.apache.nutch.ndfs.FSNamesystem
 
datanodeReport() - Method in class org.apache.nutch.ndfs.NDFSClient
 
debugStream - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
debugStream - Variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
decreaseKey(Object, int) - Method in class org.apache.nutch.util.FibonacciHeap
Decreases the priority value associated with item.
define(Class, WritableComparator) - Static method in class org.apache.nutch.io.WritableComparator
Register an optimized comparator for a WritableComparable implementation.
delete() - Method in class org.apache.nutch.db.EditSectionGroupReader
Get rid of the edits encapsulated by this file.
delete(File) - Method in class org.apache.nutch.fs.LocalFileSystem
Get rid of File f, whether a true file or dir.
delete(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Get rid of File f, whether a true file or dir.
delete(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Deletes File
delete(String) - Method in class org.apache.nutch.fs.TestClient
Delete an NDFS file
delete(NutchFileSystem, String) - Static method in class org.apache.nutch.io.MapFile
Deletes the named map file.
delete(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Remove the file from management, return blocks
delete(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Remove the indicated filename from the namespace.
delete(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
Make a direct connection to namenode and manipulate structures there.
delete() - Method in class org.apache.nutch.ndfs.NDFSFile
 
deleteContentDuplicates() - Method in class org.apache.nutch.indexer.DeleteDuplicates
Delete pages with duplicate content hashes.
deleteLink(MD5Hash) - Method in class org.apache.nutch.db.WebDBWriter
Remove links with the given MD5 from the db.
deleteOnExit() - Method in class org.apache.nutch.ndfs.NDFSFile
 
deletePage(String) - Method in class org.apache.nutch.db.DistributedWebDBWriter
Remove a page from the page database.
deletePage(String) - Method in interface org.apache.nutch.db.IWebDBWriter
deletePage(url) will remove a Page object from the db with the given URL.
deletePage(String) - Method in class org.apache.nutch.db.WebDBWriter
Remove a page from the page database.
deleteUnlicensed() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete pages without CC licenses.
deleteUrlDuplicates() - Method in class org.apache.nutch.indexer.DeleteDuplicates
Delete pages with duplicate URLs.
desiredMaps() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
desiredReduces() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
destroy() - Method in class org.apache.nutch.servlet.Cached
 
digest(byte[]) - Static method in class org.apache.nutch.io.MD5Hash
Construct a hash value for a byte array.
digest(byte[], int, int) - Static method in class org.apache.nutch.io.MD5Hash
Construct a hash value for a byte array.
digest(String) - Static method in class org.apache.nutch.io.MD5Hash
Construct a hash value for a String.
digest(UTF8) - Static method in class org.apache.nutch.io.MD5Hash
Construct a hash value for a String.
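The digest overloads above compute an MD5 hash from a byte array, a slice of one, or a string. The sketch below shows how that can be done with the JDK's java.security.MessageDigest. Md5Demo and its methods are hypothetical, the UTF-8 encoding of the string is an assumption, and the code does not reproduce the MD5Hash class itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch using the JDK MessageDigest; not the Nutch MD5Hash class. */
class Md5Demo {
    /** Hashes len bytes of data, starting at offset start. */
    static byte[] digest(byte[] data, int start, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, start, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on standard Java platforms.
            throw new IllegalStateException(e);
        }
    }

    /** Hashes a whole byte array. */
    static byte[] digest(byte[] data) {
        return digest(data, 0, data.length);
    }

    /** Hashes a string; UTF-8 is an assumption made for this sketch. */
    static byte[] digest(String s) {
        return digest(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Renders a digest as lowercase hexadecimal. */
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(digest("abc")));
    }
}
```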
disable_tracing() - Method in class org.apache.nutch.analysis.NutchAnalysis
 
disable_tracing() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
disconnect() - Method in class org.apache.nutch.protocol.ftp.Client
Closes the connection to the FTP server and restores connection parameters to the default values.
displayByteArray(byte[]) - Static method in class org.apache.nutch.io.WritableUtils
 
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.searcher.OpenSearchServlet
 
doGet(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.servlet.Cached
 
doPost(HttpServletRequest, HttpServletResponse) - Method in class org.apache.nutch.servlet.Cached
 
done(String) - Method in class org.apache.nutch.mapReduce.TaskTracker
The task is done.
done(String) - Method in interface org.apache.nutch.mapReduce.TaskUmbilicalProtocol
Report that the task is successfully completed.
du(String) - Method in class org.apache.nutch.fs.TestClient
 
dump(boolean, PrintStream) - Method in class org.apache.nutch.segment.SegmentReader
Dump the segment's content in human-readable format.

E

EDITS_PREFIX - Static variable in class org.apache.nutch.db.EditSectionWriter
 
EOF - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
EOF - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
EQUALS - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
ERROR_NAME - Static variable in class org.apache.nutch.fetcher.FetcherOutput
 
EXCEPTION - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Unspecified exception occurred.
EXPIRE_INTERVAL - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
EditSectionGroupReader - class org.apache.nutch.db.EditSectionGroupReader.
The EditSectionGroupReader will read in an edits-file that was built in a distributed way.
EditSectionGroupReader(NutchFileSystem, String, int, int) - Constructor for class org.apache.nutch.db.EditSectionGroupReader
Open the EditSectionGroupReader for the appropriate file.
EditSectionGroupWriter - class org.apache.nutch.db.EditSectionGroupWriter.
The EditSectionGroupWriter maintains a set of EditSectionWriter objects.
EditSectionGroupWriter(NutchFileSystem, int, int, String, Class, Class, EditSectionGroupWriter.KeyExtractor) - Constructor for class org.apache.nutch.db.EditSectionGroupWriter
Start an EditSectionGroupWriter at the indicated location, for a single emitter.
EditSectionGroupWriter.KeyExtractor - class org.apache.nutch.db.EditSectionGroupWriter.KeyExtractor.
Edit instructions are Comparable, but they also have an "inner" key like MD5Hash or URL that is also Comparable.
EditSectionGroupWriter.KeyExtractor() - Constructor for class org.apache.nutch.db.EditSectionGroupWriter.KeyExtractor
 
EditSectionGroupWriter.LinkMD5Extractor - class org.apache.nutch.db.EditSectionGroupWriter.LinkMD5Extractor.
Get the MD5 from a LinkInstruction
EditSectionGroupWriter.LinkMD5Extractor() - Constructor for class org.apache.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
EditSectionGroupWriter.LinkURLExtractor - class org.apache.nutch.db.EditSectionGroupWriter.LinkURLExtractor.
Get the URL from a LinkInstruction
EditSectionGroupWriter.LinkURLExtractor() - Constructor for class org.apache.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
EditSectionGroupWriter.PageMD5Extractor - class org.apache.nutch.db.EditSectionGroupWriter.PageMD5Extractor.
Get the MD5 from a PageInstruction
EditSectionGroupWriter.PageMD5Extractor() - Constructor for class org.apache.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
EditSectionGroupWriter.PageURLExtractor - class org.apache.nutch.db.EditSectionGroupWriter.PageURLExtractor.
Get the URL from a PageInstruction
EditSectionGroupWriter.PageURLExtractor() - Constructor for class org.apache.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
EditSectionWriter - class org.apache.nutch.db.EditSectionWriter.
EditSectionWriter writes a discrete portion of a WebDB.
EditSectionWriter(NutchFileSystem, String, int, int, Class, Class) - Constructor for class org.apache.nutch.db.EditSectionWriter
Make an EditSectionWriter for the appropriate file.
Entities - class org.apache.nutch.html.Entities.
 
Entities() - Constructor for class org.apache.nutch.html.Entities
 
ExpandBuff(boolean) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
Extension - class org.apache.nutch.plugin.Extension.
An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint, which acts as a kind of Publisher.
Extension(PluginDescriptor, String, String, String) - Constructor for class org.apache.nutch.plugin.Extension
 
ExtensionPoint - class org.apache.nutch.plugin.ExtensionPoint.
The ExtensionPoint provides meta information about an extension point.
ExtensionPoint(String, String, String) - Constructor for class org.apache.nutch.plugin.ExtensionPoint
Constructor
elName - Variable in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
element() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
emitDistribution(PrintStream) - Method in class org.apache.nutch.util.ScoreStats
Print out the distribution, with greater specificity for percentiles 90th - 100th.
emitFetchList(File, long, long) - Method in class org.apache.nutch.tools.FetchListTool
Spit out the fetchlist, to a BDB at the indicated filename.
emitHeartbeat(TaskTrackerStatus, BooleanWritable) - Method in interface org.apache.nutch.mapReduce.InterTrackerProtocol
Called regularly by the task tracker to update the status of its tasks within the job tracker.
emitHeartbeat(TaskTrackerStatus, BooleanWritable) - Method in class org.apache.nutch.mapReduce.JobTracker
Process incoming heartbeat messages from the task trackers.
emitMultipleLists(File, int, long, long) - Method in class org.apache.nutch.tools.FetchListTool
Spit out several fetchlists, so that we can fetch across several machines.
emitTopK(int) - Method in class org.apache.nutch.tools.WebDBAdminTool
Emit the top K-rated Pages.
enable_tracing() - Method in class org.apache.nutch.analysis.NutchAnalysis
 
enable_tracing() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
encode(String) - Static method in class org.apache.nutch.html.Entities
 
endCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the end of a CDATA section.
endColumn - Variable in class org.apache.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
endDTD() - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the end of DTD declarations.
endDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the end of a document.
endElement(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the end of an element.
endEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the end of an entity.
endLine - Variable in class org.apache.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
endPrefixMapping(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
End the scope of a prefix-URI mapping.
entityReference(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of an entityReference.
eol - Variable in class org.apache.nutch.quality.dynamic.ParseException
The end of line string for this machine.
equals(Object) - Method in class org.apache.nutch.db.Page
 
equals(Object) - Method in class org.apache.nutch.fetcher.FetcherOutput
 
equals(Object) - Method in class org.apache.nutch.io.BooleanWritable
 
equals(Object) - Method in class org.apache.nutch.io.FloatWritable
Returns true iff o is a FloatWritable with the same value.
equals(Object) - Method in class org.apache.nutch.io.IntWritable
Returns true iff o is an IntWritable with the same value.
equals(Object) - Method in class org.apache.nutch.io.LongWritable
Returns true iff o is a LongWritable with the same value.
equals(Object) - Method in class org.apache.nutch.io.MD5Hash
Returns true iff o is an MD5Hash whose digest contains the same values.
equals(Object) - Method in class org.apache.nutch.io.UTF8
Returns true iff o is a UTF8 with the same contents.
equals(Object) - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
equals(Object) - Method in class org.apache.nutch.pagedb.FetchListEntry
 
equals(Object) - Method in class org.apache.nutch.parse.Outlink
 
equals(Object) - Method in class org.apache.nutch.parse.ParseData
 
equals(Object) - Method in class org.apache.nutch.parse.ParseStatus
 
equals(Object) - Method in class org.apache.nutch.parse.ParseText
 
equals(Object) - Method in class org.apache.nutch.protocol.Content
 
equals(Object) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
equals(Object) - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
 
equals(Object) - Method in class org.apache.nutch.searcher.Hit
 
equals(Object) - Method in class org.apache.nutch.searcher.Query.Clause
 
equals(Object) - Method in class org.apache.nutch.searcher.Query.Phrase
 
equals(Object) - Method in class org.apache.nutch.searcher.Query.Term
 
equals(Object) - Method in class org.apache.nutch.searcher.Query
 
equals(Object) - Method in class org.apache.nutch.util.mime.MimeType
Indicates if an object is equal to this mime-type.
evaluate() - Method in class org.apache.nutch.util.CommandRunner
 
exists(File) - Method in class org.apache.nutch.fs.LocalFileSystem
 
exists(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
exists(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Check if exists
exists(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Return whether the given filename exists
exists(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
 
expectedTokenSequences - Variable in class org.apache.nutch.quality.dynamic.ParseException
Each entry in this array is an array of integers.
extractInnerKey(WritableComparable) - Method in class org.apache.nutch.db.EditSectionGroupWriter.KeyExtractor
 
extractInnerKey(WritableComparable) - Method in class org.apache.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class org.apache.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
extractInnerKey(WritableComparable) - Method in class org.apache.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class org.apache.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
extractProperties(InputStream) - Method in class org.apache.nutch.parse.msword.WordExtractor
 
extractText(InputStream) - Method in class org.apache.nutch.parse.msword.WordExtractor
Gets the text from a Word document.

F

FAILED - Static variable in class org.apache.nutch.mapReduce.JobStatus
 
FAILED - Static variable in class org.apache.nutch.mapReduce.TaskStatus
 
FAILED - Static variable in class org.apache.nutch.parse.ParseStatus
General failure.
FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Content was not retrieved.
FAILED_EXCEPTION - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing failed.
FAILED_INVALID_FORMAT - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing failed.
FAILED_MISSING_CONTENT - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing failed.
FAILED_MISSING_PARTS - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing failed.
FAILED_TRUNCATED - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing failed.
FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter
The name of the document field we use.
FILE_NOT_FOUND - Static variable in interface org.apache.nutch.mapReduce.MRConstants
 
FSConstants - interface org.apache.nutch.ndfs.FSConstants.
Some handy constants
FSDataset - class org.apache.nutch.ndfs.FSDataset.
FSDataset manages a set of data blocks.
FSDataset(File) - Constructor for class org.apache.nutch.ndfs.FSDataset
An FSDataset has a directory where it loads its data files.
FSDirectory - class org.apache.nutch.ndfs.FSDirectory.
FSDirectory stores the filesystem directory state.
FSDirectory(File) - Constructor for class org.apache.nutch.ndfs.FSDirectory
Create a FileSystem directory, and load its info from the indicated place.
FSNamesystem - class org.apache.nutch.ndfs.FSNamesystem.
The FSNamesystem tracks several important tables.
FSNamesystem(File) - Constructor for class org.apache.nutch.ndfs.FSNamesystem
dir is where the filesystem directory state is stored
FSParam - class org.apache.nutch.ndfs.FSParam.
IPC param
FSParam() - Constructor for class org.apache.nutch.ndfs.FSParam
 
FSParam(byte) - Constructor for class org.apache.nutch.ndfs.FSParam
 
FSResults - class org.apache.nutch.ndfs.FSResults.
The result of an NDFS IPC call.
FSResults() - Constructor for class org.apache.nutch.ndfs.FSResults
 
FSResults(byte) - Constructor for class org.apache.nutch.ndfs.FSResults
 
FSResults(byte, Writable) - Constructor for class org.apache.nutch.ndfs.FSResults
 
FSResults(byte, Writable, Writable) - Constructor for class org.apache.nutch.ndfs.FSResults
 
FastSavedException - exception org.apache.nutch.parse.msword.FastSavedException.
Title:
FastSavedException(String) - Constructor for class org.apache.nutch.parse.msword.FastSavedException
 
FetchListEntry - class org.apache.nutch.pagedb.FetchListEntry.
 
FetchListEntry() - Constructor for class org.apache.nutch.pagedb.FetchListEntry
 
FetchListEntry(boolean, Page, String[]) - Constructor for class org.apache.nutch.pagedb.FetchListEntry
 
FetchListTool - class org.apache.nutch.tools.FetchListTool.
This class takes an IWebDBReader, computes a relevant subset, and then emits the subset.
FetchListTool(NutchFileSystem, File, boolean, float, int) - Constructor for class org.apache.nutch.tools.FetchListTool
FetchListTool takes a page db, and emits a RECNO-based subset of it.
FetchListTool.SortableScore - class org.apache.nutch.tools.FetchListTool.SortableScore.
SortableScore is just a WritableComparable Float!
FetchListTool.SortableScore() - Constructor for class org.apache.nutch.tools.FetchListTool.SortableScore
 
FetchedSegments - class org.apache.nutch.searcher.FetchedSegments.
Implements HitSummarizer and HitContent for a set of fetched segments.
FetchedSegments(NutchFileSystem, String) - Constructor for class org.apache.nutch.searcher.FetchedSegments
Construct given a directory containing fetcher output.
Fetcher - class org.apache.nutch.fetcher.Fetcher.
The fetcher.
Fetcher(NutchFileSystem, String, boolean) - Constructor for class org.apache.nutch.fetcher.Fetcher
 
Fetcher.FetcherStatus - class org.apache.nutch.fetcher.Fetcher.FetcherStatus.
 
Fetcher.FetcherStatus(String, long, int, int, long) - Constructor for class org.apache.nutch.fetcher.Fetcher.FetcherStatus
FetcherStatus encapsulates a snapshot of the Fetcher progress status.
FetcherOutput - class org.apache.nutch.fetcher.FetcherOutput.
An entry in the fetcher's output.
FetcherOutput() - Constructor for class org.apache.nutch.fetcher.FetcherOutput
 
FetcherOutput(FetchListEntry, MD5Hash, ProtocolStatus) - Constructor for class org.apache.nutch.fetcher.FetcherOutput
 
FibonacciHeap - class org.apache.nutch.util.FibonacciHeap.
A Fibonacci heap, as described in Introduction to Algorithms.
FibonacciHeap() - Constructor for class org.apache.nutch.util.FibonacciHeap
Creates a new FibonacciHeap.
FieldQueryFilter - class org.apache.nutch.searcher.FieldQueryFilter.
Translate query fields to search the same-named field, as indexed by an IndexingFilter.
FieldQueryFilter(String) - Constructor for class org.apache.nutch.searcher.FieldQueryFilter
Construct for the named field.
FieldQueryFilter(String, float) - Constructor for class org.apache.nutch.searcher.FieldQueryFilter
Construct for the named field, boosting as specified.
File - class org.apache.nutch.protocol.file.File.
File.java deals with file: scheme.
File() - Constructor for class org.apache.nutch.protocol.file.File
 
FileError - exception org.apache.nutch.protocol.file.FileError.
Thrown for File error codes.
FileError(int) - Constructor for class org.apache.nutch.protocol.file.FileError
 
FileException - exception org.apache.nutch.protocol.file.FileException.
 
FileException() - Constructor for class org.apache.nutch.protocol.file.FileException
 
FileException(String) - Constructor for class org.apache.nutch.protocol.file.FileException
 
FileException(String, Throwable) - Constructor for class org.apache.nutch.protocol.file.FileException
 
FileException(Throwable) - Constructor for class org.apache.nutch.protocol.file.FileException
 
FileResponse - class org.apache.nutch.protocol.file.FileResponse.
FileResponse.java mimics file replies as http response.
FileResponse(URL, File) - Constructor for class org.apache.nutch.protocol.file.FileResponse
 
FileResponse(String, URL, File) - Constructor for class org.apache.nutch.protocol.file.FileResponse
 
FileSplit - class org.apache.nutch.mapReduce.FileSplit.
A section of an input file.
FileSplit(File, long, long) - Constructor for class org.apache.nutch.mapReduce.FileSplit
Constructs a split.
FileUtil - class org.apache.nutch.fs.FileUtil.
A collection of file-processing util methods
FileUtil() - Constructor for class org.apache.nutch.fs.FileUtil
 
FillBuff() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
FloatWritable - class org.apache.nutch.io.FloatWritable.
A WritableComparable for floats.
FloatWritable() - Constructor for class org.apache.nutch.io.FloatWritable
 
FloatWritable(float) - Constructor for class org.apache.nutch.io.FloatWritable
 
FloatWritable.Comparator - class org.apache.nutch.io.FloatWritable.Comparator.
A Comparator optimized for FloatWritable.
FloatWritable.Comparator() - Constructor for class org.apache.nutch.io.FloatWritable.Comparator
 
Ftp - class org.apache.nutch.protocol.ftp.Ftp.
Ftp.java deals with ftp: scheme.
Ftp() - Constructor for class org.apache.nutch.protocol.ftp.Ftp
 
FtpError - exception org.apache.nutch.protocol.ftp.FtpError.
Thrown for Ftp error codes.
FtpError(int) - Constructor for class org.apache.nutch.protocol.ftp.FtpError
 
FtpException - exception org.apache.nutch.protocol.ftp.FtpException.
Superclass for important exceptions thrown during FTP talk, that must be handled with care.
FtpException() - Constructor for class org.apache.nutch.protocol.ftp.FtpException
 
FtpException(String) - Constructor for class org.apache.nutch.protocol.ftp.FtpException
 
FtpException(String, Throwable) - Constructor for class org.apache.nutch.protocol.ftp.FtpException
 
FtpException(Throwable) - Constructor for class org.apache.nutch.protocol.ftp.FtpException
 
FtpExceptionBadSystResponse - exception org.apache.nutch.protocol.ftp.FtpExceptionBadSystResponse.
Exception indicating bad reply of SYST command.
FtpExceptionCanNotHaveDataConnection - exception org.apache.nutch.protocol.ftp.FtpExceptionCanNotHaveDataConnection.
Exception indicating failure of opening data connection.
FtpExceptionControlClosedByForcedDataClose - exception org.apache.nutch.protocol.ftp.FtpExceptionControlClosedByForcedDataClose.
Exception indicating control channel is closed by server end, due to forced closure of data channel at client (our) end.
FtpExceptionUnknownForcedDataClose - exception org.apache.nutch.protocol.ftp.FtpExceptionUnknownForcedDataClose.
Exception indicating unrecognizable reply from server after forced closure of data channel by client (our) side.
FtpResponse - class org.apache.nutch.protocol.ftp.FtpResponse.
FtpResponse.java mimics ftp replies as http response.
FtpResponse(URL, Ftp) - Constructor for class org.apache.nutch.protocol.ftp.FtpResponse
 
FtpResponse(String, URL, Ftp) - Constructor for class org.apache.nutch.protocol.ftp.FtpResponse
 
failedJobs() - Method in class org.apache.nutch.mapReduce.JobTracker
 
failedTask(String) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
A task assigned to this JobInProgress has reported in as failed.
fetcherReader - Variable in class org.apache.nutch.segment.SegmentReader
 
fetcherWriter - Variable in class org.apache.nutch.segment.SegmentWriter
 
filter(Content, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.analysis.lang.HTMLLanguageParser
Scan the HTML document looking at possible indications of content language.
filter(Document, Parse, FetcherOutput) - Method in class org.apache.nutch.analysis.lang.LanguageIndexingFilter
 
filter(Document, Parse, FetcherOutput) - Method in interface org.apache.nutch.indexer.IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
filter(Document, Parse, FetcherOutput) - Static method in class org.apache.nutch.indexer.IndexingFilters
Run all defined filters.
filter(Document, Parse, FetcherOutput) - Method in class org.apache.nutch.indexer.basic.BasicIndexingFilter
 
filter(Document, Parse, FetcherOutput) - Method in class org.apache.nutch.indexer.more.MoreIndexingFilter
 
filter(String) - Method in class org.apache.nutch.net.PrefixURLFilter
 
filter(String) - Method in class org.apache.nutch.net.RegexURLFilter
 
filter(String) - Method in interface org.apache.nutch.net.URLFilter
 
filter(String) - Static method in class org.apache.nutch.net.URLFilters
Run all defined filters.
filter(Content, Parse, HTMLMetaTags, DocumentFragment) - Method in interface org.apache.nutch.parse.HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
filter(Content, Parse, HTMLMetaTags, DocumentFragment) - Static method in class org.apache.nutch.parse.HtmlParseFilters
Run all defined filters.
filter(Content, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.apache.nutch.parse.js.JSParseFilter
 
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.FieldQueryFilter
 
filter(Query, BooleanQuery) - Method in interface org.apache.nutch.searcher.QueryFilter
Adds clauses or otherwise modifies the BooleanQuery that will be searched.
filter(Query) - Static method in class org.apache.nutch.searcher.QueryFilters
Run all defined filters.
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.RawFieldQueryFilter
 
filter(Query, BooleanQuery) - Method in class org.apache.nutch.searcher.more.DateQueryFilter
 
filter(Document, Parse, FetcherOutput) - Method in class org.creativecommons.nutch.CCIndexingFilter
 
filter(Content, Parse, HTMLMetaTags, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
finalKey(WritableComparable) - Method in class org.apache.nutch.io.MapFile.Reader
Reads the final key from the file.
finalize() - Method in class org.apache.nutch.plugin.Plugin
 
finalize() - Method in class org.apache.nutch.plugin.PluginRepository
 
finalize() - Method in class org.apache.nutch.protocol.ftp.Ftp
 
finalizeBlock(Block) - Method in class org.apache.nutch.ndfs.FSDataset
Complete the block write!
findAuthentication(Properties) - Static method in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
 
findMD5Section(MD5Hash, int) - Static method in class org.apache.nutch.db.DBKeyDivision
Find the right section index for the given MD5, and the number of sections in the db overall.
findURLSection(String, int) - Static method in class org.apache.nutch.db.DBKeyDivision
Find the right section index for the given URL, and the number of sections in the db overall.
finished - Variable in class org.apache.nutch.segment.SegmentReader
The time when fetching of this segment finished, as recorded in fetcher output data.
first - Variable in class org.apache.nutch.ndfs.FSParam
 
first - Variable in class org.apache.nutch.ndfs.FSResults
 
fix(NutchFileSystem, File, Class, Class, boolean) - Static method in class org.apache.nutch.io.MapFile
This method attempts to fix a corrupt MapFile by re-creating its index.
fixSegment(NutchFileSystem, File, boolean, boolean, boolean, boolean) - Static method in class org.apache.nutch.segment.SegmentReader
Attempt to fix a partially corrupted segment.
format - Static variable in class org.apache.nutch.net.protocols.HttpDateFormat
 
format(LogRecord) - Method in class org.apache.nutch.util.LogFormatter
Format the given LogRecord.
fullyDelete(File) - Static method in class org.apache.nutch.fs.FileUtil
Delete a directory and all its contents.
fullyDelete(NutchFileSystem, File) - Static method in class org.apache.nutch.fs.FileUtil
 

G

GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource is gone.
GROUP_METAINFO - Static variable in class org.apache.nutch.db.EditSectionGroupWriter
 
GZIPUtils - class org.apache.nutch.util.GZIPUtils.
A collection of utility methods for working on GZIPed data.
GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils
 
GetImage() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
GetSuffix(int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
Grep - class org.apache.nutch.mapReduce.demo.Grep.
 
garbageCollect() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
The job is dead.
generateParseException() - Method in class org.apache.nutch.analysis.NutchAnalysis
 
generateParseException() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
get() - Static method in class org.apache.nutch.fs.NutchFileSystem
Returns the default filesystem implementation.
get(long, Writable) - Method in class org.apache.nutch.io.ArrayFile.Reader
Return the nth value in the file.
get() - Method in class org.apache.nutch.io.ArrayWritable
 
get() - Method in class org.apache.nutch.io.BooleanWritable
Returns the value of the BooleanWritable
get() - Method in class org.apache.nutch.io.BytesWritable
 
get() - Method in class org.apache.nutch.io.FloatWritable
Return the value of this FloatWritable.
get() - Method in class org.apache.nutch.io.IntWritable
Return the value of this IntWritable.
get() - Method in class org.apache.nutch.io.LongWritable
Return the value of this LongWritable.
get(WritableComparable, Writable) - Method in class org.apache.nutch.io.MapFile.Reader
Return the value for the named key, or null if none exists.
get() - Static method in class org.apache.nutch.io.NullWritable
Returns the single instance of this class.
get(WritableComparable) - Method in class org.apache.nutch.io.SetFile.Reader
Read the matching key from a set into key.
get() - Method in class org.apache.nutch.io.TwoDArrayWritable
 
get(Class) - Static method in class org.apache.nutch.io.WritableComparator
Get a comparator for a WritableComparable implementation.
get(String) - Static method in class org.apache.nutch.mapReduce.InputFormats
Return the named InputFormat.
get(String) - Static method in class org.apache.nutch.mapReduce.OutputFormats
Return the named OutputFormat.
get(String) - Method in class org.apache.nutch.parse.ParseData
Return the value of a metadata property.
get(String) - Method in class org.apache.nutch.protocol.Content
Return the value of a metadata property.
get(Object) - Method in class org.apache.nutch.protocol.httpclient.MultiProperties
Returns the value associated with the given key.
get(ServletContext) - Static method in class org.apache.nutch.searcher.NutchBean
Cache in servlet context.
get(long, FetcherOutput, Content, ParseText, ParseData) - Method in class org.apache.nutch.segment.SegmentReader
Get a specified entry from the segment.
get() - Static method in class org.apache.nutch.util.NutchConf
Return the default configuration.
get(String) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property, or null if no such property exists.
get(String, String) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property.
get(String) - Static method in class org.apache.nutch.util.mime.MimeTypes
Return a MimeTypes instance.
get(String, Logger) - Static method in class org.apache.nutch.util.mime.MimeTypes
Return a MimeTypes instance.
getAcceptedIssuers() - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
 
getAdditionalBlock(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
The client would like to obtain an additional block for the indicated filename (which is being written-to).
getAnchor() - Method in class org.apache.nutch.parse.Outlink
 
getAnchorText() - Method in class org.apache.nutch.db.Link
 
getAnchors(UTF8) - Method in class org.apache.nutch.db.WebDBAnchors
Return the anchor texts of links in the db that point to this URL.
getAnchors() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getAnchors() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
getAnchors(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getAnchors(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getAnchors(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent
Returns the anchors of a hit document.
getAnchors(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
 
getArgs() - Method in class org.apache.nutch.parse.ParseStatus
 
getArgs() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
getAttribute(String) - Method in class org.apache.nutch.plugin.Extension
Returns an attribute value that is set up in the manifest file and defined by the extension point XML schema.
getAuthentication(String) - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
This method is responsible for providing Basic authentication information.
getBase(Node) - Static method in class org.apache.nutch.parse.html.DOMContentUtils
If Node contains a BASE tag then its HREF is returned.
getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getBaseUrl() - Method in class org.apache.nutch.protocol.Content
The base url for relative links contained in the content.
getBasicPattern() - Static method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
Provides a pattern which can be used by an outside resource to determine if this class can provide credentials based on simple header information.
getBeginColumn() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
getBeginLine() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
getBlockData(Block) - Method in class org.apache.nutch.ndfs.FSDataset
Get a stream of data from the indicated block.
getBlockId() - Method in class org.apache.nutch.ndfs.Block
 
getBlockIterator() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getBlockName() - Method in class org.apache.nutch.ndfs.Block
 
getBlockReport() - Method in class org.apache.nutch.ndfs.FSDataset
Return a table of block data
getBlocks() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getBoolean(String, boolean) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as a boolean.
getByteCount() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getBytes() - Method in class org.apache.nutch.io.UTF8
The raw bytes.
getBytes(String) - Static method in class org.apache.nutch.io.UTF8
Convert a string to a UTF-8 encoded byte array.
getCapacity() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getCapacity() - Method in class org.apache.nutch.ndfs.FSDataset
Return total capacity, used and unused
getCapacity() - Method in class org.apache.nutch.ndfs.HeartbeatData
 
getClass(String) - Static method in class org.apache.nutch.io.WritableName
Return the class for a name.
getClass(String, Class) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as a Class.
getClass(String, Class, Class) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as a Class.
getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns a cached classloader for a plugin.
getClauses() - Method in class org.apache.nutch.searcher.Query
Return all clauses.
getClazz() - Method in class org.apache.nutch.plugin.Extension
Returns the full class name of the extension point implementation
getClient() - Method in class org.apache.nutch.fs.NDFSFileSystem
 
getCode() - Method in interface org.apache.nutch.net.protocols.Response
Returns the response code.
getCode() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
getCode(int) - Method in class org.apache.nutch.protocol.file.FileError
 
getCode() - Method in class org.apache.nutch.protocol.file.FileResponse
Returns the response code.
getCode(int) - Method in class org.apache.nutch.protocol.ftp.FtpError
 
getCode() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
Returns the response code.
getCode(int) - Method in class org.apache.nutch.protocol.http.HttpError
 
getCode() - Method in class org.apache.nutch.protocol.http.HttpResponse
Returns the response code.
getCode(int) - Method in class org.apache.nutch.protocol.httpclient.HttpError
 
getCode() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
Returns the response code.
getColumn() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getCombinerClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getCommand() - Method in class org.apache.nutch.util.CommandRunner
 
getComponentCapabilities() - Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
Returns the capabilities provided by this component.
getCompressedContent() - Method in interface org.apache.nutch.net.protocols.Response
Returns the compressed version of the content if the server transmitted a compressed version, or null otherwise.
getConfResourceAsInputStream(String) - Method in class org.apache.nutch.util.NutchConf
Returns an input stream attached to the configuration resource with the given name.
getConfResourceAsReader(String) - Method in class org.apache.nutch.util.NutchConf
Returns a reader attached to the configuration resource with the given name.
getContent() - Method in interface org.apache.nutch.net.protocols.Response
Returns the full content of the response.
getContent() - Method in class org.apache.nutch.protocol.Content
The binary content retrieved.
getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
 
getContent() - Method in class org.apache.nutch.protocol.file.FileResponse
 
getContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
 
getContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
 
getContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
 
getContent(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getContent(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getContent(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent
Returns the content of a hit document.
getContent(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
 
getContentType() - Method in class org.apache.nutch.parse.ParserNotFound
 
getContentType() - Method in class org.apache.nutch.protocol.Content
The media type of the retrieved content.
getContentsLen() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
getContentsLength() - Method in class org.apache.nutch.ndfs.NDFSFile
And add a few extras
getCredentials() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
Gets the credentials generated by the HttpAuthentication object.
getCredentials() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
Gets the Basic credentials generated by this HttpBasicAuthentication object
getCurTime() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getCurrentNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
Get the node currently being processed.
getData() - Method in class org.apache.nutch.io.DataOutputBuffer
Returns the current contents of the buffer.
getData() - Method in interface org.apache.nutch.parse.Parse
Other data extracted from the page.
getData() - Method in class org.apache.nutch.parse.ParseImpl
 
getDatanodeHints(UTF8, long) - Method in class org.apache.nutch.ndfs.FSNamesystem
Figure out a few hosts that are likely to contain the block referred to by the given filename, offset pair.
getDedupValue() - Method in class org.apache.nutch.searcher.Hit
Return the value of the field that hits should be deduplicated on.
getDefaultAddress() - Static method in class org.apache.nutch.mapReduce.JobTracker
 
getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an array of plugin ids.
getDescriptionLabels() - Method in interface org.apache.nutch.clustering.HitsCluster
 
getDescriptionLabels() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
 
getDescriptor() - Method in class org.apache.nutch.plugin.Extension
Returns the plugin descriptor.
getDescriptor() - Method in class org.apache.nutch.plugin.Plugin
Returns the plugin descriptor
getDestroyOnTimeout() - Method in class org.apache.nutch.util.CommandRunner
 
getDetails(Hit) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit) - Method in interface org.apache.nutch.searcher.HitDetailer
Returns the details for a hit document.
getDetails(Hit[]) - Method in interface org.apache.nutch.searcher.HitDetailer
Returns the details for a set of hits.
getDetails(Hit) - Method in class org.apache.nutch.searcher.IndexSearcher
 
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.IndexSearcher
 
getDetails(Hit) - Method in class org.apache.nutch.searcher.NutchBean
 
getDetails(Hit[]) - Method in class org.apache.nutch.searcher.NutchBean
 
getDigest() - Method in class org.apache.nutch.io.MD5Hash
Returns the digest bytes.
getDiscriptor() - Method in class org.apache.nutch.plugin.Extension
Deprecated. Use getDescriptor() instead.
getDomainID() - Method in class org.apache.nutch.db.Link
 
getElapsedTime() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getEmptyParse() - Method in class org.apache.nutch.parse.ParseStatus
A convenience method.
getEndColumn() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
getEndLine() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
getErrorCount() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getExitValue() - Method in class org.apache.nutch.util.CommandRunner
 
getExpireTime() - Method in class org.apache.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Get expire time
getExpireTime() - Method in class org.apache.nutch.protocol.httpclient.RobotRulesParser.RobotRuleSet
Get expire time
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.IndexSearcher
 
getExplanation(Query, Hit) - Method in class org.apache.nutch.searcher.NutchBean
 
getExplanation(Query, Hit) - Method in interface org.apache.nutch.searcher.Searcher
Return an HTML-formatted explanation of how a query scored.
getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an array of exported libraries as URLs.
getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension
Return an instance of the extension implementation.
getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository
Returns an extension point identified by an extension point id.
getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint
Returns an array of extensions that listen to this extension point.
getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an array of extensions.
getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an array of extension points.
getExtentens() - Method in class org.apache.nutch.plugin.ExtensionPoint
Deprecated. Use the correctly spelled getExtensions() method instead.
getFactor() - Method in class org.apache.nutch.io.SequenceFile.Sorter
Get the number of streams to merge at once.
getFetch() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
getFetchDate() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getFetchDate(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent
Returns the fetch date of a hit document.
getFetchDate(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
 
getFetchInterval() - Method in class org.apache.nutch.db.Page
 
getFetchListEntry() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getField(int) - Method in class org.apache.nutch.searcher.HitDetails
Returns the name of the ith field.
getField() - Method in class org.apache.nutch.searcher.Query.Clause
 
getFile() - Method in class org.apache.nutch.mapReduce.FileSplit
The file containing this split's data.
getFile(String, String, IntWritable) - Method in interface org.apache.nutch.mapReduce.MapOutputProtocol
Returns the output from the named map task destined for this partition.
getFile(String, String, IntWritable) - Method in class org.apache.nutch.mapReduce.TaskTracker
 
getFile(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Get the blocks associated with the file
getFilesystemName() - Method in interface org.apache.nutch.mapReduce.InterTrackerProtocol
The task tracker calls this once, to discern where it can find files referred to by the JobTracker
getFilesystemName() - Method in interface org.apache.nutch.mapReduce.JobSubmissionProtocol
A MapReduce system always operates on a single filesystem.
getFilesystemName() - Method in class org.apache.nutch.mapReduce.JobTracker
 
getFilter(TokenStream, String) - Static method in class org.apache.nutch.analysis.CommonGrams
Construct a token filter that inserts n-grams for common terms.
getFinishTime() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getFloat() - Method in class org.apache.nutch.tools.FetchListTool.SortableScore
 
getFloat(String, float) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as a float.
getFragments() - Method in class org.apache.nutch.searcher.Summary
Returns an array of all of this summary's fragments.
getFromID() - Method in class org.apache.nutch.db.Link
 
getFs() - Method in class org.apache.nutch.mapReduce.JobClient
Get a filesystem handle.
getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
Returns all collected values of the general meta tags.
getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response
Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.file.FileResponse
Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.ftp.FtpResponse
Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.http.HttpResponse
Returns the value of a named header.
getHeader(String) - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
Returns the value of a named header.
getHit(int) - Method in class org.apache.nutch.searcher.Hits
Returns the ith hit in this list.
getHits() - Method in interface org.apache.nutch.clustering.HitsCluster
 
getHits() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
 
getHits(int, int) - Method in class org.apache.nutch.searcher.Hits
Returns a subset of the hit objects.
getHost() - Method in class org.apache.nutch.mapReduce.MapOutputLocation
The host the task completed on.
getHost() - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
getHost() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
Returns all collected values of the "http-equiv" meta tags.
getId() - Method in class org.apache.nutch.clustering.carrot2.NutchDocument
 
getId() - Method in class org.apache.nutch.plugin.Extension
Return the unique id of the extension.
getId() - Method in class org.apache.nutch.plugin.ExtensionPoint
Returns the unique id of the extension point.
getIndexDocNo() - Method in class org.apache.nutch.searcher.Hit
Return the document number of this hit within an index.
getIndexInterval() - Method in class org.apache.nutch.io.MapFile.Writer
The number of entries that are added before an index entry is added.
getIndexNo() - Method in class org.apache.nutch.searcher.Hit
Return the index number that this hit came from.
getInputDir() - Method in class org.apache.nutch.mapReduce.JobConf
 
getInputFile(String, String) - Static method in class org.apache.nutch.mapReduce.MapOutputFile
Create a local reduce input file name.
getInputFormat() - Method in class org.apache.nutch.mapReduce.JobConf
 
getInputKeyClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getInputValueClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getInputs() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
getInstance() - Static method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Get a LanguageIdentifier instance.
getInstance() - Static method in class org.apache.nutch.ontology.OntologyImpl
 
getInstance() - Static method in class org.apache.nutch.plugin.PluginRepository
Returns the singleton instance of the PluginRepository
getInstruction() - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getInstruction() - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
getInstruction() - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
getInstruction() - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
getInt(String, int) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as an integer.
getInterprets() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
getJar() - Method in class org.apache.nutch.mapReduce.JobConf
 
getJob(String) - Method in class org.apache.nutch.mapReduce.JobClient
Get a RunningJob object to track an ongoing job.
getJob(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
getJobClient() - Method in class org.apache.nutch.mapReduce.TaskTracker
The connection to the JobTracker, used by the TaskRunner for locating remote files.
getJobFile() - Method in class org.apache.nutch.mapReduce.JobProfile
 
getJobFile() - Method in interface org.apache.nutch.mapReduce.RunningJob
Returns the path of the submitted job.
getJobFile() - Method in class org.apache.nutch.mapReduce.Task
 
getJobID() - Method in interface org.apache.nutch.mapReduce.RunningJob
Returns an identifier for the job
getJobId() - Method in class org.apache.nutch.mapReduce.JobProfile
 
getJobId() - Method in class org.apache.nutch.mapReduce.JobStatus
 
getJobProfile(String) - Method in interface org.apache.nutch.mapReduce.JobSubmissionProtocol
Grab a handle to a job that is already known to the JobTracker
getJobProfile(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
getJobStatus(String) - Method in interface org.apache.nutch.mapReduce.JobSubmissionProtocol
Grab a handle to a job that is already known to the JobTracker
getJobStatus(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
getJobTrackerMachine() - Method in class org.apache.nutch.mapReduce.JobTracker
 
getKeyClass() - Method in class org.apache.nutch.io.MapFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class org.apache.nutch.io.SequenceFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class org.apache.nutch.io.SequenceFile.Writer
Returns the class of keys in this file.
getKeyClass() - Method in class org.apache.nutch.io.WritableComparator
Returns the WritableComparable implementation class.
getLastModified() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
getLastSeen() - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
getLen() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
getLength(File) - Method in class org.apache.nutch.fs.LocalFileSystem
 
getLength(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
getLength(File) - Method in class org.apache.nutch.fs.NutchFileSystem
 
getLength() - Method in class org.apache.nutch.io.DataOutputBuffer
Returns the length of the valid data currently in the buffer.
getLength() - Method in class org.apache.nutch.io.SequenceFile.Writer
Returns the current length of the output file.
getLength() - Method in class org.apache.nutch.io.UTF8
The number of bytes in the encoded string.
getLength() - Method in class org.apache.nutch.mapReduce.FileSplit
The number of bytes in the file to process.
getLength(Block) - Method in class org.apache.nutch.ndfs.FSDataset
Find the block's on-disk length
getLength() - Method in class org.apache.nutch.searcher.HitDetails
Returns the number of fields contained in this.
getLength() - Method in class org.apache.nutch.searcher.Hits
Returns the number of hits included in this current listing.
getLine() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getLink() - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getLink() - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
getLink() - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
getLink() - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
getLinks(UTF8) - Method in class org.apache.nutch.db.DBSectionReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class org.apache.nutch.db.DBSectionReader
Grab all the links from the given MD5 hash.
getLinks(UTF8) - Method in class org.apache.nutch.db.DistributedWebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class org.apache.nutch.db.DistributedWebDBReader
Grab all the links from the given MD5 hash.
getLinks(UTF8) - Method in interface org.apache.nutch.db.IWebDBReader
Return any Link objects that point to the given URL.
getLinks(MD5Hash) - Method in interface org.apache.nutch.db.IWebDBReader
Return all the Link objects that originate from a document with the given MD5 checksum.
getLinks(UTF8) - Method in class org.apache.nutch.db.WebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class org.apache.nutch.db.WebDBReader
Grab all the links from the given MD5 hash.
getListing(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Get a listing of files given path 'src'. This function is admittedly very inefficient right now.
getListing(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Get a listing of all files at 'src'.
getLocalDir() - Static method in class org.apache.nutch.mapReduce.JobConf
 
getLogStream(Logger, Level) - Static method in class org.apache.nutch.util.LogFormatter
Returns a stream that, when written to, adds log lines.
getLogger(String) - Static method in class org.apache.nutch.util.LogFormatter
Gets a logger and, as a side effect, installs this as the default formatter.
getLong(String, long) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as a long.
getMD5() - Method in class org.apache.nutch.db.Page
 
getMD5Hash() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getMajorCode() - Method in class org.apache.nutch.parse.ParseStatus
 
getMapTaskId() - Method in class org.apache.nutch.mapReduce.MapOutputLocation
The map task id.
getMapTaskIds() - Method in class org.apache.nutch.mapReduce.ReduceTask
 
getMapperClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getMemory() - Method in class org.apache.nutch.io.SequenceFile.Sorter
Get the total amount of buffer memory, in bytes.
getMessage() - Method in class org.apache.nutch.parse.ParseStatus
A convenience method.
getMessage() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
getMessage() - Method in class org.apache.nutch.quality.dynamic.ParseException
This method has the standard behavior when this object has been created using the standard constructors.
getMessage() - Method in class org.apache.nutch.quality.dynamic.TokenMgrError
You can also modify the body of this method to customize your error messages.
getMetaTags(HTMLMetaTags, Node, URL) - Static method in class org.apache.nutch.parse.html.HTMLMetaProcessor
Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getMetadata() - Method in class org.apache.nutch.parse.ParseData
Other page properties.
getMetadata() - Method in class org.apache.nutch.protocol.Content
Other protocol-specific data.
getMimeType(File) - Method in class org.apache.nutch.util.mime.MimeTypes
Find the Mime Content Type of a file.
getMimeType(URL) - Method in class org.apache.nutch.util.mime.MimeTypes
Find the Mime Content Type of a document from its URL.
getMimeType(String) - Method in class org.apache.nutch.util.mime.MimeTypes
Find the Mime Content Type of a document from its name.
getMimeType(byte[]) - Method in class org.apache.nutch.util.mime.MimeTypes
Find the Mime Content Type of a stream from its content.
getMimeType(String, byte[]) - Method in class org.apache.nutch.util.mime.MimeTypes
Find the Mime Content Type of a document from its name and its content.
getMinLength() - Method in class org.apache.nutch.util.mime.MimeTypes
Return the minimum length of data to provide to analyzing methods based on the document's content in order to check all the known MimeTypes.
getMinorCode() - Method in class org.apache.nutch.parse.ParseStatus
 
getModel() - Static method in class org.apache.nutch.ontology.OntologyImpl
 
getNDFSParent(String) - Static method in class org.apache.nutch.ndfs.NDFSFile
Retrieves the parent path from an NDFS path string.
getName() - Method in class org.apache.nutch.analysis.lang.NGramProfile
Returns the profile name.
getName() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getName() - Method in class org.apache.nutch.fs.LocalFileSystem
 
getName() - Method in class org.apache.nutch.fs.NDFSFileSystem
 
getName() - Method in class org.apache.nutch.fs.NutchFileSystem
Returns a name for this filesystem, suitable to pass to NutchFileSystem.getNamed(String).
getName(Class) - Static method in class org.apache.nutch.io.WritableName
Return the name for a class.
getName() - Method in interface org.apache.nutch.mapReduce.InputFormat
The name of this input format.
getName() - Method in class org.apache.nutch.mapReduce.InputFormatBase
 
getName() - Method in interface org.apache.nutch.mapReduce.OutputFormat
The name of this output format.
getName() - Method in class org.apache.nutch.mapReduce.SequenceFileInputFormat
 
getName() - Method in class org.apache.nutch.mapReduce.SequenceFileOutputFormat
 
getName() - Method in class org.apache.nutch.mapReduce.TextInputFormat
 
getName() - Method in class org.apache.nutch.mapReduce.TextOutputFormat
 
getName() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getName() - Method in class org.apache.nutch.ndfs.HeartbeatData
 
getName() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
getName() - Method in class org.apache.nutch.plugin.ExtensionPoint
Returns the name of the extension point.
getName() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns the name of the plugin.
getName() - Method in class org.apache.nutch.util.mime.MimeType
Return the name of this mime-type.
getNamed(String) - Static method in class org.apache.nutch.fs.NutchFileSystem
Returns a named filesystem.
getNewSegmentName() - Static method in class org.apache.nutch.segment.SegmentWriter
Create a new segment name
getNewUrl() - Method in class org.apache.nutch.protocol.ResourceMoved
 
getNextFetchTime() - Method in class org.apache.nutch.db.Page
 
getNextScore() - Method in class org.apache.nutch.db.Page
 
getNextToken() - Method in class org.apache.nutch.analysis.NutchAnalysis
 
getNextToken() - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
getNextToken() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
getNextToken() - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getNormalizer() - Static method in class org.apache.nutch.net.UrlNormalizerFactory
Return the default UrlNormalizer implementation.
getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an array of libraries as URLs that are not exported by the plugin.
getNumBytes() - Method in class org.apache.nutch.ndfs.Block
 
getNumContinues() - Method in interface org.apache.nutch.net.protocols.Response
Returns the number of 100/Continue headers encountered
getNumMapTasks() - Method in class org.apache.nutch.mapReduce.JobConf
 
getNumOutlinks() - Method in class org.apache.nutch.db.Page
 
getNumReduceTasks() - Method in class org.apache.nutch.mapReduce.JobConf
 
getOldUrl() - Method in class org.apache.nutch.protocol.ResourceMoved
 
getOnlineClusterer() - Static method in class org.apache.nutch.clustering.OnlineClustererFactory
 
getOntology() - Static method in class org.apache.nutch.ontology.OntologyFactory
 
getOutlinks(String) - Static method in class org.apache.nutch.parse.OutlinkExtractor
Extracts Outlink from given plain text.
getOutlinks(String, String) - Static method in class org.apache.nutch.parse.OutlinkExtractor
Extracts Outlink from given plain text and adds anchor to the extracted Outlinks
getOutlinks() - Method in class org.apache.nutch.parse.ParseData
The outlinks of the page.
getOutlinks(URL, ArrayList, Node) - Static method in class org.apache.nutch.parse.html.DOMContentUtils
This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getOutputDir() - Method in class org.apache.nutch.mapReduce.JobConf
 
getOutputFile(String, int) - Static method in class org.apache.nutch.mapReduce.MapOutputFile
Create a local map output file name.
getOutputFormat() - Method in class org.apache.nutch.mapReduce.JobConf
 
getOutputKeyClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getOutputKeyComparatorClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getOutputValueClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getPage(UTF8, Page) - Method in class org.apache.nutch.db.DBSectionReader
Fetch a Page with the given URL, and fill it into the pre-allocated Page 'p'.
getPage(String) - Method in class org.apache.nutch.db.DistributedWebDBReader
Get Page from the pagedb with the given URL.
getPage() - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
getPage(String) - Method in interface org.apache.nutch.db.IWebDBReader
Return a Page object with the given URL, if any.
getPage(String) - Method in class org.apache.nutch.db.WebDBReader
Get Page from the pagedb with the given URL
getPage() - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
getPage() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
getPageCount() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getPages(MD5Hash) - Method in class org.apache.nutch.db.DBSectionReader
Get Pages from the db according to their content hash.
getPages(MD5Hash) - Method in class org.apache.nutch.db.DistributedWebDBReader
Get all the Pages according to their content hash.
getPages(MD5Hash) - Method in interface org.apache.nutch.db.IWebDBReader
Return any Pages with the given MD5 checksum.
getPages(MD5Hash) - Method in class org.apache.nutch.db.WebDBReader
Get Pages from the pagedb according to their content hash.
getParent() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
getParse(Content) - Method in interface org.apache.nutch.parse.Parser
Creates the parse for some content.
getParse(Content) - Method in class org.apache.nutch.parse.html.HtmlParser
 
getParse(Content) - Method in class org.apache.nutch.parse.js.JSParseFilter
 
getParse(Content) - Method in class org.apache.nutch.parse.msword.MSWordParser
 
getParse(Content) - Method in class org.apache.nutch.parse.pdf.PdfParser
 
getParse(Content) - Method in class org.apache.nutch.parse.text.TextParser
 
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getParseData(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent
Returns the ParseData of a hit document.
getParseData(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
 
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getParseText(HitDetails) - Method in interface org.apache.nutch.searcher.HitContent
Returns the ParseText of a hit document.
getParseText(HitDetails) - Method in class org.apache.nutch.searcher.NutchBean
 
getParser() - Static method in class org.apache.nutch.ontology.OntologyImpl
 
getParser(String, String) - Static method in class org.apache.nutch.parse.ParserFactory
Returns the appropriate Parser implementation given a content type and url.
getPartition(WritableComparable, int) - Method in interface org.apache.nutch.mapReduce.Partitioner
Returns the partition number for a given key given the total number of partitions.
getPartition() - Method in class org.apache.nutch.mapReduce.ReduceTask
 
getPartition(WritableComparable, int) - Method in class org.apache.nutch.mapReduce.lib.HashPartitioner
Use Object.hashCode() to partition.
getPartitionerClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getPath() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
getPhrase() - Method in class org.apache.nutch.searcher.Query.Clause
 
getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns the fully qualified name of the class which implements the abstract Plugin class.
getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository
Returns the descriptor of one plugin identified by a plugin id.
getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository
Returns all registered plugin descriptors.
getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns the unique identifier of the plug-in or null.
getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository
Returns an instance of a plugin.
getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns the directory path of the plugin.
getPort() - Method in class org.apache.nutch.mapReduce.MapOutputLocation
The port listening for MapOutputProtocol connections.
getPort() - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
getPos() - Method in class org.apache.nutch.fs.NFSDataInputStream
 
getPos() - Method in class org.apache.nutch.fs.NFSDataOutputStream
 
getPos() - Method in class org.apache.nutch.fs.NFSInputStream
Return the current offset from the start of the file
getPos() - Method in class org.apache.nutch.fs.NFSOutputStream
Return the current offset from the start of the file
getPos() - Method in interface org.apache.nutch.mapReduce.RecordReader
Returns the current position in the input.
getPosition() - Method in class org.apache.nutch.io.DataInputBuffer
Returns the current position in the input.
getPosition() - Method in class org.apache.nutch.io.SequenceFile.Reader
Return the current byte position in the input file.
getPrimaryType() - Method in class org.apache.nutch.util.mime.MimeType
Return the primary type of this mime-type.
getProfile() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getProgress() - Method in class org.apache.nutch.mapReduce.TaskStatus
 
getProtocol(String) - Static method in class org.apache.nutch.protocol.ProtocolFactory
Returns the appropriate Protocol implementation for a url.
getProtocolOutput(String) - Method in interface org.apache.nutch.protocol.Protocol
Returns the Content for a url.
getProtocolOutput(FetchListEntry) - Method in interface org.apache.nutch.protocol.Protocol
Returns the Content for a fetchlist entry.
getProtocolOutput(String) - Method in class org.apache.nutch.protocol.file.File
 
getProtocolOutput(FetchListEntry) - Method in class org.apache.nutch.protocol.file.File
 
getProtocolOutput(String) - Method in class org.apache.nutch.protocol.ftp.Ftp
 
getProtocolOutput(FetchListEntry) - Method in class org.apache.nutch.protocol.ftp.Ftp
 
getProtocolOutput(String) - Method in class org.apache.nutch.protocol.http.Http
 
getProtocolOutput(FetchListEntry) - Method in class org.apache.nutch.protocol.http.Http
 
getProtocolOutput(String) - Method in class org.apache.nutch.protocol.httpclient.Http
 
getProtocolOutput(FetchListEntry) - Method in class org.apache.nutch.protocol.httpclient.Http
 
getProtocolStatus() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getProxy(Class, InetSocketAddress) - Static method in class org.apache.nutch.ipc.RPC
Construct a client-side proxy object that implements the named protocol, talking to a server at the named address.
getRealm() - Method in interface org.apache.nutch.protocol.httpclient.HttpAuthentication
Gets the realm used by the HttpAuthentication object during creation.
getRealm() - Method in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
Gets the realm attribute of the HttpBasicAuthentication object.
getRecordReader(NutchFileSystem, FileSplit, JobConf) - Method in interface org.apache.nutch.mapReduce.InputFormat
Construct a RecordReader for a FileSplit.
getRecordReader(NutchFileSystem, FileSplit, JobConf) - Method in class org.apache.nutch.mapReduce.InputFormatBase
 
getRecordReader(NutchFileSystem, FileSplit, JobConf) - Method in class org.apache.nutch.mapReduce.SequenceFileInputFormat
 
getRecordReader(NutchFileSystem, FileSplit, JobConf) - Method in class org.apache.nutch.mapReduce.TextInputFormat
 
getRecordWriter(NutchFileSystem, JobConf, String) - Method in interface org.apache.nutch.mapReduce.OutputFormat
Construct a RecordWriter.
getRecordWriter(NutchFileSystem, JobConf, String) - Method in class org.apache.nutch.mapReduce.SequenceFileOutputFormat
 
getRecordWriter(NutchFileSystem, JobConf, String) - Method in class org.apache.nutch.mapReduce.TextOutputFormat
 
getReducerClass() - Method in class org.apache.nutch.mapReduce.JobConf
 
getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags
A convenience method.
getRemaining() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
getRemaining() - Method in class org.apache.nutch.ndfs.FSDataset
Return how many bytes can still be stored in the FSDataset
getRemaining() - Method in class org.apache.nutch.ndfs.HeartbeatData
 
getRequiredSuccessorCapabilities() - Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
Returns the capabilities required from the successor component.
getResource(String) - Method in class org.apache.nutch.util.NutchConf
Returns the URL for the named resource.
getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor
Returns an I18N'd resource string.
getRetriesSinceFetch() - Method in class org.apache.nutch.db.Page
 
getRootNode() - Method in class org.apache.nutch.parse.html.DOMBuilder
Get the root node of the DOM being created.
getRunState() - Method in class org.apache.nutch.mapReduce.JobStatus
 
getRunState() - Method in class org.apache.nutch.mapReduce.TaskStatus
 
getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint
Returns a path to the XML schema of an extension point.
getScore() - Method in class org.apache.nutch.db.Page
 
getScore() - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
getSegmentNames() - Method in class org.apache.nutch.searcher.DistributedSearch.Client
Return the names of segments searched.
getSegmentNames() - Method in interface org.apache.nutch.searcher.DistributedSearch.Protocol
The names of the segments searched by this node.
getSegmentNames() - Method in class org.apache.nutch.searcher.FetchedSegments
 
getSegmentNames() - Method in class org.apache.nutch.searcher.NutchBean
 
getServer(Object, int) - Static method in class org.apache.nutch.ipc.RPC
Construct a server for a protocol implementation instance listening on a port.
getServer(Object, int, int, boolean) - Static method in class org.apache.nutch.ipc.RPC
Construct a server for a protocol implementation instance listening on a port.
getSimilarity(NGramProfile) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Calculate a score for how well NGramProfiles match each other. The similarity calculation is at an experimental level.
getSortValue() - Method in class org.apache.nutch.searcher.Hit
Return the value of the field that hits are sorted on.
getSorted() - Method in class org.apache.nutch.analysis.lang.NGramProfile
Return a sorted list of ngrams.
getSplit() - Method in class org.apache.nutch.mapReduce.MapTask
 
getSplits(NutchFileSystem, JobConf, int) - Method in interface org.apache.nutch.mapReduce.InputFormat
Splits a set of input files.
getSplits(NutchFileSystem, JobConf, int) - Method in class org.apache.nutch.mapReduce.InputFormatBase
Splits files returned by listFiles(NutchFileSystem, JobConf) when they're too big.
getStart() - Method in class org.apache.nutch.mapReduce.FileSplit
The position of the first byte in the file to process.
getStartTime() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
getStartTime() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getStartTime() - Method in class org.apache.nutch.mapReduce.JobTracker
 
getStatus() - Method in class org.apache.nutch.fetcher.Fetcher
 
getStatus() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getStatus() - Method in class org.apache.nutch.parse.ParseData
The status of parsing the page.
getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
 
getStatus() - Method in class org.apache.nutch.tools.SegmentMergeTool
 
getStrings(String) - Method in class org.apache.nutch.util.NutchConf
Returns the value of the name property as an array of strings.
getSubType() - Method in class org.apache.nutch.util.mime.MimeType
Return the sub type of this mime-type.
getSubclusters() - Method in interface org.apache.nutch.clustering.HitsCluster
 
getSubclusters() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
 
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.FetchedSegments
 
getSummary(HitDetails, Query) - Method in interface org.apache.nutch.searcher.HitSummarizer
Returns a summary for the given hit details.
getSummary(HitDetails[], Query) - Method in interface org.apache.nutch.searcher.HitSummarizer
Returns summaries for a set of details.
getSummary(HitDetails, Query) - Method in class org.apache.nutch.searcher.NutchBean
 
getSummary(HitDetails[], Query) - Method in class org.apache.nutch.searcher.NutchBean
 
getSummary(String, Query) - Method in class org.apache.nutch.searcher.Summarizer
Returns a summary for the given pre-tokenized text.
getSystemDir() - Static method in class org.apache.nutch.mapReduce.JobConf
 
getSystemName() - Method in class org.apache.nutch.protocol.ftp.Client
Fetches the system type name from the server and returns the string.
getTargetPoint() - Method in class org.apache.nutch.plugin.Extension
Returns the id of the extension point that is implemented by this extension.
getTask(String) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getTask(String) - Method in class org.apache.nutch.mapReduce.TaskTracker
Called upon startup by the child process, to fetch Task data.
getTask(String) - Method in interface org.apache.nutch.mapReduce.TaskUmbilicalProtocol
Called when a child task process starts, to get its task.
getTaskId() - Method in class org.apache.nutch.mapReduce.Task
 
getTaskId() - Method in class org.apache.nutch.mapReduce.TaskStatus
 
getTaskStatus(String) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
getTaskTracker(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
getTerm() - Method in class org.apache.nutch.searcher.Query.Clause
 
getTerms() - Method in class org.apache.nutch.searcher.Query.Phrase
 
getTerms() - Method in class org.apache.nutch.searcher.Query
Flattens a query into the set of text terms that it contains.
getText() - Method in interface org.apache.nutch.parse.Parse
The textual content of the page.
getText() - Method in class org.apache.nutch.parse.ParseImpl
 
getText() - Method in class org.apache.nutch.parse.ParseText
 
getText(StringBuffer, Node, boolean) - Static method in class org.apache.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuffer.
getText(StringBuffer, Node) - Static method in class org.apache.nutch.parse.html.DOMContentUtils
This is a convenience method, equivalent to getText(sb, node, false).
getText() - Method in class org.apache.nutch.searcher.Summary.Fragment
Returns the text of this fragment.
getTextRuns() - Method in class org.apache.nutch.parse.msword.chp.Word6CHPBinTable
 
getThrownError() - Method in class org.apache.nutch.util.CommandRunner
 
getTimeout() - Method in class org.apache.nutch.util.CommandRunner
 
getTitle() - Method in class org.apache.nutch.parse.ParseData
The title of the page.
getTitle(StringBuffer, Node) - Static method in class org.apache.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
getToUrl() - Method in class org.apache.nutch.parse.Outlink
 
getToken(int) - Method in class org.apache.nutch.analysis.NutchAnalysis
 
getToken(int) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
getTotal() - Method in class org.apache.nutch.searcher.Hits
Returns the total number of hits for this query.
getTotalSubmissions() - Method in class org.apache.nutch.mapReduce.JobTracker
 
getTracker() - Static method in class org.apache.nutch.mapReduce.JobTracker
 
getTrackerName() - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
getTrackerPort() - Method in class org.apache.nutch.mapReduce.JobTracker
 
getTrackingURL() - Method in interface org.apache.nutch.mapReduce.RunningJob
Returns a URL where some job progress information will be displayed.
getURL() - Method in class org.apache.nutch.db.Link
 
getURL() - Method in class org.apache.nutch.db.Page
 
getURL() - Method in class org.apache.nutch.mapReduce.JobProfile
 
getUrl() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getUrl() - Method in interface org.apache.nutch.net.protocols.Response
Returns the URL used to retrieve this response.
getUrl() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
getUrl() - Method in class org.apache.nutch.parse.ParserNotFound
 
getUrl() - Method in class org.apache.nutch.protocol.Content
The url fetched.
getUrl() - Method in class org.apache.nutch.protocol.ProtocolNotFound
 
getUrl() - Method in class org.apache.nutch.protocol.ResourceGone
 
getUrl() - Method in class org.apache.nutch.protocol.RetryLater
 
getValue(int) - Method in class org.apache.nutch.searcher.HitDetails
Returns the value of the ith field.
getValue(String) - Method in class org.apache.nutch.searcher.HitDetails
Returns the value of the first field with the specified name.
getValueClass() - Method in class org.apache.nutch.io.ArrayWritable
 
getValueClass() - Method in class org.apache.nutch.io.MapFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class org.apache.nutch.io.SequenceFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class org.apache.nutch.io.SequenceFile.Writer
Returns the class of values in this file.
getValues() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
getVersion() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
getVersion() - Method in class org.apache.nutch.io.VersionedWritable
Return the version number of the current implementation.
getVersion() - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
getVersion() - Method in class org.apache.nutch.parse.ParseData
 
getVersion() - Method in class org.apache.nutch.parse.ParseStatus
 
getVersion() - Method in class org.apache.nutch.parse.ParseText
 
getVersion() - Method in class org.apache.nutch.protocol.Content
 
getVersion() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
getWaitForExit() - Method in class org.apache.nutch.util.CommandRunner
 
getWeight() - Method in class org.apache.nutch.searcher.Query.Clause
 
getWriter() - Method in class org.apache.nutch.parse.html.DOMBuilder
Return null since there is no Writer for this class.
gotHeartbeat(UTF8, long, long) - Method in class org.apache.nutch.ndfs.FSNamesystem
The given node has reported in.

H

HEARTBEAT_INTERVAL - Static variable in interface org.apache.nutch.mapReduce.MRConstants
 
HEARTBEAT_INTERVAL - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
HTMLLanguageParser - class org.apache.nutch.analysis.lang.HTMLLanguageParser.
An HtmlParseFilter that looks for possible indications of content language.
HTMLLanguageParser() - Constructor for class org.apache.nutch.analysis.lang.HTMLLanguageParser
 
HTMLMetaProcessor - class org.apache.nutch.parse.html.HTMLMetaProcessor.
Class for parsing META Directives from DOM trees.
HTMLMetaProcessor() - Constructor for class org.apache.nutch.parse.html.HTMLMetaProcessor
 
HTMLMetaTags - class org.apache.nutch.parse.HTMLMetaTags.
This class holds the information about HTML "meta" tags extracted from a page.
HTMLMetaTags() - Constructor for class org.apache.nutch.parse.HTMLMetaTags
 
HashPartitioner - class org.apache.nutch.mapReduce.lib.HashPartitioner.
Partition keys by their Object.hashCode().
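A minimal sketch of hash partitioning as described above. This is not the Nutch source: the class name HashPartitionSketch and the helper partitionFor are hypothetical, and masking the sign bit is an assumption about how negative hash codes are handled.

```java
public class HashPartitionSketch {

    // Hypothetical helper (not part of the Nutch API): maps a key to a
    // partition index in the range [0, numPartitions).
    public static int partitionFor(Object key, int numPartitions) {
        // Clear the sign bit so that negative hash codes still yield a
        // non-negative index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"http://a.example/", "http://b.example/", "http://a.example/"};
        for (String key : keys) {
            // Equal keys always land in the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Because the index depends only on hashCode(), every record with an equal key is routed to the same partition, which is what a reduce step needs.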
HashPartitioner() - Constructor for class org.apache.nutch.mapReduce.lib.HashPartitioner
 
HeartbeatData - class org.apache.nutch.ndfs.HeartbeatData.
Heartbeat data
HeartbeatData() - Constructor for class org.apache.nutch.ndfs.HeartbeatData
 
HeartbeatData(String, long, long) - Constructor for class org.apache.nutch.ndfs.HeartbeatData
 
HighFreqTerms - class org.apache.nutch.indexer.HighFreqTerms.
Lists the most frequent terms in an index.
HighFreqTerms() - Constructor for class org.apache.nutch.indexer.HighFreqTerms
 
Hit - class org.apache.nutch.searcher.Hit.
A document which matched a query in an index.
Hit() - Constructor for class org.apache.nutch.searcher.Hit
 
Hit(int, int) - Constructor for class org.apache.nutch.searcher.Hit
 
Hit(int, int, WritableComparable, String) - Constructor for class org.apache.nutch.searcher.Hit
 
Hit(int, WritableComparable, String) - Constructor for class org.apache.nutch.searcher.Hit
 
HitContent - interface org.apache.nutch.searcher.HitContent.
Service that returns the content of a hit.
HitDetailer - interface org.apache.nutch.searcher.HitDetailer.
Service that returns details of a hit within an index.
HitDetails - class org.apache.nutch.searcher.HitDetails.
Data stored in the index for a hit.
HitDetails() - Constructor for class org.apache.nutch.searcher.HitDetails
 
HitDetails(String[], String[]) - Constructor for class org.apache.nutch.searcher.HitDetails
Construct from field names and values arrays.
HitDetails(String, String) - Constructor for class org.apache.nutch.searcher.HitDetails
Construct minimal details from a segment name and document number.
HitSummarizer - interface org.apache.nutch.searcher.HitSummarizer.
Service that builds a summary for a hit on a query.
Hits - class org.apache.nutch.searcher.Hits.
A set of hits matching a query.
Hits() - Constructor for class org.apache.nutch.searcher.Hits
 
Hits(long, Hit[]) - Constructor for class org.apache.nutch.searcher.Hits
 
HitsCluster - interface org.apache.nutch.clustering.HitsCluster.
An interface representing a group of hits.
HitsClusterAdapter - class org.apache.nutch.clustering.carrot2.HitsClusterAdapter.
An adapter of Carrot2's RawCluster interface to HitsCluster interface.
HitsClusterAdapter(RawCluster, HitDetails[]) - Constructor for class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
Creates a new adapter.
HtmlParseFilter - interface org.apache.nutch.parse.HtmlParseFilter.
Extension point for DOM-based HTML parsers.
HtmlParseFilters - class org.apache.nutch.parse.HtmlParseFilters.
Creates and caches HtmlParseFilter implementing plugins.
HtmlParser - class org.apache.nutch.parse.html.HtmlParser.
 
HtmlParser() - Constructor for class org.apache.nutch.parse.html.HtmlParser
 
Http - class org.apache.nutch.protocol.http.Http.
An implementation of the Http protocol.
Http() - Constructor for class org.apache.nutch.protocol.http.Http
 
Http - class org.apache.nutch.protocol.httpclient.Http.
An implementation of the Http protocol.
Http() - Constructor for class org.apache.nutch.protocol.httpclient.Http
 
HttpAuthentication - interface org.apache.nutch.protocol.httpclient.HttpAuthentication.
The base level of services required for Http Authentication
HttpAuthenticationException - exception org.apache.nutch.protocol.httpclient.HttpAuthenticationException.
Can be used to identify problems during creation of Authentication objects.
HttpAuthenticationException() - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationException
Constructs a new exception with null as its detail message.
HttpAuthenticationException(String) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationException
Constructs a new exception with the specified detail message.
HttpAuthenticationException(String, Throwable) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationException
Constructs a new exception with the specified message and cause.
HttpAuthenticationException(Throwable) - Constructor for class org.apache.nutch.protocol.httpclient.HttpAuthenticationException
Constructs a new exception with the specified cause, taking the detail message from the given cause if it is not null.
HttpAuthenticationFactory - class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory.
Provides the Http protocol implementation with the ability to authenticate when prompted.
HttpBasicAuthentication - class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication.
Implementation of RFC 2617 Basic Authentication.
HttpBasicAuthentication(String) - Constructor for class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
Construct an HttpBasicAuthentication for the given challenge parameters.
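For orientation, RFC 2617 Basic credentials are the Base64 encoding of "user:password" prefixed with the scheme name. The sketch below illustrates only that encoding step. It is not the HttpBasicAuthentication implementation: the class BasicAuthSketch and the method basicCredentials are hypothetical names, and java.util.Base64 (Java 8 and later) is assumed to be available.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    // Hypothetical helper: builds the value of an Authorization header
    // for the Basic scheme from a user name and password.
    public static String basicCredentials(String user, String password) {
        String pair = user + ":" + password;
        byte[] raw = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Example credentials taken from RFC 2617, section 2.
        System.out.println("Authorization: " + basicCredentials("Aladdin", "open sesame"));
    }
}
```

Base64 is an encoding, not encryption, so Basic credentials are only as private as the connection that carries them.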
HttpDateFormat - class org.apache.nutch.net.protocols.HttpDateFormat.
Class to handle HTTP dates.
HttpDateFormat() - Constructor for class org.apache.nutch.net.protocols.HttpDateFormat
 
HttpError - exception org.apache.nutch.protocol.http.HttpError.
Thrown for HTTP error codes.
HttpError(int) - Constructor for class org.apache.nutch.protocol.http.HttpError
 
HttpError - exception org.apache.nutch.protocol.httpclient.HttpError.
Thrown for HTTP error codes.
HttpError(int) - Constructor for class org.apache.nutch.protocol.httpclient.HttpError
 
HttpException - exception org.apache.nutch.protocol.http.HttpException.
 
HttpException() - Constructor for class org.apache.nutch.protocol.http.HttpException
 
HttpException(String) - Constructor for class org.apache.nutch.protocol.http.HttpException
 
HttpException(String, Throwable) - Constructor for class org.apache.nutch.protocol.http.HttpException
 
HttpException(Throwable) - Constructor for class org.apache.nutch.protocol.http.HttpException
 
HttpException - exception org.apache.nutch.protocol.httpclient.HttpException.
 
HttpException() - Constructor for class org.apache.nutch.protocol.httpclient.HttpException
 
HttpException(String) - Constructor for class org.apache.nutch.protocol.httpclient.HttpException
 
HttpException(String, Throwable) - Constructor for class org.apache.nutch.protocol.httpclient.HttpException
 
HttpException(Throwable) - Constructor for class org.apache.nutch.protocol.httpclient.HttpException
 
HttpResponse - class org.apache.nutch.protocol.http.HttpResponse.
An HTTP response.
HttpResponse(URL) - Constructor for class org.apache.nutch.protocol.http.HttpResponse
 
HttpResponse(String, URL) - Constructor for class org.apache.nutch.protocol.http.HttpResponse
 
HttpResponse - class org.apache.nutch.protocol.httpclient.HttpResponse.
An HTTP response.
HttpResponse(URL) - Constructor for class org.apache.nutch.protocol.httpclient.HttpResponse
 
halfDigest() - Method in class org.apache.nutch.io.MD5Hash
Construct a half-sized version of this MD5.
handle(String, String, HttpRequest, HttpResponse) - Method in class org.apache.nutch.mapReduce.JobTrackerInfoServer.RedirectHandler
 
hasLoggedSevere() - Static method in class org.apache.nutch.util.LogFormatter
Returns true if this LogFormatter has logged something at Level.SEVERE
hashCode() - Method in class org.apache.nutch.db.Page
 
hashCode() - Method in class org.apache.nutch.io.BooleanWritable
 
hashCode() - Method in class org.apache.nutch.io.FloatWritable
 
hashCode() - Method in class org.apache.nutch.io.IntWritable
 
hashCode() - Method in class org.apache.nutch.io.LongWritable
 
hashCode() - Method in class org.apache.nutch.io.MD5Hash
Returns a hash code value for this object.
hashCode() - Method in class org.apache.nutch.io.UTF8
 
hashCode() - Method in class org.apache.nutch.protocol.httpclient.DummySSLProtocolSocketFactory
 
hashCode() - Method in class org.apache.nutch.searcher.Hit
 
hashCode() - Method in class org.apache.nutch.searcher.Query.Clause
 
hashCode() - Method in class org.apache.nutch.searcher.Query.Phrase
 
hashCode() - Method in class org.apache.nutch.searcher.Query.Term
 
hashCode() - Method in class org.apache.nutch.searcher.Query
 
hashCode() - Method in class org.apache.nutch.util.mime.MimeType
 

I

IGNORE_INTERNAL_LINKS - Static variable in class org.apache.nutch.tools.UpdateDatabaseTool
 
INDEX_FILE_NAME - Static variable in class org.apache.nutch.io.MapFile
The name of the index file.
INDEX_MERGE_FACTOR - Static variable in class org.apache.nutch.tools.SegmentMergeTool
 
INDEX_MIN_MERGE_DOCS - Static variable in class org.apache.nutch.tools.SegmentMergeTool
 
INDEX_SIZE - Static variable in class org.apache.nutch.tools.SegmentMergeTool
Temporary de-dup index size.
INDEX_SKIP - Static variable in class org.apache.nutch.io.MapFile
Number of index entries to skip between each entry.
INTER_ANCHOR_GAP - Static variable in class org.apache.nutch.analysis.NutchDocumentAnalyzer
The number of unused term positions between anchors in the anchor field.
IRREGULAR_WORD - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
IWebDBReader - interface org.apache.nutch.db.IWebDBReader.
IWebDBReader is an interface to the consolidated page/link database.
IWebDBWriter - interface org.apache.nutch.db.IWebDBWriter.
IWebDBWriter is an interface to the consolidated page/link database.
IdentityMapper - class org.apache.nutch.mapReduce.lib.IdentityMapper.
Implements the identity function, mapping inputs directly to outputs.
IdentityMapper() - Constructor for class org.apache.nutch.mapReduce.lib.IdentityMapper
 
IdentityReducer - class org.apache.nutch.mapReduce.lib.IdentityReducer.
Performs no reduction, writing all input values directly to the output.
IdentityReducer() - Constructor for class org.apache.nutch.mapReduce.lib.IdentityReducer
 
IndexMerger - class org.apache.nutch.indexer.IndexMerger.
IndexMerger creates an index for the output corresponding to a single fetcher run.
IndexMerger(NutchFileSystem, File[], File, File) - Constructor for class org.apache.nutch.indexer.IndexMerger
Merge all of the segments given
IndexOptimizer - class org.apache.nutch.indexer.IndexOptimizer.
 
IndexOptimizer(File) - Constructor for class org.apache.nutch.indexer.IndexOptimizer
 
IndexSearcher - class org.apache.nutch.searcher.IndexSearcher.
Implements Searcher and HitDetailer for either a single merged index, or for a set of individual segment indexes.
IndexSearcher(File[]) - Constructor for class org.apache.nutch.searcher.IndexSearcher
Construct given a number of indexed segments.
IndexSearcher(String) - Constructor for class org.apache.nutch.searcher.IndexSearcher
Construct given a directory containing fetched segments, and a separate directory naming their merged index.
IndexSegment - class org.apache.nutch.indexer.IndexSegment.
Creates an index for the output corresponding to a single fetcher run.
IndexSegment(NutchFileSystem, long, File, File) - Constructor for class org.apache.nutch.indexer.IndexSegment
Index a segment in the given NFS.
IndexingException - exception org.apache.nutch.indexer.IndexingException.
 
IndexingException() - Constructor for class org.apache.nutch.indexer.IndexingException
 
IndexingException(String) - Constructor for class org.apache.nutch.indexer.IndexingException
 
IndexingException(String, Throwable) - Constructor for class org.apache.nutch.indexer.IndexingException
 
IndexingException(Throwable) - Constructor for class org.apache.nutch.indexer.IndexingException
 
IndexingFilter - interface org.apache.nutch.indexer.IndexingFilter.
Extension point for indexing.
IndexingFilters - class org.apache.nutch.indexer.IndexingFilters.
Creates and caches IndexingFilter implementing plugins.
InputFormat - interface org.apache.nutch.mapReduce.InputFormat.
An input data format.
InputFormatBase - class org.apache.nutch.mapReduce.InputFormatBase.
A base class for InputFormat.
InputFormatBase() - Constructor for class org.apache.nutch.mapReduce.InputFormatBase
 
InputFormats - class org.apache.nutch.mapReduce.InputFormats.
Repository of named InputFormats.
IntWritable - class org.apache.nutch.io.IntWritable.
A WritableComparable for ints.
IntWritable() - Constructor for class org.apache.nutch.io.IntWritable
 
IntWritable(int) - Constructor for class org.apache.nutch.io.IntWritable
 
IntWritable.Comparator - class org.apache.nutch.io.IntWritable.Comparator.
A Comparator optimized for IntWritable.
IntWritable.Comparator() - Constructor for class org.apache.nutch.io.IntWritable.Comparator
 
InterTrackerProtocol - interface org.apache.nutch.mapReduce.InterTrackerProtocol.
Protocol that a TaskTracker and the central JobTracker use to communicate.
InverseMapper - class org.apache.nutch.mapReduce.lib.InverseMapper.
A Mapper that swaps keys and values.
InverseMapper() - Constructor for class org.apache.nutch.mapReduce.lib.InverseMapper
 
identify(String) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Identify the language of the given content.
identify(StringBuffer) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Identify the language of the given content.
identify(InputStream) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Identify language from input stream.
identify(InputStream, String) - Method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Identify language from input stream.
ignorableWhitespace(char[], int, int) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of ignorable whitespace in element content.
image - Variable in class org.apache.nutch.quality.dynamic.Token
The string image of the token.
inBuf - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
indent(PrintStream, int) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
indexPages() - Method in class org.apache.nutch.indexer.IndexSegment
 
infix() - Method in class org.apache.nutch.analysis.NutchAnalysis
Characters which can be used to form compound terms.
init(ServletConfig) - Method in class org.apache.nutch.searcher.OpenSearchServlet
 
init() - Method in class org.apache.nutch.servlet.Cached
 
initRound(int, File) - Method in class org.apache.nutch.tools.DistributedAnalysisTool
This method prepares the ground for a set of processes to distribute a round of LinkAnalysis work.
initialize(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
injectDmozFile(File, int, boolean, boolean, int, Pattern) - Method in class org.apache.nutch.db.WebDBInjector
Iterate through all the items in this structured DMOZ file.
injectURLFile(File) - Method in class org.apache.nutch.db.WebDBInjector
Iterate through all the items in this flat text file and add them to the db.
inputItem(HashMap) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
inputSegments - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
inputStream - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
input_stream - Variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
input_stream - Variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
invalidate(Block[]) - Method in class org.apache.nutch.ndfs.FSDataset
We're informed that a block is no longer valid.
isAllowed(String) - Method in class org.apache.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Static method in class org.apache.nutch.protocol.http.RobotRulesParser
 
isAllowed(String) - Method in class org.apache.nutch.protocol.httpclient.RobotRulesParser.RobotRuleSet
Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Static method in class org.apache.nutch.protocol.httpclient.RobotRulesParser
 
isBlockFilename(File) - Static method in class org.apache.nutch.ndfs.Block
 
isClientTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
 
isComplete() - Method in interface org.apache.nutch.mapReduce.RunningJob
Non-blocking function to check whether the job is finished or not.
isDir(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Check whether it's a directory
isDir(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Whether the given name is a directory
isDir() - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
isDirectory(File) - Method in class org.apache.nutch.fs.LocalFileSystem
 
isDirectory(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
isDirectory(File) - Method in class org.apache.nutch.fs.NutchFileSystem
 
isDirectory(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
 
isDirectory() - Method in class org.apache.nutch.ndfs.NDFSFile
We need to reimplement some of them
isEllipsis() - Method in class org.apache.nutch.searcher.Summary.Ellipsis
Returns true.
isEllipsis() - Method in class org.apache.nutch.searcher.Summary.Fragment
Returns true iff this fragment is an ellipsis.
isField(String) - Static method in class org.apache.nutch.searcher.QueryFilters
 
isFile(File) - Method in class org.apache.nutch.fs.NutchFileSystem
 
isFile() - Method in class org.apache.nutch.ndfs.NDFSFile
 
isHidden() - Method in class org.apache.nutch.ndfs.NDFSFile
 
isHighlight() - Method in class org.apache.nutch.searcher.Summary.Fragment
Returns true iff this fragment is to be highlighted.
isHighlight() - Method in class org.apache.nutch.searcher.Summary.Highlight
Returns true.
isJunkCluster() - Method in interface org.apache.nutch.clustering.HitsCluster
Returns true if this cluster contains documents that did not fit anywhere else (presentation layer may discard such clusters).
isJunkCluster() - Method in class org.apache.nutch.clustering.carrot2.HitsClusterAdapter
 
isParsed - Variable in class org.apache.nutch.segment.SegmentReader
 
isParsedSegment(NutchFileSystem, File) - Static method in class org.apache.nutch.segment.SegmentReader
 
isPermanentFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
isPhrase() - Method in class org.apache.nutch.searcher.Query.Clause
 
isProhibited() - Method in class org.apache.nutch.searcher.Query.Clause
 
isPrunable(Query, IndexReader, int) - Method in class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
isPrunable(Query, IndexReader, int) - Method in interface org.apache.nutch.tools.PruneIndexTool.PruneChecker
Check whether this document should be pruned.
isPrunable(Query, IndexReader, int) - Method in class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
 
isRawField(String) - Static method in class org.apache.nutch.searcher.QueryFilters
 
isRemoteVerificationEnabled() - Method in class org.apache.nutch.protocol.ftp.Client
Return whether or not verification of the remote host participating in data connections is enabled.
isRequired() - Method in class org.apache.nutch.searcher.Query.Clause
 
isServerTrusted(X509Certificate[]) - Method in class org.apache.nutch.protocol.httpclient.DummyX509TrustManager
 
isStopWord(String) - Static method in class org.apache.nutch.analysis.NutchAnalysis
True iff word is a stop word.
isSuccess() - Method in class org.apache.nutch.parse.ParseStatus
A convenience method.
isSuccess() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
isTransientFailure() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
isValidBlock(Block) - Method in class org.apache.nutch.ndfs.FSDataset
Check whether the given block is a valid one.
isValidBlock(Block) - Method in class org.apache.nutch.ndfs.FSDirectory
Returns whether the given block is one pointed-to by a file.
isValidToCreate(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Check whether the filepath could be created
isWhiteSpace(char) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
Returns whether the specified ch conforms to the XML 1.0 definition of whitespace.
isWhiteSpace(char[], int, int) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
Tell if the string is whitespace.
isWhiteSpace(StringBuffer) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
Tell if the string is whitespace.
isWhiteSpace(String) - Static method in class org.apache.nutch.parse.html.XMLCharacterRecognizer
Tell if the string is whitespace.
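The XML 1.0 specification defines whitespace as exactly four characters: space (#x20), tab (#x9), line feed (#xA), and carriage return (#xD). A minimal sketch of such a check follows. The class XmlWhitespaceSketch and its methods are hypothetical and are not the XMLCharacterRecognizer source; treating an empty sequence as whitespace is a choice made for this sketch.

```java
public class XmlWhitespaceSketch {

    // True only for the four characters XML 1.0 treats as whitespace.
    public static boolean isXmlWhitespace(char ch) {
        return ch == 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D;
    }

    // True if every character in the sequence is XML whitespace.
    // An empty sequence returns true (an assumption of this sketch).
    public static boolean isXmlWhitespace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isXmlWhitespace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isXmlWhitespace(" \t\r\n"));
        System.out.println(isXmlWhitespace("a "));
    }
}
```

This set is narrower than Character.isWhitespace: a no-break space (U+00A0), for example, is not XML whitespace.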
iterate(int, File) - Method in class org.apache.nutch.tools.LinkAnalysisTool
Do a single-process iteration over the database.

J

JSParseFilter - class org.apache.nutch.parse.js.JSParseFilter.
This class is a heuristic link extractor for JavaScript files and code snippets.
JSParseFilter() - Constructor for class org.apache.nutch.parse.js.JSParseFilter
 
JobClient - class org.apache.nutch.mapReduce.JobClient.
JobClient interacts with the JobTracker network interface.
JobClient() - Constructor for class org.apache.nutch.mapReduce.JobClient
Build a job client, connect to the default job tracker
JobClient(InetSocketAddress) - Constructor for class org.apache.nutch.mapReduce.JobClient
Build a job client, connect to the indicated job tracker.
JobConf - class org.apache.nutch.mapReduce.JobConf.
A map/reduce job configuration.
JobConf() - Constructor for class org.apache.nutch.mapReduce.JobConf
Construct a map/reduce configuration.
JobConf(String) - Constructor for class org.apache.nutch.mapReduce.JobConf
Construct a map/reduce configuration.
JobConf(File) - Constructor for class org.apache.nutch.mapReduce.JobConf
Construct a map/reduce configuration.
JobProfile - class org.apache.nutch.mapReduce.JobProfile.
A JobProfile is a MapReduce primitive.
JobProfile() - Constructor for class org.apache.nutch.mapReduce.JobProfile
 
JobProfile(String, String, String) - Constructor for class org.apache.nutch.mapReduce.JobProfile
 
JobStatus - class org.apache.nutch.mapReduce.JobStatus.
Describes the current status of a job.
JobStatus() - Constructor for class org.apache.nutch.mapReduce.JobStatus
 
JobStatus(String, float, float, int) - Constructor for class org.apache.nutch.mapReduce.JobStatus
 
JobSubmissionProtocol - interface org.apache.nutch.mapReduce.JobSubmissionProtocol.
Protocol that a JobClient and the central JobTracker use to communicate.
JobTracker - class org.apache.nutch.mapReduce.JobTracker.
JobTracker is the central location for submitting and tracking MR jobs in a network environment.
JobTracker.JobInProgress - class org.apache.nutch.mapReduce.JobTracker.JobInProgress.
 
JobTracker.JobInProgress(String) - Constructor for class org.apache.nutch.mapReduce.JobTracker.JobInProgress
Create a 'JobInProgress' object, which contains both JobProfile and JobStatus.
JobTrackerInfoServer - class org.apache.nutch.mapReduce.JobTrackerInfoServer.
JobTrackerInfoServer provides stats about the JobTracker via HTTP.
JobTrackerInfoServer(JobTracker, int) - Constructor for class org.apache.nutch.mapReduce.JobTrackerInfoServer
We need the jobTracker to grab stats, and the port to know where to listen.
JobTrackerInfoServer.RedirectHandler - class org.apache.nutch.mapReduce.JobTrackerInfoServer.RedirectHandler.
 
JobTrackerInfoServer.RedirectHandler() - Constructor for class org.apache.nutch.mapReduce.JobTrackerInfoServer.RedirectHandler
 
jjFillToken() - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
jjFillToken() - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
jj_nt - Variable in class org.apache.nutch.analysis.NutchAnalysis
 
jj_nt - Variable in class org.apache.nutch.quality.dynamic.PageDescription
 
jjnewLexState - Static variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
jjstrLiteralImages - Static variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
jjstrLiteralImages - Static variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
join() - Method in class org.apache.nutch.ipc.Server
Wait for the server to be stopped.

K

KEYWORD - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
key() - Method in class org.apache.nutch.io.ArrayFile.Reader
Returns the key associated with the most recent call to ArrayFile.Reader.seek(long), ArrayFile.Reader.next(Writable), or ArrayFile.Reader.get(long,Writable).
key() - Method in class org.apache.nutch.segment.SegmentReader
Return the current key position.
kill() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
Kill the job and all its component tasks.
killJob(String) - Method in interface org.apache.nutch.mapReduce.JobSubmissionProtocol
Kill the indicated job
killJob(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
killJob() - Method in interface org.apache.nutch.mapReduce.RunningJob
Kill the running job.
kind - Variable in class org.apache.nutch.quality.dynamic.Token
An integer that describes the kind of this token.

L

LEASE_PERIOD - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
LETTER - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
LOG - Static variable in class org.apache.nutch.clustering.OnlineClustererFactory
 
LOG - Static variable in class org.apache.nutch.db.WebDBInjector
 
LOG - Static variable in class org.apache.nutch.fetcher.Fetcher
 
LOG - Static variable in class org.apache.nutch.fs.NutchFileSystem
 
LOG - Static variable in class org.apache.nutch.indexer.IndexMerger
 
LOG - Static variable in class org.apache.nutch.indexer.IndexSegment
 
LOG - Static variable in class org.apache.nutch.indexer.basic.BasicIndexingFilter
 
LOG - Static variable in class org.apache.nutch.indexer.more.MoreIndexingFilter
 
LOG - Static variable in class org.apache.nutch.io.SequenceFile
 
LOG - Static variable in class org.apache.nutch.ipc.Client
 
LOG - Static variable in class org.apache.nutch.ipc.Server
 
LOG - Static variable in class org.apache.nutch.mapReduce.JobTracker
 
LOG - Static variable in class org.apache.nutch.mapReduce.TaskTracker
 
LOG - Static variable in class org.apache.nutch.ndfs.FSNamesystem
 
LOG - Static variable in class org.apache.nutch.ndfs.NDFS
 
LOG - Static variable in class org.apache.nutch.ndfs.NDFSClient
 
LOG - Static variable in class org.apache.nutch.net.BasicUrlNormalizer
 
LOG - Static variable in class org.apache.nutch.net.URLFilterChecker
 
LOG - Static variable in class org.apache.nutch.ontology.OntologyFactory
 
LOG - Static variable in class org.apache.nutch.ontology.OntologyImpl
 
LOG - Static variable in class org.apache.nutch.parse.ParserChecker
 
LOG - Static variable in class org.apache.nutch.parse.ParserFactory
 
LOG - Static variable in class org.apache.nutch.parse.html.HtmlParser
 
LOG - Static variable in class org.apache.nutch.parse.js.JSParseFilter
 
LOG - Static variable in class org.apache.nutch.parse.pdf.PdfParser
 
LOG - Static variable in class org.apache.nutch.plugin.PluginDescriptor
 
LOG - Static variable in class org.apache.nutch.plugin.PluginManifestParser
 
LOG - Static variable in class org.apache.nutch.plugin.PluginRepository
 
LOG - Static variable in class org.apache.nutch.protocol.ProtocolFactory
 
LOG - Static variable in class org.apache.nutch.protocol.file.File
 
LOG - Static variable in class org.apache.nutch.protocol.ftp.Ftp
 
LOG - Static variable in class org.apache.nutch.protocol.http.Http
 
LOG - Static variable in class org.apache.nutch.protocol.http.RobotRulesParser
 
LOG - Static variable in class org.apache.nutch.protocol.httpclient.Http
 
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpAuthenticationFactory
 
LOG - Static variable in class org.apache.nutch.protocol.httpclient.HttpBasicAuthentication
 
LOG - Static variable in class org.apache.nutch.protocol.httpclient.MultiProperties
 
LOG - Static variable in class org.apache.nutch.protocol.httpclient.RobotRulesParser
 
LOG - Static variable in class org.apache.nutch.searcher.DistributedSearch
 
LOG - Static variable in class org.apache.nutch.searcher.NutchBean
 
LOG - Static variable in class org.apache.nutch.searcher.Query
 
LOG - Static variable in class org.apache.nutch.searcher.more.DateQueryFilter
 
LOG - Static variable in class org.apache.nutch.segment.SegmentReader
 
LOG - Static variable in class org.apache.nutch.segment.SegmentSlicer
 
LOG - Static variable in class org.apache.nutch.segment.SegmentWriter
 
LOG - Static variable in class org.apache.nutch.tools.CrawlTool
 
LOG - Static variable in class org.apache.nutch.tools.DistributedAnalysisTool
 
LOG - Static variable in class org.apache.nutch.tools.FetchListTool
 
LOG - Static variable in class org.apache.nutch.tools.ParseSegment
 
LOG - Static variable in class org.apache.nutch.tools.PruneIndexTool
 
LOG - Static variable in class org.apache.nutch.tools.SegmentMergeTool
 
LOG - Static variable in class org.apache.nutch.tools.UpdateDatabaseTool
 
LOG - Static variable in class org.apache.nutch.tools.UpdateSegmentsFromDb
 
LOG - Static variable in class org.apache.nutch.tools.WebDBAdminTool
 
LOG - Static variable in class org.creativecommons.nutch.CCIndexingFilter
 
LOG - Static variable in class org.creativecommons.nutch.CCParseFilter
 
LOG_STEP - Static variable in class org.apache.nutch.indexer.IndexSegment
 
LOG_STEP - Static variable in class org.apache.nutch.segment.SegmentSlicer
 
LOG_STEP - Static variable in class org.apache.nutch.tools.PruneIndexTool
Log the progress every LOG_STEP number of processed documents.
LOG_STEP - Static variable in class org.apache.nutch.tools.SegmentMergeTool
Log progress update every LOG_STEP items.
LanguageIdentifier - class org.apache.nutch.analysis.lang.LanguageIdentifier.
Identifies the language of a piece of content, based on statistical analysis.
LanguageIndexingFilter - class org.apache.nutch.analysis.lang.LanguageIndexingFilter.
An IndexingFilter that adds a lang (language) field to the document.
LanguageIndexingFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageIndexingFilter
Constructs a new Language Indexing Filter.
LanguageQueryFilter - class org.apache.nutch.analysis.lang.LanguageQueryFilter.
A QueryFilter that handles "lang:" query clauses.
LanguageQueryFilter() - Constructor for class org.apache.nutch.analysis.lang.LanguageQueryFilter
 
LexicalError(boolean, int, int, int, String, char) - Static method in class org.apache.nutch.quality.dynamic.TokenMgrError
Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error.
Link - class org.apache.nutch.db.Link.
This is the field in the Link Database.
Link() - Constructor for class org.apache.nutch.db.Link
Create the Link with no data
Link(MD5Hash, long, String, String) - Constructor for class org.apache.nutch.db.Link
Create the record
Link.MD5Comparator - class org.apache.nutch.db.Link.MD5Comparator.
MD5Comparator is the opposite of UrlComparator: the MD5 comes first.
Link.MD5Comparator() - Constructor for class org.apache.nutch.db.Link.MD5Comparator
 
Link.UrlComparator - class org.apache.nutch.db.Link.UrlComparator.
UrlComparator uses the standard ordering, in which the URL comes first.
Link.UrlComparator() - Constructor for class org.apache.nutch.db.Link.UrlComparator
 
LinkAnalysisEntry - class org.apache.nutch.linkdb.LinkAnalysisEntry.
An entry in the LinkAnalysisTool's output.
LinkAnalysisEntry() - Constructor for class org.apache.nutch.linkdb.LinkAnalysisEntry
 
LinkAnalysisTool - class org.apache.nutch.tools.LinkAnalysisTool.
LinkAnalysisTool performs link-analysis by using the DistributedAnalysisTool.
LinkAnalysisTool(NutchFileSystem, File) - Constructor for class org.apache.nutch.tools.LinkAnalysisTool
We need a DistributedAnalysisTool in order to get things done!
LocalFileSystem - class org.apache.nutch.fs.LocalFileSystem.
Implement the NutchFileSystem interface for the local disk.
LocalFileSystem() - Constructor for class org.apache.nutch.fs.LocalFileSystem
 
LocalNutchInputComponent - class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent.
A local input component that ignores the query passed from the controller and instead looks for data stored in the request context.
LocalNutchInputComponent() - Constructor for class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
 
LogFormatter - class org.apache.nutch.util.LogFormatter.
Prints just the date and the log message.
LogFormatter() - Constructor for class org.apache.nutch.util.LogFormatter
 
LongSumReducer - class org.apache.nutch.mapReduce.lib.LongSumReducer.
A Reducer that sums long values.
LongSumReducer() - Constructor for class org.apache.nutch.mapReduce.lib.LongSumReducer
 
LongWritable - class org.apache.nutch.io.LongWritable.
A WritableComparable for longs.
LongWritable() - Constructor for class org.apache.nutch.io.LongWritable
 
LongWritable(long) - Constructor for class org.apache.nutch.io.LongWritable
 
LongWritable.Comparator - class org.apache.nutch.io.LongWritable.Comparator.
A Comparator optimized for LongWritable.
LongWritable.Comparator() - Constructor for class org.apache.nutch.io.LongWritable.Comparator
 
LongWritable.DecreasingComparator - class org.apache.nutch.io.LongWritable.DecreasingComparator.
A decreasing Comparator optimized for LongWritable.
LongWritable.DecreasingComparator() - Constructor for class org.apache.nutch.io.LongWritable.DecreasingComparator
 
lastObsoleteCheck() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
lastUpdate() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
launch() - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
Start up the tasks
leftPad(String, int) - Static method in class org.apache.nutch.util.StringUtil
Returns a copy of s padded with leading spaces so that its length is length.
length() - Method in class org.apache.nutch.ndfs.NDFSFile
 
lengthNorm(String, int) - Method in class org.apache.nutch.indexer.NutchSimilarity
Normalize field by length.
lexStateNames - Static variable in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
lexStateNames - Static variable in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
line - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
linkParams - Static variable in class org.apache.nutch.parse.html.DOMContentUtils
 
links() - Method in class org.apache.nutch.db.DBSectionReader
Return all the links, by target URL
links() - Method in class org.apache.nutch.db.DistributedWebDBReader
Return all the links, by target URL
links() - Method in interface org.apache.nutch.db.IWebDBReader
Obtain an Enumeration of all Link objects, sorted by target URL.
links() - Method in class org.apache.nutch.db.WebDBReader
Return all the links, by target URL
listFiles(File) - Method in class org.apache.nutch.fs.LocalFileSystem
 
listFiles(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
listFiles(File) - Method in class org.apache.nutch.fs.NutchFileSystem
 
listFiles(File, FileFilter) - Method in class org.apache.nutch.fs.NutchFileSystem
 
listFiles(NutchFileSystem, JobConf) - Method in class org.apache.nutch.mapReduce.InputFormatBase
Subclasses may override to, e.g., select only files matching a regular expression.
listFiles(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
 
load(InputStream) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Loads an ngram profile from an InputStream.
load(String[]) - Method in interface org.apache.nutch.ontology.Ontology
 
load(String[]) - Method in class org.apache.nutch.ontology.OntologyImpl
 
locateMapOutputs(String, String[]) - Method in interface org.apache.nutch.mapReduce.InterTrackerProtocol
Called by a reduce task to find which map tasks are completed.
locateMapOutputs(String, String[]) - Method in class org.apache.nutch.mapReduce.JobTracker
A tracker wants to know the physical locations of completed, but not yet closed, tasks.
locateTasks(String[]) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
Return locations for all the indicated taskIds.
lock(File, boolean) - Method in class org.apache.nutch.fs.LocalFileSystem
Obtain a filesystem lock at File f.
lock(File, boolean) - Method in class org.apache.nutch.fs.NDFSFileSystem
Obtain a filesystem lock at File f.
lock(File, boolean) - Method in class org.apache.nutch.fs.NutchFileSystem
Obtain a lock on the given File
lock(UTF8, boolean) - Method in class org.apache.nutch.ndfs.NDFSClient
 
login(String, String) - Method in class org.apache.nutch.protocol.ftp.Client
Login to the FTP server using the provided username and password.
logout() - Method in class org.apache.nutch.protocol.ftp.Client
Logout of the FTP server by sending the QUIT command.
longestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
Returns the longest prefix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
Returns the longest suffix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
lookingAhead - Variable in class org.apache.nutch.analysis.NutchAnalysis
 
ls(String) - Method in class org.apache.nutch.fs.TestClient
Get a listing of all files in NDFS at the indicated name
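
The longestMatch(String) entries above say that PrefixStringMatcher returns the longest matched prefix of the input, or null if no match exists. The following is a minimal, self-contained sketch of that behaviour using a small character trie. It is not the Nutch source: the class name PrefixMatcherSketch and its inner Node type are hypothetical, and only the method names longestMatch and matches are taken from the index.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a trie-backed prefix matcher; not the Nutch implementation. */
final class PrefixMatcherSketch {

    /** One trie node: children keyed by character, plus an end-of-pattern flag. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    /** Builds the trie by inserting every prefix character by character. */
    PrefixMatcherSketch(String... prefixes) {
        for (String p : prefixes) {
            Node n = root;
            for (int i = 0; i < p.length(); i++) {
                n = n.children.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
    }

    /** Returns the longest stored prefix that input starts with, or null if there is none. */
    String longestMatch(String input) {
        Node n = root;
        // An empty stored prefix matches every input with length 0.
        int best = root.terminal ? 0 : -1;
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) {
                break; // no stored prefix continues with this character
            }
            if (n.terminal) {
                best = i + 1; // remember the longest complete prefix seen so far
            }
        }
        return best < 0 ? null : input.substring(0, best);
    }

    /** Returns true if any stored prefix matches the start of input. */
    boolean matches(String input) {
        return longestMatch(input) != null;
    }
}
```

With the prefixes "http://" and "http://www." stored, an input that starts with both yields the longer one, because the walk keeps going after the first complete prefix and only records the deepest terminal node it reaches.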

M

MAX_ANCHOR_LENGTH - Static variable in class org.apache.nutch.db.Link
 
MAX_OUTLINKS_PER_PAGE - Static variable in class org.apache.nutch.tools.UpdateDatabaseTool
 
MAX_SECTIONS - Static variable in class org.apache.nutch.db.DBKeyDivision
 
MD5Hash - class org.apache.nutch.io.MD5Hash.
A Writable for MD5 hash values.
MD5Hash() - Constructor for class org.apache.nutch.io.MD5Hash
Constructs an MD5Hash.
MD5Hash(String) - Constructor for class org.apache.nutch.io.MD5Hash
Constructs an MD5Hash from a hex string.
MD5Hash(byte[]) - Constructor for class org.apache.nutch.io.MD5Hash
Constructs an MD5Hash with a specified value.
MD5Hash.Comparator - class org.apache.nutch.io.MD5Hash.Comparator.
A WritableComparator optimized for MD5Hash keys.
MD5Hash.Comparator() - Constructor for class org.apache.nutch.io.MD5Hash.Comparator
 
MD5_KEYSPACE - Static variable in class org.apache.nutch.db.EditSectionGroupWriter
 
MD5_KEYSPACE_DIVIDERS - Static variable in class org.apache.nutch.db.DBKeyDivision
 
MD5_LEN - Static variable in class org.apache.nutch.io.MD5Hash
 
META_LANG_NAME - Static variable in class org.apache.nutch.analysis.lang.HTMLLanguageParser
The language meta data attribute name
MINUS - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
MOVED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource has moved permanently.
MRConstants - interface org.apache.nutch.mapReduce.MRConstants.
Some handy constants
MSWordParser - class org.apache.nutch.parse.msword.MSWordParser.
Parser for MIME type application/msword.
MSWordParser() - Constructor for class org.apache.nutch.parse.msword.MSWordParser
 
MapFile - class org.apache.nutch.io.MapFile.
A file-based map from keys to values.
MapFile() - Constructor for class org.apache.nutch.io.MapFile
 
MapFile.Reader - class org.apache.nutch.io.MapFile.Reader.
Provide access to an existing map.
MapFile.Reader(NutchFileSystem, String) - Constructor for class org.apache.nutch.io.MapFile.Reader
Construct a map reader for the named map.
MapFile.Reader(NutchFileSystem, String, WritableComparator) - Constructor for class org.apache.nutch.io.MapFile.Reader
Construct a map reader for the named map using the named comparator.
MapFile.Writer - class org.apache.nutch.io.MapFile.Writer.
Writes a new map.
MapFile.Writer(NutchFileSystem, String, Class, Class) - Constructor for class org.apache.nutch.io.MapFile.Writer
Create the named map for keys of the named class.
MapFile.Writer(NutchFileSystem, String, WritableComparator, Class) - Constructor for class org.apache.nutch.io.MapFile.Writer
Create the named map using the named key comparator.
MapOutputFile - class org.apache.nutch.mapReduce.MapOutputFile.
A local file to be transferred via the MapOutputProtocol.
MapOutputFile() - Constructor for class org.apache.nutch.mapReduce.MapOutputFile
Construct a file for transfer.
MapOutputFile(String, String, int) - Constructor for class org.apache.nutch.mapReduce.MapOutputFile
 
MapOutputLocation - class org.apache.nutch.mapReduce.MapOutputLocation.
The location of a map output file, as passed to a reduce task via the InterTrackerProtocol.
MapOutputLocation() - Constructor for class org.apache.nutch.mapReduce.MapOutputLocation
RPC constructor
MapOutputLocation(String, String, int) - Constructor for class org.apache.nutch.mapReduce.MapOutputLocation
Construct a location.
MapOutputProtocol - interface org.apache.nutch.mapReduce.MapOutputProtocol.
Protocol that a reduce task uses to retrieve output data from a map task's tracker.
MapTask - class org.apache.nutch.mapReduce.MapTask.
A Map task.
MapTask() - Constructor for class org.apache.nutch.mapReduce.MapTask
 
MapTask(String, String, FileSplit) - Constructor for class org.apache.nutch.mapReduce.MapTask
 
Mapper - interface org.apache.nutch.mapReduce.Mapper.
Maps input key/value pairs to a set of intermediate key/value pairs.
MimeType - class org.apache.nutch.util.mime.MimeType.
Defines a Mime Content Type.
MimeType(String) - Constructor for class org.apache.nutch.util.mime.MimeType
Creates a MimeType from a String.
MimeType(String, String) - Constructor for class org.apache.nutch.util.mime.MimeType
Creates a MimeType with the given primary type and sub type.
MimeTypeException - exception org.apache.nutch.util.mime.MimeTypeException.
A class to encapsulate MimeType related exceptions.
MimeTypeException() - Constructor for class org.apache.nutch.util.mime.MimeTypeException
Constructs a MimeTypeException with no specified detail message.
MimeTypeException(String) - Constructor for class org.apache.nutch.util.mime.MimeTypeException
Constructs a MimeTypeException with the specified detail message.
MimeTypes - class org.apache.nutch.util.mime.MimeTypes.
This class is a MimeType repository.
MoreIndexingFilter - class org.apache.nutch.indexer.more.MoreIndexingFilter.
Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be displayed by more.jsp (called by search.jsp).
MoreIndexingFilter() - Constructor for class org.apache.nutch.indexer.more.MoreIndexingFilter
 
MultiProperties - class org.apache.nutch.protocol.httpclient.MultiProperties.
An extension to Properties which allows multiple values for a single key.
MultiProperties() - Constructor for class org.apache.nutch.protocol.httpclient.MultiProperties
Creates an empty MultiProperties list with no default values.
MultiProperties(Properties) - Constructor for class org.apache.nutch.protocol.httpclient.MultiProperties
Creates an empty MultiProperties list with the specified defaults.
m_currentNode - Variable in class org.apache.nutch.parse.html.DOMBuilder
Current node
m_doc - Variable in class org.apache.nutch.parse.html.DOMBuilder
Root document
m_docFrag - Variable in class org.apache.nutch.parse.html.DOMBuilder
First node of document fragment or null if not a DocumentFragment
m_elemStack - Variable in class org.apache.nutch.parse.html.DOMBuilder
Vector of element nodes
m_inCData - Variable in class org.apache.nutch.parse.html.DOMBuilder
Flag indicating that we are processing a CData section
main(String[]) - Static method in class org.apache.nutch.analysis.CommonGrams
For debugging.
main(String[]) - Static method in class org.apache.nutch.analysis.NutchAnalysis
For debugging.
main(String[]) - Static method in class org.apache.nutch.analysis.NutchDocumentTokenizer
For debugging.
main(String[]) - Static method in class org.apache.nutch.analysis.lang.LanguageIdentifier
Main method used for command line process.
main(String[]) - Static method in class org.apache.nutch.analysis.lang.NGramProfile
Main method used for command line process.
main(String[]) - Static method in class org.apache.nutch.db.DistributedWebDBReader
The DistributedWebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class org.apache.nutch.db.DistributedWebDBWriter
The WebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class org.apache.nutch.db.WebDBInjector
Command-line access.
main(String[]) - Static method in class org.apache.nutch.db.WebDBReader
The WebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class org.apache.nutch.db.WebDBWriter
The WebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class org.apache.nutch.fetcher.Fetcher
Run the fetcher.
main(String[]) - Static method in class org.apache.nutch.fetcher.FetcherOutput
 
main(String[]) - Static method in class org.apache.nutch.fs.TestClient
main() has some simple utility methods
main(String[]) - Static method in class org.apache.nutch.indexer.DeleteDuplicates
Delete duplicates in the indexes in the named directory.
main(String[]) - Static method in class org.apache.nutch.indexer.HighFreqTerms
 
main(String[]) - Static method in class org.apache.nutch.indexer.IndexMerger
Create an index for the input files in the named directory.
main(String[]) - Static method in class org.apache.nutch.indexer.IndexOptimizer
 
main(String[]) - Static method in class org.apache.nutch.indexer.IndexSegment
Create an index for the input files in the named directory.
main(String[]) - Static method in class org.apache.nutch.io.MapFile
 
main(String[]) - Static method in class org.apache.nutch.mapReduce.JobClient
 
main(String[]) - Static method in class org.apache.nutch.mapReduce.JobConf
 
main(String[]) - Static method in class org.apache.nutch.mapReduce.JobTracker
Start the JobTracker process.
main(String[]) - Static method in class org.apache.nutch.mapReduce.TaskTracker.Child
 
main(String[]) - Static method in class org.apache.nutch.mapReduce.TaskTracker
Start the TaskTracker, point toward the indicated JobTracker
main(String[]) - Static method in class org.apache.nutch.mapReduce.demo.Grep
 
main(String[]) - Static method in class org.apache.nutch.ndfs.NDFS.DataNode
 
main(String[]) - Static method in class org.apache.nutch.ndfs.NDFS.NameNode
 
main(String[]) - Static method in class org.apache.nutch.net.PrefixURLFilter
 
main(String[]) - Static method in class org.apache.nutch.net.RegexURLFilter
 
main(String[]) - Static method in class org.apache.nutch.net.RegexUrlNormalizer
Spits out patterns and substitutions that are in the configuration file.
main(String[]) - Static method in class org.apache.nutch.net.URLFilterChecker
 
main(String[]) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
main(String[]) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
main(String[]) - Static method in class org.apache.nutch.pagedb.FetchListEntry
 
main(String[]) - Static method in class org.apache.nutch.parse.ParseData
 
main(String[]) - Static method in class org.apache.nutch.parse.ParseText
 
main(String[]) - Static method in class org.apache.nutch.parse.ParserChecker
 
main(String[]) - Static method in class org.apache.nutch.parse.html.HtmlParser
 
main(String[]) - Static method in class org.apache.nutch.parse.js.JSParseFilter
 
main(String[]) - Static method in class org.apache.nutch.protocol.Content
 
main(String[]) - Static method in class org.apache.nutch.protocol.file.File
For debugging.
main(String[]) - Static method in class org.apache.nutch.protocol.ftp.Ftp
For debugging.
main(String[]) - Static method in class org.apache.nutch.protocol.http.Http
For debugging.
main(String[]) - Static method in class org.apache.nutch.protocol.http.RobotRulesParser
Command-line main for testing.
main(String[]) - Static method in class org.apache.nutch.protocol.httpclient.Http
For debugging.
main(String[]) - Static method in class org.apache.nutch.protocol.httpclient.RobotRulesParser
Command-line main for testing.
main(String[]) - Static method in class org.apache.nutch.quality.dynamic.PageDescription
Test out sherlock parsing
main(String[]) - Static method in class org.apache.nutch.searcher.DistributedSearch.Client
 
main(String[]) - Static method in class org.apache.nutch.searcher.DistributedSearch.Server
Runs a search server.
main(String[]) - Static method in class org.apache.nutch.searcher.NutchBean
For debugging.
main(String[]) - Static method in class org.apache.nutch.searcher.Query
For debugging.
main(String[]) - Static method in class org.apache.nutch.searcher.Summarizer
Tests Summary-generation.
main(String[]) - Static method in class org.apache.nutch.segment.SegmentReader
Command-line wrapper.
main(String[]) - Static method in class org.apache.nutch.segment.SegmentSlicer
Command-line wrapper.
main(String[]) - Static method in class org.apache.nutch.segment.SegmentWriter
 
main(String[]) - Static method in class org.apache.nutch.tools.CrawlTool
 
main(String[]) - Static method in class org.apache.nutch.tools.DistributedAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class org.apache.nutch.tools.FetchListTool
Generate a fetchlist from the pagedb and linkdb
main(String[]) - Static method in class org.apache.nutch.tools.LinkAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class org.apache.nutch.tools.ParseSegment
main method
main(String[]) - Static method in class org.apache.nutch.tools.PruneIndexTool
 
main(String[]) - Static method in class org.apache.nutch.tools.SegmentMergeTool
 
main(String[]) - Static method in class org.apache.nutch.tools.UpdateDatabaseTool
Create the UpdateDatabaseTool, and pass in a WebDBWriter.
main(String[]) - Static method in class org.apache.nutch.tools.UpdateSegmentsFromDb
 
main(String[]) - Static method in class org.apache.nutch.tools.WebDBAdminTool
This tool performs a number of generic db management tasks.
main(String[]) - Static method in class org.apache.nutch.util.CommandRunner
 
main(String[]) - Static method in class org.apache.nutch.util.NutchConf
For debugging.
main(String[]) - Static method in class org.apache.nutch.util.PrefixStringMatcher
 
main(String[]) - Static method in class org.apache.nutch.util.ScoreStats
 
main(String[]) - Static method in class org.apache.nutch.util.StringUtil
 
main(String[]) - Static method in class org.apache.nutch.util.SuffixStringMatcher
 
main(String[]) - Static method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete duplicates in the indexes in the named directory.
majorCodes - Static variable in class org.apache.nutch.parse.ParseStatus
 
map(WritableComparable, Writable, OutputCollector) - Method in interface org.apache.nutch.mapReduce.Mapper
Maps a single input key/value pair into intermediate key/value pairs.
map(WritableComparable, Writable, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.IdentityMapper
The identity function.
map(WritableComparable, Writable, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.InverseMapper
The inverse function.
map(WritableComparable, Writable, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.RegexMapper
 
map(WritableComparable, Writable, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.TokenCountMapper
 
mapProgress() - Method in class org.apache.nutch.mapReduce.JobStatus
 
mapProgress() - Method in interface org.apache.nutch.mapReduce.RunningJob
Returns a float between 0.0 and 1.0, indicating progress on the map portion of the job.
matchChar(TrieStringMatcher.TrieNode, String, int) - Method in class org.apache.nutch.util.TrieStringMatcher
Returns the next TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
matchItem(HashMap) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
matches(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
Returns true if the given String is matched by a prefix in the trie
matches(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
Returns true if the given String is matched by a suffix in the trie
matches(String) - Method in class org.apache.nutch.util.TrieStringMatcher
Returns true if the given String is matched by a pattern in the trie
maxNextCharInd - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
md5Compare(Object) - Method in class org.apache.nutch.db.Link
Compare MD5s, then compare URLs.
merge() - Method in class org.apache.nutch.indexer.IndexMerger
Load all input segment indices, then add to the single output index
merge(String[], String) - Method in class org.apache.nutch.io.SequenceFile.Sorter
Merge the provided files.
mergeSectionComponents(File) - Method in class org.apache.nutch.db.EditSectionGroupReader
Merge all the components of the Section into a single file and return the location.
mkdir(String) - Method in class org.apache.nutch.fs.TestClient
Create the given dir
mkdirs(File) - Method in class org.apache.nutch.fs.LocalFileSystem
 
mkdirs(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
 
mkdirs(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Make the given file and all non-existent parents into directories.
mkdirs(UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Create the given directory and all its parent dirs.
mkdirs(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Create all the necessary directories
mkdirs(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
 
moreFromDupExcluded() - Method in class org.apache.nutch.searcher.Hit
True iff other, lower-scoring, hits with the same dedup value have been excluded from the list which contains this hit.
moveFromLocalFile(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
In the case of the local filesystem, we can just rename the file.
moveFromLocalFile(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Remove the src when finished.
moveFromLocalFile(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
The src file is on the local disk.
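
The map(WritableComparable, Writable, OutputCollector) entries above describe a mapper as turning one input key/value pair into intermediate key/value pairs, with IdentityMapper and InverseMapper as two library implementations. The sketch below illustrates that contract in plain Java. Everything in it is a hypothetical stand-in: Collector, PairMapper and MapperSketch are not Nutch types, generics replace WritableComparable and Writable, and InverseMapper is assumed to emit (value, key), which the index does not spell out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the mapper contract; not the Nutch mapReduce API. */
final class MapperSketch {

    /** Stand-in for an output collector: receives intermediate pairs. */
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    /** Stand-in for a mapper: one input pair in, zero or more intermediate pairs out. */
    interface PairMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, Collector<K2, V2> output);
    }

    /** Emits each input pair unchanged. */
    static <K, V> PairMapper<K, V, K, V> identity() {
        return (key, value, output) -> output.collect(key, value);
    }

    /** Emits each input pair with key and value swapped (assumed meaning of "inverse"). */
    static <K, V> PairMapper<K, V, V, K> inverse() {
        return (key, value, output) -> output.collect(value, key);
    }

    /** Feeds every input pair through the mapper and gathers what it collects. */
    static <K1, V1, K2, V2> List<Map.Entry<K2, V2>> run(
            PairMapper<K1, V1, K2, V2> mapper, List<Map.Entry<K1, V1>> input) {
        List<Map.Entry<K2, V2>> out = new ArrayList<>();
        for (Map.Entry<K1, V1> e : input) {
            mapper.map(e.getKey(), e.getValue(), (k, v) -> out.add(new SimpleEntry<>(k, v)));
        }
        return out;
    }
}
```

Passing the collector into map, rather than returning a value, is what lets a single call emit zero, one, or many intermediate pairs.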

N

NDFS - class org.apache.nutch.ndfs.NDFS.
The NDFS class holds the NDFS client and server.
NDFS.DataNode - class org.apache.nutch.ndfs.NDFS.DataNode.
DataNode controls just one critical table: block -> BLOCK_SIZE stream of bytes. This info is stored on disk (the NameNode is responsible for asking other machines to replicate the data).
NDFS.DataNode(String) - Constructor for class org.apache.nutch.ndfs.NDFS.DataNode
Create using configured defaults and dataDir.
NDFS.DataNode(String, File, InetSocketAddress) - Constructor for class org.apache.nutch.ndfs.NDFS.DataNode
Needs a directory to find its data (and config info)
NDFS.NameNode - class org.apache.nutch.ndfs.NDFS.NameNode.
NameNode controls two critical tables: 1) filename -> blocksequence, version; 2) block -> machinelist. The first table is stored on disk and is very precious.
NDFS.NameNode() - Constructor for class org.apache.nutch.ndfs.NDFS.NameNode
Create a NameNode at the default location
NDFS.NameNode(File, int) - Constructor for class org.apache.nutch.ndfs.NDFS.NameNode
Create a NameNode at the specified location
NDFSClient - class org.apache.nutch.ndfs.NDFSClient.
NDFSClient does what's necessary to connect to a Nutch Filesystem and perform basic file tasks.
NDFSClient(InetSocketAddress) - Constructor for class org.apache.nutch.ndfs.NDFSClient
 
NDFSFile - class org.apache.nutch.ndfs.NDFSFile.
NDFSFile is a traditional java File that's been annotated with some extra information.
NDFSFile(NDFSFileInfo) - Constructor for class org.apache.nutch.ndfs.NDFSFile
 
NDFSFileInfo - class org.apache.nutch.ndfs.NDFSFileInfo.
NDFSFileInfo tracks info about remote files, including name, size, etc.
NDFSFileInfo() - Constructor for class org.apache.nutch.ndfs.NDFSFileInfo
 
NDFSFileInfo(UTF8, long, long, boolean) - Constructor for class org.apache.nutch.ndfs.NDFSFileInfo
 
NDFSFileSystem - class org.apache.nutch.fs.NDFSFileSystem.
Implement the NutchFileSystem interface for the NDFS system.
NDFSFileSystem(InetSocketAddress) - Constructor for class org.apache.nutch.fs.NDFSFileSystem
Create the ShareSet automatically, and then go on to the regular constructor.
NDFS_FILE_SEPARATOR - Static variable in class org.apache.nutch.ndfs.NDFSFile
Separator used in NDFS filenames.
NEW_EXTERNAL_LINK_FACTOR - Static variable in class org.apache.nutch.tools.UpdateDatabaseTool
 
NEW_INTERNAL_LINK_FACTOR - Static variable in class org.apache.nutch.tools.UpdateDatabaseTool
 
NFSDataInputStream - class org.apache.nutch.fs.NFSDataInputStream.
Utility that wraps a NFSInputStream in a DataInputStream and buffers input through a BufferedInputStream.
NFSDataInputStream(NFSInputStream) - Constructor for class org.apache.nutch.fs.NFSDataInputStream
 
NFSDataInputStream(NFSInputStream, int) - Constructor for class org.apache.nutch.fs.NFSDataInputStream
 
NFSDataOutputStream - class org.apache.nutch.fs.NFSDataOutputStream.
Utility that wraps a NFSOutputStream in a DataOutputStream and buffers output through a BufferedOutputStream.
NFSDataOutputStream(NFSOutputStream) - Constructor for class org.apache.nutch.fs.NFSDataOutputStream
 
NFSDataOutputStream(NFSOutputStream, int) - Constructor for class org.apache.nutch.fs.NFSDataOutputStream
 
NFSInputStream - class org.apache.nutch.fs.NFSInputStream.
NFSInputStream is a generic old InputStream with a little bit of RAF-style seek ability.
NFSInputStream() - Constructor for class org.apache.nutch.fs.NFSInputStream
 
NFSOutputStream - class org.apache.nutch.fs.NFSOutputStream.
NFSOutputStream is an OutputStream that can track its position.
NFSOutputStream() - Constructor for class org.apache.nutch.fs.NFSOutputStream
 
NGramProfile - class org.apache.nutch.analysis.lang.NGramProfile.
This class represents an ngram profile.
NGramProfile(String, int, int) - Constructor for class org.apache.nutch.analysis.lang.NGramProfile
Construct a new ngram profile
NOTFETCHING - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Not fetching.
NOTFOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource was not found.
NOTMODIFIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Unchanged since the last fetch.
NOTPARSED - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing was not performed.
NUTCH_INPUT_HIT_DETAILS_ARRAY - Static variable in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
 
NUTCH_INPUT_SUMMARIES_ARRAY - Static variable in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
 
NullWritable - class org.apache.nutch.io.NullWritable.
Singleton Writable with no data.
NutchAnalysis - class org.apache.nutch.analysis.NutchAnalysis.
The JavaCC-generated Nutch lexical analyzer and query parser.
NutchAnalysis(CharStream) - Constructor for class org.apache.nutch.analysis.NutchAnalysis
 
NutchAnalysis(NutchAnalysisTokenManager) - Constructor for class org.apache.nutch.analysis.NutchAnalysis
 
NutchAnalysisConstants - interface org.apache.nutch.analysis.NutchAnalysisConstants.
 
NutchAnalysisTokenManager - class org.apache.nutch.analysis.NutchAnalysisTokenManager.
 
NutchAnalysisTokenManager(Reader) - Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
Constructs a token manager for the provided Reader.
NutchAnalysisTokenManager(CharStream) - Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
NutchAnalysisTokenManager(CharStream, int) - Constructor for class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
NutchBean - class org.apache.nutch.searcher.NutchBean.
One-stop shopping for search-related functionality.
NutchBean() - Constructor for class org.apache.nutch.searcher.NutchBean
Construct reading from connected directory.
NutchBean(File) - Constructor for class org.apache.nutch.searcher.NutchBean
Construct in a named directory.
NutchConf - class org.apache.nutch.util.NutchConf.
Provides access to Nutch configuration parameters.
NutchConf() - Constructor for class org.apache.nutch.util.NutchConf
A new configuration.
NutchDocument - class org.apache.nutch.clustering.carrot2.NutchDocument.
An adapter class that implements RawDocument for Carrot2.
NutchDocument(int, HitDetails, String) - Constructor for class org.apache.nutch.clustering.carrot2.NutchDocument
Creates a new document with the given id and summary, wrapping the given hit details.
NutchDocumentAnalyzer - class org.apache.nutch.analysis.NutchDocumentAnalyzer.
The analyzer used for Nutch documents.
NutchDocumentAnalyzer() - Constructor for class org.apache.nutch.analysis.NutchDocumentAnalyzer
 
NutchDocumentTokenizer - class org.apache.nutch.analysis.NutchDocumentTokenizer.
The tokenizer used for Nutch document text.
NutchDocumentTokenizer(Reader) - Constructor for class org.apache.nutch.analysis.NutchDocumentTokenizer
Construct a tokenizer for the text in a Reader.
NutchFileSystem - class org.apache.nutch.fs.NutchFileSystem.
NutchFileSystem is an interface for a fairly simple distributed file system.
NutchFileSystem() - Constructor for class org.apache.nutch.fs.NutchFileSystem
 
NutchSimilarity - class org.apache.nutch.indexer.NutchSimilarity.
Similarity implementation used by Nutch indexing and search.
NutchSimilarity() - Constructor for class org.apache.nutch.indexer.NutchSimilarity
 
newInstance(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
newKey() - Method in class org.apache.nutch.io.WritableComparator
Construct a new WritableComparable instance.
newToken(int) - Static method in class org.apache.nutch.quality.dynamic.Token
Returns a new Token object, by default.
next() - Method in class org.apache.nutch.analysis.NutchDocumentTokenizer
Returns the next token in the stream, or null at EOF.
next(Writable) - Method in class org.apache.nutch.io.ArrayFile.Reader
Read and return the next value in the file.
next(WritableComparable, Writable) - Method in class org.apache.nutch.io.MapFile.Reader
Read the next key/value pair in the map into key and val.
next(Writable) - Method in class org.apache.nutch.io.SequenceFile.Reader
Read the next key in the file into key, skipping its value.
next(Writable, Writable) - Method in class org.apache.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into key and val.
next(DataOutputBuffer) - Method in class org.apache.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into buffer.
next(WritableComparable) - Method in class org.apache.nutch.io.SetFile.Reader
Read the next key in a set into key.
next(Writable, Writable) - Method in interface org.apache.nutch.mapReduce.RecordReader
Reads the next key/value pair.
next - Variable in class org.apache.nutch.quality.dynamic.Token
A reference to the next regular (non-special) token from the input stream.
next(FetcherOutput, Content, ParseText, ParseData) - Method in class org.apache.nutch.segment.SegmentReader
Read values from all open readers.
nfs - Variable in class org.apache.nutch.segment.SegmentReader
 
nodeChar - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
nonOpInfix() - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse infix characters except plus and minus.
nonOpOrTerm() - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse anything but a term or an operator (plus or minus or quote).
nonTerm() - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse anything but a term or a quote.
nonTermOrEOF() - Method in class org.apache.nutch.analysis.NutchAnalysis
 
normalize() - Method in class org.apache.nutch.analysis.lang.NGramProfile
Normalize the profile (calculates the n-gram frequencies).
normalize(String) - Method in class org.apache.nutch.net.BasicUrlNormalizer
 
normalize(String) - Method in class org.apache.nutch.net.RegexUrlNormalizer
Normalizes any URLs by calling super.basicNormalize() and regexSub().
normalize(String) - Method in interface org.apache.nutch.net.UrlNormalizer
 
numEdits() - Method in class org.apache.nutch.db.EditSectionGroupReader
Return how many edits there are in this section.
numLinks() - Method in class org.apache.nutch.db.DistributedWebDBReader
Return the number of links in our db.
numLinks() - Method in interface org.apache.nutch.db.IWebDBReader
Simple count of all Link objects in db.
numLinks() - Method in class org.apache.nutch.db.WebDBReader
Return the number of links in our db.
numMachines() - Method in class org.apache.nutch.db.DistributedWebDBReader
How many sections (machines) there are in this distributed db.
numPages() - Method in class org.apache.nutch.db.DistributedWebDBReader
Return the number of pages we're dealing with.
numPages() - Method in interface org.apache.nutch.db.IWebDBReader
Simple count of all Page objects in db.
numPages() - Method in class org.apache.nutch.db.WebDBReader
Return the number of pages we're dealing with
numTerms - Static variable in class org.apache.nutch.indexer.HighFreqTerms
 

O

OBSOLETE_INTERVAL - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OPERATION_FAILED - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_BLOCKRECEIVED - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_BLOCKREPORT - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ABANDONBLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ABANDONBLOCK_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ADDBLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ADDBLOCK_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_COMPLETEFILE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_COMPLETEFILE_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODEREPORT - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODEREPORT_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODE_HINTS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODE_HINTS_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DELETE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_DELETE_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_EXISTS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_EXISTS_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ISDIR - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_ISDIR_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_LISTING - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_LISTING_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_MKDIRS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_MKDIRS_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_OBTAINLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_OBTAINLOCK_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_OPEN - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_OPEN_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RAWSTATS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RAWSTATS_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RELEASELOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RELEASELOCK_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RENAMETO - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RENAMETO_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RENEW_LEASE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_RENEW_LEASE_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_STARTFILE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_STARTFILE_ACK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_CLIENT_TRYAGAIN - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_ERROR - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_FAILURE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_HEARTBEAT - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_INVALIDATE_BLOCKS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_READSKIP_BLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_READ_BLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_TRANSFERBLOCKS - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_TRANSFERDATA - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OP_WRITE_BLOCK - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
OUTLINK_LIMIT - Static variable in class org.apache.nutch.tools.DistributedAnalysisTool
 
OnlineClusterer - interface org.apache.nutch.clustering.OnlineClusterer.
An extension point interface for online search results clustering algorithms.
OnlineClustererFactory - class org.apache.nutch.clustering.OnlineClustererFactory.
A factory for retrieving OnlineClusterer extensions.
Ontology - interface org.apache.nutch.ontology.Ontology.
 
OntologyFactory - class org.apache.nutch.ontology.OntologyFactory.
A factory for retrieving Ontology extensions.
OntologyImpl - class org.apache.nutch.ontology.OntologyImpl.
This class wraps a model built from a list of ontologies, using HP's Jena.
OntologyImpl() - Constructor for class org.apache.nutch.ontology.OntologyImpl
 
OpenSearchServlet - class org.apache.nutch.searcher.OpenSearchServlet.
Present search results using A9's OpenSearch extensions to RSS, plus a few Nutch-specific extensions.
OpenSearchServlet() - Constructor for class org.apache.nutch.searcher.OpenSearchServlet
 
Outlink - class org.apache.nutch.parse.Outlink.
 
Outlink() - Constructor for class org.apache.nutch.parse.Outlink
 
Outlink(String, String) - Constructor for class org.apache.nutch.parse.Outlink
 
OutlinkExtractor - class org.apache.nutch.parse.OutlinkExtractor.
Extractor to extract Outlinks / URLs from plain text using Regular Expressions.
OutlinkExtractor() - Constructor for class org.apache.nutch.parse.OutlinkExtractor
 
OutputCollector - interface org.apache.nutch.mapReduce.OutputCollector.
Passed to Mapper and Reducer implementations to collect output data.
OutputFormat - interface org.apache.nutch.mapReduce.OutputFormat.
An output data format.
OutputFormats - class org.apache.nutch.mapReduce.OutputFormats.
Repository of named OutputFormats.
OwlParser - class org.apache.nutch.ontology.OwlParser.
Implementation of a parser for W3C's OWL files.
OwlParser() - Constructor for class org.apache.nutch.ontology.OwlParser
 
obtainLock(UTF8, UTF8, boolean) - Method in class org.apache.nutch.ndfs.FSDirectory
 
obtainLock(UTF8, UTF8, boolean) - Method in class org.apache.nutch.ndfs.FSNamesystem
Get a lock (perhaps exclusive) on the given file
offerService() - Method in class org.apache.nutch.mapReduce.JobTracker
Run forever
offerService() - Method in class org.apache.nutch.ndfs.NDFS.DataNode
Main loop for the DataNode.
op - Variable in class org.apache.nutch.ndfs.FSParam
 
op - Variable in class org.apache.nutch.ndfs.FSResults
 
open(File) - Method in class org.apache.nutch.fs.LocalFileSystem
Open the file at f
open(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Open the file at f
open(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Opens an InputStream for the indicated File, whether local or via NDFS.
open(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
The client wants to open the given filename.
open(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
Create an input stream that obtains a nodelist from the namenode, and then reads from all the right places.
optimize() - Method in class org.apache.nutch.indexer.IndexOptimizer
 
optimizePhrase(Query.Phrase, String) - Static method in class org.apache.nutch.analysis.CommonGrams
Optimizes phrase queries to use n-grams when possible.
org.apache.nutch.analysis - package org.apache.nutch.analysis
Tokenizer for documents and query parser.
org.apache.nutch.analysis.lang - package org.apache.nutch.analysis.lang
Text document language identifier.
org.apache.nutch.clustering - package org.apache.nutch.clustering
 
org.apache.nutch.clustering.carrot2 - package org.apache.nutch.clustering.carrot2
 
org.apache.nutch.db - package org.apache.nutch.db
Web database: tracks page fetches and link structure.
org.apache.nutch.fetcher - package org.apache.nutch.fetcher
The Nutch robot.
org.apache.nutch.fs - package org.apache.nutch.fs
 
org.apache.nutch.html - package org.apache.nutch.html
 
org.apache.nutch.indexer - package org.apache.nutch.indexer
Maintain Lucene full-text indexes.
org.apache.nutch.indexer.basic - package org.apache.nutch.indexer.basic
A basic indexing plugin.
org.apache.nutch.indexer.more - package org.apache.nutch.indexer.more
A more indexing plugin.
org.apache.nutch.io - package org.apache.nutch.io
Generic i/o code for use when reading and writing data to the network, to databases, and to files.
org.apache.nutch.ipc - package org.apache.nutch.ipc
Client/Server code used by distributed search.
org.apache.nutch.linkdb - package org.apache.nutch.linkdb
 
org.apache.nutch.mapReduce - package org.apache.nutch.mapReduce
A system for scalable, fault-tolerant, distributed computation over large data collections.
org.apache.nutch.mapReduce.demo - package org.apache.nutch.mapReduce.demo
 
org.apache.nutch.mapReduce.lib - package org.apache.nutch.mapReduce.lib
Library of generally useful mappers, reducers, and partitioners.
org.apache.nutch.ndfs - package org.apache.nutch.ndfs
 
org.apache.nutch.net - package org.apache.nutch.net
A url filter plugin.
org.apache.nutch.net.protocols - package org.apache.nutch.net.protocols
 
org.apache.nutch.ontology - package org.apache.nutch.ontology
 
org.apache.nutch.pagedb - package org.apache.nutch.pagedb
 
org.apache.nutch.parse - package org.apache.nutch.parse
 
org.apache.nutch.parse.html - package org.apache.nutch.parse.html
An HTML document parsing plugin.
org.apache.nutch.parse.js - package org.apache.nutch.parse.js
 
org.apache.nutch.parse.msword - package org.apache.nutch.parse.msword
A Word document parsing plugin.
org.apache.nutch.parse.msword.chp - package org.apache.nutch.parse.msword.chp
 
org.apache.nutch.parse.pdf - package org.apache.nutch.parse.pdf
A pdf parsing plugin.
org.apache.nutch.parse.text - package org.apache.nutch.parse.text
A plain text parsing plugin.
org.apache.nutch.plugin - package org.apache.nutch.plugin
 
org.apache.nutch.protocol - package org.apache.nutch.protocol
 
org.apache.nutch.protocol.file - package org.apache.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp - package org.apache.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.http - package org.apache.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.httpclient - package org.apache.nutch.protocol.httpclient
Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.quality.dynamic - package org.apache.nutch.quality.dynamic
 
org.apache.nutch.searcher - package org.apache.nutch.searcher
Search API
org.apache.nutch.searcher.more - package org.apache.nutch.searcher.more
A more query plugin.
org.apache.nutch.segment - package org.apache.nutch.segment
 
org.apache.nutch.servlet - package org.apache.nutch.servlet
 
org.apache.nutch.tools - package org.apache.nutch.tools
 
org.apache.nutch.util - package org.apache.nutch.util
 
org.apache.nutch.util.mime - package org.apache.nutch.util.mime
 
org.creativecommons.nutch - package org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.

P

PLUS - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
PROTO_NOT_FOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
This protocol was not found.
Page - class org.apache.nutch.db.Page.
A row in the Page Database.
Page() - Constructor for class org.apache.nutch.db.Page
Construct a page ready to be read by Page.readFields(DataInput).
Page(String, MD5Hash) - Constructor for class org.apache.nutch.db.Page
Construct a new, default page, due to be fetched.
Page(String, float) - Constructor for class org.apache.nutch.db.Page
 
Page(String, float, long) - Constructor for class org.apache.nutch.db.Page
 
Page(String, float, float, long) - Constructor for class org.apache.nutch.db.Page
 
Page.Comparator - class org.apache.nutch.db.Page.Comparator.
Compares pages by MD5, then by URL.
Page.Comparator() - Constructor for class org.apache.nutch.db.Page.Comparator
 
Page.UrlComparator - class org.apache.nutch.db.Page.UrlComparator.
Compares pages by URL only.
Page.UrlComparator() - Constructor for class org.apache.nutch.db.Page.UrlComparator
 
PageDescription - class org.apache.nutch.quality.dynamic.PageDescription.
PageDescription gives the URL and the textual description for a target page.
PageDescription(InputStream) - Constructor for class org.apache.nutch.quality.dynamic.PageDescription
 
PageDescription(Reader) - Constructor for class org.apache.nutch.quality.dynamic.PageDescription
 
PageDescription(PageDescriptionTokenManager) - Constructor for class org.apache.nutch.quality.dynamic.PageDescription
 
PageDescriptionConstants - interface org.apache.nutch.quality.dynamic.PageDescriptionConstants.
 
PageDescriptionTokenManager - class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager.
 
PageDescriptionTokenManager(SimpleCharStream) - Constructor for class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
PageDescriptionTokenManager(SimpleCharStream, int) - Constructor for class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
Parse - interface org.apache.nutch.parse.Parse.
The result of parsing a page's raw content.
ParseData - class org.apache.nutch.parse.ParseData.
Data extracted from a page's content.
ParseData() - Constructor for class org.apache.nutch.parse.ParseData
 
ParseData(ParseStatus, String, Outlink[], Properties) - Constructor for class org.apache.nutch.parse.ParseData
 
ParseException - exception org.apache.nutch.parse.ParseException.
 
ParseException() - Constructor for class org.apache.nutch.parse.ParseException
 
ParseException(String) - Constructor for class org.apache.nutch.parse.ParseException
 
ParseException(String, Throwable) - Constructor for class org.apache.nutch.parse.ParseException
 
ParseException(Throwable) - Constructor for class org.apache.nutch.parse.ParseException
 
ParseException - exception org.apache.nutch.quality.dynamic.ParseException.
This exception is thrown when parse errors are encountered.
ParseException(Token, int[][], String[]) - Constructor for class org.apache.nutch.quality.dynamic.ParseException
This constructor is used by the method "generateParseException" in the generated parser.
ParseException() - Constructor for class org.apache.nutch.quality.dynamic.ParseException
The following constructors are for use by you for whatever purpose you can think of.
ParseException(String) - Constructor for class org.apache.nutch.quality.dynamic.ParseException
 
ParseImpl - class org.apache.nutch.parse.ParseImpl.
The result of parsing a page's raw content.
ParseImpl(String, ParseData) - Constructor for class org.apache.nutch.parse.ParseImpl
 
ParseSegment - class org.apache.nutch.tools.ParseSegment.
Parse contents in one segment.
ParseSegment(NutchFileSystem, String, boolean) - Constructor for class org.apache.nutch.tools.ParseSegment
ParseSegment constructor
ParseStatus - class org.apache.nutch.parse.ParseStatus.
 
ParseStatus() - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int) - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int) - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseStatus(int, int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
Simplified constructor for passing just a text message.
ParseStatus(int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
Simplified constructor for passing just a text message.
ParseStatus(Throwable) - Constructor for class org.apache.nutch.parse.ParseStatus
 
ParseText - class org.apache.nutch.parse.ParseText.
 
ParseText() - Constructor for class org.apache.nutch.parse.ParseText
 
ParseText(String) - Constructor for class org.apache.nutch.parse.ParseText
 
Parser - interface org.apache.nutch.ontology.Parser.
interface for the parser
Parser - interface org.apache.nutch.parse.Parser.
A parser for content generated by a Protocol implementation.
ParserChecker - class org.apache.nutch.parse.ParserChecker.
Parser checker, useful for testing parser.
ParserChecker() - Constructor for class org.apache.nutch.parse.ParserChecker
 
ParserFactory - class org.apache.nutch.parse.ParserFactory.
Creates and caches Parser plugins.
ParserNotFound - exception org.apache.nutch.parse.ParserNotFound.
 
ParserNotFound(String, String) - Constructor for class org.apache.nutch.parse.ParserNotFound
 
ParserNotFound(String, String, String) - Constructor for class org.apache.nutch.parse.ParserNotFound
 
Partitioner - interface org.apache.nutch.mapReduce.Partitioner.
Partitions the key space.
PasswordProtectedException - exception org.apache.nutch.parse.msword.PasswordProtectedException.
 
PasswordProtectedException(String) - Constructor for class org.apache.nutch.parse.msword.PasswordProtectedException
 
PdfParser - class org.apache.nutch.parse.pdf.PdfParser.
parser for mime type application/pdf.
PdfParser() - Constructor for class org.apache.nutch.parse.pdf.PdfParser
 
Plugin - class org.apache.nutch.plugin.Plugin.
A nutch-plugin is a container for a set of custom logic that provides extensions to the Nutch core functionality or to another plugin that provides an API for extending.
Plugin(PluginDescriptor) - Constructor for class org.apache.nutch.plugin.Plugin
Constructor
PluginClassLoader - class org.apache.nutch.plugin.PluginClassLoader.
The PluginClassLoader contains only classes of the runtime libraries set up in the plugin manifest file and the exported libraries of required plugins.
PluginClassLoader(URL[], ClassLoader) - Constructor for class org.apache.nutch.plugin.PluginClassLoader
Constructor
PluginDescriptor - class org.apache.nutch.plugin.PluginDescriptor.
The PluginDescriptor provides access to all meta information of a nutch-plugin, as well as to the internationalizable resources and the plugin's own classloader.
PluginDescriptor(String, String, String, String, String, String) - Constructor for class org.apache.nutch.plugin.PluginDescriptor
Constructor
PluginManifestParser - class org.apache.nutch.plugin.PluginManifestParser.
The PluginManifestParser just parses the manifest files in all plugin directories.
PluginManifestParser() - Constructor for class org.apache.nutch.plugin.PluginManifestParser
 
PluginRepository - class org.apache.nutch.plugin.PluginRepository.
The plugin repository is a registry of all plugins.
PluginRuntimeException - exception org.apache.nutch.plugin.PluginRuntimeException.
PluginRuntimeException will be thrown when an exception in the plugin management occurs.
PluginRuntimeException(Throwable) - Constructor for class org.apache.nutch.plugin.PluginRuntimeException
 
PluginRuntimeException(String) - Constructor for class org.apache.nutch.plugin.PluginRuntimeException
 
PrefixStringMatcher - class org.apache.nutch.util.PrefixStringMatcher.
A class for efficiently matching Strings against a set of prefixes.
PrefixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied array.
PrefixStringMatcher(Collection) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
PrefixURLFilter - class org.apache.nutch.net.PrefixURLFilter.
Filters URLs based on a file of URL prefixes.
PrefixURLFilter() - Constructor for class org.apache.nutch.net.PrefixURLFilter
 
PrefixURLFilter(String) - Constructor for class org.apache.nutch.net.PrefixURLFilter
 
PrintCommandListener - class org.apache.nutch.protocol.ftp.PrintCommandListener.
This is a support class for logging all ftp command/reply traffic.
PrintCommandListener(Logger) - Constructor for class org.apache.nutch.protocol.ftp.PrintCommandListener
 
Protocol - interface org.apache.nutch.protocol.Protocol.
A retriever of url content.
ProtocolException - exception org.apache.nutch.net.protocols.ProtocolException.
Base exception for all protocol handlers
ProtocolException() - Constructor for class org.apache.nutch.net.protocols.ProtocolException
 
ProtocolException(String) - Constructor for class org.apache.nutch.net.protocols.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class org.apache.nutch.net.protocols.ProtocolException
 
ProtocolException(Throwable) - Constructor for class org.apache.nutch.net.protocols.ProtocolException
 
ProtocolException - exception org.apache.nutch.protocol.ProtocolException.
 
ProtocolException() - Constructor for class org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String) - Constructor for class org.apache.nutch.protocol.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class org.apache.nutch.protocol.ProtocolException
 
ProtocolException(Throwable) - Constructor for class org.apache.nutch.protocol.ProtocolException
 
ProtocolFactory - class org.apache.nutch.protocol.ProtocolFactory.
Creates and caches Protocol plugins.
ProtocolNotFound - exception org.apache.nutch.protocol.ProtocolNotFound.
 
ProtocolNotFound(String) - Constructor for class org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolNotFound(String, String) - Constructor for class org.apache.nutch.protocol.ProtocolNotFound
 
ProtocolOutput - class org.apache.nutch.protocol.ProtocolOutput.
Simple aggregate to pass from protocol plugins both content and protocol status.
ProtocolOutput(Content, ProtocolStatus) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
ProtocolOutput(Content) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
 
ProtocolStatus - class org.apache.nutch.protocol.ProtocolStatus.
 
ProtocolStatus() - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, String[]) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, String[], long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, Object) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(int, Object, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
ProtocolStatus(Throwable) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
 
PruneIndexTool - class org.apache.nutch.tools.PruneIndexTool.
This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool(File[], Query[], PruneIndexTool.PruneChecker[], boolean, boolean) - Constructor for class org.apache.nutch.tools.PruneIndexTool
Create an instance of the tool, and open all input indexes.
PruneIndexTool.PrintFieldsChecker - class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker.
This checker's main function is just to print out selected field values from each document, just before they are deleted.
PruneIndexTool.PrintFieldsChecker(PrintStream, String[]) - Constructor for class org.apache.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
PruneIndexTool.PruneChecker - interface org.apache.nutch.tools.PruneIndexTool.PruneChecker.
This interface can be used to implement additional checking on matching documents.
PruneIndexTool.StoreUrlsChecker - class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker.
This checker's main function is just to store the URLs of each document to be deleted in a text file.
PruneIndexTool.StoreUrlsChecker(File, boolean) - Constructor for class org.apache.nutch.tools.PruneIndexTool.StoreUrlsChecker
Store the list in a file
pageExists(MD5Hash) - Method in class org.apache.nutch.db.DBSectionReader
Test whether a certain piece of content is in the db, but don't bother returning it.
pageExists(MD5Hash) - Method in class org.apache.nutch.db.DistributedWebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pageExists(MD5Hash) - Method in interface org.apache.nutch.db.IWebDBReader
Returns whether a Page with the given MD5 checksum is in the db.
pageExists(MD5Hash) - Method in class org.apache.nutch.db.WebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pages() - Method in class org.apache.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by URL
pages() - Method in class org.apache.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by URL.
pages() - Method in interface org.apache.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by URL
pages() - Method in class org.apache.nutch.db.WebDBReader
Iterate through all the Pages, sorted by URL
pagesByMD5() - Method in class org.apache.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by MD5
pagesByMD5() - Method in class org.apache.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by MD5.
pagesByMD5() - Method in interface org.apache.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by MD5.
pagesByMD5() - Method in class org.apache.nutch.db.WebDBReader
Iterate through all the Pages, sorted by MD5
param() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
parse() - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse a query.
parse(OntModel) - Method in class org.apache.nutch.ontology.OwlParser
parse owl ontology files using jena
parse(OntModel) - Method in interface org.apache.nutch.ontology.Parser
 
parse() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
parse(String) - Static method in class org.apache.nutch.searcher.Query
Parse a query from a string.
parse() - Method in class org.apache.nutch.tools.ParseSegment
Parse contents by multiple threads and save as unsorted ParserOutput
parseArgs(String[], int) - Static method in class org.apache.nutch.fs.NutchFileSystem
Parse the cmd-line args, starting at i.
parseCharacterEncoding(String) - Static method in class org.apache.nutch.util.StringUtil
Parse the character encoding from the specified content type header.
parseClass(OntClass, List, int) - Method in class org.apache.nutch.ontology.OwlParser
 
parseDataReader - Variable in class org.apache.nutch.segment.SegmentReader
 
parseDataWriter - Variable in class org.apache.nutch.segment.SegmentWriter
 
parsePluginFolder() - Static method in class org.apache.nutch.plugin.PluginManifestParser
Returns a list with plugin descriptors.
parseQueries(InputStream) - Static method in class org.apache.nutch.tools.PruneIndexTool
Read a list of Lucene queries from the stream (UTF-8 encoding is assumed).
parseQuery(String) - Static method in class org.apache.nutch.analysis.NutchAnalysis
Construct a query parser for the text in a reader.
parseTextReader - Variable in class org.apache.nutch.segment.SegmentReader
 
parseTextWriter - Variable in class org.apache.nutch.segment.SegmentWriter
 
peekMin() - Method in class org.apache.nutch.util.FibonacciHeap
Returns the same Object that FibonacciHeap.popMin() would, without removing it.
pendingTransfers(DatanodeInfo, int) - Method in class org.apache.nutch.ndfs.FSNamesystem
Return with a list of Block/DataNodeInfo sets, indicating where various Blocks should be copied, ASAP.
phrase(String) - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse an explicitly quoted phrase query.
pollForClosedTask(String) - Method in interface org.apache.nutch.mapReduce.InterTrackerProtocol
Called to find which tasks that have been run by this tracker are now closed, i.e., their job is complete.
pollForClosedTask(String) - Method in class org.apache.nutch.mapReduce.JobTracker
A tracker wants to know if any of its Tasks have been closed (because the job completed, whether successfully or not)
pollForNewTask(String) - Method in interface org.apache.nutch.mapReduce.InterTrackerProtocol
Called to get new tasks from the job tracker for this tracker.
pollForNewTask(String) - Method in class org.apache.nutch.mapReduce.JobTracker
A tracker wants to know if there's a Task to run
popMin() - Method in class org.apache.nutch.util.FibonacciHeap
Returns the object which has the lowest priority in the heap.
prevCharIsCR - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
prevCharIsLF - Variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
printStatus() - Method in class org.apache.nutch.db.WebDBInjector
Utility to present performance stats
printStatusBar(int, int) - Method in class org.apache.nutch.db.WebDBInjector
Utility to present small status bar
processReport(Block[], UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
The given node is reporting all its blocks.
processedRecords - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
processingInstruction(String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a processing instruction.
progress(String, FloatWritable) - Method in class org.apache.nutch.mapReduce.TaskTracker
Called periodically to report Task progress, from 0.0 to 1.0.
progress(String, FloatWritable) - Method in interface org.apache.nutch.mapReduce.TaskUmbilicalProtocol
Report child's progress to parent.
protocolCommandSent(ProtocolCommandEvent) - Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
protocolReplyReceived(ProtocolCommandEvent) - Method in class org.apache.nutch.protocol.ftp.PrintCommandListener
 
put(Object, Object) - Method in class org.apache.nutch.protocol.httpclient.MultiProperties
Adds the given value using the key
putAll(Map) - Method in class org.apache.nutch.protocol.httpclient.MultiProperties
Add all entries of the given map to this MultiProperties list.

Q

QUOTE - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
QUOTED_VALUE - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
Query - class org.apache.nutch.searcher.Query.
A Nutch query.
Query() - Constructor for class org.apache.nutch.searcher.Query
 
Query.Clause - class org.apache.nutch.searcher.Query.Clause.
A query clause.
Query.Clause(Query.Term, String, boolean, boolean) - Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Term, boolean, boolean) - Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, String, boolean, boolean) - Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, boolean, boolean) - Constructor for class org.apache.nutch.searcher.Query.Clause
 
Query.Phrase - class org.apache.nutch.searcher.Query.Phrase.
A phrase query clause.
Query.Phrase(Query.Term[]) - Constructor for class org.apache.nutch.searcher.Query.Phrase
 
Query.Phrase(String[]) - Constructor for class org.apache.nutch.searcher.Query.Phrase
 
Query.Term - class org.apache.nutch.searcher.Query.Term.
A single-term query clause.
Query.Term(String) - Constructor for class org.apache.nutch.searcher.Query.Term
 
QueryException - exception org.apache.nutch.searcher.QueryException.
 
QueryException(String) - Constructor for class org.apache.nutch.searcher.QueryException
 
QueryFilter - interface org.apache.nutch.searcher.QueryFilter.
Extension point for query translation.
QueryFilters - class org.apache.nutch.searcher.QueryFilters.
Creates and caches QueryFilter implementing plugins.

R

REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Too many redirects.
RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Temporary failure.
ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Access denied by robots.txt rules.
RPC - class org.apache.nutch.ipc.RPC.
A simple RPC mechanism.
RUNLENGTH_ENCODING - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
RUNNING - Static variable in class org.apache.nutch.mapReduce.JobStatus
 
RUNNING - Static variable in class org.apache.nutch.mapReduce.TaskStatus
 
RawFieldQueryFilter - class org.apache.nutch.searcher.RawFieldQueryFilter.
Translate raw query fields to search the same-named field, as indexed by an IndexingFilter.
RawFieldQueryFilter(String) - Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, float) - Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, boolean) - Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
RawFieldQueryFilter(String, boolean, float) - Constructor for class org.apache.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
ReInit(CharStream) - Method in class org.apache.nutch.analysis.NutchAnalysis
 
ReInit(NutchAnalysisTokenManager) - Method in class org.apache.nutch.analysis.NutchAnalysis
 
ReInit(CharStream) - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(CharStream, int) - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(InputStream) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
ReInit(Reader) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
ReInit(PageDescriptionTokenManager) - Method in class org.apache.nutch.quality.dynamic.PageDescription
 
ReInit(SimpleCharStream) - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(SimpleCharStream, int) - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(Reader, int, int, int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader, int, int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int, int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
RecordReader - interface org.apache.nutch.mapReduce.RecordReader.
Reads key/value pairs from an input file FileSplit.
RecordWriter - interface org.apache.nutch.mapReduce.RecordWriter.
Writes key/value pairs to an output file.
ReduceTask - class org.apache.nutch.mapReduce.ReduceTask.
A Reduce task.
ReduceTask() - Constructor for class org.apache.nutch.mapReduce.ReduceTask
 
ReduceTask(String, String, String[], int) - Constructor for class org.apache.nutch.mapReduce.ReduceTask
 
Reducer - interface org.apache.nutch.mapReduce.Reducer.
Reduces a set of intermediate values which share a key to a smaller set of values.
RegexMapper - class org.apache.nutch.mapReduce.lib.RegexMapper.
A Mapper that extracts text matching a regular expression.
RegexMapper() - Constructor for class org.apache.nutch.mapReduce.lib.RegexMapper
 
RegexURLFilter - class org.apache.nutch.net.RegexURLFilter.
Filters URLs based on a file of regular expressions.
RegexURLFilter() - Constructor for class org.apache.nutch.net.RegexURLFilter
 
RegexURLFilter(String) - Constructor for class org.apache.nutch.net.RegexURLFilter
 
RegexUrlNormalizer - class org.apache.nutch.net.RegexUrlNormalizer.
Allows users to do regex substitutions on all/any URLs that are encountered, which is useful for stripping session IDs from URLs.
RegexUrlNormalizer() - Constructor for class org.apache.nutch.net.RegexUrlNormalizer
Default constructor which gets the file name from either nutch-site.xml or nutch-default.xml and reads that configuration file.
RegexUrlNormalizer(String) - Constructor for class org.apache.nutch.net.RegexUrlNormalizer
Constructor which can be passed the file name, so it doesn't look in the configuration files for it.
ResourceGone - exception org.apache.nutch.protocol.ResourceGone.
Thrown when a resource no longer exists.
ResourceGone(URL, String) - Constructor for class org.apache.nutch.protocol.ResourceGone
 
ResourceMoved - exception org.apache.nutch.protocol.ResourceMoved.
Thrown when a resource has moved to a new URL.
ResourceMoved(URL, URL, String) - Constructor for class org.apache.nutch.protocol.ResourceMoved
 
Response - interface org.apache.nutch.net.protocols.Response.
A response interface.
RetryLater - exception org.apache.nutch.protocol.RetryLater.
Thrown when a resource should be retried later.
RetryLater(URL, String) - Constructor for class org.apache.nutch.protocol.RetryLater
 
RobotRulesParser - class org.apache.nutch.protocol.http.RobotRulesParser.
This class handles the parsing of robots.txt files.
RobotRulesParser() - Constructor for class org.apache.nutch.protocol.http.RobotRulesParser
 
RobotRulesParser(String[]) - Constructor for class org.apache.nutch.protocol.http.RobotRulesParser
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
RobotRulesParser - class org.apache.nutch.protocol.httpclient.RobotRulesParser.
This class handles the parsing of robots.txt files.
RobotRulesParser() - Constructor for class org.apache.nutch.protocol.httpclient.RobotRulesParser
 
RobotRulesParser(String[]) - Constructor for class org.apache.nutch.protocol.httpclient.RobotRulesParser
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
RobotRulesParser.RobotRuleSet - class org.apache.nutch.protocol.http.RobotRulesParser.RobotRuleSet.
This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
RobotRulesParser.RobotRuleSet - class org.apache.nutch.protocol.httpclient.RobotRulesParser.RobotRuleSet.
This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
RunningJob - interface org.apache.nutch.mapReduce.RunningJob.
Includes details on a running MapReduce job.
rdfidToLabel(String) - Method in class org.apache.nutch.ontology.OwlParser
 
read(DataInput) - Static method in class org.apache.nutch.db.Link
 
read(DataInput) - Static method in class org.apache.nutch.db.Page
 
read(DataInput) - Static method in class org.apache.nutch.fetcher.FetcherOutput
 
read(DataInput) - Static method in class org.apache.nutch.io.MD5Hash
Constructs, reads and returns an instance.
read(DataInput) - Static method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
read(DataInput) - Static method in class org.apache.nutch.pagedb.FetchListEntry
 
read(DataInput) - Static method in class org.apache.nutch.parse.Outlink
 
read(DataInput) - Static method in class org.apache.nutch.parse.ParseData
 
read(DataInput) - Static method in class org.apache.nutch.parse.ParseStatus
 
read(DataInput) - Static method in class org.apache.nutch.parse.ParseText
 
read(DataInput) - Static method in class org.apache.nutch.protocol.Content
 
read(DataInput) - Static method in class org.apache.nutch.protocol.ProtocolStatus
 
read(DataInput) - Static method in class org.apache.nutch.searcher.HitDetails
Constructs, reads and returns an instance.
read(DataInput) - Static method in class org.apache.nutch.searcher.Query.Clause
 
read(DataInput) - Static method in class org.apache.nutch.searcher.Query.Phrase
 
read(DataInput) - Static method in class org.apache.nutch.searcher.Query.Term
 
read(DataInput) - Static method in class org.apache.nutch.searcher.Query
 
readChar() - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
readCompressedByteArray(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
readCompressedString(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
readCompressedStringArray(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
readFields(DataInput) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class org.apache.nutch.db.Link
Read in fields from a bytestream
readFields(DataInput) - Method in class org.apache.nutch.db.Page
 
readFields(DataInput) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class org.apache.nutch.fetcher.FetcherOutput
 
readFields(DataInput) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc
 
readFields(DataInput) - Method in class org.apache.nutch.io.ArrayWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.BooleanWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.BytesWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.FloatWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.IntWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.LongWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.MD5Hash
 
readFields(DataInput) - Method in class org.apache.nutch.io.NullWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.TwoDArrayWritable
 
readFields(DataInput) - Method in class org.apache.nutch.io.UTF8
 
readFields(DataInput) - Method in class org.apache.nutch.io.VersionedWritable
 
readFields(DataInput) - Method in interface org.apache.nutch.io.Writable
Reads the fields of this object from in.
readFields(DataInput) - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.FileSplit
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.JobProfile
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.JobStatus
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.MapOutputFile
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.MapOutputLocation
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.MapTask
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.ReduceTask
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.Task
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.TaskStatus
 
readFields(DataInput) - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
readFields(DataInput) - Method in class org.apache.nutch.ndfs.Block
 
readFields(DataInput) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
readFields(DataInput) - Method in class org.apache.nutch.ndfs.FSParam
Deserialize the opcode and the args
readFields(DataInput) - Method in class org.apache.nutch.ndfs.FSResults
 
readFields(DataInput) - Method in class org.apache.nutch.ndfs.HeartbeatData
 
readFields(DataInput) - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
readFields(DataInput) - Method in class org.apache.nutch.pagedb.FetchListEntry
 
readFields(DataInput) - Method in class org.apache.nutch.parse.Outlink
 
readFields(DataInput) - Method in class org.apache.nutch.parse.ParseData
 
readFields(DataInput) - Method in class org.apache.nutch.parse.ParseStatus
 
readFields(DataInput) - Method in class org.apache.nutch.parse.ParseText
 
readFields(DataInput) - Method in class org.apache.nutch.protocol.Content
 
readFields(DataInput) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
readFields(DataInput) - Method in class org.apache.nutch.searcher.Hit
 
readFields(DataInput) - Method in class org.apache.nutch.searcher.HitDetails
 
readFields(DataInput) - Method in class org.apache.nutch.searcher.Hits
 
readFields(DataInput) - Method in class org.apache.nutch.searcher.Query
 
readFields(DataInput) - Method in class org.apache.nutch.tools.FetchListTool.SortableScore
 
readFields(DataInput) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage
 
readFields(DataInput) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.Update
 
readFloat(byte[], int) - Static method in class org.apache.nutch.io.WritableComparator
Parse a float from a byte array.
readInt(byte[], int) - Static method in class org.apache.nutch.io.WritableComparator
Parse an integer from a byte array.
readLong(byte[], int) - Static method in class org.apache.nutch.io.WritableComparator
Parse a long from a byte array.
readString(DataInput) - Static method in class org.apache.nutch.io.UTF8
Read a UTF-8 encoded string.
readString(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
readStringArray(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
readUnsignedShort(byte[], int) - Static method in class org.apache.nutch.io.WritableComparator
Parse an unsigned short from a byte array.
recentlyInvalidBlocks(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Return with a list of Blocks that should be invalidated at the given node.
recursiveCopy(NutchFileSystem, File, File) - Static method in class org.apache.nutch.fs.FileUtil
Copy a file and/or directory and all its contents (whether data or other files/dirs)
reduce(WritableComparable, Iterator, OutputCollector) - Method in interface org.apache.nutch.mapReduce.Reducer
Combines values for a given key.
reduce(WritableComparable, Iterator, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.IdentityReducer
Writes all keys and values directly to output.
reduce(WritableComparable, Iterator, OutputCollector) - Method in class org.apache.nutch.mapReduce.lib.LongSumReducer
 
reduceProgress() - Method in class org.apache.nutch.mapReduce.JobStatus
 
reduceProgress() - Method in interface org.apache.nutch.mapReduce.RunningJob
Returns a float between 0.0 and 1.0, indicating progress on the reduce portion of the job.
regexNormalize(String) - Method in class org.apache.nutch.net.RegexUrlNormalizer
This function does the replacements by iterating through all the regex patterns.
release(File) - Method in class org.apache.nutch.fs.LocalFileSystem
Release a held lock
release(File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Release a held lock
release(File) - Method in class org.apache.nutch.fs.NutchFileSystem
Release the lock
release(UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
 
releaseLock(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
 
releaseLock(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Release the lock on the given file
removeAll(String) - Static method in class org.apache.nutch.mapReduce.MapOutputFile
Removes all of the files related to a task.
rename(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
Rename files/dirs
rename(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Rename files/dirs
rename(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
Renames File src to File dst.
rename(String, String) - Method in class org.apache.nutch.fs.TestClient
Rename an NDFS file
rename(NutchFileSystem, String, String) - Static method in class org.apache.nutch.io.MapFile
Renames an existing map directory.
rename(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.NDFSClient
Make a direct connection to namenode and manipulate structures there.
renameTo(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.FSDirectory
Change the filename
renameTo(UTF8, UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Change the indicated filename.
renderAnonymous(PrintStream, Resource, String) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
renderClassDescription(PrintStream, OntClass, int) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
renderHierarchy(PrintStream, OntClass, List, int) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
renderRestriction(PrintStream, Restriction) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
renderURI(PrintStream, PrefixMapping, String) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
renewLease(UTF8) - Method in class org.apache.nutch.ndfs.FSNamesystem
Renew the lease(s) held by the given client
report() - Method in class org.apache.nutch.fs.TestClient
Gives a report on how the NutchFileSystem is doing
reset(byte[], int) - Method in class org.apache.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset(byte[], int, int) - Method in class org.apache.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset() - Method in class org.apache.nutch.io.DataOutputBuffer
Resets the buffer to empty.
reset() - Method in class org.apache.nutch.io.MapFile.Reader
Re-positions the reader before its first key.
reset() - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets all boolean values to false.
reset() - Method in class org.apache.nutch.segment.SegmentReader
Reset all readers.
resolveEncodingAlias(String) - Static method in class org.apache.nutch.util.StringUtil
 
retrieve(String) - Static method in class org.apache.nutch.ontology.OntologyImpl
 
retrieveFile(String, OutputStream, int) - Method in class org.apache.nutch.protocol.ftp.Client
 
retrieveList(String, List, int, FTPFileEntryParser) - Method in class org.apache.nutch.protocol.ftp.Client
 
rightPad(String, int) - Static method in class org.apache.nutch.util.StringUtil
Returns a copy of s padded with trailing spaces so that its length is length.
root - Variable in class org.apache.nutch.util.TrieStringMatcher
 
rootClasses(OntModel) - Method in class org.apache.nutch.ontology.OwlParser
 
rootClasses(OntModel) - Method in interface org.apache.nutch.ontology.Parser
 
run() - Method in class org.apache.nutch.fetcher.Fetcher
Runs the fetcher.
run(JobConf, TaskUmbilicalProtocol) - Method in class org.apache.nutch.mapReduce.MapTask
 
run(JobConf, TaskUmbilicalProtocol) - Method in class org.apache.nutch.mapReduce.ReduceTask
 
run(JobConf, TaskUmbilicalProtocol) - Method in class org.apache.nutch.mapReduce.Task
Run this task as a part of the named job.
run() - Method in class org.apache.nutch.mapReduce.TaskTracker
The server retry loop.
run() - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
run() - Method in class org.apache.nutch.segment.SegmentSlicer
Run the slicer.
run() - Method in class org.apache.nutch.tools.PruneIndexTool
For each query, find all matching documents and delete them from all input indexes.
run() - Method in class org.apache.nutch.tools.SegmentMergeTool
Run the tool, periodically reporting progress.
run() - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb
 
runJob(JobConf) - Static method in class org.apache.nutch.mapReduce.JobClient
Utility that submits a job, then polls for progress until the job is complete.
runningJobs() - Method in class org.apache.nutch.mapReduce.JobTracker
 

S

SIGRAM - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
SLASH - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
STAGE_DEDUP - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_DELETING - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_INDEXING - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_MASTERIDX - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_MERGEIDX - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_OPENING - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_WRITING - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STATUS_FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_FAILURE - Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTFETCHING - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTFOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTMODIFIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_NOTPARSED - Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STATUS_SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
 
STATUS_SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
 
STILL_WAITING - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
SUCCEEDED - Static variable in class org.apache.nutch.mapReduce.JobStatus
 
SUCCEEDED - Static variable in class org.apache.nutch.mapReduce.TaskStatus
 
SUCCESS - Static variable in interface org.apache.nutch.mapReduce.MRConstants
 
SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
Parsing succeeded.
SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Content was retrieved without errors.
SUCCESS_REDIRECT - Static variable in class org.apache.nutch.parse.ParseStatus
Parsed content contains a directive to redirect to another URL.
SYSTEM_STARTUP_PERIOD - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
ScoreStats - class org.apache.nutch.util.ScoreStats.
When we generate a fetchlist, we need to choose a "cutoff" score, such that any scores above that cutoff will be included in the fetchlist.
ScoreStats() - Constructor for class org.apache.nutch.util.ScoreStats
 
Searcher - interface org.apache.nutch.searcher.Searcher.
Service that searches.
SegmentMergeTool - class org.apache.nutch.tools.SegmentMergeTool.
This class cleans up accumulated segment data and merges it into a single segment (or, optionally, multiple segments) with no duplicates.
SegmentMergeTool(NutchFileSystem, File[], File, long, boolean, boolean) - Constructor for class org.apache.nutch.tools.SegmentMergeTool
Create a SegmentMergeTool.
SegmentMergeTool.SegmentMergeStatus - class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus.
 
SegmentMergeTool.SegmentMergeStatus() - Constructor for class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
SegmentMergeTool.SegmentMergeStatus(int, File[], long, long, long) - Constructor for class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
SegmentReader - class org.apache.nutch.segment.SegmentReader.
This class holds together all data readers for an existing segment.
SegmentReader(File) - Constructor for class org.apache.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File) - Constructor for class org.apache.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(File, boolean) - Constructor for class org.apache.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File, boolean) - Constructor for class org.apache.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File, boolean, boolean, boolean, boolean) - Constructor for class org.apache.nutch.segment.SegmentReader
Open a segment for reading.
SegmentSlicer - class org.apache.nutch.segment.SegmentSlicer.
This class reads data from one or more input segments, and outputs it to one or more output segments, optionally deleting the input segments when it's finished.
SegmentSlicer(NutchFileSystem, File[], File, boolean, boolean, boolean, boolean, long, boolean, Pattern) - Constructor for class org.apache.nutch.segment.SegmentSlicer
Create new SegmentSlicer.
SegmentWriter - class org.apache.nutch.segment.SegmentWriter.
This class holds together all data writers for a new segment.
SegmentWriter(File, boolean) - Constructor for class org.apache.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean) - Constructor for class org.apache.nutch.segment.SegmentWriter
 
SegmentWriter(File, boolean, boolean) - Constructor for class org.apache.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean, boolean) - Constructor for class org.apache.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean, boolean, boolean, boolean, boolean) - Constructor for class org.apache.nutch.segment.SegmentWriter
Open a segment for writing.
SequenceFile - class org.apache.nutch.io.SequenceFile.
Support for flat files of binary key/value pairs.
SequenceFile.Reader - class org.apache.nutch.io.SequenceFile.Reader.
Reads key/value pairs from a sequence-format file.
SequenceFile.Reader(NutchFileSystem, String) - Constructor for class org.apache.nutch.io.SequenceFile.Reader
Open the named file.
SequenceFile.Sorter - class org.apache.nutch.io.SequenceFile.Sorter.
Sorts key/value pairs in a sequence-format file.
SequenceFile.Sorter(NutchFileSystem, Class, Class) - Constructor for class org.apache.nutch.io.SequenceFile.Sorter
Sort and merge files containing the named classes.
SequenceFile.Sorter(NutchFileSystem, WritableComparator, Class) - Constructor for class org.apache.nutch.io.SequenceFile.Sorter
Sort and merge using an arbitrary WritableComparator.
SequenceFile.Writer - class org.apache.nutch.io.SequenceFile.Writer.
Write key/value pairs to a sequence-format file.
SequenceFile.Writer(NutchFileSystem, String, Class, Class) - Constructor for class org.apache.nutch.io.SequenceFile.Writer
Create the named file.
SequenceFileInputFormat - class org.apache.nutch.mapReduce.SequenceFileInputFormat.
An InputFormat for sequence files.
SequenceFileInputFormat() - Constructor for class org.apache.nutch.mapReduce.SequenceFileInputFormat
 
SequenceFileOutputFormat - class org.apache.nutch.mapReduce.SequenceFileOutputFormat.
 
SequenceFileOutputFormat() - Constructor for class org.apache.nutch.mapReduce.SequenceFileOutputFormat
 
Server - class org.apache.nutch.ipc.Server.
An abstract IPC service.
Server(int, Class, int) - Constructor for class org.apache.nutch.ipc.Server
Constructs a server listening on the named port.
SetFile - class org.apache.nutch.io.SetFile.
A file-based set of keys.
SetFile() - Constructor for class org.apache.nutch.io.SetFile
 
SetFile.Reader - class org.apache.nutch.io.SetFile.Reader.
Provide access to an existing set file.
SetFile.Reader(NutchFileSystem, String) - Constructor for class org.apache.nutch.io.SetFile.Reader
Construct a set reader for the named set.
SetFile.Reader(NutchFileSystem, String, WritableComparator) - Constructor for class org.apache.nutch.io.SetFile.Reader
Construct a set reader for the named set using the named comparator.
SetFile.Writer - class org.apache.nutch.io.SetFile.Writer.
Write a new set file.
SetFile.Writer(NutchFileSystem, String, Class) - Constructor for class org.apache.nutch.io.SetFile.Writer
Create the named set for keys of the named class.
SetFile.Writer(NutchFileSystem, String, WritableComparator) - Constructor for class org.apache.nutch.io.SetFile.Writer
Create the named set using the named key comparator.
SimpleCharStream - class org.apache.nutch.quality.dynamic.SimpleCharStream.
An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without unicode processing).
SimpleCharStream(Reader, int, int, int) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader, int, int) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int, int) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream) - Constructor for class org.apache.nutch.quality.dynamic.SimpleCharStream
 
StringUtil - class org.apache.nutch.util.StringUtil.
A collection of String processing utility methods.
StringUtil() - Constructor for class org.apache.nutch.util.StringUtil
 
SuffixStringMatcher - class org.apache.nutch.util.SuffixStringMatcher.
A class for efficiently matching Strings against a set of suffixes.
SuffixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied array.
SuffixStringMatcher(Collection) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied Collection.
Summarizer - class org.apache.nutch.searcher.Summarizer.
Implements hit summarization.
Summarizer() - Constructor for class org.apache.nutch.searcher.Summarizer
 
Summary - class org.apache.nutch.searcher.Summary.
A document summary dynamically generated to match a query.
Summary() - Constructor for class org.apache.nutch.searcher.Summary
Constructs an empty Summary.
Summary.Ellipsis - class org.apache.nutch.searcher.Summary.Ellipsis.
An ellipsis fragment within a summary.
Summary.Ellipsis() - Constructor for class org.apache.nutch.searcher.Summary.Ellipsis
Constructs an ellipsis fragment.
Summary.Fragment - class org.apache.nutch.searcher.Summary.Fragment.
A fragment of text within a summary.
Summary.Fragment(String) - Constructor for class org.apache.nutch.searcher.Summary.Fragment
Constructs a fragment for the given text.
Summary.Highlight - class org.apache.nutch.searcher.Summary.Highlight.
A highlighted fragment of text within a summary.
Summary.Highlight(String) - Constructor for class org.apache.nutch.searcher.Summary.Highlight
Constructs a highlighted fragment for the given text.
SwitchTo(int) - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
SwitchTo(int) - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
save(OutputStream) - Method in class org.apache.nutch.analysis.lang.NGramProfile
Writes NGramProfile content into OutputStream.
save() - Method in class org.apache.nutch.tools.ParseSegment
Split sorted ParserOutput into ParseData and ParseText, and generate new FetcherOutput with updated status
scoreDump() - Method in class org.apache.nutch.tools.WebDBAdminTool
Emit each page's score and link data
search(Query, int, String, String, boolean) - Method in class org.apache.nutch.searcher.DistributedSearch.Client
 
search(Query, int, String, String, boolean) - Method in class org.apache.nutch.searcher.IndexSearcher
 
search(Query, int) - Method in class org.apache.nutch.searcher.NutchBean
 
search(Query, int, String, String, boolean) - Method in class org.apache.nutch.searcher.NutchBean
 
search(Query, int, int) - Method in class org.apache.nutch.searcher.NutchBean
Search for pages matching a query, eliminating excessive hits from the same site.
search(Query, int, int, String) - Method in class org.apache.nutch.searcher.NutchBean
Search for pages matching a query, eliminating excessive hits with matching values for a named field.
search(Query, int, int, String, String, boolean) - Method in class org.apache.nutch.searcher.NutchBean
Search for pages matching a query, eliminating excessive hits with matching values for a named field.
search(Query, int, String, String, boolean) - Method in interface org.apache.nutch.searcher.Searcher
Return the top-scoring hits for a query.
second - Variable in class org.apache.nutch.ndfs.FSParam
 
second - Variable in class org.apache.nutch.ndfs.FSResults
 
seek(long) - Method in class org.apache.nutch.fs.NFSDataInputStream
 
seek(long) - Method in class org.apache.nutch.fs.NFSInputStream
Seek to the given offset from the start of the file.
seek(long) - Method in class org.apache.nutch.io.ArrayFile.Reader
Positions the reader before its nth value.
seek(WritableComparable) - Method in class org.apache.nutch.io.MapFile.Reader
Positions the reader at the named key, or if none such exists, at the first entry after the named key.
seek(long) - Method in class org.apache.nutch.io.SequenceFile.Reader
Set the current byte position in the input file.
seek(WritableComparable) - Method in class org.apache.nutch.io.SetFile.Reader
 
seek(long) - Method in class org.apache.nutch.segment.SegmentReader
Seek to a position in all readers.
segmentDir - Variable in class org.apache.nutch.segment.SegmentReader
 
segmentDir - Variable in class org.apache.nutch.segment.SegmentWriter
 
sendNoOp() - Method in class org.apache.nutch.protocol.ftp.Client
Sends a NOOP command to the FTP server.
set(DistributedWebDBWriter.LinkInstruction) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-init from another LinkInstruction's info.
set(Link, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-init with a Link and an instruction
set(DistributedWebDBWriter.PageInstruction) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
Init from another PageInstruction object.
set(Page, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
Init PageInstruction with no Link
set(Page, Link, int) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
Init PageInstruction with a Link
set(Link) - Method in class org.apache.nutch.db.Link
 
set(Page) - Method in class org.apache.nutch.db.Page
Copy the contents of another instance into this instance.
set(WebDBWriter.LinkInstruction) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
Re-init from another LinkInstruction's info.
set(Link, int) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
Re-init with a Link and an instruction
set(WebDBWriter.PageInstruction) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
Init from another PageInstruction object.
set(Page, int) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
Init PageInstruction with no Link
set(Page, Link, int) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
Init PageInstruction with a Link
set(Writable[]) - Method in class org.apache.nutch.io.ArrayWritable
 
set(boolean) - Method in class org.apache.nutch.io.BooleanWritable
Set the value of the BooleanWritable
set(float) - Method in class org.apache.nutch.io.FloatWritable
Set the value of this FloatWritable.
set(int) - Method in class org.apache.nutch.io.IntWritable
Set the value of this IntWritable.
set(long) - Method in class org.apache.nutch.io.LongWritable
Set the value of this LongWritable.
set(MD5Hash) - Method in class org.apache.nutch.io.MD5Hash
Copy the contents of another instance into this instance.
set(Writable[][]) - Method in class org.apache.nutch.io.TwoDArrayWritable
 
set(String) - Method in class org.apache.nutch.io.UTF8
Set to contain the contents of a string.
set(UTF8) - Method in class org.apache.nutch.io.UTF8
Set to contain the contents of another UTF8.
set(float) - Method in class org.apache.nutch.tools.FetchListTool.SortableScore
 
set(UTF8, UTF8, int) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage
 
set(String, Object) - Method in class org.apache.nutch.util.NutchConf
Sets the value of the name property.
setAnchors(String[]) - Method in class org.apache.nutch.pagedb.FetchListEntry
 
setArgs(String[]) - Method in class org.apache.nutch.parse.ParseStatus
 
setArgs(String[]) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
setBaseHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the baseHref.
setClass(String, Class, Class) - Method in class org.apache.nutch.util.NutchConf
Sets the value of the name property to the name of a class.
setClazz(String) - Method in class org.apache.nutch.plugin.Extension
Sets the class that implements the concrete extension; used only until model creation at system start-up.
setClean(boolean) - Method in class org.apache.nutch.tools.ParseSegment
Set whether to clean intermediates.
setCode(int) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
setCombinerClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setCommand(String) - Method in class org.apache.nutch.util.CommandRunner
 
setContent(byte[]) - Method in class org.apache.nutch.protocol.Content
 
setContent(Content) - Method in class org.apache.nutch.protocol.ProtocolOutput
 
setContentType(String) - Method in class org.apache.nutch.protocol.Content
 
setDataTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Client
Sets the timeout in milliseconds to use for data connection.
setDebugStream(PrintStream) - Method in class org.apache.nutch.analysis.NutchAnalysisTokenManager
 
setDebugStream(PrintStream) - Method in class org.apache.nutch.quality.dynamic.PageDescriptionTokenManager
 
setDescriptor(PluginDescriptor) - Method in class org.apache.nutch.plugin.Extension
Sets the plugin descriptor and is only used until model creation at system start up.
setDestroyOnTimeout(boolean) - Method in class org.apache.nutch.util.CommandRunner
 
setDigest(String) - Method in class org.apache.nutch.io.MD5Hash
Sets the digest value from a hex string.
setDocumentLocator(Locator) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive an object for locating the origin of SAX document events.
setExpireTime(long) - Method in class org.apache.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setExpireTime(long) - Method in class org.apache.nutch.protocol.httpclient.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setFactor(int) - Method in class org.apache.nutch.io.SequenceFile.Sorter
Set the number of streams to merge at once.
setFetchDate(long) - Method in class org.apache.nutch.fetcher.FetcherOutput
 
setFetchInterval(byte) - Method in class org.apache.nutch.db.Page
 
setFileType(int) - Method in class org.apache.nutch.protocol.ftp.Client
Sets the file type to be transferred.
setFollowTalk(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
Set followTalk
setIDAttribute(String, Element) - Method in class org.apache.nutch.parse.html.DOMBuilder
Set an ID string to node association in the ID table.
setId(String) - Method in class org.apache.nutch.plugin.Extension
Sets the unique extension Id and is only used until model creation at system start up.
setIndexInterval(int) - Method in class org.apache.nutch.io.MapFile.Writer
Sets the index interval.
setIndexInterval(int) - Method in class org.apache.nutch.segment.SegmentWriter
Sets the index interval for all segment writers.
setIndexNo(int) - Method in class org.apache.nutch.searcher.Hit
 
setInputDir(File) - Method in class org.apache.nutch.mapReduce.JobConf
 
setInputFormat(InputFormat) - Method in class org.apache.nutch.mapReduce.JobConf
 
setInputKeyClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setInputStream(InputStream) - Method in class org.apache.nutch.util.CommandRunner
 
setInputValueClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setInt(String, int) - Method in class org.apache.nutch.util.NutchConf
Sets the value of the name property to an integer.
setJar(String) - Method in class org.apache.nutch.mapReduce.JobConf
 
setJobFile(String) - Method in class org.apache.nutch.mapReduce.Task
 
setKeepConnection(boolean) - Method in class org.apache.nutch.protocol.ftp.Ftp
Set keepConnection
setLastModified(long) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
setLastSeen(long) - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
setLogLevel(Level) - Static method in class org.apache.nutch.fetcher.Fetcher
Set the logging level.
setLogLevel(Level) - Static method in class org.apache.nutch.tools.ParseSegment
Set the logging level.
setMD5(MD5Hash) - Method in class org.apache.nutch.db.Page
 
setMajorCode(byte) - Method in class org.apache.nutch.parse.ParseStatus
 
setMapProgress(float) - Method in class org.apache.nutch.mapReduce.JobStatus
 
setMapperClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setMaxContentLength(int) - Method in class org.apache.nutch.protocol.file.File
Set the point at which content is truncated.
setMaxContentLength(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
Set the point at which content is truncated.
setMemory(int) - Method in class org.apache.nutch.io.SequenceFile.Sorter
Set the total amount of buffer memory, in bytes.
setMessage(String) - Method in class org.apache.nutch.parse.ParseStatus
 
setMessage(String) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
setMinorCode(short) - Method in class org.apache.nutch.parse.ParseStatus
 
setMoreFromDupExcluded(boolean) - Method in class org.apache.nutch.searcher.Hit
True iff other, lower-scoring hits with the same dedup value have been excluded from the list which contains this hit.
setName(Class, String) - Static method in class org.apache.nutch.io.WritableName
Set the name that a class should be known as to something other than the class name.
setNextFetchTime(long) - Method in class org.apache.nutch.db.Page
 
setNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noCache to true.
setNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noFollow to true.
setNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets noIndex to true.
setNumBytes(long) - Method in class org.apache.nutch.ndfs.Block
 
setNumMapTasks(int) - Method in class org.apache.nutch.mapReduce.JobConf
 
setNumOutlinks(int) - Method in class org.apache.nutch.db.Page
 
setNumReduceTasks(int) - Method in class org.apache.nutch.mapReduce.JobConf
 
setOutputDir(File) - Method in class org.apache.nutch.mapReduce.JobConf
 
setOutputFormat(OutputFormat) - Method in class org.apache.nutch.mapReduce.JobConf
 
setOutputKeyClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setOutputKeyComparatorClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setOutputValueClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setPartitionerClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setProgress(float) - Method in class org.apache.nutch.mapReduce.TaskStatus
 
setProtocolStatus(ProtocolStatus) - Method in class org.apache.nutch.fetcher.FetcherOutput
 
setQuery(String) - Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
 
setReduceProgress(float) - Method in class org.apache.nutch.mapReduce.JobStatus
 
setReducerClass(Class) - Method in class org.apache.nutch.mapReduce.JobConf
 
setRefresh(boolean) - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets refresh to the supplied value.
setRefreshHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshHref.
setRefreshTime(int) - Method in class org.apache.nutch.parse.HTMLMetaTags
Sets the refreshTime.
setRemoteVerificationEnabled(boolean) - Method in class org.apache.nutch.protocol.ftp.Client
Enable or disable verification that the remote host taking part in a data connection is the same as the host to which the control connection is attached.
setRetriesSinceFetch(int) - Method in class org.apache.nutch.db.Page
 
setRunState(int) - Method in class org.apache.nutch.mapReduce.TaskStatus
 
setScore(float) - Method in class org.apache.nutch.db.Page
 
setScore(float, float) - Method in class org.apache.nutch.db.Page
 
setScore(float) - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
setScorePower(float) - Method in class org.apache.nutch.indexer.IndexSegment
Determines the power of link analysis scores.
setShowThreadIDs(boolean) - Static method in class org.apache.nutch.util.LogFormatter
When set true, thread IDs are logged.
setStatus(ProtocolStatus) - Method in class org.apache.nutch.protocol.ProtocolOutput
 
setStdErrorStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
 
setStdOutputStream(OutputStream) - Method in class org.apache.nutch.util.CommandRunner
 
setTargetHasOutlink(boolean) - Method in class org.apache.nutch.db.Link
 
setThreadCount(int) - Method in class org.apache.nutch.fetcher.Fetcher
Set thread count
setThreadCount(int) - Method in class org.apache.nutch.tools.ParseSegment
Set thread count
setTimeout(int) - Method in class org.apache.nutch.ipc.Client
Sets the timeout used for network i/o.
setTimeout(int) - Method in class org.apache.nutch.ipc.Server
Sets the timeout used for network i/o.
setTimeout(int) - Method in class org.apache.nutch.protocol.ftp.Ftp
Set the timeout.
setTimeout(int) - Method in class org.apache.nutch.util.CommandRunner
 
setTotalIsExact(boolean) - Method in class org.apache.nutch.searcher.Hits
Set Hits.totalIsExact().
setURL(String) - Method in class org.apache.nutch.db.Page
 
setValueClass(Class) - Method in class org.apache.nutch.io.ArrayWritable
 
setWaitForExit(boolean) - Method in class org.apache.nutch.util.CommandRunner
 
setWeight(float) - Method in class org.apache.nutch.searcher.Query.Clause
 
shortestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
Returns the shortest prefix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
Returns the shortest suffix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
Returns the shortest substring of input that is matched by a pattern in the trie, or null if no match exists.
showTime(boolean) - Static method in class org.apache.nutch.util.LogFormatter
When true, time is logged with each entry.
shutDown() - Method in class org.apache.nutch.plugin.Plugin
Shutdown the plugin.
shutdown() - Method in class org.apache.nutch.util.ThreadPool
Turn off the pool.
size - Variable in class org.apache.nutch.segment.SegmentReader
 
size - Variable in class org.apache.nutch.segment.SegmentWriter
 
size() - Method in class org.apache.nutch.util.FibonacciHeap
Returns the number of objects in the heap.
skip(DataInput) - Static method in class org.apache.nutch.io.UTF8
Skips over one UTF8 in the input.
skip(DataInput) - Static method in class org.apache.nutch.parse.Outlink
Skips over one Outlink in the input.
skipCompressedByteArray(DataInput) - Static method in class org.apache.nutch.io.WritableUtils
 
skippedEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of a skipped entity.
sort(String, String) - Method in class org.apache.nutch.io.SequenceFile.Sorter
Perform a file sort.
sort() - Method in class org.apache.nutch.tools.ParseSegment
Sort ParserOutput
specialConstructor - Variable in class org.apache.nutch.quality.dynamic.ParseException
This variable determines which constructor was used to create this object and thereby affects the semantics of the "getMessage" method (see below).
specialToken - Variable in class org.apache.nutch.quality.dynamic.Token
This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token.
stage - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
stages - Static variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
start() - Method in class org.apache.nutch.ipc.Server
Starts the service.
start() - Method in class org.apache.nutch.mapReduce.JobTrackerInfoServer
Launch the HTTP server
startBlock(Block) - Method in class org.apache.nutch.ndfs.FSDataset
A Block b will be coming soon!
startCDATA() - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of a CDATA section.
startDTD(String, String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the start of DTD declarations, if any.
startDocument() - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of a document.
startElement(String, String, String, Attributes) - Method in class org.apache.nutch.parse.html.DOMBuilder
Receive notification of the beginning of an element.
startEntity(String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Report the beginning of an entity.
startFile(UTF8, UTF8, boolean) - Method in class org.apache.nutch.ndfs.FSNamesystem
The client would like to create a new block for the indicated filename.
startLocalInput(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
We can read directly from the real local fs.
startLocalInput(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Fetch remote NDFS file, place at tmpLocalFile
startLocalInput(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
Returns a local File that the user can read from.
startLocalOutput(File, File) - Method in class org.apache.nutch.fs.LocalFileSystem
We can write output directly to the final location
startLocalOutput(File, File) - Method in class org.apache.nutch.fs.NDFSFileSystem
Output will go to the tmp working area.
startLocalOutput(File, File) - Method in class org.apache.nutch.fs.NutchFileSystem
Returns a local File that the user can write output to.
startPrefixMapping(String, String) - Method in class org.apache.nutch.parse.html.DOMBuilder
Begin the scope of a prefix-URI Namespace mapping.
startProcessing(RequestContext) - Method in class org.apache.nutch.clustering.carrot2.LocalNutchInputComponent
A callback hook that starts the processing.
startTime - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
startUp() - Method in class org.apache.nutch.plugin.Plugin
Invoked when the plugin starts up.
started - Variable in class org.apache.nutch.segment.SegmentReader
The time when fetching of this segment started, as recorded in fetcher output data.
staticFlag - Static variable in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
status() - Method in class org.apache.nutch.fetcher.Fetcher
Display the status of the fetcher run.
status() - Method in class org.apache.nutch.tools.ParseSegment
Display the status of the parser run.
stop() - Method in class org.apache.nutch.ipc.Client
Stop all threads related to this client.
stop() - Method in class org.apache.nutch.ipc.Server
Stops the service.
stop() - Method in class org.apache.nutch.mapReduce.JobTrackerInfoServer
Stop the HTTP server
subclasses(String) - Method in interface org.apache.nutch.ontology.Ontology
 
subclasses(String) - Method in class org.apache.nutch.ontology.OntologyImpl
Retrieves all subclasses of the entity (or entities) hashed to searchTerm.
submitJob(String) - Method in class org.apache.nutch.mapReduce.JobClient
Submit a job to the MR system
submitJob(JobConf) - Method in class org.apache.nutch.mapReduce.JobClient
Submit a job to the MR system
submitJob(String) - Method in interface org.apache.nutch.mapReduce.JobSubmissionProtocol
Submit a Job for execution.
submitJob(String) - Method in class org.apache.nutch.mapReduce.JobTracker
 
success() - Method in class org.apache.nutch.ndfs.FSResults
Whether the call worked.
sync(long) - Method in class org.apache.nutch.io.SequenceFile.Reader
Seek to the next sync mark past a given position.
syncSeen() - Method in class org.apache.nutch.io.SequenceFile.Reader
Returns true iff the previous call to next passed a sync mark.
synonyms(String) - Method in interface org.apache.nutch.ontology.Ontology
 
synonyms(String) - Method in class org.apache.nutch.ontology.OntologyImpl
Retrieves synonyms from WordNet via sweet's web interface.

T

TASKTRACKER_EXPIRY_INTERVAL - Static variable in interface org.apache.nutch.mapReduce.MRConstants
 
TEMP_MOVED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
Resource has moved temporarily.
TRACKERS_OK - Static variable in interface org.apache.nutch.mapReduce.InterTrackerProtocol
 
Task - class org.apache.nutch.mapReduce.Task.
Base class for tasks.
Task() - Constructor for class org.apache.nutch.mapReduce.Task
 
Task(String, String) - Constructor for class org.apache.nutch.mapReduce.Task
 
TaskStatus - class org.apache.nutch.mapReduce.TaskStatus.
Describes the current status of a task.
TaskStatus() - Constructor for class org.apache.nutch.mapReduce.TaskStatus
 
TaskStatus(String, float, int) - Constructor for class org.apache.nutch.mapReduce.TaskStatus
 
TaskTracker - class org.apache.nutch.mapReduce.TaskTracker.
TaskTracker is a process that starts and tracks MR Tasks in a networked environment.
TaskTracker() - Constructor for class org.apache.nutch.mapReduce.TaskTracker
Start with the local machine name, and the default JobTracker
TaskTracker(InetSocketAddress) - Constructor for class org.apache.nutch.mapReduce.TaskTracker
Start with the local machine name, and the addr of the target JobTracker
TaskTracker.Child - class org.apache.nutch.mapReduce.TaskTracker.Child.
The main() for child processes.
TaskTracker.Child() - Constructor for class org.apache.nutch.mapReduce.TaskTracker.Child
 
TaskTrackerStatus - class org.apache.nutch.mapReduce.TaskTrackerStatus.
A TaskTrackerStatus is a MapReduce primitive.
TaskTrackerStatus() - Constructor for class org.apache.nutch.mapReduce.TaskTrackerStatus
 
TaskTrackerStatus(String, String, int, Vector) - Constructor for class org.apache.nutch.mapReduce.TaskTrackerStatus
 
TaskUmbilicalProtocol - interface org.apache.nutch.mapReduce.TaskUmbilicalProtocol.
Protocol that task child process uses to contact its parent process.
TestClient - class org.apache.nutch.fs.TestClient.
This class provides some NDFS administrative access.
TestClient(NutchFileSystem) - Constructor for class org.apache.nutch.fs.TestClient
 
TextInputFormat - class org.apache.nutch.mapReduce.TextInputFormat.
An InputFormat for plain text files.
TextInputFormat() - Constructor for class org.apache.nutch.mapReduce.TextInputFormat
 
TextOutputFormat - class org.apache.nutch.mapReduce.TextOutputFormat.
 
TextOutputFormat() - Constructor for class org.apache.nutch.mapReduce.TextOutputFormat
 
TextParser - class org.apache.nutch.parse.text.TextParser.
 
TextParser() - Constructor for class org.apache.nutch.parse.text.TextParser
 
ThreadPool - class org.apache.nutch.util.ThreadPool.
ThreadPool maintains a large set of threads, which can be dedicated to a certain task, and then recycled.
ThreadPool(int) - Constructor for class org.apache.nutch.util.ThreadPool
Creates a pool of numThreads size.
Token - class org.apache.nutch.quality.dynamic.Token.
Describes the input token stream.
Token() - Constructor for class org.apache.nutch.quality.dynamic.Token
 
TokenCountMapper - class org.apache.nutch.mapReduce.lib.TokenCountMapper.
A Mapper that maps text values into <token,freq> pairs.
TokenCountMapper() - Constructor for class org.apache.nutch.mapReduce.lib.TokenCountMapper
 
TokenMgrError - error org.apache.nutch.quality.dynamic.TokenMgrError.
 
TokenMgrError() - Constructor for class org.apache.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(String, int) - Constructor for class org.apache.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(boolean, int, int, int, String, char, int) - Constructor for class org.apache.nutch.quality.dynamic.TokenMgrError
 
TrieStringMatcher - class org.apache.nutch.util.TrieStringMatcher.
TrieStringMatcher is a base class for simple tree-based string matching.
TrieStringMatcher() - Constructor for class org.apache.nutch.util.TrieStringMatcher
 
TrieStringMatcher.TrieNode - class org.apache.nutch.util.TrieStringMatcher.TrieNode.
Node class for the character tree.
TwoDArrayWritable - class org.apache.nutch.io.TwoDArrayWritable.
A Writable for 2D arrays containing a matrix of instances of a class.
TwoDArrayWritable(Class) - Constructor for class org.apache.nutch.io.TwoDArrayWritable
 
TwoDArrayWritable(Class, Writable[][]) - Constructor for class org.apache.nutch.io.TwoDArrayWritable
 
TypeQueryFilter - class org.apache.nutch.searcher.more.TypeQueryFilter.
Handles "type:" query clauses, causing them to search the field indexed by MoreIndexingFilter.
TypeQueryFilter() - Constructor for class org.apache.nutch.searcher.more.TypeQueryFilter
 
targetHasOutlink() - Method in class org.apache.nutch.db.Link
 
taskReports() - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
All current tasks at the TaskTracker.
taskTrackers() - Method in class org.apache.nutch.mapReduce.JobTracker
 
term() - Method in class org.apache.nutch.analysis.NutchAnalysis
Parse a single term.
terminal - Variable in class org.apache.nutch.util.TrieStringMatcher.TrieNode
 
textDump(String) - Method in class org.apache.nutch.tools.WebDBAdminTool
Emit the webdb to 2 text files.
toArray() - Method in class org.apache.nutch.io.ArrayWritable
 
toArray() - Method in class org.apache.nutch.io.TwoDArrayWritable
 
toContent() - Method in class org.apache.nutch.protocol.file.FileResponse
 
toContent() - Method in class org.apache.nutch.protocol.ftp.FtpResponse
 
toContent() - Method in class org.apache.nutch.protocol.http.HttpResponse
 
toContent() - Method in class org.apache.nutch.protocol.httpclient.HttpResponse
 
toDate(String) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toHtml() - Method in class org.apache.nutch.searcher.HitDetails
Display as HTML.
toLong(String) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString() - Method in class org.apache.nutch.analysis.lang.NGramProfile
 
toString() - Method in class org.apache.nutch.db.Link
Print out the record
toString() - Method in class org.apache.nutch.db.Page
Print out the Page
toString() - Method in class org.apache.nutch.fetcher.Fetcher.FetcherStatus
 
toString() - Method in class org.apache.nutch.fetcher.FetcherOutput
 
toString() - Method in class org.apache.nutch.fs.LocalFileSystem
 
toString() - Method in class org.apache.nutch.fs.NDFSFileSystem
 
toString() - Method in class org.apache.nutch.io.FloatWritable
 
toString() - Method in class org.apache.nutch.io.IntWritable
 
toString() - Method in class org.apache.nutch.io.LongWritable
 
toString() - Method in class org.apache.nutch.io.MD5Hash
Returns a string representation of this object.
toString() - Method in class org.apache.nutch.io.SequenceFile.Reader
Returns the name of the file.
toString() - Method in class org.apache.nutch.io.UTF8
Convert to a String.
toString() - Method in class org.apache.nutch.io.VersionMismatchException
Returns a string representation of this object.
toString() - Method in class org.apache.nutch.mapReduce.Task
 
toString() - Method in class org.apache.nutch.ndfs.Block
 
toString() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
toString(Date) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
Get the HTTP format of the specified date.
toString(Calendar) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString(long) - Static method in class org.apache.nutch.net.protocols.HttpDateFormat
 
toString() - Method in class org.apache.nutch.pagedb.FetchListEntry
 
toString() - Method in class org.apache.nutch.parse.HTMLMetaTags
 
toString() - Method in class org.apache.nutch.parse.Outlink
 
toString() - Method in class org.apache.nutch.parse.ParseData
 
toString() - Method in class org.apache.nutch.parse.ParseStatus
 
toString() - Method in class org.apache.nutch.parse.ParseText
 
toString() - Method in class org.apache.nutch.parse.html.DOMContentUtils.LinkParams
 
toString() - Method in class org.apache.nutch.parse.msword.WordTextBuffer
 
toString() - Method in class org.apache.nutch.protocol.Content
 
toString() - Method in class org.apache.nutch.protocol.ProtocolStatus
 
toString() - Method in class org.apache.nutch.protocol.http.RobotRulesParser.RobotRuleSet
 
toString() - Method in class org.apache.nutch.protocol.httpclient.MultiProperties
A string representation of this MultiProperties list.
toString() - Method in class org.apache.nutch.protocol.httpclient.RobotRulesParser.RobotRuleSet
 
toString() - Method in class org.apache.nutch.quality.dynamic.Token
Returns the image.
toString() - Method in class org.apache.nutch.searcher.Hit
Display as a string.
toString() - Method in class org.apache.nutch.searcher.HitDetails
Display as a string.
toString() - Method in class org.apache.nutch.searcher.Query.Clause
 
toString() - Method in class org.apache.nutch.searcher.Query.Phrase
 
toString() - Method in class org.apache.nutch.searcher.Query.Term
 
toString() - Method in class org.apache.nutch.searcher.Query
 
toString() - Method in class org.apache.nutch.searcher.Summary.Ellipsis
Returns an HTML representation of this fragment.
toString() - Method in class org.apache.nutch.searcher.Summary.Fragment
Returns an HTML representation of this fragment.
toString() - Method in class org.apache.nutch.searcher.Summary.Highlight
Returns an HTML representation of this fragment.
toString() - Method in class org.apache.nutch.searcher.Summary
Returns an HTML representation of this fragment.
toString() - Method in class org.apache.nutch.util.mime.MimeType
 
toStrings() - Method in class org.apache.nutch.io.ArrayWritable
 
toTabbedString() - Method in class org.apache.nutch.db.Link
Get a tab-delimited version of the text data.
toTabbedString() - Method in class org.apache.nutch.db.Page
A tab-delimited text version of the Page's data.
token - Variable in class org.apache.nutch.analysis.NutchAnalysis
 
token - Variable in class org.apache.nutch.quality.dynamic.PageDescription
 
tokenImage - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
tokenImage - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
tokenImage - Variable in class org.apache.nutch.quality.dynamic.ParseException
This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred.
tokenStream(String, Reader) - Method in class org.apache.nutch.analysis.NutchDocumentAnalyzer
Returns a new token stream for text from the named field.
token_source - Variable in class org.apache.nutch.analysis.NutchAnalysis
 
token_source - Variable in class org.apache.nutch.quality.dynamic.PageDescription
 
totalCapacity() - Method in class org.apache.nutch.ndfs.FSNamesystem
Total raw bytes
totalIsExact() - Method in class org.apache.nutch.searcher.Hits
True if Hits.getTotal() gives the exact number of hits, or false if it is only an estimate of the total number of hits.
totalRawCapacity() - Method in class org.apache.nutch.ndfs.NDFSClient
 
totalRawUsed() - Method in class org.apache.nutch.ndfs.NDFSClient
 
totalRecords - Variable in class org.apache.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
totalRemaining() - Method in class org.apache.nutch.ndfs.FSNamesystem
Total non-used raw bytes
tracker - Static variable in class org.apache.nutch.mapReduce.JobTracker
 
tryagain() - Method in class org.apache.nutch.ndfs.FSResults
Whether the client should give it another shot

U

UNASSIGNED - Static variable in class org.apache.nutch.mapReduce.TaskStatus
 
UNKNOWN_TASKTRACKER - Static variable in interface org.apache.nutch.mapReduce.InterTrackerProtocol
 
UNQUOTED_VALUE - Static variable in interface org.apache.nutch.quality.dynamic.PageDescriptionConstants
 
URLFilter - interface org.apache.nutch.net.URLFilter.
Interface used to limit which URLs enter Nutch.
URLFilterChecker - class org.apache.nutch.net.URLFilterChecker.
Checks one given filter or all filters.
URLFilterChecker() - Constructor for class org.apache.nutch.net.URLFilterChecker
 
URLFilterException - exception org.apache.nutch.net.URLFilterException.
 
URLFilterException() - Constructor for class org.apache.nutch.net.URLFilterException
 
URLFilterException(String) - Constructor for class org.apache.nutch.net.URLFilterException
 
URLFilterException(String, Throwable) - Constructor for class org.apache.nutch.net.URLFilterException
 
URLFilterException(Throwable) - Constructor for class org.apache.nutch.net.URLFilterException
 
URLFilters - class org.apache.nutch.net.URLFilters.
Creates and caches URLFilter implementing plugins.
URL_KEYSPACE - Static variable in class org.apache.nutch.db.EditSectionGroupWriter
 
URL_KEYSPACE_DIVIDERS - Static variable in class org.apache.nutch.db.DBKeyDivision
 
UTF8 - class org.apache.nutch.io.UTF8.
A WritableComparable for strings that uses the UTF8 encoding.
UTF8() - Constructor for class org.apache.nutch.io.UTF8
 
UTF8(String) - Constructor for class org.apache.nutch.io.UTF8
Construct from a given string.
UTF8(UTF8) - Constructor for class org.apache.nutch.io.UTF8
Construct from a given string.
UTF8.Comparator - class org.apache.nutch.io.UTF8.Comparator.
A WritableComparator optimized for UTF8 keys.
UTF8.Comparator() - Constructor for class org.apache.nutch.io.UTF8.Comparator
 
UpdateDatabaseTool - class org.apache.nutch.tools.UpdateDatabaseTool.
This class takes the output of the fetcher and updates the page and link DBs accordingly.
UpdateDatabaseTool(IWebDBWriter, boolean, int) - Constructor for class org.apache.nutch.tools.UpdateDatabaseTool
Take in the WebDBWriter, instantiated elsewhere.
UpdateLineColumn(char) - Method in class org.apache.nutch.quality.dynamic.SimpleCharStream
 
UpdateSegmentsFromDb - class org.apache.nutch.tools.UpdateSegmentsFromDb.
Update scores and links in a set of segments from the current information in a web database.
UpdateSegmentsFromDb(NutchFileSystem, String, String, String) - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb
Updates all segments in the named directory from the named db.
UpdateSegmentsFromDb.BySegmentComparator - class org.apache.nutch.tools.UpdateSegmentsFromDb.BySegmentComparator.
Used internally only.
UpdateSegmentsFromDb.BySegmentComparator() - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb.BySegmentComparator
 
UpdateSegmentsFromDb.ByUrlComparator - class org.apache.nutch.tools.UpdateSegmentsFromDb.ByUrlComparator.
Used internally only.
UpdateSegmentsFromDb.ByUrlComparator() - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb.ByUrlComparator
 
UpdateSegmentsFromDb.SegmentPage - class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage.
Used internally only.
UpdateSegmentsFromDb.SegmentPage() - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage
 
UpdateSegmentsFromDb.Update - class org.apache.nutch.tools.UpdateSegmentsFromDb.Update.
Used internally only.
UpdateSegmentsFromDb.Update() - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb.Update
 
UpdateSegmentsFromDb.Update(float, String[]) - Constructor for class org.apache.nutch.tools.UpdateSegmentsFromDb.Update
 
UrlNormalizer - interface org.apache.nutch.net.UrlNormalizer.
Interface used to convert URLs to normal form and optionally do regex substitutions.
UrlNormalizerFactory - class org.apache.nutch.net.UrlNormalizerFactory.
Factory to create a UrlNormalizer from "urlnormalizer.class" config property.
unzip(byte[]) - Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[]) - Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[], int) - Static method in class org.apache.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array, truncated to sizeLimit bytes if necessary.
updateBlocks(Block[]) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
updateForSegment(NutchFileSystem, String) - Method in class org.apache.nutch.tools.UpdateDatabaseTool
Iterate through items in the FetcherOutput.
updateHeartbeat(long, long) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
updateObsoleteCheck() - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
updateSegments() - Method in class org.apache.nutch.searcher.DistributedSearch.Client
Updates segment names.
updateTaskStatus(String, TaskStatus) - Method in class org.apache.nutch.mapReduce.JobTracker.JobInProgress
 
urlCompare(Object) - Method in class org.apache.nutch.db.Link
Compare URLs, then compare MD5s.

V

VersionMismatchException - exception org.apache.nutch.io.VersionMismatchException.
Thrown by VersionedWritable.readFields(DataInput) when the version of an object being read does not match the current implementation version as returned by VersionedWritable.getVersion().
VersionMismatchException(byte, byte) - Constructor for class org.apache.nutch.io.VersionMismatchException
 
VersionedWritable - class org.apache.nutch.io.VersionedWritable.
A base class for Writables that provides version checking.
VersionedWritable() - Constructor for class org.apache.nutch.io.VersionedWritable
 
value() - Method in class org.apache.nutch.quality.dynamic.PageDescription
 

W

WHITE - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
WORD - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
WORD_PUNCT - Static variable in interface org.apache.nutch.analysis.NutchAnalysisConstants
 
WRITE_COMPLETE - Static variable in interface org.apache.nutch.ndfs.FSConstants
 
WRITE_METAINFO_PREFIX - Static variable in class org.apache.nutch.db.EditSectionWriter
 
WebDBAdminTool - class org.apache.nutch.tools.WebDBAdminTool.
The WebDBAdminTool is for Nutch administrators who need special access to the webdb.
WebDBAdminTool(IWebDBReader) - Constructor for class org.apache.nutch.tools.WebDBAdminTool
 
WebDBAnchors - class org.apache.nutch.db.WebDBAnchors.
Utility that extracts the set of anchor texts for a URL from the database.
WebDBAnchors(IWebDBReader) - Constructor for class org.apache.nutch.db.WebDBAnchors
Construct for the named db.
WebDBInjector - class org.apache.nutch.db.WebDBInjector.
This class takes a flat file of URLs and adds them as entries into a pagedb.
WebDBInjector(IWebDBWriter) - Constructor for class org.apache.nutch.db.WebDBInjector
WebDBInjector takes a reference to a WebDBWriter that it should add to.
WebDBReader - class org.apache.nutch.db.WebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
WebDBReader(NutchFileSystem, File) - Constructor for class org.apache.nutch.db.WebDBReader
Open a web db reader for the named directory.
WebDBWriter - class org.apache.nutch.db.WebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
WebDBWriter(NutchFileSystem, File) - Constructor for class org.apache.nutch.db.WebDBWriter
Create a WebDBWriter.
WebDBWriter.LinkInstruction - class org.apache.nutch.db.WebDBWriter.LinkInstruction.
Holds an instruction over a Link.
WebDBWriter.LinkInstruction() - Constructor for class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction(Link, int) - Constructor for class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction.MD5Comparator - class org.apache.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator.
Sorts the instruction first by Md5, then by opcode.
WebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class org.apache.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
WebDBWriter.LinkInstruction.UrlComparator - class org.apache.nutch.db.WebDBWriter.LinkInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
WebDBWriter.LinkInstruction.UrlComparator() - Constructor for class org.apache.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
 
WebDBWriter.LinkInstructionWriter - class org.apache.nutch.db.WebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to a SequenceFile.Writer.
WebDBWriter.LinkInstructionWriter() - Constructor for class org.apache.nutch.db.WebDBWriter.LinkInstructionWriter
 
WebDBWriter.PageInstruction - class org.apache.nutch.db.WebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
WebDBWriter.PageInstruction() - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, int) - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, Link, int) - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction.PageComparator - class org.apache.nutch.db.WebDBWriter.PageInstruction.PageComparator.
Sorts the instruction first by Page, then by opcode.
WebDBWriter.PageInstruction.PageComparator() - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstruction.PageComparator
 
WebDBWriter.PageInstruction.UrlComparator - class org.apache.nutch.db.WebDBWriter.PageInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
WebDBWriter.PageInstruction.UrlComparator() - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstruction.UrlComparator
 
WebDBWriter.PageInstructionWriter - class org.apache.nutch.db.WebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to a SequenceFile.Writer.
WebDBWriter.PageInstructionWriter() - Constructor for class org.apache.nutch.db.WebDBWriter.PageInstructionWriter
 
Word6CHPBinTable - class org.apache.nutch.parse.msword.chp.Word6CHPBinTable.
This class holds all of the character formatting properties from a Word 6.0/95 document.
Word6CHPBinTable(byte[], int, int, int) - Constructor for class org.apache.nutch.parse.msword.chp.Word6CHPBinTable
Constructor used to read a binTable in from a Word document.
WordExtractor - class org.apache.nutch.parse.msword.WordExtractor.
This class extracts the text from a Word 6.0/95/97/2000/XP document.
WordExtractor() - Constructor for class org.apache.nutch.parse.msword.WordExtractor
Constructor
WordTextBuffer - class org.apache.nutch.parse.msword.WordTextBuffer.
This class acts as a StringBuffer for text from a word document.
WordTextBuffer() - Constructor for class org.apache.nutch.parse.msword.WordTextBuffer
 
Writable - interface org.apache.nutch.io.Writable.
A simple, efficient, serialization protocol, based on DataInput and DataOutput.
WritableComparable - interface org.apache.nutch.io.WritableComparable.
An interface which extends both Writable and Comparable.
WritableComparator - class org.apache.nutch.io.WritableComparator.
A Comparator for WritableComparables.
WritableComparator(Class) - Constructor for class org.apache.nutch.io.WritableComparator
Construct for a WritableComparable implementation.
WritableName - class org.apache.nutch.io.WritableName.
Utility to permit renaming of Writable implementation classes without invalidating files that contain their class name.
WritableUtils - class org.apache.nutch.io.WritableUtils.
 
WritableUtils() - Constructor for class org.apache.nutch.io.WritableUtils
 
waitForCompletion() - Method in interface org.apache.nutch.mapReduce.RunningJob
Blocks until the job is complete.
walk(Node, URL, Properties) - Static method in class org.creativecommons.nutch.CCParseFilter.Walker
Scan the document adding attributes to metadata.
write(DataOutput) - Method in class org.apache.nutch.db.DistributedWebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class org.apache.nutch.db.DistributedWebDBWriter.PageInstruction
 
write(DataOutput) - Method in class org.apache.nutch.db.Link
Write bytes out to stream
write(DataOutput) - Method in class org.apache.nutch.db.Page
Write the bytes out to the bytestream
write(DataOutput) - Method in class org.apache.nutch.db.WebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class org.apache.nutch.db.WebDBWriter.PageInstruction
 
write(DataOutput) - Method in class org.apache.nutch.fetcher.FetcherOutput
 
write(DataOutput) - Method in class org.apache.nutch.indexer.DeleteDuplicates.IndexedDoc
 
write(DataOutput) - Method in class org.apache.nutch.io.ArrayWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.BooleanWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.BytesWritable
 
write(DataInput, int) - Method in class org.apache.nutch.io.DataOutputBuffer
Writes bytes from a DataInput directly into the buffer.
write(DataOutput) - Method in class org.apache.nutch.io.FloatWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.IntWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.LongWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.MD5Hash
 
write(DataOutput) - Method in class org.apache.nutch.io.NullWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.TwoDArrayWritable
 
write(DataOutput) - Method in class org.apache.nutch.io.UTF8
 
write(DataOutput) - Method in class org.apache.nutch.io.VersionedWritable
 
write(DataOutput) - Method in interface org.apache.nutch.io.Writable
Writes the fields of this object to out.
write(DataOutput) - Method in class org.apache.nutch.linkdb.LinkAnalysisEntry
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.FileSplit
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.JobProfile
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.JobStatus
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.MapOutputFile
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.MapOutputLocation
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.MapTask
 
write(WritableComparable, Writable) - Method in interface org.apache.nutch.mapReduce.RecordWriter
Writes a key/value pair.
write(DataOutput) - Method in class org.apache.nutch.mapReduce.ReduceTask
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.Task
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.TaskStatus
 
write(DataOutput) - Method in class org.apache.nutch.mapReduce.TaskTrackerStatus
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.Block
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.DatanodeInfo
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.FSParam
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.FSResults
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.HeartbeatData
 
write(DataOutput) - Method in class org.apache.nutch.ndfs.NDFSFileInfo
 
write(DataOutput) - Method in class org.apache.nutch.pagedb.FetchListEntry
 
write(DataOutput) - Method in class org.apache.nutch.parse.Outlink
 
write(DataOutput) - Method in class org.apache.nutch.parse.ParseData
 
write(DataOutput) - Method in class org.apache.nutch.parse.ParseStatus
 
write(DataOutput) - Method in class org.apache.nutch.parse.ParseText
 
write(DataOutput) - Method in class org.apache.nutch.protocol.Content
 
write(DataOutput) - Method in class org.apache.nutch.protocol.ProtocolStatus
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Hit
 
write(DataOutput) - Method in class org.apache.nutch.searcher.HitDetails
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Hits
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Query.Clause
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Query.Phrase
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Query.Term
 
write(DataOutput) - Method in class org.apache.nutch.searcher.Query
 
write(DataOutput) - Method in class org.apache.nutch.tools.FetchListTool.SortableScore
 
write(DataOutput) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.SegmentPage
 
write(DataOutput) - Method in class org.apache.nutch.tools.UpdateSegmentsFromDb.Update
 
write(OutputStream) - Method in class org.apache.nutch.util.NutchConf
Writes non-default properties in this configuration.
writeCompressedByteArray(DataOutput, byte[]) - Static method in class org.apache.nutch.io.WritableUtils
 
writeCompressedString(DataOutput, String) - Static method in class org.apache.nutch.io.WritableUtils
 
writeCompressedStringArray(DataOutput, String[]) - Static method in class org.apache.nutch.io.WritableUtils
 
writeString(DataOutput, String) - Static method in class org.apache.nutch.io.UTF8
Write a UTF-8 encoded string.
writeString(DataOutput, String) - Static method in class org.apache.nutch.io.WritableUtils
 
writeStringArray(DataOutput, String[]) - Static method in class org.apache.nutch.io.WritableUtils
 
writeToBlock(Block) - Method in class org.apache.nutch.ndfs.FSDataset
Start writing to a block file

X

XMLCharacterRecognizer - class org.apache.nutch.parse.html.XMLCharacterRecognizer.
Class used to verify whether the specified ch conforms to the XML 1.0 definition of whitespace.
XMLCharacterRecognizer() - Constructor for class org.apache.nutch.parse.html.XMLCharacterRecognizer
 
X_POINT_ID - Static variable in interface org.apache.nutch.clustering.OnlineClusterer
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.indexer.IndexingFilter
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.net.URLFilter
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.ontology.Ontology
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.parse.HtmlParseFilter
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.parse.Parser
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.protocol.Protocol
The name of the extension point.
X_POINT_ID - Static variable in interface org.apache.nutch.searcher.QueryFilter
The name of the extension point.

Z

zip(byte[]) - Static method in class org.apache.nutch.util.GZIPUtils
Returns a gzipped copy of the input array.

_

__openPassiveDataConnection(int, String) - Method in class org.apache.nutch.protocol.ftp.Client
 

Copyright © 2005 The Apache Software Foundation