
A

ACRONYM - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
AFTER_EQUALS - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
ANCHOR_ANALYZER - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to analyze anchors.
APOSTROPHE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
ATSIGN - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
ArrayFile - class net.nutch.io.ArrayFile.
A dense file-based mapping from integers to values.
ArrayFile() - Constructor for class net.nutch.io.ArrayFile
 
ArrayFile.Reader - class net.nutch.io.ArrayFile.Reader.
Provide access to an existing array file.
ArrayFile.Reader(NutchFileSystem, String) - Constructor for class net.nutch.io.ArrayFile.Reader
Construct an array reader for the named file.
ArrayFile.Writer - class net.nutch.io.ArrayFile.Writer.
Write a new array file.
ArrayFile.Writer(NutchFileSystem, String, Class) - Constructor for class net.nutch.io.ArrayFile.Writer
Create the named file for values of the named class.
ArrayWritable - class net.nutch.io.ArrayWritable.
A Writable for arrays containing instances of a class.
ArrayWritable(Class) - Constructor for class net.nutch.io.ArrayWritable
 
ArrayWritable(Class, Writable[]) - Constructor for class net.nutch.io.ArrayWritable
 
ArrayWritable(String[]) - Constructor for class net.nutch.io.ArrayWritable
 
abandonBlock(Block, UTF8) - Method in class net.nutch.ndfs.FSNamesystem
The client would like to let go of the given block
add(Summary.Fragment) - Method in class net.nutch.searcher.Summary
Adds a fragment to a summary.
add(Object, int) - Method in class net.nutch.util.FibonacciHeap
Adds the Object item, with the supplied priority.
addAttribute(String, String) - Method in class net.nutch.plugin.Extension
Adds an attribute. Only used until model creation at plugin system start-up.
addBlock(Block) - Method in class net.nutch.ndfs.DatanodeInfo
 
addConfResource(String) - Static method in class net.nutch.util.NutchConf
Adds a resource name to the chain of resources read.
addDependency(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds a dependency
addEscapes(String) - Static method in class net.nutch.quality.dynamic.TokenMgrError
Replaces unprintable characters by their escaped (or Unicode-escaped) equivalents in the given string.
addExportedLibRelative(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds an exported library with a path relative to the plugin directory.
addExtension(Extension) - Method in class net.nutch.plugin.ExtensionPoint
Installs a corresponding extension to this extension point.
addExtension(Extension) - Method in class net.nutch.plugin.PluginDescriptor
Adds an extension.
addExtensionPoint(ExtensionPoint) - Method in class net.nutch.plugin.PluginDescriptor
Adds an extension point.
addFile(UTF8, Block[]) - Method in class net.nutch.ndfs.FSDirectory
Add the given filename to the fs.
addFinalizationListener(SoftHashMap.FinalizationListener) - Method in interface net.nutch.util.SoftHashMap.FinalizationNotifier
Registers a SoftHashMap.FinalizationListener for this object.
addFromToken(Token) - Method in class net.nutch.analysis.lang.NGramProfile
Add ngrams from a token to this profile
addJob(Runnable) - Method in class net.nutch.util.ThreadPool
Post a Runnable to the queue.
addLink(Link) - Method in class net.nutch.db.DistributedWebDBWriter
Add a link to the link database
addLink(Link) - Method in interface net.nutch.db.IWebDBWriter
addLink(Link) will add the given Link to the webdb.
addLink(Link) - Method in class net.nutch.db.WebDBWriter
Add a link to the link database
addNGrams(StringBuffer) - Method in class net.nutch.analysis.lang.NGramProfile
Add ngrams from a single word to this profile
addName(Class, String) - Static method in class net.nutch.io.WritableName
Add an alternate name for a class.
addNotExportedLibRelative(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds a non-exported library with a path relative to the plugin directory.
addPage(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Add a page to the page database
addPage(Page) - Method in interface net.nutch.db.IWebDBWriter
addPage(Page page) will insert a Page object into the webdb.
addPage(Page) - Method in class net.nutch.db.WebDBWriter
Add a page to the page database
addPageIfNotPresent(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class net.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page) - Method in interface net.nutch.db.IWebDBWriter
addPageIfNotPresent(Page) works just like addPage(), except that the insertion will not take place if there is already a Page with that URL in the webdb.
addPageIfNotPresent(Page, Link) - Method in interface net.nutch.db.IWebDBWriter
addPageIfNotPresent(Page, Link) works just like the above addPage(), except that a Link is also conditionally added to the webdb.
addPageIfNotPresent(Page) - Method in class net.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class net.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
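The difference between addPage and addPageIfNotPresent can be illustrated with a minimal sketch. It is not the Nutch implementation: the class PageStoreSketch, its in-memory map, and the use of plain strings in place of Page objects are all hypothetical, and the assumption that addPage overwrites an existing entry is inferred from the "Don't replace" wording above rather than stated outright.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a web database keyed by URL.
// Pages are plain strings here; the real API takes net.nutch.db.Page objects.
final class PageStoreSketch {
    private final Map<String, String> pagesByUrl = new TreeMap<>();

    // Assumed behaviour: always stores the page, overwriting any earlier one.
    void addPage(String url, String page) {
        pagesByUrl.put(url, page);
    }

    // Stores the page only if no page with that URL exists yet.
    // Returns true if the insertion took place.
    boolean addPageIfNotPresent(String url, String page) {
        return pagesByUrl.putIfAbsent(url, page) == null;
    }

    String get(String url) {
        return pagesByUrl.get(url);
    }

    public static void main(String[] args) {
        PageStoreSketch db = new PageStoreSketch();
        db.addPage("http://example.org/", "first");
        db.addPageIfNotPresent("http://example.org/", "second"); // ignored
        System.out.println(db.get("http://example.org/"));
    }
}
```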
addPageWithScore(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Add a page to the page database, with a brand-new score
addPageWithScore(Page) - Method in interface net.nutch.db.IWebDBWriter
addPageWithScore(Page page) inserts a Page into the webdb.
addPageWithScore(Page) - Method in class net.nutch.db.WebDBWriter
Add a page to the page database, with a brand-new score
addPatternBackward(String) - Method in class net.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded in reverse and the first character is represented by a terminal node.
addPatternForward(String) - Method in class net.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded and the last character is represented by a terminal node.
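The two pattern-insertion methods above can be sketched roughly as follows. This is an illustration of the general trie technique only, not the code of net.nutch.util.TrieStringMatcher: the class TrieSketch, its Node type, and the matchesPrefix and matchesSuffix helpers are hypothetical names, and the sketch assumes a separate trie is kept for each direction.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie sketch; use one instance per direction (forward or backward).
final class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored pattern ends at this node
    }

    private final Node root = new Node();

    // Walks the pattern left to right, creating nodes as needed;
    // the node for the last character is marked terminal.
    void addPatternForward(String pattern) {
        Node node = root;
        for (int i = 0; i < pattern.length(); i++) {
            node = node.children.computeIfAbsent(pattern.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // Walks the pattern right to left, so the node for the first
    // character is the one marked terminal.
    void addPatternBackward(String pattern) {
        Node node = root;
        for (int i = pattern.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(pattern.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    // True if some forward-added pattern is a prefix of the input.
    boolean matchesPrefix(String input) {
        Node node = root;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) return false;
            if (node.terminal) return true;
        }
        return false;
    }

    // True if some backward-added pattern is a suffix of the input.
    boolean matchesSuffix(String input) {
        Node node = root;
        for (int i = input.length() - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) return false;
            if (node.terminal) return true;
        }
        return false;
    }
}
```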
addProhibitedPhrase(String[]) - Method in class net.nutch.searcher.Query
Add a prohibited phrase in the default field.
addProhibitedPhrase(String[], String) - Method in class net.nutch.searcher.Query
Add a prohibited phrase in the specified field.
addProhibitedTerm(String) - Method in class net.nutch.searcher.Query
Add a prohibited term in the default field.
addProhibitedTerm(String, String) - Method in class net.nutch.searcher.Query
Add a prohibited term in the specified field.
addRequiredPhrase(String[]) - Method in class net.nutch.searcher.Query
Add a required phrase in the default field.
addRequiredPhrase(String[], String) - Method in class net.nutch.searcher.Query
Add a required phrase in the specified field.
addRequiredTerm(String) - Method in class net.nutch.searcher.Query
Add a required term in the default field.
addRequiredTerm(String, String) - Method in class net.nutch.searcher.Query
Add a required term in a specified field.
addScore(float) - Method in class net.nutch.util.ScoreStats
Increment the counter in the right place.
addSearchTerm(String, OntResource) - Static method in class net.nutch.ontology.OntologyImpl
 
addUrlFeatures(Document, String) - Method in class org.creativecommons.nutch.CCIndexingFilter
Add the features represented by a license URL.
add_escapes(String) - Method in class net.nutch.quality.dynamic.ParseException
Used to convert raw characters to their escaped versions when these raw versions cannot be used as part of an ASCII string literal.
adjustBeginLineColumn(int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
Method to adjust line and column numbers for the start of a token.
analyze(StringBuffer) - Method in class net.nutch.analysis.lang.NGramProfile
Analyze a piece of text
append(WritableComparable, Writable) - Method in class net.nutch.db.EditSectionGroupWriter
Add an instruction and append it.
append(WritableComparable, Writable) - Method in class net.nutch.db.EditSectionWriter
Add a key/val pair
append(Writable) - Method in class net.nutch.io.ArrayFile.Writer
Append a value to the file.
append(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Writer
Append a key/value pair to the map.
append(Writable, Writable) - Method in class net.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(byte[], int, int, int) - Method in class net.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(WritableComparable) - Method in class net.nutch.io.SetFile.Writer
Append a key to a set.
append(String) - Method in class net.nutch.parse.msword.WordTextBuffer
 
append(FetcherOutput, Content, ParseText, ParseData) - Method in class net.nutch.segment.SegmentWriter
Append new values to the output segment.
appendInstructionInfo(EditSectionGroupWriter, Link, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, Link, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Link, int, Writable) - Method in class net.nutch.db.WebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, int, Writable) - Method in class net.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, Link, int, Writable) - Method in class net.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
attrName - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 

B

BLOCKREPORT_INTERVAL - Static variable in interface net.nutch.ndfs.FSConstants
 
BLOCK_SIZE - Static variable in interface net.nutch.ndfs.FSConstants
 
BasicIndexingFilter - class net.nutch.indexer.basic.BasicIndexingFilter.
Adds basic searchable fields to a document.
BasicIndexingFilter() - Constructor for class net.nutch.indexer.basic.BasicIndexingFilter
 
BasicUrlNormalizer - class net.nutch.net.BasicUrlNormalizer.
Converts URLs to a normal form.
BasicUrlNormalizer() - Constructor for class net.nutch.net.BasicUrlNormalizer
 
BeginToken() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Block - class net.nutch.ndfs.Block.
A Block is a Nutch FS primitive, identified by a long.
Block() - Constructor for class net.nutch.ndfs.Block
 
Block(long, long) - Constructor for class net.nutch.ndfs.Block
 
Block(File, long) - Constructor for class net.nutch.ndfs.Block
Find the blockid from the given filename
BooleanWritable - class net.nutch.io.BooleanWritable.
A WritableComparable for booleans.
BooleanWritable() - Constructor for class net.nutch.io.BooleanWritable
 
BooleanWritable(boolean) - Constructor for class net.nutch.io.BooleanWritable
 
BooleanWritable.Comparator - class net.nutch.io.BooleanWritable.Comparator.
A Comparator optimized for BooleanWritable.
BooleanWritable.Comparator() - Constructor for class net.nutch.io.BooleanWritable.Comparator
 
BytesWritable - class net.nutch.io.BytesWritable.
A Writable for byte arrays.
BytesWritable() - Constructor for class net.nutch.io.BytesWritable
 
BytesWritable(byte[]) - Constructor for class net.nutch.io.BytesWritable
 
backup(int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
beginColumn - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
beginLine - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
blockReceived(Block, UTF8) - Method in class net.nutch.ndfs.FSNamesystem
The given node is reporting that it received a certain block.
bufcolumn - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
buffer - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
bufline - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
bufpos - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 

C

CANT_PARSE - Static variable in class net.nutch.fetcher.FetcherOutput
 
CCDeleteUnlicensedTool - class org.creativecommons.nutch.CCDeleteUnlicensedTool.
Deletes documents in a set of Lucene indexes that do not have a Creative Commons license.
CCDeleteUnlicensedTool(IndexReader[]) - Constructor for class org.creativecommons.nutch.CCDeleteUnlicensedTool
Constructs a duplicate detector for the provided indexes.
CCIndexingFilter - class org.creativecommons.nutch.CCIndexingFilter.
Adds basic searchable fields to a document.
CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
 
CCParseFilter - class org.creativecommons.nutch.CCParseFilter.
Adds metadata identifying the Creative Commons license used, if any.
CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
 
CCParseFilter.Walker - class org.creativecommons.nutch.CCParseFilter.Walker.
Walks DOM tree, looking for RDF in comments and licenses in anchors.
CCQueryFilter - class org.creativecommons.nutch.CCQueryFilter.
Handles "cc:" query clauses, causing them to search the "cc" field indexed by CCIndexingFilter.
CCQueryFilter() - Constructor for class org.creativecommons.nutch.CCQueryFilter
 
CHUNKED_ENCODING - Static variable in interface net.nutch.ndfs.FSConstants
 
CJK - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
COLON - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
COMPLETE_SUCCESS - Static variable in interface net.nutch.ndfs.FSConstants
 
CONTENT_ANALYZER - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to index textual content.
C_PLUS_PLUS - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
C_SHARP - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
Client - class net.nutch.ipc.Client.
A client for an IPC service.
Client(Class) - Constructor for class net.nutch.ipc.Client
Construct an IPC client whose values are of the given Writable class.
Client - class net.nutch.protocol.ftp.Client.
Client.java encapsulates the functionality Nutch needs to get directory listings and retrieve files from an FTP server.
Client() - Constructor for class net.nutch.protocol.ftp.Client
 
Clusterer - class net.nutch.clustering.carrot2.Clusterer.
A plugin providing an implementation of the OnlineClusterer extension using clustering components of the Carrot2 project (http://carrot2.sourceforge.net).
Clusterer() - Constructor for class net.nutch.clustering.carrot2.Clusterer
An empty public constructor for making new instances of the clusterer.
CommandRunner - class net.nutch.util.CommandRunner.
 
CommandRunner() - Constructor for class net.nutch.util.CommandRunner
 
CommonGrams - class net.nutch.analysis.CommonGrams.
Construct n-grams for frequently occurring terms and phrases while indexing.
Content - class net.nutch.protocol.Content.
 
Content() - Constructor for class net.nutch.protocol.Content
 
Content(String, String, byte[], String, Properties) - Constructor for class net.nutch.protocol.Content
 
CrawlTool - class net.nutch.tools.CrawlTool.
 
CrawlTool() - Constructor for class net.nutch.tools.CrawlTool
 
call(Writable, InetSocketAddress) - Method in class net.nutch.ipc.Client
Make a call, passing param, to the IPC server running at address, returning the value.
call(Writable[], InetSocketAddress[]) - Method in class net.nutch.ipc.Client
Makes a set of calls in parallel.
call(Writable) - Method in class net.nutch.ipc.Server
Called for each call.
call(Writable) - Method in class net.nutch.ndfs.NDFS.NameNode
This method implements the call invoked by client.
call(Writable) - Method in class net.nutch.searcher.DistributedSearch.Server
 
canRead() - Method in class net.nutch.ndfs.NDFSFile
A number of File methods are unsupported in this subclass
canWrite() - Method in class net.nutch.ndfs.NDFSFile
 
checkObsoleteBlocks(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
If the node has not been checked in some time, go through its blocks and find which ones are neither valid nor pending.
childLen - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
children - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
childrenList - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
clear() - Method in class net.nutch.util.SoftHashMap
 
clone() - Method in class net.nutch.db.Page
 
clone() - Method in class net.nutch.pagedb.FetchListEntry
 
clone() - Method in class net.nutch.searcher.Query.Clause
 
clone() - Method in class net.nutch.searcher.Query
 
close() - Method in class net.nutch.db.DBSectionReader
 
close() - Method in class net.nutch.db.DistributedWebDBReader
Shutdown
close() - Method in class net.nutch.db.DistributedWebDBWriter
Shutdown
close() - Method in class net.nutch.db.EditSectionGroupWriter
Close down the writers
close() - Method in class net.nutch.db.EditSectionWriter
Close down the EditSectionWriter.
close() - Method in interface net.nutch.db.IWebDBReader
Done reading.
close() - Method in interface net.nutch.db.IWebDBWriter
Flush and complete all writes to the db.
close() - Method in class net.nutch.db.WebDBInjector
Close dbWriter and save changes
close() - Method in class net.nutch.db.WebDBReader
Shutdown
close() - Method in class net.nutch.db.WebDBWriter
Shutdown
close() - Method in class net.nutch.fs.LocalFileSystem
Shut down the FS.
close() - Method in class net.nutch.fs.NDFSFileSystem
Shut down the FS.
close() - Method in class net.nutch.fs.NutchFileSystem
No more filesystem operations are needed.
close() - Method in class net.nutch.indexer.DeleteDuplicates
Closes the indexes, saving changes.
close() - Method in class net.nutch.io.MapFile.Reader
Close the map.
close() - Method in class net.nutch.io.MapFile.Writer
Close the map.
close() - Method in class net.nutch.io.SequenceFile.Reader
Close the file.
close() - Method in class net.nutch.io.SequenceFile.Writer
Close the file.
close() - Method in class net.nutch.ndfs.FSDirectory
Shutdown the filestore
close() - Method in class net.nutch.ndfs.FSNamesystem
 
close() - Method in class net.nutch.ndfs.NDFSClient
 
close() - Method in class net.nutch.segment.SegmentReader
Close all readers.
close() - Method in class net.nutch.segment.SegmentWriter
Close all writers.
close() - Method in class net.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
close() - Method in interface net.nutch.tools.PruneIndexTool.PruneChecker
Close the checker - this could involve flushing output files or somesuch.
close() - Method in class net.nutch.tools.PruneIndexTool.StoreUrlsChecker
 
close() - Method in class net.nutch.tools.UpdateDatabaseTool
Shut everything down.
close() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Closes the indexes, saving changes.
closeGroup(int) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
clusterHits(HitDetails[], String[]) - Method in interface net.nutch.clustering.OnlineClusterer
Clusters an array of hits (HitDetails objects) and their previously extracted summaries (Strings).
clusterHits(HitDetails[], String[]) - Method in class net.nutch.clustering.carrot2.Clusterer
See OnlineClusterer for documentation.
collect(WritableComparable, Writable) - Method in interface net.nutch.mapReduce.OutputCollector
Adds a key/value pair to the output.
column - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Link.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Link.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Link.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Link.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Page.Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Page.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Page.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.BooleanWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.IntWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.LongWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.MD5Hash.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.UTF8.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.WritableComparator
Optimization hook.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.io.WritableComparator
Compare two WritableComparables.
compareBytes(byte[], int, int, byte[], int, int) - Static method in class net.nutch.io.WritableComparator
Lexicographic order of binary data.
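A lexicographic comparison of two byte ranges can be sketched as below. This is not the Nutch source: the class name ByteOrderSketch is hypothetical, the parameters are assumed to mean (buffer, start offset, length) for each side, and treating bytes as unsigned values is also an assumption.

```java
// Hypothetical sketch of lexicographic ordering over raw bytes.
final class ByteOrderSketch {
    // Compares b1[s1 .. s1+l1) with b2[s2 .. s2+l2), byte by byte.
    // Returns a negative, zero, or positive value in the usual comparator sense.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff; // assumption: bytes compare as unsigned
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // All shared bytes are equal: the shorter range sorts first.
        return l1 - l2;
    }
}
```

Comparing serialized bytes directly in this way avoids deserializing keys into objects before sorting, which is presumably why the byte-array compare overloads in this index are described as optimized.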
compareTo(Object) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
compareTo(Object) - Method in class net.nutch.db.Link
 
compareTo(Object) - Method in class net.nutch.db.Page
Compare to another Page object
compareTo(Object) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
compareTo(Object) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
compareTo(Object) - Method in class net.nutch.io.BooleanWritable
 
compareTo(Object) - Method in class net.nutch.io.IntWritable
Compares two IntWritables.
compareTo(Object) - Method in class net.nutch.io.LongWritable
Compares two LongWritables.
compareTo(Object) - Method in class net.nutch.io.MD5Hash
Compares this object with the specified object for order.
compareTo(Object) - Method in class net.nutch.io.UTF8
Compare two UTF8s.
compareTo(Object) - Method in class net.nutch.ndfs.Block
 
compareTo(Object) - Method in class net.nutch.ndfs.DatanodeInfo
 
compareTo(Object) - Method in class net.nutch.searcher.Hit
 
compareTo(Object) - Method in class net.nutch.tools.FetchListTool.SortableScore
Sort them in descending order!
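A compareTo that yields descending order simply reverses the usual comparison. The sketch below shows the idea with a hypothetical ScoreSketch class holding a float score; it is not the FetchListTool.SortableScore source, and it uses the generic Comparable<T> form rather than the compareTo(Object) signature listed in this index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value class whose natural order is highest score first.
final class ScoreSketch implements Comparable<ScoreSketch> {
    final float score;

    ScoreSketch(float score) {
        this.score = score;
    }

    // Arguments are swapped relative to an ascending comparison,
    // so larger scores sort before smaller ones.
    @Override
    public int compareTo(ScoreSketch other) {
        return Float.compare(other.score, this.score);
    }

    // Returns the scores of the given values after sorting by natural order.
    static List<Float> sortedScores(float... scores) {
        List<ScoreSketch> items = new ArrayList<>();
        for (float s : scores) {
            items.add(new ScoreSketch(s));
        }
        Collections.sort(items);
        List<Float> result = new ArrayList<>();
        for (ScoreSketch item : items) {
            result.add(item.score);
        }
        return result;
    }
}
```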
compareTo(Object) - Method in class net.nutch.util.TrieStringMatcher.TrieNode
 
completeFile(UTF8, UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Finalize the created file and make it world-accessible.
completeLocalInput(File) - Method in class net.nutch.fs.LocalFileSystem
We're done reading.
completeLocalInput(File) - Method in class net.nutch.fs.NDFSFileSystem
We're done with the local stuff, so delete it
completeLocalInput(File) - Method in class net.nutch.fs.NutchFileSystem
Called when we're all done writing to the target.
completeLocalOutput(File, File) - Method in class net.nutch.fs.LocalFileSystem
It's in the right place - nothing to do.
completeLocalOutput(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Move completed local data to NDFS destination
completeLocalOutput(File, File) - Method in class net.nutch.fs.NutchFileSystem
Called when we're all done writing to the target.
completeRound(File, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method collates and executes all the instructions computed by the many executors of computeRound().
compound(String) - Method in class net.nutch.analysis.NutchAnalysis
Parse a compound term that is interpreted as an implicit phrase query.
computeDomainID() - Method in class net.nutch.db.Page
Compute domain ID from URL
computeRound(int, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method is invoked by one of the many processes involved in LinkAnalysis.
contains(Object) - Method in class net.nutch.util.FibonacciHeap
Returns true if item exists in this FibonacciHeap, false otherwise.
containsKey(Object) - Method in class net.nutch.util.SoftHashMap
Returns true if this map contains a mapping for the specified key.
containsValue(Object) - Method in class net.nutch.util.SoftHashMap
Not implemented. Note that the finalizer may invalidate the result an implementation would return.
contentReader - Variable in class net.nutch.segment.SegmentReader
 
contentWriter - Variable in class net.nutch.segment.SegmentWriter
 
controlSymbol(String, int) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
controlWord(String, int, int) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
coord(int, int) - Method in class net.nutch.indexer.NutchSimilarity
 
copy(String, String) - Method in class net.nutch.fs.TestClient
Copy an NDFS file
copyContents(NutchFileSystem, File, File, boolean) - Static method in class net.nutch.fs.FileUtil
Copy a file's contents to a new location.
copyFromLocalFile(File, File) - Method in class net.nutch.fs.LocalFileSystem
Similar to moveFromLocalFile(), except the source is kept intact.
copyFromLocalFile(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Keep the src when finished.
copyFromLocalFile(File, File) - Method in class net.nutch.fs.NutchFileSystem
The src file is on the local disk.
copyToLocalFile(File, File) - Method in class net.nutch.fs.LocalFileSystem
We can't delete the src file in this case.
copyToLocalFile(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Takes a hierarchy of files from the NFS system and writes to the given local target.
copyToLocalFile(File, File) - Method in class net.nutch.fs.NutchFileSystem
The src file is under NFS2, and the dst is on the local disk.
create(File) - Method in class net.nutch.fs.LocalFileSystem
Create the file at f.
create(File, boolean) - Method in class net.nutch.fs.LocalFileSystem
 
create(File) - Method in class net.nutch.fs.NDFSFileSystem
Create the file at f.
create(File, boolean) - Method in class net.nutch.fs.NDFSFileSystem
 
create(File) - Method in class net.nutch.fs.NutchFileSystem
Opens an OutputStream at the indicated File, whether local or via NDFS.
create(File, boolean) - Method in class net.nutch.fs.NutchFileSystem
 
create(UTF8) - Method in class net.nutch.ndfs.NDFSClient
Create an output stream that writes to all the right places.
create(UTF8, boolean) - Method in class net.nutch.ndfs.NDFSClient
 
createDB(NutchFileSystem, File, int) - Static method in class net.nutch.db.DistributedWebDBWriter
Method useful for the first time we create a distributed db project.
createEditGroup(NutchFileSystem, File, String, int, int) - Static method in class net.nutch.db.EditSectionGroupWriter
Initialize an EditSectionGroup.
createKey() - Method in interface net.nutch.mapReduce.RecordReader
Constructs a key suitable to pass as the first parameter to RecordReader.next(Writable,Writable).
createNewFile(File) - Method in class net.nutch.fs.NutchFileSystem
Creates the given File as a brand-new zero-length file.
createNewFile() - Method in class net.nutch.ndfs.NDFSFile
 
createNgramProfile(String, InputStream, String) - Static method in class net.nutch.analysis.lang.NGramProfile
Create a new language profile from a (preferably quite large) text file.
createSocketAddr(String) - Static method in class net.nutch.ndfs.NDFS
Util method to build socket addr from string
createTempFile(String, String, File) - Method in class net.nutch.fs.LocalFileSystem
Create a temp file by just calling the Java method
createTempFile(String, String, File) - Method in class net.nutch.fs.NDFSFileSystem
Create a temp working file, on the remote ndfs disk
createTempFile(String, String, File) - Method in class net.nutch.fs.NutchFileSystem
Create an empty File in the given directory (or /tmp, if directory is null) using the given prefix and suffix to guide name generation.
createValue() - Method in interface net.nutch.mapReduce.RecordReader
Constructs a value suitable to pass as the second parameter to RecordReader.next(Writable,Writable).
createWebDB(NutchFileSystem, File) - Static method in class net.nutch.db.WebDBWriter
Create the WebDB for the first time.
curChar - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
curChar - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
curTime - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
currentToken - Variable in class net.nutch.quality.dynamic.ParseException
This is the last token that has been consumed successfully.

D

DATANODE_STARTUP_PERIOD - Static variable in interface net.nutch.ndfs.FSConstants
 
DATA_FILE_NAME - Static variable in class net.nutch.io.MapFile
The name of the data file.
DBKeyDivision - class net.nutch.db.DBKeyDivision.
DBKeyDivision exists for other DB classes to figure out how to find the right distributed-DB section.
DBKeyDivision() - Constructor for class net.nutch.db.DBKeyDivision
 
DBSectionReader - class net.nutch.db.DBSectionReader.
DBSectionReader reads a discrete portion of a WebDB.
DBSectionReader(NutchFileSystem, File, WritableComparator) - Constructor for class net.nutch.db.DBSectionReader
Right now we assume we're getting a File that is a MapFile.Reader directory.
DEFAULT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DEFAULT - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
DEFAULT_FIELD - Static variable in class net.nutch.searcher.Query.Clause
 
DELIMITER_SEARCHTERM - Static variable in class net.nutch.ontology.OntologyImpl
 
DIGIT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DIR_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
DIR_NAME - Static variable in class net.nutch.pagedb.FetchListEntry
 
DIR_NAME - Static variable in class net.nutch.parse.ParseData
 
DIR_NAME - Static variable in class net.nutch.parse.ParseText
 
DIR_NAME - Static variable in class net.nutch.protocol.Content
 
DIR_NAME_NP - Static variable in class net.nutch.fetcher.FetcherOutput
 
DOMContentUtils - class net.nutch.parse.html.DOMContentUtils.
A collection of methods for extracting content from DOM trees.
DOMContentUtils() - Constructor for class net.nutch.parse.html.DOMContentUtils
 
DOMContentUtils.LinkParams - class net.nutch.parse.html.DOMContentUtils.LinkParams.
 
DOMContentUtils.LinkParams(String, String, int) - Constructor for class net.nutch.parse.html.DOMContentUtils.LinkParams
 
DONE_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexMerger
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexOptimizer
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexSegment
 
DOT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DataInputBuffer - class net.nutch.io.DataInputBuffer.
A reusable DataInput implementation that reads from an in-memory buffer.
DataInputBuffer() - Constructor for class net.nutch.io.DataInputBuffer
Constructs a new empty buffer.
DataOutputBuffer - class net.nutch.io.DataOutputBuffer.
A reusable DataOutput implementation that writes to an in-memory buffer.
DataOutputBuffer() - Constructor for class net.nutch.io.DataOutputBuffer
Constructs a new empty buffer.
DatanodeInfo - class net.nutch.ndfs.DatanodeInfo.
DatanodeInfo tracks stats on a given node
DatanodeInfo() - Constructor for class net.nutch.ndfs.DatanodeInfo
 
DatanodeInfo(UTF8) - Constructor for class net.nutch.ndfs.DatanodeInfo
 
DatanodeInfo(UTF8, long, long) - Constructor for class net.nutch.ndfs.DatanodeInfo
 
DefaultMapper - class net.nutch.mapReduce.DefaultMapper.
The default Mapper.
DefaultMapper() - Constructor for class net.nutch.mapReduce.DefaultMapper
 
DefaultPartitioner - class net.nutch.mapReduce.DefaultPartitioner.
The default Partitioner.
DefaultPartitioner() - Constructor for class net.nutch.mapReduce.DefaultPartitioner
 
DefaultReducer - class net.nutch.mapReduce.DefaultReducer.
The default Reducer.
DefaultReducer() - Constructor for class net.nutch.mapReduce.DefaultReducer
 
DeleteDuplicates - class net.nutch.indexer.DeleteDuplicates.
Deletes duplicate documents in a set of Lucene indexes.
DeleteDuplicates(IndexReader[], File) - Constructor for class net.nutch.indexer.DeleteDuplicates
Constructs a duplicate detector for the provided indexes.
DeleteDuplicates.IndexedDoc - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.
The key used in sorting for duplicates.
DeleteDuplicates.IndexedDoc() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
DeleteDuplicates.IndexedDoc.ByHashDoc - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc.
Order equal hashes by decreasing index and document.
DeleteDuplicates.IndexedDoc.ByHashDoc() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
DeleteDuplicates.IndexedDoc.ByHashScore - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore.
Order equal hashes by decreasing score and increasing urlLen.
DeleteDuplicates.IndexedDoc.ByHashScore() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
DistributedAnalysisTool - class net.nutch.tools.DistributedAnalysisTool.
DistributedAnalysisTool performs link-analysis by reading exclusively from an IWebDBReader, and writing to an IWebDBWriter.
DistributedAnalysisTool(NutchFileSystem, File) - Constructor for class net.nutch.tools.DistributedAnalysisTool
Give the pagedb and linkdb files and their cache sizes
DistributedSearch - class net.nutch.searcher.DistributedSearch.
Implements the search API over IPC connections.
DistributedSearch.Client - class net.nutch.searcher.DistributedSearch.Client.
The search client.
DistributedSearch.Client(File) - Constructor for class net.nutch.searcher.DistributedSearch.Client
Construct a client talking to servers listed in the named file.
DistributedSearch.Client(InetSocketAddress[]) - Constructor for class net.nutch.searcher.DistributedSearch.Client
Construct a client talking to the named servers.
DistributedSearch.Param - class net.nutch.searcher.DistributedSearch.Param.
The parameter passed with IPC requests.
DistributedSearch.Param() - Constructor for class net.nutch.searcher.DistributedSearch.Param
 
DistributedSearch.Result - class net.nutch.searcher.DistributedSearch.Result.
The parameter returned with IPC responses.
DistributedSearch.Result() - Constructor for class net.nutch.searcher.DistributedSearch.Result
 
DistributedSearch.Server - class net.nutch.searcher.DistributedSearch.Server.
The search server.
DistributedSearch.Server(File, int) - Constructor for class net.nutch.searcher.DistributedSearch.Server
Construct a search server on the index and segments in the named directory, listening on the named port.
DistributedWebDBReader - class net.nutch.db.DistributedWebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
DistributedWebDBReader(NutchFileSystem, File) - Constructor for class net.nutch.db.DistributedWebDBReader
Open a web db reader for the named directory.
DistributedWebDBWriter - class net.nutch.db.DistributedWebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
DistributedWebDBWriter(NutchFileSystem, File, int) - Constructor for class net.nutch.db.DistributedWebDBWriter
Open the db files.
DistributedWebDBWriter.LinkInstruction - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.
Holds an instruction over a Link.
DistributedWebDBWriter.LinkInstruction() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction(Link, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction.MD5Comparator - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator.
Sorts the instruction first by Md5, then by opcode.
DistributedWebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
DistributedWebDBWriter.LinkInstruction.UrlComparator - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
DistributedWebDBWriter.LinkInstruction.UrlComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
DistributedWebDBWriter.LinkInstructionWriter - class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.LinkInstructionWriter() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
 
DistributedWebDBWriter.PageInstruction - class net.nutch.db.DistributedWebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
DistributedWebDBWriter.PageInstruction() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, Link, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction.PageComparator - class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator.
Sorts the instruction first by Page, then by opcode.
DistributedWebDBWriter.PageInstruction.PageComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
 
DistributedWebDBWriter.PageInstruction.UrlComparator - class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
DistributedWebDBWriter.PageInstruction.UrlComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
 
DistributedWebDBWriter.PageInstructionWriter - class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.PageInstructionWriter() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
 
Done() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
datanodeReport() - Method in class net.nutch.ndfs.FSNamesystem
 
datanodeReport() - Method in class net.nutch.ndfs.NDFSClient
 
debugStream - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
debugStream - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
decreaseKey(Object, int) - Method in class net.nutch.util.FibonacciHeap
Decreases the priority value associated with item.
delete() - Method in class net.nutch.db.EditSectionGroupReader
Get rid of the edits encapsulated by this file.
delete(File) - Method in class net.nutch.fs.LocalFileSystem
Get rid of File f, whether a true file or dir.
delete(File) - Method in class net.nutch.fs.NDFSFileSystem
Get rid of File f, whether a true file or dir.
delete(File) - Method in class net.nutch.fs.NutchFileSystem
Deletes File
delete(String) - Method in class net.nutch.fs.TestClient
Delete an NDFS file
delete(NutchFileSystem, String) - Static method in class net.nutch.io.MapFile
Deletes the named map file.
delete(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Remove the file from management, return blocks
delete(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Remove the indicated filename from the namespace.
delete(UTF8) - Method in class net.nutch.ndfs.NDFSClient
Make a direct connection to namenode and manipulate structures there.
delete() - Method in class net.nutch.ndfs.NDFSFile
 
deleteContentDuplicates() - Method in class net.nutch.indexer.DeleteDuplicates
Delete pages with duplicate content hashes.
deleteLink(MD5Hash) - Method in class net.nutch.db.WebDBWriter
Remove links with the given MD5 from the db.
deleteOnExit() - Method in class net.nutch.ndfs.NDFSFile
 
deletePage(String) - Method in class net.nutch.db.DistributedWebDBWriter
Remove a page from the page database.
deletePage(String) - Method in interface net.nutch.db.IWebDBWriter
deletePage(url) will remove a Page object from the db with the given URL.
deletePage(String) - Method in class net.nutch.db.WebDBWriter
Remove a page from the page database.
deleteUnlicensed() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete pages without CC licenses.
deleteUrlDuplicates() - Method in class net.nutch.indexer.DeleteDuplicates
Delete pages with duplicate URLs.
digest(byte[]) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a byte array.
digest(byte[], int, int) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a byte array.
digest(String) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a String.
digest(UTF8) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a String.
disable_tracing() - Method in class net.nutch.analysis.NutchAnalysis
 
disable_tracing() - Method in class net.nutch.quality.dynamic.PageDescription
 
disconnect() - Method in class net.nutch.protocol.ftp.Client
Closes the connection to the FTP server and restores connection parameters to the default values.
displayByteArray(byte[]) - Static method in class net.nutch.io.WritableUtils
 
du(String) - Method in class net.nutch.fs.TestClient
 
dump(boolean, PrintStream) - Method in class net.nutch.segment.SegmentReader
Dump the segment's content in human-readable format.

E

EDITS_PREFIX - Static variable in class net.nutch.db.EditSectionWriter
 
EOF - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
EOF - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
EQUALS - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
ERROR_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
EXPIRE_INTERVAL - Static variable in interface net.nutch.ndfs.FSConstants
 
EditSectionGroupReader - class net.nutch.db.EditSectionGroupReader.
The EditSectionGroupReader will read in an edits-file that was built in a distributed way.
EditSectionGroupReader(NutchFileSystem, String, int, int) - Constructor for class net.nutch.db.EditSectionGroupReader
Open the EditSectionGroupReader for the appropriate file.
EditSectionGroupWriter - class net.nutch.db.EditSectionGroupWriter.
The EditSectionGroupWriter maintains a set of EditSectionWriter objects.
EditSectionGroupWriter(NutchFileSystem, int, int, String, Class, Class, EditSectionGroupWriter.KeyExtractor) - Constructor for class net.nutch.db.EditSectionGroupWriter
Start an EditSectionGroupWriter at the indicated location, for a single emitter.
EditSectionGroupWriter.KeyExtractor - class net.nutch.db.EditSectionGroupWriter.KeyExtractor.
Edit instructions are Comparable, but they also have an "inner" key like MD5Hash or URL that is also Comparable.
EditSectionGroupWriter.KeyExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.KeyExtractor
 
EditSectionGroupWriter.LinkMD5Extractor - class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor.
Get the MD5 from a LinkInstruction
EditSectionGroupWriter.LinkMD5Extractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
EditSectionGroupWriter.LinkURLExtractor - class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor.
Get the URL from a LinkInstruction
EditSectionGroupWriter.LinkURLExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
EditSectionGroupWriter.PageMD5Extractor - class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor.
Get the MD5 from a PageInstruction
EditSectionGroupWriter.PageMD5Extractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
EditSectionGroupWriter.PageURLExtractor - class net.nutch.db.EditSectionGroupWriter.PageURLExtractor.
Get the URL from a PageInstruction
EditSectionGroupWriter.PageURLExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
EditSectionWriter - class net.nutch.db.EditSectionWriter.
EditSectionWriter writes a discrete portion of a WebDB.
EditSectionWriter(NutchFileSystem, String, int, int, Class, Class) - Constructor for class net.nutch.db.EditSectionWriter
Make an EditSectionWriter for the appropriate file.
Entities - class net.nutch.html.Entities.
 
Entities() - Constructor for class net.nutch.html.Entities
 
ExpandBuff(boolean) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Extension - class net.nutch.plugin.Extension.
An Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint, which acts as a kind of publisher.
Extension(PluginDescriptor, String, String, String) - Constructor for class net.nutch.plugin.Extension
 
ExtensionPoint - class net.nutch.plugin.ExtensionPoint.
The ExtensionPoint provides meta information about an extension point.
ExtensionPoint(String, String, String) - Constructor for class net.nutch.plugin.ExtensionPoint
Constructor
elName - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
element() - Method in class net.nutch.quality.dynamic.PageDescription
 
emitDistribution(PrintStream) - Method in class net.nutch.util.ScoreStats
Print out the distribution, with greater specificity for percentiles 90th - 100th.
emitFetchList(File, long, long) - Method in class net.nutch.tools.FetchListTool
Spit out the fetchlist, to a BDB at the indicated filename.
emitMultipleLists(File, int, long, long) - Method in class net.nutch.tools.FetchListTool
Spit out several fetchlists, so that we can fetch across several machines.
emitTopK(int) - Method in class net.nutch.tools.WebDBAdminTool
Emit the top K-rated Pages.
enable_tracing() - Method in class net.nutch.analysis.NutchAnalysis
 
enable_tracing() - Method in class net.nutch.quality.dynamic.PageDescription
 
encode(String) - Static method in class net.nutch.html.Entities
 
endColumn - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
endDocument() - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
endLine - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
entrySet() - Method in class net.nutch.util.SoftHashMap
Not Implemented
eol - Variable in class net.nutch.quality.dynamic.ParseException
The end of line string for this machine.
equals(Object) - Method in class net.nutch.db.Page
 
equals(Object) - Method in class net.nutch.fetcher.FetcherOutput
 
equals(Object) - Method in class net.nutch.io.BooleanWritable
 
equals(Object) - Method in class net.nutch.io.IntWritable
Returns true iff o is an IntWritable with the same value.
equals(Object) - Method in class net.nutch.io.LongWritable
Returns true iff o is a LongWritable with the same value.
equals(Object) - Method in class net.nutch.io.MD5Hash
Returns true iff o is an MD5Hash whose digest contains the same values.
equals(Object) - Method in class net.nutch.io.UTF8
Returns true iff o is a UTF8 with the same contents.
equals(Object) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
equals(Object) - Method in class net.nutch.pagedb.FetchListEntry
 
equals(Object) - Method in class net.nutch.parse.Outlink
 
equals(Object) - Method in class net.nutch.parse.ParseData
 
equals(Object) - Method in class net.nutch.parse.ParseText
 
equals(Object) - Method in class net.nutch.protocol.Content
 
equals(Object) - Method in class net.nutch.searcher.Hit
 
equals(Object) - Method in class net.nutch.searcher.Query.Clause
 
equals(Object) - Method in class net.nutch.searcher.Query.Phrase
 
equals(Object) - Method in class net.nutch.searcher.Query.Term
 
equals(Object) - Method in class net.nutch.searcher.Query
 
evaluate() - Method in class net.nutch.util.CommandRunner
 
exists(File) - Method in class net.nutch.fs.LocalFileSystem
 
exists(File) - Method in class net.nutch.fs.NDFSFileSystem
 
exists(File) - Method in class net.nutch.fs.NutchFileSystem
Check if exists
exists(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Return whether the given filename exists
exists(UTF8) - Method in class net.nutch.ndfs.NDFSClient
 
expectedTokenSequences - Variable in class net.nutch.quality.dynamic.ParseException
Each entry in this array is an array of integers.
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.KeyExtractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
extractProperties(InputStream) - Method in class net.nutch.parse.msword.WordExtractor
 
extractText(InputStream) - Method in class net.nutch.parse.msword.WordExtractor
Gets the text from a Word document.

F

FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter
The name of the document field we use.
FSConstants - interface net.nutch.ndfs.FSConstants.
Some handy constants
FSDataset - class net.nutch.ndfs.FSDataset.
FSDataset manages a set of data blocks.
FSDataset(File, long) - Constructor for class net.nutch.ndfs.FSDataset
An FSDataset has a directory where it loads its data files.
FSDirectory - class net.nutch.ndfs.FSDirectory.
FSDirectory stores the filesystem directory state.
FSDirectory(File) - Constructor for class net.nutch.ndfs.FSDirectory
Create a FileSystem directory, and load its info from the indicated place.
FSNamesystem - class net.nutch.ndfs.FSNamesystem.
The FSNamesystem tracks several important tables.
FSNamesystem(File) - Constructor for class net.nutch.ndfs.FSNamesystem
dir is where the filesystem directory state is stored
FSParam - class net.nutch.ndfs.FSParam.
IPC param
FSParam() - Constructor for class net.nutch.ndfs.FSParam
 
FSParam(byte) - Constructor for class net.nutch.ndfs.FSParam
 
FSResults - class net.nutch.ndfs.FSResults.
The result of an NDFS IPC call.
FSResults() - Constructor for class net.nutch.ndfs.FSResults
 
FSResults(byte) - Constructor for class net.nutch.ndfs.FSResults
 
FSResults(byte, Writable) - Constructor for class net.nutch.ndfs.FSResults
 
FSResults(byte, Writable, Writable) - Constructor for class net.nutch.ndfs.FSResults
 
FastSavedException - exception net.nutch.parse.msword.FastSavedException.
Title:
FastSavedException(String) - Constructor for class net.nutch.parse.msword.FastSavedException
 
FetchListEntry - class net.nutch.pagedb.FetchListEntry.
 
FetchListEntry() - Constructor for class net.nutch.pagedb.FetchListEntry
 
FetchListEntry(boolean, Page, String[]) - Constructor for class net.nutch.pagedb.FetchListEntry
 
FetchListTool - class net.nutch.tools.FetchListTool.
This class takes an IWebDBReader, computes a relevant subset, and then emits the subset.
FetchListTool(NutchFileSystem, File, boolean, boolean, float, int) - Constructor for class net.nutch.tools.FetchListTool
FetchListTool takes a page db, and emits a RECNO-based subset of it.
FetchListTool.SortableScore - class net.nutch.tools.FetchListTool.SortableScore.
SortableScore is just a WritableComparable Float!
FetchListTool.SortableScore() - Constructor for class net.nutch.tools.FetchListTool.SortableScore
 
FetchedSegments - class net.nutch.searcher.FetchedSegments.
Implements HitSummarizer and HitContent for a set of fetched segments.
FetchedSegments(NutchFileSystem, String) - Constructor for class net.nutch.searcher.FetchedSegments
Construct given a directory containing fetcher output.
Fetcher - class net.nutch.fetcher.Fetcher.
The fetcher.
Fetcher(NutchFileSystem, String, boolean) - Constructor for class net.nutch.fetcher.Fetcher
 
Fetcher.FetcherStatus - class net.nutch.fetcher.Fetcher.FetcherStatus.
 
Fetcher.FetcherStatus(String, long, int, int, long) - Constructor for class net.nutch.fetcher.Fetcher.FetcherStatus
FetcherStatus encapsulates a snapshot of the Fetcher progress status.
FetcherOutput - class net.nutch.fetcher.FetcherOutput.
An entry in the fetcher's output.
FetcherOutput() - Constructor for class net.nutch.fetcher.FetcherOutput
 
FetcherOutput(FetchListEntry, MD5Hash, int) - Constructor for class net.nutch.fetcher.FetcherOutput
 
FibonacciHeap - class net.nutch.util.FibonacciHeap.
A Fibonacci Heap, as described in Introduction to Algorithms by Charles E.
FibonacciHeap() - Constructor for class net.nutch.util.FibonacciHeap
Creates a new FibonacciHeap.
FieldQueryFilter - class net.nutch.searcher.FieldQueryFilter.
Translate query fields to search the same-named field, as indexed by an IndexingFilter.
FieldQueryFilter(String) - Constructor for class net.nutch.searcher.FieldQueryFilter
Construct for the named field.
FieldQueryFilter(String, float) - Constructor for class net.nutch.searcher.FieldQueryFilter
Construct for the named field, boosting as specified.
File - class net.nutch.protocol.file.File.
File.java deals with file: scheme.
File() - Constructor for class net.nutch.protocol.file.File
 
FileError - exception net.nutch.protocol.file.FileError.
Thrown for File error codes.
FileError(int) - Constructor for class net.nutch.protocol.file.FileError
 
FileException - exception net.nutch.protocol.file.FileException.
 
FileException() - Constructor for class net.nutch.protocol.file.FileException
 
FileException(String) - Constructor for class net.nutch.protocol.file.FileException
 
FileException(String, Throwable) - Constructor for class net.nutch.protocol.file.FileException
 
FileException(Throwable) - Constructor for class net.nutch.protocol.file.FileException
 
FileResponse - class net.nutch.protocol.file.FileResponse.
FileResponse.java mimics file replies as HTTP responses.
FileResponse(URL, File) - Constructor for class net.nutch.protocol.file.FileResponse
 
FileResponse(String, URL, File) - Constructor for class net.nutch.protocol.file.FileResponse
 
FileSplit - class net.nutch.mapReduce.FileSplit.
An InputFormat.Split implementation for sections of files.
FileSplit(NutchFileSystem, File, long, long) - Constructor for class net.nutch.mapReduce.FileSplit
Constructs a split.
FileUtil - class net.nutch.fs.FileUtil.
A collection of file-processing util methods
FileUtil() - Constructor for class net.nutch.fs.FileUtil
 
FillBuff() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Ftp - class net.nutch.protocol.ftp.Ftp.
Ftp.java deals with ftp: scheme.
Ftp() - Constructor for class net.nutch.protocol.ftp.Ftp
 
FtpError - exception net.nutch.protocol.ftp.FtpError.
Thrown for Ftp error codes.
FtpError(int) - Constructor for class net.nutch.protocol.ftp.FtpError
 
FtpException - exception net.nutch.protocol.ftp.FtpException.
Superclass for important exceptions thrown during FTP talk, that must be handled with care.
FtpException() - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(String) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(String, Throwable) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(Throwable) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpExceptionBadSystResponse - exception net.nutch.protocol.ftp.FtpExceptionBadSystResponse.
Exception indicating bad reply of SYST command.
FtpExceptionCanNotHaveDataConnection - exception net.nutch.protocol.ftp.FtpExceptionCanNotHaveDataConnection.
Exception indicating failure of opening data connection.
FtpExceptionControlClosedByForcedDataClose - exception net.nutch.protocol.ftp.FtpExceptionControlClosedByForcedDataClose.
Exception indicating control channel is closed by server end, due to forced closure of data channel at client (our) end.
FtpExceptionUnknownForcedDataClose - exception net.nutch.protocol.ftp.FtpExceptionUnknownForcedDataClose.
Exception indicating unrecognizable reply from server after forced closure of data channel by client (our) side.
FtpResponse - class net.nutch.protocol.ftp.FtpResponse.
FtpResponse.java mimics FTP replies as HTTP responses.
FtpResponse(URL, Ftp) - Constructor for class net.nutch.protocol.ftp.FtpResponse
 
FtpResponse(String, URL, Ftp) - Constructor for class net.nutch.protocol.ftp.FtpResponse
 
fetcherReader - Variable in class net.nutch.segment.SegmentReader
 
fetcherWriter - Variable in class net.nutch.segment.SegmentWriter
 
filter(Content, Parse, DocumentFragment) - Method in class net.nutch.analysis.lang.HTMLLanguageParser
Scan the HTML document looking for possible indications of content language.
filter(Document, Parse, FetcherOutput) - Method in class net.nutch.analysis.lang.LanguageIdentifier
 
filter(Document, Parse, FetcherOutput) - Method in interface net.nutch.indexer.IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
filter(Document, Parse, FetcherOutput) - Static method in class net.nutch.indexer.IndexingFilters
Run all defined filters.
filter(Document, Parse, FetcherOutput) - Method in class net.nutch.indexer.basic.BasicIndexingFilter
 
filter(Document, Parse, FetcherOutput) - Method in class net.nutch.indexer.more.MoreIndexingFilter
 
filter(String) - Method in class net.nutch.net.PrefixURLFilter
 
filter(String) - Method in class net.nutch.net.RegexURLFilter
 
filter(String) - Method in interface net.nutch.net.URLFilter
 
filter(Content, Parse, DocumentFragment) - Method in interface net.nutch.parse.HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
filter(Content, Parse, DocumentFragment) - Static method in class net.nutch.parse.HtmlParseFilters
Run all defined filters.
filter(Query, BooleanQuery) - Method in class net.nutch.searcher.FieldQueryFilter
 
filter(Query, BooleanQuery) - Method in interface net.nutch.searcher.QueryFilter
Adds clauses or otherwise modifies the BooleanQuery that will be searched.
filter(Query) - Static method in class net.nutch.searcher.QueryFilters
Run all defined filters.
filter(Query, BooleanQuery) - Method in class net.nutch.searcher.RawFieldQueryFilter
 
filter(Document, Parse, FetcherOutput) - Method in class org.creativecommons.nutch.CCIndexingFilter
 
filter(Content, Parse, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
finalizationOccurring() - Method in interface net.nutch.util.SoftHashMap.FinalizationListener
This method will be called when a SoftHashMap.FinalizationNotifier that this Object is registered with is being finalized.
finalize() - Method in class net.nutch.plugin.Plugin
 
finalize() - Method in class net.nutch.plugin.PluginRepository
 
finalize() - Method in class net.nutch.protocol.ftp.Ftp
 
finalizeBlock(Block) - Method in class net.nutch.ndfs.FSDataset
Complete the block write!
findMD5Section(MD5Hash, int) - Static method in class net.nutch.db.DBKeyDivision
Find the right section index for the given MD5, and the number of sections in the db overall.
findURLSection(String, int) - Static method in class net.nutch.db.DBKeyDivision
Find the right section index for the given URL, and the number of sections in the db overall.
finished - Variable in class net.nutch.segment.SegmentReader
The time when fetching of this segment finished, as recorded in fetcher output data.
first - Variable in class net.nutch.ndfs.FSParam
 
first - Variable in class net.nutch.ndfs.FSResults
 
fix(NutchFileSystem, File, Class, Class, boolean) - Static method in class net.nutch.io.MapFile
This method attempts to fix a corrupt MapFile by re-creating its index.
fixSegment(NutchFileSystem, File, boolean, boolean, boolean, boolean) - Static method in class net.nutch.segment.SegmentReader
Attempt to fix a partially corrupted segment.
format - Static variable in class net.nutch.net.protocols.HttpDateFormat
 
format(LogRecord) - Method in class net.nutch.util.LogFormatter
Format the given LogRecord.
fullyDelete(File) - Static method in class net.nutch.fs.FileUtil
Delete a directory and all its contents.
fullyDelete(NutchFileSystem, File) - Static method in class net.nutch.fs.FileUtil
 

G

GROUP_METAINFO - Static variable in class net.nutch.db.EditSectionGroupWriter
 
GZIPUtils - class net.nutch.util.GZIPUtils.
A collection of utility methods for working on GZIPed data.
GZIPUtils() - Constructor for class net.nutch.util.GZIPUtils
 
GetImage() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
GetSuffix(int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
generateParseException() - Method in class net.nutch.analysis.NutchAnalysis
 
generateParseException() - Method in class net.nutch.quality.dynamic.PageDescription
 
get(long, Writable) - Method in class net.nutch.io.ArrayFile.Reader
Return the nth value in the file.
get() - Method in class net.nutch.io.ArrayWritable
 
get() - Method in class net.nutch.io.BooleanWritable
Returns the value of the BooleanWritable
get() - Method in class net.nutch.io.BytesWritable
 
get() - Method in class net.nutch.io.IntWritable
Return the value of this IntWritable.
get() - Method in class net.nutch.io.LongWritable
Return the value of this LongWritable.
get(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Reader
Return the value for the named key, or null if none exists.
get() - Static method in class net.nutch.io.NullWritable
Returns the single instance of this class.
get(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
Read the matching key from a set into key.
get() - Method in class net.nutch.io.TwoDArrayWritable
 
get(String) - Method in class net.nutch.parse.ParseData
Return the value of a metadata property.
get(String) - Method in class net.nutch.protocol.Content
Return the value of a metadata property.
get(ServletContext) - Static method in class net.nutch.searcher.NutchBean
Cache in servlet context.
get(long, FetcherOutput, Content, ParseText, ParseData) - Method in class net.nutch.segment.SegmentReader
Get a specified entry from the segment.
get(String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property, or null if no such property exists.
get(String, String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property.
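The get(String) and get(String, String) entries above describe name-based property lookup. A minimal sketch of that pattern, built on java.util.Properties rather than on NutchConf itself, might look like the following. The class ConfLookupSketch, the property name example.threads, and the reading of the second argument as a fallback default are all assumptions made for illustration; the index does not state them.

```java
import java.util.Properties;

// Hypothetical stand-in for a configuration class; not the NutchConf implementation.
final class ConfLookupSketch {
    private final Properties props = new Properties();

    void set(String name, String value) {
        props.setProperty(name, value);
    }

    /** Returns the value of the name property, or null if no such property exists. */
    String get(String name) {
        return props.getProperty(name);
    }

    /** Assumed behavior: falls back to defaultValue when the property is missing. */
    String get(String name, String defaultValue) {
        return props.getProperty(name, defaultValue);
    }

    /** Assumed behavior: parses the property as an int, else returns defaultValue. */
    int getInt(String name, int defaultValue) {
        String value = props.getProperty(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        ConfLookupSketch conf = new ConfLookupSketch();
        conf.set("example.threads", "10"); // hypothetical property name
        System.out.println(conf.getInt("example.threads", 1));
        System.out.println(conf.get("example.missing", "fallback"));
    }
}
```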
get(Object) - Method in class net.nutch.util.SoftHashMap
 
getAdditionalBlock(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
The client would like to obtain an additional block for the indicated filename (which is being written-to).
getAnchor() - Method in class net.nutch.parse.Outlink
 
getAnchorText() - Method in class net.nutch.db.Link
 
getAnchors() - Method in class net.nutch.fetcher.FetcherOutput
 
getAnchors() - Method in class net.nutch.pagedb.FetchListEntry
 
getAnchors(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getAnchors(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getAnchors(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the anchors of a hit document.
getAnchors(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getAttribute(String) - Method in class net.nutch.plugin.Extension
Returns an attribute value that is set up in the manifest file and defined by the extension point XML schema.
getBase(Node) - Static method in class net.nutch.parse.html.DOMContentUtils
If Node contains a BASE tag then its HREF is returned.
getBaseHref() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the baseHref, if set, or null otherwise.
getBaseUrl() - Method in class net.nutch.protocol.Content
The base url for relative links contained in the content.
getBeginColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getBeginLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getBlockData(Block) - Method in class net.nutch.ndfs.FSDataset
Get a stream of data from the indicated block.
getBlockId() - Method in class net.nutch.ndfs.Block
 
getBlockIterator() - Method in class net.nutch.ndfs.DatanodeInfo
 
getBlockName() - Method in class net.nutch.ndfs.Block
 
getBlockReport() - Method in class net.nutch.ndfs.FSDataset
Return a table of block data
getBlocks() - Method in class net.nutch.ndfs.DatanodeInfo
 
getBoolean(String, boolean) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a boolean.
getByteCount() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getBytes() - Method in class net.nutch.io.UTF8
The raw bytes.
getBytes(String) - Static method in class net.nutch.io.UTF8
Convert a string to a UTF-8 encoded byte array.
getCapacity() - Method in class net.nutch.ndfs.DatanodeInfo
 
getCapacity() - Method in class net.nutch.ndfs.FSDataset
Return total capacity, used and unused
getCapacity() - Method in class net.nutch.ndfs.HeartbeatData
 
getClass(String) - Static method in class net.nutch.io.WritableName
Return the class for a name.
getClassLoader() - Method in class net.nutch.plugin.PluginDescriptor
Returns a cached classloader for a plugin.
getClauses() - Method in class net.nutch.searcher.Query
Return all clauses.
getClazz() - Method in class net.nutch.plugin.Extension
Returns the full class name of the extension point implementation
getClient() - Method in class net.nutch.fs.NDFSFileSystem
 
getCode() - Method in interface net.nutch.net.protocols.Response
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.file.FileError
 
getCode() - Method in class net.nutch.protocol.file.FileResponse
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.ftp.FtpError
 
getCode() - Method in class net.nutch.protocol.ftp.FtpResponse
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.http.HttpError
 
getCode() - Method in class net.nutch.protocol.http.HttpResponse
Returns the response code.
getColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getCommand() - Method in class net.nutch.util.CommandRunner
 
getComponentCapabilities() - Method in class net.nutch.clustering.carrot2.LocalNutchInputComponent
Returns the capabilities provided by this component.
getCompressedContent() - Method in interface net.nutch.net.protocols.Response
Returns the compressed version of the content if the server transmitted a compressed version, or null otherwise.
getConfResourceAsInputStream(String) - Static method in class net.nutch.util.NutchConf
Returns an input stream attached to the configuration resource with the given name.
getConfResourceAsReader(String) - Static method in class net.nutch.util.NutchConf
Returns a reader attached to the configuration resource with the given name.
getContent() - Method in interface net.nutch.net.protocols.Response
Returns the full content of the response.
getContent() - Method in class net.nutch.protocol.Content
The binary content retrieved.
getContent(String) - Method in interface net.nutch.protocol.Protocol
Returns the Content for a url.
getContent(String) - Method in class net.nutch.protocol.file.File
 
getContent() - Method in class net.nutch.protocol.file.FileResponse
 
getContent(String) - Method in class net.nutch.protocol.ftp.Ftp
 
getContent() - Method in class net.nutch.protocol.ftp.FtpResponse
 
getContent(String) - Method in class net.nutch.protocol.http.Http
 
getContent() - Method in class net.nutch.protocol.http.HttpResponse
 
getContent(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getContent(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getContent(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the content of a hit document.
getContent(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getContentType() - Method in class net.nutch.parse.ParserNotFound
 
getContentType() - Method in class net.nutch.protocol.Content
The media type of the retrieved content.
getContentsLen() - Method in class net.nutch.ndfs.NDFSFileInfo
 
getContentsLength() - Method in class net.nutch.ndfs.NDFSFile
And add a few extras
getCurTime() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getData() - Method in class net.nutch.io.DataOutputBuffer
Returns the current contents of the buffer.
getData() - Method in interface net.nutch.parse.Parse
Other data extracted from the page.
getData() - Method in class net.nutch.parse.ParseImpl
 
getData() - Method in class net.nutch.parse.mp3.MetadataCollector
 
getDependencies() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of plugin ids.
getDescriptionLabels() - Method in interface net.nutch.clustering.HitsCluster
 
getDescriptionLabels() - Method in class net.nutch.clustering.carrot2.HitsClusterAdapter
 
getDescriptor() - Method in class net.nutch.plugin.Plugin
Returns the plugin descriptor
getDestroyOnTimeout() - Method in class net.nutch.util.CommandRunner
 
getDetails(Hit) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit[]) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit) - Method in interface net.nutch.searcher.HitDetailer
Returns the details for a hit document.
getDetails(Hit[]) - Method in interface net.nutch.searcher.HitDetailer
Returns the details for a set of hits.
getDetails(Hit) - Method in class net.nutch.searcher.IndexSearcher
 
getDetails(Hit[]) - Method in class net.nutch.searcher.IndexSearcher
 
getDetails(Hit) - Method in class net.nutch.searcher.NutchBean
 
getDetails(Hit[]) - Method in class net.nutch.searcher.NutchBean
 
getDigest() - Method in class net.nutch.io.MD5Hash
Returns the digest bytes.
getDiscriptor() - Method in class net.nutch.plugin.Extension
Returns the plugin descriptor.
getDomainID() - Method in class net.nutch.db.Link
 
getElapsedTime() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getEndColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getEndLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getErrorCount() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getExitValue() - Method in class net.nutch.util.CommandRunner
 
getExpireTime() - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Get expire time
getExplanation(Query, Hit) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getExplanation(Query, Hit) - Method in class net.nutch.searcher.IndexSearcher
 
getExplanation(Query, Hit) - Method in class net.nutch.searcher.NutchBean
 
getExplanation(Query, Hit) - Method in interface net.nutch.searcher.Searcher
Return an HTML-formatted explanation of how a query scored.
getExportedLibUrls() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of exported libraries as URLs.
getExtensionInstance() - Method in class net.nutch.plugin.Extension
Return an instance of the extension implementation.
getExtensionPoint(String) - Method in class net.nutch.plugin.PluginRepository
Returns an extension point identified by an extension point id.
getExtensions() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of extensions.
getExtenstionPoints() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of extension points.
getExtentens() - Method in class net.nutch.plugin.ExtensionPoint
Returns an array of extensions that listen to this extension point.
getFactor() - Method in class net.nutch.io.SequenceFile.Sorter
Get the number of streams to merge at once.
getFetch() - Method in class net.nutch.pagedb.FetchListEntry
 
getFetchDate() - Method in class net.nutch.fetcher.FetcherOutput
 
getFetchDate(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getFetchDate(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getFetchDate(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the fetch date of a hit document.
getFetchDate(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getFetchInterval() - Method in class net.nutch.db.Page
 
getFetchListEntry() - Method in class net.nutch.fetcher.FetcherOutput
 
getField(int) - Method in class net.nutch.searcher.HitDetails
Returns the name of the ith field.
getField() - Method in class net.nutch.searcher.Query.Clause
 
getFile() - Method in class net.nutch.mapReduce.FileSplit
The file containing this split's data.
getFile(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Get the blocks associated with the file
getFileSystem() - Method in class net.nutch.mapReduce.FileSplit
The file system containing this split's data.
getFilter(TokenStream, String) - Static method in class net.nutch.analysis.CommonGrams
Construct a token filter that inserts n-grams for common terms.
getFilter() - Static method in class net.nutch.net.URLFilterFactory
Return the default URLFilter implementation.
getFloat() - Method in class net.nutch.tools.FetchListTool.SortableScore
 
getFloat(String, float) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a float.
getFragments() - Method in class net.nutch.searcher.Summary
Returns an array of all of this summary's fragments.
getFromID() - Method in class net.nutch.db.Link
 
getHeader(String) - Method in interface net.nutch.net.protocols.Response
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.file.FileResponse
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.ftp.FtpResponse
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.http.HttpResponse
Returns the value of a named header.
getHit(int) - Method in class net.nutch.searcher.Hits
Returns the ith hit in this list.
getHits() - Method in interface net.nutch.clustering.HitsCluster
 
getHits() - Method in class net.nutch.clustering.carrot2.HitsClusterAdapter
 
getHits(int, int) - Method in class net.nutch.searcher.Hits
Returns a subset of the hit objects.
getID3v2Parse(MP3File) - Method in class net.nutch.parse.mp3.MP3Parser
 
getId() - Method in class net.nutch.clustering.carrot2.NutchDocument
 
getId() - Method in class net.nutch.plugin.Extension
Return the unique id of the extension.
getId() - Method in class net.nutch.plugin.ExtensionPoint
Returns the unique id of the extension point.
getIndexDocNo() - Method in class net.nutch.searcher.Hit
Return the document number of this hit within an index.
getIndexInterval() - Method in class net.nutch.io.MapFile.Writer
The number of entries that are added before an index entry is added.
getIndexNo() - Method in class net.nutch.searcher.Hit
Return the index number that this hit came from.
getInputs() - Method in class net.nutch.quality.dynamic.PageDescription
 
getInstance() - Static method in class net.nutch.analysis.lang.LanguageIdentifier
Returns a handle to the singleton instance.
getInstance() - Static method in class net.nutch.ontology.OntologyImpl
 
getInstance() - Static method in class net.nutch.plugin.PluginRepository
Returns the singleton instance of the PluginRepository.
getInstruction() - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getInstruction() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getInstruction() - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
getInstruction() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getInt(String, int) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as an integer.
getInterprets() - Method in class net.nutch.quality.dynamic.PageDescription
 
getKeyClass() - Method in class net.nutch.io.MapFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.SequenceFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.SequenceFile.Writer
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.WritableComparator
Returns the WritableComparable implementation class.
getLen() - Method in class net.nutch.ndfs.NDFSFileInfo
 
getLength(File) - Method in class net.nutch.fs.LocalFileSystem
 
getLength(File) - Method in class net.nutch.fs.NDFSFileSystem
 
getLength(File) - Method in class net.nutch.fs.NutchFileSystem
 
getLength() - Method in class net.nutch.io.DataOutputBuffer
Returns the length of the valid data currently in the buffer.
getLength() - Method in class net.nutch.io.SequenceFile.Writer
Returns the current length of the output file.
getLength() - Method in class net.nutch.io.UTF8
The number of bytes in the encoded string.
getLength() - Method in class net.nutch.mapReduce.FileSplit
The number of bytes in the file to process.
getLength(Block) - Method in class net.nutch.ndfs.FSDataset
Find the block's on-disk length
getLength() - Method in class net.nutch.searcher.HitDetails
Returns the number of fields contained in this.
getLength() - Method in class net.nutch.searcher.Hits
Returns the number of hits included in this current listing.
getLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getLink() - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getLink() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getLink() - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
getLink() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getLinks(UTF8) - Method in class net.nutch.db.DBSectionReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Grab all the links from the given MD5 hash.
getLinks(UTF8) - Method in class net.nutch.db.DistributedWebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Grab all the links from the given MD5 hash.
getLinks(UTF8) - Method in interface net.nutch.db.IWebDBReader
Return any Link objects that point to the given URL.
getLinks(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Return all the Link objects that originate from a document with the given MD5 checksum.
getLinks(UTF8) - Method in class net.nutch.db.WebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.WebDBReader
Grab all the links from the given MD5 hash.
getListing(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Get a listing of files given path 'src'. This function is admittedly very inefficient right now.
getListing(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Get a listing of all files at 'src'.
getLogStream(Logger, Level) - Static method in class net.nutch.util.LogFormatter
Returns a stream that, when written to, adds log lines.
getLogger(String) - Static method in class net.nutch.util.LogFormatter
Gets a logger and, as a side effect, installs this as the default formatter.
getLong(String, long) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a long.
getMD5() - Method in class net.nutch.db.Page
 
getMD5Hash() - Method in class net.nutch.fetcher.FetcherOutput
 
getMemory() - Method in class net.nutch.io.SequenceFile.Sorter
Get the total amount of buffer memory, in bytes.
getMessage() - Method in class net.nutch.quality.dynamic.ParseException
This method has the standard behavior when this object has been created using the standard constructors.
getMessage() - Method in class net.nutch.quality.dynamic.TokenMgrError
You can also modify the body of this method to customize your error messages.
getMetaData() - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
getMetadata() - Method in class net.nutch.parse.ParseData
Other page properties.
getMetadata() - Method in class net.nutch.protocol.Content
Other protocol-specific data.
getModel() - Static method in class net.nutch.ontology.OntologyImpl
 
getName() - Method in class net.nutch.analysis.lang.NGramProfile
 
getName() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getName(Class) - Static method in class net.nutch.io.WritableName
Return the name for a class.
getName() - Method in class net.nutch.ndfs.DatanodeInfo
 
getName() - Method in class net.nutch.ndfs.HeartbeatData
 
getName() - Method in class net.nutch.ndfs.NDFSFileInfo
 
getName() - Method in class net.nutch.plugin.ExtensionPoint
Returns the name of the extension point.
getName() - Method in class net.nutch.plugin.PluginDescriptor
Returns the name of the plugin.
getNewSegmentName() - Static method in class net.nutch.segment.SegmentWriter
Create a new segment name
getNewUrl() - Method in class net.nutch.protocol.ResourceMoved
 
getNextFetchTime() - Method in class net.nutch.db.Page
 
getNextScore() - Method in class net.nutch.db.Page
 
getNextToken() - Method in class net.nutch.analysis.NutchAnalysis
 
getNextToken() - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
getNextToken() - Method in class net.nutch.quality.dynamic.PageDescription
 
getNextToken() - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
getNoCache() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noCache.
getNoFollow() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noFollow.
getNoIndex() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noIndex.
getNormalizer() - Static method in class net.nutch.net.UrlNormalizerFactory
Return the default UrlNormalizer implementation.
getNotExportedLibUrls() - Method in class net.nutch.plugin.PluginDescriptor
Returns a array of libraries as URLs that are not exported by the plugin.
getNumBytes() - Method in class net.nutch.ndfs.Block
 
getNumContinues() - Method in interface net.nutch.net.protocols.Response
Returns the number of 100/Continue headers encountered
getNumOutlinks() - Method in class net.nutch.db.Page
 
getOldUrl() - Method in class net.nutch.protocol.ResourceMoved
 
getOnlineClusterer() - Static method in class net.nutch.clustering.OnlineClustererFactory
 
getOntology() - Static method in class net.nutch.ontology.OntologyFactory
 
getOutlinks() - Method in class net.nutch.parse.ParseData
The outlinks of the page.
getOutlinks(URL, ArrayList, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getOutlinks() - Method in class net.nutch.parse.mp3.MetadataCollector
 
getPage(UTF8, Page) - Method in class net.nutch.db.DBSectionReader
Fetch a Page with the given URL, and fill it into the pre-allocated Page 'p'.
getPage(String) - Method in class net.nutch.db.DistributedWebDBReader
Get Page from the pagedb with the given URL.
getPage() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getPage(String) - Method in interface net.nutch.db.IWebDBReader
Return a Page object with the given URL, if any.
getPage(String) - Method in class net.nutch.db.WebDBReader
Get Page from the pagedb with the given URL
getPage() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getPage() - Method in class net.nutch.pagedb.FetchListEntry
 
getPageCount() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getPages(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Get Pages from the db according to their content hash.
getPages(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Get all the Pages according to their content hash.
getPages(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Return any Pages with the given MD5 checksum.
getPages(MD5Hash) - Method in class net.nutch.db.WebDBReader
Get Pages from the pagedb according to their content hash.
getParent() - Method in class net.nutch.ndfs.NDFSFileInfo
 
getParse(Content) - Method in interface net.nutch.parse.Parser
Creates the parse for some content.
getParse(Content) - Method in class net.nutch.parse.html.HtmlParser
 
getParse(Content) - Method in class net.nutch.parse.mp3.MP3Parser
 
getParse(Content) - Method in class net.nutch.parse.msword.MSWordParser
 
getParse(Content) - Method in class net.nutch.parse.pdf.PdfParser
 
getParse(Content) - Method in class net.nutch.parse.rtf.RTFParseFactory
 
getParse(Content) - Method in class net.nutch.parse.text.TextParser
 
getParseData(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getParseData(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getParseData(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the ParseData of a hit document.
getParseData(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getParseText(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getParseText(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getParseText(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the ParseText of a hit document.
getParseText(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getParser() - Static method in class net.nutch.ontology.OntologyImpl
 
getParser(String, String) - Static method in class net.nutch.parse.ParserFactory
Returns the appropriate Parser implementation given a content type and url.
getPartition(WritableComparable, int) - Method in class net.nutch.mapReduce.DefaultPartitioner
Use Object.hashCode() to partition.
getPartition(WritableComparable, int) - Method in interface net.nutch.mapReduce.Partitioner
Returns the partition number for a given key given the total number of partitions.
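The getPartition entries above describe mapping a key to a partition number from its Object.hashCode(). One common way to do that is sketched below. HashPartitionSketch and partitionFor are hypothetical names, and masking the sign bit is an assumption about how negative hash codes could be handled; the index does not say how DefaultPartitioner does it.

```java
// Sketch only: HashPartitionSketch is a hypothetical class, not part of Nutch.
final class HashPartitionSketch {
    /** Maps a key to a partition in [0, numPartitions) using its hashCode(). */
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Clear the sign bit so negative hash codes still yield a valid index.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"http://a.example/", "http://b.example/", "http://c.example/"};
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Equal keys always land in the same partition because hashCode() is stable for equal objects, which is the property a partitioner needs.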
getPath() - Method in class net.nutch.ndfs.NDFSFileInfo
 
getPhrase() - Method in class net.nutch.searcher.Query.Clause
 
getPluginClass() - Method in class net.nutch.plugin.PluginDescriptor
Returns the fully qualified name of the class which implements the abstract Plugin class.
getPluginDescriptor(String) - Method in class net.nutch.plugin.PluginRepository
Returns the descriptor of one plugin identified by a plugin id.
getPluginDescriptors() - Method in class net.nutch.plugin.PluginRepository
Returns all registered plugin descriptors.
getPluginId() - Method in class net.nutch.plugin.PluginDescriptor
Returns the unique identifier of the plug-in or null.
getPluginInstance(PluginDescriptor) - Method in class net.nutch.plugin.PluginRepository
Returns an instance of a plugin.
getPluginPath() - Method in class net.nutch.plugin.PluginDescriptor
Returns the directory path of the plugin.
getPos() - Method in class net.nutch.fs.NFSDataInputStream
 
getPos() - Method in class net.nutch.fs.NFSDataOutputStream
 
getPos() - Method in class net.nutch.fs.NFSInputStream
Return the current offset from the start of the file
getPos() - Method in class net.nutch.fs.NFSOutputStream
Return the current offset from the start of the file
getPosition() - Method in class net.nutch.io.DataInputBuffer
Returns the current position in the input.
getPosition() - Method in class net.nutch.io.SequenceFile.Reader
Return the current byte position in the input file.
getProtocol(String) - Static method in class net.nutch.protocol.ProtocolFactory
Returns the appropriate Protocol implementation for a url.
getRecordReader(InputFormat.Split) - Method in interface net.nutch.mapReduce.InputFormat
Construct a RecordReader for an InputFormat.Split.
getRecordReader(InputFormat.Split) - Method in class net.nutch.mapReduce.TextInputFormat
 
getRecordWriter(File) - Method in interface net.nutch.mapReduce.OutputFormat
Construct a RecordWriter.
getRemaining() - Method in class net.nutch.ndfs.DatanodeInfo
 
getRemaining() - Method in class net.nutch.ndfs.FSDataset
Return how many bytes can still be stored in the FSDataset
getRemaining() - Method in class net.nutch.ndfs.HeartbeatData
 
getRequiredSuccessorCapabilities() - Method in class net.nutch.clustering.carrot2.LocalNutchInputComponent
Returns the capabilities required from the successor component.
getResourceString(String, Locale) - Method in class net.nutch.plugin.PluginDescriptor
Returns an internationalizable resource string.
getRetriesSinceFetch() - Method in class net.nutch.db.Page
 
getRobotsMetaDirectives(RobotsMetaProcessor.RobotsMetaIndicator, Node, URL) - Static method in class net.nutch.parse.html.RobotsMetaProcessor
Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getSchema() - Method in class net.nutch.plugin.ExtensionPoint
Returns a path to the XML schema of an extension point.
getScore() - Method in class net.nutch.db.Page
 
getScore() - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
getScore() - Method in class net.nutch.searcher.Hit
Return the degree to which this document matched the query.
getSegmentNames() - Method in class net.nutch.searcher.DistributedSearch.Client
Return the names of segments searched.
getSegmentNames() - Method in class net.nutch.searcher.FetchedSegments
 
getSegmentNames() - Method in class net.nutch.searcher.NutchBean
 
getSimilarity(NGramProfile) - Method in class net.nutch.analysis.lang.NGramProfile
Calculate a score of how well NGramProfiles match each other.
getSite() - Method in class net.nutch.searcher.Hit
Return the name of this document's website.
getSorted() - Method in class net.nutch.analysis.lang.NGramProfile
Return sorted vector of ngrams (sort done by 1.
getSplits(NutchFileSystem, File[], int) - Method in interface net.nutch.mapReduce.InputFormat
Splits a set of input files.
getSplits(NutchFileSystem, File[], int) - Method in class net.nutch.mapReduce.TextInputFormat
 
getStart() - Method in class net.nutch.mapReduce.FileSplit
The position of the first byte in the file to process.
getStartTime() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
getStatus() - Method in class net.nutch.fetcher.Fetcher
 
getStatus() - Method in class net.nutch.fetcher.FetcherOutput
 
getStatus() - Method in class net.nutch.tools.SegmentMergeTool
 
getStrings(String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as an array of strings.
getSubclusters() - Method in interface net.nutch.clustering.HitsCluster
 
getSubclusters() - Method in class net.nutch.clustering.carrot2.HitsClusterAdapter
 
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.FetchedSegments
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.FetchedSegments
 
getSummary(HitDetails, Query) - Method in interface net.nutch.searcher.HitSummarizer
Returns a summary for the given hit details.
getSummary(HitDetails[], Query) - Method in interface net.nutch.searcher.HitSummarizer
Returns summaries for a set of details.
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.NutchBean
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.NutchBean
 
getSummary(String, Query) - Method in class net.nutch.searcher.Summarizer
Returns a summary for the given pre-tokenized text.
getSystemName() - Method in class net.nutch.protocol.ftp.Client
Fetches the system type name from the server and returns the string.
getTargetPoint() - Method in class net.nutch.plugin.Extension
Returns the Id of the extension point, that is implemented by this extension.
getTerm() - Method in class net.nutch.searcher.Query.Clause
 
getTerms() - Method in class net.nutch.searcher.Query.Phrase
 
getTerms() - Method in class net.nutch.searcher.Query
Flattens a query into the set of text terms that it contains.
getText() - Method in interface net.nutch.parse.Parse
The textual content of the page.
getText() - Method in class net.nutch.parse.ParseImpl
 
getText() - Method in class net.nutch.parse.ParseText
 
getText(StringBuffer, Node, boolean) - Static method in class net.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuffer.
getText(StringBuffer, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This is a convenience method, equivalent to getText(sb, node, false).
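The getText entries above describe appending all text found beneath a DOM node to a StringBuffer. A rough sketch of that traversal with the standard org.w3c.dom API follows. DomTextSketch, appendText and extract are hypothetical names, the whitespace handling is an assumption, and the boolean third parameter of the Nutch method is not modeled because the index does not describe it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch only: not the DOMContentUtils implementation.
final class DomTextSketch {
    /** Appends the trimmed text of every text node beneath node, space-separated. */
    static void appendText(StringBuffer sb, Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(text);
            }
        }
        // Depth-first walk over the children in document order.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            appendText(sb, child);
        }
    }

    /** Helper for the example: parses well-formed markup and returns its text. */
    static String extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuffer sb = new StringBuffer();
            appendText(sb, doc.getDocumentElement());
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse example markup", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("<html><body><p>Hello <b>Nutch</b></p><p>index</p></body></html>"));
    }
}
```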
getText() - Method in class net.nutch.parse.mp3.MetadataCollector
 
getText() - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
getText() - Method in class net.nutch.searcher.Summary.Fragment
Returns the text of this fragment.
getTextRuns() - Method in class net.nutch.parse.msword.chp.Word6CHPBinTable
 
getThrownError() - Method in class net.nutch.util.CommandRunner
 
getTimeout() - Method in class net.nutch.util.CommandRunner
 
getTitle() - Method in class net.nutch.parse.ParseData
The title of the page.
getTitle(StringBuffer, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
getTitle() - Method in class net.nutch.parse.mp3.MetadataCollector
 
getToUrl() - Method in class net.nutch.parse.Outlink
 
getToken(int) - Method in class net.nutch.analysis.NutchAnalysis
 
getToken(int) - Method in class net.nutch.quality.dynamic.PageDescription
 
getTotal() - Method in class net.nutch.searcher.Hits
Returns the total number of hits for this query.
getURL() - Method in class net.nutch.db.Link
 
getURL() - Method in class net.nutch.db.Page
 
getUrl() - Method in class net.nutch.fetcher.FetcherOutput
 
getUrl() - Method in interface net.nutch.net.protocols.Response
Returns the URL used to retrieve this response.
getUrl() - Method in class net.nutch.pagedb.FetchListEntry
 
getUrl() - Method in class net.nutch.parse.ParserNotFound
 
getUrl() - Method in class net.nutch.protocol.Content
The URL fetched.
getUrl() - Method in class net.nutch.protocol.ProtocolNotFound
 
getUrl() - Method in class net.nutch.protocol.ResourceGone
 
getUrl() - Method in class net.nutch.protocol.RetryLater
 
getValue(int) - Method in class net.nutch.searcher.HitDetails
Returns the value of the ith field.
getValue(String) - Method in class net.nutch.searcher.HitDetails
Returns the value of the first field with the specified name.
getValueClass() - Method in class net.nutch.io.MapFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class net.nutch.io.SequenceFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class net.nutch.io.SequenceFile.Writer
Returns the class of values in this file.
getValues() - Method in class net.nutch.quality.dynamic.PageDescription
 
getVersion() - Method in class net.nutch.fetcher.FetcherOutput
 
getVersion() - Method in class net.nutch.io.VersionedWritable
Return the version number of the current implementation.
getVersion() - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
getVersion() - Method in class net.nutch.parse.ParseData
 
getVersion() - Method in class net.nutch.parse.ParseText
 
getVersion() - Method in class net.nutch.protocol.Content
 
getWaitForExit() - Method in class net.nutch.util.CommandRunner
 
getWeight() - Method in class net.nutch.searcher.Query.Clause
 
gotHeartbeat(UTF8, long, long) - Method in class net.nutch.ndfs.FSNamesystem
The given node has reported in.

H

HEARTBEAT_INTERVAL - Static variable in interface net.nutch.ndfs.FSConstants
 
HTMLLanguageParser - class net.nutch.analysis.lang.HTMLLanguageParser.
Adds metadata identifying the language of the document, if found. We could also run statistical analysis here, but we'd miss all other formats.
HTMLLanguageParser() - Constructor for class net.nutch.analysis.lang.HTMLLanguageParser
 
HeartbeatData - class net.nutch.ndfs.HeartbeatData.
Heartbeat data
HeartbeatData() - Constructor for class net.nutch.ndfs.HeartbeatData
 
HeartbeatData(String, long, long) - Constructor for class net.nutch.ndfs.HeartbeatData
 
HighFreqTerms - class net.nutch.indexer.HighFreqTerms.
Lists the most frequent terms in an index.
HighFreqTerms() - Constructor for class net.nutch.indexer.HighFreqTerms
 
Hit - class net.nutch.searcher.Hit.
A document which matched a query in an index.
Hit() - Constructor for class net.nutch.searcher.Hit
 
Hit(int, int, float, String) - Constructor for class net.nutch.searcher.Hit
 
Hit(int, float, String) - Constructor for class net.nutch.searcher.Hit
 
HitContent - interface net.nutch.searcher.HitContent.
Service that returns the content of a hit.
HitDetailer - interface net.nutch.searcher.HitDetailer.
Service that returns details of a hit within an index.
HitDetails - class net.nutch.searcher.HitDetails.
Data stored in the index for a hit.
HitDetails() - Constructor for class net.nutch.searcher.HitDetails
 
HitDetails(String[], String[]) - Constructor for class net.nutch.searcher.HitDetails
Construct from field names and values arrays.
HitDetails(String, String) - Constructor for class net.nutch.searcher.HitDetails
Construct minimal details from a segment name and document number.
HitSummarizer - interface net.nutch.searcher.HitSummarizer.
Service that builds a summary for a hit on a query.
Hits - class net.nutch.searcher.Hits.
A set of hits matching a query.
Hits() - Constructor for class net.nutch.searcher.Hits
 
Hits(long, Hit[]) - Constructor for class net.nutch.searcher.Hits
 
HitsCluster - interface net.nutch.clustering.HitsCluster.
An interface representing a group of hits.
HitsClusterAdapter - class net.nutch.clustering.carrot2.HitsClusterAdapter.
An adapter of Carrot2's RawCluster interface to HitsCluster interface.
HitsClusterAdapter(RawCluster, HitDetails[]) - Constructor for class net.nutch.clustering.carrot2.HitsClusterAdapter
Creates a new adapter.
HtmlParseFilter - interface net.nutch.parse.HtmlParseFilter.
Extension point for DOM-based HTML parsers.
HtmlParseFilters - class net.nutch.parse.HtmlParseFilters.
Creates and caches HtmlParseFilter implementing plugins.
HtmlParser - class net.nutch.parse.html.HtmlParser.
 
HtmlParser() - Constructor for class net.nutch.parse.html.HtmlParser
 
Http - class net.nutch.protocol.http.Http.
An implementation of the HTTP protocol.
Http() - Constructor for class net.nutch.protocol.http.Http
 
HttpDateFormat - class net.nutch.net.protocols.HttpDateFormat.
Class to handle HTTP dates.
HttpDateFormat() - Constructor for class net.nutch.net.protocols.HttpDateFormat
 
HttpError - exception net.nutch.protocol.http.HttpError.
Thrown for HTTP error codes.
HttpError(int) - Constructor for class net.nutch.protocol.http.HttpError
 
HttpException - exception net.nutch.protocol.http.HttpException.
 
HttpException() - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(String) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(String, Throwable) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(Throwable) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpResponse - class net.nutch.protocol.http.HttpResponse.
An HTTP response.
HttpResponse(URL) - Constructor for class net.nutch.protocol.http.HttpResponse
 
HttpResponse(String, URL) - Constructor for class net.nutch.protocol.http.HttpResponse
 
halfDigest() - Method in class net.nutch.io.MD5Hash
Construct a half-sized version of this MD5.
hasLoggedSevere() - Static method in class net.nutch.util.LogFormatter
Returns true if this LogFormatter has logged something at Level.SEVERE.
hashCode() - Method in class net.nutch.db.Page
 
hashCode() - Method in class net.nutch.io.IntWritable
 
hashCode() - Method in class net.nutch.io.LongWritable
 
hashCode() - Method in class net.nutch.io.MD5Hash
Returns a hash code value for this object.
hashCode() - Method in class net.nutch.searcher.Hit
 
hashCode() - Method in class net.nutch.searcher.Query.Clause
 
hashCode() - Method in class net.nutch.searcher.Query.Phrase
 
hashCode() - Method in class net.nutch.searcher.Query.Term
 
hashCode() - Method in class net.nutch.searcher.Query
 

I

IGNORE_INTERNAL_LINKS - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
INDEX_FILE_NAME - Static variable in class net.nutch.io.MapFile
The name of the index file.
INDEX_MERGE_FACTOR - Static variable in class net.nutch.tools.SegmentMergeTool
 
INDEX_MIN_MERGE_DOCS - Static variable in class net.nutch.tools.SegmentMergeTool
 
INDEX_SIZE - Static variable in class net.nutch.tools.SegmentMergeTool
Temporary de-dup index size.
INTER_ANCHOR_GAP - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
The number of unused term positions between anchors in the anchor field.
IRREGULAR_WORD - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
IWebDBReader - interface net.nutch.db.IWebDBReader.
IWebDBReader is an interface to the consolidated page/link database.
IWebDBWriter - interface net.nutch.db.IWebDBWriter.
IWebDBWriter is an interface to the consolidated page/link database.
IndexMerger - class net.nutch.indexer.IndexMerger.
IndexMerger creates an index for the output corresponding to a single fetcher run.
IndexMerger(NutchFileSystem, File[], File, File) - Constructor for class net.nutch.indexer.IndexMerger
Merge all of the segments given
IndexOptimizer - class net.nutch.indexer.IndexOptimizer.
 
IndexOptimizer(File) - Constructor for class net.nutch.indexer.IndexOptimizer
 
IndexSearcher - class net.nutch.searcher.IndexSearcher.
Implements Searcher and HitDetailer for either a single merged index, or for a set of individual segment indexes.
IndexSearcher(File[]) - Constructor for class net.nutch.searcher.IndexSearcher
Construct given a number of indexed segments.
IndexSearcher(String) - Constructor for class net.nutch.searcher.IndexSearcher
Construct given a directory containing fetched segments, and a separate directory naming their merged index.
IndexSegment - class net.nutch.indexer.IndexSegment.
Creates an index for the output corresponding to a single fetcher run.
IndexSegment(NutchFileSystem, long, File, File) - Constructor for class net.nutch.indexer.IndexSegment
Index a segment in the given NFS.
IndexingException - exception net.nutch.indexer.IndexingException.
 
IndexingException() - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(String) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(String, Throwable) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(Throwable) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingFilter - interface net.nutch.indexer.IndexingFilter.
Extension point for indexing.
IndexingFilters - class net.nutch.indexer.IndexingFilters.
Creates and caches IndexingFilter implementing plugins.
InputFormat - interface net.nutch.mapReduce.InputFormat.
An input data format.
InputFormat.Split - interface net.nutch.mapReduce.InputFormat.Split.
A section of an input file.
IntWritable - class net.nutch.io.IntWritable.
A WritableComparable for ints.
IntWritable() - Constructor for class net.nutch.io.IntWritable
 
IntWritable(int) - Constructor for class net.nutch.io.IntWritable
 
IntWritable.Comparator - class net.nutch.io.IntWritable.Comparator.
A Comparator optimized for IntWritable.
IntWritable.Comparator() - Constructor for class net.nutch.io.IntWritable.Comparator
 
identify(String) - Method in class net.nutch.analysis.lang.LanguageIdentifier
Identify language based on submitted content
identify(StringBuffer) - Method in class net.nutch.analysis.lang.LanguageIdentifier
 
identify(InputStream) - Method in class net.nutch.analysis.lang.LanguageIdentifier
Identify language from an input stream
image - Variable in class net.nutch.quality.dynamic.Token
The string image of the token.
inBuf - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
indent(PrintStream, int) - Static method in class net.nutch.ontology.OntologyImpl
 
indexPages() - Method in class net.nutch.indexer.IndexSegment
 
infix() - Method in class net.nutch.analysis.NutchAnalysis
Characters which can be used to form compound terms.
initRound(int, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method prepares the ground for a set of processes to distribute a round of LinkAnalysis work.
injectDmozFile(File, int, boolean, boolean, int, Pattern) - Method in class net.nutch.db.WebDBInjector
Iterate through all the items in this structured DMOZ file.
injectURLFile(File) - Method in class net.nutch.db.WebDBInjector
Iterate through all the items in this flat text file and add them to the db.
inputItem(HashMap) - Method in class net.nutch.quality.dynamic.PageDescription
 
inputSegments - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
inputStream - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
input_stream - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
input_stream - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
invalidate(Block[]) - Method in class net.nutch.ndfs.FSDataset
We're informed that a block is no longer valid.
isAllowed(String) - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Static method in class net.nutch.protocol.http.RobotRulesParser
 
isBlockFilename(File) - Static method in class net.nutch.ndfs.Block
 
isDir(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Check whether it's a directory
isDir(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Whether the given name is a directory
isDir() - Method in class net.nutch.ndfs.NDFSFileInfo
 
isDirectory(File) - Method in class net.nutch.fs.LocalFileSystem
 
isDirectory(File) - Method in class net.nutch.fs.NDFSFileSystem
 
isDirectory(File) - Method in class net.nutch.fs.NutchFileSystem
 
isDirectory(UTF8) - Method in class net.nutch.ndfs.NDFSClient
 
isDirectory() - Method in class net.nutch.ndfs.NDFSFile
We need to reimplement some of them
isEllipsis() - Method in class net.nutch.searcher.Summary.Ellipsis
Returns true.
isEllipsis() - Method in class net.nutch.searcher.Summary.Fragment
Returns true iff this fragment is an ellipsis.
isEmpty() - Method in class net.nutch.util.SoftHashMap
 
isField(String) - Static method in class net.nutch.searcher.QueryFilters
 
isFile(File) - Method in class net.nutch.fs.NutchFileSystem
 
isFile() - Method in class net.nutch.ndfs.NDFSFile
 
isHidden() - Method in class net.nutch.ndfs.NDFSFile
 
isHighlight() - Method in class net.nutch.searcher.Summary.Fragment
Returns true iff this fragment is to be highlighted.
isHighlight() - Method in class net.nutch.searcher.Summary.Highlight
Returns true.
isJunkCluster() - Method in interface net.nutch.clustering.HitsCluster
Returns true if this cluster contains documents that did not fit anywhere else (presentation layer may discard such clusters).
isJunkCluster() - Method in class net.nutch.clustering.carrot2.HitsClusterAdapter
 
isParsed - Variable in class net.nutch.segment.SegmentReader
 
isParsedSegment(NutchFileSystem, File) - Static method in class net.nutch.segment.SegmentReader
 
isPhrase() - Method in class net.nutch.searcher.Query.Clause
 
isProhibited() - Method in class net.nutch.searcher.Query.Clause
 
isPrunable(Query, IndexReader, int) - Method in class net.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
isPrunable(Query, IndexReader, int) - Method in interface net.nutch.tools.PruneIndexTool.PruneChecker
Check whether this document should be pruned.
isPrunable(Query, IndexReader, int) - Method in class net.nutch.tools.PruneIndexTool.StoreUrlsChecker
 
isRawField(String) - Static method in class net.nutch.searcher.QueryFilters
 
isRemoteVerificationEnabled() - Method in class net.nutch.protocol.ftp.Client
Return whether or not verification of the remote host participating in data connections is enabled.
isRequired() - Method in class net.nutch.searcher.Query.Clause
 
isStopWord(String) - Static method in class net.nutch.analysis.NutchAnalysis
True iff word is a stop word.
isValidBlock(Block) - Method in class net.nutch.ndfs.FSDataset
Check whether the given block is a valid one.
isValidBlock(Block) - Method in class net.nutch.ndfs.FSDirectory
Returns whether the given block is one pointed to by a file.
isValidToCreate(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Check whether the filepath could be created
iterate(int, File) - Method in class net.nutch.tools.LinkAnalysisTool
Do a single-process iteration over the database.

J

jjFillToken() - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
jjFillToken() - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
jj_nt - Variable in class net.nutch.analysis.NutchAnalysis
 
jj_nt - Variable in class net.nutch.quality.dynamic.PageDescription
 
jjnewLexState - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
jjstrLiteralImages - Static variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
jjstrLiteralImages - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
join() - Method in class net.nutch.ipc.Server
Wait for the server to be stopped.

K

KEYWORD - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
key() - Method in class net.nutch.io.ArrayFile.Reader
Returns the key associated with the most recent call to ArrayFile.Reader.seek(long), ArrayFile.Reader.next(Writable), or ArrayFile.Reader.get(long,Writable).
key() - Method in class net.nutch.segment.SegmentReader
Return the current key position.
keySet() - Method in class net.nutch.util.SoftHashMap
 
kind - Variable in class net.nutch.quality.dynamic.Token
An integer that describes the kind of this token.

L

LEASE_PERIOD - Static variable in interface net.nutch.ndfs.FSConstants
 
LETTER - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
LOG - Static variable in class net.nutch.analysis.lang.HTMLLanguageParser
 
LOG - Static variable in class net.nutch.analysis.lang.LanguageIdentifier
 
LOG - Static variable in class net.nutch.analysis.lang.NGramProfile
 
LOG - Static variable in class net.nutch.clustering.OnlineClustererFactory
 
LOG - Static variable in class net.nutch.db.WebDBInjector
 
LOG - Static variable in class net.nutch.fetcher.Fetcher
 
LOG - Static variable in class net.nutch.fs.NutchFileSystem
 
LOG - Static variable in class net.nutch.indexer.IndexMerger
 
LOG - Static variable in class net.nutch.indexer.IndexSegment
 
LOG - Static variable in class net.nutch.indexer.basic.BasicIndexingFilter
 
LOG - Static variable in class net.nutch.indexer.more.MoreIndexingFilter
 
LOG - Static variable in class net.nutch.io.SequenceFile
 
LOG - Static variable in class net.nutch.ipc.Client
 
LOG - Static variable in class net.nutch.ipc.Server
 
LOG - Static variable in class net.nutch.ndfs.FSNamesystem
 
LOG - Static variable in class net.nutch.ndfs.NDFS
 
LOG - Static variable in class net.nutch.ndfs.NDFSClient
 
LOG - Static variable in class net.nutch.net.BasicUrlNormalizer
 
LOG - Static variable in class net.nutch.ontology.OntologyFactory
 
LOG - Static variable in class net.nutch.ontology.OntologyImpl
 
LOG - Static variable in class net.nutch.parse.ParserChecker
 
LOG - Static variable in class net.nutch.parse.ParserFactory
 
LOG - Static variable in class net.nutch.parse.html.HtmlParser
 
LOG - Static variable in class net.nutch.parse.pdf.PdfParser
 
LOG - Static variable in class net.nutch.plugin.PluginDescriptor
 
LOG - Static variable in class net.nutch.plugin.PluginManifestParser
 
LOG - Static variable in class net.nutch.plugin.PluginRepository
 
LOG - Static variable in class net.nutch.protocol.ProtocolFactory
 
LOG - Static variable in class net.nutch.protocol.file.File
 
LOG - Static variable in class net.nutch.protocol.ftp.Ftp
 
LOG - Static variable in class net.nutch.protocol.http.Http
 
LOG - Static variable in class net.nutch.protocol.http.RobotRulesParser
 
LOG - Static variable in class net.nutch.searcher.DistributedSearch
 
LOG - Static variable in class net.nutch.searcher.NutchBean
 
LOG - Static variable in class net.nutch.searcher.Query
 
LOG - Static variable in class net.nutch.segment.SegmentReader
 
LOG - Static variable in class net.nutch.segment.SegmentSlicer
 
LOG - Static variable in class net.nutch.segment.SegmentWriter
 
LOG - Static variable in class net.nutch.tools.CrawlTool
 
LOG - Static variable in class net.nutch.tools.DistributedAnalysisTool
 
LOG - Static variable in class net.nutch.tools.FetchListTool
 
LOG - Static variable in class net.nutch.tools.ParseSegment
 
LOG - Static variable in class net.nutch.tools.PruneIndexTool
 
LOG - Static variable in class net.nutch.tools.SegmentMergeTool
 
LOG - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
LOG - Static variable in class net.nutch.tools.WebDBAdminTool
 
LOG - Static variable in class org.creativecommons.nutch.CCIndexingFilter
 
LOG - Static variable in class org.creativecommons.nutch.CCParseFilter
 
LOG_STEP - Static variable in class net.nutch.indexer.IndexSegment
 
LOG_STEP - Static variable in class net.nutch.segment.SegmentSlicer
 
LOG_STEP - Static variable in class net.nutch.tools.PruneIndexTool
Log the progress every LOG_STEP number of processed documents.
LOG_STEP - Static variable in class net.nutch.tools.SegmentMergeTool
Log progress update every LOG_STEP items.
LanguageIdentifier - class net.nutch.analysis.lang.LanguageIdentifier.
 
LanguageIdentifier() - Constructor for class net.nutch.analysis.lang.LanguageIdentifier
 
LanguageQueryFilter - class net.nutch.analysis.lang.LanguageQueryFilter.
Handles "lang:" query clauses, causing them to search the "lang" field indexed by LanguageIdentifier.
LanguageQueryFilter() - Constructor for class net.nutch.analysis.lang.LanguageQueryFilter
 
LexicalError(boolean, int, int, int, String, char) - Static method in class net.nutch.quality.dynamic.TokenMgrError
Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error.
Link - class net.nutch.db.Link.
This is the field in the Link Database.
Link() - Constructor for class net.nutch.db.Link
Create the Link with no data
Link(MD5Hash, long, String, String) - Constructor for class net.nutch.db.Link
Create the record
Link.MD5Comparator - class net.nutch.db.Link.MD5Comparator.
MD5Comparator is the opposite.
Link.MD5Comparator() - Constructor for class net.nutch.db.Link.MD5Comparator
 
Link.UrlComparator - class net.nutch.db.Link.UrlComparator.
URLComparator uses the standard method where, uh, the URL comes first.
Link.UrlComparator() - Constructor for class net.nutch.db.Link.UrlComparator
 
LinkAnalysisEntry - class net.nutch.linkdb.LinkAnalysisEntry.
An entry in the LinkAnalysisTool's output.
LinkAnalysisEntry() - Constructor for class net.nutch.linkdb.LinkAnalysisEntry
 
LinkAnalysisTool - class net.nutch.tools.LinkAnalysisTool.
LinkAnalysisTool performs link-analysis by using the DistributedAnalysisTool.
LinkAnalysisTool(NutchFileSystem, File) - Constructor for class net.nutch.tools.LinkAnalysisTool
We need a DistributedAnalysisTool in order to get things done!
LocalFileSystem - class net.nutch.fs.LocalFileSystem.
Implement the NutchFileSystem interface for the local disk.
LocalFileSystem() - Constructor for class net.nutch.fs.LocalFileSystem
 
LocalNutchInputComponent - class net.nutch.clustering.carrot2.LocalNutchInputComponent.
A local input component that ignores the query passed from the controller and instead looks for data stored in the request context.
LocalNutchInputComponent() - Constructor for class net.nutch.clustering.carrot2.LocalNutchInputComponent
 
LogFormatter - class net.nutch.util.LogFormatter.
Prints just the date and the log message.
LogFormatter() - Constructor for class net.nutch.util.LogFormatter
 
LongWritable - class net.nutch.io.LongWritable.
A WritableComparable for longs.
LongWritable() - Constructor for class net.nutch.io.LongWritable
 
LongWritable(long) - Constructor for class net.nutch.io.LongWritable
 
LongWritable.Comparator - class net.nutch.io.LongWritable.Comparator.
A Comparator optimized for LongWritable.
LongWritable.Comparator() - Constructor for class net.nutch.io.LongWritable.Comparator
 
lastObsoleteCheck() - Method in class net.nutch.ndfs.DatanodeInfo
 
lastUpdate() - Method in class net.nutch.ndfs.DatanodeInfo
 
leftPad(String, int) - Static method in class net.nutch.util.StringUtil
Returns a copy of s padded with leading spaces so that its length is length.
length() - Method in class net.nutch.ndfs.NDFSFile
 
lengthNorm(String, int) - Method in class net.nutch.indexer.NutchSimilarity
Normalize field by length.
lexStateNames - Static variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
lexStateNames - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
line - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
linkParams - Static variable in class net.nutch.parse.html.DOMContentUtils
 
links() - Method in class net.nutch.db.DBSectionReader
Return all the links, by target URL
links() - Method in class net.nutch.db.DistributedWebDBReader
Return all the links, by target URL
links() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Link objects, sorted by target URL.
links() - Method in class net.nutch.db.WebDBReader
Return all the links, by target URL
listFiles(File) - Method in class net.nutch.fs.LocalFileSystem
 
listFiles(File) - Method in class net.nutch.fs.NDFSFileSystem
 
listFiles(File) - Method in class net.nutch.fs.NutchFileSystem
 
listFiles(File, FileFilter) - Method in class net.nutch.fs.NutchFileSystem
 
listFiles(UTF8) - Method in class net.nutch.ndfs.NDFSClient
 
load(InputStream) - Method in class net.nutch.analysis.lang.NGramProfile
Loads an ngram profile from InputStream (assumes UTF-8 encoded content)
load(String[]) - Method in interface net.nutch.ontology.Ontology
 
load(String[]) - Method in class net.nutch.ontology.OntologyImpl
 
lock(File, boolean) - Method in class net.nutch.fs.LocalFileSystem
Obtain a filesystem lock at File f.
lock(File, boolean) - Method in class net.nutch.fs.NDFSFileSystem
Obtain a filesystem lock at File f.
lock(File, boolean) - Method in class net.nutch.fs.NutchFileSystem
Obtain a lock on the given File
lock(UTF8, boolean) - Method in class net.nutch.ndfs.NDFSClient
 
login(String, String) - Method in class net.nutch.protocol.ftp.Client
Login to the FTP server using the provided username and password.
logout() - Method in class net.nutch.protocol.ftp.Client
Logout of the FTP server by sending the QUIT command.
longestMatch(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns the longest prefix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns the longest suffix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class net.nutch.util.TrieStringMatcher
Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
lookingAhead - Variable in class net.nutch.analysis.NutchAnalysis
 
ls(String) - Method in class net.nutch.fs.TestClient
Get a listing of all files in NDFS at the indicated name

M

MAX_ANCHOR_LENGTH - Static variable in class net.nutch.db.Link
 
MAX_OUTLINKS_PER_PAGE - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
MAX_SECTIONS - Static variable in class net.nutch.db.DBKeyDivision
 
MD5Hash - class net.nutch.io.MD5Hash.
A Writable for MD5 hash values.
MD5Hash() - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash.
MD5Hash(String) - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash from a hex string.
MD5Hash(byte[]) - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash with a specified value.
MD5Hash.Comparator - class net.nutch.io.MD5Hash.Comparator.
A WritableComparator optimized for MD5Hash keys.
MD5Hash.Comparator() - Constructor for class net.nutch.io.MD5Hash.Comparator
 
MD5_KEYSPACE - Static variable in class net.nutch.db.EditSectionGroupWriter
 
MD5_KEYSPACE_DIVIDERS - Static variable in class net.nutch.db.DBKeyDivision
 
MD5_LEN - Static variable in class net.nutch.io.MD5Hash
 
META_LANG_NAME - Static variable in class net.nutch.analysis.lang.HTMLLanguageParser
 
MINUS - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
MP3Parser - class net.nutch.parse.mp3.MP3Parser.
A parser for MP3 audio files
MP3Parser() - Constructor for class net.nutch.parse.mp3.MP3Parser
 
MSWordParser - class net.nutch.parse.msword.MSWordParser.
Parser for MIME type application/msword.
MSWordParser() - Constructor for class net.nutch.parse.msword.MSWordParser
 
MapFile - class net.nutch.io.MapFile.
A file-based map from keys to values.
MapFile() - Constructor for class net.nutch.io.MapFile
 
MapFile.Reader - class net.nutch.io.MapFile.Reader.
Provide access to an existing map.
MapFile.Reader(NutchFileSystem, String) - Constructor for class net.nutch.io.MapFile.Reader
Construct a map reader for the named map.
MapFile.Reader(NutchFileSystem, String, WritableComparator) - Constructor for class net.nutch.io.MapFile.Reader
Construct a map reader for the named map using the named comparator.
MapFile.Writer - class net.nutch.io.MapFile.Writer.
Writes a new map.
MapFile.Writer(NutchFileSystem, String, Class, Class) - Constructor for class net.nutch.io.MapFile.Writer
Create the named map for keys of the named class.
MapFile.Writer(NutchFileSystem, String, WritableComparator, Class) - Constructor for class net.nutch.io.MapFile.Writer
Create the named map using the named key comparator.
MapReduceJob - class net.nutch.mapReduce.MapReduceJob.
Specifies a map/reduce job.
MapReduceJob(File, String, File, String) - Constructor for class net.nutch.mapReduce.MapReduceJob
Constructs a map/reduce job.
Mapper - interface net.nutch.mapReduce.Mapper.
Maps input key/value pairs to a set of intermediate key/value pairs.
MetadataCollector - class net.nutch.parse.mp3.MetadataCollector.
This class allows meta data to be collected and manipulated
MetadataCollector() - Constructor for class net.nutch.parse.mp3.MetadataCollector
 
MoreIndexingFilter - class net.nutch.indexer.more.MoreIndexingFilter.
Add (or reset) a few metaData properties as respective fields (if they are available), so that they can be displayed by more.jsp (called by search.jsp).
MoreIndexingFilter() - Constructor for class net.nutch.indexer.more.MoreIndexingFilter
 
main(String[]) - Static method in class net.nutch.analysis.CommonGrams
For debugging.
main(String[]) - Static method in class net.nutch.analysis.NutchAnalysis
For debugging.
main(String[]) - Static method in class net.nutch.analysis.NutchDocumentTokenizer
For debugging.
main(String[]) - Static method in class net.nutch.analysis.lang.LanguageIdentifier
main method used for testing
main(String[]) - Static method in class net.nutch.analysis.lang.NGramProfile
main method used for testing only
main(String[]) - Static method in class net.nutch.db.DistributedWebDBReader
The DistributedWebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class net.nutch.db.DistributedWebDBWriter
The WebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class net.nutch.db.WebDBInjector
Command-line access.
main(String[]) - Static method in class net.nutch.db.WebDBReader
The WebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class net.nutch.db.WebDBWriter
The WebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class net.nutch.fetcher.Fetcher
Run the fetcher.
main(String[]) - Static method in class net.nutch.fetcher.FetcherOutput
 
main(String[]) - Static method in class net.nutch.fs.TestClient
main() has some simple utility methods
main(String[]) - Static method in class net.nutch.indexer.DeleteDuplicates
Delete duplicates in the indexes in the named directory.
main(String[]) - Static method in class net.nutch.indexer.HighFreqTerms
 
main(String[]) - Static method in class net.nutch.indexer.IndexMerger
Create an index for the input files in the named directory.
main(String[]) - Static method in class net.nutch.indexer.IndexOptimizer
 
main(String[]) - Static method in class net.nutch.indexer.IndexSegment
Create an index for the input files in the named directory.
main(String[]) - Static method in class net.nutch.io.MapFile
 
main(String[]) - Static method in class net.nutch.ndfs.NDFS.DataNode
 
main(String[]) - Static method in class net.nutch.ndfs.NDFS.NameNode
 
main(String[]) - Static method in class net.nutch.net.PrefixURLFilter
 
main(String[]) - Static method in class net.nutch.net.RegexURLFilter
 
main(String[]) - Static method in class net.nutch.net.RegexUrlNormalizer
Spits out patterns and substitutions that are in the configuration file.
main(String[]) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
main(String[]) - Static method in class net.nutch.ontology.OntologyImpl
 
main(String[]) - Static method in class net.nutch.pagedb.FetchListEntry
 
main(String[]) - Static method in class net.nutch.parse.ParseData
 
main(String[]) - Static method in class net.nutch.parse.ParseText
 
main(String[]) - Static method in class net.nutch.parse.ParserChecker
 
main(String[]) - Static method in class net.nutch.protocol.Content
 
main(String[]) - Static method in class net.nutch.protocol.file.File
For debugging.
main(String[]) - Static method in class net.nutch.protocol.ftp.Ftp
For debugging.
main(String[]) - Static method in class net.nutch.protocol.http.Http
For debugging.
main(String[]) - Static method in class net.nutch.protocol.http.RobotRulesParser
command-line main for testing
main(String[]) - Static method in class net.nutch.quality.dynamic.PageDescription
Test out sherlock parsing
main(String[]) - Static method in class net.nutch.searcher.DistributedSearch.Client
 
main(String[]) - Static method in class net.nutch.searcher.DistributedSearch.Server
Runs a search server.
main(String[]) - Static method in class net.nutch.searcher.NutchBean
For debugging.
main(String[]) - Static method in class net.nutch.searcher.Query
For debugging.
main(String[]) - Static method in class net.nutch.searcher.Summarizer
Tests Summary-generation.
main(String[]) - Static method in class net.nutch.segment.SegmentReader
Command-line wrapper.
main(String[]) - Static method in class net.nutch.segment.SegmentSlicer
Command-line wrapper.
main(String[]) - Static method in class net.nutch.segment.SegmentWriter
 
main(String[]) - Static method in class net.nutch.tools.CrawlTool
 
main(String[]) - Static method in class net.nutch.tools.DistributedAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class net.nutch.tools.FetchListTool
Generate a fetchlist from the pagedb and linkdb
main(String[]) - Static method in class net.nutch.tools.LinkAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class net.nutch.tools.ParseSegment
main method
main(String[]) - Static method in class net.nutch.tools.PruneIndexTool
 
main(String[]) - Static method in class net.nutch.tools.SegmentMergeTool
 
main(String[]) - Static method in class net.nutch.tools.UpdateDatabaseTool
Create the UpdateDatabaseTool, and pass in a WebDBWriter.
main(String[]) - Static method in class net.nutch.tools.WebDBAdminTool
This tool performs a number of generic db management tasks.
main(String[]) - Static method in class net.nutch.util.CommandRunner
 
main(String[]) - Static method in class net.nutch.util.NutchConf
For debugging.
main(String[]) - Static method in class net.nutch.util.PrefixStringMatcher
 
main(String[]) - Static method in class net.nutch.util.ScoreStats
 
main(String[]) - Static method in class net.nutch.util.StringUtil
 
main(String[]) - Static method in class net.nutch.util.SuffixStringMatcher
 
main(String[]) - Static method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete duplicates in the indexes in the named directory.
map(WritableComparable, Writable, OutputCollector) - Method in class net.nutch.mapReduce.DefaultMapper
The identity function.
map(WritableComparable, Writable, OutputCollector) - Method in interface net.nutch.mapReduce.Mapper
Maps a single input key/value pair into intermediate key/value pairs.
matchChar(TrieStringMatcher.TrieNode, String, int) - Method in class net.nutch.util.TrieStringMatcher
Returns the next TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
matchItem(HashMap) - Method in class net.nutch.quality.dynamic.PageDescription
 
matches(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns true if the given String is matched by a prefix in the trie
matches(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns true if the given String is matched by a suffix in the trie
matches(String) - Method in class net.nutch.util.TrieStringMatcher
Returns true if the given String is matched by a pattern in the trie
maxNextCharInd - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
md5Compare(Object) - Method in class net.nutch.db.Link
Compare MD5s, then compare URLs.
merge(String[], String) - Method in class net.nutch.io.SequenceFile.Sorter
Merge the provided files.
mergeSectionComponents(File) - Method in class net.nutch.db.EditSectionGroupReader
Merge all the components of the Section into a single file and return the location.
mkdir(String) - Method in class net.nutch.fs.TestClient
Create the given dir
mkdirs(File) - Method in class net.nutch.fs.LocalFileSystem
 
mkdirs(File) - Method in class net.nutch.fs.NDFSFileSystem
 
mkdirs(File) - Method in class net.nutch.fs.NutchFileSystem
Make the given file and all non-existent parents into directories.
mkdirs(UTF8) - Method in class net.nutch.ndfs.FSDirectory
Create the given directory and all its parent dirs.
mkdirs(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Create all the necessary directories
mkdirs(UTF8) - Method in class net.nutch.ndfs.NDFSClient
 
moreFromSiteExcluded() - Method in class net.nutch.searcher.Hit
True iff other, lower-scoring, hits from the same site have been excluded from the list which contains this hit.
moveFromLocalFile(File, File) - Method in class net.nutch.fs.LocalFileSystem
In the case of the local filesystem, we can just rename the file.
moveFromLocalFile(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Remove the src when finished.
moveFromLocalFile(File, File) - Method in class net.nutch.fs.NutchFileSystem
The src file is on the local disk.

N

NDFS - class net.nutch.ndfs.NDFS.
The NDFS class holds the NDFS client and server.
NDFS.DataNode - class net.nutch.ndfs.NDFS.DataNode.
DataNode controls just one critical table: block -> BLOCK_SIZE stream of bytes. This info is stored on disk (the NameNode is responsible for asking other machines to replicate the data).
NDFS.DataNode(String, File, InetSocketAddress) - Constructor for class net.nutch.ndfs.NDFS.DataNode
Needs a directory to find its data (and config info)
NDFS.NameNode - class net.nutch.ndfs.NDFS.NameNode.
NameNode controls two critical tables: 1) filename -> blocksequence,version 2) block -> machinelist. The first table is stored on disk and is very precious.
NDFS.NameNode(File, int) - Constructor for class net.nutch.ndfs.NDFS.NameNode
Create a NameNode at the specified location
NDFSClient - class net.nutch.ndfs.NDFSClient.
NDFSClient does what's necessary to connect to a Nutch Filesystem and perform basic file tasks.
NDFSClient(InetSocketAddress) - Constructor for class net.nutch.ndfs.NDFSClient
 
NDFSFile - class net.nutch.ndfs.NDFSFile.
NDFSFile is a traditional java File that's been annotated with some extra information.
NDFSFile(NDFSFileInfo) - Constructor for class net.nutch.ndfs.NDFSFile
 
NDFSFileInfo - class net.nutch.ndfs.NDFSFileInfo.
NDFSFileInfo tracks info about remote files, including name, size, etc.
NDFSFileInfo() - Constructor for class net.nutch.ndfs.NDFSFileInfo
 
NDFSFileInfo(UTF8, long, long, boolean) - Constructor for class net.nutch.ndfs.NDFSFileInfo
 
NDFSFileSystem - class net.nutch.fs.NDFSFileSystem.
Implement the NutchFileSystem interface for the NDFS system.
NDFSFileSystem(InetSocketAddress) - Constructor for class net.nutch.fs.NDFSFileSystem
Create the ShareSet automatically, and then go on to the regular constructor.
NEW_EXTERNAL_LINK_FACTOR - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
NEW_INTERNAL_LINK_FACTOR - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
NFSDataInputStream - class net.nutch.fs.NFSDataInputStream.
Utility that wraps an NFSInputStream in a DataInputStream and buffers input through a BufferedInputStream.
NFSDataInputStream(NFSInputStream) - Constructor for class net.nutch.fs.NFSDataInputStream
 
NFSDataInputStream(NFSInputStream, int) - Constructor for class net.nutch.fs.NFSDataInputStream
 
NFSDataOutputStream - class net.nutch.fs.NFSDataOutputStream.
Utility that wraps an NFSOutputStream in a DataOutputStream and buffers output through a BufferedOutputStream.
NFSDataOutputStream(NFSOutputStream) - Constructor for class net.nutch.fs.NFSDataOutputStream
 
NFSDataOutputStream(NFSOutputStream, int) - Constructor for class net.nutch.fs.NFSDataOutputStream
 
NFSInputStream - class net.nutch.fs.NFSInputStream.
NFSInputStream is a generic old InputStream with a little bit of RAF-style seek ability.
NFSInputStream() - Constructor for class net.nutch.fs.NFSInputStream
 
NFSOutputStream - class net.nutch.fs.NFSOutputStream.
NFSOutputStream is an OutputStream that can track its position.
NFSOutputStream() - Constructor for class net.nutch.fs.NFSOutputStream
 
NGramProfile - class net.nutch.analysis.lang.NGramProfile.
This class runs an ngram analysis over submitted text; the results might be used for automatic language identification.
NGramProfile(String) - Constructor for class net.nutch.analysis.lang.NGramProfile
Construct a new ngram profile
NGramProfile(String, int, int) - Constructor for class net.nutch.analysis.lang.NGramProfile
Construct a new ngram profile
NOT_FOUND - Static variable in class net.nutch.fetcher.FetcherOutput
 
NUTCH_INPUT_HIT_DETAILS_ARRAY - Static variable in class net.nutch.clustering.carrot2.LocalNutchInputComponent
 
NUTCH_INPUT_SUMMARIES_ARRAY - Static variable in class net.nutch.clustering.carrot2.LocalNutchInputComponent
 
NullWritable - class net.nutch.io.NullWritable.
Singleton Writable with no data.
NutchAnalysis - class net.nutch.analysis.NutchAnalysis.
The JavaCC-generated Nutch lexical analyzer and query parser.
NutchAnalysis(CharStream) - Constructor for class net.nutch.analysis.NutchAnalysis
 
NutchAnalysis(NutchAnalysisTokenManager) - Constructor for class net.nutch.analysis.NutchAnalysis
 
NutchAnalysisConstants - interface net.nutch.analysis.NutchAnalysisConstants.
 
NutchAnalysisTokenManager - class net.nutch.analysis.NutchAnalysisTokenManager.
 
NutchAnalysisTokenManager(Reader) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
Constructs a token manager for the provided Reader.
NutchAnalysisTokenManager(CharStream) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
 
NutchAnalysisTokenManager(CharStream, int) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
 
NutchBean - class net.nutch.searcher.NutchBean.
One stop shopping for search-related functionality.
NutchBean() - Constructor for class net.nutch.searcher.NutchBean
Construct reading from connected directory.
NutchBean(File) - Constructor for class net.nutch.searcher.NutchBean
Construct in a named directory.
NutchConf - class net.nutch.util.NutchConf.
Provides access to Nutch configuration parameters.
NutchConf() - Constructor for class net.nutch.util.NutchConf
 
NutchDocument - class net.nutch.clustering.carrot2.NutchDocument.
An adapter class that implements RawDocument for Carrot2.
NutchDocument(int, HitDetails, String) - Constructor for class net.nutch.clustering.carrot2.NutchDocument
Creates a new document with the given id and summary, wrapping the given hit details.
NutchDocumentAnalyzer - class net.nutch.analysis.NutchDocumentAnalyzer.
The analyzer used for Nutch documents.
NutchDocumentAnalyzer() - Constructor for class net.nutch.analysis.NutchDocumentAnalyzer
 
NutchDocumentTokenizer - class net.nutch.analysis.NutchDocumentTokenizer.
The tokenizer used for Nutch document text.
NutchDocumentTokenizer(Reader) - Constructor for class net.nutch.analysis.NutchDocumentTokenizer
Construct a tokenizer for the text in a Reader.
NutchFileSystem - class net.nutch.fs.NutchFileSystem.
NutchFileSystem is an interface for a fairly simple distributed file system.
NutchFileSystem() - Constructor for class net.nutch.fs.NutchFileSystem
 
NutchSimilarity - class net.nutch.indexer.NutchSimilarity.
Similarity implementation used by Nutch indexing and search.
NutchSimilarity() - Constructor for class net.nutch.indexer.NutchSimilarity
 
net.nutch.analysis - package net.nutch.analysis
Tokenizer for documents and query parser.
net.nutch.analysis.lang - package net.nutch.analysis.lang
Text document language identifier.
net.nutch.clustering - package net.nutch.clustering
 
net.nutch.clustering.carrot2 - package net.nutch.clustering.carrot2
 
net.nutch.db - package net.nutch.db
Web database: tracks page fetches and link structure.
net.nutch.fetcher - package net.nutch.fetcher
The Nutch robot.
net.nutch.fs - package net.nutch.fs
 
net.nutch.html - package net.nutch.html
 
net.nutch.indexer - package net.nutch.indexer
Maintain Lucene full-text indexes.
net.nutch.indexer.basic - package net.nutch.indexer.basic
A basic indexing plugin.
net.nutch.indexer.more - package net.nutch.indexer.more
A more indexing plugin.
net.nutch.io - package net.nutch.io
Generic i/o code for use when reading and writing data to the network, to databases, and to files.
net.nutch.ipc - package net.nutch.ipc
Client/Server code used by distributed search.
net.nutch.linkdb - package net.nutch.linkdb
 
net.nutch.mapReduce - package net.nutch.mapReduce
A system for scalable, fault-tolerant, distributed computation over large data collections.
net.nutch.ndfs - package net.nutch.ndfs
 
net.nutch.net - package net.nutch.net
 
net.nutch.net.protocols - package net.nutch.net.protocols
 
net.nutch.ontology - package net.nutch.ontology
 
net.nutch.pagedb - package net.nutch.pagedb
 
net.nutch.parse - package net.nutch.parse
 
net.nutch.parse.html - package net.nutch.parse.html
An HTML document parsing plugin.
net.nutch.parse.mp3 - package net.nutch.parse.mp3
An MP3 parsing plugin.
net.nutch.parse.msword - package net.nutch.parse.msword
A Word document parsing plugin.
net.nutch.parse.msword.chp - package net.nutch.parse.msword.chp
 
net.nutch.parse.pdf - package net.nutch.parse.pdf
A PDF parsing plugin.
net.nutch.parse.rtf - package net.nutch.parse.rtf
An RTF parsing plugin.
net.nutch.parse.text - package net.nutch.parse.text
A plain text parsing plugin.
net.nutch.plugin - package net.nutch.plugin
 
net.nutch.protocol - package net.nutch.protocol
 
net.nutch.protocol.file - package net.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
net.nutch.protocol.ftp - package net.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the ftp protocol.
net.nutch.protocol.http - package net.nutch.protocol.http
Protocol plugin which supports retrieving documents via the http protocol.
net.nutch.quality.dynamic - package net.nutch.quality.dynamic
 
net.nutch.searcher - package net.nutch.searcher
Search API
net.nutch.segment - package net.nutch.segment
 
net.nutch.tools - package net.nutch.tools
 
net.nutch.util - package net.nutch.util
 
newKey() - Method in class net.nutch.io.WritableComparator
Construct a new WritableComparable instance.
newToken(int) - Static method in class net.nutch.quality.dynamic.Token
Returns a new Token object, by default.
next() - Method in class net.nutch.analysis.NutchDocumentTokenizer
Returns the next token in the stream, or null at EOF.
next(Writable) - Method in class net.nutch.io.ArrayFile.Reader
Read and return the next value in the file.
next(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Reader
Read the next key/value pair in the map into key and val.
next(Writable) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key in the file into key, skipping its value.
next(Writable, Writable) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into key and val.
next(DataOutputBuffer) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into buffer.
next(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
Read the next key in a set into key.
next(Writable, Writable) - Method in interface net.nutch.mapReduce.RecordReader
Reads the next key/value pair.
next(Writable, Writable) - Method in interface net.nutch.mapReduce.RecordWriter
Writes a key/value pair.
next - Variable in class net.nutch.quality.dynamic.Token
A reference to the next regular (non-special) token from the input stream.
next(FetcherOutput, Content, ParseText, ParseData) - Method in class net.nutch.segment.SegmentReader
Read values from all open readers.
nfs - Variable in class net.nutch.segment.SegmentReader
 
nodeChar - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
nonOpInfix() - Method in class net.nutch.analysis.NutchAnalysis
Parse infix characters except plus and minus.
nonOpOrTerm() - Method in class net.nutch.analysis.NutchAnalysis
Parse anything but a term or an operator (plus or minus or quote).
nonTerm() - Method in class net.nutch.analysis.NutchAnalysis
Parse anything but a term or a quote.
normalize() - Method in class net.nutch.analysis.lang.NGramProfile
Normalize profile
normalize(String) - Method in class net.nutch.net.BasicUrlNormalizer
 
normalize(String) - Method in class net.nutch.net.RegexUrlNormalizer
Normalizes any URLs by calling super.basicNormalize() and regexSub().
normalize(String) - Method in interface net.nutch.net.UrlNormalizer
 
notifyProperty(String, String) - Method in class net.nutch.parse.mp3.MetadataCollector
 
numEdits() - Method in class net.nutch.db.EditSectionGroupReader
Return how many edits there are in this section.
numLinks() - Method in class net.nutch.db.DistributedWebDBReader
Return the number of links in our db.
numLinks() - Method in interface net.nutch.db.IWebDBReader
Simple count of all Link objects in db.
numLinks() - Method in class net.nutch.db.WebDBReader
Return the number of links in our db.
numMachines() - Method in class net.nutch.db.DistributedWebDBReader
How many sections (machines) there are in this distributed db.
numPages() - Method in class net.nutch.db.DistributedWebDBReader
Return the number of pages we're dealing with.
numPages() - Method in interface net.nutch.db.IWebDBReader
Simple count of all Page objects in db.
numPages() - Method in class net.nutch.db.WebDBReader
Return the number of pages we're dealing with
numTerms - Static variable in class net.nutch.indexer.HighFreqTerms
 

O

OBSOLETE_INTERVAL - Static variable in interface net.nutch.ndfs.FSConstants
 
OPERATION_FAILED - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_BLOCKRECEIVED - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_BLOCKREPORT - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ABANDONBLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ABANDONBLOCK_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ADDBLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ADDBLOCK_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_COMPLETEFILE - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_COMPLETEFILE_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODEREPORT - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_DATANODEREPORT_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_DELETE - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_DELETE_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_EXISTS - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_EXISTS_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ISDIR - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_ISDIR_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_LISTING - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_LISTING_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_MKDIRS - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_MKDIRS_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_OBTAINLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_OBTAINLOCK_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_OPEN - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_OPEN_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RAWSTATS - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RAWSTATS_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RELEASELOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RELEASELOCK_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RENAMETO - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RENAMETO_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RENEW_LEASE - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_RENEW_LEASE_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_STARTFILE - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_STARTFILE_ACK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_CLIENT_TRYAGAIN - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_ERROR - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_FAILURE - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_HEARTBEAT - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_INVALIDATE_BLOCKS - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_READSKIP_BLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_READ_BLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_TRANSFERBLOCKS - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_TRANSFERDATA - Static variable in interface net.nutch.ndfs.FSConstants
 
OP_WRITE_BLOCK - Static variable in interface net.nutch.ndfs.FSConstants
 
OnlineClusterer - interface net.nutch.clustering.OnlineClusterer.
An extension point interface for online search results clustering algorithms.
OnlineClustererFactory - class net.nutch.clustering.OnlineClustererFactory.
A factory for retrieving OnlineClusterer extensions.
Ontology - interface net.nutch.ontology.Ontology.
 
OntologyFactory - class net.nutch.ontology.OntologyFactory.
A factory for retrieving Ontology extensions.
OntologyImpl - class net.nutch.ontology.OntologyImpl.
This class wraps a model built from a list of ontologies, using HP's Jena.
OntologyImpl() - Constructor for class net.nutch.ontology.OntologyImpl
 
Outlink - class net.nutch.parse.Outlink.
 
Outlink() - Constructor for class net.nutch.parse.Outlink
 
Outlink(String, String) - Constructor for class net.nutch.parse.Outlink
 
OutputCollector - interface net.nutch.mapReduce.OutputCollector.
Passed to Mapper and Reducer implementations to collect output data.
OutputFormat - interface net.nutch.mapReduce.OutputFormat.
An output data format.
OwlParser - class net.nutch.ontology.OwlParser.
Implementation of a parser for W3C's OWL files.
OwlParser() - Constructor for class net.nutch.ontology.OwlParser
 
obtainLock(UTF8, UTF8, boolean) - Method in class net.nutch.ndfs.FSDirectory
 
obtainLock(UTF8, UTF8, boolean) - Method in class net.nutch.ndfs.FSNamesystem
Get a lock (perhaps exclusive) on the given file
offerService() - Method in class net.nutch.ndfs.NDFS.DataNode
Main loop for the DataNode.
op - Variable in class net.nutch.ndfs.FSParam
 
op - Variable in class net.nutch.ndfs.FSResults
 
open(File) - Method in class net.nutch.fs.LocalFileSystem
Open the file at f
open(File) - Method in class net.nutch.fs.NDFSFileSystem
Open the file at f
open(File) - Method in class net.nutch.fs.NutchFileSystem
Opens an InputStream for the indicated File, whether local or via NDFS.
open(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
The client wants to open the given filename.
open(UTF8) - Method in class net.nutch.ndfs.NDFSClient
Create an input stream that obtains a nodelist from the namenode, and then reads from all the right places.
openGroup(int) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
optimize() - Method in class net.nutch.indexer.IndexOptimizer
 
optimizePhrase(Query.Phrase, String) - Static method in class net.nutch.analysis.CommonGrams
Optimizes phrase queries to use n-grams when possible.
org.creativecommons.nutch - package org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.

P

PLUS - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
Page - class net.nutch.db.Page.
A row in the Page Database.
Page() - Constructor for class net.nutch.db.Page
Construct a page ready to be read by Page.readFields(DataInput).
Page(String, MD5Hash) - Constructor for class net.nutch.db.Page
Construct a new, default page, due to be fetched.
Page(String, float) - Constructor for class net.nutch.db.Page
 
Page(String, float, long) - Constructor for class net.nutch.db.Page
 
Page(String, float, float, long) - Constructor for class net.nutch.db.Page
 
Page.Comparator - class net.nutch.db.Page.Comparator.
Compares pages by MD5, then by URL.
Page.Comparator() - Constructor for class net.nutch.db.Page.Comparator
 
Page.UrlComparator - class net.nutch.db.Page.UrlComparator.
Compares pages by URL only.
Page.UrlComparator() - Constructor for class net.nutch.db.Page.UrlComparator
 
PageDescription - class net.nutch.quality.dynamic.PageDescription.
PageDescription gives the URL and the textual description for a target page.
PageDescription(InputStream) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescription(Reader) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescription(PageDescriptionTokenManager) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescriptionConstants - interface net.nutch.quality.dynamic.PageDescriptionConstants.
 
PageDescriptionTokenManager - class net.nutch.quality.dynamic.PageDescriptionTokenManager.
 
PageDescriptionTokenManager(SimpleCharStream) - Constructor for class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
PageDescriptionTokenManager(SimpleCharStream, int) - Constructor for class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
Parse - interface net.nutch.parse.Parse.
The result of parsing a page's raw content.
ParseData - class net.nutch.parse.ParseData.
Data extracted from a page's content.
ParseData() - Constructor for class net.nutch.parse.ParseData
 
ParseData(String, Outlink[], Properties) - Constructor for class net.nutch.parse.ParseData
 
ParseException - exception net.nutch.parse.ParseException.
 
ParseException() - Constructor for class net.nutch.parse.ParseException
 
ParseException(String) - Constructor for class net.nutch.parse.ParseException
 
ParseException(String, Throwable) - Constructor for class net.nutch.parse.ParseException
 
ParseException(Throwable) - Constructor for class net.nutch.parse.ParseException
 
ParseException - exception net.nutch.quality.dynamic.ParseException.
This exception is thrown when parse errors are encountered.
ParseException(Token, int[][], String[]) - Constructor for class net.nutch.quality.dynamic.ParseException
This constructor is used by the method "generateParseException" in the generated parser.
ParseException() - Constructor for class net.nutch.quality.dynamic.ParseException
The following constructors are for use by you for whatever purpose you can think of.
ParseException(String) - Constructor for class net.nutch.quality.dynamic.ParseException
 
ParseImpl - class net.nutch.parse.ParseImpl.
The result of parsing a page's raw content.
ParseImpl(String, ParseData) - Constructor for class net.nutch.parse.ParseImpl
 
ParseSegment - class net.nutch.tools.ParseSegment.
Parse contents in one segment.
ParseSegment(NutchFileSystem, String, boolean) - Constructor for class net.nutch.tools.ParseSegment
ParseSegment constructor
ParseText - class net.nutch.parse.ParseText.
 
ParseText() - Constructor for class net.nutch.parse.ParseText
 
ParseText(String) - Constructor for class net.nutch.parse.ParseText
 
Parser - interface net.nutch.ontology.Parser.
interface for the parser
Parser - interface net.nutch.parse.Parser.
A parser for content generated by a Protocol implementation.
ParserChecker - class net.nutch.parse.ParserChecker.
Parser checker, useful for testing parser.
ParserChecker() - Constructor for class net.nutch.parse.ParserChecker
 
ParserFactory - class net.nutch.parse.ParserFactory.
Creates and caches Parser plugins.
ParserNotFound - exception net.nutch.parse.ParserNotFound.
 
ParserNotFound(String, String) - Constructor for class net.nutch.parse.ParserNotFound
 
ParserNotFound(String, String, String) - Constructor for class net.nutch.parse.ParserNotFound
 
Partitioner - interface net.nutch.mapReduce.Partitioner.
Partitions the key space.
PasswordProtectedException - exception net.nutch.parse.msword.PasswordProtectedException.
 
PasswordProtectedException(String) - Constructor for class net.nutch.parse.msword.PasswordProtectedException
 
PdfParser - class net.nutch.parse.pdf.PdfParser.
parser for mime type application/pdf.
PdfParser() - Constructor for class net.nutch.parse.pdf.PdfParser
 
Plugin - class net.nutch.plugin.Plugin.
A nutch-plugin is a container for a set of custom logic that provides extensions to the nutch core functionality or to another plugin that provides an API for extending.
Plugin(PluginDescriptor) - Constructor for class net.nutch.plugin.Plugin
Constructor
PluginClassLoader - class net.nutch.plugin.PluginClassLoader.
The PluginClassLoader contains only classes of the runtime libraries set up in the plugin manifest file and the exported libraries of required plugins.
PluginClassLoader(URL[], ClassLoader) - Constructor for class net.nutch.plugin.PluginClassLoader
Constructor
PluginDescriptor - class net.nutch.plugin.PluginDescriptor.
The PluginDescriptor provides access to all meta information of a nutch-plugin, as well as to the internationalizable resources and the plugin's own classloader.
PluginDescriptor(String, String, String, String, String, String) - Constructor for class net.nutch.plugin.PluginDescriptor
Constructor
PluginManifestParser - class net.nutch.plugin.PluginManifestParser.
The PluginManifestParser just parses the manifest files in all plugin directories.
PluginManifestParser() - Constructor for class net.nutch.plugin.PluginManifestParser
 
PluginRepository - class net.nutch.plugin.PluginRepository.
The plugin repository is a registry of all plugins.
PluginRuntimeException - exception net.nutch.plugin.PluginRuntimeException.
PluginRuntimeException will be thrown when an exception occurs in the plugin management.
PluginRuntimeException(Throwable) - Constructor for class net.nutch.plugin.PluginRuntimeException
 
PluginRuntimeException(String) - Constructor for class net.nutch.plugin.PluginRuntimeException
 
PrefixStringMatcher - class net.nutch.util.PrefixStringMatcher.
A class for efficiently matching Strings against a set of prefixes.
PrefixStringMatcher(String[]) - Constructor for class net.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied array.
PrefixStringMatcher(Collection) - Constructor for class net.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
PrefixURLFilter - class net.nutch.net.PrefixURLFilter.
Filters URLs based on a file of URL prefixes.
PrefixURLFilter() - Constructor for class net.nutch.net.PrefixURLFilter
 
PrefixURLFilter(String) - Constructor for class net.nutch.net.PrefixURLFilter
 
PrintCommandListener - class net.nutch.protocol.ftp.PrintCommandListener.
This is a support class for logging all ftp command/reply traffic.
PrintCommandListener(Logger) - Constructor for class net.nutch.protocol.ftp.PrintCommandListener
 
Protocol - interface net.nutch.protocol.Protocol.
A retriever of url content.
ProtocolException - exception net.nutch.net.protocols.ProtocolException.
Base exception for all protocol handlers
ProtocolException() - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(String) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(Throwable) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException - exception net.nutch.protocol.ProtocolException.
Thrown by Protocol.getContent(String).
ProtocolException() - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(String) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(Throwable) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolFactory - class net.nutch.protocol.ProtocolFactory.
Creates and caches Protocol plugins.
ProtocolNotFound - exception net.nutch.protocol.ProtocolNotFound.
 
ProtocolNotFound(String) - Constructor for class net.nutch.protocol.ProtocolNotFound
 
ProtocolNotFound(String, String) - Constructor for class net.nutch.protocol.ProtocolNotFound
 
PruneIndexTool - class net.nutch.tools.PruneIndexTool.
This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool(File[], Query[], PruneIndexTool.PruneChecker[], boolean, boolean) - Constructor for class net.nutch.tools.PruneIndexTool
Create an instance of the tool, and open all input indexes.
PruneIndexTool.PrintFieldsChecker - class net.nutch.tools.PruneIndexTool.PrintFieldsChecker.
This checker's main function is just to print out selected field values from each document, just before they are deleted.
PruneIndexTool.PrintFieldsChecker(PrintStream, String[]) - Constructor for class net.nutch.tools.PruneIndexTool.PrintFieldsChecker
 
PruneIndexTool.PruneChecker - interface net.nutch.tools.PruneIndexTool.PruneChecker.
This interface can be used to implement additional checking on matching documents.
PruneIndexTool.StoreUrlsChecker - class net.nutch.tools.PruneIndexTool.StoreUrlsChecker.
This checker's main function is just to store the URLs of each document to be deleted in a text file.
PruneIndexTool.StoreUrlsChecker(File, boolean) - Constructor for class net.nutch.tools.PruneIndexTool.StoreUrlsChecker
Store the list in a file
pageExists(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Test whether a certain piece of content is in the db, but don't bother returning it.
pageExists(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pageExists(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Returns whether a Page with the given MD5 checksum is in the db.
pageExists(MD5Hash) - Method in class net.nutch.db.WebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pages() - Method in class net.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by URL
pages() - Method in class net.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by URL.
pages() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by URL
pages() - Method in class net.nutch.db.WebDBReader
Iterate through all the Pages, sorted by URL
pagesByMD5() - Method in class net.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by MD5
pagesByMD5() - Method in class net.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by MD5.
pagesByMD5() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by MD5.
pagesByMD5() - Method in class net.nutch.db.WebDBReader
Iterate through all the Pages, sorted by MD5
param() - Method in class net.nutch.quality.dynamic.PageDescription
 
parse() - Method in class net.nutch.analysis.NutchAnalysis
Parse a query.
parse(OntModel) - Method in class net.nutch.ontology.OwlParser
Parse OWL ontology files using Jena.
parse(OntModel) - Method in interface net.nutch.ontology.Parser
 
parse() - Method in class net.nutch.quality.dynamic.PageDescription
 
parse(String) - Static method in class net.nutch.searcher.Query
Parse a query from a string.
parse() - Method in class net.nutch.tools.ParseSegment
Parse contents by multiple threads and save as unsorted ParserOutput
parseArgs(String[], int) - Static method in class net.nutch.fs.NutchFileSystem
Parse the cmd-line args, starting at i.
parseCharacterEncoding(String) - Static method in class net.nutch.util.StringUtil
Parse the character encoding from the specified content type header.
parseClass(OntClass, List, int) - Method in class net.nutch.ontology.OwlParser
 
parseDataReader - Variable in class net.nutch.segment.SegmentReader
 
parseDataWriter - Variable in class net.nutch.segment.SegmentWriter
 
parsePluginFolder() - Static method in class net.nutch.plugin.PluginManifestParser
Returns a list with plugin descriptors.
parseQueries(InputStream) - Static method in class net.nutch.tools.PruneIndexTool
Read a list of Lucene queries from the stream (UTF-8 encoding is assumed).
parseQuery(String) - Static method in class net.nutch.analysis.NutchAnalysis
Construct a query parser for the text in a reader.
parseTextReader - Variable in class net.nutch.segment.SegmentReader
 
parseTextWriter - Variable in class net.nutch.segment.SegmentWriter
 
peekMin() - Method in class net.nutch.util.FibonacciHeap
Returns the same Object that FibonacciHeap.popMin() would, without removing it.
pendingTransfers(DatanodeInfo, int) - Method in class net.nutch.ndfs.FSNamesystem
Return with a list of Block/DataNodeInfo sets, indicating where various Blocks should be copied, ASAP.
phrase(String) - Method in class net.nutch.analysis.NutchAnalysis
Parse an explicitly quoted phrase query.
popMin() - Method in class net.nutch.util.FibonacciHeap
Returns the object which has the lowest priority in the heap.
prevCharIsCR - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
prevCharIsLF - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
printStatus() - Method in class net.nutch.db.WebDBInjector
Utility to present performance stats
printStatusBar(int, int) - Method in class net.nutch.db.WebDBInjector
Utility to present small status bar
processReport(Block[], UTF8) - Method in class net.nutch.ndfs.FSNamesystem
The given node is reporting all its blocks.
processedRecords - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
protocolCommandSent(ProtocolCommandEvent) - Method in class net.nutch.protocol.ftp.PrintCommandListener
 
protocolReplyReceived(ProtocolCommandEvent) - Method in class net.nutch.protocol.ftp.PrintCommandListener
 
purgeQueuedKeys() - Method in class net.nutch.util.SoftHashMap
 
put(Object, Object) - Method in class net.nutch.util.SoftHashMap
Associates the specified value with the specified key in this map.
putAll(Properties) - Method in class net.nutch.parse.mp3.MetadataCollector
 

Q

QUOTE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
QUOTED_VALUE - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
Query - class net.nutch.searcher.Query.
A Nutch query.
Query() - Constructor for class net.nutch.searcher.Query
 
Query.Clause - class net.nutch.searcher.Query.Clause.
A query clause.
Query.Clause(Query.Term, String, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Term, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, String, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Phrase - class net.nutch.searcher.Query.Phrase.
A phrase query clause.
Query.Phrase(Query.Term[]) - Constructor for class net.nutch.searcher.Query.Phrase
 
Query.Phrase(String[]) - Constructor for class net.nutch.searcher.Query.Phrase
 
Query.Term - class net.nutch.searcher.Query.Term.
A single-term query clause.
Query.Term(String) - Constructor for class net.nutch.searcher.Query.Term
 
QueryException - exception net.nutch.searcher.QueryException.
 
QueryException(String) - Constructor for class net.nutch.searcher.QueryException
 
QueryFilter - interface net.nutch.searcher.QueryFilter.
Extension point for query translation.
QueryFilters - class net.nutch.searcher.QueryFilters.
Creates and caches QueryFilter implementing plugins.
queueKeyForDeletion(Object) - Method in class net.nutch.util.SoftHashMap
 

R

RETRY - Static variable in class net.nutch.fetcher.FetcherOutput
 
RTFParseFactory - class net.nutch.parse.rtf.RTFParseFactory.
A parser for RTF documents
RTFParseFactory() - Constructor for class net.nutch.parse.rtf.RTFParseFactory
 
RTFParserDelegateImpl - class net.nutch.parse.rtf.RTFParserDelegateImpl.
A parser delegate for handling rtf events.
RTFParserDelegateImpl() - Constructor for class net.nutch.parse.rtf.RTFParserDelegateImpl
 
RUNLENGTH_ENCODING - Static variable in interface net.nutch.ndfs.FSConstants
 
RawFieldQueryFilter - class net.nutch.searcher.RawFieldQueryFilter.
Translate raw query fields to search the same-named field, as indexed by an IndexingFilter.
RawFieldQueryFilter(String) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, float) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, boolean) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
RawFieldQueryFilter(String, boolean, float) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
ReInit(CharStream) - Method in class net.nutch.analysis.NutchAnalysis
 
ReInit(NutchAnalysisTokenManager) - Method in class net.nutch.analysis.NutchAnalysis
 
ReInit(CharStream) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(CharStream, int) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(InputStream) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(Reader) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(PageDescriptionTokenManager) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(SimpleCharStream) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(SimpleCharStream, int) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(Reader, int, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
RecordReader - interface net.nutch.mapReduce.RecordReader.
Reads key/value pairs from an input file InputFormat.Split.
RecordWriter - interface net.nutch.mapReduce.RecordWriter.
Writes key/value pairs to an output file.
Reducer - interface net.nutch.mapReduce.Reducer.
Reduces a set of intermediate values which share a key to a smaller set of values.
RegexURLFilter - class net.nutch.net.RegexURLFilter.
Filters URLs based on a file of regular expressions.
RegexURLFilter() - Constructor for class net.nutch.net.RegexURLFilter
 
RegexURLFilter(String) - Constructor for class net.nutch.net.RegexURLFilter
 
RegexUrlNormalizer - class net.nutch.net.RegexUrlNormalizer.
Allows users to do regex substitutions on all/any URLs that are encountered, which is useful for stripping session IDs from URLs.
RegexUrlNormalizer() - Constructor for class net.nutch.net.RegexUrlNormalizer
Default constructor which gets the file name from either nutch-site.xml or nutch-default.xml and reads that configuration file.
RegexUrlNormalizer(String) - Constructor for class net.nutch.net.RegexUrlNormalizer
Constructor which can be passed the file name, so it doesn't look in the configuration files for it.
ResourceGone - exception net.nutch.protocol.ResourceGone.
Thrown by Protocol.getContent(String) when a URL is invalid.
ResourceGone(URL, String) - Constructor for class net.nutch.protocol.ResourceGone
 
ResourceMoved - exception net.nutch.protocol.ResourceMoved.
Thrown by Protocol.getContent(String) when a URL has moved.
ResourceMoved(URL, URL, String) - Constructor for class net.nutch.protocol.ResourceMoved
 
Response - interface net.nutch.net.protocols.Response.
A response interface.
RetryLater - exception net.nutch.protocol.RetryLater.
Thrown by Protocol.getContent(String) when a URL should be retried later.
RetryLater(URL, String) - Constructor for class net.nutch.protocol.RetryLater
 
RobotRulesParser - class net.nutch.protocol.http.RobotRulesParser.
This class handles the parsing of robots.txt files.
RobotRulesParser() - Constructor for class net.nutch.protocol.http.RobotRulesParser
 
RobotRulesParser(String[]) - Constructor for class net.nutch.protocol.http.RobotRulesParser
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
RobotRulesParser.RobotRuleSet - class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet.
This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
RobotsMetaProcessor - class net.nutch.parse.html.RobotsMetaProcessor.
Class for parsing META Directives from DOM trees.
RobotsMetaProcessor() - Constructor for class net.nutch.parse.html.RobotsMetaProcessor
 
RobotsMetaProcessor.RobotsMetaIndicator - class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator.
Utility class with indicators for the robots directives "noindex" and "nofollow", and HTTP-EQUIV/no-cache
RobotsMetaProcessor.RobotsMetaIndicator() - Constructor for class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
 
rdfidToLabel(String) - Method in class net.nutch.ontology.OwlParser
 
read(DataInput) - Static method in class net.nutch.db.Link
 
read(DataInput) - Static method in class net.nutch.db.Page
 
read(DataInput) - Static method in class net.nutch.fetcher.FetcherOutput
 
read(DataInput) - Static method in class net.nutch.io.MD5Hash
Constructs, reads and returns an instance.
read(DataInput) - Static method in class net.nutch.linkdb.LinkAnalysisEntry
 
read(DataInput) - Static method in class net.nutch.pagedb.FetchListEntry
 
read(DataInput) - Static method in class net.nutch.parse.Outlink
 
read(DataInput) - Static method in class net.nutch.parse.ParseData
 
read(DataInput) - Static method in class net.nutch.parse.ParseText
 
read(DataInput) - Static method in class net.nutch.protocol.Content
 
read(DataInput) - Static method in class net.nutch.searcher.HitDetails
Constructs, reads and returns an instance.
read(DataInput) - Static method in class net.nutch.searcher.Query.Clause
 
read(DataInput) - Static method in class net.nutch.searcher.Query.Phrase
 
read(DataInput) - Static method in class net.nutch.searcher.Query.Term
 
read(DataInput) - Static method in class net.nutch.searcher.Query
 
readChar() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
readCompressedByteArray(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readCompressedString(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readFields(DataInput) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class net.nutch.db.Link
Read in fields from a bytestream
readFields(DataInput) - Method in class net.nutch.db.Page
 
readFields(DataInput) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class net.nutch.fetcher.FetcherOutput
 
readFields(DataInput) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
readFields(DataInput) - Method in class net.nutch.io.ArrayWritable
 
readFields(DataInput) - Method in class net.nutch.io.BooleanWritable
 
readFields(DataInput) - Method in class net.nutch.io.BytesWritable
 
readFields(DataInput) - Method in class net.nutch.io.IntWritable
 
readFields(DataInput) - Method in class net.nutch.io.LongWritable
 
readFields(DataInput) - Method in class net.nutch.io.MD5Hash
 
readFields(DataInput) - Method in class net.nutch.io.NullWritable
 
readFields(DataInput) - Method in class net.nutch.io.TwoDArrayWritable
 
readFields(DataInput) - Method in class net.nutch.io.UTF8
 
readFields(DataInput) - Method in class net.nutch.io.VersionedWritable
 
readFields(DataInput) - Method in interface net.nutch.io.Writable
Reads the fields of this object from in.
readFields(DataInput) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
readFields(DataInput) - Method in class net.nutch.ndfs.Block
 
readFields(DataInput) - Method in class net.nutch.ndfs.DatanodeInfo
 
readFields(DataInput) - Method in class net.nutch.ndfs.FSParam
Deserialize the opcode and the args
readFields(DataInput) - Method in class net.nutch.ndfs.FSResults
 
readFields(DataInput) - Method in class net.nutch.ndfs.HeartbeatData
 
readFields(DataInput) - Method in class net.nutch.ndfs.NDFSFileInfo
 
readFields(DataInput) - Method in class net.nutch.pagedb.FetchListEntry
 
readFields(DataInput) - Method in class net.nutch.parse.Outlink
 
readFields(DataInput) - Method in class net.nutch.parse.ParseData
 
readFields(DataInput) - Method in class net.nutch.parse.ParseText
 
readFields(DataInput) - Method in class net.nutch.protocol.Content
 
readFields(DataInput) - Method in class net.nutch.searcher.DistributedSearch.Param
 
readFields(DataInput) - Method in class net.nutch.searcher.DistributedSearch.Result
 
readFields(DataInput) - Method in class net.nutch.searcher.Hit
 
readFields(DataInput) - Method in class net.nutch.searcher.HitDetails
 
readFields(DataInput) - Method in class net.nutch.searcher.Hits
 
readFields(DataInput) - Method in class net.nutch.searcher.Query
 
readFields(DataInput) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
readFloat(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse a float from a byte array.
readInt(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse an integer from a byte array.
readLong(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse a long from a byte array.
readString(DataInput) - Static method in class net.nutch.io.UTF8
Read a UTF-8 encoded string.
readString(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readStringArray(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readUnsignedShort(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse an unsigned short from a byte array.
recentlyInvalidBlocks(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Return with a list of Blocks that should be invalidated at the given node.
recursiveCopy(NutchFileSystem, File, File) - Static method in class net.nutch.fs.FileUtil
Copy a file and/or directory and all its contents (whether data or other files/dirs)
reduce(WritableComparable, Iterator, OutputCollector) - Method in class net.nutch.mapReduce.DefaultReducer
Writes all values directly to results.
reduce(WritableComparable, Iterator, OutputCollector) - Method in interface net.nutch.mapReduce.Reducer
Combines values for a given key.
regexNormalize(String) - Method in class net.nutch.net.RegexUrlNormalizer
This function does the replacements by iterating through all the regex patterns.
release(File) - Method in class net.nutch.fs.LocalFileSystem
Release a held lock
release(File) - Method in class net.nutch.fs.NDFSFileSystem
Release a held lock
release(File) - Method in class net.nutch.fs.NutchFileSystem
Release the lock
release(UTF8) - Method in class net.nutch.ndfs.NDFSClient
 
releaseLock(UTF8, UTF8) - Method in class net.nutch.ndfs.FSDirectory
 
releaseLock(UTF8, UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Release the lock on the given file
remove(Object) - Method in class net.nutch.util.SoftHashMap
 
rename(File, File) - Method in class net.nutch.fs.LocalFileSystem
Rename files/dirs
rename(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Rename files/dirs
rename(File, File) - Method in class net.nutch.fs.NutchFileSystem
Renames File src to File dst.
rename(String, String) - Method in class net.nutch.fs.TestClient
Rename an NDFS file
rename(NutchFileSystem, String, String) - Static method in class net.nutch.io.MapFile
Renames an existing map directory.
rename(UTF8, UTF8) - Method in class net.nutch.ndfs.NDFSClient
Make a direct connection to namenode and manipulate structures there.
renameTo(UTF8, UTF8) - Method in class net.nutch.ndfs.FSDirectory
Change the filename
renameTo(UTF8, UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Change the indicated filename.
renderAnonymous(PrintStream, Resource, String) - Static method in class net.nutch.ontology.OntologyImpl
 
renderClassDescription(PrintStream, OntClass, int) - Static method in class net.nutch.ontology.OntologyImpl
 
renderHierarchy(PrintStream, OntClass, List, int) - Static method in class net.nutch.ontology.OntologyImpl
 
renderRestriction(PrintStream, Restriction) - Static method in class net.nutch.ontology.OntologyImpl
 
renderURI(PrintStream, PrefixMapping, String) - Static method in class net.nutch.ontology.OntologyImpl
 
renewLease(UTF8) - Method in class net.nutch.ndfs.FSNamesystem
Renew the lease(s) held by the given client
report() - Method in class net.nutch.fs.TestClient
Gives a report on how the NutchFileSystem is doing
reset(byte[], int) - Method in class net.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset(byte[], int, int) - Method in class net.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset() - Method in class net.nutch.io.DataOutputBuffer
Resets the buffer to empty.
reset() - Method in class net.nutch.io.MapFile.Reader
Re-positions the reader before its first key.
reset() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noIndex, noFollow and noCache to false.
reset() - Method in class net.nutch.segment.SegmentReader
Reset all readers.
resolveEncodingAlias(String) - Static method in class net.nutch.util.StringUtil
 
retrieve(String) - Static method in class net.nutch.ontology.OntologyImpl
 
retrieveFile(String, OutputStream, int) - Method in class net.nutch.protocol.ftp.Client
 
retrieveList(String, List, int, FTPFileEntryParser) - Method in class net.nutch.protocol.ftp.Client
 
rightPad(String, int) - Static method in class net.nutch.util.StringUtil
Returns a copy of s padded with trailing spaces so that its length is length.
root - Variable in class net.nutch.util.TrieStringMatcher
 
rootClasses(OntModel) - Method in class net.nutch.ontology.OwlParser
 
rootClasses(OntModel) - Method in interface net.nutch.ontology.Parser
 
run() - Method in class net.nutch.fetcher.Fetcher
Runs the fetcher.
run() - Method in class net.nutch.segment.SegmentSlicer
Run the slicer.
run() - Method in class net.nutch.tools.PruneIndexTool
For each query, find all matching documents and delete them from all input indexes.
run() - Method in class net.nutch.tools.SegmentMergeTool
Run the tool, periodically reporting progress.

S

SIGRAM - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
SLASH - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
STAGE_DEDUP - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_DELETING - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_INDEXING - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_MASTERIDX - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_MERGEIDX - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_OPENING - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STAGE_WRITING - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
STILL_WAITING - Static variable in interface net.nutch.ndfs.FSConstants
 
SUCCESS - Static variable in class net.nutch.fetcher.FetcherOutput
 
SYSTEM_STARTUP_PERIOD - Static variable in interface net.nutch.ndfs.FSConstants
 
ScoreStats - class net.nutch.util.ScoreStats.
When we generate a fetchlist, we need to choose a "cutoff" score, such that any scores above that cutoff will be included in the fetchlist.
ScoreStats() - Constructor for class net.nutch.util.ScoreStats
 
Searcher - interface net.nutch.searcher.Searcher.
Service that searches.
SegmentMergeTool - class net.nutch.tools.SegmentMergeTool.
This class cleans up accumulated segments data, and merges them into a single (or optionally multiple) segment(s), with no duplicates in it.
SegmentMergeTool(NutchFileSystem, File[], File, long, boolean, boolean) - Constructor for class net.nutch.tools.SegmentMergeTool
Create a SegmentMergeTool.
SegmentMergeTool.SegmentMergeStatus - class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus.
 
SegmentMergeTool.SegmentMergeStatus() - Constructor for class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
SegmentMergeTool.SegmentMergeStatus(int, File[], long, long, long) - Constructor for class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
SegmentReader - class net.nutch.segment.SegmentReader.
This class holds together all data readers for an existing segment.
SegmentReader(File) - Constructor for class net.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File) - Constructor for class net.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(File, boolean) - Constructor for class net.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File, boolean) - Constructor for class net.nutch.segment.SegmentReader
Open a segment for reading.
SegmentReader(NutchFileSystem, File, boolean, boolean, boolean, boolean) - Constructor for class net.nutch.segment.SegmentReader
Open a segment for reading.
SegmentSlicer - class net.nutch.segment.SegmentSlicer.
This class reads data from one or more input segments, and outputs it to one or more output segments, optionally deleting the input segments when it's finished.
SegmentSlicer(NutchFileSystem, File[], File, boolean, boolean, boolean, boolean, long) - Constructor for class net.nutch.segment.SegmentSlicer
Create new SegmentSlicer.
SegmentWriter - class net.nutch.segment.SegmentWriter.
This class holds together all data writers for a new segment.
SegmentWriter(File, boolean) - Constructor for class net.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean) - Constructor for class net.nutch.segment.SegmentWriter
 
SegmentWriter(File, boolean, boolean) - Constructor for class net.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean, boolean) - Constructor for class net.nutch.segment.SegmentWriter
 
SegmentWriter(NutchFileSystem, File, boolean, boolean, boolean, boolean, boolean) - Constructor for class net.nutch.segment.SegmentWriter
Open a segment for writing.
SequenceFile - class net.nutch.io.SequenceFile.
Support for flat files of binary key/value pairs.
SequenceFile.Reader - class net.nutch.io.SequenceFile.Reader.
Reads key/value pairs from a sequence-format file.
SequenceFile.Reader(NutchFileSystem, String) - Constructor for class net.nutch.io.SequenceFile.Reader
Open the named file.
SequenceFile.Sorter - class net.nutch.io.SequenceFile.Sorter.
Sorts key/value pairs in a sequence-format file.
SequenceFile.Sorter(NutchFileSystem, Class, Class) - Constructor for class net.nutch.io.SequenceFile.Sorter
Sort and merge files containing the named classes.
SequenceFile.Sorter(NutchFileSystem, WritableComparator, Class) - Constructor for class net.nutch.io.SequenceFile.Sorter
Sort and merge using an arbitrary WritableComparator.
SequenceFile.Writer - class net.nutch.io.SequenceFile.Writer.
Write key/value pairs to a sequence-format file.
SequenceFile.Writer(NutchFileSystem, String, Class, Class) - Constructor for class net.nutch.io.SequenceFile.Writer
Create the named file.
Server - class net.nutch.ipc.Server.
An abstract IPC service.
Server(int, Class, int) - Constructor for class net.nutch.ipc.Server
Constructs a server listening on the named port.
SetFile - class net.nutch.io.SetFile.
A file-based set of keys.
SetFile() - Constructor for class net.nutch.io.SetFile
 
SetFile.Reader - class net.nutch.io.SetFile.Reader.
Provide access to an existing set file.
SetFile.Reader(NutchFileSystem, String) - Constructor for class net.nutch.io.SetFile.Reader
Construct a set reader for the named set.
SetFile.Reader(NutchFileSystem, String, WritableComparator) - Constructor for class net.nutch.io.SetFile.Reader
Construct a set reader for the named set using the named comparator.
SetFile.Writer - class net.nutch.io.SetFile.Writer.
Write a new set file.
SetFile.Writer(NutchFileSystem, String, Class) - Constructor for class net.nutch.io.SetFile.Writer
Create the named set for keys of the named class.
SetFile.Writer(NutchFileSystem, String, WritableComparator) - Constructor for class net.nutch.io.SetFile.Writer
Create the named set using the named key comparator.
SimpleCharStream - class net.nutch.quality.dynamic.SimpleCharStream.
An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without unicode processing).
SimpleCharStream(Reader, int, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SoftHashMap - class net.nutch.util.SoftHashMap.
A Map which uses SoftReferences to keep track of values.
SoftHashMap() - Constructor for class net.nutch.util.SoftHashMap
 
SoftHashMap.FinalizationListener - interface net.nutch.util.SoftHashMap.FinalizationListener.
An interface for Objects which accept notification when another Object is finalized.
SoftHashMap.FinalizationNotifier - interface net.nutch.util.SoftHashMap.FinalizationNotifier.
An interface for Objects which can notify another object when they are finalized.
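The idea of a map that holds its values through SoftReferences can be illustrated with a minimal sketch. This is not the SoftHashMap implementation; the class name SoftValueCacheSketch and its methods are hypothetical, and only java.lang.ref.SoftReference from the JDK is assumed.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map whose values are only softly reachable,
// so the garbage collector may reclaim them under memory pressure.
class SoftValueCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns the value, or null if it was never stored or has been reclaimed.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // the referent was cleared by the collector
        }
        return value;
    }

    // Counts entries, including any whose referent has been cleared but not yet removed.
    int size() {
        return map.size();
    }
}
```

A value that is still strongly referenced elsewhere is never cleared, so a lookup for it always succeeds.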
StringUtil - class net.nutch.util.StringUtil.
A collection of String processing utility methods.
StringUtil() - Constructor for class net.nutch.util.StringUtil
 
SuffixStringMatcher - class net.nutch.util.SuffixStringMatcher.
A class for efficiently matching Strings against a set of suffixes.
SuffixStringMatcher(String[]) - Constructor for class net.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied array.
SuffixStringMatcher(Collection) - Constructor for class net.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied Collection.
Summarizer - class net.nutch.searcher.Summarizer.
Implements hit summarization.
Summarizer() - Constructor for class net.nutch.searcher.Summarizer
 
Summary - class net.nutch.searcher.Summary.
A document summary dynamically generated to match a query.
Summary() - Constructor for class net.nutch.searcher.Summary
Constructs an empty Summary.
Summary.Ellipsis - class net.nutch.searcher.Summary.Ellipsis.
An ellipsis fragment within a summary.
Summary.Ellipsis() - Constructor for class net.nutch.searcher.Summary.Ellipsis
Constructs an ellipsis fragment.
Summary.Fragment - class net.nutch.searcher.Summary.Fragment.
A fragment of text within a summary.
Summary.Fragment(String) - Constructor for class net.nutch.searcher.Summary.Fragment
Constructs a fragment for the given text.
Summary.Highlight - class net.nutch.searcher.Summary.Highlight.
A highlighted fragment of text within a summary.
Summary.Highlight(String) - Constructor for class net.nutch.searcher.Summary.Highlight
Constructs a highlighted fragment for the given text.
SwitchTo(int) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
SwitchTo(int) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
save(OutputStream) - Method in class net.nutch.analysis.lang.NGramProfile
Writes the NGramProfile content to the OutputStream using UTF-8 encoding.
save() - Method in class net.nutch.tools.ParseSegment
Split sorted ParserOutput into ParseData and ParseText, and generate new FetcherOutput with updated status
scoreDump() - Method in class net.nutch.tools.WebDBAdminTool
Emit each page's score and link data
search(Query, int) - Method in class net.nutch.searcher.DistributedSearch.Client
 
search(Query, int) - Method in class net.nutch.searcher.IndexSearcher
 
search(Query, int) - Method in class net.nutch.searcher.NutchBean
 
search(Query, int, int) - Method in class net.nutch.searcher.NutchBean
Search for pages matching a query, eliminating excessive hits from sites.
search(Query, int) - Method in interface net.nutch.searcher.Searcher
Return the top-scoring hits for a query.
second - Variable in class net.nutch.ndfs.FSParam
 
second - Variable in class net.nutch.ndfs.FSResults
 
seek(long) - Method in class net.nutch.fs.NFSDataInputStream
 
seek(long) - Method in class net.nutch.fs.NFSInputStream
Seek to the given offset from the start of the file.
seek(long) - Method in class net.nutch.io.ArrayFile.Reader
Positions the reader before its nth value.
seek(WritableComparable) - Method in class net.nutch.io.MapFile.Reader
Positions the reader at the named key, or if none such exists, at the first entry after the named key.
seek(long) - Method in class net.nutch.io.SequenceFile.Reader
Set the current byte position in the input file.
seek(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
 
seek(long) - Method in class net.nutch.segment.SegmentReader
Seek to a position in all readers.
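The key-based seek described for MapFile.Reader (position at the named key, or at the first entry after it) has the same semantics as a ceiling lookup in a sorted map. The sketch below illustrates that rule with java.util.TreeMap; SeekSketch is a hypothetical name and nothing here reflects how MapFile stores or indexes its data.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of "seek to key, else first entry after key".
class SeekSketch {
    // Returns the key the reader would be positioned at, or null if
    // every stored key sorts before the requested one.
    static String seek(NavigableMap<String, String> entries, String key) {
        return entries.ceilingKey(key);
    }

    static NavigableMap<String, String> sample() {
        NavigableMap<String, String> m = new TreeMap<>();
        m.put("apple", "1");
        m.put("cherry", "2");
        return m;
    }
}
```

With the sample map, seeking "banana" lands on "cherry", seeking "apple" lands on "apple", and seeking past the last key yields null.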
segmentDir - Variable in class net.nutch.segment.SegmentReader
 
segmentDir - Variable in class net.nutch.segment.SegmentWriter
 
sendNoOp() - Method in class net.nutch.protocol.ftp.Client
Sends a NOOP command to the FTP server.
set(DistributedWebDBWriter.LinkInstruction) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-init from another LinkInstruction's info.
set(Link, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-init with a Link and an instruction
set(DistributedWebDBWriter.PageInstruction) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Init from another PageInstruction object.
set(Page, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Init PageInstruction with no Link
set(Page, Link, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Init PageInstruction with a Link
set(Link) - Method in class net.nutch.db.Link
 
set(Page) - Method in class net.nutch.db.Page
Copy the contents of another instance into this instance.
set(WebDBWriter.LinkInstruction) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
Re-init from another LinkInstruction's info.
set(Link, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
Re-init with a Link and an instruction
set(WebDBWriter.PageInstruction) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Init from another PageInstruction object.
set(Page, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Init PageInstruction with no Link
set(Page, Link, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Init PageInstruction with a Link
set(Writable[]) - Method in class net.nutch.io.ArrayWritable
 
set(boolean) - Method in class net.nutch.io.BooleanWritable
Set the value of the BooleanWritable
set(int) - Method in class net.nutch.io.IntWritable
Set the value of this IntWritable.
set(long) - Method in class net.nutch.io.LongWritable
Set the value of this LongWritable.
set(MD5Hash) - Method in class net.nutch.io.MD5Hash
Copy the contents of another instance into this instance.
set(Writable[][]) - Method in class net.nutch.io.TwoDArrayWritable
 
set(String) - Method in class net.nutch.io.UTF8
Set to contain the contents of a string.
set(UTF8) - Method in class net.nutch.io.UTF8
Set to contain the contents of a string.
set(float) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
setAlbum(String) - Method in class net.nutch.parse.mp3.MetadataCollector
 
setArtist(String) - Method in class net.nutch.parse.mp3.MetadataCollector
 
setBaseHref(URL) - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets the baseHref.
setClazz(String) - Method in class net.nutch.plugin.Extension
Sets the class that implements the concrete extension and is only used until model creation at system start up.
setClean(boolean) - Method in class net.nutch.tools.ParseSegment
Set whether to clean up intermediates.
setCombiner(Class) - Method in class net.nutch.mapReduce.MapReduceJob
Set the combiner class, if any, to a Reducer.
setCommand(String) - Method in class net.nutch.util.CommandRunner
 
setContent(byte[]) - Method in class net.nutch.protocol.Content
 
setContentType(String) - Method in class net.nutch.protocol.Content
 
setDataTimeout(int) - Method in class net.nutch.protocol.ftp.Client
Sets the timeout in milliseconds to use for data connection.
setDebugStream(PrintStream) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
setDebugStream(PrintStream) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
setDestroyOnTimeout(boolean) - Method in class net.nutch.util.CommandRunner
 
setDigest(String) - Method in class net.nutch.io.MD5Hash
Sets the digest value from a hex string.
setDiscriptor(PluginDescriptor) - Method in class net.nutch.plugin.Extension
Sets the plugin descriptor and is only used until model creation at system start up.
setExpireTime(long) - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setFactor(int) - Method in class net.nutch.io.SequenceFile.Sorter
Set the number of streams to merge at once.
setFetchDate(long) - Method in class net.nutch.fetcher.FetcherOutput
 
setFetchInterval(byte) - Method in class net.nutch.db.Page
 
setFileType(int) - Method in class net.nutch.protocol.ftp.Client
Sets the file type to be transferred.
setFollowTalk(boolean) - Method in class net.nutch.protocol.ftp.Ftp
Set followTalk
setId(String) - Method in class net.nutch.plugin.Extension
Sets the unique extension Id and is only used until model creation at system start up.
setIndexInterval(int) - Method in class net.nutch.io.MapFile.Writer
Sets the index interval.
setIndexInterval(int) - Method in class net.nutch.segment.SegmentWriter
Sets the index interval for all segment writers.
setIndexNo(int) - Method in class net.nutch.searcher.Hit
 
setInputStream(InputStream) - Method in class net.nutch.util.CommandRunner
 
setKeepConnection(boolean) - Method in class net.nutch.protocol.ftp.Ftp
Set keepConnection
setLogLevel(Level) - Static method in class net.nutch.fetcher.Fetcher
Set the logging level.
setLogLevel(Level) - Static method in class net.nutch.tools.ParseSegment
Set the logging level.
setMD5(MD5Hash) - Method in class net.nutch.db.Page
 
setMapper(Class) - Method in class net.nutch.mapReduce.MapReduceJob
Set the Mapper class.
setMaxContentLength(int) - Method in class net.nutch.protocol.file.File
Set the point at which content is truncated.
setMaxContentLength(int) - Method in class net.nutch.protocol.ftp.Ftp
Set the point at which content is truncated.
setMemory(int) - Method in class net.nutch.io.SequenceFile.Sorter
Set the total amount of buffer memory, in bytes.
setMoreFromSiteExcluded(boolean) - Method in class net.nutch.searcher.Hit
True iff other, lower-scoring, hits from the same site have been excluded from the list which contains this hit.
setName(String) - Method in class net.nutch.analysis.lang.NGramProfile
 
setName(Class, String) - Static method in class net.nutch.io.WritableName
Set the name that a class should be known as to something other than the class name.
setNextFetchTime(long) - Method in class net.nutch.db.Page
 
setNoCache() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noCache to true.
setNoFollow() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noFollow to true.
setNoIndex() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noIndex to true.
setNumBytes(long) - Method in class net.nutch.ndfs.Block
 
setNumMapTasks(int) - Method in class net.nutch.mapReduce.MapReduceJob
Set the desired number of map tasks to be executed.
setNumOutlinks(int) - Method in class net.nutch.db.Page
 
setNumReduceTasks(int) - Method in class net.nutch.mapReduce.MapReduceJob
Set the desired number of reduce tasks to be executed.
setPartitioner(Class) - Method in class net.nutch.mapReduce.MapReduceJob
Set the Partitioner class.
setQuery(String) - Method in class net.nutch.clustering.carrot2.LocalNutchInputComponent
 
setReducer(Class) - Method in class net.nutch.mapReduce.MapReduceJob
Set the Reducer class.
setRemoteVerificationEnabled(boolean) - Method in class net.nutch.protocol.ftp.Client
Enable or disable verification that the remote host taking part in a data connection is the same as the host to which the control connection is attached.
setRetriesSinceFetch(int) - Method in class net.nutch.db.Page
 
setScore(float, float) - Method in class net.nutch.db.Page
 
setScore(float) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
setScorePower(float) - Method in class net.nutch.indexer.IndexSegment
Determines the power of link analysis scores.
setShowThreadIDs(boolean) - Static method in class net.nutch.util.LogFormatter
When set true, thread IDs are logged.
setStatus(int) - Method in class net.nutch.fetcher.FetcherOutput
 
setStdErrorStream(OutputStream) - Method in class net.nutch.util.CommandRunner
 
setStdOutputStream(OutputStream) - Method in class net.nutch.util.CommandRunner
 
setTargetHasOutlink(boolean) - Method in class net.nutch.db.Link
 
setThreadCount(int) - Method in class net.nutch.fetcher.Fetcher
Set thread count
setThreadCount(int) - Method in class net.nutch.tools.ParseSegment
Set thread count
setTimeout(int) - Method in class net.nutch.ipc.Client
Sets the timeout used for network i/o.
setTimeout(int) - Method in class net.nutch.ipc.Server
Sets the timeout used for network i/o.
setTimeout(int) - Method in class net.nutch.protocol.ftp.Ftp
Set the timeout.
setTimeout(int) - Method in class net.nutch.util.CommandRunner
 
setTitle(String) - Method in class net.nutch.parse.mp3.MetadataCollector
 
setTotalIsExact(boolean) - Method in class net.nutch.searcher.Hits
Set Hits.totalIsExact().
setURL(String) - Method in class net.nutch.db.Page
 
setWaitForExit(boolean) - Method in class net.nutch.util.CommandRunner
 
setWeight(float) - Method in class net.nutch.searcher.Query.Clause
 
shortestMatch(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns the shortest prefix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns the shortest suffix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class net.nutch.util.TrieStringMatcher
Returns the shortest substring of input that is matched by a pattern in the trie, or null if no match exists.
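One way to obtain the shortest-suffix behaviour described above is to store each suffix reversed in a character trie and walk the input from its last character backwards: the first terminal node reached corresponds to the shortest matching suffix. The sketch below assumes that approach; SuffixMatchSketch and its Node class are hypothetical and are not the SuffixStringMatcher or TrieStringMatcher source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: shortest-suffix matching with a trie of reversed suffixes.
class SuffixMatchSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored suffix ends at this node
    }

    private final Node root = new Node();

    SuffixMatchSketch(String[] suffixes) {
        for (String s : suffixes) {
            Node n = root;
            // Insert the suffix back to front.
            for (int i = s.length() - 1; i >= 0; i--) {
                n = n.children.computeIfAbsent(s.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
    }

    // Returns the shortest non-empty suffix of input that was stored, or null.
    String shortestMatch(String input) {
        Node n = root;
        for (int i = input.length() - 1; i >= 0; i--) {
            n = n.children.get(input.charAt(i));
            if (n == null) {
                return null; // no stored suffix continues along this path
            }
            if (n.terminal) {
                return input.substring(i);
            }
        }
        return null;
    }
}
```

Lookup cost depends on the length of the input, not on how many suffixes were stored.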
shutDown() - Method in class net.nutch.plugin.Plugin
Shutdown the plugin.
shutdown() - Method in class net.nutch.util.ThreadPool
Turn off the pool.
size - Variable in class net.nutch.segment.SegmentReader
 
size - Variable in class net.nutch.segment.SegmentWriter
 
size() - Method in class net.nutch.util.FibonacciHeap
Returns the number of objects in the heap.
size() - Method in class net.nutch.util.SoftHashMap
 
skip(DataInput) - Static method in class net.nutch.io.UTF8
Skips over one UTF8 in the input.
skip(DataInput) - Static method in class net.nutch.parse.Outlink
Skips over one Outlink in the input.
sort(String, String) - Method in class net.nutch.io.SequenceFile.Sorter
Perform a file sort.
sort() - Method in class net.nutch.tools.ParseSegment
Sort ParserOutput
specialConstructor - Variable in class net.nutch.quality.dynamic.ParseException
This variable determines which constructor was used to create this object and thereby affects the semantics of the "getMessage" method (see below).
specialToken - Variable in class net.nutch.quality.dynamic.Token
This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token.
stage - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
stages - Static variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
start() - Method in class net.nutch.ipc.Server
Starts the service.
startBlock(Block) - Method in class net.nutch.ndfs.FSDataset
A Block b will be coming soon!
startDocument() - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
startFile(UTF8, UTF8, boolean) - Method in class net.nutch.ndfs.FSNamesystem
The client would like to create a new block for the indicated filename.
startLocalInput(File, File) - Method in class net.nutch.fs.LocalFileSystem
We can read directly from the real local fs.
startLocalInput(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Fetch remote NDFS file, place at tmpLocalFile
startLocalInput(File, File) - Method in class net.nutch.fs.NutchFileSystem
Returns a local File that the user can read from.
startLocalOutput(File, File) - Method in class net.nutch.fs.LocalFileSystem
We can write output directly to the final location
startLocalOutput(File, File) - Method in class net.nutch.fs.NDFSFileSystem
Output will go to the tmp working area.
startLocalOutput(File, File) - Method in class net.nutch.fs.NutchFileSystem
Returns a local File that the user can write output to.
startProcessing(RequestContext) - Method in class net.nutch.clustering.carrot2.LocalNutchInputComponent
A callback hook that starts the processing.
startTime - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
startUp() - Method in class net.nutch.plugin.Plugin
Will be invoked at plugin start up.
started - Variable in class net.nutch.segment.SegmentReader
The time when fetching of this segment started, as recorded in fetcher output data.
staticFlag - Static variable in class net.nutch.quality.dynamic.SimpleCharStream
 
status() - Method in class net.nutch.fetcher.Fetcher
Display the status of the fetcher run.
status() - Method in class net.nutch.tools.ParseSegment
Display the status of the parser run.
stop() - Method in class net.nutch.ipc.Client
Stop all threads related to this client.
stop() - Method in class net.nutch.ipc.Server
Stops the service.
styleList(List) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
subclasses(String) - Method in interface net.nutch.ontology.Ontology
 
subclasses(String) - Method in class net.nutch.ontology.OntologyImpl
Retrieves all subclasses of the entity(ies) hashed to searchTerm.
success() - Method in class net.nutch.ndfs.FSResults
Whether the call worked.
synonyms(String) - Method in interface net.nutch.ontology.Ontology
 
synonyms(String) - Method in class net.nutch.ontology.OntologyImpl
Retrieves synonyms from WordNet via sweet's web interface.

T

TestClient - class net.nutch.fs.TestClient.
This class provides some NDFS administrative access.
TestClient(NutchFileSystem) - Constructor for class net.nutch.fs.TestClient
 
TextInputFormat - class net.nutch.mapReduce.TextInputFormat.
An InputFormat for plain text files.
TextInputFormat() - Constructor for class net.nutch.mapReduce.TextInputFormat
 
TextParser - class net.nutch.parse.text.TextParser.
 
TextParser() - Constructor for class net.nutch.parse.text.TextParser
 
ThreadPool - class net.nutch.util.ThreadPool.
ThreadPool maintains a large set of threads, which can be dedicated to a certain task, and then recycled.
ThreadPool(int) - Constructor for class net.nutch.util.ThreadPool
Creates a pool of numThreads size.
Token - class net.nutch.quality.dynamic.Token.
Describes the input token stream.
Token() - Constructor for class net.nutch.quality.dynamic.Token
 
TokenMgrError - error net.nutch.quality.dynamic.TokenMgrError.
 
TokenMgrError() - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(String, int) - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(boolean, int, int, int, String, char, int) - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TrieStringMatcher - class net.nutch.util.TrieStringMatcher.
TrieStringMatcher is a base class for simple tree-based string matching.
TrieStringMatcher() - Constructor for class net.nutch.util.TrieStringMatcher
 
TrieStringMatcher.TrieNode - class net.nutch.util.TrieStringMatcher.TrieNode.
Node class for the character tree.
TwoDArrayWritable - class net.nutch.io.TwoDArrayWritable.
A Writable for 2D arrays containing a matrix of instances of a class.
TwoDArrayWritable(Class) - Constructor for class net.nutch.io.TwoDArrayWritable
 
TwoDArrayWritable(Class, Writable[][]) - Constructor for class net.nutch.io.TwoDArrayWritable
 
targetHasOutlink() - Method in class net.nutch.db.Link
 
term() - Method in class net.nutch.analysis.NutchAnalysis
Parse a single term.
terminal - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
text(String, String, int) - Method in class net.nutch.parse.rtf.RTFParserDelegateImpl
 
textDump(String) - Method in class net.nutch.tools.WebDBAdminTool
Emit the webdb to 2 text files.
toArray() - Method in class net.nutch.io.ArrayWritable
 
toArray() - Method in class net.nutch.io.TwoDArrayWritable
 
toContent() - Method in class net.nutch.protocol.file.FileResponse
 
toContent() - Method in class net.nutch.protocol.ftp.FtpResponse
 
toContent() - Method in class net.nutch.protocol.http.HttpResponse
 
toDate(String) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toHtml() - Method in class net.nutch.searcher.HitDetails
Display as HTML.
toLong(String) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toString() - Method in class net.nutch.analysis.lang.NGramProfile
Return ngramprofile as text
toString() - Method in class net.nutch.db.Link
Print out the record
toString() - Method in class net.nutch.db.Page
Print out the Page
toString() - Method in class net.nutch.fetcher.Fetcher.FetcherStatus
 
toString() - Method in class net.nutch.fetcher.FetcherOutput
 
toString() - Method in class net.nutch.fs.LocalFileSystem
 
toString() - Method in class net.nutch.fs.NDFSFileSystem
 
toString() - Method in class net.nutch.io.IntWritable
 
toString() - Method in class net.nutch.io.LongWritable
 
toString() - Method in class net.nutch.io.MD5Hash
Returns a string representation of this object.
toString() - Method in class net.nutch.io.SequenceFile.Reader
Returns the name of the file.
toString() - Method in class net.nutch.io.UTF8
Convert to a String.
toString() - Method in class net.nutch.io.VersionMismatchException
Returns a string representation of this object.
toString() - Method in class net.nutch.ndfs.Block
 
toString() - Method in class net.nutch.ndfs.DatanodeInfo
 
toString(Date) - Static method in class net.nutch.net.protocols.HttpDateFormat
Get the HTTP format of the specified date.
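For reference, the usual HTTP date layout (for example "Thu, 01 Jan 1970 00:00:00 GMT") can be produced and parsed with java.text.SimpleDateFormat. The sketch below assumes that layout; HttpDateSketch and its method names are hypothetical and do not claim to match the HttpDateFormat implementation.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: formatting and parsing dates in the common HTTP layout.
class HttpDateSketch {
    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat f =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
    }

    // Formats milliseconds since the epoch as an HTTP date string.
    static String format(long millis) {
        return newFormat().format(new Date(millis));
    }

    // Parses an HTTP date string back into milliseconds since the epoch.
    static long parseMillis(String text) {
        try {
            return newFormat().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not an HTTP date: " + text, e);
        }
    }
}
```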
toString(Calendar) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toString(long) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toString() - Method in class net.nutch.pagedb.FetchListEntry
 
toString() - Method in class net.nutch.parse.Outlink
 
toString() - Method in class net.nutch.parse.ParseData
 
toString() - Method in class net.nutch.parse.ParseText
 
toString() - Method in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
toString() - Method in class net.nutch.parse.msword.WordTextBuffer
 
toString() - Method in class net.nutch.protocol.Content
 
toString() - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
 
toString() - Method in class net.nutch.quality.dynamic.Token
Returns the image.
toString() - Method in class net.nutch.searcher.Hit
Display as a string.
toString() - Method in class net.nutch.searcher.HitDetails
Display as a string.
toString() - Method in class net.nutch.searcher.Query.Clause
 
toString() - Method in class net.nutch.searcher.Query.Phrase
 
toString() - Method in class net.nutch.searcher.Query.Term
 
toString() - Method in class net.nutch.searcher.Query
 
toString() - Method in class net.nutch.searcher.Summary.Ellipsis
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary.Fragment
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary.Highlight
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary
Returns an HTML representation of this summary.
toStrings() - Method in class net.nutch.io.ArrayWritable
 
toTabbedString() - Method in class net.nutch.db.Link
Get a tab-delimited version of the text data.
toTabbedString() - Method in class net.nutch.db.Page
A tab-delimited text version of the Page's data.
token - Variable in class net.nutch.analysis.NutchAnalysis
 
token - Variable in class net.nutch.quality.dynamic.PageDescription
 
tokenImage - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
tokenImage - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
tokenImage - Variable in class net.nutch.quality.dynamic.ParseException
This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred.
tokenStream(String, Reader) - Method in class net.nutch.analysis.NutchDocumentAnalyzer
Returns a new token stream for text from the named field.
token_source - Variable in class net.nutch.analysis.NutchAnalysis
 
token_source - Variable in class net.nutch.quality.dynamic.PageDescription
 
totalCapacity() - Method in class net.nutch.ndfs.FSNamesystem
Total raw bytes
totalIsExact() - Method in class net.nutch.searcher.Hits
True if Hits.getTotal() gives the exact number of hits, or false if it is only an estimate of the total number of hits.
totalRawCapacity() - Method in class net.nutch.ndfs.NDFSClient
 
totalRawUsed() - Method in class net.nutch.ndfs.NDFSClient
 
totalRecords - Variable in class net.nutch.tools.SegmentMergeTool.SegmentMergeStatus
 
totalRemaining() - Method in class net.nutch.ndfs.FSNamesystem
Total non-used raw bytes
tryagain() - Method in class net.nutch.ndfs.FSResults
Whether the client should give it another shot

U

UNQUOTED_VALUE - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
URLFilter - interface net.nutch.net.URLFilter.
Interface used to limit which URLs enter Nutch.
URLFilterFactory - class net.nutch.net.URLFilterFactory.
Factory to create a URLFilter from "urlfilter.class" config property.
URL_KEYSPACE - Static variable in class net.nutch.db.EditSectionGroupWriter
 
URL_KEYSPACE_DIVIDERS - Static variable in class net.nutch.db.DBKeyDivision
 
UTF8 - class net.nutch.io.UTF8.
A WritableComparable for strings that uses the UTF8 encoding.
UTF8() - Constructor for class net.nutch.io.UTF8
 
UTF8(String) - Constructor for class net.nutch.io.UTF8
Construct from a given string.
UTF8(UTF8) - Constructor for class net.nutch.io.UTF8
Construct from a given string.
UTF8.Comparator - class net.nutch.io.UTF8.Comparator.
A WritableComparator optimized for UTF8 keys.
UTF8.Comparator() - Constructor for class net.nutch.io.UTF8.Comparator
 
UpdateDatabaseTool - class net.nutch.tools.UpdateDatabaseTool.
This class takes the output of the fetcher and updates the page and link DBs accordingly.
UpdateDatabaseTool(IWebDBWriter, boolean, int) - Constructor for class net.nutch.tools.UpdateDatabaseTool
Take in the WebDBWriter, instantiated elsewhere.
UpdateLineColumn(char) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
UrlNormalizer - interface net.nutch.net.UrlNormalizer.
Interface used to convert URLs to normal form and optionally do regex substitutions
UrlNormalizerFactory - class net.nutch.net.UrlNormalizerFactory.
Factory to create a UrlNormalizer from "urlnormalizer.class" config property.
unzip(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[], int) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array, truncated to sizeLimit bytes, if necessary.
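A "best effort" gunzip with a size limit can be sketched with java.util.zip.GZIPInputStream: decode until the limit is reached or the stream ends, and keep whatever was decoded if the input turns out to be damaged. GzipSketch and its methods are hypothetical; treating an IOException as "stop and return what we have" is an assumption about what best effort means, not a statement about GZIPUtils.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: size-limited, best-effort gunzip of a byte array.
class GzipSketch {
    // Helper used only to produce test input.
    static byte[] zip(byte[] in) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes at most sizeLimit bytes; on a corrupt or truncated stream,
    // returns the bytes decoded before the failure.
    static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            byte[] buf = new byte[1024];
            int written = 0;
            while (written < sizeLimit) {
                int n = gz.read(buf, 0, Math.min(buf.length, sizeLimit - written));
                if (n < 0) {
                    break; // end of compressed stream
                }
                out.write(buf, 0, n);
                written += n;
            }
        } catch (IOException e) {
            // Best effort: fall through with whatever has been decoded so far.
        }
        return out.toByteArray();
    }
}
```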
updateBlocks(Block[]) - Method in class net.nutch.ndfs.DatanodeInfo
 
updateForSegment(NutchFileSystem, String) - Method in class net.nutch.tools.UpdateDatabaseTool
Iterate through items in the FetcherOutput.
updateHeartbeat(long, long) - Method in class net.nutch.ndfs.DatanodeInfo
 
updateObsoleteCheck() - Method in class net.nutch.ndfs.DatanodeInfo
 
urlCompare(Object) - Method in class net.nutch.db.Link
Compare URLs, then compare MD5s.

V

VersionMismatchException - exception net.nutch.io.VersionMismatchException.
Thrown by VersionedWritable.readFields(DataInput) when the version of an object being read does not match the current implementation version as returned by VersionedWritable.getVersion().
VersionMismatchException(byte, byte) - Constructor for class net.nutch.io.VersionMismatchException
 
VersionedWritable - class net.nutch.io.VersionedWritable.
A base class for Writables that provides version checking.
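The version check described for VersionedWritable and VersionMismatchException can be pictured as writing a single version byte ahead of the payload and comparing it on read. The sketch below assumes that layout; VersionCheckSketch, VersionMismatch, and the helper methods are hypothetical names, built only on java.io.DataInput and java.io.DataOutput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: a one-byte version header checked when reading.
class VersionCheckSketch {
    static final class VersionMismatch extends IOException {
        VersionMismatch(byte expected, byte found) {
            super("expected version " + expected + ", found " + found);
        }
    }

    // Writes the version byte; a real object would write its fields afterwards.
    static void writeVersion(DataOutput out, byte version) throws IOException {
        out.writeByte(version);
    }

    // Reads the version byte and rejects data written by another version.
    static void readVersion(DataInput in, byte expected) throws IOException {
        byte found = in.readByte();
        if (found != expected) {
            throw new VersionMismatch(expected, found);
        }
    }

    // Round-trips a version byte through memory and reports whether it is accepted.
    static boolean accepts(byte written, byte expected) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeVersion(new DataOutputStream(bytes), written);
            readVersion(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())), expected);
            return true;
        } catch (VersionMismatch e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```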
VersionedWritable() - Constructor for class net.nutch.io.VersionedWritable
 
value() - Method in class net.nutch.quality.dynamic.PageDescription
 
values() - Method in class net.nutch.util.SoftHashMap
Not Implemented

W

WHITE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WORD - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WORD_PUNCT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WRITE_COMPLETE - Static variable in interface net.nutch.ndfs.FSConstants
 
WRITE_METAINFO_PREFIX - Static variable in class net.nutch.db.EditSectionWriter
 
WebDBAdminTool - class net.nutch.tools.WebDBAdminTool.
The WebDBAdminTool is for Nutch administrators who need special access to the webdb.
WebDBAdminTool(IWebDBReader) - Constructor for class net.nutch.tools.WebDBAdminTool
 
WebDBInjector - class net.nutch.db.WebDBInjector.
This class takes a flat file of URLs and adds them as entries into a pagedb.
WebDBInjector(IWebDBWriter) - Constructor for class net.nutch.db.WebDBInjector
WebDBInjector takes a reference to a WebDBWriter that it should add to.
WebDBReader - class net.nutch.db.WebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
WebDBReader(NutchFileSystem, File) - Constructor for class net.nutch.db.WebDBReader
Open a web db reader for the named directory.
WebDBWriter - class net.nutch.db.WebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
WebDBWriter(NutchFileSystem, File) - Constructor for class net.nutch.db.WebDBWriter
Create a WebDBWriter.
WebDBWriter.LinkInstruction - class net.nutch.db.WebDBWriter.LinkInstruction.
Holds an instruction over a Link.
WebDBWriter.LinkInstruction() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction(Link, int) - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction.MD5Comparator - class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator.
Sorts the instruction first by Md5, then by opcode.
WebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
WebDBWriter.LinkInstruction.UrlComparator - class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
WebDBWriter.LinkInstruction.UrlComparator() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
 
WebDBWriter.LinkInstructionWriter - class net.nutch.db.WebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to a SequenceFile.Writer.
WebDBWriter.LinkInstructionWriter() - Constructor for class net.nutch.db.WebDBWriter.LinkInstructionWriter
 
WebDBWriter.PageInstruction - class net.nutch.db.WebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
WebDBWriter.PageInstruction() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, int) - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, Link, int) - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction.PageComparator - class net.nutch.db.WebDBWriter.PageInstruction.PageComparator.
Sorts the instruction first by Page, then by opcode.
WebDBWriter.PageInstruction.PageComparator() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction.PageComparator
 
WebDBWriter.PageInstruction.UrlComparator - class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
WebDBWriter.PageInstruction.UrlComparator() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
 
WebDBWriter.PageInstructionWriter - class net.nutch.db.WebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to a SequenceFile.Writer.
WebDBWriter.PageInstructionWriter() - Constructor for class net.nutch.db.WebDBWriter.PageInstructionWriter
 
Word6CHPBinTable - class net.nutch.parse.msword.chp.Word6CHPBinTable.
This class holds all of the character formatting properties from a Word 6.0/95 document.
Word6CHPBinTable(byte[], int, int, int) - Constructor for class net.nutch.parse.msword.chp.Word6CHPBinTable
Constructor used to read a binTable in from a Word document.
WordExtractor - class net.nutch.parse.msword.WordExtractor.
This class extracts the text from a Word 6.0/95/97/2000/XP document.
WordExtractor() - Constructor for class net.nutch.parse.msword.WordExtractor
Constructor
WordTextBuffer - class net.nutch.parse.msword.WordTextBuffer.
This class acts as a StringBuffer for text from a word document.
WordTextBuffer() - Constructor for class net.nutch.parse.msword.WordTextBuffer
 
Writable - interface net.nutch.io.Writable.
A simple, efficient, serialization protocol, based on DataInput and DataOutput.
WritableComparable - interface net.nutch.io.WritableComparable.
An interface which extends both Writable and Comparable.
WritableComparator - class net.nutch.io.WritableComparator.
A Comparator for WritableComparables.
WritableComparator(Class) - Constructor for class net.nutch.io.WritableComparator
Construct a comparator for a WritableComparable implementation.
WritableName - class net.nutch.io.WritableName.
Utility to permit renaming of Writable implementation classes without invalidating files that contain their class name.
WritableUtils - class net.nutch.io.WritableUtils.
 
WritableUtils() - Constructor for class net.nutch.io.WritableUtils
 
walk(Node, URL, Properties) - Static method in class org.creativecommons.nutch.CCParseFilter.Walker
Scan the document adding attributes to metadata.
write(DataOutput) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
write(DataOutput) - Method in class net.nutch.db.Link
Write bytes out to stream
write(DataOutput) - Method in class net.nutch.db.Page
Write the bytes out to the bytestream
write(DataOutput) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
write(DataOutput) - Method in class net.nutch.fetcher.FetcherOutput
 
write(DataOutput) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
write(DataOutput) - Method in class net.nutch.io.ArrayWritable
 
write(DataOutput) - Method in class net.nutch.io.BooleanWritable
 
write(DataOutput) - Method in class net.nutch.io.BytesWritable
 
write(DataInput, int) - Method in class net.nutch.io.DataOutputBuffer
Writes bytes from a DataInput directly into the buffer.
write(DataOutput) - Method in class net.nutch.io.IntWritable
 
write(DataOutput) - Method in class net.nutch.io.LongWritable
 
write(DataOutput) - Method in class net.nutch.io.MD5Hash
 
write(DataOutput) - Method in class net.nutch.io.NullWritable
 
write(DataOutput) - Method in class net.nutch.io.TwoDArrayWritable
 
write(DataOutput) - Method in class net.nutch.io.UTF8
 
write(DataOutput) - Method in class net.nutch.io.VersionedWritable
 
write(DataOutput) - Method in interface net.nutch.io.Writable
Writes the fields of this object to out.
write(DataOutput) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
write(DataOutput) - Method in class net.nutch.ndfs.Block
 
write(DataOutput) - Method in class net.nutch.ndfs.DatanodeInfo
 
write(DataOutput) - Method in class net.nutch.ndfs.FSParam
 
write(DataOutput) - Method in class net.nutch.ndfs.FSResults
 
write(DataOutput) - Method in class net.nutch.ndfs.HeartbeatData
 
write(DataOutput) - Method in class net.nutch.ndfs.NDFSFileInfo
 
write(DataOutput) - Method in class net.nutch.pagedb.FetchListEntry
 
write(DataOutput) - Method in class net.nutch.parse.Outlink
 
write(DataOutput) - Method in class net.nutch.parse.ParseData
 
write(DataOutput) - Method in class net.nutch.parse.ParseText
 
write(DataOutput) - Method in class net.nutch.protocol.Content
 
write(DataOutput) - Method in class net.nutch.searcher.DistributedSearch.Param
 
write(DataOutput) - Method in class net.nutch.searcher.DistributedSearch.Result
 
write(DataOutput) - Method in class net.nutch.searcher.Hit
 
write(DataOutput) - Method in class net.nutch.searcher.HitDetails
 
write(DataOutput) - Method in class net.nutch.searcher.Hits
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Clause
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Phrase
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Term
 
write(DataOutput) - Method in class net.nutch.searcher.Query
 
write(DataOutput) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
writeCompressedByteArray(DataOutput, byte[]) - Static method in class net.nutch.io.WritableUtils
 
writeCompressedString(DataOutput, String) - Static method in class net.nutch.io.WritableUtils
 
writeString(DataOutput, String) - Static method in class net.nutch.io.UTF8
Write a UTF-8 encoded string.
writeString(DataOutput, String) - Static method in class net.nutch.io.WritableUtils
 
writeStringArray(DataOutput, String[]) - Static method in class net.nutch.io.WritableUtils
 
writeToBlock(Block) - Method in class net.nutch.ndfs.FSDataset
Start writing to a block file

X

X_POINT_ID - Static variable in interface net.nutch.clustering.OnlineClusterer
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.indexer.IndexingFilter
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.ontology.Ontology
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.parse.HtmlParseFilter
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.parse.Parser
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.protocol.Protocol
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.searcher.QueryFilter
The name of the extension point.

Z

zip(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gzipped copy of the input array.

_

__openPassiveDataConnection(int, String) - Method in class net.nutch.protocol.ftp.Client
 

Copyright © 2005 The Nutch Organization.