Nutch 0.7.1 API

Nutch is the open-source search engine.


Core
org.apache.nutch.analysis Tokenizer for documents and query parser.
org.apache.nutch.clustering  
org.apache.nutch.clustering.carrot2  
org.apache.nutch.db Web database: tracks page fetches and link structure.
org.apache.nutch.fetcher The Nutch robot.
org.apache.nutch.fs  
org.apache.nutch.html  
org.apache.nutch.indexer Maintains Lucene full-text indexes.
org.apache.nutch.io Generic I/O code for use when reading and writing data to the network, to databases, and to files.
org.apache.nutch.ipc Client/Server code used by distributed search.
org.apache.nutch.linkdb  
org.apache.nutch.mapReduce A system for scalable, fault-tolerant, distributed computation over large data collections.
org.apache.nutch.mapReduce.demo  
org.apache.nutch.mapReduce.lib Library of generally useful mappers, reducers, and partitioners.
org.apache.nutch.ndfs  
org.apache.nutch.net A URL filter plugin.
org.apache.nutch.net.protocols  
org.apache.nutch.ontology  
org.apache.nutch.pagedb  
org.apache.nutch.parse  
org.apache.nutch.plugin  
org.apache.nutch.protocol  
org.apache.nutch.quality.dynamic  
org.apache.nutch.searcher Search API
org.apache.nutch.searcher.more A more query plugin.
org.apache.nutch.segment  
org.apache.nutch.servlet  
org.apache.nutch.tools  
org.apache.nutch.util  
org.apache.nutch.util.mime

Plugins
org.apache.nutch.analysis.lang Text document language identifier.
org.apache.nutch.indexer.basic A basic indexing plugin.
org.apache.nutch.indexer.more A more indexing plugin.
org.apache.nutch.parse.html An HTML document parsing plugin.
org.apache.nutch.parse.js  
org.apache.nutch.parse.msword A Word document parsing plugin.
org.apache.nutch.parse.msword.chp  
org.apache.nutch.parse.pdf A PDF parsing plugin.
org.apache.nutch.parse.text A plain text parsing plugin.
org.apache.nutch.protocol.file Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the FTP protocol.
org.apache.nutch.protocol.http Protocol plugin which supports retrieving documents via the HTTP protocol.
org.apache.nutch.protocol.httpclient Protocol plugin which supports retrieving documents via the HTTP protocol.
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata.


Copyright © 2005 The Apache Software Foundation