Package org.apache.nutch.tools

Interface Summary
PruneIndexTool.PruneChecker This interface can be used to implement additional checking on matching documents.
 

Class Summary
DmozParser Utility that converts DMOZ RDF into a flat file of URLs to be injected.
PruneIndexTool This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool.PrintFieldsChecker This checker prints selected field values from each document just before the document is deleted.
PruneIndexTool.StoreUrlsChecker This checker stores the URL of each document to be deleted in a text file.
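
As a rough illustration of the checker idea summarized above (extra checks run on each matching document before it is pruned), the following Java sketch may help. It does not use the Nutch or Lucene classes. The Checker interface and its method signature, the in-memory document maps, the field names "url" and "content", and both checker classes are hypothetical and exist only for this example; they are not the actual PruneIndexTool.PruneChecker API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PruneSketch {

    /** Hypothetical stand-in for a checker; NOT the real PruneChecker signature. */
    interface Checker {
        /** Returns true if the matching document may be deleted. */
        boolean isPrunable(Map<String, String> doc);
    }

    /** Records the "url" field of each document it is asked about, then allows deletion. */
    static class UrlCollectingChecker implements Checker {
        final List<String> urls = new ArrayList<>();

        @Override
        public boolean isPrunable(Map<String, String> doc) {
            urls.add(doc.get("url"));
            return true;
        }
    }

    /** Vetoes deletion of documents whose "url" starts with a protected prefix. */
    static class KeepPrefixChecker implements Checker {
        private final String prefix;

        KeepPrefixChecker(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public boolean isPrunable(Map<String, String> doc) {
            String url = doc.get("url");
            return url == null || !url.startsWith(prefix);
        }
    }

    /** Builds a toy document with the two assumed fields. */
    static Map<String, String> doc(String url, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("url", url);
        d.put("content", content);
        return d;
    }

    /**
     * Deletes every document whose content contains unwantedTerm, unless a
     * checker vetoes it. Checkers run in order and stop at the first veto.
     * Returns the number of documents deleted.
     */
    static int prune(List<Map<String, String>> index, String unwantedTerm,
                     List<Checker> checkers) {
        int deleted = 0;
        Iterator<Map<String, String>> it = index.iterator();
        while (it.hasNext()) {
            Map<String, String> d = it.next();
            if (!d.getOrDefault("content", "").contains(unwantedTerm)) {
                continue; // not a match, so never a pruning candidate
            }
            boolean prunable = true;
            for (Checker c : checkers) {
                if (!c.isPrunable(d)) {
                    prunable = false;
                    break;
                }
            }
            if (prunable) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        index.add(doc("http://a.example/1", "spam offer"));
        index.add(doc("http://keep.example/2", "spam digest"));
        index.add(doc("http://b.example/3", "useful text"));

        UrlCollectingChecker collector = new UrlCollectingChecker();
        List<Checker> checkers = new ArrayList<>();
        checkers.add(new KeepPrefixChecker("http://keep.example/"));
        checkers.add(collector);

        int deleted = prune(index, "spam", checkers);
        System.out.println("deleted=" + deleted + " urls=" + collector.urls);
    }
}
```

In this sketch the vetoing checker is placed before the collecting one, so only documents that are actually deleted end up in the collected URL list. Whether the real tool orders or combines its checkers this way is not stated here.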
 



Copyright © 2006 The Apache Software Foundation