net.nutch.net
Class RegexURLFilter

java.lang.Object
  extended by net.nutch.net.RegexURLFilter
All Implemented Interfaces:
URLFilter

public class RegexURLFilter
extends Object
implements URLFilter

Filters URLs based on a file of regular expressions. The config file is named by the Nutch configuration property "urlfilter.regex.file".

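Each line of this file has the format:

 [+-]<regex>

where a leading plus sign means a URL matching the expression is accepted (go ahead and index it) and a leading minus sign means it is rejected.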
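The rule format can be illustrated with a minimal sketch. It uses `java.util.regex` instead of the Jakarta ORO library that `RegexURLFilter` depends on, and the class name `RegexRuleSketch` is hypothetical. Several behaviours are assumptions, not facts documented on this page: blank and `#` comment lines are skipped, rules are tried in file order and the first match decides, a match anywhere in the URL counts, and a URL that matches no rule is rejected (`filter` returns `null`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a "[+-]<regex>" rule list, using java.util.regex
 * in place of Jakarta ORO. Assumptions (not stated on this page): blank
 * lines and lines starting with '#' are skipped, the first matching rule
 * decides, and a URL that matches no rule is rejected.
 */
class RegexRuleSketch {

  /** One parsed rule: its sign and its compiled pattern. */
  private static final class Rule {
    final boolean accept;
    final Pattern pattern;

    Rule(boolean accept, Pattern pattern) {
      this.accept = accept;
      this.pattern = pattern;
    }
  }

  private final List<Rule> rules = new ArrayList<>();

  /** Parses rule text, one rule per line. */
  RegexRuleSketch(String ruleText) {
    for (String raw : ruleText.split("\\r?\\n")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#")) {
        continue; // assumed: blank lines and comments carry no rule
      }
      char sign = line.charAt(0);
      if (sign != '+' && sign != '-') {
        throw new IllegalArgumentException("Rule must start with + or -: " + line);
      }
      // Pattern.compile throws PatternSyntaxException for a bad regex,
      // the java.util.regex counterpart of ORO's MalformedPatternException.
      rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
    }
  }

  /** Returns the URL if the first matching rule is '+', otherwise null. */
  String filter(String url) {
    for (Rule rule : rules) {
      if (rule.pattern.matcher(url).find()) { // match anywhere in the URL
        return rule.accept ? url : null;
      }
    }
    return null; // assumed default: no matching rule means reject
  }
}
```

With the rules `-\.(gif|jpg)$` followed by `+^http://([a-z0-9]*\.)*apache\.org/`, this sketch rejects `http://www.apache.org/logo.gif` because the minus rule is tried first, accepts `http://lucene.apache.org/index.html`, and rejects `http://example.com/`, which matches neither rule. Under first-match semantics, exclusions meant to override a broad plus rule must come before it.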


Constructor Summary
RegexURLFilter()
RegexURLFilter(String filename)

Method Summary
String filter(String url)
static void main(String[] args)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

RegexURLFilter

public RegexURLFilter()
               throws IOException,
                      org.apache.oro.text.regex.MalformedPatternException

RegexURLFilter

public RegexURLFilter(String filename)
               throws IOException,
                      org.apache.oro.text.regex.MalformedPatternException

Method Detail

filter

public String filter(String url)
Specified by:
filter in interface URLFilter
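This page documents only the signature `String filter(String url)`, specified by the `URLFilter` interface. The sketch below assumes the convention that a filter returns the URL (possibly rewritten) to accept it and `null` to reject it, and shows how a caller might apply several such filters in sequence. `UrlFilterLike` and `FilterChainSketch` are hypothetical stand-ins, not Nutch classes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in mirroring the single method documented for URLFilter. */
interface UrlFilterLike {
  String filter(String url);
}

/** Hypothetical helper: runs filters in order, stopping at the first rejection. */
final class FilterChainSketch {
  private final List<UrlFilterLike> filters;

  FilterChainSketch(UrlFilterLike... filters) {
    this.filters = Arrays.asList(filters);
  }

  /** Returns the URL as transformed by every filter, or null if any filter rejects it. */
  String filter(String url) {
    String current = url;
    for (UrlFilterLike f : filters) {
      current = f.filter(current); // each filter sees the previous filter's output
      if (current == null) {
        return null; // assumed convention: null means "rejected"
      }
    }
    return current;
  }
}
```

Because each filter receives the previous filter's output, a rewriting filter placed early changes what later filters see, so ordering is part of the configuration.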

main

public static void main(String[] args)
                 throws IOException,
                        org.apache.oro.text.regex.MalformedPatternException
Throws:
IOException
org.apache.oro.text.regex.MalformedPatternException


Copyright © 2005 The Nutch Organization.