net.nutch.net
Class RegexURLFilter
java.lang.Object
net.nutch.net.RegexURLFilter
- All Implemented Interfaces:
- URLFilter
- public class RegexURLFilter
- extends Object
- implements URLFilter
Filters URLs based on a file of regular expressions. The config file is
named by the Nutch configuration property "urlfilter.regex.file".
The format of this file is:
[+-]<regex>
where a leading plus means a URL matching the regex is accepted (go ahead
and index it) and a leading minus means it is rejected.
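
The rule format described above can be illustrated with a minimal, self-contained sketch. This is not the Nutch implementation: the class name RegexRuleSketch is hypothetical, java.util.regex stands in for the ORO library named in the signatures below, and three behaviors are assumptions rather than documented facts: rules are tried in file order with the first match deciding, a rejected or unmatched URL yields null, and blank lines and lines starting with '#' are skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration only; not the Nutch class. */
final class RegexRuleSketch {

    /** One parsed rule: its sign and its compiled pattern. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses lines of the form [+-]<regex>. */
    RegexRuleSketch(List<String> lines) {
        for (String line : lines) {
            // Assumption: blank lines and '#' comment lines are ignored.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException(
                        "Rule must start with '+' or '-': " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /**
     * Assumption: the first rule whose regex is found in the URL decides.
     * Returns the URL if that rule is '+', otherwise null. A URL that
     * matches no rule is also rejected with null.
     */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        RegexRuleSketch sketch = new RegexRuleSketch(Arrays.asList(
                "# skip common image suffixes",
                "-\\.(gif|jpg|png)$",
                "+^http://example\\.org/"));

        System.out.println(sketch.filter("http://example.org/index.html")); // the URL itself
        System.out.println(sketch.filter("http://example.org/logo.png"));   // null
        System.out.println(sketch.filter("http://other.example.com/"));     // null
    }
}
```

Ordering matters under the first-match assumption: placing the minus rule for image suffixes before the plus rule for the host is what keeps logo.png out even though its host is otherwise accepted.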
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RegexURLFilter
public RegexURLFilter()
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
RegexURLFilter
public RegexURLFilter(String filename)
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
filter
public String filter(String url)
- Specified by:
filter
in interface URLFilter
main
public static void main(String[] args)
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
- Throws:
IOException
org.apache.oro.text.regex.MalformedPatternException
Copyright © 2005 The Nutch Organization.