public class MimeTypeIndexingFilter extends java.lang.Object implements IndexingFilter
An IndexingFilter that allows filtering of documents based on the MIME type detected by Tika.

Modifier and Type | Field and Description |
---|---|
static java.lang.String | MIMEFILTER_REGEX_FILE |

Fields inherited from interface IndexingFilter: X_POINT_ID
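The class summary describes a filter that keeps or drops documents according to their detected MIME type. The sketch below illustrates that idea in isolation, assuming an allow-list of regular expressions. `MimeTypeGate`, `normalize`, and `accepts` are hypothetical names, and this page does not say whether the real filter uses an allow-list, a block-list, or some other matching scheme.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical allow-list gate: keeps a document only if its MIME type matches a pattern. */
final class MimeTypeGate {
    private final List<Pattern> allowed;

    MimeTypeGate(List<Pattern> allowed) {
        this.allowed = List.copyOf(allowed);
    }

    /** Strips parameters such as "; charset=UTF-8", trims, and lower-cases the type. */
    static String normalize(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns true if the normalized MIME type fully matches any allowed pattern. */
    boolean accepts(String contentType) {
        String mime = normalize(contentType);
        for (Pattern p : allowed) {
            if (p.matcher(mime).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

For example, a gate built from the patterns `text/.*` and `application/pdf` accepts `text/html; charset=UTF-8` and `Application/PDF`, and rejects `image/png`.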
Constructor and Description |
---|
MimeTypeIndexingFilter() |
Modifier and Type | Method and Description |
---|---|
NutchDocument | filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks): Adds fields or otherwise modifies the document that will be indexed for a parse. |
Configuration | getConf() |
static void | main(java.lang.String[] args): Main method for invoking this tool. |
void | setConf(Configuration conf) |
public static final java.lang.String MIMEFILTER_REGEX_FILE
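Judging by its name, `MIMEFILTER_REGEX_FILE` identifies the file that holds the MIME-type rules; this page gives neither its value nor the file format. Assuming one regular expression per line, with blank lines and `#` comments ignored, loading such a file could look like the following. `MimeRulesParser` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical parser for a MIME-type rules file: one regular expression per line. */
final class MimeRulesParser {
    private MimeRulesParser() {}

    /** Reads every non-blank, non-comment line and compiles it as a Pattern. */
    static List<Pattern> parse(Reader source) {
        List<Pattern> rules = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                // Assumed convention: skip blank lines and lines starting with '#'.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                rules.add(Pattern.compile(trimmed));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rules;
    }
}
```

Reading the lines `# comment`, an empty line, `text/.*`, and `application/pdf` yields two compiled patterns, which could then be handed to whatever component decides acceptance.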
public NutchDocument filter(NutchDocument doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) throws IndexingException

Description copied from interface: IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.

Specified by: filter in interface IndexingFilter

Parameters:
doc - document instance for collecting fields
parse - parse data instance
url - page url
datum - crawl datum for the page (fetch datum from segment containing fetch status and fetch time)
inlinks - page inlinks

Throws:
IndexingException
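`filter` receives the document being built and returns the document to index. In Nutch, an indexing filter conventionally returns `null` to discard a document, so a chain of filters stops at the first `null`. The sketch below models that contract with plain Java types; `DocFilter`, `FilterChain`, and the `Map`-based document are hypothetical stand-ins for `IndexingFilter` and `NutchDocument`.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an indexing filter: returns the document, or null to drop it. */
interface DocFilter {
    Map<String, Object> filter(Map<String, Object> doc, String mimeType);
}

/** Applies hypothetical DocFilters in order, honoring the null-means-discard convention. */
final class FilterChain {
    private final List<DocFilter> filters;

    FilterChain(List<DocFilter> filters) {
        this.filters = List.copyOf(filters);
    }

    /** Runs each filter in turn; a null result discards the document immediately. */
    Map<String, Object> run(Map<String, Object> doc, String mimeType) {
        Map<String, Object> current = doc;
        for (DocFilter f : filters) {
            current = f.filter(current, mimeType);
            if (current == null) {
                return null; // later filters never see a dropped document
            }
        }
        return current;
    }
}
```

With a MIME gate that passes only `text/*` types, followed by a filter that records the type as a field, `run` returns the tagged document for `text/html` and `null` for `image/png`; the second filter never runs for the dropped document.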
public void setConf(Configuration conf)

Specified by: setConf in interface Configurable

public Configuration getConf()

Specified by: getConf in interface Configurable
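`setConf` and `getConf` implement Hadoop's `Configurable` interface. A common pattern is to store the configuration in `setConf` and rebuild any derived state there, such as compiled rules, so that `getConf` simply returns what was stored. The sketch assumes that pattern; a `Map<String, String>` stands in for `Configuration`, and the property key `example.mime.rules` with comma-separated values is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical Configurable-style filter; a Map stands in for Hadoop's Configuration. */
final class ConfiguredMimeFilter {
    /** Hypothetical property holding comma-separated MIME regexes. */
    static final String RULES_KEY = "example.mime.rules";

    private Map<String, String> conf;
    private List<Pattern> rules = List.of();

    /** Stores the configuration and recompiles the rules derived from it. */
    void setConf(Map<String, String> conf) {
        this.conf = conf;
        List<Pattern> compiled = new ArrayList<>();
        for (String rule : conf.getOrDefault(RULES_KEY, "").split(",")) {
            if (!rule.isBlank()) {
                compiled.add(Pattern.compile(rule.trim()));
            }
        }
        this.rules = List.copyOf(compiled);
    }

    /** Returns the configuration last passed to setConf (null before the first call). */
    Map<String, String> getConf() {
        return conf;
    }

    /** Number of rules compiled by the most recent setConf call. */
    int ruleCount() {
        return rules.size();
    }
}
```

Rebuilding state inside `setConf` keeps the filter consistent if the framework reconfigures it, at the cost of recompiling the rules on every call.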
public static void main(java.lang.String[] args) throws java.io.IOException, IndexingException

Main method for invoking this tool.

Throws:
java.io.IOException
IndexingException
Copyright © 2018 The Apache Software Foundation