public class Injector extends NutchTool implements Tool
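Note that some metadata keys are reserved (see nutchScoreMDName, nutchFetchIntervalMDName, and nutchFixedFetchIntervalMDName in the field summary).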
Example:
http://www.nutch.org/ \t nutch.score=10 \t nutch.fetchInterval=2592000 \t userType=open_source
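The seed line above can be read as a URL followed by `key=value` metadata fields. The following is a minimal sketch of such a parser, assuming that `\t` in the example stands for a tab character. The class `SeedLineParser`, its `Seed` holder, and the choice to skip malformed fields are hypothetical and are not taken from Nutch's InjectMapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for one seed-file line; not Nutch's actual InjectMapper code. */
class SeedLineParser {

    /** Holds the URL and the key=value metadata parsed from one line. */
    static final class Seed {
        final String url;
        final Map<String, String> metadata;

        Seed(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    /** Splits a line on tabs: the first field is the URL, the rest are key=value pairs. */
    static Seed parse(String line) {
        String[] fields = line.split("\t");
        String url = fields[0].trim();
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                continue; // assumption: skip fields that lack a key or an '='
            }
            metadata.put(field.substring(0, eq).trim(), field.substring(eq + 1).trim());
        }
        return new Seed(url, metadata);
    }

    public static void main(String[] args) {
        Seed seed = parse(
            "http://www.nutch.org/\tnutch.score=10\tnutch.fetchInterval=2592000\tuserType=open_source");
        System.out.println(seed.url + " " + seed.metadata);
    }
}
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, which seems the safer choice for free-form custom metadata.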
Modifier and Type | Class and Description |
---|---|
static class | Injector.InjectMapper: reads the CrawlDb that the seeds are injected into, and the plain-text seed files, parsing each line into the URL and metadata. |
static class | Injector.InjectReducer: combines multiple new entries for a URL. |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | nutchFetchIntervalMDName: metadata key reserved for setting a custom fetchInterval for a specific URL |
static java.lang.String | nutchFixedFetchIntervalMDName: metadata key reserved for setting a fixed custom fetchInterval for a specific URL |
static java.lang.String | nutchScoreMDName: metadata key reserved for setting a custom score for a specific URL |
static java.lang.String | URL_FILTER_NORMALIZE_ALL: property to pass the value of the command-line option -filterNormalizeAll to the mapper |
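The reserved keys listed above differ from ordinary metadata in that they set per-URL crawl parameters instead of being stored as custom fields. The sketch below illustrates that split under stated assumptions: the key strings `nutch.score` and `nutch.fetchInterval` come from the seed example, while `nutch.fetchInterval.fixed` is assumed to be the value behind nutchFixedFetchIntervalMDName. The class `ReservedSeedMetadata`, its `SeedSettings` holder, and the handling of unparsable numbers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative handling of reserved seed metadata keys; not Nutch's actual code. */
class ReservedSeedMetadata {
    // The first two key strings appear in the seed example; the third is an assumption.
    static final String SCORE_KEY = "nutch.score";
    static final String FETCH_INTERVAL_KEY = "nutch.fetchInterval";
    static final String FIXED_FETCH_INTERVAL_KEY = "nutch.fetchInterval.fixed";

    /** Per-URL settings extracted from the metadata; null means "not set". */
    static final class SeedSettings {
        Float score;
        Integer fetchInterval;       // seconds
        Integer fixedFetchInterval;  // seconds
        final Map<String, String> custom = new LinkedHashMap<>();
    }

    /** Routes reserved keys to typed settings and keeps all other keys as custom metadata. */
    static SeedSettings apply(Map<String, String> metadata) {
        SeedSettings settings = new SeedSettings();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String value = entry.getValue();
            try {
                switch (entry.getKey()) {
                    case SCORE_KEY:
                        settings.score = Float.valueOf(value);
                        break;
                    case FETCH_INTERVAL_KEY:
                        settings.fetchInterval = Integer.valueOf(value);
                        break;
                    case FIXED_FETCH_INTERVAL_KEY:
                        settings.fixedFetchInterval = Integer.valueOf(value);
                        break;
                    default:
                        settings.custom.put(entry.getKey(), value);
                }
            } catch (NumberFormatException e) {
                // assumption: a reserved key with an unparsable value is ignored
            }
        }
        return settings;
    }
}
```

For the seed example, this would yield a score of 10, a fetch interval of 2592000 seconds (30 days), and a single custom entry `userType=open_source`.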
Fields inherited from class NutchTool: currentJob, currentJobNum, numJobs, results, status
Constructor and Description |
---|
Injector() |
Injector(Configuration conf) |
Modifier and Type | Method and Description |
---|---|
void | inject(Path crawlDb, Path urlDir) |
void | inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update) |
void | inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update, boolean normalize, boolean filter, boolean filterNormalizeAll) |
static void | main(java.lang.String[] args) |
java.util.Map<java.lang.String,java.lang.Object> | run(java.util.Map<java.lang.String,java.lang.Object> args, java.lang.String crawlId): used by the Nutch REST service |
int | run(java.lang.String[] args) |
void | usage() |
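Methods inherited from class NutchTool: getProgress, getStatus, killJob, stopJob
Methods inherited from class Configured: getConf, setConf
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Configurable: getConf, setConf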
public static final java.lang.String URL_FILTER_NORMALIZE_ALL
public static java.lang.String nutchScoreMDName
public static java.lang.String nutchFetchIntervalMDName
public static java.lang.String nutchFixedFetchIntervalMDName
public Injector()
public Injector(Configuration conf)
public void inject(Path crawlDb, Path urlDir) throws java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
Throws: java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
public void inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update) throws java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
Throws: java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
public void inject(Path crawlDb, Path urlDir, boolean overwrite, boolean update, boolean normalize, boolean filter, boolean filterNormalizeAll) throws java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
Throws: java.io.IOException, java.lang.ClassNotFoundException, java.lang.InterruptedException
public void usage()
public static void main(java.lang.String[] args) throws java.lang.Exception
Throws: java.lang.Exception
public int run(java.lang.String[] args) throws java.lang.Exception
Copyright © 2019 The Apache Software Foundation