org.apache.nutch.crawl
Class Generator

java.lang.Object
  extended by org.apache.hadoop.conf.Configured
      extended by org.apache.nutch.crawl.Generator
All Implemented Interfaces:
Configurable

public class Generator
extends Configured

Generates a subset of a crawl db to fetch.
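The nested classes listed in the summary hint at how the subset is ordered. Generator.DecreasingFloatComparator, going by its name, orders float scores from high to low, and Generator.HashComparator sorts fetch lists by hash of URL. The standalone sketch below is not Nutch's code. It models both orderings with plain Java collections instead of Hadoop writables, and it uses String.hashCode() as an assumed stand-in for whatever hash the real HashComparator computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorSketch {
    // Orders scores from highest to lowest, as the name
    // DecreasingFloatComparator suggests.
    static final Comparator<Float> DECREASING_FLOAT = Comparator.reverseOrder();

    // Orders URLs by a hash of the URL string. String.hashCode() is an
    // assumed stand-in; the real HashComparator may hash differently.
    static final Comparator<String> BY_URL_HASH =
        Comparator.comparingInt(String::hashCode);

    public static void main(String[] args) {
        List<Float> scores = new ArrayList<>(Arrays.asList(0.5f, 2.0f, 1.0f));
        scores.sort(DECREASING_FLOAT);
        System.out.println(scores); // [2.0, 1.0, 0.5]

        List<String> urls = new ArrayList<>(Arrays.asList(
            "http://a.example/1", "http://a.example/2", "http://b.example/1"));
        urls.sort(BY_URL_HASH);
        System.out.println(urls); // order depends on the hash values
    }
}
```

Ordering by a hash instead of by URL text yields a pseudo-random order that tends to interleave URLs from different hosts rather than clustering them. That motivation is an inference from the one-line description, not documented behavior.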


Nested Class Summary
static class Generator.DecreasingFloatComparator
           
static class Generator.HashComparator
          Sort fetch lists by hash of URL.
static class Generator.Selector
          Selects entries due for fetch.
static class Generator.SelectorEntry
           
static class Generator.SelectorInverseMapper
           
 
Field Summary
static org.apache.commons.logging.Log LOG
           
 
Constructor Summary
Generator(Configuration conf)
          Construct a generator.
 
Method Summary
 Path generate(Path dbDir, Path segments)
          Generate fetchlists in a segment.
 Path generate(Path dbDir, Path segments, int numLists, long topN, long curTime)
          Generate fetchlists in a segment.
static String generateSegmentName()
           
static void main(String[] args)
          Generate a fetchlist from the pagedb and linkdb.
 
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final org.apache.commons.logging.Log LOG

Constructor Detail

Generator

public Generator(Configuration conf)
Construct a generator.

Method Detail

generate

public Path generate(Path dbDir,
                     Path segments)
              throws IOException
Generate fetchlists in a segment.

Throws:
IOException

generate

public Path generate(Path dbDir,
                     Path segments,
                     int numLists,
                     long topN,
                     long curTime)
              throws IOException
Generate fetchlists in a segment.

Throws:
IOException
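Going by the parameter names and the Generator.Selector description ("Selects entries due for fetch"), this overload appears to pick entries whose next fetch time has arrived by curTime, keep at most topN of them, and spread them across numLists fetch lists. The in-memory sketch below models that idea only. The Entry class and its fields, the highest-score-first order, and partitioning by a hash of the host are all assumptions for illustration; the real method works over the crawl db at dbDir and returns the Path of a new segment under segments.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SelectSketch {
    // Hypothetical crawl-db entry: URL, next fetch time (ms), and score.
    static final class Entry {
        final String url;
        final long fetchTime;
        final float score;

        Entry(String url, long fetchTime, float score) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.score = score;
        }
    }

    // Keeps entries due by curTime, takes the topN highest-scoring ones,
    // and assigns each to one of numLists lists by a hash of its host
    // (host partitioning is an assumption, not taken from the Javadoc).
    static List<List<String>> select(List<Entry> db, int numLists, long topN, long curTime) {
        List<Entry> due = new ArrayList<>();
        for (Entry e : db) {
            if (e.fetchTime <= curTime) {
                due.add(e);
            }
        }
        due.sort(Comparator.comparingDouble((Entry e) -> e.score).reversed());

        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < numLists; i++) {
            lists.add(new ArrayList<>());
        }
        long taken = 0;
        for (Entry e : due) {
            if (taken >= topN) {
                break;
            }
            String host = URI.create(e.url).getHost();
            int part = Math.floorMod(host.hashCode(), numLists);
            lists.get(part).add(e.url);
            taken++;
        }
        return lists;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        List<Entry> db = new ArrayList<>();
        db.add(new Entry("http://a.example/x", now - 10, 0.2f));
        db.add(new Entry("http://b.example/y", now - 5, 0.9f));
        db.add(new Entry("http://c.example/z", now + 500, 5.0f)); // not yet due
        System.out.println(select(db, 2, 10, now));
    }
}
```

Note that the entry for c.example is skipped despite having the highest score, because its fetch time lies after curTime; topN only limits how many of the due entries are kept.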

generateSegmentName

public static String generateSegmentName()
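generateSegmentName has no description on this page. Nutch segment directories are commonly named by their creation time, so the sketch below assumes a 14-digit yyyyMMddHHmmss pattern; check that against your Nutch version. It takes the time as a parameter and formats in UTC so the output is deterministic, whereas the real method takes no arguments and presumably uses the current time.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class SegmentNameSketch {
    // Formats a time as a 14-digit timestamp, e.g. 20060102030405.
    // The pattern is an assumption modeled on typical Nutch segment names.
    static String segmentName(long millis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // deterministic for the example
        return fmt.format(new Date(millis));
    }

    public static void main(String[] args) {
        System.out.println(segmentName(0L)); // 19700101000000
        System.out.println(segmentName(System.currentTimeMillis()));
    }
}
```

A fixed-width timestamp sorts lexicographically in creation order, so a plain directory listing of segments puts the newest one last. Two calls within the same second would collide under this pattern, so a caller generating names in quick succession needs some way to keep them distinct.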

main

public static void main(String[] args)
                 throws Exception
Generate a fetchlist from the pagedb and linkdb.

Throws:
Exception
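This page does not document the arguments main accepts. As an assumption modeled on the generate overloads, a command-line entry point would take a db directory and a segments directory plus optional values for topN and numLists. The flag names -topN and -numFetchers, the defaults, and the Options class below are hypothetical; consult the usage message your Nutch version prints before relying on any of them.

```java
class GeneratorArgsSketch {
    // Hypothetical parsed options mirroring the generate(...) parameters.
    static final class Options {
        String dbDir;
        String segments;
        long topN = Long.MAX_VALUE; // assumed default: no limit
        int numLists = 1;           // assumed default
    }

    // Parses: <dbDir> <segments> [-topN N] [-numFetchers N]
    // (flag names are assumptions, not confirmed by the Javadoc).
    static Options parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Usage: Generator <dbDir> <segments> [-topN N] [-numFetchers N]");
        }
        Options o = new Options();
        o.dbDir = args[0];
        o.segments = args[1];
        for (int i = 2; i < args.length; i++) {
            if ("-topN".equals(args[i]) && i + 1 < args.length) {
                o.topN = Long.parseLong(args[++i]);
            } else if ("-numFetchers".equals(args[i]) && i + 1 < args.length) {
                o.numLists = Integer.parseInt(args[++i]);
            } else {
                throw new IllegalArgumentException("Unknown argument: " + args[i]);
            }
        }
        return o;
    }

    public static void main(String[] args) {
        // Hypothetical paths for illustration.
        Options o = parse(new String[] {"crawl/db", "crawl/segments", "-topN", "1000"});
        System.out.println(o.dbDir + " " + o.segments + " topN=" + o.topN + " lists=" + o.numLists);
    }
}
```

The parsed values would then feed generate(dbDir, segments, numLists, topN, curTime), with curTime presumably defaulting to the current time.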


Copyright © 2006 The Apache Software Foundation