org.apache.nutch.crawl (nutch-1.0)
public class Generator
java.lang.Object
   org.apache.hadoop.conf.Configured
      org.apache.nutch.crawl.Generator

All Implemented Interfaces:
    org.apache.hadoop.util.Tool

Generates a subset of a crawl db to fetch.
Nested Class Summary:
public static class  Generator.SelectorEntry   
public static class  Generator.Selector  Selects entries due for fetch. 
public static class  Generator.DecreasingFloatComparator   
public static class  Generator.SelectorInverseMapper   
public static class  Generator.PartitionReducer   
public static class  Generator.HashComparator  Sort fetch lists by hash of URL. 
public static class  Generator.CrawlDbUpdater  Update the CrawlDB so that the next generate won't include the same URLs. 
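To illustrate how `Generator.Selector`, `Generator.DecreasingFloatComparator`, and `Generator.CrawlDbUpdater` fit together, here is a minimal in-memory sketch in plain Java. It assumes each CrawlDb entry carries a URL, a next-fetch time, and a score, and that selection means: keep entries whose fetch time is not later than `curTime`, sort them by decreasing score, and keep the first `topN`. The `CRAWL_GEN_DELAY` field listed below and the `CrawlDbUpdater` description suggest that generated entries are marked so a later generate skips them for a while; the `generateTime` field and the delay check model that as an assumption. `Entry`, `isEligible`, and `select` are hypothetical names. Judging by names such as `SelectorInverseMapper` and `PartitionReducer`, the real classes work as Hadoop map and reduce steps rather than over an in-memory list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SelectorSketch {

    /** Hypothetical stand-in for a CrawlDb entry. */
    static final class Entry {
        final String url;
        final long fetchTime;        // earliest time (ms) the URL is due for fetch
        final float score;           // sort value; higher is selected first
        long generateTime = -1;      // when last put on a fetch list; -1 = never

        Entry(String url, long fetchTime, float score) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.score = score;
        }
    }

    /** True if the entry is due and was not already generated within genDelay. */
    static boolean isEligible(Entry e, long curTime, long genDelay) {
        if (e.fetchTime > curTime) {
            return false;                                  // not yet due for fetch
        }
        return e.generateTime < 0 || e.generateTime + genDelay <= curTime;
    }

    /**
     * Selects up to topN eligible entries in decreasing score order, then marks
     * them as generated so an immediate second call skips them.
     */
    static List<Entry> select(List<Entry> db, long curTime, long topN, long genDelay) {
        List<Entry> due = new ArrayList<>();
        for (Entry e : db) {
            if (isEligible(e, curTime, genDelay)) {
                due.add(e);
            }
        }
        // Highest score first, in the spirit of DecreasingFloatComparator.
        due.sort(Comparator.comparingDouble((Entry e) -> e.score).reversed());
        List<Entry> picked = new ArrayList<>(due.subList(0, (int) Math.min(topN, due.size())));
        for (Entry e : picked) {
            e.generateTime = curTime;                      // the CrawlDbUpdater step
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Entry> db = List.of(
                new Entry("http://a.example/1", 1_000L, 0.5f),
                new Entry("http://b.example/1", 5_000L, 2.0f),   // not due at curTime 2000
                new Entry("http://c.example/1", 1_500L, 1.5f),
                new Entry("http://d.example/1", 2_000L, 0.1f));
        long genDelay = 10_000L;
        for (Entry e : select(db, 2_000L, 2, genDelay)) {
            System.out.println(e.url);                     // c, then a
        }
        // A second run right away skips c and a, leaving only d.
        System.out.println(select(db, 2_000L, 2, genDelay).size());
    }
}
```

Sorting before truncating is what makes `topN` keep the highest-scoring due URLs; the sketch uses a fixed `score` field in place of whatever sort value the real `Selector` computes.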
Field Summary:
public static final  String CRAWL_GENERATE_FILTER     
public static final  String GENERATE_MAX_PER_HOST_BY_IP     
public static final  String GENERATE_MAX_PER_HOST     
public static final  String GENERATE_UPDATE_CRAWLDB     
public static final  String CRAWL_TOP_N     
public static final  String CRAWL_GEN_CUR_TIME     
public static final  String CRAWL_GEN_DELAY     
public static final  Log LOG     
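The `GENERATE_MAX_PER_HOST` and `GENERATE_MAX_PER_HOST_BY_IP` constants appear to name configuration properties that cap how many URLs a single host contributes to a fetch list. The sketch below applies such a cap to an already-sorted list of URLs, keyed by host name. The class and method names, the convention that a negative `maxPerHost` means "no limit", and dropping URLs without a parseable host are all assumptions for illustration; resolving hosts to IP addresses, which the `_BY_IP` variant suggests, is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MaxPerHostSketch {

    /** Lower-cased host of a URL, or null if it has none or cannot be parsed. */
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    /**
     * Walks URLs in priority order and keeps at most maxPerHost per host.
     * A negative maxPerHost disables the cap (assumed convention).
     * URLs whose host cannot be determined are dropped.
     */
    static List<String> capPerHost(List<String> sortedUrls, int maxPerHost) {
        if (maxPerHost < 0) {
            return new ArrayList<>(sortedUrls);
        }
        Map<String, Integer> counts = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : sortedUrls) {
            String host = hostOf(url);
            if (host == null) {
                continue;
            }
            int n = counts.getOrDefault(host, 0);
            if (n < maxPerHost) {
                counts.put(host, n + 1);
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://a.example/1", "http://a.example/2",
                "http://b.example/1", "http://a.example/3");
        // a.example/3 is dropped because a.example already has 2 URLs.
        System.out.println(capPerHost(urls, 2));
    }
}
```

Because the input is assumed to be sorted by priority, the cap keeps each host's highest-priority URLs and drops the rest.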
Constructors:
 public Generator() 
 public Generator(Configuration conf) 
Method Summary (org.apache.nutch.crawl.Generator):
generate,   generate,   generateSegmentName,   main,   run
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Method Detail (org.apache.nutch.crawl.Generator):
 public Path generate(Path dbDir,
    Path segments,
    int numLists,
    long topN,
    long curTime) throws IOException 
    Generate fetchlists in a segment. Whether to filter URLs is read from the crawl.generate.filter property in the configuration files; if the property is not set, URLs are filtered.
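The "filter unless told otherwise" default described for this overload can be sketched as follows. A `java.util.Properties` object stands in for the Hadoop `Configuration` the real class reads from (where `Configuration.getBoolean(name, defaultValue)` would play this role), so the example runs without Hadoop. Treating only an explicit `false` as "do not filter" is a choice made for this sketch; `FilterFlagSketch` and `shouldFilter` are hypothetical names.

```java
import java.util.Properties;

public class FilterFlagSketch {

    static final String CRAWL_GENERATE_FILTER = "crawl.generate.filter";

    /** Returns whether to filter URLs; an absent property means true. */
    static boolean shouldFilter(Properties conf) {
        String value = conf.getProperty(CRAWL_GENERATE_FILTER);
        if (value == null) {
            return true;                     // property not found: filter URLs
        }
        // Only an explicit "false" turns filtering off; anything else keeps the default.
        return !value.trim().equalsIgnoreCase("false");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(shouldFilter(conf));   // true
        conf.setProperty(CRAWL_GENERATE_FILTER, "false");
        System.out.println(shouldFilter(conf));   // false
    }
}
```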
 public Path generate(Path dbDir,
    Path segments,
    int numLists,
    long topN,
    long curTime,
    boolean filter,
    boolean force) throws IOException 
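    Generate fetchlists in a segment. Unlike the five-argument overload, whether to filter URLs is given by the filter argument instead of being read from the crawl.generate.filter property.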
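The `numLists` argument sets how many fetch lists the segment is split into. One common approach, sketched here as an assumption, is to partition by host so that all URLs of one host land in the same list (keeping per-host politeness within a single fetcher), then order each list by a hash of the URL in the spirit of `Generator.HashComparator`. `String.hashCode()` stands in for whatever hash the real comparator uses, and `PartitionSketch`, `partition`, and `buildFetchLists` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PartitionSketch {

    /** Host of the URL, or the whole URL if no host can be parsed. */
    static String hostKey(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null ? host : url;
        } catch (URISyntaxException e) {
            return url;
        }
    }

    /** Maps a URL to a list index in [0, numLists) using its host. */
    static int partition(String url, int numLists) {
        // Mask the sign bit so negative hash codes still give a valid index.
        return (hostKey(url).hashCode() & Integer.MAX_VALUE) % numLists;
    }

    static List<List<String>> buildFetchLists(List<String> urls, int numLists) {
        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < numLists; i++) {
            lists.add(new ArrayList<>());
        }
        for (String url : urls) {
            lists.get(partition(url, numLists)).add(url);
        }
        // Order each list by URL hash, which tends to interleave hosts
        // instead of keeping each host's URLs next to each other.
        for (List<String> list : lists) {
            list.sort(Comparator.comparingInt(String::hashCode));
        }
        return lists;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
                "http://a.example/1", "http://b.example/1",
                "http://a.example/2", "http://c.example/1");
        List<List<String>> lists = buildFetchLists(urls, 2);
        for (int i = 0; i < lists.size(); i++) {
            System.out.println("list " + i + ": " + lists.get(i));
        }
    }
}
```

Partitioning on the host rather than the full URL is the key design choice: it guarantees that a host's URLs are never split across two fetch lists.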
 public static synchronized String generateSegmentName() 
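`generateSegmentName()` is `static synchronized`, which suggests callers rely on it returning a distinct name per call. Nutch segment directories are conventionally named by a 14-digit timestamp (for example `20090301123456`). The sketch below produces names of that shape, assuming a `yyyyMMddHHmmss` pattern; the UTC time zone and the "advance to the next unused second" rule are choices made for this illustration, not taken from the library.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SegmentNameSketch {

    // SimpleDateFormat is not thread-safe; it is only used inside the synchronized method.
    private static final SimpleDateFormat FORMAT = new SimpleDateFormat("yyyyMMddHHmmss");
    static {
        // UTC avoids repeated local times when clocks fall back for daylight saving.
        FORMAT.setTimeZone(TimeZone.getTimeZone("UTC"));
    }

    private static long lastSecond = Long.MIN_VALUE;

    /**
     * Returns a 14-digit timestamp name. If called again within the same second,
     * it advances to the next unused second instead of reusing a name.
     */
    static synchronized String generateSegmentName() {
        long nowSecond = System.currentTimeMillis() / 1000;
        long second = Math.max(nowSecond, lastSecond + 1);
        lastSecond = second;
        return FORMAT.format(new Date(second * 1000));
    }

    public static void main(String[] args) {
        String first = generateSegmentName();
        String second = generateSegmentName();
        System.out.println(first + " " + second);
    }
}
```

An alternative is to sleep until the clock passes the last issued second; advancing the counter instead never blocks, at the cost of names that can run slightly ahead of the wall clock during bursts of calls.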
 public static  void main(String[] args) throws Exception 
    Generate a fetchlist from the crawldb.
 public int run(String[] args) throws Exception
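`main` and `run(String[])` are the command-line entry points (`run` is the method required by Hadoop's `Tool` interface). The sketch below shows one way such a `run` could turn arguments into the parameters of `generate(dbDir, segments, numLists, topN, curTime, filter, force)`. The positional layout (`<crawldb> <segments_dir>`) and the option names `-topN`, `-numFetchers`, `-adddays`, `-noFilter`, and `-force` are assumptions modeled on the usual `bin/nutch generate` usage; check them against your Nutch version. `GenerateArgs` is a hypothetical holder, and paths are kept as strings rather than Hadoop `Path` objects so the example runs without Hadoop.

```java
import java.util.concurrent.TimeUnit;

public class GenerateArgsSketch {

    /** Hypothetical holder for the values passed on to generate(...). */
    static final class GenerateArgs {
        String dbDir;
        String segments;
        int numLists = -1;               // assumed: -1 lets the job choose
        long topN = Long.MAX_VALUE;      // no limit unless -topN is given
        long curTime;
        boolean filter = true;           // filter URLs unless -noFilter is given
        boolean force = false;
    }

    /** Parses "<crawldb> <segments_dir> [options]"; returns null on bad usage. */
    static GenerateArgs parse(String[] args, long now) {
        if (args.length < 2) {
            return null;
        }
        GenerateArgs g = new GenerateArgs();
        g.dbDir = args[0];
        g.segments = args[1];
        g.curTime = now;
        for (int i = 2; i < args.length; i++) {
            String opt = args[i];
            boolean hasValue = i + 1 < args.length;
            if ("-topN".equals(opt) && hasValue) {
                g.topN = Long.parseLong(args[++i]);
            } else if ("-numFetchers".equals(opt) && hasValue) {
                g.numLists = Integer.parseInt(args[++i]);
            } else if ("-adddays".equals(opt) && hasValue) {
                // Pretend it is N days later, so entries due within N days are selected.
                g.curTime += TimeUnit.DAYS.toMillis(Long.parseLong(args[++i]));
            } else if ("-noFilter".equals(opt)) {
                g.filter = false;
            } else if ("-force".equals(opt)) {
                g.force = true;
            } else {
                return null;             // unknown option or missing value
            }
        }
        return g;
    }

    public static void main(String[] args) {
        String[] demo = {"crawl/crawldb", "crawl/segments", "-topN", "1000", "-adddays", "2", "-noFilter"};
        GenerateArgs g = parse(demo, 0L);
        System.out.println(g.topN + " " + g.curTime + " " + g.filter);   // 1000 172800000 false
    }
}
```

Shifting `curTime` forward is a simple way to generate URLs ahead of schedule, since selection compares each entry's fetch time against `curTime`.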