org.apache.nutch.protocol.http.api
abstract public class: HttpBase
java.lang.Object
   org.apache.nutch.protocol.http.api.HttpBase

All Implemented Interfaces:
    Protocol

Direct Known Subclasses:
    Http, Http

Field Summary
public static final  int BUFFER_SIZE     
protected  String proxyHost    The proxy hostname. 
protected  int proxyPort    The proxy port. 
protected  boolean useProxy    Indicates whether a proxy is used. 
protected  int timeout    The network timeout, in milliseconds. 
protected  int maxContent    The length limit for downloaded content, in bytes. 
protected  int maxDelays    The number of times a thread will delay when trying to fetch a page. 
protected  int maxThreadsPerHost    The maximum number of threads that should be allowed to access a host at one time. 
protected  long serverDelay    The number of seconds the fetcher will delay between successive requests to the same server. 
protected  String userAgent    The Nutch 'User-Agent' request header. 
protected  boolean useHttp11    Whether HTTP/1.1 is used. 
protected  long maxCrawlDelay    Skip the page if its Crawl-Delay is longer than this value. 
protected  boolean checkBlocking    Plugin should handle host blocking internally. 
protected  boolean checkRobots    Plugin should handle robot rules checking internally. 
Constructors:
 public HttpBase() 
 public HttpBase(Log logger) 
    Creates a new instance of HttpBase.
Methods from org.apache.nutch.protocol.http.api.HttpBase Summary:
getConf,   getMaxContent,   getMaxDelays,   getMaxThreadsPerHost,   getProtocolOutput,   getProxyHost,   getProxyPort,   getResponse,   getRobotRules,   getServerDelay,   getTimeout,   getUseHttp11,   getUserAgent,   logConf,   main,   processDeflateEncoded,   processGzipEncoded,   setConf,   useProxy
Methods from java.lang.Object:
equals,   getClass,   hashCode,   notify,   notifyAll,   toString,   wait,   wait,   wait
Methods from org.apache.nutch.protocol.http.api.HttpBase Detail:
 public Configuration getConf() 
 public int getMaxContent() 
 public int getMaxDelays() 
 public int getMaxThreadsPerHost() 
 public ProtocolOutput getProtocolOutput(Text url,
    CrawlDatum datum) 
 public String getProxyHost() 
 public int getProxyPort() 
 abstract protected Response getResponse(URL url,
    CrawlDatum datum,
    boolean followRedirects) throws IOException, ProtocolException
 public RobotRules getRobotRules(Text url,
    CrawlDatum datum) 
 public long getServerDelay() 
 public int getTimeout() 
 public boolean getUseHttp11() 
 public String getUserAgent() 
 protected  void logConf() 
 protected static  void main(HttpBase http,
    String[] args) throws Exception 
 public byte[] processDeflateEncoded(byte[] compressed,
    URL url) throws IOException 
 public byte[] processGzipEncoded(byte[] compressed,
    URL url) throws IOException 
 public  void setConf(Configuration conf) 
 public boolean useProxy()
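
Example (illustrative sketch, not the Nutch implementation):
processGzipEncoded takes a compressed byte[] and returns a byte[], and maxContent is described as a length limit in bytes. The sketch below shows one way such a step could be written with only java.util.zip: it decompresses gzip data and keeps at most a given number of bytes. The class GzipSketch, its methods gzip and gunzipBounded, and the parameter maxBytes are hypothetical names chosen for this example. Whether HttpBase applies maxContent while decompressing is not stated on this page, so treat the cap as an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; it does not depend on any Nutch type.
class GzipSketch {

    // Compresses bytes with gzip so the sketch can produce its own input.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Decompresses gzip data and keeps at most maxBytes bytes of output.
    // Assumption: the caller wants a hard cap, similar in spirit to maxContent.
    static byte[] gunzipBounded(byte[] compressed, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            int remaining = maxBytes;
            while (remaining > 0) {
                // Never ask for more than the remaining budget.
                int n = in.read(buf, 0, Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // end of the compressed stream
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(page);
        // Full round trip with a generous limit.
        System.out.println(new String(gunzipBounded(packed, 1024), StandardCharsets.UTF_8));
        // Truncated result when the limit is smaller than the content.
        System.out.println(gunzipBounded(packed, 6).length);
    }
}
```

Reading in chunks sized to the remaining budget means the method never holds more than maxBytes of decompressed data, which matters when a small compressed body expands to a very large page. A deflate variant could follow the same loop with java.util.zip.InflaterInputStream in place of GZIPInputStream.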