public class Ftp extends java.lang.Object implements Protocol

Protocol implementation for ftp urls. It creates an FtpResponse object and gets the content of the url from it.

Configurable parameters are ftp.username, ftp.password, ftp.content.limit, ftp.timeout, ftp.server.timeout, ftp.keep.connection and ftp.follow.talk. For details see the "FTP properties" section in nutch-default.xml.

Modifier and Type | Field and Description |
---|---|
protected static org.slf4j.Logger | LOG |

Fields inherited from interface Protocol: X_POINT_ID

Constructor and Description |
---|
Ftp() |
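
The ftp.* parameters named in the class description are plain key/value settings. The following is a minimal sketch of how such keys could be read into typed values. It assumes `java.util.Properties` as a stand-in for the Configuration object, and the class name `FtpSettingsSketch` and every fallback value are hypothetical placeholders, not necessarily what nutch-default.xml ships.

```java
import java.util.Properties;

/**
 * Illustrative holder for the ftp.* settings. Properties stands in for the
 * Configuration object; fallback values are placeholders (assumptions).
 */
final class FtpSettingsSketch {
    final String username;        // ftp.username
    final String password;        // ftp.password
    final int contentLimit;       // ftp.content.limit
    final int timeout;            // ftp.timeout
    final int serverTimeout;      // ftp.server.timeout
    final boolean keepConnection; // ftp.keep.connection
    final boolean followTalk;     // ftp.follow.talk

    private FtpSettingsSketch(Properties p) {
        username = p.getProperty("ftp.username", "anonymous");
        password = p.getProperty("ftp.password", "anonymous@example.com");
        contentLimit = Integer.parseInt(p.getProperty("ftp.content.limit", "65536"));
        timeout = Integer.parseInt(p.getProperty("ftp.timeout", "10000"));
        serverTimeout = Integer.parseInt(p.getProperty("ftp.server.timeout", "60000"));
        keepConnection = Boolean.parseBoolean(p.getProperty("ftp.keep.connection", "false"));
        followTalk = Boolean.parseBoolean(p.getProperty("ftp.follow.talk", "false"));
    }

    /** Builds the settings from the given properties, using fallbacks for missing keys. */
    static FtpSettingsSketch from(Properties p) {
        return new FtpSettingsSketch(p);
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("ftp.timeout", "5000");
        p.setProperty("ftp.keep.connection", "true");
        FtpSettingsSketch s = from(p);
        System.out.println(s.username + " " + s.timeout + " " + s.keepConnection);
    }
}
```

In a real deployment these values would normally be overridden in the site configuration rather than in code; the sketch only shows how the key names map to the setters listed in the method summary.
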
Modifier and Type | Method and Description |
---|---|
protected void | finalize() |
int | getBufferSize() |
Configuration | getConf(): Get the Configuration object. |
ProtocolOutput | getProtocolOutput(Text url, CrawlDatum datum): Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received. |
crawlercommons.robots.BaseRobotRules | getRobotRules(Text url, CrawlDatum datum, java.util.List<Content> robotsTxtContent): Get the robots rules for a given url. |
static void | main(java.lang.String[] args): For debugging. |
void | setConf(Configuration conf): Set the Configuration object. |
void | setFollowTalk(boolean followTalk): Set followTalk. |
void | setKeepConnection(boolean keepConnection): Set keepConnection. |
void | setMaxContentLength(int length): Set the point at which content is truncated. |
void | setTimeout(int to): Set the timeout. |

public void setTimeout(int to)
Set the timeout.

public void setMaxContentLength(int length)
Set the point at which content is truncated.

public void setFollowTalk(boolean followTalk)
Set followTalk.

public void setKeepConnection(boolean keepConnection)
Set keepConnection.

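
setMaxContentLength sets the point at which content is truncated. As a rough illustration of what truncating at such a limit can look like, the sketch below reads at most a given number of bytes from a stream using a fixed-size buffer. The class and method names are hypothetical, this is not the Ftp class's own code, and treating a negative limit as "no limit" is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class TruncatingReaderSketch {
    /**
     * Reads at most maxContentLength bytes from in, bufferSize bytes at a time.
     * A negative maxContentLength means "read everything" (sketch assumption).
     */
    static byte[] readUpTo(InputStream in, int maxContentLength, int bufferSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        try {
            while (maxContentLength < 0 || out.size() < maxContentLength) {
                int want = buffer.length;
                if (maxContentLength >= 0) {
                    // Never ask for more than the remaining allowance.
                    want = Math.min(want, maxContentLength - out.size());
                }
                int n = in.read(buffer, 0, want);
                if (n < 0) {
                    break; // end of stream reached before the limit
                }
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        byte[] kept = readUpTo(new ByteArrayInputStream(data), 4, 3);
        System.out.println(new String(kept, StandardCharsets.US_ASCII)); // prints 0123
    }
}
```

The buffer size and the content limit are independent: the buffer only controls how much is read per call, while the limit caps the total.
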
public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum)
Creates an FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received.
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
datum - The CrawlDatum object corresponding to the url
Returns: ProtocolOutput object for the url

protected void finalize()
Overrides: finalize in class java.lang.Object

public static void main(java.lang.String[] args) throws java.lang.Exception
For debugging.
Throws: java.lang.Exception

public void setConf(Configuration conf)
Set the Configuration object.
Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object.
Specified by: getConf in interface Configurable

public crawlercommons.robots.BaseRobotRules getRobotRules(Text url, CrawlDatum datum, java.util.List<Content> robotsTxtContent)
Get the robots rules for a given url.
Specified by: getRobotRules in interface Protocol
Parameters:
url - URL to check
datum - page datum
robotsTxtContent - container to store responses when fetching the robots.txt file for debugging or archival purposes. Instead of a robots.txt file, it may include redirects or an error page (404, etc.). Response Content is appended to the passed list. If null is passed nothing is stored.

public int getBufferSize()

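
The robotsTxtContent parameter of getRobotRules acts as an optional output list: responses are appended when a list is passed, and nothing is stored when null is passed. A minimal sketch of that calling convention follows. It uses plain strings in place of Content objects, and the class and method names are hypothetical, so it shows only the pattern, not the actual fetch logic.

```java
import java.util.ArrayList;
import java.util.List;

final class OptionalCaptureSketch {
    /**
     * Walks through the responses seen while looking for robots.txt (for
     * example a redirect followed by an error page) and appends each one to
     * captured if the caller supplied a list. Returns the last response, or
     * null if there were none.
     */
    static String recordResponses(String[] responses, List<String> captured) {
        String last = null;
        for (String response : responses) {
            if (captured != null) {
                captured.add(response); // append, never clear earlier entries
            }
            last = response;
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();
        recordResponses(new String[] {"301 redirect", "404 not found"}, captured);
        System.out.println(captured.size() + " responses kept");

        // Passing null is allowed: the responses are simply not stored.
        recordResponses(new String[] {"404 not found"}, null);
    }
}
```

Because entries are appended rather than replaced, a caller that reuses one list across several calls accumulates the responses from all of them.
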
Copyright © 2019 The Apache Software Foundation