Delay request handling for stupid crawlers
I’m sure you know what “Crawl-Delay” is, but you may not know that not all search engine crawlers support it.
What can you do about the ones that don’t obey the instruction? They’ll eat up all your Mbits/month or slow your webserver down. OK, ban them with url.access-deny. Before this patch, that was the only option u had. But you don’t want your pages removed from the stupid search engine’s index, do you?
Here comes another option for you: with this patch, u can delay the handling of matching requests by a given number of seconds. Example configuration:
$HTTP["user-agent"] =~ "stupid-crawler" {
  connection.delay-seconds = 2
}
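To make the matching semantics of that config a little more concrete, here is a minimal Python model of the decision step only. It is a hypothetical sketch, not the patch's code. It assumes the delay applies to any request whose User-Agent matches the regex anywhere in the header value, which is how an unanchored `=~` match behaves. The names `DELAY_RULES` and `delay_for` are made up for illustration.

```python
import re

# Hypothetical rule table mirroring the config above: a User-Agent regex
# (as in $HTTP["user-agent"] =~ "...") paired with a delay in seconds.
# This is an illustrative model, not code from the lighttpd patch.
DELAY_RULES = [
    (re.compile(r"stupid-crawler"), 2),
]


def delay_for(user_agent: str) -> int:
    """Return the delay in seconds for a request, or 0 if no rule matches.

    re.search is used because an unanchored regex condition matches the
    pattern anywhere in the header value. For simplicity this sketch
    returns the first matching rule.
    """
    for pattern, seconds in DELAY_RULES:
        if pattern.search(user_agent):
            return seconds
    return 0
```

For example, `delay_for("Mozilla/5.0 (compatible; stupid-crawler/1.0)")` gives 2, while an ordinary browser User-Agent gives 0 and is served without delay.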
OK, here’s the link to lighttpd-2296-request-handle-delay.patch, which applies to branches/lighttpd-1.4.x@2296.
Be aware that this patch still needs to be reviewed before it is committed to the repository.
It seems to me that delaying the request by n seconds will not change the overall rate of crawling - it'll just offset the responses by n seconds and increase the number of concurrent connections (and thus memory usage) on the server during that time.
Or is there some aspect of the patch that I'm missing?
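Whether the delay slows crawling depends on how the crawler schedules requests, which neither the post nor the comment pins down. Two simplified models show the difference.

- A crawler that waits for each response before sending the next request (closed loop) is slowed down. Its rate is roughly `parallel / (service_time + delay)`.
- A crawler that fires requests at a fixed rate regardless of response time (open loop) is not slowed. Instead, the number of connections held open grows by Little's law, `L = rate * (service_time + delay)`, which is the memory concern raised above.

Both functions below are illustrative models with made-up names, not measurements of any real crawler.

```python
def closed_loop_rate(service_s: float, delay_s: float, parallel: int = 1) -> float:
    """Requests/second from a crawler that keeps `parallel` connections busy
    and waits for each response before sending the next request."""
    return parallel / (service_s + delay_s)


def open_loop_concurrency(rate_rps: float, service_s: float, delay_s: float) -> float:
    """Average number of open connections for a crawler that sends at a
    fixed rate no matter how slowly we answer (Little's law: L = lambda * W)."""
    return rate_rps * (service_s + delay_s)
```

With a 0.1 s service time, a 2 s delay cuts a single-connection closed-loop crawler from 10 req/s to about 0.48 req/s. An open-loop crawler sending 10 req/s would instead hold about 21 connections open rather than 1.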
Also - I seem to remember that some crawlers take site response time into account when ranking results. If that is the case, this could make the site look awfully slow to the crawler.
That said - I like the idea :) But perhaps more of a rate-limiting approach would work better?
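For comparison, a rate-limiting approach along the lines suggested here is often built as a per-client token bucket. Each client (keyed, say, by User-Agent or IP) may burst up to `burst` requests, and the bucket refills at `rate` tokens per second. This is a generic, hypothetical Python sketch of that technique, not a lighttpd feature or part of the patch. The class name, parameters, and the idea of keying on User-Agent are assumptions. A server using it would answer rejected requests immediately, for example with HTTP 503 or 429 and a Retry-After header, instead of holding the connection open.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket (hypothetical sketch, not a lighttpd API)."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate    # tokens added per second
        self.burst = burst  # maximum tokens a client can accumulate
        self.clock = clock  # injectable clock, handy for testing
        # key -> (tokens remaining, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Return True if this client may be served now, consuming one token."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill for the time elapsed since the last update, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[key] = (tokens - 1, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

Unlike a fixed delay, a rejected request costs the server almost nothing. The trade-off is that the crawler sees errors rather than slow responses.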
m0o has done a great job creating a nice open source application that you get without paying. And you complain about his usage of "u" instead of "you"?
What about saying "thank you" instead? :)
Thanks for creating lighttpd. It's really nice. Typing 'u' instead of 'you' isn't endearing you to anyone and I fear that it would give the impression to some folks that this was put together by an inexperienced, immature kid who couldn't possibly write anything of value.
In other words, it could needlessly hinder uptake, and I'd hate to see that happen to such a fine software package.