Accelerating Small File-Transfers
Thanks to some help from an IRC channel (#lighttpd at irc.freenode.net) we solved another long-standing problem:
As lighttpd is an event-based web server, blocking operations are a problem for us. In 1.5.0 we added async sendfile() operations, which help a lot for large files. For small files most of the time is spent on the initial stat() call, which has no async interface.
Fobax submitted a nice solution for this problem: move the stat() to a FastCGI app which returns X-LIGHTTPD-send-file: and hands the request back to lighttpd. The FastCGI app can block and spend some time while lighttpd moves on with other requests. When the FastCGI app returns, the information for the stat() call is in the fs-buffers and lighttpd doesn’t block on the stat() anymore.
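To make the idea concrete, here is a minimal Python sketch of what such a backend does. It is not the code of fcgi-stat-accel: the names stat_accel, handle_request and pool are made up for illustration, the FastCGI protocol handling is left out, and a thread pool stands in for the backend's worker threads. The only parts taken from the text are the blocking stat() and the X-LIGHTTPD-send-file header.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical worker pool standing in for the backend's threads.
pool = ThreadPoolExecutor(max_workers=4)


def stat_accel(path):
    """Block on stat() here so the web server does not have to.

    Returns the response headers that hand the request back to the
    server, or a 404 status if the file cannot be stat()ed.
    """
    try:
        os.stat(path)  # may wait for a disk seek; result lands in the kernel cache
    except OSError:
        return {"Status": "404 Not Found"}
    return {"X-LIGHTTPD-send-file": path}


def handle_request(path):
    # Submit the blocking call to the pool. The caller gets a Future
    # back immediately and can go on with other requests meanwhile.
    return pool.submit(stat_accel, path)


if __name__ == "__main__":
    future = handle_request(os.getcwd())
    print(future.result())
```

The point of the design is that the wait for the disk happens in a worker, not in the event loop. By the time the header comes back, a second stat() on the same path should be answered from memory.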
All this is documented by darix in the wiki at HowtoSpeedUpStatWithFastcgi.
This works with mod_fastcgi in 1.4.0 or with mod-proxy-core in 1.5.0 + aio.
For 1.5.0 I added fcgi-stat-accel to svn and to the cmake build.
I spawn it on port 1029 for a first test round. The -C 1 starts only one thread in the backend, so that we can see the impact later.
$ ./build/spawn-fcgi -f ./build/fcgi-stat-accel -p 1029 -C 1
As config on the lighttpd side we have to enable X-Sendfile and keep a few connections open in the pool.
$SERVER["socket"] == ":1025" {
  $HTTP["url"] =~ "^/seek-bound/" {
    proxy-core.protocol = "fastcgi"
    proxy-core.backends = ( "127.0.0.1:1029" )
    proxy-core.allow-x-sendfile = "enable"
    proxy-core.max-pool-size = 20
  }
}
As test environment I used 100k files as in the other tests (10G of data overall).
$ http_load -parallel 200 -seconds 60 urls.100k
iostat said:
$ iostat -xm 5
avg-cpu: %user %nice %system %iowait %steal %idle
9.20 0.00 45.80 45.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 73.00 0.00 13278.40 0.00 6.48 0.00 181.90 7.09 98.30 13.71 100.08
sdb 0.00 0.00 69.20 0.00 12625.60 0.00 6.16 0.00 182.45 13.63 194.71 14.46 100.08
We are limited by the disks now. Perhaps we can reduce the CPU usage a bit more by using unix domain sockets instead of TCP:
avg-cpu: %user %nice %system %iowait %steal %idle
8.19 0.00 38.56 53.25 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 1.00 67.63 4.30 12533.07 47.95 6.12 0.02 174.91 10.28 144.44 13.89 99.90
sdb 0.00 1.00 66.13 4.30 12442.76 47.95 6.08 0.02 177.35 11.92 168.46 14.18 99.90
The system time drops by about 7 points (from 45.80% to 38.56%), good enough.
Summary
Thanks to Fobax's great idea I can finally max out my two disks. If you have more disks the impact will be a lot larger. Give it a try.
name               Throughput     util%
-----------------  -------------  ---------
no stat-accel      12.07MByte/s   81%
stat-accel (tcp)   13.64MByte/s   99%
stat-accel (unix)  13.86MByte/s   99%
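For a feel of how big the gain is, the relative improvement can be worked out from the throughput numbers above. The short Python snippet below only does that arithmetic; the function name gain_percent is made up for this example.

```python
# Throughput figures (MByte/s) copied from the summary table.
throughput = {
    "no stat-accel": 12.07,
    "stat-accel (tcp)": 13.64,
    "stat-accel (unix)": 13.86,
}


def gain_percent(name, baseline="no stat-accel"):
    """Relative throughput gain over the baseline, in percent."""
    return (throughput[name] / throughput[baseline] - 1.0) * 100.0


for name in ("stat-accel (tcp)", "stat-accel (unix)"):
    print(f"{name}: +{gain_percent(name):.1f}%")
# stat-accel (tcp): +13.0%
# stat-accel (unix): +14.8%
```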
Trackbacks
-
Just as a proof of concept I implemented a threaded stat() call. It is a bit of a hack currently, but it looks promising when I look at the performance data: avg-cpu: %user %nice %system %iowait %steal %idle 5.00 0.00 26.60 68.40 0.00 0.00 Device:...
Comments
-
Linux has a dcache (directory entry cache). It should be possible to 'prime the cache' by running a 'find' command over the webroot thus making lighty's stat call return from the in-memory cache.
-
That doesn't help, as over time the requested data might change, so cache entries are lost and then we have long blocking stat() calls again.
-
Someone else below asked this already about antispam scripts. I am getting nailed with Spam on my website mails and in our blog website - now its offline too much spam. Is there anyway to stop this? If not, there really isn't any point in leaving it up and active. Any help will be greatly appreciated.
-
@Pozycjonowanie: What does this have to do with accelerating small files? Anyway, I think the idea is great. Could you possibly have lighty do this automatically by itself, so that lighty spawns the fcgi automatically, etc.?
-
The reason this works, is that lighty blocks on file io but network io is async. If you're serving lots of small files, a very large amount of reads will block the whole lighty process waiting for the disk to seek. During that time your other disks and cpu is sitting idle. By pushing the waits into a separate threaded process lighty can keep serving files that are already in the kernel cache while the disk is working. It also lets you keep all disks busy. The cpu overhead from this is often worth the trade off.
-
How about making another thread/backend (okay, this is a major change versus the mono-thread) which handles disk io in lighttpd itself and communicates with the network thread asynchronously? Having too many fastcgi backends (php, rails) in my opinion makes life more messy (you'll hit some open file limits, in case of tcp sockets are way too unreliable, and other problems appear).
-
I second Roze's suggestion. It feels sort of "hacky" to delegate this work to another process via FastCGI.
-
Hey: I don't mean to nag, but I'm considering deploying lighthttpd to a website in the netcraft top 1000. Is there a scheduled release date for 1.5.0? Thanks!
-
When it's done. And it does _not_ help if you ask too often. ;)
-
Roze: I totally agree that this should be integrated into lighty itself, but that isn't trivial. Until that day comes, this works very well.
-
Fobax: Sure, I'm not knocking this solution or anything. My thought was just to maybe consider such a feature sooner (maybe even in the new 1.5 frame), as implementing it later won't be any easier (well, this could also be wrong, because the environment itself changes: the kernel, and new filesystems with customizable IO policy appear).
-
There is some work going on in the Linux kernel that may make async disk i/o a lot easier in the near future. Go to this URL and scroll down to "Asynchronous buffered file I/O": http://lwn.net/Articles/215235/
-
Is this related? http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/2163.html