1.5.0 will be a big win for all users. It will be more flexible in request handling and brings a huge improvement for static files thanks to async IO.
The following benchmark shows an increase of 80% for the new linux-aio-sendfile backend compared to the classic linux-sendfile one.
The test environment is:
- client: Mac Mini 1.2 GHz, Mac OS X 10.4.8, 1 GB RAM, 100 Mbit/s
- server: AMD64 3000+, 1 GB RAM, Linux 2.6.16.21-xen, 160 GB RAID1 (soft-RAID)
The server is running lighttpd 1.4.13 and lighttpd 1.5.0-svn with a clean config (no modules loaded); the client uses http_load.
The client will run:
$ ./http_load -verbose -parallel 100 -fetches 10000 urls
I used this little script to generate 1000 folders, each containing 100 files of 100 kbyte:
for i in `seq 1 1000`; do
    mkdir -p files-$i
    for j in `seq 1 100`; do
        dd if=/dev/zero of=files-$i/$j bs=100k count=1 2> /dev/null
    done
done
That's 10 Gbyte of data, 10 times the RAM size of the server, since we want to become seek-bound on the disks.
The Limits
Two Seagate Barracuda 160 GB disks (ST3160827AS) form a RAID1 via the Linux md driver. Their 7200 RPM gives us at most 480 seeks/s (7200 RPM = 120 rotations/s, 0.5 rotations average per seek, 2 disks).
Each disk can deliver 30 Mbyte/s of sequential reads, 60 Mbyte/s combined.
The network is 100 Mbit/s; we expect it to top out at around 10 Mbyte/s.
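For reference, the arithmetic behind those limits:

7200 RPM / 60                   = 120 rotations/s per disk
120 rotations/s / 0.5 rotations = 240 seeks/s per disk (average rotational latency is half a rotation)
240 seeks/s * 2 disks           = 480 seeks/s (the RAID1 can serve reads from both disks)
100 Mbit/s / 8                  = 12.5 Mbyte/s raw, roughly 10 Mbyte/s of HTTP payload after protocol overhead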
lighttpd 1.4.13, sendfile
A first test run against lighttpd 1.4.13 with linux-sendfile gives us:
$ iostat 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.99    0.00    4.77   86.68    0.20    7.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              35.19      3503.78       438.97      17624       2208
sdb              33.40      4052.49       438.97      20384       2208
md0             119.48      7518.09       429.42      37816       2160

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.60    0.00    4.61   78.36    0.00   16.43

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              31.46      3408.42       365.53      17008       1824
sdb              30.06      3313.83       365.53      16536       1824
md0             104.21      6760.72       357.52      33736       1784
The http_load returned:
$ ./http_load -verbose -parallel 100 -fetches 10000 urls
--- 60.006 secs, 1744 fetches started, 1644 completed, 100 current
--- 120 secs, 3722 fetches started, 3622 completed, 100 current
--- 180 secs, 5966 fetches started, 5866 completed, 100 current
--- 240 secs, 8687 fetches started, 8587 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 274.323 seconds
102400 mean bytes/connection
36.4534 fetches/sec, 3.73283e+06 bytes/sec
msecs/connect: 51.7815 mean, 147.412 max, 0.181 min
msecs/first-response: 360.689 mean, 6178.2 max, 1.08 min
HTTP response codes:
code 200 -- 10000
lighttpd 1.5.0, sendfile
The same test with lighttpd 1.5.0 using the same network backend: linux-sendfile.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.40    0.00    3.60   85.60    0.00   10.40

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              33.80      4606.40       564.80      23032       2824
sdb              37.00      4723.20       564.80      23616       2824
md0             136.00      9368.00       553.60      46840       2768

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    4.80   81.80    0.00   12.60

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              33.40      4198.40       504.00      20992       2520
sdb              30.60      4564.80       504.00      22824       2520
md0             123.60      8763.20       496.00      43816       2480

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    5.19   81.24    0.00   12.77

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              36.53      4490.22       493.41      22496       2472
sdb              32.34      4784.03       493.41      23968       2472
md0             126.75      9274.25       483.83      46464       2424
The client said:
--- 60 secs, 2444 fetches started, 2344 completed, 100 current
--- 120.003 secs, 4957 fetches started, 4857 completed, 100 current
--- 180 secs, 7359 fetches started, 7259 completed, 100 current
--- 240 secs, 9726 fetches started, 9626 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 246.803 seconds
102400 mean bytes/connection
40.5181 fetches/sec, 4.14906e+06 bytes/sec
msecs/connect: 55.5808 mean, 186.153 max, 0.24 min
msecs/first-response: 398.639 mean, 6101.44 max, 9.313 min
HTTP response codes:
code 200 -- 10000
This is minimally better, but it still has the same problem: we are maxed out by the disks, not by the network.
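A quick cross-check against the numbers above:

40.5 fetches/s * 102400 bytes  ≈ 4.1 Mbyte/s delivered, well below the ~10 Mbyte/s the network could carry
md0 at ~125 tps, ~4.5 Mbyte/s read, %iowait above 80%: the disks are busy seeking, far from their 60 Mbyte/s sequential limit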
lighttpd 1.5.0, linux-aio-sendfile
We only switch the network backend to the async IO one:
server.network-backend = "linux-aio-sendfile"
… and run our benchmark again:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   10.18   38.52    0.00   42.91

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              42.91      7190.42       526.95      36024       2640
sdb              36.93      6144.51       526.95      30784       2640
md0             205.99     13213.57       517.37      66200       2592

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    9.84   48.39    0.20   40.76

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              50.40      8369.48       573.49      41680       2856
sdb              44.18      7318.88       573.49      36448       2856
md0             241.77     15890.76       563.86      79136       2808

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.60    0.00    8.38   44.91    0.00   46.11

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              50.10      7580.04       720.16      37976       3608
sdb              47.50      7179.24       720.16      35968       3608
md0             242.12     14558.08       710.58      72936       3560
The client said:
--- 60.0001 secs, 3792 fetches started, 3692 completed, 100 current
--- 120 secs, 8778 fetches started, 8678 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 137.551 seconds
102400 mean bytes/connection
72.7004 fetches/sec, 7.44452e+06 bytes/sec
msecs/connect: 66.9088 mean, 197.157 max, 0.223 min
msecs/first-response: 226.181 mean, 6066.96 max, 2.098 min
HTTP response codes:
code 200 -- 10000
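For reference, a minimal lighttpd 1.5.0 configuration for a bare static-file test like this could look roughly as follows; the document root and port are placeholders here, not the values used in the test above:

server.document-root   = "/var/www/bench"
server.port            = 80
server.network-backend = "linux-aio-sendfile"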
Summary
Using async IO allows lighttpd to overlap file operations. We send an IO request for the file and get notified when it is ready. Instead of waiting for the file (as with a plain sendfile()) and blocking the server, we can handle other requests in the meantime.
At the same time we give the kernel the chance to reorder the file requests as it sees fit.
Taking these two improvements together, we can increase the throughput by 80%.
On top of that, we no longer spend any time waiting inside lighty itself: 64 kernel threads handle the read() calls for us in the background, which raises the idle time from 12% to 40%, an improvement of 230%.
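To make the pattern concrete, here is a rough sketch in C of the two paths: a plain sendfile() on one side, and a read submitted through Linux native AIO (libaio) on the other. This is only an illustration of the idea, not lighttpd's backend code; the function names, the 64 KB chunk size, and the synchronous io_getevents() wait are simplifications made up for the example.

/* Sketch only -- illustrates blocking sendfile vs. an AIO read path.
 * Build (assumption): gcc -o sketch sketch.c -laio */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <libaio.h>

/* Classic path: if the file data is not in the page cache, sendfile()
 * stalls on the disk seek and the whole single-threaded server waits. */
static void serve_blocking(int sock, int fd, off_t size)
{
    off_t off = 0;
    while (off < size)
        if (sendfile(sock, fd, &off, (size_t)(size - off)) <= 0)
            break;
}

/* AIO path: submit the read and return to the event loop; the kernel
 * tells us when the data is in our buffer and we write() it out.
 * A real server would multiplex the completion with its poll loop
 * instead of blocking in io_getevents() as this sketch does. */
static void serve_aio(int sock, int fd, off_t size)
{
    io_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf;
    off_t off = 0;

    io_setup(64, &ctx);                     /* allow up to 64 requests in flight */
    posix_memalign(&buf, 512, 64 * 1024);   /* aligned 64 KB chunk buffer */

    while (off < size) {
        size_t chunk = (size - off > 64 * 1024) ? 64 * 1024 : (size_t)(size - off);

        io_prep_pread(&cb, fd, buf, chunk, off);
        io_submit(ctx, 1, cbs);             /* returns immediately */

        /* ... the event loop would serve other connections here ... */

        io_getevents(ctx, 1, 1, &ev, NULL); /* the data is now in buf */
        long res = (long)ev.res;
        if (res <= 0)
            break;
        write(sock, buf, (size_t)res);
        off += res;
    }

    free(buf);
    io_destroy(ctx);
}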