Async IO on Linux
trunk/ just got support for Linux native AIO.
I implemented async IO based on libaio, which is a minimal wrapper around the AIO syscalls of the 2.6.x kernels.
Implementation
It was a bit tricky to get it working as libaio is basically undocumented, but hey … that’s why we are hackers :)
The async file IO support is part of Linux 2.6.9 and later and should be on every recent Linux box. A separate library called libaio provides very simple wrappers and is used as the base for the new network backend.
The idea is:
- create a buffer in /dev/shm and mmap() it
- start an async read() from the source file to the mmap() buffer
- wait until the data is ready
- use sendfile() to send the data from /dev/shm to the network socket
Important for the performance: the data is never copied into user space. We only move it from one side of the kernel to the other side.
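The four steps above can be sketched roughly as follows. This is not the code from trunk/: it calls the raw AIO syscalls directly instead of going through libaio (so it needs no extra library), and the helper name aio_shm_sendfile(), the temp-file name and the single-request queue depth are assumptions made up for the illustration. It also opens the source file without O_DIRECT, which a real backend would probably want for the read to be truly asynchronous.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Raw syscall wrappers; libaio offers similar thin wrappers. */
static long sys_io_setup(long nr, aio_context_t *ctx) {
    return syscall(SYS_io_setup, nr, ctx);
}
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **list) {
    return syscall(SYS_io_submit, ctx, n, list);
}
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *ts) {
    return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, ts);
}
static long sys_io_destroy(aio_context_t ctx) {
    return syscall(SYS_io_destroy, ctx);
}

/* Hypothetical helper: send the first `len` bytes of src_fd to out_fd
 * through a buffer in /dev/shm. Returns the bytes sent, or -1 on error. */
ssize_t aio_shm_sendfile(int src_fd, int out_fd, size_t len)
{
    char path[] = "/dev/shm/aio-sketch-XXXXXX";  /* assumed name */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    aio_context_t ctx = 0;
    void *buf = MAP_FAILED;
    off_t off = 0;
    ssize_t sent = -1;
    int have_ctx = 0;
    int shm_fd;

    assert(len > 0);

    /* 1. create a buffer in /dev/shm and mmap() it */
    shm_fd = mkstemp(path);
    if (shm_fd < 0) return -1;
    unlink(path);  /* the fd keeps the buffer alive */
    if (ftruncate(shm_fd, (off_t)len) < 0) goto out;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (buf == MAP_FAILED) goto out;

    /* 2. start an async read from the source file into the buffer */
    if (sys_io_setup(1, &ctx) < 0) goto out;
    have_ctx = 1;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (uint32_t)src_fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    if (sys_io_submit(ctx, 1, list) != 1) goto out;

    /* 3. wait until the data is ready (a server would do other work here) */
    if (sys_io_getevents(ctx, 1, 1, &ev, NULL) != 1) goto out;
    if (ev.res <= 0) goto out;

    /* 4. sendfile() from the /dev/shm file to the output descriptor */
    sent = sendfile(out_fd, shm_fd, &off, (size_t)ev.res);

out:
    if (have_ctx) sys_io_destroy(ctx);
    if (buf != MAP_FAILED) munmap(buf, len);
    close(shm_fd);
    return sent;
}
```

In a server, out_fd would be the client socket and step 3 would not block the main loop; the sketch waits inline only to keep the four steps readable in order.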
Hack ahead
Sadly I had to add pthread to the dependencies. Having threads in a single-threaded server is a bit strange, but it is necessary.
fdevent_poll() waits up to 1s for fd-events, and while it waits the whole server waits. Handling the async notifications is blocking as well, and we can’t make either call return as soon as the other one is done.
If necessary we start an io-getevent thread which runs in parallel to the fdevent_poll() call. The call which returns first interrupts the other one by sending a SIGUSR1 to the process. That makes the waiting calls (poll() and io_getevents()) return with EINTR and we can continue handling the result of one of the two calls.
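A stripped-down sketch of that wake-up trick, showing one direction only: a helper thread stands in for the io-getevent thread and interrupts the poll() running in the main thread. The names poll_until_woken() and waker_main() are invented for the example, the nanosleep() merely simulates io_getevents() taking a while, and the sketch uses pthread_kill() to aim the signal at the polling thread, which is an assumption and not necessarily how trunk/ delivers it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Empty handler: it exists only so that SIGUSR1 interrupts blocking
 * syscalls instead of terminating the process. */
static void on_sigusr1(int signo) { (void)signo; }

struct waker {
    pthread_t target;   /* thread that sits in poll() */
    long delay_ms;      /* simulated duration of io_getevents() */
};

static void *waker_main(void *arg)
{
    struct waker *w = arg;
    struct timespec ts;

    ts.tv_sec = w->delay_ms / 1000;
    ts.tv_nsec = (w->delay_ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);              /* stand-in for io_getevents() */
    pthread_kill(w->target, SIGUSR1);  /* wake the polling thread */
    return NULL;
}

/* Hypothetical helper. Returns 1 if poll() was interrupted by the
 * signal (EINTR), 0 if it ran into its timeout, -1 on any other error. */
int poll_until_woken(long delay_ms, int timeout_ms)
{
    struct sigaction sa;
    struct waker w;
    pthread_t tid;
    int rc, interrupted;

    assert(timeout_ms >= 0);  /* keep the wait bounded in this sketch */

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0) return -1;

    w.target = pthread_self();
    w.delay_ms = delay_ms;
    if (pthread_create(&tid, NULL, waker_main, &w) != 0) return -1;

    /* No fds are watched here; a server would pass its real pollfd set. */
    rc = poll(NULL, 0, timeout_ms);
    interrupted = (rc < 0 && errno == EINTR);  /* read errno before join */

    pthread_join(tid, NULL);
    if (interrupted) return 1;
    return rc == 0 ? 0 : -1;
}
```

Calling poll_until_woken(50, 1000) should come back after roughly 50ms with 1 instead of sitting out the full second, which is the whole point of the extra thread.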
Benchmarks
As testbed we have a RAID1 (linux md) via two
- ST3160827AS (SATA, 120Mb each)
- nVidia Corporation CK8S as SATA controller
- AMD Athlon™ 64 Processor 3000+
- Linux 2.6.16.21-0.25-xen (SuSE 10.1)
siege, 700Mb
I’ll compare linux-sendfile vs. linux-aio-sendfile.
| conc | non-aio | aio [512k] | aio [1M] |
| 1 | 52.38 MB/sec [9% idle] | 89.85 MB/sec [70% idle] | 107.50 MB/sec [67% idle] |
| 2 | 39.94 MB/sec [8% idle] | 94.52 MB/sec [70% idle] | 92.74 MB/sec [70% idle] |
| 5 | 35.45 MB/sec [7% idle] | 31.81 MB/sec [86% idle] | 72.84 MB/sec [70% idle] |
| 10 | .. | 25.22 MB/sec [82% idle] | 32.87 MB/sec [90% idle] |
More important than the throughput is the CPU time that can now be spent on other tasks: the idle share goes from 7-9% without AIO to 67-90% with it.
What’s next?
Next is bug fixing, load testing (more parallel connections), random load, ...
Comments
-
Great job, Jan. I have no idea how to enable AIO. I set server.network-backend = "linux-aio-sendfile" but lighttpd does not know it: "2006-11-09 13:00:58: (network.c.535) server.network-backend has a unknown value: linux-aio-sendfile"
-
Great, thanks a lot! BTW, how does it compare against epoll?
-
I tested it out, but lighttpd 1.5 crashes after serving 1 request, and I cannot get a file from lighttpd 1.5 with wget or a browser; the file size is 0. strace gave me this: 14:03:25.028334 write(5, "2006-11-09 14:03:25: (src/network_linux_aio.c.166) sendfile failed: Bad file descriptor 6 \n", 91) = 91
-
Alberto: Perhaps I'm missing something, but I don't see how epoll() can help us here. poll() vs. epoll() doesn't matter for less than 10 connections (what I was testing here). Eric, stay on IRC and let us fix it there.
-
Jan, please take a look at the fcgi code. When the backend (php) runs slowly, e.g. blocked by a slow MySQL query, lighty will practically refuse all dynamic requests (it returns 500). The problem makes me unable to replace Apache with lighty.
-
I have the same problem as namosys sometimes. I wish lighty would wait until it can get a backend before spitting out 500 right away... there isn't a timeout setting for this, right?
-
Great work Jan, really! I don't rely heavily on file I/O for my site, but any efficiency and speed improvements are good, right? I would also like to urge you to look into the fcgi code as well. I run lighty with fcgi-php over the network and I also have problems with 500 errors. I am more than willing to help test/develop any changes you might consider for this.
-
We are serving a large number of thumbnail images (5-7kb) from a squid-like directory structure. Avg file size: ~7kb. Total data: ~1 Terabyte. Access: totally random. Disks: 4 SATA disks, 16mb cache, Linux software RAID 0 (striping). FS: reiserfs. Deadline scheduler. Core Duo processor.
According to siege and /proc:
1.4 branch: cpu waiting 95%, 100rq/s (7MB), 64 workers (!!!), load ~equal to number of workers.
trunk branch (libaio): 1000rq/s (70MB), 64 workers, load: 5.
10x FASTER (!!!). AIO really kicks ass in these conditions. These are only FIRST PRE-TESTS, done 5 times for 30s periods, run from another machine over a GB network. Tomorrow I will do some more tests and send full results. (I had to put the server back online on the old version due to the lack of fastcgi support.) Too good to be true (and siege sometimes lies).