Async IO on Linux
trunk/ just got support for Linux native AIO.
I implemented async IO based on libaio, which is a minimal wrapper around the AIO syscalls of the 2.6.x kernels.
Implementation
It was a bit tricky to get it working as libaio is basically undocumented, but hey … that’s why we are hackers :)
The async file IO support is part of Linux 2.6.9 and later and should be on every recent Linux box. A separate library called libaio provides very simple wrappers and is used as the base for the new network backend.
The idea is:
- create a buffer in /dev/shm and mmap() it
- start an async read() from the source file to the mmap() buffer
- wait until the data is ready
- use sendfile() to send the data from /dev/shm to the network socket
Important for the performance: the data is never copied into user space. We only move it from one side of the kernel to the other side.
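The four steps above can be sketched in C roughly as follows. This is a minimal sketch under assumptions, not the code that went into trunk/: it calls the raw io_setup()/io_submit()/io_getevents() syscalls through syscall() instead of going through libaio (so it builds without the library), it handles only a single chunk, and the function name send_chunk_via_shm() and its shm_path parameter are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push `len` bytes starting at `off` of src_fd to out_fd
 * through a buffer file at shm_path (e.g. somewhere below /dev/shm).
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_chunk_via_shm(int src_fd, off_t off, size_t len,
                           int out_fd, const char *shm_path)
{
    aio_context_t ctx = 0;
    struct iocb cb;
    struct iocb *cbs[1] = { &cb };
    struct io_event ev;
    void *buf = MAP_FAILED;
    off_t pos = 0;
    ssize_t sent = -1;
    int shm_fd;

    assert(len > 0);

    /* 1. create the buffer file and mmap() it */
    shm_fd = open(shm_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (shm_fd < 0) return -1;
    unlink(shm_path); /* the open fd keeps the buffer alive */
    if (ftruncate(shm_fd, (off_t)len) < 0) goto out;
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    if (buf == MAP_FAILED) goto out;

    /* 2. start an async read from the source file into the mapping */
    if (syscall(SYS_io_setup, 1, &ctx) < 0) goto out;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = (uint32_t)src_fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    if (syscall(SYS_io_submit, ctx, 1L, cbs) != 1) goto out;

    /* 3. wait until the data is ready (blocks, no timeout) */
    if (syscall(SYS_io_getevents, ctx, 1L, 1L, &ev, NULL) != 1) goto out;
    if (ev.res <= 0) goto out;

    /* 4. let sendfile() move the data from the buffer file to out_fd */
    sent = 0;
    while (pos < (off_t)ev.res) {
        ssize_t n = sendfile(out_fd, shm_fd, &pos, (size_t)(ev.res - pos));
        if (n <= 0) { sent = -1; break; }
        sent += n;
    }

out:
    if (ctx) syscall(SYS_io_destroy, ctx);
    if (buf != MAP_FAILED) munmap(buf, len);
    close(shm_fd);
    return sent;
}
```

A real backend would keep the AIO context and the buffer around between requests and would not block in step 3. For a source file opened without O_DIRECT the kernel may already finish the read inside io_submit(); the completion is still collected through io_getevents().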
Hack ahead
Sadly I had to add pthread to the dependencies. Having threads in a single-threaded server is a bit strange, but it is necessary.
fdevent_poll() waits up to 1s for fd-events, and while it is waiting the whole server is waiting. Handling the async notifications is blocking as well, and we can’t make either call return as soon as the other one is done.
If necessary we start an io-getevent-thread which runs in parallel to the fdevent_poll() call. Whichever call returns first interrupts the other one by sending a SIGUSR1 to the process. That makes the waiting call (poll() or io_getevents()) return with EINTR, and we can continue handling the result of the call that finished.
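The interrupt trick can be illustrated with a small, self-contained sketch. It is not the trunk/ code: nanosleep() stands in for the blocking io_getevents() call, poll() is called without any fds so it acts as a pure timeout, and the signal is aimed at a specific thread with pthread_kill() rather than sent to the whole process. All names (wait_for_either(), getevents_thread(), enum first_done) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

enum first_done { FIRST_ERROR = -1, FIRST_POLL = 0, FIRST_AIO = 1 };

struct waiter {
    pthread_t wake_target; /* thread to interrupt if our wait finishes first */
    long wait_ms;          /* how long the pretend io_getevents() blocks */
};

/* Empty handler: its only job is to make blocking calls fail with EINTR. */
static void on_sigusr1(int sig) { (void)sig; }

static void *getevents_thread(void *arg)
{
    struct waiter *w = arg;
    struct timespec ts = { w->wait_ms / 1000, (w->wait_ms % 1000) * 1000000L };

    /* Stand-in for io_getevents(): returns 0 only if it was not interrupted. */
    if (nanosleep(&ts, NULL) == 0)
        pthread_kill(w->wake_target, SIGUSR1); /* we won: wake the poll() side */
    return NULL;
}

/* Runs both waits in parallel and reports which one returned first. */
enum first_done wait_for_either(long aio_wait_ms, int poll_timeout_ms)
{
    struct sigaction sa;
    struct waiter w;
    pthread_t tid;
    int r, saved_errno;

    assert(aio_wait_ms >= 0);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1; /* no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0) return FIRST_ERROR;

    w.wake_target = pthread_self();
    w.wait_ms = aio_wait_ms;
    if (pthread_create(&tid, NULL, getevents_thread, &w) != 0) return FIRST_ERROR;

    r = poll(NULL, 0, poll_timeout_ms);
    saved_errno = errno;

    if (r == 0)
        pthread_kill(tid, SIGUSR1); /* poll() won: interrupt the other wait */
    pthread_join(tid, NULL);

    if (r < 0 && saved_errno == EINTR) return FIRST_AIO;
    if (r == 0) return FIRST_POLL;
    return FIRST_ERROR;
}
```

Calling wait_for_either(50, 2000) should report FIRST_AIO and wait_for_either(2000, 50) should report FIRST_POLL. The sketch ignores one race: a signal that arrives just before the other side enters its blocking call is consumed by the handler, and that call then waits for its full timeout.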
Benchmarks
As a testbed we have a RAID1 (Linux md) built from two
- ST3160827AS (SATA, 120Mb each)
- nVidia Corporation CK8S as SATA controller
- AMD Athlon™ 64 Processor 3000+
- Linux 2.6.16.21-0.25-xen (SuSE 10.1)
siege, 700Mb
I’ll compare linux-sendfile vs. linux-aio-sendfile.
| conc | non-aio | aio [512k] | aio [1M] |
| 1 | 52.38 MB/sec [9% idle] | 89.85 MB/sec [70% idle] | 107.50 MB/sec [67% idle] |
| 2 | 39.94 MB/sec [8% idle] | 94.52 MB/sec [70% idle] | 92.74 MB/sec [70% idle] |
| 5 | 35.45 MB/sec [7% idle] | 31.81 MB/sec [86% idle] | 72.84 MB/sec [70% idle] |
| 10 | .. | 25.22 MB/sec [82% idle] | 32.87 MB/sec [90% idle] |
More important than the throughput is the CPU time that can now be spent on other tasks.