In lighttpd 1.4.6 we added some modifications for sites which have to serve some 100 files in parallel, each larger than 100 MB.
The problem in earlier releases was that lighttpd had to wait until the disk had seeked to the right place and read a few hundred kilobytes before it could send them out. And this for every request, as this scenario was completely trashing the disk buffering. The io-wait went sky-high and we were completely bound by the disk-IO.
You could see this by running vmstat:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  7 306852  51396   152 237004  632  492 34952  7531 6983 11677  3 25  0 72
As you can see, 72% of the time is io-wait and 25% is spent in the kernel doing something. Only 3% is spent in userland (lighty).
When we investigated the problem we laid out a plan:
/* Optimizations for the future:
 *
 * adaptive mem-mapping
 *   the problem:
 *     we mmap() the whole file. If someone serves a lot of large files on a
 *     32-bit machine, the virtual address space will be exhausted and the
 *     mmap() call will fail.
 *   solution:
 *     only mmap 16M in one chunk and move the window as soon as we have
 *     finished the first 8M
 *
 * read-ahead buffering
 *   the problem:
 *     sending out several large files in parallel trashes the read-ahead of
 *     the kernel, leading to long wait-for-seek times.
 *   solutions (increasing complexity):
 *     1. use madvise()
 *     2. use an internal read-ahead buffer in the chunk structure
 *     3. use non-blocking IO for file transfers
 */
And that’s what we did step-by-step:
- instead of mmap()ing the whole file we use a moving window of 512 KB
- we use madvise() to tell the kernel to load this area into memory NOW
- we added a switch to use local buffering instead of asking the kernel to do it
If you want to use it, set:
server.network-backend = "writev"
If you want to try out the local buffering, define LOCAL_BUFFERING in network_writev.c.
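For the curious, local buffering amounts to roughly the following. This is a sketch under our assumptions, not the code behind LOCAL_BUFFERING: instead of pointing the network backend at mmap()ed pages and relying on the kernel's page cache, the next chunk is read() into a buffer the process owns, so one large sequential read replaces many small page-fault-driven ones:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK_SIZE (256 * 1024)  /* chunk size chosen for the example */

/* hypothetical helper: pull the next chunk into a process-local buffer
 * with one big read(), then send it out */
static ssize_t copy_chunk_buffered(int in_fd, int out_fd) {
    char *buf = malloc(CHUNK_SIZE);
    if (!buf) return -1;

    ssize_t n = read(in_fd, buf, CHUNK_SIZE);  /* one seek, one big read */
    if (n > 0 && write(out_fd, buf, (size_t)n) != n) n = -1;

    free(buf);
    return n;
}
```

The trade-off is an extra copy into userland memory, in exchange for not depending on the kernel's read-ahead heuristics when many large files are streamed in parallel.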
So far we have not pushed for async file-IO, as the coding impact is quite large and we don't expect a big win over the current behaviour.
Both approaches result in better caching of the data we need, which reduces the number of seeks and, in the end, the io-wait.
Problem solved, next one :)