PRE-RELEASE: lighttpd-1.5.0-r1454.tar.gz
Thanks to the brave testers in #lighttpd, AIO support is stabilizing very well, and the corruptions that were reported are fixed now.
Besides bug fixes, I implemented chunk-stealing and almost doubled the performance of AIO for small files (100k) [16MByte/s instead of 9MByte/s].
Download: http://www.lighttpd.net/download/lighttpd-1.5.0-r1454.tar.gz
linux AIO and large files
The previous benchmarks only showed results for small files (100kbyte). Time to add larger files to the pool and talk about the chunk-size.
I just push all the work to the kernel and hope that it does it right. Currently I allow 64 jobs to be pushed to the kernel. Kernel threads are more lightweight than “real” threads.
I’m currently working on a POSIX AIO version. On Linux, POSIX AIO uses threads to handle the read() calls; let’s see how that works out.
I ran a third benchmark round against 1000 files of 10Mbyte each. tibco @ IRC runs an FLV site in China and said that their files are around 12-17Mbyte.
The client was a Win2003 AMD64 dual-core box, connected to the server via an Intel Pro/1000 [RAID1 … as before].
Throughput per network backend and chunk size, together with the avg-cpu values reported by iostat:
| network-backend | chunk size | throughput | %user | %nice | %system | %iowait | %steal | %idle |
|---|---|---|---|---|---|---|---|---|
| linux-aio-sendfile | 1Mbyte | 52Mbyte/s | 1.80 | 0.00 | 46.20 | 13.40 | 0.00 | 38.60 |
| linux-aio-sendfile | 768kbyte | 55Mbyte/s | 2.99 | 0.00 | 56.37 | 4.58 | 0.00 | 36.06 |
| linux-aio-sendfile | 512kbyte | 58Mbyte/s | 1.40 | 0.00 | 62.67 | 5.39 | 0.00 | 30.54 |
| linux-aio-sendfile | 384kbyte | 54Mbyte/s | 5.18 | 0.00 | 55.38 | 1.99 | 0.00 | 37.45 |
| linux-aio-sendfile | 256kbyte | 21Mbyte/s | 21.00 | 0.00 | 28.60 | 0.80 | 0.00 | 49.60 |
| linux-sendfile (for comparison) | - | 30Mbyte/s | 1.20 | 0.00 | 22.20 | 71.00 | 0.00 | 5.60 |
Summary
Large files or small files, it doesn’t matter: once your disks start to suffer from seeking around, AIO gives you, at least in my setup, 80% more throughput.
mod-proxy-core and SQF
mod-proxy-core has 3 different balancers for different needs: Round Robin, Shortest Queue First and CARP.
We can categorize the balancers into two groups:
- load balancing by distribution (RR, SQF)
- load balancing by separation (CARP)
Round Robin
Round Robin (RR) is simple and straightforward.
If you have 3 hosts
- A1
- A2
- A3
the first request goes to A1, the second to A2 and the third to A3. The fourth request starts at A1 again.
We use a slightly different implementation. Instead of really going from A1 to A2 to A3, we take all active backends and pick one randomly. On average each host gets the same number of requests.
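A minimal sketch of the random pick described above, in C. The backend_t struct and pick_random_active() are hypothetical names for illustration, not mod-proxy-core’s actual code, and rand() stands in for whatever random source the module really uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical backend record, for illustration only. */
typedef struct {
    const char *addr;
    int active;            /* non-zero while the backend is usable */
} backend_t;

/* Return a randomly chosen active backend, or NULL if none is active.
 * Over many requests every active backend gets about the same share. */
backend_t *pick_random_active(backend_t *b, size_t n)
{
    assert(n == 0 || b != NULL);

    size_t active = 0;
    for (size_t i = 0; i < n; i++)
        if (b[i].active)
            active++;
    if (active == 0)
        return NULL;

    size_t k = (size_t)rand() % active;   /* index among the active ones */
    for (size_t i = 0; i < n; i++) {
        if (!b[i].active)
            continue;
        if (k-- == 0)
            return &b[i];
    }
    return NULL;                          /* not reached */
}
```

Picking uniformly among the active backends gives the same long-run distribution as strict round robin, without having to remember whose turn it is. A backend that goes down simply drops out of the candidate set.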
Shortest Queue First
RR has a little problem. If A1 is slower than A2 and A3, the fast backends will get the same number of requests as A1.
SQF tries to take that into account and uses the queue length as the basis for balancing.
The first request goes to A1 and takes 10s to complete. Meanwhile we get 4 other requests which A2 and A3 execute in 2s.
After two seconds it looks like this:
- A1 needs 8 more seconds [q-len: 1]
- A2 is free [q-len: 0]
- A3 is free [q-len: 0]
Request 3 goes to A2:
- A1 needs 8 more seconds [q-len: 1]
- A2 needs 2 seconds [q-len: 1]
- A3 is free [q-len: 0]
and Request 4 goes to A3:
- A1 needs 8 more seconds [q-len: 1]
- A2 needs 2 seconds [q-len: 1]
- A3 needs 2 seconds [q-len: 1]
If another request comes in now, we put it into the backlog.
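The walk-through above boils down to one rule: pick the backend with the shortest queue, or park the request if all queues are full. Here is a sketch of that rule in C. It assumes a hypothetical per-backend limit max_qlen (a limit of 1 reproduces the example), and the names are illustrative, not taken from mod-proxy-core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-backend state, for illustration only. */
typedef struct {
    const char *addr;
    int qlen;              /* requests currently in flight */
} sqf_backend_t;

/* Shortest Queue First: index of the backend with the fewest requests in
 * flight, or -1 if all of them already have max_qlen requests and the new
 * request has to go into the backlog. Ties go to the first backend. */
int sqf_pick(const sqf_backend_t *b, size_t n, int max_qlen)
{
    assert(n == 0 || b != NULL);

    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (b[i].qlen >= max_qlen)
            continue;                  /* full, skip it */
        if (best < 0 || b[i].qlen < b[best].qlen)
            best = (int)i;
    }
    return best;
}
```

The caller increments qlen when it dispatches a request and decrements it when the response is complete. That feedback is what lets a slow backend end up with fewer requests.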
Benchmarks
This had to be benchmarked. Luckily I have enough hardware at home, so we have 4 boxes joining the ring:
- client (.23): a Mac Mini, 1.2GHz, 100Mbit
- proxy (.27): AMD64 3000+, Linux 2.6.x, 1Gbit
- backend-1 (.22): Intel P4 1.2GHz, WinXP 32-bit
- backend-2 (.25): AMD64 X2, 3500+, Win2003 64-bit
The backends are running Apache 2.2.x installed from the MSI package, with mod_status enabled.
The proxy is lighty 1.5.0-r1435 with mod-proxy-core:
$SERVER["socket"] == ":1445" {
proxy-core.protocol = "http"
# proxy-core.balancer = "round-robin"
proxy-core.balancer = "sqf"
proxy-core.backends = ( "192.168.178.25:80", "192.168.178.22:8080" )
# proxy-core.backends = ( "192.168.178.22:8080" )
# proxy-core.backends = ( "192.168.178.25:80" )
proxy-core.max-pool-size = 32
}
The backends are serving the 44-byte index.html that is in the htdocs/ folder by default.
The client always runs the same command:
$ ab -k -n 100000 -c 16 http://192.168.178.27:1445/
As a first test we only activate .25, the dual-core box:
Requests per second: 2833.60 [#/sec] (mean)
Only .22 [my 3yr old Centrino notebook] gives:
Requests per second: 1249.45 [#/sec] (mean)
Using RR balances the requests equally over both hosts, so we expect at most twice the request rate of the slowest backend.
| balancer | req/s | requests to .22 | requests to .25 | %idle on proxy (.27) |
|---|---|---|---|---|
| RR | 2122 | 50122 | 49896 | 50% |
| SQF | 3213 | 30970 | 69048 | 25% |
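Here is a quick sanity check of these numbers. It uses only the two single-backend rates measured above (2833.60 and 1249.45 req/s).

With RR, each backend gets half of the requests, so the slower one sets the pace. The ceiling is therefore 2 * 1249.45 ≈ 2499 req/s. A perfectly load-aware balancer could reach at most 2833.60 + 1249.45 ≈ 4083 req/s, and it would send about 69% of the requests to .25.

The helper names below are made up for this sketch:

```c
#include <assert.h>

/* Round robin gives each of the two backends half of the requests, so the
 * slower one finishes last: total rate = 2 * min(r1, r2). */
double rr_ceiling(double r1, double r2)
{
    assert(r1 > 0 && r2 > 0);
    return 2.0 * (r1 < r2 ? r1 : r2);
}

/* A perfectly load-aware balancer keeps both backends busy all the time:
 * total rate = r1 + r2. */
double ideal_ceiling(double r1, double r2)
{
    assert(r1 > 0 && r2 > 0);
    return r1 + r2;
}

/* Fraction of all requests that backend "self" handles in the ideal case. */
double ideal_share(double r_self, double r_other)
{
    return r_self / ideal_ceiling(r_self, r_other);
}
```

Both measurements stay below their ceilings (2122 and 3213 req/s). The 69048 requests that SQF sent to .25 are close to the ideal 69% share.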
You can see how SQF adjusts to the capacity of each backend and balances nicely, while RR just does its thing and ends up with a lot less throughput.
BTW: if keep-alive is disabled, the rate drops from 3213 to 2678 req/s with SQF and from 2122 to 1835 req/s with RR.
The proxy server (our lighty) is making good use of its CPU, and I have already found ways to optimize the proxy code to use less CPU. That won’t affect the performance of this benchmark a lot, as the backends are already at 100%.
PRE-RELEASE: lighttpd-1.5.0-r1435.tar.gz
Yeah, really.
Before you jump around and empty a barrel of beer, try to compile it first. :)
Download: http://www.lighttpd.net/download/lighttpd-1.5.0-r1435.tar.gz
Finally I got some time to tie up the loose ends of 1.5.0. MySQL Network MAS is (hopefully) going to be released next week, giving me time to work on lighty again.
What works and what doesn’t?
- mod_fastcgi, mod_cgi, mod_scgi, mod_proxy are removed
- mod_proxy_core is the replacement for the above plugins
- you have to spawn fastcgi processes with spawn-fcgi
- mod_cml is removed and mod_magnet isn’t included yet
Linux AIO
I blogged about Linux AIO before; now you can try it out. Install libaio and build lighttpd with --with-linux-aio.
server.network-backend = "linux-aio-sendfile"
mod-proxy-core
I checked that balancing and uploading work nicely with mod-proxy-core, using both fastcgi and http as protocols.
PHP
Start PHP with spawn-fcgi as documented in the manual and add
$HTTP["url"] =~ "\.php$" {
proxy-core.balancer = "round-robin"
proxy-core.protocol = "fastcgi"
proxy-core.backends = ( "127.0.0.1:1026" )
proxy-core.max-pool-size = 16
}
to the config.
BTW: we use FCGI_KEEP_CONN to keep the connection between lighttpd and the FastCGI backend open as long as possible.
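For reference, this is roughly what asking for a kept-alive connection looks like on the wire. The FastCGI specification puts an FCGI_KEEP_CONN bit into the flags byte of the FCGI_BEGIN_REQUEST record that opens every request:

- If the bit is clear, the application closes the connection after its response.
- If it is set, the web server keeps responsibility for the connection.

Below is a sketch of encoding that 16-byte record in C. The function name is hypothetical; the constants are the ones from the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the FastCGI 1.0 specification. */
#define FCGI_VERSION_1     1
#define FCGI_BEGIN_REQUEST 1
#define FCGI_RESPONDER     1
#define FCGI_KEEP_CONN     1   /* flag: keep the connection after the request */

/* Encode an FCGI_BEGIN_REQUEST record (8-byte header + 8-byte body) into
 * out[16]. Returns the number of bytes written. */
size_t fcgi_begin_request(uint8_t out[16], uint16_t request_id, int keep_conn)
{
    assert(out != NULL);
    /* header */
    out[0] = FCGI_VERSION_1;
    out[1] = FCGI_BEGIN_REQUEST;
    out[2] = (uint8_t)(request_id >> 8);   /* requestIdB1 */
    out[3] = (uint8_t)(request_id & 0xff); /* requestIdB0 */
    out[4] = 0;                            /* contentLengthB1 */
    out[5] = 8;                            /* contentLengthB0: body is 8 bytes */
    out[6] = 0;                            /* paddingLength */
    out[7] = 0;                            /* reserved */
    /* body */
    out[8]  = 0;                           /* roleB1 */
    out[9]  = FCGI_RESPONDER;              /* roleB0 */
    out[10] = keep_conn ? FCGI_KEEP_CONN : 0;
    for (int i = 11; i < 16; i++)
        out[i] = 0;                        /* reserved[5] */
    return 16;
}
```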
HTTP (mongrel)
We use keep-alive and HTTP/1.1 by default. Give it a try.
$SERVER["socket"] == ":1445" {
proxy-core.protocol = "http"
# proxy-core.balancer = "round-robin"
proxy-core.balancer = "sqf"
proxy-core.backends = (
"10.0.0.10:80",
"10.0.0.11:80" )
}
sqf is Shortest Queue First and is the preferred balancer if you have backends with different CPUs. See the next blog-post.
mod-upload-progress
lighty 1.5.0 and linux-aio
1.5.0 will be a big win for all users. It will be more flexible and will bring a huge improvement for static files thanks to async IO.
The following benchmarks show an increase of 80% for the new linux-aio-sendfile backend compared to the classic linux-sendfile one.
The test environment:
- client: Mac Mini 1.2Ghz, MacOS X 10.4.8, 1Gb RAM, 100Mbit
- server: AMD64 3000+, 1Gb RAM, Linux 2.6.16.21-xen, 160Gb RAID1 (soft-raid)
The server runs lighttpd 1.4.13 and lighttpd 1.5.0-svn with a clean config [no modules loaded]; the client uses http_load.
The client runs:
$ ./http_load -verbose -parallel 100 -fetches 10000 urls
I used this little script to generate 1000 folders with 100 files of 100kbyte each.
for i in `seq 1 1000`; do
mkdir -p files-$i;
for j in `seq 1 100`; do
dd if=/dev/zero of=files-$i/$j bs=100k count=1 2> /dev/null;
done;
done
That’s 10Gbyte of data, 10 times the RAM size of the server, as we want to become seek-bound on our disks.
The Limits
2 Seagate Barracuda 160Gb disks (ST3160827AS) form a RAID1 via the linux-md driver. At 7200 RPM they give us at most 480 seeks/s (7200 RPM = 120 r/s, .5 rotations avg. per seek, 2 disks).
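Here is the seek estimate, written out step by step. Like the original estimate, it counts only the average rotational latency and ignores the time the head needs to move.

```latex
\begin{aligned}
\frac{7200\ \text{rotations/min}}{60\ \text{s/min}} &= 120\ \text{rotations/s} \\
\frac{120\ \text{rotations/s}}{0.5\ \text{rotations/seek}} &= 240\ \text{seeks/s per disk} \\
2\ \text{disks} \times 240\ \text{seeks/s} &= 480\ \text{seeks/s}
\end{aligned}
```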
Each disk can deliver 30Mbyte/s of sequential reads, 60Mbyte/s combined.
The network is 100Mbit/s; we expect it to limit us at about 10Mbyte/s.
lighttpd 1.4.13, sendfile
A first test run against lighttpd 1.4.13 with linux-sendfile gives us:
$ iostat 5
avg-cpu: %user %nice %system %iowait %steal %idle
0.99 0.00 4.77 86.68 0.20 7.36
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 35.19 3503.78 438.97 17624 2208
sdb 33.40 4052.49 438.97 20384 2208
md0 119.48 7518.09 429.42 37816 2160
avg-cpu: %user %nice %system %iowait %steal %idle
0.60 0.00 4.61 78.36 0.00 16.43
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 31.46 3408.42 365.53 17008 1824
sdb 30.06 3313.83 365.53 16536 1824
md0 104.21 6760.72 357.52 33736 1784
The http_load returned:
./http_load -verbose -parallel 100 -fetches 10000 urls
--- 60.006 secs, 1744 fetches started, 1644 completed, 100 current
--- 120 secs, 3722 fetches started, 3622 completed, 100 current
--- 180 secs, 5966 fetches started, 5866 completed, 100 current
--- 240 secs, 8687 fetches started, 8587 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 274.323 seconds
102400 mean bytes/connection
36.4534 fetches/sec, 3.73283e+06 bytes/sec
msecs/connect: 51.7815 mean, 147.412 max, 0.181 min
msecs/first-response: 360.689 mean, 6178.2 max, 1.08 min
HTTP response codes:
code 200 -- 10000
lighttpd 1.5.0, sendfile
The same test with lighttpd 1.5.0 using the same network backend: linux-sendfile.
avg-cpu: %user %nice %system %iowait %steal %idle
0.40 0.00 3.60 85.60 0.00 10.40
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 33.80 4606.40 564.80 23032 2824
sdb 37.00 4723.20 564.80 23616 2824
md0 136.00 9368.00 553.60 46840 2768
avg-cpu: %user %nice %system %iowait %steal %idle
0.80 0.00 4.80 81.80 0.00 12.60
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 33.40 4198.40 504.00 20992 2520
sdb 30.60 4564.80 504.00 22824 2520
md0 123.60 8763.20 496.00 43816 2480
avg-cpu: %user %nice %system %iowait %steal %idle
0.80 0.00 5.19 81.24 0.00 12.77
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 36.53 4490.22 493.41 22496 2472
sdb 32.34 4784.03 493.41 23968 2472
md0 126.75 9274.25 483.83 46464 2424
The client said:
--- 60 secs, 2444 fetches started, 2344 completed, 100 current
--- 120.003 secs, 4957 fetches started, 4857 completed, 100 current
--- 180 secs, 7359 fetches started, 7259 completed, 100 current
--- 240 secs, 9726 fetches started, 9626 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 246.803 seconds
102400 mean bytes/connection
40.5181 fetches/sec, 4.14906e+06 bytes/sec
msecs/connect: 55.5808 mean, 186.153 max, 0.24 min
msecs/first-response: 398.639 mean, 6101.44 max, 9.313 min
HTTP response codes:
code 200 -- 10000
This is slightly better, but it still has the same problem: we are maxed out by the disks, not by the network.
lighttpd 1.5.0, linux-aio-sendfile
We only switch the network-backend to the async io one:
server.network-backend = "linux-aio-sendfile"
... and run our benchmark again:
avg-cpu: %user %nice %system %iowait %steal %idle
8.38 0.00 10.18 38.52 0.00 42.91
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 42.91 7190.42 526.95 36024 2640
sdb 36.93 6144.51 526.95 30784 2640
md0 205.99 13213.57 517.37 66200 2592
avg-cpu: %user %nice %system %iowait %steal %idle
0.80 0.00 9.84 48.39 0.20 40.76
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 50.40 8369.48 573.49 41680 2856
sdb 44.18 7318.88 573.49 36448 2856
md0 241.77 15890.76 563.86 79136 2808
avg-cpu: %user %nice %system %iowait %steal %idle
0.60 0.00 8.38 44.91 0.00 46.11
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 50.10 7580.04 720.16 37976 3608
sdb 47.50 7179.24 720.16 35968 3608
md0 242.12 14558.08 710.58 72936 3560
The client said:
--- 60.0001 secs, 3792 fetches started, 3692 completed, 100 current
--- 120 secs, 8778 fetches started, 8678 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 137.551 seconds
102400 mean bytes/connection
72.7004 fetches/sec, 7.44452e+06 bytes/sec
msecs/connect: 66.9088 mean, 197.157 max, 0.223 min
msecs/first-response: 226.181 mean, 6066.96 max, 2.098 min
HTTP response codes:
code 200 -- 10000
Summary
Using async IO allows lighttpd to overlap file operations. We send an IO request for the file and get notified when it is ready. Instead of waiting for the file (as with the normal sendfile()) and blocking the server, we can handle other requests in the meantime.
In addition, we give the kernel the chance to reorder the file requests as it sees fit.
Together, these two improvements increase the throughput by 80%.
Also, we don’t spend any time waiting in lighty itself: 64 kernel threads handle the read() calls for us in the background, which increases the idle time from 12% to 40%, an improvement of 230%.
Async IO on Linux
trunk/ just got support for Linux Native AIO.
I implemented Async IO based on libaio which is a minimal wrapper around the aio-syscalls for the 2.6.x kernels.
Implementation
It was a bit tricky to get it working, as libaio is basically undocumented, but hey … that’s why we are hackers :)
The async file IO support is part of Linux 2.6.9 and later and should be on every recent Linux box. A separate library called libaio provides very simple wrappers and is used as the base for the new network backend.
The idea is:
- create a buffer in /dev/shm and mmap() it
- start an async read() from the source file into the mmap() buffer
- wait until the data is ready
- use sendfile() to send the data from /dev/shm to the network socket
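Here is a rough C sketch of these four steps. It rests on two loud assumptions:

- It uses POSIX AIO (aio_read() and aio_suspend() from <aio.h>) instead of libaio, so that it builds without extra libraries. In libaio, io_prep_pread(), io_submit() and io_getevents() play the same roles.
- It simply blocks until the read is done, where a server would go back to its event loop.

shm_aio_sendfile() and read_async() are hypothetical names, and the /dev/shm path prefix is made up.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Steps 2+3: start an async read() into buf and wait for the result.
 * A server would return to its event loop instead of aio_suspend(). */
static ssize_t read_async(int fd, void *buf, size_t len, off_t offset)
{
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    if (aio_read(&cb) < 0)
        return -1;

    const struct aiocb *const list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);    /* the loop also covers EINTR */
    return aio_return(&cb);
}

/* Send len bytes of src_fd, starting at offset, to out_fd through a
 * mmap()ed scratch file in /dev/shm. Returns bytes sent or -1. */
ssize_t shm_aio_sendfile(int src_fd, off_t offset, size_t len, int out_fd)
{
    assert(len > 0);                   /* mmap() rejects empty mappings */

    /* Step 1: buffer file in /dev/shm, mmap()ed into our address space. */
    char path[] = "/dev/shm/aio-sketch-XXXXXX";
    int shm_fd = mkstemp(path);
    if (shm_fd < 0)
        return -1;
    unlink(path);                      /* storage is freed on close() */

    ssize_t result = -1;
    void *buf = MAP_FAILED;
    if (ftruncate(shm_fd, (off_t)len) == 0)
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);

    if (buf != MAP_FAILED) {
        ssize_t got = read_async(src_fd, buf, len, offset);
        munmap(buf, len);

        /* Step 4: sendfile() from the /dev/shm file to the target fd;
         * the payload is never memcpy()ed in user space. */
        off_t pos = 0;
        if (got >= 0)
            result = 0;
        while (pos < got) {
            ssize_t n = sendfile(out_fd, shm_fd, &pos, (size_t)(got - pos));
            if (n < 0) { result = -1; break; }
            if (n == 0) break;
            result = (ssize_t)pos;
        }
    }
    close(shm_fd);
    return result;
}
```

The sketch creates one scratch file per call; a real backend would presumably reuse its /dev/shm buffers. With the 2.6.x kernels discussed here, out_fd has to be the network socket. A regular file as out_fd (handy for a quick local test) needs Linux 2.6.33 or newer.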
Important for the performance: the data is never copied into user space. We only move it from one side of the kernel to the other side.
Hack ahead
Sadly I had to add pthread to the dependencies. Having threads in a single-threaded server is a bit strange, but it is necessary.
fdevent_poll() waits up to 1s for fd events, and while it is waiting, the whole server is waiting. Waiting for the async notifications (io_getevents()) blocks as well, and there is no way to make one of the two calls return as soon as the other one has a result.
If necessary, we start an io_getevents() thread which runs in parallel to the fdevent_poll() call. Whichever call returns first interrupts the other one by sending a SIGUSR1 to the process. This makes the other waiting call (poll() or io_getevents()) return with EINTR, and we can continue with handling the result of the call that returned.
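The interruption trick can be tried in isolation. The sketch below works like this:

- It installs a SIGUSR1 handler without SA_RESTART.
- It blocks one thread in poll() on an idle pipe.
- A second thread, standing in for the io_getevents() thread (which is not shown), sends SIGUSR1 after 100 ms.
- poll() then returns -1 with errno set to EINTR.

The sketch uses pthread_kill() to target the polling thread, because a process-wide kill() may be delivered to any thread that does not block the signal. All function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: SIGUSR1 only exists to knock a thread out of a syscall. */
static void on_sigusr1(int sig) { (void)sig; }

/* No SA_RESTART, so blocked calls fail with EINTR when SIGUSR1 arrives. */
static int install_wakeup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Stand-in for the io_getevents() thread: after 100 ms it pretends an
 * AIO completion arrived and interrupts the thread sitting in poll(). */
static void *fake_aio_waiter(void *arg)
{
    pthread_t poller = *(pthread_t *)arg;
    struct timespec delay = { 0, 100 * 1000 * 1000 };
    nanosleep(&delay, NULL);
    pthread_kill(poller, SIGUSR1);
    return NULL;
}

/* Block in poll() on a pipe nobody writes to. Returns poll()'s result and
 * stores errno in *err; -2 means the setup itself failed. */
int poll_until_woken(int *err)
{
    assert(err != NULL);
    int fds[2];
    if (install_wakeup_handler() != 0 || pipe(fds) != 0)
        return -2;

    pthread_t self = pthread_self(), waiter;
    if (pthread_create(&waiter, NULL, fake_aio_waiter, &self) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int rc = poll(&pfd, 1, 5000);      /* 5 s timeout if nobody wakes us */
    *err = errno;

    pthread_join(waiter, NULL);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```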
Benchmarks
As a testbed we have a RAID1 (linux md) built from two
- ST3160827AS (SATA, 160Gb each)
- nVidia Corporation CK8S as SATA controller
- AMD Athlon™ 64 Processor 3000+
- Linux 2.6.16.21-0.25-xen (SuSE 10.1)
siege, 700Mb
I’ll compare linux-sendfile vs. linux-aio-sendfile.
| concurrency | non-aio | aio [512k] | aio [1M] |
|---|---|---|---|
| 1 | 52.38 MB/sec [9% idle] | 89.85 MB/sec [70% idle] | 107.50 MB/sec [67% idle] |
| 2 | 39.94 MB/sec [8% idle] | 94.52 MB/sec [70% idle] | 92.74 MB/sec [70% idle] |
| 5 | 35.45 MB/sec [7% idle] | 31.81 MB/sec [86% idle] | 72.84 MB/sec [70% idle] |
| 10 | .. | 25.22 MB/sec [82% idle] | 32.87 MB/sec [90% idle] |
More important than the throughput is the CPU time that can now be spent on other tasks.
What’s next?
Next is bug fixing, load testing (more parallel connections), random load, ...
What is Jan doing all the time?
You might wonder why it takes so long to release 1.5.0 when most of it is already in trunk.
At MySQL we are in the final stretch of getting a GA release of the Monitoring and Advisory Service of MySQL Enterprise out of the door.
I’m still monitoring the IRC channel on freenode, but all development time is going into my MySQL stuff right now.
RELEASE: lighttpd 1.4.13
Only 2 weeks after .12 hit the servers we have a new release cleaning up the issues that were introduced by it.
On the fix side we have:
- fixed a seg-fault in the HTTP-Request splitting
- fixed long-standing bug with Content-Length and HEAD requests
- fixed a possible abort of a upload if xattr is enabled
- mod-magnet finally handles ‘require “lfs”’ without complaining
- mod-magnet got light.stat() which uses the stat-cache
- mod-webdav supports LOCK if compiled with --with-webdav-locks
$ configure --with-lua=lua5.1 ...
as their lua-5.1 package isn’t called ‘lua’.
Enjoy this release and watch out for 1.5.0 on the horizon. :)
Download
- lighttpd-1.4.13-1.i386.rpm [built on Fedora Core 4] MD5: 3d4a857e02e111d6955ccf76e416cb41
- lighttpd-1.4.13-1.src.rpm MD5: 6272d8310fae8bcc35e6ab4778e2016c
- lighttpd-1.4.13.tar.gz MD5: d775d6478391b95d841a1018c8db0b95
ChangeLog
- added initgroups in spawn-fcgi (#871)
- added apr1 support htpasswd in mod-auth (#870)
- added lighty.stat() to mod_magnet
- fixed segfault in splitted CRLF CRLF sequences (introduced in 1.4.12) (#876)
- fixed compilation of LOCK support in mod-webdav
- fixed fragments in request-URLs (#869)
- fixed pkg-config check for lua5.1 on debian
- fixed Content-Length = 0 on HEAD requests without a known Content-Length (#119)
- fixed mkdir() forcing 0700 (#884)
- fixed writev() on FreeBSD 4.x and older (#875)
- removed warning about a 404-error-handler returned 404
- backported and fixed the buildsystem changes for webdav locks
- fixed plugin loading so we can finally load lua extensions in mod_magnet scripts
- fixed large uploads if xattr is enabled
reducing Requests-Setup-Costs
Back in the days of the first mod-cml implementations, I considered the request setup costs the root of all evil. They were the problem I wanted to fix with mod-cml.
But what is the request setup cost ? What is influencing the request-time ? Where can you influence it ?
When the browser sends a request, the server has to:
- receive the request
- parse the request
- connect to a back-end
- send the request to the back-end
- wait for a response
- receive the response and send it to the client
By using ab to fire the same request at the backend over and over, we should hit the caches very well and get a good idea of what is left when the caches are hot:
$ ab -n 100 -c 1 http://127.0.0.1:1025/123.php
...
Time per request: 7.515 [ms] (mean) [strace]
...
ab was run against lighty 1.4.13-r1385 running under strace. We will run all tests with this setup.
$ strace -o costs.txt -tt -s 512 lighttpd -D -f ./lighttpd.conf
...:26.574274 ... (last syscall of the previous request)
...:26.575448 accept(...) = 8
...:26.576006 read(8, ...)
...:26.576702 connect(9, ... )
...:26.577239 writev(9, ... )
...:26.579688 read(9, ...)
...:26.580459 writev(8, ... )
...:26.581128 close(8)
- 1.2ms for accept()ing the connection
- 0.5ms for reading the request from the client.
- 0.7ms for connecting to the backend over a unix-socket
- 0.5ms to writev() the request data to the backend
- 2.4ms to wait for the backend and reading the response
- 0.8ms for writev()ing the response to the client
- 0.5ms to close the connection again
(sum is 6.8ms)
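Each entry in the list is just the gap between two consecutive -tt timestamps. Here is a small C sketch that recomputes the gaps from the trace. The hh:mm part of the timestamps is elided in the output above, so only the seconds are used. trace_ts and gaps_ms() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Seconds part of the strace -tt timestamps above, in trace order. */
static const double trace_ts[] = {
    26.574274,  /* last syscall of the previous request   */
    26.575448,  /* accept()                                */
    26.576006,  /* read() of the request                   */
    26.576702,  /* connect() to the backend                */
    26.577239,  /* writev() of the request to the backend  */
    26.579688,  /* read() of the backend's response        */
    26.580459,  /* writev() of the response to the client  */
    26.581128,  /* close()                                 */
};

/* Store the gap in ms between t[i] and t[i + 1] in out_ms[i] (n - 1
 * entries) and return the total span from t[0] to t[n - 1] in ms. */
double gaps_ms(const double *t, size_t n, double *out_ms)
{
    assert(t != NULL && out_ms != NULL && n >= 2);
    for (size_t i = 0; i + 1 < n; i++)
        out_ms[i] = (t[i + 1] - t[i]) * 1000.0;
    return (t[n - 1] - t[0]) * 1000.0;
}
```

Measured end to end, from the previous request’s last syscall to close(), the request takes about 6.85 ms under strace.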
Without strace the response time is:
Time per request: 2.946 [ms] (mean)
That leaves about 0.5ms for the request inside lighty; the 2.4ms in the backend stay the same.
Keep-Alive
The next attempt uses keep-alive to get rid of the accept() and the close() at the end. In the strace timing those two calls cost 1.7ms.
$ ab -n 100 -c 1 -k http://127.0.0.1:1025/foo.php
...
Time per request: 5.564 [ms] (mean) [strace]
...
strace tells us:
...:03.242201 ... (the last syscall of the previous request)
...:03.242903 read(8, ...)
...:03.243703 connect(9, ...)
...:03.244144 writev(9, ...)
...:03.246396 read(9, ...)
...:03.246969 writev(8, ...)
...:03.247261 ... (last syscall of this request)
- 0.5ms for the read() of the request
- 0.8ms for the connect()
- 0.4ms for the writev() to the backend
- 2.2ms waiting + read()ing the response
- 0.7ms writev()ing the response to the client
(sum is 4.6ms)
Without strace:
Time per request: 2.394 [ms] (mean)
The 2.2ms seen here are spent in the backend and are not affected by strace. If you take them out of the calculation you get:
without strace: 100 * (1 - ((2.4ms - 2.2ms) / (2.9ms - 2.5ms))) = 50%
with strace: 100 * (1 - ((4.6ms - 2.2ms) / (6.8ms - 2.5ms))) = 44%
That is a 50% saving of lighty’s internal costs. But in reality those 50% are only 10% of the total, as most of the request time is spent in the backend.
The Backend
The limiting factor for low request times is the backend, which should have as little overhead as possible. For the above timings I used a PHP FastCGI backend, executing the script:
<?php echo "123" ?>
What is PHP doing in those 2.2-2.5ms ? strace will help us again.
...:58.351886 ... (last syscall of the previous request)
...:58.352020 accept(0, ...) = 4
...:58.353110 read(4, ...)
...:58.353260 stat()
...:58.354431 open(...)
...:58.355489 read(5, ...)
...:58.357595 write(4, ...)
...:58.359911 close(4)
- 0.2ms for the accept()
- 1.1ms to read the whole fastcgi-request
- 0.1ms for the stat() to see if the file exists
- 1.2ms to open the file
- 1.0ms to read it
- 2.1ms to execute the script and send the response
- 2.4ms to close the connection
- 2.6ms are spent bringing up and shutting down the connection. FastCGI supports keep-alive; this will be supported in lighty 1.5.x.
- 2.2ms are spent in getting the script-file into PHP.
PHP without a byte-code cache is reading twice
I wonder why there is an open()+read() on the file at all, as I used XCache 1.0.x as the code cache here, and the file is in the cache and getting cache hits.
If I remove xcache, the same php-file is read twice by PHP:
open("..../foo.php", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
...
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
read(4, "<?php\n print \"123\";\n?>\n", 4096) = 26
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "<?php\n print \"123\";\n?>\n", 8192) = 26
read(4, "", 4096) = 0
read(4, "", 8192) = 0
close(4) = 0
The php-file is open()ed, stat()ed twice [why?] and read() once, all 26 bytes.
Afterwards we seek to the start (_llseek()), read() the 26 bytes again (the same size that fstat() told us) and then have to call read() twice more to find out that there really is no more data.
Update: after discussing this with the PHP core devs, it turns out to be a feature of the FastCGI/CGI SAPI. As old CLI scripts can be executed with the CGI SAPI, it has to support the “#!/usr/bin/php” line of shell scripts. For webapps this line has to be skipped; otherwise it would be printed to the output.
Removing that code from PHP gives us:
Time per request: 2.150 [ms] (mean)
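What such a check has to do can be sketched in a few lines of C. This is an illustration, not the SAPI’s actual code: peek at the start of the file and, if it begins with "#!", skip everything up to and including the first newline. Having to look at the head of the file before the real parse is one plausible explanation for the read(), _llseek(), read() pattern in the trace. script_body_offset() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset at which the script body starts: 0 for a normal file,
 * or the position right after the first newline if the file starts with
 * a "#!" line such as "#!/usr/bin/php". */
size_t script_body_offset(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return 0;
    const char *nl = memchr(buf, '\n', len);
    /* no newline at all: the whole file is the "#!" line */
    return nl ? (size_t)(nl - buf) + 1 : len;
}
```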
Getting Raw
Perhaps it is better to stay away from PHP and use a language which doesn’t provide all the nice features we like about PHP:
- the automatic parsing of GET parameters
- the mapping from var[] into arrays
- file-upload support in the back
- output buffering, compression
- internal variables like $PHP_SELF, ...
As I had already written a byte-code cache for mod-magnet, I stretched the idea a bit: I took the byte-code cache and wrapped an FCGI_accept() from the fastcgi-lib around it.
The lua-fastcgi-magnet is really simple and just does:
- creates a global lua environment
- calls FCGI_accept()
- loads the script from the script-cache
- creates an empty environment for the script
- registers the print() function to use the fastcgi stdio wrapper
- executes the script
- goes to FCGI_accept() again
I want to execute the same script and see how the response time is now:
Time per request: 1.594 [ms] (mean)
We saved 0.8ms or 30% of the whole request time.
The strace for magnet is a lot simpler:
...:28.541361 ... (last call of previous request)
...:28.541507 accept(0, ...) = 3
...:28.541916 read(3, ...)
...:28.542397 stat64(...)
...:28.542809 write(3, ...)
...:28.544807 close(3)
- 0.2ms for the accept()
- 0.4ms for the read()
- 0.4ms for the stat()
- 0.5ms to execute the script and write the response
- 2.0ms to wait for the final packet and close the connection
If we now had keep-alive in our FastCGI implementation …
We already saw that script execution is 0.8ms faster in real time. If we attach strace to lighty again and run ab, we get what we expect:
Time per request: 4.800 [ms] (mean) [strace]
That is 0.8ms less than the 5.6ms we measured for PHP under strace with keep-alive.
a core-magnet
If you really want to spend even less time and don’t have to wait on any external sources, you can also use mod-magnet to execute the script:
Time per request: 0.584 [ms] (mean)
As this saves the connect() to the backend and the transfer of the data to it, lighty can respond blazingly fast.
Conclusion
Let’s ask the final question: why do I care about those 0.8ms at all?
| setup | time per request | req/s |
|---|---|---|
| PHP without keep-alive | 2.9ms | 345 req/s |
| PHP | 2.4ms | 416 req/s |
| PHP patched | 2.1ms | 465 req/s |
| lua-fastcgi | 1.6ms | 625 req/s |
| mod-magnet | 0.6ms | 1700 req/s |
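The req/s column follows directly from the time per request. With one request in flight at a time (ab -c 1, as in the runs above, and assumed here for the lua-fastcgi and mod-magnet rows as well), throughput is the inverse of latency:

```latex
\text{req/s} = \frac{1000\ \text{ms/s}}{t_{\text{request}}\ [\text{ms}]}, \qquad
\frac{1000}{2.9} \approx 345, \quad
\frac{1000}{2.150} \approx 465, \quad
\frac{1000}{0.584} \approx 1712
```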
As print “123” is the smallest script that really generates output, this should be the highest request rate you can get. You can’t get more than those 416 req/s with PHP out of my test machine:
- AMD Duron 1.3GHz
- 640MB RAM DDR
- Disk and Network don’t matter as the benchmark is RAM and loopback based
If you need higher throughput, use mod-magnet to offload some work from the backend. Perhaps you can cache some content, handle the response directly in mod-magnet and only have to escalate to the backend in a few percent of the cases.
Or use another scripting language which allows you to work more directly on the FastCGI level like:
- ruby
- python
- perl
- http://jan.kneschke.de/projects/lua/
PRE-RELEASE: lighttpd-1.4.13-r1385
It looks like mod_magnet is getting more and more attention.
- Paul Querna thinks about a magnet-like module for apache
- marsorange has a step-by-step install guide for mod-magnet on OS X
- darix finally got all the pieces together for Dr Magneto vs Mr 404 handler
Compared to the last pre-release, we have some minor bug fixes and the new lighty.stat() function for mod-magnet, which uses our internal stat-cache to reduce the number of stat() calls that hit the kernel. darix had some nice benchmarks on it.
Download: http://www.lighttpd.net/download/lighttpd-1.4.13-r1385.tar.gz
ChangeLog
- added initgroups in spawn-fcgi (#871)
- added apr1 support htpasswd in mod-auth (#870)
- added lighty.stat() to mod_magnet
- fixed segfault in splitted CRLF CRLF sequences (introduced in 1.4.12) (#876)
- fixed compilation of LOCK support in mod-webdav
- fixed fragments in request-URLs (#869)
- fixed pkg-config check for lua5.1 on debian
- fixed Content-Length = 0 on HEAD requests without a known Content-Length (#119)
- fixed mkdir() forcing 0700 (#884)
- fixed writev() on FreeBSD 4.x and older (#875)
- removed warning about a 404-error-handler returned 404