PRE-RELEASE: lighttpd-1.5.0-r1454.tar.gz

Posted by jan Wed, 15 Nov 2006 22:57:00 GMT

Thanks to brave testers in #lighttpd the AIO-support is stabilizing very well and the corruptions that have been reported are fixed now.

Besides bugfixes, I implemented chunk-stealing and nearly doubled the performance of AIO for small files (100k) [16MByte/s instead of 9MByte/s].

Download: http://www.lighttpd.net/download/lighttpd-1.5.0-r1454.tar.gz

linux AIO and large files

Posted by jan Tue, 14 Nov 2006 12:27:00 GMT

The benchmarks only showed results for small files (100kbyte). Time to add larger files to the pool and talk about the chunk-size.

I just push all the work to the kernel and hope that it does it right. Currently I allow 64 jobs to be pushed to the kernel. Kernel threads are more light-weight than “real” threads.

Currently I’m working on a posix AIO version. On linux that is using threads to handle the read(), let’s see how that works out.

I did a third benchmark round against 1000 10Mbyte files. tibco @ IRC is running a flv-site in china and said that their files are around 12-17Mb.

Client was a win2003-amd64, dual core box connected via Intel Pro/1000 to the server [raid1 … as before].
linux-aio-sendfile: 52Mbyte/s [reading 1Mbyte chunks]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.80    0.00   46.20   13.40    0.00   38.60

linux-aio-sendfile: 55Mbyte/s [reading 768kbyte chunks]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.99    0.00   56.37    4.58    0.00   36.06

linux-aio-sendfile: 58Mbyte/s [reading 512kbyte chunks]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.40    0.00   62.67    5.39    0.00   30.54

linux-aio-sendfile: 54Mbyte/s [reading 384kbyte chunks]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.18    0.00   55.38    1.99    0.00   37.45

linux-aio-sendfile: 21Mbyte/s [reading 256kbyte chunks]

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.00    0.00   28.60    0.80    0.00   49.60
Compared to:
linux-sendfile: 30Mbyte/s

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.20    0.00   22.20   71.00    0.00    5.60 

Summary

No matter whether the files are large or small: when your disks start to suffer from seeking around, AIO will give you, at least in my setup, 80% more throughput.

mod-proxy-core and SQF

Posted by jan Tue, 14 Nov 2006 01:57:00 GMT

mod-proxy-core has 3 different balancers for different needs: Round Robin, Shortest Queue First and CARP.

We can categorize the balancers into two sections:

  • load balancing by distribution (RR, SQF)
  • load balancing by separation (CARP)

Round Robin

Round Robin (RR) is simple and straightforward.

If you have 3 hosts

  • A1
  • A2
  • A3

the first request goes to A1, the second to A2 and the third to A3. The fourth request starts at A1 again.

We use a slightly different implementation. Instead of really going from A1 to A2 to A3, we take all active backends and pick one randomly. On average each host gets the same number of requests.
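The random variant can be sketched in a few lines of Python. This is only an illustration of the idea, not lighty's C code; the `Backend` class and its `is_active` flag are made-up names.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    is_active: bool = True  # hypothetical flag, not a lighttpd field


def pick_random(backends, rng=random):
    """Pick one of the active backends with equal probability."""
    active = [b for b in backends if b.is_active]
    if not active:
        return None  # nothing to balance over
    return rng.choice(active)


backends = [Backend("A1"), Backend("A2"), Backend("A3")]
rng = random.Random(42)  # fixed seed so the run is repeatable

# Over many requests each host ends up with roughly a third of them.
hits = Counter(pick_random(backends, rng).name for _ in range(30000))
print(hits)
```

Unlike a strict A1, A2, A3 rotation there is no position counter to keep in sync when backends come and go. The price is that the split is only even on average.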

Shortest Queue First

RR has a little problem. If A1 is slower than A2 and A3, the fast backends will still get only the same number of requests as the slow A1.

SQF tries to take that into account and takes the queue-length as the base for the balancing.

The first request goes to A1 and takes 10s to complete. Meanwhile we get 4 other requests which A2 and A3 execute in 2s.

After two seconds it looks like this:

  • A1 needs 8 more seconds [q-len: 1]
  • A2 is free [q-len: 0]
  • A3 is free [q-len: 0]

Request 3 goes to A2:

  • A1 needs 8 more seconds [q-len: 1]
  • A2 needs 2 seconds [q-len: 1]
  • A3 is free [q-len: 0]

and Request 4 goes to A3:

  • A1 needs 8 more seconds [q-len: 1]
  • A2 needs 2 seconds [q-len: 1]
  • A3 needs 2 seconds [q-len: 1]

If another request comes in now, we put it into the backlog.
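The walkthrough above can be replayed with a small Python sketch. It is not the mod-proxy-core implementation: the class name, the `max_queue` limit of 1 per backend (chosen to mirror the example) and the way the backlog is drained are all assumptions.

```python
from collections import deque


class SqfBalancer:
    """Shortest Queue First: send each request to the least-loaded backend."""

    def __init__(self, names, max_queue=1):
        self.qlen = {name: 0 for name in names}  # outstanding requests per backend
        self.max_queue = max_queue               # hypothetical per-backend limit
        self.backlog = deque()

    def dispatch(self, request):
        # Shortest queue wins; on a tie the first listed backend is taken.
        name = min(self.qlen, key=self.qlen.get)
        if self.qlen[name] >= self.max_queue:
            self.backlog.append(request)         # everyone is busy
            return None
        self.qlen[name] += 1
        return name

    def finish(self, name):
        """A backend finished a request; hand it the oldest backlog entry."""
        self.qlen[name] -= 1
        if self.backlog:
            self.qlen[name] += 1
            return self.backlog.popleft()
        return None


lb = SqfBalancer(["A1", "A2", "A3"])
for req in ("r1", "r2", "r3", "r4"):
    print(req, "->", lb.dispatch(req))
print("A2 done, picks up:", lb.finish("A2"))
```

A slow A1 keeps its queue slot occupied for longer, so it is simply offered fewer requests. No response times have to be measured for that.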

Benchmarks

This had to be benchmarked. Luckily I have enough hardware at home, so we have 4 boxes joining the ring:

  • client (.23): a Mac Mini, 1.2GHz, 100Mbit
  • proxy (.27): AMD64 3000+, Linux 2.6.x, 1Gbit
  • backend-1 (.22): Intel P4 1.2GHz, WinXP 32-bit
  • backend-2 (.25): AMD64 X2, 3500+, Win2003 64-bit

The backends are running Apache 2.2.x taken from the MSI, mod_status enabled.

The proxy is lighty 1.5.0-r1435 with mod-proxy-core:

$SERVER["socket"] == ":1445" {
  proxy-core.protocol = "http" 
#  proxy-core.balancer = "round-robin" 
  proxy-core.balancer = "sqf" 
  proxy-core.backends = ( "192.168.178.25:80", "192.168.178.22:8080" )
#  proxy-core.backends = ( "192.168.178.22:8080" )
#  proxy-core.backends = ( "192.168.178.25:80" )
  proxy-core.max-pool-size = 32
}

The backends are serving the 44byte index.html which is in the htdocs/ folder by default.

The client is always running the same command:
$ ab -k -n 100000 -c 16 http://192.168.178.27:1445/
As a first test we only activate .25, the dual core box:
Requests per second:    2833.60 [#/sec] (mean)
Only .22 [my 3yr old Centrino notebook] gives:
Requests per second:    1249.45 [#/sec] (mean)

Using RR will balance the requests equally over both hosts. We expect at most double the request-rate of the slowest backend.

balancer   req/s   req .22   req .25   %idle .27
RR          2122     50122     49896   50%
SQF         3213     30970     69048   25%

You see how SQF adjusts to the capabilities of the backends and balances nicely, while RR is just doing its thing and results in a lot less throughput in the end.
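To make the numbers easier to compare, here is a small Python calculation using only the figures from this post. The variable names are mine; the "ceiling" is just the rule of thumb from above (RR cannot go faster than twice the slowest backend).

```python
# Requests per second when each backend is tested alone (from the runs above).
solo = {".22": 1249.45, ".25": 2833.60}

# RR gives each host half of the requests, so the slow host caps the total.
rr_ceiling = 2 * min(solo.values())

# balancer -> (req/s, requests served by .22, requests served by .25)
runs = {"RR": (2122, 50122, 49896), "SQF": (3213, 30970, 69048)}

print(f"RR ceiling: {rr_ceiling:.1f} req/s")
for name, (rate, n22, n25) in runs.items():
    share25 = n25 / (n22 + n25)
    print(f"{name}: {rate} req/s, {share25:.0%} of the requests went to .25")
```

SQF sends roughly 69% of the requests to the faster box, which is close to its share of the combined solo throughput.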

BTW: If keep-alive is disabled, the req/s drop from 3213 to 2678 req/s with SQF and from 2122 to 1835 for RR.

The proxy-server (our lighty) is using its CPU very well and I already found ways to optimize the proxy-code to use less CPU. It won’t affect the performance of this benchmark a lot as the backends are at 100% already.

PRE-RELEASE: lighttpd-1.5.0-r1435.tar.gz

Posted by jan Tue, 14 Nov 2006 01:37:00 GMT

Yeah, really.

Before you jump around and empty a barrel of beer, try to compile it first. :)

Download: http://www.lighttpd.net/download/lighttpd-1.5.0-r1435.tar.gz

Finally I got some time to finish the loose ends of 1.5.0. MySQL Network MAS is going to be released (hopefully) next week, giving me time to work on lighty again.

What works and what doesn’t ?

  • mod_fastcgi, mod_cgi, mod_scgi, mod_proxy are removed
    • mod_proxy_core is the replacement for the above plugins
    • you have to spawn fastcgi processes with spawn-fcgi
  • mod_cml is removed and mod_magnet isn’t included yet

Linux AIO

I blogged about Linux AIO before, now you can try it out. Install libaio and build lighttpd with --with-linux-aio.

server.network-backend = "linux-aio-sendfile" 

mod-proxy-core

I checked that balancing and uploading work nicely with mod-proxy-core, using fastcgi and http as protocols.

PHP

Start PHP with spawn-fcgi as documented in the manual and add

$HTTP["url"] =~ "\.php$" {
  proxy-core.balancer = "round-robin" 
  proxy-core.protocol = "fastcgi" 
  proxy-core.backends = ( "127.0.0.1:1026" )
  proxy-core.max-pool-size = 16
}

to the config.

BTW: we use FCGI_KEEP_CONN to keep the connection between lighttpd and the FastCGI backend up as long as possible.

HTTP (mongrel)

We use keep-alive and HTTP/1.1 by default. Give it a try.

$SERVER["socket"] == ":1445" {
  proxy-core.protocol = "http" 
#  proxy-core.balancer = "round-robin" 
  proxy-core.balancer = "sqf" 
  proxy-core.backends = (
    "10.0.0.10:80", 
    "10.0.0.11:80" )
}

sqf is Shortest Queue First and is the preferred balancer if you have backends with different CPUs. See the next blog-post.

mod-upload-progress

Works.

lighty 1.5.0 and linux-aio

Posted by jan Sun, 12 Nov 2006 00:56:00 GMT

1.5.0 will be a big win for all users. It will be more flexible in its handling and will bring a huge improvement for static files thanks to async io.

The following benchmarks show an increase of 80% for the new linux-aio-sendfile backend compared to the classic linux-sendfile one.

The test-env is

  • client: Mac Mini 1.2Ghz, MacOS X 10.4.8, 1Gb RAM, 100Mbit
  • server: AMD64 3000+, 1Gb RAM, Linux 2.6.16.21-xen, 160Gb RAID1 (soft-raid)

The server is running lighttpd 1.4.13 and lighttpd 1.5.0-svn with a clean config [no modules loaded], the client will use http_load.

The client will run:
$ ./http_load -verbose -parallel 100 -fetches 10000 urls

I used this little script to generate 1000 folders, with 100 files each of 100kbyte.

for i in `seq 1 1000`; do 
  mkdir -p files-$i; 
  for j in `seq 1 100`; do 
    dd if=/dev/zero of=files-$i/$j bs=100k count=1 2> /dev/null; 
  done; 
done

That’s 10Gbyte of data, 10 times larger than the RAM size of the server, as we want to become seek-bound on our disks.

The Limits

2 Seagate Barracuda 160Gb disks (ST3160827AS) are building a RAID1 via the linux-md driver. The 7200 RPMs will give us 480 seeks/s max (7200 RPM = 120 r/s, .5 rotations avg. per seek, 2 disks).

Each disk can deliver 30Mbyte/s of sequential reads, combined 60Mbyte/s.

The Network is 100Mbit/s, we expect it to limit at 10Mbyte/s.
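The limits above are simple arithmetic; the following Python lines just spell it out. The input values are the ones quoted in this post, the variable names are mine. The raw 100Mbit/s figure is before any protocol overhead, which is why a bit less is expected in practice.

```python
rpm = 7200
disks = 2
rotations_per_seek = 0.5       # on average half a rotation per seek

rev_per_s = rpm / 60                                # rotations per second
seeks_per_disk = rev_per_s / rotations_per_seek     # seeks/s for one disk
seeks_total = seeks_per_disk * disks                # both RAID1 halves can seek

seq_read_mbyte = 30 * disks                         # sequential read, both disks
net_raw_mbyte = 100 / 8                             # 100Mbit/s in MByte/s

print(f"{rev_per_s:.0f} r/s, {seeks_total:.0f} seeks/s max")
print(f"{seq_read_mbyte} MByte/s sequential, {net_raw_mbyte} MByte/s raw network")
```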

lighttpd 1.4.13, sendfile

A first test run against lighttpd 1.4.13 with linux-sendfile gives us:

$ iostat 5
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.99    0.00    4.77   86.68    0.20    7.36

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              35.19      3503.78       438.97      17624       2208
sdb              33.40      4052.49       438.97      20384       2208
md0             119.48      7518.09       429.42      37816       2160

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.60    0.00    4.61   78.36    0.00   16.43

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              31.46      3408.42       365.53      17008       1824
sdb              30.06      3313.83       365.53      16536       1824
md0             104.21      6760.72       357.52      33736       1784
The http_load returned:
./http_load -verbose -parallel 100 -fetches 10000 urls
--- 60.006 secs, 1744 fetches started, 1644 completed, 100 current
--- 120 secs, 3722 fetches started, 3622 completed, 100 current
--- 180 secs, 5966 fetches started, 5866 completed, 100 current
--- 240 secs, 8687 fetches started, 8587 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 274.323 seconds
102400 mean bytes/connection
36.4534 fetches/sec, 3.73283e+06 bytes/sec
msecs/connect: 51.7815 mean, 147.412 max, 0.181 min
msecs/first-response: 360.689 mean, 6178.2 max, 1.08 min
HTTP response codes:
  code 200 -- 10000

lighttpd 1.5.0, sendfile

The same test with lighttpd 1.5.0 using the same network backend: linux-sendfile.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.40    0.00    3.60   85.60    0.00   10.40

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              33.80      4606.40       564.80      23032       2824
sdb              37.00      4723.20       564.80      23616       2824
md0             136.00      9368.00       553.60      46840       2768

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    4.80   81.80    0.00   12.60

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              33.40      4198.40       504.00      20992       2520
sdb              30.60      4564.80       504.00      22824       2520
md0             123.60      8763.20       496.00      43816       2480

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    5.19   81.24    0.00   12.77

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              36.53      4490.22       493.41      22496       2472
sdb              32.34      4784.03       493.41      23968       2472
md0             126.75      9274.25       483.83      46464       2424
The client said:
--- 60 secs, 2444 fetches started, 2344 completed, 100 current
--- 120.003 secs, 4957 fetches started, 4857 completed, 100 current
--- 180 secs, 7359 fetches started, 7259 completed, 100 current
--- 240 secs, 9726 fetches started, 9626 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 246.803 seconds
102400 mean bytes/connection
40.5181 fetches/sec, 4.14906e+06 bytes/sec
msecs/connect: 55.5808 mean, 186.153 max, 0.24 min
msecs/first-response: 398.639 mean, 6101.44 max, 9.313 min
HTTP response codes:
  code 200 -- 10000

This is minimally better, but still has the same problem. We are maxed out by the disks and not by the network.

lighttpd 1.5.0, linux-aio-sendfile

We only switch the network-backend to the async io one:

server.network-backend = "linux-aio-sendfile" 

... and run our benchmark again:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.38    0.00   10.18   38.52    0.00   42.91

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              42.91      7190.42       526.95      36024       2640
sdb              36.93      6144.51       526.95      30784       2640
md0             205.99     13213.57       517.37      66200       2592

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.80    0.00    9.84   48.39    0.20   40.76

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              50.40      8369.48       573.49      41680       2856
sdb              44.18      7318.88       573.49      36448       2856
md0             241.77     15890.76       563.86      79136       2808

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.60    0.00    8.38   44.91    0.00   46.11

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              50.10      7580.04       720.16      37976       3608
sdb              47.50      7179.24       720.16      35968       3608
md0             242.12     14558.08       710.58      72936       3560

The client said:

--- 60.0001 secs, 3792 fetches started, 3692 completed, 100 current
--- 120 secs, 8778 fetches started, 8678 completed, 100 current
10000 fetches, 100 max parallel, 1.024e+09 bytes, in 137.551 seconds
102400 mean bytes/connection
72.7004 fetches/sec, 7.44452e+06 bytes/sec
msecs/connect: 66.9088 mean, 197.157 max, 0.223 min
msecs/first-response: 226.181 mean, 6066.96 max, 2.098 min
HTTP response codes:
  code 200 -- 10000

Summary

Using Async IO allows lighttpd to overlap file-operations. We send an IO-request for the file and get notified when it is ready. Instead of waiting for the file (as in the normal sendfile()) and blocking the server, we can handle other requests.

On top of that, we give the kernel the freedom to reorder the file-requests as it wants to.

Taking these two improvements together, we can increase the throughput by 80%.

In addition, we don’t spend any time waiting in lighty itself. 64 kernel threads are handling the read()-calls for us in the background, which increases the idle-time from 12% to 40%, an improvement of 230%.
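For reference, the percentages can be recomputed from the http_load and iostat output shown above. This is a small Python sketch; the dictionary keys and the helper name are mine, the numbers are copied from the runs.

```python
# bytes/sec as reported by http_load for the three runs
bytes_per_s = {
    "1.4.13 linux-sendfile": 3.73283e6,
    "1.5.0 linux-sendfile": 4.14906e6,
    "1.5.0 linux-aio-sendfile": 7.44452e6,
}


def gain_percent(new, old):
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new / old - 1.0)


aio = bytes_per_s["1.5.0 linux-aio-sendfile"]
for name in ("1.4.13 linux-sendfile", "1.5.0 linux-sendfile"):
    print(f"aio vs. {name}: +{gain_percent(aio, bytes_per_s[name]):.0f}%")

# idle time went from roughly 12% to roughly 40%
print(f"idle: +{gain_percent(40, 12):.0f}%")
```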

Async IO on Linux

Posted by jan Thu, 09 Nov 2006 02:59:00 GMT

trunk/ just got support for Linux Native AIO.

I implemented Async IO based on libaio which is a minimal wrapper around the aio-syscalls for the 2.6.x kernels.

Implementation

It was a bit tricky to get it working as libaio is basically undocumented, but hey … that’s why we are hackers :)

The async file IO support is part of Linux 2.6.9 and later and should be on every recent linux box. A separate library called libaio provides very simple wrappers and is used as the base for the new network backend.

The idea is:

  1. create a buffer in /dev/shm and mmap() it
  2. start a async read() from the source file to the mmap() buffer
  3. wait until the data is ready
  4. use sendfile() to send the data from /dev/shm to the network socket

Important for the performance: the data is never copied into user space. We only move it from one side of the kernel to the other side.
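The four steps can be illustrated with a rough Python sketch. It is not the lighttpd code: Python's standard library has no libaio binding, so a plain thread stands in for the async read, and the function name, its parameters and the temp-file handling are assumptions made for this example. It needs Linux for os.sendfile() to a socket.

```python
import mmap
import os
import socket
import tempfile
import threading


def send_chunk_via_shm(src_path, offset, length, sock, shm_dir="/dev/shm"):
    """Send `length` bytes of src_path, starting at `offset`, to `sock`."""
    # 1. create a buffer file in shm_dir and mmap() it
    fd, tmp_path = tempfile.mkstemp(dir=shm_dir)
    os.unlink(tmp_path)                 # the open fd keeps the buffer alive
    try:
        os.ftruncate(fd, length)
        buf = mmap.mmap(fd, length)

        # 2. start the read in the background; the thread is only a
        #    stand-in for submitting a real async read request
        done = {}

        def reader():
            with open(src_path, "rb") as src:
                src.seek(offset)
                done["n"] = src.readinto(buf)

        worker = threading.Thread(target=reader)
        worker.start()

        # 3. wait until the data is ready (a real server would serve
        #    other connections here instead of blocking)
        worker.join()
        buf.close()

        # 4. sendfile() the data from the shm buffer to the socket
        sent = 0
        while sent < done["n"]:
            sent += os.sendfile(sock.fileno(), fd, sent, done["n"] - sent)
        return sent
    finally:
        os.close(fd)
```

Called with a connected socket and a chunk size like the 512k used in the benchmarks, each call moves one chunk. In a real server the wait in step 3 is where the other connections get served.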

Hack ahead

Sadly I had to add pthread to the dependencies. Having threads in a single-threaded server is a bit strange, but it is necessary.

fdevent_poll() was waiting for fd-events for 1s. While it was waiting the server was waiting. Handling the async-notifications is also blocking, and we can’t make them return as soon as one of them is done.

If necessary we start an io-getevent-thread which runs in parallel to the fdevent_poll() call. The call which returns first interrupts the other one by sending a SIGUSR1 to the process. This makes the waiting call (poll() or io_getevents()) return with an EINTR and we can continue handling the result of one of the two calls.

Benchmarks

The testbed is:

  • RAID1 (linux md) across two ST3160827AS disks (SATA, 120Mb each)
  • nVidia Corporation CK8S as SATA controller
  • AMD Athlon™ 64 Processor 3000+
  • Linux 2.6.16.21-0.25-xen (SuSE 10.1)

siege, 700Mb

I’ll compare linux-sendfile vs. linux-aio-sendfile.

$ siege --reps=1 -c 1 --benchmark http://127.0.0.1:1025/file-700M

conc  non-aio                  aio [512k]                 aio [1M]
1     52.38 MB/sec [9% idle]   89.85 MB/sec [70% idle]    107.50 MB/sec [67% idle]
2     39.94 MB/sec [8% idle]   94.52 MB/sec [70% idle]    92.74 MB/sec [70% idle]
5     35.45 MB/sec [7% idle]   31.81 MB/sec [86% idle]    72.84 MB/sec [70% idle]
10    ..                       25.22 MB/sec [82% idle]    32.87 MB/sec [90% idle]

More important than the throughput is the CPU time that can now be spent on other tasks.

What’s next ?

Next is bug fixing, load testing (more parallel connections), random load, ...

What is Jan doing all the time ?

Posted by jan Wed, 08 Nov 2006 01:49:00 GMT

You might wonder why it takes so long to release 1.5.0 when most of it is already in trunk.

At MySQL we are in the final strokes of getting a GA release of Monitoring and Advisoring Service of MySQL Enterprise out of the door.

I’m still monitoring the IRC channel on freenode, but all development time is going into my MySQL stuff right now.

RELEASE: lighttpd 1.4.13

Posted by jan Tue, 10 Oct 2006 10:18:00 GMT

Only 2 weeks after .12 hit the servers we have a new release cleaning up the issues that were introduced by it.

On the fix side we have:
  • fixed a seg-fault in the HTTP-Request splitting
  • fixed long-standing bug with Content-Length and HEAD requests
  • fixed a possible abort of an upload if xattr is enabled
New are
  • mod-magnet finally handles ‘require “lfs”’ without complaining
  • mod-magnet got light.stat() which uses the stat-cache
  • mod-webdav supports LOCK if compiled with --with-webdav-locks
Debian users have to compile their lua-support with:
$ configure --with-lua=lua5.1 ...
as their lua-5.1 package isn’t called ‘lua’.

Enjoy this release and watch out for 1.5.0 on the horizon. :)

Download

ChangeLog

  • added initgroups in spawn-fcgi (#871)
  • added apr1 support htpasswd in mod-auth (#870)
  • added lighty.stat() to mod_magnet
  • fixed segfault in splitted CRLF CRLF sequences (introduced in 1.4.12) (#876)
  • fixed compilation of LOCK support in mod-webdav
  • fixed fragments in request-URLs (#869)
  • fixed pkg-config check for lua5.1 on debian
  • fixed Content-Length = 0 on HEAD requests without a known Content-Length (#119)
  • fixed mkdir() forcing 0700 (#884)
  • fixed writev() on FreeBSD 4.x and older (#875)
  • removed warning about a 404-error-handler returned 404
  • backported and fixed the buildsystem changes for webdav locks
  • fixed plugin loading so we can finally load lua extensions in mod_magnet scripts
  • fixed large uploads if xattr is enabled

reducing Requests-Setup-Costs

Posted by jan Sun, 08 Oct 2006 12:18:00 GMT

Back in the times of the first implementations of mod-cml I took the request setup costs as the root of all evil. They were the problem I wanted to fix with mod-cml.

But what is the request setup cost ? What is influencing the request-time ? Where can you influence it ?

When you send a request to the web server, it has to:

  1. receive the request
  2. parse the request
  3. connect to a back-end
  4. send the request to the back-end
  5. wait for a response
  6. receive the response and send it to the client

Using ab to fire the same request to the back-ends we should hit the caches very well and get a good idea what is left when we have hot caches:

$ ab -n 100 -c 1 http://127.0.0.1:1025/123.php
...
Time per request:       7.515 [ms] (mean) [strace]
...
ab was run against a lighty 1.4.13-r1385 running in strace. We will run all tests with this setup.
$ strace -o costs.txt -tt -s 512 lighttpd -D -f ./lighttpd.conf
...:26.574274 ... (last syscall of the previous request)
...:26.575448 accept(...) = 8
...:26.576006 read(8, ...)
...:26.576702 connect(9, ... )
...:26.577239 writev(9, ... )
...:26.579688 read(9, ...)
...:26.580459 writev(8, ... )
...:26.581128 close(8)
  • 1.2ms for accept()ing the connection
  • 0.5ms for reading the request from the client.
  • 0.7ms for connecting to the backend over a unix-socket
  • 0.5ms to writev() the request data to the backend
  • 2.4ms to wait for the backend and reading the response
  • 0.8ms for writev()ing the response to the client
  • 0.5ms to close the connection again

(sum is 6.8ms)
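The per-call figures come from subtracting neighbouring strace timestamps. Here is a small Python sketch of that bookkeeping. It attributes the gap before each timestamp to that call, the same way the list above does. Only the seconds part of each timestamp is used (the rest is elided in the trace), and the shortened call labels are mine.

```python
# seconds.microseconds and a label, copied from the trace above
TRACE = """\
26.574274 (previous request)
26.575448 accept
26.576006 read(8)
26.576702 connect(9)
26.577239 writev(9)
26.579688 read(9)
26.580459 writev(8)
26.581128 close(8)
"""


def deltas_ms(trace):
    """Return (call, milliseconds since the previous timestamp) pairs."""
    rows = [line.split(maxsplit=1) for line in trace.splitlines() if line.strip()]
    stamps = [(float(ts), call) for ts, call in rows]
    return [
        (call, (ts - prev_ts) * 1000.0)
        for (prev_ts, _), (ts, call) in zip(stamps, stamps[1:])
    ]


steps = deltas_ms(TRACE)
for call, ms in steps:
    print(f"{ms:4.1f}ms  {call}")
print(f"{sum(ms for _, ms in steps):4.1f}ms  total")
```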

Without strace the response-time is:
Time per request:       2.946 [ms] (mean)
and goes down to 0.5ms for the request in lighty. The 2.4ms in the backend stay the same.

Keep-Alive

The next attempt is using keep-alive to get rid of the accept() and the close() at the end. In the strace-timing it cost 1.7ms to execute those two calls.

$ ab -n 100 -c 1 -k http://127.0.0.1:1025/foo.php
...
Time per request:       5.564 [ms] (mean) [strace]
...
strace tells us:
...:03.242201 ... (the last syscall of the previous request)
...:03.242903 read(8, ...)
...:03.243703 connect(9, ...)
...:03.244144 writev(9, ...)
...:03.246396 read(9, ...)
...:03.246969 writev(8, ...)
...:03.247261 ... (last syscall of this request)
  • 0.5ms for the read() of the request
  • 0.8ms for the connect()
  • 0.4ms for the writev() to the backend
  • 2.2ms waiting + read()ing the response
  • 0.7ms writev()ing the response to the client

(sum is 4.6ms)

Time per request:       2.394 [ms] (mean)

The 2.2ms seen here are spent in the backend and are not affected by strace. If you take them out of the calculation you get:

  100 * (1 - ((2.4ms - 2.2ms) / (2.9ms - 2.5ms))) = 50%
  100 * (1 - ((4.6ms - 2.2ms) / (6.8ms - 2.5ms))) = 44%

Keep-alive saves 50% of the lighty-internal costs. But that 50% saving is only 10% in reality, as most of the request-time is spent in the backend.

The Backend

The limiting factor for low request-times is a backend with as little overhead as possible. For the above timings I used:

which is executing the script:

<?php echo "123" ?>

What is PHP doing in those 2.2-2.5ms ? strace will help us again.

...:58.351886 ... (last syscall of the previous request)
...:58.352020 accept(0, ...) = 4
...:58.353110 read(4, ...)
...:58.353260 stat() 
...:58.354431 open(...)
...:58.355489 read(5, ...)
...:58.357595 write(4, ...)
...:58.359911 close(4)
  • 0.2ms for the accept()
  • 1.1ms to read the whole fastcgi-request
  • 0.1ms for the stat() to see if the file exists
  • 1.2ms to open the file
  • 1.0ms to read it
  • 2.1ms to execute the script and send the response
  • 2.4ms to close the connection
We have 2 options:
  • 2.6ms are spent to bring up and shut down the connection. FastCGI has the possibility to use keep-alive. This will be supported in lighty 1.5.x
  • 2.2ms are spent in getting the script-file into PHP.

PHP without a byte-code cache is reading twice

I wonder why there is an open()+read() on the file at all, as I used XCache 1.0.x here as code-cache and the file is in the cache and is getting cache-hits.

If I remove xcache, the same php-file is read twice by PHP:

open("..../foo.php", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
...
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
read(4, "<?php\n    print \"123\";\n?>\n", 4096) = 26
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "<?php\n    print \"123\";\n?>\n", 8192) = 26
read(4, "", 4096) = 0
read(4, "", 8192) = 0
close(4)          = 0

The php-file is open()ed, stat()ed twice [why ?] and read() once, all 26 bytes.

Afterwards we seek to the start (_llseek()), read() the 26 bytes again (the same size that the fstat() told us) and have to call read() two times to really understand that there is really no more data.

Update: After discussing this with the PHP core devs, it turns out to be a feature of the FastCGI/CGI SAPI. As old CLI scripts can be executed with the CGI SAPI, it has to support the “#!/usr/bin/php” line used by shell scripts. For webapps this line has to be skipped; otherwise it would be printed to the output.

Removing the code gives us:
Time per request:       2.150 [ms] (mean)

Getting Raw

Perhaps it is better to stay away from PHP and use a language which is not providing us with all the nice features we like about PHP:
  • the automatic parsing of GET parameters
  • the mapping from var[] into arrays
  • file-upload support in the back
  • output buffering, compression
  • internal variables like $PHP_SELF, ...

As I already wrote a byte-code cache for mod-magnet I stretched the idea a bit and took the byte-code cache, wrapped a FCGI_accept() from the fastcgi-lib around it.

The lua-fastcgi-magnet is really simple and just does:

  1. creates a global lua environment
  2. calls FCGI_accept()
  3. loads the script from the script-cache
  4. creates an empty script env for the script
  5. registers the print() function to use the fastcgi stdio wrapper
  6. executes the script
  7. goes to FCGI_accept() again
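To illustrate the loop, here is a toy version in Python instead of Lua and C. It is only a sketch of the idea: a list of script paths stands in for the FCGI_accept() loop, Python's compile() stands in for the Lua byte-code cache, and the cache invalidation by mtime is an assumption. All names are made up.

```python
import io
import os


class ScriptCache:
    """Keeps compiled code objects, recompiling only when the mtime changes."""

    def __init__(self):
        self._cache = {}
        self.compiles = 0

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns      # one stat() per request
        hit = self._cache.get(path)
        if hit and hit[0] == mtime:
            return hit[1]                      # cache hit: no open(), no read()
        with open(path, "r") as f:
            code = compile(f.read(), path, "exec")
        self.compiles += 1
        self._cache[path] = (mtime, code)
        return code


def serve(requests, cache):
    """requests is an iterable of script paths, one per incoming request."""
    responses = []
    for path in requests:                      # steps 2 and 7: next request
        code = cache.load(path)                # step 3: script cache
        out = io.StringIO()
        # steps 4 and 5: fresh env whose print() writes to the response
        env = {"print": lambda *args: out.write(" ".join(map(str, args)) + "\n")}
        exec(code, env)                        # step 6: run the script
        responses.append(out.getvalue())
    return responses
```

Feeding the same script path three times to serve() compiles it once and answers the other two requests from the cache.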

I want to execute the same script and see how the response time is now:

Time per request:       1.594 [ms] (mean)

We saved 0.8ms or 30% of the whole request time.

The strace for magnet is a lot simpler:
...:28.541361 ... (last call of previous request) 
...:28.541507 accept(0, ...) = 3
...:28.541916 read(3, ...)
...:28.542397 stat64(...)
...:28.542809 write(3, ...)
...:28.544807 close(3)
  • 0.2ms for the accept()
  • 0.4ms for the read()
  • 0.4ms for the stat()
  • 0.5ms to execute the script and write the response
  • 2.0ms to wait for the final packet and closing the connection

If we now had keep-alive in our FastCGI implementation …

We already saw that the script execution is 0.8ms faster in real-time. If we attach strace to lighty again and run ab, we get what we expect:

Time per request:       4.800 [ms] (mean) [strace]

The 5.6ms from PHP in strace with keep-alive is 0.8ms more.

a core-magnet

If you really want to spend even less time and don’t have to wait on any external sources, you can also try to use mod-magnet to execute the script.
Time per request:       0.584 [ms] (mean)

As I save the connect to the backend and the transfer of the data to it, I can respond blazingly fast.

Conclusion

Let’s ask the final question: why do I care about those 0.8ms at all ?
setup                    time/request   req/s
PHP without keep-alive   2.9ms          345
PHP                      2.4ms          416
PHP patched              2.1ms          465
lua-fastcgi              1.6ms          625
mod-magnet               0.6ms          1700
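The req/s column follows directly from the time per request: ab was run with one connection at a time (-c 1), so a request that takes t milliseconds allows about 1000 / t requests per second. A quick Python check with the unrounded means from the ab runs above (the helper name is mine) lands close to the rounded values in the table:

```python
# "Time per request" means from the ab runs above, in milliseconds
mean_ms = {
    "PHP without keep-alive": 2.946,
    "PHP": 2.394,
    "PHP patched": 2.150,
    "lua-fastcgi": 1.594,
    "mod-magnet": 0.584,
}


def requests_per_second(ms_per_request, concurrency=1):
    """Upper bound on the request rate for a given mean request time."""
    return concurrency * 1000.0 / ms_per_request


for name, ms in mean_ms.items():
    print(f"{name:24s} {ms:.3f}ms  {requests_per_second(ms):7.0f} req/s")
```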

As the print “123” is the smallest script which really generates output, it should show that this is the highest request count you can get. You can’t get more than those 416 req/s with PHP from my test machine:

  • AMD Duron 1.3GHz
  • 640MB RAM DDR
  • Disk and Network don’t matter as the benchmark is RAM and loopback based

If you need a higher throughput, use mod-magnet to offload some work from the backend. Perhaps you can cache some content and handle the response directly in mod-magnet and only have to escalate in a few percent of the cases.

Or use another scripting language which allows you to work more directly on the FastCGI level like:

PRE-RELEASE: lighttpd-1.4.13-r1385

Posted by jan Sat, 07 Oct 2006 18:05:00 GMT

It looks like mod_magnet is getting more and more attention.

Compared to the last pre-release we have some minor bug-fixes and the new lighty.stat() function for mod-magnet, which uses our internal stat-cache to reduce the number of stat() calls that hit the kernel. darix had some nice benchmarks on it.

Download: http://www.lighttpd.net/download/lighttpd-1.4.13-r1385.tar.gz

ChangeLog

  • added initgroups in spawn-fcgi (#871)
  • added apr1 support htpasswd in mod-auth (#870)
  • added lighty.stat() to mod_magnet
  • fixed segfault in splitted CRLF CRLF sequences (introduced in 1.4.12) (#876)
  • fixed compilation of LOCK support in mod-webdav
  • fixed fragments in request-URLs (#869)
  • fixed pkg-config check for lua5.1 on debian
  • fixed Content-Length = 0 on HEAD requests without a known Content-Length (#119)
  • fixed mkdir() forcing 0700 (#884)
  • fixed writev() on FreeBSD 4.x and older (#875)
  • removed warning about a 404-error-handler returned 404
