<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="/stylesheets/rss.css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>lighty's life: reducing Requests-Setup-Costs</title>
    <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description></description>
    <item>
      <title>reducing Requests-Setup-Costs</title>
      <description>&lt;p&gt;Back in the times of the first implementations of mod-cml I took the request setup costs as the root of all evil. They were the problem I wanted to fix with mod-cml.&lt;/p&gt;


	&lt;p&gt;But what is the request setup cost? What influences the request-time? Where can you influence it?&lt;/p&gt;
&lt;p&gt;When you send a request to the web server it has to:&lt;/p&gt;


	&lt;ol&gt;
	&lt;li&gt;receive the request&lt;/li&gt;
		&lt;li&gt;parse the request&lt;/li&gt;
		&lt;li&gt;connect to a back-end&lt;/li&gt;
		&lt;li&gt;send the request to the back-end&lt;/li&gt;
		&lt;li&gt;wait for a response &lt;/li&gt;
		&lt;li&gt;receive the response and send it to the client&lt;/li&gt;
	&lt;/ol&gt;


	&lt;p&gt;Using &lt;kbd&gt;ab&lt;/kbd&gt; to fire the same request at the back-end over and over, we should hit the caches very well and get a good idea of what is left when the caches are hot:&lt;/p&gt;


&lt;pre&gt;
$ ab -n 100 -c 1 http://127.0.0.1:1025/123.php
...
Time per request:       7.515 [ms] (mean) [strace]
...
&lt;/pre&gt;

&lt;kbd&gt;ab&lt;/kbd&gt; was run against a lighty 1.4.13-r1385 running under &lt;kbd&gt;strace&lt;/kbd&gt;. We will run all tests with this setup.

&lt;pre&gt;
$ strace -o costs.txt -tt -s 512 lighttpd -D -f ./lighttpd.conf
...:26.574274 ... (last syscall of the previous request)
...:26.575448 accept(...) = 8
...:26.576006 read(8, ...)
...:26.576702 connect(9, ... )
...:26.577239 writev(9, ... )
...:26.579688 read(9, ...)
...:26.580459 writev(8, ... )
...:26.581128 close(8)
&lt;/pre&gt;

	&lt;ul&gt;
	&lt;li&gt;1.2ms for &lt;kbd&gt;accept()&lt;/kbd&gt;ing the connection&lt;/li&gt;
		&lt;li&gt;0.5ms for &lt;kbd&gt;reading&lt;/kbd&gt; the request from the client.&lt;/li&gt;
		&lt;li&gt;0.7ms for &lt;kbd&gt;connecting&lt;/kbd&gt; to the backend over a unix-socket&lt;/li&gt;
		&lt;li&gt;0.5ms to &lt;kbd&gt;writev()&lt;/kbd&gt; the request data to the backend&lt;/li&gt;
		&lt;li&gt;2.4ms to wait for the backend and read the response&lt;/li&gt;
		&lt;li&gt;0.8ms for &lt;kbd&gt;writev()&lt;/kbd&gt;ing the response to the client&lt;/li&gt;
		&lt;li&gt;0.5ms to close the connection again&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;(sum is 6.8ms)&lt;/p&gt;
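
&lt;p&gt;The per-call figures above are the gaps between consecutive &lt;kbd&gt;strace -tt&lt;/kbd&gt; timestamps. A minimal Python sketch of that subtraction follows. The labels and the helper name &lt;kbd&gt;deltas_ms&lt;/kbd&gt; are made up for illustration, and only the seconds part of each timestamp is used, so it assumes the trace does not cross a minute boundary. The exact differences can deviate slightly from the rounded values in the list.&lt;/p&gt;

```python
from decimal import Decimal

# Seconds part of the strace -tt timestamps from the trace above,
# paired with a label for the syscall logged at that time.
TRACE = [
    ("26.574274", "previous request"),
    ("26.575448", "accept"),
    ("26.576006", "read client"),
    ("26.576702", "connect"),
    ("26.577239", "writev backend"),
    ("26.579688", "read backend"),
    ("26.580459", "writev client"),
    ("26.581128", "close"),
]


def deltas_ms(trace):
    """Return (label, milliseconds since the previous trace line) pairs."""
    result = []
    for (prev_ts, _), (ts, label) in zip(trace, trace[1:]):
        delta = (Decimal(ts) - Decimal(prev_ts)) * 1000
        result.append((label, delta))
    return result


for label, ms in deltas_ms(TRACE):
    print(f"{label:15s} {ms:.3f} ms")
print("total", sum(ms for _, ms in deltas_ms(TRACE)), "ms")
```

&lt;p&gt;Decimal is used instead of float so that the microsecond timestamps subtract exactly.&lt;/p&gt;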


Without strace the response-time is:
&lt;pre&gt;
Time per request:       2.946 [ms] (mean)
&lt;/pre&gt;
and goes down to 0.5ms for the request in lighty. The 2.4ms in the backend stay the same.

&lt;h3&gt;Keep-Alive&lt;/h3&gt;

	&lt;p&gt;The next attempt is using keep-alive to get rid of the &lt;kbd&gt;accept()&lt;/kbd&gt; and the &lt;kbd&gt;close()&lt;/kbd&gt; at the end. In the &lt;kbd&gt;strace&lt;/kbd&gt;-timing it cost 1.7ms to execute those two calls.&lt;/p&gt;


&lt;pre&gt;
$ ab -n 100 -c 1 -k http://127.0.0.1:1025/foo.php
...
Time per request:       5.564 [ms] (mean) [strace]
...
&lt;/pre&gt;

strace tells us:
&lt;pre&gt;
...:03.242201 ... (the last syscall of the previous request)
...:03.242903 read(8, ...)
...:03.243703 connect(9, ...)
...:03.244144 writev(9, ...)
...:03.246396 read(9, ...)
...:03.246969 writev(8, ...)
...:03.247261 ... (last syscall of this request)
&lt;/pre&gt;

	&lt;ul&gt;
	&lt;li&gt;0.5ms for the read() of the request&lt;/li&gt;
		&lt;li&gt;0.8ms for the connect()&lt;/li&gt;
		&lt;li&gt;0.4ms for the writev() to the backend&lt;/li&gt;
		&lt;li&gt;2.2ms waiting + read()ing the response&lt;/li&gt;
		&lt;li&gt;0.7ms writev()ing the response to the client&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;(sum is 4.6ms)&lt;/p&gt;


&lt;pre&gt;
Time per request:       2.394 [ms] (mean)
&lt;/pre&gt;

	&lt;p&gt;The 2.2ms seen here is the time spent in the backend and is not affected by &lt;kbd&gt;strace&lt;/kbd&gt;. If you take it out of the calculation you get:&lt;/p&gt;


&lt;pre&gt;
  100 * (1 - ((2.4ms - 2.2ms) / (2.9ms - 2.5ms))) = 50%
  100 * (1 - ((4.6ms - 2.2ms) / (6.8ms - 2.5ms))) = 44%
&lt;/pre&gt;

	&lt;p&gt;That is a 50% saving in lighty&amp;#8217;s internal costs. But in reality it is only about 0.5ms of the 2.9ms total (less than 20%), as most of the request-time is spent in the backend.&lt;/p&gt;
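
&lt;p&gt;The two percentages can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name &lt;kbd&gt;saving_percent&lt;/kbd&gt; is hypothetical, and the inputs are the rounded millisecond values used in the formulas above.&lt;/p&gt;

```python
def saving_percent(before_ms, after_ms, backend_before_ms, backend_after_ms):
    """Percent saved inside the web server once backend time is subtracted."""
    server_before = before_ms - backend_before_ms
    server_after = after_ms - backend_after_ms
    return 100 * (1 - server_after / server_before)


# Rounded values from the formulas above, all in milliseconds.
without_strace = saving_percent(2.9, 2.4, 2.5, 2.2)
with_strace = saving_percent(6.8, 4.6, 2.5, 2.2)

print(round(without_strace), "percent without strace")
print(round(with_strace), "percent with strace")
```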


&lt;h3&gt;The Backend&lt;/h3&gt;
The key to low request-times is a backend with as little overhead as possible. For the above timings I used:

	&lt;ul&gt;
	&lt;li&gt;&lt;a href="http://php.net/"&gt;&lt;span class="caps"&gt;PHP&lt;/span&gt;&lt;/a&gt; 5.1.4&lt;/li&gt;
		&lt;li&gt;&lt;a href="http://xcache.lighttpd.net/"&gt;XCache&lt;/a&gt; 1.0.x&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;which is executing the script:&lt;/p&gt;


&lt;pre&gt;
&amp;lt;?php echo "123" ?&amp;gt;
&lt;/pre&gt;

	&lt;p&gt;What is &lt;span class="caps"&gt;PHP&lt;/span&gt; doing in those 2.2-2.5ms? &lt;kbd&gt;strace&lt;/kbd&gt; will help us again.&lt;/p&gt;


&lt;pre&gt;
...:58.351886 ... (last syscall of the previous request)
...:58.352020 accept(0, ...) = 4
...:58.353110 read(4, ...)
...:58.353260 stat() 
...:58.354431 open(...)
...:58.355489 read(5, ...)
...:58.357595 write(4, ...)
...:58.359911 close(4)
&lt;/pre&gt;

	&lt;ul&gt;
	&lt;li&gt;0.2ms for the &lt;kbd&gt;accept()&lt;/kbd&gt;&lt;/li&gt;
		&lt;li&gt;1.1ms to read the whole fastcgi-request&lt;/li&gt;
		&lt;li&gt;0.1ms for the &lt;kbd&gt;stat()&lt;/kbd&gt; to see if the file exists&lt;/li&gt;
		&lt;li&gt;1.2ms to open the file&lt;/li&gt;
		&lt;li&gt;1.0ms to read it&lt;/li&gt;
		&lt;li&gt;2.1ms to execute the script and send the response&lt;/li&gt;
		&lt;li&gt;2.4ms to close the connection&lt;/li&gt;
	&lt;/ul&gt;


We have 2 options:
	&lt;ul&gt;
	&lt;li&gt;2.6ms are spent to bring up and shut down the connection. FastCGI has the possibility to use keep-alive. This will be supported in lighty 1.5.x&lt;/li&gt;
		&lt;li&gt;2.2ms are spent in getting the script-file into &lt;span class="caps"&gt;PHP&lt;/span&gt;.&lt;/li&gt;
	&lt;/ul&gt;


&lt;h4&gt;&lt;span class="caps"&gt;PHP&lt;/span&gt; without a byte-code cache is reading twice&lt;/h4&gt;

	&lt;p&gt;I wonder why there is an &lt;kbd&gt;open()&lt;/kbd&gt;+&lt;kbd&gt;read()&lt;/kbd&gt; on the file at all, as I used XCache 1.0.x here as code-cache and the file is in the cache and is getting cache-hits.&lt;/p&gt;


	&lt;p&gt;If I remove XCache, the same php-file is read twice by &lt;span class="caps"&gt;PHP&lt;/span&gt;:&lt;/p&gt;


&lt;pre&gt;
open("..../foo.php", O_RDONLY) = 4
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
...
fstat64(4, {st_mode=S_IFREG|0664, st_size=26, ...}) = 0
read(4, "&amp;lt;?php\n    print \"123\";\n?&amp;gt;\n", 4096) = 26
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, "&amp;lt;?php\n    print \"123\";\n?&amp;gt;\n", 8192) = 26
read(4, "", 4096) = 0
read(4, "", 8192) = 0
close(4)          = 0
&lt;/pre&gt;

	&lt;p&gt;The php-file is &lt;kbd&gt;open()&lt;/kbd&gt;ed, &lt;kbd&gt;stat()&lt;/kbd&gt;ed twice [why ?] and &lt;kbd&gt;read()&lt;/kbd&gt; once, all 26 bytes.&lt;/p&gt;


	&lt;p&gt;Afterwards we seek to the start (&lt;kbd&gt;_llseek()&lt;/kbd&gt;), &lt;kbd&gt;read()&lt;/kbd&gt; the 26 bytes again (the same size that the &lt;kbd&gt;fstat()&lt;/kbd&gt; told us) and have to call &lt;kbd&gt;read()&lt;/kbd&gt; two times to really understand that there is &lt;b&gt;really&lt;/b&gt; no more data.&lt;/p&gt;


	&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; After discussing this with the &lt;span class="caps"&gt;PHP&lt;/span&gt; core devs, it turns out to be a feature of the FastCGI/CGI &lt;span class="caps"&gt;SAPI&lt;/span&gt;. As old &lt;span class="caps"&gt;CLI&lt;/span&gt; scripts can be executed with the &lt;span class="caps"&gt;CGI SAPI&lt;/span&gt;, it has to support the &amp;#8220;&lt;kbd&gt;#!/usr/bin/php&lt;/kbd&gt;&amp;#8221; sequence for shell scripts. For webapps this line has to be skipped; otherwise it would be printed to the output.&lt;/p&gt;
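
&lt;p&gt;The behaviour described in the update (look at the first line, skip it if it is a &amp;#8220;&lt;kbd&gt;#!&lt;/kbd&gt;&amp;#8221; line, otherwise rewind and read the file again) can be sketched in Python. This is not the actual code of the &lt;span class="caps"&gt;CGI SAPI&lt;/span&gt;; the function name &lt;kbd&gt;skip_shebang&lt;/kbd&gt; and the script bodies are stand-ins for illustration only.&lt;/p&gt;

```python
import io


def skip_shebang(stream):
    """Position a seekable text stream just after a leading '#!' line, if any."""
    first_line = stream.readline()
    if not first_line.startswith("#!"):
        # Not a shell-style script: rewind so the parser sees the whole file.
        # This rewind is what shows up as a seek plus a second read.
        stream.seek(0)
    return stream


# Hypothetical script contents; the bodies are placeholders, not real PHP.
cli_script = io.StringIO("#!/usr/bin/php\nscript body\n")
web_script = io.StringIO("script body\n")

print(repr(skip_shebang(cli_script).read()))
print(repr(skip_shebang(web_script).read()))
```

&lt;p&gt;In both cases the parser ends up with the same body, but the file without a shebang line is read twice.&lt;/p&gt;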


Removing that shebang handling from &lt;span class="caps"&gt;PHP&lt;/span&gt; gives us:
&lt;pre&gt;
Time per request:       2.150 [ms] (mean)
&lt;/pre&gt;

&lt;h3&gt;Getting Raw&lt;/h3&gt;
Perhaps it is better to stay away from &lt;span class="caps"&gt;PHP&lt;/span&gt; and use a language which does not provide us with all the nice features we like about &lt;span class="caps"&gt;PHP&lt;/span&gt;:

	&lt;ul&gt;
	&lt;li&gt;the automatic parsing of &lt;span class="caps"&gt;GET&lt;/span&gt; parameters&lt;/li&gt;
		&lt;li&gt;the mapping from var[] into arrays&lt;/li&gt;
		&lt;li&gt;file-upload support in the background&lt;/li&gt;
		&lt;li&gt;output buffering, compression&lt;/li&gt;
		&lt;li&gt;internal variables like $PHP_SELF, ...&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;As I already wrote a byte-code cache for &lt;a href="http://blog.lighttpd.net/articles/2006/09/16/a-new-power-magnet"&gt;mod-magnet&lt;/a&gt;, I stretched the idea a bit: I took the byte-code cache and wrapped a &lt;kbd&gt;FCGI_accept()&lt;/kbd&gt; from the &lt;a href="http://www.fastcgi.com/"&gt;fastcgi-lib&lt;/a&gt; around it.&lt;/p&gt;


	&lt;p&gt;The &lt;a href="http://jan.kneschke.de/projects/lua/"&gt;lua-fastcgi-magnet&lt;/a&gt; is really simple and just does:&lt;/p&gt;


	&lt;ol&gt;
	&lt;li&gt;creates a global lua environment&lt;/li&gt;
		&lt;li&gt;calls &lt;kbd&gt;FCGI_accept()&lt;/kbd&gt;&lt;/li&gt;
		&lt;li&gt;loads the script from the script-cache&lt;/li&gt;
		&lt;li&gt;creates an empty env for the script&lt;/li&gt;
		&lt;li&gt;registers the print() function to use the fastcgi stdio wrapper&lt;/li&gt;
		&lt;li&gt;executes the script&lt;/li&gt;
		&lt;li&gt;goes to &lt;kbd&gt;FCGI_accept()&lt;/kbd&gt; again&lt;/li&gt;
	&lt;/ol&gt;
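
&lt;p&gt;The loop above can be sketched as a Python analogue. This is not the lua-fastcgi-magnet source: there is no real FastCGI here, &lt;kbd&gt;serve()&lt;/kbd&gt; merely stands in for the &lt;kbd&gt;FCGI_accept()&lt;/kbd&gt; loop, and all function names are hypothetical. It also assumes that the script-cache is validated by the file&amp;#8217;s modification time, which the post does not state.&lt;/p&gt;

```python
import io
import os
import tempfile
from contextlib import redirect_stdout

script_cache = {}  # path -> (mtime, compiled code)


def load_cached(path):
    """Return compiled code for path, recompiling only when the mtime changed."""
    mtime = os.stat(path).st_mtime_ns
    cached = script_cache.get(path)
    if cached is None or cached[0] != mtime:
        with open(path, encoding="utf-8") as handle:
            cached = (mtime, compile(handle.read(), path, "exec"))
        script_cache[path] = cached
    return cached[1]


def handle_request(path):
    """Run the cached script in a fresh environment and capture its output."""
    code = load_cached(path)
    output = io.StringIO()
    env = {"__name__": "__script__"}  # empty per-request environment
    with redirect_stdout(output):     # stands in for the FastCGI stdio wrapper
        exec(code, env)
    return output.getvalue()


def serve(requests):
    """Stand-in for the accept loop: one iteration per incoming request."""
    return [handle_request(path) for path in requests]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "123.py")
        with open(script, "w", encoding="utf-8") as handle:
            handle.write('print("123")\n')
        # The second request is served from the cache without reopening the file.
        print(serve([script, script]))
```

&lt;p&gt;On a cache hit the only file-system call per request is the &lt;kbd&gt;stat()&lt;/kbd&gt;; the file is neither opened nor read again.&lt;/p&gt;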


	&lt;p&gt;I want to execute the same script and see how the response time is now:&lt;/p&gt;


&lt;pre&gt;
Time per request:       1.594 [ms] (mean)
&lt;/pre&gt;

	&lt;p&gt;We saved 0.8ms or 30% of the whole request time.&lt;/p&gt;


The strace for magnet is a lot simpler:
&lt;pre&gt;
...:28.541361 ... (last call of previous request) 
...:28.541507 accept(0, ...) = 3
...:28.541916 read(3, ...)
...:28.542397 stat64(...)
...:28.542809 write(3, ...)
...:28.544807 close(3)
&lt;/pre&gt;

	&lt;ul&gt;
	&lt;li&gt;0.2ms for the &lt;kbd&gt;accept()&lt;/kbd&gt;&lt;/li&gt;
		&lt;li&gt;0.4ms for the &lt;kbd&gt;read()&lt;/kbd&gt;&lt;/li&gt;
		&lt;li&gt;0.4ms for the &lt;kbd&gt;stat()&lt;/kbd&gt;&lt;/li&gt;
		&lt;li&gt;0.5ms to execute the script and write the response&lt;/li&gt;
		&lt;li&gt;2.0ms to wait for the final packet and close the connection&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;If we now had keep-alive in our FastCGI implementation &amp;#8230;&lt;/p&gt;


	&lt;p&gt;We already saw that executing the script is 0.8ms faster in real time. If we attach strace to lighty again and run ab, we get what we expect:&lt;/p&gt;


&lt;pre&gt;
Time per request:       4.800 [ms] (mean) [strace]
&lt;/pre&gt;

	&lt;p&gt;The 5.6ms from &lt;span class="caps"&gt;PHP&lt;/span&gt; in strace with keep-alive is 0.8ms more.&lt;/p&gt;


&lt;h3&gt;a core-magnet&lt;/h3&gt;
If you really want to spend even less time and don&amp;#8217;t have to wait on any external sources you can also try to use mod-magnet to execute the script.

&lt;pre&gt;
Time per request:       0.584 [ms] (mean)
&lt;/pre&gt;

	&lt;p&gt;As I save the connect to the backend and the transfer of the data to it, I can respond blazingly fast.&lt;/p&gt;


&lt;h3&gt;Conclusion&lt;/h3&gt;
Let&amp;#8217;s ask the final question: why do I care about those 0.8ms at all?

	&lt;table&gt;
		&lt;tr&gt;
			&lt;th&gt;&lt;span class="caps"&gt;PHP&lt;/span&gt; without keep-alive &lt;/th&gt;
			&lt;td&gt; 2.9ms &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;345 req/s&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;th&gt;&lt;span class="caps"&gt;PHP&lt;/span&gt; &lt;/th&gt;
			&lt;td&gt; 2.4ms &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;416 req/s&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;th&gt;&lt;span class="caps"&gt;PHP&lt;/span&gt; patched &lt;/th&gt;
			&lt;td&gt; 2.1ms &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;465 req/s&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;th&gt;lua-fastcgi &lt;/th&gt;
			&lt;td&gt; 1.6ms &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;625 req/s&lt;/td&gt;
		&lt;/tr&gt;
		&lt;tr&gt;
			&lt;th&gt;mod-magnet &lt;/th&gt;
			&lt;td&gt; 0.6ms &lt;/td&gt;
			&lt;td style="text-align:right;"&gt;1700 req/s&lt;/td&gt;
		&lt;/tr&gt;
	&lt;/table&gt;
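
&lt;p&gt;The req/s column follows directly from the time per request, because &lt;kbd&gt;ab -c 1&lt;/kbd&gt; sends one request at a time. A small Python sketch of the conversion, using the unrounded means reported by &lt;kbd&gt;ab&lt;/kbd&gt; earlier in this post; the helper name &lt;kbd&gt;requests_per_second&lt;/kbd&gt; is made up. The results differ slightly from the table, which works from rounded values.&lt;/p&gt;

```python
def requests_per_second(ms_per_request):
    """Sequential throughput for a single client (ab -c 1)."""
    return 1000.0 / ms_per_request


# Mean times per request reported by ab in this post, in milliseconds.
MEASURED = {
    "PHP without keep-alive": 2.946,
    "PHP": 2.394,
    "PHP patched": 2.150,
    "lua-fastcgi": 1.594,
    "mod-magnet": 0.584,
}

for name, ms in MEASURED.items():
    print(f"{name:24s} {ms:.3f} ms  {requests_per_second(ms):.0f} req/s")
```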




	&lt;p&gt;As the print &amp;#8220;123&amp;#8221; is the smallest script which really generates output, it should show that this is the highest request count you can get. You can&amp;#8217;t get more than those 416 req/s with &lt;span class="caps"&gt;PHP&lt;/span&gt; on my test machine:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;&lt;span class="caps"&gt;AMD&lt;/span&gt; Duron 1.3GHz&lt;/li&gt;
		&lt;li&gt;640MB &lt;span class="caps"&gt;RAM DDR&lt;/span&gt;&lt;/li&gt;
		&lt;li&gt;Disk and Network don&amp;#8217;t matter as the benchmark is &lt;span class="caps"&gt;RAM&lt;/span&gt; and loopback based&lt;/li&gt;
	&lt;/ul&gt;


	&lt;p&gt;If you need a higher throughput, use mod-magnet to offload some work from the backend. Perhaps you can cache some content and handle the response directly in mod-magnet, and only have to escalate to the backend in a few percent of the cases.&lt;/p&gt;


	&lt;p&gt;Or use another scripting language which allows you to work more directly on the FastCGI level like:&lt;/p&gt;


	&lt;ul&gt;
	&lt;li&gt;ruby&lt;/li&gt;
		&lt;li&gt;python&lt;/li&gt;
		&lt;li&gt;perl&lt;/li&gt;
		&lt;li&gt;&lt;a href="http://jan.kneschke.de/projects/lua/"&gt;http://jan.kneschke.de/projects/lua/&lt;/a&gt;&lt;/li&gt;
	&lt;/ul&gt;</description>
      <pubDate>Sun, 08 Oct 2006 12:18:00 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:24012505-dd33-414e-aec8-90e31a33307b</guid>
      <author>jan</author>
      <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs</link>
      <category>lighttpd</category>
      <category>magnet</category>
      <trackback:ping>http://blog.lighttpd.net/articles/trackback/2083</trackback:ping>
    </item>
    <item>
      <title>"reducing Requests-Setup-Costs" by Jagsusweet@yahoo.com.au</title>
      <description>
To whom it may concern,

What are the set-up costs for network?

Regards
Suramya
e-mail: &lt;a href="mailto:Jagsusweet@yahoo.com.au" rel="nofollow"&gt;Jagsusweet@yahoo.com.au&lt;/a&gt;
</description>
      <pubDate>Wed, 25 Oct 2006 22:03:50 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:157c9259-920c-4286-8282-1880fa62c131</guid>
      <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs#comment-2180</link>
    </item>
    <item>
      <title>"reducing Requests-Setup-Costs" by neuschnee</title>
      <description>"As old CLI scripts can be executed with the CGI SAPI it has to support the “#!/usr/bin/php” sequence for shell scripts. For webapps this line has to be skipped." -- mind sharing a patch for removing this?</description>
      <pubDate>Mon, 16 Oct 2006 18:26:43 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:6321c901-9dfd-40bc-859a-2a63ae99f235</guid>
      <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs#comment-2126</link>
    </item>
    <item>
      <title>"reducing Requests-Setup-Costs" by Jan Kneschke</title>
      <description>That's why I always generated the same, constant load. Only one request at a time (to have no overlapped requests) and a minimal CPU load from ab. 

The testing had to be done locally to get the network-latency (which is in the range of the request-time) out of the measurement.

For throughput-oriented benchmarks httperf, flood, siege, http_load, ... are better tools.</description>
      <pubDate>Mon, 09 Oct 2006 07:50:26 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:74b7e3a9-bded-4408-a040-d1248bda8192</guid>
      <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs#comment-2094</link>
    </item>
    <item>
      <title>"reducing Requests-Setup-Costs" by Mark Nottingham</title>
      <description>Very nice. I'd suggest taking a look at httperf for testing instead of ab; it's easier to assure that you're testing the server, not the client or network.

Speaking of which, it's probably a good idea to generate your load on another machine. It doesn't matter so much for lower-performance scripts, but when you're doing multiple thousands of requests a second, using the same machine can introduce a lot of skew in your results.

Cheers,</description>
      <pubDate>Mon, 09 Oct 2006 04:46:18 +0000</pubDate>
      <guid isPermaLink="false">urn:uuid:c3e42bc0-a40d-4f38-801e-74f2878a3426</guid>
      <link>http://blog.lighttpd.net/articles/2006/10/08/reducing-requests-setup-costs#comment-2088</link>
    </item>
  </channel>
</rss>
