Hash Balancing with mod_proxy_core

Posted by jan Wed, 19 Jul 2006 11:38:52 GMT

mod_proxy and mod_proxy_core support 3 balancers to spread the load over multiple backends. One of them is hash balancing, which works very well for balancing the load of caching proxies like Squid.

If you compare the performance of hash balancing to classic round-robin (RR) balancing, you should see an increase in performance, as the backends can use their caches a lot better. With RR each backend has to handle the full URL namespace; with hash balancing each backend handles only a part of it. This increases cache locality and overall performance.
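A rough way to see the cache-locality effect is a small simulation. The sketch below is not lighttpd code: the modulo-over-MD5 hash, the per-backend LRU cache, and all sizes (200 URLs, 4 backends, 60 cache slots) are assumptions chosen only for illustration.

```python
import hashlib
import random
from collections import OrderedDict

def hash_pick(url, n_backends):
    """Map a URL to a backend index using a stable digest (assumed scheme)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_backends

def simulate(requests, n_backends, cache_size, balancer):
    """Return the overall cache hit rate for one balancing strategy."""
    caches = [OrderedDict() for _ in range(n_backends)]  # one LRU cache per backend
    hits = 0
    for i, url in enumerate(requests):
        if balancer == "round-robin":
            idx = i % n_backends              # next backend in turn, URL is ignored
        else:
            idx = hash_pick(url, n_backends)  # backend depends only on the URL
        cache = caches[idx]
        if url in cache:
            hits += 1
            cache.move_to_end(url)            # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)     # evict the least recently used entry
    return hits / len(requests)

rng = random.Random(42)                       # fixed seed for repeatable runs
urls = ["/wiki/Page_%d" % i for i in range(200)]
requests = [rng.choice(urls) for _ in range(20000)]

rr_rate = simulate(requests, n_backends=4, cache_size=60, balancer="round-robin")
hash_rate = simulate(requests, n_backends=4, cache_size=60, balancer="hash")
print("round-robin hit rate: %.2f" % rr_rate)
print("hash hit rate:        %.2f" % hash_rate)
```

With round-robin every backend sees all 200 URLs but can only hold 60 of them, so it keeps evicting entries it will need again. With hashing each backend sees roughly a quarter of the URLs, which mostly fit into its cache.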

I’ve used Wikipedia as a testbed for hash balancing:

$SERVER["socket"] == ":1445" {
  proxy-core.balancer = "hash" 
  proxy-core.protocol = "http" 
  proxy-core.backends = ( "wikipedia.org" )
  proxy-core.rewrite-response = (
    "Location" => ( "^http://en.wikipedia.org/(.*)" => "http://127.0.0.1:1445/$1" ),
  )
  proxy-core.rewrite-request = (
    "Host" => ( ".*" => "en.wikipedia.org" ),
  )
}

The domain wikipedia.org resolves to several IP-addresses:

(trace) resolving wikipedia.org on port 80
(trace) adding 207.142.131.204:80 to the address-pool
(trace) adding 207.142.131.205:80 to the address-pool
(trace) adding 207.142.131.206:80 to the address-pool
(trace) adding 207.142.131.210:80 to the address-pool
(trace) adding 207.142.131.213:80 to the address-pool
(trace) adding 207.142.131.214:80 to the address-pool
(trace) adding 207.142.131.235:80 to the address-pool
(trace) adding 207.142.131.236:80 to the address-pool
(trace) adding 207.142.131.245:80 to the address-pool
(trace) adding 207.142.131.246:80 to the address-pool
(trace) adding 207.142.131.247:80 to the address-pool
(trace) adding 207.142.131.248:80 to the address-pool
(trace) adding 207.142.131.202:80 to the address-pool
(trace) adding 207.142.131.203:80 to the address-pool

When I request http://127.0.0.1:1445/ the load balancer takes the URL, hashes it, and sends the request to one of the backends.

(trace) using hash-balancing: /wiki/Main_Page -> 207.142.131.204:80
(trace) using hash-balancing: /skins-1.5/monobook/main.css -> 207.142.131.204:80
(trace) using hash-balancing: /skins-1.5/common/commonPrint.css -> 207.142.131.213:80
(trace) using hash-balancing: /skins-1.5/common/wikibits.js -> 207.142.131.245:80
(trace) using hash-balancing: /w/index.php -> 207.142.131.206:80
(trace) using hash-balancing: /w/index.php -> 207.142.131.206:80
(trace) using hash-balancing: /w/index.php -> 207.142.131.206:80
(trace) using hash-balancing: /w/index.php -> 207.142.131.206:80
(trace) using hash-balancing: /skins-1.5/monobook/headbg.jpg -> 207.142.131.202:80
(trace) using hash-balancing: /skins-1.5/monobook/bullet.gif -> 207.142.131.245:80
(trace) using hash-balancing: /skins-1.5/common/images/poweredby_mediawiki_88x31.png -> 207.142.131.236:80
(trace) using hash-balancing: /images/wikimedia-button.png -> 207.142.131.203:80
(trace) using hash-balancing: /skins-1.5/monobook/bullet.gif -> 207.142.131.245:80
(trace) using hash-balancing: /skins-1.5/monobook/user.gif -> 207.142.131.202:80
(trace) using hash-balancing: /images/wiki-en.png -> 207.142.131.204:80

You can see that the same URL always ends up at the same backend address.
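The mapping is deterministic because the backend is derived only from the request itself. A minimal sketch, assuming the key is the hostname plus the path (as stated in the comments further down) and assuming a SHA-1 digest reduced modulo the pool size. The actual hash function in mod_proxy_core is not shown in this post, and `pick_backend` is a hypothetical name.

```python
import hashlib

# First three addresses from the trace above; any list of backends works.
BACKENDS = ["207.142.131.204:80", "207.142.131.205:80", "207.142.131.206:80"]

def pick_backend(host, path, backends=BACKENDS):
    """Return the backend for a request; same host + path gives the same backend."""
    key = (host + path).encode("utf-8")
    digest = hashlib.sha1(key).digest()       # stable across runs and processes
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

first = pick_backend("en.wikipedia.org", "/w/index.php")
second = pick_backend("en.wikipedia.org", "/w/index.php")
print(first == second)
```

Python's built-in `hash()` is avoided on purpose: it is salted per process for strings, so two proxy processes would disagree on where a URL belongs.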

If one of the backends goes down, all requests that were meant for that backend are spread over the other backends. If you had 10 backends and 1 goes down, each of the remaining 9 backends has to serve 1/9 of the dead backend's URLs.

  • you have 100 URLs, 10 backends
  • each backend handles (100/10) = 10 URLs
  • one backend goes down and its 10 URLs are spread over the other 9 backends
  • each remaining backend now handles its 10 URLs as before + (10/9) ≈ 1.1 URLs of the dead backend
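This property does not come for free: with a plain `hash(url) % n` scheme, removing a backend changes `n` and remaps most URLs. One way to get the behaviour in the list above is a highest-score selection in the spirit of the carp.txt document linked in this post, where every (URL, backend) pair gets a score and the live backend with the highest score wins. The sketch below is an assumption for illustration, not CARP's exact hash function and not necessarily what mod_proxy_core does. It uses 1000 URLs instead of 100 so the spread is easier to see.

```python
import hashlib
from collections import Counter

def score(url, backend):
    """Combined score for one (URL, backend) pair (assumed hash, not CARP's)."""
    digest = hashlib.md5((backend + "|" + url).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def pick(url, backends):
    """Choose the live backend with the highest score for this URL."""
    return max(backends, key=lambda b: score(url, b))

backends = ["backend-%d" % i for i in range(10)]   # hypothetical backend names
urls = ["/page/%d" % i for i in range(1000)]

before = {u: pick(u, backends) for u in urls}

dead = "backend-3"
alive = [b for b in backends if b != dead]
after = {u: pick(u, alive) for u in urls}

moved = [u for u in urls if before[u] != after[u]]
# Only URLs that belonged to the dead backend change their backend.
assert all(before[u] == dead for u in moved)

print("URLs on the dead backend:", len(moved))
print("where they went:", Counter(after[u] for u in moved))
```

A URL that was not on the dead backend keeps its backend, because its highest-scoring backend is still alive. A URL from the dead backend falls through to its second-highest score, which is spread roughly evenly over the remaining 9 backends.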

Hash balancing follows the ideas described in http://icp.ircache.net/carp.txt

Comments

  1. runa Wed, 19 Jul 2006 13:47:28 GMT
    As I understand it, this will make no difference if all the caching proxies have all the URLs cached, or am I missing something?
  2. Jan Kneschke Wed, 19 Jul 2006 13:57:47 GMT
    It is assumed that all proxies have the same data or at least can get all the cached data. They have to be able to handle a failure of all but one backend. The difference is the utilization of the FS-cache in the backends. With RR all backends get requests for all URLs; with Hash each backend only gets requests for a part of the URLs. This is basically like partitioning in databases.
  3. qhy Thu, 20 Jul 2006 04:36:12 GMT
    How about implementing client-IP hash balancing, which means choosing the backend host based on the client IP? Client-IP hash balancing is useful for applications that use local sessions to store user data.
  4. qhy Thu, 20 Jul 2006 04:51:10 GMT
    mod_proxy_core may upgrade a backend's HTTP 1.0 response to an HTTP 1.1 response because mod_proxy_core doesn't check the HTTP version of the backend response. From HTTP 1.1 RFC 2616 sec 3.1 we learn that "Due to interoperability problems with HTTP/1.0 proxies discovered since the publication of RFC 2068[33], caching proxies MUST, gateways MAY, and tunnels MUST NOT upgrade the request to the highest version they support. The proxy/gateway’s response to that request MUST be in the same major version as the request." I don't know whether mod_proxy_core counts as a tunnel or not.
  5. Jan Kneschke Thu, 20 Jul 2006 05:27:35 GMT
    The HTTP/1.1 upgrade was for testing only. We were handling the HTTP version correctly as we recoded the response for the backend, but you never know. Upgrade is gone.
  6. Jan Kneschke Thu, 20 Jul 2006 05:30:39 GMT
    Client-IP balancing wouldn't work for the same reason you don't use the client IP as part of the session ID: due to proxy farms the same client might have a different IP on each request. BUT ... if you specify the name of the session parameter (POST, GET or COOKIE) lighty could balance on it. This is not implemented yet. A shared session storage like memcached is preferred as it allows real load-balancing and failover.
  7. Malte Geierhos Thu, 20 Jul 2006 09:11:24 GMT
    If you're on the way to letting lighty access session variables in POST, GET, COOKIE values, would you please think of all the people out there waiting for access to sessions in CML? memcached integration could be a point (push the values on demand into memcached) and make them available via CML or as a lighty variable... so that we can access it and use it for session/server stickiness etc.
  8. Tobias Luetke Fri, 21 Jul 2006 23:17:13 GMT
    With the modern crop of web applications using the host name as account key ( turtles.myshopify.com, tobi.backpackit.com etc ) will the hash based load balancing take the host name into account or is only the actual path part of the decision making?
  9. Jan Kneschke Sat, 22 Jul 2006 04:22:47 GMT
    The hash is built over the path and the hostname.