A few days ago, I was looking for an HTTP traffic generator and found Curl-Loader. This tool lets you query your webservers for a given URL at maximum speed with a configurable number of concurrent clients, each using its own unique IP address generated on the fly by Curl-Loader.
In my case, Curl-Loader was started on 2 machines, generating a total load of 4000 simultaneous clients, each querying the same URL. The target was a load-balanced setup: 2 servers running HAProxy in Active/Standby mode using HeartBeat, in front of 4 backend webservers (3 active, 1 backup) running Lighttpd.
When Curl-Loader is started, it generates an IP address per client connection, so running 4000 clients results in 4000 IP addresses connecting to the Load-Balancer. During the test, the Load-Balancer and traffic generators were in the same network, so the Load-Balancer suddenly had to cope with 4000 new ARP entries.
What I noticed was that as the number of clients went up (Curl-Loader ramps them up gradually, in a configurable way), the number of connections and the traffic load on the Load-Balancer went downhill.
A quick look at the dmesg output on the Load-Balancer showed the following:
[ 513.254026] Neighbour table overflow.
It turns out this means the table holding the ARP cache is a bit… well… overcrowded: it has reached its hard maximum number of entries, so it’s time to start tweaking a few parameters…
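Before tweaking anything, it can help to see how full the table actually is. The following is a minimal sketch, not something from the original test: it assumes a Linux system that exposes /proc/net/arp and /proc/sys/net/ipv4/neigh/default/gc_thresh3, and the function names count_arp_entries and arp_usage are made up for illustration. The count is approximate, since it simply counts the lines the kernel lists.

```shell
#!/bin/sh
# Count IPv4 neighbour (ARP) entries in a /proc/net/arp style file.
# The first line of that file is a column header, so it is skipped.
count_arp_entries() {
    arp_file="${1:-/proc/net/arp}"
    awk 'NR > 1 { n++ } END { print n + 0 }' "$arp_file"
}

# Print the current entry count next to the hard limit (gc_thresh3).
# Both paths can be overridden; the defaults are the usual Linux locations.
arp_usage() {
    arp_file="${1:-/proc/net/arp}"
    thresh_file="${2:-/proc/sys/net/ipv4/neigh/default/gc_thresh3}"
    entries=$(count_arp_entries "$arp_file")
    limit=$(cat "$thresh_file")
    echo "ARP entries: $entries / hard limit: $limit"
}

# Only run against the live system when the files are actually there.
if [ -r /proc/net/arp ] && [ -r /proc/sys/net/ipv4/neigh/default/gc_thresh3 ]; then
    arp_usage
fi
```

Running this in a loop while the load test ramps up should show the entry count climbing toward the limit shortly before the overflow message appears.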
Four parameters control the ARP cache size and garbage collection (the last one isn’t needed here):
- gc_thresh1 is the minimum number of entries to keep in the ARP cache. If the actual number of entries is below this value, the garbage collector will not run (default value on Debian Lenny: 128)
- gc_thresh2 is the soft maximum number of entries. If the actual number of entries is above this value for more than 5 seconds, the garbage collector will run (default value on Debian Lenny: 512)
- gc_thresh3 is the hard maximum number of entries. If the actual number of entries is above this value, the garbage collector will run immediately. It is also the maximum number of ARP entries that can be kept in the table (default value on Debian Lenny: 1024)
- gc_interval is the interval, in seconds, at which the garbage collector runs and removes entries that are no longer in use (default value on Debian Lenny: 30)
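The current values of these parameters can be read with sysctl (for example, sysctl net.ipv4.neigh.default.gc_thresh3) or from the files under /proc/sys/net/ipv4/neigh/default/.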
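As a rough sketch of reading all four values in one go, assuming the standard /proc/sys/net/ipv4/neigh/default directory (the function name show_gc_params is hypothetical):

```shell
#!/bin/sh
# Print the four neighbour-table parameters described in the list above.
# The directory is an optional argument; the default is the usual Linux path.
show_gc_params() {
    neigh_dir="${1:-/proc/sys/net/ipv4/neigh/default}"
    for name in gc_thresh1 gc_thresh2 gc_thresh3 gc_interval; do
        if [ -r "$neigh_dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$neigh_dir/$name")"
        else
            # Keep going so a missing file does not hide the other values.
            printf '%s = (not readable)\n' "$name"
        fi
    done
}

show_gc_params
```

Reading these files does not require root; only changing them does.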
To change their value, use the sysctl command as root, or echo a new value into the matching file under /proc/sys/net/ipv4/neigh/default/:
sysctl -w net.ipv4.neigh.default.gc_thresh3=1024000
In the test setup, I raised all three gc_thresh values to 1024000. This allows about a million entries in the ARP cache, which in practice means the garbage collector never runs. Is this ideal? Maybe not, but during load testing it keeps ARP cache issues out of the way.
Some links:
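Since this change is only meant for the duration of a load test, it is handy to note the old values so they can be put back afterwards. Here is a minimal sketch of that idea, assuming the same /proc directory as before; the function name set_gc_thresholds is hypothetical, and writing to the real files requires root.

```shell
#!/bin/sh
# Set gc_thresh1, gc_thresh2 and gc_thresh3 to the same value.
# Each old value is printed before it is overwritten, so the output
# doubles as a record for restoring the defaults after the test.
#   $1 = new value, $2 = optional directory holding the parameter files
set_gc_thresholds() {
    value="$1"
    neigh_dir="${2:-/proc/sys/net/ipv4/neigh/default}"
    for name in gc_thresh1 gc_thresh2 gc_thresh3; do
        old=$(cat "$neigh_dir/$name") || return 1
        echo "$name: $old -> $value"
        echo "$value" > "$neigh_dir/$name" || return 1
    done
}

# Example (as root): set_gc_thresholds 1024000
```

Values written this way do not survive a reboot, which is probably what you want for a temporary test setting.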
- Curl-Loader: http://curl-loader.sourceforge.net/
- A blog posting on the same issue, in a slightly different context: http://blog.lachmann.org/2010/01/neighbour-table-overflow/
- HAProxy: http://haproxy.1wt.eu/
- Lighttpd: http://www.lighttpd.net/