Giving a definitive performance statement for our ZXTM software is always a challenge because, unlike our hardware appliance contemporaries, our customers have an almost infinite choice of platforms to deploy our software on.
Periodically, we take a new-generation server and run it through our internal benchmarking platform so that we can produce some performance guidelines; this is the same platform that we use for our internal software tuning and regression testing at each release. With all of the buzz around Intel’s Nehalem chipsets, there was a lot of anticipation in the engineering team about the performance that ZXTM might achieve.
It didn’t disappoint: The x4170 Sun server is equipped with two Nehalem-class processors (sounding like a cold war nuclear submarine):
This Nehalem server was the first one to saturate the 10Gbits of testing infrastructure that we had available at the time.
See the full performance chart here; there’s about 12 months of processor development between each column in the performance chart, so performance of a ZXTM-based Traffic Manager is growing by 50-100% year-on-year on most counts, simply due to advances in processors and system architecture.
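To put the 50-100% range in perspective, here is a minimal sketch of how such year-on-year growth compounds. The baseline of 1.0 is a normalised placeholder rather than a measured result, and the function name is purely illustrative.

```python
def project_growth(baseline, yearly_growth, years):
    """Return the value reached after compounding yearly_growth for `years` years."""
    return baseline * (1.0 + yearly_growth) ** years


# Baseline is normalised to 1.0 (an assumption for illustration, not a benchmark figure).
for years in (1, 2, 3):
    low = project_growth(1.0, 0.50, years)   # 50% year-on-year
    high = project_growth(1.0, 1.00, years)  # 100% year-on-year
    print(f"after {years} year(s): {low:.2f}x to {high:.2f}x the baseline")
```

The projection only holds for as long as processor generations keep delivering at the same rate, which the chart cannot promise.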
The Nehalem server’s list price is just shy of $10,000 (source sun.com, July ’09). The previous generation lists at $7,000, and the Opteron server is EoL (an equivalent 4-core system from Dell costs in the region of $1000-$2000). You can buy the server to meet your traffic needs now, and cost-effectively upgrade the hardware just when you need to.
CPU Utilization
During the tests, we also measure CPU utilization, seeking to drive the idle time to single-figure percentages to verify that we have extracted all the performance we can from the system (notwithstanding ZXTM’s supra-linear performance growth with CPU utilization). During the higher-bandwidth tests, we saw considerable idle time, topping out at over 70% idle when the system was serving 10Gbits from its webcache. This means there’s a lot of CPU resource remaining for other tasks, such as running TrafficScript rules.
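If you want to check idle time during your own tests, one rough approach is sketched below. It assumes a Linux host and compares two samples of the aggregate `cpu` line from `/proc/stat`; it is not how our benchmarking platform takes its measurements, and the sample counter values are invented for illustration.

```python
def idle_percent(before, after):
    """Idle share (in %) between two aggregate 'cpu' lines from /proc/stat."""
    def parse(line):
        # Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
        # Only the first eight are summed; guest time is already counted in user/nice.
        fields = [int(x) for x in line.split()[1:9]]
        idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
        return idle, sum(fields)

    idle_a, total_a = parse(before)
    idle_b, total_b = parse(after)
    elapsed = total_b - total_a
    if elapsed <= 0:
        raise ValueError("samples must be taken in order and at different times")
    return 100.0 * (idle_b - idle_a) / elapsed


# Invented sample values; on a real host, read the first line of /proc/stat twice.
sample_before = "cpu  1000 0 500 8000 0 0 0 0"
sample_after = "cpu  1200 0 600 8700 0 0 0 0"
print(f"idle: {idle_percent(sample_before, sample_after):.1f}%")
```

Counting iowait as idle is a choice made here for simplicity; you may prefer to report it separately.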
Can I obtain these figures in real life?
A Bugatti Veyron will reach 253 mph at the end of a 9km straight on Volkswagen’s test track, but will struggle to maintain 60 mph on a twisty back road. Likewise, the performance of any network appliance will differ significantly between a carefully controlled benchmark environment and real life.
Zeus’ benchmarks are fairly straightforward and representative. We don’t rely on specialist layer-3 packet processing hardware or dumbed-down layer-4 profiles to get great performance figures on a configuration that is nothing like what you would deploy in production (because we haven't got any trick hardware or dumbed-down profiles). As a result, ZXTM users get fewer surprises than they sometimes do with hardware appliances.
The most significant variables that will cause a difference between the benchmarks above and what you will achieve on your own service are:
- the complexity of the TrafficScript and Java rules you use;
- the differences in behaviour between slow, jittery, unreliable real-world clients and networks and the fast local network and well-behaved clients that are used in a benchmark.
As ever, the best advice anyone can give you is to acquire and test your shortlisted load balancers with your required configuration, and ultimately against your live traffic. And put ZXTM on your shortlist – it only takes a couple of minutes to download and much less than an hour to install and get started, and the feature set and support are second to none.
Very interesting blog; it provides insight into the performance of commodity HW used as an ADC. One question, though: the Layer 7 connections-per-second figure is much higher than the Layer 4 connections-per-second figure. Can you please explain the reason for this? I have seen devices doing only Layer 4 deliver much better CPS numbers, because CPU cycles are not spent proxying the connections, as they are at Layer 7.
Thanks,
Tech Savy user
Posted by: Tech Savy | August 07, 2009 at 10:55 PM
Hi Tech Savy,
You're correct to point out that Layer 7 load balancing takes more CPU cycles than Layer 4.
However, when ZXTM processes HTTP traffic at layer 7, it can deploy a number of optimizations that improve performance. The most significant is HTTP keepalives. Without keepalives, ZXTM would have to open a new connection to the server each time a client connected. With HTTP keepalives, ZXTM can reuse TCP connections to the server. This reduces server bandwidth and greatly reduces CPU cycles on ZXTM (and on the server).
The benefits of optimizations like HTTP keepalives greatly outweigh the cost of the CPU cycles needed to manage the traffic at layer 7.
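The effect of connection reuse can be illustrated with a toy model. This is a simplified sketch, not ZXTM's implementation: it uses no real sockets, handles requests one at a time, and the class and function names are hypothetical. It only counts how many server-side connections a proxy would need to open.

```python
class BackendPool:
    """Toy model of a proxy's server-side connections (no real sockets)."""

    def __init__(self, keepalive):
        self.keepalive = keepalive
        self.idle = []    # connections parked for reuse
        self.opened = 0   # connections the proxy had to open to the server

    def acquire(self):
        # Reuse a parked connection when keepalives are enabled and one is free.
        if self.keepalive and self.idle:
            return self.idle.pop()
        self.opened += 1
        return object()   # stand-in for a freshly opened TCP connection

    def release(self, conn):
        # With keepalives the connection is parked; without, it is simply dropped.
        if self.keepalive:
            self.idle.append(conn)


def serve(requests, keepalive):
    """Handle `requests` sequential requests; return how many connections were opened."""
    pool = BackendPool(keepalive)
    for _ in range(requests):
        conn = pool.acquire()
        pool.release(conn)
    return pool.opened


print("without keepalives:", serve(1000, keepalive=False), "server connections")
print("with keepalives:   ", serve(1000, keepalive=True), "server connections")
```

Real traffic is concurrent, so a real pool would hold more than one idle connection, but the saving in connection setup follows the same pattern.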
regards
Owen Garrett
Posted by: Zeus Technology | August 10, 2009 at 09:36 AM