Published on WebPerformance @ Peragro.info (http://webperformance.peragro.info)

Fundamentals - basic networking skills are still important

By Donald Foss
Created 2006-11-03 16:24

On Monday night this week, I ran a fairly large load test for a customer. They eventually found that anything that can break, will. The test used a standard stepped ramp configuration: it begins at 25% load, holds that level for 30 minutes, then steps up to the next level. Within 5 minutes of the start of the test, response time began climbing dramatically. TCP connect time was normal, but time to first byte was approaching 5 seconds. This is a site that normally has a page response time of less than 2 seconds across all pages, which is quite an accomplishment. It is an Alexa Top 500 site. These guys know what they are doing.
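For readers unfamiliar with a stepped ramp, here is a minimal sketch of how such a schedule could be laid out. The 25% starting load and 30 minute hold come from the test described above; the 25% increment, the 100% ceiling, and the names `Step` and `stepped_ramp` are assumptions made for illustration, not the configuration of the actual test tool.

```python
from dataclasses import dataclass


@dataclass
class Step:
    start_minute: int   # minutes after the test begins
    load_percent: int   # share of the target load applied during this step


def stepped_ramp(start_percent=25, increment=25, hold_minutes=30, max_percent=100):
    """Build a stepped ramp: hold each load level, then jump to the next one.

    The increment and ceiling are assumed values; only the starting load
    and the hold time are taken from the article.
    """
    steps = []
    minute = 0
    load = start_percent
    while load <= max_percent:
        steps.append(Step(minute, load))
        minute += hold_minutes
        load += increment
    return steps


if __name__ == "__main__":
    for step in stepped_ramp():
        print(f"t+{step.start_minute:3d} min: {step.load_percent}% load")
```

With these assumed defaults the schedule has four steps, so a problem that shows up within 5 minutes appears while the test is still on its first and lightest step.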

[Figure: 3-OC3 n-tier diagram]

Unfortunately, they were not able to find the problem. I restarted the test at a lower load level, and they found that the traffic stopped when the internal interface of their F5 load balancer reached 100 megabit. This is a very interesting number! It could also be a red herring. The external interface of the F5 reported 140 megabit, which is also interesting. It shows that client traffic was getting to the F5 without a problem, but something was wrong behind it. The F5 smelled bad (red herring), so everyone focused on it. End-to-end load testing is the best way to make sure that everything works; however, it does have its flaws. The biggest is that it finds bottlenecks from the outermost layers or tiers first, and each of those bottlenecks must be resolved before you can find the next one. With external testing alone, point testing is difficult unless you can inject load at the right layer.
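The reason 100 megabit stands out is that it matches a standard Ethernet link rate, while 140 megabit does not. A rough sketch of that check follows. The function name `matching_link_rate`, the list of nominal rates, and the 5% tolerance are illustrative assumptions, not part of any monitoring tool.

```python
# Common nominal Ethernet link rates in megabits per second (assumed list).
NOMINAL_RATES_MBPS = (10, 100, 1000, 10000)


def matching_link_rate(observed_mbps, tolerance=0.05):
    """Return the nominal link rate that the observed throughput sits on.

    A throughput ceiling within `tolerance` of a standard rate hints that
    some link in the path is running at that speed. Returns None when the
    reading is not close to any of the listed rates.
    """
    for rate in NOMINAL_RATES_MBPS:
        if abs(observed_mbps - rate) <= rate * tolerance:
            return rate
    return None


if __name__ == "__main__":
    for label, mbps in (("F5 internal", 100), ("F5 external", 140)):
        print(label, mbps, "Mbit/s ->", matching_link_rate(mbps))
```

A match is only a hint. As noted above, it can just as easily be a red herring, so it tells you where to start point testing rather than what is broken.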

I advised them to do simple bandwidth testing through the various layers of their infrastructure: put some large files on the web servers, and use something simple to pull them. They performed the point testing that I suggested and found that the problem was an interface on their new firewall. They had installed a high-end ASA firewall with gigabit interfaces, and it was working fine in production. They found that the interface on the firewall was set to "auto-negotiate". Bad, bad, bad. The term auto-negotiate should be expunged from every network engineer's mind when it comes to servers. Even though it was a gigabit interface, which in practice only runs at full duplex, the switch negotiated down to 100 megabit full duplex. I am surprised that it selected full duplex. This would have been easier to find if it had been half duplex, because the engineer would have observed errors on the interface. The engineer should have noticed dropped packets on the F5 internal interface, but since there were no "errors", no one noticed.
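The "something simple" can be almost anything that pulls a file and times it. Below is a minimal sketch using only the Python standard library. The function names are hypothetical, and this is not the tool the customer used. The timer includes connection setup, so the file should be large enough that transfer time dominates.

```python
import time
import urllib.request


def throughput_mbps(num_bytes, seconds):
    """Convert a byte count and a duration into megabits per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def measure_download_mbps(url, chunk_size=1 << 20, timeout=30):
    """Pull `url`, discard the body, and return the average throughput.

    Reads in chunks so that a large test file never has to fit in memory.
    """
    total = 0
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return throughput_mbps(total, elapsed)
```

Running `measure_download_mbps` against the same large file from a client placed at each layer in turn (in front of the firewall, behind it, behind the load balancer) shows where the rate drops. A result pinned just under 100 Mbit/s on a path that should be gigabit points at a link that negotiated down.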

The moral of this story is simple. It does not matter how big your pipe is, how many web servers you have, or how many application servers you have... if users cannot get to your site because of the network, then nothing else matters. Your doors are locked and barred.

Technorati Tags: bandwidth, web site performance, performance, load testing

Source URL:
http://webperformance.peragro.info/fundamentals_basic_networking_skills_are_still_important