By Josh Anderson
As high-bandwidth carrier Ethernet services become more prevalent, skeptical clients are increasingly challenging providers and agents to prove they got the big pipe they paid for. You might think the solution is straightforward: do an online speed test or, in a private environment, conduct a large file transfer test, and all is well. Not so fast.
We had a client who thought it was that easy and ended up nearly ripping out a multisite Ethernet network, unconvinced they got what they purchased. That experience taught us that it’s vital for carriers and consultants to figure out how to explain the intricacies of high-bandwidth networks in a way that is understandable to non-technical decision makers. It’s a lot harder than it sounds, because in reality it’s very technical.
First things first: let’s dispense with the overhead myth.
Most folks brush off client questions of capacity by explaining that there’s “overhead” that accounts for the difference between what the client sees and what they purchased. While there is indeed overhead associated with both TCP/IP and Ethernet, from a pure payload perspective that’s rarely the root cause of perceived performance issues on very large circuits.
Overhead for a given frame has many elements: things like IP headers, error correction bits and time stamps. But even if you max out all of these overhead bits, the perfect network should still have almost 93 percent of the bandwidth available for application data.
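For readers who want to see where a figure in that range comes from, here is a rough sketch of the per-frame arithmetic. The byte counts are the usual textbook values for standard Ethernet carrying TCP over IPv4; they are assumptions for illustration, not numbers taken from this article, and the function name is made up. Both results land a little above the 93 percent floor quoted above, which allows for more overhead than this sketch counts.

```python
# Per-frame byte counts for standard Ethernet carrying TCP over IPv4.
# Textbook values, assumed here for illustration only.
PREAMBLE_AND_SFD = 8   # preamble plus start-of-frame delimiter
ETHERNET_HEADER = 14   # destination MAC, source MAC, EtherType
FRAME_CHECK = 4        # CRC at the end of the frame
INTERFRAME_GAP = 12    # mandatory idle time between frames, counted in bytes
MTU = 1500             # largest IP packet carried in a standard frame
IPV4_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20        # TCP header without options


def payload_efficiency(tcp_option_bytes: int = 0) -> float:
    """Return the fraction of on-the-wire bytes that carry application data."""
    payload = MTU - IPV4_HEADER - TCP_HEADER - tcp_option_bytes
    on_wire = (MTU + PREAMBLE_AND_SFD + ETHERNET_HEADER
               + FRAME_CHECK + INTERFRAME_GAP)
    return payload / on_wire


# The TCP timestamp option takes 12 bytes once padded.
print(f"no TCP options:      {payload_efficiency():.1%}")
print(f"with TCP timestamps: {payload_efficiency(12):.1%}")
```

Smaller packets fare worse, because the fixed per-frame cost is spread over less data, so treat this as the best case for full-size frames.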
What are client expectations?
Let’s say you have a client who just bought a 100 megabits per second Ethernet circuit. That translates to a throughput of around 11.9MBps (counting a megabyte as 1,048,576 bytes; Google the math). In a perfect world, 93 percent of the bandwidth should be available, making the theoretical max usable throughput around 11MBps (I know it’s not perfect, but that’s a discussion for another article).
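If you would rather not Google the math, here is a minimal sketch of the conversion. It assumes a megabit is 1,000,000 bits and a megabyte is 1,048,576 bytes, which is the convention that produces the 11.9 figure; the function name is made up for illustration.

```python
def mbps_to_mbytes_per_sec(mbps: float) -> float:
    """Convert megabits/sec (10**6 bits) to megabytes/sec (1024**2 bytes)."""
    bits_per_sec = mbps * 1_000_000
    bytes_per_sec = bits_per_sec / 8
    return bytes_per_sec / (1024 * 1024)


circuit_rate = mbps_to_mbytes_per_sec(100)  # raw rate of a 100 Mbps circuit
usable_rate = circuit_rate * 0.93           # after the 93 percent overhead allowance

print(f"raw:    {circuit_rate:.1f} MBps")
print(f"usable: {usable_rate:.1f} MBps")
```

If you count a megabyte as an even 1,000,000 bytes instead, the raw figure comes out at 12.5MBps, which is why two people can quote different numbers for the same circuit.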
So your client thinks they should be able to send an 11GB file in around 17 minutes. If they have a clever IT person, that person knows it’s not fair to just transfer a file to a Windows shared directory (SMB isn’t terribly efficient). So they use a freeware FTP server and put a Blu-ray rip of “Avatar” on it, then fire up their trusty FTP client and try to transfer it over the link. And the performance … isn’t 11MBps.
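The client’s back-of-the-envelope expectation can be sketched as follows. It assumes 1GB = 1024MB and a steady 11MBps for the whole transfer, which is the ideal case the client has in mind; the function name is hypothetical.

```python
def transfer_minutes(file_gb: float, rate_mbytes_per_sec: float) -> float:
    """Minutes needed to move a file at a constant rate (1 GB = 1024 MB)."""
    seconds = file_gb * 1024 / rate_mbytes_per_sec
    return seconds / 60


expected = transfer_minutes(11, 11)
print(f"expected transfer time: {expected:.0f} minutes")
```

Plug in whatever rate the FTP client actually reports to see how far the real transfer falls from that expectation.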
What’s going on here?
To understand why these types of tests will never accurately represent a big circuit’s capacity, it helps to think of the Internet as a bunch of tubes (no, really). Actually, as a bunch of tubes inside another tube.
The big tube is your fat Ethernet connection. The carrier assures you its capacity is 100Mbps. Inside that tube are any number of TCP connections. Think of these as virtual circuits: small tubes inside that big tube. In order to truly test the full capacity of that Ethernet connection, you have to establish a TCP connection that fills up that big tube.
For those of you who like to read ahead to the end, I’ll save you the trouble. You can never get that little tube to fill up the big tube. The way TCP is built and the physics of network latency make it nearly impossible. (In fairness, when some people say “overhead” they mean all this stuff.)
Now we have to get really technical.
If your customer doesn’t believe that layman’s explanation (and I don’t blame them), you’re going to have to get technical. The reason there is a limit to the size of those TCP connections has to do with how TCP was originally designed to provide reliable data transmission while still performing well.
When you send data via TCP, your computer breaks that data up into a bunch of packets then sends them out over the network to the destination. Those packets may not all follow the same path, but they all have to show up eventually at the destination so they can be put back together into that spreadsheet or Gangnam Style parody video.
The way TCP handles this is by requiring the receiver to send back an acknowledgement that it got a clean packet with no errors. If the sender doesn’t get this verification, or if it hears that the packet was damaged in transit, it’ll resend it.
This back and forth would be very inefficient if it happened one packet at a time, so TCP allows a sender to negotiate with the receiver to agree on a number of packets that can be in transit at one time. This way the sender can blast out a bunch of packets and process the acknowledgements as they come in. This strategy ensures that the performance of the TCP connection is maximized.
The amount of in-flight data that the sender and destination agree on (TCP counts it in bytes rather than packets) is known as the “receive window,” and it is the limiting factor in any TCP transfer. It effectively sets the capacity for that TCP connection.
By default, TCP defines this window to be no larger than 64KB. If you’re on a network with round-trip latency of 40ms, that means that your FTP transfer can have a maximum of 64KB of data flying around the network at any given point in time. Since it takes 40ms for those TCP packets to get from point A to point B and for the acknowledgement to be received, that works out to a data transfer rate of 13.1Mbps, or about 1.6MBps (for those of you checking the math at home, don’t forget 1KB = 1024B). Before you go cancelling that GigE connection, stick with me; it’ll all make sense soon.
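For those checking the math at home, here is a small sketch of the calculation: one full window of data per round trip. It uses the 64KB window and 40ms round trip from the example above; the function and variable names are made up for illustration. Working the numbers through gives about 13.1 megabits per second, which is roughly 1.6 megabytes per second.

```python
def window_limited_bytes_per_sec(window_bytes: int, rtt_seconds: float) -> float:
    """Best-case rate when at most one window of data is in flight per round trip."""
    return window_bytes / rtt_seconds


WINDOW_BYTES = 64 * 1024  # 64KB default ceiling, with 1KB = 1024B
RTT_SECONDS = 0.040       # 40ms round trip

rate = window_limited_bytes_per_sec(WINDOW_BYTES, RTT_SECONDS)
megabits_per_sec = rate * 8 / 1_000_000    # megabits, 10**6 bits each
megabytes_per_sec = rate / (1024 * 1024)   # megabytes, 1024**2 bytes each

print(f"{megabits_per_sec:.1f} Mbps")
print(f"{megabytes_per_sec:.2f} MBps")
```

Notice that the circuit speed never appears in the formula. The ceiling depends only on the window and the latency, so the same connection would top out at the same rate on a 100Mbps or a GigE circuit.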
You heard right: a standard TCP connection on that 40ms link can’t go faster than 13.1Mbps. Unless …
There are ways to manipulate this window and make it bigger than 64KB. I won’t go into that here, but if you really want to dig deep, these articles are a good place to start. The problem is that there are a number of components in a network system that can manipulate the window. Your OS has its own window setting, but so may the software you’re using to do the transfer. Depending on the network setup and the actual protocol you’re using to do the transfer, your routers may also manipulate the TCP traffic or may not be able to handle the tricks that are used to make the receive window bigger.
The bottom line is that there are a lot of reasons why your single TCP connection will have trouble breaking the 13.1Mbps barrier. And even if you do open the receive window, you still have physical limitations on the computers doing the sending and receiving. They need enough memory and fast enough disks and CPUs to keep up with the storm of TCP traffic.
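To get a feel for how far the window would have to open, you can turn the calculation around and ask how much data must be in flight to keep the big tube full. This is a rough sketch that reuses the 40ms round trip from the example and ignores overhead and packet loss; the function name is hypothetical.

```python
def window_needed_bytes(circuit_mbps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a circuit full for one round trip."""
    bytes_per_sec = circuit_mbps * 1_000_000 / 8
    return bytes_per_sec * rtt_seconds


RTT_SECONDS = 0.040  # 40ms round trip, as in the example above

for circuit_mbps in (100, 1000):
    needed_kb = window_needed_bytes(circuit_mbps, RTT_SECONDS) / 1024
    print(f"{circuit_mbps} Mbps circuit: window of about {needed_kb:.0f}KB")
```

For the 100Mbps circuit that comes to several times the 64KB default, and ten times more again for GigE. Both ends of the transfer, and everything in between, have to cooperate for a single connection to get there.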
As if that weren’t enough, the fact that TCP requires acknowledgements at all means you’ll never be able to send at the full capacity of your circuit because you have to share the pipe with those return packets. And we haven’t even talked about dropped packets or transmission errors.
So how can you convince your client they’re getting what they paid for?
While there are devices that can test the actual capacity of a circuit, I’m betting you don’t have one (or two, since that’s what you’d really need). My suggestion would be to explain that there are limitations to a single communication session that would cause a stand-alone test to appear to fail. To demonstrate this fact, you can set up two transfers on two sets of computers across the same pipe. You should see both transfers max out at the same rate as one transfer would by itself, proving that the limitation is in the transfer, not in the pipe.
The reason this is important is that in the real world, traffic going over that connection isn’t going to come in the form of a single large transfer. It’s going to be dozens or hundreds of connections accessing data from any number of servers. Each of those little tubes won’t have to jockey for space in the big tube, meaning the net performance of the network will be greater. And meaning that in reality, your client will indeed see most of the bandwidth they were promised.
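Before you run the demonstration, it can help to show the client what result to expect. The sketch below is a deliberately simple model: it assumes every connection is capped at the same single-connection rate, that connections do not slow each other down, and that the circuit itself is the only shared limit. The 13.1 figure is the window-limited rate in megabits per second from the 64KB, 40ms example; the function names are made up for illustration.

```python
import math


def aggregate_mbps(connections: int, per_connection_mbps: float,
                   circuit_mbps: float) -> float:
    """Combined rate of several capped connections sharing one circuit."""
    return min(connections * per_connection_mbps, circuit_mbps)


def connections_to_fill(per_connection_mbps: float, circuit_mbps: float) -> int:
    """Smallest number of capped connections whose combined rate reaches the circuit."""
    return math.ceil(circuit_mbps / per_connection_mbps)


PER_CONNECTION_MBPS = 13.1  # window-limited rate from the 64KB / 40ms example
CIRCUIT_MBPS = 100

for n in (1, 2, 4, 8):
    total = aggregate_mbps(n, PER_CONNECTION_MBPS, CIRCUIT_MBPS)
    print(f"{n} transfer(s): {total:.1f} Mbps combined")

print("transfers needed to fill the pipe:",
      connections_to_fill(PER_CONNECTION_MBPS, CIRCUIT_MBPS))
```

In this model the combined rate doubles when you add the second transfer, which is exactly what the two-computer demonstration should show the client. Real networks will be messier, so treat the output as a talking point rather than a prediction.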
Josh Anderson is the CEO of Telephony Partners, a telecom master agency he founded in 2002 leveraging engineering and software expertise. He also is a former member of the Channel Partners Advisory Board.