Published by Kimberly Welch. Modified over 9 years ago.
1
03/12/08, Nuova Systems Inc.
TCP Issues in the Data Center
Tom Lyon
The Future of TCP: Train-wreck or Evolution?
Stanford University, 2008-04-01
2
TCP: Not Just for “The Internet”
- Essentially all network software relies on TCP/IP semantics
- “The network is the data center”
- In the data center, gigabits are “free”
  - 10^5 times cheaper than WAN bandwidth
  - Terabit-class switches
  - 10Gb endpoints
- TCP needs:
  - High bandwidth
  - Low latency
  - Predictability & fairness
3
Storage Networks
- Storage access is slowly evolving from hardware bus to open network
- NAS vs SAN
- NFS & CIFS vs SCSI's many flavors
- Ethernet vs Fibre Channel vs Infiniband
4
Storage Networks: Ethernet vs EtherNot
Ethernet:
- iSCSI, NFS, CIFS
- TCP & Ethernet
- Congestion loss
- Stream oriented
- Software transport
- High CPU overhead
EtherNot:
- SCSI-FCP, SCSI-SRP
- F.C. and Infiniband
- Credit flow control
- Block oriented
- Hardware transport
- Low CPU overhead
5
Storage Networks: Convergence
- Data Center Ethernet
- Choice of congestion classes
  - Lossy vs lossless
- Choice of storage transports
  - TCP or F.C. (FCOE)
- Choice of hardware or software transport
  - TOE with TCP, software FCOE, ...
6
TCP: Time Out of Joint
- TCP was standardized in a much slower world
  - ½ second minimum retransmit timeout
  - 20 microsecond RTT achievable today!
- The fast retransmit algorithm only works for streams, where more data is being sent
- Most data center traffic is request/response, often single packets
- Packet loss hurts because TCP won't (not can't) respond fast enough
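The gap between the two figures on this slide can be made concrete with a little arithmetic. The sketch below is illustrative only: the ½ second timeout and 20 microsecond RTT come from the slide, while the loss rate, the function names, and the simplifications (a fixed timeout with no exponential backoff, independent losses, one retry per timeout) are assumptions made for the example.

```python
# Figures quoted on the slide.
MIN_RTO_S = 0.5    # minimum retransmit timeout
RTT_S = 20e-6      # achievable data center round-trip time


def rtts_per_timeout(min_rto_s: float, rtt_s: float) -> float:
    """Number of round trips that fit inside a single retransmit timeout."""
    return min_rto_s / rtt_s


def mean_request_latency_s(rtt_s: float, min_rto_s: float, loss_rate: float) -> float:
    """Expected latency of a single-packet request/response exchange.

    Assumptions (not from the slides): each attempt is lost independently
    with probability `loss_rate`, every loss costs one full timeout, and
    the timeout does not back off. A single-packet exchange produces no
    duplicate ACKs, so fast retransmit never helps.
    """
    expected_losses = loss_rate / (1.0 - loss_rate)  # mean failures before success
    return rtt_s + expected_losses * min_rto_s


if __name__ == "__main__":
    print("RTTs wasted per timeout:", round(rtts_per_timeout(MIN_RTO_S, RTT_S)))
    for loss in (0.0, 0.0001, 0.001):  # hypothetical loss rates
        mean = mean_request_latency_s(RTT_S, MIN_RTO_S, loss)
        print(f"loss={loss:.4f}  mean latency={mean * 1e6:.1f} us")
```

Under these assumptions, even a loss rate of one in a thousand leaves the timeout, not the network, as the dominant term in mean request latency.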
7
Congestion in the Data Center
- Gigantic, non-blocking switches are the norm
  - Hundreds of ports, terabits of throughput
- Buffers and buffer management are the most costly part of the switch
- Link-based flow control (“pause”) allows a switch to push congestion back to its upstream neighbors
- If the upstream neighbor is the source server, then the congestion “goes away”
  - Or does it?
8
Servers and Gigabits
- Any current x86 server can easily saturate a 1Gb Ethernet link with TCP traffic
- Many current servers can saturate 10Gb Ethernet links!
- Lossless classes cause the pipe to fill faster
- What happens when the first hop, the server's own Ethernet link, is the point of congestion?
9
TCP and the Fat Pipe
- If TCP doesn't “see” congestion (loss or ECN), it will keep increasing its window to try to get more bandwidth from the network
- Lossless network => high throughput
- But... a single streaming connection will consume all available buffers
- Newer connections will have a hard time getting buffers => extreme unfairness
- The server needs good congestion management
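A toy model shows how quickly one unchecked connection can occupy a buffer pool. This is a sketch under stated assumptions, not a measurement: the window is assumed to double every round trip (as in slow start) because no loss or ECN signal ever arrives, anything beyond the bandwidth-delay product is assumed to sit in buffers, and the pool size, bandwidth-delay product, and function name are all hypothetical.

```python
def rounds_until_buffers_full(pool_pkts: int, bdp_pkts: int, initial_window: int = 1):
    """Count round trips until one connection's queued packets fill the pool.

    Toy assumptions: the window doubles every round trip, and every packet
    beyond the bandwidth-delay product (`bdp_pkts`) occupies one buffer.
    Returns (round_trips, window_at_that_point).
    """
    window = initial_window
    rounds = 0
    while max(0, window - bdp_pkts) < pool_pkts:
        window *= 2   # no congestion signal, so keep growing
        rounds += 1
    return rounds, window


if __name__ == "__main__":
    # Hypothetical numbers: a 512-packet buffer pool and a 2-packet BDP.
    rounds, window = rounds_until_buffers_full(pool_pkts=512, bdp_pkts=2)
    print(f"pool exhausted after {rounds} round trips (window = {window} packets)")
```

With round trips measured in tens of microseconds, the pool in this model is gone well within a millisecond, which is the situation a newly opened connection then has to compete in.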
10
Servers, Ethernet, and Queues
- “Everyone” knows that big, simple FIFO queues are a bad idea in routers
- What do servers have today? Big, simple FIFO queues!
- The queues are owned and maintained by the Ethernet NIC hardware
- Horrible unfairness can be demonstrated with only 2 TCP connections
- Many servers deal with 1000s of TCP connections
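The two-connection case can be sketched in a few lines. The example below is an illustration with assumed numbers (a 1000-packet backlog, 1500-byte packets, a 1Gb link) and hypothetical function names. Per-flow round-robin appears purely as a well-known point of comparison; it is not presented as the remedy this talk had in mind.

```python
from collections import deque

LINK_BPS = 1e9     # 1Gb Ethernet link
PKT_BYTES = 1500   # assumed packet size


def fifo_order(arrivals):
    """A single FIFO transmits packets strictly in arrival order."""
    return list(arrivals)


def round_robin_order(arrivals):
    """Per-flow queues served one packet at a time in rotation."""
    queues = {}
    for flow in arrivals:
        queues.setdefault(flow, deque()).append(flow)
    order = []
    while queues:
        for flow in list(queues):   # copy keys so empty queues can be dropped
            order.append(queues[flow].popleft())
            if not queues[flow]:
                del queues[flow]
    return order


def wait_s(order, flow):
    """Time the first packet of `flow` waits behind packets ahead of it."""
    packets_ahead = order.index(flow)
    return packets_ahead * PKT_BYTES * 8 / LINK_BPS


if __name__ == "__main__":
    # A "hog" has 1000 packets queued when a one-packet request arrives.
    arrivals = ["hog"] * 1000 + ["small"]
    print(f"FIFO wait:        {wait_s(fifo_order(arrivals), 'small') * 1e6:.0f} us")
    print(f"round-robin wait: {wait_s(round_robin_order(arrivals), 'small') * 1e6:.0f} us")
```

In this model the small request waits behind the hog's entire backlog in the FIFO, but behind only one packet when flows are served in rotation.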
11
Connection Size vs Throughput – idle 1G link
12
Connection Size vs Throughput – busy 1G link – competing with a single “hog” connection
UNFAIR!
13
Improving Server Congestion Management
- Omitted due to event rules!
14
TCP: Rock or Hard Place?
- With lossy Ethernet, TCP bandwidth can collapse due to stupidly high timeouts
  => Unpredictable performance
- With lossless Ethernet, TCP fairness can collapse due to stupid queuing policies
  => Unpredictable performance
- Data center managers hate unpredictability
- Ethernet standards have evolved; TCP needs to catch up
- TCP and Ethernet implementations must improve
15
Why does this matter?
- The Earth is being paved by data centers
  - Google, Microsoft, NSA, Walmart, Facebook, ...
- Improving TCP means more overall efficiency in the data center
- Heat, CO₂, and radioactive waste are becoming measurable by-products of TCP inefficiency
- Fix TCP => Save the World!