FAST TCP in Linux. Cheng Jin, David Wei.
Outline: Overview of FAST TCP. Implementation Details. SC2002 Experiment Results. FAST Evaluation and WAN-in-Lab.
FAST vs. Linux TCP. Distance = 10,037 km; delay = 180 ms; MTU = 1500 B; duration = 3600 s. Linux TCP experiments: Jan 28-29. [Figure: transfer (GB), throughput (Mbps), and petabit-meters per second versus number of flows for FAST and for Linux TCP at several txqueuelen settings (10000 among them).]
Aggregate Throughput. FAST, standard MTU; utilization averaged over 1 hr. Average utilization: Linux TCP 19%, Linux TCP 27%, FAST 92% (txqueuelen = 100 and a larger setting for the two Linux TCP runs). [Figure: aggregate throughput traces on 1 Gbps and 2 Gbps scales; a second comparison shows 16% and 48% utilization.]
Summary of Changes: RTT estimation (fine-grain timer). Fast convergence to equilibrium. Delay monitoring in equilibrium. Pacing (reducing burstiness).
FAST TCP Flow Chart. [Diagram: Slow Start, Fast Convergence, Equilibrium, and Loss Recovery (normal recovery and time-out) states and the transitions between them.]
RTT Estimation: Measure queueing delay. Kernel timestamps with microsecond resolution. Use SACK to increase the number of RTT samples during recovery. Exponential averaging of RTT samples to increase robustness.
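The bookkeeping described above can be illustrated with a small C sketch. The struct, field names, and the 1/8 averaging weight are assumptions for illustration, not the actual FAST kernel code: queueing delay is estimated as the exponentially averaged RTT minus the minimum (propagation) RTT.

#include <stdint.h>

/* Illustrative sketch of the RTT bookkeeping on this slide; names and
 * weights are assumptions, not the actual FAST kernel code. */
struct fast_rtt {
    uint32_t base_rtt_us; /* smallest RTT seen so far (propagation delay) */
    uint32_t avg_rtt_us;  /* exponentially averaged RTT */
};

/* Feed one RTT sample (microseconds); SACK lets the sender generate
 * more such samples while it is recovering from losses. */
static void fast_rtt_sample(struct fast_rtt *r, uint32_t rtt_us)
{
    if (r->base_rtt_us == 0 || rtt_us < r->base_rtt_us)
        r->base_rtt_us = rtt_us;

    if (r->avg_rtt_us == 0)
        r->avg_rtt_us = rtt_us;
    else
        /* EWMA with weight 1/8 to smooth out noisy samples */
        r->avg_rtt_us += ((int32_t)(rtt_us - r->avg_rtt_us)) / 8;
}

/* Queueing delay = averaged RTT minus propagation delay. */
static uint32_t fast_queue_delay_us(const struct fast_rtt *r)
{
    return r->avg_rtt_us - r->base_rtt_us;
}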
Fast Convergence: Rapidly increase or decrease cwnd toward equilibrium. Monitor the per-ack queueing delay to avoid overshoot.
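One hedged way to sketch this in C: move cwnd exponentially toward an equilibrium target, but stop growing as soon as the per-ack queueing delay indicates the bottleneck queue is filling. The function name, the gap-halving rule, and the delay threshold are all hypothetical choices, not the FAST implementation.

#include <stdint.h>

/* Hypothetical sketch of the fast-convergence step: close half of the gap
 * to the target window each call, but stop once per-ack queueing delay
 * starts to build, leaving further adjustment to the equilibrium phase. */
static uint32_t fast_converge(uint32_t cwnd, uint32_t target_cwnd,
                              uint32_t queue_delay_us,
                              uint32_t delay_threshold_us)
{
    if (queue_delay_us > delay_threshold_us)
        return cwnd; /* overshoot risk: hand off to the small-step update */

    if (cwnd < target_cwnd)
        cwnd += (target_cwnd - cwnd) / 2;
    else
        cwnd -= (cwnd - target_cwnd) / 2;
    return cwnd;
}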
Equilibrium: Vegas-like cwnd adjustment on a large time scale (per RTT). Small step size to maintain stability in equilibrium. Per-ack delay monitoring to enable timely detection of changes in equilibrium.
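For reference, the published FAST window update has the form w <- min(2w, (1 - gamma) w + gamma ((baseRTT/RTT) w + alpha)). The sketch below is a user-space style illustration of that Vegas-like per-RTT step; a kernel implementation would use fixed-point arithmetic, and the parameter names follow the paper rather than the Linux code.

/* Illustrative per-RTT equilibrium update in the spirit of the published
 * FAST rule; alpha controls how many packets each flow tries to keep
 * queued, gamma is the small step size that keeps the update stable. */
static double fast_equilibrium_update(double cwnd, double base_rtt,
                                      double avg_rtt, double alpha,
                                      double gamma)
{
    double target = (base_rtt / avg_rtt) * cwnd + alpha;
    double next   = (1.0 - gamma) * cwnd + gamma * target;

    /* never more than double the window in a single RTT */
    return next < 2.0 * cwnd ? next : 2.0 * cwnd;
}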
Pacing: What do we pace? The increment to cwnd. Time-driven vs. event-driven: a trade-off between complexity and performance. Timer resolution is important.
Time-Based Pacing: cwnd increments are scheduled at fixed intervals. [Figure: timeline of data and ACK packets under time-based pacing.]
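A minimal sketch of the time-driven variant, assuming a periodic timer whose resolution is much finer than the RTT; the struct and function names are illustrative, not kernel code.

#include <stdint.h>

/* Illustrative time-driven pacing: the per-RTT window increment is split
 * into equal slices and applied one slice per timer tick, instead of all
 * at once when the RTT ends. */
struct paced_increment {
    uint32_t remaining;  /* increment still to be applied this RTT */
    uint32_t ticks_left; /* timer ticks remaining in this RTT */
};

static uint32_t pacing_timer_tick(struct paced_increment *p, uint32_t cwnd)
{
    if (p->ticks_left == 0 || p->remaining == 0)
        return cwnd;

    /* ceil(remaining / ticks_left): spread the increment evenly */
    uint32_t slice = (p->remaining + p->ticks_left - 1) / p->ticks_left;

    p->remaining -= slice;
    p->ticks_left--;
    return cwnd + slice;
}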
Event-Based Pacing: Detect a sufficiently large gap between consecutive bursts and delay the cwnd increment until the end of each such burst.
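The event-driven variant can be sketched as follows: on each ACK, defer the cwnd increment and release the accumulated increments only when the gap since the previous ACK exceeds a burst-gap threshold. All names and the threshold parameter are assumptions for illustration.

#include <stdint.h>

/* Illustrative event-driven pacing: cwnd increments are held back until a
 * sufficiently large inter-ACK gap marks the end of the current burst. */
struct burst_pacer {
    uint64_t last_ack_us; /* arrival time of the previous ACK */
    uint32_t pending;     /* increments deferred until the burst ends */
};

static uint32_t burst_pacer_on_ack(struct burst_pacer *bp, uint64_t now_us,
                                   uint64_t gap_threshold_us,
                                   uint32_t increment, uint32_t cwnd)
{
    int burst_ended = (now_us - bp->last_ack_us) > gap_threshold_us;

    bp->last_ack_us = now_us;
    bp->pending += increment;

    if (burst_ended) {
        cwnd += bp->pending; /* apply deferred increments between bursts */
        bp->pending = 0;
    }
    return cwnd;
}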
SCinet Caltech-SLAC Experiments (netlab.caltech.edu/FAST), SC2002, Baltimore, Nov 2002. Experiment: Sunnyvale, Chicago, Baltimore, and Geneva (segments of 3000 km, 1000 km, and 7000 km). C. Jin, D. Wei, S. Low, the FAST team and partners. Theory: the Internet as a distributed feedback system (TCP sources and AQM links coupled through forward and backward delays). Highlights (FAST TCP, standard MTU, peak window = 14,255 pkts, throughput averaged over > 1 hr): 925 Mbps with a single flow per GE card, 9.28 petabit-meter/sec, 1.89 times the Internet2 Land Speed Record; 8.6 Gbps with 10 flows, 34.0 petabit-meter/sec, 6.32 times the LSR; 21 TB transferred in 6 hours with 10 flows. Implementation: sender-side modification, delay based. [Figure: throughput vs. number of flows for FAST and the I2 LSR on the Geneva-Sunnyvale and Baltimore-Sunnyvale paths.]
Network (Sylvain Ravot, Caltech/CERN)
FAST BMPS vs. Internet2 Land Speed Record. [Figure: petabit-meters per second versus number of flows for FAST (standard MTU, throughput averaged over > 1 hr) and the I2 LSR, on the Geneva-Sunnyvale and Baltimore-Sunnyvale paths.]
Aggregate Throughput. FAST, standard MTU; utilization averaged over > 1 hr. [Figure: aggregate throughput for runs with 1, 2, 7, 9, and 10 flows (durations between 1 hr and 6 hr); average utilization of 95%, 92%, 90%, and 88% across the runs.]
Caltech-SLAC Entry: rapid recovery after a possible hardware glitch. [Figure: throughput (Mbps) and ACK traffic over time, annotated with a power glitch and reboot.]
Acknowledgments (SCinet Caltech-SLAC experiments, netlab.caltech.edu/FAST, SC2002 Baltimore, Nov 2002). Prototype: C. Jin, D. Wei. Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA). Experiment/facilities: Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh; CERN: O. Martin, P. Moroni; Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip; DataTAG: E. Martelli, J. P. Martin-Flatin; Internet2: G. Almes, S. Corbato; Level(3): P. Fernes, R. Struble; SCinet: G. Goddard, J. Patton; SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams; StarLight: T. deFanti, L. Winkler; TeraGrid: L. Winkler. Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF.
Evaluating FAST: End-to-end monitoring doesn't tell the whole story. Existing network emulation (dummynet) is not always enough. We can optimize better if we can look inside and understand the real network.
Dummynet and Real Testbed
Dummynet Issues: Not running on a real-time OS, so timing is imprecise. No priority scheduling of dummynet events. Bandwidth fluctuates significantly with workload. Much work is needed to customize dummynet for protocol testing.
10 GbE Experiment: Long-distance testing of Intel 10 GbE cards. Sylvain Ravot (Caltech) achieved 2.3 Gbps with a single stream using jumbo frames and stock Linux TCP. Tested HSTCP, Scalable TCP, FAST, and stock TCP under Linux. 1500 B MTU: 1.3 Gbps SNV -> CHI; 9000 B MTU: 2.3 Gbps SNV -> GVA.
TCP Loss Mystery: Frequent packet loss with 1500-byte MTU; none with larger MTUs. Packet loss occurs even when cwnd is capped, although the routers have a large queue of 4000 packets. Packets were captured at both sender and receiver using tcpdump.
How Did the Loss Happen? [Figure: packet trace around the point where the loss was detected.]
How Can WAN-in-Lab Help? We will know exactly where packets are lost. We will also know the sequence of events (packet arrivals) that leads to the loss. We can either fix the problem in the network, if there is one, or improve the protocol.
Conclusion: FAST improves the end-to-end performance of TCP. Many issues remain to be understood and resolved. WAN-in-Lab can help make FAST a better protocol.