FAST Protocols for High-Speed Networks
David, netlab, Caltech
For the HENP WG, Feb 1st 2003
FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST
Internet: a distributed feedback control system
- TCP: adapts the sending rate to congestion
- AQM: feeds back congestion information
[Block diagram: TCP and AQM coupled through forward/backward routing R_f(s), R_b'(s), with rates x, y and prices p, q]
Theory · Experiment · Implementation
- Experiment facilities: Calren2/Abilene, Chicago–Amsterdam, CERN (Geneva), SURFnet, StarLight, WAN in Lab, Caltech research & production networks; multi-Gbps, ms delay
- Implementation: slow start, equilibrium, FAST recovery, FAST retransmit, timeout; 155 Mb/s to 10 Gb/s
People
- Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
- Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
- Staff/Postdocs: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
- Industry: Doraiswami (Cisco), Yip (Cisco)
Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
FAST project
Goal: protocols (TCP/AQM) for ultrascale networks
- Bandwidth: 10 Mbps to >100 Gbps
- Delay: millisecond-scale round-trip delays
Research: theory, algorithms, design, implementation, demo, deployment
Urgent need:
- Large amounts of data to share (500 TB at SLAC)
- A typical SLAC transfer is ~1 TB (~15 min at 10 Gbps)
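The slide's "~1 TB in ~15 min at 10 Gbps" figure can be checked with back-of-envelope arithmetic. A minimal sketch (the helper name and the assumption of a fully utilized link are illustrative):

```python
def transfer_minutes(size_bytes, rate_bps):
    """Time in minutes to move size_bytes at rate_bps, assuming the link is fully utilized."""
    return size_bytes * 8 / rate_bps / 60

# 1 TB (10**12 bytes) over a 10 Gbps link:
t = transfer_minutes(1e12, 10e9)
print(f"{t:.1f} min")  # ~13.3 min, i.e. roughly the slide's "~15 min"
```

The point of the example: only a protocol that actually sustains multi-Gbps rates gets anywhere near this bound; stock TCP (later slides) utilizes only a small fraction of the link.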
HEP Network (DataTAG)
Sites: NL SURFnet, GENEVA, UK SuperJANET4, ABILENE, ESNET, CALREN, IT GARR-B, GEANT, New York, FR Renater, STAR-TAP, STARLIGHT
2.5 Gbps wavelength triangle; faster triangle in 2003
Newman (Caltech)
Projected performance
ns-2 simulations: capacity = 155 Mbps, 622 Mbps, 2.5 Gbps, 5 Gbps, 10 Gbps
100 sources, 100 ms round-trip propagation delay
[Plot: projected capacity roadmap by year, through 5 Gbps in '04 and 10 Gbps in '05]
J. Wang (Caltech)
Current TCP (Linux Reno): throughput as a function of time
Chicago → CERN, Linux kernel, TCP single stream
Traffic generated by iperf (throughput measured over the last 5 s)
RTT = 119 ms, MTU = 1500 B, test duration: 2 hours
By Sylvain Ravot (Caltech)
Current TCP (Linux Reno): as the MTU increases... 1.5 KB, 4 KB, 9 KB
By Sylvain Ravot (Caltech)
Better? ???? By Some Dreamers (Somewhere)
FAST network: CERN (Geneva) – SLAC (Sunnyvale), GE, standard MTU
Sunnyvale → CERN, Linux kernel with FAST enabled
RTT = 180 ms, MTU = 1500 B
By C. Jin & D. Wei (Caltech)
Theoretical Background
Congestion control: each source i transmits at rate x_i(t) over routes given by the routing matrix R
Congestion control
Example congestion measures p_l(t):
- Loss (Reno)
- Queueing delay (Vegas)
TCP: source rates x_i(t) respond to the congestion measures p_l(t)
AQM: each link updates p_l(t) based on its aggregate rate y_l(t)
TCP/AQM
Congestion control is a distributed asynchronous algorithm to share bandwidth. It has two components:
- TCP: adapts the sending rate (window) to congestion
- AQM: adjusts & feeds back the congestion measure p_l(t)
Together they form a distributed feedback control system:
- Equilibrium & stability depend on both TCP and AQM
- ...and on delay, capacity, routing, and the number of connections
Examples — TCP: Reno, Vegas; AQM: DropTail, RED, REM/PI, AVQ
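The feedback loop above can be sketched as a toy discrete-time iteration on a single link, in the style of the utility/price models behind this framework. This is illustrative only, not the FAST algorithm: the log-utility sources (rate = w_i / price), the gain gamma, and all numbers are assumptions.

```python
# Toy TCP/AQM loop on one link: sources adapt rates x_i to the congestion
# price p; the link (AQM) raises p when the aggregate rate y exceeds capacity.
C = 100.0             # link capacity (Mbps) -- illustrative
w = [1.0, 2.0, 3.0]   # per-source utility weights (U_i(x) = w_i log x)
x = [1.0, 1.0, 1.0]   # source sending rates
p = 0.1               # congestion price fed back by AQM
gamma = 0.02          # AQM price step size

for _ in range(5000):
    y = sum(x)                                  # aggregate rate at the link
    p = max(p + gamma * (y - C) / C, 1e-6)      # AQM: price up if y > C
    x = [wi / p for wi in w]                    # TCP: utility-maximizing rate

print(f"aggregate = {sum(x):.1f} Mbps")   # converges to capacity C
print("shares:", [round(xi, 1) for xi in x])  # proportional to the w_i
```

At equilibrium the aggregate rate matches capacity and each source's share is proportional to its weight, illustrating the slide's point that both the TCP side and the AQM side jointly determine the equilibrium.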
Methodology
Protocol (Reno, Vegas, RED, REM/PI, ...)
- Equilibrium: performance (throughput, loss, delay); fairness (utility)
- Dynamics: local stability; cost of stabilization
Goal: Fast AQM Scalable TCP Equilibrium properties –Uses end-to-end delay (and loss) as congestion measure –Achieves any desired fairness, expressed by utility function –Very high bandwidth utilization (99% in theory) Stability properties –Stability for arbitrary delay, capacity, routing & load –Good performance Negligible queueing delay & loss introduced by the protocol Fast response
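A delay-based window update in the spirit of the equilibrium properties above can be sketched as follows. This is not the shipped FAST kernel code; the update form, the toy single-bottleneck RTT model, and the parameters alpha and gamma are assumptions for illustration.

```python
def fast_like_update(w, base_rtt, rtt, alpha=200.0, gamma=0.5):
    """One window update: scale w by baseRTT/RTT (queueing delay is the
    congestion signal) and pace toward keeping ~alpha packets in the queue."""
    return (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha)

# Toy single-flow bottleneck: once the pipe is full, RTT grows as w/C.
w        = 100.0     # window, packets
base_rtt = 0.100     # propagation delay, seconds
C        = 10000.0   # bottleneck capacity, packets/second
for _ in range(200):
    rtt = max(base_rtt, w / C)        # crude model: RTT inflates with backlog
    w = fast_like_update(w, base_rtt, rtt)

print(f"w = {w:.0f} pkts")  # settles at base_rtt*C + alpha = 1200
```

The fixed point keeps roughly alpha packets queued per flow, so the bottleneck stays fully utilized (rate = w/rtt = C) with only a small, controlled queueing delay, matching the "very high utilization, negligible queueing" claims on this slide.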
Implementation and Experiment
Implementation
First version (demonstrated at the SuperComputing conference, Nov 2002): sender-side kernel modification (well suited to file-transfer services)
Challenges:
- Effects ignored in theory
- Large window sizes at high speed
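The "large window size" challenge is a bandwidth-delay-product effect: the congestion window must cover the whole pipe. A minimal sketch with the regimes quoted in this talk (helper name is illustrative):

```python
def window_pkts(rate_bps, rtt_s, mtu_bytes=1500):
    """Packets in flight needed to fill a path of the given rate and RTT."""
    return rate_bps * rtt_s / (mtu_bytes * 8)

print(round(window_pkts(10e9, 0.100)))   # 10 Gbps, 100 ms: ~83,333 pkts
print(round(window_pkts(925e6, 0.180)))  # SC2002 single flow: ~13,875 pkts
```

The second figure is the same order as the 14,255-packet peak window reported on the SC2002 slide, i.e. tens of thousands of packets in flight per flow, far beyond what stock TCP of the time handled gracefully.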
SCinet Caltech–SLAC experiments (SC2002, Baltimore, Nov 2002)
netlab.caltech.edu/FAST — C. Jin, D. Wei, S. Low, FAST team and partners
Network topology: Sunnyvale – Baltimore – Chicago – Geneva (segments of 3,000 km, 1,000 km, 7,000 km)
FAST TCP, standard MTU; peak window = 14,255 pkts; throughput averaged over > 1 hr
Highlights:
- 925 Mbps single flow / GE card: 9.28 petabit-meter/sec, 1.89x the Internet2 LSR
- 8.6 Gbps with 10 flows: 34.0 petabit-meter/sec, 6.32x the Internet2 LSR
- 21 TB transferred in 6 hours with 10 flows
[Plots: throughput traces, Geneva–Sunnyvale and Baltimore–Sunnyvale, vs. #flows]
FAST Bmps
Experiment | flows | Bmps (Peta) | Thruput (Mbps) | Distance (km) | Delay (ms) | MTU (B) | Duration (s) | Transfer (GB) | Path
Alaska–Amsterdam | – | – | – | – | – | – | – | – | Fairbanks, AK – Amsterdam, NL
MS–ISI | – | – | – | – | – | – | – | – | MS, WA – ISI, VA
Caltech–SLAC | 1 | 9.28 | 925 | 10,037 | 180 | 1,500 | 3,600 | 387 | CERN – Sunnyvale
Caltech–SLAC | 2 | 18.03 | 1,797 | 10,037 | 180 | 1,500 | 3,600 | 753 | CERN – Sunnyvale
Caltech–SLAC | 7 | 24.17 | 6,123 | 3,948 | 85 | 1,500 | 21,600 | 15,396 | Baltimore – Sunnyvale
Caltech–SLAC | 9 | 31.35 | 7,940 | 3,948 | 85 | 1,500 | 4,030 | 3,725 | Baltimore – Sunnyvale
Caltech–SLAC | 10 | 34.0 | 8,609 | 3,948 | 85 | 1,500 | 21,600 | 21,647 | Baltimore – Sunnyvale
Mbps = 10^6 b/s; GB = 2^30 bytes
C. Jin, D. Wei, S. Low, FAST team and partners
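The Bmps figures multiply throughput by distance (the Internet2 land-speed-record metric). A minimal check against the talk's own highlight numbers; the path distances (CERN–Sunnyvale ~10,037 km, Baltimore–Sunnyvale ~3,948 km) are assumed here from the experiment description:

```python
def petabit_meters_per_sec(throughput_mbps, distance_km):
    """Throughput x distance in petabit-meters per second."""
    return throughput_mbps * 1e6 * distance_km * 1e3 / 1e15

print(round(petabit_meters_per_sec(925, 10037), 2))   # single flow: ~9.28
print(round(petabit_meters_per_sec(8609, 3948), 1))   # 10 flows:   ~34.0
```

Both values reproduce the 9.28 and 34.0 Pbm/s highlights quoted for the SC2002 runs.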
FAST aggregate throughput
FAST, standard MTU; utilization averaged over > 1 hr
Runs with 1, 2, 7, 9, and 10 flows (durations 1 hr, 6 hr, 1.1 hr, 6 hr)
Average utilization: 95%, 92%, 90%, 88%
C. Jin, D. Wei, S. Low
FAST vs. Linux TCP (1G, 2G capacity)
FAST, standard MTU; utilization averaged over 1 hr; txq = 100 and txq = 10000
Average utilization, first run: Linux TCP 19%, Linux TCP 27%, FAST 92%
Average utilization, second run: Linux TCP –, Linux TCP 16%, FAST 48%
C. Jin (Caltech)
Trial deployment
FAST kernel installed:
- SLAC: Les Cottrell, et al. — www-iepm.slac.stanford.edu/monitoring/bulk/fast
- FermiLab: Michael Ernst, et al.
Coming soon:
- 10-Gbps NIC testing (Sunnyvale – CERN)
- Internet2
- ...
Detailed information:
- Home page:
- Theory:
- Implementation & testing:
- Publications:
Acknowledgments — FAST (netlab.caltech.edu/FAST)
Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
Prototype: C. Jin, D. Wei
Experiment/facilities:
- Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
- CERN: O. Martin, P. Moroni
- Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
- DataTAG: E. Martelli, J. P. Martin-Flatin
- Internet2: G. Almes, S. Corbato
- Level(3): P. Fernes, R. Struble
- SCinet: G. Goddard, J. Patton
- SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
- StarLight: T. deFanti, L. Winkler
- TeraGrid: L. Winkler
Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Thanks Questions?