
SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002

Acknowledgments
- Prototype: C. Jin, D. Wei
- Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
- Experiment/facilities:
  - Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  - CERN: O. Martin, P. Moroni
  - Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  - DataTAG: E. Martelli, J. P. Martin-Flatin
  - Internet2: G. Almes, S. Corbato
  - Level(3): P. Fernes, R. Struble
  - SCinet: G. Goddard, J. Patton
  - SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  - StarLight: T. deFanti, L. Winkler
- Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF

FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST

Internet: distributed feedback control system
- TCP: adapts sending rate to congestion
- AQM: feeds back congestion information
[Block diagram: TCP and AQM in closed loop through routing matrices R_f(s) and R_b'(s); signals x, y (rates) and p, q (congestion measures)]

Theory. Experiment: Calren2/Abilene, Chicago, Amsterdam (SURFNet), CERN Geneva, StarLight, WAN in Lab, Caltech research & production networks; multi-Gbps, ms-scale delay. Implementation: 155 Mb/s to 10 Gb/s; slow start, equilibrium, FAST recovery, FAST retransmit, timeout.

People
- Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
- Staff/postdocs: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
- Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
- Industry: Doraiswami (Cisco), Yip (Cisco)
- Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco

FAST project
- Protocols for ultrascale networks: >100 Gbps throughput, ms-scale delay; theory, algorithms, design, implementation, demo, deployment
- Faculty: Doyle (CDS, EE, BE): complex systems theory; Low (CS, EE): PI, networking; Newman (Physics): application, deployment; Paganini (EE, UCLA): control theory
- Research staff: 3 postdocs, 3 engineers, 8 students
- Collaboration: Cisco, Internet2/Abilene, CERN, DataTAG (EU), …
- Funding: NSF, DoE, Lee Center (AFOSR, ARO, Cisco)

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP
- Experimental results

High Energy Physics
Large global collaborations:
- 2000 physicists from 150 institutions in >30 countries; physicists in the US from >30 universities & labs
- SLAC holds 500 TB of data as of 4/2002, the world's largest database
Typical file transfer ~1 TB:
- At 622 Mbps: ~4 hrs
- At 2.5 Gbps: ~1 hr
- At 10 Gbps: ~15 min
- Gigantic elephants!
LHC (Large Hadron Collider) at CERN, to open in 2007:
- Generates data at PB (10^15 B)/sec
- Filtered in real time by a factor of 10^6 to 10^7
- Data stored at CERN at 100 MB/sec; many PB of data per year
- Rising to exabytes (10^18 B) within a decade
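The transfer times above are plain rate arithmetic; a few lines of Python (illustrative only, not from the slides) reproduce them:

```python
# Transfer time for a ~1 TB file at the quoted line rates.
file_bits = 1e12 * 8  # 1 TB expressed in bits

for rate_bps, label in [(622e6, "622 Mbps"), (2.5e9, "2.5 Gbps"), (10e9, "10 Gbps")]:
    seconds = file_bits / rate_bps
    print(f"{label}: {seconds / 3600:.2f} hours")
# 622 Mbps: 3.57 hours (~4 hrs)
# 2.5 Gbps: 0.89 hours (~1 hr)
# 10 Gbps:  0.22 hours (~15 min)
```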

HEP high-speed network
[Map of the current HEP network] … that must change

HEP Network (DataTAG)
[Map: SURFnet (NL), Geneva, SuperJANET4 (UK), Abilene, ESnet, CALREN, GARR-B (IT), GEANT, New York, Renater (FR), STAR-TAP, StarLight]
- Wavelength triangle, 2.5 Gbps (2002)
- 10 Gbps triangle in 2003
Newman (Caltech)

Network upgrade, by year: … ’04: 5 Gbps, ’05: 10 Gbps

Projected performance
ns-2: capacity = 155 Mbps, 622 Mbps, 2.5 Gbps, 5 Gbps, 10 Gbps; 100 sources, 100 ms round-trip propagation delay
[Plot: projected utilization by year, ’01 through ’05]
J. Wang (Caltech)

Projected performance
ns-2: capacity = 10 Gbps; 100 sources, 100 ms round-trip propagation delay
[Plot: FAST vs. TCP/RED]
J. Wang (Caltech)

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP
- Experimental results

Protocol decomposition (from J. Doyle)
[Diagram: files over HTTP, HTTP over TCP, TCP over IP, IP over routers; packets]

Protocol Decomposition

Layer          Examples                      Theory
Applications   WWW, …, Napster, FTP, …       HOT (Doyle et al): minimize user response time; heavy-tailed file sizes
TCP/AQM                                      Duality model: maximize aggregate utility
IP                                           Shortest-path routing: minimize path costs
Transmission   Ethernet, ATM, POS, WDM, …    Power control: maximize channel capacity

Congestion Control
- Heavy-tailed file sizes: mice and elephants
[Diagram: mice and elephants sharing the Internet; CDN]
- Goals: efficient & fair sharing; small delay (queueing + propagation)

Congestion control
[Diagram: sources sending at rates x_i(t) into the network]

Congestion control
[Diagram: source rates x_i(t); link congestion measures p_l(t)]
Example congestion measures p_l(t):
- Loss (Reno)
- Queueing delay (Vegas)

TCP/AQM
- Congestion control is a distributed asynchronous algorithm to share bandwidth
- It has two components: TCP adapts the sending rate (window) to congestion; AQM adjusts & feeds back the congestion information
- Together they form a distributed feedback control system; equilibrium & stability depend on both TCP and AQM, and on delay, capacity, routing, and the number of connections
[Diagram: sources x_i(t), links p_l(t)]
Examples. TCP: Reno, Vegas. AQM: DropTail, RED, REM/PI, AVQ.

Network model
[Block diagram: TCP sources F_1…F_N and AQM links G_1…G_L in closed loop through the forward routing matrix R_f(s) and backward routing matrix R_b'(s); x: source rates, y: aggregate link rates, p: link prices, q: end-to-end prices]

Vegas model
F_i (source), once per RTT:
  if W/RTT_min - W/RTT < α then W ← W + 1
  if W/RTT_min - W/RTT > α then W ← W - 1
G_l (link): queue-size dynamics [equations on slide]; p_l is the link queueing delay, and each source sees the end-to-end queueing delay along its path.
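A minimal Python sketch of this per-RTT rule, assuming the lost threshold symbol is Vegas's α (note: real Vegas compares a packet-denominated backlog against two thresholds α and β; the slide's single rate threshold is kept here):

```python
def vegas_update(w: float, rtt: float, base_rtt: float, alpha: float) -> float:
    """One per-RTT Vegas-style window update, following the slide's rule.

    w        -- congestion window (packets)
    rtt      -- current round-trip time (s)
    base_rtt -- minimum observed RTT, the propagation-delay estimate (s)
    alpha    -- target rate difference (pkts/s), per the slide
    """
    diff = w / base_rtt - w / rtt  # expected rate minus actual rate
    if diff < alpha:
        return w + 1  # little backlog in the network: grow the window
    if diff > alpha:
        return w - 1  # too much backlog: shrink the window
    return w
```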

Vegas model
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]

Methodology
Protocol (Reno, Vegas, RED, REM/PI, …)
- Equilibrium: performance (throughput, loss, delay); fairness (utility)
- Dynamics: local stability; cost of stabilization

Model
[Diagram: sources x_1, x_2, x_3 routed over links of capacities c_1, c_2]
- Network: links l with capacities c_l
- Sources s: L(s) = set of links used by source s; U_s(x_s) = utility when source s sends at rate x_s
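The equations on this and the following slides were figures; the setup they describe is the standard utility-maximization problem of the duality model (a reconstruction, not the slide's own rendering):

```latex
% Primal problem: maximize aggregate utility subject to link capacities
\max_{x_s \ge 0} \; \sum_s U_s(x_s)
\quad \text{subject to} \quad
\sum_{s \,:\, l \in L(s)} x_s \;\le\; c_l \qquad \text{for every link } l
```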

Duality Model of TCP
Primal-dual algorithm [equations on slide]: individual optimization by each source, given its path price.

Duality Model of TCP
Primal-dual algorithm: Reno, Vegas (sources) with DropTail, RED, REM (links).
- The source algorithm iterates on rates; the link algorithm iterates on prices
- Different protocols correspond to different utility functions

Duality Model of TCP
Primal-dual algorithm: Reno, Vegas with DropTail, RED, REM.
(x*, p*) is primal-dual optimal if and only if [conditions on slide].

Duality Model of TCP
Primal-dual algorithm: Reno, Vegas with DropTail, RED, REM.
Any link algorithm that stabilizes the queue generates Lagrange multipliers and solves the dual problem.
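A plausible reconstruction of the optimality conditions these slides refer to, following the standard duality model (source optimality plus complementary slackness):

```latex
% (x*, p*) is primal-dual optimal if and only if
U_s'(x_s^*) \;=\; q_s^* \;=\; \sum_{l \in L(s)} p_l^*
\qquad \text{(each source is individually optimal given its path price)}
\\[4pt]
p_l^* \ge 0, \qquad
p_l^* \Big( \sum_{s \,:\, l \in L(s)} x_s^* - c_l \Big) \;=\; 0
\qquad \text{(a link carries a positive price only when it is saturated)}
```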

Gradient algorithm
Theorem (ToN ’99): converges to the optimal rates in an asynchronous environment.
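The algorithm itself did not survive as text; in the ToN ’99 paper (Low & Lapsley, "Optimization Flow Control") it is essentially the projected gradient update on the dual variables:

```latex
% Link l adjusts its price in proportion to excess demand, step size \gamma > 0:
p_l(t+1) \;=\; \big[\, p_l(t) + \gamma\,( y_l(t) - c_l )\,\big]^{+}
\\[4pt]
% and each source chooses the rate that is optimal at its current path price:
x_s(t) \;=\; {U_s'}^{-1}\big( q_s(t) \big), \qquad q_s(t) = \sum_{l \in L(s)} p_l(t)
```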

Summary: duality model
- Flow control problem: maximize aggregate utility; TCP/AQM is a primal-dual algorithm solving it
- Different utility functions: Reno, Vegas (sources); DropTail, RED, REM (links)
- Theorem (Low ’00): (x*, p*) is primal-dual optimal iff the optimality conditions above hold

Equilibrium of Vegas
Network:
- Link queueing delays: p_l
- Queue length: c_l p_l
Sources:
- Throughput: x_i
- E2E queueing delay: q_i
- Packets buffered: [expression on slide]
- Utility function: U_i(x) = α_i d_i log x → proportional fairness
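The buffered-packets expression above was a figure; the standard Vegas equilibrium (as in the Low-Peterson-Wang analysis) fills it in:

```latex
% Equilibrium rate and backlog of Vegas source i:
x_i \;=\; \frac{\alpha_i d_i}{q_i}
\qquad\Longrightarrow\qquad
x_i\, q_i \;=\; \alpha_i d_i \;\;\text{packets buffered in the network}
```

which is exactly the equilibrium of maximizing U_i(x) = α_i d_i log x.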

Validation (L. Wang, Princeton)

Source rates (pkts/ms), measured (model):
#srcs  src1        src2          src3          src4          src5
1      … (6)
2      … (2)       3.92 (4)
3      … (0.94)    1.46 (1.49)   3.54 (3.57)
4      … (0.50)    0.72 (0.73)   1.34 (1.35)   3.38 (3.39)
5      … (0.29)    0.40 (0.40)   0.68 (0.67)   1.30 (1.30)   3.28 (3.34)

#srcs  queue (pkts)   baseRTT (ms)
1      … (20)         … (10.18)
2      … (60)         13.36 (13.51)
3      … (127)        20.17 (20.28)
4      … (238)        31.50 (31.50)
5      … (416)        49.86 (49.80)

Persistent congestion
- Vegas exploits the buffer process to compute prices (queueing delays)
- Persistent congestion can arise from: coupling of buffer & price; error in propagation delay estimation
- Consequences: excessive backlog; unfairness to older sources

Theorem (Low, Peterson, Wang ’02): a relative error of ε_i in propagation delay estimation distorts the utility function to [expression on slide].

Validation (L. Wang, Princeton)
- Single link, capacity = 6 pkts/ms, α_s = 2 pkts/ms, d_s = 10 ms
- With finite buffer: Vegas reverts to Reno
[Plots: without estimation error vs. with estimation error]

Validation (L. Wang, Princeton)

Source rates (pkts/ms), measured (model):
#srcs  src1        src2          src3          src4          src5
1      … (6)
2      … (2)       3.92 (4)
3      … (0.94)    1.46 (1.49)   3.54 (3.57)
4      … (0.50)    0.72 (0.73)   1.34 (1.35)   3.38 (3.39)
5      … (0.29)    0.40 (0.40)   0.68 (0.67)   1.30 (1.30)   3.28 (3.34)

#srcs  queue (pkts)   baseRTT (ms)
1      … (20)         … (10.18)
2      … (60)         13.36 (13.51)
3      … (127)        20.17 (20.28)
4      … (238)        31.50 (31.50)
5      … (416)        49.86 (49.80)

Methodology
Protocol (Reno, Vegas, RED, REM/PI, …)
- Equilibrium: performance (throughput, loss, delay); fairness (utility)
- Dynamics: local stability; cost of stabilization

TCP/RED stability
- Small effect on queue: AIMD, mice traffic, heterogeneity
- Big effect on queue: stability!

Stable: 20 ms delay
[Window traces; ns-2 simulations, 50 identical FTP sources, single link at 9 pkts/ms, RED marking]

Stable: 20 ms delay
[Window and queue traces; ns-2 simulations, 50 identical FTP sources, single link at 9 pkts/ms, RED marking]

Unstable: 200 ms delay
[Window traces; ns-2 simulations, 50 identical FTP sources, single link at 9 pkts/ms, RED marking]

Unstable: 200 ms delay
[Window and queue traces; ns-2 simulations, 50 identical FTP sources, single link at 9 pkts/ms, RED marking]

Other effects on queue
[Queue traces at 20 ms and 200 ms delay, with 30% noise traffic; avg delay 16 ms vs. 208 ms]

Stability condition
Theorem: TCP/RED is stable if [condition on slide]
- TCP side: small …, small c, large N
- RED side: small …, large delay

Stability: Reno/RED
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]
Theorem (Low et al, Infocom ’02): Reno/RED is stable if [condition on slide]
- TCP side: small …, small c, large N
- RED side: small …, large delay

Stability: scalable control
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]
Theorem (Paganini, Doyle, Low, CDC ’01): provided R is full rank, the feedback loop is locally stable for arbitrary delay, capacity, load and topology.

Stability: Vegas
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]
Theorem (Choe & Low, Infocom ’03): provided R is full rank, the feedback loop is locally stable if [condition on slide].

Stability: Stabilized Vegas
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]
Theorem (Choe & Low, Infocom ’03): provided R is full rank, the feedback loop is locally stable if [condition on slide].

Stability: Stabilized Vegas
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p]
Application:
- Stabilized TCP with current routers
- Queueing delay as the congestion measure has the right scaling
- Incremental deployment with ECN

Approximate model

Stabilized Vegas

Linearized model
[Block diagram: F_1…F_N, G_1…G_L, R_f(s), R_b'(s); x, y, q, p; additional phase lead]

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP
- Experimental results

Protocol Decomposition

Layer          Examples                      Theory
Applications   WWW, …, Napster, FTP, …       HOT (Doyle et al): minimize user response time; heavy-tailed file sizes
TCP/AQM                                      Duality model: maximize aggregate utility
IP                                           Shortest-path routing: minimize path costs
Transmission   Ethernet, ATM, POS, WDM, …    Power control: maximize channel capacity

Network model
[Block diagram: TCP sources F_1…F_N (Reno, Vegas), AQM links G_1…G_L (DropTail, RED, …), routing matrices R and R^T; x, y, q, p; IP routing]

Duality model of TCP/AQM
- Flow control problem: maximize aggregate utility; TCP/AQM as primal-dual algorithm
- Different utility functions: Reno, Vegas (sources); DropTail, RED, REM/PI, AVQ (links)
- Theorem (Low ’00): (x*, p*) primal-dual optimal iff the optimality conditions above hold

Motivation

Motivation
Can TCP/IP maximize utility? Shortest-path routing!

TCP-AQM/IP
Theorem (Wang et al, Infocom ’03): the primal problem is NP-hard.
Proof: reduction from integer partition. Given integers {c_1, …, c_n}, find a set A such that Σ_{i∈A} c_i = Σ_{i∉A} c_i.

TCP-AQM/IP
Theorem (Wang et al, Infocom ’03): the primal problem is NP-hard.
Open questions: achievable utility of TCP/IP? Stability? Duality gap?
Conclusion: inevitable tradeoff between achievable utility and routing stability.

Ring network
[Diagram: ring of nodes, single destination; routing r]
- Single destination; instant convergence of TCP/IP; shortest-path routing
- Link cost = (price weight)·p_l(t) + (static weight)·d_l
- Alternating iteration: given routes r(t), TCP/AQM produces prices p_l(t); IP then recomputes routes: r(0) → p_l(0) → r(1) → p_l(1) → …

Ring network
[Diagram: ring, single destination; routing r]
Iteration: r(0) → p_l(0) → r(1) → p_l(1) → …
Questions: does routing converge, r(t) → r*? Does utility converge, V(t) → V*? (r*: optimal routing; V*: max utility)

Ring network
Link cost = (price weight)·p_l(t) + (static weight)·d_l
Theorem (Infocom 2003):
- "No" duality gap
- Unstable if the static weight is 0: starting from any r(0), the subsequent r(t) oscillates between 0 and 1

Ring network
Link cost = (price weight)·p_l(t) + (static weight)·d_l
Theorem (Infocom 2003): solves the primal problem asymptotically as [limit on slide].

Ring network
Link cost = (price weight)·p_l(t) + (static weight)·d_l
Theorem (Infocom 2003):
- Price weight large: globally unstable
- Price weight small: globally stable
- In between: stability depends on r(0)
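The flapping result lends itself to a toy illustration. The sketch below is entirely my own construction (two alternate paths, made-up constants, and my naming a for the price weight and b for the static weight); it is not the paper's model, but it shows why a pure price-driven cost oscillates while a static bias pins the route:

```python
# Toy: route selection under link cost a*p + b*d. With b = 0 the route flaps
# every step; with b large enough it sticks to the shorter static path.

def simulate(a: float, b: float, steps: int = 10):
    d = [1.0, 1.2]          # static delays of the two alternate paths (hypothetical)
    p = [0.0, 0.0]          # queueing prices
    r = 0                   # index of the path currently in use
    history = []
    for _ in range(steps):
        # TCP/AQM step: the used path carries the traffic and builds up a
        # queueing price; the idle path drains to zero.
        p[r], p[1 - r] = 0.5, 0.0
        # IP step: shortest path under cost a*p_l + b*d_l
        cost = [a * p[i] + b * d[i] for i in (0, 1)]
        r = cost.index(min(cost))
        history.append(r)
    return history

print(simulate(a=1.0, b=0.0))   # oscillates: [1, 0, 1, 0, ...]
print(simulate(a=1.0, b=5.0))   # sticks to the shorter static path: [0, 0, ...]
```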

General network
[Plot: achievable utility vs. routing stability; random graph with 20 nodes, 200 links]
Conclusion: inevitable tradeoff between achievable utility and routing stability.

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP; non-adaptive sources; content distribution
- Implementation
- WAN in Lab

Current protocols
- Equilibrium problems:
  - Unfairness to connections with large delay
  - At high bandwidth, the equilibrium loss probability is too small for loss to be a reliable congestion signal
- Stability problems:
  - Unstable as delay or capacity scales up; instability causes large slow-timescale oscillations
  - Long time to ramp up after packet losses
  - Jitter in rate and delay
  - Underutilization as the queue jumps between empty and high occupancy

Fast AQM Scalable TCP (FAST)
- Equilibrium properties:
  - Uses end-to-end delay and loss
  - Achieves any desired fairness, expressed by a utility function
  - Very high utilization (99% in theory)
- Stability properties:
  - Stability for arbitrary delay, capacity, routing & load
  - Robust to heterogeneity, evolution, …
  - Good performance: negligible queueing delay & loss (with ECN); fast response

Implementation
- Sender-side kernel modification
- Builds on Reno, NewReno, SACK, Vegas; new insights
- Difficulties due to effects ignored in theory and large window sizes
- First demonstration at the SuperComputing conference, Nov 2002
- Developers: Cheng Jin & David Wei, with the FAST team & partners

Implementation
- Packet-level effects: burstiness (pacing helps)
- Bottlenecks in the host: interrupt control, request buffering
- Coupling of flow and error control
- Rapid convergence
- TCP monitor/debugger
[State diagram: slow start, equilibrium, FAST recovery, FAST retransmit, timeout]

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP
- Experimental results
- WAN in Lab

FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST

Internet: distributed feedback control system
- TCP: adapts sending rate to congestion
- AQM: feeds back congestion information
[Block diagram: TCP and AQM in closed loop through routing matrices R_f(s) and R_b'(s); signals x, y (rates) and p, q (congestion measures)]

Theory. Experiment: Calren2/Abilene, Chicago, Amsterdam (SURFNet), CERN Geneva, StarLight, WAN in Lab, Caltech research & production networks; multi-Gbps, ms-scale delay. Implementation: 155 Mb/s to 10 Gb/s; slow start, equilibrium, FAST recovery, FAST retransmit, timeout.

People
- Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
- Staff/postdocs: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
- Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
- Industry: Doraiswami (Cisco), Yip (Cisco)
- Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco

Network (Sylvain Ravot, Caltech/CERN)
[Network map]

FAST BMPS
[Chart: FAST runs (Geneva-Sunnyvale, Baltimore-Sunnyvale) vs. the Internet2 Land Speed Record, by #flows]
FAST: standard MTU; throughput averaged over > 1 hr

FAST BMPS

flows | Bmps (peta) | Thruput (Mbps) | Distance (km) | Delay (ms) | MTU (B) | Duration (s) | Transfer (GB) | Path
  …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | Fairbanks, AK - Amsterdam, NL
  …   |      …      |       …        |     …,626     |     …      |    …    |      …       |       …       | MS, WA - ISI, VA
  1   |    9.28     |      925       |     10,…      |     …      |  1,500  |    3,600     |      387      | CERN - Sunnyvale
  …   |      …      |     1,797      |     10,…      |     …      |  1,500  |    3,600     |      753      | CERN - Sunnyvale
  …   |      …      |     6,123      |     3,948     |     85     |  1,500  |    21,600    |    15,396     | Baltimore - Sunnyvale
  …   |      …      |     7,940      |     3,948     |     85     |  1,500  |    4,030     |     3,725     | Baltimore - Sunnyvale
 10   |    34.0     |     8,609      |     3,948     |     85     |  1,500  |    21,600    |    21,647     | Baltimore - Sunnyvale

Mbps = 10^6 b/s; GB = 2^30 bytes

Aggregate throughput
[Charts: 1, 2, 7, 9 and 10 flows; average utilization 95%, 92%, 90%, 88%; run lengths 1 hr, 6 hr, 1.1 hr, 6 hr]
FAST: standard MTU; utilization averaged over > 1 hr

SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002

Experiment: Sunnyvale-Baltimore-Chicago-Geneva (3000 km / 1000 km / 7000 km); C. Jin, D. Wei, S. Low, FAST team and partners
Theory: Internet as a distributed feedback system [block diagram: TCP, AQM, R_f(s), R_b'(s); x, p]

FAST TCP highlights
- Standard MTU; peak window = 14,255 pkts; throughput averaged over > 1 hr
- 925 Mbps single flow/GE card: 9.28 petabit-meter/sec, 1.89 times the I2 LSR
- 8.6 Gbps with 10 flows: 34.0 petabit-meter/sec, 6.32 times the I2 LSR
- 21 TB in 6 hours with 10 flows
Implementation: sender-side modification; delay-based
[Chart: Geneva-Sunnyvale and Baltimore-Sunnyvale vs. #flows, FAST vs. I2 LSR]
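As a consistency check on the petabit-meter/sec metric (my arithmetic, borrowing the 10,025 km Geneva-Sunnyvale distance from the single-flow statistics slide later in the deck), it is just throughput times distance:

```latex
925\ \mathrm{Mb/s} \,\times\, 10{,}025\ \mathrm{km}
\;=\; 9.25\times10^{8}\ \mathrm{b/s} \,\times\, 1.0025\times10^{7}\ \mathrm{m}
\;\approx\; 9.27\times10^{15}\ \mathrm{b\cdot m/s}
```

which matches the quoted 9.28 petabit-meter/sec to within rounding of the path length.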

FAST vs. Linux TCP

Protocol                   | flows | Bmps (peta) | Thruput (Mbps) | Distance (km) | Delay (ms) | MTU (B) | Duration (s) | Transfer (GB) | Path
Linux TCP (txqueuelen=…)   |   …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | CERN - Sunnyvale
Linux TCP (txqueuelen=…)   |   …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | CERN - Sunnyvale
FAST                       |   …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | CERN - Sunnyvale
Linux TCP (txqueuelen=…)   |   …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | CERN - Sunnyvale
Linux TCP (txqueuelen=…)   |   …   |      …      |       …        |       …       |     …      |    …    |      …       |       …       | CERN - Sunnyvale
FAST                       |   …   |      …      |     1,797      |     10,…      |     …      |    …    |      …       |       …       | CERN - Sunnyvale

Mbps = 10^6 b/s; GB = 2^30 bytes; delay = propagation delay. Linux TCP experiments: Jan 28-29, 2003.

Aggregate throughput
[Charts: Linux TCP (txq=100), Linux TCP (txq=10000) and FAST on 1 Gbps and 2 Gbps paths; average utilization 19%, 27%, 92% on one path and …, 16%, 48% on the other]
FAST: standard MTU; utilization averaged over 1 hr

Effect of MTU (Sylvain Ravot, Caltech/CERN)
[Plot: Linux TCP throughput vs. MTU]

Caltech-SLAC entry
Rapid recovery after a possible hardware glitch
[Throughput trace (Mbps) with ACK traffic; annotations: power glitch, reboot]

SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002

Acknowledgments
- Prototype: C. Jin, D. Wei
- Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
- Experiment/facilities:
  - Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  - CERN: O. Martin, P. Moroni
  - Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  - DataTAG: E. Martelli, J. P. Martin-Flatin
  - Internet2: G. Almes, S. Corbato
  - Level(3): P. Fernes, R. Struble
  - SCinet: G. Goddard, J. Patton
  - SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  - StarLight: T. deFanti, L. Winkler
- Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF

FAST URLs
- FAST website: netlab.caltech.edu/FAST
- Cottrell's SLAC website: …/monitoring/bulk/fast

Outline
- Motivation
- Theory: TCP/AQM; TCP/IP; non-adaptive sources; content distribution
- Implementation
- WAN in Lab

[Diagram: WAN in Lab testbed; S = server, R = router, H (host); fiber spools in 500 km segments, EDFA amplifiers, OPM, electronic crossconnect (Cisco 15454)]
Max path length = 10,000 km; max one-way delay = 50 ms

Unique capabilities
- WAN in Lab: capacity 2.5-10 Gbps; delay 0-100 ms round trip
- Configurable & evolvable (topology, rate, delays, routing); always at the cutting edge
- Risky research: MPLS, AQM, routing, …
- Integral part of R&A networks: transition from theory to implementation, demonstration and deployment; from lab to marketplace
- Global resource
[(a) Physical network: R1, R2, R…]

Unique capabilities
- WAN in Lab: capacity 2.5-10 Gbps; delay 0-100 ms round trip
- Configurable & evolvable (topology, rate, delays, routing); always at the cutting edge
- Risky research: MPLS, AQM, routing, …
- Integral part of R&A networks: transition from theory to implementation, demonstration and deployment; from lab to marketplace
- Global resource
[(b) Logical network: R1, R2, R3, … R10]

Unique capabilities
- WAN in Lab: capacity 2.5-10 Gbps; delay 0-100 ms round trip
- Configurable & evolvable (topology, rate, delays, routing); always at the cutting edge
- Risky research: MPLS, AQM, routing, …
- Integral part of R&A networks: transition from theory to implementation, demonstration and deployment; from lab to marketplace
- Global resource
[Map: Calren2/Abilene, Chicago, Amsterdam (SURFNet), CERN Geneva, StarLight, WAN in Lab, Caltech research & production networks; multi-Gbps, ms-scale delay experiments]

Coming together …
- Clear & present need
- Resources

Coming together …
- Clear & present need
- Resources

Coming together …
- Clear & present need
- Resources
- FAST protocols

FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST

Internet: distributed feedback control system
- TCP: adapts sending rate to congestion
- AQM: feeds back congestion information
[Block diagram: TCP and AQM in closed loop through routing matrices R_f(s) and R_b'(s); signals x, y (rates) and p, q (congestion measures)]

Theory. Experiment: Calren2/Abilene, Chicago, Amsterdam (SURFNet), CERN Geneva, StarLight, WAN in Lab, Caltech research & production networks; multi-Gbps, ms-scale delay. Implementation: 155 Mb/s to 10 Gb/s; slow start, equilibrium, FAST recovery, FAST retransmit, timeout.

People
- Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
- Staff/postdocs: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
- Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
- Industry: Doraiswami (Cisco), Yip (Cisco)
- Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco

Backup slides

TCP Congestion States
[State diagram: Established → (ACK for SYN/ACK) → Slow Start → (cwnd > ssthresh) → High Throughput; open questions: pacing? gamma?]

From Slow Start to High Throughput
- The Linux TCP handshake differs from the TCP specification
- Is 64 KB too small for ssthresh? 1 Gbps × 100 ms = 12.5 MB!
- What about pacing?
- Gamma parameter in Vegas
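The 12.5 MB figure is the bandwidth-delay product; spelled out (the packet count assumes a 1500 B MTU, my addition):

```latex
\mathrm{BDP} \;=\; 10^{9}\ \mathrm{b/s} \times 0.1\ \mathrm{s}
\;=\; 10^{8}\ \mathrm{b} \;=\; 12.5\ \mathrm{MB}
\;\approx\; \frac{12.5\times10^{6}\ \mathrm{B}}{1500\ \mathrm{B/pkt}} \;\approx\; 8300\ \mathrm{pkts}
```

so a 64 KB ssthresh is roughly 200x too small for a 1 Gbps, 100 ms path.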

TCP Congestion States
[State diagram adds FAST's Retransmit and Time-out*: 3 dup ACKs → FAST's Retransmit; retransmission timer fired → Time-out]

High Throughput
- Update cwnd as follows: cwnd += 1 if (pkts in queue) < α + k·q′, cwnd -= 1 otherwise
- Packet reordering may be frequent
- Disabling delayed ACKs can generate many dup ACKs
- Is THREE the right dup-ACK threshold at Gbps speeds?
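A sketch of this update loop in Python (my own rendering: I read q′ as a queue-derivative term, consistent with the "additional phase lead" on the linearized-model slide, and all names and constants here are illustrative, not the implementation's):

```python
def estimate_queued_pkts(cwnd: float, rtt: float, base_rtt: float) -> float:
    """Packets this flow keeps queued in the network: cwnd * (rtt - base_rtt) / rtt."""
    return cwnd * (rtt - base_rtt) / rtt

def high_throughput_update(cwnd, rtt, base_rtt, prev_queued, alpha=200.0, k=0.5):
    """One per-RTT window update following the slide's rule:
    +1 if (pkts in queue) < alpha + k*q', -1 otherwise.
    q' is taken as the change in queued packets since the last RTT,
    an assumption; the slide does not define it.
    """
    queued = estimate_queued_pkts(cwnd, rtt, base_rtt)
    q_prime = queued - prev_queued      # crude one-RTT derivative
    if queued < alpha + k * q_prime:
        cwnd += 1                       # below target backlog: grow
    else:
        cwnd -= 1                       # above target backlog: back off
    return cwnd, queued
```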

TCP Congestion States
[State diagram adds FAST's Recovery: on 3 dup ACKs, retransmit the packet, record snd_nxt, reduce cwnd/ssthresh; during recovery, send a packet if in_flight < cwnd; exit to High Throughput when snd_una > the recorded snd_nxt]

When Loss Happens
- Reduce cwnd/ssthresh only when the loss is due to congestion
- Maintain in_flight and send data when in_flight < cwnd
- Stay in FAST's Recovery until snd_una >= the recorded snd_nxt
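A minimal sketch of the recovery bookkeeping described here (the variable names mirror the Linux ones the slides use; the structure, and the halving on a congestion loss, are my assumptions):

```python
class FastRecovery:
    """Sketch of the loss-recovery logic on these slides (my reading)."""

    def __init__(self, cwnd: int, ssthresh: int):
        self.cwnd, self.ssthresh = cwnd, ssthresh
        self.recovery_point = None       # snd_nxt recorded at loss time

    def on_triple_dup_ack(self, snd_nxt: int, congestion_loss: bool):
        # Retransmission of the lost packet itself is not modeled here.
        if self.recovery_point is None:
            self.recovery_point = snd_nxt
            if congestion_loss:          # only cut for congestion losses
                self.cwnd //= 2          # halving is an assumption
                self.ssthresh = self.cwnd

    def can_send(self, in_flight: int) -> bool:
        return in_flight < self.cwnd     # gate new data during recovery

    def on_ack(self, snd_una: int):
        if self.recovery_point is not None and snd_una > self.recovery_point:
            self.recovery_point = None   # recovery complete
```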

TCP Congestion States
[State diagram: Established, Slow Start, High Throughput, FAST's Recovery, FAST's Retransmit, Time-out*; 3 dup ACKs → retransmit packet, record snd_nxt, reduce cwnd/ssthresh; retransmission timer fired → Time-out]

When Time-out Happens
- Very bad for throughput
- Mark all unacknowledged pkts as lost and do slow start
- Dup ACKs cause false retransmits since the receiver's state is unknown
- Floyd has a "fix" (RFC 2582)

TCP Congestion States
[Complete state diagram: Established → (ACK for SYN/ACK) → Slow Start → (cwnd > ssthresh) → High Throughput; 3 dup ACKs → FAST's Retransmit (retransmit packet, record snd_nxt, reduce cwnd/ssthresh) → FAST's Recovery → (snd_una > recorded snd_nxt) → High Throughput; retransmission timer fired → Time-out*]

Individual Packet States
[State diagram: Birth, Sending, In Flight, Received, Queued, Dropped, Buffered, Freed, Delivered; transition labels: queueing; out of order; queue and no memory; ack'd]

SCinet Bandwidth Challenge
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002

Experiment: Sunnyvale-Baltimore-Chicago-Geneva (3000 km / 1000 km / 7000 km); C. Jin, D. Wei, S. Low, FAST team and partners
Theory: Internet as a distributed feedback system [block diagram: TCP, AQM, R_f(s), R_b'(s); x, p]
[Chart: Baltimore-Geneva, Baltimore-Sunnyvale, Sunnyvale-Geneva; IPv…, single flow, multiple flows, SC flows; vs. I2 LSR]

FAST TCP highlights
- Standard MTU; peak window = 14,100 pkts
- 940 Mbps single flow/GE card: 9.4 petabit-meter/sec, 1.9 times the I2 LSR
- 9.4 Gbps with 10 flows: 37.0 petabit-meter/sec, 6.9 times the I2 LSR
- 16 TB in 6 hours with 7 flows
Implementation: sender-side modification; delay-based; stabilized Vegas

FAST BMPS
[Table: Bmps, throughput and duration of FAST runs (Baltimore-Sunnyvale and Sunnyvale-Geneva; IPv…, single flow, multiple flows, SC flows) vs. the I2 LSR; durations 19 min, 82 sec, 13 sec, 60 min]

FAST: 7 flows
Network: SC2002 (Baltimore) to SLAC (Sunnyvale); GE, standard MTU; 17-18 Nov 2002; cwnd = 6,658 pkts per flow
Statistics: data … TB; distance 3,936 km; delay 85 ms
Average: duration 60 min; throughput 6.35 Gbps; Bmps … petab-m/s
Peak: duration 3.0 min; throughput 6.58 Gbps; Bmps … petab-m/s
[Throughput trace]

FAST: single flow
Network: CERN (Geneva) to SLAC (Sunnyvale); GE, standard MTU; 17 Nov 2002; cwnd = 14,100 pkts
Statistics: data 273 GB; distance 10,025 km; delay 180 ms
Average: duration 43 min; throughput 847 Mbps; Bmps 8.49 petab-m/s
Peak: duration 19.2 min; throughput 940 Mbps; Bmps 9.42 petab-m/s
[Throughput trace]

SCinet Bandwidth Challenge
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002

Acknowledgments
- Prototype: C. Jin, D. Wei
- Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
- Experiment/facilities:
  - Caltech: J. Bunn, S. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  - CERN: O. Martin, P. Moroni
  - Cisco: B. Aiken, V. Doraiswami, M. Turzanski, D. Walsten, S. Yip
  - DataTAG: E. Martelli, J. P. Martin-Flatin
  - Internet2: G. Almes, S. Corbato
  - SCinet: G. Goddard, J. Patton
  - SLAC: G. Buhrmaster, L. Cottrell, C. Logg, W. Matthews, R. Mount, J. Navratil
  - StarLight: T. deFanti, L. Winkler
- Major sponsors/partners: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, Level3, NSF