FAST TCP Steven Low CS/EE netlab.CALTECH.edu Oct 2003
Congestion Control & Routing Steven Low netlab.CALTECH.edu Nov 2002
Can TCP/IP Maximize Utility Jiantao Wang Lun Li Steven Low John Doyle netlab.CALTECH.edu Nov 2002
FAST Protocols for Ultrascale Networks (netlab.caltech.edu/FAST)
  Theory
    Internet: distributed feedback control system
    TCP: adapts sending rate to congestion
    AQM: feeds back congestion information
    [Figure: TCP/AQM feedback loop with routing R_f(s), R_b(s) and signals x, y, p, q]
  Experiment
    Caltech research & production networks; WAN in Lab
    Calren2/Abilene, Chicago, Amsterdam, CERN Geneva, SURFNet, StarLight
    Multi-Gbps ms delay
  Implementation
    [Figure: TCP phases (slow start, equilibrium, FAST recovery, FAST retransmit, time out); labels 155Mb/s, 10Gb/s]
  People
    Faculty: Doyle (CDS,EE,BE), Low (CS,EE), Newman (Physics), Paganini (UCLA)
    Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
    Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
    Industry: Doraiswami (Cisco), Yip (Cisco)
    Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
Outline
  Motivation
  Network model
  FAST TCP
  Equilibrium
  Stability
  Experiments
  TCP/IP
  [Figure: protocol stack. Applications (WWW, Napster, FTP, …); TCP/AQM; IP; Transmission (Ethernet, ATM, POS, WDM, …)]
High Energy Physics
  Large global collaborations
    2000 physicists from 150 institutions in >30 countries
    physicists in US from >30 universities & labs
  SLAC has 500 TB of data by 4/2002, world's largest database
  Typical file transfer ~1 TB
    At 622 Mbps: ~4 hrs
    At 2.5 Gbps: ~1 hr
    At 10 Gbps: ~15 min
  Gigantic elephants!
  LHC (Large Hadron Collider) at CERN, to open 2007
    Generates data at PB (10^15 B)/sec
    Filtered in real time by a factor of 10^6 to 10^7
    Data stored at CERN at 100 MB/sec
    Many PB of data per year
    To rise to exabytes (10^18 B) in a decade
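
The transfer times quoted for a 1 TB file follow from dividing the file size in bits by the line rate. A minimal sketch of that arithmetic, assuming a decimal terabyte (10^12 bytes), decimal line rates, and no protocol overhead; the function name is illustrative only:

```python
def transfer_time_s(size_bytes, rate_bps):
    """Ideal transfer time in seconds at full line rate, ignoring all overhead."""
    return size_bytes * 8 / rate_bps


TB = 10**12  # assumption: decimal terabyte

for label, rate_bps in [("622 Mbps", 622e6), ("2.5 Gbps", 2.5e9), ("10 Gbps", 10e9)]:
    hours = transfer_time_s(TB, rate_bps) / 3600
    print(f"{label}: {hours:.2f} h")
```

Under these assumptions the ideal times come out slightly below the rounded figures on the slide (about 3.6 h, 0.9 h and 13 min), which is what one would expect once overhead is added back.
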
HEP high speed network … that must change
HEP Network (DataTAG)
  [Figure: network map linking Geneva, New York, STAR-TAP and STARLIGHT with NL SURFnet, UK SuperJANET4, It GARR-B, Fr Renater, GEANT, ABILENE, ESNET and CALREN; 2.5 Gbps wavelength triangle; Gbps triangle in 2003]
  Newman (Caltech)
Performance at large windows
  ns-2 simulation: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows. J. Wang (Caltech, June 02)
  DataTAG Network: CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale); capacity = 1Gbps; 180 ms round trip latency; 1 flow. C. Jin, D. Wei, S. Ravot, etc (Caltech, Nov 02)
  [Figure: average utilization at 1G for Linux TCP (two txq settings, one of them txq=100) and FAST; recoverable labels 19%, 27%]
Network upgrade
Projected performance
  ns-2: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps
  100 sources, 100 ms round trip propagation delay
  J. Wang (Caltech)
Congestion Control
  ~W packets per RTT
  Lost packet detected by missing ACK
  Congestion signal: delay and loss
  [Figure: timeline between Source and Destination over one RTT; data packets 1, 2, …, W and their ACKs]
Congestion control
  [Figure: sources with rates x_i(t), links with congestion measures p_l(t)]
  Example congestion measure p_l(t)
    Loss (Reno)
    Queueing delay (Vegas)
TCP/AQM
  Congestion control is a distributed asynchronous algorithm to share bandwidth
  It has two components
    TCP: adapts sending rate (window) to congestion
    AQM: adjusts & feeds back congestion information
  They form a distributed feedback control system
    Equilibrium & stability depend on both TCP and AQM
    And on delay, capacity, routing, #connections
  [Figure: source rates x_i(t) and link congestion measures p_l(t)]
  TCP: Reno, Vegas
  AQM: DropTail, RED, REM/PI, AVQ
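Network model
  Network: links l of capacities c_l
  Sources s
    L(s) - links used by source s
    U_s(x_s) - utility if source rate = x_s
  [Figure: two links of capacities c_1, c_2 shared by sources with rates x_1, x_2, x_3]
Network model
  [Figure: feedback loop. TCP sources F_1, …, F_N produce source rates x; forward routing R_f(s) maps x to link rates y; AQM links G_1, …, G_L produce link prices p; backward routing R_b(s) maps p to end-to-end prices q]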
Vegas model
  for every RTT {
    if W/RTTmin – W/RTT < α then W++
    if W/RTTmin – W/RTT > α then W--
  }
  [Annotation on the algorithm: queue size]
  F_i: [equation not recovered]
  G_l: [equation not recovered]
  Link queueing delay; E2E queueing delay
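
The per-RTT rule above can be tried on a toy link. The sketch below is not the authors' code: it assumes the threshold lost from the slide is a single parameter α (in packets per ms), and it uses a hypothetical single-source, single-link model in which RTT equals propagation delay plus backlog divided by capacity.

```python
def vegas_update(w, rtt, rtt_min, alpha):
    """One per-RTT Vegas adjustment: compare expected and actual rate against alpha."""
    diff = w / rtt_min - w / rtt  # expected rate minus actual rate (pkts/ms)
    if diff < alpha:
        return w + 1
    if diff > alpha:
        return w - 1
    return w


def simulate_single_source(capacity, prop_delay, alpha, rounds=500):
    """Hypothetical model: one source, one link, RTT = prop_delay + backlog / capacity."""
    w = 1
    for _ in range(rounds):
        backlog = max(0.0, w - capacity * prop_delay)  # packets queued at the link
        rtt = prop_delay + backlog / capacity
        w = vegas_update(w, rtt, prop_delay, alpha)
    return w, max(0.0, w - capacity * prop_delay)


# Link of 6 pkts/ms, 10 ms propagation delay, alpha = 2 pkts/ms
window, backlog = simulate_single_source(capacity=6.0, prop_delay=10.0, alpha=2.0)
print(window, backlog)
```

In this toy model the window settles within one packet of (c + α)·d = 80 packets, leaving a backlog close to α·d = 20 packets in the queue.
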
Vegas model
  [Figure: TCP/AQM feedback loop (F_1 … F_N, G_1 … G_L, R_f(s), R_b(s); signals x, y, q, p)]
Methodology
  Protocol (Reno, Vegas, RED, REM/PI…)
  Equilibrium
    Performance: throughput, loss, delay
    Fairness
    Utility
  Dynamics
    Local stability
    Cost of stabilization
Summary: duality model
  Flow control problem (Kelly, Maulloo, Tan 98): [equation not recovered]
  TCP/AQM
    Maximize utility with different utility functions
    Primal-dual algorithm: sources Reno, Vegas; links DropTail, RED, REM
  Result (L 00): (x*,p*) primal-dual optimal iff [equation not recovered]
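
The equations on this slide did not survive extraction. For orientation, the block below writes out the standard utility-maximization problem and its dual in the notation of the network model slide (L(s), c_l, U_s, x_s, p_l, q_s). It is the textbook formulation from the duality literature, offered as an assumption about what the slide displayed rather than a transcription of it.

```latex
% Primal: maximize aggregate utility subject to link capacities
\max_{x \ge 0} \; \sum_s U_s(x_s)
\quad \text{subject to} \quad
y_l := \sum_{s \,:\, l \in L(s)} x_s \;\le\; c_l \quad \text{for all links } l

% Dual: minimize over link prices p, with end-to-end price q_s
\min_{p \ge 0} \; D(p) = \sum_s \max_{x_s \ge 0} \bigl( U_s(x_s) - x_s q_s \bigr) + \sum_l p_l c_l ,
\qquad q_s := \sum_{l \in L(s)} p_l

% Optimality (for strictly concave, increasing U_s and x_s^* > 0)
U_s'(x_s^*) = q_s^* , \qquad y_l^* \le c_l , \qquad p_l^* \,(y_l^* - c_l) = 0
```

The last line is the usual complementary slackness condition: a link carries a positive price only if it is fully utilized.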
Example utility functions
  [Equations not recovered]
Game interpretation
  Source s: [equation not recovered]
  Link l: [equation not recovered]
Synchronous convergence
  Theorem (L & Lapsley 99)
    Provided R has full row rank & U_s strictly concave:
    Gradient projection algorithm of dual problem
    converges to optimal primal-dual solutions if [condition not recovered]
  Limit point: unique Pareto optimal Nash equilibrium
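
A gradient projection iteration on the dual problem can be sketched in a few lines. The example below is an illustration under stated assumptions, not the algorithm as specified in the cited paper: it assumes weighted log utilities U_s(x) = w_s log x (so each source's best response is x_s = w_s / q_s), a fixed step size chosen by hand, a rate cap `x_max`, and a hypothetical three-source, two-link topology.

```python
def dual_gradient_projection(routes, capacity, weights, step=0.1, iters=2000, x_max=10.0):
    """Synchronous dual gradient projection; routes[s] lists the links used by source s."""
    prices = [1.0] * len(capacity)
    rates = [0.0] * len(routes)
    for _ in range(iters):
        # Source step: maximize w_s * log(x_s) - x_s * q_s, giving x_s = w_s / q_s
        rates = []
        for s, links in enumerate(routes):
            q = sum(prices[l] for l in links)  # end-to-end price seen by source s
            rates.append(min(x_max, weights[s] / q) if q > 0 else x_max)
        # Link step: move each price along the dual gradient (load - capacity), project onto p >= 0
        loads = [0.0] * len(capacity)
        for s, links in enumerate(routes):
            for l in links:
                loads[l] += rates[s]
        prices = [max(0.0, p + step * (y - c)) for p, y, c in zip(prices, loads, capacity)]
    return rates, prices


# Hypothetical topology: source 0 crosses both links, sources 1 and 2 use one link each
rates, prices = dual_gradient_projection(
    routes=[[0, 1], [0], [1]], capacity=[1.0, 1.0], weights=[1.0, 1.0, 1.0]
)
print(rates, prices)
```

For this topology the proportionally fair allocation is 1/3 for the two-link source and 2/3 for each one-link source, with both link prices equal to 1.5, and the iteration approaches those values.
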
Asynchronous convergence
  Sources and links update & compute at different times, with different frequencies, using delayed info
  Theorem (L & Lapsley 99): converges in asynchronous environment with smaller [symbol not recovered]
Persistent congestion
  Vegas exploits buffer process to compute prices (queueing delays)
  Persistent congestion due to
    Coupling of buffer & price
    Error in propagation delay estimation
  Consequences
    Excessive backlog
    Unfairness to older sources
  Theorem (Low, Peterson, Wang 02)
    A relative error of ε_i in propagation delay estimation distorts the utility function to [equation not recovered]
Equilibrium of Vegas
  Network
    Link queueing delays: p_l
    Queue length: c_l p_l
  Sources
    Throughput: x_i
    E2E queueing delay: q_i
    Packets buffered: x_i q_i = α_i d_i
    Utility function: U_i(x) = α_i d_i log x
    Proportional fairness
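
For a single shared link the Vegas equilibrium has a closed form. With utility U_i(x) = α_i d_i log x, the usual optimality condition U_i'(x_i) = q_i gives x_i q_i = α_i d_i; on one link every source sees the same queueing delay q, and the rates must sum to the capacity c, so q = Σ α_i d_i / c. The sketch below assumes exact propagation delay estimates and an unlimited buffer; the function name is illustrative.

```python
def vegas_single_link_equilibrium(capacity, alphas, prop_delays):
    """Equilibrium rates (pkts/ms), queueing delay (ms) and queue length (pkts) on one link."""
    backlog_targets = [a * d for a, d in zip(alphas, prop_delays)]  # alpha_i * d_i packets each
    queue_pkts = sum(backlog_targets)
    delay_ms = queue_pkts / capacity
    rates = [b / delay_ms for b in backlog_targets]
    return rates, delay_ms, queue_pkts


# One source with the validation parameters: 6 pkts/ms link, alpha = 2 pkts/ms, d = 10 ms
rates_1, delay_1, queue_1 = vegas_single_link_equilibrium(6.0, [2.0], [10.0])
# Three identical sources sharing the same link
rates_3, delay_3, queue_3 = vegas_single_link_equilibrium(6.0, [2.0] * 3, [10.0] * 3)
print(rates_1, queue_1, rates_3, queue_3)
```

With one source this gives a rate of 6 pkts/ms and a queue of 20 packets; with three identical sources each gets 2 pkts/ms and the queue grows to 60 packets, since every source keeps α_i d_i packets buffered.
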
Validation (L. Wang, Princeton)
  Single link, capacity = 6 pkt/ms, α_s = 2 pkts/ms, d_s = 10 ms
  With finite buffer: Vegas reverts to Reno
  [Figure: two panels, "Without estimation error" and "With estimation error"]
  Source rates (pkts/ms); parenthesized values as on the slide; entries lost in extraction shown as ?
    #   src1       src2         src3         src4         src5
    1   ? (6)
    2   ? (2)      3.92 (4)
    3   ? (0.94)   1.46 (1.49)  3.54 (3.57)
    4   ? (0.50)   0.72 (0.73)  1.34 (1.35)  3.38 (3.39)
    5   ? (0.29)   0.40 (0.40)  0.68 (0.67)  1.30 (1.30)  3.28 (3.34)
  Queue and baseRTT
    #   queue (pkts)   baseRTT (ms)
    1   ? (20)         ? (10.18)
    2   ? (60)         13.36 (13.51)
    3   ? (127)        20.17 (20.28)
    4   ? (238)        31.50 (31.50)
    5   ? (416)        49.86 (49.80)
Stability: Reno/RED
  [Figure: TCP/AQM feedback loop (F_1 … F_N, G_1 … G_L, R_f(s), R_b(s); signals x, y, q, p)]
  Theorem (Low et al, Infocom02): Reno/RED is locally stable if [condition not recovered]
  TCP: small [symbol not recovered], small c, large N
  RED: small [symbol not recovered], large delay
Stability: scalable control
  [Figure: TCP/AQM feedback loop, as above]
  Theorem (Paganini, Doyle, L, CDC01): provided R is full rank, feedback loop is locally stable for arbitrary delay, capacity, load and topology
Stability: Stabilized Vegas
  [Figure: TCP/AQM feedback loop, as above]
  Theorem (Choe & L, Infocom03): provided R is full rank, feedback loop is locally stable if [condition not recovered]
Stability: FAST
  [Figure: TCP/AQM feedback loop (F_1 … F_N, G_1 … G_L, R_f(s), R_b(s); signals x, y, q, p)]
  Application
    Stabilized TCP with current routers
    Queueing delay as congestion measure has right scaling
    Incremental deployment with ECN
Window control algorithm
  [Window update equation not recovered]
  Theorem (Jin, Wei, L 03): in absence of delay
    Mapping from w(t) to w(t+1) is contraction
    Global exponential convergence
    Full utilization after finite time
    Utility function: α_i log x_i (proportional fairness)
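
The window update itself is missing from the extracted slide. The sketch below uses the update given in the published FAST TCP paper by the same authors, w ← min{2w, (1 − γ)w + γ(baseRTT/RTT · w + α)}, as an assumption about what the slide showed. The link model is also an assumption: a single bottleneck with no feedback delay, where the queueing delay q adjusts instantly so that the rates w_i / (d_i + q) sum to the capacity. All function names and parameter values are illustrative.

```python
def queueing_delay(windows, prop_delays, capacity):
    """Queueing delay q >= 0 at which the total rate sum(w / (d + q)) equals the capacity."""
    if sum(w / d for w, d in zip(windows, prop_delays)) <= capacity:
        return 0.0  # link not saturated, no queue builds up
    lo, hi = 0.0, sum(windows) / capacity
    for _ in range(100):  # bisection: total rate is decreasing in q
        mid = (lo + hi) / 2
        if sum(w / (d + mid) for w, d in zip(windows, prop_delays)) > capacity:
            lo = mid
        else:
            hi = mid
    return hi


def fast_update(w, base_rtt, rtt, alpha, gamma):
    """FAST window update as published; assumed to match the equation lost from the slide."""
    return min(2 * w, (1 - gamma) * w + gamma * (base_rtt / rtt * w + alpha))


def simulate(prop_delays, capacity, alpha, gamma=0.5, steps=2000):
    """Iterate the update for all flows on one bottleneck with instantaneous feedback."""
    windows = [1.0] * len(prop_delays)
    for _ in range(steps):
        q = queueing_delay(windows, prop_delays, capacity)
        windows = [fast_update(w, d, d + q, alpha, gamma) for w, d in zip(windows, prop_delays)]
    q = queueing_delay(windows, prop_delays, capacity)
    return [w / (d + q) for w, d in zip(windows, prop_delays)], q


# Three flows with 10, 50 and 100 ms propagation delay on a 100 pkts/ms link, alpha = 200 pkts
rates, q = simulate(prop_delays=[10.0, 50.0, 100.0], capacity=100.0, alpha=200.0)
print(rates, q)
```

At the fixed point each flow keeps α packets queued, so x_i = α / q for every flow: the three flows share the link equally despite their different propagation delays, and q = Nα / c = 6 ms.
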
Network (Sylvain Ravot, Caltech/CERN)
FAST BMPS
  [Figure: Internet2 Land Speed Record compared with FAST, by #flows; paths Geneva-Sunnyvale and Baltimore-Sunnyvale]
  FAST, standard MTU; throughput averaged over > 1hr
FAST BMPS
  Mbps = 10^6 b/s; GB = 2^30 bytes; entries lost in extraction shown as ?
  Record            flows  Bmps (peta)  Thruput (Mbps)  Distance (km)  Delay (ms)  MTU (B)  Duration (s)  Transfer (GB)  Path
  Alaska-Amsterdam  ?      ?            ?               ?              ?           ?        ?             ?              Fairbanks, AK – Amsterdam, NL
  MS-ISI            ?      ?            ?               ?              ?           ?        ?             ?              MS, WA – ISI, Va
  Caltech-SLAC      ?      ?            ?               ?              ?           1,500    3,600         387            CERN - Sunnyvale
  Caltech-SLAC      ?      ?            ?,797           10,?           ?           1,500    3,600         753            CERN - Sunnyvale
  Caltech-SLAC      ?      ?            ?,123           3,948          85          1,500    21,600        15,396         Baltimore - Sunnyvale
  Caltech-SLAC      ?      ?            ?,940           3,948          85          1,500    4,030         3,725          Baltimore - Sunnyvale
  Caltech-SLAC      ?      ?            ?,609           3,948          85          1,500    21,600        21,647         Baltimore - Sunnyvale
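
The slide's unit definitions (Mbps = 10^6 b/s, GB = 2^30 bytes) are enough to recompute throughput from transfer size and duration. The sketch below does so for one Baltimore - Sunnyvale run, taking 15,396 GB in 21,600 s over 3,948 km as read from the extracted row. Treating "bmps" as bits per second multiplied by distance in meters is an assumption, and the function names are illustrative.

```python
def throughput_mbps(transfer_gb, duration_s):
    """Average throughput in Mbps, with GB = 2**30 bytes and Mbps = 10**6 b/s."""
    bits = transfer_gb * 2**30 * 8
    return bits / duration_s / 1e6


def bit_meters_per_second(mbps, distance_km):
    """Throughput times distance; assumed to be what the slide calls bmps."""
    return mbps * 1e6 * distance_km * 1e3


mbps = throughput_mbps(transfer_gb=15396, duration_s=21600)
bmps = bit_meters_per_second(mbps, distance_km=3948)
print(f"{mbps:.1f} Mbps, {bmps / 1e15:.2f} peta bmps")
```

The result, about 6,123 Mbps, agrees with the trailing digits that survive in the throughput column for that row.
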
Aggregate throughput
  [Figure: aggregate throughput for 1, 2, 7, 9 and 10 flows; recoverable average utilization labels 95%, 92%, 90%, 88%; run lengths 1hr, 6hr, 1.1hr, 6hr]
  FAST, standard MTU; utilization averaged over > 1hr
FAST vs Linux TCP
  Mbps = 10^6 b/s; GB = 2^30 bytes; Delay = propagation delay
  Linux TCP expts: Jan 28-29, 2003
  Columns: flows, Bmps (peta), Thruput (Mbps), Distance (km), Delay (ms), MTU (B), Duration (s), Transfer (GB), Path
  Rows, all on the CERN - Sunnyvale path (numeric entries not recovered):
    Linux TCP, txqueuelen=
    Linux TCP, txqueuelen=
    FAST
    Linux TCP, txqueuelen=
    Linux TCP, txqueuelen=
    FAST (surviving fragments: ?,797 and 10,?)
Aggregate throughput
  [Figure: two panels, 1G and 2G, each comparing Linux TCP (two txq settings, one of them txq=100) with FAST; recoverable average utilization labels 19%, 27%, 92% and 16%, 48%]
  FAST, standard MTU; utilization averaged over 1hr
SCinet Caltech-SLAC experiments, SC2002 Baltimore, Nov 2002 (netlab.caltech.edu/FAST)
Acknowledgments
  Prototype: C. Jin, D. Wei
  Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
  Experiment/facilities
    Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
    CERN: O. Martin, P. Moroni
    Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
    DataTAG: E. Martelli, J. P. Martin-Flatin
    Internet2: G. Almes, S. Corbato
    Level(3): P. Fernes, R. Struble
    SCinet: G. Goddard, J. Patton
    SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
    StarLight: T. deFanti, L. Winkler
  Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Press on FAST (>50 articles, 10 countries)
  New Internet tech 153,000 times faster than modem (March 18, 2003)
  Software breaks data-transfer record (March 27, 2003)
  Connections that leaves broadband in the dust (April 7, 2003)
  FAST protocol supercharges networks (March 27, 2003, UK)
  Pushing the speed limit (Space.com, April 9, 2003)
  Technology Quarterly (June 21, 2003)
  Goodbye net gridlock (June, 2003)
  My Point of View (June, 2003)
  New system can speed up web downloads (June 5, 2003)
  Promise of ultra-fast downloads (June 5, 2003)
Dynamic sharing: 3 flows
  Dynamic sharing on Dummynet: capacity = 800Mbps, delay = 120ms, 3 flows, iperf throughput, Linux 2.4.x (HSTCP: UCL)
  [Figure: throughput traces, panels FAST and Linux]
Dynamic sharing: 3 flows
  [Figure: throughput traces, panels FAST, Linux, HSTCP, STCP]
  Steady throughput
Dynamic sharing: 14 flows
  Dynamic sharing on Dummynet: capacity = 800Mbps, delay = 120ms, 14 flows, iperf throughput, Linux 2.4.x (HSTCP: UCL)
  [Figure: throughput, loss and queue over 30min, panels FAST, Linux, STCP, HSTCP]
  Room for mice!
Protocol Decomposition
  [Figure: protocol stack. Applications (WWW, Napster, FTP, …); TCP/AQM; IP; Transmission (Ethernet, ATM, POS, WDM, …)]
  Applications: HOT (Doyle et al); minimize user response time; heavy-tailed file sizes
  TCP/AQM: duality model (Kelly, Low et al); maximize aggregate utility
  IP: shortest-path routing; minimize path costs
  Transmission: power control; maximize channel capacity
Network model
  [Figure: feedback loop of TCP sources F_1 … F_N (Reno, Vegas) and AQM links G_1 … G_L (DT, RED, …) through routing matrices R and R^T set by IP routing; signals x, y, q, p]
Motivation
  Can TCP/IP maximize utility?
  Shortest path routing!
  [Equations not recovered]
TCP-AQM/IP
  Theorem (Wang, et al 03): primal problem is NP-hard
  Proof: reduce integer partition to primal problem
    Given: integers {c_1, …, c_n}
    Find: set A s.t. Σ_{i∈A} c_i = Σ_{i∉A} c_i
  Achievable utility of TCP/IP? Stability? Duality gap?
  Conclusion: inevitable tradeoff between achievable utility and routing stability
Ring network
  [Figure: ring network with a single destination and routing r]
  Single destination
  Instant convergence of TCP/IP
  Shortest path routing
    Link cost = p_l(t) + d_l (price term plus static term; the weighting symbols were not recovered)
  Iteration between TCP/AQM and IP: r(0), p_l(0), r(1), p_l(1), …
    Routing sequence: r(t), r(t+1), …
  Stability: r ? Utility: V ?
    r*: optimal routing
    V*: max utility
  Theorem (Infocom 2003)
    No duality gap
    Unstable if [symbol not recovered] = 0: starting from any r(0), subsequent r(t) oscillates between 0 and 1
  Theorem (Infocom 2003)
    Solve primal problem asymptotically as [limit not recovered]
  Theorem (Infocom 2003)
    [symbol not recovered] large: globally unstable
    [symbol not recovered] small: globally stable
    [symbol not recovered] medium: depends on r(0)
General network
  [Figure: achievable utility on a random graph with 20 nodes, 200 links]
  Conclusion: inevitable tradeoff between achievable utility and routing stability
FAST TCP: motivation, architecture, algorithms, performance.
  submitted for publication, July 1,
  release: August 2003
  Inquiry:
  FAST Project Review, Caltech, Oct 27-28, 2003
  netlab.caltech.edu/FAST