Multi-Gbps TCP 9:00-10:00 Harvey Newman (Physics, Caltech) High speed networks & grids 10:00-10:45 Sylvain Ravot (Physics, Caltech/CERN) LHC networks and TCP Grid 10:45-11:00 Discussion 11:00-11:15Break 11:15-12:00 Steven Low (CS/EE, Caltech) TCP/AQM protocols and duality model 12:00-12:15 Dohy Hong (INRIA/Caltech) Synchronization effect of TCP 12:15- 1:00Lunch
Multi-Gbps TCP 1:00-2:00 Fernando Paganini (EE, UCLA) Control theory and stability of TCP/AQM 2:00-2:30Steven Low (CS/EE, Caltech) Stabilized Vegas (& instability of Reno/RED) 2:30-3:00 Zhikui Wang (EE, UCLA) FAST simulations 3:00-3:15Break 3:15-4:00David Wei, Cheng Hu (CS, Caltech) Some related projects and TCP kernel 4:00-4:15Break 4:15-5:00 Discussion
Copyright, 1996 © Dale Carnegie & Associates, Inc. Dualtiy Model of TCP/AQM Steven Low CS & EE, Caltech netlab.caltech.edu 2002
Acknowledgment S. Athuraliya, D. Lapsley, V. Li, Q. Yin (UMelb) S. Adlakha (UCLA), D. Choe (Postech/Caltech), J. Doyle (Caltech), K. Kim (SNU/Caltech), L. Li (Caltech), F. Paganini (UCLA), J. Wang (Caltech) L. Peterson, L. Wang (Princeton)
Part I Protocols
Window Flow Control ~ W packets per RTT Lost packet detected by missing ACK RTT time Source Destination 12W12W12W data ACKs 12W
Source Rate Limit the number of packets in the network to window W Source rate = bps If W too small then rate « capacity If W too big then rate > capacity => congestion
TCP Window Flow Controls Receiver flow control Avoid overloading receiver Set by receiver awnd:receiver (advertised) window Network flow control Avoid overloading network Set by sender Infer available network capacity cwnd: congestion window Set W = min (cwnd, awnd)
Receiver Flow Control Receiver advertises awnd with each ACK Window awnd closed when data is received and ack’d opened when data is read Size of awnd can be the performance limit (e.g. on a LAN) sensible default ~16kB
Network Flow Control Source calculates cwnd from indication of network congestion Congestion indications Losses Delay Marks Algorithms to calculate cwnd Tahoe, Reno, Vegas, RED, REM …
TCP Congestion Controls Tahoe (Jacobson 1988) Slow Start Congestion Avoidance Fast Retransmit Reno (Jacobson 1990) Fast Recovery Vegas (Brakmo & Peterson 1994) New Congestion Avoidance RED (Floyd & Jacobson 1993) Probabilistic marking BLUE (Feng, Kandlur, Saha, Shin 1999) REM/PI (Athuraliya et al 2000/Hollot et al 2001) Clear buffer, match rate AVQ (Kunniyur & Srikant 2001)
TCP/AQM Tahoe (Jacobson 1988) Slow Start Congestion Avoidance Fast Retransmit Reno (Jacobson 1990) Fast Recovery Vegas (Brakmo & Peterson 1994) New Congestion Avoidance BLUE RED (Floyd & Jacobson 1993) REM/PI (Athuraliya et al 2000, Hollot et al 2001) AVQ (Kunniyur & Srikant 2001)
TCP Tahoe (Jacobson 1988) SS time window CA SS: Slow Start CA: Congestion Avoidance
Slow Start Start with cwnd = 1 (slow start) On each successful ACK increment cwnd cwnd cnwd + 1 Exponential growth of cwnd each RTT: cwnd 2 x cwnd Enter CA when cwnd >= ssthresh
Slow Start data packet ACK receiversender 1 RTT cwnd cwnd cwnd + 1 (for each ACK)
Congestion Avoidance Starts when cwnd ssthresh On each successful ACK: cwnd cwnd + 1/cwnd Linear growth of cwnd each RTT: cwnd cwnd + 1
Congestion Avoidance cwnd RTT 4 data packet ACK cwnd cwnd + 1 (for each cwnd ACKS) receiversender
Packet Loss Assumption: loss indicates congestion Packet loss detected by Retransmission TimeOuts (RTO timer) Duplicate ACKs (at least 3) Packets Acknowledgements
Fast Retransmit Wait for a timeout is quite long Immediately retransmits after 3 dupACKs without waiting for timeout Adjusts ssthresh flightsize = min(awnd, cwnd) ssthresh max(flightsize/2, 2) Enter Slow Start (cwnd = 1)
Successive Timeouts When there is a timeout, double the RTO Keep doing so for each lost retransmission Exponential back-off Max 64 seconds 1 Max 12 restransmits Net/3 BSD
Summary: Tahoe Basic ideas Gently probe network for spare capacity Drastically reduce rate on congestion Windowing: self-clocking Other functions: round trip time estimation, error recovery for every ACK { if (W < ssthresh) then W++ (SS) else W += 1/W (CA) } for every loss { ssthresh = W/2 W = 1 }
TCP Reno (Jacobson 1990) CA SS Fast retransmission/fast recovery
Fast recovery Motivation: prevent `pipe’ from emptying after fast retransmit Idea: each dupACK represents a packet having left the pipe (successfully received) Enter FR/FR after 3 dupACKs Set ssthresh max(flightsize/2, 2) Retransmit lost packet Set cwnd ssthresh + ndup (window inflation) Wait till W=min(awnd, cwnd) is large enough; transmit new packet(s) On non-dup ACK (1 RTT later), set cwnd ssthresh (window deflation) Enter CA
Example: FR/FR Fast retransmit Retransmit on 3 dupACKs Fast recovery Inflate window while repairing loss to fill pipe time S R cwnd8 ssthresh Exit FR/FR
Summary: Reno Basic ideas Fast recovery avoids slow start dupACKs: fast retransmit + fast recovery Timeout: fast retransmit + slow start slow start retransmit congestion avoidance FR/FR dupACKs timeout
Active queue management Idea: provide congestion information by probabilistically marking packets Issues How to measure congestion ( p and G )? How to embed congestion measure? How to feed back congestion info? x(t+1) = F( p(t), x(t) ) p(t+1) = G( p(t), x(t) ) Reno, Vegas DropTail, RED, REM
RED (Floyd & Jacobson 1993) Congestion measure: average queue length p l (t+1) = [p l (t) + x l (t) - c l ] + Embedding: p-linear probability function Feedback: dropping or ECN marking Avg queue marking 1
REM (Athuraliya & Low 2000) Congestion measure: price p l (t+1) = [p l (t) + ( l b l (t)+ x l (t) - c l )] + Embedding: Feedback: dropping or ECN marking
REM (Athuraliya & Low 2000) Congestion measure: price p l (t+1) = [p l (t) + ( l b l (t)+ x l (t) - c l )] + Embedding: exponential probability function Feedback: dropping or ECN marking
Match rate Clear buffer and match rate Clear buffer Key features Sum prices Theorem (Paganini 2000) Global asymptotic stability for general utility function (in the absence of delay)
Active Queue Management p l (t) G(p(t), x(t)) DropTailloss [1 - c l / x l (t) ] + (?) REDqueue [ p l (t) + x l (t) - c l ] + Vegasdelay [ p l (t) + x l (t)/c l - 1 ] + REMprice [ p l (t) + l b l (t)+ x l (t) - c l ) ] + x(t+1) = F( p(t), x(t) ) p(t+1) = G( p(t), x(t) ) Reno, Vegas DropTail, RED, REM
Congestion & performance p l (t) G(p(t), x(t)) Renoloss [1 - c l / x l (t) ] + (?) Reno/REDqueue [ p l (t) + x l (t) - c l ] + Reno/REMprice [ p l (t) + l b l (t)+ x l (t) - c l ) ] + Vegasdelay [ p l (t) + x l (t)/c l - 1 ] + Decouple congestion & performance measure RED: `congestion’ = `bad performance’ REM: `congestion’ = `demand exceeds supply’ But performance remains good!
Part II Duality Model
Congestion Control Heavy tail Mice-elephants Elephant Internet Mice Congestion control efficient & fair sharing small delay queueing + propagation CDN
TCP x i (t)
TCP & AQM x i (t) p l (t) Example congestion measure p l (t) Loss (Reno) Queueing delay (Vegas)
Outline Protocol (Reno, Vegas, RED, REM/PI…) Equilibrium Performance Throughput, loss, delay Fairness Utility Dynamics Local stability Cost of stabilization
TCP & AQM x i (t) p l (t) Duality theory equilibrium Source rates x i (t) are primal variables Congestion measures p l (t) are dual variables Flow control is optimization process
TCP & AQM x i (t) p l (t) Control theory stability Internet as a feedback system Distributed & delayed
Outline TCP/AQM Reno/RED Equilibrium Duality model Stability & optimal control Linearized model A scalable control
TCP & AQM x i (t) p l (t) TCP: Reno Vegas AQM: DropTail RED REM/PI AVQ
TCP & AQM x i (t) p l (t) TCP: Reno Vegas AQM: DropTail RED REM/PI AVQ
Model structure F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM Multi-link multi-source network x y q p from F. Paganini
TCP Reno (Jacobson 1990) SS time window CA SS: Slow Start CA: Congestion Avoidance Fast retransmission/fast recovery
TCP Vegas (Brakmo & Peterson 1994) SS time window CA Converges, no retransmission … provided buffer is large enough
queue size for every RTT { if W/RTT min – W/RTT < then W ++ if W/RTT min – W/RTT > then W -- } for every loss W := W/2 Vegas model p l (t+1) = [ p l (t) + y l (t)/c l - 1 ] + Gl:Gl: Fi:Fi:
Vegas model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p
Overview Protocol (Reno, Vegas, RED, REM/PI…) Equilibrium Performance Throughput, loss, delay Fairness Utility Dynamics Local stability Cost of stabilization
Outline TCP/AQM Reno/RED Equilibrium Duality model Stability Linearized model A scalable control
Flow control Interaction of source rates x s (t) and congestion measures p l (t) Duality theory They are primal and dual variables Flow control is optimization process Example congestion measure Loss (Reno) Queueing delay (Vegas)
Model c1c1 c2c2 Network Links l of capacities c l Sources i L(s) - links used by source i U i (x i ) - utility if source rate = x i x1x1 x2x2 x3x3
Primal problem Assumptions Strictly concave increasing U i Unique optimal rates x i exist Direct solution impractical
Related Work Formulation Kelly 1997 Penalty function approach Kelly, Maulloo & Tan 1998 Kunniyur & Srikant 2000 Duality approach Low & Lapsley 1999 Athuraliya & Low 2000 Extensions Mo & Walrand 2000 La & Anantharam 2000 Dynamics Johari & Tan 2000, Massoulie 2000, Vinnicombe 2000, … Hollot, Misra, Towsley & Gong 2001 Paganini 2000, Paganini, Doyle, Low 2001, …
Duality Approach Primal-dual algorithm:
Duality Model of TCP Primal-dual algorithm: Reno, VegasDropTail, RED, REM Source algorithm iterates on rates Link algorithm iterates on prices With different utility functions
Duality Model of TCP Primal-dual algorithm: Reno, VegasDropTail, RED, REM (x*,p*) primal-dual optimal if and only if
Duality Model of TCP Primal-dual algorithm: Reno, VegasDropTail, RED, REM Any link algorithm that stabilizes queue generates Lagrange multipliers solves dual problem
Gradient algorithm Theorem (Low & Lapsley ’99) Converge to optimal rates in distributed asynchronous environment
Gradient algorithm Vegas: approximate gradient algorithm
Summary: equilibrium Flow control problem TCP/AQM Maximize aggregate source utility With different utility functions Primal-dual algorithm Reno, VegasDropTail, RED, REM
Implications Performance Rate, delay, queue, loss Fairness Utility function Persistent congestion
Performance Delay Congestion measures: end to end queueing delay Sets rate Equilibrium condition: Little’s Law Loss No loss if converge (with sufficient buffer) Otherwise: revert to Reno (loss unavoidable)
Vegas Utility Equilibrium (x, p) = (F, G) Proportional fairness
Validation (L. Wang, Princeton) Source 1Source 3 Source 5 RTT (ms) 17.1 (17) 21.9 (22) 41.9 (42) Rate (pkts/s)1205 (1200) 1228 (1200) 1161 (1200) Window (pkts) 20.5 (20.4) 27 (26.4) 49.8 (50.4) Avg backlog (pkts) 9.8 (10) NS-2 simulation, single link, capacity = 6 pkts/ms 5 sources with different propagation delays, s = 2 pkts/RTT meausred theory
Persistent congestion Vegas exploits buffer process to compute prices (queueing delays) Persistent congestion due to Coupling of buffer & price Error in propagation delay estimation Consequences Excessive backlog Unfairness to older sources Theorem (Low, Peterson, Wang ’02) A relative error of s in propagation delay estimation distorts the utility function to
Validation (L. Wang, Princeton) Single link, capacity = 6 pkt/ms, s = 2 pkts/ms, d s = 10 ms With finite buffer: Vegas reverts to Reno Without estimation errorWith estimation error
Validation (L. Wang, Princeton) Source rates (pkts/ms) #src1 src2 src3 src4 src (6) (2) 3.92 (4) (0.94) 1.46 (1.49) 3.54 (3.57) (0.50) 0.72 (0.73) 1.34 (1.35) 3.38 (3.39) (0.29) 0.40 (0.40) 0.68 (0.67) 1.30 (1.30) 3.28 (3.34) #queue (pkts)baseRTT (ms) (20) (10.18) (60)13.36 (13.51) (127)20.17 (20.28) (238)31.50 (31.50) (416)49.86 (49.80)
Vegas/REM Vegas peak = 43 pkts utilization : 90% - 96%
Outline Protocol (Reno, Vegas, RED, REM/PI…) Equilibrium Performance Throughput, loss, delay Fairness Utility Dynamics Local stability Cost of stabilization
Vegas model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p
Vegas model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p
Approximate model
Linearized model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p controls equilibrium delay
Stability Cannot be satisfied with >1 bottleneck link! Theorem (Choe & Low, ‘02) Locally asymptotically stable if > 0.63
Stabilized Vegas
Linearized model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p controls equilibrium delay
Linearized model F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p controls equilibrium delay choose a i = a, i =
Stability Theorem (Choe & Low, ‘02) Locally asymptotically stable if example LHS < 10*10 = 100 a = 0.1, = ( a, ) = 120
Stability Theorem (Choe & Low, ‘02) Locally asymptotically stable if Application Stabilized TCP with current routers Queueing delay as congestion measure has the right scaling Incremental deployment with ECN
Papers A duality model of TCP flow controls (ITC, Sept 2000) Optimization flow control, I: basic algorithm & convergence (ToN, 7(6), Dec 1999) Understanding Vegas: a duality model (J. ACM, 2002) Scalable laws for stable network congestion control (CDC, 2001) REM: active queue management (Network, May/June 2001) netlab.caltech.edu
Backup slides
Dynamics Small effect on queue AIMD Mice traffic Heterogeneity Big effect on queue Stability!
Stable: 20ms delay Window Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Queue Stable: 20ms delay Window Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Unstable: 200ms delay Window Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Unstable: 200ms delay Queue Window Ns-2 simulations, 50 identical FTP sources, single link 9 pkts/ms, RED marking
Other effects on queue 20ms 200ms 50% noise avg delay 16ms avg delay 208ms
Effect of instability Larger jitters in throughput Lower utilization No noise
Linearized system F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) TCP Network AQM x y q p
Linearized system F1F1 FNFN G1G1 GLGL R f (s) R b ’ (s) x y q p TCPREDqueuedelay
Stability condition Theorem: TCP/RED stable if TCP: Small Small c Large N RED: Small Large delay
Validation
Stability region Unstable for Large delay Large capacity Small load
Role of AQM (Sufficient) stability condition
Role of AQM (Sufficient) stability condition TCP: AQM: scale down K & shape H
Role of AQM TCP: RED:REM/PI: Queue dynamics (windowing) Problem!!
Role of AQM To stabilize a given TCP F i with least cost Linearized F, single link, identical sources Any AQM solves optimal control with appropriate Q and R
Role of AQM To stabilize a given TCP F i with least cost Linearized F, single link, identical sources k 1 =0 Nonzero buffer k 3 =0 Slower decay