Internet congestion control Damon Wischik, UCL TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAAA
What is queueing theory for? DJW 2006 2/24 Given the arrival process, what is We often seek limit theorems to describe the arrival process, and to approximate this probability Typical motivation: “this lets us choose buffer size etc. to ensure that queue overflow is rare” This does not make sense for Internet queues, since TCP (which governs most Internet traffic) deliberately causes the queue to overflow arrival process queue Example: for a queue with N independent flows, each self-similar with Hurst parameter H, then for some , >0
Some Internet history DJW 2006 3/24 1974: First draft of TCP/IP [“A protocol for packet network interconnection”, Vint Cerf and Robert Kahn] 1983: ARPANET switches on TCP/IP 1986: Congestion collapse 1988: Congestion control for TCP [“Congestion avoidance and control”, Van Jacobson]
TCP transmission control protocol DJW 2006 4/24 Internet congestion is controlled by the end-systems The TCP algorithm, running on the server, decides how fast to send data It sends data packets, and waits for ACKnowledgement packets It sets a target window size wt the number of packets that have been sent but not yet acknowledged (here 5+3) The average transmission rate is xt ≈ wt/RTT where RTT is the round trip time It adapts wt based on the ACKs it receives acknowledgements End-to-end argument: Generally, _some_ end2end control is necessary. It’s probably also sufficient. So it’s simplest/most elegant to use only end2end control. (It may not be most efficient.) Server (TCP) User data
TCP transmission rate [0–100 kB/sec] time [0–8 sec] if (seqno > _last_acked) { if (!_in_fast_recovery) { _last_acked = seqno; _dupacks = 0; inflate_window(); send_packets(now); _last_sent_time = now; return; } if (seqno < _recover) { uint32_t new_data = seqno - _last_acked; if (new_data < _cwnd) _cwnd -= new_data; else _cwnd=0; _cwnd += _mss; retransmit_packet(now); uint32_t flightsize = _highest_sent - seqno; _cwnd = min(_ssthresh, flightsize + _mss); _in_fast_recovery = false; if (_in_fast_recovery) { _dupacks++; if (_dupacks!=3) { _ssthresh = max(_cwnd/2, (uint32_t)(2 * _mss)); _cwnd = _ssthresh + 3 * _mss; _in_fast_recovery = true; _recover = _highest_sent; DJW 2006 5/24 transmission rate [0–100 kB/sec] Multiplicative decrease -- when network is congested, load grows exponentially (transmit, retransmit, reretransmit...) Additive increase -- helps with fairness (of relatively greater benefit to low-bandwidth users) Devised by Jain+Ramakrishnan (1988) DECbit time [0–8 sec]
How TCP shares capacity DJW 2006 6/24 individual flow rates available capacity aggregate flow rate time
How to describe a network DJW 2006 7/24 Some networks admit models at three different levels: Microscopic rules of behaviour for the individual elements e.g. rules of motion of electrons Macroscopic laws that describe the behaviour of aggregates e.g. Kirchhoff’s law, Ohm’s law Teleological principles that describe the operating point to which the system settles down e.g. Thomson’s principle
TCP model I: the sawtooth DJW 2006 8/24 Consider a single link, and a single flow with round trip time RTT When the link is not congested, transmission rate xt increases by 1/RTT i2 pkt/sec every sec When the link becomes congested, transmission rate is cut by 50% service capacity transmission rate x(t) * * * buffer size queue size time queue becomes full and packets are dropped TCP realizes there is congestion, and cuts its rate
TCP model II: billiards DJW 2006 9/24 Consider a single link, and many flows with round trip times RTT(1), RTT(2),... When the link is not saturated, flow r increases its transmission rate xt(r) by 1/RTT(r) pkt/sec2 When the link becomes saturated, some flows experience packet drops and they cut their rates Assume some probabilistic model that says which flows experience drops [Baccelli+Hong 2002] + + = * * * * * * *
TCP model III: fluid + + + + = = DJW 2006 10/24 Individual TCP flows follow a sawtooth + + individual flow rates + + = = aggregate flow rate time
TCP model III: fluid + + + + = = DJW 2006 11/24 Individual TCP flows follow a sawtooth ... but many TCP flows added together are smoother the average rate xt varies according to a time-delayed differential equation eqn involves pt, the packet drop probability experienced by packets sent at time t [Misra+Gong+Towsley 2000, Baccelli+McDonald+Reynier 2002, after Kelly et al. 1998] + + individual flow rates + + = = aggregate flow rate time
TCP model III: fluid DJW 2006 12/24 There are three possible limit models for describing the behaviour of the queue and calculating the drop probability TCP selects a limiting model, depending on the size of the buffer Round trip time RTT N TCP flows Service rate NC Buffer size B(N)
Three buffer-sizing regimes DJW 2006 13/24 small buffers B(N)=B medium buffers B(N)=B√N large buffers B(N)=BN drop probability [0–.3] queue size [0–B pkt] traffic intensity [0–1] time [0–5s] queue size [(B-25)–B pkt] time [50ms] These three regimes are qualitatively different; they are described by different dynamical systems, operating over different timescales The relationship between buffer size and timescales was first observed by [Deb+Srikant, 2004]
Instability / synchronization DJW 2006 14/24 Sometimes the dynamical system for the network is unstable, and the queue size flip-flops This means there are short periods of concentrated packet loss This might manifest itself as a burst of static in an Internet telephone call In fact, this corresponds to the billiards model for TCP Engineers can use the dynamical system description of TCP to help them design the network so that it is stable queue size [0–619pkt] TCP window sizes [0-25pkt] time [0–5s]
Equilibrium analysis DJW 2006 15/24 A well-designed network should be stable, i.e. it should reach some equilibrium state How can we characterize the equilibrium state?
Teleological description DJW 2006 16/24 Round trip time RTT N TCP flows; flow r has rate xr Service rate C Medium buffer size The equilibrium state is the solution to the optimization problem U(x) = const - 2 / RTT^2 x^2 x U(x)
Teleological description DJW 2006 17/24 Round trip time RTT N TCP flows; flow r has rate xr Service rate C Buffer size medium little extra value attached to high-throughput flows The equilibrium state is the solution to the optimization problem U(x) = const - 2 / RTT^2 x^2 U(x) severe penalty for allocating too little throughput x
Teleological description DJW 2006 18/24 Round trip time RTT N TCP flows; flow r has rate xr Service rate C Buffer size medium flows with large RTT are satisfied with just a little throughput The equilibrium state is the solution to the optimization problem U(x) = const - 2 / RTT^2 x^2 U(x) flows with small RTT demand higher throughput x
Teleological description DJW 2006 19/24 Round trip time RTT N TCP flows; flow r has rate xr Service rate C We should thus set up caches very close to the user, even if that means putting them in a congested part of the network Buffer size medium flows with large RTT are satisfied with just a little throughput The equilibrium state is the solution to the optimization problem U(x) = const - 2 / RTT^2 x^2 U(x) flows with small RTT demand higher throughput x
Teleological description DJW 2006 20/24 Round trip time RTT N TCP flows; flow r has rate xr Service rate C Buffer size medium The equilibrium state is the solution to the optimization problem U(x) = const - 2 / RTT^2 x^2 x U(x) there is no penalty until the link is overloaded, i.e. y>C The network seeks to be overloaded, leading to poor quality for Internet telephone calls
Maths conclusion TCP chooses its own limit regime DJW 2006 21/24 TCP chooses its own limit regime TCP, through its adaptive feedback mechanism, induces a fluid limit over the timescale of a round trip time The queue dynamics change fluidly over the timescale of a round trip time, and stochastically over much shorter timescales The choice of buffer size determines which of three queueing models is relevant
Design conclusion The TCP code admits description at two higher levels DJW 2006 22/24 The TCP code admits description at two higher levels a dynamical system model at the macroscopic level a welfare optimization problem at the teleological level This gives us insight into how to extend TCP make sure the dynamical system is stable choose a more desirable optimization problem These insights have led to new ideas about routing, load balancing, choice of buffer size in core routers, quality of service, high-speed TCP