Introduction to Congestion Control
How does a network become congested?
Suppose a router is overloaded. Its transmission queue grows (and eventually overflows) until packets are delayed for a very long time or are lost.
Delay and loss cause retransmissions, triggered by timeouts or by loss detection.
Retransmissions increase the traffic, so delays and losses at the overloaded router increase further.
What happens during congestion?
Knee: the point after which throughput increases very slowly, but delay increases fast.
Cliff: the point after which throughput starts to decrease to zero (congestion collapse) and delays grow to infinity.
Good goal: operate the network near the “Knee”.
[Figure: throughput vs. load and delay vs. load, marking the knee, the cliff, packet loss, and congestion collapse.]
Congestion control as a feedback system
The congestion control problem can be seen as a feedback system.
Network switches sense the state (load) and feed it back to the traffic sources.
Sources adjust their traffic rate: reduce traffic if the load is high, increase traffic if the load is low.
[Figure: a source connected to the network through switches.]
Congestion control as a feedback system
Network switches sense the state (load) and feed it back to the traffic sources. Sources adjust their traffic rate.
Issues to be addressed:
1. When to send feedback?
2. How to send feedback?
3. How to adjust the rate?
How to detect congestion?
Explicit network signal: a bit is set in the header of a packet when it encounters congestion, and the receiver returns the feedback signal.
Implicit network signal: an acknowledgement for new data is interpreted as no congestion; packet loss is seen as a sign of congestion.
TCP uses implicit congestion control signals (explicit signals exist, but are hardly used).
Objectives of a congestion control algorithm
Fairness: all sources should be treated “fairly”.
Efficiency: network resources should be well utilized.
Convergence: the network should quickly converge to the desired load level, and the load should not oscillate.
Distributedness: no entity has complete knowledge, and sources do not communicate with each other.
Binary congestion control
Network model:
Discrete time: t = 1, 2, 3, … (feedback takes one time unit).
xi(t): load from source i at time t.
The network is represented as a single resource (the “bottleneck resource”).
Xgoal: the desired load level at the “Knee”.
y(t): binary feedback at time t. y(t) = 0: no congestion (increase load). y(t) = 1: congestion (decrease load).
[Figure: sources 1 to n send loads x1, x2, …, xn into the network, which compares the load with Xgoal and returns the feedback y.]
Binary congestion control is widely used by TCP.
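The feedback rule of this model can be sketched in a few lines. The sketch assumes that "load" means the sum of all source loads and that congestion is signalled when this sum exceeds Xgoal; the function name `binary_feedback` is made up for illustration.

```python
def binary_feedback(loads, x_goal):
    """Return y(t): 1 if the total load exceeds x_goal (congestion), else 0.

    Assumption: the bottleneck compares the SUM of the source loads with x_goal.
    """
    return 1 if sum(loads) > x_goal else 0


# Two sources below the goal, then two sources above it.
print(binary_feedback([3.0, 4.0], x_goal=10.0))
print(binary_feedback([6.0, 5.0], x_goal=10.0))
```

Every source receives the same single bit, which is why no source needs to know the loads of the others.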
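Adjusting the Rate
Each source applies a linear update: xi(t+1) = aI + bI xi(t) when y(t) = 0 (increase), and xi(t+1) = aD + bD xi(t) when y(t) = 1 (decrease).
Multiplicative increase, additive decrease: aI = 0, bI > 1, aD < 0, bD = 1
Additive increase, additive decrease: aI > 0, bI = 1, aD < 0, bD = 1
Multiplicative increase, multiplicative decrease: aI = 0, bI > 1, aD = 0, 0 < bD < 1
Additive increase, multiplicative decrease (AIMD): aI > 0, bI = 1, aD = 0, 0 < bD < 1
Which one?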
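A minimal sketch of the four policies, assuming the usual linear update x(t+1) = a + b x(t) with the increase coefficients (aI, bI) used when y = 0 and the decrease coefficients (aD, bD) used when y = 1. The numeric coefficient values are illustrative assumptions that merely satisfy the constraints listed above.

```python
# Coefficients (a_i, b_i, a_d, b_d). The values are illustrative assumptions.
POLICIES = {
    "MIAD": (0.0, 2.0, -1.0, 1.0),
    "AIAD": (1.0, 1.0, -1.0, 1.0),
    "MIMD": (0.0, 2.0, 0.0, 0.5),
    "AIMD": (1.0, 1.0, 0.0, 0.5),
}


def adjust(x, y, policy):
    """Return the next load of one source given its load x and the feedback bit y."""
    a_i, b_i, a_d, b_d = POLICIES[policy]
    if y == 0:                 # no congestion: increase
        return a_i + b_i * x
    return a_d + b_d * x       # congestion: decrease


for name in POLICIES:
    print(name, adjust(10.0, 0, name), adjust(10.0, 1, name))
```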
Operating Point
The operating point is at the intersection of the fairness and efficiency lines.
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the optimal operating point.]
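Additive changes
Additive changes move the operating point at a 45° angle: from (x1, x2) to (x1+aI, x2+aI) on an increase, and to (x1+aD, x2+aD) on a decrease.
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the points above.]
CS757 © Jörg Liebeherr, 2000-2003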
Multiplicative changes
Multiplicative changes move the operating point along the line through the current point and the origin: from (x1, x2) to (bI x1, bI x2) on an increase, and to (bD x1, bD x2) on a decrease.
[Figure: axes User 1: x1 and User 2: x2, with the points above.]
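The geometric difference between the two kinds of change can be checked numerically: an additive change keeps the difference x2 - x1 fixed, while a multiplicative change keeps the ratio x2 / x1 fixed. The helper names and the sample point are illustrative assumptions.

```python
def additive(point, a):
    """Add the same amount a to both users' loads."""
    return (point[0] + a, point[1] + a)


def multiplicative(point, b):
    """Scale both users' loads by the same factor b."""
    return (b * point[0], b * point[1])


p = (2.0, 6.0)
q = additive(p, 1.5)
r = multiplicative(p, 0.5)

print("difference before/after additive change:", p[1] - p[0], q[1] - q[0])
print("ratio before/after multiplicative change:", p[1] / p[0], r[1] / r[0])
```

Since the fairness line is x1 = x2, only a change that shrinks the difference relative to the total can move the point toward fairness.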
Multiplicative Increase, Additive Decrease
Does not converge to fairness.
Does not converge to efficiency.
Reaches equilibrium iff [condition not preserved in the source].
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the points (x1, x2), (x1+aD, x2+aD), and (bI(x1+aD), bI(x2+aD)).]
Additive Increase, Additive Decrease
Does not converge to fairness.
Does not converge to efficiency.
Reaches equilibrium iff [condition not preserved in the source].
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the points (x1, x2), (x1+aD, x2+aD), and (x1+aD+aI, x2+aD+aI).]
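One way to read the points in the figure: a decrease followed by an increase moves (x1, x2) to (x1+aD+aI, x2+aD+aI). The sketch below checks when such a cycle returns to its starting point. This is only a reading of the figure under that assumption, not a reconstruction of the condition on the slide; the function name and sample values are made up.

```python
def aiad_cycle(point, a_i, a_d):
    """Apply one additive decrease followed by one additive increase."""
    x1, x2 = point
    return (x1 + a_d + a_i, x2 + a_d + a_i)


start = (4.0, 6.0)
same = aiad_cycle(start, a_i=1.0, a_d=-1.0)     # step sizes cancel
drift = aiad_cycle(start, a_i=1.0, a_d=-2.0)    # step sizes do not cancel

print(same, drift)
```

In both cases the difference x2 - x1 is unchanged, which matches the statement that this policy does not converge to fairness.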
Multiplicative Increase, Multiplicative Decrease
Does not converge to fairness.
Stable.
Converges to efficiency.
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the points (x1, x2), (bD x1, bD x2), and (bI bD x1, bI bD x2).]
Additive Increase, Multiplicative Decrease (AIMD)
Converges to fairness.
Converges to efficiency.
Increments become smaller as fairness increases.
[Figure: axes User 1: x1 and User 2: x2, with the fairness line, the efficiency line, and the points (x1, x2), (bD x1, bD x2), and (bD x1+aI, bD x2+aI).]
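A small two-source simulation illustrates the convergence to fairness. It assumes both sources see the same feedback bit each time step and that congestion means x1 + x2 > Xgoal; the parameter values (aI = 1, bD = 0.5, Xgoal = 50) and the function name are illustrative.

```python
def simulate_aimd(x1, x2, x_goal, a_i=1.0, b_d=0.5, steps=200):
    """Run AIMD for two sources and record the unfairness |x1 - x2| per step."""
    gaps = []
    for _ in range(steps):
        congested = (x1 + x2) > x_goal
        if congested:
            x1, x2 = b_d * x1, b_d * x2      # multiplicative decrease
        else:
            x1, x2 = x1 + a_i, x2 + a_i      # additive increase
        gaps.append(abs(x1 - x2))
    return x1, x2, gaps


x1, x2, gaps = simulate_aimd(1.0, 40.0, x_goal=50.0)
print("gap after first step:", gaps[0])
print("gap after last step: ", gaps[-1])
```

The additive steps leave the gap unchanged and every multiplicative decrease scales it by bD, so the gap can only shrink while the total load keeps oscillating around Xgoal.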
Importance of AIMD
Characteristics:
Only needs binary feedback information.
Converges to efficiency and fairness.
Empirical evidence shows very good performance.
Performance degrades if delays are very long.
“Proportional fairness” in a general network.
AIMD-style congestion control is used in most transport protocols.
The implementation of TCP congestion control introduced AIMD to Internet protocols, and AIMD is still the basis for TCP.
TCP Congestion Control
TCP Congestion Control
TCP has binary congestion control: an ACK received means no congestion; an RTO timeout means congestion.
The mechanism is implemented at the sender by setting a congestion window.
The window size at the sender is set as follows:
  Send Window = MIN (flow control window, congestion window)
where the flow control window is advertised by the receiver, and the congestion window is adjusted based on congestion information.
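The window rule can be written down directly. The second helper, which subtracts the data already in flight, is an added assumption for illustration and is not stated on the slide; both function names are made up.

```python
def send_window(flow_control_window, congestion_window):
    """Send Window = MIN(flow control window, congestion window)."""
    return min(flow_control_window, congestion_window)


def usable_window(flow_control_window, congestion_window, in_flight):
    """Assumption: the sender may still send the send window minus unacknowledged data."""
    return max(0, send_window(flow_control_window, congestion_window) - in_flight)


print(send_window(flow_control_window=16, congestion_window=4))
print(usable_window(flow_control_window=16, congestion_window=4, in_flight=3))
```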
TCP Congestion Control
TCP congestion control is governed by two parameters:
Congestion window (cwnd)
Slow-start threshold value (ssthresh); its initial value is the advertised window
Congestion control works in two modes:
slow start (cwnd < ssthresh)
congestion avoidance (cwnd ≥ ssthresh)
Summary of TCP congestion control
  Initially:
    cwnd = 1;
    ssthresh = advertised window size;
  New Ack received:
    if (cwnd < ssthresh)
      /* Slow Start */
      cwnd = cwnd + 1;
    else
      /* Congestion Avoidance (additive increase) */
      cwnd = cwnd + 1/cwnd;
  Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;
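The summary translates almost line by line into a small class. This is a sketch in units of segments, not a real TCP implementation; the class name is made up, and the timeout handler also resets cwnd to 1, as the "Responses to Congestion" slide describes.

```python
class TahoeLikeSender:
    """Congestion window bookkeeping only; no packets are actually sent."""

    def __init__(self, advertised_window):
        self.cwnd = 1.0
        self.ssthresh = float(advertised_window)

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                 # slow start
        else:
            self.cwnd += 1.0 / self.cwnd     # congestion avoidance

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0      # must be computed before cwnd is reset
        self.cwnd = 1.0


sender = TahoeLikeSender(advertised_window=8)
for _ in range(7):
    sender.on_new_ack()
print(sender.cwnd)            # slow start has reached ssthresh
sender.on_new_ack()
print(sender.cwnd)            # first congestion avoidance step
sender.on_timeout()
print(sender.cwnd, sender.ssthresh)
```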
Slow Start Example
The congestion window size grows very rapidly.
For every ACK, cwnd is increased by 1, irrespective of the number of segments ACK’ed.
TCP slows down the increase of cwnd once cwnd ≥ ssthresh.
Congestion Avoidance
The congestion avoidance phase starts once cwnd has reached the slow-start threshold value.
If cwnd ≥ ssthresh, then each time an ACK is received, cwnd is incremented as follows:
  cwnd = cwnd + 1/cwnd
As a result, cwnd increases by (roughly) one once all cwnd segments have been acknowledged.
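The "roughly one per round trip" claim can be checked by applying the per-ACK rule once for every segment in the window. The sketch assumes one ACK per segment; the function name is made up.

```python
def one_round_of_congestion_avoidance(cwnd):
    """Apply cwnd += 1/cwnd once per segment of the window that started the round."""
    acks = int(cwnd)              # assumption: one ACK for each segment in flight
    for _ in range(acks):
        cwnd += 1.0 / cwnd
    return cwnd


after = one_round_of_congestion_avoidance(8.0)
print(after)   # slightly below 9, because cwnd grows while the ACKs arrive
```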
Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8.
[Figure: cwnd (in segments) vs. roundtrip times, with ssthresh marked.]
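The shape of such a plot can be approximated per round trip: cwnd doubles while it is below ssthresh and then grows by one segment per round trip. This is a per-round-trip approximation of the per-ACK rules, assuming no losses and one ACK per segment; the function name is made up.

```python
def cwnd_per_rtt(ssthresh, rounds):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = 1
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: doubles per round trip
        else:
            cwnd += 1                        # congestion avoidance: +1 per round trip
        trace.append(cwnd)
    return trace


print(cwnd_per_rtt(ssthresh=8, rounds=6))   # [1, 2, 4, 8, 9, 10, 11]
```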
Responses to Congestion
TCP assumes there is congestion if it detects a packet loss.
A TCP sender can detect lost packets via a timeout of the retransmission timer or the receipt of duplicate ACKs.
TCP interprets a timeout as a binary congestion signal. When a timeout occurs, the sender performs the following steps:
ssthresh is set to half the current size of the congestion window: ssthresh = cwnd / 2
cwnd is reset to one: cwnd = 1
slow start is entered
Slow Start / Congestion Avoidance
[Figure: plot of cwnd for a TCP connection (MSS = 1500 bytes) with slow start and congestion avoidance.]
Flavors of TCP Congestion Control
TCP Tahoe (1988, 4.3BSD Tahoe): Fast Retransmit
TCP Reno (1990, 4.3BSD Reno): Fast Retransmit / Fast Recovery
New Reno (1996)
BIC (2005)
CUBIC (2008)
Compound TCP or CTCP (2008): based on estimates of queueing delays
Many more: ECN, RED, Fast TCP
[Screenshots: settings on Linux and Mac (sysctl -a) and on Windows.]
Acknowledgments in TCP
The receiver sends ACKs to the sender.
ACKs are used for flow control, error control, and congestion control.
The ACK number sent is the next sequence number expected.
[Figure: ACKs sent when a segment is lost.]
Acknowledgments in TCP
The receiver sends ACKs to the sender.
ACKs are used for flow control, error control, and congestion control.
The ACK number sent is the next sequence number expected.
[Figure: ACKs sent for out-of-order arrivals.]
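The cumulative ACK rule explains why both a lost segment and an out-of-order arrival produce duplicate ACKs. The sketch assumes fixed-size segments identified by their starting sequence number; the function name, the segment size of 1024 bytes, and the arrival order are illustrative.

```python
def cumulative_acks(arrivals, segment_size=1024, first_seq=0):
    """Return the ACK number sent after each arriving segment.

    arrivals: starting sequence numbers in order of arrival.
    The ACK number is always the next sequence number expected in order.
    """
    received = set()
    next_expected = first_seq
    acks = []
    for seq in arrivals:
        received.add(seq)
        while next_expected in received:     # advance over all in-order data
            next_expected += segment_size
        acks.append(next_expected)
    return acks


# The segment starting at 1024 arrives last (delayed or retransmitted).
print(cumulative_acks([0, 2048, 3072, 4096, 1024]))
# [1024, 1024, 1024, 1024, 5120]
```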
Fast Retransmit
If three or more duplicate ACKs are received in a row, the TCP sender believes that a segment has been lost.
TCP then retransmits what seems to be the missing segment, without waiting for a timeout to happen.
Enter slow start:
  ssthresh = cwnd/2
  cwnd = 1
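Detecting the trigger amounts to counting repeated ACK numbers. The sketch below reports where a fast retransmit would fire in a stream of ACK numbers; the function name and the sample ACK stream are illustrative, and the retransmission itself is only reported, not performed.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return (index, ack_no) pairs at which the third duplicate ACK arrives."""
    last_ack = None
    dups = 0
    triggers = []
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:        # fire once per loss episode
                triggers.append((i, ack))
        else:
            last_ack = ack
            dups = 0
    return triggers


# One original ACK for 1024 followed by three duplicates.
print(fast_retransmit_points([1024, 1024, 1024, 1024, 5120]))   # [(3, 1024)]
```

The reported ACK number is the sequence number of the segment that seems to be missing, so it is also the segment to retransmit.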
Fast Retransmit / Fast Recovery
Fast recovery avoids slow start after a fast retransmit.
Intuition: duplicate ACKs indicate that data is getting through.
After three duplicate ACKs:
  Retransmit the packet that is presumed lost
  ssthresh = cwnd/2
  cwnd = ssthresh+3 (note the order of operations)
Increment cwnd by one for each additional duplicate ACK.
When an ACK arrives that acknowledges “new data” (here: AckNo=6148), set cwnd = ssthresh and enter congestion avoidance.
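The steps above can be sketched as a small state machine in units of segments. The class and attribute names are made up, and the retransmission is represented by a counter rather than by sending anything.

```python
class RenoLikeSender:
    """Window bookkeeping for fast retransmit / fast recovery only."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = float(cwnd)
        self.ssthresh = float(ssthresh)
        self.dup_acks = 0
        self.in_fast_recovery = False
        self.retransmissions = 0

    def on_duplicate_ack(self):
        self.dup_acks += 1
        if self.in_fast_recovery:
            self.cwnd += 1.0                     # each further duplicate inflates cwnd
        elif self.dup_acks == 3:
            self.retransmissions += 1            # stands in for resending the segment
            self.ssthresh = self.cwnd / 2.0      # first halve ...
            self.cwnd = self.ssthresh + 3.0      # ... then set cwnd from the new ssthresh
            self.in_fast_recovery = True

    def on_new_ack(self):
        self.dup_acks = 0
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh            # deflate and enter congestion avoidance
            self.in_fast_recovery = False
        elif self.cwnd < self.ssthresh:
            self.cwnd += 1.0
        else:
            self.cwnd += 1.0 / self.cwnd


sender = RenoLikeSender(cwnd=16, ssthresh=64)
for _ in range(5):
    sender.on_duplicate_ack()
print(sender.cwnd, sender.ssthresh)   # 13.0 8.0
sender.on_new_ack()
print(sender.cwnd)                    # 8.0
```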
TCP Reno
Duplicate ACKs: fast retransmit, then fast recovery (fast recovery avoids slow start).
Timeout: retransmit, then slow start.
TCP Reno improves upon TCP Tahoe when a single packet is dropped in a round-trip time.
TCP Tahoe and TCP Reno (for single segment losses)
[Figure: two plots of cwnd vs. time, one for Tahoe and one for Reno.]
TCP New Reno
When multiple packets are dropped, Reno has problems.
Partial ACK: occurs when multiple packets are lost. A partial ACK acknowledges some, but not all, of the packets that were outstanding at the start of fast recovery.
In Reno, a partial ACK takes the sender out of fast recovery, and the sender has to wait until a timeout occurs.
New Reno: a partial ACK does not take the sender out of fast recovery. Instead, it causes retransmission of the segment following the acknowledged segment.
New Reno can deal with multiple lost segments without going to slow start.
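The distinction between a partial and a full ACK can be sketched as a single decision. The sketch assumes the sender remembers, when fast recovery starts, the first sequence number it had not yet sent (called `recovery_point` here, a made-up name), and that the ACK number is the next sequence number expected.

```python
def newreno_on_ack(ack_no, recovery_point):
    """Decide what a New Reno sender does with a new ACK during fast recovery.

    recovery_point: first sequence number not yet sent when fast recovery began
    (an assumption of this sketch).
    """
    if ack_no >= recovery_point:
        # Full ACK: everything outstanding at the start of recovery is acknowledged.
        return ("exit_fast_recovery", None)
    # Partial ACK: the segment starting at ack_no is presumed lost as well.
    return ("retransmit", ack_no)


print(newreno_on_ack(ack_no=3072, recovery_point=5120))   # ('retransmit', 3072)
print(newreno_on_ack(ack_no=5120, recovery_point=5120))   # ('exit_fast_recovery', None)
```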
SACK
SACK = selective acknowledgment.
Issue: Reno and New Reno retransmit at most 1 lost packet per round trip time.
Selective acknowledgments: the receiver can acknowledge non-contiguous blocks of data (SACK 0-1023, 1024-2047). Multiple blocks can be sent in a single segment.
TCP SACK: enters fast recovery upon 3 duplicate ACKs. The sender keeps track of SACKs and infers whether segments are lost. The sender retransmits the next segment from the list of segments that are deemed lost.
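How a sender might infer missing data from SACK blocks can be sketched by listing the holes between the cumulative ACK and the selectively acknowledged blocks. The sketch uses half-open byte ranges [start, end), a made-up function name, and illustrative numbers; a real sender applies further rules before it declares a hole lost.

```python
def sack_holes(cumulative_ack, sack_blocks):
    """Return the byte ranges [start, end) not yet acknowledged below the highest SACKed byte."""
    holes = []
    cursor = cumulative_ack
    for start, end in sorted(sack_blocks):
        if start > cursor:
            holes.append((cursor, start))    # gap before this SACK block
        cursor = max(cursor, end)
    return holes


# Bytes below 1024 are cumulatively ACKed; two later blocks arrived out of order.
print(sack_holes(1024, [(2048, 3072), (4096, 5120)]))
# [(1024, 2048), (3072, 4096)]
```

With this list the sender can retransmit several missing segments in one round trip instead of one.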