TCP Congestion Control and Common AQM Schemes: Quick Revision
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
Based in part upon slides of Prof. Raj Jain (OSU), Srini Seshan (CMU), J. Kurose (U Mass), I. Stoica (UCB)

Overview
- TCP Congestion Control Model and Mechanisms
- TCP Versions: Tahoe, Reno, NewReno, SACK, Vegas, etc.
- AQM schemes: common goals, RED, …

TCP Congestion Control
- Maintains three variables:
  - cwnd: congestion window
  - rcv_win: receiver advertised window
  - ssthresh: threshold size (used to update cwnd)
    - Rough estimate of knee point…
- For sending use: win = min(rcv_win, cwnd)

Packet Conservation: Self-clocking
[Figure: self-clocking diagram with Sender, Receiver, and spacing labels Pb, Pr, Ab, Ar, As]
- Implications of ack-clocking:
  - More batching of acks => bursty traffic
  - Less batching leads to a large fraction of Internet traffic being just acks (overhead)

TCP: Slow Start
- Whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced:
  - Set cwnd = 1
  - Each time a segment is acknowledged, increment cwnd by one (cwnd++)
- Does Slow Start increment slowly? Not really. In fact, the increase of cwnd is exponential!
- Window increases to W in RTT * log2(W)
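The exponential growth can be checked with a small sketch. It assumes an idealized round: every segment in the window is acked within one RTT and nothing is lost. The function name `slow_start_rounds` is made up for illustration.

```python
import math


def slow_start_rounds(target_window: int) -> int:
    """Count the round trips needed for cwnd to grow from 1 to target_window."""
    cwnd = 1
    rounds = 0
    while cwnd < target_window:
        # One ack arrives per segment in flight and each ack does cwnd++,
        # so a full round adds cwnd segments: the window doubles.
        cwnd += cwnd
        rounds += 1
    return rounds


for w in (8, 64, 1024):
    # The round count matches ceil(log2(W)).
    print(w, slow_start_rounds(w), math.ceil(math.log2(w)))
```

Under these assumptions, reaching W = 1024 segments takes 10 round trips, which is why "slow" start is slow only relative to sending a full window at once.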

Slow Start Example
- The congestion window size grows very rapidly
- TCP slows down the increase of cwnd when cwnd >= ssthresh
[Figure: sender/receiver timeline. cwnd = 1: segment 1 is sent and acked; cwnd = 2: segments 2-3 are sent and acked; cwnd = 4: segments 4-7 are sent and acked; cwnd = 8]

Slow Start Sequence Plot
[Figure: sequence number vs. time; the window doubles every round]

Congestion Avoidance
- Goal: maintain the operating point to the left of the cliff
- How?
  - Additive increase: starting from the rough estimate (ssthresh), slowly increase cwnd to probe for additional available bandwidth
  - Multiplicative decrease: cut the congestion window size aggressively if a loss is detected
- If cwnd >= ssthresh, then each time a segment is acknowledged, increment cwnd by 1/cwnd (cwnd += 1/cwnd)
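A short sketch shows why the per-ack rule cwnd += 1/cwnd amounts to roughly one extra segment per round trip. It assumes one ack per segment and int(cwnd) acks per round; `congestion_avoidance_round` is a hypothetical helper name.

```python
def congestion_avoidance_round(cwnd: float) -> float:
    """Apply cwnd += 1/cwnd once per ack for one window's worth of acks."""
    acks = int(cwnd)  # assumption: one ack per segment sent this round
    for _ in range(acks):
        cwnd += 1.0 / cwnd
    return cwnd


cwnd = 8.0
for rtt in range(1, 5):
    cwnd = congestion_avoidance_round(cwnd)
    print(f"after RTT {rtt}: cwnd = {cwnd:.2f}")
```

The gain per round is slightly below 1 because each increment is computed against an already enlarged window.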

Additive Increase/Multiplicative Decrease (AIMD) Policy
- Assumption: decrease policy must (at minimum) reverse the load increase over and above the efficiency line
- Implication: decrease factor should be set conservatively to account for any congestion detection lags, etc.
[Figure: User 1's allocation x1 vs. User 2's allocation x2, showing the efficiency line, the fairness line, and successive operating points labeled x0, x1, x2]
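The convergence that the AIMD diagram illustrates can be reproduced numerically. The following is a minimal sketch, assuming two users who receive the same synchronous congestion signal whenever their combined load exceeds the capacity; the function name, the starting allocations, and the increase/decrease constants are illustrative choices, not values from the slides.

```python
def aimd(x1: float, x2: float, capacity: float, steps: int,
         increase: float = 1.0, decrease: float = 0.5):
    """Run synchronized AIMD for two users and return their final allocations."""
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Above the efficiency line: both users cut multiplicatively,
            # which also shrinks the gap between them.
            x1 *= decrease
            x2 *= decrease
        else:
            # Below the efficiency line: both users add the same amount,
            # which leaves the gap unchanged.
            x1 += increase
            x2 += increase
    return x1, x2


x1, x2 = aimd(1.0, 60.0, capacity=100.0, steps=1000)
print(f"x1 = {x1:.3f}, x2 = {x2:.3f}, gap = {abs(x1 - x2):.2e}")
```

Each multiplicative decrease halves the gap between the users while additive increase preserves it, so the allocations drift toward the fairness line while oscillating around the efficiency line.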

Congestion Avoidance Sequence Plot
[Figure: sequence number vs. time; the window grows by 1 every round]

Slow Start/Congestion Avoidance Example
- Assume that ssthresh = 8
[Figure: cwnd (in segments) vs. round-trip times, with the ssthresh level marked]
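The curve for ssthresh = 8 can be tabulated per round trip. This is a simplified sketch: it works at round granularity, with no losses, and caps the slow-start doubling at ssthresh; `cwnd_trace` is a hypothetical name.

```python
def cwnd_trace(ssthresh: int, rounds: int) -> list:
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = 1
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            # Slow start: double per round, capped at ssthresh (simplification).
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: one extra segment per round.
            cwnd += 1
        trace.append(cwnd)
    return trace


print(cwnd_trace(ssthresh=8, rounds=6))  # [1, 2, 4, 8, 9, 10, 11]
```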

Putting Everything Together: TCP Pseudo-code

    Initially:
        cwnd = 1;
        ssthresh = infinite;
    New ack received:
        if (cwnd < ssthresh)
            /* Slow Start */
            cwnd = cwnd + 1;
        else
            /* Congestion Avoidance */
            cwnd = cwnd + 1/cwnd;
    Timeout: (loss detection)
        /* Multiplicative decrease */
        ssthresh = win/2;
        cwnd = 1;

    while (next < unack + win)
        transmit next packet;

    where win = min(cwnd, flow_win);

[Figure: sequence-number axis (seq #) marking unack, next, and the window win]
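The pseudo-code translates almost line for line into a runnable sketch. The class and method names below (`TcpSender`, `on_new_ack`, `on_timeout`) are hypothetical, the window is counted in segments, and actual packet transmission is left out.

```python
class TcpSender:
    """Window bookkeeping only; no packets are actually sent."""

    def __init__(self, flow_win: float = float("inf")):
        self.cwnd = 1.0
        self.ssthresh = float("inf")
        self.flow_win = flow_win  # receiver (flow-control) window

    @property
    def win(self) -> float:
        # The usable window is limited by both congestion and flow control.
        return min(self.cwnd, self.flow_win)

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                # slow start
        else:
            self.cwnd += 1 / self.cwnd    # congestion avoidance

    def on_timeout(self) -> None:
        self.ssthresh = self.win / 2      # multiplicative decrease
        self.cwnd = 1.0


sender = TcpSender()
for _ in range(15):
    sender.on_new_ack()
print(sender.cwnd)                   # 16.0
sender.on_timeout()
print(sender.ssthresh, sender.cwnd)  # 8.0 1.0
```

After the timeout, the sender slow-starts again up to the new ssthresh of 8 and only then switches to the 1/cwnd increment.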

The big picture
[Figure: cwnd vs. time, showing Slow Start, Congestion Avoidance, and Timeout]

Packet Loss Detection: Timeout Avoidance
- Wait for Retransmission Time Out (RTO)
- What’s the problem with this?
  - RTO is a performance killer
  - In the BSD TCP implementation, RTO is usually more than 1 second
    - The granularity of the RTT estimate is 500 ms
    - The retransmission timeout is at least two times the RTT
- Solution: don’t wait for RTO to expire
  - Use fast retransmission/recovery for loss detection
  - Fall back to RTO only if these mechanisms fail
- TCP Versions: Tahoe, Reno, NewReno, SACK
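Fast retransmission infers a loss from duplicate acks instead of waiting for the timer. The sketch below assumes cumulative ack numbers and a threshold of three duplicates, which is the usual value in standard TCP but is not stated in the slides; the function name is hypothetical.

```python
DUPACK_THRESHOLD = 3  # assumption: standard TCP value, not given in the slides


def fast_retransmit_triggers(acks):
    """Return the cumulative ack numbers whose duplicates trigger a retransmit."""
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUPACK_THRESHOLD:
                # Third duplicate: resend the missing segment now
                # instead of waiting for the RTO to expire.
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


print(fast_retransmit_triggers([1, 2, 3, 3, 3, 3, 7]))  # [3]
```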

TCP Congestion Control Summary
- Sliding window limited by receiver window
- Dynamic windows: slow start (exponential rise), congestion avoidance (additive rise), multiplicative decrease
- Ack clocking
- Adaptive timeout: needs the mean RTT and its deviation
- Timer backoff and Karn’s algorithm during retransmission
- Go-back-N or selective retransmission
- Cumulative and selective acknowledgements
- Timeout avoidance: Fast Retransmit
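The adaptive timeout, Karn's algorithm, and timer backoff fit together as in the following sketch. The gains 1/8 and 1/4 and the factor 4 follow the commonly used standard TCP retransmission timer; they are assumptions here, since the slides give no constants, and the class name and `min_rto` parameter are hypothetical.

```python
class RtoEstimator:
    ALPHA = 1 / 8   # gain for the smoothed RTT (assumed standard value)
    BETA = 1 / 4    # gain for the RTT deviation (assumed standard value)
    K = 4           # deviation multiplier (assumed standard value)

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.rto = min_rto

    def on_rtt_sample(self, rtt: float, retransmitted: bool = False) -> float:
        if retransmitted:
            # Karn's algorithm: a sample from a retransmitted segment is
            # ambiguous, so it is not used to update the estimate.
            return self.rto
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.min_rto, self.srtt + self.K * self.rttvar)
        return self.rto

    def on_timeout(self) -> float:
        self.rto *= 2  # exponential timer backoff
        return self.rto


est = RtoEstimator(min_rto=0.2)
print(est.on_rtt_sample(0.1))
print(est.on_timeout())
```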

Queuing Disciplines
- Each router must implement some queuing discipline
- Queuing allocates bandwidth and buffer space:
  - Bandwidth: which packet to serve next (scheduling)
  - Buffer space: which packet to drop next (buffer management)
- Queuing also affects latency
[Figure: traffic sources feeding traffic classes A, B, and C, with buffer management (drop) and scheduling stages]

Typical Internet Queuing
- FIFO + drop-tail
  - Simplest choice
  - Used widely in the Internet
- FIFO (first-in-first-out)
  - Implies a single class of traffic
- Drop-tail
  - Arriving packets get dropped when the queue is full, regardless of flow or importance
- Important distinction:
  - FIFO: scheduling discipline
  - Drop-tail: drop (buffer management) policy
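The distinction between the scheduling discipline and the drop policy shows up directly in code. In this minimal sketch (the class name and the packet representation are illustrative), `dequeue` is the FIFO part and `enqueue` is the drop-tail part.

```python
from collections import deque


class DropTailFifo:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet) -> bool:
        """Drop-tail policy: reject the arriving packet if the buffer is full."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        """FIFO scheduling: always serve the oldest queued packet."""
        return self.queue.popleft() if self.queue else None


q = DropTailFifo(capacity=2)
print([q.enqueue(p) for p in ("a", "b", "c")])  # [True, True, False]
print(q.dequeue())                              # a
```

Swapping only `enqueue` (for example, to drop from the head or at random) changes the buffer management policy without touching the FIFO service order.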

FIFO + Drop-tail Problems
- FIFO issues: in a FIFO discipline, the service seen by a flow depends on the packet arrivals of all other flows!
  - No isolation between flows: full burden on e2e control
  - No policing: send more packets, get more service
- Drop-tail issues:
  - Routers are forced to have large queues to maintain high utilization
  - Larger buffers => larger steady-state queues/delays
  - Synchronization: end hosts react to the same events because packets tend to be lost in bursts
  - Lock-out: a side effect of burstiness and synchronization is that a few flows can monopolize queue space

Queue Management Ideas
- Synchronization, lock-out:
  - Random drop: drop a randomly chosen packet
  - Drop front: drop the packet at the head of the queue
- High steady-state queuing vs. burstiness:
  - Early drop: drop packets before the queue is full
  - Do not drop packets “too early”, because the queue may reflect only burstiness and not true overload
- Misbehaving vs. fragile flows:
  - Drop packets in proportion to the queue occupancy of the flow
  - Try to protect fragile flows from packet loss (e.g., color them or classify them on the fly)
- Drop packets vs. mark packets:
  - Dropping packets interacts with reliability mechanisms
  - Marking packets: need to trust end-systems to respond!

Packet Drop Dimensions
[Figure: three axes of drop policy design]
- Aggregation: per-connection state, class-based queuing, single class
- Drop position: head, tail, random location
- Early drop vs. overflow drop

Random Early Detection (RED)
[Figure: queue with min thresh and max thresh marked against the average queue length; plot of P(drop) vs. avg queue length, with min_th and max_th on the queue axis and max_p and 1.0 on the probability axis]

Random Early Detection (RED)
- Maintain a running average of the queue length
  - Low-pass filtering
- If avg Q < min_th, do nothing
  - Low queuing, send packets through
- If avg Q > max_th, drop the packet
  - Protection from misbehaving sources
- Else mark (or drop) the packet in a manner proportional to queue length, with a bias to protect against synchronization
  - P_b = max_p * (avg - min_th) / (max_th - min_th)
  - Further, bias P_b by the history of unmarked packets
  - P_a = P_b / (1 - count * P_b)
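The RED decision can be sketched as follows. This is a simplified reading of the rules above, not a faithful router implementation: the averaging weight of 0.002 is an assumed value, `count` is taken to be the number of packets let through since the last mark, and the class and method names are hypothetical.

```python
import random


class RedQueue:
    def __init__(self, min_th, max_th, max_p, weight=0.002, rng=None):
        self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
        self.weight = weight          # assumed low-pass filter gain
        self.avg = 0.0                # running average of the queue length
        self.count = 0                # packets passed since the last mark
        self.rng = rng or random.Random()

    def mark_probability(self) -> float:
        """Return P_a for the current average queue length and count."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg > self.max_th:
            return 1.0
        p_b = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        denom = 1.0 - self.count * p_b
        # Guard: once count * P_b reaches 1 the packet is marked for certain.
        return 1.0 if denom <= 0 else min(1.0, p_b / denom)

    def on_arrival(self, queue_len: int) -> bool:
        """Update the average; return True if the packet is marked or dropped."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        p_a = self.mark_probability()
        if p_a > 0 and self.rng.random() < p_a:
            self.count = 0
            return True
        self.count = self.count + 1 if p_a > 0 else 0
        return False


red = RedQueue(min_th=10, max_th=40, max_p=0.1)
red.avg = 25.0
print(red.mark_probability())  # P_b = 0.1 * 15 / 30 = 0.05 with count = 0
```

Dividing by (1 - count * P_b) raises the probability the longer no packet has been marked, which spaces marks more evenly than independent coin flips would.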

RED Issues
- Issues:
  - Breaks synchronization well
  - Extremely sensitive to parameter settings
  - Wild queue oscillations upon load changes
  - Fails to prevent buffer overflow as the number of sources increases
  - Does not help fragile flows (e.g., small-window flows or retransmitted packets)
  - Does not adequately isolate cooperative flows from non-cooperative flows
- Isolation:
  - Fair queuing achieves isolation using per-flow state
  - RED penalty box: monitor the history of packet drops, identify flows that use disproportionate bandwidth

REM (Athuraliya & Low, 2000)
- Main ideas
  - Decouple congestion & performance measure
  - “Price” adjusted to match rate and clear buffer
  - Marking probability exponential in “price”
[Figure: marking probability curves for REM and RED; axis labels: Avg queue, 1]
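The two REM ideas, a price that tracks rate and backlog mismatch and a marking probability exponential in that price, can be sketched as below. The update rule is written in the form commonly cited for REM and should be read as an assumption; the function names and the parameter values in the usage lines are hypothetical.

```python
def rem_price_update(price, backlog, target_backlog, input_rate, capacity,
                     gamma, alpha):
    """Raise the price when the link is overloaded or the buffer exceeds its target."""
    mismatch = alpha * (backlog - target_backlog) + (input_rate - capacity)
    return max(0.0, price + gamma * mismatch)  # the price never goes negative


def rem_mark_probability(price, phi):
    """Marking probability exponential in the price: 1 - phi^(-price), phi > 1."""
    return 1.0 - phi ** (-price)


# Hypothetical values, chosen only to exercise the functions.
price = rem_price_update(1.0, backlog=10, target_backlog=0,
                         input_rate=12, capacity=10, gamma=0.01, alpha=0.1)
print(price, rem_mark_probability(price, phi=1.1))
```

Because the price settles only when the input rate matches capacity and the backlog sits at its target, the queue can stay small while the price, not the queue length, carries the congestion signal.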

Comparison of AQM Performance
- DropTail: queue = 94%
- RED: min_th = 10 pkts, max_th = 40 pkts, max_p = 0.1
- REM: queue = 1.5 pkts, utilization = 92%; parameter values 0.05, 0.4, 1.15 (parameter symbols missing in the source)

What is TCP Throughput?
[Figure: sawtooth of window vs. time t; the window oscillates between 2w/3 and 4w/3 with mean w = (4w/3 + 2w/3)/2; area under one cycle = 2w^2/3]
- Each cycle delivers 2w^2/3 packets
- Assume: each cycle delivers 1/p packets, so 1/p = 2w^2/3
  - Delivers 1/p packets followed by a drop => loss probability = p/(1+p) ~ p if p is small
- Hence w = sqrt(3/(2p))
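The area claim can be checked by summing the window over one cycle. The sketch assumes the window climbs by one segment per round trip from 2w/3 to 4w/3, so a cycle lasts 2w/3 rounds; `packets_per_cycle` is a hypothetical name.

```python
def packets_per_cycle(w: int) -> int:
    """Sum the per-round window over one sawtooth cycle, 2w/3 up to 4w/3."""
    low, high = 2 * w // 3, 4 * w // 3
    # One window's worth of packets is sent in each round of the cycle.
    return sum(range(low, high))


for w in (30, 300):
    exact = packets_per_cycle(w)
    approx = 2 * w ** 2 / 3
    print(w, exact, approx, f"ratio = {exact / approx:.4f}")
```

The discrete sum approaches 2w^2/3 as w grows, which is the area of the trapezoid under one tooth of the sawtooth.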

1/sqrt(p) Law
- Equilibrium window size: w = a / sqrt(p)
- Equilibrium rate: a / (RTT * sqrt(p))
- Empirically, the constant a ~ 1
- Verified extensively through simulations and on the Internet
- References
  - T.J. Ott, J.H.B. Kemperman, and M. Mathis (1996)
  - M. Mathis, J. Semke, J. Mahdavi, and T. Ott (1997)
  - T.V. Lakshman and U. Madhow (1997)
  - J. Padhye, V. Firoiu, D. Towsley, and J. Kurose (1998)
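A worked example makes the scaling concrete. The sketch assumes the equilibrium window has the form a / sqrt(p) and the rate is that window divided by the RTT; the default a = sqrt(3/2) follows from setting 1/p = 2w^2/3, while the slide reports a ~ 1 empirically. Function names and the sample inputs are illustrative.

```python
import math


def equilibrium_window(p: float, a: float = math.sqrt(1.5)) -> float:
    """Equilibrium window in packets for loss probability p."""
    return a / math.sqrt(p)


def equilibrium_rate(p: float, rtt: float, a: float = math.sqrt(1.5)) -> float:
    """Equilibrium rate in packets per second for a round-trip time in seconds."""
    return equilibrium_window(p, a) / rtt


for p in (0.0001, 0.01):
    print(f"p = {p}: window = {equilibrium_window(p):.1f} pkts, "
          f"rate at RTT 100 ms = {equilibrium_rate(p, 0.1):.0f} pkts/s")
```

A hundredfold increase in loss probability cuts the window and the rate by a factor of ten, and doubling the RTT halves the rate at the same loss probability.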