Slide 1: Computer Networks (Graduate level). University of Tehran, Dept. of EE and Computer Engineering. By: Dr. Nasser Yazdani. Lecture 8: Congestion Control.

Slide 2: Congestion Control. Topics: congestion control basics; TCP congestion control. Assigned reading: [JK88] Congestion Avoidance and Control; [CJ89] Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks.

Slide 3: Overview. Congestion sources and collapse; congestion control basics; TCP congestion control; TCP interactions.

Slide 4: Why End-to-End Protocols? The underlying best-effort network may drop or reorder messages, deliver duplicate copies of a given message, limit messages to some finite size, and deliver messages after an arbitrarily long delay. In addition: multiple application processes run on each host; sender and receiver run at different speeds (flow control); the network can become congested (congestion control). Initially, there was no end-to-end protocol. Now: UDP, a simple end-to-end protocol; TCP, a reliable transport protocol.

Slide 5: Reliable Transport (TCP). Communication abstraction: connection-oriented, point-to-point; reliable (error detection and correction); ordered byte stream (application writes bytes, TCP sends segments, application reads bytes); full duplex (two-way connection); flow and congestion controlled. Protocol implemented entirely at the ends (fate sharing).

Slide 6: Difference From Link Layers. Logical link vs. physical link: must establish a connection. Variable RTT: may vary within a connection. Reordering of packets: how long can packets live? This gives the maximum segment lifetime. Can't expect endpoints to exactly match the link: buffer space availability; packets in transmission (delay x bandwidth). Transmission rate: we don't directly know the media/network transmission rate (congestion), so try to adapt to the situation.

Slide 7: Congestion. Different sources compete for resources inside the network while unaware of the current state of those resources and of each other. In general, it is a resource allocation problem. Manifestations: lost packets (buffer overflow at routers); long delays (queuing in router buffers). [Figure: links of 10 Mbps, 100 Mbps, and 1.5 Mbps.]

Slide 8: Causes/costs of congestion: scenario 1. Two senders, two receivers; one router with infinite buffers; no retransmission. Result: large delays when congested; a maximum achievable throughput.

Slide 9: Causes/costs of congestion: scenario 2. One router with finite buffers; sender retransmits lost packets.

Slide 10: Causes/costs of congestion: scenario 2. Notation: λ_in is the rate of original data sent, λ'_in the offered load including retransmissions, and λ_out the delivered rate. Always: λ_in = λ_out (goodput). "Perfect" retransmission only when loss: λ'_in > λ_out. Retransmission of delayed (not lost) packets makes λ'_in larger (than the perfect case) for the same λ_out. "Costs" of congestion: more work (retransmissions) for a given "goodput"; unneeded retransmissions mean the link carries multiple copies of a packet.

Slide 11: Causes/costs of congestion: scenario 3. Four senders; multihop paths; timeout/retransmit. Q: what happens as λ_in and λ'_in increase?

Slide 12: Causes/costs of congestion: scenario 3. Another "cost" of congestion: when a packet is dropped, any "upstream" transmission capacity used for that packet was wasted!

Slide 13: Approaches towards congestion control. Two broad approaches. End-to-end congestion control: no explicit feedback from the network; congestion is inferred from loss and delay observed by the end systems; this is the approach taken by TCP. Network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) or an explicit rate the sender should send at.

Slide 14: Congestion Collapse. An increase in network load results in a decrease of useful work done and cripples the network's ability to deliver data. Possible causes: (1) Spurious retransmissions of packets still in flight: classical congestion collapse. How can this happen with packet conservation? Solution: better timers and TCP congestion control. (2) Undelivered packets: packets consume resources and are then dropped elsewhere in the network. Solution: congestion control for ALL traffic.

Slide 15: Other Congestion Collapse Causes. Fragments: mismatch of transmission and retransmission units. Solutions: make the network drop all fragments of a packet (early packet discard in ATM); do path MTU discovery. Control traffic: a large percentage of traffic is for control (headers, routing messages, DNS, etc.). Stale or unwanted packets: packets that are delayed on long queues; "push" data that is never used.

Slide 16: Where to Prevent Collapse? Can end hosts prevent the problem? Yes, but we must trust end hosts to do the right thing; e.g., the sending host must adjust the amount of data it puts into the network based on detected congestion. Can routers prevent collapse? No, not all forms of collapse. That doesn't mean they can't help: sending accurate congestion signals; isolating well-behaved from ill-behaved sources.

Slide 17: Congestion Control and Avoidance. A mechanism which: uses network resources efficiently; preserves fair network resource allocation; prevents or avoids collapse. Congestion collapse is not just a theory: it has been frequently observed in many networks. It is a top 10 problem.

Slide 18: Congestion Collapse and Efficiency. Knee: point after which throughput increases slowly and delay increases quickly. Cliff: point after which throughput decreases quickly to zero (congestion collapse) and delay goes to infinity. Congestion avoidance: stay at the knee. Congestion control: stay left of (but usually close to) the cliff. Note: in an M/M/1 queue, delay = 1/(1 - utilization). [Figure: throughput and delay vs. load, with the knee and cliff marked; regions labeled under-utilization, over-utilization, saturation, and congestion collapse.]
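
The M/M/1 note on this slide can be made concrete with a few numbers. The following is a minimal sketch, assuming delay is normalized to the mean service time (so a value of 2.0 means "twice the service time"); the function name `mm1_delay` is illustrative and not from the lecture.

```python
def mm1_delay(utilization: float) -> float:
    """Normalized M/M/1 delay, 1 / (1 - utilization), in units of service time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # Delay grows slowly at low load and explodes as utilization approaches 1.
    for rho in (0.5, 0.75, 0.9, 0.99):
        print(f"utilization={rho:.2f}  normalized delay={mm1_delay(rho):.1f}")
```

Going from 50% to 75% utilization only doubles the delay, while the last few percent before saturation multiply it many times over, which is why operating near the knee rather than the cliff is attractive.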

Slide 19: Goals. Operate near the knee point. Remain in equilibrium. How to maintain equilibrium? Don't put a packet into the network until another packet leaves. How do you do it? Use ACKs: send a new packet only after you receive an ACK. Why? This keeps the number of packets in the network "constant".

Slide 20: How Do You Do It? Detect when the network approaches/reaches the knee point, and stay there. Questions: How do you get there? What if you overshoot (i.e., go over the knee point)? Possible solution: increase the window size until you notice congestion; decrease the window size if the network is congested.

Slide 22: Control System Model [CJ89]. Simple, yet powerful model. Explicit binary signal of congestion. Why explicit (TCP uses implicit)? Implicit allocation of bandwidth. [Figure: users 1..n offer loads x_1, x_2, ..., x_n; the network returns a binary feedback y indicating whether Σ x_i > X_goal.]

Slide 23: Objectives. Simple router behavior. Distributedness. Efficiency: X_knee = Σ x_i(t). Fairness: (Σ x_i)^2 / (n Σ x_i^2). Power: throughput^α / delay. Convergence: the control system must be stable and responsive.
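
The fairness objective on this slide is easy to evaluate numerically. Below is a minimal sketch of the index (Σ x_i)^2 / (n Σ x_i^2); the function name `fairness_index` is an illustrative choice, and the input is assumed to be a list of non-negative allocations with at least one non-zero entry.

```python
def fairness_index(allocations):
    """Fairness index (sum x_i)^2 / (n * sum x_i^2); 1.0 means all shares are equal."""
    n = len(allocations)
    sum_sq = sum(x * x for x in allocations)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero allocation")
    total = sum(allocations)
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    print(fairness_index([5, 5, 5, 5]))    # all users equal
    print(fairness_index([10, 0, 0, 0]))   # one user takes everything
```

The index is 1 when every user gets the same share and drops to 1/n when a single user holds the whole allocation, so it is independent of the unit in which the x_i are measured.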

Slide 24: Power. Power is the ratio of throughput to delay. [Figure: throughput/delay vs. load, peaking at the optimal load.]

Slide 25: Fair Allocation. Max-min fairness: flows which share the same bottleneck get the same amount of bandwidth. Assumes no knowledge of priorities. Fairness = 1 - distance from the fairness line. [Figure: two-user example with axes User 1: x_1 and User 2: x_2; the fairness line separates the region where user 2 gets too much from the region where user 1 gets too much.]

Slide 26: Basic Control Model. Let's assume window-based control. Reduce the window when congestion is perceived. How is congestion signaled? Either mark or drop packets. When is a router congested? Drop-tail queues: when the queue is full. Average queue length: at some threshold. Increase the window otherwise. Probe for available bandwidth: how?

Slide 27: Linear Control. Many different possibilities for reaction to congestion and probing. Examine simple linear controls: Window(t + 1) = a + b Window(t), with different a_i/b_i for increase and a_d/b_d for decrease. Supports various reactions to signals: increase/decrease additively; increase/decrease multiplicatively. Which of the four combinations is optimal?

Slide 28: Possible Choices. Multiplicative increase, additive decrease: a_I = 0, b_I > 1, a_D < 0, b_D = 1. Additive increase, additive decrease: a_I > 0, b_I = 1, a_D < 0, b_D = 1. Multiplicative increase, multiplicative decrease: a_I = 0, b_I > 1, a_D = 0, 0 < b_D < 1. Additive increase, multiplicative decrease: a_I > 0, b_I = 1, a_D = 0, 0 < b_D < 1. Which one?

Slide 29: Phase plots. What are desirable properties? What if flows are not equal? [Figure: phase plot with axes User 1's allocation x_1 and User 2's allocation x_2; the efficiency line and fairness line cross at the optimal point; overload lies above the efficiency line and underutilization below it.]

Slide 30: Phase plots. A simple way to visualize the behavior of competing connections over time. [Figure: phase plot with efficiency line and fairness line; axes x_1 and x_2.]

Slide 31: Additive Increase/Decrease. Both x_1 and x_2 increase/decrease by the same amount over time, so the operating point moves along a 45-degree line. Additive increase improves fairness and additive decrease reduces fairness. [Figure: phase plot showing the move from T_0 to T_1 relative to the efficiency and fairness lines.]

Slide 32: Multiplicative Increase/Decrease. Both x_1 and x_2 increase by the same factor over time. The operating point moves along a line extended from the origin: constant fairness. [Figure: phase plot showing the move from T_0 to T_1.]

Slide 33: Convergence to Efficiency. [Figure: phase plot starting from point x_H, with efficiency and fairness lines.]

Slide 34: Convergence to Fairness. [Figure: phase plot showing points x_H and x_H', with efficiency and fairness lines.]

Slide 35: Convergence to Efficiency & Fairness. [Figure: phase plot showing points x_H and x_H', with efficiency and fairness lines.]

Slide 36: Increase. [Figure: phase plot starting from point x_L, with efficiency and fairness lines.]

Slide 37: Constraints. Distributed efficiency: i.e., Σ Window(t+1) > Σ Window(t) during increase, which requires a_i > 0 and b_i ≥ 1; similarly, a_d < 0 and b_d ≤ 1 for decrease. Must never decrease fairness: the a's and b's must be ≥ 0; a_i/b_i > 0 and a_d/b_d ≥ 0. Full constraints: a_d = 0, 0 ≤ b_d < 1, a_i > 0 and b_i ≥ 1.

Slide 38: What is the Right Choice? The constraints limit us to AIMD. The increase can have a multiplicative term. AIMD moves towards the optimal point. [Figure: phase plot showing the trajectory x_0, x_1, x_2 approaching the intersection of the efficiency and fairness lines.]
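
The claim that AIMD moves towards the optimal point can be checked with a small simulation of the linear control Window(t+1) = a + b Window(t). This is a sketch under simplifying assumptions that are not spelled out on the slides: two users, synchronous rounds, and the same binary congestion signal delivered to both users whenever the total load exceeds the capacity. The parameter sets `AIMD`, `AIAD`, and `MIMD` and the function names are illustrative.

```python
def linear_step(x, capacity, a_i, b_i, a_d, b_d):
    """One synchronous round of the linear control for every user."""
    if sum(x) > capacity:                      # binary feedback: congested
        return [a_d + b_d * xi for xi in x]    # decrease policy
    return [a_i + b_i * xi for xi in x]        # increase policy


def simulate(x0, capacity, steps, a_i, b_i, a_d, b_d):
    """Run `steps` rounds starting from allocation x0 and return the final allocation."""
    x = list(x0)
    for _ in range(steps):
        x = linear_step(x, capacity, a_i, b_i, a_d, b_d)
    return x


# Illustrative coefficient choices for three of the four combinations.
AIMD = dict(a_i=1.0, b_i=1.0, a_d=0.0, b_d=0.5)
AIAD = dict(a_i=1.0, b_i=1.0, a_d=-1.0, b_d=1.0)
MIMD = dict(a_i=0.0, b_i=2.0, a_d=0.0, b_d=0.5)

if __name__ == "__main__":
    start = [1.0, 40.0]                        # very unfair starting point
    for name, params in (("AIMD", AIMD), ("AIAD", AIAD), ("MIMD", MIMD)):
        print(name, simulate(start, capacity=50.0, steps=200, **params))
```

In this model every multiplicative decrease halves the gap between the two users while additive increase leaves the gap unchanged, so AIMD closes the gap over time. AIAD keeps the gap constant and MIMD keeps the ratio constant, matching the phase-plot arguments.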

Slide 40: TCP Congestion Control. Motivated by the ARPANET congestion collapse. Underlying design principle: packet conservation. At equilibrium, inject a packet into the network only when one is removed; this is the basis for stability of physical systems. Why was this not working? The connection doesn't reach equilibrium; spurious retransmissions; resource limitations prevent equilibrium.

Slide 41: TCP Congestion Control: Solutions. Reaching equilibrium: slow start. Eliminating spurious retransmissions: accurate RTO estimation; fast retransmit. Adapting to resource availability: congestion avoidance.

Slide 42: TCP Congestion Control Basics. Keep a congestion window, cwnd, which denotes how much the network is able to absorb. Sender's maximum window: min(advertised window, cwnd). Sender's actual window: maximum window - unacknowledged segments. If we have a large actual window, should we send the data in one shot? No, use acks to clock the sending of new data.

Slide 43: Self-clocking. [Figure: sender and receiver connected through a bottleneck; packet spacing P_b at the bottleneck and P_r at the receiver; ack spacing A_r at the receiver, A_b at the bottleneck, and A_s at the sender.]

Slide 44: Slow Start. How do we get this clocking behavior to start? Initialize cwnd = 1. Upon receipt of every ack, cwnd = cwnd + 1. Implications: the window actually grows exponentially (it doubles every RTT), reaching W in about RTT x log2(W); it can overshoot the window and cause packet loss.
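
The RTT x log2(W) figure follows directly from the per-ack rule. Here is a minimal sketch, assuming an idealized sender that transmits a full window each round, loses nothing, and receives one ack per segment (no delayed acks); `slow_start_rtts` is an illustrative name.

```python
def slow_start_rtts(target_window: int) -> int:
    """Count the round trips slow start needs to grow cwnd from 1 to target_window."""
    cwnd = 1
    rtts = 0
    while cwnd < target_window:
        acks_this_round = cwnd          # one ack per segment sent in this round
        for _ in range(acks_this_round):
            cwnd += 1                   # slow start: cwnd = cwnd + 1 on every ack
        rtts += 1
    return rtts


if __name__ == "__main__":
    for w in (2, 8, 64, 1000):
        print(f"W={w:5d}  round trips={slow_start_rtts(w)}")
```

Each round doubles cwnd (1, 2, 4, 8, ...), so despite its name slow start is exponential; it is "slow" only compared with sending a full advertised window immediately.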

Slide 45: Slow Start Example. [Figure: timeline with one RTT between rounds and one packet time between packets; packet 1 is sent at 0R, packets 2-3 at 1R, packets 4-7 at 2R, and packets 8-15 at 3R.]

Slide 46: Slow Start Sequence Plot. [Figure: sequence number vs. time.]

Slide 47: Congestion Avoidance. Loss implies congestion: why? This is not necessarily true on all link types. If loss occurs when cwnd = W, the network can handle between 0.5W and W segments, so set cwnd to 0.5W (multiplicative decrease). Upon receiving an ACK, increase cwnd by 1/cwnd; this results in additive increase (about one segment per RTT).

Slide 48: Return to Slow Start. If a packet is lost, we lose our self-clocking as well. Need to implement slow start and congestion avoidance together. When a timeout occurs, set ssthresh to 0.5W. If cwnd < ssthresh, use slow start; else use congestion avoidance.
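
The way slow start and congestion avoidance share one window variable can be sketched as two small event handlers. This is an illustration rather than any particular TCP implementation: windows are counted in segments, and resetting cwnd to 1 on timeout is an assumption taken from the slow-start initialization rather than something stated on this slide. The function names are hypothetical.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Return the new cwnd after one new ack arrives."""
    if cwnd < ssthresh:
        return cwnd + 1.0           # slow start: +1 per ack (doubles per RTT)
    return cwnd + 1.0 / cwnd        # congestion avoidance: about +1 per RTT


def on_timeout(cwnd: float):
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    new_ssthresh = cwnd / 2.0       # ssthresh = 0.5 W
    new_cwnd = 1.0                  # assumption: restart from one segment
    return new_cwnd, new_ssthresh


if __name__ == "__main__":
    cwnd, ssthresh = 1.0, 8.0
    for _ in range(20):             # 20 acks: exponential, then linear growth
        cwnd = on_ack(cwnd, ssthresh)
    print("after 20 acks:", round(cwnd, 2))
    cwnd, ssthresh = on_timeout(cwnd)
    print("after timeout:", cwnd, round(ssthresh, 2))
```

After the timeout the sender slow-starts again until it reaches half of the window at which the loss occurred, then probes linearly from there.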

Slide 49: Overall TCP Behavior. [Figure: window vs. time.]

Slide 50: Congestion Window. Slow start follows each timeout. A timeout just wastes resources and time. How to prevent it: do not wait! Use fast retransmission. [Figure: congestion window vs. time, with a slow start phase after each timeout.]

Slide 51: Fast Retransmit. Resend a segment after 3 duplicate ACKs. A duplicate ACK means that an out-of-sequence segment was received. Notes: duplicate ACKs can also be due to packet reordering (why reordering?); the window may be too small to generate enough duplicate ACKs. Then what? Slow start. [Figure: sender grows cwnd from 1 to 2 to 4 while sending segments 1-7; the receiver returns ACK 2, ACK 3, ACK 4, and then 3 duplicate ACKs carrying ACK 4.]
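
The duplicate-ACK rule can be sketched as a small counter on the sender side. The class below is illustrative only: it assumes cumulative acks that carry the next expected segment number, and the names `DupAckDetector` and `on_ack` are hypothetical.

```python
class DupAckDetector:
    """Signal a fast retransmit when the same cumulative ack repeats `threshold` times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int):
        """Return the segment number to retransmit, or None if no action is needed."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack_no               # third duplicate: resend this segment
        else:
            self.last_ack = ack_no          # new data acked: reset the counter
            self.dup_count = 0
        return None


if __name__ == "__main__":
    detector = DupAckDetector()
    for ack in [2, 3, 4, 4, 4, 4]:
        print(ack, "->", detector.on_ack(ack))
```

Waiting for three duplicates rather than one gives mild reordering a chance to resolve itself before the sender concludes that a segment was lost.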

Slide 52: Fast Recovery. A duplicate ack notifies the sender that a packet has departed the network. When fewer than cwnd packets are outstanding, allow new packets out with each new duplicate acknowledgement. Behavior: the sender is idle for some time, waiting for ½ cwnd worth of dupacks; it transmits at the original rate after the wait; the ack clocking rate is the same as before the loss. At the end: no slow start; set W = W/2 and go to AIMD.

Slide 53: Fast Retransmit. [Figure: sequence number vs. time; a loss (X) is followed by duplicate acks and then the retransmission.]

Slide 54: Fast Recovery. [Figure: sequence number vs. time; after W/2 dupacks arrive, one new packet is sent for each further dupack.]

Slide 55: Multiple Losses. [Figure: sequence number vs. time; four losses (X), duplicate acks, and one retransmission.] Now what?

Slide 56: Tahoe. [Figure: sequence number vs. time with four losses (X).] Slow start again.

Slide 57: TCP Reno (1990). All mechanisms in Tahoe. Addition of fast recovery: opening up the congestion window after fast retransmit. Delayed acks. Header prediction: an implementation technique designed to improve performance, with common-case code inlined. With multiple losses, Reno typically times out because it does not see duplicate acknowledgements.

Slide 58: TCP Reno. Fast retransmit: retransmit a segment after 3 duplicate acks. Fast recovery: reduce cwnd to half instead of to one. [Figure: cwnd vs. time, showing slow start, congestion avoidance, a timeout, and fast recovery episodes.]

Slide 59: Reno. [Figure: sequence number vs. time with four losses (X).] Now what? Timeout.

Slide 60: NewReno. The ack that arrives after a retransmission (a partial ack) should indicate that a second loss occurred. When does NewReno time out? When there are fewer than three dupacks for the first loss; when a partial ack is lost. How fast does it recover losses? One per RTT.

Slide 61: NewReno. [Figure: sequence number vs. time with four losses (X).] Now what? Partial ack recovery.

Slide 62: SACK. The basic problem is that cumulative acks provide little information. Alternative 1: ack just the packet received. What if acks are lost? Carry the cumulative ack as well. Not used. Alternative 2: a bitmask of packets received, i.e., selective acknowledgement (SACK). How to deal with reordering?

Slide 63: SACK. [Figure: sequence number vs. time with four losses (X).] Now what? Send retransmissions as soon as losses are detected.

Slide 64: Performance Issues. A timeout costs far more than a fast retransmit. Fast retransmit needs 3 dupacks/sacks: not great for small transfers, which don't have 3 packets outstanding. What are real loss patterns like? Right-edge recovery: allow packets to be sent on arrival of the first and second duplicate ack; this helps recovery for small windows. How to deal with reordering?

Slide 65: NewReno Changes. Send a new packet out for each pair of dupacks: adapt more gradually to the new window. Will not halve the congestion window again until recovery is completed: identifies congestion events vs. congestion signals. Initial estimation for ssthresh.

Slide 66: Rate Halving Recovery. [Figure: sequence number vs. time; a new packet is sent after every other dupack.]

Slide 67: Delayed Ack Impact. TCP congestion control is triggered by acks: if the sender receives half as many acks, the window grows half as fast. Slow start with window = 1 will trigger the delayed ack timer, so the first exchange will take at least 200 ms. Remedy: start with an initial window > 1. This was a bug in BSD and is now a "feature"/standard.

Slide 68: TCP Congestion Control. End-to-end control (no network assistance). The transmission rate is limited by the congestion window size, Congwin, measured in segments. With w segments, each of MSS bytes, sent in one RTT: throughput = (w x MSS) / RTT bytes/sec.
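
A quick worked example of the throughput formula, as a sketch: the function name and the sample numbers (a 10-segment window, 1460-byte MSS, 100 ms RTT) are illustrative assumptions, not values from the lecture.

```python
def window_throughput(w_segments: float, mss_bytes: float, rtt_seconds: float) -> float:
    """Throughput in bytes/sec when w segments of MSS bytes are sent per RTT."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    return w_segments * mss_bytes / rtt_seconds


if __name__ == "__main__":
    rate = window_throughput(w_segments=10, mss_bytes=1460, rtt_seconds=0.1)
    print(f"{rate:.0f} bytes/sec = {rate * 8 / 1e6:.3f} Mbit/s")
```

For a fixed window the rate is inversely proportional to the RTT, which is one reason window-based control favors flows with short round-trip times.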

Slide 69: Fast Retransmit and Recovery in Reno. Upon reception of 3 dupes, set thresh = congwin/2 and set congwin = thresh + 3; this accounts for the 3 packets that have left the network. Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed. When a new ack* finally arrives, set congwin = thresh; we are in congestion avoidance again with a "deflated window". *"New ack" means an ack for any data not yet acked, but inside congwin.
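
The inflate-then-deflate behavior of the window can be traced with a small state holder. This is a sketch, not Reno's actual code: windows are counted in whole segments, the threshold is assumed to be set to half the congestion window at the moment the third dupe arrives, and the class and method names are hypothetical.

```python
class RenoFastRecovery:
    """Track congwin and thresh through one fast retransmit / fast recovery episode."""

    def __init__(self, congwin: float):
        self.congwin = float(congwin)
        self.thresh = None
        self.in_recovery = False

    def on_third_dupack(self):
        self.thresh = self.congwin / 2.0    # assumption: half the current window
        self.congwin = self.thresh + 3      # 3 dupes = 3 packets left the network
        self.in_recovery = True

    def on_extra_dupack(self):
        if self.in_recovery:
            self.congwin += 1               # inflate: another packet has departed

    def on_new_ack(self):
        if self.in_recovery:
            self.congwin = self.thresh      # deflate and resume congestion avoidance
            self.in_recovery = False


if __name__ == "__main__":
    tcp = RenoFastRecovery(congwin=16)
    tcp.on_third_dupack()
    print("after 3 dupes:", tcp.congwin, "thresh:", tcp.thresh)
    for _ in range(5):
        tcp.on_extra_dupack()
    print("inflated:", tcp.congwin)
    tcp.on_new_ack()
    print("deflated:", tcp.congwin)
```

The temporary inflation lets the sender keep the ack clock running during recovery; once new data is acknowledged the window drops back to the halved value.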

Slide 70: Fast Retransmit and Recovery in NewReno. Upon reception of 3 dupes, set thresh = congwin/2 and set congwin = thresh + 3; this accounts for the 3 packets that have left the network. Increment congwin for each dupe subsequently received. Transmit a new segment if we are allowed. When a new ack* finally arrives, set congwin = thresh; we are in congestion avoidance again. *But only do this if the ack received is for the highest sequence number sent, which avoids a stall while recovering.

Slide 71: TCP & Routers. How can routers help congestion control? Indeed, congestion control and queue management are the same problem: "resource allocation". Topics: RED; XCP. Read Chapter 6 of the book; also look at [FJ93] Random Early Detection Gateways for Congestion Avoidance.

Slide 72: Queuing Disciplines. Each router must implement some queuing discipline. Queuing allocates both bandwidth and buffer space. Bandwidth: which packet to serve (transmit) next. Buffer space: which packet to drop next (when required). Queuing also affects latency.

Slide 73: Packet Drop Dimensions. Aggregation: from per-connection state, through class-based queuing, to a single class. Drop position: head, tail, or random location. Drop timing: early drop or overflow drop.

Slide 74: Typical Internet Queuing. FIFO + drop-tail: the simplest choice, used widely in the Internet. FIFO (first-in-first-out) implies a single class of traffic. Drop-tail: arriving packets get dropped when the queue is full, regardless of flow or importance. Important distinction: FIFO is a scheduling discipline; drop-tail is a drop policy.

Slide 75: Active Queue Management. Design active router queue management to aid congestion control. Why? Routers can distinguish between propagation delays and persistent queuing delays. Routers can decide on transient congestion, based on workload.

Slide 76: Active Queue Designs. Modify both routers and hosts: DECbit, a congestion bit in the packet header. Modify routers only, hosts use TCP: fair queuing (per-connection buffer allocation); RED (Random Early Detection), which drops a packet or sets a bit in the packet header as soon as congestion is starting.

Slide 77: Random Early Detection (RED). Detect incipient congestion while allowing bursts. Keep power (throughput/delay) high: keep the average queue size low; assume hosts respond to lost packets. Avoid window synchronization: randomly mark packets. Avoid bias against bursty traffic. Some protection against ill-behaved users.

Slide 78: RED Algorithm. Maintain a running average of the queue length, avgq. If avgq < min_th, do nothing: low queuing, send packets through. If avgq > max_th, drop the packet: protection from misbehaving sources. Else mark the packet with a probability proportional to the queue length: notify sources of incipient congestion.

Slide 79: RED Operation. [Figure: a queue with min_th and max_th marked against the average queue length; plot of P(drop) vs. average queue length, zero below min_th, rising linearly to max_p at max_th, then jumping to 1.0.]

Slide 80: RED Algorithm. Maintain a running average of the queue length. Byte mode vs. packet mode: why? For each packet arrival: calculate the average queue size avgq. If min_th ≤ avgq < max_th: calculate probability P_a; with probability P_a, mark the arriving packet. Else if max_th ≤ avgq: mark the arriving packet.

Slide 81: Queue Estimation. Standard EWMA: avgq = (1 - w_q) avgq + w_q qlen. Special fix for idle periods: why? The upper bound on w_q depends on min_th: we want to ignore transient congestion; we can calculate the queue average if a burst arrives; set w_q such that a certain burst size does not push the average above min_th. The lower bound on w_q comes from the need to detect congestion relatively quickly. Typical w_q = 0.002.
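
The effect of a small w_q on bursts can be seen by running the EWMA over a growing queue. This is a minimal sketch, assuming the average is updated once per packet arrival and that a burst of L packets arrives at an empty, idle queue so the instantaneous length goes 1, 2, ..., L; the function names are illustrative.

```python
def ewma_update(avgq: float, qlen: float, w_q: float = 0.002) -> float:
    """One step of avgq = (1 - w_q) * avgq + w_q * qlen."""
    return (1.0 - w_q) * avgq + w_q * qlen


def avg_after_burst(burst_len: int, w_q: float = 0.002) -> float:
    """Average queue estimate after a burst of burst_len packets hits an empty queue."""
    avgq = 0.0
    for qlen in range(1, burst_len + 1):    # queue grows by one packet per arrival
        avgq = ewma_update(avgq, qlen, w_q)
    return avgq


if __name__ == "__main__":
    for burst in (10, 50, 100):
        print(f"burst={burst:3d}  avgq={avg_after_burst(burst):.2f}")
```

With w_q = 0.002 a 50-packet burst moves the average by only a couple of packets, so it would stay below a min_th of, say, 5 and the burst would pass unmarked.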

Slide 82: Thresholds. min_th is determined by the utilization requirement: a tradeoff between queuing delay and utilization. Relationship between max_th and min_th: we want to ensure that feedback has enough time to make a difference in load; this depends on the average queue increase in one RTT. The paper suggests a ratio of two; the current rule of thumb is a factor of three.

Slide 83: Packet Marking. The marking probability is based on the queue length: P_b = max_p (avgq - min_th) / (max_th - min_th). Marking based on P_b alone can lead to clustered marking, which could result in synchronization. It is better to bias P_b by the history of unmarked packets: P_a = P_b / (1 - count x P_b), where count is the number of packets that have gone unmarked since the last mark.
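
The two marking formulas can be written out directly. This is a sketch under stated assumptions: it covers only the region min_th ≤ avgq < max_th, `count` is taken to be the number of packets left unmarked since the last mark, and the probability is clamped to 1 when the denominator reaches zero. The function names are illustrative.

```python
def red_base_probability(avgq: float, min_th: float, max_th: float, max_p: float) -> float:
    """P_b = max_p * (avgq - min_th) / (max_th - min_th), valid for min_th <= avgq < max_th."""
    if not min_th <= avgq < max_th:
        raise ValueError("avgq must lie in [min_th, max_th)")
    return max_p * (avgq - min_th) / (max_th - min_th)


def red_mark_probability(p_b: float, count: int) -> float:
    """P_a = P_b / (1 - count * P_b), clamped to 1.0."""
    denom = 1.0 - count * p_b
    if denom <= 0.0:
        return 1.0                      # the next packet is certainly marked
    return min(1.0, p_b / denom)


if __name__ == "__main__":
    p_b = red_base_probability(avgq=10, min_th=5, max_th=15, max_p=0.1)
    for count in (0, 5, 10, 19):
        print(f"count={count:2d}  P_a={red_mark_probability(p_b, count):.3f}")
```

Because P_a rises with every unmarked packet, marks end up spaced roughly evenly instead of arriving in clusters, which is the bias toward history the slide asks for.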

Slide 84: Packet Marking. max_p is reflective of typical loss rates. The paper uses 0.02; 0.1 is a more realistic value. If the network needs marking of 20-30%, then you need to buy a better link! Gentle variant of RED (recommended): vary the drop rate from max_p to 1 as avgq varies from max_th to 2 x max_th. This is more robust to the setting of max_th and max_p.
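
The gentle variant changes only what happens above max_th. The following sketch gives the whole piecewise curve, assuming linear interpolation in both sloped regions (the slide states the endpoints but not the shape between them); `gentle_red_probability` is an illustrative name.

```python
def gentle_red_probability(avgq: float, min_th: float, max_th: float, max_p: float) -> float:
    """Drop/mark probability as a function of the average queue length (gentle RED)."""
    if avgq < min_th:
        return 0.0                                           # no congestion
    if avgq < max_th:
        return max_p * (avgq - min_th) / (max_th - min_th)   # 0 -> max_p
    if avgq < 2 * max_th:
        return max_p + (1.0 - max_p) * (avgq - max_th) / max_th  # max_p -> 1
    return 1.0                                               # drop everything


if __name__ == "__main__":
    for q in (4, 10, 15, 22.5, 30):
        print(f"avgq={q:5.1f}  p={gentle_red_probability(q, 5, 15, 0.1):.3f}")
```

Replacing the jump from max_p to 1 with a ramp means a slightly mis-set max_th causes a gradual increase in drops rather than a sudden switch to dropping every packet.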

Slide 85: Extending RED for Flow Isolation. Problem: what to do with non-cooperative flows? Fair queuing achieves isolation using per-flow state, which is expensive at backbone routers. How can we isolate unresponsive flows without per-flow state? RED penalty box: monitor the history of packet drops, identify flows that use disproportionate bandwidth, then isolate and punish those flows.

Slide 86: FRED. Fair Random Early Drop (Sigcomm, 1997). Maintain per-flow state only for active flows (ones having packets in the buffer). min_q and max_q: the minimum and maximum number of buffers a flow is allowed to occupy. avgcq: the average number of buffers per flow. strike: a count of the number of times a flow has exceeded max_q.

Slide 87: How does XCP Work? [Figure: each packet carries a congestion header with the fields Round Trip Time, Congestion Window, and Feedback; here Feedback = +0.1 packet.]

Slide 88: How does XCP Work? [Figure: the same congestion header; the Feedback field changes from +0.1 packet to -0.3 packet.]

Slide 89: How does XCP Work? Congestion Window = Congestion Window + Feedback. Routers compute feedback without any per-flow state. XCP extends ECN and CSFQ.

Slide 90: How Does an XCP Router Compute the Feedback? Congestion controller (MIMD). Goal: match input traffic to link capacity and drain the queue. Looks at aggregate traffic and the queue. Algorithm: aggregate traffic changes by Δ, where Δ ~ spare bandwidth and Δ ~ -queue size; so Δ = α d_avg Spare - β Queue. Fairness controller (AIMD). Goal: divide Δ between flows so as to converge to fairness. Looks at a flow's state in the congestion header. Algorithm: if Δ > 0, divide Δ equally between flows; if Δ < 0, divide Δ between flows proportionally to their current rates.
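
The two controllers can be sketched as two small functions. This is an illustration of the slide's description, not the router algorithm itself: a real XCP router apportions feedback per packet without per-flow state, whereas this sketch takes an explicit list of flow rates for clarity. The default gains `alpha=0.4` and `beta=0.226` are the constants reported in the XCP paper and should be treated as assumptions here; all function and parameter names are hypothetical.

```python
def aggregate_feedback(capacity, input_rate, queue, avg_rtt, alpha=0.4, beta=0.226):
    """Congestion controller: feedback = alpha * d_avg * Spare - beta * Queue."""
    spare = capacity - input_rate           # spare bandwidth (negative if overloaded)
    return alpha * avg_rtt * spare - beta * queue


def divide_feedback(feedback, rates):
    """Fairness controller: equal shares when positive, proportional when negative."""
    if not rates:
        raise ValueError("need at least one flow")
    if feedback > 0:
        share = feedback / len(rates)       # additive increase: same for every flow
        return [share for _ in rates]
    total = sum(rates)
    if total == 0:
        raise ValueError("cannot split negative feedback across idle flows")
    return [feedback * r / total for r in rates]   # multiplicative decrease


if __name__ == "__main__":
    fb = aggregate_feedback(capacity=1000.0, input_rate=800.0, queue=0.0, avg_rtt=0.1)
    print("aggregate feedback:", fb)
    print("positive split:", divide_feedback(10.0, [1.0, 3.0]))
    print("negative split:", divide_feedback(-8.0, [1.0, 3.0]))
```

Separating the two decisions is the point of the design: how much total traffic should change is decided from aggregate measurements, and who gets the change is decided by an AIMD-style rule that pulls flows toward equal rates.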

Slide 91: Getting the devil out of the details. Congestion controller: Δ = α d_avg Spare - β Queue. Theorem: the system converges to optimal utilization (i.e., is stable) for any link bandwidth, delay, and number of sources if: [condition on α and β not preserved in this transcript] (proof based on the Nyquist criterion). No parameter tuning. Fairness controller: if Δ > 0, divide Δ equally between flows; if Δ < 0, divide Δ between flows proportionally to their current rates. Need to estimate the number of flows N [formula not preserved], using RTT_pkt: the round trip time in the header; Cwnd_pkt: the congestion window in the header; T: the counting interval. No per-flow state.

Slide 92: Lessons. TCP alternatives: TCP is being used in new/unexpected ways; key changes are needed. Routers: FIFO with drop-tail interacts poorly with TCP; various schemes exist to desynchronize flows and control the loss rate. Fair queuing: clean resource allocation to flows, but complex packet classification and scheduling. Core-stateless FQ & XCP: coarse-grain fairness; carrying state in packets can reduce complexity.

Slide 93: Next Lecture: TCP behavior and New Versions. High-speed TCPs. Assigned reading: [BP95] TCP Vegas: End to End Congestion Avoidance on a Global Internet; [FHPW00] Equation-Based Congestion Control for Unicast Applications.
