Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute shivkuma@ecse.rpi.edu http://www.ecse.rpi.edu/Homepages/shivkuma

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 2 q TCP interactive data flow q TCP bulk data flow q TCP congestion control q TCP timers q TCP futures and performance Ref: Chap 19-24; RFC 793, 1323, 2001, papers by Jacobson, Karn/Partridge Overview

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 3 Reliability models q Reliability fundamentally requires redundancy to recover from uncertain loss or other failure modes. q Two types of redundancy: q Spatial redundancy: independent backup copies q Forward error correction (FEC) codes q Problem: requires huge overhead, since the FEC is also part of the packet(s) it cannot recover from erasure of all packets q Temporal redundancy: retransmit if packets lost/error q Requires trading off response time for reliability q Design of status reports and retransmission optimization (see next slide) important

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 4 Temporal Redundancy model

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 5 Status report design q Cumulative acks: q Robust to losses on the reverse channel q Can work with go-back-N retransmission q Cannot pinpoint blocks of data which are lost q The first lost packet can be pinpointed because the receiver would generate duplicate acks q Selective acks: q For a byte-stream model like TCP, need to specify ranges of bytes received (requires large overhead) q SACK is a TCP option over-and-above the cumulative acks q Bitmaps are not efficient because a bit is needed for every byte q NAKs have same problems like SACKs and bitmaps, but also are not robust to reverse channel losses

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 6 Retransmission optimization q Default retransmission: q Go-back-N: I.e. retransmit the entire window. q Triggered by timeout or persistent loss in TCP q Not efficient if windows are large: high speed n/ws q Selective retransmission: q Retransmit one packet based upon duplicate acks q Recovers quickly from isolated loss, but not from burst loss q SACK allows pinpointing retransmissions to just cover ranges of lost packets q Such retransmitted packets must finally be confirmed by acks since SACK is only an option and not reliable

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 7 TCP Interactive Data Flow q Problems: q Overhead: 40 bytes header + 1 byte data q To batch or not to batch: response time important q Batching acks: q Delay-ack timer: piggyback ack on echo q 200 ms timer (fig 19.3) q Batching data: q Nagle’s algo: Don’t send packet until next ack is received. q Developed because of congestion in WANs

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 8 TCP Bulk Data Flow q Sliding window: q Send multiple packets while waiting for acks (fig 20.1) upto a limit (W) q Receiver need not ack every packet q Acks are cumulative. q Ack # = Largest consecutive sequence number received + 1 q Two transfers of the data can have different dynamics (eg: fig 20.1 vs fig 20.2) q Receiver window field: q Reduced if TCP receiver short on buffers

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 9 TCP Bulk Data Flow (Contd) q End-to-end flow control q Window update acks: receiver ready q Default buffer sizes: 4096 to 16384 bytes. q Ideal: window and receiver buffer = bandwidth- delay product q TCP window terminology: figs 20.4, 20.5, 20.6 q Right edge, Left edge, usable window q “closes” => left edge (snd_una) advances q “opens” => right edge advances (receiver buffer freed => receiver window increases) q “shrinks” => right edge moves to left (rare)

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 10 The Congestion Problem q Problem: demand outstrips available capacity … q Q: Will the “congestion” problem be solved when: q a) Memory becomes cheap (infinite memory)? No buffer Too late All links 19.2 kb/s Replace with 1 Mb/s S S S S S S S S S S S S S S S S File Transfer Time = 7 hours File Transfer time = 5 mins q b) Links become cheap (high speed links)?

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 11 Ans: None of the above solves congestion ! q Congestion: Demand > Capacity q It is a dynamic problem => Static solutions are not sufficient q TCP provides a dynamic solution A B S C D Scenario: All links 1 Gb/s. A & B send to C. q c) Processors become cheap (fast routers switches)?

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 12 i  ii q If information about i, and  is known in a central location where control of i can be effected with zero time delays, the congestion problem is solved. q Problems: q Incomplete information (eg: loss indications) q Distributed solution required q Congestion and control/measurement locations different q Time-varying, heterogeneous time-delays

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 13 TCP Congestion Control q Window flow control: avoid receiver overrun q Dynamic window congestion control: avoid/control network overrun q Observation: Not a good idea to start with a large window and dump packets into network q Treat network like a black box and start from a window of 1 segment (“slow start”) q Increase window size exponentially (“exponential increase”) over successive RTTs => quickly grow to claim available capacity. q Technique: Every ack: increase cwnd (new window variable) by 1 segment. q Effective window = Min(cwnd, Wrcvr)

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 14 Dynamics q Rate of acks = rate of packets at the bottleneck: “Self-clocking” property. 100 Mbps10 Mbps Router Q 1st RTT 2nd RTT3rd RTT4th RTT

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 15 Congestion Detection q Packet loss as an indicator of congestion. q Set slow start threshold (ssthresh) to min(cwnd, Wrcvr)/2 q Retransmit pkt, set cwnd to 1 (reenter slow start) Time (units of RTTs) Congestion Window (cwnd) Receiver Window Idle Interval Timeout 1 ssthresh

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 16 Congestion avoidance q Increment cwnd by 1 per ack until ssthresh q Increment by 1/cwnd per ack afterwards (“Congestion avoidance” or “linear increase”) q Idea: ssthresh estimates the bandwidth-delay product for the connection. q Initialization: ssthresh = Receiver window or default 65535 bytes. Larger values thru options. q If source is idle for a long time, cwnd is reset to one MSS.

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 17 q Implications of using packet loss as congestion indicator q Late congestion detection if the buffer sizes larger q Higher speed links or large buffers => larger windows => higher probability of burst loss q Interactions with retransmission algorithm and timeouts q Implications of ack-clocking q More batching of acks => bursty traffic (harder to manage) q Less batching leads to a large fraction of Internet traffic being just acks (huge overhead) q Additive Increase/Multiplicative Decrease Dynamics: q TCP approximates these dynamics

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 18 Timeout and RTT Estimation q Timeout: for robust detection of packet loss q Problem: How long should timeout be ? q Too long => underutilization; too short => wasteful retransmissions q Solution: adaptive timeout: based on RTT q RTT estimation: q Early method: exponential averaging: q R   *R + (1 -  )*M { M =measured RTT} q RTO =  *R {  = delay variance factor} q Suggested values:  = 0.9,  = 2

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 19 RTT Estimation q Jacobson [1988]: this method has problems w/ large RTT fluctuations q New method: Use mean & deviation of RTT q A = smoothed average RTT q D = smoothed mean deviation q Err = M - A { M = measured RTT} q A  A + g*Err {g = gain = 0.125} q D  D + h*(|Err| - D) {h = gain = 0.25} q RTO = A + 4D q Integer arithmetic used throughout. Complex initialization process...

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 20 Timer Backoff/Karn’s Algorithm q Timer backoff: If timeout, RTO = 2*RTO {exponential backoff} q Retransmission ambiguity problem: q During retransmission, it is unclear whether an ack refers to a packet or its retransmission. Problem for RTT estimation q Karn/Partridge: don’t update RTT estimators during retransmission. q Restart RTO only after an ack received for a segment that is not retransmitted

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 21 Fast Retransmit and Recovery q Goals: q Timeout avoidance: The 500 ms timer granularity can have an adverse performance impact especially for high speed n/ws q Selective retransmission: Especially when packets are dropped due to error or light congestion q Fast Recovery: Converge quickly to a state of congestion avoidance (linear increase) with half-current window -- the assumed ideal window size. q Observation: Receivers are required to send an immediate duplicate acknowledgment when they receives out-of-order data segments.

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 22 Fast Retransmit and Recovery q 3 duplicate acks => assume loss q More duplicate acks => other packets have reached destination safely. q Wait for about 1/2*RTT, and resume transmitting new segments for every subsequent duplicate ack received. Stop this process once the ack for the missing segment received 0 500 Ack 500 FRR

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 23 Fast Retransmit and Recovery q Fast Retransmit: Received third duplicate ack: q Set ssthresh to 1/2 of current cwnd q Retransmit the missing segment q Set cwnd to ssthresh+3 q Fast Recovery: For each duplicate ack hence: q Increment cwnd by 1 MSS q New packets are transmitted once cwnd grows large enough. q [If old cwnd was a pipe of length 1*RTT, the network gets a relief period of 1/2*RTT]

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 24 FRR (contd) q Upon receiving the next (non-duplicate) Ack: q Set cwnd to ssthresh & enter linear growth phase CWND TIME CWND/2 New packets sent during this phase

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 25 FRR problems q Burst loss of 3 pkts => Timeout + window shutdown to cwnd/8 ! CWND Time 1st Fast Retransmit 2nd Fast Retransmit Timeout CWND/2 CWND/4 CWND/8 W

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 26 TCP Performance Optimization q SACK: selective acknowledgments: specifies blocks of packets received at destination. q Random early drop (RED) scheme spreads the dropping of packets more uniformly and reduces average queue length and packet loss rate. q Scheduling mechanisms protect well-behaved flows from rogue flows. q Explicit Congestion Notification (ECN): routers use a explicit bit-indication for congestion instead of loss indications.

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 27 Congestion control summary q Sliding window limited by receiver window. q Dynamic windows: slow start (exponential rise), congestion avoidance (linear rise), multiplicative decrease. q Adaptive timeout: need mean RTT & deviation q Timer back off and Karn’s algo during retransmission q Go-back-N or Selective retransmission q Cumulative and Selective acknowledgements q Timeout avoidance: FRR q Drop policies, scheduling and ECN

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 28 TCP Persist Timer q Receiver flow control can set window to zero q Receiver later sends “window update acks” q But TCP does not transmit acks reliably => update acks may be lost and source may be stuck at a zero window value q TCP uses persist timer to query the receiver periodically to find if the window has been increased. q Persist timer always bounded between 5s and 60s. It does exponential backoff like other timers too.

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 29 Silly Window Syndrome q A) The system operates at a small window (sends segments which are not MSS-sized) even if the receiver grants a large window. q B) Receiver advertises small windows. q Solution: batching q Receiver must not advertise small windows q Sender waits until segment full before sending (extension of Nagle’s algo), q It can transmit everything if it is not waiting for any ACK (or if Nagle’s algo has been disabled)

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 30 TCP Keepalive timer q Optional timer. q Not part of TCP spec, but found in most implementations. q Not necessary, because “connection” defined by endpoints. q Connection can be “up”as long as source/destination “up”. q Typical use: to detect idle clients or half-open connections and de-allocate server resources tied up to them. Eg: telnet, ftp.

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 31 Gigabit Networks q “Higher Bandwidth Networks” q Propagation latency unchanged. q Increasing bandwidth from 1.5Mb/s to 45 Mb/s (factor of 29) decreases file transfer time of 1MB by a factor of 25. q But, increasing from 1 Gb/s to 2 Gb/s gives an improvement of only 10% ! q Transfer time = propagation time + transmission time + queueing/processing. q Design networks to minimize delay (queueing, processing, reduce retransmission latency)

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 32 q Long Fat Pipe Networks (LFN): Satellite links q Need very large window sizes. q Normally, Max window = 2 16 = 64 KBytes q Window scale: Window = W × 2 Scale Window Scaling Option Kind = 3Length = 3Scale q Max window = 2 16 × 2 255 q Option sent only in SYN and SYN + Ack segments. q RFC 1323

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 33 Timestamp option q For LFNs, need accurate and more frequent RTT estimates. q Timestamp option: q Place a timestamp value in any segment. q Receiver echoes timestamp value in ack q If acks are delayed, the timestamp value returned corresponds to the earliest segment being acked. q Segments lost/retransmitted => RTT overestimated

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 34 PAWS: Protection against wrapped sequence numbers q Largest receiver window = 2^30 = 1 GB q “Lost” segment may reappear before MSL, and the sequence numbers may have wrapped around q The receiver considers the timestamp as an extension of the sequence number => discard out-of-sequence segment based on both seq # and timestamp. q Reqt: timestamp values need to be monotonically increasing, and need to increase by at least one per window

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 35 Summary q Interactive and bulk TCP flow q TCP congestion control q Informal exercises: Perform some of the experiments described in chaps 19-21 to see various facets of TCP in action

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute

Similar presentations

Presentation on theme: "Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute

Similar presentations

Presentation on theme: "Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 TCP (Part II) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute"— Presentation transcript:

Similar presentations

About project

Feedback