TCP Timeout and Retransmission

Chapter 14 TCP Timeout and Retransmission

Triggers for Retransmissions
Time based
  RTO (Retransmission Timeout) upon non-receipt of ACKs
Structure of ACKs
  Sufficient number of duplicate ACKs indicating a missed packet

How to Set the RTO: The Classic Method used in Original TCP specs
SRTT ← α(SRTT) + (1 – α) RTT_s
  RTT_s is the new RTT sample
  α is a smoothing factor with a recommended value between 0.8 and 0.9
RTO = min(ubound, max(lbound, β(SRTT)))
  β is a delay variance factor with a recommended value of 1.3 to 2.0
  ubound is an upper bound (suggested to be 1 minute) and lbound is a lower bound (suggested to be 1s) on the RTO
Issues with this solution?
  It generally results in the RTO being set either to 1s or to about β(SRTT)
  OK for stable values of RTT
  With highly variable RTTs (e.g., early packet radio networks), it did not perform so well
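The classic update can be sketched in a few lines of Python. This is only an illustration of the two formulas above. The function name and the particular defaults (α = 0.85, β = 2.0) are assumptions picked from within the recommended ranges, and times are in seconds.

```python
def classic_rto(srtt, rtt_sample, alpha=0.85, beta=2.0, lbound=1.0, ubound=60.0):
    """Classic method: smooth the RTT, then scale and clamp it to get the RTO.

    All times are in seconds. alpha and beta are illustrative picks from the
    recommended ranges (0.8 to 0.9 and 1.3 to 2.0).
    """
    new_srtt = alpha * srtt + (1 - alpha) * rtt_sample
    rto = min(ubound, max(lbound, beta * new_srtt))
    return new_srtt, rto


if __name__ == "__main__":
    # A 100 ms smoothed RTT with a 300 ms sample: the 1 s lower bound dominates.
    print(classic_rto(0.100, 0.300))
    # A stable 2 s RTT: the RTO settles at about beta * SRTT.
    print(classic_rto(2.0, 2.0))
```

For short RTTs the lower bound decides the RTO, and for long stable RTTs the result is about β(SRTT), which is the behavior noted under "Issues with this solution?".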

Jacobson found problems with the Classic Method
Timer can’t keep up with wide fluctuations in the RTT
  Causes unnecessary retransmissions when the real RTT is much larger
  Unnecessary retransmissions add to the network load when the network is already loaded and the RTT is increasing
Variance needs to be accounted for

Mean Deviation
Why not standard deviation?
  Square and square root operations take time
Mean deviation
  Deviation from the mean: the absolute difference of the new sample M from the running average srtt, |M - srtt|

Jacobson’s Algorithm (The Standard Method, 1988)
srtt ← (1 - g)(srtt) + (g)M
rttvar ← (1 - h)(rttvar) + (h)(|M - srtt|)
RTO = srtt + 4(rttvar)
  The gain g is the weight given to a new RTT sample M in the average srtt and is set to 1/8
  h is the weight given to a new mean deviation sample for the deviation estimate rttvar and is set to 1/4

Jacobson’s algorithm (Equivalent Representation)
Used for faster implementation:
  Err = M – srtt
  srtt ← srtt + g(Err)
  rttvar ← rttvar + h(|Err| – rttvar)
  RTO = srtt + 4(rttvar)
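A short derivation shows why the two representations agree. It assumes that |M - srtt| in the standard form is evaluated with the srtt value from before the update, which is how Err is defined here.

```latex
\begin{aligned}
\mathrm{Err} &= M - \mathit{srtt} \\
\mathit{srtt} + g\,\mathrm{Err}
  &= \mathit{srtt} + g\,(M - \mathit{srtt})
   = (1 - g)\,\mathit{srtt} + g\,M \\
\mathit{rttvar} + h\,(|\mathrm{Err}| - \mathit{rttvar})
  &= (1 - h)\,\mathit{rttvar} + h\,|M - \mathit{srtt}|
\end{aligned}
```

The error form needs one subtraction that is shared by both updates, and with g = 1/8 and h = 1/4 the multiplications can be done with shifts in integer arithmetic.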

Considering Clock Granularity
TCP maintains a clock that advances with the system clock
  Traditionally 500 ms granularity
  Recent implementations use smaller values (e.g., 1ms in Linux)
Clock granularity is used to refine the RTO computation
  RTO = max(srtt + max(G, 4(rttvar)), 1000 ms)
  G is the clock granularity
  1000 ms is the lower bound

Setting Initial Values upon availability of first sample
Upon receiving the first RTT sample M, the variables are initialized as follows:
  srtt ← M
  rttvar ← M/2
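Putting the pieces together (first-sample initialization, the error-form update, clock granularity and the lower bound), a minimal estimator might look like the sketch below. The class name, the millisecond units and the constructor parameters are assumptions made for illustration. Real stacks use scaled integer arithmetic instead of floats.

```python
class RttEstimator:
    """Sketch of the standard RTT estimator. All times are in milliseconds."""

    def __init__(self, granularity_ms=1.0, min_rto_ms=1000.0, g=1 / 8, h=1 / 4):
        self.G = granularity_ms      # clock granularity
        self.min_rto = min_rto_ms    # lower bound on the RTO
        self.g, self.h = g, h
        self.srtt = None             # no sample seen yet
        self.rttvar = None

    def update(self, m):
        """Feed one RTT sample m and return the new RTO."""
        if self.srtt is None:
            # First sample: srtt <- M, rttvar <- M/2
            self.srtt = m
            self.rttvar = m / 2
        else:
            err = m - self.srtt
            self.srtt += self.g * err
            self.rttvar += self.h * (abs(err) - self.rttvar)
        return self.rto()

    def rto(self):
        return max(self.srtt + max(self.G, 4 * self.rttvar), self.min_rto)


if __name__ == "__main__":
    est = RttEstimator(min_rto_ms=0.0)   # lower bound disabled to expose the raw value
    print(est.update(100.0))             # first sample: 100 + 4 * 50
    print(est.update(120.0))             # second sample
```

With the default 1000 ms lower bound, both calls would return 1000.0, because the computed values (300 and 272.5 ms) fall below the bound.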

Retransmission Ambiguity
Upon receiving an ACK for a retransmitted packet, should the elapsed time be measured from the first copy or the second copy?
If the Timestamp option is used, there is no ambiguity

Karn’s Algorithm
Karn’s algorithm (1st part): ignore the RTT sample for a retransmitted packet (when the Timestamp option is not in use)
  But just ignoring such RTT samples is not sufficient, because the TCP connection is then not reacting to the poor condition of the network
Karn’s algorithm (2nd part)
  Double the backoff factor (starts at 1) upon retx
  Actual Timeout = Backoff factor * RTO
  Reset the backoff factor to 1 upon receiving the ACK for the retx packet
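Both parts of Karn's algorithm fit in a small sketch. The class and method names are hypothetical, and `update_estimator` stands for whatever RTT estimator is in use. The reset rule follows the slide (reset when the retransmitted packet is ACKed). Some descriptions instead keep the backoff until a segment that was not retransmitted is acknowledged.

```python
class KarnTimer:
    """Sketch of Karn's algorithm (no Timestamp option). Times in seconds."""

    def __init__(self, rto):
        self.rto = rto        # base RTO produced by the RTT estimator
        self.backoff = 1      # backoff factor, starts at 1

    def timeout(self):
        # Actual Timeout = Backoff factor * RTO
        return self.backoff * self.rto

    def on_retransmit(self):
        # 2nd part: double the backoff factor on every retransmission.
        self.backoff *= 2

    def on_ack(self, rtt_sample, was_retransmitted, update_estimator):
        # 1st part: a sample for a retransmitted packet is ambiguous, so skip it.
        if not was_retransmitted:
            self.rto = update_estimator(rtt_sample)
        # Reset rule as stated on the slide.
        self.backoff = 1


if __name__ == "__main__":
    timer = KarnTimer(rto=1.0)
    timer.on_retransmit()
    timer.on_retransmit()
    print(timer.timeout())                     # two retransmissions: 4 * RTO
    timer.on_ack(0.5, True, lambda m: 0.7)     # ambiguous sample is ignored
    print(timer.timeout())                     # backoff reset, RTO unchanged
```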

RTT Samples using Timestamps
TSOPT: TCP Timestamp option
  A 32 bit number in the TCP segment which is returned back in a corresponding ACK
Fields in TSOPT
  TSV: Timestamp Value field
  TSER: Timestamp Echo Reply field. Used in ACKs.

RTT Samples using Timestamps: Challenges
Not all packets are ACKed (Delayed ACKs)
  Every other packet is typically ACKed (Chapter 15)
Lack of correspondence between ACKs and packets (due to the cumulative ACK mechanism) in the following scenarios
  Lost
  Reordered
  Successfully retransmitted

For delayed or erratic ACKs (Solution)
RTT sample is from the oldest packet
  It is the real time the sender should wait for an ACK
  It may be different from the actual network RTT

Modern systems address the challenges using the following solution
1. The sending TCP includes a 32-bit timestamp value in the TSV (Time Stamp Value) portion of the TSOPT (TCP Timestamp option) in each TCP segment it sends.
2. A receiving TCP keeps track of the received TSV value to send in the next ACK it generates (in a variable typically named TsRecent) and the ACK number in the last ACK that it sent (in a variable named LastACK).

Using Timestamps (contd.)
3. When a new segment arrives, if it contains the sequence number matching the value in LastACK (i.e., it is the next expected segment), the segment’s TSV is saved in TsRecent.
4. Whenever the receiver sends an ACK, a TSOPT is included such that the timestamp value contained in TsRecent is placed in the TSER (Time Stamp Echo Reply) part of the TSOPT in the ACK.
5. A sender receiving an ACK that advances its window subtracts the TSER from its current TCP clock and uses the difference as a sample value to update its RTT estimators.
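The TsRecent / LastACK bookkeeping can be traced with a small simulation. This is a sketch under simplifying assumptions: the class and function names are hypothetical, timestamps are plain integers in clock ticks, out-of-order data is not buffered, and timestamp wraparound is ignored.

```python
class TimestampReceiver:
    """Receiver side: tracks TsRecent and LastACK."""

    def __init__(self, initial_seq=0):
        self.rcv_nxt = initial_seq    # next in-order sequence number expected
        self.last_ack = initial_seq   # LastACK: ACK number in the last ACK sent
        self.ts_recent = 0            # TsRecent: TSV to echo in the next ACK

    def on_segment(self, seq, length, tsv):
        # Save the TSV only if the segment starts at LastACK.
        if seq == self.last_ack:
            self.ts_recent = tsv
        # Advance over in-order data (out-of-order data is ignored in this sketch).
        if seq == self.rcv_nxt:
            self.rcv_nxt = seq + length

    def send_ack(self):
        # The ACK carries the cumulative ACK number and TSER = TsRecent.
        self.last_ack = self.rcv_nxt
        return self.rcv_nxt, self.ts_recent


def rtt_sample(now, tser, ack, snd_una):
    """Sender side: only an ACK that advances the window yields a sample."""
    if ack > snd_una:
        return now - tser
    return None


if __name__ == "__main__":
    rx = TimestampReceiver()
    rx.on_segment(0, 1000, tsv=100)
    rx.on_segment(1000, 1000, tsv=105)   # delayed ACK: no ACK sent in between
    ack, tser = rx.send_ack()
    print(ack, tser, rtt_sample(130, tser, ack, snd_una=0))
```

The single ACK for both segments echoes the TSV of the older one, so the sample (30 ticks here) measures how long the sender actually waited for the ACK, not the network RTT of the newest segment.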

An Example (sender: Linux, receiver: FreeBSD)
The rttvar value is constrained to be at least 50 ms, and the RTO has a lower bound of 200ms.
Figure annotations:
  Last ACK number sent
  Updated when data matching LastACK is received
  Not timed (as it is not in response to DATA, SYN or FIN)
  Possibly an error
  TSER based on the first of the unacked packets (considers the real delay expected in getting a response)
Initial calculations:
  srtt = 16ms
  mdev = (16/2)ms = 8ms
  mdev_max = max(mdev, 50) = max(8, 50) = 50ms
  rttvar = mdev_max
  RTO = srtt + 4(rttvar) = 16 + 4(50) = 216ms

Linux introduced mdev, mdev_max
At each ACK, mdev is calculated; if it is higher than the current highest value (mdev_max), it is stored in the mdev_max field
Once per RTT, mdev_max is used to update rttvar

What if RTT suddenly decreases
rttvar (mdev in Linux) increases
srtt goes down
Ideally, the RTO should go down
But the RTO increases, as it depends more on rttvar
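A worked example makes the effect concrete. The starting values (srtt = 200 ms, rttvar = 10 ms) and the sudden 50 ms sample are made-up numbers, and the function name is hypothetical. The update itself is the standard method with g = 1/8 and h = 1/4.

```python
def standard_update(srtt, rttvar, m, g=1 / 8, h=1 / 4):
    """One step of the standard estimator. Returns (srtt, rttvar, rto) in ms."""
    err = m - srtt
    srtt = srtt + g * err
    rttvar = rttvar + h * (abs(err) - rttvar)
    return srtt, rttvar, srtt + 4 * rttvar


if __name__ == "__main__":
    rto_before = 200.0 + 4 * 10.0                         # 240.0 ms
    srtt, rttvar, rto_after = standard_update(200.0, 10.0, 50.0)
    print(srtt, rttvar)             # 181.25 45.0
    print(rto_before, rto_after)    # 240.0 361.25
```

srtt falls by 18.75 ms, but rttvar rises by 35 ms and is weighted by 4, so the RTO grows from 240 ms to 361.25 ms even though the path got faster.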

What if RTT suddenly decreases: The Linux Method
Give lower weight to the current sample if the current RTT sample has a very low value
  if (m < (srtt – mdev))    (RTT sample is too low)
      mdev = (31/32) * mdev + (1/32) * |srtt - m|
  else
      mdev = (3/4) * mdev + (1/4) * |srtt - m|
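The rule transcribes directly into Python. This sketch covers only the mdev update shown above. The function name is hypothetical, and the real kernel code works in scaled integer arithmetic and also maintains mdev_max and rttvar, which are left out here.

```python
def linux_mdev_update(srtt, mdev, m):
    """Mean-deviation update with reduced weight for unusually low samples (ms)."""
    if m < srtt - mdev:
        # RTT sample is too low: give the new deviation only 1/32 weight.
        return (31 / 32) * mdev + (1 / 32) * abs(srtt - m)
    # Normal case: same 1/4 weight as the standard method.
    return (3 / 4) * mdev + (1 / 4) * abs(srtt - m)


if __name__ == "__main__":
    # srtt = 200 ms, mdev = 10 ms, then the RTT suddenly drops to 50 ms.
    print(linux_mdev_update(200.0, 10.0, 50.0))    # 14.375
    # A sample slightly above srtt takes the normal branch.
    print(linux_mdev_update(200.0, 10.0, 210.0))   # 10.0
```

For the same made-up drop from 200 ms to 50 ms, the 1/4-weight update would raise the deviation from 10 to 45 ms, while the reduced weight raises it only to 14.375 ms.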

Simulations
The Linux and standard RTO assignment and RTT estimation algorithms applied to synthetic (pseudorandom) sample points. The first 100 points are drawn from an N(200, 50) distribution, and the second 100 are drawn from an N(50, 50) distribution with negative values turned positive. Linux avoids the increase in RTO when the mean drops after sample 100. With Linux, the minimum RTO is effectively set to 200ms, so after sample 120, the standard method is tighter. Linux avoids setting the RTO too low in all cases for this example.

Behavior for Out-of-order segment
TSER value is based on the most recent packet to advance the window
  May not be the highest TSV received
Increases the RTT estimate, as an older packet’s TSER is used when sending the ACK for an out-of-order packet
This is designed to allow the sender to deal with reordering of packets

Example
Sender does not update variables (srtt, rttvar etc.) since this ACK does not advance the window.
When segments are reordered, the returned timestamp is that of the last segment to advance the receiver’s window (not the largest timestamp to arrive at the receiver). This biases the sender’s RTO toward overestimating the RTT during periods of packet reordering and reduces its aggressiveness.

Successful Retransmission filling a hole in receiver’s buffer
TSER value will correspond to this retransmitted packet
rtt sample value will be based on the most recently arriving retransmitted packet
  It is the correct rtt sample
  It will likely bring down the srtt if out-of-order segments caused it to increase

Upon a Timeout (Karn’s algorithm “second part”)
RTO = γRTO, where γ is the backoff factor applied to the computed RTO
  Default: γ has the value 1
  On subsequent retransmissions, γ is doubled: 2, 4, 8, and so forth
  There is typically a maximum that the backed-off RTO is not allowed to exceed (default: 120s in Linux)
  Once an acceptable ACK is received, γ is reset to 1
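The resulting sequence of timer values is easy to tabulate. In this sketch the function name is hypothetical, and the 120 s limit is applied to the backed-off timeout itself, which is an assumption about how the cap is enforced.

```python
def backoff_schedule(rto, timeouts, rto_max=120.0):
    """Timer values (seconds) used for successive timeouts of the same segment."""
    # gamma is 1 for the first timer, then 2, 4, 8, ... with the result capped.
    return [min(rto * 2 ** i, rto_max) for i in range(timeouts)]


if __name__ == "__main__":
    print(backoff_schedule(1.0, 9))
    # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```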

Segment 1401 is forcibly dropped twice
Segment 1401 is forcibly dropped twice. This results in a timer-based retransmission at the sender. The srtt, rttvar, and RTO values are updated only by a returning ACK that advances the sender’s window. ACKs with asterisks (*) include SACK information.
Figure annotations:
  Recompute on new ACK
  Update upon data matching with LastACK

Fast Retransmit
DUP ACKs are not delayed
TCP waits for a small number of duplicate ACKs (called the duplicate ACK threshold or dupthresh) to be received before concluding that a packet has been lost and initiating a fast retransmit
  Traditionally 3 has been used
  Some implementations (including Linux) measure the level of reordering to figure out the value of dupthresh
Window size and ssthresh are both set to half of the current window size and the connection enters the Fast Recovery mode
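The duplicate ACK counting can be sketched as follows. The class name is hypothetical, and the sketch deliberately ignores details such as window updates, ACKs carrying data, and SACK.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when dupthresh is reached."""

    def __init__(self, snd_una, dupthresh=3):
        self.snd_una = snd_una      # oldest unacknowledged sequence number
        self.dupthresh = dupthresh  # traditionally 3
        self.dup_count = 0

    def on_ack(self, ack):
        """Return True if this ACK should trigger a fast retransmit."""
        if ack > self.snd_una:
            # New data acknowledged: the window advances and the count resets.
            self.snd_una = ack
            self.dup_count = 0
            return False
        if ack == self.snd_una:
            self.dup_count += 1
            # Fire exactly once, when the threshold is reached.
            return self.dup_count == self.dupthresh
        return False                # old ACK, ignored


if __name__ == "__main__":
    det = DupAckDetector(snd_una=1401)
    print([det.on_ack(1401) for _ in range(4)])
    # [False, False, True, False]
```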

Partial ACK and Fast Recovery
Recovery point: the highest seq num sent before initiating a retransmission
DUPACKs during recovery (also called the fast recovery phase) are used to inflate the right side of the window by 1 MSS per DUPACK
  Why? Effectively, the window is set to half the window size + 3MSS
  Each subsequent ACK (beyond 3 DUPACKs) increases the window by 1 MSS
Partial ACK: ACK <= Recovery point, recvd during the fast recovery phase

Partial ACK and Fast Recovery
When a partial ACK is received, the following packet is immediately retransmitted
A new packet may also be released during fast recovery if allowed by the current window
After exiting from fast recovery (receipt of an ACK that is not a partial ACK), inflation is removed, i.e., cwnd is set to half the window size when loss was observed
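The window inflation and deflation can be traced numerically. This sketch works in units of MSS, uses a hypothetical function name, and models only the window arithmetic described above (halve, add 3 MSS, add 1 MSS per further DUPACK, deflate on exit). It does not model partial ACKs or the retransmissions themselves.

```python
def fast_recovery_cwnd(window_mss, extra_dupacks):
    """Return the cwnd values (in MSS) seen during one fast recovery episode."""
    ssthresh = window_mss / 2            # half of the window when loss was observed
    trace = [ssthresh + 3]               # entry: half the window + 3 MSS
    for _ in range(extra_dupacks):
        trace.append(trace[-1] + 1)      # each further DUPACK inflates by 1 MSS
    trace.append(ssthresh)               # exit: inflation removed
    return trace


if __name__ == "__main__":
    print(fast_recovery_cwnd(16, 4))
    # [11.0, 12.0, 13.0, 14.0, 15.0, 8.0]
```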

An Example with Fast Retx and Partial ACKs
TCP sequence numbers are on the y-axis and time is on the x-axis. Outgoing segments are displayed as darker line segments, and the incoming ACK numbers appear as lighter gray segments. Fast retransmit is triggered by the arrival of the third duplicate ACK at time 0.993s. This connection does not use SACK, so it is able to repair at most only one hole per RTT. Additional duplicate ACKs arriving after the third cause the sender to send new segments (not retransmissions). A “partial ACK” arriving at time 1.32 causes the next retransmission.

Window update is not counted in DUPACKs
The TCP exchange showing relative sequence numbers. Packets 50 and 66 are retransmissions. Packet 50 is retransmitted because of the fast retransmit algorithm, which triggers as a result of three duplicate ACKs. No retransmission timer is required, so recovery is relatively quick.

The retransmission at time 0
The retransmission is triggered by the fast retransmit algorithm after receiving duplicate ACKs, including those at times 0.890 and 0.926. One ACK is not considered a duplicate ACK because it contains a window update.

SACK
SACK acknowledges n blocks of data
  For each block, the starting and the ending (+1) 32-bit seq number is specified
  Option requires 8n+2 bytes
With 40 bytes in TCP’s optional header, 4 blocks can be specified
  But the timestamp option (TSOPT), which takes 12 bytes, is usually also present in the TCP options
  So, only 3 blocks are typically used in SACK
SACK capability is indicated by the SACK permitted option during handshaking (SYN, SYN+ACK)
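The block-count arithmetic can be checked in a couple of lines. The function name and parameter names are hypothetical. The sizes (40 bytes of option space, 8n+2 bytes for n blocks, 12 bytes for the timestamp option) are the ones given above.

```python
def max_sack_blocks(option_space=40, other_options=0):
    """Largest n such that 8n + 2 bytes fit in the remaining option space."""
    return (option_space - other_options - 2) // 8


if __name__ == "__main__":
    print(max_sack_blocks())                   # 4: SACK alone in 40 bytes
    print(max_sack_blocks(other_options=12))   # 3: with the timestamp option
```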

SACK Receiver Behavior
Acknowledge multiple blocks of received data at the receiver
First block
  Contains the most recently received segment
Other two blocks
  Repeat of the most recently sent SACK blocks (sent as first SACK blocks in previous segments) that are not subsets of another block about to be placed in the SACK option in this packet
  Provides redundancy in case SACKs are lost

SACK Sender Behavior
Sender does not transmit SACKed data
Order of transmissions
  First fill missing holes
  Then, send new data
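A sender can derive the holes from the cumulative ACK and the SACK blocks it has learned. The sketch below assumes blocks are (start, end) pairs with an exclusive end, matching the "ending (+1)" convention. The function names are hypothetical, and cwnd limits are not modeled.

```python
def find_holes(cum_ack, sack_blocks):
    """Return the unSACKed gaps between cum_ack and the highest SACKed byte."""
    holes = []
    next_expected = cum_ack
    for start, end in sorted(sack_blocks):
        if start > next_expected:
            holes.append((next_expected, start))   # missing range
        next_expected = max(next_expected, end)
    return holes


def next_to_send(holes, snd_nxt, mss):
    """Fill holes first, then send new data starting at snd_nxt."""
    if holes:
        start, end = holes[0]
        return ("retransmit", start, min(end, start + mss))
    return ("new", snd_nxt, snd_nxt + mss)


if __name__ == "__main__":
    holes = find_holes(1000, [(3000, 4000), (1500, 2000)])
    print(holes)                                   # [(1000, 1500), (2000, 3000)]
    print(next_to_send(holes, snd_nxt=4000, mss=500))
    print(next_to_send([], snd_nxt=4000, mss=500))
```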

TCP is modified to fast retx on 1 DUPACK
Figure annotations:
  The ACK before the 1st DUPACK is a window update (so not counted towards fast retx)
  When using SACK (not a window update), a hole is immediately filled if allowed by cwnd
  In this example, two holes are observed, but the second hole is outside the cwnd, so it is not retransmitted in response to the SACK
Fast retransmit is triggered by the arrival of the first duplicate ACK containing SACK information. The arrival of the next ACK allows the sender to learn of the second missing segment and retransmit it within the same RTT.

The SACK-Permitted option is exchanged in SYN segments to indicate the capability to generate and process SACK information. Most modern TCPs support the MSS, Timestamps, Window Scale, and SACK-Permitted options during connection establishment.

The first ACK containing SACK information indicates an out-of-order block with sequence number range to

Spurious Retransmissions
Reasons for Spurious Retransmissions
  Spurious timeouts (early timeouts)
  Packet reordering
  Packet duplication
  Lost ACKs

This illustration uses pkt number and ACK numbers for simplicity
This illustration uses pkt number and ACK numbers for simplicity. AX indicates that PX has been received.
Figure annotations:
  Window size becomes 1 MSS. Connection enters the slow start phase.
  Go-Back-N like behavior, as each ACK results in 2 packets being transmitted (one due to window advancement and another due to window increase)
Spurious timeouts are bad. Detection and response algorithms have been proposed to address them.
A delay spike occurs after the transmission of packet 8, causing a spurious retransmission timeout and retransmission of packet 5. After retransmission, an ACK for the first copy of 5 arrives. The retransmission for 5 creates a duplicate packet at the receiver, followed by an undesirable “go-back-N” behavior whereby packets 6, 7, and 8 are retransmitted even though they are already present at the receiver.

Detection and Response: Spurious Retransmissions
Detection
  Duplicate SACK (DSACK)
  Eifel Detection Algorithm
  Forward-RTO Recovery (F-RTO)
Response
  Eifel Response Algorithm

Duplicate SACK (DSACK)
First SACK block corresponds to the sequence number of the duplicate segment that has arrived
  Usually, the first SACK block contains a new SACK and the other two contain repetitions of old first SACK blocks for redundancy
  DSACK is sent even if the block corresponds to sequence numbers below the ACK number
Compatible with conventional SACK
  If a non-DSACK TCP shares a connection with a DSACK TCP, they will interoperate, but without any of the benefits of DSACK
Shortcomings
  DSACK is triggered very late, when loss recovery has terminated
  DSACK is not repeated, so it is not tolerant to losses

The Eifel Detection Algorithm
Uses the TCP Timestamp option
  Upon retransmission, store the timestamp value
  Upon receiving an ACK, check if the TSER is for the first copy
  If so, the ACK is for the first copy and the retx was spurious
Supposed to detect earlier than DSACK, as the sender can infer upon receipt of the first ACK for the retx packet rather than a SACK (which will typically arrive later)
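The check reduces to one comparison between the echoed timestamp and the timestamp stored at retransmission time. The sketch below uses hypothetical names, treats timestamps as plain integers, and ignores wraparound of the 32-bit timestamp clock.

```python
class EifelDetector:
    """Sketch of Eifel detection using the Timestamp option."""

    def __init__(self):
        self.retransmit_tsv = None   # TSV carried by the first retransmission

    def on_retransmit(self, tsv):
        # Remember only the first retransmission of the current episode.
        if self.retransmit_tsv is None:
            self.retransmit_tsv = tsv

    def on_ack(self, tser):
        """Return True if the first ACK after a retransmission shows it was spurious."""
        if self.retransmit_tsv is None:
            return False
        # A TSER older than the retransmission means the ACK was generated
        # in response to the original copy.
        spurious = tser < self.retransmit_tsv
        self.retransmit_tsv = None
        return spurious


if __name__ == "__main__":
    det = EifelDetector()
    det.on_retransmit(tsv=500)
    print(det.on_ack(tser=420))   # True: the ACK echoes the original transmission
    det.on_retransmit(tsv=900)
    print(det.on_ack(tser=900))   # False: the ACK echoes the retransmission
```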

Eifel Detection possible
Figure labels: Eifel Detection possible; DSACK
A delay spike occurs after the transmission of packet 8, causing a spurious retransmission timeout and retransmission of packet 5. After retransmission, an ACK for the first copy of 5 arrives. The retransmission for 5 creates a duplicate packet at the receiver, followed by an undesirable “go-back-N” behavior whereby packets 6, 7, and 8 are retransmitted even though they are already present at the receiver.

Eifel Detection Algorithm + DSACK
(Please skip this. The description in the book is too short and raises more questions than it answers)

Forward-RTO Recovery (F-RTO) for spurious retx caused by expiration of retx timer
Works without using TCP TSOPT (older versions)
After timeout, retransmit the packet
Ordinarily, TCP continues sending additional adjacent packets in order as additional ACKs arrive
In F-RTO, upon the following ACK, send new data (data that has not been sent before)
  If that ACK or the next one is a DUPACK, the retx was OK
  Else the retx was spurious
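The decision rule can be written as a small classifier over the first two ACKs that arrive after the timeout retransmission. This is a simplified sketch of the rule as stated above, with hypothetical names. An ACK counts as a DUPACK here if it does not advance past the sender's current left window edge.

```python
def frto_verdict(snd_una, first_ack, second_ack=None):
    """Classify a timeout retransmission as 'genuine' or 'spurious'."""
    if first_ack <= snd_una:
        return "genuine"          # first ACK is a DUPACK: the retx was needed
    snd_una = first_ack           # window advanced, so new data is sent
    if second_ack is None or second_ack <= snd_una:
        return "genuine"          # second ACK is a DUPACK (or never arrived)
    return "spurious"             # both ACKs advanced the window


if __name__ == "__main__":
    print(frto_verdict(5000, 6000, 7000))   # spurious
    print(frto_verdict(5000, 6000, 6000))   # genuine
    print(frto_verdict(5000, 5000))         # genuine
```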

The Eifel Response Algorithm: can be used with any detection algorithm
Take a snapshot upon timeout:
  srtt_prev = srtt + 2(G)
  rttvar_prev = rttvar
Once an acceptable ACK is received for a segment transmitted after the retransmission timer expired (m is the new RTT sample):
  srtt ← max(srtt_prev, m)
  rttvar ← max(rttvar_prev, m/2)
  RTO = srtt + max(G, 4(rttvar))
Motivation: the new RTT is likely very different, so the history is not so useful. Let’s try to increase the estimates.
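The snapshot and restore steps translate directly into two functions. The function names and the example numbers are hypothetical, and all values are in milliseconds.

```python
def eifel_snapshot(srtt, rttvar, G):
    """Taken when the retransmission timer expires."""
    return srtt + 2 * G, rttvar


def eifel_restore(srtt_prev, rttvar_prev, m, G):
    """Applied when an acceptable ACK arrives after a spurious timeout."""
    srtt = max(srtt_prev, m)
    rttvar = max(rttvar_prev, m / 2)
    rto = srtt + max(G, 4 * rttvar)
    return srtt, rttvar, rto


if __name__ == "__main__":
    prev = eifel_snapshot(srtt=100.0, rttvar=20.0, G=1.0)
    print(prev)                                  # (102.0, 20.0)
    print(eifel_restore(*prev, m=300.0, G=1.0))  # (300.0, 150.0, 900.0)
    print(eifel_restore(*prev, m=50.0, G=1.0))   # (102.0, 25.0, 202.0)
```

Because of the max operations, the restored estimates never fall below the snapshot, which matches the stated motivation of increasing the estimates after a spurious timeout.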

Mild Reordering
Mild reordering (left) is overcome by ignoring a small number of duplicate ACKs. When reordering is more severe (right), as in this case where packet 4 is three places out of sequence, a spurious fast retransmit can be triggered.

Duplication
Packet duplication in the network has caused a spurious fast retransmission due to the presence of duplicate ACKs.

Destination Metrics
New TCP implementations maintain a cache of the values of variables used in the last connection (or statistics of values in the last few connections) to all destinations
Linux% ip route show cache from tos 0x10 via dev eth0 cache mtu 1500 rtt 29ms rttvar 29ms cwnd 2 advmss 1460 hoplimit 64

Repacketization: Example
Linux% telnet
hello there (first line gets sent OK)
(then we disconnect the Ethernet cable)
line number (this line gets retransmitted)
and (reconnect Ethernet)
^]
telnet> quit

Repacketization
Also contains block already SACKed and FIN
