TCP Traffic Control: Performance and QoS Implications
Outline: Introduction; Performance implications of TCP flow and error control; Performance implications of TCP congestion control; Performance of TCP/IP over ATM. Chapter 12: TCP Traffic Control
TCP Flow and Error Control: TCP uses a form of sliding-window protocol (similar to Go-Back-N). TCP's flow control is known as a credit allocation scheme: each transmitted octet is considered to have a sequence number, and each acknowledgement can grant permission to send a specific amount of additional data. TCP window size <= min(CongWin, RcvWin)
TCP Header Fields for Flow Control: sequence number (SN) of the first octet in the data segment; acknowledgement number (AN) of the next expected octet; window size (W) in octets. An acknowledgement containing AN = i, W = j means: octets through SN = i - 1 are acknowledged, and permission is granted to send j more octets, i.e., octets i through i + j - 1
TCP Credit Allocation Mechanism Note: trailing edge advances each time A sends data, leading edge advances only when B grants additional credit by acknowledging receipt.
Credit Allocation is Flexible Suppose the last message B issued was AN = i, W = j. To increase the credit to k (k > j) when no new data has arrived, B issues AN = i, W = k. To acknowledge a segment containing m octets (m < j) without granting additional credit, B issues AN = i + m, W = j - m
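The credit-allocation bookkeeping above can be sketched as follows (a minimal Python illustration; the class and field names are invented for this example, not taken from any real TCP stack):

```python
# Sketch of TCP's credit allocation bookkeeping, assuming the AN/W
# semantics described above. Illustrative only.
class CreditWindow:
    def __init__(self):
        self.an = 0   # next octet number the receiver expects
        self.w = 0    # credit: octets the sender may still send

    def ack(self, an, w):
        """Receiver issued an acknowledgement with AN = an, W = w."""
        self.an, self.w = an, w

    def may_send(self):
        """Range of octet numbers the sender is permitted to send."""
        return range(self.an, self.an + self.w)

win = CreditWindow()
win.ack(an=1000, w=400)                      # octets through 999 acked
assert win.may_send() == range(1000, 1400)

# Acknowledge m = 100 octets without new credit: AN = i + m, W = j - m
win.ack(an=1100, w=300)
assert win.may_send() == range(1100, 1400)   # leading edge unchanged

# Grant additional credit with no new data: AN unchanged, W increased
win.ack(an=1100, w=500)
assert win.may_send() == range(1100, 1600)   # leading edge advances
```

Note how the two flexible cases from the slide appear: acknowledging data without credit moves only the trailing edge, while granting credit moves only the leading edge.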
Flow Control Perspectives
Credit Policy: the receiver needs a policy for how much credit to give the sender. Conservative approach: grant credit up to the limit of available buffer space; this may limit throughput in long-delay situations. Optimistic approach: grant credit based on the expectation of freeing buffer space before the data arrives
Effect of TCP Window Size on Performance: W = TCP window size (octets), R = data rate (bps) at TCP source, D = propagation delay (seconds) between source and destination. After the TCP source begins transmitting, it takes D seconds for the first bits to arrive and D seconds for the acknowledgement to return (2D = RTT). In that time the TCP source could transmit at most 2RD bits, or RD/4 octets
Maximum Normalized Throughput:
S = 1          if W >= RD/4
S = 4W / RD    if W < RD/4
Where: W = window size (octets); R = data rate (bps) at TCP source; D = dprop (seconds) between TCP source and destination, 2D = RTT. Note: RD (bits) is known as the "rate-delay product"
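The throughput formula above is easy to evaluate directly (a sketch in Python; the link parameters below are illustrative, not from the slides):

```python
# Normalized TCP throughput as a function of window size, per the
# formula above: S = 1 if W >= RD/4, else S = 4W / RD.
def normalized_throughput(w_octets, rate_bps, dprop_s):
    rd = rate_bps * dprop_s          # rate-delay product, in bits
    if w_octets >= rd / 4:
        return 1.0
    return 4 * w_octets / rd

# A 2^16 - 1 octet window (the classic 16-bit limit) on an assumed
# 155 Mbps path with 10 ms one-way delay cannot fill the pipe:
assert normalized_throughput(2**16 - 1, 155e6, 0.010) < 1.0
# A scaled window (RFC 7323) large enough to cover RD/4 restores S = 1:
assert normalized_throughput(2**24, 155e6, 0.010) == 1.0
```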
Window Scale Parameter (TCP header option: RFC 7323) [figure: maximum normalized throughput vs. rate-delay product RD, showing the W = RD/4 boundary and curves for W = 2^16 - 1 (unscaled) and W = 2^20 - 1 (scaled window); high-RD paths are "long, fat networks"]
Complicating Factors: Multiple TCP connections are multiplexed over the same network interface, reducing the data rate R available per connection (lowering S). For multi-hop connections, D is the sum of the delays across each network plus the delays at each router, increasing D (lowering S). If the data rate R at the source exceeds the data rate on one of the hops, that hop becomes a bottleneck (lowering S). Lost segments are retransmitted, reducing throughput; the impact depends on the retransmission policy (lowering S)
Retransmission Strategy: TCP relies exclusively on positive acknowledgements and retransmission on acknowledgement timeout; there is no explicit negative acknowledgement. Retransmission is required when: a segment arrives damaged, as indicated by a checksum error, causing the receiver to discard it; or a segment fails to arrive at all (an implicit detection scheme)
TCP Timers: a timer is associated with each segment as it is sent. If the timer expires before the segment is acknowledged, the sender must retransmit. Key design issue: the value of the retransmission timer. Too small: many unnecessary retransmissions, wasting network bandwidth. Too large: delay in handling a lost segment
Two Strategies: the timer should be longer than the round-trip delay (send segment, receive ACK), but the round-trip delay is variable. Strategies: fixed timer; adaptive timer
Problems with Adaptive Scheme Peer TCP entity may accumulate acknowledgements and not acknowledge immediately For retransmitted segments, sender can’t tell whether acknowledgement is response to original transmission or retransmission Network conditions may change suddenly These tend to introduce artificialities in timer measurements at the TCP source (more later)
Adaptive Retransmission Timer: Average Round-Trip Time (ARTT)
ARTT(K+1) = (1/(K+1)) × Σ_{i=1..K+1} RTT(i)
          = (K/(K+1)) × ARTT(K) + (1/(K+1)) × RTT(K+1)
RFC 793 Exponential Averaging Smoothed Round-Trip Time (SRTT) SRTT(K + 1) = α × SRTT(K) + (1 – α) × RTT(K + 1) The older the observation, the less it is counted in the average.
Exponential Smoothing Coefficients [figure: SRTT behavior for α = 0.5 and α = 0.875]
Exponential Averaging [figure: exponential averages tracking a decreasing function and an increasing function]
RFC 793 Retransmission Timeout RTO(K + 1) = Min(UB, Max(LB, β × SRTT(K + 1))) UB, LB: prechosen fixed upper and lower bounds Example values for α, β: 0.8 < α < 0.9 1.3 < β < 2.0
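The RFC 793 estimator above (exponential averaging plus bounded scaling) can be sketched as follows; the parameter values are taken from the example ranges on the slides, while the bounds LB and UB are illustrative assumptions:

```python
# Sketch of RFC 793 retransmission timeout estimation:
# SRTT(K+1) = alpha * SRTT(K) + (1 - alpha) * RTT(K+1)
# RTO(K+1)  = min(UB, max(LB, beta * SRTT(K+1)))
def rfc793_rto(rtt_samples, alpha=0.875, beta=2.0, lb=1.0, ub=60.0):
    srtt = rtt_samples[0]
    for rtt in rtt_samples[1:]:
        srtt = alpha * srtt + (1 - alpha) * rtt   # exponential average
    return min(ub, max(lb, beta * srtt))          # clamp to [LB, UB]

# Stable 0.5 s samples give RTO = beta * SRTT = 1.0 s:
assert abs(rfc793_rto([0.5] * 10) - 1.0) < 1e-9
# On very fast paths the lower bound LB dominates:
assert rfc793_rto([0.01] * 10) == 1.0
```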
TCP Implementation Policy Options (per RFC 793/1122) TCP’s Performance Dilemma: The TCP standard provides a very precise statement of the protocol to be used However, protocol stack implementers have certain leeway with regard to their implementation Mismatched implementations, while interoperable, may suffer reduced performance.
TCP Implementation Policy Options (per RFC 793/1122): Send — free to transmit when convenient. Deliver — free to deliver to the application when convenient. Accept — how data are accepted: in-order (out-of-order data are discarded) or in-window (accept and buffer any data within the window). Retransmit — how to handle queued but not yet acknowledged data: first-only (one timer per queue, retransmit the front segment), batch (one timer per queue, retransmit the whole queue — Go-Back-N), or individual (one timer per segment, retransmit segment by segment). Acknowledge — immediate (send an immediate ACK for each good segment) or cumulative (timed wait, then piggyback a cumulative ACK). Performance implications?
TCP Congestion Control: Dynamic routing can alleviate congestion by spreading the load more evenly, but it is only effective for unbalanced loads and brief surges in traffic. Congestion can only be effectively controlled by limiting the total amount of data entering the network. The ICMP Source Quench message is crude and not effective; RSVP may help, but is not widely implemented
TCP Congestion Control is Difficult: IP is connectionless and stateless, with no provision for detecting or controlling congestion. RFC 3168 adds ECN to IP, but deployment is not ubiquitous. Standard TCP only provides end-to-end flow control; there is no cooperative, distributed algorithm binding together the various TCP entities
TCP Flow and Congestion Control: the rate at which a TCP entity can transmit is determined by the rate of incoming ACKs to previous segments carrying new credit ("TCP self-clocking"). The rate of ACK arrival is determined by the round-trip path between source and destination. Bottlenecks may occur at the destination or in the network, and the sender cannot tell which; only the internet bottleneck can be due to congestion
TCP Segment Pacing: Self-Clocking [figure: congestion control (bottleneck in the network) vs. flow control (bottleneck at the receiver) — implications?]
TCP Flow and Congestion Control – Potential Bottlenecks Physical bottlenecks: physical capacity constraints Logical bottlenecks: queuing effects due to load
TCP Congestion Control Measures Note: TCP Tahoe and TCP Reno from Berkeley Unix TCP implementations
Retransmission Timer Management (per RFC 6298) Three Techniques to calculate retransmission time out (RTO) value: RTT Variance Estimation Exponential RTO Backoff Karn’s Algorithm
TCP/IP Performance Issues vis-à-vis QoS Appropriate Timeout Value “Goldilocks dilemma” Window management in the face of congestion “Go as fast as you can, but don’t get caught!” Implicit congestion control “Too little, too late?” TCP/IP over ATM “Ac-cen-tuate the positive… e-lim-inate the negative.”
RTT Variance Estimation (Jacobson’s Algorithm): three sources of high variance in RTT. If the data rate is relatively low, transmission delay is relatively large, with larger variance due to variance in packet size. Load may change abruptly due to other sources. The peer may not acknowledge segments immediately
Jacobson’s Algorithm:
SRTT(K+1) = (1 – g) × SRTT(K) + g × RTT(K+1)
SERR(K+1) = RTT(K+1) – SRTT(K)
SDEV(K+1) = (1 – h) × SDEV(K) + h × |SERR(K+1)|
RTO(K+1) = SRTT(K+1) + MAX[G, f × SDEV(K+1)]
g = 0.125 (1/8), h = 0.25 (1/4), f = 2 or f = 4 (most implementations use f = 4); G is the granularity of the local clock
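Jacobson's algorithm as given above can be sketched directly (a Python illustration with the usual gains g = 1/8, h = 1/4, multiplier f = 4; the clock granularity G used below is an assumed value):

```python
# Sketch of Jacobson's RTO algorithm, following the equations above.
def jacobson_rto(rtt_samples, g=0.125, h=0.25, f=4, granularity=0.1):
    srtt = rtt_samples[0]
    sdev = 0.0
    for rtt in rtt_samples[1:]:
        serr = rtt - srtt                      # SERR = RTT - SRTT(K)
        srtt = (1 - g) * srtt + g * rtt        # smoothed mean RTT
        sdev = (1 - h) * sdev + h * abs(serr)  # smoothed deviation
    return srtt + max(granularity, f * sdev)   # RTO = SRTT + MAX[G, f*SDEV]

# With perfectly stable RTTs, SDEV stays 0 and the granularity floor
# dominates: RTO = RTT + G.
assert abs(jacobson_rto([0.2] * 20) - 0.3) < 1e-9
# Variable RTTs inflate RTO well above the mean, unlike RFC 793's
# fixed beta multiplier:
assert jacobson_rto([0.1, 0.3] * 10) > 0.3
```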
Jacobson’s RTO Calculations [figure: RTO tracking a decreasing function and an increasing function of RTT]
Two Other Factors Jacobson’s algorithm can significantly improve TCP performance, but: What RTO should we use for retransmitted segments? ANSWER: exponential RTO backoff algorithm Which round-trip samples should we use as input to Jacobson’s algorithm? ANSWER: Karn’s algorithm
Exponential RTO Backoff Increase RTO each time the same segment is retransmitted – backoff process Multiply RTO by constant: RTO = q × RTO q = 2 is called binary exponential backoff (like Ethernet CSMA/CD)
Which Round-trip Samples? If an ack is received for a retransmitted segment, there are 2 possibilities: Ack is for first transmission Ack is for second transmission TCP source cannot distinguish between these two cases No valid way to calculate RTT: From first transmission to ack, or From second transmission to ack?
Karn’s Algorithm Must not use measured RTT for retransmitted segments to update SRTT and SDEV Calculate exponential backoff RTO when a retransmission occurs Use backoff RTO for segments until an ack arrives for a segment that has not been retransmitted Then use Jacobson’s algorithm to calculate RTO
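Exponential backoff and Karn's rule work together; a minimal sketch (the class name and the placeholder RTO reset are invented for illustration — a real implementation would recompute RTO via Jacobson's algorithm from the accumulated samples):

```python
# Sketch combining exponential RTO backoff with Karn's algorithm:
# RTT samples for retransmitted segments are discarded, and the
# backed-off RTO is kept until a non-retransmitted segment is acked.
class RtoManager:
    def __init__(self, initial_rto=1.0, q=2.0):
        self.rto = initial_rto
        self.q = q                 # q = 2 -> binary exponential backoff
        self.samples = []          # clean RTTs for Jacobson's algorithm

    def on_timeout(self):
        self.rto *= self.q         # RTO = q * RTO

    def on_ack(self, rtt, was_retransmitted):
        if was_retransmitted:
            return                 # Karn: ambiguous sample, discard it
        self.samples.append(rtt)   # safe to feed into SRTT/SDEV
        self.rto = 1.0             # placeholder: recompute via Jacobson

m = RtoManager()
m.on_timeout(); m.on_timeout()     # two retransmissions of one segment
assert m.rto == 4.0                # 1.0 -> 2.0 -> 4.0
m.on_ack(0.2, was_retransmitted=True)
assert m.samples == [] and m.rto == 4.0   # backoff RTO retained
m.on_ack(0.2, was_retransmitted=False)
assert m.samples == [0.2]          # clean sample finally accepted
```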
Window Management Slow start Dynamic window sizing on congestion Fast retransmit Fast recovery ECN Other Mechanisms
Slow Start: awnd = MIN[credit, cwnd], where awnd = allowed window in segments; cwnd = congestion window in segments (assumes MSS bytes per segment); credit = amount of unused credit granted in the most recent ACK (rcvwindow). cwnd = 1 for a new connection; during slow start, cwnd is increased by 1 for each ACK received, up to a maximum
Effect of TCP Slow Start
Dynamic Window Sizing on Congestion: a lost segment indicates congestion. It is prudent (conservative) to reset cwnd to 1 and begin the slow-start process. But this may not be conservative enough: it is “easy to drive a network into saturation but hard for the net to recover” (Jacobson). Instead, use slow start up to a threshold value, then continue with linear growth in cwnd
Slow Start and Congestion Avoidance
Illustration of Slow Start and Congestion Avoidance
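The two growth regimes can be sketched per round trip (a simplified Python model, counting cwnd in segments as the slides do; doubling per RTT follows from the one-increment-per-ACK rule during slow start):

```python
# Sketch of slow start plus congestion avoidance: cwnd doubles each
# RTT below the threshold (exponential growth), then grows by one
# segment per RTT above it (linear growth).
def cwnd_evolution(rtts, ssthresh, cwnd=1):
    history = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start: exponential
        else:
            cwnd += 1                        # congestion avoidance: linear
        history.append(cwnd)
    return history

# With ssthresh = 16: cwnd goes 1, 2, 4, 8, 16, then +1 per RTT.
assert cwnd_evolution(7, ssthresh=16) == [1, 2, 4, 8, 16, 17, 18, 19]
```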
Fast Retransmit (TCP Tahoe): RTO is generally noticeably longer than the actual RTT, so if a segment is lost, TCP may be slow to retransmit. TCP rule: if a segment is received out of order, an ACK must be issued immediately for the last in-order segment. Tahoe/Reno Fast Retransmit rule: if four ACKs are received for the same segment (i.e., three duplicate ACKs), it is highly likely the following segment was lost, so retransmit immediately rather than waiting for the timeout
Fast Retransmit [figure: retransmission triggered by a triple duplicate ACK]
Fast Recovery (TCP Reno): when TCP retransmits a segment using Fast Retransmit, a segment is assumed lost, so congestion avoidance measures are appropriate (e.g., the slow-start/congestion-avoidance procedure). But this may be unnecessarily conservative, since the duplicate ACKs indicate that segments are still getting through. Fast Recovery: retransmit the lost segment, cut the threshold in half, set the congestion window to threshold + 3, then proceed with linear increase of cwnd. This avoids the initial slow start
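The Reno reaction to three duplicate ACKs described above can be sketched as a single state transition (illustrative Python; all state is in segments and the names are invented for this example):

```python
# Sketch of the Fast Retransmit / Fast Recovery transition: halve the
# threshold, set cwnd = threshold + 3 (one for each duplicate ACK that
# left the network), and skip slow start entirely.
def on_triple_dup_ack(cwnd, ssthresh):
    ssthresh = max(cwnd // 2, 2)   # cut threshold in half
    cwnd = ssthresh + 3            # inflate by the 3 duplicate ACKs
    return cwnd, ssthresh          # resume linear growth from here

cwnd, ssthresh = on_triple_dup_ack(cwnd=20, ssthresh=16)
assert (cwnd, ssthresh) == (13, 10)
# Contrast with Tahoe, which would instead drop cwnd back to 1 and
# slow-start up to the new threshold.
```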
TCP Congestion Window: Slow Start, Fast Retransmit, Fast Recovery (Example) [figure: cwnd trace showing slow start, congestion avoidance, and fast retransmit with fast recovery]
Explicit Congestion Notification in TCP/IP – RFC 3168: An ECN-capable router sets congestion bits in the IP header of packets to indicate congestion. The TCP receiver sets bits in the TCP ACK header to return the congestion indication to its peer (the sender). TCP senders respond to a congestion indication as if a packet loss had occurred. [Layout: in the IPv4 TOS field / IPv6 Traffic Class field, bits 0–5 carry the DSCP and bits 6–7 the ECN field; in the TCP header flag field, the CWR and ECE bits sit alongside URG, ACK, PSH, RST, SYN, FIN, next to HLEN and the reserved bits.]
Explicit Congestion Notification in TCP/IP – RFC 3168. IP ECN field codepoints: 00 = Not ECN-Capable Transport; 01 = ECT(1); 10 = ECT(0); 11 = Congestion Experienced (CE). TCP flag usage: neither ECE nor CWR — set-up: not ECN-capable response; ECE only — Receiver ECN-Echo: CE packet received; CWR only — Sender Congestion Window Reduced acknowledgement; ECE + CWR — set-up: sent with SYN to indicate ECN capability
TCP/IP ECN Protocol: (1) Hosts negotiate ECN capability during TCP connection setup: SYN with ECE + CWR; SYN-ACK with ECE. (2) The sender's IP layer marks data packets as ECN-Capable Transport. (3) Routers mark ECN-capable IP packets with Congestion Experienced if congestion occurs. (4) The receiver's TCP ACKs packets with ECN-Echo set if the CE bits were set in the IP header. (5) The sender's TCP reduces its window size and marks the next packet with CWR
TCP/IP ECN – Some Other Considerations: The sender sets CWR only on the first data packet after receiving ECE. CWR is also set for a window reduction for any other reason. ECT must not be set in retransmitted packets. The receiver continues to send ECE in all ACKs until CWR is received. Delayed ACKs: if any covered data packet had CE set, send ECE. Fragmentation: if any fragment had CE set, send ECE
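The receiver-side rule above — echo ECE on every ACK until CWR arrives — can be sketched as a tiny state machine (illustrative Python; the class and flag names are invented for this example, not an API of any real stack):

```python
# Sketch of the receiver-side ECN-Echo rule: once a CE-marked packet
# arrives, every ACK carries ECE until the sender's CWR flag is seen.
class EcnReceiver:
    def __init__(self):
        self.echo = False

    def on_data(self, ce_marked, cwr_flag):
        if ce_marked:
            self.echo = True       # start echoing the congestion signal
        if cwr_flag:
            self.echo = False      # sender has reduced its window
        return {"ECE": self.echo}  # flag to place on the outgoing ACK

r = EcnReceiver()
assert r.on_data(ce_marked=True, cwr_flag=False) == {"ECE": True}
assert r.on_data(ce_marked=False, cwr_flag=False) == {"ECE": True}   # keep echoing
assert r.on_data(ce_marked=False, cwr_flag=True) == {"ECE": False}   # CWR cancels
```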
Some Other Mechanisms for TCP Congestion Control: Limited Transmit – for small congestion windows; triggers fast retransmit with fewer than 3 duplicate ACKs. Appropriate Byte Counting (ABC) – the congestion window is modified based on the number of bytes acknowledged by each ACK, rather than the number of ACKs that arrive. Selective Acknowledgement – defines use of the TCP selective acknowledgement (SACK) option
Performance of TCP over ATM: how best to manage TCP's segment size, window management, and congestion control mechanisms, at the same time as ATM's quality-of-service and traffic control policies. TCP may operate end-to-end over a single ATM network, or over multiple ATM LANs or WANs interconnected by non-ATM networks
TCP/IP over AAL5/ATM [protocol stack: TCP over IP over AAL5 (convergence sublayer and SAR sublayer) over ATM. The AAL5 CPCS trailer carries: CPCS-UU indication, Common Part Indicator, PDU payload length, payload CRC]
Performance of TCP over UBR Buffer capacity at ATM switches is a critical parameter in assessing TCP throughput performance (why?) Insufficient buffer capacity results in lost TCP segments and retransmissions No segments lost if each ATM switch has buffer capacity equal to or greater than the sum of the rcvwindows for all TCP connections through the switch (practical?)
Effect of Switch Buffer Size (example - Romanow & Floyd) Data rate of 141 Mbps End-to-end propagation delay of 6 μs IP packet sizes of 512 – 9180 octets TCP window sizes from 8 Kbytes to 64 Kbytes ATM switch buffer size per port from 256 – 8000 cells One-to-one mapping of TCP connections to ATM virtual circuits TCP sources have infinite supply of data ready
Performance of TCP over UBR
Observations: if a single cell is dropped, the other cells in the same IP datagram are unusable, yet the ATM network forwards these useless cells to the destination. A smaller buffer increases the probability of dropped cells. A larger segment size increases the number of useless cells transmitted when a single cell is dropped
Approach: Partial Packet and Early Packet Discard. Goal: reduce the transmission of useless cells; both work on a per-virtual-channel basis. Partial Packet Discard: if a cell is dropped, then drop all subsequent cells in that segment (i.e., up to and including the first cell with the SDU type bit set to one). Early Packet Discard: when a switch buffer reaches a threshold level, preemptively discard all cells of a new segment
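Partial Packet Discard can be sketched as a filter over a cell stream (illustrative Python; cells are modeled as (frame_id, is_last) pairs and the initial drop decision is injected, since the trigger in a real switch is buffer overflow — Early Packet Discard differs in dropping the whole frame preemptively at a buffer threshold):

```python
# Sketch of Partial Packet Discard: once one cell of an AAL5 frame is
# dropped, all remaining cells of that frame are dropped too, through
# the end-of-frame cell (SDU type bit = 1), sparing other frames.
def partial_packet_discard(cells, drop_indices):
    kept, poisoned = [], set()     # frames that already lost a cell
    for i, (frame, is_last) in enumerate(cells):
        if i in drop_indices or frame in poisoned:
            poisoned.add(frame)    # discard the rest of this frame
            continue
        kept.append((frame, is_last))
    return kept

# Frame A = 3 cells, frame B = 2 cells; dropping A's middle cell
# discards the rest of A (now useless) but leaves B intact:
cells = [("A", False), ("A", False), ("A", True), ("B", False), ("B", True)]
assert partial_packet_discard(cells, drop_indices={1}) == [
    ("A", False), ("B", False), ("B", True)]
```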
Performance of TCP over UBR
ATM Switch Buffer Layout – Selective Drop and Fair Buffer Allocation
EPD Fairness – Selective Drop: ideally, N/V cells are buffered for each of the V virtual channels, where N is the total number of cells buffered.
Weight ratio: W(i) = N(i) / (N/V) = N(i) × V / N
Selective Drop: if N > R (buffer threshold) and W(i) > Z, then drop the next new packet on VC(i). Z is a parameter to be chosen (studies show the optimal Z is slightly less than 1)
Fair Buffer Allocation: more aggressive dropping of packets as congestion increases.
Drop a new packet when: N > R and W(i) > Z × (B – R) / (N – R)
Note that the larger the portion of the "safety zone" (B – R) that is occupied, the smaller the number with which W(i) is compared
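The two drop tests can be sketched side by side (illustrative Python; symbols follow the slides — N = total cells buffered, N(i) = cells for VC i, V = number of VCs, R = threshold, B = buffer size — while the numeric values below are assumptions chosen for the example):

```python
# Sketch of the Selective Drop and Fair Buffer Allocation tests above.
def weight(n_i, n, v):
    return n_i * v / n                         # W(i) = N(i) * V / N

def selective_drop(n_i, n, v, r, z=0.9):
    return n > r and weight(n_i, n, v) > z     # fixed fairness bound Z

def fair_buffer_allocation(n_i, n, v, r, b, z=0.9):
    # Bound shrinks as the safety zone (B - R) fills up.
    return n > r and weight(n_i, n, v) > z * (b - r) / (n - r)

# A VC holding 60 of 100 buffered cells across 4 VCs has W(i) = 2.4,
# well above its fair share, so both schemes drop its next packet:
assert selective_drop(60, 100, 4, r=80) is True
# FBA with B = 120, R = 80, N = 100 compares W(i) against
# 0.9 * 40/20 = 1.8; a fair-share VC (W(i) = 1.0) is spared:
assert fair_buffer_allocation(60, 100, 4, r=80, b=120) is True
assert fair_buffer_allocation(25, 100, 4, r=80, b=120) is False
```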
TCP over ABR: good performance of TCP over UBR can be achieved with minor adjustments to switch mechanisms (i.e., PPD/EPD), which reduces the incentive to use the more complex and more expensive ABR service. The performance and fairness of ABR are quite sensitive to some ABR parameter settings. Overall, ABR does not provide a significant performance advantage over the simpler and less expensive UBR-EPD or UBR-EPD-FBA