Network Transport Layer: TCP Reliability; Transport Congestion Control; TCP/Reno CC Y. Richard Yang http://zoo.cs.yale.edu/classes/cs433/ 11/8/2018.

Slides:



Advertisements
Similar presentations
Introduction 1 Lecture 13 Transport Layer (Transmission Control Protocol) slides are modified from J. Kurose & K. Ross University of Nevada – Reno Computer.
Advertisements

Transport Layer3-1 TCP. Transport Layer3-2 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional data flow in same connection.
3-1 TCP Protocol r point-to-point: m one sender, one receiver r reliable, in-order byte steam: m no “message boundaries” r pipelined: m TCP congestion.
Data Communications and Computer Networks Chapter 3 CS 3830 Lecture 16 Omar Meqdadi Department of Computer Science and Software Engineering University.
1 Chapter 3 Transport Layer. 2 Chapter 3 outline 3.1 Transport-layer services 3.2 Multiplexing and demultiplexing 3.3 Connectionless transport: UDP 3.4.
1 Transport Layer Lecture 9 Imran Ahmed University of Management & Technology.
Transport Layer3-1 Summary of Reliable Data Transfer Checksums help us detect errors ACKs and NAKs help us deal with errors If ACK/NAK has errors sender.
1 TCP Congestion Control. 2 TCP Segment Structure source port # dest port # 32 bits application data (variable length) sequence number acknowledgement.
Week 9 TCP9-1 Week 9 TCP 3 outline r 3.5 Connection-oriented transport: TCP m segment structure m reliable data transfer m flow control m connection management.
Transport Layer1 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r reliable, in-order byte steam: m no “message boundaries” r pipelined: m TCP congestion.
3: Transport Layer3b-1 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional data flow in same connection m MSS: maximum.
Chapter 3 outline 3.1 transport-layer services
10/7/ /9/2003 TCP and Congestion Control October 7-9, 2003.
Transport Layer 3-1 Transport Layer r To learn about transport layer protocols in the Internet: m TCP: connection-oriented protocol m Reliability protocol.
Transport Layer Transport Layer: TCP. Transport Layer 3-2 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional.
Transport Layer 3-1 Transport Layer r To learn about transport layer protocols in the Internet: m TCP: connection-oriented protocol m Reliability protocol.
1 Announcement r Project 2 out m Much harder than project 1, start early! r Homework 2 due next Tuesday.
Announcement Project 2 out –Much harder than project 1, start early! Homework 2 due next Tu.
Chapter 3 Transport Layer
The Future r Let’s look at the homework r The next test is coming the 19 th (just before turkey day!) r Monday will finish TCP canned slides r Wednesday.
Transport Layer3-1 Data Communication and Networks Lecture 7 Transport Protocols: TCP October 21, 2004.
Announcement Homework 1 graded Homework 2 out –Due in a week, 1/30 Project 2 problems –Minet can only compile w/ old version of gcc (2.96). –Only tlab-login.
Lecture 17 Congestion Control; AIMD; TCP Reno 10/31/2013
Transport Layer 3-1 Chapter 3 Outline r 3.5 Connection-oriented transport: TCP m segment structure m reliable data transfer m flow control m connection.
Transport Layer 3-1 Chapter 3b outline 3.1 connection-oriented transport: TCP  segment structure  reliable data transfer  flow control  connection.
Transport Layer3-1 TCP sender (simplified) NextSeqNum = InitialSeqNum SendBase = InitialSeqNum loop (forever) { switch(event) event: data received from.
Network LayerII-1 RSC Part III: Transport Layer 3. TCP Redes y Servicios de Comunicaciones Universidad Carlos III de Madrid These slides are, mainly, part.
Transport Layer1 Reliable Transfer Ram Dantu (compiled from various text books)
3: Transport Layer3b-1 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional data flow in same connection m MSS: maximum.
2: Transport Layer 21 Transport Layer 2. 2: Transport Layer 22 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r full duplex data: m bi-directional data.
TCOM 509 – Internet Protocols (TCP/IP) Lecture 04_b Transport Protocols - TCP Instructor: Dr. Li-Chuan Chen Date: 09/22/2003 Based in part upon slides.
1 End-to-End Protocols (UDP, TCP, Connection Management)
1 CS 4396 Computer Networks Lab TCP – Part II. 2 Flow Control Congestion Control Retransmission Timeout TCP:
September 26 th, 2013 CS1652 The slides are adapted from the publisher’s material All material copyright J.F Kurose and K.W. Ross, All Rights.
TCP. TCP ACK generation [RFC 1122, RFC 2581] Event at Receiver Arrival of in-order segment with expected seq #. All data up to expected seq # already.
Transport Layer3-1 Chapter 3 outline r 3.1 Transport-layer services r 3.2 Multiplexing and demultiplexing r 3.3 Connectionless transport: UDP r 3.4 Principles.
Transport Layer3-1 Chapter 3 Transport Layer Computer Networking: A Top Down Approach 5 th edition. Jim Kurose, Keith Ross Addison-Wesley, April 2009.
Transport Layer3-1 Transport Layer If you are going through Hell Keep going.
CIS679: TCP and Multimedia r Review of last lecture r TCP and Multimedia.
Transport Layer1 Goals: r understand principles behind transport layer services and protocols: m UDP m TCP Overview: r transport layer services r multiplexing/demultiplexing.
CSEN 404 Transport Layer II Amr El Mougy Lamia AlBadrawy.
DMET 602: Networks and Media Lab Amr El Mougy Yasmeen EssamAlaa Tarek.
09-Transport Layer: TCP Transport Layer.
Chapter 3 outline 3.1 Transport-layer services
Chapter 3 Transport Layer
COMP 431 Internet Services & Protocols
Chapter 3 outline 3.1 Transport-layer services
DMET 602: Networks and Media Lab
Chapter 3 outline 3.1 transport-layer services
Chapter 5 TCP Sequence Numbers & TCP Transmission Control
CS 1652 Jack Lange University of Pittsburgh
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 full duplex data:
Introduction to Networks
Introduction to Congestion Control
CS1652 TCP Jack Lange University of Pittsburgh
Network Transport Layer: Congestion Control
Review: UDP demultiplexing TCP demultiplexing Multiplexing?
Transport Layer Goals: Overview:
Chapter 5 TCP Transmission Control
Lecture 19 – TCP Performance
Chapter 3 outline 3.1 Transport-layer services
CS4470 Computer Networking Protocols
Chapter 3 outline 3.1 Transport-layer services
Chapter 3 outline 3.1 Transport-layer services
TCP 3: Transport Layer.
Transportation Layer.
Transport Layer: Congestion Control
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 point-to-point:
Chapter 3 Transport Layer
Network Transport Layer: TCP/Reno Analysis, TCP Cubic, TCP/Vegas
Presentation transcript:

Network Transport Layer: TCP Reliability; Transport Congestion Control; TCP/Reno CC Y. Richard Yang http://zoo.cs.yale.edu/classes/cs433/ 11/8/2018

Admin. Programming assignment 4 Part 1 discussion instructor slots: Nov. 9: 3:30-5:00 pm Nov. 11: 3:00-4:30 pm Nov. 12: 2:00-3:00 pm Nov. 13: 2:30-3:30 pm (regular office hour) Project or Exam 2

Recap: Reliable Transport Basic Structure Basic structure: sliding window protocols Benefit: when window size is bandwidth-delay product, it achieves non-stop transmission until ack comes back, fully utilizing link An instance of generic computer science technique: pipelining.

Recap: Sliding Window Realizations Two major types: Go-Back-n (GBN) and Selective-Repeat (SR) GBN: sender SR: sender/receiver

Recap: Sliding Window Realizations GBN and SR comparison Go-Back-n (GBN): sequential receiving at receiver Selective-Repeat (SR): W parallel receiving at receiver

Recap: Transport Connection Management Host A Host B Host A Host B close SYN(seq=x) FIN ACK close ACK(seq=x), SYN(seq=y) FIN ACK ACK(seq=y) timed wait closed DATA(seq=x+1) Generic technique: challenge to verify freshness. Generic technique: Timeout to realize ”solve” infeasible problem.

Outline Admin and recap Transport reliability sliding window protocols connection management TCP reliability

TCP Reliable Data Transfer A sliding window protocol a combination of go-back-n and selective repeat: send & receive buffers (flow control helps manage buffer size) cumulative acks uses a single retransmission timer do not retransmit all packets upon timeout Connection managed setup (exchange of control msgs) init’s sender, receiver state before data exchange close Full duplex data: bi-directional data flow in same connection

TCP FSM %netstat -t -a SYN SYN/ACK ACK CLOSED SYN SENT ESTABLSIHED CLOSED LISTEN SYN RCVD ESTABLSIHED ESTABLSIHED ESTABLSIHED FIN WAIT 1 FIN CLOSE WAIT ACK LAST ACK FIN FIN WAIT 2 TIME WAIT ACK Example see: http://lxr.linux.no/linux+v4.15.14/net/ipv4/tcp_input.c#L4003

Example Transition (Client Connect + Client Close): Client SYN SYN/ACK ACK CLOSED SYN SENT ESTABLSIHED CLOSED TCP lifecycle: init SYN/FIN SYN RCVD FIN ACK FIN WAIT 1 ESTABLSIHED CLOSE WAIT LAST ACK FIN WAIT 2 TIME WAIT http://dsd.lbl.gov/TCP-tuning/ip-sysctl-2.6.txt

Example Transition (Client Connect + Client Close): Server SYN SYN/ACK ACK CLOSED SYN SENT ESTABLSIHED CLOSED TCP lifecycle: wait for SYN/FIN SYN RCVD FIN ACK FIN WAIT 1 ESTABLSIHED CLOSE WAIT LAST ACK FIN WAIT 2 TIME WAIT

Exercise: Wireshark Capture TCP

TCP Reliability Basic: TCP Segment Structure source port # dest port # 32 bits application data (variable length) sequence number acknowledgement number rcvr window size ptr urgent data checksum F S R P A U head len not used Options (variable length) URG: urgent data (generally not used) PSH: push data now ACK: ACK # valid counting by bytes of data (not segments!) Also in UDP flow control RST, SYN, FIN: connection management (reset, setup teardown commands)

simple telnet scenario TCP Reliability Basic: TCP Seq. #’s and ACKs Seq. #’s: byte stream “number” of first byte in segment’s data SYNC counts as data ACKs: seq # of next byte expected from other side cumulative ACK Host A Host B User types ‘C’ Seq=42, ACK=79, data = ‘C’ host ACKs receipt of ‘C’, echoes back ‘C’ Seq=79, ACK=43, data = ‘C’ host ACKs receipt of echoed ‘C’ Seq=42, ACK=80 time simple telnet scenario

TCP Reliability Optimizations Multiple efforts to tune/optimize basic sliding window protocol, e.g., the “small-packet problem”: sender sends a lot of small packets (e.g., telnet one char at a time) Nagle’s algorithm: do not send data if there is small amount of data in send buffer and there is an unack’d segment [not required in Assignment 4] the ”ack inefficiency” problem: receiver sends too many ACKs, no chance of combing ACK with data Delayed ack to reduce # of ACKs/combine ACK with reply [not required in Assignment 4] the loss detection optimization problem Adaptive retransmission timeout (RTO) estimation, Fast retransmit before RTO [Required]

Offline Read: TCP Ack Generation Rules [RFC 1122, RFC 2581] Event at Receiver Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed expected seq #. One other segment has ACK pending Arrival of out-of-order segment higher-than-expect seq. # . Gap detected Arrival of segment that partially or completely fills gap TCP Receiver Action Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK Immediately send single cumulative ACK, ACKing both in-order segments Immediately send duplicate ACK, indicating seq. # of next expected byte Immediate send ACK, provided that segment starts at lower end of gap

TCP Timeout Estimation Why is good timeout value important Why is “too short” bad? premature timeout unnecessary retransmissions; many duplicates Why is “too long” bad? slow reaction to segment loss Ideal timeout and how to estimate?

TCP Retransmission Timeout (RTO) Alg Problem: Ideal timeout = RTT, but RTT is not a fixed value Possibility: using the average of RTT, but this will generate many timeouts due to network variations RTT freq. TCP solution: Timeout = EstRTT + 4 * DevRTT

TCP EstRTT and DevRTT Computation Exponential weighted moving average influence of past sample decreases exponentially fast EstRTT = (1-alpha)*EstRTT + alpha*SampleRTT - SampleRTT: measured time from segment transmission until ACK receipt - typical value: alpha = 0.125 DevRTT = (1-beta)*DevRTT + beta|SampleRTT-EstRTT| (typically, beta = 0.25)

An Example TCP Session

TCP Fast Retransmit Problem: Adding RTT variance into RTO may lead to longer than RTT delay. Detect lost segments via duplicate ACKs sender often sends many segments back-to-back if segment is lost, there will likely be many duplicate ACKs If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost: resend segment before timer expires

Triple Duplicate Ack 1 2 3 4 5 6 7 2 3 4 Packets Acknowledgements (waiting seq#) 2 3 4

Fast Retransmit: if (y > SendBase) { … SendBase = y event: ACK received, with ACK field value of y if (y > SendBase) { … SendBase = y if (there are currently not-yet-acknowledged segments) start timer } else { increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) { resend segment with sequence number y a duplicate ACK for already ACKed segment fast retransmit

TCP: reliable data transfer 00 sendbase = initial_sequence number agreed by TWH 01 nextseqnum = initial_sequence number by TWH 02 loop (forever) { 03 switch(event) 04 event: data received from application above 05 if (window allows send) 06 create TCP segment with sequence number nextseqnum 06 if (no timer) start timer 07 pass segment to IP 08 nextseqnum = nextseqnum + length(data) else put packet in buffer 09 event: timer timeout for sendbase 10 retransmit segment 11 compute new timeout interval 12 restart timer 13 event: ACK received, with ACK field value of y 14 if (y > sendbase) { /* cumulative ACK of all data up to y */ 15 cancel the timer for sendbase sendbase = y if (no timer and packet pending) start timer for new sendbase while (there are segments and window allow) sent a segment; 18 } 19 else { /* y==sendbase, duplicate ACK for already ACKed segment */ 20 increment number of duplicate ACKs received for y 21 if (number of duplicate ACKS received for y == 3) { 22 /* TCP fast retransmit */ 23 resend segment with sequence number y 24 restart timer for segment y 25 } 26 } /* end of loop forever */ TCP: reliable data transfer Simplified TCP sender

TCP Developed in Time 1975 Three-way Handshake 1984 Nagel Alg Tomlinson SIGCOMM75 1984 Nagel Alg 1990 Delayed ACK BSD 4.3 Reno SIGCOMM88 1983 TCP impl. BSD 4.2 1986 Kahn Alg to estimate RTT 1996 SACK TCP Selective Ack 1974 TCP described by Cerf & Kahn IEEE Trans Comm 1988 Fast Retransmit BSD 4.3 Tahoe SIGCOMM88 1981 TCP RFC 793/791

Summary: Transport Reliability Basic structure: sliding window protocols Basic analytical techniques: joint state machine, traces, state invariants Realization example: TCP Remaining issue: How to determine the “right” parameters?

History Network collapse in the mid-1980s UCB  LBL throughput dropped by 1000X ! The intuition was that the collapse was caused by wrong parameters… Key parameters for TCP in mid-1980s fixed window size W timeout value = 2 RTT (already discussed)

Key Parameter: Sliding Window Size Transmission rate determined by congestion window size, cwnd, over segments: cwnd cwnd segments, each with MSS bytes sent in one RTT: Rate = cwnd * MSS RTT Bytes/sec Assume W is small enough. Ignore small details. MSS: Minimum Segment Size

Outline Admin and recap Transport reliability sliding window protocols connection management TCP reliability Transport congestion control/resource allocation

Some General Questions Big picture question: How to determine a flow’s sending rate? For better understanding, we need to look at a few basic questions: What is congestion (cost of congestion)? Why are desired properties of congestion control?

Outline Admin and recap Transport reliability sliding window protocols connection management TCP reliability Transport congestion control/resource allocation what is congestion (cost of congestion)

Cause/Cost of Congestion: Single Bottleneck flow 1 5 Mbps 20 Mbps 10 Mbps 20 Mbps flow 2 (5 Mbps) 20 Mbps router 1 router 2 - Flow 2 has a fixed sending rate of 5 Mbps - We vary the sending rate of flow 1 from 0 to 20 Mbps - Assume no retransmission; link from router 1 to router 2 has infinite buffer throughput: e2e packets delivered in unit time Delay? sending rate by flow 1 (Mbps) delay at central link 5 delay due to randomness sending rate by flow 1 (Mbps) throughput of flow 1 & 2 (Mbps) 5 10

Cause/Cost of Congestion: Single Bottleneck router 3 router 5 flow 1 5 Mbps 20 Mbps 10 Mbps 20 Mbps flow 2 (5 Mbps) 20 Mbps router 1 router 2 router 4 router 6 Assume no retransmission the link from router 1 to router 2 has finite buffer throughput: e2e packets delivered in unit time Zombie packet: a packet dropped at the link from router 2 to router 5; the upstream transmission from router 1 to router 2 used for that packet was wasted! sending rate by flow 1 (Mbps) throughput of flow 1 & 2 (Mbps) 5 10 x

Summary: The Cost of Congestion Throughput knee cliff congestion collapse packet loss When sources sending rate too high for the network to handle”: Packet loss => wasted upstream bandwidth when a pkt is discarded at downstream wasted bandwidth due to retransmission (a pkt goes through a link multiple times) High delay Load Delay

Outline Admin and recap Transport reliability sliding window protocols connection management TCP reliability Transport congestion control/resource allocation what is congestion (cost of congestion) basic congestion control alg.

The Desired Properties of a Congestion Control Scheme Efficiency: close to full utilization but low delay fast convergence after disturbance Fairness (resource sharing) Distributedness (no central knowledge for scalability)

Derive CC: A Simple Model User 1 x1 d = sum xi > Xgoal? sum xi User 2 x2 xn User n Flows observe congestion signal d, and locally take actions to adjust rates.

Discussion: values of the parameters? Linear Control Proposed by Chiu and Jain (1988) Considers the simplest class of control strategy Discussion: values of the parameters?

State Space of Two Flows fairness line: x1=x2 x2 efficiency line: x1+x2=C x(0) overload underload x1

efficiency: distributed linear rule congestion efficiency: distributed linear rule x0 x0 efficiency fairness x0 intersection x0

Implication: Congestion (overload) Case In order to get closer to efficiency and fairness after each update, decreasing of rate must be multiplicative decrease (MD) aD = 0 bD < 1

efficiency: distributed linear rule no-congestion efficiency: distributed linear rule x0 x0 efficiency fairness x0 convergence x0

Implication: No Congestion Case In order to get closer to efficiency and fairness after each update, additive and multiplicative increasing (AMI), i.e., aI > 0, bI > 1 Simply additive increase gives better improvement in fairness (i.e., getting closer to the fairness line) Multiplicative increase may grow faster

Intuition: State Trace Analysis of Four Special Cases Additive Decrease Multiplicative Decrease Additive Increase AIAD (bI=bD=1) AIMD (bI=1, aD=0) Multiplicative Increase MIAD (aI=0, bI>1, bD=1) MIMD (aI=aD=0) Discussion: state transition trace.

AIMD: State Transition Trace x2 fairness line: x1=x2 efficiency line: x1+x2=C overload underload x0 x1

Intuition: Another Look Consider the difference or ratio of the rates of two flows AIAD MIMD MIAD AIMD AIAD: difference does not change MIMD: ratio does not change MIAD: difference becomes bigger AIMD: difference does not change

Mapping A(M)I-MD to Protocol Question to look at: How do we apply the A(M)I-MD algorithm? Implement d(t): loss/delay Adjust window size b_D: exponential backoff timeout after window size is 1

Rate-based vs. Window-based Congestion control by explicitly controlling the sending rate of a flow, e.g., set sending rate to 128Kbps Example: ATM Window-based: Congestion control by controlling the window size of a transport scheme, e.g., set window size to 64KBytes Example: TCP Discussion: rate-based vs. window-based

Window-based Congestion Control Window-based congestion control is self-clocking: considers flow conservation, and adjusts to RTT variation automatically. Hence, for better safety, more designs use window-based design.

Mapping A(M)I-MD to Protocol Driving question: How do we apply the A(M)I-MD algorithm for sliding window protocols? Implement d(t): loss/delay Adjust window size b_D: exponential backoff timeout after window size is 1

Outline Admin and recap Transport reliability sliding window protocols connection management TCP reliability Transport congestion control/resource allocation what is congestion (cost of congestion) basic congestion control alg. TCP/reno congestion control

TCP Congestion Control Closed-loop, end-to-end, window-based congestion control designed by Van Jacobson in late 1980s, based on the AIMD Worked in a large range of bandwidth values: the bandwidth of the Internet has increased by more than 200,000 times Many versions TCP/Tahoe: this is a less optimized version TCP/Reno: many OSs today implement Reno type congestion control For more details: see TCP/IP illustrated; or read http://lxr.linux.no/source/net/ipv4/tcp_input.c for linux implementation

Basic Structure Two “phases” Important variables: MI: slow-start AIMD: congestion avoidance Important variables: cwnd: congestion window size ssthresh: threshold between the slow-start phase and the congestion avoidance phase

Visualization of the Two Phases

MI: Slow Start Algorithm: MI double cwnd every RTT until network congested Goal: getting to equilibrium gradually but quickly, to get a rough estimate of the optimal of cwnd

MI: Slow-start Initially: cwnd = 1; ssthresh = infinite (e.g., 64K); For each newly ACKed segment: if (cwnd < ssthresh) /* MI: slow start*/ cwnd = cwnd + 1; segment 1 cwnd = 1 ACK for segment 1 cwnd = 2 segment 2 segment 3 ACK for segments 2 + 3 cwnd = 4 segment 4 segment 5 segment 6 segment 7 cwnd = 6 cwnd = 8

AIMD: Congestion Avoidance Algorithm: AIMD increases window by 1 per round-trip time cuts window size to half when detecting congestion by 3DUP to 1 if timeout if already timeout, doubles timeout Goal: Maintains equilibrium and reacts around equilibrium

TCP/Reno Full Alg Initially: cwnd = 1; ssthresh = infinite (e.g., 64K); For each newly ACKed segment: if (cwnd < ssthresh) // slow start: MI cwnd = cwnd + 1; else // congestion avoidance; AI cwnd += 1/cwnd; Triple-duplicate ACKs: // MD cwnd = ssthresh = cwnd/2; Timeout: ssthresh = cwnd/2; // reset (if already timed out, double timeout value; this is called exponential backoff)

Typical TCP/Reno Transmission Behavior See [Jac89]