Chapter 3 outline 3.1 transport-layer services

1 Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 principles of congestion control
3.7 TCP congestion control
Transport Layer

2 TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
point-to-point: one sender, one receiver
reliable, in-order byte stream: no “message boundaries”
pipelined: TCP congestion and flow control set “window size”
full duplex data: bi-directional data flow in same connection
  MSS: maximum segment size
connection-oriented: handshaking (exchange of control msgs) initializes sender/receiver state before data exchange
flow controlled: sender will not overwhelm receiver

3 TCP segment structure
[Figure: TCP segment layout, 32 bits wide]
Fields, top to bottom:
  source port #, dest port #
  sequence number
  acknowledgement number
  head len, not used, flag bits U A P R S F, receive window
  checksum, Urg data pointer
  options (variable length)
  application data (variable length)
Notes on the fields:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
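
As a rough illustration of the layout above, the fixed 20-byte part of the header can be unpacked with Python's standard struct module. The function name parse_tcp_header and the dictionary keys are made up for this sketch; options and checksum verification are left out.

```python
import struct

# Flag bit masks within the flags byte (low six bits: U A P R S F).
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(raw):
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_byte, flag_byte, rwnd, checksum, urg_ptr) = struct.unpack(
        "!HHIIBBHHH", raw[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # head len is stored in 32-bit words in the upper 4 bits.
        "header_len": (offset_byte >> 4) * 4,
        "flags": {name for name, bit in TCP_FLAGS.items() if flag_byte & bit},
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
    }


# Hypothetical SYNACK header used only to exercise the parser.
example = struct.pack("!HHIIBBHHH", 12345, 80, 42, 79, 5 << 4, 0x12, 4096, 0, 0)
parsed = parse_tcp_header(example)
```

Here parsed["flags"] holds the names of the flag bits that are set, so a SYNACK shows both SYN and ACK.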

4 TCP Sequence Number
Indicates the position of the data in the packets (segments) within the byte stream
Every byte is sequenced
Used for re-ordering packets and finding lost packets
Initial Sequence Number (ISN) is randomly assigned for every TCP connection
[Note] SYN and FIN packets each consume one sequence number, although they do not include any data. ACK packets without payload do not consume a sequence number.
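
The bookkeeping behind this slide can be sketched in a few lines. The helper name next_seq is invented for illustration, and the modulo reflects the 32-bit width of the sequence number field.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


def next_seq(seq, payload_len, syn=False, fin=False):
    """Sequence number the sender uses after this segment.

    Each payload byte consumes one number; SYN and FIN consume one each;
    a pure ACK (no payload, no SYN/FIN) consumes none.
    """
    consumed = payload_len + int(syn) + int(fin)
    return (seq + consumed) % SEQ_SPACE


after_data = next_seq(42, 1)                 # one byte of data
after_syn = next_seq(1000, 0, syn=True)      # SYN without data
after_pure_ack = next_seq(43, 0)             # ACK without payload
after_wrap = next_seq(SEQ_SPACE - 1, 1)      # wraps to 0
```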

5 TCP seq. numbers, ACKs
sequence numbers: byte stream “number” of first byte in segment’s data
acknowledgements: seq # of next byte expected from sender; cumulative ACK
Q: how receiver handles out-of-order segments
A: TCP spec doesn’t say, up to implementer
[Figure: outgoing segment from sender and incoming segment to sender, each showing source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed; sent, not-yet ACKed (“in-flight”); usable but not yet sent; not usable]

6 TCP seq. numbers, ACKs: simple telnet scenario
Host A: User types ‘C’
Host A -> Host B: Seq=42, data = ‘C’
Host B: host ACKs receipt of ‘C’, echoes back ‘C’
Host B -> Host A: Seq=79, ACK=43, data = ‘C’
Host A: host ACKs receipt of echoed ‘C’
Host A -> Host B: Seq=43, ACK=80

7 TCP round trip time, timeout
Q: how to set TCP timeout value?
longer than RTT, but RTT varies
too short: premature timeout, unnecessary retransmissions
too long: slow reaction to segment loss

8 TCP round trip time, timeout
Q: how to estimate RTT?
SampleRTT: measured time from segment transmission until ACK receipt
  ignore RTT sampling for retransmissions (note: Karn’s algorithm)
SampleRTT will vary, so we need to “smooth” the estimated RTT
  average several recent measurements, not just current SampleRTT

9 TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
Exponential Weighted Moving Average (EWMA)
influence of past sample decreases exponentially fast
Recommended value [RFC 2988]: α = 0.125
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; SampleRTT and EstimatedRTT in milliseconds versus time in seconds]
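
A minimal sketch of the moving average on this slide, assuming the weight 0.125 quoted above. The function name ewma_rtt and the sample values are illustrative, not measurements from the figure.

```python
def ewma_rtt(samples, initial_estimate, alpha=0.125):
    """Apply EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT
    to each sample in turn and return the successive estimates."""
    estimate = initial_estimate
    history = []
    for sample in samples:
        estimate = (1 - alpha) * estimate + alpha * sample
        history.append(estimate)
    return history


# RTT jumps from 100 ms to 200 ms; the estimate follows only gradually.
estimates = ewma_rtt([200.0, 200.0], initial_estimate=100.0)
```

Because each update keeps a factor (1 - alpha) of the previous estimate, a sample that is k updates old contributes with weight alpha * (1 - alpha)^k, which is the exponential decay the slide refers to.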

10 TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
Use Moving Average!
[Figure: SampleRTT (ms) and the resulting EstimatedRTT curves for α = 0.9, 0.6, 0.3 and 0.1]

11 TCP round trip time, timeout
timeout interval: EstimatedRTT plus “safety margin”
  large variation in EstimatedRTT -> larger safety margin
estimate SampleRTT deviation from EstimatedRTT:
  DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
  (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
  (estimated RTT plus “safety margin”)
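
The three formulas can be combined into one update step, sketched below. The slide does not say in which order the two averages are updated; this sketch assumes DevRTT is updated first, using the EstimatedRTT from before the new sample (the order RFC 6298 uses). The function name update_timeout is made up.

```python
def update_timeout(estimated_rtt, dev_rtt, sample_rtt, alpha=0.125, beta=0.25):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one new sample."""
    # Assumption: deviation is measured against the old estimate.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - alpha) * estimated_rtt + alpha * sample_rtt
    timeout_interval = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout_interval


# Illustrative numbers in milliseconds.
est, dev, timeout = update_timeout(estimated_rtt=100.0, dev_rtt=10.0,
                                   sample_rtt=140.0)
```

With these numbers a single late sample moves the estimate by only 5 ms but widens the safety margin from 40 ms to 70 ms.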

12 TCP round trip time, timeout
[Figure: Measurement of Internet Delays For 100 Successive Packets At 1 Second Intervals]
[Figure: TCP Timeout Interval For Sampled Internet Delays]

13 Karn’s Algorithm
Rule 1: Ignore measured RTT for retransmitted packets.
  When retransmissions occur, the RTT estimate is not updated
  Go back to using the RTT estimate only after one successful transmission (one that needed no retransmission)
  This removes ambiguity from RTT measurements.
Rule 2: “Timeout Interval” should be doubled after retransmission.
  This is called "Exponential Back-off"

14 Karn’s Algorithm
Why is Rule 1 necessary? Eliminate Retransmission Ambiguity
[Figure: two A-to-B timelines, each with an original transmission, a timeout, a retransmission and an ACK; in one of them the original transmission is lost (X). Sample RTT is drawn from a different starting point in each.]
Which one is correct for “Sample RTT”? So, what should we do?
Answer: just ignore RTT sampling for retransmissions

15 Karn’s Algorithm
Why is Rule 2 necessary?
When the “Timeout Interval” is smaller than the real RTT:
  If only Rule 1 is applied, TCP will use S as the timeout interval for a long time (or forever), because every packet is retransmitted and so no RTT sample is ever accepted.
  Many packets will be retransmitted.
  More severe congestion occurs
[Figure: Data 1 and Data 2 are each retransmitted after timeout S, before the Ack arrives at the real RTT]
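
The two rules can be sketched together as a small timer object. The class name KarnTimer and its method names are hypothetical, and the RTT update reuses the EstimatedRTT / DevRTT formulas from the earlier slides with the deviation computed against the old estimate (an assumption).

```python
class KarnTimer:
    """Timeout interval that follows Karn's two rules (illustrative sketch)."""

    def __init__(self, estimated_rtt, dev_rtt, alpha=0.125, beta=0.25):
        self.estimated_rtt = estimated_rtt
        self.dev_rtt = dev_rtt
        self.alpha = alpha
        self.beta = beta
        self.timeout = estimated_rtt + 4 * dev_rtt

    def on_timeout(self):
        # Rule 2: exponential back-off, double the interval on retransmission.
        self.timeout *= 2
        return self.timeout

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            # Rule 1: the sample is ambiguous, so keep the backed-off timeout.
            return self.timeout
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.timeout = self.estimated_rtt + 4 * self.dev_rtt
        return self.timeout


timer = KarnTimer(estimated_rtt=100.0, dev_rtt=0.0)
after_first_timeout = timer.on_timeout()
after_second_timeout = timer.on_timeout()
after_ambiguous_ack = timer.on_ack(500.0, was_retransmitted=True)
after_clean_ack = timer.on_ack(100.0, was_retransmitted=False)
```

The backed-off interval survives the ambiguous ACK and is replaced by the RTT-based value only once a segment is acknowledged without having been retransmitted.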

17 TCP reliable data transfer
TCP creates rdt service on top of IP’s unreliable service
  pipelined segments
  cumulative acks
  single retransmission timer
retransmissions triggered by:
  timeout events
  duplicate acks

18 TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running
    think of timer as for oldest unacked segment
    expiration interval: TimeOutInterval
timeout:
  retransmit the segment that caused timeout
  restart timer
ack rcvd:
  if ack acknowledges previously unacked segments
    update what is known to be ACKed
    start timer if there are still unacked segments

19 TCP sender (simplified)
initially (Λ): NextSeqNum = InitialSeqNum; SendBase = InitialSeqNum; wait for event
event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., “send”)
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running)
    start timer
event: timeout
  retransmit not-yet-acked segment with smallest seq. #
  start timer
event: ACK received, with ACK field value y
  if (y > SendBase) {
    SendBase = y
    /* SendBase–1: last cumulatively ACKed byte */
    if (there are currently not-yet-acked segments)
      start timer
    else stop timer
  }
Example: SendBase = 72; y = 73, so the rcvr wants bytes from 73 onward; y > SendBase, so data up to byte 72 are ACKed
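
The three events of the simplified sender can be sketched as a small class. The names SimpleTcpSender, wire and unacked are invented for this sketch; the timer is reduced to a boolean flag, and sequence number wraparound is ignored.

```python
class SimpleTcpSender:
    """Event handlers of the simplified TCP sender (illustrative only)."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> payload of not-yet-acked segments
        self.timer_running = False
        self.wire = []              # log of (seq #, payload) passed to IP

    def on_app_data(self, data):
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.wire.append((seq, data))          # "send"
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self):
        if self.unacked:
            seq = min(self.unacked)            # smallest not-yet-acked seq #
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True          # restart timer

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y                 # bytes up to y-1 are ACKed
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq_num=92)
sender.on_app_data(b"a" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"b" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # resends the segment with Seq=92
sender.on_ack(120)               # cumulative ACK covers both segments
```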

20 TCP: retransmission scenarios
lost ACK scenario (Host A, Host B):
  A sends Seq=92, 8 bytes of data
  B sends ACK=100, which is lost (X)
  A's timeout expires; A resends Seq=92, 8 bytes of data
  B sends ACK=100 again; SendBase=100
premature timeout and cumulative ACKs (Host A, Host B), starting with SendBase=92:
  A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  B sends ACK=100 and ACK=120, but A's timeout for Seq=92 expires before they arrive
  A resends Seq=92, 8 bytes of data
  the ACKs arrive: SendBase=100, then SendBase=120
  B answers the resent segment with the cumulative ACK=120

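21 TCP: retransmission scenarios
cumulative ACK (Host A, Host B):
  A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
  B sends ACK=100, which is lost (X), then ACK=120
  ACK=120 arrives before the timeout and covers the lost ACK=100, so nothing is resent
  A continues with Seq=120, 15 bytes of data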

22 TCP ACK generation [RFC 1122, RFC 2581]
event at receiver: arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed
  TCP receiver action: Wait up to 500ms for next segment. If no next segment within 500ms, send delayed ACK.
event at receiver: arrival of in-order segment with expected seq #. One other segment has ACK pending
  TCP receiver action: immediately send single cumulative (delayed) ACK, ACKing both in-order segments
event at receiver: arrival of out-of-order segment with higher-than-expected seq. #. Gap detected
  TCP receiver action: immediately send duplicate ACK, indicating seq. # of next expected byte
event at receiver: arrival of segment that partially or completely fills gap
  TCP receiver action: immediately send cumulative ACK

23 TCP - Cumulative Acknowledgement
Let’s consider the following scenario (1/2)
sender sends: Seq. #=101, 201, 301, 401, 501 and 601, each with 100 bytes data
receiver replies: Ack. #=201, Ack. #=401, Ack. #=501, Ack. #=601, Ack. #=701

24 TCP - Cumulative Acknowledgement
Let’s consider the following scenario (2/2)
sender sends: Seq. #=101, 201, 301, 401, 501 and 601, each with 100 bytes data
receiver keeps replying Ack. #=201 (five times in total), so it is still waiting for the segment with Seq. #=201
Timeout at the sender: it resends Seq. #=201, 100 bytes data
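
The ACK numbers in scenario (2/2) can be reproduced with a small receiver model. This is only a sketch: it assumes the receiver buffers out-of-order segments, ACKs every arrival immediately (no delayed ACKs), and that the segment with Seq. #=201 is the one that goes missing until it is resent. The function name cumulative_acks is made up.

```python
def cumulative_acks(arrivals, first_expected):
    """Return the Ack. # sent after each arriving (seq, length) segment."""
    expected = first_expected
    buffered = {}                      # out-of-order segments: seq -> length
    acks = []
    for seq, length in arrivals:
        if seq == expected:
            expected += length
            # A filled gap may release segments buffered earlier.
            while expected in buffered:
                expected += buffered.pop(expected)
        elif seq > expected:
            buffered[seq] = length     # gap detected, keep the segment
        # seq < expected would be a duplicate segment: just re-ACK.
        acks.append(expected)          # ACK carries the next expected byte
    return acks


# Segment 201 is missing at first and arrives last, after the timeout.
arrival_order = [(101, 100), (301, 100), (401, 100),
                 (501, 100), (601, 100), (201, 100)]
acks = cumulative_acks(arrival_order, first_expected=101)
```

The single retransmission is answered with Ack. #=701 because the receiver already holds everything after the gap.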

25 TCP fast retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs.
  sender often sends many segments back-to-back as the app requests
  if segment is lost, there will likely be many duplicate ACKs.
TCP fast retransmit: if sender receives 3 duplicate ACKs for same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
  likely that unacked segment lost, so don’t wait for timeout

26 TCP fast retransmit
sender sends Seq. #=101, 100 bytes data, followed by further segments
receiver replies Ack. #=201 repeatedly: Duplicate ACKs
sender performs Fast Retransmit: Seq. #=201, 100 bytes data
receiver replies Ack. #=701

27 Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else { /* y == SendBase: a duplicate ACK for already ACKed segment */
    increment count of “duplicate ACKs” received for y
    if (count of dup ACKs received for y == 3) {
      /* fast retransmit */
      resend segment with sequence number y
    }
  }
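
A runnable version of the ACK handler, kept close to the pseudocode. The class name and the retransmitted list are illustrative; the timer is omitted, and the duplicate counter is reset whenever new data is ACKed (an assumption the pseudocode leaves implicit).

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and records fast retransmissions (sketch)."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []        # seq #s resent by fast retransmit

    def on_ack(self, y):
        if y > self.send_base:
            self.send_base = y
            self.dup_acks = 0          # new data ACKed, start counting afresh
        elif y == self.send_base:
            self.dup_acks += 1         # duplicate ACK for already ACKed data
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment with seq # y


sender = FastRetransmitSender(send_base=101)
for ack in [201, 201, 201, 201, 201, 701]:
    sender.on_ack(ack)
```

The first Ack. #=201 is a normal ACK; the third of the following duplicates triggers the retransmission, and further duplicates do not trigger it again.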

28 More TCP Scenario (1/3): Segment Corruption
sender: Timer Start; sends Seq : 1001, 200bytes and Seq : 1201, 200bytes
Receiver: cumulative (delayed) ACK : 1401
sender: Timer Start(update) for the unACKed packet (Seq: 1401); sends Seq : 1401, 200bytes
Segment 3 - corrupted
sender: Timeout; resends Seq : 1401, 200bytes
Receiver: ACK : 1601
OK (Everything is ok.)

29 More TCP Scenario (2/3): Lost segment
sender: Timer Start; sends Seq : 1001, 200bytes and Seq : 1201, 200bytes
Receiver: ACK : 1401
sender: Timer Start(update) for the unACKed packet (Seq: 1401); sends Seq : 1401, 200bytes
Segment 3 - lost
sender: Timeout; resends Seq : 1401, 200bytes
Receiver: ACK : 1601
OK (Everything is ok.)

30 More TCP Scenario (3/3): Cumulative Ack Scenario
sender: sends Seq : 1001, 200bytes, Seq : 1201, 200bytes and Seq : 1401, 200bytes
Receiver: ACK : 1401 (Acknowledgement Lost)
Receiver: ACK : 1601
(Everything is ok.)

31 TCP is GBN or SR?
“TCP is a mix of GBN and SR, and enhanced by new features”
[Table: each TCP feature is marked (O) under one or more of the columns GBN, SR, New Features]
TCP Features:
  Pipelined Protocol
  Duplicate ACK
  Cumulative ACK
  Single Retransmission Timer
  Retransmit only the timed-out packet
  Buffering at Receiver
  ACK number represents the expected next number.
  Fast Retransmission (for three consecutive duplicate ACKs)
  Delayed ACK

32 TCP is GBN or SR?
GBN: ACK number is seq # of pkt being ACKed. TCP: ACK number represents the expected next number.
GBN: No buffering at Receiver. TCP: buffering at Receiver.
GBN sender retransmits pkt n and all higher seq # pkts in window at timeout(n). But TCP retransmits only pkt n.
[Figure: receiver traces for GBN, TCP and SR when pkt2 arrives after pkt3, pkt4 and pkt5]
TCP: rcv pkt3, No discard, Buffering, and send ACK2; rcv pkt4, No discard, Buffering, and send ACK2; rcv pkt5, No discard, Buffering, and send ACK2; rcv pkt2, send ACK6
SR: rcv pkt3, No discard, Buffering, and send ACK3; rcv pkt4, No discard, Buffering, and send ACK4; rcv pkt5, No discard, Buffering, and send ACK5; rcv pkt2, send ACK2

34 TCP flow control
application may remove data from TCP socket buffers slower than TCP receiver receives (sender is sending)
receive side of TCP connection has a receive buffer
flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
[Figure: receiver protocol stack: application process; OS with TCP socket receiver buffers, TCP code and IP code; IP datagrams arrive from sender; the buffer holds TCP data and (currently) unused buffer space]

35 TCP flow control
speed-matching service: matching send rate to receiving application’s drain rate
receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  RcvBuffer size can be set via socket options
  many operating systems autoadjust RcvBuffer
sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
guarantees receive buffer will not overflow
[Figure: receiver-side buffering: TCP segment payloads enter RcvBuffer, which holds buffered data and free buffer space (rwnd); data leaves to application process]
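
A small sketch of both sides of flow control. The slide does not give a formula for rwnd; the bookkeeping below (free space = RcvBuffer minus bytes received but not yet read) is the usual textbook formulation and is an assumption here, as are the function and parameter names.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free buffer space the receiver advertises (assumed bookkeeping)."""
    buffered = last_byte_rcvd - last_byte_read   # data waiting for the app
    return rcv_buffer - buffered


def sender_may_send(last_byte_sent, last_byte_acked, rwnd, nbytes):
    """True if sending nbytes keeps in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= rwnd


# Illustrative numbers: 4096-byte buffer with 2000 bytes not yet read.
rwnd = advertised_rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
small_ok = sender_may_send(5000, 4000, rwnd, nbytes=1000)
large_ok = sender_may_send(5000, 4000, rwnd, nbytes=1200)
```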

37 Connection Management
before exchanging data, sender/receiver “handshake”:
  agree to establish connection (each knowing the other willing to establish connection)
  agree on connection parameters
on both sides (application above network):
  connection state: ESTAB
  connection variables: seq # client-to-server and server-to-client; rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

38 TCP 3-way handshake
client state: SYNSENT -> ESTAB
server state: LISTEN -> SYN RCVD -> ESTAB
1. client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x)
2. server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1)
3. client: received SYNACK(x+1) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1); this segment may contain client-to-server data
server: received ACK(y+1) indicates client is live
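
The numbers exchanged in the handshake can be sketched as three dictionaries. The function name three_way_handshake and the dictionary keys are invented for illustration; real segments carry these values in the header fields shown on the segment structure slide.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit


def three_way_handshake(x, y):
    """Return the SYN, SYNACK and ACK for initial seq numbers x and y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y,
              "ack": (syn["seq"] + 1) % SEQ_SPACE}       # acks the SYN
    ack = {"ACK": 1,
           "seq": (x + 1) % SEQ_SPACE,                    # SYN consumed one number
           "ack": (synack["seq"] + 1) % SEQ_SPACE}        # acks the SYNACK
    return [syn, synack, ack]


# Hypothetical initial sequence numbers.
syn, synack, ack = three_way_handshake(x=1000, y=5000)
```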

39 TCP 3-way handshake: FSM
server side:
  closed -> listen: Socket connectionSocket = welcomeSocket.accept();
  listen -> SYN rcvd: on SYN(x), send SYNACK(seq=y,ACKnum=x+1) and create new socket for communication back to client
  SYN rcvd -> ESTAB: on ACK(ACKnum=y+1), no action (Λ)
client side:
  closed -> SYN sent: Socket clientSocket = new Socket("hostname","port number"); send SYN(seq=x)
  SYN sent -> ESTAB: on SYNACK(seq=y,ACKnum=x+1), send ACK(ACKnum=y+1)

40 TCP: closing a connection
client, server each close their side of connection
  send TCP segment with FIN bit = 1
respond to received FIN with ACK
  on receiving FIN, ACK can be combined with own FIN
simultaneous FIN exchanges can be handled
Instead of FIN, the TCP layer can send a RST segment that terminates a connection if something is wrong.

41 TCP: closing a connection
Modified 3 way handshake (or 4 way termination)
1. App1 -> App2: FIN, SN=X. App1: “I have no more data for you. Send FIN segment”.
2. App2 -> App1: ACK=X+1. App2: “OK, I understand you are done sending. Send ACK segment”
... server can still send data to client ...
3. App2 -> App1: FIN, SN=Y. App2: “OK - Now I’m also done sending data. Send FIN segment”.
4. App1 -> App2: ACK=Y+1. App1: “I understand, Goodbye. Send ACK segment”

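42 TCP: closing a connection
client state: ESTAB -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED
server state: ESTAB -> CLOSE_WAIT -> LAST_ACK -> CLOSED
clientSocket.close(): client sends FINbit=1, seq=x and enters FIN_WAIT_1; can no longer send but can receive data
server replies ACKbit=1; ACKnum=x+1 and enters CLOSE_WAIT; can still send data
client enters FIN_WAIT_2: wait for server close
server sends FINbit=1, seq=y and enters LAST_ACK; can no longer send data
client replies ACKbit=1; ACKnum=y+1 and enters TIME_WAIT: timed wait for 2*max segment lifetime, then CLOSED
server enters CLOSED on receiving the ACK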

43 TCP: closing a connection
Why TIME_WAIT?
  This gives the client TCP enough time to ensure that the ACK it sent to the server was correctly received.
  If the ACK the client sent is lost, the server will re-transmit its FIN. The client must still be there to receive that FIN.
How long?
  2 * MSL (Maximum Segment Lifetime)
  Usually 2 min. ~ 2 min. 30 sec.
[Figure: Start TIME_WAIT, Re-Start TIME_WAIT]

44 MSS (Maximum Segment Size)
Link MTU vs. Path MTU vs. MSS
Maximum Transmission Unit (MTU) is defined by the maximum payload size of the Layer 2 frame.
  Link MTU: the max packet size that can be transmitted over a link
  Path MTU: the minimum link MTU of all links in a path between a source and a destination
Layer 3 payload determines Layer 4 Maximum Segment Size (MSS)

45 MSS (Maximum Segment Size)
What is MSS?
  MSS: Maximum Segment Size
  Largest payload size that TCP can send for this connection.
  Usually, MSS is calculated as “Maximum Transmission Unit (MTU) - 40 bytes” (a 20-byte IP header plus a 20-byte TCP header, both without options).
[Figure: MAC Header, (Path MTU)]

46 MSS (Maximum Segment Size)
What is MSS?
  An example of MSS negotiation
  [Figure: MSS negotiation] In this example, both sides use 960 bytes as MSS.
  Default TCP MSS is 536 bytes
  MSS is specified as a TCP option, initially in the TCP SYN packet during the TCP handshake.
  In the modern Internet, path MTU is usually 1500 bytes and MSS can be 1460 bytes
Self-check:
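
The arithmetic on the MSS slides can be sketched as follows. The function names are made up, the 40 bytes are taken to be a 20-byte IP header plus a 20-byte TCP header without options, and the sketch assumes that both sides end up using the smaller of the two announced values, as in the 960-byte example.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_from_mtu(mtu):
    """MSS = MTU - 40 bytes."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def negotiated_mss(announced_a, announced_b):
    """Assumption: both sides settle on the smaller announced MSS."""
    return min(announced_a, announced_b)


def segment_sizes(nbytes, mss):
    """Payload sizes of the segments needed to carry nbytes of data."""
    full, rest = divmod(nbytes, mss)
    return [mss] * full + ([rest] if rest else [])


ethernet_mss = mss_from_mtu(1500)
agreed = negotiated_mss(1460, 960)
sizes = segment_sizes(3000, ethernet_mss)
```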

47 Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 & 3.7 principles of congestion control / TCP congestion control (English edition of the textbook: p. 259, pp. 269-278)

48 Principles of congestion control
congestion:
  informally: “too many sources sending too much data too fast for network to handle”
  different from flow control!
  manifestations:
    lost packets (buffer overflow at routers)
    long delays (queueing in router buffers)
  a top-10 problem!

49 Principles of congestion control
Flow Control vs. Congestion Control
  Flow control (Src, Dest): limits amount of data that destination must buffer
  Congestion control (Src, Dest): attempts to reduce buffer overflow inside the network

50 Principles of congestion control
Flow Control vs. Congestion Control

51 Principle of TCP congestion control:
goal: TCP sender should transmit as fast as possible, but without congesting network
  Q: how to find rate just below congestion level
decentralized: each TCP sender sets its own rate, based on implicit feedback:
  ACK: segment received (a good thing!), network not congested, so increase sending rate
  lost segment: assume loss due to congested network, so decrease sending rate
    loss is signalled by Timeout or Duplicate ACKs

52 Principle of TCP congestion control: additive increase multiplicative decrease (AIMD)
approach: “probing for bandwidth”: increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  additive increase: increase cwnd by 1 MSS every RTT until loss detected
  multiplicative decrease: cut cwnd in half after loss
[Figure: AIMD saw tooth behavior: probing for bandwidth. cwnd (TCP sender congestion window size) versus time: additively increase window size until loss occurs, then cut window in half]
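
The saw tooth can be reproduced with a few lines. This is only a sketch: cwnd is counted in whole MSS units, one step is one RTT, the rounds in which a loss is detected are supplied by hand, and the floor of 1 MSS after halving is an assumption. The function name aimd is made up.

```python
def aimd(cwnd, rounds, loss_rounds):
    """Return cwnd (in MSS) at the start of each RTT round."""
    history = []
    for rtt_round in range(rounds):
        history.append(cwnd)
        if rtt_round in loss_rounds:
            cwnd = max(1, cwnd // 2)   # multiplicative decrease
        else:
            cwnd += 1                  # additive increase: 1 MSS per RTT
    return history


# Loss detected during round 3 only.
trace = aimd(cwnd=8, rounds=8, loss_rounds={3})
```

The window climbs by one MSS per round, drops from 11 to 5 after the loss, and then climbs again.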

53 TCP Congestion Control: details
sender limits transmission: LastByteSent - LastByteAcked ≤ min(cwnd, rwnd)
cwnd is dynamic, function of perceived network congestion
TCP sending rate (throughput):
  roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  rate ≈ cwnd / RTT bytes/sec
[Figure: sender sequence number space: last byte ACKed; sent, not-yet ACKed (“in-flight”) spanning cwnd; last byte sent]
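
A worked example of the two relations on this slide, with made-up numbers. The function names are hypothetical; usable_window returns how many more bytes the sender may put in flight.

```python
def usable_window(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """Bytes that may still be sent under LastByteSent - LastByteAcked <= min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate in bytes/sec: cwnd bytes per RTT."""
    return cwnd_bytes / rtt_seconds


limited_by_rwnd = usable_window(12000, 10000, cwnd=8000, rwnd=5000)
limited_by_cwnd = usable_window(12000, 10000, cwnd=1500, rwnd=5000)
rate = approx_rate(cwnd_bytes=14600, rtt_seconds=0.5)
```

In the first call the receive window is the tighter limit; in the second the congestion window is already exceeded, so nothing more may be sent until ACKs arrive.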

54 TCP Slow Start
when connection begins, increase cwnd exponentially until first loss event:
Slow Start Phase:
  initially cwnd = 1 MSS
  double cwnd every RTT
  done by incrementing cwnd for every ACK received
  That is, initial rate is slow but ramps up exponentially fast
Congestion Avoidance Phase: increase linearly
Slow Start algorithm:
  initialize: cwnd = 1
  for (each segment ACKed)
    cwnd++
  until (loss event || cwnd > ssthresh)
[Figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs]
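
The Slow Start algorithm above can be traced RTT by RTT. This sketch assumes no loss, one ACK per segment, cwnd counted in MSS, and the exit test exactly as written on the slide (stop as soon as the window exceeds ssthresh). The function name slow_start is made up.

```python
def slow_start(ssthresh):
    """Return cwnd (in MSS) at the start of each RTT until cwnd > ssthresh."""
    cwnd = 1
    per_rtt = [cwnd]
    while cwnd <= ssthresh:
        # Every segment sent in this RTT is ACKed; each ACK adds 1 MSS.
        for _ in range(cwnd):
            cwnd += 1
            if cwnd > ssthresh:
                break                  # leave slow start immediately
        per_rtt.append(cwnd)
    return per_rtt


trace = slow_start(ssthresh=8)
```

The window doubles each RTT (1, 2, 4, 8) and the loop stops at 9, the first value above ssthresh, because the test is applied after every ACK rather than once per RTT.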

55 TCP Slow Start
SSTHRESH
Q: When should the exponential increase switch to linear?
A: slow start ends when cwnd > ssthresh, and the sender moves to the “Congestion Avoidance phase”

56 TCP Slow Start
Slow Start -> Congestion Avoidance
Slow Start: cwnd set to 1 MSS; window then grows exponentially (multiplicative increase) to ssthresh, then grows linearly
Congestion Avoidance: this is additive increase
  why not multiplicative increase? growing too fast in equilibrium => oscillations
[Figure: cwnd over time through Slow Start and Congestion Avoidance, with SSTHRESH levels and a Packet loss event]

57 TCP: detecting, reacting to loss
loss indicated by timeout:
  ssthresh = cwnd/2 (half of cwnd at the time of the loss)
  cwnd set to 1 MSS
  window then grows exponentially (as in slow start) to threshold (SSTHRESH), then grows linearly
TCP Tahoe always sets cwnd to 1 MSS
  It gives no special treatment to 3 duplicate ACKs

58 TCP: detecting, reacting to loss
TCP Reno considers the loss indicated by 3 duplicate ACKs:
  dup ACKs indicate network capable of delivering some segments
  cwnd is cut in half, window then grows linearly
  Fast Retransmit: retransmits the unacked segment without waiting for timeout
Philosophy:
  “3 dup ACKs” indicates network capable of delivering some segments
  On the other hand, “timeout” indicates a more serious alarm for a congestion scenario

59 TCP: detecting, reacting to loss
Fast Retransmit & Fast Recovery
  a duplicate ACK implies the receiver got a packet out of order
    an earlier packet might have been lost (or delayed)
  When the TCP sender sees three duplicate ACKs, it retransmits the unacked segment without waiting for timeout
  Then, the TCP sender starts Congestion Avoidance
[Figure: 2 3 3 3 3 7]

60 Summary: TCP Congestion Control
when cwnd < ssthresh, sender is in slow-start phase, window grows exponentially.
when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly.
when triple duplicate ACK occurs, ssthresh and cwnd are both set to half of the current cwnd; do fast retransmit and start congestion avoidance phase
when timeout occurs, ssthresh is set to half of the current cwnd, cwnd is set to 1 MSS, and slow start phase begins
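
The four rules of the summary can be sketched as a tiny state holder. This is a simplification under stated assumptions: cwnd is counted in whole MSS, growth is applied once per loss-free RTT, halving uses integer division with a floor of 1 MSS, and the window inflation of fast recovery is not modelled. The class and method names are invented.

```python
class RenoSketch:
    """cwnd / ssthresh bookkeeping following the summary rules (illustrative)."""

    def __init__(self, ssthresh):
        self.cwnd = 1
        self.ssthresh = ssthresh

    def phase(self):
        return "slow start" if self.cwnd < self.ssthresh else "congestion avoidance"

    def on_rtt_without_loss(self):
        if self.cwnd < self.ssthresh:
            self.cwnd *= 2             # slow start: exponential growth
        else:
            self.cwnd += 1             # congestion avoidance: linear growth

    def on_triple_duplicate_ack(self):
        self.ssthresh = max(self.cwnd // 2, 1)
        self.cwnd = self.ssthresh      # continue in congestion avoidance

    def on_timeout(self):
        self.ssthresh = max(self.cwnd // 2, 1)
        self.cwnd = 1                  # back to slow start


tcp = RenoSketch(ssthresh=8)
trace = []
for event in ["rtt", "rtt", "rtt", "rtt", "rtt", "3dup", "rtt", "timeout", "rtt", "rtt", "rtt"]:
    if event == "rtt":
        tcp.on_rtt_without_loss()
    elif event == "3dup":
        tcp.on_triple_duplicate_ack()
    else:
        tcp.on_timeout()
    trace.append((tcp.cwnd, tcp.ssthresh))
```

Note that ssthresh is computed from the window before it is reduced, which is why the handlers update ssthresh first.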

61 Summary: TCP Congestion Control

62 With or Without Fast Recovery
Congestion Avoidance -> (No Fast Recovery) -> Congestion Avoidance
Slow Start -> Fast Recovery -> Congestion Avoidance
[Figure: cwnd versus Time through Slow Start and Congestion Avoidance, marking (initial) ssthresh, fast-retransmit, timeout, ssthresh = cwnd/2, “inflating” cwnd with dupACKs, new ACK, and “deflating” cwnd with a new ACK]
roughly a 20% improvement in the throughput

