Published by Della Stokes. Modified over 9 years ago.
1
Transport Layer 3-1 Chapter 3 outline 3.1 transport-layer services 3.2 multiplexing and demultiplexing 3.3 connectionless transport: UDP 3.4 principles of reliable data transfer 3.5 connection-oriented transport: TCP segment structure reliable data transfer flow control connection management 3.6 principles of congestion control 3.7 TCP congestion control
2
TCP: Overview (RFCs 793, 1122, 1323, 2018, 2581). Point-to-point: one sender, one receiver. Reliable, in-order byte stream: no “message boundaries”. Pipelined: TCP congestion and flow control set the “window size”. Full-duplex data: bi-directional data flow in the same connection; MSS: maximum segment size. Connection-oriented: handshaking (exchange of control messages) initializes sender and receiver state before data exchange. Flow controlled: the sender will not overwhelm the receiver.
3
TCP segment structure (each header row is 32 bits wide):
  source port #, dest port #
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  head len, not used, flag bits U A P R S F, receive window (# bytes the receiver is willing to accept)
  checksum (Internet checksum, as in UDP), urgent data pointer
  options (variable length)
  application data (variable length)
Flag bits: URG: urgent data (generally not used). ACK: ACK # valid. PSH: push data now (generally not used). RST, SYN, FIN: connection establishment (setup, teardown commands).
4
TCP Sequence Number: indicates the position of a segment’s data in the byte stream. Every byte is sequenced. Used for re-ordering segments and detecting lost segments. The Initial Sequence Number (ISN) is chosen randomly for every TCP connection. [Note] SYN and FIN segments each consume one sequence number even though they carry no data. ACK segments without payload do not consume a sequence number.
5
TCP seq. numbers, ACKs. Sequence numbers: byte-stream “number” of the first byte in the segment’s data. Acknowledgements: seq # of the next byte expected from the other side; cumulative ACK. Q: How does the receiver handle out-of-order segments? A: The TCP spec doesn’t say; it is up to the implementer.
[Figure: header fields (source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer) of the outgoing segment from the sender and of the incoming segment to the sender, mapped onto the sender sequence number space with window size N: sent and ACKed; sent, not yet ACKed (“in-flight”); usable but not yet sent; not usable.]
6
TCP seq. numbers, ACKs: simple telnet scenario.
  Host A -> Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)
  Host B -> Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
  Host A -> Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
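The telnet exchange can be traced with a few lines of Python. This is only a sketch: `Segment` and `reply_to` are hypothetical names introduced here, and the rule they encode is the one stated on the slides (the ACK field is the number of the next byte expected from the peer).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int           # byte-stream number of the first data byte
    ack: int           # next byte expected from the peer
    data: bytes = b""


def reply_to(received: Segment, my_next_seq: int, data: bytes = b"") -> Segment:
    """Build a reply whose ACK field is the next byte expected from the peer."""
    return Segment(seq=my_next_seq, ack=received.seq + len(received.data), data=data)


# Host A types 'C'; Host B echoes it; Host A acknowledges the echo.
a_to_b = Segment(seq=42, ack=79, data=b"C")
b_to_a = reply_to(a_to_b, my_next_seq=a_to_b.ack, data=b"C")
a_ack = reply_to(b_to_a, my_next_seq=a_to_b.seq + len(a_to_b.data))
```

The last segment carries no data, so it does not advance Host A's sequence number beyond 43.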
7
TCP round trip time, timeout. Q: How to set the TCP timeout value? It must be longer than the RTT, but the RTT varies. Too short: premature timeout, unnecessary retransmissions. Too long: slow reaction to segment loss.
8
TCP round trip time, timeout. Q: How to estimate RTT? SampleRTT: measured time from segment transmission until ACK receipt; ignore RTT samples for retransmitted segments (see the next slide). SampleRTT will vary, so the estimated RTT needs to be “smoothed”: average several recent measurements, not just the current SampleRTT.
9
TCP round trip time, timeout: retransmission ambiguity. [Figure: two timelines between A and B, each with an original transmission, a timeout (in one case after a loss, X), a retransmission and an ACK; SampleRTT could be measured from either the original transmission or the retransmission.] Which one is correct for SampleRTT? So, what should we do? Answer: just ignore RTT samples for retransmitted segments.
10
TCP round trip time, timeout. EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT. Exponential weighted moving average (EWMA): the influence of a past sample decreases exponentially fast. Recommended value [RFC 2988]: α = 0.125. [Figure: SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds), gaia.cs.umass.edu to fantasia.eurecom.fr.]
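A minimal sketch of the EWMA update, assuming the recommended weight of 0.125. The function names are hypothetical; `sample_weight` only illustrates why the influence of an old sample decays exponentially (a sample that is `age` updates old contributes with weight alpha * (1 - alpha)^age).

```python
ALPHA = 0.125  # recommended weight from the slide


def update_estimated_rtt(estimated_rtt: float, sample_rtt: float,
                         alpha: float = ALPHA) -> float:
    """One EWMA step: blend the old estimate with the newest sample."""
    return (1 - alpha) * estimated_rtt + alpha * sample_rtt


def sample_weight(age: int, alpha: float = ALPHA) -> float:
    """Weight that a sample `age` updates in the past still has in the estimate."""
    return alpha * (1 - alpha) ** age


# An estimate of 100 ms followed by a 180 ms sample moves only one eighth of the way.
new_estimate = update_estimated_rtt(100.0, 180.0)
```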
11
TCP round trip time, timeout. Use a moving average! EstimatedRTT = (1 - a) * EstimatedRTT + a * SampleRTT. [Figure: SampleRTT plotted against EstimatedRTT curves for a = 0.9, a = 0.6, a = 0.3 and a = 0.1.]
12
TCP round trip time, timeout. Timeout interval: EstimatedRTT plus a “safety margin”; large variation in EstimatedRTT -> larger safety margin. Estimate the deviation of SampleRTT from EstimatedRTT: DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT| (typically, β = 0.25). TimeoutInterval = EstimatedRTT + 4 * DevRTT (estimated RTT plus “safety margin”).
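The two update rules and the timeout formula can be combined into one small estimator. This is a sketch under stated assumptions: the class name is hypothetical, the weights are 0.125 and 0.25, the first sample initializes EstimatedRTT directly with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT (both choices follow RFC 6298, not the slide).

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval from them."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2   # assumption: initialization as in RFC 6298

    def update(self, sample_rtt: float) -> float:
        # Assumption: DevRTT uses the estimate from before this sample.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval

    @property
    def timeout_interval(self) -> float:
        # Estimated RTT plus a safety margin of four deviations.
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)
timeout = estimator.update(120.0)
```

Only samples from segments that were not retransmitted should be passed to `update`.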
13
TCP round trip time, timeout. [Figures: measurement of Internet delays for 100 successive packets at 1-second intervals; TCP timeout interval for the sampled Internet delays.]
14
Karn’s Algorithm (for reference; not in the textbook). Rule 1: Ignore the measured RTT for retransmitted packets. When a retransmission occurs, the RTT estimate is not updated; sampling resumes only after a segment is delivered without retransmission. This removes the ambiguity from RTT measurements. Rule 2: Double the timeout interval after each retransmission. This is called “exponential back-off”.
15
Karn’s Algorithm (for reference; not in the textbook): why is Rule 2 necessary? Suppose the timeout interval S is smaller than the real RTT. If only Rule 1 is applied, every segment is retransmitted before its ACK arrives, so no RTT sample is ever accepted and TCP keeps using S as the timeout interval for a long time (or forever). Many packets are retransmitted, and congestion becomes more severe. [Figure: Data 1 and Data 2 are each retransmitted after S, before the ACK arrives at the real RTT.]
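Both rules fit in a small helper. This is a sketch only: `KarnTimer` and its method names are hypothetical, and recomputing the timeout from accepted samples is left to the caller.

```python
class KarnTimer:
    """Applies Karn's two rules to a retransmission timeout."""

    def __init__(self, initial_timeout: float):
        self.timeout = initial_timeout
        self.accepted_samples = []   # RTT samples that are safe to use

    def on_timeout(self) -> None:
        # Rule 2: exponential back-off after every retransmission.
        self.timeout *= 2

    def on_ack(self, sample_rtt: float, retransmitted: bool) -> bool:
        # Rule 1: a sample for a retransmitted segment is ambiguous, so drop it.
        if retransmitted:
            return False
        self.accepted_samples.append(sample_rtt)
        return True


timer = KarnTimer(initial_timeout=1.0)
timer.on_timeout()
timer.on_timeout()
ignored = timer.on_ack(0.5, retransmitted=True)
accepted = timer.on_ack(0.3, retransmitted=False)
```

With back-off the timeout eventually exceeds the real RTT, a segment gets through without retransmission, and a valid sample becomes available again.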
17
TCP reliable data transfer. TCP creates rdt service on top of IP’s unreliable service: pipelined segments, cumulative ACKs, single retransmission timer. Retransmissions are triggered by timeout events and duplicate ACKs. Let’s initially consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control and congestion control.
18
TCP sender events. Data received from app: create a segment with seq # (the byte-stream number of the first data byte in the segment); start the timer if it is not already running (think of the timer as belonging to the oldest unACKed segment; expiration interval: TimeOutInterval). Timeout: retransmit the segment that caused the timeout; restart the timer. ACK received: if the ACK acknowledges previously unACKed segments, update what is known to be ACKed and start the timer if there are still unACKed segments.
19
TCP sender (simplified)
Initially: NextSeqNum = InitialSeqNum; SendBase = InitialSeqNum; then wait for an event.
Event: data received from application above
  create segment, seq. #: NextSeqNum
  pass segment to IP (i.e., “send”)
  NextSeqNum = NextSeqNum + length(data)
  if (timer currently not running) start timer
Event: timeout
  retransmit not-yet-acked segment with smallest seq. #
  start timer
Event: ACK received, with ACK field value y
  if (y > SendBase) {
    SendBase = y   /* SendBase-1: last cumulatively ACKed byte */
    if (there are currently not-yet-acked segments) start timer
    else stop timer
  }
Example: SendBase-1 = 71 and y = 73, so the receiver wants byte 73 onward; y > SendBase, so all bytes up to 72 are ACKed.
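The three events of the simplified sender translate almost line by line into Python. This is a sketch, not a real TCP implementation: the class and attribute names are hypothetical, the timer is reduced to a boolean, "sending" just appends to a log, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Simplified sender: no duplicate-ACK handling, flow or congestion control."""

    def __init__(self, initial_seq_num: int):
        self.send_base = initial_seq_num      # oldest unACKed byte
        self.next_seq_num = initial_seq_num   # next byte to be sent
        self.unacked = {}                     # seq -> data, not yet ACKed
        self.timer_running = False
        self.sent = []                        # log of (seq, length) passed to IP

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, len(data)))
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)           # segment with smallest seq #
            self.sent.append((seq, len(self.unacked[seq])))
            self.timer_running = True         # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Drop every segment that ends at or before the ACKed byte.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Premature-timeout scenario: two segments, a timeout, then both ACKs arrive.
sender = SimplifiedTcpSender(92)
sender.on_app_data(b"x" * 8)
sender.on_app_data(b"y" * 20)
sender.on_timeout()
sender.on_ack(100)
sender.on_ack(120)
```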
20
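TCP: retransmission scenarios.
Lost ACK scenario: Host A sends Seq=92, 8 bytes of data. Host B replies ACK=100, which is lost (X). After the timeout, Host A resends Seq=92, 8 bytes of data, and Host B replies ACK=100 again.
Premature timeout scenario (SendBase=92 at the start): Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data. Host B replies ACK=100 and ACK=120, but the timeout fires before ACK=100 reaches Host A, so Host A resends Seq=92, 8 bytes of data. The ACKs then arrive (SendBase=100, then SendBase=120), and Host B answers the retransmission with ACK=120.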
21
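TCP: retransmission scenarios. Cumulative ACK scenario: Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data. Host B replies ACK=100, which is lost (X), and ACK=120, which arrives before the timeout. ACK=120 cumulatively acknowledges both segments, so Host A retransmits nothing and continues with Seq=120, 15 bytes of data.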
22
TCP ACK generation [RFC 1122, RFC 2581] (event at receiver -> TCP receiver action)
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> delayed ACK: wait up to 500 ms for the next segment; if no next segment arrives within 500 ms, send the ACK.
2. Arrival of in-order segment with expected seq #; one other segment has an ACK pending -> immediately send a single cumulative ACK, ACKing both in-order segments.
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> immediately send a duplicate ACK, indicating the seq # of the next expected byte.
4. Arrival of segment that partially or completely fills the gap -> immediately send a cumulative ACK.
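The four receiver rules can be sketched as a small class. Assumptions are labeled in the comments: the class name and action strings are hypothetical, out-of-order segments are buffered (the spec leaves this to the implementer), the 500 ms timer itself is not modeled, and segments below the expected number are simply re-ACKed.

```python
class AckingReceiver:
    """Receiver-side ACK policy for the four events listed above."""

    def __init__(self, expected_seq: int):
        self.expected = expected_seq   # next in-order byte expected
        self.buffered = {}             # out-of-order segments: seq -> length
        self.ack_pending = False       # True while a delayed ACK is waiting

    def on_segment(self, seq: int, length: int):
        """Return (action, ack_number) for an arriving segment."""
        if seq > self.expected:
            # Event 3: gap detected; buffering is an implementation choice.
            self.buffered[seq] = length
            self.ack_pending = False
            return ("immediate duplicate ACK", self.expected)
        if seq < self.expected:
            # Not covered by the table: old data, re-ACKed in this sketch.
            return ("immediate duplicate ACK", self.expected)
        had_gap = bool(self.buffered)
        self.expected += length
        # Deliver any buffered segments that are now in order.
        while self.expected in self.buffered:
            self.expected += self.buffered.pop(self.expected)
        if had_gap or self.ack_pending:
            # Event 4 (gap being filled) or event 2 (an ACK was pending).
            self.ack_pending = False
            return ("immediate cumulative ACK", self.expected)
        # Event 1: hold the ACK back and wait for a second segment.
        self.ack_pending = True
        return ("delayed ACK (up to 500 ms)", self.expected)


receiver = AckingReceiver(expected_seq=101)
trace = [receiver.on_segment(seq, 100) for seq in (101, 301, 401, 201, 501, 601)]
```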
23
TCP fast retransmit. The time-out period is often relatively long, so there is a long delay before a lost packet is resent. Detect lost segments via duplicate ACKs: the sender often sends many segments back-to-back, so if a segment is lost there will likely be many duplicate ACKs. If the sender receives 3 duplicate ACKs for the same data (“triple duplicate ACKs”), it resends the unACKed segment with the smallest seq #: that segment was likely lost, so don’t wait for the timeout.
24
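TCP fast retransmit: fast retransmit after sender receipt of triple duplicate ACK. Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data, which is lost (X), followed by further segments. Host B replies ACK=100 to the first segment and repeats ACK=100 for each later segment. On the triple duplicate ACK, Host A resends Seq=100, 20 bytes of data before the timeout expires.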
25
TCP - Cumulative Acknowledgement. Consider the following scenario (1/3). Sender -> receiver: Seq. #=101, 201, 301, 401, 501 and 601, each with 100 bytes of data. Receiver -> sender: Ack. #=201, 401, 501, 601, 701 (Ack. #=401 cumulatively covers the segments with Seq. #=201 and 301).
26
TCP - Cumulative Acknowledgement. Consider the following scenario (2/3). Sender -> receiver: Seq. #=101, 201, 301, 401, 501 and 601, each with 100 bytes of data. Receiver -> sender: Ack. #=201. Timeout: the sender retransmits Seq. #=201, 100 bytes of data.
27
TCP - Cumulative Acknowledgement: duplicate ACKs and fast retransmit. Consider the following scenario (3/3). Sender -> receiver: Seq. #=101, 201, 301, 401, 501 and 601, each with 100 bytes of data. Receiver -> sender: Ack. #=201. The sender retransmits Seq. #=201, 100 bytes of data, and the receiver replies with the cumulative Ack. #=701.
28
Fast retransmit algorithm
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments) start timer
  } else {   /* y == SendBase: a duplicate ACK for an already ACKed segment */
    increment count of “duplicate ACKs” received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y   /* fast retransmit */
    }
  }
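The duplicate-ACK counter is easy to get subtly wrong, so here is a minimal runnable sketch of the ACK handler. The class name and the `retransmitted` log are hypothetical; timer handling is left out to keep the focus on the counter, and resetting the counter on a new ACK is an assumption the pseudocode leaves implicit.

```python
class FastRetransmitSender:
    """Counts duplicate ACKs and triggers a fast retransmit on the third one."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent by fast retransmit

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            # New data ACKed: advance the window and reset the counter.
            self.send_base = y
            self.dup_acks = 0
        elif y == self.send_base:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks += 1
            if self.dup_acks == 3:
                self.retransmitted.append(y)   # resend segment with seq # y


sender = FastRetransmitSender(send_base=92)
sender.on_ack(100)            # first ACK=100 advances SendBase
for _ in range(3):
    sender.on_ack(100)        # three duplicates trigger the resend of Seq=100
```

A fourth duplicate does not trigger a second resend, because the comparison is for exactly three.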
29
More TCP Scenario (1/3): segment corruption. Sender -> receiver: Seq: 1001, 200 bytes (timer start); Seq: 1201, 200 bytes; Seq: 1401, 200 bytes (segment 3, corrupted). Receiver -> sender: ACK: 1401, a cumulative (delayed) ACK for the first two segments; the sender restarts (updates) the timer for the unACKed packet (Seq: 1401). Timeout: the sender resends Seq: 1401, 200 bytes. Receiver -> sender: ACK: 1601 (everything is OK).
30
More TCP Scenario (2/3): lost segment. Sender -> receiver: Seq: 1001, 200 bytes (timer start); Seq: 1201, 200 bytes; Seq: 1401, 200 bytes (segment 3, lost). Receiver -> sender: ACK: 1401; the sender restarts (updates) the timer for the unACKed packet (Seq: 1401). Timeout: the sender resends Seq: 1401, 200 bytes. Receiver -> sender: ACK: 1601 (everything is OK).
31
More TCP Scenario (3/3): cumulative ACK. Sender -> receiver: Seq: 1001, 200 bytes; Seq: 1201, 200 bytes; Seq: 1401, 200 bytes. Receiver -> sender: ACK: 1401 (acknowledgement lost), then ACK: 1601, which cumulatively acknowledges all three segments (everything is OK).
32
Is TCP GBN or SR? If we had to pick the one that is more similar to TCP, TCP is closer to GBN (or TCP is a mix of GBN and SR). However, TCP differs from GBN in several ways.
33
Is TCP GBN or SR? 1. GBN: the ACK number is the seq # of the pkt being ACKed. TCP: the ACK number is the next expected number. 2. GBN: no buffering at the receiver. TCP: buffering at the receiver. At timeout(n), the GBN sender retransmits pkt n and all higher seq # pkts in the window, but TCP retransmits only pkt n. [Figure: the TCP receiver gets pkt3, pkt4 and pkt5 out of order; each is buffered rather than discarded and answered with ACK2; when pkt2 arrives, the receiver sends ACK6.]
35
TCP flow control. The receive side of a TCP connection has a receive buffer, which holds TCP data that arrived in IP datagrams from the sender, plus (currently) unused buffer space. The application process may remove data from the TCP socket buffers more slowly than the TCP receiver receives it (that is, than the sender is sending). Flow control: the receiver controls the sender, so the sender won’t overflow the receiver’s buffer by transmitting too much, too fast. [Figure: receiver protocol stack, with the application process above the OS, which contains the TCP socket receiver buffers, TCP code and IP code.]
36
TCP flow control. The receiver “advertises” free buffer space by including the rwnd value in the TCP header of receiver-to-sender segments. RcvBuffer size is set via socket options; many operating systems auto-adjust RcvBuffer. The sender limits the amount of unACKed (“in-flight”) data to the receiver’s rwnd value, which guarantees that the receive buffer will not overflow. Receiver-side buffering is a speed-matching service: it matches the send rate to the receiving application’s drain rate. [Figure: RcvBuffer holds buffered data and free buffer space (rwnd); TCP segment payloads enter the buffer and are passed on to the application process.]
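The bookkeeping behind rwnd can be sketched in two functions. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are assumptions borrowed from the usual textbook formulation; they do not appear on the slide.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free buffer space the receiver advertises as rwnd."""
    buffered = last_byte_rcvd - last_byte_read   # received but not yet read by the app
    return rcv_buffer - buffered


def can_send(n_bytes: int, last_byte_sent: int, last_byte_acked: int, rwnd: int) -> bool:
    """Sender-side check: in-flight data must stay within the receiver's rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + n_bytes <= rwnd


# A 4096-byte buffer with 2000 bytes still unread leaves 2096 bytes to advertise.
rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
```

As the application drains the buffer, `last_byte_read` advances and the advertised window grows again.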
38
Connection Management. Before exchanging data, sender and receiver “handshake”: they agree to establish a connection (each knowing the other is willing to establish the connection) and agree on connection parameters. Each side then holds connection state ESTAB and connection variables: seq # client-to-server and server-to-client, and rcvBuffer size at server and client. Client: Socket clientSocket = new Socket("hostname","port number"); Server: Socket connectionSocket = welcomeSocket.accept();
39
TCP 3-way handshake (client state and server state both start as LISTEN in the figure)
1. Client chooses init seq num x and sends TCP SYN msg: SYNbit=1, Seq=x. Client state: SYNSENT.
2. Server chooses init seq num y and sends TCP SYNACK msg, acking the SYN: SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1. Server state: SYN RCVD.
3. Client: received SYNACK(x+1) indicates server is live; send ACK for SYNACK: ACKbit=1, ACKnum=y+1; this segment may contain client-to-server data. Client state: ESTAB.
4. Server: received ACK(y+1) indicates client is live. Server state: ESTAB.
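The sequence and acknowledgement numbers of the three segments follow mechanically from the two initial sequence numbers. A minimal sketch, with segments modeled as plain dictionaries and a hypothetical function name:

```python
def three_way_handshake(x: int, y: int) -> list:
    """Return the SYN, SYNACK and ACK segments for client ISN x and server ISN y."""
    syn = {"SYN": 1, "seq": x}
    synack = {"SYN": 1, "ACK": 1, "seq": y, "ack": syn["seq"] + 1}
    # The SYN consumed one sequence number, so the client continues at x + 1.
    ack = {"ACK": 1, "seq": x + 1, "ack": synack["seq"] + 1}
    return [syn, synack, ack]


syn, synack, ack = three_way_handshake(x=1000, y=5000)
```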
40
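TCP 3-way handshake: FSM.
Client: closed -> SYN sent, on Socket clientSocket = new Socket("hostname","port number"); action: send SYN(seq=x). SYN sent -> ESTAB, on receiving SYNACK(seq=y,ACKnum=x+1); action: send ACK(ACKnum=y+1).
Server: closed -> listen, on Socket connectionSocket = welcomeSocket.accept(). listen -> SYN rcvd, on receiving SYN(x); action: send SYNACK(seq=y,ACKnum=x+1) and create a new socket for communication back to the client. SYN rcvd -> ESTAB, on receiving ACK(ACKnum=y+1).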
41
TCP: closing a connection. Client and server each close their side of the connection by sending a TCP segment with FIN bit = 1. Each responds to a received FIN with an ACK; on receiving a FIN, the ACK can be combined with the side’s own FIN. Simultaneous FIN exchanges can be handled. Instead of a FIN, the TCP layer can send a RST segment, which terminates the connection if something is wrong.
42
TCP: closing a connection. Modified 3-way handshake (or 4-way termination) between App1 and App2:
1. App1 -> App2: FIN, SN=X. App1: “I have no more data for you. Send FIN segment.”
2. App2 -> App1: ACK=X+1. App2: “OK, I understand you are done sending. Send ACK segment.” (The server can still send data to the client.)
3. App2 -> App1: FIN, SN=Y. App2: “OK, now I’m also done sending data. Send FIN segment.”
4. App1 -> App2: ACK=Y+1. App1: “I understand. Goodbye. Send ACK segment.”
43
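TCP: closing a connection (client state and server state both start as ESTAB).
1. Client calls clientSocket.close() and sends FINbit=1, seq=x. Client state: FIN_WAIT_1 (can no longer send but can receive data).
2. Server replies ACKbit=1; ACKnum=x+1. Server state: CLOSE_WAIT (can still send data). Client state: FIN_WAIT_2 (wait for server close).
3. Server sends FINbit=1, seq=y. Server state: LAST_ACK (can no longer send data).
4. Client replies ACKbit=1; ACKnum=y+1. Client state: TIME_WAIT (timed wait for 2*max segment lifetime), then CLOSED. Server state on receiving the ACK: CLOSED.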
44
TCP: closing a connection. Why TIME_WAIT? It gives the client TCP enough time to make sure that the ACK it sent to the server was received. If the client’s ACK is lost, the server retransmits its FIN, and the client must still be around to receive it: the client starts TIME_WAIT when it sends the ACK and restarts TIME_WAIT each time a retransmitted FIN arrives.
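The closing sequence is a pair of small state machines, which a transition table captures compactly. This is a sketch: the event labels are hypothetical strings, and only the orderly close shown on the slides is modeled (no simultaneous close, no RST).

```python
CLIENT_CLOSE = {
    ("ESTAB", "close()"): "FIN_WAIT_1",        # client sends FIN
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",   # client sends ACK, starts timer
    ("TIME_WAIT", "recv FIN"): "TIME_WAIT",    # retransmitted FIN: re-ACK, restart timer
    ("TIME_WAIT", "2*MSL elapsed"): "CLOSED",
}

SERVER_CLOSE = {
    ("ESTAB", "recv FIN"): "CLOSE_WAIT",       # server sends ACK, may still send data
    ("CLOSE_WAIT", "close()"): "LAST_ACK",     # server sends its own FIN
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def run(transitions: dict, events: list, state: str = "ESTAB") -> str:
    """Feed events through a transition table and return the final state."""
    for event in events:
        state = transitions[(state, event)]    # KeyError marks an unmodeled event
    return state


client_final = run(CLIENT_CLOSE, ["close()", "recv ACK", "recv FIN",
                                  "recv FIN", "2*MSL elapsed"])
server_waiting = run(SERVER_CLOSE, ["recv FIN", "close()"])
```

The second `recv FIN` in the client trace is the case TIME_WAIT exists for: the server never saw the ACK and sent its FIN again.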
45
MSS (Maximum Segment Size): link MTU vs. path MTU vs. MSS. The Maximum Transmission Unit (MTU) is the maximum payload size of a Layer 2 frame. Link MTU: the maximum packet size that can be transmitted over a link. Path MTU: the minimum link MTU over all links on the path between a source and a destination. The Layer 3 payload size in turn determines the Layer 4 Maximum Segment Size (MSS).
46
MSS (Maximum Segment Size): what is MSS? The largest payload size that TCP can send in one segment on this connection. Usually, MSS is calculated as “Maximum Transmission Unit (MTU) - 40 bytes”: 20 bytes for the IPv4 header and 20 bytes for the TCP header, both without options. [Figure: a frame with its MAC header, the path MTU, and the MSS inside it.]
47
MSS (Maximum Segment Size): an example of MSS negotiation. In this example, both sides use 960 bytes as the MSS. On the modern Internet, the path MTU is usually 1500 bytes, so the MSS can be 1460 bytes. Self-check: http://www.speedguide.net:8080
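The arithmetic behind these numbers is short enough to write down. A sketch under stated assumptions: IPv4 and TCP headers without options (20 bytes each), hypothetical function names, and an invented list of link MTUs chosen so that the result matches the 960-byte example.

```python
IPV4_HEADER_BYTES = 20   # assumption: no IP options
TCP_HEADER_BYTES = 20    # assumption: no TCP options


def path_mtu(link_mtus: list) -> int:
    """Path MTU is the smallest link MTU along the path."""
    return min(link_mtus)


def mss_from_mtu(mtu: int) -> int:
    """MSS is what is left of the MTU after the IP and TCP headers."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


ethernet_mss = mss_from_mtu(1500)
# Hypothetical path with one 1000-byte link between two 1500-byte links.
small_path_mss = mss_from_mtu(path_mtu([1500, 1000, 1500]))
```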
48
Chapter 3 outline 3.1 transport-layer services 3.2 multiplexing and demultiplexing 3.3 connectionless transport: UDP 3.4 principles of reliable data transfer 3.5 connection-oriented transport: TCP segment structure reliable data transfer flow control connection management 3.6 & 3.7 principles of congestion control / TCP congestion control (Korean edition of the textbook: p. 301 and pp. 310-317)
49
Principles of congestion control. Congestion, informally: “too many sources sending too much data too fast for the network to handle”. Different from flow control! Manifestations: lost packets (buffer overflow at routers) and long delays (queueing in router buffers). A top-10 problem!
50
Principles of congestion control: flow control vs. congestion control. Flow control (Src -> Dest): limits the amount of data that the destination must buffer. Congestion control (Src -> Dest): attempts to reduce buffer overflow inside the network.
51
Principles of congestion control: flow control vs. congestion control. [Figure.]
52
Principle of TCP congestion control. Goal: the TCP sender should transmit as fast as possible, but without congesting the network. Q: How to find a rate just below the congestion level? Decentralized: each TCP sender sets its own rate, based on implicit feedback. ACK: segment received (a good thing!), network not congested, so increase the sending rate. Lost segment (detected by timeout or duplicate ACKs): assume the loss is due to a congested network, so decrease the sending rate.
53
Principle of TCP congestion control: additive increase, multiplicative decrease (AIMD). Approach: “probing for bandwidth”: increase the transmission rate on receipt of ACKs until eventually loss occurs, then decrease the transmission rate. Additive increase: increase cwnd by 1 MSS every RTT until loss is detected. Multiplicative decrease: cut cwnd in half after loss. [Figure: cwnd (TCP sender congestion window size) vs. time, AIMD sawtooth behavior: additively increase window size until loss occurs, then cut window in half.]
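The sawtooth can be reproduced with a few lines of simulation. This is a sketch only: cwnd is counted in MSS units, one loop iteration stands for one RTT, the rounds in which loss occurs are supplied by the caller, and the floor of 1 MSS is an assumption of the sketch.

```python
def aimd(cwnd_mss: float, loss_rounds: set, rounds: int) -> list:
    """Return cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd_mss)
        if rtt in loss_rounds:
            cwnd_mss = max(1.0, cwnd_mss / 2)   # multiplicative decrease
        else:
            cwnd_mss += 1                       # additive increase: 1 MSS per RTT
    return trace


sawtooth = aimd(cwnd_mss=10, loss_rounds={3}, rounds=6)
```

Here the window climbs from 10 to 13 MSS, loses a segment in the fourth RTT, and restarts its climb from 6.5 MSS.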
54
TCP Congestion Control: details. The sender limits transmission: LastByteSent - LastByteAcked <= min(cwnd, rwnd). cwnd is dynamic, a function of perceived network congestion. TCP sending rate, roughly: send cwnd bytes, wait one RTT for ACKs, then send more bytes, so rate ≈ cwnd / RTT bytes/sec. [Figure: sender sequence number space, with a window of cwnd bytes spanning from the last byte ACKed, over the sent but not yet ACKed (“in-flight”) bytes, to the last byte sent.]
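A short sketch of the window limit and the rate approximation. The function names are hypothetical; the constraint itself is the one stated on the slide, with the effective window taken as the smaller of cwnd and rwnd.

```python
def usable_window(last_byte_sent: int, last_byte_acked: int, cwnd: int, rwnd: int) -> int:
    """Bytes the sender may still put in flight right now."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)


def approx_rate(cwnd_bytes: int, rtt_seconds: float) -> float:
    """Rough sending rate in bytes/sec: one window per round trip."""
    return cwnd_bytes / rtt_seconds


room = usable_window(last_byte_sent=5000, last_byte_acked=3000, cwnd=4000, rwnd=8000)
rate = approx_rate(cwnd_bytes=14600, rtt_seconds=0.5)
```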
55
TCP Slow Start. When the connection begins, increase the rate exponentially until the first loss event. Slow start phase: initially cwnd = 1 MSS; double cwnd every RTT, done by incrementing cwnd for every ACK received. That is, the initial rate is slow but ramps up exponentially fast. Congestion avoidance phase: increase linearly. [Figure: Host A sends one segment, then two segments, then four segments to Host B in successive RTTs.]
Slow Start algorithm (cwin denotes cwnd):
  initialize: cwin = 1
  for (each segment ACKed)
    cwin++
  until (loss event || cwin > ssthresh)
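Incrementing the window once per ACK is what produces the doubling per RTT, which a small simulation makes visible. A sketch under stated assumptions: cwnd is counted in segments, every segment of a round is ACKed, no loss occurs, and the function name is hypothetical.

```python
def slow_start(ssthresh: int, rtts: int) -> list:
    """Return cwnd (in segments) after each RTT of loss-free slow start."""
    cwnd = 1
    trace = [cwnd]
    for _ in range(rtts):
        # One ACK arrives for each segment sent in this round.
        for _ack in range(cwnd):
            if cwnd > ssthresh:
                break
            cwnd += 1
        trace.append(cwnd)
        if cwnd > ssthresh:
            break          # leave slow start, as in the pseudocode above
    return trace


doubling = slow_start(ssthresh=100, rtts=4)
capped = slow_start(ssthresh=8, rtts=10)
```

In the second call the loop stops as soon as the window exceeds the threshold of 8 segments, which happens on the first ACK of the fifth round.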
56
TCP Slow Start. Q: When should the exponential increase switch to linear? A: When cwnd > ssthresh, slow start ends and the sender moves to the “congestion avoidance” phase. [Figure: window size over time with the SSTHRESH level marked.]
57
TCP Slow Start. Slow start: cwin is set to 1 MSS; the window then grows exponentially (multiplicative increase) up to ssthresh, then grows linearly. Congestion avoidance: this is additive increase. Why not multiplicative increase? Growing too fast in equilibrium => oscillations. [Figure: window size over time, with slow start up to SSTHRESH, congestion avoidance above it, and a packet loss followed by slow start and congestion avoidance again.]
58
TCP: detecting, reacting to loss. Loss indicated by timeout: ssthresh is set to cwnd/2 (half of cwnd just before the loss), then cwnd is set to 1 MSS; the window then grows exponentially (as in slow start) to the threshold, then grows linearly. TCP Tahoe always sets cwnd to 1 MSS on loss, whether the loss is indicated by a timeout or by 3 duplicate ACKs. [Figure: window size over time with the SSTHRESH level marked.]
59
TCP: detecting, reacting to loss. TCP Reno also considers loss indicated by 3 duplicate ACKs. Fast retransmit: the sender retransmits the unACKed segment without waiting for the timeout; cwnd is cut in half, and the window then grows linearly. Philosophy: “3 dup ACKs” indicates that the network is still capable of delivering some segments. On the other hand, a “timeout” is a more serious alarm for congestion.
60
TCP: detecting, reacting to loss: fast retransmit and fast recovery. A duplicate ACK implies that the receiver got a packet out of order, so an earlier packet might have been lost (or delayed). When the TCP sender sees three duplicate ACKs, it retransmits the unACKed segment without waiting for the timeout (fast retransmit). Then the TCP sender starts congestion avoidance (fast recovery). [Figure: a sequence of ACKs numbered 2, 3, 3, 3, 3, 7.]
61
Summary: TCP Congestion Control. When cwnd < ssthresh, the sender is in the slow-start phase and the window grows exponentially. When cwnd >= ssthresh, the sender is in the congestion-avoidance phase and the window grows linearly. When a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to the same value (half of its previous size); the sender does a fast retransmit and starts the congestion-avoidance phase. When a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS; the sender starts the slow-start phase.
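The four rules of this summary can be sketched as a small class. Assumptions are labeled: the class name is hypothetical, cwnd and ssthresh are kept in the same unit as `mss`, the per-ACK growth of MSS*MSS/cwnd in congestion avoidance is the usual way to get about one MSS per RTT (it is not spelled out on the slide), and the window inflation of a full fast-recovery phase is not modeled.

```python
class RenoWindow:
    """cwnd/ssthresh bookkeeping for the four rules in the summary."""

    def __init__(self, mss: float = 1.0, ssthresh: float = 64.0):
        self.mss = mss
        self.cwnd = mss
        self.ssthresh = ssthresh

    @property
    def phase(self) -> str:
        return "slow start" if self.cwnd < self.ssthresh else "congestion avoidance"

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss                          # exponential growth per RTT
        else:
            self.cwnd += self.mss * self.mss / self.cwnd   # about 1 MSS per RTT

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh                          # halve, stay in avoidance

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss                               # back to slow start


window = RenoWindow(mss=1.0, ssthresh=8.0)
for _ in range(7):
    window.on_new_ack()       # slow start: 1 -> 8
window.on_new_ack()           # congestion avoidance: 8 -> 8.125
window.on_triple_dup_ack()    # ssthresh = cwnd = 4.0625
```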
62
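Summary: TCP Congestion Control. [Figure: FSM with states slow start, congestion avoidance and fast recovery. Transitions: slow start -> congestion avoidance when cwnd > ssthresh; any state -> slow start on loss: timeout; slow start or congestion avoidance -> fast recovery on loss: 3dupACK (fast retransmit); fast recovery -> congestion avoidance on new ACK.]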
63
Chapter 3: summary. Principles behind transport layer services: multiplexing and demultiplexing, reliable data transfer, flow control, congestion control. Instantiation and implementation in the Internet: UDP, TCP. Next: leaving the network “edge” (application, transport layers) and heading into the network “core”.