Transport Layer1 TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581 r reliable, in-order byte steam: m no “message boundaries” r pipelined: m TCP congestion and flow control set window size r full duplex data transfer: m bi-directional data flow in same connection r A TCP is always point-to-point: m one sender, one receiver r connection-oriented: m Because handshaking take place. Handshaking (exchange of control msgs) initialize sender, receiver state before data exchange r send & receive buffers m MSS: maximum segment size
Transport Layer2 TCP segment structure source port # dest port # 32 bits application data (variable length) sequence number acknowledgement number Receive window Urg data pnter checksum F SR PAU head len not used Options (variable length) URG: urgent data (generally not used) ACK: ACK # valid PSH: push data now (generally not used) RST, SYN, FIN: connection setup and teardown # bytes rcvr willing to accept counting by bytes of data (not segments!) Internet checksum (as in UDP) RFC 793, 1323
Transport Layer3 TCP Round Trip Time and Timeout Q: how to set TCP timeout value? r too short: premature timeout m unnecessary retransmissions r too long: slow reaction to segment loss r longer than RTT m but RTT varies Q: how to estimate RTT? SampleRTT : measured time from segment transmission until ACK receipt m ignore retransmissions SampleRTT will vary, want estimated RTT “smoother” m average SampleRTT values
Transport Layer4 TCP Round Trip Time and Timeout Setting the timeout EstimtedRTT plus “safety margin” large variation in EstimatedRTT -> larger safety margin
Transport Layer5 TCP reliable data transfer r TCP creates reliable data transfer service on top of IP’s unreliable service r Pipelined segments r Cumulative acks r Single retransmission timer r Retransmissions are triggered by: m timeout events m duplicate acks r Initially consider simplified TCP sender: m ignore duplicate acks m ignore flow control, congestion control
Transport Layer6 TCP sender events: data rcvd from app: r create segment with seq # NextSeqNum r start timer if not already running (think of timer as for oldest unacked segment) expiration interval: TimeOutInterval timeout: r retransmit not-yet-acked segment with smallest seq # r restart timer Ack(y) rcvd: r If acknowledges previously unacked segments (y > SendBase) m Set SendBase = y m start timer if there are outstanding segments
Transport Layer7 TCP sender (simplified) NextSeqNum = InitialSeqNum SendBase = InitialSeqNum loop (forever) { switch(event) event: data received from application above create TCP segment with sequence number NextSeqNum if (timer currently not running) start timer pass segment to IP NextSeqNum = NextSeqNum + length(data) event: timer timeout retransmit not-yet-acknowledged segment with smallest sequence number start timer event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) start timer } } /* end of loop forever */ Comment: SendBase-1: last cumulatively ack’ed byte Example: SendBase = 71; y= 73, so the rcvr wants 73+ ; y > SendBase new data (71,72) is acked
Transport Layer8 TCP: retransmission scenarios Host A Seq=100, 20 bytes data ACK=100 time premature timeout Host B Seq=92, 8 bytes data ACK=120 Seq=92, 8 bytes data Seq=92 timeout ACK=120 Host A Seq=92, 8 bytes data ACK=100 loss timeout lost ACK scenario Host B X Seq=92, 8 bytes data ACK=100 time Seq=92 timeout SendBase = 100 SendBase = 120 SendBase = 120 Sendbase = 100 discard segment
Transport Layer9 TCP retransmission scenarios (more) Host A Seq=92, 8 bytes data ACK=100 loss timeout Cumulative ACK scenario Host B X Seq=100, 20 bytes data ACK=120 time SendBase = 120
Transport Layer10 TCP ACK generation [RFC 1122, RFC 2581] Event at Receiver Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed Arrival of in-order segment with expected seq #. One other segment has ACK pending Arrival of out-of-order segment higher-than-expect seq. #. Gap detected Arrival of segment that partially or completely fills gap TCP Receiver action Delayed ACK. Wait up to 500ms for another in-order segment. If no such segment, send ACK Immediately send single cumulative ACK, ACKing both in-order segments Immediately send duplicate ACK, indicating seq. # of next expected byte Immediate send ACK, provided that segment starts at lower end of gap
Transport Layer11 Fast Retransmit r Time-out period often relatively long: m long delay before resending lost packet r Detect lost segments via duplicate ACKs. m Sender often sends many segments back-to- back m If a segment is lost, there will likely be many duplicate ACKs. r If sender receives 3 duplicate ACKs for the same data, it supposes that segment after ACKed data was lost: m fast retransmit: resend segment before timer expires
Transport Layer12 event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) start timer } else { increment count of dup ACKs received for y if (count of dup ACKs received for y = 3) { resend segment with sequence number y } Fast retransmit algorithm: a duplicate ACK for already ACKed segment fast retransmit
Transport Layer13 TCP Flow Control r receive side of TCP connection has a receive buffer: r speed-matching service: matching the send rate to the receiving app’s drain rate r app process may be slow at reading from buffer sender won’t overflow receiver’s buffer by transmitting too much, too fast flow control
Transport Layer14 TCP Flow control: how it works (Suppose TCP receiver discards out-of-order segments) spare room in buffer = RcvWindow = RcvBuffer-[LastByteRcvd - LastByteRead] Rcvr advertises spare room by including value of RcvWindow in segments Sender limits unACKed data to RcvWindow: LastByteSent-LastByteAcked <= RcvWindow guarantees receive buffer doesn’t overflow
Transport Layer15 TCP Connection Management Recall: TCP sender, receiver establish “connection” before exchanging data segments r initialize TCP variables: m seq. #s buffers, flow control info (e.g. RcvWindow ) r client: connection initiator r server: contacted by client Three way handshake: Step 1: client host sends TCP SYN segment to server m specifies initial seq # m no data Step 2: server host receives SYN, replies with SYNACK segment m server allocates buffers, variables m specifies server initial seq. # Step 3: client receives SYNACK, allocate buffers and variables, replies with ACK segment, which may contain data
Transport Layer16 TCP Connection Management: Three way handshake client SYN=1, seq=client_ins server SYN=1, seq=server_ins, Ack=client_ins+1 SYN=0, seq=client_ins+1, Ack=server_ins+1 Connection request Connection granted ACK
Transport Layer17 TCP Connection Management (cont.) Closing a connection: client closes socket: Step 1: client end system sends TCP FIN control segment to server, FIN bit set to 1 Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN. client FIN server ACK FIN closing closed timed wait closed
Transport Layer18 TCP Connection Management (cont.) Step 3: client receives FIN, replies with ACK. m Enters “timed wait” – resend the final ACK in case it is lost m Connection closes after wait Step 4: server receives ACK. Connection closed. client FIN server ACK FIN closing closed timed wait closed
Transport Layer19 Principles of Congestion Control Congestion: r informally: “too many sources sending too much data too fast for network to handle” r different from flow control! r manifestations: m lost packets (buffer overflow at routers) m long delays (queueing in router buffers)
Transport Layer20 Causes/costs of congestion: scenario 2 r Case 1: send a packet only when a buffer is free: r Case 2: “perfect” retransmission only when loss: r Case 3: retransmission of delayed (not lost) packet makes larger (than case 2) for same in out “costs” of congestion: r more work performed by sender (retransmissions) r unneeded retransmissions (due to large delay) R/2 C/2 R/2 C/3 R/2 C/4
Transport Layer21 TCP Congestion Control r end-end control (no network assistance) r sender limits transmission rate: LastByteSent-LastByteAcked min{CongWin, RcvWindow) r Roughly, CongWin is dynamic, function of perceived network congestion How does sender perceive congestion? r loss event = timeout or 3 duplicate acks TCP sender reduces rate ( CongWin ) after loss event three mechanisms: m AIMD m slow start m reaction to timeout events rate = CongWin RTT Bytes/sec
Transport Layer22 TCP AIMD multiplicative decrease: cut CongWin in half after loss event additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing r also called congestion avoidance Long-lived TCP connection
Transport Layer23 TCP Slow Start When connection begins, CongWin = 1 MSS m Example: MSS = 500 bytes & RTT = 200 msec m initial rate = 20 kbps r available bandwidth may be >> MSS/RTT m desirable to quickly ramp up to respectable rate r When connection begins, increase rate exponentially fast until first loss event
Transport Layer24 TCP Slow Start (more) r When connection begins, increase rate exponentially until first loss event: double CongWin every RTT done by incrementing CongWin by one MSS for every ACK received r Summary: initial rate is slow but ramps up exponentially fast Host A one segment RTT Host B time two segments four segments
Transport Layer25 Refinement r After 3 dup ACKs: CongWin is cut in half m window then grows linearly r But after timeout event: CongWin instead set to 1 MSS; m window then grows exponentially m to a threshold, then grows linearly r Implementation: At loss event, threshold is set to 1/2 of CongWin just before loss event 3 dup ACKs indicates network capable of delivering some segments timeout before 3 dup ACKs is “more alarming” Philosophy:
Transport Layer26 Summary: TCP Congestion Control When CongWin is below Threshold, sender in slow-start phase, window grows exponentially. When CongWin is above Threshold, sender is in congestion- avoidance phase, window grows linearly. When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold. When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS. Triple duplicate ACK
Transport Layer27 Summary r principles behind transport layer services: m multiplexing, demultiplexing m reliable data transfer m flow control m congestion control r instantiation and implementation in the Internet m UDP m TCP