ECE544: Communication Networks-II Spring 2009 Hang Liu Lecture 8 Includes teaching materials from D. Raychaudhuri, S. Gopal, L. Peterson,
Today’s Lecture Introduction to transport protocols UDP TCP RTP
Protocol Stack
[Figure: Host1 and Host8 run Appl. / TCP/UDP / IP / ETH; routers R1 (ETH, FDDI), R2 (FDDI, PPP) and R3 (PPP, ETH) run IP over their link layers]
Transport protocol
  – Enables communication between two or more processes (which may be on different hosts in different networks)
  – The transport layer is the lowest layer in the network stack that operates end to end
Transport Protocols
Different applications have different requirements for transport protocols.
  – Guarantee message delivery (the network may drop messages)
  – Deliver messages in the same order they are sent (messages may be reordered in the network and may incur a long delay)
  – Deliver at most one copy of each message (messages may be duplicated in the network)
  – Support arbitrarily large messages (networks may limit message size)
  – Support synchronization between sender and receiver
  – Allow the receiver to apply flow control to the sender
  – Support multiple application processes on each host
  – ……
Design just a few transport protocols to meet most current and future application requirements
  – Each satisfies the requirements of a class of applications
  – Many applications => few transport protocols
Most Popular Transport Protocols
User Datagram Protocol (UDP):
  – Supports multiple application processes on each host
  – Optional checksum to check messages for correctness
Transmission Control Protocol (TCP):
  – Ensures reliable delivery of packets between source and destination processes
  – Ensures in-order delivery of packets to the destination process
  – Other options
Real-time Transport Protocol (RTP):
  – Serves real-time multimedia applications
  – Header contains sequence number, timestamp, marker bit, etc.
  – Runs over UDP
TCP, UDP and RTP satisfy the needs of the most common applications
  – Applications requiring other functionality usually use UDP as the transport protocol and implement the additional features as part of the application
User Datagram Protocol (UDP)
De-multiplexes the applications/processes on a host
  – Port: an identifier for a communicating process
  – Each process-to-process communication is identified by a 4-tuple connection identifier
  – Well-known port: Unix talk: port 517
Header layout: SrcPort | DstPort | Length | Checksum, followed by Data
  – Length: UDP header + data bytes
  – Checksum: optional check of message correctness
[Figure: application processes in user space attach through ports to UDP and TCP, which run over IP in the kernel]
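The header layout above can be sketched with Python's `struct` module. This is only an illustration of the four 16-bit fields and of how Length covers header plus data; the function names `build_udp_datagram` and `parse_udp_datagram` are hypothetical, and the checksum is left at 0 (not computed) rather than implemented.

```python
import struct

# Four 16-bit fields in network byte order: SrcPort, DstPort, Length, Checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend a UDP header; checksum=0 here means 'not computed' (assumption)."""
    length = UDP_HEADER.size + len(payload)  # Length = UDP header + data bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a datagram into its header fields and its data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    data = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, data


datagram = build_udp_datagram(40000, 517, b"hello")  # 517: Unix talk
print(parse_udp_datagram(datagram))
```

The receiving host would use the DstPort field to pick the application process that gets the data.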
Transmission Control Protocol (TCP)
First proposed by Vinton Cerf and Robert Kahn, 1974
  – TCP/IP enabled computers of all sizes, from different vendors and running different OSs, to communicate with each other.
  – Carries 80% of all traffic on the Internet
Reliable, in-order delivery, connection-oriented, byte-stream service
A Simple File Transfer Application
Server:
  – Passive open and wait for connection
Client:
  – Active open and initiate connection establishment
After connection establishment
  – Reliable data transport
Terminate connection
TCP Operation Sender application process only needs to provide a byte stream to the kernel Kernels on sending and receiving hosts operate TCP processes Receiver application process only needs to read received bytes from the assigned TCP buffers 4-Tuple Connection Identifier: SrcPort, SrcIPAddr, DestPort, DestIPAddr
TCP Operation Sequence TCP protocol completely implemented at the end hosts Sequence numbers maintained in bytes (remember, TCP serves a byte stream) Start of operation: –Connection Establishment by a Three-Way Handshake algorithm –Consensus on Initial Sequence Number (ISN) Data Transfer: –Sends the data in packets, reliably and as fast as the network/receiver permits Finish: –Both sides independently close their half of the connection
TCP Header Format Flags: –SYN –FIN –RESET –PUSH –URG –ACK
Connection Establishment
Three-Way Handshake Algorithm
  – SYN and ACK flags in the header are used
  – Initial sequence numbers x and y are selected at random; this is required to avoid reusing the same number as a previous incarnation of the same connection
Message exchange between the active participant (client) and the passive participant (server):
  – Client → Server: SYN, Seq#=x
  – Server → Client: SYN+ACK, Seq#=y, Ack#=x+1
  – Client → Server: ACK, Ack#=y+1
  – Data+ACK
Phases: connection establishment, data transport, termination
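The exchange of sequence and acknowledgment numbers in the handshake can be traced with a small simulation. This is a sketch only: the dictionary message format and the function name `three_way_handshake` are made up for illustration, and no real packets are sent.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32 bits wide


def three_way_handshake(rng=random):
    """Return the three handshake messages as plain dictionaries."""
    x = rng.randrange(SEQ_SPACE)  # client's initial sequence number
    y = rng.randrange(SEQ_SPACE)  # server's initial sequence number
    syn = {"flags": {"SYN"}, "seq": x}
    # The server acknowledges x by asking for byte x+1 next.
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (x + 1) % SEQ_SPACE}
    # The client acknowledges y the same way.
    ack = {"flags": {"ACK"}, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for message in three_way_handshake(random.Random(1)):
    print(message)
```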
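Connection tear-down
Any side can terminate the connection
Each side closes its half of the connection independently
[Figure: message sequence showing FIN and FIN-ACK in one direction, further data (data write, Data, ACK) flowing on the still-open half, then FIN and ACK in the other direction]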
TCP State-Transition
Max segment lifetime (MSL): recommended value 120 sec
[Figure: TCP state-transition diagram. States: CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK. Edges are labeled Event/Action, for example Passive open, Active open/SYN, Send/SYN, SYN/SYN+ACK, SYN+ACK/ACK, ACK, Close/FIN, FIN/ACK, ACK+FIN/ACK, Close. TIME_WAIT returns to CLOSED on timeout after 2 x MSL.]
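The Event/Action labels of the state diagram can be written as a lookup table. The sketch below follows the standard TCP state machine and covers only the main transitions (the plain Close edges back to CLOSED are left out); the event names such as `recv_SYN+ACK` and the helper `run` are hypothetical.

```python
# (state, event) -> (next state, segment sent as the action, or None)
TRANSITIONS = {
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("LISTEN", "send"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "recv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_SENT", "recv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_RCVD", "recv_ACK"): ("ESTABLISHED", None),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv_FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_1", "recv_ACK+FIN"): ("TIME_WAIT", "ACK"),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv_ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout_2msl"): ("CLOSED", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def run(state, events):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for event in events:
        state, action = TRANSITIONS[(state, event)]
        if action is not None:
            sent.append(action)
    return state, sent


# A client that opens, closes first, and waits out TIME_WAIT.
print(run("CLOSED", ["active_open", "recv_SYN+ACK", "close",
                     "recv_ACK", "recv_FIN", "timeout_2msl"]))
```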
TCP Functions
Goal of TCP: deliver data reliably and in order, as fast as possible (Throughput = bytes delivered / time taken)
Flow Control:
  – Keep the sender from sending data faster than the TCP receiver can reliably receive and process it.
Error Control and Congestion Control:
  – Packet loss is taken to imply that one or more queues in intermediate routers have overflowed.
  – Retransmit lost packets
  – Scale back the flow rate to reduce congestion
  – Congestion control and error control are intertwined through the congestion window (cwnd)
TCP increases the sending rate to use the network (the route) up to its full capacity or the receiver's capability
  – But it scales back if congestion occurs or if the receiver is flooded.
Flow Control
TCP uses a sliding-window flow control protocol
  – The receiver specifies the amount of data (in bytes) it is willing to buffer in the AdvertisedWindow field of each segment
  – The sender can send only up to that amount of data before it must wait for an ACK and window update from the receiver.
[Figure: sending application and TCP send buffer (LastByteAcked, LastByteSent, LastByteWritten); receiving application and TCP receive buffer (LastByteRead, NextByteExpected, LastByteRcvd)]
Sender side:
  – LastByteAcked <= LastByteSent <= LastByteWritten
  – LastByteSent - LastByteAcked <= AdvertisedWindow
  – EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
  – LastByteWritten - LastByteAcked <= MaxSendBuffer
Receiver side:
  – LastByteRead < NextByteExpected <= LastByteRcvd + 1
  – LastByteRcvd - LastByteRead <= MaxRcvBuffer
  – AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
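The window arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch using the slide's variable names; the `ReceiverState` class and the example byte counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReceiverState:
    max_rcv_buffer: int
    last_byte_read: int
    next_byte_expected: int

    def advertised_window(self):
        # AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
        return self.max_rcv_buffer - ((self.next_byte_expected - 1) - self.last_byte_read)


def effective_window(advertised_window, last_byte_sent, last_byte_acked):
    # EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
    return advertised_window - (last_byte_sent - last_byte_acked)


# 1000 bytes are buffered at the receiver but not yet read by the application.
receiver = ReceiverState(max_rcv_buffer=4096, last_byte_read=1000, next_byte_expected=2001)
window = receiver.advertised_window()
print(window)                                # 3096
print(effective_window(window, 5000, 4000))  # 2096
```

An effective window of zero or less means the sender has to wait for an ACK before sending more data.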
Sequence Number
Protect against SequenceNum wraparound
  – Sliding window: Seq # space >= 2 x WinSize. For TCP: 2^32 >> 2 x 2^16
  – Seq # should not wrap around within one MSL (120 sec)
  – For OC-48 (2.5 Gbps), time until wraparound: 14 sec
TCP extension to the sequence # space to protect against seq # wraparound
  – Add a 32-bit timestamp as an optional header
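The wraparound time follows from dividing the 2^32-byte sequence space by the link rate in bytes per second. A short sketch of that arithmetic; the function name `wraparound_seconds` is hypothetical, and the 10 Mbps line is an extra illustration not taken from the slide.

```python
def wraparound_seconds(bandwidth_bps, seq_bits=32):
    """Time to send 2**seq_bits bytes at the given bit rate."""
    bytes_per_second = bandwidth_bps / 8
    return 2**seq_bits / bytes_per_second


MSL_SECONDS = 120

oc48 = wraparound_seconds(2.5e9)
print(round(oc48, 1), oc48 < MSL_SECONDS)   # wraps well inside one MSL
print(round(wraparound_seconds(10e6) / 60), "minutes at 10 Mbps")
```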
Keep the pipe full
AdvertisedWindow: 2^16 => 64 KB
  – Should be big enough to allow the sender to keep the pipe full (assuming the receiver has enough buffer to handle the data)
  – If RTT = 100 ms:
    » Delay x Bandwidth = 122 KB for 10 Mbps Ethernet
    » Delay x Bandwidth = 1.2 MB for 100 Mbps Ethernet
    » (AdvertisedWindow is not large enough)
TCP Extension:
  – Scaling factor option for AdvertisedWindow, e.g. use 16-byte units of data
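The delay x bandwidth figures can be reproduced as follows. The sketch also estimates how far the advertised window would have to be scaled to cover the pipe, expressing the scale as a left shift so that 16-byte units correspond to a shift of 4; the helper names are hypothetical.

```python
MAX_UNSCALED_WINDOW = 2**16 - 1  # largest value of the 16-bit AdvertisedWindow field


def delay_bandwidth_bytes(rtt_ms, bandwidth_bps):
    """Bytes in flight needed to keep the pipe full (integer arithmetic)."""
    return rtt_ms * bandwidth_bps // 8000  # ms * bits/s -> bytes


def min_window_shift(pipe_bytes):
    """Smallest left shift so the scaled 16-bit window covers pipe_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < pipe_bytes:
        shift += 1
    return shift


for mbps in (10, 100):
    pipe = delay_bandwidth_bytes(100, mbps * 1_000_000)
    print(mbps, "Mbps:", pipe, "bytes, shift", min_window_shift(pipe))
```

Under these assumptions, 16-byte units (shift 4) give a window of about 1 MB, which is still slightly short of the 100 Mbps pipe at 100 ms RTT.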
Triggering Transmission
When to transmit a segment:
  – Small segments are subject to large overhead
The buffered data reaches the max segment size (MSS): the size of the largest segment TCP can send without causing the local IP to fragment
  – MSS = local MTU - IP and TCP header sizes
The sending process explicitly asks TCP to transmit ("push")
TCP Deadlock
  – The receiver advertises a window size of 0, so the sender stops sending data
  – The window size update from the receiver is lost
To solve it:
  – The sender starts the persist timer when AdvertisedWindow = 0
  – When the persist timer expires, the sender sends a small packet
TCP Silly Window Syndrome
  – Sender has MSS bytes of data to send, but the window is closed
  – ACK arrives with a small window
  – Sender sends a small segment (high overhead)
  – Receiver advertises a small window
  – Sender sends a small segment
  – Repeat the above
To solve: Nagle's Algorithm
  – When the application has data to send
    If both the available data and the window >= MSS
      – Send a full segment
    Else
      – If there is unACKed data in flight
        » Buffer the new data until an ACK arrives
      – Else
        » Send all the new data now
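Nagle's algorithm as listed above reduces to a three-way decision. A minimal sketch, with a hypothetical function name and string results chosen only for readability:

```python
def nagle_decision(available_bytes, window_bytes, mss, unacked_in_flight):
    """Decide what to do when the application hands TCP new data."""
    if available_bytes >= mss and window_bytes >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK arrives"
    return "send all new data now"


print(nagle_decision(3000, 4000, 1460, unacked_in_flight=True))
print(nagle_decision(200, 4000, 1460, unacked_in_flight=True))
print(nagle_decision(200, 4000, 1460, unacked_in_flight=False))
```

The effect is that at most one small segment is outstanding per round trip, so the ACK itself acts as the timer that releases buffered data.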
TCP Error Control
Cumulative acknowledgment: the Ack field carries the next expected seq #
  – Extension: selective ACK (SACK) acknowledges additional blocks of received data in an optional TCP header
Adaptive retransmission
  – Adapt the retransmission timer to the RTT
TCP Timeout
Original Algorithm:
  – EstimatedRTT = α x EstimatedRTT + (1 - α) x SampleRTT
  – Timeout = 2 x EstimatedRTT
  – Issue: does not distinguish whether an ACK is for the original transmission or a retransmission
Karn/Partridge Algorithm
  – Whenever TCP retransmits a segment, it stops taking samples of the RTT: SampleRTT is measured only for segments that have been sent once
  – Each time TCP retransmits, set the next timeout to twice the last timeout, to relieve congestion
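The original estimator and the two Karn/Partridge rules can be combined in a small class. This is a sketch under assumptions: the class name `RttTimer` is hypothetical, the smoothing weight is called `alpha` in the code, and its default of 0.875 is an illustrative choice rather than a value given in the slides.

```python
class RttTimer:
    def __init__(self, initial_rtt, alpha=0.875):
        self.alpha = alpha                # weight on the old estimate (assumed default)
        self.estimated_rtt = initial_rtt
        self.timeout = 2 * initial_rtt

    def on_ack(self, sample_rtt, was_retransmitted):
        if was_retransmitted:
            return  # Karn/Partridge: the sample is ambiguous, so skip it
        self.estimated_rtt = (self.alpha * self.estimated_rtt
                              + (1 - self.alpha) * sample_rtt)
        self.timeout = 2 * self.estimated_rtt

    def on_retransmit(self):
        self.timeout *= 2  # Karn/Partridge: back off on every retransmission


timer = RttTimer(initial_rtt=100.0, alpha=0.5)
timer.on_ack(200.0, was_retransmitted=False)
print(timer.estimated_rtt, timer.timeout)
timer.on_retransmit()
print(timer.timeout)
```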
TCP Timeout (Cont)
TCP Timeout
  – If the timeout fires too soon, TCP retransmits unnecessarily and adds load to the network
  – If the timeout fires too late, latency increases
Jacobson/Karels Algorithm: better RTT estimation by considering the variance
  – Difference = SampleRTT - EstimatedRTT
  – EstimatedRTT = EstimatedRTT + (δ x Difference)
  – Deviation = Deviation + δ x (|Difference| - Deviation)
  – Timeout = μ x EstimatedRTT + φ x Deviation (default: μ = 1 and φ = 4)
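One update step of the Jacobson/Karels computation can be written as a pure function. In this sketch the gain is named `delta` and the two timeout weights `mu` and `phi`; the defaults `mu=1` and `phi=4` follow the slide, while `delta=0.125` is an assumed illustrative value.

```python
def jacobson_karels(estimated_rtt, deviation, sample_rtt,
                    delta=0.125, mu=1.0, phi=4.0):
    """Return the updated (EstimatedRTT, Deviation, Timeout)."""
    difference = sample_rtt - estimated_rtt
    estimated_rtt = estimated_rtt + delta * difference
    deviation = deviation + delta * (abs(difference) - deviation)
    timeout = mu * estimated_rtt + phi * deviation
    return estimated_rtt, deviation, timeout


# A sample far from the estimate raises the deviation, and with it the timeout.
print(jacobson_karels(100.0, 10.0, 140.0))
# A sample equal to the estimate lets the deviation decay.
print(jacobson_karels(100.0, 10.0, 100.0))
```

Because the deviation term carries the larger weight, a connection with steady RTT samples gets a timeout close to the estimate, while a jittery one gets more slack.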
Congestion
TCP interprets packet loss as a sign of congestion
[Figure: Source 1, Source 2 and Source 3 send through a shared network to Dest 1 and Dest 2]
TCP Congestion Control
TCP sends packets into the network without reservation
  – It tries to use as much of the network resources (bandwidth, buffers) as it can
As congestion occurs, it scales back
Strategy:
  – Conservatively increase the packet sending rate if there is no congestion
  – Quickly reduce the sending rate when congestion is detected (timeout)
Additive increase/multiplicative decrease (AIMD)
Maintain a CongestionWindow
  – MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
  – EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)
Decrease the congestion window aggressively and increase it conservatively
  – A simple algorithm: additive increase/multiplicative decrease (AIMD)
  – Each time a timeout occurs, halve the congestion window size (cwnd), but never below one MSS:
    » cwnd = cwnd/2
    » cwnd = Max(cwnd, MSS)
  – Each time an ACK is received:
    » Increment = MSS x (MSS/cwnd)
    » cwnd += Increment
    (CongestionWindow increases by one packet, or MSS, after all packets sent during the last RTT have been ACKed)
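The two AIMD update rules can be sketched as a pair of functions working in bytes. The function names are hypothetical and the window sizes in the usage lines are made-up examples.

```python
def aimd_on_timeout(cwnd, mss):
    """Multiplicative decrease: halve cwnd, but keep at least one MSS."""
    return max(cwnd / 2, mss)


def aimd_on_ack(cwnd, mss):
    """Additive increase: each ACK adds a fraction of an MSS."""
    return cwnd + mss * (mss / cwnd)


cwnd = 4000.0
for _ in range(4):          # one window's worth of ACKs at MSS = 1000
    cwnd = aimd_on_ack(cwnd, 1000)
print(round(cwnd))          # close to one MSS more than the starting 4000
print(aimd_on_timeout(cwnd, 1000))
```

Because cwnd grows while the ACKs of one window arrive, the per-RTT gain comes out slightly under one full MSS.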
Additive increase/multiplicative decrease (Cont)
TCP sawtooth pattern
[Figure: CongestionWindow size plotted against time, showing the sawtooth]
Issues with additive increase:
  – It takes too long to ramp up a connection from the beginning
  – The entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender
TCP “Slow Start”
On timeout:
  – Slow Start Threshold (SSThresh) = cwnd/2
  – cwnd = MSS
On receiving an ACK:
  – If cwnd <= SSThresh (slow start phase)
    » incr = MSS (exponential growth, cwnd doubles every RTT)
  – Else (congestion avoidance mode)
    » incr = MSS x MSS/cwnd (linear growth, adds 1 MSS per RTT)
  – cwnd = min(cwnd + incr, TCP_MAXWIN)
TCP Slow Start (Cont)
[Figure: CongestionWindow size plotted against time, with the slow start phase marked]
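The slow start and congestion avoidance rules can be traced with a small sender model. This is a sketch only: the class name `SlowStartSender` is hypothetical and the `TCP_MAXWIN` value of 65535 bytes is an assumed cap.

```python
TCP_MAXWIN = 65535  # assumed maximum window in bytes


class SlowStartSender:
    def __init__(self, mss, ssthresh):
        self.mss = mss
        self.ssthresh = ssthresh
        self.cwnd = mss

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.mss

    def on_ack(self):
        if self.cwnd <= self.ssthresh:
            incr = self.mss                         # slow start phase
        else:
            incr = self.mss * self.mss / self.cwnd  # congestion avoidance
        self.cwnd = min(self.cwnd + incr, TCP_MAXWIN)


sender = SlowStartSender(mss=1000, ssthresh=4000)
for _ in range(5):
    sender.on_ack()
    print(sender.cwnd)
sender.on_timeout()
print(sender.ssthresh, sender.cwnd)
```

Each ACK adds a full MSS while cwnd is at or below the threshold, which is what doubles the window every RTT.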
Slow-start and congestion avoidance A closer look
Fast Retransmit
TCP timeout issue:
  – There may be long periods of time during which the connection goes dead while waiting for a timer to expire
Solution:
  – Add fast retransmit (in addition to, not in place of, the regular timeout)
    » Every time the receiver receives an out-of-order packet, it sends a duplicate ACK
    » The sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g. 3 duplicate ACKs)
[Figure: the sender transmits PKT 1 to PKT 6 and PKT 3 is lost; the receiver returns ACK 1, ACK 2, then duplicate ACKs; the sender retransmits PKT 3 and receives ACK 6]
Fast Recovery
During congestion avoidance mode, when packet loss is detected through 3 duplicate ACKs, the congestion window is reduced to the slow-start threshold rather than to the small initial value of 1 MSS.
  – cwnd = SSThresh = cwnd/2 (skips the slow start phase when fast retransmit detects a lost packet; additive increase begins)
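The duplicate-ACK rule can be sketched as a counter over the ACK stream. This sketch numbers ACKs by packet (ACK n meaning every packet up to n has arrived), which is a simplification of TCP's byte-based acknowledgments; the function name is hypothetical.

```python
def fast_retransmit_events(acks, dup_threshold=3):
    """Return the packet numbers that fast retransmit would resend."""
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                retransmits.append(ack + 1)  # first packet not yet acknowledged
        else:
            last_ack = ack
            dup_count = 0
    return retransmits


# PKT 3 is lost: PKT 4, 5 and 6 each trigger a duplicate ACK 2.
print(fast_retransmit_events([1, 2, 2, 2, 2, 6]))
```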
TCP Sender Operation
TCP operation is paced by its ACKs
[Figure: flowchart of the sender loop]
Wait for ACK (operate in slow start or congestion avoidance)
  – New ACK (ACKs indicate lossless operation)
    » Measure RTT if applicable
    » Set cwnd_start = purge_acked_pkts()
    » Set cwnd = cwnd + increment_value()
    » Send new packets
    » Restart timer if applicable
  – Duplicate ACK (ACKs indicate possible losses)
    » If 3rd duplicate ACK: Fast_retransmit(), Fast_recovery()
  – No ACKs, timeout
    » Set ssthresh = cwnd/2, cwnd = 1
    » Retransmit cwnd_start packet
TCP Reno Most popular TCP flavor; implemented in most operating systems Includes Fast Retransmit and Fast Recovery
Real-time Traffic
Quality of Service (QoS) factors: reliability, delay and jitter
  – Late arrival = loss
  – Because retransmissions in TCP are potentially unbounded, large delay and jitter may ensue. Real-time applications prefer UDP instead.
Jitter can be smoothed out by a playback buffer, but the initial buffering introduces a longer delay and reduces interactivity.
  – Depends on the application's delay tolerance
Real-time transport protocol (RTP) operates over UDP
  – RTP modules run in user space. RTP libraries are included in the application.
  – RTP provides no error recovery or congestion control; applications handle all aspects themselves.
[Figure: protocol stack Application / RTP / UDP / IP / Subnet]
RTP
RTP standard: RFC 3550
Header layout:
  – V=2 | P | X | CC | M | Payload Type | Sequence number
  – Timestamp
  – Synchronization source (SSRC) identifier
  – Contributing source (CSRC) identifier (optional)
  – Extension header
  – RTP payload
  – Padding length
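The fixed part of the RTP header can be packed and parsed with `struct`. This sketch follows the RFC 3550 layout (2-bit version, P, X, 4-bit CC, M, 7-bit payload type, then 16-bit sequence number, 32-bit timestamp and 32-bit SSRC); the function names are hypothetical, and padding and extension headers are not handled.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False, csrcs=()):
    """Build the 12-byte fixed header plus any CSRC identifiers (P=0, X=0)."""
    byte0 = (RTP_VERSION << 6) | (len(csrcs) & 0x0F)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(data):
    """Decode the header fields into a dictionary."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack_from("!BBHII", data)
    cc = byte0 & 0x0F
    csrcs = [struct.unpack_from("!I", data, 12 + 4 * i)[0] for i in range(cc)]
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }


header = pack_rtp_header(payload_type=0, sequence=1, timestamp=160, ssrc=0x1234, marker=True)
print(parse_rtp_header(header))
```

A receiver would use the sequence number to detect loss and reordering, and the timestamp to schedule playback.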
RTCP
Real-time transport control protocol (RTCP)
  – Provides out-of-band statistics and control information for an RTP flow
  – Partners with RTP in the delivery and packaging of multimedia data, but does not transport any media streams itself.
  – RTP uses an even port number, whereas RTCP uses the next higher odd port number.
RTCP Functions
  – Gather statistics on quality aspects of the media distribution; the source can use them for adaptive media encoding (codec) and for detecting transmission faults
  – Convey the canonical end-point identifier (CNAME) to all session participants; the SSRC may change during a session, but the CNAME represents the unique identity of a sender
  – Correlate and synchronize different media streams
RTCP messages
  – Sender reports
  – Receiver reports
  – Source descriptions
  – Application-specific control packets
Homework Due 4/17