ECE544: Communication Networks-II Spring 2010 Sumit Rangwala Includes teaching materials from S. Gopal, L. Peterson, and D. Raychaudhuri
Today’s Lecture Introduction to transport protocols UDP TCP RTP
The Disconnect Applications running on hosts need to communicate and require some guarantees from the underlying layer. [Figure: applications on two hosts communicating through routers R1, R2, R3 over ETH, FDDI, and PPP links.] The Network Layer (IP) provides only best-effort communication services, and only between hosts (not applications).
Transport Protocol [Figure: applications on Host1 and Host8 communicating over TCP/UDP and IP, through routers R1, R2, R3.] A transport protocol provides the services required by applications, using the services provided by the network layer. The transport layer is the lowest layer in the network stack that is an end-to-end protocol.
Transport Protocols Application requirements vs. IP-layer limitations: Guarantee message delivery (the network may drop messages. How?). Deliver messages in the same order they are sent (messages may be reordered in the network or incur long delays. How?). Deliver at most one copy of each message (messages may be duplicated in the network. How?). Support arbitrarily large messages (the network may limit message size. Why?). Support synchronization between sender and receiver (allows the receiver to apply flow control to the sender). Support multiple application processes on each host (the network only supports communication between hosts). Many more. Design just a few transport protocols to meet most current and future application requirements; each satisfies the requirements of a class of applications. Many applications => few transport protocols.
Most Popular Transport Protocols User Datagram Protocol (UDP): supports multiple application processes on each host; offers an optional checksum to check messages for correctness. Transmission Control Protocol (TCP): ensures reliable delivery of packets between source and destination processes; ensures in-order delivery of packets to the destination process; other services. Real-time Transport Protocol (RTP): serves real-time multimedia applications; moves decision making to the applications; runs over UDP. TCP, UDP, and RTP satisfy the needs of the most common applications. Applications requiring other functionality usually use UDP as the transport protocol and implement additional features as part of the application.
User Datagram Protocol (UDP): Demultiplexing Service: support for multiple processes on each host to communicate. Issue: IP only provides communication between hosts (IP addresses). Solution: add a port number and associate a process with a port number. The unique connection identifier is the 4-tuple [SrcPort, SrcIPAddr, DestPort, DestIPAddr]. UDP packet format: SrcPort, DstPort, Length, Checksum, Payload.
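To make the demultiplexing concrete, here is a minimal sketch using Python's socket API; the loopback address, port 5005, and the message are arbitrary example values, not part of the lecture:

import socket

# Receiver: bind a process to a (local IP address, port) pair; the OS delivers
# every UDP datagram addressed to that port to this socket.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 5005))

# Sender: address the datagram by (DestIPAddr, DestPort); the sender's own
# (SrcIPAddr, SrcPort) is filled in automatically.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"hello", ("127.0.0.1", 5005))

data, (src_ip, src_port) = rx.recvfrom(2048)
print(data, src_ip, src_port)   # with the local address and port, this forms the 4-tuple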
User Datagram Protocol (UDP): Error Detection Service: ensure message correctness. Issue: packets can be corrupted in transit. Solution: use a checksum. Why isn't the IP checksum enough? The UDP checksum covers the UDP header, the payload, and a pseudo header. Pseudo header: protocol number, source IP address, destination IP address, and UDP length.
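A sketch of the computation, assuming the standard Internet one's-complement checksum; the helper names are mine, and the UDP checksum field is assumed to be zeroed in the segment passed in:

import struct

def ones_complement_sum16(data: bytes) -> int:
    # Sum 16-bit words with end-around carry (the core of the Internet checksum).
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length data
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    # Pseudo header: source IP, destination IP, zero byte, protocol 17 (UDP), UDP length.
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return 0xFFFF ^ ones_complement_sum16(pseudo + udp_segment)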
Transmission Control Protocol (TCP) First proposed by Vinton Cerf and Robert Kahn, 1974. TCP/IP enabled computers of all sizes, from different vendors, running different OSs, to communicate with each other. Carries roughly 80% of all traffic on the Internet. Reliable, in-order delivery, connection-oriented, byte-stream service.
TCP: Connection-oriented Service: connection-oriented; the application states the destination only once. Issue: IP is connection-less. Solution: TCP maintains the connection state (connection establishment, connection termination).
A Simple File Transfer Connection establishment: the server does a passive open and waits for a connection (on a port); the client does an active open and initiates connection establishment. After connection establishment: data transport (more later). Terminate the connection: both sides independently close their half of the connection.
TCP: Packet Format Flags: SYN, FIN, ACK, RESET, URG, PUSH. Sequence number: the sequence number of the first byte of data in the segment; it is an abstract number (more later). Acknowledgement: the next sequence number expected from the sender.
Connection Establishment [Figure: three-way handshake between the active participant (client) and the passive participant (server): SYN, Seq#=x; SYN+ACK, Seq#=y, Ack#=x+1; ACK, Ack#=y+1; then data transport begins with Data+ACK.] Server: informs TCP about the listening port (up-call registration). Client: performs the three-way handshake; the SYN and ACK flags in the header are used. The initial sequence numbers x and y are selected at random. Why?
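As a small illustration of the sequence/ACK arithmetic, the following sketch mirrors the exchange above; the random initial sequence numbers and the dictionary representation are purely illustrative:

import random

x = random.randrange(2**32)          # client's initial sequence number
y = random.randrange(2**32)          # server's initial sequence number

syn     = {"flags": {"SYN"},        "seq": x}
syn_ack = {"flags": {"SYN", "ACK"}, "seq": y, "ack": (x + 1) % 2**32}
ack     = {"flags": {"ACK"},                  "ack": (y + 1) % 2**32}

# The SYN consumes one sequence number, so the server acknowledges x+1, and the
# client's first data byte is also numbered x+1.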
Connection Termination [Figure: FIN / FIN-ACK / ACK exchange; data may still be written and ACKed on the half that remains open.] Either side can terminate the connection; each side closes its half of the connection independently. A connection may be half-closed: the side that closed can only receive data.
TCP State-Transition [Figure: TCP state-transition diagram.] Maximum segment lifetime (MSL): 120 sec (recommended)
TCP: Byte-stream Service: byte-stream; the application reads or writes a stream of bytes to the transport. Issue: IP is packet-oriented. Solution: TCP maintains a local buffer, chops the stream into packets and transmits them (sender), and coalesces data from packets to form a stream (receiver). Issues?
TCP: Reliable and Ordered Delivery Service: reliable delivery of the byte-stream. Solution: sliding window protocol (studied earlier). The buffer size at the receiver should be at least the receiver window size. [Figure: the sending application writes up to LastByteWritten; sender TCP tracks LastByteAcked and LastByteSent; receiver TCP tracks NextByteExpected and LastByteRcvd; the receiving application has read up to LastByteRead; both buffers span the receiver window size.] But what if the receiving application cannot read data fast enough?
Slow Receiver The receiver cannot read bytes at the speed the network is delivering data. This requires a buffer larger than the receiver window size; if the receiver window size is kept constant, the worst case requires an infinite buffer. [Figure: same sender and receiver buffer pointers as before; the sender keeps filling the receiver window while the unread data at the receiver keeps growing.]
TCP: Flow Control Flow control: prevent the sender from overrunning the capacity (buffer) of the receiver. Solution: use an adaptive receiver window size. The goal is to keep LastByteRcvd - LastByteRead <= MaxRcvBuffer. Every packet carries an ACK and an AdvertisedWindow. [Figure: sender buffer with LastByteAcked, LastByteSent, LastByteWritten; receiver buffer with LastByteRead, NextByteExpected, LastByteRcvd.]
Receiver: AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)
Sender: LastByteSent - LastByteAcked <= AdvertisedWindow
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
LastByteWritten - LastByteAcked <= MaxSendBuffer
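A minimal sketch of this window bookkeeping, using the variable names from the slide; the 64 KB buffer size and the example byte counts are arbitrary assumptions:

MAX_RCV_BUFFER = 64 * 1024

def advertised_window(next_byte_expected: int, last_byte_read: int) -> int:
    # Receiver: how much buffer space is still free for in-order data.
    return MAX_RCV_BUFFER - ((next_byte_expected - 1) - last_byte_read)

def effective_window(advertised: int, last_byte_sent: int, last_byte_acked: int) -> int:
    # Sender: how much new data may be sent without overrunning the receiver.
    return advertised - (last_byte_sent - last_byte_acked)

# Example: the receiver has buffered 16 KB that the application has not read yet,
# so it advertises 48 KB; the sender already has 8 KB in flight, leaving 40 KB.
adv = advertised_window(next_byte_expected=16385, last_byte_read=0)
print(adv, effective_window(adv, last_byte_sent=8192, last_byte_acked=0))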
Sequence Number Wrap Around Protect against the SequenceNum wrapping around. The sliding window requires a sequence-number space >= 2 x WindowSize; for TCP, 2^32 >> 2 x 2^16. The sequence number should also not wrap around within one MSL (120 sec), but for OC-48 (2.5 Gbps) the time until wraparound is only about 14 sec. TCP extension to protect against the sequence number wrapping around: add a 32-bit timestamp as an optional header.
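A quick back-of-the-envelope check of the wraparound time quoted on the slide (the 2.5 Gbps line rate is the slide's OC-48 figure):

seq_space_bytes = 2**32          # 32-bit sequence numbers count bytes
line_rate_bps = 2.5e9            # OC-48
print(seq_space_bytes * 8 / line_rate_bps)   # ~13.7 s, well under one MSL of 120 s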
Keep the Pipe Full AdvertisedWindow: 2^16 bytes => 64 KB. Is it big enough to allow the sender to keep the pipe full (assuming the receiver has enough buffer to handle the data)? If RTT = 100 ms: Delay x Bandwidth = 122 KB for a 10 Mbps link; Delay x Bandwidth = 1.2 MB for a 100 Mbps link (the AdvertisedWindow is not large enough). TCP extension: a window-scaling option for the AdvertisedWindow, e.g., count the window in 16-byte units of data.
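Reproducing the delay x bandwidth products on the slide (RTT = 100 ms); the loop and formatting are only illustrative:

rtt_s = 0.100
for rate_bps in (10e6, 100e6):
    pipe_bytes = rate_bps * rtt_s / 8
    # ~122 KB at 10 Mbps and ~1221 KB (~1.2 MB) at 100 Mbps; a 64 KB window
    # cannot cover either pipe, which motivates the window-scaling option.
    print(f"{rate_bps / 1e6:.0f} Mbps: {pipe_bytes / 1024:.0f} KB")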
TCP Error Control Cumulative ACK: ACKs the highest contiguous byte received (same as studied before). Extension: Selective ACK (SACK) acknowledges additional blocks of received data in a TCP optional header. Timeout timer: if the timeout is too soon, unnecessary retransmissions add load to the network; if the timeout is too late, it increases latency and limits throughput. How?
TCP Timeout Issue: RTT in a wide-area network varies substantially. Solution: adaptive timeout. Original algorithm:
EstimatedRTT = a x EstimatedRTT + (1 - a) x SampleRTT
Timeout = β x EstimatedRTT (β = 2)
Problems: it does not distinguish whether an ACK is for the original transmission or a retransmission (suggestions?), and a constant β is not good. Why? It assumes constant variance.
TCP Timeout Karn/Partridge algorithm: whenever TCP retransmits a segment, it stops taking samples of the RTT; SampleRTT is measured only for segments that have been sent exactly once. Each time TCP retransmits, set the next timeout to twice the last timeout (the exponential backoff helps relieve congestion). Jacobson/Karels algorithm: adaptive variation (uses the mean deviation):
Difference = SampleRTT - EstimatedRTT
EstimatedRTT = EstimatedRTT + (d x Difference) (same form as the original algorithm)
Deviation = Deviation + d x (|Difference| - Deviation)
Timeout = m x EstimatedRTT + f x Deviation (default: m = 1 and f = 4)
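A sketch of the Jacobson/Karels estimator exactly as written above; the gain d = 1/8 is a common choice but is my assumption, not stated on the slide:

class RttEstimator:
    def __init__(self, first_sample: float, d: float = 0.125, m: float = 1.0, f: float = 4.0):
        self.d, self.m, self.f = d, m, f
        self.estimated = first_sample
        self.deviation = 0.0

    def update(self, sample_rtt: float) -> float:
        # Feed one SampleRTT (taken only for segments sent once, per Karn/Partridge)
        # and return the new retransmission timeout.
        difference = sample_rtt - self.estimated
        self.estimated += self.d * difference
        self.deviation += self.d * (abs(difference) - self.deviation)
        return self.m * self.estimated + self.f * self.deviation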
Triggering Transmission When to transmit a segment (small segments incur a large overhead): when the data reaches the maximum segment size (MSS), the size of the largest segment TCP can send without causing the local IP to fragment (MSS = local MTU - IP & TCP headers); or when the sending process explicitly asks TCP to transmit ("push").
TCP Silly Window Syndrome The sender has MSS bytes of data to send, but the window is closed. An ACK arrives with a small window; the sender sends a small segment (high overhead). The receiver again advertises a small window, the sender again sends a small segment, and the pattern repeats. To solve it, Nagle's algorithm: when the application has data to send,
if both the available data and the window >= MSS: send a full segment
else: if there is unACKed data in flight, buffer the new data until an ACK arrives; otherwise send all the new data now.
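A minimal sketch of the decision rule above; the function name and the string return values are placeholders for a real TCP stack's actions:

def nagle_decide(available: int, window: int, mss: int, unacked_in_flight: bool) -> str:
    # available = bytes the application has queued; window = current effective window.
    if available >= mss and window >= mss:
        return "send a full segment"
    if unacked_in_flight:
        return "buffer the new data until an ACK arrives"
    return "send all the new data now"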
TCP Deadlock The problem: the receiver advertises a window size of 0, the sender stops sending data, and the window-size update from the receiver is lost. To solve it: the sender starts a persist timer when AdvertisedWindow = 0; when the persist timer expires, the sender sends a small packet.
TCP Services Connection-oriented Byte-stream service Reliable and In-order Flow Control Error Control Congestion Control (next session)
Congestion Even with flow control, packets might not reach the destination. [Figure: Sources 1, 2, and 3 sending through the network to Dest 1 and Dest 2.] When the network cannot support the senders' rate, queues at the network elements overflow.
Congestion Control vs. Flow Control Congestion control: a mechanism to prevent the sender from overrunning the capacity of the network (the network is the bottleneck). Flow control: a mechanism to prevent the sender from overrunning the capacity of the receiver (the receiver is the bottleneck).
Congestion Control: Design Approach Maintain another window at the sender, called the CongestionWindow (cwnd). The CongestionWindow is the maximum number of packets allowed in the network, i.e., the number of unACKed packets at the sender. Why? Key question: how to calculate the congestion window (cwnd)? Various approaches are possible; TCP estimates it from observed packet losses, assuming a packet loss is an indication of congestion. Since we don't know whether the network or the receiver is the bottleneck:
MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow - (LastByteSent - LastByteAcked)
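A sketch of how the sender combines the two windows; the function name is mine and the max(0, ...) guard is an assumption to keep the result non-negative:

def effective_window(cwnd: int, advertised_window: int,
                     last_byte_sent: int, last_byte_acked: int) -> int:
    # Whichever of the congestion window and the advertised window is smaller
    # limits the amount of unACKed data the sender may have in flight.
    max_window = min(cwnd, advertised_window)
    return max(0, max_window - (last_byte_sent - last_byte_acked))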
TCP Congestion Control Consists of four mechanisms: Slow Start → (2) getting to the equilibrium; Congestion Avoidance → (1) maintaining the equilibrium; Fast Retransmit → (3) avoiding retransmission timeouts and slow starts; Fast Recovery → (4) avoiding unnecessary slow starts.
Congestion Avoidance (AIMD) If there is no congestion in the network, increase conservatively: increase the congestion window additively every RTT. If there is congestion in the network, decrease aggressively: decrease the congestion window multiplicatively, immediately. How is congestion detected? It is estimated (more later).
Additive increase, every RTT: w = w + 1 (w = cwnd in segments)
Additive increase, every ACK reception: w = w + 1/w (w = cwnd in segments), or cwnd = cwnd + MSS x (MSS / cwnd) (cwnd in bytes)
Multiplicative decrease: cwnd = cwnd / 2 (cwnd in bytes)
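A sketch of the byte-counting AIMD rules above (cwnd and MSS in bytes); the integer arithmetic and the floor of one segment are my assumptions:

def on_ack_congestion_avoidance(cwnd: int, mss: int) -> int:
    # Adding MSS*(MSS/cwnd) per ACK adds roughly one MSS per RTT in total.
    return cwnd + max(1, mss * mss // cwnd)

def on_congestion(cwnd: int, mss: int) -> int:
    # Multiplicative decrease: halve the window, but never below one segment.
    return max(mss, cwnd // 2)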
Congestion Avoidance (AIMD) [Figure: CongestionWindow size vs. time from startup, showing TCP's sawtooth pattern.] Issues with additive increase: it takes too long to ramp up a connection from the beginning, and the entire advertised window may be reopened when a lost packet is retransmitted and a single cumulative ACK is received by the sender.
TCP “Slow Start”: To start quickly! Maintain another variable, the slow start threshold (ssthresh): the last known stable rate.
If (cwnd > ssthresh): state = congestion avoidance
Else: state = slow start
In slow start, increase the congestion window exponentially every RTT:
Every ACK reception: w = w + 1 (w = cwnd in segments), or cwnd = cwnd + MSS (cwnd in bytes)
Key: how is ssthresh calculated?
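A sketch combining the mode check and the two per-ACK increase rules from this slide and the previous one (cwnd, ssthresh, MSS in bytes):

def on_ack(cwnd: int, ssthresh: int, mss: int) -> int:
    if cwnd > ssthresh:
        return cwnd + max(1, mss * mss // cwnd)   # congestion avoidance: ~+1 MSS per RTT
    return cwnd + mss                             # slow start: doubles cwnd every RTT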
TCP: Congestion Detection and Retransmit Loss of a packet indicates congestion. Detection: timer timeout (no ACK), with the timer set according to the Jacobson/Karels algorithm. On a timer timeout: ssthresh = max(2*MSS, effwin/2); cwnd = MSS. Notice this will cause TCP to go into slow start. Issue: it takes a long time to detect a packet loss, which affects throughput. Is there any quicker way of detecting a packet loss?
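A sketch of the timeout reaction above; the tuple return is just a convenient way to hand back both updated variables:

def on_timeout(effective_window: int, mss: int):
    # Remember half the effective window as the new threshold, then drop cwnd
    # to one segment so the connection re-enters slow start.
    ssthresh = max(2 * mss, effective_window // 2)
    cwnd = mss
    return cwnd, ssthresh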
Fast Retransmit Observation: a series of duplicate ACKs might mean a packet loss. Solution: every time the receiver receives an out-of-order packet, it sends a duplicate ACK; the sender retransmits the missing packet after it receives some number of duplicate ACKs (e.g., 3 duplicate ACKs). Fast retransmit does not replace timeouts. Issue: it reduces latency (early retransmit) but still incurs a loss in throughput (slow start after the packet loss). [Figure: packets 1 and 2 are ACKed; packet 3 is lost, so packets 4, 5, and 6 each trigger a duplicate ACK 2; after three duplicate ACKs the sender retransmits packet 3 and receives the cumulative ACK 6.]
Fast Recovery Transmit a packet for every ACK received until the retransmitted packet is ACKed. On the third duplicate ACK: ssthresh = max(2*MSS, cwnd/2); cwnd = ssthresh + 3*MSS. On every further duplicate ACK until the ACK of the retransmitted packet: cwnd = cwnd + MSS. On reception of the ACK of the retransmitted packet: start congestion avoidance instead of slow start, with cwnd = ssthresh.
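A sketch of Reno-style fast retransmit and fast recovery as described on the two slides above (cwnd, ssthresh, MSS in bytes); the three small functions are my decomposition, not code from the lecture:

DUP_ACK_THRESHOLD = 3

def on_third_dup_ack(cwnd: int, mss: int):
    # Retransmit the missing segment, then inflate cwnd by the three dup ACKs.
    ssthresh = max(2 * mss, cwnd // 2)
    return ssthresh + DUP_ACK_THRESHOLD * mss, ssthresh

def on_additional_dup_ack(cwnd: int, mss: int) -> int:
    # Each further duplicate ACK means another packet has left the network.
    return cwnd + mss

def on_new_ack(ssthresh: int) -> int:
    # ACK of the retransmitted packet: deflate to ssthresh and resume
    # congestion avoidance instead of slow start.
    return ssthresh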
Putting it all together (TCP Reno)
Homework 5.13 5.16 5.28 5.34 5.39 Due 4/16
Bonus Question Can a TCP receiver fool a TCP sender into increasing its congestion window super-linearly? Ponder: we studied that TCP increases the window size linearly during congestion avoidance. Is the rate of increase of the congestion window constant? If not, what could be the reasons?