EE 122: Transport Protocols Kevin Lai October 16, 2002
Motivation IP provides a weak, but efficient service model (best-effort) -packets can be delayed, dropped, reordered, duplicated -packets have limited size (why?) IP packets are addressed to a host -how to decide which application gets which packets? How should hosts send into the network? -every sends as fast as they can drop many packets, network is under-utilized (congestion collapse)
Transport Protocol Provides more than the underlying network protocol -more reliability, in order delivery, at most once delivery -supports messages of arbitrary length -provide a way to decide which packets go to which applications (multiplexing/demultiplexing) -govern how hosts should send data to prevent congestion collapse (congestion control and avoidance) Link Layer Physical Layer Networking Layer Transport Layer TCP/UDP IP
UDP User Datagram Protocol minimalistic transport protocol same best-effort service model as IP messages can be larger than one packet, but still limited (64KB) -uses fragmentation provides multiplexing/demultiplexing to IP does not provide congestion control advantage over TCP: does not increase end-to- end delay over IP application example: video/audio streaming
TCP Transmission Control Protocol reliable, in-order, and at most once delivery messages can be of arbitrary length provides multiplexing/demultiplexing to IP provides congestion control and avoidance increases end-to-end delay over IP e.g., file transfer, chat
Headers IP header used for IP routing, fragmentation, error detection… UDP header used for multiplexing/demultiplexing, error detection TCP header used for multiplexing/demultiplexing, flow and congestion control IP TCP UDP dataTCP/UDP dataTCP/UDPIP Application Sender data IP TCP UDP Application Receiver dataTCP/UDP dataTCP/UDPIP data
IP Header Comments -HLen – header length only in 32-bit words (5 <= HLen <= 15) -TOS (Type of Service): Differentiated Service (6 bits) Explicit Congestion Notification (ECN) (2 bits) -Length – the length of the entire datagram/segment; header + data -Flags: Don’t Fragment (DF) and More Fragments (MF) -Protocol: identifies the transport protocol -Header checksum - uses 1’s complement VersionHLen TOSLength Identification Fragment offset Flags Source address Destination address TTLProtocolHeader checksum Options (variable) 20 bytes Payload
Fragmentation What happens if router has to forward an IP packet that is larger than allowed by a data link layer? Break the IP packet into smaller IP packets and provide a way to reassemble -set “more fragments” bit in all fragments but last -set the fragment offset of fragment to be offset (in 8- byte offsets) from beginning of original packet -set the packet len to be length of this fragment
Fragmentation Issues Sending host had better be changing the IP ID Loose one fragment, loose them all Reassembly is complex -requires per packet state Only reassemble at destination Fragmentation can be avoided using Path Maximum Transmission Unit Discovery (PMTU) -most TCP implementations use PMTU
UDP Header Source and destination ports use port address space UDP length is UDP packet length (including UDP header and payload, but not IP header) Optional UDP checksum is over UDP packet -why have UDP checksum in addition to IP checksum? -why not have just the UDP checksum? -why is the UDP checksum optional? Source port Destination port UDP lengthUDP checksum Payload (variable)
Port Addressing Need to decide which application gets which packets Solution: map each socket to a port Client must know server’s port separate 16-bit port address space for UDP and TCP -(src IP, src port, dst IP, dst port) uniquely identifies TCP connection Well known ports(0-1023): everyone agrees which services run on these ports -e.g., ssh:22, -on UNIX, must be root to gain access to these ports (why?) ephemeral ports(most ): given to clients -e.g. chatclient gets one of these
TCP Header Sequence number, acknowledgement, and advertised window – used by sliding-window based flow control Flags: -SYN, FIN – establishing/terminating a TCP connection -ACK – set when Acknowledgement field is valid -URG – urgent data; Urgent Pointer says where non-urgent data starts -PUSH – don’t wait to fill segment -RESET – abort connection Source port Destination port Options (variable) Sequence number Acknowledgement Advertised window ChecksumUrgent pointer FlagsHdrLen Payload (variable)
TCP Challenges how to provide reliable, in-order, and at most once delivery? (sliding window) need to synchronize sender and receiver (connection establishment) -e.g., exchange initial sequence numbers prevent sender from sending too fast for receiver (flow control) estimate RTT for flow control and timeouts how to initially decide on sending rate (slow start) estimate how much bandwidth is available in network (congestion avoidance) slow down sending rate when we were sending too fast (congestion control)
Connection Establishment: How it works Three-way handshake -Goal: agree on a set of parameters: the start sequence number for each side -Starting sequence numbers are random. Client (initiator) Server SYN, SeqNum = x SYN and ACK, SeqNum = y and Ack = x + 1 ACK, Ack = y + 1 Active Open Passive Open connect()listen() accept() allocate buffer space
Three-way Handshake: Rationale Three-way handshare adds 1 RTT delay Why not just start sending data immediately? -congestion control network could be congested SYN = 40 bytes, Data < 1500 bytes packets which are dropped at a link waste the bandwidth of all previous links smaller packets waste less bandwidth SYN acts as cheap probe of network conditions
More Rationale -protection from denial of service (1) attacker could use one host to fake many SRC IP address (spoofing) and send many SYNs to server server must devote resources (e.g., buffer space) for open connections server would run out of resources and become very slow or crash 3-way handshake requires client to reply before server allocates significant resources -protection from denial of service (2) client and server begin connection using well- known sequence number instead of random one attacker guesses sequence number, inserts bogus packets into stream
Even More Rationale -protection from delayed packets client connects to server twice in succession using the same port a packet from the first connection is delayed and arrives during the second connection if sequence numbers are close, old packet could be accepted
1/18/ Flow control: Window Size and Throughput Sliding-window based flow control: -Higher window higher throughput Throughput = wnd/RTT Remember: window size control throughput How to determine effective window size? How to detect packet loss? wnd = 3 segment 1 segment 2 segment 3 ACK 1 segment 4 ACK 2 segment 5segment 6 ACK 3 RTT (Round Trip Time)
Effective Window Size Receiver window (MaxRcvBuf – maximum buffer size at receiver) Sender window (MaxSendBuf – maximum buffer size at sender) LastByteAcked LastByteSent LastByteWritten Sending Application NextByteExpectedLastByteRcvd LastByteRead Receiving Application sequence number increases AdvertisedWindow = MaxRcvBuffer – (LastByteRcvd – LastByteRead) EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked) MaxSendBuffer >= LastByteWritten - LastByteAcked MaxRcvBuffer MaxSendBuffer
Advertised Window = 0 Sender cannot send any data receiver will not send acks receiver cannot notify sender that advertised window has grown Solution: TCP Persist Timer -when sender gets advertised window == 0, it sets timer -if sender receives advertised window > 0, cancels timer -when timer expires, sender sends 1 byte payload to receiver receiver must accept data 1 byte past window -receiver sends ack for byte before 1 byte -sender gets new advertised window
Silly Window Syndrome (SWS) Maximum Segment Size (MSS) = w App sends of small segments and/or receiver advertises small window -causes small packets to be sent in network -small packets have high header overhead advWin = w size=1 app: read 1 advWin=w size=w-1 app: send 1 app: send w+1 size=1 advWin=w app: read w-1
SWS Solution Sender only sends if -no unacknowledged data, (Nagle’s algorithm) or -full packet to send Receiver only sends new advertised window if -newAdvWin – oldAdvWin > min(MSS, 0.5*maxRcvBuf)
Set timeout If haven’t received ack by timeout, retransmit packet after last acked packet How to set timeout? -Too long: connection has low throughput -Too short: retransmit packet that was just delayed packet was probably delayed because of congestion sending another packet too soon just makes congestion worse Solution: make timeout proportional to RTT
RTT Estimation Use exponential averaging: EstimatedRTT Time SampleRTT
Problem How to differentiate between the real ACK, and ACK of the retransmitted packet ACK Retransmission Original Transmission SampleRTT SenderReceiver ACK Retransmission Original Transmission SampleRTT SenderReceiver
Karn/Partridge Algorithm Measure SampleRTT only for original transmissions Exponential backoff for each retransmission, double EstimatedRTT
Jacobson/Karels Algorithm Problem: exponential average is not enough -one solution: use standard deviation (requires expensive square root computation) -use mean deviation instead
Summary IP -routing, fragmentation UDP -Multiplexing/demultiplexing using ports -error detection TCP -reliable, in order, at most once delivery -Connection establishment three way handshake -RTT exponential averaging and variance -Flow control based on sliding window protocol -Congestion control next lecture