The Transport Layer: TCP and UDP Chap 2
Basic Philosophy of TCP/IP
  Simple core, complex edge
    The edge can be hosts, edge routers, network boundaries, etc.
  Why?
    Scalability, and flexibility to handle different levels of complexity at the edge
Internet Architecture
  A mesh of separate networks connected at exchange points
  Tier 1 providers carry full Internet routing tables, with no default route
  Tier 2+ providers carry a subset of routes and point a default route at their upstream provider
Overview of TCP/IP Protocols
IPv4, IPv6 Header Format IPv4 Header Format IPv6 Header Format
Extension Headers
  Extension header order:
    1. IPv6 header
    2. Hop-by-Hop Options header
    3. Destination Options header (processed by the node named in the IPv6 Destination Address field and by the nodes listed in the Routing header)
    4. Routing header
    5. Fragment header
    6. Authentication header
    7. Encapsulating Security Payload header
    8. Destination Options header (processed only by the final destination of the packet)
    9. Upper-layer header
Text Representation of Address
  X:X:X:X:X:X:X:X (each X is a 16-bit hexadecimal value)
    ex) FEDC:BA98:7654:3210:FEDC:BA98:7654:3210
  A special syntax ("::") compresses a run of zero groups, which makes addresses containing many zero bits easier to write. It may appear only once in an address.
    ex) 1080:0:0:0:8:800:200C:417A -> 1080::8:800:200C:417A (a unicast address)
        FF01:0:0:0:0:0:0:101 -> FF01::101 (a multicast address)
        0:0:0:0:0:0:0:1 -> ::1 (the loopback address)
        0:0:0:0:0:0:0:0 -> :: (the unspecified address)
  X:X:X:X:X:X:d.d.d.d (each d is the decimal value of one IPv4 address byte), for a mixed environment of IPv4 and IPv6
    ex) 0:0:0:0:0:0:d.d.d.d
        0:0:0:0:0:FFFF:d.d.d.d
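The zero-compression rules above can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the variable names are illustrative only, and the module prints hexadecimal digits in lowercase, so the output differs in case from the slide's examples.

```python
import ipaddress

# Full-form addresses taken from the examples above.
full_forms = [
    "1080:0:0:0:8:800:200C:417A",
    "FF01:0:0:0:0:0:0:101",
    "0:0:0:0:0:0:0:1",
    "0:0:0:0:0:0:0:0",
]

# str() of an IPv6Address yields the compressed ("::") text form.
compressed = [str(ipaddress.IPv6Address(a)) for a in full_forms]

# The address classes named on the slide are exposed as properties.
multicast = ipaddress.IPv6Address(full_forms[1]).is_multicast
loopback = ipaddress.IPv6Address(full_forms[2]).is_loopback
unspecified = ipaddress.IPv6Address(full_forms[3]).is_unspecified

for before, after in zip(full_forms, compressed):
    print(f"{before} -> {after}")
```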
Buffer Size and Limitation
  Max. size of an IP datagram
    IPv4: 65535 bytes, including the header (20 bytes)
    IPv6: 65535 bytes (payload) + 40 bytes (header)
  MTU (Max. Transmission Unit)
    The largest payload a network link can carry (1500 bytes on Ethernet)
    Path MTU: the smallest MTU on the path between two hosts
  Fragmentation is performed if datagram size > link MTU
    In IPv4: by host or router; in IPv6: only by the sending host
    The DF bit in the IPv4 header may be used for path MTU discovery
  Min. reassembly buffer size (guaranteed by any implementation)
    576 bytes in IPv4, 1500 bytes in IPv6
  MSS (Max. Segment Size): the max. TCP payload size that avoids IP fragmentation
    On Ethernet, MSS = MTU (1500) - IP header (20 or 40) - TCP header (20) = 1460 bytes (IPv4) or 1440 bytes (IPv6)
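The MSS arithmetic and the fragmentation rule above can be written out as a short sketch. The function names are hypothetical, and the header sizes assume no IPv4 options, no IPv6 extension headers and no TCP options.

```python
TCP_HEADER = 20              # bytes, assuming no TCP options
IP_HEADER = {4: 20, 6: 40}   # bytes, assuming no IPv4 options / IPv6 extension headers


def mss_for_mtu(mtu: int, ip_version: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP datagram."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER


def path_mtu(link_mtus: list[int]) -> int:
    """The path MTU is the smallest link MTU along the path."""
    return min(link_mtus)


def needs_fragmentation(datagram_size: int, link_mtu: int) -> bool:
    """A datagram larger than the link MTU must be fragmented."""
    return datagram_size > link_mtu


print(mss_for_mtu(1500, 4), mss_for_mtu(1500, 6))
```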
UDP
  Connectionless datagram service
  No reliability
  No flow control, no congestion control
  Supports multicasting
  None of TCP's overhead
  UDP user datagram format
    Source Port (16 bits) | Destination Port (16 bits)
    Length (16 bits)      | UDP Checksum (16 bits)
    Data
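The datagram format above maps directly onto four 16-bit fields in network byte order. The sketch below packs and unpacks such a header with the standard `struct` module. The helper names are hypothetical, and the checksum is simply left at 0 rather than computed, which is an assumption made to keep the example short.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes,
                       checksum: int = 0) -> bytes:
    """Prefix the payload with an 8-byte UDP header (checksum not computed)."""
    length = UDP_HEADER.size + len(payload)  # length covers header + data
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes):
    """Split a datagram back into its header fields and data."""
    src, dst, length, checksum = UDP_HEADER.unpack(datagram[:UDP_HEADER.size])
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_udp_datagram(5000, 53, b"hello")
print(parse_udp_datagram(datagram))
```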
TCP Overview
  Connection-oriented
  Byte-stream
    The sending process writes some number of bytes
    TCP breaks them into segments and sends them via IP
    The receiving process reads some number of bytes
  Full duplex
  Flow control: keeps the sender from overrunning the receiver
  Congestion control: keeps the sender from overrunning the network
TCP Fundamental Objectives
  Deliver data in sequence to the receiver
  Fill the pipe from sender to receiver
    Avoid congestion at the receiver and in the network
  ACK-driven
    The sending rate is clocked to the arrival of ACKs
  Impacted by
    Large (bandwidth * delay) networks: need to fill the pipe
    Packet loss: recover and keep the pipe full
TCP Header Format
TCP Connection Setup and Teardown Establishment: Three-Way Handshake Termination
Sliding Window
  Each byte has a sequence number
  ACKs are cumulative
  Sending side
    LastByteAcked ≤ LastByteSent ≤ LastByteWritten
    Bytes between LastByteAcked and LastByteWritten must be buffered
  Receiving side
    NextByteRead < NextByteExpected ≤ LastByteRcvd + 1
    Bytes between NextByteRead and LastByteRcvd must be buffered
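The pointer relationships on this slide can be expressed as invariants on two small records. This is a sketch only: the class and method names are hypothetical, and the buffered-byte counts are taken as a plain difference of the two pointers, which is an assumption about the off-by-one convention.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    last_byte_acked: int
    last_byte_sent: int
    last_byte_written: int

    def is_consistent(self) -> bool:
        # Acked data cannot run ahead of sent data, nor sent ahead of written.
        return self.last_byte_acked <= self.last_byte_sent <= self.last_byte_written

    def bytes_buffered(self) -> int:
        # Everything written but not yet acknowledged stays in the send buffer.
        return self.last_byte_written - self.last_byte_acked


@dataclass
class ReceiveWindow:
    next_byte_read: int
    next_byte_expected: int
    last_byte_rcvd: int

    def is_consistent(self) -> bool:
        # The application cannot read past the in-order point, and the
        # in-order point is at most one past the highest byte received.
        return self.next_byte_read < self.next_byte_expected <= self.last_byte_rcvd + 1

    def bytes_buffered(self) -> int:
        # Received but unread bytes (assumed counting convention, see lead-in).
        return self.last_byte_rcvd - self.next_byte_read


sender = SendWindow(last_byte_acked=100, last_byte_sent=300, last_byte_written=500)
receiver = ReceiveWindow(next_byte_read=50, next_byte_expected=120, last_byte_rcvd=200)
print(sender.bytes_buffered(), receiver.bytes_buffered())
```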
Keeping the Pipe Full
  Wrap around: 32-bit SequenceNum
  Bytes in transit: 16-bit AdvertisedWindow (< 64KB)

  Bandwidth            Time Until Wrap Around    Delay x Bandwidth (RTT = 100ms)
  T1 (1.5Mbps)         6.4 hours                 18KB
  Ethernet (10Mbps)    57 minutes                122KB
  T3 (45Mbps)          13 minutes                549KB
  FDDI (100Mbps)       6 minutes                 1.2MB
  STS-3 (155Mbps)      4 minutes                 1.8MB
  STS-12 (622Mbps)     55 seconds                7.4MB
  STS-24 (1.2Gbps)     28 seconds                14.8MB
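Both columns of the table follow from simple arithmetic: the 32-bit sequence space holds 2^32 bytes, and the delay x bandwidth product is the number of bytes in flight during one RTT. The sketch below reproduces the first two rows. The function names are hypothetical, and 1KB is taken as 1024 bytes, which appears to be what the table's figures assume.

```python
SEQ_SPACE_BYTES = 2 ** 32  # a 32-bit SequenceNum numbers this many bytes


def wrap_around_seconds(bandwidth_bps: float) -> float:
    """Time to send 2^32 bytes at the given bandwidth (bits per second)."""
    return SEQ_SPACE_BYTES * 8 / bandwidth_bps


def delay_bandwidth_bytes(bandwidth_bps: float, rtt_seconds: float = 0.1) -> float:
    """Bytes needed in flight to keep the pipe full for one RTT."""
    return bandwidth_bps * rtt_seconds / 8


t1_hours = wrap_around_seconds(1.5e6) / 3600
ethernet_minutes = wrap_around_seconds(10e6) / 60
t1_kb = delay_bandwidth_bytes(1.5e6) / 1024
ethernet_kb = delay_bandwidth_bytes(10e6) / 1024
print(round(t1_hours, 1), int(ethernet_minutes), round(t1_kb), round(ethernet_kb))
```

Any product above 64KB exceeds what the 16-bit AdvertisedWindow can express, which is why every row except T1 cannot be filled without a larger window.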
RTO Estimation for Adaptive Retransmission
  Jacobson/Karels Algorithm
  New calculation for average RTT
    Diff = SampleRTT - EstimatedRTT
    EstimatedRTT = EstimatedRTT + (δ x Diff)
    Deviation = Deviation + δ x (|Diff| - Deviation)
    where δ is a fraction between 0 and 1 (1/8)
  Consider variance when setting the timeout value
    RTO = μ x EstimatedRTT + φ x Deviation
    where μ = 1 and φ = 4
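The update rules on this slide translate almost line for line into code. The sketch below uses a gain of 1/8 and weights 1 and 4 as stated. The class name is hypothetical, and seeding the estimate with the first sample and a zero deviation is an assumption, since the slide does not say how the estimator is initialized.

```python
class RtoEstimator:
    """Jacobson/Karels-style RTO estimator, following the slide's formulas."""

    def __init__(self, first_sample: float, gain: float = 1 / 8,
                 rtt_weight: float = 1.0, dev_weight: float = 4.0):
        # Assumption: start from the first sample with zero deviation.
        self.estimated_rtt = first_sample
        self.deviation = 0.0
        self.gain = gain
        self.rtt_weight = rtt_weight
        self.dev_weight = dev_weight

    def update(self, sample_rtt: float) -> float:
        """Fold one RTT sample into the estimate and return the new RTO."""
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.gain * diff
        self.deviation += self.gain * (abs(diff) - self.deviation)
        return self.rto()

    def rto(self) -> float:
        return self.rtt_weight * self.estimated_rtt + self.dev_weight * self.deviation


estimator = RtoEstimator(first_sample=0.100)
rto_after_spike = estimator.update(0.180)  # one sample 80 ms above the estimate
print(round(rto_after_spike, 3))
```

With these numbers the estimate moves to 0.110 s and the deviation to 0.010 s, so the RTO becomes 0.150 s: the deviation term widens the timeout much faster than the average alone would.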
TCP Extensions
  Implemented as header options
  Store a timestamp in outgoing segments
  Use the 32-bit timestamp to extend the sequence space (PAWS)
  Shift (scale) the advertised window
TCP Congestion Control
  Congestion control prevents a sender from overrunning the capacity of the network (e.g. links and routers)
  TCP adapts the sender's rate to the network capacity and attempts to avoid potential congestion
  The basic congestion control mechanisms that TCP supports are:
    Slow start
    Congestion avoidance
    Fast retransmit
    Fast recovery
TCP Slow-Start
  Old TCP would "blast" a full advertised window's worth of segments into the network at connection startup, overrunning buffers in routers, links and hosts
  Slow-Start avoids this problem by sending only a few packets at the beginning, waiting for their ACKs, and then gradually increasing the number of packets sent into the network
  Slow-Start is invoked
    at connection setup (initial window)
    at connection restart after a long idle period (restart window)
    at connection restart after a retransmit timeout (loss window)
TCP Congestion Control
  TCP probes for congestion by sending more packets into the network until a timeout occurs or duplicate ACKs are received
  If congestion occurs, the TCP sender(s) must reduce the amount of data sent into the network
  Congestion avoidance operation:
    Define a new state variable at the sender, the Slow-Start Threshold (SSTHRESH)
    When TCP detects congestion (timeout or duplicate ACKs), set SSTHRESH = one-half of the current window size, and set CWND = 1 segment if a timeout occurred
    TCP then Slow-Starts (exponential increase) up to SSTHRESH, and after that increases the window by at most one segment per round-trip time, i.e. by MSS*MSS/CWND for each ACK received. This is a linear increase
  TCP Slow-Start and congestion avoidance are implemented together
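The combined slow-start and congestion-avoidance rule can be sketched as a per-ACK window update. The function names are hypothetical, the window is counted in segments (MSS = 1) for readability, and the simulation assumes one ACK per segment and no loss, which real TCP does not guarantee.

```python
def on_ack(cwnd: float, ssthresh: float, mss: float = 1.0) -> float:
    """Grow the congestion window in response to one new ACK."""
    if cwnd < ssthresh:
        return cwnd + mss              # slow start: window doubles every RTT
    return cwnd + mss * mss / cwnd     # congestion avoidance: about +1 MSS per RTT


def on_timeout(cwnd: float, mss: float = 1.0) -> tuple[float, float]:
    """Return (new cwnd, new ssthresh) after a retransmit timeout."""
    return mss, cwnd / 2


def simulate(rtts: int, ssthresh: float, mss: float = 1.0) -> list[float]:
    """Window size at the start of each RTT, assuming one ACK per segment."""
    cwnd = mss
    history = [cwnd]
    for _ in range(rtts):
        for _ in range(int(cwnd // mss)):  # one ACK per segment in flight
            cwnd = on_ack(cwnd, ssthresh, mss)
        history.append(cwnd)
    return history


history = simulate(rtts=4, ssthresh=8.0)
print([round(w, 2) for w in history])
```

The window doubles each round trip until it reaches the threshold, then grows by slightly less than one segment per round trip.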
TCP Slow-Start/Congestion Avoidance
TCP Fast Retransmit and Fast Recovery
  If the TCP receiver receives a segment out of order, it resends the ACK for the last segment received in order (a duplicate ACK)
  Fast retransmit operation
    If the TCP sender receives three duplicate ACKs in a row, this is a strong indication that a segment was lost
    The TCP sender retransmits the lost segment
    This avoids having to wait for a timeout to resend the lost segment
  Fast recovery operation
    The fact that the TCP receiver is generating duplicate ACKs means that other segments have been received, which suggests that data is still flowing between the TCP sender and receiver
    The TCP sender is allowed to send one segment per duplicate ACK, even if this exceeds the current window size
    After fast retransmit, TCP performs congestion avoidance instead of slow-start
    This avoids the throughput reduction associated with an initial slow-start
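The fast retransmit trigger amounts to counting consecutive ACKs that carry the same acknowledgment number. The sketch below shows only that detection step. The class name is hypothetical, and the window adjustments of fast recovery are deliberately left out.

```python
DUP_ACK_THRESHOLD = 3  # three duplicate ACKs in a row signal a likely loss


class DupAckDetector:
    """Counts duplicate ACKs and flags when a fast retransmit is due."""

    def __init__(self) -> None:
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no: int) -> bool:
        """Return True exactly when the third duplicate ACK arrives."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            return self.dup_count == DUP_ACK_THRESHOLD
        # A new ACK number means progress: reset the duplicate counter.
        self.last_ack = ack_no
        self.dup_count = 0
        return False


detector = DupAckDetector()
acks = [1000, 2000, 2000, 2000, 2000, 2000, 3000]
triggers = [detector.on_ack(a) for a in acks]
print(triggers)
```

The first ACK for 2000 is new data being acknowledged. The next three are duplicates, and the retransmission is triggered on the third of them, without waiting for the RTO.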
TCP Performance Objectives
  Fill the pipe with as many outstanding segments as possible before receiving an ACK
    Without window scaling, Delay (or RTT) x Bandwidth (or Throughput) <= 64KB
  Minimize time in slow-start, or avoid it altogether
  Recover from packet loss(es) and maintain the ACK clock without experiencing an RTO
TCP State Transition Diagram
TIME_WAIT State
  MSL: maximum segment lifetime
    30 sec in BSD-derived implementations
    2 min in RFC 1122
  Reasons for the TIME_WAIT state (waiting for 2MSL)
    To implement TCP's full-duplex connection termination reliably
      Handles packets that are lost or duplicated during termination
    To allow old duplicate segments to expire in the network
      Keeps packets from the previous session out of the next TCP connection to the same port on the same host
Port Number
Association
  Association = {protocol, local address, local port, foreign address, foreign port}
  Socket = {address, port}
    Notation: address:port
  A socket pair in TCP uniquely identifies every TCP connection in the Internet
    (local IP address, local TCP port, foreign IP address, foreign TCP port)
  Wildcard local address (e.g. *:21)
    Any choice of local address (INADDR_ANY)
TCP Port Numbers and Concurrent Servers (1)
  [Figure: local and foreign socket pairs, with listening sockets on local port 21]
TCP Port Numbers and Concurrent Servers (2)
  [Figure: local and foreign socket pairs]
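How a socket pair lets a concurrent server keep a listening socket and several connected sockets on the same port can be sketched as a table lookup that prefers the most specific match. This illustrates the idea rather than how any particular kernel does it. The function name, the socket labels and the addresses (taken from documentation-only ranges) are all hypothetical.

```python
WILDCARD = "*"


def match_socket(sockets: dict, local_ip: str, local_port: int,
                 foreign_ip: str, foreign_port: int):
    """Pick the socket for an arriving segment, most specific entry first."""
    candidates = [
        (local_ip, local_port, foreign_ip, foreign_port),  # connected socket
        (local_ip, local_port, WILDCARD, WILDCARD),        # listener bound to one address
        (WILDCARD, local_port, WILDCARD, WILDCARD),        # listener on the wildcard address
    ]
    for key in candidates:
        if key in sockets:
            return sockets[key]
    return None


# Keys are (local IP, local port, foreign IP, foreign port).
sockets = {
    (WILDCARD, 21, WILDCARD, WILDCARD): "listening socket",
    ("192.0.2.10", 21, "198.51.100.7", 1500): "child 1",
    ("192.0.2.10", 21, "198.51.100.7", 1501): "child 2",
}

print(match_socket(sockets, "192.0.2.10", 21, "198.51.100.7", 1500))
print(match_socket(sockets, "192.0.2.10", 21, "198.51.100.7", 1501))
print(match_socket(sockets, "192.0.2.10", 21, "203.0.113.9", 4000))
```

Two connections from the same client differ only in the foreign port, yet each reaches its own child, while a segment from an unknown peer falls through to the listening socket.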
TCP Output
  A successful return from write means that we can reuse the application buffer
  TCP must keep a copy of our data in the socket send buffer until the ACK is received
UDP Output
  A successful return from write means that the datagram, or the fragments of the datagram, have been added to the datalink output queue
Standard Internet Services
  See /etc/services
  Services run on TCP or UDP
    Distinguished by protocol and port number
  A service name is a symbolic representation of a port number
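The mapping from a service name and protocol to a port number can be illustrated by parsing text in the /etc/services format. The sketch below works on a small inline sample rather than the real file, so the result does not depend on the host. The parser name is hypothetical and the sample lists only a few well-known assignments.

```python
SAMPLE_SERVICES = """\
# service-name  port/protocol  [aliases]
ftp             21/tcp
domain          53/tcp
domain          53/udp
http            80/tcp          www   # alias after the port/protocol field
"""


def parse_services(text: str) -> dict:
    """Map (service name or alias, protocol) to a port number."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        for service in [name, *aliases]:
            table[(service, proto)] = int(port)
    return table


services = parse_services(SAMPLE_SERVICES)
print(services[("ftp", "tcp")], services[("domain", "udp")], services[("www", "tcp")])
```

On a real system, the standard library's `socket.getservbyname("ftp", "tcp")` performs the same lookup against the host's services database.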
Protocol Usage by Common Internet Applications