Prentice Hall, High Performance TCP/IP Networking, Hassan-Jain
Chapter 13: TCP Implementation
Objectives
- Understand the structure of a typical TCP implementation
- Outline the implementation of extended standards for TCP over high-performance networks
- Understand the sources of end-system overhead in typical TCP implementations, and techniques to minimize them
- Quantify the effect of end-system overhead and buffering on TCP performance
- Understand the role of Remote Direct Memory Access (RDMA) extensions for high-performance IP networking
Contents
- Overview of TCP implementation
- High-performance TCP
- End-system overhead
- Copy avoidance
- TCP offload
Implementation Overview
Overall Structure (RFC 793)
- The internal structure of TCP is specified in RFC 793 (Fig. 13.1)
Data Structure of a TCP Endpoint
- Transmission control block (TCB): stores the connection state and related variables
- Transmit queue: buffers holding outstanding data
- Receive queue: buffers for data received but not yet delivered to the higher layer
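The state held by a TCB can be sketched as follows. This is a minimal illustration, not any particular kernel's layout: the class name and queue representation are ours, while the sequence-variable names (SND.UNA, SND.NXT, etc.) follow RFC 793.

```python
from dataclasses import dataclass, field

@dataclass
class TransmissionControlBlock:
    """Per-connection state, after the send/receive variables of RFC 793."""
    state: str = "CLOSED"   # connection state (LISTEN, SYN-SENT, ESTABLISHED, ...)
    snd_una: int = 0        # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int = 0        # SND.NXT: next sequence number to send
    snd_wnd: int = 0        # SND.WND: send window (peer's advertised window)
    rcv_nxt: int = 0        # RCV.NXT: next sequence number expected
    rcv_wnd: int = 0        # RCV.WND: receive window we advertise
    transmit_queue: bytearray = field(default_factory=bytearray)  # outstanding data
    receive_queue: bytearray = field(default_factory=bytearray)   # received, undelivered

tcb = TransmissionControlBlock(state="ESTABLISHED", snd_una=100, snd_nxt=150)
print(tcb.snd_nxt - tcb.snd_una)  # bytes currently in flight
```

The difference SND.NXT - SND.UNA is exactly the amount of data the transmit queue must retain for possible retransmission.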
Buffering and Data Movement
- Buffer queues reside in the protocol-independent socket layer within the operating system kernel
- The TCP sender upcalls to the transmit queue to obtain data; the TCP receiver notifies the receive queue of the correct arrival of incoming data
- BSD-derived kernels implement buffers as mbufs, which move data by reference and reduce the need to copy
- Most implementations commit buffer space to the queues lazily: the queues consume memory only when the bandwidth of the network does not match the rate at which the TCP user produces or consumes data
User Memory Access
- Provides for movement of data to and from the memory of the TCP user
- Copy semantics: SEND and RECEIVE are defined with copy semantics, so the user may modify a send buffer as soon as the SEND is issued
- Direct access: allows TCP to access the user buffers directly, bypassing data copies
TCP Data Exchange
- TCP endpoints cooperate by exchanging segments
- Each segment carries a sequence number (seg.seq), segment data length (seg.len), status bits, an acknowledgement sequence number (seg.ack), and an advertised receive window size (seg.wnd) (Fig. 13.3)
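These fields map directly onto the fixed TCP header of RFC 793. A small sketch (the function name is ours) extracting them from raw bytes:

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Extract the segment fields named in the text from a raw TCP header.

    Fixed-header layout per RFC 793: source/dest port (2+2 bytes),
    sequence number (4), ack number (4), data offset + flags (2),
    window (2), checksum (2), urgent pointer (2).
    """
    sport, dport, seq, ack, off_flags, wnd, cksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20])
    data_offset = (off_flags >> 12) * 4   # header length in bytes
    flags = off_flags & 0x3F              # status bits: URG, ACK, PSH, RST, SYN, FIN
    seg_len = len(segment) - data_offset  # seg.len: bytes of payload
    return {"seq": seq, "ack": ack, "flags": flags, "wnd": wnd, "len": seg_len}

# A hand-built header: seq=1000, ack=2000, ACK flag set (0x10), window=65535
hdr = struct.pack("!HHIIHHHH", 1234, 80, 1000, 2000, (5 << 12) | 0x10, 65535, 0, 0)
print(parse_tcp_header(hdr + b"payload"))
```

Note that seg.len is not carried explicitly: the receiver derives it from the IP length and the data offset, as the subtraction above suggests.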
Data Retransmissions
- The TCP sender uses a retransmission timer to drive retransmission of unacknowledged data: a segment is retransmitted if the timer fires
- Retransmission timeout (RTO):
  - RTO < RTT: aggressive; too many spurious retransmissions
  - RTO > RTT: conservative; low utilisation because the connection sits idle
- In practice, an adaptive retransmission timer with exponential back-off is used (specified in RFC 2988)
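The adaptive timer of RFC 2988 can be sketched in a few lines. The constants (alpha = 1/8, beta = 1/4, K = 4, a 1-second floor) are from the RFC; the class structure and initial 3-second RTO before any sample are our illustrative choices.

```python
class RetransmitTimer:
    """Adaptive RTO estimator following RFC 2988."""
    K, ALPHA, BETA = 4, 1 / 8, 1 / 4
    MIN_RTO, MAX_RTO = 1.0, 60.0   # RFC 2988 recommends a 1 s floor

    def __init__(self):
        self.srtt = None    # smoothed RTT
        self.rttvar = None  # RTT variation
        self.rto = 3.0      # initial RTO before any RTT sample

    def sample(self, r: float) -> None:
        """Fold a new RTT measurement r (seconds) into SRTT/RTTVAR."""
        if self.srtt is None:            # first measurement
            self.srtt, self.rttvar = r, r / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = min(max(self.srtt + self.K * self.rttvar, self.MIN_RTO), self.MAX_RTO)

    def on_timeout(self) -> None:
        """Exponential back-off: double the RTO each time the timer fires."""
        self.rto = min(self.rto * 2, self.MAX_RTO)

t = RetransmitTimer()
t.sample(0.2)    # SRTT=0.2, RTTVAR=0.1, raw RTO=0.6, clamped to the 1 s floor
print(t.rto)     # -> 1.0
```

The back-off step is what keeps a sender from hammering a congested path: each successive timeout doubles the wait before the next retransmission.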
Congestion Control
- A retransmission event indicates to the TCP sender that the network is congested
- Congestion management is a function of the end-systems: RFC 2581 requires TCP end-systems to respond to congestion by reducing their sending rate
- AIMD (Additive Increase, Multiplicative Decrease):
  - The TCP sender additively probes for available bandwidth on the network path
  - Upon detecting congestion, the TCP sender multiplicatively reduces cwnd
  - AIMD achieves fairness among competing TCP connections
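The AIMD rule itself is compact enough to state as code. This is a deliberately simplified sketch of congestion avoidance in the spirit of RFC 2581 (one MSS of increase per round trip, halving on loss); it omits slow start, fast recovery, and the ssthresh bookkeeping a real stack carries.

```python
def aimd_step(cwnd: float, mss: int, loss: bool) -> float:
    """One round-trip of AIMD congestion avoidance (simplified).

    Additive increase: grow cwnd by roughly one MSS per RTT while the
    network absorbs the traffic.  Multiplicative decrease: halve cwnd
    on a loss signal, with a floor of 2*MSS.
    """
    if loss:
        return max(cwnd / 2, 2 * mss)   # multiplicative decrease
    return cwnd + mss                   # additive increase

cwnd = 10 * 1460.0
for loss in [False, False, True, False]:
    cwnd = aimd_step(cwnd, 1460, loss)
print(cwnd)
```

The sawtooth this produces, linear climb, sharp halving, is what lets competing AIMD flows converge toward equal shares of the bottleneck.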
High Performance TCP
TCP Implementation with High Bandwidth-Delay Product
- High bandwidth-delay products arise on high-speed networks (e.g. optical networks) and high-latency networks (e.g. satellite networks), collectively called Long Fat Networks (LFNs)
- LFNs require window sizes larger than the 16 bits originally defined for TCP
- The window scale option allows a TCP endpoint to advertise a large window (up to about 1 Gbyte); it is negotiated at connection setup and scales the 16-bit window field in units of up to 16K (a shift of at most 14 bits)
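Why LFNs outgrow a 16-bit window follows from simple arithmetic: to keep the pipe full, the window must cover the bandwidth-delay product. A sketch (function names are ours; the shift cap of 14 is from the window scale specification, RFC 1323):

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return int(bandwidth_bps / 8 * rtt_s)

def window_scale_needed(window_bytes: int) -> int:
    """Smallest window-scale shift so window_bytes fits in a scaled 16-bit
    window field; the shift count is capped at 14 per RFC 1323."""
    shift = 0
    while (65535 << shift) < window_bytes and shift < 14:
        shift += 1
    return shift

# A 1 Gbit/s path with 100 ms RTT is an LFN: the BDP is 12.5 Mbytes,
# nearly 200x the unscaled 64 Kbyte maximum window
bdp = bdp_bytes(1e9, 0.100)
print(bdp, window_scale_needed(bdp))
```

With a shift of 14, the maximum window is 65535 * 2^14, just under 1 Gbyte, which is the limit the slide quotes.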
Round-Trip Time Estimation
- Accuracy of RTT estimation depends on frequent sample measurements of RTT
- The percentage of segments sampled decreases with larger windows, which may be insufficient for LFNs
- The timestamp option enables the sender to compute an RTT sample from nearly every acknowledgement, and provides a safeguard against accepting segments with old (wrapped) sequence numbers
Path MTU Discovery
- TCP is most efficient when it uses the largest MSS that can traverse the path without segmentation
- Path MTU discovery enables the TCP sender to automatically discover the largest acceptable MSS
- A TCP implementation must correctly handle dynamic changes to the MSS: it never leaves more than 2*MSS bytes of data unacknowledged, and the sender may need to re-segment data for retransmission
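The re-segmentation case is worth making concrete: when discovery shrinks the MSS mid-connection, segments already queued for retransmission may now exceed it. A minimal sketch (the function name and byte-slicing representation are ours; a kernel would re-cut mbuf chains rather than Python byte strings):

```python
def resegment(unacked: bytes, new_mss: int) -> list:
    """Split a retransmission buffer into segments no larger than the new MSS.

    Called when path MTU discovery lowers the MSS after data was queued
    at the old, larger segment size.
    """
    return [unacked[i:i + new_mss] for i in range(0, len(unacked), new_mss)]

# 4380 bytes queued at MSS=1460 must be re-cut after the MSS drops to 536
segs = resegment(b"x" * 4380, 536)
print(len(segs), len(segs[0]), len(segs[-1]))
```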
End-System Overhead
Reduce End-System Overhead
- TCP imposes processing overhead in the operating system
- This overhead adds directly to latency and consumes a significant share of CPU cycles and memory
- Reducing the overhead can improve application throughput
Relationship Between Bandwidth and CPU Utilization
Achievable Throughput for Host-Limited Systems
Sources of Overhead for TCP/IP
- Per-transfer overhead
- Per-packet overhead
- Per-byte overhead
(Fig. 13.5)
Per-Packet Overhead
- Increasing packet size can mitigate the impact of per-packet and per-segment overhead
- Increasing the segment size S increases the achievable bandwidth: as the packet grows, the fixed per-packet overhead is amortized over more bytes and becomes less significant
- Interrupts are a significant source of per-packet overhead
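The amortization argument can be captured in a simple cost model: each segment costs a fixed per-packet overhead plus a cost per byte, so host-limited throughput is S divided by the total per-segment time. The specific costs below (20 us per packet, 1 ns per byte) are illustrative assumptions, not measured values.

```python
def achievable_throughput(seg_size: int, per_packet_s: float, per_byte_s: float) -> float:
    """Host-limited throughput (bytes/s) under a simple overhead model.

    Each segment of seg_size bytes costs per_packet_s seconds of fixed
    overhead plus per_byte_s seconds per byte.  As seg_size grows, the
    per-packet term is amortized and throughput approaches the per-byte
    limit of 1/per_byte_s.
    """
    return seg_size / (per_packet_s + per_byte_s * seg_size)

# Larger segments amortize the fixed cost: compare three common MSS values
for s in (536, 1460, 9000):
    print(s, round(achievable_throughput(s, 20e-6, 1e-9) * 8 / 1e6), "Mbit/s")
```

This is why jumbo frames (9000-byte MTU) matter on host-limited paths: the per-packet term, largely interrupt and protocol processing, dominates at small segment sizes.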
Relationship between Packet Size and Achievable Bandwidth
Relationship between Packet Overhead and Bandwidth
Checksum Overhead
- Checksumming is a source of per-byte overhead
- Ways to reduce checksum overhead:
  - Complete multiple per-byte steps in a single traversal of the data
  - Integrate checksumming with the data copy
  - Compute the checksum in hardware
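The per-byte cost in question is the Internet checksum of RFC 1071, a one's-complement sum over the data. A reference sketch (real stacks compute this in C or hardware, often folded into the copy loop exactly as the slide suggests):

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words.

    This full pass over the payload is the per-byte overhead the text
    describes; integrating it with the data copy lets a single traversal
    do both jobs.
    """
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Worked example from RFC 1071: bytes 00 01 f2 03 f4 f5 f6 f7
print(hex(internet_checksum(b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7")))  # -> 0x220d
```

A useful property for verification: summing the data together with its own checksum yields all ones, i.e. the function returns 0.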
Copy Avoidance
Copy Avoidance for High-Performance TCP
- Page remapping: uses virtual memory to reduce copying across the TCP/user interface; typically implemented at the socket layer in the OS kernel
- Scatter/gather I/O: does not require copy semantics, but entails a comprehensive restructuring of OS and I/O interfaces
- Remote Direct Memory Access (RDMA): steers incoming data directly into user-specified buffers; IETF standardization is under way
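Gather I/O is visible even at the portable sockets API: sendmsg() accepts several non-contiguous buffers in one call, so the kernel can assemble the outgoing data without the application first copying the pieces into a single staging buffer. A small POSIX-oriented sketch (buffer contents are illustrative; on a local socketpair the single recv below is expected to return the whole message):

```python
import socket

# Two logically separate buffers, e.g. a protocol header and a payload
header, payload = b"HDR:", b"payload-bytes"

a, b = socket.socketpair()
a.sendmsg([header, payload])   # gather: two buffers, one send, no user-level copy
data = b.recv(64)
print(data)
a.close()
b.close()
```

This is the same idea writev()/readv() express at the file-descriptor level; the deeper restructuring the slide mentions is needed to carry such buffer lists all the way through the kernel's protocol and driver layers.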
TCP Offload
TCP Offload
- Supports TCP/IP protocol processing functions directly on the network adapter (NIC)
- TCP checksum offloading significantly reduces per-packet overhead for TCP/IP protocol processing
- Offload helps to avoid expensive copy operations