TCP Offload Through Connection Handoff
Hyong-youb Kim and Scott Rixner
Rice University
April 20, 2006

Full TCP Offloading
- Move all TCP/IP processing to the network interface
  - Computation
    - Saves processing resources on the host
    - NIC can be customized for TCP/IP processing
  - Memory
    - Reduces host memory references
    - Network interface can exploit small, fast, local memory
- Problems
  - Network interface can become a performance bottleneck
    - Limited computation on NIC
    - Limited memory capacity on NIC
  - Complicates global resource management in the stack

Solution: Connection Handoff
- Hand off only established connections to the NIC
- Operating system controls division of work
  - Only TCP send and receive on the NIC
  - OS performs connection establishment, routing, …
- No changes to sockets API
- SPECweb99 performance
  - 17% and 32% reduction in cycles per packet
  - 15% and 27% improvement in throughput

Unmodified Network Stack
- ~3100 instructions per packet
- ~50% of all operations are memory references
[Figure: the host OS network stack (User Application, Socket, TCP, IP, Ethernet, Driver) and the NIC, with transmit and receive paths. Labels: user requests, protocol/socket operations, packet generation, receive processing, Ethernet frames.]

Network Stack with Connection Handoff
- Packet send: same as unmodified stack
- Packet receive: now goes through lookup
[Figure: host OS (User Application, Socket, TCP, IP, Ethernet, Driver, Bypass) and NIC (Socket, TCP, IP, Ethernet, Lookup), with transmit/receive paths. Labels: protocol/socket operations, packet generation, receive processing. Legend: connection in OS, connection on NIC.]
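
The receive-side lookup can be illustrated with a small sketch. The slide does not say how the lookup is keyed or implemented; the code below assumes the usual TCP 4-tuple as the key, and `FourTuple`, `NicLookup`, and `classify` are hypothetical names used only for illustration.

```python
from typing import Dict, NamedTuple


class FourTuple(NamedTuple):
    """Assumed lookup key: the standard TCP connection 4-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NicLookup:
    """Hypothetical table of connections currently handed off to the NIC."""

    def __init__(self) -> None:
        self._on_nic: Dict[FourTuple, int] = {}

    def add(self, key: FourTuple, conn_id: int) -> None:
        # Called when the OS hands a connection off to the NIC.
        self._on_nic[key] = conn_id

    def remove(self, key: FourTuple) -> None:
        # Called when the connection is restored to the OS or destroyed.
        self._on_nic.pop(key, None)

    def classify(self, key: FourTuple) -> str:
        # Hit: the packet belongs to a connection on the NIC.
        # Miss: the packet goes up to the host stack as in the unmodified case.
        return "nic" if key in self._on_nic else "host"
```

A packet whose key is not in the table takes the same path as in the unmodified stack, which is why connections that stay in the OS are largely unaffected.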

Handoff Interface
- Extend driver/OS API
- Move connections
  - Handoff (OS): move connection from OS to NIC
  - Restore (OS, NIC): move connection from NIC to OS
- Relay socket operations between OS and NIC
  - Send (OS): insert send data into NIC's socket
  - Acknowledge (NIC): remove ack'ed data from OS's socket
  - Receive (NIC): insert received data into OS's socket
  - Received (OS): remove received data from NIC's socket
  - Control (OS, NIC): change socket states, etc.
- Misc.
  - Forward (OS), Post (OS), Resource (NIC)
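
As a reading aid, the command list above can be written down as a table of who may issue each command. This is only a sketch of the slide's content, not the prototype's driver API: `Side`, `Command`, `HANDOFF_INTERFACE`, and `may_issue` are hypothetical names, and the purposes of Forward, Post, and Resource are left empty because the slide does not describe them.

```python
from enum import Enum
from typing import Dict, FrozenSet, NamedTuple


class Side(Enum):
    OS = "OS"
    NIC = "NIC"


class Command(NamedTuple):
    issuers: FrozenSet[Side]  # which side may issue the command
    purpose: str              # wording taken from the slide, "" if not given


def _cmd(purpose: str, *issuers: Side) -> Command:
    return Command(frozenset(issuers), purpose)


HANDOFF_INTERFACE: Dict[str, Command] = {
    # Move connections
    "handoff": _cmd("move connection from OS to NIC", Side.OS),
    "restore": _cmd("move connection from NIC to OS", Side.OS, Side.NIC),
    # Relay socket operations between OS and NIC
    "send": _cmd("insert send data into NIC's socket", Side.OS),
    "acknowledge": _cmd("remove ack'ed data from OS's socket", Side.NIC),
    "receive": _cmd("insert received data into OS's socket", Side.NIC),
    "received": _cmd("remove received data from NIC's socket", Side.OS),
    "control": _cmd("change socket states, etc.", Side.OS, Side.NIC),
    # Misc. (purposes not described on the slide)
    "forward": _cmd("", Side.OS),
    "post": _cmd("", Side.OS),
    "resource": _cmd("", Side.NIC),
}


def may_issue(name: str, side: Side) -> bool:
    """Return True if the given side is listed as an issuer of the command."""
    return side in HANDOFF_INTERFACE[name].issuers
```

For example, `may_issue("handoff", Side.NIC)` is false: only the OS decides to move a connection to the NIC, which matches the point that the operating system controls the division of work.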

Example Use
Scenario: accept connection, receive request, send response, close connection. Each step is split between the NIC and the host OS, linked by a handoff command.

Event               Handoff command   Resulting action
Accept              handoff           Allocate connection
Receive data        receive           Enqueue data
Read data           received          Dequeue data
Write data          send              Enqueue data
Receive ACK         acknowledge       Drop sent data
Receive FIN         control           Change socket state
Close               control           Send FIN
Destroy connection  control           Destroy connection
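
The sequence above can be traced with a toy model. This is a simplified sketch under stated assumptions, not the prototype's code: each side's socket is reduced to byte counters, the TCP state names are the standard ones rather than anything the slides specify, and `SocketState`, `HandedOffConnection`, and `run_example` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SocketState:
    recv_bytes: int = 0  # received data not yet read by the application
    send_bytes: int = 0  # written data not yet acknowledged by the peer
    state: str = "ESTABLISHED"


@dataclass
class HandedOffConnection:
    os: SocketState = field(default_factory=SocketState)
    nic: Optional[SocketState] = None
    log: List[str] = field(default_factory=list)  # handoff commands issued

    def accept(self) -> None:
        self.log.append("handoff")      # OS -> NIC
        self.nic = SocketState()        # NIC allocates the connection

    def nic_receives_data(self, n: int) -> None:
        self.nic.recv_bytes += n
        self.log.append("receive")      # NIC -> OS
        self.os.recv_bytes += n         # OS enqueues the data

    def app_reads(self, n: int) -> None:
        n = min(n, self.os.recv_bytes)
        self.os.recv_bytes -= n
        self.log.append("received")     # OS -> NIC
        self.nic.recv_bytes -= n        # NIC dequeues the data

    def app_writes(self, n: int) -> None:
        self.os.send_bytes += n
        self.log.append("send")         # OS -> NIC
        self.nic.send_bytes += n        # NIC enqueues the data

    def nic_receives_ack(self, n: int) -> None:
        n = min(n, self.nic.send_bytes)
        self.nic.send_bytes -= n
        self.log.append("acknowledge")  # NIC -> OS
        self.os.send_bytes -= n         # OS drops the sent data

    def nic_receives_fin(self) -> None:
        self.log.append("control")      # NIC -> OS
        self.os.state = self.nic.state = "CLOSE_WAIT"

    def app_closes(self) -> None:
        self.log.append("control")      # OS -> NIC, which then sends FIN
        self.os.state = self.nic.state = "LAST_ACK"

    def destroy(self) -> None:
        self.log.append("control")
        self.nic = None
        self.os.state = "CLOSED"


def run_example() -> HandedOffConnection:
    conn = HandedOffConnection()
    conn.accept()
    conn.nic_receives_data(300)   # request arrives
    conn.app_reads(300)
    conn.app_writes(5000)         # response
    conn.nic_receives_ack(5000)
    conn.nic_receives_fin()
    conn.app_closes()
    conn.destroy()
    return conn
```

Calling `run_example()` issues the eight commands in the same order as the table, and both sides' counters return to zero once the request has been read and the response acknowledged.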

Real Prototype
- Modified FreeBSD 4.7
- AMD Athlon XP CPU
- Alteon programmable Gigabit Ethernet NIC
  - 1 MB memory
    - Limited to 256 connections
    - Actual socket buffer data only in main memory
  - 88 MHz processor
    - Limits maximum throughput

TCP Send
[Figure: chart of cycles per packet and L2 misses per packet, broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series: cycles and L2 misses for No Handoff (1 connection), No Handoff (256 connections), and Handoff (256 connections).]

Simulated Machine
- Prototype NIC is too slow
- Simics full-system simulator
  - Boots unmodified operating systems
  - Use same software as real prototype
- Simulated processor
  - 1 GHz functional x86 processor
  - Timed memory to mimic Athlon XP
- Simulated NIC
  - 450 MHz functional processor
  - Timed 1 Gb/s Ethernet wire

SPECweb99, 1024 Connections
[Figure: chart of cycles per packet, broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series: No Handoff; Handoff (1024 connections); Static (No Handoff); Static (Handoff, 1024 connections).]
- 15% increase in HTTP throughput (Mb/s)
- 27% increase in HTTP throughput (Mb/s)

Summary
- Memory behavior limits TCP performance
  - Connection state accesses cause cache pressure
- Offload can help, but full offload is problematic
- Connection handoff: offloading made practical
  - OS in charge of division of work
  - Host network stack largely unaffected
- Ongoing work: OS handoff policies