TCP Offload Through Connection Handoff
Hyong-youb Kim and Scott Rixner, Rice University
April 20, 2006



Full TCP Offloading

Move all TCP/IP processing to the network interface
- Computation
  - Saves processing resources on the host
  - NIC can be customized for TCP/IP processing
- Memory
  - Reduces host memory references
  - Network interface can exploit small, fast, local memory

Problems
- Network interface can become a performance bottleneck
  - Limited computation on NIC
  - Limited memory capacity on NIC
- Complicates global resource management in the stack

Solution: Connection Handoff

Only hand off established connections to the NIC
- Operating system controls division of work
- Only TCP send and receive on the NIC
  - OS performs connection establishment, routing, …
- No changes to sockets API

SPECweb99 performance
- 17% and 32% reduction in cycles per packet
- 15% and 27% improved throughput

Unmodified Network Stack

- ~3100 instructions per packet
- ~50% of all operations are memory references

[Figure: the host OS stack (user application, socket, TCP, IP, Ethernet, driver) above the NIC (transmit, receive), which exchanges Ethernet frames with the network. Labeled steps: user requests, protocol/socket operations, packet generation, receive processing.]

Network Stack with Connection Handoff

- Packet send: same as unmodified stack
- Packet receive: now goes through lookup

[Figure: the host OS (user application, socket, TCP, IP, Ethernet, bypass, driver) and the NIC (socket, TCP, IP, Ethernet, lookup, transmit/receive). Labeled steps: protocol/socket operations, packet generation, receive processing. Legend: connection in OS, connection on NIC.]
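
The receive-side lookup can be sketched as a small connection table on the NIC. This is a minimal illustration, not the prototype's code: the slide only says that received packets go through a lookup, so keying the table on the TCP/IP 4-tuple is an assumption, and the names `FourTuple`, `NicConnectionTable`, and `classify` are hypothetical. The default capacity of 256 mirrors the connection limit reported for the real prototype.

```python
from typing import Any, Dict, NamedTuple, Optional


class FourTuple(NamedTuple):
    """Connection identifier (assumed lookup key; the slides do not specify one)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NicConnectionTable:
    """Hypothetical table of connections currently handed off to the NIC."""

    def __init__(self, capacity: int = 256) -> None:
        self.capacity = capacity
        self._conns: Dict[FourTuple, Any] = {}

    def handoff(self, key: FourTuple, state: Any) -> bool:
        """Accept a connection from the OS; refuse when the table is full."""
        if len(self._conns) >= self.capacity:
            return False
        self._conns[key] = state
        return True

    def restore(self, key: FourTuple) -> Optional[Any]:
        """Give a connection back to the OS, returning its state if present."""
        return self._conns.pop(key, None)

    def classify(self, key: FourTuple) -> str:
        """Decide where a received packet is processed."""
        # Handed-off connections are processed by the NIC's TCP; everything
        # else takes the bypass path to the host stack.
        return "nic_tcp" if key in self._conns else "bypass_to_host"
```

A packet whose key misses the table simply follows the bypass path, which is why the host stack can stay largely unchanged.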

Handoff Interface

Extend driver/OS API

Move connections
- Handoff (OS): move connection from OS to NIC
- Restore (OS, NIC): move connection from NIC to OS

Relay socket operations between OS and NIC
- Send (OS): insert send data into NIC's socket
- Acknowledge (NIC): remove ack'ed data from OS's socket
- Receive (NIC): insert received data into OS's socket
- Received (OS): remove received data from NIC's socket
- Control (OS, NIC): change socket states, etc.

Misc.
- Forward (OS), Post (OS), Resource (NIC)
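
The command set above can be written down as a table of commands and the side allowed to issue each one. This is only a sketch of the slide's list, assuming that the parenthesized "(OS)" and "(NIC)" name the issuing side; the names `Side`, `Command`, `INITIATORS`, and `may_issue` are hypothetical and do not come from the prototype.

```python
from enum import Enum, Flag, auto


class Side(Flag):
    """Which end of the driver/OS interface issues a command."""
    OS = auto()
    NIC = auto()


class Command(Enum):
    HANDOFF = "handoff"
    RESTORE = "restore"
    SEND = "send"
    ACKNOWLEDGE = "acknowledge"
    RECEIVE = "receive"
    RECEIVED = "received"
    CONTROL = "control"
    FORWARD = "forward"
    POST = "post"
    RESOURCE = "resource"


# Issuing side(s) for each command, copied from the slide.
INITIATORS = {
    Command.HANDOFF: Side.OS,
    Command.RESTORE: Side.OS | Side.NIC,
    Command.SEND: Side.OS,
    Command.ACKNOWLEDGE: Side.NIC,
    Command.RECEIVE: Side.NIC,
    Command.RECEIVED: Side.OS,
    Command.CONTROL: Side.OS | Side.NIC,
    Command.FORWARD: Side.OS,
    Command.POST: Side.OS,
    Command.RESOURCE: Side.NIC,
}


def may_issue(side: Side, command: Command) -> bool:
    """Return True if `side` is listed as an issuer of `command`."""
    return bool(INITIATORS[command] & side)
```

Only the OS can issue Handoff in this table, which reflects the point that the operating system controls the division of work.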

Example Use

Accept connection, receive request, send response, close connection. The slide shows two columns, NIC and Host OS, with a handoff command passing between them at each step:

Triggering event      Handoff command   Resulting action on the other side
Accept                handoff           Allocate connection
Receive data          receive           Enqueue data
Read data             received          Dequeue data
Write data            send              Enqueue data
Receive ACK           acknowledge       Drop sent data
Receive FIN           control           Change socket state
Close                 control           Send FIN
Destroy connection    control           Destroy connection
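
The data-carrying steps of this exchange can be sketched as a toy model of the four socket queues involved. It follows the command descriptions from the Handoff Interface slide (send, acknowledge, receive, received) and nothing more: the class `HandedOffConnection` and its method names are hypothetical, and holding the bytes on both sides is a simplification, since the prototype keeps the actual socket buffer data only in main memory.

```python
class HandedOffConnection:
    """Toy model of one handed-off connection's send and receive queues."""

    def __init__(self) -> None:
        self.os_send = bytearray()   # OS socket: data written, not yet ack'ed
        self.nic_send = bytearray()  # NIC socket: data to transmit
        self.os_recv = bytearray()   # OS socket: data not yet read by the app
        self.nic_recv = bytearray()  # NIC socket: data received from the wire
        self.commands = []           # handoff commands issued, in order

    def app_write(self, data: bytes) -> None:
        """Application writes: the OS issues `send` to the NIC."""
        self.os_send += data
        self.commands.append("send")
        self.nic_send += data

    def nic_gets_ack(self, nbytes: int) -> None:
        """A TCP ACK arrives: the NIC issues `acknowledge` to the OS."""
        del self.nic_send[:nbytes]
        self.commands.append("acknowledge")
        del self.os_send[:nbytes]    # the OS drops the sent data

    def nic_gets_data(self, data: bytes) -> None:
        """Payload arrives: the NIC issues `receive` to the OS."""
        self.nic_recv += data
        self.commands.append("receive")
        self.os_recv += data

    def app_read(self, nbytes: int) -> bytes:
        """Application reads: the OS issues `received` to the NIC."""
        data = bytes(self.os_recv[:nbytes])
        del self.os_recv[:nbytes]
        self.commands.append("received")
        del self.nic_recv[:len(data)]
        return data
```

Replaying the request/response part of the example (receive data, read, write, ACK) against this model produces the commands receive, received, send, acknowledge, in that order.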

Real Prototype

- Modified FreeBSD 4.7
- AMD Athlon XP CPU
- Alteon programmable Gigabit Ethernet NIC
  - 1 MB memory
    - Limited to 256 connections
    - Actual socket buffer data only in main memory
  - 88 MHz processor
    - Limits maximum throughput

TCP Send

[Figure: cycles per packet and L2 misses per packet for TCP send, broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series for both metrics: No Handoff, 1 connection; No Handoff, 256 connections; Handoff, 256 connections.]

Simulated Machine

Prototype NIC is too slow

Simics full-system simulator
- Boots unmodified operating systems
- Use same software as real prototype

Simulated processor
- 1 GHz functional x86 processor
- Timed memory to mimic Athlon XP

Simulated NIC
- 450 MHz functional processor
- Timed 1 Gb/s Ethernet wire

SPECweb99, 1024 Connections

[Figure: cycles per packet, broken down by System Call, TCP, IP, Ethernet, Driver, Bypass, and Total. Series: No Handoff; Handoff, 1024 connections; Static (No Handoff); Static (Handoff, 1024 connections).]

Chart annotations
- 15% increase in HTTP throughput (Mb/s)
- 27% increase in HTTP throughput (Mb/s)

Summary

- Memory behavior limits TCP performance
  - Connection state accesses cause cache pressure
- Offload can help, but full offload is problematic
- Connection handoff: offloading made practical
  - OS in charge of division of work
  - Host network stack largely unaffected
- Ongoing work: OS handoff policies