SpliceNP: A TCP Splicer using a Network Processor Li Zhao +, Yan Luo*, Laxmi Bhuyan University of California Riverside Ravi Iyer Intel Corporation + Now.

Slides:



Advertisements
Similar presentations
A First Example: The Bump in the Wire A First Example: The Bump in the Wire 9/ INF5061: Multimedia data communication using network processors.
Advertisements

A First Example: The Bump in the Wire A First Example: The Bump in the Wire 8/ INF5062: Programming Asymmetric Multi-Core Processors.
Chapter 7 – Transport Layer Protocols
Fast Communication Firefly RPC Lightweight RPC  CS 614  Tuesday March 13, 2001  Jeff Hoy.
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
CS162 Section Lecture 9. KeyValue Server Project 3 KVClient (Library) Client Side Program KVClient (Library) Client Side Program KVClient (Library) Client.
1 Web Server Performance in a WAN Environment Vincent W. Freeh Computer Science North Carolina State Vsevolod V. Panteleenko Computer Science & Engineering.
Introduction to Content-aware Switch Presented by Li Zhao.
Content Switch Design Introduce Linux networking source code. IP Masquerade techniques. LVS(Linux Virtual Server). Design of the Content Switch.
Page: 1 Director 1.0 TECHNION Department of Computer Science The Computer Communication Lab (236340) Summer 2002 Submitted by: David Schwartz Idan Zak.
1 Design and Implementation of A Content-aware Switch using A Network Processor Li Zhao, Yan Luo, Laxmi Bhuyan University of California, Riverside Ravi.
Load Balancing in Web Clusters CS 213 LECTURE 15 From: IBM Technical Report.
IXP2400 Protocol Offloading Yan Luo Chris Baron
1 Improving Web Servers performance Objectives:  Scalable Web server System  Locally distributed architectures  Cluster-based Web systems  Distributed.
4/22/2003 Network Processor & Its Applications1 Network Processor and Applications Prof. Laxmi Bhuyan
TCP Splicing for URL-aware Redirection
Architectural Impact of SSL Processing Jingnan Yao.
Design and Implementation of Web Switch
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Intel IXP1200 Network Processor q Lab 12, Introduction to the Intel IXA q Jonathan Gunner, Sruti.
Design and Implementation of a Server Director Project for the LCCN Lab at the Technion.
Embedded Transport Acceleration Intel Xeon Processor as a Packet Processing Engine Abhishek Mitra Professor: Dr. Bhuyan.
TCP. Learning objectives Reliable Transport in TCP TCP flow and Congestion Control.
ECE 526 – Network Processing Systems Design IXP XScale and Microengines Chapter 18 & 19: D. E. Comer.
WXES2106 Network Technology Semester /2005 Chapter 8 Intermediate TCP CCNA2: Module 10.
Network Processors and Web Servers CS 213 LECTURE 17 From: IBM Technical Report.
I/O Acceleration in Server Architectures
Process-to-Process Delivery:
Christopher Bednarz Justin Jones Prof. Xiang ECE 4986 Fall Department of Electrical and Computer Engineering University.
Paper Review Building a Robust Software-based Router Using Network Processors.
Performance Tradeoffs for Static Allocation of Zero-Copy Buffers Pål Halvorsen, Espen Jorde, Karl-André Skevik, Vera Goebel, and Thomas Plagemann Institute.
Chapter 17 Networking Dave Bremer Otago Polytechnic, N.Z. ©2008, Prentice Hall Operating Systems: Internals and Design Principles, 6/E William Stallings.
Network Server Performance and Scalability June 9, 2005 Scott Rixner Rice Computer Architecture Group
High Performance User-Level Sockets over Gigabit Ethernet Pavan Balaji Ohio State University Piyush Shivam Ohio State University.
TCP Throughput Collapse in Cluster-based Storage Systems
Lecture 3 Review of Internet Protocols Transport Layer.
High Performance Computing & Communication Research Laboratory 12/11/1997 [1] Hyok Kim Performance Analysis of TCP/IP Data.
Chapter 5 Transport layer With special emphasis on Transmission Control Protocol (TCP)
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
ICOM 6115©Manuel Rodriguez-Martinez ICOM 6115 – Computer Networks and the WWW Manuel Rodriguez-Martinez, Ph.D. Lecture 26.
David Wetherall Professor of Computer Science & Engineering Introduction to Computer Networks Transport Layer Overview (§ )
1 The Internet and Networked Multimedia. 2 Layering  Internet protocols are designed to work in layers, with each layer building on the facilities provided.
Transport Layer: TCP and UDP. Overview of TCP/IP protocols Comparing TCP and UDP TCP connection: establishment, data transfer, and termination Allocation.
Sharing the Network It slices, it dices, it sequences ….. All of this and error checking too!
Data and Computer Communications Circuit Switching and Packet Switching.
Chapter 6-2 the TCP/IP Layers. The four layers of the TCP/IP model are listed in Table 6-2. The layers are The four layers of the TCP/IP model are listed.
Transport Layer3-1 Chapter 3: Transport Layer Our goals: r understand principles behind transport layer services: m multiplexing/demultipl exing m reliable.
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
On the Performance of TCP Splicing for URL-aware Redirection Ariel Cohen, Sampath Rangarajan, and Hamilton Slye The 2 nd USENIX Symposium on Internet Technologies.
Srihari Makineni & Ravi Iyer Communications Technology Lab
Increasing Web Server Throughput with Network Interface Data Caching October 9, 2002 Hyong-youb Kim, Vijay S. Pai, and Scott Rixner Rice Computer Architecture.
An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.
Interfaces and Services Each layer provides a service to the layer above it. A service is a set of primitive operations. Under UNIX, primitives are implemented.
Networking Basics CCNA 1 Chapter 11.
TCP Offload Through Connection Handoff Hyong-youb Kim and Scott Rixner Rice University April 20, 2006.
Intel Research & Development ETA: Experience with an IA processor as a Packet Processing Engine HP Labs Computer Systems Colloquium August 2003 Greg Regnier.
Authors: Danhua Guo 、 Guangdeng Liao 、 Laxmi N. Bhuyan 、 Bin Liu 、 Jianxun Jason Ding Conf. : The 4th ACM/IEEE Symposium on Architectures for Networking.
Computer Network Architecture Lecture 6: OSI Model Layers Examples 1 20/12/2012.
Introduction to Content-aware Switch Presented by Li Zhao.
Exploiting Task-level Concurrency in a Programmable Network Interface June 11, 2003 Hyong-youb Kim, Vijay S. Pai, and Scott Rixner Rice Computer Architecture.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
McGraw-Hill Chapter 23 Process-to-Process Delivery: UDP, TCP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
TCP/IP1 Address Resolution Protocol Internet uses IP address to recognize a computer. But IP address needs to be translated to physical address (NIC).
SCTP Handoff for Cluster Servers
Network Architecture Introductory material
Enhance Features and Performance of Content Switches
Review of Important Networking Concepts
Process-to-Process Delivery:
Lec 11 – Multicore Architectures and Network Processors
Process-to-Process Delivery: UDP, TCP
Transport Layer 9/22/2019.
Presentation transcript:

SpliceNP: A TCP Splicer using a Network Processor Li Zhao +, Yan Luo*, Laxmi Bhuyan University of California Riverside Ravi Iyer Intel Corporation + Now with Intel Corporation * Now with University of Massachusetts Lowell

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Outline Background and Motivation Design and Implementation Performance Evaluation Conclusions

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, User kernel Background Switch Media Server Application Server HTML Server Internet GET /cgi-bin/form HTTP/1.1 Host: APP. DATATCPIP Front-end of a web cluster, one VIP Route packets based on layer 5 information  Examine application data in addition to IP& TCP Problem with HTTP Proxy Overhead to copy data between two connections is high  Data goes up through protocol stack  Data is copied from kernel space to user space  Data is copied back to kernel and goes down through protocol stack HTTP proxy

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, TCP Splicing A technique to splice two TCP connections inside the kernel Data relaying between the two connections can be speeded up Can be used to speed up layer-5 switching, web proxy and firewall running in the user space Example: content-aware switch Without SplicingWith Splicing

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Anatomy of TCP Splicing Without TCP Splicing With TCP Splicing SEQ # translation Checksum Recalculation Etc. Bookkeeping of connection states, selection of servers, state migration

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, State Transition in TCP Splicing (1) 3-way handshaking with client (3) 3-way handshaking with server (2) Application specific operations (4) Spliced ! (5) Connection Tear down

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Outline Background and Motivation Design and Implementation  Design Options  Implementation on IXP2400 Performance Evaluation Conclusions

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Processing Elements for TCP Splicing ASIC (Application Specific Integrated Circuit)  High processing capacity  Long time to develop  Lack the flexibility GP (General-purpose Processor)  Programmable  Overheads: interrupt, moving packets over PCI bus, ISA not optimized NP (Network Processor)  optimized ISA for packet processing, multiprocessing and multithreading  high performance  Programmable

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Background on NP Architecture  Control processor (CP): embedded processor, maintain control information, process exception packets  Data processors (DPs): tuned specifically for packet processing  Communicate through shared SRAM and DRAM NP operation  Packet arrives in receive buffer  Packet Processing by CP or DPs  Transfer the packet onto wire after processing CP DP

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Design Options Option (a): GP-based (Linux-based) switch  Overhead of moving data across PCI bus Option (b): CP creates and splices connections, DPs process packets sent after splicing  Connection setup & splicing is more complex than data forwarding  However, control packets need to be passed through DRAM queues Option (c): DPs handle connection setup, splicing & forwarding

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Outline Background and Motivation Design and Implementation  Design Options  Implementation on IXP2400 Performance Evaluation Conclusions

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, IXP 2400 Architecture and Development Challenge Architecture XScale core 8 Microengines(MEs) Each ME  run up to 8 threads  4K instruction store  Local memory Scratchpad memory, SRAM & DRAM controllers ME Scratch Hash CSR IX bus interface SRAM controller XScale SDRAM controller PCI Challenges Limited size of instruction memory C compiler not available for IXP2400

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Resource Allocation Client-side control block list  record states for connections between clients and SpliceNP, states after splicing Server-side control block list  record states for connections between server and SpliceNP Client side CB list Server side CB list server selection table Locks Packet buffer SRAM (8MB) DRAM (256MB) Packet queues Scratchpad (16KB) Client ME Microengines Server ME Rx ME Tx ME

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Comparison of Functionality (I) StepFunctionalityTCPLinux Splicer SpliceNP 1Dequeue packetYYY 2IP header verificationYYY 3IP option processingYYN 4TCP header verificationYYY 5Control block lookupYYY 6Create new socket and set state to LISTENYYNo socket, only control block 7Initialize TCP and IP header templateYYN 8Reset idle time and keep-alive timerYYN 9Process TCP optionYYOnly MSS option 10Send ACK packet, change state to SYN_RCVD YYY Processing a SYN packet A lite version of TCP due to the limited instruction size of microengines.

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Comparison of Functionality (II) StepFunctionalityTCPLinux Splicer SpliceNP 1-5Same as processing a SYN packet 2Reset idle time and keep-alive timerYYN 3Process TCP optionYYOnly MSS option 4Verify ACK numbers and flagsYYY 5Connection establishment timerYYN 6Initialize receive sequence numberYYY 7Set state to ESTABLISHEDYYY 8Send ACK packetYYY Processing a SYN/ACK packet

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Comparison of Functionality (III) StepFunctionalityTCPLinux SplicerSpliceNP 1-5Same as processing a SYN packet 6Reset idle time and keep-alive timerYYN 7Process TCP optionYYOnly MSS option 8aWake up receiving processYDirect forwarding Copy data to applicationYNN 8bDelete acknowledged data from send buffer YDirect forwarding Wake up waiting processYNN 9Flow control processingYYN Processing a data packet

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Outline Background and Motivation Design and Implementation Performance Evaluation Conclusions

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Experimental Setup Radisys ENP2611 containing an IXP2400  XScale & ME: 600MHz  8MB SRAM and 128MB DRAM  Three 1Gbps Ethernet ports: 1 for Client port and 2 for Server ports Server: Apache web server on an Intel 3.0GHz Xeon processor Client: Httperf on a 2.5GHz Intel P4 processor Linux-based switch  Loadable kernel module  2.5GHz P4, two 1Gbps Ethernet NICs

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Latency on a Linux-based TCP Splicer Latency is reduced by TCP splicing

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Latency vs Request File Size Latency reduced significantly  83.3% (0.6ms  1KB The larger the file size, the higher the reduction  1MB file Latency on the Splicer (ms)

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Comparison of Packet Processing Latency

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Analysis of Latency Reduction Linux-basedNP-based Interrupt: NIC raises an interrupt once a packet comes polling NIC-to-mem copy Xeon 3.0Ghz Dual processor w/ 1Gbps Intel Pro 1000 (88544GC) NIC, 3 us to copy a 64-byte packet by DMA No copy: Packets are processed inside without two copies Linux processing: OS overheads Processing a data packet in splicing state: 13.6 us IXP processing: Optimized ISA 6.5 us

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Throughput vs Request File Size Throughput is increased significantly  5.7x for small file 1KB, 2.2x for large 1MB Higher improvement for small files  Latency reduction for control packets > data packets  Control packets take a larger portion for small files

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Conclusions Implemented TCP splicing on a network processor Analyzed various tradeoffs in implementation and compared its performance with a Linux-based TCP splicer Measurement results show that NP-based switch can improve the performance significantly  Process latency reduced by up to 83%  Throughput improved by up to 5.7x

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, Thank you ! Questions ?

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, TCP Splicing client content switch server step1 step2 SYN(CSEQ) SYN(DSEQ) ACK(CSEQ+1) DATA(CSEQ+1) ACK(DSEQ+1) step3 step7 step8 step4 step5 step6 SYN(CSEQ) SYN(SSEQ) ACK(CSEQ+1) DATA(CSEQ+1) ACK(SSEQ+1) DATA(SSEQ+1) ACK(CSEQ+lenR+1) DATA(DSEQ+1) ACK(CSEQ+LenR+1) ACK(DSEQ+lenD+1) ACK(SSEQ+lenD+1) lenR: size of http request. lenD: size of return document.

SpliceNP: A TCP Splicer using a Networ Processor ANCS2005, Princeton, NJ, Oct 27-28, TCP Handoff client content switch server step1 step2 SYN(CSEQ) SYN(DSEQ) ACK(CSEQ+1) DATA(CSEQ+1) ACK(DSEQ+1) step3 step4 step5 step6 DATA(DSEQ+1) ACK(CSEQ+lenR+1) ACK(DSEQ+lenD+1) ACK(DSEQ+lenD+1) Migrate (Data, CSEQ, DSEQ) Migrate the created TCP connection from the switch to the back-end sever –Create a TCP connection at the back-end without going through the TCP three-way handshake –Retrieve the state of an established connection and destroy the connection without going through the normal message handshake required to close a TCP connection Once the connection is handed off to the back-end server, the switch must forward packets from the client to the appropriate back-end server