Network Stack Specialization for Performance


Network Stack Specialization for Performance. Presented by Donghwi Kim. (Some figures are taken from the paper.)

Objective: The authors try to show an upper bound on the performance a network application can reach through specialization. (In fact, not only the network stack but also the application's implementation is specialized.) A particular class of applications is chosen: those that serve the same content to many users. Sandstorm: a web server that serves static web pages. Namestorm: a DNS server.

Keys to performance: A complete zero-copy stack. Aggressive amortization: pre-packetized data, and batching to mitigate system-call overhead. A synchronous design, clocked from received packets, which improves cache locality and minimizes the latency of sending the first packet of the response. Intel's DDIO.
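The "synchronous, clocked from received packets" point can be pictured as a run-to-completion loop: each received batch is processed in full and its replies are handed to the transmit path before the next batch is touched, with no intermediate queue. The sketch below is only an illustration of that control flow. The names `run_to_completion`, `handle_packet`, and `transmit` are hypothetical and are not taken from the paper or from any Sandstorm library.

```python
from typing import Callable, Iterable, List, Optional


def run_to_completion(
    rx_batches: Iterable[List[bytes]],
    handle_packet: Callable[[bytes], Optional[bytes]],
    transmit: Callable[[List[bytes]], None],
) -> None:
    """Process each received batch to completion before reading the next.

    Hypothetical sketch: the arrival of a batch is the only event that
    drives work, so no timers or inter-layer queues appear in the loop.
    """
    for batch in rx_batches:
        replies = []
        for pkt in batch:
            reply = handle_packet(pkt)  # protocol and application logic inline
            if reply is not None:
                replies.append(reply)
        if replies:
            transmit(replies)  # one transmit call per received batch


# Usage: echo every packet except empty ones, collecting what was sent.
sent = []
run_to_completion(
    [[b"a", b""], [b"b"]],
    lambda p: p.upper() if p else None,
    sent.append,
)
```

Because a reply is produced while the request is still being handled, the request data is likely to be in cache when the response is built, which is the locality argument the slide makes.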

Network stack: libnmio: data-movement and event-notification primitives. libeth: a lightweight Ethernet layer. libtcpip: an optimized TCP/IP layer. libudpip: a UDP/IP layer.

A complete zero-copy stack. Receiving a packet: done by DMA. Transmitting a packet (aggressive amortization): modify one of the prepared copies of the packet and send it by DMA. The modifications are performed in a single pass to use the CPU's L1 cache efficiently.
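A minimal sketch of the pre-packetized transmit path, under stated assumptions: the packet is built once with its payload already in place, the payload's one's-complement sum is computed once, and each send only rewrites the 20-byte TCP header and folds the cached payload sum into the checksum. The class `PrePacket` and its methods are hypothetical names for illustration. The checksum omits the TCP pseudo-header and IP/Ethernet headers, so this is not a valid on-wire segment.

```python
import struct

# TCP header without options: src port, dst port, seq, ack,
# data offset/flags, window, checksum, urgent pointer.
TCP_HDR = struct.Struct("!HHIIHHHH")
CSUM_OFFSET = 16
OFF_FLAGS = (5 << 12) | 0x18  # 5 header words, PSH+ACK (illustrative choice)


def ones_sum(data) -> int:
    """16-bit one's-complement sum of data, zero-padded to even length."""
    data = bytes(data)
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


class PrePacket:
    """Hypothetical pre-packetized segment with a cached payload sum."""

    def __init__(self, src_port: int, payload: bytes):
        self.src_port = src_port
        self.buf = bytearray(TCP_HDR.size) + bytearray(payload)
        self.payload_sum = ones_sum(payload)  # amortized: computed once

    def patch(self, dst_port: int, seq: int, ack: int) -> bytearray:
        """Rewrite per-connection header fields, then fix the checksum."""
        TCP_HDR.pack_into(self.buf, 0, self.src_port, dst_port, seq, ack,
                          OFF_FLAGS, 65535, 0, 0)
        total = ones_sum(self.buf[:TCP_HDR.size]) + self.payload_sum
        while total >> 16:
            total = (total & 0xFFFF) + (total >> 16)
        struct.pack_into("!H", self.buf, CSUM_OFFSET, ~total & 0xFFFF)
        return self.buf


pkt = PrePacket(80, b"HTTP/1.1 200 OK\r\n\r\nhello!")
wire = pkt.patch(dst_port=40000, seq=1000, ack=2000)
```

The per-send work touches only the header bytes, which is the sense in which the payload cost is amortized across all clients that receive the same content.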

A complete zero-copy stack. Pre-copy method: maintain more than one copy of each packet; this has the potential to thrash the CPU's L3 cache. memcpy method: maintain one long-term copy and create ephemeral copies for transmission; this requires more work per packet.
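The trade-off between the two methods can be sketched as two buffer pools. Both classes and their method names are hypothetical, and the number of copies and the packet size below are made-up values. The sketch assumes that more than one copy is needed because a buffer may still be in use by an earlier transmission when the same content is requested again.

```python
class PreCopyPool:
    """Pre-copy: keep several ready-made copies and rotate through them.

    No copying at send time, but resident memory grows with the number
    of copies, which is what puts pressure on the L3 cache.
    """

    def __init__(self, packet: bytes, copies: int):
        self.ring = [bytearray(packet) for _ in range(copies)]
        self.next = 0

    def acquire(self) -> bytearray:
        buf = self.ring[self.next]
        self.next = (self.next + 1) % len(self.ring)
        return buf

    def resident_bytes(self) -> int:
        return sum(len(b) for b in self.ring)


class MemcpyPool:
    """memcpy: keep one long-term copy, make an ephemeral copy per send.

    Small resident footprint, but every send pays for a copy.
    """

    def __init__(self, packet: bytes):
        self.master = bytes(packet)

    def acquire(self) -> bytearray:
        return bytearray(self.master)  # the per-send copy

    def resident_bytes(self) -> int:
        return len(self.master)


packet = b"x" * 1500
pre = PreCopyPool(packet, copies=4)
mem = MemcpyPool(packet)
```

With these illustrative numbers the pre-copy pool holds 6000 bytes for one packet's worth of content while the memcpy pool holds 1500, at the price of one copy per transmission.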

How do the optimizations work? Batching increases the TCP RTT. Amortizing reduces per-request processing.
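A back-of-the-envelope model of this trade-off, assuming a fixed cost per batch (for example one system call) that is shared by every packet in the batch, and assuming the first packet of a batch waits for the rest to arrive. Both functions and all the numbers are hypothetical and serve only to show the direction of the effect, not measured values.

```python
def per_packet_cost_us(fixed_cost_us: float, per_packet_us: float,
                       batch_size: int) -> float:
    """Average CPU cost per packet when a fixed cost is shared by a batch."""
    return fixed_cost_us / batch_size + per_packet_us


def worst_case_wait_us(batch_size: int, interarrival_us: float) -> float:
    """Extra delay seen by the first packet while the batch fills."""
    return (batch_size - 1) * interarrival_us


# Illustrative values only: 2 us fixed cost, 0.5 us per packet,
# packets arriving 1 us apart.
unbatched = per_packet_cost_us(2.0, 0.5, 1)
batched = per_packet_cost_us(2.0, 0.5, 4)
added_delay = worst_case_wait_us(4, 1.0)
```

In this model a batch of four cuts the per-packet cost from 2.5 us to 1.0 us while adding up to 3 us of delay, which mirrors the slide's point that amortization lowers per-request processing and batching raises the observed RTT.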

Intel's DDIO (Direct Data I/O). On transmission: data is pulled from the L3 cache without a detour through system memory. On reception: DMA can place data directly in the processor's L3 cache.

Evaluation

DDIO. Pre-copy case: DDIO pulls untouched incoming data into the cache, so the file data cannot be cached. memcpy case: the CPU loads the file data into the cache.

Discussion: mTCP vs. Sandstorm

Discussion. mTCP: provides a UNIX-like socket programming interface; provides fairness. TCP of Sandstorm: a higher-level stack does not wrap a lower-level stack; each stack is a stand-alone service (for example, an application interacts directly with libnmio); with amortization, no queueing, and an inaccurate timer, correctness cannot be guaranteed; applicable only to a limited set of applications.