
1 PC-based Software Routers: High Performance and Application Service Support
Authors: Raffaele Bolla, Roberto Bruschi
Publisher: PRESTO’08
Presenter: Hsin-Mao Chen
Date: 2010/02/24

2 Outline: Introduction; Architectural Bottlenecks; Multi-CPU/Core Enhancements; Performance Evaluation

3 Introduction
[Figure: packet processing in Linux. Labels: network boards; packet reception or transmission; HW interrupt (IRQ); kernel; Software IRQs (SoftIRQs); packet processing; RAM; TxRing and RxRing.]

4 Introduction
A SoftIRQ executes two main tasks:
1. De-allocating the already-transmitted packets placed in the TxRing.
2. Performing all the actual packet-forwarding operations, i.e., handling the received packets in the RxRing.
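The two SoftIRQ tasks can be pictured with a small Python model. This is a sketch under stated assumptions, not the Linux implementation: `Packet`, `NetDevice`, the exact-match `routes` table and the per-pass `budget` are all hypothetical simplifications introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Packet:
    dst: str                   # key for the hypothetical route lookup
    transmitted: bool = False  # set by the (simulated) NIC once sent

@dataclass
class NetDevice:
    name: str
    rx_ring: deque = field(default_factory=deque)
    tx_ring: deque = field(default_factory=deque)

def softirq(dev, routes, devices, budget=64):
    """One hypothetical SoftIRQ pass on `dev`: reclaim TxRing, then forward."""
    # Task 1: de-allocate packets already transmitted from this device's TxRing.
    freed = 0
    while dev.tx_ring and dev.tx_ring[0].transmitted:
        dev.tx_ring.popleft()
        freed += 1
    # Task 2: forward up to `budget` received packets from the RxRing.
    forwarded = 0
    while dev.rx_ring and forwarded < budget:
        pkt = dev.rx_ring.popleft()
        out = devices[routes[pkt.dst]]  # hypothetical exact-match routing
        out.tx_ring.append(pkt)         # enqueue on the output device's TxRing
        forwarded += 1
    return freed, forwarded

eth0, eth1 = NetDevice("eth0"), NetDevice("eth1")
devices = {"eth0": eth0, "eth1": eth1}
routes = {"10.0.1.1": "eth1"}
eth0.tx_ring.extend([Packet("x", transmitted=True), Packet("y")])
eth0.rx_ring.extend([Packet("10.0.1.1"), Packet("10.0.1.1")])
print(softirq(eth0, routes, devices))  # (1, 2)
```

Note that task 2 writes into the TxRing of the *output* device, which is why several SoftIRQs running on different CPUs can end up touching the same TxRing.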

5 Architectural Bottlenecks
In a software router (SR) architecture based on a single CPU/core, performance is bounded by:
1. The SR computational capacity.
2. The bandwidth/latency of the I/O buses.
In an SR architecture based on multiple processors, typical performance issues may sap the parallelization gain:
1. Serialization of data accesses.
2. CPU/core cache coherence.

6 Architectural Bottlenecks: Data-access serialization
SoftIRQ accesses to each TxRing are serialized by a code-locking procedure (the LLTX lock), which guarantees that each TxRing can be read or modified by only one SoftIRQ at a time.
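The serialization effect can be illustrated with Python threads standing in for SoftIRQs on different CPUs. This is a hedged sketch: a plain `threading.Lock` stands in for the LLTX lock, and the `TxRing` class and its `contended` counter are hypothetical names used only to make the waiting visible.

```python
import threading
from collections import deque

class TxRing:
    """A TxRing shared by all CPUs, guarded by one lock (stand-in for LLTX)."""
    def __init__(self):
        self.lock = threading.Lock()
        self.slots = deque()
        self.contended = 0  # times a SoftIRQ found the lock already held

    def enqueue(self, pkt):
        waited = not self.lock.acquire(blocking=False)
        if waited:
            self.lock.acquire()  # serialized: wait for the current holder
        try:
            if waited:
                self.contended += 1  # updated only while holding the lock
            self.slots.append(pkt)
        finally:
            self.lock.release()

def softirq_worker(ring, cpu, n):
    # Each "CPU" forwards n packets to the same output TxRing.
    for i in range(n):
        ring.enqueue((cpu, i))

ring = TxRing()
threads = [threading.Thread(target=softirq_worker, args=(ring, cpu, 10_000))
           for cpu in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ring.slots), "packets enqueued;", ring.contended, "contended acquisitions")
```

The contention count varies from run to run; the point is that every enqueue, from any CPU, passes through the same lock, so adding CPUs does not add TxRing throughput.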

7 Architectural Bottlenecks: CPU/core cache management
Whenever a CPU/core loads a TxRing into its local cache, all the other processors also caching it must invalidate their cached copies.
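A toy write-invalidate model shows why a shared TxRing bounces between caches. It assumes, as a simplification, that a SoftIRQ loads the TxRing in order to update it, and that an update invalidates every other core's copy; real coherence protocols (e.g., MESI) track more states. `Core`, `Bus` and `run` are hypothetical names.

```python
class Core:
    def __init__(self, cid):
        self.cid = cid
        self.cache = {}  # cached line name -> value

class Bus:
    """Toy write-invalidate coherence with write-through to memory."""
    def __init__(self, cores):
        self.cores = cores
        self.memory = {}
        self.invalidations = 0

    def read(self, core, line):
        if line not in core.cache:  # miss: fetch from memory
            core.cache[line] = self.memory.get(line)
        return core.cache[line]

    def write(self, core, line, value):
        for other in self.cores:  # invalidate every other cached copy
            if other is not core and line in other.cache:
                del other.cache[line]
                self.invalidations += 1
        core.cache[line] = value
        self.memory[line] = value

def run(updates, ring_for):
    cores = [Core(c) for c in range(4)]
    bus = Bus(cores)
    for i in range(updates):  # cores take turns updating a TxRing
        core = cores[i % 4]
        line = ring_for(core.cid)
        count = bus.read(core, line) or 0
        bus.write(core, line, count + 1)
    return bus.invalidations

shared = run(100, lambda cid: "txring0")        # all cores use one TxRing
private = run(100, lambda cid: f"txring{cid}")  # one TxRing per core
print(shared, private)  # 99 0
```

In the shared case every update after the first invalidates the previous writer's copy; giving each core its own TxRing removes the invalidations entirely in this model.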

8 Multi-CPU/core Enhancements: HW evolution
Intel® Advanced Smart Cache: a mechanism that allows the level-2 cache to be shared among all the cores of the same processor.
Intel PRO/1000 adapters: support multiple Tx and Rx rings and multiple HW IRQs per network interface.

9 Multi-CPU/core Enhancements: SW architecture goals
1. Bind all the operations carried out in forwarding a packet entirely to a single CPU.
2. Reduce LLTX lock contention as much as possible.
3. Distribute the computational load equally among all the processors/cores in the system.
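The three goals can be expressed as a binding plan. This sketch assumes round-robin assignment of RxRings to CPUs and assumes each output device exposes one TxRing per CPU; `bind_rings` and `forwarding_path` are hypothetical helpers, not functions from the paper or from Linux.

```python
from collections import Counter
from itertools import cycle

def bind_rings(cpus, devices, rx_rings_per_dev):
    """Hypothetical plan: each RxRing -> one CPU (round-robin);
    each CPU -> its own TxRing index on every output device."""
    rx_owner = {}
    next_cpu = cycle(cpus)
    for dev in devices:
        for ring in range(rx_rings_per_dev):
            rx_owner[(dev, ring)] = next(next_cpu)
    # Using the CPU's position as TxRing index means no two CPUs share a TxRing.
    tx_ring_of = {(cpu, dev): idx for idx, cpu in enumerate(cpus) for dev in devices}
    return rx_owner, tx_ring_of

def forwarding_path(rx_owner, tx_ring_of, in_dev, in_ring, out_dev):
    """All work for a packet runs on the CPU that owns its RxRing (goal 1)."""
    cpu = rx_owner[(in_dev, in_ring)]
    return cpu, (out_dev, tx_ring_of[(cpu, out_dev)])

cpus = [0, 1, 2, 3]
devices = ["eth0", "eth1"]
rx_owner, tx_ring_of = bind_rings(cpus, devices, rx_rings_per_dev=2)
print(Counter(rx_owner.values()))  # each CPU owns exactly one RxRing
print(forwarding_path(rx_owner, tx_ring_of, "eth1", 1, "eth0"))  # (3, ('eth0', 3))
```

Because each CPU writes only to its own TxRing, the LLTX lock is never contended in this plan (goal 2); the load is even (goal 3) only when the total number of RxRings is a multiple of the number of CPUs.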

10 Multi-CPU/core Enhancements
CPU/core binding to TxRings: bind each CPU/core to a different TxRing on each output device.
CPU/core binding to RxRings: bind each RxRing to a different CPU/core.
Xeon core: 1 Mpkt/s
Gigabit Ethernet interface: Mpkt/s with 64 B frames
Fast Ethernet interface: pkt/s with 64 B frames
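For reference, the theoretical maximum frame rate of an Ethernet link follows from standard framing overhead: besides the frame itself, each frame occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap on the wire. The helper name below is hypothetical.

```python
def max_frame_rate(link_bps, frame_bytes):
    """Theoretical maximum Ethernet frames per second at a given frame size."""
    wire_bits = (frame_bytes + 8 + 12) * 8  # frame + preamble/SFD + inter-frame gap
    return link_bps / wire_bits

gbe = max_frame_rate(1e9, 64)  # Gigabit Ethernet, 64-byte frames
fe = max_frame_rate(1e8, 64)   # Fast Ethernet, 64-byte frames
print(f"GbE: {gbe/1e6:.3f} Mpkt/s, FE: {fe:.0f} pkt/s")
# GbE: 1.488 Mpkt/s, FE: 148810 pkt/s
```

Set against the roughly 1 Mpkt/s per Xeon core quoted on the slide, this arithmetic suggests that a single core alone would fall short of one Gigabit Ethernet interface at 64-byte wire rate, while it has ample headroom for a Fast Ethernet interface.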

11 Multi-CPU/core Enhancements

12 Performance Evaluation: Standard SR architecture
[Figure labels: Agilent N2X; router]

13 Performance Evaluation: Standard SR architecture

14 Performance Evaluation: Enhanced SR architecture

15 Performance Evaluation: Enhanced SR architecture

16 Performance Evaluation: Enhanced SR architecture

17 Performance Evaluation: Multi-layer service support

18 Performance Evaluation: Multi-layer service support