2003 ©UCR CS 260 Lecture 1: Introduction to Network Processors Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan/CS260.



Outline
°Introduction to NP Systems
°Relevant Applications
°Design Issues and Challenges
°Relevant Software and Benchmarks
°A case study: Intel IXP network processors

What are Network Processors
°Any device that executes programs to handle packets in a data network
°Examples
  Processors on router line cards
  Processors in network access equipment

Why Network Processors
°Current Situation
  Data rates are increasing
  Protocols are becoming more dynamic and sophisticated
  Protocols are being introduced more rapidly
°Processing Elements
  GP (General-purpose Processor)
  -Programmable, but not optimized for networking applications
  ASIC (Application Specific Integrated Circuit)
  -High processing capacity, but long development time and little flexibility
  NP (Network Processor)
  -Achieves high processing performance
  -Programming flexibility
  -Cheaper than GP

Typical NP Architecture
[Figure: block diagram of a network processor. Labels: input ports, output ports, multi-threaded processing elements, co-processor, bus, SDRAM (packet buffer), SRAM (routing table)]

TCP/IP Model
°ISO OSI (Open Systems Interconnection) not fully implemented
°Presentation and Session layers not present in TCP/IP

OSI            TCP/IP
Application    Application
Presentation   (not present)
Session        (not present)
Transport      TCP
Network        IP
Data Link      Host-to-Net
Physical       Host-to-Net

Processing Tasks
°Control Plane
  Policy Applications
  Network Management
  Signaling
  Topology Management
°Data Plane
  Queuing / Scheduling
  Data Transformation
  Classification
  Data Parsing
  Media Access Control
  Physical Layer
Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

Application Categorization
°Control-Plane tasks
  Less time-critical
  Control and management of device operation
  -Table maintenance, port states, etc.
°Data-Plane tasks
  Operations occurring in real time on the “packet path”
  Core device operations
  -Receive, process and transmit packets

Data Plane Tasks
°Media Access Control
  Low-level protocol implementation
  -Ethernet, SONET framing, ATM cell processing, etc.
°Data Parsing
  Parsing cell or packet headers for address or protocol information
°Classification
  Match each packet against a set of criteria (filtering / forwarding decision, QoS, accounting, etc.)
°Data Transformation
  Transformation of packet data between protocols
°Traffic Management
  Queuing, scheduling and policing packet data
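
The parsing and classification steps above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names `parse_ipv4_header` and `classify` and the traffic-class labels are hypothetical, and the classifier looks at nothing but the IPv4 protocol field.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Data parsing: decode the fixed 20-byte IPv4 header at the start of `packet`."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,              # upper nibble of the first byte
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL is counted in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def classify(header: dict) -> str:
    """Classification: map the protocol number to a (hypothetical) traffic class."""
    classes = {1: "icmp", 6: "tcp", 17: "udp"}
    return classes.get(header["protocol"], "other")


if __name__ == "__main__":
    # Hand-built sample header: IPv4, IHL=5, TTL=64, protocol 6 (TCP).
    sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0, 64, 6, 0,
                         socket.inet_aton("10.0.0.1"),
                         socket.inet_aton("192.168.1.2"))
    hdr = parse_ipv4_header(sample)
    print(hdr["src"], "->", hdr["dst"], classify(hdr))
```

A real data plane would also verify the header checksum and handle IP options; both are left out here to keep the sketch short.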

Applications: IPv4 Routing
°Routers determine next hop and forward packets
[Figure: a router forwarding packets (P). Labels: Router, A, B, C]
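
Next-hop determination in IPv4 is normally a longest-prefix match against the routing table. The sketch below shows the idea with a linear scan; the class name `RoutingTable`, the prefixes, and the next-hop names A, B and C are assumptions made for illustration. A production router would use a trie or TCAM rather than a list.

```python
import ipaddress


class RoutingTable:
    """Toy routing table with a linear-scan longest-prefix match."""

    def __init__(self):
        self._routes = []  # list of (network, next_hop) pairs

    def add_route(self, prefix: str, next_hop: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, dst: str):
        """Return the next hop of the most specific matching prefix, or None."""
        addr = ipaddress.ip_address(dst)
        best = None
        for network, next_hop in self._routes:
            if addr in network and (best is None
                                    or network.prefixlen > best[0].prefixlen):
                best = (network, next_hop)
        return best[1] if best else None


if __name__ == "__main__":
    table = RoutingTable()
    table.add_route("0.0.0.0/0", "C")     # default route
    table.add_route("10.0.0.0/8", "A")
    table.add_route("10.1.0.0/16", "B")   # more specific than 10.0.0.0/8
    for dst in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(dst, "->", table.lookup(dst))
```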

URL-based switching – My NSF Project
°Increase efficiency
°Tasks
  Traverse the packet data (request) for each arriving packet and classify it:
  -Contains ‘.jpg’ -> to image server
  -Contains ‘cgi-bin/’ -> to application server
[Figure: Internet -> Switch -> Image Server, Application Server, HTML Server. Example packet: IP | TCP | APP. DATA, carrying the request "GET /cgi-bin/form HTTP/1.1 Host:"]
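
The two classification rules on this slide can be sketched as a small function that inspects the request line of an HTTP request. The function name `pick_server`, the server labels, and the `example.org` host are hypothetical. Checking `cgi-bin/` before `.jpg` is a design choice of this sketch, since the slide does not say which rule wins when both match.

```python
def pick_server(request: bytes) -> str:
    """Choose a back-end server class from the URL in an HTTP request line."""
    request_line = request.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = request_line.split()
    if len(parts) < 2:
        return "html"            # malformed request line: fall back to the default
    url = parts[1]
    if "cgi-bin/" in url:
        return "application"     # dynamic content goes to the application server
    if ".jpg" in url:
        return "image"           # images go to the image server
    return "html"                # everything else goes to the HTML server


if __name__ == "__main__":
    req = b"GET /cgi-bin/form HTTP/1.1\r\nHost: example.org\r\n\r\n"
    print(pick_server(req))
```

On a real switch the request may span several TCP segments, so the classifier would need to see enough of the reassembled stream before it can decide.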

Organizing Processor Resources
°Design decisions:
  High-level organization
  ISA and micro architecture
  Memory and I/O integration
°Today’s commercial NPs:
  Chip multiprocessors
  Most are multithreaded
  Exploit little ILP (Cisco does)
  No cache
  Micro-programmed

Architectural Comparisons
°High-level organizations
  Aggressive superscalar (SS)
  Fine-grained multithreaded (FGMT)
  Chip multiprocessor (CMP)
  Simultaneous multithreaded (SMT)

Multithreading
°Basic idea
  multiple register sets in the processor
  fast context switch
  switch thread on a cache access (How is this different from a non-blocking cache?)
  tolerating local latency vs remote in CC-NUMA multiprocessors
  hybrids
  -switch on notice
  -simultaneous multithreading
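
To see why extra hardware contexts help, the following sketch simulates a single-issue processor that switches threads, at zero cost, whenever the running thread issues a long-latency memory access. The model and all parameter values are assumptions for illustration: every thread alternates between `compute_cycles` of work and one access that blocks it for `memory_latency` cycles. Under these assumptions utilization should approach min(1, n·C / (C + L)) for n threads.

```python
def simulate_utilization(num_threads, compute_cycles, memory_latency,
                         total_cycles=100_000):
    """Fraction of cycles in which some thread is executing."""
    ready_at = [0] * num_threads                 # cycle at which each thread may run
    remaining = [compute_cycles] * num_threads   # work left before the next access
    busy = 0
    current = 0
    for cycle in range(total_cycles):
        # Round-robin search for a ready thread, starting at `current`.
        chosen = None
        for i in range(num_threads):
            t = (current + i) % num_threads
            if ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is None:
            continue                             # every thread is blocked: idle slot
        busy += 1
        remaining[chosen] -= 1
        if remaining[chosen] == 0:
            # The thread issues its memory access and is blocked until it returns.
            remaining[chosen] = compute_cycles
            ready_at[chosen] = cycle + 1 + memory_latency
            current = (chosen + 1) % num_threads
        else:
            current = chosen
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "threads:", simulate_utilization(n, 10, 30))
```

With 10 compute cycles per 30-cycle access, four contexts are enough to keep the pipeline busy in this model; fewer leave idle slots.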

Architectural Comparisons (cont.)
[Figure: issue slots over time (processor cycles) for Superscalar, Fine-Grained, Coarse-Grained, Multiprocessing and Simultaneous Multithreading. Legend: Thread 1 to Thread 5, idle slot]

Tasks and Services
Three benchmarks used in the experiment

Some Challenges
°Intelligent Design
  Given a selection of programs and a target network link speed, find the ‘best’ design for the processor
  -Least area
  -Least power
  -Most performance
°Write efficient multithreaded programs
  NPs have
  -Heterogeneous compute resources
  -Non-uniform memory
  -Multiple interacting threads of execution
  -Real-time constraints
  Make use of resources
  -How to use special instructions and hardware assists
   –Compilers
   –Hand-coded
  Multithreaded programs
  -Manage access to shared state
  -Synchronization between threads
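
The link-speed target translates directly into a per-packet processing budget, which is the real-time constraint mentioned above. The sketch below works through that arithmetic. The 40-byte minimum packet, the 600 MHz clock and the engine count are assumed values for illustration, not figures from the lecture, and framing overhead on the link is ignored.

```python
OC48_BPS = 2.48832e9  # SONET OC-48 line rate in bits per second


def cycles_per_packet(link_bps, packet_bytes, clock_hz, num_engines=1):
    """Processor cycles available per packet at line rate.

    With `num_engines` engines working on different packets in parallel,
    each packet can take proportionally longer before the design falls behind.
    """
    packet_time_s = packet_bytes * 8 / link_bps   # inter-arrival time at line rate
    return packet_time_s * clock_hz * num_engines


if __name__ == "__main__":
    one = cycles_per_packet(OC48_BPS, 40, 600e6)
    eight = cycles_per_packet(OC48_BPS, 40, 600e6, num_engines=8)
    print(f"1 engine: {one:.1f} cycles, 8 engines: {eight:.1f} cycles")
```

Under these assumptions a 40-byte packet arrives roughly every 129 ns, leaving about 77 cycles on one engine, which is less than a single off-chip memory access can cost.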

Benchmarks for Network Processors
NetBench
  -10 applications
CommBench
  -8 networking and communications applications
EEMBC
MediaBench
  -Transcoders
  -Some communications applications

IXP1200 Block Diagram
°StrongARM processing core
°Microengines introduce new ISA
°I/O
  PCI
  SDRAM
  SRAM
  IX: PCI-like packet bus
°On chip FIFOs
  16 entries, 64B each

IXP1200 Microengine
°4 hardware contexts
  Single issue processor
  Explicit optional context switch on SRAM access
°Registers
  All are single ported
  Separate GPR
  256*6 = 1536 registers total
°32-bit ALU
  Can access GPR or XFER registers
°Shared hash unit
  1/2/3 values – 48b/64b
  For IP routing hashing
°Standard 5 stage pipeline
°4KB SRAM instruction store – not a cache!
°Barrel shifter

IXP 2400 Block Diagram
°XScale core replaces StrongARM
°Microengines
  Faster
  More: 2 clusters of 4 microengines each
°Local memory
°Next neighbor routes added between microengines
°Hardware to accelerate CRC operations and random number generation
°16 entry CAM
[Figure: block diagram. Labels: ME0 to ME7, Scratch/Hash/CSR, MSF Unit, DDR DRAM controller, QDR SRAM controller, XScale Core, PCI]

Different Types of Memory

Type             Width (bytes)  Size (bytes)  Approx unloaded latency (cycles)  Notes
Local                                                                           Indexed addressing, post incr/decr
On-chip Scratch  4              16K           60                                Atomic ops
SRAM             4              256M          150                               Atomic ops
DRAM             8              2G            300                               Direct path to/from MSF

IXA Software Framework
[Figure: software stack. Labels: External Processors, Control Plane Protocol Stack, Control Plane PDK, XScale Core (C/C++ Language), Core Components, Core Component Library, Resource Manager Library, Microengine Pipeline (Microengine C Language), Micro blocks, Microblock Library, Protocol Library, Hardware Abstraction Library, Utility Library]

Example Toaster System: Cisco
°Almost all data plane operations execute on the programmable XMC
°Pipeline stages are assigned tasks – e.g. classification, routing, firewall, MPLS
  Classic SW load balancing problem
°External SDRAM shared by common pipe stages
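
The load balancing problem arises because a pipeline runs only as fast as its slowest stage. One common way to frame it, used here purely as an illustration and not taken from the Toaster design, is to split an ordered list of tasks into contiguous stages so that the most heavily loaded stage is as light as possible. The function name and the per-task cycle costs below are hypothetical.

```python
def min_bottleneck(task_cycles, num_stages):
    """Smallest achievable worst-stage cost when the tasks are split, in order,
    into at most `num_stages` contiguous pipeline stages.

    `task_cycles` must be a non-empty list of positive integer costs.
    """
    def stages_needed(cap):
        # Greedily pack tasks into stages without exceeding `cap` cycles each.
        count, load = 1, 0
        for cost in task_cycles:
            if load + cost > cap:
                count += 1
                load = cost
            else:
                load += cost
        return count

    # Binary search over the candidate bottleneck value.
    lo, hi = max(task_cycles), sum(task_cycles)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Hypothetical costs for classification, routing, firewall and MPLS.
    costs = [30, 50, 20, 40]
    for stages in (1, 2, 4):
        print(stages, "stages -> bottleneck", min_bottleneck(costs, stages), "cycles")
```

The pipeline accepts one packet per bottleneck interval, so halving the bottleneck doubles throughput even though the total work per packet is unchanged.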

IBM PowerNP
°16 pico-processors and 1 PowerPC
°Each pico-processor
  Supports 2 hardware threads
  3 stage pipeline: fetch/decode/execute
°Dyadic Processing Unit
  Two pico-processors
  2KB shared memory
  Tree search engine
°Focus is layers 2-4
°PowerPC 405 for control plane operations
  16K I and D caches
°Target is OC-48

Motorola C-Port C-5 Chip Architecture

References
°[NPT] W. H. Mangione-Smith, G. Memik, Network Processor Technologies
°[NPRD] Patrick Crowley, Raj Yavatkar, An Introduction to Network Processor Research & Design, HPCA-9 Tutorial