IXP Lab 2012: Part 1 Network Processor Brief
NCKU CSIE CIAL Lab

Outline
- Network Processor
- Intel IXP2400: Processing Element, Register, Memory Interface
- IXP Programming Language: Programming Model, Programming Syntax

Router Development (1)
- Software based: general-purpose processor; flexible; poor performance …
- Hardware based: ASIC; best performance; long development time

Router Development (2)
- Network processor (NPU) based: a balance of both
- How? Parallel processors, multi-threaded cores, and programmable processors with non-programmable coprocessors

Network Processor Overview
- Designed for high-speed packet processing
- Multiple cores for parallel execution
- Multi-threaded cores
- Reduced instruction set
- Multiple memory interfaces

Hierarchical Layers
- Data-Plane: Fast-Path, Slow-Path
- Control-Plane: routing protocols
- Management-Plane: monitor applications, user interface

Data-Plane
- Fast-Path: general packet handling, as fast as possible
- Slow-Path: exception packet handling, e.g. packets with IP options and traffic for the local TCP/IP stack
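As a rough illustration of the fast-path/slow-path split, the C sketch below inspects an IPv4 header and sends packets that carry IP options, or that are addressed to the router itself, to the slow path; everything else stays on the fast path. The function name, the `local_ip` parameter, and the exact exception criteria are assumptions for illustration, not code from the IXA SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical path labels for this sketch. */
enum path { FAST_PATH, SLOW_PATH };

/*
 * Decide which path an IPv4 packet takes (illustrative criteria only).
 * hdr:      start of the IPv4 header
 * len:      number of header bytes available
 * local_ip: the router's own IPv4 address in host byte order (assumption)
 */
enum path classify_ipv4(const uint8_t *hdr, size_t len, uint32_t local_ip)
{
    if (len < 20)
        return SLOW_PATH;                /* truncated header: exception */

    uint8_t version = hdr[0] >> 4;
    uint8_t ihl = hdr[0] & 0x0F;         /* header length in 32-bit words */
    if (version != 4)
        return SLOW_PATH;                /* not IPv4 */
    if (ihl != 5)
        return SLOW_PATH;                /* IP options (ihl > 5) or malformed */

    /* Destination address: bytes 16..19, network byte order. */
    uint32_t dst = ((uint32_t)hdr[16] << 24) | ((uint32_t)hdr[17] << 16) |
                   ((uint32_t)hdr[18] << 8)  |  (uint32_t)hdr[19];
    if (dst == local_ip)
        return SLOW_PATH;                /* destined to the local TCP/IP stack */

    return FAST_PATH;                    /* ordinary forwarding */
}
```

In a real design the fast path would run on the Microengines and hand slow-path packets to the XScale; this sketch only shows the decision itself.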

Internet eXchange Processor (IXP)
- First generation: IXP1200, IXP1240, IXP1250
- Second generation: IXP2400, IXP2800, IXP2850, IXP2805, IXP2855
- Others: IXP4XX

Network Flow Processor
- By Netronome, derived from the Intel IXP2XXX family
- NFP-3240, NFP-3216

Target Development Board
- Radisys ENP-2611

Intel IXP2400 Block Diagram
[Figure: Intel IXP2400 block diagram]

IXP2400 Overview
- Functional blocks: processing elements, memory interfaces, coprocessors, other interfaces
- Hierarchical view

Processing Element
- Programmability
- Hierarchical processing elements: XScale and Microengine (ME)

XScale
- RISC-based processor (ARMv5TE)
- Real-time OS: MontaVista Linux
- ME management: controls ME execution and manages resources

MicroEngine (1)
- Eight MEs per IXP2400, working in parallel
- Eight threads per ME
- The ME instruction set is reduced to what packet processing needs:
  - Not as powerful as a general-purpose processor
  - No floating-point instructions
  - No divide instruction
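Because the ME has no divide instruction, division must either be avoided or done in software. The sketch below shows one standard software approach, shift-and-subtract (restoring) long division, written in plain C for clarity; it is not taken from the IXA SDK, and the function name `soft_udiv` is hypothetical.

```c
#include <stdint.h>

/*
 * Unsigned 32-bit division using only shifts, compares and subtraction
 * (restoring long division, one quotient bit per iteration).
 * Returns the quotient and stores the remainder in *rem.
 * Precondition: divisor != 0 (this sketch does not check it).
 */
uint32_t soft_udiv(uint32_t dividend, uint32_t divisor, uint32_t *rem)
{
    uint32_t quotient = 0;
    uint64_t r = 0;   /* 64-bit so the shift below cannot overflow */

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring the next dividend bit down into the partial remainder. */
        r = (r << 1) | ((dividend >> bit) & 1u);
        if (r >= divisor) {
            r -= divisor;
            quotient |= 1u << bit;
        }
    }
    *rem = (uint32_t)r;
    return quotient;
}
```

The loop costs 32 iterations per division. When the divisor is a power of two, 2^k, the quotient is simply `x >> k` and the remainder `x & (2^k - 1)`, which avoids the loop entirely.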

MicroEngine (2)
- No OS
- Not interactive
- Managed by the XScale
- Code store (4K instructions) holds the executing program

MicroEngine Threads
- Concurrent execution
- Non-preemptive, round-robin execution
- Each thread owns a private set of registers
- Zero-overhead context switching
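The thread-selection policy can be modeled in a few lines of C. This is a behavioral sketch under the assumption that a thread runs until it voluntarily yields and that the arbiter then picks the next ready thread in round-robin order; the struct and function names are hypothetical and do not correspond to ME hardware registers.

```c
#include <stdbool.h>

#define NUM_CTX 8   /* eight threads per ME, as on the slide */

/* Hypothetical per-thread state for this model. */
struct ctx {
    bool waiting;   /* true while the thread waits for a memory signal */
};

/*
 * Non-preemptive round-robin arbiter: called only when the running thread
 * gives up the ME (for example after issuing a memory request). Scans the
 * contexts in order starting just after `current`, wrapping around so that
 * `current` itself is checked last, and returns the first ready one.
 * Returns -1 if every context is waiting.
 */
int next_context(const struct ctx ctxs[NUM_CTX], int current)
{
    for (int step = 1; step <= NUM_CTX; step++) {
        int candidate = (current + step) % NUM_CTX;
        if (!ctxs[candidate].waiting)
            return candidate;
    }
    return -1;
}
```

Because each thread keeps its own registers, the switch itself needs no save or restore in this model; the arbiter only has to choose an index.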

Registers of ME
- 256 general-purpose registers (GPRs)
- 256 SRAM transfer registers (128 read, 128 write)
- 256 DRAM transfer registers (128 read, 128 write)
- 128 next-neighbor registers
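If the GPRs and transfer registers are split evenly among the eight threads (an assumption for this sketch, corresponding to each thread using only its own private registers), each thread gets 256 / 8 = 32 GPRs and 128 / 8 = 16 registers of each transfer-register type. The snippet below only encodes that arithmetic; the macro and function names are hypothetical.

```c
/* Per-ME register counts from the slide. */
#define ME_GPRS         256
#define ME_SRAM_XFER_RD 128
#define ME_SRAM_XFER_WR 128
#define ME_DRAM_XFER_RD 128
#define ME_DRAM_XFER_WR 128

/*
 * Number of registers each thread gets when a register file of `total`
 * entries is split evenly among `threads` contexts (assumed even split).
 */
int regs_per_thread(int total, int threads)
{
    return total / threads;
}
```

Under the same assumption, running fewer threads gives each one a proportionally larger share (for example 64 GPRs per thread with four threads).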

Context Switch
- Register contents do not need to be swapped out and back in during a context switch.
- With this mechanism, when a thread swaps out after issuing a memory request, another thread can swap in and do useful work, hiding the long memory latency.
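To see why this matters, consider an idealized model (an assumption, not a measured result): each thread does `compute` cycles of work, then waits `latency` cycles for one memory access, and repeats. A single thread keeps the ME busy only compute / (compute + latency) of the time, and N threads can overlap their waits until the ME is saturated. The function name is hypothetical.

```c
/*
 * Idealized ME utilization when each of `threads` contexts alternates
 * between `compute` cycles of work and one memory access of `latency`
 * cycles. Assumes zero-overhead switching and ignores memory contention
 * and arbitration conflicts. Result is capped at 1.0 (ME fully busy).
 */
double me_utilization(int threads, int compute, int latency)
{
    double u = (double)threads * compute / (compute + latency);
    return u > 1.0 ? 1.0 : u;
}
```

For example, with a hypothetical 30 cycles of work per SRAM access and the ~90-cycle SRAM latency quoted on the SRAM slide, one thread leaves the ME idle 75% of the time, while four or more threads can in principle keep it fully busy.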

Memory Interface of IXP2400
- Local Memory: smallest and fastest
- Scratchpad: passes packet handles
- SRAM: holds data structures for packet processing
- DRAM: largest and slowest; holds packet contents

Local Memory
- One per ME; private, not accessible by other MEs or by the XScale
- Size: 2560 bytes (640 longwords)
- Usage: variable spilling, caching
- Latency: 3 cycles

Scratchpad
- On-chip memory, shared by all MEs
- Size: 16 KB (fixed)
- Usage: scratchpad storage, scratch rings (hardware FIFO)
- Latency: ~60 cycles

SRAM
- Off-chip memory, shared by all MEs (2 channels)
- Size: up to 64 MB per channel
- Usage: hardware FIFOs, data structures, packet meta-data
- Latency: ~90 cycles

DRAM
- Off-chip memory, shared by all MEs (1 channel)
- Size: up to 1 GB
- Usage: whole packet contents; alternative space for data structures
- Latency: ~120 cycles
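The size/latency trade-off across the four memory levels can be summarized in a small table. The C sketch below records the figures from the slides and applies a hypothetical placement heuristic, "use the fastest level that can hold the data"; the struct, table, and function names are assumptions for illustration, not an SDK facility.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory levels of the IXP2400, with the figures given on the slides. */
struct mem_level {
    const char *name;
    uint64_t    max_bytes;   /* capacity at maximum configuration */
    int         latency;     /* approximate access latency in cycles */
};

static const struct mem_level levels[] = {
    { "Local Memory", 2560,                      3 },  /* per ME */
    { "Scratchpad",   16u * 1024,               60 },
    { "SRAM",         2ull * 64 * 1024 * 1024,  90 },  /* 2 channels x 64 MB */
    { "DRAM",         1ull << 30,              120 },
};

/*
 * Hypothetical placement heuristic: return the fastest level whose
 * capacity can hold `bytes`, or NULL if nothing fits.
 */
const struct mem_level *fastest_fit(uint64_t bytes)
{
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        if (bytes <= levels[i].max_bytes)
            return &levels[i];
    }
    return NULL;
}
```

Capacity is only one factor: local memory is private to one ME, and packet contents go to DRAM regardless of size because the MSF receives packets into DRAM.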

Coprocessors
- MSF (Media Switch Fabric): receives packets into DRAM and transmits packets from DRAM
- SHaC: Scratchpad, Hash unit, CAP

Packet META-DATA (1)
- Data used for processing packets
- How is a packet identified? By its packet handle
- Temporary per-packet information, unrelated to the packet content
- Meta-data examples: input port, output port, the packet's address in DRAM
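A minimal sketch of what such meta-data might look like in C is shown below. The slide only names the input port, output port, and DRAM address; the `length` field, the field widths, and the choice of a packet handle as a plain table index are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical per-packet meta-data record.
 * Only dram_addr, in_port and out_port come from the slide.
 */
struct pkt_meta {
    uint32_t dram_addr;   /* where the packet content lives in DRAM */
    uint16_t length;      /* packet length in bytes (assumed field) */
    uint8_t  in_port;     /* port the packet arrived on */
    uint8_t  out_port;    /* port chosen by processing */
};

/*
 * Hypothetical packet handle: here simply the index of the packet's
 * meta-data record in a table, so only 32 bits need to move between MEs
 * instead of the packet itself.
 */
typedef uint32_t pkt_handle;

/* Look up the meta-data record that a handle refers to. */
struct pkt_meta *meta_from_handle(struct pkt_meta *table, pkt_handle h)
{
    return &table[h];
}
```

The point of the indirection is that stages pass around the small handle, while the meta-data stays in SRAM and the packet body stays in DRAM.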

Packet META-DATA (2)
- How is this information passed between MEs? Through hardware FIFOs: scratch ring, SRAM ring, next-neighbor ring
- Issues
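Hardware rings such as the scratch ring are fixed-size circular FIFOs: one ME puts packet handles in, another takes them out. The C model below imitates that behavior in software to show the full/empty logic. On the IXP2400 the ring is implemented in hardware and accessed through SDK intrinsics, so this is an illustration, not the hardware interface; `RING_SIZE` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8   /* must be a power of two in this sketch */

/* Software model of a fixed-size circular FIFO of packet handles. */
struct ring {
    uint32_t slot[RING_SIZE];
    uint32_t head;   /* count of items taken out (read position) */
    uint32_t tail;   /* count of items put in (write position) */
};

/* Append a handle; returns false if the ring is full. */
bool ring_put(struct ring *r, uint32_t handle)
{
    if (r->tail - r->head == RING_SIZE)
        return false;                          /* full: producer must retry */
    r->slot[r->tail & (RING_SIZE - 1)] = handle;
    r->tail++;
    return true;
}

/* Remove the oldest handle; returns false if the ring is empty. */
bool ring_get(struct ring *r, uint32_t *handle)
{
    if (r->tail == r->head)
        return false;                          /* empty */
    *handle = r->slot[r->head & (RING_SIZE - 1)];
    r->head++;
    return true;
}
```

This model has no locking and assumes a single producer and a single consumer; unsigned wraparound of `head` and `tail` keeps `tail - head` correct even after the counters overflow.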

Hierarchical View (Setting #1): a single IXP2400-based board
- Data-Plane: Fast-Path on the Microengines, Slow-Path on the XScale
- Control-Plane: XScale
- Management-Plane: XScale

Hierarchical View (Setting #2): multiple IXP2400-based boards
- Data-Plane: Fast-Path on the Microengines, Slow-Path on the XScale
- Control-Plane: CPU
- Management-Plane: CPU

Programming IXP2400
- XScale: programming in C
- Microengine: programming in MicroC or microcode (this part is our focus)

IDE Tool: IXA SDK Workbench

ME Language
- MicroC: a subset of ANSI C; only a limited part of the standard C library is implemented; an intrinsic library supports IXP-specific operations
- Microcode: a high-level assembly language

Programming Model (1)
- Receive, Processing, Transmit (RX → PROCESSING → TX)
- Intel provides sample code for receive and transmit, so we focus only on processing.

Programming Model (2)
- The processing MEs (RX → PROCESSING → TX) can be organized as a pipeline model, a parallel model, or a mixed model

Pipeline Model (RX → PROC #1 → PROC #2 → TX)
- Each stage controls all the resources of its ME
- Hard to balance the load between stages

Parallel Model (RX → PROC #1 and PROC #2 in parallel → TX)
- Load balancing is easy
- Higher performance
- Resources are limited, since each ME must hold the whole processing task

Mixed Model (RX, PROC #1, PROC #2, PROC #3, TX: pipeline and parallel stages combined)
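The balance trade-off between the pipeline and parallel models can be made concrete with a back-of-the-envelope calculation. The C sketch below assumes an idealized setting (no memory contention, RX/TX not a bottleneck, free inter-ME communication) and hypothetical per-stage cycle counts; the function names are illustrative only.

```c
/*
 * Idealized cycles-per-packet for the two processing models, given the
 * per-packet work of each processing stage in ME cycles.
 * Ignores memory contention, RX/TX limits and inter-ME communication.
 */

/* Pipeline: one stage per ME, so the slowest stage sets the rate. */
int pipeline_cycles_per_packet(const int stage[], int n)
{
    int worst = 0;
    for (int i = 0; i < n; i++)
        if (stage[i] > worst)
            worst = stage[i];
    return worst;
}

/* Parallel: each of `mes` MEs runs every stage for its own packets. */
double parallel_cycles_per_packet(const int stage[], int n, int mes)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += stage[i];
    return (double)total / mes;
}
```

With unbalanced stages of 100 and 300 cycles, a two-ME pipeline delivers one packet every 300 cycles, while two MEs each running the full task average one packet every 200 cycles; with balanced stages of 200 and 200 cycles the two models tie. Under these idealizations, that is the "hard to balance" versus "balance is easy" contrast on the slides.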

MicroC Example 1 (1)

    void reverse_array(volatile int *old, volatile int *new, int size);

    void main()
    {
        /* Source and destination arrays, shared and placed in SRAM */
        __declspec(shared sram) int old_array[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        __declspec(shared sram) int new_array[sizeof(old_array) / sizeof(int)];

        global_label("start_reverse");
        reverse_array(old_array, new_array, sizeof(old_array) / sizeof(int));
        global_label("end_reverse");
    }

MicroC Example 1 (2)

    void reverse_array(volatile int *old, volatile int *new, int size)
    {
        int index;
        /* Copy in reverse order: new[0] = old[size - 1], ..., new[size - 1] = old[0] */
        for (index = 0; index < size; index++) {
            new[index] = old[size - index - 1];
        }
    }
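To check the logic of this example on a host machine before running it on an ME, the function can be written in portable C. This is a sketch for host-side testing that assumes ordinary memory instead of SRAM: it drops the `volatile` and `__declspec` qualifiers and renames the parameter `new` to `new_arr` so the code also compiles as C++.

```c
/*
 * Portable host-side version of reverse_array.
 * Writes old[size - 1 - i] into new_arr[i] for every i in [0, size).
 */
void reverse_array(const int *old, int *new_arr, int size)
{
    for (int index = 0; index < size; index++)
        new_arr[index] = old[size - index - 1];
}
```

Applied to the ten-element array from the previous slide, the result is 10, 9, ..., 1.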

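MicroC Example 2

    /* Issue an asynchronous read of 2 longwords (one 8-byte node) from SRAM
       into sram_egt_dim1_2_node; sram_read_sig_dim1_2 is raised when done. */
    sram_read(&sram_egt_dim1_2_node,
              (__declspec(sram) unsigned int *)(PACKET_CLASSIFICATION_SRAM_BASE1 + current * 8),
              2, sig_done, &sram_read_sig_dim1_2);

    /* Swap out until the read completes; other threads can run meanwhile. */
    __wait_for_all(&sram_read_sig_dim1_2);

    /* The node has arrived; follow its next_dim field. */
    temp = sram_egt_dim1_2_node.next_dim;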
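The address expression `PACKET_CLASSIFICATION_SRAM_BASE1 + current*8` treats the table as an array of 8-byte nodes, two longwords each. The host-side C sketch below models that arithmetic with an ordinary byte buffer standing in for SRAM. It does not model the asynchronous `sram_read` and signal mechanism, and the struct layout beyond the `next_dim` field (including its position as the first longword) is an assumption.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical node layout: two 32-bit longwords (8 bytes).
 * Only next_dim appears on the slide; its position and the second
 * field are placeholders for this sketch.
 */
struct dim_node {
    uint32_t next_dim;   /* index of the next node to visit */
    uint32_t value;      /* hypothetical second longword */
};

/* Copy the 8-byte node at index `current` out of a byte-addressed SRAM image. */
struct dim_node read_node(const uint8_t *sram_base, uint32_t current)
{
    struct dim_node node;
    memcpy(&node, sram_base + current * 8, sizeof node);   /* 2 longwords */
    return node;
}
```

Following `next_dim` repeatedly walks the structure node by node; on the ME, each step costs one SRAM round trip, which is exactly the latency that multithreading is meant to hide.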

1A. Copy IXA_SDK_3.51 and ixp_book to D:\
1B. Download … and …, then extract them.
2. Reboot.
3. Press [Ctrl+Enter] to enter the restore card manager mode.
4. Password: davidchang

1. Copy IXA_SDK_3.51 and ixp_book to D:\, then reboot.
3. Press [Ctrl+Enter] to enter the restore card manager mode.
4. Password: davidchang
5. Extract ixasdk351cd1windows.zip, ixasdk351cd3.zip, and ixasdk351framework.zip, then install them in that order (a reboot is required after installing cd1).
6. Copy the ixp_book directory to C:\