Introduction Introduction 1/9 - 2006 INF5062: Programming asymmetric multi-core processors.

Slides:



Advertisements
Similar presentations
Nios Multi Processor Ethernet Embedded Platform Final Presentation
Advertisements

CCNA3: Switching Basics and Intermediate Routing v3.0 CISCO NETWORKING ACADEMY PROGRAM Switching Concepts Introduction to Ethernet/802.3 LANs Introduction.
4. Shared Memory Parallel Architectures 4.4. Multicore Architectures
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Lecture 6: Multicore Systems
Instructor Notes We describe motivation for talking about underlying device architecture because device architecture is often avoided in conventional.
Introduction Introduction Håkon Kvale Stensland August 26 th, 2011 INF5063: Programming heterogeneous multi-core processors.
Introduction Introduction Håkon Kvale Stensland August 19 th, 2012 INF5063: Programming heterogeneous multi-core processors.
Introduction Introduction Håkon Kvale Stensland August 28 th, 2012 INF5063: Programming heterogeneous multi-core processors.
Cell Broadband Engine. INF5062, Carsten Griwodz & Pål Halvorsen University of Oslo Cell Broadband Engine Structure SPE PPE MIC EIB.
IXP: Bump in the Wire IXP: Bump in the Wire INF5063: Programming Asymmetric Multi-Core Processors 15 April 2015.
Introduction Introduction 28/ INF5063: Programming heterogeneous multi-core processors.
Khaled A. Al-Utaibi  Computers are Every Where  What is Computer Engineering?  Design Levels  Computer Engineering Fields  What.
IXP: The Bump in the Wire IXP: The Bump in the Wire INF5062: Programming Asymmetric Multi-Core Processors 22 April 2015.
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
Chapter 8 Hardware Conventional Computer Hardware Architecture.
Introduction to Operating Systems CS-2301 B-term Introduction to Operating Systems CS-2301, System Programming for Non-majors (Slides include materials.
1 Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles.
I/O Channels I/O devices getting more sophisticated e.g. 3D graphics cards CPU instructs I/O controller to do transfer I/O controller does entire transfer.
Performance Analysis of the IXP1200 Network Processor Rajesh Krishna Balan and Urs Hengartner.
Memory: Virtual MemoryCSCE430/830 Memory Hierarchy: Virtual Memory CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng Zhu.
Trend towards Embedded Multiprocessors Popular Examples –Network processors (Intel, Motorola, etc.) –Graphics (NVIDIA) –Gaming (IBM, Sony, and Toshiba)
ECE 526 – Network Processing Systems Design IXP XScale and Microengines Chapter 18 & 19: D. E. Comer.
ECE 526 – Network Processing Systems Design
GCSE Computing - The CPU
CPU Chips The logical pinout of a generic CPU. The arrows indicate input signals and output signals. The short diagonal lines indicate that multiple pins.
Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles.
Router Architectures An overview of router architectures.
Router Architectures An overview of router architectures.
Module I Overview of Computer Architecture and Organization.
1 Instant replay  The semester was split into roughly four parts. —The 1st quarter covered instruction set architectures—the connection between software.
A Scalable, Cache-Based Queue Management Subsystem for Network Processors Sailesh Kumar, Patrick Crowley Dept. of Computer Science and Engineering.
Emotion Engine A look at the microprocessor at the center of the PlayStation2 gaming console Charles Aldrich.
Paper Review Building a Robust Software-based Router Using Network Processors.
Computer System Architectures Computer System Software
Cell Architecture. Introduction The Cell concept was originally thought up by Sony Computer Entertainment inc. of Japan, for the PlayStation 3 The architecture.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
Introduction Introduction 2/ INF5061: Multimedia data communication using network processors.
LOGO BUS SYSTEM Members: Bui Thi Diep Nguyen Thi Ngoc Mai Vu Thi Thuy Class: 1c06.
Computer Graphics Graphics Hardware
Chapter 2 The CPU and the Main Board  2.1 Components of the CPU 2.1 Components of the CPU 2.1 Components of the CPU  2.2Performance and Instruction Sets.
CSE 58x: Networking Practicum Instructor: Wu-chang Feng TA: Francis Chang.
Kevin Eady Ben Plunkett Prateeksha Satyamoorthy.
I/O Computer Organization II 1 Interconnecting Components Need interconnections between – CPU, memory, I/O controllers Bus: shared communication channel.
Lab Assignment 15/ INF5060: Multimedia data communication using network processors.
CHAPTER 4 The Central Processing Unit. Chapter Overview Microprocessors Replacing and Upgrading a CPU.
Computer Organization & Assembly Language © by DR. M. Amer.
ECE 526 – Network Processing Systems Design Computer Architecture: traditional network processing systems implementation Chapter 4: D. E. Comer.
An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.
XStream: Rapid Generation of Custom Processors for ASIC Designs Binu Mathew * ASIC: Application Specific Integrated Circuit.
ECE 526 – Network Processing Systems Design Network Processor Introduction Chapter 11,12: D. E. Comer.
Chapter 13 – I/O Systems (Pgs ). Devices  Two conflicting properties A. Growing uniformity in interfaces (both h/w and s/w): e.g., USB, TWAIN.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
1 Lecture 1: Computer System Structures We go over the aspects of computer architecture relevant to OS design  overview  input and output (I/O) organization.
3/12/2013Computer Engg, IIT(BHU)1 PARALLEL COMPUTERS- 2.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
Introduction Introduction September 26, 2016 INF5063: Programming heterogeneous multi-core processors.
Computer Graphics Graphics Hardware
GCSE Computing - The CPU
Cell Architecture.
Architecture & Organization 1
Architecture & Organization 1
Microprocessor & Assembly Language
Network Core and QoS.
Overview of Computer Architecture and Organization
Constructing a system with multiple computers or processors
Chapter 1 Introduction.
Computer Graphics Graphics Hardware
GCSE Computing - The CPU
Network Core and QoS.
Presentation transcript:

Introduction Introduction 1/ INF5062: Programming asymmetric multi-core processors

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Overview Course topic and scope Examples of asymmetric processors (Very) short intro of the Intel IXP 2400 network processors

INF5062: The Course

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors People Morten Pedersen (TA) ifi Carsten Griwodz ifi Pål Halvorsen ifi

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors About INF5062: Topic & Scope Content: The course gives …  … an overview of asymmetric multi-core processors in general and network processor cards in particular (architectures and use)  … an introduction of how to program the Intel IXP 2400 network processors  … some ideas of how to use/program asymmetric multi- core processors (guest lectures and paper presentations)

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors About INF5062: Topic & Scope Lab-assignments: An important part of the course are lab-assignments where the students should make a program for the Intel IXP2400 network processor 1. protocol statistics – download and run wwpingbump and then extend it to give processor, interface and protocol statistics 2. packet bridge with ARP support – forward packet to correct interface (of 3 available) 3. transparent load balancer – balance load and forward packets to the right machine in a cluster of two with same IP address 4. HTTP protocol translator – add support in the transparent load balancer for HTTP streaming having an RTSP/RTP server

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors About INF5062: Exam Prerequisite – mandatory assignments:  lab assignment 1: protocol statistics  presentation of a relevant paper Graded assignments (counting 33% each):  lab assignment 3: transparent load balancer deliver code short demo/explanation of code  lab assignment 4: HTTP protocol translator deliver code and a short report present and demonstrate Final oral exam (counting 33%): early December 2006  excerpts IXP documentation  lecture slides  presented papers  content of lab assignments

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Available Resources Resources will be placed at  | griff}/INF5062  Login:inf5062  Password: ixp  Manuals, papers, code example, …

Background and Motivation 1: Graphics Processing Units

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Graphics Processing Units (GPUs) buss connector & memory hub GPU: a dedicated graphics rendering device 2D 3D New powerful GPUs, e.g.,: Nvidia GeForce 7950 GX2 dual 400 MHz core each with 512 MB memory memory BW: 77 GBps fill rate: 24 x 10 9 pixels/s similar from other manufacturers … First GPUs, 80s: for early 2D operations Amiga and Atari used a blitter – bit block transfer (to offload memory graphics transfers) and Amiga also had the copper graphics processor 90s: 3D hardware for game consoles like PS and N64

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors General Purpose Computing on GPU The  high arithmetic precision  extreme parallel nature  optimized, special-purpose instructions  available resources  … … of the GPU allows for general, non-graphics related operations to be performed on the GPU BUT: how should it be programmed and which tasks should go where?

Background and Motivation 2: Moore’s Law for single cores

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Motivation: Intel View Soon >billion transistors integrated

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Motivation: Intel View Soon >billion transistors integrated Clock frequency can still increase

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Motivation: Intel View Soon >billion transistors integrated Clock frequency can still increase Future applications will demand TIPS

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Motivation: Intel View Soon >billion transistors integrated Clock frequency can still increase Future applications will demand TIPS Power? Heat?

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Motivation “Future applications will demand TIPS” “Think platform beyond a single processor” “Exploit concurrency at multiple levels” “Power will be the limiter due to complexity and leakage” Distributed workload on multiple cores + simple processors consume less energy  asymmetric multi-core processors

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Co-Processors Commodore Amiga was one of the earlier machines that used multiple processors  blitter & copper (as we saw for the GPUs)  Motorola 680x0  IBM power  old Motorola kept for backwards compatibility and parts of the OS not ported The original IBM PC included a socket for an Intel 8087 floating point co-processor (FPU)  50-fold speed up of floating point operations Intel kept the co-processor up to i486, where the i487 actually was a full 486DX knocking the 486SX to sleep.

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Intel Multi-Core Processors Single-die multi-processors is the new era and the next logical step in driving Moore’s Law into another decade Highly parallel Moderately parallel Sequential

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Intel Multi-Core Processors The operating systems can handle only a limited number of threads, e.g., 64 in Windows (2005) Where does the doubling stop? How many applications need more than 64 threads? Performance?  software limits is the issue Application specific engines?  Intel Academic Forum 2006: YES  But: Programming model?????

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors STI (Sony, Toshiba, IBM) Cell Cell is a 9-core processor  combining a light-weight general- purpose processor with multiple co-processors into a coordinated whole  Power Processing Element (PPE) conventional Power processor not supposed to perform all operations itself, acting like a controller running conventional OSes 16 KB instruction/data level 1 cache 512 KB level 2 cache

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors STI (Sony, Toshiba, IBM) Cell  Synergistic Processing Elements (SPE) specialized co-processors for specific types of code, i.e., very high performance vector processors local stores can do general purpose operations the PPE can start, stop, interrupt and schedule processes running on an SPE  Element Interconnect Bus (EIB) internal communication bus connects on-chip system elements: o PPE & SPEs o the memory controller (MIC) o two off-chip I/O interfaces 25.6 GBps each way

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors STI (Sony, Toshiba, IBM) Cell  memory controller Rambus XDRAM interface to Rambus XDR memory dual channels at 12.8 GBps  25.6 GBps  I/O controller Rambus FlexIO interface which can be clocked independently dual configurable channels maximum ~ 76.8 GBps

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors STI (Sony, Toshiba, IBM) Cell  Cell has in essence traded running everything at moderate speed for the ability to run certain types of code at high speed  used for example in Sony PlayStation 3: o 3.2 GHz clock o 7 SPEs for general operations o 1 SPE for security for the OS Toshiba home cinema: o decoding of 48 HDTV MPEG streams  dozens of thumbnail videos simultaneously on screen IBM blade centers: o 3.2 GHz clock o Linux ≥

Background and Motivation 3: Network traffic increase

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Uses conventional, shared hardware (e.g., a PC) Software  runs the entire system  allocates memory  controls I/O devices  performs all protocol processing First generation network systems: Software-Based Network System

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Review of General Data Path on Conventional Computer Hardware Architectures communication system application user space kernel space sending: communication system application receiving: communication system application forwarding: transport (TCP/UDP) network (IP) link Checksumming Copying Fragmentation Interrupts

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Question: Which is growing faster?  network bandwidth  processing power Note: if network bandwidth is growing faster  CPU may be the bottleneck  need special-purpose hardware  conventional hardware will become irrelevant Note: if processing power is growing faster  no problems with processing  network/busses will be bottlenecks

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Growth Of Technologies Mbps year Engineering rule: 1GHz general purpose CPU = 1Gbps network data rate Thus, software running on a general-purpose processor is insufficient to handle high-speed networks because the aggregate packet rate exceeds the capabilities of the CPU

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Network Processors: The Idea in a Nutshell Many designs through many generations (varying amount of HW & SW) Include support for protocol processing and I/O on one chip  General-purpose processor(s) for control tasks  Special-purpose processor(s) for packet processing and table lookup  Include functional units for tasks such as checksum computation, hashing, …  Call the result a network processor We are here – they exist  BUT: programming model??

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Network Processors: Main Idea Traditional system: - slow - resource demanding - shared with other operations Network processors: - a computer within the computer - special, programmable hardware - offloads host resources

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Explosion of Commercial Products 1990  2000: network processors transformed from interesting curiosity to mainstream product rreduction in both overall costs and time to market 22002: over 30 vendors with a vide range of architectures ee.g., Multi-Chip Pipeline (Agere) Augmented RISC Processor (Alchemy) Embedded Processor Plus Coprocessors (Applied Micro Circuit Corporation) Pipeline of Homogeneous Processors (Cisco) Pipeline of Heterogeneous Processors (EZchip) Configurable Instruction Set Processors (Cognigine) Extensive And Diverse Processors (IBM) Flexible RISC Plus Coprocessors (Motorola) Internet Exchange Processor (Intel) …

Agere P PP PayloadPlus: A Short Overview

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors Agere PayloadPlus (APP)  consists of both programmable hardware and software  consists of both data and control planes (i.e., slow and fast plane) APP defines HW architectures, SW mechanisms, interconnection mechanisms and interfaces, BUT does not specify how to implement them Several versions of APP exist differing in the number and types of functional units, degree of parallelism and internal bandwidth (2. generation: 5 models)

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP Conceptual Pipeline Classifier  extract packets from ingress  classify packet  send statistics to state engine  reassemble blocks  pass packet to forwarder together with classification decision State engine  initiate, configure and control classifier and traffic manager  receives control from classifier  update statistics (e.g., packet count)  check packets against profiles (and inform classifier) Forwarder  get packet from classifier  perform traffic shaping and management  fragment packet (if necessary)  modify headers (if necessary)

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip Memory interfaces: - two types of physical memory - fast cycle RAM (FCRAM) for fast memory accesses - double data rate SRAM (DDR-SRAM) for high throughput - the different memory types are usually used like this:

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip Media interfaces: - several to form fast data paths - two external connections: - cell-oriented (ATM) - packet-oriented (Ethernet)

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip PCI bus interfaces: - allows communication with host CPU - mainly to control the whole operation Scheduling interface interfaces: - an external scheduling interface - external logic can use information about queues

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip Coprocessor interfaces: - APP550 should be able to process a packet - BUT, to accommodate special cases, e.g., adding additional headers a co-processor interface is provided

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Chip Pattern Processing Engine - patterns specified by programmer - programmable using a special high-level language - only pattern matching instructions - parallelism by hardware using multiple copies and several sets of variables - access to different memories State Engine - gather information (statistics) for scheduling - verify flow within bounds - provide an interface to the host - configure and control other functional units Traffic Manager - schedule packets and shape traffic flow - programmable via scripts - sends packets to output interface - according to implemented policy: - discard packets - choose queue Packet (protocol data unit) assembler - collect all blocks of a frame - not programmable Stream Editor (SED) - two parallel engines - modify outgoing packets (e.g., checksum, TTL, …) - configurable, but not programmable Reorder Buffer Manager - transfers data between classifier and traffic manager - ensure packet order due to parallelism and variable processing time in the pattern processing

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors APP550 Full Duplex Clock rate for APP550 is 233 MHz One chip cannot manage packet at wire speed in both directions – often two in parallel (one each direction)  overhead: all features needed in both direction?  classification only one direction  checks outgoing packets and enqueues using special queue

Intel I II IXP1200 / 2400: A Short Overview

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXA: Internet Exchange Architecture IXA is a broad term to describe the Intel network architecture (HW & SW, control- & data plane) IXP: Internet Exchange Processor  processor that implements IXA  IXP1200 is the first IXP chip (4 versions)  IXP2xxx has now replaced the first version IXP1200 basic features  1 embedded 232 MHz StrongARM  6 packet 232 MHz µengines  onboard memory  4 x 100 Mbps Ethernet ports  multiple, independent busses  low-speed serial interface  interfaces for external memory and I/O busses  … IXP2400 basic features  1 embedded 600 MHz XScale  8 packet 600 MHz µengines  3 x 1 Gbps Ethernet ports  …

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200 Architecture Serial line: - connects to the RISC - intended for control and management - rate 38 Kbps PCI bus: - allow IXP to connect to I/O devices - enable use of host CPU - rate 2.2 Gbps IX (Intel eXchange) bus: - enable higher rates compared to PCI - form fast path (IXP and high-speed interfaces) - interface to other IXP cards Gbps SRAM bus: - shared bus (several external units) - usually control rather than data - rate 3.71 Gbps SDRAM bus: - provide access to external SDRAM memory used to store packets - can also pass addresses, control/store operations, etc. - rate 7.42 Gbps

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200 Architecture RISC processor: - StrongARM running Linux - control, higher layer protocols and exceptions MHz Microengines: - low-level devices with limited set of instructions - transfers between memory devices - packet processing MHz Access units: - coordinate access to external units Scratchpad: - on-chip memory - used for IPC and synchronization

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200 Processor Hierarchy General-Purpose Processor: - used for control and management - running general applications RISC processor: - chip configuration interface (serial line) - control, higher layer protocols and exceptions I/O processors (microengines): - transfers between memory devices - packet processing Coprocessors: - real-time clock and timers - IX bus controller - hashing unit -... Physical interface processors: - implement layer 1 & 2 processing

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200 Memory Hierarchy

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200 Memory Hierarchy Different memory types… …are organized into different addressable data units (words or longwords) …have different access times …connected to different busses Therefore, to achieve optimal performance, programmers must understand the organization and allocate items from the appropriate type

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP1200  IXP2400 SRAM FLASH MEMORY MAPPED I/O DRAM SRAM access SDRAM access SCRATCH memory PCI access IX access Embedded RISK CPU (StrongARM) PCI bus IX bus DRAM bus SRAM bus microengine 2 microengine 1 microengine 5 microengine 4 microengine 3 microengine 6 multiple independent internal buses IXP1200

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP2400 IXP2400 Architecture microengine 8 SRAM coprocessor FLASH DRAM SRAM access SDRAM access SCRATCH memory PCI access MSF access Embedded RISK CPU (XScale) PCI bus receive bus DRAM bus SRAM bus microengine 2 microengine 1 microengine 5 microengine 4 microengine 3 multiple independent internal buses slowport access … transmit bus RISC processor: - StrongArm  XScale MHz  600 MHz Microengines - 6  MHz  600 MHz Media Switch Fabric - forms fast path for transfers - interconnect for several IXP2xxx Receive/transmit buses - shared bus  separate busses Slowport - shared inteface to external units - used for FlashRom during bootstrap Coprocessors - hash unit - 4 timers - general purpose I/O pins - external JTAG connections (in-circuit tests) - several bulk cyphers (IXP2850 only) - checksum (IXP2850 only) - …

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP2400 Architecture Memory  generally more of everything  generally larger gap between CPUs and memory access in terms of cycles  local memory on each microengine saving temporary results private per packet processor small (2560 bytes) low latency (one cycle) accessed through special registers

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors IXP2400 Basic Packet Processing microengine 8 SRAM coprocessor FLASH DRAM SRAM access SDRAM access SCRATCH memory PCI access MSF access Embedded RISK CPU (XScale) PCI bus receive bus DRAM bus SRAM bus microengine 2 microengine 1 microengine 5 microengine 4 microengine 3 multiple independent internal buses slowport access … transmit bus

2006 Carsten Griwodz & Pål HalvorsenINF5062 – programming asymmetric multi-core processors The End: Summary Asymmetric multi-core processors are already  Challenge: programming  should know the capabilities of the system  identify which parts of a program that should run where  different methods to program the different components We will use Intel IXP2400 as an example which offers…  …embedded processor plus parallel packet processors  …connections to external memories and buses Next time: how to start programming these monsters