ECE 526 – Network Processing Systems Design
Network Processor Introduction
Chapters 11, 12: D. E. Comer

Ning Weng, ECE 526

Goal
Understand the inefficiency of 1st, 2nd and 3rd generation network processing systems
  ─ Scalability plus flexibility
Recognize the need for a new solution: the 4th generation (network processor technology)
Learning
  ─ Courage to appreciate the challenges
  ─ Skill to characterize the "real" problem
  ─ Art to propose an engineering solution
Be aware that "network processor" is a conceptual and general term

Recall: 1st Generation Network Processing System
Feasibility study
  ─ Design a software router with a data rate of 10 Gbps
      Assume small packets (64 B)
      Assume each packet needs 10,000 instructions to process
  ─ Can an Intel processor do the job?
      CPU: 24 GHz; 125,000 MIPS (million instructions per second); 1 billion transistors …
  ─ Conclusion: not feasible
What is the real problem here?
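
The feasibility arithmetic can be made explicit. This sketch assumes back-to-back 64-byte packets with no Ethernet framing overhead (preamble and inter-frame gap are ignored) and uses the slide's 125,000 MIPS figure. A 10 Gbps link then delivers 10e9 / 512 ≈ 19.5 million packets per second, one every 51.2 ns. At 10,000 instructions each, that is about 195,000 MIPS, while the CPU can afford only 6,400 instructions per packet.

```python
def packet_rate(link_bps: float, packet_bytes: int) -> float:
    """Packets per second on a fully loaded link (framing overhead ignored)."""
    return link_bps / (packet_bytes * 8)


def required_mips(link_bps: float, packet_bytes: int, instr_per_packet: int) -> float:
    """Millions of instructions per second needed to keep up with the link."""
    return packet_rate(link_bps, packet_bytes) * instr_per_packet / 1e6


def instruction_budget(cpu_mips: float, link_bps: float, packet_bytes: int) -> float:
    """Instructions the CPU can spend on each packet at line rate."""
    return cpu_mips * 1e6 / packet_rate(link_bps, packet_bytes)


if __name__ == "__main__":
    link, size, work, cpu = 10e9, 64, 10_000, 125_000
    print(f"arrival interval: {1e9 / packet_rate(link, size):.1f} ns per packet")
    print(f"needed: {required_mips(link, size, work):,.0f} MIPS, available: {cpu:,} MIPS")
    print(f"budget: {instruction_budget(cpu, link, size):,.0f} instructions per packet")
```

Comparing MIPS needed with MIPS available, or instructions needed with instructions affordable per packet, gives the same conclusion: not feasible.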

The Real Problem
Technology push: uneven
  ─ Link bandwidth is scaling much faster than CPU and memory technology
  ─ Transistor scaling and VLSI technology help, but not enough
Application pull: harder
  ─ More complex applications are required
  ─ Processing complexity is defined as the number of instructions and the number of memory accesses needed to process one packet
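
Because processing complexity counts both instructions and memory accesses, a crude cost model adds compute cycles to memory stall cycles. This is a minimal sketch under several assumptions: one cycle per instruction, the 100-cycle off-chip latency quoted on the Memory Architecture slide for every access, no latency hiding, and work that divides perfectly across cores. The 1,000-instruction, 20-access packet and the 1 GHz clock are hypothetical numbers chosen for illustration.

```python
import math


def cycles_per_packet(instructions: int, mem_accesses: int,
                      cpi: float = 1.0, mem_latency: int = 100) -> float:
    """Serial cost of one packet: compute cycles plus memory stall cycles.

    Assumes every memory access stalls the core for the full latency
    (no caching and no multithreading to hide it).
    """
    return instructions * cpi + mem_accesses * mem_latency


def cores_needed(packets_per_sec: float, cycles: float, clock_hz: float) -> int:
    """Cores required to sustain the packet rate if the work splits perfectly."""
    return math.ceil(packets_per_sec * cycles / clock_hz)


if __name__ == "__main__":
    cycles = cycles_per_packet(instructions=1_000, mem_accesses=20)
    # 19,531,250 packets/s is a 10 Gbps link carrying 64-byte packets.
    print(cycles, cores_needed(19_531_250, cycles, clock_hz=1e9))
```

Under these assumptions a packet costs 3,000 cycles and the link needs 59 cores. The 20 memory accesses cost twice as many cycles as the 1,000 instructions, which is why memory accesses are counted as a separate dimension of complexity.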

What Is the Ideal Platform?
Candidates: structured ASIC, FPGA, network processor, reconfigurable co-processors

2nd and 3rd Generations
2nd generation: offloading and decentralization
3rd generation: further offloading and use of specialized devices (ASICs + embedded processors)
Problems: loss of flexibility and very high cost. Why?

Why Not ASIC?
High development cost
  ─ Network processing is a moderate-volume market
Long time to market
  ─ Network processing services change quickly
Difficult to simulate
  ─ Complex protocols
Expensive and time-consuming to change
Little reuse across products
Limited reuse across versions
No consensus on a framework or supporting chips
Requires expertise

Network Processors
Question: where does an NP get its performance advantage over a conventional processor?

Instruction Set: Minimality
Not as general as RISC and CISC processors
  ─ E.g., no floating-point instructions
  ─ Optimized for packet processing functions only
Not specific to a protocol or part of a protocol
Seek a minimal set of instructions sufficient to handle an arbitrary protocol,
  ─ plus specific instructions for protocol processing
Example: atomic operations
  ─ A hard problem; covered later
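
To see why atomic operations matter, consider several packet-processing threads updating one shared per-flow packet counter. A plain read-modify-write can interleave so that two threads read the same old value and one increment is lost; an atomic instruction performs the whole update indivisibly. Python exposes no such instruction, so this sketch emulates a fetch-and-add with `threading.Lock`. `AtomicCounter` and `count_packets` are hypothetical names, not an NP API.

```python
import threading


class AtomicCounter:
    """Hypothetical helper: emulates a hardware fetch-and-add with a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, delta: int) -> int:
        # The read, the add and the write happen as one indivisible step.
        with self._lock:
            old = self._value
            self._value = old + delta
            return old

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def count_packets(counter: AtomicCounter, threads: int = 4,
                  packets_per_thread: int = 10_000) -> int:
    """Each thread 'processes' packets and bumps the shared counter."""
    def worker() -> None:
        for _ in range(packets_per_thread):
            counter.fetch_add(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(count_packets(AtomicCounter()))
```

With the lock in place no increment is lost, so four threads of 10,000 packets always count 40,000.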

Architecture: Multiprocessor
Parallelism
  ─ Network processing workloads are highly parallel: flow-level, queue-level, packet-level, protocol-level
Pipelining
  ─ Pipelining improves system throughput at the cost of longer delay (latency)
  ─ Is this acceptable?
System-on-chip
  ─ Processing: RISC cores
  ─ Memory: registers, cache, instruction store, scratch pad, SRAM and SDRAM
  ─ I/O: network/switch fabric interfaces
Question: how hard is it to build and use these NPs?
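
The throughput/latency trade-off of pipelining can be quantified with a simple model: a pipeline's throughput is set by its slowest stage, while a packet's latency is the sum of all stage times plus hand-off costs. The stage times and the 20 ns per-stage hand-off overhead below are hypothetical.

```python
def pipeline_metrics(stage_ns, overhead_ns=0.0):
    """Return (throughput in packets/s, latency in ns) for a linear pipeline.

    stage_ns: processing time of each stage for one packet.
    overhead_ns: hypothetical per-stage hand-off cost (e.g., a queue write).
    """
    slowest = max(stage_ns) + overhead_ns           # bottleneck stage
    latency = sum(stage_ns) + overhead_ns * len(stage_ns)
    throughput = 1e9 / slowest                      # one packet per bottleneck period
    return throughput, latency


if __name__ == "__main__":
    cases = [("single core", [300], 0), ("3-stage pipeline", [100, 100, 100], 20)]
    for label, stages, hop in cases:
        tput, lat = pipeline_metrics(stages, hop)
        print(f"{label}: {tput / 1e6:.2f} Mpps, {lat:.0f} ns latency")
```

Splitting a 300 ns task into three 100 ns stages raises throughput from about 3.33 to about 8.33 million packets per second, while latency grows from 300 ns to 360 ns. Whether that extra delay is acceptable depends on the application.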

Typical Processing

Case Study: IPv4 Packet Forwarding
Router: 2-port router (2 Gbps) with From (0)/To (0) and From (1)/To (1), plus Lookup and IPRoute
IP lookup: longest prefix match (trie lookup algorithm)
Platform: Xilinx Virtex-II Pro FPGA (2VP30)
[Figure: trie for prefixes a–e, given in hex/binary, rooted at Root and annotated with memory accesses 1, 2, 5 and 6]
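
Longest prefix match can be sketched with a one-bit-per-level (binary) trie: walk the destination address bit by bit and remember the last node that ends a prefix. This is a minimal illustration, not the design used in the case study; the slide's figure appears to use wider strides (prefixes written in hex) to cut the number of memory accesses. Each node visited stands in for one memory access, and the prefixes and next-hop names are hypothetical.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children = [None, None]  # child for bit 0 and child for bit 1
        self.next_hop = None          # set when a prefix ends at this node


class BinaryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix, strict=False)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):            # follow the prefix bits, MSB first
            b = (bits >> (31 - i)) & 1
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.next_hop = next_hop

    def lookup(self, addr: str):
        """Return (next hop of the longest matching prefix, nodes visited)."""
        bits = int(ipaddress.IPv4Address(addr))
        node, best, visits = self.root, self.root.next_hop, 1
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:                      # no longer prefix exists
                break
            visits += 1
            if node.next_hop is not None:         # longer match found
                best = node.next_hop
        return best, visits


if __name__ == "__main__":
    table = BinaryTrie()
    table.insert("0.0.0.0/0", "A")    # default route
    table.insert("10.0.0.0/8", "B")
    table.insert("10.1.0.0/16", "C")
    print(table.lookup("10.1.2.3"))   # ('C', 17)
```

A binary trie can need up to 33 node reads for one IPv4 lookup, which is why multibit tries that consume several bits per memory access are attractive in hardware.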

Multiprocessor for Header Processing
[Figure: packet reception and packet transmission connected through multiple Verify, Lookup-1, Lookup-2 and Transmit processors; components labeled FSL, BRAM, OPB, RS232, Timer, LEDs and FIFO queues]
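
The header-processing figure chains processors through FIFO queues. The single-threaded simulation below sketches one lane of that idea: Verify, Lookup and Transmit stages connected by FIFOs, each stage handling at most one packet per tick. The stage logic (a TTL check, an exact-match route table, a TTL decrement) and the `ROUTES` table are hypothetical stand-ins; on the FPGA each stage would run concurrently on its own processor.

```python
from collections import deque

# Hypothetical forwarding table: destination address -> output port.
ROUTES = {"10.0.0.1": 0, "10.0.0.2": 1}


def verify(pkt):
    # Hypothetical check: drop packets whose TTL would expire.
    return pkt if pkt["ttl"] > 1 else None


def lookup(pkt):
    pkt["port"] = ROUTES.get(pkt["dst"], 0)   # unknown destinations go to port 0
    return pkt


def transmit(pkt):
    pkt["ttl"] -= 1
    return pkt


def run_pipeline(packets, stages=(verify, lookup, transmit)):
    """Advance each stage by at most one packet per tick; FIFOs link the stages."""
    fifos = [deque(packets)] + [deque() for _ in stages]  # input, links, output
    ticks = 0
    while any(fifos[:-1]):                     # work left before the output FIFO
        ticks += 1
        # Walk stages back to front so a packet moves one stage per tick.
        for i in reversed(range(len(stages))):
            if fifos[i]:
                out = stages[i](fifos[i].popleft())
                if out is not None:            # None means the stage dropped it
                    fifos[i + 1].append(out)
    return list(fifos[-1]), ticks


if __name__ == "__main__":
    pkts = [{"dst": "10.0.0.1", "ttl": 64}, {"dst": "10.0.0.2", "ttl": 64},
            {"dst": "10.0.0.9", "ttl": 1}, {"dst": "10.0.0.2", "ttl": 5}]
    out, ticks = run_pipeline(pkts)
    print(ticks, [(p["port"], p["ttl"]) for p in out])
```

Four packets, one of which Verify drops, take six ticks: the first packet leaves in tick 3, after which at most one packet completes per tick.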

Typical Processing Using NPs

System Implementation Space

Memory Architecture
Memory access is a bottleneck
Memory is area-consuming
  ─ Limited on-chip memory
  ─ Limited bandwidth to off-chip memory: pin and package cost
  ─ Off-chip memory access is slow: 100 cycles
Possible solutions
  ─ Profile the application's memory access patterns
  ─ Propose a heterogeneous memory architecture
  ─ Memory-aware mapping
  ─ Transactional memory (project topic)
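
Memory-aware mapping can be illustrated with a greedy data-layout heuristic: place the data structures with the most accesses per byte in the fastest memory that still has room. This is a sketch of one possible heuristic, not a method from the lecture. Only the 100-cycle off-chip latency comes from the slide; the scratch-pad and SRAM latencies, all capacities, and the data structures are hypothetical, and bandwidth and bank conflicts are ignored.

```python
def place_data(structures, memories):
    """Greedy memory-aware data layout (hypothetical heuristic).

    structures: list of (name, size_bytes, accesses_per_packet)
    memories:   list of (name, capacity_bytes, latency_cycles), any order
    Returns (placement dict, memory stall cycles per packet).
    """
    free = {m: cap for m, cap, _ in memories}
    latency = {m: lat for m, _, lat in memories}
    fastest_first = sorted(memories, key=lambda m: m[2])
    # Most accesses per byte first: these benefit most from fast memory.
    order = sorted(structures, key=lambda s: s[2] / s[1], reverse=True)
    placement, stall = {}, 0
    for name, size, accesses in order:
        for mem, _, _ in fastest_first:
            if free[mem] >= size:
                free[mem] -= size
                placement[name] = mem
                stall += accesses * latency[mem]  # each access pays full latency
                break
        else:
            raise ValueError(f"{name} does not fit in any memory")
    return placement, stall


MEMORIES = [("scratchpad", 4_096, 3), ("SRAM", 1_048_576, 20), ("SDRAM", 268_435_456, 100)]
STRUCTURES = [("queue_heads", 1_024, 2), ("flow_counters", 2_048, 2),
              ("route_trie", 512_000, 6), ("packet_buffer", 64_000_000, 2)]

if __name__ == "__main__":
    print(place_data(STRUCTURES, MEMORIES))
```

With this layout the example spends 332 stall cycles per packet on memory, versus 1,200 if every structure lived in off-chip SDRAM ((2 + 2 + 6 + 2) × 100).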

Application Mapping
Current approach: fixed topology, assembly coding and hand-tuning
[Figure: Mapping]

Basic Steps for Mapping
  ─ Application description
  ─ High-level optimizations
  ─ Task graph (platform specific)
  ─ Architecture configuration
  ─ HW/SW partitioning
  ─ Task allocation
  ─ Data layout
  ─ Communication assignment
  ─ Compilation/synthesis
  ─ Profile
[Figure: the router task graph (From (0), To (0), From (1), To (1), Lookup, IPRoute) mapped onto PE/FPGA blocks and MEM]
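
The task-allocation step can be sketched with the classic longest-processing-time-first (LPT) heuristic: sort tasks by profiled cost and give each to the currently least-loaded PE, which keeps the most-loaded PE (the pipeline bottleneck) small. The task names follow the router task graph in the figure; the cycle counts and the choice of four PEs are hypothetical, and the heuristic ignores task dependencies and communication cost, which the communication-assignment step would have to handle.

```python
import heapq


def allocate_tasks(task_cycles, num_pes):
    """Longest-processing-time-first allocation of tasks to PEs.

    task_cycles: dict task -> profiled cycles per packet (hypothetical values).
    Returns (assignment dict task -> PE index, per-PE load list).
    Ignores task dependencies and communication cost.
    """
    heap = [(0, pe) for pe in range(num_pes)]   # (current load, PE index)
    heapq.heapify(heap)
    assignment = {}
    for task, cycles in sorted(task_cycles.items(), key=lambda kv: kv[1], reverse=True):
        load, pe = heapq.heappop(heap)          # least-loaded PE so far
        assignment[task] = pe
        heapq.heappush(heap, (load + cycles, pe))
    loads = [0] * num_pes
    for load, pe in heap:
        loads[pe] = load
    return assignment, loads


TASKS = {"Lookup": 400, "IPRoute": 200, "From(0)": 100,
         "From(1)": 100, "To(0)": 80, "To(1)": 80}

if __name__ == "__main__":
    assignment, loads = allocate_tasks(TASKS, num_pes=4)
    print(assignment, loads)
```

Here Lookup alone sets the bottleneck at 400 cycles no matter how the other tasks are placed; a designer might split it into stages, much as the header-processing figure shows Lookup-1 and Lookup-2.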

Summary
Network processor
  ─ Special-purpose, programmable hardware device
  ─ Optimized for network processing
  ─ Building block of network processing systems
  ─ Fundamental ideas
      Flexibility through programmability
      Scalability with parallelism and pipelining
Here, "NP" is a concept
  ─ We will study an example network processor soon

For Next Class & Announcements
Read Comer: chapters 13 and 14
Lab 1 total grade reduced to 82
HW 1 due Wednesday
Project topic will be announced after Wednesday