Distributed Processors Allow Revolutionary Hardware & Software Partitioning
Version 1.1, March 2002. APD / J-L Brelet & P Hardy. All rights reserved.

Presentation transcript:

© XILINX
Workshop on Electronics for LHC Experiments, 9–13 September 2002, Colmar (France)
Authors: Jean-Reynald Mace & Jean-Louis Brelet / Xilinx

Colmar Workshop XILINX, Sept. 02

Agenda
System Partitioning
– Traditional techniques
– Innovative approaches
Example 1: DES Encryption Algorithm
– HW solution compared to SW solution
Example 2: Wireless LAN
– HW / SW trade-off
Enabling Technology: Virtex-II Pro

System Partitioning
Definition:
– "The mapping of a system-level architecture into specific HW and SW components based upon application requirements"
Today's implementation in:
– Fixed HW components: FPGA, ASIC, ASSP, ...
– SW components: code running on CPUs, DSP processors, microcontrollers, ...
[Figure labels: Hardware Components, Embedded Software, Application, Control, Management]

Example System Functions
Hardware:
– Physical Layer
– Memory Interfaces
– Protocol Bridges
– Finite State Machine
– Signal Processing
– Encryption
Software:
– Protocol Stack
– User Interface
– Diagnostics
– Control
– Signal Processing
– Encryption

Optimal Solutions Enabled by On-Demand Architectural Synthesis
Hardware:
– Physical Layer
– Memory Interfaces
– Protocol Bridges
– FSM
– Signal Processing
– Encryption
Software:
– Protocol Stack
– User Interface
– Diagnostics
– Control
– Signal Processing
– Encryption
[Figure label: Flexible Mapping]

Traditional System Design
– Fixed HW / SW partitioning
– Early and final architecture mapping
– Critical commitment made at concept level
[Figure labels: SW mgr, SW dev, SW dev, Fixed Interface, HW mgr, HW eng, PCB eng, Hardware Components, Embedded Software]

New System Partitioning
Flexible HW / SW partitioning
– Enables trade-offs throughout the process
Architecture redefinition possible
– Tune for optimal performance and cost
[Figure labels: HW Team, SW Team, HW Team, Hardware Components, Embedded Software, Flexible Interface]

Innovative Partitioning
New System Approach:
– Enables non-traditional system architecture
    – SW modules can be implemented in HW
    – HW modules can be moved to SW
– Requires a scalable and flexible platform that enables optimal HW / SW integration
Co-Design Methodology:
– Design attributes optimized during development (performance, resource usage, ...)
– SW developers and HW engineers create solutions at module level for optimal systems


DES Overview
DES Algorithm:
– Message is split into fixed-length blocks
– Each block is encoded with a fixed key
– Block length = 64 bits (advanced 128-b), key length = 56 bits
3DES is an enhanced version of encryption / decryption
– If Key 1 = Key 2 = Key 3, then 3DES is fully compatible with DES
[Figure: Data, then Encrypt (Key 1), Decrypt (Key 2), Encrypt (Key 3)]
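The compatibility claim for equal keys follows directly from the Encrypt-Decrypt-Encrypt chain: with one key, the first two stages cancel out. The sketch below illustrates this with a toy invertible 64-bit function standing in for DES. The names `toy_encrypt`, `toy_decrypt`, `triple_encrypt` and `triple_decrypt` are hypothetical, and the toy function is neither DES nor secure; only the chaining structure is taken from the slide.

```python
MASK64 = (1 << 64) - 1  # one 64-bit block


def toy_encrypt(block: int, key: int) -> int:
    """Invertible placeholder for single-DES encryption (NOT real DES)."""
    x = (block ^ key) & MASK64
    x = ((x << 13) | (x >> 51)) & MASK64   # rotate left by 13 bits
    return (x + key) & MASK64


def toy_decrypt(block: int, key: int) -> int:
    """Exact inverse of toy_encrypt."""
    x = (block - key) & MASK64
    x = ((x >> 13) | (x << 51)) & MASK64   # rotate right by 13 bits
    return (x ^ key) & MASK64


def triple_encrypt(block: int, k1: int, k2: int, k3: int) -> int:
    """Encrypt with k1, decrypt with k2, encrypt with k3 (EDE chain)."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def triple_decrypt(block: int, k1: int, k2: int, k3: int) -> int:
    """Undo the EDE chain in reverse order."""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)
```

With `k1 == k2 == k3`, `triple_encrypt(block, k, k, k)` returns the same value as `toy_encrypt(block, k)`, which is the backward-compatibility property stated on the slide.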

System Integrator's Dilemma
DES is a simple algorithm
The system engineer has to evaluate:
– SW coding compared to HW implementation
– Need for a specific processor and its performance
– Need for a dedicated solution
– Cost-effectiveness of an ASSP solution
– Level of customization required
– Fixed or flexible implementation

Architectural Options
The popular DES algorithm is available as SW code:
– Public domain C or C++ code
– Example encryption data rates for 128-b DES:
    – TMS320C62xx at 200 MHz delivers ~100 Mbps (*)
    – MIPS 64-b RISC at 250 MHz delivers ~400 Mbps (*)
    – Pentium III at 1 GHz delivers ~460 Mbps (*)
HW implementation available at:
– [URL not preserved]
– Over 1.5 Gbps data rate in Virtex-II at 130 MHz (*)
3DES 56-b algorithm achieves 10.7 Gbps throughput
– Xilinx record-breaking announcement in April 2002
(*) Source: Helion Technology Limited, Xilinx Design Consultant (Xilinx Xcell Journal, Issue 43, Summer 2002)
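Because the platforms run at very different clock rates, it can help to normalize the quoted figures to bits encrypted per clock cycle. The short calculation below does only that arithmetic on the numbers from the slide. The names `QUOTED` and `bits_per_cycle` are illustrative, and the Virtex-II entry uses 1500 Mbps as a lower bound since the slide says "over 1.5 Gbps".

```python
# (clock in MHz, throughput in Mbps) as quoted on the slide; "~" values are approximate.
QUOTED = {
    "TMS320C62xx (SW)": (200, 100),
    "MIPS 64-b RISC (SW)": (250, 400),
    "Pentium III (SW)": (1000, 460),
    "Virtex-II (HW)": (130, 1500),   # lower bound: "over 1.5 Gbps"
}


def bits_per_cycle(clock_mhz: float, throughput_mbps: float) -> float:
    """Average number of bits processed per clock cycle."""
    return throughput_mbps / clock_mhz


if __name__ == "__main__":
    for name, (clock, rate) in QUOTED.items():
        print(f"{name}: {bits_per_cycle(clock, rate):.2f} bits/cycle")
```

On these figures the software options land between roughly 0.46 and 1.6 bits per cycle, while the hardware implementation exceeds 11.5 bits per cycle despite the lowest clock frequency.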

Mixed HW / SW Solution
Encryption / Decryption Data Path:
– DES encryption module is called twice
– Decryption requires more compute power
[Figure labels: Encrypt, Decrypt, DES Encryption Algorithm, DES Decryption Algorithm, Processor, HW, Data Flow]

Full HW Implementation
Full HW Implementation:
– Shared encryptor
[Figure labels: Encrypt, Decrypt, Other Tasks, Processor, HW, Data Flow]
Full HW Pipelined Solution:
– Easy to add parallelism
– Easy to couple to distributed processors
[Figure labels: Encrypt, Decrypt, Encrypt, Processor, "Or No Processor?", HW, Data Flow]

Choices of HW / SW Partition
Various solutions to fit each performance / cost requirement:
– SW vs. HW vs. mixed HW / SW
New approach:
– On-Demand Architecture Synthesis to modify the HW / SW trade-off dynamically
Distributed processors offer another level of flexibility through parallel implementations


Networking Application: Wireless LAN
Intra forwarding technique
[Figure labels: Video transmission (MPEG2), File transfer (FTP), QoS]

Wireless LAN: Access Point Architecture
[Figure labels: Application Layer, Presentation Layer, Session Layer, Transport Layer, Network Layer, Data Link Layer, Physical Layer, Bus, HOST I/F, Medium Access Control, Channel Access Control]

Wireless LAN: QoS
Wireless LAN example:
– Intra forwarding technique
– Complex network access algorithms with a few levels of prioritization in order to guarantee the QoS
Select the most urgent frame
– The choice is based on a few parameters:
    – Priority (P0 to Pn)
    – Lifetime (Normalized Residual Lifetime, …)
[Figure labels: Pointer: CAPUPRLDISNRLDB; P0 ... Pn; 256 Ptrs; 64 Bits; Ptr of the Received Frame; Ptr of the Selected Frame]
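The selection rule can be sketched as a two-level comparison: priority class first, then residual lifetime. The slide does not give the exact ordering or the layout of the 64-bit pointer, so the `FramePtr` fields, the convention that index 0 is the most urgent class, and the tie-break on the smallest lifetime are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FramePtr:
    """Hypothetical view of one frame pointer (not the slide's bit layout)."""
    frame_id: int
    priority: int              # 0 = P0, assumed to be the most urgent class
    residual_lifetime: float   # normalized; smaller means closer to expiry


def select_most_urgent(frames: Iterable[FramePtr]) -> Optional[FramePtr]:
    """Pick the frame with the most urgent priority, then the shortest lifetime."""
    return min(
        frames,
        key=lambda f: (f.priority, f.residual_lifetime),
        default=None,
    )
```

A lower-priority frame is only elected once every higher-priority table is empty; a scheme that lets nearly expired frames overtake would need a different key.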

QoS: Full Hardware
Design in FPGA:
– FSM-like design with adder/subtractor (~1000 LUTs / 50 MHz)
– One table of pointers implemented in FPGA Block RAM
    – 2 BRAMs used for 4 priorities
– Pipelining used
– Easy to manage the lifetime (updated every 10 µs)
Complex function in HW:
– Electing two frames from one table of pointers by scrolling and comparison techniques
[Figure labels: Table of ptr of frames to be transmitted; Elected ptr of Frame to transmit; F11, F1, F3, F0, F10; Permutation]
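A software model of "scrolling and comparison" is a single pass over the table that keeps the two best entries seen so far, which is roughly what a pipelined FSM with a comparator would do. The sketch below assumes election is by smallest lifetime and that the hardware examines one entry per clock cycle; neither detail is stated on the slide, and `elect_two` and `scan_time_us` are hypothetical names.

```python
from typing import List, Optional, Tuple


def elect_two(lifetimes: List[float]) -> Tuple[Optional[int], Optional[int]]:
    """Scan the table once and return the indices of the two smallest lifetimes."""
    best: Optional[int] = None
    second: Optional[int] = None
    for idx, life in enumerate(lifetimes):
        if best is None or life < lifetimes[best]:
            best, second = idx, best          # new winner; old winner becomes runner-up
        elif second is None or life < lifetimes[second]:
            second = idx                      # new runner-up
    return best, second


def scan_time_us(entries: int, clock_mhz: float, cycles_per_entry: int = 1) -> float:
    """Time for one full pass, assuming a fixed number of cycles per entry."""
    return entries * cycles_per_entry / clock_mhz
```

Under the one-entry-per-cycle assumption, a 256-pointer table at 50 MHz takes `scan_time_us(256, 50)`, about 5.12 µs, which fits inside the 10 µs lifetime update period.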

QoS: Full Software
Design in firmware:
– Simple: ~250 lines of C code
– Microprocessor used: PPC 405
– One table of pointers per priority in external memory (SDRAM)
– Sort algorithm is well known and easy to implement
Complex function in SW:
– System real-time requirement
– Frame lifetime controlled by a set of timers
    – While a new frame is arriving, an existing frame may need to move from the upper priority table
[Figure labels: Highest Priority Table; Elected ptr of Frame to transmit; F41, F52, F7, F22, F11, F31, F10, F21, F1, F3, F0, F11]
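The table handling of the software approach can be sketched as one sorted list per priority, with each new pointer inserted in order and the head of the first non-empty table elected. The class name `PriorityTables`, the default of four levels (borrowed from the other QoS slides), and sorting by lifetime are assumptions; the timers that make the real firmware hard to get right are deliberately left out.

```python
import bisect
from typing import List, Optional, Tuple


class PriorityTables:
    """One sorted table of (lifetime, frame_id) per priority level."""

    def __init__(self, levels: int = 4) -> None:
        # Index 0 is assumed to be the highest priority table.
        self.tables: List[List[Tuple[float, str]]] = [[] for _ in range(levels)]

    def insert(self, priority: int, lifetime: float, frame_id: str) -> None:
        """Insert a new frame pointer, keeping the table sorted by lifetime."""
        bisect.insort(self.tables[priority], (lifetime, frame_id))

    def elect(self) -> Optional[str]:
        """Remove and return the frame at the head of the first non-empty table."""
        for table in self.tables:
            if table:
                return table.pop(0)[1]
        return None
```

The sort itself is the easy part, as the slide notes; the difficulty lies in updating lifetimes and moving frames between tables while new frames keep arriving.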

QoS: Mixed HW / SW
Hardware module:
– Lifetime update and moving pointers between tables
– Design:
    – FSM-like design with adder/subtractor (~200 LUTs / 50 MHz)
    – 4 tables of pointers (one per priority) in FPGA Block RAM
    – Lifetime updated by scrolling
    – Semaphore
Software / hardware interface:
– Semaphore-based communication
Software module:
– Insertion and sort of the tables
– Design:
    – Easy to write (~200 lines of C code)
    – Sort algorithm
    – Semaphore library
[Figure labels: F41, F52, F7, F22, ….., F41, F52, F7, F22]
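The split can be modeled with a binary semaphore guarding the shared tables: the software side inserts and sorts, while a stand-in for the hardware side ages lifetimes and moves pointers between tables. Everything below is a hypothetical model in software; the `promote_below` threshold in particular is an invented rule, since the slide does not say when a pointer changes table.

```python
import threading
from typing import List, Tuple

Entry = Tuple[int, str]  # (lifetime, frame_id)


class SharedTables:
    """Pointer tables shared by a SW module and a (simulated) HW module."""

    def __init__(self, levels: int = 4) -> None:
        self.tables: List[List[Entry]] = [[] for _ in range(levels)]  # 0 = most urgent (assumed)
        self.sem = threading.Semaphore(1)   # binary semaphore guarding the tables

    def sw_insert(self, priority: int, lifetime: int, frame_id: str) -> None:
        """Software side: insert a pointer and re-sort that table."""
        with self.sem:
            self.tables[priority].append((lifetime, frame_id))
            self.tables[priority].sort()

    def sw_resort(self) -> None:
        """Software side: restore sort order after the hardware moved pointers."""
        with self.sem:
            for table in self.tables:
                table.sort()

    def hw_tick(self, step: int = 1, promote_below: int = 2) -> None:
        """Hardware stand-in: age every entry, then move nearly expired
        entries up one table (hypothetical promotion rule)."""
        with self.sem:
            moved: List[List[Entry]] = [[] for _ in self.tables]
            for level, table in enumerate(self.tables):
                for life, fid in table:
                    life -= step
                    up = life < promote_below and level > 0
                    moved[level - 1 if up else level].append((life, fid))
            self.tables = moved
```

Because both sides take the same semaphore, a lifetime update never observes a half-sorted table, and an insertion never lands in a table that is being rearranged.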

Design Solutions Comparison
Full HW Solution
– Full control of event timing and easy parallel design
– Complex HDL coding of the FSM
    – State machine architecture requires advanced expertise
    – Significant validation time in the design cycle
Full SW Solution
– Easy coding in C (sort algorithm) and flexibility
– Difficult to handle real-time constraints
    – Performance limited by the Von Neumann architecture (processor)
Mixed HW / SW Solution: The Best of Both Worlds
– Offers the advantages of both HW and SW solutions with the right partitioning


Platform FPGA Architecture
A solution that provides:
– IP Immersion
    – The ability to integrate a wide variety of hard and soft IP
– A single platform for multiple applications
– Total customization
– Full hardware and firmware upgradability
[Figure labels: Hard-IP, Soft-IP, System Connectivity, HW functions]

Virtex-II Pro Platform FPGA
PowerPC 405 Core
– 300+ MHz / 450+ DMIPS performance
– Up to 4 per device
... Gbps Multi-Gigabit Transceivers (MGTs)
– Supports 10 Gbps standards
– Up to 24 per device
IP-Immersion™ Fabric
– ActiveInterconnect™
– 18Kb Dual-Port RAM
– Xtreme™ Multipliers
– 16 Global Clock Domains
[Figure labels: MGT, Fabric]

High-Bandwidth Communications
– Code (SW) and data are stored in BRAM, without any external resources
– On-Chip Memory (OCM) offers a unique data bandwidth between the FPGA fabric (HW) and the embedded PowerPC core (SW)
– High-bandwidth communications between distributed processors
[Figure labels: OCM™ Technology, BlockRAMs, I-Cache 16KB, MMU, Fetch & Decode, Timers and Debug Logic, Execution Unit (32x32b GPR, ALU, MAC), D-Cache 16KB, Acceleration Logic, 6.4Gb/sec]

Flexibility of Programmable Systems
Nearly all systems are composed of:
– Logic + Memory + Processor
Virtex-II Pro enables optimum "system partitioning" between hardware and software
– Performing SW tasks in HW is inefficient
– Performing HW tasks in SW is slow
– Virtex-II Pro provides the best of both worlds

Conclusion
Distributed processors allow flexible HW / SW partitioning:
– Optimal mapping at the module level
– Designs can use the best solution from both worlds
Virtex-II Pro is the first programmable system to enable true architectural synthesis:
– Unique bandwidth between embedded processors and HW
– Unique on-chip solution provides an application-specific mix of logic, memory, integrated processors, and high-bandwidth I/O