Trend towards Embedded Multiprocessors Popular Examples –Network processors (Intel, Motorola, etc.) –Graphics (NVIDIA) –Gaming (IBM, Sony, and Toshiba)

Presentation transcript:

Trend towards Embedded Multiprocessors
Popular Examples
–Network processors (Intel, Motorola, etc.)
–Graphics (NVIDIA)
–Gaming (IBM, Sony, and Toshiba)
–Research (RAW, IWarp, etc.)
The processor is the basic building block
Software flexibility is key

Design Flow from Domain Specific Languages to Embedded Multiprocessors
William Plishker, Kaushik Ravindran, Kurt Keutzer
May 11, 2005

For application specific programmable systems to succeed, it is necessary to deliver high-performance implementations quickly

Programming Challenges
–Multiple processing elements
–Heterogeneous memories
–Special purpose hardware

Domain Specific Languages
–DSLs are tailored to an application domain with:
–component libraries
–communication and computation semantics
–visualization tools
–test suites

Implementation Gap
Natural representation of Application
[Figure: application graph with FromDevice(0) … FromDevice(15), IPVerify, DecIPTTL, Lookup IPRoute, Discard, and ToDevice(0) … ToDevice(15) elements]
Low Level Programming Environment

Proposed Design Approach
–Application specification in domain specific language (DSL)
–Abstract model of architecture and transform application to execution model
–Automated mapping from execution model to target architecture
[Figure: design flow. Labels: Application description; High-level optimizations; Execution Model; Architecture configuration; HW/SW partitioning; Task allocation; Communication assignment; Compilation / Synthesis; Profile. Target blocks: PE, FPGA, MEM. Example elements: From(0), To(0), From(1), To(1), Lookup IPRoute]

Key Models
Computation Model
–Abstract model to represent concurrency
–Natural to the application domain
Architectural Model
–Capture those features of the architecture which most impact performance
–Define components which must be annotated in the application to facilitate good mappings
Execution Model
–Description of computation on a target hardware
–Task graph with platform specific computation and memory annotations

Generating an Application Execution Model
–Unravel application tasks to expose concurrency
–Partition application components into tasks
–Annotate memory and communication requirements
Extract parallelism from application without explicit designer intervention
[Figure: task graph with two branches. Branch 1: R1, L11, L21, T1. Branch 2: R2, L12, L22, T2. Stages: Receive, Lookup Stage 1, Lookup Stage 2, Transmit. Shared nodes: S1, S2. Edge labels: Packet header, Memory read]

[Figure: configurable processor block diagram. Labels: Core Reg File; Core ALU; Extension Reg Files; Extension ALU; Timers, Interrupts; Instruction Fetch; Data Load/Store; caches ($); System Bus; Instruction RAM/ROM; Data RAM/ROM; XLMI (peripherals)]

[Figure: mapping flow from the application description in DSL to three targets (Network Processor, Soft Multiprocessor, FPGA logic), divided into Platform Independent and Platform Dependent parts.
Common steps: High-level Optimizations; Form Task Graph boundaries; Assign Element Implementation Options.
Application execution model: Periodicity; Communication requirements, shared resources.
Network Processor: Execution Model with Computation requirements (per implementation option), Queue requirements, Schedulable element rates; Mapping (Element tuning); Translation to IXP-C; Sequential programs.
Soft Multiprocessor: Execution Model with Computation requirements, Architecture constraints; Configure Architecture (PEs, Memory, Interconnect); Mapping (Task → PE, Data → Memory, Comm → Interconnect/Memory; Arbitration scheme selection); Translation to C; Programs + MHS; blocks MEM, MB.
FPGA logic: Execution Model with Fabric and data requirements; Mapping (HW/SW Partitioning, Element Implementation Selection, Floor planning); Translation to RTL; RTL]

Mapping Procedure
–Transform application description in DSL to execution model
–Explore design space of the assignment of computation and communication to architectural resources
–Produce set of sequential code to be handed off to traditional compilation techniques

Design Space Exploration
Analytical Models for the Architecture
–Profile information for task execution times
–Assume performance and communication requirements can be evaluated statically
Constraint Formulation and Optimization Methods
–Partition tasks between processing elements
–Assign application state to memory
–Assign communication to hardware links
–Find optimal configuration to maximize some performance metric

Current Work
Mapping network applications to multiple platforms
–Application in Click DSL
–Target multiprocessors: IXP 2xxx network processor, Xilinx Virtex 2VP50 soft multiprocessor
–Integer-linear programming approaches for task allocation

Example Flow
[Figure: target platforms Intel IXP2800 and Tensilica MPSoC. IXP2800 labels: ME, ME Cluster, Scratchpad, SRAM, SDRAM, Hash Unit, Media Switch Fabric, XScale]
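
The task-allocation step listed under Design Space Exploration can be illustrated with a small sketch. It is not the authors' method: the slides mention integer-linear programming, whereas this example simply enumerates every assignment, which is only practical for tiny task graphs. The task names follow the Receive / Lookup / Transmit stages in the slides, but the cycle counts, edge weights, and per-word communication cost are all made-up assumptions, as is the cost model (the load of the busiest processing element, with communication charged to both endpoints of an edge that crosses PEs).

```python
from itertools import product

# Hypothetical profile data: cycles per packet for each task.
TASK_CYCLES = {"receive": 40, "lookup1": 90, "lookup2": 70, "transmit": 50}

# Hypothetical task-graph edges, annotated with words moved per packet.
EDGES = {
    ("receive", "lookup1"): 4,
    ("lookup1", "lookup2"): 2,
    ("lookup2", "transmit"): 4,
}

# Assumed cost, in cycles per word, when an edge crosses between PEs.
COMM_CYCLES_PER_WORD = 5


def bottleneck(assignment):
    """Return the load (cycles per packet) of the busiest PE."""
    load = {}
    for task, pe in assignment.items():
        load[pe] = load.get(pe, 0) + TASK_CYCLES[task]
    for (src, dst), words in EDGES.items():
        if assignment[src] != assignment[dst]:
            cost = words * COMM_CYCLES_PER_WORD
            # Both the sending and the receiving PE pay for the transfer.
            load[assignment[src]] += cost
            load[assignment[dst]] += cost
    return max(load.values())


def best_allocation(num_pes):
    """Exhaustively search task-to-PE assignments for the lowest bottleneck."""
    tasks = sorted(TASK_CYCLES)
    best = None
    for choice in product(range(num_pes), repeat=len(tasks)):
        assignment = dict(zip(tasks, choice))
        cost = bottleneck(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    cost, assignment = best_allocation(2)
    print(cost, assignment)
```

With these illustrative numbers and two PEs, the search places receive and lookup1 on one PE and lookup2 and transmit on the other, for a bottleneck of 140 cycles per packet, against 250 on a single PE. An ILP formulation would encode the same assignment variables and load constraints and leave the search to a solver.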