Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2012 年 07 月 09 日 Graphics: © Alexandra Nolte, Gesine.

Slides:



Advertisements
Similar presentations
Fakultät für informatik informatik 12 technische universität dortmund Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund.
Advertisements

fakultät für informatik informatik 12 technische universität dortmund Additional compiler optimizations Peter Marwedel TU Dortmund Informatik 12 Germany.
Fakultät für informatik informatik 12 technische universität dortmund Standard Optimization Techniques Peter Marwedel Informatik 12 TU Dortmund Germany.
Technische universität dortmund fakultät für informatik informatik 12 Specifications and Modeling Peter Marwedel TU Dortmund, Informatik
Embedded Systems & Parallel Programming P. Marwedel, Univ. Dortmund/Informatik 12 + ICD/ES, 2007 Universität Dortmund A view on embedded systems.
Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories Muthu Baskaran 1 Uday Bondhugula.
fakultät für informatik informatik 12 technische universität dortmund Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund.
Fakultät für informatik informatik 12 technische universität dortmund Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund,
© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
Hardware/ Software Partitioning 2011 年 12 月 09 日 Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These.
ECE-777 System Level Design and Automation Hardware/Software Co-design
1 Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory Design Space Exploration of Embedded Systems © Lothar Thiele ETH Zurich.
Static Bus Schedule aware Scratchpad Allocation in Multiprocessors Sudipta Chattopadhyay Abhik Roychoudhury National University of Singapore.
Mapping of Applications to Platforms Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These slides.
Mapping of Applications to Platforms Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These slides.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.
1 Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory Performance Analysis of Embedded Systems Lothar Thiele ETH Zurich.
Platform-based Design 5KK70 TU/e 2009 Henk Corporaal Bart Mesman.
1 HW/SW Partitioning Embedded Systems Design. 2 Hardware/Software Codesign “Exploration of the system design space formed by combinations of hardware.
Define Embedded Systems Small (?) Application Specific Computer Systems.
Scheduling with Optimized Communication for Time-Triggered Embedded Systems Slide 1 Scheduling with Optimized Communication for Time-Triggered Embedded.
1 EE249 Discussion A Method for Architecture Exploration for Heterogeneous Signal Processing Systems Sam Williams EE249 Discussion Section October 15,
Embedded Systems in Silicon TD5102 Henk Corporaal Technical University Eindhoven DTI / NUS Singapore.
Fakultät für informatik informatik 12 technische universität dortmund Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund,
ARTIST2 Network of Excellence on Embedded Systems Design cluster meeting –Bologna, May 22 nd, 2006 System Modelling Infrastructure Activity leader : Jan.
1 Chapter 14 Embedded Processing Cores. 2 Overview RISC: Reduced Instruction Set Computer RISC-based processor: PowerPC, ARM and MIPS The embedded processor.
Trend towards Embedded Multiprocessors Popular Examples –Network processors (Intel, Motorola, etc.) –Graphics (NVIDIA) –Gaming (IBM, Sony, and Toshiba)
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
UML for Embedded Systems Development— Extensions; Hardware-Software CoDesign.
Torino (Italy) – June 25th, 2013 Ant Colony Optimization for Mapping, Scheduling and Placing in Reconfigurable Systems Christian Pilato Fabrizio Ferrandi,
1 Embedded Computer System Laboratory RTOS Modeling in Electronic System Level Design.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools.
Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.
© 2011 Xilinx, Inc. All Rights Reserved Intro to System Generator This material exempt per Department of Commerce license exception TSU.
1  Staunstrup and Wolf Ed. “Hardware Software codesign: principles and practice”, Kluwer Publication, 1997  Gajski, Vahid, Narayan and Gong, “Specification,
Efficient Hardware dependant Software (HdS) Generation using SW Development Platforms Frédéric ROUSSEAU CASTNESS‘07 Computer Architectures and Software.
1 Presenter: Ming-Shiun Yang Sah, A., Balakrishnan, M., Panda, P.R. Design, Automation & Test in Europe Conference & Exhibition, DATE ‘09. A Generic.
1 HW-SW Framework for Multimedia Applications on MPSoC: Practice and Experience Adviser : Chun-Tang Chao Adviser : Chun-Tang Chao Student : Yi-Ming Kuo.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
SBSE Course 4. Overview: Design Translate requirements into a representation of software Focuses on –Data structures –Architecture –Interfaces –Algorithmic.
Course Outline DayContents Day 1 Introduction Motivation, definitions, properties of embedded systems, outline of the current course How to specify embedded.
Computer Science 12 Embedded Systems Group © H. Falk | Dortmund, 08-Jul-08 Overview about Computer Science 12 at Dortmund University of Technology Heiko.
1 Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory Internal Design Representations for Embedded System Design Lothar.
ECE-777 System Level Design and Automation Introduction 1 Cristinel Ababei Electrical and Computer Department, North Dakota State University Spring 2012.
CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
Lecture 13 Introduction to Embedded Systems Graduate Computer Architecture Fall 2005 Shih-Hao Hung Dept. of Computer Science and Information Engineering.
Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2014 年 01 月 17 日 These slides use Microsoft clip arts.
ASIP Architecture for Future Wireless Systems: Flexibility and Customization Joseph Cavallaro and Predrag Radosavljevic Rice University Center for Multimedia.
IEEE ICECS 2010 SysPy: Using Python for processor-centric SoC design Evangelos Logaras Elias S. Manolakos {evlog, Department of Informatics.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Page 1 Reconfigurable Communications Processor Principal Investigator: Chris Papachristou Task Number: NAG Electrical Engineering & Computer Science.
Hardware/Software Co-design Design of Hardware/Software Systems A Class Presentation for VLSI Course by : Akbar Sharifi Based on the work presented in.
Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany 2013 年 12 月 02 日 These slides use Microsoft clip arts. Microsoft copyright.
Chapter 5B: Hardware/Software Codesign / Partitioning EECE **** Embedded System Design.
- 1 - EE898_HW/SW Partitioning Hardware/software partitioning  Functionality to be implemented in software or in hardware? No need to consider special.
MAPLD 2005/254C. Papachristou 1 Reconfigurable and Evolvable Hardware Fabric Chris Papachristou, Frank Wolff Robert Ewing Electrical Engineering & Computer.
DIPARTIMENTO DI ELETTRONICA E INFORMAZIONE Novel, Emerging Computing System Technologies Smart Technologies for Effective Reconfiguration: The FASTER approach.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.
Teaching The Principles Of System Design, Platform Development and Hardware Acceleration Tim Kranich
Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2012 年 12 月 19 日 These slides use Microsoft clip.
Physically Aware HW/SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration Sudarshan Banarjee, Elaheh Bozorgzadeh, Nikil.
1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.
Combinatorial Optimization for Embedded System Design
Evaluating Register File Size
Mapping of Applications to Multi-Processor Systems
Standard Optimization Techniques
Digital Processing Platform
Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs
Presentation transcript:

Mapping of Applications to Multi-Processor Systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2012 年 07 月 09 日 Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These slides use Microsoft clip arts. Microsoft copyright restrictions apply.

- 2 -  p. marwedel, informatik 12, 2012 Structure of this course 2: Specification 3: ES-hardware 4: system software (RTOS, middleware, …) 8: Test 5: Evaluation & validation (energy, cost, performance, …) 7: Optimization 6: Application mapping Application Knowledge Design repository Design Numbers denote sequence of chapters

- 3 -  p. marwedel, informatik 12, 2012 The need to support heterogeneous architectures Energy efficiency a key constraint, e.g. for mobile systems © Hugo De Man/Philips, 2007 “inherent power efficiency (IPE)“ Unconventional architectures close to IPE Efficiency required for smart assistants  How to map to these architectures ? © Renesas, MPSoC‘07

- 4 -  p. marwedel, informatik 12, 2012 Practical problem in automotive design Which processor should run the software?

- 5 -  p. marwedel, informatik 12, 2012 A Simple Classification Architecture fixed/ Auto-parallelizing Fixed ArchitectureArchitecture to be designed Starting from given task graph Map to CELL, Hopes, Qiang XU (HK) Simunic (UCSD) COOL codesign tool; EXPO/SPEA2 SystemCodesigner Auto-parallelizing Mnemee (Dortmund) Franke (Edinburgh) Daedalus MAPS

- 6 -  p. marwedel, informatik 12, 2012 Example: System Synthesis © L. Thiele, ETHZ

- 7 -  p. marwedel, informatik 12, 2012 Basic Model – Problem Graph © L. Thiele, ETHZ

- 8 -  p. marwedel, informatik 12, 2012 Basic Model: Specification Graph © L. Thiele, ETHZ

- 9 -  p. marwedel, informatik 12, 2012 Design Space Scheduling/Arbitration proportional share WFQ staticdynamic fixed priority EDF TDMA FCFS Communication Templates Architecture # 1 Architecture # 2 Computation Templates DSP EE Cipher SDRAM RISC FPGA LookUp DSP TDMA Priority EDF WFQ RISC DSP LookUp Cipher EE EE EE EE EE EE static Which architecture is better suited for our application? © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Evolutionary Algorithms for Design Space Exploration (DSE) © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Challenges © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Exploration Cycle EXPO – Tool architecture (1) MOSES EXPOSPEA 2 selection of “good” architectures system architecture performance values task graph, scenario graph, flows & resources © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 EXPO – Tool architecture (2) Tool available online: ee.ethz.ch/ex po/expo.html Tool available online: ee.ethz.ch/ex po/expo.html © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 EXPO – Tool (3) © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Application Model Example of a simple stream processing task structure: © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Exploration – Case Study (1) © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Exploration – Case Study (2) © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Exploration – Case Study (3) © L. Thiele, ETHZ

 p. marwedel, informatik 12, 2012 Performance for encryption/decryption Performance for RT voice processing © L. Thiele, ETHZ More Results

 p. marwedel, informatik 12, 2012 Design Space Exploration with SystemCoDesigner (Teich et al., Erlangen)  System Synthesis comprises: Resource allocation Actor binding Channel mapping Transaction modeling  Idea: Formulate synthesis problem as 0-1 ILP Use Pseudo-Boolean (PB) solver to find feasible solution Use multi-objective Evolutionary algorithm (MOEA) to optimize Decision Strategy of the PB solver © J. Teich, U. Erlangen-Nürnberg

 p. marwedel, informatik 12, 2012 A 3rd approach based on evolutionary algorithms: SYMTA/S: [R. Ernst et al.: A framework for modular analysis and exploration of heteterogenous embedded systems, Real-time Systems, 2006, p. 124]

 p. marwedel, informatik 12, 2012 A Simple Classification Architecture fixed/ Auto-parallelizing Fixed ArchitectureArchitecture to be designed Starting from given task graph Map to CELL, Hopes Qiang XU (HK) Simunic (UCSD) COOL codesign tool; EXPO/SPEA2 SystemCodesigner Auto-parallelizing Mnemee (Dortmund) Franke (Edinburgh) Daedalus MAPS

 p. marwedel, informatik 12, 2012 A fixed architecture approach: Map  CELL Martino Ruggiero, Luca Benini: Mapping task graphs to the CELL BE processor, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008

 p. marwedel, informatik 12, 2012 Partitioning into Allocation and Scheduling R © Ruggiero, Benini, 2008

 p. marwedel, informatik 12, 2012

 p. marwedel, informatik 12, 2012 A Simple Classification Architecture fixed/ Auto-parallelizing Fixed ArchitectureArchitecture to be designed Starting from given task graph Map to CELL, Hopes, Qiang XU (HK) Simunic (UCSD) COOL codesign tool; EXPO/SPEA2 SystemCodesigner Auto-parallelizing Mnemee (Dortmund) Franke (Edinburgh) Daedalus MAPS

 p. marwedel, informatik 12, 2012 Daedalus Design-flow System-level synthesis Library of IP cores Platform specification Parallel application specification Automatic Parallelization High-level Models Mapping specification System-level design space exploration Explore, modify, select instances Multi-processor System on Chip (Synthesizable VHDL and C/C++ code for processors) RTL-level Models Common XML Interface Library of IP cores KPNgen Sesame ESPAM Xilinx Platform Studio (XPS) RTL-Level Specification System-Level Specification Synthesizable VHDL C/C++ code for processors MP-SoC Kahn Process Network Sequential application © E. Deprettere, U. Leiden Ed Deprettere et al.: Toward Composable Multimedia MP-SoC Design,1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008

 p. marwedel, informatik 12, 2012 Example architecture instances for a single-tile JPEG encoder: JPEG/JPEG2000 case study 2 MicroBlaze processors (50KB)1 MicroBlaze, 1HW DCT (36KB) 6 MicroBlaze processors (120KB) 4 MicroBlaze, 2HW DCT (68KB) Vin 8KB 4x2KB 4x16KB 32KB 16KB32KB 2KB 32KB 2KB 8KB 32KB 4KB VLE, Vout DCT, Q Vin,DCTQ,VLE,VoutVin,Q,VLE,Vout Vin Q Q VLE, Vout DCT © E. Deprettere, U. Leiden

 p. marwedel, informatik 12, 2012 Sesame DSE results: Single JPEG encoder DSE © E. Deprettere, U. Leiden

 p. marwedel, informatik 12, 2012 A Simple Classification Architecture fixed/ Auto-parallelizing Fixed ArchitectureArchitecture to be designed Starting from given task graph Map to CELL, Hopes, Qiang XU (HK) Simunic (UCSD) COOL codesign tool; EXPO/SPEA2 SystemCodesigner Auto-parallelizing Mnemee (Dortmund) Franke (Edinburgh) Daedalus MAPS

 p. marwedel, informatik 12, 2012 Auto-Parallelizing Compilers Discipline “High Performance Computing”:  Research on vectorizing compilers for more than 25 years.  Traditionally: Fortran compilers.  Such vectorizing compilers usually inappropriate for Multi- DSPs, since assumptions on memory model unrealistic: Communication between processors via shared memory Memory has only one single common address space  De Facto no auto-parallelizing compiler for Multi-DSPs!  Work of Franke, O’Boyle (Edinburgh) © Falk

 p. marwedel, informatik 12, 2012 MACC_System Introduction of Memory Architecture-Aware Optimization The MACC PMS (Processor/ Memory/Switch) Model CPU1 SPM MM2 CPU2 SPM L1$ BRI CPU3 SPM L1$ L2$ MM3 BUS1 BUS2 MM1  Explicit memory architecture  API provides access to memory information C code

 p. marwedel, informatik 12, 2012 MaCC Modeling Example via GUI

 p. marwedel, informatik 12, 2012 Page 34 Toolflow Detailed View 1.Optimization of dynamic data structures 2.Extraction of potential parallelism 3.Implementation of parallelism; placement of static data 4.Placement of dynamic data (1) Dynamic Data Type Optimizations (2) Map source code to task graphs (3) Parallelization Implem. Memory Hierarchy (MH) MPSoC Parallelization Assistant (MPA) MACC Eco-System MNEMEE Toolflow (Sequential C Source Code) START (4) Dynamic Memory Management Optimizations

 p. marwedel, informatik 12, 2012 Page 35 Toolflow Detailed View MNEMEE Toolflow (7) Scratchpad Memory Optimizations per PE END (Optimized Source Code) Platform DB (6) RTLIB Mapping (5) Scenario Based Mapping (5) Memory Aware Mapping 5.Perform mapping to processing elements Scenario based Memory aware 6.Transform the code to implement the mapping 7.Perform scratchpad memory optimizations for each processing element

 p. marwedel, informatik 12, 2012 MAPS-TCT Framework Rainer Leupers, Weihua Sheng: MAPS: An Integrated Framework for MPSoC Application Parallelization, 1st Workshop on Mapping of Applications to MPSoCs, Rheinfels Castle, 2008 © Leupers, Sheng, 2008

 p. marwedel, informatik 12, 2012 Summary  Clear trend toward multi-processor systems for embedded systems, there exists a large design space  Using architecture crucially depends on mapping tools  Mapping applications onto heterogeneous MP systems needs allocation (if hardware is not fixed), binding of tasks to resources, scheduling  Two criteria for classification Fixed / flexible architecture Auto parallelizing / non-parallelizing  Introduction to proposed Mnemee tool chain Evolutionary algorithms currently the best choice