Combinatorial Optimization for Embedded System Design

Michela Milano
Michele Lombardi, Alessio Bonfietti, Luca Benini, Andrea Bartolini, Davide Bertozzi, Alessio Guerri, Martino Ruggiero, Giuseppe Tagliavini

Embedded Systems
A rough definition: “Any computing system which is not a computer”.
Large variety of devices.
High performance (as they often run real-time applications).
High energy efficiency (e.g. in case of battery-supplied power).
Classical design approaches:
→ dedicated G.P. systems. Issue: efficiency.
→ dedicated hardware. Issue: high design cost + poor flexibility.

MultiProcessor Systems on Chip
MPSoCs address all such problems:
Flexibility through software (or mixed HW/SW) applications.
Performance through parallelism.
Low power consumption by using low(er) frequency cores and power saving techniques.
HOWEVER:
Thermal issues.
Requires proper use of the exposed resources.
Squeezing out the full power of a modern MPSoC can be definitely HARD.

Ideally
Given: input application (code), target platform description.
Yield: optimized application.
What’s the black box? A compiler? A CAD tool (less blackish)? A run-time support?
FOR SURE: this thing has to perform some automatic optimization.

On-line vs off-line approaches
What should the black box look like? → most likely: two distinct components.
ON-LINE:
OS-level scheduler.
On-line application-to-core dispatcher (current multi-core CPUs...).
Out-of-order execution.
...
OFF-LINE:
Off-line code optimization (e.g. VLIW compilers).
Memory allocation (even hand made).
Off-line resource allocation (e.g. mixed HW/SW design).

Requirements for automatic optimization
1. A formal description of the application must be available.
Formal = can be understood by a computer and can be manipulated by mathematics.
Usually: task-based models.
Task = atomic computation unit (e.g. an instruction, a process, a code block...).
Tasks may have dependencies.
Tasks have measurable “features” (e.g. execution time), which must be computed in some way.
Tasks use hardware resources.
[Figure: example task graph with tasks T0 to T6]
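A task-based model of this kind can be sketched as a small data structure. The task names T0 to T6 follow the figure; the durations, the resource label, and the dependency edges are made up for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Task:
    name: str
    duration: int        # measurable feature, e.g. execution time
    resource: str        # hardware resource the task uses
    preds: list = field(default_factory=list)  # tasks that must finish first


def topological_order(tasks):
    """Return task names in an order that respects all dependencies."""
    graph = {t.name: set(t.preds) for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical task graph: edges and durations are illustrative only.
tasks = [
    Task("T0", 2, "core"),
    Task("T1", 3, "core", ["T0"]),
    Task("T2", 1, "core", ["T0"]),
    Task("T3", 4, "core", ["T1"]),
    Task("T4", 2, "core", ["T1", "T2"]),
    Task("T5", 3, "core", ["T2"]),
    Task("T6", 1, "core", ["T3", "T4", "T5"]),
]
order = topological_order(tasks)
```

Any order returned this way is one that a scheduler could legally follow on a single core; the optimization problems in the later slides choose among such orders and among resources.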

Requirements for automatic optimization
2. A formal description of the platform must be available.
Usually: resource-based models.
Resource = an “energy” provider over time.
Each resource has a finite capacity.
Platform = collection of resources (e.g. PROC: cores; MEM: memory devices).
Additionally, for off-line approaches:
3. A formal description of the performance metrics must be available:
completion time (makespan)
throughput
energy consumption
number of bus transactions
...
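As a minimal sketch of these two ingredients, the snippet below evaluates one metric (makespan) and one finite-capacity resource check for a given schedule. The discrete time steps, the task names, and all numbers are assumptions made for the example.

```python
def makespan(start, duration):
    """Completion time of the last task to finish."""
    return max(start[t] + duration[t] for t in start)


def respects_capacity(start, duration, demand, capacity):
    """True if, at every time step, the running tasks need at most `capacity` units."""
    for time in range(makespan(start, duration)):
        used = sum(
            demand[t]
            for t in start
            if start[t] <= time < start[t] + duration[t]
        )
        if used > capacity:
            return False
    return True


# Hypothetical instance: three tasks, each needing one core, on a 2-core platform.
duration = {"A": 2, "B": 3, "C": 2}
demand = {"A": 1, "B": 1, "C": 1}
schedule = {"A": 0, "B": 0, "C": 2}

total = makespan(schedule, duration)
ok = respects_capacity(schedule, duration, demand, capacity=2)
```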

An example on MPSoCs
Mapping & Scheduling Problem: application description, platform description.
Through: off-line optimization algorithm.
Provide directives for the run-time support: resource-to-task allocation, task scheduling decisions.
Compiled code + directives + run-time support = OPTIMIZED APPLICATION
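To make the shape of the output concrete, here is a sketch of a greedy list scheduler that produces the two kinds of directives named above: an allocation (task to core) and scheduling decisions (start times). It is a simple heuristic chosen for brevity, not the off-line optimization algorithm of the presentation, and the instance is invented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def list_schedule(duration, preds, n_cores):
    """Greedy mapping and scheduling: place each task on the core where it can start earliest."""
    core_free = [0] * n_cores          # time at which each core becomes idle
    allocation, start = {}, {}
    for task in TopologicalSorter(preds).static_order():
        # A task is ready once all its predecessors have finished.
        ready = max((start[p] + duration[p] for p in preds[task]), default=0)
        core = min(range(n_cores), key=lambda c: max(core_free[c], ready))
        start[task] = max(core_free[core], ready)
        allocation[task] = core
        core_free[core] = start[task] + duration[task]
    return allocation, start


# Hypothetical application: A precedes B and C, which both precede D.
duration = {"A": 2, "B": 3, "C": 2, "D": 1}
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
allocation, start = list_schedule(duration, preds, n_cores=2)
```

The pair `(allocation, start)` plays the role of the directives handed to the run-time support; an exact method would search over such pairs instead of fixing them greedily.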

Communication-aware approach
Communication aware: the approach minimizes inter-core communication between tasks.
Decisions: task allocation and scheduling.
Approach: Logic-based Benders decomposition.
Validation on a cycle-accurate simulator and on the target platform.
Martino Ruggiero, Alessio Guerri, Davide Bertozzi, Michela Milano, Luca Benini: A Fast and Accurate Technique for Mapping Parallel Applications on Stream-Oriented MPSoC Platforms with Communication Awareness. International Journal of Parallel Programming 36(1): 3-36 (2008)
Luca Benini, Michele Lombardi, Michela Milano, Martino Ruggiero: Optimal resource allocation and scheduling for the CELL BE platform. Annals OR 184(1): 51-77 (2011)
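The control flow of a logic-based Benders decomposition can be illustrated with a toy loop: a master problem picks the allocation with the least inter-core communication, a subproblem checks whether each core can meet a deadline, and every failure is returned to the master as a cut. This is only a sketch under strong assumptions: the master is solved by brute-force enumeration, the subproblem is a simple load check, and the instance (`durations`, `comm`, `deadline`) is invented. A real implementation would use dedicated solvers for both levels.

```python
from itertools import product


def lbbd(durations, comm, n_cores, deadline):
    """Toy logic-based Benders loop. Returns (allocation, cost) or None."""
    tasks = list(durations)
    cuts = []  # each cut is a set of tasks that must not all share one core

    def violates(assign):
        return any(len({assign[t] for t in cut}) == 1 for cut in cuts)

    def cost(assign):
        # Communication is paid only when the two endpoints sit on different cores.
        return sum(c for (a, b), c in comm.items() if assign[a] != assign[b])

    while True:
        # Master problem: cheapest allocation that satisfies all cuts so far.
        candidates = (
            dict(zip(tasks, cores))
            for cores in product(range(n_cores), repeat=len(tasks))
        )
        feasible = [a for a in candidates if not violates(a)]
        if not feasible:
            return None
        assign = min(feasible, key=cost)

        # Subproblem: can every core run its tasks within the deadline?
        new_cuts = []
        for core in range(n_cores):
            on_core = {t for t in tasks if assign[t] == core}
            if sum(durations[t] for t in on_core) > deadline:
                new_cuts.append(on_core)   # this group can never share a core
        if not new_cuts:
            return assign, cost(assign)
        cuts.extend(new_cuts)


# Hypothetical instance.
durations = {"A": 4, "B": 4, "C": 3, "D": 3}
comm = {("A", "B"): 5, ("B", "C"): 1, ("C", "D"): 5}
assign, total_comm = lbbd(durations, comm, n_cores=2, deadline=8)
```

On this instance the first master solution places everything on one core (zero communication), the subproblem rejects it, and the second iteration settles on splitting the tasks where communication is cheapest.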

MP-OPT Cell
A High-Performance Data-Flow Programming Environment for the Cell BE Processor
Programming Interface, Solver, Runtime System
http://mpopt.ing.unibo.it

Communication-aware approach
[Figures: Robustness and Variability; Performance. Series: Cell SuperScalar, MPOpt]
All experiments were executed on a PlayStation 3 (3.2 GHz Cell) running Yellow Dog Linux 6.0.

Dynamic voltage and frequency scaling
Energy aware: the approach minimizes energy dissipation.
Decisions: task allocation, frequency and scheduling.
Approach: Logic-based Benders decomposition.
Validation on a cycle-accurate simulator.
Martino Ruggiero, Davide Bertozzi, Luca Benini, Michela Milano, A. Andrei: Reducing the Abstraction and Optimality Gaps in the Allocation and Scheduling for Variable Voltage/Frequency MPSoC Platforms. IEEE Trans. on CAD of Integrated Circuits and Systems 28(3): 378-391 (2009)
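The frequency decision can be illustrated on a single core with a brute-force search. The energy model is a simplified textbook assumption (supply voltage scales with frequency, so a task of c cycles at frequency f costs about k·c·f² and takes c/f time); it is not the model of the cited paper, and the cycle counts, frequency levels, and deadline are invented.

```python
from itertools import product


def best_frequencies(cycles, freqs, deadline, k=1.0):
    """Pick one frequency per task (run back to back) minimizing energy within the deadline.

    Assumed model: time = cycles / f, energy = k * cycles * f**2.
    Returns (energy, frequencies) or None if no choice meets the deadline.
    """
    best = None
    for choice in product(freqs, repeat=len(cycles)):
        time = sum(c / f for c, f in zip(cycles, choice))
        if time > deadline:
            continue
        energy = sum(k * c * f ** 2 for c, f in zip(cycles, choice))
        if best is None or energy < best[0]:
            best = (energy, choice)
    return best


# Hypothetical instance: two tasks, two frequency levels.
result = best_frequencies(cycles=[100, 200], freqs=[1, 2], deadline=250)
```

Here running everything at the low frequency misses the deadline, and speeding up the shorter task is cheaper than speeding up the longer one, so the search returns frequencies `(2, 1)`.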

Energy-aware approach

Robust optimization for conditional task graphs
Robust optimization: minimizes expected execution time while guaranteeing resource feasibility in all scenarios.
Conditional task graphs.
Approach: Constraint Programming solver; transformation of the probabilistic problem into a deterministic counterpart.
Michele Lombardi, Michela Milano, Martino Ruggiero, Luca Benini: Stochastic allocation and scheduling for conditional task graphs in multi-processor systems-on-chip. J. Scheduling 13(4): 315-345 (2010)
Michele Lombardi, Michela Milano: Allocation and scheduling of Conditional Task Graphs. Artif. Intell. 174(7-8): 500-529 (2010)
Against scenario-based scheduling: same performance as solvers considering 50% of the scenarios; much higher solution quality (49% improvement on average).
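The objective can be made concrete by enumerating scenarios explicitly. The sketch below does exactly that for a tiny sequential conditional graph; it only illustrates what "expected execution time" and "all scenarios" mean, whereas the cited approach avoids enumeration by working on a deterministic counterpart. The stages, probabilities, and durations are invented.

```python
from itertools import product
from math import prod


def scenarios(stages):
    """Yield (probability, total duration) for every combination of branch outcomes.

    Each stage is a list of mutually exclusive alternatives (probability, duration);
    an unconditional task is a stage with a single alternative of probability 1.
    """
    for outcome in product(*stages):
        yield prod(p for p, _ in outcome), sum(d for _, d in outcome)


# Hypothetical conditional graph executed on one core.
stages = [
    [(1.0, 2)],             # T0 always runs
    [(0.3, 5), (0.7, 1)],   # branch: T1 with probability 0.3, otherwise T2
    [(1.0, 3)],             # T3 always runs
]
expected = sum(p * d for p, d in scenarios(stages))   # objective to minimize
worst = max(d for _, d in scenarios(stages))          # must stay feasible in every scenario
```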

Robust optimization under duration uncertainty
Robust optimization: minimizes expected execution time while guaranteeing deadline feasibility in all scenarios.
Task graphs with known WCET and BCET (worst-case and best-case execution times).
Approach: Partial Order scheduler with a min-flow algorithm for identifying critical sets.
Michele Lombardi, Michela Milano, Luca Benini: Robust Scheduling of Task Graphs under Execution Time Uncertainty. IEEE Trans. Computers 62(1): 98-111 (2013)
Fixed priority scheduler based on tabu search.
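Once a partial order has been fixed, a deadline guarantee reduces to a longest-path computation: because the makespan of a precedence graph can only grow when durations grow, checking it with every task at its WCET covers all scenarios. The sketch below shows only this check, not the min-flow step; the graph and the BCET/WCET values are invented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path_makespan(duration, preds):
    """Makespan of a precedence graph when every task starts as early as possible."""
    finish = {}
    for task in TopologicalSorter(preds).static_order():
        ready = max((finish[p] for p in preds[task]), default=0)
        finish[task] = ready + duration[task]
    return max(finish.values())


# Hypothetical partial order schedule (resource conflicts already resolved by extra edges).
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
bcet = {"A": 1, "B": 2, "C": 1, "D": 1}
wcet = {"A": 2, "B": 4, "C": 5, "D": 2}

best_case = longest_path_makespan(bcet, preds)
worst_case = longest_path_makespan(wcet, preds)
meets_deadline = worst_case <= 10   # holds for every duration between BCET and WCET
```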

Synchronous data flow graphs
Synchronous data-flow graphs: the approach maximizes throughput.
Approach: Constraint Programming solver with a throughput constraint.
Alessio Bonfietti, Michele Lombardi, Michela Milano, Luca Benini: Maximum-throughput mapping of SDFGs on multi-core SoC platforms. J. Parallel Distrib. Comput. 73(10): 1337-1350 (2013)
Against simulation and Swing Modulo Scheduling (SMS) implemented in gcc:
The “SDF3” tool is the fastest approach; however, its solutions present an average optimality gap of 12.1% (with a peak of 47.7% in the MPEG-2 benchmark), compared with 4.8% for SMS.
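Before throughput can be discussed, an SDF graph must be consistent: for every channel, the tokens produced per iteration must equal the tokens consumed. The sketch below solves these balance equations for the smallest integer repetition vector. It is the standard preliminary step for SDF analysis, not the throughput-constrained CP model of the cited paper; it assumes a connected graph, and the actors and rates are invented.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(actors, channels):
    """Smallest positive integer firing counts q with q[src]*produced == q[dst]*consumed.

    channels: list of (src, dst, produced, consumed). Assumes a connected graph.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate rates along channels
        changed = False
        for src, dst, produced, consumed in channels:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    for src, dst, produced, consumed in channels:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}


# Hypothetical graph: A produces 2 tokens per firing and B consumes 3, and so on.
q = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
```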

Cyclic scheduling
Cyclic applications: possibly more than one, with different periods.
Approach: CP solver with a modular-arithmetic-based approach.
Alessio Bonfietti, Michele Lombardi, Luca Benini, Michela Milano: CROSS cyclic resource-constrained scheduling solver. Artif. Intell. 206: 25-52 (2014)
Presentation tomorrow by Alessio Bonfietti on a project with ABB.
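The role of modular arithmetic can be seen in a resource check for a periodic schedule: since every iteration repeats the same pattern, a task starting at time s occupies the slots (s + k) mod period, and overlapping iterations fold onto the same slots. The sketch below shows only this check, under the assumption of discrete time; it is not the CROSS solver, and the instance is invented.

```python
def modular_usage(start, duration, demand, period):
    """Resource usage per slot of one period, with all iterations folded together."""
    usage = [0] * period
    for task in start:
        for k in range(duration[task]):
            usage[(start[task] + k) % period] += demand[task]
    return usage


# Hypothetical periodic schedule with period 4; task B wraps around into slot 0.
start = {"A": 0, "B": 3, "C": 2}
duration = {"A": 2, "B": 2, "C": 1}
demand = {"A": 1, "B": 1, "C": 1}

usage = modular_usage(start, duration, demand, period=4)
fits_two_cores = max(usage) <= 2
fits_one_core = max(usage) <= 1
```

Task B starts in one iteration and finishes in the next, which is why slot 0 is shared with task A even though their start times are three units apart.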

Empirical Model Learning
Machine learning and data analytics for characterizing the app and the platform.
Insert the learned model into the optimization model.
Michele Lombardi, Michela Milano, Andrea Bartolini: Empirical decision model learning. Artif. Intell.: online (2016) http://www.sciencedirect.com/science/article/pii/S0004370216000126

Empirical Model Learning
Thermal behaviour is complex. Depends on:
Room temperature
Core workload
Neighbor workload
Heat and sink position

Empirical Model Learning
Building and training a Neural Network to model the thermal behavior of a (simulated) quad-core chip.
[Figure: network with inputs temp_(0) and pwr_0 to pwr_3 (power input), layers labeled sigmoid and linear, output temp_3(Δ)]
Encoding the network in CP, using a “Neuron Constraint” to encode each neuron and decisions for the input.
Building a CP model to perform thermal-aware workload dispatching with temperature constraints.
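A rough sketch of what such an encoding has to capture: each neuron computes an activation of a weighted sum, and a constraint over a neuron can at least propagate bounds from its inputs to its output. The interval reasoning below is one plausible form of that propagation and is an assumption, not the actual Neuron Constraint; the weights and input ranges are made up rather than learned.

```python
import math


def sigmoid(a):
    return 1 / (1 + math.exp(-a))


def linear(a):
    return a


def neuron(inputs, weights, bias, activation):
    """Output of one neuron: activation(weights . inputs + bias)."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def neuron_bounds(input_bounds, weights, bias, activation):
    """Interval of possible outputs given an interval (lo, hi) for each input.

    Valid for monotone non-decreasing activations such as sigmoid and linear.
    """
    lo = bias + sum(w * (l if w >= 0 else u) for w, (l, u) in zip(weights, input_bounds))
    hi = bias + sum(w * (u if w >= 0 else l) for w, (l, u) in zip(weights, input_bounds))
    return activation(lo), activation(hi)


def predict(inputs, hidden, output):
    """Sigmoid hidden layer followed by one linear output neuron.

    hidden: list of (weights, bias); output: (weights, bias).
    """
    h = [neuron(inputs, w, b, sigmoid) for w, b in hidden]
    return neuron(h, output[0], output[1], linear)


# Hypothetical single neuron with two inputs, each ranging over [0, 1].
bounds = neuron_bounds([(0, 1), (0, 1)], [1, -2], 0.5, linear)
```

In a CP model, tightening the bounds on the temperature output in this way is what allows a temperature limit to prune workload decisions on the inputs.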

Empirical Model Learning
[Figure: results WITHOUT EML and WITH EML]

Open issues
Accuracy vs. efficiency.
How to consider accuracy/confidence levels IN the optimization process.
Definition of the training set.
More inference methods for ML models.