© imec 2001 ARRM’01, Oct.17 Managing dynamic concurrent tasks in real-time multi-media systems Francky Catthoor, IMEC, Belgium.

Slides:



Advertisements
Similar presentations
Energy-efficient Task Scheduling in Heterogeneous Environment 2013/10/25.
Advertisements

*time Optimization Heiko, Diego, Thomas, Kevin, Andreas, Jens.
CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.
University of Michigan Electrical Engineering and Computer Science 1 Polymorphic Pipeline Array: A Flexible Multicore Accelerator with Virtualized Execution.
Embedded Computer Architecture 5KK73 TU/e Henk Corporaal
11 1 Hierarchical Coarse-grained Stream Compilation for Software Defined Radio Yuan Lin, Manjunath Kudlur, Scott Mahlke, Trevor Mudge Advanced Computer.
System design-related Optimization problems Michela Milano Joint work DEIS Università di Bologna Dip. Ingegneria Università di Ferrara STI Università di.
Resource Management – a Solution for Providing QoS over IP Tudor Dumitraş, Frances Jen-Fung Ning and Humayun Latif.
High-level System Modeling and Power Management Techniques Jinfeng Liu Dept. of ECE, UC Irvine Sep
1 of 30 June 14, 2000 Scheduling and Communication Synthesis for Distributed Real-Time Systems Paul Pop Department of Computer and Information Science.
1 Oct 2, 2003 Design Optimization of Mixed Time/Event-Triggered Distributed Embedded Systems Traian Pop, Petru Eles, Zebo Peng Embedded Systems Laboratory.
1 of 14 1 Analysis and Synthesis of Communication-Intensive Heterogeneous Real-Time Systems Paul Pop Computer and Information Science Dept. Linköpings.
Embedded Systems in Silicon TD5102 Henk Corporaal Technical University Eindhoven DTI / NUS Singapore.
Courseware Basics of Real-Time Scheduling Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads, Building.
November 18, 2004 Embedded System Design Flow Arkadeb Ghosal Alessandro Pinto Daniele Gasperini Alberto Sangiovanni-Vincentelli
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
1 of 14 1 / 18 An Approach to Incremental Design of Distributed Embedded Systems Paul Pop, Petru Eles, Traian Pop, Zebo Peng Department of Computer and.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
SBSE Course 4. Overview: Design Translate requirements into a representation of software Focuses on –Data structures –Architecture –Interfaces –Algorithmic.
Course Outline DayContents Day 1 Introduction Motivation, definitions, properties of embedded systems, outline of the current course How to specify embedded.
EECE **** Embedded System Design
ECE-777 System Level Design and Automation Introduction 1 Cristinel Ababei Electrical and Computer Department, North Dakota State University Spring 2012.
CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
Integrating Fine-Grained Application Adaptation with Global Adaptation for Saving Energy Vibhore Vardhan, Daniel G. Sachs, Wanghong Yuan, Albert F. Harris,
Architectures for mobile and wireless systems Ese 566 Report 1 Hui Zhang Preethi Karthik.
Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.
1 EE5900 Advanced Embedded System For Smart Infrastructure Energy Efficient Scheduling.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
System Design with CoWare N2C - Overview. 2 Agenda q Overview –CoWare background and focus –Understanding current design flows –CoWare technology overview.
CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.
1 of 14 1/15 Synthesis-driven Derivation of Process Graphs from Functional Blocks for Time-Triggered Embedded Systems Master thesis Student: Ghennadii.
Configurable, reconfigurable, and run-time reconfigurable computing.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Veronica Eyo Sharvari Joshi. System on chip Overview Transition from Ad hoc System On Chip design to Platform based design Partitioning the communication.
A Utility-based Approach to Scheduling Multimedia Streams in P2P Systems Fang Chen Computer Science Dept. University of California, Riverside
Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
1 Lic Presentation Memory Aware Task Assignment and Scheduling for Multiprocessor Embedded Systems Radoslaw Szymanek / Embedded System Design
Networks-on-Chip (NoC) Suleyman TOSUN Computer Engineering Deptartment Hacettepe University, Turkey.
High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.
Full and Para Virtualization
Jürgen Lambrecht– DESICS © imec 2002 Multi-objective refinement of Mapping Tables in Telecom Network Applications Jürgen Lambrecht, Chantal Ykman- Couvreur,
Static Process Scheduling
Making Good Points : Application-Specific Pareto-Point Generation for Design Space Exploration using Rigorous Statistical Methods David Sheldon, Frank.
Martin Kruliš by Martin Kruliš (v1.1)1.
System-on-Chip Design Hao Zheng Comp Sci & Eng U of South Florida 1.
© imec 2003 Designing an Operating System for a Heterogeneous Reconfigurable SoC Vincent Nollet, P. Coene, D. Verkest, S. Vernalde, R. Lauwereins IMEC,
SR: 599 report Channel Estimation for W-CDMA on DSPs Sridhar Rajagopal ECE Dept., Rice University Elec 599.
Jamie Unger-Fink John David Eriksen.  Allocation and Scheduling Problem  Better MPSoC optimization tool needed  IP and CP alone not good enough  Communication.
Operating Systems: Summary INF1060: Introduction to Operating Systems and Data Communication.
1 of 14 Lab 2: Formal verification with UPPAAL. 2 of 14 2 The gossiping persons There are n persons. All have one secret to tell, which is not known to.
Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.
Roman Barták (Charles University in Prague, Czech Republic) ACAT 2010.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 28, 2005 Session 29.
1 of 14 Lab 2: Design-Space Exploration with MPARM.
Embedded Real-Time Systems Processing interrupts Lecturer Department University.
Resource Optimization for Publisher/Subscriber-based Avionics Systems Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee.
Pradeep Konduri Static Process Scheduling:  Proceedance process model  Communication system model  Application  Dicussion.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Embedded Real-Time Systems
Real-Time Operating Systems RTOS For Embedded systems.
System-on-Chip Design
Parallel Programming By J. H. Wang May 2, 2017.
Parallel Algorithm Design
Parallel Programming in C with MPI and OpenMP
Digital Processing Platform
Digital Processing Platform
Virtualization Techniques
Parallel Programming in C with MPI and OpenMP
Presentation transcript:

© imec 2001 ARRM’01, Oct.17 Managing dynamic concurrent tasks in real-time multi-media systems Francky Catthoor, IMEC, Belgium

© imec 2001 ARRM’01, Oct.17 Goal of our research A methodology to map dynamic and concurrent real- time applications on an embedded multi-processor platform

© imec 2001 ARRM’01, Oct.17 MPEG4 JPEG Why are Applications becoming more dynamic and concurrent? The workload decreases but the tasks are dynamically created and their size is data dependent T1 T1’ T1 T2 T3 T4

© imec 2001 ARRM’01, Oct.17 MPEG-4 : multimedia spec = too huge to handle as 1 task => break up in many interacting tasks With this code my boss has to give me a raise!!! TASK1 h Cost(P) Exec.Time Divide-and-conquer approach of today leads to local solutions

© imec 2001 ARRM’01, Oct.17 This didn’t look so good after all??? TASK1 TASK2 TASK3 Processor 1 Global picture: ad hoc h h h t1 t2 t3 t1+t2+t3<T

© imec 2001 ARRM’01, Oct.17 TASK1 TASK2 TASK3 Processor 1 Global trade-offs with cost-performance curves h h h Luckily we have the Pareto approach! x x x t1n t2n t3n t1n+t2n+t3n<T

© imec 2001 ARRM’01, Oct.17 Outline Motivation: new challenges in system-level design Overview of TCM methodology Cost-efficient design-time scheduling

© imec 2001 ARRM’01, Oct.17 Maximize Quality Minimize Power Limited Resources Terminal QoS requires new approach CPU Bus access Mem. size Terminal ? Solution: Pareto curves 3D object Video object Tasks See session Multi-media 1

© imec 2001 ARRM’01, Oct.17 Very dynamic behaviour so worst-case realisation too costly

© imec 2001 ARRM’01, Oct.17 System design issues in IT-Application domain IN 622 Mb/s OUT 622 Mb/s 53 cycles Stringent real-time constraints Embedded system => cost Complex data sets Large and irregular dynamically allocated data Huge memory accesses Routing Record Packet Record FIFO 200 accesses data in ISR time data out routing reply out Processes Dynamic and concurrent processes Global/local control Non-deterministic events Network layer protocols (ATM, IP) Dynamic multi-media algorithms (MPEG4/7/21) Wireless/wired terminals (Internet, WLAN)

© imec 2001 ARRM’01, Oct.17 Executive Decoders Presenter Application Data Channel ALManager Flex Demux Service Data Channel Data Channel Data Channel Decoding Buffer Composition Memory BIFS Critical path OD msec Rendering Pre/post Decoding Pre/post writing Real-time constraints, tasks and data in the IM1-MPEG4 protocol

© imec 2001 ARRM’01, Oct.17 Why aren’t multi-processor platforms used now in embedded domains? Live from FZ-TV efficient mapping requires a very high design effort when done manually => need for a cost-sensitive real-time system compiler

© imec 2001 ARRM’01, Oct.17 Outline Motivation: new challenges in system-level design Overview of TCM methodology Cost-efficient design-time scheduling

© imec 2001 ARRM’01, Oct.17 Requirements of system level design approach AlgorithmsData Structures+ ARM IP 1 IP 2 RAMROM Architecture Processor architecture RAM ROM MMU custom logic DSP ROM micro processor TCM approach: ICPP’00, Kluwer book’99

© imec 2001 ARRM’01, Oct.17 TCM steps aim at removing the bottlenecks for better performance Optimized system specification Task-level system architecture Inter-task DTSE Task concurrency mngnt Task1 Task2 Task3 Inter-task interface refinement Task to processor assignment 1 Task/thread scheduling Task conc. Extraction/trafo Array-processor allocation Virtual Proc1 Virtual Proc2 2 Proc ICPP’00, Kluwer book’99

© imec 2001 ARRM’01, Oct.17 Static task scheduling cost time task1 cost time task2 cost time task3 platform Real-time constraints Run-time scheduler processor1 processor2 processor3 time platform Real-time constraints C/C++ specification of dynamic concurrent system Extraction of the gray-box model Task-level DTSE Concurrency improving transformations Memory architecture Real-time constraints

© imec 2001 ARRM’01, Oct.17 Outline Motivation: new challenges in system-level design Overview of TCM methodology Cost-efficient design-time scheduling

© imec 2001 ARRM’01, Oct.17 Reduce global system energy by task scheduling + assignment (e.g. 2-processor approach ) ARM Processor 1 V dd =1V V dd =3.3V ARM Processor 2 Task n Task 2 Task 1 Codes’01, J.of Sys.Arch, summer 2001

© imec 2001 ARRM’01, Oct.17 Trade-off between time budget (period/latency) and cost (e.g.energy) leads to Pareto curves Time Cost CB1CB2CB3CB4CB5CB6 Processor alloc/assign and scheduling alternatives for code version 1 x x x x Non-optimal points

© imec 2001 ARRM’01, Oct.17 TCM transformations on application model shift Pareto curve to origin Bottleneck of case 1 Scheduling alternatives for case 1 Transformations shift the Pareto Curve –Decrease cost for same Time-Budget –Increase performance for same Cost-Budget New curve for case 2 TCM trafo Energy Time Budget

© imec 2001 ARRM’01, Oct.17 Comparison for original and transformed IM1 graphs on 2 processors with different Vdd original Transformed

© imec 2001 ARRM’01, Oct.17 Comparison between different processor platforms Combination 2 Combination 3 (Global Pareto curve) Global Pareto curve

© imec 2001 ARRM’01, Oct.17 Comparison of genetic algorithm (GA) and heuristic task scheduling results

© imec 2001 ARRM’01, Oct.17 Why run-time scheduler?(I) Design-time Scheduler + Predictability- Schedulability for - Flexibility dynamic events + Run time complexity+ Optimization Run-time Scheduler - Predictability+ Schedulability for + Flexibility dynamic events - Run time complexity- Optimization

© imec 2001 ARRM’01, Oct.17 Overall solution: combination of design- and run-time schedulers Cases’00, Design&Test- Sep.’ Design-time Scheduling AB Design-time scheduling: at compile time, exploring all the optimization possibility Run-time Scheduling 1AB32 Run-time scheduling: at run time, providing flexibility and dynamic control at low cost as part of synthesized RTOS

© imec 2001 ARRM’01, Oct.17 Main messages Embedded multi-media applications are becoming very dynamic and concurrent in nature => RTOS essential Task Concurrency Management approach provides the flexibility and optimization possibility while limiting the run time computation complexity A multiprocessor platform with different working voltages potentially provides an energy saving solution Application-specific run-time scheduling technique combined with design-time scheduling to provide cost- performance Pareto-curve essential for effective solution