Energy-Optimal Software Partitioning in Heterogeneous

Presentation transcript:

Energy-Optimal Software Partitioning in Heterogeneous Multiprocessor Embedded Systems. Michel Goraczko, Jie Liu (Microsoft Research, Redmond); Dimitrios Lymberopoulos (Yale University); Slobodan Matic (UC Berkeley); Bodhi Priyantha, Feng Zhao (Microsoft Research, Redmond). Presentation at DAC 2008, Anaheim, CA, June 10, 2008.

Energy Usage in Embedded Applications: Patient monitoring; smart environments; mobile devices. Low-duty-cycle monitoring for long battery life. High throughput for real-time processing of critical events.

Energy Performance Diversity: A single processor with DVFS may not be flexible enough. Energy efficiency in embedded processors. Non-trivial wake-up latency and energy costs.

Heterogeneous Multi-Processor Platforms: UCLA LEAP Platform; MSR mPlatform.

Outline: Introduction; Design Flow; Power State Machine; ILP Formulation and Optimization; A Sound Source Localization Case Study.

Software Partitioning Problem: Given a time-sensitive application, allocate software components to different processors to minimize energy consumption without violating timing constraints. Design-flow figure labels: Tasks; Processor modes; Timing Analysis; Task timing; Partitioning; Application structure/requirements; Power model; Task-Processor-Mode assignments.

Power State Machines: States: STBY, ~0 mW; IDLE, 0.25 mW; 60 MHz, 141 mW; 30 MHz, 72 mW; 7.5 MHz, 20 mW. Transition costs labeled in the figure: negligible; 1.53 mJ, 24.5 ms; 0.1 mJ, 1.4 ms; 1.47 mJ, 23.8 ms.

Software Model: Directed acyclic graph of tasks; single-rate periodic execution; known release time; known end-to-end deadline; worst-case execution time; pre-assignments.

ILP: Variables and Objective. Core binary variables: task-to-processor assignment; task-to-mode assignment; task transition assignment. Core integer variables: task start time instances. Derived variables: converting the problem into an ILP formulation requires additional auxiliary variables. Objective: minimize total energy per iteration.

ILP: Constraints. A task can be allocated to only one processor and one mode. A processor can execute only one task at any time. Waking up from sleep modes takes time. Total processor utilization should be less than 1. Tasks have dependencies within an iteration. Tasks have dependencies across iteration boundaries. No task can start before its release time. All tasks should finish by the deadline.
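The assignment and deadline constraints above can be illustrated on a toy instance. The sketch below is not the ILP from the talk: it swaps in plain exhaustive search, assumes a single FFT -> SC -> HT chain (so tasks run back to back), ignores wake-up and idle energy, and lets each task pick its own processor mode. Power and timing figures are taken from the Hardware Model and Task Profiling slides, assuming the MSP430 full, 1/2, and 1/8 speeds correspond to 6 MHz, 3 MHz, and 0.75 MHz. The names `POWER_MW`, `WCET_MS`, `CHAIN`, and `best_assignment` are hypothetical.

```python
from itertools import product

# Active power in mW per (processor, mode); MSP430 speed mapping is an assumption.
POWER_MW = {
    ("ARM7", "60MHz"): 141.0, ("ARM7", "30MHz"): 72.0, ("ARM7", "7.5MHz"): 20.0,
    ("MSP430", "6MHz"): 10.8, ("MSP430", "3MHz"): 2.7, ("MSP430", "0.75MHz"): 1.4,
}

# Execution time in ms per task and (processor, mode); HT is profiled on ARM7 only.
WCET_MS = {
    "FFT": {("ARM7", "60MHz"): 7.8, ("ARM7", "30MHz"): 15.6, ("ARM7", "7.5MHz"): 39.6,
            ("MSP430", "6MHz"): 99.2, ("MSP430", "3MHz"): 196.0, ("MSP430", "0.75MHz"): 792.0},
    "SC": {("ARM7", "60MHz"): 4.4, ("ARM7", "30MHz"): 9.0, ("ARM7", "7.5MHz"): 23.3,
           ("MSP430", "6MHz"): 37.2, ("MSP430", "3MHz"): 76.0, ("MSP430", "0.75MHz"): 300.0},
    "HT": {("ARM7", "60MHz"): 111.0, ("ARM7", "30MHz"): 222.0, ("ARM7", "7.5MHz"): 567.0},
}

CHAIN = ["FFT", "SC", "HT"]  # assumed linear dependency chain


def best_assignment(deadline_ms):
    """Return (energy_mJ, {task: (proc, mode)}) or None if no choice meets the deadline."""
    best = None
    options = [list(WCET_MS[task].items()) for task in CHAIN]
    for choice in product(*options):
        # Tasks in a chain run one after another, so the finish time is the sum.
        finish_ms = sum(ms for _, ms in choice)
        if finish_ms > deadline_ms:
            continue
        # mW * ms = microjoules; divide by 1000 for mJ.
        energy_mj = sum(POWER_MW[pm] * ms for pm, ms in choice) / 1000.0
        if best is None or energy_mj < best[0]:
            best = (energy_mj, {t: pm for t, (pm, _) in zip(CHAIN, choice)})
    return best


if __name__ == "__main__":
    for deadline in (128, 256, 1000):
        print(deadline, best_assignment(deadline))
```

Exhaustive search only works here because the instance is tiny; the number of combinations grows exponentially with the task count, which is why the talk turns to an ILP formulation.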

Case Study: Sound Source Localization. [Figure: task graph with S, FFT, SC, HT, and VOTE nodes.] Legend: S: Audio Sampling; FFT: Fast Fourier Transform; SC: Noise Estimation & Signal Classification; HT: Hypothesis Testing; VOTE: Sound detection voting.

Hardware Model

Power (mW) by mode:
Mode                         ARM7 @ 2.5 V    MSP430 @ 3 V
Full speed (ARM7: 60 MHz)    141             10.8
1/2 speed                    72              2.7
1/8 speed                    20              1.4
Idle                         0.25            ~0
Standby

Wake-up cost:
Wake up          ARM7 @ 2.5 V                   MSP430 @ 3 V
                 Energy (mJ)    Time (ms)       Energy (mJ)    Time (ms)
To full speed    1.5            24.5            ~0             0.006
To 1/8 speed     0.1            1.4
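One way to read the wake-up figures is as a break-even question: how long must a processor be inactive before dropping to standby and paying the wake-up cost beats staying idle? The sketch below works this out for the ARM7 under stated assumptions: idle power of 0.25 mW and standby power of exactly 0 (the Power State Machines slide says "~0 mW"), and the wake-up cost applying only when leaving standby. The function name `break_even_gap_ms` is hypothetical.

```python
def break_even_gap_ms(idle_mw, standby_mw, wakeup_mj, wakeup_ms):
    """Shortest inactive gap (ms) for which standby plus wake-up beats staying idle."""
    saved_mw = idle_mw - standby_mw
    if saved_mw <= 0:
        return float("inf")  # standby saves nothing, so it is never worth it
    # mJ / mW gives seconds; convert to milliseconds.
    gap_ms = wakeup_mj / saved_mw * 1000.0
    # The gap must also be long enough to complete the wake-up itself.
    return max(gap_ms, wakeup_ms)


ARM7_IDLE_MW = 0.25
ARM7_STANDBY_MW = 0.0  # assumption: the slide only says "~0"

to_full_speed = break_even_gap_ms(ARM7_IDLE_MW, ARM7_STANDBY_MW, 1.5, 24.5)
to_eighth_speed = break_even_gap_ms(ARM7_IDLE_MW, ARM7_STANDBY_MW, 0.1, 1.4)

if __name__ == "__main__":
    print(f"wake to full speed: break-even gap {to_full_speed:.0f} ms")
    print(f"wake to 1/8 speed:  break-even gap {to_eighth_speed:.0f} ms")
```

Under these assumptions the break-even gap is on the order of seconds for a full-speed wake-up, far longer than the wake-up latency itself, which suggests why wake-up energy has to be part of the partitioning decision rather than an afterthought.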

Task Profiling

Proc      Mode       FFT (ms)    SC (ms)    HT (ms)
ARM7      60MHz      7.8         4.4        111
ARM7      30MHz      15.6        9.0        222
ARM7      7.5MHz     39.6        23.3       567
MSP430    6MHz       99.2        37.2
MSP430    3MHz       196         76
MSP430    1.5MHz     394         152
MSP430    0.75MHz    792         300
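Combining these execution times with the active power figures from the Hardware Model slide gives a per-task energy estimate, energy = power x time. The sketch below does this for the FFT task only. It assumes the MSP430 full, 1/2, and 1/8 speed rows correspond to 6 MHz, 3 MHz, and 0.75 MHz (the slides do not state this mapping), skips 1.5 MHz because no power figure is listed for it, and ignores wake-up and idle energy. The names `POWER_MW`, `FFT_MS`, and `fft_energy_mj` are hypothetical.

```python
# Active power in mW per (processor, mode); MSP430 speed mapping is an assumption.
POWER_MW = {
    ("ARM7", "60MHz"): 141.0,
    ("ARM7", "30MHz"): 72.0,
    ("ARM7", "7.5MHz"): 20.0,
    ("MSP430", "6MHz"): 10.8,
    ("MSP430", "3MHz"): 2.7,
    ("MSP430", "0.75MHz"): 1.4,
}

# FFT execution time in ms per (processor, mode), from the profiling table.
FFT_MS = {
    ("ARM7", "60MHz"): 7.8,
    ("ARM7", "30MHz"): 15.6,
    ("ARM7", "7.5MHz"): 39.6,
    ("MSP430", "6MHz"): 99.2,
    ("MSP430", "3MHz"): 196.0,
    ("MSP430", "0.75MHz"): 792.0,
}

# mW * ms = microjoules; divide by 1000 to get millijoules.
fft_energy_mj = {pm: POWER_MW[pm] * FFT_MS[pm] / 1000.0 for pm in FFT_MS}

# The cheapest mode in energy terms, ignoring deadlines and wake-up costs.
cheapest = min(fft_energy_mj, key=fft_energy_mj.get)

if __name__ == "__main__":
    for (proc, mode), mj in sorted(fft_energy_mj.items(), key=lambda kv: kv[1]):
        print(f"{proc:7s} {mode:8s} {mj:.3f} mJ")
    print("cheapest:", cheapest)
```

The fastest mode is not the cheapest, and neither is the slowest: the energy-optimal choice depends on how power and execution time scale together, and the deadline then limits which of these modes is actually usable.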

Partitioning Results (1): Deadline: 128 ms. Needs 4 MSP430; ARM7 @ 60 MHz. Total energy/iteration: 21.7 mJ. Average power: 166.7 mW. Schedule (time axis 0 to 150 ms): ARM7 @ 60 MHz runs HT; MSP-1 through MSP-4 @ 6 MHz each run FFT, SC.

Scheduling Results (2): Deadline: 256 ms. Needs 2 MSP430; ARM7 @ 30 MHz. Total energy/iteration: 22.1 mJ. Average power: 86.4 mW. Schedule (time axis 0 to 256 ms): ARM7 runs HT; MSP4 and MSP3 @ 6 MHz each run FFT, SC, FFT, SC; MSP2 and MSP1 have no tasks scheduled.

Scheduling Results (3): Deadline: 1000 ms. Needs 2 MSP430; ARM7 @ 7.5 MHz. Total energy/iteration: 16.2 mJ. Average power: 16.2 mW. Schedule (time axis 0 to 1000 ms): ARM7 runs 4xFFT and HT; MSP4 and MSP3 @ 6 MHz each run SC, SC; MSP2 and MSP1 have no tasks scheduled.
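The average power figures on the three results slides relate to the energy per iteration through average power = energy per iteration / iteration period. The sketch below recomputes them, assuming the period equals the stated deadline (the slides do not say so explicitly). The names `RESULTS` and `average_power_mw` are hypothetical.

```python
# (deadline in ms, total energy per iteration in mJ), as reported on the results slides.
RESULTS = [
    (128.0, 21.7),
    (256.0, 22.1),
    (1000.0, 16.2),
]


def average_power_mw(energy_mj, period_ms):
    """Average power in mW; mJ / ms is watts, so scale by 1000."""
    return energy_mj / period_ms * 1000.0


if __name__ == "__main__":
    for deadline_ms, energy_mj in RESULTS:
        # Assumption: one iteration per deadline interval.
        power = average_power_mw(energy_mj, deadline_ms)
        print(f"deadline {deadline_ms:6.0f} ms: {power:.1f} mW")
```

With this assumption the 1000 ms case reproduces the slide's 16.2 mW exactly, while the 128 ms and 256 ms cases come out slightly different from the reported 166.7 mW and 86.4 mW. That may reflect rounding of the energy figures or a period that is not exactly the deadline; the slides do not settle it.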

Conclusion: Processor diversity can help save energy. Wake-up time and energy must be considered in software partitioning. Optimal software partitioning is NP-hard, but can be formulated as an ILP problem.

Limitations & Future Work: Execution time variations; aperiodic tasks; lightweight heuristics for online scheduling.