Power Aware Real-time Systems Rami Melhem A joint project with Daniel Mosse, Bruce Childers, Mootaz Elnozahy.

Slides:



Advertisements
Similar presentations
EE5900 Advanced Embedded System For Smart Infrastructure
Advertisements

Energy-efficient Task Scheduling in Heterogeneous Environment 2013/10/25.
ECE 667 Synthesis and Verification of Digital Circuits
1 EE5900 Advanced Embedded System For Smart Infrastructure RMS and EDF Scheduling.
1 “Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation In Multi-processor Real-Time Systems” Dakai Zhu, Rami Melhem, and Bruce Childers.
CPE555A: Real-Time Embedded Systems
Real- time Dynamic Voltage Scaling for Low- Power Embedded Operating Systems Written by P. Pillai and K.G. Shin Presented by Gaurav Saxena CSE 666 – Real.
CSE 522 Real-Time Scheduling (4)
Introduction and Background  Power: A Critical Dimension for Embedded Systems  Dynamic power dominates; static /leakage power increases faster  Common.
Task Allocation and Scheduling n Problem: How to assign tasks to processors and to schedule them in such a way that deadlines are met n Our initial focus:
CprE 458/558: Real-Time Systems (G. Manimaran)1 CprE 458/558: Real-Time Systems Dynamic Planning Based Scheduling.
1 Swiss Federal Institute of Technology Computer Engineering and Networks Laboratory Embedded Systems Exercise 2: Scheduling Real-Time Aperiodic Tasks.
Towards Feasibility Region Calculus: An End-to-end Schedulability Analysis of Real- Time Multistage Execution William Hawkins and Tarek Abdelzaher Presented.
Aleksandra Tešanović Low Power/Energy Scheduling for Real-Time Systems Aleksandra Tešanović Real-Time Systems Laboratory Department of Computer and Information.
Preemptive Behavior Analysis and Improvement of Priority Scheduling Algorithms Xiaoying Wang Northeastern University China.
Embedded Systems Exercise 3: Scheduling Real-Time Periodic and Mixed Task Sets 18. May 2005 Alexander Maxiaguine.
By Group: Ghassan Abdo Rayyashi Anas to’meh Supervised by Dr. Lo’ai Tawalbeh.
Misconceptions About Real-time Computing : A Serious Problem for Next-generation Systems J. A. Stankovic, Misconceptions about Real-Time Computing: A Serious.
Technische Universität Dortmund Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2007/12/14.
Minimizing Response Time Implication in DVS Scheduling for Low Power Embedded Systems Sharvari Joshi Veronica Eyo.
VOLTAGE SCHEDULING HEURISTIC for REAL-TIME TASK GRAPHS D. Roychowdhury, I. Koren, C. M. Krishna University of Massachusetts, Amherst Y.-H. Lee Arizona.
10 years of research on Power Management (now called green computing) Rami Melhem Daniel Mosse Bruce Childers.
Computer Science Department University of Pittsburgh 1 Evaluating a DVS Scheme for Real-Time Embedded Systems Ruibin Xu, Daniel Mossé and Rami Melhem.
1 EE5900 Advanced Embedded System For Smart Infrastructure Energy Efficient Scheduling.
Critical Power Slope Understanding the Runtime Effects of Frequency Scaling Akihiko Miyoshi, Charles Lefurgy, Eric Van Hensbergen Ram Rajamony Raj Rajkumar.
Real-Time Scheduling CS4730 Fall 2010 Dr. José M. Garrido Department of Computer Science and Information Systems Kennesaw State University.
Scheduling policies for real- time embedded systems.
Multiprocessor Real-time Scheduling Jing Ma 马靖. Classification Partitioned Scheduling In the partitioned approach, the tasks are statically partitioned.
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
Real-Time Systems Mark Stanovich. Introduction System with timing constraints (e.g., deadlines) What makes a real-time system different? – Meeting timing.
Real-Time Scheduling CS4730 Fall 2010 Dr. José M. Garrido Department of Computer Science and Information Systems Kennesaw State University.
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Hard Real-Time Scheduling for Low- Energy Using Stochastic Data and DVS Processors Flavius Gruian Department of Computer Science, Lund University Box 118.
5 May CmpE 516 Fault Tolerant Scheduling in Multiprocessor Systems Betül Demiröz.
6. Application mapping 6.1 Problem definition
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Special Class on Real-Time Systems
CSE 522 Real-Time Scheduling (2)
1 Real-Time Scheduling. 2Today Operating System task scheduling –Traditional (non-real-time) scheduling –Real-time scheduling.
CSCI1600: Embedded and Real Time Software Lecture 24: Real Time Scheduling II Steven Reiss, Fall 2015.
Power Aware Real-time Systems A joint project with profs Daniel Mosse Bruce Childers Mootaz Elnozahy (IBM Austin) And students Nevine Abougazaleh Cosmin.
CSCI1600: Embedded and Real Time Software Lecture 23: Real Time Scheduling I Steven Reiss, Fall 2015.
Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2012 年 12 月 19 日 These slides use Microsoft clip.
CprE 458/558: Real-Time Systems (G. Manimaran)1 Energy Aware Real Time Systems - Scheduling algorithms Acknowledgement: G. Sudha Anil Kumar Real Time Computing.
CprE 458/558: Real-Time Systems (G. Manimaran)1 CprE 458/558: Real-Time Systems Energy-aware QoS packet scheduling.
Power Aware Real-time Systems A joint project with profs Rami Melhem Bruce Childers Mootaz Elnozahy (IBM Austin) Trying to rope in Ahmed Amer Jose Brustoloni.
Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.
Fault-Tolerant Rate- Monotonic Scheduling Sunondo Ghosh, Rami Melhem, Daniel Mosse and Joydeep Sen Sarma.
Distributed Process Scheduling- Real Time Scheduling Csc8320(Fall 2013)
Real-Time Operating Systems RTOS For Embedded systems.
Embedded System Scheduling
Multiprocessor Real-Time Scheduling
Wayne Wolf Dept. of EE Princeton University
EEE 6494 Embedded Systems Design
Chapter 8 – Processor Scheduling
Process Scheduling B.Ramamurthy 9/16/2018.
Lecture 24: Process Scheduling Examples and for Real-time Systems
Process Scheduling B.Ramamurthy 11/18/2018.
Dynamic Voltage Scaling
CSCI1600: Embedded and Real Time Software
Networked Real-Time Systems: Routing and Scheduling
Process Scheduling B.Ramamurthy 2/23/2019.
Process Scheduling B.Ramamurthy 2/23/2019.
Process Scheduling B.Ramamurthy 2/23/2019.
CSCI1600: Embedded and Real Time Software
Process Scheduling B.Ramamurthy 4/11/2019.
Process Scheduling B.Ramamurthy 4/7/2019.
Process Scheduling B.Ramamurthy 4/19/2019.
Process Scheduling B.Ramamurthy 4/24/2019.
Research Topics Embedded, Real-time, Sensor Systems Frank Mueller moss
Presentation transcript:

Power Aware Real-time Systems Rami Melhem A joint project with Daniel Mosse, Bruce Childers, Mootaz Elnozahy

Outline Introduction to real-time systems Voltage and frequency scaling (no slide) Speed adjustment in frame-based systems Dynamic speed adjustment in multiprocessor environment Speed adjustment in general periodic systems Static speed adjustment for tasks with different power consumptions Maximizing computational reward for given energy and deadline Tradeoff between energy consumption and reliability The Pecan board

Hard RT systems Periodic Aperiodic (frame-based) preemptive non preemptive non preemptive uni-processor parallel processors Soft RT systems Real-time systems

n tasks with maximum computation times C i and periods T i, for i=1,…,n. If C 2 = 3.1, then C 2 will miss its deadline although the utilization is +, which is less than 1. C=2, T=4 C=3, T= Liu and Layland RMS utilization bound is n (2 1/n - 1) Periodic, Rate Monotonic scheduling Static priority scheduling (high priority to task with shorter period).

n tasks with maximum computation times C i and periods T i, for i=1,…,n. Dynamic priority scheduling (high priority to the task with earlier deadline) All tasks will meet their deadlines if utilization is not more than 1. C=2, T=4 C=3, T=7 Periodic, EDF scheduling

time S max S min Select the speed based on worst-case execution time,WCET, and deadline CPU speed time deadline S max S min WCET speed adjustment in frame-based systems Assumption: all tasks have the same deadline. Static speed adjustment

Dynamic Speed adjustment techniques for linear code time WCET ACET Speed adjustment based on remaining WCET Note: a task very rarely consumes its estimated worst case execution time.

Dynamic Speed adjustment techniques for linear code time Remaining WCET Remaining time Speed adjustment based on remaining WCET

Dynamic Speed adjustment techniques for linear code time Remaining WCET Remaining time Speed adjustment based on remaining WCET

Dynamic Speed adjustment techniques for linear code time Speed adjustment based on remaining WCET

Dynamic Speed adjustment techniques for linear code time Speed adjustment based on remaining WCET Greedy speed adjustment: Give reclaimed slack to next task rather than all future tasks time

Dynamic Speed adjustment techniques for linear code time WCE time WCET ACET time AV Remaining av. ex. time Speed adjustment based on remaining average execution time WCE S max Remaining time

An alternate point of view time WCE time WCET ACET time AV WCE Reclaimed slack stolen slack

Dynamic Speed adjustment techniques for non-linear code Remaining WCET is based on the longest path Remaining average case execution time is based on the branching probabilities (from trace information). At a p3p3 p2p2 p1p1 minaveragemax

2. Periodic, non-frame-based systems  Each task has a WCET, C i and a period T i  Earliest Deadline First (EDF) scheduling  Static speed adjustment: If utilization U = < 1, then we can reduce the speed by a factor of U, and still guarantee that deadlines are met. S max Note: Average utilization, U av can be much less than U

Greedy dynamic speed adjustment  Giving reclaimed slack to the next ready task is not always a correct scheme  Theorem: A reclaimed slack has to be associated with a deadline and can be given safely to a task with an earlier or equal deadline. correct incorrect

aggressive dynamic speed adjustment  Theorem: if tasks 1, …, k are ready and will complete before the next task arrival, then we can swap the time allocation of the k tasks. That is we can add stolen slack to the reclaimed slack  Experimental rule: Do not be very aggressive and reduce the speed of a task below a certain speed (the optimal speed determined by U av )

Speed adjustment in Multi-processors the case of independent tasks on two processors Canonical execution ==> all tasks consume WCET P1 P2 time Deadline 12 Global queue P1 P Static speed management No speed management

Dynamic speed adjustment Non canonical execution ==> tasks consume ACET P1 P2 time 12 Greedy slack Reclamation (GSR) 6,56,69,33,3 P1 P2 Slack sharing (SS) If we select the initial speed based on WCET, can we do dynamic speed adjustment and still meet the deadline? deadline miss

Dynamic speed adjustment P1 P2 time 12 Greedy slack Reclamation (GSR) P1 P2 Slack sharing (SS) Non canonical execution ==> tasks consume ACET 6,56,69,33,3 If we select the initial speed based on WCET, can we do dynamic speed adjustment and still meet the deadline? deadline miss

Dynamic speed adjustment P1 P2 time 12 Greedy slack Reclamation (GSR) P1 P2 Slack sharing (SS) Non canonical execution ==> tasks consume ACET 6,56,69,33,3 If we select the initial speed based on WCET, can we do dynamic speed adjustment and still meet the deadline? deadline miss meets deadline

43 2. dependent tasks P1 P2 time Ready_Q Use list scheduling 4 4 Canonical execution Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time Ready_Q Use list scheduling 4 4 Canonical execution 3 3 Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time Ready_Q 43 Use list scheduling Canonical execution Assuming that we adjust the speed statically such that canonical execution meets the deadline. Can we reclaim unused slack dynamically and still meet the deadline? Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 3 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing time 12 Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing 6 time 12 Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing 6 time Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing time Dynamic speed adjustment (2 processors)

2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing time Dynamic speed adjustment (2 processors)

63 2. dependent tasks P1 P2 time ,1 2 3 Ready_Q P1 P2 2 Ready_Q Use list scheduling Canonical execution Non-canonical Execution with Slack sharing 6 time Dynamic speed adjustment (2 processors) 6 6 deadline miss

Dynamic speed adjustment (2 processors) Solution: Use a wait_Q to enforce canonical order in Ready_Q P1 P2 time ,1 Ready_Q Wait_Q A task is put in Wait_Q when its last predecessor starts execution Tasks are ordered in Wait_Q by their expected start time under WCET Only the head of the Wait_Q can move to the Ready_Q

Dynamic speed adjustment (2 processors) P1 P2 time ,1 Ready_Q Wait_Q A task is put in Wait_Q when its last predecessor starts execution Tasks are ordered in Wait_Q by their expected start time under WCET Only the head of the Wait_Q can move to the Ready_Q Solution: Use a wait_Q to enforce canonical order in Ready_Q

Dynamic speed adjustment (2 processors) P1 P2 time ,1 Ready_Q Wait_Q A task is put in Wait_Q when its last predecessor starts execution Tasks are ordered in Wait_Q by their expected start time under WCET Only the head of the Wait_Q can move to the Ready_Q Solution: Use a wait_Q to enforce canonical order in Ready_Q 6 6 6

64 3 Dynamic speed adjustment (2 processors) P1 P2 time ,1 Ready_Q Wait_Q A task is put in Wait_Q when its last predecessor starts execution Tasks are ordered in Wait_Q by their expected start time under WCET Only the head of the Wait_Q can move to the Ready_Q Solution: Use a wait_Q to enforce canonical order in Ready_Q

Dynamic speed adjustment (2 processors) P1 P2 time ,1 Ready_Q Wait_Q A task is put in Wait_Q when its last predecessor starts execution Tasks are ordered in Wait_Q by their expected start time under WCET Only the head of the Wait_Q can move to the Ready_Q Solution: Use a wait_Q to enforce canonical order in Ready_Q meets deadline

Theoretical results For independent tasks, if canonical execution finishes at time T, then non-canonical execution with slack sharing finishes at or before time T. For dependent tasks, if canonical execution finishes at time T, then, non-canonical execution with slack sharing and a wait queue finishes at or before time T. Can optimize energy based on WCET (static speed adjustment) At run time, can use reclaimed slack to further reduce energy (dynamic speed adjustment), while still guaranteeing deadlines. Implication:

Simulation results We simulated task graphs from real-applications (matrix operations and solution of linear equations) We assumed that Power consumption is proportional to S 3 Typical results are as follows: Static speed/voltage adjustment Dynamic speed/voltage adjustment 100% 20% 40% 60% 80% Average dynamic slack (2 processors) % 40% 60% 80% Normalized energy consumption Number of processors (50% dynamic slack) %

4. Static optimization when different tasks consume different power Start time deadline Assuming that the power consumption functions are identical for all tasks. Task 1 Task 2 Task 3 Then to minimize the total energy, all tasks have to execute at the same speed. If, however, the power functions, P i (S), are different for tasks i = 1, …, n, Then using the same speed for all tasks does not minimize energy consumption. Let C i = number of cycles needed to complete task i

Example: Start time deadline Task 1 Three tasks with C 1 = C 2 = C 3, If P i (S) = a i S 2, for task i, then, energy consumed by task i is E i = a i / t i. D time D/3 E Task 2 Task 3 t1t1 t2t2 If a 1 = a 2 = a 3, then t 1 = t 2 = t 3 minimizes total energy t3t3 Minimizing energy consumption

Example: Start time deadline Three tasks with C 1 = C 2 = C 3, If P i (S) = a i S 2, for task i, then, energy consumed by task i is E i = a i / t i. time E D/3 D If a 1, a 2 and a 3, are different then t 1 = t 2 = t 3 does not minimize total energy t1t1 t2t2 t3t3 Minimizing energy consumption

Start time deadline The problem is to find S i, i=1, …, n, such that to Note that We solved this optimization problem, consequently developing a solution for arbitrary convex power functions. Algorithm complexity: O(n 2 log n) D

Maximizing the system’s reward C1 C2 C3 General problem assumptions: tasks have different power/speed functions tasks have different rewards as functions of number of executed cycles time Speed (S) C1 Reward C2C3 S1 Power S2S3

Theorem: If power functions are all quadratic or cubic in S, Maximizing the system’s reward S1 Power S2S3 Theorem: If power functions are all of the form  i  S, then reward is maximum when power is the same for all tasks Energy Deadline C1 C2 C3 time Speed Given the speeds, we know how to maximize the total reward while meeting the deadline. p

Maximizing the system’s reward If power functions of tasks are not all of the form  i  S time Speed 1) Ignore energy and maximize reward, R, within the deadline 2) If exceed available energy;  t –remove  t from a task such that decrease in R is minimal –use  t to decrease the speed of a task, such as to maximize the decrease in energy consumption p

Maximizing the system’s reward time 1) Ignore energy and maximize reward, R, within the deadline 2) If exceed available energy; - remove  t from a task such that decrease in R is minimal - use  t to decrease the speed of a task, such as to maximize the decrease in energy consumption 3) Repeat step 2 until the energy constraints are satisfied  t If power functions of tasks are not all of the form  i  S p

Basic hypothesis:  Dependable systems must include redundant capacity in either time or space (or both)  Redundancy can also be exploited to reduce power consumption Tradeoff between energy & dependability Time redundancy (checkpointing and rollbacks) Space redundancy

Exploring time redundancy deadline The slack can be used to 1) add checkpoints 2) reserve recovery time 3) reduce processing speed For a given number of checkpoints, we can find the speed that minimizes energy consumption, while guaranteeing recovery and timeliness. S max

More checkpoints = more overhead + less recovery slack D C r Optimal number of checkpoints For a given slack (C/D) and checkpoint overhead (r/C), we can find the number of checkpoints that minimizes energy consumption, and guarantee recovery and timeliness. # of checkpoints Energy

Non-uniform check-pointing Observation: May continue executing at S max after recovery. Advantage: recovery in an early section can use slack created by execution of later sections at S max Disadvantage: increases energy consumption when a fault occurs (a rare event) Requires non-uniform checkpoints.

Non-uniform check-pointing Can find # of checkpoints, their distribution, and the CPU speed such that energy is minimized, recovery is guaranteed and deadlines are met

The Pecan board An experimental platform for power management research Profiling power usage of applications. Development of active software power control. An initial prototype for thin/dense servers based on Embedded PowerPC processors. A software development platform for PPC405 processors. PCI-Based Network Ethernet Network(s) Serial Ethernet PPC405 RAM Flash Serial Ethernet PPC405 RAM Flash Serial Ethernet PPC405 RAM Flash Serial Etherne t PPC405 PCI Bridge RAM Flash Serial Ethernet PCI Bridge PPC405 RAM Flash FPGA PCMCIA RTC

Block Diagram Single-board computer as a PCI adapter; non-transparent bridge. Up to 128MB SDRAM. Serial, 10/100 Ethernet, Boot Flash, RTC, FPGA, PCMCIA. Components grouped on voltage islands for power monitoring & control. Serial Ethernet PCI Bridge PPC405 RAM Flash FPGA PCMCIA RTC Linux for PPC 405GP available in-house & commercially (MontaVista). PCI provides easy expandability. Leverage open-source drivers for networking over PCI -- Fast access to network-based file systems. Software: Linux