Hardware/Software Codesign of Embedded Systems Power/Voltage Management Voicu Groza School of Information Technology and Engineering

Slides:



Advertisements
Similar presentations
Computer Architecture
Advertisements

System Integration and Performance
DSPs Vs General Purpose Microprocessors
Instruction Set Design
Computer Abstractions and Technology
Power Reduction Techniques For Microprocessor Systems
Tuan Tran. What is CISC? CISC stands for Complex Instruction Set Computer. CISC are chips that are easy to program and which make efficient use of memory.
Mehdi Amirijoo1 Dynamic power management n Introduction n Implementation, levels of operation n Modeling n Power and performance issues regarding.
L27:Lower Power Algorithm for Multimedia Systems 성균관대학교 조 준 동
Aleksandra Tešanović Low Power/Energy Scheduling for Real-Time Systems Aleksandra Tešanović Real-Time Systems Laboratory Department of Computer and Information.
Low Power Design for Wireless Sensor Networks Aki Happonen.
Chapter 1 and 2 Computer System and Operating System Overview
Chapter 4 Processor Technology and Architecture. Chapter goals Describe CPU instruction and execution cycles Explain how primitive CPU instructions are.
Instruction Set Architecture (ISA) for Low Power Hillary Grimes III Department of Electrical and Computer Engineering Auburn University.
System-Wide Energy Minimization for Real-Time Tasks: Lower Bound and Approximation Xiliang Zhong and Cheng-Zhong Xu Dept. of Electrical & Computer Engg.
11/11/05ELEC CISC (Complex Instruction Set Computer) Veeraraghavan Ramamurthy ELEC 6200 Computer Architecture and Design Fall 2005.
Optimization Of Power Consumption For An ARM7- BASED Multimedia Handheld Device Hoseok Chang; Wonchul Lee; Wonyong Sung Circuits and Systems, ISCAS.
Computer Organization and Assembly language
Processor Frequency Setting for Energy Minimization of Streaming Multimedia Application by A. Acquaviva, L. Benini, and B. Riccò, in Proc. 9th Internation.
Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.
Group 5 Alain J. Percial Paula A. Ortiz Francis X. Ruiz.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Processor Organization and Architecture
CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Embedded Systems Power/Energy Aware Embedded Systems Dynamic Voltage Scheduling.
Micro-operations Are the functional, or atomic, operations of a processor. A single micro-operation generally involves a transfer between registers, transfer.
Minimizing Response Time Implication in DVS Scheduling for Low Power Embedded Systems Sharvari Joshi Veronica Eyo.
Page 19/17/2015 CSE 30341: Operating Systems Principles Optimal Algorithm  Replace page that will not be used for longest period of time  Used for measuring.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
Low-Power Wireless Sensor Networks
Computer Science Department University of Pittsburgh 1 Evaluating a DVS Scheme for Real-Time Embedded Systems Ruibin Xu, Daniel Mossé and Rami Melhem.
1 Overview 1.Motivation (Kevin) 1.5 hrs 2.Thermal issues (Kevin) 3.Power modeling (David) Thermal management (David) hrs 5.Optimal DTM (Lev).5 hrs.
CHAPTER 3 TOP LEVEL VIEW OF COMPUTER FUNCTION AND INTERCONNECTION
1 EE5900 Advanced Embedded System For Smart Infrastructure Energy Efficient Scheduling.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
3 rd Nov CSV881: Low Power Design1 Power Estimation and Modeling M. Balakrishnan.
Computer Organization & Assembly Language © by DR. M. Amer.
Operating System Requirements for Embedded Systems Rabi Mahapatra.
Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning Gaurav DhimanTajana Simunic Rosing Department of Computer Science and.
Power and Control in Networked Sensors E. Jason Riedy and Robert Szewczyk Presenter: Fayun Luo.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
CUHK Learning-Based Power Management for Multi-Core Processors YE Rong Nov 15, 2011.
Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)
DSP Architectures Additional Slides Professor S. Srinivasan Electrical Engineering Department I.I.T.-Madras, Chennai –
Full and Para Virtualization
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
Multimedia Computing and Networking Jan Reduced Energy Decoding of MPEG Streams Malena Mesarina, HP Labs/UCLA CS Dept Yoshio Turner, HP Labs.
Workload Clustering for Increasing Energy Savings on Embedded MPSoCs S. H. K. Narayanan, O. Ozturk, M. Kandemir, M. Karakoy.
CprE 458/558: Real-Time Systems (G. Manimaran)1 CprE 458/558: Real-Time Systems Energy-aware QoS packet scheduling.
بسم الله الرحمن الرحيم MEMORY AND I/O.
Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.
CS203 – Advanced Computer Architecture
1 Aphirak Jansang Thiranun Dumrongson
Overview Motivation (Kevin) Thermal issues (Kevin)
Real-time Software Design
Basic Computer Organization and Design
Evaluating Register File Size
SECTIONS 1-7 By Astha Chawla
Central Processing Unit
CISC AND RISC SYSTEM Based on instruction set, we broadly classify Computer/microprocessor/microcontroller into CISC and RISC. CISC SYSTEM: COMPLEX INSTRUCTION.
Dynamic Voltage Scaling
Embedded System Hardware - Processing -
Embedded System Hardware
COMP755 Advanced Operating Systems
Presentation transcript:

Hardware/Software Codesign of Embedded Systems Power/Voltage Management Voicu Groza School of Information Technology and Engineering

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Embedded Systems Power/Energy Aware Embedded Systems Dynamic Voltage Scheduling Dynamic Power Management Power/Energy Aware Embedded Systems Dynamic Voltage Scheduling Dynamic Power Management Surpassed hot (kitchen) plate …? Why not use it?

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Processing units Need for efficiency (power + energy): „Power is considered as the most important constraint in embedded systems“ [in: L. Eggermont (ed): Embedded Systems Roadmap 2002, STW] Current smart phones can hardly be operated for more than an hour, if data is being transmitted. [from a report of the Financial Times, Germany, on an analysis by Credit Suisse First Boston; ] Why worry about energy and power?

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS The energy/flexibility conflict - Intrinsic Power Efficiency - Technology [H. de Man, Keynote, DATE‘02; T. Claasen, ISSCC99] Operations/Watt [MOPS/mW] Processors Reconfigurable Computing hardwired muxed ASIC µ Necessary to optimize HW/SW; otherwise the prize for software flexibility cannot be paid! Ambient Intelligence 0.07µ DSP-ASIPs µPs µ0.5µ1.0µ poor design techniques

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Power and energy are related to each other t P E In many cases, faster execution also means less energy, but the opposite may be true if power has to be increased to allow faster execution. E'

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Low Power vs. Low Energy Consumption Minimizing the power consumption is important for the design of the power supply the design of voltage regulators the dimensioning of interconnect short term cooling Minimizing the energy consumption is important due to restricted availability of energy (mobile systems) limited battery capacities (only slowly improving) very high costs of energy (solar panels, in space) cooling high costs limited space dependability long lifetimes, low temperatures Minimizing the power consumption is important for the design of the power supply the design of voltage regulators the dimensioning of interconnect short term cooling Minimizing the energy consumption is important due to restricted availability of energy (mobile systems) limited battery capacities (only slowly improving) very high costs of energy (solar panels, in space) cooling high costs limited space dependability long lifetimes, low temperatures

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Application Specific Circuits (ASICS) or Full Custom Circuits Custom-designed circuits necessary if ultimate speed or energy efficiency is the goal and large numbers can be sold. Approach suffers from long design times, lack of flexibility (changing standards) and high costs (e.g. Mill. $ mask costs).

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Mask cost for specialized HW becomes very expensive [ tech_articles/MII_COO_NIST_2001.PDF9]  Trend towards implementation in Software

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Power Consumption of a Gate

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Fundamentals of dynamic voltage scaling (DVS) Power consumption of CMOS circuits (ignoring leakage): Delay for CMOS circuits:  Decreasing V dd reduces P quadratically, while the run-time of algorithms is only linearly increased (ignoring the effects of the memory system).

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Potential for Energy Optimization Saving Energy under given Time Constraints: – Reduce the supply voltage V dd – Reduce switching activity α – Reduce the load capacitance C L – Reduce the number of cycles #Cycles Saving Energy under given Time Constraints: – Reduce the supply voltage V dd – Reduce switching activity α – Reduce the load capacitance C L – Reduce the number of cycles #Cycles

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Processors At the chip level, embedded chips include micro-controllers and microprocessors. Micro-controllers are the true workhorses of the embedded family. They are the original ’embedded chips’ and include those first employed as controllers in elevators and thermostats [Ryan, 1995].

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Voltage Scaling and Power Management Dynamic Voltage Scaling V dd Energy / Cycle [nJ]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Prescott: 90 W/cm², 90 nm [c‘t 4/2004] Nuclear reactor Power density continues to get worse

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Need to consider CPU & System Power Mobile PC Thermal Design (TDP) System Power Note: Based on Actual Measurements 600/500 MHz uP 37% LCD 10" 19% HDD 9% Memory+Graphics 12% Power Supply 10% Other 13% Mobile PC Average System Power 600/500 MHz uP 13% LCD 10" 30% HDD 19% Memory+Graphics 15% Power Supply 10% Other 13% CPU Dominates Thermal Design Power Multiple Platform Components Comprise Average Power [Courtesy: N. Dutt; Source: V. Tiwari]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS New ideas can actually reduce energy consumption As published by Transmeta [ Pentium Crusoe Running the same multimedia application.

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Dynamic power management (DPM) RUN: operational IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts SLEEP: Shutdown of on- chip activity RUN SLEEPIDLE 400mW 160µW 50mW 90µs 10µs 160ms Example: STRONGARM SA1100 Power fault signal

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Variable-voltage/frequency example: INTEL Xscale From Intel’s Web Site OS should schedule distribution of the energy budget.

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Key requirement #2: Code-size efficiency CISC machines: RISC machines designed for run-time-, not for code-size-efficiency Compression techniques: key idea

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Code-size efficiency Compression techniques (continued): 2nd instruction set, e.g. ARM Thumb instruction set: Reduction to % of original code size 130% of ARM performance with 8/16 bit memory 85% of ARM performance with 32-bit memory Rd 0 Rd 0000 Constant 16-bit Thumb instr. ADD Rd #constant Rd Constant zero extended major opcode minor opcode source= destination [ARM, R. Gupta] Same approach for LSI TinyRisc, … Requires support by compiler, assembler etc. Dynamically decoded at run-time

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Dictionary approach, two level control store (indirect addressing of instructions) “Dictionary-based coding schemes cover a wide range of various coders and compressors. Their common feature is that the methods use some kind of a dictionary that contains parts of the input sequence which frequently appear. The encoded sequence in turn contains references to the dictionary elements rather than containing these over and over.” [Á. Beszédes et al.: Survey of Code size Reduction Methods, Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp ] “Dictionary-based coding schemes cover a wide range of various coders and compressors. Their common feature is that the methods use some kind of a dictionary that contains parts of the input sequence which frequently appear. The encoded sequence in turn contains references to the dictionary elements rather than containing these over and over.” [Á. Beszédes et al.: Survey of Code size Reduction Methods, Survey of Code-Size Reduction Methods, ACM Computing Surveys, Vol. 35, Sept. 2003, pp ]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Key idea (for d bit instructions) Uncompressed storage of a d-bit-wide instructions requires axd bits. In compressed code, each instruction pattern is stored only once. Hopefully, axb+cxd < axd. Called nanoprogramming in the Motorola instruction address CPU d bit b « d bit table of used instructions (“dictionary”) For each instruction address, S contains table address of instruction. S a b c ≦ 2 b small

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Application: y[j] =  i=0 x[j-i]*a[i]  i: 0  i  n-1: y i [j] = y i-1 [j] + x[j-i]*a[i] - Key requirement #3: Run-time efficiency - Domain-oriented architectures - Architecture: Example: Data path ADSP210x n-1 Application maps nicely onto architecture MR MF MX MY * +,- AR AF AXAY +,-,.. D P y i-1 [j] x[j-i] x[j-i]*a[i] a[i] Address generation unit (AGU) Address- registers A0, A1, A2.. i+1, j-i+1 a x MR:=0; A1:=1; A2:=n-2; MX:=x[n-1]; MY:=a[0]; for ( j:=1 to n) {MR:=MR+MX*MY; MY:=a[A1]; MX:=x[A2]; A1++; A2--}

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Modulo addressing Modulo addressing: Am++  Am:=(Am+1) mod n (implements ring or circular buffer in memory).. x[t1-1] x[t1] x[t1-n+1] x[t1-n+2].. Memory, t=t1 Memory, t2=t1+1 sliding window x t1 t n most recent values.. x[t1-1] x[t1] x[t1+1] x[t1-n+2]..

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS  Returns largest/smallest number in case of over/underflows  Example: a0111 b standard wrap around arithmetic (1)0000 saturating arithmetic1111 (a+b)/2: correct1000 wrap around arithmetic0000 saturating arithmetic + shifted0111  Appropriate for DSP/multimedia applications: No timeliness of results if interrupts are generated for overflows Precise values less important Wrap around arithmetic would be worse.  Returns largest/smallest number in case of over/underflows  Example: a0111 b standard wrap around arithmetic (1)0000 saturating arithmetic1111 (a+b)/2: correct1000 wrap around arithmetic0000 saturating arithmetic + shifted0111  Appropriate for DSP/multimedia applications: No timeliness of results if interrupts are generated for overflows Precise values less important Wrap around arithmetic would be worse. Saturating arithmetic „almost correct“

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Fixed-point arithmetic Shifting required after multiplications and divisions in order to maintain binary point.

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Properties of fixed-point arithmetic Automatic scaling a key advantage for multiplications. Example: x= 0.5 x x = = For iwl=1 and fwl=3 decimal digits, the less significant digits are automatically chopped off: x = Like a floating point system with numbers  [0..1), with no stored exponent (bits used to increase precision). Appropriate for DSP/multimedia applications (well-known value ranges).

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Slow Module 1.3V 50MHz Standard Modules 1.8V 100MHz Busy Module 3.3V 200MHz Spatial vs. Dynamic Supply Voltage Management Normal Mode 1.3 V 50MHz Busy Mode 3.3 V 200MHz Not all components require same performance. Not all components require same performance. Required performance may change over time Analogy of biological blood systems: Different supply to different regions High pressure: High pulse count and High activity Low pressure: Low pulse count and Low activity Analogy of biological blood systems: Different supply to different regions High pressure: High pulse count and High activity Low pressure: Low pulse count and Low activity

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Example: Processor with 3 voltages Case a): Complete task ASAP Task that needs to execute 10 9 cycles within 25 seconds. E a = 10 9 x 40 x = 40 [J]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Case b): Two voltages E b = x 40 x x 10 x = 32.5 [J] E b = x 40 x x 10 x = 32.5 [J]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Case c): Optimal voltage E c = 10 9 x 25 x = 25 [J]

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Observations  A minimum energy consumption is achieved for the ideal supply voltage of 4 Volts. In the following: variable voltage processor = processor that allows any supply voltage up to a certain maximum. It is expensive to support truly variable voltages, and therefore, actual processors support only a few fixed voltages.  A minimum energy consumption is achieved for the ideal supply voltage of 4 Volts. In the following: variable voltage processor = processor that allows any supply voltage up to a certain maximum. It is expensive to support truly variable voltages, and therefore, actual processors support only a few fixed voltages. Ishihara, Yasuura: “Voltage scheduling problem for dynamically variable voltage processors”, Proc. of the 1998 International Symposium on Low Power Electronics and Design (ISLPED’98)

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Generalization Lemma [Ishihara, Yasuura]: If a variable voltage processor completes a task before the deadline, then the energy consumption can be reduced. If a processor uses a single supply voltage V and completes a task T just at its deadline, then V is the unique supply voltage which minimizes the energy consumption of T. If a processor can only use a number of discrete voltage levels, then a voltage schedule with at most two voltages minimizes the energy consumption under any time constraint. If a processor can only use a number of discrete voltage levels, then the two voltages which minimize the energy consumption are the two immediate neighbors of the ideal voltage V ideal possible for a variable voltage processor. Lemma [Ishihara, Yasuura]: If a variable voltage processor completes a task before the deadline, then the energy consumption can be reduced. If a processor uses a single supply voltage V and completes a task T just at its deadline, then V is the unique supply voltage which minimizes the energy consumption of T. If a processor can only use a number of discrete voltage levels, then a voltage schedule with at most two voltages minimizes the energy consumption under any time constraint. If a processor can only use a number of discrete voltage levels, then the two voltages which minimize the energy consumption are the two immediate neighbors of the ideal voltage V ideal possible for a variable voltage processor.

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS The case of multiple tasks: Assigning optimum voltages to a set of tasks N : the number of tasks EC j : the number of execution cycles of task j L : the number of voltages of the target processor V i : the i th voltage, with 1  i  L F i : the clock frequency for supply voltage V i T : the global deadline at which all tasks must have been completed SC j : the average switching capacitance during the execution of task j (SC i comprises the actual capacitance CL and the switching activity  ) X i, j : the number of clock cycles task j is executed at voltage V i N : the number of tasks EC j : the number of execution cycles of task j L : the number of voltages of the target processor V i : the i th voltage, with 1  i  L F i : the clock frequency for supply voltage V i T : the global deadline at which all tasks must have been completed SC j : the average switching capacitance during the execution of task j (SC i comprises the actual capacitance CL and the switching activity  ) X i, j : the number of clock cycles task j is executed at voltage V i

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Designing an IP model Simplifying assumptions of the IP-model include the following: There is one target processor that can be operated at a limited number of discrete voltages. The time for voltage and frequency switches is negligible. The worst case number of cycles for each task are known. Simplifying assumptions of the IP-model include the following: There is one target processor that can be operated at a limited number of discrete voltages. The time for voltage and frequency switches is negligible. The worst case number of cycles for each task are known. Minimize Subject to and

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Experimental Results

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Voltage Scheduling Techniques Static Voltage Scheduling Extension: Deadline for each task Formulation as IP problem (SS) Decisions taken at compile time Dynamic Voltage Scheduling Decisions taken at run time 2 Variants: arrival times of tasks is known (SD) arrival times of tasks is unknown (DD) Static Voltage Scheduling Extension: Deadline for each task Formulation as IP problem (SS) Decisions taken at compile time Dynamic Voltage Scheduling Decisions taken at run time 2 Variants: arrival times of tasks is known (SD) arrival times of tasks is unknown (DD)

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Dynamic Voltage Control by Operating Systems Voltage Control and Task Scheduling by Operating System to minimize energy consumption Okuma, Ishihara, and Yasuura: “Real-Time Task Scheduling for a Variable Voltage Processor”, Proc. of the 1999 International Symposium on System Synthesis (ISSS'99) Voltage Control and Task Scheduling by Operating System to minimize energy consumption Okuma, Ishihara, and Yasuura: “Real-Time Task Scheduling for a Variable Voltage Processor”, Proc. of the 1999 International Symposium on System Synthesis (ISSS'99) Target: single processor system Only OS can issue voltage control instructions Voltage can be changed anytime only one supply voltage is used at any time overhead for switching is negligible static determination of worst case execution cycles Target: single processor system Only OS can issue voltage control instructions Voltage can be changed anytime only one supply voltage is used at any time overhead for switching is negligible static determination of worst case execution cycles

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Task2 Task3 Task1 2.5V 5.0V 4.0V deadline arrival time What is the optimum supply voltage assignment for each task in order to obtain minimum energy consumption? Problem for Operating Systems

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS The proposed Policy task Time slot: T task Consider a time slot the task can use without violating real-time constraints of other tasks executed in the future Once time slot is determined: The task is executed at a frequency of WCEC / T Hz The scheduler assigns start and end times of time slot Once time slot is determined: The task is executed at a frequency of WCEC / T Hz The scheduler assigns start and end times of time slot

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Two Algorithms Two possible situations: The arrival time of tasks is known: SD Algorithm Static ordering and Dynamic voltage assignment The arrival time of tasks is unknown DD Algorithm Dynamic ordering and Dynamic voltage assignment Two possible situations: The arrival time of tasks is known: SD Algorithm Static ordering and Dynamic voltage assignment The arrival time of tasks is unknown DD Algorithm Dynamic ordering and Dynamic voltage assignment SD DD CPU Time Allocation Start Time Assignment End Time Prediction SD DD CPU Time Allocation Start Time Assignment End Time Prediction off-line on-line off-line on-line

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Task1 Task2 Arrival time of all tasks is known Deadline of all tasks is known WCEC of all tasks is known  CPU time can be allocated statically CPU time is assigned to each task: assuming maximum supply voltage assuming WCEC Arrival time of all tasks is known Deadline of all tasks is known WCEC of all tasks is known  CPU time can be allocated statically CPU time is assigned to each task: assuming maximum supply voltage assuming WCEC SD Algorithm (CPU Time Allocation)

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Task1 Task2 Free time Current time SD Algorithm (Start Time Assignment) Task2 Task1 Task2 Vmax Current time In SD, it is possible to assign lower supply voltage to Task2 using the free time In SS, the scheduler can’t use the free time because it has statically assigned voltage In SD, it is possible to assign lower supply voltage to Task2 using the free time In SS, the scheduler can’t use the free time because it has statically assigned voltage

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS When the task’s arrival time is unknown, its end time can’t be predicted statically using the SD algorithm  No predetermined CPU time, start or end times When the task’s arrival time is unknown, its end time can’t be predicted statically using the SD algorithm  No predetermined CPU time, start or end times DD Algorithm Task1 Current time Task2 Start Time Assignment: New task arrives – it either: a)Preempts currently executing task b)Starts right after currently executing task  Starting time is determined Start Time Assignment: New task arrives – it either: a)Preempts currently executing task b)Starts right after currently executing task  Starting time is determined

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS DD Algorithm (cont.) End Time Prediction: Based on the currently executing task’s end time prediction, add the new task’s WCEC time at maximum voltage End Time Prediction: Based on the currently executing task’s end time prediction, add the new task’s WCEC time at maximum voltage Task1 Current time Task2 Completion time assigned at CPU time allocation

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS DD Algorithm (cont.)  If the currently executing task finishes earlier, then new task can start sooner and run slower at lower voltage Task1 Current time Task2 Task1 Task2

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Task End Time Start Time Task End Time Start Time Comparison: SD vs. DD  SD Algorithm:  DD Algorithm:

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Normal: Processor runs at maximum supply voltage SS: Static Scheduling SD: Scheduling done by SD Algorithm DD: Scheduling done by DD Algorithm Normal: Processor runs at maximum supply voltage SS: Static Scheduling SD: Scheduling done by SD Algorithm DD: Scheduling done by DD Algorithm Experimental Results: Energy

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Dynamic power management (DPM) Dynamic Power management tries to assign optimal power saving states Requires Hardware Support Example: StrongARM SA1100 Dynamic Power management tries to assign optimal power saving states Requires Hardware Support Example: StrongARM SA mW 160uW50mW 90us 10us 160ms RUN: operational IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts SLEEP: Shutdown of on-chip activity RUN: operational IDLE: a sw routine may stop the CPU when not in use, while monitoring interrupts SLEEP: Shutdown of on-chip activity SLEEPIDLE RUN

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS The opportunity: Reduce power according to workload busy idle busy shut downwake up device states Desired: Shutdown only during long idle times  Tradeoff between savings and overhead Desired: Shutdown only during long idle times  Tradeoff between savings and overhead T sd T wu working sleeping T bs T sd : shutdown delayT wu : wakeup delay T bs : time before shutdown T bw : time before wakeup power states

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS The challenge Questions: When to go to a power-saving state? Is an idle period long enough for shutdown?  Predicting the future Questions: When to go to a power-saving state? Is an idle period long enough for shutdown?  Predicting the future

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Adaptive Stochastic Models IBBBI………...IBBIIBB time Sliding Window (SW): [Chung DATE 99] Interpolating pre-computed optimization tables to determine power states Using sliding windows to adapt to non-stationarity Interpolating pre-computed optimization tables to determine power states Using sliding windows to adapt to non-stationarity

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Comparison of different approaches P : average power N sd : number of shutdowns N wd : wrong shutdowns (actually waste energy)

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS What about multitasking? user device program operating system power manager requesters  Coordinate multiple workload sources

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Requesters We use processes to represent requesters requester = process Concurrent processes –Created, executed, and terminated –Have different device utilization –Generate requests only when running (occupy CPU) Power manager is notified when processes change state Concurrent processes –Created, executed, and terminated –Have different device utilization –Generate requests only when running (occupy CPU) Power manager is notified when processes change state

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Task Scheduling Rearrange task execution to cluster similar utilization and idle periods idle T T: time quantum time t1t2t3t1t2t t1t2t3t1t2t idle 1

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Power-aware OS implementations Windows APM and ACPI Device-centric, shutdown based Power-aware Linux Good research platform (several partial implementations, es. U. Delft, Compaq, etc.) Quite high-overhead for low-end embedded systems Power-aware ECOS Good research platform (HP-Unibo implementation) Lower overhead than Linux, modular Micro OSes Windows APM and ACPI Device-centric, shutdown based Power-aware Linux Good research platform (several partial implementations, es. U. Delft, Compaq, etc.) Quite high-overhead for low-end embedded systems Power-aware ECOS Good research platform (HP-Unibo implementation) Lower overhead than Linux, modular Micro OSes

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Application Aware DPM – Example: Communication Power NICs powered by portables reduce battery life 2.5 hours 8 hours In general: Higher bit rates lead to higher power consumption 90% of power for listening to a radio channel  Proper use of PHY layer services by MAC is critical! In general: Higher bit rates lead to higher power consumption 90% of power for listening to a radio channel  Proper use of PHY layer services by MAC is critical!

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Doze mode Off mode Energy saving Server Buffering Playback Buffer fullPlayingLWM reached time Power Client time Refill Request Beacons Access Point Low water mark Off mode power savings

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Higher error probability Exploits NIC off-state Min. value to allow data acquisition Higher error probability Exploits NIC off-state Min. value to allow data acquisition lower error probability Incurs NIC off-state overhead Max. value: Buffer_length–1 block lower error probability Incurs NIC off-state overhead Max. value: Buffer_length–1 block LWM / Buffer characteristics Where to put the LWM? How long should the buffer be? Depends on memory availability The longer the buffer, the higher the NIC off-state benefits Depends on memory availability The longer the buffer, the higher the NIC off-state benefits Buffering Strategies should be Power Aware!

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Comparison Low length buffers incur off mode power overhead Good power saving for high length buffers Low length buffers incur off mode power overhead Good power saving for high length buffers

University of Ottawa, SITE, 2008 VOICU GROZA - HARDWARE/SOFTWARE CODESIGN OF EMBEDDED SYSTEMS Exploiting application knowledge Approximate processing [Chandrakasan98-01] Tradeoff quality for energy (es. lossy compression) Design algorithms for graceful degradation Enforce power-efficiency in programming Avoid repetitive polling [Intel98] Use event-based activation (interrupts) Localize computation whenever possible Helps shutdown of peripherals Helps shutdown of memories Approximate processing [Chandrakasan98-01] Tradeoff quality for energy (es. lossy compression) Design algorithms for graceful degradation Enforce power-efficiency in programming Avoid repetitive polling [Intel98] Use event-based activation (interrupts) Localize computation whenever possible Helps shutdown of peripherals Helps shutdown of memories