Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar.

Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar Presented by : Amarnath Kasibhatla & ViswaKiran Popuri Electrical Engineering Dept., UCLA

Outline Background Problem formulation Optimal Offline Solution Effective Online Algorithm Simulations and Results Conclusions

Overview of the Background Review Why DVS is required and how it helps? Brief Review of DVS algorithms

Changes in Communication & Computation From Wall-plugged Towards Portable systems  Battery did not scale as much! Has LIMITED ENERGY  And that limited energy is expected to last long. From General-purpose Towards Intelligent systems  Computationally more expensive (freq. of operation is ever increasing)! Requires MORE ENERGY From Continuous-Mode Towards Burst-Mode  Multimedia (PEAK requirement >> AVERAGE throughput)  For e.g. Video transmission over Mobile phones Need an ENERGY-efficient processor to support high peak operating freq while conserving energy in other modes.

CMOS & DV(F)S Energy = C sw *V dd 2 *f clk *T exec (Quadratic wrt V dd )‏ Delay α V dd /(V dd –V th ) α (Freq scales Linearly wrt V dd )‏ Dynamic Voltage Scaling: Reduce V dd to an extent that the Delay requirements are just met. Example: f clk = (n/ ΔT)‏ Case1: T exec = ΔT/2  E 1 = (1/2)*CV dd 2 n Case2: T exec = ΔT, (V dd /2), f clk /2  E 2 = (1/8)*CV dd 2 n = (1/4)*E 1 75 % Reduction in Energy! ΔTΔT Taken from *[1]

Need and Challenge of Power Management Multimedia applications are energy sensitive  Computationally demanding  Stringent deadlines for multiple tasks  Processed on energy-limited devices DVS (dynamic voltage scaling) algorithms to reduce energy while meeting deadlines DVS algorithms need to deal with uncertainty in  Job complexity (time-varying workloads)‏  Communication delay (e.g. wireless channel)‏

DVS Hardware/Algorithm requirements Hardware: Programmable DC-DC switching voltage regulator, Programmable clock gen & Processor with multiple Operating Points. Algorithm: Accurate Prediction of workload requirement for a give computation task.  Error in Prediction causes Deadline miss or Low energy- efficiency Taken from *[1]

Overview of the background review Why DVS is required and how it helps? Types of DVS algorithms

DVS algorithms Single-task deadline based Multiple-task deadline based Feedback control based Stochastic Model based

Single-task deadline based Uses Worst-Case Exec Time (WCET) or Average Case Exec Time (ACET)‏ Frame-based DVS [1]  Each frame is handled individually for accurate prediction of decoding time. Cross-layer adaptation [2]  Adapts Hardware layer (Vdd Scaling) for small changes in processing overload (fine granularity). (+) Low computational Complexity (-) Tasks with imminent deadlines take huge energy to finish task in time. Taken from *[2]

Multiple-task deadline based Real Time-DVS (RT-DVS) [3]  Normal-DVS Based on average throughput (For e.g ACET)‏ Simple feedback mechanism: Detect the idle time and adjust the freq. This might cause deadline miss  DVS must be tightly coupled with the real-time scheduler of the OS. Rate-Monotonic Scheduler (RM): Static, Quickest Task first Earliest-Deadline-First Scheduler (EDF): Dynamic, Task with earliest deadline first

Multiple-task deadline based Look-Ahead RT-DVS: Defer as much work as possible. Set oper freq to meet the minimum work that must be done to meet the deadlines. Taken from *[3]

Feedback Control Based Earlier approaches of hard real-time scheduling rely on a priori knowledge of WCET. The actual execution time varies a lot from the estimated WCET. If actual exec time is lesser, proc consumes more Energy ( & hence computed earlier) than required.  For e.g. 60% degradation in RT- DVS for fluctuating workload. Use feedback control techniques in real-time scheduling for hard real- time systems such that the DVS scheme should adjust to the ever- changing workload as fast as possible. ~60% degradation Taken from *[4]

Feedback Control Based CA : Execution time of the first portion of tasks. Maximal Schedule Profile: Has the offline generated exec times. Taken from *[4] Feedback-DVS [4] Based on the difference between CA and the actual exec time (error) & the WCETs (Maximal Schedule Profile) the Vol/Freq selector chooses a V,F. Using the V,F input, the scheduler schedules the next ready task (from the Task Queue) using EDF policy. The estimated exec time for the next job is fed-back for later decision making.

Complexity of Multimedia Applications Huge work-load variations between different classes of jobs Work load distribution within each class of decoding jobs is observed to estimate mean and variance through off-line training.

Model for Stochastic Complexity of Jobs Class-based stochastic model [5]  A job class is a particular GOP frame type  Near Gaussian distributed complexity  Parameters derived offline and transmitted online with low cost B. Foo and M. van der Schaar, "A Queuing Theoretic Approach to Processor Power Adaptation for Video Decoding Systems," IEEE Trans. Signal Process., vol. 56, no. 1, pp. 378-392, Jan. 2008

References [1] “Frame-Based Dynamic Voltage and Frequency Scaling for a MPEG Decoder” Kihwan Choi et al, ICCAD 2002 [2] “GRACE: Cross-layer adaptation for multimedia quality and battery energy” W. Yuan et al, IEEE trans. On Mobile Computing, 2006 [3] “Real time dynamic voltage scaling for low power embedded operating systems” P.Pillai et al, Proc of ACM symposium on Operating Systems, 2001 [4] “Feedback EDF scheduling exploiting dynamic voltage scaling” Y. Zhu et al, Proc. Of International Conf on Comp. Arch, 2004 [5] "A Queuing Theoretic Approach to Processor Power Adaptation for Video Decoding Systems," B. Foo and M. van der Schaar, IEEE Trans. Signal Process., vol. 56, no. 1, pp. 378-392, Jan. 2008

Contributions Efficient LP-based (instead of ILP-based) optimal offline DVS algorithm.  It is not clear how far are the existing online algorithms are from the optimal solution.  Existing ILP-based offline solutions are not scalable.  Current offline algorithm enables optimality study of online DVS algorithms. Effective online approach by sequential robust linear programming, namely SLP/r.  Consumes 0.3% more energy versus 4% more energy for the best existing work, when both compared with optimal solution. Applicable to other delay-sensitive applications with time- varying workloads.  e.g. real-time stream queries for financial or medical data and manufacturing process control.

Power Model The power model used for Dynamic power: Sub-threshold leakage power Lg is the number of devices in the circuit, Ij is the reverse-bias junction current, Vbs is the body bias voltage. K3, K4 and K5 are constant fitting parameters. Sleep mode of operation is also considered (which makes the power model non-convex): Power = 0 ; Frequency = 0

Formulation of DVS Problem Given  A sequence of decoding jobs (stochastic complexity, stochastic arrival time, deterministic deadline).  A set of voltages including power gating, each with associated clock frequency and power. Find  The time and voltage level for each voltage switch. Minimize  Energy. Subject to  Start a job after it arrives.  Finish a job before its deadline.

Formulation of DVS Problem - Contd. Let C = {C1, C2... CM}, T = {T1,T2... TM} and D = {D1, D2... DM} be the complexity, arrival time and display deadlines of M incoming jobs. Let F = {F0, F1,... FK} and P = {P0, P1... PK} be the available frequency and power switch levels. The scheduling solution S = {Ts, Vs, N}, where N is the number of voltage switches; Ts = {t0, t1,... tN, tN+1} and Vs = {v0, v1.. vN} are the time and voltage levels for each switch, then the DVS problem is:

DVS Problem in Time – Complexity Space

Property of DVS Solution Compared to multimedia jobs (around 10 9 clock cycles), voltage switching overhead (around 10 clock cycles) is negligible. We call time interval with constant U(t) and L(t) as the adaptation interval. Primary Theorem: for an adaptation interval, an arbitrary ordering of any accumulative computation curve consumes the same energy.  What really matters is the percentage of time for each voltage level. U(t)‏ L(t Seq:2,0,1,3,4 U(t)‏ L(t Seq:0,1,2,3,4

LP Formulation for DVS Length of adaptation interval i Power j allocation of voltage level j in adaptation interval i Frequency s. t.

rLP Formulation L(t) and U(t) depend on stochastic complexity of jobs A sequence of rLP for a sequence of time windows, each window is bigger than an adaptation interval Difference from offline formulation L(t) and U(t) become stochastic s. t.

…… L(t) for robust linear programming Illustration of SLP/r Media Time D 1 D 2 D 3 T 1 'T 2 'T 3 ' U(t) for robust linear programming mean S D Prediction

Illustration of SLP/r …… Media Time D 1 D 2 D 3 T 1 'T 2 'T 3 ' L(t) for robust linear programming U(t) for robust linear programming Prediction rLP

L(t) for robust linear programming Illustration of SLP/r Prediction rLP Commitment …… U(t)‏ real complexity and arrive time of jobs Media Time D 1 D 2 D 3 T 1 'T 2 'T 3 '

L(t) for robust linear programming Illustration of SLP/r …… U(t)‏ real complexity and arrive time of jobs Media Time D 1 D 2 D 3 T 1 'T 2 'T 3 ' Prediction rLP Commitment

Illustration of SLP/r Process a new time window Media Time L(t) for robust linear programming D 1 D 2 D 3 Prediction rLP Commitment U(t)‏ real complexity and arrive time of jobs

Experimental Setup V dd between 0.6V and 1.0V with step sizes of 0.1V, plus power gating  Our algorithms are applicable to any power model. Video sequence consisting of 10 different scenes. Compare SLP/r with:  Queuing-Based Stochastic Algorithm [Foo, 2008]  Deterministic laEDF [Pillai, 2001] Monte Carlo simulation of stochastic complexity and arrival time to verify all results

Recap of SLP/r Confidence level  Linear online prediction function for workload of each job class: mean + k * standard deviation  Deciding trade-off between miss rate and energy Granularity of SLP/r  Number of jobs to commit before shifting the window  Deciding tradeoff between runtime and quality of solution

Comparison of Energy/Miss Rate Granularity = 1 job Optimality study: laEDF: 15% more energy. Queuing-based: 4% more. Online algorithm SLP/r 1% more energy

Granularity VS Quality Granularity = 4 jobs: 0.03% miss rate with 0.3% more energy than optimal Changing granularity from 1 to 4jobs We reduce runtime, energy and miss rate simultaneously

Energy VS Granularity/Confidence Level Minimum energy setting: Granularity = 4 jobs Confidence level =1.5

Miss Rate VS Granularity/Confidence Level Granularity = 4 jobs Confidence level =1.5, Miss rate close to zero

Outline Background Optimal Offline Solution Effective Online Algorithm Simulations and Results Conclusions

An efficient optimal offline DVS algorithm based a tractable LP formulation.  Enables optimality study for DVS algorithms. An effective online approach SLP/r by sequential robust linear programming.  Consumes 0.3% more energy versus 4% for the best existing work, when both compared with optimal solution.

Thanks! Q & A

Back-up slides

Existing Online Algorithms Uses worst or average case execution time [Pillai ACM Symposium on OS’01] [Choi, ICCAD’02] [Zhu LCTES’07] [Nahrstedt et al., ITMC’06]  Ensures hard deadlines not missed  Soft deadlines (and slack reclamation) to reduce energy consumption Online workload prediction  Feedback-based [Zhu LCTES’07]  Adaptive linear prediction [Akyol 2007]  Buffer-constrained DVS [Maxiaguine et al, ICHSC’05]  Stochastic Queuing-based DVS [Foo 2008] It’s not clear how far online algorithms are away from the optimal solution

Existing Offline Algorithms Offline algorithms can be used for optimality study  Optimal solution assuming that complexity and arrival time are known based on trace  Lower bound of energy for online algorithms Existing ILP-based offline algorithms are not scalable  [Akyol, 2007] [Zhang, ICCAD’07]

Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar.

Similar presentations

Presentation on theme: "Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar.

Similar presentations

Presentation on theme: "Optimality and Improvement of Dynamic Voltage Scaling Algorithms for Multimedia Applications Authors: Zhen Cao, Brian Foo, Lei He and Mihaela van der Schaar."— Presentation transcript:

Similar presentations

About project

Feedback