DVSleak: Combining Leakage Reduction and Voltage Scaling in Feedback EDF Scheduling
Yifan Zhu, Frank Mueller
North Carolina State University, Center for Efficient, Secure and Reliable Computing
2 Background
- Dynamic voltage scaling (DVS): lowers dynamic power; Power ~ $V^2 f$
- Sleep: lowers leakage (static) power
- Dynamic power used to dominate; leakage is becoming dominant
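A small sketch of why DVS reduces dynamic energy, assuming the $V^2 f$ dynamic power model mentioned in the conclusion and a fixed number of cycles. The function names are illustrative, and leakage is deliberately ignored here.

```python
def dynamic_power_ratio(v_scale, f_scale):
    """Dynamic power relative to full speed under the assumed P ~ V^2 * f model."""
    return v_scale ** 2 * f_scale


def dynamic_energy_ratio(v_scale, f_scale):
    """Dynamic energy for a fixed amount of work, relative to full speed.

    The same number of cycles takes 1 / f_scale times longer, so
    energy = power * time scales as V^2 (the frequency cancels out).
    """
    if not (0.0 < v_scale <= 1.0 and 0.0 < f_scale <= 1.0):
        raise ValueError("scales must be in (0, 1]")
    return dynamic_power_ratio(v_scale, f_scale) / f_scale
```

Under this model, halving both voltage and frequency gives one eighth of the dynamic power and one quarter of the dynamic energy for the same work.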
3 Real-Time Systems
- Hard real-time systems
  - periodic, preemptive, independent tasks [Liu, Layland] with known worst-case execution time (WCET)
  - jobs: periodically released instances of a task
  - WCET: measured at the maximum frequency, without DVS
  - most practical systems: U << 1
- Earliest-deadline-first (EDF) scheduling
  - schedulable if $U = \sum_i C_i / P_i \le 1$, with $C_i$ = WCET and $P_i$ = period
  - with DVS: $\sum_i C_i / (\alpha P_i) \le 1$, where $\alpha = f / f_{max}$ ($0 < \alpha \le 1$) is the DVS scaling factor
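A minimal sketch of the EDF utilization test with a DVS scaling factor. It assumes the standard bound (sum of C_i / P_i at most 1) and that running at a fraction alpha of the maximum frequency stretches every WCET by 1 / alpha. Task sets are hypothetical `(wcet, period)` pairs.

```python
def utilization(tasks):
    """Worst-case utilization: sum of C_i / P_i over (wcet, period) pairs."""
    return sum(c / p for c, p in tasks)


def edf_schedulable(tasks, alpha=1.0):
    """EDF test at scaling factor alpha (assumed: execution time scales as 1/alpha)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return utilization(tasks) / alpha <= 1.0


def min_static_alpha(tasks):
    """Smallest uniform scaling factor that keeps the set schedulable: alpha = U."""
    u = utilization(tasks)
    if u > 1.0:
        raise ValueError("task set is not schedulable even at full speed")
    return u
```

For example, a set with U = 0.6 passes the test at alpha = 0.75 but fails at alpha = 0.5.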
4 Motivation
- Embedded systems with limited power supply
- DVS for real-time systems
  - trade-off: energy savings vs. timing requirements
  - lower CPU voltage/frequency -> longer to complete
- Task workloads change dynamically
  - WCET overestimates actual execution time
  - wide variation of execution times (longest vs. shortest)
- Sleeping: 1-2 orders of magnitude less power
  - DVS below threshold -> more energy than sleeping
  - long idle -> more energy than sleeping
  - but: wakeup overhead (cold misses in cache)
5 Motivation
- Prior DVS algorithms lack adaptability to dynamic workloads
- Real-world examples:
  - graphics: 78% of WCET [Wegener/Mueller]
  - defense: 87%; automotive: 74%
  - benchmarks: 30-89%; image recognition: 85% [Wolf]
- Look-ahead DVS [Pillai/Shin]
6 Contribution
- A feedback-based framework for dynamic workloads [LCTES’02, RTAS’04, LCTES’05]
- New: a hybrid sleep+DVS scheme, based on three observations:
  1. Limit to DVS -> use sleep below a certain threshold
  2. Trade-off idle vs. sleep depends on length of inactivity
  3. Feedback helps in these decisions
- Simulation experiments
- Comparison with prior work
7 Related Work
- Dynamic voltage scaling
  - General-purpose DVS: Weiser, Govil, Pering, Grunwald
  - Real-time DVS: Lee, Pillai, Aydin
  - Optimality of DVS: Ishihara, Qu, Lorch, Xie, Saewong
- Feedback real-time scheduling
  - Stankovic, Lu, Varma, Poellabauer, Minerick
- Leakage-aware DVS scheduling
  - Lee, Quan, Jejurikar ’04/’05, Zhang
  - We compare with Jejurikar’05 (closest related, best scheme)
8 Feedback-DVS Framework
[Figure: Feedback-DVS Framework]
- V/f selector: error -> (V, f) = func(error)
- Maximum EDF schedule: determines slack in the EDF schedule; assumes c = WCET
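9 Voltage-Frequency Selector
- Task splitting with WCET: $T_a + T_b$
  - $T_a$ at frequency $\alpha$ ($0 < \alpha \le 100\%$); $T_b$ at maximum frequency
  - More aggressive: $\alpha$ < uniform frequency without splitting
  - Objective: T finishes within the 1st portion -> lower energy consumption
- $\alpha = T_a / (T_a + slack)$
- Still guaranteed to meet the deadline (proof in prior paper)
[Figure: frequency f over time t; $T_a$ runs at $\alpha$ for $T_a / \alpha$ time units, then $T_b$ runs at $f_{max}$ (100%)]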
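The task-splitting arithmetic can be sketched as follows, assuming alpha = T_a / (T_a + slack) and that the first portion stretches to T_a / alpha when run at fraction alpha of the maximum frequency. The helper names and sample numbers are hypothetical; how T_a is chosen by the feedback loop is not modeled here.

```python
def split_frequency(t_a, slack):
    """Scaling factor for the first portion: alpha = T_a / (T_a + slack)."""
    if t_a <= 0 or slack < 0:
        raise ValueError("need t_a > 0 and slack >= 0")
    return t_a / (t_a + slack)


def worst_case_duration(wcet, t_a, slack):
    """Time to run T_a at alpha and the remainder T_b = WCET - T_a at full speed."""
    if not 0 < t_a <= wcet:
        raise ValueError("need 0 < t_a <= wcet")
    alpha = split_frequency(t_a, slack)
    t_b = wcet - t_a
    # T_a / alpha equals T_a + slack, so the total is WCET + slack.
    return t_a / alpha + t_b
```

With WCET = 10, T_a = 6 and slack = 2, the first portion runs at 75% and the worst case takes 12 time units, which is exactly WCET plus the available slack. This is the arithmetic behind the deadline guarantee, not a substitute for the proof.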
10 Extension to Leakage-aware DVS
- Power: dynamic plus static (leakage) component
- Static power exceeds dynamic power when the voltage is reduced below a threshold value (the critical speed)
  - voltage below the threshold is not energy efficient anymore
  - sleeping may be better
- But need to consider wakeup overhead
  - mostly due to cache refill
  - calculated statically based on the time to refill reused lines
- Dynamic power does NOT dominate anymore!
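To illustrate why wakeup overhead matters, here is a sketch of a break-even idle time under a deliberately simple model: constant idle power, constant sleep power, and a fixed wakeup energy. This is an assumption for illustration, not the power model used in the experiments, and all parameter names and values are hypothetical.

```python
def break_even_time(p_idle, p_sleep, e_wake):
    """Idle length at which sleeping costs the same energy as idling.

    Idling costs p_idle * t; sleeping costs p_sleep * t + e_wake.
    Setting them equal gives t = e_wake / (p_idle - p_sleep).
    """
    if p_idle <= p_sleep:
        raise ValueError("sleeping must draw less power than idling")
    return e_wake / (p_idle - p_sleep)


def sleep_saves_energy(idle_time, p_idle, p_sleep, e_wake):
    """True if sleeping through idle_time uses less energy than idling."""
    return idle_time > break_even_time(p_idle, p_sleep, e_wake)
```

With idle power 1.0, sleep power 0.25 and wakeup energy 1.5 (arbitrary units), the break-even time is 2.0: shorter idle slots are cheaper to idle through, longer ones are cheaper to sleep through.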
11 Speed Reduction vs. Task Delaying
[Figure: two timelines for a task T: speed reduction vs. delaying the start time]
- Why delay the start time of a task?
  - To maximize the CPU sleeping time
12 Delay Dispatching a Task
1. If idle1+idle2 > t_th before DVS but < t_th afterwards -> no DVS
2. If idle1+idle2 < t_th -> no delay
3. If idle1 < C_B -> no delay
4. Otherwise -> delay
- Still guaranteed to meet the deadline (proof in paper)
- t_th: threshold for sleep
[Figure: schedules of T1, T2, T3 over time t up to WCET, with idle1, idle2, sleep and C_B marked: (i) Consider Schedule, (ii) No Delay, (iii) Delay]
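The four rules can be transcribed directly into two predicates. This is a literal sketch of the slide, not the authors' implementation: the function names are hypothetical, C_B is passed through as given without interpreting it, and the behavior at exact equality is an assumption since the slide only states strict inequalities.

```python
def should_scale(idle_before_dvs, idle_after_dvs, t_th):
    """Rule 1: skip DVS if it would shrink a sleepable idle slot below t_th."""
    if idle_before_dvs > t_th and idle_after_dvs < t_th:
        return False
    return True


def should_delay(idle1, idle2, t_th, c_b):
    """Rules 2-4: decide whether to delay dispatching the next task."""
    if idle1 + idle2 < t_th:   # rule 2: combined slot too short to sleep
        return False
    if idle1 < c_b:            # rule 3: first slot shorter than C_B
        return False
    return True                # rule 4: otherwise delay
```

For instance, `should_delay(3, 4, 5, 2)` returns `True`, while `should_delay(1, 2, 5, 1)` returns `False` because the combined idle time stays below the threshold.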
13 Scaling below the Critical Speed
- Pure DVS: should never scale frequency below the critical speed
- DVS combined with sleeping:
  - sleep if the idle slot exceeds the threshold t_th
  - if the idle slot is too short (< t_th), scale below the critical speed
    - no other work to do (in contrast to non-real-time)
    - lower frequency/voltage -> power savings
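One way to read this policy as code, assuming the decision is made per job from the required speed, the critical speed, and the idle slot that would remain when running at the critical speed. All names are hypothetical and the tie-breaking at equality is an assumption.

```python
def choose_speed(required, critical, idle_at_critical, t_th):
    """Return (speed, sleep) for one job.

    required:         lowest speed that still meets the deadline
    critical:         speed below which scaling alone wastes energy
    idle_at_critical: idle slot left if the job runs at the critical speed
    t_th:             minimum idle length for which sleeping pays off
    """
    if required >= critical:
        # No conflict: the deadline already forces a speed at or above critical.
        return required, False
    if idle_at_critical > t_th:
        # Slot is long enough: run at the critical speed, then sleep.
        return critical, True
    # Slot too short to sleep: use it by scaling below the critical speed.
    return required, False
```

For example, `choose_speed(0.3, 0.5, 10, 5)` yields `(0.5, True)`, whereas `choose_speed(0.3, 0.5, 2, 5)` yields `(0.3, False)`.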
14 Experimental Framework
- Scheduling simulator
  - accurately reflects energy trends [Zhu’05] (PPC405LP)
- Same power model as [Jejurikar’04]
  - critical speed, wakeup cost
- Assume four discrete frequency levels:
  - 25%, 50%, 75%, 100% of f_max
- Compare energy in hyperperiod (constant amount of work) for
  - Pure Feedback-DVS
  - DVS+sleep: Feedback-DVS with sleep policy (no delay policy)
  - DSR-DP: dynamic procrastination + slack reclamation [Jejurikar’05]
  - DVSleak: Feedback-DVS with sleep and delay now/later policies
  - Lower-bound schedule: best frequency + sleep for maximum idle
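With only four discrete levels, a requested scaling factor has to be mapped to an available one. The sketch below rounds up to the next level so that a job never runs slower than requested; that rounding rule is an assumption, as the slide does not state how the simulator quantizes.

```python
import bisect

# Frequency levels from the slide, as fractions of f_max.
LEVELS = [0.25, 0.5, 0.75, 1.0]


def quantize_up(alpha, levels=LEVELS):
    """Return the lowest available level that is >= the requested alpha."""
    if not 0.0 < alpha <= levels[-1]:
        raise ValueError("alpha must be in (0, max level]")
    index = bisect.bisect_left(levels, alpha)
    return levels[index]
```

A request of 0.6 maps to 0.75, and any request at or below 0.25 maps to the lowest level.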
15 3 Tasks, Const. Execution, 25% WCET
- Significant savings with sleep, more for low utilizations
- DVSleak: delay has the most impact for medium to high utilizations
  - close to lower bound
16 3 Tasks, Const. Execution, 75% WCET
- All schemes: resilient to actual/WCET ratio
- DVSleak never worse than other schemes; savings:
  - 50% over pure, 20% over DVS+sleep, 8.5% over DSR-DP
17 3 Tasks, Var. Execution (pat1), 75% WCET
- DVSleak: more resilient to fluctuating execution times (unchanged) -> feedback helps!
- All others: 5-10% more energy consumption than for constant execution
18 10 Tasks, Const. Execution, 25% WCET
- More tasks -> 5-10% higher energy cost (switching)
- DVSleak still best of all (~ same margin)
19 Length of Task Periods
- U=60%, E normalized to hyperperiod of task set 2, c=50% WCET
- Harmonic (1) vs. non-harmonic (2):
  - 10-27% more energy for non-harmonic -> cannot fold jobs released at the same time -> more uncertainty
- Longer (2) vs. shorter (3) periods for non-harmonic:
  - 2-28% more energy for shorter periods -> more job releases, less sleep time
  - DVSleak ~ 15% lower energy than DSR-DP
- Feedback more important for shorter periods
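For readers unfamiliar with the terms, a short sketch of how the hyperperiod and the harmonic property of a task set can be computed from integer periods. The sample periods in the usage note are hypothetical and are not the task sets used in the experiments.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of all periods: the schedule repeats after this."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def is_harmonic(periods):
    """True if every period divides the next larger one."""
    ordered = sorted(periods)
    return all(b % a == 0 for a, b in zip(ordered, ordered[1:]))
```

Periods 4, 8, 16 are harmonic with hyperperiod 16, while 4, 6, 10 are non-harmonic with hyperperiod 60.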
20 Conclusion
- DVSleak: novel feedback DVS + leakage (sleep); benefits for
  - fluctuating execution times
  - shorter task periods -> can scale below critical speed
  - medium utilizations (most common) -> sleep policy by itself is enough for high/low utilizations (always sleep/never sleep)
- DVSleak energy savings over other schemes:
  - avg. 50% over DVS-only
  - avg. 20% more over DVS+sleep
  - avg. 8.5% more over [Jejurikar’05]
  - sleep now/later important when actual exec. << WCET
- Prior: evaluation on a real embedded platform
  - $V^2 f$ model works for OS scheduling
21 Future Work
- Implementation on IBM PPC 405LP test board
  - has been used for DVS experiments
  - oscilloscope, data acquisition card for voltage/current
- Assessing sleep modes
  1. Clock suspend -> same power, all still up
  2. Suspend -> 1/10 power, SDRAM up
  3. Hibernate -> N/A (SDRAM -> NVRAM)
  4. Standby -> N/A (APM over I2C)
- Need faster resume (reactivating devices slow low-power modes)