1
Dynamic Voltage Frequency Scaling for Multi-tasking Systems Using Online Learning
Gaurav Dhiman, Tajana Simunic Rosing
Department of Computer Science and Engineering, University of California, San Diego
ISLPED 2007
2
Why Dynamic Voltage Frequency Scaling?
Power consumption is a critical issue in system design today: mobile systems face battery life issues, while high performance systems face heating issues.
Dynamic Voltage Frequency Scaling (DVFS): dynamically scale the supply voltage level of the CPU to provide "just enough" circuit speed to process the workload. It is an effective system level technique to reduce power consumption.
Dynamic Power Management (DPM) is another popular system level technique; however, the focus of this work is on DVFS.
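The "just enough" intuition follows from the standard CMOS dynamic-power relation P_dyn ≈ C·V²·f, which the slide does not state explicitly. A minimal sketch of why lowering the v-f point pays off, using the PXA27x operating points listed later in the talk (the effective capacitance is an arbitrary placeholder, not a measured value):

```c
/* Illustrative only: relative CMOS dynamic power, P_dyn ~ C * V^2 * f.
 * The capacitance value is a placeholder; the operating points are the
 * 208 MHz/1.2 V and 520 MHz/1.5 V settings from the experiment setup. */
#include <stdio.h>

static double dyn_power(double volts, double freq_mhz)
{
    const double c_eff = 1.0;               /* arbitrary effective capacitance */
    return c_eff * volts * volts * freq_mhz;
}

int main(void)
{
    double high = dyn_power(1.5, 520.0);    /* highest v-f setting */
    double low  = dyn_power(1.2, 208.0);    /* lowest v-f setting  */
    printf("relative dynamic power at the low setting: %.1f%%\n",
           100.0 * low / high);
    return 0;
}
```

The catch, of course, is that a slower clock stretches execution time, which is why the rest of the talk focuses on identifying workloads for which that stretch is small.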
3
Previous Work
Based on task level knowledge: [Yao95], [Ishihara98], [Quan02]
Based on compiler/application support: [Azevedo02], [Hsu02], [Chung02]
Based on micro-architecture level support: [Marculescu00], [Weissel02], [Choi04], [Choi05]
4
Workload Characterization and Voltage-Frequency Selection
There are no hard task deadlines in a general purpose system.
Goal: maximize energy savings while minimizing performance delay.
Key idea: CPU-intensive tasks don't benefit from scaling, while memory-intensive tasks are energy efficient at low v-f settings.
5
Workload Characterization and Voltage-Frequency Selection (contd.)
Three tasks, burn_loop (CPU intensive), mem (memory intensive) and combo (a mix), are run with static scaling: burn_loop is energy efficient at all settings, while mem is energy efficient at the lowest v-f setting.
6
Measure CPU-intensiveness (µ)
CPI stack: CPI_avg = CPI_base + CPI_cache + CPI_tlb + CPI_branch + CPI_stall
Use the Performance Monitoring Unit (PMU) of the PXA27x to estimate the CPI stack components.
µ = CPI_base / CPI_avg
A high µ indicates high CPU-intensiveness and vice versa.
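A minimal sketch of the µ computation from the CPI-stack components. How each component is derived from the PXA27x PMU counters is platform specific and not shown on the slide, so that step is left out here:

```c
/* Sketch: compute CPU-intensiveness u = CPI_base / CPI_avg from the
 * CPI-stack components estimated via the PMU.  Field names mirror the
 * slide's CPI stack; populating them is platform-specific and omitted. */
struct cpi_stack {
    double base;    /* CPI_base  : cycles per instruction without stalls */
    double cache;   /* CPI_cache : cache-miss stall contribution          */
    double tlb;     /* CPI_tlb   : TLB-miss stall contribution            */
    double branch;  /* CPI_branch: branch-penalty contribution            */
    double stall;   /* CPI_stall : other pipeline stalls                  */
};

static double cpu_intensiveness(const struct cpi_stack *s)
{
    double cpi_avg = s->base + s->cache + s->tlb + s->branch + s->stall;
    return (cpi_avg > 0.0) ? s->base / cpi_avg : 1.0;   /* u in (0, 1] */
}
```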
7
Dynamic Task Characterization
Dynamically estimate µ for every scheduler quantum and feed it to the online learning algorithm. The algorithm models the CPU-intensiveness of the task and accordingly selects the best suited v-f setting. A theoretical guarantee on converging to the best v-f setting is available.
8
Online Learning for Horse Racing
A gambler selects the best performing expert for investing his money, the chosen expert manages the money for the race, and the gambler then evaluates the performance of all experts for that race.
9
Online Learning for DVFS
The experts are the available v-f settings (v-f setting 1, v-f setting 2, ..., v-f setting n), which form the working set. The DVFS controller selects the best performing expert, the selected expert's v-f setting is applied to the CPU for the next scheduler quantum, and the controller then evaluates the performance of all experts.
10
Controller Algorithm
Parameters: ß ∈ [0,1]; an initial weight vector for the experts such that the weights sum to 1.
At every scheduler tick, do for t = 1, 2, 3, ...:
1. Calculate µ for the quantum that just ended.
2. Update the weight vector of the task: w_i^{t+1} = w_i^t · (1 - (1 - ß) · l_i^t), where l_i^t is the loss of expert i (defined on the next slide).
3. Choose the expert with the highest probability factor r_i^t = w_i^t / Σ_j w_j^t.
4. Apply the v-f setting corresponding to the selected expert to the CPU.
5. Reset and restart the PMU.
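A compact sketch of one controller invocation, paraphrasing the steps above rather than reproducing the authors' kernel code. The number of experts, the value of ß, and the assumption that losses lie in [0,1] are illustrative choices:

```c
/* Sketch of one controller step at a scheduler tick (Hedge-style update).
 * N_EXPERTS and BETA are placeholder values; loss[i] = l_i^t for the
 * quantum that just ended, assumed to lie in [0,1]. */
#define N_EXPERTS 5
#define BETA      0.75

static double w[N_EXPERTS] = { 1.0 / N_EXPERTS, 1.0 / N_EXPERTS,
                               1.0 / N_EXPERTS, 1.0 / N_EXPERTS,
                               1.0 / N_EXPERTS };

int controller_tick(const double *loss)
{
    double sum = 0.0;
    int i, best = 0;

    /* Step 2: w_i^{t+1} = w_i^t * (1 - (1 - beta) * l_i^t) */
    for (i = 0; i < N_EXPERTS; i++) {
        w[i] *= 1.0 - (1.0 - BETA) * loss[i];
        sum += w[i];
    }
    /* Step 3: probability factor r_i = w_i / sum(w); pick the largest */
    for (i = 1; i < N_EXPERTS; i++)
        if (w[i] / sum > w[best] / sum)
            best = i;

    return best;   /* index of the v-f setting to apply (step 4) */
}
```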
11
Evaluation of Experts (loss calculation)
(Figure: the µ axis from 0 to 1.0, divided into regions centered at the µ-mean of each of Expert1 through Expert5.)
Intuition: the best suited frequency scales linearly with µ. Map task characteristics to the best suited frequency using the µ-mapper, e.g. Expert1-5 = {100, 200, 300, 400, 500} MHz. Experts are evaluated against the best suited frequency.
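One way to realize this evaluation, assuming (as an illustration, not the paper's exact formula) that each expert's loss grows with the distance between the task's measured µ and the µ-mean of the region that expert covers:

```c
/* Illustrative loss calculation: the mu axis [0,1] is split evenly across
 * the experts (Expert1..ExpertN ordered from lowest to highest frequency),
 * and an expert's loss is the distance of the measured mu from the centre
 * of its region.  The paper's actual loss function may differ; this only
 * captures the "best frequency scales linearly with mu" intuition. */
#include <math.h>

void compute_losses(double mu, int n_experts, double *loss)
{
    int i;
    for (i = 0; i < n_experts; i++) {
        double mu_mean = (i + 0.5) / n_experts;   /* centre of expert i's region */
        double l = fabs(mu - mu_mean);
        loss[i] = (l > 1.0) ? 1.0 : l;            /* keep l_i^t in [0,1] */
    }
}
```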
12
What about Multi-tasking Systems?
It is possible for tasks with differing characteristics to execute together. The weight vector (w^t) characterizes an executing task, so this information needs to be personalized at the task level for accurate characterization. Solution: store the weight vector as a task level structure.
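A sketch of the per-task state the slide calls for. The exact hook into the kernel's task bookkeeping (e.g., alongside task_struct) is not shown in the talk, so the structure and field names here are hypothetical:

```c
/* Hypothetical per-task DVFS state, allocated at task creation and
 * updated at every scheduler tick while that task runs.  Keeping the
 * weight vector per task prevents the characterization of one task from
 * being polluted by the tasks it is co-scheduled with. */
#define N_EXPERTS 5

struct dvfs_task_state {
    double weights[N_EXPERTS];   /* w^t, personalized to this task        */
    double mu_last;              /* mu measured in the task's last quantum */
    int    selected_expert;      /* v-f setting applied while it runs      */
};
```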
13
Performance Bound on the Controller
Let N be the number of experts in the working set and T the total number of scheduler ticks. If l_i^t is the loss incurred by expert i for scheduler quantum t, the controller's loss for that quantum is r^t · l^t.
Goal: minimize the net loss L_G - min_i L_i, where L_G = Σ_t r^t · l^t and L_i = Σ_t l_i^t.
The net loss is bounded by O(√(T ln N)), so the average net loss per quantum decreases at the rate O(√(ln N / T)). Hence the performance of the scheme converges to that of the best performing expert with successive scheduler ticks.
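Restated in standard notation, following the Freund-Schapire Hedge analysis on which this controller is built; the explicit constants are my reconstruction of the bound the slide refers to, not values copied from it:

```latex
% Hedge-style regret bound, assuming losses l_i^t in [0,1] and a suitably
% chosen beta (constants follow Freund & Schapire's analysis; amsmath).
\begin{align*}
  L_G &= \sum_{t=1}^{T} \mathbf{r}^t \cdot \mathbf{l}^t, \qquad
  L_i = \sum_{t=1}^{T} l_i^t, \\
  L_G - \min_i L_i &\le \sqrt{2T\ln N} + \ln N, \\
  \frac{L_G - \min_i L_i}{T} &\le \sqrt{\frac{2\ln N}{T}} + \frac{\ln N}{T}
  \;\xrightarrow[T\to\infty]{}\; 0 .
\end{align*}
```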
14
Implementation
Testbed: Intel PXA27x Development Platform running Linux 2.6.9. Implemented as a Loadable Kernel Module (LKM).
(Diagram: the DVFS LKM sits in the Linux kernel and is invoked by the Linux process manager on task creation and at every scheduler tick; it reads the PMU and applies v-f settings to the Intel PXA27x, while the user interacts with it through the /proc file system.)
15
Experiments Setup
Energy savings are calculated using actual current measurements from a DAQ at 1.25 samples/sec.
Working set: 4 v-f setting experts: 208 MHz / 1.2 V, 312 MHz / 1.3 V, 416 MHz / 1.4 V, 520 MHz / 1.5 V
Workloads: qsort, djpeg, blowfish, dgzip
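The four-expert working set above can be captured as a small table; the code that actually reprograms the PXA27x clock and voltage registers is platform specific and omitted here:

```c
/* The v-f working set from the experiment setup: each expert corresponds
 * to one PXA27x operating point.  Applying a setting would require
 * platform-specific frequency/voltage change code, not shown. */
struct vf_setting {
    unsigned int freq_mhz;
    double       volt;
};

static const struct vf_setting working_set[] = {
    { 208, 1.2 },
    { 312, 1.3 },
    { 416, 1.4 },
    { 520, 1.5 },
};
```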
16
Results: Single Task Environment
With the user preference swept from low performance delay toward higher energy savings (three settings, %delay / %energy):
qsort: 6/17, 16/32, 25/41
djpeg: 7/21, 15/37, 26/45
dgzip: 15/30, 21/42, 27/49
bf: 6/11, 16/27, 25/40
For comparison, running statically at 208 MHz / 1.2 V (%delay / %energy):
qsort: 56/48, djpeg: 34/54, dgzip: 33/54, bf: 40/51
17
Result: Frequency of Selection
(Figure: distribution of v-f settings selected for qsort as the preference moves from lower performance delay toward higher energy savings.)
18
Results: Multi Task Environment
With the user preference swept from low performance delay toward higher energy savings (three settings, %delay / %energy):
qsort+djpeg: 6/17, 15/33, 25/41
djpeg+dgzip: 13/24, 19/39, 27/48
qsort+djpeg: 7/20, 18/35, 26/42
dgzip+bf: 13/18, 22/32, 27/44
19
Advantages of the Scheme
Online learning algorithm: provides a theoretical guarantee that performance converges to that of the best performing expert.
Multi-tasking systems: works seamlessly across context switches.
User preference: adapts the energy savings/performance delay tradeoff with changes in user preference.
20
Overhead
Process creation: measured with lat_proc from lmbench; 0% overhead.
Context switch: measured with lat_ctx from lmbench; 3% overhead with 20 processes (the maximum supported by lat_ctx). For comparison, [Choi05] causes 100% overhead in context switch times.
Extremely lightweight implementation.
21
Conclusion
Designed and implemented a DVFS technique for general purpose multi-tasking systems. It is based on online learning, which provides a theoretical guarantee on the convergence of overall performance to that of the best performing expert. The technique provides user control over the desired energy/performance tradeoff and is extremely lightweight.