1
Green Governors: A Framework for Continuously Adaptive DVFS
Vasileios Spiliopoulos, Stefanos Kaxiras
Uppsala University, Sweden
2
Introduction
- Optimize power efficiency
  - Reduce power without harming performance
  - Goal: minimize power-efficiency metrics such as energy-delay product (EDP) and energy-delay-squared product (ED²P)
- Exploit memory slack
  - In applications with many LLC misses, memory becomes the bottleneck
  - Performance is insensitive to processor frequency
    - Scaling frequency down gives a high energy benefit at a low performance cost
- Develop analytical models to predict the impact of frequency scaling
  - No empirical parameters
  - No training period
  - Suitable for run-time use
3
Modeling DVFS
- Theoretical (work in simulator)
  - Extend previous interval-based models (Karkhanis and Smith, ISCA 2004; Eyerman et al., ACM TOCS, 2010)
  - Two models for run-time DVFS management
  - Miss-based and stall-based models differ in accuracy and ease of implementation
  - Estimate energy benefits and performance loss
  - G. Keramidas, V. Spiliopoulos, and S. Kaxiras. "Interval-Based Models for Run-Time DVFS Orchestration in SuperScalar Processors." Proc. of Int. Conference on Computing Frontiers, 2010.
- Implementation in real hardware
  - Apply the model for power-performance adaptation in real processors
    - Case study: Intel Core i7
    - Approximate models based on available performance monitoring hardware
  - Estimate power characteristics of real hardware
  - V. Spiliopoulos, S. Kaxiras, and G. Keramidas. "Green Governors: A Framework for Continuously Adaptive DVFS." International Green Computing Conference (IGCC'11).
4
Interval-based Performance Model
- Break the execution time of a program into intervals
- Steady-state intervals: the IPC is limited by the machine width and the program’s ILP
- Miss intervals: introduce stall cycles due to branch mispredictions, on-chip instruction/data misses, and LLC (off-chip) misses
[Figure: instruction rate (IPC) versus cycles, showing steady-state IPC interrupted by miss intervals for a branch misprediction, an on-chip instruction miss, an on-chip data miss, and an LLC (off-chip) miss]
5
Interval-based DVFS Model (step 1)
- Miss intervals and frequency scaling (time measured in cycles)
- Branch-misprediction miss intervals
  - Same penalty (in cycles) at all frequencies
- On-chip data/instruction miss intervals
  - Same penalty (in cycles) at all frequencies
- LLC (off-chip) miss intervals
  - For DVFS, only this interval needs to be accounted for
[Figure: instruction rate (IPC) versus cycles, showing steady-state IPC interrupted by miss intervals for a branch misprediction, an on-chip instruction miss, an on-chip data miss, and an LLC (off-chip) miss]
6
Interval-based DVFS Model (step 2)
- LLC miss interval and frequency scaling
- Model core frequency scaling as a change in memory latency measured in cycles
- Example: memory access time = 100 ns
  - f = 1 GHz -> T = 1 ns -> mem_lat = 100 cycles
  - f = 500 MHz -> T = 2 ns -> mem_lat = 50 cycles
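The arithmetic in this example can be written as a small helper. This is only a sketch: the function name `mem_latency_cycles` and its parameter names are illustrative and do not come from the slides.

```python
def mem_latency_cycles(access_time_ns, freq_mhz):
    """Memory latency expressed in core cycles.

    The access time in nanoseconds is fixed by the memory system;
    the number of core cycles it spans depends on the core frequency.
    One cycle lasts 1000 / freq_mhz nanoseconds.
    """
    return access_time_ns * freq_mhz / 1000


# The slide's example: a 100 ns memory access.
print(mem_latency_cycles(100, 1000))  # at 1 GHz
print(mem_latency_cycles(100, 500))   # at 500 MHz
```

Halving the core frequency halves the latency in cycles, which is why the model treats frequency scaling as a change in memory latency.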
7
Interval-based DVFS Model (step 2)
- LLC miss interval and frequency scaling
- Model core frequency scaling as a change in memory latency measured in cycles
[Figure: instruction rate (IPC) versus cycles around an LLC (off-chip) miss, showing steady-state IPC and the ROB fill, IQ drain, full-stall, and ramp-up phases within the memory latency]
8
Frequency scaling == Change in memory latency
- Lower frequency: shorter memory latency (in cycles) and a smaller full-stall area
  - Other areas (ROB fill, IQ drain, and ramp-up) remain intact
[Figure: instruction rate (IPC) versus cycles around an LLC miss, showing steady-state IPC, ROB fill, IQ drain, full stall, and ramp-up, with the memory latency and ramp-up drawn for two frequencies]
9
DVFS target: Eliminate the slack
- Reduce memory latency (in cycles) down to the ROB fill time
- At that point, no more slack due to off-chip misses is available
- Further frequency reduction incurs a performance penalty
[Figure: two plots of instruction rate (IPC) versus cycles around an LLC miss. The first shows ROB fill, IQ drain, full stall, and ramp-up within the memory latency; the second shows the memory latency reduced to the ROB fill time]
10
Elastic and Non-Elastic Areas
- Target: eliminate “slack” by reducing memory latency, but:
  - ROB fill area DOES NOT shrink: inelastic area
  - Full-stall, IQ drain, and ramp-up areas DO shrink: elastic areas
[Figure: instruction rate (IPC) versus cycles around an LLC miss, showing steady-state IPC, ROB fill, IQ drain, full stall, and ramp-up within the memory latency]
11
Two Simple Interval-Based Models
- Stall-based model
  - Fed by in-core information
  - Assumes all stalls scale with frequency
    - Disregards the ROB fill area
  - Can be used in real hardware
- Miss-based model
  - Fed by information from the memory system
  - Accounts for both elastic and inelastic areas
  - Required information is not available in current hardware
12
Stall-based Model
- Assumes (all) stalls scale with f
  - Not strictly true, because of the ROB fill
- Exec cycles at f/k: c_init - stalls + (stalls / k)
[Figure: instruction rate (IPC) versus cycles around an LLC miss, showing steady-state IPC, ROB fill, the memory latency, and the stall cycles]
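The stall-based formula can be sketched directly in code. The names below are illustrative; `c_init` is the cycle count observed at the current frequency f, `stalls` is the stall-cycle count at f, and `k` is the divisor in the target frequency f/k (values of k below 1 correspond to scaling the frequency up).

```python
def stall_based_cycles(c_init, stalls, k):
    """Predicted execution cycles at frequency f/k.

    Implements c_init - stalls + stalls / k: the non-stall cycles are
    kept as they are, and every stall cycle is assumed to scale with
    the frequency change.
    """
    return c_init - stalls + stalls / k


def exec_time_seconds(cycles, freq_hz):
    """Convert a cycle count at a given frequency into seconds."""
    return cycles / freq_hz


# Hypothetical window: 1000 cycles at 2 GHz, 400 of them stalls.
cycles_half = stall_based_cycles(1000, 400, 2)   # prediction for 1 GHz
print(cycles_half, exec_time_seconds(cycles_half, 1e9))
```

Fewer cycles at the lower frequency do not mean a shorter run: each cycle is longer, so the time in seconds still grows, only by less than the frequency ratio.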
13
Miss-based Model
- Assumes the whole miss interval scales with f
- Exec cycles at f/k: c_init - misses * mem_lat + (misses * mem_lat / k)
[Figure: instruction rate (IPC) versus cycles around an LLC miss, showing steady-state IPC, ROB fill, and the memory latency]
14
Miss-based Model, more …
- Important implication for overlapping misses:
  - Stalls of misses under a miss do not scale, because of the inelastic ROB fill
- The miss-based model therefore predicts execution cycles based on the number of clusters of misses
[Figure: instruction rate (IPC) versus cycles for two overlapping misses (Miss1, Miss2) separated by a distance d, with the memory latency marked]
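A minimal sketch of the miss-based prediction with clusters in place of individual misses follows. The clustering rule used here is an assumption made for illustration (a miss joins the current cluster if it issues before the cluster's first miss has returned), and all names are hypothetical; the slides do not specify how clusters are detected.

```python
def count_miss_clusters(issue_cycles, mem_lat):
    """Count clusters of overlapping LLC misses.

    Assumed rule (not from the slides): a miss that issues while the
    first miss of the current cluster is still outstanding belongs to
    that cluster; otherwise it starts a new one.
    """
    clusters = 0
    cluster_end = None
    for t in sorted(issue_cycles):
        if cluster_end is None or t >= cluster_end:
            clusters += 1
            cluster_end = t + mem_lat
    return clusters


def miss_based_cycles(c_init, clusters, mem_lat, k):
    """Predicted cycles at f/k, charging one memory latency per cluster."""
    miss_cycles = clusters * mem_lat
    return c_init - miss_cycles + miss_cycles / k


# Five misses with a 100-cycle memory latency form three clusters here.
n = count_miss_clusters([0, 30, 250, 260, 500], 100)
print(n, miss_based_cycles(10000, n, 100, 2))
```

Counting clusters rather than misses avoids charging a full memory latency to misses whose stall time is hidden under an earlier miss.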
15
Real Hardware Approximations
- Cannot apply the miss-based model
  - No counter for clusters of misses is available
- Cannot apply the stall-based model as it is
  - No counter for stalls due to LLC misses is available
- Approximate stall-based model
  - Approximate LLC stalls as the minimum of all pipeline stalls and the worst-case stalls due to LLC misses (LLC misses * mem_lat)
- Good accuracy
  - Predict execution time going from f_min to f_max and vice versa
  - Less than 5% average error
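The approximation can be sketched as follows, feeding the bounded stall estimate into the stall-based formula c_init - stalls + stalls / k. The function and parameter names are illustrative and do not correspond to actual Intel performance-counter event names.

```python
def approx_llc_stalls(pipeline_stalls, llc_misses, mem_lat):
    """Approximate stall cycles caused by LLC misses.

    Takes the minimum of all observed pipeline stalls and the
    worst case in which every LLC miss stalls for a full memory
    latency (llc_misses * mem_lat).
    """
    return min(pipeline_stalls, llc_misses * mem_lat)


def approx_stall_based_cycles(c_init, pipeline_stalls, llc_misses, mem_lat, k):
    """Stall-based prediction at f/k using the approximated LLC stalls."""
    stalls = approx_llc_stalls(pipeline_stalls, llc_misses, mem_lat)
    return c_init - stalls + stalls / k


# Hypothetical counter readings for one measurement window.
print(approx_llc_stalls(5000, 30, 100))
print(approx_stall_based_cycles(10000, 5000, 30, 100, 2))
```

The minimum keeps the estimate from exceeding either bound: stalls attributed to memory can be no larger than the total stalls observed, and no larger than the worst case for the observed miss count.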
16
Measuring power
17
Power prediction
- Previous researchers correlated total power (P = aCfV² + P_static) with performance counter events
- We correlate effective capacitance (the aC term in P = aCfV² + P_static) with performance counter events
- Run a set of benchmarks
- Compute the effective C of benchmark i as [equation not preserved]
- Estimate C_i as [equation not preserved]
- Minimize [expression not preserved]
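The three formulas on this slide are missing from the text. One plausible reading, offered as an assumption rather than as the authors' exact formulation, is a linear least-squares fit. Here C_i stands for the effective capacitance (the product aC) of benchmark i, P_i is its measured power, e_{ij} is the rate of performance-counter event j in benchmark i, and w_j are fitted weights; the symbols e_{ij} and w_j are introduced here for illustration only.

```latex
\begin{align}
C_i &= \frac{P_i - P_{\text{static}}}{f V^2} \\
\hat{C}_i &= \sum_{j} w_j \, e_{ij} \\
\min_{w} &\; \sum_{i} \bigl(C_i - \hat{C}_i\bigr)^2
\end{align}
```

The first line follows directly from solving P = aCfV² + P_static for the aC term; the linear form of the estimate and the squared-error objective are assumptions.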
18
Power prediction
- Only need to train the model at a single frequency: [equation not preserved]
- Prediction at other frequencies: [equation not preserved]
- Events monitored
  - Uops executed
  - L2 misses
  - L2 accesses
  - Resource stalls
  - FP operations
  - Branch mispredictions
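The slide's formulas are not reproduced in the text, but the idea of training at one frequency and predicting at others can be sketched from P = aCfV² + P_static. This sketch assumes the voltage at each frequency is known and that P_static is known and identical at both operating points, which is a simplification; all names and numbers are hypothetical.

```python
def effective_capacitance(power_w, freq_hz, volt, p_static_w):
    """Solve P = C_eff * f * V^2 + P_static for C_eff (the aC term)."""
    return (power_w - p_static_w) / (freq_hz * volt ** 2)


def predict_power(c_eff, freq_hz, volt, p_static_w):
    """Predict total power at another frequency/voltage operating point."""
    return c_eff * freq_hz * volt ** 2 + p_static_w


# Hypothetical measurement: 30 W at 2 GHz and 1.0 V, with 10 W static power.
c_eff = effective_capacitance(30.0, 2e9, 1.0, 10.0)
# Prediction for a hypothetical 1 GHz, 0.8 V operating point.
print(predict_power(c_eff, 1e9, 0.8, 10.0))
```

Because the effective capacitance does not depend on f or V in this formulation, a value obtained at one frequency can be reused at the others.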
19
Implementing Linux Frequency Governors
- Linux kernel module that selects the frequency
- Window-based approach
  - Run the application for a time window
  - Estimate performance (using the stall-based model) and power at any frequency
  - Scale frequency based on the policy of interest
- Implement different policies
  - Optimize EDP/ED²P with or without performance constraints
  - Single- and multi-process management
- Experimental framework
  - Intel Core i7
  - SPEC2006 benchmark suite
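The per-window decision can be sketched in user-space Python. This is not the kernel module described on the slide: the function names, the `power_at` callback, and the `max_slowdown` parameter are hypothetical, and the performance constraint is assumed to be a bound on slowdown relative to the highest available frequency.

```python
def predict_time(c_init, stalls, f_cur, f_new):
    """Stall-based time prediction (seconds) at f_new from counts at f_cur."""
    k = f_cur / f_new
    cycles = c_init - stalls + stalls / k
    return cycles / f_new


def pick_frequency(freqs, c_init, stalls, f_cur, power_at, n=1, max_slowdown=None):
    """Choose the frequency minimizing E * D^n = P * T^(n+1).

    n=1 gives EDP, n=2 gives ED2P. If max_slowdown is set (e.g. 0.1),
    frequencies predicted to be more than that fraction slower than
    the fastest frequency are skipped.
    """
    t_fastest = predict_time(c_init, stalls, f_cur, max(freqs))
    best = None
    for f in freqs:
        t = predict_time(c_init, stalls, f_cur, f)
        if max_slowdown is not None and t > t_fastest * (1 + max_slowdown):
            continue
        metric = power_at(f) * t ** (n + 1)
        if best is None or metric < best[0]:
            best = (metric, f)
    return best[1]


# Hypothetical window: 2e9 cycles at 2 GHz, 1.6e9 of them memory stalls.
power = {1e9: 10.0, 2e9: 30.0}  # made-up predicted power in watts
print(pick_frequency([1e9, 2e9], 2e9, 1.6e9, 2e9, power.get))
```

With these made-up numbers the memory-bound window favors the lower frequency, while adding a 10% slowdown limit or removing the stalls pushes the choice back to the higher one.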
20
Intel i7 single process (OptEDP)
21
Intel i7 single process (OptEDPlimit)
22
Intel i7 multi-process (OptEDP)
23
Conclusions
- DVFS modeling in simulators
- Implement the model in real processors
- Apply, explain, and validate our model for SPEC2006
- Contribution: optimize power efficiency using Linux frequency governors
- Other uses of the models
  - PowerSleuth: combine the models with phase detection to characterize the power behavior of applications
- Future work
  - Multi-threaded applications
24
Thank You! Any questions?