Green Governors: A Framework for Continuously Adaptive DVFS
Vasileios Spiliopoulos, Stefanos Kaxiras
Uppsala University, Sweden


Introduction
- Optimize power efficiency: reduce power without harming performance
- Goal: minimize power-efficiency metrics such as the energy-delay product (EDP) and the energy-delay-squared product (ED²P)
- Exploit memory slack
  - Applications with many last-level cache (LLC) misses → memory becomes the bottleneck
  - Performance is insensitive to processor frequency
  - Scaling frequency down → high energy benefit at low performance cost
- Develop analytical models to predict the impact of frequency scaling
  - No empirical parameters
  - No training period
  - Suitable for run-time use

Modeling DVFS
- Theoretical (work in a simulator)
  - Extend previous interval-based models (Karkhanis and Smith, ISCA 2004; Eyerman et al., ACM TOCS 2010)
  - Two models for run-time DVFS management: the miss-based and stall-based models, which differ in accuracy and ease of implementation
  - Estimate energy benefits versus performance loss
  - G. Keramidas, V. Spiliopoulos, and S. Kaxiras. "Interval-Based Models for Run-Time DVFS Orchestration in Superscalar Processors." Proc. of the Int. Conference on Computing Frontiers, 2010.
- Implementation in real hardware
  - Apply the model for power-performance adaptation in real processors
  - Case study: Intel Core i7
  - Approximate models based on the available performance-monitoring hardware
  - Estimate the power characteristics of real hardware
  - V. Spiliopoulos, S. Kaxiras, and G. Keramidas. "Green Governors: A Framework for Continuously Adaptive DVFS." International Green Computing Conference (IGCC'11).

Interval-based Performance Model
- Break the execution time of a program into intervals
- Steady-state intervals: the IPC is limited by the machine width and the program's ILP
- Miss intervals: introduce stall cycles due to branch mispredictions, on-chip instruction/data misses, and LLC (off-chip) misses
[Figure: instruction rate (IPC) over cycles; the steady-state IPC is interrupted by a branch misprediction, an on-chip instruction miss, an on-chip data miss, and an LLC (off-chip) miss]
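To make the interval decomposition concrete, here is a minimal Python sketch, assuming a run can be summarized by its instruction count, a steady-state IPC, and the stall cycles charged to each miss-event class. The `IntervalProfile` class and `total_cycles` function are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class IntervalProfile:
    """Hypothetical per-run summary used to illustrate interval accounting."""
    instructions: int
    steady_state_ipc: float  # limited by machine width and program ILP
    # Stall cycles introduced by each miss-event class, in core cycles.
    penalties: dict = field(default_factory=dict)


def total_cycles(p: IntervalProfile) -> float:
    """Steady-state cycles plus the stall cycles of every miss interval."""
    steady = p.instructions / p.steady_state_ipc
    return steady + sum(p.penalties.values())


profile = IntervalProfile(
    instructions=1_000_000,
    steady_state_ipc=2.0,
    penalties={"branch_mispred": 50_000, "onchip_miss": 30_000, "llc_miss": 200_000},
)
print(total_cycles(profile))
```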

Interval-based DVFS Model (step 1)
Miss intervals and frequency scaling (time measured in cycles)
- Branch-misprediction miss intervals: same penalty (in cycles) at all frequencies
- On-chip data/instruction miss intervals: same penalty (in cycles) at all frequencies
- LLC (off-chip) miss intervals: the only intervals DVFS needs to account for
[Figure: IPC over cycles with steady-state IPC and branch misprediction, on-chip instruction miss, on-chip data miss, and LLC (off-chip) miss intervals]

Interval-based DVFS Model (step 2)
LLC miss interval and frequency scaling
- Model core frequency scaling as a change in memory latency measured in cycles
- Example: memory access time = 100 ns
  - f = 1 GHz → T = 1 ns → mem_lat = 100 cycles
  - f = 500 MHz → T = 2 ns → mem_lat = 50 cycles
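Steps 1 and 2 can be sketched together in Python, assuming isolated LLC misses that each stall the core for the full memory latency (no overlap between misses and no credit for the ROB fill area). The helper names are hypothetical; the first function simply reproduces the slide's arithmetic.

```python
def mem_latency_cycles(mem_access_ns: float, freq_ghz: float) -> float:
    """Fixed wall-clock memory latency expressed in core cycles (ns * cycles/ns)."""
    return mem_access_ns * freq_ghz


def cycles_at_frequency(steady_cycles: float, fixed_penalty_cycles: float,
                        llc_misses: int, mem_access_ns: float, freq_ghz: float) -> float:
    """Branch-misprediction and on-chip miss penalties stay constant in cycles;
    only the LLC miss penalty changes, because memory latency is fixed in ns.
    Assumes isolated LLC misses that each stall for the full memory latency."""
    llc_cycles = llc_misses * mem_latency_cycles(mem_access_ns, freq_ghz)
    return steady_cycles + fixed_penalty_cycles + llc_cycles


for f in (1.0, 0.5):
    c = cycles_at_frequency(500_000, 80_000, 2_000, 100, f)
    print(f, mem_latency_cycles(100, f), c, c / f)  # last value: time in ns
```

Under these assumptions, dropping from 1 GHz to 500 MHz lowers the cycle count from 780,000 to 680,000, yet execution time (cycles divided by frequency in GHz) still grows from 780,000 ns to 1,360,000 ns, because the frequency-invariant cycles now take twice as long.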

Interval-based DVFS Model (step 2)
LLC miss interval and frequency scaling
- Model core frequency scaling as a change in memory latency measured in cycles
[Figure: an LLC miss interval plotted as IPC over cycles, with ROB fill, IQ drain, full-stall, and ramp-up areas spanning the memory latency]

Frequency scaling == change in memory latency
- Frequency ↓ → memory latency (in cycles) ↓ → full-stall area ↓
- The other areas (ROB fill, IQ drain, and ramp-up) remain intact
[Figure: an LLC miss interval before and after frequency scaling; the ROB fill, IQ drain, and ramp-up areas are unchanged while the full stall shrinks with the shorter memory latency]

DVFS target: eliminate the slack
- Reduce the memory latency (in cycles) down to the ROB fill time
- At that point no more slack is available from off-chip misses
- Further reduction → performance penalty
[Figure: an LLC miss interval before scaling (ROB fill, IQ drain, full stall, ramp-up) and after scaling, where the memory latency has shrunk to the ROB fill time]

Elastic and Non-Elastic Areas
Target: eliminate the slack by reducing memory latency, but:
- ROB fill area: does NOT shrink → inelastic area
- Full stall, IQ drain, and ramp-up: DO shrink → elastic areas
[Figure: an LLC miss interval divided into ROB fill, IQ drain, full-stall, and ramp-up areas]
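A minimal sketch of the slack argument, assuming the LLC miss interval lasts `mem_lat` cycles, its first `rob_fill` cycles are the inelastic ROB fill area, and the remaining cycles (IQ drain, full stall, and ramp-up lumped together) are elastic slack. Under that simplification the slack disappears once frequency is reduced by a factor k = mem_lat / rob_fill. The function names are hypothetical.

```python
def elastic_slack(mem_lat_cycles: float, rob_fill_cycles: float) -> float:
    """Stall cycles beyond the inelastic ROB fill area
    (IQ drain, full stall, and ramp-up lumped together)."""
    return max(0.0, mem_lat_cycles - rob_fill_cycles)


def slack_free_scaling(mem_lat_cycles: float, rob_fill_cycles: float) -> float:
    """Largest k (run at f/k) that shrinks the memory latency to the ROB fill time.
    Requires rob_fill_cycles > 0. Returns 1.0 (stay at f) when there is no slack."""
    return max(1.0, mem_lat_cycles / rob_fill_cycles)


k = slack_free_scaling(200, 50)
print(k, elastic_slack(200, 50), elastic_slack(200 / k, 50))
```

With a 200-cycle memory latency and a 50-cycle ROB fill, this model says the core can run at up to f/4 before the off-chip slack is exhausted; scaling further would start to cost performance.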

Two Simple Interval-Based Models
- Stall-based model
  - Fed by in-core information
  - Assumes all stalls scale with frequency (disregards the ROB fill area)
  - Can be used in real hardware
- Miss-based model
  - Fed by information from the memory system
  - Accounts for both elastic and inelastic areas
  - Required information is not available in current hardware

Stall-based Model
- Assumes (all) stalls scale with f
- Not true, due to the ROB fill area
- Execution cycles at f/k: $c_{init} - \text{stalls} + \text{stalls}/k$
[Figure: an LLC miss interval with the ROB fill area, the memory latency, and the stall cycles marked]
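The stall-based formula translates directly into code. In this sketch, `k` is the ratio of the current to the target frequency, so k > 1 scales down and k < 1 scales up; `stalls` is assumed to be the stall cycles measured at the current frequency. The function name is illustrative.

```python
def stall_based_cycles(c_init: float, stalls: float, k: float) -> float:
    """Stall-based model: all measured stall cycles shrink by k at f/k,
    while the remaining (non-stall) cycles are unchanged.
    Ignores the inelastic ROB fill area, as the slide notes."""
    return c_init - stalls + stalls / k


print(stall_based_cycles(1_000_000, 300_000, 2))    # halve the frequency
print(stall_based_cycles(1_000_000, 300_000, 0.5))  # double the frequency
```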

Miss-based Model
- Assumes the whole miss interval scales with f
- Execution cycles at f/k: $c_{init} - \text{misses} \cdot \text{mem\_lat} + \text{misses} \cdot \text{mem\_lat}/k$
[Figure: an LLC miss interval spanning the memory latency, including the ROB fill area]
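The miss-based formula as a Python sketch, assuming `mem_lat_cycles` is the memory latency in cycles at the current frequency and that misses do not overlap. Converting the predicted cycles to time divides by the new frequency f/k. Both function names are hypothetical.

```python
def miss_based_cycles(c_init: float, misses: int, mem_lat_cycles: float, k: float) -> float:
    """Miss-based model: the full memory latency of every LLC miss scales at f/k."""
    mem_cycles = misses * mem_lat_cycles
    return c_init - mem_cycles + mem_cycles / k


def predicted_time_s(cycles: float, f_hz: float, k: float) -> float:
    """Execution time when the predicted cycles run at frequency f/k."""
    return cycles / (f_hz / k)


c = miss_based_cycles(1_000_000, 1_000, 200, 2)
print(c, predicted_time_s(c, 2e9, 2))
```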

Miss-based Model (continued)
- Important implication for overlapping misses!
- Stalls of misses under a miss do not scale, because of the inelastic ROB fill area
- The miss-based model therefore predicts execution cycles from the number of clusters of misses
[Figure: two overlapping LLC misses (Miss1 and Miss2) issued a distance d apart within one memory latency]
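A sketch of counting miss clusters, under a hypothetical overlap rule: a miss issued fewer than `mem_lat` cycles after the leading miss of the current cluster is treated as a miss under a miss and joins that cluster; otherwise it starts a new one. The slide does not give the exact clustering criterion, and the function names are illustrative. One memory latency is then charged per cluster rather than per miss.

```python
def count_miss_clusters(miss_issue_cycles, mem_lat_cycles: float) -> int:
    """Group LLC misses into clusters using the hypothetical overlap rule above."""
    clusters = 0
    leader = None
    for t in sorted(miss_issue_cycles):
        if leader is None or t - leader >= mem_lat_cycles:
            clusters += 1
            leader = t  # this miss starts a new cluster
    return clusters


def miss_based_cycles_clustered(c_init: float, clusters: int,
                                mem_lat_cycles: float, k: float) -> float:
    """Miss-based estimate at f/k, charging one memory latency per cluster."""
    exposed = clusters * mem_lat_cycles
    return c_init - exposed + exposed / k


n = count_miss_clusters([0, 20, 150, 400, 420, 430], 200)
print(n, miss_based_cycles_clustered(10_000, n, 200, 2))
```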

Real Hardware Approximations
- Cannot apply the miss-based model: no counter for clusters of misses is available
- Cannot apply the stall-based model as is: no counter for stalls due to LLC misses is available
- Approximate stall-based model
  - Approximate LLC stalls with the minimum of all pipeline stalls and the worst-case stalls due to LLC misses (LLC misses × mem_lat)
- Good accuracy: predicting execution time going from f_min to f_max and vice versa gives less than 5% average error
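The approximation can be sketched as follows, assuming the memory access time is known in nanoseconds and that frequencies are given in GHz so that cycles divided by GHz yield nanoseconds. The `min(...)` term is the slide's approximation; the surrounding function names and parameters are hypothetical.

```python
def approx_llc_stalls(pipeline_stalls: float, llc_misses: int, mem_lat_cycles: float) -> float:
    """LLC stalls ≈ min(all pipeline stalls, worst case LLC misses * mem_lat)."""
    return min(pipeline_stalls, llc_misses * mem_lat_cycles)


def approx_stall_based_time_ns(cycles: float, pipeline_stalls: float, llc_misses: int,
                               mem_access_ns: float, f_old_ghz: float, f_new_ghz: float) -> float:
    """Predict execution time (ns) at f_new from counters sampled at f_old."""
    mem_lat = mem_access_ns * f_old_ghz  # memory latency in cycles at the current frequency
    stalls = approx_llc_stalls(pipeline_stalls, llc_misses, mem_lat)
    k = f_old_ghz / f_new_ghz            # k > 1 means scaling down
    new_cycles = cycles - stalls + stalls / k
    return new_cycles / f_new_ghz


print(approx_stall_based_time_ns(1_000_000, 400_000, 1_000, 100, 2.0, 1.0))
```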

Measuring power

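Power prediction
- Previous researchers correlated total power ($P = \alpha C f V^2 + P_{static}$) with performance counter events
- We instead correlate the effective capacitance (the $\alpha C$ term of the same equation) with performance counter events
- Run a set of benchmarks
- Compute the effective capacitance of benchmark i from its measured power: $C_i = (P_i - P_{static}) / (f V^2)$
- Estimate $C_i$ from the benchmark's performance counter events, choosing the estimator that minimizes the error over the benchmark set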
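The slide does not spell out the estimator, so the following is only an assumed instance: a linear model of effective capacitance in per-benchmark event rates, fitted by ordinary least squares through the normal equations in pure Python. The linear form and all function names are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_capacitance_model(event_rates, c_eff):
    """Least-squares fit of C_eff ≈ w0 + sum_j w_j * rate_j.

    event_rates: one row per benchmark (e.g. events per cycle for each counter).
    c_eff: measured effective capacitance per benchmark.
    """
    rows = [[1.0] + list(r) for r in event_rates]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, c_eff)) for i in range(p)]
    return _solve(xtx, xty)


def estimate_capacitance(weights, rates):
    """Apply fitted weights to one benchmark's event rates."""
    return weights[0] + sum(w * r for w, r in zip(weights[1:], rates))
```

Minimizing the squared error is one reasonable reading of "minimize"; a different loss or a non-linear model would fit the same outline.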

Power prediction (continued)
- The model only needs to be trained at a single frequency
- Prediction at other frequencies reuses the effective capacitance: $P' = C_i f' V'^2 + P_{static}$
- Events monitored: uops executed, L2 misses, L2 accesses, resource stalls, FP operations, branch mispredictions
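Assuming the effective capacitance estimated at the training frequency carries over unchanged, and treating P_static as a constant (a simplification; static power in practice depends on voltage and temperature), prediction at another operating point is a single expression. The function names and the operating points below are hypothetical, not Core i7 values.

```python
def effective_capacitance(p_total_w: float, p_static_w: float,
                          freq_hz: float, volt: float) -> float:
    """C_eff = (P - P_static) / (f * V^2), folding the activity factor into C."""
    return (p_total_w - p_static_w) / (freq_hz * volt ** 2)


def predict_power(c_eff: float, p_static_w: float, freq_hz: float, volt: float) -> float:
    """P' = C_eff * f' * V'^2 + P_static at a new (f', V') operating point."""
    return c_eff * freq_hz * volt ** 2 + p_static_w


c = effective_capacitance(40.0, 10.0, 3.0e9, 1.0)  # train at a hypothetical 3 GHz, 1.0 V
print(predict_power(c, 10.0, 1.5e9, 0.8))          # predict at a hypothetical 1.5 GHz, 0.8 V
```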

Implementing Linux Frequency Governors
- A Linux kernel module that selects the frequency
- Window-based approach
  - Run the application for a time window
  - Estimate performance (using the stall-based model) and power at every available frequency
  - Scale the frequency according to the policy of interest
- Different policies implemented
  - Optimize EDP/ED²P with or without performance constraints
  - Single- and multi-process management
- Experimental framework: Intel Core i7, SPEC2006 benchmark suite
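A user-space Python sketch of the per-window decision only (the actual governor is a kernel module, and this is not its code): given predicted time and power for each available frequency, pick the one minimizing E·Dⁿ, with n = 1 for EDP and n = 2 for ED²P, optionally subject to a maximum slowdown relative to the highest frequency. The `predictions` layout and the `max_slowdown` parameter are hypothetical.

```python
def choose_frequency(predictions, delay_exponent=1, max_slowdown=None):
    """Pick the frequency minimizing E * D**delay_exponent (1 -> EDP, 2 -> ED^2P).

    predictions maps frequency -> (predicted_time_s, predicted_power_w) for the
    next window. max_slowdown (e.g. 0.05 for 5%) optionally bounds the predicted
    time relative to the highest frequency.
    """
    f_max = max(predictions)
    t_ref = predictions[f_max][0]
    best_f, best_metric = None, float("inf")
    for f, (t, p) in predictions.items():
        if max_slowdown is not None and t > t_ref * (1 + max_slowdown):
            continue  # violates the performance constraint
        energy = p * t
        metric = energy * t ** delay_exponent
        if metric < best_metric:
            best_f, best_metric = f, metric
    return best_f


preds = {2.0: (1.0, 30.0), 1.5: (1.1, 18.0), 1.0: (1.5, 10.0)}  # GHz -> (s, W)
print(choose_frequency(preds), choose_frequency(preds, max_slowdown=0.05))
```

The predicted times could come from the stall-based model and the power from the capacitance model; both are left to the caller here. Because the highest frequency always meets its own constraint, the function never returns `None` for a non-empty input with a non-negative `max_slowdown`.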

Intel i7 single process (OptEDP)

Intel i7 single process (OptEDPlimit)

Intel i7 multi-process (OptEDP)

Conclusions
- DVFS modeling in simulators
- Implementation of the model in real processors
- Application, explanation, and validation of our model for SPEC2006
- Contribution: optimize power efficiency using Linux frequency governors
- Other uses of the models
  - PowerSleuth: combine the models with phase detection to characterize the power behavior of applications
- Future work: multi-threaded applications

Thank You! Any questions?
