Guarded Power Gating in a Multi-core Setting
Niti Madan, Alper Buyuktosunoglu, Pradip Bose, IBM T.J. Watson
June 2010
IBM Research © 2010 IBM Corporation

Outline
- Motivation
- Queuing Model based Methodology
- Results
- Conclusions and Future Work

Power Management through Power Gating
- Use a header or footer transistor to power-gate the idle circuit
- Apply "sleep" to the header or footer => turn off voltage
- Can be applied at unit level (intra-core, or small knob)
- Can be applied at core level (per-core, or big knob)
[Figure: power-gating circuit; labels: Vdd, Sleep, Virtual Vdd, Logic Block]

Predictive Power Gating
- Power-gating algorithms are predictive by nature
- Frequent mispredictions can burn more power than they save
- The break-even point depends on block size and technology parameters
- A guard mechanism was proposed for unit-level power-gating algorithms by Lungu et al. (ISLPED '09)
- This is a concern for per-core power-gating algorithms, as the break-even point is much higher for cores
[Figure: cumulative energy savings vs. time; labels: energy overhead, 0, break-even point, decide to power gate, wake-up]
Example (break-even point = 10 cycles):
  …10100 0000000000…  decide to power gate; correct prediction => save power
  …10100 001………….  decide to power gate
Reference: Anita Lungu, Pradip Bose, Alper Buyuktosunoglu, Daniel Sorin, "Dynamic power gating with quality guarantees", ISLPED '09.
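The break-even idea on this slide can be illustrated with a minimal sketch. It assumes a deliberately simple energy model that is not taken from the slides: gating saves a constant amount of energy per idle cycle, and the fixed gating overhead equals the savings accumulated over exactly `break_even_cycles`. The function name and the `saving_per_cycle` parameter are hypothetical.

```python
def net_energy_savings(idle_cycles, break_even_cycles, saving_per_cycle=1.0):
    """Net energy saved by gating a block for one idle interval.

    Assumed model (not from the slides): a fixed overhead is paid when the
    block is gated and woken up, and that overhead equals the energy saved
    over exactly `break_even_cycles` idle cycles.
    """
    overhead = break_even_cycles * saving_per_cycle
    return idle_cycles * saving_per_cycle - overhead


# With a break-even point of 10 cycles, a 2-cycle idle period loses energy,
# a 10-cycle idle period breaks even, and a 25-cycle idle period saves energy.
for idle in (2, 10, 25):
    print(idle, net_energy_savings(idle, break_even_cycles=10))
```

Under this model, a predictor that frequently gates before short idle periods accumulates negative net savings, which is the situation a guard mechanism is meant to detect.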

Power Gating Scenarios
- Exploit the two dimensions of utilization to power-gate idle units or cores
  - System utilization (OS perspective) triggers the big knob
  - Resource utilization (core's perspective) triggers the small knob
- Do we power-gate (PG) cores, execution units, or both?
- How can we maximize the power-savings opportunities provided by both the small and big knobs?
[Figure: activity of Core 1 to Core 4 over time; (a) baseline 4-core system, (b) folded 2-core system]

Goals of this study
- Explore the trade-offs between unit-level (small-knob) and per-core (big-knob) power-gating algorithms for a range of latencies and parameters
- Leverage analytical models for early-stage evaluation
- Make a case for a guard mechanism for per-core power gating
Sriram Vajapeyam, Pradip Bose

Queuing Theory Based Analytical Model
- Representation of multi-processor workloads as a queuing system
  - Cores are servers
  - Processing tasks are customer requests
  - Tasks are processed in FCFS order
  - The queuing system tracks average customer waiting time, service time and server utilization
- Evaluate our power-management policies using a C++ based queuing model simulator: "QUTE"
[Figure: queuing system; arrivals -> customers in queue -> server(s) -> departures]

Overview of QUTE Framework
- Simulation of queuing models (G/G/N/k/inf/FCFS)
  - Faster than cycle-accurate simulations
  - Easy to explore the design space early on
- Statistical workload generation parameters:
  - Task arrival times: exponential distribution
  - Task lengths: normal/exponential/uniform distributions
- Evaluation metrics:
  - Performance: average response time
  - Power: average number of cores switched on
  - Other stats: server utilization, variance in service demand, etc.
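To make the queuing setup concrete, here is a small Python sketch of a multi-server FCFS queue. It is not QUTE (which the slides describe as a C++ simulator) and it simplifies heavily: exponential inter-arrival times and exponential task lengths, no time slicing, and no power gating. The function name, parameters and seed are assumptions made for illustration.

```python
import heapq
import random


def simulate_fcfs(num_cores, mean_interarrival, mean_task_len, num_tasks, seed=0):
    """Simulate a multi-server FCFS queue; return (avg response time, utilization).

    Simplifying assumptions (not from the slides): tasks run to completion
    on one core (no time slicing) and all cores are always switched on.
    """
    rng = random.Random(seed)
    free_at = [0.0] * num_cores      # min-heap: time at which each core becomes free
    heapq.heapify(free_at)
    now = 0.0
    total_response = 0.0
    busy_time = 0.0
    last_departure = 0.0
    for _ in range(num_tasks):
        now += rng.expovariate(1.0 / mean_interarrival)   # next arrival time
        service = rng.expovariate(1.0 / mean_task_len)    # task length
        start = max(now, heapq.heappop(free_at))          # earliest available core
        finish = start + service
        heapq.heappush(free_at, finish)
        total_response += finish - now                    # waiting + service time
        busy_time += service
        last_departure = max(last_departure, finish)
    utilization = busy_time / (num_cores * last_departure)
    return total_response / num_tasks, utilization


# Times in microseconds: 32 cores, 300 us mean inter-arrival, 5 ms mean task length.
avg_response, util = simulate_fcfs(32, 300.0, 5000.0, 10000)
print(avg_response, util)
```

Because tasks are handled in arrival order and each one takes the core that frees up earliest, the loop reproduces FCFS service across identical servers without an explicit event queue.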

QUTE Framework
[Figure: task arrival (arrival rate distribution using a random number generator) -> FIFO task queue -> cores C1, C2, C3, C4 (service time or task length from a statistical distribution); all cores queue the task back at the end of a time slice]

Big Knob Modeling
Implemented a simple idleness-triggered heuristic:
- Set the idleness threshold (say, to 0.5 ms)
- Every 0.5 ms (i.e. the idleness threshold):
  - Scan all cores
  - Identify cores that have been idle for longer than the idleness threshold
  - Switch off all such cores (but make sure at least one core is always ON, either free or active)
- When a task arrives at the head of the task queue:
  - If there is no free core and there is a switched-off core, switch it ON
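The heuristic above can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the QUTE implementation: the `Core` record, the function names and the omission of the switch-on latency (OnLat) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Core:
    on: bool = True           # switched on (not power gated)
    busy: bool = False        # currently running a task
    idle_since: float = 0.0   # time at which the core last became idle


def scan_and_gate(cores, now, threshold):
    """Periodic scan: switch off cores idle for longer than `threshold`.

    At least one core always stays ON. Returns the number switched off.
    """
    on_count = sum(c.on for c in cores)
    switched_off = 0
    for c in cores:
        if on_count <= 1:
            break
        if c.on and not c.busy and now - c.idle_since > threshold:
            c.on = False
            on_count -= 1
            switched_off += 1
    return switched_off


def wake_for_task(cores):
    """Task at head of queue: if no core is free, switch one gated core ON.

    Returns the woken core, or None if nothing was woken. The switch-on
    latency (OnLat) is not modeled in this sketch.
    """
    if any(c.on and not c.busy for c in cores):
        return None
    for c in cores:
        if not c.on:
            c.on = True
            return c
    return None


cores = [Core() for _ in range(4)]
print(scan_and_gate(cores, now=1.0, threshold=0.5))  # three idle cores are gated
```

A fuller model would charge OnLat to the response time of the task that triggers the wake-up, which is where the performance loss reported for the big knob comes from.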

Small Knob Modeling
- Cannot directly simulate workload phases
- Each core can have N power states
  - 2 states in this version: nominal power state and low power state (75% power)
- Generate a statistical (Gaussian) distribution of the duration of each power state
- Each task always starts in the nominal power state
  - Switch between power states within a given time slice
- Parameters: nominal (Hi) and low (Lo) power state means, transition overhead
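A rough sketch of the two-state phase model follows. The slides give only the state means and say the durations are Gaussian; the standard deviation (10% of the mean here), the lower clamp on durations, the function name and the omission of the transition overhead are assumptions.

```python
import random


def hi_state_fraction(task_len, hi_mean, lo_mean, rel_sigma=0.1, seed=0):
    """Fraction of a task's run time spent in the nominal (Hi) power state.

    The task alternates Hi and Lo phases, starting in Hi. Phase durations
    are Gaussian with the given means; the spread (rel_sigma) is an
    assumption, and the transition overhead is ignored.
    """
    rng = random.Random(seed)
    elapsed = 0.0
    hi_time = 0.0
    in_hi = True
    while elapsed < task_len:
        mean = hi_mean if in_hi else lo_mean
        # Clamp to a small positive duration so the loop always advances.
        duration = max(rng.gauss(mean, rel_sigma * mean), 0.01 * mean)
        duration = min(duration, task_len - elapsed)
        if in_hi:
            hi_time += duration
        elapsed += duration
        in_hi = not in_hi
    return hi_time / task_len


# Hi mean 300 us and Lo mean 100 us over a long run: roughly three quarters in Hi.
print(hi_state_fraction(1_000_000.0, 300.0, 100.0))
```

Over a long run the fraction approaches hi_mean / (hi_mean + lo_mean); for short tasks it is biased upward because every task starts in the Hi state.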

Simulation Parameters

System-level parameters
  Number of cores                    32
  Mean task length                   5 ms
  Mean task inter-arrival time       300 µs
  Time slice                         1 ms
  Simulation length                  10000 tasks

Big knob parameters
  Core switch-on latency (OnLat), idleness threshold (C_T)    500 µs

Small knob parameters
  Hi state mean                      300 µs
  Lo state mean                      100 µs
  Transition overhead                1 µs
  Power factor                       0.75

Utilization: ρ = λ / (N·µ)
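As a worked instance of ρ = λ / (N·µ) with the system-level parameters above, the following sketch takes λ as the reciprocal of the mean inter-arrival time and µ as the reciprocal of the mean task length. The function and argument names are chosen here for illustration.

```python
def utilization(mean_interarrival_us, mean_task_len_us, num_cores):
    """Offered utilization rho = lambda / (N * mu)."""
    lam = 1.0 / mean_interarrival_us   # arrival rate (tasks per microsecond)
    mu = 1.0 / mean_task_len_us        # service rate of one core
    return lam / (num_cores * mu)


# 32 cores, 300 us mean inter-arrival time, 5 ms mean task length.
rho = utilization(300.0, 5000.0, 32)
print(round(rho, 2))  # prints 0.52
```

A value above 1.0 from this formula means the offered load exceeds what the cores can serve, so the measured utilization saturates at 1.0.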

Outline
- Problem Background
- Methodology: Queuing Model
- Results
- Conclusions and Future Work

Big Knob Results

Experiment                        Response time (µs)   Average power (num cores)
Base                              5002.22              32
OnLat = 0.5 ms, C_T = 0.5 ms      5038.46              24.99
OnLat = 0.5 ms, C_T = 0.3 ms      5070.12              23.33
OnLat = 0.5 ms, C_T = 0.1 ms      5158.51              21.83
OnLat = 0.5 ms, C_T = 10 µs       5244.43              21.68
OnLat = 10 µs,  C_T = 0.5 ms      5002.93              24.82
OnLat = 10 µs,  C_T = 10 µs       5007.07              20.77

- C_T controls the degree of power savings (up to 34%)
- OnLat controls the performance loss (up to 5%)

Idle-Time Durations Histogram
[Figure: histogram; x-axis: idle-time duration (µs); y-axis: number of durations; C_T marked]

Small Knob Results
System_Power = Num_cores x (%time_in_Hi_state + F x %time_in_Lo_state) x P, where F = 0.75 for this analysis

Workload behavior: Short phases, High ILP, Low ILP, Very High ILP, Very Low ILP
Hi mean, Lo mean: 100 200 300 100 500 100 200 100 300 100 500
Hi %: 52, 57, 79, 30, 89, 21
Lo %: 48, 43, 21, 70, 11, 79
Response time (µs): 5050.51, 5027.36, 5026.46, 5028.23, 5013.67, 5019.95
Avg power (effective number of cores): 28.16, 28.48, 30.08, 26.24, 31.04, 25.6

[Figure: performance loss (%) vs. transition overhead (µs)]

- Power savings depend on workload behavior
- Short phases increase the number of transitions and the overhead
- Transition overhead is tolerable under our assumptions
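The System_Power formula can be checked with a short calculation. The sketch below expresses power in units of P (the power of one core in the nominal state), which gives the "effective number of cores" metric; the function name and the use of fractions instead of percentages are choices made here.

```python
def effective_cores(num_cores, hi_fraction, lo_fraction, power_factor=0.75):
    """System power in units of P (one core at nominal power)."""
    return num_cores * (hi_fraction + power_factor * lo_fraction)


# 32 cores spending 52% of the time in Hi and 48% in Lo, with F = 0.75.
print(round(effective_cores(32, 0.52, 0.48), 2))  # prints 28.16
```

With every core permanently in the Lo state the formula bottoms out at 32 x 0.75 = 24 effective cores, so the power factor F bounds what the small knob alone can save.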

Hybrid Model Results (Big + Small Knob)
[Figure: two panels, "High ILP Workload" and "Low ILP Workload"]
Inter-arrival time (µs): 50, 100, 300, 500, 1000, 2000
Server utilization (measured): 1.0, 0.52, 0.31, 0.16, 0.08
- High ILP workloads: the big knob is most helpful
- Low ILP workloads: the small knob is helpful even at lower utilization

A Case for Guard Mechanism for Multi-core Power Gating

Experiment: Response time (µs), core switching ON/OFF frequency
Fixed Arrival Rate 5043.8891482
Toggling Arrival Rate 5111226372

- Depending on workload characteristics, per-core power-gating heuristics are prone to mispredictions and can dissipate more power
- Aggressive power-gating heuristics (e.g. a lower C_T) also increase the performance overhead of mispredictions

Observations
- In a fully loaded system, the small knob is helpful
- In a lightly loaded system, the big knob is most useful
- In an intermediately loaded system, the big knob is useful to have, but the usefulness of the small knob depends on the workload characteristics
  - Workloads with lower ILP or low resource utilization benefit from the small knob
- The small knob is a useful feature to have regardless of system load if we can implement a power state with a lower power factor
  - The current power factor (0.75) is conservative

Future Work
- Improve the methodology by supporting real server utilization traces
- Evaluate a system with multiple P-states and DVFS
- Architect guard mechanisms for the per-core power-gating algorithms
- Design and implementation of a hybrid PG system

Thanks and Questions!

Backup Slides

[Figure: Power Factor Sensitivity Analysis for High ILP Workload]

[Figure: Power Factor Sensitivity Analysis for Low ILP Workload]

Two Level Power Gating Algorithms (Lungu et al., ISLPED '09)
- Observations:
  - Correctness requirement of power-saving schemes (efficiency-wise): save power
  - Single-level idle prediction algorithms can behave incorrectly and waste power
- Proposed idea:
  - Add a second-level monitor to control enabling of the power-gating scheme
  - Improve efficiency in the power-wasting cases without degrading the power savings of the common case
- Per-core power-gating algorithms also rely on such predictive schemes and will require guard mechanisms
  - The cost of misprediction is higher in per-core power gating
[Figure: two-level scheme. Level 1 (actuate): states On, Off_U (power gated, uncompensated), Off_C (power gated, compensated), with efficiency counters Cnt1++ and Cnt2++. Level 2 (monitor and control): estimate power savings > 0? Yes: Enable = 1; No: Enable = 0]
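The two-level idea can be sketched as follows. This is a loose illustration and not the mechanism from Lungu et al.: the class name, the window-based update, the way savings are estimated from the two counters, and the choice to keep monitoring idle intervals while gating is disabled are all assumptions made here.

```python
class GuardedGate:
    """Second-level monitor that enables or disables a power-gating scheme.

    Assumed bookkeeping (hypothetical, not from the slides): for every idle
    interval the first level would gate, count cycles spent before the
    break-even point (uncompensated) and after it (compensated).
    """

    def __init__(self, break_even_cycles):
        self.break_even = break_even_cycles
        self.enable = True
        self.uncompensated = 0   # gated cycles that only pay back the overhead
        self.compensated = 0     # gated cycles that yield net savings

    def observe_idle_interval(self, idle_cycles):
        """Record one idle interval, whether or not gating is enabled."""
        self.uncompensated += min(idle_cycles, self.break_even)
        self.compensated += max(idle_cycles - self.break_even, 0)

    def end_of_window(self, intervals_seen):
        """Estimate net savings for the window and update the enable bit."""
        overhead = intervals_seen * self.break_even
        estimated_savings = self.uncompensated + self.compensated - overhead
        self.enable = estimated_savings > 0
        self.uncompensated = 0
        self.compensated = 0
        return self.enable


guard = GuardedGate(break_even_cycles=10)
for idle in (2, 3, 2):                 # short idle periods: gating wastes energy
    guard.observe_idle_interval(idle)
print(guard.end_of_window(intervals_seen=3))   # gating is disabled
```

Monitoring continues while gating is disabled so that the guard can re-enable the scheme once long idle intervals return; a hardware realization would have to estimate this from activity counters rather than from exact interval lengths.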
