Microarchitectural Techniques for Power Gating of Execution Units

Presentation transcript:

Microarchitectural Techniques for Power Gating of Execution Units
Authors: Zhigang Hu, Alper Buyuktosunoglu, Viji Srinivasan, Victor Zyuban, Hans Jacobson, Pradip Bose (IBM T.J. Watson Research Center)
In International Symposium on Low Power Electronics and Design (ISLPED), 2004, pages 32-37.
Presenter: Sai Raghunath T

Sources of power dissipation: sub-threshold leakage; gate leakage current.
Circuit-level approaches for leakage power reduction: body bias control; dual-threshold domino circuits; input vector control; power gating.

Architectural-level leakage power reduction in caches and buffers:
- Tristating the drivers of the bitlines of SRAM.
- Determining sleep-mode activation policies for the integer functional units using dual-Vt domino logic circuits.
- Using the compiler to detect long idle periods for different functional units and enable power gating.

Work done in the paper: Exploiting workload phases and characteristics to dynamically power gate selected units within a pipeline OFF and ON, using a time-based technique and a branch-prediction-guided technique. Specifications of the out-of-order-issue superscalar processor model: Turandot

Fundamentals of power gating: Power gating is achieved by placing a suitably sized header or footer transistor between the circuit and its supply or ground rail. The ‘Sleep’ signal is asserted when the control logic detects a sufficiently long idle period, and the macro is turned OFF.

Timing sequence:
T1 - T0 = T(idle detect)
T2 - T1 = T(idle delay)
T3 - T2 = T(breakeven)
T4 - T2 = T(full discharge)
T5 = detection of next busy interval
T6 - T5 = T(busy delay)
T7 - T6 = T(wakeup)

Energy in each interval:
1. T0 -> T1 = Leakage energy
2. T1 -> T2 = Overhead energy + Leakage energy (overhead energy is the energy required to generate the ‘Sleep’ signal). Savings in leakage energy increase with decrease in supply voltage.
3. T5 -> T6 = Overhead energy
4. T6 -> T7 = Leakage energy

T(breakeven) is the point at which the aggregate leakage energy savings, E(avg saved), equal the energy overhead of switching the header/footer device ON and OFF. Typically, the value of N(breakeven) is 10.
Terms used:
DIBL = Drain Induced Barrier Lowering factor (typically 0.1)
WH = (total area of header device) / (total area of clock-gated macro)
α = switching factor
m = 0.1

Power gating of execution units: Quantifying the power gating potential for an out-of-order superscalar processor model using different applications from the SPEC2K suite.
Assumptions:
T(idle delay) = T(busy delay) = 0 (perfect predictor)
T(idle) > T(overhead), where T(overhead) = T(wakeup) + T(breakeven)

The following example estimates the fraction of cycles in which a unit can be power gated. Each idle period longer than T(overhead) contributes its length minus T(overhead) as opportunity cycles.
Ex: Sequence of activity bits of some unit: 1111 00000 111111 0000 1111 000000 1111 (33 cycles)
T(overhead) = 3
Opportunity cycles = (5-3) + (4-3) + (6-3) = 6
Power gating potential = 6/33 ≈ 18.18%
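The calculation in the example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic on the slide, assuming a perfect predictor; the function name `power_gating_potential` and the string encoding of the activity bits are hypothetical, not taken from the paper.

```python
from itertools import groupby


def power_gating_potential(activity: str, t_overhead: int) -> float:
    """Fraction of all cycles that could be gated, assuming a perfect predictor.

    `activity` holds one character per cycle: '1' = busy, '0' = idle.
    Spaces are ignored so the slide's grouping can be pasted as-is.
    """
    bits = activity.replace(" ", "")
    opportunity = 0
    for bit, run in groupby(bits):
        length = len(list(run))
        # Only idle periods longer than the overhead yield any benefit.
        if bit == "0" and length > t_overhead:
            opportunity += length - t_overhead
    return opportunity / len(bits)


trace = "1111 00000 111111 0000 1111 000000 1111"
print(f"{power_gating_potential(trace, 3):.2%}")  # prints 18.18%
```

Raising T(overhead) to 5 leaves only the 6-cycle idle period useful, so the potential drops to 1/33.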

Figure: Power gating potential averaged across SPEC2K FP applications for various values of T(overhead)

Figure: Power gating potential averaged across SPEC2K integer applications for various values of T(overhead)

Time-based power gating:
Assumptions:
T(idle delay) is folded into T(breakeven), i.e. T(breakeven) now stands for T(breakeven) + T(idle delay)
T(busy delay) is folded into T(wakeup), i.e. T(wakeup) now stands for T(wakeup) + T(busy delay)
One issue queue per execution unit
Logic used: Observe the state of an execution unit and turn it OFF when a long streak of idle cycles is seen.

FSM: State machine of an execution unit when power gating is engaged
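A minimal trace-driven sketch of such a state machine is given below. The state names ACTIVE, COMPENSATED and WAKEUP, the `simulate` function, and the rule that a pending instruction stalls for T(wakeup) cycles are assumptions made for illustration; only the 'Uncompensated' state is named in the slides, and the paper's FSM may differ in detail.

```python
from enum import Enum, auto


class State(Enum):
    ACTIVE = auto()         # powered on; counting consecutive idle cycles
    UNCOMPENSATED = auto()  # gated, overhead energy not yet recovered
    COMPENSATED = auto()    # gated past breakeven; net leakage savings
    WAKEUP = auto()         # powering back on; pending instruction stalls


def simulate(demand, t_idle_detect, t_breakeven, t_wakeup):
    """Run the gating FSM over a demand trace ('1' = unit needed this cycle)."""
    demand = demand.replace(" ", "")
    state, idle_run, gated, wake_left = State.ACTIVE, 0, 0, 0
    stats = {"cycles": 0, "sleep": 0, "net_saving": 0, "stall": 0}
    i = 0
    while i < len(demand):
        busy = demand[i] == "1"
        stats["cycles"] += 1
        if busy and state in (State.UNCOMPENSATED, State.COMPENSATED):
            state, wake_left = State.WAKEUP, t_wakeup
        if state is State.WAKEUP:
            if wake_left > 0:
                # The demand is not consumed while the unit wakes up.
                wake_left -= 1
                stats["stall"] += 1
                continue
            state, idle_run = State.ACTIVE, 0
        if state is State.ACTIVE:
            idle_run = 0 if busy else idle_run + 1
            if idle_run >= t_idle_detect:
                state, gated = State.UNCOMPENSATED, 0
        else:
            stats["sleep"] += 1
            if state is State.COMPENSATED:
                stats["net_saving"] += 1
            else:
                gated += 1
                if gated >= t_breakeven:
                    state = State.COMPENSATED
        i += 1
    return stats


# One busy cycle, a 20-cycle idle period, then one busy cycle.
print(simulate("1" + "0" * 20 + "1", t_idle_detect=4, t_breakeven=9, t_wakeup=3))
```

In this trace the unit sleeps for 16 of the 20 idle cycles, only 7 of which come after breakeven, and the final instruction stalls for 3 cycles. This shows why shorter T(idle detect), T(breakeven) and T(wakeup) all help.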

Figure: % of cycles in sleep mode for FPU with different T(idle detect) and T(breakeven). T(wakeup) = 3 cycles

Figure: Avg IPC of SPECFP2K suite with different T(idle detect) and T(wakeup) values. T(breakeven) = 9 cycles. IPC is normalized to the base case where power gating is disabled.
Long idle periods coupled with smaller values of T(breakeven) and T(wakeup) help achieve large leakage reductions and limit the overall performance loss.
T(idle detect) = 6-12 cycles gives the optimum balance between performance and power.

Figure: % of cycles in sleep mode for FXU with different T(idle detect) and T(breakeven). T(wakeup) = 3 cycles

Figure: Avg IPC of SPECINT2K suite with different T(idle detect) and T(wakeup) values. T(breakeven) = 9 cycles. IPC is normalized to the base case where power gating is disabled.

Branch prediction guided power gating:
- The previous graphs show that the FXU typically has short idle periods, so it is difficult to implement power gating efficiently in integer execution units.
- Branch mispredictions are highly disruptive events in speculative out-of-order processors, which makes them a good opportunity for power gating.
- On a branch misprediction, the pipeline is flushed and the correct instruction is fetched. During this process, the execution unit is idle.

New branch prediction guided power gating technique: As soon as a branch misprediction is detected, all idle FXUs are transferred to the ‘Uncompensated’ state → reduction in T(idle detect) → higher % of cycles in ‘sleep’ mode → smaller performance loss and better leakage reduction
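The effect of skipping idle detection can be illustrated with a small calculation. The sketch below assumes each idle period is tagged with whether it starts at a detected misprediction, and that other idle periods still go through the time-based detector; the function name `gated_cycles` and the sample numbers are hypothetical.

```python
def gated_cycles(idle_periods, t_idle_detect, use_branch_hint):
    """Total cycles spent gated over a list of (length, after_mispredict) idle periods."""
    total = 0
    for length, after_mispredict in idle_periods:
        # With the hint, a misprediction moves idle units straight to the
        # gated state, so no cycles are spent on idle detection.
        detect = 0 if (use_branch_hint and after_mispredict) else t_idle_detect
        total += max(0, length - detect)
    return total


# Hypothetical FXU idle periods: (length in cycles, follows a misprediction?)
periods = [(8, True), (5, False), (12, True)]
print(gated_cycles(periods, t_idle_detect=6, use_branch_hint=False))  # prints 8
print(gated_cycles(periods, t_idle_detect=6, use_branch_hint=True))   # prints 20
```

The sketch counts gated cycles only. It does not model T(breakeven), so it cannot tell whether gating a short post-misprediction idle period actually saves energy.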

Figures: % of cycles in sleep mode versus performance degradation for the power gating techniques. T(breakeven) = 9 cycles; T(wakeup) = 3 cycles

Conclusions and critique:
- The time-based technique is efficient for FP execution units, which have relatively high idle time.
- The branch prediction technique is efficient for integer execution units.
- The paper does not discuss the advantages or disadvantages of power gating relative to other circuit-level approaches for leakage power reduction.
- How efficient is power gating if the above-mentioned assumptions are relaxed?
- What is the power consumption of the macro generating the ‘Sleep’ signal? What is the ratio of its power consumption to the power savings?

How is this paper relevant to the class? State-of-the-art microprocessors face the problem of high leakage power due to technology scaling. Leakage power is high in the execution units, which are among the most important blocks in the microprocessor. This paper gives good insight into techniques to reduce leakage power. Also, various power gating techniques to reduce power dissipation in CMP and SMT architectures can be explored.

Project: Take a small integer ALU, compare various circuit-level approaches with power gating, and suggest the better technique(s). The suggested idea may be an optimal mix of two or more circuit-level approaches.

THANK YOU. Q&A?