On-demand solution to minimize I-cache leakage energy

Presentation transcript:

On-demand solution to minimize I-cache leakage energy
Group members: Chenyu Lu and Tzyy-Juin Kao

Motivation
- High power dissipation causes thermal problems and raises packaging, power-delivery, and cooling costs.
- At the 70nm technology node, leakage may constitute as much as 50% of total energy dissipation.
- We use the super-drowsy leakage-saving technique:
  - Lower the supply voltage to a level (0.25V) near the threshold voltage (0.2V).
  - Data is retained but cannot be accessed.
  - Waking a line from the saving mode to the active mode costs one cycle.
- We apply an on-demand wakeup policy to the I-cache:
  - Only the cache lines currently in use need to be awake.
  - The next cache line is predicted accurately using the existing branch predictor.
  - On most branch mispredictions, the extra wakeup cycle is overlapped with the misprediction recovery.

Overview
- Super-drowsy cache line:
  - A Schmitt-trigger inverter controls the cache line's voltage in the leakage-saving mode, replacing multiple supply-voltage sources.
- Wakeup prediction policy:
  - Enables on-demand wakeup.
  - The branch predictor already identifies which lines need to be woken up, so no additional wakeup-prediction structure is needed.
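The on-demand policy above can be sketched as a small cycle-level model. This is an illustration only, not the paper's implementation: the cache size, the explicit `predicted_next` argument, and the one-cycle stall accounting are all simplifying assumptions (in the real design the prediction comes for free from the branch predictor/BTB).

```python
# Minimal sketch of on-demand I-cache wakeup (hypothetical model).
# Each cycle, the fetch unit wakes the predicted next line one cycle
# ahead; every other line stays drowsy. Fetching a line that is still
# drowsy costs one stall cycle to wake it.

NUM_LINES = 8  # toy cache size (assumption)

class OnDemandICache:
    def __init__(self):
        # True = active, False = drowsy; everything starts drowsy
        self.active = [False] * NUM_LINES
        self.stalls = 0  # cycles lost waking an unpredicted line

    def fetch(self, line, predicted_next):
        """Fetch from `line`; pre-wake `predicted_next` for next cycle."""
        if not self.active[line]:
            # Wakeup was not predicted: pay the one-cycle penalty now.
            self.stalls += 1
            self.active[line] = True
        # Keep only the current and predicted lines awake.
        for i in range(NUM_LINES):
            self.active[i] = (i in (line, predicted_next))

cache = OnDemandICache()
# Sequential fetch with perfect next-line prediction:
for line in range(4):
    cache.fetch(line, (line + 1) % NUM_LINES)
first = cache.stalls   # only the very first fetch finds a drowsy line
# A surprise jump to an unpredicted (drowsy) line costs one more cycle.
cache.fetch(6, 7)
print(first, cache.stalls, sum(cache.active))
```

With accurate next-line prediction only the first fetch stalls; the remaining mis-wakeups correspond to branch mispredictions, whose extra wakeup cycle overlaps with the misprediction recovery.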

Methodology
- Leakage energy = drowsy_energy + active_energy + turn_on_energy
- Monitor active_lines and turn_on every cycle in sim-outorder.
- Add a wake_bit to every block:
  - 0: in drowsy mode this cycle
  - 1: in active mode this cycle
  - 2: in active mode this cycle and the next cycle
  - 3: in drowsy mode this cycle, will be in active mode next cycle
- Update the wake_bit and count the active_lines every cycle in Update_wakeup().
- Change the wake_bit on every instruction fetch in fetch_line().
- Improved strategy:
  - Keep an idle line awake only while Interval * Active_Power < Interval * Drowsy_Power + Turn_On_Energy, i.e., while re-waking it would cost more than leaving it active.
  - Speculate with a list of recently accessed cache lines.
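The wake_bit bookkeeping and the break-even test can be sketched as follows. This is an illustrative stand-in for the modified sim-outorder routines, with one plausible reading of the per-cycle state transitions and made-up power numbers; the real constants depend on the circuit.

```python
# Sketch of the per-block wake_bit state machine (illustrative).
# Encoding from the slides:
#   0: drowsy this cycle
#   1: active this cycle
#   2: active this cycle and the next
#   3: drowsy this cycle, waking up for the next cycle
DROWSY, ACTIVE, ACTIVE_NEXT, WAKING = 0, 1, 2, 3

def update_wakeup(wake_bits):
    """Advance every block one cycle; return (#active, #turn_ons)."""
    active = turn_on = 0
    for i, w in enumerate(wake_bits):
        if w == WAKING:          # finished its one-cycle wakeup
            wake_bits[i] = ACTIVE
            turn_on += 1
        elif w == ACTIVE_NEXT:   # stays awake one more cycle
            wake_bits[i] = ACTIVE
        elif w == ACTIVE:        # nothing keeps it awake: drowse it
            wake_bits[i] = DROWSY
        if wake_bits[i] != DROWSY:
            active += 1
    return active, turn_on

def fetch_line(wake_bits, i):
    """On a fetch, wake block i (or keep it awake through next cycle)."""
    if wake_bits[i] in (ACTIVE, ACTIVE_NEXT):
        wake_bits[i] = ACTIVE_NEXT
    else:
        wake_bits[i] = WAKING

# Break-even test from the improved strategy, with hypothetical units:
# keep a line awake across an idle interval only while
#   interval * P_active < interval * P_drowsy + E_turn_on
P_ACTIVE, P_DROWSY, E_TURN_ON = 1.0, 0.1, 10.0

def stay_awake(interval):
    return interval * P_ACTIVE < interval * P_DROWSY + E_TURN_ON

bits = [DROWSY] * 4
fetch_line(bits, 0)            # request block 0 -> starts waking
a, t = update_wakeup(bits)     # next cycle: block 0 turns on
print(a, t, stay_awake(5), stay_awake(20))
```

With these numbers, short idle intervals (5 cycles) favor staying awake, while long ones (20 cycles) favor drowsing the line and paying the turn-on energy later.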

Results

Change block size

Change interval

Future Work
- One extra cycle of latency on a target-address misprediction (a 0.08% performance drop, according to the paper).
- Apply the on-demand policy to the data cache:
  - No prediction is used.
  - The extra latency can be hidden by locality and out-of-order execution.