CS 7810 Lecture 14 Reducing Power with Dynamic Critical Path Information J.S. Seng, E.S. Tune, D.M. Tullsen Proceedings of MICRO-34 December 2001.

Slides:



Advertisements
Similar presentations
Dynamic Power Redistribution in Failure-Prone CMPs Paula Petrica, Jonathan A. Winter * and David H. Albonesi Cornell University *Google, Inc.
Advertisements

CS6290 Pentiums. Case Study1 : Pentium-Pro Basis for Centrinos, Core, Core 2 (We’ll also look at P4 after this.)
CS 7810 Lecture 22 Processor Case Studies, The Microarchitecture of the Pentium 4 Processor G. Hinton et al. Intel Technology Journal Q1, 2001.
CS 7810 Lecture 4 Overview of Steering Algorithms, based on Dynamic Code Partitioning for Clustered Architectures R. Canal, J-M. Parcerisa, A. Gonzalez.
Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
CS 7810 Lecture 16 Simultaneous Multithreading: Maximizing On-Chip Parallelism D.M. Tullsen, S.J. Eggers, H.M. Levy Proceedings of ISCA-22 June 1995.
CS 7810 Lecture 25 DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design T. Austin Proceedings of MICRO-32 November 1999.
CS 7810 Lecture 7 Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching E. Rotenberg, S. Bennett, J.E. Smith Proceedings of MICRO-29.
Low power Design Strategies Daniele Folegnani. Talk outline Why Low Power is Important Power Consumption in CMOS Circuits New Trends for Future Microprocessors.
CS 7810 Lecture 20 Initial Observations of the Simultaneous Multithreading Pentium 4 Processor N. Tuck and D.M. Tullsen Proceedings of PACT-12 September.
UPC Power and Complexity Aware Microarchitectures Jaume Abella 1 Ramon Canal 1
1 Lecture 11: SMT and Caching Basics Today: SMT, cache access basics (Sections 3.5, 5.1)
June 20 th 2004University of Utah1 Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors Karthik Ramani Naveen Muralimanohar.
CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.
1 Lecture 11: ILP Innovations and SMT Today: out-of-order example, ILP innovations, SMT (Sections 3.5 and supplementary notes)
Super-Drowsy Caches Single-V DD and Single-V T Super-Drowsy Techniques for Low- Leakage High-Performance Instruction Caches Nam Sung Kim, Krisztián Flautner,
1 Lecture 19: Core Design Today: issue queue, ILP, clock speed, ILP innovations.
CS 7810 Lecture 10 Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors O. Mutlu, J. Stark, C. Wilkerson, Y.N.
CS 7810 Lecture 5 The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays M.S. Hrishikesh, N.P. Jouppi, K.I. Farkas, D. Burger, S.W. Keckler,
1 Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections )
Defining Wakeup Width for Efficient Dynamic Scheduling A. Aggarwal, O. Ergin – Binghamton University M. Franklin – University of Maryland Presented by:
Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches Rajeev Balasubramonian, University of Utah Viji Srinivasan, IBM T.J. Watson Sandhya.
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
Cluster Prefetch: Tolerating On-Chip Wire Delays in Clustered Microarchitectures Rajeev Balasubramonian School of Computing, University of Utah July 1.
CS 7810 Lecture 3 Clock Rate vs. IPC: The End of the Road for Conventional Microarchitectures V. Agarwal, M.S. Hrishikesh, S.W. Keckler, D. Burger UT-Austin.
CS Lecture 24 Exceeding the Dataflow Limit via Value Prediction M.H. Lipasti, J.P. Shen Proceedings of MICRO-29 December 1996.
Exploiting Load Latency Tolerance for Relaxing Cache Design Constraints Ramu Pyreddy, Gary Tyson Advanced Computer Architecture Laboratory University of.
Dynamic Management of Microarchitecture Resources in Future Processors Rajeev Balasubramonian Dept. of Computer Science, University of Rochester.
September 28 th 2004University of Utah1 A preliminary look Karthik Ramani Power and Temperature-Aware Microarchitecture.
CS 7810 Lecture 11 Delaying Physical Register Allocation Through Virtual-Physical Registers T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 8: Instruction Fetch, ILP Limits Today: advanced branch prediction, limits of ILP (Sections , )
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998.
Temperature-Aware Design Presented by Mehul Shah 4/29/04.
Ryota Shioya, Masahiro Goshimay and Hideki Ando Micro 47 Presented by Kihyuk Sung.
CS 7810 Lecture 21 Threaded Multiple Path Execution S. Wallace, B. Calder, D. Tullsen Proceedings of ISCA-25 June 1998.
1 Lecture 20: Core Design Today: Innovations for ILP, TLP, power ISCA workshops Sign up for class presentations.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
CS 7810 Lecture 15 A Case for Thermal-Aware Floorplanning at the Microarchitectural Level K. Sankaranarayanan, S. Velusamy, M. Stan, K. Skadron Journal.
Power-aware Computing n Dramatic increases in computer power consumption: » Some processors now draw more than 100 watts » Memory power consumption is.
CS 7810 Lecture 6 The Impact of Delay on the Design of Branch Predictors D.A. Jimenez, S.W. Keckler, C. Lin Proceedings of MICRO
Revisiting Load Value Speculation:
The Elusive Metric for Low-Power Architecture Research Center for Experimental Research in Computer Systems Georgia Institute of Technology Atlanta, GA.
Erkan Çetiner. Outline Introduction Related Works Modeling Methodology Baseline Results DTM Techniques Conclusions.
Low Power Techniques in Processor Design
1 Lecture 21: Core Design, Parallel Algorithms Today: ARM Cortex A-15, power, sort and matrix algorithms.
CS Lecture 4 Clock Rate vs. IPC: The End of the Road for Conventional Microarchitectures V. Agarwal, M.S. Hrishikesh, S.W. Keckler, D. Burger UT-Austin.
1 CS/EE 6810: Computer Architecture Class format:  Most lectures on YouTube *BEFORE* class  Use class time for discussions, clarifications, problem-solving,
Complexity-Effective Superscalar Processors S. Palacharla, N. P. Jouppi, and J. E. Smith Presented by: Jason Zebchuk.
CS Lecture 14 Delaying Physical Register Allocation Through Virtual-Physical Registers T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals.
Hardware Architectures for Power and Energy Adaptation Phillip Stanley-Marbell.
1 Lecture: SMT, Cache Hierarchies Topics: SMT processors, cache access basics and innovations (Sections B.1-B.3, 2.1)
OOO Pipelines - III Smruti R. Sarangi Computer Science and Engineering, IIT Delhi.
1 Lecture 20: Core Design Today: Innovations for ILP, TLP, power Sign up for class presentations.
Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001 ACSAC-2001 Gold Coast, January 30, 2001.
1 Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2)
CS203 – Advanced Computer Architecture
15-740/ Computer Architecture Lecture 12: Issues in OoO Execution Prof. Onur Mutlu Carnegie Mellon University Fall 2011, 10/7/2011.
Temperature and Power Management
Lecture 18: Core Design, Parallel Algos
Lecture: Out-of-order Processors
Lecture: SMT, Cache Hierarchies
On-demand solution to minimize I-cache leakage energy
Computer Architecture Lecture 4 17th May, 2006
Lecture 17: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
Lecture 19: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.
Lecture 22: Multithreading
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
Presentation transcript:

CS 7810 Lecture 14 Reducing Power with Dynamic Critical Path Information J.S. Seng, E.S. Tune, D.M. Tullsen Proceedings of MICRO-34 December 2001

Instruction Criticality

Blue instructions are critical Yellow instructions are non-critical -- they can be slowed without increasing execution time The critical path can change Critical instructions are usually executed in order

Criticality Metrics QOLD – instructions that are the oldest in the issueq are considered critical  can be extended to oldest-N  does not need a predictor

Criticality Metrics QOLD – instructions that are the oldest in the issueq are considered critical  can be extended to oldest-N  does not need a predictor  young instrs are possibly on mispredicted paths  young instruction latencies can be tolerated  older instrs are possibly holding up the window  older instructions have more dependents in the pipeline than younger instrs This paper uses a predictor because early criticality info is required

Other Metrics QOLDDEP: Producing instructions for oldest in q ALOLD: Oldest instr in ROB FREED-N: Instr completion frees up at least N dependent instrs Wake-Up: Instr completion triggers a chain of wake-up operations Instruction types: cache misses, branch mpreds, and instructions that feed them

Slow FUs Slow units have twice the latency of fast units

Low Power Techniques Slower circuit styles that consume less power Smaller transistors consume less power, but take longer to charge their load Higher threshold voltage reduces leakage and increases delay Assumption: Leakage accounts for 10% of the base; low power FUs consume 20% less dynamic power and 50% less leakage

IPC to FU-Power Ratio Approximates hot-spot evaluation Number of thermal emergencies might be a better metric Total-IPC/Total-Power is more suitable for ___ ?

Serial Nature of Critical Instrs

Issue Queues and Criticality

Proposed Microarchitecture Criticality Predictor Dispatch In-order issueq o-o-o issueq FUs Imbalance between the two queues can worsen performance Favorable implications for temperature o-o-o issueq can have long latency Criticality predictor is likely to not be a hot spot

Overall Results

Power-Aware Architectures Speculation control Criticality Leakage control by de-activating structures Dynamic voltage and frequency scaling Metrics: temperature, peak power, battery life, packaging costs, power delivery, etc.

Title Bullet