CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec.


CS 7810 Lecture 12 Power-Aware Microarchitecture: Design and Modeling Challenges for Next-Generation Microprocessors D. Brooks et al. IEEE Micro, Nov/Dec 2000

Power/Energy Basics
Energy = Power × time
Dynamic power: $P_{dyn} = \alpha C V^2 f$
  $\alpha$: switching activity factor
  C: capacitance being charged
  V: voltage swing
  f: processor frequency
Current trends: f and C are rising and V is dropping; overall dynamic power is increasing
Leakage energy is also increasing

Processor Breakdowns
Alpha:
  Caches                 16%
  O-o-o issue logic      19%
  Mem management unit     9%
  FP unit                11%
  Integer unit           11%
  Clock power            34%
Pentium Pro: (breakdown chart not preserved in the transcript)

Metrics
Performance $\propto f \propto 1/D$ (D is delay or execution time)
Delay of a circuit $\propto 1/(V - V_t)$; a lower frequency tolerates longer delays and hence allows a lower voltage
Power $= \alpha C V^2 f$; since f is roughly proportional to voltage, $P \propto V^3 \propto f^3$
Since V and f are variable, remove them from the expression: $P D^3$ = constant (regardless of V and f)
This is the best metric to compare processors; any other metric (say, perf/watt) can be "fudged" by changing voltage or frequency

Metric Example
Operating point          Metric   Proc-A      Proc-B      MIPS/W
V = 1.25; f = 1 GHz      Perf     1000 MIPS   800 MIPS    10
                         Power    100 W       80 W
V = 1.0; f = 0.8 GHz     Perf     800 MIPS    640 MIPS    15.6
                         Power    51.2 W      41 W
V = 1.5; f = 1.2 GHz     Perf     1200 MIPS   960 MIPS    6.9
                         Power    172.8 W     138.2 W
Power/$f^3$ =
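The operating points in this example can be rechecked with a short Python sketch. It is an illustration only and not part of the original slides: the names `PROC_A`, `PROC_B`, `mips_per_watt`, and `power_over_f_cubed` are hypothetical, and the code simply recomputes both metrics from the numbers on the slide, with f expressed in GHz.

```python
# Operating points copied from the slide: (voltage, frequency in GHz, MIPS, watts).
PROC_A = [(1.25, 1.0, 1000, 100.0), (1.0, 0.8, 800, 51.2), (1.5, 1.2, 1200, 172.8)]
PROC_B = [(1.25, 1.0, 800, 80.0), (1.0, 0.8, 640, 41.0), (1.5, 1.2, 960, 138.2)]


def mips_per_watt(mips, watts):
    """Performance per watt at a single operating point."""
    return mips / watts


def power_over_f_cubed(watts, f_ghz):
    """Power divided by frequency cubed (units: W / GHz^3)."""
    return watts / f_ghz ** 3


if __name__ == "__main__":
    for name, points in (("Proc-A", PROC_A), ("Proc-B", PROC_B)):
        for volts, f_ghz, mips, watts in points:
            print(
                f"{name} V={volts} f={f_ghz}GHz "
                f"MIPS/W={mips_per_watt(mips, watts):.1f} "
                f"P/f^3={power_over_f_cubed(watts, f_ghz):.1f}"
            )
```

For each processor, MIPS/W moves with the operating point while Power/$f^3$ stays essentially fixed (up to the rounding of the slide's power figures), which is the argument for preferring a metric that is insensitive to voltage and frequency tuning.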

Metrics
$P D^3$ gives the ratio of power if two processors were tuned* to yield the same performance
$(P D^3)^{1/3}$ gives the ratio of performance if two processors were tuned* to yield the same power
*Tuning is done through voltage and frequency scaling, assuming a linear relationship between V and f. In modern processors this does not hold, and $P D^x$ is the right metric, where x > 3 (x can be 1 or 2 in markets where performance is not very critical)
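A minimal sketch of how these two ratios could be computed, assuming the cubic model ($P \propto f^3$, performance $\propto f$) and taking delay as the reciprocal of MIPS. The function names and the sample inputs (100 W at 1000 MIPS versus 80 W at 800 MIPS) are illustrative assumptions, not values prescribed by the slide.

```python
def pd3(power_w, perf_mips):
    """P * D^3, with delay D taken as 1 / performance."""
    delay = 1.0 / perf_mips
    return power_w * delay ** 3


def power_ratio_at_equal_perf(proc_a, proc_b):
    """Power of B relative to A if both were tuned to the same performance."""
    return pd3(*proc_b) / pd3(*proc_a)


def perf_ratio_at_equal_power(proc_a, proc_b):
    """Performance of A relative to B if both were tuned to the same power."""
    return power_ratio_at_equal_perf(proc_a, proc_b) ** (1.0 / 3.0)


def scaled_power(power_w, perf_mips, target_mips):
    """Cross-check: scale f (and V with it) until performance hits target_mips."""
    scale = target_mips / perf_mips
    return power_w * scale ** 3


if __name__ == "__main__":
    a = (100.0, 1000.0)  # (watts, MIPS), hypothetical processor A
    b = (80.0, 800.0)    # (watts, MIPS), hypothetical processor B
    print("power ratio at equal performance:", power_ratio_at_equal_perf(a, b))
    print("performance ratio at equal power:", perf_ratio_at_equal_power(a, b))
    print("B scaled up to A's performance (W):", scaled_power(*b, target_mips=a[1]))
```

The explicit scaling in `scaled_power` is there as a sanity check: under the same assumptions it should agree with the ratio obtained directly from $P D^3$.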

Commercial Examples

Global Power Saving Strategies
Dynamic frequency scaling (DFS): trivially reduces power and worsens performance, with no effect on (dynamic) energy
If off-chip components (memory) dominate, there will be an energy reduction with DFS
Leakage power is unaffected by DFS, so if leakage dominates, overall energy increases (the program runs longer at the same leakage power)
Montecito: 20 MHz changes in frequency can happen in a single cycle
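The energy argument for frequency scaling can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not something taken from the slides: the workload is compute-bound (execution time scales as 1/f), dynamic power scales linearly with f at fixed voltage, leakage power is constant, and the function name `energy_under_dfs` is hypothetical.

```python
def energy_under_dfs(p_dyn_w, p_leak_w, time_s, freq_scale):
    """Return (dynamic_energy_j, leakage_energy_j) after scaling frequency only.

    Assumes a compute-bound workload: time grows as 1 / freq_scale.
    Dynamic power is taken as proportional to f; leakage power is unchanged.
    """
    if freq_scale <= 0:
        raise ValueError("freq_scale must be positive")
    new_time = time_s / freq_scale
    new_dyn_power = p_dyn_w * freq_scale
    dynamic_energy = new_dyn_power * new_time  # unchanged by freq_scale
    leakage_energy = p_leak_w * new_time       # grows as frequency drops
    return dynamic_energy, leakage_energy


if __name__ == "__main__":
    for scale in (1.0, 0.75, 0.5):
        dyn, leak = energy_under_dfs(80.0, 20.0, 1.0, scale)
        print(f"f x{scale}: dynamic {dyn:.1f} J, leakage {leak:.1f} J, total {dyn + leak:.1f} J")
```

In this model the dynamic energy is independent of the frequency scale, while the leakage term grows as the run gets longer, so total energy rises whenever leakage is non-zero.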

Global Power Saving Strategies
Dynamic voltage scaling (DVS): since frequency is being lowered, each circuit has more slack, so voltage can be scaled down along with it
Effect: more than quadratic on dynamic power, linear on leakage power, and more than linear on energy
Intel XScale: roughly 50 µs to scale from V
DVS opportunities are shrinking: voltage margins are lower, and error rates may increase
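A rough sketch of these scaling effects, assuming voltage is scaled by the same factor as frequency (the linear V-f relationship used elsewhere in the lecture), a compute-bound workload, and leakage power linear in V. The function name `scale_with_dvs` and the sample numbers are hypothetical.

```python
def scale_with_dvs(p_dyn_w, p_leak_w, time_s, scale):
    """Scale V and f together by `scale` and return the resulting figures.

    Model assumptions: dynamic power ~ V^2 * f, leakage power ~ V,
    execution time ~ 1 / f.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    new_time = time_s / scale
    new_dyn_power = p_dyn_w * scale ** 3   # V^2 * f, all scaled together
    new_leak_power = p_leak_w * scale      # linear in V
    return {
        "time_s": new_time,
        "dynamic_power_w": new_dyn_power,
        "leakage_power_w": new_leak_power,
        "dynamic_energy_j": new_dyn_power * new_time,
        "leakage_energy_j": new_leak_power * new_time,
    }


if __name__ == "__main__":
    for scale in (1.0, 0.8, 0.5):
        print(scale, scale_with_dvs(80.0, 20.0, 1.0, scale))
```

Under these assumptions dynamic power falls with the cube of the scale factor and dynamic energy with its square, which is why combining voltage scaling with frequency scaling saves energy where frequency scaling alone does not.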

Localized Power Saving Strategies
Clock gating: when a processor structure is not used in a cycle, gate off its clock for that cycle; gating can take effect in a single cycle, at the cost of increased complexity
Power gating: leakage energy can be reduced by gating off the supply voltage V during periods of inactivity; this takes more time to take effect
Body biasing can also reduce leakage power

Localized Power Saving Strategies
Dynamically adjust frequency/voltage and size for each domain, based on throughput rates

Leakage Power
Leakage is a linear function of supply voltage, a linear function of the number of transistors, and an exponential function of threshold voltage
From Butts and Sohi, MICRO'00
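The three dependencies can be illustrated with a small Python sketch. This is not the Butts and Sohi model itself: the functional form is a generic subthreshold approximation, and the constants `k_design`, `n`, and `thermal_voltage` are placeholder assumptions chosen only to show the shape of each dependence.

```python
import math


def leakage_power(vdd, num_transistors, v_threshold,
                  k_design=1e-9, n=1.5, thermal_voltage=0.026):
    """Illustrative leakage model (all constants are placeholders).

    Linear in supply voltage and transistor count, exponential in the
    threshold voltage through exp(-Vt / (n * thermal_voltage)).
    """
    per_transistor_current = k_design * math.exp(-v_threshold / (n * thermal_voltage))
    return vdd * num_transistors * per_transistor_current


if __name__ == "__main__":
    base = leakage_power(1.0, 1_000_000, 0.4)
    print("2x supply voltage :", leakage_power(2.0, 1_000_000, 0.4) / base)
    print("2x transistors    :", leakage_power(1.0, 2_000_000, 0.4) / base)
    print("Vt lowered 100 mV :", leakage_power(1.0, 1_000_000, 0.3) / base)
```

Printing ratios against a baseline keeps the placeholder constants from mattering: doubling the supply voltage or the transistor count doubles leakage, while a modest drop in threshold voltage multiplies it by a much larger factor.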

Power-Performance Trade-Offs

Caches and branch predictors are doubled at each point below, while the x-axis represents the sizes of issue queues, registers, ROB, etc.
Argues against going to wider/larger superscalars

Other Observations
Clustered architectures have better power scalability (since the complexity of each cluster remains unchanged)
CMP and SMT can employ complexity-effective designs: power consumption is low (little wasted work) and multi-threaded performance continues to be high
From ISPASS'06
