Low-power computer architecture

Presentation transcript:

Low-power computer architecture Dr. Avi Mendelson

Disclaimer: No Intel proprietary information is disclosed. Every future estimate or projection is only speculation. Responsibility for all opinions and conclusions falls on the author only. That does not mean you cannot trust them… © Dr. Avi Mendelson

Before we start: The focus of my classes during the summer school is on understanding the power problem, current solutions, and research directions. Personal observation: focusing on low power resembles Alice through the looking glass. We are looking at the same old problems, but from the other side of the looking glass, and the landscape appears much different... Out-of-the-box thinking is needed.

Schedule of the course: First day: Introduction to low power. Second day: Circuit modeling and simulation. Third day: Circuit-level solutions. Fourth day: Architectures for low power. Fifth day: Thermal and system issues.

Agenda: The power crisis (power consumption; power density and thermal limitations). General solutions and directions.

Moore’s law: “Doubling the number of transistors on a manufactured die every year” - Gordon Moore, Intel Corporation. [Figure: transistors per die (10^2 to 10^9) versus year (’70 to 2000). Memory: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Microprocessor: 4004, 8080, 8086, 80286, i386™, i486™, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4. Source: Intel.]

In the Last 25 Years Life was Easy(*): Doubling of transistor density every 30 months, plus increasing die sizes (allowed by increasing wafer size and by process technology moving from “black art” to “manufacturing science”), gave a doubling of transistors every 18 months. The 486 shrink was 0.7 (area); the Pentium shrink was 0.5 (area). Implications (in the same technology): 1. A new µArch takes ~2-3X the die area of the last µArch. 2. It provides 1.5-1.7X the integer performance of the last µArch. (*) Source: Fred Pollack, Micro-32.

Suddenly, the power monster appears in all different market segments

Processor Power Evolution. [Figure: max power (Watts, log scale from 1 to 100) versus process generation (1.5µ, 1µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ) for i386, i486, Pentium® w/MMX tech., Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, and an open question mark for the next generation.] Traditionally, each new generation increases power. Compactions: higher performance at lower power. It used to be “one size fits all”: start with high power and shrink to mobile.

The power crisis – power consumption. Source: cool-chips, Micro 32

Power challenges per segment. Segments: handhelds, mobile, desktops, servers. Power-related system cost drivers: form factor, battery size, battery cost, thermal cost, delivery cost. Price drivers: performance, battery life, noise, perf/kg, perf/$$, perf/inch^3. Optimization point: max battery life; max perf/power to meet the application’s need; max performance @ thermal constraint.

Power & Energy. Power: work done per time unit (Watts). Dynamic power: consumed by transistors during switching, P = aCV²f (a: activity, C: capacitance, V: voltage, f: frequency). Static power (leakage): consumed by all “inactive” transistors; it depends on temperature and voltage. Power-aware architectures aim to reduce peak power. Energy: power consumed over some period of time. Energy-aware architectures aim to reduce average power consumption.
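
As a rough illustration of the dynamic power relation on this slide, the Python sketch below evaluates P = a*C*V^2*f and the energy consumed over an interval at constant power. The function names and every numeric value are hypothetical, chosen only to show how the terms interact; they do not describe any specific processor.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """Dynamic (switching) power in Watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def energy_joules(power_w, seconds):
    """Energy is power accumulated over time; at constant power, E = P * t."""
    return power_w * seconds


# Hypothetical values: activity 0.2, 50 nF switched capacitance, 1.5 V, 1 GHz.
p = dynamic_power(0.2, 50e-9, 1.5, 1e9)
e = energy_joules(p, 10.0)  # energy over ten seconds at constant power
print(f"dynamic power = {p:.1f} W, energy over 10 s = {e:.0f} J")

# Halving the voltage alone cuts dynamic power to one quarter.
p_half_v = dynamic_power(0.2, 50e-9, 0.75, 1e9)
print(f"at half the voltage: {p_half_v / p:.2f}X power")
```

The quadratic voltage term is why most of the techniques later in the deck target voltage first.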

Power Evolution (Theoretical). [Figure: leakage power and active power in Watts (0 to 250) for the 0.25µ, 0.18µ, 0.13µ, and 0.1µ process generations.] For a 15 mm/side die (225 mm²). Assumes a 2X frequency increase each generation. Future process numbers are estimated.

Why high power matters. Power limitations: Higher power means higher current, which cannot exceed the platform power delivery constraints. Higher power means higher temperature, which cannot exceed the thermal constraints (e.g., Tj < 100°C) and which increases leakage. The heat must be controlled in order to avoid electromigration and other “chemical” reactions of the silicon. Energy: Affects battery life. In consumer devices, the processor may consume most of the energy. In mobile computers (laptops), the system (display, disk, cooling, power supply, etc.) consumes most of the energy. Affects the cost of electricity.

Power Density. [Figure: power density (Watts/cm², log scale from 1 to 1000) versus process generation (1.5µ, 1µ, 0.7µ, 0.5µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ, 0.1µ, 0.07µ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with hot plate, nuclear reactor, rocket nozzle, and Sun's surface marked as reference levels.] * “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies” – Fred Pollack, Intel Corp., Micro32 conference keynote, 1999.


Why do power and power density increase over time?

How do we keep up with Moore’s Law? Every 18 months on average we introduce a new process. The new process shrinks the dimensions of the transistors by 0.7 (ideal shrink). As a result, on the same die area, we can have more transistors, each of them running at a higher frequency. One may mistakenly think that this is the reason for the increase in power and power density.

Scaling theory (1 of 2): Lateral and vertical dimensions reduce by 30%. Capacitance (area and fringing) reduces by 30%. Die area reduces by 50%.

Scaling theory (2 of 2): Capacitance per transistor reduces by 30%. Capacitance per unit area increases by 43%. Delay reduces by 30%; power reduces by 50%.
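
The percentages on the two scaling-theory slides all follow from a single linear shrink factor of 0.7. The Python sketch below reproduces them. The assumption that supply voltage also scales by 0.7 is the classical constant-field scaling assumption; it is not stated on the slides and is marked as such in the code.

```python
s = 0.7  # linear shrink factor per process generation (a 30% reduction)

die_area = s ** 2                              # 0.49: area reduces by about 50%
cap_per_transistor = s                         # 0.70: capacitance reduces by 30%
cap_per_area = cap_per_transistor / die_area   # 1 / 0.7: increases by about 43%
delay = s                                      # 0.70: delay reduces by 30%
frequency = 1 / delay                          # about 1.43X

voltage = s  # ASSUMPTION (not on the slides): voltage scales with dimensions

# Power per transistor follows P = a * C * V^2 * f with unchanged activity.
power_per_transistor = cap_per_transistor * voltage ** 2 * frequency

print(f"die area             {die_area:.2f}X")
print(f"capacitance per area {cap_per_area:.2f}X")
print(f"power per transistor {power_per_transistor:.2f}X")
```

Under that voltage assumption, power per transistor and die area shrink by the same factor, which is why the ideal case keeps power density constant.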

Ideal Scenarios... Ideal “shrink” (same µarch): 1X #Xistors, 0.5X size, 1.5X frequency, 0.5X power, 1X IPC (instr./cycle), 1.5X performance, 1X power density. Ideal new µarch (same die size): 2X #Xistors, 1X size, 1.5X frequency, 1X power, 2X IPC, 3X performance, 1X power density.
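
The two ideal scenarios can be checked with two relations: performance is IPC times frequency, and power density is power divided by die size. The Python sketch below applies them to the relative numbers on this slide; the dictionary layout and function names are illustrative choices, not part of the original material.

```python
def performance(ipc, frequency):
    """Relative performance = instructions per cycle * clock frequency."""
    return ipc * frequency


def power_density(power, size):
    """Relative power density = power / die size."""
    return power / size


# All values are relative to the previous generation (1.0 = unchanged).
scenarios = {
    "ideal shrink":    {"size": 0.5, "frequency": 1.5, "power": 0.5, "ipc": 1.0},
    "ideal new uarch": {"size": 1.0, "frequency": 1.5, "power": 1.0, "ipc": 2.0},
}

results = {}
for name, s in scenarios.items():
    perf = performance(s["ipc"], s["frequency"])
    density = power_density(s["power"], s["size"])
    results[name] = (perf, density)
    print(f"{name}: {perf:.1f}X performance, {density:.1f}X power density")
```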

Process Technologies – Reality. But in reality: The new process is not ideal anymore. New designs squeeze frequency to 2X per process. New designs use more transistors (2X-3X to get 1.5X-1.7X perf). So, with every new process and architecture generation: Power goes up about 2X. Power density goes up 30%~80%. This is bad, and it will get worse in future process generations: Voltage (Vdd) will scale down less. Leakage is going through the roof.

Die increases in order to maintain performance boost. [Figure: processors by silicon process technology (1.5µ, 1.0µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ): Intel386™ DX Processor, Intel486™ DX Processor, Pentium® Processor, Pentium® Pro Processor, Pentium® II Processor, Pentium® III Processor, Pentium® 4 Processor.] P-III sizes: 0.25µ, 140 mm^2, 9.5M transistors, 0KB L2; 0.18µ, 106 mm^2, 28M transistors, 128KB L2; 0.13µ, 79 mm^2, 40M transistors, 512KB L2. P4P sizes: W, 1.75V, 2GHz, 75 Watts, 217 mm^2, 42M transistors, 256KB L2; N, 1.5V, 2GHz at 52W and 2.2GHz at 55W, 146 mm^2, 55M transistors, 512KB L2.

Put it all together: Power and power density are a real threat to Moore’s law. Complex algorithms lead to denser power: dense random logic; timing pressure leads to faster, bigger, power-hungrier gates. Designers place units that communicate with each other close together, which creates “regions” with high activity factors -> hot spots. Power is not distributed evenly over the chip. A failure can happen if a single point reaches the max power point. Many modern processors are power limited.

Some implications: We can’t build microprocessors with ever-increasing power density and die sizes. The constraint is power, not manufacturability. The design of any future microprocessor should take power into consideration. We need to distinguish between different aspects of power: power delivery; max power (Tj); power density (hot spots); energy (static + dynamic). Power- and energy-aware design should take care of each of these aspects. One size does not fit all anymore.

General solutions and directions: Assume that one size does not fit all. For different segments there may be different solutions (although many of them share the same principle of operation).

Embedded systems vs. Laptops. Embedded systems: Most of the power is consumed by the CPU. Usually not thermally limited. What we really care about is battery life and meeting the timing limitations. In real-time systems we can take advantage of known “deadlines”. Laptops (mobile systems): We are thermally limited. We cannot use deadlines (most of the time). We need to optimize for max battery life and max performance in a given power envelope.

How to extend battery life: Voltage Scaling. Within a given voltage range, higher voltage allows higher frequency. This is used for trading power against frequency, either statically, at manufacturing time, or dynamically, at run time (e.g., Intel’s SpeedStep® Technology). The actual range depends on the specific design and process technology. Examples*: Intel® XScale™ processors run from 0.75V (150MHz/50mW) to 1.65V (800MHz/900mW). The Intel mobile Pentium® III processor sells from 1.1V (600MHz) to 1.7V (1GHz). [Figure: XScale processor frequency and power vs. voltage.] * Source: Intel Corp. (http://developer.intel.com)
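
Dividing power by frequency gives the energy spent per clock cycle, which makes the benefit of the low-voltage operating point concrete. The Python sketch below uses only the two XScale operating points quoted on this slide; the helper name is hypothetical.

```python
def energy_per_cycle_nj(power_mw, freq_mhz):
    """Energy per clock cycle in nanojoules (1 mW / 1 MHz = 1 nJ per cycle)."""
    return power_mw / freq_mhz


low = energy_per_cycle_nj(50, 150)    # 0.75 V operating point
high = energy_per_cycle_nj(900, 800)  # 1.65 V operating point

print(f"0.75 V: {low:.3f} nJ/cycle")
print(f"1.65 V: {high:.3f} nJ/cycle")
print(f"frequency ratio {800 / 150:.2f}X, power ratio {900 / 50:.0f}X, "
      f"energy-per-cycle ratio {high / low:.2f}X")
```

The high-voltage point runs a little over five times faster but spends more than three times the energy on every cycle.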

Voltage Scaling (cont.): Huge effect on dynamic power: a 20% frequency reduction allows a 20% voltage reduction, which gives about a 35% energy reduction (aCV² scales by 0.8² = 0.64) and about a 50% power reduction (aCV²f scales by 0.8³ ≈ 0.51). Even more impressive if we recall that a 20% frequency hit costs only a 10%-15% performance hit*. Voltage scaling can be used to trade performance for power: reduce the power consumption when performance needs can be relaxed. E.g., if deadlines are known and we have enough “dead time”, we can lower the voltage at the expense of a longer execution time. BUT it has technology limitations. * Depends mainly on the core-to-bus frequency ratio and cache sizes.
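
The arithmetic on this slide can be reproduced in a few lines. The Python sketch below computes the relative dynamic energy and power when frequency and voltage are scaled together, with activity and capacitance held constant; the function name is an illustrative choice.

```python
def scaled_energy_and_power(freq_scale, voltage_scale):
    """Relative dynamic energy per operation (a*C*V^2) and power (a*C*V^2*f)."""
    energy = voltage_scale ** 2
    power = voltage_scale ** 2 * freq_scale
    return energy, power


# 20% lower frequency together with 20% lower voltage.
energy, power = scaled_energy_and_power(0.8, 0.8)
print(f"energy: {energy:.2f}X, power: {power:.2f}X")
```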

How to extend battery life: Energy Efficiency. Energy per task is proportional to the number of processed instructions per task and to the average energy consumed per instruction. “Energy per (retired) instruction” = b*W, where b is the ratio of total to retired number of processed instructions and W is the average energy spent in processing an instruction. Both figures deteriorate with every new microarchitecture, since speculation increases and complexity grows. In that respect, high-performance modern microarchitectures are less energy efficient.
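
A small numeric sketch may help show how b and W compound. The Python code below multiplies the retired instruction count by b*W to get energy per task. The two design points and all their numbers are hypothetical, intended only to contrast a simple core with a more speculative one.

```python
def energy_per_task(retired_instructions, b, w_joules):
    """Energy per task = retired instructions * b * W.

    b: ratio of total processed to retired instructions (1.0 = no wasted work)
    W: average energy spent processing one instruction, in joules
    """
    return retired_instructions * b * w_joules


# Hypothetical task of one billion retired instructions on two designs.
simple = energy_per_task(1e9, b=1.1, w_joules=1.0e-9)
aggressive = energy_per_task(1e9, b=1.5, w_joules=2.0e-9)

print(f"simple core:     {simple:.1f} J")
print(f"aggressive core: {aggressive:.1f} J")
```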

Improving Hot Spots: Clustering. Build your system as a clustered architecture (e.g., Alpha). Design your system so that the max power allowed is exceeded only when all clusters are active. Most of the time, not all the clusters are active. “Smart scheduling” will spread the thermal hot spots among different clusters. In VLIW-based architectures, compilers can help.

Alpha hot spots (source: CoolChips-99): Area 30%, Freq. 50%, Power 67%.

Power Complexity Metrics. Power ∝ C V² f. Metrics: suppose we introduce a new feature that consumes extra power x and gains performance y. Power/Perf (∝ Energy), assuming the same technology (same C) and the same voltage: for battery life and energy bills; for a given power envelope without voltage scaling. Power/Perf² (∝ Energy*Delay): balances performance and power needs. Power/Perf³ (∝ Energy*Delay²): for a given power envelope with voltage scaling, assuming that (1) we can trade frequency against voltage, and (2) we can lower the voltage as much as we wish.
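
To see how the three metrics can disagree about the same feature, the Python sketch below evaluates Power/Perf, Power/Perf² and Power/Perf³ relative to a baseline of 1.0. The feature (10% more power for 5% more performance) is a hypothetical example, and the function name is an illustrative choice.

```python
def relative_metrics(power_gain, perf_gain):
    """Relative Power/Perf^n for n = 1, 2, 3; below 1.0 means the feature helps."""
    return {n: power_gain / perf_gain ** n for n in (1, 2, 3)}


# Hypothetical feature: 10% more power (x) for 5% more performance (y).
m = relative_metrics(1.10, 1.05)

for n, label in ((1, "Power/Perf   (energy)"),
                 (2, "Power/Perf^2 (energy*delay)"),
                 (3, "Power/Perf^3 (energy*delay^2)")):
    verdict = "helps" if m[n] < 1.0 else "hurts"
    print(f"{label}: {m[n]:.3f} -> {verdict}")
```

With these numbers the feature loses on the energy metric, is roughly neutral on energy*delay, and wins on energy*delay², so whether it is worth adding depends on which constraint the design targets.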

E*D product (lower is better): E = energy / instruction = Power * sec / instruction = Watt / MIPS. D = sec / instruction = 1 / MIPS. E*D ~ Watt / MIPS².
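
As a sketch of how the E*D product is used to compare designs, the Python code below computes E, D and E*D from a power figure and a MIPS rating. The two processors, A and B, are hypothetical and were chosen so that they tie on E*D while differing in energy per instruction.

```python
def energy_delay(watts, mips):
    """Return (E, D, E*D) with E = Watt / MIPS and D = 1 / MIPS.

    Because MIPS counts millions of instructions per second, E is in
    microjoules per instruction and D in microseconds per instruction.
    """
    e = watts / mips
    d = 1.0 / mips
    return e, d, e * d


# Hypothetical processors: A is slower and frugal, B is faster and hungrier.
e_a, d_a, ed_a = energy_delay(watts=20.0, mips=1000.0)
e_b, d_b, ed_b = energy_delay(watts=45.0, mips=1500.0)

print(f"A: E = {e_a:.3f} uJ/instr, E*D = {ed_a:.2e}")
print(f"B: E = {e_b:.3f} uJ/instr, E*D = {ed_b:.2e}")
```

B spends 50% more energy per instruction than A, yet the two are equivalent under E*D because B finishes each instruction sooner.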

Leakage control. Leakage depends on technology, area, voltage, and temperature. High temperature → high leakage → high power → higher temperature. Leakage will be very significant in future microarchitectures. Large caches contribute to performance but may increase power due to leakage. Larger caches: better performance, but higher leakage -> slower clock -> lower performance. Leakage makes the major difference between clock gating and deep sleep modes (where power is disconnected).
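
The temperature and leakage feedback loop on this slide can be illustrated with a toy fixed-point iteration. Everything in the Python sketch below is an assumption made for illustration: leakage is modeled as doubling every 30°C, die temperature as ambient plus thermal resistance times total power, and all parameter values are invented. It is not a model of any real process or package.

```python
def settle_temperature(p_dynamic_w, r_th_c_per_w, t_ambient_c=45.0,
                       leak_ref_w=10.0, t_ref_c=45.0, doubling_c=30.0,
                       t_limit_c=150.0, max_iter=200, tol=1e-6):
    """Iterate temperature -> leakage -> power -> temperature until it settles.

    ASSUMED toy model: leakage doubles every `doubling_c` degrees, and
    temperature = ambient + thermal resistance * (dynamic + leakage power).
    Returns (temperature_c, leakage_w), or None on thermal runaway.
    """
    t = t_ambient_c
    for _ in range(max_iter):
        leakage = leak_ref_w * 2 ** ((t - t_ref_c) / doubling_c)
        t_new = t_ambient_c + r_th_c_per_w * (p_dynamic_w + leakage)
        if t_new > t_limit_c:
            return None  # the loop feeds itself past the limit
        if abs(t_new - t) < tol:
            return t_new, leakage
        t = t_new
    return None


good_cooling = settle_temperature(p_dynamic_w=40.0, r_th_c_per_w=0.5)
poor_cooling = settle_temperature(p_dynamic_w=40.0, r_th_c_per_w=1.5)
print("good cooling:", good_cooling)
print("poor cooling:", poor_cooling)
```

With the lower thermal resistance the loop settles at a stable operating point; with the higher one each pass raises leakage enough to push the temperature past the limit, which is the runaway the slide warns about.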

Design for power: Out-of-Order Execution. OOO architecture was found to be very efficient in masking the effect of L1 cache misses. Aggressive OOO and wider machines require more registers and memory ports, which consume a lot of power. Can we slow down the access to the cache and let the OOO solve the performance problem? Can we simplify the OOO mechanisms, assuming that the memory subsystem limits the performance? How aggressive should we be with speculation (branch prediction, value prediction, etc.)?

Pentium Pro Power Breakdown: Actual computation accounts for less than 25%! What can be done: trace cache; many low-level improvements.

SMT: A single-CPU µArch augmented to look like 2 or more CPUs to the software. Adds ~10% logic to the CPU (Alpha experience). Average power increases by <10%. Can increase the performance of two threads by 20-50% relative to running the same applications sequentially. Looks like a good tradeoff between power and performance.
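
The tradeoff claimed for SMT can be made concrete by dividing the performance gain by the power gain. The Python sketch below uses the slide's 20-50% performance range and treats the "<10%" power increase as exactly 10%, which is a conservative assumption; the function name is hypothetical.

```python
def perf_per_watt_gain(perf_gain, power_gain):
    """Relative performance per Watt; above 1.0 means efficiency improved."""
    return perf_gain / power_gain


# Power increase taken at its stated upper bound of 10% (conservative).
low = perf_per_watt_gain(1.20, 1.10)
high = perf_per_watt_gain(1.50, 1.10)

print(f"perf/Watt improves by {100 * (low - 1):.0f}% to {100 * (high - 1):.0f}%")
```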

MT - Implications on power. The area and power consumption of register files and memory elements within the processor increase significantly due to aggressive out-of-order execution and aggressive SMT (Alpha, CoolChips ’99). It increases the power at the hot spot, so it is not suited to thermally limited segments (where performance is needed). It may better tolerate cache misses, so power-aware caches can be used. Hot spots may force us to use more aggressive clustering.

Questions?