® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004.

Slides:



Advertisements
Similar presentations
Exponential Challenges, Exponential Rewards— The Future of Moore’s Law
Advertisements

Work in Progress --- Not for Publication p. 1--PIDS Summary, Dec.04 PIDS Summary Peter M. Zeitzoff US Chair ITWG Meeting Tokyo, Japan November 30 - December.
PIDS: Poster Session 2002 ITRS Changes and 2003 ITRS Key Issues ITRS Open Meeting Dec. 5, 2002 Tokyo.
WTO Information technology symposium. "The Challenges ahead for the ITA with Converging Technologies and IC Manufacturing Techniques" Ray Foy – Intel Corp.
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 10: IC Technology.
Transforming Relationships or How to Flatten a Function! …subtle twists on regression.
® 1 Exponential Challenges, Exponential Rewards— The Future of Moore’s Law Based on lecture of Shekhar Borkar Intel Fellow Circuit Research, Intel Labs.
Introduction to CMOS VLSI Design Lecture 21: Scaling and Economics
41 st DAC Tuesday Keynote. Giga-scale Integration for Tera-Ops Performance Opportunities and New Frontiers Pat Gelsinger Senior Vice President & CTO Intel.
VLSI Trends. A Brief History  1958: First integrated circuit  Flip-flop using two transistors  From Texas Instruments  2011  Intel 10 Core Xeon Westmere-EX.
Design and Implementation of VLSI Systems (EN0160)
EE 466: VLSI Design Instructor: Amlan Ganguly TA: Souradip Sarkar Meeting: MWF, 12.10pm, Sloan-38.
S. Reda EN160 SP’08 Design and Implementation of VLSI Systems (EN1600) Lecture 18: Scaling Theory Prof. Sherief Reda Division of Engineering, Brown University.
Lesson 1 Introduction – “Is there a limit ?” Transistors – “CMOS building blocks” Physics, Cross section and Layout Views.
Outline Introduction – “Is there a limit?”
EE314 Basic EE II Silicon Technology [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
For the exclusive use of adopters of the book Introduction to Microelectronic Fabrication, Second Edition by Richard C. Jaeger. ISBN © 2002.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 13: Power Dissipation Prof. Sherief Reda Division of Engineering, Brown.
Introduction to CMOS VLSI Design Lecture 21: Scaling and Economics Credits: David Harris Harvey Mudd College (Material taken/adapted from Harris’ lecture.
EE141 © Digital Integrated Circuits 2nd Introduction 1 The First Computer.
VLSI Tarik Booker. VLSI? VLSI – Very Large Scale Integration Refers to the many fields of electrical and computer engineering that deal with the analysis.
Jevons’ paradox: “Resource use efficiency increases tend to increase the rate of consumption of that resource” Moore’s law:
Advanced Computing and Information Systems laboratory Device Variability Impact on Logic Gate Failure Rates Erin Taylor and José Fortes Department of Electrical.
EE466: VLSI Design Power Dissipation. Outline Motivation to estimate power dissipation Sources of power dissipation Dynamic power dissipation Static power.
1. 2 Electronics Beyond Nano-scale CMOS Shekhar Borkar Intel Corp. July 27, 2006.
MOS Capacitors MOS capacitors are the basic building blocks of CMOS transistors MOS capacitors distill the basic physics of MOS transistors MOS capacitors.
Advanced Process Integration
CMOS Scaling Two motivations to scale down
® 1 VLSI Design Challenges for Gigascale Integration Shekhar Borkar Intel Corp. October 25, 2005.
CMOS Technology Scaling. Moore’s law “Cramming more Components onto Integrated Circuits” Gordon E. Moore, Electronics 1965, p “VLSI: some fundamental.
1 BULK Si (100) VALENCE BAND STRUCTURE UNDER STRAIN Sagar Suthram Computational Nanoelectronics Class Project
Introduction to CMOS VLSI Design Lecture 1: Circuits & Layout.
Present – Past -- Future
© Digital Integrated Circuits 2nd Inverter EE5900 Advanced Algorithms for Robust VLSI CAD The Inverter Dr. Shiyan Hu Office: EERC 731 Adapted.
W E L C O M E. T R I G A T E T R A N S I S T O R.
Semiconductor Industry Milestones
EE5900 Advanced VLSI Computer-Aided Design
© Digital Integrated Circuits 2nd Inverter Digital Integrated Circuits A Design Perspective The Inverter Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Transistor Counts 1,000, ,000 10,000 1, i386 i486 Pentium ® Pentium ® Pro K 1 Billion Transistors.
EE141 © Digital Integrated Circuits 2nd Introduction 1 Principle of CMOS VLSI Design Introduction Adapted from Digital Integrated, Copyright 2003 Prentice.
0 1 Thousand Core Chips A Technology Perspective Shekhar Borkar Intel Corp. June 7, 2007.
The Fate of Silicon Technology: Silicon Transistors Maria Bucukovska Scott Crawford Everett Comfort.
A 280mV-to-1.1V 256b Reconfigurable SIMD Vector Permutation Engine with 2-D Shuffle in 22nm CMOS [ISSCC ’12] Literature Review Fang-Li Yuan Advisor: Prof.
CS203 – Advanced Computer Architecture
Intel’s 3D Transistor BENJAMIN BAKER. Where we are headed  What is a transistor?  What it is and what does it do?  Moore’s Law  Who is Moore and what.
Computer Stuff By Raj and Brad. Virtual Memory This takes place if your computer is running low on RAM. It stimulates more RAM when your computer reaches.
Integration Lower sums Upper sums
CS203 – Advanced Computer Architecture
VLSI Tarik Booker.
Introduction to VLSI ASIC Design and Technology
EE 4611 INTRODUCTION 21 January 2015 Semiconductor Industry Milestones
VLSI Design MOSFET Scaling and CMOS Latch Up
Downsizing Semiconductor Device (MOSFET)
IC TECHNOLOGY.
Challenges in Nanoelectronics: Process Variability
Chapter 10: IC Technology
Transistors on lead microprocessors double every 2 years Moore’s Law in Microprocessors Transistors on lead microprocessors double every 2 years.
Downsizing Semiconductor Device (MOSFET)
Lecture 19 OUTLINE The MOS Capacitor (cont’d) The MOSFET:
PRESENTATION ON TRI GATE TRANSISTOR PREPARED BY: SANDEEP ( )
ELE 523E COMPUTATIONAL NANOELECTRONICS
Chapter 10: IC Technology
Energy Efficient Power Distribution on Many-Core SoC
Reading (Rabaey et al.): Sections 3.5, 5.6
Welcome to Computer Architecture
Technology scaling Currently, technology scaling has a threefold objective: Reduce the gate delay by 30% (43% increase in frequency) Double the transistor.
Chapter 10: IC Technology
Beyond Si MOSFETs Part IV.
Intel CPU for Desktop PC: Past, Present, Future
Dr. Hari Kishore Kakarla ECE
Presentation transcript:

® 1 Exponential Challenges, Exponential Rewards The Future of Moores Law Shekhar Borkar Intel Fellow Circuit Research, Intel Labs Fall, 2004

2 ISSCC 2003 Gordon Moore said… No exponential is forever… But We can delay Forever

3 Outline The exponential challenges The exponential challenges Circuit and Arch solutions Circuit and Arch solutions Major paradigm shifts in design Major paradigm shifts in design Integration & SOC Integration & SOC The exponential reward The exponential reward Summary Summary

4 Goal: 1TIPS by 2010 Pentium® Pro Architecture Pentium® 4 Architecture Pentium® Architecture How do you get there?

5 Technology Scaling GATE SOURCE BODY DRAIN Xj Tox D GATE SOURCE DRAIN Leff BODY Dimensions scale down by 30% Doubles transistor density Oxide thickness scales down Faster transistor, higher performance Vdd & Vt scaling Lower active power Technology has scaled well, will it in the future?

6 Technology Outlook High Volume Manufacturing Technology Node (nm) Integration Capacity (BT) Delay = CV/I scaling 0.7~0.7>0.7 Delay scaling will slow down Energy/Logic Op scaling >0.35>0.5>0.5 Energy scaling will slow down Bulk Planar CMOS High Probability Low Probability Alternate, 3G etc Low Probability High Probability Variability Medium High Very High ILD (K) ~3<3 Reduce slowly towards Reduce slowly towards RC Delay Metal Layers to 1 layer per generation

7 Is Transistor a Good Switch? On I = I = 0 Off I = 0 I 0 I = 1ma/u I 0 Sub-threshold Leakage

8 Exponential Challenge #1

9 Sub-threshold Leakage Sub-threshold leakage increases exponentially Assume: 0.25 m, I off = 1na/ 5X increase each generation at 30ºC

10 SD Leakage Power SD leakage power becomes prohibitive

11 Leakage Power Leakage power limits Vt scaling A. Grove, IEDM 2002

12 Exponential Challenge #2

13 Gate Oxide is Near Limit High-K dielectric is crucial 90nm MOS Transistor50nm Silicon substrate 1.2 nm SiO 2 Gate If Tox scaling slows down, then Vdd scaling will have to slow down

14 Exponential Challenge #3

15 Energy per Logic Operation Energy per logic operation scaling will slow down

16 The Power Crisis Business as usual is not an option

17 Exponential Challenge #4

18 Sources of Variations Heat Flux (W/cm 2 ) Results in Vcc variation Temperature Variation (°C) Hot spots Random Dopant Fluctuations 193nm 248nm 365nm LithographyWavelength 65nm 90nm 130nm Generation Gap 45nm 32nm 13nm EUV 180nm Source: Mark Bohr, Intel Sub-wavelength Lithography

19 Frequency & SD Leakage 0.18 micron ~1000 samples 20X 30% Low Freq Low Isb High Freq Medium Isb High Freq High Isb

20 ~30mV Vt Distribution 0.18 micron ~1000 samples Low Freq Low Isb High Freq Medium Isb High Freq High Isb

21 Exponential Challenge #5

22 Platform Requirements PC towerMini tower tower Slim lineSmall pc System Volume ( cubic inch) Shrinking volume Quieter Yet, High Performance Power (W) Thermal Budget ( o C/W) Heat-Sink Volume (in 3 ) Projected Heat Dissipation Volume Projected Air Flow Rate Pentium ® III Thermal Budget Air Flow Rate (CFM) Pentium ® 4 Thermal budget decreasing Higher heat sink volume Higher air flow rate

23 Exponential Challenge #6

24 Exponential Costs G. Moore ISSCC 03 Litho Cost FAB Cost $ per Transistor $ per MIPS

25 Product Cost Pressure Shrinking ASP, and shrinking $ budget for power

26 Must Fit in Power Envelope Technology, Circuits, and Architecture to constrain the power nm65nm45nm32nm22nm16nm Power (W), Power Density (W/cm 2 ) SiO2 Lkg SD Lkg Active 10 mm Die 1 MB 2 MB 4 MB 8 MB

27 Some Implications Tox scaling will slow downmay stop? Tox scaling will slow downmay stop? Vdd scaling will slow downmay stop? Vdd scaling will slow downmay stop? Vt scaling will slow downmay stop? Vt scaling will slow downmay stop? Approaching constant Vdd scaling Approaching constant Vdd scaling Energy/logic op will not scale Energy/logic op will not scale

28 The Gigascale Dilemma 1B T integration capacity will be available 1B T integration capacity will be available But could be unusable due to power But could be unusable due to power Logic T growth will slow down Logic T growth will slow down Transistor performance will be limited Transistor performance will be limitedSolutions Low power design techniques Low power design techniques Improve design efficiencyMulti everywhere Improve design efficiencyMulti everywhere Valued performance by even higher integration (of potentially slower transistors) Valued performance by even higher integration (of potentially slower transistors)

29 Poweractive and leakage VariationsMicroarchitecture

30 Active Power Reduction SlowFastSlow Low Supply Voltage High Supply Voltage Multiple Supply Voltages Logic Block Freq = 1 Vdd = 1 Throughput = 1 Power = 1 Area = 1 Pwr Den = 1 Vdd Logic Block Freq = 0.5 Vdd = 0.5 Throughput = 1 Power = 0.25 Area = 2 Pwr Den = Vdd/2 Logic Block Replicated Designs

31 Leakage Control Body Bias Vdd Vbp Vbn -Ve +Ve 2-10XReduction Sleep Transistor Logic Block XReduction Stack Effect Equal Loading 5-10XReduction

32 Circuit Design Tradeoffs Low-Vt usage low high Higher probability of target frequency with: 1.Larger transistor sizes 2.Higher Low-Vt usage But with power penalty Transistor size small large power target frequency probability

33 Impact of Critical Paths # of critical paths Mean clock frequency Clock frequency Number of dies 0% 20% 40% 60% # critical paths With increasing # of critical paths With increasing # of critical paths –Both and become smaller –Lower mean frequency

34 Impact of Logic Depth 0% 20% 40% -16%-8%0%8%16% Delay 20% 40% NMOS PMOS Device I ON Number of samples (%) Variation (%) NMOS Ion PMOS Ion / Delay / 5.6%3.0%4.2% Logic depth: Logic depth Ratio of delay- to Ion

Logic depth small large frequency target frequency probability # uArch critical paths lessmore Architecture Tradeoffs Architecture Tradeoffs Higher target frequency with: 1.Shallow logic depth 2.Larger number of critical paths But with lower probability

36 Variation-tolerant Design # uArch critical paths lessmore Balance power & frequency with variation tolerance Logic depth small large frequency target frequency probability Transistor size small large power target frequency probability Low-Vt usage low high

37 Leakage Power FrequencyDeterministicProbabilistic 10X variation ~50% total power Probabilistic Design Delay Path Delay Probability Deterministic design techniques inadequate in the future Due to variations in: V dd, V t, and Temp Delay Target # of Paths Deterministic Delay Target # of Paths Probabilistic

38 Shift in Design Paradigm Multi-variable design optimization for: Multi-variable design optimization for: – Yield and bin splits – Parameter variations – Active and leakage power – Performance Tomorrow: Global Optimization Multi-variateToday: Local Optimization Single Variable

39 Resistor Network 4.5 mm 5.3 mm Multiple subsites PD & Counter Resistor Network CUT Bias Amplifier Delay Die frequency: Min(F 1..F 21 ) Die power: Sum(P 1..P 21 ) Technology 150nm CMOS Number of subsites per die 21 Body bias range 0.5V FBB to 0.5V RBB Bias resolution 32 mV 1.6 X 0.24 mm, 21 sites per die 150nm CMOS Adaptive Body Bias--Experiment

40 Adaptive Body Bias--Results 0% 20% 60% 100% Accepted die noBB 100% yield ABB Higher Frequency 97% highest bin within die ABB For given Freq and Power density 100% yield with ABB 100% yield with ABB 97% highest freq bin with ABB for within die variability 97% highest freq bin with ABB for within die variability

41 Design & Arch Efficiency Employ efficient design & Architectures

42 Memory Latency MemoryCPUCache Small ~few Clocks Large ns Assume: 50ns Memory latency Cache miss hurts performance Worse at higher frequency Cache miss hurts performance Worse at higher frequency

43 Increase on-die Memory Large on die memory provides: 1.Increased Data Bandwidth & Reduced Latency 2.Hence, higher performance for much lower power

44 Multi-threading ST Wait for Mem MT1 Wait for Mem MT2 Wait MT3 Single Thread Multi-Threading Full HW Utilization Multi-threading improves performance without impacting thermals & power delivery Thermals & Power Delivery designed for full HW utilization

45 Chip Multi-Processing C1C2 C3C4 Cache Multi-core, each core Multi-threaded Shared cache and front side bus Each core has different Vdd & Freq Core hopping to spread hot spots Lower junction temperature

46 Special Purpose Hardware 2.23 mm X 3.54 mm, 260K transistors Opportunities: Network processing engines MPEG Encode/Decode engines Speech engines Special purpose HWBest Mips/Watt TCP Offload Engine

47 Valued Performance: SOC (System on a Chip) Special-purpose hardware more MIPS/mm² Special-purpose hardware more MIPS/mm² SIMD integer and FP instructions in several ISAs SIMD integer and FP instructions in several ISAs Die Area PowerPerformance General Purpose 2X2X~1.4X Multimedia Kernels <10%<10% X Si Monolithic Polylithic CPU Special HW Memory CMOS RF Wireline RF Opto- Electronics Dense Memory Heterogeneous Si, SiGe, GaAs

48 The Exponential Reward Multi-everywhere: MT, CMP Speculative, OOO Era of Instruction LevelParallelism Super Scalar Era of Pipelined Architecture Multi Threaded Era of Thread & ProcessorLevelParallelism Special Purpose HW Multi-Threaded, Multi-Core

49 SummaryDelaying Forever Gigascale transistor integration capacity will be availablePower and Energy are the barriers Gigascale transistor integration capacity will be availablePower and Energy are the barriers Variations will be even more prominentshift from Deterministic to Probabilistic design Variations will be even more prominentshift from Deterministic to Probabilistic design Improve design efficiency Improve design efficiency Multieverywhere, & SOC valued performance Multieverywhere, & SOC valued performance Exploit integration capacity to deliver performance in power/cost envelope Exploit integration capacity to deliver performance in power/cost envelope