1 VLSI Design Challenges for Gigascale Integration. Shekhar Borkar, Intel Corp. October 25, 2005
2 Outline
– Technology scaling challenges
– Circuit and design solutions
– Microarchitecture advances
– Multi-everywhere
– Summary
3 Goal: 10 TIPS by 2015
[Chart: performance trajectory across the Pentium®, Pentium® Pro, and Pentium® 4 architectures.]
How do you get there?
4 Technology Scaling
[Diagrams: MOS transistor cross-sections before and after scaling, labeling gate, source, drain, body, Xj, Tox, and Leff.]
– Dimensions scale down by 30%, doubling transistor density
– Oxide thickness scales down: faster transistor, higher performance
– Vdd & Vt scaling: lower active power
Scaling will continue, but with challenges!
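The slide's density claim is simple geometry: shrinking both linear dimensions by 30% halves the area per transistor. A minimal sketch of that arithmetic:

```python
# Classic scaling arithmetic for one process generation: a 30% reduction
# in linear dimensions roughly doubles transistor density.
scale = 0.7                       # linear shrink per generation (from the slide)
area_factor = scale ** 2          # each transistor occupies ~0.49x the area
density_gain = 1 / area_factor    # ~2.04x more transistors per unit area
print(f"area factor: {area_factor:.2f}, density gain: {density_gain:.2f}x")
```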
5 Technology Outlook
[Table: trends by high-volume-manufacturing technology node (nm) and integration capacity (BT), per generation:]
– Delay = CV/I scaling: 0.7 → ~0.7 → >0.7; delay scaling will slow down
– Energy/logic-op scaling: >0.35 → >0.5 → >0.5; energy scaling will slow down
– Bulk planar CMOS: high probability → low probability
– Alternate, 3G etc.: low probability → high probability
– Variability: medium → high → very high
– ILD (K): ~3 → <3, reducing slowly
– RC delay: reducing slowly
– Metal layers: growing, to 1 layer per generation
6 The Leakage(s)…
[TEM image: 90nm MOS transistor with 50nm Si gate length and a 1.2 nm SiO2 gate oxide.]
7 Must Fit in Power Envelope
[Chart: power (W) and power density (W/cm2) for a 10 mm die across the 65nm, 45nm, 32nm, 22nm, and 16nm nodes, split into active power, sub-threshold (SD) leakage, and gate-oxide (SiO2) leakage.]
Technology, circuits, and architecture must all be used to constrain the power.
8 Solutions
– Move away from frequency alone to deliver performance
– More on-die memory
– Multi-everywhere
  – Multi-threading
  – Chip-level multi-processing
– Throughput-oriented designs
– Valued performance by higher level of integration
  – Monolithic & polylithic
9 Leakage Solutions
[Diagrams: planar transistor with a 1.2 nm SiO2 gate oxide; planar transistor with a 3.0nm high-k gate dielectric; tri-gate transistor on a silicon substrate.]
For a few generations, then what?
10 Active Power Reduction
– Multiple supply voltages: run slow logic at a low supply voltage and only the fast, critical logic at the high supply voltage
– Throughput-oriented designs: replicate a logic block and run both copies at half voltage and half frequency
  One block: Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Power Density = 1
  Two blocks: Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Power Density = 0.125
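The throughput-oriented trade-off above follows directly from the dynamic power relation P = C·V²·f. A small sketch in normalized units (capacitance and the base operating point set to 1 as assumptions) reproduces the slide's numbers:

```python
# Dynamic power model P = C * V^2 * f (normalized units): two half-speed,
# half-voltage copies of a block match the original throughput at 1/4 power.
def dyn_power(vdd, freq, n_blocks=1, cap=1.0):
    return n_blocks * cap * vdd**2 * freq

base = dyn_power(vdd=1.0, freq=1.0)                  # one block: power = 1.0
parallel = dyn_power(vdd=0.5, freq=0.5, n_blocks=2)  # two blocks: power = 0.25
throughput = 2 * 0.5                                 # two blocks at half rate = 1.0
print(base, parallel, throughput)  # 1.0 0.25 1.0
```

The cost is 2x area, which is why this works only for throughput-oriented designs.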
11 Leakage Control
– Body bias (Vbp/Vbn applied +Ve/-Ve relative to Vdd): 2-10X reduction
– Sleep transistor gating the logic block's supply
– Stack effect (equal loading): 5-10X reduction
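The body-bias reduction works because sub-threshold leakage falls exponentially as the threshold voltage rises, and reverse body bias raises Vt. A sketch with illustrative constants (the Vt values and sub-threshold slope below are assumptions, not process data):

```python
# Sub-threshold leakage falls by one decade for every "sub-threshold slope"
# worth of Vt increase; reverse body bias raises Vt and so cuts leakage.
def leakage(vt, slope_mv=100.0):
    """Relative leakage for threshold voltage vt (V); slope in mV/decade."""
    return 10 ** (-vt * 1000 / slope_mv)

nominal = leakage(0.30)         # nominal Vt (assumed 300 mV)
rbb = leakage(0.30 + 0.10)      # reverse body bias raising Vt by ~100 mV
print(f"leakage reduction: {nominal / rbb:.0f}x")  # 10x for these numbers
```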
12 Optimum Frequency
Maximum performance comes at an optimum pipeline depth and an optimum frequency:
– Pushing relative frequency through deeper pipelining gives diminishing performance returns
– Sub-threshold leakage increases exponentially as the process is pushed to higher frequency
[Charts: power efficiency vs. relative pipeline depth, showing an optimum; performance vs. relative frequency from pipelining, showing diminishing returns.]
13 Memory Latency
[Diagram: CPU with a small cache (access in a few clocks) and a large external memory (access measured in ns).]
Assume 50ns memory latency: a cache miss hurts performance, and it hurts worse at higher frequency.
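The "worse at higher frequency" point is just unit conversion: a fixed latency in nanoseconds costs more core cycles as the clock speeds up. The frequencies below are illustrative:

```python
# A fixed 50 ns memory latency costs more core cycles at higher clock rates,
# which is why cache misses hurt fast cores disproportionately.
MEM_LATENCY_NS = 50
for ghz in (1, 2, 3):
    stall_cycles = MEM_LATENCY_NS * ghz  # ns * (cycles per ns)
    print(f"{ghz} GHz: a miss stalls ~{stall_cycles} cycles")
```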
14 Increase on-die Memory
Large on-die memory provides:
1. Increased data bandwidth & reduced latency
2. Hence, higher performance for much lower power
15 Multi-threading
[Diagram: a single thread (ST) spends much of its time waiting for memory; with multi-threading (MT1, MT2, MT3), other threads fill the wait slots for full HW utilization.]
Multi-threading improves performance without impacting thermals & power delivery, which are already designed for full HW utilization.
16 Single Core Power/Performance
Moore's Law provides more transistors for advanced architectures, delivering higher peak performance, but with lower power efficiency.
17 Chip Multi-Processing
[Diagram: four cores (C1-C4) with a shared cache.]
– Multi-core, each core multi-threaded
– Shared cache and front-side bus
– Each core has different Vdd & frequency
– Core hopping to spread hot spots
– Lower junction temperature
18 Dual Core
Rule of thumb: a 1% voltage reduction gives ~1% lower frequency, ~3% lower power, and ~0.66% lower performance.
In the same process technology:
– Single core: Voltage = 1, Freq = 1, Area = 1, Power = 1, Perf = 1
– Dual core: Voltage = -15%, Freq = -15%, Area = 2, Power = 1, Perf = ~1.8
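Applying the rule of thumb with a 15% voltage drop to two cores roughly reproduces the slide's dual-core numbers. A minimal sketch of that arithmetic:

```python
# Rule of thumb from the slide: each 1% supply-voltage drop costs ~1%
# frequency and ~0.66% single-thread performance, but saves ~3% power.
v_drop_pct = 15
power_per_core = 1 - 0.03 * v_drop_pct    # ~0.55
perf_per_core = 1 - 0.0066 * v_drop_pct   # ~0.90
dual_power = 2 * power_per_core           # ~1.1, close to the slide's ~1
dual_perf = 2 * perf_per_core             # ~1.8, matching Perf = ~1.8
print(f"dual-core power ~{dual_power:.2f}, perf ~{dual_perf:.2f}")
```

So two slower, lower-voltage cores nearly double throughput within the single-core power budget, assuming the workload can use both cores.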
19 Multi-Core
[Diagram: one large core with cache vs. four small cores (C1-C4) sharing a cache.]
– A small core delivers Performance = 1/2 at Power = 1/4 relative to the large core
– Multi-core is power efficient
– Better power and thermal management
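Using the slide's small-core ratios, trading one large core for several small ones doubles throughput in the same power envelope. A sketch (the choice of four cores is an assumption that fills the large core's power budget exactly):

```python
# Trading one large core for small ones, using the slide's ratios:
# a small core gives 1/2 the performance for 1/4 the power.
small_power, small_perf = 0.25, 0.5
n = 4                            # four small cores fit the large core's budget
total_power = n * small_power    # 1.0: same envelope as one large core
total_perf = n * small_perf      # 2.0: double the throughput, if parallel
print(total_power, total_perf)   # 1.0 2.0
```

The "if parallel" caveat is exactly what the Amdahl's Law slide below quantifies.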
20 Special Purpose Hardware
[Die photo: 2.23 mm x 3.54 mm, 260K transistors.]
Opportunities:
– Network processing engines
– MPEG encode/decode engines, speech engines
– TCP/IP offload engine
Special-purpose HW provides the best MIPS/Watt.
21 Performance Scaling
Amdahl's Law: Parallel Speedup = 1 / (Serial% + (1 - Serial%)/N)
– Serial% = 6.7%: N = 16 cores gives Perf = 8 (N/2)
– Serial% = 20%: N = 6 cores gives Perf = 3 (N/2)
Parallel software is key to multi-core success.
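The slide's two data points fall straight out of the formula; even a small serial fraction caps the speedup well below the core count:

```python
# Amdahl's Law as stated on the slide: speedup saturates once the serial
# fraction dominates, no matter how many cores are added.
def speedup(serial_frac, n_cores):
    return 1 / (serial_frac + (1 - serial_frac) / n_cores)

print(round(speedup(0.067, 16), 1))  # ~8: 16 cores deliver half their peak
print(round(speedup(0.20, 6), 1))    # ~3: 20% serial code halves 6 cores
```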
22 From Multi to Many…
[Projection: a 13mm die with 100W, a 48MB cache, and 4B transistors in 22nm; configurations of 12 cores, 24 cores, or 144 cores.]
23 Future Multi-core Platform
[Diagram: heterogeneous multi-core platform; general purpose (GP) cores and special purpose (SP) HW blocks (labeled CC) connected by an interconnect fabric.]
24 The New Era of Computing
[Timeline:]
– Era of pipelined architecture
– Era of instruction-level parallelism: super-scalar, speculative, OOO
– Era of thread & processor-level parallelism: multi-everywhere (MT, CMP), multi-threaded, multi-core, special purpose HW
25 Summary
– Business as usual is not an option
  – Performance at any cost is history
– Must make a Right Hand Turn (RHT)
  – Move away from frequency alone
– Future architectures and designs
  – More memory (larger caches)
  – Multi-threading
  – Multi-processing
  – Special purpose hardware
  – Valued performance with higher integration