Thousand Core Chips: A Technology Perspective. Shekhar Borkar, Intel Corp. June 7, 2007.

Presentation transcript:


1 Thousand Core Chips: A Technology Perspective. Shekhar Borkar, Intel Corp. June 7, 2007

2 Outline
- Technology outlook
- Evolution of multi-core: thousands of cores?
- How do you feed thousands of cores?
- Future challenges: variations and reliability
- Resiliency
- Summary

3 Technology Outlook. Trends across successive high-volume manufacturing technology nodes (technology node in nm, integration capacity in billions of transistors):
- Delay = CV/I scaling: 0.7, then ~0.7, then >0.7; delay scaling will slow down
- Energy/logic-op scaling: >0.35, then >0.5, then >0.5; energy scaling will slow down
- Bulk planar CMOS: high probability, then low probability; alternate devices (3G etc.): low probability, then high probability
- Variability: medium, then high, then very high
- ILD (K): ~3, then <3, reducing slowly
- RC delay; metal layers increase by up to 1 layer per generation

4 Terascale Integration Capacity. Total transistors on a 300 mm² die: ~1.5B logic transistors and ~100 MB of cache today, heading toward 100+B transistor integration capacity.

5 Scaling Projections (300 mm² die): frequency scaling will slow down, Vdd scaling will slow down, and power will be too high.

6 Why Multi-core? Performance: ever-larger single cores yield diminishing performance within a fixed power envelope, while multi-core provides the potential for near-linear performance speedup.

7 Why Dual-core? Power. Rule of thumb, in the same process technology: a 1% voltage reduction gives roughly a 1% frequency reduction, a 3% power reduction, and a 0.66% performance loss. One large core: voltage = 1, frequency = 1, area = 1, power = 1, performance = 1. Two cores, each with voltage and frequency reduced by 15%: area = 2, power = 1, performance = ~1.8.
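A minimal sketch of this rule-of-thumb arithmetic, assuming dynamic power scales roughly as V²·f (my assumption, not stated on the slide) and using the slide's 0.66%-per-1%-voltage performance figure; the cube-rule power estimate lands slightly above the slide's rounded "power = 1":

```python
# Back-of-envelope check of the dual-core rule of thumb.
# Assumption (not from the slide): dynamic power ~ V^2 * f.

def core_power(v_scale, f_scale):
    """Per-core power relative to the baseline core (V = 1, f = 1)."""
    return v_scale ** 2 * f_scale

v = f = 0.85                        # reduce voltage and frequency by 15%
per_core_power = core_power(v, f)   # ~0.61, i.e. roughly 3% less power per 1% voltage
per_core_perf = 1 - 15 * 0.0066     # ~0.90, per the slide's 0.66%-per-1% rule

dual_power = 2 * per_core_power     # ~1.2x the single-core budget (slide rounds to ~1)
dual_perf = 2 * per_core_perf       # ~1.8x, matching the slide's "perf = ~1.8"

print(f"dual-core: power ~ {dual_power:.2f}x, perf ~ {dual_perf:.2f}x")
```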

8 From Dual to Multi-core. Replacing one large core (power = 1, performance = 1) with several small cores sharing a cache, each at roughly power = 1/4 and performance = 1/2, makes multi-core power efficient and enables better power and thermal management.
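A rough sketch of the aggregate tradeoff implied here, assuming the slide's per-core figures (1/4 power, 1/2 single-thread performance) and a workload that parallelizes across all the small cores:

```python
# Aggregate throughput vs. power: one large core vs. four small cores.
# Assumption: small core = 1/4 power, 1/2 single-thread performance of the large core.

large = {"power": 1.0, "perf": 1.0}
small = {"power": 0.25, "perf": 0.5}

n_small = 4
many_power = n_small * small["power"]   # 1.0: same power budget as the large core
many_perf = n_small * small["perf"]     # 2.0: 2x peak throughput, if the work parallelizes

print(f"4 small cores: power = {many_power}, peak throughput = {many_perf}x")
```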

9 Future Multi-core Platform: a heterogeneous multi-core platform (SoC) combining general purpose (GP) cores and special purpose (SP) hardware, laid out as an array of core/cache tiles connected by an interconnect fabric.

10 Fine Grain Power Management. Cores with critical tasks run at frequency f and full Vdd: throughput = 1, power = 1. Non-critical cores run at f/2 and 0.7×Vdd: throughput = 0.5, power ≈ 0.25. Idle cores are shut down: throughput = 0, power = 0.
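A small sketch of the per-core budget arithmetic, again assuming power scales as V²·f (so half frequency at 0.7× Vdd is about a quarter of the power); the example core mix is mine, not from the slide:

```python
# Per-core throughput and power under fine-grain power management.
# Assumption: power ~ V^2 * f; throughput ~ f.

def core_state(v_scale, f_scale):
    return {"tpt": f_scale, "power": v_scale ** 2 * f_scale}

critical     = core_state(1.0, 1.0)   # TPT = 1.0, power = 1.0
non_critical = core_state(0.7, 0.5)   # TPT = 0.5, power ~ 0.245
shut_down    = core_state(0.0, 0.0)   # TPT = 0,   power = 0

# Hypothetical chip: 2 critical, 4 non-critical, 2 shut-down cores.
cores = [critical] * 2 + [non_critical] * 4 + [shut_down] * 2
print("TPT   =", sum(c["tpt"] for c in cores))                  # 4.0
print("Power =", round(sum(c["power"] for c in cores), 2))      # ~2.98
```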

11 Performance Scaling. Amdahl's Law: parallel speedup = 1 / (Serial% + (1 - Serial%)/N). With Serial% = 6.7%, 16 cores deliver a speedup of only 8 (half the ideal). With Serial% = 20%, 6 cores deliver a speedup of only 3. Parallel software is key to multi-core success.
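The two data points follow directly from the formula on the slide; a minimal sketch:

```python
# Amdahl's Law as stated on the slide: speedup = 1 / (s + (1 - s) / N).

def amdahl_speedup(serial_fraction, n_cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

print(amdahl_speedup(0.067, 16))  # ~8: 16 cores give only half the ideal speedup
print(amdahl_speedup(0.20, 6))    # ~3: 6 cores give only half the ideal speedup
```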

12 From Multi to Many... A 13 mm die at 100 W with 48 MB of cache and 4B transistors in 22nm could hold 12, 48, or 144 cores, depending on core size.

13 From Many to Too Many... A 13 mm die at 100 W with 96 MB of cache and 8B transistors in 16nm could hold 24, 96, or 288 cores, depending on core size.

14 On Die Network Power (300 mm² die). A careful balance of:
1. Throughput performance
2. Single-thread performance (core size)
3. Core and network power
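A rough, purely illustrative model of this balance: smaller cores mean more cores in the power budget, but each core also brings an on-die network router, so per-router power can erode efficiency. All numbers below are hypothetical, not from the slide:

```python
# Illustrative model: throughput, power, and perf/W as core size shrinks,
# with a fixed per-core router cost for the on-die network (all values made up).

def chip_metrics(n_cores, core_perf, core_power, router_power):
    perf = n_cores * core_perf
    power = n_cores * (core_power + router_power)
    return perf, power, perf / power

design_points = [
    (12, 2.0, 7.5, 0.5),     # few large cores
    (48, 1.0, 1.5, 0.5),     # more medium cores
    (144, 0.3, 0.15, 0.5),   # many small cores: router power now dominates
]

for n, perf, p_core, p_router in design_points:
    total_perf, total_power, eff = chip_metrics(n, perf, p_core, p_router)
    print(f"{n:4d} cores: perf = {total_perf:5.1f}, power = {total_power:5.1f} W, perf/W = {eff:.2f}")
```

In this toy model the 48-core point has the best perf/W; pushing to 144 cores buys little throughput because the fixed network cost per core dominates, which is the "careful balance" the slide calls out.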

15 Observations
- Scaling multi-core demands more parallelism every generation: thread level, task level, application level.
- Many (or too many) cores does not always mean the highest performance, the highest MIPS/Watt, or the lowest power.
- If on-die network power is significant, power is even worse.
- Now software, too, must follow Moore's Law.

16 Memory BW Gap. Buses have become wider to deliver the necessary memory bandwidth (10 to 30 GB/sec), yet memory bandwidth is still not enough: a many-core system will demand 100 GB/sec of memory bandwidth. How do you feed the beast?

17 IO Pins and Power. State of the art: 100 GB/sec ≈ 1 Tb/sec = 1,000 Gb/sec; at 25 mW per Gb/sec that is 25 Watts. Bus width = 1,000 Gb/sec / 5 Gb/sec per lane = 200 lanes, or about 400 pins (differential). Too many signal pins, too much power.

18 Solution. High-speed buses to a chip more than 5 mm away are transmission lines: L-R-C effects require signal termination, and the signal processing consumes power. Solutions: reduce the distance to well under 5 mm so the link becomes a simple R-C bus, reduce the signaling speed to ~1 Gb/sec, and increase the pin count to deliver the bandwidth at 1-2 mW/Gbps. With the chip less than 2 mm away: 100 GB/sec ≈ 1 Tb/sec = 1,000 Gb/sec; at 2 mW per Gb/sec that is 2 Watts, with bus width = 1,000/1 = 1,000 pins.
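A minimal sketch of the pin-count and power arithmetic from these two slides; the assumption that the close-coupled bus is single-ended (hence one pin per lane) is mine:

```python
# I/O power and pin count for 100 GB/s (~1,000 Gb/s) of memory bandwidth.

def io_budget(total_gbps, mw_per_gbps, gbps_per_pin, differential=True):
    power_w = total_gbps * mw_per_gbps / 1000.0
    lanes = total_gbps / gbps_per_pin
    pins = lanes * (2 if differential else 1)
    return power_w, int(pins)

# Slide 17: off-package high-speed links, 25 mW/Gb/s at 5 Gb/s per differential lane.
print(io_budget(1000, 25, 5))                        # (25.0 W, 400 pins)

# Slide 18: close-coupled R-C bus (<2 mm), 2 mW/Gb/s at 1 Gb/s, assumed single-ended.
print(io_budget(1000, 2, 1, differential=False))     # (2.0 W, 1000 pins)
```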

19 Anatomy of a Silicon Chip. The Si chip sits in a package under a heat-sink: heat leaves through the heat-sink, while power and signals come through the package.

20 System in a Package. Placing a second die beside the CPU in the same package: pins are limited (10 mm edge / 50 micron pitch = 200 pins), the signal distance is large (~10 mm, hence higher power), and the package is complex.

21 DRAM on Top. Stacking DRAM on top of the CPU, under the heat-sink: CPU temperature = 85°C, junction temperature = 100+°C. The high temperature and hot spots are not good for DRAM.

22 DRAM at the Bottom. Stacking the CPU on top of a thin DRAM die, with the heat-sink on the CPU: power and IO signals go through the DRAM to the CPU using through-DRAM vias. The most promising solution to feed the beast.

23 Reliability. Soft error FIT/chip (logic and memory), time-dependent device degradation, and extreme, widening device variations; burn-in may phase out...?

24 Implications to Reliability. Extreme variations (static and dynamic) will result in unreliable components, making it impossible to design a reliable system as we know it today: transient errors (soft errors), gradual errors (variations), and time-dependent errors (degradation). We need reliable systems built from unreliable components: resilient architectures.

25 Implications to Test. One-time factory testing will be out, and burn-in to catch chip infant mortality will not be practical. Test hardware will be part of the design: dynamically self-test, detect errors, reconfigure, and adapt.

26 In a Nutshell... 100 billion transistor integration capacity, but billions of transistors will be unusable due to variations, some will fail over time, and there will be intermittent failures. Yet the chip must deliver high performance within the power and cost envelope.

27 Resiliency with Many-Core
- Dynamic on-chip testing and performance profiling
- Cores held in reserve (spares) and a binning strategy
- Dynamic, fine-grain performance and power management
- Coarse-grain redundancy checking
- Dynamic error detection and reconfiguration
- Decommission aging cores and swap in spares
Dynamically: 1. self-test and detect, 2. isolate errors, 3. confine, 4. reconfigure, and 5. adapt.
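A hypothetical sketch of that dynamic test-and-reconfigure loop; the class, the random self-test stand-in, and the core counts are illustrative, not from the slide:

```python
# Hypothetical many-core resiliency loop: self-test, isolate/confine, swap in spares.
import random

class CoreManager:
    def __init__(self, active, spares):
        self.active = set(active)   # cores currently running work
        self.spares = list(spares)  # healthy cores held in reserve

    def self_test(self, core):
        """Stand-in for on-chip test hardware; randomly flags failing cores."""
        return random.random() > 0.05   # ~5% chance a core fails its test

    def service_interval(self):
        for core in list(self.active):
            if not self.self_test(core):              # 1. self-test and detect
                self.active.remove(core)              # 2-3. isolate and confine
                if self.spares:
                    self.active.add(self.spares.pop())  # 4. reconfigure with a spare
        # 5. adapt: remaining cores absorb the load (scheduling not modeled here)
        return sorted(self.active)

mgr = CoreManager(active=range(8), spares=[8, 9])
print(mgr.service_interval())
```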

28 Summary
- Moore's Law, with terascale integration capacity, will allow integration of thousands of cores.
- Power continues to be the challenge, and on-die network power could be significant: optimize for power with the size of the core and the number of cores.
- 3D memory technology is needed to feed the beast.
- Many-core chips will deliver the highest performance in the power envelope, with resiliency.