Some Limits of Power Delivery in the Multicore Era. Runjie Zhang, Brett H. Meyer, Wei Huang, Kevin Skadron and Mircea R. Stan. University of Virginia,

Presentation transcript:

1 Some Limits of Power Delivery in the Multicore Era Runjie Zhang, Brett H. Meyer, Wei Huang, Kevin Skadron and Mircea R. Stan University of Virginia, McGill University, IBM Austin Research Lab.

2 ITRS Projection on Transistor Density (2011 Edition). Source: ITRS 2011

3 Power Density and Current Density (A/mm²)
Current = Power / Supply Voltage
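The relation on this slide can be written as a small helper. This is a minimal sketch, not code from the talk: the function name and the sample numbers are hypothetical, chosen only to show that current density rises when the supply voltage drops at a fixed power density.

```python
def current_density(power_density_w_per_mm2: float, supply_voltage_v: float) -> float:
    """Return current density in A/mm^2 from Current = Power / Supply Voltage."""
    if supply_voltage_v <= 0:
        raise ValueError("supply voltage must be positive")
    return power_density_w_per_mm2 / supply_voltage_v


# Hypothetical values (not taken from the slides): same power density,
# two different supply voltages.
at_nominal = current_density(0.5, 1.0)
at_reduced = current_density(0.5, 0.8)
```

With the power density held constant, the lower supply voltage yields the higher current density, which is the trend the slide title points at.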

4 The Chip - Package. Source: pcgameshardware.com

5 The Chip – Inside the Package. Source: ITRS 2009 Edition

6 The Chip – C4 Bumps. Source: flipchips.com. Source: Wikipedia

7 What are the problems?
Plot: total pad count by year
Source: ITRS 2011 Edition
Source: Shao et al., IEEE Computer Society Annual Symposium on VLSI, 2005
Source: Ye et al., Applied Physics Letters, 2003
Electromigration

8 Architecture-Level PDN Model

9 Architecture-Level PDN Model
Input:
– PDN physical parameters, e.g. metal width
– Processor floorplan and power map
– Pad configuration
Output:
– Voltage pad
– Pad current
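The inputs and outputs listed above can be illustrated with a toy steady-state model. This is only a sketch under stated assumptions, not the model presented in the talk: the on-chip PDN is reduced to a uniform resistive mesh, each pad is an ideal supply behind one resistor, each tile draws a constant current equal to its power divided by the nominal supply voltage, and the system is solved by Gauss-Seidel iteration. The names `solve_pdn`, `r_grid` and `r_pad` and all numeric values are hypothetical.

```python
def solve_pdn(power_map, pads, vdd=1.0, r_grid=0.05, r_pad=0.01,
              iters=5000, tol=1e-12):
    """Solve a toy resistive-mesh PDN; return (node voltages, pad currents).

    power_map: 2D list of per-tile power in watts (floorplan + power map).
    pads:      set of (row, col) tiles that have a VDD pad (pad configuration).
    r_grid:    resistance between adjacent tiles (stands in for metal width).
    r_pad:     resistance of one pad. At least one pad is required.
    """
    rows, cols = len(power_map), len(power_map[0])
    g_grid, g_pad = 1.0 / r_grid, 1.0 / r_pad
    # Assumption: constant-current loads evaluated at the nominal voltage.
    load = [[p / vdd for p in row] for row in power_map]
    v = [[vdd] * cols for _ in range(rows)]
    for _ in range(iters):
        max_delta = 0.0
        for i in range(rows):
            for j in range(cols):
                g_sum, weighted = 0.0, 0.0
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        g_sum += g_grid
                        weighted += g_grid * v[ni][nj]
                if (i, j) in pads:
                    g_sum += g_pad
                    weighted += g_pad * vdd
                # Kirchhoff's current law at this node, solved for its voltage.
                new_v = (weighted - load[i][j]) / g_sum
                max_delta = max(max_delta, abs(new_v - v[i][j]))
                v[i][j] = new_v
        if max_delta < tol:
            break
    pad_current = {p: g_pad * (vdd - v[p[0]][p[1]]) for p in pads}
    return v, pad_current


# Hypothetical 4x4 floorplan, 0.1 W per tile, pads on two opposite corners.
example_power = [[0.1] * 4 for _ in range(4)]
voltages, pad_currents = solve_pdn(example_power, {(0, 0), (3, 3)})
worst_ir_drop = 1.0 - min(min(row) for row in voltages)
```

In steady state the pad currents must add up to the total load current, which gives a quick sanity check on any such model before comparing per-pad currents against an electromigration limit.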

10 Validation
IBM Power Grid analysis benchmarks
– Steady-state
– SPICE format
– Provide details about metal layers and pad locations

11 IBM_PG6: Power Map; VDD Pad Distribution

12 Validation Results

13 Pad Current Comparison

14 Multicore Scaling
Baseline: 3.7 GHz, dual-core, Intel Penryn
4-way OoO processor
Private L2 cache, 3 MB per core
Mesh-based NoC
Technology node:         45 nm   32 nm   22 nm   16 nm
# of cores:              2       4       8       16
Area (mm²):
Supply voltage:
Peak total power (W):
Peak total current (A):

15 Floorplan

16 Power Delivery Noise Scaling Trend

17 Pad Optimization

18 Sorted Pad Current After Optimization; Sorted Pad Current Before Optimization

19 I/O vs. Power Supply
– Constant core-to-MC ratio
– 80 pads per MC
– 5% IR drop target
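The trade-off named on this slide can be sketched as a pad budget. This is an illustrative sketch only: the 80 pads per memory controller (MC) and the 5% IR drop target come from the slide, while the function names, the total pad count, the core-to-MC ratio, and the assumption that every pad not used by an MC is available for power delivery are hypothetical.

```python
PADS_PER_MC = 80        # from the slide
IR_DROP_TARGET = 0.05   # from the slide: 5% of the supply voltage


def pad_budget(total_pads: int, cores: int, cores_per_mc: int):
    """Split pads between memory-controller I/O and power delivery.

    Assumption: the core-to-MC ratio is held constant, so the MC count
    grows with the core count, and all remaining pads go to power/ground.
    Returns (number of MCs, I/O pads, power pads).
    """
    mcs = -(-cores // cores_per_mc)  # ceiling division
    io_pads = mcs * PADS_PER_MC
    if io_pads > total_pads:
        raise ValueError("not enough pads for the requested memory controllers")
    return mcs, io_pads, total_pads - io_pads


def meets_ir_target(vdd: float, worst_node_voltage: float) -> bool:
    """Check the worst-case static IR drop against the 5% target."""
    return (vdd - worst_node_voltage) <= IR_DROP_TARGET * vdd


# Hypothetical chip: 2000 pads, 16 cores, one MC per 4 cores.
mcs, io_pads, power_pads = pad_budget(2000, 16, 4)
```

Under a fixed total pad count, every additional MC removes 80 pads from the power delivery budget, so meeting the IR drop target and scaling memory bandwidth compete for the same resource.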

20 Thermal vs. Power Delivery

21 Conclusions
– Power delivery is becoming a limiting factor in the near future.
– IR drop poses a bigger challenge than electromigration.
– Memory bandwidth will be affected.
– With liquid cooling, scaling hits the power delivery wall before the thermal wall.

22 Questions?

23 Thanks!

24 Temperature Map vs. Voltage Map. Axes: Voltage (V), Temperature (°C)

25 Axes: Voltage (V), Temperature (°C)