

Die Stacking (3D) Microarchitecture
Bryan Black, Murali Annavaram, Ned Brekelbaum, John DeVale, Lei Jiang, Gabriel H. Loh, Don McCauley, Pat Morrow, Donald W. Nelson, Daniel Pantuso, Paul Reed, Jeff Rupley, Sadasivan Shankar, John Shen, and Clair Webb
Intel® Corporation, Micro’06

OUTLINE
1. Motivation for 3D Die Stacking
2. Manufacture Process
3. Thermal Problem
4. Performance and Temperature Result
5. New Circuit Design Inspiration
6. Conclusion

The Problem of Planar Floorplan
[Figure: a planar circuit (silicon, driver, wire) and the same circuit after scaling]
Scaling a planar circuit gives:
1. Faster transistors
2. Worse wire delays
Consequences:
1. D$ → FU: wire delay of 1 cycle
2. RF → FP: wire delay of 2 cycles
3. Wires consume more than 30% of the power within a microprocessor
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006. [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Die-stacked 3D Integration
[Figure: a 2-die stacked 3D integration (silicon0, silicon1, driver, vertical interconnect) and a 4-die stacked 3D IC (Die 0 to Die 3)]
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006. [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Form Factor: one of the strongest motivations for developing 3D ICs Reference: [5] YOLE Developpement Corp. Market trends for 3D stacking

Current Technology Reference: [5] YOLE Developpement Corp. Market trends for 3D stacking

Comparison Reference: [5] YOLE Developpement Corp. Market trends for 3D stacking

A 3D Structure: Face-to-Face Bonding
[Figure label: TSV]
D2D vias (TSV) have size and electrical characteristics similar to the conventional vias that connect on-die metal routing layers.
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006. [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
1. Manufacture each die separately: start with two processed wafers.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
2. Deposit D2D vias: similar to building conventional vias between different metal layers, deposit copper via stubs that connect to the top-level metal.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
3. Thermocompression bonding: the bonding time, amount of pressure, and temperature affect the quality of the bond between the two halves of the D2D via stubs. The pressure and temperature effectively cause the two ends of the copper stubs to fuse together.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
4. CMP thinning: use chemical-mechanical polishing (CMP) to thin one layer of the 3D stack to only 10 to 20 µm.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
5. Backside etching for power, ground and I/O: the thinning allows the backside vias, or through-silicon vias (TSVs), that implement the external I/O and power/ground connections to be relatively short.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Manufacture
6. Packaging (heat spreader and heat sink).
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Thermal Problems
1. Die stacking can dramatically increase power density if two highly active regions are stacked on top of each other.
2. Heat dissipation is also harder because each additional die is stacked farther and farther from the interface to the heat sink.
Solution:
1. Do not stack two highly active regions on top of each other.
2. Place the highest-power die closest to the heat sink.
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006. [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.
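The two rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and all wattage and area numbers below are hypothetical and do not come from the slides or the cited papers, and power density is modelled simply as total power over the shared footprint.

```python
from typing import List, Tuple


def stacked_power_density(block_powers_w: List[float], footprint_mm2: float) -> float:
    """Power density (W/mm^2) when blocks are stacked over one shared footprint."""
    return sum(block_powers_w) / footprint_mm2


def order_dies_for_heat_sink(die_powers_w: List[float]) -> List[Tuple[int, float]]:
    """Return (die_index, power) pairs; position 0 is the slot closest to the heat sink."""
    return sorted(enumerate(die_powers_w), key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers, chosen only for illustration.
hot_block_w = 10.0
cool_block_w = 2.0
footprint_mm2 = 4.0

# Rule 1: stacking hot on hot concentrates more power than hot on cool.
hot_on_hot = stacked_power_density([hot_block_w, hot_block_w], footprint_mm2)
hot_on_cool = stacked_power_density([hot_block_w, cool_block_w], footprint_mm2)

# Rule 2: the highest-power die goes next to the heat sink.
stack_order = order_dies_for_heat_sink([12.0, 45.0, 8.0])
```

With these made-up inputs, the hot-on-hot stack has a higher power density than the hot-on-cool stack, and die 1 (the 45 W die) is assigned the slot next to the heat sink.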

Example: Intel Pentium 4 microprocessor
1. D$ → FU: wire delay of 1 cycle
2. RF → FP: wire delay of 2 cycles
3. Most active unit: FP
[Figure: planar floorplan and 3D floorplan]
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.

Result - Performance Gain
1. 25% of the pipeline stages are eliminated.
2. […]% performance improvement.
3. […]% of the repeaters and repeating latches are eliminated.
4. The 3D grid has 50% less metal RC because the floorplan is 50% smaller.
5. Fewer repeaters, a smaller clock grid, and less global wire → 15% power reduction overall.
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.
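The link between a 50% smaller floorplan and 50% less metal RC can be sanity-checked with a small sketch. It rests on assumptions that are not stated in the slides: the design is folded evenly onto two dies, a fixed-pitch grid uses metal in proportion to the area it covers, and an unrepeated wire that spans one side of the floorplan has a distributed RC delay proportional to its length squared. The function names are hypothetical.

```python
import math


def footprint_scale(num_dies: int) -> float:
    """Footprint area relative to planar when the design is folded evenly onto num_dies."""
    return 1.0 / num_dies


def grid_metal_scale(area_scale: float) -> float:
    """Assumption: a fixed-pitch grid uses metal in proportion to the area it covers."""
    return area_scale


def cross_die_wire_rc_scale(area_scale: float) -> float:
    """Assumption: distributed RC of an unrepeated wire grows with length squared."""
    side_scale = math.sqrt(area_scale)  # side length of a square floorplan
    return side_scale ** 2


area = footprint_scale(2)
grid_metal = grid_metal_scale(area)
wire_rc = cross_die_wire_rc_scale(area)
```

Under either assumption, the two-die case comes out at about half of the planar value, which is the same factor the slide reports.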

Result - Temperature Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.

Result - Frequency and Voltage Scaling
1. At the same performance, the voltage can be reduced by 18%, which yields a 46% power reduction.
2. At the same power, performance improves by 29%.
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.
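A rough feel for how an 18% voltage reduction translates into power can be had from the textbook dynamic-power relation, power proportional to V² · f. The sketch below is an assumption-laden illustration, not the method of the cited paper: the transcript does not state the paper's scaling model, so two simple cases are computed (frequency unchanged, and frequency scaled in step with voltage), both starting from the 15% power reduction that the slides attribute to the 3D floorplan.

```python
def relative_dynamic_power(voltage_scale: float,
                           frequency_scale: float,
                           baseline: float = 1.0) -> float:
    """Dynamic power relative to planar, using the textbook relation P ~ V^2 * f."""
    return baseline * voltage_scale ** 2 * frequency_scale


baseline_3d = 0.85     # from the slides: 15% power reduction from the 3D floorplan
voltage_scale = 0.82   # from the slides: 18% voltage reduction

# Case A (assumption): frequency left unchanged.
power_fixed_freq = relative_dynamic_power(voltage_scale, 1.0, baseline_3d)

# Case B (assumption): frequency scaled down in step with voltage.
power_scaled_freq = relative_dynamic_power(voltage_scale, voltage_scale, baseline_3d)
```

Case A leaves roughly 57% of the planar power and case B roughly 47%. How the slide's 46% figure relates to these values depends on the paper's own voltage and frequency scaling assumptions.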

New Circuit Design Inspiration for 3D Die-Stacking: Thermal Herding
1. Many data values do not need the full width to be represented.
[Figure: a 64-bit datum split into fields 63:48, 47:32, 31:16, 15:0]
2. Most instructions' data widths do not vary during program execution.
[Figure: R3 = R1 + R2, with R1, R2 and R3 each split into fields 63:48, 47:32, 31:16, 15:0]
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Thermal herding
1. Significance partitioning: the lower-order bits of the datapath (often the ones that switch most) stay closest to the heat sink.
[Figure: a 64-bit datum (63...0) split across four layers of 16 bits each: (15:0) least significant, (31:16), (47:32), (63:48)]
2. Data-width prediction: predict data widths and disable access to the higher-order bits when they are unused.
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.
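Data-width prediction can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, the predictor is a simple "same as last time" scheme indexed by instruction address rather than the predictor of the cited paper, and misprediction recovery is not modelled. A value counts as narrow when its upper 48 bits are just the sign extension of its low 16 bits, so only the layer nearest the heat sink would need to be accessed.

```python
from typing import Dict

WORD_BITS = 64
LOW_BITS = 16


def fits_in_low_bits(value: int, low_bits: int = LOW_BITS) -> bool:
    """True if the 64-bit two's-complement value is the sign extension of its low bits."""
    value &= (1 << WORD_BITS) - 1
    upper = value >> (low_bits - 1)  # sign bit of the low field plus all higher bits
    all_ones = (1 << (WORD_BITS - low_bits + 1)) - 1
    return upper == 0 or upper == all_ones


class LastWidthPredictor:
    """Predict 'narrow' for an instruction if its previous result was narrow."""

    def __init__(self) -> None:
        self._last_narrow: Dict[int, bool] = {}

    def predict_narrow(self, pc: int) -> bool:
        # Unknown instructions default to full width, which is always safe.
        return self._last_narrow.get(pc, False)

    def update(self, pc: int, result: int) -> None:
        self._last_narrow[pc] = fits_in_low_bits(result)


predictor = LastWidthPredictor()
pc = 0x400                                   # hypothetical instruction address
first_guess = predictor.predict_narrow(pc)   # nothing known yet
predictor.update(pc, 7 + 9)                  # a small result, e.g. R3 = R1 + R2
second_guess = predictor.predict_narrow(pc)
predictor.update(pc, 1 << 40)                # a result that needs the upper layers
third_guess = predictor.predict_narrow(pc)
```

In this run the predictor starts conservatively at full width, switches to narrow after seeing a small result, and falls back to full width once a wide result appears.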

Thermal Herding Adder
[Figure: a prefix-tree based planar adder and a thermal herding 3D adder]
Reference: [2] Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. Kiran Puttaswamy and Gabriel H. Loh. HPCA-13.

Conclusion
1. 3D die stacking increases transistor density by vertically integrating two or more die with a dense, high-speed interface.
2. Blocks within a microprocessor can be placed on different die to reduce block-to-block distance, latency, and power.
3. Disparate Si technologies can also be combined in a 3D die stack, such as DRAM stacked on a CPU.
4. Hot blocks can be folded across two strata to reduce power consumption while maintaining latency and power.
5. New circuit design tools need to be considered.

Reference
1. Black, B., Annavaram, M., Brekelbaum, N., DeVale, J., Jiang, L., Loh, G. H., McCauley, D., Morrow, P., Nelson, D. W., Pantuso, D., Reed, P., Rupley, J., Shankar, S., Shen, J., and Webb, C. 2006. Die Stacking (3D) Microarchitecture. In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture.
2. Rao Tummala. 2005. The SOP Technology for Convergent Electronic & Bio-electronic Systems. In First International Workshop on SOP, SIP, SOP Electronics Technologies.
3. Kiran Puttaswamy and Gabriel H. Loh. 2007. Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors. In HPCA-13.
4. Loh, G. H., Xie, Y., and Black, B. 2007. Processor Design in 3D Die-Stacking Technologies. IEEE Micro 27, 3 (May 2007).
5. YOLE Developpement Corp. Market trends for 3D stacking.

Backup Page

Example: Intel Core 2 Duo microprocessor, memory-stacked option
1. Reduces the bandwidth requirement by 3X
2. Reduces CPMA (cycles per memory access) by 13%
3. Reduces bus power by 0.5 W
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.

Example: Intel Core 2 Duo microprocessor
1. The thermal impact of stacking memory is not significant.
Reference: [1] Die Stacking (3D) Microarchitecture. Black, B. et al. Micro2006.

3D: Stacking of Bare Chips Reference: [2] The SOP Technology for Convergent Electronic & Bio-electronic Systems. Rao Tummala, SSET-1