QUANTIFYING THE RELATIONSHIP BETWEEN THE POWER DELIVERY NETWORK AND ARCHITECTURAL POLICIES IN A 3D-STACKED MEMORY DEVICE Manjunath Shevgoor, Niladrish.

Slides:



Advertisements
Similar presentations
Exploring 3D Power Distribution Network Physics
Advertisements

Memory Controller Innovations for High-Performance Systems
PERFORMANCE ANALYSIS OF MULTIPLE THREADS/CORES USING THE ULTRASPARC T1 (NIAGARA) Unique Chips and Systems (UCAS-4) Dimitris Kaseridis & Lizy K. John The.
1 Exploiting 3D-Stacked Memory Devices Rajeev Balasubramonian School of Computing University of Utah Oct 2012.
4/17/20151 Improving Memory Bank-Level Parallelism in the Presence of Prefetching Chang Joo Lee Veynu Narasiman Onur Mutlu* Yale N. Patt Electrical and.
Rethinking DRAM Design and Organization for Energy-Constrained Multi-Cores Aniruddha N. Udipi, Naveen Muralimanohar*, Niladrish Chatterjee, Rajeev Balasubramonian,
System Design Tricks for Low-Power Video Processing Jonah Probell, Director of Multimedia Solutions, ARC International.
Paul Falkenstern and Yuan Xie Yao-Wen Chang Yu Wang Three-Dimensional Integrated Circuits (3D IC) Floorplan and Power/Ground Network Co-synthesis ASPDAC’10.
Tiered-Latency DRAM: A Low Latency and A Low Cost DRAM Architecture
Citadel: Efficiently Protecting Stacked Memory From Large Granularity Failures Dec 15 th 2014 MICRO-47 Cambridge UK Prashant Nair - Georgia Tech David.
Noise Model for Multiple Segmented Coupled RC Interconnects Andrew B. Kahng, Sudhakar Muddu †, Niranjan A. Pol ‡ and Devendra Vidhani* UCSD CSE and ECE.
Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian,
June 20 th 2004University of Utah1 Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors Karthik Ramani Naveen Muralimanohar.
Lecture 12: DRAM Basics Today: DRAM terminology and basics, energy innovations.
Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers Manu Awasthi, David Nellans, Kshitij Sudan, Rajeev Balasubramonian,
Lecture 7: Power.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in partitioned architectures Rajeev Balasubramonian Naveen.
Author: D. Brooks, V.Tiwari and M. Martonosi Reviewer: Junxia Ma
ECE 510 Brendan Crowley Paper Review October 31, 2006.
University of Utah 1 The Effect of Interconnect Design on the Performance of Large L2 Caches Naveen Muralimanohar Rajeev Balasubramonian.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Low Voltage Low Power Dram
1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.
Spring 2007W. Rhett DavisNC State UniversityECE 747Slide 1 ECE 747 Digital Signal Processing Architecture SoC Lecture – Working with DRAM April 3, 2007.
1 Lecture 4: Memory: HMC, Scheduling Topics: BOOM, memory blades, HMC, scheduling policies.
UNDERSTANDING THE ROLE OF THE POWER DELIVERY NETWORK IN 3-D STACKED MEMORY DEVICES Manjunath Shevgoor, Niladrish Chatterjee, Rajeev Balasubramonian, Al.
TLC: Transmission Line Caches Brad Beckmann David Wood Multifacet Project University of Wisconsin-Madison 12/3/03.
1 University of Utah & HP Labs 1 Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 Naveen Muralimanohar Rajeev Balasubramonian.
Reducing Refresh Power in Mobile Devices with Morphable ECC
1 Reducing DRAM Latencies with an Integrated Memory Hierarchy Design Authors Wei-fen Lin and Steven K. Reinhardt, University of Michigan Doug Burger, University.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
Designing a Fast and Reliable Memory with Memristor Technology
Buffer-On-Board Memory System 1 Name: Aurangozeb ISCA 2012.
Yi-Lin, Tu 2013 IEE5011 –Fall 2013 Memory Systems Wide I/O High Bandwidth DRAM Yi-Lin, Tu Department of Electronics Engineering National Chiao Tung University.
02/21/2003 CART 1 On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories Rajagopalan Desikan, Charles R. Lefurgy, Stephen.
By Edward A. Lee, J.Reineke, I.Liu, H.D.Patel, S.Kim
1 Interconnect/Via. 2 Delay of Devices and Interconnect.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Power Integrity Test and Verification CK Cheng UC San Diego 1.
CS/EE 5810 CS/EE 6810 F00: 1 Main Memory. CS/EE 5810 CS/EE 6810 F00: 2 Main Memory Bottom Rung of the Memory Hierarchy 3 important issues –capacity »BellÕs.
Copyright © 2010 Houman Homayoun Houman Homayoun National Science Foundation Computing Innovation Fellow Department of Computer Science University of California.
Enabling Big Memory with Emerging Technologies Manjunath Shevgoor Enabling Big Memory with Emerging Technologies1.
High-Performance DRAM System Design Constraints and Considerations by: Joseph Gross August 2, 2010.
Parallelism-Aware Batch Scheduling Enhancing both Performance and Fairness of Shared DRAM Systems Onur Mutlu and Thomas Moscibroda Computer Architecture.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 6.1 EE4800 CMOS Digital IC Design & Analysis Lecture 6 Power Zhuo Feng.
Efficient Scrub Mechanisms for Error-Prone Emerging Memories Manu Awasthi ǂ, Manjunath Shevgoor⁺, Kshitij Sudan⁺, Rajeev Balasubramonian⁺, Bipin Rajendran.
1 Lecture 3: Memory Buffers and Scheduling Topics: buffers (FB-DIMM, RDIMM, LRDIMM, BoB, BOOM), memory blades, scheduling policies.
33 rd IEEE International Conference on Computer Design ICCD rd IEEE International Conference on Computer Design ICCD 2015 Improving Memristor Memory.
Simultaneous Multi-Layer Access Improving 3D-Stacked Memory Bandwidth at Low Cost Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Khan, Onur Mutlu.
1 Lecture: DRAM Main Memory Topics: DRAM intro and basics (Section 2.3)
1 Hardware Reliability Margining for the Dark Silicon Era Liangzhen Lai and Puneet Gupta Department of Electrical Engineering University of California,
Exploring the Rogue Wave Phenomenon in 3D Power Distribution Networks Xiang Hu 1, Peng Du 2, Chung-Kuan Cheng 2 1 ECE Dept., 2 CSE Dept. University of.
MICROPROCESSOR DESIGN1 IR/Inductive Drop Introduction One component of every chip is the network of wires used to distribute power from the input power.
Improving Multi-Core Performance Using Mixed-Cell Cache Architecture
Seth Pugsley, Jeffrey Jestes,
A High-Speed and High-Capacity Single-Chip Copper Crossbar
Multi-Story Power Distribution Networks for GPUs
Duckhwan Kim and Saibal Mukhopadhyay
Gwangsun Kim Niladrish Chatterjee Arm, Inc. NVIDIA Mike O’Connor
Lecture: Memory, Multiprocessors
Staged-Reads : Mitigating the Impact of DRAM Writes on DRAM Reads
Lecture: DRAM Main Memory
Lecture: DRAM Main Memory
An Automated Design Flow for 3D Microarchitecture Evaluation
Energy Efficient Power Distribution on Many-Core SoC
Aniruddha N. Udipi, Naveen Muralimanohar*, Niladrish Chatterjee,
Manjunath Shevgoor, Rajeev Balasubramonian, University of Utah
Lecture: Cache Hierarchies
A Case for Interconnect-Aware Architectures
Presentation transcript:

QUANTIFYING THE RELATIONSHIP BETWEEN THE POWER DELIVERY NETWORK AND ARCHITECTURAL POLICIES IN A 3D-STACKED MEMORY DEVICE Manjunath Shevgoor, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis University of Utah Aniruddha N. Udipi ARM R&D 1 Jung-Sik Kim DRAM Design Team, Samsung Electronics

3D-Stacked DRAM 2  3D-stacking is happening !  Cost of 3D-DRAM is critical for large scale adoption  Low-cost 3D-stacked DRAM has performance penalties stemming from power delivery constraints  Our goal: low-cost, high-performance 3D-DRAM Source: micron.com

Outline 3  IR-drop background  Quantify the effect of IR-Drop  IR-drop aware memory controller  IR-drop aware scheduling and data placement  Evaluation

Background 4 1.5V GND A B Wire Resistance 1.5V 1.2V Circuit Element Voltage along Wire A-B Only part of the supply voltage reaches the circuit elements This loss of Voltage over the Power Delivery Network (PDN) is called IR -Drop IR Drop can lead to correctness issues

The Power Delivery Network 5  Grid of wires which connect power sources to the circuits Source: Sani R. Nassif, Power Grid Analysis Benchmarks VDD VSS

3D stacking increases current density – Increased ‘I’ TSVs add resistance to the PDN – increased ‘R’ Navigate 8 TSV layers to reach the top die IR Drop in 3D DRAM 6 DIE 2 DIE 3 DIE 1 High IR Drop Low IR Drop

Static and Dynamic IR Drop 7  Static IR-Drop  Static current loads  PDN is reduced to a network of resistances  Dynamic IR-Drop  Considers circuit switching  Capacitive and inductive effects are considered  Total noise of 75mV can be tolerated  As a first step, this paper focuses on Static IR-Drop  Pessimistically assume a 75mV margin for static IR-drop

Reducing IR Drop 8  Reduce “I”  Control Activity on chip  Decreases Performance  Current limiting constraints already exist  DDR3 uses t FAW and t RRD  Recent work on PCM (Hay et al.) using Power Tokens to limit PCM current draw  These solutions use Temporal Constraints … more is required to handle IR drop in 3D DRAM

Reducing IR Drop 9  Reduce “R”  Make wires wider  Add more VDD/VSS bumps  Increases Cost Relationship between pin count and package cost Source: Dong el al. Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3-D ICs

Background Summary 10  Voltage drops across every PDN  3D DRAM has I and R  Reducing I and R has cost and performance penalties Explore architectural policies to manage IR Drop  Can provide high performance and low cost

Overview 11  IR Drop is not uniform across the stack  Different regions can support different activities  Avoid being constrained by worst case IR Drop  IR-Drop aware architectural policies  Memory Scheduling  Data Placement

Outline 12  IR-drop background  Quantify the effect of IR-Drop  IR-drop aware memory controller  IR-drop aware scheduling and data placement  Evaluation

Cost optimized DRAM Layout 13

DRAM Layout – Spatial Dependence 14 VDD on M1 on Layer 9 X Coordinate VDD Y Coordinate The TSV count is high enough to provide the necessary current and not suffer Electro- Migration

IR Drop Profile 15 Figures illustrate IR Drop when all banks in the 3D stack are executing ACT IR Drop worsens as the distance from the source increases Spatial Constraints are clearly needed Layer 2Layer 3Layer 4Layer 5 Layer 6Layer 7Layer 8Layer 9

IR Drop Aware Constraints 16  Quality of power delivery depends on location  Existing constraints are Temporal  Hence, augment with Spatial Constraints

Outline 17  IR-drop background  Quantify the effect of IR-Drop  IR-drop aware memory controller  IR-drop aware scheduling and data placement  Evaluation

Iso-IR Drop Regions 18 IR Drop worsens as distance from TSVs increases IR Drop worsens as distance from C4 bumps increases To reduce complexity, we define activity constraints per region A BanksB Banks C BanksD Banks

19 Region Based Constraints RD No Violations RD

20 Region Based Constraints RD Violation ! RD

21 Multi Region Constraints 21 RD No Violations RD

Multi Region Constraints 22 RD Violation ! RD

23 Region based Read constraints At least one Rd in Top Regions 8 Reads allowed No Top Region Reads16 Reads allowed Top Region Reads1-2 Reads allowed Bottom Region Reads4 Reads allowed

DRAM Currents 24 SymbolValue (mA) DescriptionConsumed By I DD0 66One bank Activate to Precharge Local Sense Amps, Row Decoders, and I/O Sense Amps I DD4R 235Burst Read Current Peripherals, Local Sense Amps, IO Sense Amps, Column Decoders I DD4W 171Burst Write Current Peripherals, IO Sense Amps, Column Decoders Source: Micron Data Sheet for 4Gb x16 part

Read Based constraints 25  To limit controller complexity, we define ACT, PRE and Write constraints in terms of Read  The Read-Equivalent is the min. number of ACT/PRE/WR that cause the same IR-Drop as the Read CommandRead Equivalent ACT2 PRE6 WR1

Outline 26  IR-drop background  Quantify the effect of IR-Drop  IR-drop aware memory controller  IR-drop aware scheduling and data placement  Evaluation

Controlling Starvation 27  As long as Bottom Regions are serving more than 8 Reads, Top Regions can never service a Read  Requests mapped to Top regions suffer  Prioritize Requests that are older than N* Avg. Read Latency (N is empirically determined to be 1.2 in our simulations) Die Stack Wide Constraint At least one Rd in Top Regions 8 Reads allowed No Top Region Reads 16 Reads allowed

Page Placement (Profiled) 28  Profile applications to find highly accessed pages  Map most accessed pages to the most IR Drop resistant regions (Bottom Regions)  The profile is divided into 8 sections. The 4 most accessed sections are mapped to Bottom regions  The rest are mapped to C_TOP, B_TOP, D_TOP, A_TOP, in that order

Page Placement (Dynamic) 29  Recent page activity determines migration candidates  Pages with highest total queuing delay are moved to bottom regions  Using page access count to promote pages can starve threads  Page access count is used to demote pages to top regions  Page migration is limited by Migration Penalty (10k/15M cycles)

Outline 30  IR-drop background  Quantify the effect of IR-Drop  IR-drop aware memory controller  IR-drop aware scheduling and data placement  Evaluation

Modeling Static IR Drop 31 Current consumed by each block is assumed to be distributed evenly over the block Current sources are used to model the current consumption Source: Sani R. Nassif, Power Grid Analysis Benchmarks

Methodology 32  HMC based memory system  Simics coupled with augmented USIMM  SPEC CPU 2006 multi-programmed CPU Configuration CPU8-core Out-of-Order CMP, 3.2 GHz L2 Unified Last Level Cache8MB/8-way, 10-cycle access Memory Configuration Total DRAM Capacity8 GB in 1 3D stack DRAM Configuration 2 16-bit uplinks, 1 16-bit 6.4 Gbps 32 banks/DRAM die, 16 vaults 8 DRAM dies/3D-stack tFAW honored on each die

Effect of Starvation Control on IPC x worse 1.6x worse

Effect of EPP on IPC 34 Within 1.2x of ideal

Impact on DRAM Latency 35 PPP 55% reduction EPP 38% reduction

Conclusions 36  Multiple voltage noise sources – we focus on static IR Drop in 3D DRAM  Multiple ways to cope with IR Drop – we focus on architectural policies – reduce cost and improve performance  We construct simple region based constraints for the memory controller  We introduce starvation and page placement policies  The memory controller manages both spatial and temporal aspects – our low-cost PDN achieves performance that is very close to that of the Ideal PDN

37 Thank You

Manjunath Shevgoor, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis University of Utah Aniruddha N. Udipi ARM R&D 38 Jung-Sik Kim DRAM Design Team, Samsung Electronics QUANTIFYING THE RELATIONSHIP BETWEEN THE POWER DELIVERY NETWORK AND ARCHITECTURAL POLICIES IN A 3D-STACKED MEMORY DEVICE