Enabling Big Memory with Emerging Technologies
Manjunath Shevgoor



Big Memory
DRAM capacity needs are increasing rapidly, driven by increased data gathering, data analytics, and in-memory databases.

Need More Capacity
Core count is doubling roughly every 2 years, while DIMM capacity is doubling only roughly every 3 years, so memory capacity per core is expected to drop every year.
[Sources: Kevin Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA'09; O. Mutlu, Memory Scaling: A Systems Architecture Perspective]

Possible Solutions
- 3D Stacking: increased current draw [MICRO'13]
- Many-Rank DRAM: high refresh power [under submission]
- Non-Volatile Memory (Memristor): sneak currents [ICCD'15, HPCA'12, NVMW'11, NVMW'15]

Thesis Statement
Memory capacity requirements are increasing at a very fast rate. Management of high currents is crucial for effective deployment of new technologies. This thesis hypothesizes that architecture/OS policies for data placement can help manage some of the problems posed by high currents.

Talk Outline
- Current Constraints in 3D DRAM
- Addressing Refresh Overheads in DRAM
- Improving Memristor Memory by Re-using Sneak Currents
- Conclusion and Future Work

IR-Drop in 3D DRAM [MICRO'13]

What is a Power Delivery Network?
A PDN is a grid of wires that connects the power source to the circuits. Some voltage is lost across every PDN; the voltage lost on the PDN is the IR drop. This work explores architectural policies to manage IR drop.
[Figure source: Sani R. Nassif, Power Grid Analysis Benchmarks]

IR Drop in 3D DRAM
- 3D stacking increases current density: increased 'I'
- TSVs add resistance to the PDN: increased 'R'
- Current must navigate 8 TSV layers to reach the top die
- Insufficient voltage leads to incorrect operation
[Figure: regions of high and low IR drop across the stack]
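The cumulative effect of the two bullets above (more 'I', more 'R') can be sketched with a toy model: current for every die above a given layer flows through that layer's TSVs, so upper dies see a larger accumulated drop. All numeric values here are hypothetical, not from the dissertation.

```python
# Minimal sketch of per-layer IR drop in a 3D stack, assuming a uniform
# TSV segment resistance and a uniform per-die current draw
# (resistance, current, and Vdd values are illustrative only).
def ir_drop_per_die(num_dies=8, tsv_resistance=0.5, die_current=0.05, vdd=1.2):
    """Return the supply voltage seen by each die, bottom to top.

    The current for all dies at or above a layer flows through that
    layer's TSV segment, so upper dies accumulate a larger IR drop.
    """
    voltages = []
    drop = 0.0
    for layer in range(num_dies):
        # Current through this TSV segment: every die at or above this layer.
        current = die_current * (num_dies - layer)
        drop += current * tsv_resistance
        voltages.append(vdd - drop)
    return voltages
```

Running the sketch shows the monotonic voltage loss toward the top die, which is exactly why the later slides place latency-critical pages in the bottom dies.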

Floor Plan and Quality of Power Delivery
Banks that are farther away from the TSVs suffer higher IR drop.
[Figure: voltage on the M1 metal layer of Layer 9, plotted over the die's X and Y coordinates]

IR Drop Varies Along a Die and Across the Stack
[Figure: IR-drop maps for Layers 2 through 9]

[Figure: stack with top 4 dies, bottom 4 dies, and a logic layer]
- Create constraints for iso-IR-drop regions
- Place critical pages in IR-drop-resistant regions
- IR-drop-oblivious page placement leads to a 47% performance degradation

Region-Based (Spatio-Temporal) Constraints
- Top region: 1-2 reads allowed per region
- Bottom region: 4 reads allowed per region
- At least one top-read in flight: 8 reads allowed per stack
- No top-reads in flight: 16 reads allowed per stack
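The constraints above can be expressed as a small admission check a memory controller might run before issuing a read. This is a sketch from the slide's limits only; the data structure and the choice of 2 as the top-region bound (the slide allows 1-2) are assumptions.

```python
# Sketch of the slide's spatio-temporal read constraints, assuming the stack
# is split into a "top" and "bottom" region. Limits are from the slide; the
# in-flight bookkeeping structure is hypothetical.
TOP_READS_PER_REGION = 2        # slide allows 1-2; upper bound used here
BOT_READS_PER_REGION = 4
STACK_LIMIT_WITH_TOP_READ = 8   # at least one top-read in flight
STACK_LIMIT_NO_TOP_READ = 16    # no top-reads in flight

def can_issue_read(region, in_flight):
    """in_flight maps 'top'/'bottom' to reads currently in flight."""
    total = in_flight['top'] + in_flight['bottom']
    # The stack-level limit tightens whenever a top-region read is active.
    has_top = in_flight['top'] > 0 or region == 'top'
    stack_limit = (STACK_LIMIT_WITH_TOP_READ if has_top
                   else STACK_LIMIT_NO_TOP_READ)
    if total >= stack_limit:
        return False
    # Region-level limit.
    per_region = (TOP_READS_PER_REGION if region == 'top'
                  else BOT_READS_PER_REGION)
    return in_flight[region] < per_region
```

The check captures the slide's key asymmetry: touching the IR-drop-prone top dies halves the stack-wide read budget.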

Dynamic Page Placement
- Pages with the highest total queuing delay are moved to the bottom regions
- Using raw page access counts to promote pages can starve threads; the scheduler ensures fairness
- Page migration is limited by the migration penalty (10k/15M cycles)
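The selection step described above can be sketched as follows. The epoch structure, migration cap, and data layout are hypothetical; only the ranking criterion (total queuing delay rather than raw access count) comes from the slide.

```python
# Sketch of the dynamic placement policy: each epoch, pick the pages with
# the highest accumulated queuing delay for migration to the bottom
# (IR-drop-resistant) regions. The per-epoch cap is a hypothetical stand-in
# for the slide's migration penalty budget.
import heapq

MIGRATIONS_PER_EPOCH = 4

def select_migrations(page_queuing_delay, already_bottom):
    """page_queuing_delay: page -> accumulated queuing delay this epoch.
    already_bottom: set of pages already resident in a bottom region."""
    candidates = {p: d for p, d in page_queuing_delay.items()
                  if p not in already_bottom}
    # Rank by total queuing delay, not raw access count, which the
    # slide notes can starve threads.
    return heapq.nlargest(MIGRATIONS_PER_EPOCH, candidates,
                          key=candidates.get)
```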

Results
The proposed placement performs within 20% of ideal.

Overview
- 3D Stacking: increased current draw [MICRO'13]
- Many-Rank DRAM: refresh overhead [under submission]
- Non-Volatile Memory (Memristor): sneak currents [ICCD'15, HPCA'12, NVMW'11, NVMW'15]

Re-Thinking Data Placement in Highly Ranked DRAM Systems

Refresh Power in DRAM
Command    Current (mA)
Act         67
Read       125
Write      125
Refresh    245
Refresh draws 96% more current than a read. There can be up to 4 ranks in a DIMM.
[Source: Micron 8GB DDR3L data sheet]
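The 96% figure follows directly from the data-sheet currents in the table above:

```python
# Quick arithmetic check of the claim that refresh draws 96% more current
# than a read, using the currents cited from the Micron data sheet above.
CURRENT_MA = {"Act": 67, "Read": 125, "Write": 125, "Refresh": 245}

extra = (CURRENT_MA["Refresh"] - CURRENT_MA["Read"]) / CURRENT_MA["Read"]
print(f"Refresh draws {extra:.0%} more current than Read")  # prints 96%
```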

[Figure: 8-core CMP with a memory controller driving Channels 1 and 2, each with Ranks 1-4]
Stagger refreshes across ranks to reduce peak power.

Increase in Refresh Time
tRFC grows with chip capacity. Fine-grained refresh (tRFC_2X, tRFC_4X) trades shorter refresh commands for shorter refresh intervals: 7.8 µs, 3.9 µs, and 1.95 µs, respectively.
[Table: tRFC, tRFC_2X, and tRFC_4X (ns) by chip capacity (GB)]

Effect of Staggered Refresh

Each Staggered Refresh Stalls Many Cores
[Figure: 8-core CMP, two channels with four ranks each; requests from threads T1-T3 are spread across Ranks 1-3, so requests stall when any rank refreshes]

Limiting the Spread: Address Mapping

Ideally
[Figure: the same 8-core CMP, but each thread's requests are confined to a single rank, so a refresh in one rank stalls only the threads mapped to it]

Rank-Assigned Page Mapping
[Figure (a): strict mapping of Threads 1-8 to Ranks 1-4 across Channels 1 and 2]
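The strict mapping in the figure can be sketched as a fixed thread-to-rank assignment at page-allocation time. The function names, the modulo assignment, and the free-frame structure are all illustrative assumptions; the slide specifies only that each thread's pages live in one rank.

```python
# Sketch of strict thread-to-rank page mapping, assuming 8 threads,
# 2 channels, and 4 ranks per channel as in the figure.
NUM_CHANNELS = 2
RANKS_PER_CHANNEL = 4

def home_rank(thread_id):
    """Map a thread to a fixed (channel, rank) so its pages stay together."""
    rank = thread_id % RANKS_PER_CHANNEL
    channel = (thread_id // RANKS_PER_CHANNEL) % NUM_CHANNELS
    return channel, rank

def place_page(thread_id, free_frames):
    """Allocate a frame from the thread's home rank, if any are free.

    free_frames maps (channel, rank) -> list of free frame numbers.
    Returns (channel, rank, frame), or None when the home rank is full
    (the relaxed scheme on a later slide would then spill elsewhere).
    """
    home = home_rank(thread_id)
    if free_frames.get(home):
        return (*home, free_frames[home].pop())
    return None
```

Under this mapping, a refresh in one rank stalls at most the two threads homed there, rather than all eight.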

[Figure: …% better than Staggered Refresh]

Limiting the Spread: Page Mapping
[Figure: Threads 1-8 mapped to Ranks 1-4 across Channels 1 and 2]

Relaxing Rank Assignment

Data Mapping
Combining address mapping and page mapping is 18.6% better than Staggered Refresh.

Overview
- 3D Stacking: increased current draw [MICRO'13]
- Many-Rank DRAM: refresh overhead [under submission]
- Non-Volatile Memory (Memristor): sneak currents [ICCD'15, HPCA'12, NVMW'11, NVMW'15]

Designing a Fast and Reliable Memory with Memristor Technology [ICCD'15, NVMW'15]

Background
- Memristors store data in the form of resistance
- A metal oxide is sandwiched between two electrodes
- The oxide is inherently non-conducting
- Creation of conductive filaments of oxygen vacancies reduces the resistance
[Source: Cong Xu et al., Modeling and Design Analysis of 3D Vertical Resistive Memory: A Low Cost Cross-Point Architecture, ASP-DAC 2014]

Voltage-Dependent Resistance
The resistance of a ReRAM cell is not constant: it decreases as the applied voltage increases. Each cell combines a selector in series with the memristor device.

[Figure: DRAM, PCM, and memristor cells, each at the intersection of a word line and a bit line]
The memristor cell achieves a cell size of 4F².

Cross-Point Structure
Because of the cell's non-linearity, a cell can be selected without an access transistor. Arrays can be layered vertically without resorting to 3D stacking.
[Figure: memristor cell = memristor in series with a selector]

Reading and Writing
[Figure: the selected cell is biased at V, half-selected cells at V/2, and unselected lines at 0 V; sneak currents flow through the half-selected cells]

Effects of I_leak

Effects of I_leak
- Decreases the voltage at the selected cell: increases write latency and can cause write failure
- Distorts the bit-line current: increases read complexity and decreases the read margin
- Limits the array size

Reading from the Crossbar Array
Step 1: Read the background current (I_leak) with the selected line held at V_read/2.

Reading from the Crossbar Array
Step 2: Raise the selected line to V_read and read the total current (I_read).

The state of the selected cell determines I_read - I_leak.
Read latency = tBG_READ + tREAD.

Proposal 1: Re-use the Value in the Sample-and-Hold Circuit
[Figure: sensing circuit with a sample-and-hold stage capturing the sneak current; switches S1/S2 and the P_acc/P_prech control signals select between V_read and V_read/2]

Reusing the Sneak Current
[Figure: sneak current (µA) as a function of the row and column of the accessed cell]

Re-Use the Sneak Current
For a read to the same column, the background-sensing step can be skipped: the first read costs tBG_READ + tREAD, while subsequent reads cost only tREAD.
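The two-step read and the sneak-current reuse above can be sketched together in a toy model. The current magnitudes and timing values are hypothetical; only the structure (sense I_leak, subtract it from I_read, skip the first step on a same-column hit) comes from the slides.

```python
# Sketch of the two-step crossbar read: sense the background sneak current
# first, then subtract it from the total current to recover the selected
# cell's state. All currents (uA) and latencies (ns) are illustrative.
I_CELL_SET = 10.0    # current through a low-resistance (set) cell
I_CELL_RESET = 1.0   # current through a high-resistance (reset) cell
T_BG_READ = 50       # ns, background-current sensing step
T_READ = 50          # ns, main read step

def crossbar_read(i_leak, cell_is_set, reuse_background=False):
    """Return (decoded_bit, latency_ns) for one read.

    With reuse_background=True, the I_leak value already held in the
    sample-and-hold circuit is reused and the first step is skipped.
    """
    i_read = i_leak + (I_CELL_SET if cell_is_set else I_CELL_RESET)
    # The cell state is recovered from the difference I_read - I_leak.
    bit = 1 if (i_read - i_leak) > (I_CELL_SET + I_CELL_RESET) / 2 else 0
    latency = T_READ if reuse_background else T_BG_READ + T_READ
    return bit, latency
```

Note that the decode works even when I_leak dwarfs the cell current, which is the point of sensing it explicitly; reuse then halves the latency of same-column reads.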

Impact of Cell Location

[Figure: array with word-line drivers and a bit-line mux]
- Increased error rates

[Figure: a cache line striped across Arrays 1-512, with bit i of every line stored in array i]
The default mapping leads to some lines with a high error rate.

Proposal 2: Stagger the Array Mapping
[Figure: default mapping vs. proposed mapping, where the Nth bit of each cache line is rotated across Arrays 0-3]
Staggering yields a 30X reduction in the probability of a single-bit error.
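The rotation in the figure can be sketched as a one-line change to the bit-to-array mapping. The array count and function names are illustrative; the idea, per the slide, is that no cache line concentrates all of its bits in the same error-prone array position.

```python
# Sketch of the staggered array mapping: rotate each cache line's
# bit-to-array assignment by the line index so error-prone positions are
# spread across lines rather than concentrated in a few unlucky ones.
NUM_ARRAYS = 512

def default_array(line, bit):
    # Default: bit b of every cache line always lives in array b.
    return bit % NUM_ARRAYS

def staggered_array(line, bit):
    # Proposed: rotate the assignment by the line index.
    return (bit + line) % NUM_ARRAYS
```

The rotation is a bijection for every line, so each line still occupies every array exactly once; only which bit lands at the bad position changes.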

Performance vs. Baseline

Exploring Address Mapping

Summary of Dissertation
- 3D Stacking: increased current draw [MICRO'13]; addressed with spatio-temporal constraints
- Many-Rank DRAM: refresh overhead [under submission]; addressed by re-thinking data placement
- Non-Volatile Memory (Memristor): memory latencies [ICCD'15, HPCA'12, NVMW'11, NVMW'15]; addressed by re-using sneak currents

Conclusions
- 3D Stacking: IR-drop constraints
- Many Rank: rank assignment
- Memristor: re-use sneak currents

Future Work
- Mitigating the Rising Cost of Process Variation in 3D DRAM
- PDN-Aware Refresh Cycle Time for 3D DRAM
- Addressing Long Write Latencies in Memristor-Based Memory

Other Projects and Publications
- Efficiently Prefetching Complex Address Patterns, MICRO'15
- USIMM: The Utah Simulated Memory Module, used for the Memory Scheduling Championship
- Efficient Scrub Mechanisms for Error-Prone Emerging Memories, HPCA'12
- Accelerating Critical Word Access using Heterogeneous Memory, MICRO'12
- Avoiding Information Leakage in the Memory Controller, MICRO'15

Acknowledgements
- Rajeev
- Ashwini, parents
- Al, Erik, Naveen, Ken
- Chris Wilkerson, Zeshan Chishti
- Utah Arch team-mates
- Karen, Ann

Thank You

Thesis Overview
- 3D Stacking: increased current density [MICRO'13]
- Many Rank: high refresh current [under submission]
- Non-Volatile Memory (Memristor): sneak currents [ICCD'15, NVMW'11,15]
Approach: analyze the impact of currents and the resulting performance loss; mitigate via data placement.

Comparisons to Prior Work

[Figure: crossbar array of cells at word-line/bit-line intersections, biased at 0, V/2, and V; word-line voltages V_W1 through V_WNM; bit-line mux]
Bit-line and word-line resistances eat into the cell voltage.

Percentage of Refreshes Stalling a Thread

Memory Latency

Memristor Read Power

[Figure: prefetcher organization; last-level cache misses from Cores 1-8 feed per-core Delta History Tables and shared Delta Prediction Tables, with prediction feedback: "See a delta? Predict a delta!"]

Sneak-path currents can distort I_read.
[Figure: crossbar biased at V_read, showing the I_read and I_leak paths]

Sneak Currents

Compress to Reduce Write Latency
[Figure: under the proposed mapping, a cache line compressed to 50% occupies half as many arrays, so fewer bits must be written]
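The interaction between compression and the staggered mapping can be sketched as follows: a compressed line stores fewer bits, touches fewer arrays, and therefore finishes its (slow) memristor writes sooner. The array count, cost model, and function names are illustrative assumptions.

```python
# Sketch of how compression shortens writes under the rotated (staggered)
# mapping, assuming a write touches one array per stored bit and
# compression halves the bit count (values are illustrative).
NUM_ARRAYS = 512

def arrays_to_write(line, num_bits):
    """Arrays touched when storing num_bits of cache line `line`
    under the rotated mapping."""
    return [(b + line) % NUM_ARRAYS for b in range(num_bits)]

def write_cost(line, compressed_ratio=1.0, full_bits=512):
    # Fewer stored bits -> fewer arrays written -> lower write latency.
    bits = int(full_bits * compressed_ratio)
    return len(arrays_to_write(line, bits))
```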


Summary
- With great density come a few challenges
- Sneak currents limit array size, complicate reads, delay writes, and affect reliability
- The background current can be reused
- Reliability can be improved at the cost of write latency
- Compression can reduce write latency
- Results: 8.3% performance improvement; 30X reduction in multi-bit error probability

Column Hit Rate