MEMCON: Detecting and Mitigating Data-Dependent Failures by Exploiting Current Memory Content. Samira Khan, Chris Wilkerson, Zhe Wang, Alaa Alameldeen, Donghyuk Lee, Onur Mutlu

Presentation transcript:

MEMCON: Detecting and Mitigating Data-Dependent Failures by Exploiting Current Memory Content
Samira Khan, Chris Wilkerson, Zhe Wang, Alaa Alameldeen, Donghyuk Lee, Onur Mutlu
University of Virginia, Intel, Nvidia, ETH Zurich

VISION: SYSTEM-LEVEL DETECTION AND MITIGATION
[Diagram: Unreliable DRAM Cells, Detect and Mitigate, Reliable System]
Online profiling: detect and mitigate errors after the system has become operational.

BENEFITS OF ONLINE PROFILING
1. Reliable technology scaling. Improves yield, reduces cost, and enables scaling. Vendors can make cells smaller without a strong reliability guarantee.
2. Improved performance and energy efficiency. Reduce the refresh count by using a lower refresh rate (LO-REF), but keep a higher refresh rate (HI-REF) for rows with faulty cells.

CHALLENGE IN DETECTION
How can data-dependent failures be detected when we do not even know which cells are neighbors?
[Figure labels: SCRAMBLED MAPPING; X-?, X, X+?]

FAILURE DETECTION IS HARD DUE TO INTERMITTENT FAILURES
Some cells can fail depending on the data stored in neighboring cells.
[Figure label: NO DATA-DEPENDENT FAILURE]

HOW TO DETECT DATA-DEPENDENT FAILURES?
Test with a specific data pattern in neighboring cells.
[Figure labels: LINEAR MAPPING: X-1, X, X+1 (L, D, R); SCRAMBLED: X-4, X+2; NOT EXPOSED TO THE SYSTEM]

GOAL
Detect data-dependent failures without knowledge of the DRAM internal address mapping.

MEMCON: MEMORY CONTENT-BASED DETECTION AND MITIGATION
[Diagram: Unreliable DRAM Cells with Program Content; Simultaneous Detection and Execution]
Detection is based on the current memory content of running applications.

NO NEED TO DETECT EVERY POSSIBLE FAILURE
Need to detect and mitigate only with the current content.
List of failures: Current content, Cell A
[Figure label: Application]

CURRENT DETECTION MECHANISM
[Diagram: Unreliable DRAM Cells, Initial Failure Detection and Mitigation, Applications in Execution]
Detect every possible failure with all content before execution.
List of failures: Pattern x, Cell A; Pattern y, Cell B; Pattern z, Cell C; … (all possible failing cells)

MEMCON: HIGH-LEVEL VISION
No initial detection and mitigation.
Start running the application with a high refresh rate (HI-REF).
Detect failures with the current memory content.
If no failure is found, use a low refresh rate (LO-REF).

MEMCON: COST-BENEFIT ANALYSIS
Cost: extra memory accesses to read and write rows.
Benefit: if no failure is found, the refresh rate can be reduced.
[Figure labels: Avg Cost; HI-REF 16 ms; LO-REF 64 ms; Testing; Time; Cost; t1, t2, t3; Frequent Testing; Selective Testing]
Initiate a test only when the cost can be amortized.

MEMCON: REDUCTION IN REFRESH COUNT
On average, a 71% reduction in refresh count, very close to the upper bound of 75%.
[Figure label: UPPER BOUND]

The longer the elapsed time after a write, the longer the write interval.
Write intervals follow a Pareto distribution.
If the interval is already 1024 ms long, the probability that the remaining interval is greater than 1024 ms is on average 76%.

What is the write interval that can amortize the cost?
MinWriteInterval: 864 ms

MEMCON: PERFORMANCE IMPROVEMENT
Leads to significant performance improvement.
[Figure labels: Single-Core; Four-Core]

WRITE INTERVAL PREDICTION
After a write, wait for a CIL where P(RIL > 1024 ms) is high.
If idle, predict that the interval will last more than 1024 ms.
[Figure labels: Write Requests; Time; Write Interval Length; Wait for 1024 ms; Expected RIL > 1024 ms; How much longer?]

MEMCON selectively initiates testing when the write interval is long enough to amortize the cost of testing.
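
The arithmetic behind the refresh and write-interval slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The function names, the Pareto shape parameter `alpha`, and the scale `x_min_ms` are hypothetical and do not come from the slides. CIL and RIL are read here as the current and remaining interval lengths, which is an interpretation of the slide text. The only values taken from the slides are 16 ms (HI-REF), 64 ms (LO-REF), and the 1024 ms wait.

```python
"""Sketch of write-interval-based test scheduling (illustrative assumptions only)."""


def refresh_reduction_upper_bound(hi_ref_ms: float, lo_ref_ms: float) -> float:
    """Fraction of refreshes avoided if every row moved from HI-REF to LO-REF."""
    return 1.0 - hi_ref_ms / lo_ref_ms


def pareto_remaining_survival(
    elapsed_ms: float, remaining_ms: float, alpha: float, x_min_ms: float = 1.0
) -> float:
    """P(interval > elapsed + remaining | interval > elapsed) under a Pareto law.

    For a Pareto distribution with scale x_min and shape alpha,
    P(X > x) = (x_min / x) ** alpha for x >= x_min, so the conditional
    survival simplifies to (elapsed / (elapsed + remaining)) ** alpha.
    alpha and x_min_ms are hypothetical parameters, not values from the slides.
    """
    if elapsed_ms < x_min_ms:
        raise ValueError("elapsed_ms must be at least x_min_ms")
    if remaining_ms < 0 or alpha <= 0:
        raise ValueError("remaining_ms must be >= 0 and alpha must be > 0")
    return (elapsed_ms / (elapsed_ms + remaining_ms)) ** alpha


def should_start_test(ms_since_last_write: float, wait_ms: float = 1024.0) -> bool:
    """Start testing a row only after it has been write-idle for wait_ms."""
    return ms_since_last_write >= wait_ms


if __name__ == "__main__":
    # Refresh periods from the cost-benefit slide: 16 ms HI-REF, 64 ms LO-REF.
    print(refresh_reduction_upper_bound(16.0, 64.0))
    # Hypothetical shape parameter, chosen only for illustration.
    print(pareto_remaining_survival(1024.0, 1024.0, alpha=0.4))
    print(should_start_test(ms_since_last_write=1500.0))
```

Refreshing every 64 ms instead of every 16 ms removes three of every four refreshes, which matches the 75% upper bound on the slide. Under the Pareto assumption, the chance that an interval continues depends on the ratio of elapsed to total time, so a longer idle period makes a long remaining interval more likely. With the hypothetical shape of 0.4, the conditional probability for 1024 ms elapsed and 1024 ms remaining is roughly 0.76. That shape is chosen only for illustration and is not reported in the slides.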