Presentation transcript:

Start-Gap: Low-Overhead Near-Perfect Wear Leveling for Main Memories (MICRO-2009). Moinuddin Qureshi, John Karidis, Michele Franceschini, Viji Srinivasan, Luis Lastras, Bulent Abali. IBM T. J. Watson Research Center, Yorktown Heights, NY. © 2007 IBM Corporation.

2 Introduction: Lifetime-Limited Memories. Emerging memory technologies such as PCM are candidates for main memory. Reasons: scalability, leakage power savings, density, etc. Challenge: each cell can endure only millions of writes → limited lifetime. With uniform write traffic, system lifetime ranges from 4-20 years. (Figure: estimated lifetime per workload, roughly 4 to 16 years.)

3 Problem: Non-Uniformity in Writes. Writes are heavily non-uniform: fewer than 10% of lines incur more than 90% of the write traffic. (Figure: writes per line for a database workload, with the average marked; writes occur on eviction from a 256MB DRAM cache.)

4 Expected Lifetime with Non-Uniform Writes. Normalized Endurance = (num. writes before system failure / num. writes before failure with uniform writes) x 100%. Even with 64K spare lines, the baseline achieves only 5% of the ideal lifetime, i.e., 20x lower. (Figure: normalized endurance for the baseline without spares and the baseline with 64K spare lines.)

5 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

6 Existing Proposals: Table-Based Wear Leveling. Wear leveling: make writes uniform by remapping frequently written lines. Studied extensively for flash memories; almost all proposals are table based. Example write-tracking table:

  Line Addr   Lifetime Count   Period Count
  A           99K  (Low)       1K (Low)
  B           100K (Med)       3K (High)
  C           101K (High)      2K (Med)

Indirection table mapping physical address to PCM address:

  Line Addr   Remap Addr
  A           C
  B           A
  C           B
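For context, here is a minimal C sketch (not from the slides or the paper) of the per-line state and indirection that such a table-based scheme implies; the struct and function names are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative per-line entry for a table-based wear-leveling scheme.
   Field and type names are assumptions for this sketch, not taken from
   any specific proposal. */
typedef struct {
    uint32_t remap_addr;     /* PCM line this logical line currently maps to */
    uint32_t lifetime_count; /* total writes seen by the mapped PCM line     */
    uint32_t period_count;   /* writes seen in the current remapping period  */
} wl_entry_t;

/* Every access pays an indirection through the table (typically held in
   DRAM/eDRAM, which is where the latency overhead comes from). */
static inline uint32_t wl_lookup(const wl_entry_t *table, uint32_t line_addr)
{
    return table[line_addr].remap_addr;
}

/* On a write, update the counters; a background policy would periodically
   swap hot and cold lines and patch remap_addr accordingly. */
static inline void wl_on_write(wl_entry_t *table, uint32_t line_addr)
{
    table[line_addr].lifetime_count++;
    table[line_addr].period_count++;
}
```

With one entry per line, such a table grows to tens of megabytes for a multi-GB PCM, which is the overhead the next slide quantifies.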

7 Disadvantages of Table-Based Methods. Overheads: (1) area of several (tens of) megabytes; (2) indirection latency (table kept in eDRAM/DRAM). The area overhead can be reduced by tracking more lines per table entry (larger regions), but this reduces effectiveness (e.g., Line0 of a region may always be the one written) and requires support for swapping large memory regions (complex). Our goal: a wear-leveling algorithm that avoids the storage, latency, and complexity of table-based methods and still achieves lifetime close to ideal.

8 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

9 Start-Gap Wear Leveling. Two registers (Start and Gap) plus one line (GapLine) to support movement. Move the GapLine once every 100 writes to memory. Address mapping: PCMAddr = (Start + Addr) mod N; if (PCMAddr >= Gap) PCMAddr++. (Figure: lines A, B, C, D in memory with the Start pointer and the moving GapLine.) Storage overhead: less than 8 bytes (the GapLine is taken from spare lines). Latency: two additions, no table lookup. Write overhead: one extra write every 100 writes → about 1%.
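A minimal C sketch of the mechanism described on this slide: the two-addition mapping plus a gap movement every 100 writes. The array sizes, the write interface, and the wrap-around handling are illustrative assumptions; the slide itself gives only the mapping formula and the movement rate.

```c
#include <stdint.h>
#include <string.h>

#define N          1024      /* number of logical lines (small, illustrative) */
#define LINE_BYTES 256       /* line size in bytes (assumed)                  */
#define PSI        100       /* gap moves once every PSI demand writes        */

static uint8_t  pcm[N + 1][LINE_BYTES];  /* N data lines + 1 gap line         */
static uint32_t start_reg = 0;           /* Start register                    */
static uint32_t gap_reg   = N;           /* Gap register: location of GapLine */
static uint32_t write_cnt = 0;

/* Address mapping from the slide: two additions, no table lookup. */
static uint32_t start_gap_map(uint32_t addr)
{
    uint32_t pcm_addr = (start_reg + addr) % N;
    if (pcm_addr >= gap_reg)
        pcm_addr++;                      /* skip over the gap line            */
    return pcm_addr;
}

/* Move the gap one step: copy its (circular) neighbor into the gap location,
   then make that neighbor the new gap. When the gap wraps back to the top,
   the whole address space has rotated by one line, so Start advances. */
static void gap_movement(void)
{
    uint32_t from = (gap_reg == 0) ? N : gap_reg - 1;
    memcpy(pcm[gap_reg], pcm[from], LINE_BYTES);
    gap_reg = from;
    if (gap_reg == N)
        start_reg = (start_reg + 1) % N;
}

/* Demand write path: one extra line copy every PSI writes (~1% overhead).
   Reads would use the same start_gap_map() translation. */
void start_gap_write(uint32_t addr, const uint8_t *data)
{
    memcpy(pcm[start_gap_map(addr)], data, LINE_BYTES);
    if (++write_cnt == PSI) {
        write_cnt = 0;
        gap_movement();
    }
}
```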

10 Results for Start-Gap. On average, Start-Gap gets 53% normalized endurance: 10x better than the baseline, but still about 2x lower than ideal. Why?

11 Spatial Correlation in Heavily Written Regions. Start-Gap moves a line only to its neighbor → if heavily written regions are spatially close, Start-Gap may keep moving hot lines onto other hot lines. (Figure: write peaks are localized, shown for FFT and db1.) If the address space is randomized, hot regions will be spread uniformly.

12 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

13 Randomized Start-Gap. Address flow: Physical Address → Static Randomizer → Randomized Address → Start-Gap Mapping → PCM Address. The randomizer is a one-to-one mapping → an invertible function, configured at design/boot time; it spreads hot lines across the PCM address space. A minor change can support page-mode memory. The randomizer is OS-unaware.

14 Efficient Address Space Randomization. Two proposals, each needing very little hardware: (1) Random Invertible Binary (RIB) matrix: randomized address c = B x a, where B is a random invertible binary matrix and the multiplication is over GF(2) (the slide shows a 4x4 example, [c0 c1 c2 c3]^T = B x [a0 a1 a2 a3]^T); 85 bytes of storage, 1-cycle latency. (2) Feistel network (from cryptography): 5 bytes of storage, 3-cycle latency.
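A sketch of the RIB-matrix idea, assuming a 26-bit line address (a 26x26-bit matrix is roughly where the 85-byte figure comes from). Each output bit is a GF(2) inner product of the input address with one matrix row; the row values below are placeholders that a real design would fill with a randomly chosen invertible matrix at design or boot time.

```c
#include <stdint.h>

#define ADDR_BITS 26   /* assumed line-address width; a 26x26-bit matrix is
                          about 85 bytes of storage, matching the slide      */

/* One matrix row per output bit, packed into a word. Placeholder values:
   a real design loads a randomly generated *invertible* binary matrix. */
static uint32_t rib_row[ADDR_BITS];

/* Randomized address c = B x a over GF(2): bit i of the output is the
   parity of (row_i AND addr), i.e., an AND followed by an XOR-reduction. */
uint32_t rib_randomize(uint32_t addr)
{
    uint32_t out = 0;
    for (int i = 0; i < ADDR_BITS; i++) {
        uint32_t bit = (uint32_t)__builtin_parity(rib_row[i] & addr); /* GCC/Clang builtin */
        out |= bit << i;
    }
    return out;
}
```

In the randomized design, the PCM address would then come from feeding this result into the Start-Gap mapping (e.g., start_gap_map(rib_randomize(addr)) in the earlier sketch); in hardware the two steps together are roughly an XOR network plus two small adders.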

15 Results for Randomized Start-Gap. Randomized Start-Gap achieves 97% of ideal lifetime while incurring a total storage overhead of 13 bytes.

16 Analytical Model for Randomized Start-Gap. We developed a simple analytical model that uses the variance in write traffic across lines to compute normalized endurance (details in the paper). The lifetime predicted by the analytical model matches the simulated result very well (97% vs. 96.8%).

17 Comparison with Table-Based Methods. Randomized Start-Gap achieves lifetime similar to the hardware-intensive version of table-based wear leveling while avoiding several tens of cycles of latency overhead. (Figure: normalized endurance for Baseline, TBWL-640MB with 1 line per region, TBWL-1.25MB with 128KB regions, and Randomized Start-Gap with 13 bytes of storage.)

18 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

19 Security Challenge in Lifetime-Limited Memories. What if an adversary knows about the write endurance limit? Repeat Address Attack (RAA): repeatedly write to the same line. An RAA can cause a line failure in less than one minute: Time to 1-line failure (seconds) = Endurance x (CyclesPerWrite / CyclesPerSec); with an endurance of 2^25 writes this works out to roughly 32 seconds. Both the baseline and randomized Start-Gap suffer from this attack. Even table-based wear leveling (the practical version) suffers.
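A back-of-the-envelope version of the slide's calculation. The endurance of 2^25 writes comes from the slide; the per-write cycle count and clock rate below are assumptions chosen only to land near the quoted ~32 seconds.

```c
#include <stdio.h>

int main(void)
{
    double endurance        = 1u << 25;  /* writes a cell endures (2^25, from the slide) */
    double cycles_per_write = 1000.0;    /* assumed back-to-back write latency in cycles */
    double cycles_per_sec   = 1e9;       /* assumed 1 GHz controller clock               */

    /* Time to 1-line failure = Endurance x CyclesPerWrite / CyclesPerSec */
    double seconds = endurance * cycles_per_write / cycles_per_sec;
    printf("time to 1-line failure under RAA: %.1f s\n", seconds); /* ~33.6 s */
    return 0;
}
```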

20 Security-Aware Wear Leveling. Solution: divide memory into regions, with one Start-Gap instance per region. The region size is chosen so that each line in a region is guaranteed to move at least once within every "Endurance" writes to that region: NumLinesInRegion < Endurance / WritesPerGapMovement. We use 256K lines per region (256 regions); area overhead is under 1.5KB. An RAA now takes about 3-4 months to cause a failure. With delayed writes (in the paper), time to failure ranges in years.
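A quick check of the sizing rule above, assuming the same 2^25 endurance as the previous slide; the 100 writes per gap movement and 256K lines per region are from the slides.

```c
#include <stdio.h>

int main(void)
{
    unsigned long endurance          = 1ul << 25;     /* assumed cell endurance     */
    unsigned long writes_per_gapmove = 100;           /* gap moves every 100 writes */
    unsigned long lines_per_region   = 256ul * 1024;  /* 256K lines per region      */

    /* Rule: NumLinesInRegion < Endurance / WritesPerGapMovement */
    unsigned long limit = endurance / writes_per_gapmove;   /* about 335K lines */
    printf("limit = %lu lines per region, chosen = %lu -> %s\n",
           limit, lines_per_region,
           lines_per_region < limit ? "OK" : "too large");
    return 0;
}
```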

21 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

22 Summary. Limited endurance poses lifetime and security challenges. Table-based wear leveling has area and latency overheads. Start-Gap: cost-effective wear leveling with two registers. Randomized Start-Gap: 97% of ideal endurance with 13 bytes of storage. We took a first step towards making PCM systems secure against malicious attacks (RAA); this motivates more research.

23 Advertisement. HPCA 2010 Tutorial: Phase Change Memory: A Systems Perspective. Organizers: Dr. Moinuddin K. Qureshi (IBM Research), Prof. Sudhanva Gurumurthi (University of Virginia), Dr. Bipin Rajendran (IBM Research). Date: Jan 9, 2010 (half day).

24 Backup Slides

25 Supporting DRAM Page Mode with Start-Gap. Randomization must be done at DRAM-page granularity instead of line granularity.

26 Lifetime Under RAA Attack. An RAA will now take about 3-4 months to cause a failure. With delayed writes (in the paper), the time required would range in years. (Figure: time to failure under RAA; the chart shows values of about 1 minute, 1 week, and 4 months across the configurations.)

27 Outline: Problem, Background on Wear Leveling, Start-Gap Wear Leveling, Randomized Start-Gap, Security Considerations, Summary.

28 Spare Lines