Use of PCM in Computer Systems: an End-to-End Exploration Sangyeun Cho Computer Science Department University of Pittsburgh We need V.

Slides:

Advertisements

Similar presentations

Flash storage memory and Design Trade offs for SSD performance

Advertisements

A Search Memory Substrate for High Throughput and Low Power Packet Processing Sangyeun Cho, Michel Hanna and Rami Melhem Dept. of Computer Science University.

Thank you for your introduction.

A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.

Bio Michel Hanna M.S. in E.E., Cairo University, Egypt B.S. in E.E., Cairo University at Fayoum, Egypt Currently is a Ph.D. Student in Computer Engineering.

Myoungsoo Jung (UT Dallas) Mahmut Kandemir (PSU)

Operating Systems ECE344 Ashvin Goel ECE University of Toronto Disks and RAID.

1 Lecture 6: Chipkill, PCM Topics: error correction, PCM basics, PCM writes and errors.

Priority Research Direction (I/O Models, Abstractions and Software) Key challenges What will you do to address the challenges? – Develop newer I/O models.

Phase Change Memory What to wear out today? Chris Craik, Aapo Kyrola, Yoshihisa Abe.

1 Lecture 15: DRAM Design Today: DRAM basics, DRAM innovations (Section 5.3)

1 Lecture 16: Virtual Memory Today: DRAM innovations, virtual memory (Sections )

CAFO: Cost Aware Flip Optimization for Asymmetric Memories RAKAN MADDAH *, SEYED MOHAMMAD SEYEDZADEH AND RAMI MELHEM COMPUTER SCIENCE DEPARTMENT UNIVERSITY.

1 Lecture 14: DRAM, PCM Today: DRAM scheduling, reliability, PCM Class projects.

11/10/2005Comp 120 Fall November 10 8 classes to go! questions to me –Topics you would like covered –Things you don’t understand –Suggestions.

Solid-State Drive Ding Ruogu Kong Liang. A solid-state drive (SSD) is a data storage device that uses solid-state memory to store persistent data.

Introduction to Database Systems 1 The Storage Hierarchy and Magnetic Disks Storage Technology: Topic 1.

Solid State Drive Feb 15. NAND Flash Memory Main storage component of Solid State Drive (SSD) USB Drive, cell phone, touch pad…

Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.

1 Towards Scalable and Energy-Efficient Memory System Architectures Rajeev Balasubramonian School of Computing University of Utah.

CERN openlab Open Day 10 June 2015 KL Yong Sergio Ruocco Data Center Technologies Division Speeding-up Large-Scale Storage with Non-Volatile Memory.

Slide 1 Windows PC Accelerators Reporter ：吳柏良. Slide 2 Outline l Introduction l Windows SuperFetch l Windows ReadyBoost l Windows ReadyDrive l Conclusion.

File system support on Multi Level Cell (MLC) flash in open source April 17, 2008 Kyungmin Park Software Laboratories Samsung Electronics.

ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.

Storage Class Memory Architecture for Energy Efficient Data Centers Bruce Childers, Sangyeun Cho, Rami Melhem, Daniel Mossé, Jun Yang, Youtao Zhang Computer.

/38 Lifetime Management of Flash-Based SSDs Using Recovery-Aware Dynamic Throttling Sungjin Lee, Taejin Kim, Kyungho Kim, and Jihong Kim Seoul.

Sangyeun Cho Hyunjin Lee

SOLID STATE DRIVES By: Vaibhav Talwar UE84071 EEE(5th Sem)

NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. NON.

Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.

University of Pittsburgh Memorage: Emerging Persistent RAM based Malleable Main Memory and Storage Architecture Juyoung Jung and Sangyeun Cho Computer.

2010 IEEE ICECS - Athens, Greece, December1 Using Flash memories as SIMO channels for extending the lifetime of Solid-State Drives Maria Varsamou.

Lecture 16: Storage and I/O EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.

Selling the Storage Edition for Oracle November 2000.

RDIS: A Recursively Defined Invertible Set Scheme to Tolerate Multiple Stuck-At Faults in Resistive Memory Rami Melhem, Rakan Maddah and Sangyeun cho Computer.

I/O Computer Organization II 1 Introduction I/O devices can be characterized by – Behavior: input, output, storage – Partner: human or machine – Data rate:

1 CloudVS: Enabling Version Control for Virtual Machines in an Open- Source Cloud under Commodity Settings Chung-Pan Tang, Tsz-Yeung Wong, Patrick P. C.

Computer Architecture Lecture 32 Fasih ur Rehman.

Esther Spanjer, M-Systems. Competing Technologies: HDD vs Flash Flash Advantages Volume/GB Mass/GB Shock resistance Power Consumption HDD Advantages Total.

“NVM Duet: Unified Working Memory and Persistent Store Architecture”

Literature Review on Emerging Memory Technologies

Emerging Non-volatile Memories: Opportunities and Challenges

Computer Organization Yasser F. O. Mohammad 1. 2 Lecture 1: Introduction Today’s topics:  Why computer organization is important  Logistics  Modern.

What is it and why do we need it? Chris Ward CS147 10/16/2008.

Tackling I/O Issues 1 David Race 16 March 2010.

Transactional Flash V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008 Shimin Chen Big Data Reading Group.

Computer Architecture Chapter (5): Internal Memory

대용량 플래시 SSD의 시스템 구성, 핵심기술 및 기술동향

XIP – eXecute In Place Jiyong Park. 2 Contents Flash Memory How to Use Flash Memory Flash Translation Layers (Traditional) JFFS JFFS2 eXecute.

1 Paolo Bianco Storage Architect Sun Microsystems An overview on Hybrid Storage Technologies.

Elastic Parity Logging for SSD RAID Arrays Yongkun Li*, Helen Chan #, Patrick P. C. Lee #, Yinlong Xu* *University of Science and Technology of China #

Rakan Maddah1, Sangyeun2,1 Cho and Rami Melhem1

William Stallings Computer Organization and Architecture 7th Edition

Failure-Atomic Slotted Paging for Persistent Memory

What you should know about Flash Storage

Memory COMPUTER ARCHITECTURE

DuraCache: A Durable SSD cache Using MLC NAND Flash Ren-Shuo Liu, Chia-Lin Yang, Cheng-Hsuan Li, Geng-You Chen IEEE Design Automation Conference.

William Stallings Computer Organization and Architecture 7th Edition

Scalable High Performance Main Memory System Using PCM Technology

Flash Disk Technology Stop the Spin!

Introduction I/O devices can be characterized by I/O bus connections

Influence of Cheap and Fast NVRAM on Linux Kernel Architecture

Part V Memory System Design

reFresh SSDs: Enabling High Endurance, Low Cost Flash in Datacenters

Virtual Memory 4 classes to go! Today: Virtual Memory.

William Stallings Computer Organization and Architecture 7th Edition

Lecture 6: Reliability, PCM

2.C Memory GCSE Computing Langley Park School for Boys.

CSC3050 – Computer Architecture

Dong Hyun Kang, Changwoo Min, Young Ik Eom

Presentation transcript:

Use of PCM in Computer Systems: an End-to-End Exploration Sangyeun Cho Computer Science Department University of Pittsburgh We need V

Era of new memories near US Patents Granted MRAM FRAM PCM (PRAM) (Lam, VLSI-TSA ’08)

2000Year (S. H. Noh, NVRAMOS ’10)

Since then (arch. & design community) Wear leveling & memory attack handling –Start-Gap [MICRO ‘09] –Security Refresh [ISCA ’10] –On-line attack detection [HPCA ‘11] Fault masking –ECP [ISCA ‘10] –SAFER [MICRO ‘10] –FREE-p [HPCA ‘11] Process variation awareness –Characterization & mitigation [MICRO ‘09] –Mercury [HPCA ‘11] –Variation vs. endurance [DATE ‘11] DAC-2011 has three papers –“Power Management” (Prof. Yoo), “Wear Rate Leveling” (ICT, China), “Variable Partitioning” (Hong Kong City Univ.)

Agenda PCM 101 Industry trends PCM usage models Summary

Agenda PCM 101 Industry trends PCM usage models Summary

Phase-change memory (PCM) top electrode GST metal bit line select TR current pulse source memory V CC V gate (Pictures from Hegedüs and Elliott, Nature Materials, March 2008) Amorphous = high resistivityCrystalline = low resistivity SET RESET

PCM asymmetries

Note: Write is the thing Cycling of cell states leads to cell aging –Reported write endurance 10 5 to 10 6 (who said ?) Burdensome to scale write bandwidth –High write currents (more bits means higher currents) –Reliability problem, added system design costs, … Theoretically, scaling helps with both problems Architectural techniques to reduce bit updates

E.g.: Flip-N-Write [MICRO ’09] cache block replaced to be written to PCM “New data” “Old data”

E.g.: Flip-N-Write [MICRO ’09] cache block replaced to be written to PCM “New data” “Old data” 11 bits are different!

E.g.: Flip-N-Write [MICRO ’09] cache block replaced to be written to PCM “New data” “Old data” Only five bits are different! “Flipped new data”

E.g.: Flip-N-Write [MICRO ’09] cache block replaced to be written to PCM “New data” “Old data” (5+1) bits need be updated… “Flipped new data” 10 “Flip bit”

E.g.: Flip-N-Write [MICRO ’09] Savings in bit updates can improve energy and endurance Flip-N-Write updates N/2 bits maximum Write-current limited write time (M bits, S bps) –Conventional: (M/S)×T SET –Differential write: T READ + (M/S)×T SET –Flip-N-Write: T READ + (M/2S)×T SET

Agenda PCM 101 Industry trends PCM usage models Summary

Samsung Lee et al. ISSCC ’07 Lee et al. JSSC ’08 Diode switch design 266MB/s read 4.64MB/s write (x16) Diode switch design 266MB/s read 4.64MB/s write (x16) Chung et al. ISSCC ’11 LPDDR2-N “Write skewing” 6.4MB/s write “DCWI” (~Flip-N-Write) LPDDR2-N “Write skewing” 6.4MB/s write “DCWI” (~Flip-N-Write)

Write skewing (H. Chung et al. ISSCC ’11)

Data comparison write w/ inversion (H. Chung et al. ISSCC ’11)

Data comparison write w/ inversion (H. Chung et al. ISSCC ’11)

(Servalli, IEDM ’09) Numonyx (now Micron) Early access program (2009) “Alverstone” (OMNEO) TR switch design 40MB/s read (?) <1MB/s write (?) “Alverstone” (OMNEO) TR switch design 40MB/s read (?) <1MB/s write (?) Numerous press releases (slated for MP in 2011) “Bonelli” 1.8V I/O “Bonelli” 1.8V I/O (2011~2012?) “Imola” and “Mandello” 2Gb & 1.2V & 1.8V I/O LPDDR2-NVM & DDR3-NVM “Imola” and “Mandello” 2Gb & 1.2V & 1.8V I/O LPDDR2-NVM & DDR3-NVM (

Agenda PCM 101 Industry trends PCM usage models Summary

Where does PCM fit? L1 $$ L2 $$ L1 $$ Main memory I/O Hub Fast storage Program memory Program memory Large storage NIC

PCM as program memory “Replace NOR in embedded platforms” –Fast read speed, good retention, reasonable write bandwidth (a few MB/s) –First target of both Micron & Samsung PCM has an edge due to density, scalability, and write speed (use scrubbing to improve reliability) Today, common NOR parts are 64Mb~512Mb Initial PCM offerings –Micron: 128Mb (x8, x1) moving to 1Gb (x16?) –Samsung: 512Mb (x16) moving to 1Gb (x16)

Where does PCM fit? L1 $$ L2 $$ L1 $$ Main memory I/O Hub Fast storage Program memory Program memory Large storage NIC

PCM main memory? “Replace (a good chunk of) DRAM” Why this makes sense –DRAM scaling is hard (no known solutions at < 20nm) –DRAM consumes more power than wanted, even at idle time –PCM can scale better; PCM is power-efficient on reads & at idle time Why this may NOT happen (easily) –PCM has poor write bandwidth (as of now) –DRAM camp has been capable of overcoming hurdles E.g., new DRAM designs and interfacing schemes under consideration to improve on power & reliability –PCM is not getting enough attention (~investment) –Other competing technology maturing in the mean time?

PCM main memory? “Replace (a good chunk of) DRAM” Why this is attractive –PCM can enable low-power servers [ISCA ’09] –Instant on/off [Prof. Noh’s talk at Pitt, ’09] –Fast, potentially no-overhead checkpointing and versioning [Venkataraman et al., FAST ’11] –File system meta-data storage [Park and Park, IEEE TC ’11] More usage models –PCM provides working memory space and (very high- speed) storage space –Fast application launching via pre-loaded binary image –Fast local checkpointing in supercomputing platforms –Novel applications that require gigantic memory space

PCM main memory? L1 $$ L2 $$ L1 $$ PCM-Small Smart Mem-ctrl Smart Mem-ctrl DRAM PCM-Large PCM is slow and write endurance limited; we need DRAM buffering This is PCM working memory; a better species (e.g., SLC)? This is PCM “storage” space; maybe equivalent to PCM-Small or maybe slower and larger (e.g., MLC)? “Smart mem. controller” to handle multiple technologies; cache mgmt, error handling (ECC, sparing), trim, & low-level scheduling

Traditional dichotomy L1 $$ L2 $$ L1 $$ PCM-Small Smart Mem-ctrl Smart Mem-ctrl DRAM PCM-Large “memory land”“storage land”

PRISM =Persistent RAM storage monitor –To study a PRAM storage’s low-level behavior –To guide PRAM storage designs [Jung and Cho, ISPASS ’11]

PRISM test.exe read(file1) write(buf,file1) data values metadata impact resource conflict? access frequency OS vaddr cell wearing stat etc… Low-level storage behavior PRAM paddr parallelism? wear leveling? bit masking? address mapping?

PRISM (example) Plane 7 Plane1 PKG 0 PKG 1 PKG 2 PKG 3 1GB PSD Die 1Die 0 256MB package 0 Plane 0 Page 0 Page 1 Page 2 Page 3 Page N-1 Page N 8 planes in 128MB Die 1 16MB plane 7

PRISM (example) Seriously unbalanced resource utilization Access Count Die0Die1 Die0Die1 Die0Die1Die0Die1 Package 0Package 1 Package 2Package 3 Q. Is plane-level usage fair ?

Traditional dichotomy L1 $$ L2 $$ L1 $$ PCM-Small Smart Mem-ctrl Smart Mem-ctrl DRAM PCM-Large “memory land”“storage land”

Memory + storage = memorage? L1 $$ L2 $$ L1 $$ PCM-Small Smart Mem-ctrl Smart Mem-ctrl DRAM PCM-Large “memorage” [Jung and Cho, CF ’11)

Memorage benefits (elapsed time)

Memorage benefits (lifetime) Ratio of memory bandwidth to storage bandwidth Main memory lifetime improvement α = 30 α = 15 α = ratio of storage capacity to memory capacity

Where does PCM fit? L1 $$ L2 $$ L1 $$ Main memory I/O Hub Fast storage Program memory Program memory Large storage NIC

PCM as main storage medium “Replace NAND in high-speed SSDs” PCM has good potential (theoretically) –Lower latency than NAND (~100ns vs. ~100  s) –More scalable than NAND (~10nm vs. ~20nm) –Much simpler management (e.g., in-place update) –Potentially good bandwidth –Fast paging storage? Huge challenges ahead –NAND density improving, at least for now (scaling & TLC + better error handling) –NAND bandwidth (not latency) improves –NAND momentum ensures continued investment

E.g.: PCM SSD (Numonyx) (Fusionio)

E.g.: PCM SSD SSDwr PSDrd PSDwr SSDrd (Jung et al., NVMW ’11)

E.g.: PCM SSD SSDrand PSDrand PSDseq SSDseq (Jung et al., NVMW ’11)

Where does PCM fit? L1 $$ L2 $$ L1 $$ Main memory I/O Hub Fast storage Program memory Program memory Large storage NIC

PCM as hidden specialist in a drive “Provide specialized non-volatile capacity” PCM can help boost the performance of HDD –Provide fast storage capacity at tier 1 –Capture small writes, keep working set, and minimize arm movements –But can’t NAND do the same? (many hybrid approaches exist) PCM can help ease NAND write complexities –E.g., [Sun et al., HPCA ’09][Kim et al., EMSOFT ’08] –NAND write endurance worse than PCM by orders of mag. Total write data volume = C(PCM)×10 xxx + C(NAND) ×10 yyy –NAND latency slower than PCM Similar reasoning about performance is possible For PCM to become the tier 1 within a storage device, how much capacity & bandwidth is needed?

Where does PCM fit? L1 $$ L2 $$ L1 $$ Main memory I/O Hub Fast storage Program memory Program memory Large storage NIC

PCM as low-power search memory “Keep inter-networking tables in PCM” (Bolla et al., IEEE Communications Surveys & Tutorials 2010) Router power

PCM as low-power search memory “Keep inter-networking tables in PCM” Many data structures in inter-networking are read- intensive, e.g., IP lookup table, rules, patterns –Updates are relatively low bandwidth and incremental PCM could be used to construct TCAMs There are algorithms that use more regular RAM structures, e.g., [Hanna, Cho, and Melhem, Networking ’11] This is a niche application domain where some interesting things can be done

Agenda PCM 101 Industry trends PCM usage models Summary

PCM offers new opportunities to improve platform capabilities and end-user experiences Drop-in replacement of established memory technology does not work –Performance and power asymmetry –Errors –Falls short of fully utilizing PCM features Optimal solutions will likely resort to horizontal & vertical collaboration of multiple system components –And the goal should be to improve the whole system –Are there new ideas?

Why not … Academic researchers –… we explore system designs end-to-end (both horizontally and vertically) together, identify new opportunities, and specify performance, power, and reliability requirements? –Manufacturers will appreciate such a wish list Industry –… you “leak” information on real-world technical challenges you see and a realistic technology roadmap? –Researchers will love a laundry list of real problems

PCM-Related Architecture Problems, an End-to-End Systems View CAST (Computer Architecture, Systems, & Technology) Laboratory