Persistent Memory: From Samples to Mainstream Adoption. Amit Golander, PhD

Storage Media Generations: IOPS (even if random…) and latency (even under load…) improve from HDD to flash to PM (NVDIMM / PM / NVMM). PM marries the best of both worlds: memory speed + storage persistency.

Gradual PM Adoption, by phase (density [GB], cost [$/GB]):
Samples (2013)
HW standards (2015): SNIA NVDIMM SIG, JEDEC, 4 pins, DDR4 reference board, ACPI; X0s GB, >>DRAM
SW infrastructure and on-par features (2017): drivers & OSs; X00s GB, >DRAM
BOM reduction (2019): x000s GB, <DRAM
Then: mainstream adoption.

Agenda: Past (hardware definitions, software approaches) and Future (accelerating adoption).

PM Hardware

PM Hardware – 2017/8: fast PM (NVDIMM-N, motherboard-level solutions) and slow(er) PM (SCM-based NVDIMMs).

PM-based SW Approaches: a trade-off between SW reuse and performance, depending on whether PM is exposed at the application, SW-infrastructure or HW layer. (Language extensions can also be mentioned here.)

Memory Accelerated (MAX) Data™
Plexistor (acquired by NetApp): PM-based FS pioneer since 2013; contributed, and still contributes, some of our IP.
MAX Data approach: support legacy applications and enable NPM (e.g. SPDK, language extensions); feature rich; integrate with NetApp Data Fabric™.
[Diagram: applications reach PM either through I/O semantics (block-based FS, page cache, bio, block wrapper) or through memory semantics (a DAX-enabled, PM-based FS, e.g. MAX FS).]
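The two access paths can be made concrete with a minimal C sketch. It is illustrative only, not MAX Data or MAX FS code; the mount point /mnt/maxfs, the file name and the use of msync() as the persistence point are assumptions made for the example.

/* Illustrative only: contrasts I/O semantics with memory (DAX) semantics
 * for a file on a PM-backed FS. Paths are placeholders and msync() stands
 * in for whatever persistence primitive the FS actually provides. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 4096;
    int fd = open("/mnt/maxfs/record", O_CREAT | O_RDWR, 0644);
    if (fd < 0 || ftruncate(fd, len) != 0)
        return 1;

    /* I/O semantics: classic write/fsync system calls. */
    pwrite(fd, "written via the I/O path", 24, 0);
    fsync(fd);

    /* Memory semantics: mmap the file and use CPU loads/stores. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    strcpy(p + 1024, "written via load/store (DAX)");
    msync(p, len, MS_SYNC);    /* persistence point */

    munmap(p, len);
    close(fd);
    return 0;
}

On a DAX-enabled FS the mmap path bypasses the page cache entirely, which is the property the memory-semantics side of the diagram relies on.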

PM/DAX FS - Room for Innovation: 12 flavors in a decade; most were developed without real SCM; half are deprecated, and half of the rest are DAX-only (MAX FS among them). Katzburg et al., submitted to SYSTOR 2018, pending: An Experimental Study of NVDIMM-N Persistent Memory and its Impact on Two Relational Databases.

BOM 1 (reduce BOM, on-par data protection, larger SW ecosystem)
Fast PM: limited capacity. Slow PM: still expensive. Which is best for which application?
Offset the PM BOM by saving on DRAM and CPU, or by using lower tiers (Katzburg et al., ICSEE 2016: storage becomes first-class memory).
Average Storage Access Time: A.S.A.T = PMLatency + PMMissRate * FlashLatency
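As a quick illustration with assumed, round numbers (not figures from the talk): with a PM latency of 1 µs, a flash latency of 100 µs and a 5% PM miss rate,

A.S.A.T = 1 µs + 0.05 * 100 µs = 6 µs

so a mostly-PM working set backed by a flash tier still averages single-digit microseconds, which is why tiering can offset the PM BOM.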

BOM 2: Auto-Tiering between PM & Flash. Benchmark setup: DBT-2 driving Postgres 9.5 on an application server with MAX Data, tiering between PM and flash.

Data Protection is Expected (reduce BOM, on-par data protection, larger SW ecosystem). Three approaches:
1. DP on the application server
2. Single fault tolerance at near-memory speed
3. Snapshot-based DP

1. DP on the Application Server: the first line of defense. Hardware: memory controller, NVDIMM controller. Software: local FS (Xu et al., NVMW 2018, NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories).

2. Single Fault Tolerance at Near-memory Speed. Example: the NetApp MAX Recovery™ feature. No application modification is required; performance degradation is negligible for real applications, only a few extra µs of penalty. (Golander et al., poster at SYSTOR 2017, Persistent Memory over Fabric (PMoF).)

3. Snapshot-based Data Protection. Example: the MAX Sync™ feature leverages NetApp Data Fabric™ for DP between MAX Data and ONTAP: disaster recovery (SnapMirror), backup (SnapVault), auditing (SnapLock). Additional synergy examples: ONTAP data reduction (e.g. compression), ONTAP resiliency (e.g. RAID-TEC), ease of administration (e.g. hiding the cultural gap).

By-product: Ease of Administration. Bridging cultural gaps through automation and by hiding complexity. Application admins (MAX Data UI): many, and they care about their applications. Storage admins (Data Fabric UI): a few storage experts who care about corporate DP policies. A cultural gap separates the two.

Larger SW Ecosystem – Why? (reduce BOM, on-par data protection, larger SW ecosystem) A larger PM SW market drives SW innovation (many players) and lower HW cost (few big vendors), and both accelerate PM adoption.

Kernel vs. User-Space FS Implementation. Kernel FS: fast (shortest path). User space: portable, resilient (contained), simpler to add functionality and to debug, fewer licensing restrictions. Kernel-to-user (K-U) bridges close the gap, delivering near-memory speed from a user-space FS.

Why not extend FUSE to PM? The FUSE architecture is great for HDDs and OK(ish) for SSDs, but not suitable for PM.
[Chart: $/GB vs. latency for HDD, flash, PM and memory, with TCP/RDMA, FUSE and ZUFS latencies marked.]
Design assumptions, FUSE vs. ZUFS:
- Typical media: FUSE was built for HDDs and extended to flash; ZUFS is built for PM/NVDIMMs and DRAM.
- SW performance goals: for FUSE they are secondary (high-latency media), async I/O, throughput; for ZUFS, SW is the bottleneck and latency is everything.
- SW caching: FUSE assumes slow media and relies on the OS page cache; ZUFS assumes near-memory-speed media and bypasses the OS page cache.
- Access method: FUSE offers I/O only; ZUFS offers I/O and mmap (DAX).
- Cost of a redundant copy or context switch: negligible for FUSE; the bottleneck for ZUFS, so avoid copies and queues and remain on core.
- Latency penalty under load: 100s of µs for FUSE; 3-4 µs for ZUFS.

ZUFS Features & Architecture
Low latency and efficient: core and L1 cache affinity, zero data copy.
Manages devices: optimal pmem access, NUMA aware, data mover to lower-tier devices.
Page-table mapping supports I/O and DAX semantics.
Misc: async hook available, runs as a system service.
Optimal pmem access = special file, direct-I/O mmap of the device (see the sketch below).
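A minimal sketch of that last point, assuming a device path of /dev/pmem0 and a 2 MiB mapping; this is not ZUFS code, and msync() is used only as a conservative stand-in for a real persistence barrier.

/* Illustrative sketch only (not ZUFS code): open the pmem special file with
 * O_DIRECT to bypass the page cache, then mmap it and access it with plain
 * loads and stores. /dev/pmem0 and the mapping size are placeholders. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 2UL * 1024 * 1024;
    int fd = open("/dev/pmem0", O_RDWR | O_DIRECT);   /* special file, no page cache */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Map the device so the user-space FS server touches PM directly. */
    uint8_t *pmem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    pmem[0] = 0x42;               /* load/store access, no I/O stack in the path */
    msync(pmem, len, MS_SYNC);    /* conservative persistence barrier */

    munmap(pmem, len);
    close(fd);
    return 0;
}

The mapping itself is the simple part; the latency wins come from the zero-copy, stay-on-core design called out in the assumptions above.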

Preliminary Results (for PM): measured on a dual-socket Xeon 2650 v4 (48 HT), DRAM-backed PMEM type, random 4 KB direct-I/O write access.

Conclusions
2018 is the year for PM as COTS (commercial off-the-shelf).
Mass adoption needs more innovation:
- Hardware: SCM-based NVDIMM vendors; AMD and ARM support.
- Software: ZUFS is a key enabler, a kernel-to-user bridge designed for PM. https://github.com/NetApp/zufs-zus & zufs-zuf

Thank you