Application Access to Persistent Memory – The State of the Nation(s)!


Application Access to Persistent Memory – The State of the Nation(s)!
Stephen Bates (Microsemi), Paul Grun (Cray), Tom Talpey (Microsoft), Doug Voigt (HPE)

The Suspects
Stephen Bates – Microsemi
Paul Grun – Cray
Tom Talpey – Microsoft
Doug Voigt – HPE

We’ve Come a Long Way, Baby!
We can give ourselves a pat on the back. The drive pictured leased at $3,200 a month ($28,000 in today’s money), held about 5 MB of 7-bit characters(!), spun at 1,200 rpm and had a seek time of ~600 ms.
http://www.snopes.com/photos/technology/storage.asp

Persistent Memory (PM)
We want three things from PM:
- Low latency (think sub-10 µs; more on that later)
- Memory-semantic access
- Storage features – persistence, data integrity, RAID, in-field replacement of FRUs, etc.
If you don’t need low latency, don’t bother with PM; an mmap’ed SSD will do the job just fine.

Taxonomy
NVM – Non-Volatile Memory. All types, including those that are not byte-addressable.
PM – Persistent Memory. Sometimes PMEM is used, but we use PM in this talk.
NVMe – NVM Express. A block protocol that runs over PCIe, RDMA or Fibre Channel; a SATA/SAS replacement.
NVMP – NVM Programming Model. Application-visible NVM behavior.
NVMf – NVMe over Fabrics. NVMe extended over fabrics.

Low Latency
NVMe SSDs are (relatively) high latency! PM provides persistence at memory-like speeds and semantics.

    Operation            Latency    Relative to L1
    L1 Cache Read        0.5 ns     1
    L2 Cache Read        7 ns       14
    DRAM Read            100 ns     200
    <-- the PM opportunity -->
    NVMe DRAM SSD Read   10 µs      20,000
    NVMe NAND SSD Read   150 µs     300,000
    SAS HDD              500 µs     1,000,000

NVMe is adding memory semantics, but PM is all about persistence near (in time) to the processor; nobody should store data in PM unless they plan to process it.

Where Are We?

What is Needed? The PM Jigsaw
The pieces: Media & Form-Factors, Protocols & Interconnect, OS Support and Libraries & Toolchain surround the middle piece, Apps. Progress is being made on each of the pieces, but all the outer pieces need to be in place to deliver on the last (middle) piece.

Lots of Moving Parts
It is a complex problem, with lots of moving parts:
- Consumers (Apps): user-space apps, kernel-space apps, communications middleware
- APIs (Libraries & Toolchain)
- Infrastructure: drivers, OS support, communications infrastructure (Protocols & Interconnect)
- Non-volatile media and form factors (Media & Form-Factors)

Where Does PM Sit? (Answer – anywhere it wants to)
Alongside DRAM on the DDR bus, on PCIe, or across a fabric – and NAND can sit in all the same places. To add to the complexity, we have to deal with both NAND and PM devices that can be sited anywhere, and the CPU vendors are the ones who decide which interfaces come out of their processors. Other interfaces include NVLink, CAPI and RapidIO. Under ‘fabrics’, we include the organizations that play a significant role in influencing the fabric implementation, not just the purveyors of the hardware.

Rationalizing the Problem Space
Managing complexity is what engineers do best, usually by partitioning large, sticky problems into small, manageable chunks. The layers, and who owns each:
- NVM consumers – memory (byte-addressable) and storage (object, file, block…): SNIA, OSVs…
- APIs – OFI, Verbs, NVMf, ND, NDK: OpenFabrics Alliance, NVMe Inc., Linux, Windows… (ND and NDK are Windows APIs for driving fabric devices, similar to verbs)
- Transports – network, I/O bus, memory bus: IETF, IBTA, PCI-SIG, OS drivers…
- Non-volatile media and form factors: vendors, JEDEC…

Start with Consumers of NVM Services
It begins by understanding who the consumers are and what they need. Consumers are characterized by, among other things, the way the NVM device is accessed. Is it accessed using some sort of block-based I/O protocol (a POSIX read or write through a file system to NVM devices such as SSDs)? Or using memory semantics (load/store, memcpy against DIMMs and NVDIMMs)? The device may also be remote – storage or memory reached through a NIC – and not all NVM devices are created equal; some don’t have the latency characteristics needed to respond well as remote memory. The sketch below contrasts the two local access styles.
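A minimal C sketch of the two access styles. The path /mnt/pm/data is hypothetical and the file is assumed to be at least 4 KiB; on a DAX-capable file system the loads and stores would reach the media directly.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/pm/data", O_RDWR);   /* hypothetical path */
        if (fd < 0) { perror("open"); return 1; }

        /* Block/file semantics: an explicit I/O call moves the data. */
        char buf[4096];
        if (pread(fd, buf, sizeof(buf), 0) < 0) { perror("pread"); return 1; }

        /* Memory semantics: map the file, then use plain loads/stores. */
        char *pm = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (pm == MAP_FAILED) { perror("mmap"); return 1; }
        memcpy(pm, "hello", 6);     /* a store, not a write() */
        msync(pm, 4096, MS_SYNC);   /* ask the OS to make it durable */

        munmap(pm, 4096);
        close(fd);
        return 0;
    }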

Application View
The SNIA NVM Programming Model (NVMP) describes application-visible behaviors, with APIs that align with the OSs. In user space, PM-aware apps use either file APIs or loads/stores through PM data structure libraries on top of a PM file system. In the kernel, PM-aware file systems supply middleware features (e.g. RAID) and establish MMU mappings onto the PM device. The two key actions:
- Map – expose PM directly to applications
- Optimized Flush – make data persistent
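On a mapping backed directly by PM, Optimized Flush means flushing CPU caches from user space rather than calling into the kernel. A minimal x86 sketch of that idea follows; it assumes a CLFLUSHOPT-capable CPU (compile with -mclflushopt) and a DAX-style mapping, and real applications would use a library rather than hand-rolled intrinsics.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CACHELINE 64

    /* Flush a range of a PM-backed mapping from the CPU caches, then
     * fence so the flushes complete before we declare the data
     * persistent. */
    void pm_optimized_flush(const void *addr, size_t len)
    {
        uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
        uintptr_t end = (uintptr_t)addr + len;

        for (; p < end; p += CACHELINE)
            _mm_clflushopt((void *)p);   /* write one line back toward media */
        _mm_sfence();                    /* order the flushes before later stores */
    }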

Possible Stack for NVM Access
Key point: NVM services are exported to the consumer via a series of APIs, and the API in turn defines the features available to the consumer. There are several organizations involved in developing network APIs for NVM, which are by-and-large harmonious. Four paths down the stack:
- Local I/O: user app → VFS / block layer → SCSI or NVMe → PCIe → HBA or SSD
- Local byte-addressable: user app → byte access → memory bus → NVDIMM
- Remote byte-addressable: user app → libfabric (user) or kfabric (kernel) → provider → fabric-specific device
- Remote I/O: kernel application → VFS / block I/O / network FS / LNET → ULPs (iSCSI, SRP, iSER, NVMe/F, NFSoRDMA, SMB Direct, LNET/LND…) → sockets or kverbs → NIC, RNIC, HCA
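The user-space fabric API named here, libfabric, is real and shipping today; as a small illustration, this sketch uses its discovery call to list the fabric providers available on a node (link with -lfabric).

    #include <rdma/fabric.h>
    #include <stdio.h>

    int main(void)
    {
        struct fi_info *info = NULL, *cur;

        /* Ask libfabric for every provider/fabric pair it can offer. */
        int ret = fi_getinfo(FI_VERSION(1, 1), NULL, NULL, 0, NULL, &info);
        if (ret) {
            fprintf(stderr, "fi_getinfo failed: %d\n", ret);
            return 1;
        }
        for (cur = info; cur; cur = cur->next)
            printf("provider: %s, fabric: %s\n",
                   cur->fabric_attr->prov_name, cur->fabric_attr->name);

        fi_freeinfo(info);
        return 0;
    }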

Optimizing Fabrics for NVM
Lots of initiatives underway! The goal is to add persistence semantics to RDMA protocols, so that an NVM client (a user app or storage client) can reach NVM device(s) on a server’s memory or I/O bus across a pair of NICs:
- Register persistent memory regions
- Completion semantics to ensure persistence and consistency
- Client control of persistence
- Solve the “write-hole” problem
Clearly there is lots of work to do here. Can we make this work for NVDIMMs? For NVMe SSDs with CMBs? One interim mechanism is sketched below.
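Until persistence semantics are standardized, one interim mechanism that has been discussed is to follow RDMA Writes with a small RDMA Read on the same connection: the read response cannot be generated before the preceding writes have been placed at the target, giving a crude remote flush. A hedged verbs sketch, assuming an established QP, a registered local buffer, and a remote_addr/rkey obtained out of band; completion polling is omitted.

    #include <infiniband/verbs.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Post an RDMA Write, then a 1-byte RDMA Read to the same region.
     * The signaled read completion indicates the written data has
     * reached the responder. */
    int rdma_write_then_flush(struct ibv_qp *qp, struct ibv_mr *mr,
                              void *local_buf, size_t len,
                              uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge wsge = {
            .addr = (uint64_t)(uintptr_t)local_buf,
            .length = (uint32_t)len,
            .lkey = mr->lkey,
        };
        struct ibv_send_wr write_wr = {
            .opcode = IBV_WR_RDMA_WRITE,
            .sg_list = &wsge, .num_sge = 1,
            .wr.rdma = { .remote_addr = remote_addr, .rkey = rkey },
        };
        struct ibv_sge rsge = wsge;   /* reuse the buffer for a tiny read */
        rsge.length = 1;
        struct ibv_send_wr read_wr = {
            .opcode = IBV_WR_RDMA_READ,
            .send_flags = IBV_SEND_SIGNALED,
            .sg_list = &rsge, .num_sge = 1,
            .wr.rdma = { .remote_addr = remote_addr, .rkey = rkey },
        };
        struct ibv_send_wr *bad = NULL;

        if (ibv_post_send(qp, &write_wr, &bad))
            return -1;
        return ibv_post_send(qp, &read_wr, &bad);
    }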

Simple Math
NVMe + RDMA = AWESOME
… + PM = AWESOME²

…Continuing Down the Stack
NVM consumers – memory (byte-addressable) and storage (object, file, block…) – sit on APIs (OFI, Verbs, NVMe/F…), which sit on transports (network, I/O bus, memory bus), which sit on non-volatile media and form factors.

Updates and Current Topics in the NVMe Standards Body
- NVMe over Fabrics – present an NVMe block device to a client over RDMA or Fibre Channel
- NVMe Controller Memory Buffers – standardize (persistent) PCIe memory on NVMe devices. An NVDIMM-N on the PCIe bus?
- LightNVM – a low-level SSD interface that is more closely aligned to the underlying media (NAND)

Media
- DRAM Drop-In (PM): Everspin, Micron, Toshiba/SK Hynix. DRAM-like latency; a super-cap replacement; not for bulk storage. Memory interface.
- Storage Class Memory (PM): Micron-Intel, SanDisk, Toshiba, Crossbar, Nantero. Faster than NAND, cheaper and slower than DRAM; byte-addressable. Block and memory interfaces.
- NAND (NVM, but not PM): Micron, Toshiba, SanDisk, SK Hynix, Samsung. Lowest cost, cheap and plentiful; slow (for NVM); not byte-addressable. Block interface.

PM Form Factors
NVDIMM-N, NVDIMM-P, NAND NVMe, Not-NAND NVMe

PM Form Factors – Features
    Form-Factor      Media
    NVDIMM-N         DRAM/MRAM
    NVDIMM-P         NAND/PM
    Non-NAND NVMe    DRAM/PM
    NAND NVMe        NAND
NVDIMM-P uses slower media than NVDIMM-N but has less overhead than NVMe; it may present memory semantics to the CPU/OS while using block media (NAND) underneath. NVDIMM-N stores data as cachelines, which makes good ECC and data integrity impossible to achieve. Form factors impact features – there are no DMA engines on a DIMM!

PM Scenarios
- PM region as a block device (à la persistent RAM disk)
- File system support: direct access to the memory (e.g. DAX), PM-aware file systems (e.g. m1fs). You can put your files, databases, etc. on top.
- Soon: shared persistent memory
Remember, we are crawling right now, and today this is x86_64 only. (PM performance data for NVDIMM-N is in my other talk.)

Libraries and Toolchains
Make it easy for applications to utilize PM, regardless of OS and architecture! Which would the user prefer to write?

EASY:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("Hello, PMEM World!\n");
        return 0;
    }

HARD:

    section .text
    global _start
    _start:
        mov edx,len
        mov ecx,msg
        mov ebx,1
        mov eax,4
        int 0x80            ; sys_write
        mov eax,1
        int 0x80            ; sys_exit
    section .data
    msg db 'Hello, PMEM World!',0xa
    len equ $ - msg

Coding without a library (glibc) is hard, and the same applies to PM. What we need:
- Library support that is portable across OS and architecture; NVML is a start
- Bindings for all common languages
- Compiler, assembler and linker support in the most popular toolchains
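As a taste of the ‘easy’ path, here is a hedged NVML (libpmem) sketch; the path /mnt/pm/hello is hypothetical, and the library falls back to msync-based flushing when the file is not on real PM (link with -lpmem).

    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map (creating if necessary) a 4 KiB file on a PM-aware mount. */
        char *pmem = pmem_map_file("/mnt/pm/hello", 4096, PMEM_FILE_CREATE,
                                   0666, &mapped_len, &is_pmem);
        if (pmem == NULL) {
            perror("pmem_map_file");
            return 1;
        }

        strcpy(pmem, "Hello, PMEM World!");

        /* Optimized flush from user space on real PM, msync otherwise. */
        if (is_pmem)
            pmem_persist(pmem, mapped_len);
        else
            pmem_msync(pmem, mapped_len);

        pmem_unmap(pmem, mapped_len);
        return 0;
    }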

Call to Arms
Lots done, lots still to do. Not quite a Sisyphean task, but there is a lot of work to be done.

Call to Arms
- Libraries and Toolchains: NVML for non-x86, integration into glibc/gcc, etc.
- Media & Form Factors: production PM, appropriate PM form factors.
- Protocols and Interconnect: enhancements to NVMe and RDMA, PM over fabrics, standardization of memory channels.
- OS Support: ISA updates, DAX devices, other OSes, new OSes?

Conclusions
- We are almost walking! Help out if you can.
- If you want sub-10 µs access to persistent data, then PM may be for you.
- The CPU vendors have a lot of say in the interconnect, but some open options exist too.
- Toolchains, libraries and OSes are adapting.
- New applications will complete the jigsaw and lead to revenue.