PASTE: A Networking API for Non-Volatile Main Memory


PASTE: A Networking API for Non-Volatile Main Memory
Michio Honda (NEC Laboratories Europe), Lars Eggert (NetApp), Douglas Santry (NetApp)
TSVAREA @ IETF 99, Prague, May 22nd 2017
More details in our HotNets paper: https://goo.gl/Ssreei
© 2015 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only

Motivation
Non-Volatile Main Memories (NVMMs):
- Persistent
- Byte-addressable
- Low latency (10s-1000s of ns)
A lot of work in databases and file systems: NVHeap [ASPLOS'11], NOVA [FAST'16], NVWAL [ASPLOS'16], HiKV [ATC'17], etc.
(https://www.hpe.com/us/en/servers/persistent-memory.html)
What are the implications for networking?

Review: Today's Stack with NVMM

Review: Today's Stack with NVMM – How data moves from the network to the storage

Case Study: Careful Data Transfer
Server persists the client's request prior to acknowledgment, e.g., a 1 KB commit:
[Diagram: (1) the request arrives from the client at the NIC, (2) the network stack delivers it into DRAM, (3) the app read()s it, (4) the app persists it to SSD/disk through the storage stack with write()/fsync() or memcpy()/msync(), (5) the acknowledgment leaves through the network stack.]
End-to-end: 2030 us; networking alone (i.e., without step (4)) takes 40 us

Case Study: Careful Data Transfer (NVMM)
Same scenario, but step (4)'s write()/fsync() or memcpy()/msync() targets NVMM, emulated using a reserved region of DRAM:
[Diagram: steps (1)-(5) as before, with the storage stack writing to NVMM instead of SSD/disk.]
End-to-end: 42 us (down from ~2000 us); networking takes 40 us, and the remaining 2 us is not small

Case Study: Careful Data Transfer
Parallel requests are serialized on each core:
33 % throughput decrease, 50 % latency increase

Data Copies Matter
Persisting data (e.g., to a log) always happens to a different destination:
kernel buffer → app buffer (read()), then app buffer → mmap()-ed log file (memcpy())
Overall cache misses (reported by Linux perf):
- Networking only: 0.0004 %; major contributor: net_rx_action() (84 %)
- Networking + NVMM (read() + memcpy() + msync()): 4.4121 %; major contributor: memcpy() (98 %)
We must avoid the data copy!

Packet Store (PASTE) Overview
- Packet buffers on a named NVMM region
- DMA to NVMM
- Zero-copy APIs
[Diagram: (1) the client's request arrives at the NIC, (2) is DMA-ed by the network stack into packet buffers on NVMM (/mnt/nvmm/pktbufs), (3) the app reads it in place, and (4) writes only metadata (e.g., a buffer index) to its own NVMM file (/mnt/pmem/appmd).]

Fast Persistent Write with PASTE
Application steps:
(1) Read data (zero copy)
(2) Write metadata entry
(3) Flush (buffer and metadata)
[Diagram: the NIC ring and its static packet buffers live in /mnt/nvmm/pktbufs, fed by the kernel's TCP/IP input and output. Through the netmap API, the application mmap()s both the packet buffers and its own metadata file (/mnt/nvmm/myapp_metadata), which holds a header (buf_ofs: 123) and metadata entries of the form (buf_idx, off, len). Buffers progress through the states Unread → Read or written → Flushed. DMA is performed to the L3 cache (DDIO).]

Fast and Selective Persistent Write with PASTE
Same steps as above, but flushing is selective: for an idempotent request, step (3) can be skipped, so unnecessary data is not flushed to the DIMM.

Preliminary Results
Implementation: extends the netmap framework; TCP/IP is also supported
10-88 % throughput increase, 9-46 % latency reduction

Related Work
Enhanced network stacks (not NVMM-aware):
- MegaPipe (OSDI'12), StackMap (ATC'16), Fastsocket (ASPLOS'16)
- IX and Arrakis (OSDI'14), mTCP (NSDI'13), Sandstorm (SIGCOMM'14), MICA (NSDI'14)
NVMM file systems (not networking-aware):
- BPFS (SOSP'09), NOVA (FAST'16)
NVMM databases (not networking-aware):
- NVWAL (ASPLOS'16), REWIND (VLDB'15), NV-Tree (FAST'15)

Conclusion
Networking APIs are now a bottleneck for durably storing data
Ongoing work:
- Brushing up the APIs
  - Smooth integration with the netmap API → forming a file system with packet buffers
- Fast data movement between files
- Use case applications
[Diagram: App → netmap API → network stack + file system → NVMM and NIC]