PASTE: A Networking API for Non-Volatile Main Memory
Michio Honda (NEC Laboratories Europe), Lars Eggert (NetApp), Douglas Santry (NetApp)
TSVAREA@IETF 99, Prague, May 22nd 2017
More details in our HotNets paper: https://goo.gl/Ssreei
© 2015 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only

Motivation
- Non-Volatile Main Memories (NVMMs)
  - Persistent
  - Byte-addressable
  - Low latency: 10s-1000s of ns
- A lot of work in databases and file systems
  - NVHeap [ASPLOS'11], NOVA [FAST'16], NVWal [ASPLOS'16], HiKV [ATC'17], etc.
- https://www.hpe.com/us/en/servers/persistent-memory.html
- What are the implications for networking?

Review: Today's Stack with NVMM – How data move from the network to the storage

Case Study: Careful Data Transfer
- Server persists the client's request prior to acknowledgment, e.g., a 1 KB commit
- [Figure: request path with numbered steps (1)-(5) across client, NIC, network stack, app, storage stack, DRAM, and SSD/disk; the app calls read() in step (3) and write()/fsync() or memcpy()/msync() in step (4)]
- With SSD/disk: 2030 us
- Networking (w/o step (4)) takes 40 us

Case Study: Careful Data Transfer
- Server persists the client's request prior to acknowledgment, e.g., a 1 KB commit
- [Figure: the same request path with NVMM in place of SSD/disk; the app calls read() and then write()/fsync() or memcpy()/msync()]
- NVMM is emulated using a reserved region of DRAM
- Total: 2000 → 42 us; networking takes 40 us
- This 2 us is not small

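The careful-transfer path on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the benchmark behind the numbers: a local socket pair stands in for the client connection, an ordinary temporary file stands in for the NVMM-backed log, and the names `recv_exact`, `persist_then_ack`, `demo`, and `LOG_SIZE` are hypothetical. `mmap.flush()` plays the role of msync().

```python
import mmap
import os
import socket
import tempfile

LOG_SIZE = 4096  # hypothetical log size, large enough for one 1 KB commit


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def persist_then_ack(conn, log, log_off, size):
    """Receive one request, copy it into the mmap()-ed log, flush, then ack."""
    data = recv_exact(conn, size)   # copy 1: kernel buffer -> app buffer
    end = log_off + len(data)
    log[log_off:end] = data         # copy 2: app buffer -> log (the memcpy())
    log.flush()                     # msync(): make the log durable
    conn.sendall(b"ACK")            # acknowledge only after persisting
    return end


def demo():
    client, server = socket.socketpair()
    request = b"x" * 1024           # the 1 KB commit from the slide
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log")
        with open(path, "w+b") as f:
            f.truncate(LOG_SIZE)
            log = mmap.mmap(f.fileno(), LOG_SIZE)
            client.sendall(request)
            end = persist_then_ack(server, log, 0, len(request))
            ack = recv_exact(client, 3)
            stored = bytes(log[:end])
            log.close()
    client.close()
    server.close()
    return request, stored, ack
```

The two commented copies are the ones the rest of the talk aims to remove: the payload crosses from the kernel buffer to the app buffer and then again into the log.
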
Case Study: Careful Data Transfer
- Parallel requests are serialized on each core
- 33% throughput decrease, 50% latency increase

Data Copies Matter
- Cache misses: persisting data (e.g., to a log) always happens to a different destination
- [Figure: kernel buffer → read() → app buffer → memcpy() → log file (mmap()-ed)]

Workload                                          Overall cache misses   Major contributor
Networking only                                   0.0004 %               net_rx_action() (84%)
Networking + NVMM (read() + memcpy() + msync())   4.4121 %               memcpy() (98%)
(Reported by Linux perf)

- We must avoid data copy!

Packet Store (PASTE) Overview
- Packet buffers on a named NVMM region
- DMA to NVMM
- Zero-copy APIs
- [Figure: steps (1)-(4) across client, NIC, network stack, app, storage stack, and NVMM; the NVMM holds /mnt/nvmm/pktbufs and /mnt/pmem/appmd; step (4) passes metadata only (e.g., buffer index)]

Fast Persistent Write with PASTE
- Application steps: (1) read data (zero copy); (2) write metadata entry; (3) flush (buffer and metadata)
- [Figure: application in user space over mmap() and the netmap API; kernel with TCP/IP input and output; NIC ring; packet buffers (static) in /mnt/nvmm/pktbufs; metadata in /mnt/nvmm/myapp_metadata]
- Metadata header: /mnt/nvmm/pktbufs, buf_ofs: 123
- Metadata entries with fields buf_idx, off, len (values shown: 100 1135 1 932 3 1024)
- Buffer states in the figure: unread, read or written, flushed
- DMA is performed to L3 cache (DDIO)

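The three application steps can be emulated with ordinary memory-mapped files. The sketch below is not the PASTE or netmap API: temporary files stand in for /mnt/nvmm/pktbufs and /mnt/nvmm/myapp_metadata, a plain write into the mapping stands in for NIC DMA, and `BUF_SIZE`, `NUM_BUFS`, the 32-bit field widths of the entry, and every function name are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

BUF_SIZE = 2048                 # hypothetical size of one static packet buffer
NUM_BUFS = 4                    # hypothetical number of packet buffers
ENTRY = struct.Struct("<III")   # buf_idx, off, len (field widths are assumed)


def map_file(path, size):
    """Create a file of the given size and map it read/write."""
    with open(path, "w+b") as f:
        f.truncate(size)
        return mmap.mmap(f.fileno(), size)


def persist_request(pktbufs, metadata, slot, buf_idx, off, length):
    start = buf_idx * BUF_SIZE + off
    # (1) Read data in place: the memoryview slice does not copy the payload.
    with memoryview(pktbufs) as whole, whole[start:start + length] as payload:
        verb = bytes(payload[:3])   # copies only the 3-byte verb
    # (2) Write a metadata entry that points at the packet buffer.
    ENTRY.pack_into(metadata, slot * ENTRY.size, buf_idx, off, length)
    # (3) Flush buffer and metadata (flush() stands in for msync()).
    pktbufs.flush()
    metadata.flush()
    return verb


def demo():
    with tempfile.TemporaryDirectory() as d:
        pktbufs = map_file(os.path.join(d, "pktbufs"), BUF_SIZE * NUM_BUFS)
        metadata = map_file(os.path.join(d, "myapp_metadata"),
                            ENTRY.size * NUM_BUFS)
        # Stand-in for NIC DMA: the payload lands directly in buffer 2.
        payload = b"PUT key=value"
        buf_idx, off = 2, 64        # hypothetical offset past the headers
        start = buf_idx * BUF_SIZE + off
        pktbufs[start:start + len(payload)] = payload
        verb = persist_request(pktbufs, metadata, 0, buf_idx, off, len(payload))
        # Recovery: follow the metadata entry back into the packet buffer.
        entry = ENTRY.unpack_from(metadata, 0)
        r_start = entry[0] * BUF_SIZE + entry[1]
        recovered = bytes(pktbufs[r_start:r_start + entry[2]])
        metadata.close()
        pktbufs.close()
    return payload, recovered, entry, verb
```

Only the 12-byte entry is written by the application; the payload itself is never copied out of the packet buffer, which is the property the slide's "metadata only" path relies on.
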
Fast and Selective Persistent Write with PASTE
- Application steps: (1) read data (zero copy); (2) write metadata entry; (3) flush (buffer and metadata)
- [Figure: application in user space over mmap() and the netmap API; kernel with TCP/IP input and output; NIC ring; packet buffers (static) in /mnt/nvmm/pktbufs; metadata in /mnt/nvmm/myapp_metadata]
- Metadata header: /mnt/nvmm/pktbufs, buf_ofs: 123
- Metadata entries with fields buf_idx, off, len (values shown: 100 1135 1 932 3 1024)
- Buffer states in the figure: unread, read or written, flushed; one buffer is labeled "Idempotent request"
- DMA is performed to L3 cache (DDIO)
- Unnecessary data is not flushed to DIMM

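One way to read this slide is that a request which needs no durability is simply never flushed. The sketch below illustrates that reading under loud assumptions: it treats a request starting with `GET` as idempotent, uses temporary files in place of NVMM, and sizes each buffer to the mapping granularity so that a single buffer can be flushed on its own. `IDEMPOTENT_VERBS`, `handle`, `map_temp`, and the entry layout are hypothetical and not part of PASTE or netmap.

```python
import mmap
import struct
import tempfile

# Assumption: one buffer per mapping granule, so flush() can target one buffer.
BUF_SIZE = mmap.ALLOCATIONGRANULARITY
NUM_BUFS = 4                     # hypothetical number of packet buffers
ENTRY = struct.Struct("<III")    # buf_idx, off, len (field widths are assumed)
IDEMPOTENT_VERBS = {b"GET"}      # assumption: these requests need no durability


def map_temp(size):
    """Map a temporary file that stands in for an NVMM-backed file."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size)
        return mmap.mmap(f.fileno(), size)


def handle(pktbufs, metadata, n_entries, buf_idx, length):
    """Persist a request only if it is not idempotent; return the entry count."""
    start = buf_idx * BUF_SIZE
    verb = pktbufs[start:start + 3]      # copies only the 3-byte verb
    if verb in IDEMPOTENT_VERBS:
        return n_entries                 # nothing written, nothing flushed
    ENTRY.pack_into(metadata, n_entries * ENTRY.size, buf_idx, 0, length)
    pktbufs.flush(start, BUF_SIZE)       # flush this buffer only
    metadata.flush()
    return n_entries + 1


def demo():
    pktbufs = map_temp(BUF_SIZE * NUM_BUFS)
    metadata = map_temp(ENTRY.size * NUM_BUFS)
    requests = [b"GET key", b"PUT key=value"]
    for idx, req in enumerate(requests):  # stand-in for NIC DMA
        pktbufs[idx * BUF_SIZE:idx * BUF_SIZE + len(req)] = req
    n = 0
    for idx, req in enumerate(requests):
        n = handle(pktbufs, metadata, n, idx, len(req))
    first = ENTRY.unpack_from(metadata, 0)
    metadata.close()
    pktbufs.close()
    return n, first
```

In this emulation the `GET` buffer is read but never referenced by a metadata entry, so it can be reused without ever having been flushed.
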
Preliminary Results
- Implementation: extends the netmap framework
- TCP/IP is also supported
- 10-88% throughput increase, 9-46% latency reduction

Related Work
- Enhanced network stacks (not NVMM-aware)
  - MegaPipe (OSDI’12), Stackmap (ATC’16), Fastsocket (ASPLOS’16)
  - IX and Arrakis (OSDI’14), mTCP (NSDI’14), Sandstorm (SIGCOMM’14), MICA (NSDI’14)
- NVMM filesystems (not networking-aware)
  - BPFS (SOSP’09), NOVA (FAST’16)
- NVMM databases (not networking-aware)
  - NVWAL (ASPLOS’16), REWIND (VLDB’15), NV-Tree (FAST’15)

Conclusion
- Networking APIs are now a bottleneck for durably storing data
- Ongoing work
  - Refining the APIs
  - Smooth integration with the netmap API
  - Forming a filesystem with packet buffers
  - Fast data movement between files
  - Use case applications
- [Figure: App, netmap API, network stack, filesystem, NVMM, NIC]