ZUFS - Zero-copy User-mode FS
A new interface for a new breed of user-mode filesystems that require extremely low-latency, synchronous & DAX, NUMA-aware access.
Boaz, Linux Plumbers, Sep. 2017
In theory
[Architecture diagram: applications (APP) issue I/O into the kernel ZUF (Zu Feeder), which runs a ZU thread (zt) per CPU; the threads feed the user-space ZUS (Zu Server), which loads the filesystem plugins zufs-foo.so, zufs-bar.so, and zufs-mem.so.]
In Theory
- ZT: a ZUFS thread per CPU, with affinity pinned to a single CPU (thread_fifo/rr).
- A special ZUFS communication file per ZT (O_TMPFILE + IOCTL_ZUFS_INIT).
- ZT-vma: a 4M vma mmap'd per ZT as the zero-copy communication area.
- IOCTL_ZU_WAIT_OPT: the thread sleeps in the kernel waiting for an operation.
- On application I/O the current CPU's ZT is selected, the app pages are mapped into its ZT-vma, and the server thread is released with an operation.
- After execution the ZT returns to the kernel (IOCTL_ZU_WAIT_OPT), the app is released, and the server waits for a new operation.
- On exit (or a server crash) the file is closed and the kernel cleans up all resources.
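To make the flow concrete, here is a minimal user-space sketch of one ZT server thread's life cycle. The ioctl names and the 4M window come from the slide above; the ioctl numbers, the ZUF_ROOT mount path, the struct zufs_opt layout, and zus_execute_opt() are assumptions for illustration, not the real ABI.

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical ABI stand-ins; the real header would define these. */
struct zufs_opt { int operation; /* op params, hdr.err, ... */ };
#define IOCTL_ZUFS_INIT   _IO('Z', 1)                      /* assumption */
#define IOCTL_ZU_WAIT_OPT _IOWR('Z', 2, struct zufs_opt)   /* assumption */
#define ZUF_ROOT "/sys/fs/zuf"    /* assumption: zuf control-fs mount */
#define ZT_VMA_SIZE (4UL << 20)   /* the 4M per-ZT zero-copy window */

extern void zus_execute_opt(struct zufs_opt *opt, void *zt_vma); /* FS-plugin dispatch, hypothetical */

static void *zt_thread(void *arg)
{
	int cpu = (int)(long)arg;
	cpu_set_t set;
	struct zufs_opt opt = { 0 };
	void *zt_vma;
	int fd;

	/* One ZT per CPU, pinned to exactly that CPU. */
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

	/* The per-ZT communication file: O_TMPFILE + IOCTL_ZUFS_INIT. */
	fd = open(ZUF_ROOT, O_TMPFILE | O_RDWR, 0600);
	ioctl(fd, IOCTL_ZUFS_INIT, &opt);

	/* The 4M ZT-vma the kernel maps application pages into. */
	zt_vma = mmap(NULL, ZT_VMA_SIZE, PROT_READ | PROT_WRITE,
		      MAP_SHARED, fd, 0);

	for (;;) {
		/* Sleep in the kernel until an app issues an operation;
		 * on return, its pages are already visible in zt_vma. */
		if (ioctl(fd, IOCTL_ZU_WAIT_OPT, &opt) < 0)
			break;
		zus_execute_opt(&opt, zt_vma);
	}

	close(fd);	/* the kernel tears down all per-ZT resources */
	return NULL;
}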
In theory
[Diagram: an application's pages (P) are mapped by the kernel ZUF (Zu Feeder) into the Zu thread's zt-vma in the ZUS (Zu Server) address space, and unmapped again on return. A kernel-side sketch of this map/unmap step follows.]
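A minimal kernel-side sketch of what the map/unmap step in the diagram might look like, assuming each zufs_thread keeps a pointer to its 4M ZT-vma. vm_insert_page() and zap_vma_ptes() are stock kernel primitives; the actual implementation, and its interaction with the mm patch on the later slide, may well differ.

#include <linux/mm.h>

/* Sketch: pin the application's pages into the ZT-vma so the server
 * sees them at a fixed window, then tear the PTEs down on return. */
static int _map_pages(struct zufs_thread *zt, struct page **pages,
		      uint nump, bool read_only)
{
	unsigned long addr = zt->vma->vm_start;	/* zt->vma: assumed field */
	uint i;
	int err;

	for (i = 0; i < nump; ++i, addr += PAGE_SIZE) {
		err = vm_insert_page(zt->vma, addr, pages[i]);
		if (unlikely(err))
			return err;
	}
	return 0;	/* (read_only handling elided in this sketch) */
}

static void _unmap_pages(struct zufs_thread *zt, struct page **pages,
			 uint nump)
{
	/* Drops the window's PTEs; this is where the own-core TLB
	 * invalidate of the mm patch would come into play. */
	zap_vma_ptes(zt->vma, zt->vma->vm_start, nump * PAGE_SIZE);
}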
In Theory
- Async operation is also supported. The server must not sleep in a ZT: all locks are trylocks. If a lock cannot be taken, the operation is queued and the server returns EAGAIN; the server later completes the operation asynchronously and the app is then woken up.
- Do we need PAGE_CACHE support? Here too, write/read_pages() maps the page cache into the zt-vma.
- Application mmap works in the opposite direction: ZUS exposes its pages (opt_get_data_block) into the app's VM.
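A sketch of the no-sleep rule on the ZUS side, in user-space C. struct zus_inode, zus_queue_async(), and _do_write() are hypothetical names, used only to show the trylock-or-EAGAIN pattern the slide describes.

#include <errno.h>
#include <pthread.h>

/* Hypothetical ZUS-side types for illustration. */
struct zufs_opt;
struct zus_inode { pthread_mutex_t i_lock; /* ... */ };

extern void zus_queue_async(struct zus_inode *zi, struct zufs_opt *opt); /* assumption */
extern void _do_write(struct zus_inode *zi, struct zufs_opt *opt);       /* assumption */

static int zus_write_opt(struct zus_inode *zi, struct zufs_opt *opt)
{
	/* Never sleep inside a ZT: every lock is a trylock. */
	if (pthread_mutex_trylock(&zi->i_lock)) {
		/* Park the op for async completion; the kernel keeps
		 * the application sleeping until we finish later. */
		zus_queue_async(zi, opt);
		return -EAGAIN;
	}
	_do_write(zi, opt);
	pthread_mutex_unlock(&zi->i_lock);
	return 0;	/* completed synchronously */
}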
Raw Results: FUSE vs. ZUFS vs. In-Kernel FS

Threads | FUSE Op/s | FUSE Lat [us] | ZUFS Op/s | ZUFS Lat [us] | In-Kernel Op/s
      1 |    71,820 |          13.5 |   200,799 |           4.6 |        388,361
      2 |   148,083 |          13.1 |   314,321 |           5.9 |        635,115
      4 |   212,133 |          18.3 |   565,574 |           6.6 |
      8 |   209,799 |          37.6 | 1,113,138 |               |
     12 |   201,689 |          58.7 | 1,598,451 |           6.8 |
     18 |   174,823 |         101.8 | 1,648,689 |           7.8 |
     24 |   149,413 |         159.0 | 1,702,285 |           8.0 |
     36 |   148,276 |         240.7 | 1,783,346 |          13.4 |
     48 |   145,296 |         327.3 | 1,741,873 |          17.4 |
Motivation for ZUFS (for near-memory-speed PM media)
Measured on a dual-socket Intel Xeon 2650v4 (48 HW threads), DRAM-backed PM type, random 4KB DirectIO write(ish) access.
Why is the mm patch required?
- MMAP_LOCAL_CPU
- Own-core TLB invalidate
- Secure file-system signing?
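A rough sketch of the own-core idea. VM_LOCAL_CPU is an assumed flag name with an illustrative value, and the flush placement is illustrative only (zap_vma_ptes() performs its own flush internally; a real patch would hook the decision inside the zap path). The point is that a vma touched only by one pinned ZT can skip the cross-CPU IPI shootdown.

#include <linux/mm.h>
#include <asm/tlbflush.h>

#define VM_LOCAL_CPU 0x00800000UL	/* assumption: illustrative value */

static void zt_unmap_window(struct vm_area_struct *vma,
			    unsigned long addr, unsigned long size)
{
	zap_vma_ptes(vma, addr, size);	/* drop the window's PTEs */

	if (vma->vm_flags & VM_LOCAL_CPU)
		local_flush_tlb();	/* invalidate on our own core only */
	else
		flush_tlb_range(vma, addr, addr + size); /* IPIs all cores */
}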
Raw Results: w/ and w/o the mm patch

Threads | Patched Op/s | Patched Lat [us] | Unpatched Op/s | Unpatched Lat [us]
      1 |      200,799 |              4.6 |        185,391 |                4.9
      2 |      314,321 |              5.9 |        197,993 |                9.6
      4 |      565,574 |              6.6 |        310,597 |               12.1
      8 |    1,113,138 |                  |        546,702 |               13.8
     12 |    1,598,451 |              6.8 |        641,728 |               17.2
     18 |    1,648,689 |              7.8 |        744,750 |               22.2
     24 |    1,702,285 |              8.0 |        790,805 |               28.3
     36 |    1,783,346 |             13.4 |        849,763 |               38.9
     48 |    1,741,873 |             17.4 |        792,000 |               44.6
Additional Design Considerations
- A single ZUS application server. ZUFS filesystems are .so libraries loaded into ZUS (preconfigured or at run time).
- Regular mount command; new superblocks are created.
- Devices are managed and owned by ZUF in the kernel.
- Bind mounts also work, the regular way.
- The ZUS API for fs-plugins is very close to the VFS API (see the sketch below).
- Support for compiling zus-plugins as kernel modules, also fed by ZUF?
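Since the plugin API stays close to the VFS, a zufs-foo.so might register an ops table like the following. Every type and function name here (zus_fs_ops, zus_register_fs, and friends) is a hypothetical stand-in for illustration.

/* Hypothetical plugin-side types, mirroring a VFS-like ops table. */
struct zus_sb;
struct zus_inode;
struct zufs_opt;

struct zus_fs_ops {
	const char *fs_name;
	int (*mount)(struct zus_sb *sb, const char *dev);
	int (*read)(struct zus_inode *zi, struct zufs_opt *opt);
	int (*write)(struct zus_inode *zi, struct zufs_opt *opt);
	int (*get_data_block)(struct zus_inode *zi, long long pos,
			      void **block);	/* backs app mmap */
};

extern int zus_register_fs(const struct zus_fs_ops *ops);	/* assumption */

static int foo_mount(struct zus_sb *sb, const char *dev) { return 0; }
static int foo_read(struct zus_inode *zi, struct zufs_opt *opt) { return 0; }
static int foo_write(struct zus_inode *zi, struct zufs_opt *opt) { return 0; }
static int foo_get_data_block(struct zus_inode *zi, long long pos,
			      void **block) { return 0; }

static const struct zus_fs_ops foo_ops = {
	.fs_name	= "foo",
	.mount		= foo_mount,
	.read		= foo_read,
	.write		= foo_write,
	.get_data_block	= foo_get_data_block,
};

/* Entry point the ZU server would call after dlopen("zufs-foo.so"). */
int zufs_plugin_init(void)
{
	return zus_register_fs(&foo_ops);
}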
Thank you. Please talk to me about ZUFS: boazh@netapp.com
/* ZUF kernel side: the per-CPU ZT wait loop and the app-side dispatch.
 * (_zt_from_f, _map_pages, _unmap_pages are defined elsewhere.) */

static void _zu_wakeup_fss(struct zufs_thread *zt);
static void _zu_wakeup_app(struct zufs_thread *zt);
static int _zu_wait_fss(struct zufs_thread *zt);
static int _zu_wait_app(struct zufs_thread *zt);

static int _zu_wait(struct file *file, void *parg)
{
	struct zufs_thread *zt;
	int cpu = smp_processor_id();
	int err;

	err = _zt_from_f(file, cpu, &zt);
	if (unlikely(err))
		goto err;

	zt->fss_waiting = true;
	if (zt->app_waiting) {
		/* A previous operation just finished in the server: unmap
		 * its pages and hand the result back to the application. */
		_unmap_pages(zt, zt->pages, zt->nump);
		zt->app_waiting = false;
		get_user(zt->next_opt.hdr.err, (int *)parg);
		_zu_wakeup_app(zt);
	}

	/* Sleep until zufs_dispatch() posts the next operation. */
	_zu_wait_fss(zt);
	zt->fss_waiting = false;

	/* Call map here, on the ZT thread itself, so we need no locks. */
	if (zt->next_opt.operation && zt->next_opt.operation < ZUS_OP_BREAK)
		_map_pages(zt, zt->pages, zt->nump, false);

	err = copy_to_user(parg, &zt->next_opt, sizeof(zt->next_opt));
	return err;

err:
	put_user(err, (int *)parg);
	return err;
}

int zufs_dispatch(struct m1fs_sb_info *sbi, int operation, uint pgoffset,
		  struct page **pages, uint nump, u64 filepos, uint len)
{
	int cpu = smp_processor_id();
	struct zufs_thread *zt;

	if ((cpu < 0) || (sbi->_max_zts <= cpu))
		return -ERANGE;
	zt = &sbi->_all_zt[cpu];
	if (unlikely(!zt->file))
		return -EIO;

	while (!zt->fss_waiting) {
		mb();
		m1fs_err("[%d] can this be\n", cpu);
		msleep(100);
	}

	/* Post the operation on this CPU's ZT and wake the server. */
	zt->next_opt.operation = operation;
	zt->next_opt.offset = pgoffset;
	zt->next_opt.filepos = filepos;
	zt->next_opt.len = len;
	zt->pages = pages;
	zt->nump = nump;

	zt->app_waiting = true;
	_zu_wakeup_fss(zt);
	_zu_wait_app(zt);

	return zt->file ? zt->next_opt.hdr.err : -EIO;
}

static void _zu_wakeup_fss(struct zufs_thread *zt)
{
	zt->fss_wakeup = true;
	wake_up(&zt->fss_wq);
}

static void _zu_wakeup_app(struct zufs_thread *zt)
{
	zt->app_wakeup = true;
	wake_up(&zt->app_wq);
}

static int _zu_wait_fss(struct zufs_thread *zt)
{
	zt->fss_wakeup = false;
	return wait_event_interruptible(zt->fss_wq, zt->fss_wakeup);
}

static int _zu_wait_app(struct zufs_thread *zt)
{
	zt->app_wakeup = false;
	return wait_event_interruptible(zt->app_wq, zt->app_wakeup);
}
Abstract
FUSE has enabled user-space file systems ever since its merge into the mainline kernel. It is a widely popular vehicle for rapid development, and tens of file systems have used it to date. FUSE is asynchronous in nature and relies heavily on the operating system's page cache. It was designed with hard-drive latency in mind, and was measured to add a penalty of 12.5 microseconds [us] and up, depending on the load.

Emerging persistent-memory technologies, such as NVDIMM-N and 3D XPoint / MRAM / ReRAM based NVDIMMs, operate at near-memory speed and require a different user-space file-system mechanism, one that is tuned for latency. The motivation of this work is to enable a new breed of user-mode servers, based on the above technologies, that typically respond within a single microsecond, faster than any caching, redundant data copying, or queuing.

ZUFS, pronounced Zoo-FS, stands for Zero-copy User-mode FS and is a new kernel project designed to fill that gap. ZUFS is designed from the get-go to provide an example of a full, PMEM-based FS, to demonstrate speed, behavior, and correctness. But the motivation is that not only full-fledged filesystems need apply: any other user-mode service that wants to serve many applications very effectively with modern direct-mmapped access, and can live behind a filesystem-like interface, can enjoy this new kernel ABI.

The emphasis here is on multi-threaded, low-latency, synchronous, zero-copy, direct-mapped access from the application to the server application, and vice versa, direct mapping of server resources into the application. All this without sacrificing security or robustness. How do we think we can do this? Please see the attached paper, and come to our talk. But mainly, this is more of an open question than a ready-made proposal.