Download presentation
Presentation is loading. Please wait.
Published byEdith Walter Modified over 6 years ago
1
NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories
Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven Swanson Non-Volatile Systems Laboratory Department of Computer Science & Engineering University of California, San Diego
2
Non-volatile main memory and DAX
Non-volatile main memory (NVMM) Byte-addressable, durable memory Latency and bandwidth close to DRAM Battery-backed DRAM, PCM, 3D XPoint DAX: direct access (from user-space) Load/store access Bypass page-cache No kernel code in the way
3
Expectations on NVMM file systems
Speed POSIX I/O Atomicity Fault Tolerance Direct Access DAX
4
✔ ❌ Ext4 XFS BtrFS F2FS DAX Fault Direct Speed POSIX I/O Atomicity
Tolerance Speed Direct Access DAX ✔ ❌
5
✔ ❌ PMFS Ext4-DAX XFS-DAX DAX Fault Direct Speed POSIX IO Atomicity
DRAM + Flash NVDIMM 3D XPoint NVDIMM ❌ ✔ POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX
6
✔ ❌ NOVA DAX Fault Direct Speed POSIX IO Atomicity Tolerance Access
FAST’16 DRAM + Flash NVDIMM 3D XPoint NVDIMM ✔ ❌ POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX
7
✔ NOVA-Fortis DAX Fault Direct Speed POSIX IO Atomicity Tolerance
SOSP’2017 DRAM + Flash NVDIMM 3D XPoint NVDIMM ✔ POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX
8
Outline NOVA architecture Atomicity Fault tolerance Evaluations
Conclusions
9
NOVA: Log Structured file system For NVMM
Scalable per-inode log design Inode keeps head/tail pointers Log contains operation entries Data pages stored separately CPU 0 CPU 1 Inode table Inode Head Tail Inode log Data 0 Data 1 Data 2
10
Atomicity: Single log appending
Log-structuring for atomic single log appending Write, chmod, msync etc. Atomically update the 8-byte tail Lower overhead than journaling and shadow paging append(tail, entry); persist(); *tail = new_offset; Tail Tail Inode log
11
Atomicity: Lightweight journaling
Lightweight journaling for update across logs Create, rename, etc Only journal inode tails instead of metadata or data Tail Tail Directory log Tail Tail File log Dir tail Journal File tail
12
Atomicity: Copy-on-write
Copy-on-write for atomic file data writes Data pages stored separately Recycle pages when overwritten Tail Tail File log Data 0 Data 1 Data 1 Data 2
13
Fault-tolerance Snapshot: Point-in-time backups
Data integrity: Continuously detect and repair errors Existing snapshot file systems: Additional metadata for each block Incurs additional writes Blocks I/O operations
14
Snapshot for file access
Global snapshot ID Saved to each log entry Checked before recycling file pages Snapshot ID 1 Snapshot 0 Inode log Page 1 1 Page 1 1 Page 1 Data Data Data Data Epoch ID File write entry Data in snapshot Current data
15
Snapshot for DAX: Idea Set pages read-only, then copy-on-write
Applications: no file system intervention File system: DAX-mmap() File data: RO
16
Snapshot for DAX: Incorrect implementation
? T Snapshot for DAX: Incorrect implementation Application invariant: if V is True, then D is valid Application thread Application values NOVA snapshot Snapshot values D = ?; V = False; D = 5; V = True; D V snapshot_begin(); set_read_only(page_d); copy_on_write(page_d); set_read_only(page_v); snapshot_end(); D V ? F ? page fault 5 F 5 T ? T ? T
17
Snapshot for DAX: Correct implementation
? T Snapshot for DAX: Correct implementation Application invariant: if V is True, then D is valid Application thread Application values NOVA snapshot Snapshot values D = ?; V = False; D = 5; V = True; D V snapshot_begin(); set_read_only(page_d); set_read_only(page_v); snapshot_end(); copy_on_write(page_d); copy_on_write(page_v); D V ? F ? page fault ? F ? F 5 F 5 T
18
Application performance during snapshot
Two kinds of applications File access DAX-mmap Normal execution v.s. taking a snapshot every 10 seconds File access DAX-mmap <1% throughput loss average 3.7% throughput loss
19
NVMM failure modes Detectable by hardware Undetectable by hardware
Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs
20
NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Read NVMM data: Detects & corrects errors Consumes good data Media error
21
NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Software: NVMM Ctrl.: Detects uncorrectable errors Raises exception Receives MCE Media error Read
22
NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Media error Software: NVMM Ctrl.: Sees no error Consumes corrupted data Read
23
NVMM failure modes: Scribbles
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Software: NVMM Ctrl.: Updates ECC Bug code scribbles NVMM Scribble error Write
24
Capture machine checks
Copy (meta)data from NVMM to DRAM memcpy with exception handling Check memcpy return code
25
NOVA metadata protection
Checksum + replication Per-inode, per-log-entry checksum Replicate all metadata structures replica inode Head1 Tail1 csum Head2 Tail2 Head’ Tail’ csum’ inode Head Tail csum Head1 Tail1 csum Head2 Tail2 Head Tail Ent1 Entn … Ent1 C1 Entn Cn … C1 Cn Data 1 Data 2
26
NOVA tick-tock metadata updates
Always have an fail-safe version of metadata replica inode Head1 Tail1 csum Head2 Tail2 inode Head1 Tail1 csum Head2 Tail2 Old Updating New
27
NOVA data integrity Checksum + parity Sub-page per-strip checksums
One parity strip for each page Data 1 Data 2 S0 S1 S2 S3 S4 S5 S6 S7 P 1 Page (8 strips) P = ⊕ S0..7 Ci = CRC32(Si) Replicated Ci
28
Application performance on NOVA
29
Application performance on NOVA
30
Application performance on NOVA
31
https://github.com/NVSL/linux-nova
Conclusions NOVA’s multi-log design achieves high performance, strong consistency, and fault tolerance on NVMM NOVA supports consistent snapshot with DAX-mmap() NOVA outperforms existing file systems while providing stronger atomicity and data integrity
32
https://github.com/NVSL/linux-nova
Thank you!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.