Presentation is loading. Please wait.

Presentation is loading. Please wait.

NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah,

Similar presentations


Presentation on theme: "NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah,"— Presentation transcript:

1 NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories
Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven Swanson Non-Volatile Systems Laboratory Department of Computer Science & Engineering University of California, San Diego

2 Non-volatile main memory and DAX
Non-volatile main memory (NVMM) Byte-addressable, durable memory Latency and bandwidth close to DRAM Battery-backed DRAM, PCM, 3D XPoint DAX: direct access (from user-space) Load/store access Bypass page-cache No kernel code in the way

3 Expectations on NVMM file systems
Speed POSIX I/O Atomicity Fault Tolerance Direct Access DAX

4 ✔ ❌ Ext4 XFS BtrFS F2FS DAX Fault Direct Speed POSIX I/O Atomicity
Tolerance Speed Direct Access DAX

5 ✔ ❌ PMFS Ext4-DAX XFS-DAX DAX Fault Direct Speed POSIX IO Atomicity
DRAM + Flash NVDIMM 3D XPoint NVDIMM POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX

6 ✔ ❌ NOVA DAX Fault Direct Speed POSIX IO Atomicity Tolerance Access
FAST’16 DRAM + Flash NVDIMM 3D XPoint NVDIMM POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX

7 ✔ NOVA-Fortis DAX Fault Direct Speed POSIX IO Atomicity Tolerance
SOSP’2017 DRAM + Flash NVDIMM 3D XPoint NVDIMM POSIX IO Atomicity Fault Tolerance Speed Direct Access DAX

8 Outline NOVA architecture Atomicity Fault tolerance Evaluations
Conclusions

9 NOVA: Log Structured file system For NVMM
Scalable per-inode log design Inode keeps head/tail pointers Log contains operation entries Data pages stored separately CPU 0 CPU 1 Inode table Inode Head Tail Inode log Data 0 Data 1 Data 2

10 Atomicity: Single log appending
Log-structuring for atomic single log appending Write, chmod, msync etc. Atomically update the 8-byte tail Lower overhead than journaling and shadow paging append(tail, entry); persist(); *tail = new_offset; Tail Tail Inode log

11 Atomicity: Lightweight journaling
Lightweight journaling for update across logs Create, rename, etc Only journal inode tails instead of metadata or data Tail Tail Directory log Tail Tail File log Dir tail Journal File tail

12 Atomicity: Copy-on-write
Copy-on-write for atomic file data writes Data pages stored separately Recycle pages when overwritten Tail Tail File log Data 0 Data 1 Data 1 Data 2

13 Fault-tolerance Snapshot: Point-in-time backups
Data integrity: Continuously detect and repair errors Existing snapshot file systems: Additional metadata for each block Incurs additional writes Blocks I/O operations

14 Snapshot for file access
Global snapshot ID Saved to each log entry Checked before recycling file pages Snapshot ID 1 Snapshot 0 Inode log Page 1 1 Page 1 1 Page 1 Data Data Data Data Epoch ID File write entry Data in snapshot Current data

15 Snapshot for DAX: Idea Set pages read-only, then copy-on-write
Applications: no file system intervention File system: DAX-mmap() File data: RO

16 Snapshot for DAX: Incorrect implementation
? T Snapshot for DAX: Incorrect implementation Application invariant: if V is True, then D is valid Application thread Application values NOVA snapshot Snapshot values D = ?; V = False; D = 5; V = True; D V snapshot_begin(); set_read_only(page_d); copy_on_write(page_d); set_read_only(page_v); snapshot_end(); D V ? F ? page fault 5 F 5 T ? T ? T

17 Snapshot for DAX: Correct implementation
? T Snapshot for DAX: Correct implementation Application invariant: if V is True, then D is valid Application thread Application values NOVA snapshot Snapshot values D = ?; V = False; D = 5; V = True; D V snapshot_begin(); set_read_only(page_d); set_read_only(page_v); snapshot_end(); copy_on_write(page_d); copy_on_write(page_v); D V ? F ? page fault ? F ? F 5 F 5 T

18 Application performance during snapshot
Two kinds of applications File access DAX-mmap Normal execution v.s. taking a snapshot every 10 seconds File access DAX-mmap <1% throughput loss average 3.7% throughput loss

19 NVMM failure modes Detectable by hardware Undetectable by hardware
Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs

20 NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs Software: NVMM Ctrl.: Read NVMM data: Detects & corrects errors Consumes good data Media error

21 NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Software: NVMM Ctrl.: Detects uncorrectable errors Raises exception Receives MCE Media error Read

22 NVMM failure modes: Media failures
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Media error Software: NVMM Ctrl.: Sees no error Consumes corrupted data Read

23 NVMM failure modes: Scribbles
Detectable by hardware Detectable & correctable Detectable & uncorrectable Undetectable by hardware Undetectable media errors “Scribbles” due to kernel bugs NVMM data: Software: NVMM Ctrl.: Updates ECC Bug code scribbles NVMM Scribble error Write

24 Capture machine checks
Copy (meta)data from NVMM to DRAM memcpy with exception handling Check memcpy return code

25 NOVA metadata protection
Checksum + replication Per-inode, per-log-entry checksum Replicate all metadata structures replica inode Head1 Tail1 csum Head2 Tail2 Head’ Tail’ csum’ inode Head Tail csum Head1 Tail1 csum Head2 Tail2 Head Tail Ent1 Entn Ent1 C1 Entn Cn C1 Cn Data 1 Data 2

26 NOVA tick-tock metadata updates
Always have an fail-safe version of metadata replica inode Head1 Tail1 csum Head2 Tail2 inode Head1 Tail1 csum Head2 Tail2 Old Updating New

27 NOVA data integrity Checksum + parity Sub-page per-strip checksums
One parity strip for each page Data 1 Data 2 S0 S1 S2 S3 S4 S5 S6 S7 P 1 Page (8 strips) P = ⊕ S0..7 Ci = CRC32(Si) Replicated Ci

28 Application performance on NOVA

29 Application performance on NOVA

30 Application performance on NOVA

31 https://github.com/NVSL/linux-nova
Conclusions NOVA’s multi-log design achieves high performance, strong consistency, and fault tolerance on NVMM NOVA supports consistent snapshot with DAX-mmap() NOVA outperforms existing file systems while providing stronger atomicity and data integrity

32 https://github.com/NVSL/linux-nova
Thank you!


Download ppt "NOVA: A High-Performance, Fault-Tolerant File System for Non-Volatile Main Memories Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah,"

Similar presentations


Ads by Google