1 Conquest: Preparing for Life After Disks. October 2, 2003. An-I Andy Wang.

2 Conquest Overview: File systems are optimized for disks (performance problem, complexity). Now we have tons of inexpensive RAM. What can we do with that RAM?

3 Conquest Approach: Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way. Simplification: at least 20% smaller code base than ext2, reiserfs, and SGI XFS. Performance (under popular benchmarks): 24% to 1900% faster than LRU disk caching; best performance boost since Berkeley FFS.

4 Performance Problem of Disks [Figure: accesses per second (log scale; 1 KHz, 1 MHz, 1 GHz; tick labels 10^5 and 10^6) versus year, 1990 to 2000. CPU improves at 50%/yr, memory at 50%/yr, disk at 15%/yr. Annotations: (1 sec : 6 days) and (1 sec : 3 months).] Genesis Conquest Design Performance Evaluation Conclusion

5 Inside Pandora’s Box: Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time. [Figure: disk arm and disk platters.]
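
The access-time formula on this slide can be worked through numerically. The sketch below is only an illustration: the function name, the use of half a rotation as the average rotational delay, and every drive figure passed in are assumptions for the example, not values from the talk.

```python
def access_time_ms(seek_ms, rpm, transfer_bytes, bandwidth_mb_per_s):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    # Assumption: on average the platter must turn half a rotation.
    rotational_delay_ms = 0.5 * 60_000.0 / rpm
    # Assumption: 1 MB/s means 1,000,000 bytes per second.
    transfer_ms = transfer_bytes / (bandwidth_mb_per_s * 1_000_000) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive: 8 ms seek, 7200 RPM, 40 MB/s, one 4 KB block.
    print(f"{access_time_ms(8.0, 7200, 4096, 40.0):.3f} ms")
```

With figures like these, the mechanical terms (seek and rotation) dominate a small transfer, which is the motivation for the disk optimizations listed on the next slide.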

6 Disk Optimization Methods: disk arm scheduling; group information on disk; disk readahead; buffered writes; disk caching; data mirroring; hardware parallelism.

7 Complexity [Figure labels: bytes, synchronization, predictive readahead, cache replacement, elevator algorithm, data clustering, data consistency, asynchronous write.]

8 Storage Media Alternatives [Figure: accesses/sec (log scale) versus $/MB (log scale), with tick labels from 10^-6 to 10^6. Labels: persistent RAM, magnetic RAM?, (write once), flash memory, disk, tape, battery-backed DRAM.] [Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Quantum 2000; Micron Semiconductor Products 2002]

9 The Genesis of Conquest: Idea: persistent-RAM-only file system. Improved performance. Remove disk-related complexity.

10 The Genesis of Conquest (2): Problem: wrong growth curves. Disk prices are dropping faster than RAM prices. Disks will stay around. [Figure: $/MB (log scale, 10^-2 to 10^2) versus year, 1995 to 2005, for 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM; annotation: booming of digital photography.] [Grochowski 2002]

11 The Genesis of Conquest (3): New idea: hybrid system for transition. Takes advantage of RAM speed. Still simplifies code. [Figure: $/MB (log scale, 10^-2 to 10^2) versus year, 1995 to 2005, for paper/film, 3.5" HDD, 2.5" HDD, 1" HDD, and persistent RAM; annotations: booming of digital photography; 4 to 10 GB of persistent RAM.] [Grochowski 2002]

12 Conquest Design Questions: How to make effective use of RAM? (Common usage patterns; physical characteristics of RAM storage.) Where and how to reduce complexity? (Data paths; data structures and associated management; shutdown/boot sequence.) How to assure the integrity of file system components that reside in BB-DRAM?

13 User Access Patterns: Small files take little space (10%) and represent most accesses (90%). Large files take most space and see mostly sequential accesses. Not characteristic of database applications. [Ousterhout 1985; Baker et al., 1991; Iram 1993; Douceur and Bolosky 1999; Roselli et al., 2000; Evans and Kuenning 2002]

14 Characteristics of Storage Media: RAM: fast random accesses; cost-effective in performance. Disk: fast sequential accesses; cost-effective in storage.

15 The Design of Conquest: Deliver all file system services from memory, with the exception of high-capacity storage. Persistent RAM: data content of small files (smaller than 1 MB); metadata (file descriptions for large and small files, directories, and data structures). Disk: data content of large files. Two separate data paths to memory and disk.
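
The placement rule on this slide can be sketched as a small function. This is only an illustration under stated assumptions: the names `SMALL_FILE_LIMIT` and `placement` are hypothetical, and treating a file of exactly 1 MB as large (strict less-than) is an assumption taken from the slide wording rather than from the Conquest code.

```python
SMALL_FILE_LIMIT = 1 << 20  # 1 MB threshold named on the slide


def placement(file_size_bytes):
    """Return where a file's metadata and data content would live."""
    # Metadata stays in persistent RAM for both small and large files.
    # Only the data content of large files goes to disk.
    if file_size_bytes < SMALL_FILE_LIMIT:
        data_home = "persistent RAM"
    else:
        data_home = "disk"
    return {"metadata": "persistent RAM", "data": data_home}


if __name__ == "__main__":
    for size in (4096, 100 * (1 << 20)):
        print(size, placement(size))
```

The point of the split is that each medium serves the access pattern it is good at: RAM takes the many small, randomly accessed files, and disk takes the few large, mostly sequential ones.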

16 Conquest Alternatives: Disk caching: assumption of scarce memory; use disk as the final storage destination; complex mechanisms to maintain consistency. RAM drives and RAM file systems: not meant to be persistent; use disk-related mechanisms; limitations on storage capacity. [McKusick et al., 1990; Ganger et al., 2000; Roselli et al., 2000; Seltzer et al., 2000]

17 Simplification of Data Paths

18 Content of Persistent RAM: Data content of small files (< 1 MB): no seek time or rotational delays; fast byte-level accesses; virtual contiguous allocation. Metadata (e.g., directories, file system states): fast synchronous update; no dual representations; for both large and small files.

19 Memory Data Path of Conquest [Figure: two data paths. Conventional file systems: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk. Conquest memory data path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

20 Large-File-Only Disk Storage: Only store the data content of large files. Allocate in big chunks: lower access overhead; reduced management overhead. No fragmentation management. No tricks for small files: storing data in metadata. No elaborate data structures: wrapping a balanced tree onto disk cylinders. [Namesys 2002]

21 Sequential-Access Large Files: Sequential disk accesses: near-raw bandwidth; well-defined readahead semantics. Read-mostly: little synchronization overhead (between memory and disk).

22 Disk Data Path of Conquest [Figure: two data paths. Conventional file systems: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk. Conquest disk data path: storage requests, I/O buffer management, I/O buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage).]

23 Random-Access Large Files: Random access? Common definition: nonsequential access. A typical movie has 150 scene changes. MP3 stores the title at the end of the files. Near-sequential access? Simplifies large-file metadata representation significantly. [Baker et al., 1991; Vogels 1999; Roselli et al., 2000]

24 Simplification of Data Structures

25 Logical File Representation [Figure: file name(s), i-node, file attributes, data.]

26 Physical File Representation [Figure: file name(s), i-node, file attributes, data locations, data blocks.]

27 Ext2 Data Representation [Figure: an i-node (stored on disk) containing data block locations and index block locations; index blocks lead to further index block and data block locations. A label "10" appears in the figure.]

28 Disadvantages with Ext2 Design: Optimization for small files makes things complex. Designed for disk storage. Random-access data structure for large files that are accessed mostly sequentially. Data access time depends on the byte position in a file. Maximum file size is limited.
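
The point that access time depends on byte position can be made concrete by counting how many index blocks must be followed to reach a given offset in an ext2-style i-node. The sketch below is illustrative only: the function name is hypothetical, and the block size, pointer size, and number of direct pointers are parameters with assumed defaults, not values taken from the slide.

```python
def indirection_level(byte_offset, block_size=4096, direct_pointers=12,
                      pointer_size=4):
    """Return how many index blocks are traversed to reach byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    block = byte_offset // block_size
    if block < direct_pointers:
        return 0  # reached straight from the i-node
    block -= direct_pointers
    pointers_per_block = block_size // pointer_size
    span = pointers_per_block  # blocks covered by a single-indirect tree
    for level in (1, 2, 3):    # single, double, triple indirection
        if block < span:
            return level
        block -= span
        span *= pointers_per_block
    # This is also where the maximum file size limit comes from.
    raise ValueError("offset beyond the triple-indirect range")


if __name__ == "__main__":
    for offset in (0, 48 * 1024, 8 * 1024 * 1024, 8 * 1024 ** 3):
        print(offset, indirection_level(offset))
```

Each extra level is another block lookup before the data itself, which is the cost the single-level Conquest index on the next slide avoids for in-memory files.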

29 Conquest Representation: Persistent RAM: single-level, dynamically allocated index; fast data access for files stored in RAM. [Figure: i-node (stored in RAM) with an index array location leading to data block locations.]

30 Conquest Representation (2): Disk: worst case is a sequential memory search for random disk locations; maximum file size is limited by physical storage. [Figure: i-node (stored in RAM) with a segment list location; each segment records a begin block location and an end block location for data stored on disk.]
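
The segment-list lookup for on-disk files can be sketched as a linear scan held entirely in memory. This is a minimal sketch, not the Conquest implementation: the function name is hypothetical, and representing each segment as an inclusive (begin block, end block) pair is an assumption.

```python
def find_disk_block(segments, file_block):
    """Map a block index within the file to a disk block number.

    segments: list of (begin_block, end_block) pairs, inclusive, in file order.
    """
    if file_block < 0:
        raise IndexError("negative block index")
    remaining = file_block
    for begin, end in segments:        # sequential search in memory
        length = end - begin + 1
        if remaining < length:
            return begin + remaining
        remaining -= length
    raise IndexError("block index past end of file")


if __name__ == "__main__":
    # Hypothetical file stored in two contiguous runs on disk.
    layout = [(1000, 1009), (5000, 5004)]
    print(find_disk_block(layout, 12))
```

The scan visits one list entry per segment, so a file laid out in a few large chunks is resolved in a handful of memory operations, and only the worst case (many segments, random offsets) pays for a long search.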

31 Conquest Directories: Per-directory hash tables stored in memory. Collisions resolved by rehashing. Hard links: multiple names point to the same data. Problem: dynamic resizing of directories; need to handle the current file position; important for rm -fr.

32-35 The Difficulty With Shrinking [Figure sequence for rm -fr: an i-node (stored in RAM) holds the hash table location. The table starts with entries 0110 | file1, 1001 | file2, and 1000 | dir. The dir entry is removed first; the number of NULL slots shown then drops from four to two as the table shrinks with file1 and file2 still present.]

36 The Difficulty With Shrinking: Quick fixes: never shrink hash tables (for rm -fr); no promises for ls while adding files. [Figure: rm -fr on a hash table referenced by an i-node (stored in RAM), with two NULL slots and the entry 0110 | file1 remaining.]

37 Extensible Hash Tables: Use top, not bottom, bits of hash code. [Figure: hash table referenced by an i-node (stored in RAM), with two NULL slots and entries 0110 | file1 and 1001 | file2.] [Fagin et al., 1979]

38 Extensible Hash Tables: Preserve ordering of entries when resizing. [Figure: hash table referenced by an i-node (stored in RAM), with four NULL slots and entries 1001 | file2 and 0110 | file1.]
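
Why top bits preserve ordering can be checked with the 4-bit hash codes shown in the figures. The sketch below is an illustration with hypothetical helper names: it compares the order in which a table scan would visit entries before and after the table doubles, indexing once by the top bits and once by the bottom bits.

```python
HASH_BITS = 4  # the figures use 4-bit hash codes such as 0110 and 1001


def top_bits_index(hash_code, table_bits):
    """Bucket index taken from the most significant bits."""
    return hash_code >> (HASH_BITS - table_bits)


def bottom_bits_index(hash_code, table_bits):
    """Bucket index taken from the least significant bits."""
    return hash_code & ((1 << table_bits) - 1)


def scan_order(entries, index_fn, table_bits):
    """Names in the order a bucket-by-bucket scan would visit them.

    Ties within a bucket are broken by hash code (an assumption made only
    to keep the output deterministic).
    """
    ranked = sorted(entries.items(),
                    key=lambda kv: (index_fn(kv[1], table_bits), kv[1]))
    return [name for name, _ in ranked]


if __name__ == "__main__":
    entries = {"file1": 0b0110, "file2": 0b1001, "dir": 0b1000}
    for bits in (1, 2):
        print("top   ", bits, scan_order(entries, top_bits_index, bits))
        print("bottom", bits, scan_order(entries, bottom_bits_index, bits))
```

With top bits, a bucket i in the small table maps to buckets 2i and 2i+1 in the doubled table, so a scan position (for example, the current position of a directory listing) remains meaningful across a resize. With bottom bits, entries scatter and the scan order changes.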

39 Additional Engineering Details: dynamic file positioning; need to handle collisions; memory overhead and complexity tradeoffs.

40 Simplification of Metadata Management

41 Metadata Allocation Requirements: Keep track of usage status of metadata entries. Avoid duplicate allocation with unique IDs. Fast retrieval of metadata with a given ID. [Figure: metadata entries ID 30 (free), ID 81 (in use), ID 58 (free), ID 16 (free), ID 89 (in use), ID 88 (free).]

42 Existing Memory Allocation Services: Keep track of unallocated memory. No duplicate allocation of physical addresses. Hmm… [Figure: memory entries ADDR 0xe000000 (free), ADDR 0xe000038 (in use), ADDR 0xe000070 (free), ADDR 0xe0000A8 (free), ADDR 0xe0000E0 (free), ADDR 0xe000118 (in use).]

43 Conquest Metadata Management: Metadata = memory allocated by memory manager. Metadata ID = physical address of metadata. This gives usage status, unique IDs, and fast retrieval. [Figure: the metadata entries (ID 30, 81, 58, 16, 89, 88) replaced by memory entries ADDR 0xe000000 (free), ADDR 0xe000038 (in use), ADDR 0xe000070 (free), ADDR 0xe0000A8 (free), ADDR 0xe0000E0 (free), ADDR 0xe000118 (in use).]

44 Simplification of Shutdown/Boot Sequence

45 Persistence Support: Restore file system states after a reboot: data; metadata. Memory manager: keeps track of metadata allocation; reinitialized at boot time; no knowledge of persistently allocated data.

46 Linux Memory Manager: Page allocator maintains individual pages. [Figure: page allocator.]

47 Linux Memory Manager (2): Zone allocator allocates memory in power-of-two sizes. [Figure: page allocator, zone allocator.]

48 Linux Memory Manager (3): Slab allocator groups allocations by sizes to reduce internal memory fragmentation. [Figure: page allocator, zone allocator, slab allocator.]

49 Memory Allocation Example: Allocate a 455-byte data structure. Slab allocator: one page of data structures. Zone allocator: one page from DMA zone. Page allocator: page address 0x0000d000.

50 Linux Memory Manager (4): Difficult to restore the persistent states: three layers of pointer-rich mappings; mixing of persistent and temporary allocations. [Figure: page allocator, slab allocator, zone allocator.]

51 Conquest Persistence: Create memory zones with their own instantiations of memory managers. [Figure: page allocator, slab allocator, zone allocator.]

52 Conquest Persistence: Reuse existing memory manager code. Encapsulate all pointers within each zone: pointers can survive reboots; no serialization and deserialization. Swapping and paging: disabled for Conquest memory zones; enabled for non-Conquest zones.

53 Integrity of Content in RAM: User-level program crashes: same file system interface as others; access control; memory protection. Operating system crashes: 1.5% of crashes lead to memory corruption; lose about one data block a decade. [Ng et al., 1996]

54 Other Reliability Mechanisms: instantaneous metadata commit; daily backups; pointer-switch commit semantics. [Figure label: pointer.]
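
The slide names pointer-switch commit semantics without giving details, so the following is only a generic sketch of that idea, with hypothetical class and function names: the new version is built off to the side, and the update becomes visible through a single reference assignment. Nothing here is taken from the Conquest code.

```python
class Inode:
    """Toy i-node holding one reference to the committed file content."""

    def __init__(self, data=b""):
        self.data = data


def commit_update(inode, transform):
    """Apply transform to a copy of the content, then switch the pointer."""
    # Build the new version separately; the committed content is untouched.
    new_data = transform(bytes(inode.data))
    # The commit point: one reference assignment makes the update visible.
    inode.data = new_data


if __name__ == "__main__":
    node = Inode(b"old contents")
    commit_update(node, lambda d: d + b" plus an append")
    print(node.data)
```

If building the new version fails partway, the old reference is never replaced, so readers see either the old content or the new content and never a mixture.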

55 Implementation Status: Kernel module under Linux 2.4.2. Operational and POSIX compliant. Modified memory manager to support Conquest persistence. Need to overcome BIOS limitations for distribution.

56 Performance Evaluation: Architectural simplification: feature count. Performance improvement: memory-only workloads; memory-and-disk workloads.

57 Conventional Data Path: Features listed: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management. [Figure: conventional file systems data path: storage requests, I/O buffer management, I/O buffer, persistence support, disk management, disk.]

58 Memory Path of Conquest: Features listed: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management; memory manager encapsulation. [Figure: Conquest memory data path: storage requests, persistence support, battery-backed RAM (small file and metadata storage).]

59 Disk Path of Conquest: Features listed: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management. [Figure: Conquest disk data path: storage requests, I/O buffer management, I/O buffer, disk management, disk (large-file-only file system), alongside battery-backed RAM (small file and metadata storage).]

60 PostMark Benchmark (1): ISP workload (emails, web-based transactions). 40 to 250 MB working set with 2 GB physical RAM. Conquest is comparable to ramfs. At least 24% faster than the LRU disk cache. [Card et al., 1994; Sweeney et al., 1996; Katcher 1997; Namesys 2002]

61 PostMark Benchmark (2): 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM. When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS. [Figure regions: working set <= RAM and working set > RAM.]

62 PostMark Benchmark (3): 10,000 files, 80 MB to 3.5 GB working set with 2 GB physical RAM. When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS.

63 Sprite LFS Microbenchmarks: Small-file benchmark: operates on 10,000 1-KB files in three phases. [Rosenblum and Ousterhout 1991]

64 Why ramfs Beats Conquest: Assumption of disk dates back decades: implies that caching is necessary; file system interface has built-in caching. Ramfs benefits from caching. Conquest gets no benefit: data and metadata are in memory already; data caching is avoidable; metadata caching is too deeply wired. Conquest pays metadata access costs twice.

65 Sprite LFS Microbenchmarks (2): Modified large-file microbenchmark: ten 1-MB files (Conquest in-core files).

66 Sprite LFS Microbenchmarks (3): Modified large-file microbenchmark: ten 1.01-MB files (Conquest on-disk files).

67 Sprite LFS Microbenchmarks (4): Large-file microbenchmark: forty 100-MB files (Conquest on-disk files).

68 History’s Mystery: Puzzling Microbenchmark Numbers… Geoff Kuenning: “If Conquest is slower than ext2fs, I will toss you off of the balcony…”

69 With me hanging off a balcony… Original large-file microbenchmark: one 1-MB file (Conquest in-core file).

70 Odd Microbenchmark Numbers: Why are random reads slower than sequential reads?

71 Odd Microbenchmark Numbers: Why are RAM-based file systems slower than disk-based file systems?

72 A Series of Hypotheses: Warm-up effect? Maybe. Why do RAM-based systems warm up slower? Bad initial states? No. Pentium III streaming I/O option? No. [Keshava and Penkovski 1999; Torvalds 2001; Abraham 2002]

73 Effects of L2 Cache Footprints [Figure: two panels, large L2 cache footprint and small L2 cache footprint. Each panel shows a file written sequentially, leaving the cache footprint at the file end, and then the same file read sequentially from the start. Labels: footprint, file end, flush, read, file.]
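
The footprint effect can be reproduced with a toy cache model. The sketch below is only an illustration under strong assumptions: it models the L2 cache as a fully associative LRU cache of whole cache lines, which real hardware is not, and all names and sizes are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, line):
        """Access one cache line; return True on a hit."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the oldest line
        return hit


def read_hits_after_write(file_lines, cache_lines):
    """Write a file sequentially, then count hits when reading it back."""
    cache = LRUCache(cache_lines)
    for line in range(file_lines):      # sequential write
        cache.touch(line)
    return sum(cache.touch(line) for line in range(file_lines))


if __name__ == "__main__":
    for size in (8, 16, 17, 32):
        print(size, read_hits_after_write(size, cache_lines=16))
```

In this model a file that fits in the cache is read back entirely from cache, while a file even slightly larger leaves only its tail cached after the write; the sequential read then starts at the evicted head and flushes the tail before reaching it, so every access misses.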

74 LFS Sprite Microbenchmarks: Modified large-file microbenchmark: ten 1-MB files (Conquest in-core files).

75 LFS Sprite Microbenchmarks (2): Modified large-file microbenchmark: ten 128-KB files (Conquest in-core files). Random accesses are slower than sequential accesses due to the extra lseek.

76 Related Work: Main-memory databases: memory-based data structures and query mechanisms. File-system applications of persistent RAM: write buffers; flash-memory-based file systems; disk emulators; Rio file cache; MRAM-enabled storage. [Baker et al., 1992; Garcia-Molina and Salem 1992; Wu and Zwaenepoel 1994; Chen et al., 1996; Riedel 1998; Quantum 2000; Miller et al., 2001]

77 Related Work (2): PDA operating systems: designed with severe memory constraints. Slice: distributed storage system; dedicated servers for metadata, small files, and large files. [Anderson et al., 2000; Palm 2000; IBM 2002; Microsoft 2002]

78 Lessons Learned: Faster than LRU caching, unexpected: heavyweight disk handling; severe penalty for accessing memory content. Matching user access patterns to storage media offers considerable simplification and better performance: not an automatic result; need careful design.

79 More Lessons Learned: Effects of L2 caching become highly visible in memory workloads (modern workloads). Cannot blindly apply existing disk-based microbenchmarks to measure memory performance of file systems. Need to consider states of L2 cache and memory behaviors at each stage of microbenchmarking.

80 Additional Lessons Learned: Don’t discuss your performance numbers next to a balcony…unless…

81 Going Beyond Conquest: Matching usage patterns with heterogeneous machines in the distributed domain: specialized tasks for machines within a cluster; preferably self-organizing and self-evolving. State-rich computing: caching of runtime data structures; similar to a specialized temporary file system.

82 Going Beyond Conquest (2): Separate storage of metadata from data: opportunity for hierarchical replication across devices with different calibers. Benchmarking memory performance of file systems: developing new memory benchmarks. Why are modern operating systems so complicated? More places to expand the Conquest approach.

83 Contributions: Demonstrated the feasibility of disk-memory hybrid file systems. Showed performance does not preclude simplicity. Pinpointed cache-related problems with modern benchmarks. Opened doors to many exciting areas of research.

84 Conclusion: Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements. Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.

85 Questions... Conquest: http://www.cs.fsu.edu/~awang/conquest Andy Wang: awang@cs.fsu.edu

