Conquest: Better Performance Through A Disk/Persistent-RAM Hybrid File System USENIX 2002 An-I Andy Wang Peter Reiher Gerald Popek University of California,

Presentation transcript:

Conquest: Better Performance Through a Disk/Persistent-RAM Hybrid File System. USENIX 2002. An-I Andy Wang, Peter Reiher, Gerald Popek (University of California, Los Angeles); Geoffrey Kuenning (Harvey Mudd College)

2 Conquest Overview: File systems are optimized for disks (performance problem, complexity). Now we have tons of inexpensive RAM. What can we do with that RAM?

3 Conquest Approach: Combine disk and persistent RAM (e.g., battery-backed RAM) in a novel way. Simplification: > 20% fewer semicolons than ext2, reiserfs, and SGI XFS. Performance (under popular benchmarks): 24% to 1900% faster than LRU disk caching.

4 Motivation: Most file systems are built for disks. Problems with the disk assumption: performance, complexity. Motivation – Conquest Design – Conquest Components – Performance Evaluation – Conclusion

5 Hardware Evolution [figure: accesses per second (log scale, KHz to GHz) for CPU (50% /yr), memory (50% /yr), and disk (15% /yr); annotations: (1 sec : 6 days), (1 sec : 3 months)]

6 Inside the Pandora’s Box [figure: disk arm, disk platters] Access time = seek time (disk arm) + rotational delay (disk platter) + transfer time
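
The access-time formula on this slide can be turned into a small calculation. The sketch below is illustrative only: the function name, the parameter names, and the sample drive figures (8 ms seek, 7200 RPM, 40 MB/s) are assumptions, not values from the talk. It also assumes the average rotational delay is half a revolution.

```python
def access_time_ms(seek_ms, rpm, transfer_bytes, bandwidth_mb_per_s):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    rotation_ms = 60_000.0 / rpm              # time for one full revolution
    rotational_delay_ms = rotation_ms / 2.0   # assumed average: half a revolution
    # Transfer time for the requested bytes at the given sustained bandwidth.
    transfer_ms = transfer_bytes / (bandwidth_mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 8 ms average seek, 7200 RPM, 40 MB/s, one 4 KB block.
small_read = access_time_ms(8.0, 7200, 4096, 40.0)
# Same drive, one 4 MB sequential chunk: transfer time now dominates.
large_read = access_time_ms(8.0, 7200, 4 * 1024 * 1024, 40.0)
```

With these assumed numbers, the mechanical terms (seek and rotation) dominate a 4 KB read, while a multi-megabyte read is dominated by transfer time. This is consistent with the talk's later point that the disk is best used for large, sequentially accessed files.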

7 Disk Optimization Methods: disk arm scheduling; group information on disk; disk readahead; buffered writes; disk caching; data mirroring; hardware parallelism.

8 Complexity [figure, label: Bytes]: synchronization; predictive readahead; cache replacement; elevator algorithm; data clustering; data consistency; asynchronous write.

9 Storage Media Alternatives [figure: accesses/sec (log) versus $/MB (log); labels: persistent RAM, Magnetic RAM?, (write once), flash memory, disk, tape, battery-backed DRAM] [Caceres et al., 1993; Hillyer et al., 1996; Qualstar 1998; Tanisys 1999; Micron Semiconductor Products 2000; Quantum 2000]

10 Price Trend of Persistent RAM [figure: $/MB (log) by year for paper/film, 3.5” HDD, 2.5” HDD, 1” HDD, and persistent RAM; annotations: booming of digital photography; 4 to 10 GB of persistent RAM] [Grochowski 2000]

11 Old Order; New World: Disk staying around (cost, capacity, power, heat). RAM as a viable storage alternative (PDAs, digital cameras, MP3 players). More architectural changes due to RAM: a big assumption change from disk; rethink data structures, interface, applications.

12 Getting a Fresh Start: What does it take to design and build a system that assumes ample persistent RAM as the primary storage medium?

13 Conquest: Design and build a disk/persistent-RAM hybrid file system. Deliver all file system services from memory, with the exception of high-capacity storage. Benefits: simplicity, performance.

14 Simplicity: Remove disk-related complexities for most files. Make things simpler for disk as well. Less complexity: fewer bugs, easier maintenance, shorter data path.

15 Performance: Overall: all management performed in memory. Memory data path: no disk-related overhead. Disk data path: faster speed due to simpler access models.

16 Conquest Components: media management; metadata management; allocation service; persistence support; resiliency support.

17 User Access Patterns: Small files take little space (10%) and represent most accesses (90%). Large files take most space and see mostly sequential accesses, except for database applications. [Iram 1993; Douceur et al., 1999; Roselli et al., 2000]

18 Files Stored in Persistent RAM: Small files (< 1 MB): no seek time or rotational delays; fast byte-level accesses; contiguous allocation. Metadata: fast synchronous update; no dual representations. Executables and shared libraries: in-place execution.
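
The placement rule on this slide can be sketched as a single decision function. This is a minimal illustration, not the Conquest implementation: the function name, its parameters, and the string return values are hypothetical. Only the 1 MB threshold and the three categories kept in persistent RAM come from the slide.

```python
SMALL_FILE_LIMIT = 1024 * 1024  # 1 MB threshold, as stated on the slide


def choose_medium(size_bytes, is_metadata=False, is_executable=False):
    """Return "ram" or "disk" for a file, following the slide's categories.

    Metadata, executables/shared libraries, and small files (< 1 MB) live in
    persistent RAM; everything else goes to the large-file-only disk store.
    """
    if is_metadata or is_executable or size_bytes < SMALL_FILE_LIMIT:
        return "ram"
    return "disk"


print(choose_medium(4096))                                 # small file
print(choose_medium(700 * 1024 * 1024))                    # large media file
print(choose_medium(3 * 1024 * 1024, is_executable=True))  # shared library
```

The sketch only covers where a file lives at a single point in time. What happens when a file grows across the threshold is not covered by the slide and is left out here.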

19 Memory Data Path of Conquest [figure: conventional file systems path (storage requests, IO buffer management, IO buffer, persistence support, disk management, disk) versus the Conquest memory data path (storage requests, persistence support, battery-backed RAM holding small file and metadata storage)]

20 Large-File-Only Disk Storage: Allocate in big chunks: lower access overhead; reduced management overhead. No fragmentation management. No tricks for small files: storing data in metadata. No elaborate data structures: wrapping a balanced tree onto disk cylinders. [Devlinux.com 2000]

21 Sequential-Access Large Files: Sequential disk accesses: near-raw bandwidth. Well-defined readahead semantics. Read-mostly: little synchronization overhead (between memory and disk).

22 Disk Data Path of Conquest [figure: conventional file systems path (storage requests, IO buffer management, IO buffer, persistence support, disk management, disk) versus the Conquest disk data path (storage requests, IO buffer management, IO buffer, disk management, disk holding a large-file-only file system, alongside battery-backed RAM for small file and metadata storage)]

23 Random-Access Large Files: Random access? Common definition: nonsequential access. A typical movie has 150 scene changes. MP3 stores the title at the end of the files. Near-sequential access? Simplify large-file metadata representation significantly.

24 Logical File Representation [figure: file name(s), i-node, file attributes, data]

25 Physical File Representation [figure: file name(s), i-node, file attributes, data locations, data blocks]

26 Ext2 Data Representation [figure: i-node (labelled 10) with data block locations and index block locations, the latter leading to further data block and index block locations]

27 Problems with Ext2 Design: designed for disk storage; optimization for small files makes things complex; random-access data structure for large files that are accessed mostly sequentially; data access time dependent on the byte position in a file; maximum file size is limited.
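
The point that data access time depends on the byte position can be made concrete by counting how many index blocks must be read before the data block is reached. The sketch below assumes the common ext2 inode layout: 12 direct pointers, then single, double, and triple indirect blocks, with 4 KB blocks and 4-byte block pointers. These numbers are assumptions, not taken from the slide, and are exposed as parameters. The function name is hypothetical.

```python
def ext2_index_blocks(offset, block_size=4096, n_direct=12, ptr_size=4):
    """Number of index (indirect) blocks read before reaching the data block."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    ptrs = block_size // ptr_size   # block pointers held by one index block
    block = offset // block_size    # logical block number within the file
    if block < n_direct:
        return 0                    # reached directly from the i-node
    block -= n_direct
    for level in (1, 2, 3):         # single, double, triple indirection
        span = ptrs ** level        # data blocks reachable at this level
        if block < span:
            return level
        block -= span
    raise ValueError("offset exceeds the maximum file size of this layout")


for off in (0, 48 * 1024, 5 * 1024**2, 5 * 1024**3):
    print(off, ext2_index_blocks(off))
```

Under these assumptions, the first 48 KB of a file need no index block, while bytes several gigabytes in need three. The final `ValueError` corresponds to the slide's last bullet: the pointer tree has a fixed depth, so the maximum file size is bounded by the layout rather than by the available storage.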

28 Conquest Representation: Persistent RAM: Hash(file name) = location of data; Offset(location of data). Disk storage: per-file, doubly linked list of disk block segments (stored in persistent RAM).

29 Conquest Design: + Direct data access for in-core files. + Worst case: sequential memory search for infrequent random accesses to on-disk files. + Maximum file size limited by physical storage.
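
The two representations can be sketched in a few lines. This is an illustrative model under stated assumptions, not Conquest's kernel code. The class and function names (`Segment`, `LargeFile`, `read_small`) are hypothetical, a Python dict stands in for the file-name hash, and each segment is assumed to record a starting disk block and a byte length.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Small files: the name hashes straight to the in-memory data (dict = hash table).
ram_files = {"notes.txt": bytearray(b"hello conquest")}


def read_small(name, offset, size):
    """Direct access: one hash lookup, then a byte offset into the data."""
    data = ram_files[name]
    return bytes(data[offset:offset + size])


@dataclass(eq=False)
class Segment:
    """One contiguous run of disk blocks belonging to a large file."""
    disk_start: int                      # first disk block of the run (assumed unit)
    length: int                          # bytes of file data held in this run
    prev: Optional["Segment"] = None
    next: Optional["Segment"] = None


class LargeFile:
    """Per-file doubly linked list of segments; the list itself lives in RAM."""

    def __init__(self) -> None:
        self.head: Optional[Segment] = None
        self.tail: Optional[Segment] = None

    def append(self, disk_start: int, length: int) -> None:
        seg = Segment(disk_start, length, prev=self.tail)
        if self.tail is None:
            self.head = seg
        else:
            self.tail.next = seg
        self.tail = seg

    def locate(self, offset: int) -> Tuple[Segment, int, int]:
        """Return (segment, offset inside it, segments visited) for a byte offset."""
        visited = 0
        seg = self.head
        while seg is not None:           # worst case: walk the whole list in memory
            visited += 1
            if offset < seg.length:
                return seg, offset, visited
            offset -= seg.length
            seg = seg.next
        raise ValueError("offset beyond end of file")
```

A random access to an on-disk file costs a walk over in-memory list nodes, which is the sequential memory search named as the worst case. Nothing in the structure caps the number of segments, so file size is bounded only by the physical storage available.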

30 Implementation Status: Kernel module under Linux. Fully functional and POSIX compliant. Modified memory manager to support Conquest persistence. Preparing for office-wide deployment.

31 Conquest Evaluation: Architectural simplification: feature count. Performance improvement: memory-only workload; memory and disk workload.

32 Conventional Data Path: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management. [figure: conventional file systems path (storage requests, IO buffer management, IO buffer, persistence support, disk management, disk)]

33 Memory Path of Conquest: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management. Annotation: memory manager encapsulation. [figure: Conquest memory data path (storage requests, persistence support, battery-backed RAM holding small file and metadata storage)]

34 Disk Path of Conquest: buffer allocation management; buffer garbage collection; data caching; metadata caching; predictive readahead; write behind; cache replacement; metadata allocation; metadata placement; metadata translation; disk layout; fragmentation management. [figure: Conquest disk data path (storage requests, IO buffer management, IO buffer, disk management, disk holding a large-file-only file system, alongside battery-backed RAM for small file and metadata storage)]

35 PostMark Benchmark: ISP workload ( s, web-based transactions). 250 MB working set with 2 GB physical RAM. Conquest is comparable to ramfs. At least 24% faster than the LRU disk cache. [Katcher 1997; Sweeney et al., 1996; Card et al., 1999; Namesys 2002]

36 PostMark Benchmark: 10,000 files, 3.5 GB working set with 2 GB physical RAM. When both memory and disk components are exercised, Conquest can be several times faster than ext2fs, reiserfs, and SGI XFS. [figure labels: > RAM, <= RAM]

37 PostMark Benchmark: 10,000 files, 3.5 GB working set with 2 GB physical RAM. When working set > RAM, Conquest is 1.4 to 2 times faster than ext2fs, reiserfs, and SGI XFS.

38 Lessons Learned: Faster than LRU caching, unexpected: heavyweight disk handling; severe penalty for accesses to content. Matching user access patterns to storage media offers considerable simplification and better performance: not an automatic result; need careful design.

39 Conclusion: Conquest demonstrates how rethinking changes in underlying assumptions can lead to significant architectural and performance improvements. Radical changes in hardware, applications, and user expectations in the past decade should lead us to rethink other aspects of OS as well.

40 Questions... Conquest: Andy Wang: