
Jeff's Filesystem Papers Review Part II. Review of "The Design and Implementation of a Log-Structured File System"


1 Jeff's Filesystem Papers Review Part II. Review of "The Design and Implementation of a Log-Structured File System"

2 The Design and Implementation of a Log-Structured File System
- By Mendel Rosenblum and John K. Ousterhout, UC Berkeley.
- Ousterhout introduced the idea in an earlier paper with a few different configurations; this paper describes the concept as it exists after they implemented it as Sprite LFS, a new file system for the Sprite operating system.
- Empirical research done after the implementation showed that LFS is a good idea.
- This presentation is an academic review; the ideas presented are either quotes or paraphrases of the reviewed document.

3 Intro
- Why? (problem statement)
  - CPUs are getting faster.
  - Memory is getting faster.
  - Disks are not.
  - Amdahl's Law: bottlenecks move around; as the CPU gets faster, the bottleneck moves to memory or disk, etc.
  - We need to find a way to use disks more efficiently.
- Assumption
  - Caching files in RAM improves READ performance of a filesystem significantly more than WRITE performance.
  - Therefore disk activity will become more write-centric in the future.

4 2.1
- Disk improvement is in the areas of price/capacity and physical size, not in seek time.
- Even if I/O bandwidth improves, seek time will still be the killer.
- Memory is getting cheaper and faster, so use a memory cache to ease the disk bottleneck.
- Caches
  - Difference between a cache and a buffer? A buffer sits between two devices of different speeds; a cache speeds up subsequent similar or proximate accesses.
  - Caches can reorder writes so they go to disk more efficiently.

5 2.2 Workloads
- 3 classes of file access patterns (from a different paper):
  - Scientific processing: read and write large files sequentially.
  - Transaction processing: many simultaneous requests for small chunks of data.
  - Engineering/office applications: access a large number of small files in sequence.
- Engineering/office is the killer workload, and that is what LFS is designed for.

6 2.3 Problems with Existing FSs
- UNIX FFS (Fast File System, also Berkeley-developed):
  - Puts file data sequentially on disk.
  - Inode data lives at a fixed location on disk; directory data lives at another location.
  - A total of 5 seeks to create a new file (bad).
  - File data is written asynchronously so the program can continue without waiting for the FS, BUT metadata is written synchronously, so the program is blocked when touching things like inode data.

7 3 LFS
- Buffer a sequence of FS changes in the file cache, then write them sequentially to disk in one chunk to avoid seeks.
- Essentially all data is merely a log entry.
- This creates 2 problems:
  - How to read from the log.
  - How to keep free space on disk, i.e. if you keep writing forward forever, you will eventually wrap at the end of the disk.
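The write path the slide describes can be sketched as a toy model; the class, segment size, and names below are illustrative stand-ins, not Sprite LFS's actual structures:

```python
SEGMENT_SIZE = 8  # blocks buffered before a flush; illustrative only


class LogFS:
    """Toy model: buffer dirty blocks, flush them as one sequential segment."""

    def __init__(self):
        self.disk = []    # the log: list of segments already flushed
        self.buffer = []  # pending (inode_no, block_data) log entries

    def write(self, inode_no, data):
        # Every change is just appended to the in-memory segment buffer;
        # no disk seek happens here.
        self.buffer.append((inode_no, data))
        if len(self.buffer) >= SEGMENT_SIZE:
            self.flush()

    def flush(self):
        # One large sequential write: the whole buffered segment at once.
        if self.buffer:
            self.disk.append(list(self.buffer))
            self.buffer.clear()


fs = LogFS()
for i in range(10):
    fs.write(inode_no=1, data=f"block{i}")
# 8 blocks went to disk as a single segment; 2 remain buffered
```

The point of the sketch: many small writes cost only one large sequential disk operation, which is exactly the seek-avoidance argument above.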

8 3.1 How to Read
- Reads are at the same speed as FFS once the inode is located.
- Locating the inode is what slows down FFS and where LFS is better.
- FFS keeps inodes in a static portion of the disk, unrelated to the physical location of the data.
- LFS stores the inode in proximity to its data, at the head (end) of the log.
- Because of this, another (but much smaller) map of the inodes is needed: the inode map.
  - Small enough that it is kept in cache all the time, so it does not cause extra seeks.
  - The locations of its blocks are recorded in a fixed area of disk called the checkpoint region.
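The read path above can be sketched with dictionaries standing in for the on-disk structures; the layout and names here are illustrative, not the paper's actual formats:

```python
# Toy on-disk log: block addresses are just indexes into this list.
log = ["root-dir-data", "file-data-A", "file-data-B", "inode-5"]

# Inode map: small table, pinned in cache, located via the checkpoint region.
# Maps inode number -> log address of that inode's latest version.
inode_map = {5: 3}

# Each inode maps file block numbers -> log addresses of data blocks.
inodes = {3: {0: 1, 1: 2}}  # inode at log addr 3 owns blocks at addrs 1 and 2


def read(inode_no, block_no):
    inode_addr = inode_map[inode_no]  # cached lookup, no extra seek
    inode = inodes[inode_addr]        # read the inode from the log
    return log[inode[block_no]]       # then the data block, just as in FFS
```

Once `inode_map` resolves the inode's address, the remaining steps are identical to FFS, which is why read performance matches FFS after the inode is found.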

9 3.2 Free Space Management
- Log wraparound.
- Choices:
  - Don't defragment; just thread the log through the next free blocks.
  - GC-style: stop everything and copy.
  - Incremental: copy continuously.
- Solution: segments.
  - Divide the disk into segments, with the segment size chosen for optimal usage.
  - A segment is written contiguously, and the disk is compacted a segment at a time to avoid fragmentation.
- This defragmentation is known as segment cleaning.

10 3.3 Segment Cleaning
- Should be pretty obvious how to do it: 3 steps.
  - Read a number of non-clean segments into memory.
  - Keep only the live data (the portion of those segments still in use).
  - Write the live data back to disk in clean segments.
- Other logistical considerations in segment cleaning:
  - Update inodes.
  - Update fixed structures such as the checkpoint region. Remember these are in cache, and as we will see later they are flushed to disk at predetermined intervals.
- There is other bookkeeping as well, since each segment carries a header with per-block information; read the paper for details.
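The three steps above can be sketched as a toy cleaner; `segments`, the `is_live` liveness test, and the batch size are stand-ins for the paper's segment-summary machinery, not its real interfaces:

```python
def clean(segments, is_live, batch=2):
    """Toy segment cleaner. Each segment is a list of blocks;
    a block of None means that slot is already clean/empty."""
    # Step 1: read a number of non-clean segments into memory.
    dirty = [s for s in segments if any(b is not None for b in s)][:batch]
    # Step 2: keep only the live (still-in-use) blocks from them.
    live = [b for seg in dirty for b in seg if b is not None and is_live(b)]
    # Step 3: compact the live data into fresh segments and mark the
    # source segments clean (reusable by the log).
    for seg in dirty:
        seg[:] = [None] * len(seg)
    return live  # the caller appends these to the head of the log


segs = [["a", None, "b"], ["c", "d", None], [None, None, None]]
survivors = clean(segs, is_live=lambda b: b != "c")  # pretend "c" was deleted
```

After the call, the two dirty segments are entirely free and only the live blocks survive to be rewritten, which is where the inode updates mentioned above come in.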

11 3.4 Segment Cleaning - how to configure
- When to do it?
  - At low priority, or when disk space is needed. The authors chose "when disk space is needed," governed by watermarks.
- How many segments to clean at one time?
  - The more segments cleaned at once, the more intelligent the cleaning can be and the better organized the disk; the amount is set by the watermarks chosen above.
- Which segments to clean? ...coming.
- Since you can write the live data back to disk any way you want, you should write it back in the manner most efficient for its predicted use ...coming.

12 3.5-6 Determination of Segment-Cleaning Configuration
- Here the authors went into empirical studies: they wrote a simulator and experimented with segment-cleaning configurations to determine a good policy.
- Results/conclusions:
  - Differentiate between hot and cold segments based on past history.
    - A hot segment is one that is likely to be written again soon.
    - A cold segment is one that is unlikely to be written.
  - They came up with a policy called cost-benefit. In their simulations it ends up cleaning cold segments at roughly 75% utilization and hot segments at roughly 15% utilization.
  - The utilization and the "temperature" of each segment are maintained in an in-memory table.
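The cost-benefit policy ranks each segment by the paper's ratio of free space generated times data age over cleaning cost, (1 - u) * age / (1 + u); the table values below are made up for illustration:

```python
def cost_benefit(u, age):
    """Cost-benefit score for cleaning a segment, per the paper's policy.

    u   -- utilization: fraction of the segment still live (0.0..1.0)
    age -- time since the segment's data was last modified
    Cleaning costs 1 (read the whole segment) + u (rewrite live data);
    the benefit is the reclaimed space (1 - u) weighted by age, since
    old, stable data is likely to stay live once rewritten.
    """
    return (1.0 - u) * age / (1.0 + u)


# In-memory usage table: segment -> (utilization, age). Values invented.
segments = {"A": (0.75, 100.0),  # cold: mostly full but long stable
            "B": (0.90, 100.0),
            "C": (0.15, 1.0),    # hot: nearly empty, recently written
            "D": (0.60, 1.0)}
ranked = sorted(segments, key=lambda s: cost_benefit(*segments[s]),
                reverse=True)
```

Note how the age term lets the 75%-full cold segment "A" outrank the 15%-full hot segment "C": hot segments are worth waiting on, because more of their blocks will die soon on their own.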

13 4 Crash Recovery
- FFS:
  - The major problem is that the entire disk must be scanned; the most-recently-written data could be anywhere on disk.
- LFS:
  - The most-recently-written data is all at one location on disk: the end of the log.
  - Uses checkpoints and roll-forward to maintain consistency, ideas borrowed from database technology.

14 4.1 Crash Recovery
- Checkpoint region:
  - 2 copies maintained at fixed locations on disk, written to alternately, in case of a crash while updating the checkpoint data.
  - At points in time:
    - I/O is blocked.
    - All cached data is written to the end of the log.
    - Checkpoint data from the cache is written to disk.
    - I/O is then re-enabled.
    - This could instead be triggered by the amount of data written rather than by elapsed time.
  - Note the similarity to GC techniques.
- Skipping the roll-forward techniques, as they are very complex and depend on segment header info; read the paper for more. Roll-forward just enhances checkpointing.
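The alternating two-copy scheme can be sketched as follows. The "recover from the copy with the newest timestamp" rule is the paper's; the data structures and names are illustrative only:

```python
checkpoints = [None, None]  # the two fixed checkpoint slots on disk
next_slot = 0


def write_checkpoint(state, time):
    """Alternate between the two slots, so a crash mid-write corrupts at
    most one slot and leaves the other's older-but-complete copy intact."""
    global next_slot
    # In the real system the timestamp is written last, so a torn write
    # is detectable; here a slot is simply complete or absent.
    checkpoints[next_slot] = {"state": state, "time": time}
    next_slot = 1 - next_slot


def recover():
    """After a crash, trust the checkpoint with the newest timestamp."""
    valid = [c for c in checkpoints if c is not None]
    return max(valid, key=lambda c: c["time"])["state"]


write_checkpoint({"log_head": 100}, time=1)  # goes to slot 0
write_checkpoint({"log_head": 200}, time=2)  # goes to slot 1
```

If the second write had been torn by a crash, recovery would still find the complete time-1 checkpoint in the other slot, which is the whole point of alternating.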

15 5 Empirical Test Results
- Comparison to FFS/SunOS.
- Basically, LFS is significantly better for small files, and better than or as good as FFS for large files in all cases except one: large files that were originally written randomly and are later read sequentially.
- Crash recovery was not rigorously tested empirically against FFS/SunOS.

