File System Performance CSE451 Andrew Whitaker

Ways to Improve Performance
- Access the disk less
  - Caching!
- Be smarter about accessing the disk
  - Turn small operations into large operations
  - Turn scattered operations into sequential operations

Technique #1: Caching
- Memory is MUCH faster than disk
- So, cache whatever we can in memory
  - File buffers
  - i-nodes
  - Directory entries (name => i-node)
- Caching reads is a no-brainer
- Caching writes is more interesting…
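To make the read-caching idea concrete, here is a minimal sketch of a block cache sitting in front of a slow device. The class name `BlockCache`, the LRU eviction policy, and the fake eight-block disk are all assumptions made for illustration; the slides do not say which replacement policy a real buffer cache uses.

```python
from collections import OrderedDict

class BlockCache:
    """Tiny LRU read cache in front of a slow block device (hypothetical)."""

    def __init__(self, read_block, capacity):
        self.read_block = read_block      # function: block number -> bytes
        self.capacity = capacity          # maximum number of cached blocks
        self.blocks = OrderedDict()       # block number -> data, oldest first
        self.disk_reads = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)   # mark as most recently used
            return self.blocks[block_no]
        data = self.read_block(block_no)        # cache miss: go to the disk
        self.disk_reads += 1
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data

disk = {n: bytes([n]) * 4 for n in range(8)}    # fake 8-block disk
cache = BlockCache(disk.__getitem__, capacity=2)
for n in [1, 1, 2, 1, 3, 2]:
    cache.read(n)
```

Six reads cost four disk accesses here; the two repeated reads of block 1 are served from memory.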

Caching Writes
- Two options
  - Synchronous: data is immediately written out to disk (AKA write-through)
  - Asynchronous: disk writes are delayed (AKA write-back)
- Programmer’s perspective: what does it mean when the “write” system call returns?
  - With asynchronous writes, the data has not necessarily hit the disk
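The difference between the two policies can be sketched with a toy model. `CachedDisk`, its `dirty` dictionary, and the explicit `flush()` call are hypothetical names, not a real kernel interface; the point is only where the data lives when `write` returns.

```python
class CachedDisk:
    """Toy block store with a write cache (all names are hypothetical)."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.disk = {}        # block number -> data that would survive a crash
        self.dirty = {}       # block number -> data that exists only in memory
        self.disk_writes = 0

    def write(self, block_no, data):
        if self.write_through:
            self._to_disk(block_no, data)   # on disk before write() returns
        else:
            self.dirty[block_no] = data     # delayed; repeated writes coalesce

    def flush(self):
        for block_no, data in sorted(self.dirty.items()):
            self._to_disk(block_no, data)
        self.dirty.clear()

    def _to_disk(self, block_no, data):
        self.disk[block_no] = data
        self.disk_writes += 1

sync_disk = CachedDisk(write_through=True)
async_disk = CachedDisk(write_through=False)
for d in (sync_disk, async_disk):
    for version in (b"v1", b"v2", b"v3"):
        d.write(7, version)

on_disk_before_flush = async_disk.disk.get(7)   # None: a crash here loses the data
async_disk.flush()
```

In this model the write-through disk performs three disk writes and the write-back disk performs one, but until `flush()` runs nothing written to the write-back disk is durable.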

Why Use Asynchronous Writes?
- Allows us to batch up multiple writes to the same block
- Allows for better overlap of CPU and I/O
  - CPU does not stall waiting for the disk
- Allows the disk scheduler to make better decisions
  - Application: write(a); write(b); write(c);
  - Disk: write(b); write(a); write(c);
- Most data updates in UNIX systems use asynchronous writes by default
  - Programmer can override: fsync(fd);
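As an illustration of the fsync override, the sketch below uses Python's `os.write` and `os.fsync`, which wrap the corresponding system calls. The helper name `durable_write`, the file name, and the file contents are made up for the example.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data, then ask the kernel to flush it to the device before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                        # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                       # blocks until the file's data is flushed
    finally:
        os.close(fd)

path = os.path.join(tempfile.mkdtemp(), "journal.dat")
durable_write(path, b"balance=100\n")
```

Without the `os.fsync` call, the function could return while the data is still sitting in the kernel's buffer cache.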

Problems with Asynchronous Writes
- File system state can be lost during a crash
  - Missing blocks, missing files, missing directories, storage leaks, etc.
- For this reason, meta-data updates tend to be done synchronously
  - File/directory creation or deletion

Consistency Problems
- Problems still arise, even with synchronous meta-data updates
- For example, file creation must modify an i-node and a directory entry
  - Initialize the i-node
  - Record the mapping in the directory
- Disks do not support atomic operations that span multiple blocks, so a crash can land between the two steps

Dealing with Consistency Problems
- Always keep the disk in a “safe” state
- Run a recovery program (like fsck) on startup
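One way to read "safe state" is to order the two file-creation updates so that a crash between them leaves only a repairable leak rather than a directory entry that points at garbage. That reading, the dictionary-based "disk", and the helper names below are assumptions for illustration, not something the slides spell out.

```python
def new_disk():
    return {"inodes": {}, "directory": {}}

def create_file(disk, name, ino, inode_first=True, crash_after=None):
    """Perform the two updates in the chosen order; optionally 'crash' after step 1 or 2."""
    def init_inode():
        disk["inodes"][ino] = {"links": 1}

    def add_entry():
        disk["directory"][name] = ino

    steps = [init_inode, add_entry] if inode_first else [add_entry, init_inode]
    for done, step in enumerate(steps, start=1):
        step()
        if crash_after == done:
            return                         # simulated crash: later steps never run

def problems(disk):
    """Report what a recovery pass would find on this toy disk."""
    found = []
    for name, ino in disk["directory"].items():
        if ino not in disk["inodes"]:
            found.append("dangling directory entry")
    referenced = set(disk["directory"].values())
    for ino in disk["inodes"]:
        if ino not in referenced:
            found.append("leaked i-node")
    return found

safe, unsafe, clean = new_disk(), new_disk(), new_disk()
create_file(safe, "a.txt", 5, inode_first=True, crash_after=1)
create_file(unsafe, "a.txt", 5, inode_first=False, crash_after=1)
create_file(clean, "a.txt", 5)
```

With the i-node written first, the crash leaves a leaked i-node that a recovery program can reclaim; with the directory entry written first, the crash leaves a name that refers to an uninitialized i-node.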

i-check: File Consistency
- Is each block on exactly one list?
  - Create a bit vector with as many entries as there are blocks
  - Follow the free list and each i-node block list
  - When a block is encountered, examine its bit
    - If the bit was 0, set it to 1
    - If the bit was already 1:
      - If the block is both in a file and on the free list, remove it from the free list and cross your fingers
      - If the block is in two files, call support!
  - If there are any 0’s left at the end, put those blocks on the free list
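The i-check pass can be sketched as follows. The function name `i_check` and its in-memory inputs are hypothetical, and one ordering choice is an assumption: files are walked before the free list, so a block claimed by both is kept in the file and dropped from the free list. A real checker would also account for blocks holding meta-data, which this toy ignores.

```python
def i_check(num_blocks, free_list, files):
    """files maps a file name to its list of block numbers.

    Returns (repaired_free_list, shared), where shared lists every block
    claimed by two files as (block, first_owner, second_owner).
    """
    seen = [False] * num_blocks          # the bit vector, one entry per block
    owner = {}
    shared = []

    for name, blocks in files.items():
        for b in blocks:
            if seen[b]:
                shared.append((b, owner[b], name))   # in two files: cannot auto-repair
            else:
                seen[b] = True
                owner[b] = name

    repaired_free = []
    for b in free_list:
        if seen[b]:
            continue                     # also in a file (or listed twice): drop it
        seen[b] = True
        repaired_free.append(b)

    for b in range(num_blocks):
        if not seen[b]:
            repaired_free.append(b)      # on no list at all: reclaim it

    return repaired_free, shared

files = {"a": [0, 1, 2], "b": [2, 3]}
free_list, shared = i_check(8, [1, 4, 5], files)
```

In this example block 1 is removed from the free list, blocks 6 and 7 are reclaimed, and block 2 is reported as shared by files "a" and "b".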

d-check: Directory Consistency
- Do the directories form a tree?
  - Cycles are bad!
- Does the link count of each file (i-node) equal the number of directory links to it?
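Both questions can be answered in one walk from the root. The sketch below assumes a simplified in-memory model: `directories` maps a directory's i-node number to its entries, the "." and ".." entries are left out, and the root's own link count is not checked. The function name `d_check` is hypothetical.

```python
def d_check(root, directories, link_counts):
    """directories: dir i-node -> {name: child i-node}
    link_counts: i-node -> link count stored in that i-node

    Returns (has_cycle, wrong), where wrong maps an i-node to
    (stored_count, counted_entries) whenever the two differ.
    """
    actual = {}            # i-node -> number of directory entries found
    on_path = set()        # directories on the current root-to-here path
    visited = set()
    has_cycle = False

    def walk(dir_ino):
        nonlocal has_cycle
        on_path.add(dir_ino)
        visited.add(dir_ino)
        for child in directories[dir_ino].values():
            actual[child] = actual.get(child, 0) + 1
            if child in directories:            # child is itself a directory
                if child in on_path:
                    has_cycle = True            # entry points back up the path
                elif child not in visited:
                    walk(child)
        on_path.discard(dir_ino)

    walk(root)
    wrong = {ino: (stored, actual.get(ino, 0))
             for ino, stored in link_counts.items()
             if stored != actual.get(ino, 0)}
    return has_cycle, wrong

directories = {1: {"home": 2, "notes.txt": 10},
               2: {"alias.txt": 10, "todo.txt": 11}}
link_counts = {2: 1, 10: 1, 11: 1}
has_cycle, wrong = d_check(1, directories, link_counts)
```

Here i-node 10 has two directory entries but a stored link count of 1, so it is reported as `(1, 2)`.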

Technique #2: Better Data Layout
- Recall basic file system structure:
  - Meta-data: i-nodes, free block list
  - Data: file data, directory data
- [Figure: disk divided into a Metadata region and a Data region]
- Note: i-nodes are far from the data blocks they describe

Cylinder groups
- Basic idea: group commonly accessed data and meta-data together
  - This reduces seeks
- Details:
  - Disk is partitioned into groups of cylinders
  - Data blocks from a file are all placed in the same cylinder group
  - Files in the same directory are placed in the same cylinder group
  - The i-node for a file is placed in the same cylinder group as the file’s data
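The placement rules above can be sketched as a toy allocator. Everything here is a simplification made for illustration: the class `CylinderGroupAllocator` is hypothetical, an i-node is charged as one block, a new directory goes to the group with the most free space, and a full group spills into the next one. Real allocators use more elaborate heuristics.

```python
class CylinderGroupAllocator:
    """Toy placement policy: keep a file's i-node and data near its directory."""

    def __init__(self, num_groups, blocks_per_group):
        self.free = [blocks_per_group] * num_groups   # free blocks per group
        self.dir_group = {}                           # directory -> home group

    def group_for_directory(self, directory):
        if directory not in self.dir_group:
            # Assumption: a new directory goes to the emptiest group.
            emptiest = max(range(len(self.free)), key=self.free.__getitem__)
            self.dir_group[directory] = emptiest
        return self.dir_group[directory]

    def allocate_file(self, directory, num_blocks):
        """Return the group chosen for the i-node and for each data block."""
        group = self.group_for_directory(directory)
        placement = []
        for _ in range(num_blocks + 1):               # +1 for the i-node
            tried = 0
            while self.free[group] == 0:              # home group full: spill over
                group = (group + 1) % len(self.free)
                tried += 1
                if tried == len(self.free):
                    raise OSError("disk full")
            self.free[group] -= 1
            placement.append(group)
        return placement

alloc = CylinderGroupAllocator(num_groups=3, blocks_per_group=4)
first = alloc.allocate_file("/home", 2)    # fits entirely in one group
second = alloc.allocate_file("/tmp", 1)    # different directory, different group
third = alloc.allocate_file("/home", 2)    # home group fills up and spills
```

The third allocation ends up split across two groups, which is the kind of degradation that appears once a cylinder group fills.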

Cylinder Group Analysis
- (+) Reduces or eliminates seeks for some common access patterns
- (-) Does not address rotational delay
- (-) Performance is workload dependent
- (-) Performance degrades if cylinders become full
  - Partial solution: proactively reserve space

Log Structured File System
- Let’s assume all reads are cached
  - An iffy assumption, but let’s suspend disbelief
- Q: How can we turn all writes into large, sequential writes?
- Insight: this is possible if the location of data on disk can change

A Conventional File System
- Files live at fixed locations
- So, file system writes must use seeks
- For example:
  - Write to Christine.txt
  - Write to Andrew.txt
  - Write to Colin.txt
- [Figure: files at fixed disk locations: Veneta.txt, Joel.txt, Colin.txt, Matt.txt, Andrew.txt, Nolan.txt, Bishop.txt, Christine.txt]

Log-structured File System
- Use the disk as an append-only log
  - All writes go at the end of the log
- The location of a file changes over time
- Old data is not overwritten
  - Until the file system becomes full
- [Figure: log growth: Christine.txt, Andrew.txt, Colin.txt, Christine.txt]
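A back-of-the-envelope comparison of the two layouts, using the file names from the conventional-layout example. The model is an assumption made for illustration: each file occupies one block, the head ends up just past whatever it wrote, it starts at the log tail, and a "seek" is any write whose target is not where the head already sits.

```python
def count_seeks(head, targets):
    """Count writes that need a head movement before the data can be written."""
    seeks = 0
    for target in targets:
        if target != head:
            seeks += 1
        head = target + 1          # after writing, the head sits on the next block
    return seeks

layout = ["Veneta.txt", "Joel.txt", "Colin.txt", "Matt.txt",
          "Andrew.txt", "Nolan.txt", "Bishop.txt", "Christine.txt"]
fixed_location = {name: block for block, name in enumerate(layout)}
writes = ["Christine.txt", "Andrew.txt", "Colin.txt"]

log_tail = len(layout)             # first unused block after the existing files

# Conventional: each write goes to the file's fixed block.
conventional_seeks = count_seeks(log_tail, [fixed_location[n] for n in writes])

# Log-structured: each write goes to the next block at the end of the log.
log_targets = [log_tail + i for i in range(len(writes))]
log_seeks = count_seeks(log_tail, log_targets)
```

Under these assumptions the conventional layout needs a seek for every write, while the log needs none.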

LFS Details
- Everything gets written to the log
  - File data, i-nodes, directories
- LFS tries to buffer many small writes into large segments
  - Typically 512 KB or 1 MB

How Can This Possibly Work?
- Q: If nothing lives at a fixed location, how do we find “the data”?
- A: Add a layer of indirection: an i-node map
  - Maps from i-node number to current location
  - The map resides at a fixed location on disk
    - NOT in the log!
  - The map is cached in memory for performance
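The indirection can be sketched in a few lines. `ToyLFS` is a hypothetical in-memory model: the log is a Python list, a "location" is an index into it, and each write stores a whole file version rather than separate data blocks and i-nodes.

```python
class ToyLFS:
    """Append-only log plus an i-node map (illustrative only)."""

    def __init__(self):
        self.log = []          # append-only: (i-node number, data) records
        self.inode_map = {}    # i-node number -> log index of the latest version

    def write(self, ino, data):
        self.log.append((ino, data))            # never overwrite in place
        self.inode_map[ino] = len(self.log) - 1 # the file has moved

    def read(self, ino):
        return self.log[self.inode_map[ino]][1]

    def live_records(self):
        """Log indices still referenced by the map; everything else is garbage."""
        return sorted(self.inode_map.values())

fs = ToyLFS()
fs.write(1, b"christine v1")
fs.write(2, b"andrew")
fs.write(3, b"colin")
fs.write(1, b"christine v2")    # new copy appended; the old one stays in the log
```

After the last write, i-node 1 maps to log index 3, and the record at index 0 is dead data that still occupies space in the log.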

What Happens When the Disk Gets Full?
- Partial solution: disk is managed in segments, which are threaded on disk
  - Basically, a linked list
- But this re-introduces seeks!

Segment Cleaner
- Goal: make scattered segments contiguous again
- Approach:
  - Read a segment
  - Write live data to the end of the log
  - Presto: the segment is now clean
- This is very expensive
  - Each live byte is read and written
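The cleaning steps above can be sketched as a single function. The names are hypothetical, blocks are represented by string labels, and the set of live blocks is passed in directly; a real cleaner would have to work out liveness from the file system's own meta-data.

```python
def clean_segment(segments, victim, live, log_tail):
    """Copy the victim segment's live blocks to the log tail and empty the victim.

    Returns (blocks_read, blocks_written) to show the cost of cleaning.
    """
    blocks = segments[victim]                       # read the whole segment
    survivors = [b for b in blocks if b in live]
    log_tail.extend(survivors)                      # rewrite live data at the end of the log
    segments[victim] = []                           # the segment is now clean
    return len(blocks), len(survivors)

segments = {0: ["a.v1", "b.v1", "c.v1", "d.v1"],    # a.v1 and c.v1 were superseded
            1: ["a.v2", "c.v2"]}
live = {"b.v1", "d.v1", "a.v2", "c.v2"}
log_tail = []
blocks_read, blocks_written = clean_segment(segments, 0, live, log_tail)
```

Reclaiming this four-block segment costs four block reads plus two block writes, which is why cleaning is cheapest when segments are mostly dead.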

LFS Analysis
- For reads, LFS and a traditional FS are largely equivalent
- LFS has better performance for small writes and meta-data operations
- The LFS cleaner has a large impact on performance
  - How important is this?

LFS in Practice
- LFS is implemented, but not widely used
- Reasons?
  - Assumptions about read behavior were not valid
    - Reads have not gone away
  - Performance improvements were not sufficient to offset increased complexity and higher variability
- LFS comeback?
  - See Jim Gray’s article
