Local file systems
Landon Cox, March 31, 2017
Block-oriented vs byte-oriented
Disks are accessed in terms of blocks (also called sectors).
Similar idea to memory pages, e.g., 4KB chunks of data.
First problem: programs deal with bytes.
E.g., a program wants to change ‘J’ in “Jello world” to ‘H’.
Disks only support block-sized, bulk accesses.
Block-oriented vs byte-oriented
To read less than a block: read the entire block, then return the right portion.
How do we write less than a block? Read the entire block, modify the right portion, then write out the entire block.
There is nothing analogous to byte-grained load/store.
Flash devices are even more complicated.
The only way to get load/store-style access is mmap, which still moves whole blocks underneath.
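A minimal sketch of the read-modify-write pattern above, assuming a toy in-memory device that only accepts whole-block reads and writes. The `BlockDevice` class, its method names, and the 4 KB block size are illustrative assumptions, not a real driver API.

```python
BLOCK_SIZE = 4096  # assumed block size (4 KB)


class BlockDevice:
    """Toy device (hypothetical): data only moves in whole blocks."""

    def __init__(self, num_blocks):
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self._blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE, "device only accepts full blocks"
        self._blocks[n] = bytes(data)


def read_bytes(dev, offset, length):
    """Read less than a block: read the entire block, return the right portion."""
    n, start = divmod(offset, BLOCK_SIZE)
    assert start + length <= BLOCK_SIZE, "sketch handles a single block only"
    return dev.read_block(n)[start:start + length]


def write_bytes(dev, offset, data):
    """Write less than a block: read it, modify the portion, write it all back."""
    n, start = divmod(offset, BLOCK_SIZE)
    assert start + len(data) <= BLOCK_SIZE, "sketch handles a single block only"
    block = bytearray(dev.read_block(n))     # 1) read the entire block
    block[start:start + len(data)] = data    # 2) modify the right portion
    dev.write_block(n, block)                # 3) write out the entire block
```

For example, `write_bytes(dev, 4096, b"H")` on a block holding “Jello world” turns it into “Hello world”, but changing that one byte still costs a full block read plus a full block write.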
Disk drives over the years
Disk geometry
Surface organized into tracks
Parallel tracks form cylinders
Tracks broken up into sectors
Disk head position
Rotation is counter-clockwise
About to read a sector
After reading blue sector
Red request scheduled next
Seek to red’s track
Wait for red sector to reach head
Read red sector
(Timeline figure: after the BLUE read, SEEK for RED, then ROTATE (rotational latency) until the red sector reaches the head, then the RED read.)
To access a disk:
1. Queue (wait for the disk to be free): 0 to infinity ms.
2. Position the disk head and arm (seek + rotation): 0-10 ms. Pure overhead: minimize time spent doing this.
3. Access disk data: size / transfer rate (e.g., 1 MB/s). Useful work: maximize time spent doing this.
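A small worked example of the cost breakdown above. The seek time, the 7200 RPM spindle speed, and the 1 MB/s = 1,000,000 bytes/s rate are assumed example values, and queueing time is ignored. Average rotational latency is taken as half a rotation.

```python
def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a rotation away from the head."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def access_time_ms(seek_ms, rpm, size_bytes, bytes_per_sec):
    """Return (positioning overhead, useful transfer time) in milliseconds."""
    positioning = seek_ms + avg_rotational_latency_ms(rpm)
    transfer = size_bytes / bytes_per_sec * 1000
    return positioning, transfer


# Assumed example: 5 ms seek, 7200 RPM, one 4 KB block at 1 MB/s.
positioning, transfer = access_time_ms(5, 7200, 4096, 1_000_000)
useful_fraction = transfer / (positioning + transfer)
```

With these numbers, positioning costs about 9.2 ms while the transfer itself takes about 4.1 ms, so only roughly 31% of the access is useful work.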
File systems
What is a file system?
An OS abstraction that makes disks easy to use.
A place to put persistent data.
File system issues:
How to map file space onto disk space? Structure and allocation, like memory management.
How to use names instead of sectors? Naming and directories: not like memory, but very similar to DNS.
Intro to file system structure
Overall question: how do we organize things on disk?
Really a data structure question: what data structures do we use on disk?
Intro to file system structure
We need an initial object to get things going.
In file systems, this is a file header.
In Unix, it is called an inode (indexed node).
The inode contains info about the file: size, owner, access permissions, last modification date.
There are many ways to organize inodes on disk; use actual usage patterns to make good decisions.
File system usage patterns
80% of file accesses are reads (20% are writes).
So it is OK to speed up reads, even if that hurts writes.
Most file accesses are sequential and cover the whole file.
This is a form of spatial locality: put sequential blocks next to each other, and pre-fetch neighboring blocks.
Most files are small, but most bytes are consumed by large files.
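A toy model of why sequential placement plus pre-fetching pays off, assuming each disk request can fetch a window of adjacent blocks at once. The `ReadaheadReader` class and its `window` parameter are hypothetical.

```python
class ReadaheadReader:
    """Hypothetical reader: each miss fetches `window` adjacent blocks in one request."""

    def __init__(self, window):
        self.window = window      # blocks per disk request (assumed contiguous on disk)
        self.cached = set()       # blocks already in memory
        self.requests = 0         # disk requests issued so far

    def read(self, block):
        if block not in self.cached:
            self.requests += 1
            self.cached.update(range(block, block + self.window))
```

Reading 16 sequential blocks with `window=4` issues 4 requests instead of 16; the benefit depends entirely on those neighbors really being adjacent on disk.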
1) Contiguous allocation
Store a file in one contiguous segment, sometimes called an “extent”.
Reserve space in advance of writing it; the user could declare the size in advance.
If the file grows larger, move it to a place where it fits.
1) Contiguous allocation
The file header contains:
the starting location (block #) of the file,
the file size (# of blocks),
other info (modification times, permissions).
Exactly like base and bounds for memory.
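A minimal sketch of the base-and-bounds analogy: the header's start block acts as the base and its length as the bound. The `ContiguousHeader` name and its fields are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ContiguousHeader:
    """Hypothetical contiguous-allocation file header."""
    start: int    # starting disk block of the file ("base")
    length: int   # file size in blocks ("bounds")

    def disk_block(self, file_block):
        """Translate a file block number into a disk block number."""
        if not 0 <= file_block < self.length:
            raise IndexError("file block out of bounds")
        return self.start + file_block
```

Translation is a single addition, which is why sequential and random access are both cheap, but growing the file past `length` means finding a new, larger free extent.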
1) Contiguous allocation
Pros? Fast sequential access; easy random access.
Cons? External and internal fragmentation; hard to grow files.
(Figure: file header, then blocks B0, B1, B2, followed by reserved space.)
2) Indexed files
The file header looks a lot like a page table:
File block #   Disk block #
0              18
1              50
2              8
3              15
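A minimal sketch of the index as a page-table-like array, using the example mapping and assuming its first row is file block 0. The function names are hypothetical.

```python
# Example index: entry i holds the disk block that stores file block i.
index = [18, 50, 8, 15]


def disk_block(index, file_block):
    """Like a page-table lookup: one array access per file block."""
    return index[file_block]


def append_block(index, free_disk_block):
    """Growing the file just adds an entry; nothing is reserved in advance."""
    index.append(free_disk_block)
```

Note that consecutive file blocks land on scattered disk blocks (18, 50, 8, 15), which is exactly what makes sequential reads seek-heavy.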
2) Indexed files
Pros: easy to grow (don’t have to reserve space in advance); easy random access.
Cons: how do we grow beyond the index size? Sequential access may be slow. Why? We may have to seek after each block read.
Why isn’t sequential access a problem with page tables? Memory doesn’t have seek times.
2) Indexed files
Cons, continued: potential for lots of seeks during sequential access.
How do we reduce seeks for sequential access? We don’t want to pre-allocate blocks.
2) Indexed files
How do we reduce seeks for sequential access?
When you allocate a new block, choose one near the block that precedes it, e.g., a block in the same cylinder.
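A sketch of the “allocate near the preceding block” policy, assuming that distance between block numbers is a reasonable proxy for physical distance on the disk (real allocators reason about cylinders and groups). `allocate_near` is a hypothetical name.

```python
def allocate_near(free_blocks, prev_block):
    """Take the free block closest to the file's previous block (ties go to the lower number)."""
    if not free_blocks:
        raise RuntimeError("disk full")
    best = min(free_blocks, key=lambda b: (abs(b - prev_block), b))
    free_blocks.remove(best)
    return best
```

For example, with free blocks {3, 40, 52, 90} and a file whose last block is 50, the next block goes to 52, then 40, keeping the file clustered without pre-allocating anything.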
What about large files? We could just assume every file will be really large.
Problem? It wastes space in the header if the file is small.
Max file size is 4GB and a file block is 4KB, so we need 1 million pointers.
With 4-byte pointers, that is a 4 MB header.
Remember, most files are small: 10,000 small files means 40GB of headers.
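The header arithmetic above, spelled out with binary units (the slide's “1 million” is 2^20 and its “40GB” is 40,000 MB).

```python
KB, MB = 2**10, 2**20
GB = 2**30

max_file_size = 4 * GB
block_size = 4 * KB
pointer_size = 4                                    # bytes per disk-block pointer

pointers_per_header = max_file_size // block_size   # 2**20, about 1 million
header_bytes = pointers_per_header * pointer_size   # 4 MB per file
total_for_small_files = 10_000 * header_bytes       # 40,000 MB, roughly 40GB
```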
What about large files? We could use a larger block size.
Problem? Internal fragmentation (most files are small).
Solution: use a more sophisticated data structure.
3) Multi-level indexed files
Think of indexed files as a shallow tree.
Instead, we could have a multi-level tree: level 1 points to level 2 nodes, and level 2 points to level 3 nodes.
This gives us big files without wasteful headers.
3) Multi-level indexed files
How many accesses to read one block of data? 3 (one for each level).
3) Multi-level indexed files
How to improve performance? Caching.
3) Multi-level indexed files
To reduce the number of disk accesses, cache the level 1 and level 2 nodes.
This is often a useful combination: indirection for flexibility, caching to speed up indirection.
We can cache lots of small pointers.
Where else do we see this strategy? TLBs, DNS.
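A toy model of the three-access lookup and of caching the index nodes. It assumes levels 1 and 2 are index blocks and level 3 is the data block (matching “3, one for each level”), a tiny hypothetical fan-out of 4, and an unbounded cache; `TwoLevelFile` and the example disk layout are illustrative.

```python
FANOUT = 4  # hypothetical pointers per index block, kept tiny for the example


class TwoLevelFile:
    """Level 1 index -> level 2 indexes -> data; the 'disk' is a dict of blocks."""

    def __init__(self, disk, level1_block, cache_index_nodes):
        self.disk = disk
        self.level1_block = level1_block
        self.cache = {} if cache_index_nodes else None  # caches index nodes only
        self.disk_reads = 0

    def _read(self, block, is_index):
        if is_index and self.cache is not None and block in self.cache:
            return self.cache[block]          # cache hit: no disk access
        self.disk_reads += 1
        contents = self.disk[block]
        if is_index and self.cache is not None:
            self.cache[block] = contents
        return contents

    def read(self, file_block):
        i, j = divmod(file_block, FANOUT)
        level1 = self._read(self.level1_block, is_index=True)
        level2 = self._read(level1[i], is_index=True)
        return self._read(level2[j], is_index=False)


# Example disk: block 0 is level 1, blocks 1-2 are level 2, blocks 10-17 hold data.
EXAMPLE_DISK = {0: [1, 2], 1: [10, 11, 12, 13], 2: [14, 15, 16, 17]}
EXAMPLE_DISK.update({b: f"data{b}" for b in range(10, 18)})
```

Without the cache every read costs 3 disk accesses; with cached index nodes, a second read through the same level 2 node costs only the data access.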
3) Multi-level indexed files
What about small files (i.e., most files)?
3) Multi-level indexed files
Use a non-uniform tree.
3) Multi-level indexed files
Pros: simple; files can easily expand; small files don’t pay the full overhead.
Cons: large files need lots of indirect blocks (slow); could have lots of seeks for sequential access.
Multiple updates and reliability
Reliability is an issue only for file systems: we don’t care about losing an address space after a crash, but your files shouldn’t disappear after a crash.
Files should be permanent.
Multi-step updates cause problems, because we can crash in the middle.
Multi-step updates
Transfer $100 from Melissa’s account to mine:
1. Deduct $100 from Melissa’s account.
2. Add $100 to my account.
If we crash between 1 and 2, we lose $100.
Multi-step updates
Same for directories: “mv /tmp/foo.txt /home/” removes foo.txt from /tmp and adds it to /home.
Acceptable outcomes if we crash in the middle? foo.txt in both /tmp and /home; foo.txt in /tmp but not in /home.
Unacceptable outcome? foo.txt in neither /tmp nor /home.
Multiple updates and reliability
This is a well-known, undergrad OS-level problem. No modern OS would make this mistake, right?
Video evidence suggests otherwise:
A directory with 3 files that we want to move to an external drive.
The drive “fails” during the move.
We don’t want to lose data due to the failure.
Roll film …
Bug in OS X Leopard
Multi-step updates
Move a file from one directory to another, e.g., “/home/lpc/names” to “/home/chase/names”:
1. Delete it from the old directory.
2. Add it to the new directory.
If we crash between 1 and 2, we lose the file.
Multi-step updates
Create an empty new file:
1. Point the directory to the new file header.
2. Initialize the new file header.
What happens if we crash between 1 and 2? The directory will point to an uninitialized header, and the kernel will crash if you try to access it.
How do we fix this? Re-order the writes.
Multi-step updates
Create an empty new file:
1. Initialize the new file header.
2. Point the directory to the new file header.
What happens if we crash between 1 and 2? The file doesn’t exist, but the file system won’t point to garbage.
Multi-step updates
What if we also have to update a map of free blocks?
1. Initialize the new file header.
2. Point the directory to the new file header.
3. Update the free block map.
Does this work? Bad if we crash between 2 and 3: the free block map will still think the new file header is free.
Multi-step updates
What if we also have to update a map of free blocks?
1. Initialize the new file header.
2. Update the free block map.
3. Point the directory to the new file header.
Does this work? Better, but still bad if we crash between 2 and 3: this leaks a disk block.
We could scan the disk after a crash to recompute the free map; older versions of Unix and Windows do this (now we have journaling file systems …).
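A toy crash simulation of the two orderings above. It assumes a single new file header at a hypothetical block 7, models the disk as three Python sets/dicts, and “crashes” by applying only the first k writes; all names are illustrative.

```python
ALL_BLOCKS = {5, 6, 7}
NEW = 7  # hypothetical block holding the new file header


def fresh_state():
    return {"free": set(ALL_BLOCKS), "initialized": set(), "dir": {}}


def init_header(s):   # write the new file header
    s["initialized"].add(NEW)


def mark_used(s):     # update the free block map
    s["free"].discard(NEW)


def link(s):          # point the directory at the new header
    s["dir"]["new.txt"] = NEW


def run(order, crash_after):
    """Apply only the first `crash_after` writes, simulating a crash."""
    s = fresh_state()
    for step in order[:crash_after]:
        step(s)
    return s


def problems(s):
    found = set()
    referenced = set(s["dir"].values())
    for b in referenced:
        if b not in s["initialized"]:
            found.add("dangling")         # directory -> uninitialized header
        if b in s["free"]:
            found.add("points-to-free")   # block could be handed out again
    if ALL_BLOCKS - s["free"] - referenced:
        found.add("leak")                 # marked used, but nothing points to it
    return found


def recompute_free_map(s):
    """Scan-after-crash recovery: anything unreferenced becomes free again."""
    s["free"] = ALL_BLOCKS - set(s["dir"].values())
```

With the first order (`[init_header, link, mark_used]`), a crash after two writes leaves the directory pointing at a “free” block. With the second order (`[init_header, mark_used, link]`), the worst case is a leak, which `recompute_free_map` repairs.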
inode table
The inode table (inodes 1, 2, …, n-1, n) is pre-allocated in a well-known place on disk.
Each inode holds metadata plus block pointers: two direct blocks, two indirect blocks, and one double indirect block (as drawn).
Each file has an inode (directories are special files).
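A sketch of how this non-uniform tree maps a file block to a pointer path. It assumes the layout drawn above (2 direct, 2 indirect, 1 double indirect pointer) and 4 KB blocks holding 4-byte pointers; the constants and `locate` are illustrative.

```python
N_DIRECT = 2                  # direct pointers in the inode (as drawn)
N_INDIRECT = 2                # single-indirect pointers (as drawn)
PTRS_PER_BLOCK = 4096 // 4    # assumed: 4 KB block of 4-byte pointers


def locate(file_block):
    """Return which pointer(s) lead to `file_block`."""
    if file_block < 0:
        raise ValueError("negative block number")
    b = file_block
    if b < N_DIRECT:
        return ("direct", b)
    b -= N_DIRECT
    if b < N_INDIRECT * PTRS_PER_BLOCK:
        # (which indirect pointer, slot inside that indirect block)
        return ("indirect", b // PTRS_PER_BLOCK, b % PTRS_PER_BLOCK)
    b -= N_INDIRECT * PTRS_PER_BLOCK
    if b < PTRS_PER_BLOCK * PTRS_PER_BLOCK:
        # (slot in the double-indirect block, slot in the second-level block)
        return ("double", b // PTRS_PER_BLOCK, b % PTRS_PER_BLOCK)
    raise ValueError("file block beyond maximum file size")


MAX_BLOCKS = N_DIRECT + N_INDIRECT * PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2
```

A two-block file uses only the direct pointers, so small files never touch an indirect block, while the double indirect pointer alone covers over a million blocks.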
Write order and corruption
Rule 1: Don’t point to uninitialized data.
Example: create foo/bar/new. What can go wrong?
1) Assign a new inode for new.
2) Point bar’s block to new’s inode.
3) Crash before the inode is initialized, so bar now points to an inode with unknown (?) contents.
(Figure: directory foo with baz’s inode and bar’s inode; file baz; directory bar; data blocks.)
Write order and corruption
Rule 2: Don’t re-use a block before nullifying existing pointers to it.
Example: delete foo/baz, then write foo/bar. What could go wrong?
1) Update the free map: baz’s data block is free.
2) Allocate baz’s data block to bar.
3) Point bar at baz’s data block.
4) Crash.
If baz’s inode still points at that block, baz and bar now share it.
(Figure: directory foo; file baz; file bar; free map; data.)
Write order and corruption
Rule 3: Set the new pointer before resetting the old one.
Example: mv foo/baz foo/bar/. What can go wrong?
1) Remove foo’s pointer to baz.
2) Crash.
Nothing points to baz any more, so it is lost.
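A toy check of Rule 3 for `mv foo/baz foo/bar/`: directories are modeled as sets of names, and a crash means only the first k steps ran. The step names and `move` are hypothetical.

```python
def move(order, crash_after):
    """Run the first `crash_after` steps of mv foo/baz foo/bar/, then 'crash'."""
    foo, bar = {"baz"}, set()                     # directory contents by name
    steps = {
        "set_new": lambda: bar.add("baz"),        # point foo/bar at baz
        "reset_old": lambda: foo.discard("baz"),  # remove foo's pointer to baz
    }
    for name in order[:crash_after]:
        steps[name]()
    if "baz" in foo and "baz" in bar:
        return "in both"
    if "baz" in foo:
        return "only in foo"
    if "baz" in bar:
        return "only in foo/bar"
    return "lost"
```

Resetting the old pointer first can end in `"lost"`; setting the new pointer first can at worst leave baz in both directories, which is one of the acceptable outcomes listed earlier.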
Ideal file system
Apps never wait for disk writes.
Minimize the number of disk writes.
Minimize memory used for caching.
Maximize disk scheduler flexibility.
Two approaches:
Journaling (apply updates to a log, then to the FS).
Soft updates (maintain dependency info).
Journaling
Write to the journal, then write to the file system.
Example: mv foo/baz foo/bar/.
Journal entries: bar → baz inode (add bar’s pointer to baz’s inode), then foo ↛ baz inode (remove foo’s pointer to baz’s inode).
Journaling
Write to the journal, then write to the file system.
Do we need begin/end transaction records? No, ordering ensures consistency.
Journaling
Write to the journal, then write to the file system.
Can we reverse the order of operations? No, we could crash during replay.
Journaling
Write to the journal, then write to the file system.
Why is this faster than synchronous, ordered FS updates?
Synchronous FS updates may require seeks, while writing to the log is sequential.
We can apply updates to the in-memory cache and flush blocks at leisure.
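A minimal sketch of journal-then-apply with replay for the mv example. It assumes the journal append itself completes before any in-place write begins (how a real journal detects a torn append is outside this sketch), and inode number 17 is hypothetical. Each entry is idempotent, so replaying after a crash at any point converges to the same state.

```python
def apply_entry(fs, entry):
    """Apply one journal entry; applying it twice has the same effect as once."""
    op, directory, name, *inode = entry
    if op == "link":
        fs[directory][name] = inode[0]
    elif op == "unlink":
        fs[directory].pop(name, None)


def mv_with_journal(fs, journal, applied_before_crash):
    # mv foo/baz foo/bar/: set the new pointer first, then remove the old one.
    entries = [("link", "foo/bar", "baz", 17), ("unlink", "foo", "baz")]
    journal.extend(entries)                        # 1) sequential journal append
    for entry in entries[:applied_before_crash]:
        apply_entry(fs, entry)                     # 2) in-place FS writes, then crash


def recover(fs, journal):
    for entry in journal:                          # replay in log order
        apply_entry(fs, entry)


def initial_fs():
    return {"foo": {"baz": 17}, "foo/bar": {}}
```

Whether the crash happens before, between, or after the in-place writes, `recover` leaves baz in foo/bar and not in foo.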
Soft updates
Maintain dependency information: only write blocks after those they depend on.
Don’t have to write anything synchronously.
Example: create file A, remove file B.
Inode block: inodes #4, #5, #6, #7.
Dir block entries are <name,inode #>, with <-,#0> an empty entry: <-,#0>, <B,#5>, <C,#7>.
Soft updates: maintain dependency information
Create file A: the dir block becomes <A,#4>, <B,#5>, <C,#7>.
What is the rule? Write the inode before the dir block.
Soft updates: maintain dependency information
Remove file B: the dir block becomes <A,#4>, <-,#0>, <C,#7>.
What is the rule? Write the dir block before the inode.
Soft updates: maintain dependency information
Dir block: <A,#4>, <-,#0>, <C,#7>.
What is the problem? A cyclic dependency: the inode block must be written before the dir block (for A), and the dir block before the inode block (for B).
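A small sketch of why whole-block ordering fails here: treat “must be written before” as a dependency graph and ask for a topological order. It uses Python's standard `graphlib` module (Python 3.9+); `block_write_order` and the block names are illustrative.

```python
from graphlib import CycleError, TopologicalSorter


def block_write_order(deps):
    """deps maps each block to the blocks that must reach disk before it.
    Returns a safe whole-block write order, or None if there is a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


create_a = {"dir block": {"inode block"}}   # write inode before dir
remove_b = {"inode block": {"dir block"}}   # write dir before inode
both = {"dir block": {"inode block"}, "inode block": {"dir block"}}
```

Each update alone has a safe order, but together they form a cycle and no whole-block order exists, which motivates tracking dependencies at a finer grain.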
Soft updates solution
Fine-grained dependencies: maintain them per field and per pointer.
May have to redo/undo updates to fields/pointers.
Consider the previous example.
What happens on recovery?
Example: create file A, remove file B.
Starting point:
Memory: inode block with inodes #4, #5, #6, #7; dir block <A,#4>, <-,#0>, <C,#7>.
Disk: inode block with inodes #4, #5, #6, #7; dir block <-,#0>, <B,#5>, <C,#7>.
Step 1: safe version of directory block written.
Disk dir block: <-,#0>, <-,#0>, <C,#7>.
What is odd about the state of this block?
Step 2: inode block written
Memory dir block (unchanged): <A,#4>, <-,#0>, <C,#7>.
Disk dir block: <-,#0>, <-,#0>, <C,#7>; the inode block is now on disk as well.
Step 3: directory block written
Memory dir block: <A,#4>, <-,#0>, <C,#7>.
Disk dir block: <A,#4>, <-,#0>, <C,#7>, now matching memory.
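A toy model of the three steps above, with per-entry rollback standing in for soft updates' fine-grained dependencies. Entries are `(name, inode)` tuples, `None` stands for `<-,#0>`, inode states are the strings `"allocated"`/`"free"`, and inode #6 is omitted; all of this representation is a hypothetical simplification.

```python
def new_example():
    """Create A (inode #4) and remove B (inode #5): in-memory vs. on-disk state."""
    mem = {"dir": [("A", 4), None, ("C", 7)],
           "inodes": {4: "allocated", 5: "free", 7: "allocated"}}
    disk = {"dir": [None, ("B", 5), ("C", 7)],
            "inodes": {4: "free", 5: "allocated", 7: "allocated"}}
    return mem, disk


def write_dir_block(mem, disk):
    """Write the dir block, rolling back entries whose inode isn't on disk yet."""
    old = disk["dir"]
    disk["dir"] = [
        old[i] if entry is not None and disk["inodes"][entry[1]] != "allocated"
        else entry
        for i, entry in enumerate(mem["dir"])
    ]


def write_inode_block(mem, disk):
    """Write the inode block, keeping freed inodes allocated while the
    on-disk dir block still names them."""
    still_named = {entry[1] for entry in disk["dir"] if entry is not None}
    disk["inodes"] = {
        n: "allocated" if n in still_named else state
        for n, state in mem["inodes"].items()
    }
```

Running `write_dir_block`, `write_inode_block`, `write_dir_block` reproduces steps 1-3: the first dir write hides A, the inode write can then free #5, and the final dir write exposes A. Every intermediate disk state is consistent, at the cost of writing the dir block twice.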
Soft updates
What is guaranteed about disk state? It will always be consistent on recovery, but may have orphaned inodes and blocks.
Do I need to do anything on recovery? No need to check consistency; we can check for orphaned inodes/blocks asynchronously.
Soft updates
How are soft updates good for the disk scheduler?
The disk scheduler can schedule blocks “arbitrarily”, e.g., to optimize for lowest seek time; it just has to be careful about the state of the blocks it writes.
What info does the disk scheduler need? It needs to know dependencies and to be able to undo updates.
What is the potential downside of soft updates? They can cause extra writes: we have to write both rolled-back and rolled-forward block versions.
Next week: more storage issues
Can we hide latency with speculative execution? How often will our speculations be correct? How costly are the mis-predictions?
How do we use persistent DRAM? It is byte addressable but costly ($). File system interface? Mmap?