File Systems CSE451 Andrew Whitaker

Notes: look at the file system API through a file-copy example. Things to notice:
i) There are 4 system calls here: open, close, read, write
ii) Why do we need to open/close a file? No great reason…
iii) Permissions are part of the API (here shown when creating a file)
iv) What does write mean? It does not guarantee that the data has hit the disk. To guarantee this, we need an fsync system call
What would happen if we didn’t read the data in small chunks? E.g., with BUFFER_SIZE = 4096 * 1024 * 1024 (4 GiB), reading a block from disk would evict a page from memory
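
The copy example itself is not part of this transcript, so here is a minimal sketch of what such a loop might look like, using Python's thin wrappers over the same system calls. The name copy_file, the 4096-byte BUFFER_SIZE, the 0o644 permissions, and the durable flag are all assumptions made for illustration.

```python
import os

BUFFER_SIZE = 4096  # small chunk size; the exact value is an assumption


def copy_file(src_path, dst_path, durable=False):
    """Copy src_path to dst_path with open/read/write/close (hypothetical helper)."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Permissions are part of the API: they are supplied at creation time.
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                chunk = os.read(src, BUFFER_SIZE)
                if not chunk:  # an empty result means end of file
                    break
                view = memoryview(chunk)
                while view:  # write() may accept fewer bytes than offered
                    written = os.write(dst, view)
                    view = view[written:]
            if durable:
                # write() alone may leave data in OS buffers; fsync asks
                # the OS to push it to the storage device.
                os.fsync(dst)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

Calling copy_file("a.txt", "b.txt", durable=True) would copy in 4 KB chunks and request a flush before closing; without durable=True the data may still sit in memory when the call returns.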

Outline
File System Interface
  The programmer/user’s perspective
File System Implementation

File System Goal #1
Allow a single disk (or partition) to be treated as many smaller storage containers
  Files can have arbitrary size
  Files can grow and shrink
  Size is not stated up front

File System Goal #2
Provide a hierarchical name-space for referring to files
  Key idea: directories as containers for files
  Must persist across restarts!
[Figure: directory tree rooted at “/”, labeled “path”, showing home/, var/, tmp/, usr/ and the entries andrew, veneta, colin]

File System Goal #3
Protected sharing of information
  Allow users / programs to share data
  Provide access control mechanisms to limit sharing
  drwxr-xr-x  4  gaetano   www      Mar  sewpc
  drwxrwx--x  4  zahorjan  www      Mar  software
  drwxrwxr-x  9  levy      www      Mar  sosp16
  -rw            lazowska  www      Oct  staff
  drwxrwxr-x  3  beame     ctheory  Jun  stoc96
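
To make the permission strings in the listing concrete, here is a small sketch that decodes a numeric mode into the same drwxr-xr-x form and checks single permission bits. The helper name allowed and its who/access arguments are assumptions for illustration; the stat module is part of the Python standard library.

```python
import stat

# Bit masks for each class of user, in r, w, x order.
_BITS = {
    "user": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def allowed(mode, who, access):
    """Return True if `who` ('user', 'group', 'other') has `access` ('r', 'w', 'x')."""
    return bool(mode & _BITS[who]["rwx".index(access)])


# A directory with mode 755, as in the first line of the listing.
sewpc_mode = stat.S_IFDIR | 0o755
print(stat.filemode(sewpc_mode))         # the ls-style string for this mode
print(allowed(sewpc_mode, "group", "w"))  # group members cannot write
```

The leading d marks a directory; the three rwx triples apply to the owner, the group, and everyone else, in that order.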

Workload Characteristics
Most files are small
  Median size ~= 4 KB
A few files are very large
  A “heavy-tailed” distribution
Most files are read sequentially
Many files are quickly deleted
  Windows NT: 80% of newly created files are deleted within 4 seconds

File System Implementation
Let’s start simple:
  No directories
  All files are at the “root”
  Files are identified by a unique number

Blocks
Files are built from blocks
  Typical size is 4 KB (or 8 sectors of 512 bytes)
File system maps from “virtual” blocks (within a file) to physical disk blocks
[Figure: blocks of file 1 and file 2 mapped onto the disk]
Why do we use an allocation unit that is larger than the disk sector size? Less bookkeeping
What is wrong with this allocation scheme in foo.txt?
What does this remind you of? Paging!
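
The block arithmetic can be sketched in a few lines. The 4096-byte block and 512-byte sector come from the slide; the function names locate and blocks_needed are assumptions for illustration.

```python
BLOCK_SIZE = 4096   # bytes per block, as on the slide
SECTOR_SIZE = 512   # bytes per disk sector, as on the slide

SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE


def locate(byte_offset):
    """Split a byte offset within a file into (virtual block #, offset in block)."""
    return divmod(byte_offset, BLOCK_SIZE)


def blocks_needed(file_size):
    """Number of whole blocks required to hold file_size bytes (rounds up)."""
    return -(-file_size // BLOCK_SIZE)
```

For example, byte 10000 of a file falls in virtual block 2 at offset 1808, and a 4097-byte file needs two blocks, leaving most of the second block unused.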

Déjà vu: File Systems versus Paging
Similarity: chunk-based allocation (blocks vs. pages)
  Address spaces are built from pages
  Files are built from blocks
  These are often the same size!
OS maintains the mapping between virtual and physical resources
  Page tables map from virtual page to physical frame
  File system maps from “virtual” block to physical disk block

Differences Between Paging and File Systems
Persistence
  File system state must survive restarts
Translation performance
  Virtual address translation must be very fast (done at processor speed)
  Block mapping can be much slower
Layout issues
  Disk performance is highly influenced by layout
  Paging performance is (largely) unaffected
    Any page frame is as good as any other
Files rarely have holes

Basic Disk Layout
Data region contains actual file data
Metadata region contains information about files and the file system
  Block size
  Block mappings (virtual block to physical block)
  Protection information
[Figure: disk divided into a Metadata region and a Data region]

Approach #1: Pre-allocated Disk Partitions
On file creation, carve out a contiguous disk allocation
Record the partition info in the meta-data region
  Partition / file number   Offset (block #)   Size (# of blocks)
  0                         1                  2048
  1                         2049               512
  2                         2561               4096
This strategy has two properties: it eagerly reserves disk space, and it uses contiguous physical allocations
Note: this is exactly like base/limit registers for memory
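
The base/limit analogy can be sketched as follows. The offsets and sizes are the values from the slide's table, read as three rows; the file numbers 0 to 2 and the function name physical_block are assumptions for illustration.

```python
# file number -> (offset of first block, size in blocks)
PARTITIONS = {
    0: (1, 2048),
    1: (2049, 512),
    2: (2561, 4096),
}


def physical_block(file_no, virtual_block):
    """Translate a block index within a file to a disk block, base/limit style."""
    base, limit = PARTITIONS[file_no]
    if not 0 <= virtual_block < limit:
        raise IndexError("block %d is outside the partition" % virtual_block)
    return base + virtual_block
```

Translation is a single bounds check and an addition, which is why this scheme needs so little metadata. The cost is that a file cannot grow past its limit without being copied elsewhere.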

Problems With Static Partitions
Must know (or guess) file size in advance
  Penalty for getting this wrong is high
  Guess too high: internal fragmentation
  Guess too low: must copy into a larger partition
Tends to create external fragmentation
  Space between partitions
Major advantage: perfect data layout
  Contiguous layout is optimal for sequential reads and writes
[Figure: disk with contiguous partitions for file 0, file 1, file 2, and file 3, plus file 4]
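
External fragmentation is easy to reproduce with a toy allocator. This is a sketch under stated assumptions: a 10-block disk, a first-fit search, and the hypothetical helpers first_fit, allocate, and release. None of these come from the slides.

```python
def first_fit(free_map, size):
    """Return the start of the first run of `size` free blocks, or None."""
    run = 0
    for i, is_free in enumerate(free_map):
        run = run + 1 if is_free else 0
        if run == size:
            return i - size + 1
    return None


def allocate(free_map, size):
    """Reserve a contiguous partition; return its start block or None."""
    start = first_fit(free_map, size)
    if start is not None:
        for i in range(start, start + size):
            free_map[i] = False
    return start


def release(free_map, start, size):
    """Give a partition's blocks back to the free pool."""
    for i in range(start, start + size):
        free_map[i] = True


disk = [True] * 10          # toy disk: 10 free blocks
a = allocate(disk, 3)       # blocks 0-2
b = allocate(disk, 3)       # blocks 3-5
c = allocate(disk, 3)       # blocks 6-8
release(disk, b, 3)         # delete the middle file
free_blocks = sum(disk)     # 4 blocks are free in total...
d = allocate(disk, 4)       # ...but no 4-block contiguous run exists
```

After the middle file is deleted, four blocks are free but they are split into a run of three and a run of one, so the four-block request fails.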

Alternative to Static Partitions
Allocate disk space lazily
Allow for block allocations that are not contiguous
  Eliminates external fragmentation
  But, results in sub-optimal data layout
Challenge: must keep track of virtual-to-physical block mappings
[Figure: blocks of a file scattered across the disk]

Approach #2: Block Tables (Silberschatz: Index Blocks)
In the meta-data region, maintain an array of block tables
Block table maintains the mappings from virtual file blocks to physical disk blocks
[Figure: array of block tables: Block table for file 0, Block table for file 1, Block table for file 2, Block table for file 3, …]

Possible Block Table Implementation
[Figure: a block address (virtual block #, offset) is translated through the block table into a physical address (phys block #, offset) in the disk data region, shown as Block 0, Block 1, Block 2, Block 3, Block 4, …]
What does this remind you of?
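
The lookup in the figure works much like a single-level page table. A minimal sketch, assuming 4096-byte blocks and made-up table contents; the names block_tables and translate are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size

# One block table per file: index = virtual block #, value = physical block #.
# The physical block numbers are invented for illustration.
block_tables = {
    0: [17, 4, 93],
    1: [8],
}


def translate(file_no, byte_offset):
    """Map a byte offset within a file to a byte address in the data region."""
    virtual_block, offset = divmod(byte_offset, BLOCK_SIZE)
    table = block_tables[file_no]
    if virtual_block >= len(table):
        raise IndexError("offset is past the last mapped block")
    # The offset within the block is unchanged; only the block # is remapped.
    return table[virtual_block] * BLOCK_SIZE + offset
```

Unlike the base/limit scheme, consecutive virtual blocks may land anywhere on disk, so the file can grow one block at a time.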

Analyzing Block Tables
This is very close to what UNIX does! “Block table” is called an inode One remaining problem: choosing the block table size Small size prohibits large files Large size wastes space for small files Solution: multi-level block-tables Allocate a small number of mappings in the inode Allow for indirection to supply mappings for larger files

UNIX i-nodes (Unix Version 7)
Each i-node contains 13 pointers
The first 10 are “direct”
  Pointers to real data blocks
The 11th pointer is a “single indirect block”
  A pointer to a block full of pointers to real data blocks
The 12th pointer is a “doubly indirect block”
  A pointer to a block full of pointers to blocks full of pointers to real data blocks
The 13th pointer is a “triply indirect block”
  You get the idea…
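
To see how far 13 pointers reach, here is a sketch that works out which level of indirection covers a given virtual block and how many blocks the scheme can address. The 10/1/1/1 split is from the slide; the 4096-byte block and 4-byte pointer defaults are assumptions, as are the function names.

```python
NUM_DIRECT = 10  # direct pointers per i-node, from the slide


def pointers_per_block(block_size=4096, pointer_size=4):
    """How many block pointers fit in one indirect block (sizes are assumptions)."""
    return block_size // pointer_size


def max_file_blocks(block_size=4096, pointer_size=4):
    """Largest file, in blocks, reachable through all 13 pointers."""
    n = pointers_per_block(block_size, pointer_size)
    return NUM_DIRECT + n + n ** 2 + n ** 3


def level_for_block(virtual_block, block_size=4096, pointer_size=4):
    """Name the pointer level that maps the given virtual block."""
    n = pointers_per_block(block_size, pointer_size)
    if virtual_block < NUM_DIRECT:
        return "direct"
    virtual_block -= NUM_DIRECT
    if virtual_block < n:
        return "single indirect"
    virtual_block -= n
    if virtual_block < n ** 2:
        return "doubly indirect"
    virtual_block -= n ** 2
    if virtual_block < n ** 3:
        return "triply indirect"
    raise ValueError("block is beyond the maximum file size")
```

Small files are served entirely by the direct pointers, so the common case pays no extra disk reads; only large files pay for the additional levels.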

i-nodes, Visualized
[Figure: i-node pointers labeled 1, 10, 11, 12, …]
Q: How is this different from multiple-level page tables?

Checkpoint
What we have
  Arbitrary size files that can grow and shrink dynamically
What we don’t have
  File names
  Directories

Completing the File System
Let’s create special files that contain the mappings from file names to numbers
Let’s call these files “directories”
  i-node number   File name
  216             Foo.txt
  4               Bar.txt
  93              Recipe.doc
  144             Spreadsheet.xls
  …
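
Since a directory is just a file, its contents can be sketched as a byte string of fixed-size records. The record layout below (a 2-byte i-node number followed by a 14-byte name) is an assumption chosen for illustration, loosely modeled on early UNIX; the helper names are hypothetical.

```python
import struct

# Assumed record layout: 2-byte i-node number + 14-byte, NUL-padded name.
ENTRY = struct.Struct("<H14s")


def pack_directory(entries):
    """Serialize {file name: i-node number} into directory-file bytes."""
    records = []
    for name, ino in entries.items():
        raw = name.encode()
        if len(raw) > 14:
            raise ValueError("name too long for this record layout")
        records.append(ENTRY.pack(ino, raw))
    return b"".join(records)


def lookup(dir_bytes, name):
    """Scan the directory contents for `name`; return its i-node number or None."""
    for ino, raw in ENTRY.iter_unpack(dir_bytes):
        if raw.rstrip(b"\0").decode() == name:
            return ino
    return None


directory = pack_directory({"Foo.txt": 216, "Bar.txt": 4})
```

Here lookup(directory, "Bar.txt") scans the two 16-byte records and returns 4, the i-node number from the slide's table.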

UNIX Directory Implementation
Directories are implemented as files
  Contains mappings from file names to i-nodes
Directories can contain other directories
  This gives us the file system hierarchy
The root directory has a well-known i-node

Path name translation
Let’s say you want to open “/one/two/three.txt”
  fd = open("/one/two/three.txt", O_RDWR);
What goes on inside the file system?
  Read the i-node for “/”
  Read the directory contents for this i-node
  Read the i-node for “one”
  Read the i-node for “two”
  Find the i-node for “three.txt”
  Create an open-file entry for this i-node
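
The walk can be sketched over a toy in-memory i-node table. Everything here is hypothetical: the i-node numbers, the choice of 2 for the root, the dictionary layout, and the function name namei (borrowed from the traditional UNIX routine of that name, but not its real implementation).

```python
ROOT_INO = 2  # "well-known" root i-node; the value is an assumption

# Toy i-node table: i-node number -> contents (all numbers invented).
inodes = {
    2: {"type": "dir", "entries": {"one": 5}},
    5: {"type": "dir", "entries": {"two": 9}},
    9: {"type": "dir", "entries": {"three.txt": 12}},
    12: {"type": "file", "data": b"hello"},
}


def namei(path):
    """Resolve an absolute path to an i-node number, one component at a time."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    ino = ROOT_INO
    for name in filter(None, path.split("/")):
        node = inodes[ino]                  # read the i-node
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        entries = node["entries"]           # read the directory contents
        if name not in entries:
            raise FileNotFoundError(path)
        ino = entries[name]                 # follow the name to the next i-node
    return ino
```

Each path component costs an i-node read plus a directory read, which is why real systems cache recent name lookups.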

File Links
The same file can have multiple names
  Because every file is uniquely identified by a number
  i-node number   File name
  216             Foo.txt
  216             Bar.txt
  93              Recipe.doc
  144             Spreadsheet.xls
  …

Hard Link
A hard link is a mapping from a file name (path) to an i-node
  Stored in a directory file
  Each link refers to the same file
  open(“foo.txt”) is equivalent to open(“bar.txt”)
What happens on deletion?
  Each i-node contains a reference count
  On link deletion, decrement the ref count
  When the count reaches zero, the OS releases the file
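
The reference count can be observed from user space. A small sketch, assuming a file system that supports hard links (typical on UNIX-like systems); the function name hard_link_demo and the file contents are made up.

```python
import os
import tempfile


def hard_link_demo():
    """Create two names for one file and watch the link count change."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo.txt")
        bar = os.path.join(d, "bar.txt")
        with open(foo, "w") as f:
            f.write("shared contents")

        os.link(foo, bar)  # add a second directory entry for the same i-node
        same_inode = os.stat(foo).st_ino == os.stat(bar).st_ino
        count_before = os.stat(foo).st_nlink

        os.unlink(foo)     # remove one name; the count drops, the file stays
        count_after = os.stat(bar).st_nlink
        with open(bar) as f:
            survived = f.read()
        return same_inode, count_before, count_after, survived
```

Deleting foo.txt only removes a name. The data is released when the last link goes away (and, on UNIX, no process still has the file open).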

Soft Links
Problems with hard links:
  They can’t span file systems (why?)
  They can’t refer to directories (why?)
Soft links address these issues
A soft link is a file containing a complete path
  When the OS encounters a soft link, it re-writes the path to include the linked location
Note: soft links do not modify the i-node ref count
  This makes it possible to have “broken” soft links
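
A broken soft link is easy to produce. This sketch assumes a UNIX-like system where unprivileged processes may create symbolic links; the function name soft_link_demo and the file names are made up.

```python
import os
import tempfile


def soft_link_demo():
    """Create a soft link, delete its target, and inspect what is left."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target.txt")
        alias = os.path.join(d, "alias.txt")
        with open(target, "w") as f:
            f.write("data")

        os.symlink(target, alias)              # the link file stores a path
        stores_path = os.readlink(alias) == target
        target_links = os.stat(target).st_nlink  # ref count is not incremented

        os.unlink(target)                      # nothing stops us: count hits zero
        still_a_link = os.path.islink(alias)   # the link file itself remains
        resolves = os.path.exists(alias)       # following it now fails
        return stores_path, target_links, still_a_link, resolves
```

Because the link holds only a path, the file system has no record that anything depends on the target, so the target can disappear and leave the link dangling.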

Summary
Files serve as a virtualized storage abstraction
  Arbitrary size
  Grow and shrink dynamically
The process of mapping from virtual to physical blocks resembles page tables
  With some key differences
In UNIX, files are identified by number
  Directories are files that map from names to numbers
