File Systems.

Name: File Systems.
Uploaded: 2017-10-15T01:18:02+00:00
Duration: PTM30S45
Channel: Bruce Cook
Description: File Systems.

File Systems

Outline
Overview of file systems
File system design
Sharing files
Unix file system
Consistency and crash recovery
Journaling file systems
Log-structured file systems

What is a File System?
A file system provides an abstraction for storing, organizing and accessing persistent data.
  I.e., the data survives after the process that created it has terminated, and after the machine crashes or reboots.
  This data is stored on disks, tapes, solid-state drives (SSDs), …
File-system data is organized as objects called files.
  We need a way to find files, so files have names and are organized in directories.
  Files are accessed via system calls.
  Files can be accessed concurrently by different processes.

File Types
The OS typically treats files as an unstructured sequence of bytes.
  Programs can impose any format on files.
  E.g., application programs may look for a specific file extension to indicate the file’s type.
However, the OS needs to understand the format of executable files in order to execute programs.
[Figure: executable file]

File Metadata
Files have various attributes associated with them.
  Name, owner, creation time, access permissions, size, etc.
  These attributes are called file metadata.
The file system maintains file metadata in per-file data structures on disk.

Basic File-Related Calls
Open
  Start using a file; set the position to the beginning of the file.
Read, Write
  Read/write n bytes from/to the current position.
  Update the position.
Seek
  Move to a new position.
  Allows random access (mainly for disks, not tape).
Close
  Stop using the file.
Create, Rename, Delete, Get/Set attributes
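The calls above map closely onto the low-level wrappers in Python's os module. The following is a minimal sketch on a Unix-like system; the scratch file name demo.txt and the variable names are assumptions made for illustration only.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)  # create + open, position = 0
os.write(fd, b"hello world")                       # write 11 bytes, position = 11
os.lseek(fd, 6, os.SEEK_SET)                       # seek: move to byte offset 6
tail = os.read(fd, 5)                              # read 5 bytes from offset 6
position = os.lseek(fd, 0, os.SEEK_CUR)            # query the current position
os.close(fd)                                       # stop using the file

os.rename(path, path + ".bak")                     # rename
size = os.stat(path + ".bak").st_size              # get attributes
os.remove(path + ".bak")                           # delete

print(tail, position, size)
```

Note that read and write both advance the position implicitly, which is why the seek is needed before reading back data that was just written.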

Directories
Provide a method for naming and locating a file.
  Store a list of directory entries that point to files.
Modern systems use hierarchical directories.
  A directory contains files or sub-directories.
  E.g., B contains entries for D, j and E.
Files are accessed with pathnames.
  Absolute pathname, e.g., cat /B/D/n
  Relative pathname, which uses the current directory, e.g., cd /B/D; cat n
Directory metadata is similar to file metadata.
[Figure: directory tree rooted at /, with nodes A, B, C, i, D, j, E, F, k, l, m, n, G, H, o, p, q; the legend distinguishes interior nodes from leaf nodes]

Basic Directory-Related Calls
Open
Readdir
  Read entries in a directory.
  Each entry points to a file or sub-directory.
Seekdir
  Simulated at user level.
Close
Create, Rename, Delete, Get/Set attributes
No Writedir!
Link, Unlink
  Add/remove a name for an existing file (more later).

File System Design
The OS needs to store and retrieve files and directories.
  It needs to maintain information about where they are stored.
The OS needs to store files durably, i.e., ensure that files still exist after a machine reboot.
The OS needs to handle machine crashes.
  On a crash, the OS stops suddenly, perhaps in the middle of a file system operation.
  On restart, the file system should be able to recover data and bring itself back to a good, or consistent, state.

Disk Blocks
Disks are accessed at the granularity of sectors.
  Typically, 512 bytes.
A file system allocates data in chunks called blocks.
  The file system treats the disk as an array of blocks.
  A block contains 2^n contiguous sectors.
  Blocks reduce the overhead of managing individual bytes.
  Large blocks improve throughput but increase internal fragmentation.
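The trade-off between block size and internal fragmentation can be made concrete with a little arithmetic. The sketch below assumes 512-byte sectors, as above; the 10,000-byte file size and the function names are hypothetical and chosen only for illustration.

```python
SECTOR_SIZE = 512  # bytes, the typical sector size

def blocks_needed(file_size, block_size):
    """Number of blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)

def wasted_bytes(file_size, block_size):
    """Internal fragmentation: allocated space that holds no file data."""
    return blocks_needed(file_size, block_size) * block_size - file_size

file_size = 10_000  # hypothetical file size in bytes
for n in (0, 3, 6):  # 1, 8 and 64 sectors per block
    block_size = SECTOR_SIZE * 2**n
    print(block_size, blocks_needed(file_size, block_size),
          wasted_bytes(file_size, block_size))
```

Larger blocks mean fewer blocks to track and transfer per file, but the unused tail of the last block grows with the block size.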

File System Tasks
A file system performs four main tasks.
Free block management
  Allocates blocks to a file, manages free blocks.
  Uses bitmaps, linked lists, B-trees.
  Issues are similar to memory and swap management.
Block allocation and placement
  Maps (potentially non-contiguous) blocks to the file.
  Issues are similar to virtual memory; placement is unique to disks.
Directory management
  Maps file names to the location of the starting block of the file.
Buffer cache management
  Caches disk blocks in memory to minimize I/O.

Free Block Management - Bitmaps
Keep a bitmap in a separate area on disk.
  1 bit per disk block.
Suppose block size = 4 KB, disk size = 160 GB.
  Nr. of blocks = 160 GB / 4 KB = 40 M.
  Need 40 Mbits = 5 MB of disk space => 5 MB / 4 KB = 1280 bitmap blocks.
Advantages
  Allows allocating contiguous blocks to a file easily.
  Need only one bitmap block in memory at a time.
Disadvantages
  Need extra space for the bitmap.
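To illustrate both the sizing arithmetic and why a bitmap makes contiguous allocation easy, here is a minimal in-memory sketch. The class BlockBitmap and its method names are hypothetical; a real file system would keep the bitmap in disk blocks and load them on demand.

```python
KB, GB = 2**10, 2**30

# Sizing from the example: 160 GB disk, 4 KB blocks, 1 bit per block.
num_blocks = 160 * GB // (4 * KB)         # 40 M blocks
bitmap_bytes = num_blocks // 8            # 5 MB
bitmap_blocks = bitmap_bytes // (4 * KB)  # 1280 bitmap blocks

class BlockBitmap:
    """Toy free-block bitmap: bit = 1 means the block is in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def _set(self, block, used):
        if used:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate_contiguous(self, count):
        """Find the first run of `count` free blocks, mark it used, return its start."""
        run_start, run_len = 0, 0
        for block in range(self.num_blocks):
            if self.is_used(block):
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self._set(b, True)
                return run_start
        return None  # no run of that length is free

    def free(self, start, count):
        for b in range(start, start + count):
            self._set(b, False)
```

A linear scan is enough here because free runs show up as runs of zero bits; a linked list of free blocks would not expose contiguity this directly.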

Block Allocation and Placement
Maps (potentially non-contiguous) blocks to the file.
Options
  Contiguous allocation
  Linked list allocation
  File allocation table (FAT)
  I-node based allocation

Contiguous Allocation
All blocks in a file are contiguous on the disk.
[Figure: disk layout with contiguously allocated files, and the same disk after deleting files D and F]

When to Use Contiguous Allocation
Advantages
  Performance is good for sequential reading.
Disadvantages
  File growth requires copying.
  Disk becomes fragmented after deletions.
  Will need periodic compaction.
Good for CD-ROMs
  All file sizes are known in advance.
  Files are never deleted.

Linked List Allocation
Each file is a linked list of blocks.
  The first word in a block contains the number of the next block.
Disadvantage
  Random access is slow: reaching block k of a file requires reading the k blocks before it.

File Allocation Table (FAT)
Keep linked list information in memory.
  Uses an index table with one entry per disk block.
  Each entry contains the address of the next block.
Advantages
  Random access needs only an in-memory search (fast).
Disadvantages
  The entire table is stored in memory, which doesn’t scale to large file systems.
[Figure: FAT index table, with an end-of-file marker at the end of a file’s chain]
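The way a FAT is followed can be sketched with a plain Python list standing in for the table. The table contents, the EOF value and the function names below are hypothetical; real FAT variants use reserved entry values as the end-of-file marker.

```python
EOF = -1  # hypothetical end-of-file marker

# Hypothetical FAT for a 10-block disk: fat[b] is the block that follows b.
fat = [None] * 10
fat[4], fat[7], fat[2] = 7, 2, EOF  # file A occupies blocks 4 -> 7 -> 2
fat[6], fat[3] = 3, EOF             # file B occupies blocks 6 -> 3

def file_blocks(fat, start):
    """Follow the chain from a file's first block to its end-of-file marker."""
    blocks = []
    block = start
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks

def nth_block(fat, start, n):
    """Random access: find the n-th block (0-based) using only the in-memory table."""
    block = start
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("offset is past the end of the file")
    return block

print(file_blocks(fat, 4), nth_block(fat, 4, 2))
```

The chain is still walked link by link, but every step is a memory access rather than a disk read, which is the advantage over on-disk linked list allocation.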

Inode Based Allocation
Linked list allocation spreads index information across the disk, slowing random access.
FAT keeps linked-list index information in memory, but that limits the size of the file system.
Idea
  Store the index information for locating file blocks close together on disk.
  Cache this information in memory when the file is opened.
  This approach avoids the problems above.
Problem with the idea
  The index information may grow as the file grows.
  It cannot be stored contiguously.

Inode Based Allocation
Use a tree to store index information.
  The tree structure allows the index information to grow without spreading it too much.
The root of the tree is called the inode (index node).
  The inode is stored on disk.
  There is one inode per file or directory.

Inode Structure
Twelve direct block pointers
  Point directly to file data blocks (called direct or data blocks).
One indirect pointer
  Points to an indirect block that contains pointers to direct blocks.
One double indirect pointer
  Points to a double indirect block that contains pointers to indirect blocks.
One triple indirect pointer
  Points to a triple indirect block that contains pointers to double indirect blocks (not shown in the figure).
Why this allocation strategy?
  It optimizes for small files: they can be accessed without the additional disk I/O needed to read indirect blocks.
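To see which pointer serves a given part of a file, the sketch below maps a logical block index to its level in the inode tree. It assumes 1024 pointers per block (4 KB blocks with 4-byte pointers); the function name and return format are hypothetical.

```python
NUM_DIRECT = 12
PTRS_PER_BLOCK = 1024  # assumption: 4 KB blocks, 4-byte block pointers

def pointer_level(block_index):
    """Return (pointer used, number of index blocks read to reach the data block)."""
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < NUM_DIRECT:
        return "direct", 0
    block_index -= NUM_DIRECT
    if block_index < PTRS_PER_BLOCK:
        return "indirect", 1
    block_index -= PTRS_PER_BLOCK
    if block_index < PTRS_PER_BLOCK**2:
        return "double indirect", 2
    block_index -= PTRS_PER_BLOCK**2
    if block_index < PTRS_PER_BLOCK**3:
        return "triple indirect", 3
    raise ValueError("beyond the maximum file size")

for index in (0, 11, 12, 1036, 1_049_612):
    print(index, pointer_level(index))
```

With 4 KB blocks, the first 48 KB of a file is reachable from the inode alone, which is why small files need no extra index reads.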

Maximum File Size
Say block size is 4 KB.
Say block pointer size is 4 bytes.
  So there are 4 KB / 4 bytes = 1024 block pointers per block.
Total number of blocks in the file
  12 direct blocks
  1024 blocks via the indirect block pointer
  1024 * 1024 blocks via the double indirect block pointer
  1024 * 1024 * 1024 blocks via the triple indirect block pointer
Total file size
  (12 + 1024 + 1024^2 + 1024^3) * 4 KB ≈ 2^(10*3) * 4 * 2^10 bytes = 2^42 bytes = 4 TB
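The approximation can be checked by computing the exact figure. This is a small sketch using the block and pointer sizes stated above; the variable names are chosen for illustration.

```python
BLOCK_SIZE = 4 * 1024   # 4 KB
POINTER_SIZE = 4        # bytes per block pointer

ptrs_per_block = BLOCK_SIZE // POINTER_SIZE  # 1024
max_blocks = 12 + ptrs_per_block + ptrs_per_block**2 + ptrs_per_block**3
max_bytes = max_blocks * BLOCK_SIZE

print(max_bytes)           # exact maximum file size in bytes
print(max_bytes / 2**40)   # the same size in units of 2^40 bytes
```

The triple indirect term dominates: the direct, indirect and double indirect pointers together add only slightly more than 4 GB on top of 2^42 bytes.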

Unix File System Layout
super block: maintains file-system-wide information such as the partition size and where the different regions (e.g., inode bitmap, block bitmap) start and end.
inode bitmap: maintains the allocation status of inodes.
block bitmap: maintains the allocation status of blocks in the “file and directory blocks” area.
inode blocks: maintain an array of inodes. Each file or directory is associated with an inode.
file and directory blocks: all data and indirect blocks for files and directories are located here.
[Figure: Unix file system layout]

Block Placement
Block placement is the policy used by the file system for block allocation.
The original Unix file system had two placement problems.
Data blocks are allocated randomly in “aging” file systems.
  Blocks for a file are allocated sequentially when the file system is new.
  As the file system fills, blocks are allocated from deleted files.
  Deleted files may be randomly placed.
  So, blocks for new files become scattered across the disk.
Inodes are allocated far from blocks.
  All inodes are at the beginning of the disk, far from the data.
  Traversing file name paths and manipulating files and directories requires going back and forth between inodes and data blocks.
Both of these problems generate many long seeks.

BSD Fast File System
BSD Unix redesigned the Unix FS.
  The new FS is called the Fast File System (FFS).
The disk is partitioned into groups of cylinders.
  Recall, a cylinder is the same track across platters.
  A cylinder group consists of contiguous cylinders.
Placement policy: place these in the same cylinder group.
  Inode and data blocks of a file
  Files in a directory
  If the cylinder group is full, place in a nearby group.
The super block is replicated in every cylinder group for reliability. If the first copy is damaged, e.g., by a sector failure, another copy can be used. This is really important because without the super block, the entire file system cannot be accessed. The copies are placed at different offsets so that a head failure across the same region of the cylinders does not damage all super blocks simultaneously.

Directory Management
A directory contains zero or more entries.
  One entry per file or sub-directory that resides in the directory.
Directory entries are kept in directory data blocks.
An entry maps a file name to the location of the starting block of the file; it has
  File name, file attributes
  Block number of the first block of the file
[Figure: directory data blocks with entries such as Kernel.C, Kernel.h and os, each holding attributes and a block number (Block Nr.)]

Unix Directories
In Unix, each entry has (file name, inode number).
  The inode number helps locate the inode of the file.
  The inode contains the file attributes.
  Note that the inode is located in the inode blocks area.

Unix Directories
Hard links
  More than one name for a file.
  Different directory entries have the same inode number.
  /C/F/r points to the same inode as /B/D/n.
  Inodes maintain a reference count.
  The hierarchy is a DAG (directed acyclic graph) instead of a tree.
Symbolic links (shortcuts)
  A file contains data naming another file (a redirect).
  The contents of file /C/F/G/s are /B/D/m.
[Figure: directory graph rooted at /, with nodes A, B, C, i, D, j, E, F, k, l, m, n, r, G, H, s, o, p, q; the legend distinguishes interior nodes from leaf nodes]
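The difference between the two kinds of link can be observed from user space. The sketch below assumes a Unix-like system where unprivileged users may create symbolic links; the names n, r and s mirror the example above, but they live in a hypothetical temporary directory rather than under /B or /C.

```python
import os
import tempfile

root = tempfile.mkdtemp()          # hypothetical scratch directory
target = os.path.join(root, "n")
with open(target, "w") as f:
    f.write("data")

# Hard link: a second directory entry with the same inode number.
hard = os.path.join(root, "r")
os.link(target, hard)
same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
link_count = os.stat(target).st_nlink   # the inode's reference count

# Symbolic link: a separate file whose contents name another file.
soft = os.path.join(root, "s")
os.symlink(target, soft)
stored_path = os.readlink(soft)
own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino

print(same_inode, link_count, stored_path == target, own_inode)
```

os.stat follows symbolic links while os.lstat does not, which is why lstat is used to inspect the link itself.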

File Names
Short, Fixed Length Names
MS-DOS/Windows
  FILE3.BAK (8+3)
  Name has 11 bytes.
Original Unix
  Name has 14 bytes.

File Names
Variable Length Names
E.g., Unix
  Each name can be 4096 bytes.
  The size of a directory entry is variable.
Options
  Entries are allocated contiguously.
    Each entry stores the length of the entry, followed by the file name.
    Fragmentation occurs when files are removed.
  Allocate a set of pointers to file names at the beginning of the directory.
    Use a heap at the end to store the names.

File Deletion
The directory entry is removed from the directory.
  All blocks in the file are returned to the free list.
Hard Links
  Put a “reference count” field in each inode.
  It counts the number of directory entries that point to the file.
  When removing a file from a directory, decrement the count.
  When the count goes to zero, reclaim all file blocks.
Symbolic Link
  Remove the real file.
  The symbolic link is then “broken”.
  Similar to a bad URL.
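The reference-counting rule for hard links can be modelled in a few lines. This is a toy in-memory sketch with a single flat directory; the class ToyFS and its fields are hypothetical and only illustrate when blocks are reclaimed.

```python
class ToyFS:
    """Toy model: one flat directory, inodes with reference counts, a free-block set."""

    def __init__(self, num_blocks):
        self.free_blocks = set(range(num_blocks))
        self.inodes = {}      # inode number -> {"refs": count, "blocks": [...]}
        self.directory = {}   # file name -> inode number
        self.next_inode = 1

    def create(self, name, num_blocks):
        blocks = [self.free_blocks.pop() for _ in range(num_blocks)]
        ino = self.next_inode
        self.next_inode += 1
        self.inodes[ino] = {"refs": 1, "blocks": blocks}
        self.directory[name] = ino

    def link(self, existing, new_name):
        """Add another name for an existing file and bump its reference count."""
        ino = self.directory[existing]
        self.inodes[ino]["refs"] += 1
        self.directory[new_name] = ino

    def unlink(self, name):
        """Remove one name; reclaim the blocks only when no names remain."""
        ino = self.directory.pop(name)
        inode = self.inodes[ino]
        inode["refs"] -= 1
        if inode["refs"] == 0:
            self.free_blocks.update(inode["blocks"])
            del self.inodes[ino]

fs = ToyFS(num_blocks=8)
fs.create("n", 3)
fs.link("n", "r")
fs.unlink("n")                 # the file survives under the name "r"
print(len(fs.free_blocks))     # 3 blocks are still in use
fs.unlink("r")                 # last name removed, blocks reclaimed
print(len(fs.free_blocks))
```

A symbolic link would not appear in this count at all, which is why removing the real file leaves the symbolic link pointing at nothing.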

Path Lookup in Unix FS
Say file F, located in directory /D1/D2, has to be read.
What blocks need to be read from disk?
  Super block (provides the location of the inode blocks area)
    Normally this block is read when the file system code performs initialization, and it is then cached in memory.
  Inode of the / directory (from the inode blocks area)
  Data blocks of the / directory (provide the directory entry for D1)
  Inode of the D1 directory
  Data blocks of the D1 directory (provide the directory entry for D2)
  Inode of the D2 directory
  Data blocks of the D2 directory (provide the directory entry for F)
  Inode of file F
  Data blocks of file F
Note that at least two blocks are read for each path component.
  2 for /
  2 for D1
  2 for D2
  2 for F
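The sequence of reads above can be traced with a toy in-memory model. The inode numbers, the dictionary layout and the function name are hypothetical; the super block read is left out because it is normally already cached.

```python
# Hypothetical inode table: inode number -> inode contents.
inodes = {
    2: {"type": "dir", "entries": {"D1": 5}},    # the / directory
    5: {"type": "dir", "entries": {"D2": 9}},
    9: {"type": "dir", "entries": {"F": 14}},
    14: {"type": "file", "data": b"contents of F"},
}
ROOT_INODE = 2

def read_file(path):
    """Resolve an absolute path and return (file data, list of disk reads)."""
    reads = []
    ino, name = ROOT_INODE, "/"
    for component in [c for c in path.split("/") if c]:
        inode = inodes[ino]
        reads.append(f"inode of {name}")
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        reads.append(f"data blocks of {name}")   # holds the entry for `component`
        ino, name = inode["entries"][component], component
    reads.append(f"inode of {name}")
    reads.append(f"data blocks of {name}")
    return inodes[ino]["data"], reads

data, reads = read_file("/D1/D2/F")
for step in reads:
    print(step)
```

Each path component costs one inode read plus at least one data block read, which is where the count of two per component comes from.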

Buffer Cache Management
Notice that each file access requires many block accesses.
File operations often access the same disk block.
  E.g., the block containing the contents of the root (/) directory.
Caching disk blocks in memory can reduce disk I/O.
  Traditionally, the block cache is called a buffer cache.
Cache operations
  Block lookup
    If the block is in memory, return data from the buffer.
  Block miss
    Read the disk block into a buffer, update the buffer cache.
  Block flush
    If the buffer is modified, write it back to the disk block.

Buffer Cache Organization
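Many blocks can be cached in memory.
  With a 16 GB machine, say 8 GB is used for the buffer cache.
  Block size = 4 KB, nr. of blocks cached = 8 GB / 4 KB = 2 M.
Use a hash table to look up a block in memory efficiently.
  The key is (device, block #).
[Figure: hash table keyed by device and block number, pointing to disk blocks in memory]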

Buffer Cache Write Policy
When an application writes to a file, the corresponding block is updated in the buffer cache.
When is the disk block updated?
Immediately (synchronously)
  Write-through cache
  Correct, but very slow.
Later (asynchronously)
  Write-back cache
  Fast, but what if the system crashes?
  The file system can become inconsistent because some blocks in memory are not on disk.
  We discuss this problem in detail later.

Buffer Cache Issues
The buffer cache typically has limited size, so we need replacement algorithms.
  Typically, LRU is used.
The buffer cache competes with the virtual memory system.
  How many frames should be allocated for the buffer cache vs. virtual memory?
Some systems use a unified memory cache for the buffer cache and virtual memory pages.
  The blocks of the buffer cache and the pages in the page cache are part of a unified caching scheme.
  However, if a program reads a large file, then it affects programs that are not accessing files much.
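A buffer cache that combines a hash table keyed by (device, block number), LRU replacement and a write-back policy can be sketched as follows. The class BufferCache, its counters and the dictionary standing in for the disk are hypothetical; a kernel implementation would also handle locking and asynchronous I/O.

```python
from collections import OrderedDict

class BufferCache:
    """Toy write-back buffer cache with LRU replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # stand-in disk: (device, block_nr) -> bytes
        self.buffers = OrderedDict()  # (device, block_nr) -> [data, dirty]
        self.disk_reads = 0
        self.disk_writes = 0

    def _get(self, key):
        if key in self.buffers:                 # block lookup: hit
            self.buffers.move_to_end(key)       # mark as most recently used
        else:                                   # block miss
            if len(self.buffers) >= self.capacity:
                self._evict()
            self.buffers[key] = [self.disk[key], False]
            self.disk_reads += 1
        return self.buffers[key]

    def _evict(self):
        key, (data, dirty) = self.buffers.popitem(last=False)  # least recently used
        if dirty:                               # block flush on eviction
            self.disk[key] = data
            self.disk_writes += 1

    def read(self, device, block_nr):
        return self._get((device, block_nr))[0]

    def write(self, device, block_nr, data):
        # Simplification: the old block is always loaded first, even when
        # the write replaces the whole block.
        buf = self._get((device, block_nr))
        buf[0], buf[1] = data, True             # update in memory, mark dirty

    def flush(self):
        for key, buf in self.buffers.items():
            if buf[1]:
                self.disk[key] = buf[0]
                buf[1] = False
                self.disk_writes += 1
```

Between write and the later flush or eviction, the disk still holds the old contents; that window is what makes a crash dangerous with a write-back cache.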

Read Ahead
Applications often read files sequentially.
  The file system can predict that a process will request the file block after the one it is currently requesting.
The file system prefetches the next block from disk.
  This is also called read ahead.
  Note that the next block may not be allocated sequentially.
  If the process requests the next block, it will be in the cache.
  Allows overlapping I/O with execution.

Sharing Files
Files need to be shared across processes.
Issues
  Concurrent access
    What happens when threads read and write a file simultaneously?
  Protection
    How should the OS ensure that only an authorized user can access a file?

Concurrent Access
The OS ensures sequential consistency.
  A read() call sees data from the most recently finished write() call (even if the write occurred on another processor).
  All processors see the same order of writes, i.e., if a file block has value 1, followed by 2, then no processor will read 2 and then 1.
Applications still have to ensure that read is called after the write has finished.
  Concurrent accesses may see old and new data.

Protection
Who (subject) can access a file (object)?
How can they access it (action)?
A protection system dictates whether a given action performed by a given subject on an object is allowed.
  Actions include read, write, execute, append, change protection, delete, etc.
Two mechanisms for enforcing protection
  Access control lists (ACL)
  Capabilities

Access Control Lists (ACL)
For each object, maintain a list of subjects and their permitted actions.
Easier to manage
  Easy to grant and revoke.
Problem when objects are heavily shared
  ACLs become large.
  Use groups.

Capabilities
For each subject, maintain a list of objects and their permitted actions.
Easier to transfer
  Like keys: can be handed off, does not depend on the subject.
Revoking a capability is challenging
  Need to keep track of all subjects that have the capability.

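ACLs vs. Capabilities
[Figure: access matrix with subjects (Alice, Bob, Charlie) as rows and objects (/one, /two, /three) as columns, holding permissions such as rw, w, r and -. A row of the matrix is a subject’s capability list; a column is an object’s ACL.]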
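ACLs and capability lists are two views of the same access matrix, which the sketch below makes explicit. The permissions in the matrix are hypothetical placeholders, not the values from the original figure, and the function names are chosen for illustration.

```python
# Hypothetical access matrix: (subject, object) -> permitted actions.
matrix = {
    ("alice", "/one"): "rw",
    ("bob", "/one"): "w",
    ("bob", "/two"): "r",
}

def acl(obj):
    """Column view: for one object, the subjects and their permitted actions."""
    return {s: perms for (s, o), perms in matrix.items() if o == obj}

def capabilities(subject):
    """Row view: for one subject, the objects and their permitted actions."""
    return {o: perms for (s, o), perms in matrix.items() if s == subject}

def allowed(subject, obj, action):
    """Is `action` (e.g. "r" or "w") permitted for this subject on this object?"""
    return action in matrix.get((subject, obj), "")

print(acl("/one"))
print(capabilities("bob"))
print(allowed("alice", "/one", "w"), allowed("charlie", "/one", "r"))
```

Revoking access to /one means editing one column, which is why ACLs make revocation easy; revoking a capability means finding every row that holds it.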

The UNIX File-System Data Structures
[Figure: UNIX file-system data structures, divided into those kept in memory and those kept on disk and cached in memory]

Example: Unix File-Related System Calls
fd = open(name, mode)
  Perform path lookup on name to find the inode of the file.
  Cache the inode for the file in the buffer cache.
  Check permissions.
  Set up an entry in the open file table.
  Set up an entry in the file descriptor table.
  Return fd.
byte_count = read(fd, buffer, buffer_size)
  Figure out the data (and indirect) block(s) to read.
  Read them from disk into the buffer cache.
  Copy data to the user buffer.
  Update the file position.
  Return the number of bytes read.

Example: Unix File-Related System Calls
byte_count = write(fd, buffer, num_bytes)
  Figure out the data (and indirect) block(s) to write.
  Read them from disk into the buffer cache if a block is being partially updated.
  Copy data from the user buffer into the buffer cache blocks.
  Update the i-node.
  Mark modified buffers, such as inode, free map, indirect, and data blocks, as dirty.
  Schedule writing of the dirty buffers to disk.
  Update the file position.
  Return the number of bytes written.
close(fd)
  Reclaim resources.

Summary
File systems are designed to store data durably and reliably in files.
A file is an abstraction for a disk.
  An application thinks of a file as a contiguous byte array.
  The file system maps the file to non-contiguous blocks.
A file system performs four main tasks.
  Free block management
  Block allocation and placement
  Directory management
  Buffer cache management

Think Time
What is the purpose of directories in a file system?
  Directories are used for naming files and sub-directories.
What operations update directories?
  Creating, renaming or deleting a file or directory, and changing the attributes of a directory.
In Unix, the directory hierarchy forms an acyclic graph. Explain how.
  It is an acyclic graph, rather than a tree, due to hard links.
How are cycles not allowed in the graph?
  A hard link can only be created to files, which are leaves in the file system tree. A hard link cannot be created to a directory, and hence cycles cannot form.
Why are they not allowed?
  Recall that directories need to be empty before they can be removed. If cycles were allowed, it would not be possible to remove a directory that was part of a cycle, unless an expensive cycle detection check was performed to find out that all the directories in the cycle were empty and could be removed.
What are the benefits/drawbacks of using inodes in a Unix file system vs. the FAT file system?
  Benefits: inodes allow scaling to much larger file systems and allow hard links.
  Drawbacks: FAT is more efficient for smaller file systems.
Describe the difference between hard and symbolic links in Unix.
  A hard link allows multiple directory entries to point to the same inode. A symbolic link is a directory entry that points to a special file that contains the path of another file.

Think Time
What were the problems in the Unix file system that led to the FFS design?
  Files became fragmented and were too spread out on disk; inodes and file data were far apart.
Describe the operations needed to write the string "xyz" to an existing file "/a/b" in Unix.
  Read the super block (if not in memory).
  Read the inode of /, then the directory block of /.
  Read the inode of a, then the directory block of a.
  Read the inode of b.
  If the first block of b exists (it may not exist if the file is zero sized)
    read the block into the buffer cache
  else
    allocate a block on disk (update the block bitmap)
    allocate a block in the buffer cache
  Update the block in the buffer cache with the string “xyz”, and mark it dirty.
  Update the file position.
  Update the inode of b with the new timestamp, file size, etc.
  Schedule writing of the file block, inode and block bitmap to disk.

Mounting File Systems
We say that a container holds a set of items.
  A namespace is the set of unique names for all items in a container.
  A file system namespace is the set of names for all files in a file system.
File system mounting glues a file system namespace into the namespace of another file system.
  Provides a unified namespace.

Mounting File Systems - Example
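Each device/partition stores a single file system.
  The device or partition can be local or remote.
A mount point is an empty directory, say A, in the existing namespace.
  After mounting, the new file system is available at A.
[Figure: file systems FS 1 and FS 2, each rooted at /, joined at mount point A (the figure also shows a directory mnt)]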
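One way to picture the unified namespace is a mount table consulted during path resolution. The sketch below picks the longest mount point that prefixes the path; the mount point /mnt/A, the table contents and the function name are assumptions for illustration and do not describe any particular kernel.

```python
import posixpath

# Hypothetical mount table: mount point -> file system.
mounts = {"/": "FS 1", "/mnt/A": "FS 2"}

def resolve(path):
    """Return (file system, path inside that file system) for an absolute path."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best):
            best = mount_point          # prefer the longest matching mount point
    relative = path if best == "/" else path[len(best):]
    return mounts[best], relative or "/"

print(resolve("/mnt/A/x"))
print(resolve("/mnt/AB/x"))
```

Matching on whole path components matters: /mnt/AB is not inside /mnt/A even though the strings share a prefix.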
