Chapter 4 File Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
File Systems The best way to store information: Store all information in virtual memory address space Use ordinary memory read/write to access information Not feasible: no enough address space When process exits, information is lost Another approach: Define named objects called files which hold programs and data Files are not part of the address space O.S. provides special operations (system calls) to create/destroy files Files are permanent records
Essential requirements for long-term information storage: It must be possible to store a very large amount of information. The information must survive the termination of the process using it. Multiple processes must be able to access the information concurrently. File Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Think of a disk as a linear sequence of fixed-size blocks and supporting reading and writing of blocks. Questions that quickly arise: How do you find information? How do you keep one user from reading another’s data? How do you know which blocks are free? File Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure 4-1. Some typical file extensions. File Naming Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
User’s View of the File System A file system consists of two distinct parts 1. A collection of files, each storing related data 2. A directory structure, which organizes and provides information about all the files in the system Three common file organizations 1. Unstructured byte sequence The basic unit for reading and writing is byte, sequential access. Example: Unix.
User’s View of the File System 2. Sequence of fixed-size records The basic unit is a record, sequential access. Records can be read and written, but cannot be inserted or deleted in the middle of a file. Example: CP/M. 3. A tree of disk blocks The basic unit is a record. Each block holds n keyed records. Records can be looked up by key and inserted. Blocks can be split. Used on mainframe computers.
Figure 4-2. Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree. File Structure Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
File Types Regular files: contain data bytes Directories: contain the names of other files Special files: I/O devices Character special files: e.g. terminals, printers Block special files: e.g. disks, tapes File access methods Sequential access: start at the beginning, read the bytes or records in order Random access: read bytes or records in any order
Figure 4-3. (a) An executable file. (b) An archive. File Types Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure 4-4a. Some possible file attributes. File Attributes Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved File attributes: Extra information associated with each file
The most common system calls relating to files: File Operations Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved Append Seek Get Attributes Set Attributes Rename Create Delete Open Close Read Write Open: bring attributes and disk addresses to memory for fast access Close: put back to disk Seek: put the pointer to a specific location in the file
Figure 4-5. A simple program to copy a file. Example Program Using File System Calls (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure 4-5. A simple program to copy a file. Example Program Using File System Calls (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Directories 1. Single directory shared by all users Different users cannot use the same file name. 2. One directory per user Different users can use the file name. But each user can keep only a sequence of files, no any group support. 3. Arbitrary tree per user Flexible. Used in Unix.
Figure 4-6. A single-level directory system containing four files. Hierarchical Directory Systems (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure 4-7. A hierarchical directory system. Hierarchical Directory Systems (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure 4-8. A UNIX directory tree. Path Names Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
System calls for managing directories: Directory Operations Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved Readdir Rename Link Uplink Create Delete Opendir Closedir
File System Implementation (Designer’s View) How to store n bytes on disk 1.Allocate n consecutive bytes of disk space For insertion, have to move the file (slow) 2. The file is split up into a number of (not necessarily contiguous) fixed size blocks More expensive to keep track of where the blocks are.
Candidates: sector, track, cylinder The time to read a block from disk: Seek time + rotation time + transfer time Example: A disk with bytes per track Rotation time: msec Average seek time 30 msec Block size K bytes The time to read a block: /2 + (K/32768) x16.67 Large block size: poor disk space utilization Small block size: reading a block is slow Compromise between the speed and utilization: General block size: 512, 1K or 2K Block Size
Figure The solid curve (left-hand scale) gives the data rate of a disk. The dashed curve (right-hand scale) gives the disk space efficiency. All files are 4 KB. Disk Space Management Block Size Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
1. Store blocks consecutively When file grows, may cause problem. 2. Linked list Use the last two bytes as s pointer to point to the next block in the file. Problems: a. The number of bytes in a block is no longer power of 2 b. Random access is slow Keeping Track of Blocks of Each File
Figure 4-9. A possible file system layout. File System Layout Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure (a) Contiguous allocation of disk space for 7 files. (b) The state of the disk after files D and F have been removed. Contiguous Allocation Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Storing a file as a linked list of disk blocks. Linked List Allocation Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
File allocation table (FAT) A table has one entry for each disk block. All blocks of a file linked together. The directory keeps the first block number of each file. The table is stored in memory. Example system: MS-DOS Problem: Pointers for all files on the disk are mixed up in the same table. An entire FAT is potentially needed for accessing one file. Linked List Allocation Using a Table in Memory
Figure Linked list allocation using a file allocation table in main memory. Linked List Allocation Using a Table in Memory Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
I-nodes Keep the block lists for different files in different places I-node (index node) in Unix. Each file has a small table called I-node I-node contains attributes, first 10 blocks of the file, and 3 indirect block numbers. Single indirect block: points to data blocks Double indirect block: points to single indirect blocks Triple indirect block: points to double indirect blocks
Figure An example i-node. I-nodes Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Example: For 1K disk block and 32-bit disk address, how large a file could be? Single indirect block holds 256 disk addresses Double indirect block holds 256 single indirect blocks File size < 10 blocks: no indirect blocks File size between 10 and 266 blocks: single indirect File size between 267 and x 256 = 65802K blocks, use double indirect File size between 64M and 16G, triple indirect File size > 16 G, cannot handle I-nodes
Advantage: fast random access At most three disk references are needed to locate the disk address of any byte in the file. When the file is open, its I-node is brought into memory until the file is closed. Example of finding the 300th block in a file: = 34 Follow double indirect to the first single indirect, the 34th block. No need to follow the 299 blocks before this block as in the linked list case. I-nodes
Implementing Directories Directory organization Single directory in the system. Example: CP/M Hierarchical directory tree. Example 1: MS-DOS Attributes: 8 1-bit flags: b7, b6, …, b0 b0 = 1: read only file b1 = 1: hidden file b2 = 1: system file b4 = 1: a subdirectory Example 2: Unix
Figure (a) A simple directory containing fixed-size entries with the disk addresses and attributes in the directory entry. (b) A directory in which each entry just refers to an i-node. Implementing Directories (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure The MS-DOS directory entry. The MS-DOS File System (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure A UNIX V7 directory entry. The UNIX V7 File System (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure A UNIX i-node. The UNIX V7 File System (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure The steps in looking up /usr/ast/mbox. The UNIX V7 File System (3) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Two ways of handling long file names in a directory. (a) In-line. (b) In a heap. Implementing Directories (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Shared Files A shared file appears simultaneously in different directories belonging to different users File system is a directed acyclic graph, not a tree Problems: When directories contain disk addresses (e.g. CP/M), cannot share files. Solution: Directory does not point to disk addresses. Point to a data structure associated with the file. Use Link Count to record how many users are sharing the file. When performing ``remove file", only when count = 0, remove the file, otherwise reduce the Link Count.
Figure File system containing a shared file. Shared Files Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure (a) Situation prior to linking. (b) After the link is created. (c) After the original owner removes the file. Shared Files Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Example: Unix. To share a file, different directories point to the same i-node. Problem with pointing to the same data structure: After a file owner removes the file, counting information may still be charged to the file owner. Solution: use symbolic linking: Create a new file, of type LINK. Enter that file in another user's directory. The new file only contains the path to the shared file. Can be used on different machines. Problem with symbolic linking: Extra overhead to access file: read extra block to get the i- node. Shared Files
Unix provides both ways ln file1 file 2 Points to the same i-node. Can be seen by ls -i file1 ls -i file2 ln -s file link Create a symbolic link: link ls -l will show l file more link is equivalent to more file Shared Files
Operations required to remove a file in UNIX: Remove the file from its directory. Release the i-node to the pool of free i-nodes. Return all the disk blocks to the pool of free disk blocks. Journaling File Systems Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Position of the virtual file system. Virtual File Systems (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure A simplified view of the data structures and code used by the VFS and concrete file system to do a read. Virtual File Systems (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Percentage of files smaller than a given size (in bytes). Disk Space Management Block Size (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Keeping Track of Free Blocks 1. Linked list Use several blocks to store free block numbers. Link these blocks together. Search is fast. Example: 1K block size. 16-bit disk block number (20 M disk). Need 40 blocks to hold all 20K block numbers. 2. Bit map n blocks require n bits. Smaller space. Search is slow. May be put in memory. Example: 20 M disk requires only 20K bits (3 blocks).
Figure (a) Storing the free list on a linked list. (b) A bitmap. Keeping Track of Free Blocks Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure (a) An almost-full block of pointers to free disk blocks in memory and three blocks of pointers on disk. (b) Result of freeing a three-block file. (c) An alternative strategy for handling the three free blocks. The shaded entries represent pointers to free disk blocks. Keeping Track of Free Blocks Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Quotas are kept track of on a per-user basis in a quota table. Disk Quotas Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Backups to tape are generally made to handle one of two potential problems: Recover from disaster. Recover from stupidity. File System Backups (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure A file system to be dumped. Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number. File System Backups (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Bitmaps used by the logical dumping algorithm. File System Backups (3) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
File System Consistency After system crashes, the file system may be in an inconsistent state. A utility program checks: 1.Block consistency: a block is either free or in a file. Two counters per block: one counter counts the times the block presents in a file; another counter counts the times the block presents in the free list. Read all I-nodes: for each block number in I-node, increment the first counter Check the free list, increment the second counter Consistent: counter1/counter2 = 1/0 or 0/1 Missing block: 0/0 Correction: put the block in free list Duplicated block in free list: 0/2 Correction: delete one from free list Duplicated data block: 2/0 Correction: allocate a free block, copy the data of the block into it and insert the copy to one of the file 1/1: should be corrected to 1/0
Figure File system states. (a) Consistent. (b) Missing block. (c) Duplicate block in free list. (d) Duplicate data block. File System Consistency Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
File System Consistency 2. File consistency: check how many users are sharing a file One counter per file (per I-node) Starts from root directory, inspect all files. For each file, increment the counter for the file's I-node Compare the counters with the Link Count in the I-node _ Consistent: Link Count = counter _ Inconsistent: Link Count not = counter _ Link Count > counter: all files removed, I-node is still not removed, waste space _ Link Count < counter: file may point to a released I-node Solution: Force the Link Count = actual number of directory entries 3. Other heuristic checks I-node number > it should be Strange mode, e.g., 777 Directories with too many entries directory.
Disk access is about 100,000 times slower than memory access Reduce the number of disk accesses needed Block cache: keep some blocks (logically belong to the disk) in memory Read a block, first check cache. If the block is not in cache, read it into cache, then use it. For full cache, replace one block Replacement algorithm: similar to virtual memory (FIFO, LRU,...), but need to consider special features of files File System Performance
Blocks may not have good locality Which blocks will be needed again soon Indirect blocks are seldom used again. Partially full data block may be needed soon (being written). Which blocks are essential to the consistency of the file system I-node, indirect, and directory blocks are essential, and need to be written to disk as soon as modified. Data blocks should not stay in cache for too long without writing them out (possible data loss when system break down). Two ways to deal with this: (1) Provide a system call which forces all modified blocks out into disk. Unix system call SYNC, called every 30 sec. (2) Write through cache. All modified blocks are written back to disk immediately. Example: MS-DOS.
Some blocks, such as i-node blocks, are rarely referenced two times within a short interval. Consider a modified LRU scheme, taking two factors into account: Is the block likely to be needed again soon? Is the block essential to the consistency of the file system? Caching Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure The buffer cache data structures. Caching Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Reducing Disk Arm Motion 1. Allocate the blocks that are likely to be accessed in sequence in the same cylinder. How to get consecutive free blocks. If linked list is used for free blocks, keep track of disk usage not in blocks, but in groups of consecutive blocks (e.g. in units of 2 blocks. Allocation unit is still a block, but 2 consecutive blocks in a file are consecutive on disk). 2. Reduce rotation time: Place consecutive blocks in a file in the same cylinder, but interleaved for maximum throughput. Example: Rotation time = 16:67 ms Block transfer time 4 ms 4 way interleave. One rotation can read 4 blocks
Reducing Disk Arm Motion For the system using I-node First access I-node then data block. Put I-node close to data block. Usually I-node placed at the start of the disk. On the average, data block is ½ of the number of cylinders away. Put I-node at middle of the disk. Average distance: 1/4 of the number of the cylinders. Divide disk into cylinder groups. Each with its own I-nodes, blocks and free list. For an I-node in each group. Try to allocate blocks within the group.
Figure (a) I-nodes placed at the start of the disk. (b) Disk divided into cylinder groups, each with its own blocks and i-nodes. Reducing Disk Arm Motion Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure The ISO 9660 directory entry. The ISO 9660 File System Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Rock Ridge extension fields: PX - POSIX attributes. PN - Major and minor device numbers. SL - Symbolic link. NM - Alternative name. CL - Child location. PL - Parent location. RE - Relocation. TF - Time stamps. Rock Ridge Extensions Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Joliet extension fields: Long file names. Unicode character set. Directory nesting deeper than eight levels. Directory names with extensions Joliet Extensions Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved
Figure Maximum partition size for different block sizes. The empty boxes represent forbidden combinations. The MS-DOS File System (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved