Silberschatz, Galvin and Gagne ©2005 Operating System Concepts
Chapter 12: File System Implementation
Contents: File-System Structure; File-System Implementation; Directory Implementation; Allocation Methods; Free-Space Management; Efficiency and Performance; Recovery; NFS (Sun's Network File System)
File-System Structure
File structure: a logical storage unit; a collection of related information.
The file system resides on secondary storage (disks) and is organized into layers.
File control block (FCB): a storage structure containing information about a file. In UNIX/Linux it is called an inode.
[Figure: A Typical File Control Block]
[Figure: Layered File System]
File-System Layers
From top to bottom, the layers are:
Application programs.
Logical file system: manages metadata, that is, all of the file-system structure information apart from the file contents themselves. Given a symbolic file name, it uses the directory structure to give the file-organization module the information it needs. It maintains file structure through file control blocks (FCBs) and is also responsible for protection and security.
File-organization module: knows about files and their logical and physical blocks, and translates logical block addresses into physical block addresses for the basic file system to transfer. It also includes the free-space manager, which keeps track of unallocated blocks.
Basic file system: needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk. Each physical block is identified by its disk address (e.g., drive 3, cylinder 10, track 2, sector 20).
I/O control: consists of the device drivers and interrupt handlers that transfer information between main memory and the disk.
[Figure: Moving-Head Disk Mechanism]
File-System Implementation
Several on-disk and in-memory structures are used to implement a file system.
A sector is the smallest user-accessible unit of the disk: typically 512 bytes on magnetic disks and 2048 bytes on optical disks. Older literature used the term block for a sector; today sector is the common name. A file-system block usually consists of a power-of-2 number of contiguous sectors, and its size is usually a configurable parameter.
On-disk structures:
Boot control block (per volume): the information needed to boot an operating system from that partition.
Volume control block (per volume): volume (partition) details such as the number of blocks in the partition, the free-block count and free-block pointers, and the free-FCB count and FCB pointers. In UNIX file systems it is called the superblock; in NTFS it is stored in the Master File Table.
Directory structure (per file system): used to organize the files.
FCB (per file): the file's details, such as owner, permissions, and location of the data blocks. In UFS it is called an inode.
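To make the FCB concrete, here is a minimal C sketch of an FCB-like record. The field names, the choice of 12 block pointers, and the helper fcb_blocks_used are illustrative assumptions, not the on-disk layout of any real inode.

```c
#include <assert.h>
#include <stdint.h>

#define FCB_NDIRECT 12   /* hypothetical number of block pointers */

/* Hypothetical FCB: real file systems (UFS inodes, NTFS MFT records) differ. */
struct fcb {
    uint32_t owner_uid;                  /* file owner */
    uint32_t group_gid;                  /* owning group */
    uint16_t permissions;                /* e.g. rwx bits */
    uint64_t size_bytes;                 /* file size */
    int64_t  created, modified, accessed; /* timestamps */
    uint32_t data_blocks[FCB_NDIRECT];   /* location of the data blocks */
};

/* Number of file-system blocks needed to hold the file's data. */
static uint64_t fcb_blocks_used(const struct fcb *f, uint32_t block_size) {
    assert(block_size > 0);
    return (f->size_bytes + block_size - 1) / block_size;  /* round up */
}
```

For example, a 5000-byte file needs two 4096-byte blocks, since the last block is only partly used.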
File-System Implementation (Cont.)
In-memory information is used both for file-system management and for performance improvement (caching):
Mount table: information about each mounted partition.
Directory-structure cache: directory information for recently accessed directories.
System-wide open-file table: a copy of the FCB of each open file, as well as other information.
Per-process open-file table: a pointer to the appropriate entry in the system-wide open-file table.
File-System Implementation (Cont.)
To illustrate how these structures are used, consider the create and open operations.
Creating a file: the application program calls the logical file system, which allocates a new FCB, reads the appropriate directory into memory, updates it with the new file name and FCB, and writes it back to disk.
Opening a file: the open call passes a file name to the file system, which searches the directory structure for that name. The file's FCB is copied into the system-wide open-file table in memory. An entry is also made in the per-process open-file table, holding a pointer to the entry in the system-wide open-file table plus some other fields, such as a pointer to the current location in the file and the access mode in which the file is open.
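The open path described above can be sketched in C. This sketch assumes the directory search has already produced the file's FCB (passed in as fcb_from_dir); the table sizes, the struct fields, and the function fs_open are hypothetical. It also reuses an existing system-wide entry when the file is already open, keeping a count of openers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SYS_OPEN 64
#define MAX_PROC_OPEN 16

struct fcb { int inode_no; long size; };     /* trimmed-down FCB */

struct sys_open_entry {                      /* system-wide open-file table entry */
    struct fcb fcb;                          /* in-memory copy of the FCB */
    int open_count;                          /* processes that have the file open */
};

struct proc_open_entry {                     /* per-process open-file table entry */
    struct sys_open_entry *sys;              /* pointer into the system-wide table */
    long offset;                             /* current location in the file */
    int mode;                                /* access mode */
};

static struct sys_open_entry sys_table[MAX_SYS_OPEN];

/* Returns an index into proc_table (acting as a file descriptor), or -1. */
static int fs_open(struct proc_open_entry proc_table[MAX_PROC_OPEN],
                   const struct fcb *fcb_from_dir, int mode) {
    struct sys_open_entry *e = NULL;
    for (int i = 0; i < MAX_SYS_OPEN && !e; i++)      /* already open somewhere? */
        if (sys_table[i].open_count > 0 &&
            sys_table[i].fcb.inode_no == fcb_from_dir->inode_no)
            e = &sys_table[i];
    for (int i = 0; i < MAX_SYS_OPEN && !e; i++)      /* otherwise copy the FCB in */
        if (sys_table[i].open_count == 0) {
            e = &sys_table[i];
            e->fcb = *fcb_from_dir;
        }
    if (!e) return -1;
    for (int fd = 0; fd < MAX_PROC_OPEN; fd++)
        if (proc_table[fd].sys == NULL) {             /* free per-process slot */
            proc_table[fd] = (struct proc_open_entry){ e, 0, mode };
            e->open_count++;
            return fd;
        }
    return -1;
}
```

Two processes opening the same file get separate per-process entries (each with its own offset) that point to one shared system-wide entry.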
In-Memory File-System Structures
The following figure illustrates the file-system structures provided by the operating system: (a) refers to opening a file; (b) refers to reading a file.
[Figure: In-Memory File-System Structures]
Partitions and Mounting
A disk can be sliced into multiple partitions, and a partition can span multiple disks.
A partition can be:
raw, containing no file system (UNIX swap space can use a raw partition because it uses its own format rather than a file system);
cooked, containing a file system.
Boot information can be stored in a separate partition with its own format, because at boot time the system has no file-system device drivers loaded and therefore cannot interpret a file-system format. For example, on PCs that can be dual-booted, a boot loader that understands multiple operating systems can occupy the boot space.
The root partition, which contains the operating-system kernel and other system files, is mounted at boot time.
Virtual File Systems
A virtual file system (VFS) integrates multiple types of file systems into a single directory structure.
It separates generic file-system operations from their implementation by defining a clean VFS interface.
It provides a mechanism for uniquely representing a file throughout a network: VFS is based on a file-representation structure called a vnode, which contains a numerical designator for each file that is unique network-wide.
VFS allows the same system-call interface (the API) to be used for different types of file systems. The API is written to the VFS interface rather than to any specific type of file system.
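The separation between the generic interface and each file system's implementation is commonly expressed in C as a table of function pointers attached to every vnode. The sketch below is a toy under that assumption: the names vnode_ops, vfs_read, ufs_ops, and nfs_ops are hypothetical, and the "remote" read simply reuses the local code instead of issuing an RPC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vnode;

/* Operations each concrete file system supplies (hypothetical, simplified). */
struct vnode_ops {
    long (*read)(struct vnode *vn, char *buf, size_t off, size_t n);
};

struct vnode {
    unsigned long id;              /* unique file designator (simplified) */
    const struct vnode_ops *ops;   /* file-system-specific implementation */
    const char *data;              /* backing data for this toy example */
};

static long local_read(struct vnode *vn, char *buf, size_t off, size_t n) {
    size_t len = strlen(vn->data);
    if (off >= len) return 0;
    if (n > len - off) n = len - off;
    memcpy(buf, vn->data + off, n);
    return (long)n;
}

/* A real NFS client would send a READ RPC here; the toy reuses local_read. */
static long remote_read(struct vnode *vn, char *buf, size_t off, size_t n) {
    return local_read(vn, buf, off, n);
}

static const struct vnode_ops ufs_ops = { local_read };
static const struct vnode_ops nfs_ops = { remote_read };

/* Generic entry point: callers never see which file system is underneath. */
static long vfs_read(struct vnode *vn, char *buf, size_t off, size_t n) {
    return vn->ops->read(vn, buf, off, n);
}
```

Adding a new file-system type then means supplying another vnode_ops table; code that calls vfs_read does not change.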
[Figure: Schematic View of Virtual File System]
Directory Implementation Methods
Linear list of file names with pointers to the data blocks: simple to program but time-consuming to execute. For example, to create a file in a directory, the entire list has to be searched to check whether a file with the same name already exists; deleting a file also requires searching the directory.
Hash table: a linear list combined with a hash data structure. It decreases directory search time, but must handle collisions, situations where two file names hash to the same location. A chained-overflow hash table can be used to overcome the limit of a fixed-size table, but a lookup might then require stepping through a linked list of colliding hash-table entries.
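A chained-overflow hash directory can be sketched as follows. The bucket count, the 32-byte name limit, the djb2 string hash, and the function names are assumptions for illustration; the point is that create checks for a duplicate name by searching one chain rather than the whole directory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct dir_entry {
    char name[32];               /* assumes names shorter than 32 bytes */
    int inode_no;                /* refers to the file's FCB */
    struct dir_entry *next;      /* chained overflow for colliding names */
};

struct directory { struct dir_entry *bucket[NBUCKETS]; };

static unsigned hash_name(const char *s) {   /* djb2 string hash */
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static struct dir_entry *dir_lookup(const struct directory *d, const char *name) {
    for (struct dir_entry *e = d->bucket[hash_name(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;   /* walk one chain only */
    return NULL;
}

/* Returns 0 on success, -1 if the name exists or allocation fails. */
static int dir_create(struct directory *d, const char *name, int inode_no) {
    if (dir_lookup(d, name)) return -1;             /* duplicate name */
    struct dir_entry *e = calloc(1, sizeof *e);
    if (!e) return -1;
    strncpy(e->name, name, sizeof e->name - 1);
    e->inode_no = inode_no;
    unsigned b = hash_name(name);
    e->next = d->bucket[b];                          /* push onto the chain */
    d->bucket[b] = e;
    return 0;
}
```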
Allocation Methods
An allocation method refers to how disk blocks are allocated for files:
contiguous allocation;
linked allocation;
indexed allocation.
Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk.
Simple: only the starting location (block #) and length (number of blocks) are required.
Random access is easy.
Wastes space: external fragmentation, and internal fragmentation in the last block.
Files cannot grow beyond their allocated blocks without being moved.
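Random access is easy because the mapping is pure arithmetic on the two directory fields. A minimal sketch, assuming 512-unit blocks (the slide does not fix a block size) and a hypothetical contig_map helper:

```c
#include <assert.h>

#define BLOCK_SIZE 512   /* assumed block size, in addressable units */

struct contig_file { long start; long length; };   /* from the directory entry */

/* Map logical address la to a physical block; -1 if la is outside the file. */
static long contig_map(const struct contig_file *f, long la, long *offset) {
    if (la < 0) return -1;
    long q = la / BLOCK_SIZE;          /* block index within the file */
    long r = la % BLOCK_SIZE;          /* displacement into that block */
    if (q >= f->length) return -1;
    *offset = r;
    return f->start + q;               /* blocks are consecutive on disk */
}
```

For a file starting at block 19 with length 6, logical address 1030 falls in block 19 + 2 = 21 at offset 6.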
[Figure: Contiguous Allocation of Disk Space]
Extent-Based Systems
Many newer file systems (e.g., the Veritas File System) use a modified contiguous allocation scheme.
Extent-based file systems allocate disk blocks in extents. An extent is a contiguous set of blocks on the disk.
A file is initially allocated one or more extents; as the need arises, more extents are allocated dynamically.
Linked Allocation
Each file is allocated a linked list of disk blocks; the blocks may be scattered anywhere on the disk.
[Figure: each block holds file data plus a pointer to the next block]
Linked Allocation (Cont.)
Simple: only the starting address is needed.
The size of the file need not be declared when the file is created.
Free-space management: no space is wasted, and there is no external fragmentation.
Effective for sequential access, but no random access.
Mapping a logical address (LA) to a physical address (PA), assuming a block size of 512 words, one of which holds the pointer to the next block:
$Q = \lfloor LA / 511 \rfloor$: the block to be accessed is the Qth block in the linked chain of blocks representing the file.
$R = LA \bmod 511$: the displacement into that block is $R + 1$.
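The Q/R mapping can be checked with a small simulation. This sketch assumes the pointer occupies word 0 of each block (which is what the R + 1 displacement implies), uses -1 as a hypothetical end-of-chain marker, and models the disk as an in-memory array; linked_read is a hypothetical name.

```c
#include <assert.h>

#define WORDS_PER_BLOCK 512                 /* word 0 holds the next-block pointer */
#define DATA_WORDS (WORDS_PER_BLOCK - 1)    /* 511 data words per block */
#define NIL (-1)                            /* hypothetical end-of-chain marker */

typedef long word;
static word disk[64][WORDS_PER_BLOCK];      /* simulated disk */

/* Return the word at logical address la of the file whose first block is start. */
static word linked_read(int start, long la) {
    long q = la / DATA_WORDS;               /* Q: which block in the chain */
    long r = la % DATA_WORDS;               /* R: position among the data words */
    int b = start;
    for (long i = 0; i < q; i++) {          /* follow Q pointers: O(Q) disk reads */
        assert(b != NIL);                   /* file shorter than la */
        b = (int)disk[b][0];
    }
    assert(b != NIL);
    return disk[b][r + 1];                  /* R + 1 skips the pointer word */
}
```

The loop is why random access is expensive: reaching logical block Q costs Q pointer reads.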
[Figure: Linked Allocation]
Advantages and Disadvantages of Linked Allocation
It can be used effectively only for sequential access; random access is expensive. To find the ith block, we must start at the beginning of the file and follow the pointers until we reach the ith block.
Pointers take space: if a pointer needs 4 bytes of a 512-byte block, 0.78 percent of the space is used for pointers rather than data.
Pointers could be lost or damaged: a bug in the OS or a disk-hardware failure might result in picking up the wrong pointer.
File-Allocation Table (FAT) Method
This is a variation of linked allocation; it is used in MS-DOS and OS/2.
A section of the disk at the beginning of each partition is set aside for the table. The table has one entry for each disk block and is indexed by block number.
The directory entry contains the block number of the first block of the file. The table entry indexed by that block number contains the block number of the next block of the file. This chain continues until the last block, whose table entry holds a special end-of-file value.
Unused blocks are indicated by a table value of 0.
Allocating a new block involves finding the first 0-valued table entry, replacing the previous end-of-file value with the address of the new block, and setting the new block's entry to the end-of-file value.
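The chain walk and the append step can be sketched over an in-memory table. The table size, the FAT_EOF value of -1, and the function names are assumptions; the scan starts at entry 2 because real FAT volumes reserve the first two table entries, which also keeps block numbers from colliding with the 0 = free marker.

```c
#include <assert.h>

#define NBLOCKS 32
#define FAT_FREE 0          /* unused block */
#define FAT_EOF (-1)        /* hypothetical end-of-file marker */

static int fat[NBLOCKS];    /* fat[b] = next block, FAT_EOF, or FAT_FREE */

/* Block number of the nth block (0-based) of a file starting at start;
   returns FAT_EOF if the chain ends first. */
static int fat_nth_block(int start, int n) {
    int b = start;
    while (n-- > 0 && b != FAT_EOF) b = fat[b];
    return b;
}

/* Append a block to the file whose last block is last; returns it, or -1. */
static int fat_append(int last) {
    for (int b = 2; b < NBLOCKS; b++)       /* first 0-valued entry */
        if (fat[b] == FAT_FREE) {
            fat[b] = FAT_EOF;               /* new block is now the end */
            fat[last] = b;                  /* replace the old end-of-file value */
            return b;
        }
    return -1;                              /* disk full */
}
```

Because the table is small and can be cached in memory, following the chain is much cheaper than reading every data block as plain linked allocation would.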
[Figure: File-Allocation Table]
Indexed Allocation
Brings all the pointers together into the index block. This solves the random-access problem of linked allocation.
[Figure: logical view, an index table pointing to the file's data blocks]
[Figure: Example of Indexed Allocation]
Indexed Allocation (Cont.)
Needs an index table.
Random access is easy.
Dynamic allocation without external fragmentation, but with the overhead of the index block.
Mapping from logical to physical addresses in a file with a maximum size of 256K words and a block size of 512 words: only 1 block is needed for the index table (512 pointers × 512 words = 256K words).
$Q = \lfloor LA / 512 \rfloor$: displacement into the index table
$R = LA \bmod 512$: displacement into the block
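With a single index block the mapping is one table lookup. A minimal sketch, assuming 512-word blocks as on the slide; indexed_map is a hypothetical name and the index block is passed in as an array.

```c
#include <assert.h>

#define BLOCK_WORDS 512

/* index_block[i] is the disk block holding logical block i of the file. */
static long indexed_map(const long index_block[BLOCK_WORDS], long la, long *offset) {
    assert(la >= 0 && la < (long)BLOCK_WORDS * BLOCK_WORDS);  /* 256K-word maximum */
    long q = la / BLOCK_WORDS;      /* Q: displacement into the index table */
    *offset = la % BLOCK_WORDS;     /* R: displacement into the data block */
    return index_block[q];
}
```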
Indexed Allocation: Linked Scheme
Mapping from logical address to physical address in a file of unbounded length (block size of 512 words).
Linked scheme: link the blocks of the index table together (no limit on size). One word of each index block holds the link to the next index block, leaving 511 pointers to data blocks.
$Q_1 = \lfloor LA / (512 \times 511) \rfloor$: block number of the index table (position in the chain of index blocks)
$R_1 = LA \bmod (512 \times 511)$, which is used as follows:
$Q_2 = \lfloor R_1 / 512 \rfloor$, where $Q_2 + 1$ is the displacement into the block of the index table
$R_2 = R_1 \bmod 512$: displacement into the block of the file
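The two-step division above can be traced in code. This sketch assumes, consistent with the $Q_2 + 1$ displacement, that word 0 of each index block is the link to the next index block; the simulated disk, the NIL marker, and linked_index_map are hypothetical.

```c
#include <assert.h>

#define W 512                      /* words per block */
#define NIL (-1)                   /* hypothetical end-of-chain marker */

static long disk[64][W];           /* simulated disk; index blocks link via word 0 */

static long linked_index_map(long first_index_block, long la, long *offset) {
    long q1 = la / ((long)W * (W - 1));   /* Q1: which index block in the chain */
    long r1 = la % ((long)W * (W - 1));
    long q2 = r1 / W;                     /* Q2: pointer slot, stored at Q2 + 1 */
    long r2 = r1 % W;                     /* R2: displacement into the data block */
    long ib = first_index_block;
    for (long i = 0; i < q1; i++) {       /* follow links between index blocks */
        assert(ib != NIL);
        ib = disk[ib][0];
    }
    assert(ib != NIL);
    *offset = r2;
    return disk[ib][q2 + 1];              /* + 1 skips the link word */
}
```

Each index block covers 512 × 511 words, so only the chain of index blocks is walked, not the data blocks themselves.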
Indexed Allocation: Two-Level Indexing
Two-level index (maximum file size is $512^3$ words, with 512-word blocks):
$Q_1 = \lfloor LA / (512 \times 512) \rfloor$: displacement into the outer-index table
$R_1 = LA \bmod (512 \times 512)$, which is used as follows:
$Q_2 = \lfloor R_1 / 512 \rfloor$: displacement into the block of the index table
$R_2 = R_1 \bmod 512$: displacement into the block of the file
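The same divisions in C, as a sketch over a simulated disk. The outer index is itself a disk block whose entries name inner index blocks; two_level_map and the disk array are hypothetical, and 512-word blocks are assumed as on the slide.

```c
#include <assert.h>

#define W 512
static long disk[16][W];                   /* simulated disk blocks */

static long two_level_map(int outer_block, long la, long *offset) {
    assert(la >= 0 && la < (long)W * W * W);    /* maximum file size: 512^3 words */
    long q1 = la / ((long)W * W);          /* Q1: entry in the outer index */
    long r1 = la % ((long)W * W);
    long q2 = r1 / W;                      /* Q2: entry in the inner index block */
    long r2 = r1 % W;                      /* R2: displacement into the data block */
    long inner = disk[outer_block][q1];    /* assumes this entry is a valid block */
    *offset = r2;
    return disk[inner][q2];
}
```

Any word of the file is reached with two index lookups, regardless of its position.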
[Figure: Two-level indexing (outer index, index table, file)]
Combined Scheme: UNIX (4K bytes per block)
In the UNIX file system, the inode holds, say, 15 block pointers:
The first 12 point directly to data blocks (direct blocks).
The next three point to indirect blocks:
the first points to a single indirect block;
the second points to a double indirect block;
the third points to a triple indirect block.
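To see how far each pointer reaches, assume 4-byte block pointers (an assumption, not stated on the slide), so a 4 KB indirect block holds 1024 pointers. The sketch below, with hypothetical function names, classifies a logical block number by the level of indirection it needs and computes the resulting maximum file size: the triple indirect block alone covers 1024³ blocks, i.e. 4 TB.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 4096ULL
#define PTRS_PER_BLOCK (BLOCK_SIZE / 4)   /* assumes 4-byte block pointers */
#define NDIRECT 12ULL

/* 0 = direct, 1 = single, 2 = double, 3 = triple indirect, -1 = too large. */
static int indirection_level(uint64_t n) {
    const uint64_t p = PTRS_PER_BLOCK;
    if (n < NDIRECT) return 0;
    n -= NDIRECT;
    if (n < p) return 1;
    n -= p;
    if (n < p * p) return 2;
    n -= p * p;
    if (n < p * p * p) return 3;
    return -1;
}

/* Largest file addressable through the 15 inode pointers. */
static uint64_t max_file_bytes(void) {
    const uint64_t p = PTRS_PER_BLOCK;
    return (NDIRECT + p + p * p + p * p * p) * BLOCK_SIZE;
}
```

Small files stay entirely within the direct pointers, so they need no extra index-block reads.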
[Figure: Combined Scheme: UNIX (4K bytes per block)]
Free-Space Management
Bit vector (n blocks, numbered 0, 1, 2, …, n-1):
bit[i] = 1 means block[i] is free
bit[i] = 0 means block[i] is occupied
First free block number = (number of bits per word) × (number of leading 0-valued words) + offset of the first 1 bit in the next word
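The first-free-block formula translates directly into a word-at-a-time scan. This sketch assumes 32-bit words and that bit k of word w (counting from the least significant bit) describes block 32·w + k; first_free_block is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32

/* bit = 1 means free. Returns the first free block number, or -1 if none. */
static long first_free_block(const uint32_t *map, size_t nwords) {
    size_t w = 0;
    while (w < nwords && map[w] == 0) w++;      /* skip fully occupied words */
    if (w == nwords) return -1;
    int off = 0;
    while (!(map[w] & (UINT32_C(1) << off))) off++;  /* first 1 bit in the word */
    return (long)(BITS_PER_WORD * w + off);
}
```

Skipping whole zero words is what makes the bit vector fast: one comparison rules out 32 blocks at a time.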
Free-Space Management (Cont.)
A bit map requires extra space. Example:
block size = $2^{12}$ bytes
disk size = $2^{30}$ bytes (1 gigabyte)
$n = 2^{30} / 2^{12} = 2^{18}$ bits (or 32K bytes)
A bit map makes it easy to get contiguous blocks.
Linked list (free list): keep a linked list of all free blocks. Contiguous space cannot be obtained easily, but no space is wasted.
Grouping: store the addresses of n free blocks in the first free block. The first n-1 of these blocks are actually free; the last one contains the addresses of another n free blocks, and so on.
Counting: keep the address of the first free block and the number n of free contiguous blocks that follow it (useful when contiguous allocation is used).
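The counting approach stores (start, count) pairs, which suits contiguous allocation well. A minimal first-fit sketch, assuming the pairs are kept in a plain array; struct free_run and alloc_contiguous are hypothetical names, and a real allocator would also merge adjacent runs when blocks are freed.

```c
#include <assert.h>
#include <stddef.h>

/* First free block of a run and the number of contiguous free blocks in it. */
struct free_run { long start; long count; };

/* First fit: take n contiguous blocks from the first run large enough. */
static long alloc_contiguous(struct free_run *runs, size_t nruns, long n) {
    for (size_t i = 0; i < nruns; i++)
        if (runs[i].count >= n) {
            long b = runs[i].start;
            runs[i].start += n;     /* shrink the run from the front */
            runs[i].count -= n;
            return b;
        }
    return -1;                      /* no run is large enough */
}
```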
Free-Space Management (Cont.)
Need to protect the pointer to the free list and the bit map.
The bit map must be kept on disk, and the copy in memory and the copy on disk may differ.
We cannot allow block[i] to be in a state where bit[i] = 0 (occupied) in memory but bit[i] = 1 (free) on disk: after a crash, the block could be allocated a second time.
Solution:
Set bit[i] = 0 on disk.
Allocate block[i].
Set bit[i] = 0 in memory.
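The safe ordering can be sketched with two copies of the map. This uses one byte per block for readability and the slide's convention (1 = free, 0 = occupied); write_bit_to_disk is a hypothetical stand-in for a synchronous write of the bitmap block.

```c
#include <assert.h>
#include <stdint.h>

#define NBLK 64

static uint8_t disk_bitmap[NBLK];   /* simulated on-disk copy: 1 = free */
static uint8_t mem_bitmap[NBLK];    /* cached in-memory copy */

/* Hypothetical stand-in for a synchronous write of the bitmap to disk. */
static void write_bit_to_disk(int i, uint8_t v) { disk_bitmap[i] = v; }

static int allocate_block(int i) {
    if (mem_bitmap[i] != 1) return -1;  /* block is not free */
    write_bit_to_disk(i, 0);            /* 1. mark occupied on disk first */
    /* 2. block i may now be given to the file */
    mem_bitmap[i] = 0;                  /* 3. then mark occupied in memory */
    return i;
}
```

A crash between steps 1 and 3 can only leak block i (it looks occupied on disk but is unused), which a consistency checker can reclaim; it can never cause the block to be handed out twice.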
[Figure: Linked Free-Space List on Disk]
Efficiency and Performance
Efficiency depends on:
the disk-allocation and directory algorithms;
the types of data kept in the file's directory entry.
Performance:
disk cache: a separate section of main memory for frequently used blocks;
free-behind and read-ahead: techniques to optimize sequential access;
improve PC performance by dedicating a section of memory as a virtual disk, or RAM disk.
Page Cache
A page cache caches pages rather than disk blocks, using virtual-memory techniques.
Solaris, Linux, and Windows NT use this approach (Solaris uses both a page cache and a block cache). Caching both process pages and file data this way is known as unified virtual memory.
Without a unified cache, memory-mapped I/O uses the page cache, while the read and write system calls go through the buffer cache. This leads to double caching: it wastes memory, and inconsistencies between the two caches can arise. The following figure illustrates this.
[Figure: I/O Without a Unified Buffer Cache]
Unified Buffer Cache
Some versions of UNIX provide a unified buffer cache to overcome the problem of double caching.
With a unified buffer cache, both memory mapping and the read and write system calls use the same page cache.
[Figure: I/O Using a Unified Buffer Cache]
Recovery
Consistency checking: programs such as fsck in UNIX and chkdsk in Windows compare the data in the directory structure with the data blocks on disk and try to fix any inconsistencies they find. For example, if linked allocation is used, an entire file can be reconstructed from its data blocks, and the directory structure can then be reconstructed (a technique also used in digital forensics).
Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape).
Recover a lost file or disk by restoring the data from backup.
The Sun Network File System (NFS)
An implementation and a specification of a software system for accessing remote files across LANs (or WANs).
The implementation is part of the Solaris and SunOS operating systems running on Sun workstations, using an unreliable datagram protocol (UDP) over Ethernet.
NFS (Cont.)
Interconnected workstations are viewed as a set of independent machines with independent file systems; NFS allows sharing among these file systems in a transparent manner.
A remote directory is mounted over a local file-system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory.
Specifying the remote directory for the mount operation is nontransparent: the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner.
Subject to access-rights accreditation, potentially any file system (or directory within a file system) can be mounted remotely on top of any local directory.
NFS (Cont.)
NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specifications are independent of these media.
This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces.
The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services.
[Figure: Three Independent File Systems]
[Figure: Mounting in NFS: mounts; cascading mounts]
NFS Mount Protocol
Establishes the initial logical connection between server and client.
The mount operation includes the name of the remote directory to be mounted and the name of the server machine storing it.
The mount request is mapped to the corresponding RPC and forwarded to the mount server running on the server machine.
Export list: specifies the local file systems that the server exports for mounting, along with the names of machines that are permitted to mount them.
Following a mount request that conforms to its export list, the server returns a file handle, a key for further accesses.
File handle: a file-system identifier and an inode number that identify the mounted directory within the exported file system.
The mount operation changes only the user's view and does not affect the server side.
NFS Protocol
Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
searching for a file within a directory;
reading a set of directory entries;
manipulating links and directories;
accessing file attributes;
reading and writing files.
NFS servers are stateless; each request has to provide a full set of arguments.
Modified data must be committed to the server's disk before results are returned to the client (losing the advantages of caching).
The NFS protocol does not provide concurrency-control mechanisms.
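Statelessness means every request carries everything the server needs: the file handle, the offset, and the count. The sketch below illustrates that idea only; the struct layouts, the single exported file (file-system id 1, inode 7), and server_read are hypothetical and do not reproduce the actual NFS wire format, which is defined in XDR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request layout: full arguments travel with every call. */
struct nfs_fhandle { unsigned fsid; unsigned inode_no; };
struct nfs_read_args { struct nfs_fhandle fh; size_t offset; size_t count; };

/* Simulated exported file: inode 7 on file system 1. */
static const char file_data[] = "hello, stateless world";

/* No open-file table and no saved file position: all state is in args. */
static long server_read(const struct nfs_read_args *a, char *out) {
    if (a->fh.fsid != 1 || a->fh.inode_no != 7) return -1;   /* unknown handle */
    size_t len = sizeof file_data - 1;
    if (a->offset >= len) return 0;                            /* past end of file */
    size_t n = a->count < len - a->offset ? a->count : len - a->offset;
    memcpy(out, file_data + a->offset, n);
    return (long)n;
}
```

Because the server remembers nothing between calls, a client can simply resend the same request after a server crash and reboot.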
Three Major Layers of NFS Architecture
UNIX file-system interface (based on the open, read, write, and close calls, and on file descriptors).
Virtual File System (VFS) layer: distinguishes local files from remote ones, and further distinguishes local files according to their file-system types. The VFS activates file-system-specific operations to handle local requests according to their file-system types, and calls the NFS protocol procedures for remote requests.
NFS service layer: the bottom layer of the architecture; implements the NFS protocol.
[Figure: Schematic View of NFS Architecture]
NFS Path-Name Translation
Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode.
To make lookup faster, a directory-name-lookup cache on the client side holds the vnodes for remote directory names.
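Component-by-component translation with a client-side cache can be sketched as follows. nfs_lookup_rpc is a hypothetical stand-in for the LOOKUP RPC (it fabricates a vnode number from its inputs and counts calls); the cache size, round-robin replacement, and 32-byte name limit are also assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int rpc_calls;   /* counts simulated round trips to the server */

/* Hypothetical stand-in for the NFS LOOKUP RPC: returns a vnode number. */
static int nfs_lookup_rpc(int dir_vnode, const char *name) {
    rpc_calls++;
    unsigned h = (unsigned)dir_vnode * 31u;
    for (const char *p = name; *p; p++) h = h * 33u + (unsigned char)*p;
    return (int)(h & 0x7fffffff);
}

#define CACHE_SIZE 32
struct dnlc_entry { int dir; char name[32]; int vnode; int valid; };
static struct dnlc_entry dnlc[CACHE_SIZE];   /* directory name lookup cache */
static int next_slot;

/* Look up one (directory vnode, component) pair; assumes names < 32 bytes. */
static int lookup_component(int dir, const char *name) {
    for (int i = 0; i < CACHE_SIZE; i++)
        if (dnlc[i].valid && dnlc[i].dir == dir && strcmp(dnlc[i].name, name) == 0)
            return dnlc[i].vnode;                  /* cache hit: no RPC */
    int v = nfs_lookup_rpc(dir, name);             /* cache miss: one RPC */
    struct dnlc_entry *e = &dnlc[next_slot];
    next_slot = (next_slot + 1) % CACHE_SIZE;      /* round-robin replacement */
    e->dir = dir;
    strncpy(e->name, name, sizeof e->name - 1);
    e->vnode = v;
    e->valid = 1;
    return v;
}

/* Translate a path starting from root_vnode, one component at a time. */
static int translate_path(int root_vnode, const char *path) {
    char buf[256];
    snprintf(buf, sizeof buf, "%s", path);
    int v = root_vnode;
    for (char *tok = strtok(buf, "/"); tok; tok = strtok(NULL, "/"))
        v = lookup_component(v, tok);
    return v;
}
```

Translating /usr/local/bin costs three lookups the first time and none the second; /usr/share then needs only one, since usr is already cached.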
NFS Remote Operations
There is a nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs.
NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance.
File-blocks cache: when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. Cached file blocks are used only if the corresponding cached attributes are up to date.
File-attribute cache: the attribute cache is updated whenever new attributes arrive from the server.
Clients do not free delayed-write blocks until the server confirms that the data have been written to disk.