CS503: Operating Systems Spring 2014 General File Systems
Long-Term Storage
Requirements: store large amounts of information; survive termination of the process using the information; allow other processes to access the information. Information is stored on disks in units called files.
Why Files?
File system model: byte oriented; named files; users protected from each other; robust to machine failures.
Physical reality: block oriented; physical sector numbers; no protection among users of the system; data might be corrupted if the machine crashes.
4/15/2017
File System Requirements
Users must be able to:
- Create, modify, and delete files at will.
- Read, write, and modify file contents with a minimum of fuss about blocking, buffering, etc.
- Share each other's files with proper authorization.
- Transfer information between files.
- Refer to files by symbolic names.
- Retrieve backup copies of files lost through accident or malicious destruction.
- See a logical view of their files without concern for how they are stored.
File Structure
- Byte sequence: unstructured; read or write any number of bytes; the user program decides what the bytes mean.
- Record sequence: fixed-length records, read or written one record at a time (e.g., punch cards with 80-character records).
- Tree: records with keys; read, insert, or delete a specific record by key. Still used on mainframe computers in some commercial data processing.
File Access Patterns
- Sequential access: read all bytes in order. Examples: tapes, continuous media files (mp3, swf, ...).
- Random access: read bytes or records out of order, positioning with seek. Essential for some applications, such as databases; natural on disks.
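Random access over fixed-length records can be sketched with the POSIX lseek/read calls: the byte offset of record i is simply i × record length. This is a minimal illustration, assuming 80-byte records as in the punch-card example; read_record and REC_LEN are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_LEN 80   /* assumed fixed record length, like an 80-column card */

/* Random access to fixed-length records: compute the byte offset of record
 * `recno`, seek there, and read exactly one record.
 * Returns 0 on success, -1 on error or if the record does not exist. */
int read_record(int fd, long recno, char buf[REC_LEN])
{
    off_t offset = (off_t)recno * REC_LEN;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        perror("lseek");
        return -1;
    }
    /* For a regular file, a short read here means the record is missing. */
    return read(fd, buf, REC_LEN) == REC_LEN ? 0 : -1;
}
```

Sequential access would just call read repeatedly without seeking; the seek is what lets a database jump straight to record 2 without touching records 0 and 1.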
File System Implementation
Four key aspects: layout, allocation, management of free blocks, and directory management.
File System Layout
Disk: MBR and partition table, followed by the partitions. Each partition is an independent file system laid out as: boot block | superblock | free space management | file management | root directory | files and directories.
- MBR (Master Boot Record): boots the computer, then boots the active partition.
- Boot block: the first block executed from the partition.
- Superblock: information about the file system, such as the number of files, the number of blocks, and the free blocks.
File Allocation
Four schemes: contiguous, linked list, linked list + table, i-nodes.
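Contiguous Allocation
Pros: simple (store only two numbers: the starting block and the length); efficient (one seek reads the whole file); supports random access.
Cons: the file size must be known in advance; external fragmentation as files are created and deleted.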
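Because a contiguous file is described by two numbers, mapping a logical block to a physical block is one addition. The struct and function names below are hypothetical; this is a sketch of the idea, not a particular file system's code.

```c
#include <stdbool.h>

/* Contiguous allocation: the directory entry needs only two numbers. */
struct contig_file {
    unsigned start;    /* first physical block */
    unsigned nblocks;  /* length in blocks */
};

/* Random access is O(1): logical block i lives at start + i.
 * Returns false if i is past the end of the file. */
bool contig_map(const struct contig_file *f, unsigned i, unsigned *phys)
{
    if (i >= f->nblocks)
        return false;
    *phys = f->start + i;
    return true;
}
```

The same two numbers are why growing the file is hard: block start + nblocks may already belong to another file.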
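Linked-List Allocation
The directory entry holds the address of the file's first block; the first word of each block points to the next block.
Pros: files can grow dynamically.
Cons: random access is slow (reaching block i requires reading the i blocks before it); internal fragmentation; incomplete block sizes (the pointer consumes part of each block, so the data area is no longer a full block); unreliable: what if one block in the chain is lost?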
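Following the chain can be sketched over a toy in-memory "disk" to count how many block reads a random access costs. This assumes 1 KB blocks of 4-byte words and uses UINT32_MAX as a hypothetical end-of-chain marker, since the slide does not say how the last block is marked.

```c
#include <stdint.h>

#define BLOCK_WORDS  256          /* assumed: 1 KB block of 4-byte words */
#define END_OF_CHAIN UINT32_MAX   /* hypothetical end-of-file marker */

typedef uint32_t block_t[BLOCK_WORDS];

/* Word 0 of each block holds the next block's number; words 1..255 hold
 * data, which is why a block's data area is not a full 1 KB.
 * Returns the physical block for logical block i (or END_OF_CHAIN) and
 * stores in *reads how many blocks had to be read to get there. */
uint32_t ll_map(block_t *disk, uint32_t first, uint32_t i, unsigned *reads)
{
    uint32_t b = first;
    *reads = 0;
    while (i > 0 && b != END_OF_CHAIN) {
        b = disk[b][0];   /* one disk read per hop */
        (*reads)++;
        i--;
    }
    return b;
}
```

The read count grows linearly with i, which is the "random access is slow" problem; losing any block in the middle also loses every block after it.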
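Linked-List + Table (FAT)
The next-block pointers are kept in a table in memory (the File Allocation Table) instead of in the blocks. Each entry maps a block to the next block of the same file; next = -1 indicates end of file.
Example file: blocks 4, 7, 2, 10, 12. Its table entries are 4 → 7, 7 → 2, 2 → 10, 10 → 12, 12 → -1; the remaining entries belong to other files or are unused.
Pro: random access without disk references.
Con: the whole FAT must be stored in memory. A 20 GB disk with 1 KB blocks has 20 × 2^20 blocks; at 1 word (4 bytes) per entry, the table takes 80 MB.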
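The same chain walk with a FAT touches only the in-memory table. This sketch assumes the table is an int array indexed by block number with -1 as end of file, as on the slide; fat_map and fat_bytes are hypothetical helper names.

```c
#include <stdint.h>

#define FAT_EOF (-1)   /* next = -1 marks the last block of a file */

/* fat[b] holds the block that follows b. Following the chain touches only
 * the in-memory table, so no disk reads are needed to find block i. */
int fat_map(const int *fat, int first, unsigned i)
{
    int b = first;
    while (i > 0 && b != FAT_EOF) {
        b = fat[b];
        i--;
    }
    return b;
}

/* FAT size: one entry per block on the disk. */
uint64_t fat_bytes(uint64_t disk_bytes, uint64_t block_bytes, uint64_t entry_bytes)
{
    return disk_bytes / block_bytes * entry_bytes;
}
```

With the example file, fat_map(fat, 4, 3) returns 10, and fat_bytes(20ULL << 30, 1024, 4) reproduces the 80 MB figure.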
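Indexed Allocation
Associate each file with a data structure, the i-node, that holds a table of the file's blocks. Only the i-nodes of open files need to be in memory, so the system sets a maximum number of open files instead of keeping a table for the whole disk.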
i-nodes
Attributes: file type and size; owner, group, and permissions (r/w/x); times: creation, last access, last modification; reference count.
Block addresses:
- Direct: N addresses, for blocks 0 to N-1.
- Indirect: single indirect, double indirect, triple indirect.
Layout: file attributes | address of block 0 | address of block 1 | ... | single indirect | double indirect | triple indirect
i-nodes
Assume N = 10 direct addresses, 1 KB blocks, and 4-byte entries, so one indirect block holds 256 addresses.
- Direct: 10 × 1 KB = 10 KB
- Single indirect: 256 × 1 KB = 256 KB
- Double indirect: 256^2 × 1 KB = 64 MB
- Triple indirect: 256^3 × 1 KB = 16 GB
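Under the same assumptions (N = 10, 256 addresses per indirect block), one can compute which pointer level serves a given logical block. This is a sketch of the arithmetic only; inode_level is a hypothetical name and no real block lookup is performed.

```c
#include <stdint.h>

/* Assumptions from the slide: N = 10 direct pointers, 1 KB blocks,
 * 4-byte block addresses, so one indirect block holds 256 addresses. */
#define NDIRECT  10
#define BLOCK_SZ 1024
#define PTRS     (BLOCK_SZ / 4)   /* 256 */

/* Which pointer reaches logical block b? 0 = direct,
 * 1/2/3 = single/double/triple indirect, -1 = beyond the maximum size. */
int inode_level(uint64_t b)
{
    if (b < NDIRECT)
        return 0;
    b -= NDIRECT;
    uint64_t span = PTRS;             /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (b < span)
            return level;
        b -= span;
        span *= PTRS;                 /* each extra level multiplies by 256 */
    }
    return -1;
}
```

Blocks 10 to 265 go through the single indirect block, and the largest file has 10 + 256 + 256^2 + 256^3 blocks, matching the capacities listed above.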
Free Blocks
The file system must keep track of which blocks are free. Two approaches: a free list or a bitmap.
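Free Block List
Each list block contains the addresses of other blocks that are free; its last entry points to the next list block.
Pros/cons: the list can be large, but it can be stored in the free blocks themselves, and only one list block needs to be loaded into memory at a time.
Example: 16 GB free, 1 KB blocks, 4 bytes per entry: 2^24 free blocks at 255 free addresses per list block (256 entries minus the link) => 65,794 blocks.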
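The 65,794 figure follows from rounding up 2^24 / 255. A small helper reproduces it, assuming, as the slide does, that the last entry of each list block is the link to the next one; free_list_blocks is a hypothetical name.

```c
#include <stdint.h>

/* Blocks needed to hold a free list of `nfree` block numbers.
 * Assumes the last entry of each list block links to the next list block,
 * so each block stores (block_bytes / entry_bytes) - 1 free addresses. */
uint64_t free_list_blocks(uint64_t nfree, uint64_t block_bytes, uint64_t entry_bytes)
{
    uint64_t per_block = block_bytes / entry_bytes - 1;   /* 255 for 1 KB / 4 B */
    return (nfree + per_block - 1) / per_block;           /* round up */
}
```

Calling free_list_blocks((16ULL << 30) / 1024, 1024, 4) gives 65,794. The list shrinks as the disk fills, since only free blocks are listed.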
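Free Bitmap
One bit per block (1 = free, 0 = in use); the bitmap's size is fixed by the size of the disk.
Example: 16 GB disk (all free, as above), 1 KB blocks, 1 bit per block: 2^24 bits / 8,192 bits per block => 2,048 blocks.
Comparison: the bitmap is fixed in size, while the free list can be smaller if few blocks are free; both can keep just one block in memory at a time; with a bitmap, allocated blocks tend to be closer together.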
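A bitmap allocator can be sketched in a few lines, using the slide's convention that a set bit means "free". The lowest-numbered-free-block policy is an assumption chosen for illustration; it is one simple way to keep allocated blocks close together. bitmap_blocks and bitmap_alloc are hypothetical names.

```c
#include <stdint.h>

/* Blocks needed to store a bitmap with one bit per disk block. */
uint64_t bitmap_blocks(uint64_t nblocks, uint64_t block_bytes)
{
    uint64_t bits_per_block = block_bytes * 8;
    return (nblocks + bits_per_block - 1) / bits_per_block;
}

/* Allocate the lowest-numbered free block (bit = 1 means free): clear its
 * bit and return its number, or -1 if no block is free. */
long bitmap_alloc(uint8_t *map, uint64_t nblocks)
{
    for (uint64_t b = 0; b < nblocks; b++) {
        if (map[b / 8] & (1u << (b % 8))) {
            map[b / 8] &= (uint8_t)~(1u << (b % 8));   /* mark in use */
            return (long)b;
        }
    }
    return -1;
}
```

bitmap_blocks((16ULL << 30) / 1024, 1024) reproduces the 2,048-block figure regardless of how many blocks are free.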
Directory System
A directory maps the ASCII name of a file to the information needed to locate its data (the i-node). It can also store attributes of the file.
UNIX: a directory is stored like a regular file, as a table of names and i-node numbers.
Opening a File: /usr/bob/file
1. Fetch the root directory / and look up "usr".
2. Get the i-node for the directory usr (a directory is a file); use it to retrieve the blocks of usr.
3. Look up "bob"; get the i-node for the directory bob; use it to retrieve the blocks of /usr/bob.
4. Look up "file"; get the i-node for file.
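The component-by-component walk can be sketched against toy in-memory directories, each a table of (name, i-node number) pairs. Everything here is hypothetical: the i-node numbers (root = 2, usr = 5, bob = 9, file = 42), the table layout, and the names path_lookup and dir_lookup. A real kernel would read each directory's blocks from disk through its i-node.

```c
#include <string.h>

/* Toy directories: each is a table of (name, i-node number) pairs. */
struct dent { const char *name; int ino; };
struct dir  { int ino; const struct dent *ents; int n; };

#define ROOT_INO 2   /* hypothetical root i-node number */

static const struct dent root_ents[] = { {"usr", 5}, {"etc", 6} };
static const struct dent usr_ents[]  = { {"bob", 9} };
static const struct dent bob_ents[]  = { {"file", 42} };
static const struct dir dirs[] = {
    {2, root_ents, 2}, {5, usr_ents, 1}, {9, bob_ents, 1},
};

/* Look up a name of length len in directory dino; -1 if absent. */
static int dir_lookup(int dino, const char *name, size_t len)
{
    for (size_t d = 0; d < sizeof dirs / sizeof dirs[0]; d++) {
        if (dirs[d].ino != dino)
            continue;
        for (int e = 0; e < dirs[d].n; e++)
            if (strlen(dirs[d].ents[e].name) == len &&
                strncmp(dirs[d].ents[e].name, name, len) == 0)
                return dirs[d].ents[e].ino;
    }
    return -1;   /* not found, or dino is not a directory */
}

/* Resolve an absolute path one component at a time, starting at root. */
int path_lookup(const char *path)
{
    int ino = ROOT_INO;
    while (*path == '/') path++;
    while (*path != '\0' && ino != -1) {
        size_t len = strcspn(path, "/");   /* length of next component */
        ino = dir_lookup(ino, path, len);
        path += len;
        while (*path == '/') path++;
    }
    return ino;
}
```

path_lookup("/usr/bob/file") performs three directory lookups, one per component, mirroring the steps above.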
Directory Representation and Hard Links
A directory is a file that contains a list of (file name, i-node number) pairs. Each pair is also called a hard link. An i-node may appear in multiple directories; a reference count in the i-node keeps track of how many directory entries refer to it. When the reference count reaches 0, the file is removed.
Hard Links
Hard links cannot cross partitions: a directory cannot list an i-node of a different partition, because i-node numbers are only meaningful within one file system.
Example: create a hard link named name-link in the current directory to target-file:
ln target-file name-link
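The reference count is visible from user space as st_nlink. This sketch uses the real POSIX calls link(), unlink(), and stat() (link() is the system call behind ln) inside a fresh temporary directory; it assumes /tmp exists and is writable, and nlink_of and hard_link_demo are hypothetical names. For brevity, error paths do not clean up.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Link count of a file, or -1 on error. */
long nlink_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create a file, hard-link it (like `ln target-file name-link`), then
 * unlink the original name, recording the link count after each step. */
int hard_link_demo(long counts[3])
{
    char dir[] = "/tmp/hlXXXXXX";
    char target[64], alias[64];
    if (mkdtemp(dir) == NULL) return -1;
    snprintf(target, sizeof target, "%s/target-file", dir);
    snprintf(alias, sizeof alias, "%s/name-link", dir);

    int fd = open(target, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) return -1;
    close(fd);
    counts[0] = nlink_of(target);        /* 1: one directory entry */
    if (link(target, alias) != 0) return -1;
    counts[1] = nlink_of(target);        /* 2: same i-node, two names */
    unlink(target);
    counts[2] = nlink_of(alias);         /* 1: data survives via alias */

    unlink(alias);                       /* count reaches 0: file freed */
    rmdir(dir);
    return 0;
}
```

Both names refer to the same i-node, so removing either one only decrements the count.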
Soft Links
Directories may also contain soft links (symbolic links). A soft link is a pair of the form (file name, i-node number of a file storing a path), where the path may be absolute or relative and may refer to this or another partition. Soft links can therefore point to files in different partitions. A soft link does not keep track of its target: if the target file is removed, the symbolic link becomes invalid (a dangling symbolic link).
Example: ln -s target-file name-link
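A dangling link can be detected with the real POSIX calls lstat(), which examines the link itself, and stat(), which follows it. The demo below creates a target, links to it with symlink() (the call behind ln -s), then removes the target. It assumes a writable /tmp; is_dangling and symlink_demo are hypothetical names, and error paths skip cleanup for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if path is a symbolic link whose target does not exist, else 0. */
int is_dangling(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
        return 0;                              /* not a symlink at all */
    return stat(path, &st) != 0 && errno == ENOENT;
}

/* Link to an existing file, then remove the file; record dangling status. */
int symlink_demo(int dangling[2])
{
    char dir[] = "/tmp/slXXXXXX";
    char target[64], alias[64];
    if (mkdtemp(dir) == NULL) return -1;
    snprintf(target, sizeof target, "%s/target-file", dir);
    snprintf(alias, sizeof alias, "%s/name-link", dir);

    FILE *f = fopen(target, "w");              /* create the target */
    if (f == NULL) return -1;
    fclose(f);
    if (symlink(target, alias) != 0) return -1; /* stores the path only */
    dangling[0] = is_dangling(alias);          /* 0: path resolves */
    unlink(target);                            /* remove the target... */
    dangling[1] = is_dangling(alias);          /* 1: ...link now dangles */

    unlink(alias);
    rmdir(dir);
    return 0;
}
```

Unlike the hard-link case, removing the target does not consult any count on the link; the stored path simply stops resolving.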
Opening a File
Once the file's i-node is retrieved, the system must:
- store the r/w bits (the access mode of this open);
- keep track of the i-nodes of opened files;
- store the current read/write location within the file, which the program can move with:
int fseek(FILE *stream, long offset, int whence);  // whence can be SEEK_SET, SEEK_CUR, or SEEK_END
off_t lseek(int fd, off_t offset, int whence);
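Both seek interfaces can be used to find a file's size while preserving the current read/write location: fseek/ftell are standard C, and lseek is POSIX and returns the new offset directly. This is a minimal sketch; stream_size and fd_size are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* stdio version: remember the position, seek to the end, read it, restore. */
long stream_size(FILE *f)
{
    long cur = ftell(f);
    if (cur < 0 || fseek(f, 0L, SEEK_END) != 0)
        return -1;
    long end = ftell(f);
    fseek(f, cur, SEEK_SET);       /* put the file position back */
    return end;
}

/* System-call version: lseek returns the resulting offset. */
off_t fd_size(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);
    if (cur == (off_t)-1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);
    lseek(fd, cur, SEEK_SET);      /* restore the current location */
    return end;
}
```

The saved-and-restored offset is exactly the "current read/write location" kept for each open file.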
Data Structures for a Typical File System
Process control block → open file pointer array → open file table (system-wide: why?) → i-node table → file i-node
Per-process Table Information
- All files that the process has open.
- Information regarding the use of each file by the process. The following item may be stored per (file, process) pair: the current file pointer, i.e., the current read/write position in the file.
- Each entry in the per-process table points to an entry in the open-file table.
Open-file Table Information
- File open count: a counter that tracks the number of opens and closes.
- Pointer to the i-node of the file. The i-node was read from disk into kernel memory when the file was opened.
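One common answer to "why system-wide?": in UNIX-like systems the read/write offset lives in the shared open-file entry, so a parent and child after fork() (or two descriptors after dup()) advance one shared position, while a separate open() gets its own entry. The toy structures below are hypothetical simplifications of the three tables, not any kernel's actual definitions.

```c
/* Hypothetical, simplified versions of the three tables. */
struct inode {                 /* in-core i-node table entry */
    int  ino;
    int  refcnt;               /* open-file entries using this i-node */
    long size;                 /* file size in bytes */
};

struct open_file {             /* system-wide open-file table entry */
    struct inode *ip;          /* i-node read into kernel memory at open */
    long offset;               /* current read/write position */
    int  count;                /* descriptors sharing this entry */
};

struct proc {                  /* per-process part of the PCB */
    struct open_file *fds[16]; /* open file pointer array, indexed by fd */
};

/* Simulated read: returns how many bytes would be read and advances the
 * offset kept in the (possibly shared) open-file entry. */
long toy_read(struct proc *p, int fd, long n)
{
    struct open_file *of = p->fds[fd];
    long left = of->ip->size - of->offset;
    long got = n < left ? n : left;
    of->offset += got;
    return got;
}
```

If parent and child both map fd 3 to the same entry, a 30-byte read by each leaves the shared offset at 60.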
Opening a File (continued)
Definitions:
- File descriptor (fd): an integer used to represent an open file; easier to pass around than a name.
- Directory: used to locate the i-node.
- I-node: disk location of the data, used to access the "real" data.
fd = open(FileName, access)
- PCB and open file table: allocate and link up the data structures.
- Directory and i-node: look up the file name and authenticate.
- File system on disk.
Reading a Block
read(fd, userBuf, size) passes through:
- PCB and open file table: locate the open file for fd.
- I-node: translate the logical block to a physical block.
- Buffer cache (why?): get the physical block into sysBuf, then copy it to userBuf; on a miss, call read(device, phyBlock, size).
- Disk device driver.
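A toy buffer cache shows why the layer is there: repeated reads of the same block are served from memory. This sketch assumes a tiny direct-mapped cache of 4 slots and a fake disk that fills each block with its own number; cache_get, read_block_to_user, and disk_reads are hypothetical names, and real buffer caches use smarter replacement (often LRU).

```c
#include <string.h>

#define CACHE_SLOTS 4
#define BLK 1024

/* Toy direct-mapped buffer cache in front of a fake disk. */
struct buf { long blockno; int valid; char data[BLK]; };

static struct buf cache[CACHE_SLOTS];
static long disk_reads;                 /* counts simulated device reads */

static void disk_read(long blockno, char *dst)
{
    memset(dst, (int)(blockno & 0xff), BLK);   /* fake block contents */
    disk_reads++;
}

/* Return the cached copy (sysBuf) of a block, reading the device on a miss. */
const char *cache_get(long blockno)
{
    struct buf *b = &cache[blockno % CACHE_SLOTS];
    if (!b->valid || b->blockno != blockno) {
        disk_read(blockno, b->data);
        b->blockno = blockno;
        b->valid = 1;
    }
    return b->data;
}

/* Last step of read(): copy from the cached block into the user buffer. */
void read_block_to_user(long phys, char *user_buf, size_t size)
{
    memcpy(user_buf, cache_get(phys), size < BLK ? size : BLK);
}
```

Reading block 7 twice costs one device read; reading block 11 then evicts it, because both map to slot 3.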
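File System Consistency
A crash can leave the file system damaged. Because the file system contains inherent redundancy (e.g., each block should appear either in a file or on the free list), it can be reconstructed by tools such as fsck and scandisk.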
File System Consistency
For each block (# 1 to 15), keep two counters: used and free.
- Iterate through the blocks of all files, incrementing used.
- Iterate through the blocks on the free list, incrementing free.
If the file system is consistent, every block is counted exactly once, in one row or the other.
File System Consistency
- Missing block (0 in both rows): wastes disk capacity; add it to the free list.
- Free list has a block twice (what about a bitmap?): rebuild the free list as the complement of the used blocks.
- Block present in multiple files: allocate a free block and duplicate the data into it, so each file gets its own copy.
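Given the two counter rows, the checker only has to classify each block. This sketch assumes the counts were already gathered by the two scans described above; check_blocks and NBLOCKS are hypothetical, and it also reports the "both used and free" case, which the next slide lists as i-node block numbers marked free in the disk map.

```c
#include <stdio.h>

#define NBLOCKS 16

/* in_use[b]: how many files list block b; in_free[b]: how many times b
 * appears on the free list. Prints a diagnosis per bad block and returns
 * the number of inconsistent blocks. */
int check_blocks(const int in_use[NBLOCKS], const int in_free[NBLOCKS])
{
    int problems = 0;
    for (int b = 0; b < NBLOCKS; b++) {
        if (in_use[b] + in_free[b] == 1)
            continue;                                  /* consistent */
        problems++;
        if (in_use[b] == 0 && in_free[b] == 0)
            printf("block %d missing: add to free list\n", b);
        else if (in_use[b] == 0)
            printf("block %d free %d times: rebuild free list\n", b, in_free[b]);
        else if (in_use[b] > 1)
            printf("block %d in %d files: duplicate into a free block\n", b, in_use[b]);
        else
            printf("block %d both used and free: remove from free list\n", b);
    }
    return problems;
}
```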
More Cases of Inconsistency
- Blocks allocated to multiple files.
- i-nodes containing block numbers that overlap.
- i-nodes containing block numbers out of range.
- Discrepancies between the number of directory references to a file and the link count of the file.
- i-nodes containing block numbers that are marked free in the disk map.
- i-nodes containing corrupt block numbers.
Size checks: incorrect number of blocks; directory size not a multiple of 512 bytes.
Directory checks: i-node number out of range; files that are not referenced or directories that are not reachable.
Summary
File systems need to satisfy three essential requirements:
- store a very large amount of information;
- survive termination of the process using it;
- allow multiple processes to access the information concurrently.