File System Internals Sunny Gleason COM S 414 November 29, 2001
In this Lecture
The Hard Disk
–Architecture
–Performance
File System Structures
Local File Systems
–Such as FAT, UFS, Ext2, Ext3
Where to Find More Info
Hard Drive Manufacturers
Windows File Systems
–ware/fatgen103.pdf
–=/library/en-us/fileio/fsys_10ku.asp
Where to Find More Info
Unix File Systems
NFS Version 4
The Actual Code
–Linux Kernel Source (look in the “fs” directory of any 2.4 kernel)
–BSD Kernel Source (look in the “sys/ufs” directory)
Where to Find More Info
The Book
–Chapters 11, 12, …
CS414 Spring 2001 Web Site
–(from which these slides are mostly stolen…)
CS414 Fall 2000 Web Site
–(other useful slide sets available)
The Memory Hierarchy
Memory is arranged as a hierarchy:
–Close to CPU: registers, L1 cache, L2 cache
–RAM (primary memory)
–Disk storage (secondary memory)
–Tape or optical storage (tertiary memory)
Higher = higher speed, higher cost
Hard Disk: Architecture
A disk drive has several physical components:
–Spindle
–Surface (one side of a platter in the pack)
–Read/write arm and head
–Track (a cylinder is the vertical set of tracks at the same position on every surface)
–Sector
Physical Disk Access
Delays associated with accessing a sector on the disk:
–Seek delay (biggest)
  Moving the read/write head
–Rotational delay
  Waiting for the sector to spin under the head
–Transfer delay (smallest)
  Transferring the bits from the disk
Physical Disks
O/S goal: provide a file system API
Problems with disks:
–Read errors
–Bad blocks
–Missed seeks
O/S disk API may have many levels:
–Physical disk block
–Disk (volume) logical block
–File logical block
Logical Disks A single hard disk may contain multiple file systems
Making the HD Usable
The hard disk must be partitioned
Partitions are formatted with specific filesystems
In some cases, can “quick format” instead of doing a full reformat
Multiple partitions are useful:
–(Limited) protection against crashes
–If one partition fills up, the rest are still usable
–“Dual-booting”: in general, the ability to load multiple operating systems
Some Typical Numbers
Sector Size: 512 bytes
Cylinders per disk: 6962
Platters:
Rotational Speed: 10,000 rpm
Storage size: GB
Seek time: ms
Latency: 3 ms
Transfer Rate: MB/sec
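The 3 ms latency figure follows from the rotational speed: on average the head waits half a revolution for the sector to arrive. A minimal sketch of that arithmetic is below. The function names are made up for illustration, and the seek time and transfer rate in the usage example are placeholder values, not figures from the slide.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational delay: half of one full revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def access_time_ms(seek_ms: float, rpm: float,
                   nbytes: int, transfer_mb_per_s: float) -> float:
    """Seek delay + average rotational delay + transfer delay for one request."""
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000) * 1000.0
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms


# 10,000 rpm -> 6 ms per revolution -> 3 ms average latency
print(avg_rotational_latency_ms(10_000))

# Hypothetical drive: 5 ms seek, 20 MB/sec transfer, one 512-byte sector
print(access_time_ms(seek_ms=5.0, rpm=10_000, nbytes=512, transfer_mb_per_s=20.0))
```

With numbers like these, the transfer delay for a single sector is a tiny fraction of the total, which is why seek and rotational delay dominate.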
Disk Structure
Bare disk interface: cylinders, sectors
O/S imposes structure on disks
Disk contents:
–Data: user files
–Metadata: structural / administrative info
  Any ideas?
Free list: structure indicating which blocks are unused
Typically maintained as a bitmap: an array of bits, each representing a block
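A free-list bitmap can be sketched in a few lines. This is an illustrative in-memory version only; the class name `BlockBitmap` and the first-fit allocation policy are assumptions, not something the slides prescribe.

```python
class BlockBitmap:
    """One bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        # Round up to whole bytes; a zeroed bytearray means every block is free.
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Mark the first free block as used and return its number (first fit)."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block: int) -> None:
        """Clear the block's bit so it can be allocated again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


bitmap = BlockBitmap(16)
first = bitmap.allocate()   # block 0
second = bitmap.allocate()  # block 1
bitmap.free(first)          # block 0 becomes available again
```

The appeal of a bitmap is its size: a volume with a million blocks needs only about 122 KB of bits, small enough to cache in RAM.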
Dealing with Mechanical Latencies
Caches
–Locality in file access
RAM disk
–Reserve RAM as a [fast!] filesystem
RAID
–Exploiting parallelism
Clever layouts and scheduling algorithms
–Head scheduling
–Meta-information layout
Bad Blocks
All disks have some bad blocks
More blocks go bad as time goes on
O/S removes these blocks from the allocation map
On some disks, some cylinders have reserve blocks that can be remapped to replace bad blocks
The File System
File system supports the abstraction of file objects
–Create, delete, read, write, rename
File: a named collection of data
Typical abstraction: a vector of bytes
O/S knows about special file types:
–Directories, symlinks, executable files
For data files, applications decide the internal file structure (data file format)
Accessing Files
Files can be accessed in different ways:
–Sequential access
  Read bytes one at a time, in order
–Direct access
  Random access, given block/byte number
–Record access
  Some higher-level structure, instead of bytes
–Indexed access
  Uses a map from an index field to the corresponding file record
Storing Files
Files can be allocated in different ways:
–Contiguous allocation
  All bytes together, in order
–Linked structure
  Each block points to the next block
–Indexed structure
  An index block contains pointers to many other blocks
–Rhetorical questions -- which is best?
  For sequential access? Random access?
  Large files? Small files? Mixed?
Linked-List Allocation
Each data block contains a pointer to the next data block
Advantages? Disadvantages?
Linked-List Allocation
A single pointer is sufficient to locate all the blocks of the file
Seeking takes O(n) time, where n is the number of blocks in the file
A single corrupt pointer can cause everything after it in the file to be lost
MS-DOS Filesystem
MS-DOS uses a File Allocation Table (FAT)
Like a linked structure, except pointers are kept in a separate table
–For every block, the FAT keeps track of whether or not it is allocated, and if so, which block it points to
–Two copies of the FAT are kept on disk
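To make the table idea concrete, here is a toy sketch of following a file's chain through a FAT-like table. It is not the on-disk FAT format: the `FREE` and `EOF` markers, the function names, and the list-based table are all assumptions chosen for illustration.

```python
FREE = None  # hypothetical marker: block is not allocated
EOF = -1     # hypothetical marker: last block of a file


def file_blocks(fat: list, first_block: int) -> list:
    """Follow the chain from first_block until the end-of-file marker."""
    blocks = []
    block = first_block
    while block != EOF:
        if block in blocks or fat[block] is FREE:
            raise ValueError("corrupt chain")
        blocks.append(block)
        block = fat[block]
    return blocks


def block_for_offset(fat: list, first_block: int, offset: int,
                     block_size: int = 512) -> int:
    """Seeking is O(n): one table lookup per block skipped."""
    block = first_block
    for _ in range(offset // block_size):
        block = fat[block]
        if block == EOF or block is FREE:
            raise ValueError("offset past end of file")
    return block


# A file stored in blocks 2 -> 5 -> 3 of an 8-block volume
fat = [FREE] * 8
fat[2], fat[5], fat[3] = 5, 3, EOF

print(file_blocks(fat, 2))             # [2, 5, 3]
print(block_for_offset(fat, 2, 1024))  # 3 (the third block of the file)
```

The traversal is still linear in the number of blocks, but because the pointers live in one table that can be held in memory, following the chain does not require reading the data blocks themselves.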
Indexed Allocation
Index block contains pointers to each data block
Pros? Cons?
Combined Scheme: UFS (Unix File System)
An inode contains the metadata for a UNIX file
–Contains control and allocation information
–Each inode contains 15 block pointers
  12 direct
  1 single, 1 double, 1 triple indirect
–Kind of tricky -- see the diagram!
UNIX Inode
If data blocks are 4K (and block pointers are 4 bytes, so 1024 pointers fit in one block) …
–First 48K reachable directly from the inode
–Next 4MB available from single-indirect
–Next 4GB available from double-indirect
–Next 4TB (!) available through the triple-indirect block
Any block can be found with at most 3 indirect-block reads before reading the data block itself
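The sizes above can be checked with a short calculation. The sketch below assumes 4-byte block pointers (the value implied by the 4MB / 4GB / 4TB figures); the constant and function names are illustrative only.

```python
BLOCK_SIZE = 4096
POINTER_SIZE = 4                             # assumption: 4-byte block pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers per indirect block
NUM_DIRECT = 12


def reach_bytes(level: int) -> int:
    """Bytes addressable at a level: 0 = direct, 1/2/3 = single/double/triple."""
    blocks = NUM_DIRECT if level == 0 else PTRS_PER_BLOCK ** level
    return blocks * BLOCK_SIZE


def indirection_level(offset: int) -> int:
    """Number of indirect blocks that must be read to reach a byte offset."""
    block_index = offset // BLOCK_SIZE
    if block_index < NUM_DIRECT:
        return 0
    block_index -= NUM_DIRECT
    for level in (1, 2, 3):
        span = PTRS_PER_BLOCK ** level
        if block_index < span:
            return level
        block_index -= span
    raise ValueError("offset beyond maximum file size")


print(reach_bytes(0) // 1024)       # 48 (KB)
print(reach_bytes(1) // 1024 ** 2)  # 4 (MB)
print(reach_bytes(2) // 1024 ** 3)  # 4 (GB)
print(reach_bytes(3) // 1024 ** 4)  # 4 (TB)
print(indirection_level(48 * 1024)) # 1: first byte past the direct blocks
```

Small files never touch an indirect block, which is the point of the combined scheme: the common case is cheap and only very large files pay for the extra levels.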
UNIX Directories
Directories are just like regular files
–They contain (filename, inode number) tuples
–Filename is usually stored as filename + filename_length
Example directory:
  usr      3
  home     4
  etc      5
Directory at inode 4:
  ken      7
  hopkik   9
  gleason  12
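Path lookup is then a walk from one directory's table to the next. The sketch below uses the entries from the example; treating the first table as the root directory and giving it inode number 2 are assumptions made here for illustration, as is the dictionary representation.

```python
ROOT_INODE = 2  # assumption: the first example table is the root directory

# Directory contents keyed by the directory's own inode number
directories = {
    ROOT_INODE: {"usr": 3, "home": 4, "etc": 5},
    4: {"ken": 7, "hopkik": 9, "gleason": 12},
}


def resolve(path: str) -> int:
    """Return the inode number for an absolute path, one component at a time."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:  # handles "/" and doubled slashes
            continue
        if inode not in directories:
            raise NotADirectoryError(path)
        entries = directories[inode]
        if name not in entries:
            raise FileNotFoundError(path)
        inode = entries[name]
    return inode


print(resolve("/home/gleason"))  # 12
```

Each path component costs at least one directory read, which is why deep paths are slower to open than shallow ones unless the lookups are cached.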
UNIX Disk Layout
Boot block provides information on how to boot the computer (tiny “bootstrap” program)
Superblock contains the file system layout: # of inodes, block size, location of the free list
Layout: Boot Block | Superblock | Inodes | Data Blocks …
File System Problems
Fragmentation
–When the blocks of a file are located all over the physical disk
–Causes undesirable seeking
–Use a defragmentation utility to compact the filesystem and consolidate free space
–See the pictures!
Fragmentation
Defragmentation
File System Problems
Unreliability
–Historically, disks have been among the most unreliable components
  They develop “bad blocks”
  Modern disks detect such faults, and have replacement blocks that can be remapped to replace bad blocks
  Filesystems still need to track bad blocks and avoid using them
  Inode 1 is a special inode that keeps track of where all the bad blocks are
File System Problems
System crashes or power failures can occur at any time
–Any disk operation can be interrupted at any time
–Need to ensure that the filesystem is consistent throughout updates
  Data that is being modified may be lost, but that should not compromise the entire file system
File System Problems
Crashes can occur at any time
–A write in UNIX involves:
  Writing the new data
  Updating the inode
  Updating the free list
–Is there a correct order?
  What can go wrong if the FS does not respect the order?
Disk Scheduling
To minimize mechanical delays, the O/S looks at multiple pending disk requests
–FCFS (first come, first served)
  OK when load is low
  Long waiting times for long request queues
–SSTF (shortest seek time first)
  Always picks the request that minimizes arm movement, maximizes throughput
  Favors middle blocks
–SCAN (elevator)
  Continue in the same direction until done, then reverse direction and service requests in that order
–C-SCAN: like SCAN, but return to 0 at the end and sweep in the same direction again
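The four policies can be compared on a small request queue. The sketch below is a simplified model: requests are cylinder numbers, all requests are known up front, SCAN and C-SCAN sweep toward higher cylinders first and turn around at the last pending request, and the C-SCAN return trip is counted as head movement. The function names and the sample queue are illustrative.

```python
def fcfs(head: int, requests: list) -> list:
    """Service requests in arrival order."""
    return list(requests)


def sstf(head: int, requests: list) -> list:
    """Repeatedly pick the pending request closest to the current head position."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def scan(head: int, requests: list) -> list:
    """Sweep upward through pending requests, then reverse and sweep downward."""
    up = sorted(cyl for cyl in requests if cyl >= head)
    down = sorted((cyl for cyl in requests if cyl < head), reverse=True)
    return up + down


def c_scan(head: int, requests: list) -> list:
    """Sweep upward, jump back to the low end, then sweep upward again."""
    up = sorted(cyl for cyl in requests if cyl >= head)
    wrapped = sorted(cyl for cyl in requests if cyl < head)
    return up + wrapped


def total_movement(head: int, order: list) -> int:
    """Total number of cylinders the arm travels for a given service order."""
    total = 0
    for cyl in order:
        total += abs(cyl - head)
        head = cyl
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
for policy in (fcfs, sstf, scan, c_scan):
    print(policy.__name__, total_movement(53, policy(53, queue)))
# fcfs 640, sstf 236, scan 299, c_scan 322
```

SSTF wins on total movement for this queue, but it can leave far-away requests waiting indefinitely if nearby requests keep arriving; the sweep-based policies trade some throughput for bounded waiting.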
Disk Scheduling
In general, the choice of scheduling algorithm only matters when requests actually queue up
The O/S may locate files strategically for performance reasons
–The Organ Pipe distribution locates heavily-used files towards the center of the disk
–The Ext2 Filesystem places groups of inodes around the disk, closer to the data blocks that they reference
Conclusion
Hard disks provide vast amounts of slow, cheap storage
Operating systems layer file system services on top of the raw disk API
The O/S must find ways to work around the slow performance and unreliability of disk storage
Thanks! Any questions? Review session - Tuesday, 12/04 5:30 - 7:30pm