File Systems
Main Points File layout Directory layout
Operating Systems: A Modern Perspective, Chapter 13 The External View of the File Manager Hardware Application Program Application Program File MgrDevice MgrMemory Mgr Process Mgr UNIX File MgrDevice MgrMemory Mgr Process Mgr Windows open() read() close() write() lseek() CreateFile() ReadFile() CloseHandle() SetFilePointer() WriteFile() mount()
Operating Systems: A Modern Perspective, Chapter 13 Persistent storage Shared device Why Programmers Need Files HTML Editor HTML Editor … … Web Browser Web Browser Structured information Can be read by any applic Accessibility Protocol … … … … foo.html File Manager File Manager File Manager File Manager
Operating Systems: A Modern Perspective, Chapter 13 File Management File is a named, ordered collection of information The file manager administers the collection by: – Storing the information on a device – Mapping the block storage to a logical view – Allocating/deallocating storage – Providing file directories
Operating Systems: A Modern Perspective, Chapter 13 Information Structure Records Applications Structured Record Files Record-Stream Translation Stream-Block Translation Byte Stream Files Storage device
Operating Systems: A Modern Perspective, Chapter 13 Information Structure Records Applications Stream-Block Translation Byte Stream Files Storage device
Operating Systems: A Modern Perspective, Chapter 13 Byte Stream File Interface fileID = open(fileName) close(fileID) read(fileID, buffer, length) write(fileID, buffer, length) seek(fileID, filePosition)
Operating Systems: A Modern Perspective, Chapter 13 Low Level Files Stream-Block Translation b0b0 b1b1 b2b2 bibi... fid = open(“fileName”,…); … read(fid, buf, buflen); … close(fid); int open(…) {…} int close(…) {…} int read(…) {…} int write(…) {…} int seek(…) {…} Storage device response to commands
Operating Systems: A Modern Perspective, Chapter 13 Structured Files Records Record-Block Translation Structured Record Files
Operating Systems: A Modern Perspective, Chapter 13 Record-Oriented Sequential Files Logical Record fileID = open(fileName) close(fileID) getRecord(fileID, record) putRecord(fileID, record) seek(fileID, position)
Operating Systems: A Modern Perspective, Chapter 13 Record-Oriented Sequential Files... H byte headerk byte logical record Logical Record
Operating Systems: A Modern Perspective, Chapter 13 Record-Oriented Sequential Files... H byte headerk byte logical record... Fragment Physical Storage Blocks Logical Record
Operating Systems: A Modern Perspective, Chapter 13 Indexed Sequential File Suppose we want to directly access records Add an index to the file fileID = open(fileName) close(fileID) getRecord(fileID, index) index = putRecord(fileID, record) deleteRecord(fileID, index)
Operating Systems: A Modern Perspective, Chapter 13 Indexed Sequential File (cont) Account # Index i k j index = i index = k index = j Application structure
File System Design File System is an organized collection of regular files and directories (mkfs) Data structures – Directories: file name -> file metadata Store directories as files – File metadata: how to find file data blocks – Free map: list of free disk blocks
File System Design Constraints For small files: – Small blocks for storage efficiency – Files used together should be stored together For large files: – Contiguous allocation for sequential access – Efficient lookup for random access May not know at file creation – Whether file will become small or large
Design Challenges Index structure – How do we locate the blocks of a file? Index granularity – What block size do we use? Free space – How do we find unused blocks on disk? Locality – How do we preserve spatial locality? Reliability – What if machine crashes in middle of a file system op?
File Systems Traditional FFS file system (Linux) Microsoft’s FAT, FAT2 and NTFS file systems Journaling file systems, ext3 … others
File System Design Options FATFFSNTFS Index structure Linked listTree (fixed, assym) Tree (dynamic) granularityblock extent free space allocation FAT arrayBitmap (fixed location) Bitmap (file) LocalitydefragmentationBlock groups + reserve space Extents Best fit defrag
Named Data in a File System
Microsoft File Allocation Table (FAT) Linked list index structure – Simple, easy to implement – Still widely used (e.g., thumb drives) File table: – Linear map of all blocks on disk – Each file a linked list of blocks
FAT
Pros: – Easy to find free block – Easy to append to a file – Easy to delete a file Cons: – Small file access is slow – Random access is very slow – Fragmentation File blocks for a given file may be scattered Files in the same directory may be scattered Problem becomes worse as disk fills
Berkeley UNIX FFS (Fast File System) inode table – Analogous to FAT table inode – Metadata File owner, access permissions, access times, … – Set of 12 data pointers – With 4KB blocks => max size of 48KB files
File System Structure Basic unit for allocating space on the disk is a block partition Disk File System Boot Block Super-block i-node table Data blocks
I-nodes Each file or directory in the file system has a unique entry in the i-node table. File type (regular, symbolic link, directory…) Owner Permissions Timestamps for last access; last modification, last status change Size …
i-node entry 0 … 5 … DB 0 DB 5 IPB DB
FFS inode Metadata – File owner, access permissions, access times, … Set of 12 data pointers – With 4KB blocks => max size of 48KB files Indirect block pointer – pointer to disk block of data pointers Indirect block: 1K data blocks => 4MB (+48KB)
FFS inode Metadata – File owner, access permissions, access times, … Set of 12 data pointers – With 4KB blocks => max size of 48KB Indirect block pointer – pointer to disk block of data pointers – 4KB block size => 1K data blocks => 4MB Doubly indirect block pointer – Doubly indirect block => 1K indirect blocks – 4GB (+ 4MB + 48KB)
FFS inode Metadata – File owner, access permissions, access times, … Set of 12 data pointers – With 4KB blocks => max size of 48KB Indirect block pointer – pointer to disk block of data pointers – 4KB block size => 1K data blocks => 4MB Doubly indirect block pointer – Doubly indirect block => 1K indirect blocks – 4GB (+ 4MB + 48KB) Triply indirect block pointer – Triply indirect block => 1K doubly indirect blocks – 4TB (+ 4GB + 4MB + 48KB)
FFS Asymmetric Tree Small files: shallow tree – Efficient storage for small files Large files: deep tree – Efficient lookup for random access in large files
Operating Systems: A Modern Perspective, Chapter 13 Disk Organization Blk 0 Blk 1 Blk k-1 Blk k Blk k+1 Blk 2k-1 Track 0, Cylinder 0 Track 0, Cylinder 1 Blk Track 1, Cylinder 0 Blk Track N-1, Cylinder 0 Blk Track N-1, Cylinder M-1 … … … … … … … … Boot SectorVolume Directory
Operating Systems: A Modern Perspective, Chapter 13 Low-level File System Architecture b 0 b 1 b 2 b 3 b n-1 …… Block 0... Sequential Device Randomly Accessed Device
Operating Systems: A Modern Perspective, Chapter 13 Block Management The job of selecting & assigning storage blocks to the file Three basic strategies: – Contiguous allocation – Linked lists – Indexed allocation
Operating Systems: A Modern Perspective, Chapter 13 Contiguous Allocation Maps the N blocks into N contiguous blocks on the secondary storage device Difficult to support dynamic file sizes Head position237 … First block785 Number of blocks25 File descriptor
Operating Systems: A Modern Perspective, Chapter 13 Linked Lists Each block contains a header with – Number of bytes in the block – Pointer to next block Blocks need not be contiguous Files can expand and contract Seeks can be slow First block … Head: Length Byte 0 Byte Length Byte 0 Byte Length Byte 0 Byte Block 0Block 1Block N-1
Operating Systems: A Modern Perspective, Chapter 13 Indexed Allocation Extract headers and put them in an index Simplify seeks May link indices together (for large files) Index block … Head: Byte 0 Byte Byte 0 Byte Byte 0 Byte Block 0 Block 1 Block N-1 Length
Operating Systems: A Modern Perspective, Chapter 13 DOS FAT Files Disk Block File Descriptor Disk Block Disk Block … File Access Table (FAT) Disk Block Disk Block Disk Block … File Descriptor
Operating Systems: A Modern Perspective, Chapter 13 UNIX Files Data mode owner … Direct block 0 Direct block 1 … Direct block 11 Single indirect Double indirect Triple indirect inode Data Index Data Index Data Index Data
Operating Systems: A Modern Perspective, Chapter 13 Unallocated Blocks How should unallocated blocks be managed? Need a data structure to keep track of them – Linked list Very large Hard to manage spatial locality – Block status map (“disk map”) Bit per block Easy to identify nearby free blocks Useful for disk recovery
FFS Locality Block group allocation – Block group is a set of nearby cylinders – Files in same directory located in same group – Subdirectories located in different block groups inode table spread throughout disk – inodes, bitmap near file blocks First fit allocation – Small files fragmented, large files contiguous
FFS First Fit Block Allocation
FFS Pros – Efficient storage for both small and large files – Locality for both small and large files – Locality for metadata and data Cons – Inefficient for tiny files (a 1 byte file requires both an inode and a data block) – Inefficient encoding when file is mostly contiguous on disk (no equivalent to superpages) – Need to reserve 10-20% of free space to prevent fragmentation
NTFS Master File Table – Flexible 1KB storage for metadata and data Extents – Block pointers cover runs of blocks – Similar approach in linux (ext4) – File create can provide hint as to size of file Journalling for reliability – Discussed next time
NTFS Small File
NTFS Medium File
NTFS Indirect Block
NTFS Multiple Indirect Blocks
Named Data in a File System
Operating Systems: A Modern Perspective, Chapter 13 Directories A set of logically associated files and sub directories File manager provides set of controls: – enumerate – copy – rename – delete – traverse – etc.
Operating Systems: A Modern Perspective, Chapter 13 Directory Implementation Device Directory – A device can contain a collection of files – Easier to manage if there is a root for every file on the device -- the device root directory File Directory – Typical implementations have directories implemented as a file with a special format – Entries in a file directory are handles for other files (which can be files or subdirectories)
Operating Systems: A Modern Perspective, Chapter 13 Directory Structures How should files be organized within directory? – Flat name space All files appear in a single directory – Hierarchical name space Directory contains files and subdirectories Each file/directory appears as an entry in exactly one other directory -- a tree Popular variant: All directories form a tree, but a file can have multiple parents.
Directories unixetchomeprodev twd passwd motdunix...slide1slide2
Directory Representation Component NameInode Number unix117 etc4 home18 pro36 dev93 directory entry.1..1
Directories
Directories can be files – Map file name to file number (MFT #, inode num) Table of file name -> file number – Small directories: linear search
Large Directories: B-Trees
Hard Links unixetchomeprodev twd imagemotdunix...slide1slide2 % ln /unix /etc/image # link system call
Directory Representation unix117 etc4 home18 pro36 dev image117 motd
Create hard links $ echo –n “It is good to collect things, “ > abc $ ln abc xyz $ echo “ but it is better to go on walks” >> xyz $ cat abc It is good to collect things, but it is better to on walks
Hard links If one is removed, the other name and the file itself continue to exist.
Soft (symbolic) links Differing from a hard link, a soft link (or symbolic link) is a special kind of file containing the name of another file. Created with the ln –s
Soft Links unixetchomeprodev twd imagetwd unix... slide1slide2 % ln –s /unix /home/twd/mylink % ln –s /home/twd /etc/twd # symlink system call mylink /unix /home/twd
Working Directory Maintained in kernel for each process – paths not starting from “/” start with the working directory – changed by use of the chdir system call – displayed (via shell) using “pwd” how is this done?
Operating Systems: A Modern Perspective, Chapter 13 UNIX mount Command / binusretcfoo billnutt abc / blah cdexyz FS / binusretcfoo billnutt abc / blah cdexyz mount FS at foo FS
Operating Systems: A Modern Perspective, Chapter 13 VFS-based File Manager File System Independent Part of File Manager File System Independent Part of File Manager Exports OS-specific API Virtual File System Switch MS-DOS Part of File Manager MS-DOS Part of File Manager ISO 9660 Part of File Manager ISO 9660 Part of File Manager ext2 Part of File Manager ext2 Part of File Manager …
Operating Systems: A Modern Perspective, Chapter File Management
Operating Systems: A Modern Perspective, Chapter 13 An open() Operation Locate the on-device (external) file descriptor Extract info needed to read/write file Authenticate that process can access the file Create an internal file descriptor in primary memory Create an entry in a “per process” open file status table Allocate resources, e.g., buffers, to support file usage
Operating Systems: A Modern Perspective, Chapter 13 File Manager Data Structures External File Descriptor Open File Descriptor Copy info from external to the open file descriptor 1 Process-File Session Keep the state of the process- file session 2 Return a reference to the data structure 3
Operating Systems: A Modern Perspective, Chapter 13 Opening a UNIX File fid = open(“fileA”, flags); … read(fid, buffer, len); 0 stdin 1 stdout 2 stderr 3... Open File Table File structure inode Internal File Descriptor On-Device File Descriptor
Operating Systems: A Modern Perspective, Chapter 13 File Descriptors External name Current state Sharable Owner User Locks Protection settings Length Time of creation Time of last modification Time of last access Reference count Storage device details
Operating Systems: A Modern Perspective, Chapter 13 Marshalling the Byte Stream Must read at least one buffer ahead on input Must write at least one buffer behind on output Seek flushing the current buffer and finding the correct one to load into memory Inserting/deleting bytes in the interior of the stream
Operating Systems: A Modern Perspective, Chapter 13 Full Block Buffering Storage devices use block I/O Files place an explicit order on the bytes Therefore, it is possible to predict what is likely to be read after byte i When file is opened, manager reads as many blocks ahead as feasible After a block is logically written, it is queued for writing behind, whenever the disk is available Buffer pool – usually variably sized, depending on virtual memory needs – Interaction with the device manager and memory manager
File-Descriptor Table n–1 File-descriptor table File descriptor User address space Kernel address space ref count access mode file location inode pointer
Allocation of File Descriptors Whenever a process requests a new file descriptor, the lowest numbered file descriptor not already associated with an open file is selected; thus #include close(0); fd = open("file", O_RDONLY); – will always associate file with file descriptor 0 (assuming that the open succeeds)
Redirecting Output … Twice if (fork() == 0) { /* set up file descriptors 1 and 2 in the child process */ close(1); close(2); if (open("/home/twd/Output", O_WRONLY) == -1) { exit(1); } if (open("/home/twd/Output", O_WRONLY) == -1) { exit(1); } execl("/home/twd/bin/program", "program", 0); exit(1); } /* parent continues here */
Redirected Output File-descriptor table File descriptor 1 User address space Kernel address space File descriptor 2 1 WRONLY 0 inode pointer 1 WRONLY 0 inode pointer
Redirected Output After Write File-descriptor table File descriptor 1 User address space Kernel address space File descriptor 2 1 WRONLY 100 inode pointer 1 WRONLY 0 inode pointer
Sharing Context Information if (fork() == 0) { /* set up file descriptors 1 and 2 in the child process */ close(1); close(2); if (open("/home/twd/Output", O_WRONLY) == -1) { exit(1); } dup(1); /* set up file descriptor 2 as a duplicate of 1 */ execl("/home/twd/bin/program", "program", 0); exit(1); } /* parent continues here */
Redirected Output After Dup File-descriptor table File descriptor 1 User address space Kernel address space File descriptor 2 2 WRONLY 100 inode pointer
Fork and File Descriptors int logfile = open("log", O_WRONLY); if (fork() == 0) { /* child process computes something, then does: */ write(logfile, LogEntry, strlen(LogEntry)); … exit(0); } /* parent process computes something, then does: */ write(logfile, LogEntry, strlen(LogEntry)); …
File Descriptors After Fork logfile Parent’s address space Kernel address space 2 WRONLY 0 inode pointer logfile Child’s address space
Naming (almost) everything has a path name – files – directories – devices (known as special files) keyboards displays disks etc.
Uniformity int file = open("/home/twd/data", O_RDWR); // opening a normal file int device = open("/dev/tty", O_RDWR); // opening a device (one’s terminal // or window) int bytes = read(file, buffer, sizeof(buffer)); write(device, buffer, bytes);