Download presentation
Presentation is loading. Please wait.
1
File SystemsCS-502 Fall 20061 File Systems CS-502 Operating Systems Fall 2006 (Slides include materials from Operating System Concepts, 7 th ed., by Silbershatz, Galvin, & Gagne and from Modern Operating Systems, 2 nd ed., by Tanenbaum)
2
File SystemsCS-502 Fall 20062 File (an abstraction) A (potentially) large amount of information or data that lives a (potentially) very long time Often much larger than the memory of the computer Often much longer than any computation Sometimes longer than life of machine itself (Usually) organized as a linear array of bytes or blocks Internal structure is imposed by application (Occasionally) blocks may be variable length (Often) requiring concurrent access by multiple processes Even by processes on different machines!
3
File SystemsCS-502 Fall 20063 File Systems and Disks User view –File is a named, persistent collection of data OS & file system view –File is collection of disk blocks –File System maps file names and offsets to disk blocks
4
File SystemsCS-502 Fall 20064 Fundamental ambiguity Is the file the “container of the information” or the “information” itself? Almost all systems confuse the two. Almost all people confuse the two.
5
File SystemsCS-502 Fall 20065 Example – Suppose that you e-mail me a document Later, how do either of us know that we are using the same version of the document? Windows/Outlook/Exchange/MacOS: Time-stamp is a pretty good indication that they are Time-stamps preserved on copy, drag and drop, transmission via e-mail, etc. Unix/Linux By default, time-stamps not preserved on copy, ftp, e-mail, etc. Time-stamp associated with container, not with information
6
File SystemsCS-502 Fall 20066 Rule of Thumb Almost always, application thinks in terms of the information Most systems think in terms of containers Professional Guidance: Be aware of the distinction, even when the system is not
7
File SystemsCS-502 Fall 20067 Attributes of Files Name: –Although the name is not always what you think it is! Type: –May be encoded in the name (e.g.,.cpp,.txt) Dates: –Creation, updated, last accessed, etc. –(Usually) associated with container –Better if associated with content Size: –Length in number of bytes; occasionally rounded up Protection: –Owner, group, etc. –Authority to read, update, extend, etc. Locks: –For managing concurrent access …
8
File SystemsCS-502 Fall 20068 File Metadata Information about a file –Maintained by the file system –Separate from file itself –Possibly attached to the file E.g., in block # –1 –Some information visible to user/application Dates, permissions, type, name, etc. –Some information primarily for OS Location on disk, locks, cached attributes
9
File SystemsCS-502 Fall 20069 Example – Location Example 1: mv ~lauer/project1.doc ~cs502/public_html/S06 Example 2: –System moves file from disk block 10,000 to disk block 20,000 –System restores a file from backup May or may not be reflected in metadata
10
File SystemsCS-502 Fall 200610 Anomaly – Unix File Naming ln ~lauer/project1.doc ~cs502/public_html/S06 ln ~lauer/project1.doc NewProject1.doc Unix hard links allow one file to have more than one name and/or location –The real name of a Unix file is its i-node – an internal name known only to the OS; points to metadata Hard links are not used very often in modern Unix practice –Exception: Copy-on-write of large directory trees! –Usually safe to regard last element of path as name
11
File SystemsCS-502 Fall 200611 File Types
12
File SystemsCS-502 Fall 200612 Operations on Files Open, Close Gain or relinquish access to a file OS returns a file handle – an internal data structure letting it cache internal information needed for efficient file access Read, Write, Truncate Read: return a sequence of n bytes from file Write: replace n bytes in file, and/or append to end Truncate: throw away all but the first n bytes of file Seek, Tell Seek: reposition file pointer for subsequent reads and writes Tell: get current file pointer Create, Delete: Conjure up a new file; or blow away an existing one
13
File SystemsCS-502 Fall 200613 File – a very powerful abstraction Documents, code Databases Very large, possibly spanning multiple disks Streams Input, output, keyboard, display Pipes, network connections, … Virtual memory backing store … Any time you need to remember something beyond the life of a particular process/computation
14
File SystemsCS-502 Fall 200614 File Access Methods Sequential access –Read all bytes/records from the beginning –Cannot jump around, could possibly rewind or back up –Convenient when medium was magnetic tape or punched cards Random access –Bytes/records read in any order –Essential for data base systems –Read can be … move file pointer (seek), then read or … read and then move file pointer Keyed or indexed access – access items in file based on the contents of an (part of an) item in the file –Provided in older commercial operating systems (IBM ISAM) –(Usually) handled separately by database applications in modern systems
15
File SystemsCS-502 Fall 200615 Questions?
16
File SystemsCS-502 Fall 200616 Directory – A Special Kind of File A tool for users & applications to organize and find files User-friendly names Names that are meaningful over long periods of time The data structure for OS to locate files (i.e., containers) on disk
17
File SystemsCS-502 Fall 200617 Directory structures Single level –One directory per system, one entry pointing to each file –Small, single-user or single-use systems PDA, cell phone, etc. Two-level –Single “master” directory per system –Each entry points to one single-level directory per user –Uncommon in modern operating systems Hierarchical –Any directory entry may point to Individual file Another directory –Common in most modern operating systems
18
File SystemsCS-502 Fall 200618 Directory Considerations Efficiency – locating a file quickly. Naming – convenient to users. Separate users can use same name for separate files. The same file can have different names for different users. Names need only be unique within a directory Grouping – logical grouping of files by properties e.g., all Java programs, all games, …
19
File SystemsCS-502 Fall 200619 Directory Organization – Hierarchical Most systems support idea of current (working) directory –Absolute names – fully qualified from root of file system /usr/group/foo.c –Relative names – specified with respect to working directory foo.c –A special name – the working directory itself “.” Modified Hierarchical – Acyclic Graph (no loops) and General Graph –Allow directories and files to have multiple names –Links are file names (directory entries) that point to existing (source) files
20
File SystemsCS-502 Fall 200620 Links Hard links: bi-directional relationship between file names and file –A hard link is directory entry that points to a source file’s metadata –Metadata maintains reference count of the number of hard links pointing to it – link reference count –Link reference count is decremented when a hard link is deleted –File data is deleted and space freed when the link reference count goes to zero Symbolic (soft) links: uni-directional relationship between a file name and the file –Directory entry contains text representing the absolute or relative path name of source file –If the source file is deleted, the link pointer is invalid
21
File SystemsCS-502 Fall 200621 Path Name Translation Assume that I want to open “/home/lauer/foo.c” fd = open(“/home/lauer/foo.c”, O_RDWR); File System does the following –Opens directory “/” – the root directory is in a known place on disk –Search root directory for the directory home and get its location –Open home and search for the directory lauer and get its location –Open lauer and search for the file foo.c and get its location –Open the file foo.c –Note that the process needs the appropriate permissions at every step File Systems spend a lot of time walking down directory paths –This is why open calls are separate from other file operations –File System attempts to cache prefix lookups to speed up common searches –Once open, file system caches the metadata of the file
22
File SystemsCS-502 Fall 200622 Directory Operations Create: Make a new directory Add, Delete entry: Invoked by file create & destroy, directory create & destroy Find, List: Search or enumerate directory entries Rename: Change name of an entry without changing anything else about it Link, Unlink: Add or remove entry pointing to another entry elsewhere Introduces possibility of loops in directory graph Destroy: Removes directory; must be empty
23
File SystemsCS-502 Fall 200623 More on Directories Fundamental mechanism for interpreting file names in an operating system Orphan: a file not named in any directory Cannot be opened by any application (or even OS) May not even have name! Tools FSCK – check & repair file system, find orphans Delete_on_close attribute (in metadata) Special directory entry: “..” parent in hierarchy Essential for maintaining integrity of directory system Useful for relative naming
24
File SystemsCS-502 Fall 200624 Break
25
File SystemsCS-502 Fall 200625 Implementation of Files Map file abstraction to physical disk blocks Some goals –Efficient in time, space, use of disk resources –Fast enough for application requirements –Scalable to a wide variety of file sizes Many small files (< 1 page) Huge files (100’s of gigabytes, terabytes, spanning disks) Everything in between
26
File SystemsCS-502 Fall 200626 File Allocation Schemes Contiguous –Blocks of file stored in consecutive disk sectors –Directory points to first entry Linked –Blocks of file scattered across disk, as linked list –Directory points to first entry Indexed –Separate index block contains pointers to file blocks –Directory points to index block
27
File SystemsCS-502 Fall 200627 Contiguous Allocation Ideal for large, static files –Databases, fixed system structures, OS code –Multi-media video and audio –CD-ROM, DVD Simple address calculation –Block address sector address Fast multi-block reads and writes –Minimize seeks between blocks Prone to fragmentation when … Files come and go Files change size –Similar to unpaged virtual memory
28
File SystemsCS-502 Fall 200628 Bad Block Management – Contiguous Allocation Bad blocks must be concealed Foul up the block-to-sector calculation Methods Spare sectors in each track, remapped by formatting Look-aside list of bad sectors –Check each sector request against hash table –If present, substitute a replacement sector behind the scene Handling Disk controller, invisible to OS Lower levels of OS file system; invisible to application
29
File SystemsCS-502 Fall 200629 Contiguous Allocation – Extents Extent: a contiguously allocated subset of a file Directory entry contains –(For file with one extent) the extent itself –(For file with multiple extents) pointer to an extent block describing multiple extents Advantages –Speed, ease of address calculation of contiguous file –Avoids (some of) the fragmentation issues –Can be extended to support files across multiple disks Disadvantages –Degenerates to indexed allocation in Unix-like systems Popular in 1960s & 70s, currently used for large files in NTFS
30
File SystemsCS-502 Fall 200630 Linked Allocation Blocks scattered across disk –Each block contains pointer to next block –Directory points to first and last blocks –Sector header: Pointer to next block ID and block number of file 10 16 01 25
31
File SystemsCS-502 Fall 200631 Linked Allocation Advantages –No space fragmentation! –Easy to create, extend files –Ideal for lots of small files Disadvantages –Lots of disk arm movement –Space taken up by links –Sequential access only!
32
File SystemsCS-502 Fall 200632 Variation on Linked Allocation – File Allocation Table Instead of link on each block, put all links in one table –the FAT (File Allocation Table) One entry per physical block in disk –Directory points to first & last blocks of file –Each block points to next block (or EOF)
33
File SystemsCS-502 Fall 200633 FAT File Systems Advantages –Advantages of Linked File System –FAT can be cached in memory –Searchable at CPU speeds, pseudo-random access Disadvantages –Limited size, not suitable for very large disks –FAT cache describes entire disk, not just open files! –Not fast enough for large databases Used in MS-DOS, early Windows systems
34
File SystemsCS-502 Fall 200634 Disk Defragmentation Re-organize blocks in disk so that file is (mostly) contiguous Link or FAT organization preserved Purpose: –To reduce disk arm movement during sequential accesses
35
File SystemsCS-502 Fall 200635 Bad Block Management – Linked and FAT Systems In OS:– format all sectors of disk Don’t reserve any spare sectors Allocate bad blocks to a hidden file for the purpose If a block becomes bad, append to the hidden file Advantages Very simple No look-aside or sector remapping needed Totally transparent without any hidden mechanism
36
File SystemsCS-502 Fall 200636 Indexed Allocation i-node: –Part of file metadata –Data structure lists the sector address of each block of file Advantages –True random access –Only i-nodes of open files need to be cached –Supports small and large files
37
File SystemsCS-502 Fall 200637 Unix/Linux i-nodes Direct blocks: –Pointers to first n sectors Single indirect table: –Extra block containing pointers to blocks n+1.. n+m Double indirect table: –Extra block containing single indirect blocks …
38
File SystemsCS-502 Fall 200638 Indexed Allocation Access to every block of file is via i-node Bad block management –Similar to Linked/FAT systems Disadvantage –Not as fast as contiguous allocation for large databases Requires reference to i-node for every access vs. Simple calculation of block to sector address
39
File SystemsCS-502 Fall 200639 Questions?
40
File SystemsCS-502 Fall 200640 Scalability of File Systems Question: How large can a file be? Answer: limited by –Number of bits in length field in metadata –Size & number of block entries in FAT or i-node Question: How large can file system be? Answer: limited by –Size & number of block entries in FAT or i-node
41
File SystemsCS-502 Fall 200641 MS-DOS & Windows FAT-12 (primarily on floppy disks): 4096 512-byte blocks Only 4086 blocks usable! FAT-16 (early hard drives): 64 K blocks; block sizes up to 32 K bytes 2 GBytes max per partition, 4 partitions per disk FAT-32 (Windows 95) 2 28 blocks; up to 2 TBytes per disk Max size FAT requires 2 32 bytes in RAM!
42
File SystemsCS-502 Fall 200642 MS-DOS File System (continued) Maximum partition for different block sizes The empty boxes represent forbidden combinations
43
File SystemsCS-502 Fall 200643 Classical Unix Maximum number of i-nodes = 64K! How many files in a modern PC? I-node structure allows very large files, but … Limited by size of internal fields
44
File SystemsCS-502 Fall 200644 Modern Operating Systems Need much larger, more flexible file systems Many terabytes per system Multi-terabyte files Suitable for both large and small Cache only open files in RAM
45
File SystemsCS-502 Fall 200645 Examples of Modern File Systems Windows NTFS Silbershatz §22.5 Linux ext2fs Silbershatz §21.7.2 SUSE Administrator’s Manual, §20.2.2 Linux “Reiser” file system SUSE Administrator’s Manual, §20.2.1 & 20.2.5 Max file = 2 46 bytes; max file system = 2 45 bytes
46
File SystemsCS-502 Fall 200646 New Topic
47
File SystemsCS-502 Fall 200647 Implementation of Directories A list of [name, information] pairs Must be scalable from very few entries to very many Name: User-friendly, variable length Any language Fast access by name Information: File metadata (itself) Pointer to file metadata block (or i-node) on disk Pointer to first & last blocks of file Pointer to extent block(s) …
48
File SystemsCS-502 Fall 200648 Very Simple Directory Short, fixed length names Attribute & disk addresses contained in directory MS-DOS, etc. name1attributesname2attributesname3attributesname4attributes …
49
File SystemsCS-502 Fall 200649 Simple Directory Short, fixed length names Attributes in separate blocks (e.g., i-nodes) Attribute pointers are disk addresses (or i-node numbers) Older Unix versions name1name2name3name4… i-node Data structures containing attributes
50
File SystemsCS-502 Fall 200650 More Interesting Directory Variable length file names –Stored in heap at end Modern Unix, Windows Linear or logarithmic search for name Compaction needed after –Deletion, Rename attributes … name1 longer_na me3 very_long_n ame4 name2 …
51
File SystemsCS-502 Fall 200651 Very Large Directories Hash-table implementation Each hash chain like a small directory with variable-length names Must be sorted for listing
52
File SystemsCS-502 Fall 200652 Free Block Management Bitmap –Very compact on disk –Expensive to search –Supports contiguous allocation Free list –Linked list of free blocks Each block contains pointer to next free block –Only head of list needs to be cached in memory –Very fast to search and allocate –Contiguous allocation vary difficult
53
File SystemsCS-502 Fall 200653 Free Block Management Bit Vector … 012n-1 bit[i] = 0 block[i] free 1 block[i] occupied Free block number calculation (number of bits per word) * (number of 0-value words) + offset of first 1 bit
54
File SystemsCS-502 Fall 200654 Free Block Management Bit Vector (continued) Bit map –Must be kept both in memory and on disk –Copy in memory and disk may differ –Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk Solution: –Set bit[i] = 1 in disk –Allocate block[i] –Set bit[i] = 1 in memory –Similarly for set of contiguous blocks Potential for lost blocks in event of crash! –Discussion:– How do we solve this problem?
55
File SystemsCS-502 Fall 200655 Free Block Management Linked List Linked list of free blocks Not in order! Cache first few free blocks in memory Head of list must be stored both On disk & in memory Each block written to disk when freed Potential for losing blocks?
56
File SystemsCS-502 Fall 200656 Reading Assignment in Silbershatz File systems (general) – Chapter 11 Ignore §11.9, 11.10 for now!
57
File SystemsCS-502 Fall 200657 Break Next Topic
58
File SystemsCS-502 Fall 200658 Raw Disk Layout Track format – n sectors –200 < n < 2000 in modern disks –Some disks have fewer sectors on inner tracks Inter-sector gap –Enables each sector to be read or written independently Sector format –Sector address: Cylinder, Track, Sector –Optional header –Data –Each field separated by small gap and with its own CRC Sector length –Almost all operating systems specify uniform sector length –512 – 4096 bytes
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.