Download presentation
Presentation is loading. Please wait.
Published byAndrew Marcus Tucker Modified over 9 years ago
1
1 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY Filesystems – Case studies Topics: FAT UNIX V7
2
2 FAT (MS-DOS) Filesystems MS-DOS, Windows 98, Windows ME Still supported in Windows NT, XP, Vista Its use has been shifted towards embedded devices such as Digital cameras MP3 playes iPod (the default filesystem)
3
3 Directories Filenames are limited to 8+3 characters.. Smaller ones are left justified and padded with space. Attributes: read-only, archive, hidden, system Time represented with 2 bytes: correct upto +-2 second Date: Counts in three fields: day (5 bit), month (4 bits), year (7 bits). Contains: Y2108 problem File size: 2 bytes. Theoretically 4GB limit, but the limit is 2GB due to other reasons. 10 bits reserved for future use.
4
4 File Allocation Table MS-DOS keeps track of files through FAT table hold in memory. First block number (2 bytes) used as the index to the FAT table which has 64K entries. Block size can be set as multiple of 512 byes. Three versions of FAT depending on the number of bits a disk address contains: FAT-12 FAT-16 FAT-32 (actually should be called as FAT-28) FAT is also used for keeping track of free blocks.
5
5 FAT-12 Block size: 512 bytes (2^12-10) X 512 bytes =~ 2MB 10 disk addresses are used as special markers. FAT table size: 4096 entries with 2 bytes each. Worked well for floppy disks. The limit was extended using larger block sizes: 1KB, 2KB, 4KB providing support for partitions 16MB. Limited for hard disks.
6
6 Figure 4-32. Maximum partition size for different block sizes. The empty boxes represent forbidden combinations. The MS-DOS File System (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
7
7 A UNIX V7 directory entry. The UNIX V7 File System (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
8
8 A UNIX i-node. The UNIX V7 File System (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
9
9 Figure 4-35. The steps in looking up /usr/ast/mbox. The UNIX V7 File System (3) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
10
10
11
11
12
12
13
13
14
14
15
15 The ISO 9660 directory entry. The ISO 9660 File System Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
16
16 Rock Ridge extension fields: PX - POSIX attributes. PN - Major and minor device numbers. SL - Symbolic link. NM - Alternative name. CL - Child location. PL - Parent location. RE - Relocation. TF - Time stamps. Rock Ridge Extensions Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
17
17 Joliet extension fields: Long file names. Unicode character set. Directory nesting deeper than eight levels. Directory names with extensions Joliet Extensions Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
18
18 File System Mounting A file system must be mounted before it can be accessed A unmounted file system is mounted at a mount point
19
19 Virtual File Systems VFS: another layer of abstraction Upper interface: for processes implementing POSIX interface Lower interface: for concrete file systems VFS translates the POSIX calls to the calls of the filesystems under it.
20
20 VFS- 2 At boot time, the root filesystem is registered with VFS. When other filesystems are mounted, they must also register with VFS. When a filesystem registers, it provides the list of addresses of the functions that the VFS demands, such as reading a block. After registration, when one opens a file: open(“/usr/include/unistd.h”, O_RDONLY) VFS creates a v-node and makes a call to the concrete filesystem to return all the information needed. The created v-node also contains pointers to the table of functions for the concrete filesystem that the file resides.
21
21 A simplified view of the data structures and code used by the VFS and concrete file system to do a read. Virtual File Systems (2)
22
22 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY Filesystem Corruption and Backups
23
23 Filesystem corruption What happens when you are making changes to a filesystem and the system crashes? Example: Modifying block 5 of a large directory, adding lots of new file entries System crashes while the block is being written The new files are “lost!” System runs fsck program on reboot Scans through the entire filesystem and locates corrupted inodes and directories Can typically find the bad directory, but may not be able to repair it! The directory could have been left in any state during the write fsck can take a very long time on large filesystems And, no guarantees that it fixes the problems anyway
24
24 Example: Example: removing a file requires Remove the file from its directory Release the i-node to the pool of free i-nodes Return all the disk blocks to the pool of free disk blocks In the absence of crashes the order these steps taken do not matter. In the presence of crashes, however, it does!
25
25 Example: Example: removing a file requires Remove the file from its directory Release the i-node to the pool of free i-nodes Return all the disk blocks to the pool of free disk blocks The inodes and file blocks will not be accessible from any file yet they will not be available for reassignment.
26
26 Example: Example: removing a file requires Remove the file from its directory Release the i-node to the pool of free i-nodes Return all the disk blocks to the pool of free disk blocks The directory node will point to an invalid inode or (if the inode is reassigned) point to a different file. The blocks of the file will not be available for reassignment.
27
27 Example: Example: removing a file requires Remove the file from its directory Release the i-node to the pool of free i-nodes Return all the disk blocks to the pool of free disk blocks The file will point to empty blocks, or (after reassignment) it will share the blocks of other files to which these were reassigned..
28
28 File system consistency Consistency check is typically done after a crash.. UNIX: fsck Windoze: scandisk Redundant information in the filesystem is used: Check the blocks Check the files Two tables, each containing a counter initialized to 0 Blocks in use: How many times a block is present in a file Read all the i-nodes using a raw device (not through the filesystem calls) For each block that is referenced in the inode structure, increment the corresponding block use counter by one. Free blocks: Examine the free block list of free block bitmap structure Each appearance of a free block increments the counter by one
29
29 File system states. (a) Consistent. (b) Missing block. (c) Duplicate block in free list. (d) Duplicate data block. File System Consistency Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
30
30 Harmless but wastes space Action: Add the missing block to the free list. Missing block Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
31
31 Duplicate block in free list Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 Can only occur in linked list representation. Bitmap representation does not have this problem. Action: Rebuild the free list.
32
32 Duplicate data block Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 The worst thing that can happen! A data block appears in two different files.. Action: Allocate a free block Copy the contents into the new block. Change the links such that each copy appears once in each file. For sure, the contents of one of the files is garbled. The filesystem is made to be consistent. The user is informed.
33
33 Consistency check for directories Uses a table of counters per file (rather than per block) Starts from the root and traverses the tree For each i-node, it increments the corresponding counter for that file Remember due to hard links, a file can appear more than once It then checks the link counts stored in the i-nodes to these values. If the link count > counter Even if the file is deleted by from all the directory entries, it will continue to exist. Solution: correct the link count If the counter > link count Although the file is linked from, say, two directories, removal from one would cause the i-node deleted leaving the other one invalid. Solution: correct the link count
34
34 Journaling Filesystems Ensure that changes to the filesystem are made atomically That is, a group of changes are made all together, or not at all In the directory modification example, this means that after the system reboots: The directory either looks exactly as it did before the block was modified Or the directory looks exactly as it did after the block was modified Cannot leave an FS entity (data block, inode, directory, etc.) in an intermediate state! Idea: Maintain a log of all changes to the filesystem Log contains entries that indicate what was done e.g., “Directory 2841 had inodes 404, 407, and 408 added to it” To make a filesystem change: 1. Write an intent-to-commit record to the log 2. Write the appropriate changes to the log Do not modify the filesystem data directly!!! 3. Write a commit record to the log This is essentially the same as the notion of database transactions
35
35 Journaling FS Recovery What happens when the system crashes? Filesystem data has not actually been modified, just the log! So, the FS itself reflects only what happened before the crash Periodically synchronize the log with the filesystem data Called a checkpoint Ensures that the FS data reflects all of the changes in the log No need to scan the entire filesystem after a crash... Only need to look at the log entries since the last checkpoint! For each log entry, see if the commit record is there If not, consider the changes incomplete, and don't try to make them
36
36 Journaling FS Example File 2File 1 Log Checkpoint
37
37 Journaling FS Example File 2File 1 Log Checkpoint
38
38 Journaling FS Example File 2File 1 Log Checkpoint Filesystem reflects changes up to last checkpoint Fsck scans changelog from last checkpoint forward Doesn't find a commit record... changes are simply ignored
39
39 Backups are generally made to handle one of two potential problems: Recover from disaster. Recover from stupidity. Thrash bins File System Backups (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
40
40 In order to have storage efficiency, one can choose not to backup: Executable files (since they can usually be restored from manufacturer CD-ROMs Temporary files /tmp Special files (which correspond to I/O devices) /dev Backup what? Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
41
41 Back-up issues Full backup backup all the filesystem providing a snapshot of the system at that point which can be fully restored Typically done weekly or monthly Incremental backup Backup only the files that were changed after the most recent backup Typically, a weekly backup is followed by daily incremental backups Smaller backup size, and faster Compressed backups Reduced storage Less secure: a single bad byte can screw the whole backup Backing up and active filesystem is tricky Since during backup, files and directories are being modified Makes the system less secure Each backup tape/disk needs to be as safe as the serve itself.. It doesn’t matter if your backup tapes are lying around, even if you have the most secure computer system..
42
42 Dumping strategies – physical dump Algorithm Starts at block 0 of the disk and copies all the data onto a tape/disk Pros Simple to implement in a bug-free way Fast Cons Backups the free blocks as well Backuping a free disk takes as much storage/time as a full disk Dumping of “bad blocks” is a concern Typically disk controllers provide bad block replacement transparently without the OS even knowing about it If a block goes bad after formatting, then the OS typically creates a “file” consisting of all the bad blocks.
43
43 Dumping strategies – logical dump Pros Allows to restore a single file. If directories that lie on the path to the file-to-be-restored were deleted, then they would also be restored. Cons Slow and complicated.
44
44 Dumping strategies – logical dump Algorithm Full backup: Traverses the filesystem as a tree and creates the same filesystem structure on the backup disk/tape. Partial backup: all the directories on the path to the particular file needs to be saved. For instance, backing up file 9 requires the saving of directories, 1,5,6, and 7. Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
45
45 Dumping strategies – incremental logical dump Bitmaps, indexed by i-node number are used. Phase 1: Examine all files and directories below the starting directory (root in this case), and mark the modified files. mark ALL THE DIRECTORIES Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
46
46 Dumping strategies – incremental logical dump Phase 2: Recursively walk the tree again, and UNMARK all the directories that do not have any modified files under them. Note: 10 and 11 are unmarked 5 and 6 remain marked Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
47
47 Dumping strategies – incremental logical dump Phase 3: Scan the i-nodes in numerical order and Dump all the marked directories Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
48
48 Dumping strategies – incremental logical dump Phase 4: dump the marked files. Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
49
49 Dumping strategies – incremental logical dump Restoring: Restore all the directories that were backupped Restore all the files Squares are directories, circles are files. Shaded items have been modified since last dump. Each directory and file is labeled by its i-node number.
50
50 Issues Links: If a file is linked to more than one directory, only one copy should be saved. Holes: In UNIX, some file, such as core files may contain holes. These files, write a few bytes, and then seek to a distant file offset and write some more bytes. These empty blocks that are not written should not be dumped and stored. Cores typically have may megabytes of empty blocks. Special files Such as named pipes (which can appear anywhere in the filesystem) should not be dumped.
51
51 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY Filesystem Caching Topics: Disks
52
52 File System Caching Most filesystems cache significant amounts of disk in memory e.g., Linux tries to use all “free” physical memory as a giant cache Avoids huge overhead for going to disk for every I/O Issues: When do you commit a write to the disk? What happens if you write only to the memory cache and then the system crashes? How do you keep the memory and disk views of a file consistent? What if the file metadata (inodes, etc.) is modified before the data blocks? Read-ahead Read a few extra blocks into memory when you do one read operation Amortize the cost of the seek Useful if the blocks of a file are laid out in contiguous blocks Take advantage of sequential access patterns on the file
53
53 Caching Reading a 32-bit word from memory takes 10 nsec. Hard disks can transfer data at: 100MB/sec, that is 40nsec per 32-bit words.. PLUS 5-10 msecs of seek time! Caching aims to fill in the gap.. Often thousands of blocks are kept in cache.
54
54 Hash the device and the disk address and look up the result in a hash table. All the blocks with the same cache value are chained together through a linked list. In addition to this, a bidirectional link runs through all the blocks implementing a LRU list. Caching (1) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
55
55 Some blocks, such as i-node blocks, are rarely referenced two times within a short interval. Consider a modified LRU scheme, taking two factors into account: Is the block likely to be needed again soon? Is the block essential to the consistency of the file system? For both questions, blocks can be divided into categories i-node blocks Indirect blocks Directory blocks Data blocks Full Partially full Blocks that are essential for filesystem consistency should be written to disk immediately – write-through-caches UNIX: synch (every 30 seconds) Windows: In the past: none, recent ones: FlushFileBuffers Caching (2) Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
56
56 Block read ahead The system tries to get blocks into the cache, before they are accessed to increase the hit rate. Sequential access: performance improvement Random access: performance degradation
57
57 Figure 4-29. (a) I-nodes placed at the start of the disk. (b) Disk divided into cylinder groups, each with its own blocks and i-nodes. Reducing Disk Arm Motion Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.