More on Disks and File Systems 1 CS502 Spring 2006 More on Disks and File Systems CS-502 Operating Systems Spring 2006.

Review – Disks
Implementer of the file abstraction
Storage of large amounts of data for very long times
–Persistence, reliability
Controlled like I/O devices, but an integral part of the information storage subsystem
Rapidly increasing capacities, dropping prices
–$0.5–$6.0 per gigabyte
Slowly improving transfer rates, seek performance
–Only a factor of 5-10 in three decades!

Review – Disks (continued)
Organized into cylinders, tracks, sectors
Random access
–Any sector can be read and written independently of any other
Very high bandwidth for consecutive reads or writes
Seek time is (often) the dominating factor in performance
Bad blocks are a fact of life
–Most detected during formatting
–Some occur during operation
–Controller or OS must step around them
Seek optimization algorithms
–Popular study topics, less popular in real systems
–Long seek queues ⇒ system is out of balance

Review – File Systems
Fundamental abstraction for persistent storage
Usually organized as a linear array of bytes
–Any sequence of bytes may be read or overwritten
Extreme performance demands
–Many small files vs. a few humongous files
Fundamental ambiguity
–Is the file the “information” or the “container”?
–OS sees the container; users focus on the information
Many attributes
–Stored in file metadata associated with the file

Review – File Systems (continued)
Operations
–Open, Close; Read, Write, Truncate; Seek, Tell
–Create; Destroy
Access methods
–Sequential
–Random
–Indexed (not used very much any more)
Structure imposed by applications
–Databases, libraries, executable images

Review – Directories
Special kind of file
–Tool for users to organize files
–Tool for system to find file containers
Organization
–Single level, two level, hierarchical
Directory operations
–Create, Destroy; Add entry, Remove entry
–Find, List, Rename; Link, Unlink
Links
–Soft (symbolic) links in Unix, Windows
–Hard links in Unix (reference counted in metadata)

Review – File System Implementation
Contiguous (with optional extents)
–Very efficient for large files (e.g., databases)
–Prone to space fragmentation for many small files
–Bad blocks must be concealed by OS or controller
Linked
–No space fragmentation; lots of seek fragmentation
–Sequential access only
–FAT (File Allocation Table) ⇒ pseudo-random access
Indexed
–i-node (index block) points to every block of file
–Fast random access
–Scales easily from small to large
–No space fragmentation; lots of seek fragmentation
Defragmentation
–Remapping linked, FAT, or indexed files to minimize seek time
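
The difference between FAT-style and indexed lookup can be sketched in a few lines. This is only an illustration: the table layout, the `END_OF_FILE` marker, and both function names are hypothetical and not taken from any real file system.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker


def fat_block(fat, first_block, k):
    """Find the disk block holding logical block k by walking the FAT chain.

    The walk takes k steps, but every step is a lookup in the in-memory
    table rather than a disk read, hence "pseudo-random" access.
    """
    block = first_block
    for _ in range(k):
        block = fat[block]
        if block == END_OF_FILE:
            raise IndexError("logical block is past the end of the file")
    return block


def indexed_block(index_block, k):
    """With an index block (i-node), logical block k is a single lookup."""
    return index_block[k]
```

For a file stored in disk blocks 7, 2, 9 (in that order), the FAT holds the chain 7 → 2 → 9 while the index block simply holds the list [7, 2, 9].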

Additional Topics
Implementation of Directories
CD-ROM devices and file systems
RAID – Redundant Array of Inexpensive Disks
Stable Storage
Log Structured File Systems

Implementation of Directories
A list of [name, information] pairs
–Must be scalable from very few entries to very many
Name
–User-friendly, variable length, any language
–Fast access by name
Information
–File metadata (itself)
–Pointer to file metadata block (or i-node) on disk
–Pointer to first & last blocks of file
–Pointer to extent block(s)
–…

Very Simple Directory
Short, fixed length names
Attributes & disk addresses contained in directory
MS-DOS, etc.
[Figure: directory entries name1 through name4, each stored together with its attributes]

Simple Directory
Short, fixed length names
Attributes in separate blocks (e.g., i-nodes)
Attribute pointers are disk addresses (or i-node numbers)
Older Unix versions
[Figure: directory entries name1 through name4, each pointing to an i-node; the i-nodes are the data structures containing attributes]

More Interesting Directory
Variable length file names
–Stored in heap at end
Modern Unix, Windows
Linear or logarithmic search for name
Compaction needed after
–Deletion, Rename
[Figure: directory entries holding attributes, with the names name1, name2, longer_name3, very_long_name4 stored in a heap at the end]

Very Large Directories
Hash-table implementation
Each hash chain like a small directory with variable-length names
Must be sorted for listing
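
A minimal sketch of a hashed directory, assuming each entry maps a name to an i-node number. The class name, the bucket count, and the hash function are all illustrative choices, not something the slides specify.

```python
class HashedDirectory:
    """Toy directory: a fixed set of hash chains, each a small list of entries."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _chain(self, name):
        # Simple deterministic string hash (illustrative only).
        h = 0
        for byte in name.encode("utf-8"):
            h = (h * 31 + byte) % len(self.buckets)
        return self.buckets[h]

    def add(self, name, inode_number):
        chain = self._chain(name)
        for i, (existing, _) in enumerate(chain):
            if existing == name:
                chain[i] = (name, inode_number)  # replace an existing entry
                return
        chain.append((name, inode_number))

    def lookup(self, name):
        # Only one chain is searched, however large the directory is.
        for existing, inode_number in self._chain(name):
            if existing == name:
                return inode_number
        return None

    def remove(self, name):
        chain = self._chain(name)
        chain[:] = [(n, i) for n, i in chain if n != name]

    def listing(self):
        # Hash order is meaningless to users, so names are sorted for listing.
        return sorted(name for chain in self.buckets for name, _ in chain)
```

Lookup touches a single chain, while `listing` has to gather every chain and sort the result, which is the cost the last bullet refers to.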

File System Implementation – Free Space Management
Bitmap
–Very compact on disk
–Expensive to search
Free list
–Linked list of free blocks
–Only head of list needs to be cached in memory
–Larger than bitmap: consumes 1/n of free space
–List shrinks as blocks are allocated and grows as they are freed
–Very fast to search and allocate
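
The two schemes can be compared with a small sketch. Both classes are hypothetical; the free list is simplified to one pointer per free block, whereas the slide's 1/n figure suggests list blocks that each hold n pointers.

```python
class BitmapAllocator:
    """One flag per disk block; allocation scans for the first free block."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks  # a single bit per block on a real disk

    def allocate(self):
        for block, is_free in enumerate(self.free):  # linear search
            if is_free:
                self.free[block] = False
                return block
        raise RuntimeError("no free blocks")

    def release(self, block):
        self.free[block] = True


class FreeListAllocator:
    """Linked list of free blocks; only the head is needed to allocate."""

    def __init__(self, n_blocks):
        # next_free[b] stands for the pointer stored inside free block b.
        # Assumes n_blocks >= 1.
        self.next_free = {b: b + 1 for b in range(n_blocks - 1)}
        self.next_free[n_blocks - 1] = None
        self.head = 0

    def allocate(self):
        if self.head is None:
            raise RuntimeError("no free blocks")
        block = self.head
        self.head = self.next_free.pop(block)  # constant-time unlink
        return block

    def release(self, block):
        self.next_free[block] = self.head  # push onto the front of the list
        self.head = block
```

The bitmap costs a scan on every allocation, while the free list only ever touches its head.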

CD-ROMs
See Tanenbaum, pp. 306-310
Audio CD
–Molded polycarbonate
–120 mm diameter with 15 mm hole
–One single spiral track
  –Starts in center, spirals outward
  –22,188 revolutions, approx. 5.6 kilometers long
–Constant linear velocity under read head
  –Audio playback: 120 cm/sec
  –Variable speed motor: 200 – 530 rpm
ISO standard IS 10149, aka the Red Book

CD-ROM (continued)
Problem for adapting to data usage
–No bad block recovery capability!
ISO standard for data: Yellow Book
–Three levels of error-correcting schemes: Symbol, Frame, Sector
–~7200 bytes to record 2048 byte payload per sector
–Mode 2: less error correction in exchange for more data rate
  –Audio and video data
–Sectors linearly numbered from center to edge
Read speed
–1x ~ 153,000 bytes/sec
–40x ~ 5.9 megabytes/sec
ISO standard for multi-media: Green Book
–Interleaved audio, video, data in same sector
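
The read-speed figures can be checked with a short calculation. It relies on one value that is not on the slide: a 1x drive delivers 75 sectors per second. The "megabytes" figure is taken here to mean units of 2^20 bytes, which is an assumption that happens to reproduce the slide's 5.9.

```python
SECTORS_PER_SECOND_AT_1X = 75   # not on the slide; standard 1x CD sector rate
PAYLOAD_BYTES_PER_SECTOR = 2048  # data payload per sector, from the slide


def read_rate_bytes_per_sec(speed_multiplier):
    """Payload bytes delivered per second at the given drive speed (1x, 40x, ...)."""
    return speed_multiplier * SECTORS_PER_SECOND_AT_1X * PAYLOAD_BYTES_PER_SECTOR


one_x = read_rate_bytes_per_sec(1)                  # 153,600 bytes/sec
forty_x_mb = read_rate_bytes_per_sec(40) / 2 ** 20  # about 5.86
```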

CD-ROM File System
ISO 9660 (High Sierra)
Write once ⇒ contiguous file allocation
Variable length directories
Variable length directory entries
–Points to first sector of file
–File size and metadata stored in directory entry
–Variable length names
Several extensions to standard for additional features

Break

Problem
Question:
–If mean time to failure of a disk drive is 100,000 hours,
–and if your system has 100 identical disks,
–what is the mean time between drive replacements?
Answer:
–1000 hours (i.e., 41.67 days ≈ 6 weeks)
I.e.:
–You lose 1% of your data every 6 weeks!
But don’t worry – you can restore most of it from backup!
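
The arithmetic behind the answer, as a small sketch. It assumes the disks fail independently and at a constant rate, so 100 disks fail 100 times as often as one; the function name is illustrative.

```python
def mean_time_between_replacements(disk_mttf_hours, n_disks):
    """Expected hours between failures somewhere in a set of identical disks."""
    return disk_mttf_hours / n_disks


hours = mean_time_between_replacements(100_000, 100)  # 1000.0 hours
days = hours / 24                                      # about 41.67 days
weeks = days / 7                                       # just under 6 weeks
```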

Can we do better?
Yes, mirrored
–Write every block twice, on two separate disks
–Mean time between simultaneous failure of both disks is 57,000 years
Can we do even better?
–E.g., use fewer extra disks?
–E.g., get more performance?
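
The slide does not say how the 57,000-year figure was obtained. One common textbook model reproduces it: data is lost only if the second disk fails while the first is still being repaired, giving a mean time to data loss of MTTF² / (2 × MTTR). The 10-hour repair time below is an assumption, not a value from the slide.

```python
HOURS_PER_YEAR = 24 * 365


def mirrored_mean_time_to_data_loss(disk_mttf_hours, repair_hours):
    """Mean hours until both disks of a mirrored pair are down at the same time.

    Assumes independent failures and that a failed disk is replaced and
    recopied within repair_hours.
    """
    return disk_mttf_hours ** 2 / (2 * repair_hours)


loss_hours = mirrored_mean_time_to_data_loss(100_000, repair_hours=10)
loss_years = loss_hours / HOURS_PER_YEAR  # roughly 57,000 years
```

The result is very sensitive to the repair time: doubling it halves the mean time to data loss.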

RAID – Redundant Array of Inexpensive Disks
Distribute a file system intelligently across multiple disks to
–Maintain high reliability and availability
–Enable fast recovery from failure
–Increase performance

“Levels” of RAID
Level 0 – non-redundant striping of blocks across disks
Level 1 – simple mirroring
Level 2 – striping of bytes or bits with ECC
Level 3 – Level 2 with parity, not ECC
Level 4 – Level 0 with parity block
Level 5 – Level 4 with distributed parity blocks

RAID Level 0 – Simple Striping
Each stripe is one or a group of contiguous blocks
Block/group i is on disk (i mod n)
Advantage
–Read/write n blocks in parallel; n times bandwidth
Disadvantage
–No redundancy at all. System MTBF is 1/n of disk MTBF!
[Figure: four disks]
Disk 0: stripe 0, stripe 4, stripe 8
Disk 1: stripe 1, stripe 5, stripe 9
Disk 2: stripe 2, stripe 6, stripe 10
Disk 3: stripe 3, stripe 7, stripe 11
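
The placement rule can be written out as a short sketch. The function name and the (disk, offset) return convention are illustrative; the offset is simply the stripe's position within its disk.

```python
def raid0_location(i, n_disks):
    """Map stripe number i to (disk, offset within that disk)."""
    return i % n_disks, i // n_disks


# Rebuild the four-disk layout from the figure: the stripes held by each disk.
layout = {disk: [] for disk in range(4)}
for stripe in range(12):
    disk, _ = raid0_location(stripe, 4)
    layout[disk].append(stripe)
```

Consecutive stripes land on different disks, which is what allows n of them to be transferred in parallel.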

RAID Level 1 – Striping and Mirroring
Each stripe is written twice
–Two separate, identical disks
Block/group i is on disks (i mod 2n) & ((i + n) mod 2n)
Advantages
–Read/write n blocks in parallel; n times bandwidth
–Redundancy: System MTBF = (Disk MTBF)² at twice the cost
–Failed disk can be replaced by copying
Disadvantage
–A lot of extra disks for much more reliability than we need
[Figure: two identical sets of four disks; in each set the disks hold stripes 0, 4, 8; stripes 1, 5, 9; stripes 2, 6, 10; and stripes 3, 7, 11]

RAID Levels 2 & 3
Bit- or byte-level striping
Requires synchronized disks
–Highly impractical
Requires fancy electronics
–For ECC calculations
Not used; academic interest only
See Silberschatz, §12.7.3 (pp. 471-472)

Observation
When a disk or stripe is read incorrectly, we know which one failed!
Conclusion:
–A simple parity disk can provide very high reliability (unlike simple parity in memory)

RAID Level 4 – Parity Disk
parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3
n stripes plus parity are written/read in parallel
If any disk/stripe fails, it can be reconstructed from others
–E.g., stripe 1 = stripe 0 xor stripe 2 xor stripe 3 xor parity 0-3
Advantages
–n times read/write bandwidth
–System MTBF = (Disk MTBF)² at 1/n additional cost
–Failed disk can be reconstructed “on-the-fly” (hot swap)
–Hot expansion: simply add n + 1 disks all initialized to zeros
[Figure: four data disks plus one parity disk]
Disk 0: stripe 0, stripe 4, stripe 8
Disk 1: stripe 1, stripe 5, stripe 9
Disk 2: stripe 2, stripe 6, stripe 10
Disk 3: stripe 3, stripe 7, stripe 11
Parity disk: parity 0-3, parity 4-7, parity 8-11
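
The parity and reconstruction equations translate directly into code. This is a sketch over small in-memory byte strings standing in for stripes; `xor_blocks` is an illustrative helper, not part of any RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of any number of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


stripes = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # stripes 0-3
parity = xor_blocks(*stripes)                    # parity 0-3

# Suppose the disk holding stripe 1 fails: rebuild it from the survivors.
rebuilt = xor_blocks(stripes[0], stripes[2], stripes[3], parity)
```

Reconstruction works because the failed disk is known: XOR-ing everything that survived cancels the surviving stripes out of the parity and leaves the missing one.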

RAID Level 5 – Distributed Parity
Parity calculation is same as RAID Level 4
Advantages & Disadvantages
–Same as RAID Level 4
Additional advantage: avoids beating up on parity disk
Writing individual stripes (RAID 4 & 5)
–Read existing stripe and existing parity
–Recompute parity
–Write new stripe and new parity
[Figure: five disks with rotating parity]
Disk 0: stripe 0, stripe 4, stripe 8, stripe 12
Disk 1: stripe 1, stripe 5, stripe 9, parity 12-15
Disk 2: stripe 2, stripe 6, parity 8-11, stripe 13
Disk 3: stripe 3, parity 4-7, stripe 10, stripe 14
Disk 4: parity 0-3, stripe 7, stripe 11, stripe 15
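
A sketch of the single-stripe write and of the rotating layout in the figure. The slide only says "recompute parity"; the XOR shortcut used here (new parity = old parity xor old stripe xor new stripe) is the usual way to do that without reading the other stripes. The placement function is inferred from the five-disk figure and is an assumption about the intended rule.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_stripe, old_parity, new_stripe):
    """New parity after overwriting one stripe, using only that stripe and the parity."""
    return xor_blocks(xor_blocks(old_parity, old_stripe), new_stripe)


def raid5_location(i, n_disks):
    """Map data stripe i to (disk, row, parity_disk) for the layout in the figure."""
    data_per_row = n_disks - 1
    row = i // data_per_row
    parity_disk = (n_disks - 1 - row) % n_disks  # parity moves one disk left per row
    k = i % data_per_row
    disk = k if k < parity_disk else k + 1       # data stripes skip the parity disk
    return disk, row, parity_disk
```

A single-stripe write therefore costs two reads and two writes, and in RAID 5 those parity writes are spread over all disks instead of hitting one parity disk.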

New Topic
Problem – how to protect against disk write operations that don’t complete
–Power or CPU failure in the middle of a block
–Related series of writes interrupted in middle
Examples:
–Database update of charge and credit
–RAID 1, 4, 5 failure between redundant writes

Solution (part 1) – Stable Storage
Write everything twice (separate disks)
–Be sure 1st write does not invalidate previous 2nd copy
–RAID 1 is okay; RAID 4/5 not okay!
–Read blocks back to validate; then report completion
Reading both copies
–If 1st copy okay, use it – i.e., newest value
–If 2nd copy different, update it with 1st copy
–If 1st copy error, use 2nd copy – i.e., old value

Stable Storage (continued)
Crash recovery
–Scan disks, compare corresponding blocks
–If one is bad, replace with good one
–If both good but different, replace 2nd with 1st copy
Result:
–If 1st block is good, it contains latest value
–If not, 2nd block still contains previous value
An abstraction of an atomic disk write of a single block
–Uninterruptible by power failure, etc.
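
The write, read, and recovery rules can be sketched with two dictionaries standing in for the two disks. Everything here is illustrative: the class and helper names are hypothetical, a CRC-32 checksum stands in for whatever the hardware uses to detect a bad block, and a crash is simulated with a flag.

```python
import zlib


def seal(data):
    """Store data together with a checksum so a bad copy can be detected."""
    return (data, zlib.crc32(data))


def is_good(record):
    return record is not None and zlib.crc32(record[0]) == record[1]


class StableStore:
    def __init__(self):
        self.first = {}   # 1st copy of every block
        self.second = {}  # 2nd copy, on a separate disk

    def write(self, block, data, crash_between_writes=False):
        self.first[block] = seal(data)
        if not is_good(self.first[block]):  # read back to validate
            raise IOError("1st copy failed validation")
        if crash_between_writes:            # simulated power failure
            return
        self.second[block] = seal(data)
        if not is_good(self.second[block]):
            raise IOError("2nd copy failed validation")

    def read(self, block):
        one, two = self.first.get(block), self.second.get(block)
        if is_good(one):
            if two != one:
                self.second[block] = one    # bring 2nd copy up to date
            return one[0]                   # newest value
        if is_good(two):
            return two[0]                   # old value
        raise IOError("both copies are bad")

    def recover(self):
        """Crash recovery: scan both disks and repair every block pair."""
        for block in set(self.first) | set(self.second):
            one, two = self.first.get(block), self.second.get(block)
            if is_good(one):
                if two != one:
                    self.second[block] = one
            elif is_good(two):
                self.first[block] = two
```

At every instant at least one copy holds either the complete old value or the complete new value, which is what makes the single-block write look atomic.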

What about more complex disk operations?
E.g., file create operation involves
–Allocating free blocks
–Constructing and writing i-node
  –Possibly multiple i-node blocks
–Reading and updating directory
What if system crashes with the sequence only partly completed?
Answer: inconsistent data structures on disk

Solution (Part 2) – Log-Structured File System
Make changes to cached copies in memory
Collect together all changed blocks
Write to log file
–A circular buffer on disk
–Fast, contiguous write
Update log file pointer in stable storage
Offline: play back log file to actually update directories, i-nodes, free list, etc.
–Update playback pointer in stable storage
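
The sequence of steps can be sketched as follows. This is a toy model under stated assumptions: a growing Python list replaces the circular buffer, two integers stand for the pointers kept in stable storage, and all names are hypothetical.

```python
class LoggedDisk:
    def __init__(self):
        self.home = {}    # blocks at their permanent locations on disk
        self.log = []     # on-disk log of (block_id, data) records
        self.log_end = 0  # log file pointer (kept in stable storage)
        self.played = 0   # playback pointer (kept in stable storage)

    def commit(self, changed_blocks, crash_before_pointer_update=False):
        """Append all changed blocks in one contiguous write, then move the pointer."""
        self.log.extend(changed_blocks.items())
        if crash_before_pointer_update:  # simulated crash
            return
        self.log_end = len(self.log)     # the commit point

    def play_back(self):
        """Offline step: apply committed records to their home locations."""
        for block_id, data in self.log[self.played:self.log_end]:
            self.home[block_id] = data
        self.played = self.log_end

    def recover(self):
        """After a crash: drop records past the log pointer, replay the rest."""
        del self.log[self.log_end:]
        self.play_back()
```

A group of changes takes effect only when the log pointer moves, so a crash leaves the disk reflecting either all of a commit or none of it.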

Transaction Data Base Systems
Similar techniques
–Every transaction is recorded in log before recording on disk
–Stable storage techniques for managing log pointers
–Once log entry is confirmed, disk can be updated in place
–After crash, replay log to redo disk operations

Unix LFS
Tanenbaum, §6.3.8, pp. 428-430
Everything is written to log
–i-nodes point to updated blocks in log
–i-node cache in memory updated whenever i-node is written
–Cleaner daemon follows behind to compact log
Advantages:
–LFS is always consistent
–LFS performance
  –Much better than Unix FS for small writes
  –At least as good for reads and large writes

Break
