Computer Architecture, Fall 2008 © November 12, 2007 Nael Abu-Ghazaleh
Lecture 24: Disk IO and RAID
CS : Computer Architecture
Interfacing Processor with peripherals
  [Figure labels: processor (L1 instruction cache, L1 data cache, L2 cache, bus interface), front side bus (aka system bus), I/O bridge, memory bus, main memory, to I/O]
Another view
Disk Access
  Seek: position the head over the proper track (5 to 15 ms average)
  Rotate: wait for the desired sector to arrive under the head (on average half a rotation, i.e. 0.5 / RPM minutes); RPM currently ranges from 5,400 to 15,000
  Transfer: get the data (30-100 MBytes/sec)
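The three components above add up to the time for one disk access. The sketch below is a rough model only: the function names are made up for illustration, and it assumes the request is serviced as one seek, half a rotation on average, and one contiguous transfer, ignoring controller overhead and queueing.

```python
def rotational_latency_ms(rpm):
    """Average rotational delay: half a rotation, converted from minutes to ms."""
    return 0.5 / rpm * 60_000.0


def access_time_ms(seek_ms, rpm, request_bytes, transfer_mb_per_s):
    """Seek + average rotational latency + transfer time, all in milliseconds."""
    transfer_ms = request_bytes / (transfer_mb_per_s * 1e6) * 1000.0
    return seek_ms + rotational_latency_ms(rpm) + transfer_ms


if __name__ == "__main__":
    # A 4 KB read on a 15,000 RPM disk with a 5 ms seek and 100 MB/s transfer.
    print(access_time_ms(5.0, 15_000, 4096, 100.0))
```

For small requests the seek and rotation terms dominate: the transfer of 4 KB at 100 MBytes/sec takes only about 0.04 ms.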
Manufacturing Advantages of Disk Arrays
  Disk product families, from low end to high end
  Conventional: 4 disk designs (14", 10", 5.25", 3.5")
  Disk array: 1 disk design
RAID: Redundant Array of Inexpensive Disks
  RAID 0: Striping (a misnomer: non-redundant)
  RAID 1: Mirroring
  RAID 2: Striping + Error Correction
  RAID 3: Bit striping + Parity Disk
  RAID 4: Block striping + Parity Disk
  RAID 5: Block striping + Distributed Parity
  RAID 6: Multiple parity checks
Non-Redundant Array
  Striped: write sequential blocks across the disk array
  [Figure: two disks, one holding the odd blocks and the other the even blocks]
  High performance
  Poor reliability: MTTF_Array = MTTF_Disk / N
    – MTTF_Disk = 50,000 hours (about 6 years)
    – N = 70 disks
    – MTTF_Array = about 700 hours (about 1 month)
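The reliability arithmetic on this slide can be checked with a few lines. This is a sketch under the usual simplifying assumption (not stated on the slide) that disks fail independently at a constant rate, so an array with no redundancy fails as soon as any one of its N disks fails. The function name is illustrative.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def array_mttf_hours(disk_mttf_hours, n_disks):
    """MTTF of a non-redundant array: MTTF_Disk / N (independent failures assumed)."""
    return disk_mttf_hours / n_disks


if __name__ == "__main__":
    disk_mttf = 50_000
    array_mttf = array_mttf_hours(disk_mttf, 70)
    print(f"one disk:   {disk_mttf / HOURS_PER_YEAR:.1f} years")
    print(f"70 disks:   {array_mttf:.0f} hours = {array_mttf / HOURS_PER_DAY:.0f} days")
```

The exact quotient is a little over 714 hours, which the slide rounds to 700 hours, or roughly one month.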
Redundant Arrays of Disks
  Files are "striped" across multiple spindles
  Redundancy yields high data availability
  When disks fail, contents are reconstructed from data redundantly stored in the array
  High reliability comes at a cost:
    – Reduced storage capacity
    – Lower performance
RAID 1: Mirroring
  Each disk is fully duplicated onto its “shadow”: very high availability
  Bandwidth sacrifice on writes: one logical write = two physical writes
  Reads may be optimized
  Most expensive solution: 100% capacity overhead
  Used in high I/O rate, high availability environments
RAID 3: bit striping + parity
  One parity bit for each bit position across the striped data disks
  Parity is relatively easy to compute
  How does it perform for small reads/writes?
Redundant Arrays of Disks
RAID 3: Parity Disk
  [Figure: a logical record striped into physical records across the data disks, plus a parity disk P]
  Parity computed across the recovery group to protect against hard disk failures
    – 33% capacity cost for parity in this configuration
    – Wider arrays reduce capacity costs, decrease expected availability, increase reconstruction time
  Arms logically synchronized, spindles rotationally synchronized
    – Logically a single high capacity, high transfer rate disk
  Targeted for high bandwidth applications: scientific, image processing
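Parity across a recovery group is an XOR, and the same XOR rebuilds a lost disk from the survivors. The following is a minimal sketch with made-up helper names, treating each disk's contribution as a short byte string; a real controller works on whole sectors in hardware.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # three data disks
    parity = xor_blocks(data)                         # contents of disk P
    # Disk 1 fails: recover its contents from disks 0, 2 and P.
    print(reconstruct([data[0], data[2]], parity) == data[1])
```

With three data disks and one parity disk, as in this example, the parity disk adds one third to the capacity needed for data, matching the 33% figure on the slide.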
RAID 4 (Block interleaved parity)
Redundant Arrays of Disks
RAID 5+: High I/O Rate Parity
  A logical write becomes four physical I/Os (read old data, read old parity, write new data, write new parity)
  Independent writes possible because of interleaved parity
  Reed-Solomon codes ("Q") for protection during reconstruction
  Targeted for mixed applications
  Layout (disk columns left to right, logical disk addresses increasing downward, each entry one stripe unit):
    D0   D1   D2   D3   P
    D4   D5   D6   P    D7
    D8   D9   P    D10  D11
    D12  P    D13  D14  D15
    P    D16  D17  D18  D19
    D20  D21  D22  D23  P
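Two things on this slide lend themselves to a short sketch: where the rotating parity sits in the five-disk layout shown, and why a small write costs four physical I/Os. The function names are illustrative, and the mapping is only one reading of the slide's figure (parity moving one column to the left per stripe); real arrays use several different rotation schemes.

```python
def parity_disk(stripe, n_disks):
    """Column holding parity for a stripe; moves one column left per stripe."""
    return (n_disks - 1 - stripe) % n_disks


def locate(unit, n_disks):
    """Map logical stripe unit D<unit> to (stripe row, disk column)."""
    stripe, index = divmod(unit, n_disks - 1)
    p = parity_disk(stripe, n_disks)
    # Data units fill the non-parity columns left to right.
    disk = index if index < p else index + 1
    return stripe, disk


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def new_parity(old_data, old_parity, new_data):
    """Parity after a small write.

    I/O 1 and 2 read old_data and old_parity; I/O 3 and 4 write new_data
    and the value returned here. No other disk in the stripe is touched.
    """
    return xor(xor(old_parity, old_data), new_data)


if __name__ == "__main__":
    print([parity_disk(s, 5) for s in range(6)])
    print(locate(7, 5), locate(16, 5))
```

Because only the target disk and the parity disk are involved, two small writes that land on different data disks and different parity disks can proceed at the same time, which is what the slide means by independent writes.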
Nested RAID levels
  RAID 01 and 10 combine mirroring and striping
    – Combine high performance (striping) and reliability (mirroring)
    – Get reliability without having to compute parities: higher performance and a less complex controller
  RAID 05 and 50 (also called 53)
Operating System can help (1): Reducing access time
  Disk defragmentation: why does that work?
  Disk scheduling: the operating system can reorder requests
    – How does it work? Reduce seek time
    – Examples: shortest seek distance first, elevator algorithm, typewriter algorithm
    – Let's do an example
  Log structured file systems
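As a worked example of request reordering, the sketch below compares three policies on one request queue by total head movement in tracks. The function names, the request queue, and the starting track are all made up for illustration. It also assumes particular readings of the policy names: the elevator sweeps up and then back down, and the typewriter sweeps in one direction only and then returns to the lowest pending track, like a carriage return.

```python
def total_seek(start, order):
    """Total head movement (in tracks) when serving requests in this order."""
    distance, position = 0, start
    for track in order:
        distance += abs(track - position)
        position = track
    return distance


def shortest_seek_first(start, requests):
    """Always serve the pending request closest to the current head position."""
    pending, position, order = list(requests), start, []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def elevator(start, requests):
    """Sweep upward serving requests, then reverse and sweep downward."""
    upward = sorted(t for t in requests if t >= start)
    downward = sorted((t for t in requests if t < start), reverse=True)
    return upward + downward


def typewriter(start, requests):
    """Sweep upward only; then jump back to the lowest request and sweep up again."""
    upward = sorted(t for t in requests if t >= start)
    wrapped = sorted(t for t in requests if t < start)
    return upward + wrapped


if __name__ == "__main__":
    queue, head = [98, 183, 37, 122, 14, 124, 65, 67], 53
    for policy in (shortest_seek_first, elevator, typewriter):
        print(policy.__name__, total_seek(head, policy(head, queue)))
    print("arrival order", total_seek(head, queue))
```

Shortest seek first gives the least head movement on this queue, but it can starve requests far from the head; the two sweeping policies bound how long any request waits.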
Log structured file systems
  Idea: most reads to disk are serviced from the cache (locality!)
  But what about writes? They have to go to disk; if the system crashes, the file system is compromised
  How can we make updates perform better?
    – Save them in a log (sequentially) instead of their original location; why does that help?
    – Tricky to manage
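The idea of writing updates sequentially can be shown with a toy model. This is a sketch, not a file system: the class and method names are invented, the "disk" is a Python list, and the index that maps each block to its latest copy is kept only in memory.

```python
class LogStructuredStore:
    """Toy log: every write is appended; an index finds the newest copy."""

    def __init__(self):
        self.log = []     # sequential log of (block_id, data) records
        self.index = {}   # block_id -> position of its latest record

    def write(self, block_id, data):
        """Append the update at the end of the log and return its position."""
        position = len(self.log)
        self.log.append((block_id, data))
        self.index[block_id] = position
        return position

    def read(self, block_id):
        return self.log[self.index[block_id]][1]

    def live_fraction(self):
        """Share of log records that are still the current version."""
        return len(self.index) / len(self.log) if self.log else 1.0


if __name__ == "__main__":
    store = LogStructuredStore()
    # Updates to scattered blocks land at consecutive log positions.
    print([store.write(b, "v1") for b in (900, 3, 512)])
    store.write(3, "v2")
    print(store.read(3), store.live_fraction())
```

Consecutive log positions mean no seek between writes, which is why the log helps. The falling live fraction hints at why it is tricky to manage: stale records have to be found and their space reclaimed.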
Operating System can help (2): Reliability
  RAIDs tolerate disk failures, not CPU failures or software bugs
    – If the CPU writes corrupt data to all redundant disks, what can we do?
  Backups
  Reliability in the operating system
How are files allocated on disk?
  Index block: has pointers to the other blocks in the file
  Alternative: linked allocation
  Data and metadata are both stored on disk
  What do we do for bigger files?
Unix Inodes
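One answer to the bigger-files question is the multi-level index used by classic Unix inodes: a few direct pointers, then single, double, and triple indirect blocks. The sketch below computes how far each level reaches. The parameters (12 direct pointers, 4 KB blocks, 4-byte block pointers) are assumptions chosen for illustration, not values from the slides, and the function names are made up.

```python
BLOCK_SIZE = 4096          # bytes per block (assumed)
POINTER_SIZE = 4           # bytes per block pointer (assumed)
DIRECT_POINTERS = 12       # direct pointers in the inode (assumed)
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE


def level_capacities():
    """Number of data blocks reachable through each level of the index."""
    return [
        ("direct", DIRECT_POINTERS),
        ("single indirect", POINTERS_PER_BLOCK),
        ("double indirect", POINTERS_PER_BLOCK ** 2),
        ("triple indirect", POINTERS_PER_BLOCK ** 3),
    ]


def level_for_block(block_index):
    """Which level of the index holds the pointer to this file block."""
    for name, capacity in level_capacities():
        if block_index < capacity:
            return name
        block_index -= capacity
    raise ValueError("file block index beyond the maximum file size")


def max_file_bytes():
    return sum(capacity for _, capacity in level_capacities()) * BLOCK_SIZE


if __name__ == "__main__":
    print(level_for_block(11), "/", level_for_block(12))
    print(max_file_bytes() / 2 ** 40, "TiB")
```

Small files need only the direct pointers, so they pay no extra disk reads; each further level costs one more block read but multiplies the reachable size by the number of pointers per block.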
Disk reliability
  Any update to disk changes both data and metadata
    – Requires several writes
  The operating system may reorder them, as we saw
  What happens if there is a crash?
    – Let's look at examples
  Solution: journaling file system
    – Update the journal before updating the file system
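The journal-first rule can be illustrated with a toy model of a crash in the middle of a multi-block update. Everything here is hypothetical: the class, its methods, and the block names are invented for the example, and a dictionary stands in for the disk. The sketch journals full block contents and marks a transaction complete with a commit record, which is one common design but not the only one.

```python
class JournaledDisk:
    """Toy write-ahead journal: log the whole update, commit, then apply it."""

    def __init__(self):
        self.blocks = {}    # home locations: block name -> contents
        self.journal = []   # sequential journal records

    def log_transaction(self, updates):
        """Step 1: record every block of the update, then a commit record."""
        for name, value in updates.items():
            self.journal.append(("update", name, value))
        self.journal.append(("commit",))

    def apply(self, updates, crash_after=None):
        """Step 2: write blocks in place; optionally 'crash' part-way through."""
        for count, (name, value) in enumerate(updates.items()):
            if crash_after is not None and count >= crash_after:
                return False          # simulated crash: later blocks never written
            self.blocks[name] = value
        return True

    def recover(self):
        """After a crash: replay committed transactions, drop uncommitted ones."""
        pending = {}
        for record in self.journal:
            if record[0] == "update":
                pending[record[1]] = record[2]
            else:                     # commit record: the transaction is complete
                self.blocks.update(pending)
                pending = {}
        self.journal.clear()


if __name__ == "__main__":
    disk = JournaledDisk()
    update = {"inode": "size=1 block", "bitmap": "block 7 in use", "data": "hello"}
    disk.log_transaction(update)
    disk.apply(update, crash_after=1)   # only the inode reaches its home location
    disk.recover()
    print(disk.blocks == update)
```

Without the journal, the crash would leave an inode that points at a block the bitmap still considers free. With it, recovery either completes the whole update or, if the commit record never made it to the journal, ignores it entirely.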
Flash Memory
  Emerging technology for non-volatile storage
    – Competitor to hard disks, especially for the embedded market
    – Can be used as a cache for the disk (much larger than RAM disks for the same price, and persistent)
  Floating gate transistors: semiconductor technology (like microprocessors and memory)
    – We know how to build them big (or small!) and cheap
    – Faster and lower power than disk drives
    – ...but still more expensive, and with some limitations
  Two types of flash memory: NAND and NOR
NOR Flash
  Accessed like regular memory, with faster read time
    – Used for executables/firmware that don't need to change often (PDA and cellphone code, etc.)
    – Can be executed in place
  Bad write/erase performance (2 seconds to erase a block!)
  Bad wear properties (100,000 writes average lifetime)
NAND Flash
  Accessed like a block device (like a disk drive)
    – Higher density, lower cost
  Faster write/erase time; longer write life expectancy
  Well suited for cameras, mp3 players, USB drives...
  Less reliable than NOR (requires error correction codes)
Different properties from Disks
  Flash memory has quite different properties from disks
    – Emphasis on seek time is gone
  A segment must be erased before it is written (small writes are expensive!)
    – Slow (especially NOR erase/write and NAND random-access reads)
    – Must be done in large segments (10s of KBytes)
    – Can only be rewritten a limited number of times
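The cost of a small write follows from the erase-before-write rule. The sketch below models one segment under two assumptions that are not on the slide: the segment is 16 KBytes, and an erased cell reads as all ones, with programming only able to clear bits. The class and function names are invented, and the naive read-erase-reprogram update shown is the worst case, not how a real flash translation layer behaves.

```python
SEGMENT_SIZE = 16 * 1024   # bytes per erase segment (assumed)


class FlashSegment:
    def __init__(self, size=SEGMENT_SIZE):
        self.cells = bytearray(b"\xff" * size)   # erased state: all ones
        self.erase_count = 0                      # wear: erases are limited

    def erase(self):
        self.cells[:] = b"\xff" * len(self.cells)
        self.erase_count += 1

    def can_program(self, offset, data):
        """True if writing data only needs bits to go from 1 to 0."""
        return all(self.cells[offset + i] & b == b for i, b in enumerate(data))

    def program(self, offset, data):
        if not self.can_program(offset, data):
            raise ValueError("segment must be erased first")
        self.cells[offset:offset + len(data)] = data


def update(segment, offset, data):
    """Write data in place; return the number of bytes physically programmed."""
    if segment.can_program(offset, data):
        segment.program(offset, data)
        return len(data)
    # Otherwise: save the segment, erase all of it, and reprogram everything.
    image = bytearray(segment.cells)
    image[offset:offset + len(data)] = data
    segment.erase()
    segment.program(0, bytes(image))
    return len(image)


if __name__ == "__main__":
    seg = FlashSegment()
    print(update(seg, 0, b"abcd"), seg.erase_count)   # fresh segment: cheap
    print(update(seg, 0, b"wxyz"), seg.erase_count)   # overwrite: whole segment
```

The second four-byte write costs a full segment erase plus 16 KBytes of programming, and it also uses up one of the segment's limited erase cycles.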
Summary of Flash circa 2006