Disks and Files. Vivek Pai, Princeton University.


1 Disks and Files Vivek Pai Princeton University

2 Gedankyou. Imagine the following: a disk scheduling policy says “handle the request that is closest to where the disk head currently is.” On a system with lots of disk-intensive jobs, what problem can arise? What tweaks can avoid this problem?
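
One way to explore the question is to simulate the policy. The sketch below is only an illustration under assumed inputs: the function name `sstf`, the cylinder numbers, and the arrival pattern are all hypothetical and not from the slides. It serves the nearest pending request at each step while new requests keep arriving near the head.

```python
def sstf(head, pending, arrivals, steps):
    """Closest-request-first scheduling (often called SSTF).

    arrivals maps a step number to the cylinders that arrive just
    before that step. Returns (served order, still-pending requests).
    """
    pending = list(pending)
    served = []
    for step in range(steps):
        pending.extend(arrivals.get(step, []))
        if not pending:
            break
        # Pick the pending cylinder with the smallest seek distance.
        nxt = min(pending, key=lambda c: abs(c - head))
        pending.remove(nxt)
        served.append(nxt)
        head = nxt
    return served, pending


# One far request (cylinder 900) competes with a steady stream
# of requests clustered around cylinder 100.
arrivals = {i: [100 + (i % 5)] for i in range(20)}
served, left = sstf(head=100, pending=[900], arrivals=arrivals, steps=20)
print("served:", served)
print("still waiting:", left)
```

In this run the far request is never chosen while nearby work keeps arriving, which is the starvation problem. Sweeping the head in one direction (elevator-style scheduling) or aging old requests are common ways to bound the wait.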

3 Why Files. Physical reality: block oriented; physical sector #s; no protection among users of the system; data might be corrupted if the machine crashes. Filesystem model: byte oriented; named files; users protected from each other; robust to machine failures.

4 File Structures. Byte sequence: read or write a number of bytes; unstructured or linear. Record sequence: fixed or variable length; read or write a number of records. Tree: records with keys; read, insert, delete a record (typically using a B-tree).

5 File Structures Today. Stream of bytes: simplest to implement in kernel; easy to manipulate in other forms; little performance loss. More complicated structures: hardware assist fell out of favor; special-purpose hardware slower, costly.

6 File Types. ASCII: plain text. A Unix executable file: header (magic number, sizes, entry point, flags); text (code); data; relocation bits; symbol table. Devices. Everything else in the system.

7 So What Makes Filesystems Hard? Files grow and shrink in pieces. Little a priori knowledge. 6 orders of magnitude in file sizes. Overcoming disk performance behavior. Desire for efficiency. Coping with failure.

8 File System Components. Disk management: arrange collection of disk blocks into files. Naming: user gives file name, not track or sector number, to locate data. Security: keep information secure. Reliability/durability: when the system crashes, we lose what is in memory, but want files to be durable. Figure labels (layers): user, file naming, file access, disk management, disk drivers.

9 Some Definitions. File descriptor (fd): an integer used to represent a file; easier than using names. Metadata: data about data; bookkeeping data used to eventually access the “real” data. Open file table: system-wide list of descriptors in use.

10 Kinds of Metadata. inode: index node, a specific set of information kept about each file; two forms, on disk and in memory. Directory: names and location information for files and subdirectories; note: stored in files in Unix. Superblock: contains information to describe the file system and disk layout; information about free blocks/inodes on disk.

11 Contents of an Inode. Disk inode: file type, size, blocks on disk; owner, group, permissions (r/w/x); reference count; times (creation, last access, last mod); inode generation number; padding & other stuff. 128 bytes on classic Unix.

12 Directories in Unix. Stored like regular files. Contents are file names and inode #s; names are nul-terminated strings. Logic: separates file from location in tree; file can appear in multiple places. What are the drawbacks?

13 Effects of Corruption. inode: file gets “damaged”; maybe some “free” block gets viewed. Directory: “lose” files/directories; might get to read deleted files. Superblock: can’t figure out anything; this is why we replicate the superblock.

14 Data Structures for a Typical File System. Figure labels: process control block; open file pointer array; open file table (systemwide); memory inode; disk inode.

15 Opening a File. File name lookup and authentication. Copy the file metadata into the in-memory data structure, if it is not there yet. Create an entry in the open file table (system wide) if there isn’t one. Create an entry in the PCB. Link up the data structures. Return a pointer to the user. Figure labels: PCB; fd = open(FileName, access); open file table; metadata; allocate & link up data structures; file name lookup & authenticate; file system on disk.

16 Reading and Writing. What happens when you… read 10 bytes from a file? write 10 bytes into an existing file? write 4096 bytes into a file? Disk works on blocks (sectors). Can have temporary (ephemeral) buffers. Longer lasting buffers = disk cache.
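
A rough sketch of why the three cases differ, assuming a 4096-byte block and a file laid out contiguously from block 0. The `Disk` class and `write_bytes` function are hypothetical helpers for illustration, not anything defined in the slides. A write that covers only part of a block must first read that block, patch it in a buffer, and write it back; a write that covers a whole aligned block can skip the read.

```python
BLOCK = 4096  # assumed block size in bytes


class Disk:
    """Toy block device that counts block reads and writes."""

    def __init__(self, nblocks):
        self.blocks = [bytes(BLOCK) for _ in range(nblocks)]
        self.reads = 0
        self.writes = 0

    def read_block(self, n):
        self.reads += 1
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK
        self.writes += 1
        self.blocks[n] = bytes(data)


def write_bytes(disk, offset, data):
    """Write data at a byte offset using only whole-block disk operations."""
    pos = 0
    while pos < len(data):
        blk, start = divmod(offset + pos, BLOCK)
        n = min(BLOCK - start, len(data) - pos)
        if n == BLOCK:
            # Whole block overwritten: no need to read the old contents.
            buf = bytearray(data[pos:pos + n])
        else:
            # Partial block: read, modify in a temporary buffer, write back.
            buf = bytearray(disk.read_block(blk))
            buf[start:start + n] = data[pos:pos + n]
        disk.write_block(blk, buf)
        pos += n


disk = Disk(4)
write_bytes(disk, 100, b"x" * 10)      # small write inside block 0
write_bytes(disk, 4096, b"y" * 4096)   # aligned full-block write to block 1
print(disk.reads, disk.writes)
```

A buffer cache would remove many of these disk reads by keeping recently used blocks in memory.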

17 Reading a Block. Figure labels: PCB; open file table; metadata; read(fd, userBuf, size); logical to physical; read(device, phyBlock, size); get physical block to sysBuf; copy to userBuf; disk device driver; buffer cache.

18 A Disk Layout for a File System. Superblock defines a file system: size of the file system; size of the file descriptor area; free list pointer, or pointer to bitmap; location of the file descriptor of the root directory; other meta-data such as permission and various times. For reliability, replicate the superblock. Figure labels: super block; file metadata (i-node in Unix); file data blocks; boot block.

19 File Usage Patterns. How do users access files? Sequential: bytes read in order. Random: read/write element out of middle of arrays. Whole file or partial file. How are files used? Most files are small. Large files use up most of the disk space. Large files account for most of the bytes transferred. Bad news: need everything to be efficient.

20 Data Structures for Disk Management. A “header” for each file (part of the file meta-data): disk sectors associated with each file. A data structure to represent free space on disk: bit map (1 bit per block (sector); blocks numbered in cylinder-major order, why?); linked list; others? How much space does a bit map need for a 4G disk?
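
The bitmap question is plain arithmetic once a block size is chosen. The slide does not give one, so the two sizes below (512-byte sectors and 4096-byte blocks) are assumptions, as are the helper name `bitmap_bytes` and reading "4G" as 4 GiB.

```python
GIB = 2 ** 30


def bitmap_bytes(disk_bytes, block_bytes):
    """Size of a free-space bitmap with one bit per block, rounded up to bytes."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8


for block_size in (512, 4096):
    size = bitmap_bytes(4 * GIB, block_size)
    print(f"{block_size}-byte blocks: {size} bytes of bitmap")
```

Under these assumptions the bitmap is 1 MiB for 512-byte sectors and 128 KiB for 4096-byte blocks, a tiny fraction of the disk in either case.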

21 Linked Files (Alto). File header points to 1st block on disk; each block points to next. Pros: can grow files dynamically; free list is similar to a file. Cons: random access is horrible; unreliable, since losing a block means losing the rest. Figure labels: file header, ..., null.

22 Contiguous Allocation. Request in advance for the size of the file. Search bit map or linked list to locate a space. File header: first sector in file; number of sectors. Pros: fast sequential access; easy random access. Cons: external fragmentation; hard to grow files.

23 Single-Level Indexed Files or Extent-based Filesystems. A user declares max size. A file header holds an array of pointers to disk blocks. Pros: can grow up to a limit; random access is fast. Cons: clumsy to grow beyond limit; periodic cleanup of new files; up-front declaration a real pain. Figure labels: file header, disk blocks.

24 File Allocation Table (FAT) Approach. A section of disk for each partition is reserved. One entry for each block. A file is a linked list of blocks. A directory entry points to the 1st block of the file. Pros: simple. Cons: always go to FAT; wasting space. Figure labels: foo, 217; FAT: 0, 217, 399, 619, EOF.
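
A minimal sketch of following a FAT chain. The table contents are one plausible reading of the slide's figure (directory entry foo starts at block 217, then 619, then 399, then EOF); that reading, the `EOF` sentinel value, and the function name `file_blocks` are assumptions made for illustration.

```python
EOF = -1  # assumed end-of-chain marker

# FAT: entry i holds the number of the block that follows block i.
fat = {217: 619, 619: 399, 399: EOF}

# Directory: file name -> first block of the file.
directory = {"foo": 217}


def file_blocks(name, directory, fat):
    """Return the ordered list of blocks that make up a file."""
    blocks = []
    block = directory[name]
    while block != EOF:
        if block in blocks:
            raise ValueError("cycle in FAT chain")
        blocks.append(block)
        block = fat[block]  # every hop is a FAT lookup
    return blocks


print(file_blocks("foo", directory, fat))
```

Reaching the n-th block takes n table lookups, which is why the slide lists "always go to FAT" as a drawback; keeping the table in memory makes those lookups cheap compared with chasing pointers stored inside the data blocks.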

25 Multi-Level Indexed Files (Unix). 13 pointers in a header: 10 direct pointers; 11: 1-level indirect; 12: 2-level indirect; 13: 3-level indirect. Pros & cons: in favor of small files; can grow; limit is 16G and lots of seeks. What happens to reach block 23, 5, 340? Figure labels: pointers 1, 2, ..., 11, 12, 13; data blocks.
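
The lookup question can be worked through with a small sketch. The slide does not state a block or pointer size; 1 KiB blocks with 4-byte pointers (256 pointers per indirect block) are assumed here because they give a maximum size of roughly 16 GiB, matching the stated limit. Zero-based block numbers and the names `locate` and `PTRS` are also assumptions.

```python
NDIRECT = 10          # direct pointers in the header
BLOCK_BYTES = 1024    # assumed block size
PTRS = 256            # assumed pointers per indirect block (1024 / 4)


def locate(block_no):
    """Map a file block number to (level, indices into each indirect block)."""
    if block_no < 0:
        raise ValueError("negative block number")
    if block_no < NDIRECT:
        return ("direct", (block_no,))
    n = block_no - NDIRECT
    if n < PTRS:
        return ("single", (n,))
    n -= PTRS
    if n < PTRS ** 2:
        return ("double", (n // PTRS, n % PTRS))
    n -= PTRS ** 2
    if n < PTRS ** 3:
        return ("triple", (n // PTRS ** 2, (n // PTRS) % PTRS, n % PTRS))
    raise ValueError("block number beyond maximum file size")


max_blocks = NDIRECT + PTRS + PTRS ** 2 + PTRS ** 3
print("max file size in bytes:", max_blocks * BLOCK_BYTES)
for b in (23, 5, 340):
    print(b, locate(b))
```

Under these assumptions block 5 is reached through a direct pointer, block 23 needs one extra read of the single-indirect block, and block 340 needs two extra reads through the double-indirect tree.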

26 Reliability in Disk Systems. Make sure certain actions have occurred before function completes; known as “synchronous” operation. Ex: make sure new inode is on disk & that the directory has been modified before declaring a file creation is complete. Drawback: speed. Some ops easily asynchronous: access time. Some filesystems don’t care: Linux ext2fs.

27 Recovery After Failure. Need to ensure consistency: does free bitmap match tree walk? Do reference counts in inodes match directory entries? Do blocks appear in multiple inodes? The time for this kind of recovery grows with disk size. Clean shutdown: mark as such, no recovery.
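
The three consistency questions can be sketched as checks over toy in-memory structures. Everything here is a simplified assumption for illustration: the dictionary layouts, the function name `check`, and the omission of metadata blocks (superblock, inode area) are not from the slides and do not reflect how any real checker stores its data.

```python
from collections import Counter


def check(inodes, directories, free_blocks, total_blocks):
    """Report inconsistencies among inodes, directories, and the free map.

    inodes:      {inode number: {"links": int, "blocks": [block numbers]}}
    directories: list of {file name: inode number} dicts
    free_blocks: set of block numbers marked free
    """
    problems = []

    # Walk every inode and count how often each block is claimed.
    use = Counter(b for meta in inodes.values() for b in meta["blocks"])
    for block, count in sorted(use.items()):
        if count > 1:
            problems.append(f"block {block} claimed {count} times")
        if block in free_blocks:
            problems.append(f"block {block} in use but marked free")

    # Every block should be either in use or on the free map.
    for block in range(total_blocks):
        if block not in use and block not in free_blocks:
            problems.append(f"block {block} neither used nor free")

    # Link counts should equal the number of directory entries.
    refs = Counter(ino for d in directories for ino in d.values())
    for ino, meta in sorted(inodes.items()):
        if refs[ino] != meta["links"]:
            problems.append(
                f"inode {ino}: link count {meta['links']}, "
                f"{refs[ino]} directory entries"
            )
    return problems


inodes = {1: {"links": 1, "blocks": [0, 1]}, 2: {"links": 2, "blocks": [1, 2]}}
directories = [{"a": 1, "b": 2}]
for p in check(inodes, directories, free_blocks={2, 3}, total_blocks=5):
    print(p)
```

Each check touches every inode or every block, which is why this style of recovery takes longer as disks grow.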

28 Reducing Synchronous Times. Write to a faster storage: nonvolatile memory (expensive, requires some additional OS/firmware support). Write to a special disk or section (logging): only have to examine log when recovering; eventually have to put information in place; some information dies in the log itself. Write in a special order: write metadata in a way that is consistent but possibly recovers less.

29 Challenges. Unix filesystem has great flexibility. Extent-based filesystems have speed. Seeks kill performance: locality. Bitmaps show contiguous free space. Linked lists easy to search. How do you perform backup/restore?

30 A Quick XOR Overview. XOR = eXclusive OR. a XOR a = 0. a XOR 0 = a. a XOR b = b XOR a. (a XOR b) XOR c = a XOR (b XOR c). In other words, count the 1 bits: even = 0, odd = 1.

31 More Fun With XOR. Result = XOR(a1, a2, a3, a4, …). a2 goes bad. Can we reconstruct a2? a2 = XOR(a1, result, a3, a4, …). What does this imply for disks? What kinds of failures does it handle?
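
The reconstruction identity can be checked directly on byte strings. This is a small sketch with made-up block contents; the function name `parity` and the sample data are illustrative assumptions, and all blocks are assumed to be the same length.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


data = [b"disk", b"file", b"raid", b"xor!"]   # a1, a2, a3, a4
result = parity(data)

# a2 goes bad: XOR the surviving blocks with the stored result.
survivors = [data[0], result, data[2], data[3]]
rebuilt = parity(survivors)
print(rebuilt)
```

The rebuild works only when the system knows which block was lost and the remaining blocks are intact, which is the fail-stop case; a block that silently returns wrong data cannot be located by a single parity alone.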

32 Bigger, Faster, Stronger. Making individual disks larger is hard. Throw more disks at the problem: capacity increases; effective access speed may increase; probability of failure also increases. Use some disks to provide redundancy. Generally assume a fail-stop model. Fail-stop versus Byzantine failures.

33 RAID (Redundant Array of Inexpensive Disks). Main idea: store the error correcting codes on other disks. General error correcting codes are too powerful; use XORs or single parity. Upon any failure, one can recover the entire block from the spare disk (or any disk) using XORs. Pros: reliability; high bandwidth. Cons: the controller is complex. Figure labels: RAID controller, XOR.

34 Synopsis of RAID Levels. RAID Level 0: non redundant (JBOD). RAID Level 1: mirroring. RAID Level 2: byte-interleaved, ECC. RAID Level 3: byte-interleaved, parity. RAID Level 4: block-interleaved, parity. RAID Level 5: block-interleaved, distributed parity.

35 Did RAID Work? Performance: yes. Reliability: yes. Cost: no; controller design complicated; fewer economies of scale; high-reliability environments don’t care. Now also software implementations.

36 RAID’s Real Benefit. Partly addresses the failure problem: backup/restore less of an issue; failed disk “rebuilt” at sector level; lower performance during rebuild, but system still on-line. Still not perfect: geographic problems; failure during rebuild.
