CSE 451: Operating Systems
Spring 2006
Module 14: From Physical to Logical: File Systems
John Zahorjan (zahorjan@cs.washington.edu)
Allen Center 534
© 2006 Gribble, Lazowska, Levy, Zahorjan

Physical disk structure
Disk components:
- platters
- surfaces
- tracks
- sectors
- cylinders
- arm
- heads
[Figure: a labeled disk diagram showing sector, track, surface, cylinder, platter, arm, and head]

Disk performance
Performance depends on a number of steps:
- seek: moving the disk arm to the correct cylinder
  - depends on how fast the disk arm can move
  - seek times aren't diminishing very quickly (why?)
- rotation (latency): waiting for the sector to rotate under the head
  - depends on the rotation rate of the disk
  - rates are increasing, but slowly (why?)
- transfer: transferring data from the surface into the disk controller, and from there sending it back to the host
  - depends on the density of bytes on the disk
  - density is increasing, and very quickly
When the OS uses the disk, it tries to minimize the cost of all of these steps, particularly seek and rotation.
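To see why seek and rotation dominate, here is a back-of-the-envelope estimate in C. The numbers (8 ms average seek, 7200 RPM, 100 MB/s transfer, a 4 KB request) are illustrative assumptions, not from the slides:

```c
#include <stdio.h>

int main(void) {
    double seek_ms     = 8.0;                     /* assumed average seek time     */
    double rpm         = 7200.0;                  /* assumed rotation rate         */
    double rotation_ms = 0.5 * (60000.0 / rpm);   /* half a revolution, on average */
    double xfer_ms     = 4096.0 / 100e6 * 1000.0; /* 4 KB at an assumed 100 MB/s   */

    printf("seek %.2f + rotation %.2f + transfer %.3f = %.2f ms\n",
           seek_ms, rotation_ms, xfer_ms, seek_ms + rotation_ms + xfer_ms);
    return 0;
}
```

Transfer takes tens of microseconds while seek and rotation take milliseconds each, which is why the OS concentrates on avoiding the latter two.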

From Physical to Logical: Low-Level Disk
- Boot block: boot loader (code) and the partition table
- Partition 0, Partition 1, ...: each partition is an independently managed group of blocks

From Physical to Logical: File Systems
Within its partition, a file system needs to keep track of:
- free blocks
- inodes
- file blocks
- directory entries
[Figure: an OS partition holding a directory tree rooted at /, with entries such as root and etc]

A Strawman Approach
- Superblock: answers "where are the free blocks?" and "where is /?"
- Inode: file metadata, and "where are the file blocks?"
- Directory entry: maps a file name to an inode, and points to the next directory entry
[Figure: a partition with a superblock, inodes, and chained directory entries for /, root, etc, and foo]
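In C, the strawman's on-disk structures might look like the sketch below. All field names and sizes are hypothetical; the slides specify only that each inode and each directory entry occupies its own block, chained by "next" pointers:

```c
#include <stdint.h>

#define NAME_MAX_LEN 255

struct superblock {
    uint32_t free_list_head;    /* block # of the first free block      */
    uint32_t root_dir_block;    /* block # of "/"'s first entry         */
};

struct inode {
    uint32_t size;              /* file size in bytes                   */
    uint32_t first_data_block;  /* head of a linked list of data blocks */
    /* ... other metadata: owner, permissions, timestamps ...           */
};

struct dirent {
    char     name[NAME_MAX_LEN + 1];  /* file name                      */
    uint32_t inode_block;       /* block # holding this file's inode    */
    uint32_t next_entry_block;  /* next entry in this directory, or 0   */
};
```

Every lookup then chases block-sized links: read a directory entry, follow "next" until the name matches, read the inode's block, then walk the chain of data blocks.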

Formatting: Preparing the Empty File System
[Figure: a freshly formatted partition containing only the root directory /, with its "." and ".." entries]

Evaluation
Positives:
- Simple
- No preset limits on file size, number of files, or disk size
Negatives:
- Incredibly slow:
  - many block transfers to read a directory
  - many seek/latency delays
  - direct access to file bytes requires walking a linked list of data blocks
- Internal fragmentation:
  - 1 KB allocated for every inode
  - 1 KB allocated for every directory entry

Solutions
Performance:
- Pack logical items into physical blocks
  - inodes
  - directory entries
  - one seek/latency retrieves many items
- Keep items small
  - fewer files than blocks ⇒ fewer bits in an inode name than in a block name

The original Unix file system
- Dennis Ritchie and Ken Thompson, Bell Labs, 1969
- "UNIX rose from the ashes of a multi-organizational effort in the early 1960s to develop a dependable timesharing operating system" (that effort was Multics)
- Designed for a "workgroup" sharing a single system
- Did its job exceedingly well
  - although it has been stretched in many directions and made ugly in the process
- A wonderful study in engineering tradeoffs

All disks are divided into five parts ...
- Boot block: can boot the system by loading from this block
- Superblock: specifies the boundaries of the next three areas, and contains the heads of the freelists of inodes and file blocks
- i-node area: contains descriptors (i-nodes) for each file on the disk; all i-nodes are the same size; the head of the freelist is in the superblock
- File contents area: fixed-size blocks; the head of the freelist is in the superblock
- Swap area: holds processes that have been swapped out of memory

Disk Layout
[Figure: the partition laid out as superblock, then inode blocks, then data blocks]
Direct access to inodes:
- inode K is in block K / (BLOCK_SIZE / sizeof(inode))
- at offset K % (BLOCK_SIZE / sizeof(inode)) within that block
Directory entries are packed into blocks in a manner similar to inodes.
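The arithmetic above, spelled out as a C helper. The inode size (64 B) and the inode_area_start parameter (the first block of the i-node area, which the slide's block numbers are implicitly relative to) are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512
struct inode { char bytes[64]; };               /* assumed fixed inode size */
#define INODES_PER_BLOCK (BLOCK_SIZE / sizeof(struct inode))

/* Locate inode K: which block holds it, and at what byte offset. */
void locate_inode(uint32_t k, uint32_t inode_area_start,
                  uint32_t *block, size_t *offset) {
    *block  = inode_area_start + k / INODES_PER_BLOCK;
    *offset = (k % INODES_PER_BLOCK) * sizeof(struct inode);
}
```

Because every inode has the same size, finding inode K costs one computed block read, with no searching.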

The tree (directory, hierarchical) file system
A directory is a flat file of fixed-size entries. Each entry consists of an i-node number and a file name:

  i-node number   file name
  152             .
  18              ..
  216             my_file
  4               another_file
  93              oh_my_god
  144             a_directory
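A fixed-size entry and the linear scan that looks up a name might look like this in C. The 16-byte layout (2-byte i-node number, 14-byte name) follows early Unix convention, but treat the exact sizes as illustrative:

```c
#include <stdint.h>
#include <string.h>

struct direct {
    uint16_t d_ino;       /* i-node number; 0 marks an unused slot            */
    char     d_name[14];  /* name, NUL-padded, not necessarily NUL-terminated */
};

/* Look up "name" in an in-memory directory of n entries;
   return its i-node number, or 0 if absent. */
uint16_t dir_lookup(const struct direct *dir, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (dir[i].d_ino != 0 &&
            strncmp(dir[i].d_name, name, sizeof dir[i].d_name) == 0)
            return dir[i].d_ino;
    return 0;
}
```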

The "block list" portion of the i-node (Unix Version 7)
Must be able to represent very small and very large files...
- ...with minimal chaining...
- ...while leaving inodes small
Each inode contains 13 block pointers:
- the first 10 are "direct pointers" (pointers to blocks of file data)
- then single, double, and triple indirect pointers
[Figure: pointers 1-10 point directly at data blocks; pointers 11, 12, and 13 go through one, two, and three levels of indirect blocks]
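A sketch of the resulting mapping logic: given a logical block number within a file, decide which pointer level reaches it. Reading the indirect blocks themselves is elided; the constants follow the slides (10 direct pointers, 512 B blocks, 4 B pointers, so 128 pointers per indirect block):

```c
#include <stdint.h>

#define NDIRECT 10
#define NPTRS   128   /* pointers per 512 B indirect block */

typedef enum { DIRECT, SINGLE, DOUBLE, TRIPLE, TOO_BIG } level_t;

level_t block_level(uint32_t lbn) {         /* lbn: block # within the file */
    if (lbn < NDIRECT) return DIRECT;       /* pointers 1-10                */
    lbn -= NDIRECT;
    if (lbn < NPTRS) return SINGLE;         /* pointer 11                   */
    lbn -= NPTRS;
    if (lbn < NPTRS * NPTRS) return DOUBLE;           /* pointer 12         */
    lbn -= NPTRS * NPTRS;
    if (lbn < NPTRS * NPTRS * NPTRS) return TRIPLE;   /* pointer 13         */
    return TOO_BIG;                         /* beyond the maximum file size */
}
```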

So ...
- Data pointers occupy only 13 x 4 B in the inode
- Can get to 10 x 512 B = a 5,120 B file directly (10 direct pointers; blocks in the file contents area are 512 B)
- Can get to 128 x 512 B = an additional 64 KB with a single indirect reference (the 11th pointer in the i-node gets you to a 512 B block in the file contents area that contains 128 4 B pointers to blocks holding file data)
- Can get to 128 x 128 x 512 B = an additional 8 MB with a double indirect reference
- Can get to 128 x 128 x 128 x 512 B = an additional 1 GB with a triple indirect reference
- Maximum file size is 1 GB + a smidge
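The slide's arithmetic, checked in a few lines of C:

```c
#include <stdio.h>

int main(void) {
    long long blk = 512, ptrs = 128;     /* 512 B blocks, 128 ptrs per indirect block */
    long long direct = 10 * blk;                     /* 5,120 B  */
    long long single = ptrs * blk;                   /* 64 KB    */
    long long dbl    = ptrs * ptrs * blk;            /* 8 MB     */
    long long triple = ptrs * ptrs * ptrs * blk;     /* 1 GB     */
    printf("max file size: %lld bytes\n", direct + single + dbl + triple);
    return 0;                 /* prints 1082201088 bytes: 1 GB + a smidge */
}
```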

- A later version of Bell Labs Unix utilized 12 direct pointers rather than 10
  - Why?
- Berkeley Unix went to 1 KB block sizes
  - What's the effect on the maximum file size? 256 x 256 x 256 x 1 KB = 17 GB + a smidge
  - What's the price?
- Suppose you went to 4 KB blocks? 1K x 1K x 1K x 4 KB = 4 TB + a smidge
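Generalizing the previous computation makes the block-size tradeoff easy to explore. This sketch assumes 4 B block pointers and the 10 + 1 + 1 + 1 pointer layout throughout (the real systems changed other details too):

```c
#include <stdio.h>

long long max_file_size(long long block) {
    long long p = block / 4;             /* pointers per indirect block */
    return (10 + p + p * p + p * p * p) * block;
}

int main(void) {
    printf("512 B blocks: %lld B\n", max_file_size(512));    /* ~1 GB  */
    printf("1 KB blocks:  %lld B\n", max_file_size(1024));   /* ~17 GB */
    printf("4 KB blocks:  %lld B\n", max_file_size(4096));   /* ~4 TB  */
    return 0;
}
```

The price of bigger blocks is internal fragmentation: a one-byte file still consumes an entire block.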

File system consistency
- Both i-nodes and file blocks are cached in memory
- The "sync" command forces memory-resident disk information to be written to disk
  - the system does a sync every few seconds
- A crash or power failure between syncs can leave the disk inconsistent
- You could reduce the frequency of problems by reducing caching, but performance would suffer big-time

i-check: consistency of the flat file system
Is each block on exactly one list?
- create a bit vector with as many entries as there are blocks
- follow the free list and each i-node's block list
- when a block is encountered, examine its bit:
  - if the bit was 0, set it to 1
  - if the bit was already 1:
    - if the block is both in a file and on the free list, remove it from the free list and cross your fingers
    - if the block is in two files, call support!
- if there are any 0's left at the end, put those blocks on the free list
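The core of that check, sketched in C. The traversal of the free list and of each i-node's block list (the callers of icheck_mark) is elided; NBLOCKS and the reporting are illustrative:

```c
#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 65536
static uint8_t seen[NBLOCKS / 8];            /* one bit per disk block */

static int test_and_set(uint32_t b) {
    int was = (seen[b / 8] >> (b % 8)) & 1;
    seen[b / 8] |= (uint8_t)(1u << (b % 8));
    return was;
}

/* Call once per block reference found while walking the free list
   and every i-node's block list. */
void icheck_mark(uint32_t block, int on_free_list) {
    if (!test_and_set(block)) return;        /* first sighting: fine */
    if (on_free_list)
        printf("block %u in a file AND free: remove from free list\n", block);
    else
        printf("block %u in two files: call support!\n", block);
}

/* Final pass: any block never seen is lost; return it to the free list. */
void icheck_sweep(void) {
    for (uint32_t b = 0; b < NBLOCKS; b++)
        if (!((seen[b / 8] >> (b % 8)) & 1))
            printf("block %u unreferenced: add to free list\n", b);
}
```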

d-check: consistency of the directory file system
- Do the directories form a tree?
- Does the link count of each file equal the number of directory links to it?
  - I will spare you the details
  - uses a zero-initialized vector of counters, one per i-node
  - walk the tree, then visit every i-node
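The counting scheme, sketched in C. The tree walk that calls count_link once per directory entry is elided, and NINODES is illustrative:

```c
#include <stdint.h>
#include <stdio.h>

#define NINODES 4096
static uint32_t refs[NINODES];  /* zero-initialized: one counter per i-node */

/* Pass 1: called for every directory entry found while walking the tree. */
void count_link(uint32_t ino) { refs[ino]++; }

/* Pass 2: visit every i-node and compare its stored link count
   against the number of directory references actually seen. */
void dcheck_verify(const uint16_t link_count[NINODES]) {
    for (uint32_t i = 0; i < NINODES; i++)
        if (refs[i] != link_count[i])
            printf("i-node %u: link count %u, but %u directory references\n",
                   i, link_count[i], refs[i]);
}
```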

File System Performance 1: Disk scheduling
Seeks are very expensive, so the OS attempts to schedule disk requests that are queued waiting for the disk:
- FCFS (do nothing)
  - reasonable when load is low
  - long waiting times for long request queues
- SSTF (shortest seek time first; see the sketch after this list)
  - minimize arm movement (seek time), maximize request rate
  - unfairly favors middle blocks
- SCAN (elevator algorithm)
  - service requests in one direction until done, then reverse
  - skews wait times non-uniformly (why?)
- C-SCAN
  - like SCAN, but only go in one direction (typewriter)
  - uniform wait times
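A minimal SSTF sketch: given the pending requests (as cylinder numbers) and the current head position, pick the nearest one. The array-based queue and the names are illustrative:

```c
#include <stdlib.h>

/* Return the index of the pending request closest to the head.
   Assumes n >= 1. */
int sstf_pick(const int *queue, int n, int head) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (abs(queue[i] - head) < abs(queue[best] - head))
            best = i;
    return best;
}
```

Its weakness is visible in the loop: requests near the current head always win, so middle cylinders are favored and requests near the edges can wait a long time.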

File System Performance 2: Layout
- Disk scheduling attempts to minimize the impact of the fact that the blocks needed at any moment may be located widely over the disk
  - How effective do you imagine it is / can be?
- An alternative (complementary) approach is to allocate blocks that are likely to be needed together near each other
  - Which blocks might be needed together?
- A related approach is to observe block usage patterns and move blocks near each other
  - The "pipe organ" layout is the simplest example