CS 4284 Systems Capstone Godmar Back Disks & File Systems.



Filesystems

CS 4284 Spring 2015
Files vs Disks
File Abstraction
  – Byte oriented
  – Names
  – Access protection
  – Consistency guarantees
Disk Abstraction
  – Block oriented
  – Block #s
  – No protection
  – No guarantees beyond block write

Filesystem Requirements
Naming
  – Should be flexible, e.g., allow multiple names for the same file
  – Support hierarchy for ease of use
Persistence
  – Want to be sure data has been written to disk in case a crash occurs
Sharing/Protection
  – Want to restrict who has access to files
  – Want to share files with other users

FS Requirements (cont’d)
Speed & efficiency for different access patterns
  – Sequential access
  – Random access
  – Sequential is most common, random next
  – Another pattern is keyed access (not usually provided by the OS)
Minimum space overhead
  – Disk space needed to store metadata is lost for user data
Twist: all metadata required to do translation must itself be stored on disk
  – Translation scheme should minimize the number of additional accesses for a given access pattern
  – Harder than, say, page tables, where we assumed the page tables themselves are not subject to paging!

Filesystems Software Architecture (including in-memory data structures)

Overview
[Figure: layered diagram with File Operations (create(), unlink(), open(), read(), write(), close()), File System, Buffer Cache, and Device Driver]
Interface offered to applications
  – Uses names for files
  – Views files as sequence of bytes
Interface to the disk
  – Uses disk id + sector indices
File system
  – Must implement translation (file name, file offset) -> (disk id, disk sector, sector offset)
  – Must manage free space on disk
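The second half of that translation, from a byte offset in a file to a sector plus an offset inside the sector, is plain arithmetic once the file's sectors are known. A minimal sketch, assuming 512-byte sectors and a hypothetical per-file list `file_sectors`; neither the name nor the sector size comes from the slides.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def translate(file_sectors, offset):
    """Map a byte offset in a file to (disk sector, offset within sector).

    file_sectors is a hypothetical list: file_sectors[i] is the disk sector
    holding bytes [i * SECTOR_SIZE, (i + 1) * SECTOR_SIZE) of the file.
    """
    if offset < 0:
        raise ValueError("negative offset")
    index, sector_offset = divmod(offset, SECTOR_SIZE)
    if index >= len(file_sectors):
        raise ValueError("offset outside file")
    return file_sectors[index], sector_offset
```

How `file_sectors` is represented on disk is exactly what the allocation strategies later in the deck decide.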

The Big Picture
[Figure: in-memory and on-disk data structures. In memory: the PCB with a per-process file descriptor table (… 5 4 3 2 1 0 …), the open file table, and the buffer cache holding cached data and metadata. Data structures to keep track of open files: struct file (inode + position + …), struct dir (inode + position), struct inode. On-disk data structures: filesystem information, file descriptors (inodes), directory data, file data.]

Steps in Opening & Reading a File
Lookup (via directory)
  – Find on-disk file descriptor’s block number
Find entry in open file table (struct inode list in Pintos)
  – Create one if none, else increment ref count
Find where file data is located
  – By reading on-disk file descriptor
Read data & return to user

Open File Table
inode – represents file
  – At most 1 in-memory instance per unique file
  – Number of openers & other properties
file – represents one or more processes using a file
  – With separate offsets for byte-stream
dir – represents an open directory file
Generally:
  – None of the data in the OFT is persistent
  – Reflects how processes are currently using files
  – Lifetime of objects determined by open/close
Reference counting is used
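The reference-counting rule can be sketched as follows. This is an illustration only, not Pintos code: the class names, the `sector` key, and the dictionary are assumptions made for the example.

```python
class Inode:
    """In-memory inode: at most one instance per on-disk inode."""

    def __init__(self, sector):
        self.sector = sector   # assumed key: block number of the on-disk inode
        self.open_count = 0    # number of openers


class File:
    """One opener's view of a file: a shared inode plus a private offset."""

    def __init__(self, inode):
        self.inode = inode
        self.pos = 0


class OpenFileTable:
    def __init__(self):
        self._inodes = {}      # sector -> Inode; nothing here is persistent

    def open(self, sector):
        inode = self._inodes.get(sector)
        if inode is None:              # first opener: create the entry
            inode = Inode(sector)
            self._inodes[sector] = inode
        inode.open_count += 1          # otherwise just bump the ref count
        return File(inode)

    def close(self, file):
        inode = file.inode
        inode.open_count -= 1
        if inode.open_count == 0:      # last closer: drop the in-memory inode
            del self._inodes[inode.sector]
```

Two opens of the same file share one `Inode` but get separate `File` objects, so each opener keeps its own position.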

File Descriptors (“inodes”)
Term “inode” can refer to 3 things:
1. in-memory inode
  – Stores information about an open file, such as how many openers; corresponds to an on-disk file descriptor
2. on-disk inode
  – Region on disk, entry in file descriptor table, that stores persistent information about a file: who owns it, where to find its data blocks, etc.
3. on-disk inode, when cached in buffer cache
  – A bytewise copy of 2. in memory
  – Q.: Should the in-memory inode store a pointer to the cached on-disk inode? (Answer: No.)

Filesystems On-Disk Data Structures and Allocation Strategies

Filesystem Information
Contains “superblock”, which stores information such as the size of the entire filesystem, etc.
  – Location of file descriptor table & free map
Free Block Map
  – Bitmap used to find free blocks
  – Typically cached in memory
Superblock & free map often replicated in different positions on disk
[Figure: disk layout showing Super Block and Free Block Map]
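A free block map is one bit per block. The sketch below is a minimal in-memory version, assuming a first-fit scan and a hypothetical `FreeMap` class; a real filesystem would also write the changed bits back to disk.

```python
class FreeMap:
    """Bitmap with one bit per disk block; a set bit means 'in use'."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as used and return it."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def release(self, block):
        """Clear the bit for a block so it can be allocated again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```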

File Allocation Strategies
  – Contiguous allocation
  – Linked files
  – Indexed files
  – Multi-level indexed files

Contiguous Allocation
Idea: allocate files in contiguous blocks
File Descriptor = (first block, length)
Good sequential & random access
Problems:
  – Hard to extend files – may require expensive compaction
  – External fragmentation
  – Analogous to segmentation-based VM
Pintos’s baseline implementation does this
[Figure: File A and File B laid out contiguously on disk]
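Both the easy translation and the fragmentation problem show up in a few lines. A sketch under assumptions: `used` is a hypothetical list of per-block in-use flags, and allocation is first-fit.

```python
def sector_of(first_block, length, block_index):
    """Translate a block index within a file stored as (first block, length)."""
    if not 0 <= block_index < length:
        raise ValueError("block index outside file")
    return first_block + block_index


def find_run(used, length):
    """Return the start of the first run of `length` free blocks, or None."""
    run_start, run_len = 0, 0
    for i, in_use in enumerate(used):
        if in_use:
            run_start, run_len = i + 1, 0
        else:
            run_len += 1
            if run_len == length:
                return run_start
    return None
```

With `used = [True, False, False, True, False, False, True, False]` there are five free blocks, yet `find_run(used, 3)` finds nothing: the free space is there but not contiguous, which is external fragmentation.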

Linked Files
Idea: implement linked list
  – Either with variable sized blocks
  – Or fixed sized blocks (“clusters”)
Solves fragmentation problem, but now
  – Need lots of seeks for sequential accesses and random accesses
  – Unreliable: lose first block, may lose file
Solution: keep linked list in memory
  – DOS: FAT (File Allocation Table)
[Figure: disk blocks interleaving File A Part 1, File B Part 1, File A Part 2, File B Part 2]

DOS FAT
FAT stored at beginning of disk & replicated for redundancy
FAT cached in memory
Size: n-bit entries, blocks of 2^m bytes -> limit of 2^(m+n) bytes
  – n = 12, 16, 28
  – m = 9 … 15 (0.5KB-32KB)
As disk size grows, m & n must grow
  – Growth of n means larger in-memory table

Filename   Length   First Block
“a”        2        1
“b”        4        3
“c”        3        12
“d”        1        4
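Reading a file under FAT means following the chain of table entries from the first block recorded in the directory. In the sketch below, the first blocks and lengths match the directory entries on the slide, but the link values in `example_fat` and the `EOF` marker are hypothetical, since the slide's FAT contents did not survive in this transcript.

```python
EOF = -1  # assumed end-of-chain marker; real FAT variants use reserved values

# Hypothetical FAT contents: example_fat[b] is the block that follows block b.
example_fat = {
    1: 2, 2: EOF,                # "a": 2 blocks starting at block 1
    3: 5, 5: 6, 6: 7, 7: EOF,    # "b": 4 blocks starting at block 3
    12: 13, 13: 14, 14: EOF,     # "c": 3 blocks starting at block 12
    4: EOF,                      # "d": 1 block starting at block 4
}


def file_blocks(fat, first_block):
    """Follow the FAT chain and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks
```

Reaching the k-th block still takes k table lookups, but with the FAT cached in memory they cost no disk seeks.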

DOS FAT Scalability Limits
FAT-12 uses 12-bit entries, max of 4096 clusters
  – FAT-16: 2^16 (65,536) clusters; FAT-32 uses 28 bits, so theoretical max of 2^28 (256 Mi) clusters
Floppy disk, say 1.4MB; FAT-12, 1K clusters, need 1,400 entries, 2 bytes each -> 2.8KB
Modern disk, say ~2 TB (2^41 bytes)
  – At 4 KB cluster size, would need 2^29 entries. Each entry at 4 bytes, would need 2^31 bytes, or 2GB, of RAM just to hold the FAT.
  – At 32 KB cluster size, would need only 1/8 of that, but still 256MB of RAM to hold the FAT; simple operations, such as determining how much space is free on disk, require reading the entire FAT
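The table-size figures follow from one entry per cluster. A small check of the arithmetic for a 2^41-byte disk and for the floppy case; `fat_bytes` is a name made up for this example.

```python
def fat_bytes(disk_bytes, cluster_bytes, entry_bytes):
    """Size of a FAT with one entry per cluster."""
    entries = disk_bytes // cluster_bytes
    return entries * entry_bytes


KIB = 1024

# Floppy from the slide: 1,400 clusters of 1 KB, 2 bytes per entry.
floppy = fat_bytes(1400 * KIB, 1 * KIB, 2)      # 2,800 bytes

# 2^41-byte disk with 4-byte entries.
fat_4k = fat_bytes(2**41, 4 * KIB, 4)           # 2^31 bytes = 2 GiB
fat_32k = fat_bytes(2**41, 32 * KIB, 4)         # 2^28 bytes = 256 MiB
```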

Blocksize Trade-Offs
[Chart not reproduced: block size trade-offs. Source: Tanenbaum, Modern Operating Systems]
Chart above assumes all files are 2KB in size (observed median file size is about 2KB)
  – Larger blocks: faster reads (because seeks are amortized & more bytes per transfer)
  – More wastage (2KB file in 32KB block means 15/16 are unused)
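The wastage figure is internal fragmentation: the unused tail of the last block. A short sketch of the calculation, assuming a non-empty file; `wasted_fraction` is a hypothetical helper name.

```python
from fractions import Fraction


def wasted_fraction(file_size, block_size):
    """Fraction of the allocated space a file of file_size > 0 leaves unused."""
    blocks = -(-file_size // block_size)    # ceiling division
    allocated = blocks * block_size
    return Fraction(allocated - file_size, allocated)
```

For the slide's case, `wasted_fraction(2048, 32 * 1024)` is 15/16, while a 2KB file in 1KB blocks wastes nothing.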

Indexed Allocation
Single-index: specify maximum filesize, create index array, then note blocks in index
  – Random access ok – one translation step
  – Sequential access requires more seeks – depending on contiguous allocation
Drawback: hard to grow beyond maximum
[Figure: File A Index pointing to File A Part 1, File A Part 2, File A Part 3]

Multi-Level Indices
Used in Unix & (possibly) Pintos (P4)
[Figure: inode with direct blocks (index 1 … N), an indirect block (FLI; index N+1 … N+I), a double indirect block (SLI; index N+I+1 … N+I+I^2), and a triple indirect block (TLI)]

[Figure: Logical View (Per File), indexed by offset in file, versus Physical View (On Disk) (ignoring other files), indexed by sector numbers on disk. Legend: Inode, Data, Index, Index^2]

Multi-Level Indices
If filesz < N * BLKSIZE, can store all information in direct block array
  – Biased in favor of small files (ok because most files are small…)
Assume index block stores I entries
  – If filesz < (I + N) * BLKSIZE, 1 indirect block suffices
Q.: What’s the maximum size before we need a triple-indirect block?
Q.: What’s the per-file overhead (best case, worst case)?
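One way to work through the first question: with N direct entries and I entries per index block, direct, indirect, and double-indirect blocks together cover N + I + I^2 data blocks. The sketch below maps a file block index to the level that holds it, using 0-based indices. The values N = 12 and I = 128 used in the usage note are assumptions for illustration, not numbers from the slides.

```python
def classify(block_index, n_direct, per_index):
    """Return (level, path) locating a 0-based block index within the index tree.

    path holds the entry number to follow at each index level.
    """
    n, i = n_direct, per_index
    if block_index < n:
        return "direct", (block_index,)
    block_index -= n
    if block_index < i:
        return "indirect", (block_index,)
    block_index -= i
    if block_index < i * i:
        return "double", divmod(block_index, i)
    block_index -= i * i
    if block_index < i ** 3:
        outer, rest = divmod(block_index, i * i)
        return "triple", (outer,) + divmod(rest, i)
    raise ValueError("file too large")


def max_size_without_triple(n_direct, per_index, block_size):
    """Largest file (in bytes) reachable without a triple-indirect block."""
    return (n_direct + per_index + per_index ** 2) * block_size
```

With the assumed N = 12, I = 128, and 512-byte blocks, `classify(12, 12, 128)` is the first entry of the indirect block, and `max_size_without_triple(12, 128, 512)` is (12 + 128 + 128^2) * 512 bytes, roughly 8 MB.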

Extents
Index-tree based scheme avoids external fragmentation and is efficient for small files, but incurs relatively high metadata overhead for large files
Extents can improve that – store (bnum, length) pair to denote that file occupies blocks [bnum, …, bnum+length-1]
  – But complicates offset -> sector translation
  – Used in ext4
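The extra complication in translation is a search for the extent that covers a given file block. A minimal sketch, assuming a file is described by a hypothetical in-order list of (bnum, length) pairs; this is not how ext4 lays out its extent tree.

```python
import bisect


def extent_lookup(extents, block_index):
    """Map a file block index to a disk block, given extents in file order."""
    starts = []          # starts[k] = first file block covered by extents[k]
    total = 0
    for _, length in extents:
        starts.append(total)
        total += length
    if not 0 <= block_index < total:
        raise ValueError("block index outside file")
    k = bisect.bisect_right(starts, block_index) - 1
    bnum, _ = extents[k]
    return bnum + (block_index - starts[k])
```

Three extents such as `[(100, 4), (50, 2), (300, 10)]` describe a 16-block file with three entries, where a direct index would need sixteen.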

Storing Inodes
Unix v7, BSD 4.3
[Figure: inode list (I0 I1 I2 I3 I4 …) and superblock in a fixed area; rest of disk for files & directories]
FFS (BSD 4.4)
  – Cylinder groups have superblock + bitmap + inode list + file space
  – Try to allocate file & inode in same cylinder group to improve access locality
[Figure: disk divided into cylinder groups (CG i), each with a superblock copy (SB1, SB2, SB3), inodes (I0 I1 …, I3 I4 …, I8 I9 …), and file space]

Positioning Inodes
Putting inodes in fixed place makes finding inodes easier
  – Can refer to them simply by inode number
  – After crash, there is no ambiguity as to what are inodes vs. what are regular files
Disadvantage: limits the number of files per filesystem at creation time
  – Use “df -ih” on Linux/ext3 to see how many inodes are used/free

Filesystems Directories and Name Resolution

Directories
Need to find file descriptor (inode), given a name
Approaches:
  – Single directory (old PCs); two-level approaches with 1 directory per user
Now exclusively hierarchical approaches:
  – File system forms a tree (or DAG)
How to tell regular file from directory?
  – Set a bit in the inode
Data Structures
  – Linear list of (inode, name) pairs
  – B-Trees that map name -> inode
  – Combinations thereof

Using Linear Lists
Advantage: (relatively) simple to implement
Disadvantages:
  – Scan makes lookup (& delete!) really slow for large directories
  – Could cause fragmentation (though not a problem in practice)
[Figure: directory file as a list of entries starting at offset 0, each holding an inode # and a name: (23, multi-oom), (15, sample.txt)]
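A linear-list directory can be sketched as an array of slots that is scanned for every operation. The class below is illustrative only: the slot layout and method names are assumptions, and the entries in the usage note are the two visible in the slide's figure.

```python
class LinearDirectory:
    """Directory as a flat list of (inode number, name) slots."""

    def __init__(self):
        self.slots = []   # each slot is (inode_number, name), or None if free

    def lookup(self, name):
        """Linear scan; returns the inode number or None."""
        for slot in self.slots:
            if slot is not None and slot[1] == name:
                return slot[0]
        return None

    def add(self, name, inode_number):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        for i, slot in enumerate(self.slots):
            if slot is None:                     # reuse a hole left by remove()
                self.slots[i] = (inode_number, name)
                return
        self.slots.append((inode_number, name))

    def remove(self, name):
        for i, slot in enumerate(self.slots):
            if slot is not None and slot[1] == name:
                self.slots[i] = None             # leaves a hole in the list
                return
        raise FileNotFoundError(name)
```

After `add("multi-oom", 23)` and `add("sample.txt", 15)`, every lookup, add, and remove walks the list, which is why large directories get slow; the holes left by `remove` are the fragmentation the slide mentions.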

Using B+-Trees
Advantages:
  – Scalable to large number of files: in growth, in lookup time
Disadvantages:
  – Complex
  – Overhead for small directories (some filesystems switch to B+-Tree only for large directories)
Note: some filesystems use B+-Trees not only for directory files, but for block indexes as well
  – HFS’s ‘catalog’ – single B+-Tree that stores inodes + directories
  – Also done in NTFS, XFS, Reiserfs, ZFS, and Btrfs
(Source: Wikipedia)

Absolute Paths
How to resolve a path name such as “/usr/bin/ls”?
  – Split into tokens using “/” separator
  – Find inode corresponding to root directory (how? Use fixed inode # for root)
  – (*) Look up “usr” in root directory, find inode
  – If not last component in path, check that inode is a directory. Go to (*), looking for next component
  – If last component in path, check inode is of desired type, return
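The resolution loop can be sketched over a toy in-memory filesystem. Everything here is an assumption for illustration: `ROOT_INODE = 1`, the `inodes` dictionary, and the (kind, content) tuples stand in for on-disk inodes and directory files. The final "desired type" check is left to the caller.

```python
ROOT_INODE = 1  # assumed fixed inode number for the root directory


def resolve(inodes, path):
    """Resolve an absolute path to an inode number.

    inodes maps inode number -> ("dir", {name: inode number}) or ("file", data).
    """
    if not path.startswith("/"):
        raise ValueError("absolute path required")
    current = ROOT_INODE
    for component in (c for c in path.split("/") if c):
        kind, content = inodes[current]
        if kind != "dir":                 # only directories can be searched
            raise NotADirectoryError(component)
        if component not in content:
            raise FileNotFoundError(component)
        current = content[component]      # step (*): descend one level
    return current


example = {
    1: ("dir", {"usr": 2}),
    2: ("dir", {"bin": 3}),
    3: ("dir", {"ls": 4}),
    4: ("file", b""),
}
```

`resolve(example, "/usr/bin/ls")` walks inodes 1, 2, 3 and returns 4; each step costs a directory lookup, which is the motivation for caching translations.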

Name Resolution
Must have a way to scan an entire directory without other processes interfering -> need a “lock” function
  – But don’t need to hold lock on /usr when scanning /usr/bin
Directories can only be removed if they’re empty
  – Requires synchronization also
Most OSes cache translations in a “namei” cache – maps absolute pathnames to inodes
  – Must keep namei cache consistent if files are deleted

Current Directory
Relative pathnames are resolved relative to current directory
  – Provides default context
  – Every process has one in Unix/Pintos
chdir(2) changes current directory
  – cd tmp; ls; pwd vs (cd tmp; ls); pwd
Lookup algorithm the same, except starts from current dir
  – Process should keep current directory open
  – Current directory inherited from parent

Hard & Soft Links
Provide aliases (different names) for a file
Hard links: (Unix: ln)
  – Two independent directory entries have the same inode number, refer to same file
  – Inode contains a reference count
  – Disadvantage: alias only possible within the same filesystem
Soft links: (Unix: ln -s)
  – Special type of file (noted in inode); content of file is an absolute or relative pathname, stored inside the inode instead of the direct block list
Windows: “junctions” & “shortcuts”
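The difference between the two kinds of link can be observed from user space. A small demonstration using Python's standard `os` module, assuming a Unix-like system whose temporary directory supports hard and symbolic links; the file names are made up for the example.

```python
import os
import tempfile


def link_demo():
    """Create a hard and a soft link, then delete the original name."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(target, "w") as f:
            f.write("hello")

        os.link(target, hard)               # like: ln original.txt hard.txt
        os.symlink("original.txt", soft)    # like: ln -s original.txt soft.txt

        same_inode = os.stat(target).st_ino == os.stat(hard).st_ino
        link_count = os.stat(target).st_nlink   # reference count in the inode
        soft_content = os.readlink(soft)        # the stored pathname

        os.unlink(target)                   # remove one name, not the file
        with open(hard) as f:
            hard_still_readable = f.read() == "hello"
        soft_dangling = not os.path.exists(soft)

        return (same_inode, link_count, soft_content,
                hard_still_readable, soft_dangling)
```

The hard link keeps the data alive after the original name is unlinked because the inode's reference count is still nonzero; the soft link only stores a pathname, so it dangles once that name is gone.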