Advanced File Systems Issues

Presentation transcript:

Advanced File Systems Issues Andy Wang COP 5611 Advanced Operating Systems

Outline File systems basics Making file systems faster Making file systems more reliable Making file systems do more Using other forms of persistent storage

File System Basics File system: a collection of files An OS may support multiple file systems Instances of the same type Different types of file systems All file systems are typically bound into a single namespace Often hierarchical

A Hierarchy of File Systems

Some Questions… Why hierarchical? What are some alternative ways to organize a namespace? Why not a single file system?

Types of Namespaces Flat Hierarchical Relational Contextual Content-based

Example: “Internet FS” Flat: each URL mapped to one file Hierarchical: navigation within a site Relational: keyword search via search engines Contextual: page rank to improve search results Content-based: searching for images without knowing their names

Why not a single FS?

Advantages of Independent File Systems Easier support for multiple hardware devices More control over disk usage Fault isolation Quicker to run consistency checks Support for multiple types of file systems

Overall Hierarchical Organizations Constrained Unconstrained

Constrained Organizations Independent file systems only located at particular places Usually at the highest level in the hierarchy (e.g., DOS/Windows and Mac) + Simplicity, simple user model - lack of flexibility

Unconstrained Organizations Independent file systems can be put anywhere in the hierarchy (e.g., UNIX) + Generality, invisible to user - Complexity, not always what user expects These organizations require mounting

Mounting File Systems Each FS is a tree with a single root Its root is spliced into the overall tree Typically on top of another file/directory Or the mount point Complexities in traversing mount points

Mounting Example tmp root mount(/dev/sd01, /w/x/y/z/tmp)

After the Mount root tmp mount(/dev/sd01, /w/x/y/z/tmp)

Before and After the Mount Before mounting, if you issue ls /w/x/y/z/tmp You see the contents of /w/x/y/z/tmp After mounting, if you issue the same command You see the contents of the root of the mounted file system
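The effect of a mount on name lookup can be illustrated with a small sketch. It assumes a hypothetical in-memory mount table and a longest-prefix match; the names `MOUNTS`, `resolve`, and `rootfs` are made up for illustration, and a real kernel crosses mount points during component-by-component traversal rather than by string matching.

```python
# Hypothetical mount table: mount point -> file system mounted there.
MOUNTS = {
    "/": "rootfs",
    "/w/x/y/z/tmp": "/dev/sd01",
}

def resolve(path, mounts=MOUNTS):
    """Return (file system, path inside that file system) for an absolute path."""
    best = "/"
    for point in mounts:
        # A mount point covers the path if it equals it or is a parent directory.
        covers = path == point or path.startswith(point.rstrip("/") + "/")
        if covers and len(point) > len(best):
            best = point
    # Strip the mount point prefix; the mount point itself maps to the FS root.
    inside = path[len(best.rstrip("/")):] or "/"
    return mounts[best], inside

# Before the mount, the name is served by the enclosing file system.
print(resolve("/w/x/y/z/tmp", {"/": "rootfs"}))
# After the mount, the same name leads to the root of the mounted file system.
print(resolve("/w/x/y/z/tmp"))
```

The same pathname therefore names two different directories depending on whether the mount is in place, which is why whatever was under the mount point is hidden while the mount lasts.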

Questions Can we end up with a cyclic graph? What are some implications? What are some security concerns?

What is a File? A collection of data and metadata (often called attributes) Usually in persistent storage In UNIX, the metadata of a file is represented by the i_node data structure

Logical File Representation Name(s) i-node File attributes Data File

File Attributes Typical attributes include: File length File ownership File type Access permissions Typically stored in special fixed-size area
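On a POSIX-like system, these attributes can be inspected through the standard `os.stat` call. The sketch below is only an illustration; the helper name `describe` and the dictionary keys are made up here.

```python
import os
import stat
import tempfile

def describe(path):
    """Collect a few typical attributes that the OS keeps for a file."""
    info = os.stat(path)
    return {
        "length": info.st_size,                         # file length in bytes
        "owner_uid": info.st_uid,                       # file ownership
        "is_regular_file": stat.S_ISREG(info.st_mode),  # file type
        "permissions": stat.filemode(info.st_mode),     # access permissions
    }

with tempfile.NamedTemporaryFile() as f:
    f.write(b"hello")
    f.flush()
    attrs = describe(f.name)

print(attrs)
```

All of these values come from the fixed-size attribute area, so reading them does not touch the file's data blocks.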

Extended Attributes Some systems store more information with attributes (e.g., Mac OS) Sometimes user-defined attributes Some such data can be very large In such cases, treat attributes similar to file data

Storing File Data Where do you store the data? Next to the attributes, or elsewhere? Usually elsewhere Data is not of single size Data is changeable Storing elsewhere allows more flexibility

Physical File Representation i-node File attributes Data locations Data blocks Name(s) File

Ext2 i-node [Diagram: an ext2 i-node holding 12 data block locations, followed by index block locations; index blocks point to data blocks or to further index blocks]

A Major Design Assumption [Figure: file size distribution; x-axis: file size; y-axis: number of files; marked range: 22KB – 64 KB]

Pros/Cons of i_node Design + Faster accesses for small files (also accessed more frequently) + No external fragmentation - Internal fragmentation - Limited maximum file size

Directories A directory is a special type of file Instead of normal data, it contains “pointers” to other files Directories are hooked together to create the hierarchical namespace

Ext2 Directory Representation [Diagram: a directory i-node whose data block holds entries pairing the names file1 and file2 with the file1 and file2 i-node numbers; each i-node number gives the location of that file's i-node]

Links Multiple different names for the same file A Hard link: A second name that points to the same file A Symbolic link: A special file that directs name translation to take another path

Hard Link Diagram [Diagram: a directory i-node whose data block holds two entries, file1 and file2, both carrying the file1 i-node number, so both names lead to the same i-node]

Implications of Hard Links Multiple indistinguishable pathnames for the same file Need to keep link count with file for garbage collection “Remove” sometimes only removes a name Rather odd and unexpected semantics
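These semantics can be observed with the standard `os.link`, `os.stat`, and `os.remove` calls. This is a small sketch assuming a POSIX-like file system that supports hard links; the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "file1")
    second = os.path.join(d, "file2")
    with open(first, "w") as f:
        f.write("data")

    os.link(first, second)                    # second name for the same i-node
    same_inode = os.stat(first).st_ino == os.stat(second).st_ino
    links_before = os.stat(first).st_nlink    # link count kept with the file

    os.remove(first)                          # removes only the name "file1"
    links_after = os.stat(second).st_nlink
    with open(second) as f:
        survived = f.read()                   # the data is still reachable

print(same_inode, links_before, links_after, survived)
```

The file's storage is reclaimed only when the link count drops to zero (and no process still has the file open).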

Symbolic Link Diagram [Diagram: directory entries file1 and file2 carry different i-node numbers; the file2 i-node leads to the name file1, which is then translated to the file1 i-node]

Implications of Symbolic Links If file at the other end of the link is removed, dangling link Only one true pathname per file Just a mechanism to redirect pathname translation Less system complications
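The dangling-link case can be reproduced with the standard `os.symlink` and `os.readlink` calls. This sketch assumes a platform where unprivileged symbolic links are allowed; the file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file1")
    link = os.path.join(d, "file2")
    with open(target, "w") as f:
        f.write("data")

    os.symlink(target, link)                  # special file that stores a pathname
    stored_path = os.readlink(link)           # the redirection it contains
    resolves_before = os.path.exists(link)    # exists() follows the link

    os.remove(target)                         # remove the file at the other end
    still_a_link = os.path.islink(link)       # the link itself is untouched
    resolves_after = os.path.exists(link)     # translation now fails: dangling

print(stored_path == target, resolves_before, still_a_link, resolves_after)
```

Unlike a hard link, removing the target does not consult the symbolic link at all, so no link count is needed.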

Disk Hardware in Brief One disk head per platter; they typically move together, with one head activated at a time One or more rotating disk platters Disk arm

Disk Hardware in Brief Track Sector Cylinder

Modern Disk Complexities Zone-bit recording More sectors near outer tracks Track skews Track starting positions are not aligned Optimize sequential transfers across multiple tracks Thermo-calibrations

Laying Out Files on Disks Consider a long sequential file And a disk divided into sectors with 1-KB blocks Where should you put the bytes?

File Layout Methods Contiguous allocation Threaded allocation Segment-based (variable-sized, extent-based) allocation Indexed (fixed-sized, extent-based) allocation Multi-level indexed allocation Inverted (hashed) allocation

Contiguous Allocation + Fast sequential access + Easy to compute random offsets - External fragmentation

Threaded Allocation Example: FAT + Easy to grow files - Internal fragmentation - Not good for random accesses - Unreliable
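The trade-off can be seen in a toy model of a FAT-style table, where each entry names the block that follows it. The table contents, the `END` marker, and the function names below are hypothetical and chosen only for illustration.

```python
END = -1  # hypothetical end-of-chain marker

# fat[b] is the block that follows block b in its file.
fat = {7: 2, 2: 9, 9: 4, 4: END}

def nth_block(fat, start, n):
    """Return (disk block holding logical block n, table entries followed)."""
    block, hops = start, 0
    while hops < n:
        block = fat[block]
        if block == END:
            raise IndexError("offset past end of file")
        hops += 1
    return block, hops

def append_block(fat, start, new_block):
    """Grow the file by linking any free block onto the end of the chain."""
    last = start
    while fat[last] != END:
        last = fat[last]
    fat[last] = new_block
    fat[new_block] = END

print(nth_block(fat, 7, 3))   # reaching logical block 3 follows 3 entries
```

Growing a file only needs one free block anywhere on disk, but reaching logical block n costs n steps along the chain, and one corrupted entry cuts off the rest of the file.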

Segment-Based Allocation A number of contiguous regions of blocks + Combines strengths of contiguous and threaded allocations - Internal fragmentation - Random accesses are not as fast as contiguous allocation

Segment-Based Allocation [Diagram: the i-node holds a segment list location; each segment entry records a begin block location and an end block location]

Indexed Allocation + Fast random accesses - Internal fragmentation - Complexity in growing/shrinking indices [Diagram: an i-node listing data block locations]

Multi-level Indexed Allocation UNIX, ext2 + Easy to grow indices + Fast random accesses - Internal fragmentation - Complexity to reduce indirections for small files

Multi-level Indexed Allocation [Diagram: an ext2 i-node with 12 data block locations followed by index block locations; index blocks point to data blocks or to further index blocks]
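A rough sketch of how a logical block number maps onto the levels of such an index follows. The 12 direct locations come from the slide; the 1-KB block size, 4-byte block addresses, and three levels of indirection are assumptions made for this example.

```python
BLOCK_SIZE = 1024      # assumed 1-KB blocks
POINTER_SIZE = 4       # assumed 4-byte block addresses
DIRECT = 12            # direct data block locations in the i-node
PER_INDEX = BLOCK_SIZE // POINTER_SIZE   # locations held by one index block

def indirection_level(block_no):
    """Return how many index blocks must be read to locate logical block block_no."""
    if block_no < DIRECT:
        return 0
    block_no -= DIRECT
    for level in (1, 2, 3):            # assumed single, double, triple indirection
        span = PER_INDEX ** level      # data blocks reachable at this level
        if block_no < span:
            return level
        block_no -= span
    raise ValueError("beyond maximum file size")

MAX_BLOCKS = DIRECT + PER_INDEX + PER_INDEX**2 + PER_INDEX**3
print(indirection_level(5), indirection_level(12), indirection_level(300))
print(MAX_BLOCKS * BLOCK_SIZE)         # maximum file size in bytes under these assumptions
```

Small files never leave level 0, which is the design bet behind the layout: the common case pays no indirection, while large files pay up to three extra block reads per access.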

Inverted Allocation Venti + Reduced storage requirement for archives - Slow random accesses [Diagram: the i-nodes for file A and file B each list data block locations]

FS Performance Issues Disk-based FS performance limited by Disk seek Rotational latency Disk bandwidth

Typical Disk Overheads ~8.5 msec seek time ~4.2 msec rotational delay ~.017 msec to transfer a 1-KB block (based on 58 MB/sec) To access a random location ~13 msec to access a 1-KB block ~ 76KB/sec effective bandwidth
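The arithmetic behind these figures can be checked directly. The sketch assumes 1 MB = 10^6 bytes and 1 KB = 1024 bytes; the variable names are illustrative.

```python
SEEK_MS = 8.5
ROTATION_MS = 4.2
TRANSFER_RATE_BYTES_PER_S = 58e6   # assuming 1 MB = 10**6 bytes
BLOCK_BYTES = 1024                 # one 1-KB block

# Time to move one block once the head is in position.
transfer_ms = BLOCK_BYTES / TRANSFER_RATE_BYTES_PER_S * 1000
# A random access pays for seek, rotation, and transfer.
access_ms = SEEK_MS + ROTATION_MS + transfer_ms
effective_kb_per_s = (BLOCK_BYTES / 1024) / (access_ms / 1000)
# Fraction of the access time spent actually transferring data.
utilization = transfer_ms / access_ms

print(f"{transfer_ms:.4f} ms transfer, {access_ms:.2f} ms per access")
print(f"{effective_kb_per_s:.1f} KB/sec effective, {utilization:.2%} of raw bandwidth")
```

Unrounded, this gives roughly 12.7 msec per access and just under 80 KB/sec, in line with the slide's ~76 KB/sec; either way, positioning time dominates and well under 1% of the raw transfer rate is realized for random 1-KB accesses.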

How are disks improving? Density: 10-25% per year Capacity: 25% per year Transfer rate: 20% per year Seek time: 8% per year Rotational latency: 5-8% per year All slower than processor speed increases

The Disk/Processor Gap Since processor speeds double every two to three years And disk seek times improve by a factor of two only about every ten years Processors are waiting longer and longer for data from disk Important for OS to cover this gap
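The annual improvement rates from the previous slide can be converted into doubling times with the compound-growth formula ln 2 / ln(1 + r). The helper name below is made up for illustration, and the rates are the slide's own figures.

```python
import math

def doubling_time_years(annual_improvement):
    """Years needed to improve by a factor of two at a compound annual rate."""
    return math.log(2) / math.log(1 + annual_improvement)

seek = doubling_time_years(0.08)       # seek time: 8% per year
transfer = doubling_time_years(0.20)   # transfer rate: 20% per year
capacity = doubling_time_years(0.25)   # capacity: 25% per year

print(f"seek: {seek:.1f} years, transfer: {transfer:.1f} years, capacity: {capacity:.1f} years")
```

At 8% per year, seek performance needs about nine years to improve twofold, which is where the roughly ten-year figure comes from.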

Disk Usage Patterns Based on numbers from USENIX 1993 57% of disk accesses are writes Optimizing write performance is a very good idea 18-33% of reads are sequential Read-ahead of blocks likely to win

Disk Usage Patterns (2) 8-12% of writes are sequential Perhaps not worthwhile to focus on optimizing sequential writes 50-75% of all I/Os are synchronous Keeping files consistent is expensive 67-78% of writes are to metadata Need to optimize metadata writes

Disk Usage Patterns (3) 13-42% of total disk access for user I/O Focusing on user patterns alone won’t solve the problem 10-18% of all writes are to last written block Savings possible by clever delay of writes Note: these figures are specific to one file system!

What Can the OS Do? Minimize amount of disk accesses Improve locality on disk Maximize size of data transfers Fetch from multiple disks in parallel

Minimizing Disk Access Avoid disk accesses when possible Use caching (typically LRU methods) to hold file blocks in memory Generally used for all I/Os, not just disk Effect: decreases latency by removing the relatively slow disk from the path
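A minimal sketch of an LRU block cache, assuming a hypothetical `read_block` callback that stands in for the disk. Real buffer caches also handle dirty blocks, locking, and variable sizes, none of which is modeled here.

```python
from collections import OrderedDict

class BufferCache:
    """Keep the most recently used blocks in memory, evicting the least recent."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # hypothetical callback that reads from disk
        self.blocks = OrderedDict()    # block number -> data, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)   # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)        # slow path: go to disk
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data

cache = BufferCache(2, lambda block_no: f"block-{block_no}")
for block_no in (1, 2, 1, 3, 2):
    cache.get(block_no)
print(cache.hits, cache.misses)
```

Every hit removes one disk access from the request path, which is where the latency reduction comes from.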

Buffer Cache Design Factors Most files are short Long files can be very long User access is bursty 70-90% of accesses are sequential 75% of files are open < ¼ second 65-80% of files live < 30 seconds

Implications Design for holding small files Read-ahead is good for sequential accesses Anticipate disk needs of program Read blocks that are likely to be used later During times where disk would otherwise be idle

Pros/Cons of Read-ahead + Very good for sequential access of large files (e.g., executables) + Allows immediate satisfaction of disk requests - Contends for memory with LRU caching - Extra OS complexity
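One simple way to realize read-ahead is to prefetch the next few blocks once two consecutive requests look sequential. The sketch below assumes a hypothetical `read_block` callback and an arbitrary prefetch window of two blocks; it keeps everything it reads, so the memory contention listed above is deliberately not modeled.

```python
class ReadAhead:
    """Prefetch the following blocks once accesses look sequential."""

    def __init__(self, read_block, window=2):
        self.read_block = read_block   # hypothetical callback that reads from disk
        self.window = window           # assumed number of blocks to prefetch
        self.cache = {}                # block number -> data (unbounded in this sketch)
        self.last = None               # previously requested block number

    def get(self, block_no):
        if block_no not in self.cache:
            self.cache[block_no] = self.read_block(block_no)
        # Sequential pattern detected: fetch the next blocks before they are asked for.
        if self.last is not None and block_no == self.last + 1:
            for nxt in range(block_no + 1, block_no + 1 + self.window):
                if nxt not in self.cache:
                    self.cache[nxt] = self.read_block(nxt)
        self.last = block_no
        return self.cache[block_no]

reads = []
ra = ReadAhead(lambda b: reads.append(b) or b * 10)
for block_no in (0, 1, 2):
    ra.get(block_no)
print(reads)   # blocks fetched from "disk", including prefetched ones
```

In this run the request for block 2 is satisfied from memory because it was prefetched while block 1 was being served.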

Buffering Writes Buffer writes so that they need not be written to disk immediately Reducing latency on writes But buffered writes are asynchronous Potential cache consistency and crash problems Some systems make certain critical writes synchronously

Should We Buffer Writes? Good for short-lived files But danger of losing data in face of crashes And most short-lived files are also short in length ¼ of all bytes deleted/overwritten in 30 seconds
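The trade-off can be sketched with a toy write buffer. It assumes a hypothetical `write_block` callback standing in for the disk; the class and method names are illustrative and not taken from any particular OS.

```python
class WriteBuffer:
    """Delay writes in memory; only flushed or synchronous writes reach the disk."""

    def __init__(self, write_block):
        self.write_block = write_block   # hypothetical callback that writes to disk
        self.dirty = {}                  # block number -> latest data not yet on disk

    def write(self, block_no, data, synchronous=False):
        if synchronous:
            self.dirty.pop(block_no, None)
            self.write_block(block_no, data)   # critical write goes straight to disk
        else:
            self.dirty[block_no] = data        # an overwrite absorbs the earlier write

    def discard(self, block_no):
        """Block deleted before the flush: its buffered write never reaches disk."""
        self.dirty.pop(block_no, None)

    def flush(self):
        for block_no in sorted(self.dirty):
            self.write_block(block_no, self.dirty[block_no])
        self.dirty.clear()

disk = []
wb = WriteBuffer(lambda block_no, data: disk.append((block_no, data)))
wb.write(5, "a")
wb.write(5, "b")                        # overwrites the buffered copy
wb.write(7, "tmp")
wb.discard(7)                           # short-lived data, deleted before flush
wb.write(1, "meta", synchronous=True)
wb.flush()
print(disk)
```

Five logical writes become two disk writes here. The cost is that anything still in `dirty` is lost if the machine crashes before `flush` runs.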

Improved Locality Make sure next disk block you need is close to the last one you got File layout is important here Ordering of accesses in controller helps Effect: Less seek time and rotational latency
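To illustrate why the ordering of accesses matters, the sketch below compares total head movement for requests served in arrival order against a simple elevator-style ordering. The slide does not name a specific scheduling algorithm, so the elevator ordering is a stand-in chosen for this example, and the cylinder numbers are hypothetical.

```python
def total_seek(start, requests):
    """Total head movement, in cylinders, when serving requests in the given order."""
    position, distance = start, 0
    for cylinder in requests:
        distance += abs(cylinder - position)
        position = cylinder
    return distance

def elevator_order(start, requests):
    """Serve requests at or above the head position going up, then the rest going down."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    return up + down

pending = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical cylinder numbers
head = 53                                       # hypothetical current head position

print(total_seek(head, pending))                        # arrival order
print(total_seek(head, elevator_order(head, pending)))  # reordered
```

For this request list, reordering cuts head movement from 640 to 299 cylinders without changing which blocks are read.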

Maximizing Data Transfers Transfer big blocks or multiple blocks on one read Read-ahead is one good method here Effect: Increase disk bandwidth and reduce the number of disk I/Os

Use Multiple Disks in Parallel Multiprogramming can cause some of this automatically Use of disk arrays can parallelize even a single process’ access At the cost of extra complexity Effect: Increase disk bandwidth

UNIX Fast File System Designed to improve performance of UNIX file I/O Two major areas of performance improvement Bigger block sizes Better on-disk layout for files

Block Size Improvement Quadrupling the block size quadrupled the amount of data fetched per disk access But could lead to fragmentation problems So fragments introduced Small files stored in fragments Fragments addressable (but not independently fetchable)
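The space saved by fragments can be estimated with a short calculation. The 4096-byte block and 1024-byte fragment sizes are assumptions for this example, and the function name is illustrative.

```python
BLOCK = 4096      # assumed block size in bytes
FRAGMENT = 1024   # assumed fragment size in bytes

def allocated_bytes(file_size, use_fragments):
    """Disk space consumed by a file, with or without fragments for its tail."""
    full_blocks, tail = divmod(file_size, BLOCK)
    if tail == 0:
        return full_blocks * BLOCK
    if use_fragments:
        tail_alloc = -(-tail // FRAGMENT) * FRAGMENT   # round the tail up to whole fragments
    else:
        tail_alloc = BLOCK                             # the tail occupies a whole block
    return full_blocks * BLOCK + tail_alloc

for size in (500, 5000):
    print(size, allocated_bytes(size, False), allocated_bytes(size, True))
```

Under these assumptions a 500-byte file occupies 1024 bytes instead of 4096, so large blocks no longer penalize the many small files.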

Disk Layout Improvements Aimed toward avoiding disk seeks Bad if finding related files takes many seeks Very bad if finding all the blocks of a single file requires seeks Spatial locality: keep related things close together on disk

Cylinder Groups A cylinder group: a set of consecutive disk cylinders in the FFS Files in the same directory stored in the same cylinder group Within a cylinder group, tries to keep things contiguous But must not let a cylinder group fill up

Locations for New Directories Put new directory in relatively empty cylinder group What is “empty”? Many free i_nodes Few directories already there
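One plausible reading of these two criteria, sketched below, is to consider only cylinder groups with at least the average number of free i_nodes and then pick the one with the fewest directories. This is an assumption for illustration, not necessarily the exact FFS policy; the dictionary keys and function name are hypothetical.

```python
def pick_group_for_directory(groups):
    """Return the index of a relatively empty cylinder group for a new directory.

    groups: list of dicts with hypothetical keys 'free_inodes' and 'directories'.
    """
    average_free = sum(g["free_inodes"] for g in groups) / len(groups)
    # "Many free i_nodes": keep groups at or above the average.
    candidates = [i for i, g in enumerate(groups) if g["free_inodes"] >= average_free]
    # "Few directories already there": among those, take the least crowded group.
    return min(candidates, key=lambda i: groups[i]["directories"])

groups = [
    {"free_inodes": 10, "directories": 1},
    {"free_inodes": 90, "directories": 8},
    {"free_inodes": 80, "directories": 2},
    {"free_inodes": 60, "directories": 0},
]
print(pick_group_for_directory(groups))
```

Spreading directories this way leaves room for each directory's files to be placed in the same cylinder group later.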

The Importance of Free Space FFS must not run too close to capacity No room for new files Layout policies ineffective when too few free blocks Typically, FFS needs 10% of the total blocks free to perform well

Performance of FFS 4 to 15 times the bandwidth of old UNIX file system Depending on size of disk blocks Performance on original file system Limited by CPU speed Due to memory-to-memory buffer copies

FFS Not the Ultimate Solution Based on technology of the early 80s And file usage patterns of those times In modern systems, FFS achieves only ~5% of raw disk bandwidth