CSE 153 Design of Operating Systems Spring 2016 Lecture 13: File Systems.

Presentation transcript:

CSE 153 Design of Operating Systems Spring 2016 Lecture 13: File Systems

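OS Abstractions
[Figure: Applications run on top of the Operating System, which runs on top of the Hardware. The hardware resources CPU, Disk, and RAM are exposed through the OS abstractions Process, File system, and Virtual memory, respectively.]
CSE 153 – Lecture 13 – File Systems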

File Systems
- First we’ll discuss properties of physical disks
  - Structure
  - Performance
  - Scheduling
- Then we’ll discuss how we build file systems on them
  - Files
  - Directories
  - Sharing
  - Protection
  - File System Layouts
  - File Buffer Cache
  - Read Ahead

Disks and the OS
- Disks are messy physical devices:
  - Errors, bad blocks, missed seeks, etc.
- OS’s job is to hide this mess from higher level software
  - Low-level device control (initiate a disk read, etc.)
  - Higher-level abstractions (files, databases, etc.)

What’s Inside A Disk Drive?
[Figure: labeled photo of a disk drive showing the spindle, arm, actuator, platters, electronics (including a processor and memory!), and SCSI connector. Image courtesy of Seagate Technology.]

Physical Disk Structure
- Disk components
  - Platters
  - Surfaces
  - Tracks
  - Sectors
  - Cylinders
  - Arm
  - Heads
[Figure: disk diagram labeling the arm, heads, platter, surface, cylinder, track, and sector (512 bytes).]

Disk Geometry
- Disks consist of platters, each with two surfaces.
- Each surface consists of concentric rings called tracks.
- Each track consists of sectors separated by gaps.
[Figure: one surface around the spindle, labeling the tracks, track k, sectors, and gaps.]

Disk Geometry (Multiple-Platter View)
- Aligned tracks form a cylinder.
[Figure: platters 0-2 on a common spindle, with surfaces 0-5 and cylinder k highlighted.]

Disk Capacity
- Capacity: maximum number of bits that can be stored.
  - Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes (Lawsuit pending! Claims deceptive advertising).
- Capacity is determined by these technology factors:
  - Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
  - Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.
  - Areal density (bits/in^2): product of recording and track density.
- Modern disks partition tracks into disjoint subsets called recording zones
  - Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
  - Each zone has a different number of sectors/track

Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
  - 512 bytes/sector
  - 300 sectors/track (on average)
  - 20,000 tracks/surface
  - 2 surfaces/platter
  - 5 platters/disk
Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
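The capacity formula above can be checked with a few lines of Python. This is only a sketch: the function name `disk_capacity_bytes` is made up for illustration, and the inputs are the example values from the slide.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter,
                        platters_per_disk):
    """Capacity as the product of the five factors listed on the slide."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)


# Example values from the slide.
capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
print(capacity)           # 30720000000 bytes
print(capacity / 10**9)   # capacity in vendor gigabytes (10^9 bytes)
print(capacity / 2**30)   # the same capacity in units of 2^30 bytes
```

The last line shows why the choice of unit matters: the same drive reports a smaller number when a gigabyte is taken to be 2^30 bytes.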

Disk Operation (Single-Platter View)
- The disk surface spins at a fixed rotational rate around the spindle.
- By moving radially, the arm can position the read/write head over any track.
- The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.

Disk Operation (Multi-Platter View)
- Read/write heads move in unison from cylinder to cylinder.

Disk Structure - top view of single platter
- Surface organized into tracks
- Tracks divided into sectors

Disk Access
1. Head in position above a track; rotation is counter-clockwise.
2. Read: about to read the blue sector.
3. Read: after reading the blue sector, the red request is scheduled next.
4. Seek: move the head to red’s track.
5. Rotational latency: wait for the red sector to rotate around.
6. Read: complete the read of red.

Disk Access – Service Time Components
- Seek
- Rotational latency
- Data transfer

Disk Access Time
- Average time to access some target sector is approximated by:
  - Taccess = Tavg seek + Tavg rotation + Tavg transfer
- Seek time (Tavg seek)
  - Time to position heads over the cylinder containing the target sector.
  - Typical Tavg seek is 3-9 ms
- Rotational latency (Tavg rotation)
  - Time waiting for the first bit of the target sector to pass under the r/w head.
  - Tavg rotation = 1/2 x (1/RPM) x (60 sec/1 min)
  - Typical rotational rate is 7,200 RPM
- Transfer time (Tavg transfer)
  - Time to read the bits in the target sector.
  - Tavg transfer = (1/RPM) x 1/(avg # sectors/track) x (60 sec/1 min)

Disk Access Time Example
- Given:
  - Rotational rate = 7,200 RPM
  - Average seek time = 9 ms
  - Avg # sectors/track = 400
- Derived:
  - Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
  - Tavg transfer = (60 secs/7200 RPM) x (1/400 tracks/sector) x 1000 ms/sec ≈ 0.02 ms
  - Taccess = 9 ms + 4 ms + 0.02 ms ≈ 13 ms
- Important points:
  - Access time is dominated by seek time and rotational latency.
  - The first bit in a sector is the most expensive; the rest are essentially free.
  - SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
    » For a 512-byte block (64 doublewords), disk is about 40,000 times slower than SRAM,
    » and about 2,500 times slower than DRAM.
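The derivation above can be reproduced numerically. The following is a minimal sketch with made-up function names; it uses the slide's formulas unchanged. Note that the slide rounds the rotational latency to 4 ms, while the unrounded value is closer to 4.17 ms.

```python
def avg_rotation_ms(rpm):
    """Half of one full rotation, in milliseconds."""
    return 0.5 * (60.0 / rpm) * 1000.0


def avg_transfer_ms(rpm, avg_sectors_per_track):
    """Time for one sector to pass under the head, in milliseconds."""
    return (60.0 / rpm) / avg_sectors_per_track * 1000.0


def access_time_ms(avg_seek_ms, rpm, avg_sectors_per_track):
    """Taccess = Tavg seek + Tavg rotation + Tavg transfer."""
    return (avg_seek_ms
            + avg_rotation_ms(rpm)
            + avg_transfer_ms(rpm, avg_sectors_per_track))


# Example values from the slide: 7,200 RPM, 9 ms seek, 400 sectors/track.
print(round(avg_rotation_ms(7200), 2))
print(round(avg_transfer_ms(7200, 400), 3))
print(round(access_time_ms(9.0, 7200, 400), 2))
```

Seek and rotation together account for nearly all of the total, which is the point the slide makes about the first bit being the expensive one.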

Logical Disk Blocks
- Modern disks present a simpler abstract view of the complex sector geometry:
  - The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
- Mapping between logical blocks and actual (physical) sectors
  - Maintained by a hardware/firmware device called the disk controller.
  - Converts requests for logical blocks into (surface, track, sector) triples.
- Allows the controller to set aside spare cylinders for each zone.
  - Accounts for the difference between “formatted capacity” and “maximum capacity”.
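To make the logical-block mapping concrete, here is a deliberately simplified sketch. It assumes a uniform geometry (the same number of sectors on every track, no recording zones, no spare cylinders, no remapping), which real controllers do not have; the function name and the cylinder-major block ordering are assumptions for illustration only.

```python
def block_to_triple(block, surfaces, sectors_per_track):
    """Map a logical block number to a (surface, track, sector) triple.

    Assumed ordering: blocks fill every surface of one cylinder before
    the heads move to the next cylinder, so consecutive blocks rarely
    require a seek.
    """
    blocks_per_cylinder = surfaces * sectors_per_track
    track = block // blocks_per_cylinder            # cylinder index
    surface = (block // sectors_per_track) % surfaces
    sector = block % sectors_per_track
    return surface, track, sector


# Hypothetical geometry: 4 surfaces, 10 sectors per track.
for block in (0, 10, 40, 57):
    print(block, block_to_triple(block, surfaces=4, sectors_per_track=10))
```

Because the controller owns this mapping, it is free to change it (for example, to route around a bad sector) without the OS noticing.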

I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory. The I/O bus attaches the USB controller (mouse, keyboard), the graphics adapter (monitor), the disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)
- CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.

Reading a Disk Sector (2)
- Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.

Reading a Disk Sector (3)
- When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU).

Disk Interaction
- Specifying disk requests requires a lot of info:
  - Cylinder #, surface #, track #, sector #, transfer size…
- Older disks required the OS to specify all of this
  - The OS needed to know all disk parameters
- Modern disks are more complicated
  - Not all sectors are the same size, sectors are remapped, etc.
- Current disks provide a higher-level interface (SCSI)
  - The disk exports its data as a logical array of blocks [0…N]
    » Disk maps logical blocks to cylinder/surface/track/sector
  - Only need to specify the logical block # to read/write
  - But now the disk parameters are hidden from the OS

Disk Heterogeneity
- Seagate Barracuda 3.5" (workstation)
  - capacity: GB
  - rotational speed: 7,200 RPM
  - sequential read performance: 78 MB/s (outer) - 44 MB/s (inner)
  - seek time (average): 8.1 ms
- Seagate Cheetah 3.5" (server)
  - capacity: GB
  - rotational speed: 15,000 RPM
  - sequential read performance: 135 MB/s (outer) - 82 MB/s (inner)
  - seek time (average): 3.8 ms
- Seagate Savvio 2.5" (smaller form factor)
  - capacity: 73 GB
  - rotational speed: 10,000 RPM
  - sequential read performance: 62 MB/s (outer) - 42 MB/s (inner)
  - seek time (average): 4.3 ms

Recent: Seagate Enterprise (2014)
- 6 TB! 1 Tb/in^2 (announced July)
- 6 (3.5”) platters, 2 heads each
- Perpendicular recording
- 7200 RPM, 4.16 ms latency
- 216 MB/sec sustained transfer speed
- 128 MB cache
- Error characteristics:
  - MTBF: hours
  - Bit error rate:
- Special considerations:
  - Normally needs a special “BIOS” (EFI): bigger than easily handled by 32-bit OSes.
  - Seagate provides special “Disk Wizard” software that virtualizes the drive into multiple chunks, which makes it bootable on these OSes.

Storage Performance & Price

        Bandwidth (sequential R/W)     Cost/GB       Size
HDD     MB/s                           $ /GB         2-4 TB
SSD     MB/s (SATA), 6 GB/s (PCI)      $1.5-5/GB     200 GB-1 TB
DRAM    10-16 GB/s                     $5-10/GB      64 GB-256 GB

- BW: SSD up to 10x faster than HDD; DRAM more than 10x faster than SSD
- Price: HDD 30x cheaper than SSD; SSD 4x cheaper than DRAM

Contrarian View
- A lot of file system ideas tied to disk drives are no longer relevant?

16 TB SSD announced

Disk Scheduling
- Because seeks are so expensive (milliseconds!), the OS schedules requests that are queued waiting for the disk
  - FCFS (do nothing)
    » Reasonable when load is low
    » Does nothing to minimize overhead of seeks
  - SSTF (shortest seek time first)
    » Minimize arm movement (seek time), maximize request rate
    » Favors middle blocks, potential starvation of blocks at ends
  - SCAN (elevator)
    » Service requests in one direction until done, then reverse
    » Long waiting times for blocks at ends
  - C-SCAN
    » Like SCAN, but only go in one direction (typewriter)
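The four policies above can be compared with a small simulation. This is a sketch under stated assumptions, not how any particular OS implements them: requests are plain track numbers, all requests are queued up front, SCAN and C-SCAN sweep toward higher tracks first, and both turn around at the last pending request rather than at the physical edge of the disk. All function names and the sample queue are illustrative.

```python
def fcfs(head, requests):
    """Service requests in arrival order."""
    return list(requests)


def sstf(head, requests):
    """Always service the pending request closest to the current head."""
    pending = list(requests)
    order = []
    while pending:
        nxt = min(pending, key=lambda track: abs(track - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def scan(head, requests):
    """Sweep upward servicing requests, then reverse and sweep downward."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def c_scan(head, requests):
    """Sweep upward, jump back to the lowest request, sweep upward again."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped


def total_seek(head, order):
    """Total arm movement, in tracks, for a given service order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


# Hypothetical queue of track numbers with the head starting at track 53.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
for policy in (fcfs, sstf, scan, c_scan):
    order = policy(53, queue)
    print(policy.__name__, order, total_seek(53, order))
```

On this queue FCFS moves the arm far more than the other three. SSTF has the lowest total here, but it is also the policy that can starve requests at the ends.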

Disk Scheduling (2)
- In general, unless there are request queues, disk scheduling does not have much impact
  - Important for servers, less so for PCs
- Modern disks often do the disk scheduling themselves
  - Disks know their layout better than the OS, can optimize better
  - Ignores, undoes any scheduling done by the OS

File Systems
- File systems
  - Implement an abstraction (files) for secondary storage
  - Organize files logically (directories)
  - Permit sharing of data between processes, people, and machines
  - Protect data from unwanted access (security)

Files
- A file is a sequence of bytes with some properties
  - Owner, last read/write time, protection, etc.
- A file can also have a type
  - Understood by the file system
    » Block, character, device, portal, link, etc.
  - Understood by other parts of the OS or runtime libraries
    » Executable, dll, source, object, text, etc.
- A file’s type can be encoded in its name or contents
  - Windows encodes type in name
    » .com, .exe, .bat, .dll, .jpg, etc.
  - Unix encodes type in contents
    » Magic numbers, initial characters (e.g., #! for shell scripts)

Basic File Operations

Unix                     NT
creat(name)              CreateFile(name, CREATE)
open(name, how)          CreateFile(name, OPEN)
read(fd, buf, len)       ReadFile(handle, …)
write(fd, buf, len)      WriteFile(handle, …)
sync(fd)                 FlushFileBuffers(handle, …)
seek(fd, pos)            SetFilePointer(handle, …)
close(fd)                CloseHandle(handle, …)
unlink(name)             DeleteFile(name)
                         CopyFile(name)
                         MoveFile(name)
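The Unix column above can be exercised from Python, whose `os` module exposes thin wrappers around these calls. A small sketch follows; the function name, file name, and contents are arbitrary, and the slide's sync(fd) and seek(fd, pos) appear here under their POSIX-style names `os.fsync` and `os.lseek`.

```python
import os
import tempfile


def demo_file_ops():
    """Create, write, sync, seek, read, close, and unlink a scratch file."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "demo.txt")

        # creat/open: create the file and get a file descriptor back.
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        os.write(fd, b"hello, file systems")   # write(fd, buf, len)
        os.fsync(fd)                           # sync: flush to stable storage
        os.lseek(fd, 7, os.SEEK_SET)           # seek: move the file offset
        data = os.read(fd, 4)                  # read(fd, buf, len)
        os.close(fd)                           # close(fd)

        os.unlink(path)                        # unlink(name)
        still_there = os.path.exists(path)
    return data, still_there


print(demo_file_ops())
```

Note that every call after `os.open` takes the descriptor, not the name; the name is only resolved once.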

File Access Methods
- File systems differ in the manner that data in a file can be accessed
  - Sequential access – read bytes one at a time, in order
  - Direct access – random access given block/byte number
  - Record access – file is an array of fixed- or variable-length records, read/written sequentially or randomly by record #
  - Indexed access – file system contains an index to a particular field of each record in a file; reads specify a value for that field and the system finds the record via the index (DBs)
- Older systems provide more complicated methods
- What file access method do Unix, Windows provide?
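Record access can be layered on top of plain byte-level direct access, as the following sketch suggests. The record layout (a 4-byte id followed by a 16-byte name) and the function names are made up for illustration, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io
import struct

# Hypothetical fixed-length record: little-endian uint32 id + 16-byte name.
RECORD = struct.Struct("<I16s")


def write_records(f, records):
    """Append (id, name) pairs as fixed-length records."""
    for record_id, name in records:
        f.write(RECORD.pack(record_id, name.encode("ascii")))


def read_record(f, number):
    """Direct access: seek to record `number` and decode it."""
    f.seek(number * RECORD.size)
    record_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\0").decode("ascii")


f = io.BytesIO()
write_records(f, [(1, "alice"), (2, "bob"), (3, "charlie")])
print(read_record(f, 2))   # jump straight to the third record
print(read_record(f, 0))   # then back to the first
```

The file itself is still just a sequence of bytes; the record structure lives entirely in the code that computes the offsets.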

Directories
- Directories serve two purposes
  - For users, they provide a structured way to organize files
  - For the file system, they provide a convenient naming interface that allows the implementation to separate logical file organization from physical file placement on the disk
- Most file systems support multi-level directories
  - Naming hierarchies (/, /usr, /usr/local/, …)
- Most file systems support the notion of a current directory
  - Relative names specified with respect to current directory
  - Absolute names start from the root of directory tree

Directory Internals
- A directory is a list of entries
  - Each entry is a (name, location) pair
  - Name is just the name of the file or directory
  - Location depends upon how the file is represented on disk
- List is usually unordered (effectively random)
  - Entries usually sorted by the program that reads the directory
- Directories typically stored in files
  - Only need to manage one kind of secondary storage unit

Basic Directory Operations

Unix
- Directories implemented in files
  - Use file ops to create dirs
- C runtime library provides a higher-level abstraction for reading directories
  - opendir(name)
  - readdir(DIR)
  - seekdir(DIR)
  - closedir(DIR)

Windows
- Explicit dir operations
  - CreateDirectory(name)
  - RemoveDirectory(name)
- Very different method for reading directory entries
  - FindFirstFile(pattern)
  - FindNextFile()

Path Name Translation
- Let’s say you want to open “/one/two/three”
- What does the file system do?
  - Open directory “/” (well known, can always find)
  - Search for the entry “one”, get location of “one” (in dir entry)
  - Open directory “one”, search for “two”, get location of “two”
  - Open directory “two”, search for “three”, get location of “three”
  - Open file “three”
- Systems spend a lot of time walking directory paths
  - This is why open is separate from read/write
  - OS will cache prefix lookups for performance
    » /a/b, /a/bb, /a/bbb, etc., all share the “/a” prefix
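The walk described above, along with the prefix cache, can be sketched with nested dictionaries standing in for on-disk directories. Everything here is illustrative: the `lookup` function, the dictionary representation, and the cache keyed by path prefix are assumptions, not how any particular file system stores its name cache.

```python
def lookup(root, path, cache=None):
    """Resolve an absolute path one component at a time.

    Returns (object, directory_searches), where directory_searches counts
    the components that required searching a directory rather than
    hitting the prefix cache.
    """
    node = root
    walked = ""
    searches = 0
    for name in [part for part in path.split("/") if part]:
        walked += "/" + name
        if cache is not None and walked in cache:
            node = cache[walked]          # prefix already resolved earlier
            continue
        if not isinstance(node, dict) or name not in node:
            raise FileNotFoundError(walked)
        node = node[name]                 # "search directory, get location"
        searches += 1
        if cache is not None:
            cache[walked] = node
    return node, searches


# Hypothetical directory tree: dicts are directories, strings are files.
root = {"one": {"two": {"three": "contents of three", "four": "contents of four"}}}

cache = {}
print(lookup(root, "/one/two/three", cache))   # cold: searches every level
print(lookup(root, "/one/two/four", cache))    # warm: reuses the /one/two prefix
```

The second lookup only has to search the final directory, which is the saving the slide attributes to caching prefix lookups.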

File Sharing
- File sharing is important for getting work done
  - Basis for communication between processes and users
- Two key issues when sharing files
  - Semantics of concurrent access
    » What happens when one process reads while another writes?
    » What happens when two processes open a file for writing?
  - Protection

Protection
- File systems implement some kind of protection system
  - Who can access a file
  - How they can access it
- More generally…
  - Objects are “what”, subjects are “who”, actions are “how”
- A protection system dictates whether a given action performed by a given subject on a given object should be allowed
  - You can read and/or write your files, but others cannot
  - You can read “/etc/motd”, but you cannot write to it

Representing Protection

Access Control Lists (ACL)
- For each object, maintain a list of subjects and their permitted actions

Capabilities
- For each subject, maintain a list of objects and their permitted actions

[Figure: access matrix with subjects (Alice, Bob, Charlie) as rows and objects (/one, /two, /three) as columns; each cell holds the permitted actions (r, w, rw, or -). A column of the matrix is an ACL; a row is a capability list.]

ACLs and Capabilities
- The approaches differ only in how the table is represented
  - What approach does Unix use?
- Capabilities are easier to transfer
  - They are like keys: they can be handed off and do not depend on the subject
- In practice, ACLs are easier to manage
  - Object-centric, easy to grant and revoke
  - To revoke capabilities, have to keep track of all subjects that have the capability – a challenging problem
- ACLs have a problem when objects are heavily shared
  - The ACLs become very large
  - Use groups (e.g., Unix)
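The point that ACLs and capabilities are two views of one table can be shown with a small access matrix. This is a sketch only: the permission entries below are invented for illustration and are not taken from the slide's figure, and the function names are hypothetical.

```python
# Hypothetical access matrix: (subject, object) -> permitted actions.
MATRIX = {
    ("alice", "/one"): "rw",
    ("alice", "/three"): "r",
    ("bob", "/one"): "w",
    ("bob", "/three"): "r",
    ("charlie", "/two"): "r",
    ("charlie", "/three"): "rw",
}


def acl(obj):
    """Column view: for one object, the subjects and their actions."""
    return {subj: acts for (subj, o), acts in MATRIX.items() if o == obj}


def capabilities(subj):
    """Row view: for one subject, the objects and their actions."""
    return {obj: acts for (s, obj), acts in MATRIX.items() if s == subj}


def allowed(subj, obj, action):
    """Protection check: may this subject perform this action on this object?"""
    return action in MATRIX.get((subj, obj), "")


print(acl("/three"))            # everything needed to revoke access to /three
print(capabilities("charlie"))  # everything charlie could hand off
print(allowed("bob", "/one", "r"))
```

Revoking all access to one object touches a single ACL, whereas with capabilities the same change means finding every subject's list that mentions the object, which is the management difficulty the slide describes.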