File System Implementation

File System Implementation
Operating Systems, Hebrew University, Spring 2007

File
A file is a sequence of bytes, with no structure as far as the operating system is concerned. The only operations are to read and write bytes; interpretation of the data is left to the application using it. A file descriptor is the file system's internal structure that lets the file system locate the file on disk.

Permissions and Data Layout
- Owner: the user who owns this file.
- Permissions: who is allowed to access this file.
- Modification time: when this file was last modified.
- Size: how many bytes of data there are.
- Data location: where on the disk the file's data is stored.

Operations
- Open: gain access to a file.
- Close: relinquish access to a file.
- Read: read data from a file, usually from the current position.
- Write: write data to a file, usually at the current position.
- Append: add data at the end of a file.
- Seek: move to a specific position in a file.
- Rewind: return to the beginning of the file.
- Set attributes: e.g., change the access permissions.
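
As a concrete illustration, here is a minimal sketch of most of these operations through the POSIX system-call interface (the file name, offsets, and sizes are arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char buf[100];

    /* Open: gain access to the file. */
    int fd = open("myfile", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Read: from the current position (offset 0 right after open). */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0) { perror("read"); close(fd); return 1; }

    /* Seek: move to a specific position in the file. */
    lseek(fd, 2000, SEEK_SET);

    /* Write: at the current position (now offset 2000). */
    write(fd, buf, (size_t)n);

    /* Rewind: return to the beginning of the file. */
    lseek(fd, 0, SEEK_SET);

    /* Close: relinquish access. */
    close(fd);
    return 0;
}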

Storing and Accessing File Data
Storing data in a file involves:
- Allocation of disk blocks to the file.
- Read and write operations.
- Optimization: avoiding disk accesses by caching data in memory.

Opening a File
fd = open("myfile", R) leads to the following sequence of actions:
1. The file system reads the current directory and finds that "myfile" is represented internally by entry 13 in the list of files maintained on the disk.
2. Entry 13 from that data structure is read from the disk and copied into the kernel's open files table. This includes various file attributes, such as the file's owner and its access permissions, and a list of the file's blocks.
3. The user's access rights are checked against the file's permissions to ascertain that reading (R) is allowed.
4. The user's variable fd (for "file descriptor") is made to point to the allocated entry in the open files table.

Opening a File – Cont.

Opening a File – Cont.
Why do we need the open files table?
- Temporal locality: an opened file is likely to be accessed again soon.
- It saves disk accesses: the file's metadata is read from disk once, at open time.
- All subsequent operations on the file are performed through the file descriptor.

Reading from a File
read(fd, buf, 100) leads to the following sequence of actions:
1. The argument fd identifies the open file by pointing into the kernel's open files table.
2. The system gains access to the list of blocks that contain the file's data.
3. The file system reads the relevant disk block (block number 5 in the illustration) into its buffer cache. Why a whole block? Spatial locality: nearby bytes are likely to be accessed soon.
4. The desired range of 100 bytes is copied into the user's memory at the address indicated by buf.

Reading from a File – Cont.

Writing to a File
We make two assumptions:
- We want to write 100 bytes, starting at byte 2000 in the file.
- Each disk block is 1024 bytes.
Therefore, the data we want to write spans from the end of the file's second block to the beginning of its third block. The second block must first be read into the buffer cache in full; then the part being written is modified by overwriting it with the new data.

Writing to a File – Cont.
We write 48 bytes (bytes 2000-2047) at the end of the file's second block (disk block number 8 in the illustration); the remaining 52 bytes go into the third block. The third block must be allocated from the pool of free blocks. Let's assume that block number 2 on the disk was free, and that this block was allocated.

Writing to a File – Cont.
Q: Is there a need to read the new block from the disk before modifying it?
A: No. Since this is a new block, we can just allocate a block in the buffer cache, record that it now represents disk block number 2, and copy the requested data (52 bytes) into it. Finally, the modified blocks are written back to the disk.
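
A small sketch of the offset arithmetic behind this example (block size and values from the slides; note that logical blocks are numbered from 0 here, so block 1 is the file's second block):

#include <stdio.h>

#define BLOCK_SIZE 1024

/* Given a write of `len` bytes at file offset `off`, print which
 * logical blocks are touched and how many bytes land in each. */
static void split_write(long off, long len) {
    long end = off + len - 1;                 /* last byte written   */
    for (long blk = off / BLOCK_SIZE; blk <= end / BLOCK_SIZE; blk++) {
        long lo = blk * BLOCK_SIZE;           /* first byte of block */
        long hi = lo + BLOCK_SIZE - 1;        /* last byte of block  */
        long from = off > lo ? off : lo;
        long to   = end < hi ? end : hi;
        printf("logical block %ld: bytes %ld-%ld (%ld bytes)\n",
               blk, from, to, to - from + 1);
    }
}

int main(void) {
    split_write(2000, 100);   /* the example from the slides */
    return 0;
}

Running this prints 48 bytes for logical block 1 and 52 bytes for logical block 2, matching the numbers above.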

The Location in the File
The read system call provides a buffer address for placing the data in the user's memory, but does not indicate the offset in the file from which the data should be taken. The operating system maintains the current offset into the file and updates it at the end of each operation. If random access is required, the process can set the file pointer to any desired value using the seek system call.

Mapping File Blocks
How do we find the blocks that together constitute the file? How do we find the right one when we want to access the file at a particular offset?

The Unix i-node
In Unix, files are represented internally by a structure known as an inode, which includes an index of disk blocks. The index is arranged hierarchically:
- A few (e.g., 10) direct pointers, which list the first blocks of the file.
- One single indirect pointer, which points to a whole block of additional direct pointers.
- One double indirect pointer, which points to a block of single indirect pointers.
- One triple indirect pointer, which points to a block of double indirect pointers.
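
A minimal sketch of such an index structure (sizes from the slides; the field names are illustrative, not an actual kernel definition):

#define NDIRECT 10

struct inode {
    /* attributes: owner, permissions, size, times, ... */
    unsigned long direct[NDIRECT];  /* first 10 blocks of the file       */
    unsigned long single_indirect;  /* block holding more block pointers */
    unsigned long double_indirect;  /* block of single-indirect pointers */
    unsigned long triple_indirect;  /* block of double-indirect pointers */
};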

i-node - Example
Blocks are 1024 bytes and each pointer is 4 bytes. The 10 direct pointers then provide access to a maximum of 10 KB. The indirect block contains 256 additional pointers, for a total of 266 blocks (266 KB). The double indirect block has 256 pointers to indirect blocks, so it represents a total of 256 x 256 = 65,536 blocks. Ignoring the triple indirect pointer, file sizes can thus grow to a bit over 64 MB (10 + 256 + 65,536 = 65,802 blocks).
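
A quick sketch of this computation, with the triple indirect level included for completeness (values from the slide):

#include <stdio.h>

int main(void) {
    long long block = 1024;                /* block size in bytes (slide) */
    long long ptr   = 4;                   /* pointer size in bytes       */
    long long ppb   = block / ptr;         /* pointers per block: 256     */

    long long direct = 10;                 /* direct blocks               */
    long long single = ppb;                /* via single indirect: 256    */
    long long dbl    = ppb * ppb;          /* via double indirect: 65,536 */
    long long triple = ppb * ppb * ppb;    /* via triple indirect         */

    long long blocks = direct + single + dbl;   /* the slide stops here   */
    printf("without triple indirect: %lld blocks = %lld KB (~%lld MB)\n",
           blocks, blocks * block / 1024, blocks * block / (1024 * 1024));
    printf("with triple indirect:    %lld blocks (~%lld GB)\n",
           blocks + triple,
           (blocks + triple) * block / (1024LL * 1024 * 1024));
    return 0;
}

This prints 65,802 blocks (~64 MB) without triple indirection, confirming the slide's "a bit over 64 MB", and about 16 GB with it.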

i-node Structure

Super Block
How does the kernel assign inodes and disk blocks? To make these assignments, the kernel maintains a super block, which contains:
- The size of the file system.
- A list of free blocks available in the file system.
- A list of free inodes in the file system.
- Etc.
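
A minimal sketch of such a structure, loosely modeled on the classic Unix super block (the field names and list capacities are illustrative, not an actual on-disk format):

#define NICFREE 50   /* capacity of the cached free-block list (illustrative) */
#define NICINOD 100  /* capacity of the cached free-inode list (illustrative) */

struct superblock {
    unsigned long fs_size;              /* size of the file system, in blocks */
    unsigned long nfree_blocks;         /* number of cached free data blocks  */
    unsigned long nfree_inodes;         /* number of cached free inodes       */
    unsigned long free_blocks[NICFREE]; /* partial list of free block numbers */
    unsigned long free_inodes[NICINOD]; /* partial list of free inode numbers */
    /* ... lock flags, modified flag, etc. */
};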

i-node Assignment to a New File
The file system contains a list of inodes. When a process needs a new inode, the kernel could search this list for a free one; to improve performance, the super block caches a list of free inode numbers. If the super block's list of free inodes is empty, the kernel searches the disk and places as many free inode numbers as possible into the super block. Freeing an inode: the kernel places the inode number in the super block's free-inode list.

Allocation of Disk Blocks
When a process writes data to a file, the kernel must allocate disk blocks from the file system, for direct or indirect blocks. The super block likewise caches a list of free disk block numbers. When the kernel wants to allocate a block from the file system, it takes the next available block from the super block's list.
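
A toy sketch of allocating from such a cached list, using the superblock sketch above (simplified: a real kernel refills the cache from on-disk structures when it runs empty):

/* Returns a free block number from the super block's cached list,
 * or 0 if the cache is empty and must be refilled from disk. */
unsigned long alloc_block(struct superblock *sb) {
    if (sb->nfree_blocks == 0)
        return 0;                   /* caller refills the cache from disk */
    return sb->free_blocks[--sb->nfree_blocks];
}

/* Freeing a block pushes its number back onto the cached list
 * (assuming there is room; otherwise it is recorded on disk). */
void free_block(struct superblock *sb, unsigned long blk) {
    if (sb->nfree_blocks < NICFREE)
        sb->free_blocks[sb->nfree_blocks++] = blk;
}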

Direct Memory Access
Without DMA, when a process needs a block from the disk, the CPU itself must copy the requested block from the disk to main memory. This is a waste of CPU time: if we could exempt the CPU from this job, it would be available to run other ready processes. This is why the system includes a DMA (Direct Memory Access) controller.

Direct Memory Access – Cont.
Sequence of actions:
1. The OS programs the DMA controller, passing it the needed parameters (address on disk, address in main memory, amount of data to copy).
2. The running process is moved to the blocked queue, and a new process from the ready queue is selected to run.
3. The DMA controller transfers the requested data from the disk to main memory.
4. The DMA controller sends an interrupt to the CPU, indicating that the I/O operation has finished.
5. The waiting process is moved to the ready queue.

Sharing a File
Assume there is a log file that needs to be written by more than one process. If each process has its own file pointer, there is a danger that one process will overwrite log entries of another process. If the file pointer is shared, each new log entry will indeed be added to the end of the file. Unix uses a set of three tables to control access to files.

Sharing a File – Cont.
- The inode table: each file may appear at most once in this table.
- The open files table: an entry in this table is allocated every time a file is opened. Each entry contains a pointer to the inode table and a position within the file. There can be multiple open-file entries pointing to the same inode.
- The file descriptor table: separate for each process. Each entry points to an entry in the open files table; the index of the slot is the fd returned by open.
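
A minimal sketch of the three tables and the pointers between them (all names are illustrative, not actual kernel definitions):

struct inode_entry {                /* system-wide inode table            */
    unsigned long inode_number;     /* each file appears at most once     */
    int ref_count;                  /* open-file entries pointing here    */
    /* ... attributes, block index ... */
};

struct open_file_entry {            /* system-wide open files table       */
    struct inode_entry *inode;      /* which file                         */
    long offset;                    /* shared position within the file    */
    int ref_count;                  /* descriptors pointing here          */
};

struct process {
    /* per-process descriptor table: fd is an index into this array */
    struct open_file_entry *fd_table[64];
};

Two descriptors inherited from the same open call (e.g., across fork) point to the same open-file entry and therefore share one offset, which is what makes the shared log-file pointer above work. Two independent opens of the same file get separate open-file entries, hence separate offsets, but still share the single inode entry.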

Sharing a File - Cont.

I/O Management and Disk Scheduling

Disk Structure
Disk drives are addressed as large one-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. This one-dimensional array of logical blocks is mapped onto the sectors of the disk sequentially: sector 0 is the first sector of the first track on the outermost cylinder. The mapping proceeds in order through that track, then through the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.
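
A sketch of this mapping for an idealized disk with a fixed number of sectors per track (the geometry numbers are illustrative; real disks use zoned recording, so the actual mapping is more complex):

#include <stdio.h>

#define TRACKS_PER_CYL   4     /* surfaces / heads (illustrative) */
#define SECTORS_PER_TRK  32    /* sectors per track (illustrative) */

/* Map a logical block number to (cylinder, track, sector),
 * following the ordering described above. */
void lba_to_chs(long lba, long *cyl, long *trk, long *sec) {
    *sec = lba % SECTORS_PER_TRK;
    *trk = (lba / SECTORS_PER_TRK) % TRACKS_PER_CYL;
    *cyl = lba / (SECTORS_PER_TRK * TRACKS_PER_CYL);
}

int main(void) {
    long c, t, s;
    lba_to_chs(1000, &c, &t, &s);
    printf("block 1000 -> cylinder %ld, track %ld, sector %ld\n", c, t, s);
    return 0;
}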

Moving-Head Disk Mechanism

Disk Scheduling
The operating system is responsible for using the hardware efficiently; for disk drives, this means fast access time and high disk bandwidth. Access time has two major components:
- Seek time: the time for the disk arm to move the heads to the cylinder containing the desired sector.
- Rotational latency: the additional time spent waiting for the disk to rotate the desired sector under the disk head. For example, at 7200 RPM a full rotation takes about 8.3 ms, so the average rotational latency is about 4.2 ms.
Since seek time is roughly proportional to seek distance, the scheduler tries to minimize seek distance. Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer.

Disk Scheduling – Cont.
Whenever a process needs I/O from the disk, it issues a system call to the OS with the following information:
- Whether the operation is input or output.
- The disk address for the transfer.
- The memory address for the transfer.
- The number of bytes to be transferred.
Several algorithms exist to schedule the servicing of disk I/O requests. We illustrate them with a request queue on a disk with cylinders 0-199:
Queue: 98, 183, 37, 122, 14, 124, 65, 67
Head starts at cylinder 53.

FCFS
First-come, first-served. Advantage: fair. Disadvantage: slow. For the example queue, the total head movement is 640 cylinders.
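
A small sketch that checks this number for the example queue:

#include <stdio.h>
#include <stdlib.h>

/* Total head movement when requests are served in arrival order. */
long fcfs_movement(int head, const int *queue, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(queue[i] - head);   /* seek distance to next request */
        head = queue[i];
    }
    return total;
}

int main(void) {
    int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
    printf("FCFS: %ld cylinders\n", fcfs_movement(53, queue, 8)); /* 640 */
    return 0;
}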

Shortest-Seek-Time-First (SSTF)
Selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling, and may cause starvation of some requests. For the example queue, the total head movement is 236 cylinders (see the sketch below).
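
A sketch of SSTF for the same queue, in the same style as the FCFS check above: repeatedly serve the pending request closest to the head.

#include <stdio.h>
#include <stdlib.h>

/* Total head movement when the closest pending request is served next. */
long sstf_movement(int head, int *queue, int n) {
    long total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)      /* find the nearest pending request */
            if (queue[i] >= 0 &&
                (best < 0 || abs(queue[i] - head) < abs(queue[best] - head)))
                best = i;
        total += abs(queue[best] - head);
        head = queue[best];
        queue[best] = -1;                /* mark as served */
    }
    return total;
}

int main(void) {
    int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
    printf("SSTF: %ld cylinders\n", sstf_movement(53, queue, 8)); /* 236 */
    return 0;
}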

SSTF – Cont.
Total head movement = 236 cylinders.

SCAN or Elevator Algorithm
The disk arm starts at one end of the disk and moves toward the other end, servicing requests along the way, until it gets to the other end of the disk; there the head movement is reversed and servicing continues. Sometimes called the elevator algorithm. This algorithm does not give uniform wait times. For the example queue (with the head first moving toward cylinder 0), the total head movement is 236 cylinders.

SCAN – Cont.
Total head movement = 236 cylinders (or 208 if the head reverses at the last request instead of going all the way to the edge, i.e., the LOOK variant).

C-SCAN
Provides a more uniform wait time than SCAN. The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. C-SCAN treats the cylinders as a circular list that wraps around from the last cylinder to the first one.

C-SCAN - Cont.

C-LOOK
A version of C-SCAN: the arm only goes as far as the last request in each direction, then immediately returns to the first request at the other end, without first going all the way to the end of the disk.

C-LOOK - Cont.

Selecting a Disk-Scheduling Algorithm
- SSTF is common and has a natural advantage over FCFS.
- SCAN and C-SCAN perform better for systems that place a heavy load on the disk (less chance of starvation).
- Performance depends on the number and types of requests.
- Requests for disk service can be influenced by the file-allocation method.
- Either SSTF or LOOK is a reasonable choice for the default algorithm.

Layout on Disk
The conventional Unix file system is composed of three parts: the super block, the inodes, and the data blocks. The conventional layout: the super block is the first block on the disk; next come all the inodes; all the rest are data blocks. The problem with this layout is excessive seeking. Consider the example of opening a file named "/a/b/c":
- Access the root inode to find the blocks used to implement the root directory.
- Read these blocks to find which inode has been allocated to directory a.
- Read the inode for a to get to its blocks.
- Read the blocks to find b, and so on.
If all the inodes are concentrated at one end of the disk, while the blocks are dispersed throughout the disk, this means repeated seeking back and forth.

Cylinder Group
A cylinder group is a set of adjacent cylinders; a file system consists of a set of cylinder groups. Each cylinder group has a redundant copy of the super block, space for inodes, and a bitmap describing the available blocks in the group. The basic idea: put related information together in the same cylinder group, and unrelated information apart in different cylinder groups, using a bunch of heuristics:
- Try to put all inodes for a given directory in the same cylinder group.
- Try to put the blocks of one file adjacent to each other within the cylinder group.
- For long files, redirect blocks to a new cylinder group every megabyte.
- Increased block size: the minimum block size is now 4096 bytes, which helps read and write bandwidth for big files.

Layout on Disk – Cont.
A possible solution is to try to put inodes and related blocks next to each other, rather than concentrating all the inodes in one place. This was done in the Unix fast file system, via the cylinder groups described above.

Indirect Blocks
For large files, most of the file data is reached through indirect blocks. The file system therefore switches to a new cylinder group whenever a new indirect block is allocated. Assuming 12 direct blocks of size 8 KB each, the first indirect block is allocated when the file size reaches 96 KB. Having to perform a seek at this relatively small size leads to a substantial reduction in the achievable bandwidth. The solution is to make the first indirect block a special case that stays in the same cylinder group as the inode.

Block Fragments
Each disk block can be chopped up into 2, 4, or 8 fragments. Each file contains at most one fragment, which holds the last part of the file's data. Thus 8 small files can together occupy only one disk block. Larger fragments can be allocated if the tail of the file is larger than one eighth of a disk block. When the file grows, the last fragment may need to be copied out to a larger fragment or a full block.

RAID Structure
Problems with a single disk:
- The data transfer rate is limited by serial access.
- Reliability.
The solution to both problems: redundant arrays of inexpensive disks (RAID). In the past, RAID (a combination of cheap disks) was an alternative to large and expensive disks. Today, RAID arrays are used for their higher reliability and higher data-transfer rate, so the 'I' in RAID stands for "independent" instead of "inexpensive": Redundant Arrays of Independent Disks.

RAID Approaches
RAID 1 (mirroring): there are two copies of each block, on distinct disks. This allows fast reading (you can access the less loaded copy), but wastes disk space and delays writing.

RAID Approaches – Cont.
RAID 3 (parity disk): data blocks are distributed among the disks in round-robin manner. For each set of blocks, a parity block is computed (the bitwise XOR of the data blocks) and stored on a separate disk. If one of the original data blocks is lost due to a disk failure, its data can be reconstructed from the other blocks and the parity.
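
A tiny sketch of parity-based reconstruction (three toy data blocks plus one parity block; any lost block is the XOR of the surviving ones):

#include <stdio.h>
#include <string.h>

#define BLK 8   /* toy block size, in bytes */

int main(void) {
    unsigned char d0[BLK] = "AAAAAAA", d1[BLK] = "BBBBBBB", d2[BLK] = "CCCCCCC";
    unsigned char parity[BLK], rebuilt[BLK];

    /* Parity block: bitwise XOR of the data blocks. */
    for (int i = 0; i < BLK; i++)
        parity[i] = d0[i] ^ d1[i] ^ d2[i];

    /* Suppose the disk holding d1 fails: reconstruct it from the rest. */
    for (int i = 0; i < BLK; i++)
        rebuilt[i] = d0[i] ^ d2[i] ^ parity[i];

    printf("d1 recovered: %s\n",
           memcmp(rebuilt, d1, BLK) == 0 ? "yes" : "no");
    return 0;
}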

RAID Approaches – Cont.
RAID 5 (distributed parity): in RAID 3, the parity disk participates in every write operation (since every write updates some data block and the parity), and so becomes a bottleneck. The solution is to distribute the parity blocks across the different disks.