OPERATING SYSTEMS CS 3502 Fall 2017


OPERATING SYSTEMS CS 3502 Fall 2017 File Management Chapter 8 Dr. Donghyun (David) Kim Department of Computer Science College of Computing and Software Engineering Kennesaw State University

Purpose of File Management Data should be organized in some convenient and efficient manner. In particular, users should be able to: Store data into files Find and use files that have previously been created

File System A set of OS Services that provides Files and Directories for user applications

Files A file is simply a sequence of bytes that has been stored on some device (storage) in the computer system

Files The bytes contain the data stored in the file, such as: A text file containing just the characters that we are interested in. A word processing document file that also contains data about how to format the text. A database that contains data organized in multiple files. In general, the File Management system does not have any knowledge about how the data in a file is organized. That is the responsibility of the application programs that create and use the file.

Permanent Storage Devices Disk Drives Flash Memory (Memory stick) CDs and DVDs Magnetic tape drives

File Attributes Name: Symbolic (human-readable) label of the file. Type: Executable file, text print file, binary file, etc. Location: The physical address of the file on disk. Size: In MB, GB, or TB.

Other File Attributes Protection: Permissions for who can read, write, and execute the file, etc. Time, date: When the file was created, modified, accessed.

Folders Name of folder. Typically, a folder may contain files and other folders (commonly called sub-folders or sub-directories). This results in a Tree Structure of Folders and Files.

Folder/Directory Tree Structure

Pathnames The pathname of a file specifies the sequence of folders one must traverse to travel down the tree to the file. This pathname actually describes the absolute path of the file, which is the sequence of folders one must travel from the root of the tree to the desired file. A relative path describes the sequence of Folders one must traverse starting at some intermediate place on the absolute path.
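
The relationship between the two kinds of path can be sketched in a few lines of Python. This is only an illustration: the folder names (/home/alice and so on) and the helper names to_absolute and to_relative are made up for the example, and POSIX-style paths are assumed.

```python
import posixpath

def to_absolute(current_dir, relative):
    """Join a relative path onto the folder it starts from, then normalize."""
    return posixpath.normpath(posixpath.join(current_dir, relative))

def to_relative(absolute, start):
    """Express an absolute path relative to an intermediate folder."""
    return posixpath.relpath(absolute, start)

# Hypothetical tree: / -> home -> alice -> docs -> report.txt
print(to_absolute("/home/alice", "docs/report.txt"))
print(to_relative("/home/alice/docs/report.txt", "/home/alice"))
```

The absolute path always starts at the root, while the relative path only makes sense once the starting folder is known.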

File Links Allow a directory entry to point to a file (or entry) that is not directly below it in the tree structure Unix: Symbolic Link Windows: Shortcut
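
As a rough sketch of how a Unix symbolic link behaves, the following Python snippet creates a file, links to it, reads the file through the link, and then removes the link. The file names and the function name link_demo are assumptions for the example, and it assumes a system where os.symlink is permitted (on Windows this may need extra privileges).

```python
import os
import tempfile

def link_demo():
    """Create a file and a symbolic link to it, then delete only the link."""
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "data.txt")   # hypothetical file
        link = os.path.join(root, "shortcut")     # hypothetical link name
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)        # directory entry that points at target
        with open(link) as f:           # opening the link opens the target
            via_link = f.read()
        os.remove(link)                 # removes the link entry only
        target_survives = os.path.exists(target)
        return via_link, target_survives

print(link_demo())
```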

Link in Directory Tree Structure

Access Methods An access method describes the manner and mechanisms by which a process accesses the data in a file. There are two common access methods: Sequential Random (or Direct)
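
The difference between the two access methods can be sketched with fixed-size records. This is a minimal illustration, not how any particular OS implements it: the 8-byte record size and the helper names are assumptions, and an in-memory io.BytesIO object stands in for a file on disk.

```python
import io

RECORD_SIZE = 8  # assumed fixed record length, in bytes

def read_all_sequential(f):
    """Sequential access: start at the beginning and read record after record."""
    f.seek(0)
    records = []
    while True:
        chunk = f.read(RECORD_SIZE)
        if not chunk:
            break
        records.append(chunk)
    return records

def read_record(f, index):
    """Random (direct) access: jump straight to record number `index`."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE)

f = io.BytesIO(b"AAAAAAAABBBBBBBBCCCCCCCC")
print(read_all_sequential(f))
print(read_record(f, 2))
```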

File Operations When a process needs to use a file, there are a number of operations it can perform: Open Close Read Write

Create File Allocate space for file Make entry for file in the Directory

Open File Make file accessible for read/write operations Locates file in Directory Returns internal ID for the file Commonly called a Handle handle = open(filename, parameters)

File Open

Write File System call specifies: Handle from Open call Location, length of information to be written Possibly, location in the file where data is to be written write(file handle,buffer,length)

Write File Use Handle to locate file on disk Use file’s Write pointer to determine the position in the file to write to Update file’s Write Pointer

Read File System call specifies: Handle from Open call Memory Location, length of information to be read Possibly, location in the file where data is to be read from read(file handle, buffer) read(file handle, buffer, length)

Read File Uses Handle to locate file on disk. Uses file’s Read Pointer to determine the position in the file to read from. Updates file’s Read Pointer.

Close File Makes file no longer accessible from application Deletes the Handle created by Open

File Close

Delete File Deletes entry for file in Directory De-allocates disk space used by the file
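
The create, open, write, read, close, and delete operations above map fairly directly onto the low-level calls in Python's os module, where the handle is a small integer file descriptor. The sketch below assumes a temporary file created just for the demonstration; the function name round_trip is hypothetical.

```python
import os
import tempfile

def round_trip(data: bytes) -> bytes:
    """Write data to a new file, read it back, then delete the file."""
    # Create: allocate a directory entry (mkstemp also opens it, so close first).
    tmp_handle, path = tempfile.mkstemp()
    os.close(tmp_handle)
    try:
        # Open for writing: the OS returns a handle (file descriptor).
        handle = os.open(path, os.O_WRONLY | os.O_TRUNC)
        os.write(handle, data)          # write(handle, buffer)
        os.close(handle)                # handle is no longer valid

        # Open again for reading and read back the same number of bytes.
        handle = os.open(path, os.O_RDONLY)
        result = os.read(handle, len(data))   # read(handle, length)
        os.close(handle)
        return result
    finally:
        os.remove(path)                 # Delete: remove the directory entry

print(round_trip(b"file management"))
```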

Sequential Access If the process has opened a file for sequential access, the File Management subsystem will keep track of the current file position for reading and writing. To carry this out, the system will maintain a file pointer that will be the position of the next read or write.

File Pointer The value of the file pointer will be initialized during Open to one of two possible values Normally, this value will be set to 0 to start the reading or writing at the beginning of the file. If the file is being opened to append data to the file, the File Position pointer will be set to the current size of the file. After each read or write, the File Position Pointer will be incremented by the amount of data that was read or written.
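
The pointer behavior described above can be observed from Python, where tell() reports the current file position. This is a small sketch under the assumption of a throwaway temporary file; the function name pointer_trace and the dictionary keys are made up for the example.

```python
import os
import tempfile

def pointer_trace():
    """Record the file position at several points in a file's life."""
    positions = {}
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "log.bin")
        with open(path, "wb") as f:            # normal open: pointer starts at 0
            positions["open_for_write"] = f.tell()
            f.write(b"hello")                  # pointer advances by 5
            positions["after_write"] = f.tell()
        with open(path, "ab") as f:            # append: pointer starts at file size
            positions["open_for_append"] = f.tell()
            f.write(b" world")                 # pointer advances by 6
            positions["after_append"] = f.tell()
        with open(path, "rb") as f:
            f.read(4)                          # pointer advances by 4
            positions["after_read"] = f.tell()
    return positions

print(pointer_trace())
```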

Stream A Stream is the flow of data bytes, one byte after another, into the process (for reading) and out of the process (for writing). This concept applies to Sequential Access and was originally invented for network I/O, but several modern programming languages (e.g. C/C++, Java, C#) have also incorporated it.

Standard I/O Standard Input: Defaults to keyboard. Standard Output: Defaults to console.

I/O Redirection Standard Input can come from a file: app.exe < def.txt. Standard Output can go to a file: app.exe > def.txt. Standard Output from one application can be Standard Input for another: app1.exe | app2.exe. This is called a Pipe.
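
A program can set up the same redirections itself when it launches another program. The following is a sketch using Python's subprocess module; the child program here is a hypothetical one-line Python script that upper-cases its input, standing in for app.exe, and the file names are assumptions.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: copies standard input to standard output in upper case.
CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def run_redirected(text):
    """Run the child with stdin taken from one file and stdout sent to another."""
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "in.txt")
        dest = os.path.join(root, "out.txt")
        with open(source, "w") as f:
            f.write(text)
        # Equivalent in spirit to: child < in.txt > out.txt
        with open(source) as stdin_file, open(dest, "w") as stdout_file:
            subprocess.run([sys.executable, "-c", CHILD],
                           stdin=stdin_file, stdout=stdout_file, check=True)
        with open(dest) as f:
            return f.read()

print(run_redirected("hello"))
```

The child never learns that its input and output are files; it simply reads Standard Input and writes Standard Output.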

A Pipe

Pipe A Pipe is a connection that is dynamically established between two processes. When a process reads data, the data will come from another process rather than a file. Thus, a pipe has a process at one end that is writing to the pipe and another process reading data at the other end of the pipe. It is often the situation that one process will produce output that another process needs for input.

Pipe and Performance Using a pipe can improve system performance in two ways: By not using a file, the applications save time by not using disk I/O. A pipe has the characteristic that the receiving process can read whatever data has already been written. Thus we do not need to wait until the first process has written all of the data before we start executing the second process.
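
A pipe can be sketched with os.pipe, which returns a read end and a write end. To keep the example self-contained, the two ends are used by two threads of one program rather than two separate processes; that is a simplification for illustration, and the function name run_pipeline is hypothetical.

```python
import os
import threading

def run_pipeline(lines):
    """A producer writes lines into a pipe while a consumer reads them."""
    read_end, write_end = os.pipe()

    def producer():
        # Closing the write end (on leaving the with-block) signals end of data.
        with os.fdopen(write_end, "w") as out:
            for line in lines:
                out.write(line + "\n")

    writer = threading.Thread(target=producer)
    writer.start()
    # The consumer can start reading before the producer has finished writing.
    with os.fdopen(read_end) as inp:
        received = [line.rstrip("\n").upper() for line in inp]
    writer.join()
    return received

print(run_pipeline(["first", "second"]))
```

No intermediate file is created; the data passes through a buffer managed by the operating system.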

Directory Functions Search for a file Create a file Delete a file List a directory Rename a file Traverse the file system

Disk Space Allocation Contiguous File is allocated contiguous disk space

Contiguous Allocation Read/Write Disk Address Calculation
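
With contiguous allocation, the address calculation needs only the file's starting block and the block size. A minimal sketch, assuming byte offsets and hypothetical numbers for the example:

```python
def contiguous_address(start_block, block_size, offset):
    """Map a byte offset in the file to (disk block, offset within that block)."""
    block = start_block + offset // block_size
    within = offset % block_size
    return block, within

# Hypothetical file starting at disk block 100 with 512-byte blocks.
print(contiguous_address(100, 512, 1300))
```

Byte 1300 lies 2 full blocks past the start (2 x 512 = 1024), so it is in block 102 at offset 276.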

Contiguous Allocation Advantages Simple to implement Good disk I/O performance Disadvantages Need to know max file size ahead of time Probably will waste disk space Necessary space may not be available

Disk Space Allocation Cluster Allocation Disk space allocated in blocks Space allocated as needed

Cluster Allocation

Cluster Allocation Advantages: Tends not to waste disk space. Disadvantages: Additional overhead to keep track of clusters. Can cause poor disk I/O performance. May limit maximum size of File System.

Cluster Performance Clusters tend to be scattered around the disk This is called External Fragmentation Can cause poor performance as disk arm needs to move a lot Requires De-fragmentation utility

Cluster Performance Large clusters can reduce External Fragmentation If lots of small files, then space will be wasted inside each cluster This is called Internal Fragmentation
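
The cost of internal fragmentation can be estimated with simple arithmetic: each file occupies a whole number of clusters, and the unused tail of its last cluster is wasted. The file sizes and cluster sizes below are made-up numbers for illustration.

```python
def wasted_bytes(file_sizes, cluster_size):
    """Total space lost inside partially filled last clusters."""
    wasted = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)     # ceiling division
        wasted += clusters * cluster_size - size
    return wasted

sizes = [100, 5000]  # hypothetical file sizes in bytes
print(wasted_bytes(sizes, 4096))
print(wasted_bytes(sizes, 512))
```

With 4096-byte clusters these two files waste 7188 bytes; with 512-byte clusters they waste 532 bytes.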

Crossing Cluster Boundary Break logical read into multiple physical reads
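
Splitting one logical read into per-cluster physical reads can be sketched as follows. The function works on logical cluster numbers within the file; translating those to disk locations is left out, and the name split_read is an assumption for the example.

```python
def split_read(offset, length, cluster_size):
    """Break a logical read into (cluster, offset in cluster, byte count) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        cluster = offset // cluster_size
        within = offset % cluster_size
        # Read up to the end of this cluster or the end of the request.
        count = min(cluster_size - within, end - offset)
        pieces.append((cluster, within, count))
        offset += count
    return pieces

# A 200-byte read starting at byte 1000 with 512-byte clusters crosses a boundary.
print(split_read(1000, 200, 512))
```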

Managing Cluster Allocation Linked Each cluster has a pointer to the next cluster Indexed Single table has pointers to each of the clusters
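
The practical difference between the two schemes shows up when looking for the n-th cluster of a file. In this sketch a dictionary stands in for the per-cluster next pointers and a list stands in for the index table; the cluster numbers are invented.

```python
END = -1  # assumed marker for "no next cluster"

def linked_nth(next_pointer, first, n):
    """Linked: follow n pointers, one cluster at a time, from the first cluster."""
    cluster = first
    for _ in range(n):
        cluster = next_pointer[cluster]
    return cluster

def indexed_nth(index_table, n):
    """Indexed: a single table lookup gives the n-th cluster directly."""
    return index_table[n]

# Hypothetical file stored in clusters 7 -> 2 -> 9.
next_pointer = {7: 2, 2: 9, 9: END}
index_table = [7, 2, 9]
print(linked_nth(next_pointer, 7, 2), indexed_nth(index_table, 2))
```

Linked allocation must walk the chain, which suits sequential access; an index supports random access with one lookup.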

Linked Blocks

Index Block

Windows File Systems FAT16 (File Allocation Table): MS-DOS, Windows 95. Max 2 GB space for a file system. Generally bad disk fragmentation. FAT32: Windows 98. Supported by Windows 2000, XP, 2003. NTFS (New Technology File System).

Windows FAT Table

Windows FAT Processing Disk Space Allocation Allocate a free cluster Update FAT System Failure
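
The two allocation steps can be sketched with a simplified table in which entry i holds the number of the cluster that follows cluster i. This is a teaching model, not the real on-disk FAT layout: the FREE and END_OF_CHAIN markers and the function names are assumptions.

```python
FREE = None          # assumed marker for an unallocated cluster
END_OF_CHAIN = -1    # assumed marker for the last cluster of a file

def chain(fat, first):
    """List the clusters of a file by following the table from its first cluster."""
    clusters = []
    cluster = first
    while cluster != END_OF_CHAIN:
        clusters.append(cluster)
        cluster = fat[cluster]
    return clusters

def append_cluster(fat, last):
    """Allocate a free cluster and link it after the file's current last cluster."""
    for candidate, entry in enumerate(fat):
        if entry is FREE:
            fat[candidate] = END_OF_CHAIN   # step 1: mark the cluster as in use
            # In this model, a crash here leaves the cluster used but unlinked.
            fat[last] = candidate           # step 2: update the chain
            return candidate
    raise OSError("no free clusters")

fat = [3, FREE, FREE, END_OF_CHAIN, FREE]   # hypothetical file: clusters 0 -> 3
append_cluster(fat, 3)
print(chain(fat, 0))
```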

Free Space Management Linked: Linked list of free clusters. Bit Map: Special file with a vector of bits. Each bit corresponds to a cluster.
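
The bit map approach can be sketched with a bytearray in which bit i records whether cluster i is in use. The class name FreeBitmap, the convention that 1 means "used", and the first-fit search are assumptions made for this example.

```python
class FreeBitmap:
    """One bit per cluster: 0 = free, 1 = used (convention assumed here)."""

    def __init__(self, cluster_count):
        self.cluster_count = cluster_count
        self.bits = bytearray((cluster_count + 7) // 8)

    def is_used(self, cluster):
        return bool(self.bits[cluster // 8] & (1 << (cluster % 8)))

    def allocate(self):
        """Find the first free cluster, mark it used, and return its number."""
        for cluster in range(self.cluster_count):
            if not self.is_used(cluster):
                self.bits[cluster // 8] |= 1 << (cluster % 8)
                return cluster
        raise OSError("no free clusters")

    def release(self, cluster):
        """Mark a cluster as free again."""
        self.bits[cluster // 8] &= ~(1 << (cluster % 8)) & 0xFF

bitmap = FreeBitmap(10)
print(bitmap.allocate(), bitmap.allocate())
```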

Linked Free Blocks

Windows FAT Table [Figure: FAT with labels Directory Entry, Free List, Free Cluster, and Clusters 1 to 3]

Windows FAT Processing Unreliable!!! Need to run Scandisk after reboot to attempt to fix any problems

Windows NTFS File System Available on Windows 2000, XP, 2003 Maintains transaction log to recover after reboot Support for file protection Large (64 bit) cluster pointers Allows small clusters Avoids internal fragmentation

Windows NTFS File System

Disk File System Types FAT16, FAT32, NTFS, UFS Unix Journaling File System Windows Encrypting File System Network File System (NFS)

Multiple File System Types Disk File Systems Other File System Organizations CD-Rom DVD Zip Disk

Multiple File System Types Each Disk Partition has a single File System A given computer can have a number of different File System types Modern systems support this capability with a Virtual File System

Virtual File System

Removable Devices Connecting media is called mounting the file system. Media can be: Physical. Logical (across the network).

Journaling Journaling is used only when writing to a disk, and it acts as a sort of punch clock for all writes. It addresses the problem of disk corruption when the computer crashes or loses power while data is being written to the hard drive. Without a journal, the operating system would have no way to know whether a file was completely written to disk.

Journaling With a journal, the write is first recorded in the journal (punch-in) and is then written to its final place on disk when ready. Once it has been successfully written to disk, the entry is removed from the journal (punch-out) and the operation is complete. If power is lost during the write, the file system can check the journal for all operations that have not yet completed and resume where it left off.
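
The punch-in and punch-out idea can be modeled in a few lines. This is a toy model, not how any real file system stores its journal: the class JournaledStore, the in-memory dictionary standing in for the disk, and the crash_before_commit flag used to simulate a power loss are all assumptions.

```python
class JournaledStore:
    """Toy model: record each write in a journal before applying it to 'disk'."""

    def __init__(self):
        self.journal = []   # operations punched in but not yet punched out
        self.disk = {}      # stands in for the final on-disk contents

    def write(self, name, data, crash_before_commit=False):
        self.journal.append((name, data))      # punch-in
        if crash_before_commit:
            return                             # simulated crash or power loss
        self.disk[name] = data                 # write to the final location
        self.journal.remove((name, data))      # punch-out

    def recover(self):
        """After a reboot, finish every operation still listed in the journal."""
        for name, data in list(self.journal):
            self.disk[name] = data
            self.journal.remove((name, data))

store = JournaledStore()
store.write("a.txt", b"1")
store.write("b.txt", b"2", crash_before_commit=True)
store.recover()
print(store.disk)
```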

Linux File Systems Ext stands for Extended file system and was the first file system created specifically for Linux. It has had four revisions, and each one has added fairly significant features. The first version of Ext was a major upgrade from the Minix file system used at the time, but it lacks major features used in today’s computing. At this time you probably should not use Ext on any machine due to its limitations and age. It is also no longer supported in many distributions.

Linux File Systems Ext2 is not a journaling file system, and when introduced it was the first to allow for extended file attributes and 2 terabyte drives. Because Ext2 does not use a journal, it applies significantly fewer writes to the disk. Due to lower write requirements, and hence fewer erases, it is well suited to flash memory, especially USB flash drives. Modern SSDs have an increased life span and additional features that can negate the need for a non-journaling file system.

Linux File Systems Ext3 is basically just Ext2 with journaling. The aim of Ext3 was to be backwards compatible with Ext2, and therefore disks can be converted between the two without needing to format the drive. The problem with keeping compatibility is that many of the limitations of Ext2 still exist in Ext3. The benefit of keeping backwards compatibility is that most of the testing, bug fixes, and use cases for Ext2 also apply to Ext3, making it stable and fast. Use it if you need to upgrade a previous Ext2 file system to have journaling. You will probably get the best database performance from Ext3 due to years of optimizations. It is not the best choice for file servers because it lacks disk snapshots, and recovering deleted files is very difficult.

Linux File Systems Ext4, just like Ext3 before it, keeps backwards compatibility with its predecessors. As a matter of fact, you can mount Ext2 and Ext3 as an Ext4 file system in Linux, and that alone can increase performance under certain conditions. You can also mount an Ext4 file system as Ext3 without ill effects, provided it does not use Ext4-only features such as extents. Ext4 reduces file fragmentation, allows for larger volumes and files, and employs delayed allocation, which helps with flash memory life as well as fragmentation. Although it is used in other file systems, delayed allocation has potential for data loss and has come under some scrutiny. Ext4 is a better choice for SSDs than Ext3 and improves on the general performance of both previous Ext versions. If this is your distro’s default supported file system, you should probably stick with it for any desktop or laptop you set up. It also shows promising performance numbers for database servers, but hasn’t been around as long as Ext3.

Unix File System Supports protection More reliable than Windows FAT system Need to run fsck (File System Check) utility on boot-up (similar to Windows Scandisk)

Unix File System

Inodes In a file system, a file is represented by an inode, a data structure containing information about the actual data that make up the file. Every partition has its own set of inodes. Each inode describes a data structure on the hard disk, storing the attributes of a file, including the physical location of the file data. When a hard disk is initialized to accept data storage, usually during the initial system installation process or when adding extra disks to an existing system, a fixed number of inodes per partition is created. This number is the maximum number of files of all types (including directories, special files, links, etc.) that can exist at the same time on the partition. The typical count is 1 inode per 2 to 8 kilobytes of storage.

Inodes When a new file is created, a new inode is created for it. The inode holds the following information: Owner and group owner of the file. File type (regular, directory, ...). Permissions on the file. Date and time of creation, last read, and last change. Date and time this information was last changed in the inode. Number of links to this file. File size. An address defining the actual location of the file data. The only information not included in an inode is the file name and directory. These are stored in the special directory files. By comparing file names and inode numbers, the system can make up a tree structure that the user understands. Users can display inode numbers using the -i option to ls. The inodes have their own separate space on the disk.
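
Most of the inode fields listed above are visible to programs through the stat system call. The sketch below reads a few of them with Python's os.stat; the helper name inode_summary and the dictionary keys are assumptions, and the hard-link demonstration assumes a Unix-like file system.

```python
import os
import stat
import tempfile

def inode_summary(path):
    """Collect a few inode attributes for the file at `path`."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,                        # inode number (ls -i)
        "is_regular": stat.S_ISREG(info.st_mode),    # file type
        "permissions": stat.filemode(info.st_mode),  # e.g. -rw-r--r--
        "links": info.st_nlink,                      # number of links
        "size": info.st_size,                        # size in bytes
        "modified": info.st_mtime,                   # time of last change to data
    }

with tempfile.TemporaryDirectory() as root:
    first = os.path.join(root, "a.txt")
    second = os.path.join(root, "b.txt")
    with open(first, "wb") as f:
        f.write(b"12345")
    os.link(first, second)   # a second name (hard link) for the same inode
    print(inode_summary(first))
    print(inode_summary(second)["inode"] == inode_summary(first)["inode"])
```

Both names report the same inode number because the name lives in the directory, not in the inode.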

Problem Set What is the difference between relative path and absolute path? What will happen to a file if an associated symbolic link is deleted? Need to know the following concepts Contiguous Allocation Cluster Allocation External Fragmentation Internal Fragmentation

Problem Set Some systems automatically delete all user files when a user logs off or a job terminates, unless the user explicitly requests that they be kept; other systems keep all files unless the user explicitly deletes them. Discuss the relative merits of each approach. Why do some systems keep track of the type of a file, while others leave it to the user or simply do not implement multiple file types? Which system is "better"? Consider a file system where a file can be deleted and its disk space reclaimed while links to that file still exist. What problems may occur if a new file is created in the same storage area or with the same absolute path name? How can these problems be avoided?

Problem Set If the operating system were to know that a certain application is going to access the file data in a sequential manner, how could it exploit this information to improve performance? The open-file table is used to maintain information about files that are currently open. Should the operating system maintain a separate table for each user or just maintain one table that contains references to files that are being accessed by all users at the current time? If the same file is being accessed by two different programs or users, should there be separate entries in the open file table?