Copyright © 2000-2015 by Curt Hill
File Management
  An OS Function
File Content and Type
  A file is a collection of related data, with one name
  We have to deal with several things:
    Type
    Organization
    Storage
    Among others
Types of data
  External data
    Characters, numbers and other people-readable stuff
    E.g.:
      Reports to print
      Source programs
      Documents
  Binary data
    Things not in a people-readable format, such as machine language
Classification
  An operating system may type files according to one of several schemes
    Programs
    Commands
    Data
  UNIX believes in two types of files:
    Regular files
    Directories
  DOS follows UNIX
  Other categorizations are also possible
Things that may determine file types
  Organization
  Operations allowable
  Whim of authors
Organization
  Sequential
    Most files are this
  Direct
    Array on disk
  Indexed Sequential
    Usually a tree of indices pointing into a list of values
Operations allowable
  What operations are acceptable for a particular file?
    Deleting a file is different from deleting a directory
  What naming conventions are acceptable?
    DOS requires .exe or .com for programs
    UNIX is not so particular
System Functions and Components
  The following services need to be provided:
    Creation and deletion
    Manipulation
      Copying, renaming
    I/O operations
      Open/Close
      Read/Write
        There may be multiple kinds
      EOF test
    Secondary storage management
    Security
    Backup and recovery
Layers
  These services are provided in layers, in a way similar to the OS layers
  Bottom level is the actual device
  Storage I/O control contains device drivers, interrupt handlers, etc.
    Part of kernel
    At this level you see physical manipulation of the device
    What you do not see is files as logical entities
Upper Layers
  File control contains basic file services
    Open, close, read, write
    Allocation of storage space
  This is the bridge between the physical view of files and the logical view of files
The file hierarchy
  The smallest unit of storage is one character (at the file level)
  A field is composed of one or more characters
    Contains one unit of information
    E.g., name, price, text
  A record is one or more fields that all describe the same person or thing
Hierarchy Continued
  A file is one or more records
    In most cases each record is of the same type and size, though there may be differing record types in the file
  A database is one or more files
    The files may be logical files in one physical file or different physical files
  A file system will generally have one or more files and/or databases
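The character, field, record, file hierarchy can be sketched with fixed-size records. This is only an illustration: the record layout (a 20-byte name field and a 4-byte price in cents) and the function names are assumptions made for the example, not something from the slides.

```python
import struct

# Hypothetical record layout: 20-byte name field, 4-byte unsigned price in cents.
RECORD = struct.Struct("<20sI")

def pack_record(name, price_cents):
    """Encode one record; struct pads the name field with NUL bytes."""
    return RECORD.pack(name.encode("ascii"), price_cents)

def unpack_record(raw):
    """Decode one record back into its two fields."""
    name, price_cents = RECORD.unpack(raw)
    return name.rstrip(b"\0").decode("ascii"), price_cents

def unpack_file(data):
    """A file is one or more records laid end to end."""
    return [unpack_record(data[i:i + RECORD.size])
            for i in range(0, len(data), RECORD.size)]
```

Because every record has the same size, the n-th record starts at byte n * RECORD.size, which is what makes direct access to such a file simple.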
Hierarchy yet again
  The OS might or might not have some of the upper concepts
  UNIX and DOS do not distinguish a database from a file
    Only the program treats a file (or set of files) as a database
  VMS does know what a database is
    Will not let an application treat it as a simple file
Storage allocation
  On a disk device we have a large number of storage allocation units
  These may be:
    Sectors
    Clusters (groups of sectors)
    Tracks
    Cylinders
  Since a disk is a direct access device we can access any of these without bothering any neighboring units
DOS FAT
  We need a record of these units, such as DOS’s File Allocation Table
  This tells us:
    Which are free and which are used
    What belongs to whom
  This generally numbers all the units from zero to some maximum number
  Generally units with close numbers are physically close in some way
Close Sectors
  Close translates to rapid access
  Two sectors in the same track are close in that it takes only rotation to reach one when starting at the other
  Two tracks in the same cylinder are close in the sense that we only have to switch heads and not reposition
  Adjacent cylinders are close in that it takes only a small amount of positioning to reach one from the other
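One common numbering that keeps close numbers physically close is the classic cylinder/head/sector mapping to a single unit number. The sketch below assumes that mapping; the function names and the geometry values used with it are hypothetical.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Map a cylinder/head/sector address to one unit number.

    Sectors are numbered from 1 within a track, as is conventional for CHS.
    """
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse mapping: recover cylinder, head and sector from the unit number."""
    cylinder, rest = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1
```

With this numbering, consecutive numbers first walk around a track (rotation only), then switch heads within the cylinder, and only then move to the adjacent cylinder.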
Storage Allocation
  There are two types of storage allocation:
    Contiguous
      All together
    Scatter
Contiguous
  In contiguous allocation the units of the file are together
    All the file units except the first and last must be surrounded by units of the same file
  Usually in this type of allocation you need to announce the estimated size before you allocate, so that the system can allocate an adequate block
    This is a problem with a poor estimate
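A minimal sketch of contiguous allocation follows. It assumes a first-fit policy, which is only one possible choice and is not named in the slides; the free map representation and the function name are also assumptions.

```python
def first_fit(free_map, units_needed):
    """Return the index of the first run of `units_needed` free units, or None.

    free_map is a list of booleans, True meaning the unit is free.
    """
    run_start, run_len = None, 0
    for i, free in enumerate(free_map):
        if free:
            if run_len == 0:
                run_start = i          # a new run of free units begins here
            run_len += 1
            if run_len == units_needed:
                return run_start
        else:
            run_len = 0                # a used unit breaks the run
    return None
```

Note that a request can fail even when enough free units exist in total, because no single run is long enough. That is the fragmentation problem.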
Fragmentation
  Fragmentation can cause problems since we need to have one place to put a whole file
  Compaction is usually the answer
    Compaction needs to be able to recognize absolute disk addresses in the file, if they are allowed
    Consider how an Indexed Sequential file links to the next item
    Such an item may be marked as unmovable or there must be something to aid relocation
    This problem is bypassed if the disk address is relative to the beginning of the file
Scatter
  In scatter (or non-contiguous) storage the units can be all over the disk
  These must be linked together in some way
  This can be done in a variety of ways
How to Link scatter storage
  Each unit points to the next unit
  The directory entry can have a list of units
  The disk sector map (FAT in DOS) may also have it
    This makes each entry a pointer to another entry as opposed to just a bitmap
    Then there is a free list as well
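The sector-map approach can be sketched as a table in which each entry points to the next unit of the same file. This is a simplified model, not the real DOS on-disk format: the EOC and FREE markers and the function names are hypothetical.

```python
EOC = -1      # hypothetical end-of-chain marker
FREE = None   # hypothetical marker for an unused unit

def file_units(table, first_unit):
    """Follow the chain starting at the unit recorded in the directory entry."""
    units = []
    unit = first_unit
    while unit != EOC:
        units.append(unit)
        unit = table[unit]   # each entry is a pointer to the next unit
    return units

def free_units(table):
    """The same table also tells us which units are free."""
    return [i for i, entry in enumerate(table) if entry is FREE]
```

The directory entry only needs to hold the first unit number; the rest of the file is found by walking the table.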
Blocking and Buffering
  We have already discussed this
  Mechanism for speeding I/O
  The OS should manage it so it is transparent to the user
File organization methods
  We have also discussed this, some time ago
  Sequential
    Entry sequenced
  Direct
    Position is the key
  Indexed sequential
    Key can be anything
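The difference between sequential and direct organization can be sketched with fixed-size records. The record size, the function names and the in-memory "disk" are assumptions for the example.

```python
import io

RECORD_SIZE = 16  # hypothetical fixed record size in bytes

def read_record(f, record_number):
    """Direct organization: the record's position is its key."""
    f.seek(record_number * RECORD_SIZE)
    return f.read(RECORD_SIZE)

def read_sequential(f):
    """Sequential organization: records come back in entry order."""
    f.seek(0)
    while True:
        chunk = f.read(RECORD_SIZE)
        if len(chunk) < RECORD_SIZE:
            break
        yield chunk

# Four records, filled with the letters A, B, C and D respectively.
disk = io.BytesIO(b"".join(bytes([65 + i]) * RECORD_SIZE for i in range(4)))
third = read_record(disk, 2)   # sixteen bytes of b"C", fetched without reading A or B
```

An indexed sequential file would add a separate index that maps an arbitrary key to a record position, which is not shown here.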
File Access Methods
  The file access method must be compatible with the organization
  The searching of the file is moved from application code into system code
Directory
  Many OSs have something akin to a directory
  What should it contain?
  How is it organized?
Directory contents
  Name of file
  Location on the disk
  Ownership
  Type of file
  Size
    How do we detect EOF?
    Sometimes with an EOF character
    Rest of the time with the length in the directory
Directory contents
  Protection or access privileges
    This might include the number of processes that have access
  Miscellaneous
    Dates and times
    Number of accesses since some time ago
One level directories
  DOS 1 used a one level directory
  Each floppy disk had only one directory and no subdirectories were allowed
Two Level directories
  A central directory with pointers toward a series of other directories
  The files in those other directories constitute the second level
  I think CMS and NOS used this style
  Mac OS recently did something like this
Tree structured directories
  UNIX introduced tree shaped directories, though it may have borrowed them from Multics
  For each disk device there is a root, which is the main directory
  A directory may contain files or other directories
  Branching can occur as often as desired
Tree Structured Again
  DOS 2 and up adopted this approach
  UNIX adds one more wrinkle:
    A user becomes a directory
    Any files or directories rooted in that directory then belong to that user
    Access controls of a directory are generally inherited from the directory it is in
  UNIX puts important public things in directories in the root
Tree Structured Paths
  It also needs the idea of a path
  Command search path:
    A list of directories that will be searched for commands and/or files
  File specification
    Relative and absolute paths
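Relative versus absolute paths and a command search path can be sketched with UNIX-style path names. The function names are hypothetical, and a set of path strings stands in for the real disk.

```python
import posixpath

def resolve(path, current_dir):
    """Absolute paths stand alone; relative ones are interpreted from current_dir."""
    if not posixpath.isabs(path):
        path = posixpath.join(current_dir, path)
    return posixpath.normpath(path)

def find_command(name, search_path, existing_files):
    """Return the first directory/name in search_path that exists, or None.

    existing_files is a set of absolute paths standing in for the real disk.
    """
    for directory in search_path:
        candidate = posixpath.join(directory, name)
        if candidate in existing_files:
            return candidate
    return None
```

The order of the search path matters: the first directory that holds a matching name wins.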
Graph directories
  Acyclic
    Very similar to tree shaped but allows sharing in a different way
    Two tree branches can arrive at the same node (file), thus allowing sharing of that file
    The Acyclic name comes from the idea that you can never traverse a node twice in any path
    The links of UNIX and NTFS allow this form
  General
    Does allow cycles
    This looks more like a network database than most file systems
Directory Implementation
  How do we represent a directory?
  Array (linear list) of items
    This is the DOS approach
    Limits the number of files in a directory
    The list is not sorted
  Sorted lists
    Same as above but sorted
  Linked list
  Tree (embedded in a table)
File and Directory manipulation
  What should be allowed?
    Read and Write of the directory
    Edit
    Delete
  Security and File integrity issues
Ownership and access controls
  How is the access to files regulated?
  On single user systems there are seldom any such controls
  On multi-user systems it becomes an OS task to control how others may access a user’s file
  UNIX has a good example system so we will examine it
UNIX access controls
  For each file or directory there is a group of 9 bits that determines access
  The three categories of access are:
    Read
    Write
    Execute
  Furthermore, those who would access the file are partitioned into three groups, which can have separate R/W/E access
    Owner
    Group
    World
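The 9 bits can be sketched as three groups of three, conventionally written in octal. The function names are hypothetical, and the check below ignores the superuser and any extensions beyond the basic 9 bits.

```python
def describe(mode):
    """Render the low 9 permission bits as the familiar rwxrwxrwx string."""
    out = []
    for shift in (6, 3, 0):                 # owner, group, world
        bits = (mode >> shift) & 0o7
        out.append(("r" if bits & 4 else "-") +
                   ("w" if bits & 2 else "-") +
                   ("x" if bits & 1 else "-"))
    return "".join(out)

def may_access(mode, want, is_owner, in_group):
    """want is 4 (read), 2 (write) or 1 (execute).

    Only one category applies to a given user: owner first, then group, then world.
    """
    if is_owner:
        shift = 6
    elif in_group:
        shift = 3
    else:
        shift = 0
    return bool((mode >> shift) & want)
```

For example, mode 0o754 gives the owner read, write and execute, the group read and execute, and the world read only.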
Notes
  Owner
    The owner by default has full access, but can restrict his own access, as desired
  Group
    A group of users that can access group accessible information
    The system administrator determines the group that you are in, usually at your own request
    In early UNIX a user could be in only one group; modern systems allow several
  World
    Any user of the system
Multi-User File I/O
  Considerable thought needs to be given to how users share access to a file
  Reading may be a shared operation
  Writing may be exclusive or shared
System administration
  Number of topics:
    File migration
    File backup
    Transaction logging
    File recovery
File migration
  How do we keep track of different versions?
    VMS has a version number applied to each item
    DOS programs traditionally use .BAK or some other extension to represent the last one
    There was a master number in OS/MVS files of certain types
  What do we do with the old ones?
    Eventually they need to be moved off of disk
System Administration Again
  File backup
    Full backup involves the entire disk
    Incremental involves just those things with a newer date
  Transaction logging
    Journal transactions
  File recovery
    Remove error sectors from a file
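The difference between full and incremental backup can be sketched as a selection over modification dates. The dictionary of path to timestamp and the function names are assumptions; a real backup tool would read the dates from the directory entries.

```python
def full_backup(files):
    """files maps path -> last-modified timestamp; a full backup takes everything."""
    return sorted(files)

def incremental_backup(files, last_backup_time):
    """Take only the files modified after the previous backup."""
    return sorted(path for path, mtime in files.items()
                  if mtime > last_backup_time)
```

An incremental backup is only useful together with the full backup (and any earlier incrementals) it builds on.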
Database
  Generalized Data Definitions
  Data models
    Relational
    Hierarchical
    Network
  Physical access and storage implementation