CT213 – File Management Petronel Bigioi.

CT213 – File Management Petronel Bigioi

Content File Management Overview
File Management Functional Requirements File Management Architecture File Organization – pile, sequential, indexed sequential, indexed and hashed File Directory Content and structure File Sharing Access rights and simultaneous access Record blocking Secondary Storage management File Allocation and Free Space Management

File Management Overview
Filing systems provide A storage service – clients don’t know about the physical characteristics of the disks or where the files have been stored on them The filing systems must make sure that a file is not lost, even if there are hardware failures or software crashes A directory service – clients can give convenient text names to files and group them in directories (establish some relationship between them) Clients should be able to access the sharing of their files with other by specifying who can access a given file and in what way

(2) The filing systems stores a very large numbers of files, only a few being used at any time. The filing system is ready for the client to use the file (It will have set up the information about this file in its tables in main memory) It will return a user-file-identifier (UFID) for the client to use in subsequent requests to read or write the file (3) Request to read the file (4) The storage service returns the portion of the file that was requested at (3) (1) – client calls an operation such as open-file with the text name as an argument Another argument could be a field specifying the type of access (read-only or read-write) The directory service will carry out an access check to ensure that the client is authorized to access the file The directory service will translate the text name into a form that enables the file storage service to locate the file on disk (name resolution) In order to do the name resolution, the directory service may call the file storage service, the file storage service may call the disk handler to access the disk and find the required information (1) – client calls an operation such as open-file with the text name as an argument Another argument could be a field specifying the type of access (read-only or read-write) The directory service will carry out an access check to ensure that the client is authorized to access the file The directory service will translate the text name into a form that enables the file storage service to locate the file on disk (name resolution) In order to do the name resolution, the directory service may call the file storage service, the file storage service may call the disk handler to access the disk and find the required information (2) The filing systems stores a very large numbers of files, only a few being used at any time. The filing system is ready to for the client to use the file (It will have set up the information about this file in its tables in main memory) It will return a user-file-identifier (UFID) for the client to use in subsequent requests to read or write the file (3) Request to read the file (4) The storage service returns the portion of the file that was requested at (3)

Concurrency control must be provided One approach is to allow multiple clients read control, but only one client the right to write An operating system will deal with such approach by noting if a file system has been opened for reading access, in which case it will allow multiple clients or for writing, in which case it would refuse any subsequent requests to read or write from other clients; this is called mandatory concurrency control. A file is said to be locked for reading or writing. Many applications may need to have write access to different parts of the same file (the above approach is inflexible in this case) Many operating systems allow simultaneous write access to files, being the application job to make sure it will sync the access Eventually the OS will provide some extra locking service to help the clients to cooperate with each other.

Files Common Terms Field – basic element of data, containing a single value (i.e. an employees last name) Characterized by its length and data-type (i.e. ASCII string, decimal, etc..) Depending on file design, it may be fixed or variable length Record – a collection of related fields that can be treated as single unit The record of an employee can contain fields such as name, social security number, etc.. Up to the design of the file system it may be fixed or variable in length

Files Common terms File – is a collection of similar records
The file is treated as a single entity by users and applications and may be referenced by name Files have unique file names and may be created and deleted Access control restrictions usually apply at the file level Database – a collection of related data The essential aspect is the relations that exists between elements of data are explicit The data base itself consists of one or more types of files Usually there is a separate database management system that is independent of the file system management and the operating system

File Management Requirements
To meet the data management needs and requirements from the user Create, delete, read and change files; controlled access to other users files; control the type of access to own files; restructure the files; move data between files; backup and recovery files in case of damage; file access by using symbolic file names To guarantee that the data in the file is valid To optimize performance To provide I/O support for a wide range of storage device types To provide a standardized set of I/O interface routines To provide I/O support for multiple users, in the case of multiple-user systems

File System Architecture
At the lowest level we have device drivers, that do communicate directly with the peripheral device or their controllers or channels A device driver is responsible for staring I/O operations on a device and processing the completion of an I/O request Device drivers are considered to be part of the operating system, and in the case of file system, the devices controlled are disks and tapes Basic File System deals with blocks of data that are exchanged with the disk or tape systems. It is concerned with the placement of those blocks on the secondary storage device and on the buffering of those blocks in main memory It doesn’t understand the content of the data structure of the files involved The Basic I/O Supervisor is responsible for all file I/O initiation and termination Maintains control structures that deal with device I/O scheduling and file status Is concerned with the selection of the device on which file I/O is to be performed, on the basis of which file has been selected Is also concerned with scheduling disk and tape access to optimize performance I/O buffers and secondary memory is allocated at this level Logical I/O ebables users and applications to access records. Thus, whereas the basic file system deals with blocks of data, the logical I/O module deals with file records. Logical I/O provides a general purpose record I/O capability and maintains basic data about files Access method is the closest level of the file system to the user Provides a standard interface between applications and file system Different access methods: pile, sequential, indeed sequential, indexed, hashed The Basic I/O Supervisor is responsible for all file I/O initiation and termination Maintains control structures that deal with device I/O scheduling and file status Is concerned with the selection of the device on which file I/O is to be performed, on the basis of which file has been selected Is also concerned with scheduling disk and tape access to optimize performance I/O buffers and secondary memory is allocated at this level Logical I/O enables users and applications to access records. Thus, whereas the basic file system deals with blocks of data, the logical I/O module deals with file records. Logical I/O provides a general purpose record I/O capability and maintains basic data about files Access method is the closest level of the file system to the user Provides a standard interface between applications and file system Different access methods: pile, sequential, indeed sequential, indexed, hashed Basic File System deals with blocks of data that are exchanged with the disk or tape systems. It is concerned with the placement of those blocks on the secondary storage device and on the buffering of those blocks in main memory It doesn’t understand the content of the data structure of the files involved At the lowest level we have device drivers, that do communicate directly with the peripheral device or their controllers or channels A device driver is responsible for staring I/O operations on a device and processing the completion of an I/O request Device drivers are considered to be part of the operating system, and in the case of file system, the devices controlled are disks and tapes

File Organization Criteria: These criteria may vary in importance
Rapid access Ease of update Economy of storage Simple maintenance Reliability These criteria may vary in importance CD-ROM – Ease of update irrelevant Indexes – Faster but use more storage We will outline five common organizations (the actual number that have been implemented or proposed is unimaginably large) Pile Sequential file Indexed sequential file Indexed file Direct (Hashed) file

File Organization Pile Sequential File
Add data to the file as it arrives (chronological order) Record size and field order may vary (variable length records, variable set of fields) Requires use of exhaustive search Sequential File Fixed length record format Size and order of fields fixed Key field - unique record ID Records stored in order based on key Handles random requests poorly Must use sequential search (batch system) Hard to insert new records

File Organization Indexed Sequential File
Still maintains the organization of records in sequence, based on a key Adds an index to the file to speed lookup Index provides a lookup capability to reach quickly the vicinity of a desired record The index file contains two records: the key and a pointer to the main file May have multiple levels of indexes Overflow area to handle new records Each record in the main file contains a hidden pointer to the overflow file (used if needed) Link from main records to overflow, and back Operations: Search: To find a specific field, a search begins in the index file. The highest key value that is less than or equal to the desired key record is looked up in the index file. A pointer to the main file is retrieved and the search continues in the main file. Additions: Each record in the main file contains an additional field (not visible to the application) that is a pointer to the overflow file; When a new record is to be inserted, it is actually added to the overflow file. The record in the main file, that immediately precedes the new record in logical sequence is updated to contain a pointer to the new record (in the overflow file). Indexed Sequential File: Search: To find a specific field, a search begins in the index file. The highest key value that is less than or equal to the desired key record is looked up in the index file. A pointer to the main file is retrieved and the search continues in the main file. Additions: Each record in the main file contains an additional field (not visible to the application) that is a pointer to the overflow file; When a new record is to be inserted, it is actually added to the overflow file. The record in the main file, that immediately precedes the new record in logical sequence is updated to contain a pointer to the new record (in the overflow file).

File Organization Direct (Hashed) file Indexed File
Useful when is necessary to search for a record on the basis of some other attribute than the key field May have multiple indexes One for each field we may search Records accessed only through the indexes Each index may be: Exhaustive – contains one entry for every record in the main file Partial – contains entries only for records where the field of interest exists Used in applications where time is critical and where data is rarely processed exhaustively (such as reservation systems, inventory controls, etc…) Direct (Hashed) file Use hashing on a key to find the record No notion of sequential access Generally used when rapid access to one record is required (directory)

Hashing It can find most of the items with a single seek
Insertions and deletions can be handled without added complexity Assuming that a number of N items are to be inserted into a hash table of length M, with M>=N Insert an item into the hash table: Convert the label of the item to near random number n (between 0 and M-1) (i.e. if the label is numeric, then a popular mapping function is to divide the label by M and take the reminder as the value of n Use n as index into the hash table If the entry is empty, then store the item If the entry is occupied, then store the item according to the hashing criteria (linear or overflow with chaining) Table lookup of an item whose label is known Convert the label of the item to a near random number n (using same mapping function as for insertion) If the corresponding entry is empty, then the item hasn’t been inserted If the corresponding entry is occupied, and the labels match, then retrieve the value If the corresponding entry is occupied and the labels are not matching, then continue the search according to the hashing criteria ( linear or overflow with chaining)

Linear Hashing Labels of the items to be stored are numeric and the hash table has eight positions (M=8). The hashing function takes the reminder upon division by 8 In the linear hashing schema, if the entry is already occupied, set n=(n+1)(mod M) and try again. Perform this step until we will find an empty entry The figure assumes that the entries have been inserted in the ascending order Item 50 and 51 maps in positions 2 and 3 Item 74 maps in position 2, position 2 is taken, so we try position 3 (taken). Next is position 4 we need to try and is empty, so we write it on position 4 The average search is not depending on the table size, is dependent of how full the table is (at 80%, we are getting an average for the search around 3)

Hashing using Overflow with Chaining
A separate table in which overflow entries are inserted is kept. This table includes pointers passing down the chain of entries associated with any positions in the hash table. For large values of N and M, for N=M, the average search is around 1.5 This method provides for compact storage with fast lookup.

File Directory A file directory is a structure associated with any file management system and collection of files It contains information about the files, including attributes, location and ownership. Most of this information (especially the one concerned with the storage is handled by the operating system) The directory itself is a file, owned by the operating system and accessible by various file management routines. Some of the information in directories is available to users through system routines The users cannot directly access the directory even in read-only mode

Typical Directory Entries
Basic Name – Unique in directory (some systems permit file versions) Type – Text, binary, load module, etc. Organization – Sequential, indexed, etc. Address Device – Which disk holds the file Often this must be the same device as the directory is on Starting address/Blocks used Block #, cylinder #, or other location id Size used – Current file size May be in bytes or blocks Size allocated – Maximum space allocated for this file Not used on all file systems

Typical Directory Entries
Access Control Owner – Who has control of the file Access Information – What users are allowed to work with the file Permitted Actions – Controls reading, writing, etc. Usage Information Date Created Identity of Creator Date Last Read Access Identity of Last Reader Date Last Modified Identity of Last Modifier Date of Last Backup Current Usage – Who has the file open, is the file locked, are there updates waiting in main memory?

Directory Structure Operations to support: Simplest form
Search for the file entry (open) Create a new file Delete a file List the files in the directory May be for all or part of the directory May also include attribute information Simplest form A list of directory entries, one for each file (CP/M, DOS 1.0) Difficult to handle large numbers of files or multiple users The directory would be very large and held on the disk (looking up a given filename in it would take the directory service a long time) Different users might use the same text names for their files. Unique text names would be achieved by appending the username to each filename. Some support for organizing the information is desirable. Convenient grouping within user’s files should be supported for easy location and access control More complex form One directory for each user Easier to manage access information Users still can’t structure files

Directory Structure Tree-structured file system
Single master (root) directory DOS: Master directory for each drive Each directory may contain files and other subdirectories Names only unique in directory Each directory often stored as a sequential file Less effective when there are a large number of files in a given directory

Directory Tree Structure

Directory Tree Example
Path - following set of directories from master directory to file Example: /UserB/Word/UnitA/ABC “/” often used to separate directories Working directory Current directory for files: /UserB/Word Files in this directory unless path given

File Sharing Rights that may be granted:
None – Others don’t know it exists Often done by preventing user from reading the parent directory (Unix) May have an explicit permission bit for access to the file name (Novell) Knowledge – Know it is there and who the owner is Execution – Able to run a program Read – Look at/copy contents Execute and Read may be independent Append – Add data to the file Cannot modify existing contents Update – Modify/delete/add data Change Protection – Grant rights to file Owner can specify what other users have rights to this file Deletion – Can delete file

File Sharing Who to grant rights to: Simultaneous Access Specific user
May allow different users to have distinct permissions Group of users World (public files) Simultaneous Access Multiple users may want to access or modify the same file Example: Airline reservation database Locking: Entire file vs. Records Easier to lock entire file Locking records allows more concurrency Instance of reader/writer problem Must address mutual exclusion and deadlock

Record Blocking A record is the logical access unit of a file
Blocks are unit of I/O with secondary storage. For I/O to be performed, records must be organized as blocks. Issues to consider: Should be blocks be fixed or variable length On most systems blocks are fixed length Simplifies I/O, buffer allocation in main memory and organization of blocks on secondary storage What should the relative size of a block be compared to the average record size The larger the block the more records that are passed in one I/O operation If a file is being processed or searched sequentially, than this is an advantage If records are being accessed randomly, it will result in unnecessary transfer of unused records, than this is a disadvantage Three methods of blocking Fixed blocking, variable-length spanned blocking and variable-length un-spanned blocking

Fixed Blocking Fixed length records are used and an integral number of records are stored in a block There may be unused space at the and of each block (internal fragmentation)

Variable Length Spanned Blocking
Variable length records are used and are packed into blocks with no unused space Two records may span across two blocks with the continuation indicated by a pointer to the successor block Wastes space only at the end of the file

Variable Length Unspanned Blocking
Variable length records are used, but spanning is not employed. There is wasted space inmost blocks because of the inability to use the remainder of a block if the next record is larger than the remaining unused space

Secondary Storage Management
On secondary storage a file is a collection of blocks; the operating system or file management system is responsible for allocating blocks to files Two management issues Space on secondary storage must be allocated to files Keep track of the space available for allocation The approach taken for file allocation may influence the approach taken for available space management

File Allocation Issues in file allocation Pre-Allocation
When a new file is created, do we specify the maximum size? Is that space allocated at once? Space is allocated to a file as one or more contiguous units (portions). How big of a unit should we use when allocating space for a file? How do we keep track of what space has been allocated to a given file (what kind of structure or table is used to keep track for a unit allocated to a file)? Pre-Allocation Declare max size in advance May be hard to guess space needed Tendency to overestimate space needed Ok if the file will never change Dynamic allocation Get space as the file needs it Files are often no longer contiguous

File Allocation Portion (unit) size
At one extreme, a single unit large enough to hold the entire file, while at the other extreme space on disk is allocated one block at a time. In choosing the unit allocation size, there is a tradeoff between efficiency from the point of view of a single file versus overall system efficiency Few items to be considered: Having lots of small units requires more space for allocation tables Fixed-size portions simplifies the reallocation of space Variable-sized units or small fixed-size units reduces wasted space

File Allocation Two common alternatives:
Variable-sized large contiguous portions Minimizes waste, allocation overhead Have to deal with fragmentation First-Fit – choose the first unused contiguous group of blocks of sufficient size from a free block list. Best-Fit – choose the smallest unused group that is of sufficient size Nearest-Fit allocation – choose the unused group of sufficient size that is closest to the previous allocation for the file to increase locality Blocks – Small fixed-size portions May require large tables or complex structures for their allocation Abandons contiguity Allocate blocks as needed Either strategy is compatible with pre-allocation and dynamic allocation. Not clear which strategy is best.

File Allocation Methods
Three methods are in common use Contiguous allocation Single contiguous set of blocks is allocated to a file at the time of file creation Chained allocation Each block contains a pointer to the next block in the chain Indexed allocation The file allocation table contains a separate one level index for each file; the index has one entry for each allocated portion (unit) to the file.

Contiguous allocation
A single contiguous set of blocks assigned to a file when it is created Pre-allocation strategy with variable-sized portions (units) Good performance (especially for sequential files) External fragmentation tends to occur Use compaction to combine free space Need to specify the size of the file at the time of creation Used by CD-ROMs (ISO 9660) Before compaction (left) After compaction (right)

Chained Allocation Allocate on the basis of individual blocks
Directory only links to the first block Each block points to the next block Easy to add blocks to a file No external fragmentation MSDOS FAT12/16/32 is a variation Best suited to sequential files File B, with start=1and len=5 No accommodation for locality If necessary to brig in several blocks of a file at a time, then series of access to different parts of the disk is necessary To overcome this problem, files are “consolidated” by some systems

Indexed Allocation File allocation table contains one level index for each file; the index has just one entry for portion (unit) allocated to the file Typically the file indexes are not stored as part of the allocation table The file index for a file is kept in a separate block and the entry for the file allocation table points to that block Supports both sequential and random access to a file In the figure above, a fixed size blocks allocation schema is presented Eliminates external fragmentation

Indexed Allocation Indexed allocation supports also a variable size portions (units) allocation schema Improves locality It is the most popular form of file allocation In both cases, from time to time consolidation may be done It reduces the size of the index for the variable sized portions schema

Free Space Management Same as managing the space allocated to files, the free space that is not currently allocated needs to be managed In order to be able to perform file allocation, we need to know what blocks on the disk are available, therefore we need to keep a disk allocation table, in addition to file allocation table A number to methods to record free space: Bit Tables – Free bit for each block Works well with any of the presented allocation methods It is as small as possible Still, it can be large. The amount (in bits) required for a block bitmap is as follows: [disk size (Bytes)/ (8 *file system block size(BYTES))]. For a 16GBytes disk, we would get a 4MBytes large table … it is large to hold in memory and also large to search To speed up the search in the bit tables, the OS may divide disk into sections Additional data structures must be kept to summarize the status of each section (i.e. the number of free blocks and the maximum sized contiguous number of free blocks)

Bit Table Example Example bit vector for the above figure

Free Space Management A number to methods to record free space:
Chained Free List Free portions may be chained together by using a pointer and length value in each free portion It can produce fragmentation of the disk and many portions (units) will be a single block long. In this situation, every time a block gets allocated, it needs to be first read, to find out the pointer to the next free block. If this is done for a file creation (for multiple blocks) it can slow down the operation. Similarly, deleting highly fragmented files, it is time consuming Indexing Free space is treated as a “file”, store list of blocks in the same manner as ordinary files For efficiency, the index should be on the basis of variable-size portions rather than blocks One entry in the file for each free portion on the disk Provides efficient support for all known file allocation methods Free Block List Each block is assigned a number sequentially and the list of the numbers of all free blocks is maintained in a reserved portion of the disk Can treat it as a stack and re-allocate recently freed blocks – only the last few blocks need to be kept in memory Can use FIFO structure (a block is allocated from the head of the FIFO, and de-allocated by adding it to the tail of the queue) May have a background process that works to facilitate contiguous allocation

Free Space Management Where to store allocation tables
Memory Table size may be a problem Information may be lost if it crashes Disk Requires extra read/write to allocate a block - slows system down dramatically Handling system crashes Lock the allocation table on disk, do the allocation, update the disk Will make the system very slow May choose to pre-allocate a batch of blocks, then allocate to files on demand Mark it “in use” on disk Clean it up when a crash occurs

References “Operating Systems”, William Stallings, ISBN 0-13-032986-X
“Operating Systems”, Jean Bacon and Tim Harris, ISBN

CT213 – File Management Petronel Bigioi.

Similar presentations

Presentation on theme: "CT213 – File Management Petronel Bigioi."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CT213 – File Management Petronel Bigioi.

Similar presentations

Presentation on theme: "CT213 – File Management Petronel Bigioi."— Presentation transcript:

Similar presentations

About project

Feedback