File Organizations: What an OS provides
Copyright © 1998-2013 Curt Hill
Why files?
Computer memory has several problems: it is expensive and it is volatile. Persistent data must be stored on disk or tape. The operating system controls disk access, and all disk access is somewhat platform dependent.
The Memory Hierarchy
From top to bottom: CPU, cache, memory, disk, tape. As you move down, the cost per byte decreases and the access time increases.
Disks
Disks are rotating magnetic media. Each is a flat, round platter of metal or plastic covered with a magnetic coating. Information is stored as bits, which are magnetized spots on the coating. Access arms hold the heads and move them to read different areas.
Disk units
A disk is organized into cylinders, tracks, and sectors. A cylinder is any area that can be read without moving the head. Tracks are concentric circles on a surface; the disk rotates under the head. Sectors are pie-shaped divisions of the surface.
Hard Disk Organization
[Figure: a disk surface with one sector and one track labeled]
Disks are DASD
DASD stands for Direct Access Storage Device: any sector may be read directly. A disk address contains three parts. The cylinder address determines where the heads must move. The track address selects a head. The sector address determines how long to wait for the sector to rotate under the head.
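The three-part disk address can be flattened into a single block number, which is one common way to store such an address. The sketch below is only an illustration: the names Geometry, DiskAddress, to_block_number, and to_disk_address are hypothetical, all three address parts are assumed to count from 0, and the geometry values in the usage note are made up.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    """Assumed fixed geometry; real drives may vary sectors per track."""
    heads: int     # tracks per cylinder (one track per head)
    sectors: int   # sectors per track


class DiskAddress(NamedTuple):
    cylinder: int  # where the access arm must be positioned
    track: int     # which head (surface) within the cylinder
    sector: int    # which sector on that track


def to_block_number(addr: DiskAddress, geo: Geometry) -> int:
    """Number blocks so that all sectors of a track, then all tracks of a
    cylinder, are consecutive (all parts are 0-based by assumption)."""
    return (addr.cylinder * geo.heads + addr.track) * geo.sectors + addr.sector


def to_disk_address(block: int, geo: Geometry) -> DiskAddress:
    """Inverse of to_block_number."""
    cylinder, rest = divmod(block, geo.heads * geo.sectors)
    track, sector = divmod(rest, geo.sectors)
    return DiskAddress(cylinder, track, sector)
```

With a made-up geometry of 4 heads and 16 sectors per track, to_block_number(DiskAddress(2, 1, 5), Geometry(4, 16)) gives 149, and to_disk_address(149, Geometry(4, 16)) recovers the original address. Consecutive block numbers stay on the same track and then the same cylinder, which is what makes them cheap to read one after another.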
Hard Disk Drives
The access arm moves all the read/write heads simultaneously. A cylinder is all the tracks that can be read by one set of heads without moving the access arm. There are two tracks per cylinder on floppies and on many hard disks, but there can be 20 or more. The heads fly a few millionths of an inch above the surface and must be designed aerodynamically so that they stay close to the disk but never touch it. A collision is called a head crash.
Disk Access Time
Access time has four components. Seek time: movement of the access arm to the correct cylinder. Rotational delay: movement of the disk to position the correct sector under the read/write head. Activation of the appropriate read/write head. Transfer rate of data from disk to main memory. Seek time and rotational delay dominate the access time. Manufacturers put many read/write heads for each platter to minimize the seek time.
Access Time Again
Seek time and rotational delay dominate. Seek time varies from about 1 to 20 msec. Rotational delay varies from 0 to 10 msec. Transfer time is about 1 msec per 4 KB page. The goal is to reduce these delays, which is done with both hardware and software. RAID is a hardware solution. What are the software solutions?
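To make the figures above concrete, the following sketch adds up the three delays for a single 4 KB page at both ends of the quoted ranges. The function name access_time_ms is hypothetical, and the model (a plain sum of seek, rotation, and transfer) is a simplification that ignores head activation.

```python
def access_time_ms(seek_ms: float, rotation_ms: float, pages: int,
                   transfer_ms_per_page: float = 1.0) -> float:
    """Simplified model: total time is seek + rotational delay + transfer."""
    return seek_ms + rotation_ms + pages * transfer_ms_per_page


# Best and worst cases for one page, using the ranges from the slide.
best = access_time_ms(seek_ms=1, rotation_ms=0, pages=1)     # 2.0 msec
worst = access_time_ms(seek_ms=20, rotation_ms=10, pages=1)  # 31.0 msec

# Share of the worst-case time spent actually transferring data.
transfer_share = 1.0 / worst
print(best, worst, round(transfer_share, 3))
```

In the worst case only about 3% of the time is spent moving data, which is why the effort goes into avoiding seeks and rotational waits.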
Reducing Access Time
If you access the first block of a track, you incur all the access delays. If you then access the second block of the same track, the only delay is transfer time.
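A rough comparison shows how much this matters. The sketch below assumes a 10 msec seek, a 5 msec rotational delay, and 1 msec of transfer per block; these are illustrative mid-range values, not measurements, and the function names same_track_ms and scattered_ms are hypothetical.

```python
def same_track_ms(blocks: int, seek: float = 10.0, rotation: float = 5.0,
                  transfer: float = 1.0) -> float:
    """Consecutive blocks on one track: pay seek and rotation once,
    then only transfer time for every block."""
    return seek + rotation + blocks * transfer


def scattered_ms(blocks: int, seek: float = 10.0, rotation: float = 5.0,
                 transfer: float = 1.0) -> float:
    """Blocks scattered over the disk: every block pays all three delays."""
    return blocks * (seek + rotation + transfer)


print(same_track_ms(8))  # 23.0
print(scattered_ms(8))   # 128.0
```

Under these assumptions, eight blocks on one track cost 23 msec while eight scattered blocks cost 128 msec.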
Quicker Access
Access is quicker, in decreasing order of benefit, for blocks in the same track, blocks in the same cylinder, and blocks in adjacent cylinders. Pre-fetching: read the entire track at the time of the request for any block.
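Pre-fetching can be sketched as a one-track cache. In the simulation below a Python list stands in for the disk, and the class name PrefetchingReader, its attributes, and the choice to cache only one track at a time are all assumptions made for illustration.

```python
class PrefetchingReader:
    """Reads whole tracks from a simulated disk and serves later
    requests for the same track from memory."""

    def __init__(self, disk_blocks, blocks_per_track):
        self.disk = disk_blocks        # list standing in for the disk
        self.blocks_per_track = blocks_per_track
        self.cached_track = None       # number of the track held in memory
        self.cache = []                # blocks of that track
        self.physical_reads = 0        # how many times we went to the disk

    def read(self, block_no):
        track = block_no // self.blocks_per_track
        if track != self.cached_track:
            # Cache miss: fetch the entire track, not just one block.
            start = track * self.blocks_per_track
            self.cache = self.disk[start:start + self.blocks_per_track]
            self.cached_track = track
            self.physical_reads += 1
        return self.cache[block_no % self.blocks_per_track]


disk = [f"block{i}" for i in range(12)]
reader = PrefetchingReader(disk, blocks_per_track=4)
data = [reader.read(i) for i in range(8)]
print(reader.physical_reads)  # 2 track reads served 8 block requests
```

The gain depends on requests clustering on the same track; a workload that alternates between tracks would reload the cache on every request.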
Disk Reliability
How reliable are disk drives? Reliable, but not perfect. The advent of server farms with thousands of consumer disks has allowed for studies that give us some answers. Disk failures tend to follow a “bathtub” curve. In one study, 80% of drives survived four years.
Failure Graph
[Figure: drive failure graph]
Explanation
The initial failures are usually manufacturing defects, which cause an early death. Next comes a period of high reliability. Finally, there is a period of drives wearing out. These results are from a study of 25,000 drives in a server farm:
http://www.pcworld.com/article/2062254/25-000-drive-study-shines-a-light-on-how-long-hard-drives-actually-last.html
Failure Rates Again
[Figure: drive failure rates]
Common File Organizations
The common organizations are sequential, direct, and indexed sequential. Most others are variations on these basic themes.
Sequential Files
Records are stored one after another. Accessing the 500th record requires reading the 499 prior records. This is the weakest file organization but the easiest to implement. Even tape drives support it.
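The cost of reaching a record in a sequential file can be shown with a short sketch. It assumes one record per text line and uses an in-memory io.StringIO in place of a real file; the function name read_nth_record is hypothetical.

```python
import io


def read_nth_record(stream, n):
    """Return (record, records_read) for the n-th record, counting from 1.
    Every earlier record has to be read and discarded on the way."""
    stream.seek(0)  # rewind to the start, as a tape would
    records_read = 0
    for line in stream:
        records_read += 1
        if records_read == n:
            return line.rstrip("\n"), records_read
    raise IndexError(f"file has fewer than {n} records")


# A stand-in file of 1000 one-line records.
f = io.StringIO("".join(f"record {i}\n" for i in range(1, 1001)))
record, reads = read_nth_record(f, 500)
print(record, reads)  # record 500 500
```

The count of 500 reads is the 499 prior records plus the one that was wanted.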
Direct Files
Also known as relative files. The file is a linear sequence of equal-sized, numbered slots. Each slot may be accessed directly, and slots may be empty or used. The key is an integer. Requires DASD.
[Figure: a row of slots numbered 1 through 12]
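A direct file can be sketched with fixed-size slots and a seek to slot number times slot size. The example below is a minimal illustration using an in-memory io.BytesIO; the 16-byte slot size, the use of all-zero bytes to mark an empty slot, and the names write_slot and read_slot are assumptions, not how any particular OS implements it.

```python
import io

SLOT_SIZE = 16                 # assumed size of every slot, in bytes
EMPTY = b"\x00" * SLOT_SIZE    # assumed marker for an unused slot


def write_slot(f, slot, data):
    """Store data in the given slot (slots are numbered from 1)."""
    if len(data) > SLOT_SIZE:
        raise ValueError("record does not fit in a slot")
    f.seek((slot - 1) * SLOT_SIZE)          # jump straight to the slot
    f.write(data.ljust(SLOT_SIZE, b"\x00"))  # pad to the full slot size


def read_slot(f, slot):
    """Return the record in the slot, or None if the slot is empty."""
    f.seek((slot - 1) * SLOT_SIZE)
    raw = f.read(SLOT_SIZE)
    if len(raw) < SLOT_SIZE or raw == EMPTY:
        return None
    return raw.rstrip(b"\x00")


# A file of 12 empty slots.
f = io.BytesIO(bytes(SLOT_SIZE * 12))
write_slot(f, 7, b"Kline")
print(read_slot(f, 7))  # b'Kline'
print(read_slot(f, 3))  # None
```

No other slot is touched to reach slot 7, which is why the device must support direct access. One limitation of this sketch is that a record ending in zero bytes would lose them when read back.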
Indexed Files
Also known as indexed sequential files. Depending on the OS, this may actually be two files: an index and the data. The index is a tree of keys. The data is a sequence of records at the bottom of the tree. Requires DASD. The key may be of any type.
Indexed Sequential File Tree
[Figure: a two-level tree. The top-level index block holds the keys Charles, Kline, Roberts, and Zane. Each key is the last key in one bottom-level data block, such as (Abel, Bag, Casey, Charles), (Dean, Easy, Frank, Kline), (Larry, Morris, Roberts), and (Smith, Taylor, Vernon, Zane).]
Each block should represent one disk block, such as a sector. The top level contains the keys only. The bottom level contains key and data. We can sequentially process the bottom level to get all names. We can randomly enter the tree and find any name with just two accesses.
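The two-access lookup can be sketched in a few lines. The example below assumes the index holds the highest key of each data block, which matches the names in the tree; the record contents, the in-memory lists standing in for disk blocks, and the names INDEX, DATA_BLOCKS, lookup, and all_keys are hypothetical.

```python
from bisect import bisect_left

# Top level: keys only, one per data block (the highest key in that block).
INDEX = ["Charles", "Kline", "Roberts", "Zane"]

# Bottom level: key and data together, in key order.
DATA_BLOCKS = [
    [(name, f"data for {name}") for name in ("Abel", "Bag", "Casey", "Charles")],
    [(name, f"data for {name}") for name in ("Dean", "Easy", "Frank", "Kline")],
    [(name, f"data for {name}") for name in ("Larry", "Morris", "Roberts")],
    [(name, f"data for {name}") for name in ("Smith", "Taylor", "Vernon", "Zane")],
]


def lookup(key):
    """Random access: return (data, block_accesses)."""
    accesses = 1                    # read the index block
    i = bisect_left(INDEX, key)     # first block whose highest key >= key
    if i == len(INDEX):
        return None, accesses       # key is beyond the last block
    accesses += 1                   # read the one data block that could hold it
    for k, data in DATA_BLOCKS[i]:
        if k == key:
            return data, accesses
    return None, accesses


def all_keys():
    """Sequential access: walk the bottom level in order."""
    return [k for block in DATA_BLOCKS for k, _ in block]


print(lookup("Frank"))  # ('data for Frank', 2)
```

With more levels of index, each extra level adds one block access to a lookup, while the sequential walk over the bottom level is unchanged.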
Indexed Files
The previous tree had only two levels: one of index and one of data. Usually there are many levels of index and one level of data. The index is a tree of keys. The data is a sequence of records at the bottom of the tree.
Linkage
Indexed files require links between records. The link is a disk address: either cylinder, track, and sector or a block number. The links connect the indices with their corresponding data. This is different from sequential or direct files. The OS supports these links.
Access and Organization
There are two typical means to access a file: sequential and random. Sequential access may be applied to any file organization. Random access requires a key and may only be applied to direct or indexed files.
Finally
Not every OS provides each of these. Sequential is always provided, and usually some form of direct. Indexed sequential is seldom provided by the OS; it is often provided by external software.