Chapter 3 Secondary Storage


Objectives
To become familiar with: storage and access of data on disk; storage and access of data on tape; storage and access of data on CD-ROM; buffer management.

Outline
Disk organization and capacity; disk access; tape; CD-ROM; a journey of a byte; buffer management.

Disks
Serial devices permit serial data access only; magnetic tape is an example. Direct access storage devices (DASDs) permit direct data access; magnetic disks and optical disks are examples.
Hard disk: high capacity, low cost. Usually attached to a computer system in a hard disk drive.
Floppy disk: small capacity, low cost. Usually removable from a floppy disk drive.
CD-ROM: read only, higher capacity, low cost. Usually removable from a CD-ROM drive. Some compact disc formats are rewritable (CD-RW).

Disk Organization -- Platters
Information on a disk is stored on the surfaces of one or more platters.

Disk Organization -- Tracks and Sectors
Information is stored in successive tracks on the surface of the disk. Each track is divided into a number of sectors. A sector is the smallest addressable unit of a disk. When a READ() statement fetches a particular byte from the disk, the entire sector containing that byte is loaded into a special area of RAM called a buffer.

Disk Organization -- Cylinders
The tracks that are directly above and below one another form a cylinder. All data on a single cylinder can be accessed without moving the arm. Moving the arm is called seeking, and it is the slowest part of reading data from a disk.

Disk Capacities
Disks range in width from 2 to 14 inches; 3.5 inches is common. The capacity of a disk ranges from several megabytes to several hundred gigabytes. Each platter can store data on both sides, called surfaces, so the number of surfaces is twice the number of platters. The number of cylinders is the same as the number of tracks on a single surface. The bit density on a track determines how much data the track can hold; it depends on the quality of the recording medium and the size of the read/write head. A low-density disk can hold about 4 KB on a track and 35 tracks on a surface. A top-of-the-line disk can hold more than 1 MB on a track and more than 10,000 tracks on a surface (that is, more than 10,000 cylinders).

Disk Capacities (cont’d)
Disk drive capacity:
  track capacity = number of sectors per track × bytes per sector
  number of tracks per cylinder = 2 × number of platters
  cylinder capacity = number of tracks per cylinder × track capacity
  drive capacity = number of cylinders × cylinder capacity
Example: suppose a disk has the following specification:
  number of bytes per sector = 512
  number of sectors per track = 256
  number of platters = 12
  number of cylinders = 8192
Then:
  track capacity = 512 × 256 bytes = 128 KB
  number of tracks per cylinder = 2 × 12 = 24
  cylinder capacity = 24 × 128 KB = 3 MB
  total disk capacity = 8192 × 3 MB = 24 GB
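The capacity formulas on this slide can be checked with a few lines of Python. This is only a sketch: the function name `drive_capacity` and its parameter names are made up for illustration, and KB, MB and GB are taken as powers of 1024, which is what the slide's arithmetic implies.

```python
def drive_capacity(bytes_per_sector, sectors_per_track, platters, cylinders):
    """Return (track, cylinder, drive) capacities in bytes."""
    track_capacity = sectors_per_track * bytes_per_sector
    tracks_per_cylinder = 2 * platters          # two surfaces per platter
    cylinder_capacity = tracks_per_cylinder * track_capacity
    total_capacity = cylinders * cylinder_capacity
    return track_capacity, cylinder_capacity, total_capacity


# The example disk from the slide.
track, cylinder, drive = drive_capacity(512, 256, 12, 8192)
print(track // 1024, "KB per track")            # 128
print(cylinder // 1024**2, "MB per cylinder")   # 3
print(drive // 1024**3, "GB per drive")         # 24
```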

Disk Capacities (cont’d)
If the size of a file is known, the amount of disk space it needs can be calculated.
Example: a file of 500,000 fixed-length data records, 256 bytes in each record, stored on the disk from the previous slide (512-byte sectors, 256 sectors per track, 24 tracks per cylinder):
  a sector can hold 2 records
  the file needs 500,000 / 2 = 250,000 sectors,
  or 250,000 / 256 ≈ 977 tracks,
  or 977 / 24 ≈ 41 cylinders
If the disk does not have 41 physically contiguous cylinders available, the file may be spread out over dozens or even hundreds of cylinders.
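The same estimate can be written as a small function. This is a sketch under two assumptions: records never span sectors, and every partial sector, track or cylinder is rounded up to a whole one. The name `space_needed` and its default values (taken from the example disk) are illustrative.

```python
import math


def space_needed(num_records, record_size, bytes_per_sector=512,
                 sectors_per_track=256, tracks_per_cylinder=24):
    """Return (sectors, tracks, cylinders) needed for a fixed-length-record file."""
    records_per_sector = bytes_per_sector // record_size   # no spanning
    sectors = math.ceil(num_records / records_per_sector)
    tracks = math.ceil(sectors / sectors_per_track)
    cylinders = math.ceil(tracks / tracks_per_cylinder)
    return sectors, tracks, cylinders


print(space_needed(500_000, 256))   # (250000, 977, 41)
```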

Track Organization -- by Sector
There are two basic ways to organize data on a disk: organizing tracks by sector, and organizing tracks by user-defined block. Sectors can be physically placed in two ways: physically adjacent sectors, for newer disks with faster data transfer rates, and interleaved sectors, for disks with slow data transfer rates.

Clusters
A cluster is a fixed number of logically contiguous sectors (they need not be physically contiguous; the degree of physical contiguity is determined by the interleaving factor). Once a cluster has been found on a disk, all sectors in that cluster can be accessed without additional seeks. A file is viewed as a series of clusters of sectors, using a file allocation table (FAT) that lists all the clusters in the logical order of the sectors they contain. The system administrator can decide how many sectors are in a cluster.

Extents
An extent is a file, or part of a file, stored in contiguous sectors, tracks and cylinders; its clusters are contiguous. Storing a file as a single extent is possible if the disk has a lot of free space, and the file can then be accessed with a minimum amount of seeking. If there is no contiguous free space large enough for a file, the file is stored in two or more extents.

Fragmentation
Internal fragmentation is disk space that is allocated to a file but left unused, and that cannot be used by other files. Example: storing a file of 300-byte records on a disk with a sector size of 512 bytes.
Option 1: store one record per sector. The rest of each sector is lost, i.e., internal fragmentation.
Option 2: allow records to span two sectors. This saves disk space, but accessing a record may require retrieving two sectors.
If the number of bytes in a file is not a multiple of the cluster size, internal fragmentation also occurs in the last extent of the file.
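The trade-off in the 300-byte example can be made concrete. The sketch below uses two hypothetical helpers: one computes the fraction of each sector wasted when records are not allowed to span sectors, and the other checks whether a given record crosses a sector boundary when records are packed back to back.

```python
def wasted_fraction_unspanned(record_size, sector_size):
    """Fraction of each sector left unused when records may not span sectors."""
    records_per_sector = sector_size // record_size
    unused = sector_size - records_per_sector * record_size
    return unused / sector_size


def spans_two_sectors(record_index, record_size, sector_size):
    """True if a back-to-back packed record crosses a sector boundary."""
    first_byte = record_index * record_size
    last_byte = first_byte + record_size - 1
    return first_byte // sector_size != last_byte // sector_size


# One 300-byte record per 512-byte sector leaves 212 bytes unused.
print(wasted_fraction_unspanned(300, 512))      # 0.4140625
# Packed back to back, record 1 (bytes 300..599) crosses the boundary at 512.
print(spans_two_sectors(1, 300, 512))           # True
```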

Track Organization -- by Block
Disk tracks can be divided into an integral number of user-defined blocks. Blocks can have fixed or variable length. With block organization, different amounts of data can be transferred in a single I/O operation. Block organization does not have the sector-spanning and fragmentation problems of sector organization.
Figure: (a) track organization by sectors; (b) track organization by blocks.

Track Organization -- by Block (cont’d)
A block may contain one or several records. Each block is usually accompanied by one or more subblocks containing extra information about the data block.
Count subblock: holds the number of bytes in the accompanying data block.
Key subblock: holds the key of the last record in the data block.
When key subblocks are used, the disk controller can search a track for a block or record identified by a given key. This search is more efficient than in sector-addressable schemes because the keys do not have to be loaded into primary memory.

Nondata Overhead
Sector-addressable disks: preformatting overhead is stored at the beginning of each sector and includes the sector address, the track address, and the condition of the sector (usable or defective). Preformatting also involves placing gaps and synchronization marks between fields of information.
Block-addressable disks: nondata overhead consists of subblocks and interblock gaps.
Block factor: number of bytes per track / block length. In general, the greater the block factor, the better. However, larger blocks have a higher potential for internal track fragmentation.

Disk Access Cost
Seek time: the time required to move the access arm to the correct cylinder. In practice the average seek time is used.
Rotational delay: the time required to rotate the disk so that the desired sector is under the read/write head.
  maximum rotational delay = time for one revolution
  average rotational delay = half of the maximum rotational delay
Transfer time: the time required to read the data from the disk.
  transfer time = (number of bytes transferred / number of bytes on a track) × rotation time
  or
  transfer time = (number of sectors transferred / number of sectors on a track) × rotation time

Disk Access Time
Suppose the previously described disk rotates at 10,000 rpm (revolutions per minute):
  average seek time = 10 ms
  average rotational delay = half a revolution = (1/2) × (1/10000) minute = 3 ms
Suppose the previously described file is stored as:
  Case 1. Random sectors, that is, only one sector can be read at a time.
  Case 2. Random clusters, where each cluster has 8 sectors (4 KB).
  Case 3. One extent.
Determine the access time of the file for these three cases.

Disk Access Time (cont’d)
Case 1: assume the file is read sector by sector in random order.
  average seek        10.0 msec
  rotational delay     3.0 msec
  read one sector      0.023 msec    // (1/256) × (1/10000 min)
  total               13.023 msec
  Total time = 250000 × 13.023 msec = 3255.75 seconds ≈ 54 minutes
Case 2: assume the file is read cluster by cluster in random order.
  average seek        10.0 msec
  rotational delay     3.0 msec
  read one cluster     0.187 msec    // (8/256) × (1/10000 min)
  total               13.187 msec
  Total time = (250000/8) × 13.187 msec = 412.09 seconds ≈ 6.9 minutes

Disk Access Time (cont’d)
Case 3: sequential access.
  average seek        10.0 msec × 41 = 410 msec
  rotational delay     3 msec
  read one extent     (250000/256) × (1/10000 min) = 5859.4 msec
  Total time = 410 + 3 + 5859.4 = 6272.4 msec ≈ 6.3 seconds
Conclusion
Seeking is the most expensive operation; avoid seeking as much as possible. Grouping data into larger units (e.g., clusters) reduces access time. Sequential access is much faster than random access.
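The three cases can be reproduced with a short script. This is a sketch using the slide's parameters (10 ms average seek, 10,000 rpm, 256 sectors per track, 250,000 sectors, 41 cylinders); the constant and function names are illustrative. Because the script does not round the per-sector transfer time to 0.023 ms, its Case 1 total differs from the slide's 3255.75 seconds by about a tenth of a second.

```python
SEEK_MS = 10.0
RPM = 10_000
SECTORS_PER_TRACK = 256
ROTATION_MS = 60_000 / RPM          # one revolution takes 6 ms
ROT_DELAY_MS = ROTATION_MS / 2      # average rotational delay: 3 ms


def random_access_ms(total_sectors, sectors_per_unit):
    """Read the file one unit (sector or cluster) at a time, in random order."""
    units = total_sectors / sectors_per_unit
    transfer = sectors_per_unit / SECTORS_PER_TRACK * ROTATION_MS
    return units * (SEEK_MS + ROT_DELAY_MS + transfer)


def extent_access_ms(total_sectors, cylinders):
    """Read the file as one extent: one seek per cylinder, one rotational delay."""
    transfer = total_sectors / SECTORS_PER_TRACK * ROTATION_MS
    return cylinders * SEEK_MS + ROT_DELAY_MS + transfer


print(f"case 1: {random_access_ms(250_000, 1) / 1000:.2f} s")
print(f"case 2: {random_access_ms(250_000, 8) / 1000:.2f} s")
print(f"case 3: {extent_access_ms(250_000, 41) / 1000:.2f} s")
```

Random sector access takes close to an hour, random cluster access about seven minutes, and the single extent about six seconds, which matches the slide's conclusion.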

Disk as Bottleneck
Disk is slow compared with memory, CPU, and high-speed networks. A process is disk-bound when the CPU or network is waiting for disk I/O; the execution time of the process is then bounded by disk access.
Possible solutions:
Multitasking: the CPU switches among processes.
Striping/RAID: using multiple disks for different parts of a file, which allows parallel access.
Buffering.

Tape
Tape provides no direct access facility, but very rapid sequential access. It is compact, resistant to rough environmental conditions, easy to store and transport, and cheaper than disk. Tapes were formerly used for application data; currently they are used primarily for archival storage.

Organization of Data on Nine-Track Tapes
On a tape, the logical position of a byte within a file corresponds directly to its physical position relative to the start of the file. The surface of a typical tape can be seen as a set of parallel tracks, each of which is a sequence of bits. On a nine-track tape, the nine bits at corresponding positions in the nine tracks make up 1 byte plus a parity bit. One byte is therefore a one-bit-wide slice of tape, called a frame. In odd parity, the parity bit is set so that the number of 1 bits in the frame is odd; this is used to check the validity of the data. Frames are organized into data blocks of variable size, separated by interblock gaps that are long enough to permit stopping and starting.
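Odd parity is easy to illustrate in code. The sketch below computes the parity bit for one 8-bit byte and checks a 9-bit frame; the function names are illustrative, and how the nine bits are physically assigned to tracks is not modeled.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total number of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def frame_is_valid(byte, parity):
    """A frame is valid under odd parity if it contains an odd number of 1 bits."""
    return (bin(byte & 0xFF).count("1") + parity) % 2 == 1


print(odd_parity_bit(0b00000011))       # 1 (two 1 bits, so the parity bit is set)
print(frame_is_valid(0b00000011, 1))    # True
print(frame_is_valid(0b00000011, 0))    # False: a single flipped bit is detected
```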

Estimating Tape Length Requirements
Let b = the physical length of a data block, g = the length of an interblock gap, and n = the number of data blocks. The space requirement, s, for storing the file is
  s = n × (b + g)
where
  b = block size (bytes per block) / tape density (bytes per inch)
The number of records stored in a physical block is called the blocking factor.
Effective recording density is a general measure of the effect of choosing different block sizes:
  effective recording density = (number of bytes per block) / (number of inches required to store a block)
Space utilization is sensitive to the relative sizes of data blocks and interblock gaps.
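A quick calculation shows how strongly the blocking factor affects tape length. The figures used here (one million 100-byte records, 6250 bytes per inch, 0.3-inch gaps) are assumptions chosen for illustration and do not come from the slides; the function name is likewise hypothetical.

```python
import math


def tape_length_inches(num_records, record_size, blocking_factor,
                       density_bpi, gap_inches):
    """s = n * (b + g), with b derived from block size and tape density."""
    n = math.ceil(num_records / blocking_factor)        # number of data blocks
    b = blocking_factor * record_size / density_bpi     # block length in inches
    return n * (b + gap_inches)


# Assumed figures: 1,000,000 records of 100 bytes, 6250 bpi, 0.3-inch gaps.
one_per_block = tape_length_inches(1_000_000, 100, 1, 6250, 0.3)
fifty_per_block = tape_length_inches(1_000_000, 100, 50, 6250, 0.3)
print(f"blocking factor 1:  {one_per_block:,.0f} inches")
print(f"blocking factor 50: {fifty_per_block:,.0f} inches")
```

With one record per block almost all of the tape is gap (about 316,000 inches in total); with fifty records per block the same file needs roughly 22,000 inches.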

Estimating Data Transmission Times
  nominal data transmission rate = tape density (bpi) × tape speed (ips)
Interblock gaps, however, must be taken into consideration:
  effective transmission rate = effective recording density × tape speed
The blocking factor therefore affects the effective transmission rate.
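The gap between nominal and effective rates can be sketched as follows. The drive figures (6250 bytes per inch, 200 inches per second, 0.3-inch gaps) are assumed for illustration only, and the function names are hypothetical.

```python
def nominal_rate(density_bpi, speed_ips):
    """Bytes per second if the tape held nothing but data."""
    return density_bpi * speed_ips


def effective_density(block_bytes, density_bpi, gap_inches):
    """Bytes per inch once the interblock gap after each block is counted."""
    return block_bytes / (block_bytes / density_bpi + gap_inches)


def effective_rate(block_bytes, density_bpi, gap_inches, speed_ips):
    """Bytes per second actually delivered, gaps included."""
    return effective_density(block_bytes, density_bpi, gap_inches) * speed_ips


# Assumed drive: 6250 bpi, 200 ips, 0.3-inch gaps.
print(nominal_rate(6250, 200))                       # 1250000
print(round(effective_rate(100, 6250, 0.3, 200)))    # 100-byte blocks
print(round(effective_rate(5000, 6250, 0.3, 200)))   # 5000-byte blocks
```

Small blocks deliver only a small fraction of the nominal rate, while larger blocks come much closer to it.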

Disk versus Tape
In the past, both disks and tapes were used for secondary storage. Disks were preferred for random access, and tape was better for sequential access.
Now, disks have taken over much of secondary storage because of the decreased cost of disk and memory storage. Tapes are used as tertiary storage: they are cheap, and it is fast and easy to stream large files or sets of files between tape and disk.

CD-ROM
A single disc can hold more than 600 MB of data. CD-ROM is a descendant of the audio CD, which was designed for listening to music: playback is sequential and does not require fast random access to data. CD-ROM is read only, i.e., it is a publishing medium rather than a data storage and retrieval medium like magnetic disks. Because the contents cannot change, the file organization can be optimized.
CD-ROM strengths: high storage capacity, inexpensive price, durability.
CD-ROM weaknesses: extremely slow seek performance (between half a second and a second), so intelligent file structures are critical.

Pits and Lands
CD-ROMs are stamped from a glass master disk, which has a coating that is changed by a laser beam. When the coating is developed, the areas hit by the laser beam turn into pits along the track followed by the beam. The smooth unchanged areas between the pits are called lands. Pits scatter light; lands reflect light.
1s are represented by the transitions from pit to land and back again. 0s are represented by the amount of time between transitions: the longer the time between transitions, the more 0s there are. There must be at least two 0s between any pair of 1s.
The raw patterns of 1s and 0s have to be translated to get the 8-bit patterns that form the bytes of the original data. EFM encoding (eight-to-fourteen modulation) turns the original 8 bits of data into 14 expanded bits that can be represented in the pits and lands on the disc. Since 0s are represented by the length of time between transitions, the disc must be rotated at a precise and constant speed. This affects the CD-ROM drive’s ability to seek quickly.
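The rule that at least two 0s must separate any pair of 1s can be expressed as a simple check on a bit string. This sketch only tests that one constraint; it does not implement the EFM translation table, and the bit patterns used below are arbitrary examples, not actual EFM code words.

```python
def satisfies_run_length(bits, min_zeros=2):
    """True if every pair of consecutive 1s has at least min_zeros 0s between them."""
    last_one = None
    for position, bit in enumerate(bits):
        if bit == "1":
            if last_one is not None and position - last_one - 1 < min_zeros:
                return False
            last_one = position
    return True


print(satisfies_run_length("01001000100000"))   # True
print(satisfies_run_length("01010000000000"))   # False: only one 0 between the 1s
```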

CLV vs. CAV
Constant linear velocity (CLV): data on a CD-ROM is stored in a single spiral track. This allows the data to be packed as tightly as possible, since all sectors have the same size whether they are near the center or at the edge. Because the data must pass under the optical pickup at a constant rate, the disc has to spin more slowly when reading near the outer edge than when reading near the center.
The CLV format is responsible for the poor seek performance of CD-ROM drives: there is no straightforward way to jump to a location. Part of the problem is the need to change rotational speed. To read the address information, the data must be moving under the optical pickup at the correct speed; but to adjust the speed, the drive needs to read the address information. The drive breaks this loop by guessing and by trial and error, which slows down performance.
Constant angular velocity (CAV): magnetic disk drives spin at a constant rate and pack data more densely near the center than at the edge, so data density is lower on the outer tracks. It is easy to find the start of a sector.

Addressing
CD-ROM addressing differs from the “regular” disk method. Each second of playing time on a CD is divided into 75 sectors, and each sector holds 2 kilobytes of data. Each CD-ROM contains at least one hour of playing time, so the disc can hold at least
  60 min × 60 sec/min × 75 sectors/sec × 2 KB/sector = 540,000 KB
Often it is actually possible to store over 600,000 KB. Sectors are addressed by min:sec:sector, e.g., 16:22:34.
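The min:sec:sector scheme maps directly onto a sector count from the start of the disc. The sketch below assumes sectors are numbered from 0 within each second and that address 00:00:00 is sector 0; the function names are illustrative.

```python
SECTORS_PER_SECOND = 75
KB_PER_SECTOR = 2


def sector_number(address):
    """Convert a 'min:sec:sector' address to a sector count from the disc start."""
    minutes, seconds, sector = (int(part) for part in address.split(":"))
    return (minutes * 60 + seconds) * SECTORS_PER_SECOND + sector


def capacity_kb(minutes):
    """Data capacity in KB for a given playing time in minutes."""
    return minutes * 60 * SECTORS_PER_SECOND * KB_PER_SECTOR


print(sector_number("16:22:34"))   # 73684
print(capacity_kb(60))             # 540000
```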

A Journey of a Byte
What happens when the program statement write(fd, &ch, 1) is executed?
Part that takes place in memory:
The statement calls the operating system (OS), which oversees the operation.
The file manager (the part of the OS that deals with I/O):
  checks whether the operation is permitted;
  locates the physical location where the byte will be stored (drive, cylinder, track and sector);
  finds out whether the sector that will receive the character is already in memory (if not, the sector is read into an I/O buffer);
  puts ‘P’ (the content of ch) in the I/O buffer;
  keeps the sector in memory to see whether more bytes will be going to the same sector in the file.

A Journey of a Byte (cont’d)
Part that takes place outside of memory:
I/O processor: waits for an external data path to become available (the CPU is faster than the data paths, which causes delays).
Disk controller: the I/O processor asks the disk controller whether the disk drive is available for writing. The disk controller instructs the disk drive to move its read/write head to the right track and sector. The disk spins to the right location and the byte is written.

Buffer Management
What happens to data travelling between a program’s data area and secondary storage? Buffering involves working with a large chunk of data in memory so that the number of accesses to secondary storage can be reduced.
How many buffers do we need? At least two: one for input and the other for output. Moving data to or from disk is very slow, and programs may become I/O bound.
Buffering strategies:
Multiple buffering: double buffering, buffer pooling.
Move mode: data is moved between the buffer and the program data area.
Locate mode: the program operates directly on the buffer.
Scatter/gather I/O: multiple buffers are filled or emptied with a single read or write.
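The effect of buffering on the number of device accesses can be shown with a toy output buffer. This is a minimal sketch, not how any particular operating system implements its I/O buffers: the class name `SectorBufferedWriter` is hypothetical, an in-memory `io.BytesIO` object stands in for the disk, and the 512-byte sector size is assumed.

```python
import io


class SectorBufferedWriter:
    """Collect bytes in memory and write to the device one full sector at a time."""

    def __init__(self, device, sector_size=512):
        self.device = device
        self.sector_size = sector_size
        self.buffer = bytearray()
        self.device_writes = 0          # how many times the device was accessed

    def write(self, data):
        self.buffer.extend(data)
        # Flush only when at least one whole sector has accumulated.
        while len(self.buffer) >= self.sector_size:
            chunk = bytes(self.buffer[:self.sector_size])
            del self.buffer[:self.sector_size]
            self.device.write(chunk)
            self.device_writes += 1

    def close(self):
        # Write out whatever is left in a partially filled buffer.
        if self.buffer:
            self.device.write(bytes(self.buffer))
            self.device_writes += 1
            self.buffer.clear()


disk = io.BytesIO()                     # stand-in for the disk
writer = SectorBufferedWriter(disk)
for _ in range(1024):
    writer.write(b"P")                  # 1024 one-byte writes from the program
writer.close()
print(writer.device_writes)             # 2 device accesses instead of 1024
```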