File Organization Record Storage and Primary File Organization

Name: File Organization Record Storage and Primary File Organization
Uploaded: 2017-12-17T10:57:27+00:00
Duration: PTM15S0
Channel: Percival Kelly
Description: File Organization Record Storage and Primary File Organization

Index Structures for Files
Anusha Indika Walisadeera
University of Ruhuna, Matara

Outline
- Concepts of computer storage hierarchies
- Description of magnetic disk storage devices and their characteristics
- Buffering of blocks
- Placing file records on disk
- Methods for organizing records of a file on disk
  - Unordered records
  - Ordered records
  - Hashed records
- Index structures for files
  - Primary index
  - Clustering index
  - Secondary index

Reference Book
Fundamentals of Database Systems, Third Edition (Chapter 5 & Chapter 6)
Authors: Elmasri / Navathe

Concepts of computer storage hierarchies

Storage Medium
A computerized database must be stored physically on some computer storage medium.
- Primary storage: storage media that can be operated on directly by the CPU (e.g., main memory, cache memory)
- Secondary storage: data in secondary storage (e.g., magnetic disks, optical disks, and tapes) cannot be processed directly by the CPU; it must first be copied into primary storage

Memory Hierarchies and Storage Devices
At the primary storage level:
- Cache (static RAM)
- DRAM (dynamic RAM, or main memory)
At the secondary storage level:
- Magnetic disks (on-line) and CD-ROM (WORM)
- Tapes (off-line)

Storage of Databases
Databases are stored permanently on magnetic disk secondary storage, for the following reasons:
- Fitness: Databases are too large to fit entirely in main memory
- Volatility: Data on secondary storage is less likely to be lost than data in main memory
- Cost: The cost of storage per unit of data is lower for magnetic disk than for primary storage

Secondary Storage Device - Magnetic Disks
- Magnetic disks are used for storing large amounts of data
- Hard disks and floppy disks are examples of magnetic disks
- A disk is either single-sided or double-sided
- A disk consists of surfaces, tracks, blocks and a center spindle

A disk pack with read/write hardware
Note: To increase storage capacity, disks are assembled into a disk pack, which may include many disks and hence many surfaces.

Disks: Data Block
A data block (sector) is the smallest unit of data defined within the database.

Magnetic Disks Cont…
- Read-write head
  - Positioned very close to the platter surface (almost touching it)
  - Reads or writes magnetically encoded information
- The surface of a platter is divided into circular tracks (each circle is called a track)
  - Over 16,000 tracks per platter on typical hard disks
- Each track is divided into sectors (or blocks)
  - A sector is the smallest unit of data that can be read or written and transferred between disk and main memory
  - Sector size is typically 512 to 4096 bytes
  - Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)
  - Blocks are separated by fixed-size interblock gaps
- To read/write a sector
  - The disk arm swings to position the head on the right track
  - The platter spins continually; data is read/written as the sector passes under the head
- Cylinder i consists of the ith track of all the platters (for disk packs, the tracks with the same diameter on the various surfaces are called a cylinder because of the shape they would form if connected in space)

Addressing Disk Records
Individual records on disks can typically be addressed in the following way:
1. Surface number
2. Track number
3. Sector number (for floppy disks) or cylinder number (for larger units)

Definitions
- Seek time (s): average time to move the read-write head to the correct track (i.e., the time it takes to reposition the arm over the correct track)
- Rotational delay (rd): average time for the sector to move under the read-write head
- Transfer time: time to read a sector and transfer the data to memory
- Logical record: the data about an entity (a row in a table)
- Physical record: a sector, page or block on the storage medium
- Typically several logical records can be stored in one physical record

Formulas
- Average rotational delay (rd) = Time for one revolution (msec) / 2
- Data transfer rate (tr) = Track size (bytes) / Time for one revolution (msec)
- Block transfer time (btt) = Block size (B) / Transfer rate (tr)
- Average time to locate and transfer a block = average seek time (s) + rd + btt
- Access time = Seek time + Rotational delay + Transfer time
Access time here is the time from when a read or write request is issued until the data has been transferred. Some texts stop the clock when data transfer begins, which leaves out the transfer time.
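The formulas above can be sketched in C as follows. This is only an illustrative sketch: the function names are made up for this example, all times are assumed to be in msec and all sizes in bytes.

```c
#include <assert.h>

/* Average rotational delay (rd): half of one full revolution, in msec. */
double avg_rotational_delay_ms(double revolution_ms) {
    assert(revolution_ms > 0.0);
    return revolution_ms / 2.0;
}

/* Transfer rate (tr) in bytes/msec: one whole track passes under
 * the head during one revolution. */
double transfer_rate_bytes_per_ms(double track_bytes, double revolution_ms) {
    assert(revolution_ms > 0.0);
    return track_bytes / revolution_ms;
}

/* Block transfer time (btt) in msec. */
double block_transfer_time_ms(double block_bytes, double tr) {
    assert(tr > 0.0);
    return block_bytes / tr;
}

/* Average time to locate and transfer one block: s + rd + btt. */
double avg_block_access_ms(double seek_ms, double revolution_ms,
                           double track_bytes, double block_bytes) {
    double rd  = avg_rotational_delay_ms(revolution_ms);
    double tr  = transfer_rate_bytes_per_ms(track_bytes, revolution_ms);
    double btt = block_transfer_time_ms(block_bytes, tr);
    return seek_ms + rd + btt;
}
```

As a usage example with assumed values (not taken from the slides): a revolution time of 25 msec (2400 rpm), a 12,800-byte track and a 512-byte block give rd = 12.5 msec, tr = 512 bytes/msec and btt = 1 msec. With an assumed average seek time of 30 msec, `avg_block_access_ms(30.0, 25.0, 12800.0, 512.0)` returns 43.5 msec.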

Example
Consider a disk with the following characteristics: block size B = 512 bytes, interblock gap size G = 128 bytes, number of blocks per track = 20, number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.

Example Cont…
(a) What is the total capacity of a track and what is its useful capacity (excluding interblock gaps)?
    Total track size = 20 * (512 + 128) = 12,800 bytes = 12.8 Kbytes
    Useful capacity of a track = 20 * 512 = 10,240 bytes = 10.24 Kbytes
(b) How many cylinders are there?
    Number of cylinders = number of tracks per surface = 400
(c) What is the total capacity and the useful capacity of a cylinder?
    Total cylinder capacity = 15 * 2 * 20 * (512 + 128) = 384,000 bytes = 384 Kbytes
    Useful cylinder capacity = 15 * 2 * 20 * 512 = 307,200 bytes = 307.2 Kbytes
(d) What is the total capacity and the useful capacity of a disk pack?
    Total capacity of a disk pack = 15 * 2 * 400 * 20 * (512 + 128) = 153,600,000 bytes = 153.6 Mbytes
    Useful capacity of a disk pack = 15 * 2 * 400 * 20 * 512 = 122,880,000 bytes = 122.88 Mbytes
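The capacity arithmetic in this example can be checked with a short C sketch. The struct and function names are illustrative assumptions. The sketch takes 15 double-sided disks to mean 30 surfaces, and capacities are in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Disk geometry for the example; the field names are hypothetical. */
struct disk_geometry {
    long block_size;         /* B, in bytes */
    long gap_size;           /* G, interblock gap in bytes */
    long blocks_per_track;
    long tracks_per_surface; /* equals the number of cylinders */
    long surfaces;           /* 15 double-sided disks = 30 surfaces */
};

/* Capacity of one track; include_gaps != 0 gives the total capacity,
 * include_gaps == 0 gives the useful capacity. */
long track_capacity(const struct disk_geometry *d, int include_gaps) {
    assert(d != NULL);
    long per_block = d->block_size + (include_gaps ? d->gap_size : 0);
    return d->blocks_per_track * per_block;
}

/* A cylinder holds one track from every surface. */
long cylinder_capacity(const struct disk_geometry *d, int include_gaps) {
    return d->surfaces * track_capacity(d, include_gaps);
}

/* The disk pack holds one cylinder per track position. */
long pack_capacity(const struct disk_geometry *d, int include_gaps) {
    return d->tracks_per_surface * cylinder_capacity(d, include_gaps);
}
```

Initializing the struct as `{512, 128, 20, 400, 30}` reproduces the figures of the example: 12,800 and 10,240 bytes per track, 384,000 and 307,200 bytes per cylinder, and 153,600,000 and 122,880,000 bytes for the disk pack.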

Buffering of blocks

Storage Access
- A database file is partitioned into fixed-length storage units called blocks.
- Blocks are units of both storage allocation and data transfer.
- The database system seeks to minimize the number of block transfers between the disk and memory.
- We can reduce the number of disk accesses by keeping as many blocks as possible in main memory.
- Buffer: the portion of main memory available to store copies of disk blocks.

Buffering of Blocks
Buffers are used to speed up the transfer of data between disk and main memory (MM).
- Interleaved concurrency (Fig 1)
- Simultaneous (parallel) concurrency (Fig 2): more than one CPU is available
- Double buffering (Fig 3): two buffers are used for reading (or writing) by the I/O processor and the CPU
  - Allows continuous reading or writing of data on consecutive disk blocks, which eliminates the seek time and rotational delay for all but the first block transfer

Fig 3 : Use of two buffers, A and B, for reading from disk

Placing File Records on Disk

Records and Record Types
- Data is usually stored in the form of records.
- Each record consists of a collection of related data values.
- A collection of field names and their corresponding data types constitutes a record type or record format definition.
- The data type associated with each field determines the type of values the field can take.
For example, an EMPLOYEE record type may be defined as the following structure:
    struct employee {
        char name[30];
        char ssn[9];
        int salary;
        int jobcode;
        char department[20];
    };

Files of Records
- A file is a sequence of records.
- If every record in the file has exactly the same size, the file is said to be made up of fixed-length records.
- A file may have variable-length records for several reasons:
  - one or more of the fields are of varying size
  - one or more of the fields may have multiple values for individual records
  - one or more of the fields are optional
- A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk.
- Records are stored on disk blocks.
- The blocking factor (bfr) for a file is the (average) number of file records stored in a disk block: bfr = B/R, where B = block size and R = record size. For fixed-length records that do not cross block boundaries, B/R is rounded down to a whole number.
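A minimal sketch of the blocking factor calculation, assuming fixed-length records that are not split across blocks, so only whole records count. The function names are hypothetical.

```c
#include <assert.h>

/* Blocking factor for fixed-length records that stay within one block:
 * the number of whole records per block, i.e. B/R rounded down. */
long blocking_factor(long block_size, long record_size) {
    assert(record_size > 0 && block_size >= record_size);
    return block_size / record_size; /* integer division rounds down */
}

/* Bytes left unused at the end of each block when records are not
 * allowed to cross the block boundary. */
long unused_bytes_per_block(long block_size, long record_size) {
    return block_size - blocking_factor(block_size, record_size) * record_size;
}
```

For example, with B = 512 bytes and an assumed record size of R = 100 bytes, `blocking_factor(512, 100)` is 5 and `unused_bytes_per_block(512, 100)` is 12.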

Fig 4 :

Types of Record Organization
(a) Spanned: Records can cross the disk block boundary.
    If the record size (R) > block size (B), we must use a spanned organization.
(b) Unspanned: Records cannot cross the disk block boundary.
    This is used with fixed-length records having B > R because it makes each record start at a known location in the block.
Note: For variable-length records, either a spanned or an unspanned organization can be used.
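To compare the two organizations, the sketch below estimates how many blocks a file of r fixed-length records needs under each. It is a simplification: the spanned version ignores the space taken by the pointers that link the parts of a split record, and the function names are hypothetical.

```c
#include <assert.h>

/* Unspanned: each block holds only whole records, so the file needs
 * ceil(r / bfr) blocks, where bfr = floor(B / R). */
long blocks_unspanned(long r, long block_size, long record_size) {
    assert(r >= 0 && record_size > 0 && block_size >= record_size);
    long bfr = block_size / record_size;
    return (r + bfr - 1) / bfr;
}

/* Spanned: records may continue into the next block, so the file needs
 * roughly ceil(r * R / B) blocks (pointer overhead is ignored here). */
long blocks_spanned(long r, long block_size, long record_size) {
    assert(r >= 0 && record_size > 0 && block_size > 0);
    long total_bytes = r * record_size;
    return (total_bytes + block_size - 1) / block_size;
}
```

With 1000 records of an assumed 100 bytes each and B = 512 bytes, the unspanned file needs 200 blocks and the spanned file needs 196. Only `blocks_spanned` accepts R > B, which matches rule (a) above.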

Fig 5 :

Allocating File Blocks on Disk
There are several standard techniques for allocating the blocks of a file on disk:
- Contiguous allocation
- Linked allocation
- Clusters of consecutive disk blocks, with the clusters linked using pointers
- Indexed allocation
It is common to use combinations of these techniques.

Contiguous Allocation
Index Allocation

Operations on Files
The DBMS uses the following operations:
- Open/Close
- Reset
- Find (or locate)
- Read (or get)
- Find/Next
- Delete/Insert
- Modify

File Organization
File organization refers to the methods of storing data records in a file so that these records can then be accessed easily.
Three primary methods for organizing records of a file on disk:
- Heap / unordered files
- Sequential / ordered files
- Hashed files (direct files)

Files of unordered records
- The simplest and most primitive type of file organization
- Records are stored in the order they arrive
- Record insertion is very efficient
- Record searching is very inefficient (uses linear search)
- To delete a record, a program must first find its block, so deletion is not very efficient
- Can use either a spanned or an unspanned organization, and either fixed-length or variable-length records
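The contrast between cheap insertion and expensive search can be sketched with an in-memory stand-in for a heap file. This is an illustration only: the record layout, the fixed capacity and the function names are assumptions, and a real heap file would read and write disk blocks.

```c
#include <assert.h>
#include <string.h>

#define HEAP_CAPACITY 100 /* hypothetical fixed capacity for the sketch */

struct record {
    int  key;
    char name[30];
};

struct heap_file {
    struct record records[HEAP_CAPACITY];
    int count; /* number of records currently stored */
};

/* Insert: append at the end of the file; no search is needed.
 * Returns 0 on success, -1 if the file is full. */
int heap_insert(struct heap_file *f, int key, const char *name) {
    assert(f != NULL && name != NULL);
    if (f->count >= HEAP_CAPACITY)
        return -1;
    struct record *r = &f->records[f->count++];
    r->key = key;
    strncpy(r->name, name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0'; /* strncpy may not terminate */
    return 0;
}

/* Find: linear search from the start of the file.
 * Returns the record position, or -1 if the key is absent. */
int heap_find(const struct heap_file *f, int key) {
    assert(f != NULL);
    for (int i = 0; i < f->count; i++)
        if (f->records[i].key == key)
            return i;
    return -1;
}
```

Insertion touches only the end of the file, while a search may have to examine every record, which is why the slide calls searching very inefficient.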

Files of ordered records
- A fixed format is used for records, and records are the same length
- The records in the file are ordered by a search key called the ordering key field
Advantages:
- Reading the records in order of the ordering field is extremely efficient
- Finding the next record is fast
- Finding records based on a query on the ordering field is efficient (uses binary search)
Disadvantages:
- Searches on non-ordering fields are inefficient (use linear search)
- Insertion and deletion of records are very expensive operations for an ordered file because records must remain physically ordered
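A binary search on the ordering key field can be sketched over an in-memory array of blocks. The block layout, the `BFR` constant and the function name are assumptions made for this example. Each probe of `blocks[mid]` stands in for one block read, and every block is assumed to hold at least one key.

```c
#include <assert.h>
#include <stddef.h>

#define BFR 4 /* hypothetical number of records per block */

struct block {
    int keys[BFR]; /* ordering key values, in ascending order */
    int count;     /* records actually stored in this block (>= 1) */
};

/* Binary search over the blocks of an ordered file.
 * Returns the index of the block containing key, or -1 if absent. */
int ordered_find_block(const struct block *blocks, int nblocks, int key) {
    assert(blocks != NULL || nblocks == 0);
    int lo = 0, hi = nblocks - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        const struct block *b = &blocks[mid]; /* one "block read" */
        if (key < b->keys[0]) {
            hi = mid - 1; /* key can only be in an earlier block */
        } else if (key > b->keys[b->count - 1]) {
            lo = mid + 1; /* key can only be in a later block */
        } else {
            /* key falls in this block's range: scan inside the block */
            for (int i = 0; i < b->count; i++)
                if (b->keys[i] == key)
                    return mid;
            return -1;
        }
    }
    return -1;
}
```

Each step halves the range of candidate blocks, so a file of b blocks needs about log2(b) block reads, compared with up to b reads for a linear search.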

Hashing Technique (Direct file)
- A method of distributing data evenly (almost randomly) to different areas of memory
- Provides very fast access to records based on certain search conditions
- The search condition must be an equality condition ("=") on a single field, known as the hash field (HF)
- The hash field is usually also a key field of the file, in which case it is called the hash key
- Uses a function known as a hash function (or randomizing function)
- The hash function takes the hash field value as input and generates the address of the disk block as output
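One common choice of hash function is the division (mod) method, sketched below. The slides do not name a specific function, so this is only an assumed example with hypothetical function names. The result is a bucket number in the range 0 to num_buckets - 1, which the file system would then translate into a disk block address.

```c
#include <assert.h>

/* Division (mod) hashing for an integer hash field. */
unsigned long hash_to_bucket(unsigned long key, unsigned long num_buckets) {
    assert(num_buckets > 0);
    return key % num_buckets;
}

/* For a character hash field: fold the characters into one integer,
 * then apply the same mod step. Unsigned overflow wraps around, which
 * is well defined in C. The multiplier 31 is an arbitrary choice. */
unsigned long hash_string_to_bucket(const char *s, unsigned long num_buckets) {
    assert(s != 0 && num_buckets > 0);
    unsigned long h = 0;
    for (; *s != '\0'; s++)
        h = h * 31 + (unsigned char)*s;
    return h % num_buckets;
}
```

For example, `hash_to_bucket(1234, 10)` is 4. The same hash field value always maps to the same bucket, which is why only equality conditions on the hash field can use this access path.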

Hashed File (Direct file)
