Download presentation
Presentation is loading. Please wait.
Published byPearl Kory Hodge Modified over 8 years ago
1
Storage and File Structure Malavika Srinivasan Prof. Franya Franek
2
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
3
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
4
Overview of Physical Storage Media Cost Primary memory Secondary memory Tertiary memory Access time
5
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
6
Cache : fastest Most expensive; volatile; managed by the computer system hardware Main memory: fast access (10s to 100s of nanoseconds; generally too small (capacities - few Gigabytes.) Volatile Overview of Physical Storage Media – Primary Memory
7
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
8
Flash memory Non volatile Form of EEPROM – Data has to be erased before overwriting READ – fast, WRITE and ERASE - slow Limitations Erasing has to be done to an entire bank of memory Can support only a limited number (10K – 1M) of write/erase cycles. Overview of Physical Storage Media – Secondary Memory
9
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
10
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
11
Magnetic-disk Data is stored on spinning disk, and read/written magnetically Data must be moved from disk to main memory for access, and written back for storage Non-volatile Suitable for long term storage of entire database. Overview of Physical Storage Media – Secondary Memory
12
Magnetic Disk Mechanism
13
1.Read-write head Reads or writes magnetically encoded information. 2. Tracks Surface of platter divided into circular tracks 3. Sector Each track is divided into sectors. Sector size - 512 bytes 4. Arm assembly Contains more than one READ/Write head for acess from more than one platter simultaneously. 5. Cylinder i consists of i th track of all the platters Components of Magnetic Disk
14
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
15
READ/WRITE Operation: The R/W HEAD is kept floating above the platter by the breeze created due to spinning of the platter. Disk arm swings to position R/W HEAD on right track. Platter spins continually; data is read/written as sector passes under R/W HEAD. Magnetic Disk – READ/WRITE Operation
16
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
17
Magnetic Disk – Disk Controller Disk controller – interfaces between the computer system and the disk drive.
18
Functions of Disk Controller : Magnetic Disk – Disk Controller Receives R/W command Initiates actions to move arm to corresponding track Checksum for each sector Remapping of sectors Disk Controller
19
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
20
1. Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of: Seek time – time it takes to reposition the arm over the correct track. Rotational latency – time it takes for the sector to be accessed to appear under the head. Magnetic Disk – Performance Measures Access time = Seek Time + Rotational Latency
21
2. Data-transfer rate – the rate at which data can be retrieved from or stored to the disk. 3. Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure - Typically 3 to 5 years Magnetic Disk – Performance Measures
22
Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access
23
Scheduling File Organization Non volatile Write buffers Log disk Optimization of Data Access from Disk
24
1.Disk-arm-scheduling : It is an approach that aims at providing algorithms to order the Requests(to access the disk) such that the disk arm movement is minimized. E.g. Elevator algorithm : Move disk arm in one direction (from outer to inner tracks or vice versa), Process all pending request in that direction irrespective of the when the request was issued. If no more requests in that direction, then reverse direction and repeat Magnetic Disk – Optimizing Access Time
25
2. File organization Organizing the blocks corresponding to how data will be accessed E.g. Store related information on the same or nearby blocks. 3. Nonvolatile write buffers – Speed up disk writes by writing blocks to a non-volatile RAM buffer immediately Controller then writes to disk whenever the disk is free. Writes can be reordered to minimize disk arm movement. Magnetic Disk – Optimizing Access Time
26
4. Log disk A disk devoted to writing. Only Sequential access. Write to log disk is very fast since no seeks are required. Magnetic Disk – Optimizing Access Time
27
RAID : Redundant Arrays of Independent Disks Motivation: High speed - parallelism High reliability – Mirroring Disadvantages : Mirroring – Costly Parallelism – No reliability Best approach? Alternative schemes to provide reliability at lower cost
28
RAID : Redundant Arrays of Independent Disks 1. RAID Level 0: Block striping; Non-redundant. Data loss not critical 2. RAID Level 1: Mirrored disks Block striping
29
1. RAID Level 2: Bit level striping Memory style ECC Parity bits 2. RAID Level 3: Bit level striping Handles parity at sector level. Bit interleaved Parity organization RAID : Redundant Arrays of Independent Disks
30
RAID Level 4: Block-Interleaved Parity; Block-level striping Handles parity at block level for N disks Computing Damaged Block : Compute XOR of bits from corresponding blocks (including parity block) from other disks.
31
RAID : Redundant Arrays of Independent Disks RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.
32
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
33
1.Optical Disks : Compact disk-read only memory (CD-ROM) Digital Video Disk (DVD) Record once versions (CD-R and DVD-R) 2. Magnetic tapes : Hold large volumes of data and provide high transfer rates Very slow access time in comparison to magnetic disks and optical disks Used mainly for backup Overview of Physical Storage Media – Tertiary Memory
34
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
35
Terminologies : 1.Buffer – portion of main memory available to store copies of disk blocks. 2. Buffer manager – subsystem responsible for allocating buffer space in main memory. 3.Blocks - Data has to be in main memory for DBMS to operate it. So, it has to be transferred to and from disk. Hence it is organized into manageable units called Blocks. Storage Access
36
Source : cs.wisu.edu notes Buffer Manager
37
Requesting A Disk Page 22 disk page 3 Source : cs.wisu.edu notes MAIN MEMORY DISK BUFFER POOL 1232290 …… Higher level DBMS component I need page 3 Disk Mgr Buf Mgr 3
38
Page Replacement policies 22 3 23 20 21 1 5 51 2 4 7 11 9 29 30 71 8 41 What if all the pages in the buffer are full? 3 31 98 17 Disk1 Disk 2 Disk 3 Transfer from disk High level DBMS I need page 17
39
Page Replacement policies 1.FIFO 2.LRU Algorithm 3.MRU Algorithm (Pinning pages ) 4.Optimal Page replacement Algorithm
40
FIFO Page Replacement Policy
41
LRU Page Replacement Policy Source : technet.microsoft.com
42
Optimal Page Replacement Policy
43
Most Recently Used (MRU) A block that is currently being used cannot be removed. So, the block can be pinned while in use. After being used, the block will be unpinned and it will become Most Recently used block, which can be replaced. It can also be used to indicate that this block is not allowed to be written to disk as it is still under use. Page Replacement policies
44
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
45
File Organization A database is stored as a collection of files. Each file is a sequence of records, and a record is a sequence of fields. i)Fixed Length records ii)Variable length records
46
Consider type deposit = record account_number char(10); 1X10= 10 bytes branch_name char(22); 1X22= 22 bytes balance numeric(12,2); 8 bytes end Total : 40 bytes Strategy : First record stored in first 40 bytes and the second record in next 40 and so on….... File Organization: Fixed Length Records
47
Deletion is difficult – space occupied by deleted record must be occupied or we must have a way to mark deleted records so that it can be ignored. Unless block size happens to be multiple of 40, some records will cross boundaries, if they do so, then to access each record it will require two access to two blocks. Fixed Length Records - Problems
48
Deletion in Fixed length records Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 Consider Account records : Move all records one level up, following a deleted record. Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 *****But this requires large no: of moves.
49
An optimal way would be to use free lists. Header – Address of first record deleted. Delete Record 1,3 and 5, Deletion in Fixed length records – Free Lists Header : address of first record deleted Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 Record 4A3Redwood300 Record 5A4Downtown400
50
Variable Length Records : Variable-length records arise in database systems in several ways: – Storage of multiple record types in a file. – Record types that allow variable length for one or more fields (e.g., varchar)
51
Approaches to store variable length records (Block Based): Each record is identified by a record identifier (rid) (or tuple identifier (tid)). The rid/tid contains number of block and position in block.
52
Approaches to store variable length records (Block Based):
53
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
54
Sequential – store records in sequential order, based on the value of the search key of each record Heap – a record can be placed anywhere in the file where there is space. Hashing – a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed Multi table clustering file organization records of several different relations can be stored in the same file Organization of Records in Files
55
Sequential Organization Suitable for applications that require sequential processing of the entire file The records in the file are ordered by a search-key
56
Store several relations in one file using a multitable clustering file organization Multi table Clustering File Organization good for queries involving depositor customer, and for queries involving one single customer and his accounts bad for queries involving only customer
57
Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage
58
Data Dictionary 1. Information about relations Relation name, Attribute names and types View names and definitions and Integrity constraints Physical location of relation 2. User and accounting information, including passwords 3. Statistical and descriptive data - number of tuples in each relation 4. File organization information (sequential/hash…) Data dictionary (also called system catalog) stores metadata: that is, data about data, such as
59
Data Dictionary A possible catalog/ Data dictionary representation: Relation_metadata = (relation_name, number_of_attributes, storage_organization, location) Attribute_metadata = (relation_name, attribute_name, domain_type, position, length) User_metadata = (user_name, encrypted_password, group) Index_metadata = (relation_name, index_name, index_type, index_attributes) View_metadata = (view_name, definition)
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.