Presentation is loading. Please wait.

Presentation is loading. Please wait.

Storage and File Structure Malavika Srinivasan Prof. Franya Franek.

Similar presentations


Presentation on theme: "Storage and File Structure Malavika Srinivasan Prof. Franya Franek."— Presentation transcript:

1 Storage and File Structure Malavika Srinivasan Prof. Franya Franek

2 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

3 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

4 Overview of Physical Storage Media Cost Primary memory Secondary memory Tertiary memory Access time

5 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

6 Cache : fastest Most expensive; volatile; managed by the computer system hardware Main memory: fast access (10s to 100s of nanoseconds; generally too small (capacities - few Gigabytes.) Volatile Overview of Physical Storage Media – Primary Memory

7 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

8 Flash memory Non volatile Form of EEPROM – Data has to be erased before overwriting READ – fast, WRITE and ERASE - slow Limitations Erasing has to be done to an entire bank of memory Can support only a limited number (10K – 1M) of write/erase cycles. Overview of Physical Storage Media – Secondary Memory

9 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

10 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

11 Magnetic-disk Data is stored on spinning disk, and read/written magnetically Data must be moved from disk to main memory for access, and written back for storage Non-volatile Suitable for long term storage of entire database. Overview of Physical Storage Media – Secondary Memory

12 Magnetic Disk Mechanism

13 1.Read-write head Reads or writes magnetically encoded information. 2. Tracks Surface of platter divided into circular tracks 3. Sector Each track is divided into sectors. Sector size - 512 bytes 4. Arm assembly Contains more than one READ/Write head for acess from more than one platter simultaneously. 5. Cylinder i consists of i th track of all the platters Components of Magnetic Disk

14 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

15 READ/WRITE Operation: The R/W HEAD is kept floating above the platter by the breeze created due to spinning of the platter. Disk arm swings to position R/W HEAD on right track. Platter spins continually; data is read/written as sector passes under R/W HEAD. Magnetic Disk – READ/WRITE Operation

16 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

17 Magnetic Disk – Disk Controller Disk controller – interfaces between the computer system and the disk drive.

18 Functions of Disk Controller : Magnetic Disk – Disk Controller Receives R/W command Initiates actions to move arm to corresponding track Checksum for each sector Remapping of sectors Disk Controller

19 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

20 1. Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:  Seek time – time it takes to reposition the arm over the correct track.  Rotational latency – time it takes for the sector to be accessed to appear under the head. Magnetic Disk – Performance Measures Access time = Seek Time + Rotational Latency

21 2. Data-transfer rate – the rate at which data can be retrieved from or stored to the disk. 3. Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure - Typically 3 to 5 years Magnetic Disk – Performance Measures

22 Outline Magnetic disk : Components of Magnetic disk Read / Write Operation Disk Controller Performance Measures Optimizing disk access

23 Scheduling File Organization Non volatile Write buffers Log disk Optimization of Data Access from Disk

24 1.Disk-arm-scheduling : It is an approach that aims at providing algorithms to order the Requests(to access the disk) such that the disk arm movement is minimized. E.g. Elevator algorithm : Move disk arm in one direction (from outer to inner tracks or vice versa), Process all pending request in that direction irrespective of the when the request was issued. If no more requests in that direction, then reverse direction and repeat Magnetic Disk – Optimizing Access Time

25 2. File organization Organizing the blocks corresponding to how data will be accessed E.g. Store related information on the same or nearby blocks. 3. Nonvolatile write buffers – Speed up disk writes by writing blocks to a non-volatile RAM buffer immediately Controller then writes to disk whenever the disk is free. Writes can be reordered to minimize disk arm movement. Magnetic Disk – Optimizing Access Time

26 4. Log disk A disk devoted to writing. Only Sequential access. Write to log disk is very fast since no seeks are required. Magnetic Disk – Optimizing Access Time

27 RAID : Redundant Arrays of Independent Disks Motivation: High speed - parallelism High reliability – Mirroring Disadvantages : Mirroring – Costly Parallelism – No reliability Best approach? Alternative schemes to provide reliability at lower cost

28 RAID : Redundant Arrays of Independent Disks 1. RAID Level 0: Block striping; Non-redundant. Data loss not critical 2. RAID Level 1: Mirrored disks Block striping

29 1. RAID Level 2: Bit level striping Memory style ECC Parity bits 2. RAID Level 3: Bit level striping Handles parity at sector level. Bit interleaved Parity organization RAID : Redundant Arrays of Independent Disks

30 RAID Level 4: Block-Interleaved Parity; Block-level striping Handles parity at block level for N disks Computing Damaged Block : Compute XOR of bits from corresponding blocks (including parity block) from other disks.

31 RAID : Redundant Arrays of Independent Disks RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) + 1, with the data blocks stored on the other 4 disks.

32 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

33 1.Optical Disks : Compact disk-read only memory (CD-ROM) Digital Video Disk (DVD) Record once versions (CD-R and DVD-R) 2. Magnetic tapes : Hold large volumes of data and provide high transfer rates Very slow access time in comparison to magnetic disks and optical disks Used mainly for backup Overview of Physical Storage Media – Tertiary Memory

34 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

35 Terminologies : 1.Buffer – portion of main memory available to store copies of disk blocks. 2. Buffer manager – subsystem responsible for allocating buffer space in main memory. 3.Blocks - Data has to be in main memory for DBMS to operate it. So, it has to be transferred to and from disk. Hence it is organized into manageable units called Blocks. Storage Access

36 Source : cs.wisu.edu notes Buffer Manager

37 Requesting A Disk Page 22 disk page 3 Source : cs.wisu.edu notes MAIN MEMORY DISK BUFFER POOL 1232290 …… Higher level DBMS component I need page 3 Disk Mgr Buf Mgr 3

38 Page Replacement policies 22 3 23 20 21 1 5 51 2 4 7 11 9 29 30 71 8 41 What if all the pages in the buffer are full? 3 31 98 17 Disk1 Disk 2 Disk 3 Transfer from disk High level DBMS I need page 17

39 Page Replacement policies 1.FIFO 2.LRU Algorithm 3.MRU Algorithm (Pinning pages ) 4.Optimal Page replacement Algorithm

40 FIFO Page Replacement Policy

41 LRU Page Replacement Policy Source : technet.microsoft.com

42 Optimal Page Replacement Policy

43 Most Recently Used (MRU)  A block that is currently being used cannot be removed. So, the block can be pinned while in use. After being used, the block will be unpinned and it will become Most Recently used block, which can be replaced.  It can also be used to indicate that this block is not allowed to be written to disk as it is still under use. Page Replacement policies

44 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

45 File Organization A database is stored as a collection of files. Each file is a sequence of records, and a record is a sequence of fields. i)Fixed Length records ii)Variable length records

46 Consider type deposit = record account_number char(10); 1X10= 10 bytes branch_name char(22); 1X22= 22 bytes balance numeric(12,2); 8 bytes end Total : 40 bytes Strategy : First record stored in first 40 bytes and the second record in next 40 and so on….... File Organization: Fixed Length Records

47 Deletion is difficult – space occupied by deleted record must be occupied or we must have a way to mark deleted records so that it can be ignored. Unless block size happens to be multiple of 40, some records will cross boundaries, if they do so, then to access each record it will require two access to two blocks. Fixed Length Records - Problems

48 Deletion in Fixed length records Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 Consider Account records : Move all records one level up, following a deleted record. Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 *****But this requires large no: of moves.

49 An optimal way would be to use free lists. Header – Address of first record deleted. Delete Record 1,3 and 5, Deletion in Fixed length records – Free Lists Header : address of first record deleted Record 0A1Perry ridge100 Record 1A2Downtown200 Record 2A3Redwood300 Record 3A4Downtown400 Record 4A3Redwood300 Record 5A4Downtown400

50 Variable Length Records : Variable-length records arise in database systems in several ways: – Storage of multiple record types in a file. – Record types that allow variable length for one or more fields (e.g., varchar)

51 Approaches to store variable length records (Block Based): Each record is identified by a record identifier (rid) (or tuple identifier (tid)). The rid/tid contains number of block and position in block.

52 Approaches to store variable length records (Block Based):

53 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

54 Sequential – store records in sequential order, based on the value of the search key of each record Heap – a record can be placed anywhere in the file where there is space. Hashing – a hash function computed on some attribute of each record; the result specifies in which block of the file the record should be placed Multi table clustering file organization records of several different relations can be stored in the same file Organization of Records in Files

55 Sequential Organization Suitable for applications that require sequential processing of the entire file The records in the file are ordered by a search-key

56 Store several relations in one file using a multitable clustering file organization Multi table Clustering File Organization good for queries involving depositor customer, and for queries involving one single customer and his accounts bad for queries involving only customer

57 Outline Overview of Physical Storage Media -Primary Memory -Secondary Memory -Tertiary Memory Storage Access File Organization Organization of Records in Files Data-Dictionary Storage

58 Data Dictionary 1. Information about relations Relation name, Attribute names and types View names and definitions and Integrity constraints Physical location of relation 2. User and accounting information, including passwords 3. Statistical and descriptive data - number of tuples in each relation 4. File organization information (sequential/hash…) Data dictionary (also called system catalog) stores metadata: that is, data about data, such as

59 Data Dictionary A possible catalog/ Data dictionary representation: Relation_metadata = (relation_name, number_of_attributes, storage_organization, location) Attribute_metadata = (relation_name, attribute_name, domain_type, position, length) User_metadata = (user_name, encrypted_password, group) Index_metadata = (relation_name, index_name, index_type, index_attributes) View_metadata = (view_name, definition)

60


Download ppt "Storage and File Structure Malavika Srinivasan Prof. Franya Franek."

Similar presentations


Ads by Google