Database Systems November 2, 2011 Lecture #7.

Announcements Assignment #2 is due today. Assignment #3 is out today and due on 11/16. The midterm exam is in class next week; it covers Chapters 1, 2 (except 2.7), 3, 4.1, 4.2, and 5.

Structure of DBMS (layers, from top to bottom): Query Optimization and Execution; Relational Operators; Files and Access Methods (organize records into pages and files); Buffer Manager (manages traffic between disk and main memory, bringing pages from disk into main memory); Disk Space Manager (manages space, i.e. pages, on disk).

Storing Data: Disks and Files (Chapter 9)

Disks and Files DBMS stores information on (“hard”) disks. This has major performance implications for DB system design! READ: transfer data from disk to main memory (RAM). WRITE: transfer data from RAM to disk. Both are high-cost operations, relative to in-memory operations, so must be planned carefully!

Why Not Store Everything in Main Memory? It costs too much: NTD 8,000 for 8 GB of SDRAM (NTD 1,000 per GB) vs. NTD 4,000 for 1,000 GB of HD (NTD 4 per GB). Main memory is also volatile, and we want data to be saved between runs. Typical storage hierarchy: main memory (RAM) for currently used data; disk for the main database (secondary storage); tapes for archiving older versions of the data (backup storage), or just disk-to-disk backup.

Disks Secondary storage device of choice. Main advantage over tapes: random access vs. sequential access. Tapes are for data backup, not for operational data: accessing the last byte on a tape requires winding through the entire tape. Data is stored and retrieved in units called disk blocks or pages. Unlike RAM, the time to retrieve a disk page varies depending on its location on disk. Therefore, the relative placement of pages on disk has a major impact on DBMS performance!

Components of a Disk (figure: spindle, platters, tracks, sectors, disk head, arm assembly) The platters spin. The arm assembly is moved in or out to position a head on a desired track. The tracks under the heads make a cylinder. Only one head reads/writes at any one time. Block size is a multiple of sector size (which is fixed).

Accessing a Disk Page The time to access (read/write) a disk block is called the access time. It is the sum of: seek time (moving the arm to position the disk head on the right track), rotational delay (waiting for the block to rotate under the head), and transfer time (actually moving data to/from the disk surface). Seek time and rotational delay (the mechanical parts) dominate the access time. Seek time varies from about 1 to 20 msec (avg. 10 msec). Rotational delay varies from 0 to 8 msec (avg. 4 msec). Transfer rate is about 100 MBps (roughly 0.04 msec per 4 KB page). Two pages are "close" if there is little seek/rotational delay in accessing the second page after the first.
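
As a rough sanity check of these numbers, the access-time formula can be sketched in Python. The parameter values below are the slide's averages, not measurements, and the function name is illustrative:

```python
def access_time_ms(seek_ms, rotational_ms, page_kb, transfer_mb_per_s):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    transfer_ms = page_kb / 1024 / transfer_mb_per_s * 1000
    return seek_ms + rotational_ms + transfer_ms

# Slide averages: 10 ms seek, 4 ms rotational delay, 100 MB/s transfer rate.
t = access_time_ms(10, 4, 4, 100)   # one 4 KB page: about 14 ms total
mechanical_share = (10 + 4) / t     # seek + rotation dominate the total
```

The transfer term is tiny next to the mechanical terms, which is exactly why placing related pages close together (avoiding seeks and rotations) matters so much.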

How to Reduce I/O Cost? Access time = seek time + rotational latency + transfer time. To lower I/O cost, reduce the seek/rotational delays! How do we reduce seek/rotational delays for a large I/O request spanning many pages? If two pages of records are frequently accessed together, put them close together on disk ("close" meaning little seek/rotational delay between accessing the first page and the second).

Arranging Pages on Disk The "next" block concept measures the closeness of blocks: (1) blocks on the same track (no movement of the arm), followed by (2) blocks on the same cylinder (switch heads, but almost no movement of the arm), followed by (3) blocks on an adjacent cylinder (little movement of the arm). Blocks in a file should be arranged sequentially on disk (by "next") to minimize seek and rotational delay. For a sequential scan, pre-fetching several pages at a time is a big win!

Next Block Concept (figure: numbered blocks on the platters, showing same-track, same-cylinder, and adjacent-cylinder neighbors)

RAID RAID = Redundant Arrays of Independent (Inexpensive) Disks. Disk array: an arrangement of several disks that gives the abstraction of a single, large disk. Goals: increase performance and reliability. Say you have D disks and each I/O request wants D blocks: how do we improve performance (data transfer rate)? Say you have D disks and D I/O requests, each wanting one block: how do we improve performance (request service rate)? Say you have D disks and at most one disk can fail at any time: how do we improve reliability (in case of a disk failure)?

Two Main Techniques in RAID Data striping improves performance: data (e.g., blocks of the same file) is partitioned across multiple HDs; the size of a partition is called the striping unit. The performance gain comes from reading/writing multiple HDs at the same time. Redundancy improves reliability: data striping by itself lowers reliability (more disks means more failures), so redundant information is stored on different disks; when a disk fails, its data can be reconstructed from the other disks.

RAID Levels Level 0: No redundancy (only data striping) Level 1: Mirrored (two identical copies) Level 0+1: Striping and Mirroring (Level 2: Error-Correcting Code) Level 3: Bit-Interleaved Parity Level 4: Block-Interleaved Parity Level 5: Block-Interleaved Distributed Parity (Level 6: Error-Correcting Code) More Levels (01-10, 03/30, …)

RAID Level 0 Stripe data across all drives (minimum 2 drives): sequential blocks of data (in the same file) are written across multiple disks in stripes. Two performance criteria: data transfer rate (net transfer rate for a single, large file) and request service rate (the rate at which multiple requests, from different files, can be serviced).

RAID Level 0 Improves the data transfer rate: reading 10 blocks (1~10) striped over 5 disks takes only about a 2-block access time (the worst of the 5 disks). Theoretical speedup over a single disk = N (the number of disks). Improves the request service rate: if file 1 occupies block 1 and file 2 occupies block 2, the two requests (two files) can be serviced at the same time. Given N disks, the theoretical speedup over a single disk = N.
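
The striping layout can be sketched as a simple mapping. Blocks are numbered from 0 here (the slide numbers them from 1), and the function name is illustrative:

```python
def stripe_location(block_no, n_disks):
    """RAID-0 layout: logical block b lives on disk (b mod N),
    at stripe row (b div N)."""
    return block_no % n_disks, block_no // n_disks

# 10 blocks across 5 disks: each disk holds 2 blocks, so a full read
# costs roughly the time of 2 block accesses instead of 10.
disk, stripe = stripe_location(7, 5)   # block 7 sits on disk 2, stripe 1
```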

RAID Level 0 Poor reliability: if the Mean Time To Failure (MTTF) of one disk is 50K hours (5.7 years), the MTTF of a disk array of 100 disks is 50K/100 = 500 hours (about 21 days)! MTTF decreases linearly with the number of disks. Space redundancy overhead? None.
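
The arithmetic behind that claim is a one-liner (a sketch using the slide's 50K-hour figure):

```python
def array_mttf(disk_mttf_hours, n_disks):
    """With no redundancy, any single failure loses data, so the array's
    MTTF falls linearly with the number of disks."""
    return disk_mttf_hours / n_disks

hours = array_mttf(50_000, 100)   # 500 hours for a 100-disk array
days = hours / 24                 # about 21 days
```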

Mirrored (RAID Level 1) Redundancy by duplicating data on different disks: mirroring means each file is copied to both disks. Simple but expensive. Fault-tolerant to a single disk failure: recovery is by copying data from the surviving disk to the new disk, and the surviving copy can continue to service requests (availability) during recovery.

Mirrored (RAID Level 1) Reliability, not performance, is the objective. Mirroring is frequently used when availability is more important than storage efficiency. Data transfer rate: write performance may be slower than a single disk. Why? A write completes at the speed of the worse of the 2 disks. Read performance can be faster than a single disk. Why? Consider reading block 1 from disk 0 and block 2 from disk 1 at the same time. How does read performance compare to RAID Level 0? Better, but why?

Mirrored (RAID Level 1) Data reliability: assume the Mean Time To Repair (MTTR) is 1 hour (shorter with hot-swap HDs). The MTTF of a mirrored 2-disk pair = 1 / (probability that both disks fail within the same repair window) = MTTF² / (2 × MTTR) = (50K)² / 2 hours, i.e., many, many years. Space redundancy overhead: 50%.
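
Plugging the slide's numbers into that formula (a sketch; the function name is illustrative):

```python
def mirrored_mttf_hours(disk_mttf, mttr):
    """Mean time to data loss for a mirrored pair: data is lost only when
    the second disk fails during the first disk's repair window, giving
    approximately MTTF^2 / (2 * MTTR)."""
    return disk_mttf ** 2 / (2 * mttr)

hours = mirrored_mttf_hours(50_000, 1)   # (50K)^2 / 2 hours
years = hours / (24 * 365)               # "many, many years" indeed
```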

Striping and Mirrors (RAID 0+1) (figure: a striped array of disks mirrored onto a second set of disks, 5 through 9)

Bit-Interleaved Parity (RAID Level 3) Fine-grained striping at the bit level, with one parity disk: the parity bit value = XOR across all data bit values. If one disk fails, recover the lost data by XOR-ing all the surviving data bit values with the parity bit value.
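
The XOR recovery rule can be demonstrated in a few lines (a sketch with made-up bit values):

```python
from functools import reduce

def parity(bits):
    """Parity bit = XOR across all data bits."""
    return reduce(lambda a, b: a ^ b, bits)

def recover(surviving_bits, parity_bit):
    """XOR the surviving bits with the parity bit to rebuild the lost bit."""
    return reduce(lambda a, b: a ^ b, surviving_bits, parity_bit)

data = [1, 0, 1, 1]                 # one bit per data disk
p = parity(data)
lost = data[2]                      # pretend disk 2 failed
survivors = data[:2] + data[3:]
rebuilt = recover(survivors, p)     # equals the lost bit
```

The same identity works byte- or block-wide, since XOR is applied bitwise.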

Bit-Interleaved Parity (RAID Level 3) Performance: transfer rate speedup? x32 over a single disk (with 32 data disks). Request service rate improvement? Same as a single disk (it does one request at a time). Reliability: can tolerate 1 disk failure. Space overhead: one parity disk (1/33 overhead).

Block-Interleaved Parity (RAID Level 4) Coarse-grained striping at the block level; otherwise, it is similar to RAID 3. If one disk fails, recover the lost block by reading the same block from all surviving disks (including the parity disk) and reconstructing it.

Block-Interleaved Parity (RAID Level 4) Performance: if a disk has failed, a read/write of a block touches the same block on all disks (worst-of-N on one block). If no disk has failed, a write must also update (read-and-write) the parity block, but there is no need to read the other disks: the new parity can be computed from the old data, the new data, and the old parity: new parity = (old data XOR new data) XOR old parity. This results in a bottleneck on the parity disk (only one write can proceed at a time)! How do we remove this bottleneck?
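
The small-write parity trick can be verified against a full recomputation (a sketch with arbitrary 4-bit block values):

```python
def update_parity(old_data, new_data, old_parity):
    """RAID-4 small-write trick: new parity = (old data XOR new data) XOR
    old parity, so only the target data disk and the parity disk are touched."""
    return (old_data ^ new_data) ^ old_parity

blocks = [0b1010, 0b0110, 0b0001, 0b1111]          # one block per data disk
old_parity = blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]

new_block = 0b0101                                 # overwrite disk 1's block
fast = update_parity(blocks[1], new_block, old_parity)

blocks[1] = new_block                              # recompute from scratch
full = blocks[0] ^ blocks[1] ^ blocks[2] ^ blocks[3]
```

Both routes yield the same parity, which is why the write needs only two disk reads and two disk writes regardless of the array's width.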

Block-Interleaved Parity (RAID Level 4) Reliability: Can tolerate 1 disk failure. Space redundancy overhead: 1 parity disk

Block-Interleaved Distributed-Parity (RAID Level 5) Removes the parity-disk bottleneck of RAID Level 4 by distributing the parity uniformly over all of the disks; there is no single parity disk to act as a bottleneck. Otherwise, it is the same as RAID 4. Write performance improves: writes to different disks (each touching a data/parity pair) can proceed in parallel. Reliability and space redundancy are the same as RAID Level 4.

Structure of DBMS (layers, from top to bottom): Query Optimization and Execution; Relational Operators; Files and Access Methods (organize records into pages and files); Buffer Manager (manages traffic between disk and main memory, bringing pages from disk into main memory); Disk Space Manager (manages space, i.e. pages, on disk).

Disk Space Manager The lowest layer of DBMS software; it manages space on disk. Higher levels call upon this layer to: allocate/de-allocate a page; read/write a page. A request for a sequence of pages should be satisfied by allocating the pages sequentially on disk! This supports the "next" block concept (reducing I/O cost when multiple sequential pages are requested at the same time). Higher levels (e.g., the buffer manager) don't need to know how this is done, or how free space is managed.

More on the Disk Space Manager Keeping track of free (and used) blocks: either a list of free blocks plus a pointer to the first free block, or a bitmap with one bit for each disk block (bit = 1 means used, bit = 0 means free). The bitmap approach makes it easy to identify contiguous areas on disk.
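
A minimal sketch of the bitmap approach, showing why it makes contiguous allocation easy (the class and method names are illustrative):

```python
class BlockBitmap:
    """Bitmap free-space tracking: bit 1 = used, bit 0 = free."""
    def __init__(self, n_blocks):
        self.bits = [0] * n_blocks

    def allocate_contiguous(self, n):
        """Scan for a run of n free blocks (the bitmap's advantage over a
        free list), mark them used, and return the starting block number."""
        run = 0
        for i, bit in enumerate(self.bits):
            run = run + 1 if bit == 0 else 0
            if run == n:
                start = i - n + 1
                for j in range(start, i + 1):
                    self.bits[j] = 1
                return start
        return -1                     # no contiguous run of that size

bm = BlockBitmap(8)
bm.bits[2] = 1                        # block 2 is already in use
start = bm.allocate_contiguous(3)     # first free run of 3: blocks 3..5
```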

Buffer Manager (figure: page requests from higher levels; a buffer pool of frames in main memory; the database on disk; the choice of frame is dictated by the replacement policy) Typically, the DBMS has more data than main memory, so data must be brought into main memory for the DBMS to operate on it. The buffer pool contains frames; each frame can hold one disk page. When the buffer pool is full and more data must be brought in, an existing frame must be replaced, which requires a replacement policy. A table of <frame#, page id> pairs is maintained.

When a Page is Requested ... If the requested page is not in the pool (and there is no free frame): choose an occupied frame for replacement, using the page replacement policy (which tries to minimize the page miss rate); if the replaced frame is dirty, write it to disk; read the requested page into the chosen frame; pin the page and return its address. For each frame, the buffer manager maintains: pin_count (the number of outstanding requests) and a dirty bit (the page was modified and needs to be written back to disk). If requests can be predicted (e.g., sequential scans), several pages can be pre-fetched at a time.

More on the Buffer Manager The requestor of a page must unpin it (when it is no longer needed) and indicate whether the page has been modified: the dirty bit is used for this. A page in the pool may be requested many times, so a pin count is used. A page is a candidate for replacement iff its pin count = 0.

Buffer Replacement Policy The frame to replace is chosen by a replacement policy: FIFO, MRU, Random, etc. Which policy is considered the best? Least-recently-used (LRU): keep an LRU queue of frames with pin_count = 0 (least recently used = has not been accessed for the longest time). What is the overhead of implementing LRU? Clock (approximates LRU with less overhead): use an additional reference bit per frame, set to 1 when the frame is accessed. The clock hand moves from frame 0 to frame N, resetting the reference bits of recently accessed frames and replacing frames with reference bit = 0 and pin_count = 0. The policy can have a big impact on the number of I/Os; it depends on the access pattern.
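
A minimal sketch of the clock victim search (class and field names are illustrative; it assumes at least one unpinned frame exists):

```python
class ClockBuffer:
    """Clock replacement: approximates LRU with one reference bit per frame."""
    def __init__(self, n_frames):
        self.frames = [None] * n_frames   # page id held by each frame
        self.ref = [0] * n_frames         # reference bit, set to 1 on access
        self.pin = [0] * n_frames         # pin count per frame
        self.hand = 0

    def choose_victim(self):
        """Sweep the hand, clearing reference bits (a 'second chance'),
        until it finds an unpinned frame whose reference bit is 0."""
        while True:
            if self.pin[self.hand] == 0:
                if self.ref[self.hand] == 0:
                    victim = self.hand
                    self.hand = (self.hand + 1) % len(self.frames)
                    return victim
                self.ref[self.hand] = 0   # recently used: spare it this pass
            self.hand = (self.hand + 1) % len(self.frames)

buf = ClockBuffer(3)
buf.frames = ['A', 'B', 'C']
buf.ref = [1, 0, 1]                       # frames 0 and 2 recently accessed
victim = buf.choose_victim()              # frame 1 is the victim
```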

Clock Algorithm Example (figure: a buffer pool with the clock hand sweeping over frames whose pin_count = 0, clearing their reference bits)

LRU May Not Be Good: Sequential Flooding Setting: #buffer frames = 2; #pages in the file = 3 (P1, P2, P3); LRU policy; repeated sequential scans of the file. How many page I/Os result? When the number of buffer frames is smaller than the number of pages in the file, every scan of the file ends up reading every page of the file: each access is a miss.
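
The flooding effect can be reproduced with a small LRU simulator (a sketch; the function name is illustrative):

```python
from collections import OrderedDict

def lru_misses(n_frames, refs):
    """Count page misses under LRU for a page reference string."""
    pool = OrderedDict()                  # page -> None, oldest entry first
    misses = 0
    for page in refs:
        if page in pool:
            pool.move_to_end(page)        # hit: mark as most recently used
        else:
            misses += 1
            if len(pool) == n_frames:
                pool.popitem(last=False)  # evict the least recently used
            pool[page] = None
    return misses

# 2 frames, 3 repeated sequential scans of a 3-page file:
scan = ['P1', 'P2', 'P3'] * 3
misses = lru_misses(2, scan)              # every one of the 9 accesses misses
```

With this access pattern, MRU would keep one page resident across scans, which is why "best policy" depends on the workload.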

DBMS vs. OS File System The OS also does disk space and buffer management: why not let the OS manage these tasks? The DBMS can better predict page reference patterns and pre-fetch pages: it can adjust its replacement policy and pre-fetch pages based on the access patterns of typical DB operations. The DBMS also needs to pin a page in memory and to force a page to disk. Differences in OS support raise portability issues. The DBMS may further need to maintain a virtual file that spans multiple disks.

Files of Records Higher levels of the DBMS operate on records and files of records. FILE: a collection of pages, each containing a collection of records. A file must support: insert/delete/modify of record(s); reading a particular record (specified by its record id); scanning all records (possibly with some conditions on the records to be retrieved). To support record-level operations, we must keep track of: fields in a record (record format), records on a page (page format), and pages in a file (file format).

Record Formats (How to Organize Fields in a Record): Fixed Length Information about field types and offsets is the same for all records in a file and is stored in the system catalogs. Finding the i-th field requires adding offsets to the base address: e.g., with base address B and field lengths L1 and L2, the third field starts at address B + L1 + L2.

Record Formats: Variable Length Two alternative formats (the number of fields is fixed): (1) fields delimited by special symbols (e.g., $), preceded by a field count; (2) an array of field offsets at the front of the record. The second alternative offers direct access to the i-th field and efficient storage of nulls (a special "don't know" value), at the cost of a small directory overhead.
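
The array-of-offsets alternative can be sketched as follows (the field values and function names are illustrative; a zero-length field models a null):

```python
def pack_record(fields):
    """Array-of-offsets format: a header of field end-offsets, then the
    field bytes. Offsets give direct access to the i-th field."""
    body = b''.join(fields)
    offsets, pos = [], 0
    for f in fields:
        pos += len(f)
        offsets.append(pos)        # end offset of each field
    return offsets, body

def get_field(offsets, body, i):
    """Direct access: field i spans [end of field i-1, end of field i)."""
    start = offsets[i - 1] if i > 0 else 0
    return body[start:offsets[i]]

offsets, body = pack_record([b'Alice', b'', b'ECE'])   # b'' is a null field
third = get_field(offsets, body, 2)
```

Note how the null costs nothing in the body: two consecutive equal offsets encode it, which is the "efficient storage of nulls" the slide mentions.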

Page Formats (How to Store Records in a Page): Fixed-Length Records Since records are fixed length, a page can be divided into fixed-size slots, one record per slot. Record id = <page id, slot #>. Two formats: PACKED (records occupy slots 1..N contiguously, with a count of the number of records) and UNPACKED with a BITMAP (one bit per slot marks which slots hold records). They differ in how deletion (which creates a hole) is handled: in the packed alternative, the remaining records are shifted to fill the hole, which changes rids and may not be acceptable given external references.

Page Formats: Variable-Length Records Since the sizes of the records are not known in advance, the page cannot be divided into slots ahead of time. Instead, a slot directory contains one slot per record; each slot holds (record offset, record length), and the directory also keeps a pointer to the start of the free space and the number of slots N. Deletion sets the record offset to -1. Records can be moved around on the page without changing their rids (the record offset changes, but the slot number stays the same); so this format is attractive for fixed-length records too.
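
A minimal sketch of a slotted page (class and method names are illustrative; free-space compaction is omitted):

```python
class SlottedPage:
    """Slot directory for variable-length records: each slot holds
    (record offset, record length); offset -1 marks a deleted record."""
    def __init__(self):
        self.data = b''                   # record bytes on the page
        self.slots = []                   # directory: (offset, length) per slot

    def insert(self, record):
        self.slots.append((len(self.data), len(record)))
        self.data += record
        return len(self.slots) - 1        # slot number: the rid within the page

    def delete(self, slot):
        self.slots[slot] = (-1, 0)        # rids of other records are unchanged

    def get(self, slot):
        offset, length = self.slots[slot]
        return None if offset == -1 else self.data[offset:offset + length]

page = SlottedPage()
r0 = page.insert(b'rec-one')
r1 = page.insert(b'record-two')
page.delete(r0)                           # r1 still resolves to the same record
```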

Unordered (Heap) Files The simplest file structure; it contains records in no particular order. As the file grows and shrinks, disk pages are allocated and de-allocated. How would you implement a heap file (as a data structure)? Two options: doubly linked lists, or a page directory.

Heap File (Doubly Linked Lists) The header page keeps two doubly linked lists of data pages: full pages, and pages with free space. The header page id and the heap file name must be stored someplace. Each page contains two "pointers" plus data. The problem: inserting a variable-size record requires walking through the free-space list to find a page with enough space.

Heap File (Page Directory) The directory is a collection of pages; each directory page contains multiple directory entries, one per data page. A directory entry contains <page id, free bytes on the page>. This eliminates the problem of the doubly-linked-list approach: to place a record, scan the (much smaller) directory rather than the data pages themselves.

Summary How to reduce disk I/O cost? Single-block access: disk access time = seek time + rotational latency + transfer time. Multi-block access: the "next" block concept. RAID improves performance and reliability. Disk space manager. Buffer replacement algorithms, including the clock algorithm. Formats: record, page, and file; fixed vs. variable sizes.