Storing Data: Disks and Buffers


Reading: R & G 9.1, 9.3, 9.4. The talk starts with a wrap-up of the previous lecture.
Acknowledgement: Slides are adapted from the Berkeley course CS186 by Joey Gonzalez and Joe Hellerstein.

Block Diagram of a DBMS
Layers, from top to bottom: SQL Client; Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Concurrency Control and Recovery work alongside these layers.

Where We Are
Last few lectures: SQL, the interface between the SQL client and the database management system. First few lectures: out-of-core sorting. Next few weeks: how is a SQL query executed?

Architecture of a DBMS: Query Parsing & Optimization
Parse, check, and verify the SQL expression, for example
SELECT s.sid, s.sname, r.bid FROM Sailors s, Reserves r WHERE s.sid = r.sid AND s.age > 30
and translate it into an efficient relational query plan (in the figure: a Heap Scan of Reserves, an Indexed Scan of Sailors, an Indexed Join, and a GroupBy (Age)).

Architecture of a DBMS: Relational Operators
Execute the dataflow of the query plan (Heap Scan Reserves, Indexed Scan Sailors, Indexed Join, GroupBy (Age)) by operating on records and files.

Architecture of a DBMS: Files and Index Management
Organize tables and records as groups of pages in a logical file. Each page has a page header followed by records; for example, a table with columns Name, Sex, Age, Zip, Addr holds records such as (Bob, M, 32, 94703, Harmon).

Architecture of a DBMS: Buffer Management
Provide the illusion of operating in memory: pages (e.g., Page 1, Page 4, Page 2) are held in frames in RAM.

Architecture of a DBMS: Disk Space Management
Translate page requests into physical bytes on one or more devices (Page 1 through Page 6 on disk).

Architecture of a DBMS
Organized in layers: Query Parsing & Optimization, Relational Operators, Files and Index Management, Buffer Management, Disk Space Management, Database. Each layer abstracts the layer below it, which manages complexity and lets each layer make performance assumptions about the one beneath. An example of good systems design.

Disk Space & Buffer Management
The database operates on in-memory pages. The buffer manager holds pages in frames (here Page 4, Page 2, Page 1) and reads and writes pages through disk space management, which stores Page 1 through Page 6 on disk.

Overview
Table → Record → Byte representation → Slotted page → File. A record of a table with columns Name, Sex, Age, Zip, Addr, such as (Bob, M, 32, 94703, Harmon), has typed fields (Varchar, Char, Int). Its byte representation is a record header followed by the fixed-length fields (M, 32, 94703) and then the variable-length fields (Bob, Harmon). Records are stored in slotted pages, each with a page header, and the pages (Page 1 through Page 8) make up a file.
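The byte representation above (header, then fixed-length fields, then variable-length fields) can be sketched with Python's struct module. The exact layout is an assumption for illustration, not the course's format: the header holds two 16-bit end offsets for the variable-length fields, Sex is a 1-byte char, Age and Zip are 32-bit ints, and encode_record/decode_record are hypothetical names.

```python
import struct

# Hypothetical layout (an assumption for illustration):
#   header:     two little-endian uint16 end offsets (end of Name, end of Addr)
#   fixed part: sex (1-byte char), age (int32), zip (int32), no padding ("<")
#   var part:   name bytes, then addr bytes
HEADER = struct.Struct("<HH")
FIXED = struct.Struct("<cii")
VAR_START = HEADER.size + FIXED.size  # variable-length data starts here

def encode_record(name: str, sex: str, age: int, zip_code: int, addr: str) -> bytes:
    name_b, addr_b = name.encode("utf-8"), addr.encode("utf-8")
    name_end = VAR_START + len(name_b)
    addr_end = name_end + len(addr_b)
    header = HEADER.pack(name_end, addr_end)
    fixed = FIXED.pack(sex.encode("ascii"), age, zip_code)
    return header + fixed + name_b + addr_b

def decode_record(buf: bytes) -> tuple:
    name_end, addr_end = HEADER.unpack_from(buf, 0)
    sex, age, zip_code = FIXED.unpack_from(buf, HEADER.size)  # constant offsets
    name = buf[VAR_START:name_end].decode("utf-8")
    addr = buf[name_end:addr_end].decode("utf-8")
    return name, sex.decode("ascii"), age, zip_code, addr

rec = encode_record("Bob", "M", 32, 94703, "Harmon")
print(len(rec), decode_record(rec))
```

Putting the fixed-length fields first means a field such as Age sits at a constant offset and can be read without scanning past the variable-length data.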

Overview: Files of Pages of Records
Tables are stored as logical files consisting of pages, each containing a collection of records. Pages are managed in memory by the buffer manager (higher levels of the database operate only in memory) and on disk by the disk space manager, which reads and writes pages to physical disks/files.
Big ideas in this lecture: reasoning at block/page granularity; exploiting access patterns in memory management; efficient binary representation of data.

Disk Space Management
The database operates on in-memory pages. The buffer manager's frames hold pages (e.g., Page 1, Page 2, Page 4), which are read from and written back to disk space management; it stores Page 1 through Page 8.

Recall: Disks and Files
The DBMS stores information on disks and SSDs. Disks are a mechanical anachronism (slow!): spinning platters and a moving arm assembly lead to roughly 10 ms random read latency.

Recall: Arranging Pages on Disk
The "next" page concept: pages on the same track, followed by pages on the same cylinder, followed by pages on an adjacent cylinder. Arrange file pages sequentially on disk to minimize seek and rotational delay. For a sequential scan, pre-fetch several pages at a time: read large consecutive blocks!
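Why sequential layout matters can be shown with back-of-the-envelope arithmetic. This is a rough model under stated assumptions: the ~10 ms random access latency comes from the disk slide, while the 100 MB/s transfer rate and 4 KB page size are assumed values chosen only for illustration.

```python
# Assumed parameters (hypothetical, for illustration only):
ACCESS_LATENCY_S = 0.010    # ~10 ms seek + rotational delay per random access
TRANSFER_RATE_BPS = 100e6   # assumed 100 MB/s sustained transfer rate
PAGE_BYTES = 4096           # assumed 4 KB pages

def random_read_time(n_pages: int) -> float:
    """Every page pays the full access latency plus its own transfer time."""
    return n_pages * (ACCESS_LATENCY_S + PAGE_BYTES / TRANSFER_RATE_BPS)

def sequential_read_time(n_pages: int) -> float:
    """One initial seek, then the pages stream off consecutive blocks."""
    return ACCESS_LATENCY_S + n_pages * PAGE_BYTES / TRANSFER_RATE_BPS

n = 1000
print(f"random:     {random_read_time(n):.3f} s")
print(f"sequential: {sequential_read_time(n):.3f} s")
```

Under these assumptions, reading 1000 pages takes about 10 s when each page is a random access, versus about 0.05 s when the pages are laid out consecutively.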

Recall: SSDs
The DBMS stores information on disks and SSDs. Disks are a mechanical anachronism (slow!). SSDs are faster, but slow relative to memory, and writes are costly.
Issues in the current generation (NAND): 4-8KB reads, 1-2MB writes.
So reads are fast and predictable: single read access time 0.03 ms; 4KB random reads ~500 MB/sec; sequential reads ~525 MB/sec.
But writes are not! They are slower for random access: single write access time 0.03 ms; 4KB random writes ~120 MB/sec; sequential writes ~480 MB/sec.
Write amplification: writes happen in big units, so the drive must reorganize data for garbage collection and wear leveling.
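Converting the slide's SSD throughputs into time per gigabyte makes the read/write asymmetry concrete. The sketch assumes "MB/sec" means 10**6 bytes per second; the helper name seconds_to_move is hypothetical.

```python
# Throughputs from the slide; "MB/sec" is assumed to mean 10**6 bytes/sec.
THROUGHPUT_MB_PER_S = {
    "4KB random read": 500,
    "sequential read": 525,
    "4KB random write": 120,
    "sequential write": 480,
}

def seconds_to_move(n_bytes: int, mb_per_sec: float) -> float:
    """Time to transfer n_bytes at a steady throughput of mb_per_sec."""
    return n_bytes / (mb_per_sec * 1e6)

ONE_GB = 10**9
for op, rate in THROUGHPUT_MB_PER_S.items():
    print(f"{op:>17}: {seconds_to_move(ONE_GB, rate):5.2f} s per GB")
```

Random and sequential reads come out nearly the same (about 2 s per GB), while random writes take four times as long as sequential writes (about 8.3 s vs. 2.1 s per GB).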

Disks and Files: DBMSs Operate at Block Level
The DBMS stores information on disks and SSDs. Disks are a mechanical anachronism (slow!); SSDs are faster, but slow relative to memory, with costly writes. So DBMSs operate at the block level: read and write large chunks of sequential bytes; leverage the cache hierarchy and hardware prefetching; amortize seek delays on HDDs and writes on SSDs. Sequentially, the next disk block is the fastest to read. Maximize the use of data per read/write, and organize data for fast in-memory processing (i.e., mapping).

A Note on (Confusing) Terminology
Block = unit of transfer for disk reads/writes. 64KB-128KB is a good number today; the book says 4KB.
Page = fixed-size contiguous chunk of memory. Assume it is the same size as a block, and that a page refers to the corresponding block on disk.
For simplicity, we use block and page interchangeably.

Disk Space Management
The lowest layer of the DBMS; it manages space on disk: mapping pages to locations on disk, loading pages from disk into memory, and saving pages back to disk while ensuring the writes complete. Higher levels call upon this layer to read/write a page and to allocate/de-allocate logical pages. A request for a sequence of pages is best satisfied by pages stored sequentially on disk. Physical details are hidden from higher levels of the system, which may assume that reading the next page is fast!

Disk Space Management Implementation
Proposal 1: Talk to the device directly. This could be very fast if you knew the device well. But what happens when devices change?
Proposal 2: Run over the filesystem (FS). Allocate a single large "contiguous" file and assume that sequential/nearby byte accesses are fast. Most filesystems optimize for sequential access and temporal locality (a buffer cache for hot items). FS buffering is sometimes disabled. The database may span multiple files on multiple disks/machines.
Speaker notes: The lowest layer could operate directly on the /dev/ "files"; these are not really files but commands sent directly to the device. This means bypassing the filesystem altogether and "rolling your own". In the old days this might have been a good idea, but then you would be writing your own device driver for each new device you want to run your database on ("eek"). Now people typically run databases on large files sitting on the filesystem, so you would allocate a large file to store the database. Side note: if this were an SSD, why would that be a bad idea?

Typically Sits on Top of the Local File System
Disk space management answers requests such as "Get Page 4" and "Get Page 5" by mapping pages (Page 1 through Page 9) onto big files (Big File 1, 2, 3), each stored on a local file system.
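One simple way to map logical page numbers onto several big files is fixed-size striping by page number. This is a hypothetical scheme for illustration: it assumes 4 KB pages, 3 pages per big file (matching the 9-page figure), and 1-based page and file numbers; locate and PageLocation are made-up names.

```python
from typing import NamedTuple

PAGE_SIZE = 4096      # assumed page size in bytes
PAGES_PER_FILE = 3    # assumed: each big file holds 3 pages, as in the 9-page figure

class PageLocation(NamedTuple):
    file_index: int   # which big file (1-based, like "Big File 1")
    byte_offset: int  # where the page starts inside that file

def locate(page_no: int) -> PageLocation:
    """Map a 1-based logical page number to (big file, byte offset)."""
    if page_no < 1:
        raise ValueError("page numbers start at 1")
    zero_based = page_no - 1
    file_index = zero_based // PAGES_PER_FILE + 1
    slot_in_file = zero_based % PAGES_PER_FILE
    return PageLocation(file_index, slot_in_file * PAGE_SIZE)

print(locate(4))  # "Get Page 4"
print(locate(5))  # "Get Page 5"
```

Because consecutive page numbers land at consecutive offsets within a file, a higher layer asking for the "next" page usually gets the adjacent bytes of the same file.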

Disk Space Management
Provide an API to read and write pages to the device. Pages: block-level organization of bytes on disk. Ensures "next" locality and abstracts filesystem/device details.
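A minimal sketch of such an API, following Proposal 2 (one large file on the filesystem). The class and method names are hypothetical, page ids are 0-based here, and the free list lives only in memory (a real system would persist it). Calling os.fsync after every write is the simplest way to "ensure writes" but is slow.

```python
import os
import tempfile

class DiskSpaceManager:
    """Hypothetical page API over a single large file (Proposal 2)."""

    def __init__(self, path: str, page_size: int = 4096):
        self.page_size = page_size
        # Open read/write; create the file if it does not exist yet.
        self.f = open(path, "r+b" if os.path.exists(path) else "w+b")
        self.f.seek(0, os.SEEK_END)
        self.num_pages = self.f.tell() // page_size
        self.free = []  # de-allocated page ids, kept in memory only

    def allocate_page(self) -> int:
        if self.free:
            return self.free.pop()  # reuse a de-allocated page
        page_id = self.num_pages
        self.num_pages += 1
        self.write_page(page_id, bytes(self.page_size))  # zero-fill the new page
        return page_id

    def deallocate_page(self, page_id: int) -> None:
        self.free.append(page_id)

    def read_page(self, page_id: int) -> bytes:
        self.f.seek(page_id * self.page_size)  # page i lives at byte i * page_size
        return self.f.read(self.page_size)

    def write_page(self, page_id: int, data: bytes) -> None:
        if len(data) != self.page_size:
            raise ValueError("data must be exactly one page")
        self.f.seek(page_id * self.page_size)
        self.f.write(data)
        self.f.flush()
        os.fsync(self.f.fileno())  # force the write to the device

    def close(self) -> None:
        self.f.close()

with tempfile.TemporaryDirectory() as tmp:
    dsm = DiskSpaceManager(os.path.join(tmp, "db.dat"), page_size=16)
    p0 = dsm.allocate_page()
    p1 = dsm.allocate_page()
    dsm.write_page(p1, b"hello, page one!")  # exactly 16 bytes
    page_one = dsm.read_page(p1)
    page_zero = dsm.read_page(p0)            # still zero-filled
    dsm.deallocate_page(p0)
    reused = dsm.allocate_page()             # gets the freed page id back
    dsm.close()
print(page_one, page_zero, reused)
```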

Mapping Pages into Memory
Disk space management stores Page 1 through Page 6. Pages are loaded into frames of the buffer pool in main memory (here Page 1 and Page 4) and written back to disk when needed; the database operates on in-memory pages.

Mapping Pages into Memory (cont.)
Data must be in RAM for the DBMS to operate on it. The buffer manager hides the fact that not all data is in RAM.

Mapping Pages into Memory (cont.)
Pages 2, 3, and 5 are loaded, filling all three frames of the buffer pool. Then a request for Page 1 arrives: every frame is occupied, so which page (5, 2, or 3) should be evicted?

Challenges for the Buffer Manager
What happens if a page is modified? "Dirty" pages must be written back to disk.
No free frames → which page should be evicted? Replacement policy.
What if a page is being used? Pinning.
Concurrent page operations? Solved in the lock manager...

When a Page Is Requested...
The buffer pool information "table" contains <frame#, pageid, pin_count, dirty>.
If the requested page is not in the pool: choose a frame for replacement (only "un-pinned" pages are candidates!); if the frame is "dirty", write its current page to disk; read the requested page into the frame.
Pin the page and return its address.
If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!
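The request steps above can be sketched as a small buffer pool. This is an illustration under assumptions, not a definitive implementation: a plain dict stands in for the disk space manager, the victim choice is a placeholder (an empty frame, else the first unpinned frame) where a real system would plug in LRU, Clock, or MRU, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class FrameInfo:
    """One row of the buffer pool table: <frame#, pageid, pin_count, dirty>."""
    frame_no: int
    page_id: Optional[int] = None
    pin_count: int = 0
    dirty: bool = False

class BufferPool:
    def __init__(self, num_frames: int, disk: Dict[int, bytearray]):
        self.disk = disk  # hypothetical stand-in for the disk space manager
        self.table = [FrameInfo(i) for i in range(num_frames)]
        self.frames: List[Optional[bytearray]] = [None] * num_frames
        self.page_to_frame: Dict[int, int] = {}
        self.reads = 0   # pages read from "disk"
        self.writes = 0  # dirty pages written back

    def _choose_victim(self) -> FrameInfo:
        # Placeholder policy (assumption): an empty frame, else the first unpinned one.
        for info in self.table:
            if info.page_id is None:
                return info
        for info in self.table:
            if info.pin_count == 0:  # only un-pinned pages are candidates
                return info
        raise RuntimeError("all frames are pinned")

    def get_page(self, page_id: int) -> bytearray:
        if page_id not in self.page_to_frame:
            info = self._choose_victim()
            if info.page_id is not None:
                if info.dirty:  # write the current page back before reuse
                    self.disk[info.page_id] = bytearray(self.frames[info.frame_no])
                    self.writes += 1
                del self.page_to_frame[info.page_id]
            self.frames[info.frame_no] = bytearray(self.disk[page_id])  # read into frame
            self.reads += 1
            info.page_id, info.dirty = page_id, False
            self.page_to_frame[page_id] = info.frame_no
        info = self.table[self.page_to_frame[page_id]]
        info.pin_count += 1                # pin the page
        return self.frames[info.frame_no]  # its in-memory "address"

    def unpin(self, page_id: int, dirty: bool) -> None:
        info = self.table[self.page_to_frame[page_id]]
        if info.pin_count == 0:
            raise RuntimeError("page is not pinned")
        info.pin_count -= 1
        info.dirty = info.dirty or dirty

disk = {p: bytearray(f"page {p}".encode()) for p in range(1, 7)}
pool = BufferPool(3, disk)
buf = pool.get_page(1)
buf[:6] = b"PAGE 1"        # modify the in-memory copy
pool.unpin(1, dirty=True)
for p in (2, 3):
    pool.get_page(p)
    pool.unpin(p, dirty=False)
pool.get_page(4)           # pool is full: dirty page 1 is written back, then evicted
print(disk[1], pool.reads, pool.writes)
```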

Mapping Pages into Memory
Note that the pages changed from the earlier example. Files and index management requests Page 6; the buffer pool, whose frames hold Page 5, Page 1, and Page 6, returns a pointer (*) to Page 6's frame. Frames in use are marked as pinned.

After the Requestor Finishes
The requestor of a page must: indicate whether the page was modified, via the dirty bit; and unpin it (soon, preferably! Why?).
A page in the pool may be requested many times, so a pin count is used. To pin a page: pin_count++. To unpin it: pin_count--.
A page is a candidate for replacement iff pin_count == 0 ("unpinned").
CC & recovery may do additional I/Os upon replacement (the Write-Ahead Log protocol; more later!).
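The pin-count rule can be illustrated with a single frame shared by two requestors. Frame and its methods are hypothetical names for this sketch; the point is that the page only becomes a replacement candidate after every requestor has unpinned it, and the dirty bit stays set once any requestor reports a modification.

```python
class Frame:
    """Hypothetical frame record showing how pin_count and the dirty bit evolve."""

    def __init__(self, page_id: int):
        self.page_id = page_id
        self.pin_count = 0
        self.dirty = False

    def pin(self) -> None:
        self.pin_count += 1  # one increment per outstanding request

    def unpin(self, modified: bool) -> None:
        if self.pin_count == 0:
            raise RuntimeError("unpin without a matching pin")
        self.pin_count -= 1
        self.dirty = self.dirty or modified  # stays dirty until written back

    def is_replacement_candidate(self) -> bool:
        return self.pin_count == 0  # iff unpinned

f = Frame(page_id=2)
f.pin()                # first requestor
f.pin()                # second requestor shares Page 2
f.unpin(modified=True)
mid = (f.pin_count, f.is_replacement_candidate())  # still pinned by one requestor
f.unpin(modified=False)
print(mid, f.pin_count, f.is_replacement_candidate(), f.dirty)
```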

Mapping Pages into Memory
Note that the pages changed from the earlier example. The buffer pool holds Page 5, Page 2, and Page 3, with pin counts 1, 2, and 1. When a requestor finishes with Page 2 (to which it holds a pointer), it unpins the page, decrementing its pin count.

Page Replacement Policy
The page to replace is chosen by a replacement policy: least-recently-used (LRU), Clock, most-recently-used (MRU), and others. The policy can have a big impact on the number of I/Os, depending on the access pattern. Replacement policies were once a hot area of research, but LRU/Clock have become the most popular choices.

LRU Replacement Policy
Least Recently Used (LRU): pinned frames are not available for replacement. Track the time each frame was last unpinned (end of use), and replace the frame that was least recently unpinned.
Very common policy: intuitive and simple. Works well for repeated accesses to popular pages (temporal locality).
Can be costly. Why? A heap data structure must be maintained over the unpin times. Solution?
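LRU bookkeeping can be sketched in Python. Note that this swaps in a different structure than the slide mentions: instead of a heap keyed on unpin time, it uses collections.OrderedDict, whose insertion order already records "least recently unpinned first" and supports constant-time moves. LRUReplacer and its method names are hypothetical.

```python
from collections import OrderedDict
from typing import Optional

class LRUReplacer:
    """Tracks unpinned frames, oldest unpin first (OrderedDict instead of a heap)."""

    def __init__(self):
        self._unpinned = OrderedDict()  # frame_no -> None, in order of last unpin

    def unpinned(self, frame_no: int) -> None:
        # Called when a frame's pin count drops to 0 (end of use).
        self._unpinned.pop(frame_no, None)
        self._unpinned[frame_no] = None  # now the most recently unpinned

    def pinned(self, frame_no: int) -> None:
        # Pinned frames are not available for replacement.
        self._unpinned.pop(frame_no, None)

    def victim(self) -> Optional[int]:
        # Least recently unpinned frame, or None if every frame is pinned.
        if not self._unpinned:
            return None
        frame_no, _ = self._unpinned.popitem(last=False)
        return frame_no

r = LRUReplacer()
for frame in (0, 1, 2):
    r.unpinned(frame)
r.pinned(0)      # frame 0 is used again...
r.unpinned(0)    # ...and released: now the most recent
first_victim = r.victim()   # frame 1
r.pinned(2)
second_victim = r.victim()  # frame 0 (frame 2 is pinned)
print(first_victim, second_victim)
```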

Clock Replacement Policy
An approximation of LRU. Arrange the frames in a logical cycle (frames A through G in the figure) and introduce a reference ("ref.") bit for each frame. The clock arm advances until a request is satisfied.

Want to insert a page: the current frame has pin_count > 0 → skip it (pinned).

Want to insert a page: the frame is not pinned and its ref. bit is set → clear the ref. bit and move on.

Want to insert a page: the frame is not pinned and its ref. bit is unset → evict its page. Copy the new page into the frame, pin it, set its ref. bit, and advance the clock arm.

Request for Page C: cache hit! Pin it (increment its pin count) and set its ref. bit.
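The Clock steps above can be sketched as follows. It is a simplified, hypothetical buffer: dirty-page write-back is omitted, pages are identified by strings, and hits are found with a linear scan instead of a page table. Two full sweeps of the hand are enough to find a victim if any frame is unpinned, since the first sweep may only clear ref bits.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClockFrame:
    page: Optional[str] = None  # None means the frame is empty
    pin_count: int = 0
    ref_bit: bool = False

class ClockBuffer:
    """Hypothetical Clock buffer; dirty-page write-back is omitted for brevity."""

    def __init__(self, num_frames: int):
        self.frames: List[ClockFrame] = [ClockFrame() for _ in range(num_frames)]
        self.hand = 0

    def _advance(self) -> None:
        self.hand = (self.hand + 1) % len(self.frames)

    def request(self, page: str) -> bool:
        """Pin `page`, loading it on a miss. Returns True on a cache hit."""
        for frame in self.frames:
            if frame.page == page:  # cache hit: pin (inc.) and set ref bit
                frame.pin_count += 1
                frame.ref_bit = True
                return True
        for _ in range(2 * len(self.frames)):
            frame = self.frames[self.hand]
            if frame.pin_count > 0:  # pinned: skip
                self._advance()
            elif frame.ref_bit:      # recently used: clear ref bit, skip
                frame.ref_bit = False
                self._advance()
            else:                    # empty, or unpinned with ref bit unset: evict
                frame.page, frame.pin_count, frame.ref_bit = page, 1, True
                self._advance()
                return False
        raise RuntimeError("all frames are pinned")

    def unpin(self, page: str) -> None:
        for frame in self.frames:
            if frame.page == page and frame.pin_count > 0:
                frame.pin_count -= 1
                return
        raise KeyError(page)

buf = ClockBuffer(3)
for p in ("A", "B", "C"):
    buf.request(p)
    buf.unpin(p)
hit_d = buf.request("D")  # miss: the sweep clears A, B, C's ref bits, then evicts A
pages_after = [f.page for f in buf.frames]
hit_c = buf.request("C")  # cache hit
print(hit_d, pages_after, hit_c)
```

Compared with the LRU sketch, Clock only flips a bit on each hit instead of reordering a data structure, which is why it is the cheap approximation.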

Is LRU/Clock Always Best?
Very common policy: intuitive and simple. Works well for repeated accesses to popular pages (temporal locality). LRU can be costly → the Clock policy is cheap.
When might it perform poorly? What about repeated scans of big files?

Repeated Scan of Big File (LRU)
Setup: a buffer of three frames, initially empty, and a file of four pages, A, B, C, D, that is scanned repeatedly.
Attempts 1-3: A, B, and C are read into the empty frames (misses).
Attempt 4: D misses; LRU evicts A, the least recently used page.
Attempt 5 (second scan): A misses; LRU evicts B.
Attempt 6: B misses; LRU evicts C.
Every page is evicted just before it is needed again. This is sequential flooding: no cache hits after 6 attempts!
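The walkthrough above can be replayed with a tiny simulator. It simplifies by ignoring pins (each page is released right after it is accessed); simulate_lru is a hypothetical helper name.

```python
from collections import OrderedDict

def simulate_lru(num_frames: int, accesses: list) -> int:
    """Count cache hits under LRU, assuming each access is unpinned immediately."""
    buffer = OrderedDict()  # page -> None, least recently used first
    hits = 0
    for page in accesses:
        if page in buffer:
            hits += 1
            buffer.move_to_end(page)        # now the most recently used
        else:
            if len(buffer) == num_frames:
                buffer.popitem(last=False)  # evict the least recently used
            buffer[page] = None
    return hits

scan = ["A", "B", "C", "D"]
print(simulate_lru(3, (scan * 2)[:6]))  # the 6 attempts on the slides
print(simulate_lru(3, scan * 10))       # flooding persists for any number of scans
```

Both calls report zero hits: whenever the file has even one page more than the buffer has frames, LRU evicts exactly the page the scan needs next.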

Repeated Scan of Big File (MRU)
Same setup, now with the Most Recently Used (MRU) policy.
Attempts 1-3: A, B, and C are read into the empty frames (misses).
Attempt 4: D misses; MRU evicts C, the most recently used page.
Attempt 5 (second scan): A, hit! (1 cache hit.)
Attempt 6: B, hit! (2 cache hits.)
Attempt 7: C misses; MRU evicts B.
Attempt 8: D, hit! (3 cache hits.)
3 cache hits in 8 attempts: an improved cache hit rate! What else might we do for sequential scans?
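The same scan can be replayed under MRU. As in the LRU case, this hypothetical simulator ignores pins and keeps the buffer as a list ordered from least to most recently used, so evicting the most recently used page is a pop from the end.

```python
def simulate_mru(num_frames: int, accesses: list) -> int:
    """Count cache hits under MRU, assuming each access is unpinned immediately."""
    buffer = []  # pages in frames, least recently used first
    hits = 0
    for page in accesses:
        if page in buffer:
            hits += 1
            buffer.remove(page)
            buffer.append(page)  # now the most recently used
        else:
            if len(buffer) == num_frames:
                buffer.pop()     # evict the most recently used page
            buffer.append(page)
    return hits

scan = ["A", "B", "C", "D"]
print(simulate_mru(3, (scan * 2)[:8]))  # the 8 attempts on the slides
```

The result, 3 hits in 8 attempts, matches the walkthrough above.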

Background Prefetching
Load (prefetch) the "next" pages in a background thread: while the scan processes page A, pages B and C are already being prefetched. Why does this help? Disk scheduling and parallel I/O; interleaving I/O and compute.
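Background prefetching can be sketched with a thread pool that keeps a few reads in flight ahead of the scan. The read function, the prefetch depth of 2, and the name prefetching_scan are assumptions for illustration; slow_read merely sleeps to stand in for disk latency.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def prefetching_scan(read_page, num_pages: int, depth: int = 2):
    """Yield pages 0..num_pages-1 in order, with up to `depth` later pages read in the background."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        for page_no in range(num_pages):
            pending.append(pool.submit(read_page, page_no))  # issue the read early
            if len(pending) > depth:
                yield pending.popleft().result()  # oldest page; later ones are in flight
        while pending:
            yield pending.popleft().result()

def slow_read(page_no: int) -> str:
    time.sleep(0.01)  # hypothetical stand-in for disk latency
    return f"page {page_no}"

pages = list(prefetching_scan(slow_read, 5))
print(pages)
```

While the consumer works on one page, the pool is already reading the next ones, so I/O and compute overlap instead of alternating.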

DBMS vs. OS Buffer Cache
The OS has page/buffer management. Why not let the OS manage these tasks? Buffer management in a DBMS requires the ability to: pin a page in the buffer pool; force a page to disk and order writes (important for implementing CC & recovery); and adjust the replacement policy and pre-fetch pages based on the access patterns of typical DB operations.
