Published by Anton Gellatly. Modified over 10 years ago.
1
5. Disk, Pages and Buffers: Why Not Store Everything in Main Memory?
Why use disks at all?
Main memory costs too much. $1000 will buy you either 1 GB of RAM or 1 TB of disk today (maybe more!), i.e., roughly 1000 times as much disk as RAM per dollar.
Main memory is volatile. We want data to be saved between program runs (data persistence or residuality).
Main memory is smaller. System disks typically hold orders of magnitude more data than RAM (TBs vs. GBs).
The typical storage hierarchy is:
Main memory (RAM) for data in current use.
Disk for the main database (secondary storage).
Tapes for archiving older versions of the data (tertiary storage).
2
Disks
Secondary storage device of choice.
Main advantage over tapes: random access vs. sequential access.
Data is stored and retrieved in units called disk blocks or pages.
Unlike RAM, the time to retrieve a disk page varies depending on its location on disk.
Caveat: in NUMA (non-uniform memory access) machines, e.g., NDSU CHPC's SGI Altix, even in RAM the time to retrieve depends on location (which brick or quad).
Therefore, the relative placement of pages on disk has a major impact on DBMS performance!
3
Components of a Disk
The arm assembly moves in and out to position a head on a desired track.
The collection of tracks under the heads at any one time is a cylinder.
Only one head reads/writes at any one time.
Block size (the smallest unit of transfer) is a multiple of the sector size (which is fixed).
[Figure: disk components. Labels: spindle, platters, tracks (both sides of platter), sector, cylinder, disk head, arm assembly, arm movement.]
4
Accessing a Disk Page
Delay time to access (read/write) data on a disk block:
seek time (moving the arm to position the disk head on the track)
rotational delay (waiting for the block to rotate under the head)
transfer time (actually moving the data to/from the disk surface)
Seek time and rotational delay dominate.
Seek times can vary from about 1 to 20 msec.
Rotational delay can vary from 0 to 10 msec.
Transfer time can be about 1 msec per 4 KB page.
Key to lower I/O cost: reduce seek and rotational delays!
Hardware vs. software solutions?
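To make the three delay components concrete, here is a minimal sketch that adds them up. The function name and the mid-range values (10 msec seek, 5 msec rotational delay) are assumptions chosen from within the ranges quoted on this slide, not measurements of any real disk.

```python
def access_time_ms(seek_ms, rotation_ms, pages, transfer_ms_per_page=1.0):
    """Time for one disk access that transfers `pages` contiguous pages.

    One seek and one rotational delay are paid per access; the transfer
    time grows with the number of pages moved (about 1 msec per 4 KB page).
    """
    return seek_ms + rotation_ms + pages * transfer_ms_per_page


# Assumed mid-range values: 10 msec seek, 5 msec rotational delay.
# 100 pages scattered over the disk: every page pays seek + rotation.
random_reads = sum(access_time_ms(10.0, 5.0, 1) for _ in range(100))

# 100 pages stored contiguously: seek + rotation are paid only once.
sequential_read = access_time_ms(10.0, 5.0, 100)

print(random_reads, sequential_read)
```

Under these assumed numbers the scattered layout costs 1600 msec against 115 msec for the contiguous one, which is why page placement matters so much.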
5
Arranging Pages on Disk (clustering)
"Next" block concept: blocks on the same track, followed by blocks on the same cylinder, followed by blocks on an adjacent cylinder.
Blocks in a file should be arranged sequentially on disk (by the above notion of "next") to minimize seek and rotational delay.
For a sequential scan, pre-fetching several pages at a time is a big win!
6
RAID (redundant array of independent disks)
RAID Disk Array: an arrangement of several disks that gives the abstraction of a single, large disk.
RAID goals: increase performance (more concurrent read/write heads) and reliability (redundant data copies are kept).
RAID's two main techniques:
Data striping: data is partitioned and "striped" across disks; the size of a partition is called the striping unit. Partitions are distributed over several disks, allowing more read/write heads to operate in parallel.
Redundancy: redundant information allows reconstruction of data if 1 disk fails.
7
RAID Levels
[Figure: a row of four disks, Disk1 through Disk4, drawn for each level.]
Level 0: Block striping but no redundancy (e.g., blocks 1, 2, 3, 4 on disks 1, 2, 3, 4, resp.). Faster reads (more read/write heads working in parallel).
Level 1: Mirroring (2 identical copies). Each disk has a mirror-image disk (check disk). Parallel reads, but a write involves 2 disks. Improved durability.
Level 0+1 (sometimes called level 10): Block striping and mirroring. Faster reads plus improved durability.
Level 2: Simple bit-striping. Not used these days.
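The Level 0 placement rule (blocks 1, 2, 3, 4 on disks 1, 2, 3, 4) can be sketched as a round-robin mapping. The function name and the 1-based numbering are assumptions made to match the slide's example; real controllers may number and lay out stripes differently.

```python
def locate_block(block_no, num_disks):
    """Map a 1-based logical block number to (disk number, stripe row).

    Round-robin block striping: consecutive blocks go to consecutive
    disks, wrapping around after the last disk.
    """
    index = block_no - 1                 # switch to 0-based arithmetic
    disk = index % num_disks + 1         # 1-based disk number
    stripe = index // num_disks          # 0-based row within each disk
    return disk, stripe


# Blocks 1..8 on a 4-disk array: disks 1,2,3,4 and then 1,2,3,4 again.
placement = [locate_block(b, 4) for b in range(1, 9)]
print(placement)
```

Because consecutive blocks land on different disks, a large read can keep all four heads busy at once.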
8
RAID Levels 3, 4, 5
[Figure: four data disks, Disk1 through Disk4, plus a parity disk, drawn for each level.]
Level 3: Bit-Interleaved Parity. Striping unit = 1 bit (e.g., bits 1, 2, 3, 4 on disks 1, 2, 3, 4, resp.), with 1 check disk. Each read/write request involves all disks; the disk array can process 1 request at a time (but very rapidly).
Level 4: Block-Interleaved Parity. Striping unit = 1 block (e.g., blocks 1, 2, 3, 4 on disks 1, 2, 3, 4, resp.), with 1 check disk. Parallel reads are possible for small requests; large requests can utilize the full bandwidth. Writes involve modifying the block and the check disk.
Level 5: Block-Interleaved Distributed Parity. Similar to Level 4, but parity blocks are distributed over all disks (striping unit = 1 block), which eliminates the parity-disk hot spot.
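How a check disk allows reconstruction after a single disk failure can be illustrated with XOR parity, the usual way parity blocks are computed. This is a toy sketch: the helper name and the 4-byte blocks are assumptions for illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: four data blocks (one per data disk) plus a parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_blocks(data)

# Suppose the third disk fails: XOR the surviving blocks with the parity.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt)
```

The same property explains the write cost in Levels 4 and 5: changing one data block also requires updating the parity block of its stripe.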
9
Buffer Management in a DBMS
[Figure: page requests from higher levels arrive at the BUFFER POOL (page frames, some occupied and some free) in MAIN MEMORY; the DB resides on DISK.]
Disk_mgr transfers pages between page frames and disk.
The choice of frame for a page is dictated by the replacement policy.
Data must be in RAM (buffer) for the DBMS to operate on it!
A LookupTable of <frame#, pageid> pairs is maintained.
10
When a Page is Requested
If the requested page (from a higher level) is not in the buffer pool:
Choose a frame for replacement.
If the frame is dirty (has been changed while in RAM), write it to disk first.
Read the requested page into that frame (and update the LookupTable).
Pin the page (designate it as temporarily non-replaceable) and return its address to the requesting higher-level process.
If requests can be predicted (e.g., in sequential scans), pages can be pre-fetched several at a time!
11
More on Buffer Management
The requestor of a page, when it is done with that page, must unpin it (actually, decrement its pin count) and set the dirty bit if the page has been modified.
Because a page in the buffer pool may be requested concurrently by many higher-layer processes, a pin count is used (the LookupTable holds <frame#, pg-ID, pincnt>).
A page is a candidate for replacement iff its pin count = 0.
A note: the CC & recovery subsystem may force additional I/O when a frame is chosen for replacement (e.g., to implement a Write-Ahead Log protocol; more on that later).
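The request, pin, unpin, and dirty-bit rules can be sketched as a small buffer pool. Everything here is a simplifying assumption: the class and method names are hypothetical, a Python dict stands in for the disk, and the victim choice (a free frame first, otherwise any frame with pin count 0) is a placeholder for a real replacement policy.

```python
class BufferPool:
    def __init__(self, num_frames, disk):
        self.disk = disk                       # dict page_id -> bytes (stand-in for disk)
        self.frames = [None] * num_frames      # page contents held in each frame
        self.page_of = [None] * num_frames     # page id held in each frame
        self.pin_count = [0] * num_frames
        self.dirty = [False] * num_frames
        self.lookup = {}                       # LookupTable: page id -> frame#

    def request(self, page_id):
        """Return the in-memory copy of a page, pinned for the caller."""
        frame = self.lookup.get(page_id)
        if frame is None:                      # page not in the buffer pool
            frame = self._choose_victim()
            old = self.page_of[frame]
            if old is not None:
                if self.dirty[frame]:          # write a dirty page back first
                    self.disk[old] = bytes(self.frames[frame])
                del self.lookup[old]
            self.frames[frame] = bytearray(self.disk[page_id])
            self.page_of[frame] = page_id
            self.dirty[frame] = False
            self.lookup[page_id] = frame
        self.pin_count[frame] += 1             # pin: temporarily non-replaceable
        return self.frames[frame]

    def release(self, page_id, modified=False):
        """Unpin a page and record whether the caller changed it."""
        frame = self.lookup[page_id]
        self.pin_count[frame] -= 1
        self.dirty[frame] = self.dirty[frame] or modified

    def _choose_victim(self):
        # Placeholder policy: prefer a free frame, else any unpinned frame.
        candidates = [f for f, pins in enumerate(self.pin_count) if pins == 0]
        if not candidates:
            raise RuntimeError("all frames are pinned")
        free = [f for f in candidates if self.page_of[f] is None]
        return (free or candidates)[0]
```

A caller would pair every `request` with a `release`, passing `modified=True` after changing the page; a frame becomes a replacement candidate only once its pin count is back to 0.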
12
Buffer Replacement
A frame is chosen for replacement by a replacement policy: Least-Recently-Used (LRU), Most-Recently-Used (MRU), or ...
The example below shows that knowledge of the access pattern by the buffer manager can be important, e.g., with LRU:
[Figure: a BUFFER POOL of 6 frames and DISK pages 7 through 14, with read and write arrows between them.]
These 6 reads fill the buffer. With LRU, every new read requires a write that flushes a frame (assuming all 6 pages have been changed, i.e., dirtied).
Extent (multi-block) pre-fetching (and extent writing) would alleviate this situation considerably.
13
Buffer Replacement Policy
The policy can have a big impact on the number of I/Os; it depends on the access pattern.
Sequential flooding is the bad situation caused by LRU + repeated sequential scans.
It can happen when # buffer frames < # pages in the sequential scan. Each page request causes a flush, whereas with MRU + repeated sequential scans most requests would not.
Example: a file with 7 blocks is read sequentially and repeatedly.
[Figure: BUFFER POOL and the 7-block file.]
Pages 1-6 are read in order. To read page 7, LRU flushes page 1.
The second scan begins, requiring page 1, but it was just flushed!
The second scan now needs page 2, but it was just flushed! Etc.
Note that, after a while, every page to be read was just flushed.
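Sequential flooding is easy to reproduce in a small simulation. The sketch below counts page misses for the slide's scenario (a 7-block file scanned repeatedly with 6 frames) under LRU and MRU; the function name is hypothetical, and pin counts and dirty pages are ignored for simplicity.

```python
from collections import OrderedDict


def count_misses(policy, num_frames, requests):
    """Count buffer misses for a request sequence under "LRU" or "MRU"."""
    pool = OrderedDict()          # keys ordered from least to most recently used
    misses = 0
    for page in requests:
        if page in pool:
            pool.move_to_end(page)               # hit: now most recently used
            continue
        misses += 1
        if len(pool) >= num_frames:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            pool.popitem(last=(policy == "MRU"))
        pool[page] = True
    return misses


# Three sequential scans of a 7-page file with a 6-frame buffer pool.
requests = list(range(1, 8)) * 3
lru_misses = count_misses("LRU", 6, requests)
mru_misses = count_misses("MRU", 6, requests)
print(lru_misses, mru_misses)
```

With LRU all 21 requests miss, because each page is evicted just before it is needed again. With MRU only 9 miss: the 7 compulsory misses of the first scan plus one per later scan.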
14
DBMS vs. OS File System
The OS can do disk space and buffer management. Why not let the OS manage these tasks?
Differences in OS support: portability issues.
Some OS limitations, e.g., files can't span disks.
Buffer management in a DBMS requires the ability to:
pin a page in the buffer pool and force a page to disk (important for implementing CC & recovery),
adjust the replacement policy, and pre-fetch pages based on access patterns in typical DB operations.
15
Record Formats: Fixed Length
[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B; the field after F1 and F2 is at Address = B+L1+L2.]
Info about field types is the same for all records in a file; fields can be accessed via offsets stored in the system catalogs.
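The address rule for fixed-length records (Address = B+L1+L2 for the third field) generalizes to a running sum of the preceding field lengths. The helper names and the sample field lengths below are assumptions for illustration.

```python
from itertools import accumulate


def field_offsets(lengths):
    """Offset of each field from the record's base address."""
    return [0] + list(accumulate(lengths))[:-1]


def read_field(buf, base, lengths, i):
    """Read field i (0-based) of the record starting at `base` in `buf`."""
    start = base + field_offsets(lengths)[i]
    return buf[start:start + lengths[i]]


# Assumed schema: field lengths L1..L4 as they might be kept in the catalog.
lengths = [4, 8, 2, 6]
record = b"0042" + b"Gellatly" + b"ND" + b"701555"
buf = b"\x00" * 16 + record          # record stored at base address B = 16

print(field_offsets(lengths))         # third field starts at L1 + L2 = 12
print(read_field(buf, 16, lengths, 2))
```

Since the lengths are the same for every record in the file, the offsets can be computed once per file rather than stored in each record.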
16
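Record Formats: Variable Length
Two alternative formats (assuming # fields is fixed):
Alternative 1, fields delimited by special symbols: a field count (e.g., 4) followed by fields F1, F2, F3, F4 separated by a delimiter such as $.
Alternative 2, array of field offsets: a directory of offsets at the start of the record, followed by fields F1, F2, F3, F4.
The 2nd alternative offers direct access to the ith field and efficient storage of nulls (special "don't know" value), with small directory overhead.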
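The array-of-field-offsets format can be sketched with Python's struct module. The layout chosen here is an assumption, not the slide's exact format: n+1 little-endian 2-byte offsets followed by the field bytes, with a null field encoded as two equal consecutive offsets (so an empty string cannot be told apart from a null in this toy version).

```python
import struct


def pack_record(fields):
    """Pack fields (bytes or None for null) behind an array of offsets."""
    n = len(fields)
    pos = 2 * (n + 1)                    # field data begins after the offset array
    offsets = []
    for field in fields:
        offsets.append(pos)
        pos += len(field) if field is not None else 0
    offsets.append(pos)                  # end of the last field
    body = b"".join(f for f in fields if f is not None)
    return struct.pack(f"<{n + 1}H", *offsets) + body


def get_field(record, i):
    """Direct access to field i (0-based) without scanning earlier fields."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


rec = pack_record([b"Ann", None, b"Fargo", b"58102"])
print(get_field(rec, 2), get_field(rec, 1))
```

A null costs no data bytes at all, only its slot in the offset array, which is the "efficient storage of nulls" the slide mentions.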
17
Page Formats: Fixed Length Records
[Figure: PACKED page format: record slots 1 to N stored contiguously, followed by free space, with N = number of records. UNPACKED, BITMAP page format: slots 1 to M with an M-bit bitmap and a count field.]
Record ID (RID) = <page id, slot #>.
In PACKED, moving records for free-space management changes the RID. That may not be acceptable (RIDs are meant to be permanent IDs).
18
UNPACKED, RECORD POINTER Page Format (for Variable Length Records)
[Figure: page i holding records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N); a SLOT DIRECTORY with one pointer per record, the number of record slots N, and a pointer to the start of free space.]
Records can be moved on the page without changing the RID, so this format is attractive for fixed-length records too.
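Why the slot directory keeps RIDs stable can be shown with a toy slotted page. The class below is a hypothetical sketch: the directory is kept as a Python list instead of being packed into the page bytes, and slot numbers are 0-based rather than 1-based as in the figure.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0               # pointer to the start of free space
        self.slots = []                   # slot directory: (offset, length) or None

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("not enough free space on page")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1        # slot number, the second half of the RID

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot] = None           # the slot number is never reused here

    def compact(self):
        """Slide live records together; only directory offsets change."""
        packed = bytearray(len(self.data))
        pos = 0
        for slot, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            packed[pos:pos + length] = self.data[offset:offset + length]
            self.slots[slot] = (pos, length)
            pos += length
        self.data = packed
        self.free_start = pos


page = SlottedPage()
rids = [page.insert(r) for r in (b"alpha", b"bravo!", b"charlie")]
page.delete(rids[1])
page.compact()
print(page.read(rids[2]), page.slots[rids[2]])
```

After compaction the third record has moved from offset 11 to offset 5, yet it is still reached through the same slot number, so its RID is unchanged.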