Published by Widya Pranoto. Modified over 5 years ago.
1
Copyright © 2003-2017 - Curt Hill
Page Management
In memory and on disk
2
DBMS Organization
Layers, from top to bottom:
- Query Optimization and Execution
- Relational Operators
- Files and Access Methods
- Buffer Management
- Disk Space Management
- Disk
3
Layer Divisions
- The layering approach has proven to be advantageous from a software engineering perspective
  - Operating Systems are also layered
- Each layer provides services to the layer above and uses the services of the layer below
- How the layers are divided depends on the project
- This presentation covers typical layers
4
The Bottom Two Levels of DBMS
- Buffer Manager
  - Manages pages in memory
  - Receives page requests from above
- Disk Space Manager
  - Manages pages on disk
  - Transfers pages to and from disk
  - Accepts commands only from the Buffer Manager
5
Disk Space Manager
- Treats a large part of the disk as a relative file
- Reads a page into a buffer slot
- Writes a page from a buffer slot
- Started by the buffer manager
- Notifies the buffer manager when complete
- Uses the fastest I/O available from the OS
- Knows nothing about tables
6
Buffer Manager
- Maintains the page buffer
  - A large chunk of physical memory
  - Has slots into which a page (or disk block) fits
- Has control information for each slot:
  - Which disk page it holds
  - How long since the page was placed there
  - Whether the page has been changed
- Knows little or nothing about tables
7
The Buffer Pool
[Figure: buffer pool slots holding pages 3591 5191 7639 3914 1121 0031 3001 6359 8335 1285 1891 2365 2021 3917 4026 2229 5091 1983 1440 1045 1008 1432 3005 5094 4004]
- Page numbers represent disk addresses
- The slots are where the pages are temporarily located
- All this is in memory
8
Buffer Manager
- Like Cache and Virtual Memory it needs a replacement policy
- The buffer is always full
- The Replacement Policy determines which slot to free when a new slot is needed
- The policy can be better guided than in Cache or Virtual Memory because file requests are more predictable
9
Buffer Manager
- The Files and Access Methods layer requests the page
  - If the page is present it is pinned
  - If not it is brought in and pinned
- When the request is complete the page is unpinned
- Since the same page may be used by several requests, count the pins
- A pin count of zero does not necessarily force the page out of the buffer
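The pin and unpin protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BufferPool`, the `read_page` callback standing in for the Disk Space Manager, and the choice to evict the first unpinned page found are all hypothetical and not taken from the slides.

```python
class BufferPool:
    """Toy buffer pool that tracks a pin count per resident page."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity      # number of slots in the pool
        self.read_page = read_page    # hypothetical callback: page_id -> bytes
        self.frames = {}              # page_id -> page contents
        self.pins = {}                # page_id -> pin count

    def pin(self, page_id):
        """Return the page, bringing it in first if it is not resident."""
        if page_id not in self.frames:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = self.read_page(page_id)
            self.pins[page_id] = 0
        self.pins[page_id] += 1
        return self.frames[page_id]

    def unpin(self, page_id):
        """Release one pin; the page stays resident even at count zero."""
        if self.pins.get(page_id, 0) == 0:
            raise ValueError(f"page {page_id} is not pinned")
        self.pins[page_id] -= 1

    def _evict(self):
        """Free a slot held by any page whose pin count is zero."""
        victim = next((p for p, c in self.pins.items() if c == 0), None)
        if victim is None:
            raise RuntimeError("all pages are pinned")
        del self.frames[victim]
        del self.pins[victim]
```

A page with a pin count of zero remains in the pool until its slot is actually needed, which matches the last bullet above.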
10
Pinning
- A page must be pinned while in use
  - "In use" means the upper levels are actually examining or changing data on the page
  - During this time it cannot be moved or discarded
- It is also convenient for locking to occur at the page level
  - Locking forces pinning
11
Replacement Policy
- The replacement policy answers the question: which page slot should be used when a new one is needed?
- After a very short time of execution the buffer pool is full
  - When a new page is brought into the pool an old one must be discarded
- The policies considered here: least recently used, least frequently used, and modifications of these
12
Least Recently Used
- LRU is the most common policy and is very effective
- Record the last access time for each page slot
- The slot with the oldest value gets dumped when a slot is needed
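A minimal LRU sketch, assuming the pool is small enough to model with Python's `collections.OrderedDict`, where insertion order stands in for the recorded last-access time. The class name `LRUPool` and the `misses` counter are illustrative additions, not part of the slides.

```python
from collections import OrderedDict


class LRUPool:
    """Toy page pool that discards the least recently used page."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = OrderedDict()   # page_id -> contents, oldest access first
        self.misses = 0              # number of accesses that needed an I/O

    def access(self, page_id):
        if page_id in self.slots:
            # A hit refreshes the page's position: it is now the newest.
            self.slots.move_to_end(page_id)
        else:
            self.misses += 1
            if len(self.slots) >= self.capacity:
                # Drop the entry with the oldest access.
                self.slots.popitem(last=False)
            self.slots[page_id] = f"contents of page {page_id}"
        return self.slots[page_id]
```

A real buffer manager would also skip pinned pages and write back modified ones before reusing the slot; both are left out here to keep the policy itself visible.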
13
Least Frequently Used
- Attach a counter to each slot that records references in the last x milliseconds
- Increment the counter on every reference, and every so often decrement every counter by 1
- The frequently used pages tend to stay and the others leave
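The counter scheme can be sketched as follows. The names `LFUPool`, `access`, and `age` are hypothetical, and calling `age()` by hand stands in for the periodic timer that the slide implies; ties between equal counters are broken arbitrarily.

```python
class LFUPool:
    """Toy page pool that discards the page with the lowest reference counter."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}   # page_id -> reference counter

    def access(self, page_id):
        if page_id not in self.counts:
            if len(self.counts) >= self.capacity:
                # Evict the page with the smallest counter.
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]
            self.counts[page_id] = 0
        self.counts[page_id] += 1

    def age(self):
        """Periodic decay: lower every counter by 1, never below zero."""
        for page_id in self.counts:
            self.counts[page_id] = max(0, self.counts[page_id] - 1)
```

The decay step is what keeps a page that was popular long ago from staying in the pool forever.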
14
Modifications
- Buffer managers have a more predictable work load than Cache or Virtual Memory
- Sequential scans are very predictable, so that might alter which page is freed
- Pinned pages are not available to be freed
- A modified page is more expensive to discard than an unmodified one, because it must first be written back to disk
15
Sequential flooding
- A problem for buffer managers
- Caused by repeated sequential scans of a large file
  - The number of pages in the file exceeds the size of the buffer pool
  - LRU causes each request to need an I/O
- Most Recently Used works in this case
- Another remedy is to limit the number of slots per table
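A small simulation makes the effect concrete. It is a sketch under simple assumptions: the pool is a plain list ordered from least to most recently used, there is no pinning, and the function name `count_misses` is made up for this example.

```python
def count_misses(policy, pool_size, file_pages, scans):
    """Count page faults for repeated sequential scans under "LRU" or "MRU"."""
    pool = []      # resident pages, least recently used first
    misses = 0
    for _ in range(scans):
        for page in range(file_pages):
            if page in pool:
                pool.remove(page)          # hit: will be re-added as newest
            else:
                misses += 1
                if len(pool) >= pool_size:
                    # LRU drops the oldest entry, MRU drops the newest.
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return misses


lru = count_misses("LRU", pool_size=3, file_pages=4, scans=3)
mru = count_misses("MRU", pool_size=3, file_pages=4, scans=3)
```

With a 3-slot pool and a 4-page file scanned three times, LRU misses on all 12 requests while MRU misses on 6, because LRU always discards exactly the page the scan will want next.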
16
Buffer Manager and Virtual Memory
- Both have a similar job
  - Use a limited amount of memory to satisfy the requests for a much larger amount of memory
- They can easily interfere with each other
- Generally the Buffer Manager locks its memory down so that virtual memory cannot page it out
17
Files and Access Methods Layer
- Just above the Buffer Manager
- This is the layer that knows about:
  - Tables
  - Indices
  - Disk addresses
- It creates files out of numbered pages:
  - Sequential files
  - BTrees
  - Hash indices
18
This Layer vs. OS
- Every Operating System offers services to structure files
  - Not every OS offers BTrees
- The services they do offer come with a generality not needed by the DBMS
- The price of generality is slower execution
- Thus this layer does all the file access processing without help from the OS
19
Files or Sets of Pages?
- Relational algebra considers a table a set of tuples
  - Not a good idea for implementation
- The Files and Access Methods layer views a table as a file
- The Buffer Manager maintains the buffers
- The Disk Space Manager knows nothing about tables; it only reads and writes numbered pages
- How is this done?
20
A heap file becomes pages
- The file has to be organized as pages
- There are several ways to do this:
  - A linked list
  - A directory of pages
  - Among others
- Other types of files may be handled in these ways as well
21
Linked List
- The first page is a header
  - It has both a forward and a backward link
  - Each link is a page ID
- Each page has both a forward and a backward link
- The file may be scanned in either direction
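A sketch of the doubly linked organization, using a dictionary keyed by page ID in place of real disk pages. The `Page` class, `append_page`, and `scan` are hypothetical names, and the chain is assumed to end with a `None` link rather than wrapping around to the header.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    page_id: int
    prev_id: Optional[int] = None    # backward link (a page ID)
    next_id: Optional[int] = None    # forward link (a page ID)
    records: list = field(default_factory=list)


def append_page(pages, header_id, new_id):
    """Link a new page at the tail of the chain that starts at the header."""
    tail = pages[header_id]
    while tail.next_id is not None:
        tail = pages[tail.next_id]
    pages[new_id] = Page(new_id, prev_id=tail.page_id)
    tail.next_id = new_id


def scan(pages, start_id, forward=True):
    """Yield page IDs by following forward or backward links."""
    page_id = start_id
    while page_id is not None:
        yield page_id
        page = pages[page_id]
        page_id = page.next_id if forward else page.prev_id
```

Because each link is a page ID rather than a memory address, every step of a scan is a page request that the layer below can satisfy from the buffer pool or from disk.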
22
Linked Pages
[Figure: the file directory points to the header page, which is linked to the first through seventh pages]
23
Directory Pages
- The header page is subdivided into pointers to each of the needed pages
- The data pages have no links
- A directory page may link to another directory page if the number of pages exceeds the capacity of a page
24
Directory Pages
[Figure: the file directory points to a directory page, which points to data pages; a next directory page continues the list, covering the first through eighth pages]
25
Record Formats
- Relational algebra assumes that each row of the table has exactly the same size
  - This is known as a fixed record format
- This does not accommodate types such as varchar
- Thus we must consider record formats again
26
Record Formats
- A record in a file is either fixed length or variable length
- A fixed length may not vary within a table
  - Different tables may have different lengths
- Variable length records are implemented in two ways:
  - Delimiter
  - Descriptor
27
Delimiter
- A delimiter is a character or string that denotes the end of the record
- This delimiter may not occur within the record
- Examples:
  - DOS/Windows uses CR/LF
  - UNIX uses LF
  - C style strings use the null character
- Sometimes an indicator of the number of records is put at the beginning of the page
28
Descriptor
- Prefix the variable length record with a length descriptor
  - Or put all the lengths at the beginning
- No restriction on the characters in the record
- The size of the descriptor limits the maximum record size:
  - One byte: 255 bytes
  - Two bytes: 64K (32K if the length is signed)
  - Four bytes: 4G (2G if the length is signed)
29
Variable Length Records
Delimiter style, using $ as delimiter:
  First record $ Second record $ Third $ Fourth record $
With a record count at the beginning:
  4 First record $ Second record $ Third $ Fourth record $
Descriptor style, each record prefixed by its length:
  12 First Record 18 Second Record … 5 Third
Descriptor style, all lengths at the beginning:
  12 18 5 First Record Second Record … Third
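Both styles can be sketched with byte strings. The function names are illustrative, and the 2-byte unsigned big-endian length prefix is just one possible descriptor size chosen for this example.

```python
import struct


def pack_descriptor(records):
    """Prefix each record with a 2-byte unsigned length descriptor."""
    out = bytearray()
    for rec in records:
        out += struct.pack(">H", len(rec)) + rec
    return bytes(out)


def unpack_descriptor(data):
    """Walk the buffer, reading a length and then that many bytes."""
    records, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        records.append(data[pos:pos + length])
        pos += length
    return records


def pack_delimited(records, delimiter=b"$"):
    """Terminate each record with the delimiter, which may not occur inside it."""
    for rec in records:
        if delimiter in rec:
            raise ValueError("delimiter may not occur inside a record")
    return b"".join(rec + delimiter for rec in records)


def unpack_delimited(data, delimiter=b"$"):
    """Split on the delimiter; the piece after the final delimiter is empty."""
    return data.split(delimiter)[:-1]
```

The descriptor form accepts any bytes in a record, including `$`, while the delimiter form has to reject such a record, which is the trade-off the two slides describe.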
30
Finding Records - Fixed
- We want to find record N in a fixed length file
  - Multiply N by the record length
  - Divide by the page length
  - Read that page and use the remainder to find the record
- This is exactly how Direct files work
31
Example Fixed Length
- Suppose:
  - Record length is 20
  - Page size is 512
  - We want to find record 100
- Calculation (counting records and pages from zero):
  - 20 × 100 = 2000
  - 2000 ÷ 512 = 3 remainder 464
- Read page 3; the record occupies bytes 464 through 483
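The calculation is a single integer division with remainder. The function name `locate_fixed` is made up for this sketch, and records and pages are assumed to be numbered from zero.

```python
def locate_fixed(n, record_length, page_size):
    """Return (page number, byte offset within the page) of record n."""
    page, offset = divmod(n * record_length, page_size)
    return page, offset


page, offset = locate_fixed(100, record_length=20, page_size=512)
# page is 3 and offset is 464, matching the worked example
```

Note that this treats the file as one continuous run of bytes. When the page size is not a multiple of the record length, some records straddle a page boundary: record 25 starts at offset 500 of page 0 and would run past byte 511.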
32
Variable Length Records
- Finding a record is much less easy with variable length records
- This is why some variable length record types are somewhat discouraged even though they save space:
  - Varchar (MySQL sometimes changes these to char)
  - BLOB
33
Finding the Nth Record
- The delimiter approach requires that each page be scanned
  - Either scan it entirely or look at the record count
- The descriptor approach allows finding the record quickly in the block, or knowing that it is in a subsequent block
34
Free Space
- It is to our advantage to leave some free space in each of the pages
- This allows a variable length record to be replaced by a larger one, or a record to be inserted, without affecting the rest of the file
- We also end up with free space from deletions
- This makes the simple fixed size record calculation much more difficult, but insertion and deletion much easier
35
Page Format
- How are the records organized inside a page?
- With fixed length records:
  - Packed
  - Unpacked
- With variable length records a slightly more complicated page directory is needed
36
Packed fixed length records
- There is a counter that indicates how many records are present
  - Counter capacity
- The used records occupy the first so many bytes and are easy to identify because of their size
- Insertion/deletion of a record usually only slides a record to the bottom of the page
37
Unpacked fixed length records
- The page is partitioned into slots
- There is a bit string which indicates whether each slot is empty or full
- Any empty slot is free space
- Insertion only places the record in the first empty slot
- Deletion requires no data movement
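An unpacked page can be sketched with a list of booleans standing in for the bit string. The class name `UnpackedPage` and its methods are hypothetical; a real page would keep the bit string inside the page itself.

```python
class UnpackedPage:
    """Toy fixed-length page: a bit per slot marks it empty or full."""

    def __init__(self, slot_count, record_length):
        self.record_length = record_length
        self.used = [False] * slot_count                  # the bit string
        self.data = bytearray(slot_count * record_length)

    def insert(self, record):
        """Store the record in the first empty slot and return its slot number."""
        if len(record) != self.record_length:
            raise ValueError("record has the wrong length")
        if False not in self.used:
            raise RuntimeError("page is full")
        slot = self.used.index(False)
        start = slot * self.record_length
        self.data[start:start + self.record_length] = record
        self.used[slot] = True
        return slot

    def delete(self, slot):
        """Clear the bit; no record is moved."""
        self.used[slot] = False

    def read(self, slot):
        if not self.used[slot]:
            raise KeyError(slot)
        start = slot * self.record_length
        return bytes(self.data[start:start + self.record_length])
```

Because deletion only flips a bit, the slot numbers of the remaining records never change, which is convenient for anything that refers to a record by page and slot.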
38
Fixed Records in a Page
[Figure: two page layouts side by side. Packed: a counter of 5, then the First through Fifth records stored contiguously, then free space. Unpacked: the First through Fifth records in slots interleaved with free slots.]
39
Variable records in a page
- Each page has a directory which contains for each record:
  - Starting location
  - Length
- The last record position + length is the beginning of the free space
- The order of the records in the directory is important
  - Their location in the page is not
40
Compaction
- Replacement of a larger record by a smaller record, or deletion, leaves empty space
  - This may or may not be left there
- Insertion, or replacement of a smaller record by a larger one, may require that the page be compacted
  - Move all the records needed to leave all the free space at the end
- Compaction is the same as defragmentation
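The directory and compaction ideas can be combined in one small sketch. As a simplification, the directory here is a Python list kept beside the byte array rather than stored inside the page, and the names `SlottedPage`, `insert`, `delete`, `read`, and `compact` are all illustrative.

```python
class SlottedPage:
    """Toy variable-length page with a (start, length) directory."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.directory = []    # one [start, length] per record, None if deleted
        self.free_start = 0    # beginning of the free space at the end

    def insert(self, record):
        """Append the record, compacting first if the tail is too small."""
        if self.free_start + len(record) > len(self.data):
            self.compact()
            if self.free_start + len(record) > len(self.data):
                raise RuntimeError("record does not fit in this page")
        start = self.free_start
        self.data[start:start + len(record)] = record
        self.free_start += len(record)
        self.directory.append([start, len(record)])
        return len(self.directory) - 1

    def delete(self, slot):
        """Drop the directory entry; the bytes stay behind as a hole."""
        self.directory[slot] = None

    def read(self, slot):
        start, length = self.directory[slot]
        return bytes(self.data[start:start + length])

    def compact(self):
        """Slide live records together so all free space is at the end."""
        packed = bytearray(len(self.data))
        pos = 0
        for entry in self.directory:
            if entry is None:
                continue
            start, length = entry
            packed[pos:pos + length] = self.data[start:start + length]
            entry[0] = pos     # the slot number stays, only the location moves
            pos += length
        self.data = packed
        self.free_start = pos
```

Compaction changes where records sit but not their position in the directory, which is why the directory order matters and the location in the page does not.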
41
Variable Records in a Page
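Page directory: 5 records, each entry given as (start, length)
  First:  (150, 50)
  Second: (50, 10)
  Third:  (80, 20)
  Fourth: (100, 15)
  Fifth:  (115, 35)
Layout after the directory, in page order: Second, Free, Third, Fourth, Fifth, First, Free
[Figure: the page drawn as rows. Assume that each row is 50 bytes wide. The page contains 400 bytes.]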
42
Conclusion
- Most of the complexity of these layers is hidden from the average user
- If these layers do a good job the entire DB will perform well; otherwise too much I/O will occur
- These layers provide services so that the upper layers can focus on their task