CS222: Principles of Data Management, Lecture #4: Catalogs, Buffer Manager, File Organizations. Instructor: Chen Li.


1 CS222: Principles of Data Management, Lecture #4: Catalogs, Buffer Manager, File Organizations
Instructor: Chen Li

2 Topics today: system catalogs, buffer management, file organizations

3 Topic: System Catalogs
For each relation:
- name, file name, file structure (e.g., Heap)
- name, type, and length (if fixed) for each attribute
- index name, target, and kind for each index
- also integrity constraints, defaults, nullability, etc.
For each index:
- structure (e.g., B+ tree) and search key fields
For each view:
- view name and definition (including the query)
Plus statistics, authorization, buffer pool size, etc.
* Catalogs themselves are stored as record-based files too!

4 Attr_Cat(attr_name, rel_name, type, position)
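Because catalogs are themselves stored as record-based files, Attr_Cat can describe its own schema alongside user relations. The following is a minimal sketch of that idea. A Python list of records stands in for the Attr_Cat file. The Students relation and its attribute types are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class AttrCatRecord(NamedTuple):
    """One record of the Attr_Cat catalog relation."""
    attr_name: str
    rel_name: str
    type: str
    position: int


# Hypothetical contents: Attr_Cat describes its own four attributes,
# plus the attributes of a hypothetical user relation "Students".
attr_cat = [
    AttrCatRecord("attr_name", "Attr_Cat", "string", 1),
    AttrCatRecord("rel_name", "Attr_Cat", "string", 2),
    AttrCatRecord("type", "Attr_Cat", "string", 3),
    AttrCatRecord("position", "Attr_Cat", "integer", 4),
    AttrCatRecord("sid", "Students", "string", 1),
    AttrCatRecord("name", "Students", "string", 2),
    AttrCatRecord("gpa", "Students", "real", 3),
]


def schema_of(rel_name: str) -> list:
    """Return (attr_name, type) pairs of a relation, ordered by position.

    This is an ordinary selection plus sort over catalog records: the same
    kind of query the DBMS runs against any other relation.
    """
    rows = [r for r in attr_cat if r.rel_name == rel_name]
    return [(r.attr_name, r.type) for r in sorted(rows, key=lambda r: r.position)]


print(schema_of("Attr_Cat"))
print(schema_of("Students"))
```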

5 Next topic: Buffer Management
[Figure: page requests from higher levels are served by the BUFFER POOL in MAIN MEMORY (disk pages and free frames), which reads pages from the DB on DISK; choice of frame dictated by replacement policy.]
Data must be in RAM for the DBMS to operate on it! A table of <frame#, pageid> pairs is maintained.
Note: Project 1’s PagedFileManager class would do the buffering inside if we were doing it…!

6 When a Page is Requested ...
If the requested page is not in the pool:
- Choose a frame for replacement.
- If that frame is dirty, write it to disk.
- Read the requested page into the chosen frame.
Pin the page and return its address.
* If requests can be predicted (e.g., sequential scans), pages can be prefetched several pages at a time!
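The request steps above can be traced in a toy buffer pool. This is a sketch under hypothetical simplifications. The "disk" is a dict from page id to bytes. The replacement policy prefers a free frame and otherwise takes the first frame with pin count 0; it stands in for LRU, Clock, and similar policies. The `frame_of_page` dict plays the role of the <frame#, pageid> table.

```python
from __future__ import annotations


class BufferPool:
    """Toy buffer pool following the page-request steps (hypothetical API)."""

    def __init__(self, num_frames: int, disk: dict[int, bytes]) -> None:
        self.disk = disk
        self.frames: list[bytearray | None] = [None] * num_frames
        self.page_of_frame: list[int | None] = [None] * num_frames
        self.pin_count = [0] * num_frames
        self.dirty = [False] * num_frames
        self.frame_of_page: dict[int, int] = {}  # <frame#, pageid> table, keyed by page
        self.disk_reads = 0
        self.disk_writes = 0

    def _choose_frame(self) -> int:
        # Prefer a free frame; otherwise take the first frame with pin count 0.
        for f, page_id in enumerate(self.page_of_frame):
            if page_id is None:
                return f
        for f, count in enumerate(self.pin_count):
            if count == 0:
                return f
        raise RuntimeError("every frame is pinned")

    def request(self, page_id: int) -> bytearray:
        f = self.frame_of_page.get(page_id)
        if f is None:                                   # page not in the pool
            f = self._choose_frame()
            old = self.page_of_frame[f]
            if old is not None:
                if self.dirty[f]:                       # write the victim back first
                    self.disk[old] = bytes(self.frames[f])
                    self.disk_writes += 1
                del self.frame_of_page[old]
            self.frames[f] = bytearray(self.disk[page_id])  # read requested page
            self.disk_reads += 1
            self.page_of_frame[f] = page_id
            self.frame_of_page[page_id] = f
            self.dirty[f] = False
        self.pin_count[f] += 1                          # pin and return its "address"
        return self.frames[f]

    def unpin(self, page_id: int, modified: bool) -> None:
        f = self.frame_of_page[page_id]
        self.pin_count[f] -= 1
        self.dirty[f] = self.dirty[f] or modified


disk = {0: b"A", 1: b"B", 2: b"C"}
pool = BufferPool(num_frames=2, disk=disk)
page = pool.request(0)
page[0] = ord("a")              # modify page 0 while it is pinned
pool.unpin(0, modified=True)
pool.request(1)
pool.unpin(1, modified=False)
pool.request(2)                 # page 0's frame is chosen; it is dirty, so it is written back
print(disk[0], pool.disk_reads, pool.disk_writes)
```

Page 2 replaces page 0 because page 0's frame is the first unpinned one. Since page 0 was dirtied, the new contents reach the "disk" before its frame is reused.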

7 More on Buffer Management
The requestor of a page must unpin it when done and indicate whether the page has been modified; a dirty bit is used for the latter purpose. A page in the pool may be requested many times; a pin count is used, and a page is a candidate for replacement iff its pin count = 0. CC & recovery may entail additional I/O when a frame is chosen for replacement (Write-Ahead Log protocol; more in CS 223).
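The pin count, the dirty bit, and the extra I/O that recovery can cause all meet when a page is replaced. Below is a deliberately simplified sketch of the write-ahead rule: before a dirty page is written, the log records describing its changes must be forced to disk. `Page`, `Log`, and the LSN bookkeeping are hypothetical names for illustration, not any particular system's API. The full protocol is covered in CS 223.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    page_id: int
    pin_count: int = 0
    dirty: bool = False
    page_lsn: int = 0      # LSN of the last log record that changed this page


@dataclass
class Log:
    next_lsn: int = 1
    flushed_lsn: int = 0   # every record with LSN <= flushed_lsn is on disk
    forces: int = 0        # number of log-force I/Os performed

    def append(self) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def force_up_to(self, lsn: int) -> None:
        if lsn > self.flushed_lsn:
            self.flushed_lsn = self.next_lsn - 1   # flush the whole log tail
            self.forces += 1


def pin(page: Page) -> None:
    page.pin_count += 1


def modify(page: Page, log: Log) -> None:
    """A requestor changes a pinned page; the change is described in the log."""
    if page.pin_count == 0:
        raise ValueError("page must be pinned while it is modified")
    page.page_lsn = log.append()


def unpin(page: Page, modified: bool) -> None:
    """Drop one pin and report whether this requestor modified the page."""
    page.pin_count -= 1
    page.dirty = page.dirty or modified


def replace(page: Page, log: Log, page_writes: list[int]) -> None:
    """Evict a page, respecting the pin count and the write-ahead rule."""
    if page.pin_count != 0:
        raise ValueError("a page is a replacement candidate only if pin count == 0")
    if page.dirty:
        log.force_up_to(page.page_lsn)   # extra I/O: the log must reach disk first
        page_writes.append(page.page_id)
        page.dirty = False


log, writes = Log(), []
p = Page(page_id=7)
pin(p)
pin(p)                        # two concurrent requestors
modify(p, log)
unpin(p, modified=True)
try:
    replace(p, log, writes)   # refused: the second requestor still holds a pin
except ValueError as e:
    print("refused:", e)
unpin(p, modified=False)      # the dirty bit stays set from the first requestor
replace(p, log, writes)
print(writes, log.forces, log.flushed_lsn)
```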

8 Buffer Replacement Policy
A frame is chosen for replacement using a replacement policy: least-recently-used (LRU), Clock, MRU, etc. The policy can have a big impact on the number of I/Os, depending on the access pattern. Sequential flooding: a nasty situation caused by LRU + (repeated) sequential scans. If # buffer frames < # pages in the file, each page request causes an I/O. MRU is much better in this situation (but not in all situations, of course).
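Sequential flooding is easy to reproduce by counting misses for repeated scans when there is one frame fewer than there are pages. This sketch uses `collections.OrderedDict` to keep resident pages ordered from least to most recently used. The sizes (10 pages, 9 frames, 3 scans) are hypothetical.

```python
from collections import OrderedDict


def count_ios(requests, num_frames: int, policy: str) -> int:
    """Count page reads (misses) for a request stream under "LRU" or "MRU"."""
    pool = OrderedDict()               # keys ordered least -> most recently used
    ios = 0
    for page in requests:
        if page in pool:
            pool.move_to_end(page)     # hit: page becomes most recently used
            continue
        ios += 1                       # miss: one disk read
        if len(pool) == num_frames:
            # LRU evicts the oldest entry; MRU evicts the newest one.
            pool.popitem(last=(policy == "MRU"))
        pool[page] = None
    return ios


num_pages, num_frames, num_scans = 10, 9, 3   # hypothetical sizes
scans = list(range(num_pages)) * num_scans
lru = count_ios(scans, num_frames, "LRU")
mru = count_ios(scans, num_frames, "MRU")
print("LRU:", lru, "MRU:", mru)
```

With these sizes, LRU misses on every one of the 30 requests. MRU needs 12 reads: 10 for the first scan and 1 for each later scan.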

9 DBMS vs. OS File System
The OS does disk space & buffer management, so why not let the OS manage these tasks…?
- Differences in OS support: portability issues.
- Some limitations, e.g., files can’t span disks.
- Buffer management in a DBMS requires the ability to:
  - pin a page in the buffer pool,
  - force a page to disk (important for implementing CC & recovery),
  - adjust the replacement policy, and
  - prefetch pages based on access patterns in typical DB operations.
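The closest OS-level primitive for forcing a page to disk is `fsync`. The sketch below writes one page at its offset and forces it, using Python's `os` module. The 4096-byte page size and the file name are assumptions. `fsync` forces this file's data, but it gives the DBMS no control over when the OS writes other cached pages. That gap is part of why a DBMS wants its own buffer manager.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size


def force_page(fd: int, page_no: int, data: bytes) -> None:
    """Write one page at its offset, then force it to stable storage."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    os.lseek(fd, page_no * PAGE_SIZE, os.SEEK_SET)
    view = memoryview(data)
    while view:                      # os.write may write fewer bytes than asked
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)                     # return only after the OS has flushed the file


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "relation.dat")          # hypothetical file name
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        force_page(fd, 2, b"x" * PAGE_SIZE)
        file_size = os.fstat(fd).st_size
        os.lseek(fd, 2 * PAGE_SIZE, os.SEEK_SET)
        page_back = os.read(fd, PAGE_SIZE)
    finally:
        os.close(fd)

print(file_size, page_back[:4])
```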

10 Files of Records: Basic Summary
Disks provide cheap, non-volatile storage. Access is random, but its cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays. The buffer manager brings pages into RAM. A page stays in RAM at least until it is unpinned by the last of its concurrent requestors. A dirty page is written to disk when its frame is chosen for replacement (some time after the requestor that dirtied it unpins the page). The choice of frame to replace is based on the replacement policy. It could be worth prefetching several pages at a time.

11 Summary (Contd.) DBMS vs. OS File Support
A DBMS needs features not found in many OSs, such as forcing a page to disk, controlling the order of page writes to disk, letting files span disks, and controlling prefetching and page replacement policies based on (predictable) DB access patterns. A variable-length record format with a field offset directory offers direct access to the i'th field and also supports null values. The slotted page format supports variable-length records and allows records to move within a page.
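Both formats named in this summary fit in one small sketch. The record layout (uint16 offsets followed by field bytes) is an assumed, simplified format in which equal offsets mark a NULL. `SlottedPage` keeps its slot directory in a Python list rather than at the end of the page. Both are illustrative, not a specific system's on-disk layout.

```python
from __future__ import annotations

import struct


def encode_record(fields: list[bytes | None]) -> bytes:
    """Variable-length record with a field offset directory.

    Assumed layout: n + 1 little-endian uint16 offsets, then the field bytes.
    Field i occupies bytes offsets[i]..offsets[i+1]; equal offsets mean NULL
    (so in this simple sketch an empty string is indistinguishable from NULL).
    """
    header_size = 2 * (len(fields) + 1)
    offsets, body = [header_size], bytearray()
    for f in fields:
        if f is not None:
            body += f
        offsets.append(header_size + len(body))
    return struct.pack(f"<{len(offsets)}H", *offsets) + bytes(body)


def get_field(record: bytes, i: int) -> bytes | None:
    """Direct access to field i: read two offsets, skip no earlier fields."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    return None if start == end else record[start:end]


class SlottedPage:
    """Slotted page: the slot directory maps slot# -> (offset, length) or None.

    A rid is <page id, slot#>, so a record can move within the page (e.g.
    during compaction) without its rid changing.
    """

    def __init__(self, size: int = 4096) -> None:
        self.data = bytearray(size)
        self.free_ptr = 0                          # start of the free space
        self.slots: list[tuple[int, int] | None] = []

    def insert(self, rec: bytes) -> int:
        if self.free_ptr + len(rec) > len(self.data):
            raise ValueError("not enough contiguous free space")
        self.data[self.free_ptr:self.free_ptr + len(rec)] = rec
        self.slots.append((self.free_ptr, len(rec)))
        self.free_ptr += len(rec)
        return len(self.slots) - 1                 # the new record's slot number

    def get(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} is empty")
        off, length = entry
        return bytes(self.data[off:off + length])

    def delete(self, slot: int) -> None:
        self.slots[slot] = None                    # slot stays, so other rids are unaffected

    def compact(self) -> None:
        """Slide live records together; only their slot entries change."""
        new_data, pos = bytearray(len(self.data)), 0
        for s, entry in enumerate(self.slots):
            if entry is not None:
                off, length = entry
                new_data[pos:pos + length] = self.data[off:off + length]
                self.slots[s] = (pos, length)
                pos += length
        self.data, self.free_ptr = new_data, pos


rec = encode_record([b"Alice", None, b"3.9"])
page = SlottedPage(size=64)
a = page.insert(rec)
b = page.insert(b"bbbbbb")
page.delete(b)
c = page.insert(b"cc")
page.compact()                                     # record c moves; its slot number does not
print(get_field(page.get(a), 0), get_field(page.get(a), 1), page.get(c), page.slots[c])
```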

12 Summary (Contd.)
The file layer keeps track of pages in a file and supports the abstraction of a collection of records. Pages with free space are identified using a linked list or directory structure (similar to how the pages in the file itself are tracked; may be integrated with that). Indexes support efficient retrieval of records by mapping values in fields to rids. Catalog relations store information about relations, indexes, and views (information common to all records in a given collection).
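For the directory option, one entry per data page records its free space, so an insert can find a page without reading data pages. Below is a minimal first-fit sketch. `PageDirectory`, the 4096-byte page size, and ignoring per-record slot overhead are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class DirEntry:
    page_id: int
    free_bytes: int


class PageDirectory:
    """Heap-file page directory: one entry per data page with its free space."""

    def __init__(self) -> None:
        self.entries: list[DirEntry] = []

    def find_page_for(self, record_size: int) -> int:
        """First fit: use the first page with room, else allocate a new page."""
        if record_size > PAGE_SIZE:
            raise ValueError("record larger than a page")
        for e in self.entries:
            if e.free_bytes >= record_size:
                e.free_bytes -= record_size
                return e.page_id
        new = DirEntry(page_id=len(self.entries), free_bytes=PAGE_SIZE - record_size)
        self.entries.append(new)
        return new.page_id

    def release(self, page_id: int, record_size: int) -> None:
        """On delete, the page regains the record's space."""
        self.entries[page_id].free_bytes += record_size


d = PageDirectory()
placed = [d.find_page_for(3000), d.find_page_for(3000), d.find_page_for(1000)]
d.release(1, 3000)
placed.append(d.find_page_for(4000))
print(placed, [e.free_bytes for e in d.entries])
```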

13 Next topic: File Organizations
Many alternatives exist, each ideal for some situations but not so good in others:
- Heap (randomly ordered) files: suitable when typical access is a file scan retrieving all records, or when access comes through a variety of secondary indexes.
- Sorted files: best if records must be retrieved in some order, or only a 'range' of records is needed.
- Indexes: data structures to organize records via trees or hashing. Like sorted files, they speed up searches for a subset of records, based on values in certain ("search key") fields. Updates are much faster than in sorted files.
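To see why sorted files help with range queries, count page reads. In a heap file a match could be on any page, so an equality search may have to visit pages one by one. In a sorted file, a binary search over pages finds the start of the range, and the rest is read sequentially. In this sketch each page is a list of keys, pages are assumed non-empty, and the 100-page file is hypothetical. The heap search is run on the same pages but does not exploit their order.

```python
from __future__ import annotations


def heap_equality(pages: list[list[int]], key: int) -> tuple[bool, int]:
    """Scan a heap file page by page until the (unique) key is found."""
    reads = 0
    for page in pages:
        reads += 1                       # each page visited is one I/O
        if key in page:
            return True, reads
    return False, reads


def sorted_range(pages: list[list[int]], lo: int, hi: int) -> tuple[list[int], int]:
    """Range selection on a file sorted on the key (pages assumed non-empty)."""
    reads = 0
    left, right = 0, len(pages)
    while left < right:                  # binary search for the first relevant page
        mid = (left + right) // 2
        reads += 1
        if pages[mid][-1] < lo:          # the whole page is below the range
            left = mid + 1
        else:
            right = mid
    matches = []
    for page in pages[left:]:            # then read forward sequentially
        reads += 1
        if page[0] > hi:
            break
        matches.extend(k for k in page if lo <= k <= hi)
    return matches, reads


pages = [list(range(p * 10, p * 10 + 10)) for p in range(100)]  # keys 0..999
print(heap_equality(pages, 999))
print(sorted_range(pages, 500, 519)[1])
```

Here the heap search reads all 100 pages to find key 999. The sorted file answers the 20-record range with 6 binary-search probes plus 3 sequential reads.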

14 Cost Model
We will ignore CPU costs, for simplicity, so:
- B: the number of data pages
- R: the number of records per page
- D: (average) time to read or write a disk page
Counting the number of page I/Os ignores the gains of prefetching a sequence of pages; thus, even the real I/O cost is only roughly approximated for now. This is an average-case analysis, based on several simplistic assumptions.
* Good enough to convey the overall trends!

15 Comparison of File Organizations
- Heap files (random order; insert at end of file)
- Sorted files, sorted on <age, sal>
- Clustered B+ tree file, Alternative (1), search key <age, sal>
- Heap file with unclustered B+ tree index on search key <age, sal>
- Heap file with unclustered hash index on search key <age, sal>

16 Operations to Compare
- Scan: fetch all records from disk
- Equality search
- Range selection
- Insert a record
- Delete a record

17 Assumptions for Our Analysis
- Heap files: equality selection on key; exactly one match.
- Sorted files: file compacted after a deletion (vs. a deleted bit).
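Under the cost model and these assumptions, the estimates for the first two organizations follow in a few steps. The sketch below encodes the standard textbook estimates for heap and sorted files only. The comments give the reasoning for each; the other three organizations are not covered. B and D are as defined in the cost model. `match_pages` (pages holding matching records in a range query) and the sample values B = 1024, D = 1 are hypothetical.

```python
from __future__ import annotations

import math


def heap_costs(B: int, D: float) -> dict[str, float]:
    search = 0.5 * B * D            # unique match: scan half the file on average
    return {
        "scan": B * D,
        "equality": search,
        "range": B * D,             # no order, so every page must be read
        "insert": 2 * D,            # read the last page, add the record, write it
        "delete": search + D,       # find the record, then write its page back
    }


def sorted_costs(B: int, D: float, match_pages: int = 1) -> dict[str, float]:
    search = D * math.log2(B)       # binary search over the B pages
    shift = B * D                   # read + write about half the file: 2 * (B/2) * D
    return {
        "scan": B * D,
        "equality": search,
        "range": D * (math.log2(B) + match_pages),
        "insert": search + shift,   # make room by shifting the later records
        "delete": search + shift,   # compact the file after removing the record
    }


heap, srt = heap_costs(1024, 1.0), sorted_costs(1024, 1.0)
for op in heap:
    print(f"{op:9} heap={heap[op]:7.1f} sorted={srt[op]:7.1f}")
```

The trade-off shows up directly. Sorting turns a 512-page equality search into about 10 page reads, but an insert now costs about B page I/Os instead of 2.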

18 Cost of Operations
* Several assumptions underlie these (rough) estimates!


