Published by Kelly Holmes. Modified over 6 years ago.
1
CS222/CS122C: Principles of Data Management Lecture #4 Catalogs, File Organizations
Instructor: Chen Li
2
Buffer management: DBMS vs. OS File System
The OS already does disk space and buffer management, so why not let the OS manage these tasks?
Differences in OS support: portability issues.
Some limitations, e.g., files can't span disks.
Buffer management in a DBMS requires the ability to:
  pin a page in the buffer pool,
  force a page to disk (important for implementing concurrency control (CC) and recovery),
  adjust the replacement policy, and
  prefetch pages based on access patterns in typical DB operations.
3
Topic: System Catalogs
For each relation:
  name, file name, file structure (e.g., Heap)
  name, type, and length (if fixed) for each attribute
  index name, target, and kind for each index
  also integrity constraints, defaults, nullability, etc.
For each index: structure (e.g., B+ tree) and search key fields
For each view: view name and definition (including query)
Plus statistics, authorization, buffer pool size, etc.
* Catalogs themselves are stored as record-based files too!
4
Attr_Cat(attr_name, rel_name, type, position)
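The idea that the catalog is itself stored as a relation can be illustrated with a small sketch. The rows below are illustrative only: the Students relation and its attributes are hypothetical, and a real system would keep these rows in a record-based file rather than a Python list.

```python
from collections import namedtuple

# One row of the Attr_Cat relation, with the fields named on the slide.
AttrCatRow = namedtuple("AttrCatRow", ["attr_name", "rel_name", "type", "position"])

ATTR_CAT = [
    # The catalog describes its own schema, because it is itself a relation.
    AttrCatRow("attr_name", "Attr_Cat", "string", 1),
    AttrCatRow("rel_name", "Attr_Cat", "string", 2),
    AttrCatRow("type", "Attr_Cat", "string", 3),
    AttrCatRow("position", "Attr_Cat", "integer", 4),
    # Hypothetical user relation, added only for illustration.
    AttrCatRow("sid", "Students", "string", 1),
    AttrCatRow("name", "Students", "string", 2),
    AttrCatRow("age", "Students", "integer", 3),
]


def schema_of(rel_name, catalog=ATTR_CAT):
    """Return the (attribute name, type) pairs of a relation, in position order."""
    rows = sorted(
        (row for row in catalog if row.rel_name == rel_name),
        key=lambda row: row.position,
    )
    return [(row.attr_name, row.type) for row in rows]
```

Looking up `schema_of("Attr_Cat")` reads the catalog's own schema out of the catalog, which is the point of the slide.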
5
Files of Records: Basic Summary
Disks provide cheap, non-volatile storage.
  Random access is possible, but the cost depends on the location of the page on disk; it is important to arrange data sequentially to minimize seek and rotation delays.
Buffer manager brings pages into RAM.
  A page stays in RAM (at least!) until it is unpinned by the last of its concurrent requestors.
  A dirty page is written to disk when its frame is chosen for replacement (some time after the requestor that dirtied it unpins the page).
  The choice of frame to replace is based on the replacement policy.
  It could be worth prefetching several pages at a time.
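The pin/unpin behavior summarized above can be sketched as a toy buffer pool. This is a minimal sketch under stated assumptions, not the course's implementation: the class names `Frame` and `BufferPool` are hypothetical, the "disk" is a plain dict of page id to bytes, and LRU is picked as one possible replacement policy.

```python
from collections import OrderedDict


class Frame:
    """One buffer frame: page contents plus bookkeeping."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.pin_count = 0
        self.dirty = False


class BufferPool:
    """Toy buffer pool with pin counts and LRU replacement (an assumed policy)."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk              # dict: page_id -> bytes, standing in for a disk
        self.frames = OrderedDict()   # page_id -> Frame, least recently used first

    def pin(self, page_id):
        """Bring the page into RAM if needed, pin it, and return its contents."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)
        else:
            if len(self.frames) >= self.num_frames:
                self._evict_one()
            self.frames[page_id] = Frame(self.disk[page_id])
        frame = self.frames[page_id]
        frame.pin_count += 1
        return frame.data

    def unpin(self, page_id, dirty=False):
        """Release one pin; the requestor reports whether it modified the page."""
        frame = self.frames[page_id]
        if frame.pin_count == 0:
            raise ValueError("page is not pinned")
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict_one(self):
        """Replace the least recently used unpinned frame, writing it back if dirty."""
        victim = None
        for page_id, frame in self.frames.items():
            if frame.pin_count == 0:
                victim = page_id
                break
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            self.disk[victim] = bytes(frame.data)
```

Note that a dirty page reaches the disk only when its frame is replaced, and that a pinned page is never chosen as a victim, matching the two rules in the summary.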
6
Summary (Contd.) DBMS vs. OS File Support
A DBMS needs features not found in many OSs, such as forcing a page to disk, controlling the order of page writes to disk, letting files span disks, and controlling prefetching and page replacement policies based on (predictable) DB access patterns.
A variable-length record format with a field offset directory supports direct access to the i'th field and also supports null values.
A slotted page format supports variable-length records and allows records to move within a page.
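The field offset directory mentioned above can be sketched as follows. This is one possible layout, assumed for illustration: the record starts with n+1 little-endian 16-bit offsets (the start of each field plus the end of the record), and a null is encoded as a field whose start and end offsets coincide. Under that assumption an empty value cannot be distinguished from a null; the function names are hypothetical.

```python
import struct


def encode_record(fields):
    """Encode a list of bytes values (or None for null) with an offset directory."""
    n = len(fields)
    header_size = 2 * (n + 1)      # n start offsets plus one end-of-record offset
    offsets = []
    body = b""
    for value in fields:
        offsets.append(header_size + len(body))
        if value is not None:
            body += value
    offsets.append(header_size + len(body))
    return struct.pack("<%dH" % (n + 1), *offsets) + body


def get_field(record, i):
    """Fetch the i'th field directly, without scanning the preceding fields."""
    start, end = struct.unpack_from("<2H", record, 2 * i)
    if start == end:               # assumed null encoding: zero-length field
        return None
    return record[start:end]
```

Reading field i touches only two directory entries, which is what makes access to the i'th field direct regardless of how long the earlier fields are.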
7
Summary (Contd.)
The file layer keeps track of the pages in a file and supports the abstraction of a collection of records.
  Pages with free space are identified using a linked list or a directory structure (similar to how the pages in the file itself are tracked; the two may be integrated).
Indexes support efficient retrieval of records by mapping from values in fields to rids.
Catalog relations store information about relations, indexes, and views. (Information common to all records in a given collection.)
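The mapping from field values to rids can be sketched with a toy heap file. Everything here is a simplifying assumption for illustration: a rid is taken to be a (page number, slot number) pair, pages are Python lists, `PAGE_CAPACITY` is a made-up constant, and the "index" is an in-memory dict rather than a tree or hash structure on disk.

```python
from collections import defaultdict

PAGE_CAPACITY = 2  # hypothetical number of records per page


class HeapFile:
    """Toy heap file: a list of pages, each holding up to PAGE_CAPACITY records."""

    def __init__(self):
        self.pages = [[]]

    def insert(self, record):
        """Append the record to the last page and return its rid."""
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])
        page_no = len(self.pages) - 1
        self.pages[page_no].append(record)
        return (page_no, len(self.pages[page_no]) - 1)

    def fetch(self, rid):
        """Retrieve a record given its rid."""
        page_no, slot_no = rid
        return self.pages[page_no][slot_no]


def build_index(heap_file, field):
    """Map each value of `field` to the rids of the records that hold it."""
    index = defaultdict(list)
    for page_no, page in enumerate(heap_file.pages):
        for slot_no, record in enumerate(page):
            index[record[field]].append((page_no, slot_no))
    return index
```

With such an index, an equality lookup fetches only the pages named by the matching rids instead of scanning the whole file.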
8
Next topic: File Organizations
Many alternatives exist. Each one is ideal for some situations, but not so good in others:
Heap (random order) files: Suitable when the typical access is a file scan retrieving all records, or when access comes through a variety of secondary indexes.
Sorted Files: Best if records must be retrieved in some order, or only a 'range' of records is needed.
Indexes: Data structures to organize records via trees or hashing.
  Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields.
  Updates are much faster than in sorted files.
9
Cost Model
We will ignore CPU costs, for simplicity, so:
  B: The number of data pages
  R: Number of records per page
  D: (Average) time to read or write a disk page
Counting the number of page I/Os ignores the gains of prefetching a sequence of pages; thus, even the real I/O cost is only roughly approximated for now.
Average-case analysis; based on several simplistic assumptions.
* Good enough to convey the overall trends!
10
Comparison of File Organizations
Heap files (random order; insert at eof)
Sorted files, sorted on <age, sal>
11
Operations to Compare
Scan: Fetch all records from disk
Equality search
Range selection
Insert a record
Delete a record
12
Assumptions for Our Analysis
Heap Files: Equality selection on key; exactly one match.
Sorted Files: File compacted after a deletion (vs. a deleted bit).
13
Cost of Operations
* Several assumptions underlie these (rough) estimates!
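The cost table from this slide is not present in this transcript. As a rough stand-in, the sketch below encodes the commonly quoted textbook-style I/O estimates for heap and sorted files, using B and D from the cost model and the assumptions stated earlier (equality on a key with exactly one match; sorted file compacted after a deletion). The function names and the `matching_pages` parameter are hypothetical, and the formulas are an assumption about what the slide showed rather than a copy of it.

```python
import math


def heap_file_costs(B, D):
    """Assumed I/O cost estimates for a heap file with B pages, D time per page I/O."""
    search = 0.5 * B * D             # on average, half the file is read to find a key
    return {
        "scan": B * D,               # read every page
        "equality": search,
        "range": B * D,              # matches can be anywhere, so scan everything
        "insert": 2 * D,             # read the last page, write it back
        "delete": search + D,        # find the record, write its page back
    }


def sorted_file_costs(B, D, matching_pages=1):
    """Assumed I/O cost estimates for a sorted file; binary search locates a page."""
    search = D * math.log2(B)
    return {
        "scan": B * D,
        "equality": search,
        "range": D * (math.log2(B) + matching_pages),
        "insert": search + B * D,    # find the slot, then shift about half the file (read + write)
        "delete": search + B * D,    # find the record, then compact the rest of the file
    }
```

For example, with B = 1024 and D = 1, an equality search costs about 512 page I/Os on the heap file but about 10 on the sorted file, while an insert costs 2 on the heap file and over 1000 on the sorted file. This is the trade-off the comparison is meant to convey.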