CS 540 Database Management Systems Lecture 4: DBMS Architecture, storage
The advantage of RDBMS
- It separates the logical level (schema) from the physical level (implementation).
- Physical data independence
  - Users do not worry about how their data is stored and processed on the physical devices.
  - It is all SQL!
  - Their queries work over (almost) all RDBMS deployments.
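Physical data independence can be illustrated with a small sketch. The example below uses Python's built-in sqlite3 module; the table `student`, its columns, and the index name are hypothetical and chosen only for illustration. The same SQL query is run before and after a change to the physical design (adding an index), and the answer should not change.

```python
import sqlite3

def run_query(conn):
    # The query names only logical objects (table, columns), never storage details.
    rows = conn.execute(
        "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student (name, gpa) VALUES (?, ?)",
    [("Ann", 3.9), ("Bob", 3.1), ("Cy", 3.7)],
)

before = run_query(conn)

# Change the physical design only: add an index on gpa (hypothetical name).
conn.execute("CREATE INDEX student_gpa_idx ON student (gpa)")

after = run_query(conn)
print(before == after)
```

The query text is untouched by the index; only the way the system may choose to evaluate it can differ.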
Challenges in physical level
- Processor: 10000 – 100000 MIPS
- Main memory: around 10 Gb/sec
- Secondary storage: higher capacity and durability
- Disk random access: seek time + rotational latency + transfer time
  - Seek time: 4 ms – 15 ms!
  - Rotational latency: 2 ms – 7 ms!
  - Transfer rate: at most 1000 Mb/sec
- Read, write in blocks.
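As a rough worked example of the random access cost, the sketch below adds up the three components for one block. Two figures are assumptions not stated on the slide: a block size of 4 KiB, and the reading of 1000 Mb/sec as megabits, i.e. 125,000,000 bytes/sec.

```python
def random_access_ms(seek_ms, rotational_ms, block_bytes, bytes_per_sec):
    """Time to read one block at a random position, in milliseconds."""
    transfer_ms = block_bytes / bytes_per_sec * 1000.0
    return seek_ms + rotational_ms + transfer_ms

BLOCK_BYTES = 4096           # assumed block size (4 KiB)
BYTES_PER_SEC = 125_000_000  # assumed: 1000 Mb/sec read as megabits per second

# Best and worst cases from the seek and rotational ranges on the slide.
best_ms = random_access_ms(4.0, 2.0, BLOCK_BYTES, BYTES_PER_SEC)
worst_ms = random_access_ms(15.0, 7.0, BLOCK_BYTES, BYTES_PER_SEC)
transfer_only_ms = BLOCK_BYTES / BYTES_PER_SEC * 1000.0

print(f"best {best_ms:.3f} ms, worst {worst_ms:.3f} ms, transfer {transfer_only_ms:.3f} ms")
```

Under these assumptions the transfer itself takes a small fraction of a millisecond, so seek time and rotational latency dominate the cost of a random block read.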
Random access versus sequential access
- Disk random access: seek time + rotational latency + transfer time.
- Disk sequential access: reading blocks next to each other
  - No seek time or rotational latency
  - Much faster than random access
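The gap between the two access patterns can be estimated with a simple cost model. This is a sketch under assumptions: 4 KiB blocks, a transfer rate of 125,000,000 bytes/sec, the best-case seek and rotational figures, and a sequential read that pays for positioning once (to reach the first block) and then only for transfer.

```python
def transfer_ms(block_bytes, bytes_per_sec):
    return block_bytes / bytes_per_sec * 1000.0

def random_read_ms(n_blocks, seek_ms, rot_ms, block_bytes, bytes_per_sec):
    # Every block pays seek + rotational latency + transfer.
    return n_blocks * (seek_ms + rot_ms + transfer_ms(block_bytes, bytes_per_sec))

def sequential_read_ms(n_blocks, seek_ms, rot_ms, block_bytes, bytes_per_sec):
    # Assumed model: position the head once, then transfer adjacent blocks.
    return seek_ms + rot_ms + n_blocks * transfer_ms(block_bytes, bytes_per_sec)

N_BLOCKS = 1000
rand_ms = random_read_ms(N_BLOCKS, 4.0, 2.0, 4096, 125_000_000)
seq_ms = sequential_read_ms(N_BLOCKS, 4.0, 2.0, 4096, 125_000_000)
speedup = rand_ms / seq_ms

print(f"random {rand_ms:.1f} ms, sequential {seq_ms:.1f} ms, speedup {speedup:.0f}x")
```

With these numbers the sequential read is more than a hundred times faster, which is why a DBMS cares about laying out related blocks next to each other.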
DBMS Architecture
- User/Web Forms/Applications/DBA (inputs: query, transaction)
- Process manager
- Query Parser, Query Rewriter, Query Optimizer, Query Executor
- Transaction Manager, Logging & Recovery, Lock Manager
- Files & Access Methods
- Buffer Manager
- Storage Manager
- Main Memory: Lock Tables, Buffers
- Storage
DBMS Architecture (diagram repeated, with a "This lecture" marker placed by the Buffer Manager, Main Memory, Storage Manager, and Storage components)
A Design Dilemma
- To what extent should we reuse OS services?
- Reuse as much as we can
  - Performance problem (inefficient)
  - Lack of control (incorrect crash recovery)
- Replicating some OS functions (“mini OS”)
  - Have its own buffer pool
  - Directly manage record structures with files
  - …
OS vs. DBMS: Similarities?
- What do they manage?
- What do they provide?
OS vs. DBMS: Similarities
- Purpose of an OS:
  - managing hardware
  - presenting an interface abstraction to applications
- Is a DBMS in some sense an OS?
  - A DBMS manages data
- Both serve as an API for application development!
OS vs. DBMS: Related Concepts
- Process management: what DB concepts?
  - process synchronization
  - deadlock handling
- Storage management: what DB concepts?
  - virtual memory
  - file system
OS vs. DBMS: Differences?
OS vs. DBMS: Differences
- DBMS: top-down, to encapsulate high-level semantics!
  - Data: data with particular logical structures
  - Queries: query language with well-defined operations
  - Transactions: transactions with ACID properties
- OS: bottom-up, to present low-level hardware
Problems with DBMS on top of OS
- Buffer pool management
- File system
- Process management
- Consistency control
- Paged virtual memory
Buffer Pool Management
- Performance of system calls
- LRU replacement
  - Query-aware replacement needed for performance
  - Circular access: 1, 2, …, n, 1, 2, …
- Prefetching
  - DBMS knows exactly which block is to be fetched next
- Crash recovery
  - Need “selected force out”
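The circular access problem can be seen in a small simulation. This is a minimal sketch, not any particular system's buffer manager: the function `count_hits` is hypothetical, and MRU (evict the most recently used block) is used here only as one example of a replacement policy that fits a repeated scan better than LRU.

```python
from collections import OrderedDict

def count_hits(accesses, n_frames, policy):
    """Simulate a buffer pool; policy is 'LRU' or 'MRU'. Returns the number of hits."""
    pool = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in pool:
            hits += 1
            pool.move_to_end(block)  # mark as most recently used
            continue
        if len(pool) >= n_frames:
            # LRU evicts the oldest entry; MRU evicts the newest one.
            pool.popitem(last=(policy == "MRU"))
        pool[block] = True
    return hits

n = 5
# Circular access 1, 2, ..., n repeated four times, with one frame fewer than n.
accesses = [b for _ in range(4) for b in range(1, n + 1)]
lru_hits = count_hits(accesses, n_frames=n - 1, policy="LRU")
mru_hits = count_hits(accesses, n_frames=n - 1, policy="MRU")

print("LRU hits:", lru_hits, "MRU hits:", mru_hits)
```

With n blocks and n - 1 frames, LRU always evicts exactly the block that will be needed next, so it never hits. A policy chosen with knowledge of the access pattern keeps most of the scan in memory.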
Relations vs. File system
- Data object abstraction
  - file: array of characters
  - relation: set of tuples
- Physical contiguity: large DB files want clustering of blocks
  - sol1: managing raw disks by DBMS
  - sol2: simulate by managing free space in DBMS: the DBMS asks the OS for larger-than-needed-now chunks and manages space within the DBMS
- Multiple trees (access methods)
  - file access: directory hierarchy (user access method)
  - block access: inodes
  - tuple access: DBMS indexes
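The second solution (asking the OS for larger-than-needed chunks and managing the space inside the DBMS) might look roughly like the sketch below. The class `ChunkAllocator` and its fields are hypothetical; the call to the OS is replaced by a counter, and the sketch assumes that each chunk the OS hands back is laid out contiguously.

```python
class ChunkAllocator:
    """Hands out contiguous block ranges from space obtained in large chunks."""

    def __init__(self, chunk_blocks):
        self.chunk_blocks = chunk_blocks  # blocks requested from the OS at a time
        self.next_free = 0                # first block not yet handed out
        self.end = 0                      # one past the last block obtained so far
        self.os_requests = 0              # how often the OS was asked for space

    def _grow(self):
        # Stand-in for asking the OS to extend the file by one whole chunk.
        self.end += self.chunk_blocks
        self.os_requests += 1

    def allocate(self, n_blocks):
        """Return a (start, end) block range, end exclusive."""
        while self.next_free + n_blocks > self.end:
            self._grow()
        start = self.next_free
        self.next_free += n_blocks
        return (start, start + n_blocks)

alloc = ChunkAllocator(chunk_blocks=64)
first = alloc.allocate(10)
second = alloc.allocate(10)
print(first, second, alloc.os_requests)
```

Both allocations are served from the single chunk requested up front, so they sit next to each other and the OS is involved only once.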
Process management
- Reuse OS process management
  - One process for each user
  - Problem: DB processes are large, so it takes a long time to switch between processes
  - Problem: critical sections. Processes may have to wait for a descheduled process that holds locks.
- n server processes that handle users’ requests
  - Duplication of OS multi-tasking inside servers!
  - Communication between processes: message passing is not efficient
- Solutions:
  - OS implements favored processes, which are not forced out and relinquish control voluntarily.
  - Faster message passing methods.
Consistency control
- OS provides some support for locking and recovery.
  - OS provides locks on files
  - DB requires locks on smaller units like tuples
- Commit point
  - Buffer manager ensures all changes are flushed to disk.
  - Buffer manager must know the inside of transactions.
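The difference in lock granularity can be sketched with a toy lock table. Everything here is hypothetical and simplified: the `LockTable` class supports exclusive locks only, with no waiting or deadlock handling, and the resource names are made up for illustration.

```python
class LockTable:
    """Exclusive locks only; a resource is any hashable identifier."""

    def __init__(self):
        self.owner = {}  # resource -> transaction currently holding it

    def try_lock(self, txn, resource):
        holder = self.owner.get(resource)
        if holder is None or holder == txn:
            self.owner[resource] = txn
            return True
        return False  # held by another transaction

    def release_all(self, txn):
        for res in [r for r, t in self.owner.items() if t == txn]:
            del self.owner[res]

# File-level locking: both transactions must lock the whole file.
file_locks = LockTable()
f1 = file_locks.try_lock("T1", "student.dat")
f2 = file_locks.try_lock("T2", "student.dat")

# Tuple-level locking: the transactions touch different tuples of the same relation.
tuple_locks = LockTable()
t1 = tuple_locks.try_lock("T1", ("student", 1))
t2 = tuple_locks.try_lock("T2", ("student", 2))

print(f1, f2, t1, t2)
```

With file locks the second transaction is blocked even though it touches a different tuple. With tuple locks both can proceed.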
State of the art
- DBMSs duplicate some OS functionalities.
- OS customized for DBMS