Database System Architecture CSCI 6442 ©Copyright 2019, David C. Roberts, all rights reserved
Agenda Relational and performance Database performance goals DBMS use of disk DBMS Architecture
Origins of the Relational Approach First appeared in Codd’s 1970 Communications of the ACM Article Emphasized data independence Emphasized more rigorous foundation for data management Performance of early relational systems so poor that it called into question the practicality of the relational approach
Moore’s Law to the Rescue!
                1970         1984     1997     2007     2010
Cost            $4,600,000   $4,000   $1,000   $550     $600
Speed (MHz)     12.5         8.3      166      1600     3000
Cost per MHz    $368,000     $482     $6       $.34     $.20
What Does This Mean?
1970: Expensive Computer, Cheap People. Storage and processing resources were scarce and expensive; the computer price pays for 400 people for a year
2010: Cheap Computer, Expensive People. Processing and storage are so cheap that they are nearly free; the computer price pays for one person for a day
1970 Data Models Performance the key issue Data model tailored to a business process Business process details drive the data model Code is written for the single business process Applications can be built to be rather efficient based on such a data model Correspondence between data model and a single business process is a given
The March of Technology Today’s technology advances make what was brute force and clumsy and expensive yesterday the elegant easy solution for today We may need to rethink basic approaches in the light of today’s technical economics
Relational—Mirroring Previous Systems Pre-relational: big deal to change the database structure Database structure was embedded in applications, so applications had to change Relational: huge improvement for the DBA to be able to change physical structure and not impact applications
Relational Performance Compared with a hand-coded application with custom data structures, a relational database has perhaps 10x poorer performance We are gaining more efficient use of people and paying for it with CPU cycles
Conventional Data Models Conceptual—model includes entity types, relationships Logical—model includes entity types, relationships and attributes Physical—logical model mapped onto physical structures provided by DBMS
DBMS Physical Model DBMS typically stores attributes of a single row near each other Rows in a table may be in a single file or otherwise co-located Changing entity types and their attributes in physical database is reserved for the DBA and can’t be done by applications Why? For performance (a la 1970?), and because earlier DBMSes did it that way
What a DBMS Does At its heart, an RDBMS offers three things: tables, attributes of tables, and constraints Application code is written to do CRUD operations on tables and enforce constraints Transaction processing and access control are built into the DBMS Other important features are built on these basic ones
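The three basics named above (tables, attributes, constraints) and the CRUD operations can be sketched with Python's built-in sqlite3 module. This is only an illustration: SQLite stands in for a generic RDBMS, and the department/employee schema and its values are hypothetical, not part of the course material. In this sketch the constraints are declared in the schema and checked by the DBMS itself.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Tables, attributes, and constraints (hypothetical schema).
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0),
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )""")

# Create
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.execute("INSERT INTO employee VALUES (10, 'Ada', 5000.0, 1)")

# Read
row = conn.execute(
    "SELECT name, salary FROM employee WHERE emp_id = 10").fetchone()

# Update
conn.execute("UPDATE employee SET salary = 5500.0 WHERE emp_id = 10")
new_salary = conn.execute(
    "SELECT salary FROM employee WHERE emp_id = 10").fetchone()[0]

# A row that violates the CHECK constraint is rejected by the DBMS.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Bob', -1.0, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Delete
conn.execute("DELETE FROM employee WHERE emp_id = 10")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```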
Components of a DBMS
DBMS Architecture Data is stored on disk Disk is necessary for database to be reliably available Disk is millions of times slower than anything that happens in RAM Number of disk accesses is a good measure of DBMS cost for an operation
Disk Disk is composed of fixed-length records on a rotating surface To access information, we need to move the head and wait for the disk to rotate We wait the same time whether we use one byte or the whole record We call this fixed-length record a page
Efficient Use of Disk For efficient use of disk, we want to use all the information contained in a single page We will look at how we organize disk in order to reduce the number of disk accesses for a search
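A rough worked example of why using the whole page matters. The page size, row size, and row count below are assumed figures chosen for illustration only; the point is the ratio between the access counts, not the exact numbers.

```python
import math

PAGE_SIZE = 8192       # bytes per page (assumed)
ROW_SIZE = 128         # bytes per row (assumed)
NUM_ROWS = 1_000_000   # rows in the table (assumed)

# How many rows fit in one page, and how many pages the table occupies.
rows_per_page = PAGE_SIZE // ROW_SIZE
data_pages = math.ceil(NUM_ROWS / rows_per_page)

# Worst case: one disk access per row, throwing away the rest of each page.
one_access_per_row = NUM_ROWS

# Full scan that uses every row in a page once the page has been read.
full_scan_reads = data_pages

# If the pages are kept sorted by key, a binary search over pages needs
# only about log2(number of pages) disk accesses to find one key.
binary_search_reads = math.ceil(math.log2(data_pages))

print(rows_per_page, data_pages, binary_search_reads)
```

With these assumed sizes, reading page by page cuts a full scan from one million accesses to 15,625, and organizing the pages for search cuts a single lookup to about 14.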
Disk vs. RAM RAM is accessible in any order Any sort of structures can be used Data structure courses usually cover data structures for RAM We’ll talk about how to make efficient use of disk
Physical Implementation The DBMS gets a file from the OS that it then writes One or more of these files are managed as the database The DBMS allocates space within physical records to use as rows and index blocks A database row is implemented as a logical record in the file system Thus, the DBMS actually physically implements the row structure of the database Which has the same entity types as the conceptual data model Which has the same attributes as the logical data model
The Database
The database may be spread across multiple physical disk drives
[Figure: Extent 1, Extent 2, Extent 3]
Row: <<tid>,<rid>,<cid>,<cli>,<cv>, … , <cid>,<cli>,<cv>, … >>
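One way to read the row layout on this slide is as a table ID and row ID followed by repeated (column ID, column length, column value) triples. The slide does not spell out the abbreviations, so that reading, the field widths, and the function names encode_row and decode_row are all assumptions made for this sketch.

```python
import struct

ROW_HEADER = "<HI"   # assumed: 2-byte table ID (tid), 4-byte row ID (rid)
COL_HEADER = "<HH"   # assumed: 2-byte column ID (cid), 2-byte length (cli)


def encode_row(tid, rid, columns):
    """Pack a row as tid, rid, then (cid, cli, cv) for each column.

    columns is a list of (cid, value_bytes) pairs.
    """
    out = struct.pack(ROW_HEADER, tid, rid)
    for cid, value in columns:
        out += struct.pack(COL_HEADER, cid, len(value)) + value
    return out


def decode_row(buf):
    """Reverse encode_row: return (tid, rid, [(cid, value_bytes), ...])."""
    tid, rid = struct.unpack_from(ROW_HEADER, buf, 0)
    offset = struct.calcsize(ROW_HEADER)
    columns = []
    while offset < len(buf):
        cid, cli = struct.unpack_from(COL_HEADER, buf, offset)
        offset += struct.calcsize(COL_HEADER)
        columns.append((cid, buf[offset:offset + cli]))
        offset += cli
    return tid, rid, columns


record = encode_row(7, 42, [(1, b"Ada"), (2, b"Research")])
decoded = decode_row(record)
```

Storing the length with each value lets rows carry variable-length columns and skip columns that have no value, at the cost of a few header bytes per column.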
DBMS and Applications [Figure labels: Database Management System; Application Program; Buffer (one per Application Program)]
DBMS Software Architecture [Figure labels: Database System; System Global Area; Application Program; Buffer (one per Application Program)]
SQL Processing SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer -> Quads -> Executor -> Results
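The pipeline on this slide can be sketched end to end for one tiny query shape. Everything here is a toy built on assumptions: "quads" are taken to mean four-field intermediate instructions (operator, argument 1, argument 2, result), the grammar accepts only SELECT col FROM table WHERE col op number, and the names lex, parse, and execute are made up for the illustration.

```python
import operator
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(>=|<=|<>|[=<>]))")
KEYWORDS = {"SELECT", "FROM", "WHERE"}
OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def lex(sql):
    """Lexical analyzer: SQL text -> list of (kind, value) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, word, symbol = m.groups()
        if number is not None:
            tokens.append(("NUM", int(number)))
        elif word is not None:
            if word.upper() in KEYWORDS:
                tokens.append(("KW", word.upper()))
            else:
                tokens.append(("ID", word))
        else:
            tokens.append(("SYM", symbol))
        pos = m.end()
    return tokens


def parse(tokens):
    """Syntax analyzer: tokens -> quads (op, arg1, arg2, result)."""
    if [kind for kind, _ in tokens] != ["KW", "ID", "KW", "ID",
                                        "KW", "ID", "SYM", "NUM"]:
        raise SyntaxError("expected SELECT col FROM table WHERE col op number")
    sel, col, frm, table, whr, wcol, op, val = [v for _, v in tokens]
    if (sel, frm, whr) != ("SELECT", "FROM", "WHERE"):
        raise SyntaxError("keywords out of order")
    return [("SCAN", table, None, "t1"),
            ("FILTER", "t1", (wcol, op, val), "t2"),
            ("PROJECT", "t2", col, "t3")]


def execute(quads, tables):
    """Executor: run the quads against in-memory tables (lists of dicts)."""
    temps = {}
    for op, arg1, arg2, result in quads:
        if op == "SCAN":
            temps[result] = list(tables[arg1])
        elif op == "FILTER":
            col, cmp, val = arg2
            temps[result] = [r for r in temps[arg1] if OPS[cmp](r[col], val)]
        elif op == "PROJECT":
            temps[result] = [r[arg2] for r in temps[arg1]]
    return temps[result]


tables = {"employee": [{"name": "Ada", "salary": 5000},
                       {"name": "Bob", "salary": 90}]}
quads = parse(lex("SELECT name FROM employee WHERE salary > 100"))
results = execute(quads, tables)
```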
Executor Software Architecture SQL Executor Table Management Index Management Row Management Node Management Page Management Data Store
Pages Disk is divided into physical records called “pages” A page can be an index page (i.e., part of a B-tree) or a data page An index page contains one node of a B-tree A data page contains rows of tables Question: is there a third kind of page?
Page Allocation Pages are initially considered all unallocated In response to requests, they are allocated and marked allocated When freed, they are chained onto a list of free pages
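The allocation scheme above can be sketched as a small class. This is a minimal in-memory illustration, not how any particular DBMS does it: the class name PageAllocator is hypothetical, and the dictionary next_free stands in for the "next free page" pointer that a real system would typically store inside each freed page.

```python
class PageAllocator:
    """Hands out page IDs and chains freed pages onto a free list."""

    NO_PAGE = -1

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.next_unused = 0            # pages from here up were never allocated
        self.free_head = self.NO_PAGE   # head of the chain of freed pages
        self.next_free = {}             # freed page ID -> next page in the chain
        self.allocated = set()          # pages currently marked allocated

    def allocate(self):
        if self.free_head != self.NO_PAGE:
            # Reuse the most recently freed page first.
            page = self.free_head
            self.free_head = self.next_free.pop(page)
        elif self.next_unused < self.num_pages:
            page = self.next_unused
            self.next_unused += 1
        else:
            raise RuntimeError("no pages available")
        self.allocated.add(page)
        return page

    def free(self, page):
        if page not in self.allocated:
            raise ValueError(f"page {page} is not allocated")
        self.allocated.remove(page)
        # Chain the page onto the front of the free list.
        self.next_free[page] = self.free_head
        self.free_head = page


allocator = PageAllocator(num_pages=3)
first = allocator.allocate()
second = allocator.allocate()
allocator.free(first)
reused = allocator.allocate()   # comes from the free list, not from unused pages
```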
Database Extents Database needs to be able to extend over disk boundaries Size may require it Growth may require it Typically it’s managed as “extents”, each of which is a file to the OS file system Multiple files are mapped into a single sequence of page IDs
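Mapping several extent files into one sequence of page IDs amounts to a lookup from a global page ID to a file and a byte offset within it. The sketch below assumes a fixed page size; the class name ExtentMap and the file names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

PAGE_SIZE = 8192  # bytes per page (assumed)


class ExtentMap:
    """Maps a global page ID to (extent file name, byte offset in that file)."""

    def __init__(self, extents):
        # extents: list of (file_name, number_of_pages_in_file)
        self.files = [name for name, _ in extents]
        # starts[i] is the first global page ID stored in extent i;
        # the final entry is the total number of pages in the database.
        self.starts = [0] + list(accumulate(n for _, n in extents))

    def locate(self, page_id):
        if not 0 <= page_id < self.starts[-1]:
            raise IndexError(f"page {page_id} is outside the database")
        i = bisect_right(self.starts, page_id) - 1
        return self.files[i], (page_id - self.starts[i]) * PAGE_SIZE


extent_map = ExtentMap([("extent1.dbf", 100),
                        ("extent2.dbf", 50),
                        ("extent3.dbf", 200)])
location = extent_map.locate(100)   # first page of the second extent
```

Because growth only appends a new extent to the list, existing page IDs keep their meaning when the database is extended onto another disk.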
Extents SQL Executor Table Management Index Management Row Management Node Management Page Management Extent Management Data Store
Startup At startup, DBMS creates an empty system catalog Catalog has images of some tables; once images are established, then SQL can be used to create other tables
System Catalog The DBMS uses the system catalog to track objects of interest When the DBMS starts with a new database, it lays down part of the system catalog from an image The rest of the system catalog is created by SQL statements Many SQL statements reference or change the system catalog
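To see SQL statements changing a system catalog, SQLite can serve as a stand-in: its catalog is the built-in sqlite_master table. Other DBMSs name and structure their catalogs differently, so treat this only as an illustration; the course table and the helper catalog_tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def catalog_tables(conn):
    """Return the table names currently recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]


before = catalog_tables(conn)            # new database: no user tables yet

# CREATE TABLE adds a catalog entry as a side effect.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
after_create = catalog_tables(conn)

# DROP TABLE removes it again.
conn.execute("DROP TABLE course")
after_drop = catalog_tables(conn)
```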
System Catalog You will have the opportunity to learn more about the system catalog in your assignment.