Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.

Similar presentations


Presentation on theme: "1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS."— Presentation transcript:

1 1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS

2 2 Topics Phyical Storage Indexing Query Optimization Making ACID Work  Transactions  Concurrency  Journaling  Rollback/Rollforward  Recovery Distributed Database

3 3 Physical Storage Storage Hierarchy  Main memory  Secondary Storage (Disk)  Maybe: Tape, CD, … Generally:  Databases are too big for main memory  Databases require non-volatile storage I.e. not main memory Buffer Management: Moving data between levels in the storage hierarchy

4 4 Secondary Storage Organization Database is stored on  One or more disk files  One or more disks Database can be  One file  One or more files per table  A collection of files DBA specifies how data is spread among files

5 5 Database Files Files are organized into blocks or pages Records are stored in pages  May require that records fit in single pages  May allow records to span multiple pages  Records can be fixed or variable length Usually variable length Need to handle VARCHAR, BLOBS See DBMS documentation for:  Available strategies for DBMS  How to compute record sizes and file sizes Good rule of thumb: Database size is twice the raw data size

6 6 Record Organization Records may be organized  In insertion order  Sorted by primary key  Hashed by primary key Records may be clustered  Records of different types sharing some key are stored together  E.g. store Order and Order Line records together Best Record Organization?  For small problems: Use DBMS default  For large problems: Based on characteristics of problem

7 7 Indexing Most common indexing scheme: B Tree or B+ Tree  B Tree: Balanced Tree, tree has constant depth  Can be used for keys and non-unique indices  Allows retrieval based on key in O(log n) time (n is number of records in table) As many I/Os as there are levels in tree (+/- 1)  Allows retrieval of records based on partial value E.g. First field of two field key, but not second field  Allows retrieval of records in sorted order

8 8 Indexing Most common second choice: Hashing  Can be used for keys and non-unique indices  Retrieve records in O(1) time Typical retrievals in one or two I/Os  Can’t use partial values, doesn’t help sorting  Sometimes used for clustering of multiple records Other choices  Bit maps  Linear orderings  Two-level trees

9 9 What and How to Index DBMS will impose some constraints  Typical: Any unique key (includes primary key) must be indexed Additional indices:  Make retrieval faster  Make update slower Strategy:  Start with minimal set of indices and build up from there  Adding and dropping indices has relatively small cost in an RDBMS

10 10 Query Optimization Take a query (say an SQL Select)  Rewrite it into Relational Algebra Figure out best way to answer that query  Lowest cost?, Fastest?  Frequently, what is best way to do joins Have collection of available strategies  File scans, Use indices, External file sorts, Parallel processing, etc., etc. Most RDBMSs have mechanism to describe the strategy for a specific query  Use to analyze problematic queries

11 11 Heuristic Query Optimization Use heuristics to choose best approach Sample heuristics:  Use an index if possible  Do joins in order given in FROM clause Problem: Bad choices can be really bad Advantage: Usually allow knowledgeable user to get good behavior by specifying query the “right” way  Problem: The “right” way may change over time and/or as database grows

12 12 Cost Based Query Optimization Estimate cost of specific strategies Search space of possible strategies for “best” answer Issue: Need accurate cost estimation  Requires statistics about size, composition of database  DBA responsible for periodically running statistics gathering and updating programs Advantage: Strategy can change as database changes Advantage: Bad choices are usually not too bad Problem: Harder for knowledgeable user to control problematic queries

13 13 ACID Properties Atomicity: Transactions either complete successfully or have no effect on database Consistency: Database moves from consistent state to consistent state Isolation: Transactions that overlap in time are non- interfering  Ideal is Serializability: Overlapping transactions behave as if they were executed in some serial order Durability: Data from completed transactions is never lost

14 14 Transactions Queries, and most importantly updates, by a single application are collected together into a transaction DBMS provides transaction mechanism Application specifies transaction boundaries Example:  Adding an order is a transaction  Insertion of Order and Order Line records are collected together

15 15 Concurrency DBMS must provide mechanisms to prevent multiple transactions from concurrently modifying same parts of database DBMS can provide mechanisms for “repeatable reads”  Does application get same results if SELECT is repeated  Due to performance issues, application usually has to ask for repeatable reads

16 16 Some Concurrency Mechanisms Locks  Read and write locks on records and/or pages and/or tables  Lock escalation Read locks become write locks Record and/or page locks become file locks Optimistic  Assume everything is ok  Check at transaction completion that everything worked Versioned pages  Updates create new versions of pages  Old versions kept around as long as needed

17 17 Concurrency and Transactions Have mechanism for application to tell DBMS to begin and end transactions Must have mechanism for DBMS to tell application that “it didn’t work”  “You can’t update that record because another application has it locked”  “Your transaction can’t complete because another already completed one conflicts”  Application is responsible for re-trying as appropriate

18 18 Journaling DBMS keeps journal(s) of what has been changed in the database Journal has  “Before” images: What the database looked like before changes were made  “After” images: What the database looked like after the changes were made Before and After images may be stored separately or together Depending on algorithms: Data and/or Journal must be written to non-volatile storage before transaction is complete  Needed for durability DBA is responsible for journal management

19 19 Rollback/Rollforward Rollback undoes updates from incomplete transactions  Uses “before” images  Why? Transaction could not finish Transaction failed before End Transaction Database failed before transaction completed Rollforward redoes updates from completed transactions  Uses “after” images  Why? Database failed before updates from completed transactions written to secondary storage

20 20 Recovery On restart after failure: Perform rollback and/or rollforward as required to return database to consistent state Consistent state:  All updates from all completed transactions appear in database  No results from any incomplete transactions appear in database Provide ability to restore database from a saved copy and the journal

21 21 ACID Properties and Mechanisms Atomicity Consistency Isolation Durability Transactions  Concurrency  Journaling  Rollback / Rollforward  Recovery  DBMS provides mechanisms to support ACID properties Applications must direct DBMS properly Algorithms for ACID mechanisms are well known Implementing ACID mechanisms is tricky

22 22 Distributed Database Replication  Replicate changes between multiple copies of the database  Frequently uses deferred copying Clustering  Running database on a cluster  Needs distributed concurrency mechanisms Two phase commit: Reliable transaction commit in distributed environment  Would like to take advantage of parallel resources to speed query processing


Download ppt "1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS."

Similar presentations


Ads by Google