Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu.

Presentation transcript:

Database Management Systems 1 Raghu Ramakrishnan Database Design Review Aug 24, 2015 Instructor: Xintao Wu

Database Management Systems 2 Raghu Ramakrishnan Outline
• E-R design
• Relational model design
• SQL conceptual evaluation
• Normalization
• Query optimization

Database Management Systems 3 Raghu Ramakrishnan Overview of db design
• Requirement analysis
  – Data to be stored
  – Applications to be built
  – Operations (most frequent) subject to performance requirements
• Conceptual db design
  – Description of the data (including constraints)
  – Expressed in a high-level model such as ER
• Logical db design
  – Choose a DBMS to implement the design
  – Convert the conceptual db design into a database schema
• Schema refinement (normalization)
• Physical db design
  – Analyze the workload
  – Refine the db design to meet performance criteria (focus on indexing)
• Security design

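Database Management Systems 4 Raghu Ramakrishnan E-R Diagram Design
[E-R diagram: entity sets Employees (ssn, name, lot) and Departments (did, dname, budget), connected by the relationship set Works_In with descriptive attribute since.]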

Database Management Systems 5 Raghu Ramakrishnan From E-R to Relational Tables
• In translating a relationship set to a relation, attributes of the relation must include:
  – Keys for each participating entity set (as foreign keys).
    ◦ This set of attributes forms a superkey for the relation.
  – All descriptive attributes.
CREATE TABLE Works_In(
  ssn CHAR(11),
  did INTEGER,
  since DATE,
  PRIMARY KEY (ssn, did),
  FOREIGN KEY (ssn) REFERENCES Employees,
  FOREIGN KEY (did) REFERENCES Departments)
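
The translation above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the column types of Employees and Departments, the sample rows, and the helper names `build_schema` and `try_insert` are assumptions made for illustration, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def build_schema():
    """Create Employees, Departments and Works_In in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite checks foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Column types for Employees and Departments are assumed.
        CREATE TABLE Employees(ssn CHAR(11) PRIMARY KEY, name CHAR(20), lot INTEGER);
        CREATE TABLE Departments(did INTEGER PRIMARY KEY, dname CHAR(20), budget REAL);
        CREATE TABLE Works_In(
            ssn CHAR(11), did INTEGER, since DATE,
            PRIMARY KEY (ssn, did),
            FOREIGN KEY (ssn) REFERENCES Employees,
            FOREIGN KEY (did) REFERENCES Departments);
    """)
    return conn

def try_insert(conn, table, row):
    """Insert one row; return False if a key or foreign-key constraint rejects it."""
    marks = ", ".join("?" for _ in row)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(f"INSERT INTO {table} VALUES ({marks})", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_schema()
try_insert(conn, "Employees", ("123-22-3666", "Attishoo", 48))
try_insert(conn, "Departments", (51, "Toys", 100000.0))
ok = try_insert(conn, "Works_In", ("123-22-3666", 51, "1991-01-01"))
dangling = try_insert(conn, "Works_In", ("123-22-3666", 99, "1992-02-02"))   # no department 99
duplicate = try_insert(conn, "Works_In", ("123-22-3666", 51, "1993-03-03"))  # repeats (ssn, did)
```

The second and third Works_In inserts are rejected: one refers to a department that does not exist, the other repeats the (ssn, did) key.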

Database Management Systems 6 Raghu Ramakrishnan Constraints
• Key constraint
• Does every department have a manager?
  – If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
[E-R diagram: Employees (ssn, name, lot) and Departments (did, dname, budget), connected by the relationship sets Manages (since) and Works_In (since).]

Database Management Systems 7 Raghu Ramakrishnan Enforcing Constraints in Scheme
CREATE TABLE Dept_Mgr(
  did INTEGER,
  dname CHAR(20),
  budget REAL,
  ssn CHAR(11) NOT NULL,
  since DATE,
  PRIMARY KEY (did),
  FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION)

Database Management Systems 8 Raghu Ramakrishnan Outline
• E-R design
• Relational model design
• SQL conceptual evaluation
• Normalization
• Query optimization

Database Management Systems 9 Raghu Ramakrishnan Basic SQL Query
SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
• relation-list: A list of relation names (possibly with a range-variable after each name).
• target-list: A list of attributes of relations in relation-list.
• qualification: Comparisons (Attr op const or Attr1 op Attr2, where op is one of <, >, =, ≤, ≥, ≠) combined using AND, OR and NOT.
• DISTINCT is an optional keyword indicating that the answer should not contain duplicates. Default is that duplicates are not eliminated!

Database Management Systems 10 Raghu Ramakrishnan Conceptual Evaluation Strategy
• Semantics of an SQL query defined in terms of the following conceptual evaluation strategy:
  – Compute the cross-product of relation-list.
  – Discard resulting tuples if they fail qualifications.
  – Delete attributes that are not in target-list.
  – If DISTINCT is specified, eliminate duplicate rows.
• This strategy is probably the least efficient way to compute a query! An optimizer will find more efficient strategies to compute the same answers.

Database Management Systems 11 Raghu Ramakrishnan Example Instances
[Tables: instance R1 of Reserves and instance S1 of Sailors]
• We will use these instances of the Sailors and Reserves relations in our examples.
• If the key for the Reserves relation contained only the attributes sid and bid, how would the semantics differ?

Database Management Systems 12 Raghu Ramakrishnan Example of Conceptual Evaluation
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid AND R.bid=103
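
A minimal Python sketch of the conceptual evaluation strategy, applied to the query above. The function name `conceptual_eval` and the sample rows are assumptions for illustration (the slide's R1 and S1 tables did not survive in this transcript), and this is the deliberately naive strategy, not what an optimizer would run.

```python
from itertools import product

def conceptual_eval(relations, qualification, target_list, distinct=False):
    """relations maps each range variable to a list of dict rows."""
    names = list(relations)
    answer = []
    # Step 1: cross-product of the relation-list.
    for combo in product(*(relations[n] for n in names)):
        env = dict(zip(names, combo))  # e.g. {"S": sailor_row, "R": reserves_row}
        # Step 2: discard tuples that fail the qualification.
        if not qualification(env):
            continue
        # Step 3: keep only the attributes in the target-list.
        answer.append(tuple(env[var][attr] for var, attr in target_list))
    # Step 4: eliminate duplicates only if DISTINCT was requested.
    if distinct:
        answer = list(dict.fromkeys(answer))
    return answer

# Assumed sample instances.
sailors = [
    {"sid": 22, "sname": "dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "rusty", "rating": 10, "age": 35.0},
]
reserves = [
    {"sid": 22, "bid": 101, "day": "10/10/96"},
    {"sid": 58, "bid": 103, "day": "11/12/96"},
]

def qual(t):
    """WHERE S.sid=R.sid AND R.bid=103"""
    return t["S"]["sid"] == t["R"]["sid"] and t["R"]["bid"] == 103

names = conceptual_eval({"S": sailors, "R": reserves}, qual, [("S", "sname")])
# names == [("rusty",)]
```

With 3 Sailors rows and 2 Reserves rows the loop inspects all 6 combinations before keeping one, which is why the slides call this the least efficient strategy.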

Database Management Systems 13 Raghu Ramakrishnan Division in SQL
Find sailors who've reserved all boats.
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS
  ((SELECT B.bid FROM Boats B)
   EXCEPT
   (SELECT R.bid FROM Reserves R WHERE R.sid=S.sid))
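
The logic of the division query can be mirrored with Python sets: a sailor qualifies when "all boats EXCEPT the boats this sailor reserved" is empty. The function name and the sample rows below are assumptions for illustration.

```python
def sailors_who_reserved_all_boats(sailors, boats, reserves):
    """Return the names of sailors for whom no unreserved boat exists."""
    all_bids = {b["bid"] for b in boats}
    names = []
    for s in sailors:
        # Inner query: boats reserved by this particular sailor.
        reserved = {r["bid"] for r in reserves if r["sid"] == s["sid"]}
        # NOT EXISTS (all boats EXCEPT reserved boats)
        if not (all_bids - reserved):
            names.append(s["sname"])
    return names

# Assumed sample data.
sailors = [{"sid": 22, "sname": "dustin"}, {"sid": 31, "sname": "lubber"}]
boats = [{"bid": 101}, {"bid": 102}]
reserves = [
    {"sid": 22, "bid": 101},
    {"sid": 22, "bid": 102},
    {"sid": 31, "bid": 101},
]

winners = sailors_who_reserved_all_boats(sailors, boats, reserves)
# winners == ["dustin"]
```

Note the edge case that follows from the NOT EXISTS formulation: if Boats is empty, every sailor qualifies.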

Database Management Systems 14 Raghu Ramakrishnan Conceptual Evaluation Step
• Cross product to get all rows.
• Select rows that satisfy the condition specified in the WHERE clause.
• Remove unnecessary fields.
• From the remaining rows, form groups according to the GROUP BY clause.
• Discard all groups that do not satisfy the condition in the HAVING clause.
• Apply aggregate function to each group.
• Retrieve values for the columns and aggregations listed in the SELECT clause.

Database Management Systems 15 Raghu Ramakrishnan Find the age of the youngest sailor with age ≥ 18, for each rating with at least 2 such sailors
SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1
• Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes are `unnecessary'.
[Table: answer relation]
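
The grouping query can be run as-is against SQLite from Python. The sketch below assumes a small Sailors instance (the slide's own table is not in this transcript) and adds an ORDER BY only to make the result order deterministic; the function name is hypothetical.

```python
import sqlite3

def youngest_adult_per_rating(rows):
    """Run the slide's GROUP BY / HAVING query over the given Sailors rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sailors(sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)"
    )
    conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT S.rating, MIN(S.age)
        FROM Sailors S
        WHERE S.age >= 18          -- applied to rows, before grouping
        GROUP BY S.rating
        HAVING COUNT(*) > 1        -- applied to groups, after grouping
        ORDER BY S.rating
    """).fetchall()
    conn.close()
    return result

# Assumed sample instance.
sample = [
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0),
    (29, "brutus", 1, 33.0),
    (58, "rusty", 10, 35.0),
]

answer = youngest_adult_per_rating(sample)
# answer == [(7, 35.0)]
```

Rating 10 drops out because zorba (age 16) is removed by WHERE before the groups are formed, leaving a group of one that fails the HAVING test.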

Database Management Systems 16 Raghu Ramakrishnan Outline
• E-R design
• Relational model design
• SQL conceptual evaluation
• Normalization
• Query optimization

Database Management Systems 17 Raghu Ramakrishnan Design problems
• Problems due to $R \to W$:
  – Update anomaly: Can we change W in just the 1st tuple of SNLRWH?
  – Insertion anomaly: What if we want to record rating-wage information if there is no employee with such rating?
  – Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Database Management Systems 18 Raghu Ramakrishnan Example (Contd.)
• Do we have problems with the decomposed tables?
  – Update anomaly
  – Insertion anomaly
  – Deletion anomaly
[Tables: Hourly_Emps2 and Wages]

Database Management Systems 19 Raghu Ramakrishnan Functional Dependencies (FDs)
• A functional dependency $X \to Y$ holds over relation R if, for every allowable instance r of R:
  – $t_1 \in r,\ t_2 \in r,\ \pi_X(t_1) = \pi_X(t_2)$ implies $\pi_Y(t_1) = \pi_Y(t_2)$
  – i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)
• An FD is a statement about all allowable relations.
  – Must be identified based on semantics of enterprise.
  – Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!
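
The last point can be made concrete: a single instance can refute an FD but never prove it. A minimal sketch, assuming tuples are Python dicts and attributes are single-letter names; `violates_fd` and the sample rows are hypothetical.

```python
def violates_fd(instance, lhs, rhs):
    """Return True if two tuples agree on lhs but disagree on rhs."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for t in instance:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # setdefault stores y the first time x appears, otherwise returns the stored value.
        if seen.setdefault(x, y) != y:
            return True
    return False

# Assumed sample rows with rating R and hourly wage W.
r1 = [
    {"S": "123-22-3666", "R": 8, "W": 10},
    {"S": "231-31-5368", "R": 8, "W": 10},
    {"S": "131-24-3650", "R": 5, "W": 7},
]
r2 = r1 + [{"S": "434-26-3751", "R": 8, "W": 12}]

consistent = violates_fd(r1, "R", "W")  # False: r1 does not violate R -> W
broken = violates_fd(r2, "R", "W")      # True: rating 8 appears with wages 10 and 12
```

A `False` result only says that this instance is consistent with the FD; whether the FD holds over R is a statement about the enterprise, not about one instance.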

Database Management Systems 20 Raghu Ramakrishnan Example
• A relation R has attributes (S, C, T, R, G), which denote student, course, time, room, and grade respectively. From requirements, the following FDs hold.
  – $SC \to G$
  – $ST \to R$
  – $C \to T$
  – $TR \to C$

Database Management Systems 21 Raghu Ramakrishnan Attribute Closure
• Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!)
• Typically, we just want to check if a given FD $X \to Y$ is in the closure of a set of FDs F. An efficient check:
  – Compute the attribute closure of X (denoted $X^+$) wrt F:
    ◦ Set of all attributes A such that $X \to A$ is in $F^+$
    ◦ There is a linear time algorithm to compute this.
  – Check if Y is in $X^+$

Database Management Systems 22 Raghu Ramakrishnan Attribute Closure
• Algorithm
  – closure = X;
  – Repeat until there is no change {
    ◦ if there is an FD $U \to V$ in F such that $U \subseteq closure$,
    ◦ then set $closure = closure \cup V$ }
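
A direct Python transcription of this loop, run on the student/course example with the FDs SC → G, ST → R, C → T, TR → C. Representing attributes as single characters and FDs as (lhs, rhs) string pairs is an assumption of this sketch, and the simple repeated scan is quadratic rather than the linear-time variant mentioned in the slides.

```python
def attribute_closure(attrs, fds):
    """Return X+ for the attribute set attrs under the FDs in fds."""
    closure = set(attrs)
    changed = True
    while changed:  # repeat until there is no change
        changed = False
        for lhs, rhs in fds:
            # FD U -> V applies when U is inside the closure and V adds something new.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def follows(fds, lhs, rhs):
    """Check whether lhs -> rhs is in the closure of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

# FDs from the (S, C, T, R, G) example.
FDS = [("SC", "G"), ("ST", "R"), ("C", "T"), ("TR", "C")]

sc_plus = attribute_closure("SC", FDS)  # {"S", "C", "T", "R", "G"}: SC determines everything
c_plus = attribute_closure("C", FDS)    # {"C", "T"}
```

Since the closure of SC is the whole schema, SC is a superkey; `follows(FDS, "ST", "G")` is also true, because ST gives R, then TR gives C, then SC gives G.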

Database Management Systems 23 Raghu Ramakrishnan Boyce-Codd Normal Form (BCNF)
• Reln R with FDs F is in BCNF if, for all $X \to A$ in $F^+$:
  – $A \in X$ (called a trivial FD), or
  – X contains a key for R.
• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.
  – No redundancy in R that can be predicted using FDs alone.

Database Management Systems 24 Raghu Ramakrishnan Third Normal Form (3NF)
• Reln R with FDs F is in 3NF if, for all $X \to A$ in $F^+$:
  – $A \in X$ (called a trivial FD), or
  – X contains a key for R, or
  – A is part of some key for R.
• Minimality of a key is crucial in the third condition above!
• If R is in BCNF, it is obviously in 3NF.
• If R is in 3NF, some redundancy is possible. It is a compromise, used when BCNF is not achievable (e.g., no "good" decomposition, or performance considerations).
  – A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible.
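
The BCNF and 3NF conditions can be checked mechanically with attribute closures. The sketch below is an illustration under stated assumptions: attributes are single characters, candidate keys are found by brute force over subsets (fine for small schemas only), and only the given FDs are tested, relying on the standard result that this is enough when the relation carries the whole schema. All function names are hypothetical.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return X+ for attrs under fds (list of (lhs, rhs) strings)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so not minimal
            if attribute_closure(cand, fds) == set(schema):
                keys.append(cand)
    return keys

def violations(schema, fds, form="BCNF"):
    """List (lhs, attribute) pairs from fds that break the chosen normal form."""
    prime = set().union(*candidate_keys(schema, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        for a in rhs:
            if a in lhs:
                continue  # trivial FD
            if attribute_closure(lhs, fds) == set(schema):
                continue  # lhs contains a key
            if form == "3NF" and a in prime:
                continue  # a is part of some key
            bad.append((lhs, a))
    return bad

# The (S, C, T, R, G) example: SC -> G, ST -> R, C -> T, TR -> C.
FDS = [("SC", "G"), ("ST", "R"), ("C", "T"), ("TR", "C")]
bcnf_bad = violations("SCTRG", FDS, "BCNF")  # [("C", "T"), ("TR", "C")]
tnf_bad = violations("SCTRG", FDS, "3NF")    # []
```

On this example the keys are SC and ST, so C → T and TR → C violate BCNF, yet both right-hand sides are part of a key and the relation is still in 3NF.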

Database Management Systems 25 Raghu Ramakrishnan Decomposition of a Relation Scheme
• Suppose that relation R contains attributes $A_1 \ldots A_n$. A decomposition of R consists of replacing R by two or more relations such that:
  – Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and
  – Every attribute of R appears as an attribute of one of the new relations.
• Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.
• E.g., can decompose SNLRWH into SNLRH and RW.

Database Management Systems 26 Raghu Ramakrishnan Problems with Decompositions
• There are three potential problems to consider:
  (1) Some queries become more expensive.
    ◦ e.g., How much did sailor Joe earn? (salary = W*H)
  (2) Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation!
    ◦ Fortunately, not in the SNLRWH example.
  (3) Checking some dependencies may require joining the instances of the decomposed relations.
    ◦ Fortunately, not in the SNLRWH example.
• Tradeoff: Must consider these issues vs. redundancy.

Database Management Systems 27 Raghu Ramakrishnan Lossless Join Decompositions
• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  – $\pi_X(r) \bowtie \pi_Y(r) = r$
• Definition extended to decomposition into 3 or more relations in a straightforward way.
• It is essential that all decompositions used to deal with redundancy be lossless! (Avoids Problem (2).)
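
To see what a lossy decomposition looks like, the definition can be evaluated on one concrete instance: project onto X and Y, join the projections, and compare with r. The three-attribute instance and the helper names below are assumptions for illustration. Passing on one instance does not establish the property, which quantifies over every instance satisfying F.

```python
def project(instance, attrs):
    """pi_attrs(instance), as a set of ((attr, value), ...) tuples."""
    return {tuple((a, t[a]) for a in attrs) for t in instance}

def natural_join(left, right):
    """Join two projected relations on their common attributes."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out

def is_lossless_on(instance, x, y):
    """Check pi_X(r) join pi_Y(r) == r for this one instance r."""
    original = {tuple(sorted(t.items())) for t in instance}
    return natural_join(project(instance, x), project(instance, y)) == original

# Assumed instance over attributes A, B, C.
r = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 6},
    {"A": 7, "B": 2, "C": 8},
]

lossy = is_lossless_on(r, "AB", "BC")  # False: B=2 pairs up rows that never co-occurred
fine = is_lossless_on(r, "AB", "AC")   # True: A identifies each row in this instance
```

Joining AB with BC yields five tuples, including the spurious (1, 2, 8) and (7, 2, 3), so the original three-tuple instance cannot be reconstructed.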

Database Management Systems 28 Raghu Ramakrishnan More on Lossless Join
• The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains:
  – $X \cap Y \to X$, or
  – $X \cap Y \to Y$
• In particular, the decomposition of R into UV and R - V is lossless-join if $U \cap V$ is empty and $U \to V$ holds over R.
[Figure: Not a Lossless Join]
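
This test reduces to one attribute-closure computation on the common attributes. A minimal sketch, assuming single-character attribute names and FDs as (lhs, rhs) string pairs; the function names are hypothetical, and only the FD R → W mentioned in the slides is supplied.

```python
def attribute_closure(attrs, fds):
    """Return X+ for attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_lossless_join(x, y, fds):
    """True if (X intersect Y) -> X or (X intersect Y) -> Y follows from fds."""
    common = set(x) & set(y)
    closure = attribute_closure(common, fds)
    return set(x) <= closure or set(y) <= closure

fds = [("R", "W")]
good = is_lossless_join("SNLRH", "RW", fds)  # True: common attribute R determines RW
bad = is_lossless_join("SNL", "RWH", fds)    # False: no common attributes at all
```

The first call is the SNLRWH example split into SNLRH and RW; the second split shares no attribute, so the join degenerates into a cross product.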

Database Management Systems 29 Raghu Ramakrishnan Decomposition into BCNF
• Consider relation R with FDs F. If $X \to Y$ violates BCNF and Y is a single attribute, decompose R into R - Y and XY.
  – Repeated application of this idea will give us a collection of relations that are in BCNF; lossless join decomposition, and guaranteed to terminate.
  – e.g., CSJDPQV, key C, $JP \to C$, $SD \to P$, $J \to S$
  – To deal with $SD \to P$, decompose into SDP, CSJDQV.
  – To deal with $J \to S$, decompose CSJDQV into JS and CJDQV
• In general, several dependencies may cause violation of BCNF. The order in which we "deal with" them could lead to very different sets of relations!
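
The repeated-splitting idea can be sketched in Python and run on the CSJDPQV example. Assumptions of this sketch: attributes are single characters, the key C is written as the FD C → SJDPQV, violations are searched only among left-hand sides that appear in the given FDs (so it is not a complete BCNF test for every sub-relation), and FDs are tried in the listed order, which fixes one of the several possible outcomes. The function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return X+ for attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def bcnf_decompose(schema, fds):
    """Split on one violating FD X -> a at a time: R becomes R - a and Xa."""
    done, todo = [], [frozenset(schema)]
    while todo:
        rel = todo.pop()
        violation = None
        for lhs, _ in fds:
            x = set(lhs)
            if not x <= rel:
                continue  # this FD does not live inside rel
            closure = attribute_closure(x, fds) & rel
            extra = closure - x
            # Non-trivial FD whose left side is not a superkey of rel.
            if extra and closure != rel:
                violation = (x, min(extra))  # pick a single attribute
                break
        if violation is None:
            done.append(rel)
        else:
            x, a = violation
            todo.append(rel - {a})            # R - Y
            todo.append(frozenset(x | {a}))   # XY
    return done

FDS = [("C", "SJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
result = bcnf_decompose("CSJDPQV", FDS)
# As a set of relations: SDP, JS, CJDQV
```

With this FD order the sketch splits on SD → P first and J → S second, giving SDP, JS and CJDQV as in the slide; listing the FDs in another order can give a different decomposition.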

Database Management Systems 30 Raghu Ramakrishnan Outline
• E-R design
• Relational model design
• SQL conceptual evaluation
• Normalization
• Query optimization

Database Management Systems 31 Raghu Ramakrishnan Overview of Query Optimization
• Plan: Tree of R.A. ops, with choice of alg for each op.
  – Each operator typically implemented using a `pull' interface: when an operator is `pulled' for the next output tuples, it `pulls' on its inputs and computes them.
• Two main issues:
  – For a given query, what plans are considered?
    ◦ Algorithm to search plan space for cheapest (estimated) plan.
  – How is the cost of a plan estimated?
• Ideally: Want to find best plan. Practically: Avoid worst plans!
• We will study the System R approach.
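
Python generators give a compact picture of the pull interface: each operator produces a tuple only when its parent asks for one, and asks its own input in turn. The operator classes, the `pulled` counter and the sample rows are assumptions made for this sketch, not part of any particular DBMS.

```python
class Scan:
    """Leaf operator: hands out stored rows one at a time."""
    def __init__(self, rows):
        self.rows = rows
        self.pulled = 0  # how many rows have been requested so far

    def __iter__(self):
        for row in self.rows:
            self.pulled += 1
            yield row

class Select:
    """Passes on only the rows that satisfy the predicate."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def __iter__(self):
        for row in self.child:  # pull from the input on demand
            if self.predicate(row):
                yield row

class Project:
    """Keeps only the listed attributes of each row."""
    def __init__(self, child, attrs):
        self.child = child
        self.attrs = attrs

    def __iter__(self):
        for row in self.child:
            yield tuple(row[a] for a in self.attrs)

# Assumed sample rows.
sailors = [
    {"sname": "dustin", "rating": 7},
    {"sname": "brutus", "rating": 1},
    {"sname": "lubber", "rating": 8},
]
scan = Scan(sailors)
plan = Project(Select(scan, lambda r: r["rating"] > 5), ["sname"])

pull = iter(plan)
first = next(pull)               # ("dustin",)
pulled_after_first = scan.pulled # 1: only one row has been read so far
rest = list(pull)                # [("lubber",)]
```

Nothing is materialized between operators, which is what lets selections and projections run on-the-fly in the plans discussed in the slides.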

Database Management Systems 32 Raghu Ramakrishnan Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
• Similar to old schema; rname added for variations.
• Reserves:
  – Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
• Sailors:
  – Each tuple is 50 bytes long, 80 tuples per page, 500 pages.

Database Management Systems 33 Raghu Ramakrishnan Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
• RA Tree: $\pi_{sname}(\sigma_{bid=100 \wedge rating>5}(Reserves \bowtie_{sid=sid} Sailors))$
• Plan: the same tree, with the join done by Simple Nested Loops and the selections and projection applied on-the-fly.
• Cost: 500+500*1000 I/Os
• By no means the worst plan!
• Misses several opportunities: selections could have been `pushed' earlier, no use is made of any available indexes, etc.
• Goal of optimization: To find more efficient plans that compute the same answer.

Database Management Systems 34 Raghu Ramakrishnan Alternative Plans 1 (No Indexes)
[Plan: $\sigma_{bid=100}$ on Reserves (scan; write to temp T1) and $\sigma_{rating>5}$ on Sailors (scan; write to temp T2), joined on sid=sid by Sort-Merge Join, then $\pi_{sname}$ on-the-fly.]
• Main difference: push selects.
• With 5 buffers, cost of plan:
  – Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution).
  – Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings).
  – Sort T1 (2*2*10), sort T2 (2*4*250), merge (10+250)
  – Total: 4060 page I/Os.
• If we used BNL join, join cost = 10+4*250, total cost = 2770.
• If we `push' projections, T1 has only sid, T2 only sid and sname:
  – T1 fits in 3 pages, cost of BNL drops to under 250 pages, total < 2000.
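
The page counts above can be re-derived with a few lines of arithmetic. The sketch assumes the usual external merge sort pass count (one run-forming pass with B buffers, then (B-1)-way merges) and a block nested loops join that reads the outer in blocks of B-2 pages; those two formulas, the parameter names and the function names are assumptions not spelled out in the slides.

```python
import math

def sort_passes(pages, buffers):
    """Passes of external merge sort: build runs, then merge (buffers-1) at a time."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def plan_costs(reserves_pages=1000, sailors_pages=500, boats=100,
               ratings=10, matching_ratings=5, buffers=5):
    """Page I/Os for the push-selections plan, with sort-merge and with BNL."""
    t1 = reserves_pages // boats                        # bid=100 keeps 1/100 of Reserves
    t2 = sailors_pages * matching_ratings // ratings    # rating>5 keeps ratings 6..10
    select_cost = reserves_pages + t1 + sailors_pages + t2  # scans plus temp writes
    sort_cost = (2 * sort_passes(t1, buffers) * t1
                 + 2 * sort_passes(t2, buffers) * t2)
    merge_cost = t1 + t2
    bnl_join = t1 + math.ceil(t1 / (buffers - 2)) * t2  # T1 as outer, blocks of B-2 pages
    return {
        "t1_pages": t1,
        "t2_pages": t2,
        "sort_merge_total": select_cost + sort_cost + merge_cost,
        "bnl_join": bnl_join,
        "bnl_total": select_cost + bnl_join,
    }

costs = plan_costs()
# costs["sort_merge_total"] == 4060, costs["bnl_join"] == 1010, costs["bnl_total"] == 2770
```

The sort passes come out as 2 for T1 and 4 for T2, matching the 2*2*10 and 2*4*250 terms on the slide.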

Database Management Systems 35 Raghu Ramakrishnan Alternative Plans 2 With Indexes
[Plan: $\sigma_{bid=100}$ on Reserves (use hash index; do not write result to temp), joined with Sailors on sid=sid by Index Nested Loops with pipelining, then $\sigma_{rating>5}$ and $\pi_{sname}$ on-the-fly.]
• With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
• INL with pipelining (outer is not materialized).
• Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
• Cost: Selection of Reserves tuples (10 I/Os); for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
• Join column sid is a key for Sailors.
  – At most one matching tuple, unclustered index on sid OK.
  – Projecting out unnecessary fields from outer doesn't help.
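
The 1210 figure is the sum of two terms, which a short calculation makes explicit. The parameter and function names are assumptions for illustration; the 1.2 I/Os per probe is the slide's figure for fetching the matching Sailors tuple through the index.

```python
def inl_plan_cost(reserves_tuples=100_000, tuples_per_page=100, boats=100,
                  ios_per_probe=1.2):
    """Page I/Os for the index nested loops plan with a clustered index on bid."""
    matching_tuples = reserves_tuples // boats            # 1000 Reserves tuples with bid=100
    selection_ios = matching_tuples // tuples_per_page    # clustered index: 10 pages
    probe_ios = matching_tuples * ios_per_probe           # one index probe per outer tuple
    return selection_ios, probe_ios, selection_ios + probe_ios

selection_ios, probe_ios, total_ios = inl_plan_cost()
# selection_ios == 10, probe_ios is about 1200, total_ios is about 1210
```

The probe term dominates because it is paid once per qualifying Reserves tuple, which is why shrinking the outer tuples by projection does not reduce the cost.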

Database Management Systems 36 Raghu Ramakrishnan Statistics and Catalogs
• Need information about the relations and indexes involved. Catalogs typically contain at least:
  – # tuples (NTuples) and # pages (NPages) for each relation.
  – # distinct key values (NKeys) and NPages for each index.
  – Index height, low/high key values (Low/High) for each tree index.
• Catalogs updated periodically.
  – Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
• More detailed information (e.g., histograms of the values in some field) is sometimes stored.