1 Database Theory and Methodology. 2 The Good and the Bad So far we have not developed any measure of “goodness” to measure the quality of the design,

Slides:



Advertisements
Similar presentations
primary key constraint foreign key constraint
Advertisements

Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Functional Dependencies and Normalization for Relational Databases.
CS Schema Refinement and Normal Forms Chapter 19.
Normalization DB Tuning CS186 Final Review Session.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
CMSC424: Database Design Instructor: Amol Deshpande
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 CMSC424, Spring 2005 CMSC424: Database Design Lecture 9.
Schema Refinement and Normal Forms. The Evils of Redundancy v Redundancy is at the root of several problems associated with relational schemas: – redundant.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Cs3431 Normalization Part II. cs3431 Attribute Closure : Example Consider R (A, B, C, D, E) with FDs A  B, B  C, CD  E Does A  E hold ? (Is A  E.
1 Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Schema Refinement and Normal Forms Chapter 19 Instructor: Mirsad Hadzikadic.
Schema Refinement and Normal Forms Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Database Group, Georgia Tech 1 Normalization. Database Group, Georgia Tech 2 Normalization What it’s all about Given a relation, R, and a set of functional.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
CSCD34 - Data Management Systems - A. Vaisman1 Schema Refinement and Normal Forms.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
Instructor: Churee Techawut Functional Dependencies and Normalization for Relational Databases Chapter 4 CS (204)321 Database System I.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Computing & Information Sciences Kansas State University Tuesday, 27 Feb 2007CIS 560: Database System Concepts Lecture 18 of 42 Tuesday, 27 February 2007.
Functional Dependencies and Normalization R&G Chapter 19 Lecture 26 Science is the knowledge of consequences, and dependence of one fact upon another.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
Functional Dependencies And Normalization R&G Chapter 19.
Database Systems/COMP4910/Spring02/Melikyan1 Schema Refinement and Normal Forms.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 Schema Refinement and Normal Forms Week 6. 2 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependencies. 2 Motivation v E/R  Relational translation problems : –Often discover more “detailed” constraints after translation (upcoming.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 15.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy is at the root of several problems associated with relational.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy causes several problems associated with relational schemas: 
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 CS122A: Introduction to Data Management Lecture #12: Relational DB Design Theory (1) Instructor: Chen Li.
LECTURE 5: Schema Refinement and Normal Forms 1. Redundency Problem Redundent Storage Update Anomaly If one copy updated, an inconsistancy may occur Insertion.
Functional Dependency and Normalization
Schema Refinement and Normal Forms
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Schema Refinement & Normalization Theory
Functional Dependencies
Schema Refinement and Normalization
Schema Refinement and Normal Forms
Functional Dependencies
Schema Refinement and Normalization
Chapter 19 (part 1) Functional Dependencies
Schema Refinement and Normal Forms
Schema Refinement and Normalization
Schema Refinement and Normal Forms
Presentation transcript:

1 Database Theory and Methodology

2 The Good and the Bad So far we have not developed any measure of “goodness” to measure the quality of the design, other than our intuition. Bad database design –Redundant Information –Update anomalies –Wasting storage with null values –Generation of invalid data during joins Good database design –Easy to explain its semantics –No redundancy (or at least reduced redundancy) –No information loss during joins We need formal concepts to define “goodness” and “badness” of relational schemas.

3 The Evils of Redundancy Redundancy is at the root of several problems associated with relational schemas: – Redundant storage – Insertion anomalies – Deletion anomalies – Update anomalies

4 Example: Bad Design FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo DL242 10/24/00 Delta t-up DL242 10/25/00 Delta o-ge AA121 10/24/00 American p-rw AA121 10/25/00 American q-yg AA411 10/22/00 American h-fe redundancy: airline name repeated for the same flight inconsistency: when airline name for a flight changes, it must be changed in many places

5 Bad Database Design insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned? deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American. update anomalies: if DL242 is flown by Sabena, we must change it everywhere. FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo DL242 10/24/00 Delta t-up DL242 10/25/00 Delta o-ge AA121 10/24/00 American p-rw AA121 10/25/00 American q-yg AA411 10/22/00 American h-fe-65748

6 Bad Database Design - decomposition FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo DL242 10/24/00 Delta t-up DL242 10/25/00 Delta o-ge AA121 10/24/00 American p-rw AA121 10/25/00 American q-yg AA411 10/22/00 American h-fe FLIGHTS-AIRLINE flt# airline DL242 Delta AA121 American AA411 American DATE-AIRLINE-PLANE date airline plane# 10/23/00 Delta k-yo /24/00 Delta t-up /25/00 Delta o-ge /24/00 American p-rw /25/00 American q-yg /22/00 American h-fe-65748

7 Bad Database Design - information loss FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo DL242 10/24/00 Delta t-up DL242 10/25/00 Delta o-ge AA121 10/24/00 American p-rw AA121 10/25/00 American q-yg AA211 10/22/00 American h-fe AA411 10/24/00 American p-rw AA411 10/25/00 American q-yg AA411 10/22/00 American h-fe DATE-AIRLINE-PLANE date airline plane# 10/23/00 Delta k-yo /24/00 Delta t-up /25/00 Delta o-ge /24/00 American p-rw /25/00 American q-yg /22/00 American h-fe FLIGHTS-AIRLINE flt# airline DL242 Delta AA121 American AA411 American information loss: we polluted the database with false facts; we can’t find the true facts.

8 Good Database Design no redundancy of FACT (!) no inconsistency no insertion, deletion or update anomalies no information loss FLIGHTS-DATE-PLANE flt# date plane# DL242 10/23/00 k-yo DL242 10/24/00 t-up DL242 10/25/00 o-ge AA121 10/24/00 p-rw AA121 10/25/00 q-yg AA411 10/22/00 h-fe FLIGHTS-AIRLINE flt# airline DL242 Delta AA121 American AA411 American

9 Goal — Devise a Theory for the Following Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R 1, R 2,..., R n } such that –each relation is in good form –the decomposition is a lossless-join decomposition Our theory is based on: –functional dependencies –multivalued dependencies

10 Functional Dependencies (FDs) A functional dependency is a constraint between two sets of attributes from the database. Definition: A functional dependency, denoted by X  Y, holds over relation R if, for every allowable instance r of R: – t1  r, t2  r, t1.X = t2.X implies t1.Y =t2.Y – i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.) An FD is a statement about all allowable relations. – Must be identified based on semantics of application. – Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R! If X  Y in R, this does not say whether or not Y  X.

11 Functional Dependencies and Keys A key constraint is a special case of an FD. Note the following: –If X is a candidate key, X  Y for any subset of attributes Y of R. –X  Y does not require that the set X be minimal; the additional minimality condition must be met for X to be a key. –If X  Y holds where Y is the set of all attributes in R, and there is some subset V of X s.t. V  Y, then X is a superkey.

12 Example: Constraints on Entity Set Consider relation obtained from Hourly_Emps: – Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) Notation: We will denote this relation schema by listing the attributes: SNLRWH – This is really the set of attributes {S,N,L,R,W,H}. – Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for SNLRWH) Some FDs on Hourly_Emps: – ssn is the key: S  SNLRWH – rating determines hrly_wages: R  W

13 Example (Contd.) Problems due to R  W : – Update anomaly: Can we change W in just the 1st tuple of SNLRWH? – Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating? – Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5! Will two smaller tables be better? Wages Hourly_Emps2

14 Closure of a set of FDs Given some FDs, we can usually infer additional FDs: – ssn  did, did  lot implies ssn  lot – mgr_ssn  mgr_phone, dept_no  mgr_ssn implies dept_no  mgr_phone An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold. Formally, the set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F; it is denoted by F +.

15 Inference Rules for FDs Armstrong’s Axioms (X, Y, Z are sets of attributes): – Reflexivity: If Y  X then X  Y. – Augmentation: If X  Y, then XZ  YZ for any Z. – Transitivity: If X  Y and Y  Z, then X  Z. These are sound and complete inference rules for FDs! – By sound we mean that they generate only FDs in F + when applied to a set F of FDs. – By complete we mean that repeated application of these rules will generate all FDs in the closure of F +.

16 Derived inference rules Union: if X  Y and X  Z, then X  YZ. Decomposition: if X  YZ, then X  Y and X  Z. Pseudotransitivity: if X  Y and WY  Z, then WX  Z. These additional rules are not essential; their soundness can be proved using Armstrong’s Axioms. Exercise: Prove rules Union, Decomposition and Pseudotransitivity using A.A.

17 Example Consider the following schema: Contracts(cid,sid,jid,did,pid,qty,value) – C is the key: C  CSJDPQV – Project purchases each part using single contract: JP  C – Dept purchases at most one part from a supplier: SD  P JP  C, C  CSJDPQV imply JP  CSJDPQV SD  P implies SDJ  JP SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV –We can infer several additional FDs that are in the closure by using augmentation or decomposition. e.g. C  CSJDPQV implies C  C, C  S,C  J, C  D, C  P, C  Q, C  V. –We have also a number of trivial FDs from reflexivity rule.

18 Reasoning About FDs Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!) Typically, we just want to check if a given FD X  Y is in the closure of a set of FDs F. An efficient check: 1. Compute attribute closure of X (denoted X + ) wrt F: Set of all attributes A such that X  A is in F +. There is a linear time algorithm to compute this. 2. Check if Y is in X +. Does F = {A  B, B  C, CD  E } imply A  E? – i.e, is A  E in the closure F + ? (Equivalently, is E in X + )?

19 Attribute Closure X + = X; repeat { oldX + = X + ; for each functional dependency Y  Z in F do if Y  X + then X + = X +  Z; until X + = oldX +

20 Example Given the following set of FDs: F= {Ssn  Ename, Pnumber  {Pname, Plocation}, {Ssn,Pnumber}  Hours } The attribute closures: {Ssn} + = {Ssn, Ename} {Pnumber} + = {Pnumber, Pname,Plocation} {Ssn,Pnumber} + = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}

21 Example R = (A, B, C, G, H, I) F = {A  B A  C CG  H CG  I B  H} (AG) + 1. result = AG. 2. result = ABCG(A  C and A  B) 3. result = ABCGH(CG  H and CG  AGBC) 4.result = ABCGHI(CG  I and CG  AGBCH) Is AG a candidate key? 1. Is AG a super key? Does AG  R? == Is (AG) +  R 2. Is any subset of AG a superkey? Does A  R? == Is (A) +  R Does G  R? == Is (G) +  R

22 Finding a Key for a Relation Algorithm: Finding a Key K for R, given a set of FDs 1.Set K = R. 2.For each attribute A in K { Compute (K-A) + wrt F; If (K-A) + contains all the attributes in R, then set K = K – {A} };

23 Example Consider the following schema and the set of FDs: R = (Ssn, Pnumber, Ename, Pname, Plocation, Hours) F = {Ssn  Ename, Pnumber  {Pname, Plocation}, {Ssn,Pnumber}  Hours} The Key is {Ssn,Pnumber}, since {Ssn,Pnumber} + = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}

24 Equivalence of two sets of FDs Definition: A set of FDs F covers another set of FDs E if every FD in E is also in F +. Definition: F and E are equivalent if F + = E + (i.e. F covers E and E covers F). We can determine whether F covers E by calculating X + with respect to F for each FD X  Y in E, and then checking whether this X + includes the attributes in Y. If this is the case for every FD in E, then F covers E. F E F+F+

25 Minimal Cover Sets of functional dependencies may have redundant dependencies that can be inferred from the others –Eg: A  C is redundant in: {A  B, B  C, A  C} –Parts of a functional dependency may be redundant E.g. on RHS: {A  B, B  C, A  CD} can be simplified to {A  B, B  C, A  D} E.g. on LHS: {A  B, B  C, AC  D} can be simplified to {A  B, B  C, A  D} Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies

26 Minimal Sets of FDs A set of FDs F is minimal if: Every dependency in F has a single attribute for its right-hand side. We can’t replace any dependency X  A in F with a dependency Y  A where Y  X and still have a set of dependencies equivalent with F. –This ensures that there are no redundancies by having redundant attributes on the left-hand side of a dependency. We can’t remove any dependency from F and still have a set of dependencies equivalent with F. –This ensures that there are no redundancies by having a dependency that can be inferred from the remaining FDs in F.

27 Minimal Cover of a set of FDs A minimal cover of a set of FDs E is a minimal set of dependencies F that is equivalent to E. There can be several minimal covers for a set of FDs. Algorithm: Finding Minimal Cover F for a set of Fds E. 1. Set F = E. 2. Replace each Fd X  {A1,..., An} in F by the n Fds X  A1,..., X  An. 3. For each Fd X  A in F For each attribute B  X If {{F - {X  A}}  {(X - {B})  A}}  F Then replace X  A with (X - {B})  A in F. 4. For each remaining Fd X  A in F If {F - {X  A}}  F, then remove X  A from F.

28 Examples E = { A  B, ABCD  E, EF  GH, ACDF  EG } has the following minimal cover: F = {A  B, ACD  E, EF  G, EF  H } E = {Ssn  Ename, Pnumber  {Pname, Plocation}, {Ssn,Pnumber}  Hours}} F= {Ssn  Ename, Pnumber  Pname, Pnumber  Plocation, {Ssn,Pnumber}  Hours}

29 Normal Forms Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed! If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposing the relation will help. Role of FDs in detecting redundancy: – Consider a relation R with 3 attributes, ABC. No FDs hold: There is no redundancy here. Given A  B: Several tuples could have the same A value, and if so, they’ll all have the same B value!

30 Normalization Normalization of data is a process of analyzing the given relational schemas based on their FDs and primary keys to achieve the desirable properties of –minimizing redundancy –minimizing update, insert and delete anomalies. The process of normalization through decomposition must also confirm two additional properties: 1.Lossless join property (nonadditive join property) 2.Dependency preservation property

31 NF 2 1NF 2NF 3NF BCNF Overview of NFs

32 Normal Forms NF 2 : non-first normal form. 1NF: R is in 1NF iff all domain values are atomic. 2NF: R is in 2. NF iff R is in 1NF and every non-key attribute is fully dependent on the key. 3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non- transitively dependent on the key BCNF: R is in BCNF iff every determinant is a candidate key –Determinant: an attribute on which some other attribute(s) is fully functionally dependent.

33 1NF: Atomic Values FLT-SCHEDULE flt# weekday airline dtime from atime to DL242 MO WE FR DELTA 10:40 ATL 12:30 BOS SK912 SA SU SAS 12:00 CPH 15:30 JFK AA242 MO FR AA 08:00 CHI 10:10 ATL Attributes must be defined over domains with atomic values FLT-SCHEDULE flt# weekday airline dtime from atime to DL242 MO DELTA 10:40 ATL 12:30 BOS SK912 SA SAS 12:00 CPH 15:30 JFK AA242 MO AA 08:00 CHI 10:10 ATL DL242 WE DELTA 10:40 ATL 12:30 BOS DL242 FR DELTA 10:40 ATL 12:30 BOS SK912 SU SAS 12:00 CPH 15:30 JFK AA242 FR AA 08:00 CHI 10:10 ATL

34 2NF: Full Functional Dependency An FD X  Y is a full functional dependency if any attribute A  X, X – {A} does not functionally determine Y. An FD X  Y is a partial functional dependency if some attribute A  X, (X – {A})  Y. E.g. {ssn, proj_num}  hours is a full FD. {ssn, proj_num}  ename is a partial FD. Definition: A relation schema R is in 2NF if R is in 1NF and every nonprime attribute A is fully functionally dependent on the primary key.

35 Third Normal Form (3NF) Relation R with FDs F is in 3NF if, for all X  A in F + – A  X (called a trivial FD), or – X contains a key for R, or – A is part of some key for R. Minimality of a key is crucial in third condition above! If R is in 3NF, obviously in 2NF. If R is in 3NF, some redundancy is still possible. It is a compromise, used when BCNF not achievable (e.g., no ``good’’ decomposition, or performance considerations). – Lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations always possible.

36 What Does 3NF Achieve? If 3NF is violated by X  A, one of the following holds: – X is a subset of some key K We store (X, A) pairs redundantly. – X is not a proper subset of any key. There is a chain of FDs K  X  A, which means that we cannot associate an X value with a K value unless we also associate an A value with an X value. But: even if relation is in 3NF, these problems could arise. – e.g., Reserves SBDC, S  C, C  S is in 3NF, but for each reservation of sailor S, same (S, C) pair is stored.

37 Boyce-Codd Normal Form (BCNF) Reln R with FDs F is in BCNF if, for all X  A in F + – A X (called a trivial FD), or – X contains a key for R. In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints. – No dependency in R that can be predicted using FDs alone. – If we are shown two tuples that agree upon the X value, we cannot infer the A value in one tuple from the A value in the other. – If example relation is in BCNF, then 2 tuples must be identical (since X is a key).