Functional Dependencies and Normalization Jose M. Peña

Slides:



Advertisements
Similar presentations
Functional Dependencies and Normalization for Relational Databases
Advertisements

primary key constraint foreign key constraint
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Ch 10, Functional Dependencies and Normal forms
Functional Dependencies and Normalization for Relational Databases.
Functional Dependencies and Normalization for Relational Databases
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
CMSC424: Database Design Instructor: Amol Deshpande
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
METU Department of Computer Eng Ceng 302 Introduction to DBMS Functional Dependencies and Normalization for Relational Databases by Pinar Senkul resources:
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
Databases 6: Normalization
Announcements Homework 1 due Friday. Slip it under my office door (1155) or put in my mailbox on 5 th floor. Program 2 has been graded ;-( Program 3 out.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Functional Dependencies and Normalization for Relational Databases.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 6 NORMALIZATION FOR RELATIONAL DATABASES Instructor Ms. Arwa Binsaleh.
King Saud University College of Computer & Information Sciences Computer Science Department CS 380 Introduction to Database Systems Functional Dependencies.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Normalization for Relational Databases.
DatabaseIM ISU1 Chapter 10 Functional Dependencies and Normalization for RDBs Fundamentals of Database Systems.
Lecture 1 of Advanced Databases Basic Concepts Instructor: Mr.Ahmed Al Astal.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Lecture 5: Functional dependencies and normalization Jose M. Peña
Topic 10 Functional Dependencies and Normalization for Relational Databases Faculty of Information Science and Technology Mahanakorn University of Technology.
Instructor: Churee Techawut Functional Dependencies and Normalization for Relational Databases Chapter 4 CS (204)321 Database System I.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Functional Dependencies and Normalization for Relational Databases.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
Chapter Functional Dependencies and Normalization for Relational Databases.
CSE314 Database Systems Basics of Functional Dependencies and Normalization for Relational Databases Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
1 Functional Dependencies and Normalization Chapter 15.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Lecture 8: Database Concepts May 4, Outline From last lecture: creating views Normalization.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Design Process - Where are we?
Normalization.
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 340 Introduction to Database Systems.
CPSC 603 Database Systems Lecturer: Laurie Webster II, M.S.S.E., M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 5 Introduction to a First Course in Database Systems.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Al-Imam University Girls Education Center Collage of Computer Science 1 st Semester, 1432/1433H Chapter 10_part 1 Functional Dependencies and Normalization.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
10/3/2017.
Functional Dependency and Normalization
Functional Dependencies and Normalization for Relational Databases
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Functional Dependencies and Normalization for RDBs
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Database Management systems Subject Code: 10CS54 Prepared By:
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Schema Refinement and Normal Forms
Lecture 5: Functional dependencies and normalization
Relational Database Design
Presentation transcript:

Functional Dependencies and Normalization Jose M. Peña

2 Overview Real world Model QueriesAnswers Databases Physical database DBMS Processing of queries and updates Access to stored data

3 Good Design Can we be sure that a translation from EER-diagram to relational tables results in good database design? Confronted with a deployed database, how can we be sure that it is well-designed? What is good database design?  Four informal measures  Formal measure: normalization

4 Informal design guideline Easy to explain semantics of the relation schema Reducing redundant information in tuples Redundancy causes waste of space and update anomalies:  Insertion anomalies  Deletion anomalies  Modification anomalies EMP(EMPID, EMPNAME, DEPTNAME, DEPTMGR) 123 Smith Research Wong Research Borg Administration null

5 Informal design guideline Sometimes, it may be desirable to have redundancy to gain in runtime, i.e. trade space for time. In that case and to avoid update anomalies  either, use triggers or stored procedures to update the base tables  or, keep the base tables free of redundancy and use views (assuming that the views are materialized to avoid too many joins).

6 Informal design guideline Reducing NULL values in tuples Why  Efficient use of space  Avoid costly outer joins  Ambiguous interpretation (unknown vs. doesn’t apply). Disallow the possibility of generating spurious tuples  Figures 10.5 and 10.6: cartesian product results in incorrect tuples  Only join on foreign key/primary key-attributes  Lossless join property: guarantees that the spurious tuple generation problem does not occur

Remarks Relational schema: The header of the table. Relation: The data in the table. Relation is a set, i.e. no duplicates exist. 7

8 Functional dependencies (FD) Let R be a relational schema with the attributes A 1,...,A n and let X and Y be subsets of {A 1,...,A n }. Let r(R) denote a relation in relational schema R. Despite the mathematical definition an FD cannot be determined automatically. It is a property of the semantics of attributes. We say that X functionally determines Y, X   Y if for each pair of tuples t 1, t 2  r(R) and for all relations r(R): If t 1 [X] = t 2 [X] then we must also have t 1 [Y] = t 2 [Y] We say that X functionally determines Y, X   Y if for each pair of tuples t 1, t 2  r(R) and for all relations r(R): If t 1 [X] = t 2 [X] then we must also have t 1 [Y] = t 2 [Y]

9 Inference rules 1. If X  Y then X  Y, or X  X (reflexive rule) 2. X  Y |= XZ  YZ (augmentation rule) 3. X  Y, Y  Z |= X  Z (transitive rule) 4. X  YZ |= X  Y (decomposition rule) 5. X  Y, X  Z |= X  YZ (union or additive rule) 6. X  Y, WY  Z |= WX  Z (pseudotransitive rule)

10 Inference rules Textbook, page 341: ”… X  A, and Y  B does not imply that XY  AB.” Prove that this statement is wrong. Prove inference rules 4, 5 and 6 by using only inference rules 1, 2 and 3.

11 Definitions Superkey: a set of attributes uniquely (but not necessarily minimally!) identifying a tuple of a relation. Key: A set of attributes that uniquely and minimally identifies a tuple of a relation. Candidate key: If there is more than one key in a relation, the keys are called candidate keys. Primary key: One candidate key is chosen to be the primary key. Prime attribute: An attribute A that is part of a candidate key X (vs. nonprime attribute) For any relation extension or state

12 Normal Forms 1NF, 2NF, 3NF, BCNF (4NF, 5NF) Minimize redundancy Minimize update anomalies Normal form ↑ = redundancy and update anomalies ↓ and relations become smaller. Join operation to recover original relations.

13 1NF 1NF: The relation should have no non-atomic values. IDNameLivesIn 100Pettersson{Stockholm, Linköping} 101Andersson{Linköping} 102Svensson{Ystad, Hjo, Berlin} R non1NF IDName 100Pettersson 101Andersson 102Svensson IDLivesIn 100Stockholm 100Linköping 101Linköping 102Ystad 102Hjo 102Berlin R1 1NF R2 1NF Normalization What about multi-valued attributes ?

14 2NF 2NF: no nonprime attribute should be functionally dependent on a part of a candidate key. EmpIDDeptWork%EmpName 100Dev50Baker 100Support50Baker 200Dev80Miller R non2NF EmpIDEmpName 100Baker 200Miller EmpIDDeptWork% 100Dev50 100Support50 200Dev80 R1 2NF R2 2NF Normalization

15 No 2NF: A part of a candidate key can have repeated values in the relation and, thus, so can have the nonprime attribute, i.e. redundancy + insertion and modification anomalies. An FD X  Y is a full functional dependency (FFD) if removal of any attribute A i from X means that the dependency does not hold any more. 2NF: Every nonprime attribute is fully functionally dependent on every candidate key. 2NF

16 3NF 3NF: 2NF + no nonprime attribute should be functionally dependent on a set of attributes that is not a candidate key IDNameZipCity 100Andersson58214Linköping 101Björk10223Stockholm 102Carlsson58214Linköping R non3NF Normalization IDNameZip 100Andersson Björk Carlsson58214 ZipCity 58214Linköping 10223Stockholm R1 3NF R2 3NF

17 No 3NF (but 2NF): A set of attributes that is not a candidate key can have repeated values in the relation and, thus, so can have the nonprime attribute, i.e. redundancy + insertion and modification anomalies. An FD X  Y is a transitive dependency if there is a set of attributes Z that is not a candidate key and such that both X  Z and Z  Y hold. 3NF: 2NF + no nonprime attribute is transitively dependent on any candidate key. 3NF

18 Little summary X  A 2NF and 3NF do nothing if A is prime. Assume A is nonprime. 2NF = decompose if X is part of a candidate key. 3NF = decompose if X is neither a candidate key nor part of a candidate key. 3NF = X is a candidate key or A is prime. If X is not a candidate key, then it can have repeated values in the relation and, thus, so can have A. Should this be ignored because A is prime ?

19 Boyce-Codd Normal Form BCNF: Every determinant is a candidate key BCNF = decompose if X  A is such that X is not a candidate key and A is a prime attribute. Example: Given R(A,B,C,D) and AB  CD, C  B. Then R is in 3NF but not in BCNF  C is a determinant but not a candidate key.  Decompose into R1(A,C,D) with AC  D and R2(C,B) with C  B.

20 TimeRoomInstructorActivity Mon 17.00GymTinaIronWoman Mon 17.00MirrorsAnnaAerobics Tue 17.00GymTinaIntro Tue 17.00MirrorsAnnaAerobics Wed 18.00GymAnnaIronWoman R nonBCNF At a gym, an instructor is leading an activity in a certain room at a certain time. BCNF: Example Time, room  instructor, activity Time, activity  instructor, room Time, instructor  activity, room Activity  room Decompose into R1(Time,Activity,Instructor) and R2(Activity,Room)

21 Properties of decomposition Keep all attributes from the universal relation R. Preserve the identified functional dependencies. Lossless join  It must be possible to join the smaller tables to arrive at composite information without spurious tuples.

22 Normalization: Example Given universal relation R (PID, PersonName, Country, Continent, ContinentArea, NumberVisitsCountry) Functional dependencies? Keys?

23 PID  PersonName PID, Country  NumberVisitsCountry Country  Continent Continent  ContinentArea Based on FDs, what are keys for R? Use inference rules Normalization: Example

24 Country  Continent, Continent  ContinentArea, then Country  Continent, ContinentArea (transitive + aditive rules) PID, Country  Continent, ContinentArea (augmentation + decomposition rules), PID, Country  PersonName (augmentation + decomposition rules), PID, Country  NumberVisitsCountry, then PID, Country  Continent, ContinentArea, PersonName, NumberVisitsCountry (additive rule) PID, Country is the key for R. Normalization: Example

25 Is R (PID,Country,Continent,ContinentArea,PersonName,NumberVisitsCountry) in 2NF? No, PersonName depends on a part of the candidate key ( PID), then R1(PID, PersonName) R2(PID, Country, Continent, ContinentArea, NumberVisitsCountry) Is R2 in 2NF? No, Continent and ContinentArea depend on a part of the candidate key ( Country), then R1(PID, PersonName) R21(Country, Continent, ContinentArea) R22(PID, Country, NumberVisitsCountry)  R1, R21, R22 are in 2NF 2NF: no nonprime attribute should be functionally dependent on a part of a candidate key. Normalization: Example

26 Are R1, R21, R22 in 3NF? R22(PID, Country, NumberVisitsCountry), R1(PID, PersonName) : Yes, a single nonprime attribute, no transitive dependencies. R21(Country, Continent, ContinentArea) : No, Continent defines ContinentArea, then R211(Country, Continent) R212(Continent, ContinentArea)  R1, R22, R211, R212 are in 3NF 3NF: 2NF + no nonprime attribute should be functionally dependent on a set of attributes that is not a candidate key

27 Are R1, R22, R211, R212 in BCNF? R22(PID, Country, NumberVisitsCountry), R1(PID, PersonName): R211(Country, Continent) R212(Continent, ContinentArea)  Yes Can the universal relation R be reproduced from R1, R22, R211 and R212 without spurious tuples? BCNF: Every determinant is a candidate key

28 Summary and open issues Good design: informal and formal properties of relations Functional dependencies, and thus normal forms, are about attribute semantics (= real- world knowledge), normalization can only be automated if FDs are given. Are high normal forms good design when it comes to performance?  No, denormalization may be required.