Lecture 5: Functional dependencies and normalization

Slides:



Advertisements
Similar presentations
Functional Dependencies and Normalization for Relational Databases
Advertisements

primary key constraint foreign key constraint
Functional Dependencies and Normalization for Relational Databases.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Databases 6: Normalization
Why Normalization? To Reduce Redundancy to 1.avoid modification, insertion, deletion anomolies 2.save space Goal: One Fact in One Place.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College.
King Saud University College of Computer & Information Sciences Computer Science Department CS 380 Introduction to Database Systems Functional Dependencies.
DatabaseIM ISU1 Chapter 10 Functional Dependencies and Normalization for RDBs Fundamentals of Database Systems.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Lecture 5: Functional dependencies and normalization Jose M. Peña
Instructor: Churee Techawut Functional Dependencies and Normalization for Relational Databases Chapter 4 CS (204)321 Database System I.
Functional Dependencies and Normalization Jose M. Peña
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Lecture 8: Database Concepts May 4, Outline From last lecture: creating views Normalization.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 340 Introduction to Database Systems.
11/06/97J-1 Principles of Relational Design Chapter 12.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
1 COP 4710 Databases Fall, 2000 Today’s Topic Chapter 5: Improving the Quality of Relational Schemas David A. Gaitros September 18, 2000 Department of.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
10/3/2017.
COP 6726: New Directions in Database Systems
Relational Normalization Theory
Relational Schema Design II
Normalization Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, Chapter 19.
Functional Dependency and Normalization
Advanced Normalization
Gergely Lukács Pázmány Péter Catholic University
Functional Dependencies and Normalization for Relational Databases
CS411 Database Systems 08: Midterm Review Kazuhiro Minami 1.
Module 5: Overview of Database Design -- Normalization
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
CS422 Principles of Database Systems Normalization
Functional Dependencies and Normalization for RDBs
Database Design Dr. M.E. Fayad, Professor
Relational Database Design by Dr. S. Sridhar, Ph. D
Chapter 8: Relational Database Design
Handout 4 Functional Dependencies
Advanced Normalization
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Functional Dependencies and Normalization for Relational Databases
Schema Refinement and Normalization
Database Management systems Subject Code: 10CS54 Prepared By:
Database Normalization
Module 5: Overview of Normalization
Schema Refinement What and why
Normalization Murali Mani.
Functional Dependencies and Normalization
Outline: Normalization
Normalization.
Normalization DB Design Guidelines Presented by: Dr. Samir Tartir
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Normalization cs3431.
CS 405G: Introduction to Database Systems
Instructor: Mohamed Eltabakh
Sampath Jayarathna Cal Poly Pomona
Schema Refinement and Normal Forms
Relational Database Design
Chapter Outline 1 Informal Design Guidelines for Relational Databases
Database Design Dr. M.E. Fayad, Professor
Database.
Chapter 7a: Overview of Database Design -- Normalization
Functional Dependencies and Normalization
Lecture 09: Functional Dependencies
CS4222 Principles of Database System
Presentation transcript:

Lecture 5: Functional dependencies and normalization 09/04/2019 Lecture 5: Functional dependencies and normalization Jose M. Peña jose.m.pena@liu.se

09/04/2019 Motivation Can we be sure that the translation of an EER-diagram into a relational model results in a good database design ? Confronted with a deployed database, how can we be sure that it is well-designed? What is a good database design? Four informal measures. One formal measure: Normalization. 2

Good design: Informal measures 09/04/2019 Good design: Informal measures Easy-to-explain semantics of the relational schema. Minimal redundant information in tuples. Why ? Redundancy causes waste of space and update anomalies: Insertion anomalies. Deletion anomalies. Modification anomalies. EMP( EMPID, EMPNAME, DEPTNAME, DEPTMGR) 123 Smith Research 999 333 Wong Research 999 888 Borg Administration null 3

Good design: Informal measures 09/04/2019 Good design: Informal measures Minimal number of NULL values in tuples. Why ? Efficient use of space. Avoid costly outer joins. Ambiguous interpretation (unknown vs. doesn’t apply). Disallow the possibility of generating spurious tuples. How ? By joining only on foreign key/primary key attributes. 4

Definitions Relational schema: The header of the table. Relation: The data in the table. Relation is a set, i.e. no duplicate tuples exist. 5

Functional dependencies 09/04/2019 Functional dependencies Let R be a relational schema with the attributes A1,...,An and let X and Y be subsets of {A1,...,An}. Let r(R) denote a relation in the relational schema R. Despite the mathematical definition, a functional dependency cannot be discovered from the relation. It is a property of the semantics of attributes in the relational schema. We say that X functionally determines Y, i.e. X Y, if for each pair of tuples t1, t2 r(R) and for all relations r(R) we have that if t1[X] = t2[X] then t1[Y] = t2[Y]. 6

Inference rules If X  Y then XY, or X  X (reflexive rule) 09/04/2019 Inference rules If X  Y then XY, or X  X (reflexive rule) XY |= XZ  YZ (augmentation rule) X  Y, Y  Z |= X  Z (transitive rule) X  YZ |= X  Y (decomposition rule) X  Y, X  Z |= X  YZ (union or additive rule) X  Y, WY  Z |= WX  Z (pseudotransitive rule) 7

Inference rules Textbook, page 341: 09/04/2019 Inference rules Textbook, page 341: ”… X  A, and Y  B does not imply that XY  AB.” Prove that this statement is wrong. Prove inference rules 4, 5 and 6 by using only inference rules 1, 2 and 3. 8

09/04/2019 Definitions True for any relation. So, it depends on the semantics of the attributes Superkey: A set of attributes that uniquely (but not necessarily minimally) identifies a tuple of a relation. Key: A set of attributes that uniquely and minimally identifies a tuple of a relation. Candidate key: If there is more than one key in a relation, every key is called a candidate key. In other words, X is a candidate key if X  A1,...,An \ X. Primary key: A particular candidate key is chosen to be the primary key. Prime attribute: Every attribute that is part of a candidate key (vs. nonprime attribute). 9

Good design: Formal measure, normalization 09/04/2019 Good design: Formal measure, normalization 1NF, 2NF, 3NF, BCNF (4NF, 5NF). Minimize redundancy. Minimize update anomalies. The higher the normal form, the less the redundancy. Moreover, relations become more but smaller. Join operations are needed to recover the original relations. This may harm running time. So, normalization is not mandatory. In some cases, even denormalization may be preferred. In these cases, you may want to deal with redundancy via coding (e.g. procedures, triggers, views, etc.). 10

This is how we dealt with multi-valued attributes in the translation. 09/04/2019 First normal form (1NF) The relational model does not have non-atomic values. This is how we dealt with multi-valued attributes in the translation. Rnon1NF ID Name LivesIn 100 Pettersson {Stockholm, Linköping} 101 Andersson {Linköping} 102 Svensson {Ystad, Hjo, Berlin} R21NF ID LivesIn 100 Stockholm Linköping 101 102 Ystad Hjo Berlin R11NF ID Name 100 Pettersson 101 Andersson 102 Svensson Normalization 11

Second normal form (2NF) 09/04/2019 Second normal form (2NF) The relational model does not have any set of non-prime attributes that is functionally dependent on part of a candidate key. Rnon2NF EmpID Dept Work% EmpName 100 Dev 50 Baker Support 200 80 Miller R22NF R12NF EmpID Dept Work% 100 Dev 50 Support 200 80 Normalization EmpID EmpName 100 Baker 200 Miller 12

Second normal form (2NF) 09/04/2019 Second normal form (2NF) The relational model does not have any set of non-prime attribute that is functionally dependent on part of a candidate key. Why not ? Because, a part of a candidate key can have repeated values in the relation and, thus, so can have the set of non-prime attributes, which implies redundancy and thus waste of space and update anomalies. Solution: Assume that X  Y violates 2NF in a relational schema R. Create a new relational schema R2(X,Y). Remove Y from R. The relational schema consists now of R and R2. 13

09/04/2019 Third normal form (3NF) The relational model does not have any set of non-prime attributes that is functionally dependent on a set of attributes that is not a candidate key. Rnon3NF ID Name Zip City 100 Andersson 58214 Linköping 101 Björk 10223 Stockholm 102 Carlsson R13NF R23NF Normalization ID Name Zip 100 Andersson 58214 101 Björk 10223 102 Carlsson Zip City 58214 Linköping 10223 Stockholm 14

09/04/2019 Third normal form (3NF) The relational model does not have any set of non-prime attributes that is functionally dependent on a set of attributes that is not a candidate key. Why not ? Because, a set of attributes that is not a candidate key can have repeated values in the relation and, thus, so can have the set of non-prime attributes, which implies redundancy and thus waste of space and update anomalies. Note that 3NF implies 2NF. Solution: Assume that X  Y violates 3NF in a relational schema R. Create a new relational schema R2(X,Y). Remove Y from R. The relational schema consists now of R and R2. 15

Little summary X  Y 2NF and 3NF do nothing if Y is prime. 09/04/2019 Little summary X  Y 2NF and 3NF do nothing if Y is prime. Assume Y is non-prime. 2NF = decompose if X is part of a candidate key. 3NF = decompose if X is not a candidate key. 3NF = X is a candidate key or Y is prime. If Y is prime but X is not a candidate key, then X can have repeated values in the relation and, thus, so can have Y. Is not this a problem just because Y is prime ? 16

Boyce-Codd normal form (BCNF) 09/04/2019 Boyce-Codd normal form (BCNF) In every functional dependency in the relational model, the determinant is a candidate key. Example: Let R(A,B,C,D) denote a relational schema with functional dependencies ABCD, CB. Then R is in 3NF but not in BCNF. C is a determinant but not a candidate key. Decompose R into R1(A,C,D) with AC  D and R2(C,B) with C  B. In general: Assume that X  Y violates BCNF in a relational schema R. Create a new relational schema R2(X,Y). Remove Y from R. The relational schema consists now of R and R2. You may have to find a new primary key for R. 17

Little summary X  Y 2NF and 3NF do nothing if Y is prime. 09/04/2019 Little summary X  Y 2NF and 3NF do nothing if Y is prime. Assume Y is non-prime. 2NF = decompose if X is part of a candidate key. 3NF = decompose if X is not a candidate key. 3NF = X is a candidate key or Y is prime. Assume Y is prime. BCNF = decompose if X is not a candidate key. 18

Normalization: Example 09/04/2019 Normalization: Example Consider the following relational schema: R(PID, PersonName, Country, Continent, ContinentArea, NumberVisitsCountry) Functional dependencies ? Candidate keys ? 19

Normalization: Example 09/04/2019 Normalization: Example Functional dependencies: PID  PersonName PID, Country  NumberVisitsCountry Country  Continent Continent  ContinentArea What are the candidate keys for R? Use the inference rules to show that X  A1,...,An \ X. 20

Normalization: Exmple 09/04/2019 Normalization: Exmple Country  Continent, Continent  ContinentArea then Country  ContinentArea (transitive rule) Country  Continent, ContinentArea (additive rule) PID, Country  PID, Continent, ContinentArea (augmentation rule) PID, Country  Continent, ContinentArea (decomposition rule) PID  PersonName PID, Country  PersonName (augmentation + decomposition rules) PID, Country  NumberVisitsCountry PID, Country  Continent, ContinentArea, PersonName, NumberVisitsCountry (additive rule) PID, Country is the only candidate key for R and, thus, its primary key. 21

Normalization: Example 09/04/2019 Normalization: Example Is R (PID,Country,Continent,ContinentArea,PersonName,NumberVisitsCountry) in 2NF ? No, PersonName depends on a part of a candidate key (PID), then R1(PID, PersonName) R2(PID, Country, Continent, ContinentArea, NumberVisitsCountry) Is R1 in 2NF ? Yes. Is R2 in 2NF ? No, Continent and ContinentArea depend on a part of a candidate key (Country), then R21(Country, Continent, ContinentArea) R22(PID, Country, NumberVisitsCountry) Now, R1, R21, R22 are in 2NF. 2NF: No non-prime attribute should be functionally dependent on a part of a candidate key. 22

R22(PID, Country, NumberVisitsCountry) R1(PID, PersonName) 09/04/2019 Are R1, R21, R22 in 3NF? 3NF: No nonprime attribute should be functionally dependent on a set of attributes that is not a candidate key. R22(PID, Country, NumberVisitsCountry) R1(PID, PersonName) Yes, because they have only one non-prime attribute. R21(Country, Continent, ContinentArea) No, Continent determines ContinentArea, then R211(Country, Continent) R212(Continent, ContinentArea) Now, R1, R22, R211, R212 are in 3NF. 23

09/04/2019 Are R1, R22, R211, R212 in BCNF? BCNF: Every determinant is a candidate key. R22(PID, Country, NumberVisitsCountry) R1(PID, PersonName) R211(Country, Continent) R212(Continent, ContinentArea) Yes, they are in BCNF. Can the original relation R be recovered by joining R1, R22, R211 and R212 without generating spurious tuples? Yes. Mind the foreign keys created during normalization ! 24

Desirable properties of normalization 09/04/2019 Desirable properties of normalization Keep all the attributes from the original relational model (true in our method). Preserve all the functional dependencies from the original relational model (false in our method). Lossless join (true in our method). It must be possible to join the smaller relations produced by normalization and recover the original relation without generating spurious tuples. 25