Lecture 5: Functional dependencies and normalization Jose M. Peña



2 Motivation
Can we be sure that translating an EER diagram into a relational model results in a good database design?
Confronted with a deployed database, how can we be sure that it is well designed?
What is a good database design?
- Four informal measures.
- One formal measure: normalization.

3 Good design: Informal measures
Easy-to-explain semantics of the relational schema.
Minimal redundant information in tuples.
- Why? Redundancy wastes space and causes update anomalies: insertion anomalies, deletion anomalies and modification anomalies.
Example: EMP(EMPID, EMPNAME, DEPTNAME, DEPTMGR)
123 Smith Research Wong Research Borg Administration null

4 Good design: Informal measures
Minimal number of NULL values in tuples.
- Why? Efficient use of space. Avoid costly outer joins. Avoid ambiguous interpretation (unknown vs. does not apply).
Disallow the possibility of generating spurious tuples.
- How? By joining only on foreign key/primary key attributes.

5 Definitions
Relational schema: the header of the table.
Relation: the data in the table. A relation is a set, i.e. it contains no duplicate tuples.

6 Functional dependencies
Let R be a relational schema with the attributes A1, ..., An and let X and Y be subsets of {A1, ..., An}. Let r(R) denote a relation in the relational schema R.
We say that X functionally determines Y, written X → Y, if for all relations r(R) and for each pair of tuples t1, t2 ∈ r(R) we have that if t1[X] = t2[X] then t1[Y] = t2[Y].
Despite the mathematical definition, a functional dependency cannot be discovered from a relation: a particular relation can refute a dependency but cannot establish it. It is a property of the semantics of the attributes in the relational schema.
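The remark that a dependency cannot be discovered from a relation can be made concrete with a small sketch. The helper name `find_violation` is hypothetical, and the sample rows are borrowed from the Zip/City table on the 3NF slide of this lecture. An instance can expose a counterexample to X → Y, but the absence of a counterexample only means the instance is consistent with the dependency.

```python
def find_violation(rows, lhs, rhs):
    """Return two rows that agree on lhs but differ on rhs, or None if there is no such pair."""
    first_seen = {}  # lhs values -> (rhs values, first row carrying them)
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in first_seen and first_seen[x][0] != y:
            return first_seen[x][1], row
        first_seen.setdefault(x, (y, row))
    return None


rows = [
    {"ID": 100, "Name": "Andersson", "Zip": "58214", "City": "Linköping"},
    {"ID": 101, "Name": "Björk", "Zip": "10223", "City": "Stockholm"},
    {"ID": 102, "Name": "Carlsson", "Zip": "58214", "City": "Linköping"},
]

# No counterexample: the instance is consistent with Zip -> City (not a proof).
zip_city = find_violation(rows, ["Zip"], ["City"])
# Rows 100 and 102 share a Zip but differ on Name: Zip -> Name is refuted.
zip_name = find_violation(rows, ["Zip"], ["Name"])
```

Whether Zip → City really holds is still a question about the meaning of the attributes, not about these three rows.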

7 Inference rules
1. If X ⊇ Y then X → Y, or X → X (reflexive rule)
2. X → Y |= XZ → YZ (augmentation rule)
3. X → Y, Y → Z |= X → Z (transitive rule)
4. X → YZ |= X → Y (decomposition rule)
5. X → Y, X → Z |= X → YZ (union or additive rule)
6. X → Y, WY → Z |= WX → Z (pseudotransitive rule)
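In practice, whether a dependency follows from a given set is usually decided by computing an attribute closure rather than by chaining the rules by hand. The following is a minimal sketch of that standard textbook algorithm, which is not part of the slides. The function names `closure` and `implies` and the representation of a dependency as a pair of Python sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


# Pseudotransitive rule (rule 6): X -> Y, WY -> Z |= WX -> Z
fds = [({"X"}, {"Y"}), ({"W", "Y"}, {"Z"})]
rule6_holds = implies(fds, {"W", "X"}, {"Z"})
w_alone = implies(fds, {"W"}, {"Z"})
```

Here `rule6_holds` is true and `w_alone` is false, since W on its own determines nothing beyond itself.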

8 Inference rules
Textbook, page 341: "… X → A, and Y → B does not imply that XY → AB." Prove that this statement is wrong.
Prove inference rules 4, 5 and 6 by using only inference rules 1, 2 and 3.
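One possible set of derivations is sketched below, using the rule numbering of the previous slide. Other derivations exist; this is only one way to chain rules 1 to 3.

```latex
\begin{align*}
\text{Rule 4: } & X \to YZ \ \text{(given)},\quad YZ \to Y \ \text{(rule 1, as } Y \subseteq YZ\text{)}
  && \Rightarrow\ X \to Y \ \text{(rule 3)} \\
\text{Rule 5: } & X \to Y \ \Rightarrow\ X \to XY \ \text{(rule 2 with } X\text{)},\quad
  X \to Z \ \Rightarrow\ XY \to YZ \ \text{(rule 2 with } Y\text{)}
  && \Rightarrow\ X \to YZ \ \text{(rule 3)} \\
\text{Rule 6: } & X \to Y \ \Rightarrow\ WX \to WY \ \text{(rule 2 with } W\text{)},\quad
  WY \to Z \ \text{(given)}
  && \Rightarrow\ WX \to Z \ \text{(rule 3)} \\
\text{Textbook claim: } & X \to A \ \Rightarrow\ XY \to AY \ \text{(rule 2 with } Y\text{)},\quad
  Y \to B \ \Rightarrow\ AY \to AB \ \text{(rule 2 with } A\text{)}
  && \Rightarrow\ XY \to AB \ \text{(rule 3)}
\end{align*}
```

The last line shows that X → A and Y → B do imply XY → AB, which is why the quoted statement is wrong.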

9 Definitions
Superkey: A set of attributes that uniquely (but not necessarily minimally) identifies a tuple of a relation. Uniqueness must hold for any relation over the schema, so it depends on the semantics of the attributes.
Key: A set of attributes that uniquely and minimally identifies a tuple of a relation.
Candidate key: If there is more than one key in a relation, every key is called a candidate key. In other words, X is a candidate key if X → {A1, ..., An} \ X and no proper subset of X has this property.
Primary key: A particular candidate key is chosen to be the primary key.
Prime attribute: Every attribute that is part of a candidate key (vs. nonprime attribute).
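These definitions can be turned into a brute-force search: a set of attributes is a superkey if its closure is the whole schema, and a candidate key if, in addition, it contains no smaller key. The sketch below is illustrative only. The function names are assumptions, the search is exponential in the number of attributes, and the schema R(A, B, C, D) with AB → CD and C → B is the one used on the BCNF slide of this lecture.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Enumerate attribute sets by increasing size and keep the minimal superkeys."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]
keys = candidate_keys("ABCD", fds)
prime_attributes = set().union(*keys)
```

For this schema the search finds the candidate keys AB and AC, so A, B and C are prime and D is nonprime.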

10 Good design: Formal measure, normalization
1NF, 2NF, 3NF, BCNF (4NF, 5NF).
Minimize redundancy. Minimize update anomalies.
The higher the normal form, the less the redundancy. Moreover, the relations become more numerous but smaller.
Join operations are needed to recover the original relations. This may harm running time, so normalization is not mandatory. In some cases, even denormalization may be preferred. In these cases, you may want to deal with redundancy via coding (e.g. procedures, triggers, views, etc.).

11 First normal form (1NF)
The relational model does not have non-atomic values.

R (not in 1NF):
ID   Name        LivesIn
100  Pettersson  {Stockholm, Linköping}
101  Andersson   {Linköping}
102  Svensson    {Ystad, Hjo, Berlin}

Normalization gives R1 and R2, both in 1NF.

R1:
ID   Name
100  Pettersson
101  Andersson
102  Svensson

R2:
ID   LivesIn
100  Stockholm
100  Linköping
101  Linköping
102  Ystad
102  Hjo
102  Berlin

This is how we dealt with multi-valued attributes in the translation.
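The 1NF step on this slide can be sketched in a few lines. The function name `to_1nf` and the representation of rows as dictionaries are assumptions for illustration; the data is the table from the slide.

```python
non_1nf = [
    {"ID": 100, "Name": "Pettersson", "LivesIn": {"Stockholm", "Linköping"}},
    {"ID": 101, "Name": "Andersson", "LivesIn": {"Linköping"}},
    {"ID": 102, "Name": "Svensson", "LivesIn": {"Ystad", "Hjo", "Berlin"}},
]


def to_1nf(rows, key, multivalued):
    """Split off a multi-valued attribute into its own relation keyed by `key`."""
    # R1 keeps every attribute except the multi-valued one.
    r1 = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    # R2 gets one (key, value) tuple per element of the multi-valued attribute.
    r2 = [
        {key: row[key], multivalued: value}
        for row in rows
        for value in sorted(row[multivalued])
    ]
    return r1, r2


r1, r2 = to_1nf(non_1nf, "ID", "LivesIn")
```

`r1` holds the three (ID, Name) tuples and `r2` the six (ID, LivesIn) tuples, with ID in `r2` acting as a foreign key to `r1`.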

12 Second normal form (2NF)
The relational model does not have any non-prime attribute that is functionally dependent on part of a candidate key.

R (not in 2NF):
EmpID  Dept     Work%  EmpName
100    Dev      50     Baker
100    Support  50     Baker
200    Dev      80     Miller

Normalization gives R1 and R2, both in 2NF.

R1:
EmpID  EmpName
100    Baker
200    Miller

R2:
EmpID  Dept     Work%
100    Dev      50
100    Support  50
200    Dev      80

13 Second normal form (2NF)
The relational model does not have any non-prime attribute that is functionally dependent on part of a candidate key.
- Why not? Because a part of a candidate key can have repeated values in the relation and, thus, so can the non-prime attribute, which implies redundancy and thus waste of space and update anomalies.
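The redundancy argument can be checked on the employee table from the 2NF slide. The sketch below projects the relation onto the two smaller schemas and joins them back. The helper names `project` and `natural_join` and the tuple-based representation are assumptions for illustration.

```python
ATTRS = ("EmpID", "Dept", "Work%", "EmpName")
r = {
    (100, "Dev", 50, "Baker"),
    (100, "Support", 50, "Baker"),  # "Baker" is stored twice for EmpID 100
    (200, "Dev", 80, "Miller"),
}


def project(rows, attrs, onto):
    """Project a set of tuples onto the attributes in `onto`, removing duplicates."""
    idx = [attrs.index(a) for a in onto]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(rows1, attrs1, rows2, attrs2):
    """Join two sets of tuples on their common attributes."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    joined = set()
    for t1 in rows1:
        for t2 in rows2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                joined.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return joined, tuple(attrs1) + tuple(extra)


r1_attrs = ("EmpID", "EmpName")
r2_attrs = ("EmpID", "Dept", "Work%")
r1 = project(r, ATTRS, r1_attrs)
r2 = project(r, ATTRS, r2_attrs)
joined, joined_attrs = natural_join(r2, r2_attrs, r1, r1_attrs)
```

After the split, each employee name is stored once in `r1`, and joining `r2` with `r1` on EmpID gives back exactly the original three tuples.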

14 Third normal form (3NF)
The relational model does not have any non-prime attribute that is functionally dependent on a set of attributes that is not a candidate key.

R (not in 3NF):
ID   Name       Zip    City
100  Andersson  58214  Linköping
101  Björk      10223  Stockholm
102  Carlsson   58214  Linköping

Normalization gives R1 and R2, both in 3NF.

R1:
ID   Name       Zip
100  Andersson  58214
101  Björk      10223
102  Carlsson   58214

R2:
Zip    City
58214  Linköping
10223  Stockholm

15 Third normal form (3NF)
The relational model does not have any non-prime attribute that is functionally dependent on a set of attributes that is not a candidate key.
- Why not? Because a set of attributes that is not a candidate key can have repeated values in the relation and, thus, so can the non-prime attribute, which implies redundancy and thus waste of space and update anomalies.
Note that 3NF implies 2NF.
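A 3NF check along these lines can be sketched as follows, assuming the candidate keys are already known. The function name `third_nf_violations` is hypothetical, and the dependencies ID → Name, Zip and Zip → City are the assumed semantics of the Zip/City example on the 3NF slide. Two simplifications are worth stating: the sketch inspects only the listed dependencies, not every implied one, and it treats a determinant as acceptable when it contains a candidate key (the usual superkey formulation).

```python
def third_nf_violations(fds, candidate_keys):
    """Return (determinant, nonprime attributes) pairs among the listed FDs that break 3NF."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        contains_key = any(key <= lhs for key in candidate_keys)
        nonprime_targets = rhs - lhs - prime
        if not contains_key and nonprime_targets:
            violations.append((lhs, nonprime_targets))
    return violations


# R(ID, Name, Zip, City) with candidate key ID.
before = third_nf_violations(
    [({"ID"}, {"Name", "Zip"}), ({"Zip"}, {"City"})],
    [{"ID"}],
)
# After decomposition: R1(ID, Name, Zip) with key ID, R2(Zip, City) with key Zip.
after_r1 = third_nf_violations([({"ID"}, {"Name", "Zip"})], [{"ID"}])
after_r2 = third_nf_violations([({"Zip"}, {"City"})], [{"Zip"}])
```

Only Zip → City is reported for the original relation, and neither of the decomposed relations reports a violation.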

16 Little summary
Consider a functional dependency X → A.
2NF and 3NF do nothing if A is prime.
Assume A is non-prime.
- 2NF = decompose if X is part of a candidate key.
- 3NF = decompose if X is not a candidate key.
3NF = X is a candidate key or A is prime.
If A is prime but X is not a candidate key, then X can have repeated values in the relation and, thus, so can A. Is this not a problem just because A is prime?

17 Boyce-Codd normal form (BCNF)
In every functional dependency in the relational model, the determinant is a candidate key.
Example: Let R(A, B, C, D) denote a relational schema with functional dependencies AB → CD and C → B. Then R is in 3NF but not in BCNF.
- C is a determinant but not a candidate key.
- Decompose R into R1(A, C, D) with AC → D and R2(C, B) with C → B.
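The example can be reproduced with a short sketch of one BCNF decomposition step. The function names `bcnf_violation` and `split_on` are assumptions. The check looks only at the listed dependencies and tests whether each determinant is a superkey via its closure; a full algorithm would also project the dependencies onto the resulting relations and repeat.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violation(attributes, fds):
    """Return the first listed nontrivial FD whose determinant is not a superkey, or None."""
    for lhs, rhs in fds:
        if not rhs <= lhs and closure(lhs, fds) != attributes:
            return lhs, rhs
    return None


def split_on(attributes, lhs, rhs):
    """Split R on the violating FD lhs -> rhs into R1 = R minus (rhs minus lhs) and R2 = lhs plus rhs."""
    return attributes - (rhs - lhs), lhs | rhs


R = set("ABCD")
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]
violation = bcnf_violation(R, fds)
r1, r2 = split_on(R, *violation)
```

The violating dependency found is C → B, and the split yields R1(A, C, D) and R2(C, B), matching the decomposition on the slide.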

18 Little summary
Consider a functional dependency X → A.
2NF and 3NF do nothing if A is prime.
Assume A is non-prime.
- 2NF = decompose if X is part of a candidate key.
- 3NF = decompose if X is not a candidate key.
3NF = X is a candidate key or A is prime.
Assume A is prime.
- BCNF = decompose if X is not a candidate key.

19 Normalization: Example
Consider the following relational schema:
R(PID, PersonName, Country, Continent, ContinentArea, NumberVisitsCountry)
Functional dependencies? Candidate keys?

20 Normalization: Example
Functional dependencies:
PID → PersonName
PID, Country → NumberVisitsCountry
Country → Continent
Continent → ContinentArea
What are the candidate keys for R? Use the inference rules to show that X → {A1, ..., An} \ X.

21 Normalization: Example
Country → Continent, Continent → ContinentArea then Country → ContinentArea (transitive rule)
Country → Continent, ContinentArea (additive rule)
PID, Country → PID, Continent, ContinentArea (augmentation rule)
PID, Country → Continent, ContinentArea (decomposition rule)
PID → PersonName then PID, Country → PersonName (augmentation + decomposition rules)
PID, Country → NumberVisitsCountry then PID, Country → Continent, ContinentArea, PersonName, NumberVisitsCountry (additive rule)
PID, Country is the only candidate key for R and, thus, its primary key.

22 Normalization: Example
2NF: No non-prime attribute should be functionally dependent on a part of a candidate key.
Is R(PID, Country, Continent, ContinentArea, PersonName, NumberVisitsCountry) in 2NF?
No, PersonName depends on a part of a candidate key (PID), so decompose:
R1(PID, PersonName)
R2(PID, Country, Continent, ContinentArea, NumberVisitsCountry)
Is R1 in 2NF? Yes.
Is R2 in 2NF? No, Continent and ContinentArea depend on a part of a candidate key (Country), so decompose:
R1(PID, PersonName)
R21(Country, Continent, ContinentArea)
R22(PID, Country, NumberVisitsCountry)
Now R1, R21 and R22 are in 2NF.
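The partial dependencies named on this slide can be found mechanically by taking the closure of each proper part of the key. The sketch below assumes that {PID, Country} is the only candidate key, as established earlier in the example, so every attribute outside the key is non-prime. The function name `partial_dependencies` is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper part of the (single) key to the non-prime attributes it determines."""
    key = set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = closure(part, fds) - key
            if dependents:
                found[part] = dependents
    return found


fds = [
    ({"PID"}, {"PersonName"}),
    ({"PID", "Country"}, {"NumberVisitsCountry"}),
    ({"Country"}, {"Continent"}),
    ({"Continent"}, {"ContinentArea"}),
]
partial = partial_dependencies({"PID", "Country"}, fds)
```

The result pairs PID with PersonName and Country with Continent and ContinentArea, which are exactly the attributes moved out into R1 and R21.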

23 Are R1, R21, R22 in 3NF?
3NF: No nonprime attribute should be functionally dependent on a set of attributes that is not a candidate key.
R22(PID, Country, NumberVisitsCountry)
R1(PID, PersonName)
Yes, because they have only one non-prime attribute.
R21(Country, Continent, ContinentArea)
No, Continent determines ContinentArea, so decompose:
R211(Country, Continent)
R212(Continent, ContinentArea)
Now R1, R22, R211 and R212 are in 3NF.

24 Are R1, R22, R211, R212 in BCNF?
BCNF: Every determinant is a candidate key.
R22(PID, Country, NumberVisitsCountry)
R1(PID, PersonName)
R211(Country, Continent)
R212(Continent, ContinentArea)
Yes, they are in BCNF.
Can the original relation R be recovered by joining R1, R22, R211 and R212 without generating spurious tuples? Yes. Mind the foreign keys created during normalization!

25 Desirable properties of normalization
Keep all the attributes from the original relational model.
Preserve all the functional dependencies from the original relational model.
Lossless join.
- It must be possible to join the smaller relations produced by normalization and recover the original relation without generating spurious tuples.
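For a split into two relations, the lossless join property can be tested with a standard textbook criterion that is not stated on the slides: the split is lossless if the common attributes functionally determine all attributes of one of the two parts. The sketch below applies it to the first split of the example. The function name `is_lossless` is an assumption, and the second split is a hypothetical bad decomposition added only for contrast.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: the shared attributes must determine all of r1 or all of r2."""
    reachable = closure(r1 & r2, fds)
    return r1 <= reachable or r2 <= reachable


fds = [
    ({"PID"}, {"PersonName"}),
    ({"PID", "Country"}, {"NumberVisitsCountry"}),
    ({"Country"}, {"Continent"}),
    ({"Continent"}, {"ContinentArea"}),
]

# First split from the example: the relations share PID, which determines all of R1.
good = is_lossless(
    {"PID", "PersonName"},
    {"PID", "Country", "Continent", "ContinentArea", "NumberVisitsCountry"},
    fds,
)
# Hypothetical split sharing only Continent, which determines neither side completely.
bad = is_lossless(
    {"PID", "PersonName", "Continent"},
    {"Country", "Continent", "ContinentArea", "NumberVisitsCountry"},
    fds,
)
```

The first split passes the test, in line with joining on the foreign key PID. The second fails it, so joining on Continent could produce spurious tuples.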