Design Theory for Relational Databases 2019, Fall Pusan National University Ki-Joune Li
Properties of Table
  - A relational database design is a set of relations.
  - Relations can be derived from a UML diagram, but not every derived relation is well designed.
  - We should therefore examine the properties of each table carefully:
      - Functional dependency
      - Key
      - Decomposition of tables
Definition of Functional Dependency
  - FD (Functional Dependency) on a relation R: A1 A2 … An → B, where A1, A2, …, An, B are attributes of R.
  - The FD holds iff any two tuples of R that agree on all of A1, A2, …, An also agree on B.
  - We say that the set of attributes A1 A2 … An functionally determines B.
  - More than one B: the FDs
      A1 A2 … An → B1
      A1 A2 … An → B2
      …
      A1 A2 … An → Bk
    can be written together as A1 A2 … An → B1 B2 … Bk.
Functional Dependency: Example
  - A relation Movies (title, year, length, filmType, studioName, starName)
      (title, year) → length
      (title, year) → filmType
      (title, year) → studioName
    and therefore (title, year) → length filmType studioName
  - (title, year) → starName ? No: a film can have more than one star.
  - It is important to discover the FDs of a relation: they help us judge whether the relation is well designed.
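As a rough illustration of the definition, the sketch below tests whether an FD is violated by a given set of rows. The helper name `fd_holds` and the three sample rows are assumptions made for this example. Note that a table instance can only refute an FD; whether an FD really holds is a statement about the real world, not about one snapshot of the data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True

# Illustrative rows only (hypothetical sample of the Movies relation).
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Mighty Ducks", "year": 1991, "length": 104, "starName": "Emilio Estevez"},
]

print(fd_holds(movies, ["title", "year"], ["length"]))    # True
print(fd_holds(movies, ["title", "year"], ["starName"]))  # False
```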
Key
  - Given a relation R, a set of one or more attributes {A1, A2, …, An} is a KEY iff
      - the set functionally determines all other attributes of R, and
      - no proper subset of {A1, A2, …, An} functionally determines all other attributes of R (minimality).
  - Primary key: if a relation has more than one key, one of them is designated as the primary key.
  - Super key: a set of attributes that contains a key (no minimality condition).
  - Example: Movies (title, year, length, filmType, studioName, starName). What are the keys?
How to discover keys
  - From the E-R diagram: the underlined attributes.
  - Keys are therefore defined from our understanding of the real world.
  - Example: Movies (title, year, length, filmType, studioName, starName)
      - (year, starName) is not a key if a star can make more than one film per year.
      - (year, starName) is a key if a star is allowed to make only one film per year.
  - For a relation (A1, A2, B) that represents a relationship between R1 and R2, the key depends on the multiplicity:
      One-One, One-Many, Many-One, Many-Many
Rules about Functional Dependencies
  - Functional dependency is an important property of a relation (table), and FDs obey some useful rules.
  - Transitive rule: if A → B and B → C, then A → C.
  - Splitting/combining rule:
      A1 A2 … An → B1, A1 A2 … An → B2, …, A1 A2 … An → Bk
      iff A1 A2 … An → B1 B2 … Bk
  - Trivial FD: given an FD A1 A2 … An → B,
      - the FD is trivial if B is one of {A1, A2, …, An};
      - the FD is completely non-trivial if B is not in {A1, A2, …, An}.
Rules about Functional Dependencies
  - Trivial dependency rule: A1 A2 … An → B1 B2 … Bm is equivalent to A1 A2 … An → C1 C2 … Ck,
    where {C1, C2, …, Ck} ⊆ {B1, B2, …, Bm} consists of exactly those B's that are not in {A1, A2, …, An}.
  - Example: (year, title) → (studioName, year) is equivalent to (year, title) → studioName.
    The attribute year on the right side is unnecessary.
  [Figure: the B's that also appear among the A's are unnecessary; the remaining B's are the C's.]
Armstrong's Axioms
  - Reflexivity (trivial FD): if {C1, C2, …, Ck} ⊆ {B1, B2, …, Bm}, then B1 B2 … Bm → C1 C2 … Ck.
  - Augmentation: if A1 A2 … An → B1 B2 … Bm, then A1 A2 … An C1 C2 … Ck → B1 B2 … Bm C1 C2 … Ck.
  - Transitivity: if A1 A2 … An → B1 B2 … Bm and B1 B2 … Bm → C1 C2 … Ck, then A1 A2 … An → C1 C2 … Ck.
Closure of Attributes
  - Closure: {A1, A2, …, An}+
  - Let {A1, A2, …, An} be a set of attributes and S a set of FDs.
  - The closure of {A1, A2, …, An} under the FDs in S is the set of all attributes B such that A1 A2 … An → B follows from S.
  - That is, if the FDs we can derive from S are exactly
      A1 A2 … An → B1
      A1 A2 … An → B2
      …
      A1 A2 … An → Bk
    then {A1, A2, …, An}+ = {B1, B2, …, Bk}.
  - Because of the trivial FDs, the closure always contains A1, A2, …, An themselves.
Algorithm to Find Closure
  - Input: a set of attributes {A1, A2, …, An} and a set S of FDs
  - Output: {A1, A2, …, An}+
  - Process
      1. Split the FDs so that each FD has a single attribute on the right,
         e.g. split A1 A2 → B C into A1 A2 → B and A1 A2 → C.
      2. Initialize X = {A1, A2, …, An}.
      3. Search for some FD B1 B2 … Bm → C such that B1, B2, …, Bm are all in X but C is not in X, and add C to X.
      4. Repeat step 3 until no more attributes can be added to X. The final X is the closure.
  - Example: given attributes A, B, C, D, E, and F, and
      S: A B → C, B C → A D, D → E, and C F → B
    what is {A, B}+ ?
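The steps above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `closure` and the encoding of each FD as a `(left, right)` pair of strings of single-letter attributes are assumptions made for this example. The explicit splitting step is skipped because adding the whole right side at once gives the same result.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)                 # step 2: X starts as the given attributes
    changed = True
    while changed:                      # step 4: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:            # step 3: left side inside X, right side not yet
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The example from the slide: A B -> C, B C -> A D, D -> E, C F -> B
S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", S)))  # ['A', 'B', 'C', 'D', 'E']
```

F is never added because no applicable FD has F on its right side, so C F → B is never used.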
Closure and Key
  - If {A1, A2, …, An}+ is the set of all attributes of relation R, then {A1, A2, …, An} is a super key.
  - Example: R (A, B, C, D, E) and S: A B → C, B C → A D, D → E.
    Then {A, B}+ = {A, B, C, D, E}, which is all the attributes of R, so {A, B} is a super key of R.
  - If no attribute can be removed from the set without losing this property, the set is a key.
  - Example: if we remove B from {A, B}, then {A}+ = {A}, which is not {A, B, C, D, E}.
    Likewise, if we remove A, then {B}+ = {B}. Therefore {A, B} is a key.
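The closure test suggests a brute-force way to list every key of a small relation: try attribute sets in order of increasing size and keep those whose closure is the whole relation, skipping supersets of keys already found. The sketch below assumes single-letter attribute names and hypothetical helper names `closure` and `candidate_keys`. It enumerates all subsets, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the full relation."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]

# R(A, B, C, D, E) with A B -> C, B C -> A D, D -> E
S = [("AB", "C"), ("BC", "AD"), ("D", "E")]
print(candidate_keys("ABCDE", S))  # ['AB', 'BC']
```

Besides {A, B}, the search also reports {B, C}, since B C → A D followed by D → E reaches every attribute.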
Closing Set of Functional Dependencies
  - Basis T of S: if every FD in S can be derived from T, and every FD in T follows from S, then T is a basis of S.
  - Remove all duplicated FDs.
  - A minimal basis B satisfies three conditions:
      1. Every FD in B has a single attribute on its right side.
      2. If any FD is removed from B, the result is no longer a basis.
      3. If, for any FD in B, we remove one or more attributes from its left side, the result is no longer a basis.
  - Example: for S = {A → B, A → C, B → A, B → C, C → A, C → B}, what is a minimal basis of S?
    Is {AB → C, AC → B, BC → A} one?
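One common way to obtain a minimal basis follows the three conditions directly: split the right sides, shrink the left sides, then drop FDs that follow from the rest. The sketch below is written under that assumption; the names `implies` and `minimal_basis` are hypothetical, and attributes are single letters. A set of FDs can have several minimal bases, and which one this procedure returns depends on the order in which FDs are examined.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

def minimal_basis(fds):
    # Condition 1: one attribute on every right side.
    basis = [(frozenset(lhs), b) for lhs, rhs in fds for b in rhs]
    # Condition 3: remove left-side attributes that are not needed.
    reduced = []
    for lhs, b in basis:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and implies(basis, lhs - {a}, b):
                lhs.discard(a)
        reduced.append((frozenset(lhs), b))
    # Condition 2: remove FDs that follow from the remaining ones.
    result = list(dict.fromkeys(reduced))  # drop duplicates, keep order
    for fd in list(result):
        rest = [f for f in result if f != fd]
        if implies(rest, fd[0], fd[1]):
            result = rest
    return result

S = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
B = minimal_basis(S)
print(["".join(sorted(lhs)) + "->" + rhs for lhs, rhs in B])
# ['A->C', 'B->C', 'C->A', 'C->B']
```

With a different ordering of S the same procedure can return another minimal basis, for example one with only three FDs such as {A → B, B → C, C → A}.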
Bad Design: Anomalies
  - Bad design: example

    Title          Year  Length  Film Type  StudioName  Starring
    Star Wars      1977  124     Color      Fox         Carrie Fisher
                                                        Mark Hamill
                                                        Harrison Ford
                   1980                                 Billy Dee Williams
    Mighty Ducks   1991  104                Disney      Emilio Estevez
    Wayne’s World  1992  95                 Paramount   Dana Carvey
                                                        Mike Meyers

  - Redundancy
  - Update anomaly: e.g. update 124 to 123
  - Deletion anomaly: e.g. delete “Emilio Estevez”
Decomposing Relations: Example
  - R = {title, year, length, filmType, studioName, starring} is decomposed into
      R1 = {title, year, length, filmType, studioName}
      R2 = {title, year, starring}
  - Check each part again for: redundancy, update anomaly, deletion anomaly

    R1
    Title          Year  Length  Film Type  StudioName
    Star Wars      1977  124     Color      Fox
                   1980
    Mighty Ducks   1991  104                Disney
    Wayne’s World  1992  95                 Paramount

    R2
    Title          Year  Starring
    Star Wars      1977  Carrie Fisher
                         Mark Hamill
                         Harrison Ford
                   1980  Billy Dee Williams
    Mighty Ducks   1991  Emilio Estevez
    Wayne’s World  1992  Dana Carvey
                         Mike Meyers
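Decomposing Relations
  - Decomposition of a bad relation: a good way to remove the problems of bad relations.
  - Decomposition: {A1, A2, …, An} is split into {B1, B2, …, Bm} and {C1, C2, …, Ck} such that
      {B1, B2, …, Bm} ∪ {C1, C2, …, Ck} = {A1, A2, …, An}, and
      {B1, B2, …, Bm} ∩ {C1, C2, …, Ck} ≠ {}
  - Lossless decomposition: the natural join of the two parts gives back exactly the original relation.
    Sharing attributes is necessary for this but not sufficient.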
Lossless Decomposition: Bad Example

    R1
    Title          Year  Length  Film Type  StudioName
    Star Wars      1977  124     Color      Fox
                   1980
    Mighty Ducks   1991  104                Disney
    Wayne’s World  1992  95                 Paramount

    R2’
    Title          Starring
    Star Wars      Carrie Fisher
                   Mark Hamill
                   Harrison Ford
                   Billy Dee Williams
    Mighty Ducks   Emilio Estevez
    Wayne’s World  Dana Carvey
                   Mike Meyers

    R2
    Title          Year  Starring
    Star Wars      1977  Carrie Fisher
                         Mark Hamill
                         Harrison Ford
                   1980  Billy Dee Williams
    Mighty Ducks   1991  Emilio Estevez
    Wayne’s World  1992  Dana Carvey
                         Mike Meyers

  - R ≠ R1 ⋈ R2’ (lossy)
  - R = R1 ⋈ R2 (lossless)
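The difference between the two decompositions can be checked mechanically by projecting a table and joining the parts back together. The sketch below uses a simplified, hypothetical version of the Movies table (only title, year, studio, and star) and helper names `project`, `natural_join`, and `as_set` chosen for this example.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(left, right):
    """Join rows that agree on every attribute the two sides share."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined

def as_set(rows):
    return {tuple(sorted(row.items())) for row in rows}

# Simplified illustrative rows (an assumption, not the full slide table).
movies = [
    {"title": "Star Wars", "year": 1977, "studio": "Fox", "star": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "studio": "Fox", "star": "Mark Hamill"},
    {"title": "Star Wars", "year": 1980, "studio": "Fox", "star": "Billy Dee Williams"},
    {"title": "Mighty Ducks", "year": 1991, "studio": "Disney", "star": "Emilio Estevez"},
]

r1 = project(movies, ["title", "year", "studio"])
r2 = project(movies, ["title", "year", "star"])
r2_bad = project(movies, ["title", "star"])      # drops year, like R2'

print(as_set(natural_join(r1, r2)) == as_set(movies))  # True
print(len(natural_join(r1, r2_bad)))                   # 7
```

Joining on title alone pairs every Star Wars star with both years, so the join returns 7 rows instead of the original 4. The 3 extra rows are spurious tuples.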
Normal Form: Conditions for Good Relation
  - 1st Normal Form (1NF)
  - 2nd Normal Form (2NF)
  - 3rd Normal Form (3NF)
  - Boyce-Codd Normal Form (BCNF)
1st Normal Form
  - 1NF: every component of a relation should be ATOMIC
      - No table in a component
      - No set
      - No list, etc.
2nd Normal Form
  - 2NF: the relation is in 1NF, and no non-prime attribute of the relation is functionally dependent on a proper subset (a part) of a candidate key.
  - Prime attribute: an attribute that belongs to some candidate key.
  - Partial dependency: a non-prime attribute that depends on only a part of a candidate key.
  - Example: Player (Team, Number, TeamAddress, Name, Position) is in 1NF but not in 2NF.
  [Figure: attributes A, B, C, where a non-prime attribute depends on a part of the prime attributes (partial dependency).]
Example - 1
  - Player (Team, Number, TeamAddress, Name, Position)
      FD1: Team, Name → Position
      FD2: Team → TeamAddress
  - Key: {Team, Name}, since {Team, Name}+ = {Team, Number, TeamAddress, Name, Position}
    (this assumes that Team, Name also determines Number; FD1 and FD2 alone do not give Number).
  - In FD2, TeamAddress (a non-prime attribute) depends on {Team}, which is a proper subset of the key: a 2NF violation.
  - Should be decomposed into R1 (Team, Number, Name, Position) and R2 (Team, TeamAddress), with R1 ⋈ R2 = R.
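A 2NF check can be sketched by finding the candidate keys and then testing whether any proper part of a key determines a non-prime attribute. The function names below are hypothetical, and the FD set assumes, as an addition to the slide, that Team, Name also determines Number so that {Team, Name} really is a key.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

def partial_dependencies(attributes, fds):
    """Pairs (part of a key, non-prime attributes that part determines)."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):          # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                dependents = closure(part, fds) - prime
                if dependents:
                    found.append((part, sorted(dependents)))
    return found

player = ["Team", "Number", "TeamAddress", "Name", "Position"]
fds = [
    (["Team", "Name"], ["Number", "Position"]),  # Number is an assumption
    (["Team"], ["TeamAddress"]),
]
print(partial_dependencies(player, fds))  # [(('Team',), ['TeamAddress'])]
```

An empty result would mean that no partial dependency was found, which is the 2NF condition for a relation already in 1NF.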
Example - 2

    Employee  Skill           Current Work Location
    Jones     Typing          114 Main Street
              Shorthand
              Whittling
    Roberts   Light Cleaning  73 Industrial Way
    Ellis     Alchemy
              Juggling
    Harrison

  - Candidate key: {Employee, Skill}
  - Not 2NF: partial FD Employee → Current Work Location
  - Should be decomposed into (Employee, Skill) and (Employee, Current Work Location)
3rd Normal Form
  - 3NF: the relation is in 2NF, and every non-prime attribute of the relation is non-transitively dependent on every candidate key.
  - Example: Team (TeamName, Address, ManagerID, ManagerHireDate)
      FD: TeamName → Address, TeamName → ManagerID, ManagerID → ManagerHireDate
      Transitive dependency: TeamName → ManagerID → ManagerHireDate
      Key: {TeamName}
  - 2NF but not 3NF
  - To be decomposed into (TeamName, Address, ManagerID) and (ManagerID, ManagerHireDate)
  [Figure: attributes A, B, C with a transitive dependency.]
Example: 2NF but NOT 3NF

    Tournament            Year  Winner          Winner Date of Birth
    Indiana Invitational  1998  Al Fredrickson  21 July 1975
    Cleveland Open        1999  Bob Albertson   28 September 1968
    Des Moines Masters
                                Chip Masterson  14 March 1977

  - Candidate key: {Tournament, Year}
  - 2NF: no partial dependency
  - Not 3NF: transitive functional dependency
      {Tournament, Year} → Winner
      Winner → Winner Date of Birth
  - Should be decomposed into (Tournament, Year, Winner) and (Player, Birth date)
Boyce-Codd Normal Form (BCNF)
  - BCNF: for every non-trivial functional dependency X → Y of the relation, X is a super key.
  - Remember: non-trivial means that Y is not a subset of X.
  - Remember: a super key is any superset of a key (not necessarily a proper superset).
  - BCNF is slightly stronger than 3NF.
Relationship between 1NF, 2NF, 3NF and BCNF
Example: 3NF but NOT BCNF
  - For a relation R(A, B, C, D, E), FDs F = {A → B, BC → E, ED → A}
  - Classify the attributes by where they appear in F:

    Left only (prime)  Middle, both sides (?)  Right only (non-prime)
    C, D               A, B, E                 (none)

  - Keys
      - C and D are prime attributes. Is {C, D} a key? {C, D}+ = {C, D}, not {A, B, C, D, E}: NO.
      - Add one attribute from the middle, e.g. A: {A, C, D}+ = {A, B, C, D, E}: YES.
      - Likewise, test {A, C, D}+, {B, C, D}+, {C, D, E}+.
      - Keys: {ACD, BCD, CDE}
  - BCNF: check whether every left-hand side in F is a (super) key. None of A, BC, ED is, so R is not in BCNF.
  - 3NF: no transitive dependency, and no non-prime attribute is functionally dependent on a part of a candidate key.
    Every attribute belongs to some key, so there is no non-prime attribute and R is in 3NF.
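The reasoning in this example can be replayed with a small script. The sketch assumes single-letter attributes and hypothetical function names. It checks only the FDs listed in F, using the usual formulation that an FD X → Y violates 3NF when X is not a super key and some attribute of Y outside X is non-prime.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left side is not a super key."""
    all_attrs = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs]

def third_nf_violations(attributes, fds):
    """BCNF violations that also put a non-prime attribute on the right."""
    prime = set().union(*candidate_keys(attributes, fds))
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(attributes, fds)
            if not (set(rhs) - set(lhs)) <= prime]

F = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDE", F)])  # ['ACD', 'BCD', 'CDE']
print(bcnf_violations("ABCDE", F))      # [('A', 'B'), ('BC', 'E'), ('ED', 'A')]
print(third_nf_violations("ABCDE", F))  # []
```

All three FDs violate BCNF, yet none violates 3NF, because every attribute is prime.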
Example: 3NF but NOT BCNF
  - A table showing the assignment of students to professors

    Prof. ID  Prof. SS ID   Student ID
    1078      088-51-0074   31850
                            37921
    1293      096-77-4146   46224
    1480      072-21-2223

  - Candidate keys: {Prof. ID, Student ID}, {Prof. SS ID, Student ID}
  - 1NF
  - 2NF: no partial FD of a non-prime attribute on a candidate key
  - 3NF: no transitive FD
  - NOT BCNF: Prof. ID → Prof. SS ID is a functional dependency, but Prof. ID is not a candidate key
  - Should be decomposed into (Prof. ID, Student ID) and (Prof. ID, Prof. SS ID)
Decomposition
  - Three conditions
      1. Elimination of anomalies: update, redundancy, deletion
      2. Lossless decomposition: the original relation can be recovered by natural join
      3. Preservation of dependencies
  - A relation with two attributes is always in BCNF (why?)
BCNF Decomposition Algorithm
  - Input: relation R0 and a set S0 of FDs
  - Output: R1, R2, …, Rn such that R0 = R1 ⋈ R2 ⋈ … ⋈ Rn
  - Process
      1. If R0 is in BCNF, return R0.
      2. If there is a BCNF violation X → Y, compute X+. Then R1 = X+, and R2 consists of X together with the attributes not in X+.
      3. Decompose the FD set S0 into S1 (for R1) and S2 (for R2).
      4. Repeat steps 1-3 on R1 and R2 until there is no more BCNF violation.
  - Example: Team (TeamName, Address, ManagerID, ManagerHireDate)
      FD: TeamName → Address, TeamName → ManagerID, ManagerID → ManagerHireDate
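A minimal recursive sketch of the algorithm follows, with hypothetical function names. Instead of projecting the FD set onto each sub-relation as in step 3, it tests violations by computing closures under the original FDs and restricting them to the sub-relation, which has the same effect for small schemas. It enumerates attribute subsets, so it is exponential and meant only for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(relation, fds):
    """Return (X, X+ within relation) for some BCNF violation, or None."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            x_plus = closure(x, fds) & relation
            # non-trivial (X+ is larger than X) and X is not a super key
            if x_plus != x and x_plus != relation:
                return x, x_plus
    return None

def bcnf_decompose(relation, fds):
    relation = frozenset(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]                 # step 1: already in BCNF
    x, x_plus = violation
    r1 = x_plus                           # step 2: R1 = X+
    r2 = (relation - x_plus) | x          #         R2 = X plus the rest
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)  # step 4

team = ["TeamName", "Address", "ManagerID", "ManagerHireDate"]
fds = [
    (["TeamName"], ["Address"]),
    (["TeamName"], ["ManagerID"]),
    (["ManagerID"], ["ManagerHireDate"]),
]
print([sorted(r) for r in bcnf_decompose(team, fds)])
# [['ManagerHireDate', 'ManagerID'], ['Address', 'ManagerID', 'TeamName']]
```

For the Team example the only violation is ManagerID → ManagerHireDate, so a single split yields two relations that are both in BCNF.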