Download presentation
Presentation is loading. Please wait.
Published byCathleen Walton Modified over 9 years ago
1
COMP 430 Intro. to Database Systems Normal Forms Slides use ideas from Chris Ré.
2
What makes a good design? During design process: Choose attributes & types appropriately. Pick entity sets appropriately. Associate attributes with entity sets appropriately. Follow common patterns. Any objective criteria for evaluating our decisions?
3
Normal forms Domain/key normal form 6 th normal form 5 th normal form 4 th normal form Boyce-Codd normal form = 3.5 th normal form 3 rd normal form 2 nd normal form 1 st normal form Implies Almost implies Implies Based on functional dependencies, i.e., what attributes depend upon. Goal: Prevent DB anomalies.
4
Intuition (3NF) “[Every] non-key field must provide a fact about the key, the whole key, and nothing but the key.” – Bill Kent (BCNF) “Each attribute must represent a fact about the key, the whole key, and nothing but the key.” – Chris Date Course(crn, dept_code, course_number, title) Student(student_id, first_name, last_name) Enrollment(crn, student_id, grade)
5
Denormalization We’ll later see that there are also reasons to not use 3NF / BCNF.
6
1 st normal form instructordays_teaches John{M,T,W,Th,F} John{M,T,W,Th,F} Scott{T,Th} …… instructorday_teaches JohnM T W Th JohnF ScottT Th …… Table represents a mathematical relation: Each record unique. Single value per record & attribute.
7
Anomalies Update, delete, insert
8
Update anomalies caused by redundancy studentcourseroom JohnCOMP 130BRK 101 JaneCOMP 130BRK 101 JohnCOMP 430DCH 1070 MaryCOMP 430DCH 1070 SueCOMP 430DCH 1070 ……… Updating one room results in inconsistency = update anomaly. Assume no separate Student or Course tables.
9
Delete anomalies caused by poor attribute grouping studentcourseroom ……… No enrolled students results in no information about course = delete anomaly. Assume no separate Student or Course tables.
10
Insert anomalies caused by poor attribute grouping Need an enrolled student to reserve a room = insert anomaly. studentcourseroom JohnCOMP 130BRK 101 JaneCOMP 130BRK 101 JohnCOMP 430DCH 1070 MaryCOMP 430DCH 1070 SueCOMP 430DCH 1070 …COMP 600DCH 1070 ……… Assume no separate Student or Course tables.
11
Functional dependencies
12
FDs identify relationships between attributes Process: 1.Start with some relational schema. 2.Identify its functional dependences. 3.Use these to design a better schema. Not only identify problems, but fix them.
13
Roadmap Define FDs. Define closures to find all FDs. Define superkeys to determine what should be key. Apply definitions to 3NF, BCNF. Glimpse at what’s beyond 3NF, BCNF. Intuition: Attribute X (room) depends on Y (course). X (room) should be a non-key attribute in a table with Y (course) as key. With examples & activities, of course!
14
FD – definition Let S, T be sets of attributes. Let R be a relation. S functionally determines T (S T) holds on R iff for all valid tuples t 1, t 2 in R, if t 1 [S]=t 2 [S], then t 1 [T]=t 2 [T]. S T is a functional dependency. I.e., all the tuples that fit the intended meaning of R.
15
FD – an illustration S1S1 …SmSm T1T1 …TnTn JohnSmithJr.dogblue17 JohnSmithJr.dogblue17 If t 1,t 2 agree here……they also agree here! t1t1 t2t2
16
FD example {position} {phone} emp_idnamephoneposition E0045Smith1234Clerk E3542Mike 9876 Salesrep E1111Smith 9876 Salesrep E9999Mary1234Lawyer Not {phone} {position} emp_idnamephoneposition E0045Smith 1234 Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary 1234 Lawyer That a FD holds for one instance doesn’t imply that does for all instances.
17
Two ways FDs are used Let F be a set of FDs. Let R be a relation. F can specify a set of constraints on R’s tuples.Does F hold on R? Considered part of R’s schema. To test whether R’s tuples are valid under F.Does R satisfy F?
18
Superkey – definition Let S be a set of attributes. Let R be a relation. S is a superkey of R iff S R, i.e., for all valid tuples t 1, t 2 in R, if t 1 [S]=t 2 [S], then t 1 [R]=t 2 [R]. Here, R means the set of all of the relation’s attributes. Intuition: R’s attributes should be dependent on exactly the primary key.
19
Activity – Find FDs in this instance abcde 12436 32518 14457 12436 32518 Find at least three FDs which hold on this instance: { } { }
20
Problems finding FDs Can’t necessarily show a FD holds on all instances. Potentially large number – How to find them all? Start with subproblem – Given a set of FDs, what other FDs must hold?
21
Logical implication of FDs – example namecolorcategorydeptprice GizmoGreenGadgetToys49 WidgetBlackGadgetToys59 GizmoGreenWhatsitGarden99 Given: {name} {color} {category} {dept} {color, category} {price} Logically implies: {name, category} {price}
22
Logical implication rules Armstrong’s Axioms Reflexivity:If B A, then A B. Augmentation:If A B, then AC BC. Transitivity:If A B and B C, then A C. Derivable rules: Union:If A B and A C, then A BC. Decomposition:If A BC, then A B and A C. Pseudo-transitivity:If A B and BC D, then AC D. Sound: Only imply correct FDs. Complete: Imply all correct FDs. Standard notation for union of FD sets.
23
Activity – using FD implication rules namecolorcategorydeptprice GizmoGreenGadgetToys49 WidgetBlackGadgetToys59 GizmoGreenWhatsitGarden99 Given: {name} {color} {category} {dept} {color, category} {price} Logical implication:Rule(s)? {name, category} {price} {name, category} {name} {name, category} {color} {name, category} {category} {name, category} {color, category}
24
Closures – all logically implied FDs Let F be a set of FDs. Let A, B be sets of attributes. The closure of F (F + ) is the set of all FDs logically implied by F. The closure of A (A + ) is the set of all attributes B such that A B.
25
Closure of attribute sets Given: {name} {color} {category} {dept} {color, category} {price} Some closures: {name} + = {name, color} {name, category} + = {name, category, color, dept, price} {color} + = {color}
26
Algorithm for closure of attribute sets Closure(A) = Result = A Repeat until A doesn’t change: For each FD B C: if B Result, then add C to Result Return Result Given: {name} {color} {category} {dept} {color, category} {price} Compute: {name, category} + Simple algorithm quadratic in size of F. Complicated linear algorithm exists.
27
Activity – use closure algorithm Closure(A) = Result = A Repeat until A doesn’t change: For each FD B C: if B Result, then add C to Result Return Result Given: {a, b} {c} {a, d} {e} {b} {d} {a, f} {b} Compute: {a, b} + = {a, f} + =
28
Closure of FD set Let A, B be sets of attributes: Closure(F) = Result = For each subset A: For each A B provable: Add A B to Result. Return Result. For each B A + :
29
Closure of FD set – example Given: {a, b} {c} {a, d} {b} {b} {d} Compute attribute set closures: {a} + = {a} {b} + = {b, d} {c} + = {c} {d} + = {d} {a, b} + = {a, b, c, d} {a, c} + = {a, c} {a, d} + = {a, b, c, d} {b, c} + = {b, c, d} {b, d} + = {b, d} {c, d} + = {c, d} {a, b, c} + = {a, b, c, d} {a, b, d} + = {a, b, c, d} {a, c, d} + = {a, b, c, d} {b, c, d} + = {b, c, d} {a, b, c, d} + = {a, b, c, d} Compute FDs: {a} {a} {b} {b}, … {d}, … {b, d} {c} {c} {d} {d} {a, b} {a, b, c, d} {a, c} {a, c} {a, d} {a, b, c, d} {b, c} {b, c, d} {b, d} {b, d} {c, d} {c, d} {a, b, c} {a, b, c, d} {a, b, d} {a, b, c, d} {a, c, d} {a, b, c, d} {b, c, d} {b, c, d} {a, b, c, d} {a, b, c, d} If we take A B to be shorthand for A B’, where B’ B.
30
Closure of FD set – example Given: {a, b} {c} {a, d} {b} {b} {d} … Eliminating trivial FDs A B, where B A. Replacing FDs A AB with A B. Shorthand version: {b} {d} {a, b} {c, d} {a, d} {b, c} {b, c} {d} {a, b, c} {d} {a, b, d} {c} {a, c, d} {b} Compute FDs: {a} {a} {b} {b}, … {d}, … {b, d} {c} {c} {d} {d} {a, b} {a, b, c, d} {a, c} {a, c} {a, d} {a, b, c, d} {b, c} {b, c, d} {b, d} {b, d} {c, d} {c, d} {a, b, c} {a, b, c, d} {a, b, d} {a, b, c, d} {a, c, d} {a, b, c, d} {b, c, d} {b, c, d} {a, b, c, d} {a, b, c, d}
31
FD covers & equivalence F covers G iff G can be inferred from F. I.e., G + F +. F and G are equivalent iff F + = G +. Given: {a, b} {c} {a, d} {b} {b} {d} Compute FDs: {a} {a} {b} {b, d} {c} {c} {d} {d} {a, b} {a, b, c, d} {a, c} {a, c} {a, d} {a, b, c, d} {b, c} {b, c, d} {b, d} {b, d} {c, d} {c, d} {a, b, c} {a, b, c, d} {a, b, d} {a, b, c, d} {a, c, d} {a, b, c, d} {b, c, d} {b, c, d} {a, b, c, d} {a, b, c, d} Shorthand version: {b} {d} {a, b} {c, d} {a, d} {b, c} {b, c} {d} {a, b, c} {d} {a, b, d} {c} {a, c, d} {b}
32
Superkeys & keys
33
Superkeys & keys (reminder) Let S be a set of attributes. Let R be a relation. S is a superkey of R iff S R. Equivalently, iff S + = R. A key is a minimal superkey. We pick a key as primary key.
34
Superkeys & keys – example Product(name, price, category, color) {name, category} price {category} color What are superkey(s)? Key(s)? How can you search for them?
35
Activity 10a-keys.ipynb
36
Normal forms
37
FD-based table normalization While there are “bad” FDs Decompose a table with “bad” FDs into sub-tables.
38
Boyce-Codd normal form (BCNF) (BCNF) “Each attribute must represent a fact about the key, the whole key, and nothing but the key.” – Chris Date Ignoring trivial FDs: Good: X RX is a (super)key Bad: X Afor A RX is not a (super)key “Bad” because X isn’t the primary key, but it functionally determines some of the attributes. Thus, there is redundancy.
39
BCNF – definition Relation R is in BCNF iff whenever A B is a non-trivial FD in R, then A is a superkey for R. I.e., each FD is trivial or “good”, not “bad”.
40
BCNF – example namessnphonecity Mary123-45-6789713-555-1234Houston Mary123-45-6789713-555-6543Houston Joe987-65-4321512-555-2121Austin Joe987-65-4321512-555-1234Austin Find a “bad” FD. {ssn} {name, city}
41
BCNF – example fixed Decompose table into sub-tables. namessncity Mary123-45-6789Houston Joe987-65-4321Austin ssnphone 123-45-6789713-555-1234 123-45-6789713-555-6543 987-65-4321512-555-2121 987-65-4321512-555-1234 {ssn} {name, city} Now a “good” FD.
42
BCNF decomposition BCNF_decomp(R) = Find attribute set X s.t. X + X (not trivial) and X + R (not superkey). If no such X, then return R. Let D = X + - X. (attributes functionally determined by X) Let N = R - X +. (attributes not functionally determined by X) Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ).
43
BCNF decomposition – example BCNF_decomp(R) = Find attribute set X s.t. X + X and X + R. If no such X, then return R. Let D = X + - X. Let N = R - X +. Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ). R(a,b,c,d,e) {a} {b, c} {c} {d} R(a,b,c,d,e) {a} {b, c} {c} {d} X = {a} X + = {a,b,c,d} X = {a} X + = {a,b,c,d} D = {b,c,d} N = {e} D = {b,c,d} N = {e} R 1 (a,b,c,d) R 2 (a,e) R 1 (a,b,c,d) R 2 (a,e)
44
BCNF decomposition – example BCNF_decomp(R) = Find attribute set X s.t. X + X and X + R. If no such X, then return R. Let D = X + - X. Let N = R - X +. Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ). R 1 (a,b,c,d) {a} {b, c} {c} {d} R 1 (a,b,c,d) {a} {b, c} {c} {d} X = {c} X + = {c,d} X = {c} X + = {c,d} D = {d} N = {a,b} D = {d} N = {a,b} R 11 (c,d) R 12 (a,b,c) R 11 (c,d) R 12 (a,b,c)
45
BCNF decomposition – example BCNF_decomp(R) = Find attribute set X s.t. X + X and X + R. If no such X, then return R. Let D = X + - X. Let N = R - X +. Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ). R 11 (c,d) {a} {b, c} {c} {d} R 11 (c,d) {a} {b, c} {c} {d} No such X.
46
BCNF decomposition – example BCNF_decomp(R) = Find attribute set X s.t. X + X and X + R. If no such X, then return R. Let D = X + - X. Let N = R - X +. Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ). R 12 (a,b,c) {a} {b, c} {c} {d} R 12 (a,b,c) {a} {b, c} {c} {d} No such X.
47
BCNF decomposition – example BCNF_decomp(R) = Find attribute set X s.t. X + X and X + R. If no such X, then return R. Let D = X + - X. Let N = R - X +. Decompose R into R 1 (X D) and R 2 (X N). Return BCNF_decomp(R 1 ), BCNF_decomp(R 2 ). R 2 (a,e) {a} {b, c} {c} {d} R 2 (a,e) {a} {b, c} {c} {d} No such X.
48
BCNF decomposition + keys – example R(a,b,c,d,e) {a} {b, c} {c} {d} R(a,b,c,d,e) {a} {b, c} {c} {d} BCNF R 11 (c,d) R 12 (a,b,c) R 2 (a,e) {a} {b, c} {c} {d} R 11 (c,d) R 12 (a,b,c) R 2 (a,e) {a} {b, c} {c} {d} keys R 11 (c,d) R 12 (a,b,c) R 2 (a,e) {a} {b, c} {c} {d} R 11 (c,d) R 12 (a,b,c) R 2 (a,e) {a} {b, c} {c} {d}
49
Activity – BCNF & keys 10b-bcnf.ipynb namessnphonecityzip Mary123-45-6789713-555-1234Houston77005 Mary123-45-6789713-555-6543Houston77005 Joe987-65-4321281-555-2121Houston77005 Joe987-65-4321281-555-1234Houston77005 {city} {zip} {ssn} {name, city}
50
Decompositions
51
BCNF decomposition – the good & the bad The good:Algorithm to detect & remove redundancies. Standard practice. The bad:Sometimes some subtle, undesirable side-effects.
52
Decompositions & joins R(A 1, …, A n, B 1, …, B m, C 1, …, C p ) R 1 (A 1, …, A n, B 1, …, B m ) R 2 (A 1, …, A n, C 1, …, C p ) R 1 (A 1, …, A n, B 1, …, B m ) R 2 (A 1, …, A n, C 1, …, C p ) Decomp. Join??? If decomposition is lossless, a join restores the original relation.
53
Decomposition & join – lossless example namepricecategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera namecategory GizmoGadget OneClickCamera GizmoCamera nameprice Gizmo19.99 OneClick24.99 Gizmo19.99 Decomp. Join
54
Decomposition & join – lossy example namepricecategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera namecategory GizmoGadget OneClickCamera GizmoCamera Decomp. Join pricecategory 19.99Gadget 24.99Camera 19.99Camera namepricecategory Gizmo19.99Gadget OneClick24.99Camera Gizmo24.99Camera OneClick19.99Camera Gizmo19.99Camera Loses the association between name and price.
55
BCNF decompositions is lossless R(A 1, …, A n, B 1, …, B m, C 1, …, C p ) R 1 (A 1, …, A n, B 1, …, B m ) R 2 (A 1, …, A n, C 1, …, C p ) R 1 (A 1, …, A n, B 1, …, B m ) R 2 (A 1, …, A n, C 1, …, C p ) Decomp. Join Holds by definition of BCNF decomposition algorithm. Lossless iff {A 1, …, A n } {B 1, …, B m } Don’t need {A 1, …, A n } {C 1, …, C p }
56
BCNF can lose FD information Decomp. R 1 (name, category) R 2 (name, company) {category, company} {name} {name} {company} R 1 (name, category) R 2 (name, company) {category, company} {name} {name} {company} Can’t enforce “nonlocal” FD. namecompanycategory GizmoGizmoWorksGadget GizmoPlusGizmoWorksGadget namecategory GizmoGadget GizmoPlusGadget namecompany GizmoGizmoWorks GizmoPlusGizmoWorks Join R(name, company, category) {category, company} {name} {name} {company} R(name, company, category) {category, company} {name} {name} {company} “Bad” FD Keys: {category, company}, {category, name}
57
Three solutions Accept the BCNF tradeoff between avoiding redundancy/anomalies and preserving FDs. BCNF is most common choice. Take extra steps to enforce these FDs. E.g., join tables and then check. Weaken decomposition so that no such lost FDs. E.g., 3NF.
58
3 rd normal form (3NF) BCNF: Relation R is in BCNF iff whenever A B is a non-trivial FD in R, then A is a superkey for R. 3NF: Relation R is in 3NF Iff whenever A B is a non-trivial FD in R, then either: A is a superkey for R, or Every element of B is part of a key.
59
3NF avoids losing FD information R(name, company, category) {category, company} {name} {name} {company} R(name, company, category) {category, company} {name} {name} {company} BCNF: “Bad” 3NF: “Good” because company part of a key. Keys: {category, company}, {category, name}
60
3NF allows some redundancies/anomalies namecompanycategory GizmoGizmoWorksGadget GizmoGizmoWorksCamera GizmoPlusGizmoWorksGadget GizmoPlusGizmoWorksCamera NewThingNULLGadget Insertion anomaly. Product not yet made by any company. Redundancy. Repeating product’s company.
61
More about 3NF 3NF is still lossless! Requires somewhat more complicated decomposition algorithm.
62
Glimpse beyond BCNF/3NF/FDs 4NF – multi-valued dependencies (generalizes FDs) 5NF & 6NF – join dependencies DKNF – only domain & key constraints
63
Multi-value dependencies FD:A BThe value in A determines a value for B. MVD:A ↠ BThe value in A determines a set of values for B. restaurantmenu_itemdelivery_area Papa John’sPizzaRice Village Papa John’sPizzaRice University Papa John’sPizzaSouthampton Domino’sPizzaRice Village Domino’sPizzaRice University Domino’sPastaRice Village Domino’sPastaRice University {restaurant} ↠ {menu_item} {restaurant} ↠ {delivery_area}
64
Normal form summary Constraints on data/tables to limit redundancy Decomposition strategy/algorithm to meet constraints Different normal forms for different trade-offs
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.