Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lectures 5 & 7: Design Theory Lectures 5 & 7. Announcements Homework #1 due today! Homework was not easy You learned a new, declarative way of programming!

Similar presentations


Presentation on theme: "Lectures 5 & 7: Design Theory Lectures 5 & 7. Announcements Homework #1 due today! Homework was not easy You learned a new, declarative way of programming!"— Presentation transcript:

1 Lectures 5 & 7: Design Theory Lectures 5 & 7

2 Announcements Homework #1 due today! Homework was not easy You learned a new, declarative way of programming! Hope: you got the concept, so you can pick up a book on SQL whenever you need. Some issues with iPython+SQLite versions. Ugh! You were great about this issues! Constructive questions and feedback! Thanks! You will never need iPython to submit homework Setting Expectations: cf. simply text assignments with no expected output. Searching on piazza is problem, we’ll try to do a better job aggregating posts… Lots of stuff was there, but hard to find! Thank you to those who aggregated posts! (Candy?) Lecture 5

3 Announcements #2 Office hours last night was filled with nice people! Queue management had a hiccup (our mistake?) Resulted in “fake” long queue times... Don’t be scared away! We need a better way to group and identify problems (Luke is on it! Thanks!) We will be better about triaging issues to get help and make groups quickly. Thanks to CAs who came in for extra time! Lecture 5

4 Announcements #3 How to use activity time: Review slides, ask questions, or do the activity! We designed these to cope with different backgrounds and pace requirements. If you don’t get the activity in class YOU ARE DOING FINE! Thank you! We appreciate the “thank you”s for new material! Be aggressive about giving feedback, we want you to have best class possible! There will be hiccups in new material, please start early. We’re trying to make class more fun and `teach you more material!!! Guest lecture on Thursday from GOOGLE. HAVE FUN! Lecture 5

5 Lecture 5: Design Theory I Lecture 5

6 Today’s Lecture 1.Normal forms & functional dependencies ACTIVITY: Finding FDs 2.Finding functional dependencies ACTIVITY: Compute the closures 3.Closures, superkeys & keys ACTIVITY: The key or a key? 6 Lecture 5

7 1. Normal forms & functional dependencies 7 Lecture 5 > Section 1

8 What you will learn about in this section 1.Overview of design theory & normal forms 2.Data anomalies & constraints 3.Functional dependencies 4.ACTIVITY: Finding FDs 8 Lecture 5 > Section 1

9 Design Theory Design theory is about how to represent your data to avoid anomalies. It is a mostly mechanical process Tools can carry out routine portions We have a notebook implementing all algorithms! We’ll play with it in the activities! Lecture 5 > Section 1 > Overview

10 Normal Forms 1 st Normal Form (1NF) = All tables are flat 2 nd Normal Form = disused Boyce-Codd Normal Form (BCNF) 3 rd Normal Form (3NF) 4 th and 5 th Normal Forms = see text books DB designs based on functional dependencies, intended to prevent data anomalies Our focus in this lecture + next one Lecture 5 > Section 1 > Overview

11 1 st Normal Form (1NF) StudentCourses Mary{CS145,CS229} Joe{CS145,CS106} …… Violates 1NF. 1NF Constraint: Types must be atomic! StudentCourses MaryCS145 MaryCS229 JoeCS145 JoeCS106 In 1 st NF Lecture 5 > Section 1 > Overview

12 Data Anomalies & Constraints Lecture 5 > Section 1 > Data anomalies & constraints

13 Constraints Prevent (some) Anomalies in the Data StudentCourseRoom MaryCS145B01 JoeCS145B01 SamCS145B01.. If every course is in only one room, contains redundant information! A poorly designed database causes anomalies: Lecture 5 > Section 1 > Data anomalies & constraints

14 Constraints Prevent (some) Anomalies in the Data StudentCourseRoom MaryCS145B01 JoeCS145C12 SamCS145B01.. If we update the room number for one tuple, we get inconsistent data = an update anomaly A poorly designed database causes anomalies: Lecture 5 > Section 1 > Data anomalies & constraints

15 Constraints Prevent (some) Anomalies in the Data StudentCourseRoom.. If everyone drops the class, we lose what room the class is in! = a delete anomaly A poorly designed database causes anomalies: Lecture 5 > Section 1 > Data anomalies & constraints

16 Constraints Prevent (some) Anomalies in the Data StudentCourseRoom MaryCS145B01 JoeCS145B01 SamCS145B01.. Similarly, we can’t reserve a room without students = an insert anomaly A poorly designed database causes anomalies: …CS229C12 Lecture 5 > Section 1 > Data anomalies & constraints

17 Constraints Prevent (some) Anomalies in the Data StudentCourse MaryCS145 JoeCS145 SamCS145.. CourseRoom CS145B01 CS229C12 Today: develop theory to understand why this design may be better and how to find this decomposition… Is this form better? Redundancy? Update anomaly? Delete anomaly? Insert anomaly? Is this form better? Redundancy? Update anomaly? Delete anomaly? Insert anomaly? Lecture 5 > Section 1 > Data anomalies & constraints

18 Functional Dependencies Lecture 5 > Section 1 > Functional dependencies

19 Functional Dependency A->B means that “whenever two tuples agree on A then they agree on B.” A->B means that “whenever two tuples agree on A then they agree on B.” Def: Let A,B be sets of attributes We write A  B or say A functionally determines B if, for any tuples t 1 and t 2 : t 1 [A] = t 2 [A] implies t 1 [B] = t 2 [B] and we call A  B a functional dependency Def: Let A,B be sets of attributes We write A  B or say A functionally determines B if, for any tuples t 1 and t 2 : t 1 [A] = t 2 [A] implies t 1 [B] = t 2 [B] and we call A  B a functional dependency Lecture 5 > Section 1 > Functional dependencies

20 A Picture Of FDs A1A1 …AmAm B1B1 …BnBn Defn (again): Given attribute sets A={A 1,…,A m } and B = {B 1,…B n } in R, Lecture 5 > Section 1 > Functional dependencies

21 A1A1 …AmAm B1B1 …BnBn A Picture Of FDs titi tjtj Defn (again): Given attribute sets A={A 1,…,A m } and B = {B 1,…B n } in R, The functional dependency A  B on R holds if for any t i,t j in R: Lecture 5 > Section 1 > Functional dependencies

22 A Picture Of FDs Defn (again): Given attribute sets A={A 1,…,A m } and B = {B 1,…B n } in R, The functional dependency A  B on R holds if for any t i,t j in R: t i [A 1 ] = t j [A 1 ] AND t i [A 2 ]=t j [A 2 ] AND … AND t i [A m ] = t j [A m ] A1A1 …AmAm B1B1 …BnBn titi tjtj If t1,t2 agree here.. Lecture 5 > Section 1 > Functional dependencies

23 A Picture Of FDs Defn (again): Given attribute sets A={A 1,…,A m } and B = {B 1,…B n } in R, The functional dependency A  B on R holds if for any t i,t j in R: if t i [A 1 ] = t j [A 1 ] AND t i [A 2 ]=t j [A 2 ] AND … AND t i [A m ] = t j [A m ] then t i [B 1 ] = t j [B 1 ] AND t i [B 2 ]=t j [B 2 ] AND … AND t i [B n ] = t j [B n ] A1A1 …AmAm B1B1 …BnBn titi tjtj If t1,t2 agree here.. …they also agree here! Lecture 5 > Section 1 > Functional dependencies

24 FDs for Relational Schema Design High-level idea: why do we care about FDs? 1.Start with some relational schema 2.Find out its functional dependencies (FDs) 3.Use these to design a better schema 1.One which minimizes the possibility of anomalies Lecture 5 > Section 1 > Functional dependencies

25 Functional Dependencies as Constraints StudentCourseRoom MaryCS145B01 JoeCS145B01 SamCS145B01.. Note: The FD {Course} -> {Room} holds on this instance A functional dependency is a form of constraint Holds on some instances not others. Part of the schema, helps define a valid instance. Lecture 5 > Section 1 Recall: an instance of a schema is a multiset of tuples conforming to that schema, i.e. a table

26 Functional Dependencies as Constraints StudentCourseRoom MaryCS145B01 JoeCS145B01 SamCS145B01.. However, cannot prove that the FD {Course} -> {Room} is part of the schema Note that: You can check if an FD is violated by examining a single instance; However, you cannot prove that an FD is part of the schema by examining a single instance. This would require checking every valid instance Lecture 5 > Section 1

27 27 More Examples An FD is a constraint which holds, or does not hold on an instance: EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Lecture 5 > Section 1 > Functional dependencies

28 28 {Position}  {Phone} EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike 9876  Salesrep E1111Smith 9876  Salesrep E9999Mary1234Lawyer More Examples Lecture 5 > Section 1 > Functional dependencies

29 29 EmpIDNamePhonePosition E0045Smith 1234  Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary 1234  Lawyer but not {Phone}  {Position} More Examples Lecture 5 > Section 1 > Functional dependencies

30 ACTIVITY 30 Lecture 5 > Section 1 > ACTIVITY ABCDE 12436 32518 14457 12436 32518 Find at least three FDs which hold on this instance: { }  { }

31 2. Finding functional dependencies 31 Lecture 5 > Section 2

32 What you will learn about in this section 1.“Good” vs. “Bad” FDs: Intuition 2.Finding FDs 3.Closures 4.ACTIVITY: Compute the closures 32 Lecture 5 > Section 2

33 33 “Good” vs. “Bad” FDs We can start to develop a notion of good vs. bad FDs: EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Intuitively: EmpID -> Name, Phone, Position is “good FD” Minimal redundancy, less possibility of anomalies Intuitively: EmpID -> Name, Phone, Position is “good FD” Minimal redundancy, less possibility of anomalies Lecture 5 > Section 2 > Good vs. Bad FDs

34 34 “Good” vs. “Bad” FDs We can start to develop a notion of good vs. bad FDs: EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Intuitively: EmpID -> Name, Phone, Position is “good FD” But Position -> Phone is a “bad FD” Redundancy! Possibility of data anomalies Intuitively: EmpID -> Name, Phone, Position is “good FD” But Position -> Phone is a “bad FD” Redundancy! Possibility of data anomalies Lecture 5 > Section 2 > Good vs. Bad FDs

35 StudentCourseRoom MaryCS145B01 JoeCS145B01 SamCS145B01.. Given a set of FDs (from user) our goal is to: 1.Find all FDs, and 2.Eliminate the “Bad Ones". Given a set of FDs (from user) our goal is to: 1.Find all FDs, and 2.Eliminate the “Bad Ones". Returning to our original example… can you see how the “bad FD” {Course} -> {Room} could lead to an: Update Anomaly Insert Anomaly Delete Anomaly … “Good” vs. “Bad” FDs Lecture 5 > Section 2 > Good vs. Bad FDs

36 FDs for Relational Schema Design High-level idea: why do we care about FDs? 1.Start with some relational schema 2.Find out its functional dependencies (FDs) 3.Use these to design a better schema 1.One which minimizes possibility of anomalies Lecture 5 > Section 2 > Finding FDs This part can be tricky!

37 Finding Functional Dependencies There can be a very large number of FDs… How to find them all efficiently? We can’t necessarily show that any FD will hold on all instances… How to do this? We will start with this problem: Given a set of FDs, F, what other FDs must hold? We will start with this problem: Given a set of FDs, F, what other FDs must hold? Lecture 5 > Section 2 > Finding FDs

38 Equivalent to asking: Given a set of FDs, F = {f 1,…f n }, does an FD g hold? Inference problem: How do we decide? Finding Functional Dependencies Lecture 5 > Section 2 > Finding FDs

39 Finding Functional Dependencies 1. {Name}  {Color} 2. {Category}  {Department} 3. {Color, Category}  {Price} 1. {Name}  {Color} 2. {Category}  {Department} 3. {Color, Category}  {Price} NameColorCategoryDepPrice GizmoGreenGadgetToys49 WidgetBlackGadgetToys59 GizmoGreenWhatsitGarden99 Which / how many other FDs do?!? Provided FDs:Products Given the provided FDs, we can see that {Name, Category}  {Price} must also hold on any instance… Example: Lecture 5 > Section 2 > Finding FDs

40 Equivalent to asking: Given a set of FDs, F = {f 1,…f n }, does an FD g hold? Inference problem: How do we decide? Answer: Three simple rules called Armstrong’s Rules. 1.Split/Combine, 2.Reduction, and 3.Transitivity… ideas by picture Answer: Three simple rules called Armstrong’s Rules. 1.Split/Combine, 2.Reduction, and 3.Transitivity… ideas by picture Finding Functional Dependencies Lecture 5 > Section 2 > Finding FDs

41 1. Split/Combine A1A1 …AmAm B1B1 …BnBn A 1, …, A m  B 1,…,B n Lecture 5 > Section 2 > Finding FDs

42 1. Split/Combine A1A1 …AmAm B1B1 …BnBn A 1, …, A m  B 1,…,B n … is equivalent to the following n FDs… A 1,…,A m  B i for i=1,…,n Lecture 5 > Section 2 > Finding FDs

43 1. Split/Combine A1A1 …AmAm B1B1 …BnBn A 1, …, A m  B 1,…,B n … is equivalent to … And vice-versa, A 1,…,A m  B i for i=1,…,n Lecture 5 > Section 2 > Finding FDs

44 Reduction/Trivial A1A1 …AmAm A 1,…,A m  A j for any j=1,…,m Lecture 5 > Section 2 > Finding FDs

45 3. Transitive Closure A1A1 …AmAm B1B1 …BnBn C1C1 …CkCk A 1, …, A m  B 1,…,B n and B 1,…,B n  C 1,…,C k Lecture 5 > Section 2 > Finding FDs

46 3. Transitive Closure A1A1 …AmAm B1B1 …BnBn C1C1 …CkCk A 1, …, A m  B 1,…,B n and B 1,…,B n  C 1,…,C k implies A 1,…,A m  C 1,…,C k Lecture 5 > Section 2 > Finding FDs

47 Finding Functional Dependencies 1. {Name}  {Color} 2. {Category}  {Department} 3. {Color, Category}  {Price} 1. {Name}  {Color} 2. {Category}  {Department} 3. {Color, Category}  {Price} NameColorCategoryDepPrice GizmoGreenGadgetToys49 WidgetBlackGadgetToys59 GizmoGreenWhatsitGarden99 Which / how many other FDs hold? Provided FDs:Products Example: Lecture 5 > Section 2 > Finding FDs

48 Finding Functional Dependencies 1. {Name}  {Color} 2. {Category}  {Dept.} 3. {Color, Category}  {Price} 1. {Name}  {Color} 2. {Category}  {Dept.} 3. {Color, Category}  {Price} Which / how many other FDs hold? Provided FDs:Inferred FDs: Example: Inferred FDRule used 4. {Name, Category} -> {Name}? 5. {Name, Category} -> {Color}? 6. {Name, Category} -> {Category}? 7. {Name, Category -> {Color, Category}? 8. {Name, Category} -> {Price}? Lecture 5 > Section 2 > Finding FDs

49 Finding Functional Dependencies 1. {Name}  {Color} 2. {Category}  {Dept.} 3. {Color, Category}  {Price} 1. {Name}  {Color} 2. {Category}  {Dept.} 3. {Color, Category}  {Price} Can we find an algorithmic way to do this? Provided FDs:Inferred FDs: Example: Inferred FDRule used 4. {Name, Category} -> {Name}Trivial 5. {Name, Category} -> {Color}Transitive (4 -> 1) 6. {Name, Category} -> {Category}Trivial 7. {Name, Category -> {Color, Category}Split/combine (5 + 6) 8. {Name, Category} -> {Price}Transitive (7 -> 3) Lecture 5 > Section 2 > Finding FDs

50 Closures Lecture 5 > Section 2 > Closures

51 51 Closure of a set of Attributes Given a set of attributes A 1, …, A n and a set of FDs F: Then the closure, {A 1, …, A n } + is the set of attributes B s.t. {A 1, …, A n }  B Given a set of attributes A 1, …, A n and a set of FDs F: Then the closure, {A 1, …, A n } + is the set of attributes B s.t. {A 1, …, A n }  B {name}  {color} {category}  {department} {color, category}  {price} {name}  {color} {category}  {department} {color, category}  {price} Example: F = Example Closures : {name} + = {name, color} {name, category} + = {name, category, color, dept, price} {color} + = {color} {name} + = {name, color} {name, category} + = {name, category, color, dept, price} {color} + = {color} Lecture 5 > Section 2 > Closures

52 52 Closure Algorithm Lecture 5 > Section 2 > Closures

53 53 Closure Algorithm {name}  {color} {category}  {dept} {color, category}  {price} {name}  {color} {category}  {dept} {color, category}  {price} F = {name, category} + = {name, category} {name, category} + = {name, category} Lecture 5 > Section 2 > Closures

54 54 Closure Algorithm {name}  {color} {category}  {dept} {color, category}  {price} {name}  {color} {category}  {dept} {color, category}  {price} F = {name, category} + = {name, category} {name, category} + = {name, category} {name, category} + = {name, category, color} {name, category} + = {name, category, color} Lecture 5 > Section 2 > Closures

55 55 Closure Algorithm {name}  {color} {category}  {dept} {color, category}  {price} {name}  {color} {category}  {dept} {color, category}  {price} F = {name, category} + = {name, category} {name, category} + = {name, category} {name, category} + = {name, category, color} {name, category} + = {name, category, color} {name, category} + = {name, category, color, dept} {name, category} + = {name, category, color, dept} Lecture 5 > Section 2 > Closures

56 56 Closure Algorithm F = {name, category} + = {name, category} {name, category} + = {name, category} {name, category} + = {name, category, color, dept, price} {name, category} + = {name, category, color, dept, price} {name, category} + = {name, category, color} {name, category} + = {name, category, color} {name, category} + = {name, category, color, dept} {name, category} + = {name, category, color, dept} {name}  {color} {category}  {dept} {color, category}  {price} {name}  {color} {category}  {dept} {color, category}  {price} Lecture 5 > Section 2 > Closures

57 Example 57 Compute {A,B} + = {A, B, } Compute {A, F} + = {A, F, } R(A,B,C,D,E,F) {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} Lecture 5 > Section 2 > Closures

58 Example 58 Compute {A,B} + = {A, B, C, D } Compute {A, F} + = {A, F, B } R(A,B,C,D,E,F) {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} Lecture 5 > Section 2 > Closures

59 Example 59 Compute {A,B} + = {A, B, C, D, E} Compute {A, F} + = {A, B, C, D, E, F} R(A,B,C,D,E,F) {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} {A,B}  {C} {A,D}  {E} {B}  {D} {A,F}  {B} Lecture 5 > Section 2 > Closures

60 Activity-5-2.ipynb 60 Lecture 5 > Section 2 > ACTIVITY

61 3. Closures, Superkeys & Keys 61 Lecture 5 > Section 3

62 What you will learn about in this section 1.Closures Pt. II 2.Superkeys & Keys 3.ACTIVITY: The key or a key? 62 Lecture 5 > Section 3

63 63 Why Do We Need the Closure? With closure we can find all FD’s easily To check if X  A 1.Compute X + 2.Check if A  X + Note here that X is a set of attributes, but A is a single attribute. Why does considering FDs of this form suffice? Recall the Split/combine rule: X  A 1, …, X  A n implies X  {A 1, …, A n } Recall the Split/combine rule: X  A 1, …, X  A n implies X  {A 1, …, A n } Lecture 5 > Section 3 > Closures Pt. II

64 64 Using Closure to Infer ALL FDs {A,B}  C {A,D}  B {B}  D {A,B}  C {A,D}  B {B}  D Example: Given F = Step 1: Compute X +, for every set of attributes X: {A} + = {A} {B} + = {B,D} {C} + = {C} {D} + = {D} {A,B} + = {A,B,C,D} {A,C} + = {A,C} {A,D} + = {A,B,C,D} {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D} {B,C,D} + = {B,C,D} {A,B,C,D} + = {A,B,C,D} {A} + = {A} {B} + = {B,D} {C} + = {C} {D} + = {D} {A,B} + = {A,B,C,D} {A,C} + = {A,C} {A,D} + = {A,B,C,D} {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D} {B,C,D} + = {B,C,D} {A,B,C,D} + = {A,B,C,D} No need to compute these- why? Lecture 5 > Section 3 > Closures Pt. II

65 65 Using Closure to Infer ALL FDs {A,B}  C {A,D}  B {B}  D {A,B}  C {A,D}  B {B}  D Example: Given F = Step 1: Compute X +, for every set of attributes X: {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Step 2: Enumerate all FDs X  Y, s.t. Y  X + and X  Y =  : {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} Lecture 5 > Section 3 > Closures Pt. II

66 66 Using Closure to Infer ALL FDs {A,B}  C {A,D}  B {B}  D {A,B}  C {A,D}  B {B}  D Example: Given F = {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Step 2: Enumerate all FDs X  Y, s.t. Y  X + and X  Y =  : {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} “Y is in the closure of X” Lecture 5 > Section 3 > Closures Pt. II Step 1: Compute X +, for every set of attributes X:

67 67 Using Closure to Infer ALL FDs {A,B}  C {A,D}  B {B}  D {A,B}  C {A,D}  B {B}  D Example: Given F = {A} + = {A}, {B} + = {B,D}, {C} + = {C}, {D} + = {D}, {A,B} + = {A,B,C,D}, {A,C} + = {A,C}, {A,D} + = {A,B,C,D}, {A,B,C} + = {A,B,D} + = {A,C,D} + = {A,B,C,D}, {B,C,D} + = {B,C,D}, {A,B,C,D} + = {A,B,C,D} Step 2: Enumerate all FDs X  Y, s.t. Y  X + and X  Y =  : {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} {A,B}  {C,D}, {A,D}  {B,C}, {A,B,C}  {D}, {A,B,D}  {C}, {A,C,D}  {B} The FD X  Y is non- trivial Lecture 5 > Section 3 > Closures Pt. II Step 1: Compute X +, for every set of attributes X:

68 Superkeys and Keys Lecture 5 > Section 3 > Superkeys & Keys

69 Keys and Superkeys A superkey is a set of attributes A 1, …, A n s.t. for any other attribute B in R, we have {A 1, …, A n }  B A superkey is a set of attributes A 1, …, A n s.t. for any other attribute B in R, we have {A 1, …, A n }  B A key is a minimal superkey I.e. all attributes are functionally determined by a superkey Meaning that no subset of a key is also a superkey Lecture 5 > Section 3 > Superkeys & Keys

70 Finding Keys and Superkeys For each set of attributes X 1.Compute X + 2.If X + = set of all attributes then X is a superkey 3.If X is minimal, then it is a key Do we need to check all sets of attributes? Which sets? Lecture 5 > Section 3 > Superkeys & Keys

71 Example of Finding Keys Product(name, price, category, color) {name, category}  price {category}  color {name, category}  price {category}  color What is a key? Lecture 5 > Section 3 > Superkeys & Keys

72 Example of Keys Product(name, price, category, color) {name, category}  price {category}  color {name, category}  price {category}  color Lecture 5 > Section 3 > Superkeys & Keys

73 Activity-5-3.ipynb 73 Lecture 5 > Section 3 > ACTIVITY

74 Lecture 7: Design Theory II Lecture 7

75 Today’s Lecture 1.PS#1 Review (via AJ RATNER!!!!) 2.Decompositions ACTIVITY 3.Online Course Feedback Give you some time in class to fill it out. Want feedback about new format (20+10/notebooks/heavy ps feedback). Whatever else we’re doing wrong 75 Lecture 7

76 PS1: What you learned This was a tough problem set- congratulations on doing so well! You used a declarative programming language (SQL) to do linear algebra answer questions about data do graph operations Cool stuff! However the point is not these specific applications… Less tricky versions of these same types of queries will be fair game for exams PS1 Recap

77 Linear Algebra, Declaratively Matrix multiplication & other operations = just joins! The shift from procedural to declarative programming PS1 Recap > Problem 1 C = [[0]*p for i in range(n)] for i in range(n): for j in range(p): for k in range(m): C[i][j] += A[i][k] * B[k][j] Proceed through a series of instructions SELECT A.i, B.j, SUM(A.x * B.x) FROM A, B WHERE A.j = B.i GROUP BY A.i, B.j; SELECT A.i, B.j, SUM(A.x * B.x) FROM A, B WHERE A.j = B.i GROUP BY A.i, B.j; Declare a desired output set

78 Common SQL Query Paradigms GROUP BY / HAVING + Aggregators + Nested queries PS1 Recap > Problem 2 SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; Think about order*! *of the semantics, not the actual execution Think about order*! *of the semantics, not the actual execution

79 Common SQL Query Paradigms GROUP BY / HAVING + Aggregators + Nested queries PS1 Recap > Problem 2 SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; GROUP BY SELECT Get the max precipitation by day

80 Common SQL Query Paradigms GROUP BY / HAVING + Aggregators + Nested queries PS1 Recap > Problem 2 SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; GROUP BY Get the station, day pairs where / when this happened SELECT WHERE Get the max precipitation by day

81 Common SQL Query Paradigms GROUP BY / HAVING + Aggregators + Nested queries PS1 Recap > Problem 2 SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; GROUP BY Get the station, day pairs where / when this happened SELECT WHERE Get the max precipitation by day GROUP BY Group by stations

82 Common SQL Query Paradigms GROUP BY / HAVING + Aggregators + Nested queries PS1 Recap > Problem 2 SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; SELECT station_id, COUNT(day) AS nbd FROM precipitation, (SELECT day, MAX(precip) FROM precipitation GROUP BY day) AS m WHERE day = m.day AND precip = m.precip GROUP BY station_id HAVING COUNT(day) > 1 ORDER BY nbd DESC; GROUP BY Get the station, day pairs where / when this happened SELECT WHERE Get the max precipitation by day GROUP BY Group by stations HAVING Having > 1 such day

83 Common SQL Query Paradigms Complex correlated queries PS1 Recap > Problem 2 SELECT x1.p AS median FROM x AS x1 WHERE (SELECT COUNT(*) FROM X AS x2 WHERE x2.p > x1.p) = (SELECT COUNT(*) FROM X AS x2 WHERE x2.p < x1.p); SELECT x1.p AS median FROM x AS x1 WHERE (SELECT COUNT(*) FROM X AS x2 WHERE x2.p > x1.p) = (SELECT COUNT(*) FROM X AS x2 WHERE x2.p < x1.p); This was a tricky problem- but good practice in thinking about things declaratively

84 Common SQL Query Paradigms Nesting + EXISTS / ANY / ALL PS1 Recap > Problem 2 SELECT sid, p3.precip FROM ( SELECT sid, precip FROM precipitation AS p1 WHERE precip > 0 AND NOT EXISTS ( SELECT p2.precip FROM precipitation AS p2 WHERE p2.sid = p1.sid AND p2.precip > 0 AND p2.precip < p1.precip)) AS p3 WHERE NOT EXISTS ( SELECT p4.precip FROM precipitation AS p4 WHERE p4.precip – 400 > p3.precip); SELECT sid, p3.precip FROM ( SELECT sid, precip FROM precipitation AS p1 WHERE precip > 0 AND NOT EXISTS ( SELECT p2.precip FROM precipitation AS p2 WHERE p2.sid = p1.sid AND p2.precip > 0 AND p2.precip < p1.precip)) AS p3 WHERE NOT EXISTS ( SELECT p4.precip FROM precipitation AS p4 WHERE p4.precip – 400 > p3.precip); More complex, but again just think about order!

85 Graph traversal & recursion PS1 Recap > Problem 3 A B For fixed-length paths SELECT A, B, d FROM edges SELECT A, B, d FROM edges

86 Graph traversal & recursion PS1 Recap > Problem 3 A B For fixed-length paths SELECT A, B, d FROM edges UNION SELECT e1.A, e2.B, e1.d + e2.d AS d FROM edges e1, edges e2 WHERE e1.B = e2.A AND e2.B <> e1.A SELECT A, B, d FROM edges UNION SELECT e1.A, e2.B, e1.d + e2.d AS d FROM edges e1, edges e2 WHERE e1.B = e2.A AND e2.B <> e1.A

87 Graph traversal & recursion PS1 Recap > Problem 3 A B For fixed-length paths SELECT A, B, d FROM edges UNION SELECT e1.A, e2.B, e1.d + e2.d AS d FROM edges e1, edges e2 WHERE e1.B = e2.A AND e2.B <> e1.A UNION SELECT e1.A, e3.B, e1.d + e2.d + e3.d AS d FROM edges e1, edges e2, edges e3 WHERE e1.B = e2.A AND e2.B = e3.A AND e2.B <> e1.A AND e3.B <> e2.A AMD e3.B <> e1.A SELECT A, B, d FROM edges UNION SELECT e1.A, e2.B, e1.d + e2.d AS d FROM edges e1, edges e2 WHERE e1.B = e2.A AND e2.B <> e1.A UNION SELECT e1.A, e3.B, e1.d + e2.d + e3.d AS d FROM edges e1, edges e2, edges e3 WHERE e1.B = e2.A AND e2.B = e3.A AND e2.B <> e1.A AND e3.B <> e2.A AMD e3.B <> e1.A

88 Graph traversal & recursion PS1 Recap > Problem 3 A B For variable-length paths on trees WITH RECURSIVE paths(a, b, b_prev, d) AS ( SELECT A, B, A FROM edges UNION SELECT p.a, e.B, e.A, p.d + e.d FROM paths p, edges e WHERE p.b = e.A AND s.B <> p.b_prev) SELECT a, b, MAX(d) FROM paths; WITH RECURSIVE paths(a, b, b_prev, d) AS ( SELECT A, B, A FROM edges UNION SELECT p.a, e.B, e.A, p.d + e.d FROM paths p, edges e WHERE p.b = e.A AND s.B <> p.b_prev) SELECT a, b, MAX(d) FROM paths;

89 Graph traversal & recursion PS1 Recap > Problem 3 B For variable-length paths on trees WITH RECURSIVE paths(a, b, b_prev, d) AS ( SELECT A, B, A FROM edges UNION SELECT p.a, e.B, e.A, p.d + e.d FROM paths p, edges e WHERE p.b = e.A AND e.B <> p.b_prev) SELECT a, b, MAX(d) FROM paths; WITH RECURSIVE paths(a, b, b_prev, d) AS ( SELECT A, B, A FROM edges UNION SELECT p.a, e.B, e.A, p.d + e.d FROM paths p, edges e WHERE p.b = e.A AND e.B <> p.b_prev) SELECT a, b, MAX(d) FROM paths; p.a p p.b p.b prev A

90 Today’s Lecture 1.Boyce-Codd Normal Form ACTIVITY 2.Decompositions & 3NF ACTIVITY 3.MVDs ACTIVITY 90 Lecture 7

91 1. Boyce-Codd Normal Form 91 Lecture 7 > Section 1

92 What you will learn about in this section 1.Conceptual Design 2.Boyce-Codd Normal Form 3.The BCNF Decomposition Algorithm 4.ACTIVITY 92 Lecture 7 > Section 1

93 Conceptual Design Lecture 7 > Section 1 > Conceptual Design

94 94 Back to Conceptual Design Now that we know how to find FDs, it’s a straight-forward process: 1.Search for “bad” FDs 2.If there are any, then keep decomposing the table into sub-tables until no more bad FDs 3.When done, the database schema is normalized Recall: there are several normal forms… Lecture 7 > Section 1 > Conceptual Design

95 Boyce-Codd Normal Form (BCNF) Main idea is that we define “good” and “bad” FDs as follows: X  A is a “good FD” if X is a (super)key In other words, if A is the set of all attributes X  A is a “bad FD” otherwise We will try to eliminate the “bad” FDs! Lecture 7 > Section 1 > BCNF

96 Boyce-Codd Normal Form (BCNF) Why does this definition of “good” and “bad” FDs make sense? If X is not a (super)key, it functionally determines some of the attributes Recall: this means there is redundancy And redundancy like this can lead to data anomalies! EmpIDNamePhonePosition E0045Smith1234Clerk E3542Mike9876Salesrep E1111Smith9876Salesrep E9999Mary1234Lawyer Lecture 7 > Section 1 > BCNF

97 97 Boyce-Codd Normal Form BCNF is a simple condition for removing anomalies from relations: In other words: there are no “bad” FDs A relation R is in BCNF if: if {A 1,..., A n }  B is a non-trivial FD in R then {A 1,..., A n } is a superkey for R A relation R is in BCNF if: if {A 1,..., A n }  B is a non-trivial FD in R then {A 1,..., A n } is a superkey for R Lecture 7 > Section 1 > BCNF

98 98 Example What is the key? {SSN, PhoneNumber} What is the key? {SSN, PhoneNumber} NameSSNPhoneNumberCity Fred123-45-6789206-555-1234Seattle Fred123-45-6789206-555-6543Seattle Joe987-65-4321908-555-2121Westfield Joe987-65-4321908-555-1234Westfield {SSN}  {Name,City} This FD is bad because it is not a superkey Lecture 7 > Section 1 > BCNF

99 99 Example NameSSNCity Fred123-45-6789Seattle Joe987-65-4321Madison SSNPhoneNumber 123-45-6789206-555-1234 123-45-6789206-555-6543 987-65-4321908-555-2121 987-65-4321908-555-1234 Let’s check anomalies: Redundancy ? Update ? Delete ? Let’s check anomalies: Redundancy ? Update ? Delete ? {SSN}  {Name,City} Now in BCNF! This FD is now good because it is the key Lecture 7 > Section 1 > BCNF

100 100 BCNF Decomposition Algorithm BCNFDecomp(R): Find X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Lecture 7 > Section 1 > BCNF

101 101 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Find a set of attributes X which has non-trivial “bad” FDs, i.e. is not a superkey, using closures Lecture 7 > Section 1 > BCNF

102 102 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) If no “bad” FDs found, in BCNF! Lecture 7 > Section 1 > BCNF

103 103 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R1(X  Y) and R2(X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) Let Y be the attributes that X functionally determines (+ that are not in X) And let Z be the other attributes that it doesn’t Let Y be the attributes that X functionally determines (+ that are not in X) And let Z be the other attributes that it doesn’t Lecture 7 > Section 1 > BCNF

104 104 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) XZY R1R1 R2R2 Split into one relation (table) with X plus the attributes that X determines (Y)… Lecture 7 > Section 1 > BCNF

105 105 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R1), BCNFDecomp(R2) XZY R1R1 R2R2 And one relation with X plus the attributes it does not determine (Z) Lecture 7 > Section 1 > BCNF

106 106 BCNF Decomposition Algorithm BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) Proceed recursively until no more “bad” FDs! Lecture 7 > Section 1 > BCNF

107 R(A,B,C,D,E) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) BCNFDecomp(R): Find a set of attributes X s.t.: X + ≠ X and X + ≠ [all attributes] if (not found) then Return R let Y = X + - X, Z = (X + ) C decompose R into R 1 (X  Y) and R 2 (X  Z) Return BCNFDecomp(R 1 ), BCNFDecomp(R 2 ) Example {A}  {B,C} {C}  {D} {A}  {B,C} {C}  {D} Lecture 7 > Section 1 > BCNF

108 108 Example R(A,B,C,D,E) {A} + = {A,B,C,D} ≠ {A,B,C,D,E} R 1 (A,B,C,D) {C} + = {C,D} ≠ {A,B,C,D} R 2 (A,E)R 11 (C,D) R 12 (A,B,C) R(A,B,C,D,E) {A}  {B,C} {C}  {D} {A}  {B,C} {C}  {D} Lecture 7 > Section 1 > BCNF

109 Activity-7-1.ipynb 109 Lecture 7 > Section 1 > ACTIVITY

110 2. Decompositions 110 Lecture 7 > Section 2

111 What you will learn about in this section 1.Lossy & Lossless Decompositions 2.ACTIVITY 111 Lecture 7 > Section 2

112 Recap: Decompose to remove redundancies 1.We saw that redundancies in the data (“bad FDs”) can lead to data anomalies 2.We developed mechanisms to detect and remove redundancies by decomposing tables into BCNF 1.BCNF decomposition is standard practice- very powerful & widely used! 3.However, sometimes decompositions can lead to more subtle unwanted effects… 112 Lecture 7 > Section 2 > Decompositions When does this happen?

113 113 Decompositions in General R 1 = the projection of R on A 1,..., A n, B 1,..., B m R(A 1,...,A n,B 1,...,B m,C 1,...,C p ) R 1 (A 1,...,A n,B 1,...,B m ) R 2 (A 1,...,A n,C 1,...,C p ) Lecture 7 > Section 2 > Decompositions R 2 = the projection of R on A 1,..., A n, C 1,..., C p

114 114 Theory of Decomposition NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NamePrice Gizmo19.99 OneClick24.99 Gizmo19.99 NameCategory GizmoGadget OneClickCamera GizmoCamera I.e. it is a Lossless decomposition Sometimes a decomposition is “correct” Lecture 7 > Section 2 > Decompositions

115 115 Lossy Decomposition NamePriceCategory Gizmo19.99Gadget OneClick24.99Camera Gizmo19.99Camera NameCategory GizmoGadget OneClickCamera GizmoCamera PriceCategory 19.99Gadget 24.99Camera 19.99Camera What’s wrong here? However sometimes it isn’t Lecture 7 > Section 2 > Decompositions

116 Lossless Decompositions What (set) relationship holds between R1 Join R2 and R if lossless? Hint: Which tuples of R will be present? It’s lossless if we have equality! R(A 1,...,A n,B 1,...,B m,C 1,...,C p ) R 1 (A 1,...,A n,B 1,...,B m ) R 2 (A 1,...,A n,C 1,...,C p ) Lecture 7 > Section 2 > Decompositions

117 Lossless Decompositions A decomposition R to (R1, R2) is lossless if R = R1 Join R2 R(A 1,...,A n,B 1,...,B m,C 1,...,C p ) R 1 (A 1,...,A n,B 1,...,B m ) R 2 (A 1,...,A n,C 1,...,C p ) Lecture 7 > Section 2 > Decompositions

118 Lossless Decompositions 118 BCNF decomposition is always lossless. Why? Note: don’t need {A 1,..., A n }  {C 1,..., C p } Note: don’t need {A 1,..., A n }  {C 1,..., C p } If {A 1,..., A n }  {B 1,..., B m } Then the decomposition is lossless If {A 1,..., A n }  {B 1,..., B m } Then the decomposition is lossless R(A 1,...,A n,B 1,...,B m,C 1,...,C p ) R 1 (A 1,...,A n,B 1,...,B m ) R 2 (A 1,...,A n,C 1,...,C p ) Lecture 7 > Section 2 > Decompositions

119 A problem with BCNF Note: This is historically inaccurate, but it makes it easier to explain Problem: To enforce a FD, must reconstruct original relation—on each insert! Lecture 7 > Section 2 > Decompositions

120 120 A Problem with BCNF {Unit}  {Company} {Company,Product}  {Unit} We do a BCNF decomposition on a “bad” FD: {Unit} + = {Unit, Company} We lose the FD {Company,Product}  {Unit} !! UnitCompanyProduct ……… UnitCompany …… UnitProduct …… {Unit}  {Company} Lecture 7 > Section 2 > Decompositions

121 121 So Why is that a Problem? No problem so far. All local FD’s are satisfied. UnitCompany Galaga99UW BingoUW UnitProduct Galaga99Databases BingoDatabases UnitCompanyProduct Galaga99UWDatabases BingoUWDatabases Let’s put all the data back into a single table again: {Unit}  {Company} Violates the FD {Company,Product}  {Unit} !! Lecture 7 > Section 2 > Decompositions

122 122 The Problem We started with a table R and FDs F We decomposed R into BCNF tables R 1, R 2, … with their own FDs F 1, F 2, … We insert some tuples into each of the relations—which satisfy their local FDs but when reconstruct it violates some FD across tables! Practical Problem: To enforce FD, must reconstruct R—on each insert! Lecture 7 > Section 2 > Decompositions

123 123 Possible Solutions Various ways to handle so that decompositions are all lossless / no FDs lost For example 3NF- stop short of full BCNF decompositions. See Bonus Activity! Usually a tradeoff between redundancy / data anomalies and FD preservation… BCNF still most common- with additional steps to keep track of lost FDs… Lecture 7 > Section 2 > Decompositions

124 Activity-7-2.ipynb 124 Lecture 7 > Section 2 > ACTIVITY

125 3. MVDs 125 Lecture 7 > Section 3

126 What you will learn about in this section 1.MVDs 2.ACTIVITY 126 Lecture 7 > Section 3

127 Multiple Value Dependencies (MVDs) MVD Ex: For each fixed course (e.g. CS145), Every staff member in that course and every student in that course occur in a tuple in that table. CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli Write: Course ↠ Staff or Course ↠ Student Lecture 7 > Section 3 > MVDs

128 Formal Definition of MVD CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli t1t1 t2t2 t3t3 Course ↠ Staff Lecture 7 > Section 3 > MVDs

129 Formal Definition of MVD CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli t1t1 t2t2 Course ↠ Staff A Lecture 7 > Section 3 > MVDs

130 Formal Definition of MVD CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli t1t1 t2t2 t3t3 Course ↠ Staff A Lecture 7 > Section 3 > MVDs

131 Formal Definition of MVD CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli t1t1 t2t2 t3t3 Course ↠ Staff A B Lecture 7 > Section 3 > MVDs

132 Formal Definition of MVD CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasDeb CS145FirasEli t1t1 t2t2 t3t3 Course ↠ Staff A B C Lecture 7 > Section 3 > MVDs

133 Does Course ↠ Staff hold now? CourseStaffStudent CS949AmyBob CS145ChrisDeb CS145ChrisEli CS145FirasEli Lecture 7 > Section 3 > MVDs

134 Connection to FDs If A  B does A ↠ B ? Hint: sort of like multiplying by one… Lecture 7 > Section 3 > MVDs

135 Comments on MVDs MVDs have “rules” too! Experts: Axiomatizable 4 th Normal Form is “non-trivial MVD” For AI nerds: MVD is conditional independence in graphical models! Lecture 7 > Section 3 > MVDs

136 Activity-7-3.ipynb 136 Lecture 7 > Section 3 > ACTIVITY

137 Summary Constraints allow one to reason about redundancy in the data Normal forms describe how to remove this redundancy by decomposing relations Elegant—by representing data appropriately certain errors are essentially impossible For FDs, BCNF is the normal form. A tradeoff for insert performance: 3NF Lectures 5,7 > SUMMARY

138 Feedback! https://ctleval.stanford.edu/auth/evaluation.php?id=9451 We’re trying a new format (the notebooks, the 20+activity lecture format). Constructive feedback (of any aspect) is really appreciated! We’ve tried to be responsive (when we can get out of our own way!)


Download ppt "Lectures 5 & 7: Design Theory Lectures 5 & 7. Announcements Homework #1 due today! Homework was not easy You learned a new, declarative way of programming!"

Similar presentations


Ads by Google