
1 NORMALIZATION
Joe Meehean

2 REDUNDANCIES
- Repeated data in database
  - wastes space
  - can cause modification anomalies
    - unexpected side effects when changing data
    - make building software on top of the DB difficult
- Normalization
  - process of removing redundancies

3 MODIFICATION ANOMALIES
- Insert anomaly
  - extra data must be known to insert a row into a table
- Update anomaly
  - must change multiple rows to modify a single fact
- Deletion anomaly
  - deleting a row causes other data to be deleted
  - deletes more data than is necessary or desired

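4 BAD COLLEGE DATABASE
- All data in 1 table

| StdNo | First Name | Last Name | Offer No | Term | Year | Grade | Course No | Course Descr. |
|-------|------------|-----------|----------|--------|------|-------|-----------|---------------|
| S1 | Phil | Park | O1 | Fall | 2011 | C- | C1 | DB |
| S1 | Phil | Park | O2 | Fall | 2011 | B+ | C2 | OS |
| S2 | Emily | Blem | O3 | Spring | 2012 | A+ | C3 | PL |
| S2 | Emily | Blem | O2 | Fall | 2011 | B+ | C2 | OS |
| S3 | Roger | Cook | O4 | Spring | 2014 | --- | C1 | DB |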

5 BAD COLLEGE DATABASE
- Insert anomaly
  - adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
  - cannot add Rush as a student until he enrolls
- (table as on slide 4)

6 BAD COLLEGE DATABASE
- Update anomaly
  - if Emily changes her name to Emma, multiple rows need to change
- (table as on slide 4)

7 BAD COLLEGE DATABASE
- Delete anomaly
  - if Roger drops out of college and we delete him
  - we also delete the fact that there is an offering of DB in the spring
- (table as on slide 4)
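A minimal Python sketch of the update and deletion anomalies, using rows from the single-table design on slide 4. The dictionary keys (`StdNo`, `OfferNo`, and so on) are illustrative spellings of the slide's column names, and only the columns needed here are included.

```python
# Rows of the single-table design (subset of columns, names are illustrative).
rows = [
    {"StdNo": "S1", "OfferNo": "O1", "Term": "Fall", "Year": 2011, "CourseNo": "C1"},
    {"StdNo": "S1", "OfferNo": "O2", "Term": "Fall", "Year": 2011, "CourseNo": "C2"},
    {"StdNo": "S2", "OfferNo": "O3", "Term": "Spring", "Year": 2012, "CourseNo": "C3"},
    {"StdNo": "S2", "OfferNo": "O2", "Term": "Fall", "Year": 2011, "CourseNo": "C2"},
    {"StdNo": "S3", "OfferNo": "O4", "Term": "Spring", "Year": 2014, "CourseNo": "C1"},
]


def offerings(table):
    """Return the set of offering numbers the table still records."""
    return {row["OfferNo"] for row in table}


# Update anomaly: a single fact about student S2 lives in several rows.
rows_to_update = [row for row in rows if row["StdNo"] == "S2"]

# Deletion anomaly: removing student S3 also removes the only record of O4.
after_delete = [row for row in rows if row["StdNo"] != "S3"]
lost_offerings = offerings(rows) - offerings(after_delete)

print(len(rows_to_update), lost_offerings)
```

In a normalized design the offering would live in its own table, so deleting a student could not remove it.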

8 FUNCTIONAL DEPENDENCIES (FDs)
- Constraint between 2 or more columns
- Represented by →
- X determines Y (X → Y) if there exists at most 1 value of Y for each value of X
  - like a mathematical function f(x) = y
  - left-hand side (LHS) is called the determinant
  - e.g., StdNo determines student first name: StdNo → First Name
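The "at most 1 value of Y for each value of X" rule can be checked against sample rows. The sketch below is illustrative only: `fd_holds` and the column spellings are hypothetical names, and sample data can only refute an FD, never prove that it holds as a business rule.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault stores the first Y seen for this X and returns it afterwards.
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O1"},
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O2"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O2"},
]

std_determines_name = fd_holds(students, ["StdNo"], ["FirstName"])
std_determines_offer = fd_holds(students, ["StdNo"], ["OfferNo"])
```

Here `StdNo → FirstName` is consistent with the rows, while `StdNo → OfferNo` is refuted because S1 appears with two different offerings.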

9 ORGANIZING FDs
- Make a list
  - can condense the list by listing all dependent columns for a given determinant
  - e.g., StdNo → First Name, Last Name
- Determinants should be minimal
  - least # of columns required to determine values of other columns
  - e.g., StdNo, First Name → Last Name is not minimal, because StdNo alone determines Last Name
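One way to test whether a determinant is minimal is to drop one column at a time and see whether the remaining columns still determine the right-hand side. The sketch below assumes the FDs are already known and uses the standard attribute-closure idea; `closure` and `minimize_lhs` are hypothetical helper names, not part of the slides.

```python
def closure(attrs, fds):
    """Return every column determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimize_lhs(lhs, rhs, fds):
    """Drop columns from lhs that are not needed to determine rhs."""
    lhs = list(lhs)
    for col in list(lhs):
        reduced = [c for c in lhs if c != col]
        if reduced and set(rhs) <= closure(reduced, fds):
            lhs = reduced
    return lhs


fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("StdNo", "FirstName"), ("LastName",)),
]
minimal = minimize_lhs(["StdNo", "FirstName"], ["LastName"], fds)
```

For the example above, `FirstName` is dropped and `StdNo` alone remains as the determinant.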

10 BAD COLLEGE DATABASE
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
- StdNo, Offer No → Grade
- (table as on slide 4)

11 IDENTIFYING FDs
- From business narrative
- Look for words like "unique"
  - e.g., "Each student has a unique student number, a first name, and a last name."
- Look for 1-M relationships
  - child (M-side) is the determinant (LHS)
  - e.g., "Faculty teach many offerings."
  - e.g., Offer No → Faculty Id

12 IDENTIFYING FDs
- From relational tables
- FDs where the determinant (LHS) is not the PK or a candidate key
  - recall, a candidate key is the column(s) that uniquely identify a row
  - e.g., Zip → State
- Combined PKs
  - does 1 column of the key determine the values of some other columns?
  - e.g., StdNo → First Name, Last Name

13 QUESTIONS?

14 NORMAL FORMS
- Normalization
  - removes redundancies in tables
  - removes modification anomalies
  - makes data easier to modify
- Normal form
  - rules about which functional dependencies (FDs) are allowed
  - each successive normal form removes FDs

15 NORMAL FORMS
- 1NF
- 2NF
- 3NF/BCNF

16 1ST NORMAL FORM
- All relational tables are already in 1NF by definition

17 2ND NORMAL FORM
- Key columns
  - columns that are part (or all) of a candidate key
  - recall, a candidate key is a key that uniquely identifies a row
- Non-key columns
  - columns that are not part of a candidate key

18 2ND NORMAL FORM
- A table is in 2NF if each non-key column depends on the whole of every candidate key
  - NOT on any subset of any candidate key
  - check functional dependencies (FDs)
- A 2NF violation
  - an FD where part of a key determines a non-key column
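The 2NF rule can be sketched as a check over a list of FDs. This is a simplified illustration that assumes a single candidate key (`StdNo`, `OfferNo`) and uses the FDs listed on slide 10; `find_2nf_violations` is a hypothetical helper name.

```python
def find_2nf_violations(candidate_key, fds):
    """Return FDs whose LHS is a proper subset of the key and whose RHS has non-key columns."""
    key = set(candidate_key)
    violations = []
    for lhs, rhs in fds:
        non_key_rhs = set(rhs) - key
        # "<" on sets means proper subset: part of the key, but not all of it.
        if set(lhs) < key and non_key_rhs:
            violations.append((tuple(lhs), tuple(sorted(non_key_rhs))))
    return violations


college_fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo", "CourseDescr")),
    (("StdNo", "OfferNo"), ("Grade",)),
]
violations = find_2nf_violations(("StdNo", "OfferNo"), college_fds)
```

The first two FDs are reported because each is determined by only part of the key; the FD for `Grade` is not, because its determinant is the whole key.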

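19 2ND NORMAL FORM
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.

| StdNo | First Name | Last Name | Offer No | Term | Year | Grade | Course No | Course Descr. |
|-------|------------|-----------|----------|--------|------|-------|-----------|---------------|
| S1 | Phil | Park | O1 | Spring | 2012 | -- | C1 | PL |
| S1 | Phil | Park | O2 | Fall | 2011 | B+ | C2 | DB |
| S2 | Emily | Blem | O3 | Spring | 2012 | -- | C3 | OS |
| S2 | Emily | Blem | O2 | Fall | 2011 | B+ | C2 | DB |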

20 3RD NORMAL FORM
- A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
  - NOT on other non-key columns
  - e.g., Course No → Course Descr. is a dependency on a non-key column
- 3NF violation
  - a non-key column on the right-hand side (RHS) AND
  - anything other than a candidate key on the LHS

21 3RD NORMAL FORM
- 3NF prohibits transitive dependencies
- Transitive dependencies
  - if A → B & B → C, then A → C
  - e.g., Offer No → Course No & Course No → Course Descr.
  - then Offer No → Course Descr.
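The transitivity rule can be illustrated by chaining single-column FDs until nothing new appears. This is a small sketch, not a general FD inference engine: it assumes each FD has exactly one column on each side, and `transitive_fds` is a hypothetical helper name.

```python
def transitive_fds(fds):
    """Given single-column FDs as (a, b) pairs, return the FDs implied by transitivity."""
    known = set(fds)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                # a -> b and b -> d together imply a -> d.
                if b == c and a != d and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known - set(fds)


given = {("OfferNo", "CourseNo"), ("CourseNo", "CourseDescr")}
derived = transitive_fds(given)
```

For the two FDs from the slide, the only derived dependency is `OfferNo → CourseDescr`.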

22 COMBINED 2NF & 3NF
- A table is in 3NF if each non-key column depends on
  - all candidate keys
  - whole candidate keys
  - and nothing but candidate keys

23 3RD NORMAL FORM
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.
- 3NF violations
  - Course No → Course Descr.
  - Offer No → Course Descr.
- (table as on slide 19)

24 BOYCE-CODD NORMAL FORM (BCNF)
- Revised, simpler version of 3NF
- Covers additional special cases
- A table is in BCNF if every determinant is a candidate key
- Violations are easy to detect
  - determinant (LHS) is not a candidate key
  - e.g., StdNo → Last Name

25 BOYCE-CODD NORMAL FORM (BCNF)
- Excludes 2 redundancies that 3NF does not
  1. part of a key determines part of a key
  2. a non-key determines part of a key

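26 BOYCE-CODD NORMAL FORM (BCNF)

| StdNo | OfferNo | Email | EnrGrade |
|-------|---------|---------------|----------|
| S1 | O1 | blem@fake.edu | 3.5 |
| S1 | O2 | blem@fake.edu | 3.6 |
| S2 | O1 | rush@fake.edu | 3.8 |
| S2 | O3 | rush@fake.edu | 3.5 |

- BCNF violations
  - Email → StdNo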
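The BCNF test ("every determinant is a candidate key") can be sketched as a check over known FDs. The candidate keys and the FD for `EnrGrade` below are assumptions made for illustration; only `Email → StdNo` is stated on the slide. The check accepts any determinant that contains a candidate key, and `find_bcnf_violations` is a hypothetical helper name.

```python
def find_bcnf_violations(candidate_keys, fds):
    """Return non-trivial FDs whose determinant does not contain any candidate key."""
    keys = [set(key) for key in candidate_keys]
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, never a violation
        if not any(key <= set(lhs) for key in keys):
            violations.append((lhs, rhs))
    return violations


# Assumed candidate keys for the enrollment table on the slide.
candidate_keys = [("StdNo", "OfferNo"), ("Email", "OfferNo")]
enroll_fds = [
    (("StdNo", "OfferNo"), ("EnrGrade",)),  # assumed FD
    (("Email",), ("StdNo",)),               # FD listed on the slide
]
bcnf_violations = find_bcnf_violations(candidate_keys, enroll_fds)
```

Only `Email → StdNo` is reported: `Email` alone is not a candidate key, yet it determines part of one.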

27 SIMPLE SYNTHESIS (BCNF)
- Convert tables into BCNF
  1. Eliminate extraneous columns from the LHS of FDs
  2. Remove derived (transitive) FDs
  3. Arrange FDs into groups by determinant
  4. For each FD group, make a table with the determinant as primary key
  5. Merge tables where one table includes all columns of the other table
     - choose the PK of one of the tables to be the PK of the new table
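Steps 3 to 5 of the synthesis procedure can be sketched as follows, assuming steps 1 and 2 have already produced a minimal FD list (here the college FDs with the derived FD removed). The function name `synthesize` and the table representation are hypothetical, and the merge step simply keeps the larger table and its PK, which is one of the choices the procedure allows.

```python
from collections import defaultdict


def synthesize(fds):
    """Group FDs by determinant, build one table per group, and merge contained tables."""
    groups = defaultdict(set)
    for lhs, rhs in fds:
        groups[tuple(sorted(lhs))] |= set(rhs)  # step 3: group by determinant

    # Step 4: one table per group, determinant as primary key.
    tables = [{"pk": pk, "columns": set(pk) | cols} for pk, cols in groups.items()]

    # Step 5: skip a table when all of its columns already appear in a kept table.
    merged = []
    for table in sorted(tables, key=lambda t: len(t["columns"]), reverse=True):
        if not any(table["columns"] <= kept["columns"] for kept in merged):
            merged.append(table)
    return merged


minimal_fds = [
    (("StdNo",), ("FirstName",)),
    (("StdNo",), ("LastName",)),
    (("OfferNo",), ("Term",)),
    (("OfferNo",), ("Year",)),
    (("OfferNo",), ("CourseNo",)),
    (("StdNo", "OfferNo"), ("Grade",)),
    (("CourseNo",), ("CourseDescr",)),
]
tables = synthesize(minimal_fds)
```

For this input the result is four tables, keyed by `StdNo`, `OfferNo`, (`OfferNo`, `StdNo`), and `CourseNo`.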

28 BAD COLLEGE DATABASE (1)
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.
- (table as on slide 19)

29 BAD COLLEGE DATABASE (2)
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.
- (table as on slide 19)

30 BAD COLLEGE DATABASE (3)
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No
- StdNo, Offer No → Grade
- Course No → Course Descr.
- (table as on slide 19)

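31 BAD COLLEGE DATABASE (4)

| StdNo | First Name | Last Name |
|-------|------------|-----------|
| S1 | Phil | Park |
| S2 | Emily | Blem |

| Offer No | Term | Year | Course No |
|----------|--------|------|-----------|
| O1 | Spring | 2012 | C1 |
| O2 | Fall | 2011 | C2 |
| O3 | Spring | 2012 | C3 |

| StdNo | OfferNo | Grade |
|-------|---------|-------|
| S1 | O1 | -- |
| S1 | O2 | B+ |
| S2 | O3 | -- |
| S2 | O2 | B+ |

| Course No | Course Descr. |
|-----------|---------------|
| C1 | PL |
| C2 | DB |
| C3 | OS |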

32 BAD COLLEGE DATABASE (5)
- (tables as on slide 31)

33 IMPORTANCE OF NORMAL FORM VIOLATIONS
- We have the BCNF synthesis process
  - we can just make BCNF tables
  - why do we care about detecting NF violations?
- DBA has 2 jobs
  - make new databases
  - maintain old ones
- Making new DBs
  - requires using the BCNF synthesis process
- Maintaining old DBs
  - requires detecting NF violations
  - perhaps made by other employees
  - detecting violations narrows the scope of DB redesign

34 QUESTIONS?

35 4TH NORMAL FORM (4NF)
- M-way relationships
  - associative entity types (weak entities)
  - multiple associations
  - primary key made of FKs from 3 or more tables
- often represent important documents
  - glue multiple things together
  - e.g., invoice
- can sometimes contain redundancies

36 4TH NORMAL FORM (4NF)
[Diagram: entity types Student (StdNo, Name), Offering (OfferNo, Location), and Textbook (TextNo, TextTitle), associated through Enroll]

37 4TH NORMAL FORM (4NF)
Enroll Table

| StdNo | OfferNo | TextNo |
|-------|---------|--------|
| S1 | O1 | T1 |
| S1 | O2 | T2 |
| S1 | O1 | T2 |
| S1 | O2 | T3 |

38 MULTIVALUED DEPENDENCIES (MVDs)
- Given table R with columns X, Y, and Z
- X →→ Y
  - each X maps to a set of Ys (between 1 and M)
- X →→ Z
  - each X maps to a set of Zs (between 1 and M)
- Y & Z are independent
  - knowing Y doesn't tell you anything about Z and vice versa
  - neither Y →→ Z nor Y → Z holds
  - neither Z →→ Y nor Z → Y holds
  - likewise, Y,V →→ Z does not hold unless V →→ Z
- Every FD is an MVD
  - not every MVD is an FD

39 TRIVIAL MVDs
- MVD X →→ Y is trivial if
  - Y is a subset of X, OR
  - X and Y are the only columns in the table, OR
  - X → Y and X → Z
- e.g., has-job table (Employee#, Position#)
  - E# →→ P#
- e.g., offering table (Course Number, Section #, Faculty ID)
  - C#, S# →→ #S

40 MULTIVALUED DEPENDENCIES (MVDs)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn't change the textbooks

| OfferNo | StudentNo | TextNo |
|---------|-----------|---------|
| CS242A | Phil | |
| CS242A | Emily | |
| CS242A | | Drozdek |
| CS242A | | Weiss |

41 MULTIVALUED DEPENDENCIES (MVDs)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn't change the textbooks

| OfferNo | StudentNo | TextNo |
|---------|-----------|---------|
| CS242A | Phil | Weiss |
| CS242A | Emily | Drozdek |
| CS242A | Phil | Drozdek |
| CS242A | Emily | Weiss |
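An MVD X →→ Y can be checked on sample rows: for each X value, every observed Y value must appear with every observed Z value, where Z is all remaining columns. The sketch below is illustrative; `mvd_holds` is a hypothetical helper, the first table uses the enroll rows above, and the second uses a student/offering/grade table in which grades depend on the offering.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Return True if x ->> y is consistent with rows (a list of dicts)."""
    if not rows:
        return True
    z = [col for col in rows[0] if col not in x and col not in y]
    groups = {}
    for row in rows:
        xv = tuple(row[col] for col in x)
        yv = tuple(row[col] for col in y)
        zv = tuple(row[col] for col in z)
        ys, zs, pairs = groups.setdefault(xv, (set(), set(), set()))
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Independence of Y and Z means the pairs form the full cross product.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


enroll = [
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Weiss"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Weiss"},
]
grades = [
    {"Student": "Phil", "Offering": "CS242A", "Grade": "A"},
    {"Student": "Phil", "Offering": "CS370A", "Grade": "B"},
    {"Student": "Emily", "Offering": "CS242A", "Grade": "B"},
    {"Student": "Emily", "Offering": "CS370A", "Grade": "A"},
]

enroll_mvd = mvd_holds(enroll, ["OfferNo"], ["StudentNo"])
grades_mvd = mvd_holds(grades, ["Student"], ["Offering"])
```

The enroll table satisfies `OfferNo →→ StudentNo` because students and textbooks are independent; the grades table does not satisfy `Student →→ Offering` because a grade is tied to a specific offering.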

42 4TH NORMAL FORM (4NF)
- 4th normal form
  - table in BCNF AND
  - all MVDs are trivial
- Detecting a violation
  - are there any MVDs?
  - are those MVDs non-trivial?

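43 4TH NORMAL FORM (4NF)
- Resolving violations
  - X →→ Y
  - X →→ Z

| X | Y | Z |
|----|----|----|
| X1 | Y1 | Z1 |
| X1 | Y2 | Z2 |
| X1 | Y2 | Z1 |
| X1 | Y1 | Z2 |

| X | Y |
|----|----|
| X1 | Y1 |
| X1 | Y2 |

| X | Z |
|----|----|
| X1 | Z1 |
| X1 | Z2 |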
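The decomposition above loses no information: joining the two smaller tables on X reproduces the original rows. A minimal sketch using the X/Y/Z values from the slide, with rows held as plain tuples:

```python
# Original 3-column table with the non-trivial MVDs X ->> Y and X ->> Z.
rows = {
    ("X1", "Y1", "Z1"),
    ("X1", "Y2", "Z2"),
    ("X1", "Y2", "Z1"),
    ("X1", "Y1", "Z2"),
}

# Split into two 2-column tables by projecting onto (X, Y) and (X, Z).
xy = {(x, y) for x, y, _ in rows}
xz = {(x, z) for x, _, z in rows}

# Natural join on X rebuilds the original table.
rejoined = {(x, y, z) for x, y in xy for x2, z in xz if x == x2}

print(len(rows), len(xy) + len(xz), rejoined == rows)
```

Four rows of three columns become two tables of two rows each, and the join still yields the same four combinations.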

44 MORE EXAMPLES

| Student | Offering | Grade |
|---------|----------|-------|
| Phil | CS242A | A |
| Phil | CS370A | B |
| Emily | CS242A | B |
| Emily | CS370A | A |

- S →→ O & S →→ G ?
- O →→ G & O →→ S ?
- G →→ S & G →→ O ?

45 MORE EXAMPLES
- S →→ O & S →→ G ?
  - Offering and Grade not independent
- O →→ G & O →→ S ?
  - Grade and Student not independent
- G →→ S & G →→ O ?
  - Student and Offering not independent
- (table as on slide 44)

46 MORE EXAMPLES
- B →→ E & B →→ C
- Is this a trivial MVD?

| Bank Branch | Employee | Customer |
|-------------|----------|----------|
| B3 | Ann | Ted |
| B3 | Terry | Alfred |
| B3 | Ann | Alfred |
| B3 | Terry | Ted |

47 MORE EXAMPLES
- B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - neither B → E nor B → C holds
  - No
- (table as on slide 46)

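48 MORE EXAMPLES

| Bank Branch | Employee | Customer |
|-------------|----------|----------|
| B3 | Ann | Ted |
| B3 | Terry | Alfred |
| B3 | Ann | Alfred |
| B3 | Terry | Ted |

| Bank Branch | Employee |
|-------------|----------|
| B3 | Ann |
| B3 | Terry |

| Bank Branch | Customer |
|-------------|----------|
| B3 | Ted |
| B3 | Alfred |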

49 QUESTIONS?

50

| Part# | PQty | PDesc |
|-------|------|-------------|
| P1 | 2 | 5mm bolt |
| P2 | 4 | 10mm nut |
| P3 | 2 | 5mm wrench |
| P4 | 4 | 8mm washer |

- PQty →→ PDesc & PQty →→ Part# ?

51

| Loc # | Item | Managers |
|-------|----------------|----------|
| L1 | XBox 360 250GB | Cindy |
| L1 | Garmin GPS | Aaron |
| L1 | XBox 360 250GB | Aaron |
| L1 | Garmin GPS | Cindy |

52 EXTRA 4NF SLIDES

53 4TH NORMAL FORM (4NF)
- Relationship independence
  - 2 relationships are independent if one cannot be derived from the other
  - knowing one relationship tells you nothing about the other

54 4TH NORMAL FORM (4NF)
Enroll Table

| StdNo | OfferNo | TextNo |
|-------|---------|--------|
| S1 | O1 | T1 |
| S1 | O2 | T2 |
| S1 | O1 | T2 |
| S1 | O2 | T3 |

- 3 relationships
  - StdNo -- OfferNo
  - StdNo -- TextNo
  - OfferNo -- TextNo

55 4TH NORMAL FORM (4NF)
- StdNo -- OfferNo cannot be derived from the other 2
  - StdNo -- TextNo & TextNo -- OfferNo
  - same textbook can be used for 2 offerings
- OfferNo -- TextNo cannot be derived from the other 2
  - OfferNo -- StdNo & StdNo -- TextNo
  - students use many textbooks, not all related to this offering
- StdNo -- TextNo can be derived
  - StdNo -- OfferNo & OfferNo -- TextNo
  - offering number gives the set of texts a student needs

56 4TH NORMAL FORM (4NF)
- Multivalued Dependencies (MVDs)
  - each X can map to a set of Ys and a set of Zs
  - generalization of functional dependencies
    - each X maps to one Y
    - each X maps to one Z
  - represented by X →→ Y | Z
- every FD is an MVD
  - known as a trivial MVD
- not every MVD is an FD

57 4TH NORMAL FORM (4NF)
- M-way tables sometimes introduce MVDs
  - X →→ Y
  - X →→ Z
- X →→ Y | Z
  - Y and Z are independent
  - relationship X--Y is independent of relationship X--Z
- Not all M-way tables produce MVDs

58 4TH NORMAL FORM (4NF)
- MVD table redundancies
  - assume X1 maps to Y1 & Y2 and X1 maps to Z1 & Z2

| X | Y | Z |
|----|----|----|
| X1 | Y1 | |
| X1 | Y2 | |
| X1 | | Z1 |
| X1 | | Z2 |

59 4TH NORMAL FORM (4NF)
- Need to fill in the rest of the table

| X | Y | Z |
|----|----|----|
| X1 | Y1 | Z1 |
| X1 | Y2 | Z2 |
| X1 | Y2 | Z1 |
| X1 | Y1 | Z2 |

60 4TH NORMAL FORM (4NF)
- Rows below the line exist because relationship Y--Z can be derived from relationships X--Y & X--Z
- Rows below the line are redundant

| X | Y | Z |
|----|----|----|
| X1 | Y1 | Z1 |
| X1 | Y2 | Z2 |
| X1 | Y2 | Z1 |
| X1 | Y1 | Z2 |

61 4TH NORMAL FORM (4NF)
Enroll Table

| OfferNo | StdNo | TextNo |
|---------|-------|--------|
| O1 | S1 | T1 |
| O1 | S2 | T2 |
| O1 | S2 | T1 |
| O1 | S1 | T2 |

- OfferNo →→ StdNo | TextNo
  - offerings map to many students
  - offerings can have many textbooks
- Rows below the line are redundant

62 4TH NORMAL FORM (4NF)
- 4NF definition
  - tables cannot contain any non-trivial MVDs
- Resolving 4NF violations
  - for each table with a non-trivial MVD
  - split the 3-column table into two 2-column tables
  - A,B,C goes to A,B & A,C

| StdNo | OfferNo |
|-------|---------|
| S1 | O1 |
| S1 | O2 |

| OfferNo | TextNo |
|---------|--------|
| O1 | T1 |
| O1 | T2 |
| O2 | T1 |
| O2 | T3 |

