Published by Jovani Carle. Modified over 9 years ago.
1
Normalization
Joe Meehean
2
Redundancies
- Repeated data in a database
- Wastes space
- Can cause modification anomalies
  - unexpected side effects when changing data
  - make building software on top of the DB difficult
- Normalization
  - process of removing redundancies
3
Modification Anomalies
- Insert anomaly
  - extra data must be known to insert a row into a table
- Update anomaly
  - must change multiple rows to modify a single fact
- Deletion anomaly
  - deleting a row causes other data to be deleted
  - deletes more data than is necessary or desired
4
Bad College Database
- All data in 1 table

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Fall    2011  C-     C1         DB
S1     Phil        Park       O2        Fall    2011  B+     C2         OS
S2     Emily       Blem       O3        Spring  2012  A+     C3         PL
S2     Emily       Blem       O2        Fall    2011  B+     C2         OS
S3     Roger       Cook       O4        Spring  2014  ---    C1         DB
5
Bad College Database
- Insert anomaly
  - adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
  - cannot add Rush as a student until he enrolls

(Table repeated from slide 4.)
6
Bad College Database
- Update anomaly
  - if Emily changes her name to Emma
  - need to change multiple rows

(Table repeated from slide 4.)
7
Bad College Database
- Delete anomaly
  - if Roger drops out of college and we delete him
  - we also delete the fact that there is an offering of DB in the spring

(Table repeated from slide 4.)
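The update and delete anomalies can be reproduced with a short sketch using Python's built-in sqlite3 module. The table name `college` and the snake_case column names are illustrative choices, not taken from the slides, and `None` stands in for the missing grade shown as "---".

```python
import sqlite3

# Rows modeled on the slides' single-table design (illustrative column names).
ROWS = [
    ("S1", "Phil", "Park", "O1", "Fall", 2011, "C-", "C1", "DB"),
    ("S1", "Phil", "Park", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S2", "Emily", "Blem", "O3", "Spring", 2012, "A+", "C3", "PL"),
    ("S2", "Emily", "Blem", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S3", "Roger", "Cook", "O4", "Spring", 2014, None, "C1", "DB"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE college (std_no TEXT, first_name TEXT, last_name TEXT, "
    "offer_no TEXT, term TEXT, year INTEGER, grade TEXT, "
    "course_no TEXT, course_descr TEXT)"
)
conn.executemany("INSERT INTO college VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Update anomaly: one fact (Emily's first name) is stored in several rows.
rows_changed = conn.execute(
    "UPDATE college SET first_name = 'Emma' WHERE std_no = 'S2'"
).rowcount

# Delete anomaly: removing Roger also removes the only record of offering O4.
conn.execute("DELETE FROM college WHERE std_no = 'S3'")
o4_rows = conn.execute(
    "SELECT COUNT(*) FROM college WHERE offer_no = 'O4'"
).fetchone()[0]

print(rows_changed, o4_rows)
```

With these sample rows the name change touches two rows, and no row mentioning offering O4 survives the delete.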
8
Functional Dependencies (FDs)
- Constraint between 2 or more columns
- Represented by →
- X determines Y (X → Y)
  - if there exists at most 1 value of Y for each value of X
  - like a mathematical function f(x) = y
  - left-hand side (LHS) is called the determinant
- e.g., StdNo determines student first name
  - StdNo → First Name
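The "at most 1 value of Y for each value of X" rule can be tested against sample rows. The helper name `fd_holds` and the dictionary row layout below are assumptions made for illustration. Sample data can only refute an FD; an FD that happens to hold in a few rows still has to be confirmed against the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few columns of sample rows modeled on the slides' table.
SAMPLE = [
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O1", "Grade": "C-"},
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O2", "Grade": "B+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3", "Grade": "A+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O2", "Grade": "B+"},
]

print(fd_holds(SAMPLE, ["StdNo"], ["FirstName"]))         # StdNo → First Name
print(fd_holds(SAMPLE, ["StdNo"], ["Grade"]))             # StdNo → Grade
print(fd_holds(SAMPLE, ["StdNo", "OfferNo"], ["Grade"]))  # StdNo, OfferNo → Grade
```

StdNo alone fails to determine Grade because S1 has two different grades, while the pair (StdNo, OfferNo) is consistent with the sample.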
9
Organizing FDs
- Make a list
  - can condense the list by listing all dependent columns for a given determinant
  - e.g., StdNo → First Name, Last Name
- Determinants should be minimal
  - least # of columns required to determine values of other columns
  - e.g., StdNo, First Name → Last Name is not minimal, because StdNo alone determines Last Name
10
Bad College Database
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
- StdNo, Offer No → Grade

(Table repeated from slide 4.)
11
Identifying FDs
- From business narrative
  - Look for words like unique
    - e.g., “Each student has a unique student number, a first name, and a last name.”
  - Look for 1-M relationships
    - child (M-side) is the determinant (LHS)
    - e.g., “Faculty teach many offerings.”
    - e.g., Offer No → Faculty Id
12
Identifying FDs
- From relational tables
  - FDs where the determinant (LHS) is not the PK or a candidate key
    - recall, a candidate key is column(s) that uniquely identify a row
    - e.g., Zip → State
  - Combined PKs
    - does 1 column determine values of some other columns?
    - e.g., StdNo → First Name, Last Name
13
Questions?
14
Normal Forms
- Normalization
  - removes redundancies in tables
  - removes modification anomalies
  - makes data easier to modify
- Normal form
  - rules about which functional dependencies (FDs) are allowed
  - each successive normal form allows fewer kinds of FDs
15
Normal Forms
(Diagram: 1NF, 2NF, 3NF/BCNF)
16
1st Normal Form
- All relational tables are already in 1NF by definition
17
2nd Normal Form
- Key columns
  - columns that are part (or all) of a candidate key
  - recall, a candidate key is a set of columns that uniquely identifies a row
- Non-key columns
  - columns that are not part of a candidate key
18
2nd Normal Form
- A table is in 2NF if each non-key column depends on whole candidate keys
  - NOT on any proper subset of any candidate key
  - check functional dependencies (FDs)
- A 2NF violation
  - an FD where part of a key determines a non-key column
19
2nd Normal Form
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.

StdNo  First Name  Last Name  Offer No  Term    Year  Grade  Course No  Course Descr.
S1     Phil        Park       O1        Spring  2012  --     C1         PL
S1     Phil        Park       O2        Fall    2011  B+     C2         DB
S2     Emily       Blem       O3        Spring  2012  --     C3         OS
S2     Emily       Blem       O2        Fall    2011  B+     C2         DB
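The 2NF check (part of a key determining a non-key column) can be sketched as a small predicate. The sketch assumes (StdNo, OfferNo) is the only candidate key of the single table, which the slides imply but do not state outright. The function name `violates_2nf` is hypothetical.

```python
# Assumption: (StdNo, OfferNo) is the only candidate key of the big table.
CANDIDATE_KEYS = [frozenset({"StdNo", "OfferNo"})]

# FDs as (determinant, dependent columns) pairs, following the slides' list.
FDS = [
    (frozenset({"StdNo"}), frozenset({"FirstName", "LastName"})),
    (frozenset({"OfferNo"}), frozenset({"Term", "Year", "CourseNo", "CourseDescr"})),
    (frozenset({"StdNo", "OfferNo"}), frozenset({"Grade"})),
]


def violates_2nf(lhs, rhs, candidate_keys):
    """True if a proper subset of some candidate key determines a non-key column."""
    key_columns = set().union(*candidate_keys)
    determines_nonkey = bool(set(rhs) - key_columns)
    partial_key = any(set(lhs) < set(key) for key in candidate_keys)
    return determines_nonkey and partial_key


violations = [(lhs, rhs) for lhs, rhs in FDS if violates_2nf(lhs, rhs, CANDIDATE_KEYS)]
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under that assumption the two FDs with StdNo or OfferNo alone on the left are flagged, and the FD with the whole key on the left is not.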
20
3rd Normal Form
- A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
  - NOT on other non-key columns
  - e.g., Course No → Course Descr. is not allowed
- 3NF violation
  - a non-key column on the right-hand side (RHS) AND
  - anything other than a candidate key on the LHS
21
3rd Normal Form
- 3NF prohibits transitive dependencies
- Transitive dependencies
  - if A → B & B → C, then A → C
  - e.g., Offer No → Course No & Course No → Course Descr.
  - then Offer No → Course Descr.
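Transitive dependencies can be found mechanically by computing an attribute closure: start from a set of columns and keep applying FDs until nothing new is added. This is the standard closure algorithm rather than anything specific to the slides, and the function name `closure` is an illustrative choice.

```python
def closure(attrs, fds):
    """Return every column determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole determinant is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


FDS = [
    ({"OfferNo"}, {"CourseNo"}),
    ({"CourseNo"}, {"CourseDescr"}),
]

print(sorted(closure({"OfferNo"}, FDS)))
```

CourseDescr appears in the closure of OfferNo even though no listed FD has OfferNo directly determining it, which is exactly the transitive dependency Offer No → Course Descr.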
22
Combined 2NF & 3NF
- A table is in 3NF if each non-key column depends on
  - all candidate keys
  - whole candidate keys
  - and nothing but candidate keys
23
3rd Normal Form
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.
- 3NF violations
  - Course No → Course Descr.
  - Offer No → Course Descr.

(Table repeated from slide 19.)
24
Boyce-Codd Normal Form (BCNF)
- Revised, simpler version of 3NF
- Covers additional special cases
- A table is in BCNF if every determinant is a candidate key
- Violations are easy to detect
  - determinant (LHS) is not a candidate key
  - e.g., StdNo → Last Name
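One way to test "every determinant is a candidate key" is to check that each determinant's closure covers the whole table. Strictly, that tests for a superkey; for minimal determinants the two notions coincide. The sketch below uses the three FDs listed earlier for the single-table college database; the names `closure` and `is_superkey` are illustrative.

```python
def closure(attrs, fds):
    """Return every column determined by `attrs` under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, fds, all_columns):
    """A set of columns is a superkey if it determines every column."""
    return closure(attrs, fds) >= set(all_columns)


ALL_COLUMNS = {
    "StdNo", "FirstName", "LastName", "OfferNo", "Term",
    "Year", "Grade", "CourseNo", "CourseDescr",
}
FDS = [
    ({"StdNo"}, {"FirstName", "LastName"}),
    ({"OfferNo"}, {"Term", "Year", "CourseNo", "CourseDescr"}),
    ({"StdNo", "OfferNo"}, {"Grade"}),
]

# Determinants that do not determine the whole table violate BCNF.
bcnf_violations = [lhs for lhs, _ in FDS if not is_superkey(lhs, FDS, ALL_COLUMNS)]
print(bcnf_violations)
```

StdNo and OfferNo are each reported as violating determinants, while the pair (StdNo, OfferNo) determines every column and passes.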
25
Boyce-Codd Normal Form (BCNF)
- Excludes 2 redundancies that 3NF does not
  1. part of a key determines part of a key
  2. a non-key determines part of a key
26
Boyce-Codd Normal Form (BCNF)

StdNo  OfferNo  Email          EnrGrade
S1     O1       blem@fake.edu  3.5
S1     O2       blem@fake.edu  3.6
S2     O1       rush@fake.edu  3.8
S2     O3       rush@fake.edu  3.5

- BCNF violations
  - Email → StdNo
27
Simple Synthesis (BCNF)
- Convert tables into BCNF
  1. Eliminate extraneous columns from the LHS of FDs
  2. Remove derived (transitive) FDs
  3. Arrange FDs into groups by determinant
  4. For each FD group, make a table with the determinant as primary key
  5. Merge tables where one table includes all columns of the other table
     - choose the PK of one of the tables to be the PK of the new table
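The five steps can be sketched in code as follows. This is a simplified illustration under stated assumptions, not a definitive implementation: FDs are written with a single column on the right-hand side, all function names are hypothetical, and step 5 simply drops a table whose columns are all contained in another table, keeping the larger table's primary key.

```python
def closure(attrs, fds):
    """Columns determined by `attrs`; each FD is (frozenset lhs, single rhs column)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def minimize_lhs(fds):
    """Step 1: drop determinant columns that are not needed to reach the RHS."""
    out = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        for col in sorted(lhs):
            reduced = lhs - {col}
            if reduced and rhs in closure(reduced, fds):
                lhs = reduced
        out.append((frozenset(lhs), rhs))
    return list(dict.fromkeys(out))  # remove duplicates, keep order


def remove_derived(fds):
    """Step 2: drop any FD that the remaining FDs already imply."""
    kept = list(fds)
    for fd in list(fds):
        others = [f for f in kept if f != fd]
        if fd[1] in closure(fd[0], others):
            kept = others
    return kept


def synthesize(fds):
    fds = remove_derived(minimize_lhs(fds))
    groups = {}  # Step 3: group FDs by determinant
    for lhs, rhs in fds:
        groups.setdefault(lhs, set()).add(rhs)
    # Step 4: one table per determinant, with the determinant as primary key
    tables = [{"pk": lhs, "columns": set(lhs) | cols} for lhs, cols in groups.items()]
    merged = []  # Step 5: absorb tables fully contained in a larger table
    for table in sorted(tables, key=lambda t: len(t["columns"]), reverse=True):
        if not any(table["columns"] <= m["columns"] for m in merged):
            merged.append(table)
    return merged


FDS = [
    (frozenset({"StdNo"}), "FirstName"),
    (frozenset({"StdNo"}), "LastName"),
    (frozenset({"OfferNo"}), "Term"),
    (frozenset({"OfferNo"}), "Year"),
    (frozenset({"OfferNo"}), "CourseNo"),
    (frozenset({"OfferNo"}), "CourseDescr"),
    (frozenset({"StdNo", "OfferNo"}), "Grade"),
    (frozenset({"CourseNo"}), "CourseDescr"),
]

for table in synthesize(FDS):
    print(sorted(table["pk"]), sorted(table["columns"]))
```

For the college FDs this yields four tables keyed by StdNo, OfferNo, (StdNo, OfferNo), and CourseNo. A production version would also keep the absorbed table's key as a unique constraint when step 5 merges tables.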
28
Bad College Database (1)
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.

(Table repeated from slide 19.)
29
Bad College Database (2)
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr. (derived from Offer No → Course No and Course No → Course Descr., so it is removed)
- StdNo, Offer No → Grade
- Course No → Course Descr.

(Table repeated from slide 19.)
30
Bad College Database (3)
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No
- StdNo, Offer No → Grade
- Course No → Course Descr.

(Table repeated from slide 19.)
31
Bad College Database (4)

StdNo  First Name  Last Name
S1     Phil        Park
S2     Emily       Blem

Offer No  Term    Year  Course No
O1        Spring  2012  C1
O2        Fall    2011  C2
O3        Spring  2012  C3

StdNo  Offer No  Grade
S1     O1        --
S1     O2        B+
S2     O3        --
S2     O2        B+

Course No  Course Descr.
C1         PL
C2         DB
C3         OS
32
Bad College Database (5)
- No table contains all the columns of another table, so nothing is merged

(Tables repeated from slide 31.)
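The four resulting tables can be written as a schema and exercised against the three anomalies. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names are illustrative, student numbers follow the single-table slides (S1 is Phil Park), the student number 'S3' for Rush Daniels is made up for the example, and `None` stands in for a missing grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student  (std_no TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE course   (course_no TEXT PRIMARY KEY, course_descr TEXT);
CREATE TABLE offering (offer_no TEXT PRIMARY KEY, term TEXT, year INTEGER,
                       course_no TEXT REFERENCES course(course_no));
CREATE TABLE enroll   (std_no TEXT REFERENCES student(std_no),
                       offer_no TEXT REFERENCES offering(offer_no),
                       grade TEXT,
                       PRIMARY KEY (std_no, offer_no));
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("S1", "Phil", "Park"), ("S2", "Emily", "Blem")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("C1", "PL"), ("C2", "DB"), ("C3", "OS")])
conn.executemany("INSERT INTO offering VALUES (?, ?, ?, ?)",
                 [("O1", "Spring", 2012, "C1"), ("O2", "Fall", 2011, "C2"),
                  ("O3", "Spring", 2012, "C3")])
conn.executemany("INSERT INTO enroll VALUES (?, ?, ?)",
                 [("S1", "O1", None), ("S1", "O2", "B+"),
                  ("S2", "O3", None), ("S2", "O2", "B+")])

# Insert: a student can be added before enrolling in anything.
conn.execute("INSERT INTO student VALUES ('S3', 'Rush', 'Daniels')")
student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

# Update: a name change touches exactly one row.
name_rows = conn.execute(
    "UPDATE student SET first_name = 'Emma' WHERE std_no = 'S2'"
).rowcount

# Delete: dropping an enrollment leaves the offering itself in place.
conn.execute("DELETE FROM enroll WHERE std_no = 'S2' AND offer_no = 'O3'")
o3_rows = conn.execute(
    "SELECT COUNT(*) FROM offering WHERE offer_no = 'O3'"
).fetchone()[0]

print(student_count, name_rows, o3_rows)
```

Each fact now lives in one row of one table, so the insert, update, and delete operations no longer have side effects on unrelated facts.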
33
Importance of Normal Form Violations
- We have the BCNF synthesis process
  - we can just make BCNF tables
  - why do we care about detecting NF violations?
- DBA has 2 jobs
  - make new databases
  - maintain old ones
- Making new DBs requires using the BCNF synthesis process
- Maintaining old DBs requires detecting NF violations
  - perhaps made by other employees
  - detecting violations narrows the scope of DB redesign
34
Questions?
35
4th Normal Form (4NF)
- M-way relationships
  - associative entity types (weak entities)
  - multiple associations
  - primary key made of FKs from 3 or more tables
  - often represent important documents
    - glue multiple things together
    - e.g., invoice
  - can sometimes contain redundancies
36
4th Normal Form (4NF)
(Diagram: Enroll relates Student (StdNo, Name), Offering (OfferNo, Location), and Textbook (TextNo, TextTitle).)
37
4th Normal Form (4NF)

Enroll Table
StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3
38
Multivalued Dependencies (MVDs)
- Given table R with columns X, Y, and Z
  - X →→ Y
    - each X maps to a set of Ys (between 1 and M)
  - X →→ Z
    - each X maps to a set of Zs (between 1 and M)
  - Y & Z are independent
    - knowing Y doesn't tell you anything about Z and vice-versa
    - Y ↛↛ Z & Y ↛ Z
    - Z ↛↛ Y & Z ↛ Y
    - also Y,V ↛↛ Z, unless V →→ Z
- Every FD is an MVD
  - not every MVD is an FD
39
Trivial MVDs
- MVD X →→ Y is trivial if
  - Y is a subset of X OR
  - X and Y are the only columns in the table OR
  - X → Y and X → Z
- e.g., has-job table (Employee#, Position#)
  - E# →→ P#
- e.g., offering table (Course Number, Section #, Faculty ID)
  - C#, S# →→ S#
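The first two triviality conditions depend only on column names, so they can be checked without looking at any data. The sketch below covers just those two conditions; the third (X → Y and X → Z) needs the FD list and is left out. The function name and the column label `FacultyID` are illustrative.

```python
def is_trivial_mvd(x, y, table_columns):
    """True if Y is a subset of X, or X and Y together are all the table's columns."""
    x, y, table_columns = set(x), set(y), set(table_columns)
    return y <= x or (x | y) == table_columns


# has-job table: E# and P# are the only columns.
print(is_trivial_mvd({"E#"}, {"P#"}, {"E#", "P#"}))
# offering table: the right-hand side S# is a subset of the left-hand side.
print(is_trivial_mvd({"C#", "S#"}, {"S#"}, {"C#", "S#", "FacultyID"}))
# a three-column table where neither structural condition applies.
print(is_trivial_mvd({"O#"}, {"S#"}, {"O#", "S#", "T#"}))
```

The first two calls report trivial MVDs; the third does not meet either structural condition.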
40
Multivalued Dependencies (MVDs)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil
CS242A   Emily
CS242A              Drozdek
CS242A              Weiss
41
Multivalued Dependencies (MVDs)
- non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# independent of T#
  - if Emily drops 242 it doesn't change the textbooks

OfferNo  StudentNo  TextNo
CS242A   Phil       Weiss
CS242A   Emily      Drozdek
CS242A   Phil       Drozdek
CS242A   Emily      Weiss
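Whether an MVD holds in a set of rows can be checked directly: for each X value, the (Y, Z) pairs present must be every combination of that X's Y values with its Z values. The sketch below uses the standard definition; the function name `mvd_holds` and the dictionary row layout are illustrative. As with FDs, sample rows can refute an MVD but cannot prove that it is a real business rule.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """True if x →→ y holds in `rows` (a non-empty list of dicts with the same keys)."""
    z = [c for c in rows[0] if c not in x and c not in y]
    groups = {}
    for row in rows:
        x_val = tuple(row[c] for c in x)
        pair = (tuple(row[c] for c in y), tuple(row[c] for c in z))
        groups.setdefault(x_val, set()).add(pair)
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # Independence means every Y value appears with every Z value.
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True


ENROLL = [
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Weiss"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Weiss"},
]

print(mvd_holds(ENROLL, ["OfferNo"], ["StudentNo"]))
# With one combination missing, students and texts are no longer independent.
print(mvd_holds(ENROLL[:3], ["OfferNo"], ["StudentNo"]))
```

The full table satisfies O# →→ S#; removing a single row breaks it, which shows why all four redundant-looking rows must be stored while the three columns stay in one table.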
42
4th Normal Form (4NF)
- 4th normal form
  - table is in BCNF AND
  - all MVDs are trivial
- Detecting a violation
  - are there any MVDs?
  - are those MVDs non-trivial?
43
4th Normal Form (4NF)
- Resolving violations
  - X →→ Y
  - X →→ Z

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2

is split into

X   Y
X1  Y1
X1  Y2

X   Z
X1  Z1
X1  Z2
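The split loses no information: joining the two smaller tables on X reproduces the original rows. A minimal sketch with rows as plain tuples (the variable names are illustrative):

```python
# Original three-column table as (X, Y, Z) tuples.
R = {
    ("X1", "Y1", "Z1"),
    ("X1", "Y2", "Z2"),
    ("X1", "Y2", "Z1"),
    ("X1", "Y1", "Z2"),
}

# Project onto (X, Y) and (X, Z); sets drop the duplicate rows.
XY = {(x, y) for x, y, _ in R}
XZ = {(x, z) for x, _, z in R}

# Natural join on X rebuilds every combination.
rejoined = {(x, y, z) for x, y in XY for x2, z in XZ if x == x2}

print(len(R), len(XY), len(XZ), rejoined == R)
```

Four three-column rows become two plus two two-column rows, and the join gives back exactly the original table.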
44
More Examples

Student  Offering  Grade
Phil     CS242A    A
Phil     CS370A    B
Emily    CS242A    B
Emily    CS370A    A

- S →→ O & S →→ G ?
- O →→ G & O →→ S ?
- G →→ S & G →→ O ?
45
More Examples
- S →→ O & S →→ G ?
  - Offering and Grade not independent
- O →→ G & O →→ S ?
  - Grade and Student not independent
- G →→ S & G →→ O ?
  - Student and Offering not independent

(Table repeated from slide 44.)
46
More Examples
- B →→ E & B →→ C
- Is this a trivial MVD?

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted
47
More Examples
- B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - B ↛ E & B ↛ C
  - NO!!!

(Table repeated from slide 46.)
48
More Examples

Bank Branch  Employee  Customer
B3           Ann       Ted
B3           Terry     Alfred
B3           Ann       Alfred
B3           Terry     Ted

is split into

Bank Branch  Employee
B3           Ann
B3           Terry

Bank Branch  Customer
B3           Ted
B3           Alfred
49
Questions?
50
Part#  PQty  PDesc
P1     2     5mm bolt
P2     4     10mm nut
P3     2     5mm wrench
P4     4     8mm washer

- PQty →→ PDesc & PQty →→ Part# ?
51
Loc #  Item            Managers
L1     XBox 360 250GB  Cindy
L1     Garmin GPS      Aaron
L1     XBox 360 250GB  Aaron
L1     Garmin GPS      Cindy
52
Extra 4NF Slides
53
4th Normal Form (4NF)
- Relationship independence
  - 2 relationships are independent if one cannot be derived from the other
  - knowing one relationship tells you nothing about the other
54
4th Normal Form (4NF)

Enroll Table
StdNo  OfferNo  TextNo
S1     O1       T1
S1     O2       T2
S1     O1       T2
S1     O2       T3

- 3 relationships
  - StdNo -- OfferNo
  - StdNo -- TextNo
  - OfferNo -- TextNo
55
4th Normal Form (4NF)
- StdNo -- OfferNo cannot be derived from the other 2
  - StdNo -- TextNo & TextNo -- OfferNo
  - the same textbook can be used for 2 offerings
- OfferNo -- TextNo cannot be derived from the other 2
  - OfferNo -- StdNo & StdNo -- TextNo
  - students use many textbooks, not all related to this offering
- StdNo -- TextNo can be derived
  - StdNo -- OfferNo & OfferNo -- TextNo
  - offering number gives the set of texts a student needs
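The derivable case can be shown by joining the StdNo -- OfferNo and OfferNo -- TextNo relationships and comparing the result with the StdNo -- TextNo pairs stored in the Enroll table. A small sketch with rows as tuples (variable names are illustrative). With a single student in the sample, the data cannot demonstrate the two non-derivable cases, so only the positive case is checked.

```python
# Enroll table as (StdNo, OfferNo, TextNo) tuples.
ENROLL = {
    ("S1", "O1", "T1"),
    ("S1", "O2", "T2"),
    ("S1", "O1", "T2"),
    ("S1", "O2", "T3"),
}

student_offering = {(s, o) for s, o, _ in ENROLL}
offering_text = {(o, t) for _, o, t in ENROLL}
student_text = {(s, t) for s, _, t in ENROLL}

# Derive StdNo -- TextNo by joining the other two relationships on OfferNo.
derived = {
    (s, t)
    for s, o in student_offering
    for o2, t in offering_text
    if o == o2
}

print(sorted(derived))
print(derived == student_text)
```

The join yields the same student-text pairs that the Enroll table stores directly, which is why storing them in the three-column table is redundant.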
56
4th Normal Form (4NF)
- Multivalued Dependencies (MVDs)
  - each X can map to a set of Ys and a set of Zs
  - generalization of functional dependencies
    - each X maps to one Y
    - each X maps to one Z
  - represented by X →→ Y|Z
  - every FD is an MVD
    - known as a trivial MVD
  - not every MVD is an FD
57
4th Normal Form (4NF)
- M-way tables sometimes introduce MVDs
  - X →→ Y
  - X →→ Z
  - X →→ Y|Z
- Y and Z are independent
  - relationship X--Y is independent of relationship X--Z
- Not all M-way tables produce MVDs
58
4th Normal Form (4NF)
- MVD table redundancies
  - assume X1 maps to Y1 & Y2 and X1 maps to Z1 & Z2

X   Y   Z
X1  Y1
X1  Y2
X1      Z1
X1      Z2
59
4th Normal Form (4NF)
- Need to fill in the rest of the table

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
X1  Y2  Z1
X1  Y1  Z2
60
4th Normal Form (4NF)
- Rows below the line exist because relationship Y--Z can be derived from relationships X--Y & X--Z
- Rows below the line are redundant

X   Y   Z
X1  Y1  Z1
X1  Y2  Z2
----------
X1  Y2  Z1
X1  Y1  Z2
61
4th Normal Form (4NF)

Enroll Table
OfferNo  StdNo  TextNo
O1       S1     T1
O1       S2     T2
----------------------
O1       S2     T1
O1       S1     T2

- OfferNo →→ StdNo|TextNo
  - offerings map to many students
  - offerings can have many textbooks
- Rows below the line are redundant
62
4th Normal Form (4NF)
- 4NF definition
  - tables cannot contain any non-trivial MVDs
- Resolving 4NF violations
  - for each table with a non-trivial MVD
  - split the 3-column table into two 2-column tables
  - A,B,C goes to A,B & A,C

StdNo  OfferNo
S1     O1
S1     O2

OfferNo  TextNo
O1       T1
O1       T2
O2       T1
O2       T3