1
Normalization Beyond Third Normal Form
Hugo Kornelis
2
Database design
Normalization is boring → fun
Normalization is hard → easy
Normalization is not important → very important
3
Hugo Kornelis
Independent database consultant
Community addict
Speaker, blogger, author, technical editor, Pluralsight author, etc.
MVP (SQL Server/Data Platform)
Blog:
4
Thank You To Our Sponsors!
5
Overview
Key concepts
The basics: Normalization up to Third Normal Form
… and beyond: All “higher” normal forms (EKNF, BCNF, 4NF, 5NF, DK/NF, ONF, 6NF)
7
Key Concepts Universe of Discourse (UoD)
A subset of reality (or of a virtual reality), as it is seen by the business. If it’s not in the UoD, we don’t care.
8
Key Concepts Purpose of normalization
Prevent data that is incorrect: impossible, inconsistent, or in violation of business rules.
Normal forms are defined at a per-table level.
9
Key Concepts Functional dependency
Column A determines column B (A → B) if, for each possible value of A, there is either one value for B or no value for B, but NEVER more than one value.
Examples: Chair number → Name; Chair number → Birthdate; Birthdate → Name? Only if no two people in the UoD ever share a birthdate.
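A minimal Python sketch of this definition, assuming rows are dicts keyed by column name. A sample of rows can only disprove a dependency, never prove it: functional dependencies come from the business rules of the UoD. The chair numbers and names echo the candidate-key example in this deck; the birthdates are hypothetical. A `None` on the right-hand side counts as “no value”.

```python
from typing import Any, Dict, List, Sequence, Tuple

def fd_holds(rows: List[Dict[str, Any]], lhs: Sequence[str], rhs: str) -> bool:
    """Return False if some value (or combination) of `lhs` maps to two different `rhs` values."""
    seen: Dict[Tuple[Any, ...], Any] = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        value = row[rhs]
        if value is None:            # "no value for B" is allowed
            continue
        if determinant in seen and seen[determinant] != value:
            return False             # more than one value for B: not a dependency
        seen[determinant] = value
    return True

# Hypothetical sample rows (the birthdates are made up for this illustration).
attendees = [
    {"chair": 25, "name": "Marge", "birthdate": "1980-04-01"},
    {"chair": 3, "name": "William", "birthdate": "1975-09-12"},
    {"chair": 24, "name": "Julie", "birthdate": "1980-04-01"},
]

print(fd_holds(attendees, ["chair"], "name"))      # True
print(fd_holds(attendees, ["birthdate"], "name"))  # False: Marge and Julie share a birthdate
```

Because `lhs` is a list of column names, the same function also checks composite dependencies such as {Room number, Chair number} → Name.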
10
Key Concepts Functional dependency - terminology
These all mean the same:
Name depends on Badge number
Badge number determines Name
(short form) Badge number → Name
This is a “functional dependency”, sometimes abbreviated “FD”.
11
Key Concepts Composite functional dependency
{A, B} → C if, for each combination of A and B, there is either one value for C or no value for C, but NEVER more than one value.
Example: {Room number, Chair number} → Name
12
Key Concepts “Cheating” composite functional dependency
These are irrelevant: if Badge number → Name, then {Badge number, Chair number} → Name holds too, but only because Badge number alone already determines Name. Cheater!
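To spot a cheater mechanically, a sketch can compute the attribute closure (every column reachable from a set of columns by repeatedly applying the known dependencies) and test whether a proper subset of the determinant already determines the dependent column. The dependencies are the examples from these slides; the `closure` helper is a standard textbook technique, not something taken from the deck.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant columns, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs`, applying the dependencies until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_cheater(lhs: Tuple[str, ...], rhs: str, fds: List[FD]) -> bool:
    """True if a proper subset of `lhs` already determines `rhs`."""
    for size in range(len(lhs)):                 # subsets smaller than lhs itself
        for subset in combinations(lhs, size):
            if rhs in closure(subset, fds):
                return True
    return False

fds: List[FD] = [
    (("Badge number",), ("Name",)),
    (("Room number", "Chair number"), ("Name",)),
]
print(is_cheater(("Badge number", "Chair number"), "Name", fds))  # True
print(is_cheater(("Room number", "Chair number"), "Name", fds))   # False
```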
13
Key Concepts Composite the other way around??
A → {B, C} is completely equivalent to A → B and A → C.
14
Key Concepts Candidate Key
Within a table: a column (or combination of columns) that determines every other column in the table.
BadgeNo | Room | Chair | Name
123 | A | 25 | Marge
124 | B | 3 | William
126 | | 24 | Julie
127 | | | André
128 | C | 5 | Kathryn
15
Key Concepts Candidate Key
There may be more than one. One is the “primary” key; the rest are “alternate” keys.
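Candidate keys can be derived from the dependencies: a combination of columns is a candidate key if its closure covers every column and it does not contain a smaller key. A brute-force Python sketch, assuming the dependencies of the Second Normal Form example later in this deck (Session determines everything; {Room, Start time} determines Session; Room determines Room capacity). It is exponential in the number of columns, which is fine for slide-sized tables.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant columns, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs`, applying the dependencies until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(columns: List[str], fds: List[FD]) -> List[Tuple[str, ...]]:
    keys: List[Tuple[str, ...]] = []
    for size in range(1, len(columns) + 1):       # smallest combinations first
        for combo in combinations(columns, size):
            if any(set(k) <= set(combo) for k in keys):
                continue                          # contains a smaller key: not minimal
            if closure(combo, fds) >= set(columns):
                keys.append(combo)
    return keys

columns = ["Session", "Room", "Start time", "Room capacity"]
fds: List[FD] = [
    (("Session",), ("Room", "Start time", "Room capacity")),
    (("Room", "Start time"), ("Session",)),
    (("Room",), ("Room capacity",)),
]
print(candidate_keys(columns, fds))  # [('Session',), ('Room', 'Start time')]
```

Which of the keys found becomes the primary key is a design decision; the dependencies only tell you which candidates exist.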
16
First Normal Form
A table is in First Normal Form (1NF) if:
the table has at least one candidate key;
all columns are atomic: no repeating groups and no composite values.
What counts as atomic depends highly on the UoD!
17
First Normal Form No repeating groups
Speaker | Country | Sessions
Allan Mitchell | United Kingdom | AD-205, AD-207, BI-205
Oliver Engels | Germany | DBA-305, PD-203
becomes
Speaker | Country | Session
Allan Mitchell | United Kingdom | AD-205
Allan Mitchell | United Kingdom | AD-207
Allan Mitchell | United Kingdom | BI-205
Oliver Engels | Germany | DBA-305
Oliver Engels | Germany | PD-203
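A small Python sketch of this repair, assuming the repeating group arrives as a comma-separated string, as in the Sessions column on this slide. The function and column names are illustrative.

```python
from typing import Dict, List

def unnest_sessions(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn a comma-separated Sessions column (a repeating group) into one row per session."""
    flat = []
    for row in rows:
        for session in row["Sessions"].split(","):
            flat.append({
                "Speaker": row["Speaker"],
                "Country": row["Country"],
                "Session": session.strip(),   # drop the space after each comma
            })
    return flat

speakers = [
    {"Speaker": "Allan Mitchell", "Country": "United Kingdom", "Sessions": "AD-205, AD-207, BI-205"},
    {"Speaker": "Oliver Engels", "Country": "Germany", "Sessions": "DBA-305, PD-203"},
]
for r in unnest_sessions(speakers):
    print(r["Speaker"], "|", r["Country"], "|", r["Session"])
```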
18
First Normal Form No composite values
Each Session code (AD-205, AD-207, BI-205, DBA-305, PD-203) combines a track and a session number:
Speaker | Country | Track | Session No
Allan Mitchell | United Kingdom | AD | 205
Allan Mitchell | United Kingdom | AD | 207
Allan Mitchell | United Kingdom | BI | 205
Oliver Engels | Germany | DBA | 305
Oliver Engels | Germany | PD | 203
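Splitting a composite value is usually a one-off migration step. A sketch, assuming every session code has the form TRACK-NUMBER with a single hyphen; the session number is kept as a string because nothing on the slide says it is used for arithmetic.

```python
from typing import Tuple

def split_session_code(code: str) -> Tuple[str, str]:
    """Split a session code such as 'AD-205' into (track, session number)."""
    track, sep, number = code.partition("-")
    if not sep or not track or not number:
        raise ValueError(f"not a TRACK-NUMBER session code: {code!r}")
    return track, number

for code in ["AD-205", "AD-207", "BI-205", "DBA-305", "PD-203"]:
    print(code, "->", split_session_code(code))
```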
19
First Normal Form No composite values
The Speaker column combines a first name and a last name:
First name | Last name | Country | Session
Allan | Mitchell | United Kingdom | AD-205
Allan | Mitchell | United Kingdom | AD-207
Allan | Mitchell | United Kingdom | BI-205
Oliver | Engels | Germany | DBA-305
Oliver | Engels | Germany | PD-203
20
Second Normal Form
A table is in Second Normal Form (2NF) if:
it is in First Normal Form;
non-key columns depend on the whole keys.
This only applies to columns that are not part of any key, and it can only be violated if at least one key is composite.
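A Python sketch of the 2NF test: for every composite key, check whether some proper part of it already determines a non-key column. The keys and dependencies are assumed from the Session / Room / Start time / Room capacity example that follows; `closure` is the standard attribute-closure technique.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant columns, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs`, applying the dependencies until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def second_nf_violations(columns: List[str], keys: List[Tuple[str, ...]], fds: List[FD]):
    """(part of a key, non-key column) pairs where the column depends on only part of that key."""
    key_columns = {c for key in keys for c in key}
    non_key = [c for c in columns if c not in key_columns]
    violations = []
    for key in keys:
        for size in range(1, len(key)):           # proper, non-empty parts; none for 1-column keys
            for part in combinations(key, size):
                for col in non_key:
                    if col in closure(part, fds) and (part, col) not in violations:
                        violations.append((part, col))
    return violations

columns = ["Session", "Room", "Start time", "Room capacity"]
keys = [("Session",), ("Room", "Start time")]
fds: List[FD] = [
    (("Session",), ("Room", "Start time", "Room capacity")),
    (("Room", "Start time"), ("Session",)),
    (("Room",), ("Room capacity",)),
]
print(second_nf_violations(columns, keys, fds))  # [(('Room',), 'Room capacity')]
```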
21
Second Normal Form Non-key columns depend on the whole keys
Session | Room | Start time | Room capacity
AD-205 | Blue | 10:00 | 125
AD-207 | Grand | 12:45 | 228
BI-205 | 605 | 16:00 | 60
DBA-305 | | |
PD-203 | 225 | 14:30 |
Session → Room capacity
{Room, Start time} → Room capacity: Cheater! Room → Room capacity already holds, so Room capacity depends on only part of the key {Room, Start time}.
22
Second Normal Form Non-key columns depend on the whole keys
Room → Room capacity moves to a table of its own:
Room | Room capacity
Blue | 125
Grand | 228
605 | 60
225 |
23
Second Normal Form Non-key columns depend on the whole keys
Session | Room | Start time
AD-205 | Blue | 10:00
AD-207 | Grand | 12:45
BI-205 | 605 | 16:00
DBA-305 | |
PD-203 | 225 | 14:30
Room | Room capacity
Blue | 125
Grand | 228
605 | 60
225 |
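The split loses no information: joining the two new tables on Room gives back the original rows. A Python sketch that checks this for the three fully populated rows of the example; `project` and `natural_join` are illustrative helpers, not a library API. The check is expected to succeed: whenever Room → Room capacity holds, splitting off {Room, Room capacity} is lossless (Heath’s theorem).

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, Any]

def project(rows: List[Row], cols: List[str]) -> List[Row]:
    """Keep only `cols`, removing duplicate rows (like SELECT DISTINCT)."""
    result: List[Row] = []
    for row in rows:
        p = {c: row[c] for c in cols}
        if p not in result:
            result.append(p)
    return result

def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine rows that agree on every column the two tables share."""
    result: List[Row] = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def as_set(rows: List[Row]) -> Set[FrozenSet[Tuple[str, Any]]]:
    """Order-insensitive comparison of tables."""
    return {frozenset(r.items()) for r in rows}

sessions = [
    {"Session": "AD-205", "Room": "Blue", "Start time": "10:00", "Room capacity": 125},
    {"Session": "AD-207", "Room": "Grand", "Start time": "12:45", "Room capacity": 228},
    {"Session": "BI-205", "Room": "605", "Start time": "16:00", "Room capacity": 60},
]
session_table = project(sessions, ["Session", "Room", "Start time"])
room_table = project(sessions, ["Room", "Room capacity"])
print(as_set(natural_join(session_table, room_table)) == as_set(sessions))  # True
```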
24
Third Normal Form
A table is in Third Normal Form (3NF) if:
it is in Second Normal Form;
non-key columns depend on nothing but the keys.
This only applies to columns that are not part of any key.
It is violated if a non-key column depends on:
one or more other non-key columns;
two or more key columns that are part of different keys;
one or more non-key columns combined with one or more key columns.
It is not violated if a non-key column depends on one or more columns that are all part of the same key (because that would already violate 2NF).
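The cases above come down to one test per dependency: if a non-key column depends on a set of columns that does not contain a whole key, the table is not in 3NF (this also catches the 2NF case). A Python sketch, assuming the dependencies of the Badge number / Speaker / Session example that follows (Badge number determines Speaker and Session; Session determines Speaker). It only checks the dependencies passed in.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant columns, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs`, applying the dependencies until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(columns: List[str], keys: List[Tuple[str, ...]], fds: List[FD]):
    """(determinant, column) pairs where a non-key column depends on something other than a key."""
    key_columns = {c for key in keys for c in key}
    everything = set(columns)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= everything:
            continue                      # the determinant contains a whole key: allowed
        for col in rhs:
            if col not in lhs and col not in key_columns:
                violations.append((lhs, col))
    return violations

columns = ["Badge number", "Speaker", "Session"]
keys = [("Badge number",)]
fds: List[FD] = [
    (("Badge number",), ("Speaker", "Session")),
    (("Session",), ("Speaker",)),
]
print(third_nf_violations(columns, keys, fds))  # [(('Session',), 'Speaker')]
```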
25
Third Normal Form Non-key columns depend on nothing but the keys
Badge number | Speaker | Session
123 | Oliver Engels | DBA-305
124 | Allan Mitchell | BI-205
126 | |
127 | |
128 | Lara Rubbelke | AD-101
Badge number → Speaker
Badge number → Session
Speaker → Session
Session → Speaker
26
Third Normal Form Non-key columns depend on nothing but the keys
Session → Speaker moves to a table of its own:
Session | Speaker
DBA-305 | Oliver Engels
BI-205 | Allan Mitchell
AD-101 | Lara Rubbelke
27
Third Normal Form Non-key columns depend on nothing but the keys
Badge number | Session
123 | DBA-305
124 | BI-205
126 |
127 |
128 | AD-101
Session | Speaker
DBA-305 | Oliver Engels
BI-205 | Allan Mitchell
AD-101 | Lara Rubbelke
28
Summary
A table is in Third Normal Form if every non-key column depends on the keys, the whole keys, and nothing but the keys (so help me Codd).
Bernstein’s algorithm for synthesis of a Third Normal Form schema
Dr. E.F. Codd (illustration: Michael J. Swart)
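Bernstein’s work describes how to derive a 3NF schema from the dependencies. The Python sketch below follows the common textbook variant of that synthesis (minimal cover, one table per determinant, add a key table if needed, drop contained tables). It is not a transcription of Bernstein’s exact procedure, which for example also merges groups whose determinants are equivalent. The example dependencies are assumed from the Badge number / Speaker / Session slides.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (determinant, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds: List[Tuple[Iterable[str], Iterable[str]]]) -> List[FD]:
    # Step 1: one dependent column per dependency.
    cover: List[FD] = [(frozenset(lhs), frozenset([col])) for lhs, rhs in fds for col in rhs]
    # Step 2: drop determinant columns that are not needed.
    for i, (lhs, rhs) in enumerate(cover):
        for col in sorted(lhs):
            smaller = lhs - {col}
            if smaller and rhs <= closure(smaller, cover):
                lhs = smaller
        cover[i] = (lhs, rhs)
    # Step 3: remove duplicates, then dependencies implied by the others.
    cover = list(dict.fromkeys(cover))
    i = 0
    while i < len(cover):
        others = cover[:i] + cover[i + 1:]
        lhs, rhs = cover[i]
        if rhs <= closure(lhs, others):
            cover = others
        else:
            i += 1
    return cover

def synthesize_3nf(columns: List[str], fds) -> List[Set[str]]:
    cover = minimal_cover(fds)
    everything = set(columns)
    # Step 4: one table per determinant, holding everything it determines.
    groups: Dict[FrozenSet[str], Set[str]] = {}
    for lhs, rhs in cover:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    tables = list(groups.values())
    # Step 5: if no table contains a key of the whole schema, add one.
    if not any(closure(t, cover) >= everything for t in tables):
        key = set(columns)
        for col in columns:
            if closure(key - {col}, cover) >= everything:
                key.discard(col)
        tables.append(key)
    # Step 6: drop tables contained in another table (or equal to an earlier one).
    return [t for i, t in enumerate(tables)
            if not any(t < other or (t == other and j < i) for j, other in enumerate(tables))]

badge_fds = [(["Badge number"], ["Speaker", "Session"]), (["Session"], ["Speaker"])]
print(synthesize_3nf(["Badge number", "Speaker", "Session"], badge_fds))
# two tables: {Badge number, Session} and {Session, Speaker}
```

Applied to the dependencies of the Second Normal Form example instead, the same sketch produces {Session, Room, Start time} and {Room, Room capacity}.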
29
Boyce-Codd Normal Form
Remember Third Normal Form? Every non-key column depends on the keys, the whole keys, and nothing but the keys.
Here’s Boyce-Codd Normal Form (BCNF): Every column depends on the keys, the whole keys, and nothing but the keys.
30
Boyce-Codd Normal Form
Key columns depend on the whole keys
Track | Session number | Room | Start time
AD | 205 | Blue | 10:00
AD | 207 | Blue | 12:45
BI | 205 | 605 | 16:00
DBA | 305 | Grand |
PD | 203 | 225 | 14:30
{Track, Session number} → Room
{Track, Session number} → Start time
{Room, Start time} → Session number
{Room, Start time} → Track
Track → Room: Cheater!
Room → Track: Cheater!
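In dependency terms, BCNF asks that the determinant of every non-trivial dependency contains a whole key. A Python sketch using the dependencies listed on this slide; it flags exactly the two cheaters. A 3NF check would pass this table, because Track and Room are both part of some key.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (determinant columns, dependent columns)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All columns determined by `attrs`, applying the dependencies until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(columns: List[str], fds: List[FD]) -> List[FD]:
    """Non-trivial dependencies whose determinant does not contain a whole key."""
    everything = set(columns)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= everything]

columns = ["Track", "Session number", "Room", "Start time"]
fds: List[FD] = [
    (("Track", "Session number"), ("Room", "Start time")),
    (("Room", "Start time"), ("Session number", "Track")),
    (("Track",), ("Room",)),
    (("Room",), ("Track",)),
]
print(bcnf_violations(columns, fds))
# [(('Track',), ('Room',)), (('Room',), ('Track',))]
```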
31
Boyce-Codd Normal Form
Key columns depend on the whole keys
Track → Room and Room → Track move to a table of their own:
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD | 225
33
Boyce-Codd Normal Form
Key columns depend on the whole keys ?
Track | Session number | Start time
AD | 205 | 10:00
AD | 207 | 12:45
BI | 205 | 16:00
DBA | 305 |
PD | 203 | 14:30
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD | 225
36
Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Track | Session number | Room | Start time
AD | 205 | Blue | 10:00
AD | 207 | Blue | 12:45
BI | 205 | 605 | 16:00
DBA | 305 | Grand |
PD | 203 | | 14:30
{Track, Session number} → Room
{Track, Session number} → Start time
{Room, Start time} → Session number
{Room, Start time} → Track
Track → Room
Room → Track
Cheater!
37
Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Track → Room and Room → Track move to a table of their own:
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD |
38
Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve ?
Track | Session number | Start time
AD | 205 | 10:00
AD | 207 | 12:45
BI | 205 | 16:00
DBA | 305 |
PD | 203 | 14:30
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD |
40
Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Alternative form is not safe either
Track | Session number | Room | Start time
AD | 205 | Blue | 10:00
AD | 207 | Blue | 12:45
BI | 205 | 605 | 16:00
DBA | 305 | Grand |
PD | 203 | | 14:30
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD | 605
41
Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Alternative form is not safe either, unless you add a “weird” foreign key
Track | Session number | Room | Start time
AD | 205 | Blue | 10:00
AD | 207 | Blue | 12:45
BI | 205 | 605 | 16:00
DBA | 305 | Grand |
PD | 203 | | 14:30
Track | Room
AD | Blue
BI | 605
DBA | Grand
PD |
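A Python/SQLite sketch of such a “weird” foreign key, keeping Room in the session table. Tracks gets a redundant UNIQUE (Track, Room) constraint purely so that Sessions can reference the pair; the foreign key then rejects any session whose room differs from its track’s room. Table and column names are illustrative, the data is the first version of the example (PD in room 225), and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
    con.executescript("""
        CREATE TABLE Tracks (
            Track TEXT PRIMARY KEY,
            Room  TEXT NOT NULL,
            UNIQUE (Track, Room)           -- redundant as a key, but needed as a foreign key target
        );
        CREATE TABLE Sessions (
            Track         TEXT    NOT NULL,
            SessionNumber INTEGER NOT NULL,
            Room          TEXT    NOT NULL,
            StartTime     TEXT,
            PRIMARY KEY (Track, SessionNumber),
            UNIQUE (Room, StartTime),
            -- the "weird" foreign key: a session's room must be its track's room
            FOREIGN KEY (Track, Room) REFERENCES Tracks (Track, Room) ON UPDATE CASCADE
        );
        INSERT INTO Tracks VALUES ('AD', 'Blue'), ('BI', '605'), ('DBA', 'Grand'), ('PD', '225');
        INSERT INTO Sessions VALUES
            ('AD', 205, 'Blue', '10:00'), ('AD', 207, 'Blue', '12:45'),
            ('BI', 205, '605', '16:00'), ('DBA', 305, 'Grand', NULL),
            ('PD', 203, '225', '14:30');
    """)
    return con

def is_rejected(con: sqlite3.Connection, sql: str) -> bool:
    """Run one statement; report whether a constraint stopped it."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

con = make_db()
print(is_rejected(con, "INSERT INTO Sessions VALUES ('AD', 208, 'Grand', '09:00')"))  # True
print(is_rejected(con, "UPDATE Sessions SET Room = '605' WHERE Track = 'PD'"))         # True
print(is_rejected(con, "INSERT INTO Sessions VALUES ('AD', 208, 'Blue', '09:00')"))   # False
```

`ON UPDATE CASCADE` is an optional extra: with it, moving a track to another room in Tracks moves its sessions along instead of being blocked.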
42
Elementary Key Normal Form
Remember Third Normal Form? Every non-key column depends on the keys, the whole keys, and nothing but the keys.
Remember Boyce-Codd Normal Form? Every column depends on the keys, the whole keys, and nothing but the keys.
Here’s Elementary Key Normal Form (EKNF): Every column that is not part of an elementary key depends on the keys, the whole keys, and nothing but the keys.
43
Elementary Key Normal Form
What is an elementary key?
It is based on elementary dependencies: {A, B} → C is not elementary if C → A or C → B.
An elementary key is any key that implements at least one elementary dependency.
EKNF is the same as 3NF, except for columns in non-elementary keys.
It is the highest normal form that is guaranteed to be achievable (Bernstein’s algorithm for synthesis of a Third Normal Form schema).
It does not solve BCNF violations.
44
Fourth Normal Form ?
Day | Expert | Subject
Monday | Erland | Design
Monday | Erland | Tuning
Monday | Oliver | BI
Tuesday | Oliver | BI
Wednesday | Hugo | BI
One fact per row: “On Monday, you can ask Erland about Design”?
Or two facts: “On Monday, you can ask Erland questions” and “Erland knows about Design”?
If each row is one fact: Fourth Normal Form not violated.
45
Fourth Normal Form ?!
If each row of the Day | Expert | Subject table combines two facts, “On Monday, you can ask Erland questions” and “Erland knows about Design”, then Fourth Normal Form IS violated! Facts are represented multiple times.
46
Fourth Normal Form !
The two facts (“On Monday, you can ask Erland questions”; “Erland knows about Design”) each get a table of their own:
Day | Expert
Monday | Erland
Monday | Oliver
Tuesday | Oliver
Wednesday | Hugo
Expert | Subject
Erland | Design
Erland | Tuning
Oliver | BI
Hugo | BI
47
Fourth Normal Form
A table is in Fourth Normal Form (4NF) if:
it is in Boyce-Codd Normal Form;
there are no multivalued dependencies between a subset of its columns.
By definition, there will always be multivalued dependencies between all columns in the table.
48
Fourth Normal Form Multivalued dependency
Column A “multidetermines” column B (A ↠ B) if, for each possible value of A, there are zero, one or more values for B, regardless of the values in other columns.
Examples: Session ↠ Attendee; Presenter ↠ Attendee
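The usual formal test behind this definition: whenever two rows agree on A, swapping their B values must give rows that are also in the table. A Python sketch, using rows modelled on the Day / Expert / Subject example from the Fourth Normal Form slides (the row values are an assumed reading of that table).

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def mvd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check lhs ->> rhs: for any two rows that agree on lhs, the row that takes rhs
    from the first and all other columns from the second must also be present."""
    if not rows:
        return True
    columns = list(rows[0])
    present = {tuple(r[c] for c in columns) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[c] == t2[c] for c in lhs):
                swapped = {**t2, **{c: t1[c] for c in rhs}}
                if tuple(swapped[c] for c in columns) not in present:
                    return False
    return True

cols = ("Day", "Expert", "Subject")
rows = [dict(zip(cols, v)) for v in [
    ("Monday", "Erland", "Design"),
    ("Monday", "Erland", "Tuning"),
    ("Monday", "Oliver", "BI"),
    ("Tuesday", "Oliver", "BI"),
    ("Wednesday", "Hugo", "BI"),
]]
print(mvd_holds(rows, ["Expert"], ["Day"]))  # True
print(mvd_holds(rows, ["Day"], ["Expert"]))  # False: no (Monday, Erland, BI) row
```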
49
Fourth Normal Form Composite multivalued dependency
{A, B} ↠ C if, for each combination of A and B, there are zero, one or more values for C, regardless of the values in other columns.
Example: {Conference, Session} ↠ Attendee
50
Fourth Normal Form Composite the other way around??
A ↠ {B, C} is NOT equivalent to A ↠ B and A ↠ C.
51
Fourth Normal Form not violated
The Day | Expert | Subject table, read as one fact per row (“On Monday, you can ask Erland about Design”) rather than as two facts (“On Monday, you can ask Erland questions”; “Erland knows about Design”):
Expert ↠ {Day, Subject}
Expert ↠ Day
Expert ↠ Subject
Fourth Normal Form not violated
52
Fourth Normal Form Expert ↠ {Day, Subject}; Expert ↠ Day; Expert ↠ Subject
Reading the Day | Expert | Subject rows as two facts (“On Monday, you can ask Erland questions”; “Erland knows about Design”) splits the table:
Day | Expert
Monday | Erland
Monday | Oliver
Tuesday | Oliver
Wednesday | Hugo
Expert | Subject
Erland | Design
Erland | Tuning
Oliver | BI
Hugo | BI
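Because Expert ↠ Day holds in these rows, splitting on Expert loses nothing: joining the two projections on Expert gives back the original rows, and a fact such as “Oliver knows about BI” is now stored once instead of twice. A Python sketch with the same assumed reading of the Day / Expert / Subject table; `project` and `natural_join` are illustrative helpers.

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, Any]

def project(rows: List[Row], cols: List[str]) -> List[Row]:
    """Keep only `cols`, removing duplicate rows (like SELECT DISTINCT)."""
    result: List[Row] = []
    for row in rows:
        p = {c: row[c] for c in cols}
        if p not in result:
            result.append(p)
    return result

def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine rows that agree on every column the two tables share."""
    result: List[Row] = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def as_set(rows: List[Row]) -> Set[FrozenSet[Tuple[str, Any]]]:
    return {frozenset(r.items()) for r in rows}

cols = ("Day", "Expert", "Subject")
rows = [dict(zip(cols, v)) for v in [
    ("Monday", "Erland", "Design"), ("Monday", "Erland", "Tuning"),
    ("Monday", "Oliver", "BI"), ("Tuesday", "Oliver", "BI"), ("Wednesday", "Hugo", "BI"),
]]
day_expert = project(rows, ["Day", "Expert"])
expert_subject = project(rows, ["Expert", "Subject"])
print(as_set(natural_join(day_expert, expert_subject)) == as_set(rows))  # True
# "Oliver knows about BI": stored twice before the split, once after it.
print(sum(r["Expert"] == "Oliver" and r["Subject"] == "BI" for r in rows))            # 2
print(sum(r["Expert"] == "Oliver" and r["Subject"] == "BI" for r in expert_subject))  # 1
```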
53
Fifth Normal Form
The Day | Expert | Subject table, now read as three facts: “On Monday, you can ask Erland questions”, “Erland knows about Design” and “On Monday, you can ask about Design” (instead of “On Monday, you can ask Erland about Design”).
Expert ↠ {Day, Subject}
Expert ↠ Day
Expert ↠ Subject
Fourth Normal Form not violated … but Fifth Normal Form IS violated!
54
Fifth Normal Form
Each of the three facts gets a table of its own:
Day | Subject
Monday | Design
Monday | Tuning
Monday | BI
Tuesday | BI
Wednesday | BI
Day | Expert
Monday | Erland
Monday | Oliver
Tuesday | Oliver
Wednesday | Hugo
Expert | Subject
Erland | Design
Erland | Tuning
Oliver | BI
Hugo | BI
55
Fifth Normal Form
A table is in Fifth Normal Form (5NF) if:
it is in Fourth Normal Form;
it has no join dependencies, unless implied by a key.
56
Fifth Normal Form Join dependency
The Day | Expert | Subject table is exactly the JOIN of its three projections: Day | Subject, Day | Expert and Expert | Subject.
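To see a join dependency in isolation, here is a Python sketch with hypothetical rows (the classic textbook pattern, relabelled with this deck’s column names). Every JOIN of two projections produces a spurious row, so in these rows no two-table split is lossless and there is no non-trivial multivalued dependency; yet the JOIN of all three projections reproduces the table exactly. That is a join dependency not implied by a key: 4NF holds, 5NF does not.

```python
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Row = Dict[str, Any]

def project(rows: List[Row], cols: List[str]) -> List[Row]:
    """Keep only `cols`, removing duplicate rows (like SELECT DISTINCT)."""
    result: List[Row] = []
    for row in rows:
        p = {c: row[c] for c in cols}
        if p not in result:
            result.append(p)
    return result

def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine rows that agree on every column the two tables share."""
    result: List[Row] = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def as_set(rows: List[Row]) -> Set[FrozenSet[Tuple[str, Any]]]:
    return {frozenset(r.items()) for r in rows}

cols = ("Day", "Expert", "Subject")
# Hypothetical rows, chosen so that no split into two tables is lossless.
rows = [dict(zip(cols, v)) for v in [
    ("Monday", "Erland", "Tuning"),
    ("Monday", "Oliver", "Design"),
    ("Tuesday", "Erland", "Design"),
    ("Monday", "Erland", "Design"),
]]
day_expert = project(rows, ["Day", "Expert"])
expert_subject = project(rows, ["Expert", "Subject"])
day_subject = project(rows, ["Day", "Subject"])

two_way = natural_join(day_expert, expert_subject)
three_way = natural_join(two_way, day_subject)
print(len(two_way), as_set(two_way) == as_set(rows))      # 5 False: (Tuesday, Erland, Tuning) is spurious
print(len(three_way), as_set(three_way) == as_set(rows))  # 4 True
```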
57
Sixth Normal Form
Remember Fifth Normal Form? No join dependencies, unless implied by a key.
Here’s Sixth Normal Form (6NF): No join dependencies at all (other than trivial ones).
58
Sixth Normal Form JOIN
Session | Topic | First name | Last name
AD-205 | Indexes | Tim | Chapman
AD-207 | Performance | Margarita | Naumova
BI-205 | Dashboards | Oliver | Engels
DBA-305 | Backups | Grant | Fritchey
PD-203 | Negotiating | Steve | Jones
is the JOIN of:
Session | Topic
AD-205 | Indexes
AD-207 | Performance
BI-205 | Dashboards
DBA-305 | Backups
PD-203 | Negotiating
Session | First name
AD-205 | Tim
AD-207 | Margarita
BI-205 | Oliver
DBA-305 | Grant
PD-203 | Steve
Session | Last name
AD-205 | Chapman
AD-207 | Naumova
BI-205 | Engels
DBA-305 | Fritchey
PD-203 | Jones
60
Sixth Normal Form JOIN
Session | Topic | First name | Last name
AD-205 | Indexes | Tim | Chapman
AD-207 | NULL | Margarita | Naumova
BI-205 | Dashboards | Oliver | Engels
DBA-305 | Backups | |
PD-203 | | Steve | Jones
Session | Topic
AD-205 | Indexes
BI-205 | Dashboards
DBA-305 | Backups
Session | First name
AD-205 | Tim
AD-207 | Margarita
BI-205 | Oliver
PD-203 | Steve
Session | Last name
AD-205 | Chapman
AD-207 | Naumova
BI-205 | Engels
PD-203 | Jones
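A Python sketch of that decomposition, assuming missing values arrive as `None` (the NULL and the blanks in the wide table). Each non-key column becomes a two-column table keyed on Session, and rows without a value simply do not appear, so no NULLs are stored anywhere.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

def to_6nf(rows: List[Row], key: str) -> Dict[str, List[Row]]:
    """One table per non-key column, holding only the rows that actually have a value."""
    tables: Dict[str, List[Row]] = {}
    for row in rows:
        for col, value in row.items():
            if col == key or value is None:
                continue                      # skip the key itself and missing values
            tables.setdefault(col, []).append({key: row[key], col: value})
    return tables

sessions = [
    {"Session": "AD-205", "Topic": "Indexes", "First name": "Tim", "Last name": "Chapman"},
    {"Session": "AD-207", "Topic": None, "First name": "Margarita", "Last name": "Naumova"},
    {"Session": "BI-205", "Topic": "Dashboards", "First name": "Oliver", "Last name": "Engels"},
    {"Session": "DBA-305", "Topic": "Backups", "First name": None, "Last name": None},
    {"Session": "PD-203", "Topic": None, "First name": "Steve", "Last name": "Jones"},
]
for name, table in to_6nf(sessions, "Session").items():
    print(name, [r["Session"] for r in table])
```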
61
Optimal Normal Form JOIN
The same four-column table (Session | Topic | First name | Last name) is the JOIN of just two tables:
Session | Topic
AD-205 | Indexes
BI-205 | Dashboards
DBA-305 | Backups
Session | First name | Last name
AD-205 | Tim | Chapman
AD-207 | Margarita | Naumova
BI-205 | Oliver | Engels
PD-203 | Steve | Jones
62
Optimal Normal Form (ONF)
Based on fact-based modeling methods (e.g. ORM, NIAM).
Every elementary fact type becomes a table.
The end result is mostly 6NF, except in some situations: composite foreign keys and composite alternate keys.
No academic foundation (as far as I know).
63
Domain-Key Normal Form
Requirements for Domain-Key Normal Form (DK/NF):
It is NOT based on dependencies. It is based on:
domains (which values are allowed in a column?);
keys (all candidate keys);
constraints (rules for valid data).
Every constraint must be implied by the keys and domains.
DK/NF implies Fifth Normal Form (and lower), and probably implies Optimal Normal Form. It does not imply Sixth Normal Form, nor is it implied by Sixth Normal Form.
64
Domain-Key Normal Form
Relevance of Domain-Key Normal Form
Domains: declared and enforced (no code needed).
Keys: declared and enforced (no code needed).
Other constraints: code needed to enforce them.
Code = cost factor: time to write, time to test and debug, future maintenance.
65
Domain-Key Normal Form
Achievability of Domain-Key Normal Form
Sometimes impossible, e.g. “Every presenter delivers at least three sessions”.
Otherwise it often requires extra tables (subtypes), which introduces the need for more (and more complex) code for queries.
Code = cost factor: time to write, time to test and debug, future maintenance.
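A rule like “every presenter delivers at least three sessions” cannot be expressed as a key or a domain, so someone has to write, test, and maintain code for it. A Python sketch of such a check, using the speaker/session pairs from the First Normal Form example. In a real database it would have to run after every relevant change to either table (for instance from a trigger or the application), which is the cost this slide refers to.

```python
from collections import Counter
from typing import Dict, List

def presenters_below_minimum(presenters: List[str], sessions: List[Dict[str, str]],
                             minimum: int = 3) -> List[str]:
    """Presenters who deliver fewer than `minimum` sessions (including none at all)."""
    counts = Counter(s["Presenter"] for s in sessions)
    return [p for p in presenters if counts[p] < minimum]  # Counter returns 0 for unknown names

presenters = ["Allan Mitchell", "Oliver Engels"]
sessions = [
    {"Session": "AD-205", "Presenter": "Allan Mitchell"},
    {"Session": "AD-207", "Presenter": "Allan Mitchell"},
    {"Session": "BI-205", "Presenter": "Allan Mitchell"},
    {"Session": "DBA-305", "Presenter": "Oliver Engels"},
    {"Session": "PD-203", "Presenter": "Oliver Engels"},
]
print(presenters_below_minimum(presenters, sessions))  # ['Oliver Engels']
```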
68
Overview 1NF 2NF 3NF BCNF 4NF 5NF 6NF EKNF ONF DK/NF
69
THE END
Download deck: Click “Sessions”–“Schedule” … “Download”