Normalization Beyond Third Normal Form

Presentation transcript:

Normalization Beyond Third Normal Form Hugo Kornelis

Database design
Normalization is not boring: it is fun
Normalization is not hard: it is easy
Normalization is not unimportant: it is very important

Hugo Kornelis
Independent database consultant
Community addict
Speaker, blogger, author, technical editor, Pluralsight author, etc.
MVP (SQL Server/Data Platform) 2006-2016
Blog: http://sqlblog.com/blogs/hugo_kornelis
Email: hugo@perFact.info
Twitter: @Hugo_Kornelis


Overview
The basics …
  Key concepts
  Normalization up to Third Normal Form
… and beyond
  All “higher” normal forms
  EKNF, BCNF, 4NF, 5NF, DK/NF, ONF, 6NF


Key Concepts Universe of Discourse (UoD) Subset of reality … (or of a virtual reality) … as it is seen by the business If it’s not in the UoD, we don’t care.

Key Concepts
Purpose of normalization
Prevent data that is incorrect, impossible, inconsistent, or in violation of business rules
Normal forms are defined at a per-table level

Key Concepts
Functional dependency
Column A determines column B (A → B), if …
for each possible value for A, there is …
either one value for B or no value for B
but NEVER more than one value
Examples:
Chair number → Name
Chair number → Birthdate
Birthdate → Name
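The definition above can be tried out on sample data. The following is a minimal sketch, not part of the original deck: the function name `fd_holds` and the sample rows are assumptions made for illustration. Sample data can only refute a functional dependency; whether one really holds is decided by the business rules of the UoD.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for this key and returns it later
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data (not taken from the slides)
attendees = [
    {"Chair": 25, "Name": "Marge", "Birthdate": "1980-03-01"},
    {"Chair": 3, "Name": "William", "Birthdate": "1975-07-12"},
    {"Chair": 24, "Name": "Julie", "Birthdate": "1980-03-01"},
]

print(fd_holds(attendees, ["Chair"], ["Name"]))      # True
print(fd_holds(attendees, ["Birthdate"], ["Name"]))  # False: two names share a birthdate
```

Passing a list of columns as the determinant also covers composite dependencies such as {Room number, Chair number} → Name.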

Key Concepts
Functional dependency - terminology
These all mean the same:
Name depends on Badge number
Badge number determines Name
(short form) Badge number → Name
“Functional dependency” – sometimes “FD”

Key Concepts
Composite functional dependency
{A, B} → C if for each combination of A and B, there is …
either one value for C or no value for C
but NEVER more than one value
Example: {Room number, Chair number} → Name

Key Concepts
“Cheating” composite functional dependency
Badge number → Name
{Badge number, Chair number} → Name (Cheater! Badge number alone already determines Name)
These are irrelevant

Key Concepts
Composite the other way around??
A → {B, C}
Completely equivalent to A → B and A → C

Key Concepts
Candidate Key:
Within a table
Column (or combination of columns)
Determines every other column in the table
Columns: BadgeNo, Room, Chair, Name
Data: 123 A 25 Marge 124 B 3 William 126 24 Julie 127 André 128 C 5 Kathryn

Key Concepts Candidate Key May be more than one One “Primary” key Rest “Alternate” key
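As a rough illustration of "may be more than one", the sketch below lists every minimal column combination that is unique in a sample table. The function name `candidate_keys` and the Room and Chair values for badges 126 and 127 are assumptions made for illustration. Uniqueness in sample data only suggests a key; the UoD decides.

```python
from itertools import combinations


def candidate_keys(rows, columns):
    """Return the minimal column combinations whose values are unique across rows."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            projected = [tuple(r[c] for c in combo) for r in rows]
            if len(set(projected)) == len(projected):
                keys.append(combo)
    return keys


# Sample data; Room/Chair for badges 126 and 127 are hypothetical values
badges = [
    {"BadgeNo": 123, "Room": "A", "Chair": 25, "Name": "Marge"},
    {"BadgeNo": 124, "Room": "B", "Chair": 3, "Name": "William"},
    {"BadgeNo": 126, "Room": "B", "Chair": 24, "Name": "Julie"},
    {"BadgeNo": 127, "Room": "A", "Chair": 3, "Name": "André"},
    {"BadgeNo": 128, "Room": "C", "Chair": 5, "Name": "Kathryn"},
]

print(candidate_keys(badges, ["BadgeNo", "Room", "Chair", "Name"]))
# [('BadgeNo',), ('Name',), ('Room', 'Chair')]
```

Name shows up as a key only because no two people in this small sample share a name. In most UoDs that is a coincidence of the data, not a business rule, so it would not be declared as a key.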

First Normal Form Table is in First Normal Form (1NF) if Table has at least one candidate key All columns are atomic No repeating groups No composite values Depends highly on UoD!

First Normal Form
No repeating groups
Before:
Speaker         Country         Sessions
Allan Mitchell  United Kingdom  AD-205, AD-207, BI-205
Oliver Engels   Germany         DBA-305, PD-203
After:
Speaker         Country         Session
Allan Mitchell  United Kingdom  AD-205
Allan Mitchell  United Kingdom  AD-207
Allan Mitchell  United Kingdom  BI-205
Oliver Engels   Germany         DBA-305
Oliver Engels   Germany         PD-203
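Removing a repeating group amounts to producing one row per list item. A minimal sketch using the speaker data from this slide; the helper name `split_repeating_group` is an assumption made for illustration.

```python
def split_repeating_group(rows, column, new_name, sep=","):
    """Turn one row with a delimited list into one row per list item."""
    result = []
    for row in rows:
        for item in row[column].split(sep):
            new_row = {k: v for k, v in row.items() if k != column}
            new_row[new_name] = item.strip()
            result.append(new_row)
    return result


speakers = [
    {"Speaker": "Allan Mitchell", "Country": "United Kingdom",
     "Sessions": "AD-205, AD-207, BI-205"},
    {"Speaker": "Oliver Engels", "Country": "Germany",
     "Sessions": "DBA-305, PD-203"},
]

for row in split_repeating_group(speakers, "Sessions", "Session"):
    print(row)
```

The result has five rows, one per speaker and session, with Speaker and Country repeated on each.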

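First Normal Form
No composite values
Before:
Speaker         Country         Session
Allan Mitchell  United Kingdom  AD-205
Allan Mitchell  United Kingdom  AD-207
Allan Mitchell  United Kingdom  BI-205
Oliver Engels   Germany         DBA-305
Oliver Engels   Germany         PD-203
After:
Speaker         Country         Track  Session No
Allan Mitchell  United Kingdom  AD     205
Allan Mitchell  United Kingdom  AD     207
Allan Mitchell  United Kingdom  BI     205
Oliver Engels   Germany         DBA    305
Oliver Engels   Germany         PD     203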

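First Normal Form
No composite values
Before:
Speaker         Country         Session
Allan Mitchell  United Kingdom  AD-205
Allan Mitchell  United Kingdom  AD-207
Allan Mitchell  United Kingdom  BI-205
Oliver Engels   Germany         DBA-305
Oliver Engels   Germany         PD-203
After:
First name  Last name  Country         Session
Allan       Mitchell   United Kingdom  AD-205
Allan       Mitchell   United Kingdom  AD-207
Allan       Mitchell   United Kingdom  BI-205
Oliver      Engels     Germany         DBA-305
Oliver      Engels     Germany         PD-203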

Second Normal Form Table is in Second Normal Form (2NF) if Table is in First Normal Form Non-key columns depend on the whole keys Only applies for columns that are not part of any key Can only be violated if at least one key is composite

Second Normal Form
Non-key columns depend on the whole keys
Columns: Session, Room, Start time, Room capacity
Data: AD-205 Blue 10:00 125 AD-207 Grand 12:45 228 BI-205 605 16:00 60 DBA-305 PD-203 225 14:30
Session → Room capacity
{Room, Start time} → Room capacity (Cheater!)
Room → Room capacity

Second Normal Form
Non-key columns depend on the whole keys
Columns: Session, Room, Start time, Room capacity
Data: AD-205 Blue 10:00 125 AD-207 Grand 12:45 228 BI-205 605 16:00 60 DBA-305 PD-203 225 14:30
Room → Room capacity
Columns: Room, Room capacity
Data: Blue 125 Grand 228 605 60 225

Second Normal Form Non-key columns depend on the whole keys Session Room Start time AD-205 Blue 10:00 AD-207 Grand 12:45 BI-205 605 16:00 DBA-305 PD-203 225 14:30 Room Room capacity Blue 125 Grand 228 605 60 225
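The split shown on these slides is a pair of projections: the dependent column moves to a new table keyed by its determinant (Room), and it is dropped from the original table. A minimal sketch, assuming a helper named `project`; the Room, Start time and Room capacity values for DBA-305 and the capacity of room 225 are hypothetical fill-ins.

```python
def project(rows, columns):
    """Distinct projection of rows onto the given columns, keeping first-seen order."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# DBA-305's room/time/capacity and room 225's capacity are assumed values
schedule = [
    {"Session": "AD-205", "Room": "Blue", "Start time": "10:00", "Room capacity": 125},
    {"Session": "AD-207", "Room": "Grand", "Start time": "12:45", "Room capacity": 228},
    {"Session": "BI-205", "Room": "605", "Start time": "16:00", "Room capacity": 60},
    {"Session": "DBA-305", "Room": "Grand", "Start time": "16:00", "Room capacity": 228},
    {"Session": "PD-203", "Room": "225", "Start time": "14:30", "Room capacity": 60},
]

sessions = project(schedule, ["Session", "Room", "Start time"])
rooms = project(schedule, ["Room", "Room capacity"])

print(len(sessions), len(rooms))  # 5 4
```

Each room's capacity is now stored once, however many sessions use that room.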

Third Normal Form
Table is in Third Normal Form (3NF) if
Table is in Second Normal Form
Non-key columns depend on nothing but the keys
Only applies for columns that are not part of any key
Violated if non-key column depends on …
… one or more other non-key columns
… two or more key columns that are part of different keys
… one or more non-key columns combined with one or more key columns
Not violated if non-key column depends on …
… one or more columns that are all part of the same key (because that would already violate 2NF)

Third Normal Form
Non-key columns depend on nothing but the keys
Columns: Badge number, Speaker, Session
Data: 123 Oliver Engels DBA-305 124 Allan Mitchell BI-205 126 127 128 Lara Rubbelke AD-101
Badge number → Speaker
Badge number → Session
Speaker → Session
Session → Speaker

Third Normal Form
Non-key columns depend on nothing but the keys
Columns: Badge number, Speaker, Session
Data: 123 Oliver Engels DBA-305 124 Allan Mitchell BI-205 126 127 128 Lara Rubbelke AD-101
Session → Speaker
Session   Speaker
DBA-305   Oliver Engels
BI-205    Allan Mitchell
AD-101    Lara Rubbelke

Third Normal Form Non-key columns depend on nothing but the keys Badge number Session 123 DBA-305 124 BI-205 126 127 128 AD-101 Session Speaker DBA-305 Oliver Engels BI-205 Allan Mitchell AD-101 Lara Rubbelke

Summary
Table is in Third Normal Form if every non-key column depends on
The keys,
The whole keys,
And nothing but the keys (so help me Codd)
Bernstein’s algorithm for synthesis of a Third Normal Form schema
Dr. E.F. Codd
Illustration: Michael J. Swart

Boyce-Codd Normal Form
Remember Third Normal Form?
Every non-key column depends on the keys, the whole keys, and nothing but the keys
Here’s Boyce-Codd Normal Form (BCNF):
Every column depends on the keys, the whole keys, and nothing but the keys

Boyce-Codd Normal Form
Key columns depend on the whole keys
Columns: Track, Session number, Room, Start time
Data: AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 225 14:30
{Track, Session number} → Room (Cheater!)
{Track, Session number} → Start time
{Room, Start time} → Session number
{Room, Start time} → Track (Cheater!)
Track → Room
Room → Track

Boyce-Codd Normal Form
Key columns depend on the whole keys
Columns: Track, Session number, Room, Start time
Data: AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 225 14:30
Track → Room
Room → Track
Track  Room
AD     Blue
BI     605
DBA    Grand
PD     225

Boyce-Codd Normal Form Key columns depend on the whole keys Track Session number Room Start time AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 225 14:30 Track Room AD Blue BI 605 DBA Grand PD 225

Boyce-Codd Normal Form Key columns depend on the whole keys ? Track Session number Start time AD 205 10:00 207 12:45 BI 16:00 DBA 305 PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD 225

Boyce-Codd Normal Form Key columns depend on the whole keys Track Session number Start time AD 205 10:00 207 12:45 BI 16:00 DBA 305 PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD 225


Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Columns: Track, Session number, Room, Start time
Data: AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 14:30
{Track, Session number} → Room (Cheater!)
{Track, Session number} → Start time
{Room, Start time} → Session number
{Room, Start time} → Track
Track → Room
Room → Track

Boyce-Codd Normal Form
Key columns depend on the whole keys
Not always possible to achieve
Columns: Track, Session number, Room, Start time
Data: AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 14:30
Track → Room
Room → Track
Columns: Track, Room
Data: AD Blue BI 605 DBA Grand PD

Boyce-Codd Normal Form Key columns depend on the whole keys Not always possible to achieve ? Track Session number Start time AD 205 10:00 207 12:45 BI 16:00 DBA 305 PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD

Boyce-Codd Normal Form Key columns depend on the whole keys Not always possible to achieve Track Session number Start time AD 205 10:00 207 12:45 BI 16:00 DBA 305 PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD 14:30

Boyce-Codd Normal Form Key columns depend on the whole keys Not always possible to achieve Alternative form is not safe either Track Session number Room Start time AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD 605

Boyce-Codd Normal Form Key columns depend on the whole keys Not always possible to achieve Alternative form is not safe either, unless you add a “weird” foreign key Track Session number Room Start time AD 205 Blue 10:00 207 12:45 BI 605 16:00 DBA 305 Grand PD 203 14:30 Track Room AD Blue BI 605 DBA Grand PD
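One way to spot a BCNF violation mechanically is to compute the attribute closure of each determinant and check whether it covers the whole table. The sketch below applies that standard textbook technique (it is not shown in the deck) to the dependencies listed on the first BCNF slide; the function names `closure` and `bcnf_violations` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute that is functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, never a violation
        if closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations


attrs = ["Track", "Session number", "Room", "Start time"]
fds = [
    (("Track", "Session number"), ("Room", "Start time")),
    (("Room", "Start time"), ("Track", "Session number")),
    (("Track",), ("Room",)),
    (("Room",), ("Track",)),
]

print(bcnf_violations(attrs, fds))
# [(('Track',), ('Room',)), (('Room',), ('Track',))]
```

Track and Room each determine only {Track, Room}, so neither is a key of the four-column table. Those are the two dependencies the slides single out.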

Elementary Key Normal Form Remember Third Normal Form? Every non-key column depends on the keys, the whole keys, and nothing but the keys Remember Boyce-Codd Normal Form? Every column depends on the keys, the whole keys, and nothing but the keys Here’s Elementary Key Normal Form (EKNF): Every non-elementary key column depends on the keys, the whole keys, and nothing but the keys

Elementary Key Normal Form
What is an elementary key?
Based on elementary dependencies
{A, B} → C is not elementary if C → A or C → B
Elementary key is any key that implements at least one elementary dependency
EKNF is same as 3NF, except for columns in non-elementary keys
Highest normal form that is guaranteed achievable
Bernstein’s algorithm for synthesis of a Third Normal Form schema
Does not solve BCNF violations

Fourth Normal Form
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
Reading 1: “On Monday, you can ask Erland about Design”: Fourth Normal Form not violated
Reading 2: “On Monday, you can ask Erland questions” and “Erland knows about Design”: ?

Fourth Normal Form
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
Reading 1: “On Monday, you can ask Erland about Design”: ?
Reading 2: “On Monday, you can ask Erland questions” and “Erland knows about Design”: Fourth Normal Form IS violated!
Facts are represented multiple times

Fourth Normal Form
“On Monday, you can ask Erland about Design”
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
“On Monday, you can ask Erland questions”
Columns: Day, Expert
Data: Monday Erland Oliver Tuesday Wednesday Hugo
“Erland knows about Design”
Columns: Expert, Subject
Data: Erland Design Tuning Oliver BI Hugo

Fourth Normal Form Table is in Fourth Normal Form (4NF) if Table is in Boyce-Codd Normal Form No multivalued dependencies between subset of columns There will always (by definition) be multivalued dependencies between all columns in the table

Fourth Normal Form
Multivalued dependency
Column A “multidetermines” column B (A ↠ B), if …
for each possible value for A, there are …
zero, one or more values for B, regardless of values in other columns
Examples:
Session ↠ Attendee
Presenter ↠ Attendee

Fourth Normal Form
Composite multivalued dependency
{A, B} ↠ C if for each combination of A and B, there are …
zero, one or more values for C, regardless of values in other columns
Example: {Conference, Session} ↠ Attendee

Fourth Normal Form Composite the other way around?? A ↠ {B, C} Is NOT equivalent to A ↠ B and A ↠ C

Fourth Normal Form
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
“On Monday, you can ask Erland about Design”: Expert ↠ {Day, Subject}
Fourth Normal Form not violated
“On Monday, you can ask Erland questions” and “Erland knows about Design”: Expert ↠ Day and Expert ↠ Subject

Fourth Normal Form
“On Monday, you can ask Erland about Design”: Expert ↠ {Day, Subject}
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
“On Monday, you can ask Erland questions”: Expert ↠ Day
Columns: Day, Expert
Data: Monday Erland Oliver Tuesday Wednesday Hugo
“Erland knows about Design”: Expert ↠ Subject
Columns: Expert, Subject
Data: Erland Design Tuning Oliver BI Hugo
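In a three-column table, Expert ↠ Day holds exactly when the table equals the join of its (Expert, Day) and (Expert, Subject) projections. The sketch below tests that on sample rows; the function name `mvd_holds` and the rows themselves are assumptions made for illustration, since the slide's table cannot be recovered in full.

```python
def mvd_holds(rows, lhs, rhs, columns):
    """Check lhs ->> rhs by comparing the table with the join of its two projections."""
    rest = [c for c in columns if c not in lhs and c not in rhs]
    original = {tuple(r[c] for c in columns) for r in rows}
    left = {(tuple(r[c] for c in lhs), tuple(r[c] for c in rhs)) for r in rows}
    right = {(tuple(r[c] for c in lhs), tuple(r[c] for c in rest)) for r in rows}
    joined = set()
    for key_l, rhs_vals in left:
        for key_r, rest_vals in right:
            if key_l == key_r:
                merged = dict(zip(lhs, key_l))
                merged.update(zip(rhs, rhs_vals))
                merged.update(zip(rest, rest_vals))
                joined.add(tuple(merged[c] for c in columns))
    return joined == original


columns = ["Day", "Expert", "Subject"]
# Hypothetical rows: every day of an expert is paired with every subject of that expert
rows = [
    {"Day": "Monday", "Expert": "Erland", "Subject": "Design"},
    {"Day": "Monday", "Expert": "Erland", "Subject": "Tuning"},
    {"Day": "Tuesday", "Expert": "Erland", "Subject": "Design"},
    {"Day": "Tuesday", "Expert": "Erland", "Subject": "Tuning"},
    {"Day": "Monday", "Expert": "Oliver", "Subject": "BI"},
    {"Day": "Wednesday", "Expert": "Hugo", "Subject": "Design"},
]

print(mvd_holds(rows, ["Expert"], ["Day"], columns))  # True
```

With these rows the two-table split loses nothing. Remove the row for Tuesday, Erland, Tuning and the check returns False: the rows then record specific day and subject combinations, which is the single-table reading.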

Fifth Normal Form
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
“On Monday, you can ask Erland about Design”: Expert ↠ {Day, Subject}
“On Monday, you can ask Erland questions”: Expert ↠ Day
“Erland knows about Design”: Expert ↠ Subject
“On Monday, you can ask about design”
Fourth Normal Form not violated …
… but Fifth Normal Form IS violated!

Fifth Normal Form
“On Monday, you can ask Erland about Design”: Expert ↠ {Day, Subject}
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
“On Monday, you can ask about design”
Columns: Day, Subject
Data: Monday Design Tuning BI Tuesday Wednesday
“On Monday, you can ask Erland questions”: Expert ↠ Day
Columns: Day, Expert
Data: Monday Erland Oliver Tuesday Wednesday Hugo
“Erland knows about Design”: Expert ↠ Subject
Columns: Expert, Subject
Data: Erland Design Tuning Oliver BI Hugo

Fifth Normal Form Table is in Fifth Normal Form (5NF) if Table is in Fourth Normal Form No join dependencies, unless implied by a key

Fifth Normal Form
Join dependency
Columns: Day, Expert, Subject
Data: Monday Erland Design Tuning Oliver BI Tuesday Wednesday Hugo
JOIN of
Columns: Day, Subject
Data: Monday Design Tuning BI Tuesday Wednesday
Columns: Day, Expert
Data: Monday Erland Oliver Tuesday Wednesday Hugo
Columns: Expert, Subject
Data: Erland Design Tuning Oliver BI Hugo
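A join dependency over three projections can be tested the same way: project the table onto each pair of columns, join the three projections back, and compare with the original. The sketch below uses the hypothetical name `join_dependency_holds` and hypothetical rows, chosen so that the three-way join reproduces the table although Expert ↠ Day does not hold.

```python
def join_dependency_holds(rows):
    """rows: iterable of (day, expert, subject) tuples."""
    rows = set(rows)
    day_expert = {(d, e) for d, e, s in rows}
    expert_subject = {(e, s) for d, e, s in rows}
    day_subject = {(d, s) for d, e, s in rows}
    # Join the three projections back together
    rejoined = {
        (d, e, s)
        for d, e in day_expert
        for e2, s in expert_subject
        if e == e2 and (d, s) in day_subject
    }
    return rejoined == rows


# Hypothetical rows: Erland is available on Tuesday, but Design is not offered then
schedule = [
    ("Monday", "Erland", "Design"),
    ("Monday", "Erland", "Tuning"),
    ("Monday", "Oliver", "BI"),
    ("Tuesday", "Erland", "Tuning"),
    ("Wednesday", "Hugo", "Design"),
]

print(join_dependency_holds(schedule))  # True
```

Adding the row ("Tuesday", "Hugo", "Design") makes the check fail: the join would then also produce ("Tuesday", "Erland", "Design"), a row that was never stored.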

Sixth Normal Form Remember Fifth Normal Form? No join dependencies, unless implied by a key Here’s Sixth Normal Form (6NF):

Sixth Normal Form
Session   Topic        First name  Last name
AD-205    Indexes      Tim         Chapman
AD-207    Performance  Margarita   Naumova
BI-205    Dashboards   Oliver      Engels
DBA-305   Backups      Grant       Fritchey
PD-203    Negotiating  Steve       Jones
JOIN of
Session   Topic
AD-205    Indexes
AD-207    Performance
BI-205    Dashboards
DBA-305   Backups
PD-203    Negotiating
Session   First name
AD-205    Tim
AD-207    Margarita
BI-205    Oliver
DBA-305   Grant
PD-203    Steve
Session   Last name
AD-205    Chapman
AD-207    Naumova
BI-205    Engels
DBA-305   Fritchey
PD-203    Jones

Sixth Normal Form JOIN Session Topic First name Last name AD-205 Indexes Tim Chapman AD-207 NULL Margarita Naumova BI-205 Dashboards Oliver Engels DBA-305 Backups PD-203 Steve Jones Session Topic AD-205 Indexes AD-207 Performance BI-205 Dashboards DBA-305 Backups PD-203 Negotiating Session First name AD-205 Tim AD-207 Margarita BI-205 Oliver DBA-305 Grant PD-203 Steve Session Last name AD-205 Chapman AD-207 Naumova BI-205 Engels DBA-305 Fritchey PD-203 Jones

Sixth Normal Form
Session   Topic        First name  Last name
AD-205    Indexes      Tim         Chapman
AD-207    NULL         Margarita   Naumova
BI-205    Dashboards   Oliver      Engels
DBA-305   Backups      NULL        NULL
PD-203    NULL         Steve       Jones
JOIN of
Session   Topic
AD-205    Indexes
BI-205    Dashboards
DBA-305   Backups
Session   First name
AD-205    Tim
AD-207    Margarita
BI-205    Oliver
PD-203    Steve
Session   Last name
AD-205    Chapman
AD-207    Naumova
BI-205    Engels
PD-203    Jones
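The slides suggest that in 6NF a missing value is simply a missing row in one of the narrow tables, and that NULL only appears when the wide table is reassembled. A minimal sketch of that reassembly, using dictionaries keyed by Session as stand-ins for the narrow tables; the function name `reassemble` is an assumption made for illustration.

```python
# One narrow "table" per non-key column, keyed by Session (data from the slide)
topics = {"AD-205": "Indexes", "BI-205": "Dashboards", "DBA-305": "Backups"}
first_names = {"AD-205": "Tim", "AD-207": "Margarita", "BI-205": "Oliver", "PD-203": "Steve"}
last_names = {"AD-205": "Chapman", "AD-207": "Naumova", "BI-205": "Engels", "PD-203": "Jones"}


def reassemble(*attribute_tables):
    """Outer-join the narrow tables on their key; absent rows show up as None."""
    sessions = sorted(set().union(*attribute_tables))
    return [(s, *(t.get(s) for t in attribute_tables)) for s in sessions]


for row in reassemble(topics, first_names, last_names):
    print(row)
# ('AD-205', 'Indexes', 'Tim', 'Chapman')
# ('AD-207', None, 'Margarita', 'Naumova')
# ('BI-205', 'Dashboards', 'Oliver', 'Engels')
# ('DBA-305', 'Backups', None, None)
# ('PD-203', None, 'Steve', 'Jones')
```

`None` plays the role of NULL here. None of the three narrow tables stores a missing value.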

Optimal Normal Form
Session   Topic        First name  Last name
AD-205    Indexes      Tim         Chapman
AD-207    NULL         Margarita   Naumova
BI-205    Dashboards   Oliver      Engels
DBA-305   Backups      NULL        NULL
PD-203    NULL         Steve       Jones
JOIN of
Session   Topic
AD-205    Indexes
BI-205    Dashboards
DBA-305   Backups
Session   First name  Last name
AD-205    Tim         Chapman
AD-207    Margarita   Naumova
BI-205    Oliver      Engels
PD-203    Steve       Jones

Optimal Normal Form
Optimal Normal Form (ONF)
Based on fact-based modeling methods (e.g. ORM, NIAM)
Every elementary fact type becomes a table
End result is mostly 6NF, …
… except in some situations
  Composite foreign keys
  Composite alternate keys
No academic foundation (as far as I know)

Domain-Key Normal Form
Requirements for Domain-Key Normal Form (DK/NF):
Is NOT based on dependencies
Based on:
  Domains: Which values are allowed in a column?
  Keys: All candidate keys
  Constraints: Rules for valid data
Every constraint must be implied by the keys and domains
Implies Fifth Normal Form (and lower)
Probably implies Optimal Normal Form
Does not imply Sixth Normal Form, nor is it implied by Sixth Normal Form

Domain-Key Normal Form
Relevance of Domain-Key Normal Form
Domains → declared, enforced (no code needed)
Keys → declared, enforced (no code needed)
Other constraints → code needed to enforce
Code = cost factor:
  Time to write
  Time to test and debug
  Future maintenance

Domain-Key Normal Form Achievability of Domain-Key Normal Form Sometimes impossible “Every presenter delivers at least three sessions” Otherwise often requires extra tables (subtypes) Introduces need for more (and more complex) code for queries Code = cost factor: Time to write Time to test and debug Future maintenance

Overview 1NF 2NF 3NF BCNF 4NF 5NF 6NF

Overview 1NF 2NF 3NF BCNF 4NF 5NF 6NF EKNF

Overview 1NF 2NF 3NF BCNF 4NF 5NF 6NF EKNF ONF DK/NF

T H E E N D
Email: hugo@perFact.info
Download deck: http://www.sqlsaturday.com/637
Click “Sessions”–“Schedule” … “Download”