NORMALIZATION, Joe Meehean

Presentation transcript:

NORMALIZATION [slide 1]
Joe Meehean

REDUNDANCIES [slide 2]
- Repeated data in a database
  - wastes space
  - can cause modification anomalies
    - unexpected side effects when changing data
    - make building software on top of the DB difficult
- Normalization
  - the process of removing redundancies

MODIFICATION ANOMALIES [slide 3]
- Insert anomaly
  - extra data must be known to insert a row into a table
- Update anomaly
  - must change multiple rows to modify a single fact
- Deletion anomaly
  - deleting a row causes other data to be deleted
  - deletes more data than is necessary or desired

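BAD COLLEGE DATABASE [slide 4]
- All data in 1 table

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Fall   | 2011 | C-    | C1        | DB
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | OS
S2    | Emily      | Blem      | O3       | Spring | 2012 | A+    | C3        | PL
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | OS
S3    | Roger      | Cook      | O4       | Spring |      |       | C1        | DB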

BAD COLLEGE DATABASE [slide 5]
- Insert anomaly
  - adding Rush Daniels as a student requires knowing which offerings Rush is enrolled in
  - cannot add Rush as a student until he enrolls

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Fall   | 2011 | C-    | C1        | DB
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | OS
S2    | Emily      | Blem      | O3       | Spring | 2012 | A+    | C3        | PL
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | OS
S3    | Roger      | Cook      | O4       | Spring |      |       | C1        | DB

BAD COLLEGE DATABASE [slide 6]
- Update anomaly
  - if Emily changes her name to Emma, multiple rows need to change

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Fall   | 2011 | C-    | C1        | DB
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | OS
S2    | Emily      | Blem      | O3       | Spring | 2012 | A+    | C3        | PL
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | OS
S3    | Roger      | Cook      | O4       | Spring |      |       | C1        | DB

BAD COLLEGE DATABASE [slide 7]
- Delete anomaly
  - if Roger drops out of college and we delete him
  - we also delete the fact that there is an offering of DB in the spring

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Fall   | 2011 | C-    | C1        | DB
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | OS
S2    | Emily      | Blem      | O3       | Spring | 2012 | A+    | C3        | PL
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | OS
S3    | Roger      | Cook      | O4       | Spring |      |       | C1        | DB
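The three anomalies can be reproduced on the single-table design with a few lines of Python. This is only an illustrative sketch: the tuple layout, the `None` placeholders for blank cells, and the student number `S4` for Rush Daniels are assumptions made here, not part of the slides.

```python
# Rows mirror the single-table design; column order and None for blank
# cells are assumptions made for this sketch.
COLUMNS = ("StdNo", "First", "Last", "OfferNo", "Term",
           "Year", "Grade", "CourseNo", "CourseDescr")
table = [
    ("S1", "Phil", "Park", "O1", "Fall", 2011, "C-", "C1", "DB"),
    ("S1", "Phil", "Park", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S2", "Emily", "Blem", "O3", "Spring", 2012, "A+", "C3", "PL"),
    ("S2", "Emily", "Blem", "O2", "Fall", 2011, "B+", "C2", "OS"),
    ("S3", "Roger", "Cook", "O4", "Spring", None, None, "C1", "DB"),
]

# Update anomaly: one fact (student S2's name) lives in several rows.
rows_to_update = [row for row in table if row[0] == "S2"]
print(len(rows_to_update))

# Delete anomaly: removing student S3 also removes every trace of offering O4.
def offerings(rows):
    return {row[3] for row in rows}

after_delete = [row for row in table if row[0] != "S3"]
lost_offerings = offerings(table) - offerings(after_delete)
print(lost_offerings)

# Insert anomaly: a student with no enrollment leaves most columns unknown.
# "S4" is a hypothetical student number.
new_student = ("S4", "Rush", "Daniels", None, None, None, None, None, None)
missing = [col for col, value in zip(COLUMNS, new_student) if value is None]
print(missing)
```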

FUNCTIONAL DEPENDENCIES (FDs) [slide 8]
- Constraint between 2 or more columns
- Represented by →
- X determines Y (X → Y)
  - if there exists at most 1 value of Y for each value of X
  - like a mathematical function, f(x) = y
  - the left-hand side (LHS) is called the determinant
  - e.g., StdNo determines the student's first name: StdNo → First Name
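The "at most 1 value of Y for each value of X" rule can be checked against sample rows. The sketch below is a hypothetical helper (`fd_holds` is a name chosen here), and it carries an important caveat: sample data can refute an FD, but it cannot prove one, because an FD is a statement about all valid data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no lhs value maps to two different rhs values in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# A few columns of the college table, enough for the example.
rows = [
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O1", "Grade": "C-"},
    {"StdNo": "S1", "FirstName": "Phil", "OfferNo": "O2", "Grade": "B+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O3", "Grade": "A+"},
    {"StdNo": "S2", "FirstName": "Emily", "OfferNo": "O2", "Grade": "B+"},
]

print(fd_holds(rows, ["StdNo"], ["FirstName"]))         # True
print(fd_holds(rows, ["StdNo"], ["Grade"]))             # False
print(fd_holds(rows, ["StdNo", "OfferNo"], ["Grade"]))  # True
```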

ORGANIZING FDs [slide 9]
- Make a list
  - can condense the list by listing all dependent columns for a given determinant
  - e.g., StdNo → First Name, Last Name
- Determinants should be minimal
  - the least number of columns required to determine the values of other columns
  - e.g., StdNo, First Name → Last Name is not minimal, because StdNo alone determines Last Name
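One way to test whether a determinant is minimal is to drop each LHS column in turn and see whether the remaining columns still determine the RHS. The sketch below does this with an attribute closure; `closure` and `extraneous_columns` are names introduced here for illustration, and the FD list is an assumed input.

```python
def closure(attrs, fds):
    """All columns determined by attrs under fds (pairs of column tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_columns(lhs, rhs, fds):
    """LHS columns that can be dropped while still determining rhs."""
    extra = []
    for col in lhs:
        reduced = set(lhs) - {col}
        if reduced and set(rhs) <= closure(reduced, fds):
            extra.append(col)
    return extra

known_fds = [(("StdNo",), ("FirstName", "LastName"))]
print(extraneous_columns(("StdNo", "FirstName"), ("LastName",), known_fds))
# ['FirstName']
```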

BAD COLLEGE DATABASE [slide 10]
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No, Course Descr.
- StdNo, Offer No → Grade

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Fall   | 2011 | C-    | C1        | DB
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | OS
S2    | Emily      | Blem      | O3       | Spring | 2012 | A+    | C3        | PL
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | OS
S3    | Roger      | Cook      | O4       | Spring |      |       | C1        | DB

IDENTIFYING FDs [slide 11]
- From a business narrative
- Look for words like "unique"
  - e.g., "Each student has a unique student number, a first name, and a last name."
- Look for 1-M relationships
  - the child (M-side) is the determinant (LHS)
  - e.g., "Faculty teach many offerings."
  - e.g., Offer No → Faculty Id

IDENTIFYING FDs [slide 12]
- From relational tables
- FDs where the determinant (LHS) is not the PK or a candidate key
  - recall, a candidate key is a column (or set of columns) that uniquely identifies a row
  - e.g., Zip → State
- Combined PKs
  - does 1 column of the key determine the values of some other columns?
  - e.g., StdNo → First Name, Last Name

QUESTIONS? [slide 13]

NORMAL FORMS [slide 14]
- Normalization
  - removes redundancies in tables
  - removes modification anomalies
  - makes data easier to modify
- Normal form
  - rules about which functional dependencies (FDs) are allowed
  - each successive normal form prohibits more kinds of FDs

NORMAL FORMS [slide 15]
(Diagram labels: 1NF, 2NF, 3NF/BCNF)

1ST NORMAL FORM [slide 16]
- All relational tables are already in 1NF by definition

2ND NORMAL FORM [slide 17]
- Key columns
  - columns that are part of (or all of) a candidate key
  - recall, a candidate key is a key that uniquely identifies a row
- Non-key columns
  - columns that are not part of a candidate key

2ND NORMAL FORM [slide 18]
- A table is in 2NF if each non-key column depends on whole candidate keys
  - NOT on a proper subset of any candidate key
  - check the functional dependencies (FDs)
- A 2NF violation
  - an FD where part of a key determines a non-key column

2ND NORMAL FORM [slide 19]
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Spring | 2012 | --    | C1        | PL
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | DB
S2    | Emily      | Blem      | O3       | Spring | 2012 | --    | C3        | OS
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | DB
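Detecting 2NF violations is mechanical once the FDs and candidate keys are written down: flag every FD whose determinant is a proper subset of a candidate key and whose right-hand side contains a non-key column. The following is a minimal sketch; the function name and the tuple encoding of FDs are assumptions made here.

```python
def second_nf_violations(fds, candidate_keys):
    """FDs where part of a candidate key determines non-key columns."""
    key_columns = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        # "<" on sets is the proper-subset test
        is_partial_key = any(set(lhs) < set(key) for key in candidate_keys)
        non_key_rhs = sorted(set(rhs) - key_columns)
        if is_partial_key and non_key_rhs:
            violations.append((lhs, non_key_rhs))
    return violations

college_fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo", "CourseDescr")),
    (("StdNo", "OfferNo"), ("Grade",)),
]
college_keys = [("StdNo", "OfferNo")]

for lhs, rhs in second_nf_violations(college_fds, college_keys):
    print(lhs, "->", rhs)
```

The FD with the full key (StdNo, OfferNo) as its determinant is not reported, since it depends on the whole key.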

3RD NORMAL FORM [slide 20]
- A table is in 3NF if it is in 2NF AND each non-key column depends only on candidate keys
  - NOT on other non-key columns
  - e.g., Course No → Course Descr. is not allowed
- 3NF violation
  - a non-key column on the right-hand side (RHS), AND
  - anything other than a candidate key on the LHS

3RD NORMAL FORM [slide 21]
- 3NF prohibits transitive dependencies
- Transitive dependencies
  - if A → B & B → C, then A → C
  - e.g., Offer No → Course No & Course No → Course Descr., then Offer No → Course Descr.
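Transitive dependencies can be derived automatically by computing an attribute closure: start from a set of columns and keep applying FDs until nothing new is added. This is the standard closure algorithm, sketched here with an assumed tuple encoding of the college FDs.

```python
def closure(attrs, fds):
    """All columns functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo")),
    (("CourseNo",), ("CourseDescr",)),
    (("StdNo", "OfferNo"), ("Grade",)),
]

# CourseDescr is reached transitively: OfferNo -> CourseNo -> CourseDescr.
print(sorted(closure({"OfferNo"}, fds)))
# ['CourseDescr', 'CourseNo', 'OfferNo', 'Term', 'Year']
```

The same function also confirms a candidate key: the closure of {StdNo, OfferNo} covers every column of the table.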

COMBINED 2NF & 3NF [slide 22]
- A table is in 3NF if each non-key column depends on
  - all candidate keys,
  - whole candidate keys,
  - and nothing but candidate keys

3RD NORMAL FORM [slide 23]
- 2NF violations
  - StdNo → First Name, Last Name
  - Offer No → Term, Year, Course No, Course Descr.
- 3NF violations
  - Course No → Course Descr.
  - Offer No → Course Descr.

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Spring | 2012 | --    | C1        | PL
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | DB
S2    | Emily      | Blem      | O3       | Spring | 2012 | --    | C3        | OS
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | DB
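The slides' 3NF test (a non-key column on the RHS and anything other than a candidate key on the LHS) translates directly into code. This sketch follows that wording literally, so FDs that already violate 2NF are reported as well; the function name and FD encoding are assumptions made here.

```python
def third_nf_violations(fds, candidate_keys):
    """FDs with a non-key column on the RHS and a non-candidate-key LHS."""
    keys = [set(key) for key in candidate_keys]
    key_columns = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        non_key_rhs = sorted(set(rhs) - key_columns)
        if non_key_rhs and set(lhs) not in keys:
            violations.append((lhs, non_key_rhs))
    return violations

table_fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo", "CourseDescr")),
    (("CourseNo",), ("CourseDescr",)),
    (("StdNo", "OfferNo"), ("Grade",)),
]

for lhs, rhs in third_nf_violations(table_fds, [("StdNo", "OfferNo")]):
    print(lhs, "->", rhs)
```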

BOYCE-CODD NORMAL FORM (BCNF) [slide 24]
- Revised, simpler version of 3NF
- Covers additional special cases
- A table is in BCNF if every determinant is a candidate key
- Violations are easy to detect
  - the determinant (LHS) is not a candidate key
  - e.g., StdNo → Last Name

BOYCE-CODD NORMAL FORM (BCNF) [slide 25]
- Excludes 2 redundancies that 3NF does not
  1. part of a key determines part of a key
  2. a non-key determines part of a key
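A BCNF check needs only the FD list: a determinant passes if its closure covers every column of the table. The sketch below tests for a superkey rather than a candidate key, which is equivalent when determinants have already been made minimal. The helper names are assumptions made here.

```python
def closure(attrs, fds):
    """All columns functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(columns, fds):
    """FDs whose determinant does not determine every column of the table."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(columns)]

columns = ["StdNo", "FirstName", "LastName", "OfferNo", "Term",
           "Year", "Grade", "CourseNo", "CourseDescr"]
fds = [
    (("StdNo",), ("FirstName", "LastName")),
    (("OfferNo",), ("Term", "Year", "CourseNo")),
    (("CourseNo",), ("CourseDescr",)),
    (("StdNo", "OfferNo"), ("Grade",)),
]

for lhs, rhs in bcnf_violations(columns, fds):
    print(lhs, "->", rhs)
```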

BOYCE-CODD NORMAL FORM (BCNF) [slide 26]
- Table columns: StdNo, OfferNo, [column name missing in source], EnrGrade
- BCNF violations
  - [column name missing in source] → StdNo

SIMPLE SYNTHESIS (BCNF) [slide 27]
- Convert tables into BCNF
  1. Eliminate extraneous columns from the LHS of FDs
  2. Remove derived (transitive) FDs
  3. Arrange FDs into groups by determinant
  4. For each FD group, make a table with the determinant as primary key
  5. Merge tables where one table includes all columns of the other table
     - choose the PK of one of the tables to be the PK of the new table

BAD COLLEGE DATABASE (1) [slide 28]
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr.
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Spring | 2012 | --    | C1        | PL
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | DB
S2    | Emily      | Blem      | O3       | Spring | 2012 | --    | C3        | OS
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | DB

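BAD COLLEGE DATABASE (2) [slide 29]
- StdNo → First Name
- StdNo → Last Name
- Offer No → Term
- Offer No → Year
- Offer No → Course No
- Offer No → Course Descr. (derived from Offer No → Course No and Course No → Course Descr., so removed in step 2)
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Spring | 2012 | --    | C1        | PL
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | DB
S2    | Emily      | Blem      | O3       | Spring | 2012 | --    | C3        | OS
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | DB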

BAD COLLEGE DATABASE (3) [slide 30]
- StdNo → First Name, Last Name
- Offer No → Term, Year, Course No
- StdNo, Offer No → Grade
- Course No → Course Descr.

StdNo | First Name | Last Name | Offer No | Term   | Year | Grade | Course No | Course Descr.
S1    | Phil       | Park      | O1       | Spring | 2012 | --    | C1        | PL
S1    | Phil       | Park      | O2       | Fall   | 2011 | B+    | C2        | DB
S2    | Emily      | Blem      | O3       | Spring | 2012 | --    | C3        | OS
S2    | Emily      | Blem      | O2       | Fall   | 2011 | B+    | C2        | DB

BAD COLLEGE DATABASE (4) [slide 31]

StdNo | First Name | Last Name
S1    | Phil       | Park
S2    | Emily      | Blem

Offer No | Term   | Year | Course No
O1       | Spring | 2012 | C1
O2       | Fall   | 2011 | C2
O3       | Spring | 2012 | C3

StdNo | OfferNo | Grade
S1    | O1      | --
S1    | O2      | B+
S2    | O3      | --
S2    | O2      | B+

Course No | Course Descr.
C1        | PL
C2        | DB
C3        | OS

BAD COLLEGE DATABASE (5) [slide 32]

StdNo | First Name | Last Name
S1    | Phil       | Park
S2    | Emily      | Blem

Offer No | Term   | Year | Course No
O1       | Spring | 2012 | C1
O2       | Fall   | 2011 | C2
O3       | Spring | 2012 | C3

StdNo | OfferNo | Grade
S1    | O1      | --
S1    | O2      | B+
S2    | O3      | --
S2    | O2      | B+

Course No | Course Descr.
C1        | PL
C2        | DB
C3        | OS
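Steps 2 through 4 of the simple synthesis procedure can be sketched in a few lines of Python. The sketch assumes step 1 is already done (no extraneous LHS columns) and skips step 5, since no table in this example contains all columns of another. FDs are encoded as (determinant tuple, single dependent column), an encoding chosen here for illustration.

```python
def closure(attrs, fds):
    """Columns determined by attrs; each FD is (lhs tuple, one rhs column)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def remove_derived(fds):
    """Step 2: drop every FD that the remaining FDs already imply."""
    kept = list(fds)
    for fd in list(kept):
        others = [other for other in kept if other != fd]
        lhs, rhs = fd
        if rhs in closure(lhs, others):
            kept = others
    return kept

def synthesize(fds):
    """Steps 3 and 4: group by determinant; each group becomes a table."""
    tables = {}
    for lhs, rhs in remove_derived(fds):
        # The dict key is the primary key; the value lists all table columns.
        tables.setdefault(lhs, list(lhs)).append(rhs)
    return tables

fds = [
    (("StdNo",), "FirstName"), (("StdNo",), "LastName"),
    (("OfferNo",), "Term"), (("OfferNo",), "Year"),
    (("OfferNo",), "CourseNo"), (("OfferNo",), "CourseDescr"),
    (("StdNo", "OfferNo"), "Grade"), (("CourseNo",), "CourseDescr"),
]

tables = synthesize(fds)
for primary_key, table_columns in tables.items():
    print(primary_key, table_columns)
```

The result is four tables keyed by StdNo, OfferNo, (StdNo, OfferNo), and CourseNo, with OfferNo → CourseDescr dropped as derived.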

IMPORTANCE OF NORMAL FORM VIOLATIONS [slide 33]
- We have the BCNF synthesis process
  - we can just make BCNF tables
  - so why do we care about detecting NF violations?
- A DBA has 2 jobs
  - make new databases
  - maintain old ones
- Making new DBs requires using the BCNF synthesis process
- Maintaining old DBs requires detecting NF violations
  - perhaps made by other employees
  - detecting violations narrows the scope of a DB redesign

QUESTIONS? [slide 34]

4TH NORMAL FORM (4NF) [slide 35]
- M-way relationships
  - associative entity types (weak entities)
  - multiple associations
  - primary key made of FKs from 3 or more tables
  - often represent important documents
  - glue multiple things together, e.g., an invoice
  - can sometimes contain redundancies

4TH NORMAL FORM (4NF) [slide 36]
(Diagram: entity types Student (StdNo, Name), Offering (OfferNo, Location), and Textbook (TextNo, TextTitle), associated through Enroll)

4TH NORMAL FORM (4NF) [slide 37]
Enroll table

StdNo | OfferNo | TextNo
S1    | O1      | T1
S1    | O2      | T2
S1    | O1      | T2
S1    | O2      | T3

MULTIVALUED DEPENDENCIES (MVDs) [slide 38]
- Given table R with columns X, Y, and Z
- X →→ Y
  - each X maps to a set of Ys (between 1 and M)
- X →→ Z
  - each X maps to a set of Zs (between 1 and M)
- Y & Z are independent
  - knowing Y doesn't tell you anything about Z, and vice versa
  - neither Y →→ Z nor Y → Z holds
  - neither Z →→ Y nor Z → Y holds
  - likewise, Y,V →→ Z does not hold unless V →→ Z
- Every FD is an MVD
  - not every MVD is an FD

TRIVIAL MVDs [slide 39]
- MVD X →→ Y is trivial if
  - Y is a subset of X, OR
  - X and Y are the only columns in the table, OR
  - X → Y and X → Z
- e.g., has-job table (Employee#, Position#)
  - E# →→ P#
- e.g., offering table (Course Number, Section #, Faculty ID)
  - C#, S# →→ S#

MULTIVALUED DEPENDENCIES (MVDs) [slide 40]
- Non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
  - if Emily drops 242, it doesn't change the textbooks

OfferNo | StudentNo | TextNo
CS242A  | Phil      |
CS242A  | Emily     |
CS242A  |           | Drozdek
CS242A  |           | Weiss

MULTIVALUED DEPENDENCIES (MVDs) [slide 41]
- Non-trivial MVDs manifest as redundancies in tables
  - there exist rows where X and Y are the same but Z is different
- e.g., enroll table
  - O# →→ S#
  - O# →→ T#
  - S# is independent of T#
  - if Emily drops 242, it doesn't change the textbooks

OfferNo | StudentNo | TextNo
CS242A  | Phil      | Weiss
CS242A  | Emily     | Drozdek
CS242A  | Phil      | Drozdek
CS242A  | Emily     | Weiss
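For a table with column groups X, Y, and Z, the MVD X →→ Y holds in the data exactly when, for each X value, every observed Y is paired with every observed Z. The sketch below checks that condition on sample rows; `mvd_holds` is a hypothetical helper, and like any data-based check it can refute an MVD but not prove it for all future data.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """True if, per x value, the (y, z) pairs are the full cross product."""
    groups = {}
    for row in rows:
        key = tuple(row[col] for col in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        y_value = tuple(row[col] for col in y)
        z_value = tuple(row[col] for col in z)
        ys.add(y_value)
        zs.add(z_value)
        pairs.add((y_value, z_value))
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())

enroll = [
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Weiss"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Phil", "TextNo": "Drozdek"},
    {"OfferNo": "CS242A", "StudentNo": "Emily", "TextNo": "Weiss"},
]
print(mvd_holds(enroll, ["OfferNo"], ["StudentNo"], ["TextNo"]))  # True

# Dropping one row breaks the cross product, so the MVD no longer holds.
print(mvd_holds(enroll[:3], ["OfferNo"], ["StudentNo"], ["TextNo"]))  # False
```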

4TH NORMAL FORM (4NF) [slide 42]
- 4th normal form
  - table is in BCNF AND all MVDs are trivial
- Detecting a violation
  - are there any MVDs?
  - are those MVDs non-trivial?

4TH NORMAL FORM (4NF) [slide 43]
- Resolving violations
  - X →→ Y
  - X →→ Z

Original table:

X  | Y  | Z
X1 | Y1 | Z1
X1 | Y2 | Z2
X1 | Y2 | Z1
X1 | Y1 | Z2

Split into:

X  | Y
X1 | Y1
X1 | Y2

X  | Z
X1 | Z1
X1 | Z2

MORE EXAMPLES [slide 44]

Student | Offering | Grade
Phil    | CS242A   | A
Phil    | CS370A   | B
Emily   | CS242A   | B
Emily   | CS370A   | A

- S →→ O & S →→ G ?
- O →→ G & O →→ S ?
- G →→ S & G →→ O ?

MORE EXAMPLES [slide 45]

Student | Offering | Grade
Phil    | CS242A   | A
Phil    | CS370A   | B
Emily   | CS242A   | B
Emily   | CS370A   | A

- S →→ O & S →→ G ? Offering and Grade are not independent
- O →→ G & O →→ S ? Grade and Student are not independent
- G →→ S & G →→ O ? Student and Offering are not independent

MORE EXAMPLES [slide 46]
- B →→ E & B →→ C
- Is this a trivial MVD?

Bank Branch | Employee | Customer
B3          | Ann      | Ted
B3          | Terry    | Alfred
B3          | Ann      | Alfred
B3          | Terry    | Ted

MORE EXAMPLES [slide 47]
- B →→ E & B →→ C
- Is this a trivial MVD?
  - E is not a subset of B & C is not a subset of B
  - B and E are not the only columns in the table
  - neither B → E nor B → C holds
  - so NO, it is not trivial

Bank Branch | Employee | Customer
B3          | Ann      | Ted
B3          | Terry    | Alfred
B3          | Ann      | Alfred
B3          | Terry    | Ted

MORE EXAMPLES [slide 48]

Original table:

Bank Branch | Employee | Customer
B3          | Ann      | Ted
B3          | Terry    | Alfred
B3          | Ann      | Alfred
B3          | Terry    | Ted

Split into:

Bank Branch | Employee
B3          | Ann
B3          | Terry

Bank Branch | Customer
B3          | Ted
B3          | Alfred
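The split above loses no information: joining the two smaller tables on Bank Branch reproduces the original four rows. A minimal sketch of that check follows, with the `project` helper and dictionary layout assumed here for illustration.

```python
def project(rows, cols):
    """Project rows onto cols, dropping duplicate tuples."""
    return {tuple(row[col] for col in cols) for row in rows}

bank = [
    {"Branch": "B3", "Employee": "Ann", "Customer": "Ted"},
    {"Branch": "B3", "Employee": "Terry", "Customer": "Alfred"},
    {"Branch": "B3", "Employee": "Ann", "Customer": "Alfred"},
    {"Branch": "B3", "Employee": "Terry", "Customer": "Ted"},
]

branch_employee = project(bank, ["Branch", "Employee"])  # 2 rows
branch_customer = project(bank, ["Branch", "Customer"])  # 2 rows

# Natural join of the two projections on Branch.
rejoined = {
    (branch, employee, customer)
    for branch, employee in branch_employee
    for other_branch, customer in branch_customer
    if branch == other_branch
}

original = project(bank, ["Branch", "Employee", "Customer"])
print(rejoined == original)  # True
```

Four stored rows shrink to two plus two, and adding a third employee to the branch would cost one new row instead of two.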

QUESTIONS? [slide 49]

[slide 50]

Part# | PQty | PDesc
P1    | 2    | 5mm bolt
P2    | 4    | 10mm nut
P3    | 2    | 5mm wrench
P4    | 4    | 8mm washer

- PQty →→ PDesc & PQty →→ Part# ?

[slide 51]

Loc # | Item       | Managers
L1    | XBox GB    | Cindy
L1    | Garmin GPS | Aaron
L1    | XBox GB    | Aaron
L1    | Garmin GPS | Cindy

EXTRA 4NF SLIDES [slide 52]

4TH NORMAL FORM (4NF) [slide 53]
- Relationship independence
  - 2 relationships are independent if one cannot be derived from the other
  - knowing one relationship tells you nothing about the other

4TH NORMAL FORM (4NF) [slide 54]
Enroll table

StdNo | OfferNo | TextNo
S1    | O1      | T1
S1    | O2      | T2
S1    | O1      | T2
S1    | O2      | T3

- 3 relationships
  - StdNo -- OfferNo
  - StdNo -- TextNo
  - OfferNo -- TextNo

4TH NORMAL FORM (4NF) [slide 55]
- StdNo -- OfferNo cannot be derived from the other 2
  - StdNo -- TextNo & TextNo -- OfferNo
  - the same textbook can be used for 2 offerings
- OfferNo -- TextNo cannot be derived from the other 2
  - OfferNo -- StdNo & StdNo -- TextNo
  - students use many textbooks, not all related to this offering
- StdNo -- TextNo can be derived
  - StdNo -- OfferNo & OfferNo -- TextNo
  - the offering number gives the set of texts a student needs

4TH NORMAL FORM (4NF) [slide 56]
- Multivalued dependencies (MVDs)
  - each X can map to a set of Ys and a set of Zs
  - a generalization of functional dependencies, where
    - each X maps to one Y
    - each X maps to one Z
  - represented by X →→ Y | Z
  - every FD is an MVD
    - known as a trivial MVD
  - not every MVD is an FD

4TH NORMAL FORM (4NF) [slide 57]
- M-way tables sometimes introduce MVDs
  - X →→ Y
  - X →→ Z
  - X →→ Y | Z
  - Y and Z are independent
  - relationship X--Y is independent of relationship X--Z
- Not all M-way tables produce MVDs

4TH NORMAL FORM (4NF) [slide 58]
- MVD table redundancies
  - assume X1 maps to Y1 & Y2, and X1 maps to Z1 & Z2

X  | Y  | Z
X1 | Y1 |
X1 | Y2 |
X1 |    | Z1
X1 |    | Z2

4TH NORMAL FORM (4NF) [slide 59]
- Need to fill in the rest of the table

X  | Y  | Z
X1 | Y1 | Z1
X1 | Y2 | Z2
X1 | Y2 | Z1
X1 | Y1 | Z2

4TH NORMAL FORM (4NF) [slide 60]
- Rows below the line exist because relationship Y--Z can be derived from relationships X--Y & X--Z
- Rows below the line are redundant

X  | Y  | Z
X1 | Y1 | Z1
X1 | Y2 | Z2
---+----+---
X1 | Y2 | Z1
X1 | Y1 | Z2

4TH NORMAL FORM (4NF) [slide 61]
Enroll table

OfferNo | StdNo | TextNo
O1      | S1    | T1
O1      | S2    | T2
--------+-------+-------
O1      | S2    | T1
O1      | S1    | T2

- OfferNo →→ StdNo | TextNo
  - offerings map to many students
  - offerings can have many textbooks
- Rows below the line are redundant

4TH NORMAL FORM (4NF) [slide 62]
- 4NF definition
  - tables cannot contain any non-trivial MVDs
- Resolving 4NF violations
  - for each table with a non-trivial MVD
  - split the 3-column table into two 2-column tables
  - A,B,C goes to A,B & A,C

StdNo | OfferNo
S1    | O1
S1    | O2

OfferNo | TextNo
O1      | T1
O1      | T2
O2      | T1
O2      | T3