1 Week 4: Normalisation. Redundant data becomes inconsistent data; therefore … “The key, the whole key, and nothing but the key, so help me, Codd”
2 BoyGirl Database, Version 0. NOTE: Not a good design! Because one girl can have many boys, we are storing redundant mobile data about Bonnie. “Redundant data becomes inconsistent data.”
3 A Better BoyGirl Database. Eliminated redundant data about Bonnie. Two tables with a one-to-many relationship, linked by a foreign key. (Diagram labels: foreign key, primary key.)
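The two-table design can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (girl, boy, girl_name, mobile), the boys' names and the phone numbers are assumptions, since the slide's actual diagram is not reproduced here; only Bonnie comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE girl (
    girl_name TEXT PRIMARY KEY,   -- primary key on the "one" side
    mobile    TEXT                -- Bonnie's mobile is stored once, here
);
CREATE TABLE boy (
    boy_name  TEXT PRIMARY KEY,
    girl_name TEXT REFERENCES girl(girl_name)   -- foreign key on the "many" side
);
INSERT INTO girl VALUES ('Bonnie', '555-0100');            -- made-up number
INSERT INTO boy  VALUES ('Clyde', 'Bonnie'), ('Ted', 'Bonnie');  -- made-up boys
""")

# One fact in one place: a single UPDATE changes the number for every boy.
conn.execute("UPDATE girl SET mobile = '555-0199' WHERE girl_name = 'Bonnie'")

rows = conn.execute("""
    SELECT boy.boy_name, girl.mobile
    FROM boy JOIN girl ON boy.girl_name = girl.girl_name
    ORDER BY boy.boy_name
""").fetchall()
print(rows)  # [('Clyde', '555-0199'), ('Ted', '555-0199')]
```

In the Version 0 design the same change would have to be repeated on every row that mentions Bonnie, and missing one row would leave the data inconsistent.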
4 The Menace of Redundant Data. Redundant data becomes inconsistent data. Redundancy forces you to insert, modify, and delete more data than desired. Strive for one fact in one place. More technically, strive for 3NF or BCNF (more later).
5 Big University Table – What’s redundant here? Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. Rows: S1 JUN O … C1 DB; S1 JUN O … C2 VB; S2 JUN O … C3 OO; S2 JUN O … C2 VB. Functional dependencies: StdSSN → StdCity, StdClass; OfferNo → OffTerm, OffYear, CourseNo; CourseNo → CrsDesc; StdSSN, OfferNo → EnrGrade. Forgotten what these American university terms mean? See next slide. The solution is to split the single table into two or more tables, like we did with BoyGirl.
6 Big University Table – Forgotten these American terms? Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. StdSSN – student number (student social security number?). StdClass – Freshman, Sophomore, Junior, Senior. OfferNo – e.g , Autumn 2007. Grade – a student’s grade point average at the start of semester. CourseNo – e.g or … Americans say “course” for “subject”. CrsDesc – course description. Offering: primary key OfferNo; attributes OffYear, CourseNo. Course: primary key CourseNo; attribute CrsDesc.
7 Functional Dependencies (FDs). X → Y: “X (functionally) determines Y”. For each X value, there is at most one Y value, just as a function such as y = f(x) = 2x gives one y for each x. “Normalisation” is the process of splitting tables to remove redundancies.
8 FDs in Data. You can prove non-existence (but not existence) of an FD by looking at data: find two rows that have the same X value but a different Y value. To decide that an FD does hold, understand the business rules (or use common sense). Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. Rows: S1 JUN O … C1 DB; S1 JUN O … C2 VB; S2 JUN O … C3 OO; S2 JUN O … C2 VB.
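The "two rows with the same X but a different Y" test is easy to automate. The following is a minimal sketch, not part of the lecture: the function name fd_violations is made up, and the rows use only the four columns whose values are legible on the slide.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on the lhs columns but differ on rhs."""
    first_seen = {}   # lhs value -> (rhs value, first row seen with that lhs value)
    violations = []
    for row in rows:
        x = tuple(row[col] for col in lhs)
        y = tuple(row[col] for col in rhs)
        if x not in first_seen:
            first_seen[x] = (y, row)
        elif first_seen[x][0] != y:
            violations.append((first_seen[x][1], row))
    return violations


rows = [
    {"StdSSN": "S1", "StdClass": "JUN", "CourseNo": "C1", "CrsDesc": "DB"},
    {"StdSSN": "S1", "StdClass": "JUN", "CourseNo": "C2", "CrsDesc": "VB"},
    {"StdSSN": "S2", "StdClass": "JUN", "CourseNo": "C3", "CrsDesc": "OO"},
    {"StdSSN": "S2", "StdClass": "JUN", "CourseNo": "C2", "CrsDesc": "VB"},
]

# No counterexample: the data is consistent with CourseNo -> CrsDesc (not a proof).
print(fd_violations(rows, ["CourseNo"], ["CrsDesc"]))        # []
# Counterexamples exist: StdSSN -> CourseNo definitely does not hold.
print(len(fd_violations(rows, ["StdSSN"], ["CourseNo"])))    # 2
```

An empty result only means the sample contains no counterexample; whether the FD really holds is still a question about the business rules.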
9 Toward First Normal Form (1NF): normalisation step by step. Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. Rows: S1 JUN O … C1 DB; S1 JUN O … C2 VB; S2 JUN O … C3 OO; S2 JUN O … C2 VB. Step 1: Identification of superkeys. According to the previous FD diagram, we know that StdSSN, OfferNo and CourseNo form a “superkey”.
10 First Normal Form (1NF). Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. Rows: S1 JUN O … C1 DB; S1 JUN O … C2 VB; S2 JUN O … C3 OO; S2 JUN O … C2 VB. Step 2: Determining the primary key (minimal superkey). If we analyse our superkey we can conclude that CourseNo does not contribute to its uniqueness (OfferNo already determines CourseNo), so we can take it out of the key, leaving StdSSN, OfferNo.
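Steps 1 and 2 can be checked mechanically by computing attribute closures from the FD diagram. The sketch below is an illustration rather than the lecture's method; the helper names closure, is_superkey and minimise are made up, and the FDs are the ones listed on the Big University Table slide.

```python
ALL_COLUMNS = {"StdSSN", "StdCity", "StdClass", "OfferNo", "OffTerm",
               "OffYear", "EnrGrade", "CourseNo", "CrsDesc"}

# (left-hand side, right-hand side) pairs from the FD diagram
FDS = [
    ({"StdSSN"}, {"StdCity", "StdClass"}),
    ({"OfferNo"}, {"OffTerm", "OffYear", "CourseNo"}),
    ({"CourseNo"}, {"CrsDesc"}),
    ({"StdSSN", "OfferNo"}, {"EnrGrade"}),
]


def closure(attrs, fds):
    """Return every column functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs):
    """A superkey determines every column of the table."""
    return closure(attrs, FDS) == ALL_COLUMNS


def minimise(superkey):
    """Drop columns one at a time while the rest is still a superkey."""
    key = list(superkey)
    for col in list(key):
        trial = [c for c in key if c != col]
        if is_superkey(trial):
            key = trial
    return key


superkey = ["StdSSN", "OfferNo", "CourseNo"]
print(is_superkey(superkey))   # True
print(minimise(superkey))      # ['StdSSN', 'OfferNo']
```

CourseNo can be dropped because OfferNo already determines it, while neither StdSSN nor OfferNo alone determines all the columns.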
11 Second Normal Form (2NF). Every non-key column depends on a whole key, not part of a key. Violations: part of key → non-key. Violations are possible only for combined keys (“combined” = “composite”).
12 2NF Example (problem). Violations of 2NF in the 1NF big university database table: StdSSN → StdCity, StdClass; OfferNo → OffTerm, OffYear, CourseNo, CrsDesc. Columns: StdSSN, StdClass, OfferNo, OffYear, Grade, CourseNo, CrsDesc. Rows: S1 JUN O … C1 DB; S1 JUN O … C2 VB; S2 JUN O … C3 OO; S2 JUN O … C2 VB.
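The 2NF violations can be picked out of the FD list by looking for left-hand sides that are only part of the key. A minimal sketch follows, assuming the primary key (StdSSN, OfferNo) and the FDs from the earlier diagram; the function name partial_dependencies is made up.

```python
KEY = {"StdSSN", "OfferNo"}

FDS = [
    (("StdSSN",), ("StdCity", "StdClass")),
    (("OfferNo",), ("OffTerm", "OffYear", "CourseNo")),
    (("CourseNo",), ("CrsDesc",)),
    (("StdSSN", "OfferNo"), ("EnrGrade",)),
]


def partial_dependencies(key, fds):
    """Return FDs whose left side is a proper subset of the key
    and whose right side contains non-key columns."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) < key and not set(rhs) <= key]


for lhs, rhs in partial_dependencies(KEY, FDS):
    print(", ".join(lhs), "->", ", ".join(rhs))
# StdSSN -> StdCity, StdClass
# OfferNo -> OffTerm, OffYear, CourseNo
```

This checks only the FDs as stated. CrsDesc also depends on OfferNo, but indirectly, through CourseNo → CrsDesc.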
13 2NF Example (solution). Splitting the table: UnivTable0 (StdSSN*, OfferNo*, EnrGrade); UnivTable1 (StdSSN, StdCity, StdClass); UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc). Underlining means “primary key”; an asterisk means foreign key. The primary keys are (StdSSN, OfferNo), StdSSN and OfferNo respectively.
14 To get to Third Normal Form (3NF) we need to ensure that every non-key column depends only on a key, not on non-key columns. Violations: non-key → non-key. If OfferNo → CourseNo and CourseNo → CrsDesc, then OfferNo → CrsDesc (a transitive dependency).
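A non-key → non-key dependency can be spotted the same way as a partial one. The sketch below is a simplified check that assumes the table has a single candidate key, which is true of UnivTable2 with key OfferNo; the function name non_key_dependencies is made up.

```python
# UnivTable2 (OfferNo, OffTerm, OffYear, CourseNo, CrsDesc), key: OfferNo
KEY = {"OfferNo"}

FDS = [
    (("OfferNo",), ("OffTerm", "OffYear", "CourseNo")),
    (("CourseNo",), ("CrsDesc",)),
]


def non_key_dependencies(key, fds):
    """Return FDs where both sides consist only of non-key columns.
    Simplification: assumes `key` is the table's only candidate key."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(lhs) & key and not set(rhs) & key]


print(non_key_dependencies(KEY, FDS))
# [(('CourseNo',), ('CrsDesc',))]
```

The single FD reported, CourseNo → CrsDesc, is the middle link of the transitive chain from OfferNo to CrsDesc.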
15 3NF Example. One violation in UnivTable2: CourseNo → CrsDesc. Splitting the table: UnivTable2-1 (OfferNo, OffTerm, OffYear, CourseNo*); UnivTable2-2 (CourseNo, CrsDesc).
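The split can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table names Offering and Course stand in for UnivTable2-1 and UnivTable2-2, and the offering numbers, terms and years are invented for illustration; only the course numbers and descriptions (C1/DB, C2/VB) come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Course (            -- stands in for UnivTable2-2
    CourseNo TEXT PRIMARY KEY,
    CrsDesc  TEXT
);
CREATE TABLE Offering (          -- stands in for UnivTable2-1
    OfferNo  TEXT PRIMARY KEY,
    OffTerm  TEXT,
    OffYear  INTEGER,
    CourseNo TEXT REFERENCES Course(CourseNo)   -- foreign key (the asterisk)
);
INSERT INTO Course VALUES ('C1', 'DB'), ('C2', 'VB');
-- Invented offerings: course C2 is offered twice.
INSERT INTO Offering VALUES
    ('O1', 'AUTUMN', 2007, 'C1'),
    ('O2', 'AUTUMN', 2007, 'C2'),
    ('O3', 'SPRING', 2008, 'C2');
""")

# The description of C2 is stored once, however often C2 is offered.
course_count = conn.execute(
    "SELECT COUNT(*) FROM Course WHERE CourseNo = 'C2'").fetchone()[0]

# A join rebuilds the original UnivTable2 columns when they are needed.
joined = conn.execute("""
    SELECT o.OfferNo, o.CourseNo, c.CrsDesc
    FROM Offering AS o JOIN Course AS c ON o.CourseNo = c.CourseNo
    ORDER BY o.OfferNo
""").fetchall()
print(course_count)  # 1
print(joined)        # [('O1', 'C1', 'DB'), ('O2', 'C2', 'VB'), ('O3', 'C2', 'VB')]
```

Nothing is lost by the split: the join returns the same rows the unsplit table would hold, but each course description now lives in exactly one row.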
16 BCNF. A special case not covered by 3NF, where two things can be used as substitute primary keys for each other. E.g. staff number, tax file number, address.
17 BCNF Example. UnivTable4 (Mannino’s example page ): StdSSN | OfferNo | EnrGrade. Convert from 3NF to BCNF by placing the redundant keys in a table by themselves and only using one of them in other tables: UnivTable4-1 (StdSSN*, OfferNo, EnrGrade); UnivTable4-2 (StdSSN, ).
18 Role of Normalisation. Normalisation and drawing ERDs are complementary ways of designing databases. Strive to reach at least 3NF, hopefully BCNF. There are even higher normal forms (4NF, 5NF, etc.), but we don’t talk about them in this course; they are almost never an issue in real-world databases. You may reverse engineer an ERD from the normalised tables.
19 Today’s Lab Exercise (this slide not part of today’s lecture handout) Familiarisation with next week’s lab exam database. Download the lab exam database into your PostgreSQL account. See URL on page 2 of handout. Attempt the sample questions on page 5 of handout. Answers on page 6.