Relations, Keys, & Normalization DBS201
Equivalent Terms: Relational Model, Table-Oriented DBMS, Conventional File Systems, Conceptually Represents

Table-Oriented DBMS    Conventional File Systems
Table                  File
Row                    Record
Column                 Field
Column Type            Data Type
Column Value           Field Value
What is a Relation?
  Rows contain data about an entity
  Columns contain data about attributes of the entity
  Cells of the table hold a single value
  All entries in a column are of the same kind
  Each column has a unique name
  The order of the rows and columns is unimportant
  No two rows may be identical
Types of Keys?
  A key is one or more columns of a relation that are used to identify a row
  A key can be unique or nonunique
  In an Employee relation: Emp_No (unique) vs. Department (nonunique)
  Composite Key, Primary Key, Candidate Key, Surrogate Key, Foreign Key
Keys
  A key that contains two or more attributes is a composite key
  Keys that uniquely identify each row in a relation are candidate keys
  The candidate key that is chosen as the key that will actually be used by the DBMS to uniquely identify each row in a relation is a primary key
Keys
  An attribute that is a key of one or more relations other than the one in which it appears is a foreign key
  Foreign keys are one or more fields in a dependent file that reference the primary key in a parent file
Surrogate Keys
  A column with a unique, DBMS-assigned identifier that has been added to a table to be the primary key
  The unique values of the surrogate key are assigned by the DBMS each time a row is added, and the values never change
  Without a surrogate key: PROPERTY(Street, City, Prov, Pcode, OwnerID)
  With a surrogate key: PROPERTY(PropertyID, Street, City, Prov, Pcode, OwnerID)
  Surrogate keys are short, numeric, and never change, which makes them ideal as a primary key
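A minimal sketch of the surrogate key idea, assuming an in-memory table in which a simple counter stands in for the DBMS's identifier generator. The class name `PropertyTable` and the sample addresses are hypothetical and only serve the illustration.

```python
from itertools import count


class PropertyTable:
    """Toy PROPERTY table; PropertyID is assigned by the 'DBMS' (here, a counter)."""

    def __init__(self):
        self._next_id = count(start=1)  # hypothetical stand-in for the DBMS generator
        self.rows = {}                  # PropertyID -> remaining attributes

    def insert(self, street, city, prov, pcode, owner_id):
        property_id = next(self._next_id)  # assigned once, on insert
        self.rows[property_id] = {
            "Street": street,
            "City": city,
            "Prov": prov,
            "Pcode": pcode,
            "OwnerID": owner_id,
        }
        return property_id


table = PropertyTable()
# Two rows with the same address still receive distinct surrogate keys.
first = table.insert("12 Main St", "Toronto", "ON", "M1M 1M1", 501)
second = table.insert("12 Main St", "Toronto", "ON", "M1M 1M1", 502)
print(first, second)
```

Because the identifier is independent of the address, a later change to Street or Pcode does not touch the primary key.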
Primary Key? Example of Hockey Awards

Award          PlayerName   PNumber  Position       Team      Year
Best Defense   Joe Wall     17       Left Defense   Toronto   1999
               Sy Stopp     7        Right Defense  Detroit   2000
               Pete Puck    22       Left Defense   Montreal  2001
               Joe Wall     17       Left Wing      Toronto   2002
MostValuable   Sam Scores   18       Center         Chicago   1999
               Wayne Gret   99       Center         New York  2000
               Joe Wall     17       Left Wing      Toronto   2002
Primary Key?
Step 1: Fill in all attributes

Example of Hockey Awards

Award          PlayerName   PNumber  Position       Team      Year
Best Defense   Joe Wall     17       Left Defense   Toronto   1999
Best Defense   Sy Stopp     7        Right Defense  Detroit   2000
Best Defense   Pete Puck    22       Left Defense   Montreal  2001
Best Defense   Joe Wall     17       Left Wing      Toronto   2002
MostValuable   Sam Scores   18       Center         Chicago   1999
MostValuable   Wayne Gret   99       Center         New York  2000
MostValuable   Joe Wall     17       Left Wing      Toronto   2002

Step 2: Look for any column in which no value is used in more than one row, that is, a column whose values are UNIQUE. We first check whether a single-attribute primary key is possible. No one column meets this criterion.
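Step 3: Look for any pairs of columns which, when concatenated, produce a unique value.
  Team + Year: not unique (Toronto, 2002 appears twice)
  Position + Team: not unique (Left Wing, Toronto appears twice)
  Position + Year: not unique (Left Wing, 2002 appears twice)
  PNumber + Position: not unique (17, Left Wing appears twice)
  PNumber + Team: not unique (17, Toronto repeats)
  PNumber + Year: not unique (17, 2002 appears twice)
  PlayerName + PNumber: not unique (Joe Wall, 17 repeats)
  PlayerName + Position: not unique (Joe Wall, Left Wing appears twice)
  PlayerName + Team: not unique (Joe Wall, Toronto repeats)
  PlayerName + Year: not unique (Joe Wall, 2002 appears twice)
  Award + PlayerName: not unique (Best Defense, Joe Wall appears twice)
  Award + PNumber: not unique (Best Defense, 17 appears twice)
  Award + Position: not unique (Best Defense, Left Defense appears twice)
  Award + Team: not unique (Best Defense, Toronto appears twice)
  Award + Year: unique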
Concatenated Primary Key
So Award + Year will be selected as the PK, and the relation will be:
  (Award, Year, PlayerName, PNumber, Position, Team)
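The search in Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration over the seven sample rows of the Hockey Awards table; the helper names `is_unique` and `unique_combinations` are hypothetical.

```python
from itertools import combinations

COLUMNS = ("Award", "PlayerName", "PNumber", "Position", "Team", "Year")
ROWS = [
    ("Best Defense", "Joe Wall", 17, "Left Defense", "Toronto", 1999),
    ("Best Defense", "Sy Stopp", 7, "Right Defense", "Detroit", 2000),
    ("Best Defense", "Pete Puck", 22, "Left Defense", "Montreal", 2001),
    ("Best Defense", "Joe Wall", 17, "Left Wing", "Toronto", 2002),
    ("MostValuable", "Sam Scores", 18, "Center", "Chicago", 1999),
    ("MostValuable", "Wayne Gret", 99, "Center", "New York", 2000),
    ("MostValuable", "Joe Wall", 17, "Left Wing", "Toronto", 2002),
]


def is_unique(rows, idxs):
    """True when the chosen columns, taken together, never repeat across rows."""
    seen = {tuple(row[i] for i in idxs) for row in rows}
    return len(seen) == len(rows)


def unique_combinations(rows, columns, size):
    """All column combinations of the given size whose values are unique."""
    return [
        tuple(columns[i] for i in idxs)
        for idxs in combinations(range(len(columns)), size)
        if is_unique(rows, idxs)
    ]


single = unique_combinations(ROWS, COLUMNS, 1)  # Step 2: single columns
pairs = unique_combinations(ROWS, COLUMNS, 2)   # Step 3: pairs of columns
print(single)  # []
print(pairs)   # [('Award', 'Year')]
```

Uniqueness in sample data only suggests a key. Whether Award + Year is really a key depends on the business rule that each award is given once per year.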
Normalization Review
Normal Forms 1NF 2NF 3NF Why is this done?
Two ways to get to 1NF – how did we do it last week?

CLASSLIST
SubjectCode  Section  InstNo  InstName       SubjectName  StudentNo             StudentName
DBS201       A        122     Russ Pangborn  Intro to DB  111111111, 222222222  Terry Adams, Jack Chan
             B        323     Bill Gates                  121212121, 323233232  Frank Brown, Mary Wong
RPG544                                       RPGIV        444444444, 143211222  Wendy Clark, Peter Lind

Write CLASSLIST in UNF
  CLASSLIST [ SubjectCode, SectionCode, InstructorNo, InstructorName, SubjectName, {StudentNumber, StudentName} ]

A relation is in 1st normal form when the primary key determines a single value of each attribute for all attributes in the relation (i.e. the relation contains no repeating groups)
Method 1

CLASSLIST
SubjectCode  Section  InstNo  InstName       SubjectName  StudentNo  StudentName
DBS201       A        122     Russ Pangborn  Intro to DB  111111111  Terry Adams
                                                          222222222  Jack Chan
             B        323     Bill Gates                  121212121  Frank Brown
                                                          323233232  Mary Wong
RPG544                                       RPGIV        444444444  Wendy Clark
                                                          143211222  Peter Lind

Add to the key of the unnormalized relation to ensure the primary key identifies 1 and only 1 value of each attribute in the relation
  Before: CLASSLIST [ SubjectCode, SectionCode, InstructorNo, InstructorName, SubjectName, StudentNumber, StudentName ]
  After (StudentNumber added to the key): CLASSLIST [ SubjectCode, SectionCode, StudentNumber, InstructorNo, InstructorName, SubjectName, StudentName ]
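A rough sketch of this first method: each entry of the repeating group becomes its own row, and StudentNumber joins the key. The nested structure and the function name `to_1nf` are hypothetical. The sketch also assumes that section B belongs to DBS201 with the same subject name, since the class list leaves those cells blank.

```python
# UNF: one entry per class section, with the students as a repeating group.
unf_classlist = [
    {
        "SubjectCode": "DBS201", "SectionCode": "A",
        "InstructorNo": 122, "InstructorName": "Russ Pangborn",
        "SubjectName": "Intro to DB",
        "Students": [(111111111, "Terry Adams"), (222222222, "Jack Chan")],
    },
    {
        # Assumption: the blank SubjectCode/SubjectName cells repeat DBS201.
        "SubjectCode": "DBS201", "SectionCode": "B",
        "InstructorNo": 323, "InstructorName": "Bill Gates",
        "SubjectName": "Intro to DB",
        "Students": [(121212121, "Frank Brown"), (323233232, "Mary Wong")],
    },
]


def to_1nf(unf_rows):
    """Emit one flat row per (section, student) pair."""
    flat = []
    for row in unf_rows:
        fixed = {k: v for k, v in row.items() if k != "Students"}
        for student_no, student_name in row["Students"]:
            flat.append({**fixed,
                         "StudentNumber": student_no,
                         "StudentName": student_name})
    return flat


classlist_1nf = to_1nf(unf_classlist)
# The extended key (SubjectCode, SectionCode, StudentNumber) is unique per row.
keys = [(r["SubjectCode"], r["SectionCode"], r["StudentNumber"])
        for r in classlist_1nf]
print(len(classlist_1nf), len(set(keys)))  # 4 4
```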
Method 2
Restate the original un-normalized relation without the repeating group
  CLASSLIST [ SubjectCode, SectionCode, InstructorNo, InstructorName, SubjectName ]
Create a new relation consisting of the key of the original relation and the attributes within the repeating group, and add to the key to ensure uniqueness
  CLASSLISTSTUDENT [ SubjectCode (FK1), SectionCode (FK1), StudentNumber, StudentName ]
2nd Normal Form A 1NF relation is in 2NF when the entire primary key is needed to determine the value of each non-key attribute (i.e. relation has no partial dependencies – attributes whose values can be determined by knowing only part of the key)
1st Normal Form -> 2nd Normal Form
1NF Relations:
  CLASSLIST [ SubjectCode, SectionCode, InstructorNo, InstructorName, SubjectName ]
    contains the partial dependency SubjectCode -> SubjectName
  CLASSLISTSTUDENT [ SubjectCode, SectionCode, StudentNumber, StudentName ]
    contains the partial dependency StudentNumber -> StudentName
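One way to look for partial dependencies in sample data is to test whether part of the key alone determines a non-key attribute. The sketch below uses a hypothetical helper `determines`. The rows assume that section B belongs to DBS201, which the class list leaves blank. Sample data can refute a dependency but cannot prove one, so a True result is only a hint.

```python
def determines(rows, lhs, rhs):
    """True when every lhs value maps to exactly one rhs value in the rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it.
        if seen.setdefault(left, right) != right:
            return False
    return True


classlist = [
    {"SubjectCode": "DBS201", "SectionCode": "A", "InstructorNo": 122,
     "InstructorName": "Russ Pangborn", "SubjectName": "Intro to DB"},
    {"SubjectCode": "DBS201", "SectionCode": "B", "InstructorNo": 323,
     "InstructorName": "Bill Gates", "SubjectName": "Intro to DB"},
]
classlist_student = [
    {"SubjectCode": "DBS201", "SectionCode": "A",
     "StudentNumber": 111111111, "StudentName": "Terry Adams"},
    {"SubjectCode": "DBS201", "SectionCode": "A",
     "StudentNumber": 222222222, "StudentName": "Jack Chan"},
    {"SubjectCode": "DBS201", "SectionCode": "B",
     "StudentNumber": 121212121, "StudentName": "Frank Brown"},
    {"SubjectCode": "DBS201", "SectionCode": "B",
     "StudentNumber": 323233232, "StudentName": "Mary Wong"},
]

# Part of the key is enough to determine these attributes: partial dependencies.
subject_partial = determines(classlist, ["SubjectCode"], ["SubjectName"])
student_partial = determines(classlist_student, ["StudentNumber"], ["StudentName"])
# The instructor needs the whole key: SubjectCode alone is not enough.
instructor_partial = determines(classlist, ["SubjectCode"], ["InstructorNo"])
print(subject_partial, student_partial, instructor_partial)  # True True False
```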
2NF
Create new relation(s) consisting of part of the primary key and all attributes whose values are determined by this part of the primary key:
  SUBJECT [ SubjectCode, SubjectName ]
  STUDENT [ StudentNumber, StudentName ]
Restate original relation(s) without partially dependent attributes:
  CLASSLISTSTUDENT [ SubjectCode (FK1), SectionCode (FK1), StudentNumber (FK2) ]
  CLASSLIST [ SubjectCode, SectionCode, InstructorNo, InstructorName ]
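The decomposition amounts to projecting each 1NF relation onto the chosen attributes and dropping duplicate rows. A minimal sketch follows, with a hypothetical helper `project`. As before, the rows assume that section B belongs to DBS201.

```python
def project(rows, attrs):
    """Keep only attrs from each row and drop duplicates, preserving order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(values)
    return result


classlist = [
    {"SubjectCode": "DBS201", "SectionCode": "A", "InstructorNo": 122,
     "InstructorName": "Russ Pangborn", "SubjectName": "Intro to DB"},
    {"SubjectCode": "DBS201", "SectionCode": "B", "InstructorNo": 323,
     "InstructorName": "Bill Gates", "SubjectName": "Intro to DB"},
]

# New relation: part of the key plus the attribute it determines.
subject = project(classlist, ["SubjectCode", "SubjectName"])
# Original relation restated without the partially dependent attribute.
classlist_2nf = project(
    classlist,
    ["SubjectCode", "SectionCode", "InstructorNo", "InstructorName"],
)
print(subject)  # [('DBS201', 'Intro to DB')]
```

The subject name is now stored once per subject instead of once per section.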
3rd Normal Form
A 2NF relation is in 3NF when the primary key and nothing but the primary key can be used to determine the value of each non-key attribute (i.e. the relation has no transitive dependencies – attributes whose values can be determined by knowing something other than the key)
2NF -> 3NF
2NF Relations:
  CLASSLISTSTUDENT [ SubjectCode (FK1), SectionCode (FK1), StudentNumber (FK2) ]
  CLASSLIST [ SubjectCode (FK), SectionCode, InstructorNo, InstructorName ]
  SUBJECT [ SubjectCode, SubjectName ]
  STUDENT [ StudentNumber, StudentName ]
In CLASSLIST the InstructorName is determined by InstructorNo, so create the new relation:
  INSTRUCTOR [ InstructorNo, InstructorName ]
Remove the transitive dependency:
  CLASSLIST [ SubjectCode (FK), SectionCode, InstructorNo (FK) ]
Resulting 3NF Relations for ClassList Userview
Set of 3NF Relations for the Class List Userview:
  CLASSLIST [ SubjectCode, SectionCode, InstructorNo ]
  CLASSLISTSTUDENT [ SubjectCode, SectionCode, StudentNumber ]
  SUBJECT [ SubjectCode, SubjectName ]
  STUDENT [ StudentNumber, StudentName ]
  INSTRUCTOR [ InstructorNo, InstructorName ]
1 unnormalized userview will always result in 1 or more relations in 1NF
Each 1NF relation will result in 1 or more 2NF relations
Each 2NF relation will result in 1 or more 3NF relations
You can never lose (i.e. not include) an attribute: it must always be found in one of the relations at each step
You can never lose a relation
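The "never lose an attribute" and "never lose a relation" rules can be checked mechanically. The sketch below records the Class List relations at each step (using the Method 2 relations for 1NF) and compares them. The dictionary layout and the name `all_attributes` are hypothetical.

```python
STAGES = {
    "1NF": {
        "CLASSLIST": ["SubjectCode", "SectionCode", "InstructorNo",
                      "InstructorName", "SubjectName"],
        "CLASSLISTSTUDENT": ["SubjectCode", "SectionCode",
                             "StudentNumber", "StudentName"],
    },
    "2NF": {
        "CLASSLIST": ["SubjectCode", "SectionCode", "InstructorNo",
                      "InstructorName"],
        "CLASSLISTSTUDENT": ["SubjectCode", "SectionCode", "StudentNumber"],
        "SUBJECT": ["SubjectCode", "SubjectName"],
        "STUDENT": ["StudentNumber", "StudentName"],
    },
    "3NF": {
        "CLASSLIST": ["SubjectCode", "SectionCode", "InstructorNo"],
        "CLASSLISTSTUDENT": ["SubjectCode", "SectionCode", "StudentNumber"],
        "SUBJECT": ["SubjectCode", "SubjectName"],
        "STUDENT": ["StudentNumber", "StudentName"],
        "INSTRUCTOR": ["InstructorNo", "InstructorName"],
    },
}


def all_attributes(relations):
    """Union of the attribute names across every relation of one stage."""
    return set().union(*relations.values())


attribute_sets = {stage: all_attributes(rels) for stage, rels in STAGES.items()}
relation_counts = [len(rels) for rels in STAGES.values()]

# Every stage carries the same seven attributes; relations only grow in number.
print(attribute_sets["1NF"] == attribute_sets["2NF"] == attribute_sets["3NF"])  # True
print(relation_counts)  # [2, 4, 5]
```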
Normalize Remaining User Views
  The normalization process is then applied to each remaining user view (e.g. grade sheet, timetable request, …)
  A set of 3NF relations is produced for each user view
  The 3NF relations from each user view are then integrated to form one complete set of relations for the application
GradeSheet
SubjectCode  Section  SubjectName  InstName  Grade  Sem  Year  StudentNo  StudentName
DBS201 A Database Design Russ Pangborn S 2016 111111111 Terry Adams
RPG544 Advanced RPG Justin Trudeau B+
SYS333 B Systems Analysis Donald Trump C+ F
DBT544 DB2 Steve Harper
MAP525 Mobile Devices W 2017
MCL544 Assembler Cito Gaston

Similar attributes – Subject Code, Section, …
Different attributes – Grade, Semester, Year
Missing attribute – Instructor Number
Taking this to 3NF will produce similar entities like STUDENT and INSTRUCTOR, and different entities like GRADES
After the user views are in 3NF, a merge would be done
Writing a relation from a verbal or written description
Write the DBDL for the following description:
  Each dentist’s office has a unique identifier for insurance companies. There is a mailing address for the office as well as the name of the head dentist. There are many patients and each patient has a unique identifier number.
1. List Attributes
  OfficeNo, MailAddress, HeadDentist, PatientNo, PatientName
2. Select Primary Key (unique identifier for each row)
  OfficeNo, MailAddress, HeadDentist, PatientNo, PatientName (primary key: OfficeNo)
3. Show multi-valued dependencies
  OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName)
Give the table a name
  DentistsOffice [ OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName) ]
We call this 0NF or UNF (Unnormalized Form) because there is a multi-valued dependency
Change from UNF to 1NF
  DentistsOffice [ OfficeNo, MailAddress, HeadDentist, (PatientNo, PatientName) ]
Select the primary key for the multi-valued dependency. Create a two-part primary key by concatenating the original PK with the PK of the multi-valued dependency
  DentistsOffice [ OfficeNo, PatientNo, MailAddress, HeadDentist, PatientName ]
Change from 1NF to 2NF
  DentistsOffice [ OfficeNo, PatientNo, MailAddress, HeadDentist, PatientName ]
Look for partial dependencies
  MailAddress is dependent on OfficeNo
  PatientName is dependent on PatientNo
Resulting relations:
  OfficePatient [ OfficeNo, PatientNo ]
  DentistsOffice [ OfficeNo, MailAddress, HeadDentist ]
  Patient [ PatientNo, PatientName ]
Already in 3NF
Purchases at Shoppers Drug Mart-1111 Young Street Toronto are identified by a unique purchase # and a date on the bill. There can be several items and the purchase must record the item #, the quantity, the unit price, a tax code for each item, and the total price.
UNF
  [ Purchase#, date, (item#, quantity, unit_price, tax code) ]
1NF
  Item# is the primary key of the multi-valued dependency.
  [ Purchase#, item#, date, quantity, unit_price, tax code ]
2NF
  PurchaseItem [ Purchase#, item#, quantity ]
  Purchase [ Purchase#, date ]
  Item [ item#, unit_price, tax code ]
Normalization Abstract Example
For the following un-normalized relation:
  RELN1[A, B, {C, D}, E, F, G]
where E can be determined by knowing only A, G can be determined by knowing only F, and C can be used to determine D
1NF: RELN1[A, B, C, D, E, F, G]
2NF: RELN1[A, B, F, G]  RELN3[A, E]  RELN2[A, B, C, D]
3NF: RELN1[A, B, F]  RELN4[F, G]  (RELN2 and RELN3 carry over unchanged)
Nested Repeating Groups Example Representation of a user view in UNF may result in nested repeating groups. The process of resolving a repeating group must be performed starting with the outermost repeating group first to produce a temporary new relation containing a repeating group. The process of resolving a repeating group must then also be performed on this temporary relation
STUDIOUS SENIORS ATTENDANCE REPORT

Class#  Class Name        Room  Member#  Name        Date     Attendance
1234    Ballroom Dancing  2133  123      Jane Smith  Sept 12  Y
                                                     Sept 19  Y
1234    Ballroom Dancing  2133  124      Bill Smith  Sept 12  Y
                                                     Sept 19  N
1234    Ballroom Dancing  2133  321      Paul Woo    Sept 12  N
1234    Ballroom Dancing  2133  439      Mary Lu     Sept 12  Y
2245    Basket Weaving    2133  439      Mary Lu     Sept 14  N
                                                     Sept 21  Y
3122    Italian Cooking   2134  439      Mary Lu     Sept 15  Y
                                                     Sept 22  Y
3122    Italian Cooking   2134  123      Jane Smith  Sept 15  Y
                                                     Sept 22  N

UNF?
  ATTLIST [ Class#, ClassName, Room, {Member#, Member_Name, {Date, Attendance} } ]
UNF
  ATTLIST [ Class#, ClassName, Room, {Member#, Member_Name, {Date, Attendance} } ]
1NF
  ATTLIST(Class#, Member#, Date, ClassName, Room, Member_Name, Attendance)
2NF
  MEMBER(Member#, Member_Name)
  CLASS(Class#, ClassName, Room)
  ATTLIST(Class#, Member#, Date, Attendance)
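Resolving the nested repeating groups outermost first can be sketched as two passes. The nested dictionaries below hold the first two classes of the attendance report. The structure and the function names `resolve_outer` and `resolve_inner` are hypothetical.

```python
unf_attlist = [
    {"Class#": 1234, "ClassName": "Ballroom Dancing", "Room": 2133,
     "Members": [
         {"Member#": 123, "Member_Name": "Jane Smith",
          "Dates": [("Sept 12", "Y"), ("Sept 19", "Y")]},
         {"Member#": 124, "Member_Name": "Bill Smith",
          "Dates": [("Sept 12", "Y"), ("Sept 19", "N")]},
         {"Member#": 321, "Member_Name": "Paul Woo",
          "Dates": [("Sept 12", "N")]},
         {"Member#": 439, "Member_Name": "Mary Lu",
          "Dates": [("Sept 12", "Y")]},
     ]},
    {"Class#": 2245, "ClassName": "Basket Weaving", "Room": 2133,
     "Members": [
         {"Member#": 439, "Member_Name": "Mary Lu",
          "Dates": [("Sept 14", "N"), ("Sept 21", "Y")]},
     ]},
]


def resolve_outer(classes):
    """Pass 1: one row per (class, member); each row still holds its Dates group."""
    temp = []
    for cls in classes:
        fixed = {k: v for k, v in cls.items() if k != "Members"}
        for member in cls["Members"]:
            temp.append({**fixed, **member})
    return temp


def resolve_inner(temp_rows):
    """Pass 2: one row per (class, member, date); no repeating groups remain."""
    flat = []
    for row in temp_rows:
        fixed = {k: v for k, v in row.items() if k != "Dates"}
        for date, attended in row["Dates"]:
            flat.append({**fixed, "Date": date, "Attendance": attended})
    return flat


temporary = resolve_outer(unf_attlist)   # temporary relation, still nested
attlist_1nf = resolve_inner(temporary)   # 1NF rows
keys = {(r["Class#"], r["Member#"], r["Date"]) for r in attlist_1nf}
print(len(temporary), len(attlist_1nf), len(keys))  # 5 8 8
```

The three-part key (Class#, Member#, Date) is unique across the flattened rows, which matches the 1NF key of ATTLIST.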