Normalization continued
CMSC 461
Michael Wilson

Normalization clarification
- Normalization is simply a way of reducing anomalous database behavior
- It’s not a required or programmatically necessary concept
- A database will function perfectly fine without normalized tables
  - The table design will just suck

First normal form (1NF)
- Each attribute has only atomic values
  - No value in a 1NF relation is itself a set
  - The values cannot be broken down any further
- An attribute that violates 1NF, with an example value:
  - phoneNumberAndFirstName: ,Jason

First normal form (1NF)
- Furthermore, there are no duplicate rows
  - This means that there must be a key
  - This is important for the higher normal forms

Second normal form (2NF)
- Must be in 1NF
- Non-prime attributes are dependent on the whole of a candidate key
  - Dependent on only part of a candidate key: not 2NF
  - Non-prime = attributes not part of any candidate key
- One thing to keep in mind
  - Multiple candidate keys may occur within one table
  - As long as no non-prime attribute depends on just part of any candidate key, the table is in 2NF

Reminder
- Candidate key = minimal uniquely identifying set of attributes

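The reminder above can be made concrete with a small sketch. It assumes functional dependencies are given as pairs of Python sets, and it uses a hypothetical relation R(A, B, C) with A → B and B → C that is not from the slides. The helper names `closure` and `candidate_keys` are illustrative, and the brute-force search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the left side, we also get the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Find the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a known key is a superkey, but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation R(A, B, C) with A -> B and B -> C.
attributes = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
keys = candidate_keys(attributes, fds)
print(keys)
```

Any set that contains one of the returned keys is a superkey; only the minimal sets are candidate keys.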
2NF example

Employee   Skill            Work Location
Brown      Light Cleaning   73 Industrial Way
Brown      Typing           73 Industrial Way
Harrison   Light Cleaning   73 Industrial Way
Jones      Shorthand        114 Main Street
Jones      Typing           114 Main Street
Jones      Whittling        114 Main Street

2NF example
- Jacked shamelessly from Wikipedia
  - Good example, though
- Neither Employee nor Skill can be a key here
  - Key must be {Employee, Skill}
- Here, the work location depends on the employee alone
  - How to solve this?

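One possible answer, sketched under the assumption that Employee → Work Location really holds in the data: move Work Location into its own relation keyed by Employee, and keep only {Employee, Skill} in the other. The variable names are illustrative, and plain Python structures stand in for tables.

```python
# Rows of the original table: (employee, skill, work_location).
rows = [
    ("Brown", "Light Cleaning", "73 Industrial Way"),
    ("Brown", "Typing", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
]

# Relation 1: Employee -> Work Location, keyed by employee alone.
employee_location = {}
for employee, _skill, location in rows:
    # setdefault keeps the first location seen; a mismatch would mean the
    # dependency Employee -> Work Location does not hold in the data.
    if employee_location.setdefault(employee, location) != location:
        raise ValueError(f"conflicting locations for {employee}")

# Relation 2: only the key {Employee, Skill} remains.
employee_skills = sorted({(employee, skill) for employee, skill, _ in rows})

# Joining the two relations on employee reproduces the original rows.
rejoined = [(e, s, employee_location[e]) for e, s in employee_skills]
print(employee_location)
```

Each work location is now stored once per employee, so changing an employee's location touches a single entry.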
Third normal form (3NF)
- Must be in 2NF
- Every non-prime attribute must be directly (non-transitively) dependent on every superkey in a relation
  - X→A where X is a superkey and A is a non-prime attribute
  - Must hold for every superkey and every non-prime attribute

Reminder
- Superkey = uniquely identifying set of attributes (not necessarily minimal)

Third normal form (3NF)
- Another definition: for every functional dependency X→A, one of the following must hold:
  - X→A is trivial
  - X is a superkey
  - Every element of the set difference between A and X is a prime attribute (part of a candidate key)

3NF example

Tournament             Year   Winner            Winner DOB
Indiana Invitational   1998   Al Frederickson   21 July 1975
Cleveland Open         1999   Bob Albertson     28 September 1968
Des Moines Masters     1999   Al Frederickson   21 July 1975
Indiana Invitational   1999   Chip Masterson    14 March 1977

3NF example
- Also jacked from Wikipedia
- This table is in 2NF
- What are the candidate keys?
- What are the superkeys?

3NF example
- The winner functionally determines the winner’s date of birth
- Transitive dependency of a non-prime attribute: {Tournament, Year} → Winner → Winner DOB
- Therefore, 3NF violation
- How do we fix this?

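One way to fix it, sketched with Python's built-in `sqlite3` module: split the winner's date of birth into its own table so it is stored once per winner. The table and column names are illustrative, and keying on the winner's name assumes names are unique, which a real schema would probably not rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winner_dob (
        winner TEXT PRIMARY KEY,
        dob    TEXT NOT NULL
    );
    CREATE TABLE tournament_winner (
        tournament TEXT    NOT NULL,
        year       INTEGER NOT NULL,
        winner     TEXT    NOT NULL REFERENCES winner_dob(winner),
        PRIMARY KEY (tournament, year)
    );
""")

# Each date of birth is stored exactly once, keyed by winner.
conn.executemany("INSERT INTO winner_dob VALUES (?, ?)", [
    ("Al Frederickson", "21 July 1975"),
    ("Bob Albertson", "28 September 1968"),
    ("Chip Masterson", "14 March 1977"),
])
conn.executemany("INSERT INTO tournament_winner VALUES (?, ?, ?)", [
    ("Indiana Invitational", 1998, "Al Frederickson"),
    ("Cleveland Open", 1999, "Bob Albertson"),
    ("Des Moines Masters", 1999, "Al Frederickson"),
    ("Indiana Invitational", 1999, "Chip Masterson"),
])

# A join rebuilds the original four-column table when it is needed.
rejoined = conn.execute("""
    SELECT t.tournament, t.year, t.winner, w.dob
    FROM tournament_winner AS t
    JOIN winner_dob AS w ON w.winner = t.winner
    ORDER BY t.year, t.tournament
""").fetchall()
dob_rows = conn.execute("SELECT COUNT(*) FROM winner_dob").fetchone()[0]
```

In both tables every non-prime attribute now depends directly on that table's key.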
Boyce-Codd Normal Form
- Often called 3.5NF
- Only states two things
- For every functional dependency of the form X→A, one of the following must hold:
  - X→A is trivial
  - X is a superkey for the relation

Difference between 3NF and BCNF?
- It’s actually pretty straightforward
- 3NF says that non-prime attributes must be dependent on a key
- However, it does not say anything about prime attributes
  - In 3NF, a prime attribute may depend on something that is not a superkey; BCNF forbids this
- BCNF tables satisfy 3NF, but not necessarily the reverse

3NF and BCNF
- BCNF is only slightly stricter than 3NF
- The only time you run into issues is when a 3NF relation has overlapping candidate keys
- It is possible to have a 3NF relation that is not in BCNF when candidate keys overlap

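To illustrate the overlap case, here is a minimal sketch that checks a list of functional dependencies against both definitions. The address relation R(Street, City, Zip), its dependencies, and the function name `normal_form_violations` are assumptions made for illustration; they are not from the slides. The candidate keys are supplied by hand, and only the listed dependencies are checked.

```python
def normal_form_violations(fds, candidate_keys):
    """Return (bcnf_violations, third_nf_violations) among the given FDs."""
    # Prime attributes are those that appear in at least one candidate key.
    prime = set().union(*candidate_keys)
    bcnf_violations, third_nf_violations = [], []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        # A set of attributes is a superkey if it contains a candidate key.
        superkey = any(key <= lhs for key in candidate_keys)
        if trivial or superkey:
            continue  # satisfies both BCNF and 3NF
        bcnf_violations.append((lhs, rhs))
        # 3NF still allows the dependency if everything new on the right
        # side is a prime attribute.
        if not (rhs - lhs) <= prime:
            third_nf_violations.append((lhs, rhs))
    return bcnf_violations, third_nf_violations


# Hypothetical relation R(Street, City, Zip).
fds = [
    ({"Street", "City"}, {"Zip"}),  # a street address in a city has one zip
    ({"Zip"}, {"City"}),            # a zip code lies in one city
]
# The two candidate keys overlap on Street.
candidate_keys = [{"Street", "City"}, {"Street", "Zip"}]

bcnf_bad, third_bad = normal_form_violations(fds, candidate_keys)
print("BCNF violations:", bcnf_bad)
print("3NF violations:", third_bad)
```

Zip → City fails the BCNF test because Zip alone is not a superkey. It passes the 3NF test because City is a prime attribute.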
What to use?
- 3NF is very popular, most common
- BCNF is also very popular
- Recommendation
  - Shoot for 3NF to begin with
  - Very sensible way of organizing your data
  - Tables only have information that describes the key

Denormalization
- Though normalization helps us rely on our data, denormalization is sometimes required for performance reasons
- Often, one will need to re-add redundant data
  - Minimizes joins, selects, views, etc.
- In high-performance applications, one extra select could cause crippling response issues

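As a rough sketch of what re-adding redundant data can look like, the example below copies a winner's date of birth back into the tournament table so that reads no longer need a join. It uses Python's built-in `sqlite3` module; the schema and names are hypothetical, and it demonstrates the mechanics only, not any performance gain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE winners (
        name TEXT PRIMARY KEY,
        dob  TEXT NOT NULL
    );
    CREATE TABLE tournament_winner (
        tournament TEXT    NOT NULL,
        year       INTEGER NOT NULL,
        winner     TEXT    NOT NULL,
        winner_dob TEXT,   -- redundant copy, kept only to avoid a join
        PRIMARY KEY (tournament, year)
    );
    INSERT INTO winners VALUES ('Al Frederickson', '21 July 1975');
    INSERT INTO tournament_winner (tournament, year, winner)
        VALUES ('Indiana Invitational', 1998, 'Al Frederickson');
""")

# Fill the redundant column from the normalized source of truth.
conn.execute("""
    UPDATE tournament_winner
    SET winner_dob = (SELECT dob FROM winners
                      WHERE winners.name = tournament_winner.winner)
""")

# Normalized read: needs a join.
joined = conn.execute("""
    SELECT t.tournament, w.dob
    FROM tournament_winner AS t
    JOIN winners AS w ON w.name = t.winner
""").fetchall()

# Denormalized read: a single-table select.
flat = conn.execute(
    "SELECT tournament, winner_dob FROM tournament_winner"
).fetchall()
```

The cost is that every change to `winners.dob` must also update the copy, which is exactly the kind of anomaly normalization was removing.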
When to denormalize?
- Not at first!
  - If you don’t know that you’re going to run into performance issues, then don’t denormalize
  - Always try to keep things in a normalized form if possible
- Later
  - Once you’ve identified issues through testing and statistics, denormalize if necessary