Normalization
CMSC 461
Michael Wilson
Anomalies
- Poor relational database design can lead to anomalies
- Anomalies that we tend to encounter:
  - Redundancy: data is repeated unnecessarily in several tuples
  - Update anomalies: data is updated in one place but not another
  - Deletion anomalies: deleting data from a relation unintentionally removes additional data along with it
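Redundancy
| title | year | length | genre | studio | star |
| Star Wars | 1977 | | SciFi | Fox | Carrie Fisher |
| Star Wars | 1977 | | SciFi | Fox | Mark Hamill |
| Star Wars | 1977 | | SciFi | Fox | Harrison Ford |
| Gone With the Wind | 1939 | | Drama | MGM | Vivien Leigh |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Myers |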
Redundancy
- The movie title, year, length, genre, and studio name are repeated every time we add a new star to the relation
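Update anomaly
| title | year | length | genre | studio | star |
| Star Wars | 1977 | | SciFi | Fox | Carrie Fisher |
| Star Wars | 1977 | | SciFi | Fox | Mark Hamill |
| Star Wars | 1977 | | SciFi | Fox | Harrison Ford |
| Gone With the Wind | 1939 | | Drama | MGM | Vivien Leigh |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Myers |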
Update anomaly
- We changed the film length for one of the entries
- We forgot to update it for the rest of the entries
- Now the data is inconsistent
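The update anomaly can be reproduced in a few lines of Python. This is a minimal sketch, not part of the original slides: it uses only the Wayne’s World rows (the one film whose year and length both appear in the deck), and the new length of 96 is a made-up value used purely for illustration. The helper name lengths_by_movie is also hypothetical.

```python
# Two rows of the single wide movies relation (other attributes omitted).
movies = [
    {"title": "Wayne's World", "year": 1992, "length": 95, "star": "Dana Carvey"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "star": "Mike Myers"},
]

# Update the length in only one row; 96 is an illustrative, made-up value.
movies[0]["length"] = 96


def lengths_by_movie(rows):
    """Collect every distinct length recorded for each (title, year) pair."""
    seen = {}
    for row in rows:
        seen.setdefault((row["title"], row["year"]), set()).add(row["length"])
    return seen


# A movie with more than one recorded length is inconsistent.
inconsistent = {
    movie: lengths
    for movie, lengths in lengths_by_movie(movies).items()
    if len(lengths) > 1
}
print(inconsistent)
```

Because the length is stored once per star rather than once per movie, nothing in the relation itself prevents the two rows from disagreeing.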
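Delete anomaly
| title | year | length | genre | studio | star |
| Star Wars | 1977 | | SciFi | Fox | Carrie Fisher |
| Star Wars | 1977 | | SciFi | Fox | Mark Hamill |
| Star Wars | 1977 | | SciFi | Fox | Harrison Ford |
| Gone With the Wind | 1939 | | Drama | MGM | Vivien Leigh |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Myers |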
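Delete anomaly
| title | year | length | genre | studio | star |
| Star Wars | 1977 | | SciFi | Fox | Carrie Fisher |
| Star Wars | 1977 | | SciFi | Fox | Mark Hamill |
| Star Wars | 1977 | | SciFi | Fox | Harrison Ford |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Dana Carvey |
| Wayne’s World | 1992 | 95 | Comedy | Paramount | Mike Myers |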
Delete anomaly
- Say we wanted to remove Vivien Leigh as a star
- We unintentionally removed all information about Gone With the Wind from our relation
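The same effect can be shown with a short Python sketch. It is an illustration rather than anything from the original slides: rows are plain tuples, and the length column is left out because the deck does not show it for every film.

```python
# The wide movies relation as (title, year, genre, studio, star) tuples.
movies = [
    ("Star Wars", 1977, "SciFi", "Fox", "Carrie Fisher"),
    ("Star Wars", 1977, "SciFi", "Fox", "Mark Hamill"),
    ("Star Wars", 1977, "SciFi", "Fox", "Harrison Ford"),
    ("Gone With the Wind", 1939, "Drama", "MGM", "Vivien Leigh"),
    ("Wayne's World", 1992, "Comedy", "Paramount", "Dana Carvey"),
    ("Wayne's World", 1992, "Comedy", "Paramount", "Mike Myers"),
]

# Remove Vivien Leigh as a star by deleting her only row.
remaining = [row for row in movies if row[4] != "Vivien Leigh"]

# Compare which movie titles the relation knows about before and after.
titles_before = {row[0] for row in movies}
titles_after = {row[0] for row in remaining}
lost = titles_before - titles_after
print(lost)  # prints {'Gone With the Wind'}
```

Deleting one fact (who starred in the film) also deleted unrelated facts (its year, genre, and studio), because they were only ever stored in that row.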
How do we address these anomalies?
- Same answer for all three: decomposing relations
- Split the attributes of a relation to make two new relations
- Relatively simple operation
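Decomposing a relation
- A relation $R(A_1, A_2, \ldots, A_n)$ can be decomposed into relations $S(B_1, B_2, \ldots, B_m)$ and $T(C_1, C_2, \ldots, C_k)$ such that:
  - $\{A_1, A_2, \ldots, A_n\} = \{B_1, B_2, \ldots, B_m\} \cup \{C_1, C_2, \ldots, C_k\}$
  - $S = \pi_{B_1, B_2, \ldots, B_m}(R)$
  - $T = \pi_{C_1, C_2, \ldots, C_k}(R)$
- In other words, take a subset of R’s attributes and stuff them into S, and take the rest of R’s attributes and stuff them into T
- The two attribute sets only need to cover all of R’s attributes between them; they may overlap, as title and year do in the movies decomposition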
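The definition above can be sketched in Python with projection as a set comprehension. This is an illustrative sketch under stated assumptions: the function names project and decompose, the representation of rows as dictionaries, and the sample relation r are all invented here, not taken from the slides.

```python
def project(rows, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    return {tuple(row[a] for a in attributes) for row in rows}


def decompose(rows, s_attrs, t_attrs):
    """Split a relation into two projections whose attributes cover the original."""
    all_attrs = set(rows[0]) if rows else set()
    if set(s_attrs) | set(t_attrs) != all_attrs:
        raise ValueError("S and T must together contain every attribute of R")
    return project(rows, s_attrs), project(rows, t_attrs)


# A made-up relation R(A1, A2, A3), with each row stored as a dict.
r = [
    {"A1": 1, "A2": "x", "A3": "p"},
    {"A1": 1, "A2": "x", "A3": "q"},
    {"A1": 2, "A2": "y", "A3": "p"},
]

# S(A1, A2) and T(A1, A3) share A1, so they can later be joined on it.
s, t = decompose(r, ["A1", "A2"], ["A1", "A3"])
print(sorted(s))
print(sorted(t))
```

Using sets for the results mirrors relational projection: the repeated (1, "x") pair collapses to a single tuple in S, which is exactly how decomposition removes redundancy.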
Decomposing our movies relation – movies2
| title | year | length | genre | studio |
| Star Wars | 1977 | | SciFi | Fox |
| Gone With the Wind | 1939 | | Drama | MGM |
| Wayne’s World | 1992 | 95 | Comedy | Paramount |
Decomposing our movies relation – movies3
| title | year | star |
| Star Wars | 1977 | Carrie Fisher |
| Star Wars | 1977 | Mark Hamill |
| Star Wars | 1977 | Harrison Ford |
| Gone With the Wind | 1939 | Vivien Leigh |
| Wayne’s World | 1992 | Dana Carvey |
| Wayne’s World | 1992 | Mike Myers |
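One way to check that this decomposition loses nothing is to join movies2 and movies3 back together on (title, year). The sketch below is an illustration, not part of the original slides; it omits the length column because the deck does not show it for every film, and the variable names are assumptions.

```python
# movies2: one tuple per movie (title, year, genre, studio).
movies2 = {
    ("Star Wars", 1977, "SciFi", "Fox"),
    ("Gone With the Wind", 1939, "Drama", "MGM"),
    ("Wayne's World", 1992, "Comedy", "Paramount"),
}

# movies3: one tuple per (movie, star) pair (title, year, star).
movies3 = {
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
    ("Star Wars", 1977, "Harrison Ford"),
    ("Gone With the Wind", 1939, "Vivien Leigh"),
    ("Wayne's World", 1992, "Dana Carvey"),
    ("Wayne's World", 1992, "Mike Myers"),
}

# Natural join on the shared attributes title and year.
joined = {
    (title, year, genre, studio, star)
    for (title, year, genre, studio) in movies2
    for (t, y, star) in movies3
    if (title, year) == (t, y)
}

# Projecting the join back onto each schema returns the two relations.
back_to_movies2 = {(ti, yr, ge, st) for (ti, yr, ge, st, _) in joined}
back_to_movies3 = {(ti, yr, star) for (ti, yr, _, _, star) in joined}
print(len(joined))  # prints 6
```

Deleting the Vivien Leigh tuple from movies3 now leaves the Gone With the Wind tuple in movies2 untouched, and a movie's genre or studio is stored in exactly one place.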
Further decomposition?
- There are still some anomaly possibilities in these resulting relations
- Also, we would like to be able to handle movies with the same title released in the same year
- How could we decompose them further to avoid these problems?
Normalization
- There are normal forms that can be applied to relation schemas and databases
- They organize the attributes to reduce redundancy and dependency on one another
- Done primarily by decomposing relations
- There are many different normal forms
Normalization forms
- You’ll see NF to mean “normal form”
- 1NF
  - Attributes are atomic; that is, they cannot be decomposed any further
- 2NF
  - Satisfies 1NF, and every non-prime attribute depends on the whole of each candidate key, not on a proper subset of one
  - Non-prime attributes are attributes that are not part of any candidate key
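The 2NF condition can be checked mechanically once the functional dependencies are written down. The sketch below is an illustration only: the slides do not state the FDs for the movies relation, so the single dependency used here (title, year → length, genre, studio) is an assumption, and the function names closure, candidate_keys, and partial_dependencies are invented for this example. The key search is brute force and only suitable for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def partial_dependencies(attributes, fds):
    """Non-prime attributes determined by a proper subset of a candidate key."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    violations = []
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                subset = set(combo)
                dependent = (closure(subset, fds) - subset) - prime
                if dependent:
                    violations.append((subset, dependent))
    return violations


# Assumed FD for the wide movies relation: a movie determines its details.
attributes = {"title", "year", "length", "genre", "studio", "star"}
fds = [({"title", "year"}, {"length", "genre", "studio"})]

keys = candidate_keys(attributes, fds)
violations = partial_dependencies(attributes, fds)
print(keys)
print(violations)
```

Under that assumed dependency the only candidate key is {title, year, star}, and length, genre, and studio depend on just {title, year}, so the wide relation is not in 2NF. Moving those attributes into their own relation keyed by title and year is what the movies2 decomposition does.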
Third normal form (3NF)
- The relation satisfies 2NF
- Every non-prime attribute is directly dependent on every superkey of the relation
- Directly dependent means that the dependency is not transitive
  - If you have two FDs, A→B and B→C, then A→C is a transitive dependency
- In other words, a relation with key A should contain only A and attributes that depend directly on A
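The A→B, B→C example can be checked with a short Python sketch. It is an illustration, not the slides' method: it uses the equivalent formulation of 3NF in which every nontrivial FD X→Y must have X as a superkey or Y made up of prime attributes, it inspects only the FDs that are listed rather than every implied FD, and the function names are invented here.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, prime):
    """Listed FDs whose left side is not a superkey and whose right side
    contains non-prime attributes (simplified: implied FDs are not checked)."""
    violations = []
    for lhs, rhs in fds:
        non_prime_targets = (rhs - lhs) - prime
        is_superkey = closure(lhs, fds) == attributes
        if non_prime_targets and not is_superkey:
            violations.append((lhs, non_prime_targets))
    return violations


# R(A, B, C) with A -> B and B -> C; A is the only candidate key.
attributes = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
prime = {"A"}

reachable_from_a = closure({"A"}, fds)  # A reaches C only through B
violations = third_nf_violations(attributes, fds, prime)
print(violations)  # prints [({'B'}, {'C'})]
```

The violation B→C says that C is a fact about B rather than about the key A, which suggests decomposing into relations with attributes (A, B) and (B, C).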
Bill Kent’s pledge
- 3NF: “Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.”
  - 1NF – The key exists
  - 2NF – Non-key attributes are dependent on the whole key
  - 3NF – Non-key attributes must be dependent on nothing but the key