Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015
Chapter 3: Database Normalization
Sections
Database Anomalies
What Is Normalization?
The Normal Forms
Database Design
Database design requires choosing a suitable logical structure. Most importantly, the designer must decide which relations are needed to store the values and which attributes each relation should use, and must optimize the relation design for clarity and efficiency.
Data Anomalies
Edgar Codd, inventor of the relational model, described data anomalies in the 1970s. They are unintended consequences of a database modification. There are three kinds of anomalies: insert anomalies, delete anomalies, and update anomalies.
Insert Anomaly
An insert anomaly happens when data is inserted into a relation with some attributes missing (null attributes). If we view the relation as a set in which every tuple is its own key, then inserting such a tuple is an illegal operation. We do not want the database to have holes in its information.
Delete Anomaly
If a database is not normalized, deleting a tuple from a relation can also delete other information that we wanted to keep. The next slide shows an example of both insert and delete anomalies.
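Insert anomaly: the S5 tuple has an empty P#, because S5 supplies no parts.
Delete anomaly: if S5 is deleted, then all information about S5 (its salary, its status of 30, and its city, Athens) is lost.

S#  | Salary | STATUS | CITY   | P# | QTY
S   |        |        | LONDON | P1 | 300
S   |        |        | LONDON | P2 | 200
S   |        |        | LONDON | P3 | 400
S   |        |        | LONDON | P4 | 200
S   |        |        | LONDON | P5 | 100
S   |        |        | LONDON | P6 | 100
S   |        |        | PARIS  | P1 | 300
S   |        |        | PARIS  | P2 | 400
S   |        |        | PARIS  | P2 | 200
S   |        |        | LONDON | P2 | 200
S   |        |        | LONDON | P4 | 300
S   |        |        | LONDON | P5 | 400
S5  |        | 30     | ATHENS | -  | -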
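The two anomalies can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the course: a relation is modeled as a plain list of tuples, None stands for a null attribute, the Salary column is omitted, and every value is a hypothetical toy value except S5's status 30 and city Athens, which come from the slide.

```python
# Minimal sketch of the un-normalized supplier relation.
# Each row: (S#, STATUS, CITY, P#, QTY). Values are hypothetical toy data,
# except S5's STATUS 30 and CITY ATHENS, which the slide states.
rows = [
    ("S1", 20, "LONDON", "P1", 300),
    ("S1", 20, "LONDON", "P2", 200),
    # Insert anomaly: S5 supplies no parts, so its tuple can only be
    # stored with a null (None) P# and QTY.
    ("S5", 30, "ATHENS", None, None),
]

def delete_supplier(rows, s):
    """Delete every tuple whose S# equals s."""
    return [r for r in rows if r[0] != s]

def supplier_info(rows, s):
    """Return (STATUS, CITY) for supplier s, or None if no tuple mentions s."""
    for sid, status, city, _p, _qty in rows:
        if sid == s:
            return (status, city)
    return None

after = delete_supplier(rows, "S5")
print(supplier_info(rows, "S5"))   # (30, 'ATHENS')
print(supplier_info(after, "S5"))  # None: status 30 and ATHENS are lost too
```

Because S5's status and city live only in the same tuple as its (nonexistent) shipment, removing that tuple removes the supplier facts with it.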
Update Anomaly
If a database is not normalized, updating a single fact can be very inefficient and sometimes incorrect. The same fact may be stored in many tuples, or even in many relations, so updating only one of them is not sufficient. Every copy must be updated to reflect the change; if this is not done, the database becomes inaccurate.
If an attribute of S1 is changed, many updates have to be issued (one for every S1 tuple) for a single attribute change. Why not issue a single change to a single relation?

S#  | Salary | STATUS | CITY   | P# | QTY
S   |        |        | LONDON | P1 | 300
S   |        |        | LONDON | P2 | 200
S   |        |        | LONDON | P3 | 400
S   |        |        | LONDON | P4 | 200
S   |        |        | LONDON | P5 | 100
S   |        |        | LONDON | P6 | 100
S   |        |        | PARIS  | P1 | 300
S   |        |        | PARIS  | P2 | 400
S   |        |        | PARIS  | P2 | 200
S   |        |        | LONDON | P2 | 200
S   |        |        | LONDON | P4 | 300
S   |        |        | LONDON | P5 | 400
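A short sketch of why one logical change turns into many physical updates. It assumes a toy relation of (S#, CITY, P#, QTY) tuples with hypothetical values; the `limit` parameter is a hypothetical device that simulates an update which stops before reaching every copy.

```python
# Hypothetical toy rows: (S#, CITY, P#, QTY)
rows = [
    ("S1", "LONDON", "P1", 300),
    ("S1", "LONDON", "P2", 200),
    ("S1", "LONDON", "P3", 400),
    ("S2", "PARIS", "P1", 300),
]

def update_city(rows, s, new_city, limit=None):
    """Set CITY to new_city in the tuples of supplier s.

    `limit` (hypothetical) simulates an update that stops after `limit` tuples.
    Returns the new rows and the number of tuples changed.
    """
    out, changed = [], 0
    for sid, city, p, qty in rows:
        if sid == s and (limit is None or changed < limit):
            out.append((sid, new_city, p, qty))
            changed += 1
        else:
            out.append((sid, city, p, qty))
    return out, changed

def cities_of(rows, s):
    """All CITY values recorded for supplier s (should be exactly one)."""
    return {city for sid, city, _p, _qty in rows if sid == s}

full, n = update_city(rows, "S1", "PARIS")
print(n, sorted(cities_of(full, "S1")))   # 3 ['PARIS']
partial, _ = update_city(rows, "S1", "PARIS", limit=1)
print(sorted(cities_of(partial, "S1")))   # ['LONDON', 'PARIS']: inconsistent
```

One change of S1's city costs three tuple updates here, and missing any of them leaves S1 recorded in two cities at once.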
What Is Normalization?
Normalization is a formal process of decomposing relations into smaller relations. Normalized relations aim to remove redundancy and undesirable dependencies, and by doing so they prevent data anomalies. Normalization is also the basis for designing simpler, clearer, faster, and more efficient relational databases.
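Decomposition can be sketched with relational projection and natural join over Python sets of tuples. The split into S(S#, STATUS, CITY) and SP(S#, P#, QTY) is one plausible decomposition of the supplier table, assumed here for illustration, and the data is hypothetical. The split is lossless because S# functionally determines STATUS and CITY, so joining the pieces back on S# recovers exactly the original tuples.

```python
# Hypothetical toy rows of the flat relation: (S#, STATUS, CITY, P#, QTY)
flat = {
    ("S1", 20, "LONDON", "P1", 300),
    ("S1", 20, "LONDON", "P2", 200),
    ("S2", 10, "PARIS", "P1", 300),
}

def project(rel, idx):
    """Relational projection: keep the columns at positions idx; the set drops duplicates."""
    return {tuple(t[i] for i in idx) for t in rel}

S = project(flat, (0, 1, 2))   # S(S#, STATUS, CITY): one tuple per supplier
SP = project(flat, (0, 3, 4))  # SP(S#, P#, QTY): one tuple per shipment

def join_on_s(s_rel, sp_rel):
    """Natural join of S and SP on their shared S# column."""
    return {(s, st, c, p, q)
            for (s, st, c) in s_rel
            for (s2, p, q) in sp_rel
            if s == s2}

assert join_on_s(S, SP) == flat   # lossless for this data
S.add(("S5", 30, "ATHENS"))       # a supplier with no parts needs no nulls
print(len(S), len(SP))            # 3 3
```

After the split, each supplier's status and city are stored once, so the insert, delete, and update anomalies from the earlier slides no longer arise for those attributes.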
Normal Forms
Data anomalies were described by Codd in the 1970s. He and others (Boyce, Fagin, and more) also began defining Normal Forms, which describe how rigorously a relation has been normalized. A Normal Form is the specific form a relation is in when it satisfies specific properties. These properties provide a systematic way of transforming non-normalized relations into normalized relations.
Many Normal Forms
Relations in higher Normal Forms are more normalized than relations in lower Normal Forms. Every higher Normal Form satisfies every Normal Form below it; for example, a 2NF relation is also in 1NF, and a 3NF relation is also in 2NF and 1NF. 1NF, 2NF, 3NF, BCNF, and 4NF will be discussed, but there are more Normal Forms than these five.
1NF (First Normal Form)
For a relation to be in 1NF:
- Related values must be decomposed into separate tables.
- All rows must be unique (the relation is a set).
- All columns must be unique (no repeating groups).
- Every value in every tuple must be atomic (it cannot be divided).
- A primary key must be defined (formally, it is usually taken to be the entire tuple, assuming every row is unique and the relation is a set).
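A rough checker for the mechanical 1NF conditions listed above, as a sketch only. It assumes a relation is a tuple of column names plus a list of row tuples, treats Python collections as non-atomic and strings as atomic, and cannot judge whether values are "related" in the design sense. The function names are hypothetical.

```python
def is_atomic(v):
    """Assumption: collections are non-atomic; str and numbers are atomic."""
    return not isinstance(v, (list, tuple, set, frozenset, dict))

def check_1nf(columns, rows):
    """Return a list of 1NF violations (an empty list means none were found)."""
    problems = []
    # All columns must be unique (no repeating groups).
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    # Every value in every tuple must be atomic.
    for r in rows:
        for col, v in zip(columns, r):
            if not is_atomic(v):
                problems.append(f"non-atomic value in {col}: {v!r}")
    # All rows must be unique. Pairwise comparison still works when a row
    # holds unhashable lists. If every row is unique, the whole tuple can
    # serve as the key.
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i] == rows[j]:
                problems.append(f"duplicate row: {rows[i]!r}")
    return problems

cols = ("S#", "CITY", "P#")
good = [("S1", "LONDON", "P1"), ("S1", "LONDON", "P2")]
bad = [("S1", "LONDON", ["P1", "P2"]), ("S1", "LONDON", ["P1", "P2"])]
print(check_1nf(cols, good))       # []
print(len(check_1nf(cols, bad)))   # 3
```

The `bad` relation packs a list of parts into P# (two non-atomic values) and repeats a row; splitting the list into one tuple per part is the usual repair.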
Functional Dependencies (FD)
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if each X-value in R has associated with it precisely one Y-value in R (at any one time); that is, no X-value is mapped to two different Y-values. We write X → Y. A functional dependency is a special form of integrity constraint: every legal extension (tabulation) of the relation satisfies that constraint. An attribute Y is fully functionally dependent on X if Y is functionally dependent on X but not on any proper subset of X. From now on, by FD we mean full FD.
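The definitions translate directly into a check over one tabulation of a relation. This is a sketch with hypothetical function names and toy data, assuming rows are dicts keyed by attribute name. Because an FD is a constraint on every legal extension, a single table can refute an FD but cannot prove that it holds; the check only tells us the given rows do not violate it.

```python
from itertools import combinations

def fd_holds(rows, X, Y):
    """True if each X-value in rows is associated with exactly one Y-value."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in X)
        y = tuple(r[a] for a in Y)
        # setdefault stores y the first time x appears; later rows must agree.
        if seen.setdefault(x, y) != y:
            return False
    return True

def full_fd_holds(rows, X, Y):
    """X -> Y holds, and no proper subset of X (including the empty set) determines Y."""
    if not fd_holds(rows, X, Y):
        return False
    return not any(
        fd_holds(rows, list(sub), Y)
        for k in range(len(X))          # subset sizes 0 .. len(X) - 1
        for sub in combinations(X, k)
    )

# Hypothetical toy rows
rows = [
    {"S#": "S1", "P#": "P1", "CITY": "LONDON", "QTY": 300},
    {"S#": "S1", "P#": "P2", "CITY": "LONDON", "QTY": 200},
    {"S#": "S2", "P#": "P1", "CITY": "PARIS", "QTY": 400},
]
print(fd_holds(rows, ["S#"], ["CITY"]))               # True
print(full_fd_holds(rows, ["S#", "P#"], ["QTY"]))     # True
print(full_fd_holds(rows, ["S#", "P#"], ["CITY"]))    # False: S# alone determines CITY
```

The last line shows the kind of partial dependency that full FDs rule out: CITY depends on {S#, P#} only because it already depends on S#.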
2NF (Second Normal Form)
An SQL table is automatically in 1NF, but, in Codd's own words, that is not good enough.
3NF (Third Normal Form)
Functional dependencies
BCNF (Boyce-Codd Normal Form)
Also known as 3.5 Normal Form.
4NF (Fourth Normal Form)