7 Copyright © 2006, Oracle. All rights reserved. Normalization of Relational Tables (Part II)
BCNF
A table is in BCNF if every determinant is a candidate key.
–A stronger definition than 3NF
–Simpler, because it does not refer to 2NF
–Covers a special case omitted by the original 3NF definition
Special cases not covered by 3NF
–A part of a compound key → A part of a compound key
–A nonkey column → A part of a compound key
(In both FDs the determinant is not a candidate key.)
The first special case is possible only if the table has multiple composite candidate keys. Both cases are uncommon.
Second and Third Normal Form (2NF and 3NF)
A table is in 2NF if no partial dependency exists
–Partial dependency: A part of a compound key → A nonkey column
A table is in 3NF if it is in 2NF and no transitive dependency exists
–Transitive dependency: A nonkey column → A nonkey column
BCNF Example
Primary key: (StdSSN, OfferNo)
FD1: StdSSN → StdCity, StdClass (violation of 2NF, hence 3NF)
FD2: OfferNo → OffTerm, OffYear, CourseNo (violation of 2NF, hence 3NF)
FD3: CourseNo → CrsDesc (violation of 3NF, BCNF)
FD4: StdSSN, OfferNo → EnrGrade
Violations of 2NF, 3NF, BCNF (each form is stricter than the one before it)
–2NF (no partial dependency exists): FD1, FD2
–3NF (in 2NF and no transitive dependency): FD1, FD2, FD3
–BCNF (every determinant is a candidate key): FD1, FD2, FD3
Split into four tables
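The BCNF test in this example can be sketched in Python. This is a minimal sketch, not a definitive implementation: the helper names `closure` and `bcnf_violations` are hypothetical, and the labels FD1 to FD4 are assumed to map to the FDs in the order implied by the violation list (the two partial dependencies, then the transitive one, then the full-key FD). A determinant is accepted if its attribute closure covers every column of the table.

```python
def closure(attrs, fds):
    """Return every column functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole LHS is already in the closure.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(columns, fds):
    """Names of FDs whose determinant does not determine every column."""
    return [name for name, (lhs, rhs) in fds.items()
            if closure(lhs, fds.values()) != set(columns)]


COLUMNS = ["StdSSN", "OfferNo", "StdCity", "StdClass", "OffTerm",
           "OffYear", "CourseNo", "CrsDesc", "EnrGrade"]

# FD labels are an assumption (see the text above the block).
FDS = {
    "FD1": (("StdSSN",), ("StdCity", "StdClass")),
    "FD2": (("OfferNo",), ("OffTerm", "OffYear", "CourseNo")),
    "FD3": (("CourseNo",), ("CrsDesc",)),
    "FD4": (("StdSSN", "OfferNo"), ("EnrGrade",)),
}

violations = bcnf_violations(COLUMNS, FDS)
print(violations)  # ['FD1', 'FD2', 'FD3']
```

Only the FD whose determinant is the whole primary key passes, which agrees with the BCNF violation list on the slide.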
Violation Example of BCNF
UnivTable4 (StdSSN, , OfferNo, EnrGrade)
UnivTable4 is in 3NF, but not in BCNF.
UnivTable4 is in 3NF because the only nonkey column (EnrGrade) depends on candidate keys, not on another nonkey column, as the FD diagram shows.
In UnivTable4, the dependencies between StdSSN and  violate BCNF (every determinant is a candidate key).
–Both StdSSN and  are determinants, but neither is an entire candidate key, although each is part of a candidate key.
Example of Handling BCNF Violation
UnivTable4 is in 3NF, but not in BCNF. (A table is in BCNF if every determinant is a whole candidate key.)
UnivTable4 (StdSSN, , OfferNo, EnrGrade) is split into:
UnivTable4-1 (StdSSN, )
  UNIQUE ( )
UnivTable4-2 (StdSSN, OfferNo, EnrGrade)
  FOREIGN KEY (StdSSN) REFERENCES UnivTable4-1
Violation Example of BCNF
UnivTable5 is in 3NF, but not in BCNF.
UnivTable5
AdvisorNo  StdSSN  Major  Status
A1         S1      IS     COMPLETED
A2         S1      FIN    PENDING
A1         S2      IS     PENDING
A3         S2      FIN    COMPLETED
UnivTable5 is in 3NF (the nonkey column depends on a candidate key)
–Status is the only nonkey column.
–Status depends on the entire candidate keys ( or )
UnivTable5 is not in BCNF (every determinant is a candidate key).
–The dependency diagram shows that AdvisorNo is a determinant (for Major) but not a candidate key.
Example of Handling BCNF Violation
UnivTable5 is in 3NF, but not in BCNF. It is split into:
UnivTable5-1 (AdvisorNo, Major)
UnivTable5-2 (AdvisorNo, StdSSN, Status)
  FOREIGN KEY (AdvisorNo) REFERENCES UnivTable5-1
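The split of UnivTable5 can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the table names use underscores because hyphens are not valid in unquoted SQL identifiers, all columns are stored as text, and the primary keys (AdvisorNo for the first table, AdvisorNo plus StdSSN for the second) are assumed from the FDs rather than read off the slide. Joining the two tables should give back the four original rows, with each advisor's major stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE UnivTable5_1 (
    AdvisorNo TEXT PRIMARY KEY,          -- determinant of Major
    Major     TEXT NOT NULL
);
CREATE TABLE UnivTable5_2 (
    AdvisorNo TEXT NOT NULL REFERENCES UnivTable5_1 (AdvisorNo),
    StdSSN    TEXT NOT NULL,
    Status    TEXT NOT NULL,
    PRIMARY KEY (AdvisorNo, StdSSN)      -- assumed primary key
);
""")

# Sample rows taken from the UnivTable5 slide.
conn.executemany("INSERT INTO UnivTable5_1 VALUES (?, ?)",
                 [("A1", "IS"), ("A2", "FIN"), ("A3", "FIN")])
conn.executemany("INSERT INTO UnivTable5_2 VALUES (?, ?, ?)",
                 [("A1", "S1", "COMPLETED"), ("A2", "S1", "PENDING"),
                  ("A1", "S2", "PENDING"), ("A3", "S2", "COMPLETED")])

# Joining on the foreign key reconstructs the original table.
rows = conn.execute("""
    SELECT s.AdvisorNo, s.StdSSN, a.Major, s.Status
    FROM UnivTable5_2 AS s
    JOIN UnivTable5_1 AS a ON a.AdvisorNo = s.AdvisorNo
    ORDER BY s.StdSSN, s.AdvisorNo
""").fetchall()
for row in rows:
    print(row)
```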
Simple Synthesis Procedure (for Generating Tables Satisfying BCNF)
Starting point
–A list of simple functional dependencies
Synthesis
–Individual simple FDs are combined into composite FDs
–Composite FDs are used to construct tables
Five steps in total
–The first two steps eliminate redundancy by removing extraneous columns and derived FDs.
–The last three steps produce tables from the composite FDs.
Simple Synthesis Procedure
1. Eliminate extraneous columns from the LHS of FDs.
2. Remove derived FDs (transitive dependencies).
3. Arrange the simple FDs into groups (composite FDs), each group sharing the same LHS.
4. Make a table for each composite FD group, with its determinant as the primary key.
5. Merge two similar tables into one if one of them contains all columns of the other.
–Choose the PK of one of the two tables as the PK of the merged table.
–Define a unique constraint on the PK column of the other table, which is not designated as the PK of the merged table.
6. Add foreign keys.
Simple Synthesis Example I
Step 1: Eliminate extraneous columns from the LHS of FDs
–Check whether the LHS of any FD contains two or more columns.
–Check whether any column of those LHSs is the RHS of another FD.
Before:
StdSSN, StdClass → StdCity
StdSSN → StdClass
StdSSN → 
 → StdSSN
OfferNo → OffTerm
OfferNo → OffYear
OfferNo → CourseNo
OfferNo → CrsDesc
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
After:
StdSSN → StdCity
StdSSN → StdClass
StdSSN → 
 → StdSSN
OfferNo → OffTerm
OfferNo → OffYear
OfferNo → CourseNo
OfferNo → CrsDesc
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
Is StdSSN or OfferNo at the RHS of any FD?
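Step 1 can be sketched in Python with an attribute-closure test, a standard technique that is swapped in here for the informal check on the slide. The helper names `closure` and `minimize_lhs` are hypothetical, and only FDs whose column names are fully legible on the slide are included. An LHS column is treated as extraneous when the remaining LHS columns still determine the RHS.

```python
def closure(attrs, fds):
    """Return every column functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimize_lhs(fd, fds):
    """Drop each LHS column that the rest of the LHS makes unnecessary."""
    lhs, rhs = list(fd[0]), fd[1]
    for col in list(lhs):
        reduced = [c for c in lhs if c != col]
        # col is extraneous if the reduced LHS still determines the RHS.
        if reduced and set(rhs) <= closure(reduced, fds):
            lhs = reduced
    return (tuple(lhs), rhs)


# Subset of the slide's FDs (assumption: FDs with a blank column are left out).
FDS = [
    (("StdSSN", "StdClass"), ("StdCity",)),
    (("StdSSN",), ("StdClass",)),
    (("OfferNo",), ("CourseNo",)),
    (("CourseNo",), ("CrsDesc",)),
    (("StdSSN", "OfferNo"), ("EnrGrade",)),
]

minimized = [minimize_lhs(fd, FDS) for fd in FDS]
print(minimized[0])  # (('StdSSN',), ('StdCity',))
print(minimized[4])  # (('StdSSN', 'OfferNo'), ('EnrGrade',))
```

StdClass is dropped from the first FD because StdSSN alone determines StdClass, while the LHS of the EnrGrade FD stays intact.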
Simple Synthesis Example I
Step 2: Eliminate transitive dependencies (derived FDs)
Before:
StdSSN → StdCity
StdSSN → StdClass
StdSSN → 
 → StdSSN
OfferNo → OffTerm
OfferNo → OffYear
OfferNo → CourseNo
OfferNo → CrsDesc
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
After:
StdSSN → StdCity
StdSSN → StdClass
StdSSN → 
 → StdSSN
OfferNo → OffTerm
OfferNo → OffYear
OfferNo → CourseNo
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
OfferNo → CrsDesc is a transitive dependency: it follows from OfferNo → CourseNo and CourseNo → CrsDesc, so it is removed.
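Step 2 can be sketched in Python as follows. The helper names `closure` and `remove_derived` are hypothetical, and the two FDs whose column name is blank on the slide are omitted. An FD is treated as derived when its RHS is already in the closure of its LHS under the other FDs.

```python
def closure(attrs, fds):
    """Return every column functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_derived(fds):
    """Drop every FD that can be derived from the FDs that remain."""
    kept = list(fds)
    for fd in fds:
        others = [f for f in kept if f != fd]
        lhs, rhs = fd
        if set(rhs) <= closure(lhs, others):
            kept = others  # fd is redundant, so leave it out
    return kept


# Simple FDs after Step 1 (assumption: FDs with a blank column are left out).
FDS = [
    (("StdSSN",), ("StdCity",)),
    (("StdSSN",), ("StdClass",)),
    (("OfferNo",), ("OffTerm",)),
    (("OfferNo",), ("OffYear",)),
    (("OfferNo",), ("CourseNo",)),
    (("OfferNo",), ("CrsDesc",)),
    (("CourseNo",), ("CrsDesc",)),
    (("StdSSN", "OfferNo"), ("EnrGrade",)),
]

remaining = remove_derived(FDS)
removed = [fd for fd in FDS if fd not in remaining]
print(removed)  # [(('OfferNo',), ('CrsDesc',))]
```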
Simple Synthesis Example I
Step 3: Arrange simple FDs with the same LHS into a composite FD
Before:
StdSSN → StdCity
StdSSN → StdClass
StdSSN → 
 → StdSSN
OfferNo → OffTerm
OfferNo → OffYear
OfferNo → CourseNo
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
After:
StdSSN → StdCity, StdClass, 
 → StdSSN
OfferNo → OffTerm, OffYear, CourseNo
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
Simple Synthesis Example I
Step 4: Make a table for each FD group with the determinant as the primary key.
Composite FDs:
StdSSN → StdCity, StdClass, 
 → StdSSN
OfferNo → OffTerm, OffYear, CourseNo
CourseNo → CrsDesc
StdSSN, OfferNo → EnrGrade
Result: five tables (Student, Student2, Offering, Course, Enrollment)
Student (StdSSN, StdCity, StdClass, )
Student2 ( , StdSSN)
Offering (OfferNo, OffTerm, OffYear, CourseNo)
Course (CourseNo, CrsDesc)
Enrollment (StdSSN, OfferNo, EnrGrade)
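Steps 3 and 4 amount to grouping the simple FDs by determinant and turning each group into a table. The following Python sketch illustrates this; the helper names `group_fds` and `make_tables` and the dictionary layout are hypothetical, and the FDs whose column name is blank on the slide are left out, so four tables result instead of five.

```python
from collections import defaultdict


def group_fds(simple_fds):
    """Step 3: collect the RHS columns of all FDs that share an LHS."""
    groups = defaultdict(list)
    for lhs, rhs in simple_fds:
        groups[tuple(lhs)].append(rhs)
    return dict(groups)


def make_tables(groups):
    """Step 4: one table per group, with the determinant as primary key."""
    return [{"primary_key": list(lhs), "columns": list(lhs) + rhs_cols}
            for lhs, rhs_cols in groups.items()]


# Simple FDs after Step 2, written as (LHS columns, single RHS column).
SIMPLE_FDS = [
    (("StdSSN",), "StdCity"),
    (("StdSSN",), "StdClass"),
    (("OfferNo",), "OffTerm"),
    (("OfferNo",), "OffYear"),
    (("OfferNo",), "CourseNo"),
    (("CourseNo",), "CrsDesc"),
    (("StdSSN", "OfferNo"), "EnrGrade"),
]

tables = make_tables(group_fds(SIMPLE_FDS))
for table in tables:
    print(table["primary_key"], table["columns"])
```

The second table printed has the primary key OfferNo and the columns OfferNo, OffTerm, OffYear, CourseNo, matching the Offering table above.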
Simple Synthesis Example I
Step 5: Merge tables
Merge two similar tables into one if one of them contains all columns of the other.
Before:
Student (StdSSN, StdCity, StdClass, )
Student2 ( , StdSSN)
Offering (OfferNo, OffTerm, OffYear, CourseNo)
Course (CourseNo, CrsDesc)
Enrollment (StdSSN, OfferNo, EnrGrade)
After:
Student (StdSSN, StdCity, StdClass, )
  UNIQUE ( )
Offering (OfferNo, OffTerm, OffYear, CourseNo)
Course (CourseNo, CrsDesc)
Enrollment (StdSSN, OfferNo, EnrGrade)
Add Foreign Keys
Student (StdSSN, StdCity, StdClass, )
  UNIQUE ( )
Offering (OfferNo, OffTerm, OffYear, CourseNo)
  FOREIGN KEY (CourseNo) REFERENCES Course
Course (CourseNo, CrsDesc)
Enrollment (StdSSN, OfferNo, EnrGrade)
  FOREIGN KEY (StdSSN) REFERENCES Student
  FOREIGN KEY (OfferNo) REFERENCES Offering
Multiple Candidate Keys
A common misconception among novice database developers is that a table with multiple candidate keys violates BCNF.
Multiple candidate keys do not necessarily violate either 3NF or BCNF.
Step 5 of the Simple Synthesis Procedure creates tables with multiple candidate keys.
You should not split a table just because it contains multiple candidate keys.
Splitting a table unnecessarily can slow query performance.
Simple Synthesis Example II
Reviews of Papers
–Author information includes the unique author number, the name, the mailing address, and the unique but optional address.
–Paper information includes the primary author, the unique paper number, the title, the abstract, and the review status (pending, accepted, rejected).
–Reviewer information includes the unique reviewer number, the name, the mailing address, and a unique but optional address.
–A completed review includes the reviewer number, the date, the paper number, comments to the authors, comments to the program chairman, and ratings (overall, originality, correctness, style, and relevance).
–The combination of reviewer number and paper number identifies a review.
Simple Synthesis Example II
AuthNo → AuthName, Auth , AuthAddress
Auth  → AuthNo
PaperNo → Primary_AuthNo, Title, Abstract, Status
RevNo → RevName, Rev , RevAddress
Rev  → RevNo
RevNo, PaperNo → Auth_Comm, Prog_Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5
The steps of simple synthesis
–The first step is already done because the LHS of each FD is minimal.
–The second step is not necessary because there are no transitive dependencies.
–The third step is already done because the FDs are in groups.
–The fourth step: define a table for each of the six FD groups.
–The fifth step: merge similar tables and add foreign keys.
Simple Synthesis Example II Solution
Author (AuthNo, AuthName, Auth , AuthAddress)
  UNIQUE (Auth )
Paper (PaperNo, Primary_Auth, Title, Abstract, Status)
  FOREIGN KEY (Primary_Auth) REFERENCES Author
Reviewer (RevNo, RevName, Rev , RevAddress)
  UNIQUE (Rev )
Review (PaperNo, RevNo, Auth_Comm, Prog_Comm, Date, Rating1, Rating2, Rating3, Rating4, Rating5)
  FOREIGN KEY (PaperNo) REFERENCES Paper
  FOREIGN KEY (RevNo) REFERENCES Reviewer
Practical Concerns: Role of Normalization
Two ways to use normalization in the DB development process
As a refinement tool
–Use after the ERD is converted into tables
–Apply to each table
As an initial design tool
–Use normalization instead of ERD drawing in data modeling
–Identify attributes and their functional dependencies
–Apply normalization to generate tables
–Add referential integrity constraints
Advantages of Refinement Approach
–It is easier to translate requirements into an ERD than into a list of FDs.
–During ERD development, related fields are grouped intuitively, so much of the normalization is done informally.
–There are fewer FDs to specify and less normalization to perform.
–Relationships can be overlooked when normalization is used as the initial design approach.
Normalization Objective
Normalization results in a DB design with many tables in order to avoid modification anomalies.
–Many tables: easier to change, but more difficult to query
Normalization is update-biased.
If a database is used predominantly for queries, avoiding modification anomalies may not be an appropriate design goal.
Denormalization
A process of combining tables so that they are easier to query.
Purposeful violation of normal forms
–Most FDs will lead to anomalies if ignored.
May improve performance
Summary
–Beware of unwanted redundancies
–FDs are important constraints
–Use a CASE tool for large problems
–Normalization is an important tool of database development
–Focus on the normalization objective
Self-practice: Chapter 7, page 242, exercises 13, 14, 15, 16