Published by Alice Davidson. Modified over 9 years ago.
1
Data Normalization
"Normal is not something to aspire to, it's something to get away from." ~ Jodie Foster ~
2
What is Data Normalization?
Data Normalization is a process applied to relational tables which improves table design by eliminating redundancy and inconsistency.
3
How does the process work?
For each table: Select a minimal candidate key from the existing data. Do not use a surrogate key. Check for various dependency conditions and if they exist, resolve them by splitting data into new tables. The process always creates a cleaner design that has more tables.
4
What U talkin’ about Willis?
Can you build the underlying data model from this report? You can if you know how to normalize. Normalization makes this task quite easy. What is the key? Do you see problems with writing SQL to SELECT, INSERT, UPDATE or DELETE? Could there be redundancies or inconsistencies?
5
Understanding Functional Dependence
For attributes A and B, "B is functionally dependent on A" means each value in column A determines one and only one value in column B. Written: A → B (A determines B; A is the determinant). Ex: SSN → Name (Name is functionally dependent on SSN).
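The definition above can be tested against sample data. What follows is a minimal sketch, not a definitive tool: the function name `fd_holds` and the sample rows are hypothetical, and data can only disprove a dependency (one counterexample is enough), never prove that it holds as a business rule.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if, in `rows`, each value of `determinant`
    maps to one and only one value of `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and
        # returns the stored one, so a mismatch means A -> B fails.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows (made-up values, for illustration only).
people = [
    {"SSN": "000-00-0001", "Name": "Erin", "Hobby": "skiing"},
    {"SSN": "000-00-0001", "Name": "Erin", "Hobby": "basketball"},
    {"SSN": "000-00-0002", "Name": "Mike", "Hobby": "skiing"},
]

ssn_determines_name = fd_holds(people, ["SSN"], ["Name"])    # True
ssn_determines_hobby = fd_holds(people, ["SSN"], ["Hobby"])  # False
```

SSN → Name survives the check because every SSN appears with a single name, while SSN → Hobby fails because one SSN appears with two hobbies.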
6
Normalization Lingo
Prime attribute = any attribute which has been selected as the minimal candidate key or, in the case of a composite key, is part of the key.
Non-prime attribute = any attribute which is not part of the key.
Key attribute = prime attribute.
Non-key attribute = non-prime attribute.
Prime attribute? Non-prime attribute? Key attribute? Non-key attribute? Depends on which book you read!
7
Normalization, FD, and You
Normalization is just the analysis of the functional dependencies of all columns with respect to the chosen key. There are three dependency conditions which must be tested:
Functional Dependence: any non-prime attributes which are FD on the PK.
Partial Functional Dependence: any non-key attributes which are FD on part of the PK.
Transitive Functional Dependence: any non-key attributes which are FD on some other non-key attribute(s).
8
Activity: IYCDTYCN!
If you can do this you can normalize! Identify the chosen key, the prime attributes and the non-prime attributes. Then identify the functional dependencies (WRT the key), the partial functional dependencies (WRT part of the key) and the transitive functional dependencies (WRT some non-prime attribute).
Primary Key: Driver ID # + Vehicle Lic Plate
Prime Attributes: Driver ID #, Vehicle Lic Plate
Non-Prime Attributes: every column other than Driver ID # and Vehicle Lic Plate
Functional Dependencies on PK: every column but Driver Territories
Partial Functional Dependencies (on part of the PK): Driver Name and Driver Chg/Hr are FD on Driver ID #, etc.
Transitive Functional Dependencies (on some non-key attribute): Vehicle Chg/Hr is FD on Vehicle Size
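The three tests in this activity come down to comparing each determinant with the chosen key. Here is a small sketch of that comparison, using only the dependencies named in the activity; the function name `classify_fd` and the label strings are hypothetical choices for illustration.

```python
def classify_fd(determinant, key):
    """Classify one dependency by comparing its determinant with the key."""
    determinant, key = set(determinant), set(key)
    if determinant == key:
        return "full"        # FD on the entire PK
    if determinant < key:
        return "partial"     # FD on only part of a composite PK
    if determinant.isdisjoint(key):
        return "transitive"  # FD on some non-key attribute(s)
    return "mixed"           # determinant mixes key and non-key columns

KEY = {"Driver ID", "Vehicle Lic Plate"}

# (determinant, dependent column) pairs taken from the activity's answers.
FDS = [
    ({"Driver ID"}, "Driver Name"),
    ({"Driver ID"}, "Driver Chg/Hr"),
    ({"Vehicle Size"}, "Vehicle Chg/Hr"),
]

labels = {dependent: classify_fd(det, KEY) for det, dependent in FDS}
```

Here `labels` marks Driver Name and Driver Chg/Hr as partial and Vehicle Chg/Hr as transitive, matching the answers above.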
9
The Dependency Diagram
The Dependency Diagram is a very useful tool. It depicts the dependencies which exist among the attributes.
10
Normal Forms
A Normal Form represents the current "state" of the data model. There are 4 basic normal forms:
Zero Normal Form (0NF): Non-key attributes exist which are not FD on the PK.
First Normal Form (1NF): All non-key attributes are FD on the entire PK.
Second Normal Form (2NF): In 1NF and no partial functional dependencies exist.
Third Normal Form (3NF): In 2NF and no transitive functional dependencies exist.
Being in a normal form is like being the one, you either are or you aren't. There are three main normal forms, and three levels of analysis…. Coincidence? I think not!
11
First Normal Form (1NF)
Definition: All non-key attributes must be FD on the entire PK. (There must be PKFD for all attributes.)
Rule: Move each non-key column that is not FD on the PK into its own new table.
How to Apply the Rule: For each non-FD column: Place the non-FD column into a new table. Copy the PK (or part of it) from the original table into the new table; this will be an FK in the new table. Assign a PK to the new table (typically a composite key of the original non-FD column and the FK).
12
1NF: Example 1/2 What’s wrong with this data model?
What should the PK be? Why? Is there an attribute not FD on the PK? Is it in 1NF already? What if Erin takes up bass fishing? I'm planning a ski trip, whom should I contact? (How do I know skiing is in Hobby3 and not in Hobby1?) Before we place this in 1NF, ask yourself what's wrong with this data model. What do we need to do to add another hobby? Why is FID a better PK than or Name?
13
1NF: Example 2/2
What was done: Hobbies table created; it contains the column that was originally not FD, "hobby". The PK (FID) was copied into the Hobbies table. The PK of the Hobbies table is the combination of FID and Hobby.
Questions: Is this in 1NF? Can you reproduce the previous data model from this one? Who likes skiing? Basketball?
Answers: Is this in 1NF? You have to check both tables, but yes, it is. Can you reproduce the data? Yes, you can. To produce the multiple hobby columns of the original model, we'd need to use a cross-tab query.
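The split described on this slide can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table name `friends`, the lowercase column names and every row of data are assumptions made for illustration, since the original tables appear only as images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (              -- hypothetical name for the original table
    fid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE hobbies (
    fid   INTEGER NOT NULL REFERENCES friends(fid),  -- FK copied from the PK
    hobby TEXT NOT NULL,
    PRIMARY KEY (fid, hobby)        -- composite key: FK + former non-FD column
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, "Erin"), (2, "Mike")])
conn.executemany(
    "INSERT INTO hobbies VALUES (?, ?)",
    [(1, "skiing"), (1, "basketball"), (2, "skiing")],
)

# Erin takes up bass fishing: one new row, no new Hobby4 column.
conn.execute("INSERT INTO hobbies VALUES (?, ?)", (1, "bass fishing"))

# Who likes skiing? One predicate on one column, whichever hobby it is.
skiers = [
    name
    for (name,) in conn.execute(
        "SELECT f.name FROM friends f JOIN hobbies h ON h.fid = f.fid "
        "WHERE h.hobby = ? ORDER BY f.name",
        ("skiing",),
    )
]
```

With the hobbies in rows rather than in Hobby1, Hobby2, Hobby3 columns, the ski trip question no longer depends on which column a hobby happened to be typed into.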
14
Second Normal Form (2NF)
Definition: The data model must be in 1NF AND no partial functional dependencies can exist.
Rule: Move each partially FD non-key column into its own new table.
How to Apply the Rule: For each partial dependency: Move all partially FD columns into a new table. Copy the determinant into the new table. Make the determinant of the partial dependency the PK of the new table and an FK in the existing table.
15
2NF: Example 1/2 What’s wrong with this data model?
What should the PK be? Why? Do any partial dependencies exist? Where? What is the determinant for each, if any? Is it in 1NF already? 2NF? I made a mistake: 81HLV3 is a PowerEdge 5500, not a 4400. How many rows need fixing? Before we place this in 2NF, ask yourself what's wrong with this data model.
16
2NF: Example 2/2
What was done: Serial Num + SWID is the primary key. Servers and Software tables were created from the partial dependencies, where Serial Num and SWID are the determinants. Serial Num is the PK for Servers and SWID is the PK for Software; each is also an FK in the SWInstallation table.
Questions: Is this in 2NF? Can you reproduce the previous data model from this one?
Answers: Is this in 2NF? You have to check all three tables, but yes, it is. Can you reproduce the data? Yes, you can. Using a standard join in your query, you can reproduce it.
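To see that a standard join really does reproduce the original report, here is a sketch using Python's built-in `sqlite3` module. The serial number 81HLV3 and the model correction come from the previous slide; the lowercase table and column names, the software titles and the installation rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers  (serial_num TEXT PRIMARY KEY, model TEXT NOT NULL);
CREATE TABLE software (swid INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE sw_installation (
    serial_num TEXT    NOT NULL REFERENCES servers(serial_num),
    swid       INTEGER NOT NULL REFERENCES software(swid),
    PRIMARY KEY (serial_num, swid)
);
""")

conn.execute("INSERT INTO servers VALUES ('81HLV3', 'PowerEdge 4400')")
conn.executemany(
    "INSERT INTO software VALUES (?, ?)",
    [(1, "Web server"), (2, "Database")],  # hypothetical titles
)
conn.executemany(
    "INSERT INTO sw_installation VALUES (?, ?)",
    [("81HLV3", 1), ("81HLV3", 2)],
)

# The model now lives in exactly one row, so the correction is one UPDATE.
conn.execute("UPDATE servers SET model = 'PowerEdge 5500' WHERE serial_num = '81HLV3'")

# A standard join rebuilds the original wide report.
report = conn.execute("""
    SELECT i.serial_num, s.model, i.swid, w.title
    FROM sw_installation i
    JOIN servers  s ON s.serial_num = i.serial_num
    JOIN software w ON w.swid = i.swid
    ORDER BY i.swid
""").fetchall()
```

Every row of `report` shows the corrected model even though only one row was updated, which is the anomaly the 2NF split removes.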
17
Third Normal Form (3NF)
Definition: The data model must be in 2NF AND no transitive functional dependencies can exist.
Rule: Move each transitive FD non-key column into its own new table.
How to Apply the Rule: For each transitive dependency: Move all transitive FD columns into a new table. Copy the determinant column into the new table. Make the determinant of the transitive dependency the PK of the new table and an FK in the original table.
18
3NF: Example 1/2 What’s wrong with this data model?
What should the PK be? Why? Do any transitive dependencies exist? Where? What is the determinant for each, if any? Is it in 1NF already? 2NF? 3NF? I made a mistake: Koors phone number is 4905. What's wrong? Before we place this in 3NF, ask yourself what's wrong with this data model.
19
3NF: Example 2/2
What was done: Beer ID is the PK. All transitive dependencies were moved into a new table, Distributors. Distrib ID is the determinant; it is the PK of the Distributors table and an FK in the original Beer table.
Questions: Is this in 3NF? Can you reproduce the previous data model from this one?
Answers: Is this in 3NF? You have to check both tables, but yes, it is. Can you reproduce the data? Yes, you can. Using a standard join in your query, you can reproduce it.
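The 3NF rule is mechanical enough to sketch in plain Python. The helper name `split_transitive`, the lowercase column names and all sample rows are hypothetical; only the Koors distributor and its phone number 4905 are taken from the slides.

```python
def split_transitive(rows, determinant, dependents):
    """Move the columns in `dependents` (FD on the non-key column
    `determinant`) into a lookup table keyed by the determinant."""
    lookup = {}
    slim_rows = []
    for row in rows:
        key = row[determinant]
        moved = {col: row[col] for col in dependents}
        # A second, different value for the same determinant is exactly
        # the kind of inconsistency the unnormalized table allows.
        if lookup.setdefault(key, moved) != moved:
            raise ValueError(f"{determinant}={key!r} has conflicting values")
        slim_rows.append({c: v for c, v in row.items() if c not in dependents})
    return slim_rows, lookup

# Made-up rows: distrib_id -> distributor, phone is the transitive dependency.
beers_wide = [
    {"beer_id": 1, "beer": "Lager", "distrib_id": "D1", "distributor": "Koors", "phone": "4905"},
    {"beer_id": 2, "beer": "Stout", "distrib_id": "D1", "distributor": "Koors", "phone": "4905"},
    {"beer_id": 3, "beer": "Ale", "distrib_id": "D2", "distributor": "Acme", "phone": "5512"},
]

beers, distributors = split_transitive(beers_wide, "distrib_id", ["distributor", "phone"])

# The "standard join": look each beer's distributor up by its FK.
rejoined = [{**beer, **distributors[beer["distrib_id"]]} for beer in beers]
```

After the split, the phone number is stored once per distributor, and `rejoined` carries the same information as `beers_wide`.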
20
Higher Normal Forms
Yes, there IS more… and it will blow your mind.
21
Boyce-Codd Normal Form (BCNF)
Rule: Eliminate key-transitive dependencies. A table in BCNF means: the table is in 3NF, and it includes no non-key attribute which determines a key attribute or part of a key attribute.
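A common textbook way to test for BCNF, which is not the wording used on this slide, is to require that every determinant be a superkey, meaning its attribute closure covers the whole table. The sketch below uses that formulation; the function names and the student/course/instructor example are assumptions for illustration, not material from the deck.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`,
    where `fds` is a list of (determinant_set, dependent_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """List the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]

# Hypothetical table: (student, course) is the key, and each instructor
# teaches exactly one course, so a non-key attribute determines part of the key.
attrs = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]

violations = bcnf_violations(attrs, fds)
```

Only instructor → course is reported, which is the slide's case of a non-key attribute determining part of the key.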
22
BCNF: An Example
23
Fourth Normal Form (4NF)
Rule: Eliminate multiple sets of multi-valued dependencies. A table in 4NF means: the table is in 3NF, and it includes no sets of attributes which contain multi-valued dependencies.
24
4NF: An Example
Figure 4.14: Multivalued Dependencies. Figure 4.15: Set of Tables in 4NF.
25
How “far” should one Normalize?
For relational databases: 1NF is required, at minimum, for practical RDBMS implementations. The majority of the time, data models are normalized to 3NF. Sometimes certain tables are left in 1NF or 2NF for performance or practical reasons. Higher normal forms (BCNF, 4NF) are rare.
In general, the higher the NF of your data model (DM): the more complicated the internal DM, and the more "programming" required to reproduce the external DM. But the lower the chance of data anomalies!
It's a total trade-off: database complexity vs. data anomalies.
26
Mike’s “Road To 3NF”
To normalize correctly, follow this process for each table in the data model:
1. Designate a candidate key.
2. PKFD for all attributes? If no, apply the 1NF Rule. If yes, the table is in 1NF.
3. Any partial dependencies? If yes, apply the 2NF Rule. If no, the table is in 2NF.
4. Any transitive dependencies? If yes, apply the 3NF Rule. If no, the table is in 3NF. Party hard!
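The flowchart can be read as a function that reports how far along the road a table currently is. This is a sketch under this deck's simplified tests, not a complete normal-form checker. The name `normal_form` is hypothetical, the column names follow the earlier driver and vehicle activity, and the dependency Vehicle Lic Plate → Vehicle Size is an assumption, since the activity only hints at it with "etc".

```python
def normal_form(attrs, key, fds):
    """Walk the flowchart for one table.
    `fds` is a list of (determinant_set, dependent_column) pairs."""
    attrs, key = set(attrs), set(key)
    non_key = attrs - key

    # PKFD for all attributes? Collect everything the key determines.
    reached = set(key)
    changed = True
    while changed:
        changed = False
        for lhs, col in fds:
            if set(lhs) <= reached and col not in reached:
                reached.add(col)
                changed = True
    if reached != attrs:
        return "0NF"   # next step: apply the 1NF Rule

    # Any partial dependencies (determinant is only part of the key)?
    if any(set(lhs) < key and col in non_key for lhs, col in fds):
        return "1NF"   # next step: apply the 2NF Rule

    # Any transitive dependencies (determinant is entirely non-key)?
    if any(set(lhs).isdisjoint(key) and col in non_key for lhs, col in fds):
        return "2NF"   # next step: apply the 3NF Rule

    return "3NF"       # party hard

FDS = [
    ({"Driver ID"}, "Driver Name"),
    ({"Driver ID"}, "Driver Chg/Hr"),
    ({"Vehicle Lic Plate"}, "Vehicle Size"),   # assumed dependency
    ({"Vehicle Size"}, "Vehicle Chg/Hr"),
]
KEY = {"Driver ID", "Vehicle Lic Plate"}
COLUMNS = KEY | {"Driver Name", "Driver Chg/Hr", "Vehicle Size",
                 "Vehicle Chg/Hr", "Driver Territories"}

original = normal_form(COLUMNS, KEY, FDS)
without_territories = normal_form(COLUMNS - {"Driver Territories"}, KEY, FDS)
vehicles = normal_form(
    {"Vehicle Lic Plate", "Vehicle Size", "Vehicle Chg/Hr"},
    {"Vehicle Lic Plate"},
    FDS[2:],
)
```

Each call corresponds to one stop on the road: the original table is held back by Driver Territories, the table without it still has partial dependencies, and a split-off vehicles table still has one transitive dependency to resolve.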
27
Normalization Summary Cheat Sheet
0NF → 1NF (resolve non-FD attributes). 1NF → 2NF (resolve partial FDs). 2NF → 3NF (resolve transitive FDs).
28
Data Normalization Questions?