Download presentation
Presentation is loading. Please wait.
1
Text & Original Presentations
Orange Coast College Business Division CS/CIS Department Fall CIS 182 Introduction to Database Concepts Instructor Dr. Martha Malaty Text & Original Presentations Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel, 2004
2
Normalization of Database Tables
Chapter 5 Normalization of Database Tables Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel
3
In this chapter, you will learn:
What normalization is and what role it plays in the database design process About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF How normal forms can be transformed from lower normal forms to higher normal forms That normalization and ER modeling are used concurrently to produce a good database design That some situations require denormalization to generate information efficiently Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
4
Database Tables & Normalization
Good table structure is essential for good DB design Normalization is the way to reach good table structure Normalization is the process for assigning attributes to entities through evaluating and correcting table structures Reduces data redundancies Helps eliminate data anomalies Produces controlled redundancies to link tables Works through sequence of stages (normal forms) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
5
Database Tables & Normalization
Normalization works through a series of stages called normal forms: First normal form (1NF) Second normal form (2NF) Third normal form (3NF) Fourth normal form (4NF) . . . 2NF is better than 1NF; 3NF is better than 2NF; … For most business database design purposes, 3NF is highest we need to go in the normalization process Highest level of normalization is not always most desirable Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
6
Tutorials Websites On Normalization
What do you need to know about Webopedia Rules of data normalization 3NF Tutorial ServerWatch VBMySQL.com Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
7
Unnormalized vs. Denormalized
Unnormalized data: No ER model is developed when creating the database Denormalized data Starting with normalized data, adding redundancy for better performance We cannot denormalize unnormalized data Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
8
The Need for Normalization
Example: A company manages building projects Charges its clients by billing hours spent on each contract Hourly billing rate is dependent on employee’s position Periodically, a report is generated that contains information displayed in Table 5.1 See RPT_FORMAT table in ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
9
A Sample Report Layout Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
10
A Table in the Report Format
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
11
Example: Construction company
Three major problems: PROJ_NUM contains repeating groups Table entries invite data inconsistencies Table displays data anomalies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
12
Construction Company Problems
PROJ_NUM contains “repeating groups” Several data entries with NULL entries indicating that all entries belong to the same PROJ_NUM PRO_NUM is intended to be primary key Primary key can’t be NULL Relational table can’t contain repeating groups Repeating groups Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
13
Construction Company Problems
Table entries invite data inconsistencies Elect. Engineer could mistakenly be entered in another abbreviated form (e.g. Elect.Eng, EE, …) Possible inconsistency Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
14
Construction Company Problems
Table displays data anomalies Update anomalies Modifying JOB_CLASS needs several alterations Insertion anomalies New employee must be assigned project Deletion anomalies If employee is deleted, other vital data might be lost Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
15
Construction Company Problems
Structure of data set in Fig. 5.1 (ConstructCo.mdb) does not handle data very well The table structure appears to work; report is generated with ease Unfortunately, the report may yield different results, depending on what data anomaly has occurred Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
16
Levels of Normalizations
1NF 2NF 3NF BCNF 4NF 5NF Un-normalized) Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
17
Conversion to First Normal Form
Repeating group Derives its name from the fact that a group of multiple (related) entries can exist for any single key attribute occurrence Relational table must not contain repeating groups Normalizing the table structure will reduce these data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
18
Conversion to First Normal Form
Normalization is three-step procedure Eliminate Repeating Groups Identify the Primary Key Identify all Dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
19
Step 1: Eliminate Repeating Groups
Present data in a tabular format, where each cell has a single value and there are no repeating groups Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
20
Data Organization: 1NF See DATA_ORG_INF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
21
Step 2: Identify the Primary Key
Primary key must uniquely identify attribute values New key must be composed What would be the primary key for the DATA_ORG_INF table? Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
22
Step 3: Identify all Dependencies
Dependencies can be depicted with the help of a “Dependency Diagram” Depicts all dependencies found within a given table structure Helpful in getting bird’s-eye view of all relationships among a table’s attributes Makes it much less likely that an important dependency will be overlooked Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
23
Dependency Diagram: 1NF
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
24
First Normal Form Tabular format in which:
All key attributes are defined There are no repeating groups in the table All attributes are dependent on primary key All relational tables satisfy 1NF requirements Some tables contain “Partial dependencies” Dependencies based on only part of the primary key Sometimes used for performance reasons, but should be used with caution Still subject to data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
25
Conversion to Second Normal Form
Relational database design can be improved by converting the database into second normal form (2NF) Two steps Identify All key components Identify the dependent attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
26
Step 1: Identify All Key Components
Write each key component on separate line, and then write the original (composite) key on the last line Each component will become the key in a new table Example: Proj_Num Emp_Num Proj_Num, Emp_Num Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
27
Step 2: Identify Dependent Attributes
Determine which attributes are dependent on which other attributes At this point, most anomalies have been eliminated Example: Proj_Num Proj_name Emp_Num Emp_Name, Job_Class, Chg_Hour Proj_Num, Emp_Num Hours Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
28
Second Normal Form (2NF) Conversion Results
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
29
Second Normal Form Table is in second normal form (2NF) if:
It is in 1NF and It includes no partial dependencies: No attribute is dependent on only a portion of the primary key Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
30
Conversion to Third Normal Form
Three steps Identify each new determinant Identify the dependent attributes Remove the dependent attributes from transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
31
Step 1: Identify New Determinants
Any attribute whose value determines other values within a row For every transitive dependency, write its determinant as a PK for a new table Example Job_Class Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
32
Step 2: Identify Dependent Attributes
Identify the attributes dependent on each determinant identified in Step 1 and identify the dependency Name the table to reflect its contents and function Example Job_Class Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
33
Step 3: Remove Dependent Attributes from Transitive Dependencies
Eliminate all dependent attributes in transitive relationship(s) from each table that has such a transitive relationship Draw a new dependency diagram to show all tables defined in Steps 1–3 Check new tables and modified tables from Step 3 to make sure that each has a determinant and does not contain inappropriate dependencies Example: Proj_Num Proj_name Emp_Num Emp_Name, Job_Class Proj_Num, Emp_Num Hours Job_Class Chg_Hour Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
34
Third Normal Form (3NF) Conversion Results
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
35
Third Normal Form A table is in third normal form (3NF) if:
It is in 2NF and It contains no transitive dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
36
Improving the Design Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies Normalization cannot, by itself, be relied on to make good designs It is valuable because its use helps eliminate data redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
37
Improving the Design (continued)
The following changes were (or should be) made: PK assignment Naming conventions Attribute atomicity Adding attributes Adding relationships Refining PKs Maintaining historical accuracy Using derived attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
38
PK assignment Addition of JOB_CODE attribute greatly decreases the likelihood of referential integrity violation Example: One would write a job class as “DB Designer” and another as “Database Designer” although both refer to the same class Problem: The new key produces transitive dependency JOB_CODE -> JOB_CLASS, CHG_HOUR JOB_CLASS -> CHG_HOUR The effect of transitive dependency is less problematic than integrity violation Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
39
Naming conventions Adhere to naming convention by having the table name as the first prefix of the attribute name Example: CHG_HOUR is changed to JOB_CHG_HOUR JOB_CLASS should be changed to JOB_DESCRIPTION to better describe the entity HOURS is changed to ASSIGNED_HOURS to associate it with the ASSIGN table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
40
Attribute atomicity Atomic attribute: Example: EMP_NAME is not atomic
Attribute that cannot be further divided Gains querying flexibility Example: EMP_NAME is not atomic Use EMP_LNAME, EMP_FNAME, EMP_INITIAL instead Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
41
Adding Attributes Adding attributes that are needed in the real-world environment Example: EMP_HIREDATE can be used to track employee’s “job longevity” to award bonuses to long-term employees & other morale enhancing measures Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
42
Adding Relationships To avoid unnecessary data duplication Example:
To be able to supply detailed information about each project’s manager, use EMP_NUM as a FK in PROJECT table Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
43
Refining PKs If an employee makes two or more entries for the same project, replace EMP_NUM & PROJ_NUM as a combined key, by a system-assigned, surrogate key, ASIGN_NUM, in the ASSIGN table Surrogate key: At the implementation level, a system-defined attribute that is created and managed by the DBMS. It is automatically incremented for each row. In Oracle for example we use the "sequence" object for surrogate keys. EMP_NUM & PROJ_NUM can still be used as FK’s Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
44
Maintaining historical accuracy
Writing the job charge per hour in ASSIGN table is crucial to maintain historical accuracy, in case that the job charge per hour changes over time Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
45
Using derived attributes
Derived attributes can be calculated when needed Storing a derived attribute makes it easier to write application SW to produce the desired results and saves reporting time ASSIGN_CHARGE is the result of multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
46
The Completed Database
See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
47
The Completed Database (continued)
See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
48
The Completed Database (continued)
See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
49
Limitations on System-Assigned Keys
System-assigned (surrogate) primary key may not prevent confusing entries Example: JOB_CODE attribute was designed to be a PK of JOB table This does not prevent duplicating existing records with different PK values Data entries in Table 5.2 are inappropriate because they duplicate existing records Yet there has been no violation of either entity integrity or referential integrity Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
50
The Boyce-Codd Normal Form (BCNF)
Google search About.com A relation is in Boyce-Codd Normal Form (BCNF) if every determinant is a candidate key. Candidate key Has same characteristics as primary key, but for some reason, not chosen to be primary key If a table contains only one candidate key, the 3NF and the BCNF are equivalent BCNF can be violated only if the table contains more than one candidate key Most designers consider the BCNF as a special case of 3NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
51
The Boyce-Codd Normal Form (BCNF) (continued)
A table is in 3NF if it is in 2NF and there are no transitive dependencies A table can be in 3NF and not be in BCNF A transitive dependency exists when one nonprime attribute is dependent on another nonprime attribute A nonkey attribute is the determinant of a key attribute How to convert a 3NF into BCNF? Decompose the tables according to the functional dependencies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
52
A Table That is in 3NF but not in BCNF
Here: No partial dependencies No transitive dependencies But a nonkey attribute determines a key attribute A + B C, D C B Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
53
Decomposition to BCNF To convert the table into BCNF
First change the PK into A + C; partial dependency is generated Next Decompose the tables to remove partial dependency Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
54
Sample Data for a BCNF Conversion
Each CLASS_CODE identifies a class uniquely A student can take many classes A staff member can teach many classes STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE CLASS_CODE STAFF_ID Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
55
Sample Data for a BCNF Conversion
STU_ID + STAFF_ID CLASS_CODE, ENROLL_GRADE CLASS_CODE STAFF_ID Already in 3NF but has Some anomalies Update anomalies If a different staff member is assigned to teach class 32456, 2 rows will require updating Deletion anomalies If student 135 drops class 28458, we loose information about who taught that class Solution Decompose table structure Resulting tables are in 3NF & BCNF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
56
Another BCNF Decomposition
Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
57
Normalization and Database Design
Normalization should be part of design process Make sure that proposed entities meet required normal form before table structures are created Many real-world databases have been improperly designed or burdened with anomalies if improperly modified during course of time You may be asked to redesign and modify existing databases Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
58
Normalization and Database Design (continued)
ER diagram Provides the big picture, or macro view, of an organization’s data requirements and operations Created through an iterative process Identifying relevant entities, their attributes and their relationship Use results to identify additional entities and attributes Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
59
Normalization and Database Design (continued)
Normalization procedures Focus on the characteristics of specific entities A micro view of the entities within the ER diagram May yield additional entities/attributes to be added to the ERD Difficult to separate normalization process from ER modeling process Two techniques should be used concurrently Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
60
Example: Contracting Company
Operations/Business Rules: Company manages many projects Each project requires services of many employees An employee may be assigned many projects Some employees might not have any assigned projects Every employee has a job classification that determines the hourly billing rate Many employees can have the same job classification Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
61
Example: Contracting Company
The Initial ERD PROJECT is in 3NF EMPLOYEE: Contains transitive dependency JOB_DESCRIPTION defines the job’s charge per hour Classifications determine billing rate Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
62
Example: Contracting Company
The Modified ERD; removing transitive dependency Question: Which type of relationship exist between EMPLOYEE & PROJECT? Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
63
The Incorrect Representation of a M:N Relationship
M:N between EMPLOYEE & PROJECT cannot be implemented Need further table decomposition Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
64
Final (Implementable) ERD for a Contracting Company
After table decomposition Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
65
The Implemented Database for the Contracting Company
See Ch05_ConstructCo.mdb Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
66
Higher-Level Normal Forms
In some databases, multiple multivalued attributes exist Fourth Normal Form (4NF) Table is in 3NF No multiple sets of multivalued dependencies Fifth Normal Form (5NF, Domain Key NF) Not likely in business environment Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
67
Tables with Multivalued Dependencies
The tables contain two sets of independent multi-valued dependencies ORG_CODE & ASSIGN_NUM may have different values for the same employee One employee can have many service entries and many assignment to different organizations There is no viable candidate key EMP_NUM values are not unique, nor any other combination, since some of them contain nulls Version 3 is in 3NF, but has many redundancies Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
68
Fourth Normal Form Table is in fourth normal form (4NF) if
It is in 3NF or BCNF Has no multiple sets of multivalued dependencies 4NF is largely academic if tables conform to the following two rules: All attributes are dependent on primary key but independent of each other No row contains two or more multivalued facts about an entity Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
69
A Set of Tables in 4NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
70
Denormalization Creation of normalized relations is important database design goal Processing requirements should also be a goal If tables decomposed to conform to normalization requirements Number of database tables expands Joining larger number of tables takes additional disk input/output (I/O) operations and processing logic Reduces system speed Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
71
Denormalization (continued)
Unnormalized tables in a production database tend to have these defects: Data updates are less efficient because programs that read and update tables must deal with larger tables Indexing is much more cumbersome Unnormalized tables yield no simple strategies for creating virtual tables known as views Use denormalization cautiously Understand why—under some circumstances—unnormalized tables are a better choice Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
72
Summary Normalization is a table design technique aimed at minimizing data redundancies First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered Normalization is an important part—but only a part—of the design process Continue the iterative ER process until all entities and their attributes are defined and all equivalent tables are in 3NF Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
73
Summary (continued) A table in 3NF may contain multivalued dependencies that produce either numerous null values or redundant data It may be necessary to convert a 3NF table to the fourth normal form (4NF) by splitting such a table to remove multivalued dependencies Tables are sometimes denormalized to yield less I/O which increases processing speed Database Systems: Design, Implementation, & Management, 6th Edition, Rob & Coronel
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.