Normalization: Kroenke Chapters 3 and 4

A relation is categorized by one of several normal forms. Normal forms are an aid to design: they help characterize relations that experience anomalies in update operations. Higher normal forms TEND to give a better design, but this is not guaranteed.

Remember the one fact-one place theme! Deletion anomaly: deleting one fact inadvertently deletes another. Insertion anomaly: inserting one fact is not possible without inserting another, seemingly unrelated, fact.

First Normal Form – 1NF: A relation is 1NF if each attribute is atomic. That is, attributes are simple types (int, float, string, char, etc.).

Second Normal Form – 2NF: How about this as a base table? [Figure: a base table with its primary key marked.]

Definition: R is a relation; X and Y are attributes (or sets of attributes) of R. Y is functionally dependent on X iff each X-value in R has precisely one Y-value in R associated with it. A common notation is X → Y.

Example: In the suppliers table S, Status, City, and Name are functionally dependent on S#: S# → Status, S# → City, S# → Name. In the SP table, Qty is functionally dependent on the combined attributes S# and P#: (S#, P#) → Qty.

City and Status are not functionally dependent on each other. There may be several entries containing ‘London’ but with different status values. There may be several entries with a status of 50 but with different cities.

QTY is not functionally dependent on P# alone or on S# alone. S1 might have different QTY values for different parts. The same holds for P1 across different suppliers.
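The examples above can be checked mechanically against sample data. The following is a minimal sketch, assuming relations are stored as lists of Python dicts; the helper name `holds_fd` and all sample rows are hypothetical. A sample can refute a functional dependency, but it cannot establish one, because a dependency is a statement about the meaning of the data.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical sample rows for S(S#, Status, City, Name).
S = [
    {"S#": "S1", "Status": 20, "City": "London", "Name": "Smith"},
    {"S#": "S2", "Status": 50, "City": "Paris", "Name": "Jones"},
    {"S#": "S4", "Status": 50, "City": "London", "Name": "Clark"},
]

# Hypothetical sample rows for SP(S#, P#, Qty).
SP = [
    {"S#": "S1", "P#": "P1", "Qty": 300},
    {"S#": "S1", "P#": "P2", "Qty": 200},
    {"S#": "S2", "P#": "P1", "Qty": 100},
]

print(holds_fd(S, ["S#"], ["Status", "City", "Name"]))  # True
print(holds_fd(S, ["City"], ["Status"]))   # False: London has 20 and 50
print(holds_fd(S, ["Status"], ["City"]))   # False: 50 appears in two cities
print(holds_fd(SP, ["S#", "P#"], ["Qty"]))  # True
print(holds_fd(SP, ["S#"], ["Qty"]))        # False: S1 has 300 and 200
print(holds_fd(SP, ["P#"], ["Qty"]))        # False: P1 has 300 and 100
```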

Def: Y is fully functionally dependent on X if X → Y but Y is not functionally dependent on any proper subset of X. In S: (S#, Status) → City, but not fully, because S# → City. In SP: (S#, P#) → Qty, fully, because neither S# nor P# by itself determines Qty. The functional dependence requires BOTH S# and P#.
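As an illustration of this definition, here is a small sketch that tests full dependence on sample data by trying every non-empty proper subset of the left-hand side. The function names and the rows are hypothetical, and, as with any check against data, a passing result only means that this sample does not contradict the dependency.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not holds_fd(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds_fd(rows, list(subset), rhs):
                return False
    return True

# Hypothetical sample rows.
S = [
    {"S#": "S1", "Status": 20, "City": "London"},
    {"S#": "S2", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Status": 10, "City": "Paris"},
]
SP = [
    {"S#": "S1", "P#": "P1", "Qty": 300},
    {"S#": "S1", "P#": "P2", "Qty": 200},
    {"S#": "S2", "P#": "P1", "Qty": 100},
]

print(is_fully_dependent(S, ["S#", "Status"], ["City"]))  # False: S# alone suffices
print(is_fully_dependent(SP, ["S#", "P#"], ["Qty"]))      # True
```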

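Functional dependence is a semantic notion. You must understand the meaning of the data; a dependency is NOT a consequence of the data that happens to be in a table. For example, suppose that all suppliers in the same city in S have the same status. Is it coincidence or by design?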

Why is this important? What if we combined the relations S and SP into a single relation, First, as in a few slides previous? First(S#, P#, Status, City, QTY). The primary key, underlined on the slide, is (S#, P#).

Insertion anomalies: Cannot enter the fact that a supplier is located in a city unless that supplier already supplies some part. Why?

Deletion anomalies: Suppose S3 no longer supplies P2. Delete (S3, P2, 10, Paris, 200). If that is the ONLY part S3 supplied, you lose the fact that S3 is in Paris.

Update anomalies: S1 moves from London to Amsterdam. May have to update many entries. Violates the “one fact, one place” guideline. These problems are caused by dependencies on a proper subset of the primary key.
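The deletion and update anomalies in First can be reproduced on a few rows. This is a sketch only: the rows are hypothetical apart from (S3, P2, 10, Paris, 200), and the helper names `city_of` and `decompose` are made up for the illustration.

```python
# First(S#, P#, Status, City, QTY); all rows except S3's are hypothetical.
FIRST = [
    ("S1", "P1", 20, "London", 300),
    ("S1", "P2", 20, "London", 200),
    ("S3", "P2", 10, "Paris", 200),
]

def city_of(rows, supplier):
    """Cities recorded for a supplier in a First-style table."""
    return {city for (s, _, _, city, _) in rows if s == supplier}

# Deletion anomaly: S3 stops supplying P2, and S3's city disappears too.
after_delete = [r for r in FIRST if (r[0], r[1]) != ("S3", "P2")]
print(city_of(FIRST, "S3"))         # {'Paris'}
print(city_of(after_delete, "S3"))  # set()

# Update anomaly: moving S1 to Amsterdam touches one row per part supplied.
rows_to_update = [r for r in FIRST if r[0] == "S1"]
print(len(rows_to_update))  # 2

def decompose(rows):
    """Split First into S(S#, Status, City) and SP(S#, P#, QTY)."""
    s = {(sno, status, city) for (sno, _, status, city, _) in rows}
    sp = {(sno, pno, qty) for (sno, pno, _, _, qty) in rows}
    return s, sp

# With S and SP kept separately, deleting the shipment leaves S3's city in S.
S, SP = decompose(FIRST)
SP.discard(("S3", "P2", 200))
print(("S3", 10, "Paris") in S)  # True
```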

See also Kroenke’s example on page 95 and the text on page 96.

Second Normal Form (2NF): A relation is 2NF iff it is 1NF and every non-key attribute is fully functionally dependent on the primary key. That is, no non-key attribute depends on a proper subset of the primary key.

Table First is NOT 2NF. Some non-key attributes are not fully dependent on the primary key (S#, P#); some are dependent on S# only. The S and SP tables ARE 2NF. They are a better design in this case. There is a similar example in Fig 3-10 on page 106 of the text.

How about this table? Does it contain redundancy? Are there update anomalies?

Transitive dependencies: Suppose a supplier’s status is determined by the supplier’s city. That is, City → Status. Since S# → City also holds, S# → Status is a result of these dependencies. A transitive dependency exists: S# → City → Status.

Similarly, a Housing table that links a student with a dorm and a residence fee would also likely have a transitive dependence: SID → Dorm → Fee.

Insertion anomalies: Cannot state the fact that a supplier in Rome must have a status of 50 unless there is already a supplier there. Cannot state the fact that a dorm has a specific cost unless there is already a student there.

Deletion anomalies: Delete (S5, 30, Athens). If it’s the ONLY Athens entry, you lose the fact that the status for Athens must be 30. Delete (100, Randolph, $3200) from the Housing table. If that’s the only “Randolph” entry, then you lose the connection between dorm and cost.

Update anomalies: “Change status of London supplier” may mean multiple updates. Violates the “one fact, one place” rule, i.e. that each fact should be stored in one place.

Third Normal Form (3NF): A relation is 3NF iff it is 2NF and every non-key attribute is nontransitively dependent on the primary key, i.e. non-key attributes are mutually independent. Again, it’s a consequence of the meaning of the data, not the data itself.

Suppose all London suppliers had a status of 50. Is that coincidence? Is it by design?

Question: Is 3NF better than 2NF? Maybe. In the cases presented here, probably so. An employee table where EmpID → Address → ZipCode is not 3NF. We may not care about Address → ZipCode unless it’s a UPS or Post Office application.

Table Decomposition: Dividing a table into 2 or more tables to achieve a higher normal form. Previously we divided First into tables S and SP to achieve 2NF.

Now we find that S is not 3NF, so we should decompose S into two tables. We have options: (1) SS(S#, Status) and CS(City, Status); (2) SC(S#, City) and SS(S#, Status); or (3) SC(S#, City) and CS(City, Status). Which is best?

Need to ask: Does the decomposition result in a loss of information? For example, can we still relate the attributes that have been separated into two tables? Are the two relations independent of each other?

Option 1: SS(S#, Status) and CS(City, Status) Cannot get the city of a supplier. Can you see why?

Option 2: SC(S#, City) and SS(S#, Status). The relations are not independent. If two suppliers are in the same city, we must make sure they have the same status. This requires monitoring of changes, possibly the use of triggers. Extra work.

With Option 2 you CAN get the status of a city, but ONLY if there’s a supplier there. Otherwise there’s a loss of information: you can’t store the status of a city unless there’s a supplier there.

Option 3: SC(S#, City) and CS(City, Status). The two relations are independent. No loss of information. Best option.
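One way to see the difference between Option 1 and Option 3 is to project S onto each pair of tables and join the pieces back together. The sketch below assumes relations are lists of dicts; the helpers `project`, `natural_join`, and `same_relation` and the sample rows are hypothetical, chosen so that City → Status holds and two cities share a status.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate rows."""
    out = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out

def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                row = {**l, **r}
                if row not in out:
                    out.append(row)
    return out

def same_relation(a, b):
    """Compare two relations as sets of rows, ignoring row order."""
    as_set = lambda rows: {frozenset(r.items()) for r in rows}
    return as_set(a) == as_set(b)

# Hypothetical rows for S(S#, City, Status); London and Rome share status 50.
S = [
    {"S#": "S1", "City": "London", "Status": 50},
    {"S#": "S2", "City": "Paris", "Status": 10},
    {"S#": "S3", "City": "Paris", "Status": 10},
    {"S#": "S4", "City": "Rome", "Status": 50},
    {"S#": "S5", "City": "Athens", "Status": 30},
]

# Option 1: SS(S#, Status) and CS(City, Status), joined on Status.
option1 = natural_join(project(S, ["S#", "Status"]), project(S, ["City", "Status"]))
# Option 3: SC(S#, City) and CS(City, Status), joined on City.
option3 = natural_join(project(S, ["S#", "City"]), project(S, ["City", "Status"]))

print(len(S), len(option1), len(option3))  # 5 7 5
print(same_relation(option1, S))  # False: S1 and S4 each pair with both London and Rome
print(same_relation(option3, S))  # True
```

In this sample, Option 1 returns two spurious rows, so the city of S1 or S4 can no longer be read off the join. Option 3 reproduces S exactly.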

Decompose the Housing table (SID → Dorm → Fee) into one of: (1) SD(SID, Dorm) and DF(Dorm, Fee); (2) SD(SID, Dorm) and SF(SID, Fee); (3) SF(SID, Fee) and DF(Dorm, Fee). Which is better? Construct a similar argument.

Best decomposition frequently follows the FD arrows. This is a guideline, not an absolute rule.

Determinants: Consider SMA(SID, MID, AID) where a student has one advisor for a major and an advisor advises for one major, so (SID, MID) → AID and AID → MID. This table is 3NF since there is only one non-key attribute. S2 drops Physics and you may lose the fact that A3 advises for Physics.

SID   MID    AID
S1    Math   A1
S1    Phys   A2
S2    Math   A1
S2    Phys   A3

Def: If Y is fully functionally dependent on X then X is a determinant. Def: A tuple is an entry (row) from a relation. The name is rooted in the historical development by E. F. Codd, who used mathematical models to describe relations. Def: An attribute (or set of attributes) is a candidate key if it uniquely identifies a tuple and no proper subset of it does. A primary key is chosen from the list of candidate keys. Every candidate key is a determinant.

Boyce-Codd Normal Form (BCNF): A relation is BCNF if every determinant is a candidate key. SMA is NOT BCNF since AID is a determinant but not a candidate key.
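The BCNF test can be sketched with the standard attribute-closure computation. The code below is an illustration rather than the method used in the slides: it flags any nontrivial dependency whose left side does not determine every attribute (that is, whose left side is not a superkey), which is the usual way to operationalize “every determinant is a candidate key”. The function names and the encoding of dependencies as pairs of tuples are assumptions.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Nontrivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]

# SMA(SID, MID, AID): one advisor per student per major; one major per advisor.
SMA_ATTRS = ["SID", "MID", "AID"]
SMA_FDS = [(("SID", "MID"), ("AID",)), (("AID",), ("MID",))]

print(bcnf_violations(SMA_ATTRS, SMA_FDS))  # [(('AID',), ('MID',))]

# AM(AID, MID) on its own keeps AID -> MID, and AID is its key.
print(bcnf_violations(["AID", "MID"], [(("AID",), ("MID",))]))  # []
```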

Possible decompositions: SA(SID, AID) and AM(AID, MID). No loss, but the relations are not independent. How do you “Find the major of S1”? It requires a search of two tables, which seems somewhat counterintuitive.

SM(SID, MID) and AM(AID, MID). Cannot get advisor of a student.

SA(SID, AID) and SM(SID, MID). Cannot get who advises what. None of the three possible decompositions seems satisfactory.

Solution: Look at the bigger picture (the E-R diagram), with entities Student (S), Advisor (A), and Major (M); the diagram marks one relationship as redundant. Relations: S, M, A (with a foreign key matching the primary key in M), SM, and SA to implement the many-many relationships.

NOTE: With BOTH SM and SA, it is possible for inconsistency to occur. Could have (S1, M5) in SM; (S1, A3) in SA; and have M8 as a foreign key for advisor A3 in the Advisor table. Would need software or triggers to assure consistency which adds to overhead.

On the other hand, the relationship between S and M is derived from the relationships between S and A and between A and M. This provides an argument that the relationship between S and M should not be shown as a separate relationship.

Of course, then the fact that “a student is majoring in something” is NOT explicitly stored. The design is based on business rules, which we assume to be correct. That may not always be the case.

Maybe the business rule that states “a student is majoring in something” is flawed. Allows a student to choose a major without having an advisor first.

Perhaps a better rule is “a student has an advisor, which determines the major”. It would be a model that forces a student to choose an advisor, which may be a better rule since many students do NOT seek out advisors in a timely fashion.

Multivalued Dependencies (MVDs): Consider SMA(Student, Major, Activity). A student can have multiple majors and participate in multiple activities.

This relation is BCNF vacuously (there are no determinants). Can’t store the major of a student unless that student has an activity. [Table: columns Student, Major, Activity; entries include students S1 and S2, majors Math and Phys, and activities Swimming, Football, and Baseball.]

Another example: CIX(Courses, Instructor, teXt), used to implement training programs or corporate-sponsored courses. A course can be taught by many instructors and an instructor can lead many courses. Similarly for texts and courses. Instructors do NOT choose texts. There are NO determinants in this table. Can’t store the text for a course unless there is an instructor.

Yet another example on page

Def: Suppose A, B, and C are attributes of a relation R. A multivalued dependency (MVD) A →→ B holds in R if each A value is associated with a set of B values, and that set is independent of any C values.

Fourth Normal Form (4NF): A relation is 4NF if it is BCNF and has no multivalued dependencies. SMA is NOT 4NF. We would decompose it into SM and SA. No loss, since there’s no connection between M and A.
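The “no loss” claim can be checked on sample data: Student →→ Major holds in a set of rows exactly when joining the SM and SA projections gives back the original rows. The sketch below uses hypothetical rows and a made-up helper name, with relations stored as sets of tuples.

```python
def split_and_rejoin(rows):
    """Project SMA onto SM and SA, rejoin on Student, and report whether nothing changed."""
    sm = {(s, m) for s, m, _ in rows}
    sa = {(s, a) for s, _, a in rows}
    rejoined = {(s, m, a) for s, m in sm for s2, a in sa if s == s2}
    return rejoined == rows, sm, sa

# Hypothetical rows: S1 has two majors and two activities, stored as all four pairings.
SMA = {
    ("S1", "Math", "Swimming"),
    ("S1", "Math", "Football"),
    ("S1", "Phys", "Swimming"),
    ("S1", "Phys", "Football"),
    ("S2", "Math", "Baseball"),
}

lossless, SM, SA = split_and_rejoin(SMA)
print(lossless)                    # True
print(len(SMA), len(SM), len(SA))  # 5 3 3

# If one pairing is missing, majors and activities are no longer independent
# in the data, and the rejoin produces a row the table never held.
incomplete = SMA - {("S1", "Phys", "Football")}
print(split_and_rejoin(incomplete)[0])  # False
```

The five-row table stores each of S1's majors once per activity; the two three-row tables store each fact once.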

CIX is NOT 4NF. Would decompose into CI and CX. No loss since there’s no connection between I and X.

There is a 5NF but we will not cover it. Relations that violate it rarely occur in practice.

Domain/Key Normal Form (DK/NF): Landmark paper: Ronald Fagin, “A Normal Form for Relational Databases That Is Based on Domains and Keys”, ACM Transactions on Database Systems, September.

In this paper he defined DK/NF; proved that a relation in DK/NF has NO modification anomalies; and proved that a relation having no modification anomalies must be in DK/NF.

What is it? First, some definitions. Constraint: a rule governing static values of attributes, e.g. range rules on attribute values, functional dependencies, and multivalued dependencies.

key: unique identifier of a tuple. domain: description of an attribute’s allowable values

Def: A relation is DK/NF (Domain Key Normal Form) if every constraint is a logical consequence of the definition of its keys and domains. Without an example this probably makes little sense.

Ex. (from a previous edition of Kroenke): Track students, faculty, and who advises whom. Possible relations: Student(SID, Sname, FID) and Faculty(FID, Fname, FacStatus)

Constraints: FacStatus = 0 or 1 (undergrad/graduate); FID begins with 1; SID must not begin with 1; SID of grad students begins with 9. Only graduate faculty can advise graduate students.

Alternative constraint statement: “Grad student must be advised by Grad Faculty” means “If SID starts with 9 then the FacStatus of the advisor must be 1”. This is difficult to enforce through the database design since the relevant data lies in two distinct tables. Each relation is still 1NF through 4NF.

Decomposing tables: Kroenke discussed themes. Each relation has a theme. There are 3 themes here: faculty, grad advising, and undergrad advising.

Possible tables: Faculty(FID, Fname, FacStatus); G-ADV(GSID, Sname, GFID); UG-ADV(UGSID, Sname, FID).

Domain Definitions: FID in CDDD where C=1; D=decimal digit. This is a generic notation for our purposes here. In Access you’d write: FID Like "1###". In SQL Server you’d write: FID LIKE '1[0-9][0-9][0-9]'. (See the F-Adv table in the university database.)

Fname in CHAR(30). FacStatus in [0, 1]. GSID in CDDD where C=9; D=decimal digit. UGSID in CDDD where C!=1 and C!=9; D=decimal digit. See the G-Adv and UG-ADV tables for exact syntax.

Sname in CHAR(30). GFID in {Select FID of Faculty, where FacStatus=1} (assuming the DBMS supports this type of constraint). There is a trigger in G-Adv to implement the equivalent of this.
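For readers who want to experiment outside a DBMS, the domain definitions above can be mimicked with regular expressions. This is a sketch under assumptions: it treats the leading C as a decimal digit, the function and variable names are hypothetical, and the sample faculty rows are invented. It is not the syntax used in the G-Adv or UG-ADV tables.

```python
import re

# CDDD patterns; C is assumed to be a decimal digit as well.
FID_RE = re.compile(r"1[0-9]{3}")         # C = 1
GSID_RE = re.compile(r"9[0-9]{3}")        # C = 9
UGSID_RE = re.compile(r"[02-8][0-9]{3}")  # C is neither 1 nor 9

def in_domain(pattern, value):
    """True if the whole value matches the domain pattern."""
    return pattern.fullmatch(value) is not None

def valid_gfid(gfid, faculty):
    """GFID must be the FID of a faculty row with FacStatus = 1."""
    return any(f["FID"] == gfid and f["FacStatus"] == 1 for f in faculty)

# Hypothetical faculty rows.
FACULTY = [
    {"FID": "1001", "Fname": "Lee", "FacStatus": 1},
    {"FID": "1002", "Fname": "Kim", "FacStatus": 0},
]

print(in_domain(FID_RE, "1001"))    # True
print(in_domain(FID_RE, "2001"))    # False
print(in_domain(GSID_RE, "9123"))   # True
print(in_domain(UGSID_RE, "9123"))  # False: begins with 9
print(in_domain(UGSID_RE, "4123"))  # True
print(valid_gfid("1001", FACULTY))  # True
print(valid_gfid("1002", FACULTY))  # False: FacStatus is 0
```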

Relations & Keys: All constraints are met by enforcing key and domain restrictions, i.e. the design is DK/NF. These relations are guaranteed to have NO modification anomalies.

Examples: A company hires student interns to work on various projects under the guidance of company employees. Semantics are as follows: A student intern can work on several projects and a project can use several interns. A project can have several team leaders (or co-leaders), who are company employees, but an employee works on only one project. For each project on which an intern participates there is one team leader to whom that intern must report. Consider a table, IPE, consisting of 3 attributes: Intern ID (I#), Project ID (P#), and employee ID (E#). So for example, if (I4, P5, E3) is an entry in this table then it means that intern I4 is working on project P5 and must report to team leader E3 for that project.

List the FDs; what should the primary key be? What is the lowest normal form that is violated? Answer: (I#, P#) → E# and E# → P#. The primary key should be (I#, P#). The relation violates BCNF.
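The key for IPE can be found by brute force from the two dependencies: try attribute sets in order of size and keep those whose closure covers every attribute. The sketch below is illustrative only; the function names and the encoding of dependencies as pairs of tuples are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip supersets of a key already found; they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == set(attrs):
                keys.append(combo)
    return keys

IPE_ATTRS = ["I#", "P#", "E#"]
IPE_FDS = [(("I#", "P#"), ("E#",)), (("E#",), ("P#",))]

print(candidate_keys(IPE_ATTRS, IPE_FDS))  # [('I#', 'P#'), ('I#', 'E#')]
print(closure(("E#",), IPE_FDS) == {"E#", "P#"})  # True: E# is a determinant but not a key
```

Both (I#, P#) and (I#, E#) come out as candidate keys, and either could be chosen as the primary key. E# determines P# without being a key, which is the BCNF violation.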

(I#, P#) → E# and E# → P#. Consider three possible decompositions of the above relation as follows. Primary keys are underlined on the slides. Tables IE(I#, E#) and IP(I#, P#): IE might contain (I2, E4) and (I2, E6); IP might contain (I2, P3) and (I2, P5). What project does E4 work on? We lose the project to which an employee is assigned.

(I#, P#) → E# and E# → P#. Tables IE(I#, E#) and EP(E#, P#): since E# → P#, we can get the project that an intern is working on through a join of these two tables.

(I#, P#) → E# and E# → P#. Tables EP(E#, P#) and IP(I#, P#): EP might contain (E2, P4) and (E4, P4); IP might contain (I2, P4). To which employee does I2 report? We lose the employee to whom an intern reports.

Assume the following scenario in a university in which a student is paid by a department to do work for a faculty member. Semantics are as follows: A department can hire many students and a faculty member can have many students working for him/her. Each student can work for only one faculty member and is paid through the faculty member’s department budget. A department has many faculty members but each faculty member is a member of one department. There is a table consisting of 3 attributes: Student ID (SID), Department ID (DID), and Faculty ID (FID). So for example, if (S4, D5, F3) is an entry in this table then it means that Student S4 is working for faculty member F3 who, in turn, is a member of department D5.

List the FDs; what should the primary key be? What is the lowest normal form that is violated? Answer: SID → FID → DID. The primary key should be SID. The relation violates 3NF.

SID → FID → DID. Consider three possible decompositions of the above relation as follows. SF(SID, FID) and SD(SID, DID): by doing a join between these tables, you can get the department of a faculty member, but only IF the faculty member has a student employee. Also, the tables are not independent.

SID → FID → DID. SF(SID, FID) and FD(FID, DID): by doing a join between these tables, you can get the department that is paying the student.

SID → FID → DID. FD(FID, DID) and SD(SID, DID): can you construct an example that shows you may not get the faculty member for whom a student is working?

From a previous exam: An organization needs to track many ongoing projects, the department responsible for each project, and which employees are project leaders. Rules are as follows. Each project is the responsibility of a single department. Each project has one project leader, who is a member of the department responsible for the project. A department can have many employees and be responsible for many projects. An employee can be a project leader for several projects. Each employee is assigned to one department. Proceed as in the previous slides.