3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University.



What it’s all about
Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable form for enforcing F. Decompose relation R into relations R1, ..., Rk, with associated functional dependencies F1, ..., Fk, such that R1, ..., Rk are in a more desirable form, 3NF or BCNF. While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.

Primitive Domains
Attributes must be defined over domains with atomic values.

FLT-SCHEDULE (weekday holds several values per row)
flt#   weekday   airline  dtime  from  atime  to
DL242  MO WE FR  DELTA    10:40  ATL   12:30  BOS
SK912  SA SU     SAS      12:00  CPH   15:30  JFK
AA242  MO FR     AA       08:00  CHI   10:10  ATL

FLT-SCHEDULE (atomic values only)
flt#   weekday  airline  dtime  from  atime  to
DL242  MO       DELTA    10:40  ATL   12:30  BOS
SK912  SA       SAS      12:00  CPH   15:30  JFK
AA242  MO       AA       08:00  CHI   10:10  ATL
DL242  WE       DELTA    10:40  ATL   12:30  BOS
DL242  FR       DELTA    10:40  ATL   12:30  BOS
SK912  SU       SAS      12:00  CPH   15:30  JFK
AA242  FR       AA       08:00  CHI   10:10  ATL

Bad Database Design - redundancy of fact

FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo
DL242  10/24/00  Delta     t-up
DL242  10/25/00  Delta     o-ge
AA121  10/24/00  American  p-rw
AA121  10/25/00  American  q-yg
AA411  10/22/00  American  h-fe

redundancy: airline name repeated for same flight
inconsistency: when airline name for a flight changes, it must be changed in many places

Bad Database Design - fact clutter

FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo
DL242  10/24/00  Delta     t-up
DL242  10/25/00  Delta     o-ge
AA121  10/24/00  American  p-rw
AA121  10/25/00  American  q-yg
AA411  10/22/00  American  h-fe-65748

insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?
deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.
update anomalies: if DL242 is flown by Sabena, we must change it everywhere.

Bad Database Design - information loss

FLIGHTS
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo
DL242  10/24/00  Delta     t-up
DL242  10/25/00  Delta     o-ge
AA121  10/24/00  American  p-rw
AA121  10/25/00  American  q-yg
AA411  10/22/00  American  h-fe

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo
10/24/00  Delta     t-up
10/25/00  Delta     o-ge
10/24/00  American  p-rw
10/25/00  American  q-yg
10/22/00  American  h-fe-65748

Bad Database Design - information loss

FLIGHTS (result of joining the two tables below on airline)
flt#   date      airline   plane#
DL242  10/23/00  Delta     k-yo
DL242  10/24/00  Delta     t-up
DL242  10/25/00  Delta     o-ge
AA121  10/24/00  American  p-rw
AA121  10/25/00  American  q-yg
AA121  10/22/00  American  h-fe
AA411  10/24/00  American  p-rw
AA411  10/25/00  American  q-yg
AA411  10/22/00  American  h-fe

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo
10/24/00  Delta     t-up
10/25/00  Delta     o-ge
10/24/00  American  p-rw
10/25/00  American  q-yg
10/22/00  American  h-fe

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

information loss: we polluted the database with false facts; we can’t find the true facts.
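
The information loss on this slide can be reproduced with a few lines of Python. This is only a sketch: relations are modelled as sets of tuples, the variable names are illustrative and not part of the slides, and the plane# values are copied in the shortened form in which they appear above.

```python
# FLIGHTS rows as (flt#, date, airline, plane#), copied from the slide.
FLIGHTS = {
    ("DL242", "10/23/00", "Delta", "k-yo"),
    ("DL242", "10/24/00", "Delta", "t-up"),
    ("DL242", "10/25/00", "Delta", "o-ge"),
    ("AA121", "10/24/00", "American", "p-rw"),
    ("AA121", "10/25/00", "American", "q-yg"),
    ("AA411", "10/22/00", "American", "h-fe"),
}

# Project FLIGHTS onto the two decomposed schemas.
flights_airline = {(flt, airline) for (flt, date, airline, plane) in FLIGHTS}
date_airline_plane = {(date, airline, plane) for (flt, date, airline, plane) in FLIGHTS}

# Natural join of the two projections on their only shared attribute, airline.
rejoined = {
    (flt, date, airline, plane)
    for (flt, airline) in flights_airline
    for (date, airline2, plane) in date_airline_plane
    if airline == airline2
}

# Tuples that the join produces but that were never in FLIGHTS.
spurious = rejoined - FLIGHTS

print(len(FLIGHTS), len(rejoined), sorted(spurious))
```

Every original row survives the join, but it also produces rows that were never facts, and nothing in the two projections tells the true rows from the false ones.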

Bad Database Design - dependency loss

DATE-AIRLINE-PLANE
date      airline   plane#
10/23/00  Delta     k-yo
10/24/00  Delta     t-up
10/25/00  Delta     o-ge
10/24/00  American  p-rw
10/25/00  American  q-yg
10/22/00  American  h-fe

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

dependency loss: we lost the fact that (flt#, date) -> plane#

Good Database Design
– no redundancy of FACT (!)
– no inconsistency
– no insertion, deletion or update anomalies
– no information loss
– no dependency loss

FLIGHTS-DATE-PLANE
flt#   date      plane#
DL242  10/23/00  k-yo
DL242  10/24/00  t-up
DL242  10/25/00  o-ge
AA121  10/24/00  p-rw
AA121  10/25/00  q-yg
AA411  10/22/00  h-fe

FLIGHTS-AIRLINE
flt#   airline
DL242  Delta
AA121  American
AA411  American

Functional Dependencies and Keys
Let X and Y be sets of attributes in R.
Y is functionally dependent on X in R, written X -> Y, iff for each x ∈ R.X there is precisely one y ∈ R.Y.
Y is fully functionally dependent on X in R if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X.
We use keys to enforce functional dependencies in relations.

Functional Dependencies and Keys
FLIGHTS(flt#, date, airline, plane#)
– plane# is not determined by flt# alone; it takes flt# and date together
– airline is not fully dependent on flt# and date; flt# alone determines it
– the FLIGHTS relation will therefore not allow both FDs to be enforced by keys
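
A functional dependency can be tested against a concrete table. The sketch below assumes rows are Python dicts; `fd_holds` is a hypothetical helper name, and the rows are the FLIGHTS rows from the earlier slides. Note that sample data can only refute a dependency: passing the test on one instance does not prove that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the
        # stored value afterwards, so a mismatch means the FD is violated.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("flt#", "date", "airline", "plane#")
FLIGHTS = [dict(zip(COLUMNS, r)) for r in [
    ("DL242", "10/23/00", "Delta", "k-yo"),
    ("DL242", "10/24/00", "Delta", "t-up"),
    ("DL242", "10/25/00", "Delta", "o-ge"),
    ("AA121", "10/24/00", "American", "p-rw"),
    ("AA121", "10/25/00", "American", "q-yg"),
    ("AA411", "10/22/00", "American", "h-fe"),
]]

flt_to_airline = fd_holds(FLIGHTS, ["flt#"], ["airline"])
flt_to_plane = fd_holds(FLIGHTS, ["flt#"], ["plane#"])
flt_date_to_plane = fd_holds(FLIGHTS, ["flt#", "date"], ["plane#"])

print(flt_to_airline, flt_to_plane, flt_date_to_plane)
```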

Functional Dependencies and Keys
Consider the meaning.
[Figure: real world (name, address) versus database (cust#, name, address), with the attributes shown combined and separate.]

How to Compute Meaning - Armstrong’s inference rules
Rules of the computation:
– Reflexivity: if Y ⊆ X, then X -> Y
– Augmentation: if X -> Y, then WX -> WY
– Transitivity: if X -> Y and Y -> Z, then X -> Z
Derived rules:
– Union: if X -> Y and X -> Z, then X -> YZ
– Decomposition: if X -> YZ, then X -> Y and X -> Z
– Pseudotransitivity: if X -> Y and WY -> Z, then XW -> Z
Armstrong’s Axioms are:
– sound
– complete
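
In practice the rules are rarely applied one at a time. A common shortcut is to compute the closure of an attribute set: every attribute that the set determines under the given FDs. The sketch below is one standard way to do that and is not taken from the slides; the names `closure` and `implies` are illustrative, and attribute sets are written as strings of single-letter attribute names.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (a string of single-letter names works).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from `fds`."""
    return set(rhs) <= closure(lhs, fds)


# The premises of the pseudotransitivity rule: X -> Y and WY -> Z.
fds = [("X", "Y"), ("WY", "Z")]

print(sorted(closure("XW", fds)))
print(implies(fds, "XW", "Z"), implies(fds, "X", "Z"))
```

Here XW -> Z follows from the two premises, as pseudotransitivity says, while X -> Z does not, because W is needed to reach Z.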

Overview of NFs
NF² ⊃ 1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF

Normal Forms - definitions
– NF²: non-first normal form
– 1NF: R is in 1NF iff all domain values are atomic
– 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
– 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
– BCNF: R is in BCNF iff every determinant is a candidate key
– Determinant: an attribute on which some other attribute is fully functionally dependent

Example of Normalization
FLT-INSTANCE(flt#, date, plane#, airline, from, to, miles)

Example of Normalization
1NF: (flt#, date, plane#, airline, from, to, miles)
2NF: (flt#, date, plane#) and (flt#, airline, from, to, miles)
3NF & BCNF: (from, to, miles), (flt#, airline, from, to) and (flt#, date, plane#)

3NF that is not BCNF
R(A, B, C)
Candidate keys: {A,B} and {A,C}
Determinants: {A,B} and {C}
A decomposition: R1(C, B) and R2(A, C)
Lossless, but not dependency preserving!
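
The claims on this slide can be checked mechanically. The sketch below assumes the FDs behind the slide are AB -> C and C -> B, which is what the listed determinants and candidate keys suggest but is not stated explicitly. The helpers `closure`, `candidate_keys` and `project` are illustrative names; all three use brute force, which is fine for a three-attribute relation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def project(fds, sub):
    """Nontrivial FDs that still hold inside the sub-schema `sub`."""
    sub = set(sub)
    out = []
    for size in range(1, len(sub) + 1):
        for combo in combinations(sorted(sub), size):
            rhs = (closure(combo, fds) & sub) - set(combo)
            if rhs:
                out.append((set(combo), rhs))
    return out


FDS = [("AB", "C"), ("C", "B")]  # assumed FDs, see the note above

keys = candidate_keys("ABC", FDS)
kept = project(FDS, "CB") + project(FDS, "AC")   # FDs enforceable in R1 and R2
ab_to_c_preserved = "C" in closure("AB", kept)

print(keys, kept, ab_to_c_preserved)
```

Only C -> B survives in the decomposed schemas, and AB -> C cannot be derived from it, which is the dependency loss the slide refers to.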

When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.
3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys
– i.e. composite candidate keys with at least one attribute in common.
BCNF is based on the concept of a determinant.
– A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
A relation is in BCNF if, and only if, every determinant is a candidate key.

The theory
Consider the following relation and determinants.
Example 1. Given R(a,b,c,d) with
– a,c -> b,d
– a,d -> b
To be in BCNF, every valid determinant must be a candidate key. In the relation R, a,c -> b,d is the determinant used as the key, so the first determinant is fine.
Example 2. a,d -> b suggests that a,d could be the primary key, since it determines b. However, a,d does not determine c. So a,d is not a candidate key, and thus R is not in BCNF.
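
The test described here, every determinant must be a candidate key, can be sketched as a check that the left side of each nontrivial FD determines the whole relation. The function name `bcnf_violations` is illustrative, and attributes are written as single letters so that a string can stand for an attribute set.

```python
def closure(attrs, fds):
    """Every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the FDs whose left side does not determine the whole schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)              # skip trivial FDs
        and not schema <= closure(lhs, fds)      # lhs is not a superkey
    ]


# R(a,b,c,d) with a,c -> b,d and a,d -> b, as on the slide.
fds = [("ac", "bd"), ("ad", "b")]
violations = bcnf_violations("abcd", fds)

print(sorted(closure("ad", fds)), violations)
```

The closure of {a,d} is {a,b,d}: c is missing, so a,d -> b is reported as the violating dependency.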

Example 1

Patient No  Patient Name  Appointment Id  Time   Doctor
1           John          0               09:00  Zorro
2           Kerr          0               09:00  Killer
3           Adam          1               10:00  Zorro
4           Robert        0               13:00  Killer
5           Zane          1               14:00  Zorro

Two possible keys
DB(Patno,PatName,appNo,time,doctor)
Determinants:
– Patno -> PatName
– Patno,appNo -> Time,doctor
– Time -> appNo
Two options for 1NF primary key selection:
– DB(Patno,PatName,appNo,time,doctor) with key Patno,appNo (example 1a)
– DB(Patno,PatName,appNo,time,doctor) with key Patno,time (example 1b)

Example 1a
DB(Patno,PatName,appNo,time,doctor)
No repeating groups, so in 1NF.
2NF – eliminate partial key dependencies:
– DB(Patno,appNo,time,doctor)
– R1(Patno,PatName)
3NF – no transitive dependencies, so in 3NF.
Now try BCNF.

BCNF Every determinant is a candidate key DB(Patno,appNo,time,doctor) R1(Patno,PatName) Is determinant a candidate key? –Patno -> PatName Patno is present in DB, but not PatName, so irrelevant.

Continued…
DB(Patno,appNo,time,doctor) R1(Patno,PatName)
– Patno,appNo -> Time,doctor
All LHS and RHS present so relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key.
– Time -> appNo
Time is present, and so is appNo, so relevant. Is this a candidate key? If it was then we could rewrite DB as DB(Patno,appNo,time,doctor) with time alone as the key. This will not work, because time alone does not identify a row (two patients have a 09:00 appointment), so DB is not in BCNF.

Rewrite to BCNF DB(Patno,appNo,time,doctor) R1(Patno,PatName) BCNF: rewrite to DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo) time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF
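
Whether a two-way split loses information can be checked from the FDs alone: the split is lossless if the attributes shared by the two parts determine all of one part. The sketch below applies that standard test to this rewrite and, for contrast, to the FLIGHTS split from the earlier slides. The function names are illustrative, and the FLIGHTS dependency flt# -> airline is an assumption read off the earlier slides rather than quoted from them.

```python
def closure(attrs, fds):
    """Every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if the shared attributes determine all of r1 or all of r2."""
    r1, r2 = set(r1), set(r2)
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


# Patient example: DB(Patno,appNo,time,doctor) split on time -> appNo.
patient_fds = [
    ({"Patno"}, {"PatName"}),
    ({"Patno", "appNo"}, {"time", "doctor"}),
    ({"time"}, {"appNo"}),
]
patient_ok = lossless_binary(
    {"Patno", "time", "doctor"}, {"time", "appNo"}, patient_fds
)

# FLIGHTS split into FLIGHTS-AIRLINE and DATE-AIRLINE-PLANE.
flight_fds = [
    ({"flt#"}, {"airline"}),            # assumed from the earlier slides
    ({"flt#", "date"}, {"plane#"}),
]
flights_ok = lossless_binary(
    {"flt#", "airline"}, {"date", "airline", "plane#"}, flight_fds
)

print(patient_ok, flights_ok)
```

The patient split shares time, which determines everything in R2, so it is lossless. The FLIGHTS split shares only airline, which determines nothing else, matching the information loss shown earlier.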

Example 1b
DB(Patno,PatName,appNo,time,doctor)
No repeating groups, so in 1NF.
2NF – eliminate partial key dependencies:
– DB(Patno,time,doctor)
– R1(Patno,PatName)
– R2(time,appNo)
3NF – no transitive dependencies, so in 3NF.
Now try BCNF.

BCNF Every determinant is a candidate key DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo) Is determinant a candidate key? –Patno -> PatName Patno is present in DB, but not PatName, irrelevant. –Patno,appNo -> Time,doctor Not all LHS present so not relevant –Time -> appNo Time is present, but not appNo, so not relevant. –Relations are in BCNF.

Summary - Example 1
This example has demonstrated three things:
– BCNF is stronger than 3NF; relations that are in 3NF are not necessarily in BCNF
– BCNF is needed in certain situations to obtain full understanding of the data model
– there are several routes to take to arrive at the same set of relations in BCNF
Unfortunately there are no rules as to which route will be the easiest one to take.

Example 2
Grade_report(StudNo,StudName,(Major,Advisor,(CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)))
Functional dependencies
– StudNo -> StudName
– CourseNo -> Ctitle,InstrucName
– InstrucName -> InstrucLocn
– StudNo,CourseNo,Major -> Grade
– StudNo,Major -> Advisor
– Advisor -> Major

Example 2 cont...
Unnormalised
Grade_report(StudNo,StudName,(Major,Advisor,(CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)))
1NF Remove repeating groups
– Student(StudNo,StudName)
– StudMajor(StudNo,Major,Advisor)
– StudCourse(StudNo,Major,CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)

Example 2 cont...
1NF
– Student(StudNo,StudName)
– StudMajor(StudNo,Major,Advisor)
– StudCourse(StudNo,Major,CourseNo,Ctitle,InstrucName,InstrucLocn,Grade)
2NF Remove partial key dependencies
– Student(StudNo,StudName)
– StudMajor(StudNo,Major,Advisor)
– StudCourse(StudNo,Major,CourseNo,Grade)
– Course(CourseNo,Ctitle,InstrucName,InstrucLocn)

Example 2 cont...
2NF
– Student(StudNo,StudName)
– StudMajor(StudNo,Major,Advisor)
– StudCourse(StudNo,Major,CourseNo,Grade)
– Course(CourseNo,Ctitle,InstrucName,InstrucLocn)
3NF Remove transitive dependencies
– Student(StudNo,StudName)
– StudMajor(StudNo,Major,Advisor)
– StudCourse(StudNo,Major,CourseNo,Grade)
– Course(CourseNo,Ctitle,InstrucName)
– Instructor(InstrucName,InstrucLocn)

Example 2 cont...
BCNF Every determinant is a candidate key
– Student: only determinant is StudNo
– StudCourse: only determinant is StudNo,Major,CourseNo
– Course: only determinant is CourseNo
– Instructor: only determinant is InstrucName
– StudMajor: the determinants are StudNo,Major and Advisor. Only StudNo,Major is a candidate key, so StudMajor is not in BCNF.

Example 2: BCNF
BCNF
– Student(StudNo,StudName)
– StudCourse(StudNo,Major,CourseNo,Grade)
– Course(CourseNo,Ctitle,InstrucName)
– Instructor(InstrucName,InstrucLocn)
– StudMajor(StudNo,Advisor)
– Advisor(Advisor,Major)

Problems BCNF overcomes

STUDENT  MAJOR    ADVISOR
123      PHYSICS  EINSTEIN
123      MUSIC    MOZART
456      BIOLOGY  DARWIN
789      PHYSICS  BOHR
999      PHYSICS  EINSTEIN

– If the record for student 456 is deleted, we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY.
– We cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.

Split into two tables
In BCNF we have two tables:

STUDENT  ADVISOR
123      EINSTEIN
123      MOZART
456      DARWIN
789      BOHR
999      EINSTEIN

ADVISOR   MAJOR
EINSTEIN  PHYSICS
MOZART    MUSIC
DARWIN    BIOLOGY
BOHR      PHYSICS
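
The effect of this split can be tried out on the data from the slides. The sketch below models each table as a set of tuples (the variable names are illustrative) and checks three things: joining the two tables gives back the original rows, deleting student 456 no longer removes DARWIN's major, and WATSON can be recorded before any student is assigned.

```python
# Original 3NF table: (student, major, advisor), copied from the slides.
student_major_advisor = {
    (123, "PHYSICS", "EINSTEIN"),
    (123, "MUSIC", "MOZART"),
    (456, "BIOLOGY", "DARWIN"),
    (789, "PHYSICS", "BOHR"),
    (999, "PHYSICS", "EINSTEIN"),
}

# The two BCNF tables are projections of the original.
student_advisor = {(s, a) for (s, m, a) in student_major_advisor}
advisor_major = {(a, m) for (s, m, a) in student_major_advisor}

# Natural join on advisor reproduces the original rows exactly.
rejoined = {
    (s, m, a)
    for (s, a) in student_advisor
    for (a2, m) in advisor_major
    if a == a2
}
join_is_lossless = rejoined == student_major_advisor

# Deletion anomaly gone: removing student 456 keeps DARWIN's major.
student_advisor_after_delete = {(s, a) for (s, a) in student_advisor if s != 456}
darwin_still_known = ("DARWIN", "BIOLOGY") in advisor_major

# Insertion anomaly gone: WATSON can be stored without any student.
advisor_major_with_watson = advisor_major | {("WATSON", "COMPUTING")}

print(join_is_lossless, darwin_still_known, len(advisor_major_with_watson))
```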

Returning to the ER Model
Now that we have reached the end of the normalisation process, you must go back and compare the resulting relations with the original ER model.
– You may need to alter it to take account of the changes that have occurred during the normalisation process.
Your ER diagram should always be a perfect reflection of the model you are going to implement in the database, so keep it up to date!
– The changes required depend on how good the ER model was at first!

Video Library Example
A video library allows customers to borrow videos. Assume that there is only 1 of each video. We are told that:
– video(title,director,serial)
– customer(name,addr,memberno)
– hire(memberno,serial,date)
with the dependencies:
– title -> director,serial
– serial -> title
– serial -> director
– name,addr -> memberno
– memberno -> name,addr
– serial,date -> memberno

What NF is this? No repeating groups therefore at least 1NF 2NF – A Composite key exists: hire(memberno,serial,date) –Can memberno be found with just serial or date? –NO, therefore the relations are already in 2NF. 3NF?

Test for 3NF
video(title,director,serial)
– title -> director,serial
– serial -> director
Director can be derived using serial, and serial and director are both non keys, so this is a transitive or non-key dependency. Rewrite video…

Rewrite for 3NF video(title,director,serial) –title->director,serial –serial->director Becomes: video(title,serial) serial(serial,director)

Check BCNF Is every determinant a candidate key? video(title,serial) - Determinants are: – title->director,serial Candidate key – serial->title Candidate key –video in BCNF serial(serial,director) Determinants are: –serial->director Candidate key –serial in BCNF

customer(name,addr,memberno) Determinants are: –name,addr -> memberno Candidate key –memberno -> name,addr Candidate key –customer in BCNF hire(memberno,serial,date) Determinants are: –serial,date -> memberno Candidate key –hire in BCNF Therefore the relations are also now in BCNF.

R(A, B, C, D)
[Dependency diagram over A, B, C, D]
Q1. For which keys is R in 2NF?
– key {AD}: R is in 2NF
– key {BD}: R is in 2NF
– key {CD}: R is not in 2NF
Q2. For which keys is R in 3NF?
Since the prime attributes are {A, B, C, D}:
– R with key {AD} is in 3NF
– R with key {BD} is in 3NF