Design Theory for Relational Databases

Design Theory for Relational Databases 2019, Fall Pusan National University Ki-Joune Li

Properties of Tables
A relational database design is a set of relations. Relations can be derived from a UML diagram, but not every relation obtained this way is well designed. We should carefully examine three properties of a table: functional dependencies, keys, and decomposition of the table.

Definition of Functional Dependency
A functional dependency (FD) on a relation R has the form A1 A2 A3 … An -> B, where A1, A2, A3, …, An, B are attributes of R. It states that the set of attributes A1 A2 A3 … An functionally determines B: any two tuples of R that agree on all of A1, …, An must also agree on B.
When the same left side determines more than one attribute, the FDs A1 A2 A3 … An -> B1, A1 A2 A3 … An -> B2, …, A1 A2 A3 … An -> Bk are abbreviated as A1 A2 A3 … An -> B1 B2 … Bk.

Functional Dependency: Example
Consider the relation Movies (title, year, length, filmType, studioName, starName).
(title year) -> length
(title year) -> filmType
(title year) -> studioName
Combined: (title year) -> length filmType studioName
Does (title year) -> starName hold? No: a film can have more than one star.
Discovering the FDs of a relation is important because it helps to decide whether the relation design is correct.
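
The definition can be tried out on sample data. The sketch below is an illustration only: the helper name `fd_holds` and the three sample rows are assumptions, not part of the slides. Note that an instance can refute an FD but cannot prove that it holds for all legal data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample instance of Movies (only some attributes shown).
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124,
     "studioName": "Fox", "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124,
     "studioName": "Fox", "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95,
     "studioName": "Paramount", "starName": "Dana Carvey"},
]

print(fd_holds(movies, ["title", "year"], ["length", "studioName"]))  # True
print(fd_holds(movies, ["title", "year"], ["starName"]))              # False
```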

Key
Given a relation R, a set of one or more attributes {A1, A2, A3, …, An} is a KEY iff the set functionally determines all other attributes of R, and no proper subset of {A1, A2, A3, …, An} functionally determines all other attributes (minimality).
Primary key: if a relation has more than one key, one of them is designated as the primary key.
Super key: a set of attributes that contains a key. There is no minimality condition.
Example: Movies (title, year, length, filmType, studioName, starName). What are the keys?

How to Discover Keys
From the E-R diagram: the underlined attributes. This means that keys are defined based on our understanding of the real world.
Example: Movies (title, year, length, filmType, studioName, starName). (year, starName) is not a key if a star can make more than one film per year; (year, starName) is a key if a star is allowed to make only one film per year.
For a relation (A1, A2, B) that represents a relationship between R1 and R2, the key depends on the multiplicity of the relationship: one-one, one-many, many-one, or many-many.

Rules about Functional Dependencies
Functional dependency is an important property of a relation (or table), and FDs obey some useful rules.
Transitive rule: if A -> B and B -> C, then A -> C.
Splitting/combining rule: A1 A2 A3 … An -> B1, A1 A2 A3 … An -> B2, …, A1 A2 A3 … An -> Bk iff A1 A2 A3 … An -> B1 B2 … Bk.
Trivial FD rule: an FD A1 A2 A3 … An -> B is trivial if B is one of {A1 A2 A3 … An}, and completely non-trivial if B is not in {A1 A2 A3 … An}.

Rules about Functional Dependencies
Trivial dependency rule: A1 A2 … An -> B1 B2 … Bm is equivalent to A1 A2 … An -> C1 C2 … Ck, where {C1 C2 … Ck} ⊆ {B1 B2 … Bm} consists of those B's that are not in {A1 A2 … An}; that is, for any C ∈ {C1 C2 … Ck}, C ∉ {A1 A2 … An}.
Example: (year, title) -> (studioName, year) is equivalent to (year, title) -> studioName. The attribute year on the right side is unnecessary.

Armstrong's Axioms
Reflexivity (trivial FD): if {C1 C2 … Ck} ⊆ {B1 B2 … Bm}, then B1 B2 … Bm -> C1 C2 … Ck.
Augmentation: if A1 A2 … An -> B1 B2 … Bm, then A1 A2 … An C1 C2 … Ck -> B1 B2 … Bm C1 C2 … Ck.
Transitivity: if A1 A2 … An -> B1 B2 … Bm and B1 B2 … Bm -> C1 C2 … Ck, then A1 A2 … An -> C1 C2 … Ck.

Closure of Attributes
Let {A1 A2 … An} be a set of attributes and S a set of FDs. The closure of {A1 A2 … An} under the FDs in S, written {A1, A2, … An}+, is the set of attributes B such that A1 A2 … An -> B follows from S.
That is, if the FDs with left side A1 A2 … An that can be derived from S are exactly A1 A2 … An -> B1, A1 A2 … An -> B2, …, A1 A2 … An -> Bk, then {A1 A2 … An}+ = {B1, B2, …, Bk}.

Algorithm to Find Closure
Input: a set of attributes {A1, A2, … An} and a set S of FDs
Output: {A1, A2, … An}+
Process
1. Split the FDs so that each FD has a single attribute on the right, e.g. split A1 A2 -> B C into A1 A2 -> B and A1 A2 -> C.
2. Initialize X = {A1, A2, … An}.
3. Search for some FD B1 B2 ... Bm -> C such that B1, B2, ..., Bm are all in X but C is not in X, and add C to X.
4. Repeat step 3 until no more attributes can be added to X. The final X is the closure.
Example: given attributes A, B, C, D, E, and F and S: A B -> C, B C -> A D, D -> E, and C F -> B, what is {A, B}+?
Starting from X = {A, B}: A B -> C adds C, B C -> D adds D, D -> E adds E. C F -> B never applies because F is not in X, so {A, B}+ = {A, B, C, D, E}.
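
The closure algorithm can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `closure` and the representation of an FD as a `(left, right)` pair of attribute collections are assumptions. Because sets are used, right sides do not need to be split first.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is any iterable of
    attribute names (a string works for single-letter attributes).
    """
    result = set(attrs)
    changed = True
    while changed:                      # step 4: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            # step 3: left side inside X, right side not yet fully in X
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Example from the slide: S = {AB -> C, BC -> AD, D -> E, CF -> B}
S = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(sorted(closure("AB", S)))  # ['A', 'B', 'C', 'D', 'E']
```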

Closure and Key
If {A1, A2, … An}+ is the set of all attributes of relation R, then {A1, A2, … An} is a super key.
Example: for R (A, B, C, D, E) and S: A B -> C, B C -> A D, D -> E, we get {A, B}+ = {A, B, C, D, E}, which is all attributes of R. Therefore {A, B} is a super key of R.
If no attribute can be removed from a super key without losing this property, then it is a key.
Example: if we remove B from {A, B}, then {A}+ = {A}, which is not {A, B, C, D, E}. Likewise, if we remove A, then {B}+ = {B}. Therefore {A, B} is a key.
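
The super key and key tests follow directly from the closure. The sketch below assumes the same hypothetical FD representation as `(left, right)` pairs; the names `is_superkey` and `is_key` are illustrative. Since closure is monotone, it is enough to try removing one attribute at a time when checking minimality.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """attrs is a super key if its closure covers every attribute of the relation."""
    return closure(attrs, fds) >= set(relation)


def is_key(attrs, relation, fds):
    """attrs is a key if it is a super key and no single attribute can be dropped."""
    attrs = set(attrs)
    return is_superkey(attrs, relation, fds) and not any(
        is_superkey(attrs - {a}, relation, fds) for a in attrs
    )


# Example from the slide: R(A, B, C, D, E), S = {AB -> C, BC -> AD, D -> E}
R = set("ABCDE")
S = [("AB", "C"), ("BC", "AD"), ("D", "E")]
print(is_superkey("AB", R, S))   # True
print(is_key("AB", R, S))        # True
print(is_key("ABC", R, S))       # False: a super key, but not minimal
```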

Closing Set of Functional Dependencies
Basis T of a set S of FDs: if we can derive S from T, then T is a basis of S. Duplicated FDs are removed.
A minimal basis B satisfies three conditions:
All the FDs in B have one attribute on the right side.
If any FD is removed from B, the result is no longer a basis.
If, for any FD in B, we remove one or more attributes from the left side, the result is no longer a basis.
Example: for S = {A -> B, A -> C, B -> A, B -> C, C -> A, C -> B}, what is the minimal basis of S? Is it {AB -> C, AC -> B, BC -> A}? (It is not: A -> B cannot be derived from these three FDs. One minimal basis is {A -> B, B -> C, C -> A}.)
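
One way to compute a minimal basis is to apply the three conditions in turn, using the closure to decide whether an FD still follows. The sketch below is an assumption-laden illustration: the names `minimal_basis` and `equivalent` and the `(left, right)` pair representation are not from the slides, and the result depends on the order in which FDs are examined, since a set of FDs can have several minimal bases.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def equivalent(f, g):
    """Two FD sets are equivalent if each FD of one follows from the other."""
    follows = lambda fds, fd: set(fd[1]) <= closure(fd[0], fds)
    return all(follows(f, fd) for fd in g) and all(follows(g, fd) for fd in f)


def minimal_basis(fds):
    # Condition 1: one attribute on each right side (duplicates removed).
    basis = list(dict.fromkeys(
        (frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs))
    # Condition 3: drop left-side attributes that are not needed.
    reduced = []
    for lhs, rhs in basis:
        lhs = set(lhs)
        for a in sorted(lhs):
            if rhs <= closure(lhs - {a}, basis):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    basis = list(dict.fromkeys(reduced))
    # Condition 2: drop FDs that follow from the remaining ones.
    for fd in list(basis):
        rest = [g for g in basis if g != fd]
        if fd[1] <= closure(fd[0], rest):
            basis = rest
    return basis


S = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "B")]
result = minimal_basis(S)
print(sorted("".join(sorted(l)) + "->" + "".join(r) for l, r in result))
# ['A->C', 'B->C', 'C->A', 'C->B']
print(equivalent(S, [("A", "B"), ("B", "C"), ("C", "A")]))     # True
print(equivalent(S, [("AB", "C"), ("AC", "B"), ("BC", "A")]))  # False
```

With this processing order the sketch returns a four-FD minimal basis; {A -> B, B -> C, C -> A} is a different, equally valid one.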

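Bad Design: Anomalies
Bad design example: the relation below suffers from redundancy, update anomalies, and deletion anomalies.
Title | Year | Length | Film Type | StudioName | Starring
Star Wars | 1977 | 124 | Color | Fox | Carrie Fisher
Star Wars | 1977 | 124 | Color | Fox | Mark Hamill
Star Wars | 1977 | 124 | Color | Fox | Harrison Ford
Star Wars | 1980 | 124 | Color | Fox | Billy Dee Williams
Mighty Ducks | 1991 | 104 | Color | Disney | Emilio Estevez
Wayne’s World | 1992 | 95 | Color | Paramount | Dana Carvey
Wayne’s World | 1992 | 95 | Color | Paramount | Mike Meyers
Update anomaly: updating the length 124 to 123 must be repeated in every tuple that stores it.
Deletion anomaly: deleting “Emilio Estevez” also deletes the only tuple that describes Mighty Ducks.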

Decomposing Relations: Example
R = {title, year, length, filmType, studioName, starring} is decomposed into R1 = {title, year, length, filmType, studioName} and R2 = {title, year, starring}. Check the result against the same three problems: redundancy, update anomaly, and deletion anomaly.
R1:
Title | Year | Length | Film Type | StudioName
Star Wars | 1977 | 124 | Color | Fox
Star Wars | 1980 | 124 | Color | Fox
Mighty Ducks | 1991 | 104 | Color | Disney
Wayne’s World | 1992 | 95 | Color | Paramount
R2:
Title | Year | Starring
Star Wars | 1977 | Carrie Fisher
Star Wars | 1977 | Mark Hamill
Star Wars | 1977 | Harrison Ford
Star Wars | 1980 | Billy Dee Williams
Mighty Ducks | 1991 | Emilio Estevez
Wayne’s World | 1992 | Dana Carvey
Wayne’s World | 1992 | Mike Meyers

Decomposing Relations
Decomposing a bad relation is a good way to remove its problems.
Decomposition: { A1 A2 … An } => { B1 B2 … Bm }, { C1 C2 … Ck } such that { B1 B2 … Bm } ∪ { C1 C2 … Ck } = { A1 A2 … An } and { B1 B2 … Bm } ∩ { C1 C2 … Ck } ≠ {}.
Lossless decomposition: the original relation must be recoverable from the decomposed relations by natural join.

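Lossless Decomposition – Bad Example
R1:
Title | Year | Length | Film Type | StudioName
Star Wars | 1977 | 124 | Color | Fox
Star Wars | 1980 | 124 | Color | Fox
Mighty Ducks | 1991 | 104 | Color | Disney
Wayne’s World | 1992 | 95 | Color | Paramount
R2’:
Title | Starring
Star Wars | Carrie Fisher
Star Wars | Mark Hamill
Star Wars | Harrison Ford
Star Wars | Billy Dee Williams
Mighty Ducks | Emilio Estevez
Wayne’s World | Dana Carvey
Wayne’s World | Mike Meyers
R2:
Title | Year | Starring
Star Wars | 1977 | Carrie Fisher
Star Wars | 1977 | Mark Hamill
Star Wars | 1977 | Harrison Ford
Star Wars | 1980 | Billy Dee Williams
Mighty Ducks | 1991 | Emilio Estevez
Wayne’s World | 1992 | Dana Carvey
Wayne’s World | 1992 | Mike Meyers
R ≠ R1 ⋈ R2’: joining on Title alone pairs each Star Wars star with both the 1977 and the 1980 tuple, which produces tuples that are not in R.
R = R1 ⋈ R2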
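
For a decomposition into two relations, a standard test based on FDs is: the decomposition is lossless if the common attributes functionally determine all attributes of one of the two parts. The sketch below applies that test to this example; the function name `is_lossless` is hypothetical, and the single FD used is the one given earlier for Movies (with `starring` in place of `starName`).

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition test: (r1 ∩ r2)+ must contain r1 or r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


fds = [({"title", "year"}, {"length", "filmType", "studioName"})]
R1 = {"title", "year", "length", "filmType", "studioName"}
R2 = {"title", "year", "starring"}
R2_bad = {"title", "starring"}          # R2’ on the slide

print(is_lossless(R1, R2, fds))      # True: {title, year} determines all of R1
print(is_lossless(R1, R2_bad, fds))  # False: {title} alone determines nothing else
```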

Normal Form: Conditions for Good Relation
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)

1st Normal Form
1NF: every component of a relation should be ATOMIC. A component must not be a table, a set, a list, etc.

2nd Normal Form
2NF: the relation is in 1NF, and none of the non-prime attributes of the relation is functionally dependent on a part of a candidate key.
Prime attribute: an attribute belonging to a key.
Partial dependency: a non-prime attribute depends on only a part of the prime attributes of a key.
Example: Player (Team, Number, TeamAddress, Name, Position) is in 1NF but not in 2NF.

Example - 1
Player (Team, Number, TeamAddress, Name, Position)
FD1: Team, Name -> Position
FD2: Team -> TeamAddress
Key: {Team, Name}, because {Team, Name}+ = {Team, Number, TeamAddress, Name, Position} (taking Number to be determined by Team, Name as well).
In FD2, TeamAddress (a non-prime attribute) depends on {Team}, which is a proper subset of the key, so FD2 is a 2NF violation.
The relation should be decomposed into R1(Team, Number, Name, Position) and R2(Team, TeamAddress), with R1 ⋈ R2 = R.

Example - 2
Employee | Skill | Current Work Location
Jones | Typing | 114 Main Street
Jones | Shorthand | 114 Main Street
Jones | Whittling | 114 Main Street
Roberts | Light Cleaning | 73 Industrial Way
Ellis | Alchemy | 73 Industrial Way
Ellis | Juggling | 73 Industrial Way
Harrison | Light Cleaning | 73 Industrial Way
Candidate key: {Employee, Skill}
Not in 2NF. Partial FD: Employee -> Current Work Location
The relation should be decomposed into (Employee, Skill) and (Employee, Current Work Location).
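
Partial dependencies can be found mechanically by taking the closure of every proper part of each candidate key and looking for non-prime attributes in it. The sketch below is illustrative only; the function name `partial_dependencies` and the FD representation are assumptions, and the candidate keys are supplied by hand.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(candidate_keys, fds):
    """Return (part_of_key, non_prime_attribute) pairs that violate 2NF."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):            # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) - prime):
                    found.append((part, attr))
    return found


keys = [{"Employee", "Skill"}]
fds = [(["Employee"], ["Current Work Location"])]
print(partial_dependencies(keys, fds))
# [(('Employee',), 'Current Work Location')]
```

An empty result means that no non-prime attribute depends on part of a key, which is the 2NF condition.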

3rd Normal Form
3NF: the relation is in 2NF, and every non-prime attribute of the relation is non-transitively dependent on every candidate key.
Example: Team (TeamName, Address, ManagerID, ManagerHireDate)
FD: TeamName -> Address, TeamName -> ManagerID, ManagerID -> ManagerHireDate
Transitive dependency: TeamName -> ManagerID -> ManagerHireDate
Key: {TeamName}
The relation is in 2NF but not in 3NF. It should be decomposed into (TeamName, Address, ManagerID) and (ManagerID, ManagerHireDate).

Example: 2NF but NOT 3NF
Tournament | Year | Winner | Winner Date of Birth
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open | 1999 | Bob Albertson | 28 September 1968
Des Moines Masters | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977
Candidate key: {Tournament, Year}
2NF: there is no partial dependency.
Not in 3NF. Transitive functional dependency: {Tournament, Year} -> Winner -> Winner Date of Birth
The relation should be decomposed into (Tournament, Year, Winner) and (Player, Birth date).

Boyce-Codd Normal Form (BCNF)
BCNF: for every non-trivial functional dependency X -> Y of the relation, X is a super key.
Remember: non-trivial means that Y is not a subset of X.
Remember: a superkey is any superset of a key (not necessarily a proper superset).
BCNF is slightly stronger than 3NF.
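
The BCNF condition can be checked with the closure: a non-trivial FD X -> Y is a violation when X+ does not cover the whole relation. The sketch below checks only the FDs that are given for the relation; the name `bcnf_violations` is hypothetical, and the Team relation from the 3NF slide is used as input.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the given FDs that are non-trivial and whose left side is not a super key."""
    relation = set(relation)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                   # non-trivial
        and not relation <= closure(lhs, fds)         # left side is not a super key
    ]


team = {"TeamName", "Address", "ManagerID", "ManagerHireDate"}
team_fds = [
    (("TeamName",), ("Address", "ManagerID")),
    (("ManagerID",), ("ManagerHireDate",)),
]
print(bcnf_violations(team, team_fds))
# [(('ManagerID',), ('ManagerHireDate',))]
```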

Relationship between 1NF, 2NF, 3NF and BCNF

Example: 3NF but NOT BCNF
For a relation R(A,B,C,D,E) with FDs F={A->B, BC->E, ED->A}, classify the attributes by where they appear in F:
Left only (prime) | Middle (?) | Right only (non-prime)
C, D | A, B, E | (none)
Keys: C and D appear only on left sides, so they are prime attributes and belong to every key.
Is {DC}+ = {A,B,C,D,E}? No, so add one attribute from the middle, e.g. A.
Is {ADC}+ = {A,B,C,D,E}? Yes. Likewise, we test {ACD}+, {BCD}+, {CDE}+.
Keys: {ACD, BCD, CDE}
BCNF? Check whether every left-hand side in F is a (super) key. A in A->B is not a super key, so R is not in BCNF.
3NF? There must be no transitive dependency, and no non-prime attribute may be functionally dependent on a part of a candidate key. Every attribute of R belongs to some key, so there is no non-prime attribute and R is in 3NF.
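
The key search and both normal-form tests in this example can be reproduced by brute force over attribute subsets. The sketch below is illustrative and exponential in the number of attributes; the names `candidate_keys`, `is_3nf`, and `is_bcnf` are assumptions. The 3NF test uses the common formulation "for each given FD X -> Y, X is a super key or every attribute of Y not in X is prime".

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Enumerate subsets by increasing size; keep super keys containing no smaller key."""
    relation = set(relation)
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue                      # not minimal
            if closure(attrs, fds) >= relation:
                keys.append(attrs)
    return keys


def is_bcnf(relation, fds):
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= set(relation)
               for lhs, rhs in fds)


def is_3nf(relation, fds):
    prime = set().union(*candidate_keys(relation, fds))
    return all(closure(lhs, fds) >= set(relation) or set(rhs) - set(lhs) <= prime
               for lhs, rhs in fds)


R = "ABCDE"
F = [("A", "B"), ("BC", "E"), ("ED", "A")]
print(["".join(sorted(k)) for k in candidate_keys(R, F)])  # ['ACD', 'BCD', 'CDE']
print(is_3nf(R, F))    # True
print(is_bcnf(R, F))   # False
```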

Example: 3NF but NOT BCNF
A table that shows the assignment of students to professors:
Prof. ID | Prof. SS ID | Student ID
1078 | 088-51-0074 | 31850
1078 | 088-51-0074 | 37921
1293 | 096-77-4146 | 46224
1480 | 072-21-2223 | 31850
Candidate keys: {Prof. ID, Student ID} and {Prof. SS ID, Student ID}
1NF: yes.
2NF: yes, there is no partial FD of a non-prime attribute on a candidate key.
3NF: yes, there is no transitive FD.
NOT BCNF: Prof. ID -> Prof. SS ID is a functional dependency, but Prof. ID is not a candidate key (or super key).
The relation should be decomposed into (Prof. ID, Student ID) and (Prof. ID, Prof. SS ID).

Decomposition: Three Conditions
Elimination of anomalies: update, redundancy, deletion
Lossless decomposition: the original relation can be recovered by natural join
Preservation of dependencies
A relation with two attributes is always in BCNF (why?)

BCNF Decomposition Algorithm
Input: relation R0 and set S0 of FDs
Output: R1, R2, …, Rn such that R0 = R1 ⋈ R2 ⋈ … ⋈ Rn
Process
1. If R0 is in BCNF, return R0.
2. If there is a BCNF violation X -> Y, compute X+. Let R1 = X+, and let R2 contain X together with the remaining attributes (those not in X+).
3. Project the FD set S0 onto R1 and R2 to obtain S1 and S2.
4. Repeat steps 1-3 on R1 and R2 until there is no more BCNF violation.
Example: Team (TeamName, Address, ManagerID, ManagerHireDate)
FD: TeamName -> Address, TeamName -> ManagerID, ManagerID -> ManagerHireDate
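
The algorithm can be sketched recursively. In this illustration (the names `find_violation` and `bcnf_decompose` are hypothetical), step 3 is not carried out by materializing S1 and S2; instead, the FDs that hold in a sub-relation are obtained by computing closures under the original S0 and intersecting them with the sub-relation's attributes. That makes the search exponential in the number of attributes, which is acceptable for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(relation, fds):
    """Return (X, X+ within relation) for some X that violates BCNF, or None."""
    for size in range(1, len(relation)):
        for combo in combinations(sorted(relation), size):
            x = set(combo)
            x_plus = closure(x, fds) & relation
            # Violation: X determines something new, but not the whole relation.
            if x_plus != x and x_plus != relation:
                return x, x_plus
    return None


def bcnf_decompose(relation, fds):
    relation = frozenset(relation)
    violation = find_violation(relation, fds)
    if violation is None:
        return [relation]                       # step 1: already in BCNF
    x, x_plus = violation
    r1 = x_plus                                 # step 2: R1 = X+
    r2 = (relation - x_plus) | x                # R2 = X plus the remaining attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)   # step 4


team = {"TeamName", "Address", "ManagerID", "ManagerHireDate"}
team_fds = [
    (("TeamName",), ("Address", "ManagerID")),
    (("ManagerID",), ("ManagerHireDate",)),
]
for part in bcnf_decompose(team, team_fds):
    print(sorted(part))
# ['ManagerHireDate', 'ManagerID']
# ['Address', 'ManagerID', 'TeamName']
```

For the Team example this yields (ManagerID, ManagerHireDate) and (TeamName, Address, ManagerID), the same decomposition that the 3NF slide arrives at.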