CS 222 Database Management System Spring 2010-11 Lecture 4 Database Design Theory Korra Sathya Babu Department of Computer Science NIT Rourkela.



Unit Overview
Database Design Problem: Redundancy and Anomaly
Functional Dependency: Axioms, Logical Implications of FDs, Redundant FDs, Closure, Equivalence of FDs, Extraneous Attributes, Covers
Decomposition: Rules of Decomposition, Test for Lossless Join and Dependency Preservation
Normalization: Normal Forms, Multivalued Dependency, Join Dependency, Denormalization
2/9/2016 Database Design

Modeling Movies
[E/R diagram: entity sets Movies (title, year, length, filmType), Stars (name, address) and Studios (name, address); relationships Stars-In and Owns]

Relational Database Design
Automatic mappings from E/R to relations may not produce the best possible relational design.
Suggested design strategy: Real world → E/R model → Relational schema → Better relational schema → Relational DBMS.
Database designers sometimes go directly from the real world to a relational schema, in which case the relational design can be really bad.
Many problems may arise if the design is not done carefully.

Problems Using Bad Design
Redundancy
Insertion Anomaly
Update Anomaly
Deletion Anomaly

Solutions
Decomposition (but integrity problems arise if it is not checked well)
Functional Dependency
Any more solutions?

Functional Dependency
Definition: A functional dependency (FD) has the form X → Y, where X and Y are sets of attributes of a relation R. Formally, X → Y means that whenever two tuples in R agree on all the attributes of X, they must also agree on all the attributes of Y.
Example: Movies(title, year, length, filmType, studioName, starName). FDs we can reasonably assert are:
title, year → length
title, year → filmType
title, year → studioName
Trivial dependency:
Trivial: an FD A1 A2 … An → B is trivial if B is one of the A's, e.g. title year → title.
Nontrivial: at least one of the B's is not among the A's, e.g. title year → year length.
Completely nontrivial: none of the B's is among the A's, e.g. title year → length.

Trivial Dependency
An FD A1 A2 … An → B1 B2 … Bm is trivial if the B's are a subset of the A's: {B1, B2, …, Bm} ⊆ {A1, A2, …, An}.
It is nontrivial if at least one B is not among the A's: {B1, B2, …, Bm} − {A1, A2, …, An} ≠ Ø.
It is completely nontrivial if none of the B's is among the A's: {B1, B2, …, Bm} ∩ {A1, A2, …, An} = Ø.
Trivial dependency rule: the FD A1 A2 … An → B1 B2 … Bm is equivalent to the FD A1 A2 … An → C1 C2 … Ck, where the C's are those B's that are not A's, i.e. {C1, C2, …, Ck} = {B1, B2, …, Bm} − {A1, A2, …, An}.
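The three cases and the trivial dependency rule can be sketched with plain set operations. This is a minimal illustration, not part of the original slides; the function names `classify_fd` and `strip_trivial_part` are hypothetical, and an FD is assumed to be given as two collections of attribute names.

```python
def classify_fd(lhs, rhs):
    """Return the most specific class of the FD lhs -> rhs."""
    left, right = set(lhs), set(rhs)
    if right <= left:            # every B is among the A's
        return "trivial"
    if right & left:             # some B's are among the A's, but not all
        return "nontrivial"
    return "completely nontrivial"   # no B is among the A's


def strip_trivial_part(lhs, rhs):
    """Trivial dependency rule: drop right-side attributes that occur on the left."""
    return set(lhs), set(rhs) - set(lhs)


print(classify_fd({"title", "year"}, {"title"}))
print(classify_fd({"title", "year"}, {"year", "length"}))
print(classify_fd({"title", "year"}, {"length"}))
```

Note that a completely nontrivial FD is also nontrivial; the function simply reports the narrowest class that applies.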

Armstrong Axioms
Reflexivity: if Y ⊆ X, then X → Y.
Augmentation: if X → Y, then XZ → YZ for any Z.
Transitivity: if X → Y and Y → Z, then X → Z.
Armstrong's axioms are sound and complete.
Sound: applied to a set F of FDs, they generate only FDs in F+.
Complete: applied repeatedly, they generate all FDs in F+.

More Inference Axioms
Finding the closure of an FD set may be tedious, so more rules may be derived from Armstrong's axioms. The Union Rule, Pseudotransitive Rule and Decomposition Rule are sound, but not complete on their own.
Union Rule: if X → Y and X → Z, then X → YZ.
Proof: Given X → Y and X → Z.
Augment X → Y with X and X → Z with Y: XX → XY and XY → YZ.
That is, X → XY and XY → YZ.
Hence X → YZ, by transitivity on X → XY and XY → YZ.

More Inference Axioms
Pseudotransitive Rule: if X → Y and YW → Z, then XW → Z.
Proof: Given X → Y and YW → Z.
Augment X → Y with W: XW → YW.
Hence XW → Z, by transitivity on XW → YW and YW → Z.
Decomposition Rule: if X → YZ, then X → Y and X → Z.
Proof: Given X → YZ.
By reflexivity, YZ → Y and YZ → Z.
Hence X → Y and X → Z, by transitivity.

Logical Implications of FDs
Let F be the following set of functional dependencies: {AB → CD, B → DE, C → F, E → G, A → B}. Use Armstrong's axioms to show that A → FG is logically implied by F.
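One possible derivation, offered as a sketch rather than the lecturer's own solution, uses only reflexivity, augmentation and transitivity:

```latex
\begin{align*}
 1.\;& A \to B    && \text{given} \\
 2.\;& A \to AB   && \text{augmentation of 1 with } A \\
 3.\;& AB \to CD  && \text{given} \\
 4.\;& A \to CD   && \text{transitivity, 2 and 3} \\
 5.\;& CD \to C   && \text{reflexivity} \\
 6.\;& A \to C    && \text{transitivity, 4 and 5} \\
 7.\;& C \to F    && \text{given} \\
 8.\;& A \to F    && \text{transitivity, 6 and 7} \\
 9.\;& B \to DE   && \text{given} \\
10.\;& A \to DE   && \text{transitivity, 1 and 9} \\
11.\;& DE \to E   && \text{reflexivity} \\
12.\;& A \to E    && \text{transitivity, 10 and 11} \\
13.\;& E \to G    && \text{given} \\
14.\;& A \to G    && \text{transitivity, 12 and 13} \\
15.\;& A \to AG   && \text{augmentation of 14 with } A \\
16.\;& AG \to FG  && \text{augmentation of 8 with } G \\
17.\;& A \to FG   && \text{transitivity, 15 and 16}
\end{align*}
```

Steps 15 to 17 are the union rule written out in terms of the basic axioms.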

The Satisfies Algorithm
Used to determine whether a relation R satisfies a given FD A → B.
Input: a relation R and an FD A → B.
Output: TRUE if R satisfies A → B, otherwise FALSE.
Step 1: Sort the tuples of R on the attribute(s) A (the determinant) so that tuples with equal values under A are next to each other.
Step 2: Check that tuples with equal values under A also have equal values under the attribute(s) B.
Step 3: If any two tuples of R agree on A but differ on B, the output is FALSE. Otherwise R satisfies the FD and the output is TRUE.
In short: R satisfies A → B if, for every pair of tuples t1 and t2 in R, t1.A = t2.A implies t1.B = t2.B.
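The sort-then-compare procedure can be sketched as follows. The representation is an assumption made for illustration: a relation is a list of dicts, an FD is a pair of attribute-name lists, and the sample `movies` rows are made up for the example.

```python
from itertools import groupby


def satisfies(rows, a, b):
    """Return True if the relation `rows` satisfies the FD a -> b."""
    def key_a(t):
        return tuple(t[x] for x in a)

    def key_b(t):
        return tuple(t[x] for x in b)

    # Step 1: sort on the determinant so tuples that agree on A are adjacent.
    ordered = sorted(rows, key=key_a)
    # Steps 2-3: within each group of equal A-values, all B-values must coincide.
    for _, group in groupby(ordered, key=key_a):
        if len({key_b(t) for t in group}) > 1:
            return False
    return True


# Illustrative data only.
movies = [
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Carrie Fisher"},
    {"title": "Star Wars", "year": 1977, "length": 124, "starName": "Mark Hamill"},
    {"title": "Wayne's World", "year": 1992, "length": 95, "starName": "Dana Carvey"},
]
print(satisfies(movies, ["title", "year"], ["length"]))
print(satisfies(movies, ["title", "year"], ["starName"]))
```

A check like this can only refute an FD on a given instance; it cannot prove that the FD holds for every legal instance of the schema.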

Redundant FDs
Given a set F of FDs, an FD A → B of F is said to be redundant with respect to the FDs of F if and only if A → B can be derived from the set of FDs F − {A → B}.
Eliminating redundant FDs allows us to minimize the set of FDs.
The Membership Algorithm helps to determine the redundant FDs.
Input: a set F of FDs and a particular FD of F that is being tested.
Output: whether the FD is redundant or not.

Redundant FDs: The Membership Algorithm
Assume F is a set of FDs with A → B ∈ F.
Step 1: Temporarily remove A → B from F and initialize the set of FDs G, i.e. set G = F − {A → B}. If G ≠ Ø, proceed to step 2; otherwise stop, since A → B is non-redundant.
Step 2: Initialize the set of attributes Ti (with i = 1) to the attribute(s) of A, the determinant of the FD under consideration, i.e. set T1 = {A}. The set T1 is the current Ti.
Step 3: In G, search for an FD X → Y such that all the attributes of the determinant X are elements of the current set Ti. There are two possible outcomes.
Step 3a: If such an FD is found, add the attributes of Y (the right-hand side of the FD) to Ti to form a new set Ti+1 = Ti ∪ Y, which becomes the current Ti. Check whether all the attributes of B (the right-hand side of the FD under consideration) are members of Ti+1. If so, stop: the FD A → B is redundant. If not, remove X → Y from G and repeat step 3.
Step 3b: If G = Ø, or no FD in G has all the attributes of its determinant in the current Ti, then A → B is not redundant.
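The membership test can be sketched in a few lines. As assumptions for this illustration, an FD is a pair of strings such as `("XY", "Z")` in which every character is one attribute, and the function name `is_redundant` is hypothetical.

```python
def is_redundant(fds, target):
    """Membership test: can `target` be derived from the other FDs in `fds`?"""
    lhs, rhs = target
    g = [fd for fd in fds if fd != target]   # Step 1: G = F - {A -> B}
    t = set(lhs)                             # Step 2: T1 = attributes of A
    changed = True
    while g and changed:                     # Step 3: look for applicable FDs
        changed = False
        for x, y in list(g):
            if set(x) <= t:                  # Step 3a: determinant inside T
                t |= set(y)
                g.remove((x, y))
                changed = True
                if set(rhs) <= t:
                    return True              # all of B reached: redundant
    # Step 3b: G is empty or nothing applies; for a nontrivial FD this is False.
    return set(rhs) <= t


F = [("X", "YW"), ("XW", "Z"), ("Z", "Y"), ("XY", "Z")]
print(is_redundant(F, ("XY", "Z")))
```

For this F, the determinant XY first picks up W through X → YW and then Z through XW → Z, so XY → Z is derivable from the remaining FDs.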

Redundant FDs: Exercises
1. Given the set F = {X → YW, XW → Z, Z → Y, XY → Z}, determine whether the FD XY → Z is redundant in F.
2. Eliminate redundant FDs from F = {X → Y, Y → X, Y → Z, Z → Y, X → Z, Z → X} using the Membership Algorithm.
3. Find the redundant FDs in the set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}.

Closure of FD Set
Definition: The set of all FDs implied by a given set F of FDs is called the closure of F, denoted F+. Armstrong's axioms can be applied repeatedly to infer all FDs implied by F.
Example 1: Given R = ABCD and F = {A → B, A → C, CD → A}, compute F+. Start from the attribute closures with respect to F: A+ = ABC, B+ = B, …, AB+ = ABC, AC+ = ABC, …, ABC+ = …
Example 2: Given R = XYZ and F = {XY → Z}, compute F+.
F+ = {X → X, Y → Y, Z → Z, XY → X, XY → Y, XY → XY, XY → Z, XY → XZ, XY → YZ, XY → XYZ, XZ → X, XZ → Z, XZ → XZ, YZ → Y, YZ → Z, YZ → YZ, XYZ → X, XYZ → Y, XYZ → Z, XYZ → XY, XYZ → XZ, XYZ → YZ, XYZ → XYZ}
Exercise: Consider a relation with schema R(A, B, C, D) and FDs F = {AB → C, C → D, D → A}. Compute F+.
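For a small schema, F+ can be enumerated mechanically: for every non-empty attribute set X, every non-empty subset of the attribute closure of X is a valid right-hand side. The sketch below assumes single-character attribute names and FDs written as string pairs; `closure`, `nonempty_subsets` and `fd_closure` are illustrative names. For R = XYZ and F = {XY → Z} it produces 23 dependencies with non-empty sides, including XY → XZ, XY → YZ and XY → XYZ.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonempty_subsets(attrs):
    """Yield every non-empty subset of `attrs` as a sorted string."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield "".join(combo)


def fd_closure(schema, fds):
    """List every FD X -> Y in F+ with non-empty X and Y over `schema`."""
    return [(lhs, rhs)
            for lhs in nonempty_subsets(schema)
            for rhs in nonempty_subsets(closure(lhs, fds))]


f_plus = fd_closure("XYZ", [("XY", "Z")])
print(len(f_plus))
```

The size of F+ grows exponentially with the number of attributes, which is why attribute closures are normally used instead of materializing F+.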

Attribute Closure
The closure of an attribute set A, denoted {A}+, is the set of all attributes of the relation that A determines. It is found by applying the inference axioms to the given FD set.
Exercise: Given the FD set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}, find the closure of every single attribute.
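Attribute closure is usually computed with a fixpoint loop: keep adding the right-hand side of every FD whose determinant is already covered until nothing changes. The sketch below assumes single-character attributes and FDs as string pairs, and applies the loop to the FD set of the exercise.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs once its whole determinant has been reached.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("X", "YZ"), ("ZW", "P"), ("P", "Z"),
     ("W", "XPQ"), ("XYQ", "YW"), ("WQ", "YZ")]

for attr in "XYZWPQ":
    print(attr, "+ =", "".join(sorted(closure(attr, F))))
```

Here W alone reaches every attribute (W → XPQ, then X → YZ), whereas Y, Z and Q determine only themselves.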

Candidate Key
A candidate key is a minimal set of attributes that determines all the other attributes of a relation.
A key has two properties: uniqueness and minimality.
A superkey is a set of attributes that has the uniqueness property but is not necessarily minimal.
If a relation has multiple keys, specify one of them to be the primary key.
Convention: in a relational schema, underline the attributes of the primary key.
If a key has only one attribute A, we say that A, rather than {A}, is a key.

Candidate Key: Exercises
1. Given a relation R(ABCD) and the FD set F = {AB → C, B → D, D → B}, find the candidate keys of the relation.
2. Given a relation R(XYZWP) and the FD set F = {Y → Z, Z → Y, Z → W, Y → P}, find the number of candidate keys.
3. Consider a schema R = {S, T, V, C, P, D} and F = {S → T, V → SC, SD → P}. Find the keys for R.
4. Given a relation R(XYZWP) and the FD set F = {X → Z, YZ → W, Z → Y}, find the number of candidate keys.
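Candidate keys of a small schema can be found by brute force: test attribute sets in order of increasing size and keep those whose closure is the whole schema, skipping supersets of keys already found. This is only a sketch under the assumption of single-character attribute names; `candidate_keys` is an illustrative name, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure covers `schema`."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue                      # contains a smaller key: not minimal
            if closure(cand, fds) == set(attrs):
                keys.append(cand)
    return keys


print(candidate_keys("STVCPD", [("S", "T"), ("V", "SC"), ("SD", "P")]))
print(candidate_keys("XYZWP", [("Y", "Z"), ("Z", "Y"), ("Z", "W"), ("Y", "P")]))
```

A useful shortcut when working by hand: any attribute that never appears on a right-hand side (V and D in the first call, X in the second) must belong to every key.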

Equivalence of Sets of FDs
Consider sets of FDs defined over the same relational schema.
A set of FDs S follows from a set of FDs T if every relation instance that satisfies all the FDs in T also satisfies all the FDs in S. For example, A → C follows from T = {A → B, B → C}.
Two sets of FDs S and T are equivalent if and only if S follows from T and T follows from S. For example, S = {A → B, B → C, A → C} and T = {A → B, B → C} are equivalent.
These notions are useful in deriving new FDs from a given set of FDs.

Equivalence of Sets of FDs
Two sets of FDs F and G defined over the same relation schema are equivalent if every FD in F can be inferred from G and every FD in G can be inferred from F.
F covers G if every FD in G can be inferred from F (i.e. if G+ is a subset of F+).
F and G are equivalent if F covers G and G covers F.
If G covers F and no proper subset H of G exists such that H+ = G+, we say that G is a non-redundant cover of F.
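Covering can be tested without computing F+ or G+ explicitly: F covers G exactly when, for every X → Y in G, Y lies inside the closure of X under F. The sketch below assumes single-character attributes and FDs as string pairs; `covers` and `equivalent` are illustrative names. It checks the sets S = {A → B, B → C, A → C} and T = {A → B, B → C}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g imply exactly the same FDs."""
    return covers(f, g) and covers(g, f)


S = [("A", "B"), ("B", "C"), ("A", "C")]
T = [("A", "B"), ("B", "C")]
print(equivalent(S, T))
```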

The Non-Redundant Cover Algorithm
Step 1: Initialize G to F, i.e. set G = F.
Step 2: Test every FD of G for redundancy using the Membership Algorithm, removing each redundant FD from G, until there are no more FDs of G to be tested.
Step 3: The set G is a non-redundant cover of F.
Note: Given a set F, there may be more than one non-redundant cover, since the result depends on the order in which the FDs are tested.
Problem: Find a non-redundant cover G for the set F = {X → YZ, ZW → P, P → Z, W → XPQ, XYQ → YW, WQ → YZ}.
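The three steps can be sketched as a single pass that drops each FD as soon as the remaining ones imply it. The sketch assumes single-character attributes, FDs as string pairs and no duplicate FDs in the input; `non_redundant_cover` is an illustrative name, and the redundancy test is done with an attribute closure, which is equivalent to the membership test.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def non_redundant_cover(fds):
    """Drop, in input order, every FD implied by the FDs still in G."""
    g = list(fds)                                  # Step 1: G = F
    for fd in list(fds):                           # Step 2: test each FD once
        rest = [other for other in g if other != fd]
        lhs, rhs = fd
        if set(rhs) <= closure(lhs, rest):
            g = rest                               # redundant: remove it from G
    return g                                       # Step 3


F = [("X", "YZ"), ("ZW", "P"), ("P", "Z"),
     ("W", "XPQ"), ("XYQ", "YW"), ("WQ", "YZ")]
print(non_redundant_cover(F))
```

Testing the FDs in the order written, ZW → P and WQ → YZ are removed and the other four remain; a different testing order may produce a different non-redundant cover.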

Extraneous Attribute
Definition: An attribute A is extraneous in X → Y if A can be removed from the left side or the right side of X → Y without changing the closure of F.
Example: Let G = {A → BC, B → C, AB → D}. Attribute C is extraneous in the right side of A → BC, and attribute B is extraneous in the left side of AB → D.
The set G = {A → BC, B → C, AB → D} is neither left-reduced nor right-reduced.
G1 = {A → BC, B → C, A → D} is left-reduced but not right-reduced, while G2 = {A → B, B → C, AB → D} is right-reduced but not left-reduced.
The set G3 = {A → B, B → C, A → D} is both left-reduced and right-reduced, hence reduced.
Extraneous attributes need to be eliminated.

The Left-Reduced Algorithm
Step 1: Initialize a set of FDs G to F, i.e. set G = F.
Step 2: For every FD A1, A2, …, Ai, …, An → Y in G, do step 3 until there are no more FDs in G to which this step can be applied. The algorithm stops when step 3 has been executed for all FDs of G.
Step 3: For each attribute Ai in the determinant of the FD selected in the previous step, do step 4 until all attributes have been tested. After testing all the attributes of a particular FD, repeat step 2.
Step 4: Test whether all the attributes of Y (the right-hand side of the FD under consideration) are elements of the closure of A1, …, Ai−1, Ai+1, …, An (the determinant with attribute Ai removed) with respect to the FDs of G. If so, remove Ai from the determinant of the FD being tested, because Ai is an extraneous left attribute. If not, Ai is not an extraneous left attribute and should remain in the determinant of the FD under consideration.
When the algorithm finishes, the set G contains a left-reduced cover of F.
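Steps 1 to 4 can be sketched as a double loop over FDs and determinant attributes. As assumptions for this illustration, attributes are single characters, FDs are string pairs and `left_reduce` is a hypothetical name. The example set G = {A → BC, B → C, AB → D} is the one used to illustrate extraneous attributes.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def left_reduce(fds):
    """Remove extraneous attributes from the determinant of every FD."""
    g = [(set(lhs), set(rhs)) for lhs, rhs in fds]        # Step 1: G = F
    for i, (lhs, rhs) in enumerate(g):                    # Step 2: each FD
        for a in sorted(lhs):                             # Step 3: each Ai
            reduced = g[i][0] - {a}
            # Step 4: Ai is extraneous if Y is still reachable without it.
            if reduced and rhs <= closure(reduced, g):
                g[i] = (reduced, rhs)
    return [("".join(sorted(l)), "".join(sorted(r))) for l, r in g]


G = [("A", "BC"), ("B", "C"), ("AB", "D")]
print(left_reduce(G))
```

The closure is always taken with respect to the current G, which still contains the FD being tested; this is what allows B to be dropped from AB → D once A → B is known.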

Extraneous Attribute: Exercises
1. Remove any extraneous left attributes from F = {A → BC, E → C, D → AEF, ABF → BD}.
2. Reduce the set F = {X → Z, XY → WP, XY → ZWQ, XZ → R} by removing extraneous left attributes.
3. Reduce the set F = {X → WY, XW → Z, Z → Y, XY → Z} by removing extraneous left attributes.
Tip: There is no need to consider FDs whose determinant consists of a single attribute.

Canonical Cover
A set of FDs F is canonical if every FD in F is of the form X → A and F is left-reduced and non-redundant.
Since a canonical set of FDs is non-redundant and every FD has a single attribute on the right side, it is right-reduced. Since it is also left-reduced, it is reduced.
Example: The set F = {A → B, A → C, A → D, A → E, BI → J} is a canonical cover for G = {A → BCE, AB → DE, BI → J}.
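One common way to obtain a canonical cover is to split every right-hand side into single attributes, left-reduce the result, and then drop redundant FDs. The sketch below follows that recipe under the assumption of single-character attributes and FDs as string pairs; `canonical_cover` is an illustrative name, and the cover it returns depends on the order in which FDs are tested.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Return one canonical cover of `fds` as a set of (lhs, rhs) strings."""
    # 1. Single attribute on every right-hand side.
    g = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # 2. Left-reduce: drop extraneous determinant attributes.
    for i, (lhs, rhs) in enumerate(g):
        for a in sorted(lhs):
            reduced = g[i][0] - {a}
            if reduced and rhs <= closure(reduced, g):
                g[i] = (reduced, rhs)
    g = list(dict.fromkeys(g))        # remove duplicates, keep order
    # 3. Drop FDs implied by the remaining ones.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return {("".join(sorted(l)), "".join(sorted(r))) for l, r in g}


G = [("A", "BCE"), ("AB", "DE"), ("BI", "J")]
print(sorted(canonical_cover(G)))
```

For the example G, the attribute B is extraneous in AB → D and AB → E, and the duplicate A → E produced by the reduction is discarded before the redundancy pass.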

Minimal Cover
A set of FDs F is minimal if it satisfies the following conditions:
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
3. We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
Every set of FDs has an equivalent minimal set.
There can be several equivalent minimal sets.
There is no simple algorithm for computing a minimal set of FDs that is equivalent to a set F of FDs.
To synthesize a set of relations, we assume that we start with a set of dependencies that is a minimal set.

Optimal Cover
We have been measuring our covers in terms of the number of FDs they contain. We can also measure them by the number of attribute symbols required to express them.
Example: {AB → C, CD → E, AC → IJ} has size 10 under this measure.
Definition: A set of FDs F is optimal if there is no equivalent set of FDs with fewer attribute symbols than F.
The set F = {EC → D, AB → E, E → AB} is an optimal cover for G = {ABC → D, AB → E, E → AB}. Notice that G is reduced and minimum, but not optimal.
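The attribute-symbol measure is just a count of the attribute occurrences on both sides of every FD. A minimal sketch, assuming single-character attributes and FDs as string pairs (`cover_size` is an illustrative name):

```python
def cover_size(fds):
    """Count the attribute symbols needed to write down `fds`."""
    return sum(len(lhs) + len(rhs) for lhs, rhs in fds)


example = [("AB", "C"), ("CD", "E"), ("AC", "IJ")]
F = [("EC", "D"), ("AB", "E"), ("E", "AB")]
G = [("ABC", "D"), ("AB", "E"), ("E", "AB")]

print(cover_size(example), cover_size(F), cover_size(G))
```

Under this measure F needs one symbol fewer than G, which is why G cannot be optimal even though both sets contain three FDs.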

Problems
Find the canonical cover of the following FD sets:
1. F = {X → Z, XY → WP, XY → ZWQ, XZ → R}
2. F = {X → YW, XW → Z, Z → Y, XY → Z}
3. F = {A → BC, E → C, D → AEF, ABF → BD}
4. G = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → CC}
Find the minimal cover of the following FD sets:
1. F = {AB → C, A → BC, B → C, A → B}
2. F = {A → B, B → A, B → C, A → C, C → A}
3. F = {AB → C, A → B, B → A}
4. G = {ABCD → E, E → D, A → B, AC → D}
5. F = {A → BC, B → AC, C → AB}
6. G = {ABD → C, C → BE, AD → BF, B → F}

Seminar change?
Compound Functional Dependency (CFD) and Annular Covers

Summary
Good design needs a strategy.
Armstrong's axioms are sound and complete.
FDs are constraints.
There may be a number of equivalent FD sets.
FD sets may be minimized by checking which FDs are covered by the others.