Database Principles Relational Database Design II.

Design Objective: Turn bad tables into good.
– Create a set of tables, each all about one thing.
– Do so without loss of information.
– The new tables are related in such a way that joining them recreates exactly the information found in the original tables.

It Is All About Information: How is information captured within a table? In the Suppliers table above, the "information" is that:
– every supplier has a unique ID
– every supplier has a unique name
– every supplier has a unique location

Table Information and Enterprise Rules: The rules
– every supplier has a unique ID
– every supplier has a unique name
– every supplier has a unique location
are called Enterprise Rules (ER). Enterprise Rules come from
– domain knowledge,
– how the business or organization is run,
– what the "experts" know about the business.

Example: Instead of "every supplier has a unique location", suppose we are told that a supplier can have several locations.
pk = {Sno}
This is not permitted: each row-column intersection should contain a single value.
pk = {Sno}

Example (cont.): Suppose we are told that a supplier can have several locations. A better solution:
pk = {Sno}    pk = {Sno, Location}
Point to be made: Different Enterprise Rules result in different table configurations.

How to Capture Enterprise Rules: A functional dependency (FD) is a functional relationship among the attributes of a table whose definition may vary over time. This is the difference between a mathematical function and a functional dependency: the latter may have its definition change over time.
Function definition: for every a ∈ A there is a unique b ∈ B such that f(a) = b.

Example of Functional Dependencies: Consider the following: both name_of and location_of are functional dependencies, but they need not both behave like fixed mathematical functions, because the definition of one of them may change over time.
– Enterprise Rule: Every supplier has a unique name. Change over time: not likely.
– Enterprise Rule: Every supplier has a unique location. Change over time: possibly.

Functional Dependencies for Capturing Enterprise Rules: Write down all the functional dependencies in the following table:
borrowerid → b_name
borrowerid → b_addr
borrowerid → b_status
borrowerid → loan_limit
b_status → loan_limit
If you capture the table information in FDs, you have also captured the Enterprise Rules of the business.

Many Enterprise Rules of a business are captured by the functional dependencies found among the columns of the tables that hold the data of the business. As a database designer, if you capture the FDs of the various tables of the database, you also understand the rules by which the organization goes about its business.

Why do we need FDs? Because we must never lose any information when turning bad tables into good ones, we must know before we start what information we have. Relational database design starts by listing all FDs of all existing tables.

FD Notation: Suppose R = {A, B, C, D, E, F} is a table schema. Further suppose X ⊆ R and that there is a functional dependency from X to D. We write this:
X → D

Different Ways of Describing FDs: If R is a table and A, B ∈ R, then the following are equivalent:
– A → B
– B depends on A
– A determines B
– B is determined by A
– if you know (the value of) A then you know (the value of) B

Example: Consider the table above.
Claim: StudentID → DOB
Alternatively: if you know the value of StudentID, then you can determine exactly the DOB.
Justification: there is an Enterprise Rule that says "everyone has a unique DOB" (each student has exactly one date of birth, not that no two students share one).

Example: What other FDs exist in this table?
StudentID → DOB, StudentID → SSN, StudentID → FName, StudentID → LName, StudentID → Address
SSN → StudentID, SSN → FName, SSN → LName, SSN → Address, SSN → DOB
What about: StudentID, Address → DOB?
Where do all these FDs come from? Answer: Enterprise Rules.

How Does Knowing the FDs Tell Us If a Table Is Good? Remember, a table is "good" if it is about one thing. A table is always "about" whatever its key identifies. Therefore, a table is good if it only includes information about its key. An FD with a table key on the left-hand side tells us information about the key and so keeps the table "good". All other FDs are "bad".

Good FDs and Good Tables:
Def'n: A functional dependency is good in a table if it is of the form table key → some attribute.
pk = {Sno}
FDs: Sno → Sname, Sno → Location
Hence the only FDs are "good", and consequently the table is also "good".
NOTE: The fact that in the current version of the table no two suppliers have the same name does not mean that the business has an Enterprise Rule saying it must always be so. So we can't say Sname → Location.

Something to Remember: The previous example points out an important fact. We can't look at the current rows in a table, which reflect today's reality, to decide which FDs exist or don't exist. FDs reflect Enterprise Rules, and they come from knowing about the business over the long term, not just from what happens to hold true today.
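To make the point concrete, here is a minimal Python sketch; the function name `fd_holds` and the sample rows are hypothetical, not part of the slides. It checks a table instance against a proposed FD. A `False` result refutes the FD, while a `True` result only says that today's rows happen to satisfy it, which is exactly why it cannot establish an Enterprise Rule.

```python
from typing import Iterable, Mapping, Sequence

def fd_holds(rows: Iterable[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if two rows agree on lhs but differ on rhs.

    True only means the current rows do not violate lhs -> rhs;
    it does not prove that an Enterprise Rule exists.
    """
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same lhs values, different rhs values
    return True

# Hypothetical current contents of the Suppliers table.
suppliers = [
    {"Sno": "S1", "Sname": "Acme", "Location": "Boston"},
    {"Sno": "S2", "Sname": "Bolt", "Location": "Denver"},
]
print(fd_holds(suppliers, ["Sname"], ["Location"]))  # True, for today's rows only

# One new (hypothetical) supplier that reuses a name is enough to refute it.
suppliers.append({"Sno": "S3", "Sname": "Acme", "Location": "Chicago"})
print(fd_holds(suppliers, ["Sname"], ["Location"]))  # False
```

The asymmetry is the design point: an instance can disprove an FD but never prove one, so FDs still have to come from the Enterprise Rules.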

Why is this a good thing? Because looking for FDs is a design-phase activity: we don't need to have the tables already built. This means we don't need to build the tables only to find out later that they are bad; we can avoid bad tables from the get-go.

What Is Our Basic Job? To find out if a table is "good", we must first find all FDs in the table and decide which ones are "good" and which ones are not. To know which FDs are "good", we need to know all the keys of the table. A key (super key) is any set of columns that determines every column of the table. A key that is as small as it can be, meaning no column can be removed from it without losing that property, is a candidate key; a super key that is not a candidate key has more columns than it needs. Every super key contains a candidate key.
Basic Job: Find all candidate keys.

Reasoning Rule #1: Composition/Decomposition: X → A, B is equivalent to X → A and X → B.
– X → A, B implies X → A and X → B (Decomposition)
– X → A and X → B implies X → A, B (Composition)
The argument:
– X → A, B means X determines both A and B. This is true either as a pair or individually, so we can conclude X → A and X → B.
– If X → A and X → B, then knowing X means you know both A and B, either individually or as a pair, so X → A, B.
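Rule #1 is mechanical enough to write down directly. A minimal Python sketch, assuming an FD is represented as a pair `(lhs, rhs)` of frozensets of column names; this representation and the helper names `decompose` and `compose` are illustrative, not from the slides:

```python
FD = tuple[frozenset[str], frozenset[str]]

def decompose(f: FD) -> list[FD]:
    """X -> A, B  gives  X -> A  and  X -> B (one FD per right-hand column)."""
    lhs, rhs = f
    return [(lhs, frozenset({a})) for a in sorted(rhs)]

def compose(fds: list[FD]) -> dict[frozenset[str], frozenset[str]]:
    """X -> A  and  X -> B  give  X -> A, B (merge right-hand sides per left-hand side)."""
    merged: dict[frozenset[str], frozenset[str]] = {}
    for lhs, rhs in fds:
        merged[lhs] = merged.get(lhs, frozenset()) | rhs
    return merged

x = frozenset({"borrowerid"})
parts = decompose((x, frozenset({"b_name", "b_addr"})))
for lhs, rhs in parts:
    print(sorted(lhs), "->", sorted(rhs))
# ['borrowerid'] -> ['b_addr']
# ['borrowerid'] -> ['b_name']
print(compose(parts) == {x: frozenset({"b_name", "b_addr"})})  # True
```

Decomposing and then composing returns the original FD, which is the equivalence the rule states.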

Reasoning Rule #2: Identity: X → X
The argument:
– If you know X then you know X.

Reasoning Rule #3: Transitivity: If X → Y and Y → Z then X → Z
The argument:
– If you know X then you know Y.
– But if you know Y you know Z.
– Hence if you know X you know Z.
Note: This is nothing more than function composition applied to FDs: if f : A → B and g : B → C then g ∘ f : A → C.

Reasoning Rule #4: Trivial: X → ∅
The argument:
– This says that X determines the empty set.
– In other words, if you know X there is nothing else you need or want to know.

Reasoning Rule #5: Augmentation: If X → Y then X, A → Y
The argument:
– This says that if X determines Y, then including extra columns in X does not alter this.

Reasoning Rule #6: Augmentation+: If X → Y and A → B then X, A → Y, B
The argument:
– This says that if X determines Y and A determines B, then X together with A determine Y together with B.

The 6 Reasoning Rules for FDs:
– Composition/Decomposition: X → A, B if and only if X → A and X → B
– Identity: X → X
– Transitivity: If X → Y and Y → Z then X → Z
– Trivial: X → ∅
– Augmentation: If X → Y then X, A → Y
– Augmentation+: If X → Y and A → B then X, A → Y, B
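The six rules can be applied mechanically. A common way to do so is the standard attribute-closure algorithm (not named on these slides): start from X and keep adding the right-hand side of any FD whose left-hand side is already known. A minimal Python sketch; `fd` and `closure` are illustrative names, and the FDs are the borrower FDs from earlier:

```python
from typing import Iterable

FD = tuple[frozenset[str], frozenset[str]]

def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD lhs -> rhs from column names."""
    return frozenset(lhs), frozenset(rhs)

def closure(x: Iterable[str], fds: list[FD]) -> frozenset[str]:
    """Every column that X determines, derived with the reasoning rules."""
    known = set(x)  # Identity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # X -> known and lhs is a subset of known, so X -> lhs (Decomposition);
            # with lhs -> rhs, Transitivity gives X -> rhs; Composition adds it.
            if lhs <= known and not rhs <= known:
                known |= rhs
                changed = True
    return frozenset(known)

borrower_fds = [
    fd(["borrowerid"], ["b_name", "b_addr", "b_status"]),
    fd(["b_status"], ["loan_limit"]),
]
print(sorted(closure(["borrowerid"], borrower_fds)))
# ['b_addr', 'b_name', 'b_status', 'borrowerid', 'loan_limit']
```

Transitivity shows up as borrowerid → loan_limit, which is not among the FDs passed in.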

Be Careful: Notice, for example, that you can't reason Augmentation backwards. You can't say:
If X, A → Y then X → Y
A simple counter-example: CourseNumber, SectionNumber → ProfessorName holds, but neither SectionNumber → ProfessorName nor CourseNumber → ProfessorName is true.
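A single legal instance is enough to show that backward Augmentation fails. The sketch below uses made-up course data (hypothetical, chosen only to respect the Enterprise Rule that each section of a course has one professor) and a small instance checker `fd_holds`, also introduced here only for illustration:

```python
from typing import Mapping, Sequence

def fd_holds(rows: Sequence[Mapping[str, object]],
             lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs but disagree on rhs."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(tuple(row[a] for a in lhs), value) != value:
            return False
    return True

# Hypothetical sections: course and section number together identify the professor.
sections = [
    {"CourseNumber": "CS101", "SectionNumber": 1, "ProfessorName": "Smith"},
    {"CourseNumber": "CS101", "SectionNumber": 2, "ProfessorName": "Jones"},
    {"CourseNumber": "CS202", "SectionNumber": 1, "ProfessorName": "Lee"},
]
print(fd_holds(sections, ["CourseNumber", "SectionNumber"], ["ProfessorName"]))  # True
print(fd_holds(sections, ["SectionNumber"], ["ProfessorName"]))  # False: section 1 has Smith and Lee
print(fd_holds(sections, ["CourseNumber"], ["ProfessorName"]))   # False: CS101 has Smith and Jones
```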

Exercise: Consider the table above. The following sentences describe the data in the table: A student, sid, whose name is sn, enrolls in section sec of course cid in semester sem. The course description is desc. The student gets a grade of gr. The course was held in room rn of the building bg. The capacity of the room is cap. The timetable slot for the course was tt.

Exercise: To find FDs we need to focus on Enterprise Rules:
StudID → StudName
CrsID → CrsDesc
Bldg, RoomNum → RoomCap
CrsID, SecID, Semester → TTSlot
CrsID, SecID, Semester → Bldg
CrsID, SecID, Semester → RoomNum
StudID, CrsID, SecID, Semester → Grade
Bldg, RoomNum, TTSlot, Semester → CrsID, SecID
Derived: CrsID, SecID, Semester → RoomCap, TTSlot, Bldg
Reason: Composition + Transitivity
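The derived FD can be checked mechanically with attribute closure (a standard algorithm, not from the slides): compute everything CrsID, SecID, Semester determines under the listed FDs. A minimal Python sketch; `closure` and the list-of-set-pairs encoding are illustrative:

```python
def closure(x, fds):
    """Columns determined by x; fds is a list of (lhs, rhs) pairs of column-name sets."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

course_fds = [
    ({"StudID"}, {"StudName"}),
    ({"CrsID"}, {"CrsDesc"}),
    ({"Bldg", "RoomNum"}, {"RoomCap"}),
    # The three CrsID, SecID, Semester FDs, combined by Composition.
    ({"CrsID", "SecID", "Semester"}, {"TTSlot", "Bldg", "RoomNum"}),
    ({"StudID", "CrsID", "SecID", "Semester"}, {"Grade"}),
    ({"Bldg", "RoomNum", "TTSlot", "Semester"}, {"CrsID", "SecID"}),
]
offering = closure({"CrsID", "SecID", "Semester"}, course_fds)
print({"RoomCap", "TTSlot", "Bldg"} <= offering)  # True: the derived FD holds
print("Grade" in offering)                        # False: Grade also needs StudID
```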

Exercise: Find the FDs in this table:
borrowerid → b_name, b_addr, b_status, loan_limit (good)
b_status → loan_limit (bad)
b_name, b_addr → borrowerid (?) (good)
NOTE: The last FD, if it is an FD, shows us an important fact: if a set of columns determines a key, then that set is a key too.
Reason: If X is a key then X → R. If Y → X then, by transitivity, Y → R, and so Y is a key too.

Finding Candidate Keys: To know whether an FD is good or bad, and so whether it belongs in the table, we need to know whether the set of columns on its left-hand side is a key (contains a candidate key). To find all keys we must first find all candidate keys, since every key contains a candidate key.

Relationship Between FDs and Keys: Can we recognize a set of columns as a key by looking at an FD? Recall that a key is a set of columns whose values uniquely determine the remaining values in a row. Another way of putting this is that X → R characterizes a key:
– if X is a key to R and r1 and r2 are rows of R, then r1[X] = r2[X] implies r1 = r2;
– if you know X you know everything (R): X → R.

Exercise: Show that X is a key to R if and only if X → (R \ X), where R = X ∪ (R \ X).
(⇒) If X is a key to R, we know X → R by definition, so X → X, (R \ X). From the Decomposition Rule we can say X → X and X → (R \ X). In other words, X → (R \ X).
(⇐) Now suppose X → (R \ X). We already know X → X by the Identity Rule. By the Composition Rule we can combine the last two FDs: X → X, (R \ X) = R. So X is a key.
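The characterization X → (R \ X) turns directly into a key test once we can compute everything X determines. A minimal Python sketch using attribute closure (a standard algorithm, not named on the slides); `closure` and `is_key` are illustrative names. The borrower FDs come from the earlier exercise, and b_name, b_addr → borrowerid is included only as the slide's tentative "(?)" FD, so treat that line as an assumption:

```python
def closure(x, fds):
    """Columns determined by x; fds is a list of (lhs, rhs) pairs of column-name sets."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= known and not rhs <= known:
                known |= rhs
                changed = True
    return known

def is_key(x, schema, fds):
    """X is a key to R if and only if X determines every column of R not in X."""
    return (set(schema) - set(x)) <= closure(x, fds)

R = {"borrowerid", "b_name", "b_addr", "b_status", "loan_limit"}
borrower_fds = [
    ({"borrowerid"}, {"b_name", "b_addr", "b_status", "loan_limit"}),
    ({"b_status"}, {"loan_limit"}),
    ({"b_name", "b_addr"}, {"borrowerid"}),  # assumption: the slide's "(?)" FD
]
print(is_key({"borrowerid"}, R, borrower_fds))        # True
print(is_key({"b_status"}, R, borrower_fds))          # False
print(is_key({"b_name", "b_addr"}, R, borrower_fds))  # True: it determines the key borrowerid
```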

Find Candidate Keys: Consider a table R with columns A, B, C, D, E, F. Let's assume that the FDs we know about are:
– A, B → C -- FD 1
– D, E → B -- FD 2
– F → D -- FD 3
– B, E → F -- FD 4
– D → A -- FD 5

Find Candidate Keys (2):
FDs: A, B → C -- FD 1; D, E → B -- FD 2; F → D -- FD 3; B, E → F -- FD 4; D → A -- FD 5
We are looking for something like X → R \ X where we can't make X any smaller. This means an FD with all six columns, on either the left- or the right-hand side.
(i) D → A -- FD 5
(ii) D, B → A, B -- (i), Identity(B), Aug+
(iii) D, B → C -- (ii), FD 1, Tran
(v) D, E → B, D -- FD 2, Identity(D), Aug+
(vi) D, E → C -- (v), (iii), Tran
(viii) D, E → B, E -- FD 2, Identity(E), Aug+
(ix) D, E → F -- (viii), FD 4, Tran
(x) D, E → A -- FD 5, Aug
(xi) D, E → A, B, C, F -- (x), FD 2, (vi), (ix), Comp
So {D, E} is a key, since it determines all other columns. It is a candidate key if we can't remove either D or E. If we could, then it should be possible to prove either D → E or E → D using only the original 5 FDs.
– D → E is impossible, since nothing determines E (E is missing from every right-hand side).
– E → D is also impossible: E appears on a left-hand side only together with another column (FD 2, FD 4), so E alone determines nothing. The best we can do is B, E → F → D.
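The hand derivation can be cross-checked with attribute closure (a standard algorithm, not part of the slides; `closure` and the string encoding of column sets are illustrative): {D, E} reaches all six columns, while D and E on their own do not.

```python
def closure(x: str, fds: list[tuple[str, str]]) -> set[str]:
    """Columns determined by x; each FD is a (lhs, rhs) pair of column-letter strings."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

R = set("ABCDEF")
fds = [("AB", "C"), ("DE", "B"), ("F", "D"), ("BE", "F"), ("D", "A")]  # FD 1 .. FD 5

print(closure("DE", fds) == R)    # True: {D, E} is a key
print(sorted(closure("D", fds)))  # ['A', 'D']
print(sorted(closure("E", fds)))  # ['E']
```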

Finding Candidate Keys (3):
Conclusion: {D, E} is a candidate key to the table R.
Are there any other candidate keys? You must always ask, and try to answer, this question.
Lemma: If X is a key to R and Y → X, then Y is a key to R.
Proof: Y → X and X → R, so Y → R. Hence Y is a key to R.
So can we solve the following query? ? → D, E

Finding Candidate Keys (4):
FDs: A, B → C -- FD 1; D, E → B -- FD 2; F → D -- FD 3; B, E → F -- FD 4; D → A -- FD 5
Solve: ? → D, E
F → D -- FD 3
(xii) F, E → D, E -- FD 3, Identity(E), Aug+
So {F, E} is a key to R. It is a CK if it can't be made any smaller. Making it "smaller" means either F → E or E → F.
– F → E is impossible since, once again, E does not appear on the right-hand side of any original FD.
– E → F is impossible too: B, E → F is the best we can do.
{F, E} is another CK.

Finding Candidate Keys (5):
FDs: A, B → C -- FD 1; D, E → B -- FD 2; F → D -- FD 3; B, E → F -- FD 4; D → A -- FD 5
Solve: ? → F, E
B, E → F -- FD 4
(xiii) B, E → F, E -- FD 4, Identity(E), Aug+
So {B, E} is a key to R. It is a CK if it can't be made any smaller. Making it "smaller" means either B → E or E → B.
– B → E is impossible since, once again, E does not appear on the right-hand side of any original FD.
– E → B is impossible too: D, E → B is the best we can do.
{B, E} is another CK.

Observations: The candidate keys are {B, E}, {D, E} and {F, E}.
If an attribute (E in the last example) does not appear on the right-hand side of any basic FD, then it belongs to every candidate key.
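For small tables the search can also be done by brute force: test column sets from smallest to largest and keep those that determine everything and contain no smaller key. This is a sketch of an exhaustive search (exponential in the number of columns, so only suitable for classroom-sized schemas), not the slides' method; `closure` and `candidate_keys` are illustrative names. It also checks the observation about columns missing from every right-hand side.

```python
from itertools import combinations

def closure(x, fds):
    """Columns determined by x; each FD is a (lhs, rhs) pair of column-letter strings."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

def candidate_keys(schema, fds):
    """All minimal keys, found by testing column sets from smallest to largest."""
    cks = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if any(ck <= x for ck in cks):
                continue  # already contains a smaller key, so it cannot be minimal
            if closure(x, fds) == set(schema):
                cks.append(x)
    return cks

fds = [("AB", "C"), ("DE", "B"), ("F", "D"), ("BE", "F"), ("D", "A")]
print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", fds)])  # ['BE', 'DE', 'EF']

# Columns on no right-hand side must be in every candidate key.
print(set("ABCDEF") - set("".join(rhs for _, rhs in fds)))  # {'E'}
```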

Find Candidate Keys Exercise: Consider a table R with columns A, B, C, D, E, F. Let's assume that the FDs we know about are:
– D → C, A -- FD 1
– B → F -- FD 2
– F → C, E -- FD 3
– B → D -- FD 4

Answer:
FDs: D → C, A -- FD 1; B → F -- FD 2; F → C, E -- FD 3; B → D -- FD 4
OBS: B belongs to every CK, since B does not appear on any right-hand side.
B → F -- FD 2
B → D -- FD 4
(i) B → C, A -- FD 4, FD 1, transitivity
(ii) F → E -- FD 3, decomposition
(iii) B → E -- FD 2, (ii), transitivity
B → A, C, D, E, F -- (i), FD 4, (iii), FD 2, composition
The above shows {B} is a key. Since it is a singleton, it is a candidate key. Every candidate key must contain B, and any set strictly larger than {B} contains the key {B} and so is not minimal. Hence {B} is the only candidate key.
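A quick mechanical check of this answer, as a sketch using attribute closure (a standard algorithm, not from the slides; `closure` and the string encoding are illustrative):

```python
def closure(x, fds):
    """Columns determined by x; each FD is a (lhs, rhs) pair of column-letter strings."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

R = set("ABCDEF")
fds = [("D", "CA"), ("B", "F"), ("F", "CE"), ("B", "D")]  # FD 1 .. FD 4

print(closure("B", fds) == R)                     # True: {B} is a key
print(R - set("".join(rhs for _, rhs in fds)))    # {'B'}: B is on no right-hand side
```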

Find Candidate Keys Exercise: Consider a table R with columns A, B, C, D, E, F. Let's assume that the FDs we know about are:
– B, E → C -- FD 1
– A, F → B -- FD 2
– C → A, D -- FD 3
– B → E -- FD 4

Answer:
FDs: B, E → C -- FD 1; A, F → B -- FD 2; C → A, D -- FD 3; B → E -- FD 4
OBS: F belongs to every CK, since F does not appear on any right-hand side.
(i) B → B, E -- FD 4, Identity(B), Aug+
(ii) B → C -- (i), FD 1, trans
(iii) B → A, D -- (ii), FD 3, trans
(iv) B → A, C, D, E -- (i)-(iii), composition
(v) B, F → A, C, D, E -- (iv), Aug
The above shows {B, F} is a key. F cannot be removed (it belongs to all CKs). B cannot be removed either: to derive B from F we need A, F → B (FD 2), so we must first know A; A comes only from C (FD 3), C only from B, E (FD 1), and so we are back to needing B. In other words, {F} on its own determines nothing else. Hence {B, F} is a candidate key.

Answer (Additional Candidate Keys):
{B, F} is a candidate key, and F belongs to every CK.
Solve: ? → B, F
(vi) A, F → B -- FD 2
(vii) A, F → B, F -- (vi), Identity(F), Aug+
Hence {A, F} is a key. It is a CK because
– F can't be removed (see above);
– A can't be removed, since {F} is not a key.
Similarly, C, F → A, F -- FD 3, decomposition, Identity(F), Aug+, so {C, F} is a key, and by the same argument it is a third CK.
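The three candidate keys can be confirmed by exhaustive search, as a sketch (brute force over column subsets with attribute closure; both are standard techniques, not the slides' method, and `closure`/`candidate_keys` are illustrative names):

```python
from itertools import combinations

def closure(x, fds):
    """Columns determined by x; each FD is a (lhs, rhs) pair of column-letter strings."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

def candidate_keys(schema, fds):
    """Minimal keys, testing column sets from smallest to largest."""
    cks = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if not any(ck <= x for ck in cks) and closure(x, fds) == set(schema):
                cks.append(x)
    return cks

fds = [("BE", "C"), ("AF", "B"), ("C", "AD"), ("B", "E")]  # FD 1 .. FD 4
print(["".join(sorted(k)) for k in candidate_keys("ABCDEF", fds)])  # ['AF', 'BF', 'CF']
print(sorted(closure("F", fds)))  # ['F']: {F} alone is not a key
```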

Finding Candidate Keys: Observation: If an attribute A appears on the right-hand side of an FD X → A with A not in X, then there is a candidate key that excludes A.
Suppose X → A and A is not in X. Then X, (R \ (X ∪ {A})) → A by Augmentation. Since every attribute of R appears in this FD, its left-hand side, which is R \ {A}, determines all of R (Identity gives the remaining columns), so it is a key and by definition contains a candidate key. Since A does not appear on the left-hand side, that CK does not contain A.
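A small check of the observation on the previous exercise, as a sketch: for every column A on some right-hand side, R \ {A} should already be a key, and some candidate key from the slides should exclude A. The `closure` helper (standard attribute closure) and the hard-coded candidate-key list are illustrative; the keys are the ones derived above.

```python
def closure(x, fds):
    """Columns determined by x; each FD is a (lhs, rhs) pair of column-letter strings."""
    known = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= known and not set(rhs) <= known:
                known |= set(rhs)
                changed = True
    return known

R = set("ABCDEF")
fds = [("BE", "C"), ("AF", "B"), ("C", "AD"), ("B", "E")]  # FDs from the exercise
cks = [set("AF"), set("BF"), set("CF")]                     # candidate keys found above

for a in sorted(set("".join(rhs for _, rhs in fds))):
    is_key = closure(R - {a}, fds) == R          # R \ {A} determines A, so it is a key
    excluded = any(a not in ck for ck in cks)    # some candidate key leaves A out
    print(a, is_key, excluded)
```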