Schema Refinement SHIRAJ MOHAMED M | MIS


Schema Refinement SHIRAJ MOHAMED M | MIS 1

Learning Objectives
- Identify update, insertion and deletion anomalies
- Identify possible keys given an instance
- Identify possible functional dependencies in a relation
- Determine all keys in a schema

What is Schema Refinement?
Schema Refinement is the study of what should go where in a DBMS, or, which schemas are best to describe an application. For example, consider this schema:

EmpDept
EID  Name   DeptID  DeptName
A01  Ali    12      Wing
A12  Eric   10      Tail
A13  Eric   12      Wing
A03  Tyler  12      Wing

Versus this one:

Emp
EID  Name   DeptID
A01  Ali    12
A12  Eric   10
A13  Eric   12
A03  Tyler  12

Dept
DeptID  DeptName
12      Wing
10      Tail

Which schema do you think is best? Why?

What’s wrong?*
The first problem students usually identify with the EmpDept schema is that it combines two different ideas: employee information and department information. But what is wrong with this?
1. If we separated the two concepts we could save space.
2. Combining the two ideas leads to some bad anomalies.
These two problems occur because DeptID determines DeptName, but DeptID is not a key. Let’s look into the anomalies further.

Anomalies, Redundancy*
What anomalies are associated with EmpDept?
- Update Anomalies: If the Wing department changes its name, we must change multiple rows in EmpDept.
- Insertion Anomalies: If a department has no employees, where do we store its name?
- Deletion Anomalies: If A12 Eric quits, the information about the Tail department will be lost.

EmpDept
EID  Name   DeptID  DeptName
A01  Ali    12      Wing
A12  Eric   10      Tail
A13  Eric   12      Wing
A03  Tyler  12      Wing
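The update and deletion anomalies can be made concrete with a small sketch over the EmpDept instance. The variable names (`emp_dept`, `rows_to_update`, `remaining_dept_names`) are illustrative choices, not part of the slides.

```python
# EmpDept instance from the slide, one tuple per row:
# (EID, Name, DeptID, DeptName)
emp_dept = [
    ("A01", "Ali", 12, "Wing"),
    ("A12", "Eric", 10, "Tail"),
    ("A13", "Eric", 12, "Wing"),
    ("A03", "Tyler", 12, "Wing"),
]

# Update anomaly: renaming department 12 means touching every row
# that repeats its name.
rows_to_update = [row for row in emp_dept if row[2] == 12]

# Deletion anomaly: removing employee A12 also removes the only row
# that records the name of department 10.
after_delete = [row for row in emp_dept if row[0] != "A12"]
remaining_dept_names = {row[3] for row in after_delete}

print(len(rows_to_update))
print(remaining_dept_names)
```

Three rows need the same change for one renamed department, and after the deletion no row mentions Tail any more.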

Practice Anomalies, Redundancies*
Identify anomalies associated with this schema. Include update, insertion and deletion anomalies.
EnrollStud(StudID, ClassID, Grade, ProfID, StudName)
Why do these anomalies occur?

Practice Anomalies, Redundancies*
Update Anomaly:
- If a student changes his name, we must change each row for which the student has taken a class.
- If a class changes the ProfID, we must change it for every row in which the class appears.
Insertion Anomaly:
- If a student has not taken a class, where do we store her name?
- If a class has no student grades recorded yet, where do we store its ProfID?
Deletion Anomaly:
- If a student drops her last course, the information about the student’s name will be lost.
- If the last student drops the course, the info about the ProfID will be lost.

Decomposition: A good solution
The intergalactic standard solution to the redundancy problem is to decompose redundant schemas, e.g., EmpDept becomes

Emp
EID  Name   DeptID
A01  Ali    12
A12  Eric   10
A13  Eric   12
A03  Tyler  12

Dept
DeptID  DeptName
12      Wing
10      Tail

The secret to understanding when and how to decompose schemas is Functional Dependencies, a generalization of keys. When we say "X determines Y" we are stating a functional dependency.
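A minimal sketch of this decomposition, assuming the relations are represented as Python lists of tuples: Emp and Dept are obtained by projecting EmpDept, and joining them back on DeptID recovers the original rows. The names `emp`, `dept` and `rejoined` are illustrative.

```python
# EmpDept rows: (EID, Name, DeptID, DeptName)
emp_dept = [
    ("A01", "Ali", 12, "Wing"),
    ("A12", "Eric", 10, "Tail"),
    ("A13", "Eric", 12, "Wing"),
    ("A03", "Tyler", 12, "Wing"),
]

# Project onto (EID, Name, DeptID) and (DeptID, DeptName);
# using sets removes the duplicated department rows.
emp = sorted({(eid, name, dept_id) for eid, name, dept_id, _ in emp_dept})
dept = sorted({(dept_id, dept_name) for _, _, dept_id, dept_name in emp_dept})

# Join the two projections back together on DeptID.
dept_name_of = dict(dept)
rejoined = sorted((eid, name, d, dept_name_of[d]) for eid, name, d in emp)

print(dept)
print(rejoined == sorted(emp_dept))
```

Each department name is now stored once, and the join loses no information for this instance.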

Review Keys
Note that EID being a key* of EmpDept means that the values of EID are unique, and EID is minimal.
Remember: you cannot determine keys from an instance, only from “natural” information or from a domain expert.
Let’s practice keys by identifying possible keys in an instance.
*sometimes called a candidate key

EmpDept
EID  Name   DeptID  DeptName
A01  Ali    12      Wing
A12  Eric   10      Tail
A13  Eric   12      Wing
A03  Tyler  12      Wing

Identify Possible Keys*
Identify all possible keys based on this instance:

Time     Flight  Plane  Origin  Destination
9:57AM   157     abc    SEA     PDX
10:42AM  233     def    PDX     SEA
11:44AM  155     des    ORD     ATL
12:44PM  244     xdy    ATL     PDX
1:43PM   074     xyz    SEA     ATL
2:44PM   233     def    PDX     ATL
3:55PM   455     eff    MSP     SEA
5:44PM   120     ikk    MSP     PDX
7:55PM   233     abf    CHI     SEA

Identify Possible Keys*
Possible keys for the instance above are: {Time}, {Plane, Destination}, {Origin, Destination}
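The answer can be checked mechanically. The sketch below enumerates attribute sets by increasing size and keeps those whose values are unique across the instance and that contain no smaller unique set. The helper names `is_unique` and `possible_keys` are illustrative. Uniqueness in one instance only makes a set a possible key; it does not prove it is a key of the schema.

```python
from itertools import combinations

ATTRS = ("Time", "Flight", "Plane", "Origin", "Destination")
ROWS = [
    ("9:57AM", "157", "abc", "SEA", "PDX"),
    ("10:42AM", "233", "def", "PDX", "SEA"),
    ("11:44AM", "155", "des", "ORD", "ATL"),
    ("12:44PM", "244", "xdy", "ATL", "PDX"),
    ("1:43PM", "074", "xyz", "SEA", "ATL"),
    ("2:44PM", "233", "def", "PDX", "ATL"),
    ("3:55PM", "455", "eff", "MSP", "SEA"),
    ("5:44PM", "120", "ikk", "MSP", "PDX"),
    ("7:55PM", "233", "abf", "CHI", "SEA"),
]


def is_unique(cols):
    """True if no two rows agree on all attributes in cols."""
    idx = [ATTRS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)


def possible_keys():
    """Minimal attribute sets that are unique in this instance."""
    keys = []
    for size in range(1, len(ATTRS) + 1):
        for cols in combinations(ATTRS, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(cols) for k in keys):
                continue
            if is_unique(cols):
                keys.append(cols)
    return keys


print(possible_keys())
```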

Functional Dependencies
A key like EID has another property: If two rows have the same EID, then they have the same value of every other attribute. We say EID functionally determines all other attributes and write this Functional Dependency (FD):
EID -> Name, DeptID, DeptName
Is Name -> DeptID true? No, because rows 2 and 3 have the same Name but not the same DeptID.

EmpDept
EID  Name   DeptID  DeptName
A01  Ali    12      Wing
A12  Eric   10      Tail
A13  Eric   12      Wing
A03  Tyler  12      Wing

Functional Dependencies, ctd.
Do you see any more FDs in EmpDept? Yes, the FD DeptID -> DeptName.
DEFINITION: If A and B are sets of attributes in a relation, we say that A (functionally) determines B, or A -> B is a Functional Dependency (FD), if whenever two rows agree on A, they agree on B. In other words, the value of a row on A functionally determines its value on B.
There are two special kinds of FDs:
- Key FDs, X -> A where X contains a key
- Trivial FDs, such as Name -> Name, or Name, DeptID -> DeptID

EmpDept
EID  Name   DeptID  DeptName
A01  Ali    12      Wing
A12  Eric   10      Tail
A13  Eric   12      Wing
A03  Tyler  12      Wing
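The definition translates directly into a check over an instance: group rows by their values on A and confirm each group has a single value on B. This is a minimal sketch; `fd_holds` is a hypothetical helper name, and a result of True only means the instance does not contradict the FD.

```python
emp_dept = [
    {"EID": "A01", "Name": "Ali", "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A12", "Name": "Eric", "DeptID": 10, "DeptName": "Tail"},
    {"EID": "A13", "Name": "Eric", "DeptID": 12, "DeptName": "Wing"},
    {"EID": "A03", "Name": "Tyler", "DeptID": 12, "DeptName": "Wing"},
]


def fd_holds(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and
        # compare every later row against it.
        if seen.setdefault(left, right) != right:
            return False
    return True


print(fd_holds(emp_dept, ["EID"], ["Name", "DeptID", "DeptName"]))  # key FD
print(fd_holds(emp_dept, ["Name"], ["DeptID"]))                     # violated by the two Erics
print(fd_holds(emp_dept, ["DeptID"], ["DeptName"]))
print(fd_holds(emp_dept, ["Name", "DeptID"], ["DeptID"]))           # trivial FD
```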

Identify (natural) FDs*
What are the (natural) FDs in these relations? Identify the key FDs but ignore trivial FDs.
Customer(CustID, Address, City, Zip, State)
EnrollStud(StudID, ClassID, Grade, ProfID, StudName, ProfName)

Identify (natural) FDs*
What are the (natural) FDs in these relations? Identify the key FDs but ignore trivial FDs.

Customer(CustID, Address, City, Zip, State)
- CustID -> Address, City, Zip, State. This is a key FD.
- Address, City, State -> Zip
- Zip -> State

EnrollStud(StudID, ClassID, Grade, ProfID, StudName, ProfName)
- {StudID, ClassID} -> Grade, ProfID, StudName, ProfName. This is a key FD.
- StudID -> StudName
- ClassID -> ProfID, ProfName
- ProfID -> ProfName

What are FDs?
An FD is a generalization of the concept of key. FDs, like keys and foreign keys, are a kind of integrity constraint (IC). Like other ICs, FDs are part of a relation’s schema. For example, a schema might be:

Assigned(EmpID Int,
         JobID Int,
         EmpName varchar(20),
         percent real,
         EmpID references…,
         JobID references…,
         PRIMARY KEY (EmpID, JobID))
FDs: EmpID -> EmpName

How to determine FDs
So far we have dealt with “natural” FDs. Sometimes it’s not clear what FDs apply in a relation, e.g., zip codes vs. cities, or Supplier(Name, Address, Crating, Discount), where it is unclear what the FDs are.
There are two ways to determine FDs:
- Infer them as “natural” FDs from your experience.
- You may be given them as part of the schema, by the instructor or by the customer.
As with keys, you cannot determine FDs from an instance! But you can tell if something is not an FD.

LO8.3: Identify Possible FDs*
Identify two possible non-key FDs based on this instance (the same instance as in the Identify Possible Keys exercise). Remember the possible keys for this instance are {Time}, {Plane, Destination}, {Origin, Destination}.

Time     Flight  Plane  Origin  Destination
9:57AM   157     abc    SEA     PDX
10:42AM  233     def    PDX     SEA
11:44AM  155     des    ORD     ATL
12:44PM  244     xdy    ATL     PDX
1:43PM   074     xyz    SEA     ATL
2:44PM   233     def    PDX     ATL
3:55PM   455     eff    MSP     SEA
5:44PM   120     ikk    MSP     PDX
7:55PM   233     abf    CHI     SEA

LO8.3: Identify Possible FDs*
Possible non-key FDs in the instance above are Plane -> Flight and Plane -> Origin.
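As a rough check of this answer, the sketch below tests every single-attribute FD X -> Y against the instance and keeps those where X is not itself unique (so the FD is not a key FD). The helper names `holds` and `is_unique` are illustrative, and restricting the search to single attributes on each side is a simplifying assumption.

```python
from itertools import permutations

ATTRS = ("Time", "Flight", "Plane", "Origin", "Destination")
ROWS = [
    ("9:57AM", "157", "abc", "SEA", "PDX"),
    ("10:42AM", "233", "def", "PDX", "SEA"),
    ("11:44AM", "155", "des", "ORD", "ATL"),
    ("12:44PM", "244", "xdy", "ATL", "PDX"),
    ("1:43PM", "074", "xyz", "SEA", "ATL"),
    ("2:44PM", "233", "def", "PDX", "ATL"),
    ("3:55PM", "455", "eff", "MSP", "SEA"),
    ("5:44PM", "120", "ikk", "MSP", "PDX"),
    ("7:55PM", "233", "abf", "CHI", "SEA"),
]


def holds(lhs, rhs):
    """True if rows that agree on attribute lhs also agree on rhs."""
    i, j = ATTRS.index(lhs), ATTRS.index(rhs)
    mapping = {}
    for row in ROWS:
        if mapping.setdefault(row[i], row[j]) != row[j]:
            return False
    return True


def is_unique(attr):
    """True if attr alone is a possible key of this instance."""
    i = ATTRS.index(attr)
    return len({row[i] for row in ROWS}) == len(ROWS)


possible_fds = [
    (lhs, rhs)
    for lhs, rhs in permutations(ATTRS, 2)
    if not is_unique(lhs) and holds(lhs, rhs)
]
print(possible_fds)
```

The instance can rule an FD out (for example Flight -> Plane fails on flight 233), but it cannot confirm that the surviving FDs hold for the schema.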

Reasoning about FDs
EmpDept(EID, Name, DeptID, DeptName)
Two natural FDs are EID -> DeptID and DeptID -> DeptName.
These two FDs imply the FD EID -> DeptName, because if two tuples agree on EID, then by the first FD they agree on DeptID, and then by the second FD they agree on DeptName.
The set of FDs implied by a given set F of FDs is called the closure of F and is denoted F+.

Armstrong’s Axioms
The closure of F can be computed using these axioms:
- Reflexivity: If X ⊇ Y, then X -> Y
- Augmentation: If X -> Y, then XZ -> YZ for any Z
- Transitivity: If X -> Y and Y -> Z, then X -> Z
Armstrong’s axioms are sound (they generate only FDs in F+ when applied to FDs in F) and complete (repeated application of these axioms will generate all FDs in F+).
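As a worked example of applying the axioms, the derivation below sketches the commonly used union rule (if X -> Y and X -> Z, then X -> YZ). The rule is not stated on the slides; it is included here only to show how augmentation and transitivity combine.

```latex
\begin{align*}
1.\;& X \to Y   && \text{given} \\
2.\;& X \to Z   && \text{given} \\
3.\;& X \to XY  && \text{augmentation of (1) with } X \text{, since } XX = X \\
4.\;& XY \to YZ && \text{augmentation of (2) with } Y \\
5.\;& X \to YZ  && \text{transitivity on (3) and (4)}
\end{align*}
```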

Determining Keys
In order to determine if X is a key of a relation R, use this algorithm, which computes the attribute closure of X:

AttClos = X;  // Note: X is a set of attributes
Repeat until there is no change:
    If there is an FD U -> V with U ⊆ AttClos, then set AttClos = AttClos ∪ V

AttClos = R if and only if X is a superkey of R, that is, X contains a key. X is a key if, in addition, no proper subset of X has this property.
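The algorithm above can be sketched in a few lines of Python. The function name `attribute_closure` and the representation of FDs as (left side, right side) pairs of sets are illustrative assumptions. The example FDs are the EmpDept dependencies mentioned earlier in the deck.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # Apply U -> V only when U is already inside the closure
            # and V would add something new.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


# EmpDept(EID, Name, DeptID, DeptName)
FDS = [
    ({"EID"}, {"Name", "DeptID"}),
    ({"DeptID"}, {"DeptName"}),
]

print(sorted(attribute_closure({"EID"}, FDS)))
print(sorted(attribute_closure({"DeptID"}, FDS)))
```

The closure of {EID} is all of EmpDept, so EID is a superkey (and, being a single attribute, a key), while the closure of {DeptID} stops at {DeptID, DeptName}.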

Determining the keys of R*
Given the schema R(A, B, C, D, E), in which A is a key, and the FDs BC -> A and DE -> C.
What are all the keys of this schema?
Hint: any key must include A, BC or DE. Why?

Determining the keys of R*
R(A, B, C, D, E) with BC -> A and DE -> C.
Any key must include A, BC or DE, because otherwise the attribute closure algorithm will never get started. Determining the keys will be done in three steps, one for A, one for BC and one for DE.
1. A: it is already a key, so we are done with this step. A is a key.
2. BC: BC determines A, which determines everything else, so we are done. BC is a key.
3. DE: DE -> DEC is a dead end, so DE is not a key. Let’s try adding attributes to DE, in alphabetical order.
   - We can’t add A to DE, since the result would not be minimal (A is a key).
   - DEB -> DEBC, which contains a key, so DEB is a key.
   - DEC is a dead end, so DEC is not a key, and we can’t add anything to it to make a key (A and B would make it a key we have already seen).
Conclusion: The keys are A, BC and DEB. Notice how systematic we were. You’ll need it for the exercises and homework.
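The same search can be automated as a brute-force sketch: try attribute sets in order of size, skip supersets of keys already found, and keep a set when its attribute closure is all of R. The FD A -> BCDE is an assumption that encodes the solution's statement that A is already a key; the function names are illustrative.

```python
from itertools import combinations

R = "ABCDE"
FDS = [
    (set("A"), set("BCDE")),  # assumption: A is given as a key of R
    (set("BC"), set("A")),
    (set("DE"), set("C")),
]


def attribute_closure(attrs, fds):
    """Attributes determined by attrs under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def all_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for cand in combinations(schema, size):
            # A superset of a known key is a superkey but not minimal.
            if any(set(k) <= set(cand) for k in keys):
                continue
            if attribute_closure(cand, fds) == set(schema):
                keys.append(cand)
    return keys


print(all_keys(R, FDS))
```

Exhaustive enumeration is exponential in the number of attributes, so this is only reasonable for small classroom schemas like this one.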