CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#16: Schema Refinement & Normalization.

Slides:



Advertisements
Similar presentations
primary key constraint foreign key constraint
Advertisements

Manipulating Functional Dependencies Zaki Malik September 30, 2008.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Functional Dependencies (based on notes by Silberchatz,Korth, and.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Integrity Constraints.
Database Management COP4540, SCS, FIU Functional Dependencies (Chapter 14)
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Normalization DB Tuning CS186 Final Review Session.
Normalization DB Tuning CS186 Final Review Session.
Multivalued Dependency Prepared by Tomasz Kaciak CS157A.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
1 CMSC424, Spring 2005 CMSC424: Database Design Lecture 9.
Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Cs3431 Normalization. cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update.
Databases 6: Normalization
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Computing & Information Sciences Kansas State University Monday, 13 Oct 2008CIS 560: Database System Concepts Lecture 18 of 42 Monday, 13 October 2008.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Functional Dependencies An example: loan-info= Observe: tuples with the same value for lno will always have the same value for amt We write: lno  amt.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Computing & Information Sciences Kansas State University Tuesday, 27 Feb 2007CIS 560: Database System Concepts Lecture 18 of 42 Tuesday, 27 February 2007.
Ihr Logo Fundamentals of Database Systems Fourth Edition El Masri & Navathe Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Ihr Logo Fundamentals of Database Systems Fourth Edition El Masri & Navathe Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 564 Database Management Systems: Design and Implementation Discussion Session Friday, Sept 18, Apul Jain.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
1 Lecture 6: Schema refinement: Functional dependencies
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
1 Dept. of CIS, Temple Univ. CIS616/661 – Principles of Data Management V. Megalooikonomou Integrity Constraints (based on slides by C. Faloutsos at CMU)
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement SHIRAJ MOHAMED M | MIS 1. Learning Objectives  Identify update, insertion and deletion anomalies  Identify possible keys given an.
Minimum Cover of F. Minimal Cover for a Set of FDs Minimal cover G for a set of FDs F: –Closure of F = closure of G. –Right hand side of each FD in G.
Deanship of Distance Learning Avicenna Center for E-Learning 1 Session - 7 Sequence - 2 Normalization Functional Dependencies Presented by: Dr. Samir Tartir.
1 Functional Dependencies. 2 Motivation v E/R  Relational translation problems : –Often discover more “detailed” constraints after translation (upcoming.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Functional Dependencies CIS 4301 Lecture Notes Lecture 8 - 2/7/2006.
MIS 3053 Database Design And Applications The University Of Tulsa Professor: Akhilesh Bajaj Normal Forms Lecture 1 © Akhilesh Bajaj, 2000, 2002, 2003.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Databases 1 Sixth lecture. 2 Functional Dependencies X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
CS422 Principles of Database Systems Normalization
Database Management Systems (CS 564)
CS422 Principles of Database Systems Normalization
Lecture 6: Design Theory
Faloutsos & Pavlo SCS /615 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization.
Normalization Murali Mani.
Functional Dependencies and Normalization
Temple University – CIS Dept. CIS331– Principles of Database Systems
Lecture 09: Functional Dependencies, Database Design
Normalization cs3431.
CS 405G: Introduction to Database Systems
Chapter 19 (part 1) Functional Dependencies
CSC 453 Database Systems Lecture
Presentation transcript:

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#16: Schema Refinement & Normalization

CMU SCS Administrivia HW5 is due today. HW6 is due Tuesday March 24 th. Faloutsos/PavloCMU SCS /6152

CMU SCS Database Design How do we design a “good” database? –Want to ensure the integrity of the data. –Want to get good performance. Relational DBMS vs. NoSQL DBMS Faloutsos/PavloCMU SCS /6153 This Week Next Week

CMU SCS Example Faloutsos/PavloCMU SCS /6154 studentIdcourseIdroomgradenameaddress GHC 6115AChristosPittsburgh GHC 8102BTupacLos Angeles GHC 6115AObamaChicago GHC 6115AWaka FlockaAtlanta Students(studentId, courseId, room, grade, name, address)

CMU SCS Redundancy Problems Update Anomalies –If the room number changes, we need to make sure that we change all students records. Insert Anomalies –May not be possible to add a student unless they’re enrolled in a course. Delete Anomalies –If all the students enrolled in a course are deleted, then we lose the room number. Faloutsos/PavloCMU SCS /6155

CMU SCS Example Faloutsos/PavloCMU SCS /6156 STUDENTSCOURSES This Week: why this decomposition is better and how to find it. studentIdnameaddress 123ChristosPittsburgh 456TupacLos Angeles 789ObamaChicago 012Waka FlockaAtlanta studentIdcourseIdgrade A B A A ROOMS courseIdroom GHC GHC 8102

CMU SCS Today’s Class Motivation Functional Dependencies Armstrong’s Axioms Closures Canonical Cover Faloutsos/PavloCMU SCS /6157

CMU SCS Functional Dependencies A form of a constraint: –Part of the schema to define a valid instance. Definition: X→Y –The value of ‘X’ functionally defines the value of ‘Y’. Faloutsos/PavloCMU SCS /6158

CMU SCS Functional Dependencies Formal Definition: –X→Y ⇒ (t 1 [x] = t 2 [x] ⇒ t 1 [y] = t 2 [y]) –If two tuples agree on the ‘X’ attribute, then they must agree on the ‘Y’ attribute too. Faloutsos/PavloCMU SCS /6159 studentId→name studentIdnameaddress 123ChristosPittsburgh 456TupacLos Angeles 789ObamaChicago 012Waka FlockaAtlanta

CMU SCS Functional Dependencies FD is a constraint, that it says that it allows instances for which where the FD holds. You can check if an FD is violated by an instance, but cannot prove that an FD is part of the schema using an instance. Faloutsos/PavloCMU SCS /61510 studentId→name studentIdnameaddress 123ChristosPittsburgh 456TupacLos Angeles 789ObamaChicago 012Waka FlockaAtlanta address→name ??

CMU SCS Functional Dependencies Note that the two FDs X→Y and X→Z can be written in shorthand as X→YZ. But XY→Z is not the same as the two FDs X→Z and Y→Z. Faloutsos/PavloCMU SCS /61511

CMU SCS Defining FDs in SQL Faloutsos/PavloCMU SCS /61512 CREATE ASSERTION student-name CHECK (NOT EXISTS (SELECT * FROM students AS s1, students AS s2 WHERE s1.studentId = s2.studentId AND s1.name <> s2.name)) Make sure that no two students ever have the same id without the same name. FD: studentId → name

CMU SCS Combining FDs in SQL Faloutsos/PavloCMU SCS /61513 CREATE ASSERTION student-name-address CHECK (NOT EXISTS (SELECT * FROM students AS s1, students AS s2 WHERE s1.studentId = s2.studentId AND ((s1.name <> s2.name OR (s1.address <> s2.address))) Make sure that no two students ever have the same id without the same name and address. FD 1 : studentId → name FD 2 : studentId → address

CMU SCS SQL Assertions WARNING: No major DBMS supports SQL-92 assertions. Why? Faloutsos/PavloCMU SCS /61514

CMU SCS Defining FDs in IBM DB2 Faloutsos/PavloCMU SCS /61515 CREATE TABLE students ( studentId INT PRIMARY KEY, name VARCHAR(32), ⋮ CONSTRAINT student_name CHECK (name) DETERMINED BY (studentId) ); CREATE TABLE students ( studentId INT PRIMARY KEY, name VARCHAR(32), ⋮ CONSTRAINT student_name CHECK (name) DETERMINED BY (studentId) ); FD: studentId → name 01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r html?cp=SSEPGG_9.7.0%2F &lang=en

CMU SCS Why Should I Care? FDs seem important, but what can we actually do with them? They allow us to decide whether a database design is correct. –Note that this different then the question of whether it’s a good idea for performance… Faloutsos/PavloCMU SCS /61516

CMU SCS Implied Dependencies Faloutsos/PavloCMU SCS /61517 studentIdcourseIdgradenameaddress AChristosPittsburgh BTupacLos Angeles AObamaChicago AWaka FlockaAtlanta Students(studentId, courseId, grade, name, address) studentId → name, address studentId, courseId → grade Provided FDs studentId, courseId → grade, name, address studentId, courseId → studentId Implied FDs These holds for any instance!

CMU SCS Another Example Faloutsos/PavloCMU SCS /61518 namecolorcategorydeptprice GizmoGreenGadgetToys9.99 WidgetBlackGadgetToys49.99 GizmoGreenSquirrelsGarden19.99 Product(name, color, category, dept, price) name → color category → dept color, category → price Provided FDs name, category → price Implied FDs

CMU SCS Implied Dependencies Q: Given a set of FDs {f 1, … f n }, how do we decide whether FD g holds? A: Compute the closure using Armstrong’s Axioms (chapter 19.3): –Reflexivity –Augmentation –Transitivity Faloutsos/PavloCMU SCS /61519 The set of all implied FDs

CMU SCS Armstrong’s Axioms – Reflexivity If X ⊇ Y, then X→Y. Example: studentId, name → studentId Faloutsos/PavloCMU SCS /61520

CMU SCS Armstrong’s Axioms – Augmentation If X→Y, then XZ→YZ for any Z. Example: If studentId → name, then studentId, grade → name, grade Faloutsos/PavloCMU SCS /61521

CMU SCS Armstrong’s Axioms – Transitivity If X→Y and Y→Z, then X→Z. Example: If studentId→address and address →taxRate, then studentId→taxRate Faloutsos/PavloCMU SCS /61522

CMU SCS Armstrong’s Axioms Reflexivity: –X ⊇ Y ⇒ X→Y Augmentation: –X→Y ⇒ XZ→YZ Transitivity: –(X→Y) ∧ (Y→Z) ⇒ X→Z Faloutsos/PavloCMU SCS /61523

CMU SCS Additional Rules Union: –(X→Y) ∧ (X→Z) ⇒ X→YZ Decomposition: –X→YZ ⇒ (X→Y) ∧ (X→Z) Pseudo-transitivity: –(X→Y) ∧ (YW→Z) ⇒ XW→Z Faloutsos/PavloCMU SCS /61524

CMU SCS Closures Given a set F of FDs {f 1, … f n }, we define the closure F+ is the set of all implied FDs. studentId, courseId → grade studentId → name, address F studentId, name → studentId studentId → name studentId → address studentId, grade → name, grade studentId, courseId→ grade, name ⋮ Students(studentId, courseId, grade, name, address) Augmentation Reflexivity Decomposition Transitivity F+:

CMU SCS Another Example Faloutsos/PavloCMU SCS /61526 Product(name, color, category, dept, price) name → color category → dept color, category → price Provided FDs color, category → color, price color, category → color, category, price name, category → color, category, price name, category → name, color, category, dept, price Implied FDs Reflexivity Transitivity Reflexivity

CMU SCS Why Do We Need the Closure? With closure we can find all FD’s easily. We can then compute the attribute closure –For a given attribute X, the attribute closure X+ is the set of all attributes such that X→A can be inferred using the Armstron Axioms. To check if X→A, –Compute X+ –Check if A ∊ X+ Faloutsos/PavloCMU SCS /61527

CMU SCS But Again, Why Should I Care? Maintaining the closure at runtime is expensive: –The DBMS has to check all the constraints for every insert, update, delete operation. We want a minimal set of FDs that was enough to ensure correctness. Faloutsos/PavloCMU SCS /61528

CMU SCS Canonical Cover Given a set F of FDs {f 1, … f n }, we define the closure Fc is the minimal set of all FDs. Faloutsos/PavloCMU SCS /61529 studentId, courseId → grade studentId→ name, address studentId, name→ name, address studentId, courseId→ grade, name F Fc

CMU SCS Canonical Cover Definition Three properties for the canonical cover Fc: 1.The RHS of every FD is a single attribute. 2.The closure of Fc is identical to the closure of F (i.e., Fc = F are equivalent). 3.The Fc is minimal (i.e., if we eliminate any attribute from the LHS or RHS of a FD, property #2 is violated. Faloutsos/PavloCMU SCS /61530

CMU SCS Canonical Cover Definition For #3, we need to eliminate all extraneous attributes from our set of FDs. –An attribute is “extraneous” if the closure is the same, before and after its elimination. Faloutsos/PavloCMU SCS /61531

CMU SCS Computing the Canonical Cover Given a set F of FDs, examine each FD: –Drop extraneous LHS or RHS attributes; or redundant FDs –Make sure that FDs have a single attribute in their RHS Repeat until no change Faloutsos/PavloCMU SCS /61532

CMU SCS Computing the Canonical Cover Faloutsos/PavloCMU SCS /61533 Split (2) AB → C (1) A → BC (2) B → C (3) A → B (4) F: AB → C (1) A → B (2’) A → C (2’’) B → C (3) A → B (4) F1:F1:

CMU SCS Computing the Canonical Cover Faloutsos/PavloCMU SCS /61534 Eliminate (2’) AB → C (1) A → C (2’’) B → C (3) A → B (4) F2:F2: AB → C (1) A → B (2’) A → C (2’’) B → C (3) A → B (4) F1:F1:

CMU SCS Computing the Canonical Cover Faloutsos/PavloCMU SCS /61535 Eliminate (2’’) AB → C (1) B → C (3) A → B (4) F3:F3: AB → C (1) A → C (2’’) B → C (3) A → B (4) F2:F2: Implied by (4)+(3) through transitivity

CMU SCS Computing the Canonical Cover Faloutsos/PavloCMU SCS /61536 Eliminate A from (1) B → C (1’) B → C (3) A → B (4) F4:F4: AB → C (1) B → C (3) A → B (4) F4:F4: X Implied by (4)+(3)

CMU SCS Computing the Canonical Cover Faloutsos/PavloCMU SCS /61537 Eliminate (1’) B → C (3) A → B (4) F5:F5: B → C (1’) B → C (3) A → B (4) F4:F4: ✓ Nothing is extraneous ✓ All RHS are single attributes ✓ Final & original set of FDs are equivalent (same closure) Fc

CMU SCS No Really, Why Should I Care? The canonical cover is the minimum number of assertions that we need to implement to make sure that our database integrity is correct. Allows us to find the super key for a relation. Faloutsos/PavloCMU SCS /61538

CMU SCS Relational Model: Keys Super Key: –Any set of attributes in a relation that functionally determines all attributes in the relation. Candidate Key: –Any super key such that the removal of any attribute leaves a set that does not functionally determine all attributes. Faloutsos/PavloCMU SCS /61539

CMU SCS Relational Model: Keys Super Key: –Set of fields for which there are no two distinct tuples that have the same values for the attributes in this set. Candidate Key: –Set of fields that uniquely identifies a tuple according to a key constraint. Faloutsos/PavloCMU SCS /61540

CMU SCS But Why Care About Super Keys? It is going to help us determine whether it’s okay to split a table into multiple sub-tables. Super keys ensure that we are able to recreate the original relation through joins. Faloutsos/PavloCMU SCS /61541

CMU SCS Super Key Example Faloutsos/PavloCMU SCS /61542 namecolorcategorydeptprice GizmoGreenGadgetToys9.99 WidgetBlackGadgetToys49.99 GizmoGreenSquirrelsGarden19.99 Product(name, color, category, dept, price) name → color category → dept color, category → price Provided FDs name, category → price Implied FDs Super Key!

CMU SCS Summary How do we guarantee that F = F’? –Closures How do we find a minimal F’ for F? –Canonical Cover Faloutsos/PavloCMU SCS /61543