CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.

1/15/2016, Jinze Liu, University of Kentucky

Topic
  Functional dependency
  Normalization
  Decomposition
  BCNF

Motivation
  How do we tell if a design is bad, e.g., WorkOn(EID, Ename, PID, Pname, Hours)?
  This design has redundancy, because the name of an employee is recorded multiple times, once for each project the employee works on.

    EID   PID  Ename         Pname         Hours
    1234  10   John Smith    B2B platform  …
    1123  9    Ben Liu       CRM           …
    1234  9    John Smith    CRM           …
    1023  10   Susan Sidhuk  B2B platform  40

Why is redundancy bad?
  It wastes disk space.
  It causes problems when we perform update operations on the relation:
    INSERT a new project that no employee has been assigned to yet.
    UPDATE the name of “John Smith” to “John L. Smith”.
    DELETE the last employee who works on a certain project.

    EID   PID  Ename         Pname         Hours
    1234  10   John Smith    B2B platform  …
    1123  9    Ben Liu       CRM           …
    1234  9    John Smith    CRM           …
    1023  10   Susan Sidhuk  B2B platform  40

Functional dependencies
  A functional dependency (FD) has the form X -> Y, where X and Y are sets of attributes in a relation R.
  X -> Y means that whenever two tuples in R agree on all the attributes in X, they must also agree on all the attributes in Y:
    t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
  Example: suppose X -> Y holds and a second tuple agrees with the first on X.

    X  Y  Z
    a  b  c
    a  ?  ?

  Its Y value must be “b”; its Z value could be anything, e.g. d:

    X  Y  Z
    a  b  c
    a  b  d
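
The definition above can be checked mechanically on a concrete set of tuples. The following is a minimal sketch, assuming rows are represented as Python dicts; the function name `fd_holds` is hypothetical and not part of the slides. Note that a single instance can only refute an FD: an FD is a constraint on all legal instances, so passing this check on one table does not establish it.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows agrees on `lhs` but differs on `rhs`."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False  # this pair of tuples violates lhs -> rhs
    return True

# The two-row X/Y/Z table from the slide.
rows = [
    {"X": "a", "Y": "b", "Z": "c"},
    {"X": "a", "Y": "b", "Z": "d"},
]
print(fd_holds(rows, ["X"], ["Y"]))  # True
print(fd_holds(rows, ["X"], ["Z"]))  # False
```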

FD examples
  Address(street_address, city, state, zip)
    street_address, city, state -> zip
    zip -> city, state
    zip, state -> zip?
      This is a trivial FD.
      Trivial FD: LHS ⊇ RHS
    zip -> state, zip?
      This is non-trivial, but not completely non-trivial.
      Completely non-trivial FD: LHS ∩ RHS = ∅
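
The three categories above depend only on how the two sides of the FD overlap, so they can be sketched as a small set comparison. The helper name `classify_fd` is an assumption made for illustration.

```python
def classify_fd(lhs, rhs):
    """Classify lhs -> rhs using the slide's three categories."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # LHS is a superset of RHS
    if lhs & rhs:
        return "non-trivial"             # the sides overlap, but RHS adds something
    return "completely non-trivial"      # LHS and RHS are disjoint

print(classify_fd({"zip", "state"}, {"zip"}))   # trivial
print(classify_fd({"zip"}, {"state", "zip"}))   # non-trivial
print(classify_fd({"zip"}, {"city", "state"}))  # completely non-trivial
```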

Keys redefined using FD’s
  Let attr(R) be the set of all attributes of R. A set of attributes K is a (candidate) key for a relation R if
    K -> attr(R) - K, that is, K is a “super key”, and
    no proper subset of K satisfies the above condition, that is, K is minimal (the other attributes are fully functionally dependent on K).
  Address(street_address, city, state, zip)
    {street_address, city, state, zip}   super key
    {street_address, city, zip}          super key
    {street_address, zip}                key
    {zip}                                non-key

Reasoning with FD’s
  Given a relation R and a set of FD’s F:
    Does another FD follow from F?
    Are some of the FD’s in F redundant (i.e., do they follow from the others)?
    Is K a key of R?
    What are all the keys of R?

Attribute closure
  Given R, a set of FD’s F that hold in R, and a set of attributes Z in R:
    The closure of Z (denoted Z+) with respect to F is the set of all attributes {A1, A2, …} functionally determined by Z (that is, Z -> A1 A2 …).
  Algorithm for computing the closure:
    Start with closure = Z.
    If X -> Y is in F and X is already in the closure, then also add Y to the closure.
    Repeat until no more attributes can be added.
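
The three steps of the closure algorithm translate almost line for line into code. This is a minimal sketch, assuming each FD is stored as a pair of Python sets `(lhs, rhs)`; that representation and the name `closure` are choices made here, not something the slides prescribe. It is run on the Address FDs from the earlier slide.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)                    # start with closure = Z
    changed = True
    while changed:                         # repeat until nothing can be added
        changed = False
        for lhs, rhs in fds:
            # X -> Y applies when X is already inside the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

ADDRESS_FDS = [
    ({"street_address", "city", "state"}, {"zip"}),
    ({"zip"}, {"city", "state"}),
]
print(sorted(closure({"zip"}, ADDRESS_FDS)))
# ['city', 'state', 'zip']
print(sorted(closure({"street_address", "zip"}, ADDRESS_FDS)))
# ['city', 'state', 'street_address', 'zip']
```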

A more complex example
  WorkOn(EID, Ename, […], PID, Pname, Hours)
    EID -> Ename, […]
    […] -> EID
    PID -> Pname
    EID, PID -> Hours
  (Not a good design, and we will see why later)

Example of computing closure
  F includes:
    EID -> Ename, […]
    […] -> EID
    PID -> Pname
    EID, PID -> Hours
  { PID, […] }+ = ?
    closure = { PID, […] }
    […] -> EID: add EID; closure is now { PID, […], EID }
    EID -> Ename, […]: add Ename, […]; closure is now { PID, […], EID, Ename }
    PID -> Pname: add Pname; closure is now { PID, Pname, […], EID, Ename }
    EID, PID -> Hours: add Hours; closure is now all the attributes in WorkOn

Using attribute closure
  Given a relation R and a set of FD’s F:
    Does another FD X -> Y follow from F?
      Compute X+ with respect to F.
      If Y ⊆ X+, then X -> Y follows from F.
    Is K a super key of R?
      Compute K+ with respect to F.
      If K+ contains all the attributes of R, K is a super key.
    Is a super key K a key of R?
      Test whether K’ = K - {a} is a super key of R for each a ∈ K.
      K is a key if no such K’ is a super key.
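
Each of the three questions above reduces to one or more closure computations. The sketch below assumes FDs are stored as `(lhs, rhs)` pairs of sets; the helper names `follows`, `is_superkey` and `is_key` are hypothetical. It re-checks the labels on the Address example from the earlier slide.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def follows(lhs, rhs, fds):
    """Does lhs -> rhs follow from fds?  True when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)

def is_superkey(k, all_attrs, fds):
    """K is a super key when K+ contains every attribute of R."""
    return closure(k, fds) >= set(all_attrs)

def is_key(k, all_attrs, fds):
    """A super key is a key when dropping any single attribute breaks it."""
    k = set(k)
    return is_superkey(k, all_attrs, fds) and not any(
        is_superkey(k - {a}, all_attrs, fds) for a in k
    )

ADDRESS = {"street_address", "city", "state", "zip"}
ADDRESS_FDS = [
    ({"street_address", "city", "state"}, {"zip"}),
    ({"zip"}, {"city", "state"}),
]
print(is_superkey({"street_address", "city", "zip"}, ADDRESS, ADDRESS_FDS))  # True
print(is_key({"street_address", "city", "zip"}, ADDRESS, ADDRESS_FDS))       # False
print(is_key({"street_address", "zip"}, ADDRESS, ADDRESS_FDS))               # True
print(is_superkey({"zip"}, ADDRESS, ADDRESS_FDS))                            # False
```

Removing one attribute at a time is enough for the minimality test because any superset of a super key is itself a super key.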

Rules of FD’s
  Armstrong’s axioms
    Reflexivity: If Y ⊆ X, then X -> Y
    Augmentation: If X -> Y, then XZ -> YZ for any Z
    Transitivity: If X -> Y and Y -> Z, then X -> Z
  Rules derived from axioms
    Splitting: If X -> YZ, then X -> Y and X -> Z
    Combining: If X -> Y and X -> Z, then X -> YZ
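
The slide states the two derived rules without showing how they follow from the axioms. One possible derivation, written here as a worked sketch rather than taken from the slides, is:

```latex
\textbf{Splitting.} Assume $X \to YZ$.
\begin{align*}
YZ &\to Y && \text{reflexivity, since } Y \subseteq YZ \\
X  &\to Y && \text{transitivity with } X \to YZ \\
YZ &\to Z && \text{reflexivity, since } Z \subseteq YZ \\
X  &\to Z && \text{transitivity with } X \to YZ
\end{align*}

\textbf{Combining.} Assume $X \to Y$ and $X \to Z$.
\begin{align*}
X  &\to XY && \text{augment } X \to Y \text{ with } X \text{ (note that } XX = X\text{)} \\
XY &\to YZ && \text{augment } X \to Z \text{ with } Y \\
X  &\to YZ && \text{transitivity}
\end{align*}
```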

Using rules of FD’s
  Given a relation R and a set of FD’s F:
    Does another FD X -> Y follow from F?
      Use the rules to come up with a proof.
  Example:
    F includes: EID -> Ename, […]; […] -> EID; EID, PID -> Hours; PID -> Pname
    PID, […] -> Hours?
      […] -> EID (given in F)
      PID, […] -> PID, EID (augmentation)
      PID, EID -> Hours (given in F)
      PID, […] -> Hours (transitivity)

Example of redundancy
  WorkOn(EID, Ename, […], PID, Hours)
  We say X -> Y is a partial dependency if there exists an X’ ⊂ X such that X’ -> Y,
    e.g. EID, […] -> Ename, […]
  Otherwise, X -> Y is a full dependency,
    e.g. EID, PID -> Hours

    EID   PID  Ename  […]  Pname         Hours
    1234  10   John   …    B2B platform  …
    1123  9    Ben    …    CRM           …
    1234  9    John   …    CRM           …
    1023  10   Susan  …    B2B platform  40
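
The partial/full distinction can be tested by searching the proper subsets of X for one that already determines Y. The sketch below is an illustration under assumptions: it uses WorkOn(EID, Ename, PID, Pname, Hours) from the Motivation slide with the three FDs over those attributes, and the helper name `partial_witness` is hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_witness(lhs, rhs, fds):
    """Return a proper subset X' of lhs with X' -> rhs, or None if lhs -> rhs is full."""
    attrs = sorted(lhs)
    for size in range(len(attrs)):              # proper subsets only: sizes 0 .. |X|-1
        for subset in combinations(attrs, size):
            if set(rhs) <= closure(subset, fds):
                return set(subset)
    return None

WORKON_FDS = [
    ({"EID"}, {"Ename"}),
    ({"PID"}, {"Pname"}),
    ({"EID", "PID"}, {"Hours"}),
]
print(partial_witness({"EID", "PID"}, {"Ename"}, WORKON_FDS))  # {'EID'}
print(partial_witness({"EID", "PID"}, {"Hours"}, WORKON_FDS))  # None
```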

Normalization
  Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
  A normal form is a certification that a relation schema is in a particular state.

2nd Normal Form
  An attribute A of a relation R is a nonprimary attribute if it is not part of any key of R; otherwise, A is a primary attribute.
  R is in (general) 2nd normal form if no nonprimary attribute A in R is partially functionally dependent on any key of R.

    X  Y  Z  W
    a  b  c  e
    b  b  c  f
    c  b  c  g

    X, Y -> Z, W
    Y -> Z

  Decompose into (X, Y, W) and (Y, Z).
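
The 2nd normal form test combines two earlier ideas: enumerate the keys, then look for a nonprimary attribute determined by a proper part of some key. The following is a brute-force sketch for small schemas only (it enumerates attribute subsets), with hypothetical helper names `candidate_keys` and `is_2nf`. For a sub-relation it uses the original FDs and intersects closures with that sub-relation's attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers `attrs`."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):        # smallest sets first
        for combo in combinations(sorted(attrs), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue                          # contains a smaller key: not minimal
            if closure(k, fds) >= attrs:
                keys.append(k)
    return keys

def is_2nf(attrs, fds):
    """True if no nonprimary attribute depends on a proper part of any key."""
    attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    nonprimary = attrs - set().union(*keys)
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & nonprimary:
                    return False                  # partial dependency found
    return True

FDS = [({"X", "Y"}, {"Z", "W"}), ({"Y"}, {"Z"})]
print(candidate_keys({"X", "Y", "Z", "W"}, FDS) == [{"X", "Y"}])  # True
print(is_2nf({"X", "Y", "Z", "W"}, FDS))  # False: Y -> Z is partial on key {X, Y}
print(is_2nf({"X", "Y", "W"}, FDS))       # True
print(is_2nf({"Y", "Z"}, FDS))            # True
```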

2nd Normal Form
  Note about 2nd normal form:
    By definition, every nonprimary attribute is functionally dependent on every key of R.
    In other words, R is in 2nd normal form if we cannot find a partial dependency of a nonprimary attribute on a key of R.

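Decomposition
  Decomposition eliminates redundancy.
  To get back to the original relation: join (⋈) the decomposed relations.

  Original relation:

    EID   PID  Ename  […]  Pname         Hours
    1234  10   John   …    B2B platform  …
    1123  9    Ben    …    CRM           …
    1234  9    John   …    CRM           …
    1023  10   Susan  …    B2B platform  40

  Decomposition:

    EID   Ename
    1234  John
    1123  Ben
    1023  Susan

    EID   PID  Pname         Hours
    1234  10   B2B platform  …
    1123  9    CRM           …
    1234  9    CRM           …
    1023  10   B2B platform  40

  Foreign key: EID in the second relation references EID in the first.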

Decomposition
  Decomposition may be applied recursively.

    EID   PID  Pname         Hours
    1234  10   B2B platform  …
    1123  9    CRM           …
    1234  9    CRM           …
    1023  10   B2B platform  40

  decomposes into

    PID  Pname
    10   B2B platform
    9    CRM

    EID   PID  Hours
    1234  10   …
    1123  9    …
    1234  9    …
    1023  10   40

Unnecessary decomposition

    EID   Ename  […]
    1234  John   …
    1123  Ben    …
    1023  Susan  …

  decomposes into

    EID   Ename
    1234  John Smith
    1123  Ben Liu
    1023  Susan Sidhuk

    EID   […]
    …     …

  Fine: the join returns the original relation.
  Unnecessary: no redundancy is removed, and now EID is stored twice!

Bad decomposition
  (EID, PID, Hours) is decomposed into (EID, PID) and (EID, Hours).
  The association between PID and Hours is lost.
  The join returns more rows than the original relation.

Lossless join decomposition
  Decompose relation R into relations S and T:
    attr(R) = attr(S) ∪ attr(T)
    S = π attr(S) (R)
    T = π attr(T) (R)
  The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = S ⋈ T.
  Any decomposition gives R ⊆ S ⋈ T (why?).
  A lossy decomposition is one with R ⊂ S ⋈ T.
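
To see the difference between R = S ⋈ T and R ⊂ S ⋈ T on actual data, one can project an instance and join the pieces back. The sketch below is illustrative only: the helper names are hypothetical, rows are plain dicts, and the Hours values 30 and 20 are made up for the example because the slide text does not preserve them. It checks a single instance, whereas the definition requires the guarantee for every instance that satisfies the constraints.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    out = []
    for row in rows:
        p = {a: row[a] for a in sorted(attrs)}
        if p not in out:
            out.append(p)
    return out

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out

def rejoin(rows, attrs_s, attrs_t):
    """Project `rows` onto the two attribute sets and join the results."""
    return natural_join(project(rows, attrs_s), project(rows, attrs_t))

def same_relation(a, b):
    """Set equality for relations stored as lists of dicts."""
    return all(t in b for t in a) and all(t in a for t in b)

# Hours 30 and 20 are hypothetical values chosen for this example.
R = [
    {"EID": 1234, "PID": 10, "Pname": "B2B platform", "Hours": 30},
    {"EID": 1234, "PID": 9,  "Pname": "CRM",          "Hours": 20},
    {"EID": 1023, "PID": 10, "Pname": "B2B platform", "Hours": 40},
]

good = rejoin(R, {"PID", "Pname"}, {"EID", "PID", "Hours"})
bad = rejoin(R, {"EID", "PID", "Pname"}, {"EID", "Hours"})
print(same_relation(R, good))  # True: the join returns exactly R
print(len(bad))                # 5: two spurious tuples for EID 1234
```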

Loss? But I got more rows!
  “Loss” refers not to the loss of tuples, but to the loss of information,
  that is, the loss of the ability to distinguish different original tuples.
  (EID, PID, Hours) is decomposed into (EID, PID) and (EID, Hours).

Questions about decomposition
  When to decompose?
  How to come up with a correct decomposition (i.e., a lossless join decomposition)?

Non-key FD’s
  Consider a non-trivial FD X -> Y where X is not a super key.
  Since X is not a super key, there are some attributes (say Z) that are not functionally determined by X.

    X  Y  Z
    a  b  c
    a  b  d

  The fact that b is always associated with a is recorded in multiple rows: redundancy, update anomaly, deletion anomaly.

Dealing with non-key dependencies: BCNF
  A relation R is in Boyce-Codd Normal Form (BCNF) if
    for every non-trivial FD X -> Y in R, X is a super key.
    That is, all FDs follow from “key -> other attributes”.
  When to decompose?
    As long as some relation is not in BCNF.
  How to come up with a correct decomposition?
    Always decompose on a BCNF violation (details next).
    Then it is guaranteed to be a lossless join decomposition!

BCNF decomposition algorithm
  Find a BCNF violation,
    that is, a non-trivial FD X -> Y in R where X is not a super key of R.
  Decompose R into R1 and R2, where
    R1 has attributes X ∪ Y
    R2 has attributes X ∪ Z, where Z contains all attributes of R that are in neither X nor Y (i.e. Z = attr(R) - X - Y)
  Repeat until all relations are in BCNF.
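
The loop above can be sketched as a short recursive function. This is an illustrative brute-force version, not the slides' implementation: the names `find_violation` and `bcnf_decompose` are hypothetical, violations are searched by enumerating attribute subsets (fine only for small schemas), and the right-hand side of each violation is taken to be everything X determines inside the relation. It is run on the Property relation from the exercise later in these slides.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(attrs, fds):
    """Return (X, Y) for a non-trivial X -> Y inside `attrs` where X is not a super key."""
    attrs = set(attrs)
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            determined = closure(x, fds) & attrs   # what X determines within this relation
            if determined != x and determined != attrs:
                return x, determined - x
    return None

def bcnf_decompose(attrs, fds):
    """Split on violations until every resulting relation is in BCNF."""
    attrs = set(attrs)
    violation = find_violation(attrs, fds)
    if violation is None:
        return [attrs]
    x, y = violation
    r1 = x | y            # attributes X ∪ Y
    r2 = attrs - y        # attributes X ∪ Z, with Z = attr(R) - X - Y
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

PROPERTY = {"Property_id#", "County_name", "Lot#", "Area", "Price", "Tax_rate"}
PROPERTY_FDS = [
    ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),
    ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),
    ({"County_name"}, {"Tax_rate"}),
    ({"Area"}, {"Price"}),
]
for relation in bcnf_decompose(PROPERTY, PROPERTY_FDS):
    print(sorted(relation))
```

The order in which violations are picked can change the intermediate steps, and in general it can change the final result; for this schema the sketch ends with the same three attribute sets as the exercise solution.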

BCNF decomposition example
  WorkOn(EID, Ename, […], PID, Hours)
    BCNF violation: EID -> Ename, […]
      Student(EID, Ename, […])
      Grade(EID, PID, Hours)
    BCNF

Another example
  WorkOn(EID, Ename, […], PID, Hours)
    BCNF violation: […] -> EID
      StudentID([…], EID)                      BCNF
      StudentGrade’([…], Ename, PID, Hours)
        BCNF violation: […] -> Ename
          StudentName([…], Ename)
          Grade([…], PID, Hours)
        BCNF

Exercise
  Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate)
    Property_id# -> County_name, Lot#, Area, Price, Tax_rate
    County_name, Lot# -> Property_id#, Area, Price, Tax_rate
    County_name -> Tax_rate
    Area -> Price

Exercise
  Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate)
    BCNF violation: County_name -> Tax_rate
      LOTS1(County_name, Tax_rate)
      LOTS2(Property_id#, County_name, Lot#, Area, Price)
        BCNF violation: Area -> Price
          LOTS2A(Area, Price)
          LOTS2B(Property_id#, County_name, Lot#, Area)
  LOTS1, LOTS2A and LOTS2B are in BCNF.

Why is BCNF decomposition lossless?
  Given a non-trivial X -> Y in R where X is not a super key of R, we need to prove:
    Anything we project always comes back in the join:
      R ⊆ π XY (R) ⋈ π XZ (R)
      Sure; and it doesn’t depend on the FD.
    Anything that comes back in the join must be in the original relation:
      R ⊇ π XY (R) ⋈ π XZ (R)
      The proof makes use of the fact that X -> Y.
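
The slide only hints at the second direction. One way to fill in the argument, offered as a sketch rather than the proof given in the lecture, is the following, where $u$ and $v$ are names introduced here for two tuples of $R$:

```latex
Let $t \in \pi_{XY}(R) \bowtie \pi_{XZ}(R)$. By the definition of projection
and natural join there are tuples $u, v \in R$ with
\begin{align*}
u[XY] &= t[XY], & v[XZ] &= t[XZ].
\end{align*}
Both agree with $t$ on $X$, so $u[X] = v[X]$. Since $X \to Y$ holds in $R$,
\begin{align*}
v[Y] &= u[Y] = t[Y].
\end{align*}
Hence $v$ agrees with $t$ on $X$, $Y$ and $Z$, which together are all the
attributes of $R$. Therefore $t = v \in R$, and so
$\pi_{XY}(R) \bowtie \pi_{XZ}(R) \subseteq R$.
```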

Recap
  Functional dependencies: a generalization of the key concept
  Partial dependencies: a source of redundancy
    Use 2nd normal form to remove partial dependencies
  Non-key functional dependencies: a source of redundancy
    BCNF decomposition: a method for removing ALL functional-dependency-related redundancies
    Plus, BCNF decomposition is a lossless join decomposition