Schema Refinement & Normalization Theory

Slides:



Advertisements
Similar presentations
primary key constraint foreign key constraint
Advertisements

Schema Refinement and Normal Forms
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
CS Schema Refinement and Normal Forms Chapter 19.
Normalization DB Tuning CS186 Final Review Session.
Normalization DB Tuning CS186 Final Review Session.
Schema Refinement and Normal Forms
H.Lu/HKUST L02: Logical Database Design  Introduction  Functional Dependencies  Normal Forms  Normalization by decomposition.
1 Normalization Chapter What it’s all about Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable.
1 CMSC424, Spring 2005 CMSC424: Database Design Lecture 9.
Schema Refinement and Normal Forms. The Evils of Redundancy v Redundancy is at the root of several problems associated with relational schemas: – redundant.
1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow.
1 Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
1 Database Theory and Methodology. 2 The Good and the Bad So far we have not developed any measure of “goodness” to measure the quality of the design,
FUNCTIONAL DEPENDENCIES. Chapter Outline 1 Informal Design Guidelines for Relational Databases 1.1Semantics of the Relation Attributes 1.2 Redundant Information.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Schema Refinement and Normal Forms Chapter 19 Instructor: Mirsad Hadzikadic.
Schema Refinement and Normal Forms Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
CSCD34 - Data Management Systems - A. Vaisman1 Schema Refinement and Normal Forms.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
1 Schema Refinement, Normalization, and Tuning. 2 Design Steps v The design steps: 1.Real-World 2. ER model 3. Relational Schema 4. Better relational.
LECTURE 5: Schema Refinement and Normal Forms SOME OF THESE SLIDES ARE BASED ON YOUR TEXT BOOK.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
R EVIEW. 22 Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
Functional Dependencies And Normalization R&G Chapter 19.
Database Systems/COMP4910/Spring02/Melikyan1 Schema Refinement and Normal Forms.
1 IT 244 Database Management System Topic 7 The Relational Data Model Functional Dependencies Ref : -A First Course in Database System (Jeffrey D Ullman,
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 Schema Refinement and Normal Forms Week 6. 2 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependencies. 2 Motivation v E/R  Relational translation problems : –Often discover more “detailed” constraints after translation (upcoming.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 15.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
ICS 421 Spring 2010 Relational Model & Normal Forms Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 1/19/20101Lipyeow.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy is at the root of several problems associated with relational.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy causes several problems associated with relational schemas: 
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 CS122A: Introduction to Data Management Lecture #12: Relational DB Design Theory (1) Instructor: Chen Li.
LECTURE 5: Schema Refinement and Normal Forms 1. Redundency Problem Redundent Storage Update Anomaly If one copy updated, an inconsistancy may occur Insertion.
1 CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement & Normal Forms.
Functional Dependency and Normalization
Functional Dependencies and Schema Refinement
Introduction to Database Systems Functional Dependencies
Schema Refinement and Normal Forms
UNIT - 3 DATABASE DESIGN Functional Dependencies
Functional Dependencies, BCNF and Normalization
Functional Dependencies, BCNF and Normalization
Functional Dependencies
Schema Refinement and Normal Forms
LECTURE 5: Schema Refinement and Normal Forms
Functional Dependencies
Schema Refinement and Normalization
Normalization cs3431.
Chapter 19 (part 1) Functional Dependencies
Schema Refinement and Normal Forms
Schema Refinement and Normalization
Schema Refinement and Normal Forms
Relational Database Theory
CSC 453 Database Systems Lecture
Presentation transcript:

Schema Refinement & Normalization Theory Lecture Week 11 INFS-614 Fall 2008 1

What’s the Problem Consider the relation (call it SNLRWH) Hourly_Emps(ssn, name, lot, rating, hrly_wages, hrs_worked) What if we know rating determines hrly_wages?

Redundancy When part of data can be derived from other parts, we say redundancy exists. Example: the hrly_wage of Smiley can be derived from the hrly_wage of Attishoo because they have the same rating and we know rating determines hrly_wage. Redundancy exists because of the existence of integrity constraints.

What’s the problem, again Update anomaly: Can we change W in just the 1st tuple of SNLRWH? Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating? Deletion anomaly: If we delete all employees with rating 5, we loose the information about the wage for rating 5!

What do we do? Since constraints, in particular functional dependencies, cause problems, we need to study them, and understand when and how they cause redundancy. When redundancy exists, refinement is needed. Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD). Decomposition should be used judiciously: Is there a reason to decompose a relation? Normal forms; What problems (if any) does the decomposition cause? Properties of decompositions (lossless-join and dependency-preservation.) 14

Refining an ER Diagram Before: lot dname budget did since name Works_In Departments Employees ssn Before: 1st diagram translated: Workers(S,N,L,D,S) Departments(D,M,B) Lots associated with workers. Suppose all workers in a dept are assigned the same lot. Can fine-tune this: Workers2(S,N,D,S) Departments(D,M,B,L) lot dname budget did since name Works_In Departments Employees ssn After: 18

Functional Dependencies (FDs) Functional dependencies allows us to define new IC: The value of certain attributes depend on the values of a set of attributes in the same relation Example: Emps (ssn, name, lot, rank, salary) rank  salary rank  lot

Functional Dependencies (FDs) A functional dependency (FD) has the form: XY, where X and Y are two sets of attributes. Examples: Ratinghrly_Wage, AB C The FD XY is satisfied by a relation instance r if: for each pair of tuples t1 and t2 in r: t1[X] = t2[X] implies t1[Y] =t2[Y] i.e., given any two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.) Convention: X, Y, Z etc denote sets of attributes, and A, B, C, etc denote attributes. 15

Functional Dependencies (FDs) The FD holds over relation name R if, for every allowable instance r of R, r satisfies the FD. An FD, as an integrity constraint, is a statement about all allowable relation instances. Must be identified based on semantics of application. Given some instance r1 of R, we can check if it violates some FD f or not But we cannot tell if f holds over R by looking at an instance! This is the same for all integrity constraints!

Example: Constraints on Entity Set Consider relation obtained from Hourly_Emps: Hourly_Emps (ssn, name, lot, rating, hrly_wages, hrs_worked) Notation: We will denote this relation schema by listing the attributes: SNLRWH This is really the set of attributes {S,N,L,R,W,H}. Sometimes, we will refer to all attributes of a relation by using the relation name. (e.g., Hourly_Emps for SNLRWH) Some FDs on Hourly_Emps: ssn is the key: S  SNLRWH rating determines hrly_wages: R  W 16

Refining an ER Diagram R  W FD : Constraints on Entity Set hrs_worked rank hrly_wages lot name Employees ssn A better E-R design name lot ssn hrs_worked rank hrly_wages Employees has-rank Ranks

Decomposition: Example R W 8 10 5 7

One more example AA yes AB AC No AAB AAC ABC AABC A B C 1 2 3 FDs with A as the left side: Satisfied by the relation instance? AA yes AB AC No AAB AAC ABC AABC A B C 1 2 3 How many possible FDs totally on this relation instance? 49.

Violation of FD by a relation The FD XY is NOT satisfied by a relation instance r if: There exists a pair of tuples t1 and t2 in r such that t1[X] = t2[X] but t1[y]  t2[y] i.e., we can find two tuples in r, such that X values agree, but Y values don’t.

Some other FDs A B C 1 2 3 CB yes CAB No BC No [note!] BB Yes Satisfied by the relation instance? CB yes CAB No BC No [note!] BB Yes AC B Yes [note!] …

Relationship between FDs and Keys Given R(A, B, C). AABC means that A is a key. In general, X  R means X is a (super)key. lot dname budget did since name Works_In Departments Employees ssn

Reasoning About FDs Given some FDs, we can usually infer additional FDs: Employees(ssn, name, lot, did, since) ssn name,lot,did,since ssn did, did  lot implies ssn lot A  BC implies A  B An FD f is logically implied by a set of FDs F if f holds whenever all FDs in F hold. F+ = closure of F is the set of all FDs that are implied by F. 19

Reasoning about FDs How do we get all the FDs that are logically implied by a given set of FDs? Armstrong’s Axioms (X, Y, Z are sets of attributes): Reflexivity: If X  Y, then X  Y Augmentation: If X  Y, then XZ  YZ for any Z Transitivity: If X  Y and Y  Z, then X  Z

Armstrong’s axioms Armstrong’s axioms are sound and complete inference rules for FDs! Sound: all the derived FDs (by using the axioms) are those logically implied by the given set Complete: all the logically implied (by the given set) FDs can be derived by using the axioms.

Example of using Armstrong’s Axioms Couple of additional rules (that follow from AA): Union: If X  Y and X  Z, then X  YZ Decomposition: If X  YZ, then X  Y and X  Z Derive the above two by using Armstrong’s axioms!

Reasoning About FDs (Contd.) Example: Contracts(contractid,supplierid,projectid,deptid,partid,qty,value), and: C is the key: C  CSJDPQV A project (jid) purchases a given part using a single contract: JP  C Dept purchases at most one part from a supplier: SD  P JP  C, C  CSJDPQV imply JP  CSJDPQV SD  P implies SDJ  JP SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV 20

Reasoning About FDs (Contd.) Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in # attrs!) Typically, we just want to check if a given FD X Y is in the closure of a set of FDs F. An efficient check: Compute attribute closure of X (denoted X+) wrt F: Set of all attributes A such that X  A is in F+ There is a linear time algorithm to compute this. Check if Y is in X+ Does F = {A  B, B  C, C D  E } imply A  E? i.e, is A  E in the closure F+? Equivalently, is E in A+? 21

Computing X+ Input F (a set of FDs), and X (a set of attributes) Output: Result=X+ (under F) Method: Step 1: Result :=X; Step 2: Take Y  Z in F, and Y is in Result, do: Result := Result union Z Repeat step 2 until Result cannot be changed and then output Result.

Example of computing X+ F={A B, AC D, AB C} X=A Result should be X+=ABCD

Computing F+ Given F={ A  B, B  C}. Compute F+ (with attributes A, B, C). A B C AB AC BC ABC  Attribute closure A+=ABC B+=BC C+=C AB+=ABC AC+=ABC BC+=BC ABC+=ABC An entry with  means FD (the row)  (the column) is in F+. An entry gets  when (the column) is in (the row)+

Check if two sets of FDs are equivalent Two sets F and G of FDs are equivalent if they logically imply the same set of FDs. I.e., if F+ = G+, then they are equivalent. For example, F={A B, A C} is equivalent to G={A  BC} How to test? Two steps: Every FD in F is in G+ Every FD in G is in F+ These two steps can use the algorithm (many times) for X+

Equivalences of Sets of FDs Two set of FD’s G and F are equivalent if : G+= F+ Every FD in G can be inferred from F, and vice versa Testing Equivalence of FD Sets For each FD’s X  Y in G, check if X  Y  F+ holds Compute the closure X+ w.r.t. F If Y  X+ then X  Y  F+ For each FD’s W Z in F, check if W  Z  G+ holds Compute the closure W+ w.r.t. G If Z  W+ then W  Z  G+

Example Given F = {BD  E , C  A , E  C, A  E} and G = {C  E ,BD  C}. Are G and F equivalent ? For each FD’s X  Y in G, X  Y  F+ holds ? CF+ = { A,C, E} : C  E  F + (BD)F+ = { B,D,E,C,A} : BD  C  F + For each FD’s W Z in F, W  Z  G+ holds ? (BD)G+ = { B,D,C,E} : BD  E  G + (C)G+ = {C,E} : C  A G + F and G are not equivalent!

Summary Constraints give rise to redundancy Three anomalies FD is a “popular” type of constraint Satisfaction & violation Logical implication Reasoning 9