Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.

Slides:



Advertisements
Similar presentations
Schema Refinement: Normal Forms
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Lecture 21 CS 157 B Revision of Midterm3 Prof. Sin-Min Lee.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
1 Chapter 6 Relational Normalization Theory. 2 Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does.
1 Relational Normalization Theory Chapter 8. 2 Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does.
Functional Dependencies and Normalization for Relational Databases.
CSCI 4333 Database Design and Implementation Review for Midterm Exam II Xiang Lian The University of Texas – Pan American Edinburg, TX 78539
Lossless Decomposition (2) Prof. Sin-Min Lee Department of Computer Science San Jose State University.
1 Relational Normalization Theory. 2 Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
Relational Normalization Theory
Murali Mani Normalization. Murali Mani What and Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert,
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
Lossless Decomposition (2) Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. Relational Database Design - part 2 - Database Management Systems I Alex Coman,
1 Database Design Theory Which tables to have in a database Normalization.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
1 Triggers: Correction. 2 Mutating Tables (Explanation) The problems with mutating tables are mainly with FOR EACH ROW triggers STATEMENT triggers can.
Introduction to Schema Refinement
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
CSCD34 - Data Management Systems - A. Vaisman1 Schema Refinement and Normal Forms.
NormalizationNormalization Chapter 4. Purpose of Normalization Normalization  A technique for producing a set of relations with desirable properties,
Lecture 6 Normalization: Advanced forms. Objectives How inference rules can identify a set of all functional dependencies for a relation. How Inference.
Your name here. Improving Schemas and Normalization What are redundancies and anomalies? What are functional dependencies and how are they related to.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Chapter 6 Database Design with the Relational Normalization Theory.
11/07/2003Akbar Mokhtarani (LBNL)1 Normalization of Relational Tables Akbar Mokhtarani LBNL (HENPC group) November 7, 2003.
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
1 Functional Dependencies and Normalization Chapter 15.
Database Systems/COMP4910/Spring02/Melikyan1 Schema Refinement and Normal Forms.
1 Schema Refinement and Normal Forms Week 6. 2 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:
© D. Wong Ch. 3 (continued)  Database design problems  Functional Dependency  Keys of relations  Decompositions based on Functional Dependency.
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Design Process - Where are we?
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Normalization. 2 u Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data,
Normalization.
Chapter 7 Functional Dependencies Copyright © 2004 Pearson Education, Inc.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy is at the root of several problems associated with relational.
IST Database Normalization Todd Bacastow IST 210.
Ch 7: Normalization-Part 1
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
ITD1312 Database Principles Chapter 4C: Normalization.
Copyright © Curt Hill Schema Refinement II 2 nd NF to 3 rd NF to BCNF.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
CSCI 6315 Applied Database Systems Review for Midterm Exam II Xiang Lian The University of Texas Rio Grande Valley Edinburg, TX 78539
Chapter 10 Functional Dependencies and Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 Normalization Theory. 2 Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide a way.
11/16/2016CENG352 Database Management Systems1 Relational Normalization Theory.
Relational Database Design by Dr. S. Sridhar, Ph. D
Normalization Murali Mani.
Functional Dependencies and Normalization
Relational Normalization Theory
Normalization.
Presentation transcript:

Relational Normalization Theory

Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide a way of evaluating alternative schemas Normalization theory provides a mechanism for analyzing and refining the schema produced by an E-R design

Redundancy Dependencies between attributes cause redundancy –Ex. All addresses in the same town have the same zip code SSN Name Town Zip 1234 Joe Stony Brook Mary Stony Brook Tom Stony Brook …………………. Redundancy

Redundancy and Other Problems Set valued attributes in the E-R diagram result in multiple rows in corresponding table PersonExample: Person (SSN, Name, Address, Hobbies) Person –A person entity with multiple hobbies yields multiple rows in table Person Hence, the association between Name and Address for the same person is stored redundantly –SSN is key of entity set, but (SSN, Hobby) is key of corresponding relation PersonThe relation Person can’t describe people without hobbies

Example SSN Name Address Hobby 1111 Joe 123 Main biking 1111 Joe 123 Main hiking ……………. SSN Name Address Hobby 1111 Joe 123 Main {biking, hiking} ER Model Relational Model Redundancy

Anomalies Redundancy leads to anomalies: –Update anomaly: A change in Address must be made in several places –Deletion anomaly: Suppose a person gives up all hobbies. Do we: Set Hobby attribute to null? No, since Hobby is part of key Delete the entire row? No, since we lose other information in the row –Insertion anomaly: Hobby value must be supplied for any inserted row since Hobby is part of key

Decomposition PersonSolution: use two relations to store Person information – Person1 – Person1 (SSN, Name, Address) – Hobbies – Hobbies (SSN, Hobby) The decomposition is more general: people with hobbies can now be described No update anomalies: – Name and address stored once – A hobby can be separately supplied or deleted

Normalization Theory Result of E-R analysis need further refinement Appropriate decomposition can solve problems normalization theory functional dependencies multivalued dependenciesThe underlying theory is referred to as normalization theory and is based on functional dependencies (and other kinds, like multivalued dependencies)

Functional Dependencies functional dependency Definition: A functional dependency (FD) on a relation schema R is a constraint X  Y, where X and Y are subsets of attributes of R. satisfied Definition: An FD X  Y is satisfied in an instance r of R if for every pair of tuples, t and s: if t and s agree on all attributes in X then they must agree on all attributes in Y – Key constraint is a special kind of functional dependency: all attributes of relation occur on the right-hand side of the FD: SSN  SSN, Name, Address

Functional Dependencies Address  ZipCode – Stony Brook’s ZIP is ArtistName  BirthYear – Picasso was born in 1881 Author, Title  PublDate – Shakespeare’s Hamlet published in 1600

Functional Dependency - Example Brokerage firm allows multiple clients to share an account, but each account is managed from a single office and a client can have no more than one account in an office –HasAccount –HasAccount (AcctNum, ClientId, OfficeId) keys are (ClientId, OfficeId), (AcctNum, ClientId) –Client, OfficeId  AcctNum –AcctNum  OfficeId Thus, attribute values need not depend only on key values

Armstrong’s Axioms for FDs This is the syntactic way of computing/testing the various properties of FDs Reflexivity: If Y  X then X  Y (trivial FD) –Name, Address  Name Augmentation: If X  Y then X Z  YZ –If Town  Zip then Town, Name  Zip, Name Transitivity: If X  Y and Y  Z then X  Z

Normal Forms Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies) First normal form (1NF) is the same as the definition of relational model (relations = sets of tuples; each tuple = sequence of atomic values) Second normal form (2NF) – Not used. third normal formBoyce-Codd normal form The two commonly used normal forms are third normal form (3NF) and Boyce-Codd normal form (BCNF)

BCNF Definition: A relation schema R is in BCNF if for every FD X  Y associated with R either – Y  X (i.e., the FD is trivial) or – X is a superkey of R Person1 Example: Person1(SSN, Name, Address) – The only FD is SSN  Name, Address Person1 – Since SSN is a key, Person1 is in BCNF

(non) BCNF Examples PersonPerson (SSN, Name, Address, Hobby) –The FD SSN  Name, Address does not satisfy requirements of BCNF since the key is (SSN, Hobby) HasAccountHasAccount (AccountNumber, ClientId, OfficeId) –The FD AcctNum  OfficeId does not satisfy BCNF requirements since keys are (ClientId, OfficeId) and (AcctNum, ClientId)

Third Normal Form A relational schema R is in 3NF if for every FD X  Y associated with R either: – Y  X (i.e., the FD is trivial); or – X is a superkey of R; or – Every A  Y is part of some key of R 3NF is weaker than BCNF (every schema that is in BCNF is also in 3NF) BCNF conditions

Redundancy Suppose R has a FD A  B. If an instance has 2 rows with same value in A, they must also have same value in B (=> redundancy, if the A-value repeats twice) If A is a superkey, there cannot be two rows with same value of A – Hence, BCNF eliminates redundancy SSN  Name, Address SSN Name Address Hobby 1111 Joe 123 Main stamps 1111 Joe 123 Main coins redundancy

3NF Example HasAccount HasAccount (AcctNum, ClientId, OfficeId) – ClientId, OfficeId  AcctNum OK since LHS contains a key – AcctNum  OfficeId OK since RHS is part of a key HasAccount HasAccount is in 3NF but it might still contain redundant information due to AcctNum  OfficeId (which is not allowed by BCNF)

3NF Example HasAccount HasAccount might store redundant data: ClientId OfficeId AcctNum 1111 Stony Brook Stony Brook Stony Brook ClientId AcctNum BCNF (only trivial FDs) Decompose to eliminate redundancy: OfficeId AcctNum Stony Brook BCNF: AcctNum is key FD: AcctNum  OfficeId 3NF: OfficeId part of key FD: AcctNum  OfficeId redundancy

3NF (Non) Example PersonPerson (SSN, Name, Address, Hobby) – (SSN, Hobby) is the only key. – SSN  Name violates 3NF conditions since Name is not part of a key and SSN is not a superkey

Lossy Decomposition SSN Name Address SSN Name Name Address 1111 Joe 1 Pine 1111 Joe Joe 1 Pine 2222 Alice 2 Oak 2222 Alice Alice 2 Oak 3333 Alice 3 Pine 3333 Alice Alice 3 Pine The tuples (2222, Alice, 3 Pine) and (3333, Alice, 2 Oak) are in the join, but not in the original

Lossy Decompositions: What is Actually Lost? In the previous example, the tuples (2222, Alice, 3 Pine) and (3333, Alice, 2 Oak) were gained, not lost! –Why do we say that the decomposition was lossy? What was lost is information: –That 2222 lives at 2 Oak: In the decomposition, 2222 can live at either 2 Oak or 3 Pine –That 3333 lives at 3 Pine: In the decomposition, 3333 can live at either 2 Oak or 3 Pine

Dependency Preserving Decomposition Schema (R, F) where R = {SSN, Name, Address, Hobby} F = {SSN  Name, Address} can be decomposed into R 1 = {SSN, Name, Address} F 1 = {SSN  Name, Address} and R 2 = {SSN, Hobby} F 2 = { } Since F = F 1  F 2 the decomposition is dependency preserving

Algorithms BCNF decomposition algorithm –Dependencies are not always preserved –Lossless (information is not lost) 3NF Synthesis algorithm –Dependencies are always preserved –Lossless

Normalization Drawbacks By limiting redundancy, normalization helps maintain consistency and saves space But performance of querying can suffer because related information that was stored in a single relation is now distributed among several Example: A join is required to get the names and grades of all students taking CS305 in S2002. SELECT S.Name, T.Grade StudentTranscript FROM Student S, Transcript T WHERE S.Id = T.StudId AND T.CrsCode = ‘CS305’ AND T.Semester = ‘S2002’

Denormalization Tradeoff: Judiciously introduce redundancy to improve performance of certain queries Transcript Example: Add attribute Name to Transcript –Join is avoided Transcript –If queries are asked more frequently than Transcript is modified, added redundancy might improve average performance Transcript ’ –But, Transcript ’ is no longer in BCNF since key is (StudId, CrsCode, Semester) and StudId  Name SELECT T.Name, T.Grade Transcript ’ FROM Transcript ’ T WHERE T.CrsCode = ‘CS305’ AND T.Semester = ‘S2002’

Fourth Normal Form Relation has redundant data Yet it is in BCNF (since there are no non-trivial FDs) Redundancy is due to set valued attributes (in the E-R sense), not because of the FDs SSN PhoneN ChildSSN redundancy Person