CS 405G: Introduction to Database Systems Database Normalization.

Slides:



Advertisements
Similar presentations
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
Advertisements

NORMALIZATION. Normalization Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Functional Dependencies and Normalization for Relational Databases.
Lossless Decomposition (2) Prof. Sin-Min Lee Department of Computer Science San Jose State University.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
Functional Dependency CS157a Sec. 2 Koichiro Hongo.
Chapter 8 Normal Forms Based on Functional Dependencies Deborah Costa Oct 18, 2007.
Database Design Conceptual –identify important entities and relationships –determine attribute domains and candidate keys –draw the E-R diagram Logical.
Boyce-Codd Normal Form Kelvin Nishikawa SE157a-03 Fall 2006 Kelvin Nishikawa SE157a-03 Fall 2006.
Database Normalization Il-Han Yoo CS 157A Professor: Sin-Min Lee.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Databases 6: Normalization
M.P. Johnson, DBMS, Stern/NYU, Sp20041 C : Database Management Systems Lecture #6 Matthew P. Johnson Stern School of Business, NYU Spring, 2004.
Introduction to Schema Refinement
Normalization B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College.
IT420: Database Management and Organization Normalization 31 January 2006 Adina Crăiniceanu
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
RDBMS Concepts/ Session 3 / 1 of 22 Objectives  In this lesson, you will learn to:  Describe data redundancy  Describe the first, second, and third.
CS 405G: Introduction to Database Systems 18. Normal Forms and Normalization.
Database Management COP4540, SCS, FIU Relation Normalization (Chapter 14)
Announcements Read 5.8 – 5.13 for Monday Project Step 3, due Monday 10/18 Homework 4, due Friday 10/15 – by (or turn in Monday in class)
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Normalization for Relational Databases.
Your name here. Improving Schemas and Normalization What are redundancies and anomalies? What are functional dependencies and how are they related to.
DatabaseIM ISU1 Chapter 10 Functional Dependencies and Normalization for RDBs Fundamentals of Database Systems.
Schema Refinement and Normal Forms 20131CS3754 Class Notes #7, John Shieh.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Normal Forms through BCNF CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
Functional Dependencies and Normalization for Relational Databases.
1 5 Normalization. 2 5 Database Design Give some body of data to be represented in a database, how do we decide on a suitable logical structure for that.
By Abdul Rashid Ahmad. E.F. Codd proposed three normal forms: The first, second, and third normal forms 1NF, 2NF and 3NF are based on the functional dependencies.
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
CS 405G: Introduction to Database Systems 25 Exercise Chen Qian University of Kentucky.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#16: Schema Refinement & Normalization.
In this session, you will learn to: Describe data redundancy Describe the first, second, and third normal forms Describe the Boyce-Codd Normal Form Appreciate.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
CS 405G: Introduction to Database Systems
Normalization. 2 u Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data,
Normalization.
Chapter 5.1 and 5.2 Brian Cobarrubia Database Management Systems II January 31, 2008.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Ch 7: Normalization-Part 1
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
Al-Imam University Girls Education Center Collage of Computer Science 1 nd Semester, 1432/1433H Chapter 10_part2 Functional Dependencies and Normalization.
Copyright © Curt Hill Schema Refinement II 2 nd NF to 3 rd NF to BCNF.
Advanced Database System
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
CS 405G: Introduction to Database Systems 13b Exercise Chen Qian University of Kentucky.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
Functional Dependencies and Normalization for Relational Databases تنبيه : شرائح العرض (Slides) هي وسيلة لتوضيح الدرس واداة من الادوات في ذلك. حيث المرجع.
4NF & MULTIVALUED DEPENDENCY By Kristina Miguel. Review  Superkey – a set of attributes which will uniquely identify each tuple in a relation  Candidate.
COP 6726: New Directions in Database Systems
Normalization Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, Chapter 19.
CS 505: Intermediate Topics to Database Systems
Functional Dependency and Normalization
CS422 Principles of Database Systems Normalization
Schedule Today: Next After that Normal Forms. Section 3.6.
A brief summary of database normalization
3.1 Functional Dependencies
CS 405G: Introduction to Database Systems
Presentation transcript:

CS 405G: Introduction to Database Systems Database Normalization

 Database normalization relates to the level of redundancy in a relational database’s structure.  The key idea is to reduce the chance of having multiple different version of the same data.  Well-normalized databases have a schema that reflects the true dependencies between tracked quantities.  Any increase in normalization generally involves splitting existing tables into multiple ones, which must be re-joined each time a query is issued. Database Normalization

2/5/2016Jinze University of Kentucky3 Normalization A normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. A normal form is a certification that tells whether a relation schema is in a particular state

Normal Forms Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. 3NF is widely considered to be sufficient. Normalizing beyond 3NF can be tricky with current SQL technology as of 2005 Full normalization is considered a good exercise to help discover all potential internal database consistency problems.

First Normal Form ( 1NF ) “What is your favorite color?” “What food will you not eat?” TABLE 1 Person / Favorite Color Bob / blue Jane / green TABLE 2 Person / Foods Not Eaten Bob / okra Bob / brussel sprouts Jane / peas

More examples. 2/5/2016Jinze University of Kentucky6

2/5/2016Jinze University of Kentucky7 2 nd Normal Form An attribute A of a relation R is a nonprimary attribute if it is not part of any key in R, otherwise, A is a primary attribute. R is in (general) 2 nd normal form if every nonprimary attribute A in R is not partially functionally dependent on any key of R XYZW abce bbcf cbcg X, Y -> Z, W Y -> Z (X, Y, W) (Y, Z) 

2/5/2016Jinze University of Kentucky8 2 nd Normal Form Note about 2 nd Normal Form by definition, every nonprimary attribute is functionally dependent on every key of R In other words, R is in its 2 nd normal form if we could not find a partial dependency of a nonprimary key to a key in R.

Second normal Form ( 2NF ) 2NF prescribes full functional dependency on the primary key. It most commonly applies to tables that have composite primary keys, where two or more attributes comprise the primary key. It requires that there are no non-trivial functional dependencies of a non-key attribute on a part (subset) of a candidate key. A table is said to be in the 2NF if and only if it is in the 1NF and every non-key attribute is irreducibly dependent on the primary key

2NF Example PART_NUMBER (PRIMARY KEY) SUPPLIER_NAME (PRIMARY KEY) PRICE SUPPLIER_ADDRESS The PART_NUMBER and SUPPLIER_NAME form the composite primary key. SUPPLIER_ADDRESS is only dependent on the SUPPLIER_NAME, and therefore this table breaks 2NF.

2/5/2016Jinze University of Kentucky11 Decomposition Decomposition eliminates redundancy To get back to the original relation:  EIDPIDEname PnameHours John platform Ben 12349John Susan platform40 Decomposition EIDEname 1234John 1123Ben 1023Susan EIDPIDPnameHours B2B platform CRM CRM B2B platform40 Foreign key

2/5/2016Jinze University of Kentucky12 Decomposition Decomposition may be applied recursively EIDPIDPnameHours B2B platform CRM CRM B2B platform40 PIDPname 10B2B platform 9CRM EIDPIDHours

2/5/2016Jinze University of Kentucky13 Unnecessary decomposition Fine: join returns the original relation Unnecessary: no redundancy is removed, and now EID is stored twice-> EIDEname 1234John 1123Ben 1023Susan EIDEname 1234John Smith 1123Ben Liu 1023Susan Sidhuk EID

2/5/2016Jinze University of Kentucky14 Bad decomposition Association between PID and hours is lost Join returns more rows than the original relation EIDPIDHours EIDPID EIDHours

2/5/2016Jinze University of Kentucky15 Lossless join decomposition Decompose relation R into relations S and T attrs(R) = attrs(S) attrs(T) S = π attrs(S) ( R ) T = π attrs(T) ( R ) The decomposition is a lossless join decomposition if, given known constraints such as FD’s, we can guarantee that R = S  T Any decomposition gives R S T (why?) A lossy decomposition is one with R S T

2/5/2016Jinze University of Kentucky16 Loss? But I got more rows-> “Loss” refers not to the loss of tuples, but to the loss of information Or, the ability to distinguish different original tuples EIDPIDHours EIDPID EIDHours

2/5/2016Jinze University of Kentucky17 Questions about decomposition When to decompose How to come up with a correct decomposition (i.e., lossless join decomposition)

Third normal form 3NF requires that there are no non-trivial functional dependencies of non-key attributes on something other than a superset of a candidate key. In summary, all non-key attributes are mutually independent.

Boyce-Codd normal form (BCNF) BCNF requires that there are no non-trivial functional dependencies of attributes on something other than a superset of a candidate key (called a superkey). All attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A).

A table is said to be in the BCNF if and only if it is in the 3NF and every non-trivial, left- irreducible functional dependency has a candidate key as its determinant. In more informal terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.

2/5/2016Jinze University of Kentucky21 Non-key FD’s Consider a non-trivial FD X -> Y where X is not a super key Since X is not a super key, there are some attributes (say Z) that are not functionally determined by X That b is always associated with a is recorded by multiple rows: redundancy, update anomaly, deletion anomaly XYZ abc abd

2/5/2016Jinze University of Kentucky22 Dealing with Nonkey Dependency: BCNF A relation R is in Boyce-Codd Normal Form if For every non-trivial FD X -> Y in R, X is a super key That is, all FDs follow from “key -> other attributes” When to decompose As long as some relation is not in BCNF How to come up with a correct decomposition Always decompose on a BCNF violation (details next)  Then it is guaranteed to be a lossless join decomposition

2/5/2016Jinze University of Kentucky23 BCNF decomposition algorithm Find a BCNF violation That is, a non-trivial FD X -> Y in R where X is not a super key of R Decompose R into R 1 and R 2, where R 1 has attributes X Y R 2 has attributes X Z, where Z contains all attributes of R that are in neither X nor Y (i.e. Z = attr(R) – X – Y) Repeat until all relations are in BCNF

2/5/2016Jinze University of Kentucky24 BCNF decomposition example WorkOn (EID, Ename, , PID, hours) BCNF violation: EID -> Ename, Student (EID, Ename, ) Grade (EID, PID, hours) BCNF

2/5/2016Jinze University of Kentucky25 Another example WorkOn (EID, Ename, , PID, hours) BCNF violation: -> EID StudentID ( , EID) StudentGrade’ ( , Ename, PID, hours) BCNF BCNF violation: -> Ename StudentName ( , Ename) Grade ( , PID, hours) BCNF

2/5/2016Jinze University of Kentucky26 Exercise Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate) Property_id# -> County_name, Lot#, Area, Price, Tax_rate County_name, Lot# -> Property_id#, Area, Price, Tax_rate County_name -> Tax_rate area -> Price

2/5/2016Jinze University of Kentucky27 Exercise Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate) BCNF violation: County_name -> Tax_rate LOTS1 (County_name, Tax_rate ) LOTS2 (Property_id#, County_name, Lot#, Area, Price) BCNF violation: Area -> Price LOTS2A (Area, Price) LOTS2B (Property_id#, County_name, Lot#, Area) BCNF

2/5/2016Jinze University of Kentucky28 Why is BCNF decomposition lossless Given non-trivial X -> Y in R where X is not a super key of R, need to prove: Anything we project always comes back in the join: R π XY ( R ) π XZ ( R ) Sure; and it doesn’t depend on the FD Anything that comes back in the join must be in the original relation: R π XY ( R ) π XZ ( R ) Proof makes use of the fact that X -> Y

2/5/2016Jinze University of Kentucky29 Recap Functional dependencies: a generalization of the key concept Partial dependencies: a source of redundancy Use 2 nd Normal form to remove partial dependency Non-key functional dependencies: a source of redundancy BCNF decomposition: a method for removing ALL functional dependency related redundancies Plus, BNCF decomposition is a lossless join decomposition