Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Database design and normalization.

Slides:



Advertisements
Similar presentations
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #17: Schema Refinement & Normalization - Normal Forms (R&G,
Advertisements

Schema Refinement: Normal Forms
Normalisation to 3NF Database Systems Lecture 11 Natasha Alechina.
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Functional Dependencies (based on notes by Silberchatz,Korth, and.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Integrity Constraints.
NORMALIZATION FIRST NORMAL FORM (1NF): A relation R is in 1NF if all attributes have atomic value = one value for an attribute = no repeating groups =
Normalisation The theory of Relational Database Design.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
Database Design Conceptual –identify important entities and relationships –determine attribute domains and candidate keys –draw the E-R diagram Logical.
CMSC424: Database Design Instructor: Amol Deshpande
1 Database Design Theory Which tables to have in a database Normalization.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Databases 6: Normalization
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design.
Introduction to Schema Refinement
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Normalization B Database Systems Normal Forms Wilhelm Steinbuss Room G1.25, ext. 4041
Daniel AdinugrohoDatabase Programming 1 DATABASE PROGRAMMING Lecture on 29 – 04 – 2005.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
Database Management COP4540, SCS, FIU Relation Normalization (Chapter 14)
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Normalization for Relational Databases.
Objectives of Normalization Develop a good description of the data, its relationships and constraints Produce a stable set of relations that Is a faithful.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Rel. model - SQL part3.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
1 Dept. of CIS, Temple Univ. CIS616/661 – Principles of Data Management V. Megalooikonomou Integrity Constraints (based on slides by C. Faloutsos at CMU)
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Relational model.
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
Design Process - Where are we?
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Databases Illuminated Chapter 6 Normalization. Objectives of Normalization Develop a good description of the data, its relationships and constraints Produce.
Databases Illuminated
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CS 405G: Introduction to Database Systems Database Normalization.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 Dept. of CIS, Temple Univ. CIS661 – Principles of Data Management V. Megalooikonomou Database design and normalization (based on slides by C. Faloutsos.
11/06/97J-1 Principles of Relational Design Chapter 12.
Copyright © Curt Hill Schema Refinement II 2 nd NF to 3 rd NF to BCNF.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
1 First Normal Form (1NF) Unnormalized table : Contains a repeating group –Eg: from multi-valued attributes –Eg: from many-many relationship Table in 1NF:
Relational Data Model, Review Relation Tuple Attribute Domains Candidate key, primary key Key attribute, non-key attribute.
Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
Database Design Dr. M.E. Fayad, Professor
Relational Database Design by Dr. S. Sridhar, Ph. D
Faloutsos & Pavlo SCS /615 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization.
Lecture #17: Schema Refinement & Normalization - Normal Forms
Functional Dependencies and Normalization
Temple University – CIS Dept. CIS331– Principles of Database Systems
Database Design Dr. M.E. Fayad, Professor
Presentation transcript:

Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Database design and normalization

Carnegie Mellon C. Faloutsos2 Overview Relational model –formal query languages –commercial query languages (SQL) Integrity constraints –domain I.C., foreign keys –functional dependencies DB design and normalization

Carnegie Mellon C. Faloutsos3 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition –normal forms

Carnegie Mellon C. Faloutsos4 Design ‘good’ tables –sub-goal#1: define what ‘good’ means –sub-goal#2: fix ‘bad’ tables in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” Let’s see why, and how: Goal

Carnegie Mellon C. Faloutsos5 Pitfalls takes1 (ssn, c-id, grade, name, address)

Carnegie Mellon C. Faloutsos6 Pitfalls ‘Bad’ - why? because: ssn->address, name

Carnegie Mellon C. Faloutsos7 Pitfalls Redundancy –space –(inconsistencies) –insertion/deletion anomalies:

Carnegie Mellon C. Faloutsos8 Pitfalls insertion anomaly: –“jones” registers, but takes no class - no place to store his address!

Carnegie Mellon C. Faloutsos9 Pitfalls deletion anomaly: –delete the last record of ‘smith’ (we lose his address!)

Carnegie Mellon C. Faloutsos10 Solution: decomposition split offending table in two (or more), eg.: ??

Carnegie Mellon C. Faloutsos11 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition lossless join decomp. dependency preserving –normal forms

Carnegie Mellon C. Faloutsos12 Decompositions there are ‘bad’ decompositions: we want lossless and dependency preserving

Carnegie Mellon C. Faloutsos13 Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id, grade) ssn->name, address ssn, c-id -> grade

Carnegie Mellon C. Faloutsos14 Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn, c-id -> grade

Carnegie Mellon C. Faloutsos15 Decompositions example of non-dependency preserving S# -> address, status address -> status S# -> addressS# -> status

Carnegie Mellon C. Faloutsos16 Decompositions (drill: is it lossless?) S# -> address, status address -> status S# -> addressS# -> status

Carnegie Mellon C. Faloutsos17 Decompositions - lossless Definition: consider schema R, with FD ‘F’. R1, R2 is a lossless join decomposition of R if we always have: An easier criterion?

Carnegie Mellon C. Faloutsos18 Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally:

Carnegie Mellon C. Faloutsos19 Decomposition - lossless example: ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade R1 R2

Carnegie Mellon C. Faloutsos20 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition lossless join decomp. dependency preserving –normal forms

Carnegie Mellon C. Faloutsos21 Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status

Carnegie Mellon C. Faloutsos22 Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status S# -> addressaddress -> status (but: S#->status ?)

Carnegie Mellon C. Faloutsos23 Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables. More specifically: … the FDs of the canonical cover.

Carnegie Mellon C. Faloutsos24 Decomposition - depend. pres. why is dependency preservation good? S# -> addressaddress -> status S# -> address S# -> status (address->status: ‘lost’)

Carnegie Mellon C. Faloutsos25 Decomposition - depend. pres. A: eg., record that ‘Philly’ has status ‘A’ S# -> addressaddress -> status S# -> address S# -> status (address->status: ‘lost’)

Carnegie Mellon C. Faloutsos26 Decomposition - conclusions decompositions should always be lossless – joining attribute -> superkey whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)

Carnegie Mellon C. Faloutsos27 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition (-> how to fix the problem) –normal forms (-> how to detect the problem) BCNF, 3NF (1NF, 2NF)

Carnegie Mellon C. Faloutsos28 Normal forms - BCNF We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, ie., a set of rules. Typically: Boyce-Codd Normal form

Carnegie Mellon C. Faloutsos29 Normal forms - BCNF Defn.: Rel. R is in BCNF wrt F, if informally: everything depends the full key, and nothing but the key semi-formally: every determinant (of the cover) is a candidate key

Carnegie Mellon C. Faloutsos30 Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade

Carnegie Mellon C. Faloutsos31 Normal forms - BCNF Formally: for every FD a->b in F+ –a->b is trivial (a superset of b) or –a is a superkey (or both)

Carnegie Mellon C. Faloutsos32 Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that –R1, … Rn are in BCNF and –the decompositions are lossless. (but, some decomp. might lose dependencies)

Carnegie Mellon C. Faloutsos33 Normal forms - BCNF How? algorithm in book - essentially, break off FDs of the cover eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade

Carnegie Mellon C. Faloutsos34 Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

Carnegie Mellon C. Faloutsos35 Normal forms - BCNF ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade

Carnegie Mellon C. Faloutsos36 Normal forms - BCNF pictorially: we want a ‘star’ shape name addressgrade c-id ssn :not in BCNF

Carnegie Mellon C. Faloutsos37 Normal forms - BCNF pictorially: we want a ‘star’ shape B C A G E D or FH

Carnegie Mellon C. Faloutsos38 Normal forms - BCNF or a star-like: (eg., 2 cand. keys): STUDENT(ssn, st#, name, address) name address ssn st# = name addressssnst#

Carnegie Mellon C. Faloutsos39 Normal forms - BCNF but not: or B CA D G E D F H

Carnegie Mellon C. Faloutsos40 Normal forms - 3NF consider the ‘classic’ case: STJ( Student, Teacher, subJect) T-> J S,J -> T is it BCNF? S T J

Carnegie Mellon C. Faloutsos41 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T How to decompose it to BCNF? S T J

Carnegie Mellon C. Faloutsos42 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

Carnegie Mellon C. Faloutsos43 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

Carnegie Mellon C. Faloutsos44 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both BCNF and dependency preservation Welcome 3NF!

Carnegie Mellon C. Faloutsos45 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T informally, 3NF ‘forgives’ the red arrow in the can. cover

Carnegie Mellon C. Faloutsos46 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T Formally, a rel. R with FDs ‘F’ is in 3NF if: for every a->b in F+: it is trivial or a is a superkey or each b-a attr.: part of a cand. key

Carnegie Mellon C. Faloutsos47 Normal forms - 3NF how to bring a schema to 3NF? algo in book: for each FD in the cover, put it in a table

Carnegie Mellon C. Faloutsos48 Normal forms - 3NF vs BCNF If ‘R’ is in BCNF, it is always in 3NF (but not the reverse) In practice, aim for –BCNF; lossless join; and dep. preservation if impossible, we accept –3NF; but insist on lossless join and dep. preservation

Carnegie Mellon C. Faloutsos49 Normal forms - more details why ‘3’NF? what is 2NF? 1NF? 1NF: attributes are atomic (ie., no set- valued attr., a.k.a. ‘repeating groups’) not 1NF

Carnegie Mellon C. Faloutsos50 Normal forms - more details 2NF: 1NF and non-key attr. fully depend on the key counter-example: TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

Carnegie Mellon C. Faloutsos51 Normal forms - more details 3NF: 2NF and no transitive dependencies counter-example: B CA D in 2NF, but not in 3NF

Carnegie Mellon C. Faloutsos52 Normal forms - more details 4NF, multivalued dependencies etc: IGNORE in practice, E-R diagrams usually lead to tables in BCNF

Carnegie Mellon C. Faloutsos53 Overview - conclusions DB design and normalization –pitfalls of bad design –decompositions (lossless, dep. preserving) –normal forms (BCNF or 3NF) “everything should depend on the key, the whole key, and nothing but the key”