Chapter 10 - 1 Design Objectives Obtain the theoretically “best” design (normalize) –Remove redundancy and update anomalies –Remove nulls –Minimize the.

Slides:



Advertisements
Similar presentations
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Advertisements

Database Management COP4540, SCS, FIU Functional Dependencies (Chapter 14)
Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
NestedRelations: 1 Nested Relations Flat schemas often have replicated data values in their relations. Nested schemas allow us to collapse some of these.
The Relational Model System Development Life Cycle Normalisation
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
XNF: 1 XML and NNF A Standard Form for XML Documents (XNF) Properties –As few hierarchical trees as possible –No redundant data values in any tree Method.
Chapter Abstraction Concrete: directly executable/storable Abstract: not directly executable/storable –automatic translation (as good as executable/storable)
FDImplication: 1 Functional Dependencies (FDs) Let r(R) be a relation and let t  r, then the restriction of t to X  R, written t[X], is the projection.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
FileOrg: 1 Secondary Storage Rough Speed Differentials –nanoseconds: retrieve data in main memory –microseconds: retrieve from under a read head –milliseconds:
Chapter Nested Schemes Flat schemes often have replicated data values. Nested schemes allow us to collapse some of these replicated data values.
Chapter Database Design Purpose: organize to achieve efficiency, … Considerations –time/space tradeoff for application –balance application characteristics,
XNF-1 XML and NNF A Standard Form for XML Documents (XNF) Properties –As few hierarchical trees as possible –No redundant data values in any tree Method.
Winter 2002Arthur Keller – CS 1804–1 Schedule Today: Jan. 15 (T) u Normal Forms, Multivalued Dependencies. u Read Sections Assignment 1 due. Jan.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Create, Insert, Delete, Update. Create Create database Create table Create index – Primary – Secondary.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CostAnalysis: 1 Cost Analysis Rule-of-Thumb Guidelines As a guide, consider denormalizing if: –redundancy is minimal and update anomalies are not expected.
Databases 6: Normalization
Chapter Implementation Faithful translation of a design into a target environment Design should be free of target-environment dependencies Should.
Design Algorithm Essentials 1.Combine functional edges with the same tail object set(s) into a scheme. (The tail object set(s) and any others in a 1-1.
MVDs: 1 Join Dependencies—Example Let r = A B C = A B |  | A C 1 a x 1 a 1 x 1 a y 1 b 1 y 1 b x 2 a 2 y 1 b y 2 b 2 a y 2 b y Observe: r =  AB r | 
Fall 2001Arthur Keller – CS 1804–1 Schedule Today Oct. 4 (TH) Functional Dependencies and Normalization. u Read Sections Project Part 1 due. Oct.
Chapter 14 Advanced Normalization Transparencies © Pearson Education Limited 1995, 2005.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
NormalizationNormalization Chapter 4. Purpose of Normalization Normalization  A technique for producing a set of relations with desirable properties,
Further Normalization II: Higher Normal Forms Prof. Yin-Fu Huang CSIE, NYUST Chapter 13.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Normal Forms through BCNF CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 11 Relational Database Design Algorithms and Further Dependencies.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 15 Basics of Functional Dependencies and Normalization for Relational.
CSE314 Database Systems Basics of Functional Dependencies and Normalization for Relational Databases Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
1 Functional Dependencies and Normalization Chapter 15.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
Normalization. 2 u Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data,
RelAlg: 1 Relational Algebra An Algebra is a pair: (set of values, set of operations) Note that an algebra is the same idea as an ADT Relational Algebra:
3 Spring Chapter Normalization of Database Tables.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
Functional Dependencies CIS 4301 Lecture Notes Lecture 8 - 2/7/2006.
Multivalued Dependencies and 4th NF CIS 4301 Lecture Notes Lecture /21/2006.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Relational Database Design Algorithms and Further Dependencies.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 The Relational Data Model David J. Stucki. Relational Model Concepts 2 Fundamental concept: the relation  The Relational Model represents an entire.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
CS411 Database Systems 08: Midterm Review Kazuhiro Minami 1.
3.1 Functional Dependencies
Chapter 14 Normalization – Part I Pearson Education © 2009.
Chapter 15 Basics of Functional Dependencies and Normalization for Relational Databases.
Chapter 7a: Overview of Database Design -- Normalization
Presentation transcript:

Chapter Design Objectives Obtain the theoretically “best” design (normalize) –Remove redundancy and update anomalies –Remove nulls –Minimize the number of schemes and maximize their size –Make the design faithful to the specification preserve information preserve constraints Use cost analysis to adjust, if necessary (denormalize) –The theoretically “best” is often the best. –Adjust for application-dependent time and space considerations.

Chapter Update Anomalies Guest Room Nr Name G1 1 Kennedy G2 1 Kennedy G3 1 Kennedy G4 5 Green G5 3 Carter  2 Nixon  4 Blue Modification Anomaly: e.g., change Kennedy to Clinton – must update all redundant values consistently. Deletion Anomaly: e.g., G4 cancels reservation – the fact that the Green room is room 5 is lost. Insertion Anomaly: e.g., add a new room (the Gold room) – necessarily yields a null. Update anomalies and redundancy are two sides of the same coin.

Chapter Join Dependencies – Example Let r = A B C = A B  A C 1 a x 1 a 1 x 1 a y 1 b 1 y 1 b x 2 a 2 y 1 b y 2 b 2 a y 2 b y Observe: r =  AB r   AC r This always holds when we build r by joining relationship sets in this way. In general, however, if we arbitrarily create a relation, this may not happen. Add to r, for example, then r   AB r   AC r because the join also yields, which is not in r. Note: r is the cross product of B and C wrt A.

Chapter Join Dependencies – Definitions A join dependency (JD) denoted  (R1, …, Rn) holds for a relation r(R) if r =  R1 r  …   Rn r. (e.g.,  (AB, AC)) When n = 2, we call a JD a Multivalued Dependency (MVD) and write X  Y or X  Z or X  Y | Z where X = R1  R2, Y = R1 - R2, and Z = R2 - R1. (e.g., A  B or A  C or A  B | C)

Chapter Redundancy We (usually) want to remove redundancy. –Space savings: no need to store duplicate values. –Time savings: no need for extra processing to avoid update anomalies. Basic Idea: –A data value v is redundant if we can “erase” v and then from the remaining data values and the constraints uniquely determine v. –The constraints we consider: FDs, MVDs, JDs.

Chapter FD Redundancy If B  C, the circled data values are redundant. A B C

Chapter MVD Redundancy If A  B | C, the circled data values are redundant. A B C A B C

Chapter JD Redundancy If  (AB, BC, AC), the circled data values are redundant. A B C

Chapter Nulls IncongruentCongruent A B  3  4   A A  B

Chapter Minimize the Number of Schemes Combine object and relationship sets BUT only if there is no possibility of: –redundancy –nulls Preserve information and constraints

Chapter Sample Combinations with Redundancy A B C A B C A B C A B 1 2 C D 3 4 A B C D  =

Chapter Sample Combinations with No Redundancy A B C A B C A B C D

Chapter Canonical ORM Hypergraph Congruent Nonrecursive Head and Tail Reduced Object-Set Reduced (Lexical & Merged) Non-FD-edge Reduced Embedded-FD Reduced Separately Linked (Semantically Separate Eq. Classes) Minimally Consolidated Semantically Head Consistent

Chapter Semantically Separate Eq. Class HasName(Room, Name)  WasNamed(Room, Name) = Room Name R1 Kennedy R1 Nixon R3 Carter R2 Nixon R2 Kennedy R3 Carter R3 Carter R4 Blue R4 Green R5 Green R5 Blue Room Room Name Prior Room Name R1 Kennedy Nixon R2 Nixon Kennedy R3 Carter Carter R4 Blue Green R5 Green Blue

Chapter Semantically Head Consistent IsDoing(Guest Activity)  NextDoes(Guest Activity) = Guest Activity G1 4-Wheeling G1 Hot Tub G2 Horse Riding G2 4-Wheeling G3 Hot Tub G3 Horse Riding Guest Current Activity Next Activity G1 4-Wheeling Hot Tub G2 Horse Riding 4-Wheeling G3 Hot Tub Horse Riding

Chapter Scheme Synthesis Input: a canonical ORM hypergraph. Output: a set of relation schemes with keys. Equivalence classes (including trivial equivalence classes) with FDs – each equivalence-class element is a key Nontrivial equivalence classes without FDs – each equivalence-class element is a key Non-FD edges – all the attributes together constitute a composite key Stand-alone object sets – the lone attribute is a key

Chapter Scheme Synthesis – Example Case 1: A B C & C D & F D Case 2: E F G Case 3: B E Case 4: H

Chapter Inclusion Dependencies – Generation of Foreign Keys Input: a canonical ORM hypergraph and a set of schemes generated by the scheme-synthesis algorithm Output: a set of inclusion dependencies Generalization/specialization pairs Multiple appearances Subset constraints among relationship sets

Chapter Inclusion Dependencies – Example Database scheme: q(A, B), r(A, C), s(D, E) Inclusion dependencies: –Case 1: q[A]  s[D], r[A]  s[D] –Case 2: q[A] = r[A] –Case 3: r[A, C]  s[D, E]

Chapter B&B Example – ORM Diagram

Chapter B&B Example – ORM Hypergraph

Chapter B&B Example – Congruent

Chapter B&B Example – Canonical

Chapter B&B Example – Generated Database Scheme Room(RoomNr, RoomName, NrBeds, Cost) Guest(GuestNr, GuestName, StreetNr, City) Reservation(GuestNr, RoomNr, ArrivalDate, NrDays) Room[RoomNr]  Reservation[RoomNr] Guest[GuestNr] = Reservation[GuestNr]

Chapter Keys and FDs Let U be a set of object sets, and let F be a set of FDs over U. Let R  U be a relation scheme. A subset K of R (K need not be a proper subset of R) is a superkey of R if K  R  F + and is a candidate key of R if there does not exist a proper subset K  of K such that K   R  F +. Example: U = ABCDE and F = {A  B, B  A, AB  C, D  BC}. Scheme Candidate Keys AB A, B CE CE ABCD D ABCDE DE

Chapter Generated Keys are Candidate Keys (Thm 10.3) Superkeys: Minimal Keys: Suppose not, then tail reducible.

Chapter Generated Schemes have no Potential Redundancy* (Thm 10.4) *Except (possibly) for schemes that have a nontrivial, inextricably embedded JD. Canonical hypergraphs do not have edges that cause redundancy. Not allowed: A B C Not allowed: A B C

Chapter Inextricably Embedded JDs  (AB, AC) A B C D ABCD has redundancy within its ABC component, but cannot be decomposed losslessly into ABC and any other scheme.

Chapter No Nulls (Thm 10.5) Canonical hypergraphs are congruent. A B C  A B Q C

Chapter Synthesis Preserves Information (Thm 10.6) Original Object and Relationship Sets: A B C A B A C Join/Project always returns the original. Generated Scheme: A B C

Chapter Minimal Number of Schemes* (Thm 10.7) *Without potential redundancy or nulls and assuming more than one tuple per relation is possible. A B C A B C 

Chapter Minimal Number of Attributes in Schemes (Prop & 10.2) Hard to guarantee no fewer: –Do we count replacing two attributes, say Name and Address, by a single combined attribute, say Name-Address? –Perhaps a different way of deriving attributes for schemes might yield fewer. Can guarantee: –Proposition 10.1: We can’t make fewer by lexicalization or by one-to-one merges of nonlexical object sets. –Proposition 10.2: We can’t make fewer by consolidation within equivalence classes.

Chapter Synthesis Preserves Constraints (Thms 10.8 & 10.9) Theorem 10.8: We keep all constraints of the canonical hypergraph. –Some constraints become key constraints. –Some constraints become foreign-key constraints. –General constraints given or generated, including generated co-occurrence constraints for embedded FD reductions, remain intact. Theorem 10.9: Sometimes all constraints become key constraints or foreign-key constraints. –We can represent these constraints in SQL DDL. –Database systems efficiently check these constraints for us (no extra code need be written to check these constraints).

Chapter Cost Analysis Rule-of-Thumb Guidelines As a guide, consider denormalizing if: –nulls are applicable but unknown (e.g., address information) –redundancy is minimal and update anomalies are not expected (e.g., StreetNr City State  Zip) –replicated objects are large (e.g., images in View) –join frequencies are very high when compared to updates (e.g., approximate costs in foreign currencies) Using actual application characteristics, estimate space and time requirements for various possibilities and compare costs.

Chapter Cost Estimation – B & B (Space) Case 1: Room(RoomNr, RoomName, NrBeds, Cost) Guest(GuestNr, GuestName, StreetNr, City) Reservation(GuestNr, RoomNr, ArrivalDate, NrDays) vs. Case 2: Room(RoomNr, RoomName, NrBeds, Cost) Reservation(GuestNr, GuestName, StreetNr, City, RoomNr, ArrivalDate, NrDays) vs. Case 3: Guest(GuestNr, GuestName, StreetNr, City) Reservation(GuestNr, RoomNr, ArrivalDate, NrDays, RoomName, NrBeds, Cost) vs. Case 4: Reservation(GuestNr, GuestName, StreetNr, City, RoomNr, ArrivalDate, NrDays, RoomName, NrBeds, Cost) Assume: 5 rooms, 100 reservations, and 80 guests. 5  4 =  4 =  4 =  4 =  7 =  4 =  7 =  10 = 1000

Chapter Cost Estimation – B & B (Time) Case 1: Room(RoomNr, RoomName, NrBeds, Cost) Guest(GuestNr, GuestName, StreetNr, City) Reservation(GuestNr, RoomNr, ArrivalDate, NrDays) vs. Case 2: Room(RoomNr, RoomName, NrBeds, Cost) Reservation(GuestNr, GuestName, StreetNr, City, RoomNr, ArrivalDate, NrDays) Most important queries/updates: 1. (40%) What rooms are available? 2. (30%) Make a reservation. 3. (10%) Change a reservation. 4. (10%) Cancel a reservation. 5. (the rest) Miscellaneous. Assume indexed on primary keys. 1. Case 1 & 2: Retrieve reservations that could overlap the requested date and determine room availibility. (Case 1 insignificantly better.) …

Chapter B & B – Semantic Change? Case 1: Room(RoomNr, RoomName, NrBeds, Cost) Guest(GuestNr, GuestName, StreetNr, City) Reservation(GuestNr, RoomNr, ArrivalDate, NrDays) vs. Case 2: Room(RoomNr, RoomName, NrBeds, Cost) Reservation(GuestNr, GuestName, StreetNr, City, RoomNr, ArrivalDate, NrDays) Most important queries/updates: (30%) Make a reservation.... Assume indexed on primary keys. 2. Case 1: Insert tuple in Reservation (1 read & 1 write) and insert tuple in Guest, if necessary, (1 read and (usually) 1 write). Case 2: Insert tuple in Reservation (1 read & 1 write); check duplicate guest information (read file, or add secondary index). (Developer) Do we really need to check duplicate guest information? (Proprietor) Hmmm, maybe not; it doesn’t matter if it is different. (Developer) Does a guest always need the same guest number? (Proprietor) Not really; there are no guest numbers in our manual system. (Developer) Aha! Great, this really lets us save – watch this.

Chapter Unique GuestNr in Reservation Observe that we have a new equivalence class: { {GuestNr}, {RoomNr, ArrivalDate} } And thus a new generated database scheme: Reservation(GuestNr, GuestName, City, Street, RoomNr, ArrivalDate, NrDays) Room(RoomNr, RoomName, NrBeds, Cost)

Chapter Nested Schemes Flat schemes often have replicated data values. Nested schemes allow us to collapse some of these replicated data values. NrBeds RoomNr NrBeds ( RoomNr )*

Chapter Redundancy in Nested Schemes The redundancy definition is the same as for flat relations. If a value change causes a constraint violation, the value is redundant. NrBeds (RoomNr (View)* )* 2 1 Sea Forest City 2 Sea Forest 3 City View (RoomNr NrBeds)* Sea Forest City

Chapter Algorithm 10.3 Input: a canonical, acyclic, binary ORM hypergraph H. Output: a set of nested schemes with no potential redundancy. Repeat Mark an unmarked node in H as the first attribute in a new nested scheme. While an unmarked edge is incident on a marked node A: Mark the edge. If A  B: Add B with A; Mark B. If A  B: Add B with A; Mark B if all B’s incident edges are marked. If A  B: Nest B under A; Mark B. Else (A – B): Nest B under A; Mark B if all B’s incident edges are marked. Until all nodes have been marked

Chapter Nested Scheme Generation Example 1. NrBeds, (RoomNr, RoomName, Cost, (View)*, (GuestNr, GuestName)* )* 2. RoomNr, RoomName, Cost, NrBeds, (View)*, (GuestNr, GuestName)* 3. GuestNr, GuestName, RoomNr RoomNr, RoomName, Cost, NrBeds (View)*

Chapter Redundancy Prevention xa1yb2zxa1yb2z A ( BC )* ax1 y1 by1 z2 This replication... … causes this redundancy.

Chapter Generalization of Algorithm 10.3 for N-ary Relationship Sets “Composite nodes” can be treated as a node (in Algorithm 10.3). –B C (A)* (D)* –D (B C)*; A B C NNF (see Exercise 10.35), basically: –Schemes should be constructed along hypergraph paths. –Schemes should not violate the natural 1-many hierarchical structure.

Chapter Guidelines for Selecting Nested Schemes Select “important nodes” as the initial nodes for nested-scheme generation – e.g., Scheme 3 or 2 in earlier Bed-&-Breakfast example. Maximize the size of schemes. –Select nodes included in the largest number of FD closures (i.e., when Algorithm 10.3 requires a new node to be arbitrarily selected, compute the set of unmarked nodes in the FD closure of every unmarked node and choose a node included in at least as many sets as any other node) – e.g., Scheme 1 in earlier example. –When possible, adjust these generated maximal schemes by placing the most important node first – e.g., Scheme 2 in earlier example.

Chapter Cost Analysis for Nested Schemes Nested schemes impose variable-length records. Recall variable-length record implementation strategies: –Reserve enough space for maximum. –Chain each nested record. –Reserve space for the expected number and chain the rest. Insertion, deletion, modification, retrieval tradeoffs.