Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2.

Slides:



Advertisements
Similar presentations
1 Lecture 7 Design Theory for Relational Databases (part 1) Slides based on
Advertisements

Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
4NF and 5NF Prof. Sin-Min Lee Department of Computer Science.
1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms.
Algebraic and Logical Query Languages Spring 2011 Instructor: Hassan Khosravi.
1 Lecture 12: Further relational algebra, further SQL
1 Database Systems Relations as Bags Grouping and Aggregation Database Modification.
Functional Dependencies, Normalization Rose-Hulman Institute of Technology Curt Clifton.
Midterm Review II. Redundancy. –Information may be repeated unnecessarily in several tuples. –E.g. length and filmType. Update anomalies. –We may change.
Relational Operations on Bags Extended Operators of Relational Algebra.
Relational Algebra on Bags A bag is like a set, but an element may appear more than once. –Multiset is another name for “bag.” Example: {1,2,1,3} is a.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form Source: Slides by Jeffrey Ullman.
The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.
Winter 2002Arthur Keller – CS 1804–1 Schedule Today: Jan. 15 (T) u Normal Forms, Multivalued Dependencies. u Read Sections Assignment 1 due. Jan.
1 Multivalued Dependencies Fourth Normal Form. 2 A New Form of Redundancy uMultivalued dependencies (MVD’s) express a condition among tuples of a relation.
Relational Operations on Bags Extended Operators of Relational Algebra.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form Source: Slides by Jeffrey Ullman.
Fall 2001Arthur Keller – CS 1804–1 Schedule Today Oct. 4 (TH) Functional Dependencies and Normalization. u Read Sections Project Part 1 due. Oct.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
1 More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update.
1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms.
Relational Algebra Basic Operations Algebra of Bags.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
Decompositions uDo we need to decompose a relation? wSeveral normal forms for relations. If schema in these normal forms certain problems don’t.
Database Management Systems Chapter 3 The Relational Data Model (III) Instructor: Li Ma Department of Computer Science Texas Southern University, Houston.
1 Normalization Anomalies Boyce-Codd Normal Form 3 rd Normal Form.
Normalization Goal = BCNF = Boyce-Codd Normal Form = all FD’s follow from the fact “key  everything.” Formally, R is in BCNF if for every nontrivial FD.
Databases 2 Storage optimization: functional dependencies, normal forms, data ware houses.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
From Professor Ullman, Relational Algebra.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
1 Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms.
Third Normal Form (3NF) Zaki Malik October 23, 2008.
Extended Operators in SQL and Relational Algebra Zaki Malik September 11, 2008.
Databases : Relational Algebra - Complex Expression 2007, Fall Pusan National University Ki-Joune Li These slides are made from the materials that Prof.
More Relation Operations 2015, Fall Pusan National University Ki-Joune Li.
More Relation Operations 2014, Fall Pusan National University Ki-Joune Li.
1 Multivalued Dependencies Fourth Normal Form Reasoning About FD’s + MVD’s.
Relational Algebra BASIC OPERATIONS 1 DATABASE SYSTEMS AND CONCEPTS, CSCI 3030U, UOIT, COURSE INSTRUCTOR: JAREK SZLICHTA.
More SQL (and Relational Algebra). More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update.
Databases 1 Sixth lecture. 2 Functional Dependencies X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
Computational Biology Dr. Jens Allmer Lecture Slides Week 5.
1. Chapter 2: The relational Database Modeling Section 2.4: An algebraic Query Language Chapter 5: Algebraic and logical Query Languages Section 5.1:
Design Theory for Relational Databases Functional Dependencies Decompositions Normal Forms: BCNF, Third Normal Form Introduction to Multivalued Dependencies.
1 Lecture 8 Design Theory for Relational Databases (part 2) Slides from
4NF & MULTIVALUED DEPENDENCY By Kristina Miguel. Review  Superkey – a set of attributes which will uniquely identify each tuple in a relation  Candidate.
1 Database Design: DBS CB, 2 nd Edition Physical RDBMS Model: Schema Design and Normalization Ch. 3.
Slides are reused by the approval of Jeffrey Ullman’s
Design Theory for Relational Databases
Outerjoins, Grouping/Aggregation Insert/Delete/Update
Schedule Today: Next After that Normal Forms. Section 3.6.
CS411 Database Systems 08: Midterm Review Kazuhiro Minami 1.
CPSC-310 Database Systems
Schedule Today: Jan. 23 (wed) Week of Jan 28
3.1 Functional Dependencies
Database Design and Programming
BCNF and Normalization
Operators Expression Trees Bag Model of Data
Multivalued Dependencies & Fourth Normal Form (4NF)
More Relation Operations
Basic Operations Algebra of Bags
More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation
Multivalued Dependencies
Anomalies Boyce-Codd Normal Form 3rd Normal Form
CS4222 Principles of Database System
Presentation transcript:

Databases 1 Seventh lecture

Topics of the lecture Extended relational algebra Normalization Normal forms 2

3 Relational Algebra on Bags A bag is like a set, but an element may appear more than once. ▫Multiset is another name for “bag.” Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set. Bags also resemble lists, but order in a bag is unimportant. ▫Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.

4 Why Bags? SQL, the most important query language for relational databases is actually a bag language. ▫SQL will eliminate duplicates, but usually only if you ask it to do so explicitly. Some operations, like projection, are much more efficient on bags than sets.

5 Operations on Bags Selection applies to each tuple, so its effect on bags is like its effect on sets. Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates. Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

6 Example: Bag Selection AB BC SELECT A+B<5 (R) =AB 12 R S

7 Example: Bag Projection AB BC PROJECT A (R) =A R S

8 Example: Bag Product AB BC R * S =AR.BS.BC R S

9 Example: Bag Theta-Join AB BC R JOIN R.B<S.B S =AR.BS.BC R S

10 Bag Union Union, intersection, and difference need new definitions for bags. An element appears in the union of two bags the sum of the number of times it appears in each bag. Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}

11 Bag Intersection An element appears in the intersection of two bags the minimum of the number of times it appears in either. Example: {1,1,2,1} INTER {1,1,2,3} = {1,1,2}.

12 Bag Difference An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B. ▫But never less than 0 times. Example: {1,2,1} – {1,2,3} = {1}.

13 Beware: Bag Laws <> Set Laws Not all algebraic laws that hold for sets also hold for bags. For one example, the commutative law for union (R UNION S = S UNION R ) does hold for bags. ▫Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S.

14 An Example of Inequivalence Set union is idempotent, meaning that S UNION S = S. However, for bags, if x appears n times in S, then it appears 2n times in S UNION S. Thus S UNION S <> S in general.

15 The Extended Algebra 1.DELTA = eliminate duplicates from bags. 2.TAU = sort tuples. 3.Extended projection : arithmetic, duplication of columns. 4.GAMMA = grouping and aggregation. 5.OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

16 Duplicate Elimination R1 := DELTA(R2). R1 consists of one copy of each tuple that appears in R2 one or more times.

17 Example: Duplicate Elimination R =AB DELTA(R) =AB 12 34

18 Sorting R1 := TAU L (R2). ▫L is a list of some of the attributes of R2. R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on. ▫Break ties arbitrarily. TAU is the only operator whose result is neither a set nor a bag. ORDER BY in SQL

19 Example: Sorting R =AB TAU B (R) =[(5,2), (1,2), (3,4)]

20 Extended Projection Using the same PROJ L operator, we allow the list L to contain arbitrary expressions involving attributes, for example: 1.Arithmetic on attributes, e.g., A+B. 2.Duplicate occurrences of the same attribute.

21 Example: Extended Projection R =AB PROJ A+B,A,A (R) =A+BA1A

22 Aggregation Operators Aggregation operators are not operators of relational algebra. Rather, they apply to entire columns of a table and produce a single result. The most important examples: SUM, AVG, COUNT, MIN, and MAX.

23 Example: Aggregation R =AB SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 AVG(B) = 3

24 Grouping Operator R1 := GAMMA L (R2). L is a list of elements that are either: 1.Individual (grouping ) attributes. 2.AGG(A ), where AGG is one of the aggregation operators and A is an attribute.

25 Applying GAMMA L (R) Group R according to all the grouping attributes on list L. ▫That is, form one group for each distinct list of values for those attributes in R. Within each group, compute AGG(A ) for each aggregation on list L. Result has grouping attributes and aggregations as attributes. One tuple for each list of values for the grouping attributes and their group’s aggregations.

26 Example: Grouping/Aggregation R =ABC GAMMA A,B,AVG(C) (R) = ?? First, group R : ABC Then, average C within groups: ABAVG(C)

27 Outerjoin Suppose we join R JOIN C S. A tuple of R that has no tuple of S with which it joins is said to be dangling. ▫Similarly for a tuple of S. Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

28 Example: Outerjoin R = ABS =BC (1,2) joins with (2,3), but the other two tuples are dangling. R OUTERJOIN S =ABC NULL NULL67

29 Normalization: Anomalies Goal of relational schema design is to avoid anomalies and redundancy. ▫Update anomaly : one occurrence of a fact is changed, but not all occurrences. ▫Deletion anomaly : valid fact is lost when a tuple is deleted.

30 Example of Bad Design Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf. nameaddrBeersLikedmanfFavBeer JanewayVoyagerBudA.B.WickedAle Janeway???WickedAlePete’s??? SpockEnterpriseBud???Bud Drinkers(name, addr, beersLiked, manf, favBeer)

31 This Bad Design Also Exhibits Anomalies Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples? Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud. nameaddrBeersLikedmanfFavBeer JanewayVoyagerBudA.B.WickedAle JanewayVoyagerWickedAlePete’sWickedAle SpockEnterpriseBudA.B.Bud

32 Boyce-Codd Normal Form We say a relation R is in BCNF : if whenever X ->A is a nontrivial FD that holds in R, X is a superkey. ▫Remember: nontrivial means A is not a member of set X. ▫Remember, a superkey is any superset of a key (not necessarily a proper superset).

33 Example Drinkers(name, addr, beersLiked, manf, favBeer) FD’s: name->addr favBeer, beersLiked->manf Only key is {name, beersLiked}. In each FD, the left side is not a superkey. Any one of these FD’s shows Drinkers is not in BCNF

34 Another Example Beers(name, manf, manfAddr) FD’s: name->manf, manf->manfAddr Only key is {name}. name->manf does not violate BCNF, but manf- >manfAddr does.

35 Decomposition into BCNF Given: relation R with FD’s F. Look among the given FD’s for a BCNF violation X ->B. ▫If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF. Compute X +. ▫Not all attributes, or else X is a superkey.

36 Decompose R Using X -> B Replace R by relations with schemas: 1. R 1 = X R 2 = (R – X + ) U X. wProject given FD’s F onto the two new relations. 1.Compute the closure of F = all nontrivial FD’s that follow from F. 2.Use only those FD’s whose attributes are all in R 1 or all in R 2.

37 Decomposition Picture R-X +R-X + XX +-XX +-X R2R2 R1R1 R

38 Example Drinkers(name, addr, beersLiked, manf, favBeer) F = name->addr, name -> favBeer, beersLiked- >manf Pick BCNF violation name->addr. Close the left side: {name} + = {name, addr, favBeer}. Decomposed relations: 1.Drinkers1(name, addr, favBeer) 2.Drinkers2(name, beersLiked, manf)

39 Example, Continued We are not done; we need to check Drinkers1 and Drinkers2 for BCNF. Projecting FD’s is complex in general, easy here. For Drinkers1(name, addr, favBeer), relevant FD’s are name->addr and name->favBeer. ▫Thus, name is the only key and Drinkers1 is in BCNF.

40 Example, Continued For Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}. ▫Violation of BCNF. beersLiked + = {beersLiked, manf}, so we decompose Drinkers2 into: 1.Drinkers3(beersLiked, manf) 2.Drinkers4(name, beersLiked)

41 Example, Concluded The resulting decomposition of Drinkers : 1.Drinkers1(name, addr, favBeer) 2.Drinkers3(beersLiked, manf) 3.Drinkers4(name, beersLiked) wNotice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.

42 Third Normal Form - Motivation There is one structure of FD’s that causes trouble when we decompose. AB ->C and C ->B. ▫Example: A = street address, B = city, C = zip code. There are two keys, {A,B } and {A,C }. C ->B is a BCNF violation, so we must decompose into AC, BC.

43 We Cannot Enforce FD’s The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB - >C by checking FD’s in these decomposed relations. Example with A = street, B = city, and C = zip on the next slide.

44 An Unenforceable FD street zip 545 Tech Sq Tech Sq city zip Cambridge02138 Cambridge02139 Join tuples with equal zip codes. street city zip 545 Tech Sq.Cambridge Tech Sq.Cambridge02139 Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.

45 3NF Let’s Us Avoid This Problem 3 rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation. An attribute is prime if it is a member of any key. X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.

46 Example In our problem situation with FD’s AB ->C and C ->B, we have keys AB and AC. Thus A, B, and C are each prime. Although C ->B violates BCNF, it does not violate 3NF.

47 What 3NF and BCNF Give You There are two important properties of a decomposition: 1.Recovery : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. 2.Dependency preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

48 3NF and BCNF, Continued We can get (1) with a BCNF decompsition. ▫Explanation needs to wait for relational algebra. We can get both (1) and (2) with a 3NF decomposition. But we can’t always get (1) and (2) with a BCNF decomposition. ▫street-city-zip is an example.