Presentation is loading. Please wait.

Presentation is loading. Please wait.

Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2.

Similar presentations


Presentation on theme: "Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2."— Presentation transcript:

1 Databases 1 Seventh lecture

2 Topics of the lecture Extended relational algebra Normalization Normal forms 2

3 3 Relational Algebra on Bags A bag is like a set, but an element may appear more than once. ▫Multiset is another name for “bag.” Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set. Bags also resemble lists, but order in a bag is unimportant. ▫Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.

4 4 Why Bags? SQL, the most important query language for relational databases is actually a bag language. ▫SQL will eliminate duplicates, but usually only if you ask it to do so explicitly. Some operations, like projection, are much more efficient on bags than sets.

5 5 Operations on Bags Selection applies to each tuple, so its effect on bags is like its effect on sets. Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates. Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

6 6 Example: Bag Selection AB BC 1234 5678 12 SELECT A+B<5 (R) =AB 12 R S

7 7 Example: Bag Projection AB BC 1234 5678 12 PROJECT A (R) =A 1 5 1 R S

8 8 Example: Bag Product AB BC 1234 5678 12 R * S =AR.BS.BC 1234 1278 5634 5678 1234 1278 R S

9 9 Example: Bag Theta-Join AB BC 1234 5678 12 R JOIN R.B<S.B S =AR.BS.BC 1234 1278 5678 1234 1278 R S

10 10 Bag Union Union, intersection, and difference need new definitions for bags. An element appears in the union of two bags the sum of the number of times it appears in each bag. Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}

11 11 Bag Intersection An element appears in the intersection of two bags the minimum of the number of times it appears in either. Example: {1,1,2,1} INTER {1,1,2,3} = {1,1,2}.

12 12 Bag Difference An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B. ▫But never less than 0 times. Example: {1,2,1} – {1,2,3} = {1}.

13 13 Beware: Bag Laws <> Set Laws Not all algebraic laws that hold for sets also hold for bags. For one example, the commutative law for union (R UNION S = S UNION R ) does hold for bags. ▫Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S.

14 14 An Example of Inequivalence Set union is idempotent, meaning that S UNION S = S. However, for bags, if x appears n times in S, then it appears 2n times in S UNION S. Thus S UNION S <> S in general.

15 15 The Extended Algebra 1.DELTA = eliminate duplicates from bags. 2.TAU = sort tuples. 3.Extended projection : arithmetic, duplication of columns. 4.GAMMA = grouping and aggregation. 5.OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

16 16 Duplicate Elimination R1 := DELTA(R2). R1 consists of one copy of each tuple that appears in R2 one or more times.

17 17 Example: Duplicate Elimination R =AB 12 34 12 DELTA(R) =AB 12 34

18 18 Sorting R1 := TAU L (R2). ▫L is a list of some of the attributes of R2. R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on. ▫Break ties arbitrarily. TAU is the only operator whose result is neither a set nor a bag. ORDER BY in SQL

19 19 Example: Sorting R =AB 12 34 52 TAU B (R) =[(5,2), (1,2), (3,4)]

20 20 Extended Projection Using the same PROJ L operator, we allow the list L to contain arbitrary expressions involving attributes, for example: 1.Arithmetic on attributes, e.g., A+B. 2.Duplicate occurrences of the same attribute.

21 21 Example: Extended Projection R =AB 12 34 PROJ A+B,A,A (R) =A+BA1A2 311 733

22 22 Aggregation Operators Aggregation operators are not operators of relational algebra. Rather, they apply to entire columns of a table and produce a single result. The most important examples: SUM, AVG, COUNT, MIN, and MAX.

23 23 Example: Aggregation R =AB 13 34 32 SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 AVG(B) = 3

24 24 Grouping Operator R1 := GAMMA L (R2). L is a list of elements that are either: 1.Individual (grouping ) attributes. 2.AGG(A ), where AGG is one of the aggregation operators and A is an attribute.

25 25 Applying GAMMA L (R) Group R according to all the grouping attributes on list L. ▫That is, form one group for each distinct list of values for those attributes in R. Within each group, compute AGG(A ) for each aggregation on list L. Result has grouping attributes and aggregations as attributes. One tuple for each list of values for the grouping attributes and their group’s aggregations.

26 26 Example: Grouping/Aggregation R =ABC 123 456 125 GAMMA A,B,AVG(C) (R) = ?? First, group R : ABC 123 125 456 Then, average C within groups: ABAVG(C) 12 4 45 6

27 27 Outerjoin Suppose we join R JOIN C S. A tuple of R that has no tuple of S with which it joins is said to be dangling. ▫Similarly for a tuple of S. Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

28 28 Example: Outerjoin R = ABS =BC 1223 4567 (1,2) joins with (2,3), but the other two tuples are dangling. R OUTERJOIN S =ABC 123 45NULL NULL67

29 29 Normalization: Anomalies Goal of relational schema design is to avoid anomalies and redundancy. ▫Update anomaly : one occurrence of a fact is changed, but not all occurrences. ▫Deletion anomaly : valid fact is lost when a tuple is deleted.

30 30 Example of Bad Design Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf. nameaddrBeersLikedmanfFavBeer JanewayVoyagerBudA.B.WickedAle Janeway???WickedAlePete’s??? SpockEnterpriseBud???Bud Drinkers(name, addr, beersLiked, manf, favBeer)

31 31 This Bad Design Also Exhibits Anomalies Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples? Deletion anomaly: If nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud. nameaddrBeersLikedmanfFavBeer JanewayVoyagerBudA.B.WickedAle JanewayVoyagerWickedAlePete’sWickedAle SpockEnterpriseBudA.B.Bud

32 32 Boyce-Codd Normal Form We say a relation R is in BCNF : if whenever X ->A is a nontrivial FD that holds in R, X is a superkey. ▫Remember: nontrivial means A is not a member of set X. ▫Remember, a superkey is any superset of a key (not necessarily a proper superset).

33 33 Example Drinkers(name, addr, beersLiked, manf, favBeer) FD’s: name->addr favBeer, beersLiked->manf Only key is {name, beersLiked}. In each FD, the left side is not a superkey. Any one of these FD’s shows Drinkers is not in BCNF

34 34 Another Example Beers(name, manf, manfAddr) FD’s: name->manf, manf->manfAddr Only key is {name}. name->manf does not violate BCNF, but manf- >manfAddr does.

35 35 Decomposition into BCNF Given: relation R with FD’s F. Look among the given FD’s for a BCNF violation X ->B. ▫If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF. Compute X +. ▫Not all attributes, or else X is a superkey.

36 36 Decompose R Using X -> B Replace R by relations with schemas: 1. R 1 = X +. 2. R 2 = (R – X + ) U X. wProject given FD’s F onto the two new relations. 1.Compute the closure of F = all nontrivial FD’s that follow from F. 2.Use only those FD’s whose attributes are all in R 1 or all in R 2.

37 37 Decomposition Picture R-X +R-X + XX +-XX +-X R2R2 R1R1 R

38 38 Example Drinkers(name, addr, beersLiked, manf, favBeer) F = name->addr, name -> favBeer, beersLiked- >manf Pick BCNF violation name->addr. Close the left side: {name} + = {name, addr, favBeer}. Decomposed relations: 1.Drinkers1(name, addr, favBeer) 2.Drinkers2(name, beersLiked, manf)

39 39 Example, Continued We are not done; we need to check Drinkers1 and Drinkers2 for BCNF. Projecting FD’s is complex in general, easy here. For Drinkers1(name, addr, favBeer), relevant FD’s are name->addr and name->favBeer. ▫Thus, name is the only key and Drinkers1 is in BCNF.

40 40 Example, Continued For Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}. ▫Violation of BCNF. beersLiked + = {beersLiked, manf}, so we decompose Drinkers2 into: 1.Drinkers3(beersLiked, manf) 2.Drinkers4(name, beersLiked)

41 41 Example, Concluded The resulting decomposition of Drinkers : 1.Drinkers1(name, addr, favBeer) 2.Drinkers3(beersLiked, manf) 3.Drinkers4(name, beersLiked) wNotice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.

42 42 Third Normal Form - Motivation There is one structure of FD’s that causes trouble when we decompose. AB ->C and C ->B. ▫Example: A = street address, B = city, C = zip code. There are two keys, {A,B } and {A,C }. C ->B is a BCNF violation, so we must decompose into AC, BC.

43 43 We Cannot Enforce FD’s The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB - >C by checking FD’s in these decomposed relations. Example with A = street, B = city, and C = zip on the next slide.

44 44 An Unenforceable FD street zip 545 Tech Sq.02138 545 Tech Sq.02139 city zip Cambridge02138 Cambridge02139 Join tuples with equal zip codes. street city zip 545 Tech Sq.Cambridge02138 545 Tech Sq.Cambridge02139 Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.

45 45 3NF Let’s Us Avoid This Problem 3 rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation. An attribute is prime if it is a member of any key. X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.

46 46 Example In our problem situation with FD’s AB ->C and C ->B, we have keys AB and AC. Thus A, B, and C are each prime. Although C ->B violates BCNF, it does not violate 3NF.

47 47 What 3NF and BCNF Give You There are two important properties of a decomposition: 1.Recovery : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original. 2.Dependency preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied.

48 48 3NF and BCNF, Continued We can get (1) with a BCNF decompsition. ▫Explanation needs to wait for relational algebra. We can get both (1) and (2) with a 3NF decomposition. But we can’t always get (1) and (2) with a BCNF decomposition. ▫street-city-zip is an example.


Download ppt "Databases 1 Seventh lecture. Topics of the lecture Extended relational algebra Normalization Normal forms 2."

Similar presentations


Ads by Google