Download presentation
Presentation is loading. Please wait.
1
Midterm Review II
2
Redundancy. –Information may be repeated unnecessarily in several tuples. –E.g. length and filmType. Update anomalies. –We may change information in one tuple but leave it unchanged in other tuples. –E.g. we could change the length of Star Wars to 125, in the first tuple, and forget to do the same in the second and third tuple. Deletion anomalies. –If a set of values becomes empty, we may lose other information as a side effect. –E.g. if we delete Emilio Estevez we will lose all the information about Mighty Ducks. Anomalies Mike MeyersParamountcolor951992Wayne’s World Dana CarveyParamountcolor951992Wayne’s World Emilio EstevezDisneycolor1041991Mighty Ducks Harrison FordFoxcolor1241977Star Wars Mark HamillFoxcolor1241977Star Wars Carrie FisherFoxcolor1241977Star Wars starNamestudioNamfilmTyplengthyeartitle
3
Decomposing Relations - Example Star Wars title Wayne’s World Foxcolor1241977 studioNamfilmTyplengthyear Paramountcolor951992 Disneycolor1041991Mighty Ducks Wayne’s World Mighty Ducks Star Wars title 1992 1991 1977 Mike Meyers Dana Carvey Emilio Estevez Harrison Ford Mark Hamill Carrie Fisher starNameyear No true redundancy! The update anomaly disappeared. If we change the length of a movie, it is done only once. The deletion anomaly disappeared. If we delete all the stars from Movie 2 we still will have the other info for a movie. Movie 1 relation Movie 2 relation
4
Boyce-Codd Normal Form The goal of decomposition is to replace a relation by several that do not exhibit anomalies. There is a simple condition under which the anomalies can be guaranteed not to exist. This condition is called Boyce-Codd Normal Form, or BCNF. A relation is in BCNF if: –Whenever there is a nontrivial dependency A 1 A 2 …A n B 1 B 2 …B m for R, it must be the case that the left hand side {A 1, A 2, …, A n } is a superkey for R.
5
Boyce-Codd Normal Form - Example Relation Movie in the previous figure is not in BCNF. Consider the FD: title year length filmType studioName Unfortunately, the left side of the above dependency is not a superkey. –In particular we know that the title and the year does not functionally determine starName. On the other hand, Movie 1 is in BCNF. –The only key is {title, year} and –title year length filmType studioName is the only (non-trivial) FD that holds in the relation. Violating BCNF
6
Decomposition into BCNF The decomposition strategy is: –Find a non-trivial FD A 1 A 2 …A n B 1 B 2 …B m that violates BCNF, i.e. A 1 A 2 …A n is not a superkey. –Decompose the relation schema into two overlapping relation schemas: One is all the attributes involved in the violating dependency and the other is the left side and all the other attributes not involved in the dependency. By repeatedly, choosing suitable decompositions, we can break any relation schema into a collection of smaller schemas in BCNF. The data in the original relation is represented faithfully by the data in the relations that are the result of the decomposition. –i.e. we can reconstruct the original relation exactly from the decomposed relations.
7
Boyce-Codd Normal Form - Example Consider relation schema: Movies(title, year, studioName, president, presAddr) and functional dependencies: title year studioName studioName president president presAddr Last two FDs violate BCNF. Why? Compute {title, year}+, {studioName}+, {president}+ and see if you get all the attributes of the relation. If not, you got some FD which is violates BCNF, and need to break relation.
8
Boyce-Codd Normal Form – Example Let’s decompose starting with: studioName president Let’s add to the right-hand side any other attributes in the closure of studioName (optional “rule of thumb”). 1.X={studioName} studioName president 2.X={studioName, president} president presAddr 3.X={studioName}+={studioName, president, presAddr}
9
Boyce-Codd Normal Form – Example From the closure we get: studioName president presAddr We decompose the relation schema into the following two schemas: Movies1(studioName, president, presAddr) Movies2(title, year, studioName) Movies2 is in BCNF. Because we can’t find a “bad” FD holding there. What about Movies1? The following dependency violates BCNF. president presAddr Why it’s bad to leave Movies1 table as is? If many studios share the same president than we would have redundancy when repeating the presAddr in all those studios.
10
Boyce-Codd Normal Form – Example We must decompose Movies1, using the FD: president presAddr The resulting relation schemas, both in BCNF, are: Movies11(title, year, studioName) Movies12(studioName, president) In general, we must keep applying the decomposition rule as many times as needed, until all our relations are in BCNF. So, finally we got Movies11, Movies12, and Movies2.
11
Finding FDs for the decomposed relations When we decompose a relation, we need to check that the resulting schemas are in BCNF. We can’t tell a relation is in BCNF, unless we can determine the FDs that hold for that relation.
12
Suppose S is one of the resulting relations in a decomposition of R. For this: Consider each subset X of attributes of S. Compute X + using the FD on R. At the end throw out the attributes of R, which aren’t in S. Then, for each attribute B such that: B is an attribute of S, B is in X + we have that the functional dependency X B holds in S. Finding FDs for the decomposed relations
13
Relational Algebra Operations Operations of relational algebra fall into four broad classes: 1.The usual set operations union intersection difference 2.Operations that remove parts of a relation: selection eliminates some rows(tuples) projection eliminates some columns 3.Operations that combine the tuples of two relations: Cartesian product pairs the tuples of two relations in all possible ways join selectively pairs tuples from two relations. 4.An operation called “renaming.”
14
Conditions for Set Operations on Relations 1.R and S must have schemas with identical sets of attributes. 2.Before applying the operations, the columns of R and S must be ordered so that the order of attributes is the same for both relations.
15
Projection Produces from a relation R a new relation that has only some of R’s columns. A1, A2,…,An (R) is a relation that has only the columns for attributes A 1, A 2,…, A n of R. Example: Compute the expression title, year, length (Movies) on the table: title yearlengthfilmType studioName producerC# Star wars 1977124color Fox 12345 Mighty Ducks 1991104color Disney 67890 Wayne’s World 199295color Paramount 99999
16
Example (Continued) Resulting relation: titleyearlength Star wars1977124 Mighty Ducks1991104 Wayne’s World199295 What about filmtype (Movies)
17
Selection Selection, applied to a relation R, produces a new relation with a subset of R’s tuples. The tuples in the result are those that satisfy some condition C. Denote it with C ( R ). The schema for the resulting relation is the same as R’s schema. Example: The expression length 100 (Movie) is: title yearlengthfilmType studioName producerC# Star wars 1977124color Fox 12345 Mighty Ducks 1991104color Disney 67890
18
Cartesian Product Cartesian Product of two relations R and S is the set of pairs that can be formed by choosing the first element of the pair to be any element of R and the second an element of S. This denoted as R S. Example: R: A B S: B C D 1 2 2 5 6 3 4 4 7 8 9 10 11 R S: AR.BS.BCD 12256 12478 1291011 34256 34478 3491011
19
Natural Join Denoted as R S. Let A 1, A 2,…,A n be the attributes in both the schema of R and the schema of S. Then a tuple r from R and a tuple s from S are successfully paired if and only if r and s agree on each of the attributes A 1, A 2, …, A n. Example: The natural join of the relation R and S from previous example is: ABCD 1256 3478
20
Combing Operations to Form Queries “What are the title and years of movies made by Fox that are at least 100 minutes long?” One way to compute the answer to this query is: Select those Movie tuples that have length 100. Select those Movie tuples that have studioName =‘Fox’. Compute the intersection of first and second steps. Project the relation from the third step onto attributes title and year.
21
Another Example Consider two relations Movie1 and Movie2, With schemas: Movie1(title, year, length, filmType, studioName) Movie2(title, year, starName) Suppose we want to know: “Find the stars of the movies that are at least 100 minutes long.” First we join the two relations: Movie1, Movie2 Second we select movies with length at least 100 min. Then we project the starName.
22
Relational Algebra on Bags A bag is like a set, but an element may appear more than once. Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set. Bags also resemble lists, but order in a bag is unimportant. –Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.
23
Operations on Bags Selection applies to each tuple, so its effect on bags is like its effect on sets. Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates. Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.
24
Bag Union, Intersection, Difference An element appears in the union of two bags the sum of the number of times it appears in each bag. Example: {1,2,1} {1,1,2,3,1} = {1,1,1,1,1,2,2,3} An element appears in the intersection of two bags the minimum of the number of times it appears in either. Example: {1,2,1} {1,2,3} = {1,2}. An element appears in difference A – B of bags as many times as it appears in A, minus the number of times it appears in B. –But never less than 0 times. Example: {1,2,1} – {1,2,3} = {1}.
25
The Extended Algebra 1. = eliminate duplicates from bags. 2. = sort tuples. 3.Extended projection: arithmetic, duplication of columns. 4. = grouping and aggregation. 5.OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.
26
Example: Outerjoin R = ABS =BC 1223 4567 (1,2) joins with (2,3), but the other two tuples are dangling. R S =ABC 123 45NULL NULL67
27
Aggregation Operators They apply to entire columns of a table and produce a single result. The most important examples: –SUM –AVG –COUNT –MIN –MAX
28
Example: Aggregation R =AB 13 34 32 SUM(A) = 7 COUNT(A) = 3 MAX(B) = 4 MIN(B) = 2 AVG(B) = 3
29
Grouping Operator R 1 := L (R 2 ). L is a list of elements that are either: 1.Grouping attributes. 2.AGG(A), where AGG is one of the aggregation operators and A is an attribute. Semantics Group R according to all the grouping attributes on list L. That is, form one group for each distinct list of values for those attributes in R. Within each group, compute AGG(A) for each aggregation on list L. Result has grouping attributes and aggregations as attributes. One tuple for each list of values for the grouping attributes and their group’s aggregations.
30
Example: Grouping/Aggregation R =ABC 123 456 125 A,B,AVG(C) (R) = ?? First, group R : ABC 123 125 456 Then, average C within groups: ABAVG(C) 12 4 45 6
31
Example: Grouping/Aggregation StarsIn(title, year, starName) We want, for each star who has appeared in at least three movies the earliest year in which he or she appeared. First we group, using starName as a grouping attribute. Then, we have to compute the MIN(year) for each group. However, we need also compute COUNT(title) aggregate for each group, in order to filter out those stars with less than three movies. ctTitle>3 [ starName,MIN(year) minYear,COUNT(title) ctTitle (StarsIn)]
32
Aggregations in SQL SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column. Find the average length of movies from Disney. SELECT AVG(length) FROM Movie WHERE studioName = ' Disney ' ;
33
Eliminating Duplicates in an Aggregation DISTINCT inside an aggregation causes duplicates to be eliminated before the aggregation. Example: Find the number of different producers for Disney movies: SELECT COUNT(DISTINCT producerc) FROM Movie WHERE studioname = 'Disney'; This is not the same as: SELECT DISTINCT COUNT(producerc) FROM Movie WHERE studioname = 'Disney';
34
NULL’s Ignored in Aggregation NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column. select SUM(networth) from movieexec;
35
Example: Effect of NULL’s SELECT count(*) FROM Movie WHERE studioName = 'Disney'; SELECT count(length) FROM Movie WHERE studioName = 'Disney'; The number of movies from Disney. The number of movies from Disney with a known length.
36
Grouping We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group. From Movie relation, find the average length for each studio : SELECT studioName, AVG(length) FROM Movie GROUP BY studioName;
37
Example: Grouping Find the producer’s total length of film produced. SELECT name, SUM(length) FROM Movie, MovieExec WHERE producerc = cert GROUP BY name; Compute those tuples first, then group by name.
38
Restriction on SELECT Lists With Aggregation If any aggregation is used, then each element of the SELECT list must be either: 1.Aggregated, or 2.An attribute on the GROUP BY list.
39
Illegal Query Example We might think we could find the shortest movie of Disney as: SELECT title, MIN(length) FROM Movie WHERE studioName = 'Disney'; But this query is illegal in SQL. Why? Because title is neither aggregated nor on the GROUP BY list. We should do instead: SELECT title, length FROM Movie WHERE studioName = 'Disney' AND length = (SELECT MIN(length) FROM Movie WHERE studioName = 'Disney');
40
HAVING Clauses HAVING may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated. These conditions may refer to attributes that make sense within a group; i.e., they are either: 1.Grouping attributes, or 2.Aggregated attributes.
41
Example: HAVING Suppose that we didn’t wish to include all the producers in our table of aggregated movie lengths. Suppose for instance we want those producers who have at least one movie before 1972. SELECT name, SUM(length) FROM MovieExec, Movie WHERE producerc = cert GROUP BY name HAVING MIN(year) < 1973;
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.