Published by Simon Sherman. Modified over 9 years ago.
SQL Unit 7: Set Operations
Kirk Scott

7.1 Introduction
7.2 UNION Queries
7.3 Queries with IN (Intersection)
7.4 Queries with NOT IN (Set Subtraction)
7.5 Unions, Joins, and Outer Joins

7.1 Introduction
1. The technical term for a table is a relation. A relation is like a set. The technical term for a row in a table is a tuple. A tuple is like an element in a set.
The fundamental difference between an element of a set and a tuple in a relation is that the tuple may be composite: it may contain values for more than one attribute.
The similarity between sets and relations explains several properties of relations. The order of elements in a set is immaterial; likewise, the order of tuples in a relation is immaterial.
A set can't contain duplicate elements; likewise, a relation can't contain duplicate tuples (although a query result can).
2. Recall the logical operator OR, which lets you combine conditions on the values of attributes. There is also a set-level operation, union, which is related in meaning. Union applies not to attribute values but to collections of tuples in relations.
Given two sets, A and B, you may recall this definition of union from math class: $A \cup B$ = the union of A and B = the set of elements in A or in B or in both A and B
[Venn diagram: overlapping circles A and B.] Both A and B, including the area where they overlap, are shaded, indicating that all of it is included in the union.
Microsoft Access SQL has the keyword UNION, which implements the idea of set union.
3. Here is a simple example illustrating the use of the union operator. Suppose that table A and table B have each been defined with the same number of fields, of the same type, in the same order. The names of the fields don't have to be the same.
Then consider this query:
    SELECT * FROM A
    UNION
    SELECT * FROM B
The results will contain the set of rows that were in A or B or both. Because the query uses a set operator, and duplicates are not allowed in sets, any row that occurs in both A and B will appear only once in the results. It's a minor point, but worth emphasizing: in general, query results may contain duplicate rows, but the UNION set operator has an effect similar to the keyword DISTINCT.
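The duplicate elimination described above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Access, and the tables A and B with their two fields and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Access (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, label TEXT);
    CREATE TABLE B (num INTEGER, txt TEXT);   -- field names differ on purpose
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (2, 'y'), (3, 'z');
""")

# The row (2, 'y') occurs in both tables but appears only once in the union.
union_rows = conn.execute("SELECT * FROM A UNION SELECT * FROM B").fetchall()
print(sorted(union_rows))  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

The result is sorted in Python before printing so that the output does not depend on the order in which the database happens to return rows.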
4. The previous example specified that the tables in the two parts of the query had to have the same number of fields, of the same type, in the same order. It wouldn't do to have records in the same result table that varied in the number of fields they contained. It also wouldn't do for numeric fields to hold non-numeric values, and vice versa.
Having a correspondence between fields in the two parts of the query is known as union compatibility. The specific requirements for this are:
A. The corresponding fields in the two parts of the query should mean the same thing. This may be referred to as semantic equivalence.
B. If the corresponding fields are of exactly the same type and size, there is no problem at all. The formal requirements are less stringent, though:
    i. All numeric fields are union compatible with each other.
    ii. All text fields are union compatible with each other.
    iii. All date fields are union compatible with each other.
In cases where the types of the fields are not the same but are union compatible, the "larger" of the two types will be used in the results. Given two union compatible types, the "larger" type can always hold values of the other type.
A text field with a large width can hold the values of a text field with a smaller width. A numeric type that allows decimal places can hold integer values. Since the type that can hold the other is used in the results, no data will be lost when a union is done.
7.2 UNION Queries
1. Here is a concrete example of a union query using tables and fields from the cardealership database:
    SELECT * FROM Car WHERE make = 'Chevrolet'
    UNION
    SELECT * FROM Car WHERE make = 'Toyota'
This query illustrates the relationship between the UNION operator and the OR operator. Because the two parts of the query are on the same table, the following query would accomplish the same thing:
    SELECT * FROM Car
    WHERE make = 'Chevrolet' OR make = 'Toyota'
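A small check of the UNION/OR equivalence, again assuming Python's sqlite3 module in place of Access. The Car table here is trimmed to three fields, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, trimmed-down Car table; the real one has more fields.
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    18000),
        ('v3', 'Ford',      21000);
""")

union_rows = conn.execute("""
    SELECT * FROM Car WHERE make = 'Chevrolet'
    UNION
    SELECT * FROM Car WHERE make = 'Toyota'
""").fetchall()

or_rows = conn.execute("""
    SELECT * FROM Car WHERE make = 'Chevrolet' OR make = 'Toyota'
""").fetchall()

# Both forms return the same two rows; the Ford is excluded by both.
same_rows = sorted(union_rows) == sorted(or_rows)
print(same_rows)  # True
```

The two forms agree here because every Car row is unique (vin is the primary key), so there are no duplicates for UNION to remove.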
In this query the Car table is the "universe", and the query finds the union of two disjoint subsets of the Car table, because no car could have two different makes at the same time. This is the Venn diagram for the query:
[Venn diagram: two non-overlapping regions, make = 'Chevrolet' and make = 'Toyota', inside the Car table.]
2. Here is another example of a union query. The two parts of the query are based on two different tables:
    SELECT name, addr, city, state FROM Customer
    UNION
    SELECT name, addr, city, state FROM Salesperson
Because two tables are involved, it would not be possible to accomplish this with the OR operator. Notice that there is no problem with union compatibility because the corresponding fields in the two tables were defined in exactly the same way.
The Venn diagram for this query is more typical than the previous diagram. The same person could be both a salesperson and a customer. The results of the query would include the names, addresses, cities, and states of all customers, all salespeople, and anybody who fell into both categories.
[Venn diagram: overlapping sets Customers and Salespeople.]
A union can be thought of as a vertical combination of two tables:
[Diagram: UNION.]
3. As noted previously, the union operator eliminates duplicates from the results of a query. If you would like to do a union and keep the duplicates, use the keywords UNION ALL:
    SELECT city FROM Customer
    UNION ALL
    SELECT city FROM Salesperson
There is a side effect related to eliminating or keeping duplicates in the results. When plain UNION is used, the duplicates will be eliminated and the results will typically be sorted in some order. The explanation is that the system typically eliminates duplicates as follows: first it sorts the records; after sorting, duplicates are next to each other; then the system finds and eliminates them. The ordering is a side effect, not a guarantee.
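The difference between UNION and UNION ALL can be seen in a short sketch. It assumes Python's sqlite3 module rather than Access, with hypothetical Customer and Salesperson rows and only the fields needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed, hypothetical versions of the two tables.
    CREATE TABLE Customer (custno INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO Customer VALUES (1, 'Ann', 'Anchorage'), (2, 'Bob', 'Juneau');
    INSERT INTO Salesperson VALUES (10, 'Cy', 'Anchorage');
""")

distinct_cities = conn.execute(
    "SELECT city FROM Customer UNION SELECT city FROM Salesperson"
).fetchall()

all_cities = conn.execute(
    "SELECT city FROM Customer UNION ALL SELECT city FROM Salesperson"
).fetchall()

print(sorted(distinct_cities))  # [('Anchorage',), ('Juneau',)]
print(len(all_cities))          # 3, because 'Anchorage' is kept twice
```

Neither query has an ORDER BY, so the sketch sorts in Python instead of relying on whatever order the database returns.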
4. It is possible to do unions where one part of the query doesn't have fields corresponding to all of the fields in the other part. Those fields that correspond have to be union compatible. For those fields without corresponding fields, nulls have to be used.
Recall that the schemas for Customer and Salesperson look like this:
    Customer(custno pk, name, addr, city, state, phone)
    Salesperson(spno pk, name, addr, city, state, phone, bossno, commrate)
Here is an example of a union query where all of the fields of the Customer table are matched with the explicitly listed corresponding fields of the Salesperson table:
    SELECT * FROM Customer
    UNION
    SELECT spno, name, addr, city, state, phone FROM Salesperson
If you would like to keep all of the fields from the Salesperson table while also including all of the records from the Customer table in the results, you could do this:
    SELECT *, NULL, NULL FROM Customer
    UNION
    SELECT * FROM Salesperson
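A minimal sketch of the NULL-padding query, assuming Python's sqlite3 module as a stand-in for Access. The table definitions follow the schemas given above; the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        custno INTEGER PRIMARY KEY, name TEXT, addr TEXT,
        city TEXT, state TEXT, phone TEXT);
    CREATE TABLE Salesperson (
        spno INTEGER PRIMARY KEY, name TEXT, addr TEXT,
        city TEXT, state TEXT, phone TEXT, bossno INTEGER, commrate REAL);
    INSERT INTO Customer VALUES
        (1, 'Ann', '1 Elm St', 'Anchorage', 'AK', '555-0101');
    INSERT INTO Salesperson VALUES
        (10, 'Cy', '2 Oak St', 'Juneau', 'AK', '555-0102', NULL, 0.05);
""")

# Customer has six fields, so two NULLs pad it out to Salesperson's eight.
padded_rows = conn.execute("""
    SELECT *, NULL, NULL FROM Customer
    UNION
    SELECT * FROM Salesperson
""").fetchall()

for row in padded_rows:
    print(row)
```

Every row in the result has eight fields; for the customer row, the positions corresponding to bossno and commrate come back as None, which is how Python represents NULL.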
7.3 Queries with IN (Intersection)
1. Among the concepts of set theory, along with union, there are intersection and set containment. Given two sets, A and B, here is the definition of union again, along with the definitions of intersection and containment:
$A \cup B$ = the union of A and B = the set of elements in A or in B or in both A and B
$A \cap B$ = the intersection of A and B = the set of elements that A and B have in common
$A \subseteq B$ = A is contained in B; as a proposition this is either true or false: either all the elements of A are also in B, or they're not
[Venn diagram: overlapping circles A and B.] The area where A and B overlap is crosshatched, indicating that this is the area in the intersection.
[Venn diagram: circle A inside circle B.] This diagram signifies that A is contained in B.
Microsoft Access SQL does not have keywords for intersection or containment, but it does have the operator IN. Using IN, it is possible to write expressions that check whether or not a given value is an element of a set. This makes it possible to find the intersection of two sets.
2. It is possible to specify a set of values in SQL by enclosing the values in parentheses (not curly braces) and separating them with commas. This first example of the use of the keyword IN involves such a set:
    SELECT * FROM Car
    WHERE make IN ('Chevrolet', 'Toyota')
This query is equivalent in results to the following query already seen above:
    SELECT * FROM Car
    WHERE make = 'Chevrolet' OR make = 'Toyota'
The results of the query are the union of two sets.
3. The more general use of the keyword IN occurs when the set of values is defined by a subquery rather than by a list in parentheses. An example is shown below. Notice that its structure is similar to the foregoing examples.
The outer query selects from a table where some field value is in or is not in the set specified by the subquery:
    SELECT name FROM Salesperson
    WHERE spno IN (SELECT spno FROM Carsale)
This query illustrates the ideas of intersection and containment. You're selecting the names of salespeople whose spno's appear in the Carsale table. Because of referential integrity, every spno in the Carsale table has to appear in the Salesperson table. That means that the set of spno's from the Carsale table is a subset of the spno's in the Salesperson table.
Not every salesperson has to have sold a car, so not every spno in Salesperson necessarily appears in the Carsale table. When you find the intersection between the two, it is simply the set of spno's from Carsale.
[Venn diagram: the Carsale spno's inside the Salesperson spno's; the names come from the Salesperson table.]
The query finds the names of salespeople who sold cars. Notice that because of the way this query is structured as a set query, a salesperson's name will appear only once in the results, even if that salesperson sold more than one car. In other words, a given spno may occur more than once in the Carsale table, but it will only appear once in the query results.
This happens because of how a set query with IN logically works: in the outer query, when checking whether a given spno is in the set defined by the subquery, the answer is either yes or no. If the answer is yes, the Salesperson row qualifies for the outer query, but it only appears there once.
The IN can be read as "squeezing out" duplicate occurrences of spno. Then in the outer query, for each distinct spno, the one corresponding name is shown.
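The "squeezing out" effect can be contrasted with a plain join in a short sketch. It assumes Python's sqlite3 module in place of Access; the tables are trimmed to the fields needed and the rows are hypothetical. The join query is added here only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salesperson (spno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER, salesprice INTEGER);
    INSERT INTO Salesperson VALUES (10, 'Cy'), (11, 'Dee'), (12, 'Eve');
    -- Cy sold two cars, Dee sold one, Eve sold none.
    INSERT INTO Carsale VALUES ('v1', 10, 17500), ('v2', 10, 20000), ('v3', 11, 16000);
""")

# IN asks a yes/no question per salesperson, so each name appears once.
in_names = conn.execute("""
    SELECT name FROM Salesperson
    WHERE spno IN (SELECT spno FROM Carsale)
""").fetchall()

# A join produces one row per matching sale, so Cy appears twice.
join_names = conn.execute("""
    SELECT name FROM Salesperson, Carsale
    WHERE Salesperson.spno = Carsale.spno
""").fetchall()

print(sorted(in_names))    # [('Cy',), ('Dee',)]
print(sorted(join_names))  # [('Cy',), ('Cy',), ('Dee',)]
```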
4. Here is another example of an IN query with a subquery. It shows the stickerprices of cars that sold.
    SELECT stickerprice FROM Car
    WHERE vin IN (SELECT vin FROM Carsale)
[Venn diagram: the Carsale vin's inside the Car vin's; the stickerprices come from the Car table.] Since not all cars have sold, the Carsale vin's will be a subset of the Car vin's.
The previous example illustrated how the use of IN can remove duplicates. This example illustrates another point. Remember that cars can only be sold once, so there would be no duplicate vin values to squeeze out of the subquery results. IN would still squeeze them out if they existed, but that won't actually happen in this case.
However, duplicates can still arise in the overall results. In this example, among the cars that sold, there are two with a stickerprice of 18,000, so 18,000 will show up twice in the results of the query.
Likewise, in the previous example, if two salespeople had the same name, duplicates would appear in the results. It was just assumed that there would be no duplicate names.
The explanation is that in the Carsale table as given, cars can only be sold once, so their vin's show up there only once. IN would eliminate duplicate vin's if a car had been sold more than once, but that has no effect in this example because there are no duplicate car sales.
However, once execution reaches the outer query, for whatever set of vin's the inner query found, you select the stickerprice. If two different vin's have the same stickerprice, that stickerprice will be shown twice in the overall results of the query.
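The duplicate stickerprice case can be reproduced in a small sketch, assuming Python's sqlite3 module instead of Access and hypothetical rows in trimmed-down Car and Carsale tables. The DISTINCT variant is included only to show one way the repeats could be removed if that were wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    18000),
        ('v3', 'Ford',      21000);
    -- Two cars sold, each once; both happen to have the same stickerprice.
    INSERT INTO Carsale VALUES ('v1', 10), ('v2', 11);
""")

# The vin's are distinct, but the selected field is stickerprice.
sold_prices = conn.execute("""
    SELECT stickerprice FROM Car
    WHERE vin IN (SELECT vin FROM Carsale)
""").fetchall()

distinct_prices = conn.execute("""
    SELECT DISTINCT stickerprice FROM Car
    WHERE vin IN (SELECT vin FROM Carsale)
""").fetchall()

print(sold_prices)      # [(18000,), (18000,)]
print(distinct_prices)  # [(18000,)]
```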
5. In the previous two example queries, one table is used in the inner query and another in the outer query. In order to do an IN query, the inner query has to select exactly one field, and there has to be a field which corresponds to it in the table of the outer query.
In the examples given, the names of the corresponding fields, vin and spno, were the same in the inner and outer queries. There was no need to fully qualify the field names because the parentheses serve as a barrier between the inner and outer queries. Inside the parentheses the field name belongs to the table of the inner query. Outside of the parentheses the field name belongs to the table of the outer query.
7.4 Queries with NOT IN (Set Subtraction)
1. Among the concepts of set theory, along with union, intersection, and containment, there are two more concepts to consider: complement (or negation) and set subtraction. Given two sets, A and B, here are the definitions of complement and subtraction:
$A'$ = the complement of A = the set of elements not in A
$A - B$ = the difference between A and B = the set of elements which are in A but not in B
[Venn diagram: the complement $A'$, the area outside circle A.]
[Venn diagram: $A - B$, the part of circle A outside circle B.]
Microsoft Access SQL does not have separate operators for complement or set subtraction. However, it does have the operator NOT, which is negation or complement. Using NOT, you can negate expressions and effectively find a complement. Using NOT IN, you can accomplish set subtraction.
2. It is possible to negate the initial query of the previous section. This gives a straightforward example of the use of NOT IN:
    SELECT * FROM Car
    WHERE make NOT IN ('Chevrolet', 'Toyota')
The results of this query are the complement of the results of the plain IN query of the previous section. This is the Venn diagram for the negated query, where once again, it is the shaded area which is included in the results:
[Venn diagram: regions make = 'Toyota' and make = 'Chevrolet' inside the Car table.]
3. The non-negated version of the query, which simply used IN, could be interpreted as an OR query. The negated version of the query under discussion here, which uses NOT, can be regarded as the negation of an OR query. Once you start negating logical expressions, you need to be careful in interpreting what the results might be.
The following query is approximately logically equivalent to the NOT IN query:
    SELECT * FROM Car
    WHERE make <> 'Chevrolet' AND make <> 'Toyota'
If you negate something that can be regarded as an OR, you get the AND of the two parts negated separately. Likewise, if you negate something that can be regarded as an AND, you get the OR of the two parts negated separately. The following two rules are DeMorgan's Laws for sets. They give the general result described here:
$(A \cup B)' = A' \cap B'$
$(A \cap B)' = A' \cup B'$
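The reason for saying "approximately" has to do with null values. For a list of non-null literals, the two queries return the same rows: under SQL's three-valued logic, a null make causes both make <> 'Chevrolet' and make NOT IN (...) to evaluate to unknown, so neither query returns records where make is null. The hazard to watch for is a null inside the set itself, which can happen when the set comes from a subquery; in that case NOT IN is never true and the query returns no rows.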
To understand why, look at the NOT IN query again:
    SELECT * FROM Car
    WHERE make NOT IN ('Chevrolet', 'Toyota')
The logic of this is that a null is not a value at all. Null could never be an element of a set. When you use the set operator NOT IN, the system will only return records with actual values for make. It will not return records where make is null.
You can add this to the list of "peculiarities" of set queries. Together, the list consists of these two points:
    Set operators squeeze out duplicate values.
    Tests with IN and NOT IN never match nulls.
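How nulls behave in these queries can be checked directly. The sketch below assumes Python's sqlite3 module, which follows standard SQL three-valued logic; Access behavior should be confirmed separately. The Car rows, including one with a null make, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet'), ('v2', 'Toyota'), ('v3', 'Ford'), ('v4', NULL);
""")

# The car with a null make ('v4') is not returned by NOT IN.
not_in = conn.execute("""
    SELECT vin FROM Car WHERE make NOT IN ('Chevrolet', 'Toyota')
""").fetchall()

# The AND form: comparing a null make with <> gives unknown, not true.
and_form = conn.execute("""
    SELECT vin FROM Car WHERE make <> 'Chevrolet' AND make <> 'Toyota'
""").fetchall()

# A null inside the set itself: NOT IN can no longer be true for any row.
null_in_set = conn.execute("""
    SELECT vin FROM Car WHERE make NOT IN ('Chevrolet', NULL)
""").fetchall()

print(not_in)       # [('v3',)]
print(and_form)     # [('v3',)]
print(null_in_set)  # []
```

In SQLite, both the NOT IN form and the AND form leave out the car whose make is null. Rows with a null make would need an explicit make IS NULL condition to be included.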
4. Here is a more general NOT IN query, where the set is defined by a subquery:
    SELECT stickerprice FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
[Venn diagram: the Carsale vin's inside the Car vin's; the stickerprices come from the Car vin's that are not Carsale vin's.]
The meaning of this is straightforward. It finds the stickerprices of cars that haven't sold. There are no surprises because the field in question, vin, is the primary key of both the Car and Carsale tables, so it will never be null.
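Set subtraction with a subquery can be sketched as follows, assuming Python's sqlite3 module in place of Access and hypothetical rows in trimmed-down Car and Carsale tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    CREATE TABLE Carsale (vin TEXT PRIMARY KEY, spno INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    18000),
        ('v3', 'Ford',      21000);
    INSERT INTO Carsale VALUES ('v1', 10);   -- only v1 has sold
""")

# Car vin's minus Carsale vin's: the cars that haven't sold.
unsold_prices = conn.execute("""
    SELECT stickerprice FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
""").fetchall()

print(sorted(unsold_prices))  # [(18000,), (21000,)]
```

Because vin is a primary key in Carsale, the subquery cannot produce a null, so the NOT IN test behaves as plain set subtraction.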
7.5 Unions, Joins, and Outer Joins
1. Part of the relationship between unions and outer joins was brought up in a previous unit. A join can be thought of as a horizontal combination of two tables:
[Diagram: JOIN.]
A union can be thought of as a vertical combination of two tables:
[Diagram: UNION.]
A full outer join can be found with the union of a left join and a right join on the same two tables:
    …LEFT JOIN…
    UNION
    …RIGHT JOIN…
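The pattern above can be sketched with two hypothetical tables, A and B, joined on a field k. The sketch assumes Python's sqlite3 module rather than Access. Because SQLite versions before 3.39 lack RIGHT JOIN, the right join is written here as a LEFT JOIN with the tables swapped and the fields listed in the same order, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with one matching key (2) and one unmatched key each.
    CREATE TABLE A (k INTEGER, a TEXT);
    CREATE TABLE B (k INTEGER, b TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (2, 'b2'), (3, 'b3');
""")

full_outer = conn.execute("""
    SELECT A.k, A.a, B.k, B.b FROM A LEFT JOIN B ON A.k = B.k
    UNION
    -- Stand-in for A RIGHT JOIN B: keeps every row of B.
    SELECT A.k, A.a, B.k, B.b FROM B LEFT JOIN A ON A.k = B.k
""").fetchall()

for row in full_outer:
    print(row)
```

The matched row for k = 2 is produced by both halves, and UNION keeps it only once, so the result has three rows: one unmatched row from A, one matched row, and one unmatched row from B.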
2. It is also possible to do an outer join with the help of set operators. To get started on this topic, here is a review of what an outer join is. A left or right outer join is a join that includes all of the records from both tables that match on the joining field, plus all of the records from one of the tables, either left or right, that don't have a match on the joining field. For those that don't have a match, it supplies NULL as the value for the fields that come from the other table.
This is a left join on the Car and Carsale tables:
    SELECT * FROM Car LEFT JOIN Carsale
    ON Car.vin = Carsale.vin
This will give a result table containing records for all cars, both those sold and unsold. For those that sold, there will be values for the fields vin (from the Carsale table), spno, custno, salesdate, and salesprice. For cars that didn't sell, those fields will be null.
3. It's not important to be able to do an outer join using UNION, NOT, and IN. The outer join syntax is easier. However, writing an outer join query with the set operators gives an additional chance to see them used to accomplish a desired result. The first part of the problem is simple.
This plain join will give all of the records which have matches:
    SELECT * FROM Car, Carsale
    WHERE Car.vin = Carsale.vin
This nested query with NOT IN will find the records of all of the cars that didn't sell, and it will put the value NULL into the positions in the result table that correspond to the five fields of the Carsale table:
    SELECT *, NULL, NULL, NULL, NULL, NULL FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
The left join is completed by finding the UNION of the two previous results:
    SELECT * FROM Car, Carsale
    WHERE Car.vin = Carsale.vin
    UNION
    SELECT *, NULL, NULL, NULL, NULL, NULL FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
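As a rough check that the set-operator version matches the LEFT JOIN, the sketch below runs both and compares the rows. It assumes Python's sqlite3 module as a stand-in for Access. The Car table is trimmed to three fields, Carsale uses the five fields named above, and the column types and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Car (vin TEXT PRIMARY KEY, make TEXT, stickerprice INTEGER);
    CREATE TABLE Carsale (
        vin TEXT PRIMARY KEY, spno INTEGER, custno INTEGER,
        salesdate TEXT, salesprice INTEGER);
    INSERT INTO Car VALUES
        ('v1', 'Chevrolet', 18000),
        ('v2', 'Toyota',    18000),
        ('v3', 'Ford',      21000);
    INSERT INTO Carsale VALUES ('v1', 10, 1, '2015-03-01', 17500);
""")

left_join = conn.execute("""
    SELECT * FROM Car LEFT JOIN Carsale ON Car.vin = Carsale.vin
""").fetchall()

# Matching rows from the plain join, plus unsold cars padded with five NULLs.
set_version = conn.execute("""
    SELECT * FROM Car, Carsale WHERE Car.vin = Carsale.vin
    UNION
    SELECT *, NULL, NULL, NULL, NULL, NULL FROM Car
    WHERE vin NOT IN (SELECT vin FROM Carsale)
""").fetchall()

same_rows = set(left_join) == set(set_version)
print(same_rows)  # True
```

The comparison uses sets because neither query specifies an order. The two versions agree here because every Car row is distinct, so UNION's duplicate removal does not drop anything that the LEFT JOIN would keep.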
The End