Lecture 3: Relational Algebra and SQL Tuesday, March 25, 2008.


1 Lecture 3: Relational Algebra and SQL Tuesday, March 25, 2008

2 2 Outline Relational Algebra: 4.2 (except 4.2.5) SQL: 5.2, 5.3, 5.4

3 3 Querying the Database Goal: specify what we want from our database –Find all the employees who earn more than $50,000 and pay taxes in New Jersey. Could write in C++/Java, but bad idea Instead use high-level query languages: –Theoretical: Relational Algebra, Datalog –Practical: SQL Relational algebra: a basic set of operations on relations that provide the basic principles.

4 4 Instances of Branch and Staff (part) Relations

5 5 Relational Algebra Operators: relations as input, new relation as output Five basic RA operators: –Set operators: union (∪), difference (–) –Selection: σ –Projection: π –Cartesian product: × Derived operators: –Intersection, complement –Joins (natural, equi-join, theta join, semi-join) When our relations have attribute names: –Renaming: ρ

6 6 Set Operations: Union Union: all tuples in R1 or R2 Notation: R1 U R2 R1, R2 must have the same schema R1 U R2 has the same schema as R1, R2 Example: –ActiveEmployees U RetiredEmployees

7 7 Union R ∪ S –The union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. –R and S must be union-compatible. If R and S have I and J tuples, respectively, the union is obtained by concatenating them into one relation with a maximum of (I + J) tuples. Example: List all cities where there is either a branch office or a property for rent. π city (Branch) ∪ π city (PropertyForRent) or R = Branch[city] union PropertyForRent[city]

8 8 Intersection R ∩ S –Defines a relation consisting of the set of all tuples that are in both R and S. –R and S must be union-compatible. Expressed using basic operations: R ∩ S = R – (R – S) Example: List all cities where there is both a branch office and at least one property for rent. π city (Branch) ∩ π city (PropertyForRent)

9 9 Difference (Minus) R – S –Defines a relation consisting of the tuples that are in relation R, but not in S. –R and S must be union-compatible. Example: List all cities where there is a branch office but no properties for rent. π city (Branch) – π city (PropertyForRent) or R = Branch[city] - PropertyForRent[city]

10 10 Set Operations: Difference Difference: all tuples in R1 and not in R2 Notation: R1 – R2 R1, R2 must have the same schema R1 - R2 has the same schema as R1, R2 Example –AllEmployees - RetiredEmployees
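The set operators above map directly onto Python's built-in set type. The following is a minimal sketch, assuming relations are modeled as sets of tuples; the city names are hypothetical sample data, not the contents of the lecture's Branch and PropertyForRent figures. It also checks the identity R ∩ S = R – (R – S).

```python
# Relations modeled as Python sets of tuples (one tuple per row).
# Hypothetical sample data, not taken from the lecture's figures.
branch_cities = {("London",), ("Glasgow",), ("Bristol",)}
property_cities = {("London",), ("Glasgow",), ("Aberdeen",)}

union = branch_cities | property_cities         # R ∪ S: duplicates vanish automatically
difference = branch_cities - property_cities    # R – S
intersection = branch_cities & property_cities  # R ∩ S

# Intersection derived from difference alone: R ∩ S = R – (R – S)
derived = branch_cities - (branch_cities - property_cities)

print(sorted(union))
print(sorted(difference))
print(derived == intersection)
```

Because both operands are sets of 1-tuples over the same attribute, they are union-compatible, which is the precondition the slides state for all three operators.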

11 11 Relational Algebra Operations

12 12 Set Operations: Selection Returns all tuples which satisfy a condition Notation: σ c (R) c is a condition: =, <, >, and, or, not Output schema: same as input schema Find all employees with salary > $40,000: –σ Salary > 40000 (Employee)

13 13 Find all employees with salary more than $40,000. σ Salary > 40000 (Employee)

14 14 Restriction (or Selection) –Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate). Example: List all staff with a salary greater than US$10,000. σ salary > 10000 (Staff) or R = STAFF Where Salary > 10000

15 15 Projection Unary operation: returns certain columns Eliminates duplicate tuples! Notation: π A1,…,An (R) Input schema R(B1,…,Bm) Condition: {A1, …, An} ⊆ {B1, …, Bm} Output schema S(A1,…,An) Example: project social-security number and names: –π SSN, Name (Employee)

16 16 Projection π col1,..., coln (R) –Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. Example: Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details. π staffNo, fName, lName, salary (Staff) or Staff [ staffNo, fName, lName, salary ]

17 17 π SSN, Name (Employee)

18 18 Cartesian Product Each tuple in R1 with each tuple in R2 Notation: R1 × R2 Input schemas R1(A1,…,An), R2(B1,…,Bm) Condition: {A1,…,An} ∩ {B1,…,Bm} = ∅ Output schema is S(A1, …, An, B1, …, Bm) Example: Employee × Dependents Very rare in practice, but joins are very common
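Selection and projection can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `select` and `project`, the list-of-dicts representation, and the Employee rows are assumptions made for the example, not part of the lecture.

```python
# A relation as a list of dict rows; hypothetical Employee data.
employee = [
    {"SSN": "999999999", "Name": "John", "Salary": 50000},
    {"SSN": "777777777", "Name": "Tony", "Salary": 30000},
    {"SSN": "888888888", "Name": "John", "Salary": 45000},
]

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate; schema unchanged."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """Projection: keep only the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:          # set semantics: each tuple appears once
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

high_paid = select(employee, lambda r: r["Salary"] > 40000)
names = project(employee, ["Name"])

print(high_paid)
print(names)
```

Projecting on Name alone collapses the two rows for John into one, which is the duplicate elimination the projection slides call out.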

20 20 Cartesian Product (Multiplication) R × S –Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S. Example: List the names and comments of all clients who have viewed a property for rent. (π clientNo, fName, lName (Client)) × (π clientNo, propertyNo, comment (Viewing)) or Client [ clientNo, fName, lName ] x Viewing [ clientNo, propertyNo, comment ]

21 21 Example - Cartesian product and Selection Use the selection operation to extract those tuples where Client.clientNo = Viewing.clientNo. σ Client.clientNo = Viewing.clientNo ((π clientNo, fName, lName (Client)) × (π clientNo, propertyNo, comment (Viewing))) Cartesian product and Selection can be reduced to a single operation called a Join.

22 22 Renaming Does not change the relational instance Changes the relational schema only Notation: ρ B1,…,Bn (R) Input schema: R(A1, …, An) Output schema: S(B1, …, Bn) Example: ρ LastName, SocSocNo (Employee)

23 23 Renaming Example
Employee (Name, SSN): (John, 999999999), (Tony, 777777777)
ρ LastName, SocSocNo (Employee) (LastName, SocSocNo): (John, 999999999), (Tony, 777777777)
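The reduction of "Cartesian product followed by selection" to a join can be traced with a short Python sketch. The client and viewing rows below are hypothetical stand-ins for the lecture's Client and Viewing relations, already projected onto the attributes used on the slide.

```python
from itertools import product

# Hypothetical rows: (clientNo, fName, lName) and (clientNo, propertyNo, comment).
client = [("CR76", "John", "Kay"), ("CR56", "Aline", "Stewart")]
viewing = [("CR56", "PA14", "too small"), ("CR76", "PG4", "too remote")]

# Cartesian product: every client tuple concatenated with every viewing tuple.
cartesian = [c + v for c, v in product(client, viewing)]

# Selection on Client.clientNo = Viewing.clientNo (positions 0 and 3).
joined = [row for row in cartesian if row[0] == row[3]]

print(len(cartesian))
for row in joined:
    print(row)
```

The product has |Client| × |Viewing| rows, while the selection keeps only the pairs that refer to the same client, which is exactly what an equijoin on clientNo would return.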

24 24 Derived Operations Intersection can be derived: –R1 ∩ R2 = R1 – (R1 – R2) –There is another way to express it (later) Most importantly: joins, in many variants

25 25 Join Join is a derivative of Cartesian product. Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.

26 26 Join Join is a derivative of Cartesian product. Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations. One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems. Various forms of join operation –Natural join (defined by Codd) –Outer join –Theta join –Equijoin (a particular type of Theta join) –Semijoin

27 27 Theta join (θ-join) R ⋈_F S –Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S. –The predicate F is of the form R.a_i θ S.b_i where θ may be one of the comparison operators (<, ≤, >, ≥, =, ≠).

28 28 Theta join (θ-join) Can rewrite Theta join using basic Selection and Cartesian product operations. R ⋈_F S = σ_F (R × S) Degree of a Theta join is the sum of the degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.

29 29 Example - Equijoin List the names and comments of all clients who have viewed a property for rent. (π clientNo, fName, lName (Client)) ⋈_(Client.clientNo = Viewing.clientNo) (π clientNo, propertyNo, comment (Viewing))

30 30 EQUIJOIN SELECT table.column, table.column FROM table1, table2 WHERE table1.column1 = table2.column2;

31 31 Natural Join Notation: R1 ⋈ R2 Input Schema: R1(A1, …, An), R2(B1, …, Bm) Output Schema: S(C1,…,Cp) –Where {C1, …, Cp} = {A1, …, An} ∪ {B1, …, Bm} Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes: –{A1,…,An} ∩ {B1,…, Bm} (called the join attributes) Equivalent to a cross product followed by selection Example: Employee ⋈ Dependents

32 32 Natural Join Example
Employee (Name, SSN): (John, 999999999), (Tony, 777777777)
Dependents (SSN, Dname): (999999999, Emily), (777777777, Joe)
Employee ⋈ Dependents (Name, SSN, Dname): (John, 999999999, Emily), (Tony, 777777777, Joe)
Employee ⋈ Dependents = π Name, SSN, Dname (σ SSN=SSN2 (Employee × ρ SSN2, Dname (Dependents)))

33 33 Natural Join List the names and comments of all clients who have viewed a property for rent. (π clientNo, fName, lName (Client)) Join (π clientNo, propertyNo, comment (Viewing)) or Client [ clientNo, fName, lName ] Join Viewing [ clientNo, propertyNo, comment ]

34 34 Another Natural Join Example
R (A, B): (X, Y), (X, Z), (Y, Z), (Z, V)
S (B, C): (Z, U), (V, W), (Z, V)
R ⋈ S (A, B, C): (X, Z, U), (X, Z, V), (Y, Z, U), (Y, Z, V), (Z, V, W)

35 35 Natural Join Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S? Given R(A, B, C), S(D, E), what is R ⋈ S? Given R(A, B), S(A, B), what is R ⋈ S?

36 36 Theta Join A join that involves a predicate Notation: R1 ⋈_θ R2 where θ is a condition Input schemas: R1(A1,…,An), R2(B1,…,Bm) {A1,…,An} ∩ {B1,…,Bm} = ∅ Output schema: S(A1,…,An,B1,…,Bm) Derived operator: R1 ⋈_θ R2 = σ_θ (R1 × R2)

37 37 Semijoin R ⋉ S = π A1,…,An (R ⋈ S) Where the schemas are: –Input: R(A1,…,An), S(B1,…,Bm) –Output: T(A1,…,An)

38 38 Outer join To display rows in the result that do not have matching values in the join column, use Outer join. R ⟕ S –(Left) outer join is a join in which tuples from R that do not have matching values in the common columns of S are also included in the result relation.

39 39 Outer Join To display rows in the result that do not have matching values in the join column, use Outer join. R Left Outer Join S –(Left) outer join is a join in which tuples from R that do not have matching values in the common columns of S are also included in the result relation. Example: Produce a status report on property viewings. π propertyNo, street, city (PropertyForRent) Left Outer Join Viewing
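The natural join in this example can be reproduced with a small Python sketch. The function name `natural_join` and the representation (sets of tuples plus an explicit schema tuple) are assumptions chosen for illustration; the R and S rows are the ones listed on the slide.

```python
def natural_join(r_rows, r_schema, s_rows, s_schema):
    """Natural join: pair tuples that agree on all shared attribute names."""
    common = [a for a in r_schema if a in s_schema]      # join attributes
    s_only = [a for a in s_schema if a not in common]    # kept once from S
    out_schema = list(r_schema) + s_only
    result = set()
    for r in r_rows:
        r_map = dict(zip(r_schema, r))
        for s in s_rows:
            s_map = dict(zip(s_schema, s))
            if all(r_map[a] == s_map[a] for a in common):
                result.add(tuple(r) + tuple(s_map[a] for a in s_only))
    return out_schema, result

R = {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}
S = {("Z", "U"), ("V", "W"), ("Z", "V")}

schema, rows = natural_join(R, ("A", "B"), S, ("B", "C"))
print(schema)
print(sorted(rows))
```

Two boundary cases fall out of the same function: when the schemas share no attribute the condition is vacuously true and the result is the full Cartesian product, and when the schemas are identical the result is the intersection of the two relations.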

40 40 Types of Joins Left Outer Join –keep all of the tuples from the “left” relation –join with the right relation –pad the non-matching tuples with nulls Right Outer Join –same as the left, but keep tuples from the “right” relation Full Outer Join –same as left, but keep all tuples from both relations

41 41 Left Outer Join If we do a left outer join on R and S, and we match on the first column, the result is:
R (name, phone): (A, B), (C, D)
S (name, email): (A, F), (G, H)
Result (name, phone, email): (A, B, F), (C, D, -)

42 42 Example - Left Outer join Produce a status report on property viewings. π propertyNo, street, city (PropertyForRent) ⟕ Viewing

43 43 Right Outer Join If we do a right outer join on R and S, and we match on the first column, the result is:
R (name, phone): (A, B), (C, D)
S (name, email): (A, F), (G, H)
Result (name, phone, email): (A, B, F), (G, -, H)

44 44 Full Outer Join If we do a full outer join on R and S, and we match on the first column, the result is:
R (name, phone): (A, B), (C, D)
S (name, email): (A, F), (G, H)
Result (name, phone, email): (A, B, F), (C, D, -), (G, -, H)
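The three outer joins on the small R(name, phone) and S(name, email) relations can be sketched as follows. The function names are illustrative assumptions, and Python's `None` stands in for the null that the slides write as "-".

```python
def left_outer_join(r_rows, s_rows):
    """Keep every R tuple; pad with None when no S tuple shares the name."""
    result = []
    for name, phone in r_rows:
        emails = [email for s_name, email in s_rows if s_name == name]
        if emails:
            result.extend((name, phone, email) for email in emails)
        else:
            result.append((name, phone, None))
    return result

def right_outer_join(r_rows, s_rows):
    """Keep every S tuple; pad with None when no R tuple shares the name."""
    result = []
    for name, email in s_rows:
        phones = [phone for r_name, phone in r_rows if r_name == name]
        if phones:
            result.extend((name, phone, email) for phone in phones)
        else:
            result.append((name, None, email))
    return result

def full_outer_join(r_rows, s_rows):
    """Left outer join plus the S tuples that matched nothing in R."""
    result = left_outer_join(r_rows, s_rows)
    r_names = {name for name, _ in r_rows}
    result.extend((name, None, email) for name, email in s_rows if name not in r_names)
    return result

R = [("A", "B"), ("C", "D")]
S = [("A", "F"), ("G", "H")]

print(left_outer_join(R, S))
print(right_outer_join(R, S))
print(full_outer_join(R, S))
```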

45 45 OUTER JOIN Example 1

46 46 OUTER JOIN Example 2

47 47 Division Identify all clients who have viewed all properties with three rooms. (π clientNo, propertyNo (Viewing)) ÷ (π propertyNo (σ rooms = 3 (PropertyForRent)))

48 48 Natural join R ⋈ S –An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.

49 49 Example - Natural join List the names and comments of all clients who have viewed a property for rent. (π clientNo, fName, lName (Client)) ⋈ (π clientNo, propertyNo, comment (Viewing))

50 50 Semijoin R ⋉_F S –Defines a relation that contains the tuples of R that participate in the join of R with S. Can rewrite Semijoin using Projection and Join: R ⋉_F S = π_A (R ⋈_F S), where A is the set of all attributes of R.

51 51 Example - Semijoin List complete details of all staff who work at the branch in Glasgow. Staff ⋉_(Staff.branchNo = Branch.branchNo and Branch.city = ‘Glasgow’) Branch

52 52 Division R ÷ S –Defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S. Expressed using basic operations: T1 ← π_C (R); T2 ← π_C ((S × T1) – R); T ← T1 – T2

53 53 Example - Division Identify all clients who have viewed all properties with three rooms. (π clientNo, propertyNo (Viewing)) ÷ (π propertyNo (σ rooms = 3 (PropertyForRent)))
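Division is the least intuitive of the derived operators, so a small sketch may help. It follows the three-step formulation of division in terms of basic operations (candidates T1, disqualified candidates T2, result T1 – T2). The function name `divide` and the viewing data are hypothetical, and R is restricted to binary (client, property) pairs to keep the example short.

```python
def divide(r, s):
    """Division for R over (c, b) pairs and S over single b values."""
    t1 = {c for c, _ in r}                               # T1: all candidate c values in R
    t2 = {c for c in t1 for b in s if (c, b) not in r}   # T2: candidates missing some b of S
    return t1 - t2                                       # T: candidates paired with every b

# Hypothetical data: who viewed which property, and the three-room properties.
viewing = {
    ("CR56", "PA14"), ("CR56", "PG4"), ("CR56", "PG36"),
    ("CR76", "PG4"),
    ("CR62", "PA14"),
}
three_room_properties = {"PG4", "PG36"}

print(divide(viewing, three_room_properties))
```

Only the client paired with every three-room property survives; a client who viewed just one of them is placed in T2 and subtracted.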

59 59 Summary of Relational Algebra Five basic operators, many derived Combine operators in order to construct queries: relational algebra expressions, usually shown as trees

60 60 RA has Limitations! Cannot compute “transitive closure”.
Name1 | Name2 | Relationship
Fred | Mary | Father
Mary | Joe | Cousin
Mary | Bill | Spouse
Nancy | Lou | Sister
Find all direct and indirect relatives of Fred. This cannot be expressed in RA! Need to write a C program.
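What a general-purpose program adds is iteration to a fixpoint: keep joining the relation with itself until no new pairs appear. The sketch below does this in Python rather than C. It assumes the relationship rows are read as directed (Name1, Name2) pairs and ignores the Relationship column; both are simplifications made for the example.

```python
def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new_pairs <= closure:      # fixpoint reached: nothing new to add
            return closure
        closure |= new_pairs

related = {("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")}

freds_relatives = sorted(b for a, b in transitive_closure(related) if a == "Fred")
print(freds_relatives)
```

Each pass of the loop is a single self-join, which relational algebra can express; what it cannot express is "repeat until nothing changes" for a number of passes that depends on the data.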

61 61 Equivalences The same relational algebraic expression can be written in many different ways. The order in which tuples appear in relations is never significant. A ⋈ B ≡ B ⋈ A; A ∩ B ≡ B ∩ A; A ∪ B ≡ B ∪ A; (A - B) is not the same as (B - A); σ c1 (σ c2 (A)) ≡ σ c2 (σ c1 (A)) ≡ σ c1 ^ c2 (A); π a1 (A) ≡ π a1 (π a1,etc (A)), where etc is any attributes of A....

62 62 Operations on Bags (and why we care) Union: {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} –add the number of occurrences Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} –subtract the number of occurrences (never below zero, so d, which is absent from the left bag, does not appear) Intersection: {a,b,b,b,c,c} ∩ {b,b,c,c,c,c,d} = {b,b,c,c} –minimum of the two numbers of occurrences Selection: preserve the number of occurrences Projection: preserve the number of occurrences (no duplicate elimination) Cartesian product, join: no duplicate elimination
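Python's `collections.Counter` follows the same multiset rules, which makes it a convenient way to check bag arithmetic by hand: `+` adds occurrence counts, `-` subtracts them and drops anything that would fall to zero or below, and `&` takes the minimum. The sketch below reruns the three examples from this slide; note that an element absent from the left operand can never appear in a difference.

```python
from collections import Counter

# Each string is read as a bag of single-letter elements.
bag_union = Counter("abbc") + Counter("abbbeff")            # add occurrence counts
bag_difference = Counter("abbbcc") - Counter("bcccd")       # subtract, floor at zero
bag_intersection = Counter("abbbcc") & Counter("bbccccd")   # minimum of the counts

print(sorted(bag_union.elements()))
print(sorted(bag_difference.elements()))
print(sorted(bag_intersection.elements()))
```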

63 63 SQL Introduction Standard language for querying and manipulating data Structured Query Language Many standards out there: SQL92, SQL2, SQL3, SQL99 Vendors support various subsets of these, but all of what we’ll be talking about. Works on bags, rather than sets Basic construct: SELECT … FROM … WHERE …

64 64 Selections Company(sticker, name, country, stockPrice) Find all US companies whose stock is > 50: SELECT * FROM Company WHERE country='USA' AND stockPrice > 50 Output schema: R(sticker, name, country, stockPrice)

65 65 Selections What you can use in WHERE: attribute names of the relation(s) used in the FROM. comparison operators: =, <>, <, >, <=, >= apply arithmetic operations: stockPrice*2 operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.

66 66 The LIKE operator s LIKE p: pattern matching on strings p may contain two special symbols: –% = any sequence of characters –_ = any single character Company(sticker, name, address, country, stockPrice) Find all US companies whose address contains “Mountain”: SELECT * FROM Company WHERE country='USA' AND address LIKE '%Mountain%' Needed in the 1st assignment!

67 67 Projections Select only a subset of the attributes: SELECT name, stockPrice FROM Company WHERE country='USA' AND stockPrice > 50 Input schema: Company(sticker, name, country, stockPrice) Output schema: R(name, stockPrice)

68 68 Projections with Renamings Rename the attributes in the resulting table: SELECT name AS company, stockPrice AS price FROM Company WHERE country='USA' AND stockPrice > 50 Input schema: Company(sticker, name, country, stockPrice) Output schema: R(company, price)
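To try the LIKE query against real data, Python's built-in `sqlite3` module is enough. The sketch below is only an illustration: the three Company rows are invented, and SQLite is an assumption rather than the system used in the course. One SQLite-specific detail worth knowing is that its LIKE is case-insensitive for ASCII letters by default, which does not affect this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Company (sticker TEXT, name TEXT, address TEXT, "
    "country TEXT, stockPrice REAL)"
)
# Hypothetical rows, invented for the example.
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?, ?, ?)",
    [
        ("GIZ", "GizmoWorks", "12 Mountain View Rd", "USA", 25.0),
        ("CAN", "Canon", "3 Shimomaruko", "Japan", 65.0),
        ("HIT", "Hitachi", "1 Mountain Pass", "Japan", 15.0),
    ],
)

# % matches any sequence of characters, so '%Mountain%' means "contains Mountain".
rows = conn.execute(
    "SELECT name FROM Company "
    "WHERE country = 'USA' AND address LIKE '%Mountain%'"
).fetchall()
print(rows)
```

String literals are written with single quotes here, since in standard SQL double quotes delimit identifiers rather than strings.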

69 69 Eliminating Duplicates SELECT DISTINCT country FROM Company WHERE stockPrice > 50 Without DISTINCT the result is a bag

70 70 Ordering the Results SELECT name, stockPrice FROM Company WHERE country='USA' AND stockPrice > 50 ORDER BY country, name Ordering is ascending, unless you specify the DESC keyword. Ties are broken by the second attribute on the ORDER BY list, etc.

71 71 Joins Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stockPrice, country) Person( per-name, phoneNumber, city) Find names of people living in Seattle that bought gizmo products, and the names of the stores they bought from: SELECT per-name, store FROM Person, Purchase WHERE per-name=buyer AND city='Seattle' AND product='gizmo'

72 72 Disambiguating Attributes Product (name, price, category, maker) Purchase (buyer, seller, store, product) Person(name, phoneNumber, city) Find names of people buying telephony products: SELECT Person.name FROM Person, Purchase, Product WHERE Person.name=buyer AND product=Product.name AND Product.category='telephony'

73 73 Tuple Variables Product ( name, price, category, maker) Find pairs of companies making products in the same category: SELECT product1.maker, product2.maker FROM Product AS product1, Product AS product2 WHERE product1.category=product2.category AND product1.maker <> product2.maker

74 74 Tuple Variables Tuple variables introduced automatically by the system: Product ( name, price, category, maker) SELECT name FROM Product WHERE price > 100 Becomes: SELECT Product.name FROM Product AS Product WHERE Product.price > 100 Doesn’t work when Product occurs more than once.

75 75 Meaning (Semantics) of SQL Queries SELECT a1, a2, …, ak FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions 1. Nested loops: Answer = {} for x1 in R1 do for x2 in R2 do ….. for xn in Rn do if Conditions then Answer = Answer U {(a1,…,ak)} return Answer

76 76 Meaning (Semantics) of SQL Queries SELECT a1, a2, …, ak FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions 2. Parallel assignment Answer = {} for all assignments x1 in R1, …, xn in Rn do if Conditions then Answer = Answer U {(a1,…,ak)} return Answer Doesn’t impose any order !

77 77 Meaning (Semantics) of SQL Queries SELECT a1, a2, …, ak FROM R1 AS x1, R2 AS x2, …, Rn AS xn WHERE Conditions 3. Translation to Relational algebra: π a1,…,ak (σ Conditions (R1 × R2 × … × Rn)) Select-From-Where queries are precisely Select-Project-Join
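The three equivalent readings of a SELECT-FROM-WHERE query (nested loops, parallel assignment, and projection over selection over a product) can be collapsed into one short Python sketch. The helper name `sfw` and the Person and Purchase rows are assumptions made for illustration; the result is kept as a list so that duplicates survive, matching SQL's bag semantics.

```python
from itertools import product

def sfw(relations, condition, select):
    """Evaluate SELECT select FROM relations WHERE condition (bag semantics)."""
    answer = []
    # product(...) enumerates every assignment x1 in R1, ..., xn in Rn,
    # which is the Cartesian product R1 x ... x Rn.
    for assignment in product(*relations):
        if condition(*assignment):               # selection on Conditions
            answer.append(select(*assignment))   # projection on a1, ..., ak
    return answer

# Hypothetical rows for Person and Purchase.
person = [
    {"name": "Ann", "city": "Seattle"},
    {"name": "Bob", "city": "Portland"},
]
purchase = [
    {"buyer": "Ann", "store": "The Bon", "product": "gizmo"},
    {"buyer": "Bob", "store": "The Bon", "product": "gizmo"},
    {"buyer": "Ann", "store": "Sears", "product": "gadget"},
]

result = sfw(
    [person, purchase],
    lambda p, q: p["name"] == q["buyer"] and p["city"] == "Seattle" and q["product"] == "gizmo",
    lambda p, q: (p["name"], q["store"]),
)
print(result)
```

A consequence of this definition is that if any relation in the FROM list is empty, the product is empty and the answer is empty, whatever the WHERE clause says.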

78 78 First Unintuitive SQLism SELECT R.A FROM R, S, T WHERE R.A=S.A OR R.A=T.A Looking for R ∩ (S U T) But what happens if T is empty?
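The question on this slide can be answered by running the query. The sketch below uses Python's `sqlite3` module with tiny invented tables; the use of SQLite is an assumption, but the behaviour follows from the product-based semantics of FROM and is not specific to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
""")

query = "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"

# T is empty, so R x S x T is empty and no row can satisfy the WHERE clause,
# even though 1 is in both R and S.
with_empty_t = conn.execute(query).fetchall()

conn.execute("INSERT INTO T VALUES (2)")
with_nonempty_t = sorted(conn.execute(query).fetchall())

print(with_empty_t)
print(with_nonempty_t)
```

With T empty the query returns nothing rather than R ∩ S, because the FROM clause forms a product over all three tables before the condition is tested.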

79 79 Union, Intersection, Difference (SELECT name FROM Person WHERE City='Seattle') UNION (SELECT name FROM Person, Purchase WHERE buyer=name AND store='The Bon') Similarly, you can use INTERSECT and EXCEPT. You must have the same attribute names (otherwise: rename).

80 80 Conserving Duplicates (SELECT name FROM Person WHERE City='Seattle') UNION ALL (SELECT name FROM Person, Purchase WHERE buyer=name AND store='The Bon') The UNION, INTERSECT and EXCEPT operators operate as sets, not bags.

81 81 Subqueries A subquery producing a single tuple: SELECT Purchase.product FROM Purchase WHERE buyer = (SELECT name FROM Person WHERE ssn = '123456789'); In this case, the subquery returns one value. If it returns more, it’s a run-time error.

82 82 Can say the same thing without a subquery: SELECT Purchase.product FROM Purchase, Person WHERE buyer = name AND ssn = '123456789' Is this query equivalent to the previous one?

83 83 Subqueries Returning Relations Find companies who manufacture products bought by Joe Blow: SELECT Company.name FROM Company, Product WHERE Company.name=maker AND Product.name IN (SELECT product FROM Purchase WHERE buyer = 'Joe Blow'); Here the subquery returns a set of values.

84 84 Subqueries Returning Relations Equivalent to: SELECT Company.name FROM Company, Product, Purchase WHERE Company.name=maker AND Product.name = product AND buyer = 'Joe Blow' Is this query equivalent to the previous one?

85 85 Subqueries Returning Relations Product ( pname, price, category, maker) Find products that are more expensive than all those produced by “Gizmo-Works”: SELECT pname FROM Product WHERE price > ALL (SELECT price FROM Product WHERE maker='Gizmo-Works') You can also use: s > ALL R, s > ANY R, EXISTS R
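One way to probe whether the IN-subquery form and the flat join form are equivalent is to run both on data where the same product was bought twice. The sketch below does that with Python's `sqlite3` module. The tables are cut down to the columns the queries touch, the rows are invented, and the column reference is written as `Purchase.product` for clarity; all of these are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (name TEXT);
    CREATE TABLE Product (name TEXT, maker TEXT);
    CREATE TABLE Purchase (buyer TEXT, product TEXT);
    INSERT INTO Company VALUES ('GizmoWorks');
    INSERT INTO Product VALUES ('gizmo', 'GizmoWorks');
    -- Joe Blow bought the same product twice.
    INSERT INTO Purchase VALUES ('Joe Blow', 'gizmo'), ('Joe Blow', 'gizmo');
""")

with_subquery = conn.execute("""
    SELECT Company.name FROM Company, Product
    WHERE Company.name = maker
      AND Product.name IN (SELECT Purchase.product FROM Purchase
                           WHERE buyer = 'Joe Blow')
""").fetchall()

with_join = conn.execute("""
    SELECT Company.name FROM Company, Product, Purchase
    WHERE Company.name = maker
      AND Product.name = Purchase.product
      AND buyer = 'Joe Blow'
""").fetchall()

print(with_subquery)
print(with_join)
```

On this data the two forms return the same company but a different number of rows: IN tests membership once per (Company, Product) pair, while the join emits one row per matching purchase. Under bag semantics they therefore differ unless DISTINCT is added.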

86 86 Question for Database Fans Can we express this query as a single SELECT-FROM-WHERE query, without subqueries ? Hint: show that all SFW queries are monotone (figure out what this means). A query with ALL is not monotone

87 87 Conditions on Tuples SELECT Company.name FROM Company, Product WHERE Company.name=maker AND (Product.name,price) IN (SELECT product, price FROM Purchase WHERE buyer = 'Joe Blow');

88 88 Correlated Queries Movie (title, year, director, length) Find movies whose title appears more than once: SELECT title FROM Movie AS x WHERE year < ANY (SELECT year FROM Movie WHERE title = x.title); Note (1) the scope of variables: the reference to x.title inside the subquery is the correlation (2) this can still be expressed as a single SFW query

89 89 Complex Correlated Query Product ( pname, price, category, maker, year) Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972 SELECT pname, maker FROM Product AS x WHERE price > ALL (SELECT price FROM Product AS y WHERE x.maker = y.maker AND y.year < 1972); Powerful, but much harder to optimize !

90 90 Exercises: write RA and SQL expressions Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products Ex #3: Find names of people who bought American products and did not buy French products Ex #4: Find names of people who bought American products and they live in Seattle. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $50.

