Database Design: DBS CB, 2nd Edition. Relational Algebra: Basic Operations & Algebra of Bags, Ch. 5


2 What is an “Algebra”?
A mathematical system consisting of:
  - Operands: variables or values from which new values can be constructed
  - Operators: symbols denoting procedures that construct new values from given values

3 What is Relational Algebra?
- An algebra whose operands are relations or variables that represent relations
- Operators are designed to do the most common things that we need to do with relations in a database
  - The result is an algebra that can be used as a query language for relations

4 Relational Operations on Bags vs. on Sets
- A bag is a relation with relaxed conditions: it allows duplicate tuples, while a set does not
- Some relational operations are much more efficient on bags than on sets:
  - Projection on a relation as a set: we need to compare each projected tuple with all other projected tuples to eliminate duplicates
- A set is a special case of a bag; bags are covered in detail later in this session

5 Core Relational Algebra
- Union, intersection, and difference
  - Usual set operations, but both operands must have the same relation schema
- Selection: picking certain rows
- Projection: picking certain columns
- Products and joins: compositions of relations
- Renaming of relations and attributes

6 Union, Intersection, and Difference
Assume R and S are bags with the same schema (attributes), and that tuple t appears n times in R and m times in S:
  - Bag union (R ∪ S): t appears n+m times. Set union: t appears once if it is in R or in S
  - Bag intersection (R ∩ S): t appears min(n, m) times. Set intersection: t appears 0 or 1 times
  - Bag difference (R - S): t appears max(0, n-m) times. Set difference: t appears 0 or 1 times

7 Relational Algebra: Basic Operations on Sets

8 Selection on Sets
R1 := σ C (R2)
  - C is a condition (as in “if” statements) that refers to attributes of R2 (e.g., col1 > 10)
  - R1 is all those tuples of R2 that satisfy C

9 Example: Selection
Relation Sells:
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
  Sue’s  Bud     2.50
  Sue’s  Miller  3.00
JoeMenu := σ bar=“Joe’s” (Sells):
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
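
The selection above can be sketched in Python. This is a minimal illustration, assuming a relation is held as a list of tuples; the helper name `select` is made up for this sketch and is not part of the slides.

```python
# Sells relation from the slide, as (bar, beer, price) tuples.
SELLS = [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
]


def select(relation, condition):
    """sigma_C(R): keep the tuples of `relation` for which `condition` holds."""
    return [t for t in relation if condition(t)]


# JoeMenu := sigma bar="Joe's" (Sells); position 0 of each tuple is `bar`.
joe_menu = select(SELLS, lambda t: t[0] == "Joe's")
print(joe_menu)
```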

10 Projection on Sets
R1 := π L (R2)
  - L is a list of attributes from the schema of R2
  - R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1
  - Eliminate duplicate tuples, if any

11 Example: Projection
Relation Sells:
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
  Sue’s  Bud     2.50
  Sue’s  Miller  3.00
Prices := π beer,price (Sells):
  beer    price
  Bud     2.50
  Miller  2.75
  Miller  3.00
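
A small Python sketch of set projection, under the assumption that a relation is a list of tuples with a separate list of attribute names. The helper `project` is hypothetical. It shows the duplicate-elimination step that makes set projection more expensive than bag projection.

```python
SELLS_ATTRS = ("bar", "beer", "price")
SELLS = [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
]


def project(relation, attrs, wanted):
    """pi_L(R) on sets: extract the listed attributes, then drop duplicates."""
    positions = [attrs.index(a) for a in wanted]
    result = []
    for t in relation:
        projected = tuple(t[i] for i in positions)
        # Each projected tuple is compared against those already kept.
        if projected not in result:
            result.append(projected)
    return result


prices = project(SELLS, SELLS_ATTRS, ["beer", "price"])
print(prices)
```

The two (Bud, 2.50) tuples collapse into one, which is why the result has three rows rather than four.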

12 Extended Projection
Using the same π L operator, we allow the list L to contain arbitrary expressions involving attributes:
  1. Arithmetic on attributes, e.g., A+B → C
  2. Duplicate occurrences of the same attribute

13 Example: Extended Projection
R =
  A  B
  1  2
  3  4
π A+B → C, A, A (R) =
  C  A1  A2
  3  1   1
  7  3   3
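
The extended projection above, sketched in Python with one dictionary per tuple. The output names C, A1, and A2 follow the slide; the dictionary representation is an assumption of this sketch.

```python
# R(A, B) from the slide.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

# pi A+B -> C, A, A (R): one computed column and two copies of A.
result = [{"C": t["A"] + t["B"], "A1": t["A"], "A2": t["A"]} for t in R]
print(result)
```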

14 Product (Cartesian Product) on Sets
R3 := R1 × R2
  - Pair each tuple t1 of R1 with each tuple t2 of R2
  - The concatenation t1t2 is a tuple of R3
  - The schema of R3 is the attributes of R1 followed by those of R2, in order
  - If R1 has n tuples and R2 has m tuples, then R3 has n × m tuples
  - Beware: if an attribute A has the same name in R1 and R2, refer to the two copies as R1.A and R2.A

15 Example: R3 := R1 × R2
R1(A, B):
  A  B
  1  2
  3  4
R2(B, C):
  B  C
  5  6
  7  8
  9  10
R3(A, R1.B, R2.B, C):
  A  R1.B  R2.B  C
  1  2     5     6
  1  2     7     8
  1  2     9     10
  3  4     5     6
  3  4     7     8
  3  4     9     10
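
A minimal Python sketch of the product, assuming relations are lists of tuples. The function name `product` is chosen for this sketch only.

```python
R1 = [(1, 2), (3, 4)]            # R1(A, B)
R2 = [(5, 6), (7, 8), (9, 10)]   # R2(B, C)


def product(r1, r2):
    """R1 x R2: concatenate every tuple of r1 with every tuple of r2."""
    return [t1 + t2 for t1 in r1 for t2 in r2]


R3 = product(R1, R2)             # columns: A, R1.B, R2.B, C
print(len(R3))                   # n * m tuples
print(R3)
```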

16 Join (Logical Join)
- Inner Join:
  - Cross Join: Cartesian product of the two relations
  - Equi-Join: product followed by a selection whose condition uses equality predicates only
  - Natural Join: equi-join on all attributes that share a name, keeping one copy of each; the result schema is the union of the attributes of the two relations
  - Theta Join: product followed by a selection with an arbitrary boolean-valued condition
- Outer Join:
  - Left Outer Join (left join): every tuple of the left relation is joined with each tuple of the right relation that matches the condition; if none matches, return a tuple with the left-side values and NULLs for the right-side attributes
  - Right Outer Join (right join): the mirror image of the left join
  - Full Outer Join (full join): union of the left join and the right join
- Self Join: joining a table to itself
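
As a rough illustration of the left outer join, here is a Python sketch in which `None` stands in for NULL. The function name, the `right_width` parameter, and the bar “Moe's” are assumptions added for this example; they do not come from the slides.

```python
def left_outer_join(left, right, condition, right_width):
    """Keep every left tuple; pad with None when no right tuple matches."""
    result = []
    for l in left:
        matches = [l + r for r in right if condition(l, r)]
        if matches:
            result.extend(matches)
        else:
            result.append(l + (None,) * right_width)
    return result


sells = [("Joe's", "Bud", 2.50), ("Moe's", "Coors", 3.00)]  # Moe's is hypothetical
bars = [("Joe's", "Maple St.")]

joined = left_outer_join(sells, bars, lambda s, b: s[0] == b[0], right_width=2)
print(joined)
```

An inner join on the same condition would drop the Moe's tuple entirely; the outer join keeps it with NULLs on the right side.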

17 Natural Join
A useful join variant (natural join) connects two relations by:
  - Equating attributes of the same name, and
  - Projecting out one copy of each pair of equated attributes
Denoted R3 := R1 ⋈ R2

18 Example: Natural Join
Sells(bar, beer, price):
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
  Sue’s  Bud     2.50
  Sue’s  Coors   3.00
Bars(bar, addr):
  bar    addr
  Joe’s  Maple St.
  Sue’s  River Rd.
BarInfo := Sells ⋈ Bars
Note: the bar-name attribute of Bars is called bar here, matching Sells.bar, so that the natural join “works”
BarInfo(bar, beer, price, addr):
  bar    beer    price  addr
  Joe’s  Bud     2.50   Maple St.
  Joe’s  Miller  2.75   Maple St.
  Sue’s  Bud     2.50   River Rd.
  Sue’s  Coors   3.00   River Rd.
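
The natural join can be sketched in Python with one dictionary per tuple, so that attribute names are available for matching. The helper `natural_join` is an illustrative name, not an API from any library.

```python
SELLS = [
    {"bar": "Joe's", "beer": "Bud", "price": 2.50},
    {"bar": "Joe's", "beer": "Miller", "price": 2.75},
    {"bar": "Sue's", "beer": "Bud", "price": 2.50},
    {"bar": "Sue's", "beer": "Coors", "price": 3.00},
]
BARS = [
    {"bar": "Joe's", "addr": "Maple St."},
    {"bar": "Sue's", "addr": "River Rd."},
]


def natural_join(r1, r2):
    """Equate attributes with the same name and keep one copy of each."""
    result = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                # Merging the dicts keeps a single copy of shared attributes.
                result.append({**t1, **t2})
    return result


bar_info = natural_join(SELLS, BARS)
for row in bar_info:
    print(row)
```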

19 Theta-Join
R3 := R1 ⋈ C R2
  - Take the product R1 × R2
  - Then apply σ C to the result
As for σ, C can be any boolean-valued condition
  - Historic versions of this operator allowed only A θ B, where θ is =, <, etc.; hence the name “theta-join”

20 Example: Theta Join - 1
Sells(bar, beer, price):
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
  Sue’s  Bud     2.50
  Sue’s  Coors   3.00
Bars(name, addr):
  name   addr
  Joe’s  Maple St.
  Sue’s  River Rd.
BarInfo := Sells ⋈ Sells.bar = Bars.name Bars
Note: Bars now calls its bar-name attribute name, so the condition must equate Sells.bar with Bars.name explicitly, and both columns appear in the result
BarInfo(bar, beer, price, name, addr):
  bar    beer    price  name   addr
  Joe’s  Bud     2.50   Joe’s  Maple St.
  Joe’s  Miller  2.75   Joe’s  Maple St.
  Sue’s  Bud     2.50   Sue’s  River Rd.
  Sue’s  Coors   3.00   Sue’s  River Rd.

21 Example: Theta Join - 2
Sells(bar, beer, price):
  bar    beer    price
  Joe’s  Bud     2.50
  Joe’s  Miller  2.75
  Sue’s  Bud     2.50
  Sue’s  Coors   3.00
Bars(name, addr):
  name   addr
  Joe’s  Maple St.
  Sue’s  River Rd.
BarInfo := Sells ⋈ Sells.bar = Bars.name AND Sells.price < 2.75 Bars
Note: as in the previous example, the condition equates Sells.bar with Bars.name explicitly; the extra predicate keeps only beers priced below 2.75
BarInfo(bar, beer, price, name, addr):
  bar    beer  price  name   addr
  Joe’s  Bud   2.50   Joe’s  Maple St.
  Sue’s  Bud   2.50   Sue’s  River Rd.
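
A sketch of the theta-join as “product, then selection”, assuming relations are lists of tuples. The helper names are invented for this illustration, and the condition refers to columns by position.

```python
SELLS = [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
]
BARS = [("Joe's", "Maple St."), ("Sue's", "River Rd.")]


def product(r1, r2):
    return [t1 + t2 for t1 in r1 for t2 in r2]


def theta_join(r1, r2, condition):
    """R1 join_C R2 = sigma_C(R1 x R2)."""
    return [t for t in product(r1, r2) if condition(t)]


# Joined columns: 0 bar, 1 beer, 2 price, 3 name, 4 addr.
bar_info = theta_join(SELLS, BARS, lambda t: t[0] == t[3] and t[2] < 2.75)
print(bar_info)
```

Materializing the full product first is the simplest way to mirror the definition; a real query engine would normally avoid building it.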

22 Renaming
- The ρ operator gives a new schema to a relation
- R1 := ρ R1(A1,…,An) (R2) makes R1 a relation with attributes A1,…,An and the same tuples as R2
- Simplified notation: R1(A1,…,An) := R2
- Or, even simpler, when only the relation name changes: R1 := ρ R1 (R2)

23 Example: Renaming
Bars(name, addr):
  name   addr
  Joe’s  Maple St.
  Sue’s  River Rd.
R(bar, addr) := Bars
R(bar, addr):
  bar    addr
  Joe’s  Maple St.
  Sue’s  River Rd.

24 Aggregation Operators
These operators apply to both sets and bags of numbers or strings. Examples include:
  - SUM: sum of a column with numeric values
  - AVG: average of a column with numeric values
  - MIN and MAX: smallest and largest value of a column (numeric order for numbers, lexicographic order for strings)
  - COUNT: number of values in a column

25 Grouping Operator: γ L (R)
- Often we do not want to aggregate over an entire column, but over each group of tuples that share the same values in one or more columns
- γ L (R) partitions R into groups of tuples that agree on the grouping attributes in list L and returns one tuple per group, holding the grouping attributes and the aggregates listed in L
- Assume relation Orders(OrderId, OrderDate, OrderPrice, Customer)
- To group by customer and return the total order price and the date of the first order for every customer:
  - γ Customer, SUM(OrderPrice), MIN(OrderDate) (Orders)
  - SQL: SELECT Customer, SUM(OrderPrice), MIN(OrderDate) FROM Orders GROUP BY Customer;
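
The grouping query can be sketched in Python as follows. The order rows and customer names are invented sample data, and the function name is hypothetical; dates are ISO strings so that `min` orders them chronologically.

```python
from collections import defaultdict

# Orders(OrderId, OrderDate, OrderPrice, Customer); hypothetical sample rows.
ORDERS = [
    (1, "2024-01-05", 100.0, "Ann"),
    (2, "2024-01-03", 50.0, "Bob"),
    (3, "2024-02-01", 25.0, "Ann"),
]


def total_and_first_order(orders):
    """Group by Customer, then compute SUM(OrderPrice) and MIN(OrderDate)."""
    groups = defaultdict(list)
    for _order_id, date, price, customer in orders:
        groups[customer].append((price, date))
    return {
        customer: (sum(p for p, _ in rows), min(d for _, d in rows))
        for customer, rows in groups.items()
    }


summary = total_and_first_order(ORDERS)
print(summary)
```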

26 Sort Operator: τ L (R)
- Often we want to sort the tuples of a relation R, or of a result relation S, by one or more of its attributes
- τ L (R), where R is a relation and L is a list of some of R’s attributes, is the relation R with its tuples sorted in the order indicated by L
- Assume relation Orders(OrderId, OrderDate, OrderPrice, Customer)
- To list all orders sorted by order date:
  - τ OrderDate (Orders)
  - SQL: SELECT * FROM Orders ORDER BY OrderDate ASC; (use DESC for descending order)

27 Building Complex Expressions
Combine operators with parentheses and precedence rules. Three notations, just as in arithmetic:
  1. Sequences of assignment statements
  2. Expressions with several operators
  3. Expression trees

28 Sequences of Assignments
- Create temporary relation names
- Renaming can be implied by giving relations a list of attributes
- Example: R3 := R1 ⋈ C R2 can be written:
    R4 := R1 × R2
    R3 := σ C (R4)

29 Expressions in a Single Assignment
- Example: the theta-join R3 := R1 ⋈ C R2 can be written: R3 := σ C (R1 × R2)
- Precedence of relational operators:
  1. [ σ, π, ρ ] (highest)
  2. [ ×, ⋈ ]
  3. ∩
  4. [ ∪, - ]

30 Expression Trees
- Leaves are operands: either variables standing for relations or particular, constant relations
- Interior nodes are operators, applied to their child or children (subexpressions)
- You might want to use parentheses to indicate grouping of operands

31 Example: Expression Tree for a Query
Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3

32 As an Expression Tree (root at the top, children indented below their operator):
  ∪
    π name
      σ addr = “Maple St.”
        Bars
    ρ R(name)
      π bar
        σ price<3 AND beer=“Bud”
          Sells
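
The tree can be evaluated bottom-up. The Python sketch below does so on the sample Bars and Sells tuples from the earlier example slides; reusing that data here is an assumption of this sketch, since the query slide itself gives no tuples.

```python
BARS = [("Joe's", "Maple St."), ("Sue's", "River Rd.")]   # Bars(name, addr)
SELLS = [                                                   # Sells(bar, beer, price)
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.75),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Miller", 3.00),
]

# Left branch: pi name (sigma addr = "Maple St." (Bars))
on_maple = {name for name, addr in BARS if addr == "Maple St."}

# Right branch: rho R(name) (pi bar (sigma price < 3 AND beer = "Bud" (Sells)))
# Renaming bar to name only changes the schema, so a set of names suffices.
cheap_bud = {bar for bar, beer, price in SELLS if price < 3 and beer == "Bud"}

# Root: set union of the two single-attribute relations.
answer = on_maple | cheap_bud
print(sorted(answer))
```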

33 Example: Self-Join
Using Sells(bar, beer, price), find the bars that sell two different beers at the same price
Strategy: by renaming, define a copy of Sells, called S(bar, beer1, price). The natural join of Sells and S consists of quadruples (bar, beer, beer1, price) such that the bar sells both beers at this price

34 The Expression Tree (root at the top, children indented below their operator):
  π bar
    σ beer != beer1
      ⋈
        Sells
        ρ S(bar, beer1, price)
          Sells
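
A Python sketch of this self-join strategy. The Sells tuples below are hypothetical and were chosen so that one bar does sell two beers at the same price.

```python
# Hypothetical Sells(bar, beer, price) data for illustration.
SELLS = [
    ("Joe's", "Bud", 2.50),
    ("Joe's", "Miller", 2.50),
    ("Sue's", "Bud", 2.50),
    ("Sue's", "Coors", 3.00),
]

# rho S(bar, beer1, price) (Sells): same tuples, second column renamed.
S = list(SELLS)

# Natural join on the shared attributes bar and price,
# giving quadruples (bar, beer, beer1, price).
joined = [
    (bar, beer, beer1, price)
    for (bar, beer, price) in SELLS
    for (bar_s, beer1, price_s) in S
    if bar == bar_s and price == price_s
]

# sigma beer != beer1, then pi bar (as a set, so duplicates disappear).
answer = {bar for (bar, beer, beer1, price) in joined if beer != beer1}
print(answer)
```

Without the selection step, every tuple would match itself in the join, so every bar would appear in the answer.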

35 Schemas for Results
- Union, intersection, and difference: the schemas of the two operands must be the same, so use that schema for the result
- Selection: schema of the result is the same as the schema of the operand
- Projection: list of attributes tells us the schema

36 Schemas for Results (2)
- Product: schema is the attributes of both relations
  - Use R.A, etc., to distinguish two attributes named A
- Natural join: union of the attributes of the two relations
- Theta-join: same schema as the product, since the condition only filters tuples
- Renaming: the operator tells the schema

37 Relational Algebra: Basic Operations on Bags

38 Relational Algebra on Bags
- A bag (or multiset) is like a set, but an element may appear more than once
- Example: {1,2,1,3} is a bag
- Example: {1,2,3} is also a bag that happens to be a set

39 Why Bags?
- SQL, the most important query language for relational databases, is actually a bag language
- Some operations, like projection, are more efficient on bags than on sets

40 Operations on Bags
- Selection applies to each tuple, so its effect on bags is like its effect on sets
- Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates
- Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate

41 Example: Bag Selection
R(A, B):
  A  B
  1  2
  5  6
  1  2
σ A+B < 5 (R) =
  A  B
  1  2
  1  2

42 Example: Bag Projection
R(A, B):
  A  B
  1  2
  5  6
  1  2
π A (R) =
  A
  1
  5
  1

43 Example: Bag Product
R(A, B):
  A  B
  1  2
  5  6
  1  2
S(B, C):
  B  C
  3  4
  7  8
R × S =
  A  R.B  S.B  C
  1  2    3    4
  1  2    7    8
  5  6    3    4
  5  6    7    8
  1  2    3    4
  1  2    7    8

44 Example: Bag Theta-Join
R(A, B):
  A  B
  1  2
  5  6
  1  2
S(B, C):
  B  C
  3  4
  7  8
R ⋈ R.B<S.B S =
  A  R.B  S.B  C
  1  2    3    4
  1  2    7    8
  5  6    7    8
  1  2    3    4
  1  2    7    8

45 Bag Union
- The number of times an element appears in the union of two bags is the sum of the number of times it appears in each bag
- Example: {1,2,1} ∪ {1,1,2,3,1} = {1,1,1,1,1,2,2,3}

46 Bag Intersection
- The number of times an element appears in the intersection of two bags is the minimum of the number of times it appears in either
- Example: {1,2,1,1} ∩ {1,2,1,3} = {1,1,2}

47 Bag Difference
- An element appears in the difference A - B of bags as many times as it appears in A, minus the number of times it appears in B
  - But never less than 0 times
- Example: {1,2,1,1} - {1,2,3} = {1,1}
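
The three bag operations map closely onto Python's standard `collections.Counter`, which stores each element with its multiplicity. The sketch below replays the examples from the three bag slides; using `Counter` as the bag representation is a choice made here, not something the slides prescribe.

```python
from collections import Counter


def as_sorted_list(bag):
    """Expand a Counter back into a sorted list of elements."""
    return sorted(bag.elements())


# Bag union: multiplicities add (n + m).
union = Counter([1, 2, 1]) + Counter([1, 1, 2, 3, 1])

# Bag intersection: minimum of the multiplicities, min(n, m).
intersection = Counter([1, 2, 1, 1]) & Counter([1, 2, 1, 3])

# Bag difference: max(0, n - m); Counter subtraction drops non-positive counts.
difference = Counter([1, 2, 1, 1]) - Counter([1, 2, 3])

print(as_sorted_list(union))
print(as_sorted_list(intersection))
print(as_sorted_list(difference))
```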

48 Beware: Bag Laws != Set Laws
- Some, but not all, algebraic laws that hold for sets also hold for bags
- Example: the commutative law for union (R ∪ S = S ∪ R) does hold for bags
  - Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S

49 Example: A Law That Fails
- Set union is idempotent, meaning that S ∪ S = S
- However, for bags, if x appears n times in S, then it appears 2n times in S ∪ S
- Thus S ∪ S != S in general
  - e.g., {1} ∪ {1} = {1,1} != {1}
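
Both laws can be checked quickly in Python, modelling bags with `collections.Counter` and sets with the built-in `set`. This representation is an assumption of the sketch, and the bag R is an arbitrary example.

```python
from collections import Counter

R = Counter([1, 2, 1])   # arbitrary example bag
S = Counter([1])         # the bag {1} from the slide

# Commutativity of bag union holds because addition of counts commutes.
commutative = (R + S) == (S + R)

# Idempotence fails for bags: {1} union {1} = {1,1}.
bag_idempotent = (S + S) == S

# It holds for sets: {1} union {1} = {1}.
set_idempotent = ({1} | {1}) == {1}

print(commutative, bag_idempotent, set_idempotent)
```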

50 Datalog: Logic Instead of Algebra

51 Datalog: Logic instead of Algebra
- Each relational-algebra operator can be mimicked by one or several rules of Database Logic (Datalog), a language that consists of if-then rules
- Datalog is inherently a logic of sets
- Datalog queries are more powerful than relational algebra: several rules together can express recursion, which is not expressible in the algebra
- Relations are represented in Datalog as predicates; a predicate (e.g., R) followed by its arguments is called an atom
- A predicate returns a boolean value

52 Datalog: Logic instead of Algebra
- Rule: head ← body
  - Head: a relational atom
  - ←: read “if”
  - Body: one or more atoms, called subgoals, which may be relational or arithmetic
- Example: LongMovie(t,y) ← Movies(t,y,l,s,p) AND l ≥ 100
- It says: LongMovie(t,y) is true whenever we can find a tuple in Movies with (t,y) as its first 2 components, a 3rd component l that is at least 100, and any values in components 4 and 5
- Equivalent to this assignment statement in relational algebra: LongMovie := π title,year (σ length≥100 (Movies))

53 Datalog: Logic instead of Algebra
- Extensional and Intensional Predicates:
  - Extensional predicates (EDB) are predicates whose relations are stored in a database
  - Intensional predicates (IDB) are predicates whose relations are computed by applying one or more Datalog rules
- As long as there are no negated relational subgoals, the way rules are evaluated when relations are sets applies to bags as well
- Relational Algebra and Datalog: assume R(A,B,C) and S(A,B,C)
  - Boolean operations. Union: R ∪ S is equivalent to these 2 rules:
      U(A,B,C) ← R(A,B,C)
      U(A,B,C) ← S(A,B,C)

54 Datalog: Logic instead of Algebra
- Intersection: R ∩ S is equivalent to the following rule:
    I(A,B,C) ← R(A,B,C) AND S(A,B,C)
- Set Difference: R - S is equivalent to the following rule:
    D(A,B,C) ← R(A,B,C) AND NOT S(A,B,C)
- Projection: P(A,B) ← R(A,B,C)
- Selection: S(A,B,C) ← R(A,B,C) AND C ≥ 100
- Product: P(A,B,C,D,E,F) ← R(A,B,C) AND S(D,E,F)
- Joins: J(A,B,C,D) ← R(A,B) AND S(B,C,D)

55 Datalog: Logic instead of Algebra
Simulating Multiple Operations with Datalog
- Example: the algebraic expression
    π title,year (σ length ≥ 100 (Movies) ∩ σ studioName=‘Fox’ (Movies))
- translates into this set of rules:
    W(t,y,l,g,s,p) ← Movies(t,y,l,g,s,p) AND l ≥ 100
    X(t,y,l,g,s,p) ← Movies(t,y,l,g,s,p) AND s = ‘Fox’
    Y(t,y,l,g,s,p) ← W(t,y,l,g,s,p) AND X(t,y,l,g,s,p)
    Answer(t,y) ← Y(t,y,l,g,s,p)
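
Since these rules are non-recursive, each one can be evaluated as a single set computation. The Python sketch below mirrors the four rules one by one; the Movies tuples are invented placeholders, and the component order (t, y, l, g, s, p) follows the rules on the slide.

```python
# Hypothetical Movies(t, y, l, g, s, p) tuples for illustration.
MOVIES = {
    ("Movie A", 1990, 120, "drama", "Fox", 1),
    ("Movie B", 1995, 90, "comedy", "Fox", 2),
    ("Movie C", 2000, 130, "drama", "Paramount", 3),
}

# W(t,y,l,g,s,p) <- Movies(t,y,l,g,s,p) AND l >= 100
W = {m for m in MOVIES if m[2] >= 100}

# X(t,y,l,g,s,p) <- Movies(t,y,l,g,s,p) AND s = 'Fox'
X = {m for m in MOVIES if m[4] == "Fox"}

# Y(t,y,l,g,s,p) <- W(t,y,l,g,s,p) AND X(t,y,l,g,s,p)
Y = W & X

# Answer(t,y) <- Y(t,y,l,g,s,p)
ANSWER = {(t, y) for (t, y, l, g, s, p) in Y}
print(ANSWER)
```

Using Python sets matches the slide's remark that Datalog is inherently a logic of sets.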

56 END

