Database Systems 236363: Relational Algebra


Query Languages
A query – an expression that enables extracting data from a database
A query language – a language for expressing queries

Relational Algebra
Relational algebra is a language for expressing queries over relational databases
The syntax resembles ordinary algebra, but the expressions operate on tables
The result of an algebraic expression (query) is itself a table (relation)
Unless stated otherwise, we assume that all relations are sets and that all algebraic operators operate on sets

Algebraic Operators
There are 5 basic operators plus 2 technical ones
– Unary operators: Projection, Selection
– Binary operators: Cartesian product, Subtraction, Union
– Renaming attributes: a technical operation that enables composing operators
– Assignment: a technical operation that enables evaluation in stages
Complex operators can be obtained by composing basic operators
– Intersection, Join (including its variants), Division, etc.
For performance reasons, some complex operators may have direct implementations

Projection
Projection eliminates some of the attributes from the records of a given relation
In terms of table manipulation, this means removing the columns that do not appear in the projection's attribute list and then eliminating the duplicate rows created by the column removal

T =
    Roman  Heb  No
    i      א    1
    ii     ב    2

π_{Heb,Roman}(T) =
    Roman  Heb
    i      א
    ii     ב
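The two steps of projection (drop columns, then drop duplicates) can be sketched in Python. This is only an illustration: modeling a row as a frozenset of (attribute, value) pairs and the helper names `row` and `project` are assumptions made here, not part of the course material.

```python
def row(**attrs):
    """Build a row as a frozenset of (attribute, value) pairs (illustrative model)."""
    return frozenset(attrs.items())

def project(relation, attributes):
    """Keep only the listed attributes; using a set removes duplicate rows."""
    return {frozenset((a, v) for a, v in r if a in attributes) for r in relation}

# The table T from the slide
T = {row(Roman="i", Heb="א", No=1), row(Roman="ii", Heb="ב", No=2)}
result = project(T, {"Heb", "Roman"})

# A hypothetical table where dropping a column creates duplicates
U = {row(Color="White", Animal="Goat"), row(Color="White", Animal="Dog")}
colors = project(U, {"Color"})  # a single row remains
```

Because the result is a Python set, the duplicate elimination happens implicitly, which mirrors the set semantics assumed in this course.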

Selection
Let T=T(A_1,…,A_m) be a table with a schema consisting of the attributes A_1,…,A_m. For an expression θ, σ_θ(T) consists of all records in T that satisfy θ
θ may consist of the following:
– Comparisons (through the operators =, <, >, ≥, ≤, ≠) between an attribute and a constant or between two attributes. When the attribute is a set, it is also possible to use set operators such as ∈ and ⊆
– Boolean operators, e.g., (A_1="Cat") ∧ (A_2≤8)

T =
    Roman  Heb  No
    i      א    1
    ii     ב    2
    iii    ג    3

σ_{No≤2}(T) =
    Roman  Heb  No
    i      א    1
    ii     ב    2
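Selection can be sketched as a filter whose condition θ is an ordinary Boolean function over a record. The representation (rows as frozensets of (attribute, value) pairs) and the names `row` and `select` are assumptions for this illustration.

```python
def row(**attrs):
    """Build a row as a frozenset of (attribute, value) pairs (illustrative model)."""
    return frozenset(attrs.items())

def select(relation, theta):
    """Return the rows for which the condition theta holds."""
    return {r for r in relation if theta(dict(r))}

# The table T from the slide
T = {
    row(Roman="i", Heb="א", No=1),
    row(Roman="ii", Heb="ב", No=2),
    row(Roman="iii", Heb="ג", No=3),
}

# sigma_{No <= 2}(T)
small = select(T, lambda r: r["No"] <= 2)

# Boolean operators combine comparisons, e.g. (No >= 2) AND (Roman != "iii")
middle = select(T, lambda r: r["No"] >= 2 and r["Roman"] != "iii")
```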

Union, Subtraction and Intersection
These operations are applied only to relations with the same schema and are identical to their counterparts from set theory
– Union S ∪ T: obtains all records that appear in either S or T
– Subtraction S\T: obtains all records that appear in S but not in T
– Intersection S ∩ T: obtains all records that appear in both S and T
Intersection is not a basic operator – it can be expressed as S\(S\T)
Examples:

S =
    Color  Animal
    Brown  Horse
    White  Goat

T =
    Color  Animal
    Black  Dog
    White  Goat

S ∪ T =
    Color  Animal
    Brown  Horse
    White  Goat
    Black  Dog

S\T =
    Color  Animal
    Brown  Horse

S ∩ T =
    Color  Animal
    White  Goat
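Since these operators coincide with their set-theoretic counterparts, Python's built-in sets reproduce the example directly, including the identity S ∩ T = S\(S\T). Rows are modeled here as (Color, Animal) tuples, which is an assumption of this sketch.

```python
# Rows are (Color, Animal) tuples; both relations share the same schema.
S = {("Brown", "Horse"), ("White", "Goat")}
T = {("Black", "Dog"), ("White", "Goat")}

union = S | T                 # records in either S or T
difference = S - T            # records in S but not in T
intersection = S - (S - T)    # intersection built from subtraction only
```

The last line uses only subtraction, which is why intersection does not need to be a basic operator.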

Cartesian Product
A Cartesian product yields all combinations of records from the first relation with records from the second relation
In terms of tables, we take all concatenations of rows from the first table with rows from the second table
Whenever S and T have attributes with the same name, we distinguish between them either by adding the table name as a prefix of the attribute name, e.g., "S.Name" and "T.Name", or by adding a sequence number, e.g., "Name1" and "Name2"

T =
    Color  Animal
    Black  Dog
    White  Goat

S =
    Size
    Big
    Small

S × T =
    Color  Animal  Size
    Black  Dog     Big
    White  Goat    Big
    Black  Dog     Small
    White  Goat    Small

What happens if one of the relations is empty?
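A minimal sketch of the Cartesian product, assuming rows are plain tuples and using the standard `itertools.product`; the helper name `cartesian` is illustrative. It also lets one check the closing question about an empty relation.

```python
from itertools import product

def cartesian(r, s):
    """Concatenate every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}

T = {("Black", "Dog"), ("White", "Goat")}   # (Color, Animal)
S = {("Big",), ("Small",)}                  # (Size,)

pairs = cartesian(T, S)                     # rows are (Color, Animal, Size)
with_empty = cartesian(T, set())            # no rows to combine with
```

With an empty operand there is nothing to concatenate, so the product is empty as well.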

Renaming
This is not an algebraic operation, but rather a technical helper operation that is used to compose complex operations (examples appear shortly)
For a given table T=T(A_1,…,A_m) with a schema consisting of the attributes A_1,…,A_m, the operator ρ_{A_1→B_1,…,A_m→B_m}(T) returns an identical table in which the attribute names are changed to B_1,…,B_m

T =
    Color  Animal
    Black  Dog
    White  Goat

ρ_{Animal→חיה, Color→צבע}(T) =
    צבע    חיה
    Black  Dog
    White  Goat

(צבע = color, חיה = animal)

Basic Operators
The five operators – Projection, Selection, Cartesian Product, Subtraction, and Union – are basic, i.e.,
– None of these operators can be obtained from the other four
– More complex operators (e.g., Join, Division, Intersection) can be obtained by composing some of the basic ones
How can we show that an operator is basic?
– We need to find a property that is satisfied by this operator and cannot be obtained by any composition of the others

Proving that an Operator is Basic
Claim: Projection cannot be expressed by composing Selection, Cartesian Product, Subtraction, and Union
Proof sketch: Let R be a relation with n > 1 attributes A_1,…,A_n. Then π_{A_1}(R) is a relation with fewer than n attributes. However, the result of applying any of the operators Selection, Cartesian Product, Subtraction, and Union to R (and possibly other relations) has at least n attributes.
– This can be shown by induction on the number of operators in the expression
How can this proof be adjusted to show that Cartesian Product is a basic operator?

θ-Join
Given two relations S(A_1,…,A_n) and T(B_1,…,B_m) and an expression θ on the attributes A_1,…,A_n,B_1,…,B_m, denote by S ⋈_θ T the result of the algebraic expression σ_θ(S×T)
Example: for S(A,B) and T(C,D), the join S ⋈_{B>C} T has the schema (A,B,C,D)
An SQL server walks into a bar. He approaches two tables at the far corner and asks them: "Do you mind if I join you?"

Natural Join
A very common operation on databases
For the relations S(A_1,…,A_n,B_1,…,B_m) and T(B_1,…,B_m,C_1,…,C_k), denote by S ⋈ T the relation that includes all possible combinations of a record from S with a record from T whose common attributes have equal values, in which only a single attribute (column) is kept for each pair of common attributes.
More precisely, S ⋈ T = π_{A_1,...,A_n,S.B_1,...,S.B_m,C_1,...,C_k}(S ⋈_{(S.B_1=T.B_1) ∧ ... ∧ (S.B_m=T.B_m)} T)

S =
    Color  Animal
    Black  Dog
    White  Goat
    Pink   Elephant

T =
    Shade  Color
    Light  Blue
    Dark   Blue
    Light  Pink
    Dark   Pink

S ⋈ T =
    Shade  Color  Animal
    Light  Pink   Elephant
    Dark   Pink   Elephant
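The definition above can be sketched with rows modeled as Python dictionaries, so that common attributes are simply the shared keys. The function name `natural_join` and the dictionary representation are assumptions of this sketch; it uses a naive nested loop rather than any join algorithm a real system would choose.

```python
def natural_join(s, t):
    """Combine rows that agree on all common attributes; keep one copy of each."""
    result = []
    for a in s:
        for b in t:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                merged = {**a, **b}          # common attributes appear once
                if merged not in result:     # set semantics: no duplicates
                    result.append(merged)
    return result

S = [
    {"Color": "Black", "Animal": "Dog"},
    {"Color": "White", "Animal": "Goat"},
    {"Color": "Pink", "Animal": "Elephant"},
]
T = [
    {"Shade": "Light", "Color": "Blue"},
    {"Shade": "Dark", "Color": "Blue"},
    {"Shade": "Light", "Color": "Pink"},
    {"Shade": "Dark", "Color": "Pink"},
]

joined = natural_join(S, T)
```

Only the Pink rows find partners, so the Black and White rows of S and the Blue rows of T do not appear in the result.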

Semi-Join
For the relations S(A_1,…,A_n,B_1,…,B_m) and T(B_1,…,B_m,C_1,…,C_k), denote by S ⋉ T the relation that includes all records in S for which there exists a record in T with the same values in the common attributes
More precisely, S ⋉ T = π_{A_1,...,A_n,B_1,...,B_m}(S ⋈ T)
For performance reasons, this is usually implemented directly

S =
    Color  Animal
    Black  Dog
    White  Goat
    Pink   Elephant

T =
    Shade  Color
    Light  Blue
    Dark   Blue
    Light  Pink
    Dark   Pink

S ⋉ T =
    Color  Animal
    Pink   Elephant
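A direct implementation, as opposed to computing the full join and projecting, might look like the sketch below. Rows are again modeled as dictionaries and the name `semi_join` is illustrative; the point is only that no combined rows are ever built.

```python
def semi_join(s, t):
    """Keep each row of s that has at least one matching partner in t."""
    return [
        a for a in s
        if any(all(a[k] == b[k] for k in a.keys() & b.keys()) for b in t)
    ]

S = [
    {"Color": "Black", "Animal": "Dog"},
    {"Color": "White", "Animal": "Goat"},
    {"Color": "Pink", "Animal": "Elephant"},
]
T = [
    {"Shade": "Light", "Color": "Blue"},
    {"Shade": "Dark", "Color": "Blue"},
    {"Shade": "Light", "Color": "Pink"},
    {"Shade": "Dark", "Color": "Pink"},
]

kept = semi_join(S, T)
```

Even though Pink matches two rows of T, the Pink row of S is returned once, because the result only ever contains rows of S.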

Division
For the relations S(A_1,…,A_n,B_1,…,B_m) and T(B_1,…,B_m) (i.e., the attributes of T are a subset of the attributes of S), denote by S ÷ T the relation R(A_1,…,A_n) consisting of all records r such that, for every record t in T, the combined record (r,t) appears in S
More precisely, S ÷ T is the maximal relation R such that R×T ⊆ S
I.e., S ÷ T = π_{A_1,...,A_n}(S) \ π_{A_1,...,A_n}(((π_{A_1,...,A_n}(S)) × T) \ S)

S =
    Color  Animal
    Black  Dog
    White  Dog
    Pink   Elephant

T =
    Color
    Black
    White

S ÷ T =
    Animal
    Dog
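The algebraic formula for division can be followed step by step in Python. In this sketch S is modeled as a set of (a, b) pairs, where a is the value of the A attributes (here Animal) and b the value of the B attributes (here Color); that layout and the name `divide` are assumptions for illustration.

```python
def divide(s, t):
    """Division via pi_A(S) minus pi_A((pi_A(S) x T) minus S)."""
    candidates = {a for a, _ in s}                           # pi_A(S)
    missing = {(a, b) for a in candidates for b in t} - s    # (pi_A(S) x T) minus S
    disqualified = {a for a, _ in missing}                   # pi_A of the missing pairs
    return candidates - disqualified

# Rows are (Animal, Color) pairs
S = {("Dog", "Black"), ("Dog", "White"), ("Elephant", "Pink")}
T = {"Black", "White"}

quotient = divide(S, T)
```

The set `missing` holds the pairs that would have to be in S for a candidate to qualify but are not; any candidate appearing there is removed.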

Division - Example
We'd like to obtain from table A the supplier numbers of all suppliers that sell all parts in table B

A =
    pno  sno
    P1   S1
    P2   S1
    P3   S1
    P4   S1
    P1   S2
    P2   S2
    P2   S3
    P2   S4
    P4   S4

B =
    pno
    P2

A ÷ B =
    sno
    S1
    S2
    S3
    S4

Division - Example
We'd like to obtain from table A the supplier numbers of all suppliers that sell all parts in table B

A =
    pno  sno
    P1   S1
    P2   S1
    P3   S1
    P4   S1
    P1   S2
    P2   S2
    P2   S3
    P2   S4
    P4   S4

B =
    pno
    P2
    P4

A ÷ B =
    sno
    S1
    S4

Division - Example
We'd like to obtain from table A the supplier numbers of all suppliers that sell all parts in table B

A =
    pno  sno
    P1   S1
    P2   S1
    P3   S1
    P4   S1
    P1   S2
    P2   S2
    P2   S3
    P2   S4
    P4   S4

B =
    pno
    P1
    P2
    P4

A ÷ B =
    sno
    S1
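The three supplier examples can be checked with a sketch that follows the informal reading of division ("suppliers that sell all parts in B") rather than the algebraic formula. It uses table A as listed in the first of the three examples; the function name `divide` and the grouping by supplier are choices made for this illustration.

```python
from collections import defaultdict

# Table A as (pno, sno) pairs
A = {
    ("P1", "S1"), ("P2", "S1"), ("P3", "S1"), ("P4", "S1"),
    ("P1", "S2"), ("P2", "S2"),
    ("P2", "S3"),
    ("P2", "S4"), ("P4", "S4"),
}

def divide(a, b):
    """Return the suppliers whose set of parts contains every part in b."""
    parts_by_supplier = defaultdict(set)
    for pno, sno in a:
        parts_by_supplier[sno].add(pno)
    return {sno for sno, parts in parts_by_supplier.items() if b <= parts}

first = divide(A, {"P2"})
second = divide(A, {"P2", "P4"})
third = divide(A, {"P1", "P2", "P4"})
```

As B grows, the requirement becomes stricter, so the result can only shrink.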

Relational Algebra – Summarizing Example
Recall the train operation example
[ER diagram. Entity sets: Station, Line, Train, Service. Relationship sets: Serves, Gives, Arrives. Attributes shown: Height, S_Name, S_Type, L_Type, L_Num, Direction, Km, T_Num, Days, T_Category, Class, Food, Platform, D_Time, A_Time.]

Relational Algebra – Summarizing Example
What tables do we extract?
– What columns should exist for the relationship set "Serves"?
    The key S_Name (of the "Station" entity set)
    The key attributes L_Num and Direction (of "Line")
    This triple serves as the key for "Serves"
    In addition, a column for the relationship attribute Km
– What columns should exist for the relationship set "Arrives"?
    The key T_Num of the entity set "Train"
    The key attributes of the aggregated relationship set "Serves", i.e., S_Name, L_Num, and Direction
    The three attributes of the relationship set "Arrives" itself: Platform, A_Time, D_Time

The Schemas
From the previous slide (underlined attribute names represent keys)
– Serves(S_Name, L_Num, Direction, Km)
– Arrives(T_Num, S_Name, L_Num, Direction, Platform, D_Time, A_Time)
We will represent the multi-valued attribute as a separate relation
– Station(S_Name, Height)
– Station_Type(S_Name, S_Type)

Sample Queries
Which stations are served by the line 1-South?
– Here, all information is in the table/relation "Serves"
    π_{S_Name}(σ_{(L_Num=1) ∧ (Direction="south")}(Serves))
Which lines serve stations below sea level?
– Here, we need to join "Serves" and "Station"
    π_{L_Num,Direction}(σ_{Height<0}(Station ⋈ Serves))
– We can also use semi-join
    π_{L_Num,Direction}(Serves ⋉ π_{S_Name}(σ_{Height<0}(Station)))

Sample Queries
Which stations are served by more than one line?
– Here, we need to examine two rows from "Serves" at a time
    π_{S_Name}(σ_{(S_Name=S) ∧ ((L_Num≠L) ∨ (Direction≠D))}(ρ_{S_Name→S, L_Num→L, Direction→D, Km→K}(Serves) × Serves))
What if we do not care about different directions of the same line?
    π_{S_Name}(σ_{(S_Name=S) ∧ (L_Num≠L)}(ρ_{S_Name→S, L_Num→L, Direction→D, Km→K}(Serves) × Serves))
Which stations are served by exactly one line (in any direction)?
    π_{S_Name}(Serves) \ π_{S_Name}(σ_{(S_Name=S) ∧ (L_Num≠L)}(ρ_{S_Name→S, L_Num→L, Direction→D, Km→K}(Serves) × Serves))

Sample Queries
What is the name of the highest station?
– It is in fact easier to find all other stations
    R = π_{S_Name}(σ_{Height<H}(Station × ρ_{S_Name→N, Height→H}(Station)))
– Now, we can complete the query
    π_{S_Name}(Station) \ R
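The "find everything that is not the maximum, then subtract" idea can be traced in Python. The station names and heights below are hypothetical, invented only for this sketch; the comprehension plays the role of the selection over Station × ρ(Station).

```python
# Hypothetical contents of Station(S_Name, Height)
Station = {("Alpha", 120), ("Beta", -200), ("Gamma", 750)}

# R: stations that are lower than some other station
# (pairs each row with every renamed copy and keeps those with Height < H)
R = {name for (name, height) in Station for (_n, h) in Station if height < h}

# All station names minus R leaves the highest station(s)
highest = {name for (name, _height) in Station} - R
```

If several stations share the maximal height, none of them lands in R, so all of them are returned.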

Example of Using Division
Which trains (by number) serve all stations?
    π_{T_Num,S_Name}(Arrives) ÷ π_{S_Name}(Station)
What if there are stations that are not served by any line? How can we avoid referring to them?
    π_{T_Num,S_Name}(Arrives) ÷ π_{S_Name}(Serves)
What if there are stations that are served by a line but at which no train stops? How can we ignore them?
    π_{T_Num,S_Name}(Arrives) ÷ π_{S_Name}(Arrives)

More Complex Examples
Which lines share "change" stations?
– First, we will find all (line, station) couples for which the station is a "change" station
    R = (Line ⋈ Serves) ⋉ π_{S_Name}(σ_{S_Type="change"}(Station_Type))
– Now, the Cartesian product R×R includes all combinations of two such couples. Suppose we distinguish between the attributes of each copy using indices, e.g., S_Name_1, S_Name_2, etc.
– We would like to select from the product only the combinations in which the station is the same and the lines differ
    S = σ_{(S_Name_1=S_Name_2) ∧ (L_Num_1≠L_Num_2)}(R × R)
– Finally, we project to obtain only the attributes we are interested in
    T = π_{L_Type_1,L_Num_1,Direction_1,L_Type_2,L_Num_2,Direction_2}(S)

Which Queries Cannot be Expressed in RA?
Aggregate functions:
– How many lines travel "North"?
– What is the average distance between stations in Line 1-South?
– How many trains stop in each station on Monday?
– These functions (count, sum, average, …) operate on an unknown number of parameters, namely the set of values obtained from the relation
– Several extensions to RA, as well as SQL, support such queries

Inexpressible Queries - Continued
Transitive Closure
– The following query cannot be expressed in RA:
    What are all the stations that can be reached from station S in a finite number of train changes?
– Notice that for any given constant k, we can express the following query:
    What are all the stations that can be reached from station S in at most k train changes?
– The key here is whether k is bounded and given or not
This type of query cannot be expressed in non-recursive SQL either without the help of the host language (the SQL:1999 standard added recursive queries through WITH RECURSIVE)
In contrast, some other query languages support recursive queries, and in particular this type of query
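The role of the host language can be illustrated with a small fixed-point loop in Python. The relation `direct` (pairs of stations connected without a change) and its contents are hypothetical; each loop iteration corresponds to one more join step, and the number of iterations depends on the data rather than on the query text.

```python
def reachable(direct, start):
    """Collect every station reachable from start by following direct links."""
    reached = {start}
    frontier = {start}
    while frontier:                       # repeat until nothing new is found
        frontier = {b for (a, b) in direct if a in frontier} - reached
        reached |= frontier
    return reached

# Hypothetical direct connections between stations
direct = {("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")}

from_a = reachable(direct, "A")
```

For a fixed k the loop could be unrolled into k joins, which is exactly the bounded query that RA can express; the unbounded `while` is the part a single RA expression cannot capture.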

Attribute-less Relations
In order to facilitate handling yes/no questions, and to simplify some other queries, RA is often extended to allow relations with no attributes (tables with no columns)
What can such a relation contain?
– It can be empty
– It can include a single empty record
The single empty record represents the value "true" (or "exists"), whereas the empty relation represents "false" (or "does not exist")

Generalizing Algebraic Operators
Projection
– For a relation R, the empty set is considered a subset of the set of its attributes, and the corresponding projection is denoted π_λ(R)
– The result is the empty relation if R is empty and the single empty record if R is not empty
Cartesian Product
– When S has no attributes, R×S results in an empty relation if S is empty and in R if S includes the empty record
Division
– When R and S have the same attributes, R ÷ S includes the empty record if S ⊆ R and is empty otherwise
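With rows modeled as Python tuples, the empty record is simply the empty tuple `()`, and the two attribute-less relations fall out naturally. The names `TRUE`, `FALSE`, `project_none` and `cartesian` are illustrative choices for this sketch.

```python
from itertools import product

TRUE = {()}       # the relation holding the single empty record
FALSE = set()     # the empty relation

def project_none(r):
    """Projection onto the empty attribute set."""
    return {() for _ in r}

def cartesian(r, s):
    """Concatenate every row of r with every row of s."""
    return {a + b for a, b in product(r, s)}

R = {("Black", "Dog"), ("White", "Goat")}

non_empty = project_none(R)        # R has rows, so the answer is "true"
empty = project_none(set())        # no rows, so the answer is "false"
unchanged = cartesian(R, TRUE)     # concatenating () changes nothing
wiped = cartesian(R, FALSE)        # nothing to combine with
```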

Query Equivalence
Two expressions are equivalent if they return the same result when applied to any possible database content
Why is this important?
– For performance reasons, we may be able to translate one query into an equivalent one that is easier/faster to evaluate
Example:
– For the relation R(A,B,C), which of the following equivalent expressions can be evaluated most efficiently?
    π_A(R)
    π_A(π_{A,B}(R))
    π_A(π_{A,C}(R))

Example
Given the relations R(A,B), S(A,B), we will show that π_A(R) ∪ π_A(S) is equivalent to π_A(R ∪ S)
– We will show that π_A(R) ∪ π_A(S) ⊆ π_A(R ∪ S)
– As well as that π_A(R) ∪ π_A(S) ⊇ π_A(R ∪ S)

Example - Proof (1/2)
We show that π_A(R) ∪ π_A(S) ⊆ π_A(R ∪ S)
For any record t, if t ∈ (π_A(R) ∪ π_A(S)) then by the definition of union, either t ∈ π_A(R) or t ∈ π_A(S) (or both)
Hence, either there exists a u ∈ R such that u[A]=t[A] or there exists a u ∈ S such that u[A]=t[A] (or both)
In either case u ∈ R ∪ S; in other words, t ∈ π_A(R ∪ S)

Example - Proof (2/2)
We show that π_A(R) ∪ π_A(S) ⊇ π_A(R ∪ S)
If t ∈ π_A(R ∪ S), then by definition there exists a u ∈ R ∪ S such that u[A]=t[A]
Hence, either there is a u ∈ R such that u[A]=t[A] or there is a u ∈ S such that u[A]=t[A] (or both)
In other words, t ∈ (π_A(R) ∪ π_A(S))
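A proof covers every database, but a quick randomized check is a useful sanity test when experimenting with candidate equivalences. The sketch below assumes rows are (A, B) tuples over small integers; the generator, the seed and the name `project_a` are arbitrary choices for illustration, and passing such a check is of course not a substitute for the proof.

```python
import random

def project_a(relation):
    """pi_A for relations whose rows are (A, B) tuples."""
    return {a for a, _ in relation}

rng = random.Random(0)

def random_relation():
    """A small random relation over (A, B) with values in 0..3."""
    return {(rng.randint(0, 3), rng.randint(0, 3)) for _ in range(rng.randint(0, 5))}

checks = []
for _ in range(1000):
    R, S = random_relation(), random_relation()
    checks.append(project_a(R) | project_a(S) == project_a(R | S))

all_equal = all(checks)
```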

On the Importance of Equivalence
One needs to ensure that the optimization does not alter the semantics
A typical SQL query:
    SELECT DISTINCT A FROM R,S WHERE R.B=S.B AND C=5;
Direct translation:
    π_A(σ_{(R.B=S.B) ∧ (C=5)}(R × S))
Optimization:
    π_A(R ⋉ π_B(σ_{C=5}(S)))
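The two translations can be compared on sample data. The slide does not state the schemas, so this sketch assumes R(A,B) and S(B,C); the table contents are hypothetical as well.

```python
# Assumed schemas: R(A, B) and S(B, C); contents are hypothetical
R = {(1, "x"), (2, "y"), (3, "z")}
S = {("x", 5), ("y", 7), ("z", 5)}

# Direct translation: filter the full Cartesian product, then project A
direct = {a for (a, rb) in R for (sb, c) in S if rb == sb and c == 5}

# Optimization: first reduce S to the B values with C = 5, then semi-join
b_values = {b for (b, c) in S if c == 5}
optimized = {a for (a, b) in R if b in b_values}
```

The direct form inspects every pair of rows, while the optimized form scans each relation once, yet both produce the same answer on this data.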

Outerjoin
Natural Join suffers from information loss, that is, the result does not include records that do not have a corresponding record in the other table
– Outer join keeps such records and pads their missing attributes with null
More precisely, for the relations S(A_1,…,A_n,B_1,…,B_m) and T(B_1,…,B_m,C_1,…,C_k), denote by S ⋈outer T the natural join of S and T to which we add each record in S that does not have a corresponding record in T, padded with null attributes, and vice versa
That is, S ⋈outer T = (S ⋈ T) ∪ ((S\(S ⋉ T)) × π_{C_1,...,C_k}(T_null)) ∪ (π_{A_1,...,A_n}(S_null) × (T\(T ⋉ S)))
where T_null is a relation with the same schema as T that includes one record with null values, and similarly S_null is a relation with the same schema as S that includes one record with null values

Example

S =
    Color  Animal
    Black  Dog
    White  Goat
    Pink   Elephant

T =
    Shade  Color
    Light  Blue
    Dark   Blue
    Light  Pink
    Dark   Pink

S ⋈outer T =
    Shade  Color  Animal
    Light  Pink   Elephant
    Dark   Pink   Elephant
    Null   Black  Dog
    Null   White  Goat
    Light  Blue   Null
    Dark   Blue   Null

What happens when one of the relations is empty?
There are also a left outer join, in which only the unmatched records of the left relation are kept and padded, and a right outer join, in which only the unmatched records of the right relation are kept and padded
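The outer join example can be reproduced with a sketch in which rows are dictionaries and null is represented by Python's `None`. The function names and the explicit schema arguments (needed so that padding still works when a relation is empty) are assumptions of this illustration.

```python
def matches(a, b):
    """True when two rows agree on all of their common attributes."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def outer_join(s, s_schema, t, t_schema):
    """Natural join plus unmatched rows from both sides, padded with None."""
    all_attrs = set(s_schema) | set(t_schema)
    result = []
    for a in s:
        partners = [b for b in t if matches(a, b)]
        if partners:
            result.extend({**a, **b} for b in partners)
        else:
            result.append({**dict.fromkeys(all_attrs), **a})   # pad T's attributes
    for b in t:
        if not any(matches(a, b) for a in s):
            result.append({**dict.fromkeys(all_attrs), **b})   # pad S's attributes
    return result

S = [
    {"Color": "Black", "Animal": "Dog"},
    {"Color": "White", "Animal": "Goat"},
    {"Color": "Pink", "Animal": "Elephant"},
]
T = [
    {"Shade": "Light", "Color": "Blue"},
    {"Shade": "Dark", "Color": "Blue"},
    {"Shade": "Light", "Color": "Pink"},
    {"Shade": "Dark", "Color": "Pink"},
]

joined = outer_join(S, ["Color", "Animal"], T, ["Shade", "Color"])
only_s = outer_join(S, ["Color", "Animal"], [], ["Shade", "Color"])
```

Dropping the second loop would give a left outer join, and dropping the padding branch of the first loop would give a right outer join.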

Null Values
Once we allow null values, the definitions of the RA operators need to be generalized to encompass them
These definitions are complicated and outside the scope of this course

Bag/Multi-set Semantics
Until now, we assumed that relations are sets
In practice, and in particular in SQL, we may wish to allow relations that include multiple copies of the same record
To that end, the operators should be generalized accordingly

RA Operators in Bag Semantics
Projection – we eliminate attributes without eliminating duplicates
Selection – we return the corresponding records without eliminating duplicates
Cartesian product R×S – if a record u appears n times in R and a record v appears m times in S, then their concatenation appears nm times in R×S
Union S ∪ T – if a record u appears n times in S and m times in T, it will appear n+m times in S ∪ T
Subtraction S\T – if a record u appears n times in S and m times in T, it will appear max{0,n-m} times in S\T
Intersection S ∩ T – if a record u appears n times in S and m times in T, it will appear min{n,m} times in S ∩ T
Operator δ – explicit duplicate elimination
Unless otherwise stated, we assume set semantics in this course
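Python's `collections.Counter` happens to implement the same multiplicity rules, which makes it a convenient way to experiment with bag semantics. The sample bags and the helper `bag_product` are hypothetical; `+`, `-` and `&` are the standard Counter operators.

```python
from collections import Counter

# Bags map each record to its number of occurrences
S = Counter({("White", "Goat"): 3, ("Black", "Dog"): 1})
T = Counter({("White", "Goat"): 2})

bag_union = S + T           # n + m occurrences
bag_difference = S - T      # max(0, n - m) occurrences
bag_intersection = S & T    # min(n, m) occurrences
distinct = set(S)           # the delta operator: drop multiplicities

def bag_product(r, s):
    """Each concatenation appears n * m times."""
    return Counter({a + b: n * m for a, n in r.items() for b, m in s.items()})

pairs = bag_product(S, T)
```

Note that Counter's `|` operator takes the maximum of the counts, so it is not the bag union defined above.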