1
COP5725 Advanced Database Systems
DB Fundamentals
2
What are Database Management Systems
DBMS is a system for providing EFFICIENT, CONVENIENT, and SAFE MULTI-USER storage of and access to MASSIVE amounts of PERSISTENT data
3
Example: Banking System
Data Information on accounts, customers, balances, current interest rates, transaction histories, etc. MASSIVE Many gigabytes at a minimum for big banks, more if keep history of all transactions, even more if keep images of checks -> Far too big to fit in main memory PERSISTENT Data outlives programs that operate on it
4
Example: Banking System
SAFE: from system failures from malicious users CONVENIENT: simple commands to debit account, get balance, write statement, transfer funds, etc. also unpredicted queries should be easy EFFICIENT: don't search all files in order to get balance of one account, get all accounts with low balances, get large transactions, etc. massive data! -> DBMS's carefully tuned for performance
5
Multi-user Access
Many people/programs accessing the same database, or even the same data, simultaneously -> need careful controls
ATM1: withdraw $100 from account #007
  get balance from database;
  if balance >= 100 then balance := balance - 100; dispense cash;
  put new balance into database;
ATM2: withdraw $50 from account #007
  get balance from database;
  if balance >= 50 then balance := balance - 50; dispense cash;
  put new balance into database;
Initial balance = 120. Final balance = ??
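The two-ATM scenario can be replayed in a few lines. This is only a sketch: the `run` helper, the step names "read" and "write", and the two schedules are illustrative assumptions, not part of the slides. It shows how the final balance depends on the order in which the ATMs read and write.

```python
# Simulate the two ATMs as read / check-and-write steps on a shared balance.
# All names here (run, "read", "write") are hypothetical, for illustration only.

def run(schedule, initial=120):
    """Run a schedule of (atm, step) pairs; return (final balance, cash dispensed)."""
    db = {"balance": initial}              # stored balance of account #007
    local = {}                             # each ATM's private copy of the balance
    amounts = {"ATM1": 100, "ATM2": 50}    # withdrawal requested at each ATM
    dispensed = 0
    for atm, step in schedule:
        if step == "read":                 # get balance from database
            local[atm] = db["balance"]
        elif step == "write":              # check, dispense, put new balance back
            if local[atm] >= amounts[atm]:
                local[atm] -= amounts[atm]
                dispensed += amounts[atm]
            db["balance"] = local[atm]
    return db["balance"], dispensed

# One ATM finishes before the other starts.
serial = [("ATM1", "read"), ("ATM1", "write"), ("ATM2", "read"), ("ATM2", "write")]
# Both ATMs read the balance before either one writes it back.
interleaved = [("ATM1", "read"), ("ATM2", "read"), ("ATM1", "write"), ("ATM2", "write")]

print(run(serial))
print(run(interleaved))
```

In the interleaved schedule ATM1's update is overwritten (a "lost update"): more cash leaves the machines than the account held. Concurrency control exists to rule out such schedules.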
6
Why File Systems Won’t Work
Storing data:
  the file system is limited: size is bounded by the disk or the address space
  when the system crashes we may lose data
  password/file-based authorization is insufficient
Query/update:
  need to write a new C++/Java program for every new query
  need to worry about performance
Concurrency:
  limited protection
  need to worry about interfering with other users
  need to offer different views to different users (e.g. registrar, students, professors)
Schema change:
  entails changing file formats
  need to rewrite virtually all applications
That is what motivated the notion of a DBMS!
7
DBMS Architecture
[Figure: DBMS architecture. Users, web forms, applications, and the DBA submit queries, transactions, and DDL commands. Components shown: Query Parser, Query Rewriter, Query Optimizer, Query Executor; Transaction Manager, Concurrency Control, Logging & Recovery; DDL Processor; records, indexes, lock tables; Buffer Manager with a main-memory buffer (data, indexes, log, etc.); Storage Manager with storage (data, metadata, indexes, log, etc.).]
8
Data Structuring: Model, Schema, Data
Data model conceptual structuring of data stored in database ex: data is set of records, each with student-ID, name, address, courses, photo ex: data is graph where nodes represent cities, edges represent airline routes Schema versus data schema: describes how data is to be structured, defined at set-up time, rarely changes (also called "metadata") data: actual "instance" of database, changes rapidly vs. types and variables in programming languages
9
Schema vs. Data
Schema: name of the relation, name of each field, type of each field
  Students(Sid: string, Name: string, Age: integer, GPA: real)
  A template for describing a student
Data: an example instance of the relation

Sid   Name    Age  GPA
0001  Alex    19   3.55
0002  Bob     22   3.10
0003  Chris   20   3.80
0004  David   …    3.95
0005  Eugene  21   3.30
10
Data Structuring: Model, Schema, Data
Data definition language (DDL) commands for setting up schema of database Data Manipulation Language (DML) Commands to manipulate data in database: RETRIEVE, INSERT, DELETE, MODIFY Also called "query language"
11
People
DBMS user: queries/modifies data
DBMS application designer: sets up schema, loads data, …
DBMS administrator: user management, performance tuning, …
DBMS implementer: builds systems
12
Key Steps in Building DB Applications
Step 0: pick an application domain
Step 1: conceptual design
  Discuss with your teammate what to model in the application domain
  Need a modeling language to express what you want
  The ER model is the most popular such language
  Output: an ER diagram of the application domain
Step 2: pick a type of DBMS
  Relational DBMSs are the most popular and are our focus
13
Key Steps in Building DB Applications
Step 3: translate ER design to a relational schema Use a set of rules to translate from ER to relational schema Use a set of schema refinement rules to transform the above relational schema into a good relational schema 1NF, 2NF, 3NF, BCNF, 4NF,…… At this point You have a good relational schema on paper
14
Key Steps in Building DB Applications
Step 4: implement your relational schema in a DBMS using a "database programming language" called SQL
  SELECT-FROM-WHERE-GROUPBY-HAVING
Step 5: ordinary users cannot interact with the database directly, and the database cannot do everything you want, so write your application program in C++, Java, PHP, etc. to handle the interaction and take care of what the database cannot do
15
Constraints Constraint: an assertion about the database that must be true at all times Part of the database schema Very important in database design Finding constraints is part of the modeling process Keys: social security number uniquely identifies a person Single-value constraints: a person can have only one father Referential integrity constraints: if you work for a company, it must exist in the database Domain constraints: peoples’ ages are between 0 and 150 General constraints: all others (at most 30 students enroll in a class)
16
More about Keys Every entity must have a key
why? A key can consist of more than one attribute There can be more than one key for an entity set Among all candidate keys, one key will be designated as primary key
17
ER Model vs. Relational Model
Both are used to model data ER model has many concepts Entities, relationships, attributes, etc. Well-suited for capturing the app. requirements Not well-suited for computer implementation Relational model Has just a single concept: relation (table) World is represented with a collection of tables Well-suited for efficient manipulations on computers
18
Relation: An Example
Name of table (relation): Products
Columns (fields, attributes): Name, Price, Category, Manufacturer
Each row is a record (tuple); each column has a domain (atomic type)

Name          Price   Category     Manufacturer
Gizmo         19.99   Gadgets      Gizmo works
Power gizmo   29.99   Gadgets      Gizmo works
Single touch  149.99  Photography  Canon
Multi touch   203.99  Household    Hitachi
19
Relations Schema vs. instance = columns vs. rows Schema of a relation
Relation name Attribute names Attribute types (domains) Schema of a database A set of relation schemas Questions When do you determine a schema (instance)? How often do you change your mind?
20
Relations The database maintains a current database state
Updates to the data happen very frequently add a tuple delete a tuple modify an attribute in a tuple Updates to the schema are relatively rare, and rather painful. Why?
21
Defining a Database Schema
A database schema comprises declarations for the relations (“tables”) of the database Simplest form of creation is: CREATE TABLE <name> ( <list of elements> ); And you may remove a relation from the database schema by: DROP TABLE <name>;
22
Elements of Table Declarations
The principal element is a pair consisting of an attribute and a type
The most common types are:
  INT or INTEGER (synonyms)
  REAL or FLOAT (synonyms)
  CHAR(n) = fixed-length string of n characters
  VARCHAR(n) = variable-length string of up to n characters
23
Example: Create Table CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL );
24
Declaring Keys
An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE
Each says that the attribute(s) so declared functionally determine all the attributes of the relation schema
Single-attribute keys:
  CREATE TABLE Beers (
    name CHAR(20) UNIQUE,
    manf CHAR(20)
  );
25
Multi-attribute Keys CREATE TABLE Sells ( bar CHAR(20), beer VARCHAR(20), price REAL, PRIMARY KEY (bar, beer) );
26
Foreign Keys
A foreign key is a field whose values are keys in another relation
  Must correspond to the primary key of the second relation
  Like a "logical pointer"
Example: Enrolled references Students and Courses
  CREATE TABLE Enrolled (
    sid CHAR(20),
    cid CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students,
    FOREIGN KEY (cid) REFERENCES Courses
  );
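The key and foreign-key declarations can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the `name` and `title` columns, the sample rows, and the `try_insert` helper are made up for illustration, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Students and Courses are simplified here; their extra columns are assumptions.
conn.executescript("""
CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20));
CREATE TABLE Courses  (cid CHAR(20) PRIMARY KEY, title CHAR(40));
CREATE TABLE Enrolled (
    sid   CHAR(20),
    cid   CHAR(20),
    grade CHAR(2),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES Students(sid),
    FOREIGN KEY (cid) REFERENCES Courses(cid)
);
INSERT INTO Students VALUES ('0001', 'Alex');
INSERT INTO Courses  VALUES ('COP5725', 'Advanced Database Systems');
INSERT INTO Enrolled VALUES ('0001', 'COP5725', 'A');
""")

def try_insert(row):
    """Return True if Enrolled accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

missing_course = try_insert(("0001", "COP9999", "B"))  # no such course: foreign key
duplicate_key = try_insert(("0001", "COP5725", "C"))   # (sid, cid) already present
print(missing_course, duplicate_key)
```

Both inserts are rejected, so Enrolled still holds only the original row.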
27
Relational Algebra Querying the database: specify what we want from our database Find all the people who earn more than $1,000,000 and pay taxes in Tallahassee Could write in C++/Java, but a bad idea Instead use high-level query languages: Theoretical: Relational Algebra, Datalog Practical: SQL Relational algebra: a basic set of operations on relations that provide the basic principles
28
What is an “Algebra”?
Mathematical system consisting of:
  Operands --- variables or values from which new values can be constructed
  Operators --- symbols denoting procedures that construct new values from given values
Examples
  Arithmetic algebra, linear algebra, Boolean algebra, …
  What are the operands? What are the operators?
29
What is Relational Algebra?
An algebra Whose operands are relations or variables that represent relations Whose operators are designed to do common things that we need to do with relations in a database relations as input, new relation as output Can be used as a query language for relations
30
Relational Operators at a Glance
Five basic RA operations:
  Basic set operations: union, difference (no intersection, no complement)
  Selection: σ
  Projection: π
  Cartesian product: ×
When our relations have attribute names:
  Renaming: ρ
Derived operations:
  Intersection, complement
  Joins (natural join, equi-join, theta join, semi-join)
31
Set Operations
Union: all tuples in R1 or R2, denoted as R1 ∪ R2
  R1, R2 must have the same schema
  R1 ∪ R2 has the same schema as R1, R2
  Example: Active-Employees ∪ Retired-Employees
Difference: all tuples in R1 and not in R2, denoted as R1 – R2
  R1 – R2 has the same schema as R1, R2
  Example: All-Employees – Retired-Employees
32
Selection
Returns all tuples which satisfy a condition, denoted as σ_c(R)
  c is a condition built from =, <, >, AND, OR, NOT
  Output schema: same as input schema
Find all employees with salary more than $40,000: σ_{Salary > 40000}(Employee)

Employee
SSN  Name   Dept-ID  Salary
…    Alex   1        30K
…    Bob    …        32K
…    Chris  2        45K

σ_{Salary > 40000}(Employee)
SSN  Name   Dept-ID  Salary
…    Chris  2        45K
33
Projection
Unary operation: returns certain columns, denoted as π_{A1,…,An}(R)
  Eliminates duplicate tuples!
  Input schema: R(B1, …, Bm)
  Condition: {A1, …, An} ⊆ {B1, …, Bm}
  Output schema: S(A1, …, An)
Example: project social-security numbers and names: π_{SSN, Name}(Employee)

Employee
SSN  Name   Dept-ID  Salary
…    Alex   1        30K
…    Bob    …        32K
…    Chris  2        45K

π_{SSN, Name}(Employee)
SSN  Name
…    Alex
…    Bob
…    Chris
34
Selection vs. Projection
Think of relation as a table How are they similar? How are they different? Why do you need both?
35
Cartesian Product
Each tuple in R1 with each tuple in R2, denoted as R1 × R2
  Input schemas: R1(A1, …, An), R2(B1, …, Bm)
  Output schema: S(A1, …, An, B1, …, Bm)
  Very rare in practice, but joins are very common
Example: Employee × Dependent
36
Example

Employee
SSN        Name
111060000  Alex
754320032  Brandy

Dependent
Employee-SSN  Dependent-Name
…             Chris
…             David

Employee × Dependent
SSN        Name    Employee-SSN  Dependent-Name
111060000  Alex    …             Chris
111060000  Alex    …             David
754320032  Brandy  …             Chris
754320032  Brandy  …             David
37
Renaming
Does not change the relational instance; changes the relational schema only
Notation: ρ_{S(B1,…,Bn)}(R)
  Input schema: R(A1, …, An)
  Output schema: S(B1, …, Bn)
Example: ρ_{Soc-sec-num, firstname}(Employee)

Employee
SSN  Name
…    Alex
…    Bob
…    Chris

ρ_{Soc-sec-num, firstname}(Employee)
Soc-sec-num  firstname
…            Alex
…            Bob
…            Chris
38
Set Operations: Intersection
Intersection: all tuples both in R1 and in R2, denoted as R1 ∩ R2
  R1, R2 must have the same schema
  R1 ∩ R2 has the same schema as R1, R2
  Example: UnionizedEmployees ∩ RetiredEmployees
Intersection is derived: R1 ∩ R2 = R1 – (R1 – R2). Why?
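The identity for derived intersection can be checked on a small instance. The two relations below are made-up sample data, modeled as Python sets of one-attribute tuples; this is a spot check, not a proof.

```python
# Hypothetical instances of two relations with the same one-attribute schema.
unionized = {("Alex",), ("Bob",), ("Chris",)}
retired = {("Bob",), ("Chris",), ("David",)}

# R1 - R2 removes from R1 everything that is also in R2;
# subtracting that from R1 again leaves exactly the shared tuples.
only_unionized = unionized - retired
derived = unionized - only_unionized

print(sorted(derived))
print(derived == unionized & retired)
```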
39
Theta Join
A join that involves a predicate θ, denoted as R1 ⋈_θ R2
  Input schemas: R1(A1, …, An), R2(B1, …, Bm)
  Output schema: S(A1, …, An, B1, …, Bm)
Derived operator: R1 ⋈_θ R2 = σ_θ(R1 × R2)
  Take the product R1 × R2
  Then apply σ_θ to the result
  As for selection, θ can be any Boolean-valued condition
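A minimal sketch of the derived definition, with relations modeled as lists of dicts. The `theta_join` helper and the sample rows are illustrative assumptions, and the sketch assumes the two relations have disjoint attribute names.

```python
from itertools import product

def theta_join(r1, r2, theta):
    """R1 JOIN_theta R2, computed literally as a selection over R1 x R2."""
    result = []
    for t1, t2 in product(r1, r2):   # Cartesian product: every pair of tuples
        combined = {**t1, **t2}      # output schema: attributes of R1, then of R2
        if theta(combined):          # selection: keep pairs satisfying theta
            result.append(combined)
    return result

# Sample rows in the spirit of a Sells / Bar example (values are illustrative).
sells = [
    {"Bar": "AJ's", "Beer": "Bud", "Price": 2.5},
    {"Bar": "AJ's", "Beer": "Miller", "Price": 2.75},
    {"Bar": "Michael's Pub", "Beer": "Bud", "Price": 2.5},
    {"Bar": "Michael's Pub", "Beer": "Corona", "Price": 3.0},
]
bars = [
    {"Name": "AJ's", "Address": "1800 Tennessee"},
    {"Name": "Michael's Pub", "Address": "513 Gaines"},
]

bar_info = theta_join(sells, bars, lambda t: t["Bar"] == t["Name"])
for row in bar_info:
    print(row)
```

Real systems avoid materializing the full product; the point here is only that the join's result equals the selection over the product.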
40
Theta Join: Example

Sells
Bar            Beer    Price
AJ's           Bud     2.5
AJ's           Miller  2.75
Michael's Pub  Bud     2.5
Michael's Pub  Corona  3.0

Bar
Name           Address
AJ's           1800 Tennessee
Michael's Pub  513 Gaines

BarInfo := Sells ⋈_{Sells.Bar = Bar.Name} Bar
Bar            Beer    Price  Name           Address
AJ's           Bud     2.5    AJ's           1800 Tennessee
AJ's           Miller  2.75   AJ's           1800 Tennessee
Michael's Pub  Bud     2.5    Michael's Pub  513 Gaines
Michael's Pub  Corona  3.0    Michael's Pub  513 Gaines
41
Natural Join
Notation: R1 ⋈ R2
  Input schema: R1(A1, …, An), R2(B1, …, Bm)
  Output schema: S(C1, …, Cp), where {C1, …, Cp} = {A1, …, An} ∪ {B1, …, Bm}
Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes {A1, …, An} ∩ {B1, …, Bm} (called the join attributes)
42
Natural Join: Examples

Employee
SSN  Name
…    Alex
…    Brandy

Dependent
SSN  Dependent-Name
…    Chris
…    David

Employee ⋈ Dependent = π_{SSN, Name, Dependent-Name}(σ_{Employee.SSN = Dependent.SSN}(Employee × Dependent))
SSN  Name    Dependent-Name
…    Alex    Chris
…    Brandy  David
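The Employee/Dependent join can be sketched with relations as lists of dicts. The `natural_join` helper is hypothetical, and the SSN values attached to each dependent are assumptions chosen so that the result pairs Alex with Chris and Brandy with David.

```python
def natural_join(r1, r2):
    """Combine all pairs of tuples that agree on every attribute the schemas share."""
    if not r1 or not r2:
        return []
    shared = set(r1[0]) & set(r2[0])  # join attributes: intersection of the schemas
    return [
        {**t1, **t2}                  # shared attributes appear once in the output
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in shared)
    ]

# Assumed sample instances; the SSN values for the dependents are illustrative.
employee = [
    {"SSN": "111060000", "Name": "Alex"},
    {"SSN": "754320032", "Name": "Brandy"},
]
dependent = [
    {"SSN": "111060000", "Dependent-Name": "Chris"},
    {"SSN": "754320032", "Dependent-Name": "David"},
]

for row in natural_join(employee, dependent):
    print(row)
```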
43
Natural Join: Examples
[Tables: R(A, B), S(B, C), and R ⋈ S with schema (A, B, C), over the values X, Y, Z, U, V, W]
44
Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
45
Equi-join
Special case of theta join: condition c contains only a conjunction of equalities
  Result schema is the same as that of the Cartesian product
  May have fewer tuples than the Cartesian product
  Most frequently used in practice: R1 ⋈_{A=B} R2
Natural join is a particular case of equi-join
A lot of research on how to do it efficiently
46
Building Complex Expressions
Algebras allow us to express sequences of operations in a natural way Example In arithmetic algebra: (x + 4)*(y - 3) Relational algebra allows the same Three notations, just as in arithmetic: Sequences of assignment statements Expressions with several operators Expression trees
47
Sequences of Assignments
Create temporary relation names
Renaming can be implied by giving relations a list of attributes
Example: R3 := R1 JOIN_C R2 can be written:
  R4 := R1 × R2
  R3 := SELECT_C(R4)
48
Expressions with Several Operators
Example: the theta-join R3 := R1 JOIN_C R2 can be written: R3 := SELECT_C(R1 × R2)
Precedence of relational operators:
  Unary operators --- select, project, rename --- have highest precedence, bind first
  Then come products and joins
  Then intersection
  Finally, union and set difference bind last
But you can always insert parentheses to force the order you desire
49
Expression Trees Leaves are operands
either variables standing for relations or particular constant relations Interior nodes are operators, applied to their child or children
50
Expression Tree: Examples
Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3

UNION
  PROJECT_name
    SELECT_{addr = "Tennessee St."}
      Bars
  RENAME_{R(name)}
    PROJECT_bar
      SELECT_{price < 3 AND beer = "Bud"}
        Sells
51
Question: How to do this?
Using Sells(bar, beer, price), find the bars that sell two different beers at the same price
52
Glimpse Ahead: Efficient Implementations of Operators
σ_{age >= 30 AND age <= 35}(Employees)
  Method 1: scan the file, test each employee
  Method 2: use an index on age
  Which one is better? Depends a lot…
Employees ⋈ Relatives
  Iterate over Employees, then over Relatives
  Iterate over Relatives, then over Employees
  Sort Employees and Relatives, then do a "merge-join"
  "Hash-join"
  Etc.
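Two of the join strategies listed above can be sketched side by side. The schemas (an `ssn` attribute shared by both relations), the sample rows, and both function names are assumptions for illustration; a real DBMS works on disk pages, not Python lists.

```python
from collections import defaultdict

def nested_loop_join(employees, relatives):
    """Iterate over Employees, then over Relatives: |E| * |R| comparisons."""
    return [(e, r) for e in employees for r in relatives if e["ssn"] == r["ssn"]]

def hash_join(employees, relatives):
    """Build a hash table on Employees, then probe it once per Relatives tuple."""
    by_ssn = defaultdict(list)
    for e in employees:                       # build phase
        by_ssn[e["ssn"]].append(e)
    return [(e, r) for r in relatives         # probe phase
            for e in by_ssn.get(r["ssn"], [])]

# Hypothetical sample data.
employees = [{"ssn": 1, "name": "Alex"}, {"ssn": 2, "name": "Bob"}, {"ssn": 3, "name": "Chris"}]
relatives = [{"ssn": 1, "name": "Dana"}, {"ssn": 1, "name": "Eli"}, {"ssn": 3, "name": "Fay"}]

def pair_key(pair):
    """Sort key so results can be compared regardless of output order."""
    return (pair[0]["ssn"], pair[1]["name"])

same = sorted(nested_loop_join(employees, relatives), key=pair_key) == \
       sorted(hash_join(employees, relatives), key=pair_key)
print(same)
```

Both produce the same pairs, possibly in a different order; they differ only in how much work they do, which is what the optimizer weighs.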
53
Glimpse Ahead: Optimizations
Product(pid, name, price, category, maker-cid)
Purchase(buyer-ssn, seller-ssn, store, pid)
Person(ssn, name, phone number, city)
Which is better:
  σ_{price>100}(Product) ⋈ (Purchase ⋈ σ_{city=sea}(Person))
  (σ_{price>100}(Product) ⋈ Purchase) ⋈ σ_{city=sea}(Person)
Depends! This is the optimizer's job…
54
SQL
Standard language for querying and manipulating data
  SQL stands for Structured Query Language
  Initially developed at IBM by Donald Chamberlin and Raymond Boyce in the early 1970s, and called SEQUEL (Structured English Query Language)
  Many standards out there: SQL-92 (also known as SQL2), SQL:1999 (also known as SQL3)
  Vendors support various subsets of these standards
Why SQL?
  A very-high-level language, in which the programmer can avoid specifying many of the data-manipulation details that would be necessary in languages like C++
  Its queries are "optimized" quite well, yielding efficient query executions
55
Introduction Two sublanguages DDL – Data Definition Language
define and modify schema CREATE TABLE table_name ( { column_name data_type [ DEFAULT default_expr ] [ column_constraint [, ... ] ] | table_constraint } [, ... ] ) DML – Data Manipulation Language Queries can be written intuitively Select-From-Where
56
Select-From-Where Statements
The principal form of a SQL query is: SELECT desired attributes FROM one or more tables WHERE condition about tuples of the tables
57
Our Running Example Most of our SQL queries will be based on the following database schema Underline indicates key attributes Beers(name, manf) Bars(name, addr, license) Drinkers(name, addr, phone) Likes(drinker, beer) Sells(bar, beer, price) Frequents(drinker, bar)
58
Select-From-Where Example
Using Beers(name, manf), what beers are made by Busch?
  SELECT name
  FROM Beers
  WHERE manf = 'Busch';
The answer is a relation with a single attribute, name, and tuples with the name of each beer by Busch, such as Bud

name
'Bud'
'Bud Lite'
'Michelob'
59
Single-Relation Query
Operation Begin with the relation in the FROM clause Apply the selection indicated by the WHERE clause Apply the extended projection indicated by the SELECT clause Semantics To implement this algorithm think of a tuple variable ranging over each tuple of the relation mentioned in FROM Check if the “current” tuple satisfies the WHERE clause If so, compute the attributes or expressions of the SELECT clause using the components of this tuple
60
* In SELECT clauses
When there is one relation in the FROM clause, * in the SELECT clause stands for "all attributes of this relation."
Example using Beers(name, manf):
  SELECT *
  FROM Beers
  WHERE manf = 'Busch';

name        manf
'Bud'       'Busch'
'Bud Lite'  'Busch'
'Michelob'  'Busch'

Now, the result has each of the attributes of Beers
61
Renaming Attributes
If you want the result to have different attribute names, use "AS <new name>" to rename an attribute
Example based on Beers(name, manf):
  SELECT name AS beer, manf
  FROM Beers
  WHERE manf = 'Busch';

beer        manf
'Bud'       'Busch'
'Bud Lite'  'Busch'
'Michelob'  'Busch'
62
Expressions in SELECT Clauses
Any expression that makes sense can appear as an element of a SELECT clause
Example: from Sells(bar, beer, price):
  SELECT bar, beer, price * 120 AS priceInYen
  FROM Sells;

bar    beer    priceInYen
Joe's  Bud     300
Sue's  Miller  360
…
63
Complex Conditions in WHERE Clause
From Sells(bar, beer, price), find the price Joe's Bar charges for "cheap" beers:
  SELECT price
  FROM Sells
  WHERE bar = 'Joe''s Bar' AND price < 5.0;
64
Selections What you can use in WHERE:
attribute names of the relation(s) used in the FROM comparison operators: =, <>, <, >, <=, >= apply arithmetic operations: stockprice*2 operations on strings (e.g., “||” for concatenation) Lexicographic order on strings Pattern matching: s LIKE p Special stuff for comparing dates and times.
65
NULL Values Tuples in SQL relations can have NULL as a value for one or more components Meaning depends on context. Two common cases: Missing value : e.g., we know Joe’s Bar has some address, but we don’t know what it is Inapplicable : e.g., the value of attribute spouse for an unmarried person The logic of conditions in SQL is really 3-valued logic: TRUE, FALSE, UNKNOWN When any value is compared with NULL, the truth value is UNKNOWN A query only produces a tuple in the answer if its value for the WHERE clause is TRUE (not FALSE or UNKNOWN)
66
Three-Valued Logic
To understand how AND, OR, and NOT work in 3-valued logic, think of TRUE = 1, FALSE = 0, and UNKNOWN = ½; AND = MIN, OR = MAX, NOT(x) = 1 - x
Example:
  TRUE AND (FALSE OR NOT(UNKNOWN))
  = MIN(1, MAX(0, (1 - ½)))
  = MIN(1, MAX(0, ½))
  = MIN(1, ½)
  = ½
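The MIN/MAX encoding translates directly into code. This sketch uses exact fractions so that ½ is represented without rounding; the function names AND, OR, and NOT are just illustrative Python helpers.

```python
from fractions import Fraction

# Encode the three truth values as numbers, as described above.
TRUE, UNKNOWN, FALSE = Fraction(1), Fraction(1, 2), Fraction(0)

def AND(x, y):
    return min(x, y)

def OR(x, y):
    return max(x, y)

def NOT(x):
    return 1 - x

# The worked example: TRUE AND (FALSE OR NOT(UNKNOWN))
result = AND(TRUE, OR(FALSE, NOT(UNKNOWN)))
print(result == UNKNOWN)

# A condition and its negation do not add up to TRUE when the value is UNKNOWN.
print(OR(UNKNOWN, NOT(UNKNOWN)) == UNKNOWN)
```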
67
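Surprising Example
From the following Sells relation:

bar    beer  price
Joe's  Bud   NULL

  SELECT bar
  FROM Sells
  WHERE price < 2.00 OR price >= 2.00;

price < 2.00 is UNKNOWN and price >= 2.00 is UNKNOWN, so the WHERE condition is UNKNOWN OR UNKNOWN = UNKNOWN and the tuple is not in the answer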
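The surprising result can be reproduced with Python's built-in sqlite3 module. The TEXT/REAL column types are an assumption, and the second query's `IS NULL` test is an addition beyond the slide, shown only as one way to pick up the NULL-priced tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', NULL)")

# Looks like a tautology, but both comparisons with NULL are UNKNOWN.
rows = conn.execute(
    "SELECT bar FROM Sells WHERE price < 2.00 OR price >= 2.00"
).fetchall()

# Testing for NULL explicitly makes the condition TRUE for this tuple.
rows_with_null = conn.execute(
    "SELECT bar FROM Sells WHERE price < 2.00 OR price >= 2.00 OR price IS NULL"
).fetchall()

print(rows)
print(rows_with_null)
```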
68
Multi-relation Queries
Interesting queries often combine data from more than one relation
We can address several relations in one query by listing them all in the FROM clause
Distinguish attributes of the same name by "<relation>.<attribute>"
Example: Using relations Likes(drinker, beer) and Frequents(drinker, bar), find the beers liked by at least one person who frequents Joe's Bar.
  SELECT Likes.beer
  FROM Likes, Frequents
  WHERE Frequents.bar = 'Joe''s Bar' AND Frequents.drinker = Likes.drinker;
69
Semantics
Almost the same as for single-relation queries:
  Start with the (Cartesian) product of all the relations in the FROM clause
  Apply the selection condition from the WHERE clause
  Project onto the list of attributes and expressions in the SELECT clause
  SELECT a1, a2, …, ak
  FROM R1 AS x1, R2 AS x2, …, Rn AS xn
  WHERE Conditions
Translation to relational algebra: π_{a1,…,ak}(σ_{Conditions}(R1 × R2 × … × Rn))
Select-From-Where queries are precisely Select-Project-Join
70
Semantics
  SELECT a1, a2, …, ak
  FROM R1 AS x1, R2 AS x2, …, Rn AS xn
  WHERE Conditions

  Answer = {}
  for x1 in R1 do
    for x2 in R2 do
      …
        for xn in Rn do
          if Conditions then Answer = Answer ∪ {(a1, …, ak)}
  return Answer
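The nested-loop reading of SELECT-FROM-WHERE can be written as a small generic function. The `select_from_where` helper and the sample Likes/Frequents rows are illustrative assumptions; the answer is kept as a list (a bag), so duplicates are not eliminated.

```python
from itertools import product

def select_from_where(relations, condition, select):
    """Evaluate SELECT-FROM-WHERE by looping over every combination of tuples."""
    answer = []
    for xs in product(*relations):      # one nested loop per relation in FROM
        if condition(*xs):              # WHERE
            answer.append(select(*xs))  # SELECT
    return answer

# Hypothetical instances of Likes(drinker, beer) and Frequents(drinker, bar).
likes = [("Sally", "Bud"), ("Fred", "Miller"), ("Sally", "Miller")]
frequents = [("Sally", "Joe's Bar"), ("Fred", "Sue's Bar")]

# Beers liked by at least one person who frequents Joe's Bar.
beers = select_from_where(
    [likes, frequents],
    condition=lambda l, f: f[1] == "Joe's Bar" and f[0] == l[0],
    select=lambda l, f: (l[1],),
)
print(beers)
```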
71
Explicit Tuple-Variables
Sometimes, a query needs to use two copies of the same relation Distinguish copies by following the relation name by the name of a tuple-variable, in the FROM clause It’s always an option to rename relations this way, even when not essential SELECT s1.bar FROM Sells s1, Sells s2 WHERE s1.beer = s2.beer AND s1.price < s2.price;
72
SubQueries A parenthesized SELECT-FROM-WHERE statement (subquery) can be used as a value in a number of places, including FROM and WHERE clauses Example: in place of a relation in the FROM clause, we can place another query, and then query its result Better use a tuple-variable to name tuples of the result Subqueries that return Scalar If a subquery is guaranteed to produce one tuple with one component, then the subquery can be used as a value “Single” tuple often guaranteed by key constraint A run-time error occurs if there is no tuple or more than one tuple
73
Example
From Sells(bar, beer, price), find the bars that serve Miller for the same price Joe charges for Bud
Two queries would surely work:
  Find the price Joe charges for Bud
  Find the bars that serve Miller at that price
With a subquery:
  SELECT bar
  FROM Sells
  WHERE beer = 'Miller' AND
        price = (SELECT price
                 FROM Sells
                 WHERE bar = 'Joe''s Bar' AND beer = 'Bud');
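A runnable version of the scalar-subquery example, using Python's sqlite3 module. The sample rows and column types are assumptions. Note that SQLite is more lenient than described above: it takes the first row of a scalar subquery rather than raising a run-time error when several rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [
        ("Joe's Bar", "Bud", 2.5),      # the price the subquery looks up
        ("Joe's Bar", "Miller", 2.75),
        ("Sue's Bar", "Miller", 2.5),   # Miller at Joe's price for Bud
        ("Moe's Bar", "Miller", 3.0),
    ],
)

rows = conn.execute("""
    SELECT bar
    FROM Sells
    WHERE beer = 'Miller' AND
          price = (SELECT price
                   FROM Sells
                   WHERE bar = 'Joe''s Bar' AND beer = 'Bud')
""").fetchall()
print(rows)
```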
74
The IN Operator <tuple> IN <relation> is true if and only if the tuple is a member of the relation <tuple> NOT IN <relation> means the opposite IN-expressions can appear in WHERE clauses The <relation> is often a subquery Query: From Beers(name, manf) and Likes(drinker, beer), find the name and manufacturer of each beer that Fred likes SELECT * FROM Beers WHERE name IN ( SELECT beer FROM Likes WHERE drinker = ‘Fred’ ); The set of beers Fred likes
75
The Exists Operator
EXISTS( <relation> ) is true if and only if the <relation> is not empty
Being a Boolean-valued operator, EXISTS can appear in WHERE clauses
Query: From Beers(name, manf), find those beers that are the only beer by their manufacturer
  SELECT name
  FROM Beers b1
  WHERE NOT EXISTS(
    SELECT *
    FROM Beers
    WHERE manf = b1.manf AND name <> b1.name);
Scope rule: manf refers to the closest nested FROM with a relation having that attribute
The subquery is the set of beers with the same manf as b1, but not the same beer
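The correlated NOT EXISTS query can be run as written with sqlite3. The rows below, including the single-beer manufacturer 'Lone Brewer', are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
conn.executemany(
    "INSERT INTO Beers VALUES (?, ?)",
    [
        ("Bud", "Busch"),
        ("Bud Lite", "Busch"),
        ("Michelob", "Busch"),
        ("Solo Ale", "Lone Brewer"),  # hypothetical manufacturer with one beer
    ],
)

# For each outer tuple b1, the subquery collects the other beers by b1's manufacturer.
rows = conn.execute("""
    SELECT name
    FROM Beers b1
    WHERE NOT EXISTS (
        SELECT *
        FROM Beers
        WHERE manf = b1.manf AND name <> b1.name)
""").fetchall()
print(rows)
```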
76
The Operator ANY x = ANY( <relation> ) is a Boolean condition meaning that x equals at least one tuple in the relation Similarly, = can be replaced by any of the comparison operators Example: x >= ANY( <relation> ) means x is not smaller than some tuples in the relation Note tuples must have one component only
77
The Operator ALL
x <> ALL( <relation> ) is true if and only if for every tuple t in the relation, x is not equal to t
  That is, x is not a member of the relation
The <> can be replaced by any comparison operator
  Example: x >= ALL( <relation> ) means there is no tuple larger than x in the relation
Query: From Sells(bar, beer, price), find the beer(s) sold for the highest price
  SELECT beer
  FROM Sells
  WHERE price >= ALL(
    SELECT price
    FROM Sells);
The price from the outer Sells must not be less than any price
78
Bag (Set) Semantics for SFW Queries
The SELECT-FROM-WHERE statement uses bag semantics Selection: preserve the number of occurrences Projection: preserve the number of occurrences (no duplicate elimination) Cartesian product, join: no duplicate elimination The default for union, intersection, and difference is set semantics, and is expressed by the following forms, each involving subqueries: ( subquery ) UNION ( subquery ) ( subquery ) INTERSECT ( subquery ) ( subquery ) EXCEPT ( subquery )
79
Example Happy Drinker: From relations Likes(drinker, beer), Sells(bar, beer, price) and Frequents(drinker, bar), find the drinkers and beers such that: The drinker likes the beer, and The drinker frequents at least one bar that sells the beer (SELECT * FROM Likes) INTERSECT (SELECT drinker, beer FROM Sells, Frequents WHERE Frequents.bar = Sells.bar ); The drinker frequents a bar that sells the beer
80
Set vs. Bag: Efficiency When doing projection in relational algebra, it is easier to avoid eliminating duplicates Just work tuple-at-a-time When doing intersection or difference, it is most efficient to sort the relations first At that point you may as well eliminate the duplicates anyway
81
Controlling Duplicate Elimination
Force the result to be a set by SELECT DISTINCT
  From Sells(bar, beer, price), find all the different prices charged for beers:
    SELECT DISTINCT price
    FROM Sells;
Force the result to be a bag (i.e., don't eliminate duplicates) by ALL, as in … UNION ALL …
  (SELECT drinker FROM Frequents)
    EXCEPT ALL
  (SELECT drinker FROM Likes);
  Lists drinkers who frequent more bars than they like beers, and does so as many times as the difference of those counts
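The difference between EXCEPT and EXCEPT ALL can be illustrated with Python's `collections.Counter`, which behaves like a bag: subtracting counters subtracts multiplicities and drops non-positive counts. The sample Frequents and Likes rows are assumptions.

```python
from collections import Counter

# Hypothetical instances of Frequents(drinker, bar) and Likes(drinker, beer).
frequents = [("Sally", "Joe's Bar"), ("Sally", "Sue's Bar"),
             ("Sally", "Moe's Bar"), ("Fred", "Joe's Bar")]
likes = [("Sally", "Bud"), ("Fred", "Bud"), ("Fred", "Miller")]

left = Counter(drinker for drinker, _ in frequents)   # bag of drinkers from Frequents
right = Counter(drinker for drinker, _ in likes)      # bag of drinkers from Likes

# EXCEPT ALL: bag difference, multiplicities are subtracted.
except_all = sorted((left - right).elements())

# EXCEPT: set difference, duplicates are eliminated first.
except_set = sorted(set(left) - set(right))

print(except_all)
print(except_set)
```

Sally frequents three bars and likes one beer, so she appears twice under EXCEPT ALL; under set semantics both drinkers occur on each side and the result is empty.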
82
Aggregations SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column e.g. COUNT(*) counts the number of tuples Query: From Sells(bar, beer, price), find the average price of Bud SELECT AVG(price) FROM Sells WHERE beer = ‘Bud’
83
Group By We may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group Query: From Sells(bar, beer, price), find the average price for each beer: SELECT beer, AVG(price) FROM Sells GROUP BY beer
84
Example Query: From Sells(bar, beer, price) and Frequents (drinker, bar), find for each drinker the average price of Bud at the bars they frequent: SELECT drinker, AVG(price) FROM Frequents, Sells WHERE beer = ‘Bud’ AND Frequents.bar = Sells.bar GROUP BY drinker; Compute drinker-bar- price of Bud tuples first, then group by drinker
85
Restriction on SELECT Lists With Aggregation
If any aggregation is used, then each element of the SELECT list must be either: Aggregated, or An attribute on the GROUP BY list Question: How about this query? SELECT bar, MIN(price) FROM Sells WHERE beer = ‘Bud’;
86
Having Clause HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated These conditions may refer to any relation or tuple-variable in the FROM clause They may refer to attributes of those relations, as long as the attribute makes sense within a group; i.e., it is either: A grouping attribute, or Aggregated
87
Having Clause: Example
SELECT beer, AVG(price)
FROM Sells
GROUP BY beer
HAVING COUNT(bar) >= 3 OR beer = 'Michelob';
88
General form of Grouping and Aggregation
SELECT S
FROM R1, …, Rn
WHERE C1
GROUP BY a1, …, ak
HAVING C2
S may contain attributes a1, …, ak and/or any aggregates, but NO OTHER ATTRIBUTES
C1 is any condition on the attributes in R1, …, Rn
C2 is any condition on aggregate expressions or grouping attributes
89
General form of Grouping and Aggregation
SELECT S
FROM R1, …, Rn
WHERE C1
GROUP BY a1, …, ak
HAVING C2
Evaluation steps:
  1. Compute the FROM-WHERE part, obtaining a table with all attributes in R1, …, Rn
  2. Group by the attributes a1, …, ak
  3. Compute the aggregates in C2 and keep only the groups satisfying C2
  4. Compute the aggregates in S and return the result
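The four evaluation steps can be carried out by hand and compared with what a DBMS returns. This sketch assumes sample Sells rows and uses the HAVING query from the earlier example (at least three bars, or the beer is Michelob); sqlite3 stands in for the DBMS.

```python
import sqlite3
from collections import defaultdict

# Hypothetical instance of Sells(bar, beer, price).
sells = [
    ("Joe's", "Bud", 2.5), ("Sue's", "Bud", 3.0), ("Moe's", "Bud", 3.5),
    ("Joe's", "Miller", 2.75), ("Sue's", "Miller", 3.25),
    ("Joe's", "Michelob", 4.0),
]

# Steps 1 and 2: FROM-WHERE (no WHERE clause here), then group by beer.
groups = defaultdict(list)
for bar, beer, price in sells:
    groups[beer].append((bar, price))

# Step 3: keep groups satisfying HAVING; step 4: compute the aggregate in SELECT.
manual = []
for beer, rows in groups.items():
    if len(rows) >= 3 or beer == "Michelob":
        manual.append((beer, sum(price for _, price in rows) / len(rows)))

# The same query evaluated by SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)", sells)
from_sql = conn.execute("""
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR beer = 'Michelob'
""").fetchall()

print(sorted(manual))
print(sorted(manual) == sorted(from_sql))
```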
90
Modifications A modification command does not return a result as a query does, but it changes the database in some way There are three kinds of modifications: Insert a tuple or tuples Delete a tuple or tuples Update the value(s) of an existing tuple or tuples
91
Insertion To insert a single tuple: INSERT INTO <relation>
VALUES ( <list of values> ); Example: add to Likes(drinker, beer) the fact that Sally likes Bud: INSERT INTO Likes VALUES(‘Sally’, ‘Bud’);
92
Specifying Attributes in INSERT
We may add to the relation name a list of attributes There are two reasons to do so: We forget the standard order of attributes for the relation We don’t have values for all attributes, and we want the system to fill in missing components with NULL or a default value Another way to add the fact that Sally likes Bud to Likes(drinker, beer): INSERT INTO Likes(beer, drinker) VALUES(‘Bud’, ‘Sally’);
93
Inserting Many Tuples We may insert the entire result of a query into a relation, using the form: INSERT INTO <relation> ( <subquery> ); E.g., INSERT INTO Beers(name) SELECT beer from Sells;
94
Example: Insert a Subquery
Using Frequents(drinker, bar), enter into the new relation PotBuddies(name) all of Sally's "potential buddies," i.e., those drinkers who frequent at least one bar that Sally also frequents
  INSERT INTO PotBuddies
  (SELECT d2.drinker
   FROM Frequents d1, Frequents d2
   WHERE d1.drinker = 'Sally' AND
         d2.drinker <> 'Sally' AND
         d1.bar = d2.bar
  );
d1, d2: pairs of Frequents tuples where the first is for Sally, the second is for someone else, and the bars are the same
d2.drinker: the other drinker
95
Deletion To delete tuples satisfying a condition from some relation:
DELETE FROM <relation> WHERE <condition>; Example: Delete from Likes(drinker, beer) the fact that Sally likes Bud: DELETE FROM Likes WHERE drinker = ‘Sally’ AND beer = ‘Bud’;
96
Delete all Tuples Make the relation Likes empty: DELETE FROM Likes;
Note no WHERE clause needed
97
Delete Many Tuples Delete from Beers(name, manf) all beers for which there is another beer by the same manufacturer. DELETE FROM Beers b WHERE EXISTS ( SELECT name FROM Beers a WHERE a.manf = b.manf AND a.name <> b.name ); Beers with the same manufacturer and a different name from the name of the beer represented by tuple b
98
Semantics of Deletion
Suppose Busch makes only Bud and Bud Lite, and suppose we come to the tuple b for Bud first
The subquery is nonempty, because of the Bud Lite tuple, so we delete Bud
Now, when b is the tuple for Bud Lite, do we delete that tuple too?
The answer is that we do delete Bud Lite as well. The reason is that deletion proceeds in two stages:
  Mark all tuples for which the WHERE condition is satisfied in the original relation
  Delete the marked tuples
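The two-stage rule can be contrasted with a naive tuple-at-a-time deletion in a short simulation. All function names here are hypothetical; the relation is the Bud / Bud Lite instance described above, held as a Python list.

```python
def has_sibling(beer, relation):
    """True if the relation holds another beer by the same manufacturer."""
    name, manf = beer
    return any(m == manf and n != name for n, m in relation)

def delete_two_stage(relation):
    """Mark against the original relation, then delete all marked tuples."""
    marked = [b for b in relation if has_sibling(b, relation)]   # stage 1
    return [b for b in relation if b not in marked]              # stage 2

def delete_one_at_a_time(relation):
    """Naive alternative: test each tuple against what is left so far."""
    remaining = list(relation)
    for b in list(relation):
        if has_sibling(b, remaining):
            remaining.remove(b)
    return remaining

beers = [("Bud", "Busch"), ("Bud Lite", "Busch")]

print(delete_two_stage(beers))
print(delete_one_at_a_time(beers))
```

The naive version leaves Bud Lite behind because Bud is already gone when Bud Lite is tested; marking first makes the outcome independent of the order in which tuples are visited.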
99
Updates To change certain attributes in certain tuples of a relation:
UPDATE <relation> SET <list of attribute assignments> WHERE <condition on tuples>; Example: Change drinker Fred’s phone number to : UPDATE Drinkers SET phone = ‘ ’ WHERE name = ‘Fred’;
100
Update Several Tuples Increase price that is cheap: UPDATE Sells
SET price = price * 1.07 WHERE price < 3.0;
101
Views A view is a “virtual table”, a relation that is defined in terms of the contents of other tables and views Declare by: CREATE VIEW <name> AS <query>; In contrast, a relation whose value is really stored in the database is called a base table
102
Example: View Definition
CanDrink (drinker, beer) is a view “containing” the drinker-beer pairs such that the drinker frequents at least one bar that serves the beer: CREATE VIEW CanDrink AS SELECT drinker, beer FROM Frequents, Sells WHERE Frequents.bar = Sells.bar;
103
Example: Accessing a View
You may query a view as if it were a base table There is a limited ability to modify views if the modification makes sense as a modification of the underlying base table Example: SELECT beer FROM CanDrink WHERE drinker = ‘Sally’;
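The CanDrink view can be created and queried with sqlite3. The sample rows and the `sally_beers` helper are assumptions; the sketch also shows that a view is virtual, since a later change to a base table shows up in the view without redefining it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Frequents (drinker TEXT, bar TEXT);
CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
INSERT INTO Frequents VALUES ('Sally', 'Joe''s Bar'), ('Fred', 'Sue''s Bar');
INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 2.5), ('Sue''s Bar', 'Miller', 3.0);

CREATE VIEW CanDrink AS
    SELECT drinker, beer
    FROM Frequents, Sells
    WHERE Frequents.bar = Sells.bar;
""")

def sally_beers():
    """Query the view exactly as if it were a base table."""
    rows = conn.execute("SELECT beer FROM CanDrink WHERE drinker = 'Sally'").fetchall()
    return sorted(beer for (beer,) in rows)

before = sally_beers()
# Change a base table; the view is recomputed from it on the next query.
conn.execute("INSERT INTO Sells VALUES ('Joe''s Bar', 'Miller', 2.75)")
after = sally_beers()

print(before)
print(after)
```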
104
What Happens When a View Is Used?
The DBMS starts by interpreting the query as if the view were a base table Typical DBMS turns the query into something like relational algebra The queries defining any views used by the query are also replaced by their algebraic equivalents, and “spliced into” the expression tree for the query
105
Example: View Expansion
PROJ_beer
  SELECT_{drinker = 'Sally'}
    CanDrink, expanded to:
      PROJ_{drinker, beer}
        JOIN_{Frequents.bar = Sells.bar}
          Frequents
          Sells
106
Have fun!