Why use a DBMS in your website?

Presentation transcript:

Why use a DBMS in your website? Suppose we are building a web-based music distribution site. Several questions arise: How do we store the data? (file organization, etc.) How do we query the data? (write programs…) How do we make sure that updates don’t mess things up? How do we provide different views on the data? (registrar versus students) How do we deal with crashes? Way too complicated! Buy a database system!

Functionality of a DBMS Storage management Abstract data model High level query and data manipulation language May tell us what we are missing in text-based search Efficient query processing May change in the internet scenario Transaction processing Resiliency: recovery from crashes, Different views of the data, security May be useful to model a collection of databases together Interface with programming languages

Database Outline What we care about Structured data representations Relational databases Deductive databases Structured query languages SQL Views (& materialized views) Query optimization overview

Building an Application with a Database System Requirements modeling (conceptual, pictures) Decide what entities should be part of the application and how they should be linked. Schema design and implementation Decide on a set of tables, attributes. Define the tables in the database system. Populate database (insert tuples). Write application programs using the DBMS Now much easier, with data management API

Conceptual Modeling [E/R diagram: entities Student, Course, Professor; relationships Takes, Advises, Teaches; attribute labels name, category, ssn, quarter, field, address]

Schema Design & Implementation Table Students Separates the logical view from the physical view of the data.

Terminology
Table (relation) name: Product
Attribute names: Name, Price, Category, Manufacturer

Name         Price     Category     Manufacturer
gizmo        $19.99    gadgets      GizmoWorks
Power gizmo  $29.99    gadgets      GizmoWorks
SingleTouch  $149.99   photography  Canon
MultiTouch   $203.99   household    Hitachi

Each row is a tuple (Arity=4).
Schema: Product(Name: string, Price: real, Category: enum, Manufacturer: string)

Querying a Database Find all the students taking CSE490i in Q1, 2000. S(tructured) Q(uery) L(anguage): SELECT E.name FROM Enroll E WHERE E.course='CSE490i' AND E.quarter='Winter, 2000' The query processor figures out how to answer the query efficiently.
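The query on this slide can be tried out with Python's built-in sqlite3 module. This is only a sketch: the Enroll columns (name, course, quarter) are inferred from the query, and the three rows are invented sample data.

```python
import sqlite3

# In-memory database; the Enroll schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enroll (name TEXT, course TEXT, quarter TEXT)")
conn.executemany(
    "INSERT INTO Enroll VALUES (?, ?, ?)",
    [
        ("Alice", "CSE490i", "Winter, 2000"),
        ("Bob", "CSE490i", "Spring, 2000"),
        ("Carol", "CSE444", "Winter, 2000"),
    ],
)

# The declarative query states what we want, not how to find it.
rows = conn.execute(
    "SELECT E.name FROM Enroll E WHERE E.course = ? AND E.quarter = ?",
    ("CSE490i", "Winter, 2000"),
).fetchall()
students = [name for (name,) in rows]
print(students)
```

How the rows are located (scan, index, and so on) is left entirely to the query processor.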

Relational Algebra Operators Operators take sets of tuples as input and produce a new set as output. Basic Binary Set Operators Result is a table (set) with the same attributes. Sets must be compatible: R1(A1,A2,A3) and R2(B1,B2,B3) are compatible when Domain(Ai) = Domain(Bi) for each i. Union: all tuples in either R1 or R2. Intersection: all tuples in both R1 and R2. Difference: all tuples in R1 but not in R2. Complement: what’s the universe? Other operators: Selection, Projection, Cartesian Product, Join

Selection σ Grab a subset of the tuples in a relation that satisfy a given condition. Use and, or, not, >, <… to build the condition. Unary operation… returns a set with the same attributes, but ‘selects’ rows

Projection π Unary operation, selects columns. Returned schema is different, so returned tuples are not a subset of the original set (contrast with selection). Eliminates duplicate tuples

Cartesian Product × Binary operation. Result is the set of tuples combining all elements of R1 with all elements of R2, for R1 × R2. Schema is the union of Schema(R1) & Schema(R2). Notice we could do selection on the result to get meaningful info!

Join Most common (and exciting!) operator… Combines 2 relations Selecting only related tuples Equivalent to Cross product followed by selection followed by Projection Result has all attributes of the two relations Equijoin Join condition is equality between two attributes Natural join Equijoin on attributes of same name result has only one copy of join condition attribute

Example: Natural Join Employee Dependents
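The tables from this slide did not survive in the transcript, so the following is a stand-in sketch in Python. It builds a natural join exactly as the Join slide describes it: cross product, then selection on the shared attributes, then projection to one copy of them. The attribute names (ssn, name, dname) and all rows are assumptions.

```python
from itertools import product

# Hypothetical sample relations, each a list of dicts (attribute -> value).
employee = [
    {"ssn": 1, "name": "Ann"},
    {"ssn": 2, "name": "Raj"},
]
dependents = [
    {"ssn": 1, "dname": "Sam"},
    {"ssn": 1, "dname": "Lee"},
    {"ssn": 3, "dname": "Kim"},
]

def natural_join(r1, r2):
    """Cross product, selection on equal shared attributes, then projection."""
    result = []
    for t1, t2 in product(r1, r2):                 # Cartesian product
        shared = set(t1) & set(t2)
        if all(t1[a] == t2[a] for a in shared):    # selection (equijoin condition)
            result.append({**t1, **t2})            # keeps one copy of shared attributes
    return result

joined = natural_join(employee, dependents)
print(joined)
```

Raj has no dependents and Kim has no matching employee, so neither appears in the result.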

Complex Queries Product ( pname, price, category, maker) Purchase (buyer, seller, store, prodname) Company (cname, stock price, country) Person( per-name, phone number, city) Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought

Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, prodname) Company (cname, stock price, country) Person( per-name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who bought American products and live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

Deductive Databases Relations viewed as predicates. Interrelations between relations expressed as “datalog” rules (Horn clauses, without function symbols):
Enames(Name) :- Employee(Name, SSN) [Projection]
Wealthy-Employee(Name) :- Employee(Name, SSN), Salary(SSN, Money), Money > 100000 [Selection]
Ed(Name, Dname) :- Employee(Name, SSN), Employee_Dependents(SSN, Dname) [Join]
Emprelated(Name, Dname) :- Ed(Name, Dname)
Emprelated(Name, Dname) :- Ed(Name, D1), Emprelated(D1, Dname) [Recursion]
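A recursive rule like Emprelated can be evaluated bottom-up by applying the rules repeatedly until nothing new is derived. The sketch below does this in Python. It assumes the recursive rule is meant as a transitive closure, that is, with body Ed(Name, D1), Emprelated(D1, Dname), and the Ed facts are invented.

```python
# Hypothetical EDB facts for the Ed(Name, Dname) relation.
ed = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}

def emprelated(ed_facts):
    """Naive bottom-up evaluation: apply the rules until no new facts appear."""
    related = set(ed_facts)                       # Emprelated(N, D) :- Ed(N, D)
    while True:
        derived = {
            (name, dname)
            for (name, d1) in ed_facts
            for (d1b, dname) in related
            if d1 == d1b                          # Ed(N, D1), Emprelated(D1, D)
        }
        if derived <= related:                    # fixpoint reached
            return related
        related |= derived

closure = emprelated(ed)
print(sorted(closure))
```

This naive loop re-derives old facts on every pass; it is meant to show the semantics, not an efficient evaluation strategy.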

More datalog terminology A datalog program is a set of datalog rules. A program with a single rule is a conjunctive query. We distinguish EDB predicates and IDB predicates: EDBs are stored in the database and appear only in the bodies; IDBs are intensionally defined and appear in both bodies and heads.

Structured Query Language

SQL Introduction Standard language for querying and manipulating data: Structured Query Language. Many standards out there: SQL-92 (also called SQL2) and SQL:1999 (also called SQL3). Vendors support various subsets of these (but we’ll only discuss a subset of what they support). Basic form = syntax on relational algebra (but many other features too): SELECT attributes FROM relations (possibly multiple, joined) WHERE conditions (selections)


Selections σ SELECT * FROM Company WHERE country='USA' AND stockPrice > 50 You can use: Attribute names of the relation(s) used in the FROM. Comparison operators: =, <>, <, >, <=, >= Arithmetic operations: stockPrice*2 Operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.
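A few of these condition forms can be exercised with sqlite3. The sketch below uses the Company columns named on the slide (name, stockPrice, country); the three rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the slide; the rows are invented.
conn.execute("CREATE TABLE Company (name TEXT, stockPrice REAL, country TEXT)")
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?)",
    [
        ("GizmoWorks", 25.0, "USA"),
        ("Canon", 65.0, "Japan"),
        ("Gadgetron", 80.0, "USA"),
    ],
)

# Comparison combined with arithmetic on an attribute.
big_usa = conn.execute(
    "SELECT name FROM Company WHERE country = 'USA' AND stockPrice * 2 > 100"
).fetchall()

# Pattern matching with LIKE and string concatenation with ||.
labels = conn.execute(
    "SELECT name || ' (' || country || ')' FROM Company "
    "WHERE name LIKE 'G%' ORDER BY name"
).fetchall()
print(big_usa, labels)
```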

Projection π Select only a subset of the attributes: SELECT name, stockPrice FROM Company WHERE country='USA' AND stockPrice > 50 Rename the attributes in the resulting table: SELECT name AS company, stockPrice AS price FROM Company WHERE country='USA' AND stockPrice > 50

Ordering the Results SELECT name, stockPrice FROM Company WHERE country='USA' AND stockPrice > 50 ORDER BY country, name Ordering is ascending, unless you specify the DESC keyword. Ties are broken by the second attribute on the ORDER BY list, etc.
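To make the tie-breaking rule visible, the sketch below drops the country filter from the slide's query so that more than one country appears in the result. It uses sqlite3 and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT, stockPrice REAL, country TEXT)")
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?)",
    [
        ("Zeta", 70.0, "USA"),
        ("Acme", 90.0, "USA"),
        ("Bolt", 60.0, "Canada"),
        ("Tiny", 10.0, "USA"),   # filtered out by stockPrice > 50
    ],
)

# Ascending by country; ties within a country are broken by name.
ascending = conn.execute(
    "SELECT name, country FROM Company WHERE stockPrice > 50 "
    "ORDER BY country, name"
).fetchall()

# DESC applies only to the attribute it follows.
descending = conn.execute(
    "SELECT name, country FROM Company WHERE stockPrice > 50 "
    "ORDER BY country DESC, name"
).fetchall()
print(ascending, descending)
```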

Join SELECT per-name, store FROM Person, Purchase WHERE per-name=buyer AND city='Seattle' AND product='gizmo' Schema: Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone number, city)

Disambiguating Attributes Find names of people buying telephony products: SELECT Person.name FROM Person, Purchase, Product WHERE Person.name=buyer AND product=Product.name AND Product.category='telephony' Schema: Product ( name, price, category, maker) Purchase (buyer, seller, store, product) Person( name, phone number, city)

Tuple Variables Find pairs of companies making products in the same category SELECT product1.maker, product2.maker FROM Product AS product1, Product AS product2 WHERE product1.category = product2.category AND product1.maker <> product2.maker Product ( name, price, category, maker)
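The self-join above can be run as follows with sqlite3. The Product rows are invented, and DISTINCT and ORDER BY are added here only to make the output deterministic; they are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Product (name TEXT, price REAL, category TEXT, maker TEXT)"
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("gizmo", 19.99, "gadgets", "GizmoWorks"),
        ("widget", 9.99, "gadgets", "Acme"),
        ("camera", 149.99, "photography", "Canon"),
    ],
)

# Two tuple variables range over the same relation.
pairs = conn.execute(
    "SELECT DISTINCT product1.maker, product2.maker "
    "FROM Product AS product1, Product AS product2 "
    "WHERE product1.category = product2.category "
    "AND product1.maker <> product2.maker "
    "ORDER BY 1, 2"
).fetchall()
print(pairs)
```

Each pair shows up in both orders, since the condition is symmetric in the two tuple variables.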

Union, Intersection, Difference (SELECT name FROM Person WHERE City='Seattle') UNION (SELECT name FROM Person, Purchase WHERE buyer=name AND store='The Bon') Similarly, you can use INTERSECT and EXCEPT. Inputs must have the same attribute names (otherwise: rename).
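A sketch of the three set operators using sqlite3, with invented Person and Purchase rows. One caveat specific to SQLite: it does not accept parentheses around the operands of UNION, INTERSECT, or EXCEPT, so they are omitted in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (name TEXT, city TEXT)")
conn.execute("CREATE TABLE Purchase (buyer TEXT, store TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [("Ann", "Seattle"), ("Bob", "Portland"), ("Cy", "Seattle")],
)
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?)",
    [("Ann", "The Bon"), ("Bob", "The Bon")],
)

in_seattle = "SELECT name FROM Person WHERE city = 'Seattle'"
bon_buyers = (
    "SELECT name FROM Person, Purchase "
    "WHERE buyer = name AND store = 'The Bon'"
)

def run(op):
    """Combine the two queries with the given set operator."""
    sql = f"{in_seattle} {op} {bon_buyers} ORDER BY name"
    return [name for (name,) in conn.execute(sql)]

union = run("UNION")
intersection = run("INTERSECT")
difference = run("EXCEPT")
print(union, intersection, difference)
```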

Subqueries SELECT Purchase.product FROM Purchase WHERE buyer = (SELECT name FROM Person WHERE social-security-number = '123-45-6789'); In this case, the subquery returns one value. If it returns more, it’s a run-time error.

Subqueries Returning Relations Find companies who manufacture products bought by Joe Blow. SELECT Company.name FROM Company, Product WHERE Company.name=maker AND Product.name IN (SELECT product FROM Purchase WHERE buyer = 'Joe Blow'); You can also use: s > ALL R, s > ANY R, EXISTS R
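The IN query from this slide, and an equivalent formulation with EXISTS, can be sketched with sqlite3 as below. The rows are invented, and only IN and EXISTS are shown because SQLite does not implement the ALL and ANY comparison forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Company (name TEXT)")
conn.execute("CREATE TABLE Product (name TEXT, maker TEXT)")
conn.execute("CREATE TABLE Purchase (buyer TEXT, product TEXT)")
conn.executemany("INSERT INTO Company VALUES (?)", [("GizmoWorks",), ("Canon",)])
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [("gizmo", "GizmoWorks"), ("camera", "Canon")],
)
conn.executemany(
    "INSERT INTO Purchase VALUES (?, ?)",
    [("Joe Blow", "gizmo"), ("Ann", "camera")],
)

# Subquery returning a relation, tested with IN.
with_in = conn.execute(
    "SELECT DISTINCT Company.name FROM Company, Product "
    "WHERE Company.name = Product.maker "
    "AND Product.name IN "
    "(SELECT product FROM Purchase WHERE buyer = 'Joe Blow')"
).fetchall()

# The same question phrased with a correlated EXISTS subquery.
with_exists = conn.execute(
    "SELECT C.name FROM Company C WHERE EXISTS "
    "(SELECT 1 FROM Product P, Purchase U "
    " WHERE P.maker = C.name AND U.product = P.name "
    " AND U.buyer = 'Joe Blow')"
).fetchall()
print(with_in, with_exists)
```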

Views

Defining Views Views are relations, except that they are not physically stored. They are used mostly in order to simplify complex queries and to define conceptually different views of the database to different classes of users. View: purchases of telephony products: CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.name AND Product.category = 'telephony'

A Different View CREATE VIEW Seattle-view AS SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer We can later use the views: SELECT name, store FROM Seattle-view, Product WHERE Seattle-view.product = Product.name AND Product.category = 'shoes' What’s really happening when we query a view?
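One way to see that a view is not physically stored is to change the base table after the view is defined and query the view again. The sketch below does this with sqlite3. The view is named seattle_view with an underscore because a hyphen is not valid in an unquoted SQL identifier, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, city TEXT);
CREATE TABLE Purchase (buyer TEXT, seller TEXT, store TEXT, product TEXT);
INSERT INTO Person VALUES ('Ann', 'Seattle');
INSERT INTO Person VALUES ('Bob', 'Portland');
INSERT INTO Purchase VALUES ('Ann', 'Joe', 'The Bon', 'gizmo');
CREATE VIEW seattle_view AS
    SELECT buyer, seller, product, store
    FROM Person, Purchase
    WHERE Person.city = 'Seattle' AND Person.name = Purchase.buyer;
""")

query = "SELECT product FROM seattle_view ORDER BY product"
before = conn.execute(query).fetchall()

# Only the base table changes; the view definition is re-evaluated on each query.
conn.execute("INSERT INTO Purchase VALUES ('Ann', 'Sue', 'REI', 'boots')")
after = conn.execute(query).fetchall()
print(before, after)
```

The new purchase appears in the view without any refresh step, which suggests the answer to the slide's question: the view's defining query is expanded into the query that uses it.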

Updating Views How can I insert a tuple into a table that doesn’t exist? CREATE VIEW bon-purchase AS SELECT store, seller, product FROM Purchase WHERE store = 'The Bon Marche' If we make the following insertion: INSERT INTO bon-purchase VALUES ('The Bon Marche', 'Joe', 'Denby Mug') we can simply add a tuple (NULL, 'Joe', 'The Bon Marche', 'Denby Mug') to relation Purchase(buyer, seller, store, product); the buyer is unknown, so it is NULL.

Materialized Views Views whose corresponding queries have been executed and whose data is stored in a separate database. Uses: caching. Issues: Using views in answering queries. Normally, the views are available in addition to the database (so, views are local caches). In information integration, views may be the only things we have access to. An internet source that specializes in Woody Allen movies can be seen as a view on a database of all movies, except that there is no database out there which contains all movies. Maintaining consistency of materialized views.

Query Optimization

Query Optimization Goal: Declarative SQL query → Imperative query execution plan. Query: SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND Q.city='seattle' AND Q.phone > '5430000' Plan (top to bottom): π buyer; σ city='seattle' ∧ phone>'5430000'; join Buyer=name (Simple Nested Loops) over Purchase (Table scan) and Person (Index scan). Inputs: the query; statistics about the data (indexes, cardinalities, selectivity factors); available memory. Ideally: Want to find best plan. Practically: Avoid worst plans!

Query: SELECT S.sname FROM Reserves R, Sailors S WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan 1 (top to bottom): π sname (On-the-fly); σ bid=100 ∧ rating>5 (On-the-fly); join sid=sid (Simple Nested Loops) over Reserves and Sailors.
Plan 2 (top to bottom): π sname (On-the-fly); join sid=sid (Sort-Merge Join) over σ bid=100 on Reserves (Scan; write to temp T1) and σ rating>5 on Sailors (Scan; write to temp T2).
Plan 3 (top to bottom): π sname (On-the-fly); σ rating>5 (On-the-fly); join sid=sid (with pipelining) over σ bid=100 on Reserves (Use hash index; do not write result to temp) and Sailors.
Goal of optimization: To find more efficient plans that compute the same answer.
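The point that different plans compute the same answer at different cost can be illustrated with a toy sketch in Python. It compares joining first and filtering afterwards against filtering each input before the join, and reports the size of the intermediate join result. The Reserves(sid, bid) and Sailors(sid, sname, rating) rows are invented, and the intermediate tuple count is only a crude stand-in for real plan cost.

```python
# Hypothetical sample data for Reserves(sid, bid) and Sailors(sid, sname, rating).
reserves = [(1, 100), (2, 100), (3, 101), (4, 102)]
sailors = [(1, "Dustin", 7), (2, "Lubber", 3), (3, "Rusty", 9), (4, "Horatio", 8)]

def plan_join_then_select():
    """Join on sid first, then apply both selections to the join result."""
    joined = [(r, s) for r in reserves for s in sailors if r[0] == s[0]]
    names = [s[1] for r, s in joined if r[1] == 100 and s[2] > 5]
    return names, len(joined)

def plan_select_then_join():
    """Push both selections below the join, so the join sees fewer tuples."""
    r_sel = [r for r in reserves if r[1] == 100]
    s_sel = [s for s in sailors if s[2] > 5]
    joined = [(r, s) for r in r_sel for s in s_sel if r[0] == s[0]]
    return [s[1] for r, s in joined], len(joined)

names_a, size_a = plan_join_then_select()
names_b, size_b = plan_select_then_join()
print(names_a, size_a, names_b, size_b)
```

Both plans return the same sailor names, but the second one produces a much smaller intermediate result.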

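Relational Algebra Equivalences Allow us to choose different join orders and to ‘push’ selections and projections ahead of joins.
Selections: $\sigma_{c_1 \wedge \dots \wedge c_n}(R) \equiv \sigma_{c_1}(\dots \sigma_{c_n}(R))$ (Cascade); $\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commute)
Projections: $\pi_{a_1}(R) \equiv \pi_{a_1}(\dots(\pi_{a_n}(R)))$, where each $a_i \subseteq a_{i+1}$ (Cascade)
Joins: $R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative); $(R \bowtie S) \equiv (S \bowtie R)$ (Commute)
Show that: $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$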
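The join equivalences can be spot-checked on small data. The sketch below implements a natural join over lists of dicts in Python and compares the reordered plans as sets of tuples. The relations R(u,v), S(v,w), T(w,x) and their rows are invented, and agreement on one data set is a sanity check, not a proof.

```python
def join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t1 in r1:
        for t2 in r2:
            # With no shared attributes this degenerates to a cross product.
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                out.append({**t1, **t2})
    return out

def as_set(rel):
    """Order-insensitive form so that plans can be compared as sets of tuples."""
    return {frozenset(t.items()) for t in rel}

# Hypothetical sample relations R(u, v), S(v, w), T(w, x).
R = [{"u": 1, "v": 10}, {"u": 2, "v": 20}]
S = [{"v": 10, "w": 5}, {"v": 20, "w": 6}, {"v": 30, "w": 7}]
T = [{"w": 5, "x": "a"}, {"w": 7, "x": "b"}]

commutes = as_set(join(R, S)) == as_set(join(S, R))
associates = as_set(join(R, join(S, T))) == as_set(join(join(R, S), T))
reordered = as_set(join(R, join(S, T))) == as_set(join(join(T, R), S))
print(commutes, associates, reordered)
```

Note that (T join R) has no common attribute, so it is a cross product; the plans are equivalent but clearly not equally cheap.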

Optimizing Joins Q(u,x) :- R(u,v), S(v,w), T(w,x), i.e. R ⋈ S ⋈ T. Many ways of doing a single join R ⋈ S: Symmetric vs. asymmetric join operations (nested join, hash join, double pipelined hash join, etc.). Processing costs alone vs. processing + transfer costs: get R and S together vs. get R, then get just the tuples of S that will join with R (“semi-join”). Many orders in which to do the join: (R join S) join T, (S join R) join T, (T join S) join R, etc. All with different costs.

Determining Join Order In principle, we need to consider all possible join orderings. As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space. System-R: consider only left-deep join trees. Left-deep trees allow us to generate all fully pipelined plans: intermediate results are not written to temporary files. Not all left-deep trees are fully pipelined (e.g., SM join). [Figure: join trees over relations A, B, C, D]
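To get a feel for how fast the search space grows, the sketch below compares two standard counts for n relations: n! left-deep orderings versus (2(n-1))!/(n-1)! join trees of any shape, counting A join B and B join A as different. These formulas are textbook combinatorics and are not stated on the slide; the function names are made up for this example.

```python
from math import factorial

def left_deep_orders(n):
    """Number of left-deep join trees: one per permutation of the n relations."""
    return factorial(n)

def all_join_trees(n):
    """Ordered binary trees with n labelled leaves: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in range(2, 9):
    print(n, left_deep_orders(n), all_join_trees(n))
```

Restricting the search to left-deep trees still leaves a factorial number of plans, but it removes a large and rapidly growing share of the alternatives.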

Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: identify distinct blocks (corresponding to nested sub-queries or views). Query rewrite phase: apply algebraic transformations to yield a cheaper plan. Merge blocks and move predicates between blocks. Optimize each block: join ordering. Complete the optimization: select scheduling (pipelining strategy).

Cost Estimation For each plan considered, must estimate cost: Must estimate cost of each operation in plan tree. Depends on input cardinalities. Must estimate size of result for each operation in tree! Use information about the input relations. For selections and joins, assume independence of predicates. System R cost estimation approach. Very inexact, but works ok in practice. More sophisticated techniques known now.
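Under the independence assumption, a common way to estimate result size is to multiply the input cardinalities and then scale by one reduction factor per predicate. The sketch below shows that arithmetic in Python. The function name, the cardinalities, and the reduction factors are all assumptions chosen for illustration, not figures from the slides.

```python
def estimate_result_size(cardinalities, reduction_factors):
    """Product of input sizes times one reduction factor per predicate.

    Multiplying the factors together is exactly the independence assumption.
    """
    size = 1.0
    for n in cardinalities:
        size *= n
    for rf in reduction_factors:
        size *= rf
    return size

# Hypothetical statistics: 100,000 Reserves tuples and 40,000 Sailors tuples.
cards = [100_000, 40_000]
factors = [
    1 / 40_000,  # R.sid = S.sid, assuming 40,000 distinct sid values
    1 / 100,     # R.bid = 100, assuming 100 distinct boats
    0.5,         # S.rating > 5, assuming half the ratings qualify
]

estimate = estimate_result_size(cards, factors)
print(round(estimate))
```

If the predicates are correlated, the true size can differ a lot from this estimate, which is one reason the slide calls the approach very inexact.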

Key Lessons in Optimization There are many approaches and many details to consider in query optimization Classic search/optimization problem! Not completely solved yet! Main points to take away are: Algebraic rules and their use in transformations of queries. Deciding on join ordering: System-R style (Selinger style) optimization. Estimating cost of plans and sizes of intermediate results.