Optimization, Auto-Tuning, and Introduction to Transactions. Zachary G. Ives, University of Pennsylvania, CIS 550 – Database & Information Systems, November 25, 2003.

Optimization, Auto-Tuning, and Introduction to Transactions
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
November 25, 2003
Some slide content courtesy Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke

2 Administrivia  It’s nearly the end!  Homework 7 due next Tuesday 12/2  Projects due next Thurs. 12/4: please sign up to give me a demo that day  Final exam handed out 12/4  Final exam and project report due 12/18  Projects will be graded both on quality of the project and the quality of the report – writing is always important!

Recap of Query Optimization  Plan: Tree of R.A. ops, with choice of alg for each op.  Each operator typically implemented using a `pull’ interface: when an operator is `pulled’ for the next output tuples, it `pulls’ on its inputs and computes them.  Two main issues:  For a given query, what plans are considered?  Algorithm to search plan space for cheapest (estimated) plan.  How is the cost of a plan estimated?  Ideally: Want to find best plan. Practically: Avoid worst plans!  Our focus is on the approach from “System R”.

Highlights of System R Optimizer  Impact:  Most widely used currently; works well for < 10 joins.  Cost estimation: Approximate art at best.  Statistics, maintained in system catalogs, used to estimate cost of operations and result sizes.  Considers combination of CPU and I/O costs.  Plan Space: Too large, must be pruned!  Break the query into blocks  Only the space of left-deep plans will be considered.  Left-deep plans allow output of each operator to be pipelined into the next operator without storing it in a temporary relation.  Cartesian products avoided.

First Need to Divide into Query Blocks  An SQL query is parsed into a collection of query blocks, and these are optimized one block at a time.  Nested blocks are usually treated as calls to a subroutine, made once per outer tuple. (This is an over- simplification, but serves for now.) SELECT S.sname FROM Sailors S WHERE S.age IN ( SELECT MAX (S2.age) FROM Sailors S2 GROUP BY S2.rating ) Nested blockOuter block  For each block, the plans considered are: – All available access methods, for each reln in FROM clause. – Possible join trees

6 Recall: The Core Idea in System R  For computing the most effective way of joining tables, use dynamic programming, which builds subresults and then uses these to construct successively bigger results  Find cheapest ways of accessing tables  Find cheapest ways of joining every pair of tables  Find cheapest way of joining every pair of tables with a 3 rd table (reusing the cheapest way of getting that pair)  Find cheapest way of joining every triple of tables with a 4 th table (reusing the cheapest way of joining that triple)  …

7 Still Not Quite Enough…  Problem 1: Query blocks also have selection, projection, grouping!  Problem 2: We still have to consider too many alternative plans!  Problem 3: Sorting causes funny things to happen in the dynamic programming approach – it doesn’t easily account for amortization of a sort across multiple queries

8 Heuristic 1: Selections, Projections, Groupings  What do we know is generally the case about selection & projection operations?  ORDER BY, GROUP BY, aggregates etc. handled as a final step

Heuristic: Left-Deep Join Trees  Fundamental decision in System R: only left-deep join trees are considered.  As the number of joins increases, the number of alternative plans grows rapidly; we need to restrict the search space.  Left-deep trees allow us to generate all fully pipelined plans.  Intermediate results not written to temporary files.  Not all left-deep trees are fully pipelined (e.g., SM join). B A C D B A C D C D B A

10 “Interesting Orders”  Dynamic programming doesn’t account for amortization of a sort across multiple joins  We need to fix this!  Solution:  Figure out all of the possible orderings that might be useful in the plan (for joining or grouping)  Create a separate “layer” in the DP table for these  At every point in the DP algorithm:  Find the cheapest join that maintains the order  Find the cheapest join that doesn’t maintain the order, using both the ordered and unordered alternatives

Final Details of Query Block Optimization  First, joins and cartesian products are enumerated:  An N-1 way plan is not combined with an additional relation if there is no join condition between them, unless all predicates in WHERE have been used up.  i.e., avoid Cartesian products if possible  Selections and projections are “pushed down”  Final ORDER BY is applied  In spite of pruning the plan space and using heuristics, this approach is still exponential in the # of tables.

Nested Queries  Nested block is optimized independently, with the outer tuple considered as providing a selection condition.  Outer block is optimized with the cost of `calling’ nested block computation taken into account.  Implicit ordering of these blocks means that some good strategies are not considered. The non-nested version of the query is typically optimized better. SELECT S.sname FROM Sailors S WHERE EXISTS ( SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid) Nested block to optimize: SELECT * FROM Reserves R WHERE R.bid=103 AND S.sid= outer value Equivalent non-nested query: SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid=R.sid AND R.bid=103

Query Optimization Recapped  Query optimization is an important task in a relational DBMS  Must understand optimization in order to understand the performance impact of a given database design (relations, indexes) on a workload (set of queries)  Additionally, may need to do “hand optimization”  Two parts to optimizing a query:  Consider a set of alternative plans  Heuristics for simpler operators  Must prune search space; typically, left-deep plans only  Must estimate cost of each plan that is considered  Must estimate size of result and cost for each plan node  Key issues: statistics, indexes, operator implementations  PITFALL: often the estimates of intermediate results AREN’T good!!!

14 The Bigger Picture: Tuning  We saw that indexes and optimization decisions were critical to performance  Homeworks 6 and 7 tried to demonstrate some of that  Also important: buffer pool sizes, layout of data on disk, isolation levels (discussed shortly)  Many DBAs and consultants have made a living off understanding query workloads, data, and estimated intermediate result sizes  They “tune” DBs as a specialty  … Though this career MIGHT be diminishing in significance…

15 Autonomic & Auto-Tuning DBMSs  Hot research topic: self-tuning and adaptive DBMSs  SQL Server and DB2 have “Index Wizards” that take a query workload and try to find an optimal set of indices for it  Basically, they try lots of combinations of indices to find one that works well  “Adaptive query processing” systems also try to figure out where the optimizer’s estimates “went wrong” and compensate for it  Change the query in the middle, or  Make a note so we pick a better plan next time!

16 Switching Gears…  We’ve spent a lot of time talking about querying data  Yet updates are a really major part of many DBMS applications  Particularly important: ensuring ACID properties  Atomicity: each operation looks atomic to the user  Consistency: each operation in isolation keeps the database in a consistent state (this is the responsibility of the user)  Isolation: should be able to understand what’s going on by considering each separate transaction independently  Durability: updates stay in the DBMS!!!

17 What is a transaction?  A transaction is a sequence of read and write operations on data items that logically functions as one unit of work:  should either be done entirely or not at all  if it succeeds, the effects of write operations persist (commit); if it fails, no effects of write operations persist (abort)  these guarantees are made despite concurrent activity in the system, and despite failures that may occur

18 How things can go wrong  Suppose we have a table of bank accounts which contains the balance of the account. A deposit of $50 to a particular account # 1234 would be written as:  Reads and writes the account’s balance  What if two owners of the account make deposits simultaneously? update Accounts set balance = balance + $50 where account#= ‘1234’;

19 Concurrent deposits  This SQL update code is represented as a sequence of read and write operations on “data items” (which for now should be thought of as individual accounts):  Here, X is the data item representing the account with account# Deposit 1 Deposit 2 read(X.bal) X.bal := X.bal + $50 X.bal:= X.bal + $10 write(X.bal)

20 A “bad” concurrent execution  But only one “action” (e.g. a read or a write) can happen at a time, and there are a variety of ways in which the two deposits could be simultaneously executed: Deposit 1 Deposit 2 read(X.bal) X.bal := X.bal + $50 X.bal:= X.bal + $10 write(X.bal) time BAD!

21 A “good” execution  Previous execution would have been fine if the accounts were different (i.e. one were X and one were Y).  The following execution is a serial execution, and executes one transaction after the other: Deposit 1 Deposit 2 read(X.bal) X.bal := X.bal + $50 write(X.bal) read(X.bal) X.bal:= X.bal + $10 write(X.bal) time GOOD!

22 Good executions  An execution is “good” if is it serial (i.e. the transactions are executed one after the other) or serializable (i.e. equivalent to some serial execution)  This execution is equivalent to executing Deposit 1 then Deposit 3, or vice versa. Deposit 1 Deposit 3 read(X.bal) read(Y.bal) X.bal := X.bal + $50 Y.bal:= Y.bal + $10 write(X.bal) write(Y.bal)

23 Atomicity  Problems can also occur if a crash occurs in the middle of executing a transaction:  Need to guarantee that the write to X does not persist (ABORT)  Default assumption if a transaction doesn’t commit Transfer read(X.bal) read(Y.bal) X.bal= X.bal-$100 Y.bal= Y.bal+$100 CRASH

24 Transactions in SQL  A transaction begins when any SQL statement that queries the db begins.  To end a transaction, the user issues a COMMIT or ROLLBACK statement. Transfer UPDATE Accounts SET balance = balance - $100 WHERE account#= ‘1234’; UPDATE Accounts SET balance = balance + $100 WHERE account#= ‘5678’; COMMIT;

25 Read-only transactions  When a transaction only reads information, we have more freedom to let the transaction execute in parallel with other transactions.  We signal this to the system by stating SET TRANSACTION READ ONLY; SELECT * FROM Accounts WHERE account#=‘1234’;...

26 Read-write transactions  If we state “read-only”, then the transaction cannot perform any updates.  Instead, we must specify that the transaction may update (the default): SET TRANSACTION READ ONLY; UPDATE Accounts SET balance = balance - $100 WHERE account#= ‘1234’;... SET TRANSACTION READ WRITE; update Accounts set balance = balance - $100 where account#= ‘1234’;... ILLEGAL!

27 Dirty reads  Dirty data is data written by an uncommitted transaction; a dirty read is a read of dirty data.  Sometimes dirty reads are acceptable, other times they are not: e.g., if we wished to ensure balances never went negative in the transfer example, we should test that there is enough money first!

28 “Bad” dirty read

    EXEC SQL select balance into :bal
             from Accounts
             where account# = '1234';
    if (bal > 100) {
        EXEC SQL update Accounts
                 set balance = balance - 100
                 where account# = '1234';
        EXEC SQL update Accounts
                 set balance = balance + 100
                 where account# = '5678';
    }
    EXEC SQL COMMIT;

• If the initial read (the first select) were dirty, the balance could become negative!
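Here is a toy simulation of the bad case, with hypothetical balances. T1 deposits $100 into account 1234 without committing. T2 reads that dirty balance and passes the `bal > 100` check, and then T1 aborts. No real locking is modeled. The outcome matches a lock-based system in which T2's update waits for T1's write lock and then applies to the restored balance.

```python
# Hypothetical committed balances, plus T1's uncommitted (dirty) writes.
committed = {"1234": 50, "5678": 0}
dirty = {}

def read_balance(acct: str, read_uncommitted: bool) -> int:
    # A read-uncommitted read sees T1's dirty value; otherwise only committed data.
    if read_uncommitted and acct in dirty:
        return dirty[acct]
    return committed[acct]

# T1 deposits $100 into 1234 but has not committed.
dirty["1234"] = committed["1234"] + 100

# T2 runs the check on a dirty read and sees 150.
t2_proceeds = read_balance("1234", read_uncommitted=True) > 100
t2_clean_check = read_balance("1234", read_uncommitted=False) > 100   # sees 50

# T1 aborts: its deposit never happened.
dirty.clear()

# T2's updates then apply to the real balance.
if t2_proceeds:
    committed["1234"] -= 100
    committed["5678"] += 100

print(committed)  # {'1234': -50, '5678': 100}
```

Had T2 read only committed data, the check would have seen $50 and refused the transfer.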

29 Acceptable dirty read  However, if we are just checking availability of an airline seat, a dirty read might be fine! Reservation transaction: EXEC SQL select occupied into :occ from Flights where Num= ‘123’ and date= and seat=‘23f’; if (!occ) {EXEC SQL update Flights set occupied=true where Num= ‘123’ and date= and seat=‘23f’;} else {notify user that seat is unavailable}

30 Other phenomena  Unrepeatable read: a transaction reads the same data item twice and gets different values.  Phantom problem: a transaction retrieves a collection of tuples twice and sees different results

31 Phantom Problem Example  T1: “find the oldest climber who is either MED or EXP”  T2: “insert a new EXP climber aged 96, then insert a new MED climber aged 60” Suppose that T1 locks all data pages with some EXP climber and finds that the oldest is 85. Then T2 executes, inserting the new EXP climber on a page not locked by T1. T1 then completes, locking all pages with some MED climber and finding the oldest MED climber is 60 (whereas the previous oldest MED climber had been 40).