Buffer-pool aware Query Optimization Ravishankar Ramamurthy David DeWitt University of Wisconsin, Madison.

Presentation transcript:

Buffer-pool aware Query Optimization Ravishankar Ramamurthy David DeWitt University of Wisconsin, Madison

2 managing main memory: Main memories are increasing. Prices are declining at about 100x per decade. With the advent of 64-bit machines, 1 TB of main memory is feasible. Use a BIGGER buffer pool. Caching does not automatically guarantee improved performance.

3 problem: Optimizer uses “worst-case” estimates. Selection query on a single table: optimizers would choose an unclustered index only for highly selective queries (~0.1%). Even if all required pages are cached, the optimizer would still pick a table scan.

4 goal: Buffer-pool aware query optimizer. Benefits? Architecture? Focus: single table queries and foreign-key joins.

5 single table queries: Prototype query engine. SHORE (320 MB buffer pool, 16 KB pages). TPC-H 1 GB database. Selection predicate on Lineitem table (0.5%). Unclustered index available for evaluating the predicate.

6

7 join queries: Join query between Lineitem and Orders, with a range predicate on l_receiptdate and an unclustered index on l_receiptdate. Index alternatives: covering indexes (Cov1, Cov2); join index on (l_orderkey, o_orderkey), which stores the (RID1, RID2) pair of joining tuples.

8 JINDEX plan FETCH (Lineitem) B-Tree Range Scan (l_receiptdate) PROBE Join Index FILTER FETCH (Orders)

9 effect of buffer pool

10 benefits: Similar tradeoff for other combinations, e.g. index nested loops vs. sort-merge join. Relative costs of plans: caching can cause a big difference. Optimizer could miss plans that have much better performance.

11 what is needed? Optimizer needs improved cost functions. Given a selection (join) predicate, what fraction (f) of the pages containing tuples that satisfy the predicate is in memory? Cost of index plan = N * (1 - f) * io_cost. The search space is not altered.
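The cost formula on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the sequential table-scan model, and the default I/O costs are assumptions introduced here, and N is taken to be the number of pages the index plan has to fetch.

```python
def index_plan_cost(n_pages: int, f: float, io_cost: float) -> float:
    """Cost of index plan = N * (1 - f) * io_cost, as stated on the slide."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return n_pages * (1.0 - f) * io_cost


def table_scan_cost(table_pages: int, seq_io_cost: float) -> float:
    """Assumed scan model: every table page is read once, sequentially."""
    return table_pages * seq_io_cost


def pick_plan(n_pages: int, f: float, table_pages: int,
              io_cost: float = 10.0, seq_io_cost: float = 1.0) -> str:
    """Return "index" or "scan", whichever is cheaper under the two models."""
    index_cost = index_plan_cost(n_pages, f, io_cost)
    return "index" if index_cost < table_scan_cost(table_pages, seq_io_cost) else "scan"


# Same predicate, two buffer pool states (hypothetical numbers).
print(pick_plan(n_pages=1000, f=0.0, table_pages=5000))  # cold buffer pool
print(pick_plan(n_pages=1000, f=0.9, table_pages=5000))  # mostly cached
```

Only the cost function changes with f; the set of candidate plans stays the same, which matches the slide's point that the search space is not altered.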

12 challenges: Parameter f is a function of the query and the buffer pool state. A simple page count per relation will not suffice. Different queries require different subsets of pages.

13 solution? Assume an interface bool isCached(RID). Given a selection (join) predicate, the optimizer computes the RIDs of tuples that satisfy the predicate and uses isCached() to calculate f.
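A minimal sketch of how f could be computed from such an interface. The RID layout (page id plus slot) and the helper name cached_fraction are assumptions for illustration; the slides only specify the signature bool isCached(RID).

```python
from typing import Callable, Dict, Iterable, NamedTuple


class RID(NamedTuple):
    page_id: int  # assumed layout: a RID identifies a page and a slot within it
    slot: int


def cached_fraction(rids: Iterable[RID], is_cached: Callable[[RID], bool]) -> float:
    """Fraction of distinct pages referenced by `rids` that are cached."""
    # f is a fraction of pages, so keep one representative RID per page.
    one_rid_per_page: Dict[int, RID] = {}
    for rid in rids:
        one_rid_per_page.setdefault(rid.page_id, rid)
    if not one_rid_per_page:
        return 0.0
    hits = sum(1 for rid in one_rid_per_page.values() if is_cached(rid))
    return hits / len(one_rid_per_page)


# Hypothetical buffer pool holding pages 1 and 2.
buffer_pool_pages = {1, 2}
qualifying = [RID(1, 0), RID(1, 3), RID(2, 1), RID(3, 0), RID(4, 2)]
f = cached_fraction(qualifying, lambda rid: rid.page_id in buffer_pool_pages)
print(f)
```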

14 candidates: Index pre-execution: accurate technique, high overheads. Sampling techniques: “close-enough” accuracy, low overheads.

15 index pre-execution: Compute RID lists during query optimization by “pre-executing” predicates on indexes. Selection predicates: given an unclustered index on the required attribute, evaluate the predicate only on index pages, then use the list of RIDs and isCached() to calculate f.

16 selection predicates: Lineitem table (1 GB TPC-H). Range predicate on l_shipdate column. Shore B-Tree on l_shipdate column. Overhead can be 15-20% of scan time.

17 observations: Index pre-execution is accurate but not practical. Optimizer should not miss important cases: a large fraction of required pages are in memory. How important is accuracy?

18 relaxing accuracy: Close-enough (~5%) estimates can suffice. Can sampling help?

19 sampling: Select * from R where R.value = ... f_actual = 30/40 = 0.75; f_estimated = 3/4 = 0.75

20 sampling: Index pre-execution is used to gather RID lists that satisfy predicates. Alternative: use random samples of RIDs instead. Pre-compute samples and cache them in main memory; this avoids I/Os during query optimization.

21 selection predicates: Pre-computation: samples on base table (table A), drawn by reservoir sampling using a table scan; S_a stores (Atuple, RID-A) pairs. Using the samples: evaluate the predicate on S_a, then use the RID-A samples and the isCached() interface to calculate f_estimated.
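The two phases on this slide can be sketched as below. This is an illustrative sketch under stated assumptions: the row layout (value, (page_id, slot)), the function names, and the choice of classic single-pass reservoir sampling (Algorithm R) are introduced here, since the slides do not say which reservoir variant was used.

```python
import random
from typing import Any, Callable, Iterable, List, Optional, Tuple

RID = Tuple[int, int]   # assumed layout: (page_id, slot)
Row = Tuple[Any, RID]   # one S_a entry: (Atuple, RID-A)


def reservoir_sample(scan: Iterable[Row], k: int, rng: random.Random) -> List[Row]:
    """Uniform sample of up to k rows from a single pass over the table."""
    sample: List[Row] = []
    for i, row in enumerate(scan):
        if i < k:
            sample.append(row)          # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                sample[j] = row         # replace with probability k / (i + 1)
    return sample


def estimate_f(sample: Iterable[Row],
               predicate: Callable[[Any], bool],
               is_cached: Callable[[RID], bool]) -> Optional[float]:
    """Estimate f from the sampled rows that satisfy the predicate."""
    pages = {rid[0]: rid for value, rid in sample if predicate(value)}
    if not pages:
        return None                     # no qualifying samples: no estimate
    return sum(1 for rid in pages.values() if is_cached(rid)) / len(pages)


# Hypothetical table: 1000 rows, 10 rows per page.
table = [(v, (v // 10, v % 10)) for v in range(1000)]
s_a = reservoir_sample(table, k=50, rng=random.Random(0))
f_estimated = estimate_f(s_a, lambda v: v < 500, lambda rid: rid[0] < 25)
```

Returning None when no sampled row qualifies leaves the low-confidence case to the caller rather than reporting a misleading fraction.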

22 experiments: Simulate buffer pool configurations: pre-fetch appropriate ranges, calculate f_actual, calculate f_estimated using sampling. Evaluation metric: mean of ABS(f_actual - f_estimated), reported as ERR1 (all configurations) and ERR2 (configurations having f_actual > 0.75).
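The evaluation metric can be written down directly. The sketch below assumes each buffer pool configuration is represented as an (f_actual, f_estimated) pair; the function names and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[float, float]  # (f_actual, f_estimated) for one configuration


def mean_abs_error(pairs: Sequence[Pair]) -> Optional[float]:
    """Mean of ABS(f_actual - f_estimated); None if there are no pairs."""
    if not pairs:
        return None
    return sum(abs(actual - est) for actual, est in pairs) / len(pairs)


def err1(pairs: Sequence[Pair]) -> Optional[float]:
    """ERR1: error over all configurations."""
    return mean_abs_error(pairs)


def err2(pairs: Sequence[Pair], threshold: float = 0.75) -> Optional[float]:
    """ERR2: error over configurations having f_actual > threshold."""
    selected: List[Pair] = [p for p in pairs if p[0] > threshold]
    return mean_abs_error(selected)


# Hypothetical measurements, not taken from the slides.
configs = [(0.20, 0.25), (0.80, 0.70), (1.00, 0.90)]
print(err1(configs), err2(configs))
```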

23 selection predicate Selection predicate on Lineitem table l_shipdate between ( , ) Sample Size ERR1 ERR % 4.60% % 3.50% % 2.57% % 2.39%

24 join predicates: Foreign key join between A and B; A.a is a foreign key pointing to B.b. Index pre-execution not feasible. Sampling techniques: pre-computation for joins. Assume S_ab is pre-computed; S_ab = S_a Join B stores (RID-A, Atuple, Btuple, RID-B).

25 using the samples: Join query between A.a and B.b, with a range predicate on table A. Required: what fraction (f) of the pages of B that satisfy the join predicate is cached; cost of index nested loops join with B as “inner”. Approach: evaluate the predicate on S_ab, project the RID-B samples that satisfy the predicate, and use the RID-B samples and isCached() to calculate f_estimated.
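The approach on this slide can be sketched as follows, assuming S_ab is held in memory as rows of (RID-A, Atuple, Btuple, RID-B). The type names, the (page_id, slot) RID layout, and the function name are assumptions for illustration.

```python
from typing import Any, Callable, Dict, Iterable, NamedTuple, Optional, Tuple

RID = Tuple[int, int]  # assumed layout: (page_id, slot)


class JoinSampleRow(NamedTuple):
    rid_a: RID
    a_tuple: Any
    b_tuple: Any
    rid_b: RID


def estimate_inner_cached_fraction(
    s_ab: Iterable[JoinSampleRow],
    predicate_on_a: Callable[[Any], bool],
    is_cached: Callable[[RID], bool],
) -> Optional[float]:
    """Estimate f for the inner table B of an index nested loops join."""
    # Evaluate the predicate on the A side, then project RID-B.
    pages_b: Dict[int, RID] = {}
    for row in s_ab:
        if predicate_on_a(row.a_tuple):
            pages_b.setdefault(row.rid_b[0], row.rid_b)
    if not pages_b:
        return None  # no qualifying samples: no estimate
    hits = sum(1 for rid in pages_b.values() if is_cached(rid))
    return hits / len(pages_b)


# Hypothetical sample: the A tuple is a date-like integer, B pages 0..3.
s_ab = [
    JoinSampleRow((0, 0), 10, "b0", (0, 1)),
    JoinSampleRow((0, 1), 20, "b1", (1, 4)),
    JoinSampleRow((1, 0), 30, "b2", (2, 2)),
    JoinSampleRow((1, 1), 99, "b3", (3, 0)),
]
f_estimated = estimate_inner_cached_fraction(
    s_ab, lambda a: a <= 30, lambda rid: rid[0] in {0, 2}
)
```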

26 join predicate Join between Lineitem and Orders Predicates on l_receiptdate and l_shipmode Sample Size ERR1 ERR % 7.98% % 5.99% % 3.43% % 3.09%

27 overheads: Sampling overheads: no I/Os (compared to index pre-execution); CPU overheads ~20 ms (2 GHz machine). Space overheads: 1% sample (base table + foreign key relationships); 25 MB for entire TPC-H database (1 GB).

28 not enough samples: Unclustered index vs. table scan. Evaluate the selection predicate on the sample. RID sample not sufficient: avoid changing plans if “confidence” is low. Infer “highly-selective” predicates: choose index plan.

29 highly selective predicates: Thresholds in predicate selectivity (s): s < T1 (use index plan); s > T2 (use table scan). Probability of “error” is low: with T1 = 0.1% and T2 = 1%, correct with 99% probability if sample size is 1800.
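A hedged sketch of the threshold rule. T1 and T2 come from the slide; everything else is an assumption: the slides do not state what happens between the thresholds (labelled "undecided" here), nor the exact test behind the 99% figure. The binomial helper only shows how one could compute the chance that a predicate of a given true selectivity produces few matches in a sample.

```python
from math import exp, lgamma, log, log1p

T1 = 0.001  # 0.1%, from the slide
T2 = 0.01   # 1%, from the slide


def classify(matches: int, sample_size: int, t1: float = T1, t2: float = T2) -> str:
    """Apply the thresholds to the selectivity observed on the sample."""
    s_hat = matches / sample_size
    if s_hat < t1:
        return "index"
    if s_hat > t2:
        return "scan"
    return "undecided"  # assumption: the slides do not cover this range


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(min(c, n) + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return total


# Chance that a predicate with true selectivity 1% looks "highly selective"
# (at most one match) in a sample of 1800 rows.
p_misclassify = binom_cdf(1, 1800, 0.01)
```

Summing in log space avoids overflow in the binomial coefficient for large sample sizes.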

30 extensions: Multi-way foreign key joins: join synopses + RIDs. Nested queries: de-correlation vs. nested iteration. Compiled queries: use “choose” operator.

31 summary: Large buffer pools (~1 TB). Significant fraction of “required” pages can be cached. Optimizer needs to be aware of buffer pool contents. Can result in significant improvements.

32 Misc slides: Transient buffer pool. Pre-execution for joins.

33 transient buffer pool: Buffer pool contents could change before query execution. Use Choose operator: P1 is the plan picked by the traditional optimizer, P2 is the plan picked by the buffer pool aware optimizer, and the execution plan is Choose (P1, P2).
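One way to picture Choose (P1, P2) is as a wrapper that defers the decision to execution time. The slides do not say how the operator decides, so the run-time check below (a caller-supplied callback reporting whether P2's caching assumption still holds) is an assumption, as are all names.

```python
from typing import Any, Callable

Plan = Callable[[], Any]  # a plan is modelled as a zero-argument callable


def choose(p1: Plan, p2: Plan, p2_still_valid: Callable[[], bool]) -> Plan:
    """Build a plan that picks between P1 and P2 when it is executed."""
    def run() -> Any:
        # Re-check the buffer pool at execution time, not at optimization time.
        plan = p2 if p2_still_valid() else p1
        return plan()
    return run


# Hypothetical plans and buffer pool state.
state = {"f": 0.9}
plan = choose(
    p1=lambda: "table scan",
    p2=lambda: "unclustered index",
    p2_still_valid=lambda: state["f"] > 0.75,
)
first = plan()    # pages still cached when the query runs
state["f"] = 0.1  # buffer pool contents changed before the next execution
second = plan()
```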

34 pre-execution for joins: Foreign-key join between A.a and B.b. Use index on key value (B.b): RID-A, FETCH A.a, PROBE INDEX (B.b), RID-B. Use join index: PROBE JOIN INDEX, RID-B.

35 join predicates