Gergely Lukács Pázmány Péter Catholic University


Database Systems Data Access Structures (Index) Query planning and execution Gergely Lukács Pázmány Péter Catholic University Faculty of Information Technology Budapest, Hungary lukacs@itk.ppke.hu

External/Conceptual/Internal Views. External view: a user is anyone who needs to access some portion of the data. Conceptual/logical view: an abstract representation of the entire information content of the database. Internal/physical view: describes how the data are stored and organised physically (indexes, access paths, block size, …).

Data access structures: binary search tree, B+ tree ("index")

Binary tree: each node can have up to two successor nodes. The successor nodes of a node are called its children. The predecessor node of a node is called its parent. The "beginning" node is called the root (it has no parent). A node without children is called a leaf.

Binary tree - 2: Nodes are organized in levels (indexed from 0). Level (or depth) of a node: the number of edges on the path from the root to that node. Height h of a tree: the number of levels (some books define h as #levels-1). Full tree: every node that is not a leaf has exactly two children, and all the leaves are on the same level.

Binary tree - 3: Max. #nodes at level l: 2^l. Height h of a full tree with N nodes: h = log2(N+1), i.e. O(log2 N). Tree operations (e.g., insert, delete, retrieve) depend on h: h determines the running time!

Binary Search Tree (BST) - 4: The value stored at a node is greater than every value stored in its left subtree and less than every value stored in its right subtree. How to search a BST? We begin by examining the root node. If the tree is null, the key we are searching for does not exist in the tree. Otherwise, if the key equals that of the root, the search is successful and we return the node. If the key is less than that of the root, we search the left subtree. Similarly, if the key is greater than that of the root, we search the right subtree. This process is repeated until the key is found or the remaining subtree is null. If the searched key is not found before a null subtree is reached, the item is not present in the tree. Tree operations (e.g., insert, delete, retrieve) depend on the height of the tree: h determines the running time! (Figure: example BST with the keys 13, 9, 44, 4, 11, 30, 2, 15, 34, 6.)
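The search procedure described above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the names `Node`, `bst_insert` and `bst_search` are hypothetical, and the keys of the slide's figure are inserted in the order they are listed, which is an assumption, so the resulting tree shape may differ from the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def bst_search(root, key):
    """Return the node holding key, or None if the key is absent."""
    node = root
    while node is not None:
        if key == node.key:
            return node  # search successful
        # smaller keys live in the left subtree, larger ones in the right
        node = node.left if key < node.key else node.right
    return None  # reached a null subtree: key is not in the tree


# Keys taken from the slide; the insertion order is an assumption.
root = None
for k in [13, 9, 44, 4, 11, 30, 2, 15, 34, 6]:
    root = bst_insert(root, k)
```

Each loop iteration of `bst_search` descends one level, so the number of comparisons is bounded by the height of the tree.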

Disk storage: data is organised in blocks! Data is transferred between disk and main memory in blocks (e.g., 4/8/16/32 kB), no matter whether 1 byte or the whole block is needed. Significant cost factor in a DBMS: the number of blocks accessed from disk. Sequential access is much faster than random access. Example: at 3600 RPM one rotation takes 16.7 ms, the time of X00 000 instructions!

Table storage, blocks

Binary search tree vs. disk storage. Assume X0(0) million values: the tree is deep. Idea: more branches per node, which reduces the height of the tree! Values are stored at different levels. Idea: store values only in the leaf nodes.

B+ tree as database index. B+ tree: nodes ~ blocks, high fanout (ca. 100-?00). E.g., 4 KB block, one value (separator) of 40 bytes: ~100 values in a block! Flat tree (log_100 N instead of log_2 N: ca. 3-5 levels).

B+ tree: data (pointers) only in the leaf nodes. Leaf nodes are chained. Pointers refer to the data (table).
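The arithmetic behind the fanout and height figures can be checked with a short Python sketch. The function names are hypothetical, and the value count of 100 million is an assumed example (the slide only says "X0(0) million").

```python
def fanout(block_bytes, entry_bytes):
    """How many separator entries fit into one block."""
    return block_bytes // entry_bytes


def levels_needed(n_values, branching):
    """Smallest number of levels whose leaves can address n_values entries."""
    levels, capacity = 0, 1
    while capacity < n_values:
        capacity *= branching
        levels += 1
    return levels


n = 10**8  # assumption: 100 million indexed values

print(fanout(4096, 40))       # entries per 4 KB block with 40-byte entries
print(levels_needed(n, 100))  # levels with fanout 100
print(levels_needed(n, 2))    # levels of a binary tree
```

With these assumed numbers a fanout of 100 needs 4 levels, where a binary tree needs 27, and each level typically costs one block access.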

B+ tree – select (search) Searching for Key Value 6
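A lookup such as the one for key value 6 can be sketched as below. The classes `Leaf` and `Inner`, the tiny example tree and the string "row pointers" are all hypothetical. The sketch assumes the convention that keys greater than or equal to a separator are found in the child to its right.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, rows):
        self.keys, self.rows = keys, rows  # rows[i] is the pointer for keys[i]
        self.next = None                   # chain to the next leaf


class Inner:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children  # len(children) == len(keys) + 1


def bplus_search(node, key):
    """Descend from the root to a leaf and return the row pointer, or None."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    for k, row in zip(node.keys, node.rows):
        if k == key:
            return row
    return None


def range_scan(node, low, high):
    """Find the leaf for low, then follow the leaf chain up to high."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, low)]
    result = []
    while node is not None:
        for k, row in zip(node.keys, node.rows):
            if k > high:
                return result
            if k >= low:
                result.append(row)
        node = node.next
    return result


# Hypothetical two-level tree with three chained leaves.
l1 = Leaf([2, 4], ["row2", "row4"])
l2 = Leaf([6, 9], ["row6", "row9"])
l3 = Leaf([11, 13], ["row11", "row13"])
l1.next, l2.next = l2, l3
root = Inner([6, 11], [l1, l2, l3])
```

The range scan shows why the leaves are chained: after one descent, neighbouring keys are read sequentially without going back to the inner nodes.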

B+ tree: inserting a new element. Search for the bucket (leaf) the record belongs to. If the bucket is not full, add the record. Otherwise, split the bucket: allocate a new leaf and move half of the bucket's elements to it, then insert the new leaf's smallest key and address into the parent. If the parent is full, split it too and add the middle key to its parent node. Repeat until a parent is found that need not split. If the root splits, create a new root which has one key and two pointers. Splits thus propagate upwards as far as necessary.
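The insertion steps above can be sketched as a short recursive routine. This is a simplified illustration under stated assumptions, not the lecture's implementation: `ORDER` (maximum keys per node) is an assumed parameter, the class and function names are hypothetical, only keys are stored (no row pointers), and leaf chaining is omitted.

```python
from bisect import bisect_right, insort

ORDER = 4  # assumption: a node holds at most 4 keys


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def insert(root, key):
    """Insert key and return the root (a new one if the old root split)."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    return Node([separator], [root, right])  # new root: one key, two pointers


def _insert(node, key):
    """Insert below node; return (separator, new_right_node) if node split."""
    if node.children is None:                 # leaf: add the record
        insort(node.keys, key)
        if len(node.keys) <= ORDER:
            return None
        mid = len(node.keys) // 2             # overflow: split the leaf
        right = Node(node.keys[mid:])
        node.keys = node.keys[:mid]
        return right.keys[0], right           # smallest key of the new leaf goes up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    separator, right = split
    node.keys.insert(i, separator)
    node.children.insert(i + 1, right)
    if len(node.keys) <= ORDER:
        return None
    mid = len(node.keys) // 2                 # inner overflow: middle key moves up
    up = node.keys[mid]
    new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return up, new_right


def leaf_keys(node):
    """All keys stored in the leaves, from left to right."""
    if node.children is None:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


root = Node([])
for k in [10, 20, 30, 40, 50, 60, 70, 80, 90, 74, 51]:
    root = insert(root, k)
```

In a leaf split the separator is copied up (it stays in the new leaf), whereas in an inner split the middle key is moved up, which matches the distinction between "smallest key of the new leaf" and "middle key" in the steps above.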

Example: inserting new data with key 74. (Figure: B+ tree with separators 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 60, 70, 74, 80, 90.)

Example: inserting new data with key 51 causes an overflow. (Figure: B+ tree with separators 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 60, 70, 74, 80, 90.)

Example: inserting new data with key 51. Overflow; new separator: 55. (Figure: B+ tree with separators 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 60, 70, 74, 80, 90.)

Example: inserting new data with key 51, result. (Figure: B+ tree with separators 55, 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 60, 70, 74, 80, 90.)

Example: deleting data with key 60 causes an underflow. (Figure: B+ tree with separators 55, 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 60 (deleted), 70, 74, 80, 90.)

Example: deleting data with key 60, handling the underflow. (Figure: B+ tree with separators 55, 25, 45, 65, 75; leaf entries 10, 20, 30, 40, 50, 51, 70, 74, 80, 90.)

Example: deleting data with key 60, result. (Figure: B+ tree with separators 25, 45, 55, 75; leaf entries 10, 20, 30, 40, 50, 51, 70, 74, 80, 90.)

B+ tree as database index -- pointers in index to records in table

Creating and using indexes
CREATE INDEX Idx_Emp_LName ON Employees ("Last Name")
Exact match query: ... WHERE "Last Name" = 'Doe'
Range query: ... WHERE "Last Name" > 'DA' AND "Last Name" < 'DC'
Selectivity of query (condition): index only for highly selective queries
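Whether a query uses the index can be inspected from a program. The sketch below uses SQLite through Python's `sqlite3` module, which is an assumption made only for the sake of a self-contained example; the `Salary` column and the sample rows are hypothetical, and other DBMSs report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Employees ("Last Name" TEXT, "First Name" TEXT, Salary INTEGER)'
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Doe", "John", 100), ("Daniels", "Ann", 120), ("Smith", "Eve", 90)],
)
conn.execute('CREATE INDEX Idx_Emp_LName ON Employees ("Last Name")')


def plan(sql):
    """Return SQLite's textual plan for a query (the detail column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


exact_plan = plan("""SELECT * FROM Employees WHERE "Last Name" = 'Doe'""")
range_plan = plan(
    """SELECT * FROM Employees WHERE "Last Name" > 'DA' AND "Last Name" < 'DC'"""
)
print(exact_plan)
print(range_plan)
```

Both plan strings name `Idx_Emp_LName`; the exact wording of the plan text depends on the SQLite version.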

Selectivity in a query. Selectivity of the query (condition): an index is useful only for highly selective queries. Low selectivity: nearly all data blocks have to be read, some even several times!
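A rough calculation shows why nearly all blocks are read when a condition is not selective. The sketch assumes that matching rows are spread uniformly and independently over the blocks; the function name and the numbers (100 rows per block) are hypothetical.

```python
def fraction_of_blocks_read(matching_fraction, rows_per_block):
    """Expected share of blocks that contain at least one matching row.

    Assumes every row matches independently with probability matching_fraction.
    """
    return 1.0 - (1.0 - matching_fraction) ** rows_per_block


# 5 % of the rows match: almost every block holds at least one of them.
print(fraction_of_blocks_read(0.05, 100))

# 0.01 % of the rows match: only about 1 % of the blocks are touched.
print(fraction_of_blocks_read(0.0001, 100))
```

Under these assumptions a condition matching only 5 % of the rows already touches more than 99 % of the blocks, and through an index these blocks are visited in random order rather than sequentially.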

Multiple-Column Indexes
CREATE INDEX Idx_Emp_Name ON Employees ("Last Name", "First Name")
Useful for:
... WHERE "Last Name" = 'Doe'
... WHERE "Last Name" = 'Doe' AND "First Name" = 'John'
... WHERE "First Name" = 'John' AND "Last Name" = 'Doe'
Cannot be used for:
... WHERE "First Name" = 'John'
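The column-order rule can be tried out with the following sketch. It again assumes SQLite via Python's `sqlite3` module; the `Salary` column is a hypothetical extra column added so that the index does not cover the whole row, and `uses_index` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Employees ("Last Name" TEXT, "First Name" TEXT, Salary INTEGER)'
)
conn.execute(
    'CREATE INDEX Idx_Emp_Name ON Employees ("Last Name", "First Name")'
)


def uses_index(where_clause):
    """True if SQLite's plan for the query mentions the two-column index."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM Employees WHERE " + where_clause
    return any("Idx_Emp_Name" in row[3] for row in conn.execute(sql))


for clause in [
    """"Last Name" = 'Doe'""",
    """"Last Name" = 'Doe' AND "First Name" = 'John'""",
    """"First Name" = 'John' AND "Last Name" = 'Doe'""",
    """"First Name" = 'John'""",
]:
    print(clause, "->", uses_index(clause))
```

The order of the conditions in the WHERE clause does not matter; what matters is that the leading index column ("Last Name") is constrained.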

Index. Useful for: query conditions, join conditions, sorted results. Space requirement: similar to that of the table. Maintenance (updates, inserts): costs about as much as (or more than) maintaining the table. Bulk updates, inserts: deactivate or drop the index first, then reactivate or recreate it.

Cost based query optimization

Query execution. SQL: very high level, declarative. What to retrieve, not how! The SQL query is translated by the query processor into a low-level program: the execution plan.

Query optimisation. One SQL query: many (!) different execution plans, execution alternatives. Index: use it or not? Tendency: use it for selective queries. Which index? Join order? (Join execution?) Dramatically different costs! Query optimizer: chooses a relatively good execution plan.

Query execution: SQL → query parser → internal query description → transformation → internal query description → query optimizer → query execution plan → query execution

Query execution (detailed): query parser. Parsing; syntactic check (tables, attributes: data dictionary); view expansion.

Query execution (detailed): transformation. Standardized description (operator tree).

Query execution (detailed): query optimizer. Optimisation (cost-based): setting up execution plans, estimating their costs, selecting a cheap execution plan.

Reminder: key idea: algebraic optimization
N = ((n*5)+(n*2)+0)/1
Algebraic laws:
1. (+) identity: x+0=x
2. (/) identity: x/1=x
3. (*) distributes: (y*x)+(z*x)=(y+z)*x
4. (*) commutes: y*x=x*y
Rules 1, 3, 4, 2: N=(5+2)*n
The same idea is applied to queries, using relational algebra.

Query execution (detailed): query execution. Execution of the selected execution plan.

Transformation, operator tree
select A1, A2, ..., An from R1, R2, ..., Rm where B
is translated into the operator tree
π_{A1, A2, ..., An}(σ_B(R1 × R2 × ... × Rm))

Basic idea: keep intermediate results as small as possible. Execute σ and π early and ⋈, ×, ∪, … late, as σ and π reduce the volume of data, whereas ⋈, ×, … often result in large intermediate results. Heuristics + cost estimation.
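The effect of executing a selection early can be made concrete with a small in-memory sketch. The relations, attribute names and sizes below are hypothetical, and relations are modelled simply as Python lists of dictionaries.

```python
from itertools import product

# Hypothetical data: 100 employees, 100 work assignments.
employees = [{"ssn": i, "lname": f"Name{i}", "byear": 1950 + i % 20} for i in range(100)]
works_on = [{"essn": i, "pno": i % 5} for i in range(100)]


def cross(r, s):
    """Cartesian product R × S."""
    return [{**a, **b} for a, b in product(r, s)]


def select(rel, pred):
    """Selection σ_pred(rel)."""
    return [t for t in rel if pred(t)]


def project(rel, attrs):
    """Projection π_attrs(rel), keeping duplicates."""
    return [{a: t[a] for a in attrs} for t in rel]


# Plan 1: selection late, applied on top of the full cross product.
late_intermediate = cross(employees, works_on)
late = project(
    select(late_intermediate, lambda t: t["essn"] == t["ssn"] and t["byear"] > 1965),
    ["lname"],
)

# Plan 2: the condition on byear is pushed down to employees first.
young = select(employees, lambda t: t["byear"] > 1965)
early_intermediate = cross(young, works_on)
early = project(select(early_intermediate, lambda t: t["essn"] == t["ssn"]), ["lname"])

print(len(late_intermediate), len(early_intermediate), late == early)
```

Both plans return the same 20 rows, but the intermediate result shrinks from 10 000 to 2 000 tuples when the selection is executed first.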

Equivalences
1. Cascade of selections: σ_{c1 ∧ c2 ∧ ... ∧ cn}(R) ≡ σ_c1(σ_c2(…(σ_cn(R))…))
2. σ is commutative: σ_c1(σ_c2(R)) ≡ σ_c2(σ_c1(R))
3. π-cascades: if L1 ⊆ L2 ⊆ … ⊆ Ln, then π_L1(π_L2(…(π_Ln(R))…)) ≡ π_L1(R)
4. Exchanging σ and π: if the selection only refers to the projected attributes A1, …, An, selection and projection can be exchanged: σ_c(π_{A1, …, An}(R)) ≡ π_{A1, …, An}(σ_c(R))

Equivalences 2
5. ×, ⋈, ∪ and ∩ are commutative
6. A Cartesian product followed by a selection referring to both operands can be replaced by a join: σ_c(R × S) ≡ R ⋈_c S
....
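Rule 6 can be checked on small data: a selection over the Cartesian product and a join computed directly give the same tuples. The sketch uses a hash join as one possible join algorithm, which is an assumption (the slides do not name an algorithm); relation contents and function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def select_over_product(r, s, r_attr, s_attr):
    """σ_{r_attr = s_attr}(R × S): build every pair, then filter."""
    return [{**a, **b} for a, b in product(r, s) if a[r_attr] == b[s_attr]]


def hash_join(r, s, r_attr, s_attr):
    """R ⋈_{r_attr = s_attr} S without materialising the cross product."""
    buckets = defaultdict(list)
    for b in s:                       # build phase: index S on the join attribute
        buckets[b[s_attr]].append(b)
    return [{**a, **b} for a in r for b in buckets.get(a[r_attr], [])]


# Hypothetical relations.
project_rel = [{"pnumber": 1, "pname": "GOM"}, {"pnumber": 2, "pname": "XYZ"}]
works_on = [
    {"essn": 10, "pno": 1},
    {"essn": 11, "pno": 2},
    {"essn": 12, "pno": 1},
    {"essn": 13, "pno": 3},
]

via_product = select_over_product(project_rel, works_on, "pnumber", "pno")
via_join = hash_join(project_rel, works_on, "pnumber", "pno")
print(via_product == via_join, len(via_join))
```

The first variant inspects all |R| · |S| pairs, while the join touches each tuple of S once during the build phase and then only the matching partners.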

Example
select Lname
from Employee, WorksOn, Project
where Pname = 'GOM' and Pnumber = PNO and ESSN = SSN and Bdate > 58.04.16
(Select the last names of the employees born after 16.04.58 and working on the project "GOM")

Selection as early as possible. Operator tree before this step: π_Lname(σ_{Pname='GOM' ∧ Pnumber=PNO ∧ ESSN=SSN ∧ Bdate>58.04.16}((EMPLOYEE × WORKS_ON) × PROJECT))

Restrictive joins early. Operator tree before this step: π_Lname(σ_{Pnumber=PNO}(σ_{ESSN=SSN}(σ_{Bdate>58.04.16}(EMPLOYEE) × WORKS_ON) × σ_{Pname='GOM'}(PROJECT)))

Cross product and selection => join. Operator tree before this step: π_Lname(σ_{ESSN=SSN}(σ_{Pnumber=PNO}(σ_{Pname='GOM'}(PROJECT) × WORKS_ON) × σ_{Bdate>58.04.16}(EMPLOYEE)))

PROJECT WORKS_ON EMPLOYEE Lname ⋈Pnumber = PNO ⋈ESSN=SSN Pname=‘GOM‘ Bdate>58.04.16 Projections as early, as possible (attributes for result and intermediate results kept)

Lname ⋈Pnumber = PNO ⋈ESSN=SSN Pname=‘GOM‘ Bdate>58.04.16 PROJECT WORKS_ON EMPLOYEE ⋈Pnumber = PNO ⋈ESSN=SSN Pname=‘GOM‘ Bdate>58.04.16 Pnumber ESSN,PNO SSN,Lname ESSN

Cost-based selection. Optimisation (cost-based) in the query optimizer: setting up execution plans, estimating their costs, selecting a cheap execution plan.

Statistics in databases Optimizer needs statistics on the data to make decisions! E.g., Size of a table? Selectivity of a (join) condition?

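Equivalences (continued)
σ_{c1 ∧ c2 ∧ ... ∧ cn}(R) ≡ σ_c1(σ_c2(…(σ_cn(R))…))
σ_c1(σ_c2(R)) ≡ σ_c2(σ_c1(R))
π_L1(π_L2(…(π_Ln(R))…)) ≡ π_L1(R)
σ_c(π_{A1, …, An}(R)) ≡ π_{A1, …, An}(σ_c(R))
R θ S ≡ S θ R for θ ∈ {⋈, ×, ∪, ∩}
σ_c(R × S) ≡ R ⋈_c S
σ_c(R θ S) ≡ σ_c(R) θ S for θ ∈ {⋈, ×}, if c refers only to attributes of R
σ_c(R θ S) ≡ σ_c1(R) θ σ_c2(S) for θ ∈ {⋈, ×}, if c = c1 ∧ c2, c1 refers only to R and c2 only to S
π_L(R ⋈_c S) ≡ π_{A1, …, An}(R) ⋈_c π_{B1, …, Bn}(S), if L = A1, …, An, B1, …, Bn and c uses only attributes in L
π_L(R ⋈_c S) ≡ π_L(π_{A1, …, An, A1', …, An'}(R) ⋈_c π_{B1, …, Bn, B1', …, Bn'}(S)), where the primed attributes are needed only by c
(R θ S) θ T ≡ R θ (S θ T) for θ ∈ {⋈, ×, ∪, ∩}
σ_c(R θ S) ≡ σ_c(R) θ σ_c(S) for θ ∈ {∪, ∩, −}
π_L(R ∪ S) ≡ π_L(R) ∪ π_L(S)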

Join order. Many joins; joins are expensive. Left-oriented join trees, greedy search... (Figure: alternative join trees over the relations R1, R2, R3, R4.)
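A greedy search over left-oriented join trees can be sketched as follows. For n relations there are n! left-oriented join orders, so the sketch builds one order step by step instead of enumerating all of them. Table sizes, join selectivities and function names are hypothetical, and the size estimate (product of the row counts times the selectivities of all applicable join conditions) is a common simplification, not a particular DBMS's cost model.

```python
def next_size(size, joined, table, rows, sel):
    """Estimated result size after joining `table` to the tables in `joined`."""
    size *= rows[table]
    for j in joined:
        size *= sel.get(frozenset((j, table)), 1.0)  # 1.0: no join condition
    return size


def intermediate_sizes(order, rows, sel):
    """Estimated sizes of all intermediate results of a left-oriented plan."""
    size, sizes = rows[order[0]], []
    for i, table in enumerate(order[1:], start=1):
        size = next_size(size, order[:i], table, rows, sel)
        sizes.append(size)
    return sizes


def greedy_left_deep(rows, sel):
    """Start with the smallest table, then always add the cheapest next table."""
    order = [min(rows, key=rows.get)]
    size = rows[order[0]]
    while len(order) < len(rows):
        rest = [t for t in rows if t not in order]
        best = min(rest, key=lambda t: next_size(size, order, t, rows, sel))
        size = next_size(size, order, best, rows, sel)
        order.append(best)
    return order


# Hypothetical statistics.
rows = {"R1": 1000, "R2": 100, "R3": 10000, "R4": 10}
sel = {
    frozenset(("R1", "R2")): 0.01,
    frozenset(("R2", "R3")): 0.001,
    frozenset(("R3", "R4")): 0.1,
}

greedy = greedy_left_deep(rows, sel)
print(greedy, sum(intermediate_sizes(greedy, rows, sel)))
print(sum(intermediate_sizes(["R1", "R2", "R3", "R4"], rows, sel)))
```

With these numbers the greedy order produces smaller intermediate results than the order in which the tables are written. Greedy search is not guaranteed to find the best order, and this simple version may pick a cross product between two small tables, which many real optimizers avoid.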

Example (FLUG = flight, FLUGZEUGTYP = aircraft type, BUCHUNG = booking, Datum = date)
select FlugNr
from (select F.*, FT.*, count (TicketNr)
      from FLUG F, FLUGZEUGTYP FT, BUCHUNG B
      where B.FlugNr = F.FlugNr
      group by F.*, FT.*, Datum) as DFT(F.*, FT.*, count)
where F.FtypId = FT.FtypId and FT.First+FT.Business+FT.Economy < DFT.count

Example, operator tree: π_{F.FlugNr}(σ_{F.FtypId = FT.FtypId ∧ FT.First+FT.Business+FT.Economy < count}(ρ_{F.*, FT.*, count}(π_{F.*, FT.*, count(TicketNr)}(γ_{F.*, FT.*, Datum}(σ_{B.FlugNr = F.FlugNr}(FLUG F × FLUGZEUGTYP FT × BUCHUNG B))))))

Statistics in databases (Oracle)
TABLES: num_rows, num_blocks, avg_row_len
TAB_COL_STATISTICS: num_distinct, num_nulls, num_buckets
INDEXES: leaf_blocks, blevel
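Such statistics feed simple size estimates. The sketch below shows a common textbook estimate for an equality condition, assuming that the values are uniformly distributed over the distinct values; it is not Oracle's actual estimator, and the function name and numbers are hypothetical. The parameter names follow the statistics listed above.

```python
def estimate_equality_rows(num_rows, num_distinct, num_nulls=0):
    """Estimated number of rows matching `column = constant`.

    Assumes every distinct value occurs equally often; NULLs never match.
    """
    if num_distinct == 0:
        return 0.0
    return (num_rows - num_nulls) / num_distinct


# 10 000 rows with 50 distinct values: about 200 rows per value.
print(estimate_equality_rows(10_000, 50))
```

Dividing the estimate by num_rows gives the estimated fraction of matching rows, which is the kind of input an optimizer needs when deciding whether an index is worthwhile.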

Calculation of statistics
Task of the database administrator (expensive task!)
analyze table relation compute statistics for columns attribute, ..., attribute size value
or, estimating from a sample: analyze table relation estimate statistics sample value percent

Oracle SQL Developer, Explain Plan