How is data stored? ● Table and index Data are stored in blocks(aka Page). ● All IO is done at least one block at a time. ● Typical block size is 8Kb.

Slides:



Advertisements
Similar presentations
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Advertisements

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
6.830 Lecture 9 10/1/2014 Join Algorithms. Database Internals Outline Front End Admission Control Connection Management (sql) Parser (parse tree) Rewriter.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
BTrees & Bitmap Indexes
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
David Konopnicki Choosing Access Path ä The basic methods. ä The access paths and when they are available. ä How the optimizer chooses among the.
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Optimization - Selection. 2 The Selection Operation Table: Reserves(sid, bid, day, agent) A page (block) can hold 100 Reserves tuples There are 1,000.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Processing & Optimization
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Access Path Selection in a Relation Database Management System (summarized in section 2)
Boris Gurov Support Analyst Oracle Bulgaria All you need to know about Oracle Indexes.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
Oracle Data Block Oracle Concepts Manual. Oracle Rows Oracle Concepts Manual.
Oracle Database Administration Lecture 6 Indexes, Optimizer, Hints.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Database Management 9. course. Execution of queries.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Lecture 5 Cost Estimation and Data Access Methods.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Chapter 5 Index and Clustering
Computing & Information Sciences Kansas State University Monday, 03 Nov 2008CIS 560: Database System Concepts Lecture 27 of 42 Monday, 03 November 2008.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2007.
Tuning Oracle SQL The Basics of Efficient SQL Common Sense Indexing
Storage Access Paging Buffer Replacement Page Replacement
CS 540 Database Management Systems
CS 440 Database Management Systems
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Database Management System
Physical Database Design
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Secondary Storage Data Retrieval.
Choosing Access Path The basic methods.
COMP 430 Intro. to Database Systems
Database Management Systems (CS 564)
Chapter 12: Query Processing
Evaluation of Relational Operations
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Chapter 15 QUERY EXECUTION.
File Processing : Query Processing
File Processing : Query Processing
File organization and Indexing
Dynamic Hashing Good for database that grows and shrinks in size
Lecture 12 Lecture 12: Indexing.
Database Management Systems (CS 564)
Lecture#12: External Sorting (R&G, Ch13)
Yan Huang - CSCI5330 Database Implementation – Access Methods
External Joins Query Optimization 10/4/2017
Selected Topics: External Sorting, Join Algorithms, …
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Indexing and Hashing B.Ramamurthy Chapter 11 2/5/2019 B.Ramamurthy.
Chapter 12 Query Processing (1)
CPS216: Advanced Database Systems
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Lecture 11: B+ Trees and Query Execution
Presentation transcript:

How is data stored? ● Table and index Data are stored in blocks(aka Page). ● All IO is done at least one block at a time. ● Typical block size is 8Kb. ● Each table contains several rows/tuples.(Deliberately ignoring very large rows).

Points of language The original plural form of index was indicies. however most people today use indexes and it is has been accepted into the English language, I will use indexes.

Oracle Has the the Header at the End

What is an Index ● An Index contains an organized summary of information in a table together with references to the location of rows in the table.(a rowid). ● The most common form for an index is the balanced tree Index, from here on referred to as B-Tree index. ● There are other types of indexes, mainly Bitmap indexes.

B-Tree Indexes ● Data(references to rows) is only in the leaves. ● Very Large Branching factor. each Node may Split into hundreds of nodes at the next level. ● Most indexes have a depth of no more than 3. ● Not truly Balanced, in order to avoid expensive on line balancing operations, The tree may slowly grow unbalanced or even increase in size unproportionally to table size. ● Each index indexes only a few table columns.

Access Path: Scan Methods ● Sequential Scan(aka Full Table Scan, FTS) ● Index Scan( in Oracle Index Range/unique scan) ● Index Fast Full Scan – Oracle Only.

Sequential Scan ● Reads All data in table. Regardless of number of rows actually retrieved. ● Takes full advantage of reading sequencially from Magnetic Hard drives. ● May Read Empty blocks. ● Unordered output.

Index Scan ● Traverse the index first, accordingly to part of the where clause. fetch from table only relevant rows. ● Output rows are ordered according to the index. ● When fetching many rows is likely to perform many random access reads from disk.

Index Only access ● Oracle(but not postgres) allows queries to run only on the index with out retrieving data from the Heap. ● Works only when querying only indexed columns. ● Postgresql requires reading the Heap any way to know tuple visibility. ● This allows Oracle to perform Index Fast Full Scan(index_ffs).

How much does it cost? ● Full Table Scan: – startup cost: 0 – total: number of blocks in table. ● Index Scan: – startup cost:0 – total: tree hight + number of leaf blocks read + number of Heap blocks read.

Full Table Scan Cost ● Since the table blocks are stored more or less sequentially on disk we can read them quickly. ● We will set a DB parameter for how many blocks should we read at once when performing sequential reads, DB_BLOCK_MULTIREAD_COUNT. ● We will shorten it here to ● Full Table Scan will cost us: – /

Index Scan Cost ● let the Filter Factor,, be the percentage of the records from tree read in the tree scan ● let the Screening Factor,, be the percentage of the heap records read. ● let be the number of leafs in the index. ● let be the number of blocks in a table. ● the total cost of an index scan is: – -1+ * + *

The Clustering Factor ● Our previous formula is inaccurate, since when scanning the tree, the heap is not referenced sequentially and we may end up reading a block for each row we fetch. (if the data in the table is totally unordered with respect to the index). ● The previous formula is correct only if the data in the table happens to be ordered in the same order as the index.

The Clustering Factor. cont. ● We will designate the number of block changes when scanning the entire index as the clustering factor. ● The clustering factor ranges from the number of blocks in the table, to the number of rows in the table. ● We will designate the Clustering Factor: ● A revised formula will be: – -1+ * + *

Clustering Factor and DB cache ● However since the Database has a significant cache it is likely that when scanning an index and reading the same block twice it would be in the cache. ● Let OPTIMIZER_INDEX_COST_ADJUST be the percentage of heap blocks not found in cache when scanning an index. for short:. ● Our Final formula would be: – -1+ * + * *

Join Methods ● When querying data from more than one table a join is used: ● There Are several methods of joining tables: – Nested loop Join – Sort Merge Join – Hash Join

Nested Loops ● Iterate over relevant rows from primary table, for each row fetch matching row(s) from secondary table. ● Cost is the Cost to access the primary table + number of rows fetched from primary table * cost of fetching the matching rows from secondary table. ● Sensitive to ordering of the tables.

Sort Merge Join ● Scan both tables ordered by the join columns, matching the rows as you go. ● Cost is the Time to read(and sort) both tables. ● If the table may be accessed through an index ordered by the join column sorting is unnecessary.

Hash Join ● Load the Secondary table into a hash table, with the join column as the key. ● Scan the primary table and fetch matching rows from from the hash table. ● works only for equijoin.(using equals sign). ● Cost is time to read both tables(when secondary resides can reside in RAM).

The CBO ● The CBO needs to find optimal path for a given query. ● If there are a small number of tables (less than ~6) it will estimate the cost of all join&access methods. ● If there are many tables joined a genetic heuristic algorithm is used. ● An access path is a tree of join methods and accees methods as well as extra operations such as filter, sort and aggregate.

The CBO, Continued ● For every stage the CBO estimates: – the cost for startup. – the cost for fetching all rows. – number of records. ● The CBO makes these estimates based on statistics it gathers on the Tables and indexes used.

Lies, Damn Lies and Statistics ● Table Statistics normally include: – number of blocks. – number of rows – for each column, number of distinct values – for each column, minimum&maximum(sometimes more of a histogram is available) ● Index statistics normally include: – index height. – number of leaf blocks. – Clustering Factor. (to what extent are the table and index ordered the same).

Estimating Filter&Screening factor ● for equals clause assume FF=1/ – or use full histogram if available. ● for range clause assume linear dispersion withing histogram buckets. ● If query has bind variables the values are not known during parse time, in which case magic numbers are used for range clauses(9% for '>', 4% from between or like).

Optimizer pitfalls ● The optimizer does not know how columns are correlated. ● When using pre parsed query, the optimizer does not have the values for bind variables. ● When the Values are from a joined table the optimizer does not have real values. ● Histograms are normally very incomplete even when usable ● Statistics may be out of date(or missing).

Solutions ● Rewrite query. – total rewrite – change join to subquery and vice-versa. – add redundant clauses. ● use optimizer hints. ● Dynamic query replanning – work in progress(IBM).

Bitmap indexes ● A bitmap index contains a bitmap of the table rows for each value the indexed columns have. ● If we have a Bitmap index on the Gendre column 2 bitmaps will be created each containing a 1 for the rows that have that value. ● Useful for DataWhareHouse applications. ● Expensive to maintain.(insert/update/delete underlying table). ● several bitmap indexes can be used together in one query.