Database System Architecture and Performance CSCI 6442 ©Copyright 2015, David C. Roberts, all rights reserved.

2 Agenda Database performance goals DBMS use of disk Searching B-trees DBMS Architecture

3 Data is stored on disk. Disk is necessary for the database to be reliably available. A disk access takes milliseconds, many orders of magnitude slower than anything that happens in RAM. The number of disk accesses is therefore a good measure of DBMS cost for an operation.

4 Disk. Disk is composed of fixed-length records, rotating around. To access information, we need to move the head and wait for the disk to rotate. We wait the same time whether we use one byte or the whole record. We call this fixed-length record a page.

5 Efficient Use of Disk For efficient use of disk, we want to use all the information contained in a single page We will look at how we organize disk in order to reduce the number of disk accesses for a search

6 Disk vs. RAM. RAM is accessible in any order, so any sort of structure can be used. Data structure courses usually cover data structures for RAM. We’ll talk about how to make efficient use of disk.

7 Disk as Pages. Disk is composed of fixed-length records, rotating around. To access information, we need to move the head and wait for the disk to rotate. We wait the same time whether we use one byte or the whole record. We call this fixed-length record a page.

8 Search Methods Linear search Binary search Binary tree-structured search N-ary trees B-trees Hashing

9 Linear Search Elements are stored in arrival order Search starts at the beginning, continues until desired value is found Average number of accesses for n elements is approximately n/2

10 Binary Search. Elements are stored in order by the value to be searched. Search starts at the midpoint. With each probe, half of the candidates are removed. Average number of probes is approximately log2(n).
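The two search methods above can be compared with a small sketch. It is only an illustration: the function names, the in-memory list standing in for stored elements, and the choice of n = 1024 are assumptions, not part of the slides. It counts probes rather than measuring time.

```python
def linear_search(items, target):
    """Scan from the start; return (index, probes), index -1 if absent."""
    for probes, value in enumerate(items, start=1):
        if value == target:
            return probes - 1, probes
    return -1, len(items)


def binary_search(sorted_items, target):
    """Halve the candidate range on each probe; return (index, probes)."""
    lo, hi, probes = 0, len(sorted_items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


n = 1024  # assumed size, chosen so that log2(n) = 10
data = list(range(n))
avg_linear = sum(linear_search(data, t)[1] for t in data) / n
avg_binary = sum(binary_search(data, t)[1] for t in data) / n
print(avg_linear)  # 512.5, about n/2
print(avg_binary)  # close to log2(n)
```

Averaged over every stored value, the linear scan needs about n/2 probes, while the binary search stays near log2(n).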

11 Disadvantages of Binary Search. Elements must be kept in order. Inserting one element may require reorganization of the entire list. If stored on disk, the search jumps from bucket to bucket.

12 Using Linked Structure for Binary Search Using links we can separate physical organization from search sequence Avoids possible need to reorganize the entire store because of a single update Accelerates update, still allows fast search

13 Example Binary Search Tree

14 Problems with Binary Search Tree Each node is likely to be on a different page, making inefficient use of disk accesses What if, instead of just one key at each node, we could store a whole page full of keys? Then we would use disk efficiently and have a very shallow tree
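A rough calculation shows how much shallower the tree becomes when each node holds a page full of keys. This is a sketch under assumptions: the figure of 100 keys per page and the one million keys are illustrative numbers, not taken from the slides, and the tree is assumed to be completely full.

```python
def tree_levels(num_keys, keys_per_node):
    """Levels needed by a completely full tree whose nodes hold keys_per_node keys."""
    fanout = keys_per_node + 1  # a node with k keys has k + 1 children
    levels, capacity = 0, 0
    while capacity < num_keys:
        capacity += keys_per_node * fanout ** levels  # keys held by this level
        levels += 1
    return levels


print(tree_levels(1_000_000, 1))    # binary tree: 20 levels
print(tree_levels(1_000_000, 100))  # 100 keys per node (assumed): 3 levels
```

If each level costs one disk access, the search drops from about 20 accesses to 3 under these assumptions.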

15 Balance

16 Balance. A tree is said to be balanced if the lengths of all the paths from the root to the leaves differ by no more than one.

17 B-tree We allow nodes to be incompletely filled in order to maintain perfect balance We grow the tree from the bottom; when a node is over-full we split it and put an added node one level up Deletions are the reverse of additions

18 B-tree

19 B-tree Data Store. We understand that with each entry there is an address in storage. Having understood that, we omit the addresses from the rest of the diagrams.

20 B-tree

21 B-tree 1

22 B-tree 1 4

23 B-tree 1 4 6

24 B-tree 1 4 6 8

25 B-tree

26 B-tree When a node is full, to add a value we split the node and put the middle value in the level above.
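The split rule can be sketched in a few lines. This is a minimal in-memory illustration, not a disk-based implementation: the `Node` class, the capacity `MAX_KEYS = 4`, and the helper names are assumptions made for the example, and the stored addresses are omitted as in the diagrams.

```python
import bisect

MAX_KEYS = 4  # assumed node capacity, for illustration only


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys if keys is not None else []
        self.children = children if children is not None else []  # empty in a leaf


def _insert(node, key):
    """Insert key below node; return (middle_key, right_node) if node split."""
    pos = bisect.bisect_left(node.keys, key)
    if not node.children:                 # leaf: place the key in order
        node.keys.insert(pos, key)
    else:                                 # interior: descend, absorb a child split
        split = _insert(node.children[pos], key)
        if split:
            middle, right = split
            node.keys.insert(pos, middle)
            node.children.insert(pos + 1, right)
    if len(node.keys) <= MAX_KEYS:
        return None
    mid = len(node.keys) // 2             # over-full: split around the middle value
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return middle, right


def insert(root, key):
    """Insert key; the tree grows a new root only when the old root splits."""
    split = _insert(root, key)
    if split:
        middle, right = split
        return Node([middle], [root, right])
    return root


def in_order(node):
    """Return all keys in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


root = Node()
for key in [1, 4, 6, 8]:
    root = insert(root, key)              # fills a single node
root = insert(root, 9)                    # node is full, so it splits
print(root.keys, [child.keys for child in root.children])
# prints [6] [[1, 4], [8, 9]]

big = Node()
for key in range(50, 0, -1):
    big = insert(big, key)
```

Because a new level is only ever added above the root, every leaf stays at the same depth.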

27 How It Really Looks

28 B-tree questions How large should node size be? How many values should it contain? Are the values indexed by a b-tree properly called keys? How full are b-tree nodes, on the average, after the system has been operating for a while?

29 B-plus tree B+ trees have all indexed values represented in the leaves Other nodes do not have pointers to rows, only pointers to other nodes B+ trees provide very high density of indexes

30 B+ tree Index Set Sequence Set

31 B+ Tree Add Algorithm. The insert algorithm for B+ trees, by case:

Leaf page full: NO. Index page full: NO.
  Place the record in sorted position in the appropriate leaf page.

Leaf page full: YES. Index page full: NO.
  1. Split the leaf page.
  2. Place the middle key in the index page in sorted order.
  3. The left leaf page contains records with keys below the middle key.
  4. The right leaf page contains records with keys equal to or greater than the middle key.

Leaf page full: YES. Index page full: YES.
  1. Split the leaf page.
  2. Records with keys < middle key go to the left leaf page.
  3. Records with keys >= middle key go to the right leaf page.
  4. Split the index page.
  5. Keys < middle key go to the left index page.
  6. Keys > middle key go to the right index page.
  7. The middle key goes to the next (higher-level) index. If the next-level index page is full, continue splitting the index pages.
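The two kinds of split in the insert algorithm differ in what happens to the middle key, which a short sketch makes concrete. The function names and the use of plain sorted lists for pages are assumptions for illustration.

```python
def split_leaf(keys):
    """Split an over-full, sorted leaf page.

    Keys >= the middle key stay in the right leaf, so the middle key is
    copied (not moved) into the index page.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]


def split_index(keys):
    """Split an over-full, sorted index page.

    Keys < middle go left, keys > middle go right, and the middle key
    itself moves up to the next index level.
    """
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]


print(split_leaf([10, 20, 30, 40, 50]))   # ([10, 20], [30, 40, 50], 30)
print(split_index([10, 20, 30, 40, 50]))  # ([10, 20], [40, 50], 30)
```

Keeping the middle key in the right leaf is what lets every indexed value remain represented in the leaves.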

32 B+ Tree Delete Algorithm. The delete algorithm for B+ trees, by case:

Leaf page below fill factor: NO. Index page below fill factor: NO.
  Delete the record from the leaf page. Arrange keys in ascending order to fill the void. If the key of the deleted record appears in the index page, use the next key to replace it.

Leaf page below fill factor: YES. Index page below fill factor: NO.
  Combine the leaf page and its sibling. Change the index page to reflect the change.

Leaf page below fill factor: YES. Index page below fill factor: YES.
  1. Combine the leaf page and its sibling.
  2. Adjust the index page to reflect the change.
  3. Combine the index page with its sibling. Continue combining index pages until you reach a page with the correct fill factor or you reach the root page.

33 Hashing Develop a function that maps data values into a range of storage addresses For each search value, use a function to compute a hash value and store the associated data at that address To search, just compute the hash value and look at that address

34 Hashing Instead of storing the data at the hash address, store a pointer to the data The table of pointers is called a hash table Using hashing for a search locates a stored value in just one access Number of accesses to locate a value is independent of n
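A hash table of pointers can be sketched as follows. Everything named here is an assumption for illustration: the byte-sum hash function, the table size of 8, the list standing in for the data store, and the use of short chains to handle two keys landing on the same address (a case the slides do not discuss).

```python
NUM_SLOTS = 8  # assumed hash table size

# Stand-in for the data store; a list position plays the role of a storage address.
data_store = ["row for ada", "row for bob", "row for cy"]

# The hash table holds pointers (positions in data_store), not the data itself.
hash_table = [[] for _ in range(NUM_SLOTS)]


def slot_for(key):
    """Map a search value to a table address (a deliberately simple hash)."""
    return sum(key.encode("utf-8")) % NUM_SLOTS


def put(key, pointer):
    """Record where the data for key lives."""
    hash_table[slot_for(key)].append((key, pointer))


def get(key):
    """Compute the hash, look at that address, follow the pointer."""
    for stored_key, pointer in hash_table[slot_for(key)]:
        if stored_key == key:
            return data_store[pointer]
    return None


put("ada", 0)
put("bob", 1)
put("cy", 2)
print(get("bob"))  # row for bob
print(get("zed"))  # None
```

The lookup work does not grow with the number of stored values as long as the chains stay short.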

35 Hashing Question Why are b-trees the most used index method for database systems and not hashing, given that hashing is faster? Hint—think about the disadvantages of hashing

Database System Architecture

37 DBMS and Applications Application Program Buffer Application Program Buffer Application Program Buffer Application Program Buffer Database Management System

38 DBMS Software Architecture Application Program Buffer Application Program Buffer Application Program Buffer Application Program Buffer System Global Area Database System

39 Database System Architecture. SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer -> Quads -> Executor -> Results

40 Executor Software Architecture SQL Executor Table Management Row Management Page Management Node Management Index Management Data Store

42 DBMS Software Architecture Application Program Buffer Application Program Buffer Application Program Buffer Application Program Buffer System Global Area Database System Code

43 Inside the Database System. SQL -> Lexical Analyzer -> Tokens -> Syntax Analyzer, Code Generator -> Quads -> Executor -> Results

45 Pages. Disk is divided into physical records called “pages”. A page can be an index page (i.e., a B-tree node) or a data page. An index page contains one node of a B-tree. A data page contains rows of tables.

46 Page Allocation. Pages are initially all considered unallocated. In response to requests, they are allocated and marked allocated. When freed, they are chained onto a list of free pages.
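One way to picture this bookkeeping is the sketch below. It is an assumption-laden illustration: the `PageAllocator` class, its field names, and the policy of reusing freed pages before touching never-used ones are choices made for the example, not details given in the slides.

```python
class PageAllocator:
    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.next_unallocated = 0   # page IDs at or above this were never used
        self.free_list_head = None  # most recently freed page, or None
        self.next_free = {}         # chain link: freed page -> next freed page
        self.allocated = set()      # pages currently marked allocated

    def allocate(self):
        """Hand out a page, preferring the chain of freed pages."""
        if self.free_list_head is not None:
            page = self.free_list_head
            self.free_list_head = self.next_free.pop(page)
        elif self.next_unallocated < self.total_pages:
            page = self.next_unallocated
            self.next_unallocated += 1
        else:
            raise RuntimeError("no pages left")
        self.allocated.add(page)
        return page

    def free(self, page):
        """Unmark the page and chain it onto the front of the free list."""
        if page not in self.allocated:
            raise ValueError(f"page {page} is not allocated")
        self.allocated.remove(page)
        self.next_free[page] = self.free_list_head
        self.free_list_head = page


allocator = PageAllocator(total_pages=4)
first = allocator.allocate()    # 0
second = allocator.allocate()   # 1
allocator.free(first)
reused = allocator.allocate()   # 0 again, taken from the free list
fresh = allocator.allocate()    # 2, a never-used page
```

In a real system the chain link would normally live inside the freed page itself; the dictionary here only stands in for that.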

47 Database Extents. A database needs to be able to extend over disk boundaries: its size may require it, and growth may require it. Typically it’s managed as “extents”, each of which is a file to the OS file system. Multiple files are mapped into a single sequence of page IDs.
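The mapping from one sequence of page IDs onto several files might look like the sketch below. The `ExtentMap` class, the file names, and the 4096-byte page size are all hypothetical; the slides do not specify any of them.

```python
import bisect


class ExtentMap:
    """Maps a global page ID to (extent file, byte offset within that file)."""

    def __init__(self, page_size=4096):  # page size is an assumed value
        self.page_size = page_size
        self.files = []        # one OS file per extent
        self.first_page = []   # first global page ID held by each extent
        self.total_pages = 0

    def add_extent(self, filename, num_pages):
        """Append an extent; its pages continue the global page ID sequence."""
        if num_pages <= 0:
            raise ValueError("an extent must hold at least one page")
        self.files.append(filename)
        self.first_page.append(self.total_pages)
        self.total_pages += num_pages

    def locate(self, page_id):
        """Find which extent file holds page_id and where."""
        if not 0 <= page_id < self.total_pages:
            raise IndexError(f"page {page_id} is outside the database")
        i = bisect.bisect_right(self.first_page, page_id) - 1
        return self.files[i], (page_id - self.first_page[i]) * self.page_size


extents = ExtentMap()
extents.add_extent("extent1.dat", 100)  # hypothetical file names
extents.add_extent("extent2.dat", 50)
print(extents.locate(99))   # ('extent1.dat', 405504)
print(extents.locate(100))  # ('extent2.dat', 0)
```

Growing the database then amounts to calling `add_extent` again; existing page IDs do not change.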

48 Extents SQL Executor Table Management Row Management Page Management Node Management Index Management Data Store Extent Management

49 The Database (figure: rows stored across Extent 1, Extent 2, and Extent 3)

50 Startup. At startup, the DBMS creates an empty system catalog. The catalog has images of some tables; once the images are established, SQL can be used to create other tables.

51 System Catalog. The system catalog tells you how the database system works. When the system starts with a new database, it lays down part of the system catalog from an image. The rest of the system catalog is created by SQL statements. Many SQL statements reference or change the system catalog.

52 Database Performance

53 Join Processing. For a non-join query, be sure there are indexes on the columns used in predicates. Joins are the issue in database performance. We need to understand how they are performed so that we can make them efficient.

54 “Optimization”. More properly called access path selection. The “optimizer” selects a strategy for processing. Approaches: Cost-based: estimate the total cost to process by different approaches, choose the lowest estimate. Heuristic: use rules to decide how to process. Cost-based optimization is used by most database systems today.

55 The Optimizer. Selects indexes to use. Chooses the order of using indexes. Chooses algorithms to use. Decides when to apply predicates.

56 Classes of Predicates. Predicate: a condition in the WHERE clause. Predicates are combined using AND, OR to make WHERE clauses. Classes of predicates: Sargable: search arguments that can be processed close to the data. Residual: not sargable, such as complex use of nesting.

57 Access Paths. Five possible access paths: table scan; non-selective index scan; selective index scan; index-only access; fully qualified unique index. Each of these types of scans has different cost estimates for its use.

58 Predicate Selectivity. Selectivity function f(p): fraction of rows retrieved on average by predicate p. The number of rows retrieved is strongly related to cost. n = number of rows in table.

Form of p           f(p)
column = value      1/n
column != value     1 - 1/n (nearly 1)
column > value      (high value - search value) / (high value - low value)
p1 or p2            f(p1) + f(p2)
p1 and p2           f(p1) * f(p2)
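The selectivity formulas translate directly into code. The sketch below follows the slide's formulas as given; the function names and the example table (10,000 rows, salaries from 20,000 to 120,000) are made up for illustration. Note that the simple sum used for OR can exceed 1 for unselective predicates, so it is best read as a rough estimate.

```python
def sel_equal(n):
    """column = value"""
    return 1 / n


def sel_not_equal(n):
    """column != value (nearly 1 for large n)"""
    return 1 - 1 / n


def sel_greater(search_value, low_value, high_value):
    """column > value, from the column's low and high values"""
    return (high_value - search_value) / (high_value - low_value)


def sel_or(f1, f2):
    """p1 or p2, using the slide's additive estimate"""
    return f1 + f2


def sel_and(f1, f2):
    """p1 and p2"""
    return f1 * f2


n = 10_000                                          # assumed table size
f_salary = sel_greater(100_000, 20_000, 120_000)    # salary > 100000
f_id = sel_equal(n)                                 # emp_id = some value
combined = sel_and(f_salary, f_id)
expected_rows = n * combined
print(f_salary, f_id, combined, expected_rows)
```

Multiplying the selectivity by n gives the estimated number of rows, which is the quantity that drives the cost estimate.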

59 Join Processing. Cartesian product: for each row of the inner table, inspect the join value for every row of the outer table. n^2 operations. Nested loop: for each row of the inner table, use an index to retrieve matching rows of the outer table. > 2n operations. Merge join: single pass through indexes on join columns for both tables. 2n operations.
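The three join strategies can be sketched on small in-memory tables. This is an illustration under assumptions: tables are lists of (join key, payload) pairs, a dictionary stands in for the index, the merge join assumes both inputs are already sorted on the join key, and the table contents and function names are invented.

```python
def cartesian_join(left, right):
    """Compare every left row with every right row."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def index_nested_loop_join(left, right):
    """For each left row, use an index on the right table to fetch matches."""
    index = {}                       # dictionary standing in for a real index
    for rk, rv in right:
        index.setdefault(rk, []).append(rv)
    return [(lk, lv, rv) for lk, lv in left for rv in index.get(lk, [])]


def merge_join(left, right):
    """Single pass over two inputs that are both sorted on the join key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key, group_start = left[i][0], j
            while i < len(left) and left[i][0] == key:
                j = group_start      # replay the matching right-hand group
                while j < len(right) and right[j][0] == key:
                    out.append((key, left[i][1], right[j][1]))
                    j += 1
                i += 1
    return out


depts = [(10, "Sales"), (20, "R&D"), (30, "Ops")]
emps = [(10, "Ann"), (10, "Bob"), (20, "Cy"), (40, "Di")]
expected = [(10, "Sales", "Ann"), (10, "Sales", "Bob"), (20, "R&D", "Cy")]
print(merge_join(depts, emps) == expected)  # True
```

All three return the same rows; they differ only in how many rows they touch to get there.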

60 Join Order. For JOIN queries, the “outer” table is accessed first, the “inner” second. The order for joining tables must be selected: most selective first, least costly joins first.

61 Query Transformations. Queries and subqueries may be transformed. We’ll ignore this for now and look at the bigger picture.

62 Database Statistics. The system catalog includes various database statistics: max and min values, cardinality of each table, data distribution. Statistics must be updated.