Designing for Performance

Announcement: The 3rd class test is coming up soon. Open book. It will cover the chapter on Design Theory of Relational Databases. (Probably Friday next week, so keep an eye out towards the end of this week.)

This week's lectures:
– Physical design of databases, including:
  – Some implementation details
  – The evaluation of queries
  – Optimisation techniques

Some implementation details

Two goals in any software system:
(1) Correctness
(2) Efficiency
So far we have studied (1). Now we will consider (2).

Some implementation details
* DB files are typically very large: they do not fit into main memory.
* They are stored on disk, and the DBMS loads pages into memory only when they are needed.
* Fetching a page from disk takes around 1000 times longer than the processing time required for accessing the data on that page. (And processors keep getting faster!)
* Inside main memory the DBMS maintains a buffer pool of frames, each holding one page. The buffer pool is shared across a number of users.

Hence:
a. The number of pages needed to store a table should be minimised. This can be achieved either by:
   - having records which are small (so that many of them fit onto one page), or
   - having fewer records overall.
b. An operation which requires the DBMS to look at every record of a large table is expensive.
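Point (a) can be made concrete with a back-of-the-envelope sketch. It assumes fixed-size records, an 8 KB page (a hypothetical value; real systems differ and also reserve part of each page for headers), and that a full scan costs one page fetch per page. The function names are made up for illustration.

```python
import math

PAGE_BYTES = 8192  # assumed page size; real DBMSs differ and keep some space for headers


def records_per_page(record_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """How many fixed-size records fit on one page (ignoring page headers)."""
    return page_bytes // record_bytes


def pages_needed(n_records: int, record_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Pages required to store the table; a full scan has to fetch every one of them."""
    return math.ceil(n_records / records_per_page(record_bytes, page_bytes))


# Halving the record size roughly halves the number of pages a full scan must fetch.
wide = pages_needed(100_000, 200)
narrow = pages_needed(100_000, 100)
print(wide, narrow)  # 2500 1235
```

Since a page fetch dominates the cost, the page count is a reasonable first proxy for how expensive a full scan (point b) will be.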

The evaluation of queries
* Selection operation:
  - selects certain rows. In general this needs a linear scan.
  - In simple cases the system may know on which page to find the relevant row without looking at each row! This is achieved by indexing (to be discussed later).
* Natural join:
  - It is the main time-consuming operation.
  - A naive implementation compares each row of T1 with each row of T2 (to check agreement on the common attribute), i.e. |T1| × |T2| comparisons.
  - Optimisation seeks to reduce the number of comparisons, but this works only if the resulting tables are reasonably small.
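A minimal sketch of the naive natural join, counting comparisons, next to a hash join as one common way of reducing them (the hash join is a standard technique named here for illustration, not necessarily the one the course DBMS uses). The tables and attribute values are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(t1, t2, attr):
    """Naive natural join on one common attribute: compare every pair of rows."""
    comparisons = 0
    result = []
    for r1 in t1:
        for r2 in t2:
            comparisons += 1
            if r1[attr] == r2[attr]:
                result.append({**r1, **r2})
    return result, comparisons


def hash_join(t1, t2, attr):
    """Build a hash table on t2's join attribute, then probe it once per row of t1."""
    buckets = defaultdict(list)
    for r2 in t2:
        buckets[r2[attr]].append(r2)
    probes = 0
    result = []
    for r1 in t1:
        probes += 1
        for r2 in buckets.get(r1[attr], []):
            result.append({**r1, **r2})
    return result, probes


# Hypothetical example tables sharing the attribute "cid".
courses = [{"cid": c, "name": n} for c, n in [("C1", "Databases"), ("C2", "Networks"), ("C3", "AI")]]
enrolments = [{"cid": c, "student": s} for c, s in [("C1", "s1"), ("C1", "s2"), ("C3", "s3"), ("C4", "s4")]]

nl_rows, nl_cost = nested_loop_join(courses, enrolments, "cid")
h_rows, h_cost = hash_join(courses, enrolments, "cid")
print(len(nl_rows), nl_cost)  # 3 12  (3 matches, 3 * 4 comparisons)
print(len(h_rows), h_cost)    # 3 3   (same matches, 3 probes after one pass to build the table)
```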

Optimisation techniques
Technique 1: Pre-computing to avoid queries
Technique 2: 'Horizontal' splitting, that is, separating out subsets of rows into different tables
- ...based on semantic criteria:
  e.g. time: keep last year's sales separate
  e.g. status: move completed sales to an archive
  e.g. location: different tables for customers from different countries
- good if common queries will require access to only one of the tables (otherwise it is counterproductive)

Optimisation techniques
Technique 2b: 'Vertical' splitting
- by duplicating the key attribute, the table is made narrower by separating out some attributes into a different table.
- e.g. the Uni student table has ~50 attributes, but few are used frequently (e.g. UCAS code & entry qualifications are very rarely needed once the student has arrived).
- here too, for the split to be effective it is important that it is based on a use-case analysis, so that queries requiring the tables to be joined back together are rare.
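A small in-memory sketch of both kinds of splitting. All field names and values are hypothetical; a real DBMS would store each fragment as its own table (or use its own partitioning features).

```python
from collections import defaultdict

# Hypothetical sales rows.
sales = [
    {"sale_id": 1, "year": 2022, "amount": 40},
    {"sale_id": 2, "year": 2023, "amount": 25},
    {"sale_id": 3, "year": 2023, "amount": 70},
]

# Splitting by rows: each fragment keeps all columns but only some rows (here: by year).
sales_by_year = defaultdict(list)
for row in sales:
    sales_by_year[row["year"]].append(row)

# Hypothetical student rows with frequently used ("hot") and rarely used ("cold") attributes.
students = [
    {"sid": 7, "name": "Ann", "email": "ann@example.org", "ucas_code": "G400", "entry_quals": "AAA"},
    {"sid": 9, "name": "Bob", "email": "bob@example.org", "ucas_code": "G401", "entry_quals": "AAB"},
]
HOT = ("name", "email")
COLD = ("ucas_code", "entry_quals")

# Splitting by columns: both narrower tables repeat the key "sid" so rows can be re-joined.
students_main = {s["sid"]: {a: s[a] for a in HOT} for s in students}
students_admissions = {s["sid"]: {a: s[a] for a in COLD} for s in students}


def full_student(sid):
    """Re-join the two column fragments; a good split keeps this the rare case."""
    return {"sid": sid, **students_main[sid], **students_admissions[sid]}


print(len(sales_by_year[2023]), full_student(7)["ucas_code"])  # 2 G400
```

A query such as "this year's sales" now touches only one fragment, and a query needing only name and email never reads the admissions columns.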

Optimisation Technique 3: Indexes
- aim to avoid linear search.
- the principle is the same as for searching in a sorted array: O(log n) steps [vs O(n) steps in an unsorted array].
But a list of DB records can only be kept in order according to one attribute, not several!
Analogy with phone books: a phone book is sorted by name, so finding the owner of a given phone number means reading the whole book.
Solution: create an index into the data that allows fast access according to another (combination of) attribute(s).
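The O(log n) vs O(n) contrast can be checked by counting steps. This is a plain sketch over a sorted Python list, standing in for a file sorted on one attribute.

```python
def linear_search(arr, target):
    """Scan from the start; returns (index or -1, number of elements inspected)."""
    steps = 0
    for i, x in enumerate(arr):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps


def binary_search(arr, target):
    """Requires arr sorted ascending; halves the remaining range at every step."""
    lo, hi, steps = 0, len(arr) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


keys = list(range(1_000_000))        # a sorted "file" of one million keys
print(linear_search(keys, 999_999))  # (999999, 1000000)
print(binary_search(keys, 999_999))  # found in at most 20 steps, since 2**20 > 1_000_000
```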

Indexing
The index does not repeat the full information but holds pointers instead. Indexes are stored in separate files. They contain:
- the attribute values according to which we want fast access, and
- pointers to the actual database records.
There are 2 types of indexes:
* Tree-based [a version of the binary search idea]
  – The most commonly used data structure is a B+ tree
  – This makes speed-up possible for range search too
* Hash-based
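The idea of an index file as "attribute values plus pointers" can be sketched as a sorted list of (value, record id) pairs, searched with binary search. Here a record's position in a Python list plays the role of its pointer; the staff data and the class name SortedIndex are hypothetical. A real tree-based index organises such entries into B+ tree pages rather than one flat list.

```python
import bisect

# Hypothetical data file: a record's list position acts as its pointer (record id).
staff = [
    {"name": "Ann", "office": "G.14", "salary": 42000},
    {"name": "Bob", "office": "B.02", "salary": 31000},
    {"name": "Cy", "office": "G.14", "salary": 38000},
    {"name": "Di", "office": "A.10", "salary": 55000},
]


class SortedIndex:
    """Index file sketch: (attribute value, record id) pairs kept in sorted order."""

    def __init__(self, records, attr):
        self.entries = sorted((rec[attr], rid) for rid, rec in enumerate(records))
        self.keys = [value for value, _ in self.entries]

    def lookup(self, value):
        """Equality search is a range search with low == high."""
        return self.range(value, value)

    def range(self, low, high):
        """Range search low <= value <= high: matching entries sit next to each other."""
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [rid for _, rid in self.entries[start:end]]


salary_idx = SortedIndex(staff, "salary")
office_idx = SortedIndex(staff, "office")
print([staff[rid]["name"] for rid in salary_idx.range(30000, 40000)])  # ['Bob', 'Cy']
print([staff[rid]["name"] for rid in office_idx.lookup("G.14")])       # ['Ann', 'Cy']
```

Two indexes on the same data give fast access by two different attributes, which is exactly what keeping the file itself sorted cannot do.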

Tree-based indexes
Syntax
- no standardised syntax
- in PostgreSQL the syntax is:
  CREATE INDEX some_name ON staff USING BTREE (office);
You will not need the name of the index unless you want to get rid of it again:
  DROP INDEX some_name;
By default PostgreSQL automatically builds and maintains a B+ tree index on the primary key of each table (and on any UNIQUE constraint).

* dynamic index structure
* high fan-out F (fan-out means the number of child nodes per node) ==> depth rarely exceeds 3-4 levels
* each non-root node holds between d and 2d entries (d is called 'the order of the tree')
* insert and delete are efficient: O(log_F N) page accesses, where N is the number of leaf pages
* allows range-based search too!
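Why the depth stays at 3-4: each level above the leaves needs only ceil(nodes below / F) nodes, so the number of levels grows like log_F N. The sketch assumes every inner node is completely full with F = 100 children (a hypothetical value; in practice nodes are only partly full, so the effective fan-out is somewhat lower).

```python
def btree_levels(n_leaf_pages: int, fanout: int) -> int:
    """Levels from the root down to the leaves, assuming completely full inner nodes.

    Each level above the leaves needs ceil(nodes_below / fanout) nodes; stop at one root.
    """
    levels = 1  # the leaf level itself
    nodes = n_leaf_pages
    while nodes > 1:
        nodes = -(-nodes // fanout)  # ceiling division
        levels += 1
    return levels


for leaves in (100, 10_000, 1_000_000):
    print(leaves, btree_levels(leaves, fanout=100))
# 100 2
# 10000 3
# 1000000 4
```

A lookup reads one page per level, so even a million leaf pages are reached in about four page fetches.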

Hash-based indexes
Hash function: takes some input data (the search key value) and returns a number that identifies the bucket where the index entry, and hence the pointer to the record, can be found.
This computation is very fast: for equality lookups it is more efficient than a B+ tree.
But: similar inputs do NOT lead to similar hash values ==> not good for range-based search.
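A minimal static hash index, assuming a fixed number of buckets and zlib.crc32 as the hash function (an arbitrary choice for the sketch; DBMSs use their own hash functions). Equality search inspects one bucket, while a range search gets no help from hashing and must examine all of them. The class name and data are hypothetical.

```python
import zlib


class HashIndex:
    """Static hash index sketch: a fixed number of buckets holding (key, record id) entries."""

    def __init__(self, n_buckets: int = 8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket_of(self, key) -> int:
        # crc32 is deterministic across runs (unlike Python's salted str hash).
        return zlib.crc32(str(key).encode("utf-8")) % len(self.buckets)

    def insert(self, key, rid) -> None:
        self.buckets[self._bucket_of(key)].append((key, rid))

    def lookup(self, key):
        """Equality search: compute the bucket, then check only that bucket."""
        return [rid for k, rid in self.buckets[self._bucket_of(key)] if k == key]

    def range_scan(self, low, high):
        """Range search: hashing gives no ordering, so every bucket has to be examined."""
        return sorted(rid for bucket in self.buckets for k, rid in bucket if low <= k <= high)


idx = HashIndex()
for rid, office in enumerate(["G.14", "B.02", "G.14", "A.10"]):  # hypothetical data
    idx.insert(office, rid)
print(idx.lookup("G.14"))              # [0, 2]
print(idx.range_scan("A.00", "B.99"))  # [1, 3], found by scanning all 8 buckets
```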

* A hash-based index is typically used when a table has alternative search keys (e.g. cid, bc).
Note: the term “search key” is a different concept from that of primary key or secondary key. Do not confuse them. A search key is an attribute with respect to which we want fast access.
* Only one search key can be primary: the one according to which the original table is physically ordered. Usually this is the primary key, whose index is created automatically by the DBMS when you create the table.
* Other search keys we call “secondary”: to create these we need to tell the system to build index files for them.

Before you jump to having lots of indexes
Creating and maintaining indexes costs some effort. The index files need to be adjusted:
* when a new record is entered
* when a record is deleted
* when an indexed attribute of a record is updated
Hence, for a table with lots of “traffic” (i.e. a table that is likely to be modified a lot), having many indexes is less useful. To decide, some experimentation is needed: you can create the indexes and then drop them later if you find they aren't that useful.
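A toy model of the maintenance cost: every insert or delete has to touch the table plus each index. The count of "structures touched" is only a crude proxy (real costs also depend on page I/O, logging and locking), and the class and workload below are hypothetical, borrowing attribute names from the course example.

```python
class IndexedTable:
    """Toy table: a dict of rows plus one dict-based index per indexed attribute."""

    def __init__(self, indexed_attrs):
        self.rows = {}
        self.indexes = {attr: {} for attr in indexed_attrs}
        self.writes = 0  # structures touched: 1 for the table + 1 per index

    def insert(self, rid, row):
        self.rows[rid] = row
        self.writes += 1
        for attr, index in self.indexes.items():
            index.setdefault(row[attr], set()).add(rid)
            self.writes += 1

    def delete(self, rid):
        row = self.rows.pop(rid)
        self.writes += 1
        for attr, index in self.indexes.items():
            entries = index[row[attr]]
            entries.discard(rid)
            if not entries:  # drop index entries that no longer point anywhere
                del index[row[attr]]
            self.writes += 1


def workload(indexed_attrs, n=1000):
    """Insert n hypothetical rows, then delete half of them; return structures touched."""
    t = IndexedTable(indexed_attrs)
    for i in range(n):
        t.insert(i, {"cid": f"C{i % 50}", "year": 2000 + i % 20, "numbers": i % 300})
    for i in range(0, n, 2):
        t.delete(i)
    return t.writes


print(workload([]))                          # 1500: table only
print(workload(["cid", "year", "numbers"]))  # 6000: each change also updates 3 indexes
```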

Technique: De-normalisation

Remember Achim's example tables.
So how do we deal with the FD cid, year → numbers?
Applying what we learned, we decompose the schema lecturing(cid, sid, year, numbers) into:
  course_instances(cid, year, numbers)
  taught_by(cid, year, sid)
course_instances is now a weak entity of courses. (courses has additional info: name, level, semester, bc)
Could we consider putting these additional fields into course_instances too?
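The trade-off behind this question can be sketched with Python's built-in sqlite3 module as a stand-in DBMS. The course data is hypothetical, and course_instances_denorm is a made-up name for the de-normalised variant. Copying the courses fields into course_instances removes the join from common queries, but repeats name, level, semester and bc in every yearly row, so a single change to a course must now update several rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE courses (cid TEXT PRIMARY KEY, name TEXT, level INTEGER,
                          semester INTEGER, bc TEXT);
    CREATE TABLE course_instances (cid TEXT REFERENCES courses(cid), year INTEGER,
                                   numbers INTEGER, PRIMARY KEY (cid, year));
    INSERT INTO courses VALUES ('C1', 'Databases', 2, 1, 'B1');
    INSERT INTO course_instances VALUES ('C1', 2021, 120), ('C1', 2022, 135), ('C1', 2023, 150);
""")

# Normalised: every query that needs the course name has to join.
normalised = conn.execute("""
    SELECT ci.cid, ci.year, ci.numbers, c.name
    FROM course_instances AS ci JOIN courses AS c ON c.cid = ci.cid
    ORDER BY ci.year
""").fetchall()

# De-normalised: copy the course attributes into each instance row (hypothetical table).
conn.execute("""
    CREATE TABLE course_instances_denorm AS
    SELECT ci.cid, ci.year, ci.numbers, c.name, c.level, c.semester, c.bc
    FROM course_instances AS ci JOIN courses AS c ON c.cid = ci.cid
""")
denormalised = conn.execute(
    "SELECT cid, year, numbers, name FROM course_instances_denorm ORDER BY year"
).fetchall()

# The price: renaming the course now touches one row per year instead of one row.
renamed = conn.execute(
    "UPDATE course_instances_denorm SET name = 'Database Systems' WHERE cid = 'C1'"
).rowcount
print(normalised == denormalised, renamed)  # True 3
```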