©Silberschatz, Korth and Sudarshan, Database System Concepts, Chapter 21:


Indices and Queries

Indices
- Sequential, hash
- Unique and non-unique
- Primary
- Simple and composite
- Advantages and disadvantages of indices
Declaration in SQL:
- CREATE UNIQUE INDEX name ON table(a1, ..., an)

Indices and Queries
Select A1, ..., An from T where Ai = x
- Hash indices are best in these cases.
Select A1, ..., An from T where Ai < y
- Sequential indices are best in these cases.
Select A1, ..., An from T where Ai = x AND Aj = y
- We can use simple indices on Ai and on Aj.
- We can use a composite index on (Ai, Aj).
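The contrast between the equality query and the range query can be sketched in Python, assuming a dict as a stand-in for a hash index and a sorted list searched with `bisect` as a stand-in for a sequential (ordered) index. The rows and the names `hash_index`, `ordered_index`, `lookup_eq` and `lookup_lt` are illustrative and do not come from the slides.

```python
import bisect

# Hypothetical table T: row id -> value of attribute Ai.
rows = {0: 500, 1: 1000, 2: 750, 3: 1000, 4: 200}

# Hash-style index: value -> list of row ids. Supports equality probes only.
hash_index = {}
for rid, value in rows.items():
    hash_index.setdefault(value, []).append(rid)

# Ordered index: (value, row id) pairs kept sorted on the value.
ordered_index = sorted((value, rid) for rid, value in rows.items())


def lookup_eq(x):
    """where Ai = x: a single hash probe."""
    return sorted(hash_index.get(x, []))


def lookup_lt(y):
    """where Ai < y: one binary search, then a scan of the sorted prefix."""
    end = bisect.bisect_left(ordered_index, (y,))
    return sorted(rid for _, rid in ordered_index[:end])
```

The hash structure has no notion of order, so a range condition would force it to visit every bucket, while the sorted structure answers the range with one search plus a contiguous scan.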

Basic Concepts
Indexing mechanisms are used to speed up access to desired data.
- E.g., author catalog in a library
Search key: an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
- Ordered indices: search keys are stored in sorted order
- Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”

Index Evaluation Metrics
Indexing techniques are evaluated on the basis of:
- Access types supported efficiently, e.g., records with a specified value in the attribute, or records with an attribute value falling in a specified range of values
- Access time
- Insertion time
- Deletion time
- Space overhead

Ordered Indices
In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in a library.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
- Also called clustering index
- The search key of a primary index is usually but not necessarily the primary key.
Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
Index-sequential file: ordered sequential file with a primary index.

Hash Indices
Hashing can be used not only for file organization, but also for index-structure creation.
A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
Strictly speaking, hash indices are always secondary indices:
- If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary.
- However, we use the term hash index to refer to both secondary index structures and hash-organized files.

[Figure: Example of Hash Index]

Comparison of Ordered Indexing and Hashing
- Cost of periodic re-organization
- Relative frequency of insertions and deletions
- Is it desirable to optimize average access time at the expense of worst-case access time?
- Expected type of queries:
  - Hashing is generally better at retrieving records having a specified value of the key.
  - If range queries are common, ordered indices are to be preferred.

Index Definition in SQL
Create an index:
  create index <index-name> on <relation-name> (<attribute-list>)
E.g.: create index b-index on branch(branch-name)
Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.
- Not really required if the SQL unique integrity constraint is supported
To drop an index:
  drop index <index-name>
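The index statements above can be tried out with a small sketch, assuming SQLite through Python's standard `sqlite3` module. The table layout, the second column and the city values are illustrative, and underscores replace the hyphens in the slide's names because unquoted SQL identifiers cannot contain hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text)")

# A unique index makes branch_name behave as a candidate key.
conn.execute("create unique index b_index on branch(branch_name)")
conn.execute("insert into branch values ('Perryridge', 'Horseneck')")

try:
    # Second row with the same search-key value: rejected by the unique index.
    conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# After dropping the index the constraint is gone and the insert succeeds.
conn.execute("drop index b_index")
conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
row_count = conn.execute("select count(*) from branch").fetchone()[0]
```

Other database systems may differ in details such as index naming scope, so this is only a demonstration of the general behaviour.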

Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
  select account-number from account where branch-name = "Perryridge" and balance = 1000
Possible strategies for processing the query using indices on single attributes:
1. Use the index on branch-name to find accounts at the Perryridge branch; test balance = 1000.
2. Use the index on balance to find accounts with balances of $1000; test branch-name = "Perryridge".
3. Use the branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly use the index on balance. Take the intersection of both sets of pointers obtained.
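Strategy 3 can be sketched in Python as a set intersection. This is a minimal illustration, assuming each single-attribute index maps a value to a set of record pointers (here, list positions); the sample accounts and the names `build_index` and `strategy_3` are hypothetical.

```python
# Hypothetical account rows: (account_number, branch_name, balance).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 1000),
    ("A-215", "Mianus", 1000),
    ("A-217", "Perryridge", 1000),
]


def build_index(rows, column):
    """Map each value of the given column to the set of row positions (pointers)."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[column], set()).add(pos)
    return index


branch_index = build_index(accounts, 1)
balance_index = build_index(accounts, 2)


def strategy_3(branch, balance):
    """Intersect the two pointer sets, then fetch only the matching records."""
    pointers = branch_index.get(branch, set()) & balance_index.get(balance, set())
    return sorted(accounts[p][0] for p in pointers)
```

Only the records in the intersection are fetched, but both pointer sets still have to be read in full, which is what a combined index avoids.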

Indices on Multiple Attributes
Suppose we have an index on the combined search key (branch-name, balance).
With the where clause
  where branch-name = "Perryridge" and balance = 1000
the index on the combined search key will fetch only records that satisfy both conditions.
Using separate indices is less efficient: we may fetch many records (or pointers) that satisfy only one of the conditions.
Can also efficiently handle
  where branch-name = "Perryridge" and balance < 1000
But cannot efficiently handle
  where branch-name < "Perryridge" and balance = 1000
- May fetch many records that satisfy the first but not the second condition.
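Why the combined index handles one of these clauses well and the other poorly can be seen in a small Python sketch, assuming the index is a sorted list of (branch-name, balance, account-number) tuples searched with `bisect`. The entries and function names are hypothetical; each function also returns how many index entries it had to touch.

```python
import bisect

# Hypothetical composite index on (branch-name, balance), sorted lexicographically.
index = sorted([
    ("Brighton", 750, "A-217"),
    ("Downtown", 500, "A-101"),
    ("Mianus", 1000, "A-215"),
    ("Perryridge", 400, "A-102"),
    ("Perryridge", 900, "A-201"),
    ("Perryridge", 1000, "A-218"),
    ("Redwood", 1000, "A-222"),
])


def branch_eq_balance_lt(branch, limit):
    """branch-name = branch and balance < limit: one contiguous slice of the index."""
    lo = bisect.bisect_left(index, (branch,))
    hi = bisect.bisect_left(index, (branch, limit))
    return [acct for _, _, acct in index[lo:hi]], hi - lo


def branch_lt_balance_eq(branch, balance):
    """branch-name < branch and balance = balance: every entry below branch is scanned."""
    hi = bisect.bisect_left(index, (branch,))
    scanned = index[:hi]
    return [acct for _, bal, acct in scanned if bal == balance], len(scanned)
```

In the first case every touched entry is a result; in the second, matching balances are scattered across all smaller branch names, so most touched entries are discarded.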

Performance Tuning

Performance Tuning
Adjusting various parameters and design choices to improve system performance for a specific application.
Tuning is best done by
1. identifying bottlenecks, and
2. eliminating them.
Can tune a database system at 3 levels:
- Hardware, e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.
- Database system parameters, e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. System may have automatic tuning.
- Higher-level database design, such as the schema, indices and transactions (more later)

System Performance
Given the performance of today's processors, why do we still have performance problems?
Breakdown of the time spent in an optimal transaction:
- 10%: CPU
- 30%: network
- 30%: database
- 30%: idle (otherwise the queues start to “thrash”)
With distributed transactions, the picture is considerably worse.
[Figure labels: Network, Processor & Memory, I/O, Database, Log]
Typically, 5 to 20 I/Os per transaction

Bottlenecks
Performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components: these are called bottlenecks.
- E.g., 80% of the code may take up 20% of the time and 20% of the code takes up 80% of the time
- Worth spending most time on the 20% of the code that takes 80% of the time
Transactions request a sequence of services
- e.g., CPU, disk I/O, locks
Bottlenecks may be in hardware (e.g., disks are very busy, CPU is idle), or in software.
Removing one bottleneck often exposes another.
De-bottlenecking consists of repeatedly finding bottlenecks and removing them.
- This is a heuristic

Identifying Bottlenecks
With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served.
Can model the database as a queueing system with a queue for each service
- Transactions repeatedly do the following: request a service, wait in the queue for the service, and get serviced
Bottlenecks in a database system typically show up as very high utilizations (and correspondingly, very long queues) of a particular service
- E.g., disk vs. CPU utilization
- 100% utilization leads to very long waiting time
- Rule of thumb: design the system for about 70% utilization at peak load
- Utilization over 90% should be avoided
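The link between utilization and waiting time can be made concrete with a rough sketch. It assumes a single-server M/M/1 queue, which is a textbook queueing model and not something the slides specify; under that assumption the mean time a request spends at a service is the service time divided by (1 - utilization). The function name is illustrative.

```python
def response_time_factor(utilization):
    """Mean time in system divided by service time, under the M/M/1 assumption."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Print the slowdown at a few utilization levels.
for u in (0.5, 0.7, 0.9, 0.99):
    print(f"utilization {u:.0%}: {response_time_factor(u):.1f}x service time")
```

Under this model a service at 70% utilization responds in roughly 3.3 service times, at 90% in about 10, and the factor grows without bound as utilization approaches 100%, which is consistent with the rules of thumb above.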

[Figure: Queues In A Database System]

Tunable Parameters
- Tuning of hardware
- Tuning of schema
- Tuning of indices
- Tuning of materialized views

Tuning of Hardware
Even well-tuned transactions typically require a few I/O operations
- A typical disk supports about 100 random I/O operations per second
- Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)
Number of I/O operations per transaction can be reduced by keeping more data in memory
- If all data is in memory, I/O is needed only for writes
- Keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost
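The n/50 figure follows from dividing the total I/O demand by the per-disk capacity. A minimal Python sketch of that arithmetic, using the slide's figures as defaults and a hypothetical function name:

```python
import math


def disks_needed(tps, ios_per_txn=2, ios_per_disk_per_sec=100):
    """Minimum number of disks to sustain tps transactions per second, ignoring skew."""
    return math.ceil(tps * ios_per_txn / ios_per_disk_per_sec)
```

For example, `disks_needed(1000)` gives 20, matching 1000/50, and halving the I/O per transaction through better caching halves the disk count.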

Hardware Tuning: Five-Minute Rule
Question: which data to keep in memory?
- If a page is accessed n times per second, keeping it in memory saves
    n * (price-per-disk-drive / accesses-per-second-per-disk)
- Cost of keeping a page in memory:
    price-per-MB-of-memory / pages-per-MB-of-memory
- Break-even point: the value of n for which the above costs are equal
  - If accesses are more frequent, the saving is greater than the cost
- Solving the above equation with current disk and memory prices leads to the 5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes, it should be kept in memory
  - (by buying sufficient memory!)
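The break-even computation can be sketched in Python by setting the saving equal to the memory cost and solving for n. The prices in the usage note are hypothetical values chosen only so that the arithmetic is easy to follow; they are not taken from the slides.

```python
def break_even_interval_seconds(price_per_disk, accesses_per_sec_per_disk,
                                price_per_mb_memory, pages_per_mb):
    """Seconds between accesses at which caching a page costs as much as it saves."""
    # Disk cost of sustaining one access per second.
    cost_per_access_per_sec = price_per_disk / accesses_per_sec_per_disk
    # Memory cost of holding one page.
    cost_per_page_in_memory = price_per_mb_memory / pages_per_mb
    # Break-even access rate n (accesses per second), then its reciprocal.
    n = cost_per_page_in_memory / cost_per_access_per_sec
    return 1.0 / n
```

With a hypothetical 300-unit disk doing 100 accesses per second and memory at 2.56 units per MB holding 256 pages per MB, the break-even interval is about 300 seconds, i.e. five minutes. Pages touched more often than that are cheaper to keep in memory.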

Hardware Tuning: One-Minute Rule
For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1 MB of data at a time:
- 1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory
Prices of disk and memory have changed greatly over the years, but the ratios have not changed much
- So the rules remain the 5-minute and 1-minute rules, not 1-hour or 1-second rules!

RAID Levels
- RAID 0: nonredundant striping (block striping)
- RAID 1: mirrored disks (block striping)
- RAID 3: bit-interleaved parity
- RAID 5: block-interleaved distributed parity

Hardware Tuning: Choice of RAID Level
To use RAID 1 or RAID 5?
- Depends on the ratio of reads and writes
- RAID 5 requires 2 block reads and 2 block writes to write out one data block
If an application requires r reads and w writes per second
- RAID 1 requires r + 2w I/O operations per second
- RAID 5 requires r + 4w I/O operations per second
For reasonably large r and w, this requires lots of disks to handle the workload
- RAID 5 may require more disks than RAID 1 to handle the load!
- The apparent saving in number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
Rule of thumb: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise
- If you need more disks to handle the I/O load, just mirror them, since disk capacities these days are enormous!
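The two I/O formulas can be compared directly in a short Python sketch. The per-disk rate of 100 I/O operations per second is the figure used on the hardware tuning slide; the function names and the sample workload are illustrative.

```python
import math


def raid1_ios(r, w):
    """Each write goes to both mirrors: r + 2w I/O operations per second."""
    return r + 2 * w


def raid5_ios(r, w):
    """Each write needs 2 block reads and 2 block writes: r + 4w."""
    return r + 4 * w


def disks_for_load(ios_per_sec, ios_per_disk_per_sec=100):
    """Disks needed to absorb the I/O load, ignoring capacity and skew."""
    return math.ceil(ios_per_sec / ios_per_disk_per_sec)
```

For a hypothetical workload of r = 1000 and w = 500, RAID 1 needs 2000 I/O operations per second (20 disks) and RAID 5 needs 3000 (30 disks), so the I/O load rather than the storage capacity decides the disk count.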

Tuning the Database Design
Schema tuning
- Vertically partition relations to isolate the data that is accessed most often, so that only needed information is fetched.
  - E.g., split account into two: (account-number, branch-name) and (account-number, balance). Branch-name need not be fetched unless required.
- Improve performance by storing a denormalized relation
  - E.g., store the join of account and depositor; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
  - Price paid: more space, and more work for the programmer to keep the relation consistent on updates
  - Better to use materialized views (more on this later)
- Cluster together on the same disk page records that would match in a frequently required join, so that the join can be computed very efficiently when required.

Tuning the Database Design (Cont.)
Index tuning
- Create appropriate indices to speed up slow queries/updates
- Speed up slow updates by removing excess indices (tradeoff between queries and updates)
- Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries
- Choose which index to make clustered
Index tuning wizards look at the past history of queries and updates (the workload) and recommend which indices would be best for the workload.