
I/O Trap
- Reading existing data
- Changing existing data
  – Updating existing records
  – Adding new records
  – Deleting records
- All these involve going to disk => the slowest device by a considerable margin
- Minimise / reschedule physical I/O

Issues in Database Performance
- Read / write performance is a hardware issue => throw money at it
- Performance of DB = ability of the engine to locate data
- Factors affecting speed of retrieval
  – Cache (sizing of objects)
  – Access method (table scan / indexes…)
  – Contention between processes
  – Indirect processes (roll back, archiving…)

Reading data in Oracle
1 - Determine how to access the block where the data is located
  – read the data dictionary (DD), stored in a separate part of the DB
  – DD may be loaded in cache (aka row cache) to limit I/O activity
2 - DD indicates the preferred access method for the block
  – B-tree index, partitioning, hash clusters etc…
3 - Search begins, either as a full scan or with an index, until the data is found

Designing DB for reading
- Supply methods for high-precision access to data
- But some queries will defeat these strategies
- E.g. credit card transactions: a monthly report of scattered items
- No real solution => take such queries off-line

Changing data
- Oracle makes hard work of changes
  – Rollback data (immediate)
  – Log files (long term)
- Changed blocks are read and updated in the buffer
- They are released to disk as the buffer is cleared
- But rollback information generates most of the I/O operations
- In sensitive environments, simultaneous archiving makes it worse (ARCHIVELOG mode)

Indexing Problems
- Super-fast indexes need updating as data is changed => DB slows down
- More complex index = more complex update mechanism = more rollback…
- As the DB physical structure degrades, so does the index (e.g. split blocks)
- Performance decreases over time
- Rebuild needed (which interferes with operations)
- INDEX_STATS tells you how big the index has grown

Side effects
- Definition of DB = consistency
- A reading process may need an older version of the data
  – Need to create a private version of the data
  – Conclusion: processes that should be read-only require writes
- Exercise: try to describe all the operations required to finish executing a query on old data

Effects of read consistency
- Must find an older version of the data
- Must apply roll back
- Must find the old roll back block (I/O)
- Roll back index (I/O)
- Find data (I/O)
- Roll back data
- Read old data
- Reverse all changes
- Significant I/O implications + buffer full of old stuff
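The rollback steps listed on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions: the row, the undo list and the change numbers are hypothetical, and Oracle's real rollback segments are organised very differently. The point it shows is that a reader wanting old data works on a private copy and reverses every change made after its query started.

```python
# Hypothetical current row and its undo records, newest change first.
# Each undo record is (change_number, column, old_value): all names are assumptions.
current_row = {"balance": 130}
undo_records = [(30, "balance", 120), (20, "balance", 100), (10, "balance", 50)]


def consistent_read(row, undo, query_change_number):
    """Return the row as it looked when the query started."""
    snapshot = dict(row)  # private version of the data; the real row is untouched
    for change_number, column, old_value in undo:
        if change_number <= query_change_number:
            break  # this change was already visible to the query
        snapshot[column] = old_value  # reverse a change made after the query began
    return snapshot


old_view = consistent_read(current_row, undo_records, 15)
latest_view = consistent_read(current_row, undo_records, 35)
```

A query that started at change number 15 has to walk back through two undo records, which is where the extra I/O and the buffer full of old data come from.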

Conclusions
- Writers and readers DO interfere with each other
- The mechanisms used by Oracle to bypass locking have performance side effects
- Performance comes with minimising I/O
  – i.e. with good access techniques
  – Precision of data location
  – Physical proximity of related data (cf: caching)
- However: techniques to reduce I/O numbers tend to reduce the speed of access!

Indexing example (see figure): alphabetic search using a B-tree index for name = Oscar Smith
- Split the key range into sections and read the start entries until they overshoot the target under letter “S”: “Smith, N” is the one to follow
- Then look for the page with “Smith, N” in its header
- Then scan for the actual entry in the index
- Read the pointer
- Move to the table
- Read the row
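The search steps above can be sketched in a few lines. This is a minimal illustration, not Oracle's implementation: the names, the rowid strings and the two-level layout (one branch block holding the first key of each leaf block) are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical branch block: the first key of each leaf block, in order.
branch = ["Adams, A", "Jones, B", "Smith, N", "Taylor, C"]

# Hypothetical leaf blocks: (key, rowid) pairs, sorted within and across blocks.
leaves = [
    [("Adams, A", "rowid-01"), ("Brown, K", "rowid-02")],
    [("Jones, B", "rowid-03"), ("Patel, R", "rowid-04")],
    [("Smith, N", "rowid-05"), ("Smith, O", "rowid-06")],
    [("Taylor, C", "rowid-07"), ("Young, D", "rowid-08")],
]


def lookup(key):
    """Return (leaf block number, rowid) for key, or None if it is absent."""
    # Read branch entries until one overshoots the key, then step back one.
    leaf_no = bisect_right(branch, key) - 1
    if leaf_no < 0:
        return None  # key sorts before the first indexed entry
    # Scan the chosen leaf block for the actual entry and read its pointer.
    for entry_key, rowid in leaves[leaf_no]:
        if entry_key == key:
            return leaf_no, rowid
    return None


result = lookup("Smith, O")
```

Here the branch entry “Taylor, C” overshoots the target, so the search follows “Smith, N” into the third leaf block and reads the pointer stored next to “Smith, O”. Moving to the table with that pointer is the final I/O.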

About B-tree indexes
- Root and branch blocks = approx 2% of index (small)
- In frequent-hit situations, both blocks stay loaded in the data buffer all the time
- Then only 2 I/Os may be required:
  – One to read the leaf block
  – One to read the table
- In practice, reads in the index and in the table may require reading several blocks
  – E.g. several similar names
  – Index spans two (or more) leaf blocks
  – Each record is in a different data block

Creating and Using Indexes
- Important for live access, but even more so for querying with multiple tables
- Value matching is a costly process in an RDB
  – No pointers
  – Connection is made purely by comparison
  – Each row is compared with all other rows
- If there is a link between 2 huge tables, performance is low
- All RDBs use some form of indexing
- Some complexity is involved, as an index can reduce physical I/O while increasing logical I/O (i.e. CPU time)

B-tree indexing
- Creating an index means creating a table with X+1 columns
  – X = number of columns in the index
  – Rowid (added field) [table block + row]
- The index is then copied into consecutive blocks
- The PCTFREE parameter leaves space for growth of data (but a high value will generate many leaf blocks)
- Pointers to the previous and next leaf blocks are added in the header of each block

B-tree indexing (2)
- Then the branch layer is built, if the index is larger than one block
  – Collect all first entries + the block address of each leaf block
  – Write them into the first-level branch block (packed)
  – If the branch block is full, start a second level of branch blocks, etc.
- Room is saved in branch blocks:
  – No forward and backward pointers in branch blocks
  – Entries are “trimmed” to the bare minimum
  – First entries are omitted
- See figure 6.1

Syntax
CREATE INDEX name ON table_name (field1, field2, …) PCTFREE 80;
- Oracle has many utility programs to assess the performance of indexes: use INDEX_STATS (see handout)
- Practical problem: is it easy to create a new index for a large table? NO!

Updating indexes
- Index entries are NEVER changed
  – They are marked as deleted and re-inserted
- The space made available cannot be used until after the index is rebuilt
- Inserts that don’t fit split the block (rarely 50/50!)
- If a block becomes empty, it is marked as free, but is never removed
- Also, blocks never merge automatically

Some problems
- Some situations cannot be addressed with indexes
- E.g. in a FIFO processing situation (such as a queue), indexes will prove counterproductive
- The index may grow to stupid proportions even with a small error rate (unsuccessful processing of data)
- Every time a transaction is added or processed (deleted), the index must change

Alternative: Bitmap indexing
- Imagine the following query on a huge table: find customers living in London, with 2 cars and 3 children, occupying a 4-bed house
- A conventional index is not useful here. Why?
  – Too big
  – If the query changes in any way => new index needed
  – Maintaining a set of indexes for each query would just be too costly
- Use a bitmap (see tables 6.1, 6.2 and 6.3)

Bitmap indexes (2)
- Special for data warehouse type DBs
- Build one bitmap for each relevant parameter
- Combine bitmaps using the “and” SQL keyword
- Also possible to use “not”ing of a bitmap (see tables 6.4 and 6.5)
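To make the combination of bitmaps concrete, here is a minimal sketch. The five sample rows and the helper names are invented for illustration; a real bitmap index stores packed, compressed bits rather than Python lists.

```python
# Hypothetical customer rows (sample data invented for this sketch).
rows = [
    {"city": "London", "cars": 2, "children": 3},
    {"city": "Paris", "cars": 2, "children": 3},
    {"city": "London", "cars": 1, "children": 3},
    {"city": "London", "cars": 2, "children": 3},
    {"city": "Cork", "cars": 2, "children": 0},
]


def build_bitmap(column, value):
    """One bit per row: 1 where the row holds the value, 0 elsewhere."""
    return [1 if row[column] == value else 0 for row in rows]


def bitmap_and(a, b):
    """Rows that satisfy both conditions."""
    return [x & y for x, y in zip(a, b)]


def bitmap_not(a):
    """Rows that do NOT satisfy the condition."""
    return [1 - x for x in a]


london = build_bitmap("city", "London")
two_cars = build_bitmap("cars", 2)
three_children = build_bitmap("children", 3)

# London AND 2 cars AND 3 children
matches = bitmap_and(bitmap_and(london, two_cars), three_children)
matching_rows = [i for i, bit in enumerate(matches) if bit]

# NOT London AND 2 cars
not_london_two_cars = bitmap_and(bitmap_not(london), two_cars)
```

Each bitmap is built once per value, and any mix of conditions is answered by combining them, so a change to the query does not require a new index.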

Key points to remember
- What is the key advantage of a bitmap index?
- What situation does it best suit?
- Bitmaps can also be packed by Oracle compression features
- But their size is unpredictable. Why?

Example: table with 1,000,000 rows
- Bitmap on one column that can contain one of 8 different values (e.g. city names)
- First scenario: the data is such that all rows for the same city are stored together, 125,000 rows per city
  – Write the bitmap
  – Imagine what compression can be achieved
- Second scenario: the data is such that cities are in random order, but with the same number of rows for each
  – Same questions

Solution
- First scenario
  – Bitmap for first city: 125,000 ones and 875,000 zeros [trimmed off]
  – Size ~ 125,000 bits or approx 18 Kbytes
  – Full bitmap = 156 Kbytes
- Second scenario
  – Bitmap for first city: sequences of a 1 followed by 7 zeros, repeated 125,000 times
  – Size ~ 1,000,000 bits or approx 140 Kbytes
  – Full bitmap = 1.12 Mbytes
- But a B-tree index for such data would be around 12 MB
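The two scenarios can be checked with a short sketch that builds the first city's bitmap and counts its runs of identical bits. It only computes raw bit counts; the slide's Kbyte figures are somewhat higher than these raw values, presumably because of storage overhead, which is an assumption and is not modelled here.

```python
from itertools import groupby

ROWS = 1_000_000
CITIES = 8
PER_CITY = ROWS // CITIES  # 125,000 rows per city


def run_lengths(bits):
    """Collapse a bitmap into (bit, run length) pairs."""
    return [(bit, sum(1 for _ in group)) for bit, group in groupby(bits)]


# First scenario: all rows of the first city are stored together.
sorted_bitmap = [1] * PER_CITY + [0] * (ROWS - PER_CITY)

# Second scenario, as drawn on the slide: a 1 followed by 7 zeros, repeated.
scattered_bitmap = ([1] + [0] * (CITIES - 1)) * PER_CITY

sorted_runs = run_lengths(sorted_bitmap)
scattered_runs = run_lengths(scattered_bitmap)

# Raw sizes in bytes, before any overhead (8 bits per byte).
trimmed_bytes = PER_CITY // 8  # trailing zeros trimmed off
full_bytes = ROWS // 8         # nothing can be trimmed
```

The sorted bitmap collapses into two runs, while the scattered one needs 250,000 runs, which is why the compressed size of a bitmap depends on how the data happens to be ordered.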

Conclusion
- Bitmap indexes work best when combined
- They are very quick to build
  – Up to a million rows in about 10 seconds
- They work best with a limited number of values and high repetition
- Best way to deal with huge volumes => make a drastic selection of interesting rows before reading the table
- Warning: one entry in a bitmap = hundreds of records => locking can be crazy in OLTP systems => use for data warehouse type applications (no contention)

What Oracle says about it
- Use an index for queries with a low hit ratio, or when queries access less than 2 - 4% of the data
- Index maintenance is costly, so indexing “just in case” is silly
- Must analyse the type of data when deciding what kind of index to use
- Do NOT use columns with loads of changes in an index
- Use indexed fields in the WHERE clause
- Can also write queries with the NO_INDEX hint

Administering indexes
- Indexes degrade over time
- They should stabilise around 75% efficiency, but they don’t
- Run stats:
ANALYZE INDEX name VALIDATE STRUCTURE;
ANALYZE INDEX name COMPUTE STATISTICS;
ANALYZE INDEX name ESTIMATE STATISTICS SAMPLE 1 PERCENT;
- See table 6.6

Partitioned tables
- Partitions / partitioning / partitioned tables
- For very large tables
- Improve querying
- Easier admin
- Backup and recovery easier
- Optimiser knows when partitioning is used
- Can be used in SQL also

Creating a PT
CREATE TABLE fred (
  id NUMBER,
  name VARCHAR2(25),
  age NUMBER,
  CONSTRAINT fred_pk PRIMARY KEY (id)
)
PARTITION BY RANGE (age) (
  PARTITION part1 VALUES LESS THAN (21),
  PARTITION part2 VALUES LESS THAN (40),
  PARTITION part3 VALUES LESS THAN (MAXVALUE)
);

Warning
- The specification of a partition bound is exclusive
- E.g. PARTITION BY RANGE (name) (PARTITION part1 VALUES LESS THAN ('F') …) implies that F is excluded
- MAXVALUE is a catch-all that picks up anything that did not fit the earlier partitions
- Works for text as well as numbers
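The exclusive upper bound is easy to get wrong, so here is a small sketch that mimics how a row would be routed using the bounds of the FRED range-partition example (21, 40, MAXVALUE). The function name is hypothetical, and the last partition simply plays the role of MAXVALUE.

```python
from bisect import bisect_right

# Upper bounds of PART1 and PART2; PART3 stands in for VALUES LESS THAN (MAXVALUE).
upper_bounds = [21, 40]
partition_names = ["PART1", "PART2", "PART3"]


def partition_for(age):
    """Pick the first partition whose upper bound is strictly greater than age."""
    # bisect_right counts the bounds that are <= age, so a value equal to a
    # bound (e.g. 21) falls into the NEXT partition: the bound is exclusive.
    return partition_names[bisect_right(upper_bounds, age)]


placements = {age: partition_for(age) for age in (20, 21, 39, 40, 95)}
```

An age of exactly 21 lands in PART2, not PART1, and anything from 40 upwards is caught by the MAXVALUE partition.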

Hash partition
- Only in Oracle 8i and above
- Uses a numerical algorithm based on the partition key to determine where to place data
- Range partition = consecutive values are kept together
- Hash = consecutive values may be in different partitions
  – Also gives more partitions = reduces the risk of contention

What is Hash?
- Imagine an 8 GB table, split into 8 partitions of 1 GB
- There may be no intuitively clever way to split the data
- Or the obvious way is totally imbalanced
  – 1 partition of 7 GB
  – Huge variations in performance
- Randomise the breakdown of data so that objects are of similar size
  – Select one column
  – Select the number of chunks
  – Oracle does the rest!

Mechanics of hashing
- Each record is allocated to a bucket based on its key value, e.g. Name = Joe
- Applying the hashing function to the value Joe always returns the same bucket number, which is where the record is located
- E.g. using a prime number:
  – divide KEY by a prime number
  – if the key is text, translate it into a numeric value using ASCII codes
  – use the remainder of the division = address on the disk
  – if a record is already at the same address, use a pointer to an overflow area
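The prime-number scheme above can be sketched as follows. This is an illustration only: the prime 7, the sum-of-character-codes translation and the class name are assumptions for the example, and Oracle's own hash function is not documented in these slides.

```python
def text_to_number(key):
    """Translate text into a number by summing character codes (assumed scheme)."""
    return sum(ord(ch) for ch in key)


def bucket_for(key, prime=7):
    """Remainder of key / prime = bucket address."""
    number = key if isinstance(key, int) else text_to_number(key)
    return number % prime


class HashFile:
    """Hypothetical file with one record per address plus an overflow area."""

    def __init__(self, prime=7):
        self.prime = prime
        self.buckets = {}   # address -> (key, record)
        self.overflow = []  # records whose address was already taken

    def insert(self, key, record):
        address = bucket_for(key, self.prime)
        if address in self.buckets:
            self.overflow.append((key, record))  # collision: go to overflow
        else:
            self.buckets[address] = (key, record)
        return address

    def find(self, key):
        stored = self.buckets.get(bucket_for(key, self.prime))
        if stored is not None and stored[0] == key:
            return stored[1]
        for other_key, record in self.overflow:
            if other_key == key:
                return record
        return None


hash_file = HashFile()
joe_address = hash_file.insert("Joe", {"age": 30})
clash_address = hash_file.insert("eoJ", {"age": 41})  # same letters, same sum
```

With these assumptions “Joe” maps to 74 + 111 + 101 = 286, and 286 mod 7 = 6. The second key hashes to the same address, so it ends up in the overflow area and costs an extra lookup.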

Hash partition - SQL
CREATE TABLE fred (
  name VARCHAR2(25) PRIMARY KEY,
  age NUMBER,
  years_abroad NUMBER
)
PARTITION BY HASH (age)
PARTITIONS 2
STORE IN (part1_fred, part2_fred);
- The STORE IN clause is not compulsory

Sub-partitions
CREATE TABLE fred (
  name VARCHAR2(25) PRIMARY KEY,
  age NUMBER,
  years_abroad NUMBER
)
PARTITION BY RANGE (years_abroad)
SUBPARTITION BY HASH (name)
SUBPARTITIONS 5 (
  PARTITION part1 VALUES LESS THAN (1),
  PARTITION part2 VALUES LESS THAN (3),
  PARTITION part3 VALUES LESS THAN (6),
  PARTITION part4 VALUES LESS THAN (MAXVALUE)
);

Indexing partitions
- Performance requirements may mean partitioned tables should be indexed (separate issue)
CREATE INDEX fred_name ON fred (name)
LOCAL (PARTITION part1, PARTITION part2, PARTITION part3, PARTITION part4);
- LOCAL means create a separate index for each partition of the table
- The alternative is to create a global index with values from different partitions
- Global indexes cannot be created for hash partitions