Oracle Index study for Event TAG DB
M. Boschini, S. Della Torre


Outline
- TAG DB issues:
  - space usage: the INDEX accounts for ~ 50% of used space...
  - query performance
- Tests with random data
- Tests with AMS01 data
- Tests with AMS02 Cosmic data
M. Boschini, Oracle Index Study for STATUS DB – TIM October CERN

EVENT TAG DB
- Goal: test the feasibility of storing TAG info for AMS02 events in an Oracle DB
- TAG is a 64-bit number
- AMS02 MC has, for now, no meaningful TAG info (0 or error bit)
- We start from previous studies: _09_05b.ps.gz chini_LV3_DB.pdf

EVENT TAG DB
- Test space usage by INDEX type
- Test INDEX usage
- Time inserts and selections
- What can we use?

Test environment
- SLC 4.6 (64 bit)
- 2 dual-core Intel(R) Xeon(R) 2.00GHz
- Oracle 11g 64 bit; we opted for Oracle 11g because of product lifetime
- All data is fed to the DB using C-OCI programs
- Oracle setup:
  - user AMSDES with a dedicated Bigfile tablespace; overkill wrt Oracle's suggestion (>= 1 TB)...?
  - default TEMP tablespace (3 GB)

Test environment
- Dummy table: RUN (NUMBER), EVENT (NUMBER), TAG (BINARY_DOUBLE)
- The TAG field will be used for indexing, assuming queries will be mainly on TAG
- 10^8 records, equivalent to ~ 10 days of DAQ
- Test B-Tree vs BitMap index, as hinted by our previous studies...

Indexes (theory)
- B-Tree: the default index type in Oracle; very space consuming; not advised for data warehousing
- Data warehouse: write few, read many, HUGE data sample
- BitMap: advised for data warehousing. Bitmap indexes are stored as an array of zero-and-one values, with one entry for each row. Should be used only on low-cardinality fields
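To make the "array of zero-and-one values, with one entry for each row" idea concrete, here is a minimal Python sketch of a toy bitmap index. It is an illustration only: the function names and the sample TAG column are invented for this example, and Oracle's real bitmap indexes are compressed, which this sketch does not attempt to reproduce.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Return {value: bitmap}; each bitmap is a list of 0/1 with one entry per row."""
    n_rows = len(column)
    index = defaultdict(lambda: [0] * n_rows)
    for row_id, value in enumerate(column):
        index[value][row_id] = 1
    return dict(index)

def lookup(index, value):
    """Row positions whose bit is set for `value` (empty list if the value is absent)."""
    return [row_id for row_id, bit in enumerate(index.get(value, [])) if bit]

# Hypothetical TAG column with low cardinality: 3 distinct values over 6 rows.
tags = [7, 3, 7, 7, 1, 3]
bitmap_index = build_bitmap_index(tags)
rows_with_tag_7 = lookup(bitmap_index, 7)
```

One bitmap is kept per distinct value, so in this uncompressed toy the storage grows with the number of distinct values, which is one way to read the "low cardinality" advice.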

Indexes (theory): BitMap
From Oracle Advanced Programming White Paper, Sep: “...Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values - such as GENDER, MARITAL_STATUS. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems...”

Tests
- Items to be tested:
  - SPACE OCCUPANCY = f(index type)
  - QUERY PERFORMANCE = f(index type)
- Index types: B-Tree and Bitmap
- We started with a randomly generated sample
- We then used an AMS01 sample: 40 ntuples, no request on charge, pmass, pmom, lat (nothing!)
- AMS02 Cosmic data

Disk Space
- TABLE SIZE = 5 GB: this means that for AMS02 we'll need 500 GB
- B-Tree INDEX SIZE = 4.8 GB: this means that for AMS02 we'd need 480 GB
- BitMap INDEX SIZE = 70 MB: this means that for AMS02 we'd need 7 GB
- Thus, at least for space reasons, we should use the BitMap INDEX, a factor 30 smaller.
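The projection above can be reproduced with a few lines of Python. This is a sketch under one assumption: the factor of 100 is inferred from the 5 GB to 500 GB extrapolation and is treated as a plain linear scaling. The ratio it prints is computed directly from the two quoted index sizes and may differ from the rounded factor quoted on the slide.

```python
# Sizes measured on the 10^8-record test table, in GB (figures from the slide).
measured_gb = {"table": 5.0, "btree_index": 4.8, "bitmap_index": 0.07}

# Assumption: the 5 GB -> 500 GB projection implies a linear scale factor of 100.
SCALE = 100

projected_gb = {name: size * SCALE for name, size in measured_gb.items()}
btree_over_bitmap = measured_gb["btree_index"] / measured_gb["bitmap_index"]

for name, size in projected_gb.items():
    print(f"{name}: {size:.0f} GB")
print(f"B-Tree / BitMap size ratio: {btree_over_bitmap:.0f}")
```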

When to create an INDEX
- After the TABLE has been populated!
- B-Tree INDEX creation time: 4 µsec/record, mainly SORT and disk space allocation
- BitMap INDEX creation time: 0.5 µsec/record
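As a quick worked example, the per-record figures translate into total build times for the 10^8-record test table as follows. The sketch assumes the per-record time stays constant over the whole table, which the slide does not state explicitly.

```python
N_RECORDS = 10**8  # size of the test table

# Per-record index creation times quoted above, in microseconds.
creation_us_per_record = {"B-Tree": 4.0, "BitMap": 0.5}

# Total creation time in seconds, assuming a constant per-record cost.
creation_seconds = {
    kind: us * N_RECORDS / 1e6 for kind, us in creation_us_per_record.items()
}

for kind, seconds in creation_seconds.items():
    print(f"{kind}: {seconds:.0f} s")
```

Under that assumption the B-Tree build takes about 400 s and the BitMap build about 50 s.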

Theory: is an INDEX useful?
- Not always...
- Oracle has a built-in decision-making algorithm that decides how to actually implement a query; its result is the EXECUTION PLAN.
- The execution plan tries to optimize disk accesses and CPU usage.
- So, e.g., if reading an index which returns “many” records takes more than X, then the index is not used.
- These decisions are based, since Oracle i, on the Cost Based Optimizer and Dynamic Sampling (ON by default)

EXECUTION PLAN
PLAN_TABLE_OUTPUT
Plan hash value:
| Id | Operation         | Name             | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT  |                  |      |  503K |    297 (1) | 00:00:04 |
|  1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE |      |  503K |    297 (1) | 00:00:05 |

EXECUTION PLAN
PLAN_TABLE_OUTPUT
Plan hash value:
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT            |                  |      |  503K |    297 (1) | 00:00:04 |
|  1 | TABLE ACCESS BY INDEX ROWID | N_TAG_RAND_NOISE |      |  503K |    297 (1) | 00:00:04 |
|* 2 | INDEX RANGE SCAN            | TAG_RAND_IDX     |      |       |    106 (0) | 00:00:02 |

IDX usage test
- To test whether the index is actually used, we wrote a simple PL/SQL + Perl program that:
  - dynamically populates a statistics table (HOW-MANY, TAG_STA)
  - for each TAG_STA value, generates and analyzes the EXECUTION PLAN for SELECT run, event FROM TABLE WHERE tag = TAG_STA;
- We thus run CARDINALITY queries against the DB and analyze the Optimizer decisions...
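The original tool was written in PL/SQL and Perl and is not shown in the slides. As a rough illustration of the same idea, the Python sketch below builds the EXPLAIN PLAN statement for one TAG value and checks plan text, in the tabular form shown on the EXECUTION PLAN slides, for an index operation. The function names are invented for this example, no database connection is made, and production code would use bind variables rather than string formatting.

```python
def explain_statement(table, tag_value):
    """Build the EXPLAIN PLAN statement for one TAG value (table name is a placeholder)."""
    return f"EXPLAIN PLAN FOR SELECT run, event FROM {table} WHERE tag = {tag_value}"

def uses_index(plan_text):
    """True if any plan row has an index operation in its Operation column."""
    for line in plan_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) >= 2 and cells[1].upper().startswith(("INDEX", "BITMAP INDEX")):
            return True
    return False

# Plan rows copied from the two EXECUTION PLAN slides (Operation and Name columns only).
full_scan_plan = """| Id | Operation | Name |
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS FULL | N_TAG_RAND_NOISE |"""

index_plan = """| Id | Operation | Name |
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID | N_TAG_RAND_NOISE |
|* 2 | INDEX RANGE SCAN | TAG_RAND_IDX |"""

statement = explain_statement("N_TAG_RAND_NOISE", 42)
```

Looping this check over every TAG_STA value gives, per value, whether the optimizer chose the index or a full table scan.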

Sample1: CARDINALITY 5, relative population > 5.7%

Sample2: CARDINALITY 2517, relative population 6x10^-4

Performance
- Sample1 and Sample2 both do NOT use the INDEX in the select query.
- This is because Oracle's built-in optimization algorithm finds that too many records will be returned, and thus decides to ignore the INDEX.
- Thus: LOW/MEDIUM cardinality with HIGH relative population makes the INDEX ineffective.

Sample3: CARDINALITY 1024, relative population 6x10^-5

Performance
- Sample3 uses the INDEX!
- This is because Oracle's built-in optimization algorithm finds that TABLE ACCESS BY INDEX ROWID is efficient.
- Thus: LOW/MEDIUM cardinality with “correct for Oracle” relative population makes the INDEX effective!
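The two quantities used in the sample slides can be sketched in Python as follows. Two assumptions are made here and are not taken from the slides: "relative population" is read as the fraction of rows carrying a given TAG value, and a fixed threshold stands in for the optimizer's decision. Oracle's optimizer is cost based and does not apply a simple cut like this, so the threshold is purely illustrative.

```python
from collections import Counter

def relative_population(tags):
    """Map each distinct TAG value to the fraction of rows that carry it."""
    counts = Counter(tags)
    total = len(tags)
    return {value: n / total for value, n in counts.items()}

def index_candidates(tags, threshold):
    """TAG values whose relative population is below `threshold` (an assumed cut)."""
    fractions = relative_population(tags)
    return sorted(v for v, frac in fractions.items() if frac < threshold)

# Hypothetical column: value 0 dominates, values 1 and 2 are rare.
tags = [0] * 96 + [1] * 3 + [2] * 1
fractions = relative_population(tags)
cardinality = len(fractions)  # number of distinct TAG values
rare = index_candidates(tags, threshold=0.05)
```

In this toy column a query on value 0 would return most of the table, the situation in which Sample1 and Sample2 ignored the index, while queries on the rare values return few rows.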

Real Data: AMS01
- We then used an AMS01 sample: 40 ntuples (no request on charge, pmass, pmom, etc.)
- “Scrambled” this sample to get 10^8 records
- Average relative cardinality = 2x10^-6
- BitMap Index is USED!
- Selection time:
  - no IDX: 22 sec
  - IDX: 0.05 sec
- INDEX really effective

Real Data: AMS02 !!!
- We also used AMS02 Cosmic Rays data
- We used only 2 × 2x10^8 events, no request on charge, pmass, pmom,... (nothing!)
- Table size: 5.5 GB × 2
- BitMap IDX size: 57 MB × 2
- Just for the sake of curiosity: a UNIQUE requirement on RUN+EVENTNO creates an INDEX 5 GB big!!!
- B-Tree IDX size: 4.2 GB × 2

Real Data: AMS02 !!!
- Mean relative cardinality: 1.8x10^-4
- Max relative cardinality: 3x10^-3
- BitMap Index can be USED!
- Tested the EXECUTION PLAN for all TAG values.

Real Data: AMS02 !!!
- Selection time, no IDX: average selection time 24 sec; 0.01 sec/record once retrieved
- Selection time, IDX: average selection time 0.05 sec; sec/record once retrieved
- Again, the BitMap index is very effective.

Naïve usage...
- The most naïve way to design the DB is 1 table for each month of DAQ. This leads to ~36 tables.
- The most naïve way to query them all is a UNION statement, which still uses the index, with nearly no overhead for the SORT UNIQUE/UNION-ALL part...
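A UNION over monthly tables of this kind could be generated as in the Python sketch below. The table naming scheme (`tag_month_01` to `tag_month_36`) and the function name are hypothetical; the slides only show two real table names, COSMIC_TDV and COSMIC_TDV1. The column names follow the dummy table described earlier.

```python
def union_query(tables, tag_value):
    """Build one SELECT per table and join them with UNION (duplicates removed)."""
    selects = [
        f"SELECT run, event FROM {t} WHERE tag = {tag_value}" for t in tables
    ]
    return "\nUNION\n".join(selects)

# Hypothetical naming scheme: one table per month of DAQ, 36 months in total.
monthly_tables = [f"tag_month_{m:02d}" for m in range(1, 37)]
query = union_query(monthly_tables, 42)
```

Each branch of the UNION is an independent single-table lookup, so each one can use its own table's index. If duplicate (run, event) pairs cannot occur across months, UNION ALL would avoid the SORT UNIQUE step altogether.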

PLAN_TABLE_OUTPUT
Plan hash value:
| Id | Operation                       | Name        | Rows | Bytes | Cost (%CPU)| Time     |
|  0 | SELECT STATEMENT                |             |   44 |  1408 |  290K (50) | 00:58:03 |
|  1 |  SORT UNIQUE                    |             |   44 |  1408 |  290K (50) | 00:58:03 |
|  2 |   UNION-ALL                     |             |      |       |            |          |
|  3 |    TABLE ACCESS BY INDEX ROWID  | COSMIC_TDV  |   42 |  1344 |  145K  (1) | 00:29:10 |
|  4 |     BITMAP CONVERSION TO ROWIDS |             |      |       |            |          |
|* 5 |      BITMAP INDEX SINGLE VALUE  | COMIX_TDV   |      |       |            |          |
|  6 |    TABLE ACCESS BY INDEX ROWID  | COSMIC_TDV1 |    2 |    64 |  144K  (1) | 00:28:53 |
|  7 |     BITMAP CONVERSION TO ROWIDS |             |      |       |            |          |
|* 8 |      BITMAP INDEX SINGLE VALUE  | COMIX_TDV1  |      |       |            |          |
rows returned out of 377,516,896 rows in total
Rough estimate for SELECT on 3 years: 25 minutes...

Conclusions
- In general, the BitMap index is very efficient in space usage: 30 times smaller than a normal B-Tree index
- The index is used in queries if relative cardinality is low; if so, selection time is very low
- According to the AMS01 and AMS02 Cosmic Rays data analyzed, the BitMap index can be used
- We could thus use it for AMS02; expected space usage: 500 GB (table) + 6 GB (idx)
- EventStatusTable02 files will use ~ 150 GB...

AMS01 sample: CARDINALITY 89860, average relative population 5.7%