A Primer on Multidimensional Clustering for UDB LUW.


The DB2 Optimizer asks him for the best access path.
He wrote an improved version of the DB2 Optimizer, using only 9 lines of code.
He can type all SQL syntax with 100% accuracy from memory.
He taught his dog to prefetch, so that when he throws one ball, the dog returns with 32.
He had the "Backspace" and "Delete" keys permanently removed from his keyboard.
His sysadmins call him daily to ask if they can give him more disk space.
He has never had a Network Security firewall rule refuse him access.
On a slow day, he will reorg large tables completely in his mind.
He once made an SQL statement run faster just by staring at it.
He has never clicked on the “undo” arrow.

A brief bio…
29 years IT, 19 years of DBA experience
  o UDB LUW on AIX
  o DB2/ZOS
  o Oracle
Longest query I ever tuned was over 4 feet long when printed out
Favorite saying: “Even a blind squirrel finds a nut once in a while”

Agenda
What is clustering?
What is multidimensional clustering (MDC)?
Some design guidelines for MDC

Left Outer Join JOIN L TE F

Backup back Hint: The most important thing to a DBA

Create table in tablespace tablecreatetablespace

What is Clustering?
Clustering is the physical sequence of rows in a DB2 table.
It is determined by defining one index as the “clustering index”.
As rows are inserted, DB2 attempts to put them in the correct clustering location.
During a reorg, rows are sorted into clustering order before being reloaded into the table.
Is the table clustered or is the index clustered?

Regular non-clustering indexes (figure: a table with an index on Region and an index on Year)

Clustering Index (figure: a table with a clustering index on Region and an index on Year)

Why are reads faster when a table is clustered?
The first I/O reads a page into memory that contains many rows with the same key or a range of key values
  o Example: App needs 500 rows for a given region….
If the DBMS knows that it will need to fetch several or many consecutive pages, it can begin “prefetching” extents (multiple pages) into memory before the application needs them
(Figure labels: 18 IOs, 3 IOs vs 1 IO)
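
To put rough numbers on the 500-row example, here is a minimal sketch. The figure of 50 rows per page is an assumption made for illustration (it is not from the slides), and the unclustered case is modelled as the worst case where every matching row sits on a different page.

```python
import math

def pages_read(matching_rows: int, rows_per_page: int, clustered: bool) -> int:
    """Estimate how many data pages must be read to fetch matching_rows rows.

    clustered=True : matching rows are stored next to each other, so each
                     page read returns a full page of useful rows.
    clustered=False: worst case, every matching row is on a different page.
    """
    if clustered:
        return math.ceil(matching_rows / rows_per_page)
    return matching_rows

# Hypothetical figures: 500 matching rows, 50 rows fit on one page.
clustered_pages = pages_read(500, 50, clustered=True)
scattered_pages = pages_read(500, 50, clustered=False)
print(clustered_pages, scattered_pages)
```

With consecutive pages the DBMS can also prefetch whole extents, which reduces the number of physical I/O requests further.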

Sequential Prefetch
“Holy Grail” when accessing large numbers of rows
Significant reduction in I/O
Physical reads vs logical reads
Relevant tablespace settings:
  o Tablespace page size (bytes)
  o Tablespace extent size (pages)
  o Tablespace prefetch size (pages)

How UDB uses clustering
Sequential prefetch is turned on if UDB determines there is a cost savings
Clustered data makes it more likely that sequential prefetch will be turned on
The optimizer looks at CLUSTERRATIO and CLUSTERFACTOR (in SYSCAT.INDEXES)
Sequential detection can be turned on dynamically during query execution

So what’s the shortfall with Clustering?
Clustering deteriorates over time (probably), requiring reorgs
Indexes are record-based, with a pointer for every single record, so they can become very large
You only get one choice for the clustering index. If Joe needs the table clustered by timestamp and Bill needs it clustered by policy #, one of them will probably be unhappy.

Partitioned Database DA TA BA SE

MultiDimensional Clustering Dimensionalclustering

MultiDimensional Clustering (MDC)
What if your data could be physically sequenced in more than one way at the same time?
Great in theory, but how do you make this happen in real life on a real table?

MultiDimensional Clustering
Data is physically grouped together by “dimensions” into separate blocks, or extents
Each page belongs to exactly one block
All blocks are of equal size
Relevant tablespace settings:
  o Tablespace page size (bytes)
  o Tablespace extent size (pages)
  o Tablespace prefetch size (pages)

What is an extent?
An extent is a set of contiguous data pages on disk; its size is specified at tablespace creation time.
Physical size of an extent is determined by:
  o Extent size (# of pages)
  o Page size (KB)
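
The relationship between the two settings is a simple product. A small sketch, with a helper name chosen only for this illustration:

```python
def extent_size_kb(page_size_kb: int, extent_pages: int) -> int:
    """Physical size of one extent in KB: page size times pages per extent."""
    return page_size_kb * extent_pages

# DB2 LUW tablespaces use 4, 8, 16 or 32 KB pages; 32 pages per extent
# is used here purely as an example value.
for page_kb in (4, 8, 16, 32):
    print(f"{page_kb:>2} KB pages x 32 pages = {extent_size_kb(page_kb, 32)} KB per extent")
```

Because an MDC block is one extent, this number is also the smallest unit of space an MDC table can allocate.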

MDC with Three Dimensions (figure: a cube with axes Color (Red, Blue, Green), Year and Age)

What is a (logical) cell?
Contains all rows for a unique combination of dimension values
Physically made up of one or more blocks (extents)
Blocks are only allocated for logical cells that actually have records for a given combination of dimension values

A “Cell” (figure: the Color/Year/Age cube with the single cell 2002, Red, 1 highlighted)

What is a Slice? A slice is a set of blocks having a particular dimension key.

A Red “Slice” of the Color Dimension (figure: the Color/Year/Age cube with the cells 2002, Red, 1; 2003, Red, 1; 2004, Red, 1 highlighted)

A 2004 “Slice” of the Year Dimension (figure: the Color/Year/Age cube with the 2004 cells for Red, Blue and Green highlighted)

A 1 “Slice” of the Age Dimension (figure: the Color/Year/Age cube with the nine Age = 1 cells highlighted, one per Year and Color combination)
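
Cells and slices can be modelled in a few lines. This is only a sketch: the block ids and the contents of `blocks` are made-up sample data, and the helper names `cell` and `slice_of` are hypothetical.

```python
# Hypothetical block map: block id -> (color, year, age) of the cell it belongs to.
blocks = {
    0: ("Red", 2002, 1),
    1: ("Red", 2002, 1),    # a cell may own more than one block
    2: ("Red", 2003, 1),
    3: ("Blue", 2004, 1),
    4: ("Green", 2004, 2),
}
DIMS = ("color", "year", "age")

def cell(color, year, age):
    """Blocks of one logical cell: a unique combination of ALL dimension values."""
    return sorted(b for b, key in blocks.items() if key == (color, year, age))

def slice_of(dim, value):
    """Blocks of one slice: every block sharing ONE particular dimension key."""
    i = DIMS.index(dim)
    return sorted(b for b, key in blocks.items() if key[i] == value)

print(cell("Red", 2002, 1))       # blocks of the 2002/Red/1 cell
print(slice_of("color", "Red"))   # the Red slice
print(slice_of("year", 2004))     # the 2004 slice
```

Note that a combination with no rows, such as Blue/2002/1 in this sample, owns no blocks at all.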

How MDC works
Rows are organized in extents based upon dimensions
(Figure: blocks grouped by Color (Red, Blue, Green), with a dimension block index on Color and a dimension block index on Year)

MultiDimensional Clustering
MDC introduces indexes that are block-based, which are much smaller than record-based indexes
  o A pointer for each block instead of a pointer for each row
MDC allows a table to be physically clustered on more than one key or dimension
An MDC table is able to maintain and guarantee clustering over all dimensions automatically and continuously
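
A back-of-the-envelope comparison shows why block indexes are so much smaller. All figures below (10 million rows, 40 rows per page, 32 pages per block) are assumptions for illustration, and the block count is a best case that assumes every block is completely full.

```python
import math

def index_entries(total_rows: int, rows_per_page: int, pages_per_block: int):
    """Return (record-based pointers, block-based pointers) for one index."""
    rid_entries = total_rows                           # one pointer per row
    rows_per_block = rows_per_page * pages_per_block
    bid_entries = math.ceil(total_rows / rows_per_block)  # one pointer per block
    return rid_entries, bid_entries

rid, bid = index_entries(10_000_000, rows_per_page=40, pages_per_block=32)
print(f"RID index pointers: {rid:,}")
print(f"BID index pointers: {bid:,}")
print(f"ratio: about {rid // bid:,} to 1")
```

Sparsely filled blocks raise the block count, so the real ratio depends on how dense the cells are.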

MDC Indexes
A dimension block index is automatically created for each dimension specified
A composite block index is automatically created containing all columns across all dimensions
The composite block index is used to maintain clustering
Much lower overhead for logging

Creating an MDC table
  Create table t1 (age int, color char(10), year char(4), c1 int, c2 int) organize by dimensions (age, color, year)
This creates three dimension block indexes (one each for age, color and year).
A composite block index is also created, which includes (age, color, year).
Traditional “RID” indexes can also be created on an MDC table
Block (BID) and record (RID) indexes can be combined with logical AND/OR

Select Processing in MDC (ex #1)
  Select … From Table Where Age = 1
(Figure: the Color/Year/Age cube with the nine Age = 1 cells highlighted, one per Year and Color combination)

Select Processing in MDC (ex #2)
  Select … From Table Where color = ‘Red’
(Figure: the Color/Year/Age cube with the cells 2002, Red, 1; 2003, Red, 1; 2004, Red, 1 highlighted)

Select Processing in MDC (ex #3)
  Select … From Table Where color = ‘Red’ And Age = 1 And Year = ‘2002’
(Figure: the Color/Year/Age cube with the single cell 2002, Red, 1 highlighted)
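
One way to picture example #3 is as an intersection of block-id lists taken from the dimension block indexes. This is a simplified sketch with made-up block data; the real optimizer may well choose the composite block index instead.

```python
from collections import defaultdict

# Hypothetical blocks: block id -> (color, year, age) key of its cell.
block_keys = {
    0: ("Red", "2002", 1),
    1: ("Red", "2003", 1),
    2: ("Blue", "2002", 1),
    3: ("Red", "2002", 2),
}

def build_dimension_index(position: int):
    """Dimension block index: dimension value -> set of block ids (BIDs)."""
    index = defaultdict(set)
    for bid, key in block_keys.items():
        index[key[position]].add(bid)
    return index

color_idx = build_dimension_index(0)
year_idx = build_dimension_index(1)
age_idx = build_dimension_index(2)

# Where color = 'Red' And Age = 1 And Year = '2002'
candidate_blocks = color_idx["Red"] & year_idx["2002"] & age_idx[1]
print(sorted(candidate_blocks))
```

Only the surviving blocks need to be read, and every row in them already satisfies all three predicates.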

Insert Processing in MDC
Probe the composite block index to see if this is a new combination of dimension values (a new logical cell)
If the cell exists, search its list of BIDs for a block with space to insert the row
If it is a new logical cell, or all blocks are full for an existing cell, create a new block
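
The three steps can be sketched as a toy in-memory model. The class `MdcTable` and its fields are hypothetical names invented for this illustration, and a block capacity of two rows is used only to keep the example short.

```python
class MdcTable:
    """Toy model of MDC insert processing (not DB2's actual implementation)."""

    def __init__(self, rows_per_block: int):
        self.rows_per_block = rows_per_block
        self.blocks = []            # block id (list position) -> rows in that block
        self.composite_index = {}   # cell key -> list of block ids (BIDs)

    def insert(self, cell_key, row) -> int:
        # Step 1: probe the composite block index for this cell.
        bids = self.composite_index.setdefault(cell_key, [])
        # Step 2: existing cell, so look for a block that still has space.
        for bid in bids:
            if len(self.blocks[bid]) < self.rows_per_block:
                self.blocks[bid].append(row)
                return bid
        # Step 3: new cell, or every block of the cell is full: add a block.
        self.blocks.append([row])
        new_bid = len(self.blocks) - 1
        bids.append(new_bid)
        return new_bid

table = MdcTable(rows_per_block=2)
red = ("Red", "2002", 1)
blue = ("Blue", "2002", 1)
placed = [table.insert(red, "r1"), table.insert(red, "r2"),
          table.insert(red, "r3"), table.insert(blue, "b1")]
print(placed)
```

The third Red row overflows the first block and triggers a new one, while the Blue row opens a new cell with its own block.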

Delete Processing in MDC
If the record being deleted is not the last record in the block, UDB just deletes the record and removes its RID from any record-based indexes
If it is the last record in the block, UDB frees the block by changing its IN_USE status bit, removes the BID from all block indexes, and removes the RID from any record-based indexes
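
The two delete cases, sketched with toy structures. Everything here is an assumption for illustration: one dictionary stands in for all block indexes, one set stands in for all record-based indexes, and the `in_use` attribute stands in for the IN_USE status bit.

```python
class Block:
    """One MDC block (extent) holding rows of a single logical cell."""

    def __init__(self, cell_key):
        self.cell_key = cell_key
        self.rows = {}        # RID -> row data
        self.in_use = True    # stands in for the IN_USE status bit

def delete_row(blocks, bid, rid, block_index, rid_index):
    """Delete one row; free the block if it was the last row in it."""
    block = blocks[bid]
    del block.rows[rid]
    rid_index.discard(rid)              # always remove the RID entry
    if not block.rows:                  # that was the last record in the block
        block.in_use = False
        block_index[block.cell_key].discard(bid)

key = ("Red", "2002", 1)
blocks = {7: Block(key)}
blocks[7].rows = {"r1": "row one", "r2": "row two"}
block_index = {key: {7}}
rid_index = {"r1", "r2"}

delete_row(blocks, 7, "r1", block_index, rid_index)
state_after_first = (blocks[7].in_use, sorted(rid_index), sorted(block_index[key]))

delete_row(blocks, 7, "r2", block_index, rid_index)
state_after_last = (blocks[7].in_use, sorted(rid_index), sorted(block_index[key]))
print(state_after_first, state_after_last)
```

The block index is touched only on the second delete, which is why block index maintenance is cheap compared with record-based indexes.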

Update Processing in MDC
Updates of non-dimension values are done in place, just as with regular tables
  o No need to update block indexes unless no space is found and a new block needs to be added to the cell
Updates of dimension values are treated as delete/insert
  o Block indexes will need to be updated

MDC Benefits
Can cluster in multiple dimensions
Clustering is automatically and dynamically maintained over time
Reorg is not necessary for re-clustering
Block indexes are much smaller and have much less overhead for maintenance and logging

Design Guidelines for MDC
MDC is a great tool
But, used incorrectly, it can make things worse just as easily as it can make things better
Requires knowledge of the data and of how users query it

MDC Design
The most important design criterion for MDC is selecting proper dimension columns and an appropriate extent size
Choose columns that are used in queries as equality or range predicates
Low cardinality
Aim for high density: blocks are mostly full
Generally no more than 3 or 4 dimensions

MDC Size Considerations
At least one extent will be allocated for every unique combination of dimension values in the data
Evaluate dimension volumetrics and row size to establish the tablespace extent size
  o Select dimcol1, dimcol2, dimcol3, count(*) from table group by dimcol1, dimcol2, dimcol3
Example: 8k page size * 32 page extent size gives 256k extent size
If you have 1 million unique dimension combinations, the minimum table size is about 256 GB!!
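
The slide's arithmetic, worked through as a short sketch. The function name is chosen for this illustration; the inputs are the slide's own figures.

```python
def min_table_size_kb(page_size_kb: int, extent_pages: int, unique_cells: int) -> int:
    """Lower bound on MDC table size: one extent per unique dimension combination."""
    return page_size_kb * extent_pages * unique_cells

size_kb = min_table_size_kb(page_size_kb=8, extent_pages=32, unique_cells=1_000_000)
size_gb_decimal = size_kb / 1_000_000     # 1 GB taken as 10^6 KB
size_gib = size_kb / (1024 * 1024)        # 1 GiB taken as 2^20 KB

print(f"{size_kb:,} KB")
print(f"{size_gb_decimal:.0f} GB (decimal), {size_gib:.1f} GiB (binary)")
```

This is the floor even if every cell holds a single row, which is why the cell count from the count(*) query matters more than the row count.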

What happens if you choose wrong?
A high cardinality column (or columns) will explode the size of your table and destroy performance!
Remember that a block is physically allocated for each unique combination of dimension key values
NEVER use a high cardinality column or a unique column for an MDC dimension

Choosing a unique column as a dimension is just: Down Right Stupid

Using column expressions with MDC
What if a column is a good dimension candidate, but its cardinality is way too high (ex: a timestamp column)?
  Create table t1 (c1 timestamp, c2 int, c3 int generated always as (year(c1))) organize by dimensions (c2, c3)
Monotonic: the generated column increases/decreases in the same direction as the base column
A non-monotonic generated column will only allow equality or IN predicates on the base column to use the block index
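
The difference between a monotonic and a non-monotonic expression is easy to see on sample data. This sketch uses Python's `datetime` attributes as stand-ins for the SQL `year()` and `month()` functions; the timestamps are made up, and a check over a sample can only expose a counterexample, not prove monotonicity.

```python
from datetime import datetime

def is_nondecreasing_on(fn, sorted_inputs) -> bool:
    """True if fn never decreases as its (already sorted) input increases."""
    outputs = [fn(x) for x in sorted_inputs]
    return all(a <= b for a, b in zip(outputs, outputs[1:]))

# Sample timestamps in ascending order, crossing a year boundary.
timestamps = [
    datetime(2003, 11, 15),
    datetime(2003, 12, 31),
    datetime(2004, 1, 1),
    datetime(2004, 2, 10),
]

year_monotonic = is_nondecreasing_on(lambda t: t.year, timestamps)
month_monotonic = is_nondecreasing_on(lambda t: t.month, timestamps)
print(year_monotonic, month_monotonic)
```

Because the year never goes backwards as the timestamp advances, a range predicate on the timestamp maps to a range of years; the month wraps from 12 back to 1, so no such mapping exists for it.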

MDC tables and database partitioning
DB2 LUW DPF partitioning is just a way to spread the data across partitions (not range partitioning like DB2/ZOS)
The reason for partitioning a table is independent of whether the table is an MDC table or a regular table
Can partition on a dimension column or a non-dimension column
  o However, partitioning on a dimension column means that all rows for a particular dimension value exist on only 1 partition
If partitioning, remember that logical cells can spread across partitions
  o Important for sizing of extents

Block Index Considerations
Composite block index columns are ordered based upon the “organize by dimensions” clause
  Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c4, (c3,c1), c2)
  o Composite index will be (c1,c4,c3,c2)
  Create table t1 (c1 int, c2 int, c3 int, c4 int) organize by dimensions (c1, c2, (c3,c1), c4)
  o Composite index will be (c1,c2,c3,c4)
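
Both examples are consistent with a simple rule: walk the dimension list left to right and keep the first occurrence of each column. The sketch below encodes that rule as inferred from the two examples on this slide; it is not taken from DB2 documentation, and the function name is hypothetical.

```python
def composite_index_columns(dimensions):
    """Column order of the composite block index for an ORGANIZE BY list.

    Each dimension is either a column name or a tuple of column names
    (a multi-column dimension). Repeated columns are kept only once,
    at the position where they first appear.
    """
    ordered = []
    for dim in dimensions:
        columns = dim if isinstance(dim, tuple) else (dim,)
        for col in columns:
            if col not in ordered:
                ordered.append(col)
    return ordered

first = composite_index_columns(["c1", "c4", ("c3", "c1"), "c2"])
second = composite_index_columns(["c1", "c2", ("c3", "c1"), "c4"])
print(first)
print(second)
```

Since column order affects which predicates the composite index can serve efficiently, the order written in the clause is a design decision rather than a formality.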

The Customer is always right Everything Else Customer

To make a long story short

Questions???