ITCS 6163 Lecture 5. Indexing datacubes

Presentation transcript:

ITCS 6163 Lecture 5

Indexing datacubes
Objective: speed queries up.
Traditional databases (OLTP): B-trees
  Lookup time logarithmic in the number of indexed keys; space linear in it.
  Dynamic, stable, and perform well under updates. (But OLAP is not about updates...)
Bitmaps:
  Space efficient.
  Difficult to update (but we don't care in a data warehouse).
  Can effectively prune searches before looking at the data.

Bitmaps
R = (..., A, ..., M)
[Figure: the projection of R on attribute A shown next to its bitmaps B8 down to B0, one bitmap per attribute value.]
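The one-bitmap-per-value layout on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the column values and the helper names `build_value_list_index` and `rows_of` are made up for the example, and each bitmap is stored as a Python integer whose bit i stands for row i.

```python
from collections import defaultdict


def build_value_list_index(column):
    """Return {value: bitmap}; bit i of a bitmap is 1 iff column[i] == value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_of(bitmap):
    """List the row ids whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column A of relation R, with values in 0..8 (cardinality 9)
column_a = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
index_a = build_value_list_index(column_a)

print(rows_of(index_a[2]))  # rows 1, 3, 5 and 6 hold the value 2
```

Every row sets exactly one bit across all the bitmaps, which is why the index stays compact when the attribute has few distinct values.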

Query optimization
Consider a query with a high selectivity factor and predicates on two attributes. The query optimizer builds candidate plans:
  (P1) Full relation scan, filtering as you go.
  (P2) Index scan on the predicate with the lower selectivity factor, followed by a scan of the temporary relation to filter out non-qualifying tuples using the other predicate. (Works well if the data is clustered on the first index key.)
  (P3) Index scan for each predicate separately, followed by a merge of the RID lists.

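Query optimization (continued)
[Figure: (P2) the index on predicate 1 selects tuples t1 ... tn from the blocks of data, and predicate 2 is then applied to produce the answer. (P3) the index on predicate 1 and the index on predicate 2 each produce a tuple list (tuple list 1, tuple list 2), and the two are combined into a merged list.]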

Query optimization (continued)
When using bitmap indexes, (P3) can be an easy winner!
CPU operations on bitmaps (AND, OR, XOR, etc.) are more efficient than regular RID merges: just apply the bitwise operations to the bitmaps. (With B-trees, you would have to scan the two RID lists and select the tuples that appear in both, i.e., a merge operation.)
Of course, you can build B-trees on the compound key, but you would need one for every compound predicate (an exponential number of trees...).
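To make the contrast concrete, here is a small sketch of plan (P3) done both ways: a two-pointer merge over sorted RID lists, and a single bitwise AND over bitmaps. The RID values and the function names are invented for illustration, and Python integers stand in for bitmaps.

```python
def merge_rid_lists(rids1, rids2):
    """Intersect two sorted RID lists with a linear two-pointer merge."""
    i = j = 0
    out = []
    while i < len(rids1) and j < len(rids2):
        if rids1[i] == rids2[j]:
            out.append(rids1[i])
            i += 1
            j += 1
        elif rids1[i] < rids2[j]:
            i += 1
        else:
            j += 1
    return out


def to_bitmap(rids):
    """Set bit r for every RID r in the list."""
    bitmap = 0
    for r in rids:
        bitmap |= 1 << r
    return bitmap


# Hypothetical RID lists returned by the two index scans
rids_pred1 = [1, 4, 5, 9, 12]
rids_pred2 = [2, 4, 9, 10, 12, 15]

merged = merge_rid_lists(rids_pred1, rids_pred2)        # list merge
anded = to_bitmap(rids_pred1) & to_bitmap(rids_pred2)   # one bitwise AND

print(merged, anded == to_bitmap(merged))
```

Both routes identify the same tuples; the bitmap route replaces the element-by-element comparison loop with word-wide bitwise operations.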

Bitmaps and predicates
A = a1 AND B = b2
(Bitmap for a1) AND (Bitmap for b2) = Bitmap for a1 and b2

Tradeoffs
Small dimension cardinality: dense bitmaps.
Large dimension cardinality: sparse bitmaps.
Compression (and the cost of decompression).
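The slide does not name a compression scheme, so the sketch below uses plain run-length encoding purely as an assumed stand-in, to show why sparse bitmaps compress well and dense, irregular ones do not. The bit strings and the function names are hypothetical.

```python
def rle_encode(bits):
    """Run-length encode a string of '0'/'1' as a list of (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)
        else:
            runs.append((bit, 1))
    return runs


def rle_decode(runs):
    """Rebuild the original bit string from its runs."""
    return "".join(bit * length for bit, length in runs)


# A sparse bitmap (one set bit in 100 rows) versus a dense, irregular one
sparse = "0" * 40 + "1" + "0" * 59
dense = "0110100110" * 10

print(len(rle_encode(sparse)), len(rle_encode(dense)))
```

The sparse bitmap collapses to a handful of runs, while the dense one needs many; either way the runs have to be decoded (or the operators made run-aware) before bitmaps can be ANDed or ORed, which is the decompression cost the slide alludes to.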

Query strategy for Star joins
Maintain join indexes between the fact table and the dimension tables.
[Figure: a fact table and a product dimension table whose rows carry types a ... k. Join-index bitmaps over the fact table are kept per product, per product type (bitmap for type a, ..., bitmap for type k), and per location.]

Strategy example
Aggregate all sales for products of any one of three locations.
(Bitmap for the first location) OR (Bitmap for the second location) OR (Bitmap for the third location) = Bitmap for predicate

Star-Joins
  SELECT F.S, D1.A1, D2.A2, ..., Dn.An
  FROM F, D1, D2, ..., Dn
  WHERE F.A1 = D1.A1 AND F.A2 = D2.A2 AND ... AND F.An = Dn.An
    AND D1.B1 = 'c1' AND D2.B2 = 'p2' AND ...
Likely strategy:
  For each Di, find the values of Ai such that Di.Bi = 'xi' (unless you have a bitmap index for Bi).
  Use the bitmap index on those Ai values to form a bitmap for the related rows of F (OR-ing the bitmaps).
  At this stage you have n such bitmaps; the result is found by AND-ing them.
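The three-step strategy can be sketched on a toy star schema. Everything in the example is assumed for illustration: the fact rows, the two dimension tables, and the helper names `join_index` and `dimension_bitmap`. Bitmaps are Python integers with bit i standing for fact row i.

```python
# Hypothetical fact table: (product key, location key, sales)
fact = [
    ("p1", "l1", 10),
    ("p2", "l1", 20),
    ("p1", "l2", 30),
    ("p3", "l3", 40),
    ("p2", "l2", 50),
    ("p3", "l1", 60),
]
# Hypothetical dimension tables: key -> descriptive attribute
product_type = {"p1": "a", "p2": "k", "p3": "a"}
location_region = {"l1": "east", "l2": "west", "l3": "east"}


def join_index(fact_rows, column):
    """Bitmap join index: for each dimension key, the fact rows that reference it."""
    index = {}
    for row_id, row in enumerate(fact_rows):
        index[row[column]] = index.get(row[column], 0) | (1 << row_id)
    return index


def dimension_bitmap(dim_table, wanted, index):
    """Find the qualifying dimension keys, then OR their join-index bitmaps."""
    bitmap = 0
    for key, attribute in dim_table.items():
        if attribute == wanted:
            bitmap |= index.get(key, 0)
    return bitmap


prod_index = join_index(fact, 0)
loc_index = join_index(fact, 1)

bm_prod = dimension_bitmap(product_type, "a", prod_index)      # type = 'a'
bm_loc = dimension_bitmap(location_region, "east", loc_index)  # region = 'east'
answer = bm_prod & bm_loc                                      # AND the n bitmaps

answer_rows = [i for i in range(len(fact)) if (answer >> i) & 1]
total_sales = sum(fact[i][2] for i in answer_rows)
print(answer_rows, total_sales)
```

Only the fact rows whose bit survives the final AND are fetched, so the fact table itself is touched once, at the very end.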

Example
Selectivity per predicate = 0.01 (predicates on the dimension tables).
n predicates (statistically independent).
Total selectivity = $10^{-2n}$.
Fact table = $10^8$ rows, n = 3: tuples in answer = $10^8 / 10^6$ = 100 rows.
In the worst case these sit in 100 different blocks... Still better than reading all the blocks in the relation (e.g., assuming 100 tuples/block, that would be $10^6$ blocks!).
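The arithmetic on this slide is easy to re-check. The snippet below simply restates it with integers; the function name `expected_answer_rows` is made up, and the independence of the predicates is the slide's own assumption.

```python
def expected_answer_rows(fact_rows, per_predicate_denominator, n_predicates):
    """Rows expected to survive n independent predicates, each keeping
    1/per_predicate_denominator of the rows (integer arithmetic)."""
    return fact_rows // per_predicate_denominator ** n_predicates


fact_rows = 10 ** 8
answer = expected_answer_rows(fact_rows, 100, 3)  # selectivity 0.01 each, n = 3

tuples_per_block = 100                 # block size assumed on the slide
worst_case_blocks = answer             # every answer tuple in a different block
full_scan_blocks = fact_rows // tuples_per_block

print(answer, worst_case_blocks, full_scan_blocks)
```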

Design Space of Bitmap Indexes
The basic bitmap design is called the value-list index. The focus there is on the columns.
If we change the focus to the rows, the index becomes a set of attribute values (integers), one in each tuple (row), that can be represented in a particular way.
We can encode this row in many ways...

Attribute value decomposition
C = attribute cardinality.
Consider a value of the attribute, v, and a sequence of n-1 numbers $\langle b_{n-1}, \ldots, b_1 \rangle$. Also define $b_n = \lceil C / \prod_{i=1}^{n-1} b_i \rceil$. Then v can be decomposed into a sequence of n digits $\langle v_n, \ldots, v_1 \rangle$ as follows:
$$v = V_1 = V_2 b_1 + v_1 = V_3 (b_2 b_1) + v_2 b_1 + v_1 = \cdots = v_n \prod_{j=1}^{n-1} b_j + \cdots + v_i \prod_{j=1}^{i-1} b_j + \cdots + v_2 b_1 + v_1$$
where $v_i = V_i \bmod b_i$ and $V_i = \lfloor V_{i-1} / b_{i-1} \rfloor$ for $1 < i \le n$.

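Number systems
How do you write 576 in each of the following? (Divisions are written as quotient | remainder.)
Decimal system:
  576 = 5 x 10 x 10 + 7 x 10 + 6
  576/100 = 5 | 76, 76/10 = 7 | 6, 6
Binary system:
  576 = 1 x $2^9$ + 0 x $2^8$ + 0 x $2^7$ + 1 x $2^6$ + 0 x $2^5$ + 0 x $2^4$ + 0 x $2^3$ + 0 x $2^2$ + 0 x $2^1$ + 0 x $2^0$
  576/$2^9$ = 1 | 64, 64/$2^8$ = 0 | 64, 64/$2^7$ = 0 | 64, 64/$2^6$ = 1 | 0, 0/$2^5$ = 0 | 0, 0/$2^4$ = 0 | 0, 0/$2^3$ = 0 | 0, 0/$2^2$ = 0 | 0, 0/$2^1$ = 0 | 0, 0/$2^0$ = 0 | 0
Mixed base with components 7, 7, 5, 3:
  576/(7x7x5x3) = 576/735 = 0 | 576
  576/(7x5x3) = 576/105 = 5 | 51, so 576 = 5 x (7x5x3) + 51
  51/(5x3) = 51/15 = 3 | 6, so 576 = 5 x (7x5x3) + 3 x (5x3) + 6
  6/3 = 2 | 0, so 576 = 5 x (7x5x3) + 3 x (5x3) + 2 x (3)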
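The three decompositions of 576 follow one rule: repeatedly take the remainder by the current base component, then divide. A short sketch, with the hypothetical helpers `decompose` and `compose` and the base sequence given most-significant component first:

```python
def decompose(v, bases):
    """Decompose v into digits for a base sequence given most-significant first.

    Working from the least-significant component upward, each digit is
    v mod b and the value carried to the next component is v // b.
    """
    digits = []
    for b in reversed(bases):
        digits.append(v % b)
        v //= b
    if v != 0:
        raise ValueError("value does not fit in the given base sequence")
    return digits[::-1]


def compose(digits, bases):
    """Inverse of decompose: rebuild the value from its digits."""
    value = 0
    for d, b in zip(digits, bases):
        value = value * b + d
    return value


print(decompose(576, [10, 10, 10]))  # decimal
print(decompose(576, [2] * 10))      # binary
print(decompose(576, [7, 7, 5, 3]))  # mixed base
```

For the mixed base the digits come out as 5, 3, 2, 0, matching 576 = 5 x (7x5x3) + 3 x (5x3) + 2 x (3) + 0.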

Bitmaps
R = (..., A, ..., M)
[Figure: value-list index. The projection of R on attribute A shown next to its bitmaps B8 down to B0.]

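Example
[Figure: a two-component value-list (equality-encoded) index on the projection of R on A. Component 2 has bitmaps $B_2^2, B_2^1, B_2^0$ and component 1 has bitmaps $B_1^2, B_1^1, B_1^0$. One row is annotated with the decomposition (1x3+0).]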

Encoding scheme
Equality encoding: all bits set to 0 except the one that corresponds to the value.
Range encoding: the $v_i$ rightmost bits set to 0, the remaining bits set to 1.
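A minimal sketch of the two encodings for a single digit, assuming the bits are written left to right from the highest bitmap down to bitmap 0, as on the slides. The function names are illustrative only.

```python
def equality_encode(v, base):
    """Bits for bitmaps base-1 ... 0 (left to right): only bit v is 1."""
    return "".join("1" if j == v else "0" for j in range(base - 1, -1, -1))


def range_encode(v, base):
    """Bits for bitmaps base-1 ... 0: bit j is 1 iff v <= j,
    so the v rightmost bits are 0 and the rest are 1."""
    return "".join("1" if j >= v else "0" for j in range(base - 1, -1, -1))


print(equality_encode(3, 9))  # 000001000
print(range_encode(3, 9))     # 111111000
```

Under range encoding the leftmost bit is 1 for every value, so that bitmap carries no information and does not have to be stored.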

Range encoding: single component, base-9
[Figure: the projection of R on attribute A shown next to its range-encoded bitmaps, labeled B8 down to B0.]

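Example (revisited)
[Figure: the two-component value-list (equality-encoded) index again. Component 2 has bitmaps $B_2^2, B_2^1, B_2^0$ and component 1 has bitmaps $B_1^2, B_1^1, B_1^0$. One row is annotated with the decomposition (1x3+0).]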

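Example
[Figure: the same two-component decomposition as a range-encoded index on the projection of R on A. Component 2 has bitmaps $B_2^1, B_2^0$ and component 1 has bitmaps $B_1^1, B_1^0$.]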

Design Space
[Figure: design-space diagram. The labels that survive are the two encoding schemes, equality and range.]

RangeEval
Evaluates each range predicate by computing two bitmaps: the BEQ bitmap and either BGT or BLT.
RangeEval-Opt uses only <=:
  A < v is the same as A <= v-1
  A > v is the same as NOT (A <= v)
  A >= v is the same as NOT (A <= v-1)
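The rewrite rules above can be tried out on a single-component range-encoded index, where bitmap j marks the rows with A <= j. This is only a sketch of the predicate rewriting under that assumption, not the RangeEval-Opt algorithm itself; the column data and the names `build_le_bitmaps`, `evaluate` and `rows_of` are invented for the example.

```python
def build_le_bitmaps(column, cardinality):
    """le[j] has bit i set iff column[i] <= j (range encoding, one component)."""
    le = [0] * cardinality
    for row_id, value in enumerate(column):
        for j in range(value, cardinality):
            le[j] |= 1 << row_id
    return le


def evaluate(op, v, le, n_rows):
    """Answer 'A op v' using only the 'A <= x' bitmaps plus NOT."""
    all_rows = (1 << n_rows) - 1

    def at_most(x):
        if x < 0:
            return 0            # no value is below the smallest one
        if x >= len(le):
            return all_rows     # every value is at most the largest one
        return le[x]

    if op == "<=":
        return at_most(v)
    if op == "<":
        return at_most(v - 1)               # A < v   ==  A <= v-1
    if op == ">":
        return all_rows & ~at_most(v)       # A > v   ==  NOT (A <= v)
    if op == ">=":
        return all_rows & ~at_most(v - 1)   # A >= v  ==  NOT (A <= v-1)
    raise ValueError(f"unsupported operator: {op}")


def rows_of(bitmap):
    """List the row ids whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical column A with values in 0..8
column_a = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
le = build_le_bitmaps(column_a, 9)

print(rows_of(evaluate("<", 2, le, len(column_a))))
print(rows_of(evaluate(">=", 6, le, len(column_a))))
```

Each predicate costs at most one bitmap read and one complement, which is the point of restricting evaluation to <=.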

RangeEval-OPT