ITIS 5160 Indexing.

Indexing datacubes
Objective: speed queries up.
Traditional databases (OLTP): B-trees
  Lookup time is logarithmic in the number of indexed keys (space grows linearly).
  Dynamic, stable, and perform well under updates. (But OLAP is not about updates.)
Bitmaps:
  Space efficient.
  Difficult to update (but we don't care in a DW).
  Can effectively prune searches before looking at the data.

Bitmaps
R = (…, A, …, M)

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     0  0  0  0  0  1  0  0  0
 2     0  0  0  0  0  0  1  0  0
 1     0  0  0  0  0  0  0  1  0
 2     0  0  0  0  0  0  1  0  0
 8     1  0  0  0  0  0  0  0  0
 2     0  0  0  0  0  0  1  0  0
 2     0  0  0  0  0  0  1  0  0
 0     0  0  0  0  0  0  0  0  1
 7     0  1  0  0  0  0  0  0  0
 5     0  0  0  1  0  0  0  0  0
 6     0  0  1  0  0  0  0  0  0
 4     0  0  0  0  1  0  0  0  0
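As a minimal sketch of how such a table could be produced, the Python below builds one bitmap per distinct value of A, with a 1 in row i when row i holds that value. The function name `build_equality_bitmaps` and the list-of-bits representation are illustrative assumptions, not part of the slides.

```python
def build_equality_bitmaps(column, cardinality):
    """Return {value: bitmap}; bitmap[i] == 1 iff column[i] == value."""
    bitmaps = {v: [0] * len(column) for v in range(cardinality)}
    for row, value in enumerate(column):
        bitmaps[value][row] = 1
    return bitmaps


# Column A of relation R, in the row order used on the slide.
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
bitmaps = build_equality_bitmaps(A, cardinality=9)

# Print each row as on the slide: the value, then bits B8 down to B0.
for row, value in enumerate(A):
    bits = " ".join(str(bitmaps[v][row]) for v in range(8, -1, -1))
    print(value, bits)
```

Each row has exactly one bit set, so the index needs as many bitmaps as there are distinct values.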

Query optimization
Consider a high-selectivity-factor query with predicates on two attributes. The query optimizer builds plans:
(P1) Full relation scan (filter as you go).
(P2) Index scan on the predicate with the lower selectivity factor, followed by a scan of the temporary relation to filter out non-qualifying tuples using the other predicate. (Works well if the data is clustered on the first index key.)
(P3) Index scan for each predicate (separately), followed by a merge of the RID lists.
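The merge step of (P3) can be sketched as a linear intersection of two sorted RID lists. This is an illustration only: the function name `merge_rid_lists` and the sample RIDs are made up, and a real system would read the lists from the two index scans.

```python
def merge_rid_lists(rids1, rids2):
    """Intersect two sorted RID lists in a single linear pass."""
    i = j = 0
    result = []
    while i < len(rids1) and j < len(rids2):
        if rids1[i] == rids2[j]:
            # The tuple satisfies both predicates.
            result.append(rids1[i])
            i += 1
            j += 1
        elif rids1[i] < rids2[j]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical RID lists returned by the two index scans.
list1 = [1, 4, 7, 9]
list2 = [2, 4, 9, 12]
merged = merge_rid_lists(list1, list2)
print(merged)
```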

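Query optimization (continued)
[Figure: for (P2), the index on Pred1 yields tuple list 1, the blocks of data are fetched, and Pred. 2 is applied to them; for (P3), the indexes on Pred1 and Pred2 yield tuple lists 1 and 2, which are merged into one list that gives the answer. Tuples are labeled t1 … tn.]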

Query optimization (continued)
When using bitmap indexes, (P3) can be an easy winner!
CPU operations on bitmaps (AND, OR, XOR, etc.) are more efficient than regular RID merges: just apply the binary operations to the bitmaps. (With B-trees, you would have to scan the two RID lists and select the tuples that appear in both, which is the merge operation.)
Of course, you can build B-trees on the compound key, but you would need one for every compound predicate (an exponential number of trees).

Bitmaps and predicates
A = a1 AND B = b2
(Bitmap for a1) AND (Bitmap for b2) = Bitmap for a1 and b2
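A small sketch of the same idea in Python, using an integer as a bitmap (bit i stands for row i). The column contents and the helper names `to_bitmap` and `rows_of` are assumptions made up for illustration.

```python
def to_bitmap(column, value):
    """Integer bitmap with bit i set iff column[i] == value."""
    bitmap = 0
    for row, v in enumerate(column):
        if v == value:
            bitmap |= 1 << row
    return bitmap


def rows_of(bitmap):
    """Row numbers whose bit is set in the bitmap."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical attribute columns of the same relation.
A = ["a1", "a2", "a1", "a3", "a1", "a2"]
B = ["b2", "b2", "b1", "b2", "b2", "b1"]

# A = a1 AND B = b2 becomes one bitwise AND over the two bitmaps.
result = to_bitmap(A, "a1") & to_bitmap(B, "b2")
print(rows_of(result))
```

The conjunction touches no data rows; only the qualifying rows in the result bitmap need to be fetched.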

Tradeoffs
Dimension cardinality small: dense bitmaps.
Dimension cardinality large: sparse bitmaps.
Compression (decompression).

Star-Joins
  select F.S, D1.A1, D2.A2, …, Dn.An
  from F, D1, D2, …, Dn
  where F.A1 = D1.A1 and F.A2 = D2.A2 … and F.An = Dn.An
    and D1.B1 = 'c1' and D2.B2 = 'p2' …
Likely strategy:
  For each Di, find the values of Ai such that Di.Bi = 'xi' (unless you have a bitmap index for Bi).
  Use the bitmap index on those Ai values to form a bitmap for the related rows of F (OR-ing the bitmaps).
  At this stage you have n such bitmaps; the result can be found by AND-ing them.
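The strategy above can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the tiny fact table, the two dimension tables stored as dictionaries from key Ai to attribute Bi, and the helper names `bitmap_index` and `star_join`.

```python
def bitmap_index(column):
    """Map each value to an integer bitmap (bit i set iff column[i] == value)."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def star_join(fact_columns, dimensions, wanted):
    """Return the fact rows that join with every dimension's selected keys."""
    n_rows = len(fact_columns[0])
    result = (1 << n_rows) - 1  # start with all fact rows
    for fact_col, dim, value in zip(fact_columns, dimensions, wanted):
        index = bitmap_index(fact_col)
        # Step 1: keys Ai of Di whose Bi matches the predicate.
        keys = [k for k, b in dim.items() if b == value]
        # Step 2: OR the fact-table bitmaps of those keys.
        dim_bitmap = 0
        for k in keys:
            dim_bitmap |= index.get(k, 0)
        # Step 3: AND across dimensions.
        result &= dim_bitmap
    return [row for row in range(n_rows) if (result >> row) & 1]


# Hypothetical data: D1 maps A1 -> B1, D2 maps A2 -> B2.
D1 = {1: "c1", 2: "c2", 3: "c1"}
D2 = {10: "p2", 20: "p1"}
F_A1 = [1, 2, 3, 1, 2, 3]
F_A2 = [10, 10, 20, 20, 10, 10]

rows = star_join([F_A1, F_A2], [D1, D2], ["c1", "p2"])
print(rows)
```

In practice the bitmap indexes on F would be built once and stored; the sketch rebuilds them per call only to stay self-contained.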

Bitmaps
R = (…, A, …, M)
value-list index

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     0  0  0  0  0  1  0  0  0
 2     0  0  0  0  0  0  1  0  0
 1     0  0  0  0  0  0  0  1  0
 2     0  0  0  0  0  0  1  0  0
 8     1  0  0  0  0  0  0  0  0
 2     0  0  0  0  0  0  1  0  0
 2     0  0  0  0  0  0  1  0  0
 0     0  0  0  0  0  0  0  0  1
 7     0  1  0  0  0  0  0  0  0
 5     0  0  0  1  0  0  0  0  0
 6     0  0  1  0  0  0  0  0  0
 4     0  0  0  0  1  0  0  0  0

Example
sequence <3,3> value-list index (equality)

R(A)        B22 B12 B02  B21 B11 B01
 3 (1x3+0)   0   1   0    0   0   1
 2           0   0   1    1   0   0
 1           0   0   1    0   1   0
 2           0   0   1    1   0   0
 8           1   0   0    1   0   0
 2           0   0   1    1   0   0
 2           0   0   1    1   0   0
 0           0   0   1    0   0   1
 7           1   0   0    0   1   0
 5           0   1   0    1   0   0
 6           1   0   0    0   0   1
 4           0   1   0    0   1   0
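The decomposition behind this table can be sketched as follows: each value is written as digits in the base sequence <3,3> (for example 3 = 1x3+0), and each digit gets its own equality-encoded bitmaps. The helper names `decompose` and `equality_row` are illustrative assumptions.

```python
BASES = (3, 3)  # base sequence <3,3>, most significant component first


def decompose(value, bases=BASES):
    """Split a value into one digit per component, most significant first."""
    digits = []
    for base in reversed(bases):
        value, digit = divmod(value, base)
        digits.append(digit)
    return tuple(reversed(digits))


def equality_row(value, bases=BASES):
    """Bits B22 B12 B02 B21 B11 B01 for one value, as laid out in the table."""
    bits = []
    for digit, base in zip(decompose(value, bases), bases):
        # Within a component, only the bit matching the digit is set.
        bits.extend(1 if digit == j else 0 for j in range(base - 1, -1, -1))
    return bits


A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
for value in A:
    print(value, decompose(value), equality_row(value))
```

With two base-3 components the index stores 6 bitmaps instead of the 9 needed by the single-component index, at the cost of reading one bitmap per component for an equality lookup.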

Encoding scheme
Equality encoding: all bits are 0 except the one that corresponds to the value.
Range encoding: for a value vi, the vi rightmost bits are 0 and the remaining bits are 1.

Range encoding
single component, base-9

R(A)  B8 B7 B6 B5 B4 B3 B2 B1 B0
 3     1  1  1  1  1  1  0  0  0
 2     1  1  1  1  1  1  1  0  0
 1     1  1  1  1  1  1  1  1  0
 8     1  0  0  0  0  0  0  0  0
 0     1  1  1  1  1  1  1  1  1
 7     1  1  0  0  0  0  0  0  0
 5     1  1  1  1  0  0  0  0  0
 6     1  1  1  0  0  0  0  0  0
 4     1  1  1  1  1  0  0  0  0
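A minimal sketch of range encoding, assuming the convention visible in the table: bit Bj of a row is 1 exactly when the row's value is at most j. The function name `build_range_bitmaps` is an illustrative assumption.

```python
def build_range_bitmaps(column, cardinality):
    """Return {j: bitmap}; bitmap[i] == 1 iff column[i] <= j."""
    return {
        j: [1 if value <= j else 0 for value in column]
        for j in range(cardinality)
    }


def range_row(value, cardinality):
    """Bits B(cardinality-1) down to B0 for a single value."""
    return [1 if value <= j else 0 for j in range(cardinality - 1, -1, -1)]


A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
bitmaps = build_range_bitmaps(A, cardinality=9)

for value in A:
    print(value, " ".join(str(b) for b in range_row(value, 9)))
```

Under this encoding the predicate A <= v is answered by reading the single bitmap `bitmaps[v]`; the top bitmap (B8 here) is always all ones.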

RangeEval
Evaluates each range predicate by computing two bitmaps: the BEQ bitmap and either BGT or BLT.
RangeEval-Opt uses only <=:
  A < v is the same as A <= v-1
  A > v is the same as Not(A <= v)
  A >= v is the same as Not(A <= v-1)
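The three rewrites can be illustrated on a single-component range-encoded index. This is a sketch of the rewriting rules only, not the full RangeEval-Opt algorithm; the names `LE`, `evaluate`, and `rows_of` are assumptions, and bitmaps are held as Python integers (bit i stands for row i).

```python
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
N = len(A)
ALL = (1 << N) - 1  # bitmap with every row set

# Range-encoded bitmaps: LE[v] has bit i set iff A[i] <= v.
# LE[-1] is the empty bitmap, which makes "A < 0" work without a special case.
LE = {v: sum(1 << row for row, a in enumerate(A) if a <= v) for v in range(-1, 9)}


def evaluate(op, v):
    """Evaluate 'A op v' using only the <= bitmaps."""
    if op == "<=":
        return LE[v]
    if op == "<":
        return LE[v - 1]          # A < v   is  A <= v-1
    if op == ">":
        return ALL & ~LE[v]       # A > v   is  Not(A <= v)
    if op == ">=":
        return ALL & ~LE[v - 1]   # A >= v  is  Not(A <= v-1)
    raise ValueError(f"unsupported operator: {op}")


def rows_of(bitmap):
    """Row numbers whose bit is set."""
    return [row for row in range(N) if (bitmap >> row) & 1]


print(rows_of(evaluate(">", 5)))
print(rows_of(evaluate("<", 2)))
```

Each predicate costs at most one bitmap read plus one complement, instead of two bitmaps per predicate.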

Example (revisited)
sequence <3,3> value-list index (equality)

R(A)        B22 B12 B02  B21 B11 B01
 3 (1x3+0)   0   1   0    0   0   1
 2           0   0   1    1   0   0
 1           0   0   1    0   1   0
 2           0   0   1    1   0   0
 8           1   0   0    1   0   0
 2           0   0   1    1   0   0
 2           0   0   1    1   0   0
 0           0   0   1    0   0   1
 7           1   0   0    0   1   0
 5           0   1   0    1   0   0
 6           1   0   0    0   0   1
 4           0   1   0    0   1   0

Example
sequence <3,3> range-encoded index

R(A)  B12 B02  B11 B01
 3     1   0    1   1
 2     1   1    0   0
 1     1   1    1   0
 2     1   1    0   0
 8     0   0    0   0
 2     1   1    0   0
 2     1   1    0   0
 0     1   1    1   1
 7     0   0    1   0
 5     1   0    0   0
 6     0   0    1   1
 4     1   0    1   0
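One way to answer A <= v on this two-component range-encoded index is sketched below. It is a straightforward digit-by-digit evaluation written for illustration and is not claimed to be the RangeEval-Opt algorithm itself; the names `build_index`, `le`, and `evaluate_le` are assumptions. As in the table, the highest bitmap of each component (all ones) is not stored.

```python
BASES = (3, 3)  # most significant component first
A = [3, 2, 1, 2, 8, 2, 2, 0, 7, 5, 6, 4]
N = len(A)
ALL = (1 << N) - 1


def decompose(value, bases=BASES):
    """Digits of value in the given base sequence, most significant first."""
    digits = []
    for base in reversed(bases):
        value, digit = divmod(value, base)
        digits.append(digit)
    return tuple(reversed(digits))


def build_index(column, bases=BASES):
    """index[c][j] = bitmap of rows whose digit in component c is <= j."""
    index = [{j: 0 for j in range(base - 1)} for base in bases]
    for row, value in enumerate(column):
        for c, digit in enumerate(decompose(value, bases)):
            for j in range(digit, bases[c] - 1):
                index[c][j] |= 1 << row
    return index


def le(index, c, j, bases=BASES):
    """Bitmap for 'digit of component c <= j', covering the unstored cases."""
    if j < 0:
        return 0
    if j >= bases[c] - 1:
        return ALL
    return index[c][j]


def evaluate_le(index, v, bases=BASES):
    """Rows with A <= v, combining components from least to most significant."""
    digits = decompose(v, bases)
    result = ALL
    for c in range(len(bases) - 1, -1, -1):
        lt = le(index, c, digits[c] - 1, bases)       # digit <  v's digit
        eq = le(index, c, digits[c], bases) & ~lt     # digit == v's digit
        result = lt | (eq & result)
    return [row for row in range(N) if (result >> row) & 1]


index = build_index(A)
print(evaluate_le(index, 4))
```

Here `index[0][1]` and `index[0][0]` play the role of B12 and B02, and `index[1][1]` and `index[1][0]` the role of B11 and B01.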

RangeEval-OPT