Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.

Slides:



Advertisements
Similar presentations
File Organizations and Indexing Lecture 4 R&G Chapter 8 "If you don't find it in the index, look very carefully through the entire catalogue." -- Sears,
Advertisements

Introduction to Database Systems1 Records and Files Storage Technology: Topic 3.
Multidimensional Indexing
Hashing and Indexing John Ortiz.
Chapter 11 Indexing and Hashing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Multidimensional Data Rtrees Bitmap indexes. R-Trees For “regions” (typically rectangles) but can represent points. Supports NN, “where­am­I” queries.
Multidimensional Data
1 Introduction to Database Systems CSE 444 Lectures 19: Data Storage and Indexes November 14, 2007.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
COMP 451/651 B-Trees Size and Lookup Chapter 1.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
Bitmap Index Buddhika Madduma 22/03/2010 Web and Document Databases - ACS-7102.
BTrees & Bitmap Indexes
Multiple-key indexes Index on one attribute provides pointer to an index on the other. If V is a value of the first attribute, then the index we reach.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8 “How index-learning turns no student pale Yet.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
1 Overview of Storage and Indexing Yanlei Diao UMass Amherst Feb 13, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Quick Review of Apr 15 material Overflow –definition, why it happens –solutions: chaining, double hashing Hash file performance –loading factor –search.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8 “How index-learning turns no student pale Yet.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Physical Data Warehouse Design Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
1 Lecture 20: Indexes Friday, February 25, Outline Representing data elements (12) Index structures (13.1, 13.2) B-trees (13.3)
COMP 451/651 Multiple-key indexes
Primary Indexes Dense Indexes
BITMAP INDEXES Parin Shah (Id :- 207). Introduction A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (commonly.
1.1 CAS CS 460/660 Introduction to Database Systems File Organization Slides from UC Berkeley.
Multidimensional Data Many applications of databases are ``geographic'' = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
DBMS Internals: Storage February 27th, Representing Data Elements Relational database elements: A tuple is represented as a record CREATE TABLE.
Overview of File Organizations and Indexing Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
CS 345: Topics in Data Warehousing Thursday, October 21, 2004.
CS 345: Topics in Data Warehousing Tuesday, October 19, 2004.
CSCE Database Systems Chapter 15: Query Execution 1.
Bitmap Indices for Data Warehouse Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY.
Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
1 Overview of Storage and Indexing Chapter 8. 2 Data on External Storage  Disks: Can retrieve random page at fixed cost  But reading several consecutive.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8 “How index-learning turns no student pale Yet.
Indexing and hashing Azita Keshmiri CS 157B. Basic concept An index for a file in a database system works the same way as the index in text book. For.
C-Store: Data Model and Data Organization Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May 17, 2010.
Sec 14.7 Bitmap Indexes Shabana Kazi. Introduction A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (commonly.
Similarity Searching in High Dimensions via Hashing Paper by: Aristides Gionis, Poitr Indyk, Rajeev Motwani.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8 “If you don’t find it in the index, look very.
BITMAP INDEXES Sai Priya Rama Gopal SJSU ID : Class ID: 125.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Chapter 5 Multidimensional Indexes. One dimensional index can be used to support multidimensional query. F1=‘abcd’ F2= 123‘abcd#123’
Variant Indexes. Specialized Indexes? Data warehouses are large databases with data integrated from many independent sources. Queries are often complex.
File Organizations and Indexing
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
Indexing OLAP Data Sunita Sarawagi Monowar Hossain York University.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Storage and Indexing Chapter 8.
CS4432: Database Systems II
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 8 Jianping Fan Dept of Computer Science UNC-Charlotte.
Multidimensional Access Structures COMP3017 Advanced Databases Dr Nicholas Gibbins –
1 Overview of Storage and Indexing Chapter 8. 2 Review: Architecture of a DBMS  A typical DBMS has a layered architecture.  The figure does not show.
BITMAP INDEXES Barot Rushin (Id :- 108).
Chapter 5. Multidimensional Indexes
How To Build a Compressed Bitmap Index
CS522 Advanced database Systems
Multidimensional Access Structures
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Database Management Systems (CS 564)
Introduction to Database Systems
Multidimensional Indexes
Lecture 19: Data Storage and Indexes
Presentation transcript:

Multidimensional Data

Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data about sales. - A sale is described by (store, day, item, color, size, custName, etc.). Sale = point in 6dim space. - A customer is described by (custName, age, salary, pcode, marital­status, etc.). Typical Queries Range queries: "How many customers for gold jewelry have age between 45 and 55, and salary less than 100K?" Nearest neighbor : "If I am at coordinates (a,b), what is the nearest McDonalds." They are expressible in SQL. Do you see how?

SQL Range queries: “How many customers for gold jewelry have age between 45 and 55, and salary less than 100K?” SELECT * FROM Customers WHERE age>=45 AND age<=55 AND sal<100; Nearest neighbor : “If I am at coordinates (a,b), what is the nearest McDonalds.” Suppose we have a relation Points(x,y,name) SELECT * FROM Points p WHERE p.name=‘McDonalds’ AND NOT EXISTS ( SELECT * FROM POINTS q WHERE (q.x-a)*(q.x-a)+(q.y-b)*(q.y-b) < (p.x-a)*(p.x-a)+(p.y-b)*(p.y-b) AND q.name=‘McDonalds’ );

Big Impediment For these types of queries, there is no clean way to eliminate lots of records that don't meet the condition of the WHERE­ clause. An Approach for range queries Index on attributes independently. - Intersect pointers in main memory to save disk I/O.

Attempt at using B-trees for MD-queries (example 14.26) Database = 1,000,000 points evenly distributed in a 1000×1000 square. Stored in 10,000 blocks (100 recs per block) B-tree secondary indexes on x and on y Range query {(x,y) : 450  x  550, 450  y  550} 100,000 pointers (i.e. 1,000,000/10) for the x range, and same for y 10,000 pointers for answer (found by pointer intersection) Retrieve 10,000 records. If they are stored randomly we need to do 10,000 I/O’s. Add here the cost of B-Trees: Root of each B-tree in main memory Suppose leaves have avg. 200 keys  500 disk I/O in each B-tree to get pointer lists  (for intermediate B-tree level) disk I/O’s Total 11,004 disk I/O’s, more than sequential scan of file = 10,000 I/O’s.

Bitmap Indexes

Suppose we have n tuples. A bitmap index for a field F is a collection of bit vectors of length n, one vector for each possible value that may appear in the field F. The vector for value v has 1 in position i if the i-th record has v in field F, and it has 0 there if not. (30, foo) (30, bar) (40, baz) (50, foo) (40, bar) (30, baz) foo bar baz

Graphical Picture MF CustidNameGenderRating 112JoeM3 115SamM5 119SueF5 112WuM Two bit strings for the Gender bitmap Customer table. We will index Gender and Rating. Note that this is just a partial list of all the records in the table Five bit strings for the Rating bitmap

Bitmap operations Bit maps are designed to support partial match and range queries. How? To identify the records holding a subset of the values from a given dimension, we can do a binary OR on the bitmaps from that dimension. - For example, the OR of bit strings for Age = (20, 21, 22) To identify the partial matches on a group of dimensions, we can simply perform a binary AND on the OR-ed maps from each dimension. These operations can be done very quickly since binary operations are natively supported by the CPU.

Bit Map example MF SELECT * FROM Customer WHERE gender = M AND (rating = 3 OR rating = 5) AND= First two records in our fact table are retrieved OR=

Gold-Jewelry Data (25; 60) (45; 60) (50; 75) (50; 100) (50; 120) (70; 110) (85; 140) (30; 260) (25; 400) (45; 350) (50; 275) (60; 260) What would be the bitmap index for age, and the bitmap index for salary? Suppose we want to find the jewelry buyers with an age in the range and a salary in the range What do we do?

Compressed Bitmaps Run length encoding is used to encode sequences or runs of zeros. Say that we have 20 zeros, then a 1, then 30 more zeros, then another 1. Naively, we could encode this as the integer pair - This would work. But what's the problem? - On a typical 32-bit machine, an integer uses 32 bits of storage. So our pair uses 64 bits. The original string only had 52!

Run Length Encoding So we must use a technique that stores our run-lengths as compactly as possible. Let’s say we have the string This is made up of runs with 3 zeros and 1 zero. - In binary, 3 = 11, while 1 is, of course, just 1 - This gives us a compressed representation of 111. The problem? - How do we decompress this? - We could interpret this as 1-11 or 11-1 or even This would give us three different strings after the decompression.

Proper Run Length Encoding Run of length i has i 0’s followed by a 1. Let’s say that that we have a run of length i. Let j be the number of bits required to represent i. To define a run, we will use two values: - The “unary” representation of j A sequence of j – 1 “1” bits followed by a zero (the zero signifies the end of the unary string) - The value i (using j bits) - The special cases of i = 0 and i = 1 use 00 and 01 respectively. Here j=1 so there are no 1 bits (since j-1 =0)  Here we have two “0” runs of length 13 and 6  13 can be represented by 4 bits, 6 requires 3 bits  Run 1: 3 “1” bits i →  Run 2: 2 “1” bits i →  Final compressed string:  Compression rate: 14/21 = 0.66 Example:

Decoding Let’s decode    3 Our sequence of run lengths is: 13, 0, 3. What’s the bitmap?

Summary Pros: 1.Bitmaps provide efficient storage for low cardinality dimensions. On sparse, high cardinality dimensions, compression can be effective. 2.Bit operations can support multi-dimensional partial match and range queries Cons: 1.De-compression requires run-time overhead 2.Bit operations on large maps and with large dimension counts can be expensive.