Physical Storage materialized views indexes partitions ETL CDC.

Slides:



Advertisements
Similar presentations
Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
Advertisements

Data Warehousing and Decision Support, part 2
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Outline What is a data warehouse? A multi-dimensional data model Data warehouse architecture Data warehouse implementation Further development of data.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Multidimensional Data Rtrees Bitmap indexes. R-Trees For “regions” (typically rectangles) but can represent points. Supports NN, “where­am­I” queries.
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
BTrees & Bitmap Indexes
March 2010ACS-4904 Ron McFadyen1 Indexes B-tree index Bitmapped index Bitmapped join index A data warehousing DBMS will likely provide these, or variations,
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Physical Data Warehouse Design Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
Data Warehousing - 3 ISYS 650. Snowflake Schema one or more dimension tables do not join directly to the fact table but must join through other dimension.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
By N.Gopinath AP/CSE. Two common multi-dimensional schemas are 1. Star schema: Consists of a fact table with a single table for each dimension 2. Snowflake.
Utility Service Database Design a database to keep track of service calls for a utility company: Customers call to report problems Call center manages.
PARALLEL DBMS VS MAP REDUCE “MapReduce and parallel DBMSs: friends or foes?” Stonebraker, Daniel Abadi, David J Dewitt et al.
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
Data Partitioning in VLDB Tal Olier
Data Warehousing.
BI Terminologies.
1 Multiple Table Queries. 2 Objectives  Retrieve data from more than one table by joining tables  Using IN and EXISTS to query multiple tables  Nested.
SQL Jan 20,2014. DBMS Stores data as records, tables etc. Accepts data and stores that data for later use Uses query languages for searching, sorting,
Indexes and Views Unit 7.
Variant Indexes. Specialized Indexes? Data warehouses are large databases with data integrated from many independent sources. Queries are often complex.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Sales Dim Date Dim Customers Dim Products Dim Categories Dim Geography The data warehouse is a simple and standard one, after all we.
7 Strategies for Extracting, Transforming, and Loading.
Chapter 5 Index and Clustering
2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN Welcome November 2012 Columnstore Indexes.
Data Warehousing.
Chapter 8 Views and Indexes 第 8 章 视图与索引. 8.1 Virtual Views  Views:  “virtual relations”. Another class of SQL relations that do not exist physically.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Packet Classification Using Multi- Iteration RFC Author: Chun-Hui Tsai, Hung-Mao Chu, Pi-Chung Wang Publisher: 2013 IEEE 37th Annual Computer Software.
Or How I Learned to Love the Cube…. Alexander P. Nykolaiszyn BLOG:
Indexes 22 Index Table Key Row pointer … WHERE key = 22.
9 Copyright © 2006, Oracle. All rights reserved. Summary Management.
All DBMSs provide variations of b-trees for indexing B-tree index
Storage Access Paging Buffer Replacement Page Replacement
Indexes By Adrienne Watt.
Indexing Structures for Files and Physical Database Design
Index An index is a performance-tuning method of allowing faster retrieval of records. An index creates an entry for each value that appears in the indexed.
CS 440 Database Management Systems
Parallel Databases.
Database Management System
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
Applying Data Warehouse Techniques
Database Performance Tuning and Query Optimization
Chapter 15 QUERY EXECUTION.
Tapping the Power of Your Historical Data
File Processing : Query Processing
Databases.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Applying Data Warehouse Techniques
Lecture 15: Bitmap Indexes
Typically data is extracted from multiple sources
Physical Storage Indexes Partitions Materialized views March 2004
Physical Storage Indexes Partitions Materialized views March 2006
Applying Data Warehouse Techniques
Physical Storage Indexes Partitions Materialized views March 2005
The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited)
Data warehouse architecture CIF, DM Bus Matrix Star schema
Query Processing CSD305 Advanced Databases.
Point-in-time balances Physical database Aggregation ETL Architecture
Dimensional Model January 16, 2003
Chapter 11 Database Performance Tuning and Query Optimization
Self-organizing Tuple Reconstruction in Column-stores
Applying Data Warehouse Techniques
Table Suitable for Bitmap Index
Applying Data Warehouse Techniques
Data Warehousing.
Presentation transcript:

Physical Storage materialized views indexes partitions ETL CDC

Indexes B+-tree index Bitmapped index Bitmapped join index

B-tree structures Most used access structure in database systems. A B+-tree is a balanced tree B+-trees are very good in environments with mixed reading and writing operations, concurrency, exact searches and range searches. B+-tree indexes present excellent performance figures when used to find few rows in big tables.

Bitmapped index Consider the following Customer table Cust_id gender province phone 22 M Ab (403) 444-1234 44 M Mb (204) 777-6789 77 F Sk (306) 384-8474 88 F Sk (306) 384-6721 99 M Mb (204) 456-1234 Note that Mb appears in rows 2, 5 Note that Sk appears in rows 3, 4; Ab in row 1 Note that M appears in rows 1, 2, 5; F in rows 3, 4

Bitmapped index A bitmapped index comprises a collection of bit arrays. There is one bit array for each distinct value of the indexed attribute. Useful for low-cardinality attributes Since gender has two values there are two bit arrays. Each bit array has the same number of bits as there are rows in the table. If we construct a bitmapped index for gender we would have two bit arrays of 5 bits each: M 1 1 0 0 1 F 0 0 1 1 0

Bitmapped index If we construct a bitmapped index for Province we would have three bit arrays of 5 bits each: Ab 1 0 0 0 0 Mb 0 1 0 0 1 Sk 0 0 1 1 0

Bitmapped index Consider a query Select Sum(s.amount) From Sales s Inner Join Customer c On ( … ) where c.gender =M and c.province = Mb The database query processor could choose to AND the arrays for gender=M and province=Mb The result indicates that rows 2 and 5 of Customer satisfy the query; only these two rows need to be joined to Sales M 1 1 0 0 1 Mb 0 1 0 0 1 Result 0 1 0 0 1

Bitmapped Join Index In general, a join index is a structure containing index entries (attribute value, row pointers), where the attribute values are in one table, and the row pointers are to related rows in another table Consider Customer Sales Date

Cust_id Cust_id Store_id Date_id Join Index Customer Cust_id gender province phone 22 M Ab (403) 444-1234 44 M Mb (204) 777-6789 77 F Sk (306) 384-8474 88 F Sk (306) 384-6721 99 M Mb (204) 456-1234 Sales Cust_id Store_id Date_id Amount row 1 22 1 90 100 2 44 2 7 150 3 22 2 33 50 4 44 3 55 50 5 99 3 55 25

Bitmapped Join Index In some database systems, a bitmapped join index can be created which indexes Sales by an attribute of customer: home province. Here is SQL for the index: CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.province) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id;

Bitmapped Join Index There are three province values in Customer. The join index will have three entries where each has a province value and a bitmap: Mb 0 1 0 1 1 Ab 1 0 1 0 0 Sk 0 0 0 0 0 The bitmap index shows that rows 2, 4, 5 of the Sales fact table are rows for customers with province = Mb The bitmap index shows that rows 1, 3 of the Sales fact table are rows for customers with province = Ab The bitmap index shows that no rows of the Sales fact table are rows for customers with province = Sk

Bitmapped Join Index The bitmap join index could be used to evaluate the following query. In this query, the CUSTOMER table will not even be accessed; the query is executed using only the bitmap join index and the sales table. SELECT SUM(Sales.dollar_amount) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id AND Customer.province = Mb; The bitmap index will show that rows 2, 4, 5 of the Sales fact table are rows for customers with province = M

Partitions Physical storage can be organized in partitions where the value of an attribute is used to determine where a row is placed. Often the attribute is a date. Dates are often used to restrict a query. Archiving older data is facilitated since one only has to drop the oldest partition.

Change Data Capture ETL processes need to obtain the changes that have occurred in the legacy environment. These changes can be captured in many ways: from the database logs database replication by analyzing data in tables according to some timestamp attribute by analyzing data in tables comparing data to what is in the warehouse