Data Warehouse and the Star Schema CSCI 242 ©Copyright 2015, David C. Roberts, all rights reserved.



Red Brick
Invented the data warehouse; they sold a hardware product with a star schema database.
You loaded the Red Brick Warehouse and then queried it for OLAP.
It featured new optimizations for star schemas and was very fast.

Enter Sybase
Sybase learned the optimization and developed its own product.
The Sybase product was a stand-alone software data warehouse product.
It couldn't do general-purpose database work; it was just a data warehouse.
They appear to have copied the Red Brick idea, without selling hardware.

Enter Oracle
Oracle later copied the same optimization.
They added a bitmap index to their database product, and added the star schema optimization.
Now their product could do data warehouse work as well as general-purpose database work.

Status Today
Oracle dominates the field today.
IBM eventually bought Red Brick, so it still offers some sort of Red Brick product.
Sybase offers its OLTP product, now as an offering of SAP.

THE ALGORITHM
So what is this algorithm that is so widely copied?

Optimizing Star Queries
Build a bitmap index on each foreign key column of the fact table.
The index is a 2-dimensional array of bits, with one column for each row being indexed and one row per value of that column; a bit is set when that row holds that value.
Bitmap indexes are typically much smaller than b-tree indexes, which can be larger than the data itself.

Bitmap Index Example
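The slide's figure is not part of the transcript, so here is a minimal sketch of a bitmap index in Python. It assumes a made-up six-row `store_key` column and uses a Python integer as each bitmap, with bit i standing for row i. The function names are illustrative only and are not taken from any database product.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def render(bitmap, n_rows):
    """Show a bitmap as a string of 0/1 characters, row 0 first."""
    return "".join("1" if (bitmap >> i) & 1 else "0" for i in range(n_rows))


# Hypothetical store_key column of a six-row fact table.
store_key = [10, 20, 10, 30, 20, 10]
index = build_bitmap_index(store_key)

# One printed line per distinct value, one character per fact row.
for value in sorted(index):
    print(value, render(index[value], len(store_key)))
```

Each printed line is one row of the 2-dimensional array: the index needs one bit per fact row for every distinct key value, which is why it stays small when a column has few distinct values.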

Query Processing
The typical query joins the fact table to the dimension tables through the fact table's foreign keys.
This is processed in two phases:
1. From the fact table, retrieve all rows that are part of the result, using bitmap indexes.
2. Join the result of the step above to the dimension tables.
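The two phases above can be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular database implements it: the tables, key values, and helper names are invented, and each bitmap is a Python integer with bit i standing for fact row i.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(index, wanted_keys):
    """OR together the bitmaps of every wanted key value."""
    bitmap = 0
    for key in wanted_keys:
        bitmap |= index.get(key, 0)
    return bitmap


def row_ids(bitmap):
    """List the positions of the set bits, lowest row first."""
    ids, i = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(i)
        bitmap >>= 1
        i += 1
    return ids


# Hypothetical dimension tables (key -> attribute) and fact rows.
stores = {1: "West", 2: "East", 3: "Southwest"}
products = {7: "Grocery", 8: "Hardware"}
facts = [  # (store_key, product_key, dollar_sales)
    (1, 7, 100.0), (2, 7, 50.0), (3, 8, 75.0), (3, 7, 20.0), (1, 8, 60.0),
]
store_index = build_bitmap_index([f[0] for f in facts])
product_index = build_bitmap_index([f[1] for f in facts])

# Phase 1: find qualifying dimension keys, then AND the per-dimension bitmaps.
store_keys = [k for k, d in stores.items() if d in ("West", "Southwest")]
product_keys = [k for k, d in products.items() if d == "Grocery"]
hits = matching_rows(store_index, store_keys) & matching_rows(product_index, product_keys)
selected = row_ids(hits)

# Phase 2: join only the selected fact rows back to the dimension tables.
result = [(stores[facts[i][0]], products[facts[i][1]], facts[i][2]) for i in selected]
print(result)
```

The fact table itself is touched only for the rows that survive the AND, which is where the speed of phase 1 comes from.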

Example Query
Find sales and profits from the grocery departments of stores in the West and Southwest districts over the last three quarters.

Example Query
SELECT store.sales_district,
       time.fiscal_period,
       SUM(sales.dollar_sales) revenue,
       SUM(dollar_sales) - SUM(dollar_cost) income
FROM sales, store, time, product
WHERE sales.store_key = store.store_key
  AND sales.time_key = time.time_key
  AND sales.product_key = product.product_key
  AND time.fiscal_period IN ('3Q95', '4Q95', '1Q96')
  AND product.department = 'Grocery'
  AND store.sales_district IN ('San Francisco', 'Los Angeles')
GROUP BY store.sales_district, time.fiscal_period;

Phase 1
Finding the rows in the SALES table (using bitmap indexes):
SELECT ...
FROM sales
WHERE store_key IN (SELECT store_key FROM store
                    WHERE sales_district IN ('San Francisco', 'Los Angeles'))
  AND time_key IN (SELECT time_key FROM time
                   WHERE fiscal_period IN ('3Q95', '4Q95', '1Q96'))
  AND product_key IN (SELECT product_key FROM product
                      WHERE department = 'Grocery');
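As a sanity check on this rewrite, the sketch below uses Python's `sqlite3` module to confirm that the IN-subquery form picks out the same fact rows as the original join. The sample rows and literal values are made up for illustration. SQLite does not use bitmap indexes, so this shows only that the two forms are logically equivalent when each dimension key is unique; it says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store   (store_key INTEGER PRIMARY KEY, sales_district TEXT);
CREATE TABLE time    (time_key INTEGER PRIMARY KEY, fiscal_period TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE sales   (store_key INTEGER, time_key INTEGER, product_key INTEGER,
                      dollar_sales REAL, dollar_cost REAL);
INSERT INTO store   VALUES (1, 'San Francisco'), (2, 'Los Angeles'), (3, 'Boston');
INSERT INTO time    VALUES (1, '2Q95'), (2, '3Q95'), (3, '4Q95');
INSERT INTO product VALUES (1, 'Grocery'), (2, 'Hardware');
INSERT INTO sales VALUES
  (1, 2, 1, 100, 80),   -- qualifies
  (2, 3, 1,  50, 30),   -- qualifies
  (3, 2, 1,  70, 60),   -- wrong district
  (1, 1, 1,  40, 35),   -- wrong quarter
  (2, 2, 2,  90, 70);   -- wrong department
""")

# Fact rows reached by the original join form of the query.
join_rows = conn.execute("""
SELECT sales.rowid
FROM sales, store, time, product
WHERE sales.store_key = store.store_key
  AND sales.time_key = time.time_key
  AND sales.product_key = product.product_key
  AND time.fiscal_period IN ('3Q95', '4Q95', '1Q96')
  AND product.department = 'Grocery'
  AND store.sales_district IN ('San Francisco', 'Los Angeles')
ORDER BY sales.rowid
""").fetchall()

# Fact rows reached by the phase 1 form, which reads only the fact table
# plus one small subquery per dimension.
semi_rows = conn.execute("""
SELECT rowid
FROM sales
WHERE store_key IN (SELECT store_key FROM store
                    WHERE sales_district IN ('San Francisco', 'Los Angeles'))
  AND time_key IN (SELECT time_key FROM time
                   WHERE fiscal_period IN ('3Q95', '4Q95', '1Q96'))
  AND product_key IN (SELECT product_key FROM product
                      WHERE department = 'Grocery')
ORDER BY rowid
""").fetchall()

print(join_rows == semi_rows, semi_rows)
```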

Phase 2
Now the fact table rows found in phase 1 are joined to the dimension tables.
For dimension tables of small cardinality, a full-table scan may be used.
For large cardinality, a hash join could be used.
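A hash join of the selected fact rows to one dimension table can be sketched as below. This is a generic textbook build-and-probe join, not a specific vendor's implementation. The rows are hypothetical, and the sketch assumes the dimension key is unique, as a dimension table's primary key normally is.

```python
def hash_join(fact_rows, dim_rows, fact_key, dim_key):
    """Join fact rows to dimension rows on fact_key = dim_key."""
    # Build step: hash the dimension table on its (assumed unique) key.
    table = {d[dim_key]: d for d in dim_rows}
    # Probe step: look up each fact row's foreign key in the hash table.
    joined = []
    for f in fact_rows:
        d = table.get(f[fact_key])
        if d is not None:
            joined.append({**f, **d})
    return joined


# Hypothetical fact rows surviving phase 1, and a store dimension table.
selected_sales = [
    {"store_key": 1, "dollar_sales": 100.0},
    {"store_key": 3, "dollar_sales": 20.0},
]
store = [
    {"store_key": 1, "sales_district": "West"},
    {"store_key": 2, "sales_district": "East"},
    {"store_key": 3, "sales_district": "Southwest"},
]

for row in hash_join(selected_sales, store, "store_key", "store_key"):
    print(row)
```

The dimension table is read once to build the hash table and each fact row costs one lookup, so the work grows with the sum of the two inputs and not their product.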

The Star Transformation
Use bitmap indexes to retrieve all relevant rows from the fact table, based on foreign key values
– This happens very fast
Join this result set to the dimension tables
– If there are many values, a hash join may be used
– If there are fewer values, a b-tree driven join may be used