Aggregate management (ACS-4904, Ron McFadyen, March 2010). References: Lawrence Corr, Aggregate improvement, www.intelligententerprise.com/011004/415warehouse1_1.jhtml.

Presentation transcript:

Aggregate management
ACS-4904, Ron McFadyen, March 2010

References:
- Lawrence Corr: Aggregate improvement; Lost, shrunken, and collapsed
- Ralph Kimball: Aggregate navigation with (almost) no metadata

Aggregate Navigation

From “20 criteria for dimensional friendly systems”, #4, Open Aggregate Navigation:

The system uses physically stored aggregates as a way to enhance performance of common queries. These aggregates, like indexes, are chosen silently by the database if they are physically present. End users and application developers do not need to know what aggregates are available at any point in time, and applications are not required to explicitly code the name of an aggregate. All query processes accessing the data, even those from different application vendors, realize the full benefit of aggregate navigation.

Aggregation

- Created for performance reasons
- Using one base star schema, many aggregates can be defined
- Some dimensions could be lost, some shrunken or collapsed (see Corr’s articles)

Consider the transaction-level schema:

  Dimensions: Customer, Date, Product, Store, Transaction
  Metrics: Price, Cost, Tax, Qty

Aggregation

Transaction-level schema:
  Dimensions: Customer, Date, Product, Store, Transaction
  Metrics: Price, Cost, Tax, Qty

Aggregate schema:
  Dimensions: Month, Product, Store
  Metrics: Price, Cost, Tax, Qty

The aggregate schema has:
- lost dimensions (Transaction, Customer)
- a shrunken dimension (Month)
- metrics accumulated over the transactions, customers, and month

Aggregation

Suppose you must create the schema:

  Dimensions: Month, Product, Store
  Metrics: Price, Cost, Tax, Qty

What are the steps to do so?
- Product and Store already exist, but the others must be created
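One possible answer to the question above, as a minimal sketch rather than the course's own solution: derive the shrunken Month dimension from Date, then build the aggregate fact table by summing the metrics per month, product, and store. The table and column names (`date_dim`, `sales_fact`, `month_dim`, `sales_month_fact`, the `_sk` keys) and the sample rows are hypothetical, and the sketch assumes Price, Cost, and Tax are additive (extended) amounts. It uses Python's `sqlite3` only so that the SQL can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base (transaction-level) schema, reduced to what the rollup needs.
CREATE TABLE date_dim (date_sk INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE sales_fact (
    customer_sk INTEGER, date_sk INTEGER, product_sk INTEGER, store_sk INTEGER,
    transaction_id INTEGER, price REAL, cost REAL, tax REAL, qty INTEGER);

INSERT INTO date_dim VALUES
    (1, '2010-03-01', '2010-03'),
    (2, '2010-03-02', '2010-03'),
    (3, '2010-04-01', '2010-04');
INSERT INTO sales_fact VALUES
    (10, 1, 100, 7, 5001, 20.0, 12.0, 1.0,  2),
    (11, 2, 100, 7, 5002, 10.0,  6.0, 0.5,  1),
    (10, 3, 100, 7, 5003,  5.0,  3.0, 0.25, 1);

-- Step 1: create the shrunken Month dimension from Date.
CREATE TABLE month_dim (month_sk INTEGER PRIMARY KEY AUTOINCREMENT, month TEXT UNIQUE);
INSERT INTO month_dim (month) SELECT DISTINCT month FROM date_dim ORDER BY month;

-- Step 2: create the aggregate fact table. Customer and Transaction are
-- simply left out (lost dimensions); Date is replaced by Month.
CREATE TABLE sales_month_fact AS
SELECT m.month_sk, f.product_sk, f.store_sk,
       SUM(f.price) AS price, SUM(f.cost) AS cost,
       SUM(f.tax) AS tax, SUM(f.qty) AS qty
FROM sales_fact f
JOIN date_dim d ON d.date_sk = f.date_sk
JOIN month_dim m ON m.month = d.month
GROUP BY m.month_sk, f.product_sk, f.store_sk;
""")

monthly = conn.execute("""
    SELECT m.month, a.product_sk, a.store_sk, a.price, a.cost, a.tax, a.qty
    FROM sales_month_fact a JOIN month_dim m ON m.month_sk = a.month_sk
    ORDER BY m.month
""").fetchall()
for row in monthly:
    print(row)
```

Product and Store are reused unchanged, so the aggregate fact table keeps their existing surrogate keys.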

Aggregation Design Goal 1

Aggregates must be stored in their own fact tables; each distinct aggregation level must occupy its own unique fact table.

Aggregation Design Goal 2

The dimension tables attached to the aggregate fact tables must be shrunken versions of the dimension tables associated with the base fact table.

  Product: ProductSK, SKU, Description, Brand, Category, Department
  Category: Category SK?, Category, Department

Category is a shrunken version of Product; its attribute names come from Product.

- What values does the PK of Category assume?
- What name should the PK of Category have?
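As an illustration of shrinking a dimension, the sketch below derives Category from Product by keeping only the attributes that survive at the category level. It takes one possible position on the two questions above, which the slide leaves open: the key gets freshly generated surrogate values and is named `category_sk`. Both choices, the lowercase table and column names, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_sk INTEGER PRIMARY KEY, sku TEXT, description TEXT,
    brand TEXT, category TEXT, department TEXT);
INSERT INTO product VALUES
    (1, 'A-1', 'Cola 355 ml',  'FizzCo', 'Soft drinks', 'Beverages'),
    (2, 'A-2', 'Lemonade 1 l', 'FizzCo', 'Soft drinks', 'Beverages'),
    (3, 'B-1', 'Rye loaf',     'BakeCo', 'Bread',       'Bakery');

-- Shrunken dimension: only Category and Department remain, with the same
-- attribute names as in Product. The key name and values are assumptions.
CREATE TABLE category (
    category_sk INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT, department TEXT,
    UNIQUE (category, department));
INSERT INTO category (category, department)
SELECT DISTINCT category, department FROM product ORDER BY department, category;
""")

categories = conn.execute(
    "SELECT category, department FROM category ORDER BY category"
).fetchall()

# Lookup an ETL job could use to re-key base fact rows to the category level.
product_to_category = dict(conn.execute("""
    SELECT p.product_sk, c.category_sk
    FROM product p
    JOIN category c ON c.category = p.category AND c.department = p.department
"""))

print(categories)
print(product_to_category)
```

SKU, Description, and Brand are dropped because they vary within a category and so have no single value at that level.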

Aggregation Design Goal 3

The base atomic fact table and all of its related aggregate fact tables must be associated together as a “family of schemas” so that the aggregate navigator knows which tables are related to one another.
- Kimball’s paper presents an algorithm/technique for mapping an SQL statement from a base schema to an aggregate schema

Aggregation Design Goal 4

Force all SQL created by any end-user data access tool or application to refer exclusively to the base fact table and its associated full-size dimension tables.

Kimball’s Aggregate Navigation Algorithm

1. For any given SQL statement presented to the DBMS, find the smallest fact table that has not yet been examined in the family of schemas referenced by the query. "Smallest" in this case means the least number of rows. Finding the smallest unexamined schema is a simple lookup in the aggregate navigator metadata table. Choose the smallest schema and proceed to step 2.
2. Compare the table fields in the SQL statement to the table fields in the particular Fact and Dimension tables being examined. This is a series of lookups in the DBMS system catalog. If all of the fields in the SQL statement can be found in the Fact and Dimension tables being examined, alter the original SQL by simply substituting destination table names for original table names. No field names need to change. If any field in the SQL statement cannot be found in the current Fact and Dimension tables, then go back to step 1 and find the next larger Fact table.
3. Run the altered SQL. It is guaranteed to return the correct answer because all of the fields in the SQL statement are present in the chosen schema.
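The three steps can be sketched in a few lines of Python. This is a simplified illustration, not Kimball's implementation: the `FAMILY` dictionary stands in for both the navigator metadata table and the DBMS system catalog, all table names, column names, and row counts are hypothetical, and the query must qualify every column as `table.column`. The aggregate tables deliberately reuse the base column names (including key names) so that, as step 2 says, only table names need to change. The regex-based rewrite ignores string literals and aliases, which a real navigator would have to handle.

```python
import re

# Hypothetical family of schemas. Each entry describes one fact table:
# its row count, and for every base table name the destination table that
# plays that role plus the set of columns the destination table holds.
FAMILY = [
    {"rows": 5_000_000, "maps": {
        "sales_fact": ("sales_fact", {"date_sk", "product_sk", "store_sk", "customer_sk", "qty", "price"}),
        "date_dim": ("date_dim", {"date_sk", "full_date", "month"}),
        "product": ("product", {"product_sk", "sku", "brand", "category", "department"}),
    }},
    {"rows": 40_000, "maps": {
        "sales_fact": ("sales_month_category_fact", {"date_sk", "product_sk", "store_sk", "qty", "price"}),
        "date_dim": ("month_dim", {"date_sk", "month"}),
        "product": ("category", {"product_sk", "category", "department"}),
    }},
]

FIELD_REF = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*\b")


def navigate(sql):
    """Rewrite SQL written against the base schema to the smallest schema that can answer it."""
    refs = set(FIELD_REF.findall(sql))
    if not refs:
        raise ValueError("this sketch needs columns written as table.column")
    # Step 1: examine fact tables from fewest rows to most.
    for schema in sorted(FAMILY, key=lambda s: s["rows"]):
        maps = schema["maps"]
        # Step 2: every referenced field must exist in the candidate tables.
        if all(t in maps and c in maps[t][1] for t, c in refs):
            # Substitute destination table names; field names stay as they are.
            return IDENTIFIER.sub(
                lambda m: maps[m.group(0)][0] if m.group(0) in maps else m.group(0),
                sql,
            )
    raise LookupError("no schema in the family contains all referenced fields")


by_category = (
    "SELECT product.category, SUM(sales_fact.qty) FROM sales_fact, product "
    "WHERE sales_fact.product_sk = product.product_sk GROUP BY product.category"
)
by_brand = by_category.replace("product.category", "product.brand")

# Step 3 would run the returned SQL; here it is only printed.
print(navigate(by_category))
print(navigate(by_brand))
```

The category query is redirected to the aggregate tables, while the brand query falls through to the base schema because Brand does not exist in the shrunken Category dimension.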

Aggregate Navigation

“Other data mart capabilities worth mentioning are aggregation and aggregate navigation. The nature of multidimensional analysis dictates the extensive use of aggregation. … Both UDB and O8i provide a database-resident means for navigating user queries to the optimum aggregated tables”

Materialized Views

- An MV is a view whose result set is pre-computed and stored in the database
- Periodically an MV must be re-computed to account for updates to its source tables
- MVs represent a performance enhancement to a DBMS, as the result set is immediately available when the view is referenced in an SQL statement
- MVs are used in some cases to provide summaries or aggregates for data warehousing transparently to the end-user
- Query optimization is a DBMS feature that may re-write an SQL statement supplied by a user
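The pre-compute and re-compute cycle described above can be imitated with an ordinary summary table. SQLite has no materialized views, so the sketch below is an emulation under that assumption, not a real MV: `refresh_mv` performs a full refresh by rebuilding the stored result set, and reads between refreshes see stale data. The table names, function names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store TEXT, qty INTEGER);
INSERT INTO sales VALUES ('Winnipeg', 2), ('Winnipeg', 3), ('Brandon', 1);
""")


def refresh_mv(conn):
    """Recompute the stored result set from the source table (a full refresh)."""
    with conn:  # commit the rebuild as one transaction
        conn.execute("DROP TABLE IF EXISTS mv_store_qty")
        conn.execute(
            "CREATE TABLE mv_store_qty AS "
            "SELECT store, SUM(qty) AS qty FROM sales GROUP BY store"
        )


def read_mv(conn):
    """Read the stored result set; no aggregation happens at query time."""
    return dict(conn.execute("SELECT store, qty FROM mv_store_qty"))


refresh_mv(conn)
conn.execute("INSERT INTO sales VALUES ('Brandon', 4)")

stale = read_mv(conn)   # the stored result does not yet reflect the new row
refresh_mv(conn)
fresh = read_mv(conn)   # after the refresh it does

print(stale)
print(fresh)
```

A DBMS with real materialized views adds what this emulation lacks: refresh scheduling and, where supported, query rewrite so that users never name the summary table.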

Materialized Views

Oracle example:

    CREATE MATERIALIZED VIEW hr.mview_employees AS
    SELECT employees.employee_id, employees. FROM employees
    UNION ALL
    SELECT new_employees.employee_id, new_employees. FROM new_employees;

SQL Server has “indexed views” … see