OLAP CS 543 – Data Warehousing. CS 543 - Data Warehousing (Sp 2007-2008) - Asim LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing.

Slides:



Advertisements
Similar presentations
Dimensional Modeling.
Advertisements

CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
Technical BI Project Lifecycle
Online Analytical Processing OLAP
Data Warehousing CPS216 Notes 13 Shivnath Babu. 2 Warehousing l Growing industry: $8 billion way back in 1998 l Range from desktop to huge: u Walmart:
OLAP Services Business Intelligence Solutions. Agenda Definition of OLAP Types of OLAP Definition of Cube Definition of DMR Differences between Cube and.
Dimensional Modeling CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 From Requirements to Data Models.
OLAP. Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, analytic queries.
Dimensional Modeling – Part 2
Exploiting the DW data DW is a platform for creating a wide array of reports It solves data feed problems, but does not lead to specific decision support.
Data Sources Data Warehouse Analysis Results Data visualisation Analytical tools OLAP Data Mining Overview of Business Intelligence Data visualisation.
Components and Architecture CS 543 – Data Warehousing.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
Physical Design CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 Physical Design Steps 1. Develop standards 2.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
INTRODUCTION TO OLAP MIS 497. Why OLAP? Online Analytical Processing vs. Online Transaction Processing Online Analytical Processing vs. Online Transaction.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
Designing a Data Warehouse
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
1 Basic concepts of On-Line Analytical processing DT211 /4.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
Week 6 Lecture The Data Warehouse Samuel Conn, Asst. Professor
Ahsan Abdullah 1 Data Warehousing Lecture-12 Relational OLAP (ROLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics.
Introduction to OLAP / Microsoft Analysis Services
Multi-Dimensional Databases & Online Analytical Processing This presentation uses some materials from: “ An Introduction to Multidimensional Database Technology,
Ahsan Abdullah 1 Data Warehousing Lecture-11 Multidimensional OLAP (MOLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for.
Dr. Abdul Basit Siddiqui Assistant Professor FURC, Islamabad.
OnLine Analytical Processing (OLAP)
1 Data Warehousing Lecture-13 Dimensional Modeling (DM) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research.
Faster and Smarter Data Warehouses with Oracle OLAP 11g.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Data Warehouse. Design DataWarehouse Key Design Considerations it is important to consider the intended purpose of the data warehouse or business intelligence.
1 Data Warehouses BUAD/American University Data Warehouses.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
Online analytical processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through.
Data Warehousing.
Ahsan Abdullah 1 Data Warehousing Lecture-10 Online Analytical Processing (OLAP) Virtual University of Pakistan Ahsan Abdullah Assoc. Prof. & Head Center.
BUSINESS ANALYTICS AND DATA VISUALIZATION
Designing Aggregations. Performance Fundamentals - Aggregations Pre-calculated summaries of data Intersections of levels from each dimension Tradeoff.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Ayyat IT Group Murad Faridi Roll NO#2492 Muhammad Waqas Roll NO#2803 Salman Raza Roll NO#2473 Junaid Pervaiz Roll NO#2468 Instructor :- “ Madam Sana Saeed”
UNIT-II Principles of dimensional modeling
1 On-Line Analytic Processing Warehousing Data Cubes.
Building Dashboards SharePoint and Business Intelligence.
OLAP in DWH Ján Genči PDT. 2 Outline OLAP Definitions and Rules The term OLAP was introduced in a paper entitled “Providing On-Line Analytical.
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
What is OLAP?.
Session id: Darrell Hilliard Senior Delivery Manager Oracle University Oracle Corporation.
Data Resource Management Agenda What types of data are stored by organizations? How are different types of data stored? What are the potential problems.
1 Copyright © 2009, Oracle. All rights reserved. Oracle Business Intelligence Enterprise Edition: Overview.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
1 Database Systems, 8 th Edition Star Schema Data modeling technique –Maps multidimensional decision support data into relational database Creates.
1 Copyright © 2006, Oracle. All rights reserved. Defining OLAP Concepts.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Data Warehouses and OLAP 1.  Review Questions ◦ Question 1: OLAP ◦ Question 2: Data Warehouses ◦ Question 3: Various Terms and Definitions ◦ Question.
Pindaro Demertzoglou Data Resource Management – MGMT 4170 Lally School of Management Rensselaer Polytechnic Institute.
OLAP Theory-English version On-Line Analytical processing (Buisness Intelligence) Ing.Skorkovský,CSc Department of Corporate Economy Faculty of Economics.
On-Line Analytic Processing
Chapter 13 The Data Warehouse
Data Warehouse.
Introduction of Week 9 Return assignment 5-2
OLAP in DWH Ján Genči PDT.
Online analytical processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through.
OLAP Defined by E F Codd “relational databases were never intended to provide the very powerful functions for data synthesis, analysis and consolidation.
Presentation transcript:

OLAP CS 543 – Data Warehousing

CS Data Warehousing (Sp ) - Asim LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing. OLAP is a characterization of applications, not a database design technique. Idea is to provide very fast response time in order to facilitate iterative decision-making. Analytical processing requires access to complex aggregations (as opposed to record-level access).

CS Data Warehousing (Sp ) - Asim LUMS3 Where Does OLAP Fit In? (2) Information is conceptually viewed as “cubes” for simplifying the way in which users access, view, and analyze data. Quantitative values are known as “facts” or “measures.”  e.g., sales $, units sold, etc. Descriptive categories are known as “dimensions.”  e.g., geography, time, product, scenario (budget or actual), etc.  Dimensions are often organized in hierarchies that represent levels of detail in the data (e.g., UPC, SKU, product subcategory, product category, etc.).

CS Data Warehousing (Sp ) - Asim LUMS4 OLAP FASMI Test Fast: Delivers information to the user at a fairly constant rate. Most queries should be delivered to the user in five seconds or less. Analysis: Performs basic numerical and statistical analysis of the data, pre-defined by an application developer or defined ad hoc by the user. Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population. Multi-dimensional: The essential characteristic of OLAP. Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume....from the OLAP Report by Pendse and Creeth.

CS Data Warehousing (Sp ) - Asim LUMS5 Need for Multidimensional Analysis A simple analysis  How many units of product A did we sell in the store in DHA, Lahore Typically, decision support requires more complex analyses How much revenue did the new product X generate during the last three months, broken down by individual months, in the Southern Region, by individual stores, broken down by the promotions, compared to estimates, and compared to the previous version of the the product?

CS Data Warehousing (Sp ) - Asim LUMS6 Kinds of Analyses Roll-ups to provide summaries and aggregates along the hierarchies of the dimensions Drill-downs from the top level to the lowest along the hierarchies of the dimensions Calculations involving facts and metrics Algebraic equations involving key performance indicators Moving averages and growth percentages Trend analyses using statistical methods

CS Data Warehousing (Sp ) - Asim LUMS7

8

9 OLAP? The name On-Line Analytical Processing was coined in a paper by E.F. Codd in 1993 (“Providing On-Line Analytical Processing for User Analysts”) A definition  OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access in a a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user

CS Data Warehousing (Sp ) - Asim LUMS10 OLAP Features

CS Data Warehousing (Sp ) - Asim LUMS11 Dimensional Analysis (1)

CS Data Warehousing (Sp ) - Asim LUMS12 Dimensional Analysis (2)

CS Data Warehousing (Sp ) - Asim LUMS13 Some Queries Display the total sales of all products for past five years in all stores Compare total sales for all stores, product by product, between years 2000 and Show comparison of sales by individual stores, product by product, between years 2000 and 1999 only for those products with reduced sales. Show the results of the previous queries, but rotating the columns with rows

CS Data Warehousing (Sp ) - Asim LUMS14 Hypercubes Multi-dimension cubes  Hard to visualize and display beyond three dimensions Multi-dimensional domain structure (MDS)  Represents each dimension as a line showing the values

CS Data Warehousing (Sp ) - Asim LUMS15 MDS

CS Data Warehousing (Sp ) - Asim LUMS16 Display of Hypercubes

CS Data Warehousing (Sp ) - Asim LUMS17

CS Data Warehousing (Sp ) - Asim LUMS18

CS Data Warehousing (Sp ) - Asim LUMS19 Drill-Down and Roll-Up

CS Data Warehousing (Sp ) - Asim LUMS20 Slice-and-Dice or Rotation

CS Data Warehousing (Sp ) - Asim LUMS21 OLAP Models/Implementations MOLAP: OLAP implemented with a multi-dimensional database. ROLAP: OLAP implemented with a relational database. HOLAP: OLAP implemented with a hybrid of multi- dimensional and relational database technologies. DOLAP: OLAP implemented for desktop decision support environments.

CS Data Warehousing (Sp ) - Asim LUMS22 ROLAP and MOLAP

CS Data Warehousing (Sp ) - Asim LUMS23 MOLAP Implementations OLAP has historically been implemented through use of multi- dimensional databases (MDDs). Dimensions are key business factors for analysis:  geographies (zip, state, region,...)  products (item, product category, product department,...)  dates (day, week, month, quarter, year,...) Very high performance via fast look-up into “cube” data structure to retrieve pre-calculated results. “Cube” data structures allow pre-calculation of aggregate results for each possible combination of dimensional values. Use of application programming interface (API) for access via front-end tools.

CS Data Warehousing (Sp ) - Asim LUMS24

CS Data Warehousing (Sp ) - Asim LUMS25 MOLAP Implementations Need to consider both maintenance and storage implications when designing strategy for when to build cubes. Maintenance Considerations: Every data item received into MDD must be aggregated into every cube (assuming “to-date” summaries are maintained). Storage Considerations: Although cubes get much smaller (e.g., more dense) as dimensions get less detailed (e.g., year vs. day), storage implications for building hundreds of cubes can be significant.

CS Data Warehousing (Sp ) - Asim LUMS26 MOLAP Implementations Typically outperform relational database technology because all answers are pre-computed into cubes (and overhead for accessing cubes is very low). Difficult to scale because of combinatorial explosion in the number and size of cubes when dimensions of significant cardinality are required. Beyond tens (sometimes small hundreds) of thousands of entries in a single dimension will break the MOLAP model because the pre-computed cube model does not work well when the cubes are very sparse in the population of individual cells. See

CS Data Warehousing (Sp ) - Asim LUMS27 Virtual Cubes Virtual cubes are used when there is a need to join information from two dissimilar cubes that share one or more common dimensions. Similar to a relational view; two (or more) cubes are linked along common dimension(s). Often used to save space by eliminating redundant storage of information.

CS Data Warehousing (Sp ) - Asim LUMS28 Partitioned Cubes One logical cube of data can be spread across multiple physical cubes on separate (or same) servers. The divide-and-conquer approach of partitioned cubes helps to mitigate the scalability limitations of a MOLAP environment. Ideal cube partitioning is completely invisible to end users.

CS Data Warehousing (Sp ) - Asim LUMS29 ROLAP Implementations Advances in database technologies and front-end tools have begun to allow deployment of OLAP using ANSI SQL RDBMS implementations. ROLAP facilitates deployment of much larger dimension tables than MOLAP implementations. Front-end tools to facilitate GUI access to multi- dimensional analysis capabilities. Aggregate awareness allows exploitation of pre-built summary tables for some front-end tools. Star schema designs are often used to facilitate OLAP against relational databases.

CS Data Warehousing (Sp ) - Asim LUMS30

CS Data Warehousing (Sp ) - Asim LUMS31 Simplified Third Normal Form (Retail)

CS Data Warehousing (Sp ) - Asim LUMS32 Simplified Star Schema

CS Data Warehousing (Sp ) - Asim LUMS33 Simplified Star Schema A vastly simplified physical data model! Collapse dimensional hierarchies into a single table for each dimension and create a single fact table from the header and detail records: Fewer tables. Fewer joins to get results.

CS Data Warehousing (Sp ) - Asim LUMS34 Star Schema for High Performance Business question: How many $ in raincoats did I sell in the first week of January through stores in Boston? Assume: 4 Billion rows in fact table. 20 different kinds (size, color, style) of raincoats (product category) out of 50,000 UPCs in store. 8 stores out of 400 are in BOSTON SMSA. 2 years of POS history in DBMS.

CS Data Warehousing (Sp ) - Asim LUMS35 Star Schema for High Performance Simple (poor performance) approach to query execution: 1. Join item table with filtering on raincoat product category (very selective) to fact table. 2. Join date table with filtering by week (next most selective) to result table. 3. Join store table with filtering on store to result table from step Aggregate.

CS Data Warehousing (Sp ) - Asim LUMS36 Star Schema for High Performance Advanced (better performance) approach to query execution: 1. Cartesian product join between dimensional tables. * Result is 20 x 8 x 7 = 1,120 rows. 2. Use composite index on item:store:day into fact table for very selective access. * Access less than percent of data in fact table! Sophisticated cost-based optimizers will figure this out.

CS Data Warehousing (Sp ) - Asim LUMS37 Forcing a Cartesian Product Join Add an addition “join_value” column in each dimensional table. Set join_value to same value in all rows of the dimensional tables. Add additional where clause predicates joining on this column between dimensional tables. NOTE: This shouldn't be necessary with a “smart” optimizer.

CS Data Warehousing (Sp ) - Asim LUMS38 Forcing a Cartesian Product Join Sample code: select sum(sales.sales_amt) from d_sales_detail,store,item,period where d_sales_detail.store_id = store.store_id and d_sales_detail.item_id = item.item_id and d_sales_detail.day_dt = period.day_dt and period.day_dt between '23-NOV-2000' and '24- DEC-2000' and item.trade_style_cd = 'BARBIE' and store.state_cd = 'CA' and store.join_value = period.join_value and store.join_value = item.join_value and period.join_value = item.join_value ;

CS Data Warehousing (Sp ) - Asim LUMS39 Star Schema for High Performance Problem: What if I want to know raincoat sales in first week of January regardless of store? Answer: Performance advantage of composite index in traditional RDBMS is severely impaired! B-tree indexing techniques do not allow for flexibility in the use of dimensions for query purposes. Bit indexing (and variations thereof) often allows much more generality in achieving high performance from a star schema.

CS Data Warehousing (Sp ) - Asim LUMS40 Star Schema for High Performance Bottom Line: It is not at all unusual to obtain an order of magnitude (or more) in performance advantage using a star schema with advanced indexing versus a more traditional relational database implementation. Despite what vendors may tell you, star schemas cannot be effectively implemented for all DSS business applications and/or data models.

CS Data Warehousing (Sp ) - Asim LUMS41 ROLAP Relational OLAP often makes heavy use of summary tables to provide near instantaneous access for multi-dimensional queries. Foundation is usually star schema or snowflake database design. Allows OLAP with much larger data sets than multi-dimensional database (MDD) products using cube structures (MOLAP).

CS Data Warehousing (Sp ) - Asim LUMS42 ROLAP Number of summary tables can get very large if discipline is not enforced... Assume a retail database with the following two dimensions on the fact table... Calendar: Day, Week, Period, Quarter, Year, All Days Geography: Store, Zone, District, Region, All Stores

CS Data Warehousing (Sp ) - Asim LUMS43 ROLAP Summary tables in a naive implementation require all combinations of the dimensions at each aggregation level summary tables!... Add in item, SKU, subcategory, category, and all items...now we are up to 150 pre- aggregates!

CS Data Warehousing (Sp ) - Asim LUMS44 ROLAP Summary tables are more of a maintenance issue than a storage issue in most production implementations. Notice that summary tables get much smaller as dimensions get less detailed (e.g., year vs. day). Should plan for double the size of the unsummarized data for ROLAP summaries in most environments. Every detail record that is received into warehouse must aggregate into EVERY summary table (assuming "to-date" summaries are maintained).

CS Data Warehousing (Sp ) - Asim LUMS45 ROLAP Warning: Do not assume that dimensions are always simple hierarchies. Example: Items are not just category, subcategory, SKU, and atomic item.... what about trade styles or manufacturer? Now we need summary tables along these lines as well...another 120 summary tables! Calendar vs. accounting period vs. billing cycle can be even worse...

CS Data Warehousing (Sp ) - Asim LUMS46 ROLAP Many ROLAP products have devised ways to reduce the number of summary tables: Ability to build summaries on-the-fly as demanded by end- user applications. Ability to aggregate efficiently from subset of the summary tables. Tools exist in some products to assist in DBAs in selecting the "best” aggregations to build. HOLAP (Hybrid OLAP) tools allow co-existence of pre- built cubes alongside relational OLAP structures.

CS Data Warehousing (Sp ) - Asim LUMS47 Intelligent Aggregation Selection Maximum performance boost implies lots of disk for every pre-calculation. Minimum performance boost implies no disk with zero pre-calculation. Strategy is to use meta data to heuristically determine optimum set of aggregates from which all other aggregates can be derived.

CS Data Warehousing (Sp ) - Asim LUMS48 Aggregate Wizards

CS Data Warehousing (Sp ) - Asim LUMS49 Fact Table Aggregates Enhance performance on common queries at coarser granularities. Save space to permit storing more history than possible with finer granularities. Take advantage of need to store other facts (with similar samples) at a particular granularity.

CS Data Warehousing (Sp ) - Asim LUMS50 Aggregate Advice Coarser granularity decreases potential cardinality, but usually increases density (e.g., daily summary table is typically twice the size of weekly summary table - not seven times). Strongly consider omitting candidate aggregates where expected cardinality is more than 10% that of next finer granularity stored. Keep the detail for drill down, even if you deploy aggregates for performance.

CS Data Warehousing (Sp ) - Asim LUMS51 Bottom Line There are many implementation techniques for delivery of an OLAP environment. Must fully consider the performance, scalability, complexity, and flexibility characteristics when deciding between MOLAP, ROLAP, and HOLAP. Understand your tools and RDBMS!

CS Data Warehousing (Sp ) - Asim LUMS52 MOLAP Vs. ROLAP

CS Data Warehousing (Sp ) - Asim LUMS53 Implementation Issues Data design and preparation Administration and performance OLAP platforms OLAP tools and products

CS Data Warehousing (Sp ) - Asim LUMS54 Data Design and Preparation Characteristics of data  Stores and uses much less data compared to a DW  Data is summarized. You will rarely find data at the lowest levels of detail as in the DW  Data is more flexible for processing and analysis partly because there is much less data to work with  Every instance of the OLAP system in your environment is customized for the purpose that instance serves OLAP data is generally customized Types and levels  Static and dynamic summary data  Permanent and transient detailed data

CS Data Warehousing (Sp ) - Asim LUMS55 Administration Administering and managing OLAP systems should be handled with that of the DW environment Some considerations  Expectations on what data would be accessed and how  Selection of the right business dimensions  Selection of the right filters in loading data from the DW  Choosing the aggregation, summarization, and precalculation  Size of the multidimensional database  Access and security privileges  Backup and restore facilities  Drill-through to the data warehouse; drill-through to another OLAP instance

CS Data Warehousing (Sp ) - Asim LUMS56 Performance OLAP takes most of the queries that normalling would run against the DW OLAP is designed for complex queries; so it should enhance overall query performance of the DW environment OLAP can precalculate and pre-aggregate data for quick response

CS Data Warehousing (Sp ) - Asim LUMS57 OLAP Platforms Usually, the data warehouse and OLAP systems reside on the same platform in the start. Later when the data warehouse becmes large and OLAP is a common task, OLAP system is moved to another platform A separate platform is needed when  The size and usage of the DW consumes all resources  Many departmental users desire OLAP capabilities  The stability and performance of OLAP degrades  OLAP tools require a different platform configuration than the DW