Chapter 4: Dimensions, Hierarchies, Operations, Modeling

Presentation transcript:

Chapter 4: Dimensions, Hierarchies, Operations, Modeling Prof. Bayer, DWH, Ch.4, SS 2000

Chapter 4.1 Hierarchical Dimensions
Def: Hierarchical dimensions are composite keys with an order on the key attributes. Prefixes are allowed as keys.
Ex: For the dimension Time = (Year, Month, Day), the legal keys are (Year), (Year, Month), and (Year, Month, Day).
Def: Basic facts are values in cells with full foreign keys.

Aggregations, Summaries
Def: Aggregations are facts in cells with partial keys. These facts are derived by aggregation functions. In a cube with derived facts the aggregation function must be specified.
Ex: Sales on a monthly basis: $\mathrm{Sales}(\mathrm{Year}, \mathrm{Month}) = \sum_{\mathrm{Day}} \mathrm{Sales}(\mathrm{Year}, \mathrm{Month}, \mathrm{Day})$
Aggregation functions: count, sum, avg, min, max, ...
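
The monthly roll-up above can be sketched in Python. This is only an illustration: the fact values and the helper name `aggregate_by_prefix` are assumptions, not part of the lecture. It groups base facts by a key prefix and applies an aggregation function, which by default is `sum`.

```python
from collections import defaultdict

# Hypothetical base facts: full key (Year, Month, Day) -> sales value.
base_sales = {
    (1998, 3, 1): 100.0,
    (1998, 3, 2): 150.0,
    (1998, 4, 1): 80.0,
    (1999, 3, 1): 120.0,
}

def aggregate_by_prefix(facts, prefix_len, agg=sum):
    """Group facts by the first `prefix_len` key attributes and apply `agg`."""
    groups = defaultdict(list)
    for key, value in facts.items():
        groups[key[:prefix_len]].append(value)
    return {prefix: agg(values) for prefix, values in groups.items()}

monthly_sales = aggregate_by_prefix(base_sales, 2)        # Sales(Year, Month)
yearly_max = aggregate_by_prefix(base_sales, 1, agg=max)  # max per Year
```

Passing the aggregation function as a parameter mirrors the remark that a cube with derived facts must specify which function produced them.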

Note on Aggregations
Aggregations may be stored explicitly in the cube, but then they should be secured by integrity constraints.
Aggregations may be virtual; they must then be computed on demand when needed.
This is the classical tradeoff between storage space, performance, and flexibility.

Relational Modeling
Expand and complete a partial key with the special symbol ALL, e.g. (Year, Month, ALL), (ALL, Month, ALL), (ALL, ALL, ALL), to obtain simple and complete relational keys.
Question: What SQL computes the complete cube with all aggregations from the base cube?
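
As a rough illustration of what such a query has to produce, the following Python sketch completes every base key with ALL in all possible ways and sums the facts per completed key. The data and the function name `complete_cube` are assumptions made for this example, and only the `sum` aggregation is shown.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # special symbol that completes a partial key

def complete_cube(base_facts):
    """Return sums for every key pattern obtained by replacing any subset
    of key attributes with ALL (the base cells are included)."""
    cube = defaultdict(float)
    for key, value in base_facts.items():
        # Each attribute is either kept or replaced by ALL: 2**len(key) patterns.
        for mask in product((False, True), repeat=len(key)):
            pattern = tuple(ALL if hide else attr for attr, hide in zip(key, mask))
            cube[pattern] += value
    return dict(cube)

# Hypothetical base cube keyed by (Year, Month, Day).
base = {(1998, 3, 1): 100.0, (1998, 3, 2): 150.0, (1999, 3, 1): 120.0}
cube = complete_cube(base)
```

In SQL dialects that support it, `GROUP BY CUBE` expresses the same computation, typically reporting NULL where this sketch uses ALL.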

Hierarchy Example

Chapter 4.2: OLAP Operations
Def: Roll-up computes higher aggregations from lower aggregations or base facts according to hierarchies.
Ex: For base facts (Year, Month, Day) there are 3 roll-up functions, which are supported in general (canonical roll-ups):
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)

Additional roll-ups: (ALL, Month, ALL) etc. Counting these, there are $2^3 - 1$ aggregations, or in general $2^m - 1$ aggregations for m hierarchy levels.
Note: See later chapters for the support of arbitrary aggregations.
Note: For m dimensions with $h_1, h_2, \dots, h_m$ hierarchy levels there are $2^{h_1 + h_2 + \dots + h_m} - 1$ different aggregations for a given aggregation function, since each level of each dimension is either kept or replaced by ALL.
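
The difference between canonical and arbitrary roll-ups can be made concrete by enumerating both for the Time dimension. This is a minimal sketch; the variable names are chosen for illustration only.

```python
from itertools import product

ALL = "ALL"
levels = ("Year", "Month", "Day")

# Canonical (hierarchical) roll-ups: keep a proper prefix, replace the rest by ALL.
canonical = [levels[:k] + (ALL,) * (len(levels) - k) for k in range(len(levels))]

# Arbitrary roll-ups: every level is either kept or replaced by ALL,
# except the pattern that keeps all levels (that is the base cube itself).
arbitrary = [
    tuple(ALL if hide else name for name, hide in zip(levels, mask))
    for mask in product((False, True), repeat=len(levels))
    if any(mask)
]
```

The pattern (ALL, Month, ALL) shows up only in the second list, which is why it needs support beyond the hierarchy.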

2-dim example
Dim1: (4, 5) = cardinality of the dimension levels
Dim2: (6, 7, 2)
Elements per level: Dim1 has 4 and 4 * 5 = 20; Dim2 has 6, 6 * 7 = 42, and 42 * 2 = 84.
Size of base cube: 20 * 84 = 1680

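Size of hierarchically aggregated Cube
Cells per combination of hierarchy levels for Dim1 (4 5) and Dim2 (6 7 2); "-" stands for ALL:
Dim1 \ Dim2 |  -     6     6 7    6 7 2
-           |  1     6     42     84
4           |  4     24    168    336
4 5         |  20    120   840    (base cube: 1680)
Number of cells per aggregation function: 1645 (the sum of all entries except the base cube)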

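Size of completely aggregated cube
For the levels (4 5) (6 7 2), start with 1 cell and work from the last level to the first; a level with c elements turns a running total T into c x T + T:
2: 2 x 1 + 1 = 3
7: 7 x 3 + 3 = 24
6: 6 x 24 + 24 = 144 + 24 = 168
5: 5 x 168 + 168 = 840 + 168 = 1008
4: 4 x 1008 + 1008 = 4032 + 1008 = 5040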
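
The step-by-step computation for the (4 5) (6 7 2) example can be checked with a few lines of Python. The function names are assumptions for this sketch; the key observation is that each level contributes a factor of (cardinality + 1), because ALL counts as one extra value.

```python
from math import prod

def base_cube_size(dims):
    """Product of all level cardinalities over all dimensions."""
    return prod(c for dim in dims for c in dim)

def complete_cube_size(dims):
    """Each level contributes (cardinality + 1): its elements plus ALL."""
    return prod(c + 1 for dim in dims for c in dim)

dims = [(4, 5), (6, 7, 2)]

# Reproduce the running totals, working from the last level to the first.
running = 1
steps = []
for c in reversed([c for dim in dims for c in dim]):  # 2, 7, 6, 5, 4
    running = c * running + running                   # same as (c + 1) * running
    steps.append(running)
```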

Computation with binary Tree
[Figure: binary tree over the levels (4 5) (6 7 2). Each branch multiplies the running cell count either by the cardinality of the next level or by 1 (ALL); the leaves give the sizes of the individual aggregations, up to 1680 for the base cube.]

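Size of the Cube
Lemma: Given a data cube with m dimensions with $h_1, \dots, h_m$ hierarchy levels, respectively. Let the hierarchy levels of dimension i have cardinalities $c_{i1}, \dots, c_{ih_i}$. Then the base cube has
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} c_{ij}$ cells
and the cube with all aggregations has
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} (c_{ij} + 1)$ cells.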

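Size of the Cube (2)
The aggregated cube is larger than the base cube by the factor
$\prod_{i=1}^{m} \prod_{j=1}^{h_i} \frac{c_{ij} + 1}{c_{ij}} = \prod_{i=1}^{m} \prod_{j=1}^{h_i} \left(1 + \frac{1}{c_{ij}}\right)$
where $c_{ij}$ is the cardinality of level j of dimension i.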
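
For the running example (4 5) (6 7 2), the factor can be worked out directly. The notation $c_{ij}$ for the cardinality of level j in dimension i is an assumption made for this worked example.

```latex
\[
\prod_{i=1}^{2} \prod_{j=1}^{h_i} \frac{c_{ij} + 1}{c_{ij}}
  = \frac{5}{4} \cdot \frac{6}{5} \cdot \frac{7}{6} \cdot \frac{8}{7} \cdot \frac{3}{2}
  = \frac{5040}{1680}
  = 3
\]
```

Levels with few elements dominate the factor: the level of cardinality 2 alone contributes 3/2.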

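Size of the hierarchically aggregated Cube
For a hierarchy i with $h_i$ levels and cardinalities $c_{i1}, \dots, c_{ih_i}$ there are $h_i$ hierarchical aggregation possibilities, one per proper prefix of the key, with $1, c_{i1}, c_{i1} c_{i2}, \dots, c_{i1} \cdots c_{i,h_i - 1}$ cells, respectively.
Lemma: A hierarchically completely aggregated data cube, including the base cube, has
$\prod_{i=1}^{m} \left(1 + c_{i1} + c_{i1} c_{i2} + \dots + c_{i1} c_{i2} \cdots c_{ih_i}\right)$ cells.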

Ex: (4 5) (6 7 2)
size of the hierarchically aggregated cube plus base cube = (1 + 4 + 20) * (1 + 6 + 42 + 84) = 25 * 133 = 3325
Ex: (4 5) (6 7 2) (8 3)
size of base cube: 40,320
hierarchically aggregated cube plus base = (1 + 4 + 20) * (1 + 6 + 42 + 84) * (1 + 8 + 24) = 3325 * 33 = 109,725

Ex: (4 5) (6 7 2) (8 3) (5 9)
size of base cube: 1,814,400
hierarchically aggregated cube plus base = 109,725 * (1 + 5 + 45) = 5,595,975
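
The three examples can be reproduced with a short Python sketch. The function names are assumptions for illustration; per dimension, the sketch sums the cell counts of all hierarchy levels (1 for ALL, then the cumulative products of the cardinalities) and multiplies these sums across dimensions.

```python
from itertools import accumulate
from math import prod
from operator import mul

def level_sizes(dim):
    """Cell counts per hierarchy level: 1 for ALL, then cumulative products."""
    return [1] + list(accumulate(dim, mul))

def hierarchical_cube_size(dims):
    """Size of the hierarchically aggregated cube plus base cube."""
    return prod(sum(level_sizes(dim)) for dim in dims)

dims = [(4, 5), (6, 7, 2), (8, 3), (5, 9)]
sizes = [hierarchical_cube_size(dims[:n]) for n in (2, 3, 4)]
```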

Additional comments on aggregations
1. In addition to the size of the complete cube there is a factor of 5 for the various aggregation functions, e.g. sum, avg, min, max, count, ...
2. So far we did not consider general restrictions, e.g. "all Saturdays in March" or "vacation months July and August", which cross the bounds of hierarchy levels.
Interactive query formulation results in an unlimited number of aggregations.
Optimization: restrictions corresponding to hierarchy levels should be pushed down, since they lead to query boxes.

Note: See later chapters for multidimensional indexes, MHC techniques, and optimization of the ROLAP algebra to support hierarchical canonical aggregations like
Roll-up (Year, Month, ALL)
Roll-up (Year, ALL, ALL)
Roll-up (ALL, ALL, ALL)
but not
Roll-up (ALL, Month, ALL)

Optimization Problem
Non-hierarchical aggregation, e.g. March for all years: decompose into a union of several restrictions, e.g.
$\sum \mathrm{Sales}(\mathrm{Year}, \mathrm{Month}, \mathrm{Day})$ where Month = March and (Year = 1996 or Year = 1997 or Year = 1998)
See later for the translation into a ROLAP expression and transformations for optimization.
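
The decomposition can be sketched in Python as a union of per-year restrictions, each of which is a hierarchical (Year, Month) prefix. The data and the helper names `restrict` and `march_sales` are hypothetical; the actual ROLAP translation is covered in later chapters.

```python
# Hypothetical base facts: (Year, Month, Day) -> sales.
base_sales = {
    (1996, 3, 10): 10.0,
    (1996, 4, 1): 99.0,
    (1997, 3, 5): 20.0,
    (1998, 3, 7): 30.0,
    (1998, 12, 24): 50.0,
}

def restrict(facts, year, month):
    """One hierarchical restriction: a (Year, Month) prefix, i.e. a query box."""
    return {k: v for k, v in facts.items() if k[0] == year and k[1] == month}

def march_sales(facts, years):
    """Sum over the union of the per-year restrictions for Month = March."""
    union = {}
    for year in years:
        union.update(restrict(facts, year, 3))
    return sum(union.values())

total = march_sales(base_sales, [1996, 1997, 1998])
```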

Multiple Hierarchies
e.g. the time hierarchy
Aggregation for a month is possible e.g. by covering the query box (QB) of weeks and postfiltering.
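
A minimal sketch of this idea, assuming facts are organized by ISO weeks and a monthly total is requested: fetch all weeks that overlap the month (the covering query box), then drop the days that fall outside the month. The data and the function names are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily facts: 10 units of sales on each of 120 days from 2000-01-01.
facts = {date(2000, 1, 1) + timedelta(days=n): 10.0 for n in range(120)}

def iso_week(d):
    """(ISO year, ISO week) of a date, i.e. its key in a Year-Week hierarchy."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def month_total_via_weeks(facts, year, month):
    """Return (monthly total, number of facts fetched by the covering box)."""
    first = date(year, month, 1)
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    last = next_first - timedelta(days=1)
    # Covering query box: all weeks that overlap the month.
    covering_weeks = {iso_week(first + timedelta(days=n))
                      for n in range((last - first).days + 1)}
    candidates = {d: v for d, v in facts.items() if iso_week(d) in covering_weeks}
    # Postfilter: drop the days of the boundary weeks that lie outside the month.
    total = sum(v for d, v in candidates.items()
                if (d.year, d.month) == (year, month))
    return total, len(candidates)

total, fetched = month_total_via_weeks(facts, 2000, 3)
```

The covering box fetches somewhat more than the month itself; the postfilter removes the excess days of the first and last week.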

Navigation Operations
Drill Down: first show a single result for an aggregated value, e.g. sales per day; then show hourly values for days with very high or very low sales, in order to plan working hours for sales people better.
Other examples: daily sales during the Christmas season; vacation bookings for skiing during Fasching (carnival).
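
The drill-down described above might look as follows in Python. The data are invented and the helper name `drill_down` is an assumption: daily totals are shown first, and the hourly values are retrieved only for the days with the highest and lowest sales.

```python
from collections import defaultdict

# Hypothetical facts: (Day, Hour) -> sales.
hourly = {
    ("Mon", 9): 5.0, ("Mon", 10): 6.0,
    ("Tue", 9): 40.0, ("Tue", 10): 45.0,
    ("Wed", 9): 1.0, ("Wed", 10): 0.5,
}

daily = defaultdict(float)
for (day, _hour), value in hourly.items():
    daily[day] += value  # aggregated view: sales per day

def drill_down(day):
    """Return the hourly values behind one aggregated daily value."""
    return {hour: v for (d, hour), v in hourly.items() if d == day}

best_day = max(daily, key=daily.get)
worst_day = min(daily, key=daily.get)
detail = {day: drill_down(day) for day in (best_day, worst_day)}
```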

Roll-up: Compute Aggregations

Slicing
Selection of a smaller data cube, or even reduction of a multidimensional data cube to fewer dimensions, by a point restriction in some dimension (which becomes the pivot element).
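
A point restriction that removes one dimension can be sketched like this. The cube contents and the function name `slice_cube` are assumptions for illustration.

```python
# Hypothetical 3-dimensional cube: (Product, Region, Year) -> sales.
cube = {
    ("TV", "North", 1998): 5.0,
    ("TV", "South", 1998): 7.0,
    ("TV", "North", 1999): 6.0,
    ("Car", "North", 1998): 9.0,
}

def slice_cube(cube, dim_index, value):
    """Fix dimension `dim_index` to `value` and drop it from the keys."""
    return {
        key[:dim_index] + key[dim_index + 1:]: measure
        for key, measure in cube.items()
        if key[dim_index] == value
    }

slice_1998 = slice_cube(cube, 2, 1998)  # 2-dimensional (Product, Region) cube
```

Fixing Year to 1998 leaves a two-dimensional cube over Product and Region.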

Dicing (German: würfeln)
Rotate the result to show another view, e.g. by exchanging rows and columns.
Slice management
Precomputing and caching of several slices for later or special use, e.g. for a particular sales person.

Chapter 4.3 Modeling
Purpose: analysis of business processes, characteristic facts (Kennzahlen, i.e. key figures) for managers to support decisions (DSS)
Steps of Decision Process:
1. Which business processes to model and analyze?
2. What are the measures, and where do they come from?
3. Which degree of detail, e.g. minutes as in SAP? Which precision is required for OLAP?
4. Common properties of measures to determine dimensions? Brand, time, geographic region, product group? Dependencies between levels of hierarchies?

5. Attributes of dimensions, e.g. screen size for TVs, engine displacement (cc) and horsepower (PS) for cars, focal length for cameras.
Problem: how common are properties and dimensions? Properties that are not common cannot be modeled by levels of dimensions. At GfK they are called features (up to 50) and are numbered, with a meaning that depends on the specific dimension element, e.g.
TV: screen size, color, audio system
Car: transmission, cc, PS, #cyl, ...

6. Constant or changing attributes of dimensions? E.g. new models of car makers, new power sources (electrical, hydrogen, solar). Attributes are rather stable, but changes should still be planned ahead (e.g. mergers like Daimler-Chrysler).
7. Sparsity: one hypercube or several, i.e. a multicube model? This influences storage requirements, query formulation, and performance, and cannot be hidden easily from the user; maybe by views?

8. Caching and management of aggregates?
[Figure: maintenance costs, average response time, and total costs plotted as time over the number of aggregates (0% to 100%); the minimum of the total costs marks the optimal number of aggregates.]

Chapter 4.4 Comparison of OLAP Architectures
1. MOLAP: Multidimensional OLAP
2. ROLAP: Relational OLAP
3. HOLAP: Hybrid OLAP

MOLAP Architecture

MDDBMS in ANSI-X3-Sparc

Logical components of a MDDBMS

ROLAP Architecture

HOLAP Architecture

Reasons for MOLAP: performance, write access, Data Marts, functional power
Reasons for ROLAP: scalability, flexible precomputations and partial aggregates, parallelism, DB management and ACID