Multi-Dimensional Databases & Online Analytical Processing This presentation uses some materials from: “An Introduction to Multidimensional Database Technology,”

Presentation transcript:

Multi-Dimensional Databases & Online Analytical Processing This presentation uses some materials from: “An Introduction to Multidimensional Database Technology,” by Kenan Technologies.

Learning Objectives 1. Multidimensional Databases 2. Contrast MDD and Relational Databases 3. When is MDD (In)appropriate? 4. MDD Features 5. Pros/Cons of MDD

What is a Multi-Dimensional Database? A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are (1) intimately related and (2) stored, viewed and analyzed from different perspectives. These perspectives are called dimensions.

Contrasting Relational and Multi-Dimensional Models: An Example The Relational Structure

Multidimensional Structure [Figure: multidimensional array with labels Measurement, Dimension, and Positions]

The “Classic” Star Schema
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day

Differences between MDDB and Relational Databases
Normalized relational: Data reorganized based on query. Perspectives are placed in the fields, which tells us nothing about the contents. MDDB: Perspectives embedded directly in the structure.
Normalized relational: Browsing and data manipulation are not intuitive to the user. MDDB: Data retrieval and manipulation are easy.
Normalized relational: Slows down for large datasets due to the multiple JOIN operations needed. MDDB: Fast retrieval for large datasets due to predefined structure.
Normalized relational: Flexible. Anything an MDDB can do can be done this way. MDDB: Relatively inflexible. Changes in perspectives necessitate reprogramming of the structure.

Contrasting Relational Model and MDD-Example 2

Multidimensional Representation

Viewing Data - An Example Assume that each dimension has 10 positions, as shown in the cube above. How many records would there be in a relational table? What are the implications for viewing data from an end-user standpoint?
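The record count asked for above can be worked out with a short sketch. It assumes the cube has three dimensions (the slide only says "cube") and that every cell holds one measurement, so each cell becomes one relational record. The function name is illustrative only.

```python
def relational_record_count(positions_per_dimension):
    """Rows needed when each cell of the array becomes one relational record."""
    count = 1
    for positions in positions_per_dimension:
        count *= positions
    return count

# Assumption: three dimensions with 10 positions each.
cube_shape = [10, 10, 10]
print(relational_record_count(cube_shape))

# Adding a fourth 10-position dimension multiplies the row count by 10.
print(relational_record_count(cube_shape + [10]))
```

Under these assumptions the end user would scan a table of 1,000 rows (10,000 with a fourth dimension) instead of looking at one face of the cube at a time.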

Adding Dimensions- An Example

When is MDD (In)appropriate? First, consider situation 1

When is MDD (In)appropriate? Now consider situation 2. 1. Set up an MDD structure for situation 1, with LAST NAME and Employee# as dimensions, and AGE as the measurement. 2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.

When is MDD (In)appropriate? Note the sparseness in the second MDD representation MDD Structures for the Situations

When is MDD (In)appropriate? Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis. When there are no interrelationships, the MDD structure is not appropriate.
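One way to make the sparseness point concrete is to measure what fraction of an array's cells actually hold a value. The sketch below uses made-up data for both kinds of dataset; the names, values, and helper functions are hypothetical and are not taken from the slides' figures.

```python
def shape_of(cells):
    """Number of distinct positions along each dimension of a keyed array."""
    n_dims = len(next(iter(cells)))
    return [len({key[i] for key in cells}) for i in range(n_dims)]

def density(cells):
    """Fraction of the cells in the full array that hold a measurement."""
    total = 1
    for size in shape_of(cells):
        total *= size
    return len(cells) / total

# Hypothetical personnel data: (LAST NAME, EMPLOYEE#) -> AGE.
# Each name pairs with exactly one employee number, so most cells are empty.
personnel = {("Smith", 1): 21, ("Regan", 12): 19, ("Fox", 31): 63, ("Weld", 14): 31}

# Hypothetical sales data: (MODEL, COLOR) -> SALES VOLUME.
# Every model is sold in every color, so every cell is filled.
sales = {
    ("Mini Van", "Blue"): 6, ("Mini Van", "Red"): 5, ("Mini Van", "White"): 4,
    ("Coupe", "Blue"): 3, ("Coupe", "Red"): 5, ("Coupe", "White"): 5,
    ("Sedan", "Blue"): 4, ("Sedan", "Red"): 3, ("Sedan", "White"): 2,
}

print(density(personnel))
print(density(sales))
```

With this data the personnel array fills only a quarter of its cells, while the sales array is completely filled, which is the kind of interrelationship an MDD structure is suited to.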

MDD Features - Rotation Also referred to as “data slicing.” Each rotation yields a different slice, or two-dimensional table, of data: a different face of the cube.

MDD Features - Rotation
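Rotation can be sketched as choosing which two dimensions form the rows and columns of the table, with the remaining dimension pinned to one position. The cube contents, the dimension names, and the `face` helper below are hypothetical; a real MDDB would do this internally.

```python
# Hypothetical cube: (model, color, dealer) -> sales volume.
DIMS = ("model", "color", "dealer")
cube = {
    ("Sedan", "Blue", "Clyde"): 6, ("Sedan", "Red", "Clyde"): 5,
    ("Coupe", "Blue", "Clyde"): 3, ("Coupe", "Red", "Clyde"): 5,
    ("Sedan", "Blue", "Gleason"): 4, ("Sedan", "Red", "Gleason"): 2,
    ("Coupe", "Blue", "Gleason"): 1, ("Coupe", "Red", "Gleason"): 7,
}

def face(cube, row_dim, col_dim, fixed):
    """Return one 2-D face: rows from row_dim, columns from col_dim,
    with every remaining dimension pinned to the position given in `fixed`."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = {}
    for key, value in cube.items():
        if all(key[DIMS.index(d)] == pos for d, pos in fixed.items()):
            table.setdefault(key[r], {})[key[c]] = value
    return table

# One face: model by color, for a single dealer.
by_model_color = face(cube, "model", "color", {"dealer": "Clyde"})
# Rotate: color by dealer, for a single model.
by_color_dealer = face(cube, "color", "dealer", {"model": "Sedan"})

print(by_model_color)
print(by_color_dealer)
```

Both tables come from the same stored cells; only the viewing orientation changes.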

MDD Features - Ranging The end user selects the desired positions along each dimension. Also referred to as "data dicing." The data is scoped down to a subset grouping
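Ranging can be sketched as a filter: the user lists the desired positions on each dimension, and only the cells matching every list are kept. The cube data and the `dice` helper are hypothetical illustrations, not part of any product's API.

```python
# Hypothetical cube: (model, color, dealer) -> sales volume.
DIMS = ("model", "color", "dealer")
cube = {
    ("Sedan", "Blue", "Clyde"): 6, ("Sedan", "Red", "Clyde"): 5,
    ("Coupe", "Blue", "Clyde"): 3, ("Coupe", "Red", "Clyde"): 5,
    ("Sedan", "Blue", "Gleason"): 4, ("Sedan", "Red", "Gleason"): 2,
    ("Coupe", "Blue", "Gleason"): 1, ("Coupe", "Red", "Gleason"): 7,
}

def in_range(key, selection):
    """True if the cell's position on every selected dimension is allowed."""
    for i, dim in enumerate(DIMS):
        if dim in selection and key[i] not in selection[dim]:
            return False
    return True

def dice(cube, selection):
    """Keep only the cells that fall inside the selected positions.
    Dimensions missing from `selection` are left unrestricted."""
    return {key: value for key, value in cube.items() if in_range(key, selection)}

# Scope the data down to blue cars only, at either dealer.
subset = dice(cube, {"color": {"Blue"}, "dealer": {"Clyde", "Gleason"}})
print(subset)
```

The result is itself a smaller cube, so it can be rotated or diced again.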

MDD Features - Roll-Ups & Drill Downs The figure presents a definition of a hierarchy within the organization dimension. Aggregations are perceived as being part of the same dimension. Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”
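A minimal sketch of roll-up and drill-down, assuming a hypothetical store, district, region hierarchy (the figure's actual hierarchy is not reproduced in the transcript). Each member points to its parent, and rolling up sums the children into the parent level.

```python
from collections import defaultdict

# Hypothetical hierarchy within one dimension: store -> district -> region.
parent = {
    "Store A": "District 1", "Store B": "District 1", "Store C": "District 2",
    "District 1": "Region East", "District 2": "Region East",
}
sales_by_store = {"Store A": 100, "Store B": 150, "Store C": 50}

def roll_up(values, parent):
    """Aggregate each member's value into its parent, one level up."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent[member]] += value
    return dict(totals)

def drill_down(member, values, parent):
    """Show the lower-level values that make up one aggregated member."""
    return {child: v for child, v in values.items() if parent[child] == member}

sales_by_district = roll_up(sales_by_store, parent)
sales_by_region = roll_up(sales_by_district, parent)

print(sales_by_district)
print(sales_by_region)
print(drill_down("District 1", sales_by_store, parent))
```

Because districts and regions live in the same dimension as stores, the same two functions work at every level of the hierarchy.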

MDD Features: Multidimensional Computations Well equipped to handle demanding mathematical functions. Can treat arrays like cells in spreadsheets. For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array. Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension. Integrates computational tools very tightly with the database structure.
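The budget example can be sketched as a cell-by-cell operation on two arrays that share the same dimensions. The dimension names and figures below are invented for illustration; the division follows the slide's description of how the VARIANCE array is computed.

```python
# Hypothetical arrays keyed by (quarter, region); ACTUAL and BUDGET are two
# positions on the "business measurements" dimension.
actual = {("Q1", "East"): 120.0, ("Q1", "West"): 90.0,
          ("Q2", "East"): 50.0, ("Q2", "West"): 30.0}
budget = {("Q1", "East"): 100.0, ("Q1", "West"): 100.0,
          ("Q2", "East"): 40.0, ("Q2", "West"): 60.0}

def divide_arrays(numerator, denominator):
    """Divide two arrays cell by cell, like a spreadsheet formula filled down."""
    return {key: numerator[key] / denominator[key] for key in numerator}

# As on the slide: VARIANCE = ACTUAL / BUDGET, computed for every cell at once.
variance = divide_arrays(actual, budget)
for key, ratio in sorted(variance.items()):
    print(key, round(ratio, 2))
```

A ratio above 1 means the actual figure exceeded the budget for that cell; a ratio below 1 means it fell short.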

The Time Dimension TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years. – Eliminates the effort required to build sophisticated hierarchies every time a database is set up. – Extra performance advantages

Pros/Cons of MDD
Pros: Cognitive advantages for the user; ease of data presentation and navigation; time dimension; performance
Cons: Less flexible; requires greater initial effort

Tableau and some more Statistics

NORMAL DISTRIBUTIONS

Normal Distributions Most common type of distribution and one that is required for many statistical methods A function that represents the distribution of many random variables as a symmetrical bell curve

Normal Distribution

Normal Distribution (PDF)

Beauty of the Normal Distribution No matter what μ and σ are, the area between μ-σ and μ+σ is about 68%; the area between μ-2σ and μ+2σ is about 95%; and the area between μ-3σ and μ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.
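These areas can be checked numerically with Python's standard-library `statistics.NormalDist`. The helper name and the example mean and standard deviation are arbitrary choices for illustration.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The areas do not depend on the particular mean and standard deviation.
for k in (1, 2, 3):
    standard = area_within(k)
    shifted = area_within(k, mu=50.0, sigma=8.0)
    print(k, round(standard, 4), round(shifted, 4))
```

Each row prints the same area twice, once for the standard normal and once for a normal with mean 50 and standard deviation 8.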

68-95-99.7 Rule

Examples

Is my data Normal?
1. Look at the histogram! Does it appear bell shaped?
2. Compute descriptive summary measures: are the mean, median, and mode all relatively similar?
3. Do 2/3 of your observations lie within 1 std dev. of your mean? Do 95% lie within 2 std devs?
4. Look at the probability plot. Is it linear?
5. Run tests of normality (e.g., Kolmogorov-Smirnov). Warning: highly influenced by sample size!
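Checks 2 and 3 in the list above can be sketched with the standard library. The sample here is simulated with a fixed seed purely for illustration; with real data you would pass in your own observations. The histogram, probability plot, and formal tests are not covered by this sketch.

```python
import random
from statistics import mean, median, stdev

def normality_summary(data):
    """Summary measures for a quick, informal normality check."""
    m, s = mean(data), stdev(data)

    def share_within(k):
        # Fraction of observations within k standard deviations of the mean.
        return sum(abs(x - m) <= k * s for x in data) / len(data)

    return {
        "mean": m,
        "median": median(data),
        "within_1_sd": share_within(1),
        "within_2_sd": share_within(2),
    }

# Simulated sample (assumption: 1,000 draws from a normal with mean 100, sd 15).
rng = random.Random(42)
sample = [rng.gauss(100, 15) for _ in range(1000)]

summary = normality_summary(sample)
for name, value in summary.items():
    print(name, round(value, 3))
```

For roughly normal data the mean and median should be close, about two thirds of observations should fall within 1 standard deviation, and about 95% within 2.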

Correlation Coefficient
Correlation measures the relative strength and direction of the relationship between 2 or more variables
Requires 2+ measurements from the same independent variable/individual
Often visualized with scatterplots
Often described by the correlation coefficient
– This value ranges from -1 to +1
– The closer the absolute value of the C.C. is to 1, the stronger the correlation. The closer to 0, the weaker the relationship
– Positive indicates that high values in one variable are associated with high values in the second
– Negative indicates that high values in one variable are associated with low values in the second
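As an illustration, the Pearson correlation coefficient (assumed here to be the coefficient the slide has in mind) can be computed directly from its definition. The data values are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two measurements per individual (hypothetical values).
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

print(round(pearson_r(xs, ys), 2))             # positive: high x goes with high y
print(round(pearson_r(xs, [-y for y in ys]), 2))  # negative: high x goes with low y
```

Flipping the sign of one variable flips the sign of the coefficient but not its strength.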

R-squared
A measure of how well your model fits your data
Computed after a regression analysis, ANOVA, or other experimental design
R-squared = Explained Variation / Total Variation
– Number between 0 and 1 (or 0% and 100%)
– 0% indicates that the model explains none of the variability of the data around the mean
– 100% indicates that the model explains all of the variability of the data around the mean
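A minimal sketch of the explained-over-total ratio, assuming a simple least-squares line with one predictor as the model. The data and function name are made up for illustration.

```python
def r_squared(xs, ys):
    """R-squared of a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    # Fit y = intercept + slope * x by ordinary least squares.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    # Explained variation: how far the fitted values move from the mean.
    explained = sum((p - mean_y) ** 2 for p in predicted)
    # Total variation: how far the observed values move from the mean.
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(r_squared(xs, ys), 2))
```

For a one-predictor linear fit like this one, R-squared equals the square of the correlation coefficient between the two variables.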

Running R in Tableau
Integrating-Tableau-and-R-for-data-analytics-in-four-simple-steps
Download R and RStudio (both free)
Note: you must have Tableau 8.1 or greater!
– You can download a free 30-day trial of Tableau (newest version 9.1), or, as a full-time student, receive a 1-yr license
Rserve.txt (on Canvas)
– Gives the commands that you must run in RStudio to start an Rserve instance

Lab: Introduction to Tableau See Canvas

HOMEWORK

Homework
Chapter 7 from Keep up with Quants
Tableau Workbook
Project Work – Rough draft of two visualizations
Finish Linear Programming from last week