1 Multi-Dimensional Databases & Online Analytical Processing This presentation uses some materials from: “An Introduction to Multidimensional Database Technology,” by Kenan Technologies.

2 Learning Objectives 1. Multidimensional Databases 2. Contrast MDD and Relational Databases 3. When is MDD (In)appropriate? 4. MDD Features 5. Pros/Cons of MDD

3 What is a Multi-Dimensional Database? A multidimensional database (MDDB) is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are (1) intimately related and (2) stored, viewed and analyzed from different perspectives. These perspectives are called dimensions.

4 Contrasting Relational and Multi-Dimensional Models: An Example The Relational Structure

5 Multidimensional Structure [Figure labels: Measurement; Dimension; Positions]

6 The “Classic” Star Schema
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day
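A minimal Python sketch of how a star schema is queried: each fact row carries keys into the dimension tables, and a report joins on those keys before summing a measure. The rows, cities, brands and dollar amounts below are made up for illustration; only the table and column names come from the slide.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by their surrogate keys.
store_dim = {1: {"city": "Austin", "region": "South"},
             2: {"city": "Boston", "region": "East"}}
period_dim = {100: {"year": 2015, "quarter": "Q1"},
              101: {"year": 2015, "quarter": "Q2"}}

# Hypothetical fact table: one row per (store, product, period) with measures.
fact_table = [
    {"store_key": 1, "product_key": 10, "period_key": 100, "dollars": 500.0, "units": 50},
    {"store_key": 1, "product_key": 11, "period_key": 101, "dollars": 300.0, "units": 20},
    {"store_key": 2, "product_key": 10, "period_key": 100, "dollars": 200.0, "units": 25},
]

def dollars_by(dimension, attribute, key_name):
    """Join each fact row to one dimension and total Dollars per attribute value."""
    totals = defaultdict(float)
    for row in fact_table:
        label = dimension[row[key_name]][attribute]
        totals[label] += row["dollars"]
    return dict(totals)

dollars_by_region = dollars_by(store_dim, "region", "store_key")
dollars_by_quarter = dollars_by(period_dim, "quarter", "period_key")
print(dollars_by_region)
print(dollars_by_quarter)
```

Every new question (by region, by quarter, by brand) repeats the key lookup, which is the JOIN cost the next slide refers to.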

7 Differences between MDDB and Relational Databases
Normalized Relational | MDDB
Data reorganized based on query. Perspectives are placed in the fields, which tells us nothing about the contents. | Perspectives embedded directly in the structure.
Browsing and data manipulation are not intuitive to the user. | Data retrieval and manipulation are easy.
Slows down for large datasets due to the multiple JOIN operations needed. | Fast retrieval for large datasets due to predefined structure.
Flexible. Anything an MDDB can do can be done this way. | Relatively inflexible. Changes in perspectives necessitate reprogramming of the structure.

8 Contrasting Relational Model and MDD-Example 2

9 Multidimensional Representation

10 Viewing Data - An Example Assume that each dimension has 10 positions, as shown in the cube above. How many records would there be in a relational table? What are the implications for viewing data from an end-user standpoint?
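One way to work the question, sketched in Python. It assumes the cube has three dimensions (the slide says "cube" but does not state the count) and that every combination of positions holds a value.

```python
from itertools import product

positions_per_dimension = 10
number_of_dimensions = 3  # assumption: a cube has three dimensions

# A relational table needs one row per combination of positions.
relational_rows = list(product(range(positions_per_dimension),
                               repeat=number_of_dimensions))
record_count = len(relational_rows)

# One face of the cube is a two-dimensional table the user can read at once.
cells_per_face = positions_per_dimension ** 2

def records_needed(dimensions, positions=10):
    """Rows in the relational table if every combination has a value."""
    return positions ** dimensions

print(record_count, cells_per_face, records_needed(4))
```

Under these assumptions the relational table holds 1,000 records, while an end user looking at the cube sees a 10 by 10 table at a time. Each added dimension multiplies the row count by 10.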

11 Adding Dimensions- An Example

12 When is MDD (In)appropriate? First, consider situation 1

13 When is MDD (In)appropriate? Now consider situation 2. 1. Set up an MDD structure for situation 1, with LAST NAME and Employee# as dimensions, and AGE as the measurement. 2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.

14 When is MDD (In)appropriate? MDD Structures for the Situations. Note the sparseness in the second MDD representation.

15 When is MDD (In)appropriate? Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis. When there are no interrelationships, the MDD structure is not appropriate.
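Sparseness can be measured as the share of grid cells that actually hold a measurement. The sketch below uses made-up employees and car sales (the slide's own data is in figures that did not survive) and assumes each employee number belongs to exactly one person and every model is sold in every color.

```python
def density(cells, dim_a, dim_b):
    """Fraction of (a, b) grid positions that hold a measurement."""
    return len(cells) / (len(dim_a) * len(dim_b))

# Hypothetical personnel data: LAST NAME x Employee#, measurement AGE.
ages = {("Smith", 101): 34, ("Jones", 102): 45, ("Lee", 103): 29, ("Patel", 104): 51}
last_names = sorted({name for name, _ in ages})
employee_numbers = sorted({number for _, number in ages})

# Hypothetical sales data: MODEL x COLOR, measurement SALES VOLUME.
models = ["Sedan", "Coupe", "Van"]
colors = ["Red", "Blue", "White"]
sales = {(model, color): 10 for model in models for color in colors}

personnel_density = density(ages, last_names, employee_numbers)
sales_density = density(sales, models, colors)
print(personnel_density, sales_density)
```

With this data the LAST NAME by Employee# grid is one quarter full and gets emptier as employees are added, because the two dimensions are not interrelated. The MODEL by COLOR grid is completely full.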

16 MDD Features - Rotation Also referred to as “data slicing.” Each rotation yields a different slice or two dimensional table of data – a different face of the cube.

17 MDD Features - Rotation
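A rough Python sketch of rotation, treating the cube as a dictionary keyed by (model, color, dealer). The dimension names, positions and cell values are hypothetical; the helper `face` is an illustration, not part of any MDD product.

```python
models = ["Sedan", "Coupe"]
colors = ["Red", "Blue", "White"]
dealers = ["North", "South"]
AXES = {"model": models, "color": colors, "dealer": dealers}
ORDER = ("model", "color", "dealer")

# Made-up cell values that encode their own coordinates for easy checking.
cube = {
    (m, c, d): 100 * mi + 10 * ci + di
    for mi, m in enumerate(models)
    for ci, c in enumerate(colors)
    for di, d in enumerate(dealers)
}

def face(rows, cols, **fixed):
    """Two-dimensional table: `rows` down, `cols` across, other dimension fixed."""
    table = []
    for r in AXES[rows]:
        line = []
        for c in AXES[cols]:
            coords = dict(fixed)
            coords[rows] = r
            coords[cols] = c
            line.append(cube[tuple(coords[axis] for axis in ORDER)])
        table.append(line)
    return table

model_by_color = face("model", "color", dealer="North")
color_by_model = face("color", "model", dealer="North")   # same slice, rotated
color_by_dealer = face("color", "dealer", model="Sedan")  # a different face
```

The stored data never changes; only the choice of which dimensions run down and across the page does.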

18 MDD Features - Ranging The end user selects the desired positions along each dimension. Also referred to as "data dicing." The data is scoped down to a subset grouping.
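Ranging can be sketched in Python as filtering the cube on a chosen set of positions per dimension. The cube contents and the helper name `dice` are hypothetical.

```python
from itertools import product

models = ["Sedan", "Coupe", "Van"]
colors = ["Red", "Blue", "White"]
dealers = ["North", "South"]

# Made-up cube: one numbered cell per (model, color, dealer) combination.
cube = {coords: i for i, coords in enumerate(product(models, colors, dealers))}

def dice(cube, keep_models, keep_colors, keep_dealers):
    """Keep only the cells whose position on every dimension was selected."""
    return {
        (m, c, d): value
        for (m, c, d), value in cube.items()
        if m in keep_models and c in keep_colors and d in keep_dealers
    }

subset = dice(cube, {"Sedan", "Coupe"}, {"Red"}, {"North", "South"})
print(len(cube), len(subset))
```

The result is itself a smaller cube (here 2 x 1 x 2 cells out of 3 x 3 x 2), so it can be rotated or rolled up like the original.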

19 MDD Features - Roll-Ups & Drill Downs The figure presents a definition of a hierarchy within the organization dimension. Aggregations are perceived as being part of the same dimension. Moving up and moving down levels in a hierarchy is referred to as “roll-up” and “drill-down.”
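A small Python sketch of roll-up and drill-down over a hypothetical organization hierarchy (store, district, region). The member names and sales figures are invented; the slide's own hierarchy is in a figure that is not reproduced here.

```python
from collections import defaultdict

# Hypothetical hierarchy within the organization dimension: child -> parent.
parent_of = {
    "Store A": "District 1",
    "Store B": "District 1",
    "Store C": "District 2",
    "District 1": "Region East",
    "District 2": "Region East",
}
store_sales = {"Store A": 120, "Store B": 80, "Store C": 50}

def roll_up(values, parent_of):
    """Aggregate each member's value into its parent, one level up."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)

def drill_down(member, parent_of):
    """List the members one level below `member`."""
    return sorted(child for child, parent in parent_of.items() if parent == member)

district_sales = roll_up(store_sales, parent_of)
region_sales = roll_up(district_sales, parent_of)
print(district_sales, region_sales, drill_down("District 1", parent_of))
```

Districts and regions live in the same dimension as stores, which is why the user perceives the aggregates as positions on one axis rather than as separate tables.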

20 MDD Features: Multidimensional Computations Well equipped to handle demanding mathematical functions. Can treat arrays like cells in spreadsheets. For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array. Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension. Integrates computational tools very tightly with the database structure.
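The budget example can be sketched in Python as a cell-by-cell division of two arrays that share the same dimensions. The departments, quarters and amounts are made up; ACTUAL, BUDGET and VARIANCE are the positions the slide names on the "business measurements" dimension.

```python
# Hypothetical ACTUAL and BUDGET arrays over (department, quarter).
actual = {
    ("Sales", "Q1"): 110.0, ("Sales", "Q2"): 90.0,
    ("Marketing", "Q1"): 50.0, ("Marketing", "Q2"): 60.0,
}
budget = {
    ("Sales", "Q1"): 100.0, ("Sales", "Q2"): 100.0,
    ("Marketing", "Q1"): 50.0, ("Marketing", "Q2"): 40.0,
}

# VARIANCE = ACTUAL / BUDGET, computed for every cell in one expression,
# much like filling a spreadsheet formula across a range.
variance = {cell: actual[cell] / budget[cell] for cell in actual}

for cell, ratio in sorted(variance.items()):
    print(cell, round(ratio, 2))
```

A ratio above 1 means the cell ran over budget, below 1 under budget.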

21 The Time Dimension TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years. – Eliminates the effort required to build sophisticated hierarchies every time a database is set up. – Extra performance advantages
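A sketch of what a predefined time hierarchy provides, using Python's standard `datetime` module. The daily figures are invented, and the level names in `LEVELS` are an assumption for illustration; fiscal years and other special periods would be additional entries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales.
daily_sales = {
    date(2015, 1, 15): 10,
    date(2015, 1, 20): 5,
    date(2015, 2, 3): 7,
    date(2015, 4, 1): 8,
}

# Each level maps a day to the period that contains it.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: (d.year,),
}

def roll_up_time(values, level):
    """Total the daily values into the periods of the requested level."""
    period_of = LEVELS[level]
    totals = defaultdict(int)
    for day, value in values.items():
        totals[period_of(day)] += value
    return dict(totals)

monthly = roll_up_time(daily_sales, "month")
quarterly = roll_up_time(daily_sales, "quarter")
yearly = roll_up_time(daily_sales, "year")
print(monthly, quarterly, yearly)
```

Because the calendar is the same for every database, this mapping only has to be built once, which is the saving the slide describes.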

22 Pros/Cons of MDD Pros: Cognitive advantages for the user; ease of data presentation and navigation; time dimension; performance. Cons: Less flexible; requires greater initial effort.

23 Tableau and some more Statistics

24 NORMAL DISTRIBUTIONS

25 Normal Distributions Most common type of distribution and one that is assumed by many statistical methods. A function that represents the distribution of many random variables as a symmetrical bell curve.

26 Normal Distribution

27 Normal Distribution (PDF)
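The slide's formula is in an image that did not survive. For reference, the standard normal density with mean μ and standard deviation σ is f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), sketched here in Python. The function name `normal_pdf` is chosen for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

peak = normal_pdf(0.0)          # the curve is highest at the mean
left = normal_pdf(-1.5)
right = normal_pdf(1.5)         # symmetry: same height on both sides
print(round(peak, 4), left == right)
```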

28

29

30 Beauty of the Normal Distribution No matter what μ and σ are, the area between μ - σ and μ + σ is about 68%; the area between μ - 2σ and μ + 2σ is about 95%; and the area between μ - 3σ and μ + 3σ is about 99.7%. Almost all values fall within 3 standard deviations.

31 68-95-99.7 Rule
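The rule can be checked numerically. For a normal distribution, the area within k standard deviations of the mean is erf(k / √2), regardless of the mean and standard deviation. A short Python check using only the standard library:

```python
import math

def area_within(k):
    """Probability that a normal value lies within k std devs of its mean."""
    return math.erf(k / math.sqrt(2.0))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} std dev: {area:.4f}")
```

The three areas come out close to 0.683, 0.954 and 0.997, which round to the 68%, 95% and 99.7% in the rule's name.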

32 Examples

33 Is my data Normal? 1. Look at the histogram! Does it appear bell shaped? 2. Compute descriptive summary measures: are the mean, median and mode all relatively similar? 3. Do 2/3 of your observations lie within 1 std dev. of your mean? Do 95% lie within 2 std devs? 4. Look at the probability plot. Is it linear? 5. Run tests of normality (e.g. Kolmogorov-Smirnov). Warning: highly influenced by sample size!
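Checks 2 and 3 are easy to script. The Python sketch below runs them on a simulated sample (10,000 draws with mean 50 and standard deviation 10, an arbitrary choice); with real data, replace `sample` with your own observations. The histogram, probability plot and formal tests are not covered here.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated sample is reproducible
sample = [random.gauss(50, 10) for _ in range(10_000)]  # hypothetical data

mean = statistics.mean(sample)
median = statistics.median(sample)
std_dev = statistics.stdev(sample)

def share_within(data, center, spread, k):
    """Fraction of observations within k * spread of center."""
    return sum(abs(x - center) <= k * spread for x in data) / len(data)

within_1 = share_within(sample, mean, std_dev, 1)
within_2 = share_within(sample, mean, std_dev, 2)

print(f"mean={mean:.2f} median={median:.2f}")
print(f"within 1 sd: {within_1:.1%}, within 2 sd: {within_2:.1%}")
```

For roughly normal data the mean and median should nearly coincide and the two shares should sit near 68% and 95%. Large departures on either check are a reason to look harder at the histogram.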

34 Correlation Coefficient Measures the relative strength and direction of the relationship between two variables. Requires 2+ measurements from the same independent variable/individual. Often visualized with scatterplots. Often described by the correlation coefficient: – This value ranges from -1 to +1 – The closer the C.C. is to abs(1), the stronger the correlation. The closer to 0, the weaker the relationship – Positive indicates that high values in one variable are associated with high values in the second – Negative indicates high values in one variable are associated with low values in the second
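The slide does not say which coefficient it means; the sketch below assumes the usual Pearson correlation, written out in plain Python. The study-hours data is invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

# Hypothetical paired measurements, one pair per individual.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]
hours_of_tv = [5, 4, 3, 2, 1]

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_of_tv, exam_score)
print(round(r_positive, 3), round(r_negative, 3))
```

Because `hours_of_tv` is `hours_studied` reversed, the second coefficient has the same size as the first with the opposite sign.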

35 R-squared A measure of how well your model fits your data. Computed after a regression analysis, ANOVA, or other experimental design. R-squared = Explained Variation/Total Variation – Number between 0 and 1 (or 0% and 100%) – 0% indicates that the model explains none of the variability of the data around the mean – 100% indicates the model explains all the variability of the data around the mean
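A worked Python sketch of the formula, assuming the model is a least-squares straight line with an intercept (for that case explained/total equals the more common 1 - residual/total form). The four data points are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Explained variation divided by total variation for the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [slope * x + intercept for x in xs]
    explained = sum((p - mean_y) ** 2 for p in predicted)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # hypothetical observations
slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys)
print(round(slope, 2), round(intercept, 2), round(r2, 2))
```

Here the fitted line is y = 0.8x + 0.5, the explained variation is 3.2 out of a total of 5, and R-squared is 0.64: the line accounts for 64% of the variability around the mean.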

36 Running R in Tableau http://www.simafore.com/blog/bid/120209/Integrating-Tableau-and-R-for-data-analytics-in-four-simple-steps Download R and RStudio (both free). Note: you must have Tableau 8.1 or greater! – You can download a free 30-day trial of Tableau (newest version 9.1) or, as full-time students, receive a 1-yr license. Rserve.txt (on Canvas) – Gives the commands that you must run in RStudio to start an Rserve instance

37 Lab: Introduction to Tableau See Canvas

38 HOMEWORK

39 Homework Chapter 7 from Keep up with Quants; Tableau Workbook; Project Work – Rough draft of two visualizations; Finish Linear Programming from last week

