Multidimensional Databases Prof. Navneet Goyal Computer Science Department BITS, Pilani.


1 Multidimensional Databases Prof. Navneet Goyal Computer Science Department BITS, Pilani

2 Database Evolution (June 1, 2015, Dr. Navneet Goyal, BITS, Pilani)
- Flat files
- Hierarchical and Network
- Relational
- Distributed Relational
- Multidimensional

3 Why Multi-Dimensional Databases?
- There is no single "best" data structure for all applications within an enterprise.
- Organizations have abandoned the search for the holy grail of a universally accepted database.
- Instead, they select the most appropriate data structure case by case from a palette of standard database structures.
- Multidimensional databases for OLAP?

4 Why Multi-Dimensional Databases?
- Originating in econometric research conducted at MIT in the 1960s, the multidimensional database has matured into the database engine of choice for data analysis applications.
- It has an inherent ability to integrate and analyze large volumes of enterprise data.
- It offers a good conceptual fit with the way end users visualize business data:
  - Most business people already think about their businesses in multidimensional terms.
  - Managers tend to ask questions about product sales in different markets over specific time periods.

5 Multidimensional Database
- Spreadsheets: a 2D database?
- Functionalities
- What about a stack of similar spreadsheets for different times?
- Limitations? We cannot easily relate data in different sheets.

6 Multidimensional Database
An MDDB is a computer software system designed to allow efficient and convenient storage and retrieval of large volumes of data that are
1. Intimately related, and
2. Stored, viewed, and analyzed from different perspectives.
These perspectives are called dimensions.

7 A Motivating Example
An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization. The evaluation would require viewing historical sales volume figures from multiple dimensions, such as:
- Sales volume by model
- Sales volume by color
- Sales volume by dealer
- Sales volume over time

8 Relational Structure

9 Multidimensional Array Structure
[Figure: 3 x 3 array of sales volumes with MODEL (Mini Van, Sedan, Coupe) on one axis and COLOR (Red, White, Blue) on the other; cell values 6, 5, 4 / 3, 5, 5 / 4, 3, 2]

10 RDBMS vs. MDD
- A multidimensional array structure represents a higher level of organization than the relational table.
- In the multidimensional model, perspectives are embedded directly into the structure.
- All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute.
- In the relational model, perspectives are placed in fields, and the table structure itself tells us nothing about the field contents.

11 RDBMS vs. MDD
- MDD makes data browsing and manipulation intuitive to the end user.
- Any data manipulation action possible with an MDD is also possible using relational technology.
- MDD offers substantial cognitive advantages in query formulation.
- MDD offers substantial computational performance advantages in query processing.

12 RDBMS vs. MDD

13 Multidimensional Representation
[Figure: sales volumes cube with three dimensions: MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White), and DEALERSHIP (Clyde, Gleason, Carr)]

14 Viewing Data - An Example
[Figure: sales volumes cube with dimensions MODEL, COLOR, and DEALERSHIP]
Assume that each dimension has 10 positions, as shown in the cube above.

15 Viewing Data - An Example
- How many records would there be in a relational table? (10 x 10 x 10 = 1,000)
- What are the implications for viewing data from an end-user standpoint?

SALES VOLUMES FOR ALL DEALERSHIPS
MODEL     COLOR  DEALERSHIP  VOLUME
MINI VAN  BLUE   CLYDE       2
MINI VAN  BLUE   GLEASON     2
MINI VAN  BLUE   CARR        2
MINI VAN  RED    CLYDE       1
MINI VAN  WHITE  GLEASON     3
...
RECORD NUMBER 998
RECORD NUMBER 999
RECORD NUMBER 1000

16 Performance Advantages
Query: what is the volume figure when car type = SEDAN, color = BLUE, and dealer = GLEASON?
- RDBMS: without a suitable index, all 1,000 records might need to be searched to find the right record.
- MDB: has more "knowledge" about where the data lies. At most 30 position searches are needed (up to 10 along each of the three dimensions).
- Average case: about 15 position searches vs. about 500 records.
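
The lookup comparison on this slide can be reproduced with a small sketch. It assumes an unindexed relational table that is scanned record by record and a cube whose dimension labels are searched linearly; the labels (`MODEL_0`, ...) and the `volume` formula are made up for illustration and do not come from the slides.

```python
from itertools import product

SIZE = 10  # positions per dimension, as assumed on the slide

# Hypothetical labels; the slides do not list all ten positions.
models = [f"MODEL_{i}" for i in range(SIZE)]
colors = [f"COLOR_{i}" for i in range(SIZE)]
dealers = [f"DEALER_{i}" for i in range(SIZE)]


def volume(i, j, k):
    """Made-up sales figure so that every cell holds a distinct value."""
    return i * 100 + j * 10 + k


# Relational form: one record per (model, color, dealer) combination.
records = [
    (models[i], colors[j], dealers[k], volume(i, j, k))
    for i, j, k in product(range(SIZE), repeat=3)
]

# Multidimensional form: a 10 x 10 x 10 nested array.
cube = [
    [[volume(i, j, k) for k in range(SIZE)] for j in range(SIZE)]
    for i in range(SIZE)
]


def table_lookup(model, color, dealer):
    """Unindexed scan: returns (volume, records examined)."""
    for examined, (m, c, d, v) in enumerate(records, start=1):
        if (m, c, d) == (model, color, dealer):
            return v, examined
    raise KeyError((model, color, dealer))


def find_position(labels, label):
    """Linear search along one dimension: returns (index, positions examined)."""
    for examined, candidate in enumerate(labels, start=1):
        if candidate == label:
            return examined - 1, examined
    raise KeyError(label)


def cube_lookup(model, color, dealer):
    """Locate one position per dimension, then read the cell directly."""
    i, a = find_position(models, model)
    j, b = find_position(colors, color)
    k, c = find_position(dealers, dealer)
    return cube[i][j][k], a + b + c


worst = (models[-1], colors[-1], dealers[-1])
print(table_lookup(*worst))  # (999, 1000)
print(cube_lookup(*worst))   # (999, 30)
```

Averaged over all 1,000 cells, this sketch examines 500.5 records versus 16.5 positions, in line with the slide's rough figures of 500 and 15. A relational table with an index on the three columns would narrow the gap considerably.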

17 Performance Advantages
Query: what are the total sales across all colors and dealers when model = SEDAN?
- RDBMS: all 1,000 records must be searched to get the answer.
- MDB: sum the contents of one 10 x 10 "slice".
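
The slice aggregation can be sketched in the same spirit. This is an illustration only: the cell values are made up, dimension positions are plain integers, and the relational side is again modeled as an unindexed scan.

```python
SIZE = 10  # positions per dimension, as assumed on the slides

# Made-up cell values; cube[model][color][dealer].
cube = [
    [[(i + j + k) % 7 for k in range(SIZE)] for j in range(SIZE)]
    for i in range(SIZE)
]

# The same data flattened into relational records.
records = [
    (i, j, k, cube[i][j][k])
    for i in range(SIZE)
    for j in range(SIZE)
    for k in range(SIZE)
]


def total_from_table(model):
    """Scan every record and keep those for the model: (total, records read)."""
    total = 0
    examined = 0
    for m, _color, _dealer, v in records:
        examined += 1
        if m == model:
            total += v
    return total, examined


def total_from_cube(model):
    """Sum one 10 x 10 slice of the cube: (total, cells read)."""
    cells = [v for row in cube[model] for v in row]
    return sum(cells), len(cells)


table_total, records_read = total_from_table(3)
cube_total, cells_read = total_from_cube(3)
print(records_read, cells_read)  # 1000 100
```

Both functions return the same total; the cube version touches only the 100 cells of the relevant slice.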

18 Performance Advantages
- Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB.
- For such analytical queries, MDBs can be an order of magnitude faster than RDBMSs.
- The performance benefits are greatest for queries that generate cross-tab views of data.
- The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications, such as OLAP, that can be impractical in a relational environment.

19 RDBMS vs. MDB
Any data manipulation action possible with a multidimensional database is also possible using relational technology. MDBs, however, offer several advantages:
- Ease of data presentation and navigation
- Ease of maintenance
- Performance

20 Ease of Data Presentation & Navigation
- Intuitive, spreadsheet-like data views are the natural output of MDBs.
- Obtaining the same views in a relational environment requires either complex SQL or a SQL generator running against the RDB to convert the table outputs into a more intuitive format.
- Top-N queries could not be expressed in early standard SQL. Current SQL supports them (for example with FETCH FIRST or window functions), although the syntax varies between products.

21 Ease of Maintenance
- Maintenance is easy because data is stored as it is viewed.
- No additional overhead is required to translate user queries into requests for data.
- To provide the same intuitiveness, RDBs use indexes and sophisticated joins, which require significant maintenance and storage.

22 Performance
- The performance of MDBs can be matched by RDBs through database tuning.
- However, it is not possible to tune the database for all possible ad hoc queries.
- Tuning requires the time of an expensive database specialist.
- Aggregate navigators are helping RDBs catch up with MDBs as far as aggregation queries are concerned.

23 Adding Dimension - An Example
[Figure: the MODEL x COLOR x DEALERSHIP sales volumes cube shown three times, once each for JANUARY, FEBRUARY, and MARCH, so that time becomes a fourth dimension]

24 When is MDD (In)appropriate?
First, consider situation 1.

PERSONNEL
LAST NAME  EMPLOYEE#  EMPLOYEE AGE
SMITH      01         21
REGAN      12         19
FOX        31         63
WELD       14         31
KELLY      54         27
LINK       03         56
KRANZ      41         45
LUCAS      33         41
WEISS      23         19

25 When is MDD (In)appropriate?
Now consider situation 2.
1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE# as dimensions and AGE as the measurement.
2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions and SALES VOLUME as the measurement.

26 When is MDD (In)appropriate? MDD Structures for the Situations
[Figure, left: 3 x 3 sales volumes array, MODEL (Mini Van, Sedan, Coupe) by COLOR (Red, White, Blue), every cell filled: 6, 5, 4 / 3, 5, 5 / 4, 3, 2]
[Figure, right: 9 x 9 employee age array, LAST NAME (Smith, Regan, Fox, Weld, Kelly, Link, Kranz, Lucas, Weiss) by EMPLOYEE# (01, 03, 12, 14, 23, 31, 33, 41, 54), with only one cell filled per row and per column]
Note the difference in sparsity between the two MDD representations.

27 When is MDD (In)appropriate?
- Our sales volume dataset has a great number of meaningful interrelationships.
- The interrelationships are more meaningful than the individual data elements themselves.
- The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
- Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis.

28 When is MDD (In)appropriate?
- In the personnel data, no last name matches more than one employee number, and no employee number matches more than one last name.
- In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3 x 3 matrix.
- For the personnel data, performance suffers: finding one age takes at most 9 record reads in the relational table, but up to 18 position searches (9 along each dimension) in the 9 x 9 array.
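
The contrast between the two structures can be quantified with a short sketch. It uses the personnel records and sales figures from the slides; reading "density" as filled cells divided by total cells, and the worst-case search counts as "one linear search per dimension", are interpretations rather than definitions given in the deck.

```python
# Personnel records from the slide: (last name, employee number, age).
personnel = [
    ("SMITH", "01", 21), ("REGAN", "12", 19), ("FOX", "31", 63),
    ("WELD", "14", 31), ("KELLY", "54", 27), ("LINK", "03", 56),
    ("KRANZ", "41", 45), ("LUCAS", "33", 41), ("WEISS", "23", 19),
]

# Sales volumes from the slide: a fully populated 3 x 3 array.
sales = [[6, 5, 4], [3, 5, 5], [4, 3, 2]]


def density(filled_cells, dimension_sizes):
    """Fraction of the array's cells that actually hold a value."""
    total_cells = 1
    for size in dimension_sizes:
        total_cells *= size
    return filled_cells / total_cells


names = {name for name, _, _ in personnel}
numbers = {number for _, number, _ in personnel}

personnel_density = density(len(personnel), [len(names), len(numbers)])
sales_density = density(sum(len(row) for row in sales), [len(sales), len(sales[0])])

# Worst case for one lookup: scan every record vs. search every dimension.
worst_case_table = len(personnel)          # 9 records
worst_case_array = len(names) + len(numbers)  # 9 + 9 position searches

print(round(personnel_density, 3), sales_density)  # 0.111 1.0
print(worst_case_table, worst_case_array)          # 9 18
```

Only 9 of the 81 cells in the personnel array carry data, so the array costs more to search than the table it replaces, while the sales array has no empty cells at all.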

29 When is MDD (In)appropriate?
- The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases.
- The relative performance disadvantages of storing non-multidimensional data in a multidimensional array also increase as the size of the dataset increases.
- There is no inherent value in storing non-multidimensional data (such as the employee data) in multidimensional arrays.

30 When is MDD Appropriate?
- The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
- Most companies have limited time and resources to devote to analyzing data.
- It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.

31 When is MDD Appropriate?
Examples of applications that are suited to multidimensional technology:
1. Financial Analysis and Reporting
2. Budgeting
3. Promotion Tracking
4. Quality Assurance and Quality Control
5. Product Profitability

32 MDD Features - Rotation
[Figure: View #1, Model x Color: sales volumes with MODEL (Mini Van, Sedan, Coupe) and COLOR (Red, White, Blue), cell values 6, 5, 4 / 3, 5, 5 / 4, 3, 2. Rotating by 90° gives View #2, Color x Model, with cell values 6, 3, 4 / 5, 5, 3 / 4, 5, 2]
Also referred to as "data slicing". Each rotation yields a different slice, or two-dimensional table, of data.
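
For a two-dimensional view, the rotation in this figure amounts to swapping the two axes. A minimal sketch using the cell values from the slide (the function name `rotate` is illustrative, not from the deck):

```python
# View #1 (Model x Color), cell values as printed on the slide.
view_1 = [
    [6, 5, 4],
    [3, 5, 5],
    [4, 3, 2],
]


def rotate(view):
    """Swap the two axes of a 2D view, so rows become columns."""
    return [list(column) for column in zip(*view)]


view_2 = rotate(view_1)
for row in view_2:
    print(row)
# [6, 3, 4]
# [5, 5, 3]
# [4, 5, 2]
```

The result matches the values shown for View #2, and rotating again restores View #1. This sketch copies the data for simplicity; a multidimensional engine would more plausibly just change the order in which it reads the axes.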

33 MDD Features - Rotation

34 MDD Features - Rotation
- All six views can be obtained by simple rotation.
- In MDBs rotations are simple because no rearrangement of data is required.
- Rotation is also referred to as "data slicing".
- Number of views: 2D has 2, 3D has 6, 4D has 24 (n! for n dimensions).
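
The view counts on this slide (2, 6, 24) are the number of ways to order the dimensions, that is, n! for n dimensions. A short sketch that enumerates them; TIME as the fourth dimension is taken from the "Adding Dimension" slide.

```python
from itertools import permutations
from math import factorial


def views(dimensions):
    """Every ordering of the dimensions is one rotated view of the same array."""
    return list(permutations(dimensions))


for dims in (
    ["MODEL", "COLOR"],
    ["MODEL", "COLOR", "DEALERSHIP"],
    ["MODEL", "COLOR", "DEALERSHIP", "TIME"],
):
    print(len(dims), "dimensions:", len(views(dims)), "views")
# 2 dimensions: 2 views
# 3 dimensions: 6 views
# 4 dimensions: 24 views
```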

35 MDD Features - Ranging
- Question: how does the sales volume of models painted with the new metallic blue compare with the sales of normal blue models?
- The user knows that only the Sports Coupe and Mini Van models have received the new paint treatment.
- The user also knows that only two dealers, Carr and Clyde, have an unconstrained supply of these models.

36 MDD Features - Ranging
- The end user selects the desired positions along each dimension.
- Also referred to as "data dicing".
- The data is scoped down to a subset grouping.
[Figure: the sales volumes cube reduced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue), and DEALERSHIP (Clyde, Carr)]

37 MDD Features - Ranging
- The reduced array can now be rotated and used in computations in the same way as the parent array.
- Ranging is referred to as "data dicing" because the data is scoped down to a subset grouping.
- A complex SQL query is required in an RDB.
- Performance is better in an MDB because the searches consume fewer resources.
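
Ranging can be sketched as a filter that keeps only the chosen positions along each dimension. The dimension positions below follow the metallic blue example; the cell values are placeholders made up for illustration, and the cube is stored as a dictionary purely for brevity.

```python
from itertools import product

models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["NORMAL BLUE", "METAL BLUE", "RED", "WHITE"]
dealers = ["CLYDE", "GLEASON", "CARR"]

# Placeholder sales volumes (1, 2, 3, ...), not figures from the slides.
cube = {
    key: value
    for value, key in enumerate(product(models, colors, dealers), start=1)
}


def dice(cube, keep_models, keep_colors, keep_dealers):
    """Keep only the cells whose position on every dimension was selected."""
    return {
        (m, c, d): v
        for (m, c, d), v in cube.items()
        if m in keep_models and c in keep_colors and d in keep_dealers
    }


subset = dice(
    cube,
    keep_models={"MINI VAN", "SPORTS COUPE"},
    keep_colors={"NORMAL BLUE", "METAL BLUE"},
    keep_dealers={"CLYDE", "CARR"},
)


def total_by_color(cells):
    """Collapse MODEL and DEALERSHIP to compare the two paint finishes."""
    totals = {}
    for (_model, color, _dealer), v in cells.items():
        totals[color] = totals.get(color, 0) + v
    return totals


print(len(cube), len(subset))  # 36 8
print(total_by_color(subset))
```

The reduced array has the same shape of key as its parent, so it can be aggregated or rotated with the same functions.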

38 MDD Features – Roll-Up & Drill-Down
- Users want different views of the same data, e.g., sales volume by model vs. sales volume by dealership.
- Many times the views are similar, e.g., sales volume by dealership vs. sales volume by district.
- There is a natural relationship between sales volumes at the DEALERSHIP level and sales volumes at the DISTRICT level.
- The sales volumes for all the dealerships in a district sum to the sales volume for that district.

39 MDD Features – Roll-Up & Drill-Down
- Multidimensional database technology is specially designed to facilitate the handling of these natural relationships.
- Define two related aggregates on the same dimension.
- One aggregation is dealership and the other is district.
- District is at a higher level of aggregation than dealership.

40 MDD Features - Roll-Ups & Drill Downs
- The figure presents a definition of a hierarchy within the organization dimension.
- The aggregations are perceived as being part of the same dimension.
- Moving up and moving down levels in a hierarchy is referred to as "roll-up" and "drill-down".
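
A hierarchy of this kind can be sketched as a mapping from each dealership to its district. The dealership totals below are the column sums of the cross-tab in the query example (Clyde 7 + 4 + 3, Gleason 5 + 6 + 8, Carr 6 + 8 + 12); the district names and the assignment of dealerships to districts are hypothetical, since the hierarchy figure is not part of the transcript.

```python
# Dealership totals derived from the query example's cross-tab.
dealership_totals = {"CLYDE": 14, "GLEASON": 19, "CARR": 26}

# Hypothetical hierarchy: which district each dealership belongs to.
district_of = {
    "CLYDE": "DISTRICT_A",
    "GLEASON": "DISTRICT_A",
    "CARR": "DISTRICT_B",
}


def roll_up(totals, parent_of):
    """Move one level up: sum each child's value into its parent."""
    rolled = {}
    for child, value in totals.items():
        parent = parent_of[child]
        rolled[parent] = rolled.get(parent, 0) + value
    return rolled


def drill_down(parent, totals, parent_of):
    """Move one level down: list the children that make up a parent's total."""
    return {
        child: value
        for child, value in totals.items()
        if parent_of[child] == parent
    }


district_totals = roll_up(dealership_totals, district_of)
print(district_totals)  # {'DISTRICT_A': 33, 'DISTRICT_B': 26}
print(drill_down("DISTRICT_A", dealership_totals, district_of))
# {'CLYDE': 14, 'GLEASON': 19}
```

Because both levels live on the same dimension, the dealership figures for a district always sum to that district's figure.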

41 MDD Features - Roll-Ups & Drill Downs

42 MDD Features: Drill-Down Through a Dimension

43 Queries
- The high degree of structure in an MDB makes the query language very simple and efficient.
- The query language is intuitive.
- The output is immediately useful to the end user.

44 Queries: Example
Display sales volume by model for each dealership:

PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

                DEALERSHIP
MODEL           CLYDE  GLEASON  CARR
MINI VAN            7        5     6
SPORTS COUPE        4        6     8
SEDAN               3        8    12

Trends emerge and comparisons are easily made.

45 Queries: Example
Corresponding SQL:

SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
FROM SALES_VOLUME
GROUP BY MODEL, DEALERSHIP
ORDER BY MODEL, DEALERSHIP

MODEL|DEALERSHIP|SUM(SALES_VOLUME)
MINI VAN|CLYDE|7
MINI VAN|GLEASON|5
MINI VAN|CARR|6
SPORTS COUPE|CLYDE|4
SPORTS COUPE|GLEASON|6
SPORTS COUPE|CARR|8
SEDAN|CLYDE|3
SEDAN|GLEASON|8
SEDAN|CARR|12

46 Queries: Example
Using a report writer in addition to SQL, we get:

MINI VAN
  CLYDE      7
  GLEASON    5
  CARR       6
SPORTS COUPE
  CLYDE      4
  GLEASON    6
  CARR       8
SEDAN
  CLYDE      3
  GLEASON    8
  CARR      12
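
To make the gap between the SQL result and the cross-tab view concrete, here is a minimal sketch of the extra pivoting step a relational front end has to perform. The rows are the SQL output shown in the example; the function names and the column widths are arbitrary choices for illustration.

```python
# Result rows of the GROUP BY query, as listed in the example.
rows = [
    ("MINI VAN", "CLYDE", 7), ("MINI VAN", "GLEASON", 5), ("MINI VAN", "CARR", 6),
    ("SPORTS COUPE", "CLYDE", 4), ("SPORTS COUPE", "GLEASON", 6), ("SPORTS COUPE", "CARR", 8),
    ("SEDAN", "CLYDE", 3), ("SEDAN", "GLEASON", 8), ("SEDAN", "CARR", 12),
]


def cross_tab(rows):
    """Pivot (model, dealership, volume) rows into model -> {dealership: volume}."""
    table = {}
    for model, dealership, volume in rows:
        table.setdefault(model, {})[dealership] = volume
    return table


def render(table, dealerships):
    """Format the pivoted table with one row per model, one column per dealership."""
    lines = [f"{'MODEL':<14}" + "".join(f"{d:>9}" for d in dealerships)]
    for model, by_dealer in table.items():
        lines.append(f"{model:<14}" + "".join(f"{by_dealer[d]:>9}" for d in dealerships))
    return lines


table = cross_tab(rows)
for line in render(table, ["CLYDE", "GLEASON", "CARR"]):
    print(line)
```

In a multidimensional database this layout is the native output of the query, so no separate pivoting step is needed.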

47 MDD Features: Multidimensional Computations
- MDDs are well equipped to handle demanding mathematical functions.
- They can treat arrays like cells in spreadsheets. For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array.
- Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension.
- This integrates computational tools very tightly with the database structure.

48 MDD Features: Multidimensional Computations
[Figure: sales volumes array with MODEL (Mini Van, Coupe, Sedan) against a BUSINESS MEASUREMENTS dimension (Actual, Budget, Variance); legible rows include Actual 16, Budget 12, Variance 0.33; Actual 11, Budget 10, Variance 0.1; Actual 8, Budget 10, Variance -0.2]
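
The array computation can be sketched with the three rows that are legible in the figure. Two assumptions are made here: the variance is taken as ACTUAL divided by BUDGET, less one, because that reproduces the printed values (0.33, 0.1, -0.2), and the model label attached to each row is a guess, since the figure layout did not survive in the transcript.

```python
# Row labels are assumed; the Actual/Budget values are those legible in the figure.
models = ["MINI VAN", "COUPE", "SEDAN"]
actual = [16, 11, 8]
budget = [12, 10, 10]


def variance_array(actual, budget):
    """Element-wise ACTUAL / BUDGET - 1, rounded to two decimals."""
    return [round(a / b - 1, 2) for a, b in zip(actual, budget)]


variance = variance_array(actual, budget)
for model, value in zip(models, variance):
    print(model, value)
# MINI VAN 0.33
# COUPE 0.1
# SEDAN -0.2
```

The whole VARIANCE array is produced by one expression over two arrays, much as a spreadsheet formula is filled down a column.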

49 The Time Dimension
TIME is a predefined hierarchy for rolling up and drilling down across days, weeks, months, years, and special periods such as fiscal years.
- It eliminates the effort required to build sophisticated hierarchies every time a database is set up.
- It brings extra performance advantages.
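
A predefined time hierarchy can be sketched as a set of functions that map a day to its month, year, or fiscal year. The daily figures below are made up, and the fiscal year is assumed to start on 1 April and to be labelled by the calendar year in which it starts; neither detail comes from the slides.

```python
from datetime import date

# Made-up daily sales volumes.
daily_sales = {
    date(2015, 1, 30): 4,
    date(2015, 1, 31): 6,
    date(2015, 2, 1): 5,
    date(2015, 4, 1): 3,
    date(2016, 1, 15): 2,
}

# Each level maps a day to the period that contains it.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "year": lambda d: d.year,
    # Assumption: fiscal year starts on 1 April.
    "fiscal_year": lambda d: d.year if d.month >= 4 else d.year - 1,
}


def roll_up(sales, level):
    """Sum daily values into the periods of the requested level."""
    period_of = LEVELS[level]
    totals = {}
    for day, value in sales.items():
        period = period_of(day)
        totals[period] = totals.get(period, 0) + value
    return totals


print(roll_up(daily_sales, "month"))
# {(2015, 1): 10, (2015, 2): 5, (2015, 4): 3, (2016, 1): 2}
print(roll_up(daily_sales, "year"))         # {2015: 18, 2016: 2}
print(roll_up(daily_sales, "fiscal_year"))  # {2014: 15, 2015: 5}
```

Because the levels are defined once for the TIME dimension, every database that uses it gets the same roll-ups without rebuilding the hierarchy.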

50 Contrasting Relational Model and MD Model

51 RDBMS vs. MDDB
- Do I still use an RDBMS for my data warehouse (DW)?
- MDDBs store data in a hypercube, i.e., a multidimensional array.
- RDBMSs store data as tables with rows and columns that do not map directly to the multidimensional view that users have of the data.
- Enterprise data warehouse (EDW): RDBMS. Data marts: MDDB.

52 RDBMS vs. MDDB: Trade-Offs
SIZE
- MDDBs are limited by size. In the mid-1990s, 10 GB caused problems; today, 100 GB is OK.
- Large DWs are still better served by relational front ends running against a high-performance, scalable RDBMS.
VOLATILITY
- Highly volatile data are better handled by an RDBMS.
- MDDBs take a long time to load and update.

53 RDBMS vs. MDDB: Trade-Offs
AGGREGATE STRATEGY
- MDDBs support aggregates better.
- RDBMSs are catching up with the help of aggregate navigators.
INVESTMENT PROTECTION
- Most organizations have already made significant investments in relational technology and skill sets.
- Continued use for another purpose (DW) provides additional ROI and lowers the technical risk of failure.
- MDDBs require acquiring new software and training staff to use it.

54 RDBMS vs. MDDB: Trade-Offs
TYPE OF USERS
- Power users prefer the range of functionalities available in MOLAP tools.
- Users who require broad views of enterprise data need access to the DW and are therefore better served by a ROLAP tool.

55 INTEGRATED ARCHITECTURE
- DB vendors have integrated their multidimensional and relational database products.
- Multidimensional front-end tools are used.
- If queries require data that are not available in the MDDB, the tools retrieve the data from the larger RDB.
- This is known as "drill-through".
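
Drill-through can be sketched as a query function that answers from the cube when it can and falls back to the relational detail otherwise. Everything here is a toy model under stated assumptions: the cube holds only MODEL x DEALERSHIP totals, the detail rows stand in for the relational database, and the per-color split is made up (it is chosen so that the Sedan totals agree with the query example: Gleason 8, Carr 12).

```python
# Stand-in for the relational detail: (model, color, dealership, volume).
# The split by color is hypothetical.
detail_rows = [
    ("SEDAN", "BLUE", "GLEASON", 3),
    ("SEDAN", "RED", "GLEASON", 5),
    ("SEDAN", "BLUE", "CARR", 12),
]

# Stand-in for the MDDB: totals aggregated to MODEL x DEALERSHIP only.
cube = {}
for model, _color, dealership, volume in detail_rows:
    key = (model, dealership)
    cube[key] = cube.get(key, 0) + volume


def query(model, dealership, color=None):
    """Return (volume, source), drilling through to the detail when needed."""
    if color is None:
        # The cube stores this level of aggregation directly.
        return cube[(model, dealership)], "MDDB"
    # Color is not a dimension of the cube, so read the relational detail.
    total = sum(
        v for m, c, d, v in detail_rows
        if (m, c, d) == (model, color, dealership)
    )
    return total, "RDB"


print(query("SEDAN", "GLEASON"))         # (8, 'MDDB')
print(query("SEDAN", "GLEASON", "RED"))  # (5, 'RDB')
```

The caller uses one interface for both cases; which store answers the question is decided by the level of detail requested.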

56 Q & A

57 Thank You

