Published by Ally Barstow. Modified over 10 years ago.
1
Relational On-Line Analytical Processing (ROLAP)
2
Data Warehouse Schema
- Star Schema
- Snowflake Schema
- Fact constellation
A data warehouse needs a subject-oriented, multidimensional data model that facilitates online analysis.
3
Star Schema
A single, large, central fact table and one table for each dimension. The star schema separates business data into facts, which hold the measurable, quantitative data about a business, and dimensions, which are descriptive attributes related to the fact data. Examples of fact data include sales price, sale quantity, and time, distance, speed, and weight measurements. Related dimension attribute examples include product models, product colors, product sizes, geographic locations, and salesperson names.
4
Star Schema (contd.)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Benefits: easy to understand, easy to define hierarchies, and fewer physical joins.
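The star schema on this slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the deck's own implementation: the table names (store_dim, product_dim, period_dim, sales_fact), the column types, and every sample row are assumptions made up for the example. Only the column lists come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE period_dim  (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
-- The central fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES period_dim(period_key),
    units INTEGER,
    price REAL
);
-- Made-up sample rows, for illustration only.
INSERT INTO store_dim VALUES (1, 'Store A', 'Madison', 'WI', 'Midwest'),
                             (2, 'Store B', 'Fresno', 'CA', 'West');
INSERT INTO product_dim VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO period_dim VALUES (100, 1996, 1, 1), (101, 1996, 1, 2);
INSERT INTO sales_fact VALUES (1, 10, 100, 5, 2.0),
                              (1, 11, 101, 3, 4.0),
                              (2, 10, 100, 7, 2.0);
""")

# Region is stored directly in the store dimension, so one join is enough.
units_by_region = conn.execute("""
    SELECT s.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(units_by_region)
```

Because each dimension is a single denormalized table, any dimension attribute is reachable from the fact table in one join.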
5
Snowflake Schema
A variant of the star schema model: a single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.
6
Snowflake Schema (contd.)
Fact Table: Store Key, Product Key, Period Key, Units, Price
Store Dimension: Store Key, Store Name, City Key
City Dimension: City Key, City, State, Region
Time Dimension: Period Key, Year, Quarter, Month
Product Dimension: Product Key, Product Desc
Drawbacks: time-consuming joins, slow report generation.
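A minimal sketch of the extra join that snowflaking introduces, again using Python's sqlite3 module. The table names (city_dim, store_dim, sales_fact) and all sample rows are hypothetical. Only the split of the store dimension into Store and City tables follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store dimension is normalized: city attributes move to their own table.
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT,
                        state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
-- Made-up sample rows, for illustration only.
INSERT INTO city_dim VALUES (1, 'Madison', 'WI', 'Midwest'),
                            (2, 'Fresno', 'CA', 'West');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 2),
                             (3, 'Store C', 1);
INSERT INTO sales_fact VALUES (1, 10, 100, 5, 2.0),
                              (2, 10, 100, 7, 2.0),
                              (3, 11, 101, 4, 4.0);
""")

# Reaching region now takes two joins: fact -> store -> city.
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim  c ON c.city_key  = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(units_by_region)
```

The region of each city is stored once in city_dim rather than once per store, which is the redundancy saving that the additional join pays for.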
7
Fact Constellation
Multiple fact tables that share many dimension tables. In the hotel industry, for example, the Booking and Checkout fact tables may share many dimension tables.
[Figure: fact tables Booking and Checkout sharing the dimension tables Hotels, Travel Agents, Promotion, Room Type, and Customer]
8
Comparison: Snowflake Schema vs Star Schema
Ease of maintenance/change:
- Snowflake: no redundancy, hence easier to maintain and change.
- Star: has redundant data, hence harder to maintain and change.
Ease of use:
- Snowflake: more complex queries, hence harder to understand.
- Star: less complex queries, easy to understand.
Query performance:
- Snowflake: more foreign keys, hence longer query execution time.
- Star: fewer foreign keys, hence shorter query execution time.
Normalization:
- Snowflake: has normalized tables.
- Star: has de-normalized tables.
Type of data warehouse:
- Snowflake: good for the data warehouse core, to simplify complex relationships (many:many).
- Star: good for data marts with simple relationships (1:1 or 1:many).
Joins:
- Snowflake: higher number of joins.
- Star: fewer joins.
Dimension table:
- Snowflake: may have more than one dimension table for each dimension.
- Star: contains only a single dimension table for each dimension.
When to use:
- Snowflake: when a dimension table is relatively big, snowflaking is better because it reduces space.
- Star: when the dimension tables contain fewer rows.
9
ROLAP and SQL
- To understand a multidimensional view of data and how it can be implemented in a relational database (ROLAP: Relational On-Line Analytical Processing).
- To be aware of the extensions that have been added to SQL to make OLAP queries easier to write, in particular the Rollup operator and the Cube operator.
10
Star schema (dimensional model) for property sales of DreamHome
11
Data Warehouse Definitions
“A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” (Inmon, 1993).
“The technique of taking large amounts of operational data and putting them into a large database, with the aim of analysing them every which way to yield useful information” (Terry Hunt, Scoring Points: How Tesco Continues to Win Customer Loyalty, Kogan Page).
12
Three Complementary Trends
Data Warehousing: Consolidate data from many sources in one large repository.
- Loading, periodic synchronization of replicas.
- Semantic integration.
OLAP:
- Complex SQL queries and views.
- Queries based on spreadsheet-style operations and a “multidimensional” view of data.
- Interactive and “online” queries.
Data Mining: Exploratory search for interesting trends and anomalies. (Another module/topic!)
13
Design Issues – Further Example
SALES (fact table): pid, timeid, locid, sales
TIMES: timeid, date, week, month, quarter, year, holiday_flag
PRODUCTS: pid, pname, category, price
LOCATIONS: locid, city, state, country
The fact table is in BCNF; the dimension tables are un-normalized. Dimension tables are small, and updates, inserts, and deletes are rare, so anomalies are less important than query performance. This kind of schema is very common in OLAP applications and is called a star schema; computing the join of all these relations is called a star join.
14
Multidimensional Data Model
Collection of numeric measures, which depend on a set of dimensions. E.g., measure Sales, dimensions Product (key: pid), Location (locid), and Time (timeid).
[Figure: three-dimensional array of sales values with axes pid, timeid, and locid; the slice locid=1 is shown.]
15
MOLAP vs ROLAP
Multidimensional data can be stored physically in a (disk-resident, persistent) array; such systems are called MOLAP systems. Alternatively, the data can be stored as a relation; such systems are called ROLAP systems. The main relation, which relates dimensions to a measure, is called the fact table. Each dimension can have additional attributes and an associated dimension table, e.g., Products(pid, pname, category, price). Fact tables are much larger than dimension tables.
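The two storage styles can be contrasted with a small in-memory sketch. The sales values below are made up, and the helper names (rolap_lookup, molap_lookup) are hypothetical. A real MOLAP system uses a persistent, disk-resident array rather than nested Python lists.

```python
# ROLAP style: the fact table is a relation of (pid, timeid, locid, sales) rows.
fact_rows = [
    (11, 1, 1, 25),
    (11, 2, 1, 8),
    (12, 1, 1, 30),
    (12, 2, 1, 20),
]

# MOLAP style: the same data in a dense array indexed by dimension position.
pids = sorted({row[0] for row in fact_rows})
timeids = sorted({row[1] for row in fact_rows})
locids = sorted({row[2] for row in fact_rows})
cube = [[[0 for _ in locids] for _ in timeids] for _ in pids]
for pid, timeid, locid, sales in fact_rows:
    cube[pids.index(pid)][timeids.index(timeid)][locids.index(locid)] = sales


def rolap_lookup(pid, timeid, locid):
    """Scan the relation for the matching dimension values."""
    return sum(s for p, t, l, s in fact_rows if (p, t, l) == (pid, timeid, locid))


def molap_lookup(pid, timeid, locid):
    """Translate dimension values to array offsets and index directly."""
    return cube[pids.index(pid)][timeids.index(timeid)][locids.index(locid)]


print(rolap_lookup(12, 1, 1), molap_lookup(12, 1, 1))
```

The array gives direct positional access but reserves a cell for every combination of dimension values, while the relation stores only the combinations that actually occur.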
16
Dimension Hierarchies
For each dimension, the set of values can be organized in a hierarchy:
PRODUCT: pname rolls up to category.
TIME: date rolls up to month, then quarter, then year; date also rolls up to week.
LOCATION: city rolls up to state, then country.
17
OLAP Queries
Influenced by SQL and by spreadsheets.
A common operation is to aggregate a measure over one or more dimensions.
- Find total sales.
- Find total sales for each city, or for each state.
Roll-up: Aggregating at different levels of a dimension hierarchy. E.g., given total sales by city, we can roll up to get sales by state.
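Roll-up along the Location hierarchy can be sketched as follows, using Python's sqlite3 module. The Sales and Locations tables follow the deck's schema, but the cities and sales figures are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE Sales (pid INTEGER, timeid INTEGER, locid INTEGER, sales INTEGER);
-- Made-up sample rows, for illustration only.
INSERT INTO Locations VALUES (1, 'Madison', 'WI', 'USA'),
                             (2, 'Milwaukee', 'WI', 'USA'),
                             (3, 'Fresno', 'CA', 'USA');
INSERT INTO Sales VALUES (11, 1, 1, 25), (11, 1, 2, 10),
                         (12, 1, 3, 45), (12, 2, 1, 5);
""")

# Total sales at the city level of the hierarchy.
by_city = conn.execute("""
    SELECT L.state, L.city, SUM(S.sales)
    FROM Sales S JOIN Locations L ON S.locid = L.locid
    GROUP BY L.state, L.city
    ORDER BY L.state, L.city
""").fetchall()

# Roll up: combine the city totals into state totals.
by_state = {}
for state, _city, total in by_city:
    by_state[state] = by_state.get(state, 0) + total

# The same state totals computed directly from the fact table.
by_state_direct = dict(conn.execute("""
    SELECT L.state, SUM(S.sales)
    FROM Sales S JOIN Locations L ON S.locid = L.locid
    GROUP BY L.state
""").fetchall())

print(by_city, by_state, by_state_direct)
```

Because SUM is additive, rolling up the city totals gives the same state totals as aggregating the fact table directly.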
18
OLAP Queries
Drill-down: The inverse of roll-up.
- E.g., given total sales by state, we can drill down to get total sales by city.
- E.g., we can also drill down on a different dimension to get total sales by product for each state.
Pivoting: Aggregation on selected dimensions. E.g., pivoting on Location and Time yields this cross-tabulation:
         WI    CA    Total
1995     63    81    144
1996     38   107    145
1997     75    35    110
Total   176   223    399
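Pivoting can be sketched in plain Python: start from per-(year, state) sales totals and add the row, column, and grand totals of the cross-tabulation. The six cell values are the ones listed in the CUBE result table later in the deck. The variable names are hypothetical.

```python
# Sales totals per (year, state), taken from the deck's CUBE result table.
cells = {
    (1995, "WI"): 63, (1995, "CA"): 81,
    (1996, "WI"): 38, (1996, "CA"): 107,
    (1997, "WI"): 75, (1997, "CA"): 35,
}
years = sorted({year for year, _ in cells})
states = ["WI", "CA"]

crosstab = {}
for year in years:
    row = {state: cells[(year, state)] for state in states}
    row["Total"] = sum(row.values())  # summary column on the right
    crosstab[year] = row

# Summary row at the bottom, including the grand total.
totals = {state: sum(cells[(year, state)] for year in years) for state in states}
totals["Total"] = sum(cells.values())
crosstab["Total"] = totals

for label, row in crosstab.items():
    print(label, [row[col] for col in states + ["Total"]])
```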
19
Comparison with SQL Queries
The cross-tabulation obtained by pivoting can also be computed using a collection of SQL queries:
SELECT T.year, L.state, SUM(S.sales)
FROM Sales S, Times T, Locations L
WHERE S.timeid=T.timeid AND S.locid=L.locid
GROUP BY T.year, L.state;
This query generates the entries in the body of the pivot chart on slide 18.
20
SELECT L.state, SUM(S.sales)
FROM Sales S, Locations L
WHERE S.locid=L.locid
GROUP BY L.state;
This query produces the summary row at the bottom of the pivot chart on slide 18.
21
SELECT T.year, SUM(S.sales)
FROM Sales S, Times T
WHERE S.timeid=T.timeid
GROUP BY T.year;
This query produces the summary column on the right of the pivot chart on slide 18.
22
The grand total in the bottom-right corner of the chart is produced by this query:
SELECT SUM(S.sales)
FROM Sales S, Locations L
WHERE S.locid=L.locid;
23
The Cube Operator
The GROUP BY clause with the CUBE keyword is equivalent to a collection of GROUP BY statements, so the result of the previous four queries can be produced with one query using the CUBE keyword:
SELECT T.year, L.state, SUM(S.sales)
FROM Sales S, Times T, Locations L
WHERE S.timeid=T.timeid AND S.locid=L.locid
GROUP BY CUBE (T.year, L.state);
The result of this query (on the following slide) is a tabular representation of the cross-tabulation on slide 18.
24
T.year   L.state   SUM(S.sales)
1995     WI        63
1995     CA        81
1995     NULL      144
1996     WI        38
1996     CA        107
1996     NULL      145
1997     WI        75
1997     CA        35
1997     NULL      110
NULL     WI        176
NULL     CA        223
NULL     NULL      399
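The equivalence between CUBE and a collection of GROUP BY queries can be checked with a small sketch. SQLite does not implement GROUP BY CUBE, so the code below emulates it with UNION ALL over the four groupings. This is a swapped-in technique, not the CUBE operator itself. As a simplifying assumption, the fact table holds a single row per (year, state) carrying the totals from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Times (timeid INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales (timeid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Times VALUES (1, 1995), (2, 1996), (3, 1997);
INSERT INTO Locations VALUES (1, 'WI'), (2, 'CA');
-- One fact row per (year, state): a simplification for the example.
INSERT INTO Sales VALUES (1, 1, 63), (1, 2, 81), (2, 1, 38),
                         (2, 2, 107), (3, 1, 75), (3, 2, 35);
""")

JOINED = ("FROM Sales S JOIN Times T ON S.timeid = T.timeid "
          "JOIN Locations L ON S.locid = L.locid")

# One SELECT per grouping: (year, state), (year), (state), and ().
cube_sql = f"""
SELECT T.year, L.state, SUM(S.sales) {JOINED} GROUP BY T.year, L.state
UNION ALL
SELECT T.year, NULL, SUM(S.sales) {JOINED} GROUP BY T.year
UNION ALL
SELECT NULL, L.state, SUM(S.sales) {JOINED} GROUP BY L.state
UNION ALL
SELECT NULL, NULL, SUM(S.sales) {JOINED}
"""
cube_rows = conn.execute(cube_sql).fetchall()
for row in cube_rows:
    print(row)
```

On a system that supports the CUBE keyword, the single GROUP BY CUBE query replaces all four branches of the UNION ALL.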
25
OLAP: Example of Queries using Rollup keyword
S#   P#   QTY
S1   P1   300
S1   P2   200
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
The above table (called SP) shows suppliers who supply parts in a certain quantity. S# = supplier number and P# = part number.
26
Queries
1. Get the total shipment quantity.
2. Get total shipment quantities by supplier.
3. Get total shipment quantities by part.
4. Get total shipment quantities by supplier and part.
27
1. Get the total shipment quantity.
SELECT SUM(QTY)
FROM SP;
2. Get total shipment quantities by supplier.
SELECT S#, SUM(QTY)
FROM SP
GROUP BY (S#);
28
3. Get total shipment quantities by part.
SELECT P#, SUM(QTY)
FROM SP
GROUP BY (P#);
4. Get total shipment quantities by supplier and part.
SELECT S#, P#, SUM(QTY)
FROM SP
GROUP BY (S#, P#);
29
Rollup
Consider the following query:
SELECT S#, P#, SUM(QTY) AS TOTQTY
FROM SP
GROUP BY ROLLUP (S#, P#);
The query is a bundled SQL formulation of Queries 1, 2 and 4. The result is on the next slide. It is called Rollup because the quantities have been “rolled up” along the Supplier dimension.
30
Result of Rollup
S#     P#     TOTQTY
S1     P1     300
S1     P2     200
S2     P1     300
S2     P2     400
S3     P2     200
S4     P2     200
S1     NULL   500
S2     NULL   700
S3     NULL   200
S4     NULL   200
NULL   NULL   1600
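The claim that the Rollup query bundles Queries 1, 2 and 4 can be checked with a sketch in Python's sqlite3 module. Two assumptions apply. First, SQLite has no ROLLUP, so it is emulated here with UNION ALL over the three groupings. Second, the SP rows are reconstructed from the totals shown on the slides. The columns are renamed sno, pno, and qty because S# and P# are not valid unquoted identifiers in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SP (sno TEXT, pno TEXT, qty INTEGER);
-- SP rows as reconstructed from the slide totals (an assumption).
INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200),
                      ('S2', 'P1', 300), ('S2', 'P2', 400),
                      ('S3', 'P2', 200), ('S4', 'P2', 200);
""")

# ROLLUP (sno, pno) covers the groupings (sno, pno), (sno), and ().
rollup_rows = conn.execute("""
    SELECT sno, pno, SUM(qty) AS totqty FROM SP GROUP BY sno, pno  -- Query 4
    UNION ALL
    SELECT sno, NULL, SUM(qty) FROM SP GROUP BY sno                -- Query 2
    UNION ALL
    SELECT NULL, NULL, SUM(qty) FROM SP                            -- Query 1
""").fetchall()
for row in rollup_rows:
    print(row)
```

Query 3 (totals by part) is absent: ROLLUP only drops grouping columns from right to left, so it never produces the grouping on P# alone.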
31
CUBE
Consider the following query:
SELECT S#, P#, SUM(QTY) AS TOTQTY
FROM SP
GROUP BY CUBE (S#, P#);
The result of this query, on the next slide, is a bundle of all four of the original queries.
Note: C. J. Date describes this as “… a table (an SQL-style table, at any rate), but it is hardly a relation … In fact the result table in this example can be regarded as an ‘outer union’ … outer union is not a respectable relational operation.” (C. J. Date, An Introduction to Database Systems, eighth edition, Addison Wesley, 2004)
32
Result of Cube
S#     P#     TOTQTY
S1     P1     300
S1     P2     200
S2     P1     300
S2     P2     400
S3     P2     200
S4     P2     200
S1     NULL   500
S2     NULL   700
S3     NULL   200
S4     NULL   200
NULL   P1     600
NULL   P2     1000
NULL   NULL   1600
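What CUBE computes can also be sketched without a database: aggregate the measure once for every subset of the grouping columns and mark dropped columns with None (standing in for NULL). The helper names cube_groupings and aggregate are hypothetical, and the SP rows are reconstructed from the slide totals (an assumption).

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("sno", "pno")
# SP rows as reconstructed from the slide totals (an assumption).
SP = [
    ("S1", "P1", 300), ("S1", "P2", 200),
    ("S2", "P1", 300), ("S2", "P2", 400),
    ("S3", "P2", 200), ("S4", "P2", 200),
]


def cube_groupings(dims):
    """All subsets of the grouping columns, from the full set down to ()."""
    return [subset
            for size in range(len(dims), -1, -1)
            for subset in combinations(dims, size)]


def aggregate(rows, dims, grouping):
    """Sum the last field of each row, keeping only the columns in grouping."""
    totals = defaultdict(int)
    for row in rows:
        values = dict(zip(dims, row[:len(dims)]))
        key = tuple(values[d] if d in grouping else None for d in dims)
        totals[key] += row[-1]
    return totals


cube_result = {}
for grouping in cube_groupings(DIMS):
    cube_result.update(aggregate(SP, DIMS, grouping))

for key, total in cube_result.items():
    print(key, total)
```

With n grouping columns, CUBE produces 2^n groupings, whereas ROLLUP produces only n + 1.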
33
Several Levels of Aggregation in same query – Rollup
The drawback of the previous queries without CUBE and Rollup is obvious: formulating several similar but distinct queries is tedious. Using these additions to SQL, we can represent several levels of aggregation in a single query. This is the motivation behind “Rollup” and “Cube”, which are extra options on the GROUP BY clause in SQL.
34
Summary
- Multidimensional views of data can be stored in a relational database.
- The SQL GROUP BY clause can be used to aggregate data, but the queries can be tedious.
- The new keywords Rollup and Cube have been added to SQL for use in Relational On-Line Analytical Processing (ROLAP).
- Practical exercises in the lab next week.