Data Warehouses and OLAP
Review Questions
◦ Question 1: OLAP
◦ Question 2: Data Warehouses
◦ Question 3: Various Terms and Definitions
◦ Question 4: Materialised Views
Lab Experiments
◦ DB Systems and Administration Revision
◦ Testing out Aggregates, Rollups, Partial Rollups and Cube Extensions
What is OLAP?
Online Analytical Processing (OLAP) is the process of analysing current and historical data to identify useful patterns and support business strategies. The emphasis of OLAP is on complex, interactive, exploratory analysis of very large datasets created by integrating data from across all parts of an enterprise. For example, a company might analyse the purchases of all its customers to come up with new products likely to interest them. OLAP data is usually stored in a data warehouse.
What are the characteristics of OLAP compared with OLTP?
Online Transaction Processing (OLTP) vs Online Analytical Processing (OLAP):

| OLTP | OLAP |
|---|---|
| Many updates | Mostly reads; updates are rare |
| Many small transactions | Long, complex queries |
| Quick response required | Slower response allowed |
| MB-GB of data | GB-TB of data |
| Raw data | Summarised, consolidated data |
| Up-to-date data | Current and historical data |
| Clerical users | Decision-makers and analysts as users |
| Consistency and recoverability are critical | Minor inconsistency is allowed |
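As a minimal sketch of the two workload styles, the snippet below uses Python's built-in `sqlite3` module as a stand-in database; the `orders` table, its rows and both function names are hypothetical. The OLTP-style function commits one small write per call, while the OLAP-style function is a read-only query that summarises the accumulated history.

```python
import sqlite3

# In-memory stand-in database; the orders table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

def place_order(region, year, amount):
    """OLTP style: one short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )

def sales_by_region_and_year():
    """OLAP style: a read-only query that scans and summarises the history."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM orders "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()

for row in [("North", 2022, 100.0), ("North", 2023, 150.0),
            ("South", 2022, 80.0), ("South", 2023, 120.0), ("South", 2023, 40.0)]:
    place_order(*row)

print(sales_by_region_and_year())
```

In a real deployment the two workloads would usually run on separate systems (an OLTP database feeding a warehouse), which is exactly the split the table describes.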
What is a Data Warehouse?
A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, usually at a single site. The data may be augmented with additional attributes, such as timestamps and summary information. Data is stored for a long time, permitting access to historical data. Interactive response times are expected even for complex queries, whereas ad-hoc updates are uncommon.
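The gathering step can be sketched in plain Python, assuming two hypothetical source systems that record the same kind of sale with different field names and units. Each row is mapped onto one unified schema and augmented with a load timestamp (plus the name of its source), and summary information is then derived from the unified rows. A real warehouse would use an ETL tool or bulk loader; this only illustrates the idea.

```python
from datetime import datetime, timezone

# Hypothetical source systems: same kind of data, different names and units.
shop_sales = [{"item": "kettle", "pence": 2500}, {"item": "toaster", "pence": 3000}]
web_sales = [{"product_name": "kettle", "price_gbp": 24.0}]

def load_into_warehouse(shop_rows, web_rows, loaded_at=None):
    """Map every source row onto one unified schema and add a load timestamp."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    unified = []
    for row in shop_rows:
        unified.append({"product": row["item"], "amount_gbp": row["pence"] / 100,
                        "source": "shop", "loaded_at": loaded_at})
    for row in web_rows:
        unified.append({"product": row["product_name"], "amount_gbp": row["price_gbp"],
                        "source": "web", "loaded_at": loaded_at})
    return unified

warehouse = load_into_warehouse(shop_sales, web_sales)

# Summary information derived from the unified rows: total sales per product.
total_by_product = {}
for row in warehouse:
    total_by_product[row["product"]] = total_by_product.get(row["product"], 0) + row["amount_gbp"]

print(total_by_product)
```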
How does it differ from a (transactional) database?
Traditional databases are generally used for OLTP. They are normalised to reduce redundant data, so queries often need complex joins across several tables, and their operations are optimised for writing. Data warehouses differ in that they usually do not normalise the data, which reduces the response time of analytical processing; they therefore perform better on analytical queries and are optimised for read operations.
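The trade-off can be sketched with Python's `sqlite3` module, assuming a hypothetical three-table normalised schema and a denormalised copy of it. Both queries return the same category totals, but the normalised version needs two joins, while the warehouse-style table answers the question with a single scan at the cost of repeating the category name on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised (OLTP-style): each fact is stored once, so queries need joins.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, category_id INTEGER REFERENCES category);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, product_id INTEGER REFERENCES product, amount REAL);
INSERT INTO category VALUES (1, 'Kitchen'), (2, 'Garden');
INSERT INTO product  VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO sale (product_id, amount) VALUES (10, 25.0), (11, 30.0), (20, 12.5);

-- Denormalised (warehouse-style): the category name is copied onto every row.
CREATE TABLE sale_flat AS
SELECT c.name AS category, s.amount
FROM sale s
JOIN product p  ON s.product_id = p.product_id
JOIN category c ON p.category_id = c.category_id;
""")

# Report on the normalised schema: two joins are needed.
normalised = conn.execute("""
    SELECT c.name, SUM(s.amount) FROM sale s
    JOIN product p  ON s.product_id = p.product_id
    JOIN category c ON p.category_id = c.category_id
    GROUP BY c.name ORDER BY c.name""").fetchall()

# The same report needs no joins against the denormalised table.
denormalised = conn.execute(
    "SELECT category, SUM(amount) FROM sale_flat GROUP BY category ORDER BY category"
).fetchall()

print(normalised, denormalised)
```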
Explain the Terms:
◦ Data Cubes
◦ Multi-Dimensional Data Model
◦ MOLAP
◦ ROLAP
◦ Star Schema
◦ Snowflake Schema
◦ Roll-Up
◦ Drill-Down
◦ Slicing-and-Dicing
◦ Data Cubes: The generalisation of a cross-tab, which is 2-dimensional, to n dimensions (for example, for car sales: Colour vs Make vs Size).
◦ Multi-Dimensional Data Models: Data that can be modelled as dimension attributes and measure attributes is called multi-dimensional data (e.g. “num_sold” is a measure attribute, since it measures some value and can be aggregated, while “make”, “colour” and “size” are dimension attributes, since they define the dimensions along which measure attributes are viewed).
◦ MOLAP: Multidimensional OLAP (MOLAP) systems store data cubes in multi-dimensional arrays.
◦ ROLAP: Relational OLAP (ROLAP) systems store the data in relations in a relational database. The main relation, which relates dimensions to measures, is called the fact table (e.g. sales(prod_id, date, shop_id, num_sold)); it is very large and accumulates facts such as individual sales. Each dimension can have additional attributes in an associated dimension table (e.g. product(prod_id, price, colour) and shops(shop_id, location, manager)), and the fact table refers to these tables through foreign keys such as prod_id and shop_id. Dimension data is smaller and generally static.
◦ Star Schema: A star schema consists of a fact table and one or more dimension tables. Dimension tables are usually not normalised, which speeds up reads because no further joins between dimension tables are needed. A typical query joins the fact table with one or more dimension tables.
◦ Snowflake Schema: A variation of the star schema in which the dimension tables are normalised.
◦ Roll-Up: Moving from finer granularity to coarser granularity by means of aggregation (e.g. given total sales for each city, find the total sales for each state).
◦ Drill-Down: The inverse of roll-up, i.e. moving from coarser granularity to finer granularity.
◦ Slicing-and-Dicing: Selecting the parts of the data cube that satisfy certain conditions in order to find specific information (e.g. from the car-sales data cube, find the cross-tab on Make and Colour for medium cars). A cross-tab can therefore be viewed as a slice of the data cube.
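To contrast the MOLAP and ROLAP storage approaches for the car-sales cube described above, here is a plain-Python sketch; the dimension values and sales figures are hypothetical. The ROLAP form is a list of fact rows, while the MOLAP form is a dense three-dimensional array in which every (colour, make, size) combination has its own cell, so a lookup is direct indexing.

```python
# Hypothetical dimension values for the Colour x Make x Size car-sales cube.
colours = ["red", "blue"]
makes = ["Ford", "Fiat"]
sizes = ["small", "medium"]

# ROLAP style: fact rows of (dimension attributes..., num_sold measure).
fact_rows = [
    ("red", "Ford", "small", 3),
    ("red", "Fiat", "medium", 5),
    ("blue", "Ford", "medium", 2),
    ("blue", "Ford", "medium", 4),
]

def to_molap(rows):
    """MOLAP style: a dense array with one cell per dimension-value combination."""
    cube = [[[0 for _ in sizes] for _ in makes] for _ in colours]
    for colour, make, size, num_sold in rows:
        # Aggregate the measure into the cell addressed by the dimension positions.
        cube[colours.index(colour)][makes.index(make)][sizes.index(size)] += num_sold
    return cube

cube = to_molap(fact_rows)

# A cell lookup is plain array indexing rather than a relational query.
blue_ford_medium = cube[colours.index("blue")][makes.index("Ford")][sizes.index("medium")]

# Summing away one dimension (size) gives the 2-dimensional Colour x Make cross-tab.
colour_by_make = [[sum(size_row) for size_row in plane] for plane in cube]

print(blue_ford_medium, colour_by_make)
```

The dense array also reserves cells for combinations that never occur (no blue Fiats were sold here), which is why MOLAP storage can waste space on sparse cubes.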
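The star-schema operations listed above can be sketched with Python's `sqlite3` module. The shops, product and sales tables loosely follow the shape of the ROLAP example, but the columns (city, state, size), the place names and the figures are hypothetical. The queries compute sales per city, roll them up to states, and slice the cube at size = 'medium'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one fact table whose foreign keys point at dimension tables.
CREATE TABLE shops   (shop_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, colour TEXT, size TEXT);
CREATE TABLE sales   (prod_id INTEGER REFERENCES product,
                      shop_id INTEGER REFERENCES shops, num_sold INTEGER);
INSERT INTO shops   VALUES (1, 'Sydney', 'NSW'), (2, 'Newcastle', 'NSW'), (3, 'Melbourne', 'VIC');
INSERT INTO product VALUES (1, 'red', 'medium'), (2, 'blue', 'small');
INSERT INTO sales   VALUES (1, 1, 5), (2, 1, 3), (1, 2, 4), (2, 3, 6);
""")

# Finer granularity: total sales per city (fact table joined to one dimension).
by_city = conn.execute("""
    SELECT sh.city, SUM(s.num_sold) FROM sales s
    JOIN shops sh ON s.shop_id = sh.shop_id
    GROUP BY sh.city ORDER BY sh.city""").fetchall()

# Roll-up: aggregate the same facts to the coarser state level.
# Drill-down is the reverse step, from by_state back to by_city.
by_state = conn.execute("""
    SELECT sh.state, SUM(s.num_sold) FROM sales s
    JOIN shops sh ON s.shop_id = sh.shop_id
    GROUP BY sh.state ORDER BY sh.state""").fetchall()

# Slice: fix size = 'medium' and view the remaining dimensions (state x colour).
medium_slice = conn.execute("""
    SELECT sh.state, p.colour, SUM(s.num_sold) FROM sales s
    JOIN shops sh  ON s.shop_id = sh.shop_id
    JOIN product p ON s.prod_id = p.prod_id
    WHERE p.size = 'medium'
    GROUP BY sh.state, p.colour ORDER BY sh.state, p.colour""").fetchall()

print(by_city, by_state, medium_slice)
```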
Explain what materialised views are and what their role is in data warehouse systems.
Materialised views are database objects that contain the stored results of a query (e.g. CREATE MATERIALIZED VIEW view_name AS SELECT COUNT(sid) FROM Student;), unlike ordinary views, whose query is re-evaluated each time the view is used. They are important in data warehouses because many queries need expensive aggregations, yet it is usually not feasible to pre-compute every possible aggregate. Materialised views can store selected aggregated data so that queries over it respond much faster than recomputing the aggregation from the base tables each time; the trade-off is that the views must be refreshed when the base tables change.
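SQLite has no materialised views, so the sketch below imitates one with Python's `sqlite3` module: the aggregate is stored in an ordinary table (the hypothetical `students_per_major_mv`) and a complete refresh recomputes it from the base table. In Oracle the equivalent object would be created with CREATE MATERIALIZED VIEW, which has its own refresh options; the sketch only shows why stored results are cheap to read and can go stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (snum INTEGER PRIMARY KEY, sname TEXT, major TEXT, age INTEGER);
INSERT INTO student VALUES (1, 'Ann', 'CS', 22), (2, 'Bob', 'CS', 20), (3, 'Eve', 'Math', 23);
-- Stand-in for a materialised view: the aggregate is computed once and stored.
CREATE TABLE students_per_major_mv AS
    SELECT major, COUNT(snum) AS num_students FROM student GROUP BY major;
""")

def refresh_students_per_major():
    """Complete refresh: recompute the stored aggregate from the base table."""
    with conn:
        conn.execute("DELETE FROM students_per_major_mv")
        conn.execute("INSERT INTO students_per_major_mv "
                     "SELECT major, COUNT(snum) FROM student GROUP BY major")

def students_per_major():
    """Read the small precomputed table instead of re-aggregating student."""
    return conn.execute(
        "SELECT major, num_students FROM students_per_major_mv ORDER BY major"
    ).fetchall()

conn.execute("INSERT INTO student VALUES (4, 'Dan', 'Math', 19)")
stale = students_per_major()   # base table changed, stored result not yet refreshed
refresh_students_per_major()
fresh = students_per_major()
print(stale, fresh)
```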
Log in to the Oracle system with the following details:
◦ Username: SYSTEM
◦ Password: manager (lower case; do not enter it incorrectly 3 times in a row)
◦ HostString:
Create your account with the following command:
◦ Create user your_name identified by your_password;
Grant yourself the following privileges and settings with these commands:
◦ Grant connect to your_name;
◦ Grant dba to your_name;
◦ Alter user your_name default tablespace DBS_SPACE;
◦ Alter user your_name temporary tablespace TEMP;
◦ Alter user your_name quota 10M on DBS_SPACE;
Log out of the SYSTEM account, log in to your new account, and run the commands in dbbook.sql by typing in:
◦ *Assuming the file is on your C drive.
Perform Aggregates, Rollups, Partial Rollups and Cube Extensions when you Group By.
SQL Aggregates:
-- Find the oldest student.
SELECT MAX(age) FROM Student;
-- Find the average budget of the departments.
SELECT AVG(budget) FROM dept;
-- Find the number of flights.
SELECT COUNT(flno) FROM flights;
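To try these aggregates without an Oracle account, they can be run against small, hypothetical versions of the student, dept and flights tables using Python's `sqlite3` module. The extra COUNT(origin) query (and the `origin` column itself) is an addition that shows COUNT(column) ignores NULL values.

```python
import sqlite3

# Small hypothetical versions of the tables queried above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (snum INTEGER, sname TEXT, age INTEGER);
CREATE TABLE dept    (did INTEGER, budget REAL);
CREATE TABLE flights (flno INTEGER, origin TEXT);
INSERT INTO student VALUES (1, 'Ann', 22), (2, 'Bob', 20), (3, 'Eve', 23);
INSERT INTO dept    VALUES (1, 1000.0), (2, 3000.0);
INSERT INTO flights VALUES (99, 'Los Angeles'), (13, 'Los Angeles'), (346, NULL);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

oldest = scalar("SELECT MAX(age) FROM Student")   # table names are case-insensitive
avg_budget = scalar("SELECT AVG(budget) FROM dept")
num_flights = scalar("SELECT COUNT(flno) FROM flights")
# COUNT(column) skips NULLs, so the flight with no origin is not counted here.
num_origins = scalar("SELECT COUNT(origin) FROM flights")
print(oldest, avg_budget, num_flights, num_origins)
```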
SQL Roll-Ups:
-- Average student age by major and standing, with a subtotal row per major
-- and a grand-total row (the rolled-up columns are shown as NULL).
SELECT major, standing, AVG(age) AS avg_age_student
FROM student
GROUP BY ROLLUP (major, standing)
ORDER BY major, standing;
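SQLite has no ROLLUP, but the rows it produces can be reproduced with one GROUP BY per grouping set joined by UNION ALL, which also makes the semantics visible. This sketch uses Python's `sqlite3` module with hypothetical sample students; NULL plays the role of the rolled-up column, as in Oracle's output.

```python
import sqlite3

# Hypothetical sample rows with the student columns used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sname TEXT, major TEXT, standing TEXT, age INTEGER);
INSERT INTO student VALUES
  ('Ann', 'CS', 'SR', 22), ('Bob', 'CS', 'JR', 20), ('Cat', 'CS', 'SR', 24),
  ('Dan', 'Math', 'JR', 19), ('Eve', 'Math', 'SR', 23);
""")

# ROLLUP (major, standing) produces the grouping sets (major, standing),
# (major) and (). Each grouping set is written as its own GROUP BY, with
# NULL standing in for the rolled-up columns.
rollup_rows = conn.execute("""
    SELECT major, standing, AVG(age) AS avg_age_student FROM student GROUP BY major, standing
    UNION ALL
    SELECT major, NULL, AVG(age) FROM student GROUP BY major
    UNION ALL
    SELECT NULL, NULL, AVG(age) FROM student
""").fetchall()

# Index the result by (major, standing); None marks a subtotal or the grand total.
rollup = {(major, standing): avg_age for major, standing, avg_age in rollup_rows}
for key, avg_age in rollup.items():
    print(key, round(avg_age, 2))
```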
SQL Partial Roll-Ups:
-- Average student age per student name, rolled up over major and standing:
-- subtotals per (sname, major) and per sname, but no grand total.
-- Note: the subtotals are only interesting if several students share the same
-- name but have different majors and standings (e.g. John Smith).
SELECT sname, major, standing, AVG(age) AS avg_age_student
FROM student
GROUP BY sname, ROLLUP (major, standing)
ORDER BY sname, major, standing;
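The partial roll-up can be emulated the same way. Because sname sits outside the ROLLUP, it appears in every grouping set, so the result has subtotals per name but no grand-total row. The sample data below is hypothetical and deliberately includes several students named John Smith.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sname TEXT, major TEXT, standing TEXT, age INTEGER);
INSERT INTO student VALUES
  ('John Smith', 'CS', 'SR', 22), ('John Smith', 'CS', 'JR', 20),
  ('John Smith', 'Math', 'SR', 24), ('Ann Lee', 'CS', 'JR', 19);
""")

# GROUP BY sname, ROLLUP (major, standing) produces the grouping sets
# (sname, major, standing), (sname, major) and (sname): sname is never
# rolled up, so there is no row with a NULL sname.
partial_rows = conn.execute("""
    SELECT sname, major, standing, AVG(age) FROM student GROUP BY sname, major, standing
    UNION ALL
    SELECT sname, major, NULL, AVG(age) FROM student GROUP BY sname, major
    UNION ALL
    SELECT sname, NULL, NULL, AVG(age) FROM student GROUP BY sname
""").fetchall()

# Index the result by (sname, major, standing).
partial = {row[:3]: row[3] for row in partial_rows}
for key, avg_age in partial.items():
    print(key, avg_age)
```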
SQL Cube Extension:
-- Average student age for every combination of major and standing, with
-- subtotals per major, subtotals per standing, and a grand total.
SELECT major, standing, AVG(age) AS avg_age_student
FROM student
GROUP BY CUBE (major, standing)
ORDER BY major, standing;
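ROLLUP and CUBE differ only in which grouping sets they generate: ROLLUP over n columns gives the n + 1 leading prefixes, while CUBE gives all 2^n subsets. The Python sketch below makes that explicit and, since SQLite lacks CUBE as well, turns the grouping sets into a UNION ALL query run through `sqlite3`. The helper names and sample rows are hypothetical, and the helper pastes column names straight into the SQL, so it is only meant for trusted, hard-coded identifiers.

```python
import sqlite3
from itertools import combinations

def rollup_sets(cols):
    """ROLLUP (a, b, c) -> (a, b, c), (a, b), (a), (): n + 1 grouping sets."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """CUBE (a, b, c) -> every subset of the columns: 2**n grouping sets."""
    return [s for r in range(len(cols), -1, -1) for s in combinations(cols, r)]

def grouping_sets_sql(table, cols, measure, sets):
    """Emulate GROUP BY GROUPING SETS with one SELECT per set, joined by UNION ALL."""
    parts = []
    for s in sets:
        select = ", ".join(c if c in s else f"NULL AS {c}" for c in cols)
        group_by = f" GROUP BY {', '.join(s)}" if s else ""
        parts.append(f"SELECT {select}, {measure} FROM {table}{group_by}")
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sname TEXT, major TEXT, standing TEXT, age INTEGER);
INSERT INTO student VALUES
  ('Ann', 'CS', 'SR', 22), ('Bob', 'CS', 'JR', 20), ('Cat', 'CS', 'SR', 24),
  ('Dan', 'Math', 'JR', 19), ('Eve', 'Math', 'SR', 23);
""")

cols = ["major", "standing"]
sql = grouping_sets_sql("student", cols, "AVG(age) AS avg_age_student", cube_sets(cols))
cube_rows = conn.execute(sql).fetchall()

# Index by (major, standing); None marks a column that has been aggregated away.
cube = {(major, standing): avg_age for major, standing, avg_age in cube_rows}
for key, avg_age in cube.items():
    print(key, round(avg_age, 2))
```

One limitation of this emulation: a NULL in the output could mean either "all values" or a genuine NULL in the data. Oracle's GROUPING() function distinguishes the two cases; the UNION ALL version cannot.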