CMPE Database Systems Workshop June 12 Class Meeting


CMPE 180-38 Database Systems Workshop June 12 Class Meeting Department of Computer Engineering San Jose State University Summer 2017 Instructor: Ron Mak www.cs.sjsu.edu/~mak

Midterm Solutions: Question 1 Briefly describe the necessary steps to normalize a proper relational table to first normal form (1NF). No steps are necessary. Any proper relational table is already in first normal form.

Midterm Solutions: Question 2 Briefly describe the necessary steps to normalize a proper relational table that has a non-composite primary key to second normal form (2NF). No steps are necessary. Second normal form removes partial functional dependencies, where fields are dependent on a component of the composite primary key. If the primary key is non-composite, there are no partial functional dependencies.

Midterm Solutions: Question 3.a Year Department Leader ID Amount 2015 CMPE Sigurd Meldal 007777777 $12,000 CS Sami Khuri 002222222 $11,000 2016 Math Bem Cayco 005555555 $10,000 Xiao Su 008888888 You want to record the fact that in the year 2017, Mary Jane, who has ID 003333333 and does not belong to a department, is the leader of the Spartan Committee. Briefly explain why you can or cannot add a 2017 row for her and enter nulls for the Department and Amount fields. You cannot add a 2017 row where the Department field is null. The Department field is part of the composite primary key. Therefore, leaving that field null violates the entity integrity constraint.

Midterm Solutions: Question 3.b Year Department Leader ID Amount 2015 CMPE Sigurd Meldal 007777777 $12,000 CS Sami Khuri 002222222 $11,000 2016 Math Bem Cayco 005555555 $10,000 Xiao Su 008888888 Normalize this table to third normal form (3NF). ID → Leader is a transitive functional dependency. We can move those columns into a new table, which gives two tables: (Year, Department, ID, Amount) and (ID, Leader).
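The decomposition in Question 3.b can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names `funding` and `leader` are hypothetical (the slide does not name the tables), and only the two rows that are complete in the slide's table are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table names; the slide gives only the column lists.
CREATE TABLE leader (
    id     TEXT PRIMARY KEY,
    leader TEXT NOT NULL
);
CREATE TABLE funding (
    year       INTEGER NOT NULL,
    department TEXT    NOT NULL,
    id         TEXT    NOT NULL REFERENCES leader(id),
    amount     INTEGER,
    PRIMARY KEY (year, department)
);
INSERT INTO leader VALUES ('007777777', 'Sigurd Meldal'),
                          ('005555555', 'Bem Cayco');
INSERT INTO funding VALUES (2015, 'CMPE', '007777777', 12000),
                           (2016, 'Math', '005555555', 10000);
""")

# Joining the two 3NF tables reconstructs the rows of the original table.
rows = conn.execute("""
    SELECT f.year, f.department, l.leader, f.id, f.amount
    FROM funding f JOIN leader l ON f.id = l.id
    ORDER BY f.year
""").fetchall()
print(rows)
```

Each leader name is now stored once, at the cost of a join whenever the name is needed.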

Midterm Solutions: Question 3.c Give a good reason why you may want to leave this table unnormalized. The original table has faster query response, because retrieving a leader's name does not require a join.

Midterm Solutions: Question 4.a

Midterm Solutions: Question 4.b

Midterm Solutions: Question 5.a Display the ProductID and ProductName of the cheapest product without using a nested query. SELECT productid, productname FROM product ORDER BY productprice LIMIT 1;

Midterm Solutions: Question 5.b Repeat the above task with a nested query. SELECT productid, productname FROM product WHERE productprice = (SELECT MIN(productprice) FROM product);

Midterm Solutions: Question 5.c Display the ProductID, ProductName, and VendorName for products whose price is below the average price of all products. SELECT p.productid, p.productname, v.vendorname FROM product p, vendor v WHERE p.vendorid = v.vendorid AND p.productprice < (SELECT AVG(productprice) FROM product);

Midterm Solutions: Question 5.d Display the ProductID for the product that has been sold the most (i.e., that has been sold in the highest quantity). SELECT productid FROM soldvia GROUP BY productid HAVING SUM(noofitems) = (SELECT MAX(total) FROM (SELECT SUM(noofitems) AS total FROM soldvia GROUP BY productid) AS t);

Midterm Solutions: Question 5.e The following query retrieves each product that has more than three items sold within all sales transactions: SELECT productid, productname, productprice FROM product WHERE productid IN (SELECT productid FROM soldvia GROUP BY productid HAVING SUM(noofitems) > 3); Rewrite it without using a nested query but instead with a join: SELECT p.productid, productname, productprice FROM product p, soldvia s WHERE p.productid = s.productid GROUP BY p.productid, p.productname, p.productprice HAVING SUM(s.noofitems) > 3;
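As a quick check that the nested and join forms in Question 5.e return the same products, here is a minimal sketch using Python's sqlite3 module. The column names follow the slide's queries; the `tid` column and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (productid TEXT PRIMARY KEY,
                      productname TEXT, productprice REAL);
-- tid (transaction id) is an assumed column name.
CREATE TABLE soldvia (productid TEXT, tid TEXT, noofitems INTEGER,
                      PRIMARY KEY (productid, tid));
-- Made-up sample data.
INSERT INTO product VALUES ('1X1', 'Tent', 100), ('2X2', 'Boot', 70),
                           ('3X3', 'Sock', 15);
INSERT INTO soldvia VALUES ('1X1', 'T1', 1), ('1X1', 'T3', 1),
                           ('2X2', 'T2', 1), ('2X2', 'T4', 2), ('2X2', 'T5', 2),
                           ('3X3', 'T3', 5);
""")

nested = """
    SELECT productid, productname, productprice
    FROM product
    WHERE productid IN (SELECT productid FROM soldvia
                        GROUP BY productid
                        HAVING SUM(noofitems) > 3)
"""
joined = """
    SELECT p.productid, productname, productprice
    FROM product p, soldvia s
    WHERE p.productid = s.productid
    GROUP BY p.productid, p.productname, p.productprice
    HAVING SUM(s.noofitems) > 3
"""

# Neither query has an ORDER BY, so sort before comparing.
nested_rows = sorted(conn.execute(nested).fetchall())
joined_rows = sorted(conn.execute(joined).fetchall())
print(nested_rows == joined_rows, nested_rows)
```

With this sample data, only products 2X2 and 3X3 have more than three items sold in total.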

Midterm Solutions: Question 6.a

Midterm Solutions: Question 6.b

Final Project Put your emphasis on data management: how you manage the data that you are using. Data models: Operational tables and analytical tables. Data operations: Queries and updates of the operational tables. How the analytical tables are loaded. Queries of the analytical tables for data analysis. You can use actual downloaded data or data created by data generation tools.

Final Project, cont’d User application that invokes the data operations. Web-based or desktop-based. PHP, Java, etc. Fancy GUI or data visualization not necessary. How well did you use the technologies you learned during the semester? RDBMS, DW, XML, data virtualization (CIS), NoSQL. Not all technologies have to be used.

Final Project, cont’d Written report What is the application? What data did you use, and where did you get it? Overview of your data models (in words). ER diagram Relational schemas Star schemas Operational and analytical queries Example user actions and screen shots of results.

Detailed vs. Aggregated Fact Tables In a detailed fact table, each row contains data about a single fact. In an aggregated fact table, each row contains a summary of multiple facts, such as a sum (aggregation) of all sales of a product in a particular store during a single day.
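The relationship between the two kinds of fact table can be sketched by deriving an aggregated table from a detailed one with GROUP BY. The table and column names below (`sales_detail`, `sales_daily`, and the key columns) and the sample rows are assumptions for illustration, not the textbook's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detailed fact table: one row per line item of a transaction.
CREATE TABLE sales_detail (
    calendarkey INTEGER, storekey INTEGER, productkey INTEGER,
    tid TEXT, dollars_sold INTEGER, units_sold INTEGER
);
INSERT INTO sales_detail VALUES
    (1, 1, 1, 'T1', 100, 1),
    (1, 1, 1, 'T2', 200, 2),
    (1, 1, 2, 'T2',  70, 1),
    (2, 1, 1, 'T3', 100, 1);

-- Aggregated fact table: one row per day, store, and product.
CREATE TABLE sales_daily AS
    SELECT calendarkey, storekey, productkey,
           SUM(dollars_sold) AS dollars_sold,
           SUM(units_sold)   AS units_sold
    FROM sales_detail
    GROUP BY calendarkey, storekey, productkey;
""")

aggregated = conn.execute("""
    SELECT * FROM sales_daily
    ORDER BY calendarkey, storekey, productkey
""").fetchall()
print(aggregated)
```

The two line items for product 1 on day 1 collapse into one summary row, so the transaction identifier is no longer available in the aggregated table.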

Detailed Fact Table Example Database Systems by Jukić, Vrbsky, & Nestorov Pearson 2014 ISBN 978-0-13-257567-6

Detailed Fact Table Example, cont’d

Detailed Fact Table Example, cont’d Each fact table record contains data about one sales fact.

Line-Item Detailed Fact Table Each row is a single line item of a particular transaction.

Transaction-Level Detailed Fact Table Each row is a single transaction.

Aggregated Fact Table Example Aggregated fact table DPCS: Total amount sold in dollars and units on a particular day for a particular product for a particular customer for a particular store.

Aggregated Fact Table Example, cont’d

Aggregated Fact Table Example, cont’d

Aggregated Fact Table Example, cont’d Aggregated fact table DCS: Total amount sold in dollars and units on a particular day for a particular customer for a particular store for all products.

Aggregated Fact Table Example, cont’d

Aggregated Fact Table Example, cont’d

Aggregated Fact Table Example, cont’d

Break

Granularity of the Fact Table Fine level of granularity: Detailed fact table. Coarser level of granularity: Aggregated fact table. Finer granularity: More analysis power. Coarser granularity: Faster queries.

Granularity of the Fact Table, cont’d Granularity can also depend on how the data is collected and loaded into the fact table. Example: Load only daily sales. But then you lose the ability to analyze sales by the hour. A solution: Keep both fine-grained and aggregated tables and have them share the dimension tables.

Granularity of the Fact Table, cont’d Good DW design involves deciding which aggregates are worth storing as tables. The base fact tables contain data at the finest level of granularity required for analysis. Facts can be pre-summarized in aggregate tables at granularity levels that are determined to be optimal for certain analysis procedures.

Granularity of the Fact Table, cont’d

Slowly Changing Dimensions In a typical dimension of a star schema, either: The values of a dimension’s attributes do not change or change extremely rarely. Examples: store address, customer gender OR: The values of a dimension’s attributes change occasionally and sporadically over time. Example: customer address Three approaches to handling slowly changing dimensions: Type 1, Type 2, and Type 3.

Slowly Changing Dimensions: Type 1 Simply change the value in the dimension table’s record. Often used to correct errors.

Slowly Changing Dimensions: Type 2 Preserve history by creating an additional row with the new value. Often used with timestamps. Now you see why this analytical table uses a separate CustomerKey rather than the CustomerID.

Slowly Changing Dimensions: Type 3 Create “previous” and “current” columns. Only a fixed number of changes is possible. Record only a limited history.
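The three approaches can be compared side by side in a small sketch. Everything here is illustrative: the table names (`customer_t1`, `customer_t2`, `customer_t3`), the `taxbracket` attribute, the `effective_from`/`effective_to` timestamp columns, and the sample customer are assumptions, not the textbook's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Type 1: overwrite in place; the old value is lost.
CREATE TABLE customer_t1 (customerkey INTEGER PRIMARY KEY,
                          customerid TEXT, name TEXT, taxbracket TEXT);
INSERT INTO customer_t1 VALUES (1, 'C100', 'Tina', 'Medium');
UPDATE customer_t1 SET taxbracket = 'High' WHERE customerid = 'C100';

-- Type 2: add a row with a new surrogate key; timestamps mark validity.
CREATE TABLE customer_t2 (customerkey INTEGER PRIMARY KEY,
                          customerid TEXT, name TEXT, taxbracket TEXT,
                          effective_from TEXT, effective_to TEXT);
INSERT INTO customer_t2 VALUES (1, 'C100', 'Tina', 'Medium', '2016-01-01', NULL);
UPDATE customer_t2 SET effective_to = '2017-01-01'
    WHERE customerid = 'C100' AND effective_to IS NULL;
INSERT INTO customer_t2 VALUES (2, 'C100', 'Tina', 'High', '2017-01-01', NULL);

-- Type 3: keep "previous" and "current" columns in the same row.
CREATE TABLE customer_t3 (customerkey INTEGER PRIMARY KEY,
                          customerid TEXT, name TEXT,
                          previous_taxbracket TEXT, current_taxbracket TEXT);
INSERT INTO customer_t3 VALUES (1, 'C100', 'Tina', NULL, 'Medium');
UPDATE customer_t3
    SET previous_taxbracket = current_taxbracket,
        current_taxbracket  = 'High'
    WHERE customerid = 'C100';
""")

type1 = conn.execute("SELECT * FROM customer_t1").fetchall()
type2 = conn.execute("SELECT * FROM customer_t2 ORDER BY customerkey").fetchall()
type3 = conn.execute("SELECT * FROM customer_t3").fetchall()
print(type1, type2, type3, sep="\n")
```

In the Type 2 table the same CustomerID appears in two rows, which is only possible because the primary key is the surrogate CustomerKey. Fact rows loaded before the change keep pointing at key 1, and later fact rows point at key 2.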

Snowflakes Dimension tables can be unnormalized. Fewer tables = faster joins = faster queries. Update anomalies are not a concern. Dimension tables are slowly changing. Updates happen rarely if at all. An undesirable snowflake results from unnecessarily normalizing dimension tables.

Snowflakes, cont’d Unnormalized dimension tables.

Snowflakes, cont’d Avoid creating a snowflake!

DW Architecture: Bill Inmon Approach Normalized Data Warehouse

DW Architecture: Ralph Kimball Approach Dimensionally Modeled Data Warehouse

DW Architecture: Independent Data Marts Inferior – Do not use!

Online Transaction Processing (OLTP) Online: The computer responds immediately or very quickly. Online ≠ Internet. OLTP: online transaction processing. Operational database = OLTP system. Update and query operational data. Present data. Generate reports.

Online Analytical Processing (OLAP) Query data from data warehouses and/or data marts to analyze and present data. OLAP tools support decision making. OLAP tools are read only. OLAP operations: drill up and drill down, slice and dice, pivot.

OLAP Drill Up and Drill Down Drill through dimension hierarchies. Examples: Location: country → state → region → city → store. Time: year → quarter → month → week → day → hour. Drill up (AKA roll up): Make the data granularity coarser. Aggregate the data. Drill down: Make the data granularity finer.
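In SQL terms, drilling up or down amounts to changing the GROUP BY level along a dimension hierarchy. The sketch below assumes a hypothetical `calendar_dim` with year, quarter, and month attributes and a `sales_fact` table; the helper `total_by` and the sample data are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar_dim (calendarkey INTEGER PRIMARY KEY,
                           year INTEGER, quarter TEXT, month TEXT);
CREATE TABLE sales_fact (calendarkey INTEGER REFERENCES calendar_dim,
                         dollars_sold INTEGER);
INSERT INTO calendar_dim VALUES (1, 2017, 'Q1', '01'),
                                (2, 2017, 'Q1', '02'),
                                (3, 2017, 'Q2', '04');
INSERT INTO sales_fact VALUES (1, 100), (1, 50), (2, 200), (3, 300);
""")

def total_by(*levels):
    """Sum dollars sold grouped by the given calendar attributes.

    The level names are trusted constants from this script, so it is
    safe to splice them into the SQL text.
    """
    cols = ", ".join(levels)
    sql = f"""
        SELECT {cols}, SUM(f.dollars_sold)
        FROM sales_fact f
        JOIN calendar_dim c ON f.calendarkey = c.calendarkey
        GROUP BY {cols}
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

quarterly = total_by("year", "quarter")            # starting point
monthly = total_by("year", "quarter", "month")     # drill down: finer
yearly = total_by("year")                          # drill up: coarser
print(quarterly, monthly, yearly, sep="\n")
```

Drilling down adds a lower hierarchy level to the grouping, and drilling up removes one, so the totals at each level always add up to the totals at the level above.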

OLAP Drill Up and Drill Down, cont’d http://www.tutorialspoint.com/dwh/dwh_olap.htm

OLAP Drill Up and Drill Down, cont’d http://www.tutorialspoint.com/dwh/dwh_olap.htm

Slice and Dice Slice: Select one value of a dimension attribute. http://www.tutorialspoint.com/dwh/dwh_olap.htm

Slice and Dice Dice: Select attribute values from two or more dimensions. http://www.tutorialspoint.com/dwh/dwh_olap.htm
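Slice and dice both reduce to WHERE predicates on dimension attributes. As a simplification, the sketch below uses one flattened table, `sales_cube`, standing in for a fact table already joined to its dimensions; the table, its columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_cube (city TEXT, category TEXT, quarter TEXT,
                         units_sold INTEGER);
INSERT INTO sales_cube VALUES
    ('Chicago', 'Camping',  'Q1', 10),
    ('Chicago', 'Footwear', 'Q1', 20),
    ('Chicago', 'Camping',  'Q2', 30),
    ('Tampa',   'Camping',  'Q1', 40),
    ('Tampa',   'Footwear', 'Q2', 50),
    ('Boston',  'Footwear', 'Q1', 60);
""")

# Slice: fix one value (Q1) of a single dimension attribute.
slice_rows = conn.execute("""
    SELECT city, category, SUM(units_sold)
    FROM sales_cube
    WHERE quarter = 'Q1'
    GROUP BY city, category
    ORDER BY city, category
""").fetchall()

# Dice: restrict attribute values on two dimensions at once.
dice_rows = conn.execute("""
    SELECT city, quarter, SUM(units_sold)
    FROM sales_cube
    WHERE city IN ('Chicago', 'Tampa') AND category = 'Camping'
    GROUP BY city, quarter
    ORDER BY city, quarter
""").fetchall()
print(slice_rows, dice_rows, sep="\n")
```

The slice keeps the full range of cities and categories for one quarter, while the dice carves out a smaller sub-cube by constraining both location and product.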

Pivot Pivot: Reorganize query results by rotation. http://www.tutorialspoint.com/dwh/dwh_olap.htm
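One common way to express a pivot in plain SQL is conditional aggregation, which turns the values of one attribute into columns. This is a sketch under assumptions: the `sales_by_quarter` table and its data are made up, and OLAP tools may generate different SQL for the same rotation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_by_quarter (city TEXT, quarter TEXT, units_sold INTEGER);
INSERT INTO sales_by_quarter VALUES
    ('Chicago', 'Q1', 30), ('Chicago', 'Q2', 30),
    ('Tampa',   'Q1', 40), ('Tampa',   'Q2', 50);
""")

# Before: one row per (city, quarter) pair.
unpivoted = conn.execute("""
    SELECT city, quarter, SUM(units_sold)
    FROM sales_by_quarter
    GROUP BY city, quarter
    ORDER BY city, quarter
""").fetchall()

# After: one row per city, with the quarters rotated into columns.
pivoted = conn.execute("""
    SELECT city,
           SUM(CASE WHEN quarter = 'Q1' THEN units_sold ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN units_sold ELSE 0 END) AS q2
    FROM sales_by_quarter
    GROUP BY city
    ORDER BY city
""").fetchall()
print(unpivoted, pivoted, sep="\n")
```

The numbers are identical in both results; only their arrangement changes.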

Conformed Dimensions When multiple star schemas share a common set of dimensions, the dimensions are called conformed dimensions. Conformed dimensions enable analyses to span multiple star schemas, where the schemas share a common view of the world. For example, all the schemas must share a common view of what a customer is. Drill across: An OLAP operation that spans multiple star schemas.
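A drill across can be sketched as two fact tables that share one conformed customer dimension. The schema is hypothetical: `sales_fact`, `returns_fact`, and `customer_dim` are invented names, and the returns star is an assumed second subject area. Each fact table is aggregated separately before the join so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both star schemas.
CREATE TABLE customer_dim (customerkey INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact   (customerkey INTEGER, dollars_sold INTEGER);
CREATE TABLE returns_fact (customerkey INTEGER, dollars_returned INTEGER);
INSERT INTO customer_dim VALUES (1, 'Tina'), (2, 'Tony');
INSERT INTO sales_fact   VALUES (1, 100), (1, 50), (2, 200);
INSERT INTO returns_fact VALUES (1, 30);
""")

drill_across = conn.execute("""
    WITH s AS (SELECT customerkey, SUM(dollars_sold) AS sold
               FROM sales_fact GROUP BY customerkey),
         r AS (SELECT customerkey, SUM(dollars_returned) AS returned
               FROM returns_fact GROUP BY customerkey)
    SELECT c.name, s.sold, COALESCE(r.returned, 0)
    FROM customer_dim c
    JOIN s ON s.customerkey = c.customerkey
    LEFT JOIN r ON r.customerkey = c.customerkey
    ORDER BY c.name
""").fetchall()
print(drill_across)
```

The query is meaningful only because both fact tables use the same customer keys, which is what a shared view of "what a customer is" provides.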

OLAP/BI Tools Users can query fact and dimension tables by using simple point-and-click query-building applications. Based on user actions, the tool generates and executes the SQL code on the data warehouse or data mart: SQL code to drill up or down, slice or dice, or pivot. Example OLAP/BI tools: IBM Cognos, Oracle BI, TIBCO Spotfire, Tableau.

OLAP/BI Tools, cont’d Typical OLAP tool layout

OLAP/BI Tool Demos RadarSoft: http://olaponline.radar-soft.com/Demos/HtmlOLAPGrid.aspx Telerik: http://demos.telerik.com/aspnet-ajax/pivotgrid/examples/olap/defaultcs.aspx See also: http://www.softwareadvice.com/bi/?more=true#more

Assignment #4 Team assignment. Perform OLAP operations on your dimensional model from Assignment #3. For each of the following operations, write and execute SQL queries using your sample data: drill up, drill down, slice, dice.

Assignment #4, cont’d For each OLAP operation, show the query, and “before” and “after” query output. Example: Do a query that shows quarterly results. Then for a drill down, do a query that shows monthly results. For a drill up, do a query that aggregates and shows yearly results. Submit a zip file into Canvas containing: Dump of your dimensional model. SQL queries and text files containing the results.