Copyright © 2014, Sira Yongchareon. Department of Computing, Faculty of Creative Industries and Business. Lecturer: Dr. Sira Yongchareon. ISCG 6425 Data Warehousing.

ISCG 6425 Data Warehousing (DW), Week 7: Validating DW Data

ISCG6425 Data Warehousing, Semester 1, 2014 (by Sira Yongchareon), Department of Computing, Faculty of Creative Industries and Business

Outline
- Issues in DW data
- Data validation techniques
  - Validating data in a dimension table
  - Validating data in a fact table

Issues in DW Data
The DW schema differs from the OLTP schema. The DW tables, their PKs and FKs, and their data and data types also differ!
[Figure: normalized OLTP schema vs. DW schema]

Issues in DW Data
You designed the right DW schema. Good! You can check that your dimension and fact tables are OK by confirming that there are:
- No problems with PKs (surrogate keys with auto-increment)
- No problems with FKs (links to dimension tables, especially dimTime)
- No problems with data types
That's GOOD.
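One common way to test the FK point is an "orphan" check: a LEFT JOIN from the fact table to a dimension returns the fact rows whose key has no matching dimension row. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server; the table `factOrders`, its column `OrderDateKey` and the column `dimTime.TimeKey` are hypothetical names chosen for illustration.

```python
import sqlite3


def find_orphan_fact_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Return fact rows whose OrderDateKey has no matching row in dimTime."""
    # Hypothetical schema: factOrders.OrderDateKey should reference dimTime.TimeKey.
    sql = """
        SELECT f.OrderID, f.OrderDateKey
        FROM factOrders f
        LEFT JOIN dimTime t ON f.OrderDateKey = t.TimeKey
        WHERE t.TimeKey IS NULL          -- no dimension row was found
        ORDER BY f.OrderID
    """
    return conn.execute(sql).fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dimTime (
            TimeKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            FullDate TEXT NOT NULL
        );
        CREATE TABLE factOrders (OrderID INTEGER, OrderDateKey INTEGER);
        INSERT INTO dimTime (FullDate) VALUES ('1996-07-04'), ('1996-07-05');
        -- Order 10250 points at TimeKey 99, which does not exist in dimTime.
        INSERT INTO factOrders VALUES (10248, 1), (10249, 2), (10250, 99);
    """)
    return find_orphan_fact_rows(conn)


if __name__ == "__main__":
    print(demo())  # [(10250, 99)]
```

The same pattern applies to every FK in the fact table (customer, product, and so on). An empty result means every key resolves to a dimension row. It does not prove that the row is the *right* one; that needs the value checks that follow.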

Issues in DW Data
You know ETL (Extract-Transform-Load):
[Figure: OLTP databases Northwind1 (for America) and Northwind2 (for Europe) → ETL → OLAP database (for all countries)]
What if your ETL has bugs?

Issues in DW Data
How do you know that your DW contains correct and complete data?
- Correctness: no wrong data (miscalculated, mistyped, etc.)
- Completeness: no missing data
Recall that we have already tried to check DW data by comparing the number of rows (SELECT COUNT(*) …). This is still not enough, because an ETL bug can load the right number of rows with the wrong values.
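Here is a small sketch of why COUNT(*) alone is not enough. SQLite (via Python's `sqlite3`) stands in for SQL Server, and the table contents are hypothetical. Both tables have two rows, so the counts match, but comparing values row by row exposes a wrongly loaded City.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Hypothetical source and DW tables: same row count, one bad value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, City TEXT);
        CREATE TABLE dimCustomers (CustomerKey INTEGER PRIMARY KEY,
                                   CustomerID TEXT, City TEXT);
        INSERT INTO Customers VALUES ('ALFKI', 'Berlin'), ('ANATR', 'Mexico D.F.');
        -- Simulated ETL bug: ANATR's City was loaded wrongly.
        INSERT INTO dimCustomers VALUES (1, 'ALFKI', 'Berlin'), (2, 'ANATR', 'London');
    """)
    return conn


def row_counts_match(conn: sqlite3.Connection) -> bool:
    """The earlier check: compare SELECT COUNT(*) of source and DW."""
    src = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
    dw = conn.execute("SELECT COUNT(*) FROM dimCustomers").fetchone()[0]
    return src == dw


def wrong_values(conn: sqlite3.Connection) -> list[tuple]:
    """Compare values row by row (NULL Cities would need extra handling)."""
    return conn.execute("""
        SELECT dc.CustomerID, dc.City AS dw_city, c.City AS source_city
        FROM dimCustomers dc
        JOIN Customers c ON dc.CustomerID = c.CustomerID
        WHERE dc.City <> c.City
    """).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(row_counts_match(conn))  # True
    print(wrong_values(conn))      # [('ANATR', 'London', 'Mexico D.F.')]
```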

DW Data Validation
Our one simple goal: identify whether any record in the DW is WRONG.
Validation techniques:
- Random checking
  - For example, pick any order and check that it has the correct product, customer, etc.
  - A simple method, normally done manually
  - Errors can be missed, because only a sample of records is checked
- Complete checking
  - Write SQL to do the job (hard work the first time)
  - However, because every record is checked, no error of the kind the SQL tests for will be missed
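The difference between the two techniques can be shown in plain Python. The data below is hypothetical: 100 orders, with one order corrupted by a simulated ETL bug. The complete check compares every key in either side, so it always finds the bad order. The random spot check only sees the orders it happens to sample.

```python
import random


def mismatched_keys(source: dict, dw: dict, keys) -> list:
    """Return the keys whose DW record is missing or differs from the source record."""
    return [k for k in keys if dw.get(k) != source.get(k)]


def random_check(source: dict, dw: dict, sample_size: int, seed: int = 0) -> list:
    """Spot check: compare only a random sample of source orders."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(source), sample_size)
    return mismatched_keys(source, dw, sample)


def complete_check(source: dict, dw: dict) -> list:
    """Compare every order key found in either the source or the DW."""
    return mismatched_keys(source, dw, sorted(set(source) | set(dw)))


def demo() -> tuple[list, list]:
    # Hypothetical data: orders 10248..10347, with one ETL bug on order 10290.
    source = {10248 + i: {"customer": f"C{i:03d}"} for i in range(100)}
    dw = {k: dict(v) for k, v in source.items()}
    dw[10290]["customer"] = "XYZ"
    return complete_check(source, dw), random_check(source, dw, sample_size=5)


if __name__ == "__main__":
    full, spot = demo()
    print(full)  # [10290]
    print(spot)  # [10290] only if the sample happened to include that order
```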

DW Data Validation
We aim for NO ERRORS at all, and we must design SQL to do the job for us!
- Often, the SQL can be complex.
- For example: how do you check that every customer in dimCustomers has a correct CustomerID?
SQL Question: Write an SQL statement to list all customers that have a wrong CustomerID.

DW Data Validation: Dimension Tables
Does each dimension table contain correct and complete data from its source table(s)? This matters especially when a dimension is de-normalized. For example, each product must correctly belong to its category!

DW Data Validation: Dimension Tables
SQL Question: Write an SQL statement to list all customers that have a wrong CustomerID.
This question requires a sub-query.
SQL Answer:

```sql
-- List DW customers whose CustomerID exists in neither OLTP source.
SELECT *
FROM dimCustomers dc
WHERE dc.CustomerID NOT IN (
    SELECT c.CustomerID
    FROM northwind1.dbo.Customers c   -- Northwind1 (America)
    WHERE dc.CustomerID = c.CustomerID
    UNION
    SELECT c.CustomerID
    FROM northwind2.dbo.Customers c   -- Northwind2 (Europe)
    WHERE dc.CustomerID = c.CustomerID
);
```

This query returns NOTHING if all customers have a correct CustomerID.
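To try the slide's query without SQL Server, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite has no `dbo` schema, so each OLTP source is an attached in-memory database, and `northwind1.dbo.Customers` becomes `northwind1.Customers`. The table contents are hypothetical.

```python
import sqlite3

# The slide's validation query, with SQLite two-part names instead of db.dbo.table.
VALIDATE_CUSTOMERS = """
    SELECT *
    FROM dimCustomers dc
    WHERE dc.CustomerID NOT IN (
        SELECT c.CustomerID FROM northwind1.Customers c
        WHERE dc.CustomerID = c.CustomerID
        UNION
        SELECT c.CustomerID FROM northwind2.Customers c
        WHERE dc.CustomerID = c.CustomerID
    )
"""


def build_dw() -> sqlite3.Connection:
    """Hypothetical mini Northwind: two attached OLTP sources plus dimCustomers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS northwind1")
    conn.execute("ATTACH DATABASE ':memory:' AS northwind2")
    conn.executescript("""
        CREATE TABLE northwind1.Customers (CustomerID TEXT PRIMARY KEY);
        CREATE TABLE northwind2.Customers (CustomerID TEXT PRIMARY KEY);
        INSERT INTO northwind1.Customers VALUES ('ALFKI'), ('ANATR');
        INSERT INTO northwind2.Customers VALUES ('BERGS');
        CREATE TABLE dimCustomers (
            CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            CustomerID TEXT NOT NULL                        -- business key from OLTP
        );
        INSERT INTO dimCustomers (CustomerID) VALUES ('ALFKI'), ('ANATR'), ('BERGS');
    """)
    return conn


if __name__ == "__main__":
    conn = build_dw()
    print(conn.execute(VALIDATE_CUSTOMERS).fetchall())  # [] -> every CustomerID is correct
```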

DW Data Validation: Dimension Tables
Let's try it:
1. Deliberately update a customer in the DW with a wrong CustomerID:

```sql
SELECT * FROM dimCustomers WHERE CustomerKey = 1;                  -- see the original CustomerID
UPDATE dimCustomers SET CustomerID = 'XYZ' WHERE CustomerKey = 1;  -- now change it!
```

2. Run the validation query from the previous slide again to check that it detects the error. It should now return that customer (CustomerKey = 1), which has a wrong CustomerID.
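The "break it on purpose, then re-run the check" exercise can also be automated as a small test. This is a sketch only: SQLite stands in for SQL Server, and the two-source schema and customer rows are hypothetical.

```python
import sqlite3


def make_conn() -> sqlite3.Connection:
    """Hypothetical mini Northwind: two attached OLTP sources plus dimCustomers."""
    conn = sqlite3.connect(":memory:")
    for db in ("northwind1", "northwind2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.Customers (CustomerID TEXT PRIMARY KEY)")
    conn.executescript("""
        INSERT INTO northwind1.Customers VALUES ('ALFKI');
        INSERT INTO northwind2.Customers VALUES ('BERGS');
        CREATE TABLE dimCustomers (CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT);
        INSERT INTO dimCustomers VALUES (1, 'ALFKI'), (2, 'BERGS');
    """)
    return conn


def wrong_customers(conn: sqlite3.Connection) -> list[tuple]:
    # Same validation query as on the slide, with SQLite two-part names.
    return conn.execute("""
        SELECT * FROM dimCustomers dc
        WHERE dc.CustomerID NOT IN (
            SELECT c.CustomerID FROM northwind1.Customers c WHERE dc.CustomerID = c.CustomerID
            UNION
            SELECT c.CustomerID FROM northwind2.Customers c WHERE dc.CustomerID = c.CustomerID
        )
    """).fetchall()


def mutation_test() -> tuple[list, list]:
    conn = make_conn()
    before = wrong_customers(conn)  # clean data: no rows
    conn.execute("UPDATE dimCustomers SET CustomerID = 'XYZ' WHERE CustomerKey = 1")
    after = wrong_customers(conn)   # the corrupted customer should now be reported
    return before, after


if __name__ == "__main__":
    print(mutation_test())  # ([], [(1, 'XYZ')])
```

If `after` came back empty, the validation query itself would be the thing with a bug.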

DW Data Validation: Dimension Tables
A sub-query is very important for data validation! You need to understand how to write sub-queries in SELECT statements:
- To check that all data in the DW are correct (correctness)
- To check that no data in the DW are missing (completeness)
Some highly recommended links are on Moodle. Please follow them and do your own study!
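The customer query above tests correctness: every DW customer exists in a source. Completeness needs the opposite direction: every source customer must appear in the DW. The sketch below shows one way to write that check, using a `NOT EXISTS` sub-query in SQLite as a stand-in for SQL Server. The schema and rows are hypothetical.

```python
import sqlite3

# Completeness: every customer in either OLTP source must appear in dimCustomers.
# NOT EXISTS is used instead of NOT IN so that a NULL CustomerID in dimCustomers
# cannot make the whole check silently return no rows.
MISSING_CUSTOMERS = """
    SELECT src.CustomerID, src.SourceDB
    FROM (
        SELECT CustomerID, 'northwind1' AS SourceDB FROM northwind1.Customers
        UNION ALL
        SELECT CustomerID, 'northwind2' AS SourceDB FROM northwind2.Customers
    ) AS src
    WHERE NOT EXISTS (
        SELECT 1 FROM dimCustomers dc WHERE dc.CustomerID = src.CustomerID
    )
    ORDER BY src.CustomerID
"""


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    for db in ("northwind1", "northwind2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.Customers (CustomerID TEXT PRIMARY KEY)")
    conn.executescript("""
        INSERT INTO northwind1.Customers VALUES ('ALFKI'), ('ANATR');
        INSERT INTO northwind2.Customers VALUES ('BERGS');
        CREATE TABLE dimCustomers (CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT);
        -- Simulated ETL bug: ANATR was never loaded.
        INSERT INTO dimCustomers VALUES (1, 'ALFKI'), (2, 'BERGS');
    """)
    return conn.execute(MISSING_CUSTOMERS).fetchall()


if __name__ == "__main__":
    print(demo())  # [('ANATR', 'northwind1')]
```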

DW Data Validation: Dimension Tables
Now try a more difficult one: how do you check that every product in dimProducts has the correct category?
SQL Question: Write an SQL statement to list all products that belong to the wrong category.

DW Data Validation: Dimension Tables
This question requires a sub-query plus a JOIN.
SQL Answer:

```sql
-- List DW products whose CategoryName does not match the source category.
SELECT *
FROM dimProducts dp
WHERE dp.CategoryName NOT IN (
    SELECT c.CategoryName
    FROM northwind1.dbo.Products p, northwind1.dbo.Categories c
    WHERE p.CategoryID = c.CategoryID
      AND dp.ProductID = p.ProductID
    UNION
    SELECT c.CategoryName
    FROM northwind2.dbo.Products p, northwind2.dbo.Categories c
    WHERE p.CategoryID = c.CategoryID
      AND dp.ProductID = p.ProductID
);
```

This query returns NOTHING if all products have the correct CategoryName.
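A runnable sketch of the product-category check follows. SQLite (via Python's `sqlite3`) stands in for SQL Server, the product and category rows are hypothetical, and the comma join plus WHERE on the slide is written as the equivalent explicit `JOIN ... ON`.

```python
import sqlite3

VALIDATE_PRODUCT_CATEGORIES = """
    SELECT *
    FROM dimProducts dp
    WHERE dp.CategoryName NOT IN (
        SELECT c.CategoryName
        FROM northwind1.Products p
        JOIN northwind1.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
        UNION
        SELECT c.CategoryName
        FROM northwind2.Products p
        JOIN northwind2.Categories c ON p.CategoryID = c.CategoryID
        WHERE dp.ProductID = p.ProductID
    )
"""


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    for db in ("northwind1", "northwind2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
        conn.execute(f"CREATE TABLE {db}.Products (ProductID INTEGER PRIMARY KEY, CategoryID INTEGER)")
    conn.executescript("""
        INSERT INTO northwind1.Categories VALUES (1, 'Beverages'), (2, 'Condiments');
        INSERT INTO northwind1.Products VALUES (1, 1), (3, 2);
        INSERT INTO northwind2.Categories VALUES (1, 'Beverages');
        INSERT INTO northwind2.Products VALUES (2, 1);
        -- De-normalized dimension: CategoryName is copied into each product row.
        CREATE TABLE dimProducts (ProductKey INTEGER PRIMARY KEY, ProductID INTEGER,
                                  CategoryName TEXT);
        -- Simulated ETL bug: product 3 should be 'Condiments'.
        INSERT INTO dimProducts VALUES (1, 1, 'Beverages'), (2, 2, 'Beverages'),
                                       (3, 3, 'Beverages');
    """)
    return conn.execute(VALIDATE_PRODUCT_CATEGORIES).fetchall()


if __name__ == "__main__":
    print(demo())  # [(3, 3, 'Beverages')]
```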

DW Data Validation: Dimension Tables
Let's try it: update any product in the DW with a wrong CategoryName, then run the validation query from the previous slide again to check that it detects the error. It should return every product that has a wrong CategoryName.

DW Data Validation: Fact Tables
A fact table is always de-normalized, so you can expect the validation SQL to be complicated.
- A fact table holds a huge amount of data that comes from a super "BIG JOIN", loaded with the MERGE command.
- Think of it as being asked to find a missing flight in the Indian Ocean!

DW Data Validation: Fact Tables
Does the fact table contain correct and complete data from its source tables?
1. Correct FKs and PK: each order must have the correct Product, Customer, OrderDate and RequiredDate.
2. Correct attribute data: each order must have the correct OrderID, UnitPrice, Qty and Discount, and the correct TotalPrice.

DW Data Validation: Fact Tables (1)
Validating all FKs and the PK: validate the fact data against its original data in OLTP. For example:
- Customers: check that every order belongs to its correct customer
- Products: check that every order contains the correct product
- Time: check that every order has the correct OrderDate and RequiredDate
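The customer check is sketched below, with SQLite standing in for SQL Server. This sketch assumes a hypothetical `factOrders` table that stores the surrogate `CustomerKey`. The query translates that key back to the business `CustomerID` through `dimCustomers` and compares it with the OLTP `Orders` row. LEFT JOINs are used so that fact rows with a dangling key, or with no matching OLTP order, are reported too. Products and dates would follow the same pattern through their own dimensions.

```python
import sqlite3

WRONG_ORDER_CUSTOMERS = """
    SELECT f.OrderID, dc.CustomerID AS dw_customer, o.CustomerID AS oltp_customer
    FROM factOrders f
    LEFT JOIN dimCustomers dc ON f.CustomerKey = dc.CustomerKey
    LEFT JOIN (
        SELECT OrderID, CustomerID FROM northwind1.Orders
        UNION ALL
        SELECT OrderID, CustomerID FROM northwind2.Orders
    ) AS o ON f.OrderID = o.OrderID
    WHERE dc.CustomerID IS NULL          -- CustomerKey points at no customer
       OR o.CustomerID IS NULL           -- order not found in either OLTP source
       OR dc.CustomerID <> o.CustomerID  -- order linked to the wrong customer
    ORDER BY f.OrderID
"""


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    for db in ("northwind1", "northwind2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(f"CREATE TABLE {db}.Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT)")
    conn.executescript("""
        INSERT INTO northwind1.Orders VALUES (10248, 'ALFKI'), (10249, 'ANATR');
        INSERT INTO northwind2.Orders VALUES (10250, 'BERGS');
        CREATE TABLE dimCustomers (CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT);
        INSERT INTO dimCustomers VALUES (1, 'ALFKI'), (2, 'ANATR'), (3, 'BERGS');
        CREATE TABLE factOrders (OrderID INTEGER, CustomerKey INTEGER);
        -- Simulated ETL bug: order 10250 was linked to CustomerKey 1 (ALFKI).
        INSERT INTO factOrders VALUES (10248, 1), (10249, 2), (10250, 1);
    """)
    return conn.execute(WRONG_ORDER_CUSTOMERS).fetchall()


if __name__ == "__main__":
    print(demo())  # [(10250, 'ALFKI', 'BERGS')]
```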

DW Data Validation: Fact Tables (2)
Validating attribute data: validate each attribute (including pre-calculated attributes):
- OrderID: from the Orders and [Order Details] tables
- UnitPrice: from the [Order Details] table
- Qty: from the [Order Details] table
- Discount: from the [Order Details] table
- TotalPrice: Quantity × UnitPrice, from the [Order Details] table
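A sketch of the attribute check, again with SQLite standing in for SQL Server. It assumes hypothetical fact columns `OrderID`, `ProductID`, `UnitPrice`, `Qty`, `Discount` and `TotalPrice`. Each fact row is matched to its [Order Details] line in either source. UnitPrice, Qty and Discount are compared directly. TotalPrice is recomputed as Quantity × UnitPrice, as on the slide, with a small tolerance so that floating-point rounding is not reported as an error.

```python
import sqlite3

WRONG_ATTRIBUTES = """
    SELECT f.OrderID, f.ProductID
    FROM factOrders f
    JOIN (
        SELECT OrderID, ProductID, UnitPrice, Quantity, Discount
        FROM northwind1.[Order Details]
        UNION ALL
        SELECT OrderID, ProductID, UnitPrice, Quantity, Discount
        FROM northwind2.[Order Details]
    ) AS od ON f.OrderID = od.OrderID AND f.ProductID = od.ProductID
    WHERE f.UnitPrice <> od.UnitPrice
       OR f.Qty <> od.Quantity
       OR f.Discount <> od.Discount
       -- Pre-calculated attribute: TotalPrice = Quantity x UnitPrice.
       OR ABS(f.TotalPrice - od.Quantity * od.UnitPrice) > 0.005
    ORDER BY f.OrderID, f.ProductID
"""


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    for db in ("northwind1", "northwind2"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {db}")
        conn.execute(
            f"CREATE TABLE {db}.[Order Details] (OrderID INTEGER, ProductID INTEGER, "
            "UnitPrice REAL, Quantity INTEGER, Discount REAL)"
        )
    conn.executescript("""
        INSERT INTO northwind1.[Order Details] VALUES (10248, 11, 14.0, 12, 0.0),
                                                      (10248, 42, 9.8, 10, 0.0);
        INSERT INTO northwind2.[Order Details] VALUES (10250, 41, 7.7, 10, 0.0);
        CREATE TABLE factOrders (OrderID INTEGER, ProductID INTEGER, UnitPrice REAL,
                                 Qty INTEGER, Discount REAL, TotalPrice REAL);
        -- Simulated ETL bug: the last row's TotalPrice was not multiplied by Qty.
        INSERT INTO factOrders VALUES (10248, 11, 14.0, 12, 0.0, 168.0),
                                      (10248, 42, 9.8, 10, 0.0, 98.0),
                                      (10250, 41, 7.7, 10, 0.0, 7.7);
    """)
    return conn.execute(WRONG_ATTRIBUTES).fetchall()


if __name__ == "__main__":
    print(demo())  # [(10250, 41)]
```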

DW Data Validation: Summary
- Validating data in each dimension table
- Validating data in a fact table
Let's try it now. Have fun!