Sayed Ahmed: Logical Design of a Data Warehouse


• Free Training and Educational Services
• Training and Education in Bangla: Bangla.SaLearningSchool.com
• Training and Education in English: English.SaLearningSchool.com
• Ask a question and get answers: Ask.JustEtc.net

• Design a Data Warehouse
  • Star Schema
  • Snowflake Schema
  • Dimension Tables
  • Fact Tables
  • Auditing
  • Surrogate Keys
  • Type 1, Type 2, Type 3, and mixed solutions for slowly changing dimension data (SCD management)
  • Pivoting for Analysis
  • To help with SSAS on the data warehouse

• Design a Data Warehouse
  • Additive measures
  • Semi-additive measures
  • Hierarchies for dimensions
  • Attributes in dimensions
  • Attributes in lookup tables
• Long-term data warehouse design
  • Usually Star Schema
• Short-term data warehouse design
  • POC
  • Usually Snowflake Schema

• Fact Tables
  • measures
  • foreign keys
  • and possibly an additional primary key
  • and lineage columns
  • granularity of fact tables
  • auditing and lineage needs
• Measures can be
  • additive
  • non-additive
  • semi-additive
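To make the difference between additive and semi-additive measures concrete, here is a minimal sketch in plain Python. The account-balance example, the row layout, and the names `rows`, `total_deposits`, and `closing_balance` are assumptions made for illustration; they do not come from the slides or from AdventureWorks.

```python
# Hypothetical fact rows: (date, account, deposit_amount, end_of_day_balance)
rows = [
    ("2012-01-01", "A", 100, 1100),
    ("2012-01-02", "A", 50, 1150),
    ("2012-01-01", "B", 200, 700),
    ("2012-01-02", "B", 0, 700),
]

def total_deposits(rows):
    """Additive measure: deposits can be summed across every dimension."""
    return sum(r[2] for r in rows)

def closing_balance(rows):
    """Semi-additive measure: balances add up across accounts but not across
    dates, so keep the latest balance per account and sum those."""
    latest = {}
    for _date, account, _deposit, balance in sorted(rows):
        latest[account] = balance  # rows are date-ordered, later dates win
    return sum(latest.values())

print(total_deposits(rows))
print(closing_balance(rows))
```

Summing the balance column over all four rows would count each account twice, which is why a balance is usually treated as semi-additive over the date dimension.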

• dimension
  • keys
  • names
  • attributes
  • member properties
  • translations
  • and lineage

• attributes
• natural hierarchies
• many-to-many fact table relationships
  • you can introduce an additional intermediate dimension
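One common reading of the "intermediate dimension" for many-to-many relationships is a bridge table that sits between the fact table and the dimension. The sketch below uses Python's built-in `sqlite3` module rather than SQL Server, and every table and column name (`fact_sales`, `dim_reason`, `bridge_sale_reason`) is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE dim_reason (reason_key INTEGER PRIMARY KEY, reason_name TEXT);
-- Intermediate (bridge) table: one row per (sale, reason) pair.
CREATE TABLE bridge_sale_reason (
    sale_id INTEGER REFERENCES fact_sales(sale_id),
    reason_key INTEGER REFERENCES dim_reason(reason_key),
    PRIMARY KEY (sale_id, reason_key)
);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0);
INSERT INTO dim_reason VALUES (1, 'Price'), (2, 'Promotion');
-- Sale 1 has two reasons, sale 2 has one.
INSERT INTO bridge_sale_reason VALUES (1, 1), (1, 2), (2, 1);
""")

by_reason = conn.execute("""
    SELECT r.reason_name, SUM(f.amount)
    FROM fact_sales f
    JOIN bridge_sale_reason b ON b.sale_id = f.sale_id
    JOIN dim_reason r ON r.reason_key = b.reason_key
    GROUP BY r.reason_name
    ORDER BY r.reason_name
""").fetchall()
print(by_reason)
```

Note that a sale with two reasons contributes to both groups, so the per-reason totals should not be added together to get overall sales.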

• Not much, right?
• However, it will be great if you understand all the terms and can implement all these concepts in your data warehouse
• You will not necessarily need to use all of these concepts; however, you may need to justify, based on the situation, whether all or any of them will help
• What will help and what will not help
• Check our subsequent videos and tutorials

• Any concerns?
• Or comment below...

• Download the Adventure Works databases
  • OLTP database (LOB database)
  • Data warehouse database
  • From
• For this tutorial, you can just check our slides
  • Though the following tools will help
  • And you should probably check the details in the downloaded databases, especially AdventureWorksDW2012
• You will need help from SQL Server and the SQL Server Management Studio tools

• Useful/Required SQL Server Components
  • Database Engine Services
  • Documentation Components
  • Management Tools - Basic
  • Management Tools - Complete
  • SQL Server Data Tools

• Data Warehouse Logical Design
• Topics: Design and Implement a Data Warehouse
  • Design and implement dimensions
  • Design and implement fact tables
  • Design auditing
    • Track the source and time of data coming into a DW through auditing, i.e. lineage information
• Why a Data Warehouse?
  • It is hard to
    • generate reports from an OLTP/LOB/transactional database
    • do analysis on OLTP database data (sometimes)
    • get useful information, summarized or detailed, to be used for business decisions

• Why a Data Warehouse?
  • Data in OLTP databases are heavily normalized. The goal is to keep each piece of data in one single place, to reduce redundancy and improve consistency
  • You may end up with many tables: 100s or 1000s
  • To generate reports you may need to join many tables, which will be slow
  • Historical data may not be there
  • Data quality is also an issue
  • For reporting or analysis, you may need data from multiple databases across many departments

• So you can create a Data Warehouse
  • By cleaning data
  • With historical data
  • Combining data from multiple sources
  • Denormalizing data
  • Using a specific design geared towards data warehouses
• Some or many consider DW design less complex than relational database design
  • Though, even according to them, it also has some complex areas to address

• Usually two schemas are used for a DW
  • Star Schema -> looks like a star
  • Snowflake Schema
• Another one is called the Dimensional Model
  • Includes both Star and Snowflake in the same Data Warehouse
• Both schemas have tables of two types
  • Dimension Tables
  • Fact Tables

• Fact tables are in the center
  • A fact table joins/combines all the data required for this reporting, or for the business aspect of this reporting
  • It usually combines the primary keys of the different tables that contain data for this report/business aspect
• Dimension tables are all the other tables; they contain the actual data
  • These can be the actual tables in the OLTP database without any modification (Snowflake)
  • Or dimension tables can be newly created by denormalizing the existing OLTP database tables (Star)

• So, now you know what dimension tables and fact tables are
  • Fact tables contain the primary keys of all related tables (here they are foreign keys)
  • Dimension tables contain data
• Usually, it is better to keep your data warehouse separate from your OLTP database
  • So bring all the (dimension) tables here
  • Or denormalize them and bring them here, into the new database

• If you just create fact tables and take all the related tables from your OLTP/LOB databases
  • You get a Snowflake Schema
  • Here all dimension tables are still normalized (as you just took them from the actual database)
• This is easy
  • So it is good for a short-term, quick, and experimental data warehouse
• One note: your reporting and analysis services queries (MDX, DMV) will be slow with Snowflake Schemas

• Now, when you denormalize the dimension tables
  • You get the star schema
  • The fact tables remain the same, for example
• The Star Schema is kind of a standard and is used a lot
  • It was originally developed in the 1980s
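The denormalization step described above can be sketched as follows, using Python's built-in `sqlite3` module instead of SQL Server. The normalized `product`, `subcategory`, and `category` tables and the resulting `dim_product` table are hypothetical names chosen for this example, not the AdventureWorks schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflake-style) source tables.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE subcategory (
    subcategory_id INTEGER PRIMARY KEY,
    subcategory_name TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    subcategory_id INTEGER REFERENCES subcategory(subcategory_id)
);
INSERT INTO category VALUES (1, 'Bikes');
INSERT INTO subcategory VALUES (10, 'Road Bikes', 1), (11, 'Mountain Bikes', 1);
INSERT INTO product VALUES (100, 'Road-150', 10), (101, 'Mountain-200', 11);

-- Denormalized (star-style) dimension: one wide row per product.
CREATE TABLE dim_product AS
SELECT p.product_id, p.product_name, s.subcategory_name, c.category_name
FROM product p
JOIN subcategory s ON s.subcategory_id = p.subcategory_id
JOIN category c ON c.category_id = s.category_id;
""")

dim_rows = conn.execute(
    "SELECT product_id, product_name, subcategory_name, category_name "
    "FROM dim_product ORDER BY product_id"
).fetchall()
print(dim_rows)
```

A fact table keyed on `product_id` can now reach the category with a single join to `dim_product`, instead of three joins through the normalized tables.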

Sales amount for internet sales by different countries and historical years
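A query of this kind against a star schema might look like the sketch below. It uses Python's built-in `sqlite3` module with a tiny made-up data set; the table and column names (`fact_internet_sales`, `dim_date`, `dim_geography`) are simplified assumptions and are not the exact AdventureWorksDW2012 definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE dim_geography (geography_key INTEGER PRIMARY KEY, country TEXT);
-- The fact table holds the measure plus foreign keys to each dimension.
CREATE TABLE fact_internet_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    geography_key INTEGER REFERENCES dim_geography(geography_key),
    sales_amount REAL
);
INSERT INTO dim_date VALUES (20110105, 2011), (20120310, 2012);
INSERT INTO dim_geography VALUES (1, 'Canada'), (2, 'France');
INSERT INTO fact_internet_sales VALUES
    (20110105, 1, 100.0), (20110105, 2, 50.0),
    (20120310, 1, 70.0), (20120310, 1, 30.0);
""")

sales_by_country_year = conn.execute("""
    SELECT g.country, d.calendar_year, SUM(f.sales_amount)
    FROM fact_internet_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_geography g ON g.geography_key = f.geography_key
    GROUP BY g.country, d.calendar_year
    ORDER BY g.country, d.calendar_year
""").fetchall()
print(sales_by_country_year)
```

Each dimension is one join away from the fact table, which is the property that gives the star schema its shape.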

• Issues that I did not mention before
• Even if your OLTP database was well designed (?)
  • It may be hard to find the tables related to the reporting
  • The table names and column names can be tricky: they may not follow any conventions or may not have meaning
  • So it can be hard to find data for the reporting

• Note: Reality:
  • The OLTP database may not even be well designed, including its relationships and normalization (that makes reporting hard sometimes)
  • Here we assumed that the OLTP database is perfect
• In a project long ago
  • I had to rewrite/verify/check/change/optimize (whatever you call it) close to 100 queries for a reporting system
  • Had to change the interface from one button per report (easy to get lost)
  • Into a drop-down list of reports
  • The relations among data were arbitrary: they existed only in the mind of the designer and did not follow any standards, with no ER model and no standard concepts
  • So it was a hard job

• In such cases
  • Tools such as SQL Profiler might help
  • You could create a test environment,
  • try to insert some data through an LOB application,
  • and have SQL Profiler identify where the data was inserted
• Another issue with this particular example
  • No lookup for dates and years
    • You need to extract them
  • The tables may not even contain historical data
    • No date field
    • So no historical data
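When the source has no lookup for dates and years, one option is to generate a date dimension yourself. The sketch below is a plain-Python illustration; the function name `build_date_dimension` and the column names are hypothetical, and a real date dimension would usually carry many more attributes.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # integer key, e.g. 20120130
            "full_date": current.isoformat(),
            "calendar_year": current.year,
            "calendar_quarter": (current.month - 1) // 3 + 1,
            "month_number": current.month,
        })
        current += timedelta(days=1)
    return rows

dates = build_date_dimension(date(2012, 1, 30), date(2012, 2, 1))
for row in dates:
    print(row)
```

Fact rows can then store `date_key` as a foreign key, so reports can group by year or quarter without parsing dates at query time.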

• If sales data reside in multiple databases, even across multiple departments
  • How do you merge them?
  • Identify and match
  • Customer data can be in different databases with no common identification
• Data quality can be low
  • Data missing
  • Partial data
  • Inconsistent data in multiple databases
• Data can be represented differently in different databases
  • M or F for gender
  • 1 or 0 for gender
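The gender example above can be handled during ETL by mapping each source's codes to one shared representation. This is a minimal sketch: the source-system names `crm` and `billing`, and the assumption that 1 means male and 0 means female, are invented for illustration and would have to be confirmed for each real source.

```python
# Hypothetical per-source code tables; confirm the meaning of each code
# with the owners of the source system before relying on it.
GENDER_MAPS = {
    "crm": {"M": "Male", "F": "Female"},
    "billing": {"1": "Male", "0": "Female"},
}

def standardize_gender(source, raw_value):
    """Map a source-specific gender code to one shared representation."""
    if raw_value is None:
        return "Unknown"  # missing data stays visible instead of being guessed
    key = str(raw_value).strip().upper()
    return GENDER_MAPS.get(source, {}).get(key, "Unknown")

print(standardize_gender("crm", "f"))
print(standardize_gender("billing", 1))
print(standardize_gender("billing", None))
```

Routing unrecognized or missing values to an explicit "Unknown" member keeps data-quality problems countable in reports.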

• You saw one Star Schema for Internet Sales
  • You can see another for Offline Sales
  • Another for Accounting
• Your DW has many such star schemas
  • And these star schemas need to be connected/related
  • They will be connected when you use the same dimensions for them
  • i.e. if two star schemas have the same dimension, they can share that dimension
  • Called: shared or conformed dimensions
• For SSAS, you can use shared dimensions only
• There is a concept of private dimensions
  • Not a great idea in practical, real-life applications
  • You cannot connect/compare/verify data across star schemas through a private dimension
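To show how a shared (conformed) dimension connects two star schemas, the sketch below compares internet and offline sales by year through a single `dim_date` table. It uses Python's built-in `sqlite3` module; the table names and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One date dimension shared by both fact tables.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
CREATE TABLE fact_internet_sales (
    date_key INTEGER REFERENCES dim_date(date_key), sales_amount REAL);
CREATE TABLE fact_offline_sales (
    date_key INTEGER REFERENCES dim_date(date_key), sales_amount REAL);
INSERT INTO dim_date VALUES (20110105, 2011), (20120310, 2012);
INSERT INTO fact_internet_sales VALUES (20110105, 100.0), (20120310, 60.0);
INSERT INTO fact_offline_sales VALUES
    (20110105, 500.0), (20120310, 300.0), (20120310, 200.0);
""")

# Aggregate each star separately, then line the results up on the shared year.
comparison = conn.execute("""
    SELECT i.calendar_year, i.total, o.total
    FROM (SELECT d.calendar_year, SUM(f.sales_amount) AS total
          FROM fact_internet_sales f
          JOIN dim_date d ON d.date_key = f.date_key
          GROUP BY d.calendar_year) i
    JOIN (SELECT d.calendar_year, SUM(f.sales_amount) AS total
          FROM fact_offline_sales f
          JOIN dim_date d ON d.date_key = f.date_key
          GROUP BY d.calendar_year) o
      ON o.calendar_year = i.calendar_year
    ORDER BY i.calendar_year
""").fetchall()
print(comparison)
```

Aggregating each fact table before joining avoids multiplying rows, which would happen if the two fact tables were joined to each other directly.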

• Everything can be normalized
• Or only the first level can be normalized while the others are not

In the Star Schema, you could use this normalized product table to get a (partial) snowflake schema. You could use normalized versions of all dimensions to get a full snowflake.

• With Snowflake, in reality you may see partial rather than full snowflakes
• Though, in reality, it is better to go for the star schema
  • Queries will be faster

• Dimensionality of a star schema: the number of dimension tables connected to a fact table
• Cube = 3 dimensions
• SSAS operates on/analyzes cubes

• I will be very short on this
• In a data warehouse, you may want some auditing tables
• For every update, you should audit
  • who made the update,
  • when it was made,
  • and how many rows were transferred
  • to each dimension and fact table in your DW

• You will need additional fields/columns in your dimension and fact tables to track
  • when, by whom, and from where the row data was updated
• Your ETL process needs to be updated
  • If you used SSIS for the ETL
  • Modify the SSIS packages so that you can record this information
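The auditing and lineage ideas above can be sketched outside SSIS with Python's built-in `sqlite3` module. This is not an SSIS package: the `audit_load` table, the lineage columns on `dim_customer`, and the `load_customers` function are hypothetical names used only to show which information gets recorded.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Audit table: one row per load, recording who, when, and how many rows.
CREATE TABLE audit_load (
    load_id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT,
    loaded_by TEXT,
    loaded_at TEXT,
    row_count INTEGER
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_name TEXT,
    source_system TEXT,                              -- lineage: where the row came from
    load_id INTEGER REFERENCES audit_load(load_id)   -- lineage: which load wrote it
);
""")

def load_customers(conn, rows, source_system, loaded_by):
    """Insert rows into dim_customer and record the load in audit_load."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO audit_load (table_name, loaded_by, loaded_at, row_count) "
        "VALUES (?, ?, ?, ?)",
        ("dim_customer", loaded_by, loaded_at, len(rows)),
    )
    load_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?, ?)",
        [(key, name, source_system, load_id) for key, name in rows],
    )
    conn.commit()
    return load_id

load_id = load_customers(conn, [(1, "Alice"), (2, "Bob")], "crm", "etl_user")
print(conn.execute("SELECT * FROM audit_load").fetchall())
```

Storing `load_id` on every dimension row lets you trace any row back to the load that wrote it, and from there to who ran the load and when.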
