Designing a Data Warehousing System

Overview
- Business Analysis Process
- Data Warehousing System
- Modeling a Data Warehouse
- Choosing the Grain
- Establishing Dimensions
- Establishing a Fact Table
- Implementing a Star Schema

Business Analysis Process
- Identifying Business Drivers and Objectives
- Gathering and Analyzing Information
- Establishing a Conceptual Data Model
- Identifying a Business Process
- Identifying Sources and Performing Transformations
- Establishing Duration

Data Warehousing System
[Diagram] Data from operational (OLTP) systems such as Sales, Purchasing, Production, and Accounting is consolidated as enterprise data in the data warehouse, which in turn supplies data marts in the OLAP environment.

Comparing Database Modeling Environments
Operational: OLTP
- Defines Entities That Are Fully Normalized
- Follows Third Normal Form or Greater
- Produces a Complex Database Design
- Stores Data at the Lowest Level of Transactional Detail
- Increases the Number of Joined Tables in Queries
- Is Typically Static
Analytical: Data Warehouse
- Defines Entities That Are Denormalized
- Produces a Simple Database Design That Is More Easily Understood by Users
- Stores Data at Both the Transactional Level and a Summarized Level
- Decreases the Number of Joined Tables in Queries
- Is Dynamic

Modeling a Data Warehouse
- Data Warehouse Modeling Components
- Using a Star Schema
- Components of a Star Schema
- Using a Snowflake Schema
- Choosing a Schema

Data Warehouse Modeling Components
[Diagram] Dimension tables (Geographic, Product, Time) surround a fact table; the fact table holds the measures, or facts, such as units and dollar amounts.

Using a Star Schema
[Diagram] A central fact table, Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...), joins directly to five dimension tables:
- Time_Dim (TimeKey, TheDate, ...)
- Employee_Dim (EmployeeKey, EmployeeID, ...)
- Product_Dim (ProductKey, ProductID, ...)
- Customer_Dim (CustomerKey, CustomerID, ...)
- Shipper_Dim (ShipperKey, ShipperID, ...)

Components of a Star Schema
- Dimension tables: Employee_Dim, Time_Dim, Product_Dim, Customer_Dim, and Shipper_Dim each have a single key column (EmployeeKey, TimeKey, ProductKey, CustomerKey, ShipperKey) plus descriptive attributes.
- Dimensional keys: Sales_Fact stores each of these keys; together they form the fact table's multipart key.
- Measures: the remaining fact table columns (RequiredDate, ...).
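To make the star layout concrete, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the slides themselves target a SQL Server-style environment). Table and key names follow the slides; the Quantity and Amount measure columns and all sample rows are hypothetical. It shows the multipart key built from the five dimensional keys and a query that reaches each dimension with a single join.

```python
import sqlite3

# Hypothetical, trimmed-down version of the Sales_Fact star schema.
# Quantity and Amount are assumed measure columns for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time_Dim     (TimeKey INTEGER PRIMARY KEY, TheDate TEXT NOT NULL);
CREATE TABLE Employee_Dim (EmployeeKey INTEGER PRIMARY KEY, EmployeeID TEXT NOT NULL);
CREATE TABLE Product_Dim  (ProductKey INTEGER PRIMARY KEY, ProductID TEXT NOT NULL);
CREATE TABLE Customer_Dim (CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT NOT NULL);
CREATE TABLE Shipper_Dim  (ShipperKey INTEGER PRIMARY KEY, ShipperID TEXT NOT NULL);

-- The fact table's multipart key is made up of the five dimensional keys.
CREATE TABLE Sales_Fact (
    TimeKey     INTEGER NOT NULL REFERENCES Time_Dim,
    EmployeeKey INTEGER NOT NULL REFERENCES Employee_Dim,
    ProductKey  INTEGER NOT NULL REFERENCES Product_Dim,
    CustomerKey INTEGER NOT NULL REFERENCES Customer_Dim,
    ShipperKey  INTEGER NOT NULL REFERENCES Shipper_Dim,
    Quantity    INTEGER NOT NULL,
    Amount      REAL    NOT NULL,
    PRIMARY KEY (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey)
);
""")

# Made-up sample rows: two dates, two products, one of everything else.
conn.executemany("INSERT INTO Time_Dim VALUES (?, ?)", [(1, "2024-01-01"), (2, "2024-01-02")])
conn.executemany("INSERT INTO Product_Dim VALUES (?, ?)", [(1, "P1"), (2, "P2")])
conn.execute("INSERT INTO Employee_Dim VALUES (1, 'E1')")
conn.execute("INSERT INTO Customer_Dim VALUES (1, 'C1')")
conn.execute("INSERT INTO Shipper_Dim VALUES (1, 'S1')")
conn.executemany(
    "INSERT INTO Sales_Fact VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 1, 2, 20.0), (1, 1, 2, 1, 1, 1, 5.0), (2, 1, 1, 1, 1, 3, 30.0)],
)

# Star join: each dimension is exactly one join away from the fact table.
sales_by_date_product = conn.execute("""
    SELECT t.TheDate, p.ProductID, SUM(f.Amount)
    FROM Sales_Fact AS f
    JOIN Time_Dim    AS t ON t.TimeKey = f.TimeKey
    JOIN Product_Dim AS p ON p.ProductKey = f.ProductKey
    GROUP BY t.TheDate, p.ProductID
    ORDER BY t.TheDate, p.ProductID
""").fetchall()
print(sales_by_date_product)
# [('2024-01-01', 'P1', 20.0), ('2024-01-01', 'P2', 5.0), ('2024-01-02', 'P1', 30.0)]
```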

Using a Snowflake Schema
[Diagram] The fact table Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, RequiredDate, ...) joins to the primary dimension table Product_Dim (ProductKey, Product Name, Product Size, Product Brand ID). Product_Dim is in turn normalized into secondary dimension tables:
- Product_Brand_Id (Product Brand, Product Category ID)
- Product_Category_Id (Product Category, Product Category ID)
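A minimal sqlite3 sketch of the snowflaked Product dimension, assuming hypothetical column names (ProductBrandID, ProductCategoryID) and sample data; the table names follow the slide. It illustrates the extra query complexity of a snowflake: the category attribute is two joins away from Product_Dim instead of being a column on it.

```python
import sqlite3

# Snowflaked Product dimension: Product_Dim -> brand table -> category table.
# Column names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Category_Id (
    ProductCategoryID INTEGER PRIMARY KEY,
    ProductCategory   TEXT NOT NULL
);
CREATE TABLE Product_Brand_Id (
    ProductBrandID    INTEGER PRIMARY KEY,
    ProductBrand      TEXT NOT NULL,
    ProductCategoryID INTEGER NOT NULL REFERENCES Product_Category_Id
);
CREATE TABLE Product_Dim (
    ProductKey     INTEGER PRIMARY KEY,
    ProductName    TEXT NOT NULL,
    ProductSize    TEXT,
    ProductBrandID INTEGER NOT NULL REFERENCES Product_Brand_Id
);
INSERT INTO Product_Category_Id VALUES (1, 'Beverages');
INSERT INTO Product_Brand_Id VALUES (10, 'BrandA', 1);
INSERT INTO Product_Dim VALUES (100, 'Cola', '330 ml', 10);
""")

# In a star schema the category would be a column of Product_Dim;
# in the snowflake it takes two extra joins to reach it.
category = conn.execute("""
    SELECT c.ProductCategory
    FROM Product_Dim AS p
    JOIN Product_Brand_Id    AS b ON b.ProductBrandID = p.ProductBrandID
    JOIN Product_Category_Id AS c ON c.ProductCategoryID = b.ProductCategoryID
    WHERE p.ProductKey = 100
""").fetchone()[0]
print(category)  # Beverages
```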

Choosing a Schema

                             Star       Snowflake
Model Understandability      Easier     More Difficult
Number of Tables             Fewer      More
Query Complexity             Simpler    More Complex
Query Performance            Quicker    Slower

Choosing the Grain
- Determining Data Requirements
- Choosing the Lowest Level of Detail:
  - Requires more disk space
  - Involves more processing time
  - Provides detailed data analysis capability
- Conforming Measures to the Stated Grain
- Design Considerations
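The trade-off in choosing the lowest level of detail can be sketched in plain Python: transaction-grain rows (hypothetical data) can be rolled up to any coarser grain, but not the other way round. The roll_up helper is illustrative, not part of any library.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (date, store, amount).
transactions = [
    ("2024-01-01", "S1", 10.0),
    ("2024-01-01", "S1", 15.0),
    ("2024-01-01", "S2", 7.5),
    ("2024-01-02", "S1", 20.0),
]

def roll_up(rows, key):
    """Aggregate amounts to a coarser grain defined by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

daily_by_store = roll_up(transactions, key=lambda r: (r[0], r[1]))
daily = roll_up(transactions, key=lambda r: r[0])
print(daily_by_store)
# {('2024-01-01', 'S1'): 25.0, ('2024-01-01', 'S2'): 7.5, ('2024-01-02', 'S1'): 20.0}
print(daily)  # {'2024-01-01': 32.5, '2024-01-02': 20.0}
```

From daily alone the per-store split cannot be recovered, which is why a coarse grain limits later analysis even though it saves space and processing time.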

Establishing Dimensions
- Defining Dimension Characteristics
- Identifying Dimension Hierarchies
- Defining Conventional Dimensions
- Sharing Dimensions Among Other Data Marts
- Defining Other Types of Dimensions

Defining Dimension Characteristics
- Applying Characteristics to Dimension Tables:
  - Define a primary key
  - Include highly correlated and descriptive character columns
- Designing for Usability and Extensibility:
  - Minimize or avoid using codes or abbreviations
  - Create columns that are useful for levels of aggregation
  - Avoid missing or null values
  - Minimize the number of rows that change over time
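A sketch of these usability guidelines applied while loading a dimension, assuming hypothetical source codes and code tables: codes are replaced by descriptive text, missing values become an explicit "Unknown" member rather than NULL, and each row gets a primary key.

```python
# Hypothetical source rows from an operational system that uses short codes.
source_customers = [
    {"CustomerID": "C001", "seg": "R", "region": "NW"},
    {"CustomerID": "C002", "seg": "W", "region": None},
]

# Hypothetical code tables; in practice these come from the source system.
SEGMENTS = {"R": "Retail", "W": "Wholesale"}
REGIONS = {"NW": "Northwest", "SE": "Southeast"}

def to_dimension_row(surrogate_key, src):
    """Build a user-friendly dimension row: descriptive text, no NULLs."""
    return {
        "CustomerKey": surrogate_key,                      # primary key of the dimension
        "CustomerID": src["CustomerID"],                   # natural key kept for lineage
        "Segment": SEGMENTS.get(src["seg"], "Unknown"),    # decoded, never NULL
        "Region": REGIONS.get(src["region"], "Unknown"),   # decoded, never NULL
    }

customer_dim = [to_dimension_row(i, s) for i, s in enumerate(source_customers, start=1)]
print(customer_dim[1])
# {'CustomerKey': 2, 'CustomerID': 'C002', 'Segment': 'Wholesale', 'Region': 'Unknown'}
```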

Identifying Dimension Hierarchies
[Diagram] Two ways to model the Store Location hierarchy (Continent > Country > Region > City > Store):
- Consolidated Hierarchy: all levels are held in a single Store Location dimension.
- Separate Hierarchy: each level (Continent, Country, Region, City, Store) is modeled separately.
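In a consolidated hierarchy every level is stored on the same dimension row, so rolling up is just grouping by a different column. A minimal sketch, with hypothetical stores, locations, sales figures, and roll_up helper:

```python
from collections import defaultdict

# Consolidated hierarchy: each Store Location entry carries every level.
# Stores, locations, and sales figures are hypothetical.
store_location = {
    "Store 1": ("Europe", "France", "Ile-de-France", "Paris"),
    "Store 2": ("Europe", "France", "Ile-de-France", "Paris"),
    "Store 3": ("Europe", "Germany", "Bavaria", "Munich"),
}
LEVELS = ("Continent", "Country", "Region", "City")
store_sales = {"Store 1": 100.0, "Store 2": 50.0, "Store 3": 80.0}

def roll_up(level):
    """Sum store sales up to the given hierarchy level."""
    depth = LEVELS.index(level)
    totals = defaultdict(float)
    for store, amount in store_sales.items():
        totals[store_location[store][depth]] += amount
    return dict(totals)

print(roll_up("Country"))    # {'France': 150.0, 'Germany': 80.0}
print(roll_up("Continent"))  # {'Europe': 230.0}
```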

Defining Conventional Dimensions
- Time Dimension:
  - Break time down into individual attributes
  - Represent time as work days, weekends, holidays, seasons, or fiscal periods
  - Is limited to the grain of the fact table
- Geographic Dimension
- Product Dimension
- Customer Dimension
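A sketch of a daily-grain time dimension generator, showing how one date is broken down into attributes such as weekend, holiday, work day, quarter, and fiscal period. The holiday list, the July fiscal-year start, and the convention of naming a fiscal year after the calendar year in which it ends are all assumptions to be replaced with the business's own calendar.

```python
from datetime import date, timedelta

HOLIDAYS = {date(2024, 1, 1)}  # hypothetical holiday list
FISCAL_YEAR_START_MONTH = 7    # assumption: fiscal year starts in July

def time_dim_row(d):
    """Break one calendar date into individual, queryable attributes."""
    # Assumption: a fiscal year is named after the calendar year it ends in.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    fiscal_period = (d.month - FISCAL_YEAR_START_MONTH) % 12 + 1
    return {
        "TimeKey": int(d.strftime("%Y%m%d")),
        "TheDate": d.isoformat(),
        "DayName": d.strftime("%A"),
        "IsWeekend": d.weekday() >= 5,
        "IsHoliday": d in HOLIDAYS,
        "IsWorkDay": d.weekday() < 5 and d not in HOLIDAYS,
        "Quarter": (d.month - 1) // 3 + 1,
        "FiscalYear": fiscal_year,
        "FiscalPeriod": fiscal_period,
    }

# One row per day, matching a fact table whose grain is one day.
start = date(2024, 1, 1)
time_dim = [time_dim_row(start + timedelta(days=i)) for i in range(7)]
print(time_dim[0]["DayName"], time_dim[0]["IsWorkDay"], time_dim[0]["FiscalPeriod"])
# Monday False 7
```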

Sharing Dimensions Among Other Data Marts
[Diagram: a Time dimension alongside the Sales, Production, and Purchasing data marts]
- One instance exists and is shared among the data marts
- Multiple instances exist, one in each individual data mart

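Defining Other Types of Dimensions
- Defining Degenerate Dimensions:
  - A dimension key, such as an order or invoice number, stored in the fact table without a dimension table of its own
  - Useful for consolidated reporting at the business event level
- Defining Junk Dimensions:
  - A single dimension that combines miscellaneous low-cardinality flags and indicators
  - Useful for capturing important information without increasing the size of the fact table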
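A sketch of a junk dimension and a degenerate dimension, assuming three hypothetical low-cardinality flags. Every combination of the flags receives one surrogate key, so a fact row carries a single small integer instead of three text columns; the order number sits directly on the fact row with no dimension table behind it.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise each need a fact column.
PAYMENT_TYPES = ("Cash", "Card")
ORDER_CHANNELS = ("Store", "Web")
GIFT_WRAP = ("Yes", "No")

# Junk dimension: every combination of the flags gets one surrogate key.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(PAYMENT_TYPES, ORDER_CHANNELS, GIFT_WRAP), start=1)
}

# A fact row then carries one junk key instead of three text columns, plus the
# order number as a degenerate dimension (no dimension table behind it).
fact_row = {
    "OrderNumber": "SO-1001",                      # degenerate dimension
    "JunkKey": junk_dim[("Cash", "Web", "No")],    # replaces three flag columns
    "Amount": 42.0,
}
print(len(junk_dim), fact_row["JunkKey"])  # 8 4
```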

Establishing a Fact Table
- Defining the Fact Table
- Defining Precalculations
- Minimizing Fact Table Size
- Balancing Size and Performance

Defining the Fact Table
- Applying the Grain
- Ensuring Consistency Between Measures
- Using Additive and Numeric Values
- Summarizing Data

Defining Precalculations
- Single Row Precalculations: values are derived from measures within the same fact table row. Example row (time keys, product keys, Price, Discount = .10, Rebate = 5.00, Extended Price), where
  Extended Price = (Price - (Price x Discount)) - Rebate
- Multiple Row Precalculations: values are derived from multiple fact table rows, for example
  Year-to-date Sales = SUM(Extended Price)
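The two precalculation formulas can be checked with a small sketch. Discount and Rebate reuse the slide's example values (.10 and 5.00); the prices and the second row are made up. Decimal is used to avoid binary floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Hypothetical fact rows; Discount and Rebate in the first row echo the slide.
fact_rows = [
    {"Price": Decimal("100.00"), "Discount": Decimal("0.10"), "Rebate": Decimal("5.00")},
    {"Price": Decimal("50.00"), "Discount": Decimal("0.00"), "Rebate": Decimal("0.00")},
]

# Single-row precalculation: derived only from measures in the same row.
for row in fact_rows:
    row["ExtendedPrice"] = (row["Price"] - row["Price"] * row["Discount"]) - row["Rebate"]

# Multiple-row precalculation: derived by aggregating many rows.
year_to_date_sales = sum(row["ExtendedPrice"] for row in fact_rows)
print(fact_rows[0]["ExtendedPrice"], year_to_date_sales)  # 85.0000 135.0000
```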

Minimizing Fact Table Size
- Reducing the Number of Columns:
  - Remove columns whose data is redundant
  - Remove columns whose data is not required for analysis
- Reducing the Size of Each Column:
  - Use surrogate keys
  - Ensure that character and binary data is variable length
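A sketch of surrogate key assignment, assuming hypothetical product codes as natural keys. In the fact table each long natural key is replaced by a small integer (typically a 4-byte integer column), which is where the per-column saving comes from. The SurrogateKeyMap class is illustrative only.

```python
class SurrogateKeyMap:
    """Assigns compact integer surrogate keys to (possibly long) natural keys."""

    def __init__(self):
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing key, or hand out the next integer on first sight.
        if natural_key not in self._keys:
            self._keys[natural_key] = len(self._keys) + 1
        return self._keys[natural_key]

products = SurrogateKeyMap()
first = products.key_for("ACME-WIDGET-BLUE-XL-2024")   # hypothetical natural key
second = products.key_for("ACME-GADGET-RED-S-2023")    # hypothetical natural key
again = products.key_for("ACME-WIDGET-BLUE-XL-2024")   # same key as first
print(first, second, again)  # 1 2 1
```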

Balancing Size and Performance
- Storing Large Fact Tables
- Designing Star Schema:
  - Fact tables: long and narrow
  - Dimension tables: short and wide
- Including Precalculated Data: improves query performance but increases the size of a fact table
- Moving Fact Table Columns to Another Table: reduces fact table size but may affect query performance

Lab A: Designing a Star Schema

Implementing a Star Schema
- Estimating Size of the Data Warehouse
- Creating a Database
- Creating Tables
- Creating Constraints
- Creating Indexes

Estimating Size of the Data Warehouse
Variables: years of data = 5; customers = 10,000; average number of transactions per customer per day = 4.
- Number of rows in fact table (size of fact table at the chosen grain): 10,000 x 4 x 365 x 5 = 73,000,000
- Estimated row size of fact table (bytes per row): (7 IDs x 4 bytes) + (5 measures x 4 bytes) = 48 bytes
- Estimated data warehouse size: 48 bytes x 73,000,000 rows = ~3.5 GB
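The size estimate can be reproduced directly. This sketch, like the slide, counts only fact table data; index, page, and dimension table overhead are ignored.

```python
# Inputs from the slide.
years = 5
customers = 10_000
transactions_per_customer_per_day = 4
key_columns, measure_columns, bytes_per_column = 7, 5, 4

fact_rows = customers * transactions_per_customer_per_day * 365 * years
row_size = key_columns * bytes_per_column + measure_columns * bytes_per_column
total_bytes = fact_rows * row_size

print(fact_rows)            # 73000000
print(row_size)             # 48
print(total_bytes / 10**9)  # 3.504  (decimal gigabytes)
```

The result is about 3.5 GB in decimal units, or about 3.26 GiB in binary units.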

Creating a Database
- Using CREATE DATABASE Options: SIZE, MAXSIZE, FILEGROWTH
- Setting Database Options: Trunc. log on chkpt., SELECT INTO/Bulkcopy

Creating Tables
- Creating a Table
- Specifying NULL or NOT NULL
- Generating Column Values

Creating Constraints
- Using PRIMARY KEY Constraints:
  - Does not allow duplicate values
  - Allows an index to be created
  - Does not allow null values
- Using FOREIGN KEY Constraints:
  - Make up the multipart key stored in the fact table
  - Define a reference to a column with a PRIMARY KEY or UNIQUE constraint
  - Restrict the values that can be inserted or updated to those present in the referenced column
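A sketch of PRIMARY KEY and FOREIGN KEY enforcement using sqlite3 (an assumption; the constraint syntax is standard SQL, but SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). The tables are a trimmed, hypothetical version of the slides' schema, and rejected is an illustrative helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
CREATE TABLE Product_Dim (
    ProductKey INTEGER PRIMARY KEY,
    ProductID  TEXT NOT NULL UNIQUE
);
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER NOT NULL,
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim (ProductKey),
    Amount     REAL    NOT NULL,
    PRIMARY KEY (TimeKey, ProductKey)  -- multipart key
);
INSERT INTO Product_Dim VALUES (1, 'P1');
INSERT INTO Sales_Fact VALUES (20240101, 1, 9.99);
""")

def rejected(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

outcomes = {
    "duplicate primary key": rejected("INSERT INTO Sales_Fact VALUES (20240101, 1, 5.00)"),
    "null key value": rejected("INSERT INTO Sales_Fact VALUES (NULL, 1, 5.00)"),
    "unknown product": rejected("INSERT INTO Sales_Fact VALUES (20240102, 99, 5.00)"),
    "update to unknown product": rejected("UPDATE Sales_Fact SET ProductKey = 42"),
    "valid row": rejected("INSERT INTO Sales_Fact VALUES (20240102, 1, 5.00)"),
}
print(outcomes)
```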

Creating Indexes
- Steps for Creating Data Warehouse Indexes:
  1. Define primary key in dimension tables
  2. Declare foreign key relationships
  3. Define primary key in fact table
  4. Define indexes on each foreign key in fact table
- Using Surrogate Keys
- Using Clustered Indexes
- Using Nonclustered Indexes
- Creating Composite Indexes
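A sketch of the four indexing steps in sqlite3, with hypothetical index names. SQLite has no CLUSTERED/NONCLUSTERED keywords (those index types are SQL Server concepts), so only the key and foreign-key indexes are shown; SQLite creates an automatic index (named sqlite_autoindex_...) to enforce the composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Steps 1-3: primary keys on dimensions, foreign keys, composite PK on the fact table.
CREATE TABLE Time_Dim    (TimeKey INTEGER PRIMARY KEY, TheDate TEXT NOT NULL);
CREATE TABLE Product_Dim (ProductKey INTEGER PRIMARY KEY, ProductID TEXT NOT NULL);
CREATE TABLE Sales_Fact (
    TimeKey    INTEGER NOT NULL REFERENCES Time_Dim (TimeKey),
    ProductKey INTEGER NOT NULL REFERENCES Product_Dim (ProductKey),
    Amount     REAL NOT NULL,
    PRIMARY KEY (TimeKey, ProductKey)
);
-- Step 4: one index per foreign key column of the fact table.
CREATE INDEX ix_Sales_Fact_TimeKey    ON Sales_Fact (TimeKey);
CREATE INDEX ix_Sales_Fact_ProductKey ON Sales_Fact (ProductKey);
""")

# PRAGMA index_list returns (seq, name, unique, origin, partial) per index.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('Sales_Fact')"))
print(index_names)
```

Because the composite primary key index already leads with TimeKey, SQLite can serve TimeKey lookups from it, which may make the separate TimeKey index redundant; whether that holds in another engine depends on how it uses composite indexes.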

Recommended Practices
- Use Star Schema to Model Data Mart or Data Warehouse Database
- Do Not Mix Grain in Individual Fact Table Attributes
- Use Single Element Surrogate Keys When Defining Dimensions
- Define Shared Dimensions
- Use Facts That Are Both Numeric and Additive
- Choose Grain

Lab B: Implementing a Star Schema

Review
- Business Analysis Process
- Data Warehousing System
- Modeling a Data Warehouse
- Choosing the Grain
- Establishing Dimensions
- Establishing a Fact Table
- Implementing a Star Schema