Building the Corporate Data Warehouse Pindaro Demertzoglou Lally School of Management Data Resource Management.



The Dimensional Model Our goal: Develop the physical design.

1. Developing the High-Level Dimensional Model. The high-level dimensional model is a data model at the entity level. We start straight from the bus matrix and follow the four-step modeling process. a. Refer to the business processes in the bus matrix. Look at the row for each process and try to identify potential entities for the data warehouse tables. You are not interested in the particular attributes of the entities at this point; you are only trying to identify the entities.

[Figure: The Data Warehouse Bus Matrix (value chain)]

1. Developing the High-Level Dimensional Model. b. Declare the grain. The grain is the level of detail captured in the fact table. The grain might be one row per order item, one row per customer call, or one row per employee status change.

1. Developing the High-Level Dimensional Model. c. Choose the dimensions. Most of them will come from your understanding of the business processes and the bus matrix. To verify your choice of dimensions, it helps to refer back to your communication with the operational people and their requests. It is a good idea to start listing attributes for each dimension at this point.

1. Developing the High-Level Dimensional Model. d. Choose the measures in the fact table. There is usually a set of numbers in the operational system, for example product quantity, product sold price, and discounts. These numbers support the business process (the order process in this case). From them we can derive a series of facts, such as the sales amount and the number of sales.
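The four steps can be tried out on a small example. The following is only a sketch, using Python's built-in sqlite3 module rather than SQL Server. The table names (DimProduct, DimCustomer, FactOrderItem), the sample rows, and the treatment of the discount as an absolute amount per order item are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions chosen from the (hypothetical) order process.
CREATE TABLE DimProduct  (ProductKey  INTEGER PRIMARY KEY, ProductName  TEXT NOT NULL);
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);

-- Declared grain: one row per order item.
CREATE TABLE FactOrderItem (
    OrderNumber INTEGER NOT NULL,
    ProductKey  INTEGER NOT NULL REFERENCES DimProduct(ProductKey),
    CustomerKey INTEGER NOT NULL REFERENCES DimCustomer(CustomerKey),
    Quantity    INTEGER NOT NULL,  -- measures taken from the operational system
    SoldPrice   REAL    NOT NULL,
    Discount    REAL    NOT NULL   -- assumed to be an absolute amount per item
);

INSERT INTO DimProduct  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO DimCustomer VALUES (1, 'Acme');
INSERT INTO FactOrderItem VALUES
    (1001, 1, 1, 2, 10.0, 1.0),
    (1001, 2, 1, 1, 25.0, 0.0),
    (1002, 1, 1, 3, 10.0, 0.0);
""")

# Derived facts: sales amount and number of sales (distinct orders).
total_sales, sales_count = conn.execute(
    "SELECT SUM(Quantity * SoldPrice - Discount), COUNT(DISTINCT OrderNumber) "
    "FROM FactOrderItem"
).fetchone()

# The same sales amount, sliced by one of the dimensions.
by_product = conn.execute(
    "SELECT p.ProductName, SUM(f.Quantity * f.SoldPrice - f.Discount) "
    "FROM FactOrderItem f JOIN DimProduct p ON p.ProductKey = f.ProductKey "
    "GROUP BY p.ProductName ORDER BY p.ProductName"
).fetchall()

print(total_sales, sales_count, by_product)
```

Because the grain is the order item, the number of sales has to be counted as distinct order numbers, not as fact rows.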

The High-Level Dimensional Model

2. Developing the Detailed Dimensional Model. We finalize the attributes for the fact and dimension tables, assigning each attribute to the table (fact or dimension) it belongs to. We keep a list of issues as they arise during the design process (with respect to ETL, the purpose of attributes, and so on). One of the most important decisions is the assignment of type 1 and type 2 dimension attributes.

3. Building the Physical Model. The best tool for building the physical model is a data modeling tool: deling_tools. These tools can forward engineer, that is, generate the DDL statements that actually create your database. For our lab, you will work directly with the ERD from SQL Server 2012 or with a system that you are willing to download and test.

Considerations for building the Physical Model: Surrogate keys. The primary key for dimension tables should be a surrogate key assigned and managed by the DW/BI system. Create surrogate keys in SQL Server by enabling the IDENTITY property on the key column. Use integer values for the surrogate PKs.
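As a rough illustration of a system-assigned surrogate key, the sketch below uses Python's sqlite3 module. SQLite has no IDENTITY property, so INTEGER PRIMARY KEY AUTOINCREMENT stands in for it here; in SQL Server the key column would typically be declared along the lines of INT IDENTITY(1,1) PRIMARY KEY. The table and column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE DimCustomer (
    CustomerKey  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, assigned by the database
    CustomerID   TEXT NOT NULL,                      -- business key from the source system
    CustomerName TEXT NOT NULL
)
""")

# The load never supplies CustomerKey; the database assigns it.
for customer_id, name in [("C-100", "Acme"), ("C-200", "Globex")]:
    conn.execute(
        "INSERT INTO DimCustomer (CustomerID, CustomerName) VALUES (?, ?)",
        (customer_id, name),
    )

rows = conn.execute(
    "SELECT CustomerKey, CustomerID FROM DimCustomer ORDER BY CustomerKey"
).fetchall()
print(rows)
```

Keeping the business key (CustomerID) as an ordinary column lets the ETL system look up the surrogate key when it loads the fact table.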

Considerations for building the Physical Model: String columns. You need to define the text column lengths in the physical design. Expect to find most string columns in the dimension tables, since the fact table holds mostly measures. Use Unicode data types where possible so that you can capture data from multiple heterogeneous data sources.

Considerations for building the Physical Model: Null values. We avoid null values in the DW/BI database, just as we avoid them in the TPS database. In the TPS database we avoid null values by using default values. In the DW database we set up the prevention of nulls in the ETL system.
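One way the ETL system might prevent nulls is to substitute a default value before each row is loaded. This is a minimal sketch in Python; the function name replace_nulls, the column names, and the default values are hypothetical.

```python
# Hypothetical per-column defaults used in place of nulls.
DEFAULTS = {"MiddleName": "Unknown", "Region": "Not Applicable"}


def replace_nulls(row, defaults):
    """Return a copy of row with missing or None values replaced by defaults."""
    cleaned = dict(row)
    for column, default in defaults.items():
        if cleaned.get(column) is None:
            cleaned[column] = default
    return cleaned


source_row = {"CustomerID": "C-100", "MiddleName": None, "Region": "EMEA"}
clean_row = replace_nulls(source_row, DEFAULTS)
print(clean_row)
```

The source row is copied rather than modified, so the original extract stays available for troubleshooting.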

Considerations for building the Physical Model: Housekeeping columns. Every dimension table that has type 2 attributes needs additional columns that track the dates for which the dimension row is valid. For example, the RowStartDate and RowEndDate columns indicate the date range for which the dimension row is valid. Another useful attribute is RowChangeReason, which captures the reason behind the change in the slowly changing dimension (SCD). So, in practice, we need three housekeeping columns for SCDs.
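The three housekeeping columns can be seen at work in a small type 2 change. The sketch below keeps the dimension rows in a plain Python list. The function apply_type2_change, the far-future end date used for the current row, and the choice to end the old row on the day before the change are assumptions, not rules stated in these slides.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "current row"; avoids a null end date


def apply_type2_change(rows, business_key, new_attrs, change_date, reason):
    """Expire the current row for business_key and append a new version."""
    current = next(
        r for r in rows
        if r["CustomerID"] == business_key and r["RowEndDate"] == HIGH_DATE
    )
    current["RowEndDate"] = change_date - timedelta(days=1)
    new_row = {
        **current,
        **new_attrs,
        "CustomerKey": max(r["CustomerKey"] for r in rows) + 1,  # new surrogate key
        "RowStartDate": change_date,
        "RowEndDate": HIGH_DATE,
        "RowChangeReason": reason,
    }
    rows.append(new_row)
    return new_row


dim_customer = [{
    "CustomerKey": 1, "CustomerID": "C-100", "City": "Troy",
    "RowStartDate": date(2012, 1, 1), "RowEndDate": HIGH_DATE,
    "RowChangeReason": "Initial load",
}]

apply_type2_change(dim_customer, "C-100", {"City": "Albany"},
                   date(2013, 4, 2), "Customer moved")
for row in dim_customer:
    print(row)
```

After the change the dimension holds two rows for the same business key, each with its own surrogate key and a non-overlapping validity range.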

Constraints and Supporting Objects: Entity and Referential Integrity Constraints. All tables should have a primary key, that is, the column or set of columns that identifies a single row when constrained to a single value. This is known as entity integrity. For the dimension tables, the primary key is the surrogate key. For the fact tables, the primary key is usually a combination of all of the foreign keys from each dimension. In practice, data warehouse DBAs often do not create referential integrity constraints. Maintaining these constraints is expensive, so DBAs depend on the ETL system to do the integrity work instead, which is risky because the ETL checks might not be accurate or feasible. If you feel declared constraints are important, test the options in your environment to understand the cost.
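The sketch below, again using sqlite3 as a stand-in, shows a fact table whose primary key combines its dimension foreign keys, and an ETL-style query that looks for fact rows with no matching dimension row. No referential integrity constraints are declared, mirroring the practice described above. All table names and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate    (DateKey    INTEGER PRIMARY KEY);
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY);

-- Entity integrity: the primary key combines the dimension keys.
-- No FOREIGN KEY constraints are declared on purpose.
CREATE TABLE FactSales (
    DateKey    INTEGER NOT NULL,
    ProductKey INTEGER NOT NULL,
    Quantity   INTEGER NOT NULL,
    PRIMARY KEY (DateKey, ProductKey)
);

INSERT INTO DimDate    VALUES (20130402), (20130403);
INSERT INTO DimProduct VALUES (1), (2);
INSERT INTO FactSales  VALUES (20130402, 1, 5), (20130402, 2, 3), (20130403, 9, 1);
""")

# The composite primary key rejects a second row for the same date and product.
try:
    conn.execute("INSERT INTO FactSales VALUES (20130402, 1, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Integrity check done by the ETL system: fact rows without a product row.
orphans = conn.execute(
    "SELECT f.DateKey, f.ProductKey FROM FactSales f "
    "LEFT JOIN DimProduct p ON p.ProductKey = f.ProductKey "
    "WHERE p.ProductKey IS NULL"
).fetchall()

print(duplicate_rejected, orphans)
```

The orphan row for product key 9 was loaded without complaint, which is the risk of relying on the ETL system alone.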

Constraints and Supporting Objects: Indexing Strategies. Dimension Table Indexing. Dimension tables with a single-column integer surrogate primary key should have a clustered primary key index. By default, SQL Server creates a clustered index for the primary key. A clustered index determines the physical order of data in a table, so there can be only one clustered index per table.

Constraints and Supporting Objects: Views. All business user access to the relational data warehouse should go through views. The rationale is to provide a protective layer between the users and the underlying database. This layer will be very helpful when you need to modify the DW/BI system after it is in production. The table names shouldn't even show up in a user's list of database objects. You may want to omit some columns from the view, especially some of the housekeeping columns described previously.
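A view that hides the housekeeping columns might look like the following sketch, written with sqlite3 for convenience. The table DimCustomer, the view name Customer, and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerKey     INTEGER PRIMARY KEY,
    CustomerID      TEXT NOT NULL,
    CustomerName    TEXT NOT NULL,
    RowStartDate    TEXT NOT NULL,
    RowEndDate      TEXT NOT NULL,
    RowChangeReason TEXT NOT NULL
);
INSERT INTO DimCustomer VALUES
    (1, 'C-100', 'Acme', '2012-01-01', '9999-12-31', 'Initial load');

-- Business users query the view; the housekeeping columns are left out.
CREATE VIEW Customer AS
    SELECT CustomerKey, CustomerID, CustomerName
    FROM DimCustomer;
""")

cursor = conn.execute("SELECT * FROM Customer")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

If the underlying table is later renamed or restructured, only the view definition needs to change, and user queries against Customer keep working.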

4. The Metadata Plan Metadata: the Bermuda Triangle of data warehousing.

The Purpose of Metadata
Technical metadata (the usual metadata everyone refers to). Defines the objects and processes that make up the warehouse itself from a technical perspective. This includes the system metadata that defines the data structures, like tables, columns, data types, dimensions, and measures.
Business metadata. Tells us what data we have, where it comes from, what it means, and what its relationship is to other data in the warehouse. Business metadata often serves as documentation for the data warehouse.
Process metadata. Describes the results of various operations in the warehouse. In the ETL process, each task logs key data about its execution, like start time, end time, rows processed, result, and so on. This data is initially valuable for troubleshooting the ETL or query process. After people begin using the system, this data is a critical input to the performance monitoring and improvement process.
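Process metadata of the kind described here (start time, end time, rows processed, result) can be captured with a small wrapper around each ETL task. This is only a sketch: run_logged, the two sample tasks, and the in-memory audit_log list are hypothetical, and a real system would write these records to an audit table.

```python
from datetime import datetime, timezone


def run_logged(task_name, task, audit_log):
    """Run one ETL task and append its process metadata to audit_log."""
    start = datetime.now(timezone.utc)
    try:
        rows_processed = task()  # each task is assumed to return its row count
        result = "Succeeded"
    except Exception as exc:
        rows_processed = 0
        result = f"Failed: {exc}"
    audit_log.append({
        "TaskName": task_name,
        "StartTime": start,
        "EndTime": datetime.now(timezone.utc),
        "RowsProcessed": rows_processed,
        "Result": result,
    })
    return audit_log[-1]


def load_customers():
    return 42  # pretend 42 rows were loaded


def load_products():
    raise ValueError("source file missing")


audit_log = []
run_logged("LoadDimCustomer", load_customers, audit_log)
run_logged("LoadDimProduct", load_products, audit_log)
for entry in audit_log:
    print(entry["TaskName"], entry["RowsProcessed"], entry["Result"])
```

A failed task is logged instead of stopping the run, so the log can support both troubleshooting and later performance monitoring.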

The (non-existent) Metadata Repository There is a need to store all of this metadata. Ideally, each tool would keep its metadata in a shared repository where it can be easily reused by other tools and integrated for reporting and analysis purposes. For example, when you use your ETL tool to design a package to load your dimensions, the ETL tool would save that package in the repository in a set of structures that at least allow inquiry into the content and structure of the package. If you wanted to know what transforms were applied to the data in a given dimension table, you could query the repository. Unfortunately, this wonderful, integrated, shared repository is rare in the DW/BI world today, and when it does exist, it must be built and maintained with significant custom effort. Each component keeps its own metadata in its own structures and formats.

Creating the Metadata Strategy
1. Our primary goal is to concentrate on business metadata first.
2. Educate the DW/BI team and key business users about the importance of metadata and the metadata strategy.
3. Design and implement the delivery approach for getting business metadata out to the user community.
4. Typically, this involves creating metadata access tools, like reports and browsers.