Applying Data Warehouse Techniques

Presentation transcript:

Applying Data Warehouse Techniques: Going from Descriptive to Predictive

About Me: Graduated from Tennessee Tech in December 2011 (Computer Science). Nashville native. Working with SQL Server since 2010, mostly data warehousing/business intelligence with some application development. Twitter: @SpencerSwindell. Email: spencer.swindell@gmail.com

Think Data Insights: Enterprise Data Platform; SQL BI Solutions; Data Integration, Conversion, and Migrations; Analytic Assessment & Roadmap. Based in Nashville, TN.

Overview: The Case for a Data Warehouse; Building the Warehouse; Dimensional Modeling; Using the Data Warehouse; Building a dashboard with PowerBI; Machine Learning. Demos will be based on Freddie Mac data: loans from 1999 – 2016, ~22 million loans, ~1 billion service records.

Value of a Data Warehouse: Data can be stored and used in many forms in a business: application databases, Excel workbooks, 3rd-party applications and data sources, event streams, and NoSQL databases. We would like to analyze data across all of these sources. Data can be loaded into a centralized data warehouse for analysis.

OLTP vs OLAP: Application systems are typically optimized for dealing with a few rows of data at a time. This is On-Line Transactional Processing (OLTP): usually working with a single record at a time, for example processing a sales transaction or looking up a sales record for a return. That access pattern is inefficient for analytical processing, which works with thousands to millions of records at a time. This is On-Line Analytical Processing (OLAP), for example viewing total sales orders by sales territory for FY 2016.
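
The two access patterns can be sketched side by side. This is a minimal illustration using Python's built-in sqlite3 module; the `sales_order` table, its columns, and the sample rows are hypothetical and not taken from the talk.

```python
import sqlite3

# Hypothetical sales table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_order ("
    "order_id INTEGER PRIMARY KEY, territory TEXT, fiscal_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_order VALUES (?, ?, ?, ?)",
    [
        (1, "Southeast", 2016, 100.0),
        (2, "Southeast", 2016, 250.0),
        (3, "Northwest", 2016, 75.0),
        (4, "Northwest", 2015, 40.0),
    ],
)

# OLTP-style access: fetch a single record by its key (e.g. to process a return).
oltp_row = conn.execute(
    "SELECT order_id, territory, fiscal_year, amount "
    "FROM sales_order WHERE order_id = ?",
    (2,),
).fetchone()

# OLAP-style access: scan many rows and aggregate them by territory.
olap_rows = conn.execute(
    "SELECT territory, SUM(amount) FROM sales_order "
    "WHERE fiscal_year = 2016 GROUP BY territory ORDER BY territory"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query touches one row through the primary key; the second has to read every 2016 row before it can return anything, which is the workload a warehouse is designed around.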

The Dimensional Model: Popularized by Ralph Kimball (The Data Warehouse Toolkit). ETL processes data from source systems into a dimensional model; the ETL will be about 70% of a DW project. Dimensional models contain two types of tables. Dimension tables are the nouns of the business and describe the business process; examples: Date, Customer, Product, Store, Geography, Employee. Fact tables are the verbs of the business and measure the business process; examples: Sales, Patient Visit, Inventory, Attendance, Claims. The model gives us scalability, performance, and simplicity.
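
A minimal sketch of a dimensional model, assuming a hypothetical retail example with two dimension tables and one fact table (all table names, columns, and rows are invented for illustration; sqlite3 stands in for the warehouse database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimensions: the "nouns" that describe the business process.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);

    -- Fact: the "verb" that measures the business process.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        sales_amount REAL);

    INSERT INTO dim_date VALUES (20160105, 2016), (20170210, 2017);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (20160105, 1, 2, 20.0), (20160105, 2, 1, 15.0), (20170210, 1, 3, 30.0);
    """
)

# A typical star-schema query: join the fact to its dimensions, then
# group by descriptive attributes and sum a measure.
sales_by_category_year = conn.execute(
    """
    SELECT p.category, d.calendar_year, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category, d.calendar_year
    ORDER BY p.category, d.calendar_year
    """
).fetchall()

print(sales_by_category_year)
```

Every analytical question takes the same shape: filter and group on dimension attributes, aggregate fact measures.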

Dimension Tables: Hold descriptive characteristics of a business process. De-normalized tables allow for simple queries. Dimension tables are small compared to fact tables. A surrogate key is generated for each row and used in the fact table, which allows for single-column joins using integers.
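
Surrogate key assignment can be sketched in a few lines. This is an in-memory illustration only; the `SurrogateKeyMap` class and the customer identifiers are hypothetical, and a real ETL process would normally let the database generate and persist the keys.

```python
from itertools import count


class SurrogateKeyMap:
    """Assigns a small integer key to each natural (business) key on first sight."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key, or mint the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next_key)
        return self._keys[natural_key]


customer_keys = SurrogateKeyMap()

# Source rows carry the natural key from the application system.
source_rows = [("CUST-00917", 120.0), ("CUST-00042", 35.5), ("CUST-00917", 60.0)]

# Fact rows carry only the integer surrogate key plus the measure.
fact_rows = [(customer_keys.key_for(cust), amount) for cust, amount in source_rows]

print(fact_rows)
```

The fact table then joins to the dimension on a single integer column, regardless of how wide or composite the source system's own key is.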

Fact Tables: Largest tables in the warehouse. Columns are surrogate keys to dimensions and measurement values. Typically will have millions of rows, in some cases billions. Defined by the grain: the grain indicates what an individual row represents in a fact table, for example "One row per line item in a sales transaction".
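
One way to make a declared grain concrete is to check that the grain columns identify each row uniquely. The helper below is a hypothetical sketch with invented sample rows, not something shown in the talk.

```python
from collections import Counter


def grain_violations(rows, grain_columns):
    """Return the grain values that appear on more than one row."""
    seen = Counter(tuple(row[col] for col in grain_columns) for row in rows)
    return [key for key, n in seen.items() if n > 1]


# Sample fact rows at the grain "one row per line item in a sales transaction".
fact_sales = [
    {"order_id": 1001, "line_number": 1, "product_key": 7, "sales_amount": 20.0},
    {"order_id": 1001, "line_number": 2, "product_key": 9, "sales_amount": 5.0},
    {"order_id": 1002, "line_number": 1, "product_key": 7, "sales_amount": 20.0},
]

# The line-item grain holds: no (order_id, line_number) pair repeats.
line_item_grain = grain_violations(fact_sales, ["order_id", "line_number"])

# "One row per order" does not describe this table: order 1001 has two rows.
order_grain = grain_violations(fact_sales, ["order_id"])

print(line_item_grain, order_grain)
```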

Star Schema

Modeling SQL Saturday

Slowly Changing Dimensions: Type I – update the record; historical data is not preserved. Type II – add a new row; historical data is preserved. Type III – add a new column; allows for comparative analysis.

Type I Dimension Updates Initial State: Updated State:
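
The slide's before and after tables did not survive in the transcript. As a stand-in, here is a minimal sketch of a Type I update on a hypothetical `dim_customer` row (table, columns, and values are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'C100', 'Nashville')")
initial_state = conn.execute("SELECT * FROM dim_customer").fetchall()

# Type I: overwrite the attribute in place. The old value is gone afterwards.
conn.execute(
    "UPDATE dim_customer SET city = ? WHERE customer_id = ?", ("Memphis", "C100")
)
updated_state = conn.execute("SELECT * FROM dim_customer").fetchall()

print(initial_state)
print(updated_state)
```

The row count and the surrogate key stay the same, so every existing fact row now reports under the new city.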

Type II Dimension Updates Initial State: Updated State:
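
A Type II change can be sketched as "expire the current row, insert a new one". The `valid_from`, `valid_to`, and `is_current` tracking columns below are a common convention but are an assumption here, as are the table and sample values; the talk's own example tables are not in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT,
        city TEXT,
        valid_from TEXT,
        valid_to TEXT,
        is_current INTEGER)
    """
)
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
    "VALUES ('C100', 'Nashville', '2015-01-01', NULL, 1)"
)


def apply_type2_change(conn, customer_id, new_city, change_date):
    """Close out the current row and add a new row with a new surrogate key."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )


apply_type2_change(conn, "C100", "Memphis", "2017-06-01")
updated_state = conn.execute(
    "SELECT customer_key, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()

print(updated_state)
```

Fact rows loaded before the change keep pointing at surrogate key 1 (Nashville), while new facts pick up key 2 (Memphis), which is how history is preserved.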

Type III Dimension Updates Initial State: Updated State:
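
For Type III, the sketch below adds a hypothetical `previous_city` column to the same invented `dim_customer` table, so the old and new values sit side by side on one row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer ("
    "customer_key INTEGER PRIMARY KEY, customer_id TEXT, city TEXT)"
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'C100', 'Nashville')")

# Type III: add a column that holds the prior value of the attribute.
conn.execute("ALTER TABLE dim_customer ADD COLUMN previous_city TEXT")

# In a SQL UPDATE the right-hand sides see the row's pre-update values,
# so previous_city receives the old city before city is overwritten.
conn.execute(
    "UPDATE dim_customer SET previous_city = city, city = ? WHERE customer_id = ?",
    ("Memphis", "C100"),
)
updated_state = conn.execute(
    "SELECT customer_key, customer_id, city, previous_city FROM dim_customer"
).fetchall()

print(updated_state)
```

Only one prior value is kept, which is enough for "current vs. previous" comparisons but not for a full history.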

Other Types of Dimensions: A Mini-Dimension (Type IV) holds a subset of data to reduce the table size of a large dimension. Type VI is a combination of the techniques in Types 1, 2, and 3 (1+2+3 = 6). A Junk Dimension combines low-cardinality elements into a single dimension. A Degenerate Dimension leaves high-cardinality elements on the fact table. A Role-Playing Dimension is a dimension used many times in a single business process.
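
A role-playing dimension is easiest to see in a query. In this sketch a single `dim_date` table is joined twice under different aliases; the `fact_loan` table, its columns, and the sample loans are hypothetical and are not the schema used in the talk's demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE fact_loan (
        loan_id TEXT,
        origination_date_key INTEGER REFERENCES dim_date(date_key),
        maturity_date_key INTEGER REFERENCES dim_date(date_key));

    INSERT INTO dim_date VALUES
        (19990115, 1999), (20290115, 2029), (20000301, 2000), (20150301, 2015);
    INSERT INTO fact_loan VALUES
        ('L1', 19990115, 20290115), ('L2', 20000301, 20150301);
    """
)

# The same physical dimension plays two roles: origination date and maturity date.
loan_years = conn.execute(
    """
    SELECT f.loan_id, orig.calendar_year, mat.calendar_year
    FROM fact_loan AS f
    JOIN dim_date AS orig ON orig.date_key = f.origination_date_key
    JOIN dim_date AS mat ON mat.date_key = f.maturity_date_key
    ORDER BY f.loan_id
    """
).fetchall()

print(loan_years)
```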

Types of Fact Tables: There are multiple ways to measure and store business events, and some of these are used together to create a complete picture. A Transactional Fact Table records events as they occur; data is typically not revisited. A Periodic Snapshot Fact Table measures events on intervals; data is not revisited, and new snapshots are inserted into the table. An Accumulating Snapshot Fact Table is used for processes with defined beginning, intermediate, and end milestones; data is revisited and updated with new information.
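
The accumulating snapshot is the one fact table type whose rows are updated after they are inserted. A minimal sketch, assuming a hypothetical order fulfillment process with ordered, shipped, and delivered milestones (table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_order_fulfillment (
        order_id INTEGER PRIMARY KEY,
        ordered_date TEXT,
        shipped_date TEXT,
        delivered_date TEXT)
    """
)

# The row is inserted at the first milestone; later milestones start as NULL.
conn.execute(
    "INSERT INTO fact_order_fulfillment (order_id, ordered_date) "
    "VALUES (1001, '2017-03-01')"
)

# Fixed mapping from milestone name to column, so no user input reaches the SQL text.
MILESTONE_COLUMNS = {"shipped": "shipped_date", "delivered": "delivered_date"}


def record_milestone(conn, order_id, milestone, date):
    """Revisit the existing row and fill in the milestone that was just reached."""
    column = MILESTONE_COLUMNS[milestone]
    conn.execute(
        f"UPDATE fact_order_fulfillment SET {column} = ? WHERE order_id = ?",
        (date, order_id),
    )


record_milestone(conn, 1001, "shipped", "2017-03-03")
after_ship = conn.execute("SELECT * FROM fact_order_fulfillment").fetchone()

record_milestone(conn, 1001, "delivered", "2017-03-07")
final_row = conn.execute("SELECT * FROM fact_order_fulfillment").fetchone()

print(after_ship)
print(final_row)
```

A transactional fact table would instead insert one new row per event, and a periodic snapshot would insert one new row per interval; neither would run an UPDATE.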

Freddie Mac Data: Data from Freddie Mac home mortgages originating from January 1999 through March 2017: 22,942,396 loans and 1,080,321,205 loan payments. All loans are fixed rate with 15/20/30-year terms. Data feeds into a cube. Dashboard with PowerBI. Return interest rate based on historical data. Code: https://github.com/shswindell42/Freddie