Data Warehouse and the Star Schema CSCI 242 ©Copyright 2015, David C. Roberts, all rights reserved.


2 Agenda: Definition; Why data warehouse; Data warehouse in the enterprise; Data warehouse design; Relevance of normalization; Star schema; Processing the star schema

3 Definition Data warehouse: A repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated. The point is that it’s not used for transaction processing; that is, it’s read-only. The data can come from heterogeneous sources, and it can all be queried in one database.

4 Why Data Warehouse A read lock on a table will prevent any updating of that table. A long-running analytic operation on all rows of a table will therefore prevent any updates. Analysis (a.k.a. decision support) can seriously interfere with updates. Using a duplicate table for analysis, recopied once a day, allows unlimited analysis and doesn’t interfere with OLTP.
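The "duplicate, recopied once a day" idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the orders table, its columns and the refresh_analysis_copy helper are hypothetical names, and a real warehouse load would go through an extract, transform and load step rather than a raw copy.

```python
import sqlite3

# Hypothetical OLTP database with one table (names are illustrative only).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)")
oltp.executemany("INSERT INTO orders (amount) VALUES (?)", [(10,), (25,), (40,)])
oltp.commit()

def refresh_analysis_copy(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole source database into a fresh, separate database."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)  # sqlite3's online backup API
    return copy

analysis = refresh_analysis_copy(oltp)  # in practice this would run once a day

# The OLTP side keeps changing after the copy is taken...
oltp.execute("INSERT INTO orders (amount) VALUES (?)", (100,))
oltp.commit()

# ...while analysis queries run against the copy and never lock the OLTP table.
oltp_total = oltp.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
analysis_total = analysis.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(oltp_total, analysis_total)
```

The copy is stale until the next refresh, which is the trade-off the slide accepts in exchange for not blocking updates.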

5 Data Warehouse vs. OLTP
Aspect | OLTP | DW
Purpose | Automate day-to-day operations | Analysis
Structure | RDBMS | RDBMS
Data Model | Normalized | Dimensional
Access | SQL | SQL and business analysis programs
Data | Data that runs the business | Current and historical information
Condition of data | Changing, incomplete | Historical, complete, descriptive

6 How It Fits into the Enterprise [Diagram labels: OLTP1, OLTP2, OLTP3; Extract, Transform and Load; Integration; Data Warehouse; four Data Marts; Application A, Application B, Application C; User]

7 Data Warehouse Database Design A conventional database design for a data warehouse would lead to joins on large amounts of data that would run slowly. The star schema allows for fast processing of very large quantities of data in the data warehouse. It also allows for very compact representation of events that occur many times.

8 A Sample OLTP Schema [Diagram: tables orders, order items, products, customers]

9 Transformed to a Star Schema [Diagram: fact table sales, with four dimension tables: products, customers, channels, times]

10 Star Schema [Diagram: a central Fact Table linked to the Customer, Item, Supplier, Time and Location dimensions]

11 Fact Table The fact table contains the actual business process measurements or metrics for a specific event, called facts, usually numbers. A fact table places each fact in context by foreign keys that reference other tables, called “dimension” tables. These foreign keys are usually generated keys, in order to save fact table space. If you are building a DW of monthly sales in dollars, your fact table will contain monthly sales, one row per month. If you are building a DW of retail sales, the fact table might have one row for each item sold.
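A minimal sketch of a fact table that refers to dimension tables through generated keys, using Python's sqlite3 module. The table and column names (product_dim, store_dim, sales_fact) are assumptions chosen for illustration, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE product_dim (dim_id INTEGER PRIMARY KEY, name TEXT);  -- generated key
CREATE TABLE store_dim   (dim_id INTEGER PRIMARY KEY, name TEXT);  -- generated key
CREATE TABLE sales_fact (
    product_dim_id INTEGER NOT NULL REFERENCES product_dim(dim_id),
    store_dim_id   INTEGER NOT NULL REFERENCES store_dim(dim_id),
    sales          INTEGER NOT NULL  -- the fact: a numeric measurement
);
""")

# Each dimension row gets a small integer key assigned by the database.
conn.execute("INSERT INTO product_dim (name) VALUES ('Skim Milk 1 Gal')")
conn.execute("INSERT INTO store_dim (name) VALUES ('Brooklyn')")

# A fact row stores only the small keys plus the measurement.
conn.execute("INSERT INTO sales_fact VALUES (1, 1, 3)")

# A fact row pointing at a dimension row that does not exist is rejected.
try:
    conn.execute("INSERT INTO sales_fact VALUES (99, 1, 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(rejected, fact_rows)
```

Because the descriptive text lives once in the dimension tables, each fact row stays narrow even when the same product and store occur in millions of sales.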

12 Fact Table Design A fact table may contain one or more facts. Usually you create one fact table per business event. For example, if you want to analyze the sales numbers and also advertising spending, they are two separate business processes, so you will create two separate fact tables, one for sales data and one for advertising cost data. On the other hand, if you want to track the sales tax in addition to the sales number, you simply create one more fact column, called Tax, in the Sales fact table.
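The two cases on this slide can be sketched as follows. The sales_fact columns follow the monthly sales fact table used later in the deck; advertising_fact and its ad_cost column are hypothetical names introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sales and advertising are separate business processes: one fact table each.
CREATE TABLE sales_fact (
    tm_dim_id INTEGER, pr_dim_id INTEGER, loc_dim_id INTEGER, sales INTEGER
);
CREATE TABLE advertising_fact (
    tm_dim_id INTEGER, loc_dim_id INTEGER, ad_cost INTEGER
);

-- Sales tax belongs to the same process as sales: add one more fact column.
ALTER TABLE sales_fact ADD COLUMN tax INTEGER;
""")

# PRAGMA table_info returns one row per column; index 1 is the column name.
sales_columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
print(sales_columns)
```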

13 Dimension Table Dimension tables have a small number of rows (compared to fact tables) but a large number of columns. For the lowest level of granularity of a fact in the fact table, a dimension table will have one row that gives all the categories for each value. The dimension table is often all key (every column is part of the key), so a generated key is used so that the fact table reference to the dimension table can be small.
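One way to hand out generated keys while loading a dimension is a "look up, else insert" helper. The sketch below assumes the location dimension columns shown later in the deck; the location_key function is a hypothetical helper, and the keys SQLite generates here (1, 2, ...) are not the Dim_Id values printed on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE location_dim (
    dim_id       INTEGER PRIMARY KEY,  -- generated key, assigned by SQLite
    loc_code     TEXT UNIQUE,
    name         TEXT,
    state_name   TEXT,
    country_name TEXT
)""")

def location_key(loc_code, name, state_name, country_name):
    """Return the generated key for a location, inserting the row if it is new."""
    row = conn.execute(
        "SELECT dim_id FROM location_dim WHERE loc_code = ?", (loc_code,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO location_dim (loc_code, name, state_name, country_name) "
        "VALUES (?, ?, ?, ?)",
        (loc_code, name, state_name, country_name),
    )
    return cur.lastrowid

k1 = location_key("IL01", "Chicago Loop", "Illinois", "USA")
k2 = location_key("NY01", "Brooklyn", "New York", "USA")
k3 = location_key("IL01", "Chicago Loop", "Illinois", "USA")  # already present

dim_rows = conn.execute("SELECT COUNT(*) FROM location_dim").fetchone()[0]
print(k1, k2, k3, dim_rows)
```

The fact table then stores only the small integer returned by the helper, not the code, name, state and country.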

15 Time Dimension Schema
Column Name | Type
Dim_Id | INTEGER (4)
Month | SMALL INTEGER (2)
Month_Name | VARCHAR (3)
Quarter | SMALL INTEGER (4)
Quarter_Name | VARCHAR (2)
Year | SMALL INTEGER (2)

16 Time Dimension Data
TM_Dim_Id | TM_Month | TM_Month_Name | TM_Quarter | TM_Quarter_Name | TM_Year
 |  | Jan | 1 | Q1 | 
 |  | Feb | 1 | Q1 | 
 |  | Mar | 1 | Q1 | 
 |  | Apr | 2 | Q2 | 
 |  | May | 2 | Q2 | 2003

17 Location Dimension Schema
Column Name | Type
Dim_Id | INTEGER (4)
Loc_Code | VARCHAR (4)
Name | VARCHAR (50)
State_Name | VARCHAR (20)
Country_Name | VARCHAR (20)

18 Location Dimension Data
Dim_Id | Loc_Code | Name | State_Name | Country_Name
1001 | IL01 | Chicago Loop | Illinois | USA
1002 | IL02 | Arlington Hts | Illinois | USA
1003 | NY01 | Brooklyn | New York | USA
1004 | TO01 | Toronto | Ontario | Canada
1005 | MX01 | Mexico City | Distrito Federal | Mexico

19 Product Data Schema
Column Name | Type
Dim_Id | INTEGER (4)
SKU | VARCHAR (10)
Name | VARCHAR (30)
Category | VARCHAR (30)

20 Product Data
Dim_Id | SKU | Name | Category
1001 | DOVE6K | Dove Soap 6Pk | Sanitary
1002 | MLK66F# | Skim Milk 1 Gal | Dairy
1003 | SMKSAL55 | Smoked Salmon 6oz | Meat

21 Categories in Dimension Tables Categories may or may not be hierarchical, and a dimension can have both kinds. Categories provide canned values that can be given to users for queries.

22 Granularity (Grain) of the Fact Table The level of detail of the fact table is known as the grain of the fact table. In this example the grain of the fact table is monthly sales number per location per product.

23 Note about Granularity There may be multiple star schemas at different levels of granularity, especially for very large data warehouses. The first could be the finest, say, each transaction such as a sale. The next could be an aggregation, like the previous example. There could be more levels of aggregation.
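Deriving a coarser-grained fact table from a finer one can be sketched as a single aggregation. Everything here is an assumption made for illustration: the sale_txn_fact table, the month stored as a text prefix of the date instead of a time dimension key, and the sample amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Finest grain: one row per sale (hypothetical table and columns).
CREATE TABLE sale_txn_fact (
    sale_date TEXT, pr_dim_id INTEGER, loc_dim_id INTEGER, amount INTEGER
);
-- Coarser grain: monthly sales per location per product.
CREATE TABLE monthly_sales_fact (
    month TEXT, pr_dim_id INTEGER, loc_dim_id INTEGER, sales INTEGER
);
""")

conn.executemany("INSERT INTO sale_txn_fact VALUES (?, ?, ?, ?)", [
    ("2003-01-05", 1001, 1003, 4),
    ("2003-01-20", 1001, 1003, 6),
    ("2003-01-21", 1002, 1003, 3),
    ("2003-02-02", 1001, 1003, 5),
])

# Roll the transactions up to one row per month, product and location.
conn.execute("""
INSERT INTO monthly_sales_fact
SELECT substr(sale_date, 1, 7), pr_dim_id, loc_dim_id, SUM(amount)
FROM sale_txn_fact
GROUP BY substr(sale_date, 1, 7), pr_dim_id, loc_dim_id
""")

monthly = conn.execute("""
SELECT month, pr_dim_id, loc_dim_id, sales
FROM monthly_sales_fact
ORDER BY month, pr_dim_id
""").fetchall()
print(monthly)
```

Queries that only need monthly totals can then read the small aggregate table instead of scanning every transaction.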

24 Design Approach
1. Identify the business process. In this step you determine which business process your data warehouse represents. This process will be the source of your metrics or measurements.
2. Identify the grain. You determine what one row of the fact table means. In the previous example the grain is 'monthly sales per location per product'. It might instead be daily sales, or each sale could even be one row.
3. Identify the dimensions. Your dimensions should be descriptive (SQL VARCHAR or CHARACTER) as much as possible and conform to your grain.
4. Finally, identify the facts. In this step you identify your measurements (or metrics or facts). The facts should be numeric and should conform to the grain defined in step 2.

25 Monthly Sales Fact Table Schema
Field Name | Type
TM_Dim_Id | INTEGER (4)
PR_Dim_Id | INTEGER (4)
LOC_Dim_Id | INTEGER (4)
Sales | INTEGER (4)

26 Monthly Sales Fact Table Data
TM_Dim_Id | PR_Dim_Id | LOC_Dim_Id | Sales
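Processing a star schema usually means joining the fact table to the dimensions it needs, filtering and grouping on dimension columns, and summing the facts. The sketch below uses the product and location rows from the slides; the time dimension key values and every sales figure are assumptions, since the fact rows are not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (dim_id INTEGER PRIMARY KEY, month INTEGER, month_name TEXT,
                       quarter INTEGER, quarter_name TEXT, year INTEGER);
CREATE TABLE product_dim (dim_id INTEGER PRIMARY KEY, sku TEXT, name TEXT, category TEXT);
CREATE TABLE location_dim (dim_id INTEGER PRIMARY KEY, loc_code TEXT, name TEXT,
                           state_name TEXT, country_name TEXT);
CREATE TABLE monthly_sales_fact (tm_dim_id INTEGER, pr_dim_id INTEGER,
                                 loc_dim_id INTEGER, sales INTEGER);
""")
# Time keys 1..5 are assumed values.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "Jan", 1, "Q1", 2003), (2, 2, "Feb", 1, "Q1", 2003),
    (3, 3, "Mar", 1, "Q1", 2003), (4, 4, "Apr", 2, "Q2", 2003),
    (5, 5, "May", 2, "Q2", 2003),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1001, "DOVE6K", "Dove Soap 6Pk", "Sanitary"),
    (1002, "MLK66F#", "Skim Milk 1 Gal", "Dairy"),
    (1003, "SMKSAL55", "Smoked Salmon 6oz", "Meat"),
])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", [
    (1001, "IL01", "Chicago Loop", "Illinois", "USA"),
    (1002, "IL02", "Arlington Hts", "Illinois", "USA"),
    (1003, "NY01", "Brooklyn", "New York", "USA"),
    (1004, "TO01", "Toronto", "Ontario", "Canada"),
])
# Sales figures below are made up for the example.
conn.executemany("INSERT INTO monthly_sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1001, 1001, 120), (1, 1002, 1001, 80), (2, 1002, 1003, 60),
    (4, 1002, 1002, 50), (4, 1003, 1004, 30),
])

# Star join: US sales by quarter and product category.
result = conn.execute("""
SELECT t.quarter_name, p.category, SUM(f.sales)
FROM monthly_sales_fact f
JOIN time_dim t     ON t.dim_id = f.tm_dim_id
JOIN product_dim p  ON p.dim_id = f.pr_dim_id
JOIN location_dim l ON l.dim_id = f.loc_dim_id
WHERE l.country_name = 'USA'
GROUP BY t.quarter_name, p.category
ORDER BY t.quarter_name, p.category
""").fetchall()
print(result)
```

Every join goes from the fact table to a small dimension table on an integer key, which is the shape of query the star schema is designed for.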

27 Data Mart A data mart is a collection of subject areas organized for decision support based on the needs of a given department. Examples: finance has its data mart, marketing has its own, sales has its own, and so on. Each department generally runs its own data mart. Ownership of the data mart allows each department to bypass the control that might coordinate the data found in the different departments. Each department's data mart is peculiar to and specific to its own needs. Typically, the database design for a data mart is built around a star-join structure designed for that department. The data mart contains only a modicum of historical information and is granular only to the point that it suits the needs of the department. The data mart may also include data from outside the organization, such as normative salary data purchased by an HR department.

28 About the Data Mart The structure of the data in the data mart may or may not be compatible with the structure of data in the data warehouse. The amount of historical data found in the data mart is different from the history of the data found in the warehouse. Data warehouses contain robust amounts of history, while data marts usually contain modest amounts of history. The subject areas found in the data mart are only faintly related to the subject areas found in the data warehouse. The relationships found in the data mart may not be those relationships that are found in the data warehouse. The types of queries satisfied in the data mart are quite different from those queries found in the data warehouse.

29 Walmart’s Data Warehouse Half a petabyte in capacity (0.5 x 10^15 bytes). World’s largest DW. Tracks 100 million customers buying billions of products every week. Every sale from every store is transmitted to Bentonville every night. Walmart has more than 18,000 retail stores, employs 2.2 million, and serves 245 million customers every week.

30 Typical Questions How much orange juice did we sell last year, last month, last week in store X? What internal factors (position in store, advertising campaigns...) influence orange juice sales? How much orange juice are we going to sell next week, next month, next year?
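The first question maps directly onto a filtered star join. The sketch below is an illustration only: the table layout is a cut-down version of the monthly sales star schema, the units_sold helper is hypothetical, and all rows are invented. With a monthly grain the "last week" variant cannot be answered; that would need a finer-grained fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (dim_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE product_dim (dim_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE location_dim (dim_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (tm_dim_id INTEGER, pr_dim_id INTEGER,
                         loc_dim_id INTEGER, sales INTEGER);
-- All rows below are made-up sample data.
INSERT INTO time_dim VALUES (1, 11, 2014), (2, 12, 2014), (3, 1, 2015);
INSERT INTO product_dim VALUES (1, 'Orange Juice'), (2, 'Skim Milk');
INSERT INTO location_dim VALUES (1, 'Store X'), (2, 'Store Y');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 40), (2, 1, 1, 55), (3, 1, 1, 30),
    (2, 1, 2, 70), (2, 2, 1, 90);
""")

def units_sold(product, store, year, month=None):
    """Total sales of one product in one store for a year, or for one month of it."""
    sql = """
        SELECT COALESCE(SUM(f.sales), 0)
        FROM sales_fact f
        JOIN time_dim t     ON t.dim_id = f.tm_dim_id
        JOIN product_dim p  ON p.dim_id = f.pr_dim_id
        JOIN location_dim l ON l.dim_id = f.loc_dim_id
        WHERE p.name = ? AND l.name = ? AND t.year = ?
    """
    params = [product, store, year]
    if month is not None:
        sql += " AND t.month = ?"
        params.append(month)
    return conn.execute(sql, params).fetchone()[0]

year_total = units_sold("Orange Juice", "Store X", 2014)
month_total = units_sold("Orange Juice", "Store X", 2014, month=12)
print(year_total, month_total)
```

The second and third questions (influencing factors, forecasts) need statistical analysis on top of such queries and are outside what this sketch covers.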
