
Presentation transcript:

Agenda
- Common terms used in data warehousing software and what they mean
- Difference between a database and a data warehouse: how each is optimised
- What is a cube and what are dimensions?
- High-level overview of PerformancePoint
- Difference between a scorecard and a dashboard
- How do the data warehouse, cube and PerformancePoint relate to one another?
- At which point, and how, should calculated fields be added?
- The purpose and definition of Fact Tables, Dimension Tables, etc.
- Quantifiable benefits organisations achieve through data warehousing

Data Warehouse vs Transaction Database
Transaction Database: handles day-to-day activities
- Takes orders
- Manages production
- Ships orders
- Runs accounts
- Changes frequently (every hour, minute, second)
Data Warehouse: handles planning
- Looks at historical patterns of sales
- Shows trends in demand and production
- Remains mainly static: new data is added and/or corrections are made infrequently

Data Warehouse Overview
Flow: Operational Source Systems -> (Extract) -> Data Staging Area -> (Load) -> Data Presentation Area -> (Access) -> Data Access Tools
Data Staging Area
- Services: clean, combine and standardise; conform dimensions; NO USER QUERY SERVICES
- Data Store: flat files and relational tables
- Processing: sorting and sequential processing
Data Presentation Area
- Data Mart 1: DIMENSIONAL; atomic and summary data; based on a single business process
- DW Bus: Conformed Facts and Dimensions
- Data Mart 2, 3, etc.
Data Access Tools
- Ad hoc query tools
- Report writers
- Analytic and modelling applications
- SQL, MDX, DMX, Excel, Reporting Services, Report Builder, Analysis Services, PerformancePoint

A Data Warehouse
Diagram labels: Data Profiler, Source Systems, Corrections, ETL, Staging Tables, DQ & ETL, Control & Audit, Metadata, Data Quality, DDS, Reports, MDB/Cubes, Pivot Tables, Ad Hoc Queries, Spreadsheets, Reports, Data Mining, Dashboard, Analytics, Reports, Scorecards, Other BI Apps
Name: Description
- Data Profiler: Analyses number of rows in tables, how many rows contain nulls, etc.
- Metadata: Database containing info about the data structure, data meaning, DQ rules, etc.
- ETL: Extract, Transform and Load process
- MDB: Multi Dimensional Database
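The Data Profiler description above (row counts, how many rows contain nulls) can be illustrated with a minimal sketch. The `profile` function and the `customers` rows are hypothetical, invented for this example; a real profiler would read from the source systems and check far more than nulls.

```python
from collections import Counter


def profile(rows):
    """Return the row count and the number of nulls (None) per column."""
    null_counts = Counter()
    columns = set()
    for row in rows:
        columns.update(row)
        for column, value in row.items():
            if value is None:
                null_counts[column] += 1
    return {
        "row_count": len(rows),
        # Counter returns 0 for columns that never contained a null.
        "nulls": {column: null_counts[column] for column in sorted(columns)},
    }


# Hypothetical source rows, invented for illustration only.
customers = [
    {"customer_id": 1, "name": "Alice", "region": "England"},
    {"customer_id": 2, "name": None, "region": "Wales"},
    {"customer_id": 3, "name": "Carol", "region": None},
    {"customer_id": 4, "name": None, "region": "Scotland"},
]

report = profile(customers)
print(report)
```

Output of this kind would typically feed the Data Quality rules held in the Metadata database.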

The Data Warehouse Using an Enterprise Data Warehouse
Diagram labels: Data Profiler, Source Systems, Corrections, ETL, Staging Tables, DQ & ETL, Control & Audit, Metadata, Data Quality, EDW, ETL, DDS, Cubes, BI Apps, Finance Apps, CRM Apps, Reports
Name: Description
- Data Profiler: Analyses number of rows in tables, how many rows contain nulls, etc.
- Metadata: Database containing info about the data structure, data meaning, DQ rules, etc.
- ETL: Extract, Transform and Load process
- EDW: Enterprise Data Warehouse

EXAMPLE OF A MULTI DIMENSIONAL DATABASE

What is a Multi Dimensional Database?
Consider a sales operation:
- We know that last year our total Widget Sales were 53,853
- How were those sales broken down?
Broken down by Quarter:
But we need more detail:
- What were the sales of Left, Right and Ambidextrous Widgets?

Widget Sales in more detail
Columns: Q1, Q2, Q3, Q4, Total
Rows: Sales, Left Handed Widgets, Right Handed Widgets, Ambidextrous Widgets
But we also need to know the sales by area:

Widget Sales in great detail
                        Q1      Q2      Q3      Q4      Total
Sales                   8,278   16,148  18,501  10,916  53,853
Left Handed Widgets
  England
  Scotland
  Wales
  NI
Right Handed Widgets    6,128   6,509   7,707   8,342
  England               2,301   2,565   3,412   3,987
  Scotland              1,387   1,454   1,550   1,651
  Wales
  NI                    1,900   1,890   1,980   2,014
Ambidextrous Widgets    1,500   1,650   1,499   1,663
  England
  Scotland
  Wales
  NI

The Cube
The table from the previous slide, with its three sets of labels counted: 4 labels (Q1 to Q4), 3 labels (Left Handed, Right Handed and Ambidextrous Widgets) and 4 labels (England, Scotland, Wales, NI).
This structure can hold a certain number of data elements. The number of elements is the number of labels in each set multiplied together, i.e. this structure can hold 4 x 3 x 4 = 48 data elements.
Which makes it look a lot like a cube…
That’s as far as the cube analogy can go, because a real data warehouse will have many different sets of independent labels. They are called Dimensions.
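The 4 x 3 x 4 arithmetic can be checked with a short sketch. The label lists come from the slide; the variable names are assumptions made for this example.

```python
from itertools import product

quarters = ["Q1", "Q2", "Q3", "Q4"]
products = ["Left Handed Widgets", "Right Handed Widgets", "Ambidextrous Widgets"]
regions = ["England", "Scotland", "Wales", "NI"]

# One cell for every combination of one label from each dimension.
cells = list(product(quarters, products, regions))

# The same count, obtained by multiplying the sizes of the dimensions.
cell_count = len(quarters) * len(products) * len(regions)

print(len(cells), cell_count)
```

Adding a fourth dimension (for example, sales channel) would multiply the cell count again, which is why the picture stops looking like a cube.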

Dimension Tables
Dimension Tables contain the names of each member of the dimension:
Product_ID (Primary Key)  Product_Name         Category
101                       Left Handed Widget   Retail
102                       Right Handed Widget  Retail
103                       Ambidextrous Widget  Specialist

Fact Table
Columns: Region_ID, Product_ID, Quarter, Units, Price

Fact Table & Dimension Table Relationship
Fact Table columns: Region_ID, Product_ID, Quarter, Units, Price
Dimension Table:
Product_ID  Product_Name
101         Left Handed Widget
102         Right Handed Widget
103         Ambidextrous Widget
One-to-Many Relationship
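A minimal sketch of this relationship, using Python's built-in `sqlite3` module. The table names `dim_product` and `fact_sales` and all fact rows are assumptions invented for illustration; only the column names and the three products come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    Product_ID   INTEGER PRIMARY KEY,
    Product_Name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    Region_ID  INTEGER,
    Product_ID INTEGER REFERENCES dim_product (Product_ID),
    Quarter    TEXT,
    Units      INTEGER,
    Price      REAL
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [
        (101, "Left Handed Widget"),
        (102, "Right Handed Widget"),
        (103, "Ambidextrous Widget"),
    ],
)

# Hypothetical fact rows: one product (the "one" side) appears in many rows.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, "Q1", 10, 5.0),
        (1, 101, "Q2", 12, 5.0),
        (2, 102, "Q1", 7, 6.0),
        (2, 103, "Q1", 3, 9.0),
    ],
)

# Join the fact table to the dimension to report units by product name.
rows = conn.execute("""
    SELECT p.Product_Name, SUM(f.Units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.Product_ID = f.Product_ID
    GROUP BY p.Product_ID, p.Product_Name
    ORDER BY p.Product_ID
""").fetchall()
print(rows)
```

The fact table stores only the Product_ID; the readable name is looked up in the dimension table at query time.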

Common terms used in data warehousing and what they mean - 1
Normalised Data Structure
- Structure designed for handling live transactions
Dimensional Data Structure
- AKA Denormalised Data Structure
- Structure designed for querying
Operational Data Store
- Often a copy of a transactional database
- Updated regularly from transactional systems
- May be used for reporting

Common terms used in data warehousing and what they mean - 2
Dimensional Modelling
- Fact Table or Measure Table
  Holds historical records of events that occurred in a transactional system
- Conformed Facts
  Facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts can have the same name in different tables and can be combined and compared mathematically
- Dimension Table
  Has a number of Attributes, e.g. Product Name, Category, Colour, etc.
  Used to slice and dice the data in the Fact Table
- Attribute
  Property of a Dimension
- Conformed Dimension
  Dimensions are conformed when they are exactly the same (including the keys) or one is a perfect subset of the other
  The row headers produced in answer sets from two different conformed dimensions must be able to be matched perfectly
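The Conformed Dimension rule can be sketched as a simple check. This is an illustration under assumptions: each dimension is modelled as a dict from key to attribute row, "subset" is read as a subset of rows, and the function name and sample data are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """True if the dimensions are identical, or one is a perfect row subset.

    Each dimension maps a key to its attribute row. Every key in the smaller
    dimension must exist in the larger one with exactly the same attributes.
    """
    smaller, larger = sorted((dim_a, dim_b), key=len)
    return all(
        key in larger and larger[key] == row for key, row in smaller.items()
    )


product_full = {
    101: ("Left Handed Widget", "Retail"),
    102: ("Right Handed Widget", "Retail"),
    103: ("Ambidextrous Widget", "Specialist"),
}
# A subset with matching keys and attributes: conformed.
product_retail = {
    101: ("Left Handed Widget", "Retail"),
    102: ("Right Handed Widget", "Retail"),
}
# Same key, different attribute value (invented): not conformed.
product_clashing = {101: ("Left Handed Widget", "Wholesale")}

print(is_conformed(product_full, product_retail))
print(is_conformed(product_full, product_clashing))
```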

Conformed Dimensions - Example
Common Dimensions: Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper
Business Processes (one x per dimension used):
- Retail Sales: x x x x
- Retail Inventory: x x x
- Retail Deliveries: x x x
- Warehouse Inventory: x x x x
- Warehouse Deliveries: x x x x
- Purchase Orders: x x x x x x

Facts and Dimensions - Example

Common terms used in data warehousing and what they mean - 3
Slowly Changing Dimension (SCD)
- A Dimension where the rows change slowly over time. An example would be a product Dimension where the Price attribute changes from year to year as a result of marketing/profitability issues.
- Type 1 SCD: Values are overwritten when they change
- Type 2 SCD: A new row is written when the value of an attribute changes
- Type 3 SCD: The previous value is put into an “Old Value” column
Data Mart
- A logical and physical subset of the data warehouse’s presentation area
- Data Marts can be tied together using Drill-Across queries when their dimensions are conformed
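The three SCD types can be compared side by side in a small sketch. Everything here is illustrative: the function names, the `valid_from`/`valid_to` columns used to mark the current Type 2 row, and the starting price of 45.00 are assumptions, not part of the slides.

```python
from datetime import date


def scd_type1(row, attribute, new_value):
    """Type 1: overwrite the value; no history is kept."""
    updated = dict(row)
    updated[attribute] = new_value
    return updated


def scd_type2(rows, natural_key, attribute, new_value, changed_on):
    """Type 2: close the current row and append a new row for the new value."""
    result = [dict(r) for r in rows]
    current = next(
        r for r in result
        if r["product_id"] == natural_key and r["valid_to"] is None
    )
    current["valid_to"] = changed_on
    new_row = dict(current)
    new_row.update({
        "product_key": max(r["product_key"] for r in result) + 1,
        attribute: new_value,
        "valid_from": changed_on,
        "valid_to": None,
    })
    result.append(new_row)
    return result


def scd_type3(row, attribute, new_value):
    """Type 3: move the previous value into an "old value" column."""
    updated = dict(row)
    updated["old_" + attribute] = row[attribute]
    updated[attribute] = new_value
    return updated


# Hypothetical product row; the starting price is invented.
product = {
    "product_key": 1,
    "product_id": 101,
    "name": "Left Handed Widget",
    "price": 45.00,
    "valid_from": date(2020, 1, 1),
    "valid_to": None,
}

t1 = scd_type1(product, "price", 47.90)
t2 = scd_type2([product], 101, "price", 47.90, date(2021, 1, 1))
t3 = scd_type3(product, "price", 47.90)
print(t1["price"], len(t2), t3["old_price"])
```

Type 2 is the only variant that keeps full history, at the cost of extra rows and a key that is not simply the product ID.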

Common terms used in data warehousing and what they mean - 4
Primary Key
- Unique identifier for a record
Foreign Key
- A value in a record that refers to a Primary Key in another table
Surrogate Key
- AKA meaningless key, integer key, nonnatural key, artificial key, synthetic key
- A new primary key that is created in a table to ensure uniqueness regardless of the source of new records. E.g. two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys
Grain
- The meaning of a single row in a table. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as “Sales volume by Day by Product by Store”. Each record in this fact table is therefore uniquely defined by a day, product and store. In this case you would not be able to look at sales by the hour, nor could you look at individual sales
Granularity
- The level of detail captured in a data warehouse.
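The Grain example ("Sales volume by Day by Product by Store") can be sketched by rolling individual transactions up to that grain. The transactions, store names and variable names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (timestamp, product, store, units).
transactions = [
    ("2024-03-01T09:15", "Left Handed Widget", "Store A", 2),
    ("2024-03-01T14:40", "Left Handed Widget", "Store A", 3),
    ("2024-03-01T10:05", "Right Handed Widget", "Store A", 1),
    ("2024-03-02T11:30", "Left Handed Widget", "Store B", 4),
]

# Fact table at the grain day x product x store.
sales_fact = defaultdict(int)
for timestamp, product, store, units in transactions:
    day = timestamp[:10]  # drop the time of day: the grain is per day
    sales_fact[(day, product, store)] += units

for key, units in sorted(sales_fact.items()):
    print(key, units)
```

Once the two morning and afternoon sales are merged into a single row, the hour and the individual sale can no longer be recovered, which is the limitation the slide describes.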

Surrogate Key (AKA meaningless key, integer key, nonnatural key, artificial key, synthetic key)
- Data Warehouses integrate data from multiple sources, and therefore they can’t rely upon an application key in one table being different from an application key in another table in another database.
- A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
- Surrogate keys can be integers even if the application key isn’t. This saves space.
- E.g. two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys.
- E.g. data changes over time. As an example, if the price of Left Handed Widgets is increased to 47.90, we need to keep the old data and add new data. Therefore we need a key that doesn’t depend solely upon the product ID.
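A minimal sketch of the multi-source case: the same CustomerID arriving from two systems gets two different integer keys in the warehouse. The class name, the source names `crm` and `billing`, and the key `C-1001` are all hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assign warehouse integer keys to (source system, application key) pairs."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, source, application_key):
        pair = (source, application_key)
        if pair not in self._keys:
            # First time this pair is seen: hand out the next integer.
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyGenerator()
crm_key = keys.key_for("crm", "C-1001")
billing_key = keys.key_for("billing", "C-1001")  # same ID, different customer
crm_again = keys.key_for("crm", "C-1001")        # already known: same key

print(crm_key, billing_key, crm_again)
```

This sketch covers only the multi-source example. For the price-change example, a new surrogate key would also be issued each time a new version of the same product row is written.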

Star Schema

Snowflake Schema
Diagram labels: Star, Snowflake