Published by Eugene Richardson. Modified over 9 years ago
1
Agenda
– Common terms used in data warehousing and what they mean
– The difference between a database and a data warehouse, and how each is optimised
– What is a cube and what are dimensions?
– A high-level overview of PerformancePoint
– The difference between a scorecard and a dashboard
– How do the data warehouse, cube and PerformancePoint relate to one another?
– At which point, and how, should calculated fields be added?
– The purpose and definition of fact tables, dimension tables, etc.
– Quantifiable benefits organisations achieve through data warehousing
2
Data Warehouse vs Transaction Database
Transaction Database
– Handles day-to-day activities: takes orders, manages production, ships orders, runs accounts
– Changes frequently (every hour, minute, second)
Data Warehouse
– Handles planning: looks at historical patterns of sales, shows trends in demand and production
– Remains mainly static: new data is added and/or corrections are made infrequently
3
Data Warehouse Overview
Operational Source Systems
→ Extract →
Data Staging Area
– Services: clean, combine and standardise; conform dimensions
– NO USER QUERY SERVICES
– Data store: flat files and relational tables
– Processing: sorting and sequential processing
→ Load →
Data Presentation Area
– Data Mart 1: dimensional; atomic and summary data; based on a single business process
– DW Bus: conformed facts and dimensions
– Data Mart 2, 3, etc.
→ Access →
Data Access Tools
– Ad hoc query tools, report writers, analytic and modelling applications
– Query languages: SQL, MDX, DMX
– Tools: Excel, Reporting Services, Report Builder, Analysis Services, PerformancePoint
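The Extract, staging (clean, combine, standardise, conform) and Load steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real ETL tool: the two source systems (`source_a`, `source_b`), their column names and the figures are hypothetical, and rows are plain dicts rather than database tables.

```python
# Hypothetical extracts from two operational source systems that spell
# product names and regions differently.
source_a = [
    {"product": "left handed widget ", "region": "england", "qty": "300"},
    {"product": "Right Handed Widget", "region": "Scotland", "qty": "200"},
]
source_b = [
    {"product": "LEFT HANDED WIDGET", "region": "ENGLAND", "qty": "30"},
]


def extract(*sources):
    """Extract: copy raw rows from every source into the staging area,
    tagging each row with the source it came from."""
    staging = []
    for name, rows in sources:
        for row in rows:
            staging.append({**row, "source": name})
    return staging


def transform(staging):
    """Staging services: clean, standardise and type-convert each row."""
    return [
        {
            "product": row["product"].strip().title(),
            "region": row["region"].strip().title(),
            "units": int(row["qty"]),
        }
        for row in staging
    ]


def load(rows):
    """Load: build one conformed product dimension and a fact table keyed on it."""
    product_dim = {}  # product name -> integer key
    facts = []
    for row in rows:
        key = product_dim.setdefault(row["product"], len(product_dim) + 1)
        facts.append((key, row["region"], row["units"]))
    return product_dim, facts


product_dim, facts = load(transform(extract(("A", source_a), ("B", source_b))))
print(product_dim)  # {'Left Handed Widget': 1, 'Right Handed Widget': 2}
print(facts)        # [(1, 'England', 300), (2, 'Scotland', 200), (1, 'England', 30)]
```

Because the cleaning happens before loading, "left handed widget " from one system and "LEFT HANDED WIDGET" from the other end up as the same dimension member.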
4
A Data Warehouse
Components shown in the diagram:
– Source Systems and the Data Profiler
– ETL, Staging Tables and the DDS (Dimensional Data Store)
– Data Quality (DQ) and Corrections
– Supporting databases: DQ & ETL Control & Audit, and Metadata
– MDB/Cubes
– Front-end uses: pivot tables, ad hoc queries, spreadsheets, reports, data mining, dashboards, analytics, scorecards and other BI apps

| Name | Description |
|---|---|
| Data Profiler | Analyses the number of rows in tables, how many rows contain nulls, etc. |
| Metadata | Database containing information about the data structure, data meaning, DQ rules, etc. |
| ETL | Extract, Transform and Load process |
| MDB | Multi Dimensional Database |
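The Data Profiler's job (counting rows and counting nulls per column) can be sketched as below. This is an illustrative helper, assuming a table is a list of dict rows; the `profile` function and the sample `customers` rows are hypothetical.

```python
def profile(table):
    """Return the row count and the number of null (None or missing) values per column."""
    columns = sorted({col for row in table for col in row})
    nulls = {col: sum(1 for row in table if row.get(col) is None) for col in columns}
    return {"rows": len(table), "nulls": nulls}


# Hypothetical source table to be profiled.
customers = [
    {"CustomerID": 1, "Name": "Smith", "Postcode": None},
    {"CustomerID": 2, "Name": None, "Postcode": "EH1 1AA"},
    {"CustomerID": 3, "Name": "Jones", "Postcode": None},
]

result = profile(customers)
print(result)  # {'rows': 3, 'nulls': {'CustomerID': 0, 'Name': 1, 'Postcode': 2}}
```

Results like these are the kind of information that would be stored in the Metadata database alongside DQ rules.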
5
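Using an Enterprise Data Warehouse
Components shown in the diagram:
– Source Systems, Data Profiler, ETL, Staging Tables, Data Quality and Corrections
– Supporting databases: DQ & ETL Control & Audit, and Metadata
– EDW (Enterprise Data Warehouse), loaded by ETL
– A further ETL step from the EDW into the DDS and cubes (labelled "The Data Warehouse" and "Cubes")
– Downstream applications: BI apps, finance apps, CRM apps and reports

| Name | Description |
|---|---|
| Data Profiler | Analyses the number of rows in tables, how many rows contain nulls, etc. |
| Metadata | Database containing information about the data structure, data meaning, DQ rules, etc. |
| ETL | Extract, Transform and Load process |
| EDW | Enterprise Data Warehouse |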
6
EXAMPLE OF A MULTI DIMENSIONAL DATABASE
7
What is a Multi Dimensional Database?
Consider a sales operation:
– We know that last year our total Widget Sales were 53,853
– How were those sales broken down? Broken down by quarter:
But we need more detail
– What were the sales of Left Handed, Right Handed and Ambidextrous Widgets?
8
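Widget Sales in more detail

| | Q1 | Q2 | Q3 | Q4 | Total |
|---|---|---|---|---|---|
| Total Sales | 8,288 | 16,148 | 18,501 | 10,916 | 53,853 |
| Left Handed Widgets | 660 | 740 | 794 | 911 | |
| Right Handed Widgets | 6,128 | 6,509 | 7,707 | 8,342 | |
| Ambidextrous Widgets | 1,500 | 1,650 | 1,499 | 1,663 | |

But we also need to know the sales by area: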
9
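Widget Sales in great detail

| | Q1 | Q2 | Q3 | Q4 | Total |
|---|---|---|---|---|---|
| Sales | 8,278 | 16,148 | 18,501 | 10,916 | 53,853 |
| Left Handed Widgets | 650 | 740 | 794 | 911 | |
| England | 300 | 330 | 355 | 461 | |
| Scotland | 200 | 235 | 260 | 261 | |
| Wales | 150 | 165 | 179 | 181 | |
| NI | | 10 | | 8 | |
| Right Handed Widgets | 6,128 | 6,509 | 7,707 | 8,342 | |
| England | 2,301 | 2,565 | 3,412 | 3,987 | |
| Scotland | 1,387 | 1,454 | 1,550 | 1,651 | |
| Wales | 540 | 600 | 765 | 690 | |
| NI | 1,900 | 1,890 | 1,980 | 2,014 | |
| Ambidextrous Widgets | 1,500 | 1,650 | 1,499 | 1,663 | |
| England | 799 | 808 | 789 | 901 | |
| Scotland | 400 | 501 | 367 | 460 | |
| Wales | 300 | 341 | 320 | 299 | |
| NI | 1 | | 23 | 3 | |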
10
The Cube
The detailed sales table has three independent sets of labels:
– 4 labels for time (Q1, Q2, Q3, Q4)
– 3 labels for product (Left Handed, Right Handed and Ambidextrous Widgets)
– 4 labels for area (England, Scotland, Wales, NI)
This structure can hold a certain number of data elements: the number of labels in each set multiplied together, i.e. 4 × 3 × 4 = 48 data elements.
Which makes it look a lot like a cube…
That's as far as the cube analogy can go, because a real data warehouse will have many different sets of independent labels. They are called Dimensions.
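The "labels multiplied together" rule can be checked with a small Python sketch that stores the cube as a dict keyed by one label from each dimension. The Left Handed Widget figures come from the detailed sales table (blank NI cells treated as 0); the `slice_total` helper is a hypothetical name used only for illustration, not an OLAP API.

```python
from itertools import product

quarters = ["Q1", "Q2", "Q3", "Q4"]
products = ["Left Handed", "Right Handed", "Ambidextrous"]
regions = ["England", "Scotland", "Wales", "NI"]

# One cell per combination of labels: 4 x 3 x 4 = 48 cells, all starting empty (0).
cube = {cell: 0 for cell in product(quarters, products, regions)}

# Fill the Left Handed Widget cells from the detailed sales table.
left_handed = {
    "England": [300, 330, 355, 461],
    "Scotland": [200, 235, 260, 261],
    "Wales": [150, 165, 179, 181],
    "NI": [0, 10, 0, 8],
}
for region, units in left_handed.items():
    for quarter, value in zip(quarters, units):
        cube[(quarter, "Left Handed", region)] = value


def slice_total(cube, quarter=None, prod=None, region=None):
    """Sum every cell matching the given labels; None means 'all labels'."""
    return sum(
        value
        for (q, p, r), value in cube.items()
        if (quarter is None or q == quarter)
        and (prod is None or p == prod)
        and (region is None or r == region)
    )


print(len(cube))                                                # 48
print(slice_total(cube, quarter="Q2", prod="Left Handed"))      # 740
print(slice_total(cube, prod="Left Handed", region="England"))  # 1446
```

Adding a fourth set of labels (for example, sales channel) would simply add a fourth element to each key, which is where the three-dimensional cube picture stops working.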
11
Dimension Tables
Dimension tables contain the names of each member of the dimension:

| Product_ID (Primary Key) | Product_Name | Category |
|---|---|---|
| 101 | Left Handed Widget | Retail |
| 102 | Right Handed Widget | Retail |
| 103 | Ambidextrous Widget | Specialist |
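As a sketch, the product dimension above can be created in SQLite (via Python's built-in `sqlite3` module) with Product_ID as the primary key. The table name `dim_product` is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_product (
        Product_ID   INTEGER PRIMARY KEY,  -- one row per product
        Product_Name TEXT NOT NULL,
        Category     TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [
        (101, "Left Handed Widget", "Retail"),
        (102, "Right Handed Widget", "Retail"),
        (103, "Ambidextrous Widget", "Specialist"),
    ],
)

# A second row with an existing Product_ID violates the primary key.
try:
    conn.execute("INSERT INTO dim_product VALUES (101, 'Duplicate', 'Retail')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Attributes such as Category are what queries slice and dice by.
retail = conn.execute(
    "SELECT Product_Name FROM dim_product WHERE Category = 'Retail' ORDER BY Product_ID"
).fetchall()
print(duplicate_rejected)  # True
print(retail)              # [('Left Handed Widget',), ('Right Handed Widget',)]
```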
12
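Fact Table

| Region_ID | Product_ID | Quarter | Units | Price |
|---|---|---|---|---|
| 1 | 101 | 1 | 300 | 45.20 |
| 1 | 101 | 2 | 330 | 45.20 |
| 1 | 101 | 3 | 355 | 45.20 |
| 1 | 101 | 4 | 461 | 44.00 |
| 1 | 102 | 1 | 200 | 39.00 |
| 1 | 102 | 2 | 235 | 39.00 |
| 1 | 102 | 3 | 260 | 38.50 |
| 1 | 102 | 4 | 261 | 38.50 |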
13
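Fact Table & Dimension Table Relationship
Fact table:

| Region_ID | Product_ID | Quarter | Units | Price |
|---|---|---|---|---|
| 1 | 101 | 1 | 300 | 45.20 |
| 1 | 101 | 2 | 330 | 45.20 |
| 1 | 101 | 3 | 355 | 45.20 |
| 1 | 101 | 4 | 461 | 44.00 |
| 1 | 102 | 1 | 200 | 39.00 |
| 1 | 102 | 2 | 235 | 39.00 |
| 1 | 102 | 3 | 260 | 38.50 |
| 1 | 102 | 4 | 261 | 38.50 |

Dimension table:

| Product_ID | Product_Name |
|---|---|
| 101 | Left Handed Widget |
| 102 | Right Handed Widget |
| 103 | Ambidextrous Widget |

One-to-Many Relationship: each Product_ID appears once in the dimension table (where it is the primary key) and many times in the fact table.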
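The one-to-many join can be sketched in SQLite using the fact and dimension rows above. The table names are hypothetical, and the revenue measure (Units × Price) is an assumed calculated field added for illustration; the slides do not define it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        Product_ID   INTEGER PRIMARY KEY,
        Product_Name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        Region_ID  INTEGER NOT NULL,
        Product_ID INTEGER NOT NULL REFERENCES dim_product(Product_ID),
        Quarter    INTEGER NOT NULL,
        Units      INTEGER NOT NULL,
        Price      REAL    NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?)",
    [(101, "Left Handed Widget"), (102, "Right Handed Widget"), (103, "Ambidextrous Widget")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, 1, 300, 45.20), (1, 101, 2, 330, 45.20),
        (1, 101, 3, 355, 45.20), (1, 101, 4, 461, 44.00),
        (1, 102, 1, 200, 39.00), (1, 102, 2, 235, 39.00),
        (1, 102, 3, 260, 38.50), (1, 102, 4, 261, 38.50),
    ],
)

# Each dimension row matches many fact rows; aggregate them per product.
rows = conn.execute(
    """
    SELECT d.Product_Name,
           COUNT(*)                        AS fact_rows,
           SUM(f.Units)                    AS units,
           ROUND(SUM(f.Units * f.Price), 2) AS revenue  -- assumed calculated measure
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.Product_ID = f.Product_ID
    GROUP BY d.Product_ID, d.Product_Name
    ORDER BY d.Product_ID
    """
).fetchall()
for row in rows:
    print(row)
# ('Left Handed Widget', 4, 1446, 64806.0)
# ('Right Handed Widget', 4, 956, 37023.5)
```

Product 103 has no fact rows, so the inner join leaves it out; a LEFT JOIN from the dimension would show it with no sales.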
14
Common terms used in data warehousing and what they mean - 1
Normalised Data Structure
– Structure designed for handling live transactions
Dimensional Data Structure
– AKA Denormalised Data Structure
– Structure designed for querying
Operational Data Store
– Often a copy of a transactional database
– Updated regularly from transactional systems
– May be used for reporting
15
Common terms used in data warehousing and what they mean - 2
Dimensional Modelling
– Fact Table or Measure Table: holds historical records of events that occurred in a transactional system
– Conformed Facts: facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts can have the same name in different tables and can be combined and compared mathematically
– Dimension Table: has a number of attributes, e.g. Product Name, Category, Colour, etc. Used to slice and dice the data in the fact table
– Attribute: a property of a dimension
– Conformed Dimension: dimensions are conformed when they are exactly the same (including the keys) or one is a perfect subset of the other. The row headers produced in answer sets from two different conformed dimensions must be able to be matched perfectly
16
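Conformed Dimensions - Example
Rows are business processes; columns are the common dimensions. An x marks a dimension used by that process.

| Business Process | Date | Product | Store | Promotion | Warehouse | Vendor | Contract | Shipper |
|---|---|---|---|---|---|---|---|---|
| Retail Sales | x | x | x | x | | | | |
| Retail Inventory | x | x | x | | | | | |
| Retail Deliveries | x | x | x | | | | | |
| Warehouse Inventory | x | x | | | x | x | | |
| Warehouse Deliveries | x | x | | | x | x | | |
| Purchase Orders | x | x | | | x | x | x | x |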
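Because Retail Sales and Retail Inventory share the conformed Product dimension, their answer sets can be combined with a drill-across query: each fact table is queried and aggregated on its own, then the results are matched on the shared row headers. The Python sketch below illustrates the idea; the fact rows are made-up illustrative figures and `aggregate`/`drill_across` are hypothetical helper names.

```python
from collections import defaultdict

# Conformed product dimension shared by both business processes.
product_dim = {101: "Left Handed Widget", 102: "Right Handed Widget", 103: "Ambidextrous Widget"}

# Hypothetical fact rows from two business processes: (Product_ID, value).
sales_facts = [(101, 300), (101, 330), (102, 200)]    # units sold
inventory_facts = [(101, 120), (102, 80), (103, 40)]  # units on hand


def aggregate(facts):
    """Query one fact table on its own, grouped by the conformed product name."""
    totals = defaultdict(int)
    for product_id, value in facts:
        totals[product_dim[product_id]] += value
    return totals


def drill_across(*answer_sets):
    """Merge separately aggregated answer sets on their shared row headers."""
    headers = sorted(set().union(*answer_sets))
    return {h: tuple(a.get(h, 0) for a in answer_sets) for h in headers}


result = drill_across(aggregate(sales_facts), aggregate(inventory_facts))
for name, (sold, on_hand) in result.items():
    print(name, sold, on_hand)
# Ambidextrous Widget 0 40
# Left Handed Widget 630 120
# Right Handed Widget 200 80
```

Aggregating each fact table before merging avoids the double counting that a direct join of two fact tables can produce; the merge only works because both sides use the same product names.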
17
Facts and Dimensions - Example
18
Common terms used in data warehousing and what they mean - 3
Slowly Changing Dimension (SCD)
– A dimension whose rows change slowly over time. An example would be a product dimension where the Price attribute changes from year to year as a result of marketing/profitability issues.
– Type 1 SCD: values are overwritten when they change
– Type 2 SCD: a new row is written when the value of an attribute changes
– Type 3 SCD: the previous value is put into an “Old Value” column
Data Mart
– A logical and physical subset of the data warehouse's presentation area
– Data marts can be tied together using drill-across queries when their dimensions are conformed
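The three SCD types can be contrasted with a small Python sketch using the Left Handed Widget price change (45.20 to 47.90). The `ProductRow` layout, including the `old_price` column for Type 3 and the `is_current` flag for Type 2, is an assumed design for illustration; real implementations often also carry effective-from/to dates.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class ProductRow:
    surrogate_key: int
    product_id: int                    # application key from the source system
    price: float
    old_price: Optional[float] = None  # used only by Type 3
    is_current: bool = True            # used only by Type 2


def scd_type1(rows, product_id, new_price):
    """Type 1: overwrite the value in place; the old price is lost."""
    return [replace(r, price=new_price) if r.product_id == product_id else r for r in rows]


def scd_type2(rows, product_id, new_price):
    """Type 2: retire the current row and write a new row with a new surrogate key."""
    next_key = max(r.surrogate_key for r in rows) + 1
    out = []
    for r in rows:
        if r.product_id == product_id and r.is_current:
            out.append(replace(r, is_current=False))                 # history kept
            out.append(ProductRow(next_key, product_id, new_price))  # new current row
        else:
            out.append(r)
    return out


def scd_type3(rows, product_id, new_price):
    """Type 3: move the previous value into the 'old value' column."""
    return [
        replace(r, old_price=r.price, price=new_price) if r.product_id == product_id else r
        for r in rows
    ]


rows = [ProductRow(1, 101, 45.20)]
t1 = scd_type1(rows, 101, 47.90)  # one row, price 47.9
t2 = scd_type2(rows, 101, 47.90)  # two rows: key 1 (45.2, retired) and key 2 (47.9, current)
t3 = scd_type3(rows, 101, 47.90)  # one row, price 47.9, old_price 45.2
```

Only Type 2 keeps the full history, which is also why it needs a surrogate key rather than Product_ID alone.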
19
Common terms used in data warehousing and what they mean - 4
Primary Key
– Unique identifier for a record
Foreign Key
– A value in a record that refers to a primary key in another table
Surrogate Key
– AKA meaningless key, integer key, nonnatural key, artificial key, synthetic key
– A new primary key that is created in a table to ensure uniqueness regardless of the source of new records. E.g. two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a dimensional data warehouse, a new primary key is added which has no relationship to the sources' primary keys
Grain
– The meaning of a single row in a table. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. In this case you would not be able to look at sales by the hour, nor could you look at individual sales
Granularity
– The level of detail captured in a data warehouse
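The "Sales volume by Day by Product by Store" grain can be illustrated by rolling individual sales up to one row per (day, product, store). The transactions, store names and timestamps below are hypothetical; the point is that once loaded at this grain, the hour of each sale is no longer available.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical individual sales: (timestamp, product, store, units).
transactions = [
    (datetime(2024, 3, 1, 9, 15), "Left Handed Widget", "Edinburgh", 2),
    (datetime(2024, 3, 1, 16, 40), "Left Handed Widget", "Edinburgh", 1),
    (datetime(2024, 3, 1, 11, 5), "Right Handed Widget", "Cardiff", 4),
    (datetime(2024, 3, 2, 10, 0), "Left Handed Widget", "Edinburgh", 3),
]


def grain_key(ts, product, store):
    """The declared grain: one fact row per day, per product, per store."""
    return (ts.date(), product, store)


fact_sales = Counter()
for ts, product, store, units in transactions:
    fact_sales[grain_key(ts, product, store)] += units

for key, units in sorted(fact_sales.items()):
    print(key, units)
```

The two Edinburgh sales on 1 March collapse into one fact row with 3 units, so four transactions become three fact rows.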
20
Surrogate Key (AKA meaningless key, integer key, nonnatural key, artificial key, synthetic key)
– Data warehouses integrate data from multiple sources and therefore can't rely upon an application key in one table being different from an application key in another table in another database.
– A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
– Surrogate keys can be integers even if the application key isn't. This saves space.
E.g. two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a dimensional data warehouse, a new primary key is added which has no relationship to the sources' primary keys.
E.g. data changes over time. As an example, if the price of Left Handed Widgets is increased from 45.20 to 47.90, we need to keep the old data and add new data. Therefore we need a key that doesn't depend solely upon the product ID.
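The multi-source case can be sketched as a lookup that assigns a new integer key to each (source system, application key) pair. The class name `SurrogateKeyMap` and the source names "CRM" and "Billing" are hypothetical. This sketch handles only key collisions between sources; versioning rows over time is the Type 2 SCD case.

```python
class SurrogateKeyMap:
    """Assigns warehouse surrogate keys to (source system, application key) pairs."""

    def __init__(self):
        self._keys = {}

    def key_for(self, source, natural_key):
        # The same CustomerID from two sources gets two different surrogate keys,
        # while a pair that has been seen before maps back to the same key.
        pair = (source, natural_key)
        if pair not in self._keys:
            self._keys[pair] = len(self._keys) + 1
        return self._keys[pair]


keys = SurrogateKeyMap()
crm_key = keys.key_for("CRM", "C001")          # 1
billing_key = keys.key_for("Billing", "C001")  # 2: same CustomerID, different customer
repeat_key = keys.key_for("CRM", "C001")       # 1: already known
print(crm_key, billing_key, repeat_key)        # 1 2 1
```

The application key "C001" is a string, but the warehouse key is a small integer, which is the space saving mentioned above.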
21
Star Schema
22
Snowflake Schema
[Diagram: a star schema compared with a snowflake schema]
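The difference between the two schemas can be sketched in SQLite: in the star version, Category sits directly in the product dimension; in the snowflake version, it is normalised out into its own table, so reaching it needs an extra join. The table names and the fact figures (including 150 units for product 103) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star: the product dimension is a single denormalised table.
    CREATE TABLE star_product (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT, Category TEXT);

    -- Snowflake: Category is normalised out into its own table.
    CREATE TABLE snow_category (Category_ID INTEGER PRIMARY KEY, Category TEXT);
    CREATE TABLE snow_product (
        Product_ID   INTEGER PRIMARY KEY,
        Product_Name TEXT,
        Category_ID  INTEGER REFERENCES snow_category(Category_ID)
    );

    CREATE TABLE fact_sales (Product_ID INTEGER, Units INTEGER);
    """
)
conn.executemany(
    "INSERT INTO star_product VALUES (?, ?, ?)",
    [(101, "Left Handed Widget", "Retail"),
     (102, "Right Handed Widget", "Retail"),
     (103, "Ambidextrous Widget", "Specialist")],
)
conn.executemany("INSERT INTO snow_category VALUES (?, ?)", [(1, "Retail"), (2, "Specialist")])
conn.executemany(
    "INSERT INTO snow_product VALUES (?, ?, ?)",
    [(101, "Left Handed Widget", 1),
     (102, "Right Handed Widget", 1),
     (103, "Ambidextrous Widget", 2)],
)
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(101, 300), (102, 200), (103, 150)])

# Star: one join from the fact table to the dimension.
star = conn.execute(
    """
    SELECT p.Category, SUM(f.Units) FROM fact_sales AS f
    JOIN star_product AS p ON p.Product_ID = f.Product_ID
    GROUP BY p.Category ORDER BY p.Category
    """
).fetchall()

# Snowflake: an extra join is needed to reach the Category attribute.
snowflake = conn.execute(
    """
    SELECT c.Category, SUM(f.Units) FROM fact_sales AS f
    JOIN snow_product AS p ON p.Product_ID = f.Product_ID
    JOIN snow_category AS c ON c.Category_ID = p.Category_ID
    GROUP BY c.Category ORDER BY c.Category
    """
).fetchall()

print(star)               # [('Retail', 500), ('Specialist', 150)]
print(star == snowflake)  # True
```

Both designs give the same answer; the trade-off is fewer joins (star) against less repetition of attribute values (snowflake).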