Published by Brent Richardson. Modified over 9 years ago.
1
ISCG 6425 Data Warehousing (DW)
Week 10: Other topics in DW
Lecturer: Dr. Sira Yongchareon
Department of Computing, Faculty of Creative Industries and Business
Copyright © 2014, Sira Yongchareon
2
ISCG6425 Data Warehousing, Semester 1, 2014 (by Sira Yongchareon) Department of Computing, Faculty of Creative Industries and Business

Outline
- Advanced dimensional modeling
- Slowly-Changing Dimensions
- Data Hierarchy
- Physical Database Design
- OLAP Cubes and Operations
3
Slowly Changing Dimension (SCD)
- A Slowly Changing Dimension (SCD) is a dimension whose attribute values change slowly and irregularly over time, rather than on a regular, time-based schedule.
- Changes in dimension attributes need to be tracked in order to report historical data.
- For example, how will you deal with the customer dimension data if a customer moves from New Zealand to Australia?

Before:
CustomerKey | CustomerID | Name | Country
1 | John1 | John | New Zealand

After:
CustomerKey | CustomerID | Name | Country
1 | John1 | John | Australia
4
Slowly Changing Dimension (SCD)
This lecture covers six SCD types:
- Type 0: The passive method
- Type 1: Overwriting the old value
- Type 2: Creating a new additional record
- Type 3: Adding a new column
- Type 4: Using a historical table
- Type 6: Combining the approaches of Types 1, 2 and 3 (1 + 2 + 3 = 6)
The name "Type 6" comes from the sum 1 + 2 + 3, which is why this list has no Type 5.
5
SCD Type 0
- SCD Type 0: the passive method.
- No special action is performed when dimension data changes.
- Some dimension data remains as it was when first inserted; other data may be overwritten.
6
SCD Type 1
- SCD Type 1: overwriting the old value.
- NO history of dimension changes is kept in the database.
- The old dimension value is simply overwritten by the new one.
- Easy to maintain, and often used when the change is caused by a processing correction (e.g., a misspelling).

Before:
CustomerKey | CustomerID | Name | Country
1 | John1 | John | New Sealand

After:
CustomerKey | CustomerID | Name | Country
1 | John1 | John | New Zealand
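The Type 1 overwrite can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the course's implementation: the table name dimCustomers and the in-memory database are assumptions, and the row values are taken from the slide's example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    " CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Sealand')")

# Type 1: overwrite the attribute in place. The misspelled value is lost,
# so no history of the change remains.
conn.execute(
    "UPDATE dimCustomers SET Country = ? WHERE CustomerID = ?",
    ("New Zealand", "John1"),
)

type1_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
```

After the update the table still holds a single row for John, with the same surrogate key.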
7
SCD Type 2
- SCD Type 2: creating a new additional record.
- All history of dimension changes is kept in the database.
- An attribute change is captured by adding a new row with a new surrogate key to the dimension table.
- Effective-date columns (StartDate, EndDate) and a current-indicator column (Flag) are also used.

Before:
CustomerKey | CustomerID | Name | Country | StartDate | EndDate | Flag
1 | John1 | John | New Zealand | 01/01/2014 | 31/01/2014 | Y

After:
CustomerKey | CustomerID | Name | Country | StartDate | EndDate | Flag
1 | John1 | John | New Zealand | 01/01/2014 | 31/12/2014 | N
2 | John1 | John | Australia | 01/01/2015 | 31/12/2015 | Y
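A Type 2 change can be sketched as "expire the current row, then insert a new one". The sketch below uses Python's sqlite3 module; the helper name apply_type2_change, the table name dimCustomers, ISO-formatted dates, and the use of NULL as an open-ended EndDate for the current row are all assumptions made for illustration (the slide instead shows a fixed end date on the current row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    " CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT,"
    " CustomerID TEXT, Name TEXT, Country TEXT,"
    " StartDate TEXT, EndDate TEXT, Flag TEXT)"
)

# New rows are always inserted as the current row (Flag = 'Y', open EndDate).
INSERT_CURRENT_ROW = (
    "INSERT INTO dimCustomers (CustomerID, Name, Country, StartDate, EndDate, Flag)"
    " VALUES (?, ?, ?, ?, NULL, 'Y')"
)
conn.execute(INSERT_CURRENT_ROW, ("John1", "John", "New Zealand", "2014-01-01"))


def apply_type2_change(conn, customer_id, name, new_country, change_date, last_valid_date):
    """Hypothetical helper: keep the old row as history and add a new current row."""
    # Expire the row that is current today.
    conn.execute(
        "UPDATE dimCustomers SET EndDate = ?, Flag = 'N'"
        " WHERE CustomerID = ? AND Flag = 'Y'",
        (last_valid_date, customer_id),
    )
    # Insert the new version; AUTOINCREMENT assigns a new surrogate key.
    conn.execute(INSERT_CURRENT_ROW, (customer_id, name, new_country, change_date))


apply_type2_change(conn, "John1", "John", "Australia", "2015-01-01", "2014-12-31")
type2_rows = conn.execute("SELECT * FROM dimCustomers ORDER BY CustomerKey").fetchall()
```

The natural key (CustomerID) stays the same across both rows, while the surrogate key (CustomerKey) differs, which is what lets fact rows point at the version that was valid when they were loaded.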
8
SCD Type 3
- SCD Type 3: adding a new column.
- Only the current and previous values of the attribute are kept in the database.
- The new value is loaded into the 'current' column and the old one into the 'previous' column.
- History is limited to the number of columns created for storing historical data.
- The least commonly used technique.

Before:
CustomerKey | CustomerID | Name | Current Country | Previous Country
1 | John1 | John | New Zealand | (empty)

After:
CustomerKey | CustomerID | Name | Current Country | Previous Country
1 | John1 | John | Australia | New Zealand
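The Type 3 column shift fits in a single UPDATE, because SQL evaluates the right-hand sides against the row's values from before the update. A minimal sqlite3 sketch follows; the table name and the column names CurrentCountry and PreviousCountry are assumptions based on the slide's column headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    " CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT,"
    " CurrentCountry TEXT, PreviousCountry TEXT)"
)
conn.execute("INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand', NULL)")

# Type 3: move the current value into the 'previous' column, then store the
# new value. Both assignments read the pre-update row, so order is not an issue.
conn.execute(
    "UPDATE dimCustomers"
    " SET PreviousCountry = CurrentCountry, CurrentCountry = ?"
    " WHERE CustomerID = ?",
    ("Australia", "John1"),
)

type3_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
```

A second move would overwrite PreviousCountry, which is the limitation the slide describes: history is only as deep as the number of columns.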
9
SCD Type 4
- SCD Type 4: using a historical table.
- A separate historical table is used for each dimension to track all historical changes to that dimension's attributes.
- The 'main' dimension table keeps only the current data.

Main table:
CustomerKey | CustomerID | Name | Country
1 | John1 | John | Australia

Historical table:
CustomerKey | CustomerID | Name | Country | StartDate | EndDate
1 | John1 | John | New Zealand | 01/01/2014 | 31/12/2014
1 | John1 | John | Australia | 01/01/2015 | 31/12/2015
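One way to keep a main table and a historical table in step is sketched below with sqlite3. The table name dimCustomersHistory, the helper change_country, ISO dates, and NULL as the open end date of the current history row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dimCustomers (
        CustomerKey INTEGER PRIMARY KEY, CustomerID TEXT, Name TEXT, Country TEXT);
    CREATE TABLE dimCustomersHistory (
        CustomerKey INTEGER, CustomerID TEXT, Name TEXT, Country TEXT,
        StartDate TEXT, EndDate TEXT);
    INSERT INTO dimCustomers VALUES (1, 'John1', 'John', 'New Zealand');
    INSERT INTO dimCustomersHistory
        VALUES (1, 'John1', 'John', 'New Zealand', '2014-01-01', NULL);
    """
)


def change_country(conn, customer_key, new_country, change_date, last_valid_date):
    """Hypothetical helper: record the change in history, then overwrite the main row."""
    # Close the open history row.
    conn.execute(
        "UPDATE dimCustomersHistory SET EndDate = ?"
        " WHERE CustomerKey = ? AND EndDate IS NULL",
        (last_valid_date, customer_key),
    )
    # Add a history row for the new value, copying the unchanged attributes.
    conn.execute(
        "INSERT INTO dimCustomersHistory"
        " SELECT CustomerKey, CustomerID, Name, ?, ?, NULL"
        " FROM dimCustomers WHERE CustomerKey = ?",
        (new_country, change_date, customer_key),
    )
    # The main table keeps only the current value.
    conn.execute(
        "UPDATE dimCustomers SET Country = ? WHERE CustomerKey = ?",
        (new_country, customer_key),
    )


change_country(conn, 1, "Australia", "2015-01-01", "2014-12-31")
main_rows = conn.execute("SELECT * FROM dimCustomers").fetchall()
history_rows = conn.execute(
    "SELECT * FROM dimCustomersHistory ORDER BY StartDate"
).fetchall()
```

Queries that only need current data read the small main table; historical reports read the history table.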
10
SCD Type 6
- SCD Type 6: combining the approaches of Types 1, 2 and 3.
- Type 1 = overwrite the old value
- Type 2 = add a new record
- Type 3 = add a new column

CustomerKey | CustomerID | Name | Current Country | Historical Country | StartDate | EndDate | Flag
1 | John1 | John | New Zealand | (empty) | 01/01/2014 | 31/12/2014 | N
2 | John1 | John | Australia | New Zealand | 01/01/2015 | 31/12/2015 | Y
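The sketch below shows one common reading of Type 6, which is an assumption here and differs slightly from the slide's table: each row keeps the value that was valid during its own date range in HistoricalCountry (the Type 2 and Type 3 parts), while CurrentCountry is overwritten on every row of the customer (the Type 1 part). The helper name, column names, ISO dates and NULL end date are likewise illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dimCustomers ("
    " CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT,"
    " CustomerID TEXT, Name TEXT, CurrentCountry TEXT, HistoricalCountry TEXT,"
    " StartDate TEXT, EndDate TEXT, Flag TEXT)"
)

INSERT_CURRENT_ROW = (
    "INSERT INTO dimCustomers"
    " (CustomerID, Name, CurrentCountry, HistoricalCountry, StartDate, EndDate, Flag)"
    " VALUES (?, ?, ?, ?, ?, NULL, 'Y')"
)
conn.execute(
    INSERT_CURRENT_ROW, ("John1", "John", "New Zealand", "New Zealand", "2014-01-01")
)


def apply_type6_change(conn, customer_id, name, new_country, change_date, last_valid_date):
    """Hypothetical helper combining the Type 1, 2 and 3 ideas."""
    # Type 2 part: expire the current row and add a new row with a new key.
    conn.execute(
        "UPDATE dimCustomers SET EndDate = ?, Flag = 'N'"
        " WHERE CustomerID = ? AND Flag = 'Y'",
        (last_valid_date, customer_id),
    )
    conn.execute(
        INSERT_CURRENT_ROW, (customer_id, name, new_country, new_country, change_date)
    )
    # Type 1 part: overwrite the current value on every row of this customer.
    conn.execute(
        "UPDATE dimCustomers SET CurrentCountry = ? WHERE CustomerID = ?",
        (new_country, customer_id),
    )


apply_type6_change(conn, "John1", "John", "Australia", "2015-01-01", "2014-12-31")
type6_rows = conn.execute("SELECT * FROM dimCustomers ORDER BY CustomerKey").fetchall()
```

With this layout a report can group old facts either by the country that applied at the time (HistoricalCountry) or by where the customer is now (CurrentCountry).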
11
Data Hierarchy
- Date (Time) dimensions: Year > Quarter > Month > Week > Day
- Location dimensions: Region > Country > City
- Product dimensions: Category > Product Type > Product
12
Dimension Modeling: A Summary
- Dimensional modeling
- Dimensions: de-normalized (Star) or normalized (Snowflake and Fact Constellation)
- Slowly Changing Dimension (6 types)
  - An interesting tutorial: http://www.youtube.com/watch?v=Eam2SmYgIzg
- Data Hierarchy
13
Physical Database Design
- Covers database tables, indexes, partitions and summary tables.
- Table design: dimension tables and fact tables; PKs, FKs, surrogate/natural keys, and constraints.
- Partition design: sort and group data into different partitions; this helps speed up queries and improve scalability.
- Index design: to speed up queries.
- Question: why do we have to care so much about query speed in a DW?
14
Partition Design
- Partitioning splits a table into several smaller tables.
- Partitions can be stored in a single database or in multiple databases.
- It improves scalability (when storing data) and performance (when storing and querying data).
- Think about a fact table that contains 1 billion records!
- Approaches:
  - Vertical partitioning: each smaller table contains some columns of the original table.
  - Horizontal partitioning: each smaller table contains some rows of the original table.
15
Partition Design
- Vertical partitioning: each smaller table contains some columns of the original table.
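Vertical partitioning can be illustrated by splitting one table into two narrower tables that share the key. In this sqlite3 sketch the table names CustomersCore and CustomersExtra, the Notes column and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (
        CustomerKey INTEGER PRIMARY KEY, Name TEXT, Country TEXT, Notes TEXT);
    INSERT INTO Customers VALUES
        (1, 'John', 'New Zealand', 'long free-text notes'),
        (2, 'Mary', 'Australia', 'more free-text notes');

    -- Each partition holds some columns of the original table plus the key.
    CREATE TABLE CustomersCore AS
        SELECT CustomerKey, Name, Country FROM Customers;
    CREATE TABLE CustomersExtra AS
        SELECT CustomerKey, Notes FROM Customers;
    """
)

# Joining the partitions on the shared key reproduces the original rows.
rejoined_rows = conn.execute(
    "SELECT c.CustomerKey, c.Name, c.Country, e.Notes"
    " FROM CustomersCore c JOIN CustomersExtra e ON e.CustomerKey = c.CustomerKey"
    " ORDER BY c.CustomerKey"
).fetchall()
original_rows = conn.execute("SELECT * FROM Customers ORDER BY CustomerKey").fetchall()
```

Queries that only need Name and Country can read CustomersCore alone and skip the wide Notes column.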
16
Partition Design
- Horizontal partitioning: each smaller table contains some rows of the original table.
- Question: which partitioning approach (horizontal or vertical) best helps a DW database?
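Horizontal partitioning can be sketched in plain Python by grouping fact rows by year, so that a query for one year only touches that year's partition. The fact rows, product names and function names below are hypothetical; a real warehouse would rely on the database's own partitioning feature.

```python
from collections import defaultdict

# Hypothetical fact rows: (order_year, product_name, total_price).
fact_orders = [
    (1996, "Chai", 100.0),
    (1997, "Chai", 150.0),
    (1997, "Tofu", 80.0),
    (1998, "Tofu", 60.0),
]


def partition_by_year(rows):
    """Split the rows into one partition (list) per order year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return dict(partitions)


def revenue_for_year(partitions, year):
    """Sum revenue by scanning only the partition for the requested year."""
    return sum(price for _, _, price in partitions.get(year, []))


partitions = partition_by_year(fact_orders)
revenue_1997 = revenue_for_year(partitions, 1997)
```

Each partition holds complete rows, so the partitions together contain exactly the rows of the original table.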
17
Index Design
- An index is useful and speeds up processing when a column is used for searching or matching.
- In the query below, Country is used for searching in the WHERE clause, so indexing the Country column makes the query run faster.

    SELECT * FROM Customers WHERE Country = 'New Zealand'
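The effect of indexing Country can be observed with SQLite's EXPLAIN QUERY PLAN, as in the sketch below. The index name idx_customers_country, the helper query_plan and the sample rows are assumptions; the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index name appears in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerKey INTEGER PRIMARY KEY, Name TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (Name, Country) VALUES (?, ?)",
    [("John", "New Zealand"), ("Mary", "Australia"), ("Ann", "UK")],
)

query = "SELECT * FROM Customers WHERE Country = 'New Zealand'"


def query_plan(conn, sql):
    """Return the planner's description of how it would run the query."""
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = query_plan(conn, query)  # full table scan
conn.execute("CREATE INDEX idx_customers_country ON Customers (Country)")
plan_after = query_plan(conn, query)   # search using the new index

matching_rows = conn.execute(query).fetchall()
```

The query returns the same rows before and after; only the access path changes.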
18
Index Design
- What about queries in an OLAP database?

    SELECT p.ProductName, sum(f.TotalPrice) as [Total Revenue]
    FROM dimProducts p, dimCustomers c, factOrders f, dimTime t
    WHERE f.ProductKey = p.ProductKey
      AND f.CustomerKey = c.CustomerKey
      AND f.OrderDateKey = t.TimeKey
      AND c.Country = 'UK'
      AND t.QuaterOfYear = 1
      AND t.Year in (1996, 1997, 1998)
    GROUP BY p.ProductName
    ORDER BY p.ProductName

- Question: in the query above, which columns should be indexed?
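One plausible answer to the indexing question, offered as a sketch rather than the lecturer's solution, is to index the fact table's join keys and the dimension columns used as filters. The code below runs the slide's query in sqlite3 against a tiny made-up data set; the index names, column types and sample rows are assumptions, while table and column names follow the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dimProducts (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE dimCustomers (CustomerKey INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE dimTime (TimeKey INTEGER PRIMARY KEY, Year INTEGER, QuaterOfYear INTEGER);
    CREATE TABLE factOrders (
        ProductKey INTEGER, CustomerKey INTEGER, OrderDateKey INTEGER, TotalPrice REAL);

    INSERT INTO dimProducts VALUES (1, 'Chai'), (2, 'Tofu');
    INSERT INTO dimCustomers VALUES (1, 'UK'), (2, 'New Zealand');
    INSERT INTO dimTime VALUES (1, 1996, 1), (2, 1997, 1), (3, 1997, 2);

    CREATE INDEX idx_fact_product ON factOrders (ProductKey);
    CREATE INDEX idx_fact_customer ON factOrders (CustomerKey);
    CREATE INDEX idx_fact_orderdate ON factOrders (OrderDateKey);
    CREATE INDEX idx_customer_country ON dimCustomers (Country);
    CREATE INDEX idx_time_year_quarter ON dimTime (Year, QuaterOfYear);
    """
)

# Hypothetical facts: (ProductKey, CustomerKey, OrderDateKey, TotalPrice).
conn.executemany(
    "INSERT INTO factOrders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, 10.0),  # Chai, UK, 1996 Q1: counted
        (1, 1, 2, 15.0),  # Chai, UK, 1997 Q1: counted
        (2, 1, 3, 99.0),  # Tofu, UK, 1997 Q2: filtered out by quarter
        (2, 2, 1, 50.0),  # Tofu, New Zealand: filtered out by country
        (2, 1, 1, 7.0),   # Tofu, UK, 1996 Q1: counted
    ],
)

revenue_rows = conn.execute(
    """
    SELECT p.ProductName, sum(f.TotalPrice) AS [Total Revenue]
    FROM dimProducts p, dimCustomers c, factOrders f, dimTime t
    WHERE f.ProductKey = p.ProductKey
      AND f.CustomerKey = c.CustomerKey
      AND f.OrderDateKey = t.TimeKey
      AND c.Country = 'UK'
      AND t.QuaterOfYear = 1
      AND t.Year IN (1996, 1997, 1998)
    GROUP BY p.ProductName
    ORDER BY p.ProductName
    """
).fetchall()
```

Which indexes actually pay off depends on data volumes and the database engine, so the choice above should be checked against real query plans.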
19
OLAP Cubes
20
OLAP Cubes
- An OLAP cube is an array of data understood in terms of its 0 or more dimensions.
- You can make an OLAP cube from any DW schema.
- For example, a star schema with 5 dimensions can be turned into a cube with 3 dimensions.
- Figure from http://visibledata.wordpress.com/data/datacloud/datacube/
21
OLAP Cubes
- What does a cube represent?
- Dimensions
- Data cell = the fact that relates to all dimensions
22
Cubes Operations
With a cube, you can:
- Slice
- Dice
- Drill down
- Roll up
- Pivot
23
Cubes Operations: Slice
- The slice operation creates a rectangular subset of a cube with one fewer dimension by choosing a single value for one of its dimensions.
- The number of dimensions is reduced by one, e.g., from 3 dimensions to 2.
- Figure from http://www.tutorialspoint.com/dwh/dwh_olap.htm
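A slice can be sketched on a tiny cube stored as a Python dictionary. The cube layout (quarter, item, city), the numbers and the function name slice_cube are made up for illustration.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "Mobile", "Toronto"): 10,
    ("Q1", "Modem", "Toronto"): 20,
    ("Q1", "Mobile", "Vancouver"): 30,
    ("Q2", "Mobile", "Toronto"): 40,
    ("Q2", "Modem", "Vancouver"): 50,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the cell keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


# Slice on the time dimension (axis 0): keep only Q1.
q1_slice = slice_cube(cube, 0, "Q1")
```

The keys of q1_slice are (item, city) pairs, so the result has two dimensions instead of three.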
24
Cubes Operations: Dice
- The dice operation produces a subcube by letting the analyst pick specific values on multiple dimensions.
- No dimension is removed.
- Figure from http://www.tutorialspoint.com/dwh/dwh_olap.htm
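A dice can be sketched the same way: filter on several dimensions at once and keep every dimension in the key. The cube contents and the function name dice_cube are hypothetical.

```python
# Hypothetical 3-D cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "Mobile", "Toronto"): 10,
    ("Q1", "Modem", "Toronto"): 20,
    ("Q1", "Mobile", "Vancouver"): 30,
    ("Q2", "Mobile", "Toronto"): 40,
    ("Q2", "Modem", "Vancouver"): 50,
}


def dice_cube(cube, allowed):
    """Keep cells whose value on each listed axis is in the allowed set.

    `allowed` maps an axis position to the set of accepted values.
    """
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Pick specific values on all three dimensions.
subcube = dice_cube(cube, {0: {"Q1", "Q2"}, 1: {"Mobile"}, 2: {"Toronto"}})
```

Unlike a slice, the keys of subcube are still (quarter, item, city) triples.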
25
Cubes Operations: Drill Down
- The drill-down operation navigates among levels of data, from the summarized to the more detailed.
- No dimension is removed.
- Figure from http://www.tutorialspoint.com/dwh/dwh_olap.htm
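Drill-down can be sketched as moving from quarter totals to the month-level cells behind one quarter. The month-to-quarter mapping, the numbers and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical hierarchy level mapping and month-level measures.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
units_by_month = {"Jan": 10, "Feb": 20, "Mar": 30, "Apr": 40}


def summarize_by_quarter(units_by_month):
    """Summarized view: total units per quarter."""
    totals = defaultdict(int)
    for month, units in units_by_month.items():
        totals[MONTH_TO_QUARTER[month]] += units
    return dict(totals)


def drill_down(units_by_month, quarter):
    """Detailed view: the month-level cells that make up one quarter."""
    return {
        month: units
        for month, units in units_by_month.items()
        if MONTH_TO_QUARTER[month] == quarter
    }


quarter_totals = summarize_by_quarter(units_by_month)
q1_months = drill_down(units_by_month, "Q1")
```

The detailed cells add up to the summarized value they were drilled down from.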
26
Cubes Operations: Roll up
- The roll-up operation summarizes the data along a dimension (by aggregation).
- Similar to GROUP BY in SQL.
- Figure from http://www.tutorialspoint.com/dwh/dwh_olap.htm
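Roll-up along the location hierarchy can be sketched as a group-and-sum from city level to country level. The city-to-country mapping, the numbers and the function name roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy mapping and city-level cube: (quarter, city) -> units.
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}
city_cube = {
    ("Q1", "Toronto"): 10,
    ("Q1", "Vancouver"): 30,
    ("Q1", "Chicago"): 5,
    ("Q2", "Toronto"): 40,
}


def roll_up(cube, city_to_country):
    """Aggregate city-level cells up to country level (like GROUP BY country)."""
    totals = defaultdict(int)
    for (quarter, city), units in cube.items():
        totals[(quarter, city_to_country[city])] += units
    return dict(totals)


country_cube = roll_up(city_cube, CITY_TO_COUNTRY)
```

The number of dimensions stays the same; only the level of detail on the location dimension changes.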
27
Cubes Operations: Pivot
- The pivot operation rotates the cube in space to show its various faces.
- Figure from http://www.tutorialspoint.com/dwh/dwh_olap.htm
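For a two-dimensional face of a cube, a pivot amounts to swapping the row and column dimensions. The nested-dictionary layout, the numbers and the function name pivot are assumptions for illustration.

```python
# Hypothetical 2-D view: rows are items, columns are cities.
item_by_city = {
    "Mobile": {"Toronto": 10, "Vancouver": 30},
    "Modem": {"Toronto": 20},
}


def pivot(table):
    """Swap the row and column dimensions of a {row: {column: value}} table."""
    pivoted = {}
    for row, columns in table.items():
        for column, value in columns.items():
            pivoted.setdefault(column, {})[row] = value
    return pivoted


# After the pivot, rows are cities and columns are items.
city_by_item = pivot(item_by_city)
```

No values are added or lost, so pivoting twice returns the original view.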
28
A friendly reminder: "Assignment 2"
- Submission: both Phase-1 and Phase-2 (separate submissions).
- Due Monday 26 October at 9:30am.
- Next week has a workshop.
- Marking of Phase-2 does NOT rely on the DQLog produced in Phase-1.
- Interview sessions will be conducted case by case (not all students are required).
- Maximum penalty for "cheating".
29
What's next?
- Assignment 2 Q/A
- Continue working on worksheets
- Last chance to work and submit…
30
Questions?