SLOWLY CHANGING DIMENSIONS Features vs. Performance Benjamin Sigursteinsson Miracle Iceland.


1 SLOWLY CHANGING DIMENSIONS Features vs. Performance Benjamin Sigursteinsson Miracle Iceland

2 Who am I?  Database programmer since 1987  BI/DW since 1997  Mostly Oracle to begin with  SQL Server entered the DW picture in 2005  DW projects in US, Europe, Middle East and of course Iceland.  Miracle Iceland - since 2003

3 Structure of session  Short overview of SCDs (5 mins)  What are they?  What is the problem?  Demonstration of 3-4 common approaches  Standard SCD wizard using SSIS  3rd-party SCD approach (Kimball)  T-SQL approach using MERGE  Manual SSIS approach

4 SCD types  Assume we all know what a dimension is  Basically 3 types of dimensions  Will not bother with type 3

5 SCD type 1  A „regular“ dimension. Nothing special here.  No history kept, behaves as most OLTP systems  Benefits  Changes overwritten  PK usually an integer, but could be the business key such as an SSN for a customer  Simple  Drawbacks  We lose history with each update

6 SCD type 2  History kept. Additional columns added to track changes: ValidFrom, ValidTo, isCurrent  Primary key always an integer of some sort.  Benefit  We can see the status as it was in the past  Drawbacks  Grows large; updates get slower  Complex to maintain  Can increase the number of dimensions (current-value dimensions)  Can confuse end users if not properly presented

7 SCD type 1 - Example  CustomerDim  Handling of SCD1  A change is made and the name of Coke is changed to Coca-Cola

Before:
CustomerPK  CustomerBK  Name       Zip
100         ABC         Snapple    10017
200         DEF         Coke       10011
300         GHI         Pepsi      10012

After:
CustomerPK  CustomerBK  Name       Zip
100         ABC         Snapple    10017
200         DEF         Coca-Cola  10017
300         GHI         Pepsi      10012
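The type 1 overwrite on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in the session: the CustomerRow class and the apply_scd1 function are hypothetical names, and the dimension is held in a plain list instead of a database table.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    customer_pk: int   # surrogate key
    customer_bk: str   # business key
    name: str
    zip_code: str


def apply_scd1(dim, customer_bk, **changes):
    """Overwrite attributes in place on the row with the given business key.

    No new row is created and the old values are not kept anywhere.
    """
    for row in dim:
        if row.customer_bk == customer_bk:
            for column, value in changes.items():
                setattr(row, column, value)
            return row
    raise KeyError(customer_bk)


customer_dim = [
    CustomerRow(100, "ABC", "Snapple", "10017"),
    CustomerRow(200, "DEF", "Coke", "10011"),
    CustomerRow(300, "GHI", "Pepsi", "10012"),
]

# Same change as on the slide: the row keeps its key, the old values are gone.
apply_scd1(customer_dim, "DEF", name="Coca-Cola", zip_code="10017")
```

After the call the dimension still has three rows and key 200 now reads Coca-Cola, which is exactly why type 1 cannot answer questions about the past.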

8 SCD type 2 - Example

CustomerPK  CustomerBK  Name       Zip    ValidFrom  ValidTo     Current
100         ABC         Snapple    10017  1.1.2000   31.12.2199  Y
200         DEF         Coke       10011  1.1.2000   7.11.2011   N
300         GHI         Pepsi      10012  1.1.2000   31.12.2199  Y
301         DEF         Coca-Cola  10017  8.11.2011  31.12.2199  Y

 A new record has been inserted for the changed customer and the old one has been expired.  All new transactions will be on Coca-Cola but the old ones will be on Coke
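The expire-and-insert step behind this table might look as follows in Python. It is a minimal sketch under stated assumptions: apply_scd2 and OPEN_END are hypothetical names, rows are plain dicts, and the next surrogate key is passed in by the caller instead of coming from an identity column.

```python
from datetime import date, timedelta

OPEN_END = date(2199, 12, 31)  # "far future" ValidTo, as used on the slide


def apply_scd2(dim, customer_bk, change_date, next_pk, **changes):
    """Expire the current row for customer_bk and append a new current version.

    Raises StopIteration if the business key has no current row.
    """
    current = next(
        r for r in dim if r["CustomerBK"] == customer_bk and r["Current"] == "Y"
    )
    # Close the old version the day before the change takes effect.
    current["ValidTo"] = change_date - timedelta(days=1)
    current["Current"] = "N"
    # Copy the old row, apply the changed attributes, open a new validity range.
    new_row = {
        **current,
        **changes,
        "CustomerPK": next_pk,
        "ValidFrom": change_date,
        "ValidTo": OPEN_END,
        "Current": "Y",
    }
    dim.append(new_row)
    return new_row


customer_dim = [
    {"CustomerPK": 100, "CustomerBK": "ABC", "Name": "Snapple", "Zip": "10017",
     "ValidFrom": date(2000, 1, 1), "ValidTo": OPEN_END, "Current": "Y"},
    {"CustomerPK": 200, "CustomerBK": "DEF", "Name": "Coke", "Zip": "10011",
     "ValidFrom": date(2000, 1, 1), "ValidTo": OPEN_END, "Current": "Y"},
    {"CustomerPK": 300, "CustomerBK": "GHI", "Name": "Pepsi", "Zip": "10012",
     "ValidFrom": date(2000, 1, 1), "ValidTo": OPEN_END, "Current": "Y"},
]

apply_scd2(customer_dim, "DEF", date(2011, 11, 8), next_pk=301,
           Name="Coca-Cola", Zip="10017")
```

Old fact rows keep pointing at key 200 (Coke), while facts loaded after the change pick up key 301.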

9 What are we looking for?  Speed  Logging  Error handling  Ease of use  Flexibility  Sources and Destinations

10 Data we are using for demos  Destination table is CustomerDim, 1.6 million records  Street, Customerversion and Policy are SCD1  ZIP code is SCD2  Source is a single table with 125.000 records, of which:  22.952 SCD1 changes  13.840 SCD2 changes  500 new records
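Every approach in the demos has to split the source rows into the same buckets before it can act on them. A rough Python sketch of that split, using the column roles from this slide, is shown below. The classify function and the sample rows are made up for illustration, and treating a row that changes both a type 1 and a type 2 column as a type 2 change is an assumption; the slide does not say how such rows were counted.

```python
SCD1_COLUMNS = ("Street", "Customerversion", "Policy")  # overwrite in place
SCD2_COLUMNS = ("Zip",)                                 # keep history


def classify(source_rows, current_dim):
    """Split source rows into new / scd2 / scd1 / unchanged buckets.

    current_dim maps business key -> current dimension row (a dict).
    Assumption: a type 2 change wins when both kinds of column differ.
    """
    buckets = {"new": [], "scd2": [], "scd1": [], "unchanged": []}
    for row in source_rows:
        existing = current_dim.get(row["CustomerBK"])
        if existing is None:
            buckets["new"].append(row)
        elif any(row[c] != existing[c] for c in SCD2_COLUMNS):
            buckets["scd2"].append(row)
        elif any(row[c] != existing[c] for c in SCD1_COLUMNS):
            buckets["scd1"].append(row)
        else:
            buckets["unchanged"].append(row)
    return buckets


def make_row(bk, street, version, policy, zip_code):
    """Helper for the toy data below (not part of the demo schema)."""
    return {"CustomerBK": bk, "Street": street, "Customerversion": version,
            "Policy": policy, "Zip": zip_code}


current_dim = {
    "ABC": make_row("ABC", "Main St 1", 1, "P1", "10017"),
    "DEF": make_row("DEF", "High St 2", 1, "P2", "10011"),
    "GHI": make_row("GHI", "Low St 3", 1, "P3", "10012"),
}
source_rows = [
    make_row("ABC", "Main St 1", 1, "P1", "10017"),  # unchanged
    make_row("DEF", "High St 2", 1, "P2", "10099"),  # Zip changed -> SCD2
    make_row("GHI", "New St 9", 1, "P3", "10012"),   # Street changed -> SCD1
    make_row("JKL", "Side St 4", 1, "P4", "10013"),  # unknown key -> new
]
buckets = classify(source_rows, current_dim)
```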

11 Very quickly – SSIS SCD Wizard  Introduced in SQL Server 2005  Has been more or less unchanged since  Inflexible  Logging  Slow.  Easy to use at first, but changes made to the generated flow are lost when the wizard is run again to modify it.  Only 1 data destination by default  Demo

12 Kimball SCD component  Name has been changed to Dimension Merge SCD? http://dimensionmergescd.codeplex.com/  Todd McDermid  Far better than the SSIS SCD in terms of logging  More flexible  Speed...Needs some tweaking and depends on the number of updates  Enhanced logging/auditing  More choice of outputs  Superior to the SSIS SCD in most aspects  NULL expiry dates are not the only option, we can use other methods of identifying the current record  Demo

13 T-SQL Merge  MERGE statement added in SQL Server 2008  A set-based operation, not a row-by-row one  A classic UPSERT construct  Functionality similar to  Try to update the row  If that fails, insert it  The OUTPUT clause of MERGE (available since SQL Server 2008) gives us the ability to do type 2 operations easily
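The "update, else insert" behaviour and the per-row action report can be mimicked in Python to show the semantics. This is not T-SQL and not the demo code: merge_upsert is a hypothetical function, the Python loop stands in for what MERGE does as a single set-based statement, and the returned action list only loosely plays the role of OUTPUT with $action.

```python
def merge_upsert(target, source, key):
    """Upsert source rows into target and report what happened to each key.

    target is a dict keyed by business key; source is a list of row dicts.
    Rows that match an existing key and are identical produce no action.
    """
    actions = []
    for row in source:
        k = row[key]
        if k not in target:
            target[k] = dict(row)          # WHEN NOT MATCHED -> insert
            actions.append(("INSERT", k))
        elif target[k] != row:
            target[k] = dict(row)          # WHEN MATCHED and different -> update
            actions.append(("UPDATE", k))
    return actions


target = {
    "ABC": {"CustomerBK": "ABC", "Name": "Snapple", "Zip": "10017"},
    "DEF": {"CustomerBK": "DEF", "Name": "Coke", "Zip": "10011"},
}
source = [
    {"CustomerBK": "ABC", "Name": "Snapple", "Zip": "10017"},    # identical
    {"CustomerBK": "DEF", "Name": "Coca-Cola", "Zip": "10017"},  # changed
    {"CustomerBK": "GHI", "Name": "Pepsi", "Zip": "10012"},      # new key
]
actions = merge_upsert(target, source, "CustomerBK")
```

For type 2 handling, the rows reported as updated are the ones that need a fresh "current" version inserted after the old version has been expired.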

14 T-SQL Merge  Very fast, the fastest of the four approaches  Flexible and „easy“  My favourite  Foreign keys are a problem if using the OUTPUT clause  Drop them before the merge  Re-create them after the merge  Limited logging and error handling, especially since you have to remove foreign keys in some cases  Demo

15 Manual SSIS – from scratch  Fast  Not always easy to implement  Logging is decent, but has to be set up manually  Demo

16 Roundup (1 is best)

                Wizard  KSCD  Manual  Merge
Speed*          4       3     2       1
Logging         3       1     2       4
Flexibility     4       1     1       3
Destinations    2       1     1       4
Sources         1       1     1       4
Ease of use     1*      1     3       3
Error handling  2       1     2       2

Use your own judgement to evaluate. (ex. is speed more valuable than logging?)
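One way to act on that advice is to weight the criteria and add up the ranks. The Python sketch below does this with the numbers from the roundup table; the weighted_totals function and the example weights are assumptions for illustration, and the asterisk footnotes on Speed and Ease of use are ignored.

```python
# Ranks copied from the roundup table (1 is best).
RANKS = {
    "Speed":          {"Wizard": 4, "KSCD": 3, "Manual": 2, "Merge": 1},
    "Logging":        {"Wizard": 3, "KSCD": 1, "Manual": 2, "Merge": 4},
    "Flexibility":    {"Wizard": 4, "KSCD": 1, "Manual": 1, "Merge": 3},
    "Destinations":   {"Wizard": 2, "KSCD": 1, "Manual": 1, "Merge": 4},
    "Sources":        {"Wizard": 1, "KSCD": 1, "Manual": 1, "Merge": 4},
    "Ease of use":    {"Wizard": 1, "KSCD": 1, "Manual": 3, "Merge": 3},
    "Error handling": {"Wizard": 2, "KSCD": 1, "Manual": 2, "Merge": 2},
}


def weighted_totals(ranks, weights=None):
    """Sum rank * weight per approach; criteria without a weight count as 1.

    Lower totals are better because rank 1 is the best rank.
    """
    weights = weights or {}
    totals = {}
    for criterion, per_approach in ranks.items():
        w = weights.get(criterion, 1)
        for approach, rank in per_approach.items():
            totals[approach] = totals.get(approach, 0) + w * rank
    return totals


equal = weighted_totals(RANKS)
speed_first = weighted_totals(RANKS, {"Speed": 5})

best_equal = min(equal, key=equal.get)
best_speed_first = min(speed_first, key=speed_first.get)
```

With equal weights KSCD comes out ahead (total 9); counting speed five times as heavily moves Manual to the front (total 20), so the ranking depends on what matters most in a given project.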

17 THANK YOU! For attending this session and PASS SQLRally Nordic 2011, Stockholm

