SLOWLY CHANGING DIMENSIONS: Features vs. Performance
Benjamin Sigursteinsson, Miracle Iceland


Who am I?
- Database programmer since 1987
- BI/DW since 1997
- Mostly Oracle to begin with
- SQL Server entered the DW picture in 2005
- DW projects in the US, Europe, the Middle East and of course Iceland
- Miracle Iceland since 2003

Structure of session
- Short overview of SCDs (5 mins)
  - What are they?
  - What is the problem?
- Demonstration of 3-4 common approaches
  - Standard SCD wizard using SSIS
  - 3rd-party SCD approach (Kimball)
  - T-SQL approach using MERGE
  - Manual SSIS approach

SCD types
- Assume we all know what a dimension is
- Basically 3 types of slowly changing dimensions
- Will not bother with type 3

SCD type 1
- A „regular“ dimension. Nothing special here.
- No history kept; behaves as most OLTP systems do
- Benefits
  - Changes are overwritten in place
  - PK is usually an integer, but could be the business key, such as an SSN for a customer
  - Simple
- Drawbacks
  - We lose history with each update

SCD type 2
- History kept. Additional columns added to track changes: ValidFrom, ValidTo, isCurrent
- Primary key is always an integer of some sort
- Benefit
  - We can see the status as it was in the past
- Drawbacks
  - Grows big; updating is slower
  - Complex to maintain
  - Can increase the number of dimensions (current-value dimensions)
  - Can confuse end users if not properly presented

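SCD type 1 - Example
- CustomerDim
- Handling of SCD1
- A change is made and the name of Coke is changed to Coca-Cola

Before:

| CustomerPK | CustomerBK | Name      | Zip   |
|------------|------------|-----------|-------|
| 100        | ABC        | Snapple   |       |
| 200        | DEF        | Coke      |       |
| 300        | GHI        | Pepsi     | 10012 |

After:

| CustomerPK | CustomerBK | Name      | Zip   |
|------------|------------|-----------|-------|
| 100        | ABC        | Snapple   |       |
| 200        | DEF        | Coca-Cola |       |
| 300        | GHI        | Pepsi     | 10012 |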
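The overwrite above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module rather than SQL Server; the function name `apply_scd1` is hypothetical, the Zip column is left out, and the surrogate keys 200 and 300 are taken from the type 2 example.

```python
import sqlite3

# In-memory stand-in for CustomerDim (Zip omitted for brevity).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDim ("
    " CustomerPK INTEGER PRIMARY KEY,"
    " CustomerBK TEXT UNIQUE,"
    " Name TEXT)"
)
conn.executemany(
    "INSERT INTO CustomerDim (CustomerPK, CustomerBK, Name) VALUES (?, ?, ?)",
    [(100, "ABC", "Snapple"), (200, "DEF", "Coke"), (300, "GHI", "Pepsi")],
)


def apply_scd1(conn, business_key, new_name):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE CustomerDim SET Name = ? WHERE CustomerBK = ?",
        (new_name, business_key),
    )


apply_scd1(conn, "DEF", "Coca-Cola")
rows = conn.execute(
    "SELECT CustomerPK, CustomerBK, Name FROM CustomerDim ORDER BY CustomerPK"
).fetchall()
```

After the call, the row for DEF keeps its surrogate key and no trace of the old name remains, which is exactly the type 1 trade-off.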

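SCD type 2 - Example

| CustomerPK | CustomerBK | Name      | Zip | ValidFrom | ValidTo | Current |
|------------|------------|-----------|-----|-----------|---------|---------|
| 100        | ABC        | Snapple   |     |           |         | Y       |
| 200        | DEF        | Coke      |     |           |         | N       |
| 300        | GHI        | Pepsi     |     |           |         | Y       |
| 301        | DEF        | Coca-Cola |     |           |         | Y       |

- A new record has been inserted for the changed customer and the old one has been expired.
- All new transactions will reference Coca-Cola, but the old ones will still reference Coke.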
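The expire-and-insert step can be sketched as follows, again with Python's sqlite3 module instead of SQL Server. The dates, the `9999-12-31` open-ended marker and the function name `apply_scd2` are assumptions made for illustration; the transcript does not preserve the original date values.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed marker for "still valid"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerDim ("
    " CustomerPK INTEGER PRIMARY KEY,"
    " CustomerBK TEXT, Name TEXT,"
    " ValidFrom TEXT, ValidTo TEXT, IsCurrent TEXT)"
)
conn.executemany(
    "INSERT INTO CustomerDim VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "ABC", "Snapple", "2000-01-01", HIGH_DATE, "Y"),
        (200, "DEF", "Coke", "2000-01-01", HIGH_DATE, "Y"),
        (300, "GHI", "Pepsi", "2000-01-01", HIGH_DATE, "Y"),
    ],
)


def apply_scd2(conn, business_key, new_name, change_date):
    """Type 2: expire the current row, then insert a new current row."""
    with conn:  # both statements are committed together or rolled back
        conn.execute(
            "UPDATE CustomerDim SET ValidTo = ?, IsCurrent = 'N' "
            "WHERE CustomerBK = ? AND IsCurrent = 'Y'",
            (change_date, business_key),
        )
        conn.execute(
            "INSERT INTO CustomerDim "
            "(CustomerBK, Name, ValidFrom, ValidTo, IsCurrent) "
            "VALUES (?, ?, ?, ?, 'Y')",
            (business_key, new_name, change_date, HIGH_DATE),
        )


apply_scd2(conn, "DEF", "Coca-Cola", "2011-05-01")
history = conn.execute(
    "SELECT CustomerPK, Name, ValidTo, IsCurrent FROM CustomerDim "
    "WHERE CustomerBK = 'DEF' ORDER BY CustomerPK"
).fetchall()
```

The business key DEF now has two rows: the expired Coke row and a new current row with its own surrogate key, so old facts keep pointing at the old version.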

What are we looking for?
- Speed
- Logging
- Error handling
- Ease of use
- Flexibility
- Sources and Destinations

Data we are using for demos
- Destination table is CustomerDim, 1.6 million records
  - Street, Customerversion and Policy are SCD1
  - ZIP code is SCD2
- Source is a single table with records, of which:
  - SCD1 changes
  - SCD2 changes
  - 500 new records
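Every approach in the demos has to sort incoming records into these categories first. A minimal sketch of that classification in Python, using the column split described above; the function name `classify` and all sample rows are hypothetical.

```python
SCD1_COLUMNS = ("Street", "Customerversion", "Policy")  # overwrite in place
SCD2_COLUMNS = ("Zip",)  # keep history


def classify(source_row, current_row):
    """Return 'new', 'scd2', 'scd1' or 'unchanged' for one source record."""
    if current_row is None:
        return "new"
    # A change to a history-tracked column takes priority: it needs a new version.
    if any(source_row[c] != current_row[c] for c in SCD2_COLUMNS):
        return "scd2"
    if any(source_row[c] != current_row[c] for c in SCD1_COLUMNS):
        return "scd1"
    return "unchanged"


# Hypothetical current dimension rows, keyed by business key.
current_rows = {
    "ABC": {"Street": "1 Main St", "Customerversion": 1, "Policy": "A", "Zip": "10012"},
    "DEF": {"Street": "5 Elm St", "Customerversion": 1, "Policy": "B", "Zip": "10013"},
    "GHI": {"Street": "7 Oak Ave", "Customerversion": 2, "Policy": "A", "Zip": "10014"},
}

# Hypothetical source rows arriving from the staging table.
source_rows = {
    "ABC": {"Street": "2 Side St", "Customerversion": 1, "Policy": "A", "Zip": "10012"},
    "DEF": {"Street": "5 Elm St", "Customerversion": 1, "Policy": "B", "Zip": "10099"},
    "GHI": {"Street": "7 Oak Ave", "Customerversion": 2, "Policy": "A", "Zip": "10014"},
    "JKL": {"Street": "9 New Rd", "Customerversion": 1, "Policy": "C", "Zip": "10020"},
}

results = {bk: classify(row, current_rows.get(bk)) for bk, row in source_rows.items()}
```

Here ABC is an SCD1 change (street only), DEF is an SCD2 change (ZIP), GHI is unchanged and JKL is new.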

Very quickly – SSIS SCD Wizard
- Introduced in SQL Server 2005
- Has been more or less unchanged since
- Inflexible
- Logging
- Slow
- Easy to use at first, but manual changes to the generated data flow are lost when the wizard is run again
- Only 1 data destination by default
- Demo

Kimball SCD component
- Name has been changed to Dimension Merge SCD?
- Todd McDermid
- Far better than the SSIS SCD in terms of logging
- More flexible
- Speed... Needs some tweaking and depends on the number of updates
- Enhanced logging/auditing
- More choice of outputs
- Superior to the SSIS SCD in most aspects
- NULL expiry dates are not the only option; we can use other methods of identifying the current record
- Demo

T-SQL Merge
- MERGE statement added in SQL Server 2008
- A set-based operation, not a row-by-row one
- A classic UPSERT component
- Functionality similar to
  - Try to update the row
  - If it fails (no matching row), then insert it
- In SQL Server 2008 and 2008 R2, MERGE has an OUTPUT clause that lets us do type 2 operations easily
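The "try to update, otherwise insert" logic can be sketched row by row as below. This is not T-SQL and not MERGE itself: it is a Python and sqlite3 illustration of the behaviour that MERGE expresses as one set-based statement, and the function name `upsert` and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerDim (CustomerBK TEXT PRIMARY KEY, Street TEXT)")
conn.execute("INSERT INTO CustomerDim VALUES ('ABC', '1 Main St')")


def upsert(conn, business_key, street):
    """Try to update the row; if no row was updated, insert it."""
    cur = conn.execute(
        "UPDATE CustomerDim SET Street = ? WHERE CustomerBK = ?",
        (street, business_key),
    )
    if cur.rowcount == 0:  # nothing matched, so this is a new record
        conn.execute(
            "INSERT INTO CustomerDim (CustomerBK, Street) VALUES (?, ?)",
            (business_key, street),
        )
        return "inserted"
    return "updated"


actions = [upsert(conn, "ABC", "2 Side St"), upsert(conn, "XYZ", "9 New Rd")]
rows = conn.execute("SELECT * FROM CustomerDim ORDER BY CustomerBK").fetchall()
```

Running this once per source row is what makes hand-written upserts slow on large loads; a set-based statement does the same matching for all rows at once.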

T-SQL Merge
- Very fast - fastest
- Flexible and „easy“
- My favourite
- Foreign keys are a problem if using the OUTPUT clause
  - Drop them before the merge
  - Restore them after the merge
- Limited logging and error handling, especially since you have to remove foreign keys in some cases
- Demo

Manual SSIS – from scratch
- Fast
- Not always so easy to implement
- Logging is decent, but has to be set up manually
- Demo

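Roundup (1 is best)

|                | Wizard | KSCD | Manual | Merge |
|----------------|--------|------|--------|-------|
| Speed*         | 4      | 3    | 2      | 1     |
| Logging        | 3      | 1    | 2      | 4     |
| Flexibility    | 4      | 1    | 1      | 3     |
| Destinations   | 2      | 1    | 1      | 4     |
| Sources        | 1      | 1    | 1      | 4     |
| Ease of use    | 1*     | 1    | 3      | 3     |
| Error handling | 2      | 1    | 2      | 2     |

Use your own judgement to evaluate (e.g. is speed more valuable than logging?).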
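One way to apply your own judgement to the roundup is to weight each criterion and add up the ranks, with the lowest total winning. The sketch below is only an illustration: the ranks come from the roundup, the asterisks are ignored, and the weights and the function name `weighted_totals` are hypothetical.

```python
APPROACHES = ("Wizard", "KSCD", "Manual", "Merge")

# Ranks from the roundup, in the order of APPROACHES (1 is best).
RANKS = {
    "Speed": (4, 3, 2, 1),
    "Logging": (3, 1, 2, 4),
    "Flexibility": (4, 1, 1, 3),
    "Destinations": (2, 1, 1, 4),
    "Sources": (1, 1, 1, 4),
    "Ease of use": (1, 1, 3, 3),
    "Error handling": (2, 1, 2, 2),
}


def weighted_totals(weights):
    """Sum weight * rank per approach; criteria without a weight count as 1."""
    totals = dict.fromkeys(APPROACHES, 0)
    for criterion, ranks in RANKS.items():
        weight = weights.get(criterion, 1)
        for approach, rank in zip(APPROACHES, ranks):
            totals[approach] += weight * rank
    return totals


equal = weighted_totals({})  # every criterion matters equally
speed_heavy = weighted_totals({"Speed": 5})  # speed counts five times as much

best_equal = min(equal, key=equal.get)
best_speed_heavy = min(speed_heavy, key=speed_heavy.get)
```

With equal weights the Kimball component comes out ahead; once speed is weighted five times as heavily, the manual SSIS approach edges past it. Different weights give different answers, which is the point of the slide.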

THANK YOU! For attending this session and PASS SQLRally Nordic 2011, Stockholm