Death by 1000 changes: An overview of several useful Microsoft SQL Server DML change capture technologies

Presentation transcript:

Death by 1000 changes
An overview of several useful Microsoft SQL Server DML change capture technologies

Matt Smith
Software Architect, Enterprise Data Warehouse
Otter Products, LLC

DML: Data Manipulation Language (compared to DDL, Data Definition Language, and DCL, Data Control Language).
Choosing the wrong technology can kill your team's productivity. Hopefully this prezo will help you make better decisions.
Context: general information, speaking from collective team experience.
Image: channelawesome.com

Today's Agenda
- Review three core SQL Server DML change capture technologies:
  - Custom T-SQL DML triggers
  - SQL Server Change Tracking
  - SQL Server Change Data Capture (CDC)
- Quick review of code exercises (time permitting)
- Scope: Business Intelligence, Data Warehousing and Reporting
- We won't be covering other features such as Service Broker or C2 auditing

Interactive presentation. If something is not clear, speak up. Let's make this a dialogue.

Why capture DML changes?
Capturing changes is useful for solving common business problems:
- Process auditing
  - "The sales team is deleting sales order lines instead of canceling them."
  - "I think someone is deleting purchase order line information."
  - "I need to know who is changing details for product X."
- Business KPIs
  - Order status change tracking (need to know the first date/time an order went to Status 80)
  - Did we hit our customer shipment targets? (monitor the variance between promised and actual first shipment date)
- Data warehousing / ETL loading
  - Incremental data staging and data warehouse loading, intra-day reporting table updates
  - Infrastructure constraints (disk, network)
- Continuous improvement initiatives: cycle time
  - Monitoring process performance (e.g. the key accounts return process)

Notes:
Customers come to the team with these types of questions. Often the solution involves a process component and a technology (reporting) component. For process owners, change capture fills gaps in systems auditing. For business KPIs, we need to monitor the performance of a business process.
Infrastructure constraints: we can't stage the entire table anymore because it's too big and takes too long, so we need to pull diffs. We were pulling about 400GB of data (600M rows) for our larger tables; now we are pulling 1.5GB of data (<1M rows). That has widened our window for indexing and made time to tune our source tables before we start processing dims and facts.

Method #1: Custom DML Triggers
Why DML triggers? Because we are developers and that's what we do! These are AFTER INSERT/UPDATE/DELETE triggers.

Pros
- Roll your own. Totally customizable, with lots of options. It's your logic
- Compatible with any version of SQL Server (Express to Enterprise)
- Track history of all or some changes (you choose)
- No external processes to worry about (SQL Agent jobs, capture intervals)
- Quick to implement. Code some triggers and go to lunch
- Control sits firmly in the dev team's hands

Cons
- You must customize the source schema. You might not be able to do this with your applications, as you may void your support contract
- Trigger amnesia: "Uh... oh yeah, I forgot about that trigger." The mass data update took forever to run. How many transactions?
- Trigger proliferation and technical debt: triggers are fun, we like to write triggers, now triggers are everywhere and it's harder to change things

Notes:
Trigger survey: how many of you write and maintain custom triggers? Any challenges? Ever tried to change or improve your database structure?
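To make the trigger approach concrete, here is a minimal sketch of a single AFTER INSERT/UPDATE/DELETE audit trigger. The table `dbo.SalesOrderLine`, its columns, and the audit table are hypothetical names invented for illustration, not taken from the talk's demo code. The sketch also assumes the primary key value is never updated.

```sql
-- Hypothetical source table (illustrative names only).
CREATE TABLE dbo.SalesOrderLine (
    SalesOrderLineId INT NOT NULL PRIMARY KEY,
    OrderStatus      INT NOT NULL,
    Quantity         INT NOT NULL
);
GO

-- Hypothetical audit table holding before/after values for each change.
CREATE TABLE dbo.SalesOrderLine_Audit (
    AuditId          BIGINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SalesOrderLineId INT          NOT NULL,
    ChangeType       CHAR(1)      NOT NULL,  -- 'I', 'U' or 'D'
    OldOrderStatus   INT          NULL,
    NewOrderStatus   INT          NULL,
    OldQuantity      INT          NULL,
    NewQuantity      INT          NULL,
    ChangedAtUtc     DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME(),
    ChangedBy        SYSNAME      NOT NULL DEFAULT SUSER_SNAME()
);
GO

CREATE TRIGGER dbo.trg_SalesOrderLine_Audit
ON dbo.SalesOrderLine
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Set-based: one INSERT covers single-row and multi-row statements.
    -- Rows only in "inserted" are inserts, rows only in "deleted" are
    -- deletes, and rows in both are updates (assuming the key is stable).
    INSERT INTO dbo.SalesOrderLine_Audit
        (SalesOrderLineId, ChangeType,
         OldOrderStatus, NewOrderStatus, OldQuantity, NewQuantity)
    SELECT
        COALESCE(i.SalesOrderLineId, d.SalesOrderLineId),
        CASE WHEN d.SalesOrderLineId IS NULL THEN 'I'
             WHEN i.SalesOrderLineId IS NULL THEN 'D'
             ELSE 'U' END,
        d.OrderStatus, i.OrderStatus,
        d.Quantity,    i.Quantity
    FROM inserted AS i
    FULL OUTER JOIN deleted AS d
        ON d.SalesOrderLineId = i.SalesOrderLineId;
END;
GO
```

The audit insert runs inside the same transaction as the original statement, which is why a forgotten trigger can make a mass update run much longer than expected.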

Method #2: SQL Server Change Tracking
"Change tracking is a lightweight solution..." (msft)

Pros
- Definitively tells you that a column has changed over a specific range of DML statements
- No customization of the source schema required (other than a PK)
- Fairly easy to determine which rows have changed, a little harder to determine which columns have changed
- Included in SQL Server Standard edition (2008+)
- A dev and ops collaboration for configuration and maintenance

Cons
- DML change history is not tracked. Change Tracking does not provide a sequential history of the changed values; it just tells you that the value changed
- Does not provide the time that the change occurred
- Requires a primary key on the table for tracking changes. No heaps allowed
- Requires significant coding and job scheduling to extract your changed data

Notes:
DML change history: you need to monitor the CT tables and watch for a change. You would then take action to capture the appropriate data.
Identifying changes: join back to your source table.
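The "join back to your source table" step can be sketched roughly as follows. The database `SalesDb`, the table `dbo.SalesOrderLine`, and its columns are hypothetical, and the two-day retention is an arbitrary choice for illustration. In a real extract, `@last_sync_version` would be loaded from wherever the previous run saved it.

```sql
-- Enable Change Tracking at the database and table level.
-- SalesDb and dbo.SalesOrderLine are hypothetical names.
ALTER DATABASE SalesDb
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

ALTER TABLE dbo.SalesOrderLine
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
GO

-- Version saved by the previous extract; hard-coded here for illustration.
DECLARE @last_sync_version BIGINT = 0;
DECLARE @current_version   BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

-- If the saved version is older than the retained change history,
-- some changes may already be cleaned up and a full reload is needed.
IF @last_sync_version <
   CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'dbo.SalesOrderLine'))
BEGIN
    RAISERROR(N'Saved version is too old; reload the full table.', 16, 1);
    RETURN;
END;

-- Changed keys come from CHANGETABLE; current values come from the source.
-- Deleted rows have no match in the source, hence the LEFT JOIN.
SELECT
    ct.SalesOrderLineId,
    ct.SYS_CHANGE_OPERATION,          -- 'I', 'U' or 'D'
    ct.SYS_CHANGE_VERSION,
    CASE WHEN ct.SYS_CHANGE_OPERATION = 'U'
         THEN CHANGE_TRACKING_IS_COLUMN_IN_MASK(
                  COLUMNPROPERTY(OBJECT_ID(N'dbo.SalesOrderLine'),
                                 N'OrderStatus', 'ColumnId'),
                  ct.SYS_CHANGE_COLUMNS)
    END AS OrderStatusChanged,
    s.OrderStatus,
    s.Quantity
FROM CHANGETABLE(CHANGES dbo.SalesOrderLine, @last_sync_version) AS ct
LEFT JOIN dbo.SalesOrderLine AS s
    ON s.SalesOrderLineId = ct.SalesOrderLineId;

-- Save @current_version as the next run's @last_sync_version.
```

Note that the query returns only the current values of changed rows. Intermediate values and the time of each change are not available, which matches the cons listed above.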

Method #3: SQL Server Change Data Capture (CDC)

Pros
- Does NOT require a primary key (supports heaps). Exception: the Net Changes TVF
- No customization of the source schema required. CDC reads from the SQL Server transaction log and writes to CDC tables (it leverages SQL Server replication: run a trace and watch!)
- Enable for all columns or for only a subset of columns
- Useful built-in functions and tracking tables, based on time and Log Sequence Number (LSN) ranges, help you extract change data
- Basic SQL Server Data Tools SSIS integration components for ETL processing are included (Attunity)
- A dev and ops collaboration for configuration and maintenance

Cons
- SQL Server Enterprise edition ONLY (2008+). (Standard edition gained CDC in SQL Server 2016 SP1.)
- You must be aware of transaction log management and HA/DR dependencies
- SQL Agent is required for the capture and cleanup jobs; the database owner must be sa
- CDC must be torn down and rebuilt in the event of transaction log maintenance, such as fixing VLF fragmentation, or a log shipping failover
- Catch: DDL! For table DDL changes involving PKs or unique indexes you may need to disable and re-enable CDC on the table. Truncate table (alter table DDL) is restricted: you must disable CDC on the table first
- Watch out for transaction log growth due to the daily cleanup job (take smaller bites: schedule it to run multiple times per day, or limit it with the threshold param)

Notes:
CDC here is not the "Centers for Disease Control", which is relevant to Google searches.
LSN (Log Sequence Number) is our primary key for a transaction. It uniquely identifies a transaction in our TLOG.
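As a rough sketch of the CDC workflow described above: enable CDC, read changes over an LSN range, and make the cleanup job take smaller bites. The database, table, and column names are hypothetical, and the threshold value is an arbitrary illustration. The change query assumes the capture job has already processed some transactions; run immediately after enabling, the LSN bounds can be NULL and the function call fails.

```sql
USE SalesDb;  -- hypothetical database
GO

-- Enable CDC for the database, then for one table and a subset of columns.
EXEC sys.sp_cdc_enable_db;
GO

EXEC sys.sp_cdc_enable_table
    @source_schema        = N'dbo',
    @source_name          = N'SalesOrderLine',      -- hypothetical table
    @role_name            = NULL,                   -- no gating role
    @captured_column_list = N'SalesOrderLineId, OrderStatus, Quantity',
    @supports_net_changes = 0;                      -- heaps are fine here
GO

-- Read all changes in the valid LSN range of the default capture
-- instance, which is named <schema>_<table>.
DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn(N'dbo_SalesOrderLine');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

SELECT
    sys.fn_cdc_map_lsn_to_time(c.__$start_lsn) AS ChangeTime,
    CASE c.__$operation
        WHEN 1 THEN 'DELETE'
        WHEN 2 THEN 'INSERT'
        WHEN 3 THEN 'UPDATE (before)'
        WHEN 4 THEN 'UPDATE (after)'
    END AS Operation,
    c.SalesOrderLineId,
    c.OrderStatus,
    c.Quantity
FROM cdc.fn_cdc_get_all_changes_dbo_SalesOrderLine(
         @from_lsn, @to_lsn, N'all update old') AS c
ORDER BY c.__$start_lsn, c.__$seqval;
GO

-- Limit how many rows each cleanup delete statement removes.
-- 4320 minutes (3 days) is the default retention; 1000 is illustrative.
-- Job changes may need a job stop/start before they take effect.
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 4320,
    @threshold = 1000;
GO
```

For incremental loads, an ETL process would typically save the last `@to_lsn` it processed and start the next extract just after it, rather than reading from the minimum LSN each time.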

Choosing a solution: Extracting customer requirements
Good general questions to ask your customer before settling on a solution. Goal: clarify the problem(s) you are solving for.
- Can you please explain how you are going to be using this data? (Please help me to understand why you want this!)
- What decisions will you be making with this data? (Is this project just a waste of time?)
- What KPIs are you measuring? (Do I need to capture all of the changes?)
- How is this important to the business? (Another round of re-prioritization?)
- How do you plan to report on this data? (Helps you determine the scope of your ETL project once you implement the DML capture solution.)
- How long do you want to retain this data? (Negotiate. This impacts change history retention. Watch out for data trolls.)

Choosing a solution: Technical review
Factors to weigh:
- Licensing
- Application behavior (for a really big, high-IO table, go with CDC rather than adding triggers to the table)
- Your disk sub-system health
- Dev-Ops relationship
- Growth potential for this solution: the number of objects tracked for changes tends to increase over time
- Where do you want to spend your dev time? Building an engine to capture changed data, or developing solutions for your customers?

Custom DML Triggers: "I like to code triggers, and I want everything."
- Use case: track changes for a couple of tables. I need a trigger that immediately emails someone every time they change something. I like to write triggers to capture changes.
- All responsibility is controlled by the dev team.

SQL Server Change Tracking: "I just need to know that something changed."
- Use case: I am OK with not having all of the change history, but I need to know that something changed. I want a light footprint and don't want to (or can't) modify my source schema.
- Dev and Ops collaboration required.

SQL Server Change Data Capture: "I want everything."
- Use case: I want to know all of the change history, and then choose what I want to use for reporting, data warehouse loading, etc. I don't want to (or can't) modify my source schema.
- Dev and Ops collaboration required.

Now that you have selected a technology, the real work begins.
Working with the changed data is the majority of the effort and requires creative solutions.
- Determine your ETL pattern(s): stage all changed data, then apply changes as required to reporting tables and data warehouse staging tables
- Change history tables grow quickly: think about partitioning and trimming
- Consider data compression (PAGE) for change data destination tables, plus indexing and statistics updates for reporting tables (trace flag 2371)
- Maintenance window changes: for Change Tracking and Change Data Capture, transaction log maintenance may lead to gaps in your change data
- Schedule trim jobs to remove change history that exceeds your retention requirements
- Recommendation: avoid the Attunity SSIS CDC components. Use them to gain an understanding of how CDC works with SSIS for ETL, then roll your own ETL solution

Image: https://msdn.microsoft.com/en-us/library/cc645937.aspx
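One possible shape for the "stage, apply, compress, trim" pattern is sketched below. Every object name is hypothetical: a staging table `stage.SalesOrderLine_Changes` with a `ChangeType` ('I', 'U', 'D') and an ordering column `ChangeSeq`, a reporting table `rpt.SalesOrderLine`, and a history table `dbo.SalesOrderLine_ChangeHistory` with a `ChangedAtUtc` column. The 90-day retention and the batch size are arbitrary choices for illustration.

```sql
-- 1. Page-compress the change history destination table.
ALTER TABLE dbo.SalesOrderLine_ChangeHistory
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

-- 2. Apply only the latest staged change per key to the reporting table.
WITH latest AS (
    SELECT SalesOrderLineId, OrderStatus, Quantity, ChangeType,
           ROW_NUMBER() OVER (PARTITION BY SalesOrderLineId
                              ORDER BY ChangeSeq DESC) AS rn
    FROM stage.SalesOrderLine_Changes
)
MERGE rpt.SalesOrderLine AS tgt
USING (SELECT SalesOrderLineId, OrderStatus, Quantity, ChangeType
       FROM latest
       WHERE rn = 1) AS src
    ON tgt.SalesOrderLineId = src.SalesOrderLineId
WHEN MATCHED AND src.ChangeType = 'D' THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET tgt.OrderStatus = src.OrderStatus,
               tgt.Quantity    = src.Quantity
WHEN NOT MATCHED BY TARGET AND src.ChangeType <> 'D' THEN
    INSERT (SalesOrderLineId, OrderStatus, Quantity)
    VALUES (src.SalesOrderLineId, src.OrderStatus, src.Quantity);
GO

-- 3. Trim history past the retention period in small batches, so each
--    delete is a short transaction and log growth stays manageable.
DECLARE @cutoff DATETIME2(3) = DATEADD(DAY, -90, SYSUTCDATETIME());
DECLARE @rows   INT = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (10000)
    FROM dbo.SalesOrderLine_ChangeHistory
    WHERE ChangedAtUtc < @cutoff;

    SET @rows = @@ROWCOUNT;
END;
GO
```

Reducing the staged rows to one per key before the MERGE matters because MERGE raises an error when more than one source row matches the same target row.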

Links and Contact
This prezo, along with demo code for Custom DML Triggers, Change Tracking and Change Data Capture, is posted at www.mattfsmith.com

Scripts:

Useful Resources & References:
- Change Tracking (Mike Byrd, Solarwinds): http://logicalread.solarwinds.com/sql-server-change-tracking-bulletproof-etl-p1-mb01
- Change Data Capture: https://msdn.microsoft.com/en-us/library/bb510744.aspx
- SSIS CDC Components: http://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/

Contact: matt@mattfsmith.com | linkedin.com/in/mattfsmith

Notes:
Call-outs: the Solarwinds link is a good one on Change Tracking. Matt Masson (SSIS PM) covers the CDC Attunity components pretty well.
Recommendation: set this up in your environment and see how it plays with replication, log shipping, etc. Test how it performs when failing over. Get to know CDC well before you consider enabling it in production.