1 Death by 1000 changes
An overview of several useful Microsoft SQL Server DML change capture technologies.
DML: Data Manipulation Language (compared to DDL, Data Definition Language, and DCL, Data Control Language).
Choosing the wrong technology can kill your team's productivity. Hopefully this prezo will help you make better decisions.
Context: general information, speaking from a collective team experience.
Matt Smith, Software Architect, Enterprise Data Warehouse, Otter Products, LLC
Image: channelawesome.com

2 Today's Agenda
- Review three core SQL Server DML change capture technologies:
  - Custom T-SQL DML triggers
  - SQL Server Change Tracking
  - SQL Server Change Data Capture (CDC)
- Quick review of code exercises (time permitting)
- Scope: Business Intelligence, Data Warehousing and Reporting
- We won't be covering other features such as Service Broker or C2 auditing
- Interactive presentation. If something is not clear, speak up. Let's make this a dialogue.

3 Why capture DML changes?
Capturing changes is useful for solving common business problems:
- Process auditing
  - "The sales team is deleting sales order lines instead of canceling them."
  - "I think someone is deleting purchase order line information."
- Business KPIs
  - Order status change tracking (need to know the first date/time an order went to Status 80)
  - Did we hit our customer shipment targets? (monitor variance between promised and actual first shipment date)
- Data warehousing/ETL loading
  - Incremental data staging/data warehouse loading, intra-day reporting table updates
  - Infrastructure constraints (disk, network)
- Continuous improvement initiatives: cycle time
  - Monitoring process performance (e.g., key accounts return process)
Notes: Customers come to the team with these types of questions. Often the solution involves a process component and a technology (reporting) component. Process owners are filling gaps in systems auditing: "The sales team is deleting order lines instead of canceling them." "I need to know who is changing details for product X." "I think someone is deleting purchase order lines." As for business KPIs, we need to monitor the performance of a business process. Infrastructure constraints: we can't stage the entire table anymore because it's too big and it takes too long, so we need to pull diffs. We were pulling about 400 GB of data (600M rows) with our larger tables; now we are pulling 1.5 GB of data (<1M rows). That has increased our window for indexing and made time to tune our source tables before we start processing dims and facts.

4 Method #1: Custom DML Triggers
Why DML triggers? Because we are developers and that's what we do! After Insert/Update/Delete triggers.
Pros
- Roll your own. Totally customizable, with lots of options. It's your logic.
- Compatible with any edition of SQL Server (Express to Enterprise)
- Track history of all or some changes (you choose)
- No external processes to worry about (SQL Agent jobs, capture intervals)
- Quick to implement. Code some triggers and go to lunch.
- Control sits firmly in the dev team's hands
Cons
- You must customize the source schema. You might not be able to do this with your applications, as you may void your support contract.
- Trigger amnesia: "Uh… oh yeah, I forgot about that trigger." The mass data update took forever to run. How many transactions?
- Trigger proliferation and technical debt: triggers are fun, we like to write triggers, now triggers are everywhere and it's harder to change things.
Notes: Trigger survey: how many of you maintain custom triggers? How many write and maintain triggers, and have you hit any challenges? Ever tried to change or improve your database structure?
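As a rough illustration of the after-update/after-delete trigger approach on this slide, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, purely so it runs anywhere. The table and column names (order_line, order_line_audit, status) are hypothetical, and SQLite trigger syntax is not T-SQL: SQL Server triggers fire once per statement and read from the inserted and deleted pseudo-tables, whereas SQLite triggers fire per row and use OLD/NEW.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a source table, an audit table,
    and AFTER UPDATE / AFTER DELETE triggers that copy changes to the audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_line (
            id     INTEGER PRIMARY KEY,
            status INTEGER NOT NULL
        );
        CREATE TABLE order_line_audit (
            audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            id         INTEGER NOT NULL,
            operation  TEXT NOT NULL,      -- 'U' = update, 'D' = delete
            old_status INTEGER,
            new_status INTEGER,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TRIGGER trg_order_line_update AFTER UPDATE ON order_line
        BEGIN
            INSERT INTO order_line_audit (id, operation, old_status, new_status)
            VALUES (OLD.id, 'U', OLD.status, NEW.status);
        END;
        CREATE TRIGGER trg_order_line_delete AFTER DELETE ON order_line
        BEGIN
            INSERT INTO order_line_audit (id, operation, old_status, new_status)
            VALUES (OLD.id, 'D', OLD.status, NULL);
        END;
        """
    )
    return conn


def audit_rows(conn):
    """Return the captured change history in the order it was written."""
    return conn.execute(
        "SELECT id, operation, old_status, new_status "
        "FROM order_line_audit ORDER BY audit_id"
    ).fetchall()


conn = build_demo_db()
conn.execute("INSERT INTO order_line (id, status) VALUES (1, 10), (2, 10)")
conn.execute("UPDATE order_line SET status = 80 WHERE id = 1")  # order reaches status 80
conn.execute("DELETE FROM order_line WHERE id = 2")             # line deleted, not canceled
history = audit_rows(conn)
```

Every write to the source table now also pays for an audit insert, which is the cost behind the "trigger amnesia" con: a mass update silently becomes a mass update plus a mass insert.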

5 Method #2: SQL Server Change Tracking
"Change tracking is a lightweight solution…" (msft)
Pros
- Definitively tells you that a column has changed over a specific range of DML statements
- No customization of the source schema required (other than a PK)
- Fairly easy to determine which rows have changed, a little harder to determine which columns have changed
- Included in SQL Server Standard edition (2008+)
- A dev and ops collaboration for configuration and maintenance
Cons
- DML change history is not tracked. Change Tracking does not provide a sequential history of the changed values; it just tells you that the value changed.
- Does not provide you with the time that the change occurred
- Requires a primary key on the table for tracking changes. No heaps allowed.
- Requires significant coding and job scheduling to extract your changed data
Notes: On DML change history: you need to monitor the CT tables and watch for a change, then take action to capture the appropriate data. To identify changes, join back to your source table.
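The "join back to your source table" pattern can be sketched with a toy in-memory model. This is not the SQL Server API: TrackedTable and its methods are hypothetical names, and the model is simplified in that it records only the latest operation per key. It shows the key trade-off from the cons list: the consumer learns which keys changed since its last synced version and can read their current values, but intermediate values are gone.

```python
class TrackedTable:
    """Toy model of a change-tracked table: rows keyed by primary key, plus a
    side table recording only the latest change version and operation per key."""

    def __init__(self):
        self.rows = {}       # primary key -> current row values
        self.tracking = {}   # primary key -> (version, operation)
        self.version = 0     # increases with every change

    def _record(self, pk, operation):
        self.version += 1
        self.tracking[pk] = (self.version, operation)

    def insert(self, pk, values):
        self.rows[pk] = dict(values)
        self._record(pk, "I")

    def update(self, pk, values):
        self.rows[pk].update(values)
        self._record(pk, "U")

    def delete(self, pk):
        del self.rows[pk]
        self._record(pk, "D")

    def changes_since(self, last_version):
        """Return (pk, operation, current_row_or_None) for keys changed after
        last_version, joining back to the source rows for current values."""
        result = []
        for pk, (version, operation) in sorted(self.tracking.items()):
            if version > last_version:
                result.append((pk, operation, self.rows.get(pk)))
        return result


table = TrackedTable()
table.insert(1, {"status": 10})
table.insert(2, {"status": 10})
synced_to = table.version          # the consumer has seen everything up to here

table.update(1, {"status": 40})
table.update(1, {"status": 80})    # the intermediate value 40 is not retained
table.delete(2)

delta = table.changes_since(synced_to)
synced_to = table.version          # advance the watermark for the next run
```

Here delta tells the consumer that key 1 was updated (current status 80) and key 2 was deleted, but nothing about the status 40 step or when any of it happened.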

6 Method #3: SQL Server Change Data Capture (CDC)
Pros
- Does NOT require a primary key (supports heaps). Exception: the net changes TVF.
- No customization of the source schema required. CDC reads from the SQL Server transaction log and writes to CDC tables (it leverages SQL Server replication; run a trace and watch!)
- Enable for all columns or for only a subset of columns
- Useful functions and tracking tables based on time and log sequence number (LSN) ranges are built in to help you extract change data
- Basic SQL Server Data Tools SSIS integration components for ETL processing are included (Attunity)
- A dev and ops collaboration for configuration and maintenance
Cons
- SQL Server Enterprise edition only from 2008 through 2016 RTM (Standard edition gained CDC in SQL Server 2016 SP1)
- You must be aware of transaction log management and HA/DR dependencies
- SQL Agent is required for the capture and cleanup jobs; the database owner must be sa
- CDC must be torn down and rebuilt in the event of transaction log maintenance such as fixing VLF fragmentation or a log shipping failover
- Catch: DDL! For table DDL changes involving PKs or unique indexes, you may need to disable and re-enable CDC on the table. Truncate table (alter table DDL) is restricted: you must disable CDC on the table first.
- Watch out for transaction log growth due to the daily cleanup job (take smaller bites: schedule it to run multiple times per day or limit it with the threshold param)
Notes: CDC here is not the "Center for Disease Control", which is relevant to Google searches. The LSN (log sequence number) is our primary key for a transaction: it uniquely identifies a transaction in our transaction log.
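The LSN-range extraction idea can be sketched as a watermark loop. The following is a simulation in plain Python, not the CDC functions themselves: ChangeRow, extract_window and run_extract are hypothetical names, and integers stand in for LSNs (real LSNs are 10-byte binary values). Unlike Change Tracking, every intermediate change is retained in the change table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRow:
    lsn: int          # stand-in for a log sequence number
    operation: str    # 'I', 'U' or 'D'
    pk: int
    status: int


def extract_window(change_table, last_lsn, max_lsn):
    """Return every change with last_lsn < lsn <= max_lsn, in LSN order."""
    window = [c for c in change_table if last_lsn < c.lsn <= max_lsn]
    return sorted(window, key=lambda c: c.lsn)


def run_extract(change_table, watermark):
    """One ETL pass: read up to the current high-water mark, then return the
    batch together with the new watermark to persist for the next pass."""
    if not change_table:
        return [], watermark
    max_lsn = max(c.lsn for c in change_table)
    return extract_window(change_table, watermark, max_lsn), max_lsn


change_table = [
    ChangeRow(101, "I", 1, 10),
    ChangeRow(102, "U", 1, 40),
    ChangeRow(103, "U", 1, 80),
]
first_batch, watermark = run_extract(change_table, 0)

change_table.append(ChangeRow(104, "D", 1, 80))
second_batch, watermark = run_extract(change_table, watermark)
```

Persisting the watermark only after a batch has been applied successfully keeps the extract restartable: a failed run simply re-reads the same window.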

7 Choosing a solution: Extracting customer requirements
Good general questions to ask your customer before settling on a solution. Goal: clarify the problem(s) you are solving for.
- What decisions will you be making with this data?
- Can you explain how you are going to be using this data?
- What KPIs are you measuring?
- How is this important to the business?
- How do you plan to report on this data?
- How long do you want to retain this data? (negotiate)
Notes:
- Can you please explain how you are going to be using this data? (Please help me to understand why you want this!)
- What decisions will you be making with this data? (Is this project just a waste of time?)
- What KPIs are you measuring? (Do I need to capture all of the changes?)
- How is this important to the business? (Another round of re-prioritization?)
- How do you plan to report on this data? (Helps you to determine the scope of your ETL project once you implement the DML capture solution.)
- How long will you need this data? (Impacts change history retention. Watch out for data trolls.)

8 Choosing a solution: Technical review
- Licensing
- Application behavior (for a really big, high-IO table, go CDC rather than adding triggers to the table)
- Your disk subsystem health
- Dev-Ops relationship
- Growth potential for this solution: the number of objects tracked for changes tends to increase over time
- Where do you want to spend your dev time? Building an engine to capture changed data, or developing solutions for your customers?
Custom DML Triggers: I like to code triggers, I want everything. Use case: track changes for a couple of tables. I need to write a trigger to immediately notify someone every time they change something. I like to write triggers to capture changes. All responsibility is controlled by the dev team.
SQL Server Change Tracking: I just need to know that something changed. Use case: I am OK not having all of the change history, but I need to know when something changed. I want a light footprint and don't want to (or can't) modify my source schema. Dev and Ops collaboration required.
SQL Server Change Data Capture: I want everything. I want to know all change history, and then choose what I want to use for reporting, data warehouse loading, etc. I don't want to (or can't) modify my source schema. Dev and Ops collaboration required.

9 Now that you have selected a technology, the real work begins.
Working with the changed data is the majority of the effort and requires creative solutions.
- Determine your ETL pattern(s): stage all changed data, then apply changes as required to reporting tables and data warehouse staging tables
- Change history tables grow quickly: think about partitioning and trimming
- Consider data compression (PAGE) for change data destination tables, plus indexing and statistics updates for reporting tables (trace flag 2371)
- Maintenance window changes: for Change Tracking and Change Data Capture, transaction log maintenance may lead to gaps in your change data
- Schedule trim jobs to remove change history that exceeds your retention requirements
- Recommendation: avoid the Attunity SSIS CDC components. Use them to gain an understanding of how CDC works with SSIS for ETL, then roll your own ETL solution.
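A trim job that "takes smaller bites" might look like the sketch below. It again uses Python's sqlite3 module as a stand-in so it is runnable anywhere; the change_history table, its columns, the dates and the batch size are all hypothetical. In T-SQL the same idea is usually written as a loop around DELETE TOP (n), which keeps each transaction, and therefore transaction log growth, small.

```python
import sqlite3


def trim_history(conn, cutoff, batch_size):
    """Delete change-history rows older than cutoff in batches of batch_size,
    committing after each batch. Returns the total number of rows removed."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM change_history WHERE rowid IN ("
            "  SELECT rowid FROM change_history WHERE changed_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()               # one small transaction per batch
        if cur.rowcount == 0:       # nothing left beyond the retention cutoff
            break
        total += cur.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE change_history (pk INTEGER, changed_at TEXT)")
conn.executemany(
    "INSERT INTO change_history VALUES (?, ?)",
    [(i, "2024-01-%02d" % (i + 1)) for i in range(10)],  # 2024-01-01 .. 2024-01-10
)
conn.commit()

removed = trim_history(conn, "2024-01-08", batch_size=3)
remaining = conn.execute("SELECT COUNT(*) FROM change_history").fetchone()[0]
```

The cutoff would normally be derived from the retention period negotiated with the customer, and the batch size tuned against how much log growth the maintenance window can absorb.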

10 Links and Contact
This prezo, along with demo code for Custom DML Triggers, Change Tracking and Change Data Capture, is posted at
Scripts:
Useful Resources & References:
- Change Tracking (Mike Byrd, Solarwinds): tracking-bulletproof-etl-p1-mb01
- Change Data Capture:
- SSIS CDC Components:
Contact:
Call-outs: Good Solarwinds link on Change Tracking. Matt Mason (SSIS PM) covers the CDC Attunity components pretty well.
Recommendation: Set this up in your environment and see how it plays with replication, log shipping, etc. Test how it performs when failing over. Get to know CDC well before you consider enabling it in production.

