Save Time & Resources: Job Performance Tuning Strategies
Presented by: Jeff Prom, BI Data Architect, Bridgepoint Education
MCTS - Business Intelligence, Admin, Developer
Local User Groups
- Orange County User Group: 2nd Thursday of each month, bigpass.pass.org
- Los Angeles User Group: 3rd Thursday of each odd month, sqlla.pass.org
- San Diego User Group: 1st & 3rd Thursday of each month, meetup.com/sdsqlug, meetup.com/sdsqlbig
- Malibu User Group: 3rd Wednesday of each month, sqlmalibu.pass.org
- Los Angeles - Korean: every other Tuesday, sqlangeles.pass.org
SQLSaturday San Diego: September 22nd
Annual International Conference
November 6-9 | Seattle, WA
- 2 days of pre-cons
- 200+ sessions over 3 days
- Over 5,000 SQL professionals
- Evening networking activities
http://www.pass.org/summit/2018/Home.aspx
Discount Code: SSDISODNS
Jobs Performing Poorly?
Job Performance: What’s at Stake?
- Time
  - Operational support
  - On-call support
- Money
  - Support personnel
  - Hardware resources
  - Do more with less
  - Able to downsize the server and save money?
- Confidence
  - Deadlocks & failures
  - Jobs run long
  - Delayed reports
- Employee morale / loss of employees
- Marriages?
Why Haven’t These Been Tuned Yet?
- "If it ain’t broke, don’t touch it!"
- Lack of time
- Lack of (qualified) resources
- Too complex
- Older code: nobody on the team is familiar with it anymore
- Mission-critical process: nobody wants to work on it
Which Job(s)? Where Exactly? Why is it slow? Fix it!
Which Job(s)?
Which jobs are running long?
- Check job history
- Job Activity Monitor
- Using queries
- Monitoring tools
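For the "using queries" option, one detail worth knowing is that SQL Server Agent stores `run_duration` in `msdb.dbo.sysjobhistory` as an integer encoded as HHMMSS, so it has to be decoded before durations can be compared. The sketch below assumes the history rows have already been fetched; the job names, sample values, and the one-hour threshold are hypothetical.

```python
def run_duration_to_seconds(run_duration: int) -> int:
    """Decode an HHMMSS-encoded integer (e.g. 13005 = 1h 30m 05s) into seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def long_running(history, threshold_seconds):
    """Return (job_name, seconds) pairs above the threshold, slowest first."""
    decoded = [(name, run_duration_to_seconds(d)) for name, d in history]
    flagged = [(name, secs) for name, secs in decoded if secs > threshold_seconds]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical rows shaped like (job name, run_duration) from job history.
sample_history = [
    ("Nightly load", 104000),      # 10h 40m 00s
    ("Index maintenance", 1530),   # 15m 30s
    ("Report refresh", 442),       # 4m 42s
]

slow_jobs = long_running(sample_history, threshold_seconds=3600)
print(slow_jobs)
```

Comparing each run against that job's own typical duration, rather than a single fixed threshold, would likely give fewer false alarms.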
Where Exactly?
Identify:
- Job step
- Package
- Where in the package exactly
  - An SSIS object?
  - A query?
Why Is It Slow?
- Pinpoint the pain points
- Find the root cause
- Check the logs
  - Agent history
  - SSIS Catalog reports
  - SSIS logs
  - Custom logging
  - Stored procedure logging
You Can’t Fix it if You Don’t Know What’s Broke
Strategies & Examples
Indexes
- Add indexes if needed!
- Check for and add missing indexes
- May need to break the query down
- Easy to add, but an index may not always make the query faster. Test again after you add it!
- Types: clustered, non-clustered, filtered
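The "test again after you add it" step can be sketched as a before/after plan check. This example uses Python's built-in `sqlite3` only so that it runs anywhere; SQLite's `EXPLAIN QUERY PLAN` is a stand-in for a SQL Server execution plan, and the `orders` table, index name, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(10000)],
)

QUERY = "SELECT order_id, amount FROM orders WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the plan detail text for a query (last column of each plan row)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Check the plan before the index exists: expect a full table scan.
plan_before = plan(conn, QUERY, (42,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Check again afterwards: confirm the optimizer actually uses the new index.
plan_after = plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The point is the habit, not the engine: capture the plan, add the index, capture it again, and keep the index only if the plan and the timing actually improve.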
SSIS - Lookups
- Full cache mode loads all records into memory before processing any data
- May be loading millions of records into cache to process a small number of records
- If doing many large lookups, consider doing the lookups in the source stored procedure instead
- Time saved: 1 hour per day
SSIS – Cached Connections
- Remove cached connections
- They may be loading millions of records into memory to process a small number of records
- Multiple cached connections may strain the server
- Add the lookup as a join in the source query instead, if possible
- Time saved: 1 hour per day
SSIS – Data Conversion
- Used when source data types are slightly different from destination data types
- Always converting in SSIS seems like a good idea, but it may actually be a really bad one
  - Kills performance
  - Especially bad with text values
- Just let it go to the source table if possible
- Time saved: 3 hours per day
Delta Processing
- Truncate / Reload
  - Easy
  - Slow
  - Not a good long-term solution with a significant amount of data
- Delta by:
  - Incremental ID value (inserts only)
  - Date / time
  - HashBytes / RowHash / Checksum
- Use a control table or get the max from data already loaded
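A minimal sketch of the "incremental ID plus control table" approach follows, using `sqlite3` as a stand-in database. The table names (`source_orders`, `target_orders`, `etl_control`) and the `load_delta` helper are hypothetical, and the pattern as written only covers inserts, as the slide notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_orders (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE target_orders (order_id INTEGER PRIMARY KEY, amount REAL);
-- Control table: remembers the highest ID already loaded per table.
CREATE TABLE etl_control (table_name TEXT PRIMARY KEY, last_loaded_id INTEGER NOT NULL);
INSERT INTO etl_control VALUES ('orders', 0);
""")


def load_delta(connection):
    """Copy only rows above the stored high-water mark, then advance the mark."""
    (last_id,) = connection.execute(
        "SELECT last_loaded_id FROM etl_control WHERE table_name = 'orders'"
    ).fetchone()
    with connection:  # load and watermark update commit together
        loaded = connection.execute(
            "INSERT INTO target_orders "
            "SELECT order_id, amount FROM source_orders WHERE order_id > ?",
            (last_id,),
        ).rowcount
        connection.execute(
            "UPDATE etl_control "
            "SET last_loaded_id = (SELECT COALESCE(MAX(order_id), ?) FROM target_orders) "
            "WHERE table_name = 'orders'",
            (last_id,),
        )
    return loaded


conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 30.0)])
first_run = load_delta(conn)    # picks up the three initial rows
conn.executemany("INSERT INTO source_orders VALUES (?, ?)", [(4, 40.0), (5, 50.0)])
second_run = load_delta(conn)   # picks up only the two new rows
third_run = load_delta(conn)    # nothing new to load
print(first_run, second_run, third_run)
```

The alternative the slide mentions, reading the max from data already loaded, drops the control table at the cost of a `MAX()` query against the target on every run.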
Stage Filtering
- Filter records out BEFORE writing to a stage table
  - Identify the delta method
  - Conditional Split
- Then apply data transformations to only the incoming records (if needed)
- Time saved: 2+ hours per day
- Went from 10 hours 40 minutes to 4 minutes 42 seconds in one case
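To illustrate filtering before the stage table with a row-hash delta, here is a sketch in plain Python. It is not how SSIS or `HASHBYTES` works internally; the `row_hash` and `filter_changed` helpers, the column names, and the sample rows are hypothetical, and SHA-256 over pipe-joined values is just one assumed way to build a row hash.

```python
import hashlib


def row_hash(row, columns):
    """Hash the tracked columns of a row; NULLs are treated as empty strings."""
    joined = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def filter_changed(incoming, known_hashes, key, columns):
    """Keep only rows that are new or whose hash differs from the stored one."""
    changed = []
    for row in incoming:
        if known_hashes.get(row[key]) != row_hash(row, columns):
            changed.append(row)
    return changed


TRACKED = ["name", "city"]

# Hashes of what is already loaded (hypothetical data).
known = {
    1: row_hash({"name": "Ann", "city": "Irvine"}, TRACKED),
    2: row_hash({"name": "Bob", "city": "Los Angeles"}, TRACKED),
}

incoming_rows = [
    {"id": 1, "name": "Ann", "city": "Irvine"},     # unchanged: filtered out
    {"id": 2, "name": "Bob", "city": "San Diego"},  # changed: kept
    {"id": 3, "name": "Cy", "city": "Malibu"},      # new: kept
]

to_stage = filter_changed(incoming_rows, known, key="id", columns=TRACKED)
print([r["id"] for r in to_stage])
```

Only the rows in `to_stage` would then go through any transformations and be written to the stage table. A delimiter-joined hash can collide if the delimiter appears in the data, so a production version would need a safer encoding.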
Task Factory – SCD Component
- Be cautious of hitting the same tables at the same time
  - Blocking
  - Deadlocks
  - Slow performance
- Consider using a stored procedure to do the SCD instead
Task Factory – Upsert Component
- Hits the same table at the same time
- Causes:
  - Blocking
  - Deadlocks
- Consider instead:
  - Using a stored procedure
  - Creating two serialized Data Flow Tasks
    - One for updates
    - One for inserts
- Result: no more deadlocks
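The serialized "updates first, then inserts" idea can be sketched outside SSIS as two statements that run one after the other, so they never touch the target table at the same time. This uses `sqlite3` as a stand-in; the `dim_customer` and `stage_customer` tables and the helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE stage_customer (customer_id INTEGER PRIMARY KEY, city TEXT);
INSERT INTO dim_customer VALUES (1, 'Irvine'), (2, 'Malibu');
INSERT INTO stage_customer VALUES (2, 'San Diego'), (3, 'Seattle');
""")


def upsert_in_two_steps(connection):
    """Run the update pass to completion, then the insert pass."""
    with connection:  # step 1: update rows that already exist in the target
        updated = connection.execute("""
            UPDATE dim_customer
            SET city = (SELECT s.city FROM stage_customer s
                        WHERE s.customer_id = dim_customer.customer_id)
            WHERE customer_id IN (SELECT customer_id FROM stage_customer)
        """).rowcount
    with connection:  # step 2: insert rows that are not in the target yet
        inserted = connection.execute("""
            INSERT INTO dim_customer (customer_id, city)
            SELECT s.customer_id, s.city
            FROM stage_customer s
            WHERE NOT EXISTS (SELECT 1 FROM dim_customer d
                              WHERE d.customer_id = s.customer_id)
        """).rowcount
    return updated, inserted


result = upsert_in_two_steps(conn)
final_rows = conn.execute(
    "SELECT customer_id, city FROM dim_customer ORDER BY customer_id"
).fetchall()
print(result, final_rows)
```

In a stored procedure the same two statements would typically sit in one procedure body; the key property is the ordering, not the tool.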
Scalar-Valued Functions
- Execute once for every record processed (RBAR: "row by agonizing row")
- Kill the execution plan
- Consider moving the logic directly into the code/stored procedure instead
- https://www.databasejournal.com/features/mssql/article.php/3845381/T-SQL-Best-Practices-150-Don146t-Use-Scalar-Value-Functions-in-Column-List-or-WHERE-Clauses.htm
Partition Swapping
- Every table has at least 1 partition
- Works great with SQL Server 2014 and below; may not be needed on 2016+
- SELECT INTO is much faster than INSERT because of parallelism
- Steps:
  - SELECT INTO (to load the data)
  - Then swap partitions
- Time saved: over 13 hours*
  *When combined with removing a scalar function
- Demo
Parameter Sniffing
- Symptoms:
  - Stored procedure runs really slowly
  - The same code runs fast outside of the procedure
  - A recompile doesn’t fix it
- Fix: re-declare the incoming parameters (copy them into local variables and use those in the query)
- Time saved: 3 hours
May Not Always Be Where You Think
- Think outside the box!
- MySQL example: the problem was caused by activity happening on the source server
  - Backups
  - Jobs
  - Queries
  - Reports
- Time saved: 2 hours per day
T-SQL
- Execution plans
  - Actual: check for missing indexes
  - Estimated: check for missing indexes
  - Live Query Statistics
- Cut down the number of records whenever possible
- If you need a single value, assign it to a variable first, then reuse it throughout the code
T-SQL – Using Tables
- If there are a lot of joins, consider using:
  - Temp tables
  - Derived tables
  - ‘Work’ tables
  - Updates
- Breaks the work down into chunks
- Execution plans are much simpler
- Can make a huge difference in performance
T-SQL - Troubleshooting
- Take the original query
- Turn the select <column names> into select count(*)
- Break the query down into chunks to identify which part is slow
- Note: this is also a great strategy for testing to make sure record counts are accurate
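The chunk-by-chunk counting approach can be sketched as a small helper that counts and times each partial query. Instead of editing the column list by hand, this variation wraps each chunk in `SELECT COUNT(*) FROM (...)`; `sqlite3` is a stand-in engine, and the tables, data, and `count_rows` helper are hypothetical.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_notes (order_id INTEGER, note TEXT);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO order_notes VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd');
""")


def count_rows(connection, select_sql):
    """Wrap a SELECT (no trailing semicolon) in COUNT(*) and time it."""
    start = time.perf_counter()
    (n,) = connection.execute(f"SELECT COUNT(*) FROM ({select_sql}) AS q").fetchone()
    return n, time.perf_counter() - start


# Build the query up one join at a time to see where time or row counts change.
base = "SELECT o.order_id FROM orders o"
chunks = [
    ("orders only", base),
    ("+ customers", base + " JOIN customers c ON c.customer_id = o.customer_id"),
    ("+ order_notes", base
        + " JOIN customers c ON c.customer_id = o.customer_id"
        + " JOIN order_notes n ON n.order_id = o.order_id"),
]

counts = {}
for label, sql in chunks:
    counts[label], elapsed = count_rows(conn, sql)
    print(f"{label:15s} rows={counts[label]} seconds={elapsed:.6f}")
```

Here the last join raises the count from 3 to 4 because one order has two notes, which is the kind of record-count surprise the note on the slide is about.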
T-SQL – Troubleshooting (continued)
- Try to get some sort of execution plan to start with
- Start with a minimum set of code and keep adding joins and/or indexes to see where the breaking point is
- May need to rework the code to use temp tables or some other strategy
T-SQL – Things to Avoid
- Subqueries
  - Try to use joins instead
- Cursors (RBAR)
- Scalar-valued functions
- Complex derived table joins
  - Use temp tables, etc. instead
- Recursive CTEs, if possible
  - They can be expensive operations
Thank You! Questions?
Jeff Prom
Blog: http://jeffprom.com
Email: jeffprom@gmail.com
LinkedIn: www.linkedin.com/in/JeffProm