When Good Design Goes Bad
Bob Duffy
SQLSaturday #467 Sponsors
SQLSaturday #467 Admin: Raffle prizes from sponsors will be drawn at 17:00; you need to be here to win if your name is drawn. Lunch will be served in the Exhibitor Area at 12:00, with sponsor sessions taking place in the session rooms. If you need to leave early, please let registration know: we need to know who is still here in the event of a fire evacuation or similar. Enjoy the sessions and have a great time!
Bob Duffy: 20 years in the database sector, 250+ projects. Senior Consultant with Microsoft. One of about 25 MCAs for SQL Server globally (aka SQL Ranger). SQL MCM on SQL 2005 and 2008. SQL Server MVP. SSAS Maestro. Database Architect at Prodata SQL Centre of Excellence. @bob_duffy
What We Will Cover: Stored Procedures; Clustered Tables; Identity and Primary Keys; Indexes; Fragmentation; Naming Conventions; Partitioning; ORMs and nvarchar(max)
1. Stored Procedures: Why Use Stored Procedures?
The Chunky v Chatty Debate. There are two types of “chunkiness”: data transferred per call, and number of statements per batch or business transaction. Network latency is important here.
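To put numbers on the latency point, here is a toy cost model in Python. It is a sketch under assumed figures (a 1 ms or 20 ms round trip, 200 µs of server work per statement, 50 statements per business transaction), not a measurement of SQL Server; `elapsed_us` is a hypothetical helper.

```python
# Toy model (assumed numbers, not measurements): every client/server round
# trip pays the network round-trip time once, and every statement pays a
# fixed amount of server work. Times are in microseconds.


def elapsed_us(statements: int, statements_per_call: int,
               rtt_us: int, server_us_per_stmt: int) -> int:
    """Estimated elapsed time for one business transaction."""
    calls = -(-statements // statements_per_call)  # ceiling division
    return calls * rtt_us + statements * server_us_per_stmt


if __name__ == "__main__":
    for label, rtt in (("LAN, 1 ms RTT", 1_000), ("WAN, 20 ms RTT", 20_000)):
        chatty = elapsed_us(50, 1, rtt, 200)   # one statement per call
        chunky = elapsed_us(50, 50, rtt, 200)  # whole transaction in one batch
        print(f"{label}: chatty={chatty} us, chunky={chunky} us")
```

Under these assumptions the chatty version takes 60,000 µs against 11,000 µs on the LAN, and 1,010,000 µs against 30,000 µs over the WAN. The server work is identical in every case; only the number of round trips changes.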
The Dreaded Search Screen
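One common answer to the search screen is dynamic SQL that includes only the predicates the user actually filled in, so each combination can get its own plan, while the values stay parameterised. The sketch below uses Python's built-in sqlite3 as a stand-in for SQL Server; the Customers table, its columns and `build_search` are hypothetical. In T-SQL the same idea is usually built inside a procedure and executed with sp_executesql.

```python
import sqlite3

# Hypothetical whitelist: search-screen field -> parameterised predicate.
# Only text from this map ever reaches the SQL string; user values always
# travel as bound parameters.
PREDICATES = {
    "customer_id": "CustomerID = ?",
    "surname": "Surname = ?",
    "city": "City = ?",
}


def build_search(filters):
    """Return (sql, params) containing only the filters the user supplied."""
    clauses, params = [], []
    for field, value in filters.items():
        if value is None:
            continue  # blank search box: leave the predicate out entirely
        if field not in PREDICATES:
            raise ValueError(f"unknown search field: {field}")
        clauses.append(PREDICATES[field])
        params.append(value)
    sql = "SELECT CustomerID, Surname, City FROM Customers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers "
                 "(CustomerID INTEGER PRIMARY KEY, Surname TEXT, City TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)",
                     [(1, "Duffy", "Dublin"), (2, "Smith", "Cork"), (3, "Duffy", "Cork")])
    sql, params = build_search({"surname": "Duffy", "city": "Cork", "customer_id": None})
    print(sql)
    print(conn.execute(sql, params).fetchall())  # [(3, 'Duffy', 'Cork')]
```

The whitelist is the important design choice: search-field names select predicates from a fixed map, so no user text is ever concatenated into the SQL.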
Stored Procedure Weigh-In. For: performance; security; plan cache; maintainability; chunky. Against: dynamic design patterns; slow interpreted T-SQL; application agility; developer agility; the ORM debate; chatty.
2. Clustered vs Heap. Best practice: cluster ALL tables? Pick a clustering key that is ever-increasing, narrow, unique and static. Always use an identity column.
The Good Stuff: primary key record selection (SELECT * FROM ORDERS WHERE ORDER_ID=1); range scans (SELECT * FROM ORDER_LINES WHERE ORDER_ID=1, SELECT * FROM FACT WHERE DATE BETWEEN X AND Y); efficient updates/deletes.
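Why clustering helps range scans can be shown with a simple page-counting model: rows that share a clustering key prefix sit on the same pages. The sketch assumes 100 rows per page and a heap whose rows arrived in an order unrelated to ORDER_ID (simulated with a seeded shuffle); `pages_touched` is a hypothetical helper, and this is not how SQL Server physically allocates pages.

```python
import random

ROWS_PER_PAGE = 100  # assumed page capacity, purely illustrative


def pages_touched(row_order, wanted):
    """Distinct pages read to fetch `wanted` rows when the table stores rows
    in `row_order` (a row's page is its position // ROWS_PER_PAGE)."""
    page_of = {row: pos // ROWS_PER_PAGE for pos, row in enumerate(row_order)}
    return len({page_of[row] for row in wanted})


# 1,000 orders x 10 lines each; key = (order_id, line_no)
lines = [(o, n) for o in range(1, 1001) for n in range(1, 11)]
wanted = [row for row in lines if row[0] == 1]  # ... WHERE ORDER_ID = 1

clustered = sorted(lines)          # clustered on (order_id, line_no)
heap = lines[:]                    # heap filled in an unrelated order
random.Random(42).shuffle(heap)

print("clustered:", pages_touched(clustered, wanted))  # 1
print("heap:", pages_touched(heap, wanted))
```

Fetching order 1 reads a single page from the clustered layout, but typically close to one page per row from the heap.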
Insert Scalability: harder to scale, especially for some key choices (an ever-increasing key, for example, sends every concurrent insert to the same last page).
Large Table Scan Workloads: non-sequential clustering keys cause fragmentation. Why is fragmentation the Achilles heel of table scans? It prevents sequential disk access, more pages mean more IO, and it kills read-ahead and disk performance.
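The page-split mechanism behind this slide can be simulated in a few lines. This is a deliberately simplified model, not SQL Server's storage engine: pages hold four keys, a full page splits 50/50, new pages are allocated at the end of the file, and an ascending insert into the full last page simply starts a new page. `build_leaf_level` and `logical_fragmentation` are hypothetical helpers; the latter only approximates the out-of-order-page idea behind SQL Server's fragmentation figure.

```python
import bisect
import random

PAGE_CAPACITY = 4  # rows per leaf page (tiny, assumed, so the effect is visible)


def build_leaf_level(keys):
    """Insert keys one by one into a simplified B-tree leaf level.

    Returns (pages, splits). `pages` is in logical key order; each entry is
    [physical_id, sorted_keys]. New pages always get the next physical_id.
    """
    pages, splits, next_id = [], 0, 0
    for key in keys:
        if not pages:
            pages.append([next_id, [key]])
            next_id += 1
            continue
        # Last page whose first key is <= key (or the first page).
        firsts = [page_rows[0] for _, page_rows in pages]
        i = max(bisect.bisect_right(firsts, key) - 1, 0)
        rows = pages[i][1]
        if len(rows) < PAGE_CAPACITY:
            bisect.insort(rows, key)
        elif i == len(pages) - 1 and key > rows[-1]:
            # Ascending key hitting the full last page: start a new page.
            pages.append([next_id, [key]])
            next_id += 1
        else:
            # Insert into a full page mid-index: 50/50 page split.
            bisect.insort(rows, key)
            half = len(rows) // 2
            pages.insert(i + 1, [next_id, rows[half:]])
            del rows[half:]
            next_id += 1
            splits += 1
    return pages, splits


def logical_fragmentation(pages):
    """% of leaf pages whose logical next page is not the next physical page."""
    out_of_order = sum(1 for a, b in zip(pages, pages[1:]) if b[0] != a[0] + 1)
    return 100.0 * out_of_order / len(pages)


if __name__ == "__main__":
    seq_pages, seq_splits = build_leaf_level(range(1000))
    rnd_pages, rnd_splits = build_leaf_level(random.Random(1).sample(range(100_000), 1000))
    for label, pages, splits in (("sequential", seq_pages, seq_splits),
                                 ("random", rnd_pages, rnd_splits)):
        print(f"{label}: {len(pages)} pages, {splits} splits, "
              f"{logical_fragmentation(pages):.0f}% logical fragmentation")
```

Sequential keys fill 250 pages with no splits and no out-of-order pages; the same number of random keys produces splits, more partly empty pages, and a leaf level whose logical order no longer matches its physical order, which is what defeats sequential reads and read-ahead.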
Heavy NCI Requirement. Best practice: “clustered indexes are better for seeks”. Well, it depends on whether the seek is on a nonclustered index (NCI) or not!
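A back-of-the-envelope count shows why the answer depends on the seek. An NCI on a heap stores a row identifier (RID), so each lookup goes straight to one data page; an NCI on a clustered table stores the clustering key, so each lookup is a seek down the clustered index. The sketch assumes a clustered index depth of 3 and ignores caching and heap forwarded records (which can add an extra read per row); `lookup_reads` is a hypothetical helper.

```python
def lookup_reads(rows: int, heap: bool, clustered_depth: int = 3) -> int:
    """Approximate logical reads for the lookup half of an NCI seek + lookup.

    Heap: the NCI row holds a RID, so each lookup is about one page read.
    Clustered table: the NCI row holds the clustering key, so each lookup
    walks the clustered B-tree, about `clustered_depth` page reads.
    """
    return rows * (1 if heap else clustered_depth)


if __name__ == "__main__":
    print("heap RID lookups:", lookup_reads(1_000, heap=True))       # 1000
    print("clustered key lookups:", lookup_reads(1_000, heap=False))  # 3000
```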
Clustered Index v Heap. Clustered index: ascending keys; range scans; lots of deletes/updates; tables with heavy primary key seeks. Heap: most logging tables; insert- and scan-heavy tables; OLTP transaction tables (banking); bulk loading.
3. Always Use Identity for the Primary/Clustered (CX) Key. Best practice: always use an identity column as the primary key. Extension: always add a new surrogate key. This may shoot you in the foot on large fact tables.
The Distributed Database: the choice of identity will cause a lot of pain! How common is this issue? Very frequent with replication and new MPP architectures.
Distributed Write Cache? Identity creates a bottleneck on the database: it serializes new records. And what if the database is offline?
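The merge problem on these two slides can be sketched with two nodes that create rows on their own and are merged later. Per-node identity counters collide, whereas a composite (node, sequence) key or a random GUID can be generated without asking a central database, which also covers the offline case. The two-node setup and all helper names are hypothetical; note that GUIDs swap the bottleneck for wide, non-sequential keys.

```python
import uuid

# Hypothetical scenario: two nodes (replicas, or an offline write cache)
# create rows independently and the rows are merged later.


def identity_keys(n):
    """IDENTITY(1,1)-style counter local to one node: every node starts at 1."""
    return list(range(1, n + 1))


def composite_keys(node_id, n):
    """(node_id, local sequence) pairs: unique across nodes by construction."""
    return [(node_id, seq) for seq in range(1, n + 1)]


def guid_keys(n):
    """Random version-4 UUIDs: no coordination needed, but 16 bytes wide and
    non-sequential, so they scatter inserts across the index."""
    return [uuid.uuid4() for _ in range(n)]


def collisions(*per_node_keys):
    """Number of duplicate keys after merging every node's rows."""
    merged = [key for keys in per_node_keys for key in keys]
    return len(merged) - len(set(merged))


if __name__ == "__main__":
    print("identity:", collisions(identity_keys(100), identity_keys(100)))          # 100
    print("composite:", collisions(composite_keys(1, 100), composite_keys(2, 100)))  # 0
    print("guid:", collisions(guid_keys(100), guid_keys(100)))
```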
The Overzealous Surrogate Key: this may go bad if you are not a “single hop” data source.
4. We Don’t Need No Indexes? Best practice: add indexes to reduce IO on important queries. Seek rather than scan: SELECT * WHERE CustomerID=2. Narrower scan: SELECT SUM(Qty). [Figure: example fact table with columns DateKey, Customer, RegionKey, Sales €, Qty and Cost, with rows for Jan to Aug]
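The two reasons for adding an index can be seen in a query plan. This sketch uses SQLite (via Python's sqlite3) rather than SQL Server, so the plan text differs: SQLite reports SCAN for a table scan, SEARCH for an index seek, and COVERING INDEX when a narrow index answers the query without touching the table. The Sales table, its data, the index names and the `plan` helper are made up for illustration.

```python
import sqlite3


def plan(conn, sql, params=()):
    """SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleID INTEGER PRIMARY KEY, DateKey INTEGER, "
             "CustomerID INTEGER, Qty INTEGER, Cost REAL)")
conn.executemany(
    "INSERT INTO Sales (DateKey, CustomerID, Qty, Cost) VALUES (?, ?, ?, ?)",
    [(20160101 + i % 28, i % 50, i % 7 + 1, 9.99) for i in range(5000)])

seek_sql = "SELECT * FROM Sales WHERE CustomerID = ?"
sum_sql = "SELECT SUM(Qty) FROM Sales WHERE CustomerID = ?"

before = plan(conn, seek_sql, (2,))   # no index: full table scan
conn.execute("CREATE INDEX IX_Sales_CustomerID ON Sales (CustomerID)")
after = plan(conn, seek_sql, (2,))    # seek on the new index

conn.execute("DROP INDEX IX_Sales_CustomerID")
conn.execute("CREATE INDEX IX_Sales_CustomerID_Qty ON Sales (CustomerID, Qty)")
narrow = plan(conn, sum_sql, (2,))    # answered from the narrow index alone

for label, text in (("before", before), ("after", after), ("narrow", narrow)):
    print(f"{label}: {text}")
```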
When Indexes Go Bad. OLTP: small tables; larger results (see “The Tipping Point” by Kimberly Tripp); when upsert is more important than select; when every column is indexed; high-throughput queueing design patterns. DWH: bad “tipping points”; staging tables; tables that we scan; when avoiding bad statistics is very hard; data analytics, data science and big data queries; where we need guaranteed query performance for varied workloads.
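The intuition behind the tipping point can be expressed as a toy cost model: a seek plus lookups costs roughly one lookup per qualifying row, while a scan costs one read per page, so the switch happens at a row count tied to the table's page count rather than its row count. This is a simplified sketch, not SQL Server's costing (Tripp's post describes the real behaviour); the table size, reads per lookup and helper names are assumptions.

```python
def cheaper_plan(rows_returned: int, table_pages: int,
                 reads_per_lookup: int = 1) -> str:
    """Toy seek-vs-scan choice for a non-covering nonclustered index."""
    lookup_reads = rows_returned * reads_per_lookup  # one lookup per qualifying row
    scan_reads = table_pages                         # a scan reads every page once
    return "seek + lookup" if lookup_reads < scan_reads else "scan"


def tipping_point_rows(table_pages: int, reads_per_lookup: int = 1) -> int:
    """Smallest row count at which the toy model prefers the scan."""
    return -(-table_pages // reads_per_lookup)  # ceiling division


if __name__ == "__main__":
    rows, rows_per_page = 1_000_000, 100  # assumed table shape
    pages = rows // rows_per_page         # 10,000 pages
    tip = tipping_point_rows(pages, reads_per_lookup=3)
    print(f"tipping point ~{tip} rows = {100 * tip / rows:.2f}% of the table")
```

With a million rows at 100 rows per page and three reads per lookup, the model flips to a scan at 3,334 rows, roughly a third of one percent of the table.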
Guaranteed Performance?! We have a 1 TB table and the query SLA is 5 minutes… Add indexes?
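A quick calculation shows why indexes alone rarely guarantee this SLA: any query the indexes don't serve falls back to a scan, so the hardware must sustain a full-table read inside five minutes. The sketch assumes a decimal terabyte and no compression; `required_mb_per_sec` is a hypothetical helper.

```python
def required_mb_per_sec(table_bytes: int, sla_seconds: int) -> float:
    """Sustained read rate (MB/s) needed to scan the whole table inside the SLA."""
    return table_bytes / sla_seconds / 1_000_000


if __name__ == "__main__":
    one_tb = 10**12  # decimal terabyte (assumption)
    print(f"{required_mb_per_sec(one_tb, 5 * 60):,.0f} MB/s")  # 3,333 MB/s
```

That is roughly 3.3 GB/s of sustained scan throughput, however many indexes exist.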
5. Fragmentation. Best practice: defragment the hell out of your database. It maintains optimal scan performance.
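“Defragment everything” is usually automated with a rule of thumb like the one sketched below, driven by `avg_fragmentation_in_percent` and `page_count` from `sys.dm_db_index_physical_stats`. The 5% / 30% thresholds are the widely quoted reorganize/rebuild cut-offs and the 1,000-page minimum is a common convention; treat all three, and the `maintenance_action` helper, as assumptions. Nothing in the rule looks at the workload.

```python
def maintenance_action(avg_fragmentation_pct: float, page_count: int,
                       min_pages: int = 1000) -> str:
    """Rule-of-thumb index maintenance decision (thresholds are assumptions)."""
    if page_count < min_pages:
        return "none"        # small index: fragmentation barely matters
    if avg_fragmentation_pct <= 5:
        return "none"
    if avg_fragmentation_pct <= 30:
        return "REORGANIZE"  # i.e. ALTER INDEX ... REORGANIZE
    return "REBUILD"         # i.e. ALTER INDEX ... REBUILD


if __name__ == "__main__":
    for frag, pages in ((3, 50_000), (12, 50_000), (45, 50_000), (80, 200)):
        print(frag, pages, maintenance_action(frag, pages))
```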
Why Ignore Fragmentation? Our workload is mainly seeks. It helps with insert scalability. Maintenance may load the server. Optimal read-ahead may not be important. More memory will reduce reliance on contiguous disk blocks. Most SANs only do random IO anyway.
6. Naming Conventions. Best practice: use one! It goes bad when the prefix is metadata (object type, data type, size).
Naming: Common Sense. A project had the following prefix standards in SSIS: data source; transform type (LOAD, TRANSFORM, EXTRACT); package; control flow shape.
7. Partitioning. Best practice: partition when a table is too big or too slow. Benefits: easier maintenance/archiving; storage tiering; faster query performance (improved parallelism, better plans, partition elimination); improved data load; the holy sliding window.
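Partition elimination can be sketched like this: when the predicate is on the partition key, only the partitions covering that range are read; otherwise every partition is. The monthly scheme starting in January 2015 and the helper names are hypothetical.

```python
from datetime import date

FIRST_YEAR = 2015  # hypothetical scheme: one partition per month from Jan 2015


def month_partition(d: date) -> int:
    """1-based partition number holding date d."""
    return (d.year - FIRST_YEAR) * 12 + d.month


def partitions_scanned(all_partitions, start=None, end=None):
    """Partitions a query touches. With a range predicate on the partition
    key only the covered months are read; without one, every partition is."""
    if start is None or end is None:
        return list(all_partitions)
    lo, hi = month_partition(start), month_partition(end)
    return [p for p in all_partitions if lo <= p <= hi]


if __name__ == "__main__":
    partitions = range(1, 37)  # Jan 2015 .. Dec 2017
    q1 = partitions_scanned(partitions, date(2016, 2, 1), date(2016, 3, 31))
    q2 = partitions_scanned(partitions)  # no predicate on the partition key
    print(len(q1), "partitions vs", len(q2))  # 2 partitions vs 36
```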
Partitioning introduces “forced” parallelism; often the query processor would do a better job on its own. Scan speed can be slower, and parallelism may be worse.
Index Seeks: many common queries may be slower, especially if not all queries use the partition key. Use of an “ordered” NCI is a car crash.
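The seek penalty comes from index alignment: an aligned NCI is really one B-tree per partition, so a seek that cannot use the partition key has to probe each of them. The index depths below (4 levels unpartitioned, 3 per partition), the 36 partitions and the `seek_reads` helper are assumptions.

```python
def seek_reads(partitions_probed: int, btree_depth: int) -> int:
    """Logical reads for one single-row NCI seek: a root-to-leaf traversal
    of each partition's B-tree that has to be probed."""
    return partitions_probed * btree_depth


if __name__ == "__main__":
    print("unpartitioned NCI:", seek_reads(1, 4))                 # 4
    print("aligned NCI, partition key given:", seek_reads(1, 3))  # 3
    print("aligned NCI, no partition key:", seek_reads(36, 3))    # 108
```

Partition elimination makes the seek slightly cheaper, while losing it costs 108 reads instead of 4 in this sketch.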
8. ORMs. Best practice? Hotly debated. Good for: CRUD; cache management (some of them); portability; developer agility; code first, database second; integrated debugging; domain business model; key management.
Query Plan Nightmares Source:
When ORMs Go Bad: they can write truly horrible T-SQL and plans; higher overhead than ADO.NET; parameterisation (nvarchar(4000)!); the disaster scenario: lazy/eager loading; they can be used as an excuse for a lack of database expertise; hard to index for (lots of SELECT *).
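One classic parameterisation problem: some client libraries (ADO.NET's AddWithValue is the commonly cited example) size a string parameter by the length of each value, so the same query arrives as several different batches and the plan cache fills with near-duplicate plans. The simulation below only compares batch texts; the Customers table, the surnames and `parameterised_text` are hypothetical.

```python
def parameterised_text(value: str, size_by_value: bool) -> str:
    """Batch text as a client might send it (hypothetical table/column).

    size_by_value=True declares nvarchar(len(value)), so each distinct length
    gives different text (and so a separate cached plan); False always
    declares nvarchar(4000).
    """
    size = max(len(value), 1) if size_by_value else 4000
    return (f"(@p0 nvarchar({size}))"
            "SELECT * FROM Customers WHERE Surname = @p0")


if __name__ == "__main__":
    surnames = ["Duffy", "Smith", "O'Neill", "Tripp", "Li", "Fitzgerald"]
    sized = {parameterised_text(s, True) for s in surnames}
    fixed = {parameterised_text(s, False) for s in surnames}
    print(len(sized), "distinct batches vs", len(fixed))  # 4 distinct batches vs 1
```

Fixing the declared length at nvarchar(4000) keeps a single plan, but if the column is varchar the comparison can need an implicit conversion, which depending on collation may turn an index seek into a scan.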
Everything has good and bad aspects. It depends ;-)
Thank You