When Good Design Goes Bad
Bob Duffy, Database Architect
Prodata SQL Centre of Excellence
March 2015
Bob Duffy
- 20 years in database sector, 250+ projects
- Senior Consultant with Microsoft
- One of about 25 MCA for SQL Server globally (aka SQL Ranger)
- SQL MCM on SQL 2005 and 2008
- SQL Server MVP
- SSAS Maestro
- Database Architect at Prodata SQL Centre of Excellence
- @bob_duffy
What We Will Cover
- Stored Procedures
- Clustered Tables
- Identity and Primary Keys
- Indexes
- Fragmentation
- Naming Conventions
- Partitioning
- ORM
1. Stored Procedures
- Why use Stored Procedures?
The Dreaded Search Screen
The Chunky v Chatty Debate
- Two Types of “Chunkiness”
  - Data Transferred per Call
  - Number of Calls
- Network latency is important here
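The effect of network latency on the number of calls can be shown with back-of-envelope arithmetic. The sketch below is an illustrative model only: the function name `transfer_time_ms` and the latency, bandwidth, and payload figures are assumptions chosen for the example, not measurements from the talk.

```python
def transfer_time_ms(calls, bytes_per_call, latency_ms, bandwidth_bytes_per_ms):
    """Rough estimate: one network round trip per call, plus time on the wire."""
    round_trips = calls * latency_ms
    wire_time = (calls * bytes_per_call) / bandwidth_bytes_per_ms
    return round_trips + wire_time

# Assumed figures: 1 ms round trip, about 100 MB/s of usable bandwidth.
LATENCY_MS = 1.0
BANDWIDTH = 100_000  # bytes per millisecond

# The same 1,000,000 bytes are moved in both cases.
chatty = transfer_time_ms(1000, 1_000, LATENCY_MS, BANDWIDTH)    # many small calls
chunky = transfer_time_ms(1, 1_000_000, LATENCY_MS, BANDWIDTH)   # one large call

print(f"chatty: {chatty:.0f} ms, chunky: {chunky:.0f} ms")
```

Under these assumed numbers the data volume is identical, so the whole difference comes from the per-call round trips.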
Stored Procedure Weigh In
- Chunky: Performance, Security, Plan Cache, Maintainability
- Chatty: Dynamic Design Patterns, Slow Interpreted TSQL, Application Agility, Developer Agility, The ORM Debate
2. Clustered vs Heap
- Best Practice: Cluster ALL tables?
- Ever Increasing, Narrow, Unique, Static
- Always use an Identity Column
When do Clustered Tables go Bad?
- Harder to scale, especially for some key choices
Large Table Scan Workloads
- Non-sequential clustering keys cause fragmentation
- Why is fragmentation the Achilles heel of table scans?
  - More pages => more IO
  - Kills read-ahead and disk performance
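The "more pages => more IO" point can be made concrete with a small calculation. This is a simplified sketch: it uses the 8 KB SQL Server page size but ignores page header and row overhead, and the row count, row size, and the 50% fullness figure (roughly what a page split leaves behind) are assumptions chosen for illustration. It does not model the read-ahead effect.

```python
import math

PAGE_BYTES = 8192  # SQL Server data page size; header overhead ignored here

def pages_to_scan(row_count, row_bytes, page_fullness):
    """Pages a full scan must read when pages are only partly full."""
    rows_per_page = math.floor(PAGE_BYTES * page_fullness / row_bytes)
    return math.ceil(row_count / rows_per_page)

ROWS = 10_000_000   # assumed table size
ROW_BYTES = 200     # assumed average row size

dense = pages_to_scan(ROWS, ROW_BYTES, page_fullness=1.0)
fragmented = pages_to_scan(ROWS, ROW_BYTES, page_fullness=0.5)

print(f"dense: {dense} pages, fragmented: {fragmented} pages")
```

In this model, half-full pages double the number of pages a scan has to read for the same rows.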
Heavy NCI Requirement
- Best Practice: “Clustered Indexes are Better for Seeks”
- Well, it depends on whether the seek is on an NCI or not!
Clustered Index v Heap
- Ascending Keys
- Range Scans
- Lots of Deletes
- Tables with Heavy Primary Seek
- Most Logging Tables
- Insert and Scan heavy Tables
- OLTP Transaction Tables (banking)
- Bulk Loading
3. Always use Identity for Primary/CX Key
- Best Practice: Always use an Identity column as Primary Key
- Extension: Always add a new Surrogate Key
- This may shoot you in the foot on large Fact Tables
The Distributed Database
- Choice of Identity will cause a lot of pain!
- How common is this issue? Very frequent with replication and new MPP architectures
The Distributed Write Cache?
- Identity creates a bottleneck on the DB
- Serializes new records
- What if the database is offline?
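To illustrate why a database-issued identity is awkward for a write cache, the sketch below contrasts a central counter with a key generated by the writer. It is a toy model, not the approach taken in the talk: `CentralIdentity` and `new_client_key` are hypothetical names, and a random UUID is just one possible writer-generated key.

```python
import uuid

class CentralIdentity:
    """Stand-in for an identity column: one counter owned by one database."""

    def __init__(self):
        self._next = 1
        self.online = True

    def next_id(self):
        # Every writer must come here, one at a time, to get a key.
        if not self.online:
            raise ConnectionError("database offline: no key can be issued")
        value = self._next
        self._next += 1
        return value

def new_client_key():
    """Key generated by the writer itself, with no round trip to the database."""
    return uuid.uuid4()

db = CentralIdentity()
print(db.next_id(), db.next_id())  # keys issued in sequence by the database

db.online = False
print(new_client_key())            # still works while the database is offline
```

The trade-off is that a random key is non-sequential, which is exactly the kind of clustering key that causes fragmentation.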
The Overzealous Dimensional Modeller
- This may go bad if you are not a “single hop” data source
4. We Don’t Need no Indexes?
- Best Practice: Add indexes to reduce IO on important queries
  - Seek rather than scan: SELECT * WHERE CustomerID=2
  - Narrower scan: SELECT SUM(Qty)
- [Figure: sample sales table with columns DateKey, Customer, RegionKey, Sales €, Qty, Cost; rows Jan to Aug]
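The seek-versus-scan difference can be observed by asking a query planner for its plan before and after adding an index. The sketch below uses SQLite from the Python standard library purely as a stand-in, so the plan text differs from what SQL Server would show. The table name `Sales`, the index name, and the generated rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (DateKey INTEGER, CustomerID INTEGER, RegionKey INTEGER, Qty INTEGER)"
)
# Made-up rows: 1000 sales spread over 50 customers.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [(20150101 + i % 28, i % 50, i % 5, 1) for i in range(1000)],
)

def plan(sql):
    """Return the planner's description of how it will read the table."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM Sales WHERE CustomerID = 2"

before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX IX_Sales_CustomerID ON Sales (CustomerID)")
after = plan(query)   # the index allows a search (seek) on CustomerID

print(before)
print(after)
```

SQLite reports the first plan as a scan and the second as a search using the index; the exact wording depends on the SQLite version.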
When Indexes go bad
- OLTP
  - Small tables
  - Larger results (see “The Tipping Point” by Kimberly Tripp)
  - When upsert is more important than select
  - When every column is indexed
  - High-throughput queueing design patterns
- DWH
  - Bad “Tipping Points”
  - Staging tables
  - Tables that we scan
  - When avoiding bad statistics is very hard
- Data Analytics
  - Where we need guaranteed query performance for varied workloads
Guaranteed Performance !!?!
- We have a 1TB table. Query SLA is 5 mins…
- Add indexes?
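One way to frame the 1TB question is to work out the sustained read rate that a worst-case full scan would need in order to meet the SLA. The sketch below is arithmetic only, under stated assumptions: a decimal terabyte, no compression, no caching, and the whole table read once. The function name is hypothetical.

```python
def required_scan_rate_gb_per_s(table_bytes, sla_seconds):
    """Sustained read rate needed to scan the whole table within the SLA."""
    return table_bytes / sla_seconds / 1e9

TABLE_BYTES = 1_000_000_000_000  # 1 TB, decimal (assumption)
SLA_SECONDS = 5 * 60             # 5 minute query SLA from the slide

rate = required_scan_rate_gb_per_s(TABLE_BYTES, SLA_SECONDS)
print(f"{rate:.2f} GB/s")
```

If the storage can sustain that rate, the SLA holds for any query shape, which an index tuned for particular predicates cannot promise.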
5. Stop Worrying about Fragmentation
- Best Practice: Defragment the hell out of your database
- Why could this be bad?
  - Takes a long time and may interfere with query performance
- Why could this not be worth the bother?
  - More memory will reduce reliance on contiguous disk blocks
  - Most SANs only do random IO anyway
  - It's mainly important if our primary concern is scans
6. Naming Conventions
- Best Practice: use one!
- Goes bad when the prefix is metadata (object type, data type, size)
Naming – Common Sense
- Project with the following prefix standards on SSIS:
  - Data Source
  - Transform Type (LOAD, TRANSFORM, EXTRACT)
  - Package
  - Control Flow Shape
7. Partitioning
- Best Practice: Partition when a table is too big or too slow
- Maintenance Operations
- Parallel Queries
- Ordered Queries
- Serial Queries
- Dynamic Parallel Queries
8. ORMs
- Best Practice? Hotly debated
- Good for:
  - Developer Agility
  - Code First, Database Second
  - Integrated Debugging
  - Domain Business Model
  - Cache Management
  - Key Management
  - Portable
Query Plan Nightmares Source:
When ORMs go Bad
- Can write truly horrible TSQL and plans
- Naïve context
- Parameterisation
- The Disaster Scenario
- Lazy/Eager Loading
- Can be used as an excuse for a lack of database expertise
- Hard to index for (lots of SELECT *)
Everything has good and bad aspects
- It Depends ;-)