Performance Tuning 101: Parallelism
Robert L Davis, Database Engineer (@SQLSoldier)
Thanks to our sponsors
Premium Silver Personal
Robert L Davis (@SQLSoldier)
Microsoft Certified Master
Data Platform MVP
Database Engineer at BlueMountain Capital Management
17+ years working with SQL Server
PASS Security Virtual Chapter: volunteers needed
Former Principal Database Architect at DB Best Technologies
Former Principal DBA at Outerwall, Inc
Former Sr. Product Consultant with Idera Software
Former Program Manager for the SQL Server Certified Master program in Microsoft Learning
Former Sr. Production DBA / Operations Engineer at Microsoft (CSS)
Microsoft Certified Master: SQL Server 2008 / MCSM Charter: Data Platform
Co-founder of the SQL PASS Security Virtual Chapter
MCITP: Database Developer: SQL Server 2005 and 2008
MCITP: Database Administrator: SQL Server 2005 and 2008
MCSE: Data Platform
MVP 2014
Co-author of Pro SQL Server 2008 Mirroring
Former Idera ACE (Advisors & Community Educators)
2-time host of T-SQL Tuesday
Guest Professor at SQL University, summer 2010, spring/summer 2011
Speaker at SQL PASS Summit 2010, 2011, and 2012, including a pre-con in 2012
Speaker/Pre-con at SQLRally 2012
Writer for SQL Server Pro (formerly SQL Server Magazine)
Member: Mensa
Dog picture: Maggie and Woody
SQLCruise instructor: Seattle to Alaska 2012
Speaker at SQL Server Intelligence Conference in Seattle 2012
Blog:
Twitter: @SQLSoldier
Parallelism: Architecture
Max worker threads = 576 for 8 logical CPUs = 72 per scheduler
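The 576 figure matches the documented default for a 64-bit instance when the max worker threads setting is left at 0. A minimal Python sketch of that arithmetic follows; the function name is illustrative (it is not a SQL Server API), and the sketch assumes a 64-bit instance with no more than 64 logical CPUs, since the rule for larger machines varies by version.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Default 'max worker threads' on a 64-bit instance (assumption: setting left at 0)."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    if logical_cpus > 64:
        # The rule for more than 64 CPUs differs between versions, so it is left out here.
        raise ValueError("this sketch only covers up to 64 logical CPUs")
    if logical_cpus <= 4:
        return 512
    # 512 base threads plus 16 for every logical CPU beyond the fourth.
    return 512 + (logical_cpus - 4) * 16


cpus = 8
threads = default_max_worker_threads(cpus)
print(threads, threads // cpus)  # 576 72
```

With one scheduler per logical CPU, dividing the total by the CPU count gives the 72 workers per scheduler quoted on the slide.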
SQL Server will generally keep all of a query’s threads on the same NUMA node
If that node is overloaded and another node is not, it may choose to span nodes
Memory is partitioned per NUMA node, though it is accessible to all nodes
Local memory access is faster than foreign memory access
Old NUMA (before Nehalem):
  A foreign memory request was sent to the other node’s CPU for processing
Current NUMA (Nehalem and later):
  A foreign memory request is sent directly to the other node’s memory
Max Degree of Parallelism
Server configuration starting point:
  8 or fewer CPUs: leave at 0
  More than 8 CPUs: 8
  NUMA: the lesser of the number of CPUs per node or 8
Can be overridden by the MAXDOP query hint
Both are overridden by Resource Governor (RG)
  Will use the lesser of the MAXDOP hint or RG if both are defined
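The starting-point rules above can be sketched as a small Python helper. This is only an illustration of the slide's rule of thumb, not an official sizing tool; the function name and parameters are hypothetical, and passing a per-node CPU count is how this sketch signals a NUMA server.

```python
from typing import Optional


def starting_maxdop(logical_cpus: int, cpus_per_numa_node: Optional[int] = None) -> int:
    """Suggested starting value for the server-level max degree of parallelism setting."""
    if cpus_per_numa_node is not None:
        # NUMA: the lesser of the CPUs per node or 8.
        return min(cpus_per_numa_node, 8)
    if logical_cpus <= 8:
        # 8 or fewer CPUs: leave the setting at 0.
        return 0
    # More than 8 CPUs: cap at 8.
    return 8


print(starting_maxdop(4))                          # 0
print(starting_maxdop(16))                         # 8
print(starting_maxdop(32, cpus_per_numa_node=6))   # 6
```

Treat the result as a first guess to test against the workload, in keeping with the slide calling it a starting point.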
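Max DOP: What will it use exactly?
Query Hint (QH)   Resource Governor (RG)   Server Config   Effective MAXDOP of query
Not set           Not set                  Not set (0)     Server decides (up to 64)
Not set           Not set                  Set             Use server config
Not set           Set                      Not set (0)     Use RG
Not set           Set                      Set             Use RG
Set               Not set                  Not set (0)     Use QH
Set               Not set                  Set             Use QH
Set               Set                      Not set (0)     Use min(RG, QH)
Set               Set                      Set             Use min(RG, QH)
Adapted from work by Jack Li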
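The precedence in this table can be written as a short Python function. This is a sketch of the slide's decision table rather than SQL Server's actual implementation; the function name is hypothetical, and it assumes that None or 0 means "not set" for each of the three inputs.

```python
from typing import Optional


def effective_maxdop(query_hint: Optional[int],
                     resource_governor: Optional[int],
                     server_config: int = 0) -> int:
    """Return the MAXDOP a query would run under; 0 means the server decides (up to 64)."""
    # Assumption: both None and 0 are treated as "not set".
    qh = query_hint or None
    rg = resource_governor or None

    if qh is not None and rg is not None:
        return min(rg, qh)   # both set: the lesser wins
    if rg is not None:
        return rg            # RG overrides the server config
    if qh is not None:
        return qh            # the query hint overrides the server config
    return server_config     # fall back to the server setting (0 = server decides)


print(effective_maxdop(None, None, 0))  # 0
print(effective_maxdop(None, 2, 8))     # 2
print(effective_maxdop(8, 2, 4))        # 2
```

Note that the server-level setting only matters when neither the hint nor Resource Governor is set.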
Cost Threshold for Parallelism
All operations in an execution plan have an estimated cost value
  Based loosely on CPU ticks on the desktop of a long-forgotten developer who worked on the feature
Used by the query optimizer to determine whether a task is a candidate for parallelization
Increase the setting to keep smaller plans from parallelizing while still allowing bigger plans to use parallelism
Parallelism can be stripped out at run time if the server is short of memory or threads
If the cost of the serial plan is above the cost threshold for parallelism, a parallel plan will also be generated, and SQL Server will use whichever plan has the lower total cost
  It will choose the serial plan if the cost of the parallel plan is higher
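The compile-time choice described above can be sketched in Python. This is a simplified model of the slide's description, not the optimizer's real logic; the function name and the cost inputs are hypothetical, and the default of 5 is the default value of the cost threshold for parallelism setting.

```python
def choose_plan(serial_cost: float, parallel_cost: float, cost_threshold: float = 5.0) -> str:
    """Pick 'serial' or 'parallel' following the rule of thumb on the slide."""
    if serial_cost <= cost_threshold:
        # Serial plan is cheap enough: a parallel plan is not considered.
        return "serial"
    # Above the threshold: a parallel plan is costed too, and the cheaper plan wins.
    return "parallel" if parallel_cost < serial_cost else "serial"


print(choose_plan(3.0, 1.0))                       # serial
print(choose_plan(20.0, 12.0))                     # parallel
print(choose_plan(20.0, 25.0))                     # serial
print(choose_plan(20.0, 12.0, cost_threshold=50))  # serial
```

The last call shows the effect of raising the threshold: the same query stays serial because its serial cost no longer exceeds the setting. The run-time stripping of parallelism under memory or thread pressure is a separate step that this sketch does not model.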
Demo
Fixing CXPacket Waits
CXPacket: Communication eXchange Packet
CXPacket waits are not what’s broken
They are often indicative of query tuning opportunities
Over-parallelization can cause excessive waits
Beware advice to set Max Degree of Parallelism to 1
  Only useful in very rare edge cases
The goal most of the time is to find the right balance between execution speed and concurrency
The little yellow circle with double arrows on an operator icon means it was compiled as a parallel operation
The thicker the arrow between icons, the more rows were passed along (more work was done)
The Properties tab can show you stats per thread for the highlighted icon or arrow
Thread 0 will always show 0 rows, as it is the watcher thread
The database engine still has the option to run with fewer threads, or in serial, even if the plan was compiled as a parallel operation
Plan details will still show the number of threads from the compiled plan but will show 0 rows for any threads that were not used
Which operation did the most work?
Look at the threads in the plan details
What is the Parallelism (Repartition Streams) operator doing?
Plan details show that it redistributes the rows more evenly across the threads
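To make the idea of redistribution concrete, here is a toy Python model of repartitioning skewed input streams. It is only an illustration: the real operator picks a partitioning type (hash, round robin, and others) based on the plan, and this sketch assumes round robin because it is the simplest to follow. The function name and row counts are invented for the example.

```python
from typing import List


def repartition_round_robin(streams: List[List[int]]) -> List[List[int]]:
    """Deal rows from every input stream out to the same number of output streams in turn."""
    out: List[List[int]] = [[] for _ in streams]
    i = 0
    for stream in streams:
        for row in stream:
            out[i % len(out)].append(row)
            i += 1
    return out


# Hypothetical skew: one thread holds 90 of the 100 rows.
skewed = [list(range(90)), list(range(90, 95)), list(range(95, 98)), list(range(98, 100))]
balanced = repartition_round_robin(skewed)

print([len(s) for s in skewed])    # [90, 5, 3, 2]
print([len(s) for s in balanced])  # [25, 25, 25, 25]
```

Comparing per-thread row counts before and after the operator in the plan details is the same check done by hand.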
Q & A
Thank you for attending!
My blog:
Twitter: twitter.com/SQLSoldier