What Is a Latch? …and Why Do I Care? Eddie Wuerch, MCM


What Is a Latch? …and Why Do I Care? Eddie Wuerch, MCM, Salesforce Marketing Cloud

Hi! I’m Eddie :)
Over 15 years with SQL Server
Microsoft Certified Master
Salesforce Marketing Cloud (Indianapolis)
Trillions of rows … 10s of billions of tx/day … PBs of data & indexes … 24x7, no downtimes

The Three ‘C’s of Performance Capacity Configuration Code Disk Performance Memory Capacity Disk Performance Disk Allocation Contention Scans Hotspotting Insert/Update/Delete Metadata Contention TempDB Abuse

PAGELATCH_* PAGEIOLATCH_*

Perf Monitoring: Waits
A process is either waiting or working.
Not working? It’s waiting on something specific: a lock, data, memory, CPU, or something to do.
Reading wait statistics is the first step of the Waits and Queues method.
See the Microsoft whitepaper “SQL Server 2005 Waits and Queues”.

What is a ‘Wait’?
Running: request for an unavailable resource (processing stops, wait begins)
Suspended: Resource Wait Time (ms), until the process is signaled that the resource is available (wait continues)
Runnable: Signal Wait Time (ms), until a scheduler is available (wait ends, processing continues)
Running
Wait Time (ms) = Resource Wait Time (ms) + Signal Wait Time (ms)
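A minimal Python sketch of the split described on this slide: total wait time is the resource wait plus the signal wait. The field names follow the wait_time_ms and signal_wait_time_ms columns of sys.dm_os_wait_stats; the WaitStat class and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WaitStat:
    """One row of wait statistics (hypothetical container for illustration)."""
    wait_type: str
    wait_time_ms: int         # total wait: resource wait + signal wait
    signal_wait_time_ms: int  # time spent runnable, waiting for a scheduler

    @property
    def resource_wait_time_ms(self) -> int:
        # Time spent suspended, waiting for the resource itself.
        return self.wait_time_ms - self.signal_wait_time_ms

    @property
    def signal_ratio(self) -> float:
        # Share of the wait that was spent waiting for CPU, not the resource.
        if self.wait_time_ms == 0:
            return 0.0
        return self.signal_wait_time_ms / self.wait_time_ms


# Made-up sample values, not measurements.
sample = WaitStat("PAGELATCH_EX", wait_time_ms=1200, signal_wait_time_ms=300)
print(sample.resource_wait_time_ms)  # 900
print(sample.signal_ratio)           # 0.25
```

A high signal ratio would point at CPU pressure rather than at the resource named in the wait type.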

Latch Waits
Fall into two main categories: PAGELATCH_* and PAGEIOLATCH_*
Similar names, different issues, different solutions
So… what causes a latch wait?
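Because the two families have such similar names, it can help to sort wait types mechanically before reading them. The following Python sketch does that by prefix; the function name and the category labels are hypothetical, chosen only for this example.

```python
def classify_latch_wait(wait_type: str) -> str:
    """Sort a wait type name into one of the two latch families.

    Returns "io" for PAGEIOLATCH_* (page being read from disk),
    "memory" for PAGELATCH_* (page already in memory), "other" otherwise.
    """
    name = wait_type.upper()
    if name.startswith("PAGEIOLATCH_"):
        return "io"
    if name.startswith("PAGELATCH_"):
        return "memory"
    return "other"


for wt in ("PAGELATCH_UP", "PAGEIOLATCH_SH", "LCK_M_X"):
    print(wt, "->", classify_latch_wait(wt))
```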

What is a Latch?
Controls access to 8KB of memory (one page)
Tracked with a 64-byte record (a “BUF”)
Protects page metadata, prevents inconsistent reads
The only way to modify bitmap pages (GAM, SGAM, etc.)
Latches occur outside of user transactions; they are designed to be quick

Query Engine vs. Storage Engine
Query Engine: runs queries, manipulates data, only sees memory, calls the Storage Engine for disk work
Storage Engine: all disk activity, Disk->Memory (read), Memory->Disk (write)
Between them: IPC

Latching – Why? Meet the Page
Data page (8KB)
Page header – 96 bytes of page metadata: Page ID, Next Page ID, Previous Page ID, Owning object ID, Index ID, Index level, Free space on page, Next row offset, …other stuff
Page row data – starts at byte 97
Row-offset table – starts at last byte, moves backwards
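The page layout above can be modeled in a few lines of Python: rows grow forward from the end of the 96-byte header while the row-offset table grows backward from the end of the page. This is only a sketch; the PageLayout class is hypothetical, and the two bytes per row-offset entry is an assumption that the slide does not state.

```python
PAGE_SIZE = 8192    # 8KB page
HEADER_SIZE = 96    # page header
SLOT_SIZE = 2       # assumption: bytes per row-offset entry


class PageLayout:
    """Toy model of where rows and row-offset entries land on a page."""

    def __init__(self) -> None:
        # First row starts right after the header (0-based offset 96 = byte 97).
        self.next_row_offset = HEADER_SIZE
        self.slots: list[int] = []  # row offsets, in insert order

    def free_bytes(self) -> int:
        used_by_slots = SLOT_SIZE * len(self.slots)
        return PAGE_SIZE - self.next_row_offset - used_by_slots

    def insert(self, row_size: int) -> int:
        # A new row needs room for its data plus one row-offset entry.
        if row_size + SLOT_SIZE > self.free_bytes():
            raise ValueError("page full")
        offset = self.next_row_offset
        self.slots.append(offset)
        self.next_row_offset += row_size
        return offset

    def slot_position(self, slot_index: int) -> int:
        # Row-offset entries are laid out from the last byte backwards.
        return PAGE_SIZE - SLOT_SIZE * (slot_index + 1)


page_layout = PageLayout()
print(page_layout.insert(100))       # 96
print(page_layout.insert(100))       # 196
print(page_layout.slot_position(0))  # 8190
print(page_layout.free_bytes())      # 7892
```

Both the next row offset and the row-offset table are shared state on the page, which is why concurrent writers need to be serialized.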

Without Latching…
Data page 1:1000 (8KB): Next row offset = 0x200; 0x080: Row 1 Data; 0x140: Row 2 Data; 0x200: Free
Process 1: Insert new row at offset 0x200
Process 2: Insert new row at offset 0x200
Both processes write their new row data at 0x200, so one row overwrites the other.
Source: Microsoft PSS Presentations

PAGELATCH_UP – Data Page Example
Data page 1:1000 (8KB): Next row offset = 0x200; 0x080: Row 1 Data; 0x140: Row 2 Data; 0x200: Free
Process 1: LATCH_UP on page 1:1000, insert new row at offset 0x200, Next row offset becomes 0x260
Process 2: LATCH_UP on page 1:1000 requested while Process 1 holds the latch (PAGELATCH_UP wait), then insert new row at offset 0x260
Source: Microsoft PSS Presentations
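The page 1:1000 example can be imitated with two Python threads. This is a rough sketch, not how SQL Server implements latches: threading.Lock is a plain mutual-exclusion stand-in (real latches have modes such as SH, UP and EX), and the LatchedPage class and the 0x60 row size are assumptions chosen to reproduce the 0x200 and 0x260 offsets from the slide.

```python
import threading

ROW_SIZE = 0x60  # assumed row size, so 0x200 + 0x60 = 0x260 as on the slide


class LatchedPage:
    """Toy page whose next-row offset is protected by a latch-like lock."""

    def __init__(self, next_row_offset: int = 0x200) -> None:
        self.next_row_offset = next_row_offset
        self.rows: dict[int, str] = {}
        self._latch = threading.Lock()  # stand-in for the page latch

    def insert(self, data: str) -> int:
        # Whoever arrives second blocks here; that is the latch wait.
        with self._latch:
            offset = self.next_row_offset
            self.rows[offset] = data
            self.next_row_offset = offset + ROW_SIZE
            return offset


page = LatchedPage()
workers = [
    threading.Thread(target=page.insert, args=(f"New Row Data - Process {n}",))
    for n in (1, 2)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Each process got its own offset; neither row was overwritten.
print(sorted(hex(offset) for offset in page.rows))  # ['0x200', '0x260']
```

Which process lands at 0x200 depends on thread scheduling, but the two rows never share an offset.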

PAGELATCH_* - Data-in-Memory Modification Wait
Usually indicates one of:
Hotspotting
GAM, SGAM, or PFS contention
Creating and dropping lots of #temp tables (which is actually a combination of the previous two)
Examples:
PAGELATCH_SH – Waiting to get a shared latch (lock) on a memory page
PAGELATCH_EX – Waiting to get an exclusive latch (lock) on a memory page
Indicates: multiple possible issues, depending on the page on which the latch contention is occurring.
Inserts/updates/page splits outpacing allocation: There is one GAM page and one SGAM page for each 4GB of each data file. Many processes requesting newly allocated pages in the file fight over these pages. A PFS page exists every 8,088 pages in a file; many BLOB and other shared-extent inserts can cause latch contention on these pages.
Hotspotting: many separate threads inserting rows into a single data page, such as an index on an identity column.
Adding and dropping lots of #temp tables (the drop is the issue).
Use DBCC PAGE(db_id, file_id, page_id, printopt) on the page (displayed in the wait_resource column of the waits query) to determine the page type (see the m_type field in the page header).
Page types (m_type in page header):
1 – In-row data page – could be hotspotting in user databases; indicates a problem with too many #temp table drops if the object is sys.sysmultiobjrefs
2 – Index page – All non-leaf clustered index pages and all non-clustered index pages
3 – Text mix page – Parts of LOB values plus internal parts of text trees
4 – Text tree page – Large chunks of LOB data from a single value
7 – Sort page
8 – GAM
9 – SGAM
10 – IAM – Index Allocation Map – A bitmap similar to the GAM and SGAM for tracking all extents within the 4GB file block for an index or allocation unit
11 – PFS
13 – Boot page – Database info, only one per database (file 1 page 9)
15 – File header
16 – Diff map page – (DCM or diff change map) – tracks which extents in the GAM interval have changed since the last full backup
17 – ML map page – (BCM or bulk change map) – tracks which extents in the GAM interval have changed in bulk-logged mode since the last full backup
Who fixes it?
For GAM, SGAM, and PFS contention issues: DBAs add more data files.
Hotspotting: Tuning options include shrinking row sizes (varchar vs. char, char vs. nchar, tinyint/smallint/int/bigint, smalldatetime/datetime) and SQL Server compression, so fewer pages are allocated to a table for the same amount of data.
For the tempdb table-drop issue, change SQL calls to not drop temp tables. Just let them go out of scope and be cleaned up by the deferred-drop mechanism.
More Info:
Look in sys.dm_exec_requests to watch for statements waiting on PAGELATCH waits. Check the wait_resource column for the page on which contention is occurring. It will look like 2:1:103; the format is db_id:file_id:page_id, and those three values are the first three parameters of DBCC PAGE(db_id, file_id, page_id[, printopts]). In the example, you can see the page header with:
DBCC TRACEON(3604)
DBCC PAGE (2, 1, 103, 0)
See Paul Randal’s blog at www.sqlskills.com for much more detail
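As a small illustration of the db_id:file_id:page_id format, this Python sketch splits a wait_resource value such as 2:1:103 and builds the matching DBCC PAGE call. The helper names are hypothetical, and the parser assumes the value has exactly the three-part form shown in the notes.

```python
from typing import NamedTuple


class PageRef(NamedTuple):
    db_id: int
    file_id: int
    page_id: int


def parse_wait_resource(wait_resource: str) -> PageRef:
    """Split a 'db_id:file_id:page_id' string into its three integers."""
    parts = wait_resource.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected db_id:file_id:page_id, got {wait_resource!r}")
    db_id, file_id, page_id = (int(part) for part in parts)
    return PageRef(db_id, file_id, page_id)


def dbcc_page_command(ref: PageRef, printopt: int = 0) -> str:
    """Build the DBCC PAGE statement text for a page reference."""
    return f"DBCC PAGE ({ref.db_id}, {ref.file_id}, {ref.page_id}, {printopt})"


ref = parse_wait_resource("2:1:103")
print(dbcc_page_command(ref))  # DBCC PAGE (2, 1, 103, 0)
```

The generated statement still has to be run after DBCC TRACEON(3604) for the output to come back to the client.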

PAGELATCH_* - Hotspotting
Symptom: PAGELATCH_* waits on data pages
Often at the ‘end’ of an index (identity index, datetime index, etc.)
How to address…
Is this index or ordering scheme necessary?
Can many single inserts be batched together?
Edge case – could partition on a hash (has its own problems, not a plug-n-play solution)
Could also be a ‘hot’ page – small table, many inserts and deletes
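To picture the hash-partitioning edge case for hotspotting, here is a Python sketch that compares where 16 consecutive identity values would land. It is only a model: the modulo function stands in for whatever hash would actually be used, the partition count is arbitrary, and nothing here reflects SQL Server's own partition functions.

```python
from collections import Counter

PARTITIONS = 8  # hypothetical partition count


def partition_for(identity_value: int) -> int:
    """Stand-in hash: spread consecutive identity values across partitions."""
    return identity_value % PARTITIONS


recent_inserts = range(1001, 1017)  # 16 consecutive identity values

# Without partitioning, every new row targets the page at the end of the index.
single_index = Counter("last page" for _ in recent_inserts)

# With the hash, each partition has its own end page.
hashed = Counter(partition_for(value) for value in recent_inserts)

print(max(single_index.values()))  # 16 inserts competing for one page
print(max(hashed.values()))        # 2 inserts per partition's end page
```

The contention is spread out, but queries that read by identity range now have to touch every partition, which is one of the problems the slide warns about.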

Data File Structure
The base unit of data storage is called a page. All pages are 8KB (8192 bytes).
Pages are organized into 8-page extents of 64KB.
First Extent (8 pages – 64KB):
Page 0 – File header
Page 1 – PFS
Page 2 – GAM
Page 3 – SGAM
Page 4 – empty
Page 5 – empty
Page 6 – DCM
Page 7 – BCM
Following pages: Page 8, Page 9*, Page 10, Page 11 … Page 15

Bitmap Metadata Pages
GAM, SGAM, BCM, DCM, IAM, …
96-byte header, followed by the bitmap: 1101011000101010111010111…
Each bit maps to one extent: the first bit to P.0–P.7 (First Extent), the second bit to P.8–P.15 (Second Extent), and so on
8,096 bytes = 64,768 bits
1 bit/extent = 64,768 extents
64,768 extents * 64KB/extent = ~4GB of disk space per single GAM
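The arithmetic on this slide can be checked in a few lines of Python. The sketch follows the slide's simplified model (the whole page after the 96-byte header is bitmap); the constant names are made up for the example.

```python
PAGE_BYTES = 8192    # one 8KB page
HEADER_BYTES = 96    # page header
EXTENT_KB = 64       # one extent = 8 pages of 8KB

bitmap_bytes = PAGE_BYTES - HEADER_BYTES   # bytes left for the bitmap
bitmap_bits = bitmap_bytes * 8             # one bit per extent
covered_kb = bitmap_bits * EXTENT_KB       # file space tracked by one bitmap page
covered_gb = covered_kb / (1024 * 1024)

print(bitmap_bytes)  # 8096
print(bitmap_bits)   # 64768
print(covered_gb)    # 3.953125, i.e. roughly 4GB per GAM
```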

File Space Allocation
Proportional Fill in action
[Figure: free space in File 1]

Multiple-File Benefits
Multi-core systems introduce contention issues
File 1, File 2, File 3, File 4
Start of each file (0 GB to 4 GB): Page 0 File Header, Page 1 PFS, Page 2 GAM, Page 3 SGAM
1 GAM and 1 SGAM for every 64,768 extents (4GB of file space). All allocations in that space affect the GAM.
There are 64 PFS pages in the same 4GB (every 8,088 pages). Usually not an issue, except in TempDB.
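A rough Python sketch of why more files help: if allocations are spread evenly over the files, each file's GAM sees only its share of the traffic. It assumes equally sized files with equal free space, so that proportional fill behaves like round-robin; the function name and the allocation count are hypothetical. The last lines reproduce the slide's figure of about 64 PFS pages per 4GB.

```python
from collections import Counter
from itertools import cycle


def gam_hits(allocations: int, file_count: int) -> Counter:
    """Count allocations landing on each file's GAM under even round-robin."""
    files = cycle(range(1, file_count + 1))
    return Counter(next(files) for _ in range(allocations))


print(gam_hits(1000, 1))  # Counter({1: 1000}): one GAM takes every allocation
print(gam_hits(1000, 4))  # 250 allocations per file's GAM

# PFS pages per GAM interval, using the slide's numbers.
pages_per_gam_interval = 64_768 * 8           # extents * pages per extent
pfs_pages = pages_per_gam_interval // 8_088   # one PFS page per 8,088 pages
print(pfs_pages)  # 64
```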

PAGELATCH_* - GAM, SGAM, PFS
Symptom: PAGELATCH_* waits on the following page types:
GAM (header m_type = 8)
SGAM (header m_type = 9)
PFS (header m_type = 11) (more common in tempdb)
These are allocation waits
Usually solved by DBAs adding more data files
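Once DBCC PAGE has returned the m_type of the contended page, the triage from this talk amounts to a lookup. The Python sketch below encodes the m_type values listed in the speaker notes; the function name and the wording of the returned hints are hypothetical.

```python
# m_type values as listed in the speaker notes.
PAGE_TYPES = {
    1: "In-row data page",
    2: "Index page",
    3: "Text mix page",
    4: "Text tree page",
    7: "Sort page",
    8: "GAM",
    9: "SGAM",
    10: "IAM",
    11: "PFS",
    13: "Boot page",
    15: "File header",
    16: "Diff map page",
    17: "ML map page",
}

ALLOCATION_TYPES = {8, 9, 11}  # GAM, SGAM, PFS


def describe_page_type(m_type: int) -> str:
    """Map an m_type from the page header to a name and a first hint."""
    name = PAGE_TYPES.get(m_type, "unknown page type")
    if m_type in ALLOCATION_TYPES:
        return f"{name}: allocation wait, usually solved by adding more data files"
    if m_type == 1:
        return f"{name}: possible hotspotting"
    return name


for m_type in (8, 11, 1, 10):
    print(m_type, "->", describe_page_type(m_type))
```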

PAGEIOLATCH_* - I/O Waits
Examples: PAGEIOLATCH_SH, PAGEIOLATCH_EX
Indicates: Scanning from disk; may also indicate low memory or slow read I/O
Who fixes it? Developers fix scans; operations adds memory and checks disk performance
More info: check the waits query for waiting statements and check the plans

PAGEIOLATCH – More Detail
[Diagram: the data cache holds a BUF (64 bytes) for Page 1:100. Steps shown: LATCH_EX the BUF, call async I/O fetch, release LATCH_EX; LATCH_SH the BUF (acquire LATCH_SH), read Page 1:100. Requests that arrive while the fetch is in flight show as a PAGEIOLATCH_SH wait or a PAGEIOLATCH_EX wait.]

Thank you for attending! Eddie Wuerch Twitter: @eddiew