Super Scaling The LMAX Queue Pattern.

About Me
15+ years of database experience. Speaker at the last three SQLBits conferences and at PASS events around Europe. Some of my material on spinlocks is referenced by SQLskills.

Building A High Performance Queue With SQL Server
I can do this purely in .NET, so why do this in the database?
[Diagram: memory and CPU cache lines]

Building A High Performance Queue With SQL Server
A quote from the abstract of this article by Microsoft Distinguished Engineer Jim Gray:
“Queues need security, configuration, performance monitoring, recovery, and reorganization utilities. Database systems already have these features. A full-function MOM system duplicates these database features. Queue managers are simple TP-monitors managing server pools driven by queues. Database systems are encompassing many server pool features as they evolve to TP-lite systems.”

.NET Provides Concurrent Dictionaries, But Do You Get . . .
- Backup and recovery tools
- Performance monitoring tools
- High availability
- Natively compiled code
- Seamless integration with the database engine

High Performance Queueing: The Naïve Approach
PUSH = INSERT into a clustered index
POP = DELETE with an OUTPUT clause
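
The hotspot in the naïve design can be seen with a small Python model. This is an illustration, not SQL Server code. It assumes keys are stored densely in key order and that ROWS_PER_PAGE rows fit on a page, which is a made-up figure. Every push takes the next, highest key, so consecutive pushes all land on the last page. That is the page concurrent inserters would contend for.

```python
from __future__ import annotations

from collections import deque

ROWS_PER_PAGE = 25  # illustrative assumption, not a measured SQL Server figure


class NaiveQueue:
    """PUSH = insert with an ascending key, POP = delete the lowest key (FIFO)."""

    def __init__(self) -> None:
        self._rows: deque[tuple[int, str]] = deque()  # kept in key order, like a clustered index
        self._next_key = 0                             # ever-increasing key, like IDENTITY
        self.pages_touched: list[int] = []             # page written by each operation

    def push(self, message: str) -> None:
        key = self._next_key
        self._next_key += 1
        self._rows.append((key, message))              # always the highest key: the last page
        self.pages_touched.append(key // ROWS_PER_PAGE)

    def pop(self) -> str | None:
        if not self._rows:
            return None
        key, message = self._rows.popleft()            # lowest key leaves first
        self.pages_touched.append(key // ROWS_PER_PAGE)
        return message


q = NaiveQueue()
for i in range(10):
    q.push(f"message {i}")
print(set(q.pages_touched))  # {0}: all ten pushes wrote to the same page
```

In the real engine, concurrent pushes would all queue up for that one page.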

The LMAX Disruptor Queue Pattern To The Rescue
[Diagram: a FIFO queue of message slots, with Push adding messages and Pop removing them]

    CREATE TABLE dbo.MyQLMax
    (
         [slot]            [bigint]    NOT NULL
        ,[message_id]      [bigint]    NOT NULL
        ,[time]            [datetime]  NOT NULL
        ,[message]         [char](300) NOT NULL
        ,[reference_count] [tinyint]   NOT NULL
    );
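
The idea behind the table above can be modelled in a few lines of Python. This is a sketch, not the author's code, and the class name is hypothetical. The slots are created once and reused, reference_count records whether a slot holds a message, and a counter taken modulo the queue size picks the next slot. For simplicity this single-threaded model advances its counters only on success. The T-SQL procedures later in the deck take a sequence number first and then test the slot.

```python
from __future__ import annotations


class RingQueue:
    """Fixed ring of pre-created slots, mirroring the columns of dbo.MyQLMax."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.message: list[str | None] = [None] * size
        self.reference_count = [0] * size  # 0 = free slot, 1 = holds a message
        self.push_seq = 0                  # stands in for the push-side sequence
        self.pop_seq = 0                   # stands in for a pop-side sequence

    def push(self, msg: str) -> bool:
        slot = self.push_seq % self.size
        if self.reference_count[slot] != 0:
            return False                   # queue full: the next slot is still occupied
        self.message[slot] = msg
        self.reference_count[slot] = 1
        self.push_seq += 1
        return True

    def pop(self) -> str | None:
        slot = self.pop_seq % self.size
        if self.reference_count[slot] != 1:
            return None                    # queue empty: nothing in the next slot
        msg = self.message[slot]
        self.message[slot] = None
        self.reference_count[slot] = 0     # slot becomes free for reuse
        self.pop_seq += 1
        return msg


q = RingQueue(3)
print([q.push(m) for m in "abcd"])  # [True, True, True, False]: the fourth push finds the queue full
```

No rows are ever inserted or deleted, only updated in place. This is what lets the pattern avoid the ever-growing last page of the naïve approach.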

The First Test Run

    CREATE PROCEDURE [dbo].[LMaxPush]
    AS
    BEGIN
        DECLARE  @PushedMessageCount [bigint] = 0
                ,@QueueSize          [bigint] = 200000
                ,@Slot               [bigint]          -- slot numbers come from a sequence, so an integer type
                ,@i                  [int]    = 1;

        SET NOCOUNT ON;

        WHILE @i < @QueueSize
        BEGIN                                          -- BEGIN ... END so that the whole body repeats
            -- dbo.PushSequence is a sequence object that is not shown on the slide
            SET @Slot = NEXT VALUE FOR dbo.PushSequence;

            UPDATE dbo.MyQLMax
            SET     [time]            = GETDATE()
                   ,[message]         = 'Hello world'
                   ,[message_id]      = @Slot
                   ,[reference_count] = [reference_count] + 1
            WHERE   slot = @Slot;

            SET @i += 1;
        END;
    END;

But . . .
Push threads 1 to N all write their messages to slots on the same page, so they serialize on PAGELATCH_EX.

The Solution!
Stop logically contiguous slots from being in the same page.
Scalability: up to 9 threads.
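
The slides do not show how the slots were spread out. One possible approach is sketched below in Python. It assumes rows are stored densely in key order with a made-up ROWS_PER_PAGE rows per page. Slot s is stored under a remapped clustered key so that neighbouring slots fall on different pages. striped_key is a hypothetical helper, not the author's method.

```python
ROWS_PER_PAGE = 25  # illustrative assumption: rows that fit on one 8 KB page


def page_of(key: int) -> int:
    """Page a key lands on when keys are stored densely in key order."""
    return key // ROWS_PER_PAGE


def striped_key(slot: int, queue_size: int) -> int:
    """Hypothetical remapping of a ring-buffer slot to a clustered key.

    Slot s becomes row s // pages of page s % pages, so neighbouring
    slots sit on different pages whenever there is more than one page.
    """
    if queue_size % ROWS_PER_PAGE:
        raise ValueError("queue_size must be a whole number of pages")
    pages = queue_size // ROWS_PER_PAGE
    return (slot % pages) * ROWS_PER_PAGE + slot // pages


print([page_of(s) for s in range(5)])                     # [0, 0, 0, 0, 0]
print([page_of(striped_key(s, 1000)) for s in range(5)])  # [0, 1, 2, 3, 4]
```

The remapping is a permutation of 0 to queue_size - 1, so every slot still gets exactly one row. Padding each row so that only one fits on a page would give the same separation, at the cost of far more pages.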

Where Is The Bottleneck?

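Scalable Sequence Generation

    CREATE TABLE [dbo].[NonBlockingSequence]
    (
         [ID] [bigint] IDENTITY (1, 1) NOT NULL
        ,PRIMARY KEY NONCLUSTERED HASH ([ID]) WITH (BUCKET_COUNT = 524288)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    -- @Slot and @QueueSize are declared by the calling code, which is not shown.
    -- The ROLLBACK discards the inserted row, but identity values are not rolled back,
    -- so the table stays empty while SCOPE_IDENTITY() keeps handing out new numbers.
    -- Interpreted access to a memory-optimized table inside an explicit transaction may need
    -- MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON (or a WITH (SNAPSHOT) hint) under READ COMMITTED.
    BEGIN TRANSACTION;
        INSERT INTO [dbo].[NonBlockingSequence] DEFAULT VALUES;
        SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;
    ROLLBACK TRANSACTION;

Scalability: up to 14 threads.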
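
The trick above relies on identity values not being handed back when the transaction that took them rolls back. The Python model below uses a hypothetical class name. Its lock exists only to make the Python counter thread-safe and is not part of the SQL Server design. The model shows that concurrent callers each get a distinct number, which then wraps onto a ring-buffer slot.

```python
import itertools
import threading


class RolledBackIdentity:
    """Model of IDENTITY(1, 1) on a table whose inserts are always rolled back.

    The row disappears but the identity value it consumed is never reissued,
    so every caller gets a distinct number and the table never grows.
    """

    def __init__(self, queue_size: int) -> None:
        self.queue_size = queue_size
        self._counter = itertools.count(1)  # IDENTITY(1, 1): first value 1, step 1
        self._lock = threading.Lock()       # makes the Python counter thread-safe

    def next_identity(self) -> int:
        with self._lock:
            return next(self._counter)

    def next_slot(self) -> int:
        # SELECT @Slot = SCOPE_IDENTITY() % @QueueSize
        return self.next_identity() % self.queue_size


seq = RolledBackIdentity(queue_size=8)
print([seq.next_slot() for _ in range(9)])  # [1, 2, 3, 4, 5, 6, 7, 0, 1]
```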

The “Last Page” Problem Killed The Naïve Approach
Push threads 1 to N all insert their messages on the same page and serialize on PAGELATCH_EX.
What about the in-memory OLTP engine? After all, it is 100% lock and latch free . . .

Hash Index Versus Range Index: Memory-Optimised Table Scalability
Hash index version:

    CREATE TABLE [dbo].[MyQLmaxImOltp]
    (
         [Slot]            [bigint] IDENTITY(1, 1) NOT NULL
        ,[message_id]      [bigint] NULL
        ,[time]            [datetime] NOT NULL
        ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
        ,[reference_count] [tinyint] NOT NULL
        ,PRIMARY KEY NONCLUSTERED HASH ([Slot]) WITH (BUCKET_COUNT = 4194304)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Range index version (an alternative definition of the same table, not run alongside the first):

    CREATE TABLE [dbo].[MyQLmaxImOltp]
    (
         [Slot]            [bigint] IDENTITY(1, 1) NOT NULL
        ,[message_id]      [bigint] NULL
        ,[time]            [datetime] NOT NULL
        ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
        ,[reference_count] [tinyint] NOT NULL
        ,PRIMARY KEY NONCLUSTERED ([Slot])
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

This Is How A Singleton Insert Workload Scales With These Tables
Even with zero locking and latching, the naïve approach will not scale that well with a range index . . .

What About The In-Memory OLTP Engine?
The in-memory OLTP engine is lock and latch free, so there are no performance bottlenecks to see here . . . or are there?

LMAX Queue In-Memory Code V1: Queue Table

    CREATE TABLE [dbo].[MyQLmaxImOltp]
    (
         [Slot]            [bigint] NOT NULL
        ,[message_id]      [bigint] NULL
        ,[time]            [datetime] NOT NULL
        ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
        ,[reference_count] [tinyint] NOT NULL
        ,PRIMARY KEY NONCLUSTERED HASH ([Slot]) WITH (BUCKET_COUNT = 2097152)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
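
The bucket counts in these scripts (524288, 2097152 and 4194304) are all powers of two. SQL Server rounds BUCKET_COUNT up to the next power of two. Microsoft's guidance is a bucket count of one to two times the number of distinct key values. The Python helper below (a hypothetical name) picks that value. For the 2,000,000 queue size used by LmaxPushImOltp, it gives exactly the 2097152 used above.

```python
def bucket_count_for(distinct_keys: int) -> int:
    """Smallest power of two that is >= distinct_keys.

    This is the value SQL Server would round BUCKET_COUNT up to anyway, and it
    always lies in the recommended range of 1x to 2x the distinct key count.
    """
    if distinct_keys < 1:
        raise ValueError("distinct_keys must be at least 1")
    return 1 << (distinct_keys - 1).bit_length()


print(bucket_count_for(2_000_000))  # 2097152
print(bucket_count_for(200_000))    # 262144
```

The procedures that use a 200,000-slot queue size would get 262144 from the same rule.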

LMAX Queue In-Memory Code V1: Slot Id Generation

    CREATE TABLE [dbo].[NonBlockingSequence]
    (
         [ID] [bigint] IDENTITY(1, 1) NOT NULL
        ,PRIMARY KEY NONCLUSTERED HASH ([ID]) WITH (BUCKET_COUNT = 524288)
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

    CREATE PROCEDURE [dbo].[GetSlotId]
         @Slot      int OUTPUT
        ,@QueueSize int
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
    (
         TRANSACTION ISOLATION LEVEL = SNAPSHOT
        ,LANGUAGE = N'us_english'
    )
        INSERT INTO dbo.NonBlockingSequence DEFAULT VALUES;
        -- map the ever-increasing identity value onto a ring-buffer slot
        SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;
    END;

LMAX Queue In-Memory Code V1: Push Message Procs

    CREATE PROCEDURE [dbo].[LmaxPushImOltp]
    AS
    BEGIN
        DECLARE  @QueueSize     int = 2000000
                ,@MessagePushed int
                ,@Slot          int
                ,@i             int = 0;

        WHILE @i <= @QueueSize
        BEGIN
            -- roll back the sequence insert: the identity value survives, the row does not
            BEGIN TRAN;
                EXEC dbo.GetSlotId @Slot OUTPUT, @QueueSize;
            ROLLBACK TRAN;

            EXEC dbo.PushMessageImOltp @Slot, @MessagePushed OUTPUT;

            IF @MessagePushed = 0
                SET @i += @QueueSize;   -- slot still occupied (queue full): jump the counter to the end
            ELSE
                SET @i += 1;
        END;
    END;

    CREATE PROCEDURE [dbo].[PushMessageImOltp]
         @Slot          int
        ,@MessagePushed int OUTPUT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
    (
         TRANSACTION ISOLATION LEVEL = SNAPSHOT
        ,LANGUAGE = N'us_english'
    )
        DECLARE @QueueSize int = 200000;

        -- only claim the slot if it is free (reference_count = 0)
        UPDATE [dbo].[MyQLmaxImOltp]
        SET     time            = GETDATE()
               ,message         = 'Hello world'
               ,message_id      = @Slot
               ,reference_count = reference_count + 1
        WHERE   Slot = @Slot
        AND     reference_count = 0;

        SET @MessagePushed = @@ROWCOUNT;   -- 1 = pushed, 0 = slot still occupied
    END;

How Well Does This Scale?

Where Is The CPU Time Going?

Let's Try Reducing The Number Of Procedures Called
The push and get slot id procedures rolled into one:

    ALTER PROCEDURE PushMessageImOltp
         @Slot          int        -- now assigned inside the procedure
        ,@MessagePushed int OUTPUT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
    (
         TRANSACTION ISOLATION LEVEL = SNAPSHOT
        ,LANGUAGE = N'us_english'
    )
        DECLARE @QueueSize int = 200000;

        -- no ROLLBACK here, so each call leaves a row behind in NonBlockingSequence
        INSERT INTO [dbo].[NonBlockingSequence] DEFAULT VALUES;
        SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

        UPDATE [dbo].[MyQLmaxImOltp]
        SET     time            = GETDATE()
               ,message         = 'Hello world'
               ,message_id      = @Slot
               ,reference_count = reference_count + 1
        WHERE   Slot = @Slot
        AND     reference_count = 0;

        SET @MessagePushed = @@ROWCOUNT;
    END;

This Is The New Throughput Graph
A 44% improvement!!!

When Throughput Falls Off A Cliff, Where Is The CPU Time Going?
41% of the total CPU time is being expended on one spinlock!!!

What Do The DLLs In The Call Stack Represent?
- Language Processing (SQLLANG.dll): T-SQL interpreter, statistics collection, result set rendering and query optimization
- Query Execution (SQLMIN.dll): iterators, memory grants, latches, spinlocks, column store batch engine
- In-Memory OLTP Engine: Hekaton.dll, <native compiled proc.dll>, <in memory table.dll>
- Expression Service (SQLTST.dll): expression evaluation, basic compression, data type handling and conversion
- Query Data Store (QDS.dll)
- Storage Engine (SQLMIN.dll)
- SQL OS (SQLDK.dll, SQLOS.dll): threads, memory management framework, synchronization
The in-memory OLTP engine breaks this clean layering model . . .

Hekaton.dll And Its Host Engine SQLMIN.dll
What Hekaton.dll uses from SQLMIN.dll:
- The metadata cache spinlock, CMED_HASH_SET
- Backup and restore
- The logging infrastructure, for the logging of undo
- The management of ‘joint’ transactions, i.e. transactions which span both engines
- The allocation of large pages (64 to 256 KB)

What Is Spinlock<62,16,1> Doing?

The Killer Metadata Cache Protection Spinlock!!!
CMED_HASH_SET!!!
- Synchronises access to the metadata cache
- Used to check that objects have not been dropped prior to query execution
- Other than using a fast CPU, there is little that can be done about this
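
For readers new to spinlocks, here is a toy Python spinlock. It shows why heavy spinlock contention turns into CPU time: a waiting thread does not sleep, it keeps retrying. The spins counter is loosely analogous to the spins column in sys.dm_os_spinlock_stats. Unlike SQL Server's spinlocks, this teaching model never backs off, and the class and function names are hypothetical.

```python
from __future__ import annotations

import threading


class SpinLock:
    """Toy spinlock: waiters retry in a tight loop instead of sleeping."""

    def __init__(self) -> None:
        self._flag = threading.Lock()   # non-blocking acquire acts as test-and-set
        self._stats = threading.Lock()  # protects the spin counter itself
        self.spins = 0

    def acquire(self) -> None:
        while not self._flag.acquire(blocking=False):
            with self._stats:
                self.spins += 1         # each failed attempt burns CPU doing nothing useful

    def release(self) -> None:
        self._flag.release()


def hammer(lock: SpinLock, box: list[int], iterations: int) -> None:
    """Increment a shared counter under the spinlock."""
    for _ in range(iterations):
        lock.acquire()
        box[0] += 1
        lock.release()


lock = SpinLock()
total = [0]
threads = [threading.Thread(target=hammer, args=(lock, total, 500)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total[0], lock.spins)  # 2000, plus a spin count that varies from run to run
```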

What Spinlock Activity Looks Like With The Disk-Based Row-Store Engine

What Spinlock Activity Looks Like With The In-Memory Engine

Do Not Try This At Home!!!
The database engine is coded such that the spinlock protecting the metadata cache is not taken out if you run a workload inside a system database.

What About Popping Messages Off The Queue?
The push procedure, now drawing its slots from a dedicated push sequence:

    ALTER PROCEDURE PushMessageImOltp
        @MessagePushed int OUTPUT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
    (
         TRANSACTION ISOLATION LEVEL = SNAPSHOT
        ,LANGUAGE = N'us_english'
    )
        DECLARE  @QueueSize int = 200000
                ,@Slot      int;   -- @Slot is no longer a parameter, so it is declared here

        INSERT INTO [dbo].[NonBlockingPushSequence] DEFAULT VALUES;
        SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

        UPDATE [dbo].[MyQLmaxImOltp]
        SET     time            = GETDATE()
               ,message         = 'Hello world'
               ,message_id      = @Slot
               ,reference_count = reference_count + 1
        WHERE   Slot = @Slot
        AND     reference_count = 0;

        SET @MessagePushed = @@ROWCOUNT;
    END;

What About Popping Messages Off The Queue?
The pop procedure, drawing its slots from a separate pop sequence:

    CREATE PROCEDURE PopMessageImOltp
        @MessagePopped CHAR(300) OUTPUT
    WITH NATIVE_COMPILATION, SCHEMABINDING
    AS
    BEGIN ATOMIC WITH
    (
         TRANSACTION ISOLATION LEVEL = SNAPSHOT
        ,LANGUAGE = N'us_english'
    )
        DECLARE  @QueueSize int = 200000
                ,@Slot      int;

        INSERT INTO [dbo].[NonBlockingPopSequence] DEFAULT VALUES;
        SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

        -- only a slot holding a message (reference_count = 1) can be popped; popping frees it
        UPDATE [dbo].[MyQLmaxImOltp]
        SET     time            = GETDATE()
               ,@MessagePopped  = message
               ,message_id      = @Slot
               ,reference_count = 0
        WHERE   Slot = @Slot
        AND     reference_count = 1;
    END;
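
Here is a single-threaded Python model of the push and pop procedures with separate sequences. It is a sketch, not the author's code, and the class and method names are made up. As in the T-SQL, each call takes its sequence number before testing the slot, so a failed call still consumes a number. One consequence shows up in the model. A pop that runs ahead of the pushes skips a slot. The message later pushed into that slot then waits until the pop sequence comes round to it again.

```python
from __future__ import annotations


class SeqRing:
    """Model of PushMessageImOltp / PopMessageImOltp with independent sequences."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.message: list[str | None] = [None] * size
        self.ref = [0] * size      # reference_count per slot
        self.push_seq = 0          # NonBlockingPushSequence (identity starts at 1)
        self.pop_seq = 0           # NonBlockingPopSequence (identity starts at 1)

    def push(self, msg: str) -> int:
        self.push_seq += 1                     # number is taken before the slot is checked
        slot = self.push_seq % self.size
        if self.ref[slot] != 0:
            return 0                           # @MessagePushed = @@ROWCOUNT = 0
        self.message[slot], self.ref[slot] = msg, 1
        return 1

    def pop(self) -> str | None:
        self.pop_seq += 1                      # consumed even if the slot turns out empty
        slot = self.pop_seq % self.size
        if self.ref[slot] != 1:
            return None
        msg, self.message[slot], self.ref[slot] = self.message[slot], None, 0
        return msg


r = SeqRing(4)
print([r.push(m) for m in "abc"], [r.pop() for _ in range(3)])  # [1, 1, 1] ['a', 'b', 'c']

s = SeqRing(4)
print(s.pop(), s.push("x"), s.pop())  # None 1 None: "x" sits in slot 1, the pop sequence has moved to slot 2
```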

How Push And Pop Scale When Working Together . . .

What Have We Learned?
- The naïve approach to implementing a queue is throttled by the “last page problem”.
- We can overcome the last page problem using the LMAX disruptor pattern, but with the legacy engine we need to craft a scalable sequence generator and stop push and pop threads from hitting the same page.
- We pay a performance penalty for switching between the ‘host’ and in-memory engines.
- Using the in-memory engine can still result in spinlock pressure, due to its use of the legacy engine as a “host engine”.

Please review the event and sessions: http://speakerscore.com/1FJ1 and http://speakerscore.com/ZGVX
11/8/2018

My Contact Details
chris@exadat.co.uk
http://uk.linkedin.com/in/wollatondba
ChrisAdkin8