Super Scaling The LMAX Queue Pattern
About Me
- 15+ years of database experience
- Speaker at the last three SQL Bits and at PASS events around Europe
- Some of my material on spinlocks is referenced by SQL Skills
Building A High Performance Queue With SQL Server
"I can do this purely in .NET, so why do this in the database?"
Building A High Performance Queue With SQL Server
A quote from Microsoft distinguished engineer Jim Gray, taken from the abstract of this article:
“Queues need security, configuration, performance monitoring, recovery, and reorganization utilities. Database systems already have these features. A full-function MOM system duplicates these database features. Queue managers are simple TP-monitors managing server pools driven by queues. Database systems are encompassing many server pool features as they evolve to TP-lite systems.”
.NET Provides Concurrent Dictionaries, But Do You Get . . .
- Backup and recovery tools
- Performance monitoring tools
- High availability
- Natively compiled code
- Seamless integration with the database engine
High Performance Queueing: The Naïve Approach
- PUSH = INSERT into a clustered index
- POP = DELETE with OUTPUT clause
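The naïve push and pop described above can be sketched as follows. This is only an illustration: the table name dbo.NaiveQueue, its columns and the locking hints are assumptions, since the slides do not show code for this approach.

```sql
-- Hypothetical naive queue; all names here are illustrative, not from the slides.
CREATE TABLE dbo.NaiveQueue (
     [message_id] bigint IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
    ,[time]       datetime  NOT NULL
    ,[message]    char(300) NOT NULL
);
GO

-- PUSH: the ever-increasing key means every insert lands at the end of the index.
INSERT INTO dbo.NaiveQueue ([time], [message])
VALUES (GETDATE(), 'Hello world');
GO

-- POP: delete the oldest unlocked row and return it in a single statement.
WITH oldest AS (
    SELECT TOP (1) [message_id], [time], [message]
    FROM dbo.NaiveQueue WITH (ROWLOCK, READPAST)
    ORDER BY [message_id]
)
DELETE FROM oldest
OUTPUT deleted.[message_id], deleted.[time], deleted.[message];
```

Because all pushes target the same end of the clustered index, concurrent push threads would be expected to compete for the same page.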
The LMAX Disruptor Queue Pattern To The Rescue
[Diagram: push and pop threads working on the message slots of a FIFO queue]
CREATE TABLE dbo.MyQLMax (
     [slot]            [bigint]    NOT NULL
    ,[message_id]      [bigint]    NOT NULL
    ,[time]            [datetime]  NOT NULL
    ,[message]         [char](300) NOT NULL
    ,[reference_count] [tinyint]   NOT NULL
)
The First Test Run
CREATE PROCEDURE [dbo].[LMaxPush]
AS
BEGIN
    DECLARE  @PushedMessageCount [bigint] = 0
            ,@QueueSize          [bigint] = 200000
            ,@Slot               [bigint]   -- holds a sequence value, so bigint rather than datetime
            ,@i                  [int]    = 1;

    SET NOCOUNT ON;

    WHILE @i < @QueueSize
    BEGIN
        -- Claim the next slot, then overwrite the message stored in it
        SET @Slot = NEXT VALUE FOR dbo.PushSequence;

        UPDATE dbo.MyQLMax
        SET    [time]            = GETDATE()
              ,[message]         = 'Hello world'
              ,[message_id]      = @Slot
              ,[reference_count] = [reference_count] + 1
        WHERE  slot = @Slot;

        SET @i += 1;
    END;
END;
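The push procedure relies on a sequence and on slots that already exist in the queue table, neither of which is shown on the slides. A minimal sketch of how they might be set up follows; the sequence options, the slot range of 1 to 200000 and the use of sys.all_columns as a row source are all assumptions.

```sql
-- Assumption: dbo.PushSequence cycles over the slot range so the queue wraps around.
CREATE SEQUENCE dbo.PushSequence
    AS bigint
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 200000
    CYCLE
    CACHE 1000;
GO

-- Assumption: the ring of slots is created up front with empty messages,
-- so that a push is always an UPDATE and never an INSERT.
INSERT INTO dbo.MyQLMax ([slot], [message_id], [time], [message], [reference_count])
SELECT TOP (200000)
     ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    ,0
    ,GETDATE()
    ,''
    ,0
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
```

Pre-populating the table is what turns the queue into a fixed-size ring: the table never grows, and the sequence simply points at the next slot to overwrite.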
But . . .
[Diagram: push threads 1 to N write messages to slots on the same page of the FIFO queue and contend on PAGELATCH_EX]
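One way to confirm this kind of page latch contention is to look at the wait statistics DMVs while the push threads are running. The queries below are a generic sketch and are not taken from the slides.

```sql
-- Cumulative page latch waits since the instance started (or the stats were cleared).
SELECT wait_type
      ,waiting_tasks_count
      ,wait_time_ms
      ,max_wait_time_ms
      ,signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGELATCH[_]%'
ORDER BY wait_time_ms DESC;

-- Tasks waiting right now; for page latches resource_description has the
-- form database_id:file_id:page_id, which shows whether threads share a page.
SELECT session_id
      ,wait_type
      ,wait_duration_ms
      ,resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type = N'PAGELATCH_EX';
```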
The Solution !
- Stop logically contiguous slots from being in the same page
- Scalability up to 9 threads
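The slides do not show how the slots were spread across pages. One possible way, sketched below purely as an assumption, is to cluster the table on a scattered key derived from the logical slot. The table dbo.MyQLMaxScattered, the column scattered_slot and the multiplier 7919 are all hypothetical, and logical slots are assumed to run from 0 to 199999.

```sql
-- Hypothetical variant of dbo.MyQLMax, clustered on a scattered key.
CREATE TABLE dbo.MyQLMaxScattered (
     [scattered_slot]  bigint    NOT NULL PRIMARY KEY CLUSTERED
    ,[slot]            bigint    NOT NULL
    ,[message_id]      bigint    NOT NULL
    ,[time]            datetime  NOT NULL
    ,[message]         char(300) NOT NULL
    ,[reference_count] tinyint   NOT NULL
);
GO

DECLARE  @QueueSize bigint = 200000
        ,@Slot      bigint = 42;   -- a logical slot in the range 0 .. @QueueSize - 1

-- 7919 is prime and shares no factor with 200000, so the mapping is one-to-one.
-- Slots s and s + 1 map to keys 7919 apart (modulo the queue size), which puts
-- them far apart in the clustered index when only a few dozen rows fit on a page.
UPDATE dbo.MyQLMaxScattered
SET    [time]            = GETDATE()
      ,[message]         = 'Hello world'
      ,[message_id]      = @Slot
      ,[reference_count] = [reference_count] + 1
WHERE  [scattered_slot] = (@Slot * 7919) % @QueueSize;
```

The rows would need to be pre-populated with the same mapping applied to each logical slot. Padding each row so that it fills a page on its own would be another way to reach the same goal, at the cost of a much larger table.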
Where Is The Bottleneck ?
Scalable Sequence Generation
CREATE TABLE [dbo].[NonBlockingSequence] (
     [ID] [bigint] IDENTITY (1, 1) NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [ID] ) WITH ( BUCKET_COUNT = 524288 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )

BEGIN TRANSACTION
    INSERT INTO [dbo].[NonBlockingSequence] DEFAULT VALUES;
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;
-- Identity values are not reclaimed on rollback, so the table stays empty
ROLLBACK TRANSACTION;

Scalability up to 14 threads
The “Last Page” Problem Killed The Naïve Approach
[Diagram: push threads 1 to N write messages to the same page of the FIFO queue and contend on PAGELATCH_EX]
What about the in-memory OLTP engine? After all, it is 100% lock and latch free . . .
Memory Optimised Table Scalability: Hash Index Versus Range Index

Hash index:
CREATE TABLE [dbo].[MyQLmaxImOltp] (
     [Slot]            [bigint] IDENTITY(1,1) NOT NULL
    ,[message_id]      [bigint] NULL
    ,[time]            [datetime] NOT NULL
    ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
    ,[reference_count] [tinyint] NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [Slot] ) WITH ( BUCKET_COUNT = 4194304 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )

Range index:
CREATE TABLE [dbo].[MyQLmaxImOltp] (
     [Slot]            [bigint] IDENTITY(1,1) NOT NULL
    ,[message_id]      [bigint] NULL
    ,[time]            [datetime] NOT NULL
    ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
    ,[reference_count] [tinyint] NOT NULL
    ,PRIMARY KEY NONCLUSTERED ( [Slot] )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )
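The hash index variant depends on a sensible BUCKET_COUNT. As a hedged aside that is not part of the original slides, the hash index statistics DMV can show whether the chosen bucket count suits the number of rows in the table.

```sql
-- Run in the database that holds the memory optimised table.
-- Long chains suggest too few buckets (or many duplicate keys);
-- a very high share of empty buckets suggests more buckets than needed.
SELECT OBJECT_NAME(hs.object_id) AS table_name
      ,i.name                    AS index_name
      ,hs.total_bucket_count
      ,hs.empty_bucket_count
      ,hs.avg_chain_length
      ,hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
  ON  i.object_id = hs.object_id
  AND i.index_id  = hs.index_id;
```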
This Is How A Singleton Insert Workload Scales With These Tables
Even with zero locking and latching, the naïve approach will not scale that well with a range index . . .
What About The In-Memory OLTP Engine ?
The in-memory OLTP engine is lock and latch free, so there are no performance bottlenecks to see here, or are there . . . ?
LMAX Queue In Memory Code V1: Queue Table
CREATE TABLE [dbo].[MyQLmaxImOltp] (
     [Slot]            [bigint] NOT NULL
    ,[message_id]      [bigint] NULL
    ,[time]            [datetime] NOT NULL
    ,[message]         [char](300) COLLATE Latin1_General_CI_AS NOT NULL
    ,[reference_count] [tinyint] NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [Slot] ) WITH ( BUCKET_COUNT = 2097152 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )
LMAX Queue In Memory Code V1: Slot Id Generation
CREATE TABLE [dbo].[NonBlockingSequence] (
     [ID] [bigint] IDENTITY(1, 1) NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [ID] ) WITH ( BUCKET_COUNT = 524288 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )

CREATE PROCEDURE [dbo].[GetSlotId]
     @Slot      int OUTPUT
    ,@QueueSize int
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT
                   ,LANGUAGE = N'us_english' )
    INSERT INTO dbo.NonBlockingSequence DEFAULT VALUES;
    -- Wrap the identity value around the queue size, as in the sequence generation slide
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;
END;
LMAX Queue In Memory Code V1: Push Message Procs
CREATE PROCEDURE [dbo].[LmaxPushImOltp]
AS
BEGIN
    DECLARE  @QueueSize     int = 2000000
            ,@MessagePushed int
            ,@Slot          int
            ,@i             int = 0;

    WHILE @i <= @QueueSize
    BEGIN
        -- Claim a slot id, then roll back so the sequence table stays empty
        BEGIN TRAN
            EXEC GetSlotId @Slot OUTPUT, @QueueSize;
        ROLLBACK TRAN;

        EXEC dbo.PushMessageImOltp @Slot, @MessagePushed OUTPUT;

        IF @MessagePushed = 0
        BEGIN
            -- The slot was still occupied: force the loop to exit
            SET @i += @QueueSize;
        END
        ELSE
        BEGIN
            SET @i += 1;
        END;
    END;
END;

CREATE PROCEDURE [dbo].[PushMessageImOltp]
     @Slot          int
    ,@MessagePushed int OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT
                   ,LANGUAGE = N'us_english' )
    DECLARE @QueueSize int = 200000;

    -- Only write to the slot if it is empty (reference_count = 0)
    UPDATE [dbo].[MyQLmaxImOltp]
    SET    time            = GETDATE()
          ,message         = 'Hello world'
          ,message_id      = @Slot
          ,reference_count = reference_count + 1
    WHERE  Slot = @Slot
    AND    reference_count = 0;

    SET @MessagePushed = @@ROWCOUNT;
END;
How Well Does This Scale ?
Where Is The CPU Time Going ?
Let's Try Reducing The Number Of Procedures Called
The push and get slot id procedures rolled into one:
ALTER PROCEDURE PushMessageImOltp
     @Slot          int
    ,@MessagePushed int OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT
                   ,LANGUAGE = N'us_english' )
    DECLARE @QueueSize int = 200000;

    -- Slot id generation now happens inside the push procedure
    INSERT INTO [dbo].[NonBlockingSequence] DEFAULT VALUES;
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

    UPDATE [dbo].[MyQLmaxImOltp]
    SET    time            = GETDATE()
          ,message         = 'Hello world'
          ,message_id      = @Slot
          ,reference_count = reference_count + 1
    WHERE  Slot = @Slot
    AND    reference_count = 0;

    SET @MessagePushed = @@ROWCOUNT;
END;
This Is The New Throughput Graph
44% improvement !!!
When Throughput Falls Off A Cliff, Where Is The CPU Time Going ?
41% of the total CPU time is being expended on one spinlock !!!
What Do The .DLLs In The Call Stack Represent ?
- Language Processing (SQLLANG.dll): T-SQL interpreter, statistics collection, result set rendering and query optimization
- Query Execution (SQLMIN.dll): iterators, memory grants, latches, spinlocks, column store batch engine
- Expression Service (SQLTST.dll): expression evaluation, basic compression, data type handling and conversion
- Query Data Store (QDS.dll)
- In Memory OLTP Engine: Hekaton.dll, <native compiled proc.dll>, <in memory table.dll>
- Storage Engine (SQLMIN.dll)
- SQL OS (SQLDK.dll, SQLOS.dll): threads, memory management framework, synchronization
The in-memory OLTP engine breaks this clean layering model . . .
Hekaton.dll and Its Host Engine SQLMIN.dll
What Hekaton.dll uses from SQLMIN.dll:
- The metadata cache (spinlock cmedhashset)
- Backup and restore
- The logging infrastructure for the logging of undo
- The management of ‘joint’ transactions, that is, transactions which span both engines
- The allocation of large pages (64 to 256 KB)
What Is Spinlock<62,16,1> Doing ?
The Killer Metadata Cache Protection Spinlock !!!
CMED_HASH_SET !!!
- Synchronises access to the metadata cache
- Used to check that objects have not been dropped prior to query execution
- Other than to use a fast CPU, there is little that can be done about this
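Spinlock pressure of this kind can be observed from inside SQL Server with the spinlock statistics DMV. The query below is a generic sketch and not taken from the slides; the figures are cumulative, so taking a snapshot before and after a test run and comparing the two gives a clearer picture.

```sql
-- Top spinlocks by spins; a contended metadata cache would be expected
-- to surface here under the name CMED_HASH_SET.
SELECT TOP (10)
       name
      ,collisions
      ,spins
      ,spins_per_collision
      ,sleep_time
      ,backoffs
FROM sys.dm_os_spinlock_stats
ORDER BY spins DESC;
```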
Spinlock Activity And The Disk Row-store Engine
Spinlock Activity And The In-Memory Engine
Do Not Try This At Home !!!
The database engine is coded such that the spinlock protecting the metadata cache is not taken if you run a workload inside a system database.
What About Popping Messages Off The Queue ?
The push procedure now draws its slots from a dedicated push sequence table:
ALTER PROCEDURE PushMessageImOltp
    @MessagePushed int OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT
                   ,LANGUAGE = N'us_english' )
    -- @Slot is no longer a parameter, so it is declared locally
    DECLARE  @QueueSize int = 200000
            ,@Slot      int;

    INSERT INTO [dbo].[NonBlockingPushSequence] DEFAULT VALUES;
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

    UPDATE [dbo].[MyQLmaxImOltp]
    SET    time            = GETDATE()
          ,message         = 'Hello world'
          ,message_id      = @Slot
          ,reference_count = reference_count + 1
    WHERE  Slot = @Slot
    AND    reference_count = 0;

    SET @MessagePushed = @@ROWCOUNT;
END;
What About Popping Messages Off The Queue ?
CREATE PROCEDURE PopMessageImOltp
    @MessagePopped CHAR(300) OUTPUT
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH ( TRANSACTION ISOLATION LEVEL = SNAPSHOT
                   ,LANGUAGE = N'us_english' )
    DECLARE  @QueueSize int = 200000
            ,@Slot      int;

    INSERT INTO [dbo].[NonBlockingPopSequence] DEFAULT VALUES;
    SELECT @Slot = SCOPE_IDENTITY() % @QueueSize;

    -- Read the message and mark the slot as free (reference_count = 0)
    UPDATE [dbo].[MyQLmaxImOltp]
    SET    time            = GETDATE()
          ,@MessagePopped  = message
          ,message_id      = @Slot
          ,reference_count = 0
    WHERE  Slot = @Slot
    AND    reference_count = 1;
END;
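The push and pop procedures each use their own sequence table, but the slides do not show the definitions. A minimal sketch follows, assuming both tables simply mirror dbo.NonBlockingSequence; they would need to exist before the procedures are created. The calling code at the end is likewise illustrative.

```sql
-- Assumption: same shape as dbo.NonBlockingSequence, one table per direction.
CREATE TABLE [dbo].[NonBlockingPushSequence] (
     [ID] [bigint] IDENTITY(1, 1) NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [ID] ) WITH ( BUCKET_COUNT = 524288 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA );
GO

CREATE TABLE [dbo].[NonBlockingPopSequence] (
     [ID] [bigint] IDENTITY(1, 1) NOT NULL
    ,PRIMARY KEY NONCLUSTERED HASH ( [ID] ) WITH ( BUCKET_COUNT = 524288 )
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA );
GO

-- Illustrative caller: push one message, then pop one.
DECLARE  @Pushed  int
        ,@Message char(300);

EXEC dbo.PushMessageImOltp @MessagePushed = @Pushed OUTPUT;
EXEC dbo.PopMessageImOltp  @MessagePopped = @Message OUTPUT;

SELECT @Pushed AS messages_pushed, @Message AS popped_message;
```

Keeping separate sequences lets the push and pop positions advance independently around the ring, with reference_count deciding whether a given slot is ready to be written or read.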
How Push And Pop Working Together Scale . . .
What Have We Learned ?
- The naïve approach to implementing a queue is throttled by the “last page problem”
- We can overcome the last page problem using the LMAX disruptor pattern, but we need to craft a scalable sequence generator and prevent push and pop threads from hitting the same page when using the legacy engine
- We pay a performance penalty by switching between the ‘host’ and in-memory engines
- Using the in-memory engine can still result in spinlock pressure due to its use of the legacy engine as a “host engine”
Please review the event and sessions
http://speakerscore.com/1FJ1
http://speakerscore.com/ZGVX
11/8/2018
My Contact Details
chris@exadat.co.uk
http://uk.linkedin.com/in/wollatondba
ChrisAdkin8