The Tally Table and Pseudo Cursors What they are and how they replace certain While Loops by Jeff Moden #315 Pittsburgh, Pennsylvania
Your Speaker - Jeff Moden
- Nearly 2 decades of experience working with SQL Server
- Mostly Self Taught
- One of the Leading Posters on SQLServerCentral.com
  - More than 30,000 posts (heh… some are even useful)
  - More than 30 articles on the “Black Arts” of T-SQL
    http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
  - Member since 2003
- SQL Server MVP Since 2008
- Winner of the “Exceptional DBA” award for 2011
- Lead Application DBA, Lead SQL Developer, and SQL Mentor for Proctor Financial, Inc.
- SQL Server is both my profession and my hobby (Yeah, I know… I need to get a life ;-)
The Tally Table and Pseudo Cursors 04 October 2014 © Copyright by Jeff Moden - All Rights Reserved
Agenda
- Introduction: The Trouble with Loops
- Glossary
- Introduction to “Pseudo Cursors”: The Hidden Power of SQL Server
- Introduction to the Tally Table: Another Type of Pseudo Cursor
- Hidden RBAR: The Slothfulness of Recursion
- A “Table-Less” Tally “Table”: First Appeared in Itzik Ben-Gan’s Books
- Some Examples: High Performance Convenience
- Quick Review
- Q’n’A
Introduction The Trouble with Loops
Getting Started in a Programming Class
What’s the first thing they teach you how to do in most programming classes?
PRINT 'Hello World'
That’s right… It sounds like a funny thing to do, but it means that you've finally got your programming environment set up and you're ready to begin learning how to program.
The Next Programming Milestone
After learning about some syntax conventions, variables, data types, and a couple of other things, what is the next major milestone in learning how to program that’s absolutely essential to advanced programming techniques?
How To Loop
Looping is the very essence of advanced programming skills. Modern programming would be useless without being able to repeat the execution of code in loops.
Counting with a While Loop
Definition: Count from 1 to 100 and display the count.
The Human Thinks:
1. Declare a counter
2. Preset the counter to 1
3. Display the count
4. Add 1 to the counter
5. Is the counter <= 100? If YES, branch back to display the new count. If NO, quit.
This is "Procedural" code. It's easy to remember because you tell the program how to proceed every step of the way.
Typical “Row By Row” Solution
--===== Count from 1 to 100 and display the count.
DECLARE @Counter INT;                 --Declare a counter
    SET @Counter = 1;                 --Preset the counter to 1
  WHILE @Counter <= 100
  BEGIN
         SELECT @Counter;             --Display the count
            SET @Counter = @Counter + 1; --Add 1 to counter
    END;                              --Is the counter <= 100?
                                      --If Yes, branch back to display the count.
                                      --If No, quit.
What we get for our troubles is a real mess...
The Mess a Loop Creates
100 Individual Result Sets
- Virtually useless to a GUI as a return.
- Will cause errors in SSMS if too many result sets are returned.
100 Individual Messages
- This is the reason why SET NOCOUNT ON is so important. It IS a performance issue when loops are involved.
Cleaning Up the Mess
In order to return a single result set for this classic loop problem, we have to…
1. Create a Temp Table or a Table Variable (Added)
2. Declare a counter
3. Preset the counter to 1
4. Insert the count as a new row in the Temp Table (instead of just displaying it - Added)
5. Add 1 to the counter
6. Is the counter <= 100? If YES, branch back to INSERT the count. If NO, continue (instead of quit).
7. SELECT from the "Table" in the proper order (Added)
8. Quit (moved here from decision)
Code Becomes More Complex…
--===== Suppress the auto-display of row counts for performance
    SET NOCOUNT ON;                   --New code added
--===== Create a place to store the results
 CREATE TABLE #MyHead (N INT);        --New code added
--===== Count from 1 to 100 and display the count.
DECLARE @Counter INT;                 --Declare a counter
    SET @Counter = 1;                 --Preset the counter to 1
  WHILE @Counter <= 100
  BEGIN
         INSERT INTO #MyHead (N)      --New code added
         SELECT @Counter;             --Display the count (same as before)
            SET @Counter = @Counter + 1; --Add 1 to counter
    END;                              --Is the counter <= 100?
                                      --If Yes, branch back to insert the count.
                                      --If No, continue.
--===== Display the count
 SELECT N FROM #MyHead ORDER BY N;    --New code added
Performance Gets Worse
- Declaration of a new object.
- 101 individual calculations.
- 100 checks to make sure we didn't go over the limit.
- 100 individual INSERTs.
  - Each INSERT requires a separate execution plan even if the SQL Server Optimizer decides it can reuse the same plan.
  - Each INSERT requires a separate lock.
  - Each INSERT requires a separate transaction (now there's a hint).
- Requires a final SELECT.
- Takes ~14 seconds (~8 seconds in an explicit transaction) for a million rows on this laptop NOT INCLUDING THE FINAL SELECT (demo).
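The "separate transaction" hint can be sketched as follows. This is a minimal illustration, not the demo code from the session: the temp table name #MyHeadTran is made up for this sketch, and timings will differ from machine to machine. Wrapping the loop in one explicit transaction lets the 100 INSERTs commit together instead of each one auto-committing on its own.

```sql
--===== Suppress the auto-display of row counts for performance
    SET NOCOUNT ON;
--===== Create a place to store the results
     -- (#MyHeadTran is a hypothetical name used only for this sketch)
 CREATE TABLE #MyHeadTran (N INT);
DECLARE @Counter INT;
    SET @Counter = 1;
--===== One explicit transaction instead of 100 auto-committed ones
  BEGIN TRANSACTION;
  WHILE @Counter <= 100
  BEGIN
         INSERT INTO #MyHeadTran (N)
         SELECT @Counter;
            SET @Counter = @Counter + 1;
    END;
 COMMIT TRANSACTION;
--===== Display the count
 SELECT N FROM #MyHeadTran ORDER BY N;
```

It is still a loop with 100 individual INSERTs; the transaction only removes part of the per-row overhead.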
A MUCH Simpler Way to Count
Wouldn't it be neat if most looping problems were as easy as... THIS????
--==== Count from 1 to 100
    -- using a Tally table
 SELECT N
   FROM dbo.Tally
  WHERE N BETWEEN 1 AND 100
  ORDER BY N
;
Glossary
Things that Loop
Cursor
- Generally means anything with a Cursor in it.
- Many folks also call While Loops and Recursive CTE’s a "Cursor".
- Most folks use these to process things "Row By Row".
RBAR
- Pronounced "ree-bar" like the steel rods permanently stuck in cement (appropriate, don't you think?).
- Is a "Modenism" for any process that runs "Row By Agonizing Row".
Hidden RBAR
- Things that look "set based" but are not.
- Contains a hidden "cursor" of one type or another.
Types of Programming
Procedural Programming
- Essentially, RBAR programming.
- Human tells computer what to do AND how to do it… Row by Agonizing Row.
- Works fine in GUI's.
- Kills almost all hope of performance in SQL Server because this type of programming overrides the very nature of the Optimizer in SQL Server.
Declarative Programming
- Human tells computer what to do.
- Computer figures out HOW to do it.
- Usually, Set Based programming.
- Works WITH the Optimizer instead of overriding it.
Programming in SQL Server
Set Based Programming
- Declarative Programming in SQL Server.
- Does NOT mean "all in one query", especially since a single query can contain Hidden RBAR.
- Does NOT mean something that doesn't have a loop, especially since a single query can contain Hidden RBAR.
- CAN mean something that has a loop because certain queries require multiple SETS of information to be processed.
- Does mean "touching" each row only once, if possible, and as few times as possible if not.
- Requires a simple paradigm shift in thinking.
Other “Loops” and “Cursors”
Recursion
- The act of a bit of SQL Code calling itself. It "iterates" over itself, making a Hidden RBAR loop.
- An example of this is a "Recursive CTE", which does nothing more than call itself.
- The act of "making the call" can crush performance and can eat about 3 times (or more) the resources of a simple While Loop.
Pseudo Cursor
- The hidden but very high speed looping effect that set based code experiences behind the scenes.
- A simple SELECT "iterates" through rows behind the scenes but in a manner that SQL knows best.
Introduction to “Pseudo Cursors” The Hidden Power of SQL Server
The Unexpected “Magic” of SELECT
What does the following code snippet give you?
--===== Top 100 rows of data from the table
 SELECT TOP (100) *
   FROM sys.all_columns;
How does it work? Behind the scenes, it…
- Does some preparation
- Reads one row
- Displays one row
- Makes a decision as to whether it’s done or not and loops back if it’s not done.
Sound familiar? It should because… It’s a LOOP!
[Flowchart labels: Start, Counter = 0, Read a Row, Display The Row, Add 1 to Counter, <=100, Return]
Say it with me – “Pseudo Cursor”
- The term "Pseudo Cursor" was coined by R. Barry Young over on SQLServerCentral.com.
- It's a super important concept that I used to call a "Set Based Loop".
- It's a whole lot more complicated behind the scenes, but it helps to think of a Pseudo Cursor as...
  - A SELECT finds a row, reads the row, processes the row, and LOOPS back to read the next row… at an incredible speed.
  - Behind the scenes, a SELECT is a machine language level Cursor (loop).
  - Since these loops or cursors don't appear in T-SQL code, Barry called them "Pseudo Cursors".
- You DON’T necessarily have to use what's in the row of a Pseudo Cursor to use the rows. Say what?
Simple Pseudo Cursors at Work
What do the following snippets do?
--===== Returns all COLUMNS and ROWS
 SELECT *
   FROM sys.all_columns
;
--===== Returns a COLUMN of "1’s"
     -- Note that no data was used from the table
 SELECT 1
   FROM sys.all_columns
;
--===== Returns a COLUMN of sequential numbers
 SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM sys.all_columns
;
What can you use a “Pseudo Cursor” for?
In particular, what can you do with a Pseudo Cursor that DOESN’T use anything from the “source” tables?
- One of the problems with most databases is that they don’t have enough data to do any performance testing with.
- You can use the “rows” of a Pseudo Cursor as a “loop” to create millions of rows of test data in a couple of heartbeats…
- … and you don’t need a very big table to do that if you understand how to use a friend of the Pseudo Cursor, the CROSS JOIN…
Building a Monster “Row Source”
Let’s start off simple. We’ll just build a table with a Million numbered rows (and then add to it).
- To do such a thing, we need something with a very large number of rows… like a Million.
- Especially on new systems, no such table exists to use as a “row source”. In fact, the largest table on the whole server turns out to be sys.all_columns and it has only about 4,000 rows in it (on a new 2005 system, more in others).
- Hmmm… what’s 4,000 times 4,000?
- A CROSS JOIN on sys.all_columns will easily produce up to a 16 million row Pseudo Cursor.
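To see how big that row source is on a particular server, the arithmetic above can be checked without actually running the CROSS JOIN. This is only a sketch; the column aliases are made up, and the counts vary by SQL Server version and by database.

```sql
--===== How many rows could a self CROSS JOIN of sys.all_columns supply?
     -- COUNT_BIG is used so the squared value can't overflow an INT.
 SELECT RowsInTable     = COUNT_BIG(*),
        RowsInCrossJoin = COUNT_BIG(*) * COUNT_BIG(*)
   FROM sys.all_columns
;
```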
Simple Million Row Test Table
What does the following code do?
--===== Create and populate a test table on-the-fly
     -- with a COLUMN of sequential numbers from
     -- 1 to 1,000,000. This takes 745 ms (demo).
 SELECT TOP 1000000
        SomeID = IDENTITY(INT,1,1)
   INTO #MyHead
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
I know what you’re thinking. “Test table? Is that all you’ve got?”
The Million Row Test Table
--===== Create and populate a 1,000,000 row test table.
     -- "SomeID"       has a range of 1 to 1,000,000 unique numbers
     -- "SomeInt"      has a range of 1 to 50,000 numbers
     -- "SomeLetters2" has a range of "AA" to "ZZ"
     -- "SomeMoney"    has a range of 10.00 to 100.00 numbers
     -- "SomeDate"     has a range of >=01/01/2010 & <01/01/2020 whole dates
     -- "SomeDateTime" has a range of >=01/01/2010 & <01/01/2020 Date/Times
     -- "SomeRand"     contains the value of RAND just to show it can be done
     --                without a loop
 SELECT TOP 1000000
        SomeID       = IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID())) % 50000 + 1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65)
                     + CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65),
        SomeMoney    = CAST(RAND(CHECKSUM(NEWID())) * 90 + 10 AS DECIMAL(9,2)),
        SomeDate     = DATEADD(dd,ABS(CHECKSUM(NEWID())) % DATEDIFF(dd,'2010','2020'),'2010'),
        SomeDateTime = DATEADD(dd,DATEDIFF(dd,0,'2010'),
                       RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2010','2020')),
        SomeRand     = RAND(CHECKSUM(NEWID()))
   INTO dbo.JBMTest
   FROM sys.all_columns ac1       --Cross Join forms up to a 16 million row
  CROSS JOIN sys.all_columns ac2  --Pseudo Cursor
;
What else can you use a “Pseudo Cursor” for?
In particular, what can you do with a Pseudo Cursor that DOES use something from the “source” table?
THAT’s what a Tally Table is all about.
Introduction to the Tally Table Another Type of “Pseudo Cursor”
What’s IN a Tally Table?
- A single column of sequential numbers
  - Starts at 1 or 0 (Can have some problems with 0)
  - Ends at some "sufficiently" large number. My Tally table usually ends with 11,000.
    - I need more than 8,000 to split VARCHAR(8000)
    - I need to be able to easily create 30 years worth of DAYS, which is almost 11,000 days
- Is "Keyed" for speed.
  - Clustered PK on "N" (ABSOLUTELY ESSENTIAL)
  - FILLFACTOR = 100 (ABSOLUTELY ESSENTIAL)
- INT because most functions will use INT's against it. Be REAL careful about implicit conversions here.
How to Build a Tally Table
--=====================================================================
--      Create a Tally table from 1 to 11000
--=====================================================================
--===== Create and populate the Tally table on the fly.
 SELECT TOP 11000
        IDENTITY(INT,1,1) AS N    --Makes a NOT NULL column
   INTO dbo.Tally
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2  --Cross Join for up to 16 Million Rows
;
--===== Add a CLUSTERED Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
  GRANT SELECT ON dbo.Tally TO PUBLIC
;
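Once the table is built, a quick sanity check along these lines can confirm it matches the description of what belongs in a Tally Table. This is a sketch that assumes the build script above ran as shown; the column aliases are made up for illustration.

```sql
--===== Expect MinN = 1, MaxN = 11000, and RowCnt = 11000
 SELECT MinN   = MIN(N),
        MaxN   = MAX(N),
        RowCnt = COUNT(*)
   FROM dbo.Tally
;
--===== List the indexes. PK_Tally_N should be reported as clustered and unique.
   EXEC sp_helpindex 'dbo.Tally'
;
```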
Splitting at the Character Level
--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000);
    SET @Parameter = 'Element01,Element02,Element03';
--===== Declare a character counter (RBAR SOLUTION)
DECLARE @N INT;
    SET @N = 1;
--===== While the character counter is less than or equal to
     -- the length of the string
  WHILE @N <= DATALENGTH(@Parameter)
  BEGIN
        --==== Display the character counter and the character at that
            -- position.
         SELECT @N, SUBSTRING(@Parameter,@N,1);
        --==== Increment the character counter
            SET @N = @N + 1;
    END;
--===== Do the same thing as the loop did... "Step" through the variable
     -- and return the character position and the character...
 SELECT N, SUBSTRING(@Parameter,N,1)
   FROM dbo.Tally
  WHERE N <= DATALENGTH(@Parameter)
  ORDER BY N;
Results (Tally Table method)
N
--- ----
1   E
2   l
3   e
4   m
5   e
6   n
7   t
8   0
9   1
10  ,
11  E
12  l
13  e
14  m
15  e
16  n
17  t
18  0
19  2
20  ,
21  E
...
How It Works
Just like counting from 1 to 100, both the loop and the Tally table count from 1 to the length of the parameter. Look at the following graphic. Both the loop and the Tally table do exactly the same thing, except the Tally table uses only 1 SELECT and returns a single result set.
The rows of the Tally Table act as the counter, except it's set based. The Tally Table is a direct replacement for the loop.
Finding the Position of Delimiters
The next logical step would be to find the Delimiters. Here's how it's done with a loop...
--===== Loop Method ==============================================================
--===== Simulate a passed parameter
DECLARE @Parameter VARCHAR(8000);
    SET @Parameter = 'Element01,Element02,Element03';
--                             111111111122222222223
--                    123456789012345678901234567890
--===== Declare a variable to remember the position of the current comma
DECLARE @N INT;
--===== Find the first delimiter, if one exists
    SET @N = CHARINDEX(',',@Parameter);
--===== Loop through and find each delimiter starting with the
     -- location of the previous delimiter.
  WHILE @N > 0
  BEGIN
         SELECT @N;
        --==== Find the next comma, starting 1 character past the current one.
            -- Returns a 0 when no more commas are found.
         SELECT @N = CHARINDEX(',',@Parameter,@N+1);
    END;
Results
N
----
10
20
Finding the Position of Delimiters
…and here’s how it’s done with a Tally Table
--===== Tally Table Method ==========================
--===== Simulate a passed parameter
     -- (VARCHAR, same as the loop method, so that DATALENGTH
     -- returns the number of characters)
DECLARE @Parameter VARCHAR(8000);
    SET @Parameter = 'Element01,Element02,Element03';
--                             111111111122222222223
--                    123456789012345678901234567890
--===== Now, find all the Delimiters
 SELECT t.N
   FROM dbo.Tally t
  WHERE t.N <= DATALENGTH(@Parameter)
    AND SUBSTRING(@Parameter, t.N, 1) = ','
  ORDER BY t.N
;
Results
N
----
10
20
How It Works
Again, the reason why the Tally Table method works is that it's joined to the variable at the character level and seeks out the delimiters using a Pseudo Cursor.
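Both methods size the scan with DATALENGTH, which counts bytes rather than characters. The sketch below shows why the data type of the parameter matters; the literal values and column aliases are made up for illustration, and single-byte characters are assumed for the VARCHAR cases.

```sql
--===== DATALENGTH counts bytes; LEN counts characters but ignores trailing spaces
 SELECT VarcharBytes  = DATALENGTH('a,b'),   --3 (1 byte per character)
        NVarcharBytes = DATALENGTH(N'a,b'),  --6 (2 bytes per character)
        LenOfPadded   = LEN('a,b '),         --3 (trailing space ignored)
        BytesOfPadded = DATALENGTH('a,b ')   --4 (trailing space counted)
;
```

With a VARCHAR parameter, DATALENGTH keeps trailing spaces in play, which LEN would silently drop.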
Doing the Final Split
Up to this point, we've been able to find each delimiter using both a loop and a Tally Table.
What we need to do now is find the NEXT delimiter to isolate the characters between the CURRENT delimiter and the NEXT delimiter.
Once we've done that, we need to either store or display the characters that we've isolated as a group. This effectively splits the elements out from between the delimiters.
“Inch Worm” Splitter
Notice that the first and last elements are different: the first has no delimiter before it and the last has no delimiter after it.
Loop Code for 8K Splitter
 CREATE FUNCTION dbo.Split8KLoop
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS @Return TABLE (ItemNumber INT, Item VARCHAR(8000))
     AS
  BEGIN
--===== Declare some obviously named variables
DECLARE @StartPointer INT,
        @EndPointer   INT,
        @Counter      INT;
--===== Find the first delimiter (@EndPointer), if it exists
 SELECT @StartPointer = 1,
        @EndPointer   = CHARINDEX(@pDelimiter, @pString),
        @Counter      = 1;
--===== If we found at least one delimiter, loop until we don't find any more
  WHILE @EndPointer > 0
  BEGIN
        --===== Inserts the split item
         INSERT INTO @Return (ItemNumber, Item)
         SELECT ItemNumber = @Counter,
                Item       = SUBSTRING(@pString, @StartPointer, @EndPointer - @StartPointer);
        --===== Finds the next split item, if it exists
         SELECT @StartPointer = @EndPointer + 1,
                @EndPointer   = CHARINDEX(@pDelimiter, @pString, @StartPointer),
                @Counter      = @Counter + 1;
    END;
--===== Inserts the last or only split item
 INSERT INTO @Return (ItemNumber, Item)
 SELECT ItemNumber = @Counter,
        Item       = SUBSTRING(@pString, @StartPointer, 8000);
 RETURN;
    END;
A Tally Table 8k Splitter
 CREATE FUNCTION dbo.Split8KTally
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
   WITH cteStart(N1) AS
(--==== This returns N+1 (starting position of each "element" just once
     -- for each delimiter) (NOTE: THIS IS NOT A RECURSIVE CTE)
 SELECT 1 UNION ALL
 SELECT t.N+1
   FROM dbo.Tally t
  WHERE t.N BETWEEN 1 AND ISNULL(DATALENGTH(@pString),0)
    AND SUBSTRING(@pString,t.N,1) = @pDelimiter
),
        cteLen(N1,L1) AS
(--==== Return start and length (for use in substring)
 SELECT s.N1,
        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
   FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length
     -- for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
Hidden RBAR The Slothfulness of Recursion
Recursive CTE’s
- They’re easy to write.
- They have a small physical footprint in code.
- They’re “slick” because they look “Set-Based”.
- They have no explicit loop.
- They’re S-L-O-W.
- They’re resource intensive.
- They’re “Hidden RBAR”.
An rCTE to Count
Here’s a simple rCTE (recursive CTE) that counts from 0 to 11 to create a year’s worth of months for 2011.
   WITH cteCounter AS
(--==== Counter rCTE counts from 0 to 11
 SELECT 0 AS N        --This provides the starting point (anchor) of zero
  UNION ALL
 SELECT N + 1         --This is the recursive part
   FROM cteCounter
  WHERE N < 11
)--==== Add the counter value to a start date and you get multiple dates
 SELECT StartOfMonth = DATEADD(mm,N,'2011')
   FROM cteCounter
;
How It Works
The basic, non-technical, non-scientific explanation for the operation of the code would be this...
1. The "anchor" value is set to "0" (zero). This now means the rCTE has a row with a zero in it.
2. The recursive part takes over. It looks at itself and says "What's the last value that I put into myself?", does a SELECT to add 1 to that value, and then checks the predicate in the WHERE clause.
3. If the value that was just made is within the limits defined by the WHERE clause, the rCTE saves that value in itself and then loops back to Step 2.
4. The process continues to re-iterate through the loop formed by Steps 2 and 3 until the value being built (N+1) exceeds the limits of the WHERE clause. Once that happens, the rCTE exits to the SELECT (or other) statement that immediately follows the rCTE.
Behind the Scenes
The rCTE actually makes a “Work” Table in TempDB. Another name for this table is a “System Temp Table”.
(12 row(s) affected)
Table 'Worktable'. Scan count 2, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Notice the number of reads to render just 12 rows? Yes, they’re “logical reads” (memory), but that’s still I/O.
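For comparison, the same 12 month starts can be produced from the Tally Table instead of recursion. This is a sketch that assumes the dbo.Tally table built earlier (starting at 1, hence the N-1); running it with STATISTICS IO turned on lets you compare the reads against the rCTE figures above on your own system.

```sql
    SET STATISTICS IO ON;
--===== Same 12 months, counted with the Tally table instead of recursion
 SELECT StartOfMonth = DATEADD(mm,t.N-1,'2011')
   FROM dbo.Tally t
  WHERE t.N <= 12
  ORDER BY t.N
;
    SET STATISTICS IO OFF;
```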
Performance of a Counting rCTE
On the chart on the next page, the painfully obvious loser is the Red line, which is the rCTE. There are 3 other methods of counting on this chart. Compare and believe…
[Chart: Performance of a Counting rCTE]
[Chart: Even Low Counts are Painful]
[Chart: Resources? You’ve GOT To See This!]
A “Table-less” Tally “Table” First Appeared In Itzik Ben-Gan’s Books
Modified Ben-Gan Cascading CTE’s
This produces virtually no reads and can be almost as fast as a physical Tally Table, especially when used in an “iTVF” (inline Table Valued Function).
   WITH E1(N) AS ( --=== Create Ten 1's
                  SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 --10
                 ),
        E2(N) AS (SELECT 1 FROM E1 a, E1 b),   --100
        E4(N) AS (SELECT 1 FROM E2 a, E2 b),   --10,000
        E8(N) AS (SELECT 1 FROM E4 a, E4 b),   --100,000,000
       E16(N) AS (SELECT 1 FROM E8 a, E8 b),   --10,000,000,000,000,000
  cteTally(N) AS (SELECT TOP (@pMaxValue) ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E16)
 SELECT t.N --Some query that uses the sequential numbering
   FROM cteTally t
;
How It Works
- Each CTE is named for a “power of 10”… as in En = 10^n. For example, E2 = 10^2 = 100.
- The first CTE, E1, returns up to 10 rows and is simply ten 1’s UNION ALL’d together.
- The second CTE, E2, is nothing more than a CROSS JOIN of E1 with itself. It returns up to 10x10 or 100 rows.
- Each following En CTE is a CROSS JOIN that squares the number of rows of the previous CTE.
- cteTally does two things…
  - The TOP very effectively limits the number of rows that are created.
  - ROW_NUMBER() converts the rows into a numbered sequence just like a Tally Table.
- It could be created as a separate iTVF and still be as fast as a physical Tally Table.
- Typically, for any functions on VARCHAR(8000), only CTE’s E1 through E4 (10,000 rows) are included to simplify the code a bit.
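A minimal sketch of the "separate iTVF" idea follows. The function name dbo.fnTallySketch and its parameter are hypothetical names made up for this illustration, not code from the presentation. It stops at E4, so it tops out at 10,000 rows, and it uses a VALUES row constructor for E1, which requires SQL Server 2008 or above.

```sql
--===== Hypothetical iTVF wrapper around the cascading CTE's (sketch only)
 CREATE FUNCTION dbo.fnTallySketch (@pMaxValue BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
   WITH E1(N) AS (SELECT 1
                    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v (N)), --10 rows
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b)  --10,000 rows max
 SELECT TOP (@pMaxValue)
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM E4
;
GO
--===== Usage: count from 1 to 100 without touching a physical table
 SELECT t.N
   FROM dbo.fnTallySketch(100) t
  ORDER BY t.N
;
```

Asking for more than 10,000 simply returns 10,000 rows; add the E8 level if a larger range is needed.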
Extending the Concept “Goldilocks” Pseudo Cursors
It’s Not Just a Table
More important than the Tally Table itself, there’s a concept that we’ve learned on how you can avoid the RBAR of explicit loops.
- You don’t necessarily need a Tally Table.
- You don’t necessarily need a big honkin’ cascading CTE.
- Sometimes, all you need is a “Goldilocks” table. Something “just right”.
  - It needs to be easy to use (UDF).
  - It needs to be fast (iTVF).
Calculate Dates in Current Week
Just enough of an “inline” Tally Table to do the job. Inline Table Valued Function or iTVF. Very Fast.
--===== Works in SQL Server 2000 and above.
     -- Note that you can't use CROSS APPLY in 2000.
     -- Note that a week starts on Sunday in this code.
 CREATE FUNCTION dbo.DatesInWeek(@SomeDate DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT DateInWeek = DATEADD(dd,DATEDIFF(dd,-1,@SomeDate)/7*7+t.N,-1)
   FROM (SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
         SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
         SELECT 6) t (N)
;
GO
--===== Works in SQL Server 2008 and above
     -- (an alternative to the version above; create one or the other)
 CREATE FUNCTION dbo.DatesInWeek(@SomeDate DATETIME)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
 SELECT DateInWeek = DATEADD(dd,DATEDIFF(dd,-1,@SomeDate)/7*7+t.N,-1)
   FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) t (N)
;
GO
Super Brief Intro to CROSS APPLY
- Basically nothing more than a “Correlated Subquery” that can return more than 1 row.
- Usually, VERY fast.
- Great for incorporating multi-line Table Valued Functions (mTVF’s).
- Even better when incorporating “inline” Table Valued Functions (iTVF’s).
- Paul White’s excellent articles on Apply
  http://www.sqlservercentral.com/articles/APPLY/69953/
  http://www.sqlservercentral.com/articles/APPLY/69954/
Use the iTVF to Get the Current Week
--===== Create a table and insert 2 dates with
     -- an identifier so we can tell what is what.
 CREATE TABLE #SomeTable
        (
        SomeTableID  INT IDENTITY(1,1),
        SomeDateTime DATETIME
        )
;
 INSERT INTO #SomeTable (SomeDateTime)
 SELECT '2000-03-01' UNION ALL
 SELECT GETDATE()
;
 SELECT t.SomeTableID, fn.DateInWeek
   FROM #SomeTable t
  CROSS APPLY dbo.DatesInWeek(t.SomeDateTime) fn
;
SomeTableID DateInWeek
----------- -----------------------
1           2000-02-27 00:00:00.000
1           2000-02-28 00:00:00.000
1           2000-02-29 00:00:00.000
1           2000-03-01 00:00:00.000
1           2000-03-02 00:00:00.000
1           2000-03-03 00:00:00.000
1           2000-03-04 00:00:00.000
2           2012-09-09 00:00:00.000
2           2012-09-10 00:00:00.000
2           2012-09-11 00:00:00.000
2           2012-09-12 00:00:00.000
2           2012-09-13 00:00:00.000
2           2012-09-14 00:00:00.000
2           2012-09-15 00:00:00.000
Sometimes Goldilocks is a Big Girl
There are times where a Tally Table won’t help but the act of generating a sequence like a Tally Table will.
This Pseudo Cursor spans the whole table. It numbers each group of dupes and deletes all but the first one from each group.
   WITH cteEnumerateDupes AS
(
 SELECT DupeNumber = ROW_NUMBER() OVER (PARTITION BY SomeInt ORDER BY SomeDate)
   FROM dbo.JBMTest
)
 DELETE FROM cteEnumerateDupes
  WHERE DupeNumber > 1
;
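Before running a DELETE like that, it can be worth previewing what the same numbering would remove. The sketch below is an assumption about how one might do that, using the dbo.JBMTest columns created earlier; it only reads the table and changes nothing.

```sql
--===== Preview the rows the dupe-delete would remove (read-only sketch)
   WITH cteEnumerateDupes AS
(
 SELECT SomeID, SomeInt, SomeDate,
        DupeNumber = ROW_NUMBER() OVER (PARTITION BY SomeInt ORDER BY SomeDate)
   FROM dbo.JBMTest
)
 SELECT SomeID, SomeInt, SomeDate, DupeNumber
   FROM cteEnumerateDupes
  WHERE DupeNumber > 1
  ORDER BY SomeInt, DupeNumber
;
```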
Some More Examples High Performance Convenience
Without a Doubt…
- Without a doubt, most folks agree that SQL Server doesn’t handle String functionality very well.
- Without a doubt, most folks agree that String functionality should be left up to the GUI or Reporting Tool.
- Without a doubt, most folks agree that if you can’t handle Strings in either of those, then you should use a CLR.
- Without a doubt, if you can’t do any of those things, you’d better know how to do it all in T-SQL.
dbo.DelimitedSplit8K using cteTally
 CREATE FUNCTION dbo.DelimitedSplit8K
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
   WITH E1(N) AS (
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1
                 ),                          --10E+1 or 10 rows
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
  cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                      -- for both a performance gain and prevention of accidental "overruns"
                  SELECT TOP (ISNULL(DATALENGTH(@pString),0))
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM E4
                 ),
 cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                  SELECT 1 UNION ALL
                  SELECT t.N+1
                    FROM cteTally t
                   WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                 ),
cteLen(N1,L1) AS (--==== Return start and length (for use in substring)
                  SELECT s.N1,
                         ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                    FROM cteStart s
                 )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length
     -- for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
Testing/Using the Splitter
--=======================================================================================================
-- TEST 1:
-- This tests for various possible conditions in a string using a comma as the delimiter.
-- The expected results are laid out in the comments
--=======================================================================================================
--===== Conditionally drop the test tables to make reruns easier for testing.
     -- (this is NOT a part of the solution)
     IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL DROP TABLE #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
     -- In the following comments, "b" is a blank and "E" is an element in the left to right order.
     -- Double Quotes are used to encapsulate the output of "Item" so that you can see that all blanks
     -- are preserved no matter where they may appear.
 SELECT *
   INTO #JBMTest
   FROM (                                               --# & type of Return Row(s)
         SELECT  0, NULL                      UNION ALL --1 NULL
         SELECT  1, SPACE(0)                  UNION ALL --1 b (Empty String)
         SELECT  2, SPACE(1)                  UNION ALL --1 b (1 space)
         SELECT  3, SPACE(5)                  UNION ALL --1 b (5 spaces)
         SELECT  4, ','                       UNION ALL --2 b b (both are empty strings)
         SELECT  5, '55555'                   UNION ALL --1 E
         SELECT  6, ',55555'                  UNION ALL --2 b E
         SELECT  7, ',55555,'                 UNION ALL --3 b E b
         SELECT  8, '55555,'                  UNION ALL --2 E b
         SELECT  9, '55555,1'                 UNION ALL --2 E E
         SELECT 10, '1,55555'                 UNION ALL --2 E E
         SELECT 11, '55555,4444,333,22,1'     UNION ALL --5 E E E E E
         SELECT 12, '55555,4444,,333,22,1'    UNION ALL --6 E E b E E E
         SELECT 13, ',55555,4444,,333,22,1,'  UNION ALL --8 b E E b E E E b
         SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
         SELECT 15, ' 4444,55555 '            UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
         SELECT 16, 'This,is,a,test.'                   --4 E E E E
        ) d (SomeID, SomeValue)
;
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
 SELECT test.SomeID, test.SomeValue, split.ItemNumber, split.Item,
        QuotedItem = QUOTENAME(split.Item,'"')
   FROM #JBMTest test
  CROSS APPLY dbo.DelimitedSplit8K(test.SomeValue,',') split
;
How fast is it? (1,000 Row Test)
Building Dates for Whole Years

 DECLARE @StartYear DATETIME,
         @EndYear   DATETIME,
         @Cutoff    DATETIME
 ;
  SELECT @StartYear = '1950',
         @EndYear   = '2050',
         @Cutoff    = DATEADD(yy,1,@EndYear)
 ;
  SELECT TOP(DATEDIFF(yy,@StartYear,@Cutoff))
         DATEADD(yy,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartYear)
    FROM dbo.Tally t1, --Being used as a row-source here
         dbo.Tally t2
 ;
Building a Date Range

 DECLARE @StartDate DATETIME,
         @EndDate   DATETIME,
         @Cutoff    DATETIME
 ;
  SELECT @StartDate = '2011-07-05',
         @EndDate   = '2012-03-01',
         @Cutoff    = DATEADD(dd,1,@EndDate)
 ;
  SELECT TOP(DATEDIFF(dd,@StartDate,@Cutoff))
         DATEADD(dd,t.N-1,@StartDate)
    FROM dbo.Tally t --11,000 > 30 years of days
   ORDER BY t.N
 ;
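The same date range can be built without a physical dbo.Tally table by reusing the cascading-CTE pattern from the splitter. The following is a minimal sketch under that assumption: the CTE names mirror the ones used in dbo.DelimitedSplit8K, the short date range is chosen only for illustration, and the 10,000-row ceiling of E4 limits the range to roughly 27 years of days.

```sql
--===== "Table-less" version of the date range: no dbo.Tally required.
     -- The short range below is an illustrative assumption.
 DECLARE @StartDate DATETIME,
         @EndDate   DATETIME
 ;
  SELECT @StartDate = '2011-07-05',
         @EndDate   = '2011-07-09'
 ;
   WITH E1(N) AS (
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                 ),                          --10 rows
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100 rows
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000 rows max
  cteTally(N) AS (--==== TOP limits the row source to the number of days needed
                  SELECT TOP (DATEDIFF(dd,@StartDate,@EndDate)+1)
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM E4
                 )
  SELECT TheDate = DATEADD(dd,t.N-1,@StartDate)
    FROM cteTally t
   ORDER BY t.N
 ;
-- Returns 5 rows: 2011-07-05 through 2011-07-09 inclusive.
```

Note that TOP raises an error if @EndDate is more than one day earlier than @StartDate, because the row count becomes negative; a caller would need to guard against that.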
Building a Time Interval Range

 DECLARE @StartDate DATETIME,
         @EndDate   DATETIME,
         @Cutoff    DATETIME,
         @Interval  INT
 ;
  SELECT @StartDate = '2011-07-05',
         @EndDate   = '2012-03-01',
         @Cutoff    = DATEADD(dd,1,@EndDate),
         @Interval  = 10
 ;
  SELECT TOP(DATEDIFF(mi,@StartDate,@Cutoff)/@Interval)
         DATEADD(mi, (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1) * @Interval, @StartDate)
    FROM dbo.Tally t1, --Used as a row-source
         dbo.Tally t2
 ;

Change “mi” to “hh” (in both DATEDIFF and DATEADD) for hours.
Building Time Interval “Bins”

 DECLARE @StartDate DATETIME,
         @EndDate   DATETIME,
         @Cutoff    DATETIME,
         @Interval  INT
 ;
  SELECT @StartDate = '2011-07-05',
         @EndDate   = '2012-03-01',
         @Cutoff    = DATEADD(dd,1,@EndDate),
         @Interval  = 1
 ; --The statement before a CTE must be terminated with a semicolon
   WITH cteStartTimes AS
 (
  SELECT TOP(DATEDIFF(hh,@StartDate,@Cutoff)/@Interval)
         StartTime = DATEADD(hh,(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)*@Interval,@StartDate)
    FROM dbo.Tally t1, --Used as a row-source
         dbo.Tally t2
 )
  SELECT StartTime,
         Cutoff = DATEADD(hh,@Interval,StartTime) --Each bin ends where the next one starts
    FROM cteStartTimes
   ORDER BY StartTime
 ;

Change “hh” to “dd” for Days.
Change “hh” to “mi” for Minutes.
Cleaning Strings (great for helping to prevent SQL injection)

 CREATE FUNCTION dbo.CleanString8K
         (
         @pString  VARCHAR(8000),
         @pPattern VARCHAR(8000)
         )
 RETURNS TABLE WITH SCHEMABINDING AS
  RETURN
   WITH E1(N) AS ( --=== Create Ten 1's
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                  SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                 ),                          --10
        E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
        E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
  cteTally(N) AS (SELECT TOP (ISNULL(DATALENGTH(@pString),0))
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM E4)
  SELECT CleanedString =
         ( --=== Keep only the characters that match the pattern, in their original order
          SELECT '' + SUBSTRING(@pString, t.N, 1)
            FROM cteTally t
           WHERE SUBSTRING(@pString, t.N, 1) COLLATE Latin1_General_BIN
                 LIKE @pPattern COLLATE Latin1_General_BIN
           ORDER BY t.N
             FOR XML PATH('')
         )
 ;

 @pPattern = [A-Z]       = Upper Case Alpha Only
 @pPattern = [a-z]       = Lower Case Alpha Only
 @pPattern = [0-9]       = Numeric Digits Only
 @pPattern = [A-Za-z0-9] = Alpha-Numeric Only
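A couple of calls show how the pattern parameter acts as a whitelist of allowed characters. This is a minimal sketch that assumes dbo.CleanString8K has been created as a working inline function; the input strings are made-up illustrations, not from the original deck.

```sql
--===== Keep only the digits from a formatted phone number (illustrative input).
     -- Assumes dbo.CleanString8K already exists in the current database.
 SELECT cs.CleanedString
   FROM dbo.CleanString8K('(412) 555-0199', '[0-9]') cs
;
-- Returns '4125550199'

--===== Keep only letters and digits, dropping punctuation and spaces.
 SELECT cs.CleanedString
   FROM dbo.CleanString8K('abc-123 DEF!', '[A-Za-z0-9]') cs
;
-- Returns 'abc123DEF'
```

Since FOR XML PATH('') entitizes characters such as & and <, patterns that let those characters through would need the TYPE/.value() treatment used in dbo.InitialCap.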
Initial Caps

 CREATE FUNCTION dbo.InitialCap
 --Original code by Usman Butt, modified for extra functionality by Jeff Moden
 --===== Declare the IO of the function
         (@String VARCHAR(8000))
 RETURNS TABLE WITH SCHEMABINDING AS
  RETURN
 --===== Force the first character of the string to upper case.
      -- Obviously, non-letter values will not be changed by UPPER.
  SELECT InitialCapString =
         UPPER(LEFT(@String,1)) --First character always
       + ( --=== If the current character in the given string isn't a letter then
               -- concatenate the next character as an UPPER case character.
               -- Otherwise, make it a lower case character.
               -- An apostrophe only counts as a "non-letter" when at least 3 letters follow it.
               -- The COLLATE clause speeds up non-default collations.
          SELECT CASE
                 WHEN SUBSTRING(@String, t.N  , 1) COLLATE Latin1_General_BIN
                      LIKE '[^A-Za-z'']' COLLATE Latin1_General_BIN
                   OR SUBSTRING(@String, t.N  , 4) COLLATE Latin1_General_BIN
                      LIKE '[^A-Za-z][A-Za-z][A-Za-z][A-Za-z]' COLLATE Latin1_General_BIN
                 THEN UPPER(SUBSTRING(@String, t.N+1, 1))
                 ELSE LOWER(SUBSTRING(@String, t.N+1, 1))
                 END
            FROM dbo.Tally t
           WHERE t.N < LEN(@String)
           ORDER BY t.N
             FOR XML PATH(''), TYPE
         ).value('text()[1]', 'varchar(8000)')
 ;
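Calling the function on a literal shows the effect. This is a minimal sketch that assumes dbo.InitialCap has been created and that dbo.Tally exists with an N column starting at 1; the input value is only an illustration.

```sql
--===== Proper-case a single literal (illustrative input only).
     -- Assumes dbo.InitialCap and a 1-based dbo.Tally already exist.
 SELECT ic.InitialCapString
   FROM dbo.InitialCap('jeff moden') ic
;
-- Returns 'Jeff Moden'
```

As with the splitter, the function can be applied to every row of a table through CROSS APPLY.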
Quick Review
In the Introduction, we found out why loops can be so slow.
We learned the definition of some “new” terms in the Glossary, including “RBAR” and “Hidden RBAR”.
We learned of the hidden power in SQL Server through the use of “Pseudo Cursors”.
We learned what a Tally Table is and how it works as a high performance “Pseudo Cursor”.
We learned that Recursive Counting CTEs are a form of “Hidden RBAR”: they are nearly as slow as While Loops and are resource hogs as well.
We learned how to create and use a “table-less” Tally “Table” that lives only in memory and causes virtually no reads.
We learned how to use the Tally Table and cteTally through the use of several examples.
Need Test Data? Build it!
Thanks for listening, folks!
Recommended Reading
The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
 http://www.sqlservercentral.com/articles/T-SQL/62867/
Tally OH! An Improved SQL 8K “CSV Splitter” Function
 http://www.sqlservercentral.com/articles/Tally+Table/72993/
Generating Test Data: Parts 1 and 2
 http://www.sqlservercentral.com/articles/Data+Generation/87901/
 http://www.sqlservercentral.com/articles/Test+Data/88964/
How to Make Scalar UDFs Run Faster (SQL Spackle)
 http://www.sqlservercentral.com/articles/T-SQL/91724/
Hidden RBAR: Counting with Recursive CTE's
 http://www.sqlservercentral.com/articles/T-SQL/74118/
Creating a comma-separated list (SQL Spackle), by Wayne Sheffield
 http://www.sqlservercentral.com/articles/comma+separated+list/71700/
Understanding and Using APPLY: Parts 1 and 2, by Paul White
 http://www.sqlservercentral.com/articles/APPLY/69953/
 http://www.sqlservercentral.com/articles/APPLY/69954/
Q’n’A