The Tally Table and Pseudo Cursors

The Tally Table and Pseudo Cursors
What they are and how they replace certain While Loops
by Jeff Moden
#315 Pittsburgh, Pennsylvania


Your Speaker - Jeff Moden
- Nearly 2 decades of experience working with SQL Server
- Mostly Self Taught
- One of the Leading Posters on SQLServerCentral.com
  - More than 30,000 posts (heh… some are even useful)
  - More than 30 articles on the “Black Arts” of T-SQL: http://www.sqlservercentral.com/Authors/Articles/Jeff_Moden/80567/
  - Member since 2003
- SQL Server MVP Since 2008
- Winner of the “Exceptional DBA” award for 2011
- Lead Application DBA, Lead SQL Developer, and SQL Mentor for Proctor Financial, Inc.
- SQL Server is both my profession and my hobby (Yeah, I know… I need to get a life ;-)

The Tally Table and Pseudo Cursors 04 October 2014 © Copyright by Jeff Moden - All Rights Reserved


Agenda
- Introduction: The Trouble with Loops
- Glossary
- Introduction to “Pseudo Cursors”: The Hidden Power of SQL Server
- Introduction to the Tally Table: Another Type of Pseudo Cursor
- Hidden RBAR: The Slothfulness of Recursion
- A “Table-Less” Tally “Table”: First Appeared in Itzik Ben-Gan’s Books
- Some Examples: High Performance Convenience
- Quick Review
- Q’n’A

Introduction The Trouble with Loops

Getting Started in a Programming Class
- What’s the first thing they teach you how to do in most programming classes?

    PRINT 'Hello World'

- That’s right… It sounds like a funny thing to do, but it means that you've finally got your programming environment set up and you're ready to begin learning how to program.

The Next Programming Milestone
- After learning about some syntax conventions, variables, data types, and a couple of other things, what is the next major milestone in learning how to program that’s absolutely essential to advanced programming techniques?
- How To Loop
- Looping is the very essence of advanced programming skills. Modern programming would be useless without being able to repeat the execution of code in loops.

Counting with a While Loop
Definition: Count from 1 to 100 and display the count.
The Human Thinks:
1. Declare a counter
2. Preset the counter to 1
3. Display the count
4. Add 1 to the counter
5. Is the counter <= 100? If YES, branch back to display the new count. If NO, quit.
This is "Procedural" code. It's easy to remember because you tell the program how to proceed every step of the way.

Typical “Row By Row” Solution

    --===== Count from 1 to 100 and display the count.
    DECLARE @Counter INT;                --Declare a counter
        SET @Counter = 1;                --Preset the counter to 1
      WHILE @Counter <= 100              --Is the counter <= 100?
      BEGIN                              --If Yes, branch back to display the count.
            SELECT @Counter;             --Display the count
               SET @Counter = @Counter + 1; --Add 1 to counter
        END;                             --If No, quit.

What we get for our troubles is a real mess...

The Mess a Loop Creates
- 100 Individual Result Sets
  - Virtually useless to a GUI as a return.
  - Will cause errors in SSMS if too many result sets are returned.
- 100 Individual Messages
  - This is the reason why SET NOCOUNT ON is so important. It IS a performance issue when loops are involved.

Cleaning Up the Mess
In order to return a single result set for this classic loop problem, we have to…
1. Create a Temp Table or a Table Variable (Added)
2. Declare a counter
3. Preset the counter to 1
4. Insert the count as a new row in the Temp Table (instead of just displaying it - Added)
5. Add 1 to the counter
6. Is the counter <= 100? If YES, branch back to INSERT the count. If NO, continue (instead of quit).
7. SELECT from the "Table" in the proper order (Added)
8. Quit (moved here from the decision)

Code Becomes More Complex…

    --===== Suppress the auto-display of row counts for performance
        SET NOCOUNT ON;                  --New code added
    --===== Create a place to store the results
     CREATE TABLE #MyHead (N INT);       --New code added
    --===== Count from 1 to 100 and store the count.
    DECLARE @Counter INT;                --Declare a counter
        SET @Counter = 1;                --Preset the counter to 1
      WHILE @Counter <= 100              --Is the counter <= 100?
      BEGIN                              --If Yes, branch back to insert the count.
            INSERT INTO #MyHead (N)      --New code added
            SELECT @Counter;             --Store the count (was a display before)
               SET @Counter = @Counter + 1; --Add 1 to counter
        END;                             --If No, continue.
    --===== Display the count
     SELECT N FROM #MyHead ORDER BY N;   --New code added

Performance Gets Worse
- Declaration of a new object.
- 101 individual calculations.
- 100 checks to make sure we didn't go over the limit.
- 100 individual INSERTs.
  - Each INSERT requires a separate execution plan even if the SQL Server Optimizer decides it can reuse the same plan.
  - Each INSERT requires a separate lock.
  - Each INSERT requires a separate transaction (now there's a hint).
- Requires a final SELECT.
- Takes ~14 seconds (~8 seconds in an explicit transaction) for a million rows on this laptop, NOT INCLUDING THE FINAL SELECT (demo).
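The "explicit transaction" hint above can be sketched as follows. This is a minimal illustration, not the demo code from the talk: the temp table name #LoopDemo is hypothetical, and the loop is still row-by-row. The only change is that all 100 INSERTs share one transaction instead of each committing on its own.

```sql
SET NOCOUNT ON;

--===== Hypothetical temp table for this sketch only
IF OBJECT_ID('tempdb..#LoopDemo') IS NOT NULL DROP TABLE #LoopDemo;
CREATE TABLE #LoopDemo (N INT NOT NULL);

DECLARE @Counter INT;
    SET @Counter = 1;

--===== One explicit transaction wraps the whole loop, so the 100 INSERTs
     -- are committed together instead of one at a time.
BEGIN TRANSACTION;
  WHILE @Counter <= 100
  BEGIN
        INSERT INTO #LoopDemo (N) VALUES (@Counter);
           SET @Counter = @Counter + 1;
    END;
COMMIT TRANSACTION;

SELECT N FROM #LoopDemo ORDER BY N;
```

This trims the per-row commit overhead, but it does not remove the loop itself, which is the point the rest of the talk addresses.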

A MUCH Simpler Way to Count
Wouldn't it be neat if most looping problems were as easy as... THIS????

    --===== Count from 1 to 100
         -- using a Tally table
     SELECT N
       FROM dbo.Tally
      WHERE N BETWEEN 1 AND 100
      ORDER BY N
    ;

Glossary

Things that Loop
- Cursor
  - Generally means anything with a Cursor in it.
  - Many folks also call While Loops and Recursive CTE’s a "Cursor".
  - Most folks use these to process things "Row By Row".
- RBAR
  - Pronounced "ree-bar" like the steel rods permanently stuck in cement (appropriate, don't you think?).
  - Is a "Modenism" for any process that runs "Row By Agonizing Row".
- Hidden RBAR
  - Things that look "set based" but are not.
  - Contains a hidden "cursor" of one type or another.

Types of Programming
- Procedural Programming
  - Essentially, RBAR programming.
  - Human tells computer what to do AND how to do it… Row by Agonizing Row.
  - Works fine in GUI's.
  - Kills almost all hope of performance in SQL Server because this type of programming overrides the very nature of the Optimizer in SQL Server.
- Declarative Programming
  - Human tells computer what to do. Computer figures out HOW to do it.
  - Usually, Set Based programming.
  - Works WITH the Optimizer instead of overriding it.

Programming in SQL Server
Set Based Programming (Declarative Programming in SQL Server)
- Does NOT mean "all in one query", especially since a single query can contain Hidden RBAR.
- Does NOT mean something that doesn't have a loop, especially since a single query can contain Hidden RBAR.
- CAN mean something that has a loop because certain queries require multiple SETS of information to be processed.
- Does mean "touching" each row only once, if possible, and as few times as possible if not.
- Requires a simple paradigm shift in thinking.

Other “Loops” and “Cursors”
- Recursion
  - The act of a bit of SQL Code calling itself.
  - It "iterates" over itself, making a Hidden RBAR loop.
  - An example of this is a "Recursive CTE", which does nothing more than call itself.
  - The act of "making the call" can crush performance and can eat about 3 times (or more) the resources of a simple While Loop.
- Pseudo Cursor
  - The hidden but very high speed looping effect that set based code experiences behind the scenes.
  - A simple SELECT "iterates" through rows behind the scenes but in a manner that SQL knows best.

Introduction to “Pseudo Cursors” The Hidden Power of SQL Server

The Unexpected “Magic” of SELECT
- What does the following code snippet give you?

    --===== Top 100 rows of data from the table
     SELECT TOP (100) *
       FROM sys.all_columns;

- How does it work? Behind the scenes, it…
  - Does some preparation
  - Reads one row
  - Displays one row
  - Makes a decision as to whether it’s done or not and loops back if it’s not done.
- Sound familiar? It should because… It’s a LOOP!

[Flowchart: Start, Counter = 0, Read a Row, Display The Row, Add 1 to Counter, "<= 100" decision that loops back, Return]

Say it with me – “Pseudo Cursor”
- The term "Pseudo Cursor" was coined by R. Barry Young over on SQLServerCentral.com.
- It's a super important concept that I used to call a "Set Based Loop".
- It's a whole lot more complicated behind the scenes, but it helps to think of a Pseudo Cursor as...
  - A SELECT finds a row, reads the row, processes the row, and LOOPS back to read the next row… at an incredible speed.
  - Behind the scenes, a SELECT is a machine language level Cursor (loop).
  - Since these loops or cursors don't appear in T-SQL code, Barry called them "Pseudo Cursors".
- You DON’T necessarily have to use what's in the row of a Pseudo Cursor to use the rows. Say what?

Simple Pseudo Cursors at Work
What do the following snippets do?

    --===== Returns all COLUMNS and ROWS
     SELECT *
       FROM sys.all_columns
    ;
    --===== Returns a COLUMN of "1's"
         -- Note that no data was used from the table
     SELECT 1
       FROM sys.all_columns
    ;
    --===== Returns a COLUMN of sequential numbers
     SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
       FROM sys.all_columns
    ;

What can you use a “Pseudo Cursor” for?
- In particular, what can you do with a Pseudo Cursor that DOESN’T use anything from the “source” tables?
- One of the problems with most databases is that they don’t have enough data to do any performance testing with.
- You can use the “rows” of a Pseudo Cursor as a “loop” to create millions of rows of test data in a couple of heartbeats…
- … and you don’t need a very big table to do that if you understand how to use a friend of the Pseudo Cursor, the CROSS JOIN…

Building a Monster “Row Source”
- Let’s start off simple. We’ll just build a table with a Million numbered rows (and then add to it).
- To do such a thing, we need something with a very large number of rows… like a Million.
- Especially on new systems, no such table exists to use as a “row source”. In fact, the largest table on the whole server turns out to be sys.all_columns, and it has only about 4,000 rows in it (on a new 2005 system, more on others).
- Hmmm… what’s 4,000 times 4,000?
- A CROSS JOIN of sys.all_columns with itself will easily produce up to a 16 million row Pseudo Cursor.
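The 4,000 x 4,000 arithmetic above can be checked on any given server before relying on it. The following is a small sketch with hypothetical column aliases; the actual row count of sys.all_columns varies by version and by what is installed, so no particular output is assumed.

```sql
--===== How many rows can a self CROSS JOIN of sys.all_columns supply here?
DECLARE @Rows BIGINT;

 SELECT @Rows = COUNT_BIG(*)
   FROM sys.all_columns;

 SELECT RowsInAllColumns = @Rows,
        RowsInCrossJoin  = @Rows * @Rows,   --size of the Pseudo Cursor
        EnoughForMillion = CASE WHEN @Rows * @Rows >= 1000000
                                THEN 'Yes' ELSE 'No' END;
```

If the cross join falls short on a very small system, a third CROSS JOIN of the same view would multiply the available rows again.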

Simple Million Row Test Table
What does the following code do?

    --===== Create and populate a test table on-the-fly
         -- with a COLUMN of sequential numbers from
         -- 1 to 1,000,000. This takes 745 ms (demo).
     SELECT TOP 1000000
            SomeID = IDENTITY(INT,1,1)
       INTO #MyHead
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;

I know what you’re thinking. “Test table? Is that all you’ve got?”

The Million Row Test Table

    --===== Create and populate a 1,000,000 row test table.
         -- "SomeID" has a range of 1 to 1,000,000 unique numbers
         -- "SomeInt" has a range of 1 to 50,000 numbers
         -- "SomeLetters2" has a range of "AA" to "ZZ"
         -- "SomeMoney" has a range of 10.00 to 100.00 numbers
         -- "SomeDate" has a range of >=01/01/2010 & <01/01/2020 whole dates
         -- "SomeDateTime" has a range of >=01/01/2010 & <01/01/2020 Date/Times
         -- "SomeRand" contains the value of RAND just to show it can be done
         -- without a loop
     SELECT TOP 1000000
            SomeID       = IDENTITY(INT,1,1),
            SomeInt      = ABS(CHECKSUM(NEWID())) % 50000 + 1,
            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65)
                         + CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65),
            SomeMoney    = CAST(RAND(CHECKSUM(NEWID())) * 90 + 10 AS DECIMAL(9,2)),
            SomeDate     = DATEADD(dd,ABS(CHECKSUM(NEWID())) % DATEDIFF(dd,'2010','2020'),'2010'),
            SomeDateTime = DATEADD(dd,DATEDIFF(dd,0,'2010'),
                           RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2010','2020')),
            SomeRand     = RAND(CHECKSUM(NEWID()))
       INTO dbo.JBMTest
       FROM sys.all_columns ac1       --Cross Join forms up to a 16 million row
      CROSS JOIN sys.all_columns ac2  --Pseudo Cursor
    ;

What else can you use a “Pseudo Cursor” for?
- In particular, what can you do with a Pseudo Cursor that DOES use something from the “source” table?
- THAT’s what a Tally Table is all about.

Introduction to the Tally Table Another Type of “Pseudo Cursor”

What’s IN a Tally Table?
- A single column of sequential numbers.
  - Starts at 1 or 0 (can have some problems with 0).
  - Ends at some "sufficiently" large number. My Tally table usually ends with 11,000.
    - I need more than 8,000 to split VARCHAR(8000).
    - I need to be able to easily create 30 years' worth of DAYS, which is almost 11,000 days.
- Is "Keyed" for speed.
  - Clustered PK on "N" (ABSOLUTELY ESSENTIAL)
  - FILLFACTOR = 100 (ABSOLUTELY ESSENTIAL)
- INT because most functions will use INT's against it. Be REAL careful about implicit conversions here.

How to Build a Tally Table

    --=====================================================================
    --      Create a Tally table from 1 to 11000
    --=====================================================================
    --===== Create and populate the Tally table on the fly.
     SELECT TOP 11000
            IDENTITY(INT,1,1) AS N --Makes a NOT NULL column
       INTO dbo.Tally
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2 --Cross Join for up to 16 Million Rows
    ;
    --===== Add a CLUSTERED Primary Key to maximize performance
      ALTER TABLE dbo.Tally
        ADD CONSTRAINT PK_Tally_N
            PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
    ;
    --===== Allow the general public to use it
      GRANT SELECT ON dbo.Tally TO PUBLIC
    ;
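As a quick follow-up, the sketch below first sanity-checks the table and then uses it for the "30 years' worth of days" case mentioned earlier. It assumes dbo.Tally was built exactly as shown above (1 through 11,000); the start date and the column alias TheDate are hypothetical values chosen for illustration.

```sql
--===== Sanity check: expect 11000 rows numbered 1 through 11000.
 SELECT RowCnt = COUNT(*), MinN = MIN(N), MaxN = MAX(N)
   FROM dbo.Tally
;
--===== Generate every calendar day in a 30 year span without a loop.
     -- The start date is a hypothetical example value.
DECLARE @StartDate DATETIME;
    SET @StartDate = '20100101';

 SELECT TheDate = DATEADD(dd, t.N - 1, @StartDate)
   FROM dbo.Tally t
  WHERE t.N <= DATEDIFF(dd, @StartDate, DATEADD(yy, 30, @StartDate))
  ORDER BY t.N
;
```

For this particular start date the span is 10,957 days (30 x 365 plus 7 leap days), which is why a Tally table that ends at 11,000 is large enough.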

Splitting at the Character Level

    --===== Simulate a passed parameter
    DECLARE @Parameter VARCHAR(8000);
        SET @Parameter = 'Element01,Element02,Element03';

    --===== Declare a character counter (RBAR SOLUTION)
    DECLARE @N INT;
        SET @N = 1;
    --===== While the character counter is less than or equal to the
         -- length of the string
      WHILE @N <= DATALENGTH(@Parameter)
      BEGIN
            --==== Display the character counter and the character at that
                -- position.
            SELECT @N, SUBSTRING(@Parameter,@N,1);
            --==== Increment the character counter
               SET @N = @N + 1;
        END;

    --===== Do the same thing as the loop did... "Step" through the variable
         -- and return the character position and the character...
     SELECT N, SUBSTRING(@Parameter,N,1)
       FROM dbo.Tally
      WHERE N <= DATALENGTH(@Parameter)
      ORDER BY N;

Results of the Tally Table query:

    N
    --- ----
    1   E
    2   l
    3   e
    4   m
    5   e
    6   n
    7   t
    8   0
    9   1
    10  ,
    11  E
    12  l
    13  e
    14  m
    15  e
    16  n
    17  t
    18  0
    19  2
    20  ,
    21  E
    ...

How It Works
- Just like counting from 1 to 100, both the loop and the Tally table count from 1 to the length of the parameter. Look at the following graphic.
- Both the loop and the Tally table do exactly the same thing except the Tally table only uses 1 SELECT and returns a single result set.
- The rows of the Tally Table act as the counter except it's set based.
- The Tally Table is a direct replacement for the loop.

Finding the Position of Delimiters
The next logical step would be to find the Delimiters. Here's how it's done with a loop...

    --===== Loop Method ==================================================
    --===== Simulate a passed parameter
    DECLARE @Parameter VARCHAR(8000);
        SET @Parameter = 'Element01,Element02,Element03';
    --                             111111111122222222223
    --                    123456789012345678901234567890
    --===== Declare a variable to remember the position of the current comma
    DECLARE @N INT;
    --===== Find the first delimiter, if one exists
        SET @N = CHARINDEX(',',@Parameter);
    --===== Loop through and find each delimiter starting with the
         -- location of the previous delimiter.
      WHILE @N > 0
      BEGIN
            SELECT @N;
            --==== Find the next comma, searching from 1 position past
                -- the current one. Return a 0 when no more commas are found.
            SELECT @N = CHARINDEX(',',@Parameter,@N+1);
        END;

Results:

    N
    ----
    10
    20

Finding the Position of Delimiters
…and here’s how it’s done with a Tally Table.

    --===== Tally Table Method ===========================================
    --===== Simulate a passed parameter
         -- Note: DATALENGTH returns bytes, which for NVARCHAR is twice the
         -- number of characters. The result is the same here because
         -- SUBSTRING finds nothing past the end of the string.
    DECLARE @Parameter NVARCHAR(4000);
        SET @Parameter = 'Element01,Element02,Element03';
    --                             111111111122222222223
    --                    123456789012345678901234567890
    --===== Now, find all the Delimiters
     SELECT N
       FROM dbo.Tally t
      WHERE t.N <= DATALENGTH(@Parameter)
        AND SUBSTRING(@Parameter, t.N, 1) = ',';

Results:

    N
    ----
    10
    20

How It Works
Again, the reason why the Tally Table method works is that it's joined to the variable at the character level and seeks out the delimiters using a Pseudo Cursor.

Doing the Final Split
- Up to this point, we've been able to find each delimiter using both a loop and a Tally Table.
- What we need to do now is find the NEXT delimiter to isolate the characters between the CURRENT delimiter and the NEXT delimiter.
- Once we've done that, we need to either store or display the characters that we've isolated as a group. This effectively splits the elements out from between the delimiters.

“Inch Worm” Splitter
Notice that the first and last elements are different.

Loop Code for 8K Splitter

     CREATE FUNCTION dbo.Split8KLoop
            (
            @pString    VARCHAR(8000),
            @pDelimiter CHAR(1)
            )
    RETURNS @Return TABLE (ItemNumber INT, Item VARCHAR(8000))
         AS
      BEGIN
    --===== Declare some obviously named variables
    DECLARE @StartPointer INT,
            @EndPointer   INT,
            @Counter      INT;
    --===== Find the first delimiter (@EndPointer), if it exists
     SELECT @StartPointer = 1,
            @EndPointer   = CHARINDEX(@pDelimiter, @pString),
            @Counter      = 1;
    --===== If we found at least one delimiter, loop until we don't find any more
      WHILE @EndPointer > 0
      BEGIN
            --===== Inserts the split item
             INSERT INTO @Return (ItemNumber, Item)
             SELECT ItemNumber = @Counter,
                    Item       = SUBSTRING(@pString, @StartPointer, @EndPointer - @StartPointer);
            --===== Finds the next split item, if it exists
             SELECT @StartPointer = @EndPointer + 1,
                    @EndPointer   = CHARINDEX(@pDelimiter, @pString, @StartPointer),
                    @Counter      = @Counter + 1;
        END;
    --===== Inserts the last or only split item
     INSERT INTO @Return (ItemNumber, Item)
     SELECT ItemNumber = @Counter,
            Item       = SUBSTRING(@pString, @StartPointer, 8000);
     RETURN;
        END;

A Tally Table 8k Splitter

     CREATE FUNCTION dbo.Split8KTally
    --===== Define I/O parameters
            (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
       WITH cteStart(N1) AS
    (--==== This returns N+1 (starting position of each "element" just once
         -- for each delimiter) (NOTE: THIS IS NOT A RECURSIVE CTE)
     SELECT 1 UNION ALL
     SELECT t.N+1
       FROM dbo.Tally t
      WHERE t.N BETWEEN 1 AND ISNULL(DATALENGTH(@pString),0)
        AND SUBSTRING(@pString,t.N,1) = @pDelimiter
    ),
            cteLen(N1,L1) AS
    (--==== Return start and length (for use in substring)
     SELECT s.N1,
            ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
       FROM cteStart s
    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length
         -- for the final element when no delimiter is found.
     SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
            Item       = SUBSTRING(@pString, l.N1, l.L1)
       FROM cteLen l
    ;
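A short usage sketch may help here. It assumes that dbo.Tally and dbo.Split8KTally have been created exactly as shown in this deck; the table variable @Csv and its column names are hypothetical and exist only for this illustration.

```sql
--===== Split a single literal string.
 SELECT s.ItemNumber, s.Item
   FROM dbo.Split8KTally('Element01,Element02,Element03', ',') s
  ORDER BY s.ItemNumber
;
--===== Split a whole column of delimited strings in one set based pass.
     -- @Csv is a hypothetical table used only for this sketch.
DECLARE @Csv TABLE (RowID INT, CsvText VARCHAR(8000));

 INSERT INTO @Csv (RowID, CsvText)
 SELECT 1, 'a,b,c' UNION ALL
 SELECT 2, 'x,y'
;
 SELECT c.RowID, s.ItemNumber, s.Item
   FROM @Csv c
  CROSS APPLY dbo.Split8KTally(c.CsvText, ',') s
  ORDER BY c.RowID, s.ItemNumber
;
```

The first query returns three rows (Element01, Element02, Element03, numbered 1 to 3). The second returns five rows in total, three for RowID 1 and two for RowID 2, with no explicit loop in the calling code.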

Hidden RBAR: The Slothfulness of Recursion

Recursive CTE’s
- They’re easy to write.
- They have a small physical footprint in code.
- They’re “slick” because they look “Set-Based”.
- They have no explicit loop.
- They’re S-L-O-W.
- They’re resource intensive.
- They’re “Hidden RBAR”.

An rCTE to Count
Here’s a simple rCTE (recursive CTE) that counts from 0 to 11 to create a year’s worth of months for 2011.

       WITH cteCounter AS
    (--==== Counter rCTE counts from 0 to 11
     SELECT 0 AS N     --This provides the starting point (anchor) of zero
      UNION ALL
     SELECT N + 1      --This is the recursive part
       FROM cteCounter
      WHERE N < 11
    )--==== Add the counter value to a start date and you get multiple dates
     SELECT StartOfMonth = DATEADD(mm,N,'2011')
       FROM cteCounter
    ;

How It Works
The basic, non-technical, non-scientific explanation for the operation of the code would be this...
1. The "anchor" value is set to "0" (zero). This now means the rCTE has a row with a zero in it.
2. The recursive part takes over. It looks at itself and says "What's the last value that I put into myself?", does a SELECT to add 1 to that value, and then checks the predicate in the WHERE clause.
3. If the value that was just made is within the limits defined by the WHERE clause, the rCTE saves that value in itself and then loops back to Step 2.
4. The process continues to iterate through the loop formed by Steps 2 and 3 until the value being built (N+1) exceeds the limits of the WHERE clause. Once that happens, the rCTE exits to the SELECT (or other) statement that immediately follows the rCTE.
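For contrast with the steps above, here is a sketch of the same twelve months produced from the Tally Table instead of recursion. It assumes dbo.Tally exists as built earlier in this deck and starts at 1, which is why the month offset is N - 1.

```sql
--===== Same 12 month starts for 2011, with no recursion and no work table
     -- built one row at a time. Assumes dbo.Tally starts at 1.
 SELECT StartOfMonth = DATEADD(mm, t.N - 1, '2011')
   FROM dbo.Tally t
  WHERE t.N BETWEEN 1 AND 12
  ORDER BY t.N
;
```

The WHERE clause plays the role of the rCTE's limit check, and the clustered index on N lets SQL Server read just the first twelve rows.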

Behind the Scenes
- The rCTE actually makes a “Work” Table in TempDB.
- Another name for this table is a “System Temp Table”.

    (12 row(s) affected)
    Table 'Worktable'. Scan count 2, logical reads 73, physical reads 0,
    read-ahead reads 0, lob logical reads 0, lob physical reads 0,
    lob read-ahead reads 0.

- Notice the number of reads to render just 12 rows?
- Yes, they’re “logical reads” (memory), but that’s still I/O.
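Output like the 'Worktable' message above is what SET STATISTICS IO reports. The following sketch shows one way to reproduce that kind of measurement for the counting rCTE; the exact read counts will likely differ by version and system, so none are asserted here.

```sql
--===== Turn on I/O reporting, run the rCTE, then turn reporting off.
     -- The reads appear on the Messages tab in SSMS.
SET STATISTICS IO ON;

   WITH cteCounter AS
(
 SELECT 0 AS N
  UNION ALL
 SELECT N + 1
   FROM cteCounter
  WHERE N < 11
)
 SELECT StartOfMonth = DATEADD(mm, N, '2011')
   FROM cteCounter
;
SET STATISTICS IO OFF;
```

Running a Tally Table query between the same two SET statements gives a like-for-like comparison of logical reads.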

Performance of a Counting rCTE
- On the chart on the next page, the painfully obvious loser is the Red line, which is the rCTE.
- There are 3 other methods of counting on this chart.
- Compare and believe…

Performance of a Counting rCTE
[Chart not included in the transcript.]

Even Low Counts are Painful
[Chart not included in the transcript.]

Resources? You’ve GOT To See This!
[Chart not included in the transcript.]

A “Table-less” Tally “Table” First Appeared In Itzik Ben-Gan’s Books

Modified Ben-Gan Cascading CTE’s
This produces virtually no reads and can be almost as fast as a physical Tally Table, especially when used in an “iTVF” (inline Table Valued Function).

    --===== @pMaxValue is expected to be a parameter or variable
         -- supplied by the surrounding code.
       WITH E1(N) AS ( --=== Create Ten 1's
                     SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 --10
                     ),
            E2(N)  AS (SELECT 1 FROM E1 a, E1 b), --100
            E4(N)  AS (SELECT 1 FROM E2 a, E2 b), --10,000
            E8(N)  AS (SELECT 1 FROM E4 a, E4 b), --100,000,000
            E16(N) AS (SELECT 1 FROM E8 a, E8 b), --10,000,000,000,000,000
            cteTally(N) AS (SELECT TOP (@pMaxValue)
                            ROW_NUMBER() OVER (ORDER BY (SELECT N))
                            FROM E16)
     SELECT t.N --Some query that uses the sequential numbering
       FROM cteTally t
    ;

How It Works
- Each CTE is named for a “power of 10”… as in 1En. For example, E2 stands for 1E2 = 10^2 = 100.
- The first CTE, E1, returns up to 10 rows and is simply ten 1’s UNION ALL’d together.
- The second CTE, E2, is nothing more than a CROSS JOIN of E1 with itself. It returns up to 10x10 or 100 rows.
- Each following En CTE is a CROSS JOIN that squares the number of rows of the previous CTE.
- cteTally does two things…
  - The TOP very effectively limits the number of rows that are created.
  - ROW_NUMBER() converts the rows into a numbered sequence just like a Tally Table.
- It could be created as a separate iTVF and still be as fast as a physical Tally Table.
- Typically, for any functions on VARCHAR(8000), only CTE’s E1 through E4 (10,000 rows) are included to simplify the code a bit.
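The "separate iTVF" idea can be sketched as below. The function name dbo.fnTallySketch is hypothetical, the CTE chain is cut off at E4 as described for VARCHAR(8000) work (so it returns at most 10,000 rows), and ORDER BY (SELECT NULL) is used for the row numbering as in the earlier ROW_NUMBER() snippet. Treat it as an illustration of the pattern rather than the author's production function.

```sql
--===== Hypothetical iTVF wrapper around the cascading CTE (max 10,000 rows)
 CREATE FUNCTION dbo.fnTallySketch (@pMaxValue BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
   WITH E1(N) AS ( --=== Ten 1's
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1
                 ),
        E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --100
        E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b)  --10,000
 SELECT TOP (@pMaxValue)
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM E4
;
GO
--===== Usage: count from 1 to 100 with no physical table involved
 SELECT t.N
   FROM dbo.fnTallySketch(100) t
  ORDER BY t.N
;
```

Because the function is inline, the optimizer can fold it into the calling query, which is what keeps it close to a physical Tally Table in speed.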

Extending the Concept: “Goldilocks” Pseudo Cursors

It’s Not Just a Table
- More important than the Tally Table itself is the concept we’ve learned: how to avoid the RBAR of explicit loops.
- You don’t necessarily need a Tally Table.
- You don’t necessarily need a big honkin’ cascading CTE.
- Sometimes, all you need is a “Goldilocks” table. Something “just right”.
  - It needs to be easy to use (UDF).
  - It needs to be fast (iTVF).

Calculate Dates in Current Week
- Just enough of an “inline” Tally Table to do the job.
- Inline Table Valued Function or iTVF.
- Very Fast.

    --===== Works in SQL Server 2000 and above.
         -- Note that you can't use CROSS APPLY in 2000.
         -- Note that a week starts on Sunday in this code.
         -- (Create one version of the function or the other, not both.)
     CREATE FUNCTION dbo.DatesInWeek(@SomeDate DATETIME)
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
     SELECT DateInWeek = DATEADD(dd,DATEDIFF(dd,-1,@SomeDate)/7*7+t.N,-1)
       FROM (SELECT 0 UNION ALL SELECT 1 UNION ALL
             SELECT 2 UNION ALL SELECT 3 UNION ALL
             SELECT 4 UNION ALL SELECT 5 UNION ALL
             SELECT 6) t (N)
    ;
    GO
    --===== Works in SQL Server 2008 and above
     CREATE FUNCTION dbo.DatesInWeek(@SomeDate DATETIME)
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
     SELECT DateInWeek = DATEADD(dd,DATEDIFF(dd,-1,@SomeDate)/7*7+t.N,-1)
       FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) t (N)
    ;
    GO

Super Brief Intro to CROSS APPLY
- Basically nothing more than a “Correlated Subquery” that can return more than 1 row.
- Usually, VERY fast.
- Great for incorporating multi-line Table Valued Functions (mTVF’s).
- Even better when incorporating “inline” Table Valued Functions (iTVF’s).
- Paul White’s excellent articles on Apply:
  - http://www.sqlservercentral.com/articles/APPLY/69953/
  - http://www.sqlservercentral.com/articles/APPLY/69954/
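To make the "correlated subquery that can return more than 1 row" description concrete, here is a self-contained sketch. The @Orders table variable, its columns, and its data are all hypothetical and are not part of the talk; the query returns the two most recent orders per customer.

```sql
--===== Hypothetical sample data for this sketch only
DECLARE @Orders TABLE (CustomerID INT, OrderDate DATETIME, Amount DECIMAL(9,2));

 INSERT INTO @Orders (CustomerID, OrderDate, Amount)
 SELECT 1, '20140101', 10.00 UNION ALL
 SELECT 1, '20140201', 20.00 UNION ALL
 SELECT 1, '20140301', 30.00 UNION ALL
 SELECT 2, '20140115', 15.00
;
--===== For each customer, the APPLY'd subquery runs "per row" and may
     -- return several rows, which a plain correlated subquery cannot do.
 SELECT c.CustomerID, o.OrderDate, o.Amount
   FROM (SELECT DISTINCT CustomerID FROM @Orders) c
  CROSS APPLY (SELECT TOP (2) o2.OrderDate, o2.Amount
                 FROM @Orders o2
                WHERE o2.CustomerID = c.CustomerID
                ORDER BY o2.OrderDate DESC) o
  ORDER BY c.CustomerID, o.OrderDate DESC
;
```

Customer 1 has three orders but contributes only the two latest, and customer 2 contributes its single order, so three rows come back in total.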

Use the iTVF to Get the Current Week

    --===== Create a table and insert 2 dates with
         -- an identifier so we can tell what is what.
     CREATE TABLE #SomeTable
            (
            SomeTableID  INT IDENTITY(1,1),
            SomeDateTime DATETIME
            )
    ;
     INSERT INTO #SomeTable
            (SomeDateTime)
     SELECT '2000-03-01' UNION ALL
     SELECT GETDATE()
    ;
     SELECT t.SomeTableID,
            fn.DateInWeek
       FROM #SomeTable t
      CROSS APPLY dbo.DatesInWeek(t.SomeDateTime) fn
    ;

    SomeTableID DateInWeek
    ----------- -----------------------
    1           2000-02-27 00:00:00.000
    1           2000-02-28 00:00:00.000
    1           2000-02-29 00:00:00.000
    1           2000-03-01 00:00:00.000
    1           2000-03-02 00:00:00.000
    1           2000-03-03 00:00:00.000
    1           2000-03-04 00:00:00.000
    2           2012-09-09 00:00:00.000
    2           2012-09-10 00:00:00.000
    2           2012-09-11 00:00:00.000
    2           2012-09-12 00:00:00.000
    2           2012-09-13 00:00:00.000
    2           2012-09-14 00:00:00.000
    2           2012-09-15 00:00:00.000

Sometimes Goldilocks is a Big Girl
There are times when a Tally Table won’t help but the act of generating a sequence like a Tally Table will.
This Pseudo Cursor spans the whole table.
It numbers each group of dupes and deletes all but the first one from each group.

       WITH cteEnumerateDupes AS
    (
     SELECT DupeNumber = ROW_NUMBER() OVER (PARTITION BY SomeInt ORDER BY SomeDate)
       FROM dbo.JBMTest
    )
     DELETE FROM cteEnumerateDupes
      WHERE DupeNumber > 1
    ;
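The same pattern can be tried safely on a throwaway table. This is a sketch only: the temp table #DupeDemo and its rows are invented for the illustration, with column names borrowed from the slide.

```sql
-- Hypothetical demo table; only the column names come from the slide.
 CREATE TABLE #DupeDemo (SomeInt INT, SomeDate DATETIME);

 INSERT INTO #DupeDemo (SomeInt, SomeDate)
 SELECT 1, '2012-01-01' UNION ALL
 SELECT 1, '2012-01-02' UNION ALL
 SELECT 2, '2012-01-01' UNION ALL
 SELECT 3, '2012-01-03' UNION ALL
 SELECT 3, '2012-01-01' UNION ALL
 SELECT 3, '2012-01-02';

-- Number the rows within each SomeInt group by date, then delete
-- everything except row 1 of each group through the CTE.
   WITH cteEnumerateDupes AS
(
 SELECT DupeNumber = ROW_NUMBER() OVER (PARTITION BY SomeInt ORDER BY SomeDate)
   FROM #DupeDemo
)
 DELETE FROM cteEnumerateDupes
  WHERE DupeNumber > 1;

-- One row per SomeInt remains, each with the earliest SomeDate of its group.
 SELECT SomeInt, SomeDate
   FROM #DupeDemo
  ORDER BY SomeInt;

   DROP TABLE #DupeDemo;
```

Swapping DELETE for SELECT * against the CTE first is a cheap way to preview which rows would be removed.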

High Performance Convenience: Some More Examples

Without a Doubt…
Without a doubt, most folks agree that SQL Server doesn’t handle String functionality very well.
Without a doubt, most folks agree that String functionality should be left up to the GUI or Reporting Tool.
Without a doubt, most folks agree that if you can’t handle Strings in either of those, then you should use a CLR.
Without a doubt, if you can’t do any of those things, you’d better know how to do it all in T-SQL.

dbo.DelimitedSplit8K using cteTally

     CREATE FUNCTION dbo.DelimitedSplit8K
    --===== Define I/O parameters
            (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
         -- enough to cover VARCHAR(8000)
      WITH E1(N) AS (
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                    ),                          --10E+1 or 10 rows
           E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
           E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
     cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                         -- for both a performance gain and prevention of accidental "overruns"
                     SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                    ),
    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                     SELECT 1 UNION ALL
                     SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                    ),
    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                     SELECT s.N1,
                            ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                       FROM cteStart s
                    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element
         -- when no delimiter is found.
     SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
            Item       = SUBSTRING(@pString, l.N1, l.L1)
       FROM cteLen l
    ;

Testing/Using the Splitter

    --=======================================================================================================
    -- TEST 1:
    -- This tests for various possible conditions in a string using a comma as the delimiter.
    -- The expected results are laid out in the comments.
    --=======================================================================================================
    --===== Conditionally drop the test tables to make reruns easier for testing.
         -- (this is NOT a part of the solution)
         IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL DROP TABLE #JBMTest
    ;
    --===== Create and populate a test table on the fly (this is NOT a part of the solution).
         -- In the following comments, "b" is a blank and "E" is an element in the left to right order.
         -- Double Quotes are used to encapsulate the output of "Item" so that you can see that all blanks
         -- are preserved no matter where they may appear.
     SELECT *
       INTO #JBMTest
       FROM (                                               --# & type of Return Row(s)
             SELECT  0, NULL                      UNION ALL --1 NULL
             SELECT  1, SPACE(0)                  UNION ALL --1 b (Empty String)
             SELECT  2, SPACE(1)                  UNION ALL --1 b (1 space)
             SELECT  3, SPACE(5)                  UNION ALL --1 b (5 spaces)
             SELECT  4, ','                       UNION ALL --2 b b (both are empty strings)
             SELECT  5, '55555'                   UNION ALL --1 E
             SELECT  6, ',55555'                  UNION ALL --2 b E
             SELECT  7, ',55555,'                 UNION ALL --3 b E b
             SELECT  8, '55555,'                  UNION ALL --2 E b
             SELECT  9, '55555,1'                 UNION ALL --2 E E
             SELECT 10, '1,55555'                 UNION ALL --2 E E
             SELECT 11, '55555,4444,333,22,1'     UNION ALL --5 E E E E E
             SELECT 12, '55555,4444,,333,22,1'    UNION ALL --6 E E b E E E
             SELECT 13, ',55555,4444,,333,22,1,'  UNION ALL --8 b E E b E E E b
             SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
             SELECT 15, ' 4444,55555 '            UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
             SELECT 16, 'This,is,a,test.'                   --4 E E E E
            ) d (SomeID, SomeValue)
    ;
    --===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
     SELECT test.SomeID, test.SomeValue, split.ItemNumber, split.Item,
            QuotedItem = QUOTENAME(split.Item,'"')
       FROM #JBMTest test
      CROSS APPLY dbo.DelimitedSplit8K(test.SomeValue,',') split
    ;

How fast is it? (1,000 Row Test)

Building Dates for Whole Years

    DECLARE @StartYear DATETIME,
            @EndYear   DATETIME,
            @Cutoff    DATETIME
    ;
     SELECT @StartYear = '1950',
            @EndYear   = '2050',
            @Cutoff    = DATEADD(yy,1,@EndYear)
    ;
     SELECT TOP(DATEDIFF(yy,@StartYear,@Cutoff))
            DATEADD(yy,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1,@StartYear)
       FROM dbo.Tally t1, --Being used as a row-source here
            dbo.Tally t2
    ;

Building a Date Range

    DECLARE @StartDate DATETIME,
            @EndDate   DATETIME,
            @Cutoff    DATETIME
    ;
     SELECT @StartDate = '2011-07-05',
            @EndDate   = '2012-03-01',
            @Cutoff    = DATEADD(dd,1,@EndDate)
    ;
     SELECT TOP(DATEDIFF(dd,@StartDate,@Cutoff))
            DATEADD(dd,t.N-1,@StartDate)
       FROM dbo.Tally t --11,000 > 30 years of days
      ORDER BY t.N
    ;
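If no physical dbo.Tally table is available, the same date range can be built from a “table-less” cascading CTE like the one inside dbo.DelimitedSplit8K. The following is a sketch under stated assumptions rather than code from the deck: it uses the VALUES row constructor (SQL Server 2008 and above), the alias TheDate is made up, and the range is capped at the 10,000 rows that E4 can supply.

```sql
DECLARE @StartDate DATETIME,
        @EndDate   DATETIME;
 SELECT @StartDate = '2011-07-05',
        @EndDate   = '2012-03-01';

   WITH E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),
                                        (1),(1),(1),(1),(1)) v (N)), -- 10 rows
        E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),               -- 100 rows
        E4(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b),               -- 10,000 rows max
  cteTally(N) AS (-- Limit the row source up front to the number of days needed,
                  -- counting both the start date and the end date.
                  SELECT TOP (DATEDIFF(dd,@StartDate,@EndDate)+1)
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM E4)
 SELECT TheDate = DATEADD(dd,t.N-1,@StartDate)
   FROM cteTally t
  ORDER BY t.N;
```

The first row is @StartDate and the last is @EndDate, as with the dbo.Tally version.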

Building a Time Interval Range

    DECLARE @StartDate DATETIME,
            @EndDate   DATETIME,
            @Cutoff    DATETIME,
            @Interval  INT
    ;
     SELECT @StartDate = '2011-07-05',
            @EndDate   = '2012-03-01',
            @Cutoff    = DATEADD(dd,1,@EndDate),
            @Interval  = 10
    ;
     SELECT TOP(DATEDIFF(mi,@StartDate,@Cutoff)/@Interval)
            DATEADD(mi, (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1) * @Interval, @StartDate)
       FROM dbo.Tally t1, --Used as a row-source
            dbo.Tally t2
    ;

Change “mi” to “hh” for hours.

Building Time Interval “Bins”

    DECLARE @StartDate DATETIME,
            @EndDate   DATETIME,
            @Cutoff    DATETIME,
            @Interval  INT
    ;
     SELECT @StartDate = '2011-07-05',
            @EndDate   = '2012-03-01',
            @Cutoff    = DATEADD(dd,1,@EndDate),
            @Interval  = 1
    ; --The statement before a CTE must be terminated with a semicolon.
       WITH cteStartTimes AS
    (
     SELECT TOP(DATEDIFF(hh,@StartDate,@Cutoff)/@Interval)
            StartTime = DATEADD(hh,(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1)*@Interval,@StartDate)
       FROM dbo.Tally t1, --Used as a row-source
            dbo.Tally t2
    )
     SELECT StartTime,
            Cutoff = DATEADD(hh,@Interval,StartTime) --Each bin ends one interval after it starts
       FROM cteStartTimes
      ORDER BY StartTime
    ;

Change “hh” to “dd” for Days.
Change “hh” to “mi” for Minutes.
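One common use for such bins is counting events per interval, including the intervals where nothing happened. The sketch below is illustrative only: the table variable @Events and its rows are hypothetical, and a four-row VALUES list (SQL Server 2008 and above) stands in for the Tally row source to keep the example small.

```sql
-- Hypothetical event data; not part of the original deck.
DECLARE @Events TABLE (EventTime DATETIME);

 INSERT INTO @Events (EventTime)
 SELECT '2011-07-05 00:15' UNION ALL
 SELECT '2011-07-05 00:45' UNION ALL
 SELECT '2011-07-05 02:30';

   WITH cteBins AS
(-- Four one-hour bins starting at midnight on 2011-07-05.
 SELECT StartTime = DATEADD(hh,v.N  ,'2011-07-05'),
        Cutoff    = DATEADD(hh,v.N+1,'2011-07-05')
   FROM (VALUES (0),(1),(2),(3)) v (N)
)
-- LEFT JOIN keeps the empty bins; the half-open comparison
-- (>= StartTime AND < Cutoff) puts each event in exactly one bin.
 SELECT b.StartTime, b.Cutoff, EventCount = COUNT(e.EventTime)
   FROM cteBins b
   LEFT JOIN @Events e
     ON e.EventTime >= b.StartTime
    AND e.EventTime <  b.Cutoff
  GROUP BY b.StartTime, b.Cutoff
  ORDER BY b.StartTime;
```

With this sample data the counts for the four bins are 2, 0, 1 and 0.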

Cleaning Strings: Great for Preventing SQL Injection

     CREATE FUNCTION dbo.CleanString8K
            (
            @pString  VARCHAR(8000),
            @pPattern VARCHAR(8000)
            )
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
       WITH E1(N) AS ( --=== Create Ten 1's
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                    ),                          --10
           E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
           E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     cteTally(N) AS (SELECT TOP (ISNULL(DATALENGTH(@pString),0))
                            ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                       FROM E4)
     SELECT CleanedString =
            (--=== Keep only the characters that match the pattern, in their original order
             SELECT '' + SUBSTRING(@pString, t.N, 1)
               FROM cteTally t
              WHERE SUBSTRING(@pString, t.N, 1) COLLATE Latin1_General_BIN
                    LIKE @pPattern COLLATE Latin1_General_BIN
              ORDER BY t.N
                FOR XML PATH('')
            )
    ;

@pPattern = [A-Z] = Upper Case Alpha Only
@pPattern = [a-z] = Lower Case Alpha Only
@pPattern = [0-9] = Numeric Digits Only
@pPattern = [A-Za-z0-9] = Alpha-Numeric Only
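A quick usage sketch, assuming dbo.CleanString8K has been created successfully in the current database. The input string is invented for the illustration.

```sql
-- Keep only letters and digits; quotes, spaces, semicolons and
-- comment markers are all dropped because they do not match the pattern.
 SELECT c.CleanedString
   FROM dbo.CleanString8K('Robert''); DROP TABLE Students;--', '[A-Za-z0-9]') c;
```

With the alphanumeric pattern, the call should return RobertDROPTABLEStudents. Because the function is an iTVF, it can also be applied to a whole column with CROSS APPLY, in the same way as dbo.DelimitedSplit8K.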

Initial Caps

     CREATE FUNCTION dbo.InitialCap
    --Original code by Usman Butt, modified for extra functionality by Jeff Moden
    --===== Declare the IO of the function
            (@String VARCHAR(8000))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
    --===== Force the first character of the string to upper case.
         -- Obviously, non-letter values will not be changed by UPPER.
     SELECT InitialCapString =
            UPPER(LEFT(@String,1)) --First character always
          + (
             --=== If the current character in the given string isn't a letter then
                -- concatenate the next character as an UPPER case character.
                -- Otherwise, make it lower case character.
                -- The COLLATE clause speeds up non-default collations.
             SELECT CASE
                    WHEN SUBSTRING(@String, t.N , 1) COLLATE Latin1_General_BIN
                         LIKE '[^A-Za-z'']' COLLATE Latin1_General_BIN
                      OR SUBSTRING(@String, t.N , 4) COLLATE Latin1_General_BIN
                         LIKE '[^A-Za-z][A-Za-z][A-Za-z][A-Za-z]' COLLATE Latin1_General_BIN
                    THEN UPPER(SUBSTRING(@String, t.N+1, 1))
                    ELSE LOWER(SUBSTRING(@String, t.N+1, 1))
                    END
               FROM dbo.Tally t
              WHERE t.N < LEN(@String)
              ORDER BY t.N
                FOR XML PATH(''), TYPE
            ).value('text()[1]', 'varchar(8000)')
    ;

Quick Review
In the Introduction, we found out why loops can be so slow.
We learned the definition of some “new” terms in the Glossary including “RBAR” and “Hidden RBAR”.
We learned of the hidden power in SQL Server through the use of “Pseudo Cursors”.
Need Test Data? Build it!
We learned what a Tally Table is and how it works as a high performance “Pseudo Cursor”.
We learned that Recursive Counting CTE’s are a form of “Hidden RBAR” and are nearly as bad as While Loops for performance and are resource hogs.
We learned how to create and use a “table-less” Tally “Table” that lives only in memory and causes virtually no reads.
We learned how to use the Tally Table and cteTally through the use of several examples.
Thanks for listening, folks!

Recommended Reading

The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
http://www.sqlservercentral.com/articles/T-SQL/62867/

Tally OH! An Improved SQL 8K “CSV Splitter” Function
http://www.sqlservercentral.com/articles/Tally+Table/72993/

Generating Test Data: Parts 1 and 2
http://www.sqlservercentral.com/articles/Data+Generation/87901/
http://www.sqlservercentral.com/articles/Test+Data/88964/

How to Make Scalar UDFs Run Faster (SQL Spackle)
http://www.sqlservercentral.com/articles/T-SQL/91724/

Hidden RBAR: Counting with Recursive CTE's
http://www.sqlservercentral.com/articles/T-SQL/74118/

Creating a comma-separated list (SQL Spackle), by Wayne Sheffield
http://www.sqlservercentral.com/articles/comma+separated+list/71700/

Understanding and Using APPLY: Parts 1 and 2, by Paul White
http://www.sqlservercentral.com/articles/APPLY/69953/
http://www.sqlservercentral.com/articles/APPLY/69954/

Q’n’A The Tally Table and Pseudo Cursors What they are and how they replace certain While Loops by Jeff Moden #315 Pittsburgh, Pennsylvania