More Windowing Functions KEVIN MCCARTY
What are Windowing Functions Again? Introduced in SQL Server 2005 (SQL 2003 Standard) Used to provide operations on subsets, subgroupings of data ◦This was a difficult operation before, requiring liberal use of temp tables, CTEs, subqueries and a lot of code ◦Often with a significant performance hit We have already discussed ranking functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE) We will finish our discussion by covering the other types of windowing functions ◦Aggregate (SUM, AVG, COUNT, ROWS, RANGE) ◦Analytic/Offset (LAG, LEAD, FIRST_VALUE, LAST_VALUE) ◦Distribution (PERCENT_RANK, CUME_DIST, PERCENTILE, DISC, PERCENTILE_CONT)
Quick Review - The OVER clause A core component of a windowing function is the OVER clause Use the OVER clause to define the “window” or specific set of rows to apply the windowing function ◦OVER must be combined with an ORDER BY clause (which makes sense) SELECT BusinessEntityID AS SalesID, FirstName + ' ' + LastName AS FullName, SalesLastYear, ROW_NUMBER() OVER(ORDER BY SalesLastYear ASC) AS RowNumber FROM Sales.vSalesPerson;
Partitioning Use Partitioning to limit sets of rows to those with the same value in a partitioning column ◦Another term for this is “framing” and is similar to the GROUP BY ◦Useful for isolating sets with specific characteristics without having to run a separate query SELECT BusinessEntityID AS SalesID, FirstName + ' ' + LastName AS FullName, SalesLastYear, TerritoryGroup, ROW_NUMBER() OVER(PARTITION BY TerritoryGroup ORDER BY SalesLastYear ASC) AS RowNumber FROM Sales.vSalesPerson;
ROWS/RANGE Functions Use the ROWS or RANGE functions to further refine a partition ◦Requires use of the ORDER BY functions ◦Rows or ranges are based upon proximity to the current row ◦Row functions ◦CURRENT ROW – current row in the resultset (ROWS and RANGE) ◦UNBOUNDED PRECEDING – the beginning of a partition (ROWS and RANGE) ◦UNBOUNDED FOLLOWING – the end of a partition (ROWS and RANGE) ◦n PRECEDING – number of rows before the current row (ROWS only) ◦n FOLLOWING – number of rows after the current row (ROWS only)
RANGE vs. ROWS
Windowing Components Conceptual relationship between window elements: Result set (OVER) Window partition (PARTITION BY) Frame (ROWS BETWEEN) Note: Shameless theft from Microsoft courseware
Framing CURRENT ROW is just what it says, the current row in the result set ◦You can bound from the current row to either the beginning of the set (UNBOUNDED PRECEDING) or some number in between (n PRECEDING)
Framing Cont. You can also bound to the end of the result set (UNBOUNDED FOLLOWING) or somewhere in between (n FOLLOWING) Note – shameless thievery from functions-in-sql-server-part-2-the-frame/ functions-in-sql-server-part-2-the-frame/
Offset/Analytic Functions There are four offset (also referred to as analytic) you can use to avoid having to do a bunch of self-joins ◦LAG – Apply a previous (1 or more) row ◦Lead – Apply a subsequent (1 or more) row
Offset/Analytic Functions There are four offset (also referred to as analytic) you can use to avoid having to do a bunch of self-joins ◦FIRST_VALUE – get value of first row for all records ◦LAST_VALUE – get value of last row for all records ◦Note – ORDER BY is required
Distribution Functions PERCENT_RANK – calculates as a percentage, the rank of a given row compared to the overall ranking ◦Useful to see where a row “scores” ◦CUME_DIST – calculates as a value where a row stands compared to the total ◦Use to see how far a row is compared to the total/bottom
Distribution Functions cont. SQL Server 2012 also introduced two inverse distribution functions PERCENTILE_DISC – determines where along a distribution curve (using a discrete model) a value lies PERCENTILE_CONT – determines where along a distribution curve (using a continuous model) a value lies Note that a discrete model uses actual values whereas a continuous model can extrapolate across a series of values
Distribution Functions PERCENT_RANK (percentile rank) CUME_DIST (cumulative distribution) PERCENTILE_DISC (percentile using discrete distribution model) PERCENTILE_CONT (percentile using continuous distribution model)
Basic Syntax OVER ( [ ] -- optional [ ] [ ] - optional ) SELECT BusinessEntityID AS SalesID, FirstName + ' ' + LastName AS FullName, SalesLastYear, ROW_NUMBER() OVER(ORDER BY SalesLastYear ASC) AS RowNumber FROM Sales.vSalesPerson;