Implementing Common Business Calculations in DAX Chris Webb chris@crossjoin.co.uk www.crossjoin.co.uk
Who am I? Chris Webb (chris@crossjoin.co.uk) Independent consultant specialising in Analysis Services and MDX (and PowerPivot and DAX!): http://www.crossjoin.co.uk Blogger: http://cwebbbi.spaces.live.com Author: MDX Solutions 2nd Edition Expert Cube Development With Analysis Services 2008
Agenda What is DAX and why should I learn it? Calculated columns and calculated measures Row context and filter context Calculate() Values() Demos, demos, demos!
What is DAX? DAX is the new calculation language for PowerPivot MDX is still the query language It is a multidimensional calculation language like MDX, but the design goals were: Make it easy to do common calculations (easier than MDX) Make it easy to use for Excel power users, hence the Excel-based syntax DAX expressions are limited to a single (though often very long) line of code
Why learn DAX? Your power users will be learning it, and you’ll need to understand what’s going on in their PowerPivot models It’s very likely that it will be used in future versions of SSAS – possibly as an alternative to MDX for some calculations We love both DAX and MDX, and MDX is not going away. The challenge for us is to bring them together. We have some promising directions. That said – I think you’ll see DAX evolving in new important directions that MDX will never cover, and a large and growing portion of the calc work will be done in DAX. So my advice stands: all you guys need to become DAX gurus ASAP. Amir Netz, Microsoft
DAX Syntax and Functions DAX syntax is based on Excel syntax Only thing to watch out for is you need to use && and || for AND and OR Supports around 80 Excel functions Uses its own data types In Excel you only have numbers or strings Uses the concept of BLANK rather than NULL PowerPivot BLANKs behave in exactly the same way as SSAS nulls The UI for editing DAX is terrible – can use NotePad++ and Colin Banfield’s DAX language template instead
Referencing Columns and Measures DAX usually requires fully qualified names, eg: MyTable[MyColumn] MyTable[MyMeasure] You can use unqualified names in calculated column definitions, when referring to other columns in the same table DAX cannot use standard Excel cell references to data elsewhere in the workbook.
Calculated Columns DAX can be used to create two types of calculation: calculated columns and calculated measures Calculated columns are derived columns in a PowerPivot table Are defined inside the PowerPivot UI After they are created, they behave like any other column Their values are calculated immediately after the real data has been loaded into PowerPivot Can be used to do basic ETL work, eg concatenating first and last names, deriving years from dates etc
Calculated Measures Calculated measures provide the numeric values you see aggregated in pivot tables All measures are calculated measures in PowerPivot! Are defined within the pivot table in Excel Are calculated at query time Basic sum, count, min, max and average calculated measures can be created from the right-click menu More advanced calculated measures can be created by entering your own DAX expression
Row Context and Filter Context Row context refers to the current row where a calculation is taking place There is no row context without a DAX aggregation function Filter context refers to the currently selected item on each column on each table Very similar to the CurrentMember function in MDX, but handles multiselect elegantly Filter context follows the one-to-many relationships between tables Both row and filter context must be taken into account when writing expressions for calculated measures
Row and Filter Context Examples
Filter Context
Filter Context
Row Context
Row Context with Calculation
Row Context with Calculation
Calculate() The Calculate() function is the key to all advanced DAX calculation functionality Signature: Calculate(Expression, SetFilter1, SetFilter2,...) It allows any expression to be evaluated in a specific filter context It works as follows: Modifies the current filter context according to the SetFilter arguments you pass in Shifts the row context onto the filter context Evaluates your expression in the new filter context
Calculate() SetFilter arguments can either be: Boolean expressions, eg CALCULATE(COUNTROWS(), Consultants[Skill]="MDX") Table functions, so the filter context for any table is set to the rows returned by the table function, eg CALCULATE(COUNTROWS(), FILTER(Consultants, Consultants[Measure1]>3)) Useful table functions include: Values(), which returns a list of distinct values in a column in the current filter context All(), which returns a list of all the values in a column ignoring filter context
Ratio to All and Ratio to Parent A common example of a ‘ratio to all’ and ‘ratio to parent’ calculations is a market shares The numerator for this kind of calculation is easy: SUM('Consultants'[Measure1]) The denominator is the ‘all’ or ‘parent’ total To get the total value of a measure with all filter context cleared we need to use All(Table), eg: CALCULATE(SUM('Consultants'[Measure1]), ALL(Consultants)) To get the total value of a measure with filter context for just one column removed, we need All(Column), eg: CALCULATE(SUM('Consultants'[Measure1]), ALL(Consultants[Consultant]))
Ratio to All and Ratio to Parent This makes the final calculations: =SUM('Consultants'[Measure1])/ CALCULATE(SUM('Consultants'[Measure1]), ALL(Consultants)) and =SUM(Consultants[Measure1])/ CALCULATE(SUM('Consultants'[Measure1]), ALL(Consultants[Consultant]))
Previous Year Growth Basic algorithm to calculate year-on-year growth is: ((Sales for Current Time Period) - (Sales for Same Time Period in Previous Year)) / (Sales for Same Time Period in Previous Year) The big problem is how to calculate sales for the same time period in the previous year in the absence of SSAS-like hierarchies?
Previous Year Growth The following is the best approach: =CALCULATE(SUM(FactInternetSales[SalesAmount]) , DATEADD(DimDate[FullDateAlternateKey], -1, YEAR) , ALL(DimDate)) Works by: Finding all the dates in the current year Shifting each date back one year Setting this as the filter context Summing SalesAmount for these dates BUT will only give correct results if you have a complete set of dates in your year!
Previous Year Growth Always a good idea in DAX to break complex calculations up into a series of simpler calculated measures So, if we use the previous formula to define a calculated measure called Previous Year Sales, the growth calculation becomes: =IF(FactInternetSales[Previous Year Sales]=0 , BLANK() ,(SUM(FactInternetSales[SalesAmount]) - FactInternetSales[Previous Year Sales]) / FactInternetSales[Previous Year Sales]) Notice, like in MDX, we need to trap division by zero
Rules for Time Intelligence functions Five rules for using Time Intelligence functions: Never use the datetime column from the fact table in time functions. Always create a separate Time table, and make sure it contains complete years Create relationships between fact tables and the Time table. Make sure that relationships are based on a datetime column (and NOT based on another artificial key column). The datetime column in the Time table should be at day granularity (without fractions of a day).
If you don’t follow the rules... As we’ve already seen, not having complete years makes it hard to do relative time calculations In my example, since I joined on a surrogate key and not a DateTime key, I needed to add All(DimDate) to my calculation You can’t use RELATED() to bring the date down to the fact table either, because this causes a circular reference error! Need to either alter the underlying relational table or view, or import the date dimension table twice
Year to Date Luckily there are many built-in time intelligence functions for common calculations Eg for doing year-to-date sums, we have TotalYTD: =TOTALYTD( SUM(FactInternetSales[SalesAmount]) ,DimDate[FullDateAlternateKey] , ALL(DimDate)) In this case, TotalYTD is a variant of Calculate with some filters set automatically
Total to Date A total-to-date gets the running total from the first date we have in our data No built-in function for it We need to use Calculate and set the filter to all dates from the first ever date to the last date in the current context We can use DATESBETWEEN to get this date range The first date can be got by using BLANK() The last date with the LASTDATE() function
Total to Date The final version is: =CALCULATE( SUM(FactInternetSales[SalesAmount]) , DATESBETWEEN('DimDate'[FullDateAlternateKey] , BLANK() , LASTDATE( 'DimDate'[FullDateAlternateKey])) , All('DimDate'))
Values() – like CurrentMember but better! The Values() function acts like the MDX CurrentMember function But it is better: Values() returns a table, so it handles multiselect Although when the table it returns only contains one row we can still do a direct comparison with another value The Distinct() function works exactly the same way as Values() but: Values() will return the Unknown Member Distinct() will not
Distinct Count To find a distinct count, we need to count the rows in a table containing only the distinct values from a column The Values() function returns such a table Because it’s a table, all we need to do is count the number of rows in it: =COUNTROWS(VALUES(Consultants[Skill]))
Ranks There is no rank function in PowerPivot So, for example if we wanted to calculate a rank for dates by sales, we would need to use the following approach: For each date Find a table containing the complete list of all dates Filter that list of all dates to find those whose sales were higher Count the number of rows in that table Add 1
Ranks This gives as a first attempt: =COUNTROWS( FILTER(ALL(DimDate[DateKey]) ,(DimDate[DailySales]( VALUES(DimDate[DateKey])) < DimDate[DailySales]))) + 1
Rank only for Dates With this knowledge, we can avoid doing the calculation when more than one date is in the row context: =IF( COUNTROWS(VALUES(DimDate[DateKey]))>1 , BLANK() ,COUNTROWS( FILTER(ALL(DimDate[DateKey]) ,(DimDate[DailySales](VALUES(DimDate[DateKey])) <DimDate[DailySales]) )) + 1)
Time Utility dimensions in PowerPivot Time Utility dimensions are an SSAS technique for applying one calculation to multiple measures Also known as Date Tool or Shell dimensions Since PowerPivot does not allow calculated members on anything other than the Measures dimension, it isn’t possible to create a true Time Utility dimension But... You can build something almost the same: Create a calculated measure that allows you to select which real measure you want Create a calculated measure that allows you to select which calculation you want, and apply it to the previous calculated measure
Final Thoughts DAX does the easy stuff very easily DAX does the medium-hard stuff well too Not sure if power users will get it, but It’s easier than MDX It’s more SQL-like, so more developers will get it It’s more elegant than MDX in many ways, eg multiselect DAX does not do the really hard stuff well at all So for financial apps, for example, MDX still wins But for how long? I like DAX a lot
Links http://blogs.msdn.com/powerpivot/ http://www.powerpivot.com/ http://powerpivotgeek.com http://powerpivottwins.com/ http://powerpivotpro.com/ http://www.powerpivot-info.com/ http://cwebbbi.spaces.live.com http://social.msdn.microsoft.com/Forums/en-US/sqlkjpowerpivotforexcel/threads http://www.business-intelligence.kdejonge.net/ http://sqlblog.com/blogs/marco_russo/
Thanks!
Coming up… P/X001 How to Get Full Access to a Database Backup in 3 Minutes or Less Idera P/L001 End-to-end database development has arrived Red Gate P/L002 Weird, Deformed, and Grotesque –Horrors Stories from the World of IT Quest P/L005 Expert Query Analysis with SQL Sentry SQLSentry P/T007 Attunity Change Data Capture for SSIS Attunity #SQLBITS