Advanced MDX Tips And Tricks Chris Webb
Who Am I? Chris Webb UK-based consultant and trainer: Author/co-author of several books: MDX Solutions Expert Cube Development with SSAS 2008 Analysis Services 2012: The BISM Tabular Model Power Query for Power BI and Excel Data Platform MVP Blogger:
Agenda Subselects – how they work and their performance implications Session cubes and why they are a bad thing Partition elimination, prefetching and MDX queries The UnOrder() function Static, dynamic and inline named sets and related performance problems Solve Order and its quirks
What are subselects? MDX subselects are a way of filtering the members on a dimension before a query is run Widely used by Excel PivotTables and SSRS-generated MDX queries to calculate subtotals Often confused with the MDX WHERE clause – in some cases they produce the same result, but not always Differences: The WHERE clause changes the CURRENTMEMBER Subselects affect how values aggregate up through a hierarchy
Performance implications Regular subselects can reduce the effectiveness of aggregations More seriously, subselects can prevent Analysis Services from caching calculation results for longer than the lifetime of a query Happened in all cases before SSAS 2012 SP1 CU4 Still happens in some cases after that In Excel you can prevent the use of subselects in PivotTable MDX queries by: Setting the “Include Filtered Items In Totals” option Using named sets to control the selection of members
Subselects and Excel – other issues By default, subselects cannot include calculated members from non- measures dimensions This is why before Excel 2010/SSAS 2008R2 Excel had problems with Time Utility/Date Tool/Shell dimensions The fix is the Subqueries connection string property Excel uses subqueries=2 in its connection string Subselects are also ignored by DRILLTHROUGH statements before SSAS 2016 In Excel 2013 and before, you cannot drillthrough on a multiselect Fixed if you use Excel 2016 with SSAS 2016 or higher
Session cubes and why they are a bad thing The Excel PivotTable Group functionality allows users to create custom groupings of members for subtotalling In the background this generates a CREATE SESSION CUBE command Only works on SSAS Multidimensional, not Tabular This can cause massive performance problems! Creates and processes a shadow cube on the server, which is slow, eats disk space and CPU. This can be prevented by setting the server property SessionCubesMode=1 – highly recommended
Partition elimination Making sure a query only reads data from the right partitions is very important for performance You should know to always set the Slice property on a partition But this isn’t always enough to ensure partition elimination – SSAS does not always work out which members are in which partitions You need to look at: The Data IDs (internal keys generated by SSAS) of members on attributes, available using the MDX DataID() function The spread of Data IDs in each partition, returned by the PARTITION_DIMENSION_STAT DMV
Partition elimination To ensure partition elimination happens, you can: Use user hierarchies instead of attribute hierarchies in queries where possible Add subselects to queries to add an extra hint Try to force members to be ordered in a different way when dimension processing takes place, eg by reordering columns in KeyColumns property Unnecessary partition reads may also be the result of prefetching Prefetching is where SSAS reads more data than a query needs This is usually good for performance You can stop it happening using the connection string properties disable prefetch facts=true; cache ratio=1
The UnOrder() function Items in MDX sets always have an order Usually determined by an attribute’s Order By property Other MDX operations, like the Order() or TopCount() functions, may also impose an order on a set The UnOrder() function allows SSAS to ignore ordering on a set Calculations that use sets with a large number of items (>1000s) may benefit from using the UnOrder() function
Static, dynamic and inline named sets Named sets can be defined in four places: On the cube, in the MDX Script (ie the Calculations tab) At the session level – Excel does this In the WITH clause of a query Inside an MDX expression, any time a set expression is used – these are inline named sets All named sets except inline named sets can be Static – they are evaluated once, when the MDX Script is executed Dynamic – they are evaluated for each query, in the context of the WHERE clause
Named sets and query performance Before SSAS 2012, any use of a named set in the first parameter of Sum() or Aggregate() prevented block computation ie performance would be very bad The use of inline named sets can still cause this even after SSAS 2012 Dynamic sets can also cause query performance problems when they are referenced by calculated members This causes the dynamic set to be evaluated every time a query is run Even if the calculated member and named set are not directly referenced by the query! The workaround is to use StrToSet() to refer to the measure
Dynamic named sets and multiselect Multiselect = filtering where more than one member from a hierarchy is selected Multiselect in Excel relies on the use of subselects in queries Multiselect = the selection of multiple members on an Excel filter or slicer Dynamic named sets are the only way for a calculation to detect which members have been selected Excel can create session named sets if you tick the “Recalculate set with every update”
Solve Order – the basics Solve Order controls the order that calculations in the same cell are evaluated in Only relevant when you have two or more calculated members on different hierarchies in the same query Not the same thing as dependencies between calculations! Most important where doing the calculations in a different order gives a different result Although even when results are the same there can be a performance difference
Controlling Solve Order Solve Order is influenced by a number of different factors: The order of dimensions in the cube The SOLVE_ORDER property of a calculation Whether the calculation is defined on the cube or in the WITH clause of a query The order the calculations are defined on the Calculations tab of the cube The SCOPE_ISOLATION property Whether the calculation uses the Aggregate() or VisualTotals() functions Which functions are used in the Aggregate() function, and which dimension the calculation is on
Thanks!