Processing Tabular Models


1 Processing Tabular Models
{ "refresh": { "type": "automatic", "objects": [ { "database": "SQLSaturday" } ] } }
Bill Anton

2 Bill Anton anton@opifexsolutions.com
Downloads and additional references:

3 Agenda: Architecture; Tabular Data Structures 101; Processing Options; Performance; Common Strategies

4 NOT( Agenda ): SSAS Multidimensional cubes; Power BI

5 Architecture

6 Architectures (On-Premise)

7 Architectures - Hybrid

8 Architectures - Clown

9 Tabular Data Structures 101
Database -> Model -> Table A (Column A, Column B), Table B

10 Tabular Data Structures 101
Dictionaries: 1 per column; value vs. hash encoding
Partitions: at least 1 per table (partitioning limits differ between Standard and Enterprise editions); each partition is made up of 1 or more segments
Segments: a “chunk” of rows (default: 8 million rows**); configurable at the instance level via DefaultSegmentRowCount (an advanced server property) to increase the number of rows per segment; segments cannot cross partitions; the segment is the basis for parallelism in queries (1 core per segment)
** Tables with fewer than 16 million rows get only 1 segment.
Calculated columns: stored like regular columns (but not as well compressed). Good or bad?
Hierarchies: e.g. Calendar Year -> Month -> Date; can improve query performance
Relationships: optimize lookups and filtering across tables
Value vs. hash encoding: value beats hash for query performance; hash is always used for strings; value encoding is not always possible for numeric columns (dates are numeric columns under the covers). Encoding hints are available starting with SQL Server 2017 (e.g. when processing a large table with lots of segments, the engine can reach the last segment, realize it can no longer use value encoding, and have to re-encode the column).
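The encoding hints mentioned above are exposed in TMSL as the `encodingHint` property of a column (compatibility level 1400, i.e. SQL Server 2017 and Azure Analysis Services; values `Default`, `Value` or `Hash`). The fragment below is a sketch, not a complete command: it is one entry of a table's `columns` array, and the column name `OrderDateKey` is a hypothetical placeholder. It is a hint rather than a guarantee, but setting it on a numeric column that should be value-encoded is meant to avoid the late re-encoding scenario described above.

```json
{
  "name": "OrderDateKey",
  "dataType": "int64",
  "sourceColumn": "OrderDateKey",
  "encodingHint": "Value"
}
```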

11 Processing Commands
ProcessDefault: processes any objects (tables, partitions, etc.) that are currently in an unprocessed state
ProcessData: reads data from the source and builds compressed dictionaries and partition segments
ProcessRecalc: builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt
ProcessFull: ProcessData + ProcessRecalc
ProcessAdd: appends new data to existing data + ProcessRecalc
ProcessClear: empties all data from the model
ProcessDefrag: rebuilds dictionaries to clear out values that no longer exist
Comments: ProcessData doesn’t rebuild dictionaries when run on a partition (hence ProcessDefrag). ProcessAdd is much more complicated than it seems.
Questions: Why would a dictionary need to be rebuilt? Defrag is needed only when processing at the partition level.
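In TMSL these commands correspond to the `type` values of the `refresh` command: ProcessDefault = `automatic`, ProcessData = `dataOnly`, ProcessRecalc = `calculate`, ProcessFull = `full`, ProcessAdd = `add`, ProcessClear = `clearValues`, ProcessDefrag = `defragment`. A sketch of ProcessData against a single partition, assuming a hypothetical `Internet Sales` table with a hypothetical partition named `Internet Sales 2017` in the AdventureWorks database:

```json
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      {
        "database": "AdventureWorks",
        "table": "Internet Sales",
        "partition": "Internet Sales 2017"
      }
    ]
  }
}
```

Because this is ProcessData, calculated columns, hierarchies and relationships stay unbuilt until a `calculate` refresh runs, and the column dictionaries keep any values that no longer exist until a `defragment`.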

12 Demo 00: ProcessDefault
Does the least amount of work needed to bring the model to a “query-able” state
{ "refresh": { "type": "automatic", "objects": [ { "database": "AdventureWorks" } ] } }

13 Phases of Processing
Open connection -> Execute SQL -> Encoding -> Compression -> ProcessRecalc
Encoding: values are stored as integers; VALUE > HASH; avoid re-encoding if possible
Compression: “proprietary”; timeboxed

14 Processing Phases
Segment = 8 million rows (default)
Segment size is adjustable, but not something to spend much time on (trial and error)

15 Parallel Processing
1-2 cores per table/partition
2 tables processed in parallel; 2 partitions processed in parallel (Enterprise Edition on-prem, Standard SKU in Azure)

16 Parallel Processing (the correct way)
2 tables processed in parallel; 2 partitions processed in parallel (Enterprise Edition on-prem, Standard SKU in Azure)
ProcessData first, then ProcessRecalc
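The “correct way” can be sketched in TMSL with a `sequence` command: `maxParallelism` caps how many objects are processed at once, and the operations run in order inside one transaction, so the data load for both partitions finishes before the single recalc. The two partitions of the `Internet Sales` table are hypothetical placeholders:

```json
{
  "sequence": {
    "maxParallelism": 2,
    "operations": [
      {
        "refresh": {
          "type": "dataOnly",
          "objects": [
            { "database": "AdventureWorks", "table": "Internet Sales", "partition": "Internet Sales 2016" },
            { "database": "AdventureWorks", "table": "Internet Sales", "partition": "Internet Sales 2017" }
          ]
        }
      },
      {
        "refresh": {
          "type": "calculate",
          "objects": [
            { "database": "AdventureWorks" }
          ]
        }
      }
    ]
  }
}
```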

17 Demo 01: ProcessRecalc
Builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt.
{ "refresh": { "type": "calculate", "objects": [ { "database": "AdventureWorks" } ] } }
ProcessData & ProcessRecalc sequences: wrap commands in “a transaction”; control parallelism

18 Process Recalc
Builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt.
Can only be run at the database level.
Always needed after a Process Data, a Process Clear, or merging partitions (via TMSL/XMLA).
Never needed after a Process Full, Process Add, or Process Default (Recalc is built in).
Merging partitions via SSMS includes a Process Recalc.
Becomes an important factor when optimizing processing for models with expensive calculated columns (or tables).
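Merging partitions through TMSL is its own `mergePartitions` command, which is why a recalc has to be issued afterwards (for example, the database-level `calculate` refresh from Demo 01). A sketch, assuming a hypothetical quarterly partition being folded into a hypothetical yearly partition of the `Internet Sales` table:

```json
{
  "mergePartitions": {
    "object": {
      "database": "AdventureWorks",
      "table": "Internet Sales",
      "partition": "Internet Sales 2017"
    },
    "sources": [
      {
        "database": "AdventureWorks",
        "table": "Internet Sales",
        "partition": "Internet Sales 2017 Q4"
      }
    ]
  }
}
```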

19 Process Defrag
Only needed when dealing with partitions (e.g. incremental processing).
At the table level, dictionaries are rebuilt.
Can be expensive on large tables, but can improve performance dramatically.
Can only be run using TMSL/XMLA.
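Since ProcessDefrag is only reachable through TMSL/XMLA, a sketch of a table-level defragment, with the `Internet Sales` table name as a hypothetical placeholder:

```json
{
  "refresh": {
    "type": "defragment",
    "objects": [
      {
        "database": "AdventureWorks",
        "table": "Internet Sales"
      }
    ]
  }
}
```

Targeting the whole table (rather than a partition) matches the slide: the dictionaries are column-level structures shared by all partitions, so they are rebuilt at the table level.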

20 Calculated columns and tables: at best they’re a shortcut to save time.
At worst they cause serious performance problems. Most of the time they just add to technical debt by spreading out the business logic, making the solution more difficult to maintain over time. Prototyping and business-led self-service… NOT ENTERPRISE SOLUTIONS.

21 Every time you create a calculated column, a kitten DIES!!!
There’s a time and place for calculated columns and tables… e.g. role-playing date tables

22 Common Processing Strategies
ProcessFull at the database level. Pros: simple, data remains available for queries. Cons: requires the most memory.
Total memory usage: 20 GB (10 GB for the existing copy that keeps serving queries + 10 GB for the new copy being built)

23 Common Processing Strategies
ProcessFull at the database level. Pros: simple, data remains available for queries. Cons: requires the most memory.
ProcessClear + ProcessFull (separate transactions!!). Pros: simple. Cons: the database is offline for however long ProcessFull takes to complete.
Incremental processing (many flavors). Pros: quick (can be “near” real-time). Cons: complicated; requires Enterprise Edition (or the Standard SKU in Azure).
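For ProcessClear + ProcessFull, the separate transactions matter because a `sequence` would run both steps in one transaction, so the old data would not be released before the new copy is built. A sketch of the first command, executed and committed on its own; the second command has the same shape with `"type": "full"`:

```json
{
  "refresh": {
    "type": "clearValues",
    "objects": [
      {
        "database": "AdventureWorks"
      }
    ]
  }
}
```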

24 Less Common Strategies
ProcessAdd. Pros: very fast, minimal memory overhead. Cons: very complicated to implement (e.g. journalized tables, out-of-line binding). ProcessAdd appends to existing segments.
Model flipping. Pros: low latency. Cons: complicated and expensive.
In practice, many companies say they require (near) real-time… then they find out the true cost (DirectQuery performance or $$$).
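Out-of-line binding is what makes ProcessAdd practical: the `refresh` command's `overrides` property substitutes a query returning only the new rows for the duration of the command, without changing the partition's stored definition. A sketch; the table, partition, source query and the `LoadBatchID` column are all hypothetical, and the query must return only rows not already loaded, otherwise they are appended a second time:

```json
{
  "refresh": {
    "type": "add",
    "objects": [
      {
        "database": "AdventureWorks",
        "table": "Internet Sales",
        "partition": "Internet Sales 2017"
      }
    ],
    "overrides": [
      {
        "partitions": [
          {
            "originalObject": {
              "database": "AdventureWorks",
              "table": "Internet Sales",
              "partition": "Internet Sales 2017"
            },
            "source": {
              "query": "SELECT * FROM dbo.FactInternetSales WHERE LoadBatchID = 1042"
            }
          }
        ]
      }
    ]
  }
}
```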

25 Performance Considerations
Processing is resource intensive (CPU, memory).
Balance is key! Throughput vs. resource constraints.
Intra-day processing or overnight?
Don’t forget to tune the source.
Use Perfmon and Extended Events to monitor (or purchase BI Sentry or BI Manager).

26 Additional Resources: Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services; Refresh Command (TMSL); AS Partition Processing (Whitepaper + Code); Performance Monitoring (DIY)

