Processing Tabular Models


Processing Tabular Models
{ "refresh": { "type": "automatic", "objects": [ { "database": "SQLSaturday" } ] } }
Bill Anton

Bill Anton | anton@opifexsolutions.com
Downloads and additional references: https://github.com/byobi/SQLSat_TabularProcessing

Agenda
- Architecture
- Tabular Data Structures 101
- Processing Options
- Performance
- Common Strategies

NOT( Agenda )
- SSAS Multidimensional Cubes
- Power BI

Architecture

Architectures (On-Premises)

Architectures - Hybrid

Architectures - Cloud

Tabular Data Structures 101
[Diagram: Database > Model > Tables (Table A, Table B) > Columns (Column A, Column B)]

Tabular Data Structures 101
- Dictionaries: 1 per column; Value vs Hash encoding.
- Partitions: at least 1 per table (standard vs enterprise); made up of 1 or more segments.
- Segments: a "chunk" of rows (default: 8 million); configurable at the instance level.
- Calculated Columns: stored like regular columns (but not compressed). Good or bad?
- Hierarchies: e.g. Calendar Year -> Month -> Date; can improve query performance.
- Relationships: optimize lookups and filtering across tables.

Value vs Hash encoding:
- Value > Hash (for query performance)
- Hash is always used for strings
- Value encoding is not always possible for numeric columns
- Dates are numeric columns (under the covers)
- Encoding hints are available in SSAS 2017 (useful when, e.g., processing a large table with lots of segments and the engine only realizes at a late segment that it can no longer use value encoding, forcing a re-encode)
- See https://channel9.msdn.com/Events/TechEd/Australia/2012/DBI315

Segments:
- For tables with fewer than 16 million rows, only 1 segment is created
- Segments cannot cross partitions
- DefaultSegmentRowCount (advanced setting) increases the number of rows per segment
- The segment is the basis for parallelism in queries (1 core per segment)

Processing Commands
- ProcessDefault: processes any objects (tables, partitions, etc.) that are currently in an unprocessed state
- ProcessData: reads data from the source and builds compressed dictionaries and partition segments
- ProcessRecalc: builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt
- ProcessFull: ProcessData + ProcessRecalc
- ProcessAdd: appends new data to existing data + ProcessRecalc
- ProcessClear: empties all data from the model
- ProcessDefrag: rebuilds dictionaries to clear out values that no longer exist

Comments:
- ProcessData doesn't rebuild dictionaries when run on a partition (hence ProcessDefrag)
- ProcessAdd is much more complicated than it seems

Questions:
- Why would a dictionary need to be rebuilt? Defrag is needed only when processing at the partition level.
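In TMSL (SSAS 2016+), these legacy process types map to refresh types: ProcessFull -> full, ProcessData -> dataOnly, ProcessRecalc -> calculate, ProcessDefault -> automatic, ProcessAdd -> add, ProcessClear -> clearValues, ProcessDefrag -> defrag. As a sketch (the database and table names here are illustrative), a ProcessData-equivalent refresh of a single table looks roughly like:

```json
{
  "refresh": {
    "type": "dataOnly",
    "objects": [
      {
        "database": "AdventureWorks",
        "table": "FactInternetSales"
      }
    ]
  }
}
```

Keep in mind that dataOnly leaves calculated columns, hierarchies, and relationships unprocessed until a calculate (ProcessRecalc) runs.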

Demo 00: ProcessDefault
Does the least amount of work needed to bring the model to a query-able state.
{ "refresh": { "type": "automatic", "objects": [ { "database": "AdventureWorks" } ] } }

Phases of Processing
1. Open connection
2. Execute SQL
3. Encoding
4. Compression
5. ProcessRecalc

Encoding:
- Values are stored as integers
- VALUE > HASH
- Avoid re-encoding if possible

Compression:
- "Proprietary"
- Timeboxing

Processing Phases
- Segment = 8 million rows (default)
- Segment size is adjustable, but not something to spend much time on (trial and error)

Sources:
https://channel9.msdn.com/Events/TechEd/Europe/2014/DBI-B414
https://channel9.msdn.com/Events/TechEd/Australia/2012/DBI315

Parallel Processing
- 1-2 cores per table/partition
- 2 tables processed in parallel
- 2 partitions processed in parallel: Enterprise Edition (On-Prem) / Standard SKU (Azure)

Parallel Processing (the correct way)
- 2 tables processed in parallel
- 2 partitions processed in parallel: Enterprise Edition (On-Prem) / Standard SKU (Azure)
- ProcessData first, then a single ProcessRecalc

Demo 01: ProcessRecalc
Builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt.
{ "refresh": { "type": "calculate", "objects": [ { "database": "AdventureWorks" } ] } }

ProcessData & ProcessRecalc Sequences:
- Wrap commands in "a transaction"
- Control parallelism
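The ProcessData-then-ProcessRecalc pattern, wrapped in one transaction with capped parallelism, can be expressed with the TMSL sequence command. A sketch, with illustrative database and table names:

```json
{
  "sequence": {
    "maxParallelism": 2,
    "operations": [
      {
        "refresh": {
          "type": "dataOnly",
          "objects": [
            { "database": "AdventureWorks", "table": "FactInternetSales" },
            { "database": "AdventureWorks", "table": "DimCustomer" }
          ]
        }
      },
      {
        "refresh": {
          "type": "calculate",
          "objects": [
            { "database": "AdventureWorks" }
          ]
        }
      }
    ]
  }
}
```

Operations inside a sequence run one after another, while maxParallelism caps how many objects are refreshed in parallel within each operation.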

Process Recalc
- Builds any calculated columns, calculated tables, hierarchies, and/or relationships that need to be rebuilt.
- Can only be run at the database level.
- Always needed after a ProcessData, a ProcessClear, or after merging partitions (via TMSL/XMLA).
- Never needed after a ProcessFull, ProcessAdd, or ProcessDefault (Recalc is built in).
- Merging partitions via SSMS includes a Process Recalc.
- Becomes an important factor when optimizing processing for models with expensive calculated columns (or tables).

Process Defrag
- Only needed when dealing with partitions (e.g. incremental processing).
- Rebuilds dictionaries at the table level.
- Can be expensive on large tables, but can yield a dramatic query performance improvement.
- Can only be run using TMSL/XMLA.
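Since ProcessDefrag has no UI equivalent, it has to be scripted. A table-level defrag in TMSL might look like this (object names are illustrative):

```json
{
  "refresh": {
    "type": "defrag",
    "objects": [
      { "database": "AdventureWorks", "table": "FactInternetSales" }
    ]
  }
}
```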

Calculated columns and tables: at best they're a shortcut to save time. At worst they cause serious performance problems. Most of the time they just add to technical debt by spreading business logic around, making the solution more difficult to maintain over time. They belong in prototyping and business-led self-service... NOT ENTERPRISE SOLUTIONS.

Every time you create a calculated column, a kitten DIES!!! There’s a time and place for calculated columns and tables… e.g. role-playing date tables

Common Processing Strategies
ProcessFull at the database level
- Pros: simple, data remains available for queries
- Cons: requires the most memory

[Chart: total memory usage peaks at 20 GB for a 10 GB model, since the old 10 GB copy stays online while the new 10 GB copy is built]

Common Processing Strategies
1. ProcessFull at the database level
   - Pros: simple, data remains available for queries
   - Cons: requires the most memory
2. ProcessClear + ProcessFull (separate transactions!!)
   - Pros: simple
   - Cons: database will be offline for however long the ProcessFull takes to complete
3. Incremental Processing (many flavors)
   - Pros: quick (can be "near" real-time)
   - Cons: complicated; Enterprise Edition (or Standard SKU in Azure)
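The ProcessClear + ProcessFull strategy is two separate TMSL batches submitted one after the other (never in the same transaction, or you lose the memory savings). A sketch of the first batch follows; the second batch is identical except the type is "full". The database name is illustrative:

```json
{
  "refresh": {
    "type": "clearValues",
    "objects": [
      { "database": "AdventureWorks" }
    ]
  }
}
```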

Less Common Strategies
- ProcessAdd (appends to segments)
  - Pros: very fast, minimal memory overhead
  - Cons: very complicated to implement (e.g. journalized tables, out-of-line binding)
- Model Flipping
  - Pros: low latency
  - Cons: complicated and expensive

In practice, many companies say they require (near) real-time... then they find out the true cost (DirectQuery performance, or $$$).
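A ProcessAdd can be issued as a TMSL refresh of type add, with an override supplying the delta query (out-of-line binding). The shape below is a sketch based on the TMSL refresh command documentation; the partition name and WHERE clause are illustrative assumptions:

```json
{
  "refresh": {
    "type": "add",
    "objects": [
      {
        "database": "AdventureWorks",
        "table": "FactInternetSales",
        "partition": "FactInternetSales"
      }
    ],
    "overrides": [
      {
        "partitions": [
          {
            "originalObject": {
              "database": "AdventureWorks",
              "table": "FactInternetSales",
              "partition": "FactInternetSales"
            },
            "source": {
              "query": "SELECT * FROM FactInternetSales WHERE OrderDateKey > 20180101"
            }
          }
        ]
      }
    ]
  }
}
```

Out-of-line bindings like this are where most of the complexity (and the bugs) live: the override query must return only the new rows, or you will double-count.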

Performance Considerations
- Processing is resource intensive (CPU, memory); balance is key!
- Throughput vs resource constraints
- Intra-day processing or overnight?
- Don't forget to tune the source
- Use PerfMon and Extended Events to monitor (or purchase BI Sentry or BI Manager)

Additional Resources
- Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services
  https://blogs.msdn.microsoft.com/karang/2013/08/02/sql-2012-tabular-performance-tuning-of-tabular-models-in-sql-server-2012-analysis-services/
- Refresh Command (TMSL)
  https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models-scripting-language-commands/refresh-command-tmsl?view=sql-analysis-services-2017
- AS Partition Processing (Whitepaper + Code)
  https://github.com/Microsoft/Analysis-Services/tree/master/AsPartitionProcessing
- Performance Monitoring (DIY)
  http://byobi.com/2016/02/performance-monitoring-for-ssas-extended-events-cheat-sheet/
  http://byobi.com/2016/03/performance-monitoring-for-ssas-perfmon-counter-cheat-sheet/