Creating High Performance Tabular Models. James Beresford, Director, Agile BI, Sydney, Australia
Creating High Performance Tabular Models: Thinking Tabular; Compression; Snowflaking and Stars; Complexity; Loading; Performance; Security; Server stuff. 2 | 11/21/2018
A little about me: www.bimonkey.com, @BI_Monkey, www.agilebi.com.au
Thinking Tabular: Tabular and OLAP are different. Tabular and SQL are different. I will guide you to new thinking!
Compression #1 – How it works: A few clever things are going on, but dictionary compression is the one that matters the most.
Compression #1 – Dictionary Compression: Values are translated to keys (Value: many bytes; Key: single byte).
Value | Key
Bob Smith | 1
Fred Gorilla | 2
Jane Chimp | 3
Compression #1 – Dictionary Compression: What you see is not what the engine is storing. Displayed but not stored.
Value | Key | Amount
Bob Smith | 1 | 23.50
Fred Gorilla | 2 | 10.99
Jane Chimp | 3 | 11.99
(further Amount rows: 23.78, 22.98, 1.05, 7.66)
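The dictionary idea on these two slides can be sketched in a few lines of Python. This is only an illustration of the general technique, not the engine's actual implementation; the function names are made up for the example, and the repeated names in the sample list are an assumption added to show repetition.

```python
def dictionary_encode(values):
    """Give each distinct value a small integer key; return (dictionary, keys)."""
    dictionary = {}
    keys = []
    for value in values:
        if value not in dictionary:
            # Keys start at 1 to match the slide's example table.
            dictionary[value] = len(dictionary) + 1
        keys.append(dictionary[value])
    return dictionary, keys


def dictionary_decode(dictionary, keys):
    """Rebuild the displayed values from the stored keys."""
    lookup = {key: value for value, key in dictionary.items()}
    return [lookup[key] for key in keys]


# Hypothetical column: the slide's three names, with two repeats added.
names = ["Bob Smith", "Fred Gorilla", "Jane Chimp", "Bob Smith", "Jane Chimp"]
dictionary, keys = dictionary_encode(names)

print(dictionary)
print(keys)
print(dictionary_decode(dictionary, keys) == names)
```

Each long text value is held once in the dictionary, and the column itself holds only the small keys, which is why repeated values cost so little.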
Compression #2 – Not all Data is equal: Some data compresses better than others, or is more efficient generally.
Good | OK | Bad
Repeated Text | Date | Unique Text
Integers | Decimals (less dp the better) | DateTime
Boolean | Currency | Floats
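One plausible reason Date sits in a better column than DateTime is the number of distinct values a dictionary has to hold. The sketch below counts distinct values for a made-up set of timestamps; the data and the helper name `distinct` are assumptions for illustration, and real results depend entirely on your own data.

```python
from datetime import datetime, timedelta

# Hypothetical event log: 10,000 rows, one timestamp every 90 seconds.
start = datetime(2018, 1, 1)
stamps = [start + timedelta(seconds=90 * i) for i in range(10_000)]


def distinct(values):
    """Count distinct values, i.e. how many dictionary entries a column needs."""
    return len(set(values))


# Every full timestamp is unique, so the dictionary is as long as the column.
datetime_cardinality = distinct(stamps)

# Truncating to the date collapses the same rows onto a handful of entries.
date_cardinality = distinct(stamp.date() for stamp in stamps)

print(datetime_cardinality, date_cardinality)
```

The same reasoning applies to "less dp the better": fewer decimal places generally means fewer distinct values.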
Compression #3 - Repetition: Repetition is ok.
Compression #3 - Repetition: Many rows of data are often better than many columns.
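A minimal sketch of turning many columns into many rows (an unpivot) is shown here. The table, the column names and the `unpivot` helper are all hypothetical; in practice this reshaping would more likely happen in the ETL layer than in Python.

```python
# Hypothetical wide table: one column per month.
wide_rows = [
    {"Customer": "Bob Smith", "Jan": 23.50, "Feb": 10.99, "Mar": 11.99},
    {"Customer": "Fred Gorilla", "Jan": 23.78, "Feb": 22.98, "Mar": 1.05},
]


def unpivot(rows, id_column, value_columns, name_column, value_column):
    """Turn each value column into its own row, keyed by the id column."""
    tall = []
    for row in rows:
        for column in value_columns:
            tall.append({
                id_column: row[id_column],
                name_column: column,
                value_column: row[column],
            })
    return tall


tall_rows = unpivot(wide_rows, "Customer", ["Jan", "Feb", "Mar"], "Month", "Amount")
for row in tall_rows:
    print(row)
```

The tall shape repeats the customer and month values on many rows, which is the kind of repetition the previous slide calls ok.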
Compression #4 – Digging in: SSAS DMV DISCOVER_OBJECT_MEMORY_USAGE
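As a rough sketch of what can be done with the DMV's output, the Python below totals memory per parent object. The query string uses the usual `$SYSTEM` DMV syntax; the column names are the ones this rowset exposes as far as I know, so check them against your server. The sample rows are invented, and how the query is actually run (SSMS, DAX Studio, a client library) is left out on purpose.

```python
from collections import defaultdict

# DMV query text; run it with whichever client you normally use.
DMV_QUERY = "SELECT * FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE"

# Invented rows standing in for the DMV result (values are in bytes).
sample_rows = [
    {"OBJECT_PARENT_PATH": "Model.Sales", "OBJECT_ID": "CustomerName",
     "OBJECT_MEMORY_SHRINKABLE": 0, "OBJECT_MEMORY_NONSHRINKABLE": 52_000_000},
    {"OBJECT_PARENT_PATH": "Model.Sales", "OBJECT_ID": "Amount",
     "OBJECT_MEMORY_SHRINKABLE": 1_000_000, "OBJECT_MEMORY_NONSHRINKABLE": 8_000_000},
    {"OBJECT_PARENT_PATH": "Model.Date", "OBJECT_ID": "Date",
     "OBJECT_MEMORY_SHRINKABLE": 0, "OBJECT_MEMORY_NONSHRINKABLE": 300_000},
]


def memory_by_parent(rows):
    """Sum shrinkable and non-shrinkable bytes per parent path, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["OBJECT_PARENT_PATH"]] += (
            row["OBJECT_MEMORY_SHRINKABLE"] + row["OBJECT_MEMORY_NONSHRINKABLE"]
        )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for path, total_bytes in memory_by_parent(sample_rows):
    print(f"{path}: {total_bytes / 1_000_000:.1f} MB")
```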
Compression #4 – Digging in: Handily, I built an explorer: http://www.bimonkey.com/2014/12/exploring-memory-usage-in-tabular-models/
Compression #4 – Digging in: Demo! Not all data is equal; Rows vs Columns; User Experience.
Snowflaking: Relationships are expensive.
Snowflaking: Flatten everything.
Snowflaking – OLAP style:
Animal: Gorilla, Orang Utang, Grey Hippo, Dwarf Hippo, Zebra, Normal Horse, This Giraffe, That Giraffe, Great White, Etc…
Animal SubCategory: Great Apes, Regular Apes, Hippopotamuses, Horses, Giraffes, Sharks, Freshwater Eels
Animal Category: Apes, Ungulates, Fish
Snowflaking – Tabular style: a single Animal table.
Category | SubCategory | Animal
Apes | Great Apes | Gorilla
Apes | Regular Apes | Orang Utang
Ungulates | Hippopotamuses | Grey Hippo
Ungulates | Hippopotamuses | Dwarf Hippo
Ungulates | Horses | Zebra
Ungulates | Horses | Normal Horse
Ungulates | Giraffes | This Giraffe
Ungulates | Giraffes | That Giraffe
Fish | Sharks | Great White
Etc…
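Flattening the OLAP-style snowflake into one Tabular-style table amounts to resolving each lookup before loading. A small Python sketch follows, using a few of the animals from the slides; the integer keys and the `flatten` helper are hypothetical, and in practice this would normally be a view or an ETL step.

```python
# Hypothetical snowflake: three tables linked by made-up integer keys.
categories = {1: "Apes", 2: "Ungulates", 3: "Fish"}
subcategories = {
    10: ("Great Apes", 1),
    11: ("Regular Apes", 1),
    20: ("Hippopotamuses", 2),
    21: ("Horses", 2),
    30: ("Sharks", 3),
}
animals = [
    ("Gorilla", 10),
    ("Orang Utang", 11),
    ("Grey Hippo", 20),
    ("Dwarf Hippo", 20),
    ("Zebra", 21),
    ("Great White", 30),
]


def flatten(animals, subcategories, categories):
    """Resolve both lookups so each animal row carries its own hierarchy."""
    flat = []
    for animal, subcategory_id in animals:
        subcategory_name, category_id = subcategories[subcategory_id]
        flat.append({
            "Category": categories[category_id],
            "SubCategory": subcategory_name,
            "Animal": animal,
        })
    return flat


flat_animals = flatten(animals, subcategories, categories)
for row in flat_animals:
    print(row)
```

The flat table repeats Category and SubCategory on every row and needs no relationships to walk the hierarchy.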
Snowflaking – Tabular style: Demo! Hierarchies; User Experience.
Snowflaking – Tabular style: You need to flatten to get hierarchies anyway. Relationships cost. Repetition is ok.
Stars: Simple stars are ok! Use integer keys.* There is a cardinality limit of around 2m keys per query. (* This needs validating.)
Complexity: Complex stars are less ok. Many-to-many handling is not great, but it does work… The more relationships there are to navigate, the more likely it is to break. After a point the processing kills the model.
Loading – what can you control? Processing stages: Reads Data; Compresses; Row Calculations; Builds Relationships; Processing Complete.
Loading: ETL is more efficient than Cube. Use Columnstores. Use Views to abstract. Reduce the work involved in processing to just compression and building relationships.
Performance: Storage Engine good, Formula Engine bad. Learn to love DAX Studio.
Performing – Storage Engine: Brute force scanning of memory. Fast, sub-second responses over vast data. Storage – Brute Scanning: Simple but Fast.
Performing – Formula Engine: Single-threaded calculations. Biggest bottleneck. Formula – CPU Calculations: Complex but Slow.
Badly written security slows everything down. The same rules apply to security as to all other calculations.
Security: Demo: Hierarchy walking approaches.
Server Stuff: Memory overhead = 3x Cube size (Caches, Query Execution). NEVER RUN OUT OF MEMORY!! 25GB/s - $9/GB; 1GB/s - $0.03/GB.
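The 3x figure can be turned into a quick sizing check. The sketch below reads "memory overhead = 3x cube size" as "budget RAM equal to three times the model size", which is my interpretation of the slide; the function names and the sizes used are made up.

```python
def required_ram_gb(model_size_gb, overhead_factor=3.0):
    """RAM to budget for a model, using the slide's 3x rule of thumb."""
    return model_size_gb * overhead_factor


def fits_in_memory(model_size_gb, server_ram_gb, overhead_factor=3.0):
    """True if the server has at least the budgeted RAM for this model."""
    return required_ram_gb(model_size_gb, overhead_factor) <= server_ram_gb


# Hypothetical sizes: a 20 GB model and a 30 GB model on a 64 GB server.
for model_size in (20, 30):
    needed = required_ram_gb(model_size)
    verdict = "ok" if fits_in_memory(model_size, 64) else "too big"
    print(f"{model_size} GB model needs about {needed:.0f} GB RAM: {verdict}")
```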
Server Stuff: More, faster RAM. Faster CPU. NUMA issues.
A Big Thanks to our Sponsors