Scaling SQL Server 2000 Analysis Services to the MAX
Dave Wickert
Program Manager, SQL Server BI Practices Team
Microsoft Corporation

Scalability
"For our purposes, scalability is the result of a system where performance and response times are smooth, even, and predictable as the number of users is increased."

Agenda
- Out-of-the-box defaults
- Performance tuning
  - Aggregation design
  - Storage modes
- Cube design
  - Partitioning
  - Memory management
  - Schema optimization
- Architectural design
  - NLB clustering (web farm of Analysis servers)

We can't cover it all...
For more details on all these topics, and for information that didn't fit in this talk:
1. Analysis Services Performance Guide, available at: http://www.microsoft.com/technet/prodtechnol/sql/maintain/optimize/AnSvcsPG.asp
2. Analysis Services Operations Guide, available at: http://www.microsoft.com/technet/prodtechnol/sql/maintain/operate/AnServOG.asp
3. "Creating Large-Scale, Highly Available OLAP Sites" white paper, available at: http://www.microsoft.com/sql/evaluation/bi/creatingOLAPsites.asp

Out-of-the-box Defaults
All are 'server' properties:
- Data and temp folder location
- Validate the default memory settings
- Increase the process buffer size (100-200 MB minimum)
- Always set the system-wide processing log file
- Enable error reporting

Out-of-the-box Defaults
Dave Wickert, Program Manager, BI Practices Team

Aggregation Design
- What aggregations are
- How many aggregations there are
- How big aggregations are
- How aggregations help query performance
- Why too many aggregations cost in processing
- Controls over the aggregation design
  - Setting level and partition counts
  - Setting the Aggregation Usage property
- Aggregation design tools
- Aggregation design strategies

Design aggregates for the Foodmart SALES cube
Dave Wickert, Program Manager, BI Practices Team

What Aggregations Are
Subtotals at a certain level from every dimension (see the SQL sketch below).
Dimension hierarchies:
- Customers: All Customers > Country > State > City > Name
- Product: All Products > Category > Brand > Item > SKU
Facts (Customer ID | SKU | Units Sold | Sales):
- 345-23 | 1351 | 232 | $45.67
- 563-01 | 4512 | 3634 | $67.32
- ...
Highest-level aggregation (Customer | Product | Units Sold | Sales):
- All | All | 347,814,123 | $345,212,301.30
Intermediate aggregation (Country | Item ID | Units Sold | Sales):
- Can | sd452 | 9,456 | $23,914.30
- US | yu678 | 4,623 | $57,931.45
- ...

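To make the idea concrete (this example is not from the deck), an aggregation is conceptually a precomputed GROUP BY subtotal over the fact table. The Foodmart-style table and column names below are assumptions for illustration, not the product's actual generated SQL.

    -- Hypothetical Foodmart-style schema; names are illustrative assumptions.
    -- The intermediate (Country, Item) aggregation is conceptually this subtotal,
    -- which Analysis Services precomputes and stores so that matching queries
    -- never have to rescan the base facts.
    SELECT c.country,
           p.item,
           SUM(f.units_sold) AS units_sold,
           SUM(f.sales)      AS sales
    FROM   sales_fact f
    JOIN   customer c ON c.customer_id = f.customer_id
    JOIN   product  p ON p.sku         = f.sku
    GROUP BY c.country, p.item;
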
How Many Aggregations
- Five customer levels, five product levels, five time levels: 5 x 5 x 5 = 125 possible combinations
  - Customer: All Customers > Country > State > City > Name
  - Product: All Products > Category > Brand > Item > SKU
  - Time: All Time > Year > Quarter > Month > Day
- General rule: multiply the number of levels in each dimension
- Imagine a cube with ten dimensions, five levels each: 5^10 = 9,765,625, or roughly 10 million theoretical combinations!
- The goal is to find the best subset of this potentially huge number of possibilities

Size of Aggregations
- Aggregations at lower levels have more possible cells (illustrated by the query below):
  - (All, All, All): 1 x 1 x 1 = 1
  - (Country, Item, Quarter): 3 x 7,621 x 12 = 274,356
  - (Name, SKU, Day): 3,811 x 8,211 x 1,095 = 34,264,872,495
- Level member counts used above:
  - Customer: All Customers (1) > Country (3) > State (80) > City (578) > Name (3,811)
  - Product: All Products (1) > Category (60) > Brand (911) > Item (7,621) > SKU (8,211)
  - Time: All Time (1) > Year (3) > Quarter (12) > Month (36) > Day (1,095)
- The actual size varies with the sparsity of the data, since empty cells are never stored
- It also depends on the number of measures (and the number of bytes each uses, based on the measure's data type)

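To see sparsity at work, a count like the sketch below (again with assumed Foodmart-style names) returns how many (Country, Item, Quarter) combinations actually contain data; the stored aggregation holds only those rows, typically far fewer than the 274,356 theoretical cells.

    -- Hypothetical schema; counts populated cells for the (Country, Item, Quarter)
    -- aggregation so it can be compared against the theoretical maximum.
    SELECT COUNT(*) AS populated_cells
    FROM (
        SELECT c.country, p.item, t.quarter
        FROM   sales_fact f
        JOIN   customer    c ON c.customer_id = f.customer_id
        JOIN   product     p ON p.sku         = f.sku
        JOIN   time_by_day t ON t.time_id     = f.time_id
        GROUP BY c.country, p.item, t.quarter
    ) AS cells;
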
Aggs 'Cost' in Processing
- Aggregations computed from the base facts:
  - (Name, SKU, Day) -> (State, Item, Quarter)
  - (Name, SKU, Day) -> (Name, Category, All)
- An agg 'miss' occurs when no aggregation exists below the one being queried; the only way to calculate it is to go back to the base facts
- Aggregations may be computed from other aggregations:
  - (State, Item, Quarter) -> (Country, Item, Quarter)
  - (City, Category, All) -> (State, All, All)
  - (State, All, All) -> (All, All, All)
- Remember the cube with ten 5-level dimensions: 5^10 = 9,765,625, or roughly 10 million combinations; computing even 10% of these is costly!
- The goal is to find the optimal aggregation set: one that helps query performance the most but doesn't cost too much in processing

Aggregation Design Wizards
- Evaluate the cost/benefit of aggregations
  - Relative to other aggregations
  - Designed in "waves" from the top of the pyramid down toward the fact table
  - Cost is related to aggregation size
  - Benefit is related to the "distance" from another aggregation
- Storage Design wizard
  - Assumes all combinations of levels are equally likely
- Usage-Based Optimization wizard
  - Assumes the query pattern resembles your selection from the query log
  - Representative history is needed

Getting control over your aggregation design
Dave Wickert, Program Manager, BI Practices Team

Controls Over Agg Design
Row counts drive the design process:
- Level member counts
  - A property of each dimension level
  - Used to estimate the sizes of aggregations
  - Set at creation or by manual intervention
- Partition row counts
  - A property of each partition
  - Used to estimate data density
  - Set at creation or by manual intervention
- Reset these values before starting aggregation design! (example queries below)

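Because the wizards work only from the counts stored in these properties, it helps to refresh the numbers from the source data first. A minimal sketch, assuming Foodmart-style table and column names (hypothetical, not from the slides):

    -- Member counts per Customer level; feed the results into the level
    -- member-count properties before running the aggregation wizards.
    SELECT COUNT(DISTINCT country)     AS country_members,
           COUNT(DISTINCT state)       AS state_members,
           COUNT(DISTINCT city)        AS city_members,
           COUNT(DISTINCT customer_id) AS name_members
    FROM   customer;

    -- Row count for one partition's source data; repeat per partition
    -- (with that partition's filter) to refresh the partition row counts.
    SELECT COUNT(*) AS partition_row_count
    FROM   sales_fact;
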
Controls Over Agg Design
The Aggregation Usage property:
- A property of a dimension
- Used to include or exclude levels from consideration (not from the design)
- Settings:
  - Standard (default for standard dimensions)
  - Top Level Only (default for virtual dimensions)
  - Bottom Level Only
  - Top and Bottom Levels (default for changing dimensions)
  - Custom: uses the Enable Aggregations property of each level
- Trick: set rarely queried dimensions to 'Top Level Only'. Each dimension treated this way reduces the cube complexity by a dimension, transferring the cost from processing time to query time.

Agg Design Strategies: Guidelines
- Design the initial overall agg set with the Storage Design wizard
  - 10-20% performance gain (higher-complexity cubes may warrant even less); limit to no more than 20-40 minutes of 'design' time
- Pilot usage; collect query logs
- Design a UBO agg set
  - Aggregation Usage "Standard" on most dimensions
  - Use a higher performance gain
  - Merge the new aggregations with the existing set
- Periodically add UBO aggs as usage changes
  - Merge the new aggregations with the existing set
  - Eventually you may need to start over

Partition Aggregation Utility
Dave Wickert, Program Manager, BI Practices Team

Partitioning
- Partitioning helps query performance
  - Set the slice value on partitions (see the sketch below)
- Partitioning can help processing performance
  - Be more selective about what you process
  - Process in parallel
- Partitioning and the data lifecycle
  - Partitioning by time is the most common
  - Used to remove old data
  - Aggregation designs can vary over time

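As an illustration of what the slice buys (a sketch with assumed table and column names, not the product's actual generated SQL): the data slice lets the server route queries to the right partition, and the partition's source query should be restricted to the matching rows, roughly like this:

    -- Hypothetical source query for a partition sliced on 1998 / Q1.
    -- Keeping the WHERE clause in agreement with the slice value means
    -- processing this partition reads only its own slice of the fact table.
    SELECT f.*
    FROM   sales_fact f
    JOIN   time_by_day t ON t.time_id = f.time_id
    WHERE  t.the_year = 1998
      AND  t.quarter  = 'Q1';
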
Partitioning and Agg Design (rolling 'n' months)
- Aggregation designs are per-partition
- Perform UBO on the most recent partition
- Copy the design to an empty "base" partition (MonthBASE, no data)
- Clone "base" for new partitions
Rolling cycle across the Month1 ... Month35, Month36, Month37 partitions:
1. Design aggs based on the most recent usage
2. Copy the new agg design to the base partition
3. Clone the base for the new partition
4. Apply incremental updates of new data to the new partition
5. Delete the oldest partition when it becomes obsolete

Memory Management
- Dimension, replica, and shadow memory
- Memory used during processing
- Query results cache
- Memory cleaning
- Handling large memory needs
  - The concern is the virtual memory of the msmdsrv process
  - Need enough physical memory to avoid paging
(Memory layout diagram: replica memory, shadow dimensions, processing buffers, read-ahead buffers, connections, dimension memory, and available cache, bounded by the minimum allocated memory and memory conservation threshold settings.)

Server Memory Usage: Larger Memory Needs
- By default, AS can use up to 2 GB
  - The address space of a Win32 process
  - The msmdsrv process is not AWE-aware
- Use the /3GB switch for more
  - Change the boot.ini file to enable 3 GB (example below)
  - Requires Windows Advanced Server or Datacenter
  - Set the AS high and low memory limits appropriately
- To use even more memory: 64-bit servers
  - Use SQL Server 2000 (64-bit) Analysis Services
  - Use Windows Server 2003

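For reference, a boot.ini entry with the /3GB switch looks roughly like this; the ARC path and OS description vary by machine, so treat it as an illustrative sketch rather than an exact value.

    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /3GB
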
Why 64-bit?
- It is all about capacity, not performance: a large virtual address space
  - Tens of millions of members in a dimension
  - Able to cache a large number of aggregates (a virtually unlimited query cache)
  - Huge processing buffers at the same time as a large query cache
- Performance impact
  - Marginal for most applications (a percentage delta, not a multiple)
  - Exception: process buffers in the multi-GB range (10-20 GB), never having to go to temp files

Minimizing the Effect on the Host RDBMS
- Optimize your cube schema
  - Goal: a pure table scan of the fact table
- If you have the capacity, run the RDBMS and AS on the same machine
  - No network round trips
  - With a full process, that can mean LOTS of data

Schema Optimization
Dave Wickert, Program Manager, BI Practices Team

Schema Optimization
- Eliminate joins in the RDBMS when processing
  - The goal is a fact table scan (see the sketch below)
  - Note: not all joins can be eliminated, e.g. the one used to enforce the data slice if using partitioning
- Criteria are in BOL and the Performance Guide
  - Topic: "Optimizing Cube Schemas"
- Doing schema optimization
  - Use the Optimize Schema tool in the Cube Editor, or
  - Do it by hand, by setting the member key column to the fact table
  - Note: a dimension becomes 'unoptimized' if you remove it and then re-add it to the cube

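A rough before/after sketch of join elimination, again with assumed Foodmart-style names: once the member key columns can be taken from the fact table itself, the second query needs no dimension joins and processing becomes a pure fact-table scan.

    -- Before optimization (hypothetical): processing joins to the dimension
    -- tables to pick up the member keys.
    SELECT c.customer_id, p.sku, t.time_id, f.units_sold, f.sales
    FROM   sales_fact f
    JOIN   customer    c ON c.customer_id = f.customer_id
    JOIN   product     p ON p.sku         = f.sku
    JOIN   time_by_day t ON t.time_id     = f.time_id;

    -- After optimization (hypothetical): the member key columns come straight
    -- from the fact table, so the query is a pure fact-table scan.
    SELECT f.customer_id, f.sku, f.time_id, f.units_sold, f.sales
    FROM   sales_fact f;
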
More Topics (all discussed in the Performance and Operations Guides)
- Changing dimensions and flex aggs
- Cache warming
- Incremental updates
- Tuning the RDBMS
- Data types
- Unique member keys
- Virtual cubes
- Distinct counts
- Optimizing hardware
- Performance counters
- Process buffer tuning
- 64-bit systems
- Storage mode selection, i.e. MOLAP vs. ROLAP vs. HOLAP
- Database placement on servers
- Optimizing cell writeback
- More schema optimization
- Parent-child dims and ROLAP
- Grouping measures by usage
- Middle-tier systems / connection pooling

Scale-Up Performance: SMP (More CPUs)
- For queries:
  - The 'service' process is fully multithreaded
  - Various types of worker thread queues for queries
- For processing:
  - Parallelized at the segment level, with read-ahead buffers
  - Therefore, for a large SMP box, use the Parallel Processing Utility
    - The latest version is on the Microsoft.com Download Center; search for "Analysis Services"
  - Marginal returns above 8-12 requests in parallel
  - The lazy aggregator is single-threaded

Scale-Out Performance: NLB Clustering
- Uses "web farm" technology
- You may not need it as long as you 'hit' aggregates; normal cases focus on:
  - Good aggregation design (usage-based)
  - Good partitioning design
- However, there can be problem areas:
  - "Wide queries", e.g. top count, median vs. mean, etc.
  - Very complex cubes with random queries, where the probability of hitting an aggregate is very low

Scale-Out Performance: NLB Clustering...
- Internet and corpnet clients connect to the NLB cluster, which appears as one system to the outside world
- Data flows in; cubes flow out

Scale-Out Performance: Why NLB?
- Easy to add new capacity
  - Just roll in a new server, install, update the OLAP data folder, and converge it into the cluster
- Can mix and match with any size of system
- No additional hardware or software: all you need is Windows Advanced Server
- Linear scalability
- ...and high availability, since each system is standalone

Scale-Out Performance: Why Not NLB?
- Requires networking expertise
- Requires scripting support
  - Application Center can be used to assist
- Requires "n" copies of the data folder, one for each node in the cluster
- Not all AS capabilities can be supported, e.g. if writeback or "what-if" is needed
- Only addresses querying as the bottleneck; it does not help if processing is the problem

Summary
- Out-of-the-box defaults
- Performance tuning
  - Aggregation design; storage modes
- Cube design
  - Partitioning; memory management
  - Schema optimization; changing dimensions
- Architectural design
  - Scale-out with NLB clustering (web farm of Analysis servers)

Call to Action
- Get more involved with the scalability features of Analysis Services; read the Performance and Operations Guides; make it right and get the most from your system
- For more information, please email SCDLITE@microsoft.com
- You can download all presentations at www.microsoft.com/usa/southcentral/

SQL Server Summit Brought To You By:
© 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.