Albert van Dok SQL Zaterdag 12 november 2011
Background Life Before BISM What is BISM BISM Positioning Questions
From data towards information By nature the demand for (new) information and insights will always evolve To connect and integrate (new) datasources is an essential part Preparing data for use Data cleansing Define relationships Data enrichment Add calculations Versioning
Applications Analytical solutions Operational reports Dashboards & Scorecards Data Mining Require- ments Quick delivery Integration of data by business user Ad hoc reports Excellent performance Flexible Issues Operational reports from an analytical system Wrong use of tools or BI tools not flexible (Performance) problems Long implementation times Highly depended on IT
DW Datamart Data Model Reporting Tool ToolData Source MOLAP OLAP Browser OLAP Browser Reporting Tool OLTP
DW Datamart Data Model Reporting Tool ToolData Source MOLAP OLAP Browser OLAP Browser Reporting Tool OLTP UDM
DW Datamart Data Model Reporting Tool ToolData Source OLAP Browser OLAP Browser Analysis Services Reporting Tool MOLAP OLTP UDM XML/A Cache Security End-user model Transalations Actions KPI… Calculations Basic dim. model Cube & Dimensions Storage & Caching policies Linked Objects Datasource view Security End-user model Transalations Actions KPI… Calculations Basic dim. model Cube & Dimensions Storage & Caching policies Linked Objects Datasource view UDM
Data is stored in structures suitable for analysis and ad-hoc querying offering a good performance MOLAP - Multidimensional OLAP Both fact data and aggregations are processed, compressed, stored, and indexed using a special format optimized for multidimensional data. ROLAP - Relational OLAP Both fact data and aggregations remain in the relational data source, eliminating the need for special processing HOLAP - Hybrid OLAP This mode uses the relational data source to store the fact data, but pre- processes aggregations and indexes, storing these in a special format, optimized for multidimensional data. Characteristics: Good performance (based on aggregations) Flexible Scalable
UDM Excel 2010 Reporting Services 2008 R2 & Report Builder 3 SharePoint 2010 Excel Services PerformancePoint Services Visio Services 3rd party SSAS clients MDX Besides the advantages the UDM: Is often too complex for simple reporting purposes Has a steep learning curve Uses MDX which is different than SQL… Must be implemented by a BI professional Needs small investment just to start
New paradigm “Business intelligence for the masses” “Managed self-service business intelligence” Put simple, powerful BI tools in the hands of “knowledge workers” Familiar tools: Excel People who own the data Excel spreadsheet, Access database or SharePoint list data Reality: Office power users
Powerpivot for Excel Free Addin for Excel Running 32/64bit and lots of RAM… Contains Vertipaq engine (SSAS running in process with Excel) Powerpivot for Sharepoint Comes with SQL Server 2008 R2 x64 Sharepoint 2010 extention Vertipaq running on server side For sharing and managing PowerPivot applications
PowerPivot has its own semantic model which can be seen as BISM v1 enables connecting data from various data sources add relations between tables add calculations, two places: in tables – calculated columns (DAX) over the whole model – calculated measures (DAX) works in cached (VertiPaq) mode Covers personal and team BI segments
In-memory column-based database Very high data compression Doesn’t require the process of designing and building aggregations and other tunning Support partitioning and paging on large data sizes
15 4 Jim … $1,5005 Liz … $06 Dave … $9,000 7 Sue … $10108 Bob … $509 Jim … $1,300 1 Bob … $30002 Sue … $5003 Ann … $1,700 Page 1 Page 2 Page 3 64 bytes CPU L2 Cache L1 Cache Memory (DBMS Buffer Pool) 1 … $30002 … $5003 … $ … $15005 … $06 … $ $ $500.. $ $ $0.. $ $ $50.. $ … $10108 … $509 … $ $ $500.. $ $ $0.. $ $ $50.. $1300 8K bytes 64 bytes Select id, name, BalDue from Customers where BalDue > $500 Query summary: 3 pages read from disk Up to 9 L1 and L2 cache misses (one per tuple) Query summary: 3 pages read from disk Up to 9 L1 and L2 cache misses (one per tuple) Don’t forget that: - An L2 cache miss can stall the CPU for up to 200 cycles Don’t forget that: - An L2 cache miss can stall the CPU for up to 200 cycles
16 64 bytes CPU L2 Cache L1 Cache Memory 8K bytes 64 bytes Id Name Bob Sue Ann Jim Liz Dave Sue Bob Jim BalDue Street … … … … … Takeaways: Each cache miss brings only useful data into the cache Processor stalls reduced by up to a factor of: 8 (if BalDue values are 8 bytes) 16 (if BalDue values are 4 bytes) Caveats: Not to scale! An 8K byte page of BalDue values will hold 1000 values (not 5) Not showing disk I/Os required to read id and Name columns Select id, name, BalDue from Customers where BalDue > $500
Assume: Customer table has 10M rows, 200 bytes/row (2GB total size) Id and BalDue values are each 4 bytes long, Name is 20 bytes Query: Select id, Name, BalDue from Customer where BalDue > $1000 Row store execution: Scan 10M rows 80MB/sec = 25 sec. Column store execution: Scan 3 columns, each with 10M entries = 3.5 sec. (id 40MB, Name 200MB, BalDue 40MB) About a 7X performance improvement for this query!! But we can do even better using compression
Powerpivot
Although Powerpivot for Excel is great, it has certain limitations Limit to 2Gb, no support for partitions, queries Vertipaq cache, daily scheduled data refresh in Sharepoint, acces to workbook PowerPivot and Analysis Services are two different products hence two models Powerpivot targets business users, model managed in Excel Analysis Services targets BI professionals and IT, model managed on the server “Can we have one model which integrate both worlds and seamlessly transition BI applications from Personal BI to Team BI to Organizational/Professional BI?”
BISM v2 One model for all reporting, analysis, dashboards, scorecards personal, team, corporate BI Has a relational and multidimensional API Support both cached (Molap & VertiPaq) and the pass-through (realtime) mode only SQL Server data sources for now Pass-through no additional database data stays as is in the original structures ideal for the realtime analysis
In “Denali” every cube automatically becomes a BI Semantic Model To create a BI semantic model you create a: multidimensional model, tabular model, PowerPivot workbook Every model looks like cubes/dimensions/measure groups/data sources/data source views under the covers they share a common Analysis Services file format. this shared underlying structure that makes the BI semantic model work
Hybrid model supporting multidimensional and tabular data models Developed using an multidimensional or a tabular project Choice depends on application needs and skillset Tabular Familiar model, easier to build, faster time to solution Not all advanced concepts (e.g. many-to-many) not available natively in the model… need calculations to simulate these Easy to wrap a model over a raw database or warehouse for reporting & analytics Multidimensional Sophisticated model, higher learning curve Advanced concepts baked into the model and optimized (parent-child, many-to-many, attribute relationships, key vs. name, etc.) Ideally suited for OLAP type apps (e.g. planning, budgeting, forecasting) that need the power of the multidimensional model
Represents the intelligence or semantics in the model Defines entities and relations between them User-oriented DAX Based on Excel formulas and relational concepts – easy to get started Complex solutions require steeper learning curve – row/filter context, Calculate, etc Calculated columns enable new scenarios, however no named sets or calc members MDX Based on understanding of multidimensional concepts – higher initial learning curve Complex solutions require steeper learning curve – CurrentMember, overwrite semantics, etc. Ideally suited for apps that need the power of multidimensional calculations – scopes, assignments, calc members
This layer integrates data from multiple sources – relational databases, business applications, flat files, OData feeds, etc. Two modes: cached and pass-through Cached:: pulls in data from all the sources and stores it in a compressed data structure MOLAP and VertiPaq Passthrough: pushes query processing and business logic down to the data source ROLAP and DirectQuery
UDM Excel 2010 Reporting Services „Denali” SharePoint 2010 Excel Services Reporting Services PerformancePoint Services Visio Services 3rd party SSAS clients SharePoint 2010 Power View MDX
BISM Excel 2010 Reporting Services „Denali” SharePoint 2010 Excel Services Reporting Services PerformancePoint Services Visio Services 3rd party SSAS clients SharePoint 2010 Power View 3rd party SSAS clients MDX DAX
BISM Excel 2010
BISM in ‘Denali’ includes: hierarchies, KPIs, parent-child, drillthrough, perspectives additional DAX functions (RankX, DistinctCount, GroupBy, Lookup) security (role-based with Active Directory, column/row based) BISM does not include: some of the UDM features scripts, actions, translations, role-playing dimensions object model write-back other realtime for non-SQL Server data sources MDX query support for realtime
BISM and the tabular model
Relatively simple model Fast response Flexible DAX calculations are similar to Excel formulas More understandable and user-friendly to majority of people Same model across all scenarios Easily scale from personal BI to corporate BI Faster development than in UDM Prototyping by end-users Easier changes of model Reduction of cost in developing the full BI solution
MOLAP is much more complex than PowerPivot, but it offers greater scalability ROLAP is even more limited, but it scales above 50TB space PowerPivot models can grow up to 2GB which is the limit set by SharePoint if they want to be shared among others. Otherwise, only the memory is the limit BISM comes in the middle and fills the space between MOLAP and PowerPivot For the space way above the 50TB there are new ColumnStore indexes (in the relational engine)
Two projects for building a BI Semantic Model Future plan is to integrate these into 1 model Use Vertipaq as an SSAS storage Use MDX scripts in tabular projects DAX queries are not supported in multidimensional projects and thereby Power Viewer, which uses DAX to retrieve data from the model
BI Semantic Model features Role playing dimensions Translations Actions MDX Scripts Realtime over Oracle, Teradata, DB2… Programmability BISM object model MDX query support for Realtime Write back
BISM is not a replacement for UDM DAX is not a replacement for MDX Column store databases offering blazing fast performance Every model has its advantages BI architects must decide when to apply which model BISM v2 in not complete, expect changes!
Mail to