Intro to BI Architecture | Warren Sifre
Agenda: Analytics Classifications, Ideal Layers, Traditional BI Architecture, Challenges Encountered by Poor Architecture, Preview of Advanced Analytics Architecture
About Me
Owner of Broadstroke Consulting. In the IT industry since the 1990s. Developed system integration solutions against many different database platforms. Passionate about solutions architecture at both the hardware and software levels. Interested in SQL Server, Azure, MongoDB, Hadoop, Python/C#/Java/PowerShell and information security (hacking). Love baseball, Spartan racing, Star Wars and much more!!!
Ideal Architecture
Ideal Layers of Data Architecture
Ideal Layers – Operational Data Store Layer
The Operational Data Store Layer stores and retains data from source systems in order to provide near-real-time insight into what is happening within the organization at this very moment. The data is meant to remain as it arrived from the source system, with little to no transformation and no aggregation. This allows any number of processes to consume the data for various purposes while preserving its authenticity.
Tools – MS SQL Server, Oracle, MySQL or any database engine
Ideal Layers – Enterprise Data Repository Layer
The Enterprise Data Repository Layer permanently stores data from source systems, with the intention of not only reporting on what happened but also determining why it happened. It is also a cornerstone for advanced analytics, where "Will it happen?" and "How can we make it happen?" types of analysis can occur.
Tools – MS SQL Server, Oracle and other database engines
Ideal Layers – Semantic Model Layer
The Semantic Model Layer is the keeper of all business logic related to how values are calculated.
Tools – MS SQL Analysis Services, Oracle, Teradata or any other semantic layer engine
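A minimal sketch of the idea that business logic lives in one place. The measure name `gross_margin_pct`, the `SEMANTIC_MODEL` registry and the sample rows are all hypothetical and only illustrate the principle; a real semantic layer engine would define measures in its own modeling language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass(frozen=True)
class Measure:
    """One named calculation owned by the semantic layer (hypothetical type)."""
    name: str
    compute: Callable[[List[Row]], float]


def gross_margin_pct(rows: List[Row]) -> float:
    # The business rule is written exactly once, here.
    revenue = sum(r["revenue"] for r in rows)
    cost = sum(r["cost"] for r in rows)
    return 0.0 if revenue == 0 else 100.0 * (revenue - cost) / revenue


# Hypothetical registry standing in for a semantic model.
SEMANTIC_MODEL: Dict[str, Measure] = {
    "gross_margin_pct": Measure("gross_margin_pct", gross_margin_pct),
}


def evaluate(measure_name: str, rows: List[Row]) -> float:
    """Entry point that any presentation tool would call."""
    return SEMANTIC_MODEL[measure_name].compute(rows)


sales = [
    {"revenue": 200.0, "cost": 150.0},
    {"revenue": 300.0, "cost": 200.0},
]

# Two different "tools" ask for the same measure and get the same answer.
dashboard_value = evaluate("gross_margin_pct", sales)
report_value = evaluate("gross_margin_pct", sales)
```

Because both consumers call `evaluate`, a change to the rule inside `gross_margin_pct` reaches every consumer without touching any report.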
Ideal Layers – Presentation Layer
The Presentation Layer is where the various tools are employed to leverage the data in the Semantic Layer models and display it as dashboards, paginated reports or any other method of displaying the data stored within.
Tools – Power BI, Tableau, Qlik, Tibco, Logi, iDashboard, Excel, etc.
Commonly Found Architecture LIMITING ARCHITECTURE!!!
Ideal Traditional BI Architecture
Traditional BI Architecture – Operational Data Store
ODS (Operational Data Store) – This database contains the data originating from the source systems in its original form. No transformations take place in this area. The data contained here is not kept permanently, only long enough to feed the Data Warehouse. The data in the ODS is updated as close to real time as possible.
Features Leveraged – Replication, Change Data Capture, Log Shipping, ELT
Tools
Database Engines – MS SQL Server, Oracle, MySQL, DB2
ELT – SQL Integration Services, Informatica, Pentaho, Information Builders, IBM
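The ODS behavior described above (copy changes as they arrive, leave them untransformed, keep them only for a limited time) can be sketched roughly as follows. This is a plain high-watermark poll on an in-memory SQLite database, used here as a stand-in; it is not Change Data Capture, replication or log shipping. The table names `src_orders` and `ods_orders`, the integer clock and the functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, status TEXT, amount REAL,
                             updated_at INTEGER);
    -- Same columns as the source, plus a load timestamp for retention.
    CREATE TABLE ods_orders (order_id INTEGER, status TEXT, amount REAL,
                             updated_at INTEGER, loaded_at INTEGER);
""")


def load_changes(conn, watermark, now):
    """Copy source rows changed after `watermark` into the ODS, untransformed.

    Returns the new watermark (highest `updated_at` that was copied).
    """
    new_mark = conn.execute(
        "SELECT COALESCE(MAX(updated_at), ?) FROM src_orders WHERE updated_at > ?",
        (watermark, watermark),
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO ods_orders (order_id, status, amount, updated_at, loaded_at) "
        "SELECT order_id, status, amount, updated_at, ? FROM src_orders "
        "WHERE updated_at > ? AND updated_at <= ?",
        (now, watermark, new_mark),
    )
    return new_mark


def purge(conn, cutoff):
    """Delete ODS rows loaded before `cutoff`; the ODS is not permanent storage."""
    conn.execute("DELETE FROM ods_orders WHERE loaded_at < ?", (cutoff,))


# Hypothetical source activity.
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?, ?)",
    [(1, "NEW", 10.0, 1), (2, "NEW", 25.5, 2)],
)
mark = load_changes(conn, watermark=0, now=100)

conn.execute("UPDATE src_orders SET status = 'SHIPPED', updated_at = 3 WHERE order_id = 1")
mark = load_changes(conn, watermark=mark, now=200)

ods_rows = conn.execute(
    "SELECT order_id, status, loaded_at FROM ods_orders ORDER BY loaded_at, order_id"
).fetchall()

# Once the Data Warehouse has consumed the first batch, it can be dropped.
purge(conn, cutoff=150)
remaining = conn.execute("SELECT COUNT(*) FROM ods_orders").fetchone()[0]
```

The ODS rows carry exactly the source columns; the only addition is `loaded_at`, which exists so that old batches can be purged.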
Traditional BI Architecture – Data Warehouse / Data Marts
Data Warehouse / Data Marts (Enterprise Data Repository Layer) – These databases contain ALL data pertaining to the organization, by subject area, over a longer duration, ideally 7-10 years. They are the source of all the Departmental and Enterprise Semantic Models and hold the data in a dimensional model format, which is ideal for building these Semantic Models. The data in these databases is refreshed at intervals ranging from hours to daily. The data is kept at the lowest possible granularity.
Features Leveraged – Dimensional Modeling, ETL, ColumnStore Indexing, Table Partitioning, Surrogate Keys, Data Cleansing
Tools
Database Engines – MS SQL Server, Oracle, MySQL, DB2
ETL – SQL Integration Services, Informatica, Pentaho, Information Builders, IBM
Data Cleansing – SQL Data Quality Services, Pentaho, Information Builders
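As a rough illustration of dimensional modeling with surrogate keys, the sketch below builds a tiny star schema in an in-memory SQLite database. The tables `dim_customer` and `fact_sales`, the helper `get_customer_key` and the sample rows are hypothetical; a production warehouse would also use the ETL, partitioning and columnstore features listed above, which are not shown.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,   -- surrogate key, assigned by the warehouse
        customer_id   TEXT UNIQUE NOT NULL,  -- natural key from the source system
        customer_name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        sale_date    TEXT NOT NULL,
        amount       REAL NOT NULL           -- one row per sale: lowest granularity
    );
""")


def get_customer_key(dw, customer_id, customer_name):
    """Return the surrogate key for a customer, creating the dimension row if needed."""
    dw.execute(
        "INSERT OR IGNORE INTO dim_customer (customer_id, customer_name) VALUES (?, ?)",
        (customer_id, customer_name),
    )
    return dw.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()[0]


# Hypothetical rows arriving from the ODS.
source_rows = [
    ("C-100", "Acme", "2024-01-05", 120.0),
    ("C-200", "Globex", "2024-01-05", 80.0),
    ("C-100", "Acme", "2024-01-06", 30.0),
]
for customer_id, name, sale_date, amount in source_rows:
    key = get_customer_key(dw, customer_id, name)
    dw.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (key, sale_date, amount))

# A semantic model can aggregate from the detail rows in any way it needs.
totals = dw.execute("""
    SELECT d.customer_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS d ON d.customer_key = f.customer_key
    GROUP BY d.customer_name
    ORDER BY d.customer_name
""").fetchall()
dim_count = dw.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
```

Storing facts at the level of individual sales keeps every later aggregation possible, which is why the lowest granularity is preferred.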
Traditional BI Architecture – Master Data Management
Master Data Management (Enterprise Data Repository Layer) – This database contains the MASTERED lists of values used by the enterprise and provides these values to the Data Warehouse / Data Marts. These values could be customer names, product names, service offerings, or any other list of values that requires a certain level of control and does not change frequently.
Features Leveraged – Database Engine, ETL, MDM
Tools
Database Engines – MS SQL Server, Oracle, MySQL, DB2
ETL – SQL Integration Services, Informatica, Pentaho, Information Builders, IBM
MDM – SQL Master Data Services, Informatica MDM, SAP, Orchestra Networks
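One way to picture a mastered list is as a controlled lookup that maps the spellings found in source systems to a single approved value. The sketch below assumes hypothetical customer names and a hand-maintained alias table; real MDM tools add stewardship workflows, matching rules and versioning that are not modeled here.

```python
from typing import Dict, List, Optional

# Hypothetical mastered list: one approved name per master code.
MASTER_CUSTOMERS: Dict[str, str] = {
    "ACME": "Acme Corporation",
    "GLOBEX": "Globex Inc.",
}

# Hypothetical aliases seen in source systems, keyed by normalized spelling.
ALIASES: Dict[str, str] = {
    "acme corp": "ACME",
    "acme corporation": "ACME",
    "globex": "GLOBEX",
}


def normalize(raw: str) -> str:
    """Lowercase, drop periods and collapse whitespace before the alias lookup."""
    return " ".join(raw.lower().replace(".", "").split())


def master_name(raw: str) -> Optional[str]:
    """Return the mastered name, or None when the value is not yet mastered."""
    code = ALIASES.get(normalize(raw))
    return MASTER_CUSTOMERS[code] if code is not None else None


incoming: List[str] = ["ACME Corp.", "acme   corporation", "Globex", "Initech"]
resolved = [master_name(name) for name in incoming]
unmastered = [name for name in incoming if master_name(name) is None]
```

Values that come back as `None` are the ones a data steward would need to review before they reach the Data Warehouse.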
Traditional BI Architecture – Semantic Models
Operational Near Real Time (Semantic Model) – This database contains aggregations that are computed in near real time and uses the data in the ODS as its source. This allows for immediate dashboard reporting or alerting on events occurring now.
Departmental (Semantic Model) – This database contains information specific to a department, and aggregations are computed at an interval matching that of either the Data Warehouse or the Data Mart. This allows any reporting or dashboard tool to be leveraged.
Enterprise (Semantic Model) – This database contains information related to the enterprise, aggregated at a higher level for the enterprise audience such as Chiefs, VPs and Directors.
Features Leveraged – Semantic Layer Engine, MDX
Tools
Semantic Layer Engines – SQL Analysis Services, OBIEE, SAP
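The difference between a Departmental and an Enterprise model is mostly one of scope and aggregation level over the same detail data. A minimal sketch, assuming hypothetical detail rows of (department, region, amount) and two illustrative functions:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DetailRow = Tuple[str, str, float]  # (department, region, amount)

# Hypothetical detail rows, as they might come from the Data Warehouse.
detail: List[DetailRow] = [
    ("Sales", "East", 100.0),
    ("Sales", "West", 50.0),
    ("Support", "East", 20.0),
    ("Support", "West", 30.0),
]


def departmental_model(rows: List[DetailRow], department: str) -> Dict[str, float]:
    """Totals by region, restricted to a single department."""
    totals: Dict[str, float] = defaultdict(float)
    for dept, region, amount in rows:
        if dept == department:
            totals[region] += amount
    return dict(totals)


def enterprise_model(rows: List[DetailRow]) -> Dict[str, float]:
    """Totals by department: a higher-level view for the enterprise audience."""
    totals: Dict[str, float] = defaultdict(float)
    for dept, _region, amount in rows:
        totals[dept] += amount
    return dict(totals)


sales_view = departmental_model(detail, "Sales")
enterprise_view = enterprise_model(detail)
```

Both views are derived from the same rows, so the enterprise totals always reconcile with the departmental ones.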
Traditional BI Architecture – Presentation Objects
Operational Dashboards (Presentation Layer) – Provide near-real-time information about what is currently happening in the organization, in the form of a dashboard.
Operational Alerts (Presentation Layer) – Provide alerts, for example via Twitter or email, to audiences that need to be notified of specific events detected in the ODS data.
Departmental Paginated Reports (Presentation Layer) – Any tool providing the typical matrix or table format of information for a departmental audience. The key element here is that more than one tool can be leveraged to consume the same information at any time.
Departmental Dashboards (Presentation Layer) – Any tool providing dashboards specific to a department's requirements.
Enterprise Paginated Reports (Presentation Layer) – Any tool providing the typical matrix or table format of information for an enterprise audience. The key element here is that more than one tool can be leveraged to consume the same information at any time.
Enterprise Dashboards (Presentation Layer) – Any tool providing dashboards specific to enterprise requirements.
Challenges Encountered by Architecture Decisions: New Tool Deployment, New Report Development, Calculation Modification
Challenge 1: New Tool Deployment
In the commonly found architecture, each tool (Dashboards / Paginated Reports / Mobile Reports) bundles its own Security, Business Logic and Data Source(s).
Very little to no reuse of effort.
Large effort to migrate from one tool to another.
Limits the ability of other tools to leverage the same information and calculations.
Does not allow for efficient tool replacement or expansion!!!
Challenge 1: New Tool Deployment
Database Source(s) – SQL, Oracle, Teradata, etc.
Business Logic – Semantic Layer
Security – Active Directory
Dashboards – Tool #1
Paginated Reports – Tool #2
Mobile Reports – Tool #3
Component segmentation allows the appropriate business teams to retain responsibility and allows parallel development in any given area.
Challenge 2: New Report Generation
The business determines that it needs new report(s) to help answer immediate questions. The business starts the process of requesting a new report. Requirements are gathered. Report development commences, and with it comes the iterative process of user acceptance. The report is reviewed and new questions arise.
Cycle: Business has new question -> Report Request -> Requirements Gathered -> Report Development process -> Insight identified and new questions arise -> Process repeats
Challenge 2: New Report Generation
Depending on the architecture of the environment, the development of new reports can not only take weeks to complete, but can also consume large amounts of resources.
[Time] – Investigation into the effort required to deliver
[Data Duplication] – Database space consumption
Challenge 3 – Calculation Modification
Modifying calculations can be a very laborious task in the commonly found architecture. Each calculation is embedded in each report, so a single calculation change requires every report that uses it (Report 1, Report 2, and so on) to be modified.
Challenge 3 – Calculation Modification
[Diagram: growing sets of reports containing the calculation, from Report 1 up to Report 5]
Challenge 3 – Calculation Modification
For each affected report: Make Calculation Modification -> User Acceptance Testing -> Deploy to Production
The path from making a calculation modification to the finished product being available to the user community can be time-consuming, especially if the process has to be repeated for each report modified.
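To make the cost of repetition concrete, here is a back-of-the-envelope sketch. The step durations and the number of reports are hypothetical placeholders, not measurements; the only point is that effort scales with the number of places the calculation is embedded.

```python
from typing import Dict

# Hypothetical duration of each step in one change cycle, in working days.
STEP_DAYS: Dict[str, float] = {
    "make_calculation_modification": 1.0,
    "user_acceptance_testing": 3.0,
    "deploy_to_production": 0.5,
}


def total_days(cycles: int) -> float:
    """Effort when the modify / test / deploy cycle is run `cycles` times."""
    return cycles * sum(STEP_DAYS.values())


reports_using_calculation = 5  # hypothetical count of reports embedding the logic

# Calculation embedded in every report: one full cycle per report.
embedded_effort = total_days(reports_using_calculation)

# Calculation defined once in a shared semantic layer: one cycle in total.
centralized_effort = total_days(1)
```

With these assumed numbers the embedded approach costs 22.5 days against 4.5 days for a single shared definition, and the gap widens with every additional report.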