Driving Data Quality Initiatives with Agile Analytics Ken Raetz Principal Think Data Insights, LLC
Agenda Our time today… Agile Development vs. Agile Analytics Data Quality Concepts Data Wrangling Stages of Analytics Agile Data Quality Steps during Analytics work
Think Data Insights: A little about us…
Enterprise Business Intelligence Experts; 30+ years of business, technology, consulting experience; Based in Nashville, TN; Apply Agile Approach to Analytics and BI; Microsoft SQL Server BI Solution Experts; Power BI Solution Experts; Excel Power BI Solution Experts; Power Planner, Power Update Solutions; Healthcare solutions with sister company, Visualize Health, LLC
Agile Methodology: Early and continuous delivery; Change is normal; Frequent delivery; Strong collaboration; Co-location; Simplicity; Sustainable pace; Adapt to changes
Analytics More than predictive analytics DW/BI Source Data Modeling ETL/ELT Reporting Visualizations Data Management/Data Warehousing Data integration Reporting Business Intelligence KPI/Dashboards Predictive Analytics
Agile Analytics: Data complexity and volatility; Business requirements change; Must deliver successfully; Keep it simple; Work closely with business SMEs; Identify/document issues as they come; Don't stop to address every issue
Continuous Delivery: Objectives mobilize teams; Teams evaluate data; Rapid analytics approach; Data produces insights; Insights refine objectives
Business Objective → Organize Team → Evaluate Data → Rapid Data Analytics → Deliver Insights
Derailed by lack of data quality: Wrong objective; Ineffective team selected; Data correctness becomes focus; Technical Specs; More work for IT; Business objectives not met
Business Objective → Wrong Objective; Organize Team → Wrong Team; Evaluate Data → Fix Data; Rapid Data Analytics → No Analytics; Deliver Insights → IT Backlog
Data Quality Control: Precision; Accuracy; Usefulness; Consistency; Missing/Unknown; Completeness
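Some of these dimensions, completeness and missing/unknown in particular, can be measured directly. The following is a minimal sketch, not a method from the talk. It assumes that records arrive as Python dictionaries and that blank strings and placeholder tokens such as "UNKNOWN" or "N/A" count as missing; the token list and the function names are hypothetical.

```python
from typing import Iterable

# Hypothetical placeholder values treated as missing/unknown; adjust per source.
MISSING_TOKENS = {"", "UNKNOWN", "N/A"}


def is_missing(value: object) -> bool:
    """Treat None, blank strings and placeholder tokens as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().upper() in MISSING_TOKENS
    return False


def completeness(rows: Iterable[dict], columns: list[str]) -> dict[str, float]:
    """Share of rows (0.0 to 1.0) that carry a usable value in each column."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    return {
        col: sum(not is_missing(row.get(col)) for row in rows) / len(rows)
        for col in columns
    }


customers = [
    {"id": 1, "state": "TN", "email": "a@example.com"},
    {"id": 2, "state": "", "email": None},
    {"id": 3, "state": "unknown", "email": "c@example.com"},
    {"id": 4, "state": "KY", "email": "d@example.com"},
]
print(completeness(customers, ["id", "state", "email"]))
# {'id': 1.0, 'state': 0.5, 'email': 0.75}
```

Counting placeholder tokens as missing matters because a column full of "UNKNOWN" looks 100% complete to a plain NULL check.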
Data Wrangling: The key to successful analytics projects
Extraction – “Data Gathering”
Analysis – “Data Profiling”
Transformation – Define Data Quality Rules
Loading/Visualizing – Destination Platform Consistency
Top of the Funnel: Data correction closest to the source
Business Process → Source DB → Extraction → Staging/ODS → Transformation → DW/BI → Report
Agile Data Quality Approach: As you go along the path…
Extract → Analyze → Transform → Load/Viz
1 - Identify: Find data issues
2 - Evaluate: Potential solutions
3 - Document: Business/tech DQ specs
4 - Implement: Short-term fix
5 - Collaborate: Define new scope
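Because issues are identified and documented as the work moves along rather than fixed on the spot, it helps to keep a lightweight issue log that travels with the sprint. The sketch below is one hypothetical way to record issues by stage and step and to list the open ones by priority. The class names, the field names, and the 1-is-highest priority scale are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    EXTRACT = "Extract"
    ANALYZE = "Analyze"
    TRANSFORM = "Transform"
    LOAD_VIZ = "Load/Viz"


class Step(Enum):
    IDENTIFY = 1
    EVALUATE = 2
    DOCUMENT = 3
    IMPLEMENT = 4
    COLLABORATE = 5


@dataclass
class DQIssue:
    """One data quality issue, logged when found rather than fixed on the spot."""
    title: str
    stage: Stage
    step: Step = Step.IDENTIFY                  # furthest step reached so far
    priority: int = 3                           # hypothetical scale: 1 = highest
    short_term_fix: str = ""                    # what was done to keep moving
    collaborators: list[str] = field(default_factory=list)
    closed: bool = False


def open_backlog(issues: list[DQIssue]) -> list[DQIssue]:
    """Open issues ordered by priority (ties keep their logged order)."""
    return sorted((i for i in issues if not i.closed), key=lambda i: i.priority)


log = [
    DQIssue("Orders missing for one region", Stage.EXTRACT, Step.DOCUMENT, 1,
            collaborators=["Dev/IT", "Vendor"]),
    DQIssue("Blank product category", Stage.ANALYZE, Step.IMPLEMENT, 2,
            short_term_fix="Default to 'Unassigned'"),
    DQIssue("Duplicate customer IDs", Stage.ANALYZE, priority=1, closed=True),
]
for issue in open_backlog(log):
    print(f"{issue.stage.value:<10} {issue.step.name:<10} {issue.title}")
```

The short_term_fix field records what was implemented to keep moving. The collaborators field records who needs to be involved before the issue can become new scope.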
Agile Data Quality Approach EXTRACT Extract Analyze Transform Load/Viz 1 - Identify 2 - Evaluate 3 - Document 4 - Implement 5 - Collaborate
Stage: Extraction
Data quality at the source
1 - Identify: Correct source? Complete data? Accuracy and timing? Time-dependent?
2 - Evaluate: Profile data; Compare source DB; Daily comparison; Completeness (products, customers, etc.)
3 - Document: Source SME; DB/File details (tables, views); Frequency concerns; Prioritize
4 - Implement: Supplement source data; Fake data; SQL Rules (IF, CASE WHEN, LIKE); Snapshot
5 - Collaborate: IT/Business; Dev teams; Vendor
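The "Compare source DB" and "Daily comparison" checks can start as simply as comparing business keys per day between the source system and the staged extract or snapshot. This is a minimal sketch. It assumes both sides can already be reduced to (load_date, business_key) pairs, for example by querying the source and the staging tables. The function name and the result layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_comparison(source_rows, staged_rows):
    """Compare business keys per day between the source system and the staged extract.

    Each argument is an iterable of (load_date, business_key) tuples.
    Only days where the two sides disagree are returned, as
    {day: {"source": n, "staged": n, "missing": [...], "extra": [...]}}.
    """
    src, stg = defaultdict(set), defaultdict(set)
    for day, key in source_rows:
        src[day].add(key)
    for day, key in staged_rows:
        stg[day].add(key)

    mismatches = {}
    for day in sorted(set(src) | set(stg)):
        s, t = src.get(day, set()), stg.get(day, set())
        missing = sorted(s - t)  # in the source but not extracted
        extra = sorted(t - s)    # extracted but not (or no longer) in the source
        if missing or extra:
            mismatches[day] = {"source": len(s), "staged": len(t),
                               "missing": missing, "extra": extra}
    return mismatches


source = [(date(2018, 12, 3), "C1"), (date(2018, 12, 3), "C2"), (date(2018, 12, 4), "C3")]
staged = [(date(2018, 12, 3), "C1"), (date(2018, 12, 4), "C3"), (date(2018, 12, 4), "C9")]
for day, diff in daily_comparison(source, staged).items():
    print(day.isoformat(), diff)
# 2018-12-03 {'source': 2, 'staged': 1, 'missing': ['C2'], 'extra': []}
# 2018-12-04 {'source': 1, 'staged': 2, 'missing': [], 'extra': ['C9']}
```

Comparing keys instead of row counts alone also catches days where the counts happen to match but the rows differ.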
Agile Data Quality Approach: As you go along the path…
Extract
1 - Identify: Source data completeness
2 - Evaluate: Profiling; DB compare; Time compare
3 - Document: SME; DB/Files; Prioritize
4 - Implement: Supplement data; Other sources
5 - Collaborate: Dev/IT; Vendor
Agile Data Quality Approach Analyze Extract Analyze Transform Load/Viz 1 - Identify 2 - Evaluate 3 - Document 4 - Implement 5 - Collaborate
Stage: Analyze
Data quality during detailed analysis
1 - Identify: Can it be integrated? Missing data? Meaningful and complete? Historical preservation?
2 - Evaluate: Detail profiling; Key analysis; NULL/Missing data; Distinct values; App logic profiling
3 - Document: Source data completeness; Mapping rules; Orphaned data; Lookup rules
4 - Implement: Hard-code logic; Repair data; Build lookup lists; Find other data
5 - Collaborate: IT/Business; Dev teams; Leadership
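Several of the Analyze-stage checks, including NULL/missing data, distinct values, key analysis, and orphaned data, come down to a few counts and set lookups. This is a minimal sketch over in-memory rows. The table and column names (customers, orders, customer_id) and both function names are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Basic profile: row count, NULLs, distinct non-NULL values, duplicated values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }


def orphaned(child_rows, fk, parent_rows, pk):
    """Child rows whose foreign key has no matching parent key (orphaned data)."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row.get(fk) is not None and row[fk] not in parent_keys]


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},     # no such customer
    {"order_id": 12, "customer_id": None},  # missing key
]
print(profile_column(customers, "customer_id"))
# {'rows': 3, 'nulls': 0, 'distinct': 2, 'duplicates': [2]}
print(profile_column(orders, "customer_id"))
# {'rows': 3, 'nulls': 1, 'distinct': 2, 'duplicates': []}
print([o["order_id"] for o in orphaned(orders, "customer_id", customers, "customer_id")])
# [11]
```

Rows with a NULL key are reported by the profile rather than as orphans, so missing keys and unmatched keys can be documented as separate issues.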
Agile Data Quality Approach: As you go along the path…
Extract
1 - Identify: Source data completeness
2 - Evaluate: Profiling; DB compare; Time compare
3 - Document: SME; DB/Files; Prioritize
4 - Implement: Supplement data; Other sources
5 - Collaborate: Dev/IT; Vendor
Analyze
1 - Identify: Completeness; Accuracy; History
2 - Evaluate: Detail profile; NULL/Missing; App logic
3 - Document: Mapping rules; Orphaned data; Lookup rules
4 - Implement: Hard-code logic; Repair data; Build lookup lists
5 - Collaborate: Dev/IT; Leadership
Agile Data Quality Approach Transform Extract Analyze Transform Load/Viz 1 - Identify 2 - Evaluate 3 - Document 4 - Implement 5 - Collaborate
Stage: Transform
Data quality while transforming data
1 - Identify: Missing data rules; Integration rules; Date-related rules (Service/Posting dates); Calcs needed
2 - Evaluate: Test rules against source reporting; Test integration for completeness; Verify calculations
3 - Document: Calculation logic; Quality rules (IF, CASE WHEN); Data cleansing needs; Inconsistent data
4 - Implement: Hard-code logic; Rules-based calcs; Native/Source functions
5 - Collaborate: Project Mgmt; Dev teams; Analysts; Power Users
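Quality rules like these are often written directly as SQL CASE WHEN expressions. The sketch runs one through Python's built-in sqlite3 module so that it is self-contained. Everything in it is a hypothetical example and not a rule from the talk: the claims table and its columns, the 'UNKNOWN' default, the payer-name standardization, and the choice to report on the service date when present and fall back to the posting date otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE claims (
        claim_id     INTEGER PRIMARY KEY,
        payer        TEXT,
        service_date TEXT,   -- ISO 8601 text; may be NULL
        posting_date TEXT,
        amount       REAL
    );
    INSERT INTO claims VALUES
        (1, 'Acme Health', '2018-11-28', '2018-12-03', 120.0),
        (2, NULL,          NULL,         '2018-12-04',  80.0),
        (3, 'acme health', '2018-12-01', '2018-12-05', -15.0);
""")

# Hypothetical quality rules expressed as CASE WHEN / LIKE:
#  - a missing payer becomes 'UNKNOWN'; names matching 'acme%' are standardized
#    (SQLite's LIKE is case-insensitive for ASCII letters by default)
#  - the reporting date prefers the service date, else the posting date
#  - negative amounts are flagged for review rather than silently dropped
rows = conn.execute("""
    SELECT claim_id,
           CASE
               WHEN payer IS NULL OR TRIM(payer) = '' THEN 'UNKNOWN'
               WHEN payer LIKE 'acme%' THEN 'Acme Health'
               ELSE payer
           END AS payer_clean,
           COALESCE(service_date, posting_date) AS report_date,
           CASE WHEN amount < 0 THEN 'REVIEW' ELSE 'OK' END AS amount_flag
    FROM claims
    ORDER BY claim_id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Acme Health', '2018-11-28', 'OK')
# (2, 'UNKNOWN', '2018-12-04', 'OK')
# (3, 'Acme Health', '2018-12-01', 'REVIEW')
```

The same CASE WHEN text can usually be moved into a view or ETL step on the production platform, though LIKE case sensitivity and date handling vary by database.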
Agile Data Quality Approach: As you go along the path…
Extract
1 - Identify: Source data completeness
2 - Evaluate: Profiling; DB compare; Time compare
3 - Document: SME; DB/Files; Prioritize
4 - Implement: Supplement data; Other sources
5 - Collaborate: Dev/IT; Vendor
Analyze
1 - Identify: Completeness; Accuracy; History
2 - Evaluate: Detail profile; NULL/Missing; App logic
3 - Document: Mapping rules; Orphaned data; Lookup rules
4 - Implement: Hard-code logic; Repair data; Build lookup lists
5 - Collaborate: Dev/IT; Leadership
Transform
1 - Identify: Calculations; Integration; Date logic
2 - Evaluate: Rule/Calc testing; Completeness
3 - Document: Calcs; DQ rules; Cleansing needs
4 - Implement: Hard-code logic; Rules-based calcs; Native functions
5 - Collaborate: Proj Mgmt; Dev/IT; Power Users
Agile Data Quality Approach Load Extract Analyze Transform Load/Viz 1 - Identify 2 - Evaluate 3 - Document 4 - Implement 5 - Collaborate
Stage: Load/Visualize
Data quality while loading/using data
1 - Identify: Future data needed; Desired reporting structure; Default values; Granularity
2 - Evaluate: Data model; Test results using defaults/calcs; Compare data at different granularity
3 - Document: Data model; Default rules; Model/Data granularity
4 - Implement: Views over tables; Tables/queries; Load/build routines; Visualizations; Calculations
5 - Collaborate: Project Mgmt; Analysts; Power Users
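One way to compare data at different granularity is to roll the detail up to the grain of an existing summary and then reconcile the totals. This is a minimal sketch. It assumes daily detail rows and a set of monthly control totals taken from the source system's own reporting. The names, the 'Unassigned' default, and the 0.01 tolerance are hypothetical.

```python
from collections import defaultdict

DEFAULT_CATEGORY = "Unassigned"  # hypothetical default-value rule


def apply_defaults(rows):
    """Fill a missing category with the documented default before modeling."""
    return [{**r, "category": r.get("category") or DEFAULT_CATEGORY} for r in rows]


def monthly_totals(daily_rows):
    """Roll daily detail (date as 'YYYY-MM-DD') up to (month, category) grain."""
    totals = defaultdict(float)
    for r in daily_rows:
        totals[(r["date"][:7], r["category"])] += r["amount"]
    return dict(totals)


def reconcile(detail_totals, control_totals, tolerance=0.01):
    """Keys where the rolled-up detail and the control totals disagree."""
    keys = set(detail_totals) | set(control_totals)
    return sorted(
        k for k in keys
        if abs(detail_totals.get(k, 0.0) - control_totals.get(k, 0.0)) > tolerance
    )


daily = apply_defaults([
    {"date": "2018-11-02", "category": "Lab", "amount": 100.0},
    {"date": "2018-11-15", "category": "Lab", "amount": 50.0},
    {"date": "2018-11-20", "category": None, "amount": 25.0},
])
control = {("2018-11", "Lab"): 150.0, ("2018-11", "Imaging"): 25.0}
print(reconcile(monthly_totals(daily), control))
# [('2018-11', 'Imaging'), ('2018-11', 'Unassigned')]
```

In this made-up data, the two mismatches account for the same 25.00. The defaulted row probably belongs to a category that the source reports separately. That kind of finding should go back to analysts and power users before the default rule is treated as final.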
Agile Data Quality Approach: As you go along the path…
Extract
1 - Identify: Source data completeness
2 - Evaluate: Profiling; DB compare; Time compare
3 - Document: SME; DB/Files; Prioritize
4 - Implement: Supplement data; Other sources
5 - Collaborate: Dev/IT; Vendor
Analyze
1 - Identify: Completeness; Accuracy; History
2 - Evaluate: Detail profile; NULL/Missing; App logic
3 - Document: Mapping rules; Orphaned data; Lookup rules
4 - Implement: Hard-code logic; Repair data; Build lookup lists
5 - Collaborate: Dev/IT; Leadership
Transform
1 - Identify: Calculations; Integration; Date logic
2 - Evaluate: Rule/Calc testing; Completeness
3 - Document: Calcs; DQ rules; Cleansing needs
4 - Implement: Hard-code logic; Rules-based calcs; Native functions
5 - Collaborate: Proj Mgmt; Dev/IT; Power Users
Load/Viz
1 - Identify: Future data needs; Default values; Granularity
2 - Evaluate: Data model; Granularity; Rules
3 - Document: Data model; Rules
4 - Implement: Tables; Visualizations; Calcs
5 - Collaborate: Proj Mgmt; Power Users
Server & Tools Business 12/6/2018 Agile Data Quality Approach A word on Collaboration Right people/teams Clear roadblocks/build new paths Not ONE-TIME Continuous © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Data Analytics – A new role… Analytics Prototype Solutions Specialist (APSS)
Speaks business and IT; Rapid solutions; Designs early prototypes; Defines business metrics; Creates vision/Defines ROI; Channels data quality issues
Aka… Chief Data Wrangler
Q&A
Demos are available at //BI