Data Quality in the BI Life Cycle Robert Blaas
Thanks to our sponsors
About Me rwb1912@gmail.com @rwb1912 nz.linkedin.com/in/rblaas www.thewanderer.nz
Agenda Effects of Bad Data Defining Data Quality & Data Quality Management Governance and Process How Can DQS Assist
Support Decision Making Data & Information support decision making. Information economy: information = context around data, shaped into dashboards and reports to give a competitive edge. A faster and greater appetite for information means corners get cut, and data quality takes a back seat.
Confident Decision Making? Ask audience: How confident are you in your decision-making process? Let me tell you a story (Manufacturing & Insurance)
Some Statistics
Effects of Bad Data Defining Data Quality & Data Quality Management Governance and Process How Can DQS Assist
Data Quality Is a measure, or set of measures, that gives an organization an indication of the level of confidence it can have in the data used in its operational and strategic decision-making processes.
Data Quality Management Is the set of processes by which we manipulate the organization's data to increase its quality. Regulatory: banking and privacy rules require personal details to be correct. Inaccurate data can lead to business failures, e.g. a wrong customer address leads to a wrong shipment. Data quality = OLTP delivery (providing users with consistent and correct data for day-to-day business processing, e.g. being able to process orders, service, delivery). Data quality = decision support (correct, reliable and consistent information because the underlying data is known to be correct and consistent). Data quality should form an integral part of data governance.
Why Check For Data Quality Impacts Decision Making Process Impacts Profitability Impacts Brand Regulatory Requirements
Common Causes of DQ Issues Data Merging, Broken Rules, Data Entry, Inconsistent Sources, Data Transmission, Timeliness. Data Merging -> merged data is at risk of being incorrect. Data Transmission -> corruption during file transport, transformation during transmission. Timeliness -> data must arrive on time. Data Entry -> especially in legacy environments with no controls or relational DB. Inconsistent Sources -> MDS aims for one version of the truth (single source); sadly this is a utopia. Broken Rules (incorrect business rules) -> we often forget the need to validate the rule being implemented in a calculation or transformation.
What to Check For Consistency, Accuracy, Completeness, Validity, Conformity, Duplicates. Completeness = is all the relevant data available? Consistency = is the data coded consistently throughout (always male/female, not a mix of m/f, 0/1, true/false)? Validity = does the data fall within accepted domains? Accuracy = how accurate is the data; if we are measuring temperature, to what level is it measured, and with what variance or margin of error? Conformity = does it conform to the accepted business rule? Duplicates = is the value duplicated, and if so, which record represents the true value (e.g. 2 customer records for the same person but with different addresses)?
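The checks listed on this slide can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not part of DQS: the sample records, the field names, the accepted gender domain and the temperature range are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "gender": "F", "temp_c": 21.5},
    {"id": 2, "name": "Bob Tan", "gender": "male", "temp_c": 19.0},
    {"id": 3, "name": "Ann Lee", "gender": "F", "temp_c": None},
    {"id": 4, "name": "Cy Ng", "gender": "M", "temp_c": 250.0},
]

GENDER_DOMAIN = {"M", "F"}   # assumed accepted domain (consistency/validity)
TEMP_RANGE = (-50.0, 60.0)   # assumed plausible range in Celsius


def completeness(rows, field):
    """Fraction of rows where the field is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows) if rows else 1.0


def invalid_domain(rows, field, domain):
    """Ids of rows whose value falls outside the accepted domain."""
    return [r["id"] for r in rows if r.get(field) not in domain]


def out_of_range(rows, field, low, high):
    """Ids of rows whose value lies outside [low, high]; missing values are skipped."""
    return [r["id"] for r in rows
            if r.get(field) is not None and not (low <= r[field] <= high)]


def duplicates(rows, field):
    """Values of the field that appear more than once."""
    counts = Counter(r[field] for r in rows if r.get(field) is not None)
    return sorted(v for v, n in counts.items() if n > 1)


print(completeness(records, "temp_c"))
print(invalid_domain(records, "gender", GENDER_DOMAIN))
print(out_of_range(records, "temp_c", *TEMP_RANGE))
print(duplicates(records, "name"))
```

Each function returns either a ratio or the offending ids, so the results can feed a dashboard or be handed to a data steward for follow-up.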
Effects of Bad Data Defining Data Quality & Data Quality Management Governance & Process How Can DQS Assist
People Technology Process
Data Quality through Governance Taking Ownership, Cultural Shift, Collaborative Effort, Integral Part of Process. Taking ownership is crucial. Everyone owns the data/information, so everyone is responsible for its quality. Executive buy-in! Business buy-in. Cultural shift. Must form an integral part of the governance process.
Data Quality Assurance Process Monitor, Assess, Action, Communicate. ASSESS: tools and processes to assess issues. COMMUNICATE: corporate platform to communicate; open, collaborative tools; DATA STEWARDS (go-to people). ACTION: tools. MONITOR: process, continuous and ongoing. LIFE CYCLE, NOT A ONE-OFF.
DQ in IT Project Life Cycle Analysis & Design Profiling Business Rules Establish Domains Test Cases Initiation Profiling Production DQS Projects Monitoring Action Development & UAT Domains DQS Projects Dashboards/Reports
Effects of Bad Data Defining Data Quality & Data Quality Management Governance & Process How Can DQS Assist
What Does DQS Provide Knowledge Base Through Discovery, Reusable Knowledge Base, Semantic Layering, Monitoring, Business Interface. Reusable KB: build a data domain that can be reused across the organisation. Semantic: data is mapped to domains, which are given semantic meaning to be tested against. Discovery: the KB can be built and expanded through data discovery. Extension: expand the KB by linking through to 3rd party KB providers, such as marketing companies for addresses. Can work in conjunction with MDS.
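The idea of a reusable domain that is built up through discovery can be illustrated with a small conceptual sketch. This is not the DQS API (DQS is driven through its client application and the SSIS component); the `Domain` class, its method names and the status labels below are hypothetical and exist only to show the concept.

```python
class Domain:
    """Hypothetical, simplified stand-in for a knowledge base domain."""

    def __init__(self, name, valid_values, synonyms=None):
        self.name = name
        self.valid = set(valid_values)
        # Maps a known synonym to its leading (canonical) value.
        self.synonyms = dict(synonyms or {})

    def cleanse(self, value):
        """Return (output value, status) for one input value."""
        v = value.strip() if isinstance(value, str) else value
        if v in self.valid:
            return v, "correct"
        if v in self.synonyms:
            return self.synonyms[v], "corrected"
        return v, "new"  # unknown value: a candidate for steward review

    def discover(self, values):
        """Collect unknown values so a data steward can review them."""
        results = (self.cleanse(v) for v in values)
        return sorted({out for out, status in results if status == "new"})


# Assumed example domain: gender codes with a few known synonyms.
gender = Domain("Gender", {"M", "F"},
                {"male": "M", "female": "F", "m": "M", "f": "F"})

print(gender.cleanse(" male "))
print(gender.discover(["M", "female", "X", "unknown", "X"]))
```

Because the domain holds the rules rather than the data, the same `gender` object could be applied to any source that carries a gender column, which is the reuse the slide describes.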
Functions of DQS Build Knowledge Base: Data Assessment, KB Matching. Use Knowledge Base: Match & Consolidate, Data Cleansing.
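Match & Consolidate can be pictured as grouping similar records and then picking one survivor per group. The sketch below is an assumption-laden illustration using Python's standard `difflib`, not the matching algorithm DQS uses: the 0.85 threshold, the greedy clustering, the "most recently updated wins" survivorship rule and the sample customers are all invented for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_clusters(rows, field, threshold=0.85):
    """Greedy clustering: a row joins the first cluster whose first row is similar enough."""
    clusters = []
    for row in rows:
        for cluster in clusters:
            if similarity(row[field], cluster[0][field]) >= threshold:
                cluster.append(row)
                break
        else:
            clusters.append([row])  # no similar cluster found: start a new one
    return clusters


def consolidate(cluster):
    """Assumed survivorship rule: keep the most recently updated row."""
    return max(cluster, key=lambda r: r["updated"])  # ISO dates sort as strings


# Hypothetical customer records, including one likely duplicate.
customers = [
    {"name": "Jonathan Smith", "address": "1 High St", "updated": "2016-03-01"},
    {"name": "Jonathon Smith", "address": "9 Low Rd", "updated": "2017-05-20"},
    {"name": "Mary Jones", "address": "4 Park Ave", "updated": "2017-01-11"},
]

clusters = match_clusters(customers, "name")
survivors = [consolidate(c) for c in clusters]
for s in survivors:
    print(s["name"], "->", s["address"])
```

In practice the survivorship rule is a business decision; a steward might prefer the most complete record or a trusted source system over the latest update.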
Architecture [diagram] DQ Clients: DQS UI (Knowledge Discovery and Management, Interactive DQ Projects, Data Exploration), SSIS DQ Component. DQ Server: DQ Engine (Cleansing, Knowledge Discovery, Data Profiling & Exploration, Reference Data, Matching), RD Services API (Browse, Set, Validate…), Reference Data API (Browse, Get, Update…). Stores: DQ Projects Store (DQ Active Projects), Common Knowledge Store, Knowledge Base Store (Published KBs, MS Data Domains, Local Data Domains). External: MS DQ Domains Store, Azure Market Place Categorized Reference Data Services, 3rd Party Reference Data Services (Reference Data Sets).
Time for a Demo
How we can assist Extensive DWH / BI Implementations Information Modelling SSIS Developments MDS deployments DQS deployments
Thank you for attending Singapore SQLSaturday#646!