Copyright 2009, Information Builders. Slide 1 iWay Enterprise Information Management (EIM) Data Quality and Master Data Management Kam Wong Solutions Architect Information Builders D.C. User Forum December 8, 2009
Slide 2 Data Quality and Master Data Management: Agenda
  Business Drivers Behind Data Management
  Usage: Where To Use Data Management
  Impact Of Data Quality
  What Is Data Management?
    Data Profiling
    Data Cleansing
    Data Enrichment
    Match & Merge (De-duplication)
    Master Data Management
  Examples and Demonstration
Slide 3 Business Drivers
Slide 4 Business Drivers
  Customer Service
  Marketing Campaigns
  Process Improvement
  Regulatory Compliance
  Fraud Detection
Slide 5 Data Drivers
  Accuracy: correct information
  Completeness: thorough information
  Consistency: uniform information
  Validity: valid information
Slide 6 Business Intelligence Drivers
Data quality is the cornerstone of effective business intelligence, and of operations for that matter. Companies have spent a significant share of their IT budgets integrating disparate applications and building data warehouses in order to get better business intelligence. However, many overlook the fact that, at the end of the day, it is the underlying data that matters. All of the pretty screens and reports in the world make no difference if the data in the system is full of errors, inconsistent, and redundant. To achieve successful business intelligence, companies need to tackle the data quality problem first.
Slide 7 Usage
Slide 8 EIM Usages
  Analytic EIM (Batch or Real-time): focuses on improving the data quality and accuracy of BI reports.
  Operational EIM (Real-time): the goal is to synchronize operational systems' data with the golden record so that quality and consistency hold across enterprise processes.
Slide 9 EIM Dimensions DW/DM System
Slide 10 EIM and WebFOCUS Solutions (diagram labels)
  Processes; Transactions; Documents; Supplier, Partners; Customer, Exchange; Data Warehouse, Data Mart, ODS; Applications; Portals; Enterprise Search; BI and Real-Time Dashboards
  Universal Adapter Suite; Core Integration Services; Core Reporting Services; Reporting; Application; Data Management
  Mainframe Data, Applications and Transactions; Applications, CRM, ERP, etc.; Databases, Data Warehouse, Data Marts; Documents, Files, Content Management; Messages, Transactions, s SWIFT, HIPAA, EDI Formats
Slide 11 Impact
Slide 12 Impact of Data Quality: Address Data
  36% Naturally Correct
  64% Manual Attention
Slide 13 Impact of Data Quality: Address Data
  36% Naturally Correct + 61% Automated Cleansing
  3% Manual Attention
Slide 14 What Is Data Management? Data Quality and Master Data Management
Slide 15 Data Profiling
  Basic Analysis: minimums, maximums, averages, counts, etc.
  Patterns
  Extremes
  Quantities
  Frequency Analysis
  Foreign Key Analysis
  Masking
  Drilldown
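As a rough illustration of the profiling steps listed on this slide (basic counts, frequency analysis, and masking values into patterns), here is a minimal Python sketch. It is not how the iWay product works internally; the `profile` and `mask` helpers and the sample column are assumptions made for the example.

```python
from collections import Counter
from statistics import mean
import re


def mask(value: str) -> str:
    """Replace letters with 'A' and digits with '9' to expose a value's pattern."""
    masked = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", masked)


def profile(values: list) -> dict:
    """Compute basic profiling statistics for one column of string values."""
    present = [v for v in values if v]  # ignore empty entries
    lengths = [len(v) for v in present]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "avg_length": mean(lengths) if lengths else 0,
        # Frequency analysis: how often each value occurs.
        "frequencies": Counter(present).most_common(),
        # Pattern analysis: how often each masked shape occurs.
        "patterns": Counter(mask(v) for v in present).most_common(),
    }


# Hypothetical postal-code column (the empty and unspaced entries are invented).
postal_codes = ["V3R 2A9", "M4X 1V5", "L3T 7M8", "L3T 7M8", "", "L3T7M8"]
report = profile(postal_codes)
print(report["patterns"])
```

The pattern counts make outliers easy to spot: one value in this column does not follow the dominant "A9A 9A9" shape.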
Slide 16 Data Cleansing
  Parsing
    data parsed into components (pattern based)
  Standardization
    transformation into standard format (Jim Smith -> James Smith)
    standard and nonstandard abbreviations (Str. -> Street)
    language-specific replacements
  Data quality improvement
    validation against rules
    validation against reference tables
  Large number of domain-oriented algorithms, for example:
    Address
    Party
    Vehicle
    Name
    Identification number
    Credit card number
    Bank account number
  Extension by custom validation steps using complex functions and rules, including:
    Levenshtein distance
    SoundEx
    internal (Java-based) functions
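The standardization and rule-validation steps on this slide can be sketched with small lookup tables and one pattern rule. This is only an illustration: the `NICKNAMES` and `ABBREVIATIONS` tables and the function names are hypothetical, and the postal-code rule is a simplified shape check rather than a full Canadian postal-code validation.

```python
import re

# Hypothetical reference tables; real ones would be far larger.
NICKNAMES = {"jim": "James", "bill": "William", "bob": "Robert"}
ABBREVIATIONS = {"str.": "Street", "str": "Street", "ave.": "Avenue", "ave": "Avenue"}

# Simplified rule: letter-digit-letter, optional space, digit-letter-digit.
POSTAL_CODE = re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d")


def standardize_name(name: str) -> str:
    """Replace known nicknames with their formal form, token by token."""
    return " ".join(NICKNAMES.get(t.lower(), t) for t in name.split())


def standardize_address(address: str) -> str:
    """Expand known street-type abbreviations, token by token."""
    return " ".join(ABBREVIATIONS.get(t.lower(), t) for t in address.split())


def is_valid_postal_code(code: str) -> bool:
    """Validate against the pattern rule (shape only, not existence)."""
    return POSTAL_CODE.fullmatch(code.strip().upper()) is not None


print(standardize_name("Jim Smith"))
print(standardize_address("25 Linden Str."))
print(is_valid_postal_code("M4X 1V5"))
```

Checking that a postal code actually exists would be a validation against a reference table rather than against a rule.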
Slide 17 Data Enrichment
  External company register
    standard company name
    registration ID
    official address
    national bank account classification
  Geocodes
    adding geocodes for identified addresses
    allows showing map locations
    used for geomarketing or insurance risks
  External address register
    adding missing zip codes, street names, city, etc.
    validating existence against a register of addresses
  List of names, surnames, academic and social titles
    validating existence
    standardization (PHD -> Ph.D.)
    adding missing components
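Enrichment against an external register amounts to a lookup that both validates a record and fills in missing components. The sketch below assumes a tiny in-memory address register and title list; `ADDRESS_REGISTER`, `TITLES`, `Address`, and the two functions are hypothetical names, not part of any product API.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical address register keyed by (street, city); values are postal codes.
ADDRESS_REGISTER = {
    ("8500 Leslie Street", "Markham"): "L3T 7M8",
    ("25 Linden Street", "Toronto"): "M4X 1V5",
}

# Hypothetical list of titles in their standard spelling.
TITLES = {"phd": "Ph.D.", "ph.d.": "Ph.D.", "dr": "Dr.", "dr.": "Dr."}


@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: Optional[str] = None


def enrich(address: Address) -> Tuple[Address, bool]:
    """Return the (possibly completed) address and whether the register knows it."""
    known = ADDRESS_REGISTER.get((address.street, address.city))
    if known is None:
        return address, False  # not in the register: nothing to add
    if address.postal_code is None:
        return replace(address, postal_code=known), True  # fill the missing code
    return address, True


def standardize_title(title: str) -> str:
    """Map a title to its standard spelling, leaving unknown titles unchanged."""
    return TITLES.get(title.strip().lower(), title)


print(enrich(Address("8500 Leslie Street", "Markham")))
print(standardize_title("PHD"))
```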
Slide 18 Match & Merge
  Unification
    identification of the set of records connected to one person, address, vehicle, contact, etc.
  Deduplication
    golden record creation (the best representation of the identified subject)
  Identification
    new data entries: identify the subject (person, address, etc.) to which the new record is connected (matched)
  Complex business rules using sophisticated algorithms and functions, including:
    Levenshtein distance
    Hamming distance
    Edit distance
    Data quality score values
    Date stamps of last modification
    Source system originating the data
    etc.
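To make the distance functions named on this slide concrete, here is a small Python sketch of Levenshtein and Hamming distance with a threshold-based match test. The `is_match` helper and its default threshold of 2 are assumptions for illustration; real matching rules would combine several attributes and scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def is_match(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical rule: treat values within max_distance edits as the same."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


print(levenshtein("Smiht", "Smith"))
print(is_match("Leslei", "Leslie"))
print(is_match("Smith", "Watson"))
```

A transposed pair of letters costs two edits under plain Levenshtein distance, which is why a threshold of 2 catches the "Smiht" and "Leslei" typos used in the example data.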
Slide 19 Master Data Management (MDM) Defined
  MDM for customer data systems are software products that:
    Support the global identification, linking and synchronization of customer information across heterogeneous data sources
    Create and manage a central, database-based system of record
    Enable the delivery of a single view for all stakeholders
  MDM architectural styles vary in:
    Instantiation of the customer master data: varying from the maintenance of a physical customer profile to a more virtual, metadata-based indexing structure
    The latency of customer master data maintenance: varying from real-time, synchronous reading and writing of the master data in a transactional context to batch, asynchronous harmonization of the master data across systems
  An MDM program potentially encompasses the management of customer, product, asset, person or party, supplier and financial masters.
Slide 20 MDM Architectures
  Consolidated
    Master is Single Version of Truth
    Data Quality at Master
    Updates occur at Sources
    Updates propagated to Master
  Registry
    Multiple Versions of Truth
    Data Quality is Ongoing
    Updates occur at Sources
    Keys and Metadata in Registry
    Updates propagated to other Sources (Optional)
  Coexistence
    Master is Single Version of Truth
    Data Quality is Ongoing
    Updates occur at Sources or Master
    Updates propagated to other Sources
  Centralized
    Master is Single Version of Truth
    Data Quality at Master
    Updates occur at Master
    Updates propagated to Sources
Slide 21 Examples And Demonstration
Slide 22 Data Quality Examples
Slide 23 Original data (before cleansing)
Source data
Name | G | SIN | Birth Date | Address
Dr. John Smith | F |  | /16/ | Ave Surrey V3R 2A9
Smith W. John | M |  |  | Surrey Ave
John William Smith |  | SIN |  | Linden Str Toronto M4X 1V5
Dr. J.W. Smith | M |  | /16/78 |
John Smith |  |  |  | Leslie L3T 7M8 Toronto
Smith John |  |  |  | Leslie street Marham
John Smiht |  |  |  |
Jane Watson |  |  |  | Leslie str. Toronto L3T 7M8
Watson Jane | F |  |  | Leslei street Toronto L3T 7M8
Jane Smith | F | SIN |  |
J. Smith |  |  |  |
Slide 24 Titles Parsing
Name | G | SIN | Birth Date | Titles | Clearing Codes
Dr. John Smith | F |  | /16/1978 | Dr. | Academic_Title
Smith W. John | M |  |  |  |
John William Smith |  | SIN |  |  |
Dr. J.W. Smith | M |  | /16/78 | Dr. | Academic_Title
John Smith |  |  |  |  |
Smith John |  |  |  |  |
John Smiht |  |  |  |  |
Jane Watson |  |  |  |  |
Watson Jane | F |  |  |  |
Jane Smith | F | SIN |  |  |
J. Smith |  |  |  |  |
Slide 25 Name Parsing
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | F |  | /16/1978 | Academic_Title
John | W. | Smith | M |  |  |
John | William | Smith |  | SIN |  |
J. | W. | Smith | M |  | /16/78 | Academic_Title
John |  | Smith |  |  |  |
John |  | Smith |  |  |  |
John |  | Smiht |  |  |  | Last_name_not_found
Jane |  | Watson |  |  |  |
Jane |  | Watson | F |  |  |
Jane |  | Smith | F | SIN |  |
J. |  | Smith |  |  |  |
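The title and name parsing shown on the last two slides can be approximated with a few token rules: strip a leading title, split fused initials, reorder "Last First" input, and flag surnames missing from a reference list. The sketch below is an assumption-laden illustration; `TITLES`, `LAST_NAMES`, and `parse_name` are hypothetical, and only the clearing codes `Academic_Title` and `Last_name_not_found` come from the slides.

```python
import re

TITLES = {"dr", "dr.", "prof", "prof."}   # hypothetical title list
LAST_NAMES = {"smith", "watson"}          # hypothetical surname register
FUSED_INITIALS = re.compile(r"(?:[A-Za-z]\.){2,}")  # e.g. "J.W."


def parse_name(raw: str) -> dict:
    """Split a raw name into title, first, middle, last, and clearing codes."""
    codes = []
    tokens = []
    for token in raw.split():
        if FUSED_INITIALS.fullmatch(token):
            tokens.extend(re.findall(r"[A-Za-z]\.", token))  # "J.W." -> "J.", "W."
        else:
            tokens.append(token)

    title = ""
    if tokens and tokens[0].lower() in TITLES:
        title = tokens.pop(0)
        codes.append("Academic_Title")

    # "Smith W. John" style input: a known surname comes first, so rotate it to the end.
    if (len(tokens) > 1 and tokens[0].lower() in LAST_NAMES
            and tokens[-1].lower() not in LAST_NAMES):
        tokens = [tokens[-1], *tokens[1:-1], tokens[0]]

    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    if last.lower() not in LAST_NAMES:
        codes.append("Last_name_not_found")
    return {"title": title, "first": first, "middle": middle, "last": last, "codes": codes}


for raw in ["Dr. John Smith", "Smith W. John", "Dr. J.W. Smith", "John Smiht"]:
    print(parse_name(raw))
```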
Slide 26 Update gender (based on first name)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | /16/ | ...le, Gender_changed
John | W. | Smith | M |  |  |
John | William | Smith | M | SIN |  | Gender_updated
J. | W. | Smith | M |  | /16/78 | Academic_Title
John |  | Smith | M |  |  | Gender_updated
John |  | Smith | M |  |  | Gender_updated
John |  | Smiht | M |  |  | Last_name_not_found
Jane |  | Watson | F |  |  | Gender_updated
Jane |  | Watson | F |  |  |
Jane |  | Smith | F | SIN |  |
J. |  | Smith |  |  |  |
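One plausible reading of this step is a lookup from first name to gender that fills blanks and overrides contradictions, emitting the clearing codes seen in the table. The sketch below assumes a tiny `FIRST_NAME_GENDER` table and an `update_gender` helper, both hypothetical; initials such as "J." are left untouched because they carry no gender information.

```python
from typing import Optional, Tuple

# Hypothetical reference table mapping first names to gender.
FIRST_NAME_GENDER = {"john": "M", "william": "M", "jane": "F"}


def update_gender(first: str, gender: str) -> Tuple[str, Optional[str]]:
    """Return the resulting gender and a clearing code (None when nothing changed)."""
    expected = FIRST_NAME_GENDER.get(first.strip().lower())
    if expected is None:
        return gender, None                # unknown name or bare initial
    if not gender:
        return expected, "Gender_updated"  # blank value filled in
    if gender != expected:
        return expected, "Gender_changed"  # contradicting value replaced
    return gender, None


print(update_gender("John", "F"))
print(update_gender("Jane", ""))
print(update_gender("J.", ""))
```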
Slide 27 Validate Social Security Number
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | /16/ | ...nged, SIN_blacklist
John | W. | Smith | M |  |  | SIN_removed_dashes
John | William | Smith | M | SIN |  | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | /16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M |  |  | Gender_updated
John |  | Smith | M |  |  | ...updated, SIN_missing
John |  | Smiht | M |  |  | Last_name_not_found
Jane |  | Watson | F |  |  | Gender_updated
Jane |  | Watson | F |  |  | SIN_removed_dashes
Jane |  | Smith | F | SIN |  | SIN_extra_characters
J. |  | Smith |  |  |  | SIN_removed_dashes
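The clearing codes in this table suggest a pipeline of normalization followed by validation. The slides do not say which rules are applied, so the sketch below makes assumptions: it treats the value as a nine-digit Canadian Social Insurance Number, uses the Luhn checksum that such numbers carry, and checks a small hypothetical `BLACKLIST` of well-formed but implausible values. `clean_sin` and `luhn_valid` are illustrative names.

```python
import re
from typing import List, Tuple

# Hypothetical blacklist of dummy values that may pass format checks.
BLACKLIST = {"000000000", "111111111", "123456789"}


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        d = int(ch)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def clean_sin(raw: str) -> Tuple[str, List[str]]:
    """Return the normalized digits and the clearing codes raised on the way."""
    if not raw or not raw.strip():
        return "", ["SIN_missing"]
    codes = []
    value = raw.strip()
    if "-" in value:
        value = value.replace("-", "")
        codes.append("SIN_removed_dashes")
    digits = re.sub(r"\D", "", value)
    if digits != value:
        codes.append("SIN_extra_characters")  # e.g. a literal "SIN " prefix
    if digits in BLACKLIST:
        codes.append("SIN_blacklist")
    elif len(digits) != 9 or not luhn_valid(digits):
        codes.append("SIN_invalid")
    return digits, codes


# 046 454 286 is a commonly cited sample number that passes the Luhn check.
print(clean_sin("046-454-286"))
print(clean_sin("SIN 046454286"))
print(clean_sin("000-000-000"))
```

The blacklist matters because a value such as all zeros passes the checksum yet is clearly not a real identifier.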
Slide 28 Validate Social Security Number (after)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | 12/16/ | ...nged, SIN_blacklist
John | W. | Smith | M |  |  | SIN_removed_dashes
John | William | Smith | M |  |  | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | 11/16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M |  |  | Gender_updated
John |  | Smith | M |  |  | ...updated, SIN_missing
John |  | Smiht | M |  |  | Last_name_not_found
Jane |  | Watson | F |  |  | Gender_updated
Jane |  | Watson | F |  |  | SIN_removed_dashes
Jane |  | Smith | F |  |  | SIN_extra_characters
J. |  | Smith |  |  |  | SIN_removed_dashes
Slide 29 Validate Birth Date
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | 12/16/ | ...nged, SIN_blacklist
John | W. | Smith | M |  |  | SIN_removed_dashes
John | William | Smith | M |  |  | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | 11/16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M |  |  | Gender_updated
John |  | Smith | M |  |  | ...updated, SIN_missing
John |  | Smiht | M |  |  | Last_name_not_found
Jane |  | Watson | F |  |  | ..._updated, BD_invalid
Jane |  | Watson | F |  |  | SIN_removed_dashes
Jane |  | Smith | F |  |  | SIN_extra_characters
J. |  | Smith |  |  |  | SIN_removed_dashes
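Birth-date validation typically means parsing the value under the accepted formats, rejecting impossible or future dates, and writing the result back in one standard form. The sketch below assumes month/day/year input with either a four-digit or two-digit year and normalizes to ISO format; both choices, and the `clean_birth_date` helper itself, are assumptions rather than documented product behavior. Only the `BD_invalid` code comes from the slide.

```python
from datetime import date, datetime
from typing import List, Optional, Tuple


def clean_birth_date(raw: str, today: Optional[date] = None) -> Tuple[str, List[str]]:
    """Return the birth date in ISO format ('' if unusable) plus clearing codes."""
    today = today or date.today()
    if not raw or not raw.strip():
        return "", []  # nothing to validate
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            parsed = datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # try the next accepted format
        if fmt == "%m/%d/%y" and parsed > today:
            # A two-digit year that lands in the future belongs to the previous century.
            try:
                parsed = parsed.replace(year=parsed.year - 100)
            except ValueError:  # 29 February with no counterpart 100 years earlier
                return "", ["BD_invalid"]
        if parsed > today:
            return "", ["BD_invalid"]  # nobody is born in the future
        return parsed.isoformat(), []
    return "", ["BD_invalid"]  # matched no accepted format or not a real date


reference_day = date(2009, 12, 8)
print(clean_birth_date("12/16/1978", reference_day))
print(clean_birth_date("11/16/78", reference_day))
print(clean_birth_date("02/30/1978", reference_day))
```

Passing the reference day explicitly keeps the function deterministic, which is convenient when the same rule has to be replayed over historical batches.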
Slide 30 Validate Birth Date (after)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  |  | ...nged, SIN_blacklist
John | W. | Smith | M |  |  | SIN_removed_dashes
John | William | Smith | M |  |  | ...ated, SIN_extra_chars
J. | W. | Smith | M |  |  | ...mic_Title, SIN_invalid
John |  | Smith | M |  |  | Gender_updated
John |  | Smith | M |  |  | ...updated, SIN_missing
John |  | Smiht | M |  |  | Last_name_not_found
Jane |  | Watson | F |  |  | ..._updated, BD_invalid
Jane |  | Watson | F |  |  | SIN_removed_dashes
Jane |  | Smith | F |  |  | SIN_extra_characters
J. |  | Smith |  |  |  | SIN_removed_dashes
Slide 31 Prepared data (after cleansing)
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  |  |
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | M |  |  |
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F |  |  |
J. | Smith |  |  |  |
Slide 32 Master Data Management Examples
Slide 33 Prepared data (after cleansing)
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  |  |
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht |  |  |  |
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F |  |  |
J. | Smith |  |  |  |
Slide 34 Match
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  |  |
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht |  |  |  |
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F |  |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F |  |  |
J. | Smith |  |  |  |
Slide 35 Merge
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | V3R 2A9;BC;Surrey; Avenue
John | Smith | M |  |  | M4X 1V5;ON;Toronto;25 Linden Street
Golden record
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  |  | M4X 1V5;ON;Toronto;25 Linden Street
The newest permanent address: M4X 1V5;ON;Toronto;25 Linden Street
The most frequent address: V3R 2A9;BC;Surrey; Avenue
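The merge step picks one surviving value per attribute, and the slide contrasts two candidate rules for the address: the newest one and the most frequent one. The sketch below shows both as interchangeable functions. Everything in it is hypothetical, including the `Record` type, the modification dates, and the Surrey street line (the house number on the slide did not survive, so a placeholder is used).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    gender: str
    address: str
    modified: date  # hypothetical last-modification stamp


def most_frequent(values: Iterable[str]) -> str:
    """Most common non-empty value, or '' when there is none."""
    counts = Counter(v for v in values if v)
    return counts.most_common(1)[0][0] if counts else ""


def most_frequent_address(records: List[Record]) -> str:
    return most_frequent(r.address for r in records)


def newest_address(records: List[Record]) -> str:
    with_address = [r for r in records if r.address]
    return max(with_address, key=lambda r: r.modified).address if with_address else ""


def golden_record(records: List[Record],
                  address_rule: Callable[[List[Record]], str]) -> dict:
    """Build the surviving record; the address rule is pluggable."""
    return {
        "first": most_frequent(r.first for r in records),
        "last": most_frequent(r.last for r in records),
        "gender": most_frequent(r.gender for r in records),
        "address": address_rule(records),
    }


# Placeholder data: dates and the Surrey street line are invented.
SURREY = "V3R 2A9;BC;Surrey;1 Example Avenue"
TORONTO = "M4X 1V5;ON;Toronto;25 Linden Street"
cluster = [
    Record("John", "Smith", "M", SURREY, date(2005, 3, 1)),
    Record("John", "Smith", "M", SURREY, date(2006, 7, 15)),
    Record("John", "Smith", "M", TORONTO, date(2009, 1, 20)),
]
print(golden_record(cluster, newest_address))
print(golden_record(cluster, most_frequent_address))
```

Keeping the address rule as a parameter mirrors the point of the slide: the same matched cluster yields different golden records depending on which survivorship rule the business chooses.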
Slide 36 Demonstration
Slide 37 Thank You