Published by Blaise Boyd. Modified over 9 years ago.
1
Copyright 2009, Information Builders. Slide 1
iWay Enterprise Information Management (EIM)
Data Quality and Master Data Management
Kam Wong
Solutions Architect
Information Builders
D.C. User Forum
December 8, 2009
2
Data Quality and Master Data Management
Agenda
  Business Drivers Behind Data Management
  Usage – Where To Use Data Management
  Impact Of Data Quality
  What Is Data Management?
  Data Profiling
  Data Cleansing
  Data Enrichment
  Match & Merge (De-duplication)
  Master Data Management
  Examples and Demonstration
3
Business Drivers
4
Business Drivers
  Customer Service
  Marketing Campaigns
  Process Improvement
  Regulatory Compliance
  Fraud Detection
5
Data Drivers
  Accuracy: Correct Information
  Completeness: Thorough Information
  Consistency: Uniform Information
  Validity: Valid Information
6
Business Intelligence Drivers
Data quality is the cornerstone of effective business intelligence, and of operations for that matter. Companies have spent a significant share of their IT budgets integrating disparate applications and building data warehouses in order to get better business intelligence. However, many overlook the fact that, at the end of the day, it is the underlying data that matters. All the pretty screens and reports in the world make no difference if the data in the system is full of errors, inconsistent, and redundant. To achieve successful business intelligence, companies need to tackle the data quality problem first.
7
Usage
8
EIM Usages
  Analytic EIM (Batch or Real-time)
    Analytical EIM focuses on improving the data quality and accuracy of BI reports.
  Operational EIM (Real-time)
    The goal is to synchronize operational systems data with the golden record so that you have quality and consistency across enterprise processes.
9
EIM Dimensions
[Diagram: DW/DM System]
10
EIM and WebFOCUS Solutions
[Architecture diagram. Labels, in extraction order: Processes; Transactions; Documents; Supplier, Partners; Customer, Exchange; Data Warehouse, Data Mart, ODS; Applications; Portals; Enterprise Search; BI and Real-Time Dashboards; Universal Adapter Suite; Core Integration Services; Reporting Application Data Management; Mainframe Data, Applications and Transactions; Applications, CRM, ERP, etc.; Databases, Data Warehouse, Data Marts; Documents, Files, Content Management; Messages, Transactions, E-Mails; SWIFT, HIPAA, EDI Formats; Core Reporting Services]
11
Impact
12
Impact of Data Quality
Address Data
  36% Naturally Correct
  64% Manual Attention
13
Impact of Data Quality
Address Data
  36% Naturally Correct
  + 61% Automated Cleansing
  3% Manual Attention
14
What Is Data Management?
Data Quality and Master Data Management
15
Data Profiling
  Basic Analysis
    Minimums
    Maximums
    Averages
    Counts
    Etc.
  Patterns
  Extremes
  Quantities
  Frequency Analysis
  Foreign Key Analysis
  Masking
  Drilldown
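A minimal sketch of what basic analysis, masking, and frequency analysis might look like for a single column. The function names `mask` and `profile` and the choice of statistics are illustrative assumptions, not part of the iWay product; the sample values are taken from the SIN column of the example data later in the deck.

```python
import re
from collections import Counter
from statistics import mean


def mask(value: str) -> str:
    """Replace each digit with 9 and each letter with A to expose the value's pattern."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(values: list[str]) -> dict:
    """Basic column profile: counts, length extremes, and pattern frequencies."""
    present = [v for v in values if v]  # empty strings are treated as missing
    lengths = [len(v) for v in present]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": mean(lengths) if lengths else None,
        "patterns": Counter(mask(v) for v in present),  # frequency analysis of masks
    }


sins = ["000000000", "095-242-434", "SIN095242434", "095242433", "", "420-347-213"]
report = profile(sins)
```

Here `report["patterns"]` groups the values by shape (`999999999`, `999-999-999`, `AAA999999999`), which is often enough to spot which cleansing rules a column will need.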
16
Data Cleansing
  Parsing
    data parsed into components (pattern based)
  Standardization
    transformation into standard format (Jim Smith -> James Smith)
    standard and nonstandard abbreviations (Str. -> Street)
    language-specific replacements
  Data quality improvement
    validation against rules
    validation against reference tables
  Large number of domain-oriented algorithms, for example:
    Address
    Party
    Vehicle
    Name
    Identification number
    Credit card number
    Bank account number
  Extension by custom validation steps using complex functions and rules, including
    Levenshtein distance
    SoundEx
    internal (Java-based) functions
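To make the standardization and Levenshtein items concrete, here is a small sketch. The lookup tables `ABBREVIATIONS` and `NICKNAMES` and the helper names are hypothetical; a real deployment would load such tables from reference data rather than hard-code them.

```python
# Hypothetical reference tables; keys are lower-cased tokens.
ABBREVIATIONS = {"str.": "Street", "str": "Street", "ave": "Avenue", "ave.": "Avenue"}
NICKNAMES = {"jim": "James"}


def standardize(text: str, table: dict[str, str]) -> str:
    """Replace each whitespace-separated token that appears in the lookup table."""
    return " ".join(table.get(tok.lower(), tok) for tok in text.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def closest(value: str, candidates: list[str]) -> str:
    """Reference value with the smallest edit distance to `value`."""
    return min(candidates, key=lambda c: levenshtein(value, c))


street = standardize("8500 Leslie str.", ABBREVIATIONS)
name = standardize("Jim Smith", NICKNAMES)
city = closest("Marham", ["Markham", "Toronto", "Surrey"])
```

With these inputs, `street` becomes `8500 Leslie Street`, `name` becomes `James Smith`, and `city` resolves to `Markham`, which is one edit away from the misspelled `Marham`.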
17
Data Enrichment
  External company register
    standard company name
    registration ID
    official address
    national bank account classification
  Geocodes
    adding geo-codes for identified address
    allows showing map locations
    used for geomarketing or insurance risks
  External address register
    adding missing zip-codes, street names, city, etc.
    validating existence against register of addresses
  List of names, surnames, academic and social titles
    validating existence
    standardization (PHD -> Ph.D.)
    adding missing components
18
Match & Merge
  Unification
    identification of the set of records connected to one
      person
      address
      vehicle
      contact
      …etc.
  Deduplication
    golden record creation (the best representation of the identified subject)
  Identification
    new data entries: identify the subject (person, address, etc.) to which the new record is connected (matched)
  Complex business rules using sophisticated algorithms and functions, including
    Levenshtein distance
    Hamming distance
    Edit distance
    Data quality score values
    Date stamps of last modification
    Source system originating data
    etc.
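A rough sketch of unification: a pairwise match rule plus a union-find pass that groups records belonging to one subject. The rule itself (compare SINs by Hamming distance when both are present, otherwise fall back to name plus birth date) and the `max_sin_diff` parameter are assumptions made for illustration, not the product's actual business rules.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_match(r1: dict, r2: dict, max_sin_diff: int = 0) -> bool:
    """Assumed rule: SINs decide when both exist, else name and birth date."""
    if r1["sin"] and r2["sin"]:
        return (len(r1["sin"]) == len(r2["sin"])
                and hamming(r1["sin"], r2["sin"]) <= max_sin_diff)
    same_name = (r1["first"], r1["last"]) == (r2["first"], r2["last"])
    same_birth = bool(r1["birth_date"]) and r1["birth_date"] == r2["birth_date"]
    return same_name and same_birth


def unify(records: list[dict]) -> list[list[int]]:
    """Group record indexes into subjects using union-find over pairwise matches."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                parent[find(j)] = find(i)
    groups: dict[int, list[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


records = [
    {"first": "John", "last": "Smith", "sin": "", "birth_date": "1978-12-16"},
    {"first": "John", "last": "Smith", "sin": "095242434", "birth_date": "1978-12-16"},
    {"first": "John", "last": "Smith", "sin": "095242434", "birth_date": ""},
    {"first": "Jane", "last": "Watson", "sin": "420347213", "birth_date": "1982-01-01"},
]
groups = unify(records)
```

The first and third records never match each other directly, but both match the second, so the transitive grouping still places all three John Smith records in one set. Raising `max_sin_diff` would tolerate keying errors in the SIN at the cost of more false matches.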
19
Master Data Management (MDM) Defined
MDM for customer data systems are software products that:
  Support the global identification, linking and synchronization of customer information across heterogeneous data sources
  Create and manage a central, database-based system of record
  Enable the delivery of a single view for all stakeholders
MDM architectural styles vary in:
  Instantiation of the customer master data, varying from the maintenance of a physical customer profile to a more-virtual, metadata-based indexing structure
  The latency of customer master data maintenance, varying from real-time, synchronous reading and writing of the master data in a transactional context to batch, asynchronous harmonization of the master data across systems
An MDM program potentially encompasses the management of customer, product, asset, person or party, supplier and financial masters.
20
MDM Architectures
  Consolidated
    Master is Single Version of Truth
    Data Quality at Master
    Updates occur at Sources
    Updates propagated to Master
  Registry
    Multiple Versions of Truth
    Data Quality is Ongoing
    Updates occur at Sources
    Keys and Metadata in Registry
    Updates propagated to other Sources (Optional)
  Coexistence
    Master is Single Version of Truth
    Data Quality is Ongoing
    Updates occur at Sources or Master
    Updates propagated to other Sources
  Centralized
    Master is Single Version of Truth
    Data Quality at Master
    Updates occur at Master
    Updates propagated to Sources
21
Examples And Demonstration
22
Data Quality Examples
23
Original data – before cleansing
Source data
Name | G | SIN | Birth Date | Address
Dr. John Smith | F | 000000000 | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9
Smith W. John | M | 095-242-434 | 16.12.1978 | Surrey 14618 110 Ave
John William Smith |  | SIN095242434 | 781612 | 25 Linden Str Toronto M4X 1V5
Dr. J.W. Smith | M | 095242433 | 11/16/78 |
John Smith |  | 095252433 | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto
Smith John |  |  | 16.11.1978 | 8500 Leslie street Marham
John Smiht |  | 095252433 | 16.11.1978 |
Jane Watson |  | 420347213 | 1982 | 600-8500 Leslie str. Toronto L3T 7M8
Watson Jane | F | 420-347-213 | 5.1.1982 | 8500 Leslei street Toronto L3T 7M8
Jane Smith | F | SIN420347213 | 1982-01-05 |
J. Smith |  | 420-347-213 |  |
24
Titles Parsing
Name | G | SIN | Birth Date | Titles | Clearing Codes
Dr. John Smith | F | 000000000 | 12/16/1978 | Dr. | Academic_Title
Smith W. John | M | 095-242-434 | 16.12.1978 |  |
John William Smith |  | SIN095242434 | 781612 |  |
Dr. J.W. Smith | M | 095242433 | 11/16/78 | Dr. | Academic_Title
John Smith |  | 095252433 | 16.11.1978 |  |
Smith John |  |  | 16.11.1978 |  |
John Smiht |  | 095252433 | 16.11.1978 |  |
Jane Watson |  | 420347213 | 1982 |  |
Watson Jane | F | 420-347-213 | 5.1.1982 |  |
Jane Smith | F | SIN420347213 | 1982-01-05 |  |
J. Smith |  | 420-347-213 |  |  |
25
Name Parsing
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | F | 000000000 | 12/16/1978 | Academic_Title
John | W. | Smith | M | 095-242-434 | 16.12.1978 |
John | William | Smith |  | SIN095242434 | 781612 |
J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title
John |  | Smith |  | 095252433 | 16.11.1978 |
John |  | Smith |  |  | 16.11.1978 |
John |  | Smiht |  | 095252433 | 16.11.1978 | Last_name_not_found
Jane |  | Watson |  | 420347213 | 1982 |
Jane |  | Watson | F | 420-347-213 | 5.1.1982 |
Jane |  | Smith | F | SIN420347213 | 1982-01-05 |
J. |  | Smith |  | 420-347-213 |  |
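The title and name parsing steps can be approximated with small reference lists. This is a sketch under stated assumptions: the sets `TITLES`, `FIRST_NAMES`, and `LAST_NAMES` and the function `parse_name` are hypothetical, and the reordering rule (swap when the last token is a known first name and the first token is not) is one plausible way to handle inputs such as "Smith W. John".

```python
# Hypothetical reference lists; a real system would use external name registers.
TITLES = {"dr.": "Academic_Title"}
FIRST_NAMES = {"john", "jane", "william"}
LAST_NAMES = {"smith", "watson"}


def parse_name(raw: str) -> dict:
    """Split a raw name into title, first, middle, and last, collecting clearing codes."""
    codes, title, tokens = [], "", []
    # Put a space after every dot so that "J.W." splits into "J." and "W."
    for tok in raw.replace(".", ". ").split():
        if tok.lower() in TITLES:
            title = tok
            codes.append(TITLES[tok.lower()])
        else:
            tokens.append(tok)
    if len(tokens) < 2:
        raise ValueError(f"cannot split {raw!r} into first and last name")
    # "Smith W. John" style: surname first, given name last.
    if tokens[-1].lower() in FIRST_NAMES and tokens[0].lower() not in FIRST_NAMES:
        tokens = [tokens[-1], *tokens[1:-1], tokens[0]]
    first, *middle, last = tokens
    if last.lower() not in LAST_NAMES:
        codes.append("Last_name_not_found")
    return {"title": title, "first": first, "middle": " ".join(middle),
            "last": last, "codes": codes}


parsed = [parse_name(n) for n in
          ["Dr. John Smith", "Smith W. John", "J.W. Smith", "John Smiht"]]
```

On these four inputs the sketch reproduces the First, M, Last, and Clearing Codes values shown in the name parsing table.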
26
Update gender (based on first name)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M | 000000000 | 12/16/1978 | ...le, Gender_changed
John | W. | Smith | M | 095-242-434 | 16.12.1978 |
John | William | Smith | M | SIN095242434 | 781612 | Gender_updated
J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title
John |  | Smith | M | 095252433 | 16.11.1978 | Gender_updated
John |  | Smith | M |  | 16.11.1978 | Gender_updated
John |  | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found
Jane |  | Watson | F | 420347213 | 1982 | Gender_updated
Jane |  | Watson | F | 420-347-213 | 5.1.1982 |
Jane |  | Smith | F | SIN420347213 | 1982-01-05 |
J. |  | Smith |  | 420-347-213 |  |
27
Validate Social Security Number
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M | 000000000 | 12/16/1978 | ...nged, SIN_blacklist
John | W. | Smith | M | 095-242-434 | 16.12.1978 | SIN_removed_dashes
John | William | Smith | M | SIN095242434 | 781612 | ...ated, SIN_extra_chars
J. | W. | Smith | M | 095242433 | 11/16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M | 095252433 | 16.11.1978 | Gender_updated
John |  | Smith | M |  | 16.11.1978 | ...updated, SIN_missing
John |  | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found
Jane |  | Watson | F | 420347213 | 1982 | Gender_updated
Jane |  | Watson | F | 420-347-213 | 5.1.1982 | SIN_removed_dashes
Jane |  | Smith | F | SIN420347213 | 1982-01-05 | SIN_extra_characters
J. |  | Smith |  | 420-347-213 |  | SIN_removed_dashes
28
Validate Social Security Number (after)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | 12/16/1978 | ...nged, SIN_blacklist
John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes
John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | 11/16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M | 095252433 | 16.11.1978 | Gender_updated
John |  | Smith | M |  | 16.11.1978 | ...updated, SIN_missing
John |  | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found
Jane |  | Watson | F | 420347213 | 1982 | Gender_updated
Jane |  | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes
Jane |  | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters
J. |  | Smith |  | 420347213 |  | SIN_removed_dashes
29
Validate Birth Date
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | 12/16/1978 | ...nged, SIN_blacklist
John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes
John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | 11/16/78 | ...mic_Title, SIN_invalid
John |  | Smith | M | 095252433 | 16.11.1978 | Gender_updated
John |  | Smith | M |  | 16.11.1978 | ...updated, SIN_missing
John |  | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found
Jane |  | Watson | F | 420347213 | 1982 | .._updated, BD_invalid
Jane |  | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes
Jane |  | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters
J. |  | Smith |  | 420347213 |  | SIN_removed_dashes
30
Validate Birth Date (after)
First | M | Last | G | SIN | Birth Date | Clearing Codes
John |  | Smith | M |  | 1978-12-16 | ...nged, SIN_blacklist
John | W. | Smith | M | 095242434 | 1978-12-16 | SIN_removed_dashes
John | William | Smith | M | 095242434 | 1978-12-16 | ...ated, SIN_extra_chars
J. | W. | Smith | M |  | 1978-11-16 | ...mic_Title, SIN_invalid
John |  | Smith | M | 095252433 | 1978-11-16 | Gender_updated
John |  | Smith | M |  | 1978-11-16 | ...updated, SIN_missing
John |  | Smiht | M | 095252433 | 1978-11-16 | Last_name_not_found
Jane |  | Watson | F | 420347213 |  | .._updated, BD_invalid
Jane |  | Watson | F | 420347213 | 1982-01-05 | SIN_removed_dashes
Jane |  | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters
J. |  | Smith |  | 420347213 |  | SIN_removed_dashes
31
Prepared data (after cleansing)
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  | 1978-11-16 |
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | M | 095252433 | 1978-11-16 |
Jane | Watson | F | 420347213 |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F | 420347213 | 1982-01-05 |
J. | Smith |  | 420347213 |  |
32
Master Data Management Examples
33
Prepared data (after cleansing)
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  | 1978-11-16 |
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht |  | 095252433 | 1978-11-16 |
Jane | Watson | F | 420347213 |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F | 420347213 | 1982-01-05 |
J. | Smith |  | 420347213 |  |
34
Match
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 |  | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M |  | 1978-11-16 |
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M |  | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht |  | 095252433 | 1978-11-16 |
Jane | Watson | F | 420347213 |  | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jane | Smith | F | 420347213 | 1982-01-05 |
J. | Smith |  | 420347213 |  |
35
Merge
Cleansed data
First | Last | G | SIN | Birth Date | Address
John | Smith | M |  | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 |  | M4X 1V5;ON;Toronto;25 Linden Street
Golden record
First | Last | G | SIN | Birth Date | Address
John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street
The newest permanent address: M4X 1V5;ON;Toronto;25 Linden Street
The most frequent address: V3R 2A9;BC;Surrey;14618 110 Avenue
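The merge step can be sketched as a set of survivorship rules applied per field. The rules here (most frequent non-empty value for identity fields, newest value for the address) are one reading of the slide, and the `modified` dates are invented for illustration because the slide shows no timestamps.

```python
from collections import Counter


def most_frequent(values) -> str:
    """Most common non-empty value; ties resolve to the value seen first."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else ""


def newest(records: list[dict], field: str) -> str:
    """Value of `field` from the most recently modified record that has it."""
    dated = [r for r in records if r[field]]
    return max(dated, key=lambda r: r["modified"])[field] if dated else ""


def golden_record(records: list[dict]) -> dict:
    """Build the golden record from a group of matched records."""
    golden = {f: most_frequent(r[f] for r in records)
              for f in ("first", "last", "g", "sin", "birth_date")}
    golden["address"] = newest(records, "address")
    golden["most_frequent_address"] = most_frequent(r["address"] for r in records)
    return golden


SURREY = "V3R 2A9;BC;Surrey;14618 110 Avenue"
TORONTO = "M4X 1V5;ON;Toronto;25 Linden Street"
# "modified" values are hypothetical; ISO dates compare correctly as strings.
matched = [
    {"first": "John", "last": "Smith", "g": "M", "sin": "",
     "birth_date": "1978-12-16", "address": SURREY, "modified": "2007-03-01"},
    {"first": "John", "last": "Smith", "g": "M", "sin": "095242434",
     "birth_date": "1978-12-16", "address": SURREY, "modified": "2008-06-15"},
    {"first": "John", "last": "Smith", "g": "M", "sin": "095242434",
     "birth_date": "", "address": TORONTO, "modified": "2009-01-20"},
]
golden = golden_record(matched)
```

With the assumed timestamps, the Toronto address wins as the newest and the Surrey address is kept as the most frequent, matching the golden record shown on the slide.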
36
Demonstration
37
Thank-You