
1 Copyright 2009, Information Builders. Slide 1 iWay Enterprise Information Management (EIM) Data Quality and Master Data Management Kam Wong Solutions Architect Information Builders D.C. User Forum December 8, 2009

2 Data Quality and Master Data Management: Agenda
- Business Drivers Behind Data Management
- Usage: Where To Use Data Management
- Impact Of Data Quality
- What Is Data Management?
  - Data Profiling
  - Data Cleansing
  - Data Enrichment
  - Match & Merge (De-duplication)
  - Master Data Management
- Examples and Demonstration

3 Business Drivers

4 Business Drivers
- Customer Service
- Marketing Campaigns
- Process Improvement
- Regulatory Compliance
- Fraud Detection

5 Data Drivers
- Accuracy: Correct Information
- Completeness: Thorough Information
- Consistency: Uniform Information
- Validity: Valid Information

6 Business Intelligence Drivers
Data quality is the cornerstone of effective business intelligence, and of operations for that matter. Companies have spent a significant share of their IT budgets integrating disparate applications and building data warehouses in order to get better business intelligence. However, many companies overlook the fact that, at the end of the day, it is the underlying data that matters. All the pretty screens and reports in the world make no difference if the data that resides in the system is full of errors, inconsistent and redundant.
- To achieve successful business intelligence, companies need to tackle the data quality problem first.

7 Usage

8 EIM Usages
- Analytical EIM (Batch or Real-time)
  - Analytical EIM focuses on improving the data quality and accuracy of BI reports.
- Operational EIM (Real-time)
  - The goal is to synchronize operational systems data with the golden record so that you have quality and consistency across enterprise processes.

9 EIM Dimensions
[Diagram; the only surviving label is "DW/DM System".]

10 EIM and WebFOCUS Solutions
[Architecture diagram; surviving labels in their original order:]
Processes, Transactions, Documents; Supplier, Partners; Customer, Exchange; Data Warehouse, Data Mart, ODS; Applications; Portals; Enterprise Search; BI and Real-Time Dashboards; Universal Adapter Suite; Core Integration Services; Reporting; Application; Data Management; Mainframe Data, Applications and Transactions; Applications, CRM, ERP, etc.; Databases, Data Warehouse, Data Marts; Documents, Files, Content Management; Messages, Transactions, E-Mails; SWIFT, HIPAA, EDI Formats; Core Reporting Services

11 Impact

12 Impact of Data Quality: Address Data
- 36 % Naturally Correct
- 64 % Manual Attention

13 Impact of Data Quality: Address Data
- 36 % Naturally Correct
- 61 % Automated Cleansing
- 3 % Manual Attention

14 What Is Data Management? Data Quality and Master Data Management

15 Data Profiling
- Profiling
  - Basic Analysis
    - Minimums
    - Maximums
    - Averages
    - Counts
    - Etc.
  - Patterns
  - Extremes
  - Quantities
  - Frequency Analysis
  - Foreign Key Analysis
  - Masking
  - Drilldown
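The profiling measures listed above can be illustrated with a minimal Python sketch. It is not the iWay profiler: the function name `profile_column`, the masking convention (letters become `A`, digits become `9`) and the choice of string length as the measured quantity are all assumptions made for illustration. The sample values are the SIN column from the "Original data" slide later in the deck.

```python
import re
from collections import Counter


def profile_column(values):
    """Return basic profiling statistics for one column of string values.

    Hypothetical helper for illustration only.
    """
    non_empty = [v for v in values if v]
    lengths = [len(v) for v in non_empty]

    def mask(value):
        # Masking: replace every letter with 'A' and every digit with '9'
        # so that values with the same shape share one pattern.
        value = re.sub(r"[A-Za-z]", "A", value)
        return re.sub(r"\d", "9", value)

    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "avg_length": sum(lengths) / len(lengths) if lengths else None,
        # Frequency analysis of raw values and of masked patterns.
        "top_values": Counter(non_empty).most_common(3),
        "patterns": Counter(mask(v) for v in non_empty).most_common(),
    }


# SIN column from the source-data slide (empty string = missing value).
sin_column = [
    "000000000", "095-242-434", "SIN095242434", "095242433", "095252433",
    "", "095252433", "420347213", "420-347-213", "SIN420347213", "420-347-213",
]
report = profile_column(sin_column)
print(report["patterns"])
```

The pattern counts make the three competing SIN layouts visible at a glance, which is the kind of finding that then drives the cleansing rules.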

16 Data Cleansing
- Parsing
  - data parsed into components (pattern based)
- Standardization
  - transformation into standard format (Jim Smith -> James Smith)
  - standard and nonstandard abbreviations (Str. -> Street)
  - language-specific replacements
- Data quality improvement
  - validation against rules
  - validation against reference tables
- Large number of domain-oriented algorithms, for example:
  - Address
  - Party
  - Vehicle
  - Name
  - Identification number
  - Credit Card number
  - Bank account number
- Extension by custom validation steps
  - using complex functions and rules including
    - Levenshtein distance
    - SoundEx
    - internal (Java-based) functions
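Two of the cleansing steps above, standardization by replacement table and fuzzy validation against a reference list, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `REPLACEMENTS`, `standardize` and `closest` are hypothetical, the replacement table holds just the two examples from the slide, and the distance threshold of 2 is an arbitrary choice. The Levenshtein routine is the standard dynamic-programming formulation.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Hypothetical replacement table; only these two pairs come from the slide.
REPLACEMENTS = {"Jim": "James", "Str.": "Street"}


def standardize(text):
    """Replace each whitespace-separated token that has a standard form."""
    return " ".join(REPLACEMENTS.get(token, token) for token in text.split())


def closest(word, reference, max_distance=2):
    """Return the reference entry nearest to word, or None if none is
    within max_distance edits (the threshold is an assumption)."""
    best = min(reference, key=lambda ref: levenshtein(word, ref))
    return best if levenshtein(word, best) <= max_distance else None


print(standardize("Jim Smith"))
print(standardize("25 Linden Str."))
print(closest("Smiht", ["Smith", "Watson"]))
```

The misspelled surname "Smiht" from the demonstration data is two edits away from "Smith", so a threshold of 2 is the smallest one that would catch it with plain Levenshtein distance.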

17 Data Enrichment
- External company register
  - standard company name
  - registration ID
  - official address
  - national bank account classification
- Geocodes
  - adding geo-codes for identified address
  - allows showing map locations
  - used for geomarketing or insurance risks
- External address register
  - adding missing zip-codes, street names, city, etc.
  - validating existence against register of addresses
- List of names, surnames, academic and social titles
  - validating existence
  - standardization (PHD -> Ph.D.)
  - adding missing components
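As a rough illustration of enrichment against reference data, the Python sketch below standardizes a title and fills in a missing postal code from an address register. Everything here is assumed for the example: the dictionaries `TITLE_FORMS` and `POSTAL_REGISTER` stand in for external registers, their contents are invented apart from the PHD -> Ph.D. pair on the slide, and the field names are hypothetical.

```python
# Hypothetical reference data standing in for external registers.
TITLE_FORMS = {"PHD": "Ph.D."}
POSTAL_REGISTER = {("8500 Leslie Street", "Markham"): "L3T 7M8"}


def standardize_title(raw):
    """Map a title to its standard spelling, ignoring dots and case.
    Unknown titles are returned unchanged."""
    key = raw.replace(".", "").upper()
    return TITLE_FORMS.get(key, raw)


def enrich_address(address):
    """Validate an address against the register and add a missing
    postal code when the register knows the street and city."""
    enriched = dict(address)  # leave the caller's record untouched
    key = (address.get("street", ""), address.get("city", ""))
    known = key in POSTAL_REGISTER
    if known and not enriched.get("postal_code"):
        enriched["postal_code"] = POSTAL_REGISTER[key]
    enriched["in_register"] = known
    return enriched


print(standardize_title("PHD"))
print(enrich_address({"street": "8500 Leslie Street", "city": "Markham"}))
```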

18 Match & Merge
- Unification
  - identification of the set of records connected to one
    - person
    - address
    - vehicle
    - contact
    - …etc.
- Deduplication
  - golden record creation (the best representation of the identified subject)
- Identification
  - new data entries: to identify the subject (person, address, etc.) to which the new record is connected (matched)
- Complex business rules
  - using sophisticated algorithms and functions including
    - Levenshtein distance
    - Hamming distance
    - Edit distance
    - Data quality scores values
    - Data stamps of last modification
    - Source system originating data
    - etc.
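Unification can be pictured as pairwise matching followed by clustering. The Python sketch below is one possible arrangement, not the product's matching engine: the match rule (same non-empty birth date, last names of equal length within Hamming distance 2) is an assumption chosen so that "Smiht" joins "Smith", and the clustering uses a plain union-find structure. The function and field names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def is_match(r1, r2, max_name_distance=2):
    """Assumed business rule: same known birth date and similar last name."""
    if not r1["birth_date"] or r1["birth_date"] != r2["birth_date"]:
        return False
    if len(r1["last"]) != len(r2["last"]):
        return False
    return hamming(r1["last"], r2["last"]) <= max_name_distance


def unify(records):
    """Group record indexes into clusters of mutually linked records."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


records = [
    {"last": "Smith", "birth_date": "1978-12-16"},
    {"last": "Smith", "birth_date": "1978-12-16"},
    {"last": "Smith", "birth_date": "1978-11-16"},
    {"last": "Smiht", "birth_date": "1978-11-16"},
    {"last": "Watson", "birth_date": "1982-01-05"},
]
print(unify(records))
```

Comparing every pair is quadratic in the number of records, so a real deployment would normally restrict comparisons to candidate blocks first; that step is left out here for brevity.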

19 Master Data Management (MDM) Defined
- MDM systems for customer data are software products that:
  - Support the global identification, linking and synchronization of customer information across heterogeneous data sources
  - Create and manage a central, database-based system of record
  - Enable the delivery of a single view for all stakeholders
- MDM architectural styles vary in:
  - Instantiation of the customer master data, varying from the maintenance of a physical customer profile to a more virtual, metadata-based indexing structure
  - The latency of customer master data maintenance, varying from real-time, synchronous reading and writing of the master data in a transactional context to batch, asynchronous harmonization of the master data across systems
- An MDM program potentially encompasses the management of customer, product, asset, person or party, supplier and financial masters.

20 MDM Architectures
Consolidated
- Master is Single Version of Truth
- Data Quality at Master
- Updates occur at Sources
- Updates propagated to Master
Registry
- Multiple Versions of Truth
- Data Quality is Ongoing
- Updates occur at Sources
- Keys and Metadata in Registry
- Updates propagated to other Sources (Optional)
Coexistence
- Master is Single Version of Truth
- Data Quality is Ongoing
- Updates occur at Sources or Master
- Updates propagated to other Sources
Centralized
- Master is Single Version of Truth
- Data Quality at Master
- Updates occur at Master
- Updates propagated to Sources

21 Examples And Demonstration

22 Data Quality Examples

23 Original data (before cleansing)
Source data
| Name | G | SIN | Birth Date | Address |
| Dr. John Smith | F | 000000000 | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9 |
| Smith W. John | M | 095-242-434 | 16.12.1978 | Surrey 14618 110 Ave |
| John William Smith | | SIN095242434 | 781612 | 25 Linden Str Toronto M4X 1V5 |
| Dr. J.W. Smith | M | 095242433 | 11/16/78 | |
| John Smith | | 095252433 | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto |
| Smith John | | | 16.11.1978 | 8500 Leslie street Marham |
| John Smiht | | 095252433 | 16.11.1978 | |
| Jane Watson | | 420347213 | 1982 | 600-8500 Leslie str. Toronto L3T 7M8 |
| Watson Jane | F | 420-347-213 | 5.1.1982 | 8500 Leslei street Toronto L3T 7M8 |
| Jane Smith | F | SIN420347213 | 1982-01-05 | |
| J. Smith | | 420-347-213 | | |

24 Titles Parsing
| Name | G | SIN | Birth Date | Titles | Clearing Codes |
| Dr. John Smith | F | 000000000 | 12/16/1978 | Dr. | Academic_Title |
| Smith W. John | M | 095-242-434 | 16.12.1978 | | |
| John William Smith | | SIN095242434 | 781612 | | |
| Dr. J.W. Smith | M | 095242433 | 11/16/78 | Dr. | Academic_Title |
| John Smith | | 095252433 | 16.11.1978 | | |
| Smith John | | | 16.11.1978 | | |
| John Smiht | | 095252433 | 16.11.1978 | | |
| Jane Watson | | 420347213 | 1982 | | |
| Watson Jane | F | 420-347-213 | 5.1.1982 | | |
| Jane Smith | F | SIN420347213 | 1982-01-05 | | |
| J. Smith | | 420-347-213 | | | |

25 Name Parsing
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | F | 000000000 | 12/16/1978 | Academic_Title |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | |
| John | William | Smith | | SIN095242434 | 781612 | |
| J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title |
| John | | Smith | | 095252433 | 16.11.1978 | |
| John | | Smith | | | 16.11.1978 | |
| John | | Smiht | | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | | 420347213 | 1982 | |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | |
| J. | | Smith | | 420-347-213 | | |

26 Update gender (based on first name)
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | 000000000 | 12/16/1978 | ...le, Gender_changed |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | |
| John | William | Smith | M | SIN095242434 | 781612 | Gender_updated |
| J. | W. | Smith | M | 095242433 | 11/16/78 | Academic_Title |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | Gender_updated |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | |
| J. | | Smith | | 420-347-213 | | |
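The gender step on this slide behaves like a lookup by first name. A minimal Python sketch of that reading follows; the lookup table `GENDER_BY_FIRST_NAME` and the function name are hypothetical, and the choice of code (`Gender_updated` when the field was empty, `Gender_changed` when it held a conflicting value) is inferred from the clearing codes shown in the table rather than from any documented rule.

```python
# Hypothetical name register; a real one would be a large reference table.
GENDER_BY_FIRST_NAME = {"John": "M", "William": "M", "Jane": "F"}


def update_gender(first_name, gender):
    """Return (gender, clearing_code) after checking the first name.

    clearing_code is None when nothing was changed.
    """
    expected = GENDER_BY_FIRST_NAME.get(first_name)
    if expected is None or expected == gender:
        # Unknown name (for example the initial "J.") or already consistent.
        return gender, None
    code = "Gender_changed" if gender else "Gender_updated"
    return expected, code


print(update_gender("John", "F"))
print(update_gender("Jane", ""))
print(update_gender("J.", ""))
```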

27 Validate Social Security Number
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | 000000000 | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095-242-434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | SIN095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | 095242433 | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420-347-213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | SIN420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420-347-213 | | SIN_removed_dashes |

28 Validate Social Security Number (after)
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | Gender_updated |
| Jane | | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |
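The before/after pair above can be reproduced with a short Python sketch. The slides do not say which checksum the product applies; the sketch assumes the Luhn check, which Canadian Social Insurance Numbers use and which is consistent with every sample value here (095242434 passes, 095242433 fails). The blacklist contents, the function names and the order in which codes are emitted are assumptions; the code names are taken from the Clearing Codes column.

```python
import re

# Assumed blacklist: the only blacklisted value visible in the slides.
BLACKLIST = {"000000000"}


def luhn_ok(digits):
    """Luhn check: double every second digit from the right, subtract 9
    from any result above 9, and require the total to be a multiple of 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_sin(raw):
    """Return (cleaned_sin, clearing_codes); cleaned_sin is '' if rejected."""
    text = raw.strip()
    if not text:
        return "", ["SIN_missing"]
    codes = []
    if "-" in text:
        text = text.replace("-", "")
        codes.append("SIN_removed_dashes")
    digits = re.sub(r"\D", "", text)
    if digits != text:
        codes.append("SIN_extra_chars")
    if digits in BLACKLIST:
        return "", codes + ["SIN_blacklist"]
    if len(digits) != 9 or not luhn_ok(digits):
        return "", codes + ["SIN_invalid"]
    return digits, codes


for value in ["000000000", "095-242-434", "SIN095242434", "095242433", ""]:
    print(repr(value), validate_sin(value))
```

The blacklist is needed in addition to the checksum because an all-zero number trivially satisfies the Luhn test.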

29 Validate Birth Date
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 12/16/1978 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 16.12.1978 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 781612 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 11/16/78 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 16.11.1978 | Gender_updated |
| John | | Smith | M | | 16.11.1978 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 16.11.1978 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | 1982 | .._updated, BD_invalid |
| Jane | | Watson | F | 420347213 | 5.1.1982 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |

30 Validate Birth Date (after)
| First | M | Last | G | SIN | Birth Date | Clearing Codes |
| John | | Smith | M | | 1978-12-16 | ...nged, SIN_blacklist |
| John | W. | Smith | M | 095242434 | 1978-12-16 | SIN_removed_dashes |
| John | William | Smith | M | 095242434 | 1978-12-16 | ...ated, SIN_extra_chars |
| J. | W. | Smith | M | | 1978-11-16 | ...mic_Title, SIN_invalid |
| John | | Smith | M | 095252433 | 1978-11-16 | Gender_updated |
| John | | Smith | M | | 1978-11-16 | ...updated, SIN_missing |
| John | | Smiht | M | 095252433 | 1978-11-16 | Last_name_not_found |
| Jane | | Watson | F | 420347213 | | .._updated, BD_invalid |
| Jane | | Watson | F | 420347213 | 1982-01-05 | SIN_removed_dashes |
| Jane | | Smith | F | 420347213 | 1982-01-05 | SIN_extra_characters |
| J. | | Smith | | 420347213 | | SIN_removed_dashes |
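The birth-date normalization shown above can be approximated in Python with the standard `datetime` module. The format list is inferred from the sample rows and is an assumption: slash-separated dates are read month first, dot-separated dates day first, and six-digit values as year, day, month (so that 781612 becomes 1978-12-16, as on the slide). Each format is guarded by a shape check because `strptime` alone would accept the bare year "1982" under the six-digit format. Treating an empty value as "no date, no code" is also an assumption.

```python
import re
from datetime import datetime

# (shape check, strptime format) pairs, tried in order. Inferred from the
# sample data; a real rule set would be configured per source system.
FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), "%Y-%m-%d"),
    (re.compile(r"\d{1,2}/\d{1,2}/\d{4}"), "%m/%d/%Y"),
    (re.compile(r"\d{1,2}/\d{1,2}/\d{2}"), "%m/%d/%y"),
    (re.compile(r"\d{1,2}\.\d{1,2}\.\d{4}"), "%d.%m.%Y"),
    (re.compile(r"\d{6}"), "%y%d%m"),
]


def normalize_birth_date(raw):
    """Return (iso_date, clearing_code). iso_date is None when the value
    is empty or cannot be interpreted as a full calendar date."""
    text = raw.strip()
    if not text:
        return None, None
    for shape, fmt in FORMATS:
        if shape.fullmatch(text):
            try:
                return datetime.strptime(text, fmt).date().isoformat(), None
            except ValueError:
                # Right shape but impossible date, e.g. month 13.
                return None, "BD_invalid"
    return None, "BD_invalid"


for value in ["12/16/1978", "16.12.1978", "781612", "11/16/78", "5.1.1982", "1982"]:
    print(value, normalize_birth_date(value))
```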

31 Prepared data (after cleansing)
Cleansed data
| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | M | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |

32 Master Data Management Examples

33 Prepared data (after cleansing)
Cleansed data
| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |

34 Match
Cleansed data
| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
| | Smith | M | | 1978-11-16 | |
| John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| John | Smiht | | 095252433 | 1978-11-16 | |
| Jane | Watson | F | 420347213 | | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Watson | F | 420347213 | 1982-01-01 | L3T 7M8;ON;Markham;8500 Leslie Str. |
| Jane | Smith | F | 420347213 | 1982-01-05 | |
| J. | Smith | | 420347213 | | |

35 Merge
Cleansed data
| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue |
| John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street |
Golden record
| First | Last | G | SIN | Birth Date | Address |
| John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street |
Address callouts
- The newest permanent address: M4X 1V5;ON;Toronto;25 Linden Street
- The most frequent address: V3R 2A9;BC;Surrey;14618 110 Avenue
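Building a golden record comes down to choosing a survivorship rule per field. The Python sketch below applies only the "most frequent non-empty value" rule, so for the three matched records above it selects the Surrey address. The "newest permanent address" rule that the slide's golden record uses would need a last-modified timestamp per record, which the slides do not show, so it is not attempted here. Function and field names are hypothetical.

```python
from collections import Counter


def most_frequent(values):
    """Most common non-empty value; ties go to the value seen first."""
    present = [v for v in values if v]
    if not present:
        return ""
    return Counter(present).most_common(1)[0][0]


def merge_golden_record(records):
    """Apply the most-frequent rule to every field of a matched cluster."""
    fields = records[0].keys()
    return {f: most_frequent([r[f] for r in records]) for f in fields}


# The three matched John Smith records from the Merge slide.
cluster = [
    {"first": "John", "last": "Smith", "g": "M", "sin": "",
     "birth_date": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue"},
    {"first": "John", "last": "Smith", "g": "M", "sin": "095242434",
     "birth_date": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue"},
    {"first": "John", "last": "Smith", "g": "M", "sin": "095242434",
     "birth_date": "", "address": "M4X 1V5;ON;Toronto;25 Linden Street"},
]
golden = merge_golden_record(cluster)
print(golden)
```

Note how the merged record fills gaps from different sources: the SIN comes from the second and third records, the birth date from the first two.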

36 Demonstration

37 Thank-You

