DATA QUALITY PROBLEMS AND THEIR ROOT CAUSES DAMA COLUMBUS, OH CHAPTER MEETING – JANUARY 2015
2 Consequences of poor Data Quality
We have multiple names for the same supplier and items in our system and cannot roll up spend by supplier by item correctly. This leads to an inability to shrink our supplier base, negotiate more volume discounts and control spend.
We cannot identify if the same customer has auto, property and life insurance with our organization, which limits our ability to cross-sell/up-sell to them.
We are losing millions of dollars in postage and collateral costs by mailing to customers who live in the same household.
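The supplier roll-up problem can be illustrated with a minimal sketch. It assumes that name variants differ only in case, punctuation and legal-form suffixes, which is rarely the whole story in practice. The suffix list, the function names and the sample invoices are all hypothetical and are not taken from the presentation.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "llc", "ltd", "limited"}


def normalize_supplier(name):
    """Reduce a supplier name to a crude matching key."""
    # Lowercase, then replace anything that is not a letter, digit or space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    # Drop legal-form suffixes so "Acme Corp." and "Acme, Inc" collapse together.
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def roll_up_spend(invoices):
    """Sum invoice amounts per normalized supplier key."""
    totals = defaultdict(float)
    for supplier, amount in invoices:
        totals[normalize_supplier(supplier)] += amount
    return dict(totals)


# Made-up sample data: three spellings of one supplier plus a second supplier.
invoices = [
    ("Acme Corp.", 1200.0),
    ("ACME Corporation", 800.0),
    ("Acme, Inc", 500.0),
    ("Globex LLC", 300.0),
]
spend = roll_up_spend(invoices)
print(spend)
```

Without the normalization step the same data would report four suppliers instead of two, which is the roll-up failure described on this slide.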
3 Consequences of poor Data Quality
We are unable to identify active accounts receivable for the same B2B customer across our different lines of business. Customers with an outstanding balance of $500,000 are required to have their credit reviewed per SEC guidelines.
We need to measure risk exposure by top 5 customer accounts across the organization from a credit perspective. The biggest challenge is figuring out parent-child relationships across affiliates.
We have lost millions of dollars in procuring suppliers because we are not able to correctly identify and reconcile suppliers on a global basis.
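Once parent-child relationships are known, measuring exposure is a simple roll-up. The sketch below assumes the hierarchy is already available as a child-to-parent mapping, which is the hard part in practice. The account names and balances are invented, and treating the $500,000 figure as an inclusive threshold on the rolled-up balance is an assumption.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 500_000  # from the slide; inclusive comparison is an assumption


def ultimate_parent(account, parent_of):
    """Follow parent links until an account with no parent is reached."""
    seen = set()
    while parent_of.get(account) is not None:
        if account in seen:
            raise ValueError(f"cycle in hierarchy at {account}")
        seen.add(account)
        account = parent_of[account]
    return account


def exposure_by_parent(receivables, parent_of):
    """Sum outstanding balances per ultimate parent account."""
    totals = defaultdict(int)
    for account, balance in receivables:
        totals[ultimate_parent(account, parent_of)] += balance
    return dict(totals)


def top_exposures(totals, n=5):
    """Return the n largest exposures, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical hierarchy and receivables across lines of business.
parent_of = {
    "Acme East": "Acme Holdings",
    "Acme West": "Acme Holdings",
    "Acme Holdings": None,
    "Globex": None,
}
receivables = [("Acme East", 300_000), ("Acme West", 250_000), ("Globex", 400_000)]

totals = exposure_by_parent(receivables, parent_of)
needs_review = [name for name, total in top_exposures(totals)
                if total >= REVIEW_THRESHOLD]
print(totals, needs_review)
```

Neither Acme affiliate reaches the threshold alone, but the parent does once the balances are combined, which is why the hierarchy matters for credit review.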
4 Examples of Data Quality Problems
Data glitches: typos, multiple formats, multiple scales, missing/default values
Business logic embedded in natural keys
Data values that do not conform to the business rules
Information buried in free-form/flex fields
Mistakes, misspellings, incorrect data types, lack of standards etc.
5 Traditional Definition of Data Quality
Accuracy: Does the data accurately represent reality or a verifiable source?
Integrity: Is the structure of data and relationships among entities and attributes maintained consistently? Example of a violation: policy data that is not tied to a valid customer.
Consistency: Are data elements consistently defined and understood?
6 Example of Data Inconsistency
7 Traditional Definition of Data Quality
Completeness: Is all necessary data present?
Field three is Revenue: in dollars or cents? In dollars or euros?
Field four is Product Sales: in units or cases?
8 Traditional Definition of Data Quality
Validity: Do data values fall within acceptable ranges defined by the business?
Timeliness: Is data available when needed?
Accessibility: Is the data easily accessible, understandable, and usable?
Relevance
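Several of these dimensions can be checked mechanically. The sketch below covers three of them: integrity (policies tied to a valid customer), completeness (required fields present) and validity (values within a business-defined range). The record layout, field names and the premium range are hypothetical and chosen only for illustration.

```python
# Hypothetical reference data and records.
customers = {"C1", "C2"}
policies = [
    {"policy_id": "P1", "customer_id": "C1", "premium": 1200.0},
    {"policy_id": "P2", "customer_id": "C9", "premium": 900.0},  # unknown customer
    {"policy_id": "P3", "customer_id": "C2", "premium": None},   # missing value
    {"policy_id": "P4", "customer_id": "C2", "premium": -50.0},  # out of range
]


def integrity_violations(policies, customers):
    """Policies whose customer_id does not refer to a known customer."""
    return [p["policy_id"] for p in policies
            if p["customer_id"] not in customers]


def completeness_violations(policies, required_fields):
    """Policies with a required field that is absent or empty."""
    return [p["policy_id"] for p in policies
            if any(p.get(f) in (None, "") for f in required_fields)]


def validity_violations(policies, field, low, high):
    """Policies whose value for `field` is present but outside [low, high]."""
    return [p["policy_id"] for p in policies
            if p.get(field) is not None and not (low <= p[field] <= high)]


orphans = integrity_violations(policies, customers)
incomplete = completeness_violations(policies, ["customer_id", "premium"])
out_of_range = validity_violations(policies, "premium", 0.0, 100_000.0)
print(orphans, incomplete, out_of_range)
```

Accuracy, timeliness, accessibility and relevance are harder to test this way because they depend on an external source of truth or on how the data is used.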
9 Another perspective on Data Quality
Data quality is achieved when the data is sufficient to support the business purpose for which it needs to be used, either by people or by applications.
10 Major Causes for Poor Data Quality
Poorly designed business processes
No central management of business processes
No ownership or stewardship for data
Weakness of tool(s) used to manage business processes
Multiple instances of a central tool for managing business processes
11 Major Causes for Poor Data Quality
Third-party interfaces
Data conversion from legacy systems
Human error
Lack of sufficient training on tools used to manage business processes
12 Case Study of a Manufacturing Organization
Example of a case study with multiple heterogeneous ERP systems used by different divisions in the organization
Centralized ERP system used to share Master Data
Master Data published in a “one-to-many” way to other ERP systems
Comprehensive documentation on the usage of the centralized ERP system and acceptable domain values
13 Analysis of Case Study
No specification on “who” had to enter the data “when”
Same business task supported differently by divisional ERP instances
Considerable lag between the physical presence of incoming goods and their visibility in the ERP system
14 Analysis of Case Study (Continued)
Multiple customizations of divisional ERP systems
Weakness of GUI of central ERP system used for master data entry
Central ERP system did not prevent duplicate entries from being made
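A duplicate check at the point of entry is one way to address the last finding. The sketch below is not how the ERP system in the case study worked. It assumes duplicates differ only in case, spacing and punctuation, and the class and record names are hypothetical.

```python
import re


class DuplicateEntryError(ValueError):
    """Raised when a new master record matches an existing one."""


class MasterRegistry:
    """Toy master data store that rejects near-identical names on entry."""

    def __init__(self):
        self._by_key = {}  # normalized name -> record id

    @staticmethod
    def _key(name):
        # Case-fold and collapse every run of non-alphanumerics to one space.
        return re.sub(r"[^a-z0-9]+", " ", name.casefold()).strip()

    def add(self, record_id, name):
        key = self._key(name)
        if key in self._by_key:
            raise DuplicateEntryError(
                f"{name!r} matches existing record {self._by_key[key]}")
        self._by_key[key] = record_id
        return key


registry = MasterRegistry()
first_key = registry.add("M1", "Steel Bolt, 10mm")

try:
    registry.add("M2", "steel  bolt 10MM")
    rejected = False
except DuplicateEntryError as err:
    rejected = True
    print(err)
```

Rejecting the entry forces the user to search for the existing record first, which only works if the search itself is usable.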
15 Analysis of Case Study (Continued)
Weak search engine in ERP system
Difficult to correct errors in ERP system
Data conversion from legacy applications
16 Lessons Learnt
Processing delays, divisional idiosyncrasies or operational errors cause data quality issues
Business processes must be both aligned with the organization and oriented towards the customer
17 Lessons Learnt (Continued)
Multiple instances of the same tool customized by different persons from various institutions with divergent interests or work standards can cause issues
Multiple instances of the same tool also cause issues due to lack of an enterprise-wide view
18 How to Prevent Data Quality Issues
Best practices for handling separate ways to enter data in an IT system:
Declare one application as “reference” or “master” system
Develop a set of common definitions and procedures
19 Example of Sample Governance Process
20 How to Prevent Data Quality Issues (Cont.)
Avoid bias in producing data
Avoid distributed architectures
21 How to Prevent Data Quality Issues (Cont.)
Assign responsibility for data quality issues
22 Typical Data Governance Org Chart
23 How to Prevent Data Quality Issues (Cont.)
Design Information Chains First
24 Questions? Contact me at :
25 References Presentation on Data Quality and Data Cleaning from Rutgers University