Published by Justina Cook. Modified over 9 years ago.
1
Chapter 1 Introduction to Data Quality
2
Data Quality Characteristics
Data quality affects several attributes associated with data:
- Accuracy: Is it realistic or believable?
- Integrity: Is it structured and managed?
- Consistency: Is it consistently defined and maintained?
- Validity: Is the data valid, based on business or industry rules and standards?
3
What Causes Poor Data Quality?
These factors can contribute to poor data quality:
- Business rules do not exist, or there are no standards for data capture.
- Standards exist but are not enforced at the point of data capture.
- Data is entered inconsistently (incorrect spelling, use of nicknames, middle names, or aliases).
- Data entry mistakes happen (character transposition, misspellings, and so on).
- Data is integrated from systems with different data standards.
- Data quality issues are perceived as time-consuming and expensive to fix.
4
Primary Sources of Data Quality Problems
[Chart not reproduced.]
Source: The Data Warehousing Institute, Data Quality and the Bottom Line, 2002
5
How Is Clean Data Achieved?
Clean data is the result of a combination of efforts:
- making sure that data entered into the system is clean
- cleaning up problems after the data is accepted.
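The first of these efforts, checking data at the point of capture, can be sketched as a small rule-based validator. This is a minimal illustration only: the field names, the state-code list, and the ZIP pattern are hypothetical rules invented for the example, not rules taken from the presentation.

```python
import re

# Hypothetical capture rules; real rules would come from the business.
STATE_CODES = {"NC", "SC", "VA"}
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


def validate_record(record):
    """Return a list of rule violations; an empty list means the record is accepted."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is required")
    if record.get("state") not in STATE_CODES:
        problems.append("state is not a recognized code")
    if not ZIP_PATTERN.match(record.get("zip", "")):
        problems.append("zip must be 5 digits or ZIP+4")
    return problems


# A record that satisfies every rule, and one that breaks all three.
good = {"name": "Mark Craver", "state": "NC", "zip": "27513"}
bad = {"name": "  ", "state": "N.C.", "zip": "2751"}

print(validate_record(good))
print(validate_record(bad))
```

Rejecting or flagging the second record at entry is usually cheaper than repairing it later, which is the second effort listed above.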
6
Typical Data Quality Issues
The most common processes in a data quality initiative are
Data Analysis and Standardization
- consistency analysis
- standardization schemes
- gender analysis
- entity analysis
- data parsing and casing.
7
Typical Data Quality Issues (continued)
The most common processes in a data quality initiative are
Matching and Merging
- de-duplication
- householding
Address Verification
- against a CASS-certified database
Geocoding
- data enrichment using third-party data elements.
8
Analysis and Standardization Example
Who is the biggest supplier?

Supplier                 Amount
Anderson Construction    $ 2,333.50
Briggs,Inc               $ 8,200.10
Brigs Inc.               $12,900.79
Casper Corp.             $27,191.05
Caspar Corp              $ 6,000.00
Solomon Industries       $43,150.00
The Casper Corp          $11,500.00
9
Standardization Scheme

Original value      Standardized value
Briggs, Inc         Briggs Inc.
Brigs Inc.          Briggs Inc.
Casper Corp.        Casper Corp.
Caspar Corp         Casper Corp.
The Casper Corp     Casper Corp.
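To see why the scheme matters for the "biggest supplier" question, the following sketch totals the spending from the earlier example twice: once on the raw names and once after mapping each variant to its standardized value. The amounts and names are the ones shown on the slides; the dictionary layout and the function name `totals_by_supplier` are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Variant spellings mapped to the standardized value (names not listed map to themselves).
SCHEME = {
    "Briggs,Inc": "Briggs Inc.",
    "Brigs Inc.": "Briggs Inc.",
    "Caspar Corp": "Casper Corp.",
    "The Casper Corp": "Casper Corp.",
}

# Supplier payments as they appear in the unstandardized data.
PAYMENTS = [
    ("Anderson Construction", Decimal("2333.50")),
    ("Briggs,Inc", Decimal("8200.10")),
    ("Brigs Inc.", Decimal("12900.79")),
    ("Casper Corp.", Decimal("27191.05")),
    ("Caspar Corp", Decimal("6000.00")),
    ("Solomon Industries", Decimal("43150.00")),
    ("The Casper Corp", Decimal("11500.00")),
]


def totals_by_supplier(payments, scheme):
    """Sum payments per supplier after applying the standardization scheme."""
    totals = defaultdict(Decimal)
    for name, amount in payments:
        totals[scheme.get(name, name)] += amount
    return dict(totals)


raw_totals = totals_by_supplier(PAYMENTS, {})        # no standardization
clean_totals = totals_by_supplier(PAYMENTS, SCHEME)  # with standardization

biggest_raw = max(raw_totals, key=raw_totals.get)
biggest_clean = max(clean_totals, key=clean_totals.get)

print(biggest_raw, raw_totals[biggest_raw])
print(biggest_clean, clean_totals[biggest_clean])
```

On the raw names Solomon Industries looks like the biggest supplier at $43,150.00; once the three Casper variants are combined, Casper Corp. totals $44,691.05 and takes first place.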
10
Supplier Spending
[Bar chart: $ spent per supplier, on an axis from 0 to 50,000, for Casper Corp., Solomon Ind., Briggs Inc., and Anderson Cons.]
11
Data Matching Example

Operational System of Records:
01  Mark Carver, SAS, SAS Campus Drive, Cary, N.C.
02  Mark W. Craver, Mark.Craver@sas.com
03  Mark Craver, Systems Engineer, SAS

Data Warehouse:
Mark Carver, SAS, SAS Campus Drive, Cary, N.C.
Mark W. Craver, Mark.Craver@sas.com
Mark Craver, Systems Engineer, SAS
12
Data Quality Process

Operational System of Records:
Mark Carver, SAS, SAS Campus Drive, Cary, N.C.
Mark W. Craver, Mark.Craver@sas.com
Mark Craver, Systems Engineer, SAS

DQ

Data Warehouse:
01  Mark Craver, Systems Engineer, SAS, SAS Campus Drive, Cary, N.C. 27513, Mark.Craver@sas.com
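The matching and merging step in this example can be approximated with a short sketch. It is not the method used by any particular data quality product: the field names, the name-normalization rule, the 0.8 similarity threshold, and the "first non-empty value wins" merge rule are all assumptions chosen for illustration. It uses Python's standard `difflib.SequenceMatcher` for fuzzy name comparison. The ZIP code 27513 in the consolidated record does not appear in any source record, so it is left out here; it would come from a separate address verification step.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# The three operational records; the field names are hypothetical.
RECORDS = [
    {"id": "01", "name": "Mark Carver", "company": "SAS",
     "address": "SAS Campus Drive", "city": "Cary", "state": "N.C."},
    {"id": "02", "name": "Mark W. Craver", "email": "Mark.Craver@sas.com"},
    {"id": "03", "name": "Mark Craver", "title": "Systems Engineer", "company": "SAS"},
]


def match_name(name):
    """Lowercase the name, strip punctuation, and drop single-letter initials."""
    words = re.findall(r"[a-z]+", name.lower())
    return " ".join(w for w in words if len(w) > 1)


def is_match(a, b, threshold=0.8):
    """Treat two records as the same person if their normalized names are similar."""
    ratio = SequenceMatcher(None, match_name(a["name"]), match_name(b["name"])).ratio()
    return ratio >= threshold


def cluster(records):
    """Group records: each record joins the first group containing a matching record."""
    clusters = []
    for rec in records:
        for group in clusters:
            if any(is_match(rec, other) for other in group):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def merge(group):
    """Build one consolidated record from a group of matching records."""
    # Name: the most frequent normalized spelling in the group, title-cased.
    names = Counter(match_name(r["name"]) for r in group)
    merged = {"name": names.most_common(1)[0][0].title()}
    # Other fields: the first non-empty value found wins.
    for rec in group:
        for field, value in rec.items():
            if field not in ("id", "name") and value:
                merged.setdefault(field, value)
    return merged


golden = [merge(g) for g in cluster(RECORDS)]
print(golden)
```

Dropping the middle initial before comparing lets "Mark W. Craver" and "Mark Craver" match exactly, while the transposed "Carver" still scores above the threshold. A production matcher would typically also compare address, company, and email rather than the name alone.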