Published by Richard Julian Hood. Modified over 8 years ago.
1
DWH-Ahsan Abdullah 1
Data Warehousing
Lecture-21: Introduction to Data Quality Management (DQM)
Virtual University of Pakistan
Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
2
Introduction to Data Quality Management (DQM)
3
What is Quality? Informally
Some things are better than others, i.e. they are of higher quality.
How much “better” is better?
Is the right item the best item to purchase? How about after the purchase?
What is quality of service? The bank example.
4
What is Quality? Formally
“Quality is conformance to requirements” (P. Crosby, “Quality is Free”, 1979)
“Degree of excellence” (Webster’s Third New International Dictionary)
5
What is Quality? Examples from the Auto Industry
Quality means meeting customers’ needs, not necessarily exceeding them.
Quality means improving things customers care about, because that makes their lives easier and more comfortable.
Why an example from the auto industry?
6
What is Data Quality? What is Data?
All data is an abstraction of something real.
Example: Muhammad Khan: Height = 5’8”, Weight = 160 lbs, Gender = Male, Age = 35 yrs, Emp_ID = 440
7
What is Data Quality?
Intrinsic Data Quality: Electronic reproduction of reality.
Realistic Data Quality: Degree of utility or value of data to business.
8
Data Quality & Organizations
The intelligent learning organization: High-quality data is an open, shared resource with value-adding processes.
The dysfunctional learning organization: Low-quality data is a proprietary resource with cost-adding processes.
9
Orr’s Laws of Data Quality
Law #1 - “Data that is not used cannot be correct!”
Law #2 - “Data quality is a function of its use, not its collection!”
Law #3 - “Data will be no better than its most stringent use!”
Law #4 - “Data quality problems increase with the age of the system!”
Law #5 - “The less likely something is to occur, the more traumatic it will be when it happens!”
10
Total Quality Management (TQM)
A philosophy of involving everyone in systematic and continuous improvement.
It is customer oriented. Why?
TQM incorporates the concepts of product quality, process control, quality assurance, and quality improvement.
Quality assurance is NOT quality improvement.
11
Co$t of fixing data quality
[Figure: cost of achieving quality plotted against quality level, from lowest quality to highest quality; the cost rises exponentially.]
Defect minimization is economical.
Defect elimination is very, very expensive.
12
Co$t of Data Quality Defects
Controllable Costs: Recurring costs for analyzing, correcting, and preventing data errors.
Resultant Costs: Internal and external failure costs of business opportunities missed.
Equipment & Training Costs
13
Where is data quality critical?
Almost everywhere. Some examples:
Marketing communications.
Customer matching.
Retail house-holding.
Combining MIS systems after an acquisition.
14
Characteristics or Dimensions of Data Quality
Data Quality Characteristic: Definition
Accuracy: Qualitatively assessing lack of error, high accuracy corresponding to small error.
Completeness: The degree to which values are present in the attributes that require them.
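The two definitions above can be made concrete with a small sketch. This is only one possible way to measure them, not the lecture's method: it assumes a trusted reference copy of the values is available for comparison, and it treats accuracy as a simple match rate over the values that are present. The function names and the sample city values are hypothetical.

```python
from typing import Optional, Sequence


def is_present(value: Optional[str]) -> bool:
    """A value counts as present if it is neither None nor blank."""
    return value is not None and value.strip() != ""


def completeness(values: Sequence[Optional[str]]) -> float:
    """Fraction of values that are present in an attribute that requires them."""
    if not values:
        return 1.0
    return sum(1 for v in values if is_present(v)) / len(values)


def accuracy(values: Sequence[Optional[str]], reference: Sequence[str]) -> float:
    """Fraction of present values that match the (assumed) trusted reference."""
    pairs = [(v, r) for v, r in zip(values, reference) if is_present(v)]
    if not pairs:
        return 1.0
    return sum(1 for v, r in pairs if v == r) / len(pairs)


# Hypothetical recorded values and the values they should have been.
recorded = ["Lahore", "Karachi", None, "Islamabad", "Lahor"]
truth = ["Lahore", "Karachi", "Multan", "Islamabad", "Lahore"]

print(completeness(recorded))     # 4 of 5 values are present
print(accuracy(recorded, truth))  # 3 of the 4 present values match
```

Measuring accuracy only over present values keeps the two dimensions separate, so a missing value lowers completeness without also being counted as an error.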
15
Completeness Vs Accuracy
95% accurate and 100% complete OR 100% accurate and 95% complete: which is better?
It depends on (i) the data quality tolerances, (ii) the corresponding application, (iii) the cost of achieving that data quality, and (iv) the business value.
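As a rough illustration of why the answer depends on the application, the sketch below compares the two options under an assumed cost per wrong record and cost per missing record. The cost figures, the record count, and the function name are all made up for the example; they are not from the lecture.

```python
def defect_cost(n_records: int, accuracy: float, completeness: float,
                cost_wrong: float, cost_missing: float) -> float:
    """Estimated cost of defects: missing records plus wrong records among those present."""
    missing = n_records * (1 - completeness)
    wrong = n_records * completeness * (1 - accuracy)
    return wrong * cost_wrong + missing * cost_missing


N = 1000  # hypothetical number of records

# Assumed application 1: acting on a wrong value is costlier than a missing one.
a1 = defect_cost(N, accuracy=0.95, completeness=1.00, cost_wrong=10, cost_missing=2)
b1 = defect_cost(N, accuracy=1.00, completeness=0.95, cost_wrong=10, cost_missing=2)

# Assumed application 2: a missing value is costlier than a wrong one.
a2 = defect_cost(N, accuracy=0.95, completeness=1.00, cost_wrong=2, cost_missing=10)
b2 = defect_cost(N, accuracy=1.00, completeness=0.95, cost_wrong=2, cost_missing=10)

print("application 1:", "95% accurate" if a1 < b1 else "95% complete", "is cheaper")
print("application 2:", "95% accurate" if a2 < b2 else "95% complete", "is cheaper")
```

With these assumed costs the preferred option flips between the two applications, which is the point of the slide: neither option is better in general.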
16
Characteristics or Dimensions of Data Quality
Data Quality Characteristic: Definition
Consistency: A measure of the degree to which a set of data satisfies a set of constraints.
Timeliness: A measure of how current or up to date the data is.
Uniqueness: The state of being only one of its kind or being without an equal or parallel.
Interpretability: The extent to which data is in appropriate languages, symbols, and units, and the definitions are clear.
Accessibility: The extent to which data is available, or easily and quickly retrievable.
Objectivity: The extent to which data is unbiased, unprejudiced, and impartial.
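Three of these dimensions (consistency, timeliness, uniqueness) lend themselves to simple measurement. The sketch below is one hedged way to score them as fractions between 0 and 1. The sample records, the constraints, the one-year freshness threshold, and all function names are assumptions made for illustration, not definitions from the lecture.

```python
from datetime import date
from typing import Callable, Sequence

Record = dict

# Hypothetical employee records; note the repeated emp_id and the negative age.
records = [
    {"emp_id": 440, "age": 35, "gender": "Male", "updated": date(2024, 1, 10)},
    {"emp_id": 441, "age": -3, "gender": "Male", "updated": date(2021, 6, 1)},
    {"emp_id": 440, "age": 52, "gender": "Female", "updated": date(2024, 3, 5)},
]


def consistency(rows: Sequence[Record],
                constraints: Sequence[Callable[[Record], bool]]) -> float:
    """Fraction of records that satisfy every constraint in the set."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if all(c(r) for c in constraints)) / len(rows)


def timeliness(rows: Sequence[Record], as_of: date, max_age_days: int) -> float:
    """Fraction of records updated within max_age_days of the as_of date."""
    if not rows:
        return 1.0
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows)


def uniqueness(rows: Sequence[Record], key: str) -> float:
    """Distinct key values divided by total records (1.0 means no duplicates)."""
    if not rows:
        return 1.0
    return len({r[key] for r in rows}) / len(rows)


# Assumed constraints: a plausible age range and a known set of gender codes.
constraints = [
    lambda r: 0 <= r["age"] <= 120,
    lambda r: r["gender"] in {"Male", "Female"},
]

print(consistency(records, constraints))
print(timeliness(records, as_of=date(2024, 6, 1), max_age_days=365))
print(uniqueness(records, key="emp_id"))
```

Interpretability, accessibility, and objectivity are harder to score mechanically and usually need human judgement, so they are left out of the sketch.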