
Sept - Dec 2009 - w1d11 Beyond Accuracy: What Data Quality Means to Data Consumers CMPT 455/826 - Week 1, Day 1 (based on R.Y. Wang & D.M. Strong)


2 Basic Premise
Many databases are not error-free.
Data quality problems
– go beyond accuracy to include other aspects, such as completeness and accessibility.
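The following minimal Python sketch makes the completeness point concrete. It assumes that records are dictionaries and that a field counts as missing when it is absent or `None`. The function name `completeness` and the sample records are hypothetical, and the paper does not prescribe this formula. Every value that is present may be accurate while the data set is still incomplete.

```python
from typing import Any, Mapping, Sequence


def completeness(records: Sequence[Mapping[str, Any]], fields: Sequence[str]) -> float:
    """Fraction of (record, field) cells that hold a non-missing value.

    A cell counts as filled when the key is present and its value is not None.
    This ratio is one hypothetical way to quantify completeness.
    """
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing is expected, so nothing is missing
    filled = sum(
        1 for record in records for name in fields
        if record.get(name) is not None
    )
    return filled / total


# Hypothetical sample data: each value shown may be correct, yet some are missing.
customers = [
    {"name": "A. Smith", "city": "Saskatoon", "phone": None},
    {"name": "B. Jones", "city": None},
]
print(completeness(customers, ["name", "city", "phone"]))  # 0.5
```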

3 Data Quality
The authors define "data quality" as
– data that are fit for use by data consumers.
Challenge: isn’t that “quality data” rather than data quality?
There are many problems with the use and misuse of the term “quality”. [Further discussion of this problem on the next slide.]

4 Quality
Unfortunately, "quality" is a word with many meanings, depending on a person's perspective.
– When “quality” is used as a noun, it refers to some attribute or feature of a thing, without any evaluation of whether that attribute is good or bad. Systems may be described in terms of an infinite number of noun qualities.
– When “quality” is used as an adjective, it refers to a favorable evaluation of the thing it describes. There is an infinite number of bases for evaluating adjectival qualities. Although all of them are favorable, some types of adjectival quality have no objective basis. The quality of a given object may not be quantifiable without relating it to the quality of some other object.

5 Data Quality Dimensions
The authors define a "data quality dimension" as
– a set of data quality attributes
– that represent a single aspect or construct
– of data quality.
Data quality attributes include, for example:
– accuracy, timeliness, precision, reliability, currency, completeness, relevance, accessibility, and interpretability.
Please note that the “data quality dimension” is something beyond other important data dimensions we are already familiar with:
– “data value”: the data that is actually stored in our database
– “data format”: the data structure that our database uses to store the data value
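One way to keep the three ideas on this slide apart is to give each its own place in a record. The following Python sketch is hypothetical. The class names and field names are illustrative assumptions, not the authors' model. So is the choice to group "timeliness" and "currency" into one dimension.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QualityDimension:
    """A set of quality attributes that together represent one aspect of data quality."""
    name: str
    attributes: frozenset[str]


@dataclass
class StoredDatum:
    """Hypothetical record that keeps value, format and quality separate."""
    value: str                                           # data value: what is actually stored
    data_format: str                                     # data format: how the database stores it
    quality: dict[str, str] = field(default_factory=dict)  # quality attribute -> assessment


def unassessed(datum: StoredDatum, dimension: QualityDimension) -> set[str]:
    """Attributes of the dimension for which no assessment has been recorded."""
    return set(dimension.attributes) - set(datum.quality)


# Hypothetical grouping, for illustration only.
time_related = QualityDimension("time-related", frozenset({"timeliness", "currency"}))
price = StoredDatum(value="19.99", data_format="DECIMAL(7,2)",
                    quality={"currency": "checked 2009-09-08"})
print(unassessed(price, time_related))  # {'timeliness'}
```

Changing `value` or `data_format` leaves `quality` untouched. That separation is the sense in which the quality dimension lies beyond the data value and the data format.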

6 Dimensions and Attributes
Opportunity: The authors assume that the reader understands the distinction between dimensions and attributes.
Attributes:
– are defined in the data definition of a database
– contain identifiable components of the data in the database
– are something you should all be familiar with already (from before this class)

7 Dimensions and Attributes
Dimensions:
– help us organize the data around general concepts, e.g. location, customers, products, finances, time
– help us recognize similar purposes for the data by involving / combining different attributes of the data
  NOTE: different attributes may have different granularities, e.g. a “location” dimension can include the attributes city, province, and country
  NOTE: some attributes may work in combination, e.g. first and last names
– may have further characteristics, such as:
  their own data about the data (referred to as “metadata”)
  their own particular structure, ordering, and/or (sub)dimensions
  potentially sharing data/attributes with other dimensions
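The following Python sketch illustrates attributes with different granularities within one dimension. It models a hypothetical "location" dimension and totals some made-up sales figures at a chosen level. The `Location` class, the `roll_up` function, and the data are assumptions for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A hypothetical 'location' dimension with attributes at three granularities."""
    city: str
    province: str
    country: str


def roll_up(sales: list[tuple[Location, int]], level: str) -> Counter[str]:
    """Total the sales amounts at the chosen granularity: 'city', 'province' or 'country'."""
    totals: Counter[str] = Counter()
    for location, amount in sales:
        totals[getattr(location, level)] += amount
    return totals


# Made-up figures, for illustration only.
sales = [
    (Location("Saskatoon", "SK", "Canada"), 120),
    (Location("Regina", "SK", "Canada"), 80),
    (Location("Calgary", "AB", "Canada"), 50),
]
print(roll_up(sales, "province"))  # Counter({'SK': 200, 'AB': 50})
```

The same rows answer questions at the city, province, or country level. The dimension, rather than any single attribute, is what organizes them.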

8 Dimensions and Attributes
Dimensions are a/the MAJOR FOCUS of this course
– so if they are not clear yet, don’t worry
– but if they are not clear by the end of the course, then you should worry
Now back to our consideration of this introductory paper
– as we consider it, please note all the different concepts that we should consider along with the data itself

9 Hypothesis
Their preliminary conceptual framework for data quality:
– The data must be accessible to the data consumer. For example, the consumer knows how to retrieve the data.
– The consumer must be able to interpret the data. For example, the data are not represented in a foreign language.
– The data must be relevant to the consumer. For example, the data are relevant and timely for use by the data consumer in the decision-making process.
– The consumer must find the data accurate. For example, the data are correct and objective, and come from reputable sources.
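Read literally, the framework is a set of conditions that must all hold for a given consumer. The following is a minimal Python sketch under that reading. Treating each component as a single yes/no judgment is a simplification. The class name, field names, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConsumerAssessment:
    """One data consumer's judgment of a data set (hypothetical fields)."""
    knows_how_to_retrieve: bool   # accessible
    can_interpret: bool           # interpretable, e.g. not in a foreign language
    relevant_and_timely: bool     # relevant
    believed_accurate: bool       # accurate: correct, objective, reputable source


def fit_for_use(a: ConsumerAssessment) -> tuple[bool, list[str]]:
    """Return overall fitness for use and the framework components that failed."""
    checks = {
        "accessible": a.knows_how_to_retrieve,
        "interpretable": a.can_interpret,
        "relevant": a.relevant_and_timely,
        "accurate": a.believed_accurate,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)


report = ConsumerAssessment(True, False, True, True)
print(fit_for_use(report))  # (False, ['interpretable'])
```

Because the input is one consumer's assessment, the same data can be fit for use by one consumer and unfit for another. This matches the "fit for use by data consumers" definition.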

10 Quality Framework
Challenge: The authors did not research major relevant quality frameworks.
ISO 9126-1 Software engineering – Software product quality – Quality characteristics and sub-characteristics
– “categorizes the attributes of software quality into six characteristics, which are further subdivided”:
  Functionality
  Reliability
  Usability
  Efficiency
  Maintainability
  Portability

11 Functionality
“the capability of the software to provide functions which meet stated and implied needs when the software is used under specified conditions.” [ISO 9126-1]
– includes:
  suitability, which evaluates how well the system functions meet the needs of user tasks
  accuracy, which evaluates the achievement of the right results
  interoperability, which evaluates interactions with other systems
  security, which evaluates the ability of the system to withstand unauthorized access and modification

12 Reliability
“the capability of the software to maintain the level of performance of the system when used under specified conditions”. [ISO 9126-1]
– includes:
  maturity, which evaluates the ability of the system to avoid failures, regardless of any faults it has
  fault tolerance, which evaluates the capability of the system to maintain a suitable level of performance in spite of faults or other difficulties
  recoverability, which evaluates the ability of the system to recover its data and performance after a failure

13 Usability
“the capability of the software to be understood, learned, used and liked by the user, when used under specified conditions”. [ISO 9126-1]
– includes:
  understandability, which evaluates the ability of users to understand how, when, and where to use the system
  learnability, which evaluates the ability of users to learn how to use the system (including the effort required)
  operability, which evaluates the ability of the product to be used and controlled by the user
  attractiveness, which evaluates the ability of the product to be “liked” by users

14 Efficiency
“the capability of the software to provide the required performance, relative to the amount of resources used, under stated conditions”. [ISO 9126-1]
– includes:
  time behaviour, which evaluates the appropriateness of response and processing times of the system
  resource utilization, which evaluates the use of resources in performing system functions

15 Maintainability
“the capability of the software to be modified”. [ISO 9126-1]
– includes:
  analysability, which evaluates the ability to identify problems in the system
  changeability, which evaluates the ability to implement modifications to the system
  stability, which evaluates the ability to minimize undesired side effects of modifications
  testability, which evaluates the ability to validate modified software

16 Portability
“the capability of software to be transferred from one environment to another”. [ISO 9126-1]
– includes:
  adaptability, which evaluates the ability to modify software via features rather than reprogramming to meet the needs of different environments
  installability, which evaluates the ability to install software in a given environment
  co-existence, which evaluates the ability of the software to share common resources with other installed software
  replaceability, which evaluates the ability of software to replace other software
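The six characteristics and their sub-characteristics from the preceding slides can be kept as a simple lookup table. This Python sketch copies only the sub-characteristics listed on these slides. The `ISO_9126_1` dictionary and the `characteristic_of` helper are hypothetical conveniences, not part of the standard.

```python
from typing import Optional

# Characteristics -> sub-characteristics, exactly as listed on the slides.
ISO_9126_1 = {
    "functionality": ["suitability", "accuracy", "interoperability", "security"],
    "reliability": ["maturity", "fault tolerance", "recoverability"],
    "usability": ["understandability", "learnability", "operability", "attractiveness"],
    "efficiency": ["time behaviour", "resource utilization"],
    "maintainability": ["analysability", "changeability", "stability", "testability"],
    "portability": ["adaptability", "installability", "co-existence", "replaceability"],
}


def characteristic_of(sub: str) -> Optional[str]:
    """Return the characteristic a sub-characteristic belongs to, or None if unlisted."""
    for characteristic, subs in ISO_9126_1.items():
        if sub in subs:
            return characteristic
    return None


print(characteristic_of("accuracy"))  # functionality
```

"Accuracy" appears here as a functionality sub-characteristic and also in the authors' list of data quality attributes. The same word is doing different work in the two frameworks.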

17 Quality characteristics
The “quality characteristics and sub-characteristics” of ISO 9126-1
– are sub-dimensions of the data quality dimension
So are the various “data quality attributes” of the authors
– (accuracy, timeliness, precision, reliability, currency, completeness, relevance, accessibility, and interpretability)
A “dimension” only becomes an attribute when it is recorded with the data
– (as metadata that can be used computationally)
– It is important to try to be precise in what we say
– That way we can help clarify all these concepts
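The following Python sketch illustrates a quality dimension that has been recorded with the data and can therefore be used computationally. It stores a recording date alongside a value and checks currency against that date. The `Observation` class, the `is_current` function, and the seven-day threshold are hypothetical. What counts as "current" depends on the consumer's context.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """A value stored together with quality metadata about it."""
    value: float
    recorded_on: date   # metadata that supports a currency check
    source: str         # metadata that could support a reputation check


def is_current(obs: Observation, today: date, max_age_days: int) -> bool:
    """Currency check: the value is current if it is at most max_age_days old."""
    return (today - obs.recorded_on).days <= max_age_days


reading = Observation(value=30.0, recorded_on=date(2009, 9, 1), source="station A")
print(is_current(reading, today=date(2009, 9, 8), max_age_days=7))   # True
print(is_current(reading, today=date(2009, 9, 15), max_age_days=7))  # False
```

Without the `recorded_on` field, currency could only be discussed, not computed.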

18 On being precise
English is a very imprecise language
– and it is quite possible for different people to have different expectations of the same concept
– e.g. ISO 9241-11 has a very different definition of “usability” from ISO 9126-1
– Guess which one I use more regularly?
Most people expect data to be precise
– There are problems when it is not what we expect
– Given a weather forecast for a high of 30, think how a Canadian and an American will dress
– But given a forecast for 30F, how will they dress?
– Sometimes we need metadata to help interpret data
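The forecast example can be sketched in Python by storing the unit as metadata next to the number. The `Temperature` class and its `to_celsius` method are hypothetical. The conversion uses the standard formula C = (F - 32) * 5/9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Temperature:
    value: float
    unit: str  # metadata that tells the consumer how to read `value`: "C" or "F"

    def to_celsius(self) -> float:
        """Convert to degrees Celsius using the recorded unit."""
        if self.unit == "C":
            return self.value
        if self.unit == "F":
            return (self.value - 32) * 5 / 9
        raise ValueError(f"unknown unit: {self.unit!r}")


for forecast in (Temperature(30, "C"), Temperature(30, "F")):
    print(f"{forecast.value} {forecast.unit} = {forecast.to_celsius():.1f} C")
# 30 C = 30.0 C
# 30 F = -1.1 C
```

The bare number 30 means a hot day or a cold one depending on the reader. Only the `unit` metadata resolves the ambiguity.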

19 Their “Research”
The first survey identified 179 attributes
The second survey was analyzed by factor analysis to group the attributes into 20 “intermediate dimensions”
Then they moved these 20 into the 4 components of their hypothesized framework
Finally, they revised the names of 2 of their 4 framework components

20 So why this paper?
Not because of its (dubious) research methodology
– where “research data” is forced into a preconceived hypothesis
– where quality attributes are investigated outside of any specific context
This paper
– identifies many different concerns regarding information
  including the need to contextualize it
– demonstrates that we need to develop approaches to help
  design for quality data (whatever that means)
  identify the qualities that are important to our users
  justify (and then evaluate) our efforts at achieving quality
– provides a basis for examples of challenges and opportunities

21 What about future papers?
All of the papers for this course
– have some good points and some failings (like we all do)
– are designed to make you think
– can help you develop better data / information / knowledge systems
But none of the papers
– have all the answers, or
– are a how-to cookbook
So we have to work to figure out how to apply them

