Understanding Data Quality


What is data quality?

Definition
Quality data are accurate depictions of the real world that are consistent across an enterprise, secure and accessible, delivered in a timely manner, and suitable for their intended applications (Redman, 2001).
The state of completeness, consistency, timeliness and accuracy that makes data appropriate for a specific use (Government of British Columbia).
Data quality institutionalizes a set of repeatable processes to continuously monitor data and improve data accuracy, completeness, timeliness and relevance (Holly Hyland and Lisa Elliott).

Dimensions (National Center for Education Statistics, US): Accuracy, Completeness, Consistency, Utility/Validity, Timeliness, Security, Accessibility.

Accuracy: The data represent the truth. The best up-front tool for data accuracy is a "single, exhaustive data dictionary." The data dictionary must be published, understood, and used. It is the definitive source for data elements and includes data definitions, formats for each type of data, code lists, and restrictions on values or ranges.

Completeness: All required elements are reported.

Consistency: Everyone who handles the data shares an understanding of the data and their definitions.

Utility/Validity: The data provide the right information to answer the questions that are asked.

Timeliness: Quality data are accessible to users at the correct time in order to provide information for decision-making.

Security: Quality data are secured to protect privacy and to prevent tampering.

Accessibility: Data quality results from data use. Data must be available to authorized staff to improve decision-making.
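A data dictionary is most effective when it is machine readable, so that records can be checked against it automatically. The following is a minimal sketch of that idea in Python; the field names, code list and range are invented for illustration and are not from NCES.

```python
# Minimal sketch: a machine-readable data dictionary used to check records.
# Field names, the code list and the range below are hypothetical examples.
DATA_DICTIONARY = {
    "grade_level": {"type": int, "allowed_range": (0, 12)},
    "enrollment_status": {"type": str, "code_list": {"A", "I", "W"}},
}

def validate_record(record: dict) -> list[str]:
    """Return a list of data dictionary violations found in one record."""
    errors = []
    for field, rules in DATA_DICTIONARY.items():
        if field not in record:
            errors.append(f"{field}: missing (completeness)")
            continue
        value = record[field]
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: wrong type (accuracy)")
            continue
        if "allowed_range" in rules:
            low, high = rules["allowed_range"]
            if not low <= value <= high:
                errors.append(f"{field}: {value} outside range {low}..{high}")
        if "code_list" in rules and value not in rules["code_list"]:
            errors.append(f"{field}: {value!r} not in the published code list")
    return errors

print(validate_record({"grade_level": 14, "enrollment_status": "A"}))
# ['grade_level: 14 outside range 0..12']
```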

Understanding of data handling

Read this passage. How many processes do you notice? What processes are involved? How are the data handled in each process?

The first stage in data analysis is the preparation of an appropriate form in which the relevant data can be collected and coded in a format suitable for entry into a computer; this stage is referred to as data processing. The second stage is to review the recorded data, checking for accuracy, consistency and completeness; this process is often referred to as data editing. Next, the investigator summarizes the data in a concise form to allow subsequent analysis—this is generally done by presenting the distribution of the observations according to key characteristics in tables, graphs and summary measures. This stage is known as data reduction. Only after data processing, editing and reduction should more elaborate statistical manipulation of the data be pursued.
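As a hypothetical illustration of these three stages, the sketch below codes raw entries into analysable types (processing), checks them for accuracy and consistency (editing), and summarises the result (reduction); the records and validity rules are invented.

```python
import statistics

# Hypothetical raw records, as they might arrive from data collection forms.
raw = [
    {"id": 1, "age": "34", "sex": "F"},
    {"id": 2, "age": "-5", "sex": "M"},   # implausible value, caught in editing
    {"id": 3, "age": "41", "sex": "F"},
]

# Stage 1: data processing. Code the collected values into analysable types.
coded = [{**r, "age": int(r["age"])} for r in raw]

# Stage 2: data editing. Check accuracy and consistency; drop invalid records.
edited = [r for r in coded if 0 <= r["age"] <= 120 and r["sex"] in {"F", "M"}]

# Stage 3: data reduction. Summarise the distribution with key measures.
print("n =", len(edited), "| mean age =", statistics.mean(r["age"] for r in edited))
```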

Data handling is the process of ensuring that data are stored, archived or disposed of in a safe and secure manner during and after the completion of any program/project. This includes the development of policies and procedures to manage data handled electronically as well as through non-electronic means.

Proper planning for data handling can result in efficient and economical storage, retrieval, and disposal of data.

In the case of data handled electronically, data integrity is a primary concern to ensure that recorded data is not altered, erased, lost or accessed by unauthorized users.
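One widely used technique for detecting alteration is to record a cryptographic fingerprint of each file when it is stored and to recompute it before use; a changed digest reveals tampering or corruption. This is a sketch of that single control, not a complete integrity strategy (it does not, for example, prevent loss or unauthorized access).

```python
import hashlib

def fingerprint(path: str) -> str:
    """Return the SHA-256 digest of a file, read in small chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record fingerprint("records.csv") when the file is archived; recompute it
# before each use. A mismatch means the file was altered or corrupted.
```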

Issues that should be considered in ensuring the integrity of data handled include the following:
- Type of data handled and its impact.
- Type of media containing the data and its storage capacity, handling and storage requirements, reliability, longevity, retrieval effectiveness, and ease of upgrade to newer media.
- Data handling responsibilities/privileges, that is, who can handle which portion of the data, at what point during the program/project, for what purpose, and so on.
- Data handling procedures that describe how long the data should be kept, and when, how, and by whom the data should be handled for storage, sharing, archival, retrieval and disposal purposes.
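Such procedures can also be captured in machine-readable form so that tooling can enforce them. The sketch below is a hypothetical illustration; the data categories, roles and retention periods are invented.

```python
# Hypothetical data handling policy, expressed as machine-readable rules.
HANDLING_POLICY = {
    "survey_responses": {
        "media": "encrypted network storage",
        "may_handle": {"data_manager", "analyst"},
        "retention_years": 7,
        "disposal": "secure wipe after the retention period",
    },
    "interview_audio": {
        "media": "offline archive",
        "may_handle": {"data_manager"},
        "retention_years": 3,
        "disposal": "physical destruction of the media",
    },
}

def can_handle(role: str, category: str) -> bool:
    """Check a handling privilege against the policy."""
    return role in HANDLING_POLICY[category]["may_handle"]

assert can_handle("analyst", "survey_responses")
assert not can_handle("analyst", "interview_audio")
```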

Data quality dimensions in the literature include dimensions such as accuracy, reliability, importance, consistency, precision, timeliness, understandability, conciseness and usefulness (Wand and Wang, 1996: p. 92).

Kahn et al. (1997) developed a data quality framework based on product and service quality theory, in the context of delivering quality information to information consumers.

Four levels of information quality were defined: sound information, useful information, usable information, and effective information. The framework was used to define a process model to help organisations plan to improve data quality.

A more formal approach to data quality is provided in the framework of Wand and Wang (1996), who use Bunge's ontology to define data quality dimensions. They formally define five intrinsic data quality problems: incomplete, meaningless, ambiguous, redundant, and incorrect data.

Semiotic Theory
Semiotic theory concerns the use of symbols to convey knowledge. Stamper (1992) defines six levels for analysing symbols. These are the physical, empirical, syntactic, semantic, pragmatic and social levels.

Data quality can be considered at each of these levels:
Physical and Empirical: concerned with the physical media and channels for the communication of data
Syntactic: concerned with the structure of data
Semantic: concerned with the meaning of data
Pragmatic: concerned with the usage of data (usability and usefulness)
Social: concerned with the shared understanding of the meaning of the data/information generated from the data
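As a hypothetical illustration of the difference between the syntactic and semantic levels, the sketch below checks a date field twice: a value can be structurally well formed yet still fail to denote a real date.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def syntactic_check(value: str) -> bool:
    """Syntactic level: is the structure of the value well formed?"""
    return bool(DATE_PATTERN.match(value))

def semantic_check(value: str) -> bool:
    """Semantic level: does the value denote a real calendar date?"""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

print(syntactic_check("2023-02-30"), semantic_check("2023-02-30"))  # True False
```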

Data Quality: How good is your data?
This is an example of data quality as perceived by a company that produces GPS products.

Scale: the ratio of distance on a map to the equivalent distance on the earth's surface. Primarily an output issue: at what scale do I wish to display?
Precision or Resolution: the exactness of measurement or description. Determined by input; data can be output at lower (but not higher) resolution.
Accuracy: the degree of correspondence between data and the real world. Fundamentally controlled by the quality of the input.
Lineage: the original sources for the data and the processing steps they have undergone.
Currency: the degree to which data represent the world at the present moment in time.
Documentation or Metadata: data about data, recording all of the above.
Standards: common or "agreed-to" ways of doing things. Data built to standards are more valuable since they are more easily shareable.
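A small, hypothetical illustration of the distinction between precision and accuracy for a GPS latitude (the coordinates are invented):

```python
true_lat = 3.139003  # the real-world value being measured

reading_a = 3.151298  # six decimal places: high precision, low accuracy
reading_b = 3.14      # two decimal places: low precision, high accuracy

# Accuracy is correspondence with the real world, regardless of digits carried.
print(abs(true_lat - reading_a))  # large error despite the extra digits
print(abs(true_lat - reading_b))  # small error despite the coarse resolution

# Resolution can always be lowered on output, but never genuinely raised.
print(round(reading_a, 2))
```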

DISCUSSION
Discuss strategies for ensuring quality data in all the categories listed in the table, according to the semiotic levels given, in the context of educational settings or institutions.

Semiotic Level | Goal | Dimension | Improvement Strategy
Syntactic | Consistent | Well-defined (perhaps formal) syntax |
Semantic | Complete and Accurate | Comprehensive, Unambiguous, Meaningful, Correct |
Pragmatic | Usable and Useful | Timely, Concise, Easily Accessed, Reputable |
Social | Shared understanding of meaning | Understood, Awareness of Bias |

Semiotic Level | Goal | Dimension | Improvement Strategy
Syntactic | Consistent | Well-defined (perhaps formal) syntax | Corporate data model; Syntax checking; Training for data producers
Semantic | Complete and Accurate | Comprehensive, Unambiguous, Meaningful, Correct | Training for data producers; Minimise data transformations and transcriptions
Pragmatic | Usable and Useful | Timely, Concise, Easily Accessed, Reputable | Monitoring data consumers; Explanation and visualisation; High quality data delivery systems; Data tagging
Social | Shared understanding of meaning | Understood, Awareness of Bias | Viewpoint analysis; Conflict resolution; Cultural immersion

4 Common Data Challenges Faced During Modernization:

1. Data is fragmented across multiple source systems. Each system holds its own notion of the policyholder, which makes developing a unified, user-centric view extremely difficult. The situation is further complicated because the level and amount of detail captured in each system are incongruent.

2. Data formats across systems are inconsistent. This arises when an organization operates systems from multiple vendors and each vendor has implemented a custom data representation. Responding to evolving business needs has also diluted the meaning and usage of data fields: the same field represents different data depending on the context. (A reconciliation sketch follows this list.)

3. Data is lacking in quality. When an organization's units are organized by line of function, each unit holds expertise in a specific field and operates fairly autonomously, which has resulted in differing data-entry practices. Moreover, the data models of decades-old systems weren't designed to handle today's business needs.

4. Systems are only available in defined windows during the day, not 24/7. If the organization's core systems are batch oriented, updates are not visible in the system until batch processing has completed. Furthermore, while batch processing is taking place, the systems are unavailable, both for querying and for accepting data. Another aspect affecting availability is the closed nature of the systems: they do not expose functionality for reuse by other systems.
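As a hypothetical sketch of how the format inconsistencies in challenge 2 are typically reconciled, the code below maps two invented vendor layouts into one canonical policyholder record.

```python
from datetime import datetime

# Hypothetical vendor layouts; the field names and formats are invented.
def from_vendor_a(rec: dict) -> dict:
    return {
        "policyholder": rec["NAME"].title(),
        "dob": datetime.strptime(rec["DOB"], "%m/%d/%Y").date(),
    }

def from_vendor_b(rec: dict) -> dict:
    return {
        "policyholder": f'{rec["first"]} {rec["last"]}'.title(),
        "dob": datetime.strptime(rec["birth_date"], "%Y-%m-%d").date(),
    }

a = from_vendor_a({"NAME": "JANE DOE", "DOB": "07/04/1985"})
b = from_vendor_b({"first": "jane", "last": "doe", "birth_date": "1985-07-04"})
assert a == b  # the same policyholder, reconciled across systems
```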

Lack of Centralized Approach Hurts Data Quality
"Data quality is the foundation for any data-driven effort, but the quality of information globally is poor. Organizations need to centralize their approach to data management to ensure information can be accurately collected and effectively utilized in today's cross-channel environment." (Thomas Schutz, senior vice president and general manager of Experian Data Quality)