Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.

Slides:



Advertisements
Similar presentations
UNIT-2 Data Preprocessing LectureTopic ********************************************** Lecture-13Why preprocess the data? Lecture-14Data cleaning Lecture-15Data.
Advertisements

Forecasting Using the Simple Linear Regression Model and Correlation
P20 Seminar November 12, Statistical Collaboration Part 1: Working with Statisticians from Start to Finish Part 2: Essentials of Data Management.
Accounts Payable–1099 Processing 1Freedom Systems – Accounts Payable – 1099 Processing WELCOME TO THE ACCOUNTS PAYABLE – 1099 PROCESSING WEBINAR WE WILL.
Multiple Indicator Cluster Surveys Survey Design Workshop
By Mary Anne Poatsy, Keith Mulbery, Eric Cameron, Jason Davidson, Rebecca Lawson, Linda Lau, Jerri Williams Chapter 9 Fine-Tuning the Database 1 Copyright.
3.1 Data and Information –The rapid development of technology exposes us to a lot of facts and figures every day. –Some of these facts are not very meaningful.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
Dr Samah Kotb Lecturer of Biochemistry 1 CLS 432 Dr. Samah Kotb Nasr El-deen Biochemistry Clinical practice CLS 432 Dr. Samah Kotb Nasr.
WELL-DESIGNED DATABASES Process faster Easy to develop and maintain Easy to read and write code.
Database Design Concepts INFO1408 Term 2 week 1 Data validation and Referential integrity.
© 2000 Prentice-Hall, Inc. Chap Forecasting Using the Simple Linear Regression Model and Correlation.
Copyright ©2011 Pearson Education 15-1 Chapter 15 Multiple Regression Model Building Statistics for Managers using Microsoft Excel 6 th Global Edition.
Chapter 9 Database Management
U.S. Department of the Interior U.S. Geological Survey Tutorials on Data Management Lesson 6: Manage Quality CC image by Shane Melaugh on Flickr.
Software Development Unit 2 Databases What is a database? A collection of data organised in a manner that allows access, retrieval and use of that data.
Entity-Relationship Design
PHASE 3: SYSTEMS DESIGN Chapter 7 Data Design.
MGT-491 QUANTITATIVE ANALYSIS AND RESEARCH FOR MANAGEMENT OSMAN BIN SAIF Session 15.
McGraw-Hill/Irwin © 2004 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 9 Processing the Data.
Computer Based Information Systems Control UAA – ACCT 316 – Fall 2003 Accounting Information Systems Dr. Fred Barbee.
1 Accreditation and Certification: Definition  Certification: Procedures by which a third party gives written assurance that a product, process or service.
Software Metrics - Data Collection What is good data? Are they correct? Are they accurate? Are they appropriately precise? Are they consist? Are they associated.
PHP meets MySQL.
Quality Assurance & Quality Control Kristin Vanderbilt, Ph.D. Sevilleta LTER.
QA/QC and QUALIFIERS LOU ANN FISHER CITY OF STILLWATER, OK
Research Methodology Lecture No : 21 Data Preparation and Data Entry.
TIMES 3 Technological Integrations in Mathematical Environments and Studies Jacksonville State University 2010.
King Fahd University of Petroleum & Minerals Department of Management and Marketing MKT 345 Marketing Research Dr. Alhassan G. Abdul-Muhmin Editing and.
Normalization (Codd, 1972) Practical Information For Real World Database Design.
Objectives of Control The objectives of control are:  To ensure that all data are processed  To preserve the integrity of maintained data  To detect,
I.Information Building & Retrieval Learning Objectives: the process of Information building the responsibilities and interaction of each data managing.
Discovering Computers Fundamentals Fifth Edition Chapter 9 Database Management.
Chapter 10 Database Management. Chapter 10 Objectives Discuss the functions common to most DBMSs Identify the qualities of valuable information Explain.
Emission Inventory Quality Assurance/Quality Control (QA/QC) Melinda Ronca-Battista ITEP/TAMS Center.
Introduction to Databases Trisha Cummings. What is a database? A database is a tool for collecting and organizing information. Databases can store information.
Problem Solving Techniques. Compiler n Is a computer program whose purpose is to take a description of a desired program coded in a programming language.
Systems Life Cycle. Know the elements of the system that are created Understand the need for thorough testing Be able to describe the different tests.
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Aluquerque, NM.
Developing Computer Games Testing & Documentation.
Managing and Curating Data Chapter 8. Introduction Data organization Data management Data curation Raw data is required to repeat a scientific study Any.
What have we learned?. What is a database? An organized collection of related data.
PROCESSING OF DATA The collected data in research is processed and analyzed to come to some conclusions or to verify the hypothesis made. Processing of.
Best Practices NMFS EDM June 18, 2013M. Brady. Context June 18, 2013M. Brady.
Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don.
DEVA Data Management Workshop Devil’s Hole Pupfish Project Quality Assurance.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Chapter Fifteen Chapter 15.
RESEARCH METHODS Lecture 29. DATA ANALYSIS Data Analysis Data processing and analysis is part of research design – decisions already made. During analysis.
Biochemistry Clinical practice CLS 432 Dr. Samah Kotb Lecturer of Biochemistry 2015 Introduction to Quality Control.
Microsoft Office 2013: In Practice Chapter 2 Using Design View, Data Validation, and Relationships Copyright © 2014 by The McGraw-Hill Companies, Inc.
Information Access Mgt09/12/971 Entity-Relationship Design Information Level Design.
Flat Files Relational Databases
Quality Assurance & Quality Control
Chapter 3 Data Control Ensure the Accurate and Complete data is entering into the data processing system.
Data Management in Clinical Research Rosanne M. Pogash, MPA Manager, PHS Data Management Unit January 12,
24 Nov 2007Data Management and Exploratory Data Analysis 1 Yongyuth Chaiyapong Ph.D. (Mathematical Statistics) Department of Statistics Faculty of Science.
PREPARED BY: PN. SITI HADIJAH BINTI NORSANI. LEARNING OUTCOMES: Upon completion of this course, students should be able to: 1. Understand the structure.
Session 6: Data Flow, Data Management, and Data Quality.
VOCAB REVIEW. A field that can be computed from other fields Calculated field Click for the answer Next Question.
SOFTWARE TESTING LECTURE 9. OBSERVATIONS ABOUT TESTING “ Testing is the process of executing a program with the intention of finding errors. ” – Myers.
Water quality data - providing information for management
Auditing Information Technology
Presented to:- Dr. Dibyojyoti Bhattacharjee
Normalization Referential Integrity
Data Quality By Suparna Kansakar.
Verification and Validation Unit Testing
Designs for Data Integrity, validations, security and controls
Lecture 1: Descriptive Statistics and Exploratory
Presentation transcript:

Module 6

Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices ◦ Before data collection ◦ During data collection/entry ◦ After data collection/entry

Data Management Plans  After completing this lesson, the participant will be able to: ◦ Define data quality control ◦ Define data quality assurance ◦ Perform quality control and assurance on their data at all stages of the research cycle (before data collection, during data collection/entry, and afterward)

Data Management Plans CollectAssureDescribeDepositPreserveDiscoverIntegrateAnalyze Managing the quality of data during data collection and entry is important, but data quality is monitored and managed throughout the data life cycle

Data Management Plans  Process or phenomenon, other than the one of interest, that affects the variable value  Erroneous values

Data Management Plans  Errors of Commission ◦ Incorrect or inaccurate data ◦ Causes: malfunctioning instrument, mistyped data  Errors of Omission ◦ Data or metadata not recorded ◦ Incomplete data record ◦ Causes: inadequate documentation, human error, anomalies in the field  Logic Errors ◦ Improbable values or value-combinations ◦ Usually detected during data review by a human who is a subject matter expert

Data Management Plans Preventing bad data from contaminating a data set  Quality assurance ◦ Activities that ensure quality of data before collection ◦ Verifying the quality of data obtained from others before use  Quality control ◦ Monitoring and maintaining the quality of data during the research life cycle

Data Management Plans  Create a suitable structure to store the data ◦ Database or well-designed spreadsheet  Define & enforce standards ◦ Metadata ◦ Formats ◦ Codes ◦ Measurement units  Assign responsibility for data quality ◦ Be sure assigned person is educated in QA/QC

Data Management Plans  Double-entry ◦ Data keyed in by two independent people and then checked for agreement with computer verification  Use text-to-speech program to read data back ◦ Serves as a ‘second person’ to help when one is not available  Use a properly designed database ◦ Atomize data: each value is stored (changed) in only one place ◦ Minimize errors using column, row, and relationship validation ◦ Use consistent terminology  Document all changes to data ◦ Avoids duplicate error checking ◦ Allows undo if necessary

Data Management Plans Data Review and Certification for Use It is important to review the data for quality and to certify it for use. Certification allows others to use the data knowing it meets a predetermined level of quality and completeness. Errors found need to be corrected, with documentation of the correction activity annotated on original data sheets. A person familiar with the kind of data being reviewed is essential, because some errors are cryptic and require recognition of a logical inconsistency (ex: incorrect equipment indicated for a particular type of parameter measured)

Data Management Plans Table 1 from Edwards (2000). An illegal-data filter, written in SAS (the data set "All" exists prior to this DATA step, containing the data to be filtered, variable names Y1, Y2, etc., and an observation identifier variable ID). Data Checkum; Set All; message=repeat(" ",39); If Y1 1 then do; message="Y1 is not on the interval [0,1]"; output; end; If Floor(Y2) NE Y2 then do; message="Y2 is not an integer"; output; end; If Y3>Y4 then do; message="Y3 is larger than Y4"; output; end; : (add as many such statements as desired...) : If message NE repeat(" ",39); keep ID message; Proc Print Data=Checkum; Example of Illegal Data Filter

Data Management Plans Look for outliers  Outliers: extreme values for a variable given the statistical model being used  Goal is not to eliminate outliers but to identify potential data contamination, and verify true values

Data Management Plans  Methods to look for outliers ◦ Graphical  Normal probability plots  Regression  Scatter plots  Maps ◦ Statistical  Be sure to transform data when looking for outliers graphically on a graph

Data Management Plans  Document your QA/QC activities to certify data for use ◦ Don’t waste anyones time by forcing them to re-check data that were already QA/QC’d  Record all changes made to the data ◦ All changes from the original record need to be documented to be defensible

Data Management Plans Data contamination results from a process or phenomenon that adversely affects data integrity or allows erroneous values to enter a dataset Quality Assurance and quality control are strategies to: -prevent errors from entering a data set -ensure quality of data -monitor and maintain the quality of data It is important to define and enforce quality assurance and quality control standards before, during, and after the collection and entry of data

Data Management Plans  Edwards, D, Data Quality Assurance. In Ecological Data: Design, Management and Processing. WK Michener and JW Brunt, Eds. Blackwell Science. p  Cook, RB, RJ Olson, P Kanciruk, and LA Hook, Best practices for preparing ecological data sets to share and archive. Bulletin of the Ecological Society of America 82(2):  Chapman, AD, Principles of Data Quality. Report for the Global Biodiversity Information Facility, 2004, Copenhagen. resources/download-publications/bookelets/ resources/download-publications/bookelets/  Grubbs’ Test for outliers. Wikipedia entry, accessed November  Vanderbilt, K. Quality Assurance & Quality Control. Presentation.

Data Management Plans We want to hear from you! CLICK the arrow to take our short survey.