Data Quality: Data Cleaning
Beverly Musick, M.S.
May 20, 2010

This module was recorded at the health informatics training course, data management series, offered by the Regional East African Centre for Health Informatics (REACH-Informatics) in Eldoret, Kenya. Funding was made possible by NIH’s Fogarty Center. The training was held at the Academic Model Providing Access to Healthcare (AMPATH), a USAID-funded program, supported by the Regenstrief Institute at Indiana University. The modules were created in collaboration with the School of Informatics at IUPUI.

License: Creative Commons Attribution-ShareAlike 3.0 Unported License

Quality Control
– Quality control is the process of monitoring and maintaining the reliability, accuracy, and completeness of the data during the conduct of the project.
– It requires a multidisciplinary team that includes clinicians, data entry staff, statisticians, systems administrators, and data managers.
– It requires sharing knowledge about disease progression, clinical practice patterns, effects of medical treatments, relationships between variables, and expected timing of events.

Ensuring Data Quality
Point of Assessment
– Collection: review the form before the patient leaves the clinic
– Entry: range restrictions, logical checks
– Post-entry: clean-up queries
– Statistical analysis: data trends

Ensuring Data Quality (cont.)
To ensure data quality, the data manager needs to understand:
– Goals of the program
– Standards of operation
– Impact of the intervention or program
– Relationships between variables
– Expected timing of events

Clean-up Queries: Missing Data
– Generate reports on the percentage of missing data for each item on the data collection forms.
– Highlight differences between programs or specific groups of patients in order to identify methods to minimize missing data.
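A missing-data report of this kind can be sketched in a few lines of Python. This is only an illustration: the field names (`clinic`, `weight`, `cd4`), the sample records, and the rule that `None` or an empty string counts as missing are all assumptions, not part of the original module.

```python
from collections import defaultdict

def percent_missing(records, fields, group_key=None):
    """Return {group: {field: percent missing}}.

    If group_key is None, all records are reported as one group named "all".
    A value counts as missing when it is None or an empty string (assumption).
    """
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group = rec.get(group_key) if group_key else "all"
        totals[group] += 1
        for field in fields:
            if rec.get(field) in (None, ""):
                missing[group][field] += 1
    return {
        group: {f: 100.0 * missing[group][f] / n for f in fields}
        for group, n in totals.items()
    }

# Hypothetical example data: two clinics, two form items.
records = [
    {"clinic": "A", "weight": 61.0, "cd4": None},
    {"clinic": "A", "weight": None, "cd4": 350},
    {"clinic": "B", "weight": 70.5, "cd4": 410},
    {"clinic": "B", "weight": 55.2, "cd4": ""},
]
report = percent_missing(records, ["weight", "cd4"], group_key="clinic")
for clinic, item_rates in sorted(report.items()):
    print(clinic, item_rates)
```

Grouping by program or clinic makes differences in completeness visible side by side, which is the comparison the slide recommends.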

Clean-up Queries: Date Comparison
– Ensure that the date of birth precedes all other dates.
– Calculate age and verify that the date of birth makes sense.
– For patients who have died, ensure that the date of death follows all other dates.
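The birth and death ordering checks might look like the following sketch. The record layout (`birth_date`, `death_date`, `visit_dates`) and the sample patient are hypothetical; a visit on the date of birth itself is not flagged here, which is a design choice rather than something the slides specify.

```python
from datetime import date

def date_order_problems(patient):
    """Return messages for dates that break the birth/death ordering."""
    problems = []
    birth = patient.get("birth_date")
    death = patient.get("death_date")
    visits = patient.get("visit_dates", [])
    if birth is not None:
        # Date of birth should precede every other recorded date.
        for d in visits + ([death] if death else []):
            if d < birth:
                problems.append(f"{d} precedes date of birth {birth}")
    if death is not None:
        # Date of death should follow every other recorded date.
        for d in visits:
            if d > death:
                problems.append(f"{d} follows date of death {death}")
    return problems

def age_in_years(birth, on):
    """Completed years between birth and a reference date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

# Hypothetical patient with one visit before birth and one after death.
patient = {
    "birth_date": date(1980, 3, 15),
    "death_date": date(2009, 12, 31),
    "visit_dates": [date(1979, 12, 1), date(2009, 6, 10), date(2010, 2, 1)],
}
for message in date_order_problems(patient):
    print(message)
print("age at last valid visit:", age_in_years(patient["birth_date"], date(2009, 6, 10)))
```

An implausible computed age (negative, or far above the oldest expected patient) is a quick signal that the date of birth itself needs to be checked against the source document.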

Clean-up Queries: Date Comparison (cont.)
– Generate a clean-up list for observation dates that are after today’s date or, preferably, the date of data entry.
– Generate a similar list for observation dates that precede the date of inception of your program.
– Examine the interval between observation/visit dates to ensure that the expected time frame is reflected.
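A clean-up list for these three date checks could be generated as in the sketch below. The function name, the program start date, and the 120-day maximum gap are assumptions chosen for illustration; the expected visit interval depends on the program's own schedule.

```python
from datetime import date

def flag_observation_dates(obs_dates, entry_date, program_start, max_gap_days):
    """Return (date, reason) pairs for observation dates that need review."""
    flags = []
    for d in obs_dates:
        if d > entry_date:
            flags.append((d, "after date of data entry"))
        if d < program_start:
            flags.append((d, "before program inception"))
    # Compare each visit with the one before it, in chronological order.
    ordered = sorted(obs_dates)
    for earlier, later in zip(ordered, ordered[1:]):
        gap = (later - earlier).days
        if gap > max_gap_days:
            flags.append((later, f"{gap} days since previous visit"))
    return flags

# Hypothetical visit dates for one patient.
visits = [date(2004, 12, 1), date(2009, 1, 10), date(2009, 2, 9), date(2010, 6, 1)]
flags = flag_observation_dates(
    visits,
    entry_date=date(2010, 5, 20),
    program_start=date(2005, 1, 1),
    max_gap_days=120,  # assumed expected maximum interval between visits
)
for flagged_date, reason in flags:
    print(flagged_date, reason)
```

Using the date of data entry rather than today's date as the upper bound catches more errors, because a date that was in the future when it was keyed in may no longer be in the future when the query runs.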

Clean-up Queries: Checks on Numeric Data
– Confirm that all values are within the expected range.
– Investigate possible outliers by verifying against the source document, comparing with other values for the same subject, or cross-referencing with other variables, such as current illnesses in the case of an elevated lab result.
– Confirm that values make sense with respect to the patient’s age, gender, disease status, etc.
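A range check can be as simple as a lookup table of limits per variable, as sketched here. The variable names and limits are placeholders for illustration only; real limits should come from clinicians and laboratory reference ranges, and may need to vary by age or gender.

```python
# Hypothetical expected ranges (low, high); not clinical recommendations.
EXPECTED_RANGES = {
    "systolic_bp": (60, 250),
    "temperature_c": (34.0, 42.0),
}

def out_of_range(record, ranges=EXPECTED_RANGES):
    """Return {field: value} for non-missing values outside the expected range."""
    flagged = {}
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged[field] = value
    return flagged

# A likely keying error: 1200 entered instead of 120.
print(out_of_range({"systolic_bp": 1200, "temperature_c": 37.1}))
```

A flagged value is a prompt to check the source document, not proof of an error; genuinely extreme values do occur.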

Clean-up Queries: Checks on Adult Heights/Weights
– Calculate BMI from height and weight: BMI = weight (kg) / height (m)²
– Most values should be between 10 and 40.
– Flag unexpected weight fluctuations.
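The BMI check and a simple weight-fluctuation flag might be sketched as follows. The 10 to 40 bounds come from the slide; the 20% visit-to-visit change threshold is an assumption chosen only to make the example concrete.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def bmi_flag(weight_kg, height_m, low=10, high=40):
    """Return (bmi, needs_review); needs_review is True outside [low, high]."""
    value = bmi(weight_kg, height_m)
    return value, not (low <= value <= high)

def weight_jumps(weights, max_change_fraction=0.2):
    """Return indices of weights that differ from the previous weight
    by more than max_change_fraction (an assumed threshold)."""
    flagged = []
    for i in range(1, len(weights)):
        previous = weights[i - 1]
        if abs(weights[i] - previous) / previous > max_change_fraction:
            flagged.append(i)
    return flagged

print(bmi_flag(70, 1.75))   # plausible adult
print(bmi_flag(70, 175))    # height keyed in centimetres instead of metres
print(weight_jumps([60, 61, 85, 62]))
```

A BMI far outside the range usually points to a unit or keying error in height or weight rather than a true value, as in the centimetres example above.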

Clean-up Queries: Checks on Pediatric Heights/Weights
– Calculate weight-for-age Z-scores using Epi Info NutStat software or SAS software (charts/resources/sas.htm).
– Review date of birth, visit date, age, and weight for Z-scores less than -5 or greater than 5.
– Similar checks can be made with height-for-age and weight-for-height Z-scores.

Clean-up Queries: Checks on Numeric Data (cont.)
– Review longitudinal data.
– If special missing values are coded, ensure that the codes do not overlap with valid data.
– For lab results, a qualifier such as "<" or ">" should be stored in a separate variable.
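Both checks on this slide can be illustrated briefly. The sketch assumes that lab qualifiers are comparison signs such as "<" or ">" attached to the number (for example, a result reported as below a detection limit), and that special missing codes are numeric values such as 99 or 999; both are assumptions for illustration.

```python
import re

def split_qualifier(raw):
    """Split a raw lab result such as '<400' into (qualifier, numeric value).

    The qualifier is '' when the result is a plain number.
    """
    match = re.fullmatch(r"\s*([<>]=?)?\s*(\d+(?:\.\d+)?)\s*", raw)
    if match is None:
        raise ValueError(f"unrecognized lab result: {raw!r}")
    return match.group(1) or "", float(match.group(2))

def overlapping_codes(missing_codes, valid_low, valid_high):
    """Return the special missing codes that fall inside the valid data range."""
    return [code for code in missing_codes if valid_low <= code <= valid_high]

print(split_qualifier("<400"))
print(split_qualifier("1250"))
# Hypothetical: age in years coded 99 or 999 for missing, valid range 0 to 110.
print(overlapping_codes([99, 999], 0, 110))
```

Storing the qualifier separately keeps the result variable numeric, so it can still be range-checked and summarized, while the information that the value is a bound is preserved.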

Clean-up Queries: Cross-Variable Checks
– Confirm that there is consistency between gender and other variables such as pregnancy.
– Look for contraindicated medication combinations.
– Look for data that may have been recorded under the wrong patient ID.
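The first two cross-variable checks could be sketched as below. The field names and the coding of gender as "M"/"F" are assumptions, and the medication pair is a deliberate placeholder (`drug_a`, `drug_b`); a real list of contraindicated combinations must come from clinical guidance.

```python
from itertools import combinations

# Placeholder pairs for illustration only; not real clinical content.
CONTRAINDICATED = {frozenset({"drug_a", "drug_b"})}

def cross_variable_problems(record):
    """Return messages for inconsistencies between variables in one record."""
    problems = []
    if record.get("gender") == "M" and record.get("pregnant"):
        problems.append("male patient recorded as pregnant")
    # Check every pair of current medications against the contraindicated list.
    for pair in combinations(sorted(record.get("medications", [])), 2):
        if frozenset(pair) in CONTRAINDICATED:
            problems.append("contraindicated combination: " + " + ".join(pair))
    return problems

record = {"gender": "M", "pregnant": True, "medications": ["drug_b", "drug_c", "drug_a"]}
for message in cross_variable_problems(record):
    print(message)
```

An inconsistency such as a pregnancy recorded for a male patient is also one of the clues that an observation may have been entered under the wrong patient ID.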