After the Count: Data Entry and Cleaning

Slides:



Advertisements
Similar presentations
Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.
Advertisements

Preparing Data for Analysis National Center for Immunization & Respiratory Diseases Influenza Division Nishan Ahmed Regional Training Workshop on Influenza.
© 2009, KAISER PERMANENTE CENTER FOR HEALTH RESEARCH Getting to Know Your Data Basic Data Cleaning Principles.
1 4HPlus – Next Steps April Today’s Program  Group Enrollment  When Group Enrollment is Appropriate  Process for Entering Groups  Data Transfer.
Copyright 2010, The World Bank Group. All Rights Reserved. Data Processing and Tabulation, Part I.
Research Methodology Lecture No : 21 Data Preparation and Data Entry.
GrowingKnowing.com © Frequency distribution Given a 1000 rows of data, most people cannot see any useful information, just rows and rows of data.
Verification & Validation. Batch processing In a batch processing system, documents such as sales orders are collected into batches of typically 50 documents.
TIMOTHY SERVINSKY PROJECT MANAGER CENTER FOR SURVEY RESEARCH Data Preparation: An Introduction to Getting Data Ready for Analysis.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Preparing to collect data. Make sure you have your materials Surveys –All surveys should have a unique numerical identifier on each page –You can write.
Adult Student Match. You’re ready to get started!  You’ve completed RT’s orientation, training, interview and background check. Now it’s time to meet.
Adult Student Match.
Data quality & VALIDATION
Jenny DeBruyn, CAREER DEVELOPMENT COORDINATOR
Volunteer Training [Community and Location] [Date and Time]
The Basics of Social Science Research Methods
DATA TYPES.
ScWk 298 Quantitative Review Session
Team Leader Training [Community and Location] [Date and Time]
Session 5 – Questionnaire Checklists
CHAPTER 13 Data Processing, Basic Data Analysis, and the Statistical Testing Of Differences Copyright © 2000 by John Wiley & Sons, Inc.
FINA262 Financial Data Analysis
Data Collection Interview
[Community and Location] [Date and Time]
DATA INPUT AND OUTPUT.
What is our goal in our TL Class?
What’s a researcher to do with it all?????
CHAPTER 21 Developing Concepts of Data Analysis
Just the basics: Learning about the essential steps to do some simple things in SPSS Larkin Lamarche.
Graphics GrowingKnowing.com © 2013.
Module 5 Recording and monitoring uptake of rotavirus vaccine
Check Your Assumptions
Multi Rater Feedback Surveys FAQs for Participants
Multi Rater Feedback Surveys FAQs for Participants
Class Careers is a social enterprise company that seeks to enable equal access to careers information for all schools across the country regardless of.
Does More Rooms Equal More Clutter?
Applied Statistical Analysis
2018 NM Community Survey Data Entry Training
Give Me Simple Outputs: Using Excel to Tell Your Story
Warm up How do outliers effect the mean, median, mode, and range in a set of data? Based on your answer to number one, which do you think would be.
Lesson 5: Epic Appointment Scheduling Referrals
Warm up – Unit 4 Test – Financial Analysis
Assuring the Quality of your COSF Data
LECT. 8 INITIAL ANALYSIS OF RAW DATA
A few tricks to take you beyond the basics of Microsoft Office 2007
Does More Rooms Equal More Clutter?
Recording and monitoring uptake of fractional IPV (fIPV)
Common Core Math I Unit 2: One-Variable Statistics Boxplots, Interquartile Range, and Outliers; Choosing Appropriate Measures.
Warm up How do outliers effect the mean, median, mode, and range in a set of data? Based on your answer to number one, which do you think would be.
Common Core Math I Unit 1: One-Variable Statistics Boxplots, Interquartile Range, and Outliers; Choosing Appropriate Measures.
CHAPTER 21 Developing Concepts of Data Analysis
Data Quality Data Exploration
Data Processing, Basic Data Analysis, and the
Data validation handbook
Module 5 Recording and monitoring uptake of rotavirus vaccine
Lecture 1: Descriptive Statistics and Exploratory
A Story of Functions Module 2: Modeling with Descriptive Statistics
Module 5 Recording and monitoring uptake of rotavirus vaccine
Ass. Prof. Dr. Mogeeb Mosleh
Module 5 Recording and monitoring uptake of rotavirus vaccine
Module 5 Recording and monitoring uptake of rotavirus vaccine
Module 5 Recording and monitoring uptake of rotavirus vaccine
Warm up How do outliers effect the mean, median, mode, and range in a set of data? Based on your answer to number one, which do you think would be.
Exercise 1: Entering data into SPSS
Discrepancy Management
Mobilizing Results: Working with PiT Data
Indicator 3.05 Interpret marketing information to test hypotheses and/or to resolve issues.
Assuring the Quality of your COSF Data
Common Core Math I Unit 2 Day 16 One-Variable Statistics
Presentation transcript:

After the Count: Data Entry and Cleaning Christina Maes Nino, Community Animator, Social Planning Council of Winnipeg Module 2 ─ Data Input and Management

Post Count Activities Depending on your methodology, after the count you’ll likely feel a little scattered. We had boxes of supplies and surveys at various organizations, random packs of cigarettes and surveys being returned from PACT and outreach teams, etc. Being organized in your planning will make a big difference

Post Count Activities These are actually PRE Count: Write EVERYTHING down, ideally in the same place Have a plan and a deadline for data entry and analysis DURING the count: Keep the surveys organized at the base sites Keep track of all the shelter data locations and send reminders to people who will fill them in Keep track of everyone who may have surveys in other locations and send reminders to collect them

Post Count Activities Kept surveys in envelopes organized by volunteer We: Kept surveys in envelopes organized by volunteer Went through each volunteer survey, looked for obvious issues Made a plan to deal with surprise survey issues Because we assume errors, missing information and other data issues will be tied to the person conducing the survey Missing information (location, time, name of volunteer), incomplete or just plain weird surveys and marked them so volunteers are not entering data from them (sheltered vs. unsheltered survey)

Entering Data We: Tested data entry in advance Held a training session with all data entry volunteers Marked everything data entry volunteers changed, added, or data that stood out as a likely error Only had 4 volunteers with attention to detail and a lot of time for data entry

Entering Data Other tips for data entry?

Objectives of Data Cleaning Eliminate (or reduce) duplicates Eliminate (or reduce) errors in data entry and data collection Increase validity/ reliability of analysis Any others? Results can vary significantly if you have incorrect data For instance, if someone entered a date wrong, it can introduce outliers which then bias the average of a variable. (i.e. someone entered 999 years for a respondent accidentally, the average age of respondents is going to be overestimated)

Reducing Duplicates Use identifying information to check for duplicates For Winnipeg Street Census DOB/Aboriginal Status/Gender

Reviewing Possible Errors Conduct frequency analysis for EVERY variable in your dataset Logic Checks… are responses logical? Was the age of the respondent 120 years old? Did a person self-describe as male also report being pregnant? Review: Mean, median, mode, standard deviation, skewness, kurtosis, range (MINIMUM AND MAXIMUM VALUES) -check for responses that seem illogical while entering data and mark it

Reviewing Possible Errors By requesting a graphic (bar chart/scatterplots/ histogram) for each variable you can catch problems visually Real outliers vs. coding errors This will allow you to identify coding errors by variable and begin to correct them

Increase Validity/Reliability of Analysis: Code “Other” Responses Example: Q: What is the reason you first became homeless? A: Mom died; children passed away; parents’ death; Read through all the responses to see if they belong to an existing category Create new categories if a certain answer is repeated often New category for “death in family” If the “Other” category large, (12-20%) then consider developing new categories– otherwise bivariate analysis might lose meaning

Increase Validity/Reliability of Analysis: Deal with Missing Responses If it doesn’t make logical sense, doesn’t seem valid, mark it as missing Check that all the symbolic data (eg. 01 where the month was missing) is entered consistently Note fields where there is a lot of missing data so it is not used in analysis incorrectly

Increase Validity/Reliability of Analysis: Deal with Missing Responses Key questions to check: How old were you when you first became homeless? When did you become homeless most recently? How many times have you been homeless in the past three years? How long have you been homeless throughout your lifetime?

Reviewing Possible Errors, Checking Reliability/Validity Other ways to check for errors?

Things I’ve found helpful: - If you don’t know, ask - Use resources in the community (academics, health authority staff) as well as people from other communities - Be realistic about the limitations of your data A count coordinator has to wear many hats, if something isn’t your area of expertise, don’t try to do it yourself. A point-in-time methodology has limitations, recognize them. It doesn’t mean what you do find out is not valuable.