SHARE data cleaning meeting, Frankfurt, December 6: Some suggestions from the Italian experience. Omar Paccagnella.

Presentation transcript:

SHARE data cleaning meeting, Frankfurt, December 6
Some suggestions from the Italian experience
Omar Paccagnella
Data cleaning meeting, December 6, 2007

How to proceed?
Data cleaning as a whole can be divided into 2 stages:
1) The frame: everything about the identification of households and/or individuals (ids, demographic characteristics, household composition)
2) The picture: everything about individual characteristics and answers (that can be checked!)

The frame (1)
Are you sure that the interviewed household/individual is the one you want?
Longitudinal sample - the IWER has to contact the same wave 1 hh (wrong address? selection errors in the SMS by the IWER?)
… look at the CV (name, gender, year of birth, children)
Refresher sample - the IWER has to fill in the selected individual at the end of the CV (other 50+ in the hh?)
… look at the CV (name, gender, year of birth)
… sample representation; oversampling 1955/1956.

The frame (2)
Are you sure that ALL eligible individuals are reported in the CV of the household?
Longitudinal sample - you need to know what happened to all wave 1 individuals:
… w1 individuals not in the w2 CV: deceased? moved out?
… w1 individuals reported as both deceased and moved out in the w2 CV: check for linking errors (exit instead of longitudinal interview)
… w1 individuals indicated in the w2 CV as moved in: check the id and type of interview (baseline vs longitudinal)
… w1 individuals indicated as “New hh members” after the w2 CV: check the id and type of interview (baseline vs longitudinal)
… w2 individuals not in the w1 CV: were the moved-in questions completed?
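The roster comparison described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the person ids, the status labels ("interviewed", "deceased", "moved_in") and the dictionary layout are hypothetical and do not reflect the actual SHARE file structure.

```python
# Hypothetical CV rosters: person id -> status recorded in that wave.
wave1_cv = {
    "IT-001-01": "interviewed",
    "IT-001-02": "interviewed",
    "IT-002-01": "interviewed",
}
wave2_cv = {
    "IT-001-01": "interviewed",
    "IT-002-01": "deceased",
    "IT-002-02": "moved_in",
}


def compare_rosters(w1, w2):
    """Return three sorted id lists that need follow-up between two waves."""
    # Wave 1 individuals with no wave 2 record: deceased? moved out? forgotten?
    missing_in_w2 = sorted(set(w1) - set(w2))
    # Wave 2 individuals unknown in wave 1: were the moved-in questions completed?
    new_in_w2 = sorted(set(w2) - set(w1))
    # A wave 1 individual flagged as "moved_in" in wave 2 suggests a linking error.
    suspicious_moved_in = sorted(
        pid for pid in set(w1) & set(w2) if w2[pid] == "moved_in"
    )
    return missing_in_w2, new_in_w2, suspicious_moved_in


missing, new, suspicious = compare_rosters(wave1_cv, wave2_cv)
print("Not in w2 CV:", missing)
print("Not in w1 CV:", new)
print("w1 individuals marked as moved in:", suspicious)
```

Each list is a starting point for manual review, not an automatic correction.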

The frame (2)
Are you sure that ALL eligible individuals are reported in the CV of the household?
Refresher sample - you need to know whether all household members are reported
… this can be checked only when the sample selection is based on households instead of individuals, or when other hh information is available

The frame (3)
Are you sure that demographic information matches correctly within and between waves?
Within waves - check for mixed-up respondents: e.g. the interview with the husband was done in the SMS row of the wife (refresher vs longitudinal)
… gender & year of birth must be the same in the CV, DN and XT sections and in the drop-off (where available)
Between waves - check for mixed-up respondents: e.g. the name of the husband was linked with the name of his wife
… gender & year of birth must be the same in the CV, DN and XT sections and in the drop-off (where available)
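A minimal sketch of this consistency check follows, assuming a hypothetical long-format list with one record per person and questionnaire section. The field names `pid`, `section`, `gender` and `yrbirth` are illustrative, not the real SHARE variable names.

```python
# Hypothetical records: one entry per (person id, questionnaire section).
records = [
    {"pid": "IT-001-01", "section": "CV", "gender": "M", "yrbirth": 1940},
    {"pid": "IT-001-01", "section": "DN", "gender": "M", "yrbirth": 1940},
    {"pid": "IT-001-02", "section": "CV", "gender": "F", "yrbirth": 1943},
    {"pid": "IT-001-02", "section": "DN", "gender": "M", "yrbirth": 1940},
]


def inconsistent_ids(rows):
    """Return ids whose (gender, year of birth) differs across sections."""
    seen = {}
    for row in rows:
        seen.setdefault(row["pid"], set()).add((row["gender"], row["yrbirth"]))
    # More than one distinct pair for the same id means a mismatch somewhere.
    return sorted(pid for pid, values in seen.items() if len(values) > 1)


flagged = inconsistent_ids(records)
print(flagged)
```

In this toy data the second person's DN record carries the husband's gender and year of birth, which is the kind of mix-up the slide warns about. The same function works between waves if records from both waves are passed in together.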

The frame: summing up
Check that no household differs from the selected one (this also means that at least one household member must have the same gender and year of birth as the selected individual in every wave)
Check that wave 1 eligible individuals are not “forgotten” in wave 2
Check that the ids of the eligible individuals are properly merged
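The first check in this summary (at least one household member matching the selected individual in every wave) could look like the following sketch. The data layout and field names are assumptions made for illustration.

```python
def household_matches_selection(selected, rosters_by_wave):
    """True if every wave's roster has a member with the selected person's
    gender and year of birth."""
    target = (selected["gender"], selected["yrbirth"])
    return all(
        any((member["gender"], member["yrbirth"]) == target for member in roster)
        for roster in rosters_by_wave.values()
    )


# Hypothetical selected individual and household rosters for two waves.
selected_person = {"gender": "F", "yrbirth": 1943}
rosters = {
    "w1": [{"gender": "M", "yrbirth": 1940}, {"gender": "F", "yrbirth": 1943}],
    "w2": [{"gender": "M", "yrbirth": 1940}, {"gender": "F", "yrbirth": 1934}],
}

ok = household_matches_selection(selected_person, rosters)
print("Household consistent with selection:", ok)
```

Here the wave 2 roster records 1934 instead of 1943, so the household is flagged. Whether that is a typing error or a wrong household still has to be decided by looking at the case.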

To complete the frame …
… check and clean interviewer characteristics!
In the CV there is the “org” variable, but the characteristics of the IWER who completes the interview are only in the IV section:
- Be sure that the same IWER has a unique id number (small/capital letters, spaces, numbers, etc.)
- Check age, gender and education for the same IWER (in wave 1 there were some interviews where the IWER reported the respondent's characteristics instead of his/her own)
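One way to spot interviewer ids that differ only by case or spacing is sketched below. The normalization rule (remove whitespace, uppercase) and the example ids are assumptions. Real ids may need other rules, for instance for leading zeros.

```python
import re
from collections import defaultdict


def normalize_iwer_id(raw):
    """Remove all whitespace and uppercase, so 'ab 12' and 'AB12' compare equal."""
    return re.sub(r"\s+", "", raw).upper()


# Hypothetical interviewer ids as typed in the IV section.
raw_ids = ["ab 12", "AB12", " Ab12 ", "CD07"]

groups = defaultdict(set)
for raw in raw_ids:
    groups[normalize_iwer_id(raw)].add(raw)

# Keep only normalized ids that were written in more than one way.
variants = {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}
print(variants)
```

The output lists each normalized id together with the spellings found in the data, so they can be harmonized to a single id before interviewer-level analyses.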

The picture
Check outliers, DK, RF and all values that can be compared with other sources
Amounts: too large/small values; “0” values; results by IWER
Physical & cognitive tests: too large values; value of 1 in the “Ten words recall test” (total number of words instead of cited words); tests not completed; rounding off of results; same results across trials; results by IWER
Children: is their age, or the year of some events, compatible with the age of the respondents?
“Other” in answer categories: can the answer be recoded into a category already defined? A large number of “other”: are we missing something?
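Two of these plausibility checks are sketched here in Python. The minimum parental age of 12 years and the 0 to 10 range for the recall score are assumptions chosen for illustration. They are not thresholds given in the presentation.

```python
MIN_PARENT_AGE = 12  # assumed threshold, not taken from the slides


def implausible_children(resp_yrbirth, children_yrbirth):
    """Return children's birth years that are too close to (or before)
    the respondent's own year of birth."""
    return [y for y in children_yrbirth if y - resp_yrbirth < MIN_PARENT_AGE]


def recall_out_of_range(score, n_words=10):
    """True if a word-recall score falls outside 0..n_words (assumed range)."""
    return not (0 <= score <= n_words)


suspect_children = implausible_children(1940, [1938, 1950, 1965])
print("Children to review:", suspect_children)
print("Recall score 12 out of range:", recall_out_of_range(12))
```

Flagged values should be reviewed against interviewer remarks or other sources before any correction. The functions only point to cases worth a second look.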

Some final thoughts
Data cleaning is not only the correction of some errors; it is also a way to check and evaluate the quality of our datasets: we can find the sections where data are less good (compared to other similar surveys) and the variables that need more attention (both when analyzing the data and when preparing the briefings).
Good data cleaning starts at the beginning of the fieldwork.