Mannheim Research Institute for the Economics of Aging www.mea.uni-mannheim.de Data Cleaning Process Patrick Bartels MEA Frankfurt, December 6th



A short reminder

- "Respondents don't lie!"
- Only change values if you're really sure
- Gather information about your country-specific database
  - from references provided by the survey agencies
  - from information in the remarks
  - from your own investigation
- Write a syntax or do-file; don't change the data directly
- Save the original variable when recoding values, e.g. varname_original
- Indicate the change with a flag variable, e.g. varname_flag
- Save corrected data files under a new name, e.g. filename_corrected
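The last three rules can be sketched in a few lines. This is a minimal illustration in Python rather than Stata or SPSS. The function name `recode_with_audit`, the variable `yrbirth`, and the sample records are hypothetical and not part of the SHARE data.

```python
def recode_with_audit(rows, var, is_wrong, corrected_value):
    """Return corrected copies of the rows; the input rows are never modified.

    For every row, <var>_original keeps the raw value and <var>_flag is 1
    only where the value was actually changed.
    """
    out = []
    for row in rows:
        new = dict(row)                      # work on a copy, not the raw data
        new[f"{var}_original"] = row[var]    # save the original variable
        wrong = is_wrong(row)
        new[f"{var}_flag"] = 1 if wrong else 0
        if wrong:
            new[var] = corrected_value
        out.append(new)
    return out


# Hypothetical records: respondent 2 has an obvious typing error in the year.
raw = [
    {"respid": 1, "yrbirth": 1942},
    {"respid": 2, "yrbirth": 9145},
]
corrected = recode_with_audit(
    raw,
    "yrbirth",
    is_wrong=lambda r: r["respid"] == 2 and r["yrbirth"] == 9145,
    corrected_value=1945,
)
```

Because the correction lives in a script and the raw rows stay untouched, every change can be reviewed or undone later by comparing `yrbirth` with `yrbirth_original` where `yrbirth_flag` is 1.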

Division of work

What we do
- consistency checks
  - between cv_r & modules
  - between wave_1 & wave_2
  - for demography
  - for children
- fixing of interchanged IDs by automatic exchanges

Automatic corrections (respid)

sampid   respid   gender_w1   gender_w2   month/year of birth_w1   month/year of birth_w2
[...]    [...]    female      male        Oct. 1945                Apr. [...]
[...]    [...]    male        female      Apr. 1942                Oct. 1945

Automatic corrections (respid)

[Table: the same two respondents after the automatic exchange, with the wave1 and wave2 records realigned so that gender and month/year of birth agree; cell values not recoverable.]

The original ID is saved and the change is flagged:

    compute respid_original = respid.
    compute respid_flag = 1.
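The slides do not show how the automatic exchange is detected, so the following Python sketch is only one plausible rule, stated as an assumption: in a two-person household, swap the two wave-2 respids when each respondent's gender and birth date match the other person's wave-1 record and do not match their own. The function name, the `birth` field, and the sample values are hypothetical.

```python
def swap_interchanged_respids(wave1, wave2):
    """Return corrected copies of the wave-2 rows.

    Each row is a dict with sampid, respid, gender and birth.
    respid_original keeps the raw ID; respid_flag is 1 where IDs were swapped.
    """
    # Wave-1 profile (gender, birth) for every (household, respondent) pair.
    w1 = {(r["sampid"], r["respid"]): (r["gender"], r["birth"]) for r in wave1}

    corrected = [dict(r, respid_original=r["respid"], respid_flag=0) for r in wave2]

    households = {}
    for r in corrected:
        households.setdefault(r["sampid"], []).append(r)

    for sampid, members in households.items():
        if len(members) != 2:
            continue  # assumption: only two-person households are handled
        a, b = members
        key_a, key_b = (sampid, a["respid"]), (sampid, b["respid"])
        if key_a not in w1 or key_b not in w1:
            continue
        prof_a = (a["gender"], a["birth"])
        prof_b = (b["gender"], b["birth"])
        straight = prof_a == w1[key_a] and prof_b == w1[key_b]
        crossed = prof_a == w1[key_b] and prof_b == w1[key_a]
        if crossed and not straight:
            a["respid"], b["respid"] = b["respid"], a["respid"]
            a["respid_flag"] = b["respid_flag"] = 1
    return corrected


# Hypothetical household mirroring the pattern on the slide.
wave1 = [
    {"sampid": "HH-0001", "respid": 1, "gender": "female", "birth": "1945-10"},
    {"sampid": "HH-0001", "respid": 2, "gender": "male", "birth": "1942-04"},
]
wave2 = [
    {"sampid": "HH-0001", "respid": 1, "gender": "male", "birth": "1942-04"},
    {"sampid": "HH-0001", "respid": 2, "gender": "female", "birth": "1945-10"},
]
fixed = swap_interchanged_respids(wave1, wave2)
```

Requiring an exact crossed match on both gender and birth date keeps the rule conservative: a household where only one field disagrees is left alone for manual checking.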

Overview of merge between wave_1 and wave_2 (after auto-corrections)

[Table: cross-tabulation of wave_1 gender by wave_2 gender, with the categories male, female, refusal, missing and total; cell counts not recoverable.]
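A cross-tabulation like this can be produced directly from the merged waves. The following is a minimal Python sketch, assuming each record carries `sampid`, `respid` and `gender`; the records are invented for illustration and the counts have nothing to do with the lost figures on the slide.

```python
from collections import Counter


def crosstab_gender(wave1, wave2):
    """Count (wave-1 gender, wave-2 gender) pairs over all respondents.

    Respondents found in only one wave, or with an empty gender value,
    are counted under "missing" for the wave concerned.
    """
    g1 = {(r["sampid"], r["respid"]): r.get("gender") or "missing" for r in wave1}
    g2 = {(r["sampid"], r["respid"]): r.get("gender") or "missing" for r in wave2}
    counts = Counter()
    for key in g1.keys() | g2.keys():
        counts[(g1.get(key, "missing"), g2.get(key, "missing"))] += 1
    return counts


# Hypothetical records.
w1 = [
    {"sampid": "HH-0001", "respid": 1, "gender": "female"},
    {"sampid": "HH-0001", "respid": 2, "gender": "male"},
    {"sampid": "HH-0002", "respid": 1, "gender": "male"},
]
w2 = [
    {"sampid": "HH-0001", "respid": 1, "gender": "female"},
    {"sampid": "HH-0001", "respid": 2, "gender": "male"},
    {"sampid": "HH-0002", "respid": 1, "gender": "female"},
    {"sampid": "HH-0003", "respid": 1, "gender": "male"},
]
table = crosstab_gender(w1, w2)
```

Off-diagonal cells such as `("male", "female")` are the cases that still need checking after the automatic corrections.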


Division of work

What we do
- consistency checks
  - between cv_r & modules
  - between wave_1 & wave_2
  - for demography
  - for children
- fixing of interchanged IDs by automatic exchanges
- correction of wave_1
  - using further information from wave_2
- feedback to the country teams on cases that cannot be fixed

What we want you to do
- ID corrections
  - initiated by survey agencies
- check booklets, tests, HH composition (> Omar)
- check financial modules (> Mario)
- check remarks (> Laura)
- check country-specific deviations (> Stephanie)
- coding of open questions
  - priority: education, ep005
- check the data again; ask the survey agencies if necessary

Notes on the slide: "You're much better at doing this." "We can fix a lot of cases."

Do-File or Syntax

- name of author, date of program
- short description of what the program does
- which database
  - and which modules
- version of the data, date of publication
- conditions / order of do-files
- for Stata users: define a global path

Example of Stata do-file (1)

    /******************************************************************************
    This program provides changes in cvid and respid variables in wave2 datasets
    of the longitudinal sample, in order to get exact matching between wave1 and
    wave2 respondents.
    A variable called "mix_hh_flag" is added to the final dataset: it is equal
    to 1 in each household when the value of the respid variable was changed in
    one or two interviews of that household.

    data-version: 2007/Oct/26
    Omar Paccagnella, 30 October 2007

    VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND
    BETWEEN WAVES, THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS:
    "IT_DN_changes_w2.do", "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
    ******************************************************************************/

Callouts on the slide: author's name & date of program; short description; which dataset; order of do-files; data version.

Example of Stata do-file (2)

    * global path, defined once at the top
    global drive "S:/Share/wave2"

    /*************************************************************
    THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV
    **************************************************************/
    foreach module in ac as br cf ch co cs dn ep ex hc hh ho iv mh pf ph sp ws {
        * "clear" is needed so that each module can be loaded inside the loop
        use "$drive/sharew2_`module'", clear

        * flag variable
        gen mix_hh_flag = 0

        * save original variables
        gen sampid_original = sampid
        gen respid_original = respid

        * the sampid value is blanked out on the slide
        replace respid = 1 if sampid == " " & cvid == 2 & respid == 2
        replace mix_hh_flag = 1 if sampid == " "
        [...]

        * new version of the data under a new file name
        save "$drive/sharew2_`module'_corrected"
    }

Callouts on the slide: global drive; save original variables; flag variable; for which modules?; new version of data.
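For readers who use neither Stata nor SPSS, the same loop pattern can be sketched in Python: read each module file, keep the original IDs, set a flag, and write the result under a new name so the source file is never overwritten. The CSV layout, the function name `correct_module`, the temporary folder, and the sampid `HH-0001` are assumptions made for this illustration only.

```python
import csv
import tempfile
from pathlib import Path


def correct_module(src, dst, target_sampid):
    """Apply one ID correction to a module file and save it under a new name.

    Assumes a non-empty CSV file with the columns sampid, respid and cvid.
    """
    with open(src, newline="") as f:
        rows = list(csv.DictReader(f))

    for row in rows:
        row["sampid_original"] = row["sampid"]    # save original variables
        row["respid_original"] = row["respid"]
        hit = row["sampid"] == target_sampid
        row["mix_hh_flag"] = "1" if hit else "0"  # flag the affected household
        if hit and row["cvid"] == "2" and row["respid"] == "2":
            row["respid"] = "1"

    with open(dst, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Build two tiny hypothetical module files, then correct each one in a loop.
workdir = Path(tempfile.mkdtemp())
modules = ["dn", "ep"]
for module in modules:
    src = workdir / f"sharew2_{module}.csv"
    src.write_text("sampid,respid,cvid\nHH-0001,2,2\nHH-0002,1,1\n")
    correct_module(src, workdir / f"sharew2_{module}_corrected.csv", "HH-0001")
```

Keeping the module list and the base folder in one place, as the `global drive` and `foreach` lines do in the do-file, means the whole correction can be rerun on a new data release by changing a single line.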

Example of SPSS syntax (1)

    COMMENT This program provides changes in cvid and respid variables in wave2
      datasets of the longitudinal sample, in order to get exact matching between
      wave1 and wave2 respondents.
      A variable called "mix_hh_w2" is added to the final dataset (called
      sharew2_`var'_checked): it is equal to 1 in each household when the value
      of the respid variable was changed in one or two interviews of that
      household.
    * date of data: 2007/Oct/26
    * Omar Paccagnella, October 2007
    * VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND BETWEEN WAVES,
    * THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS: "IT_DN_changes_w2.do",
    * "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
    ****************************************************************************
    * THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV

Callouts on the slide: short description; author's name; which dataset; order of syntax; data version; for which modules?

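Example of SPSS syntax (2)

    GET FILE='S:\SHARE\wave2\dn_module.sav'.
    EXECUTE.

    * Flag variable.
    compute mix_hh_flag = 0.

    * Save original variables.
    compute cvid_original = cvid.
    compute respid_original = respid.
    compute sampid_original = sampid.

    * The sampid values are blanked out on the slide.
    * respid is changed before cvid, because both conditions test the old cvid value.
    if (sampid = & cvid = 2) respid = 2.
    if (sampid = & cvid = 2) cvid = 1.
    if sampid = ( ) mix_hh_flag = 1.
    EXECUTE.
    [...]

    SAVE OUTFILE='S:\SHARE\wave2\dn_module_corrected.sav'.
    EXECUTE.

Callouts on the slide: flag variable; save original variables.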

Any problems with programming do-files or syntax? Please give us a call