Published by Dulcie Greene. Modified over 9 years ago.
1
Mannheim Research Institute for the Economics of Aging
www.mea.uni-mannheim.de

Data Cleaning Process
Patrick Bartels, MEA
Frankfurt, December 6th
2
A short reminder

"Respondents don't lie!"
- only change values if you're really sure
- gather information about your country-specific database
    - from references of survey agencies
    - from information in remarks
    - from your own investigation
- write a syntax or do-file, don't change the data directly
- save the original variable when recoding values, e.g. varname_original
- indicate the change with a flag variable, e.g. varname_flag
- save corrected data files under a new name, e.g. filename_corrected
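The reminder above can be sketched in a few lines of Stata. This is only an illustration under stated assumptions: the toy data, the variable yrbirth, the assumed typo (1842 instead of 1942) and the file name toy_module_corrected are all hypothetical and do not come from the SHARE data.

```stata
* Hypothetical toy data standing in for one module file
clear
input str6 sampid byte respid int yrbirth
"100123" 1 1945
"100123" 2 1842
end

* Keep the original values and add a flag before changing anything
gen int  yrbirth_original = yrbirth
gen byte yrbirth_flag     = 0

* Correct only the verified case (assumed typo: 1842 instead of 1942)
replace yrbirth      = 1942 if sampid == "100123" & respid == 2
replace yrbirth_flag = 1    if sampid == "100123" & respid == 2

* Save under a new name so the original file stays untouched
save "toy_module_corrected", replace
```

Because the original value and the flag stay in the file, every change can be traced and undone later.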
3
Division of work

What we do
- consistency checks
    - between cv_r & modules
    - between wave_1 & wave_2
    - for demography
    - for children
- fixing of interchanged IDs
    - by automatic exchanges
4
Automatic corrections (respid)

sampid  respid  gender_w1  gender_w2  month / year of birth_w1  month / year of birth_w2
100123  01      female     male       Oct. 1945                 Apr. 1942
100123  02      male       female     Apr. 1942                 Oct. 1945
5
Automatic corrections (respid)

        wave1                                  wave2
sampid  respid  gender  month / year of birth  respid  gender  month / year of birth
100123  01      female  Oct. 1945              01      male    Apr. 1942
100123  02      male    Apr. 1942              02      female  Oct. 1945

    compute respid_original = respid
    compute respid_flag = 1
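A minimal Stata sketch of how such an automatic exchange could be detected and applied. It is not the procedure actually used at MEA. It assumes two-person households with respid values 1 and 2, a hypothetical gender coding (1 = male, 2 = female), year of birth only, and wave-2 records to which the wave-1 values of the same sampid/respid have already been attached. All helper variable names are hypothetical.

```stata
* Hypothetical toy data: wave-2 records with wave-1 values attached
clear
input str6 sampid byte respid byte gender_w2 int yrbirth_w2 byte gender_w1 int yrbirth_w1
"100123" 1 1 1942 2 1945
"100123" 2 2 1945 1 1942
"100124" 1 1 1950 1 1950
"100124" 2 2 1952 2 1952
end

* Look up the partner's wave-2 values within two-person households
bysort sampid (respid): gen partner_gender_w2  = gender_w2[3 - _n]  if _N == 2
bysort sampid (respid): gen partner_yrbirth_w2 = yrbirth_w2[3 - _n] if _N == 2

* A person is a swap candidate if the own waves disagree
* but wave 1 agrees with the partner's wave 2
gen byte own_mismatch  = (gender_w1 != gender_w2) | (yrbirth_w1 != yrbirth_w2)
gen byte partner_match = (gender_w1 == partner_gender_w2) & (yrbirth_w1 == partner_yrbirth_w2)

* Exchange only if both household members are swap candidates
bysort sampid: egen byte n_swap = total(own_mismatch & partner_match)
gen byte respid_flag = (n_swap == 2)

* Save the original ID, then exchange 1 <-> 2 in flagged households
gen byte respid_original = respid
replace respid = 3 - respid if respid_flag == 1

* Keep only the corrected wave-2 record
drop gender_w1 yrbirth_w1 partner_gender_w2 partner_yrbirth_w2 own_mismatch partner_match n_swap
```

Requiring both members to match crosswise keeps the rule conservative: a household where only one person disagrees between waves is left for manual checking.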
6
Overview of merge between wave_1 and wave_2

Axes: wave_1 gender, wave_2 gender. Each cell shows the count before / after auto-corrections.

         male           female           missing          total
male     7,865 / 8,566  757 / 121        6,717 / 6,633    15,339 / 15,320
female   755 / 121      10,184 / 10,949  8,322 / 8,212    19,261 / 19,282
refusal  -              21211 3131
missing  5,286 / 5,219  6,484 / 6,358    23               11,793 / 11,600
total    13,906         17,427 / 17,429  15,063 / 14,869  46,396 / 46,204
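A cross-tabulation of this kind could be produced roughly as follows. This is a sketch on hypothetical toy data, assuming both waves share the keys sampid and respid and that gender is coded 1 = male, 2 = female. The file handling uses a tempfile, so the block should be run as a whole do-file.

```stata
* Hypothetical wave-1 extract
clear
input str6 sampid byte respid byte gender_w1
"100123" 1 2
"100123" 2 1
"100124" 1 1
"100125" 1 2
end
tempfile wave1
save `wave1'

* Hypothetical wave-2 extract
clear
input str6 sampid byte respid byte gender_w2
"100123" 1 1
"100123" 2 2
"100124" 1 1
"100126" 1 2
end

* Match respondents across waves; unmatched cases get missing values
merge 1:1 sampid respid using `wave1'

label define gender 1 "male" 2 "female"
label values gender_w1 gender_w2 gender

* Cross-tabulate, showing missing values as their own category
tabulate gender_w1 gender_w2, missing

* Count respondents whose gender differs between waves
count if gender_w1 != gender_w2 & !missing(gender_w1, gender_w2)
scalar n_mismatch = r(N)
```

The off-diagonal cells of the table (male in one wave, female in the other) are the cases that point to interchanged IDs.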
7
Division of work

What we do
- consistency checks
    - between cv_r & modules
    - between wave_1 & wave_2
    - for demography
    - for children
- fixing of interchanged IDs
    - by automatic exchanges
- correction of wave_1 by further information in wave_2

What we want you to do
- ID-corrections initiated by survey agencies
- check booklets, tests, HH-composition (> Omar)
- check financial modules (> Mario)
- check remarks (> Laura)
- check country-specific deviations (> Stephanie)
- encoding open questions
    - priority: education, ep005

you're much better in doing this
we can fix a lot of cases
8
Division of work

What we do
- consistency checks
    - between cv_r & modules
    - between wave_1 & wave_2
    - for demography
    - for children
- fixing of interchanged IDs
    - by automatic exchanges
- correction of wave_1 by further information in wave_2
- response for not fixable cases to country-teams

What we want you to do
- ID-corrections initiated by survey agencies
- check booklets, tests, HH-composition (> Omar)
- check financial modules (> Mario)
- check remarks (> Laura)
- check country-specific deviations (> Stephanie)
- encoding open questions
    - priority: education, ep005
- check data again, inquire survey agencies if necessary

you're much better in doing this
we can fix a lot of cases
9
Do-File or Syntax

- name of author, date of program
- short description of what the program does
- which database and which modules
- version of data, date of publishing
- conditions / order of do-files
- for STATA users: define a global path
10
Example of STATA do-file (1)

    /******************************************************************************
    This program provides changes in cvid and respid variables in wave2 datasets
    of the longitudinal sample, in order to get exact matching between wave1 and
    wave2 respondents.
    A variable called "mix_hh_flag" is added to the final dataset: it is equal to 1
    in each household when the value of the respid variable was changed in one or
    two interviews of that household.

    data-version: 2007/Oct/26
    Omar Paccagnella, 30 October 2007

    VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND
    BETWEEN WAVES, THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS:
    "IT_DN_changes_w2.do", "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
    ******************************************************************************/

- author's name & date of program
- short description
- which dataset
- order of do-files
- data-version
11
Example of STATA do-file (2)

    * global drive
    global drive "S:/Share/wave2"

    /*************************************************************
    THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV
    **************************************************************/
    * for which modules?
    foreach module in ac as br cf ch co cs dn ep ex hc hh ho iv mh pf ph sp ws {
        * "clear" is needed so each module can be loaded after the previous one was changed
        use "$drive/sharew2_`module'", clear
        * flag-variable
        gen mix_hh_flag = 0
        * save original variables
        gen sampid_original = sampid
        gen respid_original = respid
        replace respid = 1 if sampid == "1604200015300" & cvid == 2 & respid == 2
        replace mix_hh_flag = 1 if sampid == "1604200015300"
        [...]
        * new version of data ("replace" lets the program be re-run)
        save "$drive/sharew2_`module'_corrected", replace
    }
12
Example of SPSS syntax (1)

    COMMENT This program provides changes in cvid and respid variables in wave2
      datasets of the longitudinal sample, in order to get exact matching between
      wave1 and wave2 respondents. A variable called "mix_hh_w2" is added to the
      final dataset (called sharew2_`var'_checked): it is equal to 1 in each
      household when the value of the respid variable was changed in one or two
      interviews of that household.
    * date of data: 2007/Oct/26
    * Omar Paccagnella, October 2007
    * VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND BETWEEN WAVES,
    * THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS: "IT_DN_changes_w2.do",
    * "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
    ****************************************************************************
    *THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV

- short description
- author's name
- which dataset
- order of syntax
- data-version
- for which modules?
13
Example of SPSS syntax (2)

    GET FILE='S:\SHARE\wave2\dn_module.sav'.
    EXE.

    * flag-variable.
    compute mix_hh_flag = 0.
    * save original variables.
    compute cvid_original = cvid.
    compute respid_original = respid.
    compute sampid_original = sampid.

    * Conditions test cvid_original, so changing cvid does not disable the second IF.
    if (sampid = 1604200015300 & cvid_original = 2) cvid = 1.
    if (sampid = 1604200015300 & cvid_original = 2) respid = 2.
    if (sampid = 1604200015300) mix_hh_flag = 1.
    EXE.
    [...]
    SAVE OUTFILE='S:\SHARE\wave2\dn_module_corrected.sav'.
    EXE.
14
Any problems with programming do-files or syntax? Please give us a call