Mannheim Research Institute for the Economics of Aging www.mea.uni-mannheim.de Data Cleaning Process Patrick Bartels MEA Frankfurt, December 6th.


1 Mannheim Research Institute for the Economics of Aging, www.mea.uni-mannheim.de
Data Cleaning Process
Patrick Bartels, MEA
Frankfurt, December 6th

2 A short reminder
- "Respondents don't lie!"
- Only change values if you are really sure.
- Gather information about your country-specific database:
  - from information provided by the survey agencies
  - from the remarks
  - from your own investigation
- Write a syntax or do-file; don't change the data directly.
- Save the original variable when recoding values, e.g. varname_original.
- Indicate changes with a flag variable, e.g. varname_flag.
- Save corrected data files under a new name, e.g. filename_corrected.
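The three rules above (keep the original value, set a flag, write the result under a new name) can be sketched in a few lines. This is a minimal illustration in Python rather than Stata or SPSS; the helper name recode_with_flag, the household ID "A1" and the recoded value are made up for the example and do not come from the SHARE data.

```python
from copy import deepcopy

def recode_with_flag(rows, varname, case_filter, new_value):
    """Return a corrected copy of rows; the input list is never modified."""
    corrected = deepcopy(rows)
    for row in corrected:
        # Keep the untouched value next to the variable (varname_original).
        row.setdefault(f"{varname}_original", row[varname])
        # The flag is 0 unless this particular case is changed.
        row.setdefault(f"{varname}_flag", 0)
        if case_filter(row):
            row[varname] = new_value
            row[f"{varname}_flag"] = 1
    return corrected

# Hypothetical raw data: two members of one household.
raw = [
    {"sampid": "A1", "respid": 1, "gender": "male"},
    {"sampid": "A1", "respid": 2, "gender": "male"},
]

# Recode exactly one case; "corrected" plays the role of filename_corrected.
corrected = recode_with_flag(
    raw,
    "gender",
    lambda r: r["sampid"] == "A1" and r["respid"] == 2,
    "female",
)
```

Because the function works on a copy, the raw rows stay available for comparison, which mirrors saving the corrected file under a new name instead of overwriting the original.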

3 Division of work
What we do
- consistency checks
  - between cv_r & modules
  - between wave_1 & wave_2
  - for demography
  - for children
- fixing of interchanged IDs by automatic exchanges

4 Automatic corrections (respid)

sampid   respid   gender_w1   gender_w2   month / year of birth_w1   month / year of birth_w2
100123   01       female      male        Oct. 1945                  Apr. 1942
100123   02       male        female      Apr. 1942                  Oct. 1945

5 Automatic corrections (respid)
For household 100123, respids 01 and 02 are exchanged so that gender and month / year of birth agree between wave1 and wave2 (01: female, Oct. 1945; 02: male, Apr. 1942). The original value is saved and the change is flagged:
compute respid_original = respid
compute respid_flag = 1
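The exchange on this slide can be sketched as follows. This is an illustrative Python version, not the MEA program: the function name swap_interchanged_respids, the restriction to two-person households and the matching rule (gender plus month / year of birth must match crosswise) are assumptions made for the example. The sample rows reuse household 100123 from the slide, with the birth date written as year-month.

```python
def person_key(row):
    """Characteristics that should be stable across waves."""
    return (row["gender"], row["birth"])

def swap_interchanged_respids(wave1, wave2):
    """Return a corrected copy of wave2 with crosswise-matching respids exchanged."""
    w1_keys = {(r["sampid"], r["respid"]): person_key(r) for r in wave1}
    # Copy every row and keep the original respid plus a flag (0 = unchanged).
    corrected = [dict(r, respid_original=r["respid"], respid_flag=0) for r in wave2]
    households = {}
    for row in corrected:
        households.setdefault(row["sampid"], []).append(row)
    for sampid, members in households.items():
        if len(members) != 2:
            continue  # assumption: only two-person households are handled
        a, b = members
        key_a = w1_keys.get((sampid, a["respid"]))
        key_b = w1_keys.get((sampid, b["respid"]))
        if key_a is None or key_b is None or key_a == key_b:
            continue  # nothing to compare, or the two persons are indistinguishable
        # Exchange only if each wave2 person matches the *other* wave1 person.
        if person_key(a) == key_b and person_key(b) == key_a:
            a["respid"], b["respid"] = b["respid"], a["respid"]
            a["respid_flag"] = b["respid_flag"] = 1
    return corrected

wave1 = [
    {"sampid": "100123", "respid": "01", "gender": "female", "birth": "1945-10"},
    {"sampid": "100123", "respid": "02", "gender": "male", "birth": "1942-04"},
]
wave2 = [
    {"sampid": "100123", "respid": "01", "gender": "male", "birth": "1942-04"},
    {"sampid": "100123", "respid": "02", "gender": "female", "birth": "1945-10"},
]
fixed = swap_interchanged_respids(wave1, wave2)
```

Requiring a crosswise match on both gender and birth date keeps the rule conservative: a household where only one characteristic disagrees is left untouched for manual checking.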

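6 Overview of merge between wave_1 and wave_2
Rows: wave_1 gender. Columns: wave_2 gender. Cells: before / after auto-corrections (a single number means no change).

wave_1 \ wave_2   male            female            missing           total
male              7,865 / 8,566   757 / 121         6,717 / 6,633     15,339 / 15,320
female            755 / 121       10,184 / 10,949   8,322 / 8,212     19,261 / 19,282
refusal           -               2 / 1             1                 3 / 2
missing           5,286 / 5,219   6,484 / 6,358     23                11,793 / 11,600
total             13,906          17,427 / 17,429   15,063 / 14,869   46,396 / 46,204

(The refusal row is partly illegible in the source; the values shown follow from the row and column totals.)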
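A cross-tabulation of this kind can be produced by matching respondents on sampid and respid and counting gender pairs. The sketch below is a small Python illustration with made-up households (H1, H2); the function name crosstab_gender and the choice to label unmatched respondents as "missing" are assumptions for the example, not a description of how the figures above were computed.

```python
from collections import Counter

def crosstab_gender(wave1, wave2):
    """Count (wave_1 gender, wave_2 gender) pairs over all matched person IDs."""
    g1 = {(r["sampid"], r["respid"]): r["gender"] for r in wave1}
    g2 = {(r["sampid"], r["respid"]): r["gender"] for r in wave2}
    table = Counter()
    # Union of IDs: a person present in only one wave counts as "missing" in the other.
    for person in g1.keys() | g2.keys():
        table[(g1.get(person, "missing"), g2.get(person, "missing"))] += 1
    return table

# Hypothetical data: H1 has interchanged respids in the raw wave 2 file,
# H2 was interviewed in wave 1 only.
wave1 = [
    {"sampid": "H1", "respid": "01", "gender": "female"},
    {"sampid": "H1", "respid": "02", "gender": "male"},
    {"sampid": "H2", "respid": "01", "gender": "male"},
]
wave2_raw = [
    {"sampid": "H1", "respid": "01", "gender": "male"},
    {"sampid": "H1", "respid": "02", "gender": "female"},
]
wave2_fixed = [
    {"sampid": "H1", "respid": "01", "gender": "female"},
    {"sampid": "H1", "respid": "02", "gender": "male"},
]

before = crosstab_gender(wave1, wave2_raw)
after = crosstab_gender(wave1, wave2_fixed)
```

Comparing the off-diagonal cells of before and after shows how many mismatches an ID exchange resolves, which is the comparison the table makes.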

7 Division of work
What we do (we can fix a lot of cases)
- consistency checks
  - between cv_r & modules
  - between wave_1 & wave_2
  - for demography
  - for children
- fixing of interchanged IDs by automatic exchanges
- correction of wave_1
  - using further information from wave_2
What we want you to do (you're much better at doing this)
- ID corrections
  - initiated by survey agencies
- check booklets, tests, HH composition (> Omar)
- check financial modules (> Mario)
- check remarks (> Laura)
- check country-specific deviations (> Stephanie)
- coding of open questions
  - priority: education, ep005

8 Division of work
What we do (we can fix a lot of cases)
- consistency checks
  - between cv_r & modules
  - between wave_1 & wave_2
  - for demography
  - for children
- fixing of interchanged IDs by automatic exchanges
- correction of wave_1
  - using further information from wave_2
- feedback to the country teams on cases that cannot be fixed
What we want you to do (you're much better at doing this)
- ID corrections
  - initiated by survey agencies
- check booklets, tests, HH composition (> Omar)
- check financial modules (> Mario)
- check remarks (> Laura)
- check country-specific deviations (> Stephanie)
- coding of open questions
  - priority: education, ep005
- check the data again, ask the survey agencies if necessary

9 Do-file or syntax
- name of author, date of program
- short description of what the program does
  - which database
  - and which modules
- version of data, date of publishing
- conditions / order of do-files
- for STATA users: define a global path

10 Example of STATA do-file (1)
/******************************************************************************
This program provides changes in cvid and respid variables in wave2 datasets
of the longitudinal sample, in order to get exact matching between wave1 and
wave2 respondents. A variable called "mix_hh_flag" is added to the final
dataset: it is equal to 1 in each household when the value of the respid
variable was changed in one or two interviews of that household.

data-version: 2007/Oct/26
Omar Paccagnella, 30 October 2007

VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND
BETWEEN WAVES, THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS:
"IT_DN_changes_w2.do", "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
******************************************************************************/
Callouts on the slide: author's name & date of program; short description; which dataset; order of do-files; data-version.

11 Example of STATA do-file (2)
global drive "S:/Share/wave2"
/*************************************************************
THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV
**************************************************************/
foreach module in ac as br cf ch co cs dn ep ex hc hh ho iv mh pf ph sp ws {
    * "clear" is needed so that the next module can replace the data in memory
    use "$drive/sharew2_`module'", clear
    gen mix_hh_flag = 0
    gen sampid_original = sampid
    gen respid_original = respid
    replace respid = 1 if sampid == "1604200015300" & cvid == 2 & respid == 2
    replace mix_hh_flag = 1 if sampid == "1604200015300"
    [...]
    save "$drive/sharew2_`module'_corrected"
}
Callouts on the slide: global drive; save original variables; flag-variable; for which modules?; new version of data.

12 Example of SPSS syntax (1)
COMMENT This program provides changes in cvid and respid variables in wave2
  datasets of the longitudinal sample, in order to get exact matching between
  wave1 and wave2 respondents. A variable called "mix_hh_w2" is added to the
  final dataset (called sharew2_`var'_checked): it is equal to 1 in each
  household when the value of the respid variable was changed in one or two
  interviews of that household.
* date of data: 2007/Oct/26
* Omar Paccagnella, October 2007
* VERY IMPORTANT! IN ORDER TO GET EXACT MATCHING OF RESPONDENTS WITHIN AND BETWEEN WAVES,
* THIS PROGRAM MUST BE RUN ONLY AFTER THE PROGRAMS: "IT_DN_changes_w2.do",
* "IT_CV_changes_w2.do" and "IT_XT_changes_w2.do" !
****************************************************************************
* THIS PROGRAM HAS TO BE RUN FOR ALL SECTIONS FROM DN TO IV
Callouts on the slide: short description; author's name; which dataset; order of syntax; data-version; for which modules?

13 Example of SPSS syntax (2)
GET FILE='S:\SHARE\wave2\dn_module.sav'.
EXE.
compute mix_hh_flag = 0.
compute cvid_original = cvid.
compute respid_original = respid.
compute sampid_original = sampid.
* respid is changed before cvid, because both conditions test cvid = 2.
if (sampid = 1604200015300 & cvid = 2) respid = 2.
if (sampid = 1604200015300 & cvid = 2) cvid = 1.
if (sampid = 1604200015300) mix_hh_flag = 1.
EXE.
[...]
SAVE OUTFILE='S:\SHARE\wave2\dn_module_corrected.sav'.
EXE.
Callouts on the slide: flag-variable; save original variables.

14 Any problems with programming do-files or syntax? Please give us a call.

