Mannheim Research Institute for the Economics of Aging SHARE data versions & IDs Stephanie Stuck MEA Antwerpen February 2008
2 Data versions and ID variables
Internal version (for data cleaning), public version (for publications)
Household ID
  Who gets the ID: all households
  Internal version: sampid
  Public version: sampid2 (scrambled version of sampid)
Person IDs
  cvid: all household members (in CV). Non-eligible persons get a cvid too, e.g. children and other people living in the household. cvid should be used to merge modules within waves.
  respid: all eligible household members (in CV), i.e. all household members who should be interviewed, e.g. respondents and partners, even if the partners are younger than 50 years. respid should be used to merge waves.
3 Data versions and ID variables
Internal version (for data cleaning)
  Household ID: sampid
  Download site, country-specific versions:
    Raw version (updated during fieldwork and ‘shortly’ after fieldwork): CentERdata site
    Corrected versions during the cleaning process: new internal SHARE site (not yet available)
  Available for: respective country team, CentERdata, MEA
Public version (for publications)
  Household ID: sampid2 (scrambled version of sampid)
  Download site, all countries: public website & internal website (data for working groups)
  Available for: working groups, external users
4 sampid rules (old)
Digits 1-2: country code (e.g. 23 for Belgium, French speaking)
Digits 3-5: wave indicator (042 for wave 1 and 062 for wave 2 main survey)
Digits 6-11: household ID
Digits 12-13: longitudinal household split indicator, 00 by default. If a respondent moves out, it is set based on respid, e.g. if the ‘moving out respondent’ has respid 01, it is changed to 01.
Examples:
  Austria, starting in wave 1 (longitudinal sample)
  Belgium (French), starting in wave 2 (refresher)
One needs to combine sampid with the respondent ID (respid) to identify and merge cases on the respondent level.
This causes merging problems, especially for split households / ‘moving’ respondents across waves.
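The digit layout of the old sampid can be sketched as a small parser. This is a minimal illustration that assumes sampid is handled as a 13-character string of digits; the names `OldSampid` and `parse_sampid` are hypothetical, and the example value is made up from the rules above rather than taken from real SHARE data.

```python
from typing import NamedTuple


class OldSampid(NamedTuple):
    """Parts of an old-style sampid (hypothetical helper type)."""
    country: str         # digits 1-2, numeric country code
    wave_indicator: str  # digits 3-5, e.g. "042" or "062"
    household: str       # digits 6-11
    split: str           # digits 12-13, "00" by default


def parse_sampid(sampid: str) -> OldSampid:
    """Split a 13-digit sampid string into its four parts."""
    if len(sampid) != 13 or not (sampid.isascii() and sampid.isdigit()):
        raise ValueError(f"expected 13 digits, got {sampid!r}")
    return OldSampid(
        country=sampid[0:2],
        wave_indicator=sampid[2:5],
        household=sampid[5:11],
        split=sampid[11:13],
    )


# Made-up value: country 23, wave indicator 062, household 140103, no split.
example = parse_sampid("2306214010300")
print(example)
```

Keeping the value as a string rather than an integer avoids losing leading zeros and precision problems in tools that store long numbers as floats.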
5 Therefore...
We will change the system and
  have unique person IDs that can be used to merge modules and waves
  keep the person ID unchanged across waves, even if a household splits
  have string country codes instead of numeric ones
We will divide sampid into different parts:
  household ID (fixed part, plus split indicator if needed)
  new wave indicator variable ‘wi’, which indicates when a household first entered the sample
6 New household identifier: hhid1 (internal) & hhid (public)
Digits 1-2: country code in letters, e.g. AT for Austria, Bf for Belgium French speaking (internal)
Digits 3-8: fixed household ID. This part will not change across waves, even if the household splits.
Digit 9: one character added to the fixed household ID to identify whether it is an ‘additional’ household that resulted from a split
  A for all ‘original’ households (all in wave 1, refresher in wave 2)
  B is used only if a household has split. A is then still used for the ‘first’ part of the household and B for the ‘splitting part’ (the one that is interviewed second, normally the one that moved out).
  C is used for the very rare case of a further split-off household, for example when the original household in wave 1 consisted of 3 eligible sisters and split into 3 parts.
Examples of the new household ID
  AT100100A: Austria, ‘original’ household
  AT100100B: Austria, split-off household
  Bf140103A: Belgium French speaking household (internal)
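The structure of the new household ID can be checked with a short parser. This is a sketch, not an official SHARE tool: `parse_hhid` and `same_original_household` are hypothetical names, and restricting the split letter to A, B or C is an assumption based on the letters mentioned on this slide.

```python
import re
from typing import NamedTuple

# Two letters, six digits, one split letter (A-C is an assumption).
HHID_PATTERN = re.compile(r"([A-Za-z]{2})([0-9]{6})([A-C])")


class HouseholdId(NamedTuple):
    country: str   # positions 1-2, country code in letters
    fixed_id: str  # positions 3-8, does not change across waves
    split: str     # position 9, "A" for the original household


def parse_hhid(hhid: str) -> HouseholdId:
    """Split a new-style household ID into country, fixed ID and split letter."""
    match = HHID_PATTERN.fullmatch(hhid)
    if match is None:
        raise ValueError(f"not a valid household ID: {hhid!r}")
    return HouseholdId(*match.groups())


def same_original_household(a: str, b: str) -> bool:
    """True if both IDs share country code and fixed household ID."""
    pa, pb = parse_hhid(a), parse_hhid(b)
    return (pa.country, pa.fixed_id) == (pb.country, pb.fixed_id)


print(parse_hhid("AT100100B"))
print(same_original_household("AT100100A", "AT100100B"))
```

Comparing only the first eight characters is what lets a split-off household be traced back to the household it came from.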
7 New person identifier: person1
Digits 1-2: country code (CC) in letters, e.g. AT for Austria, Bf for Belgium French speaking
Digits 3-8: fixed household ID. This part will not change across waves.
Digits 9-10: respondent ID, e.g. if respid is 1 it will be 01
Respondent identifier
  old: sampid & respid
  new: person
  Example rows: AT, AT, Bf
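Following the three parts listed on this slide, a person identifier could be assembled as below. This is a minimal sketch: `build_person1` is a hypothetical helper, the accepted respid range of 0 to 99 is an assumption, and the resulting value is only an illustration derived from the rules, not a real respondent ID.

```python
def build_person1(country: str, fixed_household_id: str, respid: int) -> str:
    """Concatenate country code, fixed household ID and zero-padded respid."""
    if len(country) != 2 or not (country.isascii() and country.isalpha()):
        raise ValueError(f"country code must be two letters: {country!r}")
    if len(fixed_household_id) != 6 or not (
        fixed_household_id.isascii() and fixed_household_id.isdigit()
    ):
        raise ValueError(f"household ID must be six digits: {fixed_household_id!r}")
    if not 0 <= respid <= 99:  # assumed range, so that the ID fits two digits
        raise ValueError(f"respid must fit in two digits: {respid!r}")
    return f"{country}{fixed_household_id}{respid:02d}"


# Illustrative only: Austrian household 100100, respondent 1.
print(build_person1("AT", "100100", 1))
```

Because the split letter is not part of the identifier, the value stays the same when the respondent moves to a split-off household.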
8 Old and new IDs
Internal version
  Household ID & wave indicator: old sampid, new hhid1 & wi
  Person ID: old sampid & respid, new person1
Public version (scrambled)
  Household ID & wave indicator: old sampid2, new hhid & wi
  Person ID: old sampid & respid, new personid
9 In addition:
A dataset will be generated that shows which households a respondent belonged to during her or his ‘SHARE history’, e.g.:
  person1  hhid1_w1  hhid1_w2  hhid1_w3
  AT  AT100100A
  AT  AT100100A  AT100100B
  Bf  Bf140103A
A compatibility file will be made for internal use to merge the old sampid/respid files with the new IDs.
We will have an additional person ID (uuid) to ensure uniqueness, but it will be used in the background only, for technical reasons.
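One way to picture the household history dataset is as a reshaping of per-wave records into one row per person. The sketch below assumes the input is a list of (person1, wave, hhid1) tuples; `build_history` is a hypothetical function, and the person IDs are invented for illustration.

```python
from collections import defaultdict


def build_history(records):
    """Turn (person1, wave, hhid1) records into one dict per person.

    The keys follow the column naming on the slide: hhid1_w1, hhid1_w2, ...
    """
    history = defaultdict(dict)
    for person1, wave, hhid1 in records:
        history[person1][f"hhid1_w{wave}"] = hhid1
    return dict(history)


# Invented records: the second person moves to a split-off household in wave 2.
records = [
    ("AT10010001", 1, "AT100100A"),
    ("AT10010002", 1, "AT100100A"),
    ("AT10010002", 2, "AT100100B"),
]

history = build_history(records)
for person1, households in history.items():
    print(person1, households)
```

A person with no record for a given wave simply has no key for that wave, which corresponds to an empty cell in the wide table.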
10 Data cleaning
Always use the unscrambled version that includes sampid for data cleaning.
Use sampid and respid to identify respondents.
Generate/compute sampid_original, respid_original and cvid_original before you change IDs.
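The last rule, keeping the original IDs before changing anything, might look like this in a cleaning script. This is a sketch under the assumption that each case is held as a dictionary of variables; `preserve_original_ids` is a hypothetical helper and the values are made up.

```python
def preserve_original_ids(row: dict) -> dict:
    """Return a copy of the row with *_original backups of the ID variables.

    Existing backups are never overwritten, so the function can safely be
    applied again after IDs have been corrected.
    """
    out = dict(row)
    for name in ("sampid", "respid", "cvid"):
        backup = f"{name}_original"
        if name in out and backup not in out:
            out[backup] = out[name]
    return out


# Made-up case: back up the IDs first, then correct respid.
case = {"sampid": "2306214010300", "respid": 1, "cvid": 1}
cleaned = preserve_original_ids(case)
cleaned["respid"] = 2
cleaned = preserve_original_ids(cleaned)  # backups stay untouched
print(cleaned)
```

With the backups in place, any corrected case can still be traced back to the IDs it had in the raw data.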