TG EHIS, 23-24 January 2012
Item 3.2 of the agenda
EHIS wave 1 anonymised data
Bart De Norre, Eurostat
Content
- Introduction: WG Public Health (June 2011), Consultation Round (September 2011)
- Legal context
- Procedures
- Pre-requisites: anonymisation
- Next steps
Legal context
- Amendment of EC Regulation 831/2002 (May 2002) on the conditions of access to confidential data for scientific purposes, to include the EHIS in the list of data sets covered by the Regulation: Commission Regulation 520/2010 (16 June 2010)
- Admissibility of requesters:
  - Requesters must belong to the categories defined in EC Regulation 223/2009 and in Article 3(1) of EC Regulation 831/2002:
    - universities and other higher education organisations
    - organisations or institutions for scientific research
    - national statistical institutes of the Member States
    - the European Central Bank and the national central banks
  - If not: Commission decision to amend EC Regulation 452/2004
- Personal data protection: health … data, needed for the purposes of preventive medicine and of historical, statistical or scientific research
Procedures
- Bilateral agreements between Eurostat and each MS to release EHIS micro data
- For each request for an anonymised data set:
  - Eurostat analyses the justification
  - If the justification is accepted, Eurostat consults the MS (6 weeks; about 2 weeks in practice)
  - If MS approve, Eurostat and the requester sign a contract covering the data of the MS that approved
- For each request for access to the Safe Centre:
  - The requester works in the Safe Centre
  - Eurostat checks the confidentiality of the output results
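The request workflow for anonymised data sets can be sketched as a small function. This is a minimal illustration, not Eurostat's actual system: the `Request` structure, its field names and the handling of Member States that have not replied (they are simply left out, as if they had not approved) are assumptions; only the 6-week consultation period comes from the slide.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Formal consultation period for the Member States (6 weeks, per the slide)
CONSULTATION_PERIOD = timedelta(weeks=6)

@dataclass
class Request:
    """Hypothetical record of one request for an anonymised EHIS data set."""
    requester: str
    justification_accepted: bool                    # outcome of Eurostat's analysis
    ms_replies: dict = field(default_factory=dict)  # MS code -> approved (True/False)

def consultation_deadline(sent: date) -> date:
    """Date by which the Member States are formally expected to reply."""
    return sent + CONSULTATION_PERIOD

def countries_in_contract(req: Request) -> list:
    """Member States whose micro data can be covered by the contract.

    Empty if Eurostat does not accept the justification; otherwise only the
    MS that explicitly approved (assumption: no reply counts as no approval).
    """
    if not req.justification_accepted:
        return []
    return sorted(ms for ms, approved in req.ms_replies.items() if approved)

# Usage example with made-up replies
req = Request("University X", True, {"BE": True, "FR": False, "AT": True})
print(countries_in_contract(req))                # ['AT', 'BE']
print(consultation_deadline(date(2012, 1, 23)))  # 2012-03-05
```

Safe Centre requests follow a different path (the researcher works on site and Eurostat checks the outputs), so they are not modelled in this sketch.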
Pre-requisites
- Methodology:
  - Anonymisation rules
  - Disclosure control checks
  - "Simple" development approach: use the anonymisation practices defined for SILC (Statistics on Income and Living Conditions), AES (Adult Education Survey) and LFS (Labour Force Survey)
  - "Complex" development approach: additional disclosure tests by country, with specific software and including sample design parameters
- Contract: contract template and guidelines
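The "complex" approach differs mainly in taking the sample design into account. As a rough illustration only, the sketch below estimates, for each record, how many people in the population share its combination of key variables by summing the design weights of the matching sample records, and uses the inverse as a crude risk score. Dedicated disclosure control software uses more refined estimators; the function name, the field names and the choice of estimator here are assumptions, not the method actually planned for EHIS.

```python
from collections import defaultdict

def crude_key_risk(records, keys, weight="weight"):
    """Crude per-record disclosure risk 1 / F_k (illustrative estimator).

    F_k is the sum of the design weights of all sample records that share the
    record's combination of key variables, i.e. an estimate of how many people
    in the population have that combination. Weights are assumed positive.
    """
    pop_estimate = defaultdict(float)
    for r in records:
        pop_estimate[tuple(r[k] for k in keys)] += r[weight]
    return [1.0 / pop_estimate[tuple(r[k] for k in keys)] for r in records]

# Made-up example: two women aged 85 with small weights, one man aged 40
sample = [
    {"sex": "F", "age": 85, "weight": 2.0},
    {"sex": "F", "age": 85, "weight": 3.0},
    {"sex": "M", "age": 40, "weight": 500.0},
]
print(crude_key_risk(sample, ["sex", "age"]))  # [0.2, 0.2, 0.002]
```

Records whose key combination has a small estimated population count get a high score and would be the first candidates for further grouping or top coding.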
Anonymisation
- Classify the variables that are indirect identifiers:
  - Extremely identifying (high risk of disclosure): single age of a person; degree of urbanisation of the area where the person lives
  - Very identifying (specific and personal attributes): sex, country of birth, citizenship
  - Simply identifying (common in the general population): educational attainment
  - Very identifying, but based on subjective aspects that are difficult for an intruder to know: e.g. disease diagnosed by a doctor
- Concept + frequency: frequencies in strata defined by several combinations of variables (disclosure scenarios)
- List of outcomes by level of sensitivity
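The frequency step can be illustrated with a short sketch: each disclosure scenario is taken as a tuple of key variables, records are counted per combination of values, and combinations observed fewer times than a chosen threshold are reported. The variable names and the threshold are hypothetical; the slide does not specify them.

```python
from collections import Counter

def rare_combinations(records, scenario, threshold):
    """Combinations of the scenario's key variables seen fewer than `threshold` times."""
    counts = Counter(tuple(r[v] for v in scenario) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}

def scenario_report(records, scenarios, threshold):
    """Run the frequency check for every disclosure scenario."""
    return {sc: rare_combinations(records, sc, threshold) for sc in scenarios}

# Made-up records using "very identifying" and "simply identifying" variables
records = [
    {"sex": "F", "country_of_birth": "BE", "education": "high"},
    {"sex": "F", "country_of_birth": "BE", "education": "low"},
    {"sex": "M", "country_of_birth": "XX", "education": "low"},
]
scenarios = [("sex", "country_of_birth"), ("education",)]
print(scenario_report(records, scenarios, threshold=2))
# {('sex', 'country_of_birth'): {('M', 'XX'): 1}, ('education',): {('high',): 1}}
```

Each scenario reflects what an intruder is assumed to know; running several scenarios shows which variables, alone or combined, produce rare cells.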
Anonymisation … EHIS wave 1
- Removal of extremely identifying variables
- Transformation:
  - Grouping of answer categories when the number of responses is too low (e.g. on PA, "with a lot of difficulty" and "not at all" grouped)
  - Top and bottom coding of very high and very low values, which are usually rare in the population (e.g. "85+" as the upper limit for age)
- Special rules for specific MS where counts are quite low (< 20 people in a stratum)
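The three transformations can be sketched on a single record as follows. This is an illustration under assumptions: the variable names (`urbanisation`, `pa_item`, `age`) and the label of the merged category are hypothetical, and top-coded values are replaced by the cap itself, so that 85 is read as "85 and over".

```python
def anonymise(record, drop=(), groupings=None, top_codes=None, bottom_codes=None):
    """Apply removal, category grouping and top/bottom coding to one record."""
    groupings = groupings or {}
    top_codes = top_codes or {}
    bottom_codes = bottom_codes or {}
    # 1. Removal: drop extremely identifying variables entirely
    out = {k: v for k, v in record.items() if k not in drop}
    # 2. Grouping: map sparse answer categories onto a merged category
    for var, mapping in groupings.items():
        if var in out:
            out[var] = mapping.get(out[var], out[var])
    # 3. Top coding: values above the cap are recoded to the cap (85 = "85 and over")
    for var, cap in top_codes.items():
        if out.get(var) is not None:
            out[var] = min(out[var], cap)
    # 4. Bottom coding: values below the floor are recoded to the floor
    for var, floor in bottom_codes.items():
        if out.get(var) is not None:
            out[var] = max(out[var], floor)
    return out

MERGED = "with a lot of difficulty / not at all"  # hypothetical label
rec = {"id": 1, "age": 91, "urbanisation": "dense", "pa_item": "not at all", "sex": "F"}
print(anonymise(
    rec,
    drop={"urbanisation"},
    groupings={"pa_item": {"with a lot of difficulty": MERGED, "not at all": MERGED}},
    top_codes={"age": 85},
))
# {'id': 1, 'age': 85, 'pa_item': 'with a lot of difficulty / not at all', 'sex': 'F'}
```

The special rules for MS with fewer than 20 people in a stratum would need country-specific parameters and are not covered by this sketch.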
Anonymisation …
- WG PH, June 2011:
  - Consultation Round
  - Too severe: soften the transformation rules
  - Simplify and speed up the process of getting access
- Consultation Round: feedback from AT, BE, BG, CZ, DE, DK, EE, FI, FR, GR, HU, IE, IT, LT, LV, NL, PL, PT, RO, SI, SK, UK, NO, HR
  - For some: not relevant (not in wave 1) or a misunderstanding (not for IR wave 2)
- Updated proposal, major issues:
  - INSTIT (people living in institutions): keep or drop?
  - AGE: single years, with 85+
  - Household type: grouping refined
  - Limitations: no grouping
  - SF questions: no transformation
  - Weight, height: unaltered
  - Fruit consumption: unaltered
  - Alcohol consumption: more groups (threshold of 6 drinks)
Next steps
- Is the updated proposal enough, or is a detailed disclosure scenario analysis also needed? If so: extra study and delay
- Proposal to the SDC group (Statistical Disclosure Control group)
- Written consultation of the WG Statistical Confidentiality