Published by Rosamond Taylor. Modified over 8 years ago.
1
ROLE OF ANONYMIZATION FOR DATA PROTECTION
Irene Schluender and Murat Sariyar (TMF)
2
Background
Individual-level data used in health research contexts:
- EHRs
- Hospital discharge databases
- Health insurance data
- Clinical studies
- Genetic datasets: 1000 Genomes, HapMap, TCGA, …
3
Legal reasons for anonymisation: Privacy
BASIC DATA PROTECTION CONSTRAINTS
- Legal framework: national law and EU law; no international harmonisation, only "soft law", e.g. the Declaration of Helsinki
- EU Data Protection Directive: any processing of personal data is generally prohibited unless explicitly permitted (Article 7: “Member States shall provide that personal data may be processed only if: …”)
- Permission must be based on law or on the consent of the data subject
- Permission must refer to a specific purpose, which is difficult in "omics" research and data mining
4
ANONYMISATION AS "MAGIC BULLET"?
- Article 7: “Member States shall provide that personal data may be processed only if: …”
- Dichotomy of data protection law: anonymous versus personal data
- Only personal data is protected by law
- Anonymous data: no consent or other legal basis is needed for processing (Rec. 26: “the principles of protection shall not apply to data rendered anonymous”)
- Conclusion: anonymise to get rid of any data protection constraints!
5
Anonymisation vs. De-identification
- No HIPAA list!
- Removing all 18 identifiers leads to de-identified data, not to anonymous data!
- The list only makes sense within the context of HIPAA and cannot be transferred into the European legal framework.
6
Context and Trade-off
- Anonymisation is not static, but dependent on context knowledge (example: "Harry Smith")
- De facto anonymity is sufficient ("reasonable means")
- Trade-off between usefulness and re-identification risk: information is reduced or distorted, and some of it may be relevant for research
- Challenges: enhanced re-identification technologies, increasing context knowledge
7
Anonymisation of genetic data?
- DNA sequences alone do not disclose the identity of an individual
- But they can be enough information to single out a person
- Opinion on Anonymisation Techniques (Art. 29 Working Party): “Genetic data profiles are an example of personal data that can be at risk of identification if the sole technique used is the removal of the identity of the donor due to the unique nature of certain profiles. It has already been shown in the literature that the combination of publically available genetic resources (e.g. genealogy registers, obituary, results of search engine queries) and the metadata about DNA donors (time of donation, age, place of residence) can reveal the identity of certain individuals even if that DNA was donated ‘anonymously’”.
8
Side-effects (adverse events) of anonymisation
- Full (unlinked) anonymisation deprives the donor of the possibility to use their right to withdraw consent (critical for biosamples/genetic data)
- It makes feeding back research results or incidental findings impossible
- It is not useful in cases where research is linked to treatment (oncology: precision medicine)
9
Therefore …
- Anonymisation is not a panacea that resolves every data protection issue
- We will have to rely on "broad consent" (for example as agreed with the German Ethics Committee's Working Group) plus additional safeguards (access control etc.)
- Broad consent will (hopefully) be supported by the GDPR (Rec. 25aa)
10
The technical perspective
11
What is anonymization?
ISO 29100:2011: “Anonymization is the process by which personally identifiable information (PII) is irreversibly altered in such a way that a PII principal can no longer be identified directly or indirectly, either by the PII controller alone or in collaboration with any other party.”
12
Relevant terms: Kinds of attributes
(1) Unique identifiers (e.g., social security number)
(2) Quasi-identifiers (e.g., ZIP code), abbreviated QIDs
(3) Sensitive attributes (exhibiting a special characteristic)
(4) Non-sensitive attributes
13
Relevant terms: Quasi-Identifier
- OECD definition of a quasi-identifier: variable values or combinations of variable values within a dataset that are not structural uniques but might be empirically unique and therefore in principle uniquely identify a population unit.
- The set of QIDs should contain an attribute A if an attacker could potentially obtain A from other external resources.
- The QIDs (5-digit ZIP code, birth date, gender) uniquely identify 87% of the population in the U.S.
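To make "empirically unique" concrete, the following minimal sketch measures which share of records is singled out by a given QID combination. The function name `unique_fraction`, the field names, and the toy records are assumptions made for illustration; they are not taken from the slides.

```python
from collections import Counter

def unique_fraction(records, qids):
    """Share of records whose combination of QID values occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in qids) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Hypothetical toy data, invented for this example.
records = [
    {"zip": "10115", "birth_date": "1980-03-14", "gender": "F"},
    {"zip": "10115", "birth_date": "1980-03-14", "gender": "M"},
    {"zip": "10117", "birth_date": "1975-11-02", "gender": "F"},
    {"zip": "10117", "birth_date": "1975-11-02", "gender": "F"},
]

print(unique_fraction(records, ["zip"]))                          # 0.0
print(unique_fraction(records, ["zip", "birth_date", "gender"]))  # 0.5
```

No single attribute identifies anyone in this toy table, but the combination of all three singles out half of the records, which is the effect behind the 87% figure.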
14
Important anonymization techniques
- Generalization and suppression (hide some details in the QIDs)
  - Replace some values with a parent value in a taxonomy
  - Full-domain and local (subtree, cell) generalization
  - Suppression (see former slide)
- Anatomization and permutation (structural changes)
  - De-associate the relationship between QIDs and sensitive attributes
  - Partition into groups and shuffle sensitive values within each group
- Perturbation
  - Additive noise (randomization; independent of other records, hence suitable for data streams), data swapping, synthetic data generation
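The technique families listed above can be sketched in a few lines each. This is an illustrative sketch only: the helper names, the ZIP and age taxonomies (truncated digits, ten-year bands), and the Gaussian noise model are assumptions chosen for the example, not the presenters' implementation.

```python
import random

def generalize_zip(zip_code, keep_digits=3):
    """Generalization: replace trailing digits by '*' (a parent value in an assumed ZIP taxonomy)."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)

def generalize_age(age, width=10):
    """Generalization: map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def suppress(_value):
    """Suppression: drop the value entirely."""
    return "*"

def shuffle_sensitive(group, sensitive, rng):
    """Permutation: shuffle the sensitive values among the records of one group."""
    values = [r[sensitive] for r in group]
    rng.shuffle(values)
    return [{**r, sensitive: v} for r, v in zip(group, values)]

def add_noise(value, scale, rng):
    """Perturbation: additive Gaussian noise, applied to each record independently."""
    return value + rng.gauss(0.0, scale)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
print(generalize_zip("10115"))  # 101**
print(generalize_age(34))       # 30-39
```

Because `add_noise` looks at one value at a time, it needs no access to other records, which is why additive noise fits streaming data.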
15
Anonymization techniques: Caveat
- These are criteria, not techniques: k-anonymity, l-diversity, t-closeness
- And there is no hierarchy!
- k-anonymity protects against identity disclosure
- l-diversity and t-closeness protect against attribute disclosure
- There are several definitions of l-diversity
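Since these are criteria, they can be checked on a given table. The sketch below computes the k of k-anonymity and the l of distinct l-diversity, which is only the simplest of the several l-diversity definitions. Function names, field names, and the toy table are assumptions made for illustration.

```python
from collections import defaultdict

def equivalence_classes(records, qids):
    """Group records that share the same values on all QIDs."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in qids)].append(r)
    return groups

def k_anonymity(records, qids):
    """Size of the smallest equivalence class (0 for an empty table)."""
    groups = equivalence_classes(records, qids)
    return min((len(g) for g in groups.values()), default=0)

def distinct_l_diversity(records, qids, sensitive):
    """Smallest number of distinct sensitive values in any equivalence class."""
    groups = equivalence_classes(records, qids)
    return min((len({r[sensitive] for r in g}) for g in groups.values()), default=0)

# Hypothetical generalized table, invented for this example.
table = [
    {"zip": "101**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "101**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "101**", "age": "40-49", "diagnosis": "cancer"},
    {"zip": "101**", "age": "40-49", "diagnosis": "cancer"},
]

print(k_anonymity(table, ["zip", "age"]))                        # 2
print(distinct_l_diversity(table, ["zip", "age"], "diagnosis"))  # 1
```

The table is 2-anonymous, yet anyone known to be in the 40-49 group is revealed to have cancer. This is the attribute disclosure that k-anonymity alone does not prevent.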
16
Anonymization techniques: generalization
17
Conclusion
- Creating an anonymous dataset whilst retaining as much of the underlying information as required for the task (usefulness) is done by technical means
- However … the legal perspective should correspond with the technical one (e.g., regarding the definition of sensitive attributes)
18
References
- B. C. M. Fung et al. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 2010.
- L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002.
- C. C. Aggarwal. Privacy-Preserving Data Mining: Models and Algorithms (Advances in Database Systems). Springer, 2008.