Download presentation
Presentation is loading. Please wait.
Published byDouglas Rice Modified over 9 years ago
1
European Conference on Quality in Official Statistics, Rome, July 2008 Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research Daniela Ichim
2
European Conference on Quality in Official Statistics, Rome, July 2008 Outline Dissemination of Microdata Files for Research Risk assessment Disclosure limitation Data quality –Record linkage –Data utility
3
European Conference on Quality in Official Statistics, Rome, July 2008 Confidentiality against Dissemination Find the right balance! Disclosure scenarios
4
European Conference on Quality in Official Statistics, Rome, July 2008 Community Innovation Survey IDENTIFYING VARIABLES –Nace –Nuts –Size –Turnover (TURN) (STRUCTURAL VARIABLES) CONFIDENTIAL VARIABLES –Expenditures in innovation (RTOT, …) –Number of patents, … (VARIABLES INVOLVED IN ANALYSES)
5
European Conference on Quality in Official Statistics, Rome, July 2008 Confounding Categorical Numerical safe unsafe A … A k-anonymity
6
European Conference on Quality in Official Statistics, Rome, July 2008 a) Given a threshold (on units) b) Local Outlier Factor as a measure of difference in density between a unit and its nearest neighbours General risk function Distance between and Density around :
7
European Conference on Quality in Official Statistics, Rome, July 2008 Threshold - dissemination policy Parameters Cut-off point for density (LOF) –quantiles –automatic
8
European Conference on Quality in Official Statistics, Rome, July 2008 Stratification variables Analysis by Nace Nace A all Nace
9
European Conference on Quality in Official Statistics, Rome, July 2008 Disclosure limitation MFR Selective masking k-anonymity Nearest neighbour Micro-aggregation on tails
10
European Conference on Quality in Official Statistics, Rome, July 2008 Quality assessment Dissemination Confidentiality
11
European Conference on Quality in Official Statistics, Rome, July 2008 Risk measure assessment Quality of the external database D E Chambers of Commerce database Record linkage
12
European Conference on Quality in Official Statistics, Rome, July 2008 Record linkage M*=3 1 equal unit within 10% less than 3 units within 10% less than 3 units within 20% less than 3 units within 30% NACE 88%84%97%100% NACE EMP 63%60% a 74% a 87% a M*=5 1 equal unit within 10% less than 5 units within 10% less than 5 units within 20% less than 5 units within 30% NACE 88%73%87%96% NACE EMP 63%58% a 70% a 80% a a) 100% for enterprises with more than 250 employees
13
European Conference on Quality in Official Statistics, Rome, July 2008 Information content analysis Information preservation Selective masking –Data utility –Only identifying and confidential variables were modified. –Only records at risk were modified. The weights were not modified. –weighted totals (coherence with the already published information) Some statistical indicators were slightly modified: variances
14
European Conference on Quality in Official Statistics, Rome, July 2008 Information content analysis Data utility Assessment of the perturbation impact on ratios like RTOT/TURN Original Selective masking Individual ranking
15
European Conference on Quality in Official Statistics, Rome, July 2008 Conclusions 1.Confidentiality: Risk measure based on the k- anonymity principle Flexible a) continuous and categorical variables b) easy to implement c) consistent for extreme choices 2.Data utility: Selective protection to achieve the k- anonymity 3.Comparable dissemination: Control both risk of re-identification and information loss QUALITY DIMENSIONS
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.