ESSNET Data Integration - Rome, January 2010 ESSNET on Statistical Disclosure Control Daniela Ichim.


1 ESSNET Data Integration - Rome, January 2010 ESSNET on Statistical Disclosure Control Daniela Ichim

2 Outline
1. ESSNET SDC
2. Record linkage and SDC
3. Statistical matching and SDC

3 ESSNET SDC
Pilot ESSnet, 2008-2009
12 participants: CBS (coordinator), Istat, Destatis, ONS, Statistics Sweden, Statistics Austria, Statistics Norway, Portugal INE, …
3 sub-contractors: University Rovira i Virgili, University of Naples, IAB Germany
Web-site: http://neon.vb.cbs.nl/casc/

4 Before ESSNET SDC
4th Framework SDC-project (1996-1998)
5th Framework CASC project (2000-2003)
CENEX project (2006)
Aim: enhance the development in the field of statistical confidentiality
1. methodological
2. software
3. practice, practice, practice, …

5 Before ESSNET SDC
Outputs:
1. Argus software
2. Handbook on SDC
3. Conferences (PSD)
4. Methodological papers
   - web-site
   - International journals

6 ESSNET SDC
Main goal: raise the level of knowledge and skills
1. Promotion of the results achieved so far
2. Make SDC tools more easily applicable
3. Involvement of “new” NSIs
4. Coordination at ESS level
Main outputs:
1. Improved versions of Argus/handbook
2. Dissemination
   - Training courses
   - Reports and case studies

7 Record linkage and SDC
Link: MICRODATA
SDC:
I. Measure the disclosure risk
II. Release of microdata files (PUF, MFR)

8 I. Standard disclosure scenario
Assumptions:
1. The intruder has access to an external register (E)
2. E covers the whole population
3. E and the released microdata file (D) share a set of (key) variables, measured without error
4. The intruder uses record linkage to match a unit in the sample to one in the population using only the key variables
5. …
Risk measures:
1. Number of “linked” units
2. Probability of correct identification = probability of correct linkage
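The two risk measures can be illustrated with a small Python sketch. It assumes exact matching on the key variables and uses the common simplification that a sample record whose key combination occurs F times in the register is correctly identified with probability 1/F. The function name, the key variables and the toy records are hypothetical and not taken from the presentation.

```python
from collections import Counter


def reidentification_risk(sample, register, keys):
    """Return, per sample record, 1/F where F is the number of register
    units sharing the record's key values (0.0 if no unit matches)."""
    freq = Counter(tuple(unit[k] for k in keys) for unit in register)
    risks = []
    for rec in sample:
        f = freq[tuple(rec[k] for k in keys)]
        risks.append(1.0 / f if f else 0.0)
    return risks


# Hypothetical external register E covering the whole (toy) population.
register = [
    {"sex": "F", "age": "30-39", "region": "North"},
    {"sex": "F", "age": "30-39", "region": "North"},
    {"sex": "M", "age": "40-49", "region": "South"},
    {"sex": "M", "age": "30-39", "region": "South"},
]
# Hypothetical released sample D: two units drawn from the register.
sample = [register[0], register[2]]

risks = reidentification_risk(sample, register, ["sex", "age", "region"])
# Units linked to exactly one register record (population uniques).
unique_links = sum(1 for r in risks if r == 1.0)
print(risks, unique_links)
```

Here the first sample unit shares its key values with one other register unit, so only the second unit is linked unambiguously.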

9 I. RL used in SDC
1. Distance-based RL (Domingo-Ferrer)
   - Linking each record d in file D to its nearest record e in file E
   - Mainly for continuous variables (business data)
2. Probabilistic RL (Skinner)
   - Classical framework
   - Mainly for categorical variables (social data)
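A minimal sketch of the distance-based idea, in Python, might look as follows. It assumes plain Euclidean distance on unstandardized values and that record i in D originates from record i in E, so that correct links can be counted; in practice variables would usually be standardized first. The function name and the toy business data are hypothetical.

```python
import math


def link_nearest(D, E):
    """Link each record d in D to the index of its nearest record e in E
    (Euclidean distance)."""
    links = []
    for d in D:
        nearest = min(range(len(E)), key=lambda i: math.dist(d, E[i]))
        links.append(nearest)
    return links


# Hypothetical original values (turnover, employees) in the register E.
E = [(100.0, 12.0), (250.0, 40.0), (900.0, 75.0)]
# Hypothetical perturbed release D; record i was derived from E[i].
D = [(104.0, 11.0), (160.0, 20.0), (880.0, 70.0)]

links = link_nearest(D, E)
correct = sum(1 for i, j in enumerate(links) if i == j)
share_correct = correct / len(D)
print(links, correct, share_correct)
```

The second record was perturbed strongly enough to fall closer to another register unit, so it is linked incorrectly; the share of correct links then serves as an empirical risk measure.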

10 I. RL and Risk: QUALITY
1. External register
   - Coverage
   - Misclassification errors
   - Which variables? Which registers? ...
2. Disseminated microdata file
   - Misclassification errors (known pattern, known protection parameters, etc.)
   - Usage (in RL) of the publicly available information:
     a) Sampling design (stratification, survey weights)
     b) Known population characteristics (M/F)
     c) Hierarchical file structure (HH, enterprise-local unit)
     d) Ideal (worst) case: true whole population - a (unique) correct link exists
     …

11 II. RL and Release
Integrate THEN Disseminate
Grant access to composite microdata covering a wider range of variables
- More careful management of the risks of disclosure (+ the previous slide + the increased confidentiality/sensitivity of integrated data sets)
- Impact on analyses

12 Statistical Matching and SDC
[Diagram: two files, one observing (Y, X) and one observing (X, Z), share the common variables X; matching targets the joint (Y, X, Z)]

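13 Statistical Matching and SDC
Joint table of A, B, C (cell values unknown):
A   B   C   value
a1  b1  c1  ?
a1  b1  c2  ?
a1  b1  c3  ?
a1  b2  c1  ?
a1  b2  c2  ?
a1  b2  c3  ?
a2  b1  c1  ?
a2  b1  c2  ?
a2  b1  c3  ?
a2  b2  c1  ?
a2  b2  c2  ?
a2  b2  c3  ?
Available marginal tables (cell values not recoverable from the transcript):
- A by B: rows a1, a2 with totals a1., a2.; columns b1, b2 with totals b.1, b.2; grand total 1
- A by C: rows a1, a2 with totals a1., a2.; columns c1, c2, c3 with totals c.1, c.2, c.3; grand total 1
Fréchet bounds: for each of the 12 cells a1b1c1, …, a2b2c3, a lower bound (LB) and an upper bound (UB) on the unknown value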
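How such bounds arise can be sketched in Python. The sketch assumes that the two available tables are A by B and A by C with matching A totals, and applies the standard Fréchet inequalities within each level of A: max(0, n_ab + n_ac - n_a) <= n_abc <= min(n_ab, n_ac). The function name and the counts are hypothetical; the slide's actual figures are not recoverable.

```python
def frechet_bounds(n_ab, n_ac):
    """Return {(a, b, c): (LB, UB)} for the unknown joint counts, given
    the A-by-B and A-by-C tables as nested dicts of counts."""
    bounds = {}
    for a, row_b in n_ab.items():
        n_a = sum(row_b.values())  # total for level a of A
        for b, nab in row_b.items():
            for c, nac in n_ac[a].items():
                lower = max(0, nab + nac - n_a)
                upper = min(nab, nac)
                bounds[(a, b, c)] = (lower, upper)
    return bounds


# Hypothetical marginal tables; the A totals agree (a1: 40, a2: 60).
n_ab = {"a1": {"b1": 30, "b2": 10}, "a2": {"b1": 20, "b2": 40}}
n_ac = {"a1": {"c1": 25, "c2": 10, "c3": 5},
        "a2": {"c1": 10, "c2": 20, "c3": 30}}

bounds = frechet_bounds(n_ab, n_ac)
for cell, (lb, ub) in sorted(bounds.items()):
    print(cell, lb, ub)
```

A narrow interval means the marginal tables nearly determine the joint cell, which is relevant both for matching uncertainty and for disclosure risk.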

14 Statistical Matching and Release
How to use a released microdata file in a statistical matching procedure?
Issues:
1. Use protection/perturbation information to improve the statistical matching performance
2. Impact on statistical analyses

15 Conclusions
1. Change (improve/adapt) the DI process to account for microdata files with (some) known properties
2. Change (improve/adapt) the SDC process to account for the latest methodological and technological DI developments
3. PRACTICE
Step-by-step approach!!!

