Confidentiality on the Fly


1 Confidentiality on the Fly
LAMAS Working Group, 7 Dec 2017, Agenda Item 3.1
Fabian BACH, ESTAT.B.1, Eurostat

2 Outline
1. Introduction: "Confidentiality on the fly"
2. Method principle: Random noise (cell key method)
3. Application to LFS ad-hoc tables & implications
4. Opinion of the Expert Group on Statistical Disclosure Control

3 Confidentiality on the fly in short
1. Introduction
Microdata → automatic machinery (table builder + confidentiality treatment) → safe tables

4 Confidentiality on the fly in short
1. Introduction
Microdata → automatic machinery (table builder + confidentiality treatment) → safe tables
Key requirements for confidentiality treatment: consistent, automatic

5 Confidentiality on the fly in short
1. Introduction
Microdata → automatic machinery (table builder + confidentiality treatment) → safe tables
Key requirements for confidentiality treatment: consistent, automatic
Live example: ABS TableBuilder
→ public tool by the Australian Bureau of Statistics (ABS)
→ uses random noise ("cell key method" developed by ABS)

6 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200

7 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 1: assign each record a random number (record key, "Rkey") between 1 and the max. cell key (here 200)
ID  Region  Sex  Age
1   A       M    31
2           F    47
3   B            22

8 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 1: assign each record a random number (record key, "Rkey") between 1 and the max. cell key (here 200)
ID  Region  Sex  Age  Rkey
1   A       M    31   54
2           F    47   104
3   B            22   93

9 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 2: Create the tables (unweighted frequencies here!).
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank

10 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 2: Create the tables (unweighted frequencies here!). For each cell, sum all Rkeys modulo 200
ID  Rkey
2   104
4   61
56  7
72  90
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank

11 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 2: Create the tables (unweighted frequencies here!). For each cell, sum all Rkeys modulo 200
ID  Rkey
2   104
4   61
56  7
72  90
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank
Ckey = Σ(Rkey) mod 200 = 262 mod 200 = 62
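
The cell key arithmetic on this slide can be sketched in Python as follows. This is only an illustration: the function name `cell_key` is an assumption, while the record keys (IDs 2, 4, 56, 72) and the modulus 200 are the ones shown on the slide.

```python
MAX_CELL_KEY = 200  # maximum cell key used in the slides


def cell_key(record_keys, modulus=MAX_CELL_KEY):
    """Sum the record keys of all records in a cell, modulo the max. cell key."""
    return sum(record_keys) % modulus


# Record keys of the four records that fall into the example cell
rkeys = {2: 104, 4: 61, 56: 7, 72: 90}

ckey = cell_key(rkeys.values())
print(sum(rkeys.values()))  # 262
print(ckey)                 # 62
```

Because the sum does not depend on the order of the records, the same set of records always yields the same cell key, whichever table the cell appears in.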

12 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 3: Use pre-defined perturbation table ("p-table") to get noise value

13 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 4: Add noise value to cell
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank
Noise value: +1

14 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 4: Add noise value to cell
Table before noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank
Noise value: +1
Table after noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 5; 25-34: blank

15 Cell key method
2. Method principle
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm – step 4: Add noise value to cell
Pseudo-random: will always be +1 for this cell, due to Ckey
Fixed noise variance through p-table design
Table before noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4; 25-34: blank
Noise value: +1
Table after noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 5; 25-34: blank
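
Steps 1 to 4 can be put together in a minimal Python sketch. The slides do not show the actual p-table, so `lookup_noise` below is a purely hypothetical stand-in (it is not the ABS or Eurostat table): it returns -1 for a quarter of the cell keys, +1 for another quarter and 0 otherwise, which gives mean 0 and variance 0.5 if cell keys are roughly uniform. The function names, the fixed seed and the rule that empty cells stay at zero are likewise assumptions made for illustration.

```python
import random

MAX_CELL_KEY = 200  # as in the slides


def assign_record_keys(record_ids, seed=2017):
    """Step 1: give each record a fixed random key between 1 and MAX_CELL_KEY."""
    rng = random.Random(seed)  # assumption: fixed seed so keys never change
    return {rid: rng.randint(1, MAX_CELL_KEY) for rid in record_ids}


def lookup_noise(ckey):
    """Step 3: HYPOTHETICAL p-table, for illustration only.

    -1 for cell keys 0..49, +1 for 50..99, 0 otherwise.
    """
    if ckey < 50:
        return -1
    if ckey < 100:
        return 1
    return 0


def perturbed_count(record_ids, rkeys):
    """Steps 2 to 4 for one cell: count, cell key, noise lookup, add noise."""
    count = len(record_ids)
    if count == 0:
        return 0  # assumption: empty cells are left untouched
    ckey = sum(rkeys[rid] for rid in record_ids) % MAX_CELL_KEY
    return count + lookup_noise(ckey)


# Example cell from the slides: four records, cell key 62
rkeys = {2: 104, 4: 61, 56: 7, 72: 90}
cell = [2, 4, 56, 72]
print(perturbed_count(cell, rkeys))                  # 5
print(perturbed_count(list(reversed(cell)), rkeys))  # 5
```

The noise is looked up rather than drawn afresh, so the same cell receives the same perturbation every time it is requested, which is the consistency property stressed on the slide.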

16 Application to LFS ad-hoc tables
Variant proposed by ABS for weighted tables:
ID  weight
2   0.23
4   0.2
56
72  0.42
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 1.08; 25-34: blank
1.08 = Σ(record weights)

17 Application to LFS ad-hoc tables
Variant proposed by ABS for weighted tables:
ID  weight
2   0.23
4   0.2
56
72  0.42
Table (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4*0.27; 25-34: blank
4*0.27 = Σ(record weights) = avg. weight * #obs.
Add random noise here!
3 variances examined: 0.25 / 0.5 / 1.0

18 Application to LFS ad-hoc tables
Variant proposed by ABS for weighted tables:
Add random noise to unweighted observations
Table before noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 4*0.27; 25-34: blank
Noise value: +1 obs.
Table after noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 5*0.27; 25-34: blank

19 Application to LFS ad-hoc tables
Variant proposed by ABS for weighted tables:
Add random noise to unweighted observations
Multiply by cell's average weight
Absolute noise size is capped by input variance
→ becomes negligible when table values are big (next slide)
Table before noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 1.08; 25-34: blank
Noise value: +1 obs.
Table after noise (rows: Age; columns: Sex M, F): 0-15: .; 16-24: 1.35; 25-34: blank
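
The weighted variant can be sketched as below, using the figures from the slides (weighted total 1.08, 4 observations, noise of +1 observation). The function names are illustrative assumptions, and the noise value is simply passed in rather than looked up in a p-table.

```python
def perturb_weighted_cell(weighted_total, n_obs, noise):
    """Add noise to the unweighted count, then multiply by the average weight."""
    if n_obs == 0:
        return 0.0  # assumption: empty cells are left untouched
    avg_weight = weighted_total / n_obs
    return (n_obs + noise) * avg_weight


def relative_noise(n_obs, noise):
    """Relative change of the cell value caused by the noise."""
    return noise / n_obs


# Example cell from the slides: 4 observations, average weight 1.08 / 4 = 0.27
print(round(perturb_weighted_cell(1.08, 4, +1), 2))  # 1.35

# The same +1 observation matters less and less as the cell grows
for n_obs in (4, 400, 40000):
    print(n_obs, f"{relative_noise(n_obs, 1):.4%}")
```

Since the noise is bounded in number of observations, its relative effect shrinks in proportion to the cell size, which is why it becomes negligible for big table values.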

20 Implications for LFS ad-hoc tables
Generally no additivity inside a table: e.g. males, females and total aged born outside EU28 in an elementary occupation (from BE 2015 data)
Columns: 1.Males, 2.Females, TOTAL, manual sum
VALUE: 1.87 2.39 4.27
VALUENEW: 2.01 4.40

21 Implications for LFS ad-hoc tables
Generally no additivity inside a table: e.g. males, females and total aged born outside EU28 in an elementary occupation (from BE 2015 data)
→ always extract table (sub-)totals for highest accuracy!
Columns: 1.Males, 2.Females, TOTAL, noise, manual sum
VALUE: 1.87 2.39 4.27 -
VALUENEW: 2.01 0 % 4.40 %

22 Implications for LFS ad-hoc tables
Generally no additivity inside a table: e.g. males, females and total aged born outside EU28 in an elementary occupation (from BE 2015 data)
→ always extract table (sub-)totals for highest accuracy!
Noise magnitude ± ~few obs.
→ within typical error
→ negligible for big values
Columns: 1.Males, 2.Females, TOTAL, noise, manual sum
VALUE: 1.87 2.39 4.27 -
VALUENEW: 2.01 0 % 4.40 %
Columns: VALUE, VALUENEW, noise
0.65 0.75 % ~0.0 %
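
The loss of additivity follows from each cell having its own cell key, so the total is perturbed independently of its parts. The sketch below shows this on made-up unweighted data: the record keys and the toy p-table in `toy_noise` are hypothetical and chosen only so that the effect is visible; they are not taken from the LFS data or from any real p-table.

```python
MAX_CELL_KEY = 200  # as in the slides


def toy_noise(ckey):
    """HYPOTHETICAL p-table: -1 for keys 0..49, +1 for 50..99, 0 otherwise."""
    if ckey < 50:
        return -1
    if ckey < 100:
        return 1
    return 0


def perturb(record_keys):
    """Perturbed count of a cell, given the record keys of its records."""
    if not record_keys:
        return 0  # assumption: empty cells are left untouched
    return len(record_keys) + toy_noise(sum(record_keys) % MAX_CELL_KEY)


# Made-up record keys for two male and two female records
males = [30, 40]          # cell key 70  -> noise +1
females = [60, 50]        # cell key 110 -> noise 0
total = males + females   # cell key 180 -> noise 0

manual_sum = perturb(males) + perturb(females)
extracted_total = perturb(total)
print(perturb(males), perturb(females), manual_sum, extracted_total)  # 3 2 5 4
```

Here the manual sum carries the noise of both parts, while the extracted total carries only its own noise, which is the reason for the advice to extract (sub-)totals directly rather than summing cells by hand.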

23 Opinion of the EG SDC
4. EG SDC opinion
14 Nov: Presentation to the Expert Group on Statistical Disclosure Control (EG SDC)
The EG SDC …
… welcomed the general approach and way forward
… including particularly the LFS proposal (as one data pilot)
… stressed the need for clear explanations to all parties:
  Data providers
  Data users

