1
Confidentiality on the Fly
LAMAS Working Group, 7 Dec 2017, Agenda Item 3.1
Fabian BACH, ESTAT.B.1, Eurostat
2
Outline
1. Introduction: "Confidentiality on the fly"
2. Method principle: Random noise (cell key method)
3. Application to LFS ad-hoc tables & implications
4. Opinion of the Expert Group on Statistical Disclosure Control
3
1. Introduction
Confidentiality on the fly in short
Automatic machinery: Microdata -> Table builder + confidentiality treatment -> Safe tables
Key requirements for confidentiality treatment: consistent, automatic
Live example: ABS TableBuilder
public tool by the Australian Bureau of Statistics (ABS)
uses random noise ("cell key method" developed by ABS)
6
2. Method principle
Cell key method
Variant of random noise developed by the Australian Bureau of Statistics: strictly consistent through cell keys, e.g. 1…200
Algorithm, step 1: assign each record a random number (record key, "Rkey") between 1 and the maximum cell key (here 200)

ID  Region  Sex  Age  …  Rkey
1   A       M    31      54
2           F    47      104
3   B            22      93
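Step 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name assign_record_keys and the fixed seed are hypothetical, and the slides do not say how the random numbers are generated or stored. What matters is that each record gets its key once and keeps it for every later table.

```python
import random

MAX_CELL_KEY = 200  # "max. cell key" from the slide


def assign_record_keys(record_ids, seed=2017):
    """Give every record a random key in 1..MAX_CELL_KEY.

    The fixed seed is an assumption of this sketch: it makes the keys
    reproducible, which stands in for storing them with the microdata.
    """
    rng = random.Random(seed)
    return {rid: rng.randint(1, MAX_CELL_KEY) for rid in record_ids}


rkeys = assign_record_keys([1, 2, 3])
print(rkeys)
```

Because the keys are fixed per record, any table built later from the same microdata sees the same Rkeys.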
9
2. Method principle
Cell key method
Algorithm, step 2: create the tables (unweighted frequencies here!). For each cell, sum the Rkeys of all records in the cell, modulo 200.
Example table: Age (0-15, 16-24, 25-34, …) by Sex (M, F); the 0-15 cell shows ".", the 16-24 cell shows 4.
Records in the 16-24 cell:

ID  Rkey
2   104
4   61
56  7
72  90

Ckey = Σ(Rkey) mod 200 = 262 mod 200 = 62
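The cell key arithmetic on this slide can be checked directly. The sketch below uses the four Rkeys shown (104, 61, 7, 90); the function name cell_key is hypothetical. Note that a plain modulo yields values 0..199, and the slides do not state how this maps onto the range "1…200".

```python
MAX_CELL_KEY = 200  # modulus used on the slide


def cell_key(record_keys):
    """Cell key = sum of the record keys in the cell, modulo MAX_CELL_KEY."""
    return sum(record_keys) % MAX_CELL_KEY


# Rkeys of the four records in the example cell, keyed by record ID
rkeys_in_cell = {2: 104, 4: 61, 56: 7, 72: 90}

ckey = cell_key(rkeys_in_cell.values())
print(sum(rkeys_in_cell.values()), ckey)  # 262 62
```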
12
2. Method principle
Cell key method
Algorithm, step 3: use a pre-defined perturbation table ("p-table") to look up the noise value for the cell's Ckey
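The slides do not show the p-table itself, so the following lookup uses a made-up one purely for illustration. P_TABLE, lookup_noise and noise_variance are hypothetical names, and the key ranges are chosen only so that the noise has mean 0 and variance 0.5 and so that Ckey 62 maps to +1 as in the example. Real p-tables typically also depend on the cell count, which this sketch leaves out.

```python
MAX_CELL_KEY = 200

# Hypothetical p-table: (first key, last key, noise value).
# NOT the actual ABS or Eurostat table.
P_TABLE = [(0, 49, -1), (50, 99, +1), (100, 199, 0)]


def lookup_noise(ckey):
    """Return the noise value for a cell key from the p-table."""
    for first, last, noise in P_TABLE:
        if first <= ckey <= last:
            return noise
    raise ValueError(f"cell key {ckey} not covered by the p-table")


def noise_variance():
    """Variance of the noise if every cell key is equally likely."""
    values = [lookup_noise(k) for k in range(MAX_CELL_KEY)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(lookup_noise(62))   # 1
print(noise_variance())   # 0.5
```

The variance is fixed entirely by how many keys map to each noise value, which is one way to read "fixed noise variance through p-table design".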
13
2. Method principle
Cell key method
Algorithm, step 4: add the noise value to the cell
Example: the 16-24 cell goes from 4 to 5 (noise +1)
Pseudo-random: the noise will always be +1 for this cell, because it is determined by the Ckey
Fixed noise variance through p-table design
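Putting steps 2 to 4 together shows why the noise is consistent: the same set of records always produces the same Ckey and therefore the same noise, whichever table the cell appears in. The sketch below reuses a hypothetical p-table (not the real one) and the example Rkeys; perturb_count is an illustrative name, and leaving empty cells untouched is an assumption.

```python
MAX_CELL_KEY = 200

# Hypothetical p-table (first key, last key, noise); not the real one.
P_TABLE = [(0, 49, -1), (50, 99, +1), (100, 199, 0)]


def perturb_count(rkeys):
    """Return the perturbed unweighted count for a cell given its Rkeys."""
    count = len(rkeys)
    if count == 0:
        return 0  # assumption: empty cells stay empty
    ckey = sum(rkeys) % MAX_CELL_KEY
    noise = next(n for first, last, n in P_TABLE if first <= ckey <= last)
    return count + noise


cell = [104, 61, 7, 90]                      # Rkeys from the example cell
print(perturb_count(cell))                   # 5
print(perturb_count(list(reversed(cell))))   # 5, record order does not matter
```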
16
Application to LFS ad-hoc tables
Variant proposed by ABS for weighted tables:
Weighted cell value = Σ(record weights) = avg. weight * #obs.

ID  weight
2   0.23
4   0.2
56  0.23
72  0.42

Example cell (age 16-24): 1.08 = 4 * 0.27
Add random noise to the unweighted number of observations (3 noise variances examined: 0.25 / 0.5 / 1.0), then multiply by the cell's average weight: (4 + 1 obs.) * 0.27 = 5 * 0.27 = 1.35
Absolute noise size is capped by the input variance
Noise becomes negligible when table values are big (next slide)
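The weighted variant amounts to one multiplication, sketched below with the slide's numbers (weighted value 1.08, 4 observations, noise +1). The function name perturb_weighted and the second, larger cell are hypothetical; the larger cell is there only to show how the relative effect of one observation shrinks.

```python
def perturb_weighted(weight_sum, n_obs, noise):
    """Add noise to the unweighted count, then scale by the average weight."""
    if n_obs == 0:
        return 0.0  # assumption: empty cells stay empty
    avg_weight = weight_sum / n_obs
    return (n_obs + noise) * avg_weight


small = perturb_weighted(1.08, 4, +1)
print(round(small, 2))   # 1.35

# Relative size of a +1 observation noise for a small and a big cell
print(1 / 4)      # 0.25
print(1 / 4000)   # 0.00025
```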
20
Implications for LFS ad-hoc tables
Generally no additivity inside a table: e.g. males, females and total aged born outside EU28 in an elementary occupation (from BE 2015 data)
Always extract table (sub-)totals for highest accuracy!
Noise magnitude: ± ~few obs., within typical error, negligible for big values

1.Males 2.Females TOTAL noise manual sum
VALUE 1.87 2.39 4.27 -
VALUENEW 2.01 0 % 4.40 %

VALUE VALUENEW noise
0.65 0.75 % ~0.0 %
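The loss of additivity follows from each cell, including the total, getting its own Ckey and hence its own noise. A minimal sketch, assuming a hypothetical p-table and made-up Rkeys for one male and three female records (none of these values come from the BE 2015 data):

```python
MAX_CELL_KEY = 200

# Hypothetical p-table (first key, last key, noise); not the real one.
P_TABLE = [(0, 49, -1), (50, 99, +1), (100, 199, 0)]


def perturb_count(rkeys):
    """Perturbed unweighted count of a non-empty cell."""
    ckey = sum(rkeys) % MAX_CELL_KEY
    noise = next(n for first, last, n in P_TABLE if first <= ckey <= last)
    return len(rkeys) + noise


males = [54]              # illustrative Rkeys
females = [104, 93, 61]   # illustrative Rkeys

manual_sum = perturb_count(males) + perturb_count(females)
total = perturb_count(males + females)   # total cell has its own Ckey
print(manual_sum, total)  # 6 4
```

The total cell is perturbed only once, so its noise does not accumulate the way a manual sum of perturbed parts does, which is why extracting the (sub-)total directly is the more accurate choice.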
23
4. EG SDC opinion
Opinion of the EG SDC
14 Nov: Presentation to the Expert Group on Statistical Disclosure Control (EG SDC)
The EG SDC …
… welcomed the general approach and way forward
… including particularly the LFS proposal (as one data pilot)
… stressed the need for clear explanations to all parties: data providers, data users