Published by Felix Goodwin. Modified over 8 years ago.
1
G. Merola, Winton Capital Management
UN/ECE Work Session on Statistical Data Confidentiality (Geneva, 9-11 November 2005)
WP30: Safety rules in statistical disclosure control for tabular data
Giovanni Merola, Winton Capital Management Ltd, g.merola@wintoncapital.com
Partially written while at ISTAT and partially supported by the EU project CASC.
2
Plan of the Talk
1. SDC for Magnitude tables;
2. Existing safety rules;
3. Generalised p-rule;
4. Rational estimates;
5. Prior distribution;
6. U-estimates;
7. Comparison on real SBS data;
8. MU-rules;
9. Concluding remarks.
3
1. SDC for Magnitude Tables
Tables showing the sums of non-negative contributions in each cell. Example:

Income (£K)   Young    Old   All Ages
Male            200    600        800
Female          150    450        600
All Sexes       350   1050       1400

Cell "Old Males": total 600, contributions 150, 130, 120, 90, 50, 40, 20.
Contributions are in non-increasing order: $z_1 \ge z_2 \ge z_3 \ge z_4 \ge \dots \ge z_n$.
The total $T$ is published; $n$ is the number of contributors.
4
1. SDC for Magnitude Tables (cont'd)
SDC policy:
1. If the categories are confidential, (likely) identification of respondents is disclosure;
2. else only the contributions of (likely) identifiable respondents cannot be disclosed (too precisely);
3. same rule for all cells, else microdata protection.
5
2. Existing Safety Rules
Each rule is stated as the condition under which a cell is safe.
Rare respondents are identifiable. Threshold rule: $n > m$.
Respondents with large contributions are identifiable. Dominance: $(z_1 + \dots + z_m)/T < k$.
The largest contributor is identifiable, hence the second largest must not estimate $z_1$ closely. p-rule: $[(T - z_2) - z_1]/z_1 > p$.
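The three rules above can be sketched as safety checks. This is a minimal illustration, not the author's code: the function names are hypothetical, the parameter values `m=3`, `k=0.6` and `p=0.1` are arbitrary assumptions, and the contributions are those read from the "Old Males" cell of the earlier example (total 600).

```python
def threshold_safe(z, m):
    """Threshold rule: the cell is safe if it has more than m contributors."""
    return len(z) > m


def dominance_safe(z, m, k):
    """Dominance rule: safe if the m largest contributions make up less than a share k of T."""
    z = sorted(z, reverse=True)
    return sum(z[:m]) / sum(z) < k


def p_rule_safe(z, p):
    """p-rule: safe if the estimate T - z2 overshoots z1 by more than a fraction p."""
    z = sorted(z, reverse=True)
    if len(z) < 2:
        raise ValueError("the p-rule needs at least two contributions")
    total = sum(z)
    return ((total - z[1]) - z[0]) / z[0] > p


# Contributions of the "Old Males" cell; m, k and p are illustrative assumptions.
old_males = [150, 130, 120, 90, 50, 40, 20]
print(threshold_safe(old_males, m=3))
print(dominance_safe(old_males, m=2, k=0.6))
print(p_rule_safe(old_males, p=0.1))
```

With these assumed parameters the cell passes all three checks: it has 7 contributors, the two largest hold 280/600 of the total, and the second largest contributor's estimate of $z_1$ is 470 against a true value of 150.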
6
3. Generalised p-rule
Includes the existence of groups of respondents: the group with the largest sum is identifiable; the group with the second largest sum must not estimate the largest sum too closely.
[Figure: contributions $z_1, z_2, z_3, z_4, \dots, z_n$ with total $T$; the groups $t_2$ and $R_{2,2}$ are marked.]
7
3. Generalised p-rule (cont'd)
Gen. p-rule: $\left((T - R_{m,l}) - t_m\right)/t_m > p$.
Same estimate as the p-rule, the maximum possible value: $\hat{t}_m = T - R_{m,l}$.
With $t_1 = z_1$ and $R_{1,1} = z_2$ it reduces to the p-rule.
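A minimal sketch of the generalised p-rule follows. It assumes, consistently with $t_1 = z_1$ and $R_{1,1} = z_2$, that $t_m$ is the sum of the $m$ largest contributions and $R_{m,l}$ the sum of the next $l$ largest; that reading, the function name and the example values are assumptions rather than definitions taken from the slides.

```python
def gen_p_rule_safe(z, m, l, p):
    """Generalised p-rule: safe if ((T - R_ml) - t_m) / t_m > p.

    Assumption: t_m is the sum of the m largest contributions and
    R_ml is the sum of the l contributions that follow them.
    """
    if m < 1 or l < 0 or m + l > len(z):
        raise ValueError("need m >= 1, l >= 0 and m + l <= number of contributions")
    z = sorted(z, reverse=True)
    total = sum(z)
    t_m = sum(z[:m])
    r_ml = sum(z[m:m + l])
    estimate = total - r_ml  # maximum possible value of t_m for the intruding group
    return (estimate - t_m) / t_m > p


# Illustrative cell (total 600) and an arbitrary p.
cell = [150, 130, 120, 90, 50, 40, 20]
print(gen_p_rule_safe(cell, m=2, l=2, p=0.1))
print(gen_p_rule_safe(cell, m=1, l=1, p=0.1))  # m = l = 1 is the ordinary p-rule
```

For `m=2, l=2` the intruding group estimates $t_2 = 280$ as $600 - 210 = 390$, a relative error of about 0.39.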
8
3. Generalised p-rule (cont'd)
If zero contributions are known (external intruder): Dominance rule with $k = 1/(1+p)$;
if no groups: simple p-rule;
if the intruding group is formed of $(m-1)$ respondents: the threshold rule $n > m$ protects against exact estimation ($p = 0$).
Merola, G. M., 2003a. Generalized risk measures for tabular data. Proceedings of the 54th Session of the International Statistical Institute.
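The first case can be checked numerically. With no known contributions the rule reads $(T - t_m)/t_m > p$, which rearranges to $t_m/T < 1/(1+p)$. The sketch below compares the two conditions over every integer value of $t_m$ for an arbitrary total; the function names, the total of 600 and $p = 1/4$ are assumptions chosen for illustration, and exact fractions are used so that the boundary case is not blurred by rounding.

```python
from fractions import Fraction


def gen_p_safe_no_knowledge(t_m, total, p):
    """Generalised p-rule when the intruder knows no contributions (R = 0)."""
    return Fraction(total - t_m, t_m) > p


def dominance_safe(t_m, total, k):
    """Dominance rule on the share of the total held by the group."""
    return Fraction(t_m, total) < k


p = Fraction(1, 4)   # illustrative value
k = 1 / (1 + p)      # equals 4/5
total = 600          # illustrative total

agree = all(
    gen_p_safe_no_knowledge(t, total, p) == dominance_safe(t, total, k)
    for t in range(1, total + 1)
)
print(k, agree)
```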
9
4. Rational Estimates
An intruder can compute a lower and an upper bound for the value of $t_m$: $t_m^- \le t_m \le t_m^+$.
For example, if $z_2 = 40$ and $T = 100$: $40 = z_2 \le z_1 \le T - z_2 = 60$.
The bounds are different for different prior knowledge of the intruder.
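The example bounds can be reproduced with a small helper for the case of an intruder who knows the total and the second largest contribution. The function name is hypothetical; the validity check relies on the fact that $z_1 \ge z_2$ and $z_1 + z_2 \le T$ imply $z_2 \le T/2$.

```python
def bounds_for_largest(total, z2):
    """Bounds on z1 for an intruder who knows the total T and the contribution z2."""
    if not 0 <= z2 <= total / 2:
        raise ValueError("z2 must lie between 0 and T/2")
    return z2, total - z2


lower, upper = bounds_for_largest(total=100, z2=40)
print(lower, upper)  # 40 60
```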
10
4. Rational Estimates (cont'd)
$t_m$ can be estimated by minimising the Mean Square Error (MSE) for some distribution $F(t_m)$.
By a well-known property, the MSE is minimised by the mean.
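The property can be written out as follows. The symbol $a$ for a candidate estimate is introduced here for illustration, and the expectation is assumed to exist; this is the standard decomposition, not a formula taken from the slides.

```latex
\begin{align*}
\mathrm{MSE}(a) &= E_F\!\left[(t_m - a)^2\right]
                 = \operatorname{Var}_F(t_m) + \left(E_F[t_m] - a\right)^2, \\
\hat{t}_m &= \arg\min_a \mathrm{MSE}(a) = E_F[t_m].
\end{align*}
```

The variance term does not depend on $a$, so the MSE is smallest when the squared term vanishes, that is at $a = E_F[t_m]$.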
11
5. Prior Distribution: Uniform
The ignorance about the distribution of $t_m$ can be modelled with a Uniform distribution: $t_m \sim U(t_m^-, t_m^+)$.
In this case the mean is simply the midpoint of the bounds: $\hat{t}_m = (t_m^- + t_m^+)/2$.
Note: same estimate for any symmetric $F$.
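A rough numerical check of the uniform estimate: the sketch approximates the Uniform distribution by an equally spaced grid between the bounds and compares the mean squared error of the midpoint with that of the upper bound. The function names, the grid size and the bounds 40 and 60 are illustrative assumptions.

```python
def uniform_estimate(lower, upper):
    """Mean of a Uniform distribution on [lower, upper]: the midpoint."""
    return (lower + upper) / 2


def mean_squared_error(guess, lower, upper, steps=1000):
    """Approximate MSE of a guess under U(lower, upper) using an equally spaced grid."""
    points = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return sum((x - guess) ** 2 for x in points) / len(points)


lower, upper = 40, 60
midpoint = uniform_estimate(lower, upper)
print(midpoint)  # 50.0
# The midpoint has a smaller MSE than the maximum possible value.
print(mean_squared_error(midpoint, lower, upper) < mean_squared_error(upper, lower, upper))
```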
12
5. Prior Distribution: maximising
The Generalised p-rule can be derived by assuming a prior concentrated on the maximum value.
We refer to the Gen. p-rule as the M-rule, and to the rule derived using the Uniform as the U-rule.
13
6. U-estimates
Different prior knowledge of the intruder:
knows T but not n (Dominance);
knows T and n;
knows T and L contributions (Gen. p-rule*);
knows T, L contributions and n: either as above or
* for m = L = 1 the uniform p-rule is the same as uniform dominance.
Merola, G., 2003b. Safety rules in statistical disclosure control for tabular data. Contributi Istat 1, Istituto Nazionale di Statistica, Roma.
14
6. U-estimates (cont'd)
Example
C = (970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10), T = 2706 ($t_2/T = 0.5$)

Rule      Estimated z_1   RelErr
Dom       2706            1.8
p-rule    2330            1.4
U-Dom     1353            0.4
U(1:n)    1465            0.51
U(1;1)    1353            0.4
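The figures in this example can be approximately reproduced with the sketch below. The maximising estimates (T and T - z_2) follow the rules stated earlier; the bounds used for the three uniform rows are assumptions chosen because their midpoints match the reported estimates up to rounding: [0, T] when only T is known, [T/n, T] when T and n are known, and [z_2, T - z_2] when T and z_2 are known.

```python
contributions = [970, 376, 274, 253, 203, 169, 161, 121, 86, 62, 21, 10]
T = sum(contributions)
n = len(contributions)
z1, z2 = contributions[0], contributions[1]

# Estimates of z1; the uniform bounds are assumptions, see the note above.
estimates = {
    "Dom": T,                       # maximum possible value knowing only T
    "p-rule": T - z2,               # maximum possible value knowing T and z2
    "U-Dom": (0 + T) / 2,           # midpoint of [0, T]
    "U(1:n)": (T / n + T) / 2,      # midpoint of [T/n, T]
    "U(1;1)": (z2 + (T - z2)) / 2,  # midpoint of [z2, T - z2]
}

relative_errors = {rule: (est - z1) / z1 for rule, est in estimates.items()}

for rule, est in estimates.items():
    print(f"{rule:8s} {est:8.1f} {relative_errors[rule]:6.2f}")
```

Under these assumptions the U(1;1) midpoint equals T/2, the same as U-Dom, which agrees with the remark that for m = L = 1 the uniform p-rule coincides with uniform dominance.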
15
7. Comparison on real SBS data
We applied different rules to Italian SBS data: turnover by Region and SIC for the years '94 and '97. We considered the SIC at 2 and 3 digits.
16
7. Comparison on real SBS data (cont'd)
[Figure: mean relative error for $z_1$]
17
7. Comparison on real SBS data (cont'd)
[Figure: mean relative error for $t_2$]
18
8. U-rules
The values for are intervals:
Knowing only T (Dominance)
Knowing T and L contributions (gen p-rule)
19
9. MU-rules
Assuming both estimating approaches, we obtain subadditive rules, analogous to the p-rule but with stricter bounds.
20
9. MU-rules (cont'd)
Safety rule when only T is known (Dominance)
Safety rule when T and L contributions are known (gen p-rule)
21
10. Conclusions
The assumptions for the existing rules are unrealistic;
using a simple noninformative distribution gives a much smaller relative error of estimation;
the corresponding rules are not subadditive;
joining the assumptions leads to stricter rules;
identifiability of all largest respondents requires these rules;
a different prior can be used.