Published by Samantha Rice. Modified over 9 years ago.
1
Disclosure detection & control in research environments Felix Ritchie
2
Why are research environments special?
Little disclosure control on input
Few limits on processing
Unpredictable, complex outputs
–an infinity of “special cases”
Manual review for disclosiveness required
3
Problems of reviewing research outputs
Limited application of rules
How do we ensure
–consistency?
–transparency?
–security?
How do we do this with few resources?
4
Classifying the research zoo
Some outputs inherently “safe”
Some inherently “unsafe”
Concentrate on the unsafe
–Focus training
–Define limits
–Discourage use
5
Safe versus unsafe
Safe outputs
–Will be released unless certain conditions arise
Unsafe outputs
–Won’t be released unless demonstrated to be safe
Examples (* = conditions for release apply):

Unsafe                   Safe                     Indeterminate
Quantiles                Linear regression*       Herfindahl indexes
Graphs                   Panel data estimates
Aggregated tables        Estimated covariances*
Cross-product matrices
6
Determining safety
Key is to understand whether the underlying functional form is safe or unsafe
Each output type assessed for risk of
–Primary disclosure
–Disclosure by differencing
7
Example: linear aggregates of data are unsafe
Inherent disclosiveness:
–Each data point needs to be assessed for threshold/dominance limits
–=> resource problem for large datasets
Disclosure by differencing:
–Differencing is feasible
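The differencing risk for linear aggregates can be illustrated with a minimal sketch. The firm values, the threshold of 3 units, and the helper names are hypothetical; the 45% dominance limit is borrowed from the rule quoted on the Herfindahl slide. Each total passes the checks on its own, yet subtracting one from the other recovers a single unit's value.

```python
def total(values):
    """Linear aggregate: a plain sum over the units in a cell."""
    return sum(values)


def passes_threshold(values, min_units=3):
    """Hypothetical threshold rule: at least `min_units` contributors."""
    return len(values) >= min_units


def passes_dominance(values, max_share=0.45):
    """Hypothetical dominance rule: largest unit below `max_share` of the total."""
    cell_total = sum(values)
    if cell_total == 0:
        return True
    return max(values) / cell_total < max_share


# Hypothetical turnover figures for four firms (illustrative only).
firms = [120.0, 95.0, 80.0, 110.0]

published_all = total(firms)         # output requested for all four firms
published_subset = total(firms[:3])  # later output excluding the last firm

# Both outputs pass the per-output checks ...
both_pass = all(
    passes_threshold(v) and passes_dominance(v) for v in (firms, firms[:3])
)

# ... but their difference is exactly the excluded firm's value.
recovered = published_all - published_subset
print(both_pass, recovered)
```

The point of the sketch is that checks applied to each output separately do not see the relationship between outputs, which is why every requested aggregate has to be assessed against what has already been released.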
8
Example: linear regression coefficients are safe
Can’t identify single data point
No risk of differencing
Exceptions
–All right-hand-side variables public and an excellent fit (easily tested, can generate automatic limits on prediction)
–All observations on a single person/company
–Must be a valid regression
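The "excellent fit" exception can be tested mechanically, as the slide suggests. The sketch below is one possible check, not the presenter's method: the simulated data, the `fit_too_good` helper, and the 0.99 limit on R² are all assumptions made for illustration. When the right-hand-side variables are public and the fit is near perfect, the released coefficients let an outsider predict the confidential dependent variable almost exactly.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and R^2; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


def fit_too_good(r2, limit=0.99):
    """Hypothetical automatic limit: flag the output if R^2 exceeds `limit`."""
    return r2 > limit


rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                # assumed to be a public variable
X = np.column_stack([np.ones(n), x])

# Ordinary noisy relationship versus a near-deterministic one.
y_noisy = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
y_exact = 1.0 + 2.0 * x + rng.normal(scale=0.001, size=n)

beta_noisy, r2_noisy = ols(X, y_noisy)
beta_exact, r2_exact = ols(X, y_exact)

# With public x and released coefficients, predictions approximate each y.
worst_error_exact = float(np.max(np.abs(X @ beta_exact - y_exact)))
print(fit_too_good(r2_noisy), fit_too_good(r2_exact), worst_error_exact)
```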
9
Example: cross-product/variance-covariance matrices
Cross-product matrix M = (X’X) is unsafe
–Frequencies/totals identified by interaction with constant
–And for any other categorical variables
What about variance-covariance matrices?
–V is unsafe: can be inverted to produce M
–But in the more general case, can’t create a table for X unless Z=X and W=I
–=> weighted covariance matrix is safe
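A small sketch shows why the cross-product matrix leaks tabular information. The data, the category indicator, and the value of `sigma2` are hypothetical. The last step assumes that V is the usual homoskedastic OLS covariance, sigma2 * (X'X)^-1, with sigma2 known to the recipient; the slide's own formula for V did not survive extraction, so this is an assumption rather than the presenter's definition.

```python
import numpy as np

# Hypothetical microdata: a constant, a category indicator, a confidential value.
const = np.ones(6)
dummy = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 1.0])
value = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])

X = np.column_stack([const, dummy, value])
M = X.T @ X  # cross-product matrix

# Interactions with the constant and the dummy are table cells in disguise.
n_obs = M[0, 0]           # number of observations
n_in_category = M[0, 1]   # frequency of the category
total_value = M[0, 2]     # grand total of the confidential variable
category_total = M[1, 2]  # total of the confidential variable within the category

# Assumed form of V: sigma2 * inverse(X'X), with sigma2 taken as known.
sigma2 = 2.5
V = sigma2 * np.linalg.inv(M)
M_recovered = sigma2 * np.linalg.inv(V)

print(n_obs, n_in_category, total_value, category_total)
print(np.allclose(M_recovered, M))
```

Under that assumption, releasing V together with the residual variance is equivalent to releasing M, which is the sense in which V "can be inverted to produce M".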
10
Example: Herfindahl indices
Composite index of industrial concentration
Safe as long as at least 3 firms in the industry? No:
–Quadratic term exacerbates dominance
–If second-largest share is much smaller, the square root of H approximates the share of the largest firm
–Standard dominance rule of largest unit<45% share doesn’t prevent this
Current tests for safety not very satisfactory
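The dominance problem can be checked with a few lines of arithmetic. The sketch assumes the standard definition of the index as the sum of squared market shares; the share values are hypothetical. The largest firm sits just under the 45% dominance limit, so the rule is satisfied, yet the square root of the published index lands within about a percentage point of that firm's share.

```python
import math


def herfindahl(shares):
    """Herfindahl index, assumed here to be the sum of squared market shares."""
    return sum(s * s for s in shares)


# Hypothetical industry: one firm with 44%, twenty-eight firms with 2% each.
shares = [0.44] + [0.02] * 28

H = herfindahl(shares)
implied_largest = math.sqrt(H)  # what an outsider can infer from H alone
largest = max(shares)

passes_dominance_rule = largest < 0.45  # rule quoted on the slide
print(passes_dominance_rule, largest, implied_largest)
```

Because every squared share is non-negative, the square root of H is always an upper bound on the largest share, and the bound tightens as the remaining firms get smaller.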
11
Questions? Felix Ritchie Microdata Analysis and User Support Office for National Statistics felix.ritchie@ons.gov.uk +44 1633 45 5846