Disclosure detection & control in research environments Felix Ritchie.

1 Disclosure detection & control in research environments Felix Ritchie

2 Why are research environments special?
Little disclosure control on input
Few limits on processing
Unpredictable, complex outputs
–an infinity of “special cases”
→ Manual review for disclosiveness required

3 Problems of reviewing research outputs
Limited application of rules
How do we ensure
–consistency?
–transparency?
–security?
How do we do this with few resources?

4 Classifying the research zoo
Some outputs inherently “safe”
Some inherently “unsafe”
Concentrate on the unsafe
–Focus training
–Define limits
–Discourage use

5 Safe versus unsafe
Safe outputs
–Will be released unless certain conditions arise
Unsafe outputs
–Won’t be released unless demonstrated to be safe
Examples (* = conditions for release apply):

Unsafe                    Safe                      Indeterminate
Quantiles                 Linear regression*        Herfindahl indexes
Graphs                    Panel data estimates
Aggregated tables         Estimated covariances*
Cross-product matrices

6 Determining safety
Key is to understand whether the underlying functional form is safe or unsafe
Each output type assessed for risk of
–Primary disclosure
–Disclosure by differencing

7 Example: linear aggregates of data are unsafe
Inherent disclosiveness:
–Differencing is feasible
→ each data point needs to be assessed for threshold/dominance limits
=> resource problem for large datasets
Disclosure by differencing: [equation not preserved in transcript]
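
To make the differencing risk on this slide concrete, here is a minimal Python sketch. The firm names, turnover figures, and the helper functions `passes_threshold` and `passes_dominance` are hypothetical and not taken from the presentation. The 45% dominance limit echoes the rule quoted on slide 10, and the minimum of 3 units is an assumed threshold. It shows two totals that each pass both checks yet reveal one firm's value when subtracted.

```python
def passes_threshold(values, min_units=3):
    """True if the cell is built from at least min_units contributors (assumed limit)."""
    return len(values) >= min_units


def passes_dominance(values, max_share=0.45):
    """True if the largest contributor holds less than max_share of the total."""
    total = sum(values)
    return total > 0 and max(values) / total < max_share


# Hypothetical firm-level turnover; in practice this is confidential microdata.
turnover = {"A": 120.0, "B": 95.0, "C": 80.0, "D": 60.0}

all_firms = list(turnover.values())
without_d = [v for name, v in turnover.items() if name != "D"]

# Each requested total passes both checks when assessed on its own.
each_total_ok = all(
    passes_threshold(cell) and passes_dominance(cell)
    for cell in (all_firms, without_d)
)

# Differencing the two released totals isolates firm D exactly.
recovered_d = sum(all_firms) - sum(without_d)

print(each_total_ok, recovered_d)
```

Checking each total in isolation is therefore not enough. A reviewer would also need to compare every release against earlier ones, which is the resource problem the slide points to.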

8 Example: linear regression coefficients are safe
Let [equation not preserved in transcript]
→ can’t identify single data point
But [equation not preserved in transcript]
→ No risk of differencing
Exceptions
–All right-hand variables public and an excellent fit (easily tested, can generate automatic limits on prediction)
–All observations on a single person/company
–Must be a valid regression
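
The first exception can be illustrated with a short Python sketch. The data, the one-regressor helper `ols_simple`, and the 0.99 cut-off in `fit_too_good` are all illustrative assumptions, not the presentation's method. The point is only that when the regressor is public and the fit is near-perfect, the released coefficients let an outsider predict each confidential value closely.

```python
def ols_simple(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


def fit_too_good(r_squared, limit=0.99):
    """Flag regressions whose fit exceeds an assumed automatic limit."""
    return r_squared > limit


public_x = [1.0, 2.0, 3.0, 4.0, 5.0]            # known to outsiders
confidential_y = [5.1, 7.9, 11.0, 14.1, 16.9]   # hypothetical protected values

intercept, slope, r2 = ols_simple(public_x, confidential_y)

# An outsider with the coefficients and public_x can rebuild y almost exactly.
predicted = [intercept + slope * xi for xi in public_x]
worst_error = max(abs(p - yi) for p, yi in zip(predicted, confidential_y))

print(fit_too_good(r2), round(worst_error, 2))
```

A check of this kind is cheap to automate, which fits the slide's remark that the exception is easily tested.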

9 Example: cross-product/variance-covariance matrices
Cross product matrix M = (X’X) is unsafe
–Frequencies/totals identified by interaction with constant
–And for any other categorical variables
What about variance-covariance matrices?
–V is unsafe – can be inverted to produce M
–But in the more general case [equation not preserved in transcript]
–Can’t create a table for X unless Z=X and W=I
→ weighted covariance matrix is safe
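
The claim that M = (X’X) exposes frequencies and totals can be checked directly. The sketch below uses a hypothetical design matrix with a constant, one 0/1 categorical variable, and one continuous variable. The data and the helper `cross_product` are assumptions for illustration only.

```python
def cross_product(rows):
    """Return X'X for a design matrix given as a list of equal-length rows."""
    k = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        for i in range(k)
    ]


# Columns: constant, exporter dummy (0/1), turnover. Values are hypothetical.
X = [
    (1, 1, 10.0),
    (1, 0, 20.0),
    (1, 1, 30.0),
    (1, 0, 40.0),
    (1, 1, 50.0),
]

M = cross_product(X)

n_obs = M[0][0]              # constant x constant  = number of observations
n_exporters = M[0][1]        # constant x dummy     = frequency of the category
total_turnover = M[0][2]     # constant x turnover  = grand total
exporter_turnover = M[1][2]  # dummy x turnover     = total within the category

print(n_obs, n_exporters, total_turnover, exporter_turnover)
```

Each of these entries is an ordinary table cell, so releasing M amounts to releasing an unchecked aggregated table.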

10 Example: Herfindahl indices
Composite index of industrial concentration
Safe as long as at least 3 firms in the industry? No:
–Quadratic term exacerbates dominance
–If second-largest share is much smaller, √H ≈ share of largest firm
–Standard dominance rule of largest unit < 45% share doesn’t prevent this
Current tests for safety not very satisfactory
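
A small numerical sketch of the dominance problem, assuming the usual definition of the Herfindahl index as the sum of squared market shares. The share figures are hypothetical. The largest firm stays under the 45% limit and there are far more than 3 firms, yet the square root of the released index lands close to the largest firm's share.

```python
import math


def herfindahl(shares):
    """Sum of squared market shares (shares expressed as fractions of 1)."""
    return sum(s * s for s in shares)


# Hypothetical industry: one firm with 44% and 56 firms with 1% each.
shares = [0.44] + [0.01] * 56

h = herfindahl(shares)
implied_largest_share = math.sqrt(h)

passes_dominance_rule = max(shares) < 0.45
passes_three_firm_rule = len(shares) >= 3
gap = abs(implied_largest_share - max(shares))

print(passes_dominance_rule, passes_three_firm_rule, round(gap, 3))
```

Here the recovered value differs from the true share by less than one percentage point, which is why the standard rules give little protection for this statistic.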

11 Questions?
Felix Ritchie
Microdata Analysis and User Support
Office for National Statistics
felix.ritchie@ons.gov.uk
+44 1633 45 5846

