Disclosure detection & control in research environments Felix Ritchie.

Why are research environments special?
Little disclosure control on input
Few limits on processing
Unpredictable, complex outputs
  – an infinity of “special cases”
=> Manual review for disclosiveness required

Problems of reviewing research outputs
Limited application of rules
How do we ensure
  – consistency?
  – transparency?
  – security?
How do we do this with few resources?

Classifying the research zoo
Some outputs inherently “safe”
Some inherently “unsafe”
Concentrate on the unsafe
  – Focus training
  – Define limits
  – Discourage use

Safe versus unsafe
Safe outputs
  – Will be released unless certain conditions arise
Unsafe outputs
  – Won’t be released unless demonstrated to be safe
Examples: (* = conditions for release apply)

  Unsafe                   Safe                     Indeterminate
  Quantiles                Linear regression*       Herfindahl indexes
  Graphs                   Panel data estimates
  Aggregated tables
  Cross-product matrices   Estimated covariances*

Determining safety
Key is to understand whether the underlying functional form is safe or unsafe
Each output type assessed for risk of
  – Primary disclosure
  – Disclosure by differencing

Example: linear aggregates of data are unsafe
Inherent disclosiveness:
  – Differencing is feasible
  => each data point needs to be assessed for threshold/dominance limits
  => resource problem for large datasets
Disclosure by differencing:
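Disclosure by differencing can be sketched with two totals that each look harmless. The firm names and turnover figures below are invented for illustration, and the two "tables" are hypothetical outputs rather than anything from the original slides.

```python
# Hypothetical microdata: turnover by firm (invented values).
turnover = {"A": 120.0, "B": 80.0, "C": 45.0, "D": 300.0}

# Output 1: total turnover for all firms in the industry.
total_all = sum(turnover.values())

# Output 2: total for the same industry with one firm excluded,
# e.g. because a later table is restricted to a slightly smaller group.
total_without_d = sum(v for name, v in turnover.items() if name != "D")

# Each total aggregates several firms, but subtracting one from the
# other recovers firm D's turnover exactly.
recovered_d = total_all - total_without_d
print(recovered_d)
```

Because any linear aggregate can be differenced in this way, every contributing data point has to be checked against the threshold and dominance limits, which is the resource problem noted above.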

Example: linear regression coefficients are safe
Let [equation missing from transcript] => can’t identify single data point
But [equation missing from transcript] => No risk of differencing
Exceptions
  – All right hand variables public and an excellent fit (easily tested, can generate automatic limits on prediction)
  – All observations on a single person/company
  – Must be a valid regression
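One way a regression can fail to be "valid" in this sense is sketched below. This is an assumed illustration, not an example taken from the slides: regressing on a constant and a dummy that equals 1 for a single observation makes the coefficients reproduce that observation exactly. The data values and the helper name `fit_ols` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Invented outcome values; the last unit is an outlier we want to protect.
y = np.array([10.0, 12.0, 11.0, 13.0, 95.0])

# Regressors: a constant and a dummy that picks out only the last unit.
const = np.ones_like(y)
dummy = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
X = np.column_stack([const, dummy])

coef = fit_ols(X, y)

# The constant is the mean of the other units; constant + dummy coefficient
# equals the singled-out unit's value.
recovered = coef[0] + coef[1]
print(coef, recovered)
```

A regression like this is really a table of group means in disguise, so it needs the same checks as an aggregated table.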

Example: cross-product/variance-covariance matrices
Cross product matrix M = (X’X) is unsafe
Frequencies/totals identified by interaction with constant
And for any other categorical variables
What about variance-covariance matrices?
  – V is unsafe – can be inverted to produce M
  – But in the more general case [equation missing from transcript]
  – Can’t create a table for X unless Z=X and W=I
  => weighted covariance matrix is safe
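The point about interaction with the constant can be made concrete with a small sketch. The variable names (`is_director`, `income`) and the values are invented for illustration; only the structure M = X'X comes from the slide.

```python
import numpy as np

# Invented microdata: one categorical (0/1) variable and one continuous one.
is_director = np.array([0.0, 0.0, 0.0, 1.0])
income = np.array([21.0, 35.0, 18.0, 240.0])

# Design matrix with a constant in the first column.
X = np.column_stack([np.ones(4), is_director, income])

# Cross-product matrix M = X'X.
M = X.T @ X

n_obs = M[0, 0]            # constant x constant: number of observations
n_directors = M[0, 1]      # constant x dummy: frequency of the category
total_income = M[0, 2]     # constant x income: overall total
director_income = M[1, 2]  # dummy x income: total within the category

print(n_obs, n_directors, total_income, director_income)
```

Here the category contains a single unit, so the dummy-by-income cell is that unit's income. In effect the cross-product matrix contains an unchecked frequency table and table of totals.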

Example: Herfindahl indices
Composite index of industrial concentration
Safe as long as at least 3 firms in the industry?
No:
  – Quadratic term exacerbates dominance
  – If second-largest share is much smaller, √H ≈ share of largest firm
  – Standard dominance rule of largest unit <45% share doesn’t prevent this
Current tests for safety not very satisfactory
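A short numerical sketch of the dominance problem, assuming the usual definition of the Herfindahl index as the sum of squared market shares. The share values are invented: one firm just under the 45% limit and many small competitors.

```python
import math

def herfindahl(shares):
    """Herfindahl index: sum of squared market shares (shares sum to 1)."""
    return sum(s * s for s in shares)

# Invented industry: one firm with 44% and 56 firms with 1% each.
shares = [0.44] + [0.01] * 56

h = herfindahl(shares)
largest = max(shares)

# The largest share passes a "largest unit < 45%" dominance rule...
passes_dominance_rule = largest < 0.45

# ...yet the square root of the published index is close to that share.
estimate_of_largest = math.sqrt(h)

print(h, estimate_of_largest, largest, passes_dominance_rule)
```

The squared term means the small firms contribute almost nothing to H, so the square root of the index gives a close estimate of the largest firm's share even though the dominance rule is satisfied.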

Questions?
Felix Ritchie
Microdata Analysis and User Support
Office for National Statistics
felix.ritchie@ons.gov.uk
+44 1633 45 5846
