Operationalising ‘safe statistics’: the case of linear regression
Felix Ritchie
Bristol Business School, University of the West of England, Bristol
Plan
Background: output SDC
Safe statistics in principle
Making it work: regression and totals
What does ‘safe’ mean?
Background: output SDC
Researchers are increasingly using very sensitive data
‘Traditional’ statistical disclosure control (SDC) research (tables and anonymisation) is of limited relevance
Need rules for generalised output
– ‘Output SDC’
Ideally, principles-based
Making output SDC manageable
How do you devise guidelines for output when everything is possible?
The ‘research zoo’
– Separate lions from rabbits
– Focus on the lions
– Forget about the rabbits
‘Safe statistics’
Define a statistic (sum, regression, odds ratio, index etc) as ‘safe’ or ‘unsafe’
– safe: release unless there’s a reason not to
– unsafe: don’t release unless shown to be safe in the specific context
SDC efforts are concentrated on problematic output
Categorising ‘safe statistics’
1. Define the functional form
2. Identify the disclosure potential
   1. Can it directly reveal a single data point?
   2. Can it be differenced?
   3. Anything else?
3. If provisionally ‘safe’, identify special cases
4. Draft guidelines
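The screening questions above can be sketched as a small checklist. This is only an illustration: the `Screening` record, the `classify` function and the yes/no answers given for the two example statistics are assumptions made here, not part of the original guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class Screening:
    """Hypothetical record of the screening answers for one type of statistic."""
    name: str
    reveals_single_point: bool  # can it directly reveal a single data point?
    can_be_differenced: bool    # can outputs be differenced to isolate a value?
    other_risk: bool = False    # anything else?
    special_cases: list[str] = field(default_factory=list)


def classify(s: Screening) -> str:
    """Return 'unsafe' if any screening question is answered yes, else 'safe'."""
    if s.reveals_single_point or s.can_be_differenced or s.other_risk:
        return "unsafe"
    # 'safe' is provisional: the listed special cases still need guidelines.
    return "safe"


# Illustrative answers only (assumed for this sketch).
total = Screening("total", reveals_single_point=True, can_be_differenced=True)
coefficients = Screening(
    "regression coefficients",
    reveals_single_point=False,
    can_be_differenced=False,
    special_cases=["regressor that is a dummy for a single observation"],
)

for s in (total, coefficients):
    print(s.name, "->", classify(s), "| special cases:", s.special_cases)
```

A provisional ‘safe’ label does not end the process: the special cases recorded for the statistic are what the drafted guidelines would then need to cover.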
Example: regression coefficients
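One special case for regression coefficients can be sketched as follows. It assumes a model with only a constant and a dummy variable that equals 1 for a single observation; the data values and the helper name `ols_simple` are made up for illustration. In that case the constant plus the dummy coefficient reproduces the flagged observation exactly.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: the dummy flags only the first observation.
y = [41.0, 52.5, 47.0, 60.0, 55.5]
dummy = [1.0, 0.0, 0.0, 0.0, 0.0]

a, b = ols_simple(dummy, y)
recovered = a + b  # fitted value for the flagged observation

print("constant:", a)
print("dummy coefficient:", b)
print("recovered value:", recovered, "true value:", y[0])
```

With more regressors and dummies covering many observations, the coefficients no longer pin down any one data point in this way, which is why this is a special case rather than the general rule.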
Example: total
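The differencing question applied to totals can be sketched with made-up figures. The firm labels and turnover values below are hypothetical: two totals that differ by one contributor reveal that contributor's value exactly.

```python
# Hypothetical turnover by firm (illustrative values only).
turnover = {"A": 120.0, "B": 75.0, "C": 310.0, "D": 42.0}

# Two outputs that each look harmless on their own.
total_all = sum(turnover.values())
total_excluding_c = sum(v for firm, v in turnover.items() if firm != "C")

# Differencing the two totals isolates firm C's value.
recovered_c = total_all - total_excluding_c

print("total, all firms:", total_all)
print("total, excluding C:", total_excluding_c)
print("recovered value for C:", recovered_c)
```

Neither total identifies a firm by itself; the risk comes from releasing both.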
Safe statistics decision chart
How can we say X is unconditionally ‘safe’?
We can’t
– nor are we trying to
‘safe’ = ‘for all practical purposes posing no significant disclosure risk’
‘unsafe’ = ‘high risk; spend time on this’
Risk assessment for grown-ups
Everything has a theoretical risk
Resources are limited
Overall risk protection is maximised by concentrating on real risks
…and also the non-experienced
Questions