Disclosure Control in Practice: issues and approaches Andy Sutherland Health and Social Care Information Centre
Outline Background – transparency, open data, confidentiality, Code of Practice and other requirements Basics of disclosure control Approaches used Issues Reflections
Background Transparency, open data Publish in as much detail as possible Make machine readable Allow and encourage re-use Confidentiality Data Protection Act, Common Law requirements etc.
Code of Practice Principle 5, practice 1 “Ensure that official statistics do not reveal the identity of an individual, or any private information relating to them, taking into account other relevant sources of information.” Principle 5, practice 4 “Ensure that arrangements for confidentiality are sufficient to protect the privacy of individual information, but not so restrictive as to limit unduly the practical utility of official statistics.” National Statistician’s Guidance
Other Guidance ONS work on health (…control-of-health-statistics/index.html) Scottish Government guidance (…ossary) Various consultations ongoing (…anonymisation-code-of-practice …aspx) DH v ICO [abortion statistics case] (…anonymisedstatistics.htm)
User comment “…Basically ONS and IC only care about disclosure control and don't give a toss as to whether data are any use to users.”
Why is disclosure control needed? Basic revision class!
Number of A+E consultants by hospital, March 2012

Trust      Total
Ashfield   4
Beetown    1
Corstone   5
Why is disclosure control needed? Basic revision class!
Number of A+E consultants by hospital and ethnicity, March 2012

Trust      Total  White  Black  Other
Ashfield   4      2      1      1
Beetown    1      0      1      0
Corstone   5      2      3      0
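The risk in the ethnicity breakdown can be made concrete with a minimal sketch. It uses the counts from the table above; the function name `risky_cells` and the `small` threshold of 2 are illustrative assumptions, not HSCIC rules.

```python
# Counts from the slide: A+E consultants by trust and ethnicity, March 2012.
table = {
    "Ashfield": {"White": 2, "Black": 1, "Other": 1},
    "Beetown": {"White": 0, "Black": 1, "Other": 0},
    "Corstone": {"White": 2, "Black": 3, "Other": 0},
}


def risky_cells(table, small=2):
    """List non-zero cells that may disclose something about individuals.

    `small` is a hypothetical threshold chosen for this illustration.
    """
    findings = []
    for trust, row in table.items():
        total = sum(row.values())
        for category, count in row.items():
            if count == 0:
                continue
            if count == total:
                # Everyone at the trust shares this attribute, so knowing
                # someone works there reveals the attribute.
                findings.append((trust, category, count, "whole group in one category"))
            elif count <= small:
                findings.append((trust, category, count, "small count"))
    return findings


for finding in risky_cells(table):
    print(finding)
```

Beetown is the clearest case: the trust total of 1 is harmless on its own, but the breakdown reveals the ethnicity of the single consultant.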
HSCIC process and approaches 150 publications per year Other releases Ad-hoc queries Data access or analysis systems Standard risk assessment process “Small Numbers Panel” assesses complex cases
Small Numbers Panel Head of Profession for Statistics (Chair) Head of Information Governance Programme Manager, Information Services Together they provide statistical, legal and business/user input.
Issues (1) Understanding of scope Distinguishing cases where disclosure control is needed (“I don’t want inadvertently to release identifiable information”) from those where different legal approaches are needed (“I know this is identifiable but I need to do it anyway”).
Issues (2) Seeing the wider context Proposal to publish practice-level prescribing data Legality Level of granularity and frequency of publication Feasibility Costs, benefits and risks Perverse outcomes
Issues (3) – Maternity tables Enhanced, easier for users to interpret Overview of main delivery types Easy to compare (in one table) Available as automated reports to provider level (…Server?siteID=1937&categoryID=1815) Unexpected consequences More suppression due to tables within tables ‘Unknown’ values were used for secondary suppression, but they are also needed to calculate rates; we now try to avoid using them for secondary suppression.
Method of delivery ( ) Unable to aggregate to SHA level Unable to aggregate delivery types (e.g. Spontaneous), therefore cannot calculate rates
Method of delivery ( ) Able to use aggregated data (SHA level) Able to use aggregated data (Delivery types), therefore can calculate rates Unable to calculate rates as lots of ‘Unknowns’ are suppressed Rate = Spontaneous / (Total – Unknown)
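A minimal sketch of why a suppressed ‘Unknown’ blocks the rate calculation. The function name `delivery_rate`, the use of `None` to stand for a suppressed value, and the figures in the example are assumptions for illustration only.

```python
def delivery_rate(spontaneous, total, unknown):
    """Rate = Spontaneous / (Total - Unknown).

    Returns None when any input is suppressed (represented here as None)
    or when there are no deliveries of known type.
    """
    if spontaneous is None or total is None or unknown is None:
        return None
    known = total - unknown
    if known <= 0:
        return None
    return spontaneous / known


# Illustrative figures, not taken from the published tables.
print(delivery_rate(600, 1000, 40))    # Unknown published: rate available
print(delivery_rate(600, 1000, None))  # Unknown suppressed: no rate
```

Even when the numerator and the total are both published, one suppressed denominator component is enough to stop users deriving the rate.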
Suppression Example Table D: Method of delivery – example ( ) Primary suppression All values equal to 5 or less (excluding unknowns)
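The primary suppression rule on this slide can be sketched as below. The threshold of 5 and the exemption for unknowns come from the slide; the `*` marker, the function name and the example row are assumptions. The rule is applied literally, so zeros are also suppressed.

```python
SUPPRESSED = "*"  # assumed marker for a suppressed cell


def primary_suppress(row, threshold=5, exempt=("Unknown",)):
    """Replace every count of `threshold` or less with a marker,
    leaving the exempt categories (here 'Unknown') untouched."""
    return {
        label: SUPPRESSED if label not in exempt and count <= threshold else count
        for label, count in row.items()
    }


# Illustrative row, not taken from Table D.
row = {"Spontaneous": 120, "Forceps": 4, "Other": 5, "Unknown": 3}
print(primary_suppress(row))
```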
Suppression Example Table D: Method of delivery – example ( ) Secondary suppression All values corresponding to primary suppressed values Row and column, effectively four tables ‘Other’ suppressed, therefore also ‘Unknown’ – unable to calculate the rate
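Secondary suppression across rows and columns can be sketched with a simple heuristic: wherever a row or column contains exactly one suppressed cell, also suppress its smallest remaining cell, so the hidden value cannot be recovered by subtraction from a published total. This is an assumed textbook-style approach, not the HSCIC procedure; the function name and the counts are hypothetical, and it does not treat ‘Unknown’ columns specially.

```python
def secondary_suppress(counts, threshold=5):
    """Return a mask (True = suppressed) for a table of interior cells.

    Row and column totals are assumed to be published alongside the table.
    """
    n_rows, n_cols = len(counts), len(counts[0])
    # Primary suppression: every cell at or below the threshold.
    mask = [[cell <= threshold for cell in row] for row in counts]

    def protect_line(cells):
        hidden = [(r, c) for r, c in cells if mask[r][c]]
        visible = [(r, c) for r, c in cells if not mask[r][c]]
        if len(hidden) != 1 or not visible:
            return False
        # A lone suppressed cell equals total minus the rest,
        # so hide the smallest visible cell in the same line too.
        r, c = min(visible, key=lambda rc: counts[rc[0]][rc[1]])
        mask[r][c] = True
        return True

    changed = True
    while changed:  # repeat until no line has a lone suppressed cell
        changed = False
        for r in range(n_rows):
            changed |= protect_line([(r, c) for c in range(n_cols)])
        for c in range(n_cols):
            changed |= protect_line([(r, c) for r in range(n_rows)])
    return mask


# Illustrative counts: one small cell (3) triggers three further suppressions.
counts = [[120, 3, 40], [200, 30, 50], [90, 25, 35]]
for row in secondary_suppress(counts):
    print(row)
```

One small cell leads to four suppressed cells in a rectangle, which is why nested tables lose so much detail.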
Suppression Example Table D: Method of delivery – example ( ) Suppression Similar primary and secondary suppression values ‘Other’ no longer suppressed as not disclosive Therefore ‘Unknown’ not suppressed, can calculate rate
Issues (4) Blanket protocols Can be difficult to adapt in light of changing environment, and act as a brake on wider release Often need to suppress as a whole rather than just where disclosure is an issue Often needed as individual manual suppression can be time consuming
Issues (5) Implications of providing “systems” and machine readable files, rather than just reports Allows potentially disclosive cross classifications to be produced Standard primary and secondary suppression approach breaks down Record swapping (cf census) is a possibility For our less critical applications we prefer a combination of primary suppression and rounding
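A minimal sketch of combining primary suppression with rounding, so that any cross classification a user builds is protected cell by cell. The threshold of 5, the rounding base of 5, the function name `protect` and the use of `None` for a suppressed value are all assumptions for illustration.

```python
def protect(count, threshold=5, base=5):
    """Suppress small counts (returning None) and round the rest
    to the nearest multiple of `base`, with halves rounded up."""
    if count <= threshold:
        return None
    return base * ((count + base // 2) // base)


# Two overlapping queries that differ by one person (illustrative figures).
print(protect(13), protect(12))  # the rounded outputs no longer differ by exactly 1
print(protect(3))                # small count is suppressed
```

Because each output is rounded independently, subtracting one user-defined query from another no longer recovers the exact difference, which is where fixed secondary suppression patterns break down.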
Issues (6) Understanding the data and risks Clinical Audits Classic disclosure control problem with sensitive data overlaid by incomplete (but improving) data collection. Risk management approach likely to change in time, and may become more difficult when data is better!
Reflections No approach is infallible – it is a matter of assessing risk Important to consider user needs This is one (important) component of the release process Don’t assume more information will be more helpful! Blanket protocols should allow some “flex” “Jigsaw identification” remains a worry
Final word Our approaches and their outcomes are on our website. Feel free to inspect and comment.