Statistical Disclosure Control for the 2011 UK Census
Keith Spicer, Office for National Statistics



Overview
Disclosure risk
UK Census – context
Evaluation of methods
Proposed strategy
Further work

What is disclosure risk?
There is a disclosure risk when published information could allow an intruder to determine the identity or particulars of:
  – an individual
  – a household or family
  – a business or another statistical unit

Statistical Disclosure Control
Statistical Disclosure Control (SDC) involves either:
  – introducing sufficient ambiguity / damage into, or
  – reducing the level of detail of
published statistics, so that the risk of disclosing confidential information is reduced to an acceptable level
and / or:
  – controlling access to data

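Risk – Utility balance
[Figure: disclosure risk (information about confidential units) plotted against data utility (information about legitimate items), each running from low to high, with points marked for the original data, the released data and no data, and a line for the maximum tolerable risk.]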

UK Census - Context (1)
2001 – random record swapping
Small cell adjustment (SCA) applied in England, Wales and Northern Ireland, not in Scotland
Lack of harmonisation and late changes
SCA protected individual tables, but some risk remained through differencing
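The differencing risk noted on this slide can be sketched in a few lines. This is only an illustration: the area tables, category names, counts and the small-cell threshold of 3 are all invented, and the helper names are hypothetical. It shows how two tables that each look safe can be subtracted to expose a small count for the sliver between two nested areas.

```python
def difference_tables(larger, smaller):
    """Subtract the table for a nested (smaller) area from the table for the larger area."""
    return {cat: larger[cat] - smaller.get(cat, 0) for cat in larger}


def small_cells(table, threshold=3):
    """Return the non-zero cells whose count is below the (assumed) threshold."""
    return {cat: n for cat, n in table.items() if 0 < n < threshold}


# Hypothetical counts by economic activity for two overlapping areas.
area_large = {"Employed": 120, "Unemployed": 14, "Student": 31}
area_small = {"Employed": 112, "Unemployed": 14, "Student": 30}

# Counts for the sliver of geography that lies in area_large but not area_small.
sliver = difference_tables(area_large, area_small)
risky = small_cells(sliver)
print(sliver, risky)
```

Neither published table contains a small cell, yet the difference isolates a single student in the sliver, which is the kind of residual risk the slide describes.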

UK Census - Context (2)
Registrars General (RsG) agreement, November 2006
  – Small cell counts permitted as long as there is ‘sufficient uncertainty’
  – Main risk is attribute disclosure: finding out something new about an individual
Evaluation to short-list
  – Qualitative, including user acceptability, additivity, consistency, feasibility
  – 3 methods:
      Record swapping
      Over-imputation
      IACP method (post-tabular), based on the Australian Bureau of Statistics (ABS) approach

UK Census - Context (3)
Short-list of 3 methods evaluated
Quantitative assessment using 2001 Census data, using different measures of risk and utility
  – Protection against disclosure (and differencing)
  – Measures of association
  – Effect on totals & sub-totals
  – Variances
  – Rankings
Revisit qualitative aspects
Proposed strategy – record swapping
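As a rough illustration of what a utility measure of this kind might look like, the sketch below compares an original frequency table with a protected one. The two measures (average absolute distance per cell, and change in the table total) are generic examples chosen here for illustration, not necessarily the measures used in the ONS evaluation, and the function names and counts are hypothetical.

```python
def average_absolute_distance(original, protected):
    """Mean absolute difference per cell between two tables with the same cells."""
    return sum(abs(original[c] - protected[c]) for c in original) / len(original)


def total_change(original, protected):
    """Change in the table total after protection (0 means the total is preserved)."""
    return sum(protected.values()) - sum(original.values())


# Hypothetical cell counts before and after disclosure control.
original = {"Rented": 10, "Owned": 5, "Other": 1}
protected = {"Rented": 9, "Owned": 5, "Other": 2}

print(average_absolute_distance(original, protected))
print(total_change(original, protected))
```

Smaller distances indicate less damage to the table; a total change of zero corresponds to the "effect on totals" criterion being met for this table.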

Proposed Strategy: Record Swapping
Swap the geographical location of a small number of households
Households are paired according to similar characteristics (to avoid too much data distortion)
Creates uncertainty in the data
Can target risky records

Record swapping
[Figure: a record in Area A is swapped with a matched record in Area B.]
Treatment:
  – Find a different geographical area
  – Identify another individual in a different area with the same characteristics on matching variables
  – Swap the two records
Characteristics of one record: Age: 22, Sex: Male, Marital status: Single, Economic activity: Student, Tenure: Rented
Characteristics of the other record: Age: 22, Sex: Male, Marital status: Single, Economic activity: Active, Tenure: Owned
The records match on all variables except economic activity and tenure, so they are swapped
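The treatment on this slide can be sketched as follows. This is a minimal illustration rather than the ONS implementation: the choice of matching variables, the field names, the random selection of records and the `swap_rate` parameter are all assumptions made for the example.

```python
import random

# Assumed matching variables; the slide's example matches on age, sex and marital status.
MATCH_VARS = ("age", "sex", "marital_status")


def swap_records(records, swap_rate, seed=0):
    """Return a copy of records in which matched pairs have exchanged their area."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # leave the input untouched
    swapped = set()
    order = list(range(len(out)))
    rng.shuffle(order)
    for i in order:
        # Skip records already swapped, and select the rest with probability swap_rate.
        if i in swapped or rng.random() >= swap_rate:
            continue
        key = tuple(out[i][v] for v in MATCH_VARS)
        # Candidates: unswapped records in a different area with the same matching values.
        candidates = [
            j for j in range(len(out))
            if j != i and j not in swapped
            and out[j]["area"] != out[i]["area"]
            and tuple(out[j][v] for v in MATCH_VARS) == key
        ]
        if not candidates:
            continue
        j = rng.choice(candidates)
        out[i]["area"], out[j]["area"] = out[j]["area"], out[i]["area"]
        swapped.update((i, j))
    return out


records = [
    {"area": "A", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Student", "tenure": "Rented"},
    {"area": "B", "age": 22, "sex": "Male", "marital_status": "Single",
     "economic_activity": "Active", "tenure": "Owned"},
    {"area": "A", "age": 40, "sex": "Female", "marital_status": "Married",
     "economic_activity": "Active", "tenure": "Owned"},
]
protected = swap_records(records, swap_rate=1.0)
```

Because only the area label moves between records that agree on the matching variables, area counts by age, sex and marital status are unchanged, while the non-matching variables (economic activity and tenure here) carry the uncertainty.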

Pre-tabular method protects underlying microdata
Protected tables will be additive and consistent
Minimise bias by use of matching variables
Vary swap rates by geographical level
Relatively simple to understand and implement
Some risks from population uniques at higher geographies (in microdata)
Need consideration for ‘special outputs’

Record swapping – further work
Determine swapping rates
  – Set tolerable risk threshold
  – Vary by geographical level
Targeted or random
  – How to determine ‘risky’ records
Take into account levels of imputation
Interaction with output design
  – Flexible table / hypercube solutions: how much detail can we have in a hypercube?
  – Additional ‘rules’ around table design
  – Geography: providing ‘exact fit’?

Record swapping – further work
Protecting outputs for special populations
  – Workplace zones
  – Communal establishments
Origin-destination tables
  – Protection of most detailed via licensing
  – Consideration of what can be ‘public use’
Microdata
  – Suite of products
  – Detailed content
Record swapping will be ‘smarter’ in 2011, targeting risky records at low geographies

Summary
Extensive evaluation of SDC methods
Record swapping is the primary strategy for tabular outputs
  – ‘Smarter’ swapping, targeting risky records
Further work continues

Output Geography
Andy Tait/Ian Coady, ONS Geography

Overview
Background
  – 2001 Output Geography - OAs
  – Neighbourhood Geographies - SOAs
What has changed since 2001?
2011 Requirements
  – 2007 Geography Consultation – what you said
  – Resulting Policy
Work in progress
  – OA/SOA Maintenance Research project
  – Workplace Zones
2009 Geography Consultation

2001 Output Areas - why
Census output geography separated from data collection geography
A geography created from Census data
Consistent size in population/number of households
Socially homogeneous
Meets confidentiality thresholds
Aligns with administrative boundaries
Consistent throughout UK

2001 Output Areas
175,000 output areas
Mean 297 persons; 123 households
Freely available digital boundary data
Building blocks for “neighbourhood” geographies: Super Output Areas (LSOAs, MSOAs)
Image courtesy of David Martin. This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

2001 Output Areas – achieved size
[Figure: achieved output area size, shown for households (hhds) and population (Pop).]

Super Output Areas (SOAs)
Created 2004, for Neighbourhood Statistics
Groupings of Output Areas
Layered hierarchy – lower, middle, upper layers
Each layer with size thresholds and targets
Offer levels of statistical reporting
Lower SOAs: approx. 35,000 areas, average population ≈ 1,500, created automatically
Middle SOAs: approx. 7,000 areas, average population ≈ 7,200, created automatically, then modified locally
Upper SOAs not created

[Map: Index of Deprivation 1998, shown for 1998 wards.]

[Map: Index of Deprivation 2004, shown for 2004 Lower Layer SOAs.]

Changes since 2001 – population
Population growth, especially migration
More and smaller households
Newly built properties
  – Greenfield/new land
  – Brownfield/in-filling
Sub-division of existing properties
Changing socio-economic characteristics of areas

Changes since 2001 – geography
Postcodes
Census address register
Ward/parish changes since 2003
Administrative re-organisation

How much change by 2011

         Lower threshold   Upper threshold                          Population threshold
OAs      100 people        625 people (2 * target)                  2.5 * household thresholds
LSOAs    1000 people       3000 people (2 * target)                 2.5 * household thresholds
MSOAs    5000 people       [figure missing] people (2 * target)     2.5 * household thresholds

How much change by 2011?
Threshold breaches, based on mid-year population estimates
Output Areas:
[Table: 2001 status (rows: below, within, above, totals, %) against 2005 status (columns: 2005 below, 2005 within, 2005 above, 2001 totals).]

How much change by 2011?
Lower Layer Super Output Areas:
[Table: 2001 status (rows: below, within, above, totals, %) against 2005 status (columns: 2005 below, 2005 within, 2005 above, 2001 totals).]

How much change by 2011?
Middle Layer Super Output Areas:
[Table: 2001 status (rows: below, within, above, totals, %) against 2005 status (columns: 2005 below, 2005 within, 2005 above, 2001 totals).]

Key messages
Most output areas (and LSOAs, MSOAs) are unlikely to have breached thresholds by 2011
BUT changes are clustered geographically, so thresholds could be breached badly in some areas
Some areas were already known to be problematic in 2001

Small Area Geography Consultation 2007
Strong support for:
  – Stability with 2001 (but reflect change!)
  – Easy/free licensing of boundaries
  – Mean high water boundary set
  – England/Scotland alignment
Some support (in descending order) for:
  – Aligning boundaries to real world features
  – Separating communal establishments
  – Retaining postcode blocks v street blocks
  – Building a separate set of zones based on workplace
  – Building separate OAs with no population
  – Building an Upper layer of SOAs

Resulting in ONS policy for 2011 Geography
Change only where there is significant population change:
  – split where population too big
  – merge where population too small
No more than 5% overall change (could be well under)
Assess methods of splitting/merging
No real world alignment for its own sake
Consider redesign of extreme cases where unfit as statistical zone
No separate “empty” OAs
Align Scotland and England at the border
Mean high water boundaries as well
Investigate new workplace geography linked to OAs
Keep licensing free, get better deal for commercial use
Exact count outputs for OAs and other geographies, e.g. wards: a matter for disclosure control

OA/SOAs – some “not fit for purpose”?

OA/SOAs – “not fit for purpose”?

Challenges for 2011 output geography design
Stability at what level? OA, LSOA, MSOA?
Building blocks? Postcodes or street blocks?
Constrain within wards, LADs?
Same design criteria as 2001?
BUT: balance against licensing issues
Automation of processes

Census2011Geog project – Southampton University
ESRC funded project
Develop automated procedures for maintaining (splitting, merging, re-designing) 2001 output geographies to create 2011 output geographies for E&W
Assess implications of using different building blocks (e.g. postcodes, street blocks) for maintenance
Work extended to January 2010

Automated maintenance procedures
[Flowchart, for a 2001 LAD/UA: 2001 OAs and 2001 LSOAs are classed as above upper threshold, within thresholds or below lower threshold; merge (merge 2001 OAs); split (aggregate postcodes/street blocks); append the resulting 2011 OAs; finally merge all 2011 OAs from all LADs/UAs.]
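The classification step of this procedure can be sketched as below. The population thresholds for OAs and LSOAs are taken from the "How much change by 2011" slide; everything else is an assumption for illustration, including the function names, the area codes, the "keep" label for areas within thresholds, and the treatment of a population exactly on a threshold as within it. The MSOA upper threshold is not legible in the transcript, so MSOAs are left out.

```python
# (lower, upper) population thresholds from the thresholds slide.
# MSOAs are omitted because their upper threshold is missing in the transcript.
THRESHOLDS = {
    "OA": (100, 625),
    "LSOA": (1000, 3000),
}

# Policy from the slides: merge where too small, split where too big.
# "keep" for areas within thresholds is an assumed label.
ACTIONS = {"below": "merge", "within": "keep", "above": "split"}


def classify(population, level):
    """Class an area as below, within or above the thresholds for its level."""
    lower, upper = THRESHOLDS[level]
    if population < lower:
        return "below"
    if population > upper:
        return "above"
    return "within"


def maintenance_plan(populations, level="OA"):
    """Map each area code to the maintenance action implied by its population."""
    return {code: ACTIONS[classify(pop, level)] for code, pop in populations.items()}


# Hypothetical area codes and populations.
plan = maintenance_plan({"OA_a": 80, "OA_b": 300, "OA_c": 900})
print(plan)
```

The sketch covers only the decision of which areas to merge, keep or split; choosing merge partners and building split areas from postcodes or street blocks is the harder part that the research project addresses.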

Absolute population change (mid-year estimates): Camden
[Map legend: Increase, Decrease]
This work is based on data provided through EDINA UKBORDERS with the support of the ESRC and JISC and uses boundary material which is copyright of the Crown.

Absolute population change (mid-year estimates): Liverpool
[Map legend: Increase, Decrease]

Absolute population change (mid-year estimates): Manchester
[Map legend: Increase, Decrease]

More information on OA Maintenance project at s.ac.uk

Workplace Zones
OAs are based on where people live, not where they work, so can be unsuitable for workplace statistics
Some OAs contain no or few businesses; some contain many businesses or a large employer, e.g. business parks, City of London
Workplace Zones project looking at splitting/merging OAs for a new geography nesting with OAs
User Group established
Pilot WZs to be created/evaluated 2010 Q2

2009 Output Geography consultation
Need for an Upper layer SOA
Workplace Zone requirements
Provide instances of OAs/SOAs that are unfit as a statistical geography
  – Priority instances
  – Not useful for analysis due to their design
  – ONS panel to consider redesign

2009 Output Geography consultation
Census Geography consultation part of Census Outputs consultation
Runs for three months from November 2009
Follow up submissions January to May 2010

Conclusions contd
5. Greater flexibility in outputs
   i. Hypercube research
6. Multiple population bases
7. Geography
   i. Workplace zones
   ii. Possible production of data on two geographical bases
8. Application Programme Interface (API)
   i. Access to census data
   ii. Functionality of census data

Conclusions contd
9. Increased user input in consultation process
   i. Rounds of consultation
   ii. Online survey / persona research
   iii. Methods of engaging users
      – Topic group experts
      – Advisory groups
      – Working groups
      – Consulting users and distributors of census data
      – Academic groups
      – Direct consultation including output consultation events and internet