Progress on the SDC Strategy for the 2011 Census. 23rd June 2008. Keith Spicer and Caroline Young.



Outline
Context
Work plan
Description of the short-listed methods
Quantitative Evaluation – some results!
Conclusions and Further Work

Context
SDC for 2011 Census outputs is a major concern for users
Different SDC methodologies were adopted for tabular 2001 Census outputs across the UK
Late addition of small cell adjustment by ONS/NISRA resulted in a high level of user confusion and dissatisfaction
Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

Workplan
Phase 1 (March ’06 – Jan ’07)
– UK agreement of key SDC policy issues
Phase 2 (Jan ’07 – Sept ’08)
– Evaluation of all methods complying with the agreed SDC policy position, in terms of the risk/utility framework and feasibility of implementation
Phase 3 (Sept ’08 – Spring/Summer ’09)
– Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs
Phase 4 (Feb ’09 onwards)
– Evaluate and develop SDC methods for microdata; future work on output specification, system specification, development and testing

Progress
Development of SDC Strategy
– UK SDC working group established to take forward methodological work, consisting of representatives from Wales, Northern Ireland and Scotland
– UKCDMAC subgroup set up to QA work
Methodological research:
– Determine the short-list of SDC methods (Aug ’07)
– Quantitative evaluation of short-list (complete Sep ’08)
Focus on tabular outputs whilst considering impact on other outputs (e.g. microdata)

Quantitative Evaluation
Examine how methods protect and manage risk and how they impact on data utility
Using a range of 2001 Census tables, varying parameters, different geographies
Information Loss software used to evaluate each short-listed method

Short-listed Methods
Being considered for 2011 Census data
Applied so that ‘safe’ tabular outputs can be released
– Record Swapping
– Over-imputation
– ABS Cell Perturbation (developed by the Australian Bureau of Statistics)
2001 Census SDC methods used as a baseline for comparison: Record Swapping and Small Cell Adjustment (SCA)

Short-listed SDC methods
Record Swapping and Over-imputation: pre-tabular (applied directly to the microdata)
ABS Cell Perturbation: post-tabular (applied to tables)
SCA (a type of rounding) is also a post-tabular method

Record Swapping
Swap the geographical location of a small number of households
Households are paired according to similar characteristics (to avoid too much data distortion)
Creates uncertainty in the data
Can swap unique records only (those at greater risk)

Record Swapping
Record in Area A. Characteristics: Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 3, Region: Area A
Unique as the only person with 3 cars in Area A
Treatment:
– Find a different geographical area
– Identify another individual in a different area with virtually all the same characteristics
– Swap the two records
Record in Area B. Characteristics: Age: 22, Sex: Male, Marital Status: Married, No. of Cars: 1, Region: Area B
Matches all variables except No. of Cars
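The treatment steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ONS implementation: the function name `swap_records`, the `swap_rate` parameter, the choice of matching variables and the random selection of records to swap are all hypothetical. The two example records are the ones shown on the slide.

```python
import random
from collections import defaultdict

def swap_records(records, match_keys, swap_rate, seed=0):
    """Swap the 'area' of matched pairs of records (hypothetical sketch).

    Records are paired only if they agree on every variable in match_keys
    and currently sit in different areas.
    """
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]  # copies, so the input is untouched

    # Index records by their matching profile.
    by_profile = defaultdict(list)
    for i, r in enumerate(swapped):
        by_profile[tuple(r[k] for k in match_keys)].append(i)

    n_target = round(swap_rate * len(swapped))  # number of records to move
    order = list(range(len(swapped)))
    rng.shuffle(order)
    used = set()
    for i in order:
        if len(used) >= n_target:
            break
        if i in used:
            continue
        profile = tuple(swapped[i][k] for k in match_keys)
        partners = [j for j in by_profile[profile]
                    if j != i and j not in used
                    and swapped[j]["area"] != swapped[i]["area"]]
        if not partners:
            continue  # no matched partner in another area; leave as is
        j = rng.choice(partners)
        swapped[i]["area"], swapped[j]["area"] = swapped[j]["area"], swapped[i]["area"]
        used.update((i, j))
    return swapped

# The two records from the slide; they match on everything except cars.
records = [
    {"age": 22, "sex": "Male", "marital": "Married", "cars": 3, "area": "A"},
    {"age": 22, "sex": "Male", "marital": "Married", "cars": 1, "area": "B"},
]
result = swap_records(records, match_keys=("age", "sex", "marital"), swap_rate=1.0)
```

After the swap the 3-car record appears in Area B and the 1-car record in Area A, while the number of records in each area is unchanged.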

Over-Imputation
Imputation is a standard procedure for census data, used to insert plausible values for those missing due to non-response
Since it is not known whether imputed records are true or false, imputation can also be used for SDC
Carried out by the Edit and Imputation team at ONS using CANCEIS
Algorithm: a distance-based nearest neighbour is used as a donor, chosen on a set of matching variables

Over-Imputation
1) Blank out values for certain records in the data
2) Replace blanked-out values with ‘imputed values’ using a nearest neighbour donor
Example records:
25, male, single, 6 people in hhld, 0 cars, student
21, male, single, 6 people in hhld, 0 cars, student
Blank out age from record
Find a donor to impute age
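The two steps above can be illustrated with a toy nearest-neighbour donor search. This is a minimal sketch and not the CANCEIS algorithm: the function `over_impute`, the simple mismatch-count distance and the third record are hypothetical. Treating the 25-year-old as the blanked record and the 21-year-old as the donor is also an assumption about the slide's diagram.

```python
def over_impute(records, blank_idx, target, match_keys):
    """Blank `target` for the records in blank_idx, then refill it from the
    nearest donor (hypothetical sketch, not CANCEIS)."""
    blanked = set(blank_idx)
    donors = [i for i in range(len(records)) if i not in blanked]
    if not donors:
        raise ValueError("at least one record must remain as a donor")

    out = [dict(r) for r in records]
    for i in blanked:
        out[i][target] = None  # step 1: blank out the value

    def distance(a, b):
        # Number of matching variables on which the two records differ.
        return sum(a[k] != b[k] for k in match_keys)

    for i in blanked:
        # Step 2: impute from the closest donor (first one wins on ties).
        best = min(donors, key=lambda d: distance(records[d], records[i]))
        out[i][target] = records[best][target]
    return out

records = [
    {"age": 25, "sex": "male", "marital": "single", "hh_size": 6, "cars": 0, "activity": "student"},
    {"age": 21, "sex": "male", "marital": "single", "hh_size": 6, "cars": 0, "activity": "student"},
    # Hypothetical extra record, added only to show that a distant donor is not chosen.
    {"age": 67, "sex": "female", "marital": "widowed", "hh_size": 1, "cars": 1, "activity": "retired"},
]
match_keys = ("sex", "marital", "hh_size", "cars", "activity")
imputed = over_impute(records, blank_idx=[0], target="age", match_keys=match_keys)
```

Here the first record's age is replaced by 21, the age of the donor that matches on all five matching variables.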

Which variables to impute?
Risky variables? Ethnicity, elderly, other minority populations
CANCEIS may impute exactly if using a nearest neighbour donor
Impute age (all donors) and small area geography (use only donors within the same local authority): get a small margin of error

(ABS) Cell Perturbation
Developed by the Australian Bureau of Statistics (ABS)
Perturb each cell value in a table to create uncertainty around the true value
Two stage method:
– Stage 1: Adding Perturbation
– Stage 2: Restoring Additivity

(ABS) Cell Perturbation
Stage 1: Each cell is always perturbed in the same way using microdata keys – CONSISTENCY
Stage 2: Restoring ADDITIVITY means consistency is lost slightly
An improved approach is being developed in collaboration with Southampton University: optimise consistency and additivity – INVARIANT cell perturbation
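The consistency property of Stage 1 can be sketched as follows. This is an illustration under assumptions, not the ABS specification: the key modulus, the noise range `max_noise`, the use of a seeded random generator in place of a perturbation lookup table, and leaving empty cells at zero are all hypothetical choices. Stage 2 (restoring additivity) is not covered.

```python
import random

KEY_MODULUS = 2**32  # assumed size of the key space

def assign_record_keys(n_records, seed=2011):
    """Give every microdata record a permanent random key."""
    rng = random.Random(seed)
    return [rng.randrange(KEY_MODULUS) for _ in range(n_records)]

def perturb_cell(record_ids, record_keys, max_noise=2):
    """Return a perturbed count for the cell made up of record_ids.

    The cell key depends only on which records are in the cell, so the same
    cell receives the same perturbation in every table it appears in.
    """
    count = len(record_ids)
    if count == 0:
        return 0  # assumption: empty cells are left unperturbed
    cell_key = sum(record_keys[i] for i in record_ids) % KEY_MODULUS
    noise = random.Random(cell_key).randint(-max_noise, max_noise)
    return max(0, count + noise)

keys = assign_record_keys(100)
cell = [3, 17, 42, 58]                      # records contributing to one cell
first = perturb_cell(cell, keys)
again = perturb_cell(list(reversed(cell)), keys)  # same cell, different table
```

Because the noise is derived from the cell key rather than drawn afresh, `first` and `again` are identical, which is the consistency property named on the slide.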

Results
What is the effect on statistical quality of the data?
– Tendency to increase correlations?
– Tendency to distort distance metrics?
– etc. (many ways to measure information loss)
Impact on disclosure risk
Examine different types of data

Results
Only Over-Imputation, Record Swapping and Record Swapping with SCA have been evaluated so far
Both targeted and random approaches are being looked at
Note there are different ways of carrying out swapping and imputation, so interpretation of the results should take this into account

Data for Analysis
SJ EA; approx. 200,000 households and 500,000 persons
Four census tables so far:
(1) Country of birth by religion by sex: individuals at ward level
(2) Number of persons by accommodation type: households at OA and ED level
(3) Age by religion by gender: individuals at OA and ED level
(4) Origin-destination table: flows between home and travel to work location

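Measures of Quality
Impact on Tests for Independence:
Cramér’s V measure of association: $V = \sqrt{\dfrac{\chi^2 / n}{\min(R-1,\, C-1)}}$, where $\chi^2$ is the Pearson chi-square statistic, $n$ is the table total, and $R$ and $C$ are the numbers of rows and columns
Also, the same measure for entropy and the Pearson statistic
Variance of Cell Counts: for each row [formulas not recoverable from the transcript]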
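As a worked illustration of the first measure, Cramér's V is conventionally defined as the square root of chi-square divided by n times min(R-1, C-1). The sketch below computes it for a two-way table held as a list of rows. The `relative_change_pct` helper and both example tables are hypothetical; the slide's own summary formula did not survive, so the percentage change shown is only one plausible way to compare an original and a perturbed table.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic and grand total for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(R - 1, C - 1)))."""
    chi2, n = pearson_chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

def relative_change_pct(original, perturbed):
    """Hypothetical summary: percentage change in V after perturbation."""
    v0 = cramers_v(original)
    return 100.0 * (cramers_v(perturbed) - v0) / v0

original = [[10, 0], [0, 10]]   # made-up table with perfect association
perturbed = [[9, 1], [1, 9]]    # made-up perturbed version
v_orig = cramers_v(original)
v_pert = cramers_v(perturbed)
change = relative_change_pct(original, perturbed)
```

A negative change indicates that perturbation has weakened the measured association, moving the table towards independence.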

Measures of Utility
Impact on Rank Correlations:
– Sort original cell counts and define deciles
– Repeat on perturbed cell counts
– [formula not recoverable from the transcript], where I is the indicator function and the denominator is the number of rows
Log Linear Analysis: ratio of the deviance (likelihood ratio test statistic) between perturbed table and original table for a given model [formula not recoverable from the transcript]
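One plausible reading of the rank-correlation measure is the percentage of cells whose decile group differs between the original and the perturbed table. The sketch below follows that reading; it is an assumption, since the slide's formula is missing, and the function names, the tie-breaking rule and the example data are all hypothetical.

```python
def decile_groups(counts):
    """Assign each cell a decile group 0-9 by the rank of its count.

    Ties are broken by cell position (an assumption made for determinism).
    """
    order = sorted(range(len(counts)), key=lambda i: (counts[i], i))
    groups = [0] * len(counts)
    for rank, i in enumerate(order):
        groups[i] = rank * 10 // len(counts)
    return groups

def rank_change_pct(original, perturbed):
    """Percentage of cells whose decile group changes after perturbation."""
    g_orig = decile_groups(original)
    g_pert = decile_groups(perturbed)
    changed = sum(a != b for a, b in zip(g_orig, g_pert))
    return 100.0 * changed / len(original)

original = list(range(1, 21))               # 20 made-up cell counts
perturbed = list(original)
perturbed[0], perturbed[19] = 20, 1         # smallest and largest cells trade places
unchanged_pct = rank_change_pct(original, original)
changed_pct = rank_change_pct(original, perturbed)
```

In this example two of the twenty cells move to a different decile, so the measure is 10 per cent; an unperturbed table scores zero.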

Impact on Disclosure Risk

Quality Measures

Changes to Totals / Subtotals
Swapping does not change the overall set of household locations → totals and subtotals by geography preserved
Over-Imputation does change the set of locations → totals and subtotals by geography not preserved
Swapping has no impact on Origin-Destination total flows – NO PROTECTION
Over-Imputation does not preserve O/D total flows – POOR QUALITY

Conclusions
Decide whether to drop over-imputation: test on another EA?
Quantitative Evaluation to be finished by September ’08
ABS cell perturbation method currently being evaluated – results are looking good

Further Work
Setting of parameter values for final method, e.g. level of perturbation
Protection of microdata samples
Communal establishments
Output specification / geography
System specification, development and testing

Contact Details Useful links: a/outputconfidentiality.asp