Michelle Simard Statistics Canada UNECE Worksessions on Statistical Disclosure Control Methods Helsinki, October 2015 Development of rules from administrative.

Presentation transcript:

Slide 1: Development of rules from administrative data
Michelle Simard, Statistics Canada
UNECE Worksessions on Statistical Disclosure Control Methods, Helsinki, October 2015
Statistics Canada / Statistique Canada

Slide 2: Outline
- Progress status: Canada
- Administrative data
- Proposed solutions
  - For risk: the Companion
  - For method: the Layered Perturbation Method
- Future

Slide 3: Current Status
- G-CONFID
  - For skewed data, mostly business surveys
  - Primary cell suppression (PCS) and complementary cell suppression (CCS)
- New in 2015: additive controlled rounding (G-TAB)
- For Fall 2016, the automated treatment of:
  - Data in the presence of waivers
  - Variables containing both positive and negative values
  - Absolute values
  - Linear combinations including proxy variables
  - Survey weights

Slide 4: Current Status – Social Surveys
- Real Time Remote Access (RTRA)
  - Started in 2009
  - Currently 29 social surveys (+ cycles), 5 administrative files
  - International users
- Generalized Tabulation System (G-TAB)
  - Integration with our common tools infrastructure, generalized systems and our new dissemination model
  - GUI, Engine and Confidentiality modules
  - Starting transition of social surveys to G-TAB
- Both tools share the same Engine and Confidentiality modules

Slide 5: Current Status – G-TAB
- Created efficiencies, robustness and responsiveness in our process
- Improved quality and standardization (rules, methods, metadata)
- Accelerated data availability
- SDC methods developed for:
  - Counts, means, percentiles, quartiles, proportions, ratios, Gini coefficient, geometric means, level change, percentage change, moving averages, Z-tests
  - Additive and controlled rounding
  - Other rules: min cells rule, no 0 and no 1 rules, etc.
  - Quality indicators (C.V., S.E.)
- What's next? Administrative data and TOTAL

Slide 6: Administrative Data
- Why not treat them like a survey?
  - Census of a given population
  - No sampling, no weights
- Outside StatCan
  - Different ownership, governance and policy framework (STC)
  - Often acquired under a legal framework from known institutions
  - Own dissemination approach
  - Release is still under the Statistics Act
- Continuous increase of administrative data in all programs
  - Our dissemination activities and our business evolved
  - More complex than it used to be internally
- In the past 3 years: defined or created new division, policies, governance, ownership framework …

Slide 7: Administrative Data: some definitions
- Type A: discrete variables or continuous variables with no dominance
  - Institutions-type files
- Type B: continuous variables, fiscal-type information, skewed distribution
  - Taxation files, immigration database
- Other types:
  - Census linked with one administrative file
  - Linkage of administrative files and/or survey files
- Dealt with types A and B

Slide 8: Administrative Data – Balancing Act
- Develop options considering also
  - Users (internal vs. external)
  - Types
  - Governance
- Impossible to develop a “no risk” approach
- Different types, different ways of using them
- Risk management has to be integrated
- Needed a framework more than just methods

Slide 9: Administrative Data – The Approach
- Disclosure control framework, the 3 Rs: the rules, the risks, the roles
- Risk: the Companion
  - Support, guide, help and TEACH the analysts, users, directors about the disclosure risk
- SDC rules
  - Introducing the Layered Perturbation Method (LPM)
- Roles and responsibilities
  - Confidentiality methodologists
  - GTAB/RTRA
  - Administrative data (data sources) methodologists
  - Administrative data (data sources) analysts
  - Directors (owner of the acquired files)

Slide 10: The Companion
- Disclosure risk management tool
  - To be implemented for various systems, outputs and tools
  - Provides decision-support information
  - Does not replace the expertise
- Plans for G-TAB Companion
  - Scan of the outputs (tabular, statistical analyses)
  - Provide a quick assessment on how sensitive a table is (with a global score or a color-code)
  - Provide a detailed log of the potential disclosure risks of a table or model
  - Suggest possible solutions
- Users can then decide to
  - Release or not the table as is
  - Take appropriate measures to ensure that the risk is more manageable

Slide 11: The Companion
- The risk is measured based on:
  1. Geographic levels
     - National
     - Provincial
     - Sub-provincial – High level
     - Sub-provincial – Low level
  2. Sensitive values (e.g. rare type of cancer)
  3. Proportion of cells with low counts (1 or 2)
  4. Presence of full cells
  5. Presence of dominance (Type B: means and totals)
  6. Presence of derived variables
- Requires client’s input
- Automated process
- Score function
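The six risk factors above could feed a score function along the following lines. This is only a minimal sketch: the slides name the factors but give no weights, thresholds or formula, so `GEO_POINTS`, the point values, the `TableProfile` fields and the `amber`/`red` cut-offs are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical points per geographic level (finer geography = more risk).
GEO_POINTS = {
    "national": 0,
    "provincial": 1,
    "sub_provincial_high": 2,
    "sub_provincial_low": 3,
}


@dataclass
class TableProfile:
    """Summary of one output table; field names are illustrative only."""
    geo_level: str                # key of GEO_POINTS
    has_sensitive_values: bool    # e.g. a rare type of cancer
    low_count_share: float        # proportion of cells with a count of 1 or 2
    has_full_cells: bool
    has_dominance: bool           # relevant for Type B means and totals
    has_derived_variables: bool


def risk_score(t: TableProfile) -> float:
    """Add up hypothetical points for each of the six factors."""
    score = float(GEO_POINTS[t.geo_level])
    score += 3.0 if t.has_sensitive_values else 0.0
    score += 4.0 * t.low_count_share
    score += 1.0 if t.has_full_cells else 0.0
    score += 2.0 if t.has_dominance else 0.0
    score += 1.0 if t.has_derived_variables else 0.0
    return score


def color_code(score: float, amber: float = 3.0, red: float = 6.0) -> str:
    """Map a global score to a color; the cut-offs are assumptions."""
    if score >= red:
        return "red"
    if score >= amber:
        return "amber"
    return "green"
```

An additive score keeps each factor's contribution visible, which fits the stated goal of a detailed log of potential risks alongside the global score.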

Slide 12: The Layered Perturbation Method
- Developed for personal taxation data in a custom tabulation environment
  - Few dominant units in a cell total
  - Covers residual disclosure from multiple tabular requests (focus on differencing)
- Benefits
  - Protection of ratios
  - Treatment of zeroes and negative values
  - Maintenance of data quality (minimal loss of information)
  - Minimal manual intervention
  - Computational simplicity

Slide 13: The Layered Perturbation Method
- Suppress sensitive cells only (no CCS)
- Perturb units in all other cells (e.g., using Evans-Zayatz-Slanta (EZS) multiplicative noise $w_i = 1 + \varepsilon_i$ for $\varepsilon_i \sim (0, \sigma_\varepsilon^2)$)
  - Largest units perturbed consistently
  - Median units perturbed semi-consistently
  - Smallest units not perturbed
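The multiplicative noise step can be illustrated with a short sketch. The slide specifies only a mean of 0 and a variance of $\sigma_\varepsilon^2$ for $\varepsilon_i$; drawing it from a normal distribution, and the function names and seed handling, are assumptions made here for illustration.

```python
import random


def multiplicative_weights(n, sigma, seed=0):
    """Return n weights w_i = 1 + eps_i with eps_i of mean 0 and s.d. sigma.

    The normal distribution is an assumption; the slide gives only the
    first two moments of eps_i.
    """
    rng = random.Random(seed)
    return [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]


def perturbed_total(values, weights):
    """Cell total computed from perturbed unit values w_i * x_i."""
    return sum(w * x for w, x in zip(weights, values))


# Usage: perturb the five contributors of one cell.
values = [900.0, 500.0, 120.0, 30.0, 5.0]
weights = multiplicative_weights(len(values), sigma=0.1, seed=42)
noisy_total = perturbed_total(values, weights)
```

Because each unit value is scaled rather than shifted, zero contributions stay at zero and the sign of a negative value is preserved as long as the weight stays positive.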

Slide 14: The Layered Perturbation Method
- Basic idea 1: Pseudo-random hash numbers ($h_i$)
  - Attached to each unit
  - E.g., $h_i \sim \mathrm{Uniform}(0,1)$ used to determine unit-specific noise $w_i$
  - $h_i'$ used to determine unit-cell-specific noise $w_i'$
- Basic idea 2: Layered perturbation
  - Largest $n_1$ units are always perturbed consistently using $w_i$
  - Next $n_2$ units are perturbed semi-consistently
    - Use a mixture of $w_i$ and $w_i'$ for those $n_2$ units (unit-specific and cell-specific)
  - Remaining units are not perturbed (no noise)
- Similar to ABS TableBuilder
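The two ideas on this slide can be combined in a small sketch. Several details are assumptions, not taken from the slides: deriving $h_i$ from a SHA-256 digest of the unit identifier, mapping a hash number to noise uniformly on an interval of half-width `spread`, and mixing unit-specific and cell-specific noise with a fixed weight `mix`.

```python
import hashlib


def hash_uniform(*keys):
    """Deterministic pseudo-random number in [0, 1) derived from the keys."""
    digest = hashlib.sha256("|".join(map(str, keys)).encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def noise(h, spread):
    """Map a hash number to noise in [-spread, spread) (assumed mapping)."""
    return spread * (2.0 * h - 1.0)


def layered_weights(cell_id, units, n1, n2, spread=0.1, mix=0.5):
    """Return {unit_id: weight} for one cell.

    units maps unit_id -> contribution. The largest n1 units get
    unit-specific noise (identical in every cell), the next n2 get a
    mixture of unit-specific and unit-cell-specific noise, and the
    remaining units are left unperturbed.
    """
    ranked = sorted(units, key=lambda u: abs(units[u]), reverse=True)
    weights = {}
    for rank, unit in enumerate(ranked):
        eps_unit = noise(hash_uniform(unit), spread)
        eps_cell = noise(hash_uniform(unit, cell_id), spread)
        if rank < n1:
            weights[unit] = 1.0 + eps_unit
        elif rank < n1 + n2:
            weights[unit] = 1.0 + mix * eps_unit + (1.0 - mix) * eps_cell
        else:
            weights[unit] = 1.0
    return weights


# Usage: the same five units requested in two different tables.
units = {"A": 900.0, "B": 500.0, "C": 120.0, "D": 30.0, "E": 5.0}
w_cell1 = layered_weights("table1/cell7", units, n1=2, n2=2)
w_cell2 = layered_weights("table2/cell3", units, n1=2, n2=2)
```

Because the noise comes from a hash instead of a fresh random draw, repeating a request returns the same perturbed value, so averaging over repeated queries does not remove the noise.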

Slide 15: The Layered Perturbation Method
- Basic idea 3: Focus on differencing problem
  - Increase noise for top 3 units, if needed
  - Set $w_i$ from $(-1)^{i+1} \varepsilon_i$ to increase variance of differences
    - Noises change direction if a top contributor is removed (even-ranked)
    - Lessens risk when small unit is added/removed
- Suppress sensitive (e.g. p%-rule) and small cells (e.g. n<15)
- Perturb largest $n_1 + n_2$ units in other cells following the method

References:
Evans, T., Zayatz, L. and Slanta, J. (1998). Using Noise for Disclosure Limitation of Establishment Tabular Data. Journal of Official Statistics, 14, 537–551.
Thompson, G., Broadfoot, S. and Elazar, D. (2013). Methodology for the Automatic Confidentialisation of Statistical Outputs from Remote Servers at the Australian Bureau of Statistics. Joint UNECE/Eurostat work session on statistical data confidentiality, Ottawa, October.
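The suppression check and the alternating-sign noise from the third basic idea might look as follows. This is a hedged sketch: the slides name the p%-rule and a minimum cell size only as examples, so the default `p=10.0` is an assumption, and treating $\varepsilon_i$ as a positive magnitude whose sign is set by the unit's rank $i$ is one reading of the slide, not a confirmed specification.

```python
def is_suppressed(values, p=10.0, min_count=15):
    """Flag a cell as small (fewer than min_count units) or sensitive.

    Standard p%-rule: the cell is sensitive when the contributions other
    than the two largest sum to less than p% of the largest one.
    Contributions are assumed to be non-negative.
    """
    if len(values) < min_count:
        return True
    ranked = sorted(values, reverse=True)
    largest = ranked[0] if ranked else 0.0
    remainder = sum(ranked[2:])
    return remainder < (p / 100.0) * largest


def alternating_weights(magnitudes):
    """Weights w_i = 1 + (-1)**(i+1) * |eps_i| for units ranked i = 1, 2, ...

    Odd-ranked units are pushed up and even-ranked units down, so when the
    top contributor leaves a cell every remaining unit moves up one rank
    and its noise changes direction.
    """
    return [1.0 + (-1) ** (i + 1) * abs(e)
            for i, e in enumerate(magnitudes, start=1)]


# Usage: the second-ranked unit is perturbed downward, but upward once the
# top contributor has been removed and it becomes rank 1.
before = alternating_weights([0.10, 0.20, 0.05])
after = alternating_weights([0.20, 0.05])
```

The sign flip is what the slide means by increasing the variance of differences: subtracting two overlapping tables no longer cancels the noise on the units they share.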

Slide 16: Future
- Social data issues
  - Residual disclosure
  - Monitoring the situation for social surveys
- Increasing usage of administrative data
  - Big data
  - Linkages
  - Governance/approval protocols before releasing
- Census
  - Reengineering their tabulation system