G-Confid: Turning the tables on disclosure risk. Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Ottawa, Canada, 30 October 2013. Peter Wright



G-Confid: a cell suppression application
- Use with any table size and any number of dimensions (subject to hardware / memory limitations)
- Available for SAS 9.2 and 9.3; SAS EG 4.3 and 5.1
Overview by component:
- PROC SENSITIVITY identifies sensitive cells
  - Highlights, inputs, strategies
- Macro SUPPRESS creates a suppression pattern
  - Inputs, outputs, strategies
- Macro AUDIT audits a suppression pattern

PROC SENSITIVITY identifies confidential cells
Highlights:
- Choice of sensitivity rule: p-percent, (n,k), arbitrary
- Allows multiple decomposition
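To make the two named rules concrete, here is a minimal Python sketch of the textbook linear sensitivity measures for a p-percent rule and an (n,k) dominance rule. It is not G-Confid code: the function names are hypothetical, and the exact form and scaling that PROC SENSITIVITY uses are not given in these slides, so the formulas in the docstrings are assumptions. In both cases a cell is treated as sensitive when S > 0.

```python
def p_percent_sensitivity(values, p):
    """Assumed textbook form: S = (p/100) * x1 - (T - x1 - x2).

    x1 and x2 are the two largest contributions and T is the cell total.
    """
    x = sorted(values, reverse=True)
    largest = x[0] if x else 0.0
    remainder = sum(x[2:])  # cell total minus the two largest contributions
    return (p / 100.0) * largest - remainder


def nk_sensitivity(values, n, k):
    """Assumed textbook form: S = (x1 + ... + xn) - (k/100) * T."""
    x = sorted(values, reverse=True)
    return sum(x[:n]) - (k / 100.0) * sum(x)


# Hypothetical cell with four enterprise contributions.
cell = [100.0, 60.0, 5.0, 3.0]
print(p_percent_sensitivity(cell, 15) > 0)  # True: the remainder (8) is below 15% of 100
print(nk_sensitivity(cell, 2, 90) > 0)      # True: the top two exceed 90% of the total
```

Under these assumed forms, adding small contributors to a cell lowers S, which is why cells with many respondents tend to be publishable.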

Inputs for PROC SENSITIVITY
- Definition of hierarchy(ies) for each table dimension
- Microdata file
  - Classification variables (e.g., geography, industry)
  - Enterprise identifier
  - Enterprise value
Tip: to reduce the sensitivity of a cell by the value of an enterprise, set the enterprise identifier = missing

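Example of SAS code to run PROC SENSITIVITY

    proc sensitivity data=microfile      /* microdata file */
        outconstraint=consfile           /* linear constraints file */
        outcell=cellfile                 /* cell sensitivities file */
        outlargest=largestfile
        hierarchy="0 East West; ;"
        srule="pq.20"
        range="East A B: West C D; : : ;"
        minresp=5;                       /* minimum number of respondents */
      id Enterpriseid;                   /* enterprise identifier */
      var Income;                        /* enterprise value */
      dimension EastWest Industry;       /* classification variables */
    run;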

Strategies using PROC SENSITIVITY
- Use the MINRESP=r option to set the minimum number of respondents
  - Any cell with fewer than r respondents is assigned a sensitivity of max{1, S}, where S is the sensitivity of the cell
  - Only positive (>0) values are counted as respondents
  - The MINRESP rule is ignored for a cell with a value contributed by an anonymous enterprise
Note: we can use MINRESP without applying a sensitivity rule
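The MINRESP logic described on this slide can be sketched in a few lines of Python. This is an illustration, not the G-Confid implementation: the function name and the record layout are hypothetical, an anonymous enterprise is represented by an identifier of None, and it is an assumption that only an anonymous record with a positive value switches the rule off.

```python
def apply_minresp(contributions, sensitivity, minresp):
    """Return the cell sensitivity after a MINRESP-style adjustment.

    contributions: list of (enterprise_id, value); enterprise_id None = anonymous.
    sensitivity:   S, the sensitivity already computed for the cell.
    minresp:       r, the minimum number of respondents.
    """
    # Assumption: the rule is skipped when an anonymous enterprise contributes
    # a positive value to the cell.
    if any(eid is None and value > 0 for eid, value in contributions):
        return sensitivity
    # Only positive values are counted as respondents.
    respondents = sum(1 for _, value in contributions if value > 0)
    if respondents < minresp:
        return max(1.0, sensitivity)
    return sensitivity


# Two positive respondents and r = 3: a non-sensitive cell (S = -3) becomes sensitive.
print(apply_minresp([("a", 10.0), ("b", 0.0), ("c", 5.0)], -3.0, 3))  # 1.0
```

Because the adjusted value is max{1, S}, a cell that was already sensitive with S above 1 keeps its original sensitivity.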

Strategies using PROC SENSITIVITY (continued)
- To reduce oversuppression, apply rules that make use of sampling weights
  - Example: if the sampling weight w_i > 3, make the enterprise anonymous (set ID value = missing). G-Confid will use its contribution to reduce the sensitivity of the cell.
Find more strategies in: Tambay and Fillion (Proceedings of the JSM 2013)
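The weighting strategy can be illustrated with a short Python sketch. The record keys ('id', 'value', 'weight') and both function names are hypothetical. How G-Confid actually treats an anonymous contribution is not spelled out in these slides; the sketch assumes that an anonymous value stays in the cell total but can never be one of the two largest identified contributors in a textbook p-percent measure.

```python
def anonymize_by_weight(records, threshold=3.0):
    """Return copies of the records with id set to None when weight > threshold."""
    result = []
    for rec in records:
        new_rec = dict(rec)
        if rec["weight"] > threshold:
            new_rec["id"] = None  # None stands in for a missing identifier
        result.append(new_rec)
    return result


def p_percent_with_anonymous(records, p):
    """Assumed measure: S = (p/100) * x1 - (T - x1 - x2), where x1 and x2 are
    the two largest values among identified (non-anonymous) enterprises."""
    identified = sorted(
        (r["value"] for r in records if r["id"] is not None), reverse=True
    )
    total = sum(r["value"] for r in records)
    x1 = identified[0] if identified else 0.0
    x2 = identified[1] if len(identified) > 1 else 0.0
    return (p / 100.0) * x1 - (total - x1 - x2)


records = [
    {"id": "A", "value": 100.0, "weight": 1.0},
    {"id": "B", "value": 80.0, "weight": 5.0},  # sampled unit with a large weight
    {"id": "C", "value": 6.0, "weight": 1.0},
    {"id": "D", "value": 4.0, "weight": 1.0},
]
print(p_percent_with_anonymous(records, 15) > 0)                       # True
print(p_percent_with_anonymous(anonymize_by_weight(records), 15) > 0)  # False
```

In this made-up cell, making enterprise B anonymous moves its value into the remainder, so the cell is no longer flagged as sensitive.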

Macro SUPPRESS – complementary suppression
- Uses the SAS/OR® LP solver
- Input files: (i) cell sensitivities file, and (ii) linear constraints file
- Syntax:
    %Suppress(InCell=, Constraint=, CFunction1=, CFunction2=, CVar1=, CVar2=, OutCell=, ByVars=, OutComplement=, ScaleCost=);
- Output file has final status (Suppress, Publish) and the net variation (largest amount the cell was "moved")

Strategies using the macro SUPPRESS
- Choice of cost functions (functions of cell total):
    SIZE (=tot)
    DIGITS (=log[tot+1])
    CONSTANT (=1)
    INFORMATION (=log[tot+1]/[tot+1])
  - Can run the LP process twice to reduce the number of suppressions (e.g., SIZE or DIGITS, then INFORMATION)
- Can favour publishing certain cells by defining higher cost values (by default, cost=tot)
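The four cost functions are simple functions of the cell total, so they are easy to compare side by side. The Python sketch below evaluates them for two made-up totals. The dictionary name is hypothetical, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math

# Cost of suppressing a cell as a function of its total (tot).
COST_FUNCTIONS = {
    "SIZE": lambda tot: tot,
    "DIGITS": lambda tot: math.log(tot + 1),  # natural log assumed
    "CONSTANT": lambda tot: 1.0,
    "INFORMATION": lambda tot: math.log(tot + 1) / (tot + 1),
}

for tot in (10, 1000):
    costs = {name: round(f(tot), 4) for name, f in COST_FUNCTIONS.items()}
    print(tot, costs)
```

SIZE and DIGITS grow with the cell total, so they steer complementary suppressions toward small cells, and CONSTANT simply minimizes the number of suppressed cells. INFORMATION decreases once tot + 1 exceeds e, so it makes large cells comparatively cheap to suppress.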

Macro AUDIT – validates a suppression pattern
- Calculates minimum and maximum values for each suppressed cell using the LP solver
- Provides results for each cell (protection achieved, not achieved, or exact disclosure)
- Coming soon: pre-set narrower starting intervals than the default values (0.5tot and 1.5tot) using the Shuttle algorithm (Buzzigoli and Giusti (2006))
  - Using the Shuttle algorithm to pre-set the starting intervals reduces run time
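The audit idea can be illustrated without an LP solver in the simplest case. When the suppressed cells share a single additive constraint (one row with a published total), the LP minimum and maximum for each cell have a closed form, which the Python sketch below uses with the default starting intervals of 0.5tot and 1.5tot. The function names and the `required` protection width are hypothetical; linked tables with several constraints need a real LP, as the macro uses.

```python
import math


def audit_row(total, published, suppressed, lo_factor=0.5, hi_factor=1.5):
    """Bounds an intruder can derive for each suppressed cell in one row.

    total:      published row total
    published:  list of published cell values
    suppressed: dict mapping cell name to its true (hidden) value
    """
    residual = total - sum(published)  # known sum of the suppressed cells
    lo = {k: lo_factor * v for k, v in suppressed.items()}
    hi = {k: hi_factor * v for k, v in suppressed.items()}
    bounds = {}
    for k in suppressed:
        others_lo = sum(lo[j] for j in suppressed if j != k)
        others_hi = sum(hi[j] for j in suppressed if j != k)
        bounds[k] = (max(lo[k], residual - others_hi),
                     min(hi[k], residual - others_lo))
    return bounds


def classify(true_value, bounds, required):
    """required: hypothetical protection width needed on each side of the value."""
    low, high = bounds
    if math.isclose(low, high):
        return "exact disclosure"
    if high - true_value >= required and true_value - low >= required:
        return "protection achieved"
    return "protection not achieved"


bounds = audit_row(100, [40], {"a": 50, "b": 10})
print(bounds["a"])                   # (45.0, 55.0)
print(classify(50, bounds["a"], 4))  # protection achieved
print(classify(50, bounds["a"], 8))  # protection not achieved
```

A row with only one suppressed cell collapses to a single point, which is the exact disclosure case: `audit_row(100, [40], {"a": 60})` gives equal lower and upper bounds for `a`.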

Conclusion
- PROC SENSITIVITY
  - Use pre-defined or customized sensitivity rule
  - Can do multiple decomposition
  - MINRESP function
  - Can apply weighting strategies
- Macro SUPPRESS
  - Can favour cells to publish (or suppress)
- Macro AUDIT
Coming soon: additive controlled rounding

For more information, please contact: Peter Wright