European Conference on Quality in Official Statistics, Rome, July 2008 Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata.

Presentation transcript:

European Conference on Quality in Official Statistics, Rome, July 2008 Community Innovation Survey: a Flexible Approach to the Dissemination of Microdata Files for Research Daniela Ichim

Outline
– Dissemination of Microdata Files for Research
– Risk assessment
– Disclosure limitation
– Data quality
  – Record linkage
  – Data utility

Confidentiality against Dissemination
– Find the right balance!
– Disclosure scenarios

Community Innovation Survey
IDENTIFYING VARIABLES (STRUCTURAL VARIABLES)
– Nace
– Nuts
– Size
– Turnover (TURN)
CONFIDENTIAL VARIABLES (VARIABLES INVOLVED IN ANALYSES)
– Expenditures in innovation (RTOT, …)
– Number of patents, …

Confounding
[Diagram labels: Categorical, Numerical, safe, unsafe, A … A, k-anonymity]
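The k-anonymity rule named on this slide can be illustrated for the categorical identifying variables. This is a minimal sketch, not the author's implementation: the function name `unsafe_units`, the record layout and the example values are all hypothetical. A unit is flagged as unsafe when fewer than k units share its combination of identifying values.

```python
from collections import Counter

def unsafe_units(records, keys, k):
    """Return indices of records whose key combination is shared by fewer than k records."""
    combos = [tuple(r[key] for key in keys) for r in records]
    counts = Counter(combos)
    return [i for i, combo in enumerate(combos) if counts[combo] < k]

# Hypothetical enterprises described by Nace, Nuts and Size classes.
records = [
    {"nace": "A", "nuts": "ITC", "size": "small"},
    {"nace": "A", "nuts": "ITC", "size": "small"},
    {"nace": "A", "nuts": "ITC", "size": "small"},
    {"nace": "A", "nuts": "ITC", "size": "large"},
    {"nace": "B", "nuts": "ITF", "size": "small"},
]
flagged = unsafe_units(records, ["nace", "nuts", "size"], k=3)
```

Here the last two records are unique on their key combination, so they would be the candidates for protection.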

General risk function
a) Given a threshold (on units)
b) Local Outlier Factor as a measure of difference in density between a unit and its nearest neighbours
[Formulas lost in extraction: "Distance between … and …", "Density around …:"]
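Since the slide's own formulas did not survive, the following sketch uses the standard Local Outlier Factor definition (reachability distance, local reachability density, ratio to the neighbours' densities). It may differ from the risk function actually used in the talk. As a simplifying assumption it takes exactly k nearest neighbours (ties broken by index) and expects distinct points; the function name and the data are hypothetical.

```python
import math

def local_outlier_factors(points, k):
    """Standard LOF score for each point; values well above 1 indicate isolated units."""
    n = len(points)
    # k nearest neighbours of every point as (distance, index) pairs.
    neighbours = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours.append(dists[:k])
    # Distance from each point to its k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical one-dimensional turnover values: five similar units and one very large one.
turnover = [(1.0,), (1.1,), (1.2,), (1.3,), (1.4,), (10.0,)]
scores = local_outlier_factors(turnover, k=2)
most_isolated = scores.index(max(scores))
```

The large unit sits in a much sparser region than its neighbours, so it receives by far the highest score.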

Parameters
– Threshold: dissemination policy
– Cut-off point for density (LOF)
  – quantiles
  – automatic
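A quantile-based cut-off for the LOF scores could look like the sketch below. The choice of the 90th percentile, the function name `flag_by_quantile` and the scores are assumptions made for illustration; the slide does not say which quantile was used or how the "automatic" option works.

```python
import statistics

def flag_by_quantile(scores, percent):
    """Return the chosen percentile of the scores and the indices of units above it."""
    cut_off = statistics.quantiles(scores, n=100)[percent - 1]
    return cut_off, [i for i, s in enumerate(scores) if s > cut_off]

# Hypothetical LOF scores for ten units.
lof_scores = [0.9, 1.0, 1.1, 1.0, 1.2, 0.95, 1.05, 1.3, 4.8, 1.1]
cut_off, at_risk = flag_by_quantile(lof_scores, percent=90)
```

Raising or lowering `percent` is one way a dissemination policy could trade risk against the number of records that need protection.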

Stratification variables
– Analysis by Nace
[Figure panels: Nace A; all Nace]

Disclosure limitation
– MFR: selective masking
– k-anonymity
– Nearest neighbour
– Micro-aggregation on tails
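Micro-aggregation on the tails can be sketched as follows: the largest values are grouped into sets of at least k units and each value is replaced by its group mean. This is only an illustration under stated assumptions (upper tail only, unweighted means, a hypothetical function name and data); the talk does not give the exact procedure.

```python
def microaggregate_upper_tail(values, tail_size, k):
    """Replace the tail_size largest values by the means of groups of at least k units."""
    masked = [float(v) for v in values]
    # Indices of the tail_size largest values, largest first.
    tail = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:tail_size]
    groups = [tail[start:start + k] for start in range(0, len(tail), k)]
    # Merge a short last group into the previous one so no group has fewer than k units.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked

# Hypothetical turnover values with three very large enterprises.
turnover = [10, 12, 11, 200, 180, 220, 13]
masked = microaggregate_upper_tail(turnover, tail_size=3, k=3)
```

Because each group is replaced by its own mean, the unweighted total is unchanged; preserving weighted totals would need weighted means or equal weights within a group.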

Quality assessment
– Dissemination
– Confidentiality

Risk measure assessment
– Quality of the external database D_E
– Chambers of Commerce database
– Record linkage

Record linkage

M*=3
          | 1 equal unit within 10% | less than 3 units within 10% | less than 3 units within 20% | less than 3 units within 30%
NACE      | 88%                     | 84%                          | 97%                          | 100%
NACE EMP  | 63%                     | 60% a                        | 74% a                        | 87% a

M*=5
          | 1 equal unit within 10% | less than 5 units within 10% | less than 5 units within 20% | less than 5 units within 30%
NACE      | 88%                     | 73%                          | 87%                          | 96%
NACE EMP  | 63%                     | 58% a                        | 70% a                        | 80% a

a) 100% for enterprises with more than 250 employees
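One possible reading of the tolerance-based linkage in this table is sketched below: for a released record, count the units in the external database with the same NACE class whose number of employees lies within a relative tolerance. The field names, the data and the interpretation of M* as a minimum number of candidates are assumptions, not details confirmed by the slides.

```python
def candidate_matches(released, external, tolerance):
    """External units with the same NACE and employment within the relative tolerance."""
    return [
        unit for unit in external
        if unit["nace"] == released["nace"]
        and abs(unit["emp"] - released["emp"]) <= tolerance * released["emp"]
    ]

# Hypothetical external register and one released record.
external = [
    {"id": 1, "nace": "C", "emp": 100},
    {"id": 2, "nace": "C", "emp": 105},
    {"id": 3, "nace": "C", "emp": 300},
    {"id": 4, "nace": "D", "emp": 100},
]
released = {"nace": "C", "emp": 102}

matches = candidate_matches(released, external, tolerance=0.10)
m_star = 3  # assumed threshold on the number of candidates
at_risk = len(matches) < m_star
```

With a 10% tolerance two external units are compatible with the released record, which is below the assumed threshold of three.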

Information content analysis: information preservation
Selective masking
– Data utility
– Only identifying and confidential variables were modified.
– Only records at risk were modified. The weights were not modified.
– Weighted totals were preserved (coherence with the already published information).
– Some statistical indicators were slightly modified: variances.
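The two properties listed here, unchanged weighted totals and slightly changed variances, can be checked with a few lines. The sketch below uses hypothetical weights and values; the masking step shown (averaging two records that share the same weight) is only an example of a modification that keeps the weighted total fixed.

```python
import statistics

def weighted_total(values, weights):
    """Sum of values multiplied by their sampling weights."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical sampling weights (left untouched) and innovation expenditures.
weights = [2.0, 2.0, 5.0]
original = [100.0, 300.0, 40.0]
# The first two records share the same weight, so replacing both by their mean
# leaves the weighted total unchanged.
masked = [200.0, 200.0, 40.0]

total_before = weighted_total(original, weights)
total_after = weighted_total(masked, weights)
var_before = statistics.pvariance(original)
var_after = statistics.pvariance(masked)
```

The totals agree, while the variance of the masked values is smaller than that of the originals.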

Information content analysis: data utility
– Assessment of the perturbation impact on ratios like RTOT/TURN
[Figure panels: Original; Selective masking; Individual ranking]
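The impact of a perturbation on a ratio such as RTOT/TURN can be measured record by record as a relative change. The sketch below is illustrative only: the function name, the use of a relative change as the impact measure and the data are assumptions.

```python
def ratio_changes(original, masked):
    """Relative change of RTOT/TURN for each record after masking."""
    changes = []
    for before, after in zip(original, masked):
        ratio_before = before["RTOT"] / before["TURN"]
        ratio_after = after["RTOT"] / after["TURN"]
        changes.append((ratio_after - ratio_before) / ratio_before)
    return changes

# Hypothetical records: only the second one is perturbed (its turnover is changed).
original = [{"RTOT": 10.0, "TURN": 100.0}, {"RTOT": 50.0, "TURN": 1000.0}]
masked = [{"RTOT": 10.0, "TURN": 100.0}, {"RTOT": 50.0, "TURN": 1250.0}]

changes = ratio_changes(original, masked)
```

Under selective masking most records would show no change at all, since only the records at risk are modified.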

Conclusions
QUALITY DIMENSIONS
1. Confidentiality: risk measure based on the k-anonymity principle
   Flexible:
   a) continuous and categorical variables
   b) easy to implement
   c) consistent for extreme choices
2. Data utility: selective protection to achieve k-anonymity
3. Comparable dissemination: control both risk of re-identification and information loss