RAIRD: Remote Access Infrastructure for Register Data / www.raird.no


Johan Heldal*, Elin Monstad**, Terje Risberg* and Ørnulf Risnes**
* Statistics Norway (SN), ** Norwegian Social Science Data Services (NSD)
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, Helsinki, Finland, 5 to 7 October 2015

The Situation in Norway
Norway has a large number of registers of individuals established for administrative or statistical purposes. Examples of such registers:
– The Central Population Register (CPR, administrative)
– National Database of Educations (NUDB, statistical)
– Registers for administration of the social security, welfare system, pensions and retirements, taxation etc.
– Many others
These registers are merged by a unique person identification number into event history databases that are updated every year.
Demand for such data is high. Researchers want safe access without time-consuming bureaucracy.
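
As an illustration of merging registers by a person identification number, the following is a minimal sketch only. The register names, record layout and toy values are assumptions made for this example and do not describe the actual Norwegian registers.

```python
from collections import defaultdict


def build_event_histories(registers):
    """Merge register rows into one time-ordered event list per person.

    `registers` maps a register name to rows of (person_id, date, value).
    This layout is hypothetical and chosen only for the illustration.
    """
    histories = defaultdict(list)
    for name, rows in registers.items():
        for person_id, date, value in rows:
            histories[person_id].append((date, name, value))
    for events in histories.values():
        events.sort()  # ISO dates sort chronologically as plain strings
    return dict(histories)


# Made-up toy records, not real register content.
registers = {
    "population": [("P1", "1990-05-01", "born"), ("P2", "1988-01-15", "born")],
    "education": [("P1", "2012-06-20", "BSc")],
}
histories = build_event_histories(registers)
```

Each person ends up with a single chronological list of events drawn from all registers, which is the shape an event history database needs.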

Research access today
A few highly trusted researchers get access to some of these registers at their own premises (a weak point).
Other researchers in approved institutions can apply for access to selected data for specific projects.
The application procedure is cumbersome and time-consuming. Permissions from the Norwegian Data Protection Authority and Statistics Norway, and in some cases other register owners, are needed.
Research based on register data has shown a high payoff. We wish to extend their use to a wider range of researchers than those who have access today.

RAIRD Metadata
1. Remote Analysis server with event history data
2. Microdata are invisible (Anonymous User Interface)
– Only statistical output should be visible.
– Metadata should be public domain.
3. A way to get around legal procedures for access.
– Simpler and cheaper(?) access for more researchers.
4. Requirement: Safe (enough) statistical output.

Comparison to neighbour countries
Sweden offers similar microdata through the LISA database and the MONA system (Microdata Online Access).
Statistics Denmark offers microdata for researchers through its Research Services.
Both countries provide access from researchers' desks. In both countries microdata are visible on the screen.
In Norway: To make register data accessible by remote access (RA) without the extensive application process, microdata must be invisible (Anonymous User Interface, AUI).
AUI shifts the disclosure control (DC) focus from microdata record disclosure to the statistical output.

The RAIRD information model
All operations go through the Virtual Statistical Machine (VSM).
The VSM allows controlled exposure of methods and a Managed Execution Environment.

[Diagram: Data Store; Extract variables from Data Store; VRE; User workspace]

[Diagram: User workspace 1; Extract observations and period; User workspace 2 (population for analysis)]

[Diagram: User workspace 2; Analyze data; Statistical Analysis; Safe results]
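
The three diagram slides (extract variables, extract observations, analyze) can be read as a pipeline in which only aggregates ever leave the system. The sketch below is one hypothetical way to express that flow. `DATA_STORE`, the function names and the toy values are all assumptions, and no disclosure control is applied to the result here.

```python
from statistics import mean

# Hypothetical data store: variable name -> {person_id: value}.
DATA_STORE = {
    "income": {"P1": 300, "P2": 420, "P3": 510, "P4": 280},
    "region": {"P1": "north", "P2": "south", "P3": "north", "P4": "north"},
}


def extract_variables(names):
    """Step 1: copy the selected variables into user workspace 1."""
    return {name: dict(DATA_STORE[name]) for name in names}


def extract_observations(workspace, variable, wanted):
    """Step 2: keep only units whose `variable` equals `wanted`."""
    keep = {pid for pid, value in workspace[variable].items() if value == wanted}
    return {
        name: {pid: value for pid, value in column.items() if pid in keep}
        for name, column in workspace.items()
    }


def analyze_mean(workspace, variable):
    """Step 3: return an aggregate only; individual rows are never returned."""
    values = list(workspace[variable].values())
    return {"n": len(values), "mean": mean(values)}


workspace_1 = extract_variables(["income", "region"])
workspace_2 = extract_observations(workspace_1, "region", "north")
result = analyze_mean(workspace_2, "income")
```

The user only ever receives `result`. A real system would additionally have to check such output against its disclosure rules before release.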

Target Groups
The system should serve the medium and less sophisticated researchers (low/medium access level).
PhD and master students should be allowed access.
The ambition is to develop the system to satisfy more and more advanced users.
We want to provide access directly to users' desks, not through Safe Centres.

Prototype
NSD is working on a prototype for an AUI solution.
With AUI we cannot use any standard software.
Basic software: Python programming environment (SciPy, NumPy, Pandas, Statsmodels).
We establish a STATA-like command interface.
>> import HOVED to as jobseeker
Imported jobseeker
8368 values imported
0 missing generated
>> drop if jobseeker !=
units were removed from the data set.
>>
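
To illustrate how a STATA-like command could be handled without showing microdata, here is a minimal sketch of a `drop if` handler. It is not the NSD prototype: the function name, the accepted grammar and the toy data set are assumptions, and only numeric comparisons are supported.

```python
import operator

# Comparison operators accepted by this illustrative parser.
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def run_drop(dataset, command):
    """Apply `drop if <variable> <op> <number>` and return a status message.

    `dataset` maps unit id -> {variable: value}. Rows are removed in place
    and never echoed back; the caller only sees how many units were dropped.
    """
    parts = command.split()
    if len(parts) != 5 or parts[:2] != ["drop", "if"] or parts[3] not in OPS:
        raise ValueError("expected: drop if <variable> <op> <number>")
    variable, compare, threshold = parts[2], OPS[parts[3]], float(parts[4])
    doomed = [uid for uid, row in dataset.items() if compare(row[variable], threshold)]
    for uid in doomed:
        del dataset[uid]
    return f"{len(doomed)} units were removed from the data set."


# Made-up toy data set with a 0/1 indicator.
data = {1: {"jobseeker": 1}, 2: {"jobseeker": 0}, 3: {"jobseeker": 1}}
message = run_drop(data, "drop if jobseeker != 1")
```

The response is a count rather than a listing, which matches the idea of an anonymous user interface.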

Risks associated with RAIRD - 1
With AUI, disclosure risk will be associated with the statistical output only, not with the microdata themselves.
In general: We consider disclosure attacks that can be considered realistic scenarios.
Tabular output: zeroes, small counts, group disclosure.
Aggregations: dominance.
Differencing is a challenge for tabular output.
Graphical output
– Scatter plots, residual plots, boxplots
– Can be fixed with existing software.
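
Dominance in aggregations is commonly checked with an (n, k) rule: a total is considered risky if its n largest contributors account for more than a share k of it. The sketch below shows that general rule. The defaults `n=1` and `k=0.85` are illustrative assumptions, not values stated for RAIRD.

```python
def is_dominated(contributions, n=1, k=0.85):
    """Return True if the n largest contributions exceed share k of the total."""
    total = sum(contributions)
    if total <= 0:
        return False  # nothing to dominate in an empty or zero aggregate
    top = sum(sorted(contributions, reverse=True)[:n])
    return top / total > k
```

For example, `is_dominated([900, 50, 50])` flags the aggregate because one unit holds 90 percent of the total, while `is_dominated([400, 300, 300])` does not.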

Risks associated with RAIRD - 2
Regressions:
– High leverage points.
– Cross product matrices
We lack studies of disclosure risks for event history settings, but believe that scenarios for cross-sectional data still apply.
Inferential attacks are not seen as very likely.
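
For a regression of y on an intercept and a single covariate x, the leverage of unit i is h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The sketch below computes these values and flags units above the common rule-of-thumb cutoff 2p/n with p = 2 parameters. Using that cutoff in a system like RAIRD is an assumption made for illustration.

```python
def leverages(x):
    """Hat-matrix diagonal for a regression of y on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage_units(x, factor=2.0):
    """Indices whose leverage exceeds factor * p / n, with p = 2 parameters."""
    cutoff = factor * 2 / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]
```

With `x = [1, 2, 3, 4, 5, 50]` only the last unit is flagged. Its fitted value is determined almost entirely by its own response, which is why such points are a disclosure concern.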

We want solutions that
do not distort the microdata (no inconsistencies)
provide useful analysis and statistical output
avoid costly manual output control
embrace as many statistical methods as possible.
– We have to start with the most basic.

Dealing with risk 1
User authorisation
– Different access levels for different researchers.
Strict access control.
Logging of all activity on the system.
– All activity can be reproduced.
– Store logs as data that can be routinely analysed with RAIRD itself.
Censoring tables or analyses with
– too few observations (e.g. 20)
– too few observations per cell or estimated parameter on average (e.g. 10)
– too sparse tables
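
The two numeric censoring rules can be sketched as a simple release gate. The thresholds 20 and 10 are the examples given on the slide. The function name and the table representation are assumptions, and the "too sparse tables" rule is not covered because the slide gives no criterion for it.

```python
def release_table(cells, n_parameters=None, min_total=20, min_average=10):
    """Return a copy of the table if it passes both thresholds, else None.

    `cells` maps a cell label to its count. For a model, pass the number of
    estimated parameters as `n_parameters` instead of counting cells.
    """
    total = sum(cells.values())
    units = n_parameters if n_parameters is not None else len(cells)
    if total < min_total:
        return None  # too few observations overall
    if units > 0 and total / units < min_average:
        return None  # too few observations per cell or parameter on average
    return dict(cells)
```

A table with counts 12 and 15 is released, while counts 10, 8 and 7 are censored because the average per cell falls below 10 even though the total exceeds 20.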

Dealing with risk 2
Drop observations at random to impede differencing (Sparks et al. (2008)).
Automatic output control.
– Add noise to results on the fly.
– Noise fixed for repeated runs of the same output.
– We are inspired by the Australian Data Analyser.
Focus on robust statistical methods.
– Reduces the effect of high leverages.
Hexbin plots, modified boxplots.
Possibly: Adding noise to estimating equations
– Chipperfield & O’Keefe (2014)
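
One way to keep noise fixed for repeated runs of the same output is to seed the random generator from a hash of the request, so that averaging repeated runs does not cancel the noise. This is a hedged sketch of that idea, not the method used in RAIRD or in the Australian system. The secret key, the Gaussian noise and the `scale` parameter are all assumptions.

```python
import hashlib
import random


def noisy_result(value, query, secret="demo-secret", scale=1.0):
    """Add Gaussian noise whose seed is derived from the query text.

    The same (secret, query) pair always yields the same noise, so rerunning
    an identical request returns an identical perturbed result.
    """
    digest = hashlib.sha256(f"{secret}|{query}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return value + rng.gauss(0.0, scale)


first = noisy_result(100.0, "mean income if region == north")
second = noisy_result(100.0, "mean income if region == north")
other = noisy_result(100.0, "mean income if region == south")
```

Here `first` and `second` are identical while `other` differs. A limitation of this sketch is that two differently worded but equivalent queries get different noise, so a real system would have to key the noise on something more canonical than the raw query text.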

Compared to the present situation
RA with AUI will be a way to get around legal obstacles for access to register microdata in Norway.
– Easier access for more researchers
The practice of giving access at own premises should cease.
RAIRD will take microdata safety to a new level in Norway.

Researchers’ attitudes
Positive to remote access as such
– Need not have the responsibility for storing confidential data safely at their own premises.
– As long as they can do what they have been able to do until now.
– Appreciate less bureaucracy for access.
Reluctant to Anonymous User Interface
– They are used to being able to see microdata.
– We believe that is a question of habit and culture.
They bristle when they hear the word “noise”.
– They believe we mean noise on microdata.

Conclusion – Short run
We must build RAIRD bottom-up, not top-down.
Start with access for the less advanced researchers (master and PhD students) with safe output for the more basic methods.
Implement higher access levels with more advanced methods and less disclosure control for the more advanced and trusted researchers.

Conclusion – Long run
Implement disclosure control methods with safe output for more advanced statistical methods.
Win advanced researchers’ confidence in the system and its disclosure control methods.
We hope the RAIRD technology will one day be established as the standard for all access to statistical microdata in Norway.