Using synthetic data to improve the accessibility of the SLS
Susan Carsley, SLS Project Manager


Overview
- What is the SLS?
- How the SLS can currently be accessed
- How the SLS hopes to use synthetic data

What is the SLS?

The SLS is a large-scale, anonymised linkage study designed to capture 5.5% of the Scottish population.
The sample is based on 20 semi-random birthdays.
It is a joint project between the University of Edinburgh, the University of St Andrews and National Records of Scotland (NRS).
It is built using data available from:
- Census
- Vital Events
- NHSCR (migration into or out of Scotland)
- Education (School Census, Absences and SQA qualifications)
- NSS health data (linked on a project-by-project basis)
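The link between 20 birthdays and a 5.5% sample can be checked with a little arithmetic. The sketch below assumes births are spread evenly over the year, and the dates in `SAMPLE_BIRTHDAYS` are hypothetical placeholders: the actual SLS birthdays are not given in these slides.

```python
from datetime import date

# Hypothetical (month, day) pairs for illustration only; the real SLS
# birthdays are not stated in the presentation.
SAMPLE_BIRTHDAYS = {(1, 15), (3, 2), (7, 30)}


def in_sample(dob: date, birthdays=SAMPLE_BIRTHDAYS) -> bool:
    """Return True if the date of birth falls on one of the sample birthdays."""
    return (dob.month, dob.day) in birthdays


def expected_fraction(n_birthdays: int, days_per_year: float = 365.25) -> float:
    """Expected sampling fraction, assuming births are uniform over the year."""
    return n_birthdays / days_per_year


print(in_sample(date(1980, 1, 15)))     # True
print(f"{expected_fraction(20):.1%}")   # 5.5%
```

Selecting on day and month of birth means the same people can be picked up again in each later data source, which is what makes the linkage longitudinal.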

Aims and scope
Aims:
- Continue building and developing the SLS
- Support researchers who wish to undertake projects with the SLS data
- Provide web-based resources that help make use of the SLS easier
- Provide training on the SLS and longitudinal data handling, analysis and modelling
Scope:
- Research into demographic, health and social questions in Scotland
- Support is primarily given to academic researchers, and secondly to non-academic researchers for non-commercial use

Security & Confidentiality
- The dataset is held in a secure environment at NRS (access to the building is controlled, passes are worn at all times and visitors are escorted)
- Data are accessed in a keypad-secured environment
- Computers are on a password-protected, stand-alone network
- All relevant protocols on data sharing, access and security are followed
- Data access is strictly controlled
- All analysis results are disclosure checked before release

How the SLS can currently be accessed

Accessing the SLS
There are currently 2 ways to access the SLS:
- Remote access
- Safe Setting access

Types of data access: Remote Access
Analysis
- Researchers specify their analyses by writing syntax in SPSS, SAS or Stata and sending it to their SLS Support Officer.
- Researchers can use the web-based Data Dictionary to look up variable names and category names.
- Alternatively, the Support Officer will send the researcher an ‘empty shell’ that includes variable labels and value labels to aid writing the syntax.
- The Support Officer then runs the analysis on the real dataset.
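As a rough illustration of what an ‘empty shell’ carries, the sketch below models a dataset that has variable names, variable labels and value labels but no records, and uses it to catch misspelt variable names before syntax is sent off. The class names, the variables `sex` and `marstat`, and their codes are all hypothetical; the real shell is a file in the researcher's statistics package.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    label: str
    value_labels: dict = field(default_factory=dict)  # code -> category name


@dataclass
class EmptyShell:
    variables: list
    rows: list = field(default_factory=list)  # a shell never holds records

    def unknown(self, requested):
        """Return the requested variable names that the shell does not define."""
        known = {v.name for v in self.variables}
        return [name for name in requested if name not in known]


# Hypothetical variables and codes, for illustration only.
shell = EmptyShell([
    Variable("sex", "Sex", {1: "Male", 2: "Female"}),
    Variable("marstat", "Marital status", {1: "Single", 2: "Married"}),
])

print(shell.unknown(["sex", "agegrp"]))  # ['agegrp']
```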

Types of data access: Remote Access
Outputs
- The Support Officer checks the output of the analyses for confidentiality issues.
- If the output is disclosive, the Support Officer does one of two things:
  - alters the output slightly so that it no longer contains disclosive elements; or
  - informs you that the analyses you requested cannot be carried out because they breach the confidentiality rules.
- Cleared output is sent to researchers in an encrypted attachment.
- Researchers never receive the real dataset. Remote access only provides cleared analysis outputs, such as frequency tables, cross tabulations or regression model parameters.
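One common form of disclosure check on a cross tabulation is to look for cells with very small counts. The sketch below is an assumption about what such a check might look like, not the SLS Team's actual procedure: the threshold `MIN_CELL_COUNT`, the variable names and the example records are all made up for illustration.

```python
from collections import Counter

MIN_CELL_COUNT = 10  # hypothetical threshold, not the SLS rule


def crosstab(records, row_var, col_var):
    """Count records in each (row value, column value) cell."""
    return Counter((r[row_var], r[col_var]) for r in records)


def disclosive_cells(table, min_count=MIN_CELL_COUNT):
    """Return the non-empty cells whose count is below the threshold."""
    return {cell: n for cell, n in table.items() if 0 < n < min_count}


def suppress(table, min_count=MIN_CELL_COUNT):
    """Replace small counts with None so that they are not released."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


# Made-up records for illustration.
records = (
    [{"sex": "F", "illness": "yes"}] * 12
    + [{"sex": "M", "illness": "yes"}] * 3
    + [{"sex": "M", "illness": "no"}] * 15
)
table = crosstab(records, "sex", "illness")

print(disclosive_cells(table))  # {('M', 'yes'): 3}
print(suppress(table))
```

Suppressing a cell is only one way of altering an output; merging categories or rounding counts are alternatives that keep more of the table usable.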

Types of data access: Remote Access
Pros:
- Can work from the comfort of own home/office
- Can access textbooks and internet whilst writing syntax
- Don’t need to travel to the Safe Setting in Edinburgh
Cons:
- Get no feel for the data
- Can be a long process if models need tweaking and rerunning
- Very reliant on Support Officer

Types of data access: Working in the safe setting room
- If you wish to analyse the data yourself (as most users do, especially at the initial stages of recoding variables and exploratory analysis), you will need to visit NRS in Edinburgh to work in the safe setting (safe haven) room.
- You will not have access to the entire SLS database, only the subset of data extracted for your project.
- The computers for analysis are not connected to the outside world and are only equipped with a CD-ROM reader.
- You cannot take your outputs home immediately, because they first have to be cleared by the SLS Team; the encrypted outputs will be sent to you afterwards.

Types of data access: Working in the safe setting room
Pros:
- Work with the data hands on
- Can tweak and rerun models
- Support Officer on hand to provide advice
Cons:
- Must travel to the Safe Setting in Edinburgh
- No internet access within the SLS
- Strict rules within Safe Setting

How the SLS hopes to use synthetic data

Why use synthetic data?
- The sensitive nature of the information the SLS contains means that access to the microdata is highly restricted.
- Consequently, compared to other census data products, the SLS is used by a small number of researchers, which limits its potential impact.
- Using synthetic data will facilitate access to the SLS while protecting confidentiality.

Synthetic data for the SLS - SYLLS
Synthetic SLS data spine (1991 & 2001)
- Age, sex, marital status, ethnicity, limiting long term illness and geography
- Open access via CALLS Hub and SLS
Bespoke synthetic datasets
- Synthetic versions of data extracts to match individual user data requests
- Provided to approved researchers for preliminary analysis; final analysis will be run on the real data in safe settings

Synthetic SLS data spine
Aims
- Provide web-based resources that help make use of the SLS easier
- Provide training on the SLS and longitudinal data handling, analysis and modelling
Benefits
- Will allow a small subset of longitudinal data to be made available online
Uses
- Will allow potential users to access a small subset of data online and allow them to consider and practice longitudinal analysis techniques
- Used in SLS training courses
- Freely available for others to use as a training dataset

Bespoke synthetic datasets
Aims
- Support researchers who wish to undertake projects with the SLS data
Benefits
- A good compromise between the current access options
- The synthetic dataset can be accessed at home and will look (structurally) and behave (statistically) like original confidential data but will contain artificial units only
Uses
- Allow researchers to access a synthetic version of their dataset at home
- Allow researchers to write syntax and develop models using synthetic data which should behave like the original data
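To give a feel for how a synthetic dataset can share the structure and approximate statistical behaviour of the original, here is a minimal sketch of sequential synthesis for categorical variables: the first variable is drawn from its observed distribution, and each later variable is drawn conditional on the one before it. This is an illustrative simplification, not the method used for the SLS, and the variable names and records are invented.

```python
import random
from collections import defaultdict


def synthesise(records, variables, seed=0):
    """Build artificial records, one variable at a time.

    Each variable after the first is sampled from the values observed
    alongside the previous variable's sampled value (a first-order chain).
    """
    rng = random.Random(seed)
    first = variables[0]
    pairs = list(zip(variables, variables[1:]))

    marginal = [r[first] for r in records]
    conditional = {}
    for prev, var in pairs:
        pool = defaultdict(list)
        for r in records:
            pool[r[prev]].append(r[var])
        conditional[(prev, var)] = pool

    synthetic = []
    for _ in range(len(records)):
        row = {first: rng.choice(marginal)}
        for prev, var in pairs:
            row[var] = rng.choice(conditional[(prev, var)][row[prev]])
        synthetic.append(row)
    return synthetic


# Invented records standing in for a confidential extract.
observed = [
    {"sex": "F", "marstat": "single", "illness": "no"},
    {"sex": "F", "marstat": "married", "illness": "no"},
    {"sex": "M", "marstat": "married", "illness": "yes"},
    {"sex": "M", "marstat": "single", "illness": "no"},
    {"sex": "F", "marstat": "married", "illness": "yes"},
]
synthetic = synthesise(observed, ["sex", "marstat", "illness"])
print(synthetic[0])
```

A sketch this simple can reproduce a real record by chance and only preserves relationships between neighbouring variables, so it says nothing about the confidentiality guarantees or model quality a production synthesiser would need.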

Coming soon...
Access to SLS-like data on own computer:
- Spine datasets available soon via CALLS Hub and SLS website
- Following formal approval, bespoke synthetic data should be available for SLS users in 2015

For more information
- SLS Website: sls.lscs.ac.uk
- Twitter
- SYLLS Website: data-estimation-for-uk-longitudinal-studies/