Starting from the end: what to do when restricted data is released
Dr Marta Teperek
Office of Scholarly Communication, University of Cambridge
@martateperek
SciDataCon 2016, Monday 12 September 2016
Slides are available at:
Content
This session will cover:
- Background to the Cambridge research repository
- The incident: the release of restricted data
- What we did
- Workflow development
- Lessons learnt
About the Cambridge research repository
- Created in 2005 as a joint project with MIT
- Hosts ALL research outputs (problems!)
- Over 200,000 research outputs! Articles, theses, datasets, software, videos, book chapters, presentations…
- Quite popular: 20,772 visits between 12 August and 11 September
- Mints DOIs for datasets
Advocacy + an easy sharing process => lots of data shared
In a little more than a year (2015–2016): 10 times more data submissions than during the previous decade
About the Cambridge research repository
- But: managed access to data is not currently provided
- More and more requests for managed access to data
- We are currently scoping how to provide managed access to data
We tell researchers to go somewhere else…
The incident
A restricted dataset shared by a Cambridge researcher was released
What was released
- An externally held dataset
- The dataset was protected by:
  - a pre-publication embargo
  - a license agreement specifying the re-use conditions
- The dataset was not yet complete
- The researcher was informed weeks after the repository noticed the error
- The dataset had been downloaded several times
Time to act
We had to act:
- Provide the researcher with appropriate support and advice on how to proceed
- Document the steps:
  - Develop workflows for dealing with this type of situation in the future
  - Make them a community resource
Never blame the repository: hosting personal/sensitive data will always present an element of risk
Risk assessment
Three types of risk:
- Risks to study participants
- Risks to the researcher
- Reputational risks
Risk to study participants
Can participants be re-identified? The risk can never be eliminated; it can only be managed.
Risk to the researcher
- Being scooped
- Publishers might refuse to publish this work
- Re-users might be misled by incomplete data
Reputational risks
- What if participants are re-identified and the information is released into the public domain?
- Threat to future research funding
Risk mitigation
Contact those who downloaded the data… impossible: only IP addresses were available
Mitigating the risk to study participants
- Low risk of re-identification
- Informed the study administrator at the Research Office
- Informed the ethics committee
Mitigating the risk to the researcher
Letters to publishers
Mitigating reputational risks
Contacting the funder of research
Workflow establishment
Lessons learnt
Transparency and open communication are necessary to build trust and understanding
Lessons learnt – better guidance needed
We already offer:
- Workshops on Research Integrity and Ethics
- Workshops on Research Data Management
- Online training
- Guidance on creating consent forms
Missing: data anonymisation guidance. Risks constantly evolve:
- new datasets become available
- new computational tools make it possible to link data
Thank you
Questions: @martateperek @CamOpenData
Watch out for our paper in the Data Science Journal
Slides: