Published by Polly Beasley. Modified over 9 years ago.
Research Data Management
Version 1, December 2015
www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme (DG CONNECT e-Infrastructures), Contract No. 654065.
This work is licensed under the Creative Commons CC-BY 4.0 licence.
Overview
- The changing data landscape
- Managing and sharing research data
- EUDAT services
THE CHANGING DATA LANDSCAPE
Image: CC-BY-SA ‘data.path Ryoji.Ikeda - 3’ by r2hox, www.flickr.com/photos/rh2ox/9990016123
Data explosion
More and more data are being created. The issue is not creating data, but being able to navigate and use them. Data management is critical to ensure that data are well organised, understandable and reusable.
Data loss
Digital data are fragile and susceptible to loss for a wide variety of reasons:
- Natural disaster
- Facilities infrastructure failure
- Storage failure
- Server hardware/software failure
- Application software failure
- Format obsolescence
- Legal encumbrance
- Human error
- Malicious attack
- Loss of staffing competencies
- Loss of institutional commitment
- Loss of financial stability
- Changes in user expectations
Image: CC-BY ‘Hard Drive 016’ by Jon Ross, www.flickr.com/photos/jon_a_ross/1482849745
Data persistency issues
- Link rot: more 404 errors are generated over time.
- Reference rot*: link rot plus content drift, i.e. webpages evolving and no longer reflecting the original content cited.
* Term coined by Hiberlink, http://hiberlink.org
Source: Jonathan D. Wren, Bioinformatics 2008;24:1381-1385
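The difference between link rot and reference rot can be made concrete with a small check. This Python sketch is a hypothetical illustration, not a Hiberlink tool: it assumes the URL has already been fetched (the HTTP status and page body are passed in), treats any 4xx/5xx status or missing body as link rot, and detects content drift by comparing a SHA-256 fingerprint of the page text recorded at citation time with one computed now. The names `fingerprint` and `classify_reference` are invented for the example.

```python
import hashlib
from typing import Optional


def fingerprint(text: str) -> str:
    """SHA-256 of the page text with whitespace collapsed (hypothetical helper)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_reference(status_code: int, cited_fingerprint: str,
                       current_text: Optional[str]) -> str:
    """Classify a cited URL as 'link rot', 'content drift' or 'intact'.

    Assumes the caller has already fetched the URL and passes in the HTTP
    status and page body (None if nothing came back).
    """
    # Assumption: any client or server error, or no body at all, is a dead link.
    if status_code >= 400 or current_text is None:
        return "link rot"
    # The link still resolves, but the page no longer matches what was cited.
    if fingerprint(current_text) != cited_fingerprint:
        return "content drift"
    return "intact"


# Fingerprint recorded when the page was cited (hypothetical page text).
cited = fingerprint("Dataset v1: 120 samples")
print(classify_reference(404, cited, None))                         # link rot
print(classify_reference(200, cited, "Dataset v2: 95 samples"))     # content drift
print(classify_reference(200, cited, "Dataset  v1:\n120 samples"))  # intact
```

Both "link rot" and "content drift" count as reference rot. Normalising whitespace before hashing ignores trivial reflows, but any wording change is still flagged, including harmless ones such as an updated footer.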
A reproducibility crisis
Nature special issue: http://www.nature.com/news/reproducibility-1.17552
Several studies have shown alarming numbers of published papers that do not stand up to scrutiny.
Poor data management: science example
A wildlife biologist in a small field office was the in-house GIS expert and supported all the staff's GIS needs. However, the data were stored on her own workstation. When the biologist relocated to another office, no one understood how the data were stored or managed.
Solution: A state office GIS specialist retrieved the workstation and sifted through the files, trying to salvage the relevant data.
Cost: 1 work month ($4,000) plus the value of the data that could not be recovered.
The situation could have been worse: because the data were kept on a workstation rather than a server, they were not being backed up.
Poor data management: federal example
In preparation for a Resource Management Plan, an office discovered 14 duplicate GPS inventories of roads. However, because none of the inventories had enough metadata, it was impossible to know which inventory was best or whether any of them actually met the office's requirements.
Solution: Re-inventory the roads.
Cost: An estimated 9 work months per inventory at $4,000 per work month (14 inventories = $504,000).
Image: CC-BY ‘Minature fake highway interchange in Chicago’ by Ryan, www.flickr.com/photos/ryanready/4692092024
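The cost figures in both examples follow from simple multiplication at the stated rate of $4,000 per work month. A minimal Python check of that arithmetic (the function name `work_cost` is illustrative only):

```python
# Rate stated in both examples on the slides.
RATE_USD_PER_WORK_MONTH = 4_000


def work_cost(work_months: int, rate: int = RATE_USD_PER_WORK_MONTH) -> int:
    """Staff cost of a task: work months multiplied by the monthly rate."""
    return work_months * rate


salvage = work_cost(1)            # science example: 1 work month
inventories = work_cost(14 * 9)   # federal example: 14 inventories x 9 work months
print(salvage, inventories)       # 4000 504000
```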
Why manage research data?
- To make your research easier!
- To stop yourself drowning in irrelevant stuff
- In case you need the data later
- To avoid accusations of fraud or bad science
- To share your data for others to use and learn from
- To get credit for producing it
- Because funders or your organisation require it
Well-managed data opens up opportunities for re-use, integration and new science.
MANAGING & SHARING DATA
Research data lifecycle
Creating data → processing data → analysing data → preserving data → giving access to data → re-using data
- Creating data: designing research, DMPs, planning consent, locating existing data, data collection and management, capturing and creating metadata
- Processing data: entering, transcribing, checking, validating and cleaning data; anonymising data; describing data; managing and storing data
- Analysing data: interpreting and deriving data, producing outputs, authoring publications, preparing for sharing
- Preserving data: data storage, back-up and archiving, migrating to the best format and medium, creating metadata and documentation
- Giving access to data: distributing data, sharing data, controlling access, establishing copyright, promoting data
- Re-using data: follow-up research, new research, undertaking research reviews, scrutinising findings, teaching and learning
Ref: UK Data Archive, http://www.data-archive.ac.uk/create-manage/life-cycle
What is a digital object?
A digital object combines a bitstream, a persistent identifier and metadata. Digital objects can be aggregated into digital collections.
Example: https://b2share.eudat.eu/record/1
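The three parts of a digital object, and the way objects aggregate into collections, can be sketched as a simple data structure. This is a hypothetical Python model for illustration only; it is not the data model used by B2SHARE or any other EUDAT service, and the class names and identifier strings are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """Hypothetical model: the three parts of a digital object."""
    bitstream: bytes          # the content itself, e.g. the bytes of a file
    pid: str                  # persistent identifier that keeps resolving to it
    metadata: Dict[str, str]  # description needed to find and understand it


@dataclass
class DigitalCollection:
    """Hypothetical model: a collection is itself identified and described."""
    pid: str
    metadata: Dict[str, str]
    members: List[DigitalObject] = field(default_factory=list)

    def add(self, obj: DigitalObject) -> None:
        self.members.append(obj)

    def member_pids(self) -> List[str]:
        return [m.pid for m in self.members]


# Invented identifiers, for illustration only.
readings = DigitalObject(b"depth,temp\n10,4.2\n", "example-prefix/obj-001",
                         {"title": "Example lake readings"})
survey = DigitalCollection("example-prefix/coll-01", {"title": "Example survey"})
survey.add(readings)
print(survey.member_pids())  # ['example-prefix/obj-001']
```

Giving the collection its own identifier and metadata means it can be found and cited in the same way as a single object.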
Data Management Planning
A data management plan (DMP) is a brief plan that defines:
- how the data will be created
- how they will be documented
- who will access them
- where they will be stored
- who will back them up
- whether (and how) they will be shared and preserved
DMPs are often submitted as part of grant applications, but they are useful whenever researchers are creating data.
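The DMP questions work naturally as a checklist. The Python sketch below is a hypothetical helper (the names `DMP_QUESTIONS` and `unanswered` are invented, and the question wording is a paraphrase of the slide) that reports which questions a draft plan has not yet answered; real funder templates are usually longer and word their questions differently.

```python
from typing import Dict, List

# The questions from the slide, phrased as a checklist (paraphrased wording).
DMP_QUESTIONS = (
    "How will the data be created?",
    "How will the data be documented?",
    "Who will access the data?",
    "Where will the data be stored?",
    "Who will back up the data?",
    "Will the data be shared and preserved, and how?",
)


def unanswered(plan: Dict[str, str]) -> List[str]:
    """Return the checklist questions with a missing or blank answer in `plan`."""
    return [q for q in DMP_QUESTIONS if not plan.get(q, "").strip()]


draft = {
    DMP_QUESTIONS[0]: "Field sensors, exported weekly as CSV",
    DMP_QUESTIONS[3]: "   ",  # blank answers still count as unanswered
}
for question in unanswered(draft):
    print("TODO:", question)
```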
Metadata and documentation
Metadata and documentation are needed to locate and understand research data.
- Think about what others would need in order to find, evaluate, understand and reuse your data.
- Get others to check the metadata to improve its quality.
- Use standards to enable interoperability.
Where to store your data?
- Your own drive (PC, server, flash drive, etc.): and if you lose it? Or it breaks?
- Somebody else's drive or a departmental drive
- A "cloud" drive: do they care as much about your data as you do?
- Large-scale infrastructure services like EUDAT
How to back up?
3... 2... 1... backup!
- at least 3 copies of a file
- on at least 2 different media
- with at least 1 copy offsite
Use managed services where possible, e.g. university filestores or infrastructure services like EUDAT, rather than local or external hard drives.
Ask IT teams for advice.
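The 3-2-1 rule is mechanical enough to check automatically. A minimal Python sketch, assuming each copy is described only by its storage medium and whether it is held offsite, and assuming the working copy counts as one of the three; `Copy` and `rule_violations` are hypothetical names, not part of any backup tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    """Hypothetical description of one copy of a file."""
    medium: str    # e.g. "laptop SSD", "external HDD", "institutional filestore"
    offsite: bool  # held at a different physical location from the original?


def rule_violations(copies: List[Copy]) -> List[str]:
    """List the parts of the 3-2-1 rule that a set of copies fails.

    Assumes the working copy is included in `copies` and counts as one of the three.
    """
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 different media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


local_only = [Copy("laptop SSD", False), Copy("external HDD", False)]
print(rule_violations(local_only))  # ['fewer than 3 copies', 'no offsite copy']

managed = local_only + [Copy("institutional filestore", True)]
print(rule_violations(managed))     # []
```

Media are compared by name only, so two identical external drives count as a single medium; that is a deliberate simplification.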
Backup and preservation: not the same thing!
Backups
- Used to take periodic snapshots of data in case the current version is destroyed or lost
- Copies of files stored for the short to near-long term
- Often performed on a fairly frequent schedule
Archiving
- Used to preserve data for historical reference or, potentially, for use in the event of a disaster
- Usually the final version, stored for the long term and generally not overwritten
- Often performed at the end of a project or at major milestones
The importance of sharing data
A mistake in a spreadsheet meant that the published results differed dramatically from those the data actually supported. These results were cited by the International Monetary Fund and the UK Treasury to justify austerity programmes. Had the data been shared, the error could have been picked up earlier.
Concerns about data sharing (solution: metadata)
- Concern: inappropriate use due to misunderstanding of the research purpose or parameters.
  Solution: provide a rich Abstract, Purpose, Use Constraints and Supplemental Information where needed.
- Concern: security and confidentiality of sensitive data.
  Solution: the metadata does NOT contain the data; Use Constraints specify who may access the data and how.
- Concern: lack of acknowledgement/credit.
  Solution: specify a required data citation within the Use Constraints.
- Concern: loss of data insight and competitive advantage when vying for research dollars.
  Solution: create a second, public version with a generalized Data Processing Description.
Making data shareable
- Create robust metadata that has been checked.
- Include reference information, e.g. unique IDs and properly formatted data citations.
- Publish your metadata so it is discoverable: use portals, clearing houses, online resources, etc.
- Package up the data and associated metadata to deposit in repositories.
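Properly formatted data citations can be generated from metadata you already hold. The Python sketch below assumes a citation pattern of the form Creator (Year): Title. Version. Publisher. Identifier, similar to patterns recommended for data citation (for example by DataCite); the exact style a given repository or journal expects may differ. The function name and all example values, including the DOI, are hypothetical.

```python
from typing import List


def format_data_citation(creators: List[str], year: int, title: str,
                         publisher: str, identifier: str, version: str = "") -> str:
    """Build a citation string of the assumed form
    'Creator (Year): Title. Version. Publisher. Identifier'."""
    if not identifier:
        # A data citation is only useful if it points at a unique, persistent ID.
        raise ValueError("a data citation needs a persistent identifier")
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(identifier)
    return " ".join(parts)


# Hypothetical values for illustration.
print(format_data_citation(["Doe, J.", "Roe, R."], 2015, "Road inventory GPS tracks",
                           "Example Repository", "https://doi.org/10.1234/example",
                           version="1.0"))
```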
Deciding what to preserve and share
It is not possible to keep everything. Select based on:
- what has to be kept, e.g. data underlying publications
- what cannot be recreated, e.g. environmental recordings
- what is potentially useful to others
- what has scientific, cultural or historical value
- what legally must be destroyed
How to select and appraise research data: www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
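The selection criteria can be expressed as a simple decision rule. This Python sketch is a hypothetical illustration, not the DCC appraisal method: the field names, the outcome labels and the precedence (a legal obligation to destroy overrides everything; data that must be kept or cannot be recreated outranks data that is only potentially useful) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DatasetFacts:
    """Hypothetical yes/no answers to the selection questions on the slide."""
    underlies_publication: bool = False  # has to be kept
    cannot_be_recreated: bool = False    # e.g. environmental recordings
    useful_to_others: bool = False
    has_lasting_value: bool = False      # scientific, cultural or historical
    must_be_destroyed: bool = False      # legal obligation


def appraise(d: DatasetFacts) -> str:
    """Return a suggested outcome; the precedence order is an assumption."""
    if d.must_be_destroyed:
        return "destroy"                 # assumed to override everything else
    if d.underlies_publication or d.cannot_be_recreated:
        return "keep"
    if d.useful_to_others or d.has_lasting_value:
        return "consider keeping"
    return "candidate for disposal"


print(appraise(DatasetFacts(cannot_be_recreated=True)))  # keep
print(appraise(DatasetFacts(underlies_publication=True, must_be_destroyed=True)))  # destroy
```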
EUDAT SERVICE SUITE
Image: CC-BY-NC ‘Data centre’ by Bob Mical, www.flickr.com/photos/small_realm/15995555571
EUDAT services
EUDAT offers a pan-European solution, providing a generic set of services to ensure a minimum level of interoperability. It builds common data services in close collaboration with more than 25 communities.
EUDAT B2 service suite
Covering both access and deposit, from informal data sharing to long-term archiving, and addressing the identification, discoverability and computability of both long-tail and big data, EUDAT's services will address the full lifecycle of research data.
Support throughout the lifecycle
Creating data → processing data → analysing data → preserving data → giving access to data → re-using data
Thanks! Any questions?
www.eudat.eu
Acknowledgements: Thanks to EUDAT colleagues Mark van de Sanden and Sarah Jones for slides. Content has also been repurposed from the DataONE Education Modules 'Data Management' and 'Data Sharing', retrieved from https://www.dataone.org/education-modules