Introduction to Research Data Management 20th January 2016

Presentation transcript:

Introduction to Research Data Management 20th January 2016 Jonathan Rans Digital Curation Centre This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.

Who we are The Digital Curation Centre (est. 2004) is: a national-level centre of expertise in digital preservation with a particular focus on Research Data Management (RDM); working closely with a number of UK institutions to boost RDM capability across the HE sector; also involved in a variety of national and international collaborations.

What will we cover? Definitions that we work to. Why take a formal approach to managing your data? What does Research Data Management encompass?

Why is RDM an issue? Digital technology is now used very widely in research, and is enabling new research and scientific paradigms. Research funders and publishers know that digital research data can be expensive to produce but inexpensive to share, making reuse more feasible and desirable. The challenge is to ensure digital research findings can be reproduced and cited. There is a recognition that the preservation skills we have developed over hundreds of years are not sufficient to ensure the longevity of digital research material, which increasingly comprises the majority of the research record.

Definitions of research data? (1) “Research data, unlike other types of information is collected, observed, or created, for purposes of analysis to produce original research results.” (2) “Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.” (3) “Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.”

So, what might this include? Instrument measurements; experimental observations; still images, video and audio; text documents, spreadsheets, databases; quantitative data (e.g. survey data); survey results and interview transcripts; simulation data, models and software; slides, artefacts, specimens, samples; questionnaires; sketches, diaries, lab notebooks; … Anything and everything produced in the course of research. Image sources: http://www.aoml.noaa.gov/phod/dac/array_growth.html http://www.sbirc.ed.ac.uk/documents/lbc_protocol.pdf http://www.aoml.noaa.gov/phod/graphics/dacdata/globpop.gif

What is research data management? “an explicit process covering the creation and stewardship of research materials to enable their use for as long as they retain value.” Lifecycle stages: Plan, Create, Use, Appraise, Deposit and Publish, Discover and Reuse. Data management is part of good research practice.

Why manage research data? To make research easier! To stop yourself drowning in irrelevant stuff. In case you need the data later. To avoid accusations of fraud or bad science. To share data so others can use and learn from it. To get credit for producing the data. Because somebody else said to do so. Data is increasing in significance. It will unquestionably matter to your research careers, more than it does to your supervisors’ generation. Learn good data habits now! You’ll need them later.

What if a researcher’s data fell into the wrong hands? http://news.bbc.co.uk/1/hi/uk/8332445.stm

What if you had to produce a researcher’s data?

What is the worst-case scenario? http://www.computerweekly.com

Why make data available?

What do funders expect?

RCUK Common Principles in brief: Make data openly available where possible. Have policies & plans. Preserve data of long-term value. Metadata for discovery / reuse. Link to data from publications. Be mindful of legal, ethical and commercial constraints. Allow limited embargoes to protect the effort of creators. Acknowledge sources to recognise IP and abide by T&Cs. Ensure cost-effective use of public funds for RDM. www.rcuk.ac.uk/research/Pages/DataPolicy.aspx

Ultimately funders expect: data management plans; timely release of data (once patents are filed or on (acceptance for) publication); open data sharing (minimal or no restrictions if possible); preservation of data (typically 5-10+ years if of long-term value). See the RCUK Common Principles on Data Policy: www.rcuk.ac.uk/research/Pages/DataPolicy.aspx

What is involved in RDM? Data Management Planning; data creation; annotating / documenting data; analysis, use, versioning; storage and backup; publishing papers and data; preparing for deposit; archiving and sharing; licensing; citing… Lifecycle stages: Plan, Create, Use, Appraise, Deposit and Publish, Discover and Reuse.

Active data management

Data creation Adopt file naming conventions: http://www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name/ Design a good project folder structure: http://research-data-toolkit.herts.ac.uk/document/research-project-file-plan/ Ensure consent forms, licences and partnership agreements don’t restrict opportunities to share data: http://www.dcc.ac.uk/resources/how-guides/license-research-data http://www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
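
As an illustration of what a file naming convention can look like in practice, here is a minimal Python sketch. The pattern it uses (project, description, ISO date, two-digit version, no spaces) is an assumption chosen for the example and is not taken from the guide linked above; build_file_name is a hypothetical helper.

```python
import re
from datetime import date


def build_file_name(project: str, description: str, created: date,
                    version: int, extension: str) -> str:
    """Build a name such as 'rdm-survey_interview-transcripts_2016-01-20_v02.csv'.

    The pattern is an illustrative assumption, not a published standard.
    """
    def slug(text: str) -> str:
        # Lowercase, and replace every run of non-alphanumerics with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    suffix = extension.lower().lstrip(".")
    return (f"{slug(project)}_{slug(description)}_"
            f"{created.isoformat()}_v{version:02d}.{suffix}")


name = build_file_name("RDM Survey", "Interview transcripts",
                       date(2016, 1, 20), 2, ".CSV")
print(name)
```

Putting the date in ISO order (year-month-day) means an alphabetical sort of the folder is also a chronological one.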

Some formats are better for data sharing and long-term preservation than others. It’s preferable to opt for formats that are: uncompressed; non-proprietary; open and documented; in a standard representation (ASCII, Unicode). Data centres may have preferred formats for deposit, e.g.

Type | Recommended | Non-preferred
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS

It’s preferable to use formats that are uncompressed (e.g. large, high-quality files like .wav) and that follow non-proprietary (i.e. open) standards that are documented and well understood. This aids preservation and interoperability. Some data centres have preferred formats for deposit, so it’s worthwhile encouraging researchers to consult these to check. Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
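
A rough way to apply the format table to a project folder is to classify files by extension. The Python sketch below is only an illustration: the mapping from format names to file extensions (for example .por for SPSS portable, .jp2 for JPEG2000, .mov for Quicktime) is an assumption made for the example, and classify_format is a hypothetical helper. An extension cannot tell you whether a PDF is PDF/A or which codec a container holds, so those cases are reported as "unknown".

```python
from pathlib import Path

# Extension lists are assumptions derived from the format names in the table.
RECOMMENDED = {".csv", ".tsv", ".por", ".txt", ".html", ".rtf", ".mp4",
               ".ogg", ".flac", ".tif", ".tiff", ".jp2", ".png", ".xml", ".rdf"}
NON_PREFERRED = {".xls", ".xlsx", ".doc", ".docx", ".mov",
                 ".gif", ".jpg", ".jpeg"}


def classify_format(file_name: str) -> str:
    """Return 'recommended', 'non-preferred' or 'unknown' for a file name."""
    suffix = Path(file_name).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "unknown"


for example in ("results.csv", "Results.XLSX", "report.pdf"):
    print(example, "->", classify_format(example))
```

A check like this is a prompt to consult the chosen data centre's own list, not a substitute for it.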

Where to store data? Your own device (PC, flash drive, etc.): and if you lose it? Or it breaks? Departmental drive or university filestore: should be more robust, with automated back-up. “Cloud” storage: do they care as much about your data as you do?

Storage and backup Use managed services where possible, e.g. shared drives rather than local or external hard drives. Consider the security implications of where you store data and how you transfer it. 3… 2… 1… backup! Keep at least 3 copies of a file, on at least 2 different media, with at least 1 offsite.
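
The 3-2-1 rule can be written down as a simple check. This Python sketch assumes each copy is described by a location, a medium and an offsite flag; the BackupCopy class and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str   # where the copy lives (hypothetical label)
    medium: str     # e.g. "laptop disk", "network filestore", "tape"
    offsite: bool   # True if stored away from the main site


def satisfies_3_2_1(copies) -> bool:
    """At least 3 copies, on at least 2 different media, at least 1 offsite."""
    distinct = set(copies)                     # identical entries count once
    media = {copy.medium for copy in distinct}
    return (len(distinct) >= 3
            and len(media) >= 2
            and any(copy.offsite for copy in distinct))


copies = [
    BackupCopy("office laptop", "laptop disk", False),
    BackupCopy("university filestore", "network filestore", False),
    BackupCopy("partner institution", "tape", True),
]
print(satisfies_3_2_1(copies))
```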

What is metadata? Data about data. Documentation and metadata are essentially descriptive information about the information contained in a dataset. There should be good documentation at the study level, for example a description of the research methodology that created the data (the best metadata the data can have is the publication it supports) or a data paper. There should also be documentation at the file, item and variable level, so that someone reusing the data can understand it. This could mean ensuring that Excel spreadsheets have sensible row and column descriptions, or that a document is included with the dataset which properly explains any abbreviations used.

What is the difference? Metadata is a subset of documentation: standardised, structured, and machine and human readable. Documentation is a catch-all term which can include some very high-level, human-readable, loosely structured information about the dataset, for example a description of the laboratory method used to generate a set of results or a sketch book accompanying a work of sculpture. The term metadata has come to define a subset of documentation information that uses standardised terms and is presented in a structured way. This metadata will be in a form enabling it to be read by machines and/or may be presented in human readable form as well.

What metadata are we talking about? It can be helpful to define research metadata by its use: metadata for CITATION; metadata for DISCOVERY; metadata enabling REUSE.

What is the minimum required? Citation/disambiguation: Identifier, Creator, Title, Publisher, Publication Year. Repository requirements. Funder requirements, which include licensing/access conditions. Potentially, the bare minimum of metadata required will be defined by the repository that accepts the dataset. This could correspond to the DataCite mandatory minimum metadata set, although it is very likely that other elements will be required, whether by the repository or by the research funder. This metadata set is designed for citation and disambiguation, in other words ensuring that the dataset a researcher is reusing is identical to one cited in a journal article or online. This set of metadata does little to make the resource visible to researchers speculatively searching for relevant data to reuse.
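
A record can be checked against the five citation elements named on this slide before deposit. The Python sketch below is illustrative only: the lowercase key names and the sample record are assumptions, and it covers just the five elements listed here, so the repository's and DataCite's current requirements should be consulted for the authoritative set.

```python
# The five citation elements from the slide; the key spellings are assumptions.
MANDATORY_FIELDS = ("identifier", "creator", "title",
                    "publisher", "publication_year")


def missing_fields(record: dict) -> list:
    """Return the mandatory fields that are absent or blank in the record."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical record, for illustration only.
record = {
    "identifier": "example-identifier-001",
    "creator": "A. Researcher",
    "title": "Example dataset",
    "publisher": "",
}
print(missing_fields(record))
```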

What are persistent identifiers? The likely home of an electronic research data resource attached to a persistent identifier (PI) is a data repository. The exceptions are resources not available over the web, which have a PI attached to a metadata record instead, e.g. a physical resource or a cumbersomely large dataset.

Aiding discoverability Catalogue or discovery metadata: structured so that search engines can uncover it. Must be exposed in machine-readable form, e.g. XML. Controlled vocabularies are helpful for keywords.
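
To make "machine-readable form" concrete, the Python sketch below serialises a small record to XML and only accepts keywords from a controlled vocabulary. The element names (dataset, subjects, subject), the vocabulary and the sample values are all hypothetical; a real catalogue would follow a published metadata standard rather than this ad hoc layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary for keywords.
CONTROLLED_VOCABULARY = {"oceanography", "climatology", "linguistics"}


def to_discovery_xml(record: dict, keywords: list) -> str:
    """Serialise a flat record plus vetted keywords to an XML string.

    Record keys are used as element names, so they must be valid XML names.
    """
    unknown = [k for k in keywords if k not in CONTROLLED_VOCABULARY]
    if unknown:
        raise ValueError(f"keywords not in controlled vocabulary: {unknown}")

    root = ET.Element("dataset")
    for name, value in record.items():
        ET.SubElement(root, name).text = str(value)
    subjects = ET.SubElement(root, "subjects")
    for keyword in keywords:
        ET.SubElement(subjects, "subject").text = keyword
    return ET.tostring(root, encoding="unicode")


xml_text = to_discovery_xml({"title": "Example buoy readings"},
                            ["oceanography"])
print(xml_text)
```

Rejecting free-text keywords at this point is what keeps spellings consistent enough for searches to match.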

Ensuring the utility of the data The what, why and how of data creation must be understood: data dictionaries; columns/rows labelled; variable ranges defined. Ensuring that other researchers can understand and effectively reuse data that they access online without the help of the data’s creator is a more complicated task and requires a greater investment of effort to do successfully.
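
A data dictionary with defined variable ranges can double as a validation tool. In this Python sketch the variables, their ranges and the sample rows are invented purely for illustration; out_of_range is a hypothetical helper that reports values falling outside the documented range.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, with an allowed range.
DATA_DICTIONARY = {
    "temperature_c": {"description": "Sea surface temperature (degrees Celsius)",
                      "min": -5.0, "max": 40.0},
    "salinity_psu": {"description": "Salinity (practical salinity units)",
                     "min": 0.0, "max": 45.0},
}


def out_of_range(csv_text: str, dictionary: dict) -> list:
    """Return (row number, column, value) for each value outside its range."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        for column, spec in dictionary.items():
            value = float(row[column])
            if not spec["min"] <= value <= spec["max"]:
                problems.append((row_number, column, value))
    return problems


sample = "temperature_c,salinity_psu\n12.5,35.1\n99.0,35.0\n"
print(out_of_range(sample, DATA_DICTIONARY))
```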

DCC metadata catalogue The catalogue lists: metadata standards; profiles; use cases; tools. http://www.dcc.ac.uk/drupal/resources/metadata-standards

Readme files We recommend that a ReadMe be a plain text file containing the following: for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication; for tabular data, definitions of column headings and row labels, data codes (including missing data), and measurement units; any data processing steps, especially if not described in the publication, that may affect interpretation of results; a description of what associated datasets are stored elsewhere, if applicable; whom to contact with questions. These recommendations for developing a readme file were put together by the Dryad team and are published on the repository’s web-site. http://datadryad.org/pages/readme
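
The recommended readme contents can be assembled from a few structured inputs. This Python sketch is one possible layout, not Dryad's own template: the section headings, the build_readme helper and the sample values are assumptions made for illustration.

```python
def build_readme(files: dict, columns: dict, processing_steps: list,
                 related_datasets: list, contact: str) -> str:
    """Assemble a plain-text readme covering the five recommended items."""
    lines = ["FILES"]
    lines += [f"  {name}: {description}" for name, description in files.items()]
    lines.append("COLUMN DEFINITIONS")
    lines += [f"  {name}: {definition}" for name, definition in columns.items()]
    lines.append("PROCESSING STEPS")
    lines += [f"  {number}. {step}"
              for number, step in enumerate(processing_steps, start=1)]
    lines.append("RELATED DATASETS")
    # Say so explicitly when nothing is stored elsewhere.
    lines += [f"  {item}" for item in related_datasets] or ["  none"]
    lines.append(f"CONTACT: {contact}")
    return "\n".join(lines) + "\n"


# Hypothetical values, for illustration only.
readme_text = build_readme(
    files={"buoys.csv": "Hourly buoy readings"},
    columns={"temperature_c": "Sea surface temperature, degrees Celsius; "
                              "missing data coded as -999"},
    processing_steps=["Removed duplicate rows"],
    related_datasets=[],
    contact="data-owner@example.org",
)
print(readme_text)
```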

Readme – general example https://www.lib.umn.edu/datamanagement/metadata

RDM and sharing: a best practice guide http://data-archive.ac.uk/media/2894/managingsharing.pdf

Tools for managing data www.dcc.ac.uk/resources/external/tools-services/managing-active-research-data

Questions? Jonathan Rans, J.Rans@ed.ac.uk, @JNRans. Image credits: Harvey Rutt, Southampton (recovered from http://www.computerweekly.com); pile of flash drives: www.flashdrivepros.com; Dalian University fire: www.weirdworldnews.org; metadata devil: http://www.truthdig.com/cartoon/item/nsa_its_just_metadata_20130812; field researcher: http://www.nationalgeographic.com/mission/enduringvoices/expeditions.html