Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Research Data Management 20th January 2016

Similar presentations


Presentation on theme: "Introduction to Research Data Management 20th January 2016"— Presentation transcript:

1 Introduction to Research Data Management 20th January 2016
Jonathan Rans Digital Curation Centre This work is licensed under the Creative Commons Attribution 2.5 UK: Scotland License.

2 Who we are The (Est. 2004) is: A national-level centre of expertise in digital preservation with a particular focus on Research Data Management (RDM) Working closely with a number of UK institutions to boost RDM capability across the HE sector Also involved in a variety of national and international collaborations

3 What will we cover? Definitions that we work to
Why take a formal approach to managing your data? What does Research Data Management encompass?

4 Why is RDM an issue? Digital technology now used very widely in research, and is enabling new research and scientific paradigms Research funders and publishers know that digital research data can be expensive to produce but inexpensive to share, making reuse more feasible and desirable The challenge is to ensure digital research findings can be reproduced and cited There is a recognition that preservation skills which we have developed over hundreds of years are not sufficient to ensure the longevity of digital research material, which increasingly comprises the majority of the research record.

5 Definitions of research data?
“Research data, unlike other types of information is collected, observed, or created, for purposes of analysis to produce original research results.” “Research data is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.“ “Evidence which is used or created to generate new knowledge and interpretations. ‘Evidence’ may be intersubjective or subjective; physical or emotional; persistent or ephemeral; personal or public; explicit or tacit; and is consciously or unconsciously referenced by the researcher at some point during the course of their research.”

6 So, what might this include?
Instrument measurements Experimental observations Still images, video and audio Text documents, spreadsheets, databases Quantitative data (e.g. survey data) Survey results & interview transcripts Simulation data, models & software Slides, artefacts, specimens, samples Questionnaires Sketches, diaries, lab notebooks … Anything & everything produced in the course of research

7 Data management is part of
What is research data management? Plan Create Use Appraise Deposit and Publish Discover and Reuse “an explicit process covering the creation and stewardship of research materials to enable their use for as long as they retain value.” Data management is part of good research practice 7

8 Why manage research data?
To make research easier! To stop yourself drowning in irrelevant stuff In case you need the data later To avoid accusations of fraud or bad science To share data so others can use and learn from it To get credit for producing the data Because somebody else said to do so Data is increasing in significance. It will unquestionably matter to your research careers, more than it does to your supervisors’ generation. Learn good data habits now! You’ll need them later.

9 What if researchers data fell into the wrong hands?

10 What if you had to produce a researcher’s data?

11 What is the worst-case scenario?

12 Why make data available?

13 What do funders expect?

14 RCUK Common Principles in brief
Make data openly available where possible Have policies & plans. Preserve data of long-term value Metadata for discovery / reuse. Link to data from publications Be mindful of legal, ethical and commercial constraints Allow limited embargoes to protect the effort of creators Acknowledge sources to recognise IP and abide by T&Cs Ensure cost-effective use of public funds for RDM

15 Ultimately funders expect:
Data management plans timely release of data once patents are filed or on (acceptance for) publication open data sharing minimal or no restrictions if possible preservation of data typically years if of long-term value See the RCUK Common Principles on Data Policy:

16 What is involved in RDM? Data Management Planning Data creation
Annotating / documenting data Analysis, use, versioning Storage and backup Publishing papers and data Preparing for deposit Archiving and sharing Licensing Citing… Plan Create Use Appraise Deposit and Publish Discover and Reuse

17 Active data management

18 Data creation Adopt file naming conventions:
Design a good project folder structure Ensure consent forms, licences and partnership agreements don’t restrict opportunities to share data

19 Some formats are better for long-term
It’s preferable to opt for formats that are: Uncompressed Non-proprietary Open, documented Standard representation (ASCII, Unicode) Data centres may have preferred formats for deposit e.g. Type Recommended Non-preferred Tabular data CSV, TSV, SPSS portable Excel Text Plain text, HTML, RTF PDF/A only if layout matters Word Media Container: MP4, Ogg Codec: Theora, Dirac, FLAC Quicktime H264 Images TIFF, JPEG2000, PNG GIF, JPG Structured data XML, RDF RDBMS Some formats are better for data sharing and long-term preservation than others. It’s preferable to use formats that are uncompressed (e.g. large, high-quality files like .wav), non-proprietary (i.e. open) standards that are documented and well-understood. This aids preservation and interoperability. Some data centres have preferred formats for deposit so it’s worthwhile encouraging researchers to consult these to check. Further examples:

20 Where to store data? Your own device (PC, flash drive, etc.)
And if you lose it? Or it breaks? Departmental drive or university filestore Should be more robust with automated back-up “Cloud” storage Do they care as much about your data as you do?

21 Storage and backup Use managed services where possible e.g. shared drives rather than local or external hard drives Consider the security implications of where you store data and how you transfer it 3… 2… 1… backup! at least 3 copies of a file on at least 2 different media with at least 1 offsite

22 Data aboutdata What is metadata?
Documentation and metadata are essentially descriptive information about the information contained in a dataset. There should be good documentation at the study level, for example a description of the research methodology that created the data – [the best metadata the data can have is the publication it supports] or a data paper. There should also be documentation at the file, item and variable level suitable so that someone reusing the data can understand it – this could be ensuring that excel spreadsheets have sensible row and column descriptions or that a document is included with the dataset which properly explains any abbreviations used.

23 What is the difference? Documentation Metadata Standardised Structured
Machine and human readable Documentation Metadata is a subset of documentation Documentation is a catch-all term which can include some very high-level, human-readable, loosely structured information about the dataset – for example a description of the laboratory method used to generate a set of results or a sketch book accompanying a work of sculpture. The term metadata has come to define a subset of documentation information that uses standardised terms and is presented in a structured way. This metadata will be in a form enabling it to be read by machines and/or may be presented in human readable form as well. Metadata

24 What metadata are we talking about?
It can be helpful to define research metadata by its use: Metadata for CITATION Metadata for DISCOVERY Metadata enabling REUSE

25 What is the minimum required?
Repository requirements Funder requirements include Licencing/access conditions Citation/disambiguation Identifier Creator Title Publisher Publication Year Potentially, the bare minimum of metadata required will be defined by the repository that accepts the dataset. This could correspond to the DataCite mandatory minimum metadata set although it is very likely that other elements will be required, whether by the repository or by the research funder. This metadata set is designed for citation and disambiguation, in other words ensuring that the dataset a researcher is reusing is identical to one cited in a journal article or online. This set of metadata does little to make the resource visible to researchers speculatively searching for relevant data to reuse.

26 What are persistent identifiers?
The likely home of an electronic research data resource attached to a persistent identifier is a data repository (exceptions being resources not available over the web which have a PI attached to a metadata record – physical resource, cumbersomely large dataset etc.)

27 Aiding discoverability
Catalogue or discovery metadata Structured so that search engines can uncover it. Must be exposed in machine-readable form eg XML Controlled vocabluaries are helpful for keywords

28 Ensuring the utility of the data
The what, why and how data creation must be understood Data dictionaries Columns/rows labelled Variable ranges defined Ensuring that other researchers can understand and effectively reuse data that they access online without the help of the data’s creator is a more complicated task and requires a greater investment of effort to do successfully.

29 DCC metadata catalogue
The catalogue lists: Metadata standards Profiles Use cases Tools

30 Readme files We recommend that a ReadMe be a plain text file containing the following: for each filename, a short description of what data it includes, optionally describing the relationship to the tables, figures, or sections within the accompanying publication for tabular data: definitions of column headings and row labels; data codes (including missing data); and measurement units any data processing steps, especially if not described in the publication, that may affect interpretation of results a description of what associated datasets are stored elsewhere, if applicable whom to contact with questions These recommendations for developing a readme file were put together by the Dryad team and are published on the repository’s web-site.

31 Readme – general example

32 RDM and sharing : a best practice guide

33 Tools for managing data
managing-active-research-data

34 Questions? Jonathan Rans J.Rans@ed.ac.uk @JNRans Image Credits
Harvey Rutt, Southampton. Recovered from: Pile of flash drives: Dalian University fire: Metadata devil: Field researcher:


Download ppt "Introduction to Research Data Management 20th January 2016"

Similar presentations


Ads by Google