1
Data Publication (in H2020)
Dr Sünje Dallmeier-Tiessen, CERN
Madrid, November 2016
2
Agenda
Introduction
  Research data
  Relation to H2020
  Data Publishing
Examples, developments and lessons learnt from the “real world”
  General purpose and disciplinary repositories
  Adding journals to the mix
  Adding “reproducibility workflows” to the mix
  Lessons learnt
3
Research Data
What is it? What does it look like? Does it hurt?
4
Funders’ policies
WELCOMES Open Access to scientific publications as the option by default for publishing the results of publicly funded research; […]
RECOGNISES that the full scale transition towards Open Access should be based on common principles such as transparency, research integrity, sustainability, fair pricing and economic viability; and […]
CALLS on Member States, the Commission and stakeholders to remove financial and legal barriers, and to take the necessary steps for successful implementation in all scientific domains, including specific measures for disciplines where obstacles hinder its progress.
See for example:
5
Mandatory Data Management Plans (DMPs)
6
Journals’ policies Springer Nature Data policy
7
Data Publishing Paradigms
8
Data Publishing Concepts
Standalone Data (Repository)
Traditional article-data linking
Data articles/journals
9
Data Publishing components (RDA endorsed)
[2] DOI: /s
10
Another data publishing perspective: establishing context
[2] DOI: /s
11
The FAIR Guiding Principles I
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
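Principle A1 can be illustrated with DOI content negotiation, where the identifier is resolved over plain HTTP and the `Accept` header asks for machine-readable metadata rather than the landing page. The sketch below only prepares such a request and does not send it. The function name `build_metadata_request` and the DOI used are assumptions made for illustration; the media type is the one DataCite documents for its JSON metadata, and whether a given DOI answers it depends on the registration agency.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver
DATACITE_JSON = "application/vnd.datacite.datacite+json"  # metadata media type


def build_metadata_request(doi: str, media_type: str = DATACITE_JSON) -> Request:
    """Prepare (but do not send) a GET request asking for metadata by identifier."""
    url = DOI_RESOLVER + quote(doi, safe="/")
    return Request(url, headers={"Accept": media_type}, method="GET")


if __name__ == "__main__":
    # Placeholder DOI for illustration only; it does not point to a real dataset.
    req = build_metadata_request("10.5072/example-dataset")
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; that step is left out so the sketch runs without network access.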
12
The FAIR Guiding Principles II
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
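Some of the principles can be turned into simple pre-deposit checks on a metadata record. The following is a minimal sketch under stated assumptions: the field names (`identifier`, `title`, `description`, `license`, `provenance`) are hypothetical, and the checks only test for presence, which is far weaker than what the principles actually ask for.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Hypothetical field names; real repositories define their own schemas.
CHECKS: Dict[str, Callable[[Record], bool]] = {
    "F1 persistent identifier": lambda r: bool(r.get("identifier")),
    "F2 rich metadata": lambda r: bool(r.get("title")) and bool(r.get("description")),
    "R1.1 usage license": lambda r: bool(r.get("license")),
    "R1.2 provenance": lambda r: bool(r.get("provenance")),
}


def missing_fair_fields(record: Record) -> List[str]:
    """Return the names of the checks that the record does not pass."""
    return [name for name, check in CHECKS.items() if not check(record)]


if __name__ == "__main__":
    draft = {
        "identifier": "10.5072/example-dataset",  # placeholder DOI
        "title": "Example dataset",
        "description": "Measurements used in an example article.",
    }
    for name in missing_fair_fields(draft):
        print("missing:", name)
```

A check like this can flag obviously incomplete records before submission; judging whether metadata are "rich" or meet community standards (R1.3) still needs a curator.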
13
Various solutions
Disciplinary and institutional repositories exist
  Choose partners: re3data.org
Article-Data linking
  Now easier with DataCite and Crossref
Data/software journals already exist
  With partner repositories
  With repository recommendations
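Article-data linking of the kind mentioned above is usually expressed in the dataset's metadata as a typed relation to the article's identifier. The sketch below builds such a record as a plain dictionary. The property names (`relatedIdentifiers`, `relatedIdentifierType`, `relationType`) and the value `IsSupplementTo` are modelled on the DataCite metadata schema, but the record is heavily simplified, the helper name is an assumption, and both DOIs are placeholders.

```python
from typing import Dict, List


def link_dataset_to_article(dataset_doi: str, article_doi: str) -> Dict[str, object]:
    """Return a simplified, DataCite-style record linking a dataset to an article."""
    related: List[Dict[str, str]] = [
        {
            "relatedIdentifier": article_doi,
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",  # the dataset supplements the article
        }
    ]
    return {
        "identifier": {"identifier": dataset_doi, "identifierType": "DOI"},
        "relatedIdentifiers": related,
    }


if __name__ == "__main__":
    # Placeholder DOIs for illustration only.
    record = link_dataset_to_article("10.5072/example-dataset", "10.5072/example-article")
    print(record["relatedIdentifiers"][0]["relationType"])
```

Because the relation is typed, a service harvesting this metadata can tell a supplementing dataset apart from, say, a plain citation.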
14
Data Publishing solutions
Examples, there are more!
16
re3data.org
17
Data Publishing Concepts
Standalone Data (Repository)
Traditional article-data linking
Data articles/journals
18
All disciplines, institutions
Zenodo.org
19
All disciplines, institutions
Figshare screenshot Figshare.com
20
Established disciplinary databases: life sciences
EBI database screenshot
21
Established disciplinary databases: earth & environmental sciences
Pangaea screenshot pangaea.de
22
dataverse.org
23
Data Publishing Concepts
Standalone Data (Repository)
Traditional article-data linking
Data articles/journals
25
Discipline specific data journals
Add another data journal, e.g. ESSD?
27
Considerations for choosing the “right service”
Future purpose: reuse, reproducibility, preservation
Metadata (standards)
Quality
Dependencies (software, methods)
Versioning
Visibility, discoverability (cf. FAIR principles)
Referencing, data citation capability for all outputs
Persistent links, sustainability
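Data citation capability, versioning and persistent links come together in the citation string a service can generate for a dataset. The following is a rough sketch of such a formatter. The function name, the argument names and the exact punctuation are assumptions; the ordering (creators, year, title, version, publisher, identifier) loosely follows common data citation recommendations and is not any repository's official format.

```python
from typing import List, Optional


def format_data_citation(
    creators: List[str],
    year: int,
    title: str,
    publisher: str,
    doi: str,
    version: Optional[str] = None,
) -> str:
    """Build a human-readable citation ending in a resolvable DOI link."""
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        # Citing the version makes clear which state of the data was used.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


if __name__ == "__main__":
    # All values are invented placeholders.
    print(
        format_data_citation(
            ["Doe, J.", "Roe, R."],
            2016,
            "Example dataset",
            "Example Repository",
            "10.5072/example",
            version="1.2",
        )
    )
```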
28
In practical terms
Discuss with researchers
  What are the needs of the group/community?
  Are there existing services, or is there a need for more? (re3data.org)
  Don’t shy away from contacting data centres or services directly
Check out what community publishers do
  Recommended repositories?
Discuss with partners in computer centre and/or community meetings
  What do they do and plan to do? Is there anything you can contribute to or profit from?
29
Moving beyond the individual elements
Opening data publishing to reproducible workflows
32
Conditions very discipline specific
Reproducibility
Repeatability, Replicability, Reproducibility, Reusability, Repurposing
In order to reuse/repurpose results, you sometimes have to reproduce the original results first (to understand the exact details).
An article about computational science in a scientific publication is not the scholarship itself; it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. (stanford.edu/doku.php?id=sep:research:reproducible :seg92)
We can reserve the term "replicability" for the regeneration of published results from author provided code and data. Reproducibility is a more general term, implying both replication and the regeneration of findings with at least some independence from the code and/or data associated with the original publication. Both refer to the analysis that occurs after publication. A third term, “repeatability,” is sometimes used in place of reproducibility, but this is more typically used as a term of art referring to the sensitivity of results when underlying measurements are retaken. To summarize, we need replicability, in part, to resolve differences in outcomes that arise from reproduced computational results, regardless of whether the experiments have been repeated.
33
To reproduce or reuse research results a researcher needs…
More than “just” the article
Context, documentation
Links to related research objects: data, code, workflows
Understandable method, processing, software etc.
Steps taken during the research process (versions)
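One lightweight way to capture part of this context is a manifest stored next to the results, recording which code version, which environment and which exact input files were used. The sketch below is one possible shape for such a manifest; the function and key names are assumptions, and a real workflow would also record package versions, parameters and processing steps.

```python
import hashlib
import platform
import sys
from typing import Dict


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: Dict[str, bytes], code_version: str) -> Dict[str, object]:
    """Describe the environment and inputs behind one analysis run."""
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "code_version": code_version,  # e.g. a tag or commit id chosen by the author
        # Checksums let a later reader verify they hold the same input files.
        "files": {name: sha256_of(content) for name, content in sorted(files.items())},
    }


if __name__ == "__main__":
    # File name and content are invented placeholders.
    manifest = build_manifest({"data.csv": b"a,b\n1,2\n"}, code_version="v1.0")
    for key, value in manifest.items():
        print(key, value)
```

Someone attempting to regenerate a result can recompute the checksums and compare them with the manifest before investigating any differences in the output.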
34
Research Lifecycle
35
Seamless integration across the research lifecycle
Who? When? Where?
? project/rcn/194927_en.html
Slide credit to Trisha Cruse, DataCite
36
https://benchling.com/
Docker
37
Example from CERN: CERN Open Data and CERN Analysis Preservation
Future purpose: reuse, reproducibility, preservation
What are the components of an analysis (and where are they stored now)?
How much do these components vary within the collaboration?
How is quality defined?
What are the dependencies (software, methods)?
Versioning
Linking
Size (10-15 TB per analysis)
See CERN presentation later
38
Future
The big challenge is adoption: it needs all of us to work together
We can help with data curation and services, i.e. guiding researchers to the right services
But we need your expertise to make it an intrinsic process for researchers
  Integrated in publishing process
  Link objects/resources (DataCite!)
  Give data more ❤️ and visibility – make it discoverable
39
Backup slides
40
References THOR project; https://project-thor.eu/ ORCID: orcid.org
All icons are kindly provided by freeicon via flaticon