Published by Jasmin Conley. Modified over 9 years ago.
1
Data Publishing Workflows: Strategies and Standards
Sünje Dallmeier-Tiessen (CERN) for many collaborators at CERN and in the RDA-WDS Data Publishing Workflows Group
2
Outline
- Policy pressure
- Solutions across disciplines
- Standards: Persistent Identifier; Data Citation; Quality Assurance, Peer Review; Licensing
- Examples in High-Energy Physics (CERN): INSPIRE; Analysis Preservation Framework; Open Data Portal
3
Research data is a first class citizen
Royal Society, 1665 and 2012
4
Towards Open Science
[Diagram: Open Source, Open Access, Open Data & Code, Open Science; marker: "We are here now"]
Slide provided by Patricia Herterich, CERN
5
Policy pressure: STFC example
6
Policy pressure: DOE example
DMPs should provide a plan for making all research data displayed in publications resulting from the proposed research open, machine-readable, and digitally accessible to the public at the time of publication. …the underlying digital research data used to generate the displayed data should be made as accessible as possible to the public in accordance with the principles stated above.
7
Expectations: PLOS Data Policy
8
Concerns across disciplines
Datasets are…
- Not shared or lost
- Difficult to discover and access
- Difficult to understand > context missing
Nature, 2009
9
How this challenge is addressed
10
Example: Dedicated Data Repositories
11
Preserving and promoting data reuse
12
International sharing and curation of data
www.icgc.org
13
ICGC – Data Publication Timeline
Time limits for publication moratoriums: All data shall become free of a publication moratorium when either the data is published by the ICGC member project or one year after a specified quantity of data (e.g. genome dataset from 100 tumors per project) has been released via the ICGC database or other public databases. […] In all cases data shall be free of a publication moratorium two years after its initial release.
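The time limits quoted above amount to taking the earliest of up to three dates. A minimal sketch of that reading, assuming whole-year offsets; the function and parameter names are hypothetical and are not part of any ICGC tooling:

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def moratorium_end(initial_release: date,
                   threshold_release: Optional[date] = None,
                   project_publication: Optional[date] = None) -> date:
    """Earliest date on which the data is free of a publication moratorium.

    initial_release:     first release of the data (hard limit: two years later)
    threshold_release:   date the specified quantity of data was released
                         (limit: one year later), if that has happened
    project_publication: date the member project published, if it has
    """
    candidates = [add_years(initial_release, 2)]
    if threshold_release is not None:
        candidates.append(add_years(threshold_release, 1))
    if project_publication is not None:
        candidates.append(project_publication)
    return min(candidates)


if __name__ == "__main__":
    # Illustrative dates only, not taken from an actual ICGC project.
    print(moratorium_end(date(2012, 1, 15), threshold_release=date(2012, 6, 1)))
```

With only an initial release date, the two-year limit applies; each additional known date can only bring the end of the moratorium forward.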
14
Zenodo – Data Repository
15
How to find a data repository
16
Example: A dedicated data journal
Nature Scientific Data
17
F1000
18
Connecting articles and data
Tagged GenBank entry (genetic sequence)
Article: doi: /j.biortech
Slide provided by H. Koers, Elsevier.
19
Towards Open Science
[Diagram: Open Source, Open Access, Open Data & Code, Open Science; marker: "We are here now"]
Slide provided by Patricia Herterich
20
Publish (Citable) Software
21
More and more examples
22
Published Software Papers
23
Standards
24
Licensing
Enable others to reuse your data and software
- Choose the licenses or public domain dedications accordingly
- As “open” as possible
Re-Use
- There are measures to demand citations to track reuse and the impact of your work
- If you re-use, cite the dataset yourself
25
DOIs for datasets
URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics , Jun 1;24(11):1381-5).
Digital Object Identifiers (DOI names) offer a solution
- Most widely used identifier for scientific articles
- Researchers, authors, publishers know how to use them
- Put datasets on the same playing field as articles
Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi: /PANGAEA
Slides by courtesy of Dr. Jan Brase, DataCite
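To illustrate why a DOI name is more durable than a bare URL, the sketch below turns a DOI in any of its common written forms into a link through the doi.org resolver. The regular expression is a loose syntax check chosen for this example, not an official validation rule, and the DOI used in the demo is made up:

```python
import re
from urllib.parse import quote

# Loose syntax check (an assumption for this sketch): "10.", a registrant
# code of 4 to 9 digits, a slash, then a non-empty suffix without spaces.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in reference lists.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalize a DOI name and return its resolver URL."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"does not look like a DOI name: {doi!r}")
    # Percent-encode unusual characters but keep the slash separators.
    return "https://doi.org/" + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the normalization.
    print(doi_to_url("doi: 10.1234/example.dataset.5678"))
```

The link stays valid even if the repository moves the landing page, because the resolver, not the citation, holds the current location.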
26
ORCID id
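An ORCID iD is a 16-character identifier whose last character is a check digit computed with the ISO 7064 MOD 11-2 scheme. A minimal sketch of that check, assuming the iD is given with or without hyphens; the function names are hypothetical, and the sample iD is the fictitious test record that ORCID uses in its own documentation:

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits, and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A passing check only shows that the iD is well formed; it does not confirm that the iD belongs to a given author.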
27
FORCE11: Data Citation Principles
Author, Publication Year, Dataset Title, Data Repository, Version, Unique Identifier
- should include a persistent method for identification that is machine actionable and globally unique
- should facilitate identification of, access to, and verification of the specific data that support a claim.
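The citation elements listed on this slide can be assembled mechanically. The sketch below is one possible rendering, assuming a simple sentence-style layout; the class name, the layout, and the sample values are illustrative and are not prescribed by the principles:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """The six elements named on the slide; version is optional here."""
    authors: str
    year: int
    title: str
    repository: str
    identifier: str  # persistent, machine-actionable, globally unique
    version: Optional[str] = None

    def format(self) -> str:
        parts = [f"{self.authors} ({self.year}).", f"{self.title}.",
                 f"{self.repository}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    # Placeholder values, not a real dataset.
    citation = DatasetCitation(
        authors="Doe, J.",
        year=2015,
        title="Example sediment measurements",
        repository="Example Repository",
        identifier="https://doi.org/10.1234/example",
        version="1.0",
    )
    print(citation.format())
```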
28
Data Citation in Practice
29
Quality assurance for data: peer review
Products
- Data records in data repositories
- Data journals
- Data articles
- Note: standalone vs. supporting materials
QA Workflows
- Standalone or integrated?
- Blind and invited peer review
- Open peer review
- Citable review reports
30
How to publish your data
- Decide which dataset should be preserved or which dataset might be of interest for others to study or reuse
- Are there issues which restrict the publishing process, e.g. confidentiality for patient data?
- Which data product? Do I have enough materials for a dedicated data article?
- Which journal or repository works for me?
- Prepare the documentation/metadata
- Publish and let the others know you did
- Cite the dataset in the resulting papers
- Track who used and cited your data
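For the "prepare the documentation/metadata" step, a small pre-submission check can catch gaps before a record goes to a repository. This is a sketch only: the field names below are illustrative assumptions and do not reproduce any particular repository's schema.

```python
# Hypothetical minimal field set; a real repository defines its own schema.
REQUIRED_FIELDS = ("title", "creators", "publication_year",
                   "license", "description")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


if __name__ == "__main__":
    draft = {
        "title": "Example sediment measurements",
        "creators": ["Doe, J."],
        "publication_year": 2015,
        "description": "Placeholder description of the dataset.",
    }
    print(missing_metadata(draft))  # the draft above has no license yet
```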
31
HEP: High-Energy Physics
32
Research data in HEP
33
Research Data on INSPIRE: starting from the paper
34
The underlying datasets (HEPdata)
35
Data Citation (Tracking)
36
Referenced Data arXiv:
37
Code snippets
38
Code snippets
39
… and who gets the credit for sharing data?
40
Kyle’s profile on INSPIRE
41
Using author IDs for attributing credit
42
Excerpt from publication list on
43
Excerpt from publication list on
Make data publications count - alongside your articles
44
Focusing on reproducibility and reuse
Two important new tools
45
Capturing the complexity: Analysis Preservation Framework
46
Open it up: CERN Open Data Portal
47
How to publish your data
- Decide which dataset should be preserved or which dataset might be of interest for others to study or reuse
- Are there issues which restrict the publishing process, e.g. confidentiality for patient data?
- Which data product? Do I have enough materials for a dedicated data article?
- Which journal or repository works for me?
- Prepare the documentation/metadata
- Publish and let the others know you did
- Cite the dataset in the resulting papers
- Track who used and cited your data
48
Conclusions
- Policy pressure nationally and globally: we need data publishing solutions
- Considerable advancements in many disciplines
- We learn from best practices
- HEP with commitment to data preservation and open data releases
- First tools are available to support data preservation and data publishing
49
Towards Open Science
[Diagram: Open Source, Open Access, Open Data & Code, Open Science; marker: "We are here now"]
Slide provided by Patricia Herterich