It’s the data that makes a paper
Joerg Heber
Executive Editor, Nature Communications
Science is based on trust.
Nature 478, (2011) But trust gets you only so far
Trust in published data used to be much higher
Prior to scientific journals, scientists published books – with no editor involved in the process
Journals have editors, but started to use peer review predominantly only in the 20th century
Evidence was still taken at face value
Would we publish a paper today having only this data?
We should (with more data…)
This was the report of the first laser. The full paper is less than a printed page.
Nature 187, (1960)
Watson and Crick didn’t even include any experimental data Nature 171, (1953)
Proper data handling is important
Establishes priority
Supports the research findings
Helps to reproduce the outcome
Data needs to be archived and made accessible
Helps with further research
Without proper data handling things can go wrong…
Data sharing advances science
Data sharing is in the public interest
We encourage our authors to share data related to public health emergencies as soon as possible, even prior to peer review or publication.
Nature 518, 477–479 (2015)
Data is serious for us
Funders can mandate data deposition
The UK Engineering and Physical Sciences Research Council (EPSRC) now expects data deposition upon publication.
Principles
EPSRC’s Charter is to support high-quality basic, strategic and applied research and related post-graduate training in engineering and the physical sciences, and to communicate and disseminate the resulting outcomes and knowledge. As such, EPSRC believes that the following guiding principles, which are aligned with the agreed RCUK principles on sharing of research data, should inform all decisions relating to the management of all research data that has arisen as a result of EPSRC funding:
i. EPSRC-funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner.
ii. EPSRC recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process (including the collaborative research process) is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these constraints are considered at all stages in the research process.
iii. Sharing research data is an important contributor to the impact of publicly funded research. To recognise the intellectual contributions of researchers who generate, preserve and share key research datasets, all users of research data should acknowledge the sources of their data and abide by the terms and conditions under which they are accessed.
iv. EPSRC-funded researchers should be entitled to a limited period of privileged access to the data they collect to allow them to work on and publish their results. The length of this period will depend on the scientific discipline and the nature of the research.
v. Institutional and project specific data management policies and plans should be in accordance with relevant standards and community best practice and should exist for all data. Data with acknowledged long term value should be preserved and remain accessible and useable for future research.
vi. Sufficient metadata should be recorded and made openly available to enable other researchers to understand the potential for further research and re-use of the data. Published results should always include information on how to access the supporting data.
Linking to data from within a paper Nature Communications 6, 7031 (2015)
Technology supports a data infrastructure
Data mentions in a paper can be made machine-readable
Papers are not just for reading, but also for text and data mining
This is what part of the structured header of the same paper looks like in our database:
{
  "article": {
    "id": "ncomms8031",
    "type": "articles",
    "title": "Coherent perfect absorption in deeply subwavelength films in the single-photon regime",
    "titleXml": " Coherent perfect absorption in deeply subwavelength films in the single-photon regime ",
    "publicationDate": " ",
    "publicationYear": "2015",
    "publicationYearMonth": " ",
    "number": "7031",
    "volume": "6",
    "doi": " /ncomms8031",
    "homepage": " ",
    "doiLink": " ",
    "license": " ",
    "hasPdfAsset": {
      "type": "pdf-assets",
      "id": "ncomms8031.pdf",
      "mimetype": "application/pdf",
      "path": "./ncomms/2015/150505/ncomms8031/pdf/ncomms8031.pdf"
    },
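As a rough illustration of what "machine-readable" means here, the following Python sketch parses a record shaped like the header excerpt above and derives a short citation string from it. It is only a sketch under stated assumptions: the record is a trimmed, syntactically closed copy that keeps only the fields whose values survive in the excerpt, and the function name `summarize_article` and the output layout are hypothetical, not part of any publisher API.

```python
import json

# Trimmed copy of the header excerpt: only fields with intact values are kept,
# and the braces are closed so that the text is valid JSON (an assumption made
# for this example; the real record is longer).
record_text = """
{
  "article": {
    "id": "ncomms8031",
    "type": "articles",
    "title": "Coherent perfect absorption in deeply subwavelength films in the single-photon regime",
    "publicationYear": "2015",
    "number": "7031",
    "volume": "6",
    "hasPdfAsset": {
      "type": "pdf-assets",
      "id": "ncomms8031.pdf",
      "mimetype": "application/pdf"
    }
  }
}
"""


def summarize_article(text):
    """Hypothetical helper: pull a few fields out of a structured header."""
    article = json.loads(text)["article"]
    pdf = article.get("hasPdfAsset", {})
    return {
        "id": article["id"],
        "title": article["title"],
        # Volume, article number and year, in the style used on the slides.
        "citation": f'{article["volume"]}, {article["number"]} ({article["publicationYear"]})',
        "has_pdf": pdf.get("mimetype") == "application/pdf",
    }


summary = summarize_article(record_text)
print(summary["citation"])
```

Because every field has a name and a predictable place in the structure, the same few lines would work across many such records, which is the point of exposing papers to text and data mining.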
Reproducibility issues
Statistics
Materials, reagents, cell lines
Animals and their welfare
Human subjects
Experimental methods
Sharing data improves the reproducibility of experimental results.
Reproducibility We have checklists for a number of research areas to improve the reproducibility of research findings.
Papers form part of a larger ecosystem
A single paper represents only a fraction of a researcher’s output. It usually tells only one aspect of a larger story.
There is a benefit in sharing data from different labs and in building data repositories: chemical compounds, protein structures, genetic sequences, etc.
We support publication of data in such repositories.
For recommended repositories see:
Data is only useful if it can be assessed and reused
Data needs structure and descriptions
Those who create and curate data need to get credit
Data items should be citable and should count as an integral part of the scientific literature
Scientific Data is a solution to provide credit for data description
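One way to picture "structure and descriptions" plus citability is a small record type that refuses to produce a citation until its descriptive fields are filled in. The Python sketch below is purely illustrative: the class `DatasetRecord`, its field names, the citation layout and all example values are assumptions made up for this example, not a format prescribed by any journal or repository.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    """Hypothetical minimal description of a deposited dataset."""
    creators: str
    title: str
    repository: str
    identifier: str
    year: str

    def missing_fields(self):
        # A field counts as missing if it is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def cite(self):
        # Only a fully described dataset can be cited.
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"cannot cite dataset, missing: {', '.join(missing)}")
        return (f"{self.creators} {self.title}. "
                f"{self.repository}, {self.identifier} ({self.year})")


# All values below are placeholders, not a real dataset.
record = DatasetRecord(
    creators="Example, A. & Sample, B.",
    title="Hypothetical measurement dataset",
    repository="Example Repository",
    identifier="example-id-0001",
    year="2015",
)
citation = record.cite()

# A record without repository and identifier cannot be located, hence not cited.
incomplete = DatasetRecord("Example, A.", "Untitled", "", "", "2015")
missing = incomplete.missing_fields()
print(citation)
print(missing)
```

The design choice is deliberate: credit depends on the creators field, and reuse depends on the repository and identifier, so the sketch treats an incomplete description as an error rather than emitting a partial citation.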
Thank you! Data sharing is beautiful.