DATA CITATION Laurie Goodman, PhD Editor-in-Chief, GigaScience ORCID ID: Twitter:

Presentation transcript:

DATA CITATION Laurie Goodman, PhD Editor-in-Chief, GigaScience ORCID ID: Credit Where Credit is Due

Journal and database for large-scale data studies Editor-in-Chief: Laurie Goodman Executive Editor: Scott Edmunds Editor: Nicole Nogoy Assistant Editor: Hans Zauner GigaDB: Chris Hunter, Jesse Xiao GigaGalaxy: Peter Li in conjunction with

Introduction to GigaScience Article types include research articles, data notes, technical notes, reviews, commentary, and editorials Have linked database GigaDB that hosts all data types Have journal linked source code-sharing (Github) and computational platforms (Galaxy) etc. Have in-house biocurator and data scientists to aid researchers in sharing information and putting it into appropriate databases

GigaDB piggy-backs onto China National Genbank (CNGB)
China National Genbank: launched this month
10PB object storage (Aliyun “S3”)
1PB high-performance storage: Huawei OceanStor disk array
480 cores, 2560G memory
Internet network bandwidth 250M, 500M, 1G (fibre 1)
10G dark fibre direct connect between genebank and BGI headquarters (fibre 2)
Internet network bandwidth 100M, 200M, 300M (fibre 3)
Aliyun private cloud software platform
GigaDB has a dedicated server within CNGB, an offsite server in HK for backup, and is implementing Amazon cloud for storage and data use in 2017.

Why Share Data? Reproducibility Transparency Improved data quality More people accessing data speeds scientific discovery People are dying

By the end of the day: 334 people will have died of the measles Data from World Health Organization Fact Sheets

Cultural Reasons Not to Share Fear of journals considering data publication as prior publication. Fear of being scooped (Data Parasites*). Lack of career advancement due to no credit for data production, only for analysis/concept papers. * Response from Functional Genomics Data Society:

A Tale of Two Bacteria
1. On May 2, 2011, German doctors reported the first case of an E. coli infection that was accompanied by hemolytic-uremic syndrome.
2. On May 21, 2011, the first death occurred from this bacterium (denoted E. coli O104:H4).
3. On June 3, 2011, BGI completed a draft sequence of E. coli O104:H4 from a sample provided by doctors at the University Medical Centre Hamburg-Eppendorf.
4. At this point, the leaders at BGI held a discussion about whether to release the sequence data immediately: what were the potential repercussions of doing so?
The question arose: if the data were released now, would it affect their ability to publish later?

A Tale of Two Bacteria
In one world, the researchers, who were concerned about their ability to publish (as this is the way to obtain recognition and grants, which are essential for them to work), waited. The first publication appeared on July 29th.
In another world, the researchers, who decided public health was more important than obtaining a publication, released the data immediately. The first publication appeared on July 29th, but it was not from the group who released the data (though information on that data was included).

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as: Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY BGI Shenzhen. doi: /
These data were put on an FTP server under a CC0 waiver and also given a DOI to make access ‘permanent’.
To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.

Whether the concern about the ability to publish if data are released early is real or imagined, researchers act on that concern. Note: Harmsen’s group DID share the O104:H4 outbreak strain data immediately upon sequencing. The data referred to here were from the 2001 strain, which was believed to be the strain involved in a 2001 outbreak of a similar type. This slide is meant only to highlight that concerns about being scooped drive early sharing decisions. Given that the first paper published did use the early available O104:H4 data, it would be expected that these data, had they been shared, would have been used in that paper as well.

By the end of the day: 1,027 people will have died from influenza Data from World Health Organization Fact Sheets

Deconstructing a paper into accessible, useable, trackable, interlinked units Need to provide credit to reward sharing and proper organization of: Narrative Data/Metadata availability/curation Software availability Interoperability Availability of workflows Transparent analyses Data/ MetaData Software Methods Narrative

Deconstructing a paper into accessible, useable, trackable, interlinked units Currently publishers provide credit for this: Narrative Data/Metadata availability/curation Software availability Interoperability Availability of workflows Transparent analyses Data/ MetaData Software Methods Narrative

Moving Beyond the Narrative

How We Envision Research Publication (Communicating Science) Data Sets in GigaDB Analyses in GigaGalaxy Paper in GigaScience Linked to Open-access journal Data Publishing Platform Data Analysis Platform

Paper DOI Data set DOI Linking of papers and data by citation of DOIs
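The linking described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LinkedPublication` class, the helper names, and any DOIs passed to them are hypothetical placeholders, not GigaScience or GigaDB code or identifiers. The only outside facts relied on are that DOIs resolve through the https://doi.org/ proxy and that DOI names are compared case-insensitively.

```python
from dataclasses import dataclass

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy


@dataclass(frozen=True)
class LinkedPublication:
    """A paper and the dataset it cites, each identified by its own DOI.

    Hypothetical structure for illustration only.
    """
    paper_doi: str
    dataset_doi: str

    def paper_url(self) -> str:
        return DOI_RESOLVER + self.paper_doi

    def dataset_url(self) -> str:
        return DOI_RESOLVER + self.dataset_doi


def normalize_doi(raw: str) -> str:
    """Strip common prefixes so 'doi:10.x/y' and a resolver URL compare equal."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip().lower()


def cites_dataset(reference_list: list[str], dataset_doi: str) -> bool:
    """Return True if any reference entry contains the dataset DOI."""
    target = normalize_doi(dataset_doi)
    return any(target in ref.lower() for ref in reference_list)
```

With a record like this, a paper's reference list can be scanned for the dataset DOI, which is the basic step behind seeing where a dataset has been cited.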

NO Hosting all data types

By the end of the day: 1,718 people will have died of malaria Data from World Health Organization Fact Sheets

Data Publication and Citation Promotes Rapid and Open Sharing It gets to the heart of the cultural reasons not to share

What is a Data Publication? 1.Publishing a standard article that describes the data. 2.Making the data itself citable.

Make it easy to cite See where it got cited! Describe the data

Current list Of Darwin Finch Data Citations on Google Scholar …And more

By the end of the day: 4,110 people will have died of complications from diabetes Data from World Health Organization Fact Sheets

Cultural Reasons Not to Share Fear of journals considering data publication as prior publication. Fear of being scooped. (Data Parasites*) Lack of career advancement due to no credit for data production- only papers. * Response from Functional Genomics Data Society:

e-latest-weapon-in-publishing-data-the-polar-bear/ Direct Data Citation Encourages data release prior to publication of data analysis article THREE YEARS before publication of the analysis article Releasing Data Early with a Citation

The polar bear DATA were released (prepublication) in 2011. Data were used and cited in at least 5 studies:
1. Hailer, F et al., Nuclear genomic sequences reveal that polar bears are an old and distinct bear lineage. Science Apr 20;336(6079): doi: /science
2. Cahill, JA et al., Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution. PLoS Genet. 2013;9(3):e doi: /journal.pgen
3. Morgan, CC et al., Heterogeneous models place the root of the placental mammal phylogeny. Mol Biol Evol Sep;30(9): doi: /molbev/mst
4. Cronin, MA et al., Molecular Phylogeny and SNP Variation of Polar Bears (Ursus maritimus), Brown Bears (U. arctos), and Black Bears (U. americanus) Derived from Genome Sequences. J Hered. 2014; 105(3): doi: /jhered/est
5. Bidon, T et al., Brown and Polar Bear Y Chromosomes Reveal Extensive Male-Biased Gene Flow within Brother Lineages. Mol Biol Evol Apr 4. doi: /molbev/msu109
Analysis Article by data producers was published in 2014 in Cell. The Data Publication has since garnered 6 more citations.

Cell Press Journals had indicated publishing a dataset prior to publication could be considered as prior publication

By the end of the day: 7,671 children (under 5) will have died from Malnutrition Data from World Health Organization Fact Sheets

Cultural Reasons Not to Share Fear of journals considering data publication as prior publication. Fear of being scooped (Data Parasites*) Lack of career advancement due to no credit for data production- only papers. * Response from Functional Genomics Data Society:

Data as a publication can be cited in the references (like a ‘real’ paper). This rewards authors for making data available AND makes it easier to find the data.
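As a rough illustration of what a reference-list entry for a dataset might look like, the sketch below assembles one in the "Authors (Year) Title. Publisher. doi" pattern used for the E. coli dataset earlier in this talk. The function name and its parameters are hypothetical, chosen only for this example; journals and repositories each define their own citation styles.

```python
def format_dataset_citation(authors, year, title, publisher, doi):
    """Build a reference entry: Authors (Year) Title. Publisher. doi:DOI

    Hypothetical helper for illustration; not a GigaDB or DataCite API.
    """
    if not authors:
        raise ValueError("a dataset citation needs at least one author")
    # Authors are joined with semicolons, as in the citation shown in the talk.
    author_text = "; ".join(authors)
    return f"{author_text} ({year}) {title}. {publisher}. doi:{doi}"
```

Because the entry carries a DOI, it can sit in a paper's reference list alongside article citations and be counted in the same way.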

Cited Data is Being Tracked

Funding Agencies are paying attention Funding agencies are now including data release information in grants, requiring data release on publication, and assessing whether researchers are releasing data

By the end of the day: 22,466 people will have died from Cancer Data from World Health Organization Fact Sheets

Data Citation Really is a Major Incentive Last year, we released the genome sequences from 3000 Rice strains (13.4 TB of data) These data were also deposited in NIH SRA repository So why did we do it too? 1.It is linked directly to the Data Paper that provides details of data production, quality, and basic analysis 2.Authors were hesitant to release these data (a HUGE community resource) prior to the analysis paper publication (which, for 3000 strains… could possibly take years…). The opportunity to have these data citable (and trackable) encouraged the authors and led to their releasing these data and doing so in collaboration with GigaScience’s Biocurator The 3,000 Rice Genomes Project. (2014) GigaScience 3:7 The 3000 Rice Genomes Project (2014) GigaScience Database.

Cultural Reasons Not to Publish Data They aren’t ‘real’ papers. They only pad a researcher’s publication list, and do not add to the lexicon of scientific discoveries. Data production is not a scholarly pursuit.

Padding a Resume: Publishing data is “Salami Slicing”!!
What is Salami Slicing? Publishing research in several different papers that should form a single cohesive paper.
Why is Salami Slicing considered ‘unethical’?
It fragments the scientific literature, wasting researchers’ time as they try to get all the information related to a very specific topic/dataset/method.
It can give the appearance (given there are multiple publications) that there is large support for a particular hypothesis.
It pads a researcher’s publication record unfairly.

Publishing Data is “Salami Slicing”! Baloney
1. Those guidelines were developed prior to the year 2000, more than 15 years ago, at a time when data set sizes and data types collected in the life sciences by a single research group were relatively small and primarily suitable for a single or narrow range of disciplines or hypotheses. Most journals were not online (which allows easier identification of and access to closely related articles) until the late ’90s.
2. In 2005, COPE* ruled that a paper containing data that had been used and described, at least in part, in a previous publication was not unethical.
3. Data collection can be (should be!!) a scholarly pursuit: data that are broadly reusable require care, thought, training, time, and money to be properly collected, curated, stored, and shared.
* Committee on Publication Ethics.

Data Production is not a scholarly pursuit It doesn’t merit a publication Contrary to popular belief… There are very few, if any, ‘push-a-button-and-get-it’ reusable data resources

You’re not supposed to just collect samples! *Collect ALL available metadata*

By the end of the day: 47,945 people will have died from Cardiovascular Disease Data from World Health Organization Fact Sheets

Thanks to: Scott Edmunds, Executive Editor Nicole Nogoy, Editor Hans Zauner, Assistant Editor Peter Li, Lead Data Manager Chris Hunter, Lead BioCurator Xiao (Jesse) Si Zhe, Database Developer Joseph Hasan, Journal Development Manager @GigaScience facebook.com/GigaScience blogs.openaccesscentral.com/blogs/gigablog Contact us: Follow us: