GigaDB – revolutionizing data dissemination, organization and use

Presentation transcript:

SiZhe Xiao GigaScience 2013 POSTER Open Access

GigaDB – revolutionizing data dissemination, organization and use

Xiao Si Zhe¹, Chris Hunter, Tam P. Sneddon, Scott C. Edmunds, Alexandra T. Basford, Peter Li, and Laurie Goodman.

Abstract

GigaScience, the online open-access, open-data journal, has recently developed GigaDB, a new integrated database of "big-data" studies from the life and biomedical sciences. The initial goals of GigaDB are to assign DOIs to datasets so that they can be tracked and cited, and to offer a user-friendly web interface that gives easy access to selected GigaDB datasets and files.

We will be working with authors to make the raw data, computational tools and data processing pipelines described in GigaScience papers available and, where possible, executable on an informatics platform. We hope that by making both the data and the processes involved in their analysis freely accessible, this novel form of publication will help articles published in GigaScience to have a much higher impact in the scientific literature, and will maximize their reuse within the community.

GigaDB currently accepts submissions in Excel format. Example submission and template files can be found on the website (http://gigadb.org/). To date, GigaDB comprises over 56 datasets and includes genomic, transcriptomic, epigenomic and metagenomic dataset types, but we accept many other dataset types, including proteomic and neuroimaging studies.

Future goals include integration with the BGI Cloud and with the Galaxy software tools, to enable users to upload files directly to Galaxy for further analysis. We are also working with ISA-Tab and other scientific standards groups to support and extend the usability and interoperability model.
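As a small illustration of why DOIs make datasets trackable and citable, the sketch below splits a dataset DOI into its prefix and suffix and builds a resolvable link. It is a minimal sketch, not GigaDB code: the names `DatasetDoi` and `parse_doi` are hypothetical, and the only facts assumed are that DOIs begin with "10." and that https://doi.org/ is the public DOI resolver. The example DOI is one quoted on this poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDoi:
    """A DOI split into registrant prefix and item suffix (hypothetical helper)."""
    prefix: str
    suffix: str

    @property
    def doi(self) -> str:
        return f"{self.prefix}/{self.suffix}"

    @property
    def url(self) -> str:
        # doi.org is the public DOI resolver.
        return f"https://doi.org/{self.doi}"


def parse_doi(text: str) -> DatasetDoi:
    """Parse strings such as 'DOI:10.5524/100038' or '10.5524/100038'."""
    cleaned = text.strip()
    # Drop an optional leading 'doi' label and its separator.
    if cleaned.lower().startswith("doi"):
        cleaned = cleaned[3:].lstrip(": ")
    prefix, sep, suffix = cleaned.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return DatasetDoi(prefix, suffix)


if __name__ == "__main__":
    dataset = parse_doi("DOI:10.5524/100038")
    print(dataset.doi, dataset.url)
```

Keeping the prefix and suffix separate makes it easy to check that a cited identifier belongs to the expected registrant before following the link.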
Keywords: DOI, Galaxy, big-data, database, informatics platform, GigaScience

Background

- Growing replication gap:
  - 10/18 microarray papers cannot be reproduced
  - Ioannidis: "Most Published Research Findings Are False"
  - >15X increase in retracted papers in last decade
- Lack of incentives to make data/methods available
- Poor metadata quality and lack of interoperability

GigaSolution: deconstructing the paper

Combine and integrate (via citable DOIs):

- Open-access journal: www.gigasciencejournal.com
- Data Publishing Platform: gigadb.org
- Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk

[Figure: Linking papers to data and analyses. Labels: Open-Paper, Open-Data, Open-Pipelines, Open-Workflows, Data sets, Files, Analyses, GigaDB, "Linked to", 78GB CC0 data, DOI:10.5524/100038, DOI:10.5524/100044.]

GigaDB

- Home page: www.gigadb.org
- Datasets public in GigaDB
- Aspera data transfer: faster download speeds
- Example of a public GigaDB dataset: DOI 10.5524/100003, Genomic data from the crab-eating macaque/cynomolgus monkey (Macaca fascicularis) (2011)

GigaDB Submission Workflow

1. Submitter logs in to GigaDB website and uploads Excel submission file.
2. Validation checks.
   - Fail: submitter is provided error report.
   - Pass: dataset is uploaded to GigaDB.
3. Curator review.
4. DOI assigned.
5. Curator contacts submitter with DOI citation and to arrange file transfer (and resolve any other questions/issues).
6. Submitter provides files by ftp or Aspera.
7. XML is generated and registered with DataCite (DataCite XML file).
8. Curator makes dataset public (can be set as future date if required).

Acknowledgements

Thanks to: Laurie Goodman, Chris Hunter, Scott Edmunds, Tam Sneddon (GigaScience), Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi)

doi:10.6084/m9.figshare.xxxxx

Cite this poster as: GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis. Scott C. Edmunds, Peter Li, Huayan Gao, Chris Hunter, Si Zhe Zhao, Ruibang Luo, Dennis Chan, Alex Wong, Zhang Yong, Tin-Lap Lee, ISA-TAB team. figshare http://dx.doi.org/10.6084/m9.figsharexxxx

Financial support from:

Correspondence: jesse@gigasciencejournal.com

1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong.
6. Oxford e-Research Centre, University of Oxford, Oxford, UK.

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:

- No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy
- Article processing charges for all submissions in 2013 covered by BGI
- Open access, open data and highly visible work freely available for distribution
- Inclusion in PubMed and Google Scholar

© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
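The GigaDB submission workflow shown on this poster (Excel upload, validation checks with a pass or fail outcome, DOI assignment, file transfer, DataCite registration, public release) can be pictured as a simple sequence of stages. The sketch below is illustrative only and is not GigaDB's implementation: the stage names, the `REQUIRED_FIELDS` tuple, the `run_workflow` function and the example suffix are all hypothetical, and the ordering of stages is an assumption read from the poster's flow diagram. Only the DOI prefix 10.5524 comes from the poster.

```python
from __future__ import annotations

from enum import Enum, auto


class Stage(Enum):
    SUBMITTED = auto()       # Excel submission uploaded by the submitter
    REJECTED = auto()        # validation failed, error report returned
    UPLOADED = auto()        # validation passed, dataset loaded
    DOI_ASSIGNED = auto()    # after curator review
    FILES_RECEIVED = auto()  # files sent by ftp or Aspera
    REGISTERED = auto()      # metadata XML registered with DataCite
    PUBLIC = auto()          # curator releases the dataset


# Hypothetical required columns; the real template is on the GigaDB website.
REQUIRED_FIELDS = ("title", "dataset_type", "submitter_email")


def validate(submission: dict) -> list[str]:
    """Return an error report: one message per missing or empty field."""
    return [f"missing field: {name}"
            for name in REQUIRED_FIELDS if not submission.get(name)]


def run_workflow(submission: dict, next_suffix: str):
    """Walk one submission through the stages; returns (history, errors, doi)."""
    history = [Stage.SUBMITTED]
    errors = validate(submission)
    if errors:
        history.append(Stage.REJECTED)
        return history, errors, None
    history.append(Stage.UPLOADED)
    doi = f"10.5524/{next_suffix}"  # prefix as quoted on the poster
    history += [Stage.DOI_ASSIGNED, Stage.FILES_RECEIVED,
                Stage.REGISTERED, Stage.PUBLIC]
    return history, [], doi


if __name__ == "__main__":
    good = {"title": "Example genome", "dataset_type": "Genomic",
            "submitter_email": "someone@example.org"}
    print(run_workflow(good, "999999"))
    print(run_workflow({"title": "Incomplete"}, "999999"))
```

In a real system the later stages would each wait on a curator or submitter action; they are collapsed here to keep the pass and fail branches easy to follow.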