Edmunds GigaScience 2013 POSTER Open Access


GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis

Scott Edmunds (1,2), Peter Li (1,2), Huayan Gao (3,4), Ruibang Luo (2,5), Dennis Chan (1), Alex Wong (1), Zhang Yong (2), Tin-Lap Lee (3,4)

Abstract
Today's next generation sequencing (NGS) experiments generate substantially more data, and are more broadly applicable, than previous high-throughput genomic assays. Despite the plummeting costs of sequencing, downstream data processing and analysis create financial and bioinformatics challenges for many biomedical scientists. It is therefore important to make NGS data interpretation as accessible as data generation. GigaGalaxy (http://galaxy.cbiit.cuhk.edu.hk) is an NGS data interpretation solution to the big sequencing data challenge. We have ported the popular Short Oligonucleotide Analysis Package (http://soap.genomics.org.cn), as well as supporting tools such as Contiguator2 (http://contiguator.sourceforge.net), into the Galaxy framework to provide seamless NGS mapping, de novo assembly, NGS data format conversion and sequence alignment visualization. Our vision is to create an open publication, review and analysis environment by integrating GigaGalaxy into the publication platform at GigaScience and its GigaDB database, which links to more than 20 TB of genomic data. We have begun this effort by re-implementing the data procedures described by Luo et al. (GigaScience 1:18, 2012) as Galaxy workflows, so that they can be shared in a form that can be visualized and executed in GigaGalaxy. We hope to revolutionize the publication model with the aim of executable publications, where data analyses can be reproduced and reused.
Keywords: Galaxy, workflows, reproducible research, genome assembly, next generation sequencing, GigaScience

Background
Growing replication gap:
10/18 microarray papers cannot be reproduced
Ioannidis: “Most Published Research Findings Are False”
>15X increase in retracted papers in last decade
Lack of incentives to make data/methods available

Example: SOAPdenovo2
Linking papers to data and analyses
[Diagram labels: Open-Paper; Open-Data, DOI:10.5524/100038; Data sets, 78GB CC0 data; Linked to DOI; Open-Pipelines, DOI; Open-Workflows; Linked to Analyses, DOI:10.5524/100044]

GigaSolution: deconstructing the paper
Combine and integrate (via citable DOIs):
Open-access journal: www.gigasciencejournal.com
Data Publishing Platform: gigadb.org
Data Analysis Platform: galaxy.cbiit.cuhk.edu.hk
Implement paper pipelines in GigaGalaxy

GigaGalaxy: screenshot
Visualization of results, e.g. GAGE metrics and CONTIGuator 2 outputs: NC_010079.pdf, gi_161510924_ref_NC_010063.1_.pdf

References
Ioannidis et al. Repeatability of published microarray gene expression analyses. Nature Genetics 2009, 41: 14
Science publishing: The trouble with retractions. Nature 2011, 478:26-28.
Ioannidis J. Why Most Published Research Findings Are False. PLoS Med 2005, 2(8):e124.
Luo R et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 2012, 1:18.
Wang J et al. (2012): Updated genome assembly of YH: the first diploid genome sequence of a Han Chinese individual (version 2, 07/2012). GigaScience Database. http://dx.doi.org/10.5524/100038
Luo R et al. (2012): Software and supporting material for “SOAPdenovo2: An empirically improved memory-efficient short read de novo assembly”. GigaScience Database. http://dx.doi.org/10.5524/100044
Galardini et al. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes. Source Code for Biology and Medicine 2011, 6:11.
Acknowledgements
Thanks to: Laurie Goodman, Chris Hunter, Xiao Si Zhe, Tam Sneddon (GigaScience), Shaoguang Liang (BGI-SZ), Qiong Luo, Senghong Wang, Yan Zhou (HKUST), Rob Davidson and Mark Viant (Birmingham Uni), Marco Galardini (Unifi)

doi:10.6084/m9.figshare.713512

Cite this poster as: GigaGalaxy: A GigaSolution for reproducible and sustainable genomic data publication and analysis. Scott C. Edmunds, Peter Li, Huayan Gao, Ruibang Luo, Dennis Chan, Alex Wong, Zhang Yong, Tin-Lap Lee. figshare. http://dx.doi.org/10.6084/m9.figshare.713512

Financial support from:

Correspondence: scott@gigasciencejournal.com

1. BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong SAR, China.
2. BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen, China.
3. School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
4. CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China.
5. HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pok Fu Lam, Hong Kong

Submit your next manuscript containing large-scale data and workflows to GigaScience and take full advantage of:
No space constraints, and unlimited data and workflow hosting in GigaDB and GigaGalaxy
Article processing charges for all submissions in 2013 covered by BGI
Open access, open data and highly visible work freely available for distribution
Inclusion in PubMed and Google Scholar

© 2013 Edmunds et al. This is an Open Access poster distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.