Tin-Lap, LEE School of Biomedical Sciences,

Presentation transcript:

CBIIT GigaGalaxy – A Galaxy-based Platform for Large-scale Genomics Analysis Tin-Lap, LEE School of Biomedical Sciences, CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Hong Kong SAR, China.

CBIIT Jointly established by The Chinese University of Hong Kong (CUHK) and BGI. “We aim to provide a platform conducive to training of multi-disciplinary talents conversant with the knowledge and application of genomics, proteomics, genetics, computational biology and bioinformatics, by capitalizing on both institutions’ expertise and strengths in genomic science.”

Big Data Translates into Big Opportunities... and Big Responsibilities

The challenges for biomedical scientists

http://galaxyproject.org/

CBIIT GigaGalaxy Highlights: Provides enhanced functionality in addition to the original Galaxy functions.
Specialized instances
Speed: local servers with SBS-UCSC genome database mirror in Hong Kong
Reproducibility: seamless integration with Taverna/myExperiment workflows
Data exchange and publishing: GigaScience journal portal/GigaDB
Customized functions and more…

CBIIT GigaGalaxy Benefits: Simplifies complicated bioinformatics tasks, accelerates data processing and allows flexible analysis. Significantly reduces software and hardware costs and encourages research collaboration.

Galaxy/CUHK-BGI http://www.cuhk.edu.hk/cbiit/galaxy.html

CBIIT GigaGalaxy Structure Tool Development Biomedical and bioinformatics research Publishing The first section of this talk is about the implementation of a public instance using the Galaxy Tool Shed. We are currently implementing the first public SOAP instance on the platform.

What is SOAP? SOAP is a tool package from BGI that provides a full solution for NGS data analysis. The SOAP package provides a set of tools for processing NGS data. There are different versions of SOAP for mapping short reads to reference sequences. There are also tools such as SOAPdenovo, for constructing a new genome sequence, and SOAPsnp, which can assemble a consensus sequence and identify the SNPs present on it relative to a reference. Documentation in the BGI SOAP package is limited in scope, making the tools difficult to use. We will be working with the BGI developers to provide test data and Galaxy pipelines demonstrating the use of SOAP. http://soap.genomics.org.cn/

Why SOAP? Galaxy has been using SAMtools for consensus sequence calling, but the recent upgrade left this part out, which is very limiting for some biologists. Besides SAMtools, SOAPsnp is the only other method that can call full consensus sequences. The main Galaxy site supports none of the SOAP tools, including SOAPsnp. Other than its popularity, another main reason to implement the SOAP tools is that …

Galaxy Tool Shed Enables sharing of Galaxy tools across Galaxy servers around the world. SOAP package tools configured for use in Galaxy. SOAPsnp/SOAPdenovo We make the command-line-based SOAP tools available in a Galaxy instance through the Galaxy Tool Shed. The Tool Shed is useful for wrapping any program through a Python wrapper. I should say the Galaxy team did a great job on this, and they were very helpful during the development process. By doing that.. It allows
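To make the wrapper idea concrete, here is a minimal sketch of how a Python wrapper might turn values collected from a web form into an argument list for a command-line program. It is not the actual SOAP wrapper from the talk: the function name build_command, the executable name example_tool and every flag shown are hypothetical and chosen only for illustration.

```python
import shlex
from typing import Dict, List, Union

ParamValue = Union[str, int, float, bool]


def build_command(executable: str, params: Dict[str, ParamValue]) -> List[str]:
    """Turn form values into an argument list for a command-line tool."""
    argv = [executable]
    for flag, value in params.items():
        if isinstance(value, bool):
            # Boolean options become bare flags and are omitted when False.
            if value:
                argv.append(flag)
        else:
            # Every other value is passed as "flag value".
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    # Hypothetical executable and flags, not real SOAP options.
    cmd = build_command(
        "example_tool",
        {"--input": "reads.fq", "--threads": 4, "--verbose": True},
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be handed to subprocess.run without shell quoting problems.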

NGS mapping: SOAP1

NGS mapping: SOAP2

SOAPsnp Notice that all the parameters have been transformed into drop-down menus. We also provide an explanation for each parameter, so that the user has a better understanding of each item.
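The drop-down idea can be sketched as a small data structure: each parameter carries a fixed set of options plus a help text, and any value outside the options is rejected. This is only an illustration under assumptions. The class SelectParam and the parameter name, options and help text in the example are hypothetical, not taken from the SOAPsnp wrapper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SelectParam:
    """A parameter restricted to a fixed list of choices, with a help text."""

    name: str
    help: str
    options: Tuple[str, ...]
    default: str

    def validate(self, value: str) -> str:
        # Reject anything that is not one of the offered choices.
        if value not in self.options:
            raise ValueError(
                f"{self.name}: {value!r} is not one of {list(self.options)}"
            )
        return value


if __name__ == "__main__":
    # Hypothetical parameter, for illustration only.
    ploidy = SelectParam(
        name="ploidy",
        help="Whether the sample is treated as haploid or diploid.",
        options=("haploid", "diploid"),
        default="diploid",
    )
    print(ploidy.validate("haploid"))
```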

SOAPpopindel

NGS De Novo Assembly: SOAPdenovo As with SOAPsnp, the complicated parameters and options have been transformed. The settings are recorded for each run, so that one can trace them back easily.
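Recording the settings of each run can be pictured with a very small log structure. The sketch below is an assumption-laden illustration, not a description of how Galaxy stores its histories: the functions record_run and find_runs, the entry fields and the parameter names are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def record_run(
    history: List[Dict[str, Any]], tool: str, params: Dict[str, Any]
) -> Dict[str, Any]:
    """Append one run, with a copy of its settings, to an in-memory history."""
    entry = {
        "run": len(history) + 1,
        "tool": tool,
        # Copy the settings so later edits to the form do not alter the record.
        "params": dict(params),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    history.append(entry)
    return entry


def find_runs(history: List[Dict[str, Any]], tool: str) -> List[Dict[str, Any]]:
    """Return every recorded run of the given tool, oldest first."""
    return [entry for entry in history if entry["tool"] == tool]


if __name__ == "__main__":
    history: List[Dict[str, Any]] = []
    # Hypothetical tool name and settings, for illustration only.
    record_run(history, "example_assembler", {"kmer_size": 31})
    record_run(history, "example_assembler", {"kmer_size": 45})
    for entry in find_runs(history, "example_assembler"):
        print(entry["run"], entry["params"])
```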

NGS De Novo Assembly: SOAPdenovo2

CBIIT GigaGalaxy structure Bioinformatics Development Biomedical and bioinformatics research Publishing So much for tool development. The second part of the talk focuses on workflow implementation using workflows from myExperiment.

How does it work? myExperiment: a repository for workflows. Taverna workflows. New: Galaxy workflows. CBIIT GigaGalaxy integration http://www.myexperiment.org
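As a rough illustration of what sharing a workflow involves, the sketch below reads a workflow description stored as JSON and lists its steps in order. The field names "steps", "name" and "tool_id" are assumed to resemble an exported Galaxy workflow file. The slides do not confirm the format, and the embedded example workflow and the function list_steps are hypothetical.

```python
import json
from typing import List, Tuple

# Hypothetical workflow description; the layout is an assumption.
EXAMPLE_WORKFLOW = """
{
  "name": "Example assembly workflow",
  "steps": {
    "1": {"name": "Assemble", "type": "tool", "tool_id": "example_assembler"},
    "0": {"name": "Input dataset", "type": "data_input", "tool_id": null}
  }
}
"""


def list_steps(workflow_json: str) -> List[Tuple[int, str, str]]:
    """Return (index, step name, tool id) for each step, sorted by index."""
    workflow = json.loads(workflow_json)
    steps = workflow.get("steps", {})
    # Step keys are strings, so sort them numerically rather than lexically.
    ordered = sorted(steps.items(), key=lambda item: int(item[0]))
    return [
        (int(key), step.get("name", ""), step.get("tool_id") or "(input)")
        for key, step in ordered
    ]


if __name__ == "__main__":
    for index, name, tool_id in list_steps(EXAMPLE_WORKFLOW):
        print(index, name, tool_id)
```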

Taverna workflow http://www.taverna.org.uk/

Galaxy workflow

Import (1)

Import (2)

Export (1)

Export (2)

SOAPdenovo2 Galaxy workflow

CBIIT GigaGalaxy structure Bioinformatics Development Biomedical and bioinformatics research Publishing What does semantic mean in the

Large-Scale Data Journal/Database www.gigasciencejournal.com Now launched… In conjunction with: Introduction to GigaScience, a journal published by BGI and BioMed Central that focuses on the publication of papers involving the analysis of large-scale omics data (show first issue slide). In addition, the journal focuses on making the experimental data and results published in its papers reproducible for readers. Data produced from post-genomic experiments can be stored in GigaScience's GigaDB database. It currently holds 37 data sets of mainly NGS data (show slide). Each data set is allocated a DOI (Digital Object Identifier), which uniquely identifies the data set, is used for its citation, and provides a handle for tracking its usage.
Editor-in-Chief: Laurie Goodman, PhD
Editor: Scott Edmunds, PhD
Commissioning Editor: Nicole Nogoy, PhD
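Because a DOI is a persistent identifier, a data set citation can always be turned into a resolvable link through the public doi.org resolver. The sketch below shows that conversion. The function doi_to_url and the DOI used in the example are hypothetical placeholders, not a real GigaDB identifier.

```python
def doi_to_url(doi: str) -> str:
    """Convert a DOI such as 'doi:10.1234/abc' into a doi.org resolver URL."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix and remove it.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # A DOI starts with the "10." directory indicator and has a prefix/suffix split.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # Placeholder DOI, for illustration only.
    print(doi_to_url("doi:10.1234/example-dataset"))
```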

GigaScience is go…

Data Publishing www.gigaDB.org

40 Datasets with DOI®s
Released pre-publication Non-BGI Paper in GigaScience
Invertebrate
- Ant
  - Florida carpenter ant
  - Jerdon’s jumping ant
  - Leaf-cutter ant
- Roundworm
- Schistosoma
- Silkworm
Vertebrates
- Giant panda
- Macaque
  - Chinese rhesus
  - Crab-eating
- Mini-Pig
- Naked mole rat
- Parrot
- Penguin
  - Emperor penguin
  - Adelie penguin
- Pigeon, domestic
- Polar bear
- Sheep
- Tibetan antelope
Plants
- Chinese cabbage
- Cucumber
- Foxtail millet
- Pigeonpea
- Potato
- Sorghum
Human
- Asian individual (YH) v1+v2
  - DNA Methylome
  - Genome Assembly
  - Transcriptome
- Cancer (14TB)
- Hep B infected exomes
- Single Cell Bladder Cancer
- Ancient DNA
  - Saqqaq Eskimo
  - Aboriginal Australian
Coming soon… Microbiome data
Microbes
- E. coli O104:H4 TY-2482
Cell-Line
- Chinese Hamster Ovary
Mouse Methylomes

GigaDB v2 export to CBIIT GigaGalaxy

How are we supporting data reproducibility? Data sets Linked to DOI Linked to GigaScience paper Analyses Community tools for data reproduction and reuse

CBIIT GigaGalaxy Big data from the “Sequencing Coal Face” Data, Data, Data… Data Modeling Pipeline design Validation Applications Tin-Lap Lee, CUHK

Acknowledgements
Lee Lab (CUHK): Huayan Gao
GigaScience: Scott Edmunds, Peter Li, Tam Sneddon
BGI-Hong Kong: Dennis Chan, Edmond Leung
Galaxy team: Nate Coraor
myExperiment: Finn Bacall, Dave De Roure
NBIC: Kostas Karasavvas
BGI-Shenzhen: Ruiqiang Li, Ruibang Luo, Haofu Wu, SOAP team members

Thank you