HPC for large NGS data: Microbial diversity analysis

Presentation transcript:

HPC for large NGS data: Microbial diversity analysis. Trust Odia, Covenant University Bioinformatics Research Group, Nigeria (trust.odia@covenantuniversity.edu.ng). WACREN e-Research Hackfest, Lagos (Nigeria).

Outline: Introduction, Motivation, Proposed Application Architecture, Data Model (workflow), Implementation Strategy.

Introduction: NGS (next-generation sequencing). Sequencing of the whole genome or exome of organisms. Generates large volumes of data (> 1000 megabase pairs of sequence). Samples come from DNA extracted from various sources (blood, environment, etc.). Metagenomics: 16S rRNA gene sequencing to annotate unknown samples; functional prediction of genes; measuring the abundance of microbial diversity.
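The slides do not say which diversity measures the pipeline reports. As a minimal sketch, assuming per-sample read counts per taxon (OTU) are already available, the code below computes relative abundance, observed richness and the Shannon index H' = -Σ p_i ln p_i, one common choice among several. The OTU names and counts are hypothetical.

```python
import math
from collections import Counter


def relative_abundance(counts):
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def observed_richness(counts):
    """Number of taxa (e.g. OTUs) detected with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping taxa with zero reads."""
    proportions = relative_abundance(counts)
    return -sum(p * math.log(p) for p in proportions.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical OTU counts for one sample, e.g. tallied from 16S read assignments.
    sample = Counter({"OTU_1": 50, "OTU_2": 30, "OTU_3": 20, "OTU_4": 0})
    print(relative_abundance(sample))
    print(observed_richness(sample))
    print(round(shannon_index(sample), 4))
```

Zero-count taxa are skipped in the Shannon sum because p ln p tends to 0 as p tends to 0, so they contribute nothing to H'.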

Huse et al., 2014. BMC Bioinformatics 15:41.

Motivation: problems encountered in bioinformatics analysis. High Performance Computing: programs crash when they run out of memory. User analysis: flexibility (ease of use). Sequence data issues and their workarounds. Missing sample metadata (information).
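The slides list memory crashes as a problem but not how they were addressed. One common mitigation, shown here as a sketch rather than the author's method, is to stream FASTQ files one record at a time instead of loading every read into memory. The function names are hypothetical; only the standard library is used.

```python
import gzip
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one 4-line FASTQ record at a time, so memory use stays constant."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize(records: Iterable[FastqRecord]) -> dict:
    """Keep only running totals, never the reads themselves."""
    n_reads = 0
    n_bases = 0
    for rec in records:
        n_reads += 1
        n_bases += len(rec.sequence)
    return {"reads": n_reads, "bases": n_bases}


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGC\n+\nIII\n")
    print(summarize(read_fastq(demo)))
```

For a real file, `summarize(read_fastq(open_fastq("sample.fastq.gz")))` would walk the whole file while holding only one record in memory; the file name is a placeholder.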

Proposed Application Architecture

Data Model (Data Flow)

Tool Stack (technology): Linux-based tools. Several installed programs: Python, Perl, Java runtime. Available supplementary materials: Standard Operating Procedures (SOPs) in the documentation, and a collection of scripts. Tools under consideration (workflow-based): BioBlend (Sloggett et al., 2013. Bioinformatics 29(13): 1685-1686); Docker to wrap tools into an environment; Nextflow to automate the workflow; Ruffus to automate the workflow; CWL (Common Workflow Language) and YAML.
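Nextflow, Ruffus and CWL all describe a pipeline as steps with dependencies and run each step only after its inputs exist. The sketch below illustrates that idea in plain Python with `graphlib` (Python 3.9+); it is not the API of any of those tools, and the 16S step names are hypothetical placeholders for real tool invocations.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Step = Callable[[dict], None]


def make_step(name: str, output: str) -> Step:
    """Build a placeholder step; a real step would call a tool and write a file."""
    def step(ctx: dict) -> None:
        ctx.setdefault("log", []).append(name)
        ctx[name] = output
    return step


def run_workflow(steps: Dict[str, Step], depends_on: Dict[str, Set[str]], ctx: dict) -> List[str]:
    """Run every step after all of its dependencies; return the execution order."""
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name](ctx)
    return order


# Hypothetical 16S workflow: each key lists the steps that must finish first.
DEPENDS_ON = {
    "quality_filter": set(),
    "dereplicate": {"quality_filter"},
    "cluster_otus": {"dereplicate"},
    "assign_taxonomy": {"cluster_otus"},
    "diversity": {"cluster_otus", "assign_taxonomy"},
}
STEPS = {name: make_step(name, f"{name}.out") for name in DEPENDS_ON}

if __name__ == "__main__":
    context: dict = {}
    print(run_workflow(STEPS, DEPENDS_ON, context))
```

Dedicated workflow engines add what this sketch omits: skipping steps whose outputs are already up to date, running independent steps in parallel, and submitting steps to a cluster scheduler.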

High-performance computing (HPC). Hardware: 64-bit Linux OS, > 20 GB RAM, > 4 CPU cores, > 20 TB hard disk storage. Job scheduling/management, memory allocation, data storage.
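The slides do not name a job scheduler. Assuming a SLURM cluster (an assumption, not stated in the source), job scheduling and memory allocation come down to requesting CPUs, memory and wall time per job. This sketch generates a batch script from those values; `JobRequest`, `slurm_script` and `run_16s_pipeline.sh` are hypothetical names, and the defaults simply echo the hardware figures on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobRequest:
    name: str
    command: str
    cpus: int = 4      # illustrative default, echoing the "> 4 CPU cores" figure
    mem_gb: int = 20   # illustrative default, echoing the "> 20GB RAM" figure
    hours: int = 24
    log: str = "%x_%j.log"  # sbatch expands %x to the job name and %j to the job ID


def slurm_script(job: JobRequest) -> str:
    """Render a SLURM batch script that requests the resources in `job`."""
    if job.cpus < 1 or job.mem_gb < 1 or job.hours < 1:
        raise ValueError("cpus, mem_gb and hours must be positive")
    lines: List[str] = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --cpus-per-task={job.cpus}",
        f"#SBATCH --mem={job.mem_gb}G",
        f"#SBATCH --time={job.hours:02d}:00:00",
        f"#SBATCH --output={job.log}",
        "set -euo pipefail",  # stop the job on the first failing command
        job.command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical pipeline script; replace with the real analysis command.
    print(slurm_script(JobRequest("otu_clustering", "bash run_16s_pipeline.sh", cpus=8, mem_gb=32)))
```

The generated text would be saved to a file and submitted with `sbatch`; requesting memory explicitly lets the scheduler place the job on a node with enough RAM, which addresses the memory crashes mentioned earlier.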

Existing system (something similar): Afgan et al., 2016. Nucleic Acids Research.

Existing job management system (something similar): Brown et al., 2015. PLoS One. Figures: the tool creation interface; the workflow creation interface; workflow patterns; workflows; the tool versioning interface.

Existing system (something similar): Brown et al., 2015. PLoS One. Figures: the tool creation interface; the workflow creation interface; workflow patterns; workflows; the tool versioning interface.

Thank you! sci-gaia.eu, info@sci-gaia.eu