HPC for large NGS data: Microbial diversity analysis

1 HPC for large NGS data: Microbial diversity analysis
Trust Odia, Covenant University Bioinformatics Research Group, Nigeria
WACREN e-Research Hackfest, Lagos (Nigeria)

2 Outline
- Introduction
- Motivation
- Proposed application architecture
- Data model (workflow)
- Implementation strategy

3 NGS sequencing: introduction
- Sequencing of whole genomes/exomes of organisms
- Large data generation (> 1,000 megabase pairs of sequence)
- Various sample sources for DNA extraction (blood, environment, etc.)
- Metagenomics: 16S rRNA gene sequencing to annotate unknown samples
- Functional prediction of genes
- Measuring microbial diversity and abundance
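As a rough illustration of the last point, one common way to summarise microbial diversity from 16S rRNA read assignments is observed richness together with the Shannon index, H' = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to taxon i. This is a minimal sketch that assumes reads have already been classified to taxa; the taxon labels are made-up example data, and the presentation does not say which diversity metrics the proposed application will use.

```python
import math
from collections import Counter


def relative_abundance(counts):
    """Convert raw per-taxon read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    # Taxa with zero reads are dropped: they contribute nothing to H'.
    return {taxon: n / total for taxon, n in counts.items() if n > 0}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa (natural log)."""
    props = relative_abundance(counts)
    return -sum(p * math.log(p) for p in props.values())


def observed_richness(counts):
    """Number of taxa with at least one assigned read."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical taxon assignments for the reads of one sample (illustration only).
assignments = ["Bacteroides", "Prevotella", "Bacteroides", "Escherichia",
               "Prevotella", "Bacteroides", "Faecalibacterium", "Bacteroides"]
counts = Counter(assignments)

print(observed_richness(counts))          # 4
print(round(shannon_index(counts), 3))    # 1.213
```

Higher H' means reads are spread more evenly across more taxa; a sample dominated by a single taxon scores close to 0.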

4 Huse et al., 2014. BMC Bioinformatics 15:41

5 Motivation: problems encountered
- Bioinformatics analysis: high-performance computing needs (programs crash when they run out of memory)
- User analysis: flexibility (ease of use)
- Sequence data issues: workarounds needed
- No sample metadata (information)

6 Proposed Application architecture

7 Data model (data flow)

8 Tool stack (technology)
- Linux-based tools: several installed programs (Python, Perl, Java runtime)
- Available supplementary materials: standard operating procedures (SOPs) in the documentation; a collection of scripts
- Tools under consideration (workflow-based):
  - BioBlend (Sloggett et al., Bioinformatics)
  - Docker, to wrap tools into a reproducible environment
  - Nextflow, to automate the workflow
  - Ruffus, to automate the workflow
  - CWL (Common Workflow Language) and YAML
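The workflow tools listed above all revolve around running analysis steps in dependency order and running independent steps in parallel. The sketch below illustrates only that idea using Python's standard-library `graphlib` (Python 3.9+); it does not use the Nextflow, Ruffus or CWL APIs, and the step names are hypothetical 16S analysis stages, not the presenter's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical 16S analysis steps: each step maps to the set of steps it depends on.
# "raw_reads" and "sample_metadata" are inputs with no dependencies of their own.
PIPELINE = {
    "quality_filter": {"raw_reads"},
    "merge_pairs": {"quality_filter"},
    "remove_chimeras": {"merge_pairs"},
    "cluster_otus": {"remove_chimeras"},
    "assign_taxonomy": {"cluster_otus"},
    "build_tree": {"cluster_otus"},
    "diversity_analysis": {"assign_taxonomy", "build_tree", "sample_metadata"},
}


def run_order(graph):
    """Return one valid serial execution order (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(graph).static_order())


def parallel_batches(graph):
    """Group steps into batches; every step in a batch could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock the next batch
    return batches


for batch in parallel_batches(PIPELINE):
    print(batch)
```

The batch view shows, for example, that taxonomy assignment and tree building can be scheduled side by side once OTU clustering finishes, and that diversity analysis cannot start without sample metadata, which is one of the data issues raised under Motivation.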

9 High-performance computing (HPC)
- Hardware: 64-bit Linux OS, > 20 GB RAM, > 4 CPU cores, > 20 TB hard disk storage
- Job scheduling/management
- Memory allocation
- Data storage
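Because the motivation for HPC here was programs crashing from lack of memory, a simple pre-flight check before launching a job could compare a machine against the hardware thresholds on this slide. This is a minimal sketch under stated assumptions: the slide's GB/TB figures are read as binary GiB/TiB, "> 4 CPU cores" is taken literally as strictly more than four, and `probe_system` is a hypothetical helper that reads the local machine via `os.sysconf` (Linux/Unix only).

```python
import os
import shutil
from dataclasses import dataclass

GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass(frozen=True)
class Requirements:
    """Thresholds from the slide (assumed to be binary units); adjust per dataset."""
    min_ram_bytes: int = 20 * GIB
    min_cpu_cores: int = 4
    min_disk_bytes: int = 20 * TIB


def check(ram_bytes, cpu_cores, disk_free_bytes, req=Requirements()):
    """Return a list of shortfalls; an empty list means the machine qualifies.

    Comparisons are strict (">") to mirror the slide's wording.
    """
    problems = []
    if not ram_bytes > req.min_ram_bytes:
        problems.append(f"RAM {ram_bytes / GIB:.1f} GiB is not > {req.min_ram_bytes / GIB:.0f} GiB")
    if not cpu_cores > req.min_cpu_cores:
        problems.append(f"{cpu_cores} CPU cores is not > {req.min_cpu_cores}")
    if not disk_free_bytes > req.min_disk_bytes:
        problems.append(f"free disk {disk_free_bytes / TIB:.2f} TiB is not > {req.min_disk_bytes / TIB:.0f} TiB")
    return problems


def probe_system(path="."):
    """Hypothetical helper: read physical RAM, core count and free disk on Linux/Unix."""
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    cores = os.cpu_count() or 1
    free = shutil.disk_usage(path).free
    return ram, cores, free


# Example with made-up numbers: a 16 GiB, 8-core node with 30 TiB free disk.
print(check(16 * GIB, 8, 30 * TIB))  # ['RAM 16.0 GiB is not > 20 GiB']
```

On a real node, `check(*probe_system(path))` would run the same comparison against the live machine, with `path` pointing at the data volume.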

10 Existing system (something similar): Afgan et al., 2016. Nucleic Acids Research.

11 Existing job management system (something similar): Brown et al., PLoS One
Figure panels: the tool creation interface; the workflow creation interface; workflow patterns; workflows; tool versioning interface.

12 Existing system (something similar): Brown et al., PLoS One
Figure panels: the tool creation interface; the workflow creation interface; workflow patterns; workflows; tool versioning interface.

13 Thank you!
