HPC for large NGS data: Microbial diversity analysis
Trust Odia, Covenant University Bioinformatics Research Group, Nigeria (trust.odia@covenantuniversity.edu.ng)
WACREN e-Research Hackfest, Lagos (Nigeria)
Outline
- Introduction
- Motivation
- Proposed application architecture
- Data model (workflow)
- Implementation strategy
NGS: Introduction
- Whole genome/exome sequencing of organisms
- Large data generation (> 1000 megabase pairs of sequence)
- Various sample sources for DNA extraction (blood, environment, etc.)
- Metagenomics: 16S rRNA gene
- Annotating unknown samples
- Functional prediction of genes
- Measuring microbial abundance and diversity
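To make the last point concrete, here is a minimal sketch of turning per-read taxon assignments into relative abundances and a single diversity value. The Shannon index is an assumption chosen for illustration (the slides do not name a metric), and the sample assignments are hypothetical.

```python
import math
from collections import Counter


def relative_abundance(counts):
    """Return each taxon's share of the total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to summarise")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with nonzero counts."""
    shares = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in shares if p > 0)


# Hypothetical 16S read assignments for one sample (illustrative only).
assignments = ["Bacteroides", "Bacteroides", "Prevotella", "Escherichia"]
counts = Counter(assignments)
abundance = relative_abundance(counts)
diversity = shannon_index(counts)
```

A sample dominated by one taxon gives a value near zero, while evenly spread taxa give a higher value, so the number can be compared across samples processed the same way.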
Huse et al., 2014. BMC Bioinformatics. 15. 41
Motivation from encountered problems
- Bioinformatics analysis
- High-performance computing (programs crashing on memory)
- User analysis
- Flexibility (ease of use)
- Sequence data issues: workarounds
- No sample metadata (information)
Proposed application architecture
Data Model (Data Flow)
Tool stack (technology)
- Linux-based tools
- Several installed programs: Python, Perl, Java runtime
- Available supplementary materials
- Standard operating procedures (SOPs) in documentation
- A bunch of scripts
- Tools I am thinking of (workflow-based)
- BioBlend: Sloggett et al., 2013. Bioinformatics 29(13): 1685-1686
- Docker to wrap tools into an environment
- Nextflow to automate the workflow
- Ruffus to automate the workflow
- CWL and YAML
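What "automating the workflow" amounts to can be sketched with the Python standard library alone: declare which step depends on which, then run the steps in dependency order. This is not the Nextflow, Ruffus or CWL API, only an illustration of the idea; the step names and placeholder actions are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical steps: each key lists the steps that must finish before it.
DEPENDENCIES = {
    "quality_filter": set(),
    "cluster_sequences": {"quality_filter"},
    "assign_taxonomy": {"cluster_sequences"},
    "diversity_report": {"assign_taxonomy"},
}


def run_workflow(dependencies, actions):
    """Run every step after its prerequisites and return the order used."""
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for step in order:
        # Hand each step the outputs of the steps it depends on.
        inputs = {dep: results[dep] for dep in dependencies[step]}
        results[step] = actions[step](inputs)
    return order, results


# Placeholder actions that only return a label; a real step would call an
# installed tool or a container instead.
ACTIONS = {
    name: (lambda inputs, name=name: f"{name} done") for name in DEPENDENCIES
}

order, results = run_workflow(DEPENDENCIES, ACTIONS)
```

A dedicated workflow engine adds what this sketch leaves out, such as resuming after failures, parallel execution and container support.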
High-performance computing (HPC)
- Hardware: 64-bit Linux OS, > 20 GB RAM, > 4 CPU cores, > 20 TB hard disk
- Job scheduling/management
- Memory allocation
- Data storage
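The hardware minimums above could be checked before a job is submitted. The sketch below assumes a Linux host and uses decimal units (1 GB = 10^9 bytes); the function names `find_shortfalls` and `probe_host` are hypothetical helpers, not part of any scheduler.

```python
import os
import shutil

# Minimums taken from the slide: > 20 GB RAM, > 4 CPU cores, > 20 TB disk.
MIN_RAM_GB = 20
MIN_CORES = 4
MIN_DISK_TB = 20


def find_shortfalls(ram_gb, cores, disk_tb):
    """Return the names of the requirements that are not met."""
    shortfalls = []
    if ram_gb <= MIN_RAM_GB:
        shortfalls.append("ram")
    if cores <= MIN_CORES:
        shortfalls.append("cores")
    if disk_tb <= MIN_DISK_TB:
        shortfalls.append("disk")
    return shortfalls


def probe_host(path="/"):
    """Read RAM (GB), core count and disk size (TB) of the current host."""
    # Physical memory = page size * number of physical pages (Linux sysconf).
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).total
    return ram_bytes / 1e9, os.cpu_count() or 0, disk_bytes / 1e12
```

Calling `find_shortfalls(*probe_host())` on a compute node returns an empty list when all three minimums are exceeded, which a submission script could use to refuse to start an analysis that would otherwise crash on memory.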
Existing system: something similar
Afgan et al., 2016. Nucleic Acids Research.
Existing job management system: something similar
Brown et al., 2015. PLoS One.
- The tool creation interface
- The workflow creation interface
- Workflow patterns
- Workflows
- Tool versioning interface
Existing system: something similar
Brown et al., 2015. PLoS One.
- The tool creation interface
- The workflow creation interface
- Workflow patterns
- Workflows
- Tool versioning interface
Thank you! sci-gaia.eu info@sci-gaia.eu