1
Queensland Parallel Supercomputing Foundation
Computational Biology and Bioinformatics Environment (ComBinE)
National Facility Projects
1. Professor Mark Ragan (Institute for Molecular Bioscience)
2. Dr Thomas Huber (Department of Mathematics)
2
Comparison of protein families among completely sequenced microbial genomes
The scientific problem:
–Handcrafted analyses suggest that gene transfer in nature may occur not only from parents to offspring (“vertical”) but also from one lineage to another (“lateral” or “horizontal”)
–Microbial genomics provides complete inventories of genes and proteins in ~80 genomes
–Comparative analysis should identify all cases of vertical and lateral gene transfer
3
The approach, with the computational requirement for 80 genomes:
–Find all interestingly large protein families in all microbial genomes: 10^12 BLAST comparisons
–Generate structure-sensitive multiple alignments: 5000 T-Coffee alignments
–Infer phylogenetic trees with appropriate statistics: 5000 Bayesian inference trees
–Compare trees, look for topological incongruence: 10^7 topological comparisons
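The figure of roughly 10^7 topological comparisons is consistent with comparing every pair among 5000 trees. The sketch below checks that arithmetic and illustrates one common incongruence measure, the Robinson-Foulds distance (the number of bipartitions found in one tree but not the other). Whether the project used this measure is not stated on the slide; the function names and the five-taxon example are illustrative assumptions.

```python
from math import comb


def pairwise_comparisons(n_trees: int) -> int:
    """Number of unordered tree pairs, assuming every tree is compared with every other."""
    return comb(n_trees, 2)


def normalise_split(side, taxa):
    """Represent a bipartition by the side that excludes the smallest taxon label."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds distance: size of the symmetric difference of two split sets."""
    return len(set(splits_a) ^ set(splits_b))


# Hypothetical five-taxon example: two trees that agree on ABC|DE only.
TAXA = "ABCDE"
tree_1 = {normalise_split("AB", TAXA), normalise_split("ABC", TAXA)}
tree_2 = {normalise_split("AC", TAXA), normalise_split("ABC", TAXA)}

print(pairwise_comparisons(5000))   # 12497500, i.e. about 1.2 x 10^7
print(rf_distance(tree_1, tree_2))  # 2
```

A distance of zero means the two topologies are identical; larger values flag candidate families for closer inspection.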
4
Computations on APAC National Facility
Motif-based multiple alignment
–30-50 sequences = 2-5 hours per run
–Will need ~5000 runs at 4-60 sequences each
Bayesian inference
–Parameterisation of (MC)^3 search
–NF used for trials of up to 10^6 Markov chain generations (~200 hours per run)
–1.5-2.0 GB RAM per run
Usage of NF:
–Code not yet parallelised
–With each run costing a few tens of hours and thousands of analyses needed, it is more efficient to use many processors simultaneously
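The case for running many serial jobs side by side can be made with simple arithmetic. The sketch below estimates wall-clock time when independent runs are distributed over processors in batches. The run count and per-run hours come from the slide; the processor count of 100 is purely an illustrative assumption, as is the function name.

```python
import math


def wall_clock_hours(n_runs: int, hours_per_run: float, n_processors: int) -> float:
    """Wall-clock time for independent serial runs, one run per processor at a time."""
    batches = math.ceil(n_runs / n_processors)
    return batches * hours_per_run


# ~5000 alignment runs at the upper estimate of 5 hours each (figures from the slide).
serial = wall_clock_hours(5000, 5, 1)        # one processor
farmed = wall_clock_hours(5000, 5, 100)      # 100 processors (assumed number)

print(serial)  # 25000 hours, close to three years
print(farmed)  # 250 hours, about ten days
```

Because the runs do not communicate, the speed-up is limited only by the number of processors available and by the longest single run.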
5
Parameterisation of Metropolis-coupled Markov chain Monte Carlo optimisation through protein tree space
Figure: log-likelihood as a function of number of Markov chain generations
Approach to stationarity under the Jones et al. (1992) and general time-reversible models of protein sequence change
Bayesian inference (MrBayes 2.0) applied to a 34-sequence Elongation Factor 1 dataset: eight simultaneous Markov chains, discrete approximation of the gamma distribution (shape parameter α = 0.29), chain temperature 0.1000
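For readers unfamiliar with Metropolis-coupled MCMC, a minimal sketch follows. It runs eight chains with temperature 0.1, as on the slide, but on a toy one-dimensional bimodal density rather than on tree space. The heating rule, heat_i = 1 / (1 + T * i) with chain 0 as the cold chain, is the incremental scheme described in MrBayes documentation; that version 2.0 used exactly this rule is an assumption here, and all function names are illustrative.

```python
import math
import random


def chain_heats(n_chains: int, temperature: float) -> list:
    """Incremental heating: chain 0 is cold (heat 1), later chains are flatter."""
    return [1.0 / (1.0 + temperature * i) for i in range(n_chains)]


def log_target(x: float) -> float:
    """Toy unnormalised log-density: two unit-variance modes at -3 and +3."""
    a, b = -0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng: random.Random, log_ratio: float) -> bool:
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_generations, n_chains=8, temperature=0.1, step=1.0, seed=0):
    rng = random.Random(seed)
    heats = chain_heats(n_chains, temperature)
    states = [0.0] * n_chains
    cold_samples = []
    for _ in range(n_generations):
        # Within-chain Metropolis update against the heated target pi(x)^heat.
        for i, heat in enumerate(heats):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(rng, heat * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (heats[i] - heats[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(rng, log_swap):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is sampled
    return cold_samples


heats = chain_heats(8, 0.1)
samples = mc3(2000)
```

The heated chains cross the valley between modes more easily, and accepted swaps pass those states down to the cold chain, which is what helps the search approach stationarity.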
6
With thanks to collaborators
–Mark Borodovsky, Georgia Tech
–Robert Charlebois, NGI Inc. (Ottawa)
–Tim Harlow, University of Queensland
–Jeffrey Lawrence, University of Pittsburgh
–Thomas Rand, St Mary’s University
7
Computational Biology and Bioinformatics Environment (ComBinE)
National Facility Projects
1. Professor Mark Ragan (Institute for Molecular Bioscience)
2. Dr Thomas Huber (Department of Mathematics)
9
Protein Structure Prediction
Two Lineages
The bioinformatics approach
–Compare sequence to other sequence; huge datasets (0.5*10^6 sequences)
–Match sequence with known structure (low resolution force field development)
The biophysics approach
–Simulations that mimic natural behaviour
Hardware Requirements:
–Compare sequence to other sequence: CPU minutes/seq, Mem 1 GB
–Match sequence with known structure: CPU hours/seq, Mem 100s MB
–Simulations that mimic natural behaviour: CPU 100s hours, Mem 10s MB
10
Protein Structure Prediction
Two Lineages
Parallelism:
–Bioinformatics approach, compare sequence to other sequence: trivially parallel
–Bioinformatics approach, match sequence with known structure: trivially parallel
–Biophysics approach, simulations that mimic natural behaviour: hard to parallelise; requires high bandwidth and low latency
11
MD Simulation: Propagating Molecular Models in Time
–Start with old system state
–Add information on energy and force
–Apply numerical integrator
–New system state
Mechanical description: Newton’s laws of motion
–Time step required: 10^-15 s
–Time scale wanted: >10^-3 s
Force splitting and multiple time step integration (Ian Lenane)
–System is split into different domains
–Fast-varying forces (cheap to calculate) are integrated more frequently
–Slow-varying forces (expensive to calculate) are integrated less frequently
+More efficient integration
+Easy to expand to parallel simulations
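The force-splitting idea can be sketched with a reversible multiple time step scheme of the r-RESPA type: the slow force is applied as a half-kick at the start and end of each outer step, and the fast force is integrated with velocity Verlet over several inner steps in between. This is a generic textbook scheme, not necessarily the one developed in the project; the one-dimensional two-spring system and all constants are illustrative assumptions.

```python
K_FAST = 100.0  # stiff spring: fast-varying, cheap force (illustrative value)
K_SLOW = 1.0    # soft spring: slow-varying, expensive force (illustrative value)
MASS = 1.0


def fast_force(x: float) -> float:
    return -K_FAST * x


def slow_force(x: float) -> float:
    return -K_SLOW * x


def total_energy(x: float, v: float) -> float:
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


def respa_step(x: float, v: float, dt: float, n_inner: int):
    """One outer step: slow half-kick, n_inner fast Verlet steps, slow half-kick."""
    v += 0.5 * dt * slow_force(x) / MASS
    h = dt / n_inner
    for _ in range(n_inner):
        v += 0.5 * h * fast_force(x) / MASS
        x += h * v
        v += 0.5 * h * fast_force(x) / MASS
    v += 0.5 * dt * slow_force(x) / MASS
    return x, v


def simulate(n_steps: int, dt: float = 0.05, n_inner: int = 10):
    x, v = 1.0, 0.0
    for _ in range(n_steps):
        x, v = respa_step(x, v, dt, n_inner)
    return x, v


x_end, v_end = simulate(1000)
drift = abs(total_energy(x_end, v_end) - total_energy(1.0, 0.0)) / total_energy(1.0, 0.0)
print(f"relative energy drift after 1000 outer steps: {drift:.2e}")
```

Here the slow force is evaluated twice per outer step while the fast force is evaluated twenty times, which is where the saving comes from when the slow force dominates the cost.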
12
Path simulations (Ben Gladwin)
What if start and end points are given?
–Proteins: unfolded → folded
–Molecular machines: 1 cycle
Shortest path calculations
–Floyd, Dijkstra
Hamilton’s principle of least action
+Computationally very attractive
+Extremely long time steps
+Very well suited for parallel architectures (Floyd algorithm parallelised, but performance problems with >4 PEs on -GS NUMA architecture)
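As a reminder of what the Floyd algorithm computes, here is a minimal serial sketch of all-pairs shortest paths (Floyd-Warshall). The four-node weight matrix is a made-up example, not data from the project, and the parallel version mentioned on the slide is not reproduced here.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """All-pairs shortest path lengths for a dense weight matrix (INF = no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):            # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical directed graph on four nodes.
weights = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]

for row in floyd_warshall(weights):
    print(row)
```

For a fixed k, the updates for different (i, j) pairs are independent of one another, which is why the algorithm lends itself to parallel execution. Each processor still needs row k and column k at every iteration, so communication cost grows with processor count.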
13
National Facility supercomputer use
2001 CPU quota: 2*5250 + 8000 service units
–Total use 12000 units (3000 units in parallel)
2002 CPU quota: 4*6000 service units
–First quarter: 2000 units
–Second quarter: 85 units
Collaborators
–Dr A. Torda (ANU): low resolution force fields / protein structure prediction
–Prof. D. Hume, A/Prof. B. Kobe and Dr J. Martin (UQ): structural genomics project
–Prof. K. Burrage, I. Lenane and B. Gladwin (UQ): numerical integration and path simulations
Special thanks
–Mrs J. Jenkinson and Dr D. Singleton (NF/ANUSF)