Queensland Parallel Supercomputing Foundation 1. Professor Mark Ragan (Institute for Molecular Bioscience) 2. Dr Thomas Huber (Department of Mathematics)

Presentation transcript:

Queensland Parallel Supercomputing Foundation
Computational Biology and Bioinformatics Environment (ComBinE)
National Facility Projects
1. Professor Mark Ragan (Institute for Molecular Bioscience)
2. Dr Thomas Huber (Department of Mathematics)

Comparison of protein families among completely sequenced microbial genomes

The scientific problem:
- Handcrafted analyses suggest that gene transfer in nature may occur not only from parents to offspring ("vertical") but also from one lineage to another ("lateral" or "horizontal")
- Microbial genomics gives us complete inventories of genes and proteins in ~80 genomes
- Comparative analysis should identify all cases of vertical and lateral gene transfer

The approach
- Find all interestingly large protein families in all microbial genomes
- Generate structure-sensitive multiple alignments
- Infer phylogenetic trees with appropriate statistics
- Compare trees, look for topological incongruence

Computational requirement for 80 genomes:
- [...] BLAST comparisons
- 5000 T-Coffee alignments
- 5000 Bayesian inference trees
- 10^7 topological comparisons
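The last step of the approach, comparing trees for topological incongruence, can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples of leaf names and scores incongruence as the number of clades found in only one of the two trees; the slides do not say which comparison measure was used, so `clades`, `clade_distance` and `n_pairwise` are hypothetical names chosen for illustration. The pair count is one possible reading of the 10^7 figure: comparing each of 5000 trees against every other gives about 1.25 x 10^7 comparisons.

```python
def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):          # a leaf is just its name
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)                  # the clade rooted at this internal node
    return leaves, found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees (0 = same topology)."""
    _, a = clades(tree_a)
    _, b = clades(tree_b)
    return len(a ^ b)


def n_pairwise(n_trees):
    """Number of unordered pairs among n_trees trees."""
    return n_trees * (n_trees - 1) // 2


if __name__ == "__main__":
    t1 = (("A", "B"), ("C", "D"))
    t2 = (("A", "C"), ("B", "D"))
    print(clade_distance(t1, t2))      # incongruent topologies give a non-zero score
    print(n_pairwise(5000))            # 12497500
```

Each pairwise comparison is independent of the others, so this step can be spread over as many processors as are available.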

Computations on APAC National Facility

Motif-based multiple alignment
- [...] sequences = 2-5 hours per run
- Will need ~[...] seqs

Bayesian inference
- Parameterisation of (MC)^3 search
- NF used for trials of up to 10^6 Markov chain generations (~200 hours / run)
- [...] GB RAM per run

Usage of NF:
- Code not yet parallelised
- With each run costing a few tens of hours and thousands of analyses needed, it is more efficient to use many processors simultaneously

Parameterisation of Metropolis-coupled Markov chain Monte Carlo optimisation through protein tree space

Log-likelihood as a function of number of Markov chain generations
Approach to stationarity under Jones et al. (1992) and general time-reversible models of protein sequence change

Bayesian inference (MrBayes 2.0) applied to 34-sequence Elongation Factor 1α dataset. Eight simultaneous Markov chains, discrete approximation of gamma distribution (α = 0.29), chain temperature [...]
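A minimal sketch of Metropolis-coupled MCMC, (MC)^3, may help show what the chain count and chain temperature control. It is not the MrBayes implementation: the one-dimensional bimodal `log_target` is a hypothetical stand-in for a tree log-likelihood, the incremental heating formula beta_i = 1 / (1 + T * i) is assumed, and the default `temperature=0.2` is an illustrative value because the value used in the slides is not given. Only the number of chains (eight) comes from the slide.

```python
import math
import random


def log_target(x):
    """Toy bimodal log-density (assumption: stands in for a tree log-likelihood)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)                      # log-sum-exp for numerical safety
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def heats(n_chains, temperature):
    """Incremental heating: chain 0 is the cold chain (beta = 1)."""
    return [1.0 / (1.0 + temperature * i) for i in range(n_chains)]


def accept(rng, log_ratio):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_generations, n_chains=8, temperature=0.2, step=1.0, seed=1):
    rng = random.Random(seed)
    betas = heats(n_chains, temperature)
    states = [0.0] * n_chains
    cold_samples = []
    for _ in range(n_generations):
        # Each chain takes one Metropolis step on its flattened target pi(x)**beta.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(rng, beta * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # Propose exchanging the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(rng, log_swap):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is sampled
    return cold_samples


if __name__ == "__main__":
    samples = mc3(5000)
    print(sum(log_target(s) for s in samples) / len(samples))
```

Heated chains see a flattened landscape and cross between modes more easily; swaps pass those states down to the cold chain. Plotting the cold chain's log-likelihood against generation number gives the kind of stationarity trace described on this slide.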

With thanks to collaborators
- Mark Borodovsky, Georgia Tech
- Robert Charlebois, NGI Inc. (Ottawa)
- Tim Harlow, University of Queensland
- Jeffrey Lawrence, University of Pittsburgh
- Thomas Rand, St Mary's University


Protein Structure Prediction: Two Lineages

The bioinformatics approach
- Compare sequence to other sequence
- Huge datasets (0.5*10^6 sequences)
- Match sequence with known structure
- (Low resolution force field development)

The biophysics approach
- Simulations that mimic natural behaviour

Hardware Requirements:
- CPU: minutes/seq, Mem: ≈1 GB
- CPU: hours/seq, Mem: ≈100s MB
- CPU: 100s hours, Mem: 10s MB

Parallelism:
- Trivial parallel
- Trivial parallel
- Hard parallel (high bandwidth + low latency requirement)

MD Simulation: Propagating Molecular Models in Time

Mechanical description: Newton's Laws of Motion
Start with old system state → add information on energy and force → apply numerical integrator → new system state
- Time step required: [...] s
- Time scale wanted: >10^-3 s

Force splitting and multiple time step integration (Ian Lenane)
- System is split into different domains
- Fast-varying forces (cheap to calculate) are integrated more frequently
- Slow-varying forces (expensive to calculate) are integrated less frequently
+ More efficient integration
+ Easy to expand to parallel simulations
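The force-splitting idea can be sketched with a generic multiple time step velocity Verlet scheme of the r-RESPA type. This is an illustration under stated assumptions, not the project's code: a single particle feels a stiff spring (the fast, cheap force) and a soft spring (standing in for the slow, expensive force), and the constants `K_FAST` and `K_SLOW`, the step sizes and the function names are all hypothetical.

```python
K_FAST = 100.0   # stiff spring constant: fast-varying force (assumed value)
K_SLOW = 1.0     # soft spring constant: slow-varying force (assumed value)


def fast_force(x):
    return -K_FAST * x


def slow_force(x):
    return -K_SLOW * x


def respa_step(x, v, dt_outer, n_inner, mass=1.0):
    """One outer step: the slow force is evaluated twice, the fast force 2 * n_inner times."""
    v += 0.5 * dt_outer * slow_force(x) / mass       # half kick from the slow force
    dt_inner = dt_outer / n_inner
    for _ in range(n_inner):                         # velocity Verlet on the fast force
        v += 0.5 * dt_inner * fast_force(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * fast_force(x) / mass
    v += 0.5 * dt_outer * slow_force(x) / mass       # closing half kick from the slow force
    return x, v


def energy(x, v, mass=1.0):
    """Total energy: kinetic plus both spring potentials."""
    return 0.5 * mass * v * v + 0.5 * (K_FAST + K_SLOW) * x * x


if __name__ == "__main__":
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    for _ in range(1000):
        x, v = respa_step(x, v, dt_outer=0.05, n_inner=10)
    print(abs(energy(x, v) - e0) / e0)               # relative energy error stays small
```

The expensive force is evaluated once per outer step instead of once per inner step, which is where the efficiency gain comes from. The outer step still has to stay well below the period of the fast motion, or the scheme becomes unstable.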

Path simulations (Ben Gladwin)

What if start and end points are given?
- Proteins: unfolded → folded
- Molecular machines: 1 cycle

Shortest path calculations
- Floyd, Dijkstra

Hamilton's principle of least action
+ Computationally very attractive
- Extremely long time steps
- Very well suited for parallel architectures (Floyd algorithm parallelised, but performance problems with >4 PE on α-GS NUMA architecture)
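For reference, a minimal serial sketch of the Floyd (Floyd-Warshall) all-pairs shortest path algorithm named above. How molecular states and transition costs are mapped onto graph nodes and edge weights is not described in the slides, so the small weight matrix here is purely illustrative.

```python
import math

INF = math.inf


def floyd_warshall(weights):
    """All-pairs shortest path lengths for a dense weight matrix (INF = no edge)."""
    n = len(weights)
    dist = [row[:] for row in weights]      # work on a copy
    for k in range(n):                      # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


if __name__ == "__main__":
    weights = [
        [0, 3, INF, 7],
        [8, 0, 2, INF],
        [5, INF, 0, 1],
        [2, INF, INF, 0],
    ]
    for row in floyd_warshall(weights):
        print(row)
```

For a fixed `k`, every `(i, j)` update is independent, so rows can be divided among processors; each processor then needs row `k` at every iteration. That per-iteration communication is a plausible source of the scaling problems noted on the slide, although the slide does not state the cause.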

National Facility supercomputer use

2001
- CPU quota: 2*[...] service units
- Total use ≈[...] units (≈3000 units in parallel)

2002
- CPU quota: 4 * 6000 service units
- First quarter: ≈2000 units
- Second quarter: 85 units

Collaborators
- Dr A. Torda (ANU): Low resolution force fields / protein structure prediction
- Prof. D. Hume, A/Prof. B. Kobe and Dr J. Martin (UQ): Structural genomics project
- Prof. K. Burrage, I. Lenane and B. Gladwin (UQ): Numerical integration and path simulations

Special Thanks
- Mrs J. Jenkinson and Dr D. Singleton (NF/ANUSF)