Modelling proteins and proteomes using Linux clusters Ram Samudrala University of Washington.

Examples of biological problems
- Protein structure prediction/docking simulations: need to run different trajectories that sometimes talk with each other
- Molecular dynamics simulations: need more cohesive parallelisation
- Polarisable force fields: need true parallelisation
- Bioinformatics searches/exploration: trivially parallelisable

Computational issues
- Need efficient methods to start/stop jobs
- Need a load-balancing queuing system
- Need fast communications at times
- Need stability (uptimes of months/years)
- Need low maintenance/management overhead
- Need low installation overhead
- Needs to be cheap!
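One way to approximate the load-balancing requirement without a full queuing system is a greedy "longest processing time first" scheduler. The sketch below assumes each job comes with an estimated cost (for example, proportional to sequence length); `lpt_schedule` and the job names are hypothetical, not part of the cluster software described here.

```python
import heapq
from typing import Dict, List, Tuple


def lpt_schedule(job_costs: Dict[str, float], n_cpus: int) -> Tuple[Dict[int, List[str]], float]:
    """Assign jobs to CPUs with the longest-processing-time-first heuristic.

    job_costs maps a (hypothetical) job name to its estimated run time.
    Returns the per-CPU job lists and the makespan (largest CPU load).
    """
    # Min-heap of (current load, cpu id): the least-loaded CPU is always on top
    heap = [(0.0, cpu) for cpu in range(n_cpus)]
    heapq.heapify(heap)
    assignment: Dict[int, List[str]] = {cpu: [] for cpu in range(n_cpus)}
    # Place the most expensive jobs first, each on the currently least-loaded CPU
    for job, cost in sorted(job_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, cpu = heapq.heappop(heap)
        assignment[cpu].append(job)
        heapq.heappush(heap, (load + cost, cpu))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


jobs = {"j1": 5, "j2": 4, "j3": 3, "j4": 3, "j5": 2, "j6": 2, "j7": 1}
assignment, makespan = lpt_schedule(jobs, n_cpus=3)
print(assignment, makespan)
```

Greedy LPT is not optimal in general, but it is simple, needs no central daemon beyond a cost estimate, and keeps every CPU busy until the queue drains.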

Hardware and operating system
- 256 AMD and Intel CPUs (1-2.5 GHz)
- GB RAM, GB HD, dual-processor motherboards
- 100 Mbps Ethernet connectivity for 64-processor sets
- White boxes are good but use up space; 1U racks are ideal
- Minimal Linux installation: create a clone “CD” and copy it onto all machines

Our solution
- No single solution: users implement their own
- Completely decentralised
- Analyse the problem and determine its parallelisable parts
- Implementation specific to the problem
- Use local scratch space for computation
- Redundant storage of data for faster access
- Limit the problem space to specific problems
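A decentralised, trivially parallel search can be sketched as: split the input into chunks, let each worker write its results to its own file in scratch space, then merge the files. In this minimal sketch the motif scan, `search_chunk` and `farm_out` are hypothetical, and threads stand in for separate cluster nodes so the example runs on one machine; on a real cluster each chunk would be a separate process writing to node-local disk.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List


def search_chunk(chunk_id: int, sequences: List[str], motif: str, scratch: Path) -> Path:
    """Hypothetical per-CPU task: scan one chunk for a motif and write the hits
    (sequence -> first match position) to this worker's own scratch file."""
    hits = {seq: seq.find(motif) for seq in sequences if motif in seq}
    out = scratch / f"chunk_{chunk_id}.json"
    out.write_text(json.dumps(hits))
    return out


def farm_out(sequences: List[str], motif: str, n_workers: int) -> Dict[str, int]:
    # Round-robin split: chunk k gets sequences k, k + n, k + 2n, ...
    chunks = [sequences[k::n_workers] for k in range(n_workers)]
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            paths = list(pool.map(search_chunk, range(n_workers), chunks,
                                  [motif] * n_workers, [scratch] * n_workers))
        # Merge step: the only "communication" is reading the workers' files
        merged: Dict[str, int] = {}
        for path in paths:
            merged.update(json.loads(path.read_text()))
    return merged


print(farm_out(["AAKEGV", "MKEGVS", "LLLL", "KEGVKEG"], "KEGV", n_workers=2))
```

Because workers never talk to each other, a failed chunk can simply be rerun, which fits the low-maintenance goals listed earlier.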

Problem-specific implementation
- MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
- Docking: sample different ligands/regions of the protein on different CPUs
- MD: pairwise force fields are additive
- PFF: ?
- Bioinformatics: trivial parallelisation; communication by disk
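The MD point rests on additivity: with a pairwise force field the total energy is a sum over atom pairs, so each CPU can sum its own share of the pair list and the partial sums are simply added. The sketch assumes a Lennard-Jones term as the example pair potential; a polarisable force field does not split this way, because induced dipoles couple every atom to all the others. `lj_pair`, `partial_energy` and `split_pairs` are illustrative names.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def lj_pair(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy, used here only as an example pairwise term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def partial_energy(coords: Sequence[Point], pairs: List[Tuple[int, int]]) -> float:
    # Energy contribution of one CPU's share of the pair list
    return sum(lj_pair(math.dist(coords[i], coords[j])) for i, j in pairs)


def split_pairs(n_atoms: int, n_cpus: int) -> List[List[Tuple[int, int]]]:
    pairs = list(itertools.combinations(range(n_atoms), 2))
    # Round-robin distribution of the i < j pairs over CPUs
    return [pairs[k::n_cpus] for k in range(n_cpus)]


coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0), (1.0, 1.0, 1.0), (2.0, 0.5, 0.3)]
serial = partial_energy(coords, list(itertools.combinations(range(len(coords)), 2)))
parallel = sum(partial_energy(coords, chunk) for chunk in split_pairs(len(coords), 3))
print(serial, parallel)  # equal up to floating-point rounding
```

In a real run each chunk's sum would be computed on a different CPU and only one number per CPU would need to be communicated.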

Modelling proteomes Ram Samudrala University of Washington

What is a “proteome”?
All proteins of a particular system (organelle, cell, organism).
What does it mean to “model a proteome”?
For any protein, we wish to:
- figure out what it looks like (structure or form)
- understand what it does (function)
Repeat for all proteins in a system, and understand the relationships between all of them.
ANNOTATION { EXPRESSION + INTERACTION }

Protein folding
- Gene (DNA; shown here as mRNA codons): …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-…
- Protein sequence (one amino acid per codon): …-L-K-E-G-V-S-K-D-…
- Unfolded protein: not unique, mobile, inactive, expanded, irregular
- Spontaneous self-organisation (~1 second)
- Native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets
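The "one amino acid per codon" step can be made concrete with the standard genetic code. This is a minimal sketch; the compact 64-character table is the standard code with codons ordered UUU, UUC, UUA, UUG, UCU, … GGG, and `translate` is an illustrative helper, not a tool used in this work.

```python
BASES = "UCAG"
# Standard genetic code; index = 16*first + 4*second + third ("*" = stop codon)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon (one amino acid per codon)."""
    mrna = mrna.replace("-", "").upper()
    if len(mrna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[mrna[p:p + 3]] for p in range(0, len(mrna), 3))


print(translate("CUA-AAA-GAA-GGU-GUU-AGC-AAG"))  # LKEGVSK
```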

De novo prediction of protein structure
- Sample conformational space such that native-like conformations are found: the number of conformations is astronomically large, e.g. 5 states per residue for 100 residues gives $5^{100} \approx 10^{70}$
- Select: it is hard to design functions that are not fooled by non-native conformations (“decoys”)
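The size of the search space follows directly from the states-per-residue arithmetic. The sketch below checks $5^{100}$ and, under a hypothetical scoring rate of $10^9$ conformations per second, estimates how long exhaustive enumeration would take; the rate is an assumption chosen only to show the scale.

```python
import math


def conformation_count(states_per_residue: int, n_residues: int) -> int:
    """Number of distinct chains when every residue independently takes one of
    `states_per_residue` discrete conformational states."""
    return states_per_residue ** n_residues


n = conformation_count(5, 100)
rate = 1e9  # hypothetical: conformations scored per second on one CPU
years = n / rate / (3600 * 24 * 365)
print(f"5^100 = {n:.3e}, i.e. about 10^{math.log10(n):.1f}")
print(f"exhaustive enumeration at {rate:.0e}/s: about {years:.1e} years")
```

Even with many orders of magnitude more compute, exhaustive search stays out of reach, which is why the sampling step has to be guided.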

Semi-exhaustive segment-based folding
Example sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- Generate: fragments from a database; 14-state φ,ψ model
- Minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
- Filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
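The minimise step names Monte Carlo with simulated annealing. Below is a minimal sketch over a 14-state discrete backbone model: each move changes one residue's state and is accepted with the Metropolis criterion at a slowly decreasing temperature. The energy function `toy_energy` and its `preferred` states are hypothetical stand-ins for the real all-atom and knowledge-based scoring, and the geometric cooling schedule and step count are assumptions.

```python
import math
import random
from typing import List, Tuple

N_STATES = 14  # discrete backbone states per residue


def toy_energy(states: List[int], preferred: List[int]) -> float:
    """Hypothetical energy: mismatch to a per-residue preferred state plus a
    smoothness penalty between neighbouring residues."""
    e = sum(1.0 for s, p in zip(states, preferred) if s != p)
    e += 0.5 * sum(1.0 for a, b in zip(states, states[1:]) if a != b)
    return e


def simulated_annealing(preferred: List[int], steps: int = 20000, t_start: float = 2.0,
                        t_end: float = 0.01, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(preferred)
    current = [rng.randrange(N_STATES) for _ in range(n)]
    e_cur = toy_energy(current, preferred)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        old = current[i]
        current[i] = rng.randrange(N_STATES)  # single-residue move
        e_new = toy_energy(current, preferred)
        delta = e_new - e_cur
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e_cur = e_new  # accept (Metropolis criterion)
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[i] = old  # reject: restore the previous state
    return best, e_best


best, e_best = simulated_annealing([3] * 20)
print(best, e_best)
```

Running several such trajectories with different seeds on different CPUs is the "multiple trajectories" pattern from the implementation slide.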

CASP5 prediction for T…: … Å Cα RMSD for 84 residues

CASP5 prediction for T…: … Å Cα RMSD for 67 residues

CASP5 prediction for T…: … Å Cα RMSD for all 69 residues

CASP5 prediction for T…: … Å Cα RMSD for 68 residues

CASP5 prediction for T…: … Å Cα RMSD for 74 residues

CASP5 prediction for T…: … Å Cα RMSD for 66 residues

Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
- Scan, align
- Build initial model: minimum perturbation
- Construct non-conserved side chains and main chains: graph theory, semfold, de novo simulation
- Refine: physical functions
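The asterisk line under an alignment marks identical columns, and the "% id" figures quoted with the comparative models are a sequence identity. A minimal sketch follows; percent identity has several conventions, and this one (identical columns divided by columns where neither sequence has a gap) is an assumption, as are the helper names.

```python
def identity_line(a: str, b: str) -> str:
    """'*' under columns where both aligned sequences have the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    # One common convention: identical columns / columns with no gap in either sequence
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


target = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
template = "KDPPAGIGAPQDN----QNIMLWNAVIP"
print(target)
print(template)
print(identity_line(target, template))
print(f"{percent_identity(target, template):.1f}% identity")
```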

CASP5 prediction for T…: … Å Cα RMSD for 133 residues (57% sequence identity)

CASP5 prediction for T…: … Å Cα RMSD for 249 residues (41% sequence identity)

CASP5 prediction for T…: … Å Cα RMSD for 99 residues (32% sequence identity)

CASP5 prediction for T…: … Å Cα RMSD for 428 residues (24% sequence identity)

CASP5 prediction for T…: … Å Cα RMSD for 125 residues (22% sequence identity)

CASP5 prediction for T…: … Å Cα RMSD for 260 residues (14% sequence identity)

Prediction of SARS-CoV proteinase inhibitors Ekachai Jenwitheesuk

Computational aspects of structural genomics
A. Sequence space
B. Comparative modelling
C. Fold recognition
D. Ab initio prediction
E. Target selection
F. Analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
- Structure-based methods: microenvironment analysis (zinc binding site?), structure comparison (homology? function?)
- Sequence-based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
- Plus experimental data: single molecule + genomic/proteomic
- Bioverse: assign function to entire protein space
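Of the sequence-based methods, phylogenetic profiles are the easiest to show in a few lines: each protein gets a presence/absence vector across genomes, and proteins with (near-)identical profiles are predicted to be functionally linked. In this sketch the profiles are made-up toy data, and the Hamming-distance threshold `max_distance` is an assumed parameter.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: Dict[str, str], max_distance: int = 0) -> List[Tuple[str, str]]:
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    names = sorted(profiles)
    out = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if hamming(profiles[p], profiles[q]) <= max_distance:
                out.append((p, q))
    return out


# Toy profiles over five genomes ("1" = homolog present, "0" = absent)
profiles = {"P1": "11111", "P2": "00101", "P3": "11010", "P4": "11010"}
print(linked_pairs(profiles))
print(linked_pairs(profiles, max_distance=2))
```

Real analyses use many more genomes and usually weight genomes by phylogenetic relatedness; the exact-match threshold here is only the simplest case.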

Bioverse – explore relationships among molecules and systems Jason McDermott

Bioverse – prediction of protein interaction networks Jason McDermott
- Interacting protein database: protein α and protein β have an experimentally determined interaction
- Target proteome: protein A (85% similar to α) and protein B (90% similar to β) are assigned a predicted interaction
- Assign confidence based on similarity and strength of interaction
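The transfer of an experimentally determined interaction to a target proteome can be sketched as follows. The slide says only that confidence depends on similarity and strength of interaction; the rule used here (product of the two similarities and the interaction strength) and all names are hypothetical, not the Bioverse scoring function.

```python
from typing import Dict, List, Tuple


def predict_interactions(known: List[Tuple[str, str, float]],
                         homologs: Dict[str, List[Tuple[str, float]]]) -> Dict[Tuple[str, ...], float]:
    """known: (template_a, template_b, strength) experimentally determined interactions.
    homologs: template protein -> list of (target protein, similarity in [0, 1]).
    Confidence = sim_a * sim_b * strength (a hypothetical scoring rule)."""
    predicted: Dict[Tuple[str, ...], float] = {}
    for ta, tb, strength in known:
        for a, sim_a in homologs.get(ta, []):
            for b, sim_b in homologs.get(tb, []):
                key = tuple(sorted((a, b)))
                score = sim_a * sim_b * strength
                # Keep the best-supported evidence if several templates map to the same pair
                predicted[key] = max(predicted.get(key, 0.0), score)
    return predicted


known = [("alpha", "beta", 1.0)]
homologs = {"alpha": [("A", 0.85)], "beta": [("B", 0.90)]}
print(predict_interactions(known, homologs))
```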

Bioverse – E. coli predicted protein interaction network Jason McDermott

Bioverse – M. tuberculosis predicted protein interaction network Jason McDermott

Bioverse – C. elegans predicted protein interaction network Jason McDermott

Bioverse – H. sapiens predicted protein interaction network Jason McDermott

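Bioverse – organisation of the interaction networks Jason McDermott
Clustering coefficient of node i: $C_i = \frac{2 n_i}{k_i (k_i - 1)}$, where $k_i$ is the number of neighbours of i and $n_i$ is the number of edges among those neighbours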
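The clustering coefficient measures how close a node's neighbourhood is to a clique: $n_i$ actual edges among $k_i$ neighbours, out of $k_i(k_i-1)/2$ possible. A minimal sketch on an undirected graph stored as adjacency sets; returning 0 for nodes with fewer than two neighbours is a convention assumed here, since the formula is undefined there.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def clustering_coefficient(adj: Dict[str, Set[str]], node: str) -> float:
    """C_i = 2 n_i / (k_i (k_i - 1)) for an undirected graph."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # undefined for k < 2; 0 is an assumed convention
    # n_i: edges that exist between pairs of i's neighbours
    n = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * n / (k * (k - 1))


# Triangle a-b-c with a pendant node d attached to c
adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
for node in sorted(adj):
    print(node, clustering_coefficient(adj, node))
```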

Bioverse – mapping pathways on the rice predicted network: defense-related proteins Jason McDermott

Tryptophan biosynthesis Jason McDermott

Bioverse – network-based annotation for C. elegans Jason McDermott

Bioverse – H. sapiens protein-protein similarity network

Bioverse – viewer Aaron Chang

Future directions
- Network connection with multiple Ethernet cards based on traffic analysis
- Gigabit Ethernet (switches are still expensive)
- Better network filesystems

Take-home message: prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution

Acknowledgements
Aaron Chang, Ashley Lam, Ekachai Jenwitheesuk, Gong Cheng, Jason McDermott, Kai Wang, Ling-Hong Hung, Lynne Townsend, Marissa LaMadrid, Mike Inouye, Stewart Moughon, Shing-Chung Ngan, Yi-Ling Cheng, Zach Frazier
National Institutes of Health; National Science Foundation; Searle Scholars Program (Kinship Foundation); UW Advanced Technology Initiative in Infectious Diseases