Modelling proteins and proteomes using Linux clusters
Ram Samudrala
University of Washington

Examples of biological problems
- Protein structure prediction/docking simulations: need to run different trajectories that sometimes talk with each other
- Molecular dynamics simulations: need more cohesive parallelisation
- Polarisable force fields: need true parallelisation
- Bioinformatics searches/exploration: trivially parallelisable

Computational issues
- Need efficient methods to start/stop jobs
- Need a load-balancing queuing system
- Need fast communications at times
- Need stability (uptimes of months/years)
- Need low maintenance/management overhead
- Need low installation overhead
- Needs to be cheap!

Hardware and operating system
- 256 AMD and Intel CPUs (1-2.5 GHz)
- 0.5-1 GB RAM, 100-200 GB HD, dual-processor motherboards
- 100 Mbps ethernet connectivity for 64-processor sets
- White boxes are good but use up space; 1U racks are ideal
- Minimal Linux installation: create a clone “CD” and copy it onto all machines

Our solution
- No single solution: users implement their own
- Completely decentralised
- Analyse the problem and determine parallelisable parts
- Implementation specific to problem
- Use local scratch space for computation
- Redundant storage of data for faster access
- Limit problem space to specific problems

Problem-specific implementation
- MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
- Docking: sample different ligands/regions of the protein on different CPUs
- MD: pairwise force fields are additive
- PFF: ?
- Bioinformatics: trivial parallelisation; communication by disk
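The MD point, that pairwise force fields are additive, is what makes the work divisible: the pair list can be split into chunks, each chunk summed independently, and the partial sums added. The sketch below illustrates this under stated assumptions. The energy term `pair_energy` is a hypothetical inverse-distance toy, not a real force field, and threads on one machine stand in for CPUs on a cluster.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_energy(a, b):
    """Toy pairwise term (hypothetical): inverse distance between two 3D points."""
    return 1.0 / math.dist(a, b)


def partial_energy(pairs):
    """Sum the pairwise term over one chunk of atom pairs."""
    return sum(pair_energy(a, b) for a, b in pairs)


def total_energy_parallel(coords, n_workers=4):
    """Split all atom pairs into n_workers chunks and add the partial sums."""
    pairs = list(combinations(coords, 2))
    chunks = [pairs[i::n_workers] for i in range(n_workers)]
    # Threads stand in for cluster CPUs; each worker sees only its own chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_energy, chunks))


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]
serial = partial_energy(list(combinations(coords, 2)))
parallel = total_energy_parallel(coords)
```

Because the total is a plain sum of independent terms, `serial` and `parallel` agree up to floating-point rounding regardless of how the pairs are divided.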

Modelling proteomes
Ram Samudrala
University of Washington

What is a “proteome”?
All proteins of a particular system (organelle, cell, organism)
What does it mean to “model a proteome”?
For any protein, we wish to:
- figure out what it looks like (structure or form)
- understand what it does (function)
Repeat for all proteins in a system
Understand the relationships between all of them
ANNOTATION { EXPRESSION + INTERACTION }

Protein folding
- DNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-…
- Protein sequence: …-L-K-E-G-V-S-K-D-… (figure label: “one amino acid”)
- Unfolded protein: not unique, mobile, inactive, expanded, irregular
- Spontaneous self-organisation (~1 second)
- Native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets

De novo prediction of protein structure
- Sample: sample conformational space such that native-like conformations are found; astronomically large number of conformations (5 states/100 residues = $5^{100} \approx 10^{70}$)
- Select: hard to design functions that are not fooled by non-native conformations (“decoys”)
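The size estimate on this slide can be checked directly. The short calculation below assumes, as the slide does, 5 discrete states per residue and a chain of 100 residues; the variable names are illustrative only.

```python
import math

states_per_residue = 5
residues = 100

# Exact count of conformations (Python integers have arbitrary precision).
conformations = states_per_residue ** residues

# Order of magnitude: log10(5^100) = 100 * log10(5).
exponent = residues * math.log10(states_per_residue)

print(f"5^100 is about 10^{exponent:.1f}")  # prints "5^100 is about 10^69.9"
```

The exponent of about 69.9 rounds to the 10^70 quoted on the slide.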

Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- Generate: fragments from database; 14-state φ,ψ model
- Minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
- Filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
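The minimise step names Monte Carlo with simulated annealing. As a rough illustration of that general technique only, and not of the fragment-based protocol used in this work, here is a minimal sketch on a toy one-dimensional energy. The energy function, step size, and cooling schedule are all assumptions chosen for the example.

```python
import math
import random


def energy(x):
    """Toy one-dimensional energy (hypothetical) with its minimum at x = 2."""
    return (x - 2.0) ** 2


def simulated_annealing(start, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = current + rng.uniform(-0.5, 0.5)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


best_x, best_energy = simulated_annealing(start=-10.0)
```

In a folding setting the move would replace a segment of the chain rather than nudge a single number, and independent runs with different seeds are the kind of trajectory that can be placed on separate CPUs.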

CASP5 prediction for T138 4.6 Å Cα RMSD for 84 residues

CASP5 prediction for T170 4.8 Å Cα RMSD for all 69 residues
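The accuracy figures on these slides are Cα RMSD values: the root-mean-square distance between matching Cα atoms of the model and the experimental structure. The sketch below computes that quantity under the assumption that the two coordinate sets are already optimally superposed; a real evaluation would perform the superposition first. The function name and coordinates are illustrative only.

```python
import math


def ca_rmsd(model, native):
    """RMSD in Å between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed; no fitting is done here.
    """
    if not model or len(model) != len(native):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, native))
    return math.sqrt(squared / len(model))


# Two residues, each displaced by a (0, 3, 4) vector, i.e. 5 Å apart.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model = [(0.0, 3.0, 4.0), (3.8, 3.0, 4.0)]
rmsd = ca_rmsd(model, native)
```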

Comparative modelling of protein structure
KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **
- scan
- align
- refine: physical functions
- build initial model: minimum perturbation
- construct non-conserved side chains and main chains: graph theory, semfold
- de novo simulation
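The asterisks under the alignment mark identical residues, and sequence identity is the usual first measure of how far a template can be trusted. The sketch below counts identities for the two aligned sequences shown on the slide. Taking ungapped columns as the denominator is an assumption; other conventions divide by the full alignment length or the shorter sequence.

```python
def alignment_identity(seq_a, seq_b, gap="-"):
    """Return (identical, compared, percent) for two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(1 for a, b in columns if a == b)
    return identical, len(columns), 100.0 * identical / len(columns)


top = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
bottom = "KDPPAGIGAPQDN----QNIMLWNAVIP"
identical, compared, percent = alignment_identity(top, bottom)
```

For this pair the function finds 11 identical residues over 24 ungapped columns, matching the 11 asterisks on the slide.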

CASP5 prediction for T129 1.0 Å Cα RMSD for 133 residues (57% id)

Prediction of SARS CoV proteinase inhibitors Ekachai Jenwitheesuk

Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
- Structure-based methods: microenvironment analysis (zinc binding site?); structure comparison (homology; function?)
- Sequence-based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
- + experimental data: single molecule + genomic/proteomic
- Bioverse: assign function to entire protein space

Bioverse – explore relationships among molecules and systems Jason McDermott http://bioverse.compbio.washington.edu

Bioverse – prediction of protein interaction networks Jason McDermott
- Interacting protein database: protein α and protein β, with an experimentally determined interaction
- Target proteome: protein A (85%) and protein B (90%), with a predicted interaction
- Assign confidence based on similarity and strength of interaction
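The transfer of a known interaction onto a target proteome can be sketched as follows. This is an illustration under stated assumptions, not the Bioverse scoring scheme: the percentages are read as similarities of A to α and of B to β, the confidence is taken to be their product, and the strength of the experimental interaction is left out. All names and data structures are hypothetical.

```python
def predict_interactions(known, hits):
    """Transfer known interactions onto target proteins via similarity.

    known: set of (protein, protein) pairs with experimental interactions.
    hits:  dict mapping each target protein to a list of
           (database_protein, similarity) tuples, similarity in [0, 1].
    Returns {(target_a, target_b): confidence}, keeping the best score per pair.
    """
    predictions = {}
    for a, a_hits in hits.items():
        for b, b_hits in hits.items():
            if a >= b:  # visit each unordered pair once; skip self-pairs
                continue
            for alpha, sim_a in a_hits:
                for beta, sim_b in b_hits:
                    if (alpha, beta) in known or (beta, alpha) in known:
                        # Assumed score: product of the two similarities.
                        score = sim_a * sim_b
                        key = (a, b)
                        predictions[key] = max(score, predictions.get(key, 0.0))
    return predictions


known = {("alpha", "beta")}
hits = {"A": [("alpha", 0.85)], "B": [("beta", 0.90)]}
predicted = predict_interactions(known, hits)
```

With the slide's numbers the pair (A, B) receives a confidence of roughly 0.765 under this assumed scoring.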

Bioverse – E. coli predicted protein interaction network Jason McDermott

Bioverse – M. tuberculosis predicted protein interaction network Jason McDermott

Bioverse – C. elegans predicted protein interaction network Jason McDermott

Bioverse – H. sapiens predicted protein interaction network Jason McDermott

Bioverse – organisation of the interaction networks Jason McDermott
$C_i = 2n / (k_i (k_i - 1))$
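The formula on this slide has the form of the standard clustering coefficient of a network node. The slide does not define its symbols, so the sketch below assumes the usual reading: k_i is the number of neighbours of node i and n is the number of links among those neighbours. The small graph is made up for illustration.

```python
def clustering_coefficient(adjacency, node):
    """C_i = 2n / (k_i (k_i - 1)) for an undirected graph.

    adjacency maps each node to the set of its neighbours.
    Nodes with fewer than two neighbours are given a coefficient of 0.
    """
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    # n: number of links between pairs of neighbours, each pair counted once.
    n = sum(1 for u in neighbours for v in neighbours
            if u < v and v in adjacency[u])
    return 2.0 * n / (k * (k - 1))


# Hypothetical network: A-B, A-C, A-D and B-C.
adjacency = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
c_a = clustering_coefficient(adjacency, "A")
```

Node A has three neighbours with one link among them (B-C), so its coefficient is 2·1 / (3·2) = 1/3; node B's two neighbours are linked, giving 1.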

Bioverse – mapping pathways on the rice predicted network Jason McDermott
Defense-related proteins

Jason McDermott
Tryptophan biosynthesis

Bioverse – network-based annotation for C. elegans Jason McDermott

Bioverse – H. sapiens protein-protein similarity network

Bioverse – viewer Aaron Chang

Future directions
- Network connection with multiple ethernet cards based on traffic analysis
- Gigabit ethernet (switches are still expensive)
- Better network filesystems

Take-home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution.

Acknowledgements
Aaron Chang, Ashley Lam, Ekachai Jenwitheesuk, Gong Cheng, Jason McDermott, Kai Wang, Ling-Hong Hung, Lynne Townsend, Marissa LaMadrid, Mike Inouye, Stewart Moughon, Shing-Chung Ngan, Yi-Ling Cheng, Zach Frazier
National Institutes of Health
National Science Foundation
Searle Scholars Program (Kinship Foundation)
UW Advanced Technology Initiative in Infectious Diseases
