Modelling genome structure and function - a practical approach Ram Samudrala University of Washington.


Problem
Given an arbitrary protein sequence, how can we best construct a model using current methods for protein structure prediction? How can we use these models to understand function?

Demonstration
As a demonstration, I requested sequences from biologists for which an experimental structure has not been determined. Selected 3 sequences:
- dhfr-ts: 608 aa, dihydrofolate reductase/thymidylate synthase, P. falciparum
- invb: 135 aa, type III chaperone, S. typhimurium
- drae: 141 aa, Dr adhesin, E. coli

Requirements
- Installation of RAMP
- Connectivity to the Internet
- Installation of other software as required: clustalx, clustalw, dssp, tinker

Conventions
- Web addresses are indicated by underlined italics
- Commands are indicated in fixed-width font
- \ is used to indicate continuation of a command

Search for related proteins with known structures
- Use the CAFASP Meta Server, which submits the query to several servers, including PDB-BLAST, SAM, genthreader, INBGU, PSIPRED, JPRED, etc.
- drae, invb: no hits to proteins of known structure
- dhfr-ts: many hits to proteins of known structure

Ab initio prediction for non-homologous proteins (invb, drae)
- Sample: astronomically large number of conformations; 5 states per residue over 100 residues gives 5^100 ≈ 10^70 conformations
- Select: hard to design functions that are not fooled by non-native conformations ("decoys")
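The size of the search space quoted on this slide can be checked with a few lines of arithmetic. This is only an illustrative sketch; the variable names are made up for the example, and the figures (5 states, 100 residues) are the ones from the slide.

```python
import math

# Figures taken from the slide: 5 discrete states per residue, 100 residues.
states_per_residue = 5
residues = 100

# Exact count of conformations (Python integers do not overflow).
conformations = states_per_residue ** residues

# Order of magnitude: log10(5^100) = 100 * log10(5).
exponent = residues * math.log10(states_per_residue)

print(f"number of conformations: {conformations:.2e}")
print(f"log10 of that number:    {exponent:.1f}")
```

Even at a very optimistic rate of conformations evaluated per second, a number of this order cannot be enumerated, which is why sampling has to be guided rather than exhaustive.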

Semi-exhaustive segment-based folding
EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- generate: fragments from database; 14-state φ,ψ model
- minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
- filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure
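To make the "minimise" step concrete, here is a minimal sketch of Monte Carlo with simulated annealing over a chain of discrete states. It is not the RAMP implementation: the energy function is a toy (number of positions that differ from an arbitrary hidden pattern), the cooling schedule is an assumed geometric one, and every name below is hypothetical. Only the idea of 14 discrete states per residue comes from the slide.

```python
import math
import random

N_STATES = 14  # the slide's 14-state model; everything else here is a toy


def simulated_annealing(energy, propose, start, steps=2000,
                        t_start=2.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for step in range(steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = propose(current, rng)
        delta = energy(candidate) - current_e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / t) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_e = candidate, current_e + delta
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy problem: recover a hidden state assignment for a 30-residue chain.
_hidden = [random.Random(1).randrange(N_STATES) for _ in range(30)]


def toy_energy(states):
    """Number of residues whose state differs from the hidden pattern."""
    return sum(s != h for s, h in zip(states, _hidden))


def toy_propose(states, rng):
    """Change the state of one randomly chosen residue."""
    new = list(states)
    new[rng.randrange(len(new))] = rng.randrange(N_STATES)
    return new


start = [0] * 30
best, best_energy = simulated_annealing(toy_energy, toy_propose, start)
print("start energy:", toy_energy(start), "best energy:", best_energy)
```

Changing the `seed` argument gives a different trajectory, which is the reason for running many independent simulations.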

Predict secondary structure using PSIPRED
- Submit sequences to PSIPRED server
- Save PSIPRED results in a file (ex: drae.psipred, invb.psipred)
- Create a DSSP file from PSIPRED prediction:
    dssp_to_ss invb.psipred invb.predict.dssp
    dssp_to_ss drae.psipred drae.predict.dssp
- Edit and remove low-confidence (< 7) assignments
- Make sure at least three residues are present between secondary structure elements
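The two editing rules on this slide (drop assignments with confidence below 7, keep at least three residues between elements) can be sketched as follows. This is an assumption-laden illustration, not part of RAMP: it works on two plain strings, a prediction string of H/E/C characters and a string of per-residue confidence digits, and the function names and example strings are invented.

```python
import re


def mask_low_confidence(pred, conf, threshold=7):
    """Turn helix/strand assignments with confidence < threshold into coil."""
    if len(pred) != len(conf):
        raise ValueError("prediction and confidence strings differ in length")
    return "".join(s if int(c) >= threshold else "C"
                   for s, c in zip(pred, conf))


def short_linkers(pred, min_gap=3):
    """Return (start, end) index pairs of linkers shorter than min_gap
    residues between consecutive secondary structure elements."""
    elements = [(m.start(), m.end()) for m in re.finditer(r"H+|E+", pred)]
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(elements, elements[1:])
            if next_start - prev_end < min_gap]


# Hypothetical 13-residue example (not real PSIPRED output).
pred = "CHHHHHHCEEEEC"
conf = "9899962289999"

masked = mask_low_confidence(pred, conf)
print(masked)
print("short linkers before masking:", short_linkers(pred))
print("short linkers after masking: ", short_linkers(masked))
```

Any linker still reported after masking would need to be widened by hand before the file is used for sampling.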

Sampling commands and details
- Place the sequence in a single file as a single line: invb.sequence
- Create a 3-tuplet database file for invb:
    prune_ntuplet_db invb.sequence $RAMP_ROOT/ramp/lib/scop_e4_xray.3tuplet_db \
        3 invb.3tuplet_db
  The large database file (scop_e4_xray.3tuplet_db) was created using mcgen_ntuplets from a set of PDB files and is available as part of the RAMP distribution. To recreate it, mcgen_ntuplets needs to be run on a set of protein conformations. Similarly, scop_e4_allatoms_xray_scores below was created by running compile_raw_counts and compile_scores on a list of protein conformations. There is usually no need to re-compile these files.
- Run structure prediction program:
    mcgen_semfold_ss invb.sequence \
        $RAMP_ROOT/lib/scoring_functions/scop_e4_allatoms_xray_scores \
        invb.predict.dssp invb.3tuplet_db run
  The run uses a starting seed for the random number generator; different seeds will produce different trajectories.

Sampling commands and details continued
- Run on many processors with different starting seeds
- Minimise all the conformations using ENCAD (implemented in TINKER)
- Repeat for drae
- Store conformations in a file: list

Selection commands and details
- Run dssp on all conformations
- Run potential, hcf, electrostatics, vdw, ss_scores, density_scores, Shell
- Combine different scores after normalisation (divide by the standard deviation)
- Look at the best scoring structures for each function and determine consensus
- Write scripts to automate above process (not currently available as part of the RAMP distribution)
- For drae, look at best conformations with disulphide bonds between the single pair of cysteines present:
    constraints_filter list constraints-file
  The constraints file contains one line: constraint
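Since the scripts for combining scores are not part of the distribution, here is one possible sketch of the normalise-and-combine step. It assumes that lower scores are better for every function, that each score table maps a conformation name to a number, and that the combination is a plain sum of normalised scores; all names and numbers are invented for illustration.

```python
from statistics import pstdev


def normalise(table):
    """Divide every score in one table by that table's standard deviation."""
    sd = pstdev(list(table.values()))
    if sd == 0:
        return {name: 0.0 for name in table}  # function cannot discriminate
    return {name: value / sd for name, value in table.items()}


def rank_combined(score_tables):
    """Sum the normalised scores per conformation; best (lowest) first."""
    combined = {}
    for table in score_tables.values():
        for name, value in normalise(table).items():
            combined[name] = combined.get(name, 0.0) + value
    return sorted(combined, key=combined.get)


def consensus(score_tables, top_n=2):
    """Count how many functions place each conformation in their top_n."""
    counts = {}
    for table in score_tables.values():
        for name in sorted(table, key=table.get)[:top_n]:
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical scores for three conformations from two scoring functions.
score_tables = {
    "potential": {"a": -10.0, "b": -12.0, "c": -8.0},
    "hcf": {"a": 0.5, "b": 0.2, "c": 0.9},
}

ranking = rank_combined(score_tables)
votes = consensus(score_tables)
print("combined ranking:", ranking)
print("top-2 votes:", votes)
```

Dividing by the standard deviation puts functions with very different numerical ranges on a comparable scale before they are added together.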

Current performance of our ab initio prediction methods
Consistently predicted correct topology at CASP4 - ~4-6.0 Å for residues
  **T110/rbfa – 4.0 Å (80 residues; 1-80)
  *T114/afp1 – 6.5 Å (45 residues; 36-80)
  **T97/er29 – 6.0 Å (80 residues; 18-97)
  **T106/sfrp3 – 6.2 Å (70 residues; 6-75)
  *T98/sp0a – 6.0 Å (60 residues; )
  **T102/as48 – 5.3 Å (70 residues; 1-70)

Best predictions for invb

Mapping mutations on invb predictions
Binding to sspa, type III effector, actin nucleation
Residues labelled in figure: PHE122, PHE83, ASP36, LEU98, PRO41, LEU98

Best predictions for drae

Mapping mutations on drae predictions
Binding to daf, decay accelerating factor
Residues labelled in figure: ASP65, THR10, ASN79, THR131, ASP63, ILE75

Comparative modelling for homologous proteins (dhfr-ts)
    KDHPFGFAVPTKNPDGTMNLMNWECAIP
    KDPPAGIGAPQDN----QNIMLWNAVIP
    ** * *   *  *     * * *   **
- align
- build initial model
- construct non-conserved side chains and main chains
- refine
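The percent identity figures used for template choice can be computed directly from an alignment. The sketch below counts identical positions in the two aligned fragments shown on this slide; the function name is made up, and the choice of denominator (columns with no gap in either sequence) is an assumption, since servers differ in how they define identity.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (matches, aligned columns, percent identity) for two
    aligned sequences of equal length. Columns containing a gap in
    either sequence are excluded from the denominator (an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0, 0, 0.0
    matches = sum(a == b for a, b in columns)
    return matches, len(columns), 100.0 * matches / len(columns)


# The two aligned fragments from the slide.
seq_top = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
seq_bottom = "KDPPAGIGAPQDN----QNIMLWNAVIP"

matches, aligned, identity = percent_identity(seq_top, seq_bottom)
print(f"{matches} identical residues over {aligned} aligned columns "
      f"({identity:.1f}% identity)")
```

The count of identical positions agrees with the number of asterisks under the alignment on the slide.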

Generate alignments using Meta Server and clustalx – ts domain

Template choice: 1hvy-A, 56% identity, 1.9 Å resolution, 0.20 R
Best e-value as per SAM output
Model residues

Generate alignments using Meta Server and clustalx – dhfr domain

Template choice: 1drf, 34% identity, 2.0 Å resolution, 0.19 R
Best e-value as per SAM output
Model residues 1-230

Construct initial models
- Build dhfr and ts domains separately and then build linker region (residues )
- Construct initial models:
    scgen_mutate 1dfr.pdb 1dfr-dhfr.alignment
    scgen_mutate 1hvy-A.pdb 1hvy-A-ts.alignment

Build non-conserved main chains (loops)
- Delineate loop regions within domains: dhfr: 1-26, 64-93, , , ts:
- Construct loop data files for each loop. Example: MMGN ts.v000.pdb none 0 constraint
- Need to run prune_ntuplet_db for the loop sequence
- Build loops sequentially using this command for each loop:
    mcgen_semfold_loop \
        $RAMP_ROOT/lib/scop_e4_allatoms_xray_scores \
        loop
- Minimise conformation after building each loop to relieve bumps

Put the two domains together – construct linker region
- Complicated
- Do ab initio simulation, but keep the conformations of the two domains (as modelled by comparative modelling) fixed and only let the linker region (residues ) vary
- Need to use idealised geometry; fit torsion angles in each model in the final set of conformations to 14 discrete φ/ψ states:
    mcgen_fit –syssearch.14
- Create a DSSP file containing the φ/ψ angles for the two domains:
    mcgen_semfold_region_ss dhfr_ts.sequence \
        $RAMP_ROOT/lib/scoring_functions/scop_e4_allatoms_xray_scores \
        dhfr_ts.dssp dhfr_ts.3tuplet_db run
- Sample as many conformations as computational resources permit
- Select final conformations using an identical strategy to the one employed for invb and drae

Current performance of our comparative modelling methods
- Overall model accuracy at CASP4 ranging from 1 Å to 6 Å for 50-10% identity
- Approximately 75% of side chain χ1 angles within 30°
- Short (<= 6 aa) loops predicted to within 1.0 Å global Cα RMSD
- Longer (7-12 aa) loops predicted to within 3.0 Å global Cα RMSD
  **T112/dhso – 4.9 Å (348 residues; 24%)
  **T92/yeco – 5.6 Å (104 residues; 12%)
  **T128/sodm – 1.0 Å (198 residues; 50%)
  **T125/sp18 – 4.4 Å (137 residues; 24%)
  **T111/eno – 1.7 Å (430 residues; 51%)
  **T122/trpa – 2.9 Å (241 residues; 33%)
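For readers unfamiliar with the two accuracy measures quoted here, the sketch below shows how a Cα RMSD and a "χ1 within 30°" fraction might be computed. It assumes the two structures have already been superimposed (no fitting is done here) and that angles are given in degrees; the function names and example numbers are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of
    (x, y, z) coordinates. No superposition is performed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees,
    taking the 360-degree periodicity into account."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def fraction_within(model_angles, native_angles, tolerance=30.0):
    """Fraction of angles in the model within tolerance of the native."""
    pairs = list(zip(model_angles, native_angles))
    hits = sum(angular_difference(m, n) <= tolerance for m, n in pairs)
    return hits / len(pairs)


# Hypothetical data: two Cα atoms displaced by 1 Å along z,
# and four chi1 angles compared against their native values.
model_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native_ca = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
model_chi1 = [170.0, -60.0, 60.0, 180.0]
native_chi1 = [-170.0, -65.0, 180.0, 175.0]

print("Cα RMSD:", rmsd(model_ca, native_ca))
print("chi1 within 30 degrees:", fraction_within(model_chi1, native_chi1))
```

A "global" RMSD for a loop, as quoted on the slide, is measured after superimposing the rest of the structure rather than the loop itself, so the superposition step left out here matters in practice.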

Prediction for dhfr domain Expect: < 2.0 Å Cα RMSD

Prediction for ts domain Expect: < 4.0 Å Cα RMSD

Prediction for dhfr-ts entire sequence (sample)

Mixing and matching between the best possibilities
- More than one model can be produced (different templates, parameters, etc.)
- Select best parts of each model and mix and match between models in an exhaustive manner using a graph theoretic approach:
    cf_single $RAMP_ROOT/lib/scop_e4_allatoms_xray_scores
  The input is a list of filenames, each of which contains all or a portion of a model

A graph theoretic representation of protein structure
[Figure: four panels]
- represent residues as nodes
- construct graph
- weigh nodes: -0.6 (V1), -1.0 (F), -0.7 (K), -0.5 (I), -0.9 (V2); an edge weight of -0.1 is also shown
- find cliques: I, V2, F, K; W = -4.5
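The clique idea in this figure can be sketched with a small Bron-Kerbosch search. This is a toy, not the cf_single algorithm: the node weights are the ones shown in the figure, but the edge set is an assumption (every pair is taken to be compatible except V1 and V2, which are treated as two alternatives for the same residue), and edge weights are ignored, so the total will not match the W shown on the slide.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)
            return
        for v in list(candidates):
            expand(clique | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(frozenset(), set(adjacency), set())
    return cliques


# Node weights from the figure (more negative is better).
weights = {"V1": -0.6, "V2": -0.9, "F": -1.0, "K": -0.7, "I": -0.5}

# Assumed compatibility: all pairs are connected except V1-V2.
incompatible = {frozenset({"V1", "V2"})}
adjacency = {
    u: {v for v in weights
        if v != u and frozenset({u, v}) not in incompatible}
    for u in weights
}

cliques = maximal_cliques(adjacency)
best_clique = min(cliques, key=lambda c: sum(weights[n] for n in c))
best_weight = sum(weights[n] for n in best_clique)
print(sorted(best_clique), round(best_weight, 2))
```

Each maximal clique corresponds to one mutually compatible choice of parts, so picking the clique with the best total weight is what "mix and match" amounts to in this representation.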

Computational aspects of structural genomics
[Figure]
- A. sequence space
- B. comparative modelling
- C. fold recognition
- D. ab initio prediction
- E. target selection (targets)
- F. analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
[Figure]
- structure based methods: microenvironment analysis (zinc binding site?), structure comparison (homology, function?)
- sequence based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
- + experimental data
- G. assign function: assign function to entire protein space

Conclusions: structure
- Ab initio prediction can produce low resolution models that may aid gross functional studies
- Comparative modelling can produce high resolution models that can be used to study detailed function
- Large scale structure prediction will complement experimental structural genomics efforts

Conclusions: function
- Detailed analysis of structures can be used to predict protein function, complementing experimental and sequence based techniques
- Structure comparisons and microenvironment analyses can be used to predict function on a genome-wide scale
- Large scale function prediction will complement experimental functional genomics efforts

Take home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution

Acknowledgements
- Michael Levitt, Stanford University
- John Moult, CARB
- Patrice Koehl, Stanford University
- Yu Xia, Stanford University
- Levitt and Moult groups