Modelling genome structure and function - a practical approach Ram Samudrala University of Washington.

Problem
Given an arbitrary protein sequence, how can we best construct a model using current methods for protein structure prediction?
How can we use these models to understand function?

Demonstration
As a demonstration, I requested sequences from biologists for which an experimental structure has not been determined.
Selected 3 sequences:
    dhfr-ts: 608 aa, dihydrofolate reductase/thymidylate synthase, P. falciparum
    invb: 135 aa, type III chaperone, S. typhimurium
    drae: 141 aa, Dr adhesin, E. coli

Requirements
Installation of RAMP: http://compbio.washington.edu/ramp/
Connectivity to the Internet
Installation of other software as required:
    clustalx, clustalw - http://www-igbmc-strasbg.fr/Bioinfo/ClustalX/
    dssp - http://www.cmbi.kun.nl/swift/dssp
    tinker - http://dasher.wustl.edu/tinker/

Conventions
Web addresses are indicated by underlined italics
Commands are indicated in fixed-width font
\ is used to indicate continuation of a command

Search for related proteins with known structures
Use CAFASP Meta Server: http://bioinfo.pl/meta/
Meta Server submits the sequence to several servers, including PDB-BLAST, SAM, genthreader, INBGU, PSIPRED, JPRED, etc.
drae, invb: no hits to proteins of known structure
dhfr-ts: many hits to proteins of known structure

Ab initio prediction for non-homologous proteins (invb, drae)
Sample: astronomically large number of conformations; 5 states/100 residues = 5^100 ≈ 10^70
Select: hard to design functions that are not fooled by non-native conformations ("decoys")
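
The size of the search space quoted above can be checked with a few lines of arithmetic. This is only a worked check of the 5 states per residue, 100 residues figure from the slide; the function name is made up for the illustration.

```python
import math

def conformation_count_log10(states_per_residue: int, n_residues: int) -> float:
    """Return log10 of states_per_residue ** n_residues."""
    return n_residues * math.log10(states_per_residue)

# 5 discrete states at each of 100 residues, as on the slide.
exponent = conformation_count_log10(5, 100)
print(f"5^100 is about 10^{exponent:.1f}")
print(f"5^100 has {len(str(5 ** 100))} decimal digits")
```

The exponent comes out just under 70, which is where the rounded figure of 10^70 comes from.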

Semi-exhaustive segment-based folding
Example sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
generate: fragments from database, 14-state φ,ψ model
minimise: monte carlo with simulated annealing, conformational space annealing, GA
filter: all-atom pairwise interactions, bad contacts, compactness, secondary structure
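
To make the "monte carlo with simulated annealing" step concrete, here is a minimal toy sketch in Python. It is not the RAMP implementation: the move set (change one residue's discrete state), the geometric cooling schedule, and toy_score are all assumptions chosen to keep the example short and runnable.

```python
import math
import random

def anneal(n_residues, n_states, score, steps=5000,
           t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule.

    Each residue holds one of n_states discrete states; a move changes
    the state of one randomly chosen residue. Lower scores are better.
    """
    rng = random.Random(seed)
    conf = [rng.randrange(n_states) for _ in range(n_residues)]
    current = score(conf)
    best, best_score = list(conf), current
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        pos = rng.randrange(n_residues)
        old_state = conf[pos]
        conf[pos] = rng.randrange(n_states)
        trial = score(conf)
        delta = trial - current
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial
            if current < best_score:
                best, best_score = list(conf), current
        else:
            conf[pos] = old_state  # reject: restore the previous state
    return best, best_score

def toy_score(conf):
    """Hypothetical score: count neighbouring residues in different states."""
    return sum(1 for a, b in zip(conf, conf[1:]) if a != b)

best, best_score = anneal(n_residues=20, n_states=14, score=toy_score)
print(best_score, best)
```

Different values of seed give different trajectories, which is why runs with many seeds are worth combining.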

Predict secondary structure using PSIPRED
Submit sequences to PSIPRED server: http://insulin.brunel.ac.uk/psipred/
Save PSIPRED results in a file (ex: drae.psipred, invb.psipred)
Create a DSSP file from PSIPRED prediction:
    dssp_to_ss invb.psipred invb.predict.dssp
    dssp_to_ss drae.psipred drae.predict.dssp
Edit and remove low-confidence (< 7) assignments
Make sure at least three residues are present between secondary structure elements
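
The two editing rules above (drop assignments with confidence below 7, keep at least three residues between elements) could be checked with a short script rather than by eye. The sketch below assumes the prediction and the per-residue confidence digits (0-9) have already been read into two equal-length strings; it does not parse any particular PSIPRED or DSSP file layout, and the function names and example strings are hypothetical.

```python
def mask_low_confidence(pred: str, conf: str,
                        threshold: int = 7, coil: str = "C") -> str:
    """Replace states whose confidence digit is below threshold with coil."""
    if len(pred) != len(conf):
        raise ValueError("prediction and confidence strings differ in length")
    return "".join(
        state if int(digit) >= threshold else coil
        for state, digit in zip(pred, conf)
    )

def elements(ss: str, coil: str = "C"):
    """Return (state, start, end) per run of non-coil states (0-based, inclusive)."""
    runs, start = [], None
    for i, state in enumerate(ss + coil):  # trailing coil closes the last run
        if start is not None and (state == coil or state != ss[start]):
            runs.append((ss[start], start, i - 1))
            start = None
        if start is None and state != coil:
            start = i
    return runs

def close_pairs(ss: str, min_gap: int = 3, coil: str = "C"):
    """Return consecutive elements separated by fewer than min_gap residues."""
    runs = elements(ss, coil)
    return [(a, b) for a, b in zip(runs, runs[1:]) if b[1] - a[2] - 1 < min_gap]

# Hypothetical prediction and confidence strings for illustration only.
pred = "CHHHHHHCCEEEEECCCC"
conf = "989998586799849999"
masked = mask_low_confidence(pred, conf)
print(masked)
print(close_pairs(masked))
```

Any pair reported by close_pairs marks a place where the assignment still needs hand editing.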

Sampling commands and details
Place sequence in a single file as a single line: invb.sequence
Create a 3-tuplet database file for invb:
    prune_ntuplet_db invb.sequence $RAMP_ROOT/ramp/lib/scop_e4_xray.3tuplet_db \
        3 invb.3tuplet_db
The large database file (scop_e4_xray.3tuplet_db) was created using mcgen_ntuplets from a set of PDB files and is available as part of the RAMP distribution. To recreate it, mcgen_ntuplets needs to be run on a set of protein conformations. Similarly, scop_e4_allatoms_xray_scores below was created by running compile_raw_counts and compile_scores on a list of protein conformations. There is usually no need to re-compile these files.
Run structure prediction program:
    mcgen_semfold_ss invb.sequence \
        $RAMP_ROOT/lib/scoring_functions/scop_e4_allatoms_xray_scores \
        invb.predict.dssp invb.3tuplet_db run 10 50000
The command also takes the starting seed for the random number generator; different seeds will produce different trajectories
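
One way to picture what pruning a 3-tuplet database for a sequence means: keep only the entries whose three-residue key actually occurs in the target. The Python sketch below illustrates that idea under stated assumptions; the real file format used by prune_ntuplet_db is not described in these slides, so the dictionary layout and its values here are hypothetical.

```python
def sequence_tuplets(sequence: str, n: int = 3):
    """Return the set of overlapping n-residue tuplets present in the sequence."""
    sequence = sequence.strip().upper()
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}

def prune(database: dict, sequence: str, n: int = 3) -> dict:
    """Keep only database entries whose key occurs as a tuplet in the sequence."""
    wanted = sequence_tuplets(sequence, n)
    return {key: value for key, value in database.items() if key in wanted}

# Hypothetical database: tuplet -> list of stored conformations (placeholders).
toy_db = {
    "EFD": ["conformation-a", "conformation-b"],
    "FDV": ["conformation-c"],
    "WWW": ["conformation-d"],
}
print(sorted(sequence_tuplets("EFDVIL")))
print(prune(toy_db, "EFDVIL"))
```

The pruned database is smaller and specific to one sequence, which is presumably why it is built once per target before sampling.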

Sampling commands and details continued
Run on many processors with different starting seeds
Minimise all the conformations using ENCAD (implemented in TINKER)
Repeat for drae
Store conformations in a file: list

Selection commands and details
Run dssp on all conformations
Run potential, hcf, electrostatics, vdw, ss_scores, density_scores, Shell
Combine different scores after normalisation (divide by the standard deviation)
Look at the best scoring structures for each function and determine consensus
Write scripts to automate above process (not currently available as part of the RAMP distribution)
For drae, look at best conformations with disulphide bonds between the single pair of cysteines present:
    constraints_filter list constraints-file
The constraints file contains one line:
    constraint 21 53 6.0 2.0
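
Since the automation scripts are not part of the distribution, a rough Python sketch of the combine-and-consensus step may help. It assumes each scoring function has already been run and its output collected into a dictionary, that lower scores are better for every function, and that the population standard deviation is the normaliser; the conformation names and score values are made up.

```python
import statistics

def normalise(scores):
    """Divide each score by the standard deviation of the set."""
    sd = statistics.pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]  # identical scores carry no information
    return [s / sd for s in scores]

def combine(score_table):
    """Sum normalised scores per conformation.

    score_table maps function name -> {conformation name: score}.
    """
    names = sorted(next(iter(score_table.values())))
    totals = dict.fromkeys(names, 0.0)
    for per_conf in score_table.values():
        normed = normalise([per_conf[name] for name in names])
        for name, value in zip(names, normed):
            totals[name] += value
    return totals

def consensus(score_table, top=2):
    """Count how often each conformation is among the top few per function."""
    counts = {}
    for per_conf in score_table.values():
        for name in sorted(per_conf, key=per_conf.get)[:top]:
            counts[name] = counts.get(name, 0) + 1
    return counts

# Hypothetical scores for three conformations from two functions.
table = {
    "potential": {"run.1": -120.0, "run.2": -150.0, "run.3": -90.0},
    "hcf": {"run.1": 8.0, "run.2": 6.0, "run.3": 9.0},
}
totals = combine(table)
print(min(totals, key=totals.get))
print(consensus(table))
```

Dividing by the standard deviation puts functions with very different numeric ranges on a comparable footing before they are summed.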

Current performance of our ab initio prediction methods
Consistently predicted correct topology at CASP4: ~4-6.0 Å for 60-80+ residues
**T110/rbfa - 4.0 Å (80 residues; 1-80)
*T114/afp1 - 6.5 Å (45 residues; 36-80)
**T97/er29 - 6.0 Å (80 residues; 18-97)
**T106/sfrp3 - 6.2 Å (70 residues; 6-75)
*T98/sp0a - 6.0 Å (60 residues; 37-105)
**T102/as48 - 5.3 Å (70 residues; 1-70)

Best predictions for invb

Mapping mutations on invb predictions
Binding to sspa, type III effector, actin nucleation
Residues labelled in figure: PHE122, PHE83, ASP36, LEU98, PRO41, LEU98

Best predictions for drae

Mapping mutations on drae predictions
Binding to daf, decay accelerating factor
Residues labelled in figure: ASP65, THR10, ASN79, THR131, ASP63, ILE75

Comparative modelling for homologous proteins (dhfr-ts)
    KDHPFGFAVPTKNPDGTMNLMNWECAIP
    KDPPAGIGAPQDN----QNIMLWNAVIP
    ** * *   *  *     * * *   **
Steps: align; build initial model; construct non-conserved side chains and main chains; refine
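
Percent identity, used later when choosing templates, can be computed directly from an aligned pair such as the fragment on this slide. The sketch below counts identical residues over columns where neither sequence has a gap; other denominators are in use, so that choice is an assumption.

```python
def identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identical, aligned) counts over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    identical = sum(1 for a, b in pairs if a == b)
    return identical, len(pairs)

# The aligned fragment shown on the slide.
target = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
template = "KDPPAGIGAPQDN----QNIMLWNAVIP"
identical, aligned = identity(target, template)
print(f"{identical}/{aligned} = {100.0 * identical / aligned:.1f}% identity")
```

For this fragment the count is 11 identical residues over 24 aligned columns, matching the 11 asterisks in the slide's alignment.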

Generate alignments using Meta Server and clustalx – ts domain

Template choice: 1hvy-A, 56% identity, 1.9 Å resolution, 0.20 R
Best e-value as per SAM output
Model residues 324-608

Generate alignments using Meta Server and clustalx – dhfr domain

Template choice: 1drf, 34% identity, 2.0 Å resolution, 0.19 R
Best e-value as per SAM output
Model residues 1-230

Construct initial models
Build dhfr and ts domains separately and then build linker region (residues 231-323)
Construct initial models:
    scgen_mutate 1dfr.pdb 1dfr-dhfr.alignment
    scgen_mutate 1hvy-A.pdb 1hvy-A-ts.alignment

Build non-conserved main chains (loops)
Delineate loop regions within domains:
    dhfr: 1-26, 64-93, 131-133, 153-155, 207-209
    ts: 15-19
Construct loop data files for each loop. Example:
    15 19 MMGN ts.v000.pdb none 0 constraint 19 20 3.80 0.5
Need to run prune_ntuplet_db for the loop sequence
Build loops sequentially using this command for each loop:
    mcgen_semfold_loop \
        $RAMP_ROOT/lib/scop_e4_allatoms_xray_scores \
        loop. 10 1000000
Minimise conformation after building each loop to relieve bumps

Put the two domains together: construct linker region
Complicated
Do ab initio simulation, but keep the conformations of the two domains (as modelled by comparative modelling) fixed and only let the linker region (residues 231-323) vary
Need to use idealised geometry; fit torsion angles in each model in the final set of conformations to 14 discrete φ/ψ states:
    mcgen_fit –syssearch.14
Create a DSSP file containing the φ/ψ angles for the two domains
    mcgen_semfold_region_ss dhfr_ts.sequence \
        $RAMP_ROOT/lib/scoring_functions/scop_e4_allatoms_xray_scores \
        dhfr_ts.dssp dhfr_ts.3tuplet_db run 10 50000 231-323
Sample as many conformations as computational resources permit
Select final conformations using an identical strategy to the one employed for invb and drae
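
Fitting torsion angles to discrete states amounts to snapping each residue's pair of backbone angles to the nearest allowed pair, taking the 360 degree periodicity into account. The Python sketch below shows that idea only; it treats the pair as (phi, psi), and the three states listed are illustrative placeholders, not the 14 states used by mcgen_fit.

```python
def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_state(phi: float, psi: float, states) -> int:
    """Return the index of the discrete state closest to (phi, psi)."""
    return min(
        range(len(states)),
        key=lambda i: angle_diff(phi, states[i][0]) ** 2
        + angle_diff(psi, states[i][1]) ** 2,
    )

# Hypothetical states (degrees): helix-like, strand-like, left-handed.
STATES = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]

for phi, psi in [(-65.0, -40.0), (-130.0, 140.0), (-120.0, -175.0)]:
    print((phi, psi), "->", STATES[nearest_state(phi, psi, STATES)])
```

The third example shows why the periodic difference matters: a psi of -175 is only 55 degrees from 130, so the residue is assigned to the strand-like state.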

Current performance of our comparative modelling methods
Overall model accuracy at CASP4 ranging from 1 Å to 6 Å for 50-10% identity
Approximately 75% of side chain χ1 angles within 30°
Short (<= 6 aa) loops predicted to within 1.0 Å global Cα RMSD
Longer (7-12 aa) loops predicted to within 3.0 Å global Cα RMSD
**T112/dhso - 4.9 Å (348 residues; 24%)
**T92/yeco - 5.6 Å (104 residues; 12%)
**T128/sodm - 1.0 Å (198 residues; 50%)
**T125/sp18 - 4.4 Å (137 residues; 24%)
**T111/eno - 1.7 Å (430 residues; 51%)
**T122/trpa - 2.9 Å (241 residues; 33%)
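
For reference, the RMSD figures quoted here are root mean square deviations over matched Cα positions. The sketch below computes that quantity for two coordinate lists that are already in the same frame; it performs no superposition. Reading "global" as meaning the loop is compared after the rest of the structure has been superposed is an assumption, and the coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b) -> float:
    """Root mean square deviation between matched (x, y, z) positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates (in Å) for a three-residue loop.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(f"{rmsd(model, native):.2f} Å")
```

Because no fitting is done, a loop that has the right shape but sits in the wrong place still scores badly, which is what makes a global measure stricter than a local one.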

Prediction for dhfr domain Expect: < 2.0 Å Cα RMSD

Prediction for ts domain Expect: < 4.0 Å Cα RMSD

Prediction for dhfr-ts entire sequence (sample)

Mixing and matching between the best possibilities
More than one model can be produced (different templates, parameters, etc.)
Select best parts of each model and mix and match between models in an exhaustive manner using a graph theoretic approach:
    cf_single $RAMP_ROOT/lib/scop_e4_allatoms_xray_scores
A list of filenames is also supplied, each of which contains all or a portion of a model

A graph theoretic representation of protein structure
Figure panels: represent residues as nodes; weigh nodes; construct graph; find cliques
Node weights: -0.6 (V1), -1.0 (F), -0.7 (K), -0.5 (I), -0.9 (V2)
Edge weights shown in the figure: -0.3, -0.4, -0.2, -0.1, -0.2, -0.1
find cliques: W = -4.5
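
The clique idea can be tried on a toy graph. In the Python sketch below, nodes are alternative residue conformations with the node weights from the figure, an edge joins two nodes that can coexist, and the best combination is the maximal clique with the lowest total weight. Which edge carries which weight is not recoverable from the figure, so the edge assignment is hypothetical and the result is not meant to reproduce W = -4.5. The clique enumeration is plain Bron-Kerbosch, which is not necessarily what cf_single does.

```python
from itertools import combinations

def build_adjacency(nodes, edge_weights):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = {n: set() for n in nodes}
    for pair in edge_weights:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def maximal_cliques(adjacency):
    """Yield all maximal cliques (Bron-Kerbosch without pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adjacency), set())

def clique_weight(clique, node_weights, edge_weights):
    """Sum node weights plus weights of all edges inside the clique."""
    total = sum(node_weights[n] for n in clique)
    total += sum(edge_weights[frozenset(p)] for p in combinations(clique, 2))
    return total

# Node weights from the figure; V1 and V2 are alternatives for one residue,
# so no edge joins them. Edge assignment below is hypothetical.
node_weights = {"V1": -0.6, "V2": -0.9, "F": -1.0, "K": -0.7, "I": -0.5}
edge_weights = {
    frozenset(("F", "K")): -0.3, frozenset(("F", "I")): -0.4,
    frozenset(("K", "I")): -0.2, frozenset(("V2", "F")): -0.1,
    frozenset(("V2", "K")): -0.2, frozenset(("V2", "I")): -0.1,
    frozenset(("V1", "F")): -0.1, frozenset(("V1", "K")): -0.1,
}
adjacency = build_adjacency(node_weights, edge_weights)
best = min(maximal_cliques(adjacency),
           key=lambda c: clique_weight(c, node_weights, edge_weights))
print(sorted(best), round(clique_weight(best, node_weights, edge_weights), 2))
```

Leaving out the V1-V2 edge is what forces the search to pick exactly one conformation for that residue.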

Computational aspects of structural genomics
A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis
(Figure idea by Steve Brenner.)

Computational aspects of functional genomics
structure based methods: microenvironment analysis (zinc binding site?), structure comparison (homology; function?)
sequence based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
+ experimental data
G. assign function: assign function to entire protein space

Conclusions: structure
Ab initio prediction can produce low resolution models that may aid gross functional studies
Comparative modelling can produce high resolution models that can be used to study detailed function
Large scale structure prediction will complement experimental structural genomics efforts

Conclusions: function
Detailed analysis of structures can be used to predict protein function, complementing experimental and sequence based techniques
Structure comparisons and microenvironment analyses can be used to predict function on a genome-wide scale
Large scale function prediction will complement experimental functional genomics efforts

Take home message
Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution

Acknowledgements
Michael Levitt, Stanford University
John Moult, CARB
Patrice Koehl, Stanford University
Yu Xia, Stanford University
Levitt and Moult groups
