Lab 9.3a: Homology Modeling


1 Lab 9.3a: Homology Modeling
Boris Steipe, February 2003
Departments of Biochemistry and Molecular and Medical Genetics, Program in Proteomics and Bioinformatics, University of Toronto
9.3a © 2002 CBW/CGDN

2 Concepts
1. Sequence alignment is the single most important step in homology modeling.
2. Reasons to model need to be defined.
3. Fully automated homology modeling services perform well.
4. SwissModel in practice.

3 Concept 1: Sequence alignment is the single most important step in homology modeling.
Where are we with respect to our objectives?

4 What is conserved in structure?
E-E.coli    [...] IKTRFAPSPTGYLHVGGARTA [...] EQMAKGE----KPRYDGRC [...] AHVSMINGDDGKKLSKRH
E-P.putida  [...] VRTRIAPSPTGDPHVGTAYIA [...] EQQARGE----TPRYDGRA [...] CYMPLLRNPDKSKLSKRK
Q-E.coli    [...] VHTRFPPEPNGYLHIGHAKSI [...] TLTQPGKNSPYRDRSVEEN [...] YEFSRL-NLEYTVMSKRK
Q-Fly       [...] VHTRFPPEPNGILHIGHAKAI [...] FNPKPS---PWRERPIEES [...] WEYGRL-NMNYALVSKRK
Q-Human     [...] VRTRFPPEPNGILHIGHAKAI [...] HNTLPS---PWRDRPMEES [...] WEYGRL-NLHYAVVSKRK
E-Fly       [...] VVVRFPPEASGYLHIGHAKAA [...] QRVE----SANRSNSVEKN [...] WSYSRL-NMTNTVLSKRK
E-Human     [...] VTVRFPPEASGYLHIGHAKAA [...] QRIE----SKHRKNPIEKN [...] WEYSRL-NLNNTVLSKRK
E-Yeast     [...] VVTRFPPEPSGYLHIGHAKAA [...] DGVA----SARRDRSVEEN [...] WDFARI-NFVRTLLSKRK
ATP-Binding | || | || || |
QRS E. coli vs. ERS P. putida: ~19% ID
Many regions are expected to be highly conserved in structure.
Some changes should be straightforward to model.

5 What is conserved in structure?
(Alignment as on slide 4.)
How would sidechain rotamers be modeled?
- conserved dihedral angles
- preferred rotamers
- DEE (Dead End Elimination theorem) for global consistency.
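The Dead End Elimination idea named in the last bullet can be sketched in a few lines. This is only a minimal illustration of the basic single-rotamer criterion (a rotamer is discarded when its best-case energy is still worse than the worst-case energy of some alternative at the same position); it is not the procedure of any particular modeling server. The function name `dee_prune` and the energy tables are hypothetical.

```python
def dee_prune(self_e, pair_e):
    """Iteratively remove rotamers that cannot be part of the global minimum.

    self_e[i][r]         : self energy of rotamer r at position i
    pair_e[(i, j)][r][s] : pair energy of rotamer r at i with rotamer s at j (i < j)
    Returns a dict: position -> list of surviving rotamer indices.
    """
    alive = {i: list(range(len(e))) for i, e in enumerate(self_e)}

    def pair(i, r, j, s):
        # The table is stored once per position pair, with the lower index first.
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # Self energy plus best-case (min) or worst-case (max) pair energies.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j])
            for j in alive if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in alive:
            for r in list(alive[i]):
                best_r = bound(i, r, min)
                # r is dead-ending if some other rotamer t is better even in its worst case.
                if any(best_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].remove(r)
                    changed = True
    return alive


# Made-up energies for two positions with two rotamers each.
self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 1.0], [0.5, 0.0]]}
survivors = dee_prune(self_e, pair_e)
```

In this toy case one rotamer survives at each position, so no further search is needed; in general DEE only shrinks the search space, and the remaining combinations still have to be enumerated or searched.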

6 Homology Modeling Issues
(Alignment as on slide 4.)
How would you (or should you even) model indels?
- Where should the insertion be placed?
- What is the conformation of the new residues?
- Which residues should be deleted?
- How many additional residues need to change conformation?

7 Alignment is the limiting step for homology model accuracy
No amount of forcefield minimization will put a misaligned residue in the right place!
CASP4: Williams MG et al. (2001) Proteins Suppl. 5: 92-97

8 Superposition vs. Alignment
The coordinates of two proteins can be superimposed in space.
An alignment may be derived from a superposition by correlating residues that are close in space.
An optimal sequence alignment may lead to a different alignment ...
1GTR vs 2TS1

9 Superposition vs. Alignment
[Figure: optimal sequence alignment and structure-based alignment of TyrRS (2TS1) with GlnRS (1GTR), shown residue by residue; the aligned region ends at residue 262 of 1GTR and residue 223 of 2TS1.]
Example: structural vs. sequence alignment between E. coli GlnRS and G. stearothermophilus TyrRS.
Although the optimal sequence alignment is not unreasonable (19% ID = 40/212 residues), comparison with the structure shows it is actually wrong for all but 11 residues!
The structure-based alignment is quite dissimilar in sequence (4.5% ID = 12/265 residues), but the superposition actually matches 39% of residues (104/265) over the length of the domain.
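The percentages quoted on this slide are simple ratios of identical positions to compared positions. As a small sketch, the helper below counts identities over the columns of a pairwise alignment in which both sequences have a residue; that choice of denominator is an assumption (the slide does not say how its denominators of 212 and 265 were defined), and the function name is hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where both sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns with a gap in either sequence; compare case-insensitively.
    pairs = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# The slide's own counts, reproduced as plain ratios.
sequence_based = 100.0 * 40 / 212    # about 19 %
structure_based = 100.0 * 12 / 265   # about 4.5 %
superposed = 100.0 * 104 / 265       # about 39 %
```

The point of the slide survives the arithmetic: a respectable-looking identity in the sequence alignment says nothing about whether the aligned residues are the structurally equivalent ones.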

10 Inserts may be accommodated in a distant part of the structure
Example - a five residue insert
Sequence alignment (shows what happened)
gktlit-----nfsqehip
gktlisflyeqnfsqehip
Structure alignment (shows how it's accommodated)
gktlitnfsq ehip
α-helix

11 Off by 1, Off by 4
A shift in alignment of 1 residue corresponds to a skew in the modeled structure of about 4 Å (3.8 Å is the distance between consecutive alpha carbons).
Nothing you can do AFTER an alignment will fix this error (not even molecular dynamics).
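To put numbers on the size of such a shift, the sketch below places Cα atoms on an ideal α-helix and measures how far residue i+k lies from residue i. The helix parameters (radius 2.3 Å, rise 1.5 Å and 100° of turn per residue) are textbook values assumed here, not taken from the slide, and the function name is hypothetical.

```python
import math


def helix_ca_distance(k, radius=2.3, rise=1.5, turn_deg=100.0):
    """Distance in Å between the Cα atoms of residues i and i+k on an ideal α-helix."""
    angle = math.radians(turn_deg * k)
    # Chord across the helix cylinder, combined with the climb along the axis.
    chord = 2.0 * radius * abs(math.sin(angle / 2.0))
    return math.hypot(chord, rise * k)


off_by_one = helix_ca_distance(1)   # close to the 3.8 Å quoted on the slide
off_by_four = helix_ca_distance(4)  # roughly one helical turn further along
```

Under these assumptions a one-residue shift displaces every affected Cα by about 3.8 Å, and a four-residue shift in a helix by a little over 6 Å. Either is far more than refinement can be expected to repair.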

12 Indels (inserts or deletions)
Observations of known similarities in structures demonstrate that uniform gap penalty assumptions are NOT BIOLOGICAL.
Indels are most often observed in loops, less often in secondary structure elements.
When they do not occur in loops, there is usually a maintenance of helical or strand properties.

13 Can we do better with the gap assumption?
Required: position specific gap penalties
One approach: implemented in Clustal as secondary structure masks
Get secondary structure information, convert it to Clustal mask format. (Easy - read documentation!)
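What a position specific gap penalty does to an alignment can be shown with a small global alignment routine. This is a sketch under stated assumptions, not Clustal's implementation: scoring is a plain match/mismatch, gaps are charged per column, and the penalty values, the mask alphabet ('H', 'E', '-') and the function name are all hypothetical.

```python
def align_with_mask(a, b, mask, match=1, mismatch=-1, loop_gap=2, sse_gap=8):
    """Global alignment of a (secondary structure known) with b.

    mask has one character per residue of a: 'H' helix, 'E' strand, '-' loop.
    Gaps that fall inside a helix or strand cost sse_gap, all others loop_gap.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    if len(mask) != n:
        raise ValueError("mask must have one character per residue of a")
    # Cost of aligning residue a[i] to a gap (deleting it).
    pen = [sse_gap if c in "HE" else loop_gap for c in mask]
    # Cost of inserting a residue of b between a[i-1] and a[i]; it counts as
    # "inside" an element only if both flanking residues are structured.
    ins = [loop_gap] + [min(pen[i - 1], pen[i]) for i in range(1, n)] + [loop_gap]

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - pen[i - 1]
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - ins[0]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] - pen[i - 1],
                              score[i][j - 1] - ins[i])

    # Trace back from the corner, preferring substitutions over gaps.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - pen[i - 1]:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


uniform = align_with_mask("ACDEFG", "ACDXEFG", "------")
masked = align_with_mask("ACDEFG", "ACDXEFG", "HHHHH-")
```

With a uniform penalty the extra residue is placed where the sequences suggest (`ACD-EFG`); with the first five residues marked as helix the gap moves to the helix end (`ACDEF-G`) at the cost of two mismatches. That is the behaviour the mask is meant to produce: indels are pushed out of secondary structure elements.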

14 Secondary structure from PDB ....
(Algorithm?)

15 Secondary structure from RasMol ....
(DSSP!)
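As a sketch of the first route (secondary structure as recorded in the PDB file itself), the helper below reads HELIX and SHEET records and produces a per-residue string of 'H', 'E' and '-'. The column positions follow the fixed-column PDB format as I understand it; insertion codes are ignored and residue numbering is assumed to be contiguous. The output alphabet is my own choice and is not the Clustal mask file format, which still needs the conversion step described in the Clustal documentation. `demo_record` exists only to build example lines and fills in nothing but the fields that are read.

```python
def ss_mask_from_pdb(lines, chain, first, last):
    """Return an 'H'/'E'/'-' string for residues first..last of one chain."""
    mask = ["-"] * (last - first + 1)

    def mark(start, end, symbol):
        for resnum in range(max(start, first), min(end, last) + 1):
            mask[resnum - first] = symbol

    for line in lines:
        # HELIX: chain in column 20, start in 22-25, end in 34-37 (1-based).
        if line.startswith("HELIX") and line[19] == chain:
            mark(int(line[21:25]), int(line[33:37]), "H")
        # SHEET: chain in column 22, start in 23-26, end in 34-37 (1-based).
        elif line.startswith("SHEET") and line[21] == chain:
            mark(int(line[22:26]), int(line[33:37]), "E")
    return "".join(mask)


def demo_record(kind, chain, start, end):
    """Build a minimal fixed-column HELIX or SHEET line (all other fields blank)."""
    cols = [" "] * 40
    cols[0:5] = kind
    if kind == "HELIX":
        cols[19] = chain
        cols[21:25] = f"{start:>4}"
    else:
        cols[21] = chain
        cols[22:26] = f"{start:>4}"
    cols[33:37] = f"{end:>4}"
    return "".join(cols)


records = [
    demo_record("HELIX", "A", 3, 6),
    demo_record("SHEET", "A", 9, 10),
    demo_record("HELIX", "B", 1, 12),  # other chain, ignored below
]
mask = ss_mask_from_pdb(records, "A", 1, 12)
```

The records in a PDB file reflect whatever assignment the depositors or the processing pipeline used, which is the question the previous slide raises; recomputing the assignment with DSSP gives a uniform definition across structures.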

16 Reasons to model need to be defined.
Concept 2: Reasons to model need to be defined.
Where are we with respect to our objectives?

17 Use of homology models
Biochemical inference from 3D similarity
- Bonds
- Angles, plain and dihedral
- Surfaces, solvent accessibility
- Amino acid functions, presence in structure patterns
- Spatial relationship of residues to active site
- Spatial relationship to other residues
- Participation in function / mechanism
- Static and dynamic disorder
- Electrostatics
- Conservation patterns (structural and functional)
- Posttranslational modification sites (but not structural consequences!)
- Suitability as drug target
Don't!

18 Abuse of homology models
- Modelling properties that cannot / will not be verified
- Analysing geometry of model
- Interpreting loop structures near indels
- Inferring relative domain arrangement
- Inferring structures of complexes

19 Databases of Models
Don’t make models unless you check first...
Swiss-Model repository: 64,000 models based on 4,000 structures and Swiss-Prot proteins
ModBase: made with "Modeller" - 15,000 reliable models for substantial segments of approximately 4,000 proteins in the genomes of Saccharomyces cerevisiae, Mycoplasma genitalium, Methanococcus jannaschii, Caenorhabditis elegans, and Escherichia coli.

20 Fully automated services perform well.
Concept 3: Fully automated services perform well.
Where are we with respect to our objectives?

21 Homology Modeling Process
Flow: TAR -> Search (PSI-BLAST, against nr (PDB)) -> HOM, TEM -> Align (T-Coffee, Cinema) -> MSA -> Model (SwissModel, ExPDB) -> 3D -> Complete (TextEditor, LIG) -> 3DC -> Analyse (RasMol, Consurf) -> PUB
Search of nr (PDB): these are really two queries rolled into one procedure.
TAR: Target sequence
Search: Sequence database similarity search
nr: non-redundant Genbank subset (with annotated structures)
HOM: Homologous sequences
TEM: Sequences of homologues with known structure
Align: Careful Multiple Sequence Alignment
MSA: Multiple Sequence Alignment
Model: Generate 3D Model
ExPDB: Modeling template structure database
Complete: Add ligands, substrates etc. to model
Analyse: Interpret and conclude
PUB: Publish results

22 Homology Modeling Software?
Freely available packages perform as well as commercial ones at CASP (Critical Assessment of Structure Prediction).
- Swiss Model (see your Integrated Assignment)
- Modeller

23 Swiss-Model steps: Search for sequence similarities
BLASTP against EX-NRL 3D
Peitsch M & Guex N (1997) Electrophoresis 18: 2714

24 Swiss-Model steps: Evaluate suitable templates
Identity: > 25%
Expected model: > 20 residues
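The two thresholds on this slide amount to a simple filter over the search hits. The sketch below is an illustration only: the `Hit` record, its field names and the template identifiers are hypothetical, and the aligned length is used as a stand-in for the expected model length, which is my assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    template_id: str         # hypothetical identifier of a template structure
    percent_identity: float  # sequence identity between target and template
    aligned_length: int      # residues the resulting model would be expected to cover


def suitable_templates(hits, min_identity=25.0, min_length=20):
    """Keep hits above both thresholds, highest identity first."""
    kept = [h for h in hits
            if h.percent_identity > min_identity and h.aligned_length > min_length]
    return sorted(kept, key=lambda h: h.percent_identity, reverse=True)


hits = [
    Hit("t1", 48.0, 150),
    Hit("t2", 22.0, 200),  # identity too low
    Hit("t3", 31.0, 15),   # model would be too short
    Hit("t4", 60.0, 90),
]
selected = [h.template_id for h in suitable_templates(hits)]
```

A hit that passes these cutoffs is only a candidate; as the alignment slides argue, identity near the lower bound is exactly where the alignment deserves manual review.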

25 Swiss-Model steps: Generate structural alignments
Select regions of similarity and match in coordinate-space (EXPDB).

26 Swiss-Model steps: Average backbones
Compute weighted average coordinates for backbone atoms expected to be in model.
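The averaging step can be written down directly. In this sketch the templates are assumed to be already superimposed and to list the same backbone atoms in the same order. The slide does not say how the weights are chosen; weighting each template by its similarity to the target is one plausible assumption, and the numbers below are made up.

```python
def average_backbone(templates, weights):
    """Weighted average of equivalent atoms across superimposed templates.

    templates: list of coordinate lists, each [(x, y, z), ...] of equal length
    weights:   one non-negative weight per template
    """
    if len(templates) != len(weights) or not templates:
        raise ValueError("need one weight per template")
    total = sum(weights)
    averaged = []
    for k in range(len(templates[0])):
        averaged.append(tuple(
            sum(w * t[k][axis] for t, w in zip(templates, weights)) / total
            for axis in range(3)
        ))
    return averaged


# Two templates with two atoms each; the first template counts three times as much.
template_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template_b = [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
framework = average_backbone([template_a, template_b], [3.0, 1.0])
```

Averaging only makes sense for positions where the templates agree reasonably well; where they diverge, the mean of two valid conformations need not be a valid conformation itself.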

27 Swiss-Model steps: Build loops
Pick plausible loops from library, ligate to stems; if not possible, try combinatorial search.

28 Swiss-Model steps: Bridge incomplete backbones
Bridge with overlapping pieces from pentapeptide fragment library, anchor with the terminal residues and add the three central residues.

29 Swiss-Model steps: Rebuild sidechains
Rebuild sidechains from rotamer library - complete sidechains first, then regenerate partial sidechains from probabilistic approach.

30 Swiss-Model steps: Energy minimize
Gromos 96 - Energy minimization

31 Swiss-Model steps: summary
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
7. Rebuild sidechains
8. Energy minimize
9. Write Alignment and PDB file results
Peitsch M & Guex N (1997) Electrophoresis 18: 2714

32 CASP5 (2002) - Homology
[Figure: RMSD(target, template) - RMSD(target, model), in Å; regions labeled "worse than template" and "better"; annotation: "shocking!"]
Remote sequence similarity detection methods have improved.
Coordinate manipulations do not improve accuracy.
Tramontano A & Morea V (2003) Assessment of homology based predictions in CASP5. Proteins S6:
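The quantity plotted on this slide is easy to compute once the structures share a frame. The sketch below assumes all three coordinate sets are already superimposed and list equivalent atoms in the same order; it does not perform the superposition itself. The function names and coordinates are made up for illustration.

```python
import math


def rmsd(a, b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def model_gain(target, template, model):
    """RMSD(target, template) - RMSD(target, model); positive means the model
    ended up closer to the experimental target than the template it was built from."""
    return rmsd(target, template) - rmsd(target, model)


target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
template = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
gain = model_gain(target, template, model)
```

A negative value is the "worse than template" case: the modeling procedure moved atoms away from where they are in the real structure.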

33 Swissmodel in comparison
3D-Crunch: 211,000 sequences -> 64,000 models
Controls:
>50% ID: ~1 Å RMSD
40-49% ID: 63% < 3 Å
25-29% ID: 49% < 4 Å
Manual alternatives: Modeller ...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw, pcomb_pcons, cphmodels, easypred
#1 for RMSD and % correct aligned, #2 for coverage
Guex et al. (1999) TIBS 24:
EVA: Eyrich et al. (2001) Bioinformatics 17:

34 SwissModel in practice.
Concept 4: SwissModel in practice.
Where are we with respect to our objectives?

35 SwissModel ... first approach mode

36 ... enter the ExPDB template ID...

37 ... run in Normal Mode (Except if defining a DeepView project )...

38 ... successful submission.
Results come by e-mail.

39 Homology Modeling in Practice
How to assess model reliability?
- All indels are wrong
- Structure analysis ("threading", "solvent accessibility", compatibility with ligands) can point out possible alignment errors
- But: no point in "repairing" stereochemistry, only review alignment.

40 Homology Modeling in Practice
Can you predict function from your model?
No (and yes) - the model may be incompatible with a specific function.

41 Uses of structure revisited - I:
Prototype 1: Analytical
Explain mechanistic aspects of protein (e.g. in terms of)
- residues involved in catalysis
- global properties (like electrostatics)
- shape, relative orientation and distances of domains or subdomains
- flexibility and dynamics - e.g. hypothesizing about the rate limiting step

42 Uses of structure revisited - II:
Prototype 2: Comparative
Bring conservation patterns into a spatial context in order to infer causality from (database) correlations (e.g. in terms of)
- describing context specific conservation patterns and analyzing these according to conserved properties
- analyzing the predicted effect of sequence variation (e.g. for engineering changes, fusing domains or predicting SNP effects)
- distinguishing physiological vs. nonphysiological interactions

43 Questions? Feedback?
boris.steipe@utoronto.ca

