9.3a1 Lab 9.3: Homology Modeling Boris Steipe Departments of Biochemistry and Molecular.


9.3a1 Lab 9.3: Homology Modeling Boris Steipe Departments of Biochemistry and Molecular and Medical Genetics Program in Proteomics and Bioinformatics University of Toronto

9.3a2

9.3a3 Concepts
1. Sequence alignment is the single most important step in homology modeling.
2. Reasons to model need to be defined.
3. Fully automated homology modeling services perform well.
4. SwissModel in practice.

9.3a4 Concept 1: Sequence alignment is the single most important step in homology modeling.

9.3a5 What is conserved in structure?
E-E.coli   [...] IKTRFAPSPTGYLHVGGARTA [...] EQMAKGE----KPRYDGRC [...] AHVSMINGDDGKKLSKRH
E-P.putida [...] VRTRIAPSPTGDPHVGTAYIA [...] EQQARGE----TPRYDGRA [...] CYMPLLRNPDKSKLSKRK
Q-E.coli   [...] VHTRFPPEPNGYLHIGHAKSI [...] TLTQPGKNSPYRDRSVEEN [...] YEFSRL-NLEYTVMSKRK
Q-Fly      [...] VHTRFPPEPNGILHIGHAKAI [...] FNPKPS---PWRERPIEES [...] WEYGRL-NMNYALVSKRK
Q-Human    [...] VRTRFPPEPNGILHIGHAKAI [...] HNTLPS---PWRDRPMEES [...] WEYGRL-NLHYAVVSKRK
E-Fly      [...] VVVRFPPEASGYLHIGHAKAA [...] QRVE----SANRSNSVEKN [...] WSYSRL-NMTNTVLSKRK
E-Human    [...] VTVRFPPEASGYLHIGHAKAA [...] QRIE----SKHRKNPIEKN [...] WEYSRL-NLNNTVLSKRK
E-Yeast    [...] VVTRFPPEPSGYLHIGHAKAA [...] DGVA----SARRDRSVEEN [...] WDFARI-NFVRTLLSKRK
ATP-Binding | || | || || |
QRS E. coli vs. ERS P. putida: ~19% ID
Many regions are expected to be highly conserved in structure.
Some changes should be straightforward to model.
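To make a figure such as "~19% ID" concrete, here is a minimal Python sketch of how percent identity can be computed from two rows of an alignment. It assumes identity is counted over columns in which both rows carry a residue; other definitions divide by the alignment length or by the shorter sequence, so treat the denominator as an assumption. The function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity over the columns where both rows have a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are not counted (assumption, see lead-in)
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# First block of the alignment above: Q-E.coli vs. E-P.putida
q_ecoli = "VHTRFPPEPNGYLHIGHAKSI"
e_pputida = "VRTRIAPSPTGDPHVGTAYIA"
print(f"{percent_identity(q_ecoli, e_pputida):.1f}% identity in this block")
```

A conserved motif block like this one scores well above the figure quoted for the full-length pair, which is why the identity of a short excerpt should not be mistaken for the identity of the whole alignment.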

9.3a6 What is conserved in structure?
E-E.coli   [...] IKTRFAPSPTGYLHVGGARTA [...] EQMAKGE----KPRYDGRC [...] AHVSMINGDDGKKLSKRH
E-P.putida [...] VRTRIAPSPTGDPHVGTAYIA [...] EQQARGE----TPRYDGRA [...] CYMPLLRNPDKSKLSKRK
Q-E.coli   [...] VHTRFPPEPNGYLHIGHAKSI [...] TLTQPGKNSPYRDRSVEEN [...] YEFSRL-NLEYTVMSKRK
Q-Fly      [...] VHTRFPPEPNGILHIGHAKAI [...] FNPKPS---PWRERPIEES [...] WEYGRL-NMNYALVSKRK
Q-Human    [...] VRTRFPPEPNGILHIGHAKAI [...] HNTLPS---PWRDRPMEES [...] WEYGRL-NLHYAVVSKRK
E-Fly      [...] VVVRFPPEASGYLHIGHAKAA [...] QRVE----SANRSNSVEKN [...] WSYSRL-NMTNTVLSKRK
E-Human    [...] VTVRFPPEASGYLHIGHAKAA [...] QRIE----SKHRKNPIEKN [...] WEYSRL-NLNNTVLSKRK
E-Yeast    [...] VVTRFPPEPSGYLHIGHAKAA [...] DGVA----SARRDRSVEEN [...] WDFARI-NFVRTLLSKRK
ATP-Binding | || | || || |
How would sidechain rotamers be modeled?
- conserved dihedral angles
- preferred rotamers
- DEE (Dead End Elimination theorem) for global consistency
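The dead-end elimination idea can be sketched in a few lines of Python. This is a toy illustration using the original pairwise DEE criterion (a rotamer is discarded if its best-case energy is still worse than another rotamer's worst-case energy at the same position). The energies are random numbers, not a force field, and all names are hypothetical. The brute-force search is included only to check that pruning never removes the global minimum.

```python
import itertools
import random


def dead_end_elimination(self_e, pair_e):
    """Prune rotamers that cannot be part of the lowest-energy combination.

    self_e[i][r]         : energy of rotamer r at position i
    pair_e[(i, j)][r][s] : interaction of rotamer r at i with rotamer s at j (i < j)
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def bound(i, r, pick):
        # Self energy plus the best (min) or worst (max) possible interaction
        # with the surviving rotamers at every other position.
        return self_e[i][r] + sum(
            pick(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i
        )

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_r = bound(i, r, min)
                # r is a dead end if some other rotamer t beats it even in t's worst case.
                if any(best_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


def brute_force_minimum(self_e, pair_e):
    """Exhaustive search, feasible only for toy problems."""
    n = len(self_e)

    def energy(assign):
        e = sum(self_e[i][assign[i]] for i in range(n))
        return e + sum(pair_e[(i, j)][assign[i]][assign[j]]
                       for i, j in itertools.combinations(range(n), 2))

    return min(itertools.product(*(range(len(r)) for r in self_e)), key=energy)


# Toy problem: 4 positions, 3 rotamers each, made-up energies.
rng = random.Random(0)
n_pos, n_rot = 4, 3
self_e = [[rng.uniform(0, 10) for _ in range(n_rot)] for _ in range(n_pos)]
pair_e = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
          for i, j in itertools.combinations(range(n_pos), 2)}

alive = dead_end_elimination(self_e, pair_e)
best = brute_force_minimum(self_e, pair_e)
print("surviving rotamers per position:", [sorted(a) for a in alive])
print("global minimum (brute force):   ", best)
```

Whatever is pruned, the rotamers of the global minimum always survive, which is the "global consistency" the slide refers to.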

9.3a7 Homology Modeling Issues
E-E.coli   [...] IKTRFAPSPTGYLHVGGARTA [...] EQMAKGE----KPRYDGRC [...] AHVSMINGDDGKKLSKRH
E-P.putida [...] VRTRIAPSPTGDPHVGTAYIA [...] EQQARGE----TPRYDGRA [...] CYMPLLRNPDKSKLSKRK
Q-E.coli   [...] VHTRFPPEPNGYLHIGHAKSI [...] TLTQPGKNSPYRDRSVEEN [...] YEFSRL-NLEYTVMSKRK
Q-Fly      [...] VHTRFPPEPNGILHIGHAKAI [...] FNPKPS---PWRERPIEES [...] WEYGRL-NMNYALVSKRK
Q-Human    [...] VRTRFPPEPNGILHIGHAKAI [...] HNTLPS---PWRDRPMEES [...] WEYGRL-NLHYAVVSKRK
E-Fly      [...] VVVRFPPEASGYLHIGHAKAA [...] QRVE----SANRSNSVEKN [...] WSYSRL-NMTNTVLSKRK
E-Human    [...] VTVRFPPEASGYLHIGHAKAA [...] QRIE----SKHRKNPIEKN [...] WEYSRL-NLNNTVLSKRK
E-Yeast    [...] VVTRFPPEPSGYLHIGHAKAA [...] DGVA----SARRDRSVEEN [...] WDFARI-NFVRTLLSKRK
ATP-Binding | || | || || |
How would you (or should you even) model indels?
- Where should the insertion be placed?
- What is the conformation of the new residues?
- Which residues should be deleted?
- How many additional residues need to change conformation?

9.3a8 Alignment is the limiting step for homology model accuracy
No amount of forcefield minimization will put a misaligned residue in the right place!
CASP4: Williams MG et al. (2001) Proteins Suppl. 5: 92-97

9.3a9 Superposition vs. Alignment
The coordinates of two proteins can be superimposed in space.
An alignment may be derived from a superposition by correlating residues that are close in space.
An optimal sequence alignment may lead to a different alignment...
Example: 1GTR vs. 2TS1
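Deriving an alignment from a superposition can be sketched as follows. This is a simplified illustration, not the method of any particular structure-alignment program: it assumes the two sets of C-alpha coordinates are already superimposed, pairs residues that are mutual nearest neighbours, and keeps a pair only if the distance is within a cutoff. The cutoff of 3.0 Å and the toy coordinates are assumptions, and no attempt is made to enforce sequential order.

```python
import math


def alignment_from_superposition(ca_a, ca_b, cutoff=3.0):
    """Return (i, j) index pairs of residues that are mutual nearest neighbours
    within `cutoff` Å. Coordinates must already be superimposed."""

    def nearest(point, others):
        return min(range(len(others)), key=lambda k: math.dist(point, others[k]))

    pairs = []
    for i, a in enumerate(ca_a):
        j = nearest(a, ca_b)
        if nearest(ca_b[j], ca_a) == i and math.dist(a, ca_b[j]) <= cutoff:
            pairs.append((i, j))
    return pairs


# Toy C-alpha traces (made-up coordinates in Å).
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.5, 0.0, 0.0), (4.0, 0.5, 0.0), (20.0, 0.0, 0.0)]
pairs = alignment_from_superposition(ca_a, ca_b)
print(pairs)  # [(0, 0), (1, 1)]
```

The third residue of each chain has no partner close enough in space, so it stays unaligned, regardless of what a sequence alignment would suggest.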

9.3a10 Superposition vs. Alignment
Example: structural vs. sequence alignment between E. coli GlnRS (1GTR) and G. stearothermophilus TyrRS (2TS1).
Although the optimal sequence alignment is not unreasonable (19% ID = 40/212 residues), comparison with the structure shows it is actually wrong for all but 11 residues!
The structure-based alignment is quite dissimilar in sequence (4.5% ID = 12/265 residues), but the superposition actually matches 39% of residues (104/265) over the length of the domain.
[Figure: sequence-based and structure-based alignments of TyrRS, 1GTR and 2TS1]

9.3a11 Inserts may be accommodated in a distant part of the structure
Sequence alignment (shows what happened):
gktlit-----nfsqehip
gktlisflyeqnfsqehip
Structure alignment (shows how it's accommodated):
gktlitnfsq-----ehip
gktlisflyeqnfsqehip
Example: a five-residue insert (α-helix)

9.3a12 Off by 1, Off by 4
A shift in alignment of 1 residue corresponds to a skew in the modeled structure of about 4 Å (3.8 Å is the distance between adjacent alpha carbons).
Nothing you can do AFTER an alignment will fix this error (not even molecular dynamics).
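The "off by 1 residue, off by about 4 Å" rule can be checked with a small Python sketch. It builds an idealized α-helix C-alpha trace from approximate textbook parameters (1.5 Å rise and 100° rotation per residue, C-alpha atoms about 2.3 Å from the axis; these values are assumptions of the sketch) and measures the RMSD between the trace and the same trace read one residue out of register.

```python
import math

RISE = 1.5                   # Å per residue along the helix axis (approximate)
RADIUS = 2.3                 # Å, distance of C-alpha atoms from the axis (approximate)
TURN = math.radians(100.0)   # rotation per residue (approximate)


def helix_ca(n):
    """C-alpha coordinates of an idealized alpha-helix with n residues."""
    return [(RADIUS * math.cos(i * TURN), RADIUS * math.sin(i * TURN), RISE * i)
            for i in range(n)]


def rmsd_after_shift(coords, shift):
    """RMSD between residues i and i + shift, i.e. the error caused by
    an alignment that is `shift` residues out of register."""
    pairs = list(zip(coords[:-shift], coords[shift:]))
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


helix = helix_ca(20)
off_by_one = rmsd_after_shift(helix, 1)
print(f"register shift of 1 residue: {off_by_one:.2f} Å")
```

Because the error comes from assigning residues to the wrong positions, it is present before any refinement starts and refinement has no information with which to undo it.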

9.3a13 Indels (inserts or deletions)
Observations of known similarities in structures demonstrate that uniform gap penalty assumptions are NOT BIOLOGICAL.
Indels are most often observed in loops, less often in secondary structure elements.
When they do not occur in loops, helical or strand properties are usually maintained.

9.3a14 Can we do better with the gap assumption?
Required: position-specific gap penalties.
One approach: implemented in Clustal as secondary structure masks.
Get secondary structure information and convert it to Clustal mask format. (Easy - read the documentation!)
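What position-specific gap penalties do can be sketched with a small global alignment in Python. This is not Clustal's implementation and does not use Clustal's mask format: it is a plain Needleman-Wunsch recursion with linear gap penalties, where gaps opposite or inside template helices and strands (mask characters 'H' and 'E') cost more than gaps in loops ('-'). The scores, the mask alphabet and the function name are assumptions chosen for illustration.

```python
def align_with_ss_mask(target, template, template_ss,
                       match=2, mismatch=-1, gap_loop=-2, gap_sse=-8):
    """Global alignment with gap penalties that depend on the template's
    secondary structure. Returns (score, aligned_target, aligned_template)."""
    if len(template) != len(template_ss):
        raise ValueError("template and mask must have equal length")
    n, m = len(target), len(template)
    # Penalty for leaving template residue j unaligned (a deletion in the target).
    del_pen = [gap_sse if ss in "HE" else gap_loop for ss in template_ss]

    def ins_pen(j):
        # Penalty for inserting a target residue between template residues j-1 and j.
        # Expensive only if both flanking residues are in a helix or strand.
        left = del_pen[j - 1] if j > 0 else gap_loop
        right = del_pen[j] if j < m else gap_loop
        return max(left, right)

    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + ins_pen(0)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + del_pen[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if target[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + ins_pen(j),
                              score[i][j - 1] + del_pen[j - 1])

    # Traceback from the bottom-right corner.
    out_t, out_p = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if target[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_t.append(target[i - 1])
                out_p.append(template[j - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and score[i][j] == score[i][j - 1] + del_pen[j - 1]:
            out_t.append("-")
            out_p.append(template[j - 1])
            j -= 1
        else:
            out_t.append(target[i - 1])
            out_p.append("-")
            i -= 1
    return score[n][m], "".join(reversed(out_t)), "".join(reversed(out_p))


# Toy case: the target lacks one L of a poly-L run that is half helix, half loop.
template = "KLLLLLLK"
mask = "HHHH----"
target = "KLLLLLK"
score, aln_target, aln_template = align_with_ss_mask(target, template, mask)
print(aln_target)
print(aln_template)
```

By sequence alone every L in the run is an equally good place for the gap. The mask breaks the tie by making the helical positions expensive, so the gap ends up in the loop.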

9.3a15 Secondary structure from PDB.... (Algorithm ?)

9.3a16 Secondary structure from RasMol.... (DSSP !)
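Before secondary structure can be used as a mask, DSSP's eight-state output usually has to be reduced to helix, strand and other. The Python sketch below uses one common reduction (H, G, I to helix; E, B to strand; everything else to coil). Other groupings are in use, and the output alphabet here ('H', 'E', '-') is an assumption of this sketch, not the Clustal mask format.

```python
# One common 8-state to 3-state reduction of DSSP codes (an assumption; others exist).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H",   # helices
                  "E": "E", "B": "E"}             # strands and bridges


def reduce_dssp(dssp: str) -> str:
    """Map a DSSP secondary structure string to 'H' (helix), 'E' (strand), '-' (other)."""
    return "".join(DSSP_TO_3STATE.get(code, "-") for code in dssp)


reduced = reduce_dssp("HHGGTTEEBS ")
print(reduced)  # HHHH--EEE--
```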

9.3a17 Concept 2: Reasons to model need to be defined.

9.3a18 Use of homology models
Biochemical inference from 3D similarity
- Bonds
- Angles, plain and dihedral
- Surfaces, solvent accessibility
- Amino acid functions, presence in structure patterns
- Spatial relationship of residues to active site
- Spatial relationship to other residues
- Participation in function / mechanism
- Static and dynamic disorder
- Electrostatics
- Conservation patterns (structural and functional)
- Posttranslational modification sites (but not structural consequences!)
- Suitability as drug target
Don't!

9.3a19 Abuse of homology models
- Modelling properties that cannot / will not be verified
- Analysing geometry of model
- Interpreting loop structures near indels
- Inferring relative domain arrangement
- Inferring structures of complexes

9.3a20 Databases of Models
Don't make models unless you check first...
- Swiss-Model repository: 64,000 models based on 4000 structures and Swiss-Prot proteins.
- ModBase: made with "Modeller"; 15,000 reliable models for substantial segments of approximately 4,000 proteins in the genomes of Saccharomyces cerevisiae, Mycoplasma genitalium, Methanococcus jannaschii, Caenorhabditis elegans, and Escherichia coli.

9.3a21 Concept 3: Fully automated services perform well.

9.3a22 Homology Modeling Process
Workflow: TAR -> Search -> HOM, TEM -> Align -> MSA -> Model -> 3D -> Complete -> 3DC -> Analyse -> PUB
Steps:
- Search: sequence database similarity search (PSI-BLAST against nr (PDB))
- Align: careful multiple sequence alignment (T-Coffee, Cinema)
- Model: generate 3D model (SwissModel, ExPDB)
- Complete: add ligands, substrates etc. (LIG) to the model (TextEditor)
- Analyse: interpret and conclude (RasMol, Consurf)
Data:
- TAR: target sequence
- nr: non-redundant GenBank subset (with annotated structures)
- HOM: homologous sequences
- TEM: sequences of homologues with known structure
- MSA: multiple sequence alignment
- ExPDB: modeling template structure database
- PUB: publish results
These are really two queries rolled into one procedure.

9.3a23 Homology Modeling Software?
Freely available packages perform as well as commercial ones at CASP (Critical Assessment of Structure Prediction).
- Swiss Model (see your Integrated Assignment)
- Modeller ( )

9.3a24 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
   BLASTP against EX-NRL 3D

9.3a25 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
   - Identity: > 25%
   - Expected model: > 20 residues
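The template criteria on this slide amount to a simple filter. The Python sketch below applies them to a list of search hits. The record type, field names and template IDs are hypothetical, and reading "expected model" as the number of residues the model would cover is an assumption. It is not the Swiss-Model code.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    template_id: str          # hypothetical identifier of a template structure
    percent_identity: float   # sequence identity to the target, in percent
    model_residues: int       # number of residues the model is expected to cover


def suitable_templates(hits, min_identity=25.0, min_residues=20):
    """Keep hits with identity > min_identity and expected model > min_residues."""
    return [h for h in hits
            if h.percent_identity > min_identity and h.model_residues > min_residues]


hits = [
    TemplateHit("tmplA", 41.0, 230),   # passes both criteria
    TemplateHit("tmplB", 19.0, 212),   # identity too low
    TemplateHit("tmplC", 60.0, 15),    # too short
]
selected = suitable_templates(hits)
print([h.template_id for h in selected])  # ['tmplA']
```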

9.3a26 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
   Select regions of similarity and match in coordinate space (EXPDB).

9.3a27 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
   Compute weighted average coordinates for backbone atoms expected to be in model.
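Averaging backbones can be illustrated with a short Python sketch. It assumes the templates are already superimposed and that each one supplies coordinates for the same list of equivalent backbone atoms. The slide does not say how the weights are chosen, so the weights used here (for example, something proportional to sequence identity) are purely an assumption.

```python
def weighted_average_backbone(templates, weights):
    """Weighted average of equivalent backbone atom positions.

    templates: list of structures, each a list of (x, y, z) tuples in the same atom order
    weights:   one non-negative weight per template (choice of weights is an assumption)
    """
    if len(templates) != len(weights):
        raise ValueError("need exactly one weight per template")
    if len({len(t) for t in templates}) != 1:
        raise ValueError("all templates must list the same atoms")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    averaged = []
    for atoms in zip(*templates):  # the same atom taken from every template
        averaged.append(tuple(
            sum(w * atom[axis] for w, atom in zip(weights, atoms)) / total
            for axis in range(3)
        ))
    return averaged


# Two toy templates with two backbone atoms each (made-up coordinates in Å).
template_1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
template_2 = [(2.0, 0.0, 0.0), (1.0, 1.0, 5.0)]
backbone = weighted_average_backbone([template_1, template_2], weights=[1.0, 3.0])
print(backbone)  # [(1.5, 0.0, 0.0), (1.0, 1.0, 4.0)]
```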

9.3a28 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
   Pick plausible loops from library, ligate to stems; if not possible, try combinatorial search.

9.3a29 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
   Bridge with overlapping pieces from pentapeptide fragment library, anchor with the terminal residues and add the three central residues.

9.3a30 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
7. Rebuild sidechains
   Rebuild sidechains from rotamer library: complete sidechains first, then regenerate partial sidechains from probabilistic approach.

9.3a31 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
7. Rebuild sidechains
8. Energy minimize
   Gromos 96 energy minimization

9.3a32 Swiss-Model steps: Peitsch M & Guex N (1997) Electrophoresis 18:
1. Search for sequence similarities
2. Evaluate suitable templates
3. Generate structural alignments
4. Average backbones
5. Build loops
6. Bridge incomplete backbones
7. Rebuild sidechains
8. Energy minimize
9. Write alignment and PDB file (results)

9.3a33 CASP5 (2002) - Homology
Remote sequence similarity detection methods have improved.
Coordinate manipulations do not improve accuracy. Shocking!
Tramontano A & Morea V (2003) Assessment of homology based predictions in CASP5. Proteins S6:
[Plot: RMSD(target, template) - RMSD(target, model), in Å; negative values: worse than template, positive values: better]
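The quantity plotted on this slide, RMSD(target, template) minus RMSD(target, model), is easy to compute once the structures are superimposed. The Python sketch below assumes that all three coordinate lists contain the same equivalent atoms in the same order and that superposition has already been done. The function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long, superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return math.sqrt(
        sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)) / len(coords_a)
    )


def model_gain(target, template, model):
    """Positive: the model is closer to the target than the template was.
    Negative: modeling made things worse than simply copying the template."""
    return rmsd(target, template) - rmsd(target, model)


# Toy example (made-up coordinates in Å).
target = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
template = [(0.0, 3.0, 0.0), (3.8, 3.0, 0.0)]   # 3 Å away from the target
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0)]      # 1 Å away from the target
gain = model_gain(target, template, model)
print(f"gain over template: {gain:.1f} Å")
```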

9.3a34 Swissmodel in comparison
3D-Crunch: 211,000 sequences -> 64,000 models
Controls:
- >50% ID: ~1 Å RMSD
- 40-49% ID: 63% < 3 Å
- 25-29% ID: 49% < 4 Å
Guex et al. (1999) TIBS 24:
EVA: Eyrich et al. (2001) Bioinformatics 17:
Manual alternatives: Modeller...
Automatic alternatives: SwissModel, sdsc1, 3djigsaw, pcomb_pcons, cphmodels, easypred
#1 for RMSD and % correctly aligned, #2 for coverage

9.3a35 Concept 4: SwissModel in practice.

9.3a36 SwissModel... first approach mode

9.3a37... enter the ExPDB template ID...

9.3a38... run in Normal Mode (except if defining a DeepView project)...

9.3a39... successful submission. Results come by e-mail.

9.3a40 Homology Modeling in Practice
How to assess model reliability?
- All indels are wrong.
- Structure analysis ("threading", "solvent accessibility", compatibility with ligands) can point out possible alignment errors.
- But: there is no point in "repairing" stereochemistry; only review the alignment.

9.3a41 Homology Modeling in Practice
Can you predict function from your model?
No (and yes): the model may be incompatible with a specific function.

9.3a42 Uses of structure revisited - I
Prototype 1: Analytical
Explain mechanistic aspects of the protein, e.g. in terms of:
- residues involved in catalysis
- global properties (like electrostatics)
- shape, relative orientation and distances of domains or subdomains
- flexibility and dynamics
e.g. hypothesizing about the rate-limiting step

9.3a43 Uses of structure revisited - II
Prototype 2: Comparative
Bring conservation patterns into a spatial context in order to infer causality from (database) correlations, e.g. in terms of:
- describing context-specific conservation patterns and analyzing these according to conserved properties
- analyzing the predicted effect of sequence variation (e.g. for engineering changes, fusing domains or predicting SNP effects)
- distinguishing physiological vs. nonphysiological interactions