Comparative modeling with MODELLER Ben Webb, Andrej Sali Lab UC San Francisco Maya Topf, Birkbeck College, London.

1 Comparative modeling with MODELLER http://salilab.org/modeller/ Ben Webb, Andrej Sali Lab UC San Francisco Maya Topf, Birkbeck College, London

2 Comparative modeling overview
Why build comparative models?
- Many more sequences are available than structures (millions vs. tens of thousands)
- Many applications (e.g. determination of function) rely on structural information
- Structure is often more conserved than sequence, since evolution tends to preserve function (in 2005, 900K of 1.7M sequences modeled)

3 Comparative modeling overview
How does it work?
- Extract information from known structures (one or more templates), and use it to build the structure for the ‘target’ sequence
- Should also consider information from other sources: physical force fields, statistics (e.g. PDB mining)
Classes of methods for comparative modeling:
- Assembly of rigid bodies (core, loops, sidechains)
- Segment matching
- Satisfaction of spatial restraints

4 Comparative modeling by satisfaction of spatial restraints - MODELLER A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993. J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994. A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000.

5 1. Align sequence with structures
First, must determine the template structures:
- Simplistically, try to align the target sequence against every known structure’s sequence
- In practice, this is too slow, so heuristics are used (e.g. BLAST)
- Profile or HMM searches are generally more sensitive in difficult cases (e.g. Modeller’s profile.build method, or PSI-BLAST)
- Could also use threading or other web servers
Alignment to templates generally uses global dynamic programming:
- Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)
- Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) (‘align2d’)
- Other features and/or user-defined (‘salign’), or use an external program
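The global dynamic programming mentioned on this slide can be illustrated with a minimal Needleman-Wunsch sketch. This is not MODELLER's ‘align’ implementation: the function name `global_align`, the match/mismatch scores, and the linear gap penalty are assumptions chosen for brevity, whereas a real aligner would use a residue substitution matrix and typically affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

Calling `global_align("ACGT", "AGT")` aligns the full length of both sequences, which is what distinguishes a global alignment from the local heuristics (e.g. BLAST) used for template searching.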

6 2. Extract spatial restraints
Spatial restraints incorporate homology information, statistical preferences, and physical knowledge:
- Template Cα-Cα internal distances
- Backbone dihedrals (φ/ψ)
- Sidechain dihedrals given residue type of both target and template
- Force field stereochemistry (bond, angle, dihedral)
- Statistical potentials
- Other experimental constraints etc.
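As a rough illustration of the first restraint type, the sketch below transfers template Cα-Cα distances to aligned target residue pairs. It is a simplification and not MODELLER's restraint derivation: the function name `ca_distance_restraints`, the fixed `sigma`, and the `max_dist` cutoff are all assumptions made for the example.

```python
import math

def ca_distance_restraints(template_ca, alignment, max_dist=10.0, sigma=1.0):
    """Derive target Ca-Ca distance restraints from template coordinates.

    template_ca: list of (x, y, z) Ca coordinates of the template.
    alignment:   list of (target_index, template_index) pairs for aligned residues.
    Returns a list of (target_i, target_j, mean_distance, sigma) tuples.
    The fixed sigma and the distance cutoff are illustrative assumptions.
    """
    restraints = []
    for x in range(len(alignment)):
        for y in range(x + 1, len(alignment)):
            ti, si = alignment[x]
            tj, sj = alignment[y]
            # Distance between the two equivalent residues in the template.
            d = math.dist(template_ca[si], template_ca[sj])
            if d <= max_dist:
                restraints.append((ti, tj, d, sigma))
    return restraints
```

Residues of the target that are not aligned to any template residue receive no homology-derived restraint here, which is one reason unaligned regions such as loops are harder to model.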

7 3. Satisfy spatial restraints
- All information is combined into a single objective function
- Restraints and statistics are converted to an “energy” by taking the negative log of their probability densities
- Force field (CHARMM 22) simply added in
- Function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto template structure(s)
- Multiple models are generally recommended; the ‘best’ model or cluster of models is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller’s own DOPE or GA341, fit to EM density, or external programs such as PROSA or DFIRE
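The negative-log conversion can be made concrete for the simplest case, a Gaussian restraint, where it yields a harmonic term plus a constant. The sketch below assumes Gaussian restraints only; the names `restraint_energy`, `objective`, and `pick_best` are hypothetical, and MODELLER's actual objective function includes many more term types.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Probability density of a Gaussian restraint."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def restraint_energy(x, mean, sigma):
    """Negative log of the Gaussian density, written analytically.

    Equal to -ln(gaussian_pdf(x, mean, sigma)): a harmonic term plus a constant.
    """
    return 0.5 * ((x - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))

def objective(values, restraints):
    """Sum the energies; values[k] is restrained by restraints[k] = (mean, sigma)."""
    return sum(restraint_energy(v, mean, sigma)
               for v, (mean, sigma) in zip(values, restraints))

def pick_best(models, restraints):
    """Return the name of the model with the lowest objective function value."""
    return min(models, key=lambda name: objective(models[name], restraints))
```

Because independent probability densities multiply, their negative logs add, which is why all restraints can be combined into a single summed objective function.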

8 Typical errors in comparative models
- Distortion/shifts in aligned regions
- Region without a template
- Sidechain packing
- Incorrect template
- Misalignment
[Figure: MODEL, X-RAY and TEMPLATE structures compared]
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.

9 Model Accuracy as a Function of Target-Template Sequence Identity
Sánchez & Šali. Proc. Natl. Acad. Sci. USA 95, 13597-13602, 1998.

10 Model accuracy (X-RAY vs. MODEL)
- High accuracy: NM23, seq. id. 77%; Cα equiv 147/148, RMSD 0.41 Å; scope for improvement: sidechains
- Medium accuracy: CRABP, seq. id. 41%; Cα equiv 122/137, RMSD 1.34 Å; scope for improvement: sidechains, core backbone, loops
- Low accuracy: EDN, seq. id. 33%; Cα equiv 90/134, RMSD 1.17 Å; scope for improvement: sidechains, core backbone, loops, alignment, fold assignment
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
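The "Cα equiv" and RMSD figures on this slide can be computed along the following lines. This is a sketch under stated assumptions: the two structures are taken to be already superposed and listed residue by residue, the 3.5 Å equivalence cutoff is an assumed value, and the RMSD is taken over the equivalent positions only. The function name `ca_rmsd_and_equiv` is hypothetical.

```python
import math

def ca_rmsd_and_equiv(model_ca, xray_ca, cutoff=3.5):
    """Count equivalent Ca positions and compute their RMSD.

    model_ca, xray_ca: equal-length lists of (x, y, z) coordinates,
    assumed to be superposed already and paired residue by residue.
    Returns (n_equivalent, n_total, rmsd_over_equivalent_positions).
    """
    dists = [math.dist(p, q) for p, q in zip(model_ca, xray_ca)]
    # A position counts as equivalent when the two Ca atoms lie within the cutoff.
    equiv = [d for d in dists if d <= cutoff]
    if equiv:
        rmsd = math.sqrt(sum(d * d for d in equiv) / len(equiv))
    else:
        rmsd = float('nan')
    return len(equiv), len(dists), rmsd
```

Reporting both numbers matters: a model can show a low RMSD over its equivalent positions while a large fraction of residues falls outside the cutoff, as in the low-accuracy case above (90 of 134 positions).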

11 Applications of protein structure models D. Baker & A. Sali. Science 294, 93, 2001.

12 Loop modeling
- Often, there are parts of the sequence which have no detectable templates (usually loops)
- “Mini folding problem”: these loops must be sampled to get improved conformations
- Database searches are only complete for 4-6 residue loops
- Modeller uses conformational search with a custom energy function optimized for loop modeling (statistical potential derived from PDB)
  - Fiser/Melo protocol (‘loopmodel’)
  - Newer DOPE + GB/SA protocol (‘dope_loopmodel’)

13 Accuracy of loop models as a function of amount of optimization

14 Fraction of loops modeled with medium accuracy (<2Å)

15 Fitting Structural Models in cryoEM maps
- Problem: comparative models are often inaccurate.
  Solution: use cryoEM maps to assess the models by rigid density fitting.
- Problem: the structures may exhibit conformational changes (induced fit, target-template differences).
  Solution: use flexible fitting to refine the structures in the map.
- Problem: the resolution of the map can be too low for an unambiguous placement of a component.
  Solution: use additional information to determine the assembly architecture.
Topf & Sali. Curr Opin Struct Biol 2005.

16 Errors in Comparative Modeling vs. Resolution
- Distortion and shifts of aligned regions
- Regions without a template
- Sidechain packing
- Incorrect templates
- Misalignments
- Rigid-body movements
[Figure: error types shown against map resolution (20 Å, 10 Å, 2 Å)]
Rigid fitting

17 Rigid Density Fitting with MODELLER/Mod-EM
- LE: local exhaustive search (rotations only, or rotations + translations)
- MC: Monte Carlo in translation, with exhaustive rotation
- SMC: scanning of the map to find regions with high CC, then LE or MC search
[Figure: probe and native placements for the LE, MC and SMC searches]
Rigid fitting
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
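A toy version of an exhaustive search scored by cross-correlation (CC) is sketched below. It is deliberately reduced to one dimension and to translations only, so it omits the rotational search that the methods on this slide perform. The normalized CC formula used here is a common choice and an assumption of this example, not necessarily the exact score used by Mod-EM; the function names are hypothetical.

```python
import math

def cross_correlation(map_density, probe_density):
    """Normalized cross-correlation between two equal-length density arrays."""
    num = sum(m * p for m, p in zip(map_density, probe_density))
    den = math.sqrt(sum(m * m for m in map_density) *
                    sum(p * p for p in probe_density))
    return num / den if den else 0.0

def best_translation(map_density, probe_density):
    """Slide a 1D probe across a 1D map exhaustively.

    Returns (offset, cc) for the placement with the highest CC.
    """
    n, k = len(map_density), len(probe_density)
    best = (0, -1.0)
    for offset in range(n - k + 1):
        # Score the probe against the map window it currently overlaps.
        window = map_density[offset:offset + k]
        cc = cross_correlation(window, probe_density)
        if cc > best[1]:
            best = (offset, cc)
    return best
```

In three dimensions the same idea applies to voxel grids, with the probe density simulated from the model at the map's resolution and the search extended over rotations, which is where exhaustive enumeration becomes expensive and Monte Carlo sampling becomes attractive.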

18 Quality of fit vs. quality of model
[Figure: fitting score (CC) vs. structural overlap, R² = 0.6-0.7]
Ranking:
- Native (1dxt): 1
- Best model: 2
- Template (1hbg): 132 (8 Å), 139 (12 Å)
Rigid model-fitting
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.

19 cryoEM Density Map Selects an Accurate Model
1cid:2rhe, 12% seq. identity, 10 Å resolution
Structural overlap, with rank by CC in parentheses:
- Native structure (0): 1.00
- Best-fitting model (1): 0.69
- Most accurate model (11): 0.73
- Template (101): 0.55
Rigid fitting

20 Iterative Alignment, Model Building and CC-based Assessment
8i1b:4fgf, 14% seq. identity, 10 Å resolution
- Initial model (A) vs. X-ray: Cα RMSD = 9.1 Å, structural overlap = 36%
- Final model (B) vs. X-ray: Cα RMSD = 5.2 Å, structural overlap = 62%
Rigid fitting
Topf, Baker, Marti-Renom, Chiu & Sali. J. Mol. Biol., 2006.

21 Modeling, Rigid and Flexible Fitting Protocol
- 33% sequence identity
- RMSD: 5.4 Å -> 3.5 Å
- Density-based real-space refinement while maintaining correct stereochemistry
[Figure: initial model, final model, x-ray structure]
Flexible fitting

22 Arranging Components in a CryoEM Map of their Assembly
Simultaneous optimization of multi-component assembly, 20 Å resolution.
[Figure panels: assembly architecture; single component fitting result; multi-component optimization result]
With Keren Lasker & Haim Wolfson

23 Programs, servers and databases
- External Resources: PDB, Uniprot, GENBANK, NR, PIR, INTERPRO, Kinase Resource, UCSC Genome Browser, CHIMERA, Pfam, SCOP, CATH
- LS-SNP, Web Server, http://salilab.org/LS-SNP/: predicts functional impact of residue substitution
- MODBASE, Database, http://salilab.org/modbase/: fold assignments, alignments, models, model assessments for all sequences related to a known structure
- CCPR, Center for Computational Proteomics Research, http://www.ccpr.ucsf.edu
- MODWEB, Web Server, http://salilab.org/modweb/: provides a web interface to MODPIPE
- ICEDB, Database/LIMS, http://nysgxrc.org: tracks targets for structural genomics by NYSGXRC
- MODELLER, Program, http://salilab.org/modeller/: implements most operations in comparative modeling
- MODLOOP, Web Server, http://salilab.org/modloop/: models loops in protein structures
- EVA, Web Server, http://salilab.org/eva/: evaluates and ranks web servers for protein structure prediction
- PIBASE, Database, http://salilab.org/pibase/: contains structurally defined protein interfaces
- DBALI, Database, http://salilab.org/DBAli/: contains a comprehensive set of pairwise and multiple structure-based alignments
- LIGBASE, Database: ligand binding sites and inheritance (accessible through MODBASE)
- MODPIPE, Program: automatically calculates comparative models of many protein sequences

24 Useful resources http://salilab.org/bioinformatics_resources.shtml

25 For further examples… http://salilab.org/modeller/tutorial/

26 References
Protein Structure Prediction:
- Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
- Baker & Sali. Science 294, 93-96, 2001.
Comparative Modeling:
- Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
- Marti-Renom et al. Current Protocols in Protein Science 1, 2.9.1-2.9.22, 2002.
- Shen & Sali. Protein Science 15, 2507-2524, 2006.
- Eswar et al. Current Protocols in Bioinformatics, Supplement 15, 5.6.1-5.6.30, 2006.
- Madhusudhan et al. The Proteomics Protocols Handbook. Humana Press Inc., 831-860, 2005.
MODELLER:
- Sali & Blundell. J. Mol. Biol. 234, 779-815, 1993.
Density fitting:
- Topf et al. J. Struct. Biol., 2005.
- Topf et al. J. Mol. Biol., 2006.
- Topf & Sali. Curr. Opin. Struct. Biol., 2005.

27 Acknowledgements
UCSF, Andrej Sali Lab: Narayanan Eswar, Ursula Pieper, M. S. Madhusudhan, Marc Marti-Renom, Roberto Sanchez (MSSM), Min-yi Shen, Andras Fiser (AECOM), David Eramian, Mark Peterson, Francisco Melo (Catholic U.), Ash Stuart (Ramapo Coll.), Eric Feyfant (GI), Valentin Ilyin (NE), Frank Alber, Bino John (Pittsburgh U.), Fred Davis, Andrea Rossi, Tom Goddard (Chimera group)
Baylor College: Wah Chiu, Matt Baker, Yao Cong, Irina Serysheva, Mike Schmid
Tel Aviv University: Haim Wolfson, Keren Lasker

