Comparative modeling with MODELLER Ben Webb, Andrej Sali Lab UC San Francisco Maya Topf, Birkbeck College, London.



Comparative modeling overview
Why build comparative models?
- Many more sequences are available than structures (millions vs. tens of thousands)
- Many applications (e.g. determination of function) rely on structural information
- Structure is often more conserved than sequence, since evolution tends to preserve function
- In 2005, 900K of 1.7M structures modeled

Comparative modeling overview
How does it work?
- Extract information from known structures (one or more templates), and use it to build the structure for the ‘target’ sequence
- Should also consider information from other sources: physical force fields, statistics (e.g. PDB mining)
Classes of methods for comparative modeling:
- Assembly of rigid bodies (core, loops, sidechains)
- Segment matching
- Satisfaction of spatial restraints

Comparative modeling by satisfaction of spatial restraints - MODELLER
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
A. Fiser, R. Do & A. Šali. Prot. Sci. 9, 1753, 2000.

1. Align sequence with structures
- First, must determine the template structures
  - Simplistically, try to align the target sequence against every known structure’s sequence
  - In practice, this is too slow, so heuristics are used (e.g. BLAST)
  - Profile or HMM searches are generally more sensitive in difficult cases (e.g. Modeller’s profile.build method, or PSI-BLAST)
  - Could also use threading or other web servers
- Alignment to templates generally uses global dynamic programming
  - Sequence-sequence: relies purely on a matrix of observed residue-residue mutation probabilities (‘align’)
  - Sequence-structure: gap insertion is penalized within secondary structure (helices etc.) (‘align2d’)
  - Other features and/or user-defined (‘salign’), or use an external program
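As a rough illustration of the global dynamic programming mentioned on this slide, here is a minimal Needleman-Wunsch sketch in plain Python. It is not MODELLER's implementation: the function name `global_align`, the flat match/mismatch scores, and the linear gap penalty are assumptions made for brevity. A real aligner would use a residue substitution matrix and, for ‘align2d’, structure-dependent gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Toy scoring (assumed): +1 match, -1 mismatch, -2 per gap position.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, aln_a, aln_b = global_align("ACGT", "AGT")
```

Here `aln_a` and `aln_b` hold the two gapped strings; the single gap is placed opposite the unmatched residue.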

2. Extract spatial restraints
Spatial restraints incorporate homology information, statistical preferences, and physical knowledge:
- Template Cα-Cα internal distances
- Backbone dihedrals (φ/ψ)
- Sidechain dihedrals given residue type of both target and template
- Force field stereochemistry (bond, angle, dihedral)
- Statistical potentials
- Other experimental constraints etc.
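To make the first restraint type concrete, the sketch below transfers template Cα-Cα distances onto the target through an alignment. It is only an illustration: the function name `ca_distance_restraints`, the use of ‘-’ as the gap character, and the toy coordinates are assumptions, and MODELLER derives full probability density functions rather than the single distance values kept here.

```python
import math
from itertools import combinations


def ca_distance_restraints(target_aln, template_aln, template_ca):
    """Map template CA-CA distances onto target residue pairs.

    target_aln, template_aln: gapped strings of equal length ('-' = gap).
    template_ca: list of (x, y, z) tuples, one per template residue.
    Returns {(target_i, target_j): distance} for all aligned residue pairs.
    """
    aligned = []          # (target index, template index) per aligned column
    t_idx = s_idx = 0
    for t, s in zip(target_aln, template_aln):
        if t != "-" and s != "-":
            aligned.append((t_idx, s_idx))
        if t != "-":
            t_idx += 1
        if s != "-":
            s_idx += 1

    restraints = {}
    for (t1, s1), (t2, s2) in combinations(aligned, 2):
        restraints[(t1, t2)] = math.dist(template_ca[s1], template_ca[s2])
    return restraints


# Hypothetical template: four CA atoms spaced 3.8 A apart along the x axis.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
restraints = ca_distance_restraints("AC-GT", "A-CGT", template_ca)
```

Target residues that face a gap in the template receive no distance restraint, which is one reason unaligned regions (usually loops) are modeled less reliably.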

3. Satisfy spatial restraints
- All information is combined into a single objective function
  - Restraints and statistics are converted to an “energy” by taking the negative log
  - Force field (CHARMM 22) is simply added in
- Function is optimized by conjugate gradients and simulated annealing molecular dynamics, starting from the target sequence threaded onto template structure(s)
- Multiple models are generally recommended; the ‘best’ model or cluster of models is chosen by simply taking the lowest objective function score, or by using a model assessment method such as Modeller’s own DOPE or GA341, fit to EM density, or external programs such as PROSA or DFIRE
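The “negative log” step can be sketched for the simplest case, a single Gaussian distance restraint. This is an assumption-laden toy: the names `restraint_energy`, `objective` and `best_model` are hypothetical, the restraint values are invented, and MODELLER's real restraints include multi-modal and other functional forms plus the force field terms.

```python
import math


def restraint_energy(d, mean, sigma):
    """Negative log of a Gaussian density N(mean, sigma) evaluated at d.

    -ln p(d) = 0.5 * ((d - mean) / sigma)**2 + ln(sigma * sqrt(2 * pi))
    """
    z = (d - mean) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))


def objective(distances, restraints):
    """Sum of restraint energies for one model; lower is better."""
    return sum(restraint_energy(distances[pair], mean, sigma)
               for pair, (mean, sigma) in restraints.items())


def best_model(models, restraints):
    """Name of the candidate model with the lowest objective value."""
    return min(models, key=lambda name: objective(models[name], restraints))


# Invented restraints: residue pair -> (mean distance, standard deviation) in A.
restraints = {(0, 2): (7.6, 0.5), (2, 3): (3.8, 0.2)}
# Invented candidate models: residue pair -> distance measured in that model.
models = {
    "model_1": {(0, 2): 7.5, (2, 3): 3.9},
    "model_2": {(0, 2): 9.0, (2, 3): 3.8},
}
chosen = best_model(models, restraints)
```

Because independent probabilities multiply, their negative logs add, which is why the individual terms can be summed into one objective and compared across models.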

Typical errors in comparative models
- Distortion/shifts in aligned regions
- Region without a template
- Sidechain packing
- Incorrect template
- Misalignment
[Figure: MODEL, X-RAY and TEMPLATE structures compared for each error type]
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 2000.

Model Accuracy as a Function of Target-Template Sequence Identity
Sánchez, R., Šali, A. Proc Natl Acad Sci U S A. 95 (1998).

Model accuracy
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 2000.
[Figure: X-ray structure vs. model for three example cases]
- High accuracy: NM23, seq id 77%; Cα equiv 147/148, RMSD 0.41 Å. Scope for improvement: sidechains
- Medium accuracy: CRABP, seq id 41%; Cα equiv 122/137, RMSD 1.34 Å. Scope for improvement: sidechains, core backbone, loops
- Low accuracy: EDN, seq id 33%; Cα equiv 90/134, RMSD 1.17 Å. Scope for improvement: sidechains, core backbone, loops, alignment, fold assignment
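The two accuracy numbers on this slide can be illustrated with a short sketch. It assumes the model and X-ray coordinates are already superposed and paired residue by residue, treats a Cα pair as equivalent when it lies within 3.5 Å (an assumed cutoff), and reports RMSD over the equivalent pairs only. The function name `ca_equiv_and_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_equiv_and_rmsd(model_ca, native_ca, cutoff=3.5):
    """Count equivalent CA pairs and compute their RMSD.

    model_ca, native_ca: equal-length lists of (x, y, z) tuples that are
    assumed to be already superposed. A pair is 'equivalent' if its
    distance is at most `cutoff` (in A).
    Returns (n_equivalent, n_total, rmsd_over_equivalent_pairs).
    """
    dists = [math.dist(p, q) for p, q in zip(model_ca, native_ca)]
    equiv = [d for d in dists if d <= cutoff]
    if not equiv:
        return 0, len(dists), float("nan")
    rmsd = math.sqrt(sum(d * d for d in equiv) / len(equiv))
    return len(equiv), len(dists), rmsd


# Invented three-residue example: two close pairs and one badly placed residue.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
n_equiv, n_total, rmsd = ca_equiv_and_rmsd(model, native)
```

Reporting both values matters: a low RMSD over few equivalent positions (as in the low-accuracy case above) does not indicate a better model than a slightly higher RMSD over nearly all positions.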

Applications of protein structure models D. Baker & A. Sali. Science 294, 93, 2001.

Loop modeling
- Often, there are parts of the sequence which have no detectable templates (usually loops)
- “Mini folding problem”: these loops must be sampled to get improved conformations
- Database searches are only complete for 4-6 residue loops
- Modeller uses conformational search with a custom energy function optimized for loop modeling (statistical potential derived from PDB)
  - Fiser/Melo protocol (‘loopmodel’)
  - Newer DOPE + GB/SA protocol (‘dope_loopmodel’)

Accuracy of loop models as a function of amount of optimization

Fraction of loops modeled with medium accuracy (<2Å)

Fitting Structural Models in cryoEM maps
- Problem: comparative models are often inaccurate. Solution: use cryoEM maps to assess the models by rigid density fitting.
- Problem: the structures may exhibit conformational changes (induced fit, target-template differences). Solution: use flexible fitting to refine the structures in the map.
- Problem: the resolution of the map can be too low for an unambiguous placement of a component. Solution: use additional information to determine the assembly architecture.
Topf & Sali. Curr Opin Struct Biol 2005.

Errors in Comparative Modeling vs. Resolution (Rigid fitting)
[Figure: error types placed along a map-resolution axis marked 20 Å, 10 Å, 2 Å]
- Distortion and shifts of aligned regions
- Regions without a template
- Sidechain packing
- Incorrect templates
- Misalignments
- Rigid-body movements

Rigid Density Fitting with MODELLER/Mod-EM (Rigid fitting)
[Figure: probe and native placements for each search protocol]
- LE: local exhaustive search (rotations only or rotations + translations)
- MC: Monte Carlo in translation, with exhaustive rotation
- SMC: scanning of the map to find regions with high CC; LE or MC search
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.
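The idea of scoring placements by cross-correlation (CC) can be shown with a deliberately tiny example: a one-dimensional “map”, a short probe, and an exhaustive scan over translations. This is a sketch under stated assumptions, not the Mod-EM code. The CC here is the normalized dot product without mean subtraction, rotations are ignored, and the names `cross_correlation` and `scan_offsets` are hypothetical.

```python
import math


def cross_correlation(a, b):
    """Normalized cross-correlation of two equal-length density lists."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def scan_offsets(density_map, probe):
    """Exhaustive 1D translation scan.

    Slides the probe along the map and returns (best_offset, best_cc).
    """
    best_offset, best_cc = 0, -1.0
    for offset in range(len(density_map) - len(probe) + 1):
        window = density_map[offset:offset + len(probe)]
        cc = cross_correlation(window, probe)
        if cc > best_cc:
            best_offset, best_cc = offset, cc
    return best_offset, best_cc


# Invented densities: the probe matches the map exactly at offset 2.
density_map = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
probe = [1.0, 3.0, 1.0]
offset, cc = scan_offsets(density_map, probe)
```

In three dimensions the same scoring applies to voxel grids, and the search space grows to include rotations, which is why the slide distinguishes exhaustive, Monte Carlo and map-scanning strategies.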

Quality of fit vs. quality of model (Rigid model-fitting)
[Figure: fitting score (CC) plotted against structural overlap, with an R² value shown on the slide]
Ranking:
- Native (1dxt): 1
- Best model: 2
- Template (1hbg): 132 (8 Å), 139 (12 Å)
Topf, Baker, John, Chiu & Sali. J Struct Biol 2005.

cryoEM Density Map Selects an Accurate Model (Rigid fitting)
1cid:2rhe, 12% seq. identity, 10 Å resolution
Structural overlap, with rank by CC in parentheses:
- Native structure (0): 1.00
- Template (101): 0.55
- Best-fitting model (1): 0.69
- Most accurate model (11): 0.73

Iterative Alignment, Model Building and CC-based Assessment (Rigid fitting)
8i1b:4fgf, 14% seq. identity, 10 Å resolution
[Figure: X-ray structure compared with the initial model (A) and the final model (B)]
- Initial model (A): Cα RMSD = 9.1 Å, structural overlap = 36%
- Final model (B): Cα RMSD = 5.2 Å, structural overlap = 62%
Topf, Baker, Marti-Renom, Chiu & Sali. J. Mol. Biol.

Modeling, Rigid and Flexible Fitting Protocol (Flexible fitting)
[Figure: initial model, final model and X-ray structure]
- 33% sequence identity
- RMSD: 5.4 Å -> 3.5 Å
- Density-based real-space refinement while maintaining correct stereochemistry

Arranging Components in a CryoEM Map of their Assembly
Simultaneous optimization of multi-component assembly, 20 Å resolution.
[Figure: assembly architecture, single component fitting result, multi-component optimization result]
With Keren Lasker & Haim Wolfson

Programs, servers and databases
CCPR: Center for Computational Proteomics Research
- MODELLER (Program): implements most operations in comparative modeling
- MODPIPE (Program): automatically calculates comparative models of many protein sequences
- MODWEB (Web Server): provides a web interface to MODPIPE
- MODLOOP (Web Server): models loops in protein structures
- LS-SNP (Web Server): predicts functional impact of residue substitution
- EVA (Web Server): evaluates and ranks web servers for protein structure prediction
- MODBASE (Database): fold assignments, alignments, models, model assessments for all sequences related to a known structure
- PIBASE (Database): contains structurally defined protein interfaces
- DBALI (Database): contains a comprehensive set of pairwise and multiple structure-based alignments
- LIGBASE (Database): ligand binding sites and inheritance (accessible through MODBASE)
- ICEDB (Database/LIMS): tracks targets for structural genomics by NYSGXRC
External Resources: PDB, Uniprot, GENBANK, NR, PIR, INTERPRO, Kinase Resource, UCSC Genome Browser, CHIMERA, Pfam, SCOP, CATH

Useful resources

For further examples…

References
Protein Structure Prediction:
- Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29.
- Baker & Sali. Science 294, 93-96.
Comparative Modeling:
- Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29.
- Marti-Renom et al. Current Protocols in Protein Science 1.
- Shen & Sali. Protein Science 15.
- Eswar et al. Current Protocols in Bioinformatics, Supplement 15.
- Madhusudhan et al. The Proteomics Protocols Handbook. Humana Press Inc.
MODELLER:
- Sali & Blundell. J. Mol. Biol. 234.
Density fitting:
- Topf et al. J Struct Biol 2005
- Topf et al. J Mol. Biol
- Topf & Sali Curr Opin Struct Biol 2006

Acknowledgements
- UCSF, Andrej Sali Lab: Narayanan Eswar, Ursula Pieper, M. S. Madhusudhan, Marc Marti-Renom, Roberto Sanchez (MSSM), Min-yi Shen, Andras Fiser (AECOM), David Eramian, Mark Peterson, Francisco Melo (Catholic U.), Ash Stuart (Ramapo Coll.), Eric Feyfant (GI), Valentin Ilyin (NE), Frank Alber, Bino John (Pittsburgh U.), Fred Davis, Andrea Rossi, Tom Goddard (Chimera group)
- Baylor College: Wah Chiu, Matt Baker, Yao Cong, Irina Serysheva, Mike Schmid
- Tel Aviv University: Haim Wolfson, Keren Lasker