CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Thomas Blicher Center for Biological Sequence Analysis.

Slides:



Advertisements
Similar presentations
Rosetta Energy Function Glenn Butterfoss. Rosetta Energy Function Major Classes: 1. Low resolution: Reduced atom representation Simple energy function.
Advertisements

Review: Amino Acid Side Chains Aliphatic- Ala, Val, Leu, Ile, Gly Polar- Ser, Thr, Cys, Met, [Tyr, Trp] Acidic (and conjugate amide)- Asp, Asn, Glu, Gln.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Tertiary protein structure viewing and prediction July 1, 2009 Learning objectives- Learn how to manipulate protein structures with Deep View software.
Protein-a chemical view A chain of amino acids folded in 3D Picture from on-line biology bookon-line biology book Peptide Protein backbone N / C terminal.
1 Levels of Protein Structure Primary to Quaternary Structure.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU.
It og Sundhed Thomas Nordahl Petersen, Associate Professor Center for Biological Sequence Analysis, DTU
Protein Fold recognition Morten Nielsen, Thomas Nordahl CBS, BioCentrum, DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Protein Homology Modelling Thomas Blicher Center for Biological Sequence Analysis.
Thomas Blicher Center for Biological Sequence Analysis
It & Health 2009 Summary Thomas Nordahl Petersen.
Summary Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein.
. Protein Structure Prediction [Based on Structural Bioinformatics, section VII]
ProteinStructuralDatabases. Proteins are built from amino-acids. Introduction H | NH2-c-CO2H | R.
1 Protein Structure Prediction Reporter: Chia-Chang Wang Date: April 1, 2005.
Protein Tertiary Structure. Primary: amino acid linear sequence. Secondary:  -helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded.
Molecular modelling / structure prediction (A computational approach to protein structure) Today: Why bother about proteins/prediction Concepts of molecular.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Protein Structural Prediction. Protein Structure is Hierarchical.
Computational Structure Prediction Kevin Drew BCH364C/391L Systems Biology/Bioinformatics 2/12/15.
Protein Structure Prediction Dr. G.P.S. Raghava Protein Sequence + Structure.
Construyendo modelos 3D de proteinas ‘fold recognition / threading’
Forces and Prediction of Protein Structure Ming-Jing Hwang ( 黃明經 ) Institute of Biomedical Sciences Academia Sinica
Structural alignment Protein structure Every protein is defined by a unique sequence (primary structure) that folds into a unique.
Tertiary Structure Prediction Methods Any given protein sequence Structure selection Compare sequence with proteins have solved structure Homology Modeling.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
What are proteins? Proteins are important; e.g. for catalyzing and regulating biochemical reactions, transporting molecules, … Linear polymer chain composed.
COMPARATIVE or HOMOLOGY MODELING
CRB Journal Club February 13, 2006 Jenny Gu. Selected for a Reason Residues selected by evolution for a reason, but conservation is not distinguished.
Representations of Molecular Structure: Bonds Only.
Bioinformatics: Practical Application of Simulation and Data Mining Protein Folding I Prof. Corey O’Hern Department of Mechanical Engineering & Materials.
Lecture 12 CS5661 Structural Bioinformatics Motivation Concepts Structure Prediction Summary.
1 P9 Extra Discussion Slides. Sequence-Structure-Function Relationships Proteins of similar sequences fold into similar structures and perform similar.
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
Department of Mechanical Engineering
Secondary structure prediction
Modelling Genome Structure and Function Ram Samudrala University of Washington.
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Structure prediction: Homology modeling
Protein Modeling Protein Structure Prediction. 3D Protein Structure ALA CαCα LEU CαCαCαCαCαCαCαCα PRO VALVAL ARG …… ??? backbone sidechain.
Predicting Protein Structure: Comparative Modeling (homology modeling)
Protein Structure Prediction: Homology Modeling & Threading/Fold Recognition D. Mohanty NII, New Delhi.
Modelling protein tertiary structure Ram Samudrala University of Washington.
Introduction to Protein Structure Prediction BMI/CS 576 Colin Dewey Fall 2008.
Programme Last week’s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues Summary.
Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson.
X-ray detection xray/facilities.html.
Homology Modeling 原理、流程,還有如何用該工具去預測三級結構 Lu Chih-Hao 1 1.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
CS-ROSETTA Yang Shen et al. Presented by Jonathan Jou.
Ab-initio protein structure prediction ? Chen Keasar BGU Any educational usage of these slides is welcomed. Please acknowledge.
Modelling genome structure and function Ram Samudrala University of Washington.
Modelling Genome Structure and Function Ram Samudrala University of Washington.
Forces and Prediction of Protein Structure Ming-Jing Hwang ( 黃明經 ) Institute of Biomedical Sciences Academia Sinica
Protein Structure Visualisation
Computational Structure Prediction
Protein Structure and Properties
University of Washington
Protein Structure Prediction and Protein Homology modeling
Protein dynamics Folding/unfolding dynamics
Molecular Modeling By Rashmi Shrivastava Lecturer
Homology Modeling.
Levels of Protein Structure
Protein structure prediction.
Programme Last week’s quiz results + Summary
Protein Homology Modelling
Homology modeling in short…
Presentation transcript:

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Thomas Blicher Center for Biological Sequence Analysis

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Why Do We Need Homology Modelling?  Ab Initio protein folding (random sampling): –100 aa, 3 conf./residue gives approximately different overall conformations!  Random sampling is NOT feasible, even if conformations can be sampled at picosecond ( sec) rates. –Levinthal’s paradox  Do homology modelling instead.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU How Is It Possible?  The structure of a protein is uniquely determined by its amino acid sequence (but sequence is sometimes not enough): –prions –pH, ions, cofactors, chaperones  Structure is conserved much longer than sequence in evolution. –Structure > Function > Sequence

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU How Often Can We Do It?  There are currently ~40000 structures in the PDB (but only ~4000 if you include only ones that are not more than 30% identical and have a resolution better than 3.0 Å).  An estimated 25% of all sequences can be modeled and structural information can be obtained for ~50%.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Worldwide Structural Genomics  ”Fold space coverage”  Complete genomes  Signaling proteins  Improving technology  Disease-causing organisms  Model organisms  Membrane proteins  Protein-ligand interactions

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Structural Genomics in North America  10 year $600 million project initiated in 2000, funded largely by NIH.  AIM: structural information on unique proteins (now ), so far 1000 have been determined.  Improve current techniques to reduce time (from months to days) and cost (from $ to $20.000/structure).  9 research centers currently funded (2005), targets are from model and disease-causing organisms (a separate project on TB proteins).

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling for Structural Genomics Roberto Sánchez et al. Nature Structural Biology 7, (2000)

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU How Well Can We Do It? Sali, A. & Kuriyan, J. Trends Biochem. Sci. 22, M20–M24 (1999)

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU How Is It Done?  Identify template(s) – initial alignment  Improve alignment  Backbone generation  Loop modelling  Side chains  Refinement  Validation 

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Template Identification  Search with sequence –Blast –Psi-Blast –Fold recognition methods  Use biological information  Functional annotation in databases  Active site/motifs

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Alignment

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU FDICRLPGSAEAVC F N V C R T P E A I C PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS FDICRLPGSAEAVC F N V C R T P E A I C

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS PHE ASN VAL CYS ARG THR PRO GLU ALA ILE CYS From ”Professional Gambling” by Gert Vriend Improving the Alignment

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Template Quality  Selecting the best template is crucial!  The best template may not be the one with the highest % id (best p-value…) –Template 1: 93% id, 3.5 Å resolution  –Template 2: 90% id, 1.5 Å resolution

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU The Importance of Resolution high low 4 Å 2 Å 3 Å 1 Å

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Ramachandran Plot  Allowed backbone torsion angles in proteins N H Amino acid residue

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Template Quality – Ramachandran Plot NMR structure – low quality data… X-ray structure – good data.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Backbone Generation  Generate the backbone coordinates from the template for the aligned regions.  Several programs can do this, most of the groups at CASP6 use Modeller:

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Loop Modelling  Knowledge based: –Searches PDB for fragments that match the sequence to be modelled (Levitt, Holm, Baker etc.).  Energy based: –Uses an energy function to evaluate the quality of the loop and minimizes this function by Monte Carlo (sampling) or molecular dynamics (MD) techniques.  Combination

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Loops – the Rosetta Method  Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence.  Combine them using a Monte Carlo scheme to build the loop. David Baker et al.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Side Chains  Side chain rotamers are dependent on backbone conformation.  Most successful method in CASP6 was SCWRL by Dunbrack et al.: –Graph-theory knowledge based method to solve the combinatorial problem of side chain modelling.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Side Chains  Prediction accuracy is high for buried residues, but much lower for surface residues –Experimental reasons: side chains at the surface are more flexible. –Theoretical reasons: much easier to handle hydrophobic packing in the core than the electrostatic interactions, including H-bonds to waters.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Side Chains  If the seq. id is high, the networks of side chain contacts may be conserved, and keeping the side chain rotamers from the template may be better than predicting new ones.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Refinement  Energy minimization  Molecular dynamics –Big errors like atom clashes can be removed, but force fields are not perfect and small errors will also be introduced – keep minimization to a minimum or matters will only get worse.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Error Recovery  If errors are introduced in the model, they normally can NOT be recovered at a later step –The alignment can not make up for a bad choice of template. –Loop modeling can not make up for a poor alignment.  If errors are discovered, the step where they were introduced should be redone.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Validation  Most programs will get the bond lengths and angles right.  The Ramachandran plot of the model usually looks pretty much like the Ramachandran plot of the template (so select a high quality template).  Inside/outside distributions of polar and apolar residues can be useful.

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Validation – ProQ Server  ProQ is a neural network based predictor that based on a number of structural features predicts the quality of a protein model.  ProQ is optimized to find correct models in contrast to other methods which are optimized to find native structures. Arne Elofssons group:

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Structure Validation  ProCheck heck.html  WhatIf server

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Servers  Eva-CM performs continous and automated analysis of comparative protein structure modeling servers  A current list of the best performing servers can be found at:

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU The Hardest Target in CASP6  Only 8% sequence id between target and template. Dunbrack, Wang & Jin (2004) CASP6 Fold Recognition Assessment

CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Summary  Successful homology modelling depends on the following: –Template quality –Alignment (add biological information) –Modelling program/procedure (use more than one)  Always validate your final model!