CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK, DTU
Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU.

Presentation transcript:

Can protein model accuracy be identified?
Morten Nielsen, CBS, BioCentrum, DTU
NO!

Identification of protein model accuracy
Why is it important?
What is accuracy?
–RMSD, fraction correct, …
Protein model correctness/quality
–Procheck, Whatif, ProsaII, Verify3d
Prediction of protein model accuracy
–ProQ server

Why is it so important?
Reliable fold recognition
–P-value, E-value, Z-score, …
–Tells you whether you should believe in the fold!
Alignment (model construction)
–No obvious method to estimate the reliability of an alignment
  Number of gaps, length of gaps
  Amino acids in protein core and loops
–% id is too conservative
  Many low homology models are accurate, and some high homology models are wrong
Correct fold, wrong alignment => Terrible model
How to gain confidence in a protein model?

Model accuracy
Swiss-model models sharing 25-95% sequence identity with the submitted sequences

What is protein model accuracy?
Model quality (correctness)
–Does the model look like a protein?
  Hydrophobic residues in core, hydrophilic on surface
  Backbone geometry (phi/psi angles, bond length)
  Amino acid environment
–A correct model (one that looks like a protein) can be completely wrong
Structure accuracy (if we know the answer)
–RMSD
–Fraction of correctly modeled residues

Amino acid environment of different protein sequences (Swissprot), different solved protein structures (PDB), 600 different protein folds => Typical amino acid environment

Model accuracy
Fraction correct = N_c/N
N_c = number of correct residues (d_ij < 4 Å)
Figure: blue = model, yellow = structure; d_ij is the distance marked between the two
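The two accuracy measures named so far, RMSD and fraction correct, can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the slides: it assumes the model and the structure have already been superimposed, that each is given as a list of one (x, y, z) coordinate per residue in the same order, and that a residue counts as correct when its distance is below the 4 Å cutoff from the slide. The function names are hypothetical.

```python
import math

def per_residue_distances(model, structure):
    """Distance between each model residue and its equivalent in the structure.

    Assumes both coordinate lists are already superimposed and aligned
    residue by residue (an assumption, not stated on the slide).
    """
    if len(model) != len(structure):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(m, s) for m, s in zip(model, structure)]

def fraction_correct(model, structure, cutoff=4.0):
    """N_c / N, where N_c counts residues closer than `cutoff` (in Å)."""
    distances = per_residue_distances(model, structure)
    n_correct = sum(1 for d in distances if d < cutoff)
    return n_correct / len(distances)

def rmsd(model, structure):
    """Root mean square deviation over all aligned residues."""
    distances = per_residue_distances(model, structure)
    return math.sqrt(sum(d * d for d in distances) / len(distances))

# Toy example: four residues, one of them displaced by 5 Å.
toy_model = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
toy_structure = [(0, 0, 0), (1, 0, 0), (2, 0, 5), (3, 0, 0)]
print(fraction_correct(toy_model, toy_structure))  # 0.75
print(rmsd(toy_model, toy_structure))              # 2.5
```

The toy case shows why the two numbers are reported together: a single badly placed residue dominates the RMSD, while the fraction correct still reports that three of four residues are fine.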

Evaluation of model quality
Check for proper protein stereochemistry
–ProCheck: Ramachandran plot, bond length, …
–Whatif: Packing quality
–Both web-servers
Fitness of sequence to structure
–ProsaII: Program runs on Linux and Unix
–Verify3D: Web-server

ProCheck
Peptide backbone geometry
Peptide planes
–Cα, N, C, Cα
Dihedral angles φ, ψ (degrees) for β strand and for α helix
From speedy.st-and.ac.uk/.../lectures/3014/lecture/dars1.htm
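The backbone dihedral angles that ProCheck plots can be computed from four consecutive atom positions. The sketch below is a generic dihedral calculation in plain Python, not ProCheck's own code. For residue i, φ uses the atoms C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The function name `dihedral` and the tuple-based coordinates are assumptions made for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))

# phi(i) = dihedral(C(i-1), N(i), CA(i), C(i))
# psi(i) = dihedral(N(i), CA(i), C(i), N(i+1))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))  # cis arrangement, 0 degrees
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # quarter turn, 90 degrees
```

Computing (φ, ψ) for every residue of a model and checking how many pairs fall outside the favorable regions is the idea behind the Ramachandran check.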

Ramachandran plot
Regions
–B. Beta strand
–A. Right handed helix
–L. Left handed helix
Color coding
–White. Disallowed
–Red. Most favorable
–Yellow. Allowed region
Glycine: triangles

Find the wrong structure
–1RIP: Ribosomal protein
–1PLC: Electron transport protein

Procheck. Bond length 1plc

1plc

What-if. Fine packing quality
Statistical description of local chemical environment in high quality protein structures
–Superimpose tryptophans and find average local environment. Same for other amino acids
–Full atom model
G. Vriend and C. Sander, 1992

Example. Casp model T0133
–T0133 Casp5 target
–Modeled by X3M (CPHModels-2.0, Lund O., 2002)
–RMSD=7.3

Casp model - Fine packing quality
---Residue----- State AllAll BB-BB BB-SC SC-BB SC-SC
ILE ( 33 )
SER ( 34 )
…
ALA ( 296 )
GLU ( 297 )
HIS ( 298 )
============================================================
All contacts : Average = Z-score =
BB-BB contacts : Average = Z-score =
BB-SC contacts : Average = Z-score =
SC-BB contacts : Average = Z-score =
SC-SC contacts : Average = Z-score =
============================================================
Average protein values ("Z-score for all contacts") can be read as follows:
-5.0 Guaranteed wrong structure.
     Bad structure or poor model
-3.0 Probably bad structure or unrefined model.
     Doubtful structure or model
-2.0 Structure OK or good model.
     Good structures
 0.0 Good structures.
 2.0 Good structures.
     Unusually Good structures
 4.0 Probably a strange model of a perfect helix
Bad model
BB: Backbone, SC: Sidechain

T0133 structure - Fine packing quality
---Residue----- State AllAll BB-BB BB-SC SC-BB SC-SC
ILE ( 33 ) A
SER ( 34 ) A
…
ALA ( 296 ) A
GLU ( 297 ) A
HIS ( 298 ) A
============================================================
All contacts : Average = Z-score =
BB-BB contacts : Average = Z-score =
BB-SC contacts : Average = Z-score = 0.90
SC-BB contacts : Average = Z-score =
SC-SC contacts : Average = Z-score = 0.02
============================================================
Average protein values ("Z-score for all contacts") can be read as follows:
-5.0 Guaranteed wrong structure.
     Bad structure or poor model
-3.0 Probably bad structure or unrefined model.
     Doubtful structure or model
-2.0 Structure OK or good model.
     Good structures
 0.0 Good structures.
 2.0 Good structures.
     Unusually Good structures
 4.0 Probably a strange model of a perfect helix
Good model

ProsaII (Potential of Mean Force)
Sippl, M.J. (1990) J. Mol. Biol. 213.
Likelihood of amino acid packing
–Method developed by Manfred Sippl, 1993
–Works for Cα-models
For high quality protein structures, estimate nearest neighbor counts (N) for all aa
E = -log(P(N|a)/P(N))
Hydrophobic residues tend to have many neighbors (buried)
Hydrophilic residues tend to have fewer neighbors (exposed)
Finding a hydrophilic aa with many nearest neighbors can indicate a wrong model
Figure: exposure potential for D (D is a charged aa)
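The exposure potential E = -log(P(N|a)/P(N)) can be estimated by simple counting. The sketch below is an illustration of that formula only, not ProsaII's implementation: it assumes the reference data arrive as (amino acid, neighbor count) pairs taken from high quality structures, uses raw frequencies without binning or pseudocounts, and returns energies only for combinations that were actually observed. The function name and the toy numbers are hypothetical.

```python
import math
from collections import Counter

def exposure_potential(observations):
    """Estimate E(a, N) = -log(P(N|a) / P(N)) from (amino_acid, N) pairs."""
    obs = list(observations)
    total = len(obs)
    count_a = Counter(a for a, _ in obs)   # occurrences of each amino acid
    count_n = Counter(n for _, n in obs)   # occurrences of each neighbor count
    count_an = Counter(obs)                # joint occurrences
    energy = {}
    for (a, n), c in count_an.items():
        p_n_given_a = c / count_a[a]
        p_n = count_n[n] / total
        energy[(a, n)] = -math.log(p_n_given_a / p_n)
    return energy

# Toy data: leucine (L) is mostly buried, aspartate (D) mostly exposed.
toy = [("L", 10)] * 3 + [("L", 2)] + [("D", 10)] + [("D", 2)] * 3
potential = exposure_potential(toy)
print(potential[("L", 10)])  # negative: buried leucine is favorable
print(potential[("D", 10)])  # positive: buried aspartate is unfavorable
```

Summing such energies over all residues of a model gives a score in which a charged residue such as D placed in a crowded environment raises the total, which is the signal the slide describes.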

ProsaII (Potential of Mean Force)
Likelihood of amino acid packing
E = -log(P(r|a,b,s)/P(r|s))
–a, b: amino acid types; r: distance between them; s: separation in sequence
If D and E are close in sequence (s=3), then they prefer to be close in space (r ~ 5.5 Å). Hydrogen bonds?
Figure: pair potential for D, E at s=3
Sippl, M.J. (1990) J. Mol. Biol. 213.
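The pair potential compares how often a residue pair (a, b) at sequence separation s is seen at distance r with how often any pair at that separation is seen at r. A minimal counting sketch follows. It is not ProsaII's implementation: the input format of (a, b, s, r) tuples, the 1 Å bin width, and the function name are assumptions, and no smoothing for sparse data is applied.

```python
import math
from collections import Counter

def pair_potential(observations, bin_width=1.0):
    """Estimate E = -log(P(r|a,b,s) / P(r|s)) on distance bins.

    observations: iterable of (a, b, s, r) with r in Å.
    Returns a dict keyed by (a, b, s, bin_index) for observed combinations.
    """
    binned = [(a, b, s, int(r // bin_width)) for a, b, s, r in observations]
    n_abs = Counter((a, b, s) for a, b, s, _ in binned)
    n_abs_r = Counter(binned)
    n_s = Counter(s for _, _, s, _ in binned)
    n_s_r = Counter((s, k) for _, _, s, k in binned)
    energy = {}
    for (a, b, s, k), c in n_abs_r.items():
        p_r_given_abs = c / n_abs[(a, b, s)]
        p_r_given_s = n_s_r[(s, k)] / n_s[s]
        energy[(a, b, s, k)] = -math.log(p_r_given_abs / p_r_given_s)
    return energy

# Toy data: at s=3, D-E pairs cluster near 5.5 Å, A-A pairs near 9.2 Å.
toy_pairs = ([("D", "E", 3, 5.5)] * 3 + [("D", "E", 3, 9.2)]
             + [("A", "A", 3, 5.5)] + [("A", "A", 3, 9.2)] * 3)
pair_energy = pair_potential(toy_pairs)
print(pair_energy[("D", "E", 3, 5)])  # negative: close D-E contact is favorable
print(pair_energy[("D", "E", 3, 9)])  # positive: distant D-E pair is unfavorable
```

Dividing by P(r|s) rather than a global P(r) removes the trivial effect that residues close in sequence are always close in space, so what remains is specific to the amino acid pair.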

Verify 3D (Eisenberg et al. 1997)
Closely related to the ProsaII exposure potential
How well does an aa fit its local environment (hydrophobic/hydrophilic)?
–T0133 Casp5 target
–Modeled by X3M (Lund, O., 2002)
–RMSD=7.3
–Red: Crystal structure
–Blue: Model

Model T0133. Verify 3D
Sequence has poor match to structure

ProQ. Prediction of model accuracy
Neural network to identify correct protein models
–B. Wallner and Arne Elofsson, 2003
Input: a pdb structure/model
Output: accuracy measure
–LGscore
–Maxsub score

ProQ
Input to neural net
–Atom-atom contacts (C, N, O)
  How often is C in contact with N?
–Residue-residue contacts
  How often is E in contact with D?
–Solvent accessibility surface
  Average exposure of L's
–Secondary structure prediction
  How consistent is prediction with model?
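The "how often is E in contact with D?" type of input can be illustrated with a small contact counter. This is only a sketch of the idea of a contact-frequency feature and does not reproduce ProQ's actual descriptors: the one-point-per-residue representation, the 6 Å cutoff, and the function name are all assumptions chosen for the example.

```python
import math
from collections import Counter
from itertools import combinations

def contact_fractions(residues, cutoff=6.0):
    """Fraction of all contacts made by each unordered residue-type pair.

    residues: list of (one_letter_type, (x, y, z)), one point per residue.
    cutoff:   contact distance in Å (an assumed value, not taken from ProQ).
    """
    counts = Counter()
    for (type_i, xyz_i), (type_j, xyz_j) in combinations(residues, 2):
        if math.dist(xyz_i, xyz_j) < cutoff:
            counts[tuple(sorted((type_i, type_j)))] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}

# Toy model: an E-D contact and an L-L contact, far apart from each other.
toy_residues = [("E", (0, 0, 0)), ("D", (3, 0, 0)),
                ("L", (20, 0, 0)), ("L", (23, 0, 0))]
fractions = contact_fractions(toy_residues)
print(fractions)  # {('D', 'E'): 0.5, ('L', 'L'): 0.5}
```

A vector of such fractions has a fixed length regardless of protein size, which is what makes it usable as input to a neural network.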

Casp model T0113

Structure 1RIP

LiveBench data
Models
220 targets
Modeled by Pcons
Incorrect model
–LGscore < 1.5
–Maxsub < 0.1

Conclusions
Correct protein models cannot (yet!) reliably be identified!
Many methods from the protein crystallography world are useful for identifying wrong models
Bad models can, however, pass all filters
ProQ is a first attempt at an "accuracy prediction server"
–Can integrate information from many sources
–The future will show whether this approach can provide reliable prediction of model accuracy