Burkhard Rost (Columbia New York) Some gory details of protein secondary structure prediction Burkhard Rost CUBIC Columbia University

FoRc HoMo 1D… the art of being humble

Goal of secondary structure prediction

Secondary structure predictions of the 1st and 2nd generation
– single residues (1st generation)
  – Chou-Fasman, GOR: …% accuracy
– segments (2nd generation)
  – GOR III: …% accuracy
– problems
  – < 100% (they said: 65% max)
  – < 40% (they said: strand non-local)
  – short segments

Helix formation is local: thyroid hormone receptor (PDB 2nll)

Burkhard Rost (Columbia New York)  -sheet formation is NOT local

Problems of secondary structure predictions (before 1994)
SEQ KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E E EEEEEE EEEEEE EEEEEEHHHEEEE
TYP EHHHH EE EEEE EE HHHEE EEEHH

Simple neural network

Training a neural network 1

Training a neural network 2
Error: $E = (\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})^2$

Training a neural network 3

Training a neural network 4
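The training slides describe adjusting the weights to reduce the squared error between what the network outputs and what it should output, $E = (\mathrm{out}_{\mathrm{net}} - \mathrm{out}_{\mathrm{want}})^2$. A minimal sketch of that idea for a single sigmoid unit trained by online gradient descent; the learning rate, epoch count, function names and the toy AND task are illustrative assumptions, not the setup used for PHD.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, b, x):
    """Output of one sigmoid unit with weights w and bias b for inputs x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_unit(samples, lr=0.5, epochs=5000):
    """Online gradient descent on E = (out_net - out_want)**2.

    samples: list of (inputs, target) pairs. Returns (weights, bias).
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, want in samples:
            out = predict(w, b, x)
            # dE/dz for a sigmoid output: 2 * (out - want) * out * (1 - out)
            delta = 2.0 * (out - want) * out * (1.0 - out)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b


# Toy task (hypothetical): logical AND, which one unit can separate.
AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_unit(AND)
```

After training, `predict(w, b, [1, 1])` is above 0.5 and the other three inputs fall below it.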

Neural networks classify points

Simple neural network with hidden layer
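A hidden layer lets a network separate points that no single straight line can split. The classic case is XOR: (0,1) and (1,0) must land on one side, (0,0) and (1,1) on the other. The sketch uses hand-picked, hypothetical weights and step activations rather than trained ones, only to show how two hidden units combine into a non-linear decision.

```python
def step(x: float) -> int:
    """Threshold activation: fires (1) for positive input, else 0."""
    return 1 if x > 0 else 0


def xor_net(x1: int, x2: int) -> int:
    # Hidden unit h1 computes "x1 OR x2", h2 computes "x1 AND x2".
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output unit: "OR but not AND", i.e. XOR.
    return step(h1 - h2 - 0.5)
```

Each hidden unit draws one line through the input plane; the output unit combines the two half-planes, which a single unit on its own cannot do for XOR.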

Neural Network for secondary structure
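A network for secondary structure sees each residue through a window of its neighbours. A minimal sketch of one common input encoding, assuming one-hot units for the 20 standard amino acids plus one spacer unit per window position, and a window of 13 residues (half-width 6); these sizes and the function name are assumptions chosen for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def encode_window(seq, center, half_width=6):
    """Encode the window around residue `center` as network input units.

    Each of the 2*half_width+1 positions gets 21 units: one per amino acid
    plus a spacer unit that is set when the position lies beyond a chain
    terminus. Non-standard residues (e.g. 'X') are left as all zeros.
    """
    n_units = len(AMINO_ACIDS) + 1
    units = []
    for pos in range(center - half_width, center + half_width + 1):
        vec = [0.0] * n_units
        if 0 <= pos < len(seq):
            if seq[pos] in AMINO_ACIDS:
                vec[AMINO_ACIDS.index(seq[pos])] = 1.0
        else:
            vec[-1] = 1.0  # spacer unit: position outside the chain
        units.extend(vec)
    return units
```

For the first residue of a chain, the six window positions to its left are all marked by the spacer unit.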

Balanced training: normal training vs. balanced training
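Strand residues are less frequent than helix or other residues, so a network trained on the natural class distribution tends to under-predict strands. One simple way to realise balanced training, assumed here, is to draw the same number of examples from each class, sampling rare classes with replacement; the function name and the labels H/E/L are hypothetical.

```python
import random
from collections import defaultdict


def balanced_batches(examples, n_per_class, seed=0):
    """Draw an equal number of training examples from each class.

    examples: list of (features, label) pairs, label e.g. 'H', 'E' or 'L'.
    Rare classes are sampled with replacement so that every class
    contributes exactly n_per_class examples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[ex[1]].append(ex)
    batch = []
    for label in sorted(by_class):
        batch.extend(rng.choices(by_class[label], k=n_per_class))
    rng.shuffle(batch)  # avoid presenting all examples of one class in a row
    return batch
```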

PHDsec: structure-to-structure network
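In a structure-to-structure network the second level does not see amino acids; its input is a window of the first level's three outputs (helix, strand, other) per residue, which can help smooth out implausible patterns such as one-residue helices. A minimal sketch of building that input, assuming a window of 17 residues and one spacer unit per position (both assumptions, as is the function name):

```python
def structure_window(first_level, center, half_width=8):
    """Build the input of a second-level (structure-to-structure) network.

    first_level: list of (p_helix, p_strand, p_other) triples, one per
    residue, as produced by a first-level sequence-to-structure network.
    Each window position contributes 4 units: the three outputs plus a
    spacer unit marking positions beyond the chain ends.
    """
    units = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(first_level):
            units.extend(first_level[pos])
            units.append(0.0)  # inside the chain: spacer off
        else:
            units.extend((0.0, 0.0, 0.0, 1.0))  # outside: spacer on
    return units
```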

Better prediction of segment lengths
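Segment lengths can be checked by splitting an observed or predicted string into runs of one state. A minimal helper (hypothetical name), assuming one-letter states such as H, E and C:

```python
from itertools import groupby


def segments(ss: str, state: str):
    """Return (start, length) for each run of `state` in a secondary-structure string."""
    result = []
    pos = 0
    for symbol, run in groupby(ss):
        n = len(list(run))
        if symbol == state:
            result.append((pos, n))
        pos += n
    return result
```

Comparing the run lengths of a prediction with those of the observed string shows whether a method tends to produce segments that are too short.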

Evolution has it!
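Evolutionary information enters the input by replacing the single sequence with a profile: for each alignment column, the frequency of each amino acid among the aligned family members. A minimal sketch with unweighted counts; real methods typically weight sequences and treat gaps more carefully, so this simplification and the function name are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile(alignment):
    """Per-column amino-acid frequencies from a gapped multiple alignment.

    alignment: list of equal-length strings; '-' or '.' marks a gap.
    Returns one list of 20 frequencies per column (all zeros for a column
    that contains only gaps).
    """
    prof = []
    for col in range(len(alignment[0])):
        counts = [0] * len(AMINO_ACIDS)
        for seq in alignment:
            aa = seq[col]
            if aa in AMINO_ACIDS:
                counts[AMINO_ACIDS.index(aa)] += 1
        total = sum(counts)
        prof.append([c / total if total else 0.0 for c in counts])
    return prof
```

Each column's 20 frequencies can then take the place of the 20 one-hot units of a single-sequence window encoding.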

Spectrin SH3 (Src homology 3) domain

Prediction accuracy varies!

Why so bad?

Stronger predictions more accurate!
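One way to quantify how strong a prediction is per residue is the gap between the highest and second-highest network output. The sketch assumes a 0-9 index of the form int(10 × (top − second)), similar in spirit to the reliability index reported by PHD, plus a hypothetical helper that measures accuracy and coverage among residues above a chosen index.

```python
def reliability_index(outputs):
    """Reliability index 0..9 from the network outputs for one residue.

    Assumed form: 10 * (highest - second-highest output), truncated to an
    integer and capped at 9; outputs are assumed to lie in [0, 1].
    """
    top, second = sorted(outputs, reverse=True)[:2]
    return min(9, int(10 * (top - second)))


def accuracy_at_threshold(observed, predicted, ri, threshold):
    """(fraction correct, fraction of residues kept) for residues with RI >= threshold."""
    kept = [(o, p) for o, p, r in zip(observed, predicted, ri) if r >= threshold]
    if not kept:
        return 0.0, 0.0
    correct = sum(o == p for o, p in kept)
    return correct / len(kept), len(kept) / len(observed)
```

Raising the threshold typically trades coverage for accuracy, which is the pattern the slide title points at.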

Correct prediction of correctly predicted residues

BAD errors are frequent!
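A BAD error is taken here to mean confusing helix and strand: a residue observed as helix (H) but predicted as strand (E), or the reverse. This definition, the function name and the choice of all residues as the denominator are assumptions for the sketch.

```python
def bad_fraction(observed: str, predicted: str) -> float:
    """Percentage of residues where helix and strand are confused.

    Counts observed H predicted as E and observed E predicted as H;
    errors involving the 'other' state are not counted.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    bad = sum(1 for o, p in zip(observed, predicted) if {o, p} == {"H", "E"})
    return 100.0 * bad / len(observed)
```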

False prediction for engineered proteins!

PHDsec: the un-g(l)ory details
– average accuracy > 72% (three states: helix, strand, other)
– 72% is the average over a distribution between proteins (spread ≈ 10%)
– stronger predictions are more accurate
– WARNING: reliability index almost a factor of 2 too large for single sequences
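Three-state accuracy (Q3) is the percentage of residues whose state (helix, strand, other) is predicted correctly. Because a figure like 72% is an average over a distribution, the sketch computes Q3 per protein and then the mean and spread over proteins; the 'L' label for other, the function names and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose H/E/L state is predicted correctly."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def per_protein_summary(pairs):
    """Mean and standard deviation of Q3 over proteins, each protein weighted equally."""
    scores = [q3(obs, pred) for obs, pred in pairs]
    return mean(scores), pstdev(scores)
```

Weighting each protein equally (rather than pooling all residues) keeps long chains from dominating the average.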

Details PHDsec: multiple alignment vs. single sequences
– single sequences => accuracy clearly lower
id nali Q3sec Q2acc
AA KELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLD
OBS EEEE E E EEEEEE EEEEEE EEEEEEHHHEEEE
30 N EEEEEEE EEE EEEEE EEEE EE EEE
self EEEEEEE EEEE EEEEE EEEEEE HHHHH

Secondary structure prediction
– Limit of prediction accuracy reached?
– How does it complement other methods?
– Ultimate role in structure prediction (1D-3D)?
– Better to use "pure" secondary structure prediction methods, or to use 3D methods and read the secondary structure off the 3D model?
– Conversely, are 3D predictors making optimal use of secondary structure predictions?
– Will secondary structure and 3D prediction merge completely?

Secondary structure prediction 2000
– history
  – 1st generation: 50-55%
  – 2nd generation: 55-62%
  – 3rd generation: …%
  – 2000: > 76%
– what improves? (gain in percentage points)
  – database growth: +3
  – PSI-BLAST: +0.5
  – new training: +1
  – 'clever method': +1
– limit?
  – max. 88% -> 12 points to go
  – 1/5 of proteins have families with more than 100 members -> > 80%
  – and from there?

Prediction of protein secondary structure
– 1980: 55% (simple)
– 1990: 60% (less simple)
– 1993: 70% (evolution)
– 2000: 76% (more evolution)
– what is the limit?
  – 88% for proteins of similar structure
  – 80% for the 1/5 of proteins with families of > 100 members
– missing through:
  – better definition of secondary structure
  – including long-range interactions
  – structural switches
  – chameleon / folding

CAFASP statistics
– 29 proteins not similar to known PDB structures
  – T0086, T0087, T0090, T0091, T0092, T0094, T0095, T0096, T0097, T0098, T0101, T0102, T0104, T0105, T0106, T0107, T0108, T0109, T0110, T0114, T0115, T0116, T0117, T0118, T0120, T0124, T0125, T0126, T…
– … proteins with PSI-BLAST homologue
  – T0089, T…
– … proteins with trivial homologue in PDB
  – T0099, T0100, T0111, T0112, T0113, T0121, T0122, T0123, T0128

CAFASP sec unique

CAFASP sec homologous

CAFASP concept
– targets & non-targets
  – comparative modelling 85% > all current methods
– never compare methods on different proteins
– never rank when too few proteins
– (never show numbers for one protein between different proteins)

What is significant

Rank only if significant
– e.g. M1 = 75, M2 = 73, say on 16 proteins
– rule of thumb: a difference is significant only if it exceeds $\sigma/\sqrt{N}$ (N = number of proteins)
– here $\sigma \approx 10$: $10/\sqrt{16} = 10/4 = 2.5 > 75 - 73 = 2$ -> M1 and M2 cannot be distinguished
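The ranking rule of thumb on this slide fits in a few lines. Here `sigma` is the per-protein spread of accuracy (about 10 in the slide's example); the function name is hypothetical.

```python
import math


def distinguishable(acc1: float, acc2: float, sigma: float, n_proteins: int) -> bool:
    """Rule of thumb: two methods differ only if |acc1 - acc2| > sigma / sqrt(N)."""
    threshold = sigma / math.sqrt(n_proteins)
    return abs(acc1 - acc2) > threshold
```

With the slide's numbers, `distinguishable(75, 73, 10, 16)` is False because 2 < 2.5; the same 2-point gap on 100 proteins (threshold 1) would count as significant.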

EVA: automatic continuous EVAluation of structure prediction

EVA: automatic continuous EVAluation of structure prediction
– statistics: 31 weeks -> 1549 new structures, 352 new sequence-unique chains (of 2200)
– categories:
  – secondary structure prediction (7 methods)
  – comparative modelling (4)
  – fold recognition (7)
  – contact prediction (4)

EVA: secondary structure
– MAJOR lessons from EVA:
  – no point comparing apples and oranges
  – no point comparing < 20 apples
– EVA team:
  – CUBIC, Columbia: Volker Eyrich, Dariusz Przybylski, Burkhard Rost
  – Rockefeller: Marc Marti-Renom, Andras Fiser, Andrej Sali
  – Madrid: Florencio Pazos, Alfonso Valencia
– URL:

EVA: secondary structure 76%

Accuracy varies for proteins!

Averaging over many methods not always a good idea!

Some proteins predicted better

Reliability correlates with accuracy!

Conclusion
– big gain through using evolutionary information
– are we going to reach above 80%? How high?
– continuous secondary structure
– better methods
– other features
– use secondary structure: ASP (Young M, Kirshenbaum K, Dill KA, Highsmith S: Predicting conformational switches in proteins. Protein Sci 1999, 8:…)

Availability of methods
– subject: HELP
– file:
    address
    options
    # protein name
    SEQWENCE
– WWW: predictprotein/
– META: predictprotein/submit_meta.html
– EVA:
– CUBIC: