Protein Secondary Structure Prediction

Input: a protein sequence.
Output: for each residue, its associated secondary structure (SS): alpha-helix, beta-strand, or loop.

Servers for SS prediction
AGADIR - An algorithm to predict the helical content of peptides
APSSP - Advanced Protein Secondary Structure Prediction Server
CFSSP - Chou & Fasman Secondary Structure Prediction Server
GOR - Garnier et al., 1996
HNN - Hierarchical Neural Network method (Guermeur, 1997)
HTMSRAP - Helical TransMembrane Segment Rotational Angle Prediction
Jpred - A consensus method for protein secondary structure prediction at University of Dundee
JUFO - Protein secondary structure prediction from sequence (neural network)
NetSurfP - Protein Surface Accessibility and Secondary Structure Predictions
NetTurnP - Prediction of Beta-turn regions in protein sequences
nnPredict - University of California at San Francisco (UCSF)
Porter - University College Dublin
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec from Columbia University
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Bloomsbury Centre for Bioinformatics
SOPMA - Geourjon and Deléage, 1995
Scratch Protein Predictor
DLP-SVM - Domain linker prediction using SVM at Tokyo University of Agriculture and Technology

SS prediction methods (with approximate accuracy)
Most basic idea, probabilities: Chou-Fasman method (1974), ~50%
Conditional probabilities: GOR method (1978), ~60%
Machine learning techniques: SVM, neural network (2004/5), ~70%
Other improvements: environment, solvent accessibility (ongoing), ~80%

Protein secondary structure prediction
[Diagram; labels: Query, SwissProt, BLASTp, Query/Subject, psiBLAST and MaxHom, MSA, Machine Learning Approach, Known structures, output string HHHLLLHHHEEE]

Evaluating secondary structure prediction methods
Assume you have a new method for SS prediction. Given the following sequence, you get the result:
GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
---EEEEEEE---EEEE HHHHHHHH-----EEEE EEEEEEEEEE
Coil: -, Beta strand: E, Alpha helix: H
How can you assess how good your result is?
1) Compare it to the TRUTH, assuming this structure exists. (What if it doesn't?)
2) Calculate the percentage of amino acids whose secondary structure class (helix, coil, or sheet) is correctly predicted (Q3).

Evaluating secondary structure prediction methods
Original sequence: GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction: ---EEEEEEE---EEEE HHHHHHHH-----EEEE EEEEEEEEEE
Truth (from a PDB file): -----EE HHHHHHHHHH EE HHHHHHH-----

Evaluating secondary structure prediction methods
Original sequence: GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction: ---EEEEEEE---EEEE HHHHHHHH-----EEEE EEEEEEEEEE
Truth (from a PDB file): -----EE HHHHHHHHHH EE HHHHHHH-----
Match (Y = correct, N = incorrect): YYYNNYYNNNYYYNNNNYYYNNNNYYYYYYNNYYYYYNYYNYYYYYYYNNNNNNNNNNNN
The sequence has 60 residues, of which 31 are correctly predicted (Y). So the Q3 score of this method would be 31/60 = 51.67%.
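The Q3 calculation can be sketched in a few lines of Python. This is an illustration only: the function names `q3` and `q3_from_matches` are made up for the example, and the match string is the Y/N line as transcribed here, split into blocks of ten positions so it is easier to check.

```python
def q3(pred, truth):
    """Percentage of residues whose predicted SS class equals the true class."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return 100.0 * correct / len(truth)


def q3_from_matches(matches):
    """Same score, starting from a Y/N match string (Y = correct)."""
    if not matches:
        raise ValueError("empty match string")
    correct = matches.count("Y")
    return correct, len(matches), 100.0 * correct / len(matches)


# Y/N line from the slide, written in blocks of ten positions.
MATCHES = (
    "YYYNNYYNNN"
    "YYYNNNNYYY"
    "NNNNYYYYYY"
    "NNYYYYYNYY"
    "NYYYYYYYNN"
    + "N" * 10
)

correct, total, score = q3_from_matches(MATCHES)
print(f"{correct}/{total} = {score:.2f}%")  # prints 31/60 = 51.67%
```

`q3` is the form to use when the prediction and the truth are available as two aligned strings of equal length; `q3_from_matches` only recounts a comparison that has already been made.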

Evaluating secondary structure prediction methods
What can be the problem with such a calculation? Assume that alpha helix is the SS of 60% of the residues. Then a constant prediction of alpha helix would yield a Q3 of 60%. Q3 therefore rewards overprediction of the more common secondary structure classes in the database.
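A toy example makes the weakness concrete. The ten-residue "truth" below is hypothetical and was chosen only so that 60% of its residues are helical; the helper name `q3` is likewise an assumption for this sketch.

```python
def q3(pred, truth):
    """Percentage of residues whose predicted SS class equals the true class."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical truth: 6 helix, 2 strand, 2 coil residues (60% helix).
toy_truth = "H" * 6 + "E" * 2 + "-" * 2

# A "predictor" that ignores the sequence and always answers helix.
constant_helix = "H" * len(toy_truth)

baseline = q3(constant_helix, toy_truth)
strands_found = sum(
    1 for p, t in zip(constant_helix, toy_truth) if t == "E" and p == "E"
)

print(baseline)       # 60.0
print(strands_found)  # 0
```

The constant predictor reaches a Q3 of 60% while finding none of the strand residues, which is why per-class measures are worth reporting alongside Q3.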

Evaluating secondary structure prediction methods
There are other ways to measure the agreement between the result and the 'truth'. Most of them rely on ratios between the following counts:
1. True positive (TP) = correctly identified
2. True negative (TN) = correctly rejected
3. False positive (FP) = incorrectly identified
4. False negative (FN) = incorrectly rejected

Evaluating secondary structure prediction methods
For instance, for the α-helix:
- TP: number of α-helix residues that are correctly predicted.
- TN: number of residues observed in β-strands and loops that are not predicted as α-helix.
- FP: number of residues incorrectly predicted in α-helix conformation.
- FN: number of residues observed in α-helices but predicted to be either in β-strands or loops.
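The four counts can be tallied per class with a single pass over two aligned strings. The sketch below is a minimal illustration: `class_counts` and the two toy strings are hypothetical and do not come from the slides.

```python
from collections import namedtuple

Counts = namedtuple("Counts", "tp tn fp fn")


def class_counts(pred, truth, label="H"):
    """Count TP, TN, FP and FN for one SS class, treating it as 'positive'."""
    if len(pred) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    tp = tn = fp = fn = 0
    for p, t in zip(pred, truth):
        if t == label and p == label:
            tp += 1  # observed in the class and predicted in it
        elif t == label:
            fn += 1  # observed in the class but predicted as something else
        elif p == label:
            fp += 1  # predicted in the class but observed elsewhere
        else:
            tn += 1  # neither observed nor predicted in the class
    return Counts(tp, tn, fp, fn)


# Hypothetical aligned strings (H = helix, E = strand, - = coil).
toy_truth = "HHHHEEE---"
toy_pred = "HHH-EEEE-H"

helix = class_counts(toy_pred, toy_truth, "H")
print(helix)  # Counts(tp=3, tn=5, fp=1, fn=1)
```

Calling the same function with `label="E"` or `label="-"` gives the counts for strands or coil, so each class gets its own two-by-two table.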

Sensitivity and specificity
Sensitivity and specificity are statistical measures of the performance of a binary classification test. Sensitivity measures the proportion of actual positives that are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition). Specificity measures the proportion of actual negatives that are correctly identified as such (e.g. the percentage of healthy people who are correctly identified as not having the condition).
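In terms of the TP, TN, FP and FN counts defined earlier, the two measures are conventionally written as follows (standard definitions, stated here for reference):

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```

The denominator of sensitivity is the number of actual positives, and the denominator of specificity is the number of actual negatives.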

Sensitivity and specificity
Question: If the predictor perfectly predicts the truth, what would be the sensitivity rate? The specificity rate?
Answer: A perfect predictor would be described as ______% sensitivity (i.e. it predicts everyone in the sick group as sick) and ______% specificity (i.e. it predicts no one in the healthy group as sick).

Sensitivity and specificity
For any test, there is usually a trade-off between the two measures. For example, in an airport security setting in which one is testing for potential threats to safety, scanners may be set to trigger on low-risk items like belt buckles and keys (low specificity) in order to reduce the risk of missing objects that do pose a threat to the aircraft and those aboard (high sensitivity).
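The trade-off can be seen by moving a decision threshold over a set of scores. Everything in this sketch is hypothetical: the scores, the labels and the function name `sens_spec` are invented for illustration, with a higher score meaning "more suspicious".

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when items scoring >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scanner scores and whether each item was a real threat.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = [False, False, False, True, False, False, True, True, True]

low_threshold = sens_spec(scores, labels, 4)   # trigger easily
high_threshold = sens_spec(scores, labels, 7)  # trigger rarely

print(low_threshold)   # (1.0, 0.6)
print(high_threshold)  # (0.75, 1.0)
```

With the low threshold no threat is missed, but two harmless items are flagged; with the high threshold nothing harmless is flagged, but one threat gets through.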

Exercise
Calculate the specificity and sensitivity of the alpha helix prediction in the following SS prediction:
Original sequence: GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction: ---EEEEEEE---EEEE HHHHHHHH-----EEEE EEEEEEEEEE
Truth (from a PDB file): -----EE HHHHHHHHHH EE HHHHHHH-----

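Answer
Prediction: ---EEEEEEE---EEEE HHHHHHHH-----EEEE EEEEEEEEEE
Truth (from a PDB file): -----EE HHHHHHHHHH EE HHHHHHH-----
Alpha helix:
- TP = 6 (alpha-helix residues correctly identified)
- FP = 2 (residues incorrectly identified as alpha helix)
- FN = 4 + 7 = 11 (alpha-helix residues incorrectly rejected)
- TN = 60 - (6 + 2 + 11) = 41 (non-helix residues correctly rejected; the sequence has 60 residues)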
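The exercise asks for sensitivity and specificity, so the last step is to turn the counts into the two ratios. The sketch below takes TP, FP and FN from the answer and derives the total from the length of the sequence as transcribed here (60 residues); the variable names are only for illustration.

```python
# Sequence from the exercise, written in blocks of ten residues.
SEQUENCE = (
    "GLGGYMLGSA" "MSRPMIHFGN" "DWEDRYYREN"
    "MYRYPNQVYY" "RPVDQYSNQN" "NFVHDCVNIT"
)

# Alpha-helix counts taken from the answer.
TP, FP, FN = 6, 2, 11

# Every remaining residue is a true negative for the helix class.
TN = len(SEQUENCE) - (TP + FP + FN)

sensitivity = TP / (TP + FN)  # share of true helix residues that were found
specificity = TN / (TN + FP)  # share of non-helix residues left unflagged

print(f"TN = {TN}")                        # TN = 41
print(f"sensitivity = {sensitivity:.3f}")  # sensitivity = 0.353
print(f"specificity = {specificity:.3f}")  # specificity = 0.953
```

The prediction is therefore highly specific for helix but finds only about a third of the helical residues, a weakness that the overall Q3 score alone does not reveal.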

Jpred 3 – SS prediction server

[Jpred output screenshot; labeled parts: MSA, buried/exposed prediction, reliability score, final SS prediction]

Jpred 3 – SS prediction server
Original sequence: GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Jpred prediction + reliability: -----HHHH HHHHHHHHHHH EEE
Truth (from a PDB file): -----EE HHHHHHHHHH EE HHHHHHH-----