Protein Secondary Structure Prediction. Input: protein sequence Output: for each residue its associated Secondary structure (SS): alpha-helix, beta-strand,

Protein Secondary Structure Prediction

Input: a protein sequence.
Output: for each residue, its associated secondary structure (SS): alpha-helix, beta-strand, or loop.

Servers for SS prediction
–AGADIR: an algorithm to predict the helical content of peptides
–APSSP: Advanced Protein Secondary Structure Prediction Server
–CFSSP: Chou & Fasman Secondary Structure Prediction Server
–GOR: Garnier et al., 1996
–HNN: Hierarchical Neural Network method (Guermeur, 1997)
–HTMSRAP: Helical TransMembrane Segment Rotational Angle Prediction
–Jpred: a consensus method for protein secondary structure prediction, University of Dundee
–JUFO: protein secondary structure prediction from sequence (neural network)
–NetSurfP: protein surface accessibility and secondary structure predictions
–NetTurnP: prediction of beta-turn regions in protein sequences
–nnPredict: University of California at San Francisco (UCSF)
–Porter: University College Dublin
–PredictProtein: PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader, MaxHom, EvalSec, from Columbia University
–Prof: Cascaded Multiple Classifiers for Secondary Structure Prediction
–PSA: BioMolecular Engineering Research Center (BMERC) / Boston
–PSIpred: various protein structure prediction methods at the Bloomsbury Centre for Bioinformatics
–SOPMA: Geourjon and Deléage, 1995
–Scratch Protein Predictor
–DLP-SVM: domain linker prediction using SVM, Tokyo University of Agriculture and Technology

SS prediction methods
–Most basic idea, probabilities: Chou-Fasman method (1974)
–Conditional probabilities: GOR method (1978)
–Machine learning techniques: SVM, neural networks (2004/5)
–Other improvements: environment, solvent accessibility (ongoing)
Approximate prediction accuracy of these generations, in the same order: ~50%, ~60%, ~70%, ~80%

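[Figure: protein secondary structure prediction pipeline. Query sequence; BLASTp search against SwissProt (query/subject hits); psiBLAST or MaxHom to build an MSA; a machine-learning approach trained on known structures; per-residue output such as HHHLLLHHHEEE]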
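In the pipeline above, the MSA is what the machine-learning step actually consumes. The sketch below shows one simplified way to turn an MSA into model input, under our own assumptions. It computes per-column amino-acid frequencies from an already-aligned MSA, as a stand-in for a real PSI-BLAST profile. It then concatenates a sliding window of those columns around each residue to form a feature vector. The function names, the gap character '-', and the default half-width of 3 are hypothetical illustration choices, not the parameters of any particular server.

```python
# Hypothetical sketch: MSA -> frequency profile -> sliding-window features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequency_profile(msa: list[str]) -> list[dict[str, float]]:
    """Per-column amino-acid frequencies of an aligned MSA; gaps ('-') are ignored."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in msa if seq[col] in AMINO_ACIDS]
        n = len(residues)
        profile.append({aa: residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS})
    return profile


def window_features(profile: list[dict[str, float]], center: int, half_width: int = 3) -> list[float]:
    """Concatenate profile columns center-half_width .. center+half_width.
    Positions that fall off either end of the sequence are zero-padded."""
    features: list[float] = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(profile):
            features.extend(profile[i][aa] for aa in AMINO_ACIDS)
        else:
            features.extend([0.0] * len(AMINO_ACIDS))
    return features


# Toy alignment: the query first, then two hypothetical homologues.
msa = ["GLGGY", "GLAGY", "G-GGF"]
profile = frequency_profile(msa)
print(profile[2]["G"], profile[2]["A"])  # column 3 holds two G and one A
x = window_features(profile, center=0)
print(len(x))  # 7 window positions x 20 amino acids = 140
```

A trained classifier would map each such window vector to H, E or loop. Real predictors differ in profile type, window size and model.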

Evaluating secondary structure prediction methods

Assume you have a new method for SS prediction. Given the following sequence, it returns this result:
Sequence:   GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction: ---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE

How can you assess how good your result is?
1) Compare it to the truth, assuming the experimental structure exists. (What if it doesn't?)
2) Calculate the percentage of residues whose secondary structure class (helix, strand, or coil) is correctly predicted. This is the Q3 score.

Notation: coil: -, beta strand: E, alpha helix: H

Original sequence:        GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction:               ---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth (from a PDB file):  -----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Sequence:       GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction:     ---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth:          -----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----
Correct (Y/N):  YYYNNYYNNNYYYNNNNYYYNNNNYYYYYYNNYYYYYNYYNYYYYYYYNNNNNNNNNNNN

Overall, there are 60 residues, of which 31 are correctly predicted (Y). So the Q3 score of this method would be 31/60 ≈ 51.67%.
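The Q3 arithmetic can be checked with a short Python sketch. It assumes both strings use the three-state alphabet above (H, E, -) and have the same length. The sequence has 60 residues, so the score is 31/60. The name `q3` is our own.

```python
def q3(prediction: str, truth: str) -> float:
    """Percentage of residues whose predicted SS class (H, E or -) equals the observed one."""
    if len(prediction) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    correct = sum(p == t for p, t in zip(prediction, truth))
    return 100.0 * correct / len(truth)


prediction = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
truth      = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"

# Per-residue Y/N line, as on the slide
matches = "".join("Y" if p == t else "N" for p, t in zip(prediction, truth))
print(matches)
print(len(truth), matches.count("Y"))        # 60 31
print(f"Q3 = {q3(prediction, truth):.2f}%")  # Q3 = 51.67%
```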

What can be the problem with such a calculation? Assume that alpha helix is the SS of 60% of the residues. Then a constant prediction of alpha helix for every residue would yield a Q3 of 60%. Q3 therefore rewards over-prediction of the secondary structure classes that are most common in the database.
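The same effect is visible on the example above. The sketch below scores a "predictor" that outputs one class for every residue against the 60-residue truth string. Always predicting coil gives a Q3 of 65%, which is higher than the example method's Q3. The function name is hypothetical.

```python
from collections import Counter


def constant_prediction_q3(truth: str) -> dict[str, float]:
    """Q3 (%) obtained by predicting one and the same class for every residue."""
    counts = Counter(truth)
    return {label: 100.0 * counts[label] / len(truth) for label in "HE-"}


truth = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"
for label, score in constant_prediction_q3(truth).items():
    print(f"always '{label}': Q3 = {score:.2f}%")
# always 'H': Q3 = 28.33%
# always 'E': Q3 = 6.67%
# always '-': Q3 = 65.00%
```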

There are other ways to measure the agreement between the prediction and the 'truth'. Most of them are built from ratios of four counts:
1. True positive (TP) = correctly identified
2. True negative (TN) = correctly rejected
3. False positive (FP) = incorrectly identified
4. False negative (FN) = incorrectly rejected

For instance, for the α-helix:
–TP: number of residues observed in α-helices that are correctly predicted as α-helix.
–TN: number of residues observed in β-strands or loops that are not predicted as α-helix.
–FP: number of residues predicted as α-helix that are observed in β-strands or loops.
–FN: number of residues observed in α-helices but predicted to be in β-strands or loops.
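The same four counts can be tallied for any class by treating it as positive and the other two classes as negative (one-vs-rest). The sketch below assumes the three-state strings used above. `confusion_counts` is a hypothetical helper name. For the β-strand class on the example, it gives TP = 4, FP = 21, FN = 0 and TN = 35.

```python
def confusion_counts(prediction: str, truth: str, label: str) -> dict[str, int]:
    """One-vs-rest TP/TN/FP/FN for one SS class ('H', 'E' or '-')."""
    if len(prediction) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for p, t in zip(prediction, truth):
        if p == label and t == label:
            counts["TP"] += 1  # observed in the class and predicted in it
        elif p != label and t != label:
            counts["TN"] += 1  # observed elsewhere and not predicted in it
        elif p == label:
            counts["FP"] += 1  # predicted in the class but observed elsewhere
        else:
            counts["FN"] += 1  # observed in the class but predicted elsewhere
    return counts


prediction = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
truth      = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"
for label in "HE-":
    print(label, confusion_counts(prediction, truth, label))
```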

Sensitivity and specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test. Sensitivity measures the proportion of actual positives that are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition). Specificity measures the proportion of actual negatives that are correctly identified as such (e.g. the percentage of healthy people who are correctly identified as not having the condition).
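In terms of the four counts TP, TN, FP and FN, the two measures are the standard ratios. For a per-class SS evaluation, the positives are the residues observed in that class.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```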

Question:
–If the predictor perfectly predicts the truth, what would be the sensitivity rate? The specificity rate?
Answer:
–A perfect predictor would be described as 100% sensitive (i.e. it predicts everyone in the sick group as sick) and 100% specific (i.e. it predicts no one in the healthy group as sick).

For any test, there is usually a trade-off between the two measures. For example, consider an airport security setting that tests for potential threats to safety. Scanners may be set to trigger on low-risk items like belt buckles and keys (low specificity) in order to reduce the risk of missing objects that do pose a threat to the aircraft and those aboard (high sensitivity).

Exercise
Calculate the sensitivity and specificity of the alpha-helix prediction in the following SS prediction:
Original sequence:        GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Prediction:               ---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth (from a PDB file):  -----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----

Answer
Prediction: ---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE
Truth:      -----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----
Alpha helix:
–TP = 6 (alpha-helix residues correctly identified)
–FP = 2 (residues incorrectly identified as alpha helix)
–FN = 4 + 7 = 11 (alpha-helix residues incorrectly rejected)
–TN = 60 - (6 + 2 + 11) = 41
Sensitivity = TP / (TP + FN) = 6/17 ≈ 35.3%
Specificity = TN / (TN + FP) = 41/43 ≈ 95.3%
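The exercise can be verified with a one-vs-rest count in Python. The sequence is 60 residues long, so TN = 60 - 19 = 41. The helper name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(prediction: str, truth: str, label: str) -> tuple[float, float]:
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for one SS class."""
    if len(prediction) != len(truth):
        raise ValueError("prediction and truth must have the same length")
    pairs = list(zip(prediction, truth))
    tp = sum(p == label and t == label for p, t in pairs)
    fn = sum(p != label and t == label for p, t in pairs)
    fp = sum(p == label and t != label for p, t in pairs)
    tn = sum(p != label and t != label for p, t in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("label must be observed in some, but not all, residues of truth")
    return tp / (tp + fn), tn / (tn + fp)


prediction = "---EEEEEEE---EEEE-------HHHHHHHH-----EEEE---------EEEEEEEEEE"
truth      = "-----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----"
sens, spec = sensitivity_specificity(prediction, truth, "H")
print(f"helix sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# helix sensitivity = 35.3%, specificity = 95.3%
```

The high specificity mostly reflects the fact that most residues are not helices. The low sensitivity shows that most of the observed helix residues were missed.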

Jpred 3 – SS prediction server

[Figure: Jpred 3 output, with panels for the MSA, the buried/exposed prediction, the reliability score, and the final SS prediction]

Original sequence:        GLGGYMLGSAMSRPMIHFGNDWEDRYYRENMYRYPNQVYYRPVDQYSNQNNFVHDCVNIT
Jpred prediction:         -----HHHH------------HHHHHHHHHHH-------------------EEE------
Reliability:              997500000026777567776017899988721577400467777777773000000699
Truth (from a PDB file):  -----EE-------------HHHHHHHHHH--------EE--------HHHHHHH-----
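Jpred reports a reliability digit for every residue. One way to use it is to score only the residues at or above a chosen reliability, which shows whether confident residues are predicted more accurately than the rest. The sketch below works under our own assumptions: digits run from 0 to 9, and higher means more confident. `q3_by_reliability` and `min_rel` are hypothetical names, and the toy strings are illustrative, not Jpred output.

```python
def q3_by_reliability(prediction: str, truth: str, reliability: str, min_rel: int) -> tuple[float, int]:
    """Q3 (%) over residues whose reliability digit is >= min_rel, plus how many were kept."""
    if not len(prediction) == len(truth) == len(reliability):
        raise ValueError("prediction, truth and reliability must have the same length")
    kept = [(p, t) for p, t, r in zip(prediction, truth, reliability) if int(r) >= min_rel]
    if not kept:
        return float("nan"), 0  # nothing passes the threshold
    correct = sum(p == t for p, t in kept)
    return 100.0 * correct / len(kept), len(kept)


# Toy strings (hypothetical, not the Jpred output above)
pred, obs, rel = "HHHE-", "HHE--", "99105"
print(q3_by_reliability(pred, obs, rel, min_rel=0))  # (60.0, 5)
print(q3_by_reliability(pred, obs, rel, min_rel=5))  # (100.0, 3)
```

The same call can be applied to the Jpred prediction, truth and reliability strings above once they are copied in. Raising `min_rel` keeps fewer residues, so any gain in accuracy comes at the cost of coverage.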
