2° structure, TM regions, and solvent accessibility. Topic 13. Chapter 29, Gu and Bourne, “Structural Bioinformatics”


The Truth (Information) is Out (In) There

But we’re still having a tough time finding it.

Protein Secondary Structure Prediction

Given a protein sequence (primary structure), predict its secondary structure.

GHWIATRGQLIREAYEDYRHFSSECPFIP
CEEEEECCCEEEEECCCHHHHHHCCCCCC

E: β-strand, H: α-helix, C: coil

Assumption: short stretches of residues have a propensity to adopt a certain conformation ⇒ the conformation of the central residue in a sequence fragment depends only on its flanking residues (sliding window).

Eight-state to three-state reduction:
H: (H: α-helix, G: 3_10 helix, I: π-helix)
E: (E: β-strand, B: bridge)
C: (T: β-turn, S: bend, C: coil)
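The sliding-window assumption and the three-class reduction can be sketched in a few lines of Python. This is only an illustration: the function names, the default window size of 5, the "-" padding character, and the choice to treat unknown state symbols as coil are all assumptions, not part of the slides.

```python
# Three-class reduction of the eight state codes listed on the slide.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix classes
    "E": "E", "B": "E",             # strand classes
    "T": "C", "S": "C", "C": "C",   # everything else is coil
}


def reduce_to_three_states(states):
    """Collapse an eight-state string to H/E/C (unknown symbols become C; an assumption)."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in states)


def sliding_windows(sequence, size=5, pad="-"):
    """Return one window per residue, centred on that residue.

    The sequence is padded at both termini so that the first and last
    residues also get a full-length window.
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so there is a central residue")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


if __name__ == "__main__":
    for window in sliding_windows("GHWIATRGQL", size=5):
        print(window, "-> central residue", window[2])
```

Each window is what a window-based predictor would see when assigning a state to its central residue.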

Why secondary structure prediction?

-- Because we can (kind of).
-- Because it could be a first step towards prediction of protein tertiary structure.

“Have solution, need problem.” Nearly every imaginable algorithm has been applied to secondary structure prediction.

Secondary Structure Prediction Methods

1. First generation: single amino acid propensities. Chou-Fasman method (1974), GOR I-IV. ~56-60% accuracy.
2. Second generation: segments of 3-51 adjacent residues. NNSSP, SSPAL. ~65% accuracy.

Third generation methods using evolutionary information, ~76% accuracy:
3. Neural network: PHD, Psi-Pred, J-Pred
4. Support vector machine (SVM)
5. Hidden Markov Models (HMM)

Secondary Structure Prediction Accuracy

1. Three-state per-residue prediction accuracy (Q3):

$$Q_3 = 100 \times \frac{\sum_{i=1}^{3} M_{ii}}{N_{obs}}$$

M_ii: number of residues observed in state i and predicted in state i
N_obs: total number of residues observed in the 3 states

2. Per-segment prediction accuracy (SOV, Segment OVerlap)

Per-state segment overlap:
S1: observed SS segment
S2: predicted SS segment
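The per-residue accuracy is the number of residues whose predicted state matches the observed state (the sum of the M_ii), divided by N_obs and expressed as a percentage. A minimal sketch, assuming both strings use the same three-letter alphabet; the function name `q3` is chosen here for illustration:

```python
def q3(observed, predicted):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Summing the matches per position is the same as summing the diagonal M_ii.
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Toy strings, not taken from the slides.
    print(q3("HHHEEECCC", "HHHEECCCC"))
```

Per-residue accuracy ignores whether errors break up segments, which is the motivation for a segment-level measure such as SOV.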

Single Residue Propensity Methods

Calculate the propensity for a given amino acid to adopt a certain secondary structure type:

$$P = \frac{p(\alpha, aa)}{p(\alpha)\, p(aa)}$$

aa: amino acid; α: secondary structure state

Example, from a data set with 30 proteins:
#Ala = 2,000, #residues = 20,000, #helix = 4,000, #Ala in helix = 580
p(α,aa) = 580/20,000, p(α) = 4,000/20,000, p(aa) = 2,000/20,000
P = 580 / (4,000/10) = 1.45
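The worked example can be checked numerically. The sketch below computes the propensity as the joint frequency divided by the product of the two marginal frequencies, which is what the numbers on the slide imply; the function and argument names are illustrative.

```python
def propensity(n_aa_in_state, n_state, n_aa, n_total):
    """Propensity of an amino acid for a state: p(state, aa) / (p(state) * p(aa))."""
    p_joint = n_aa_in_state / n_total   # e.g. fraction of residues that are Ala in helix
    p_state = n_state / n_total         # fraction of residues in helix
    p_aa = n_aa / n_total               # fraction of residues that are Ala
    return p_joint / (p_state * p_aa)


if __name__ == "__main__":
    # Counts from the slide: 580 Ala in helix, 4,000 helix residues,
    # 2,000 Ala, 20,000 residues in total. The result is approximately 1.45.
    print(round(propensity(580, 4000, 2000, 20000), 2))
```

A value above 1 means the residue occurs in that state more often than expected by chance, below 1 less often.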

Amino Acid Propensities to Secondary Structures Chou-Fasman method

Nearest Neighbor Methods

* The idea is simple: predict the SS of the central residue of a given segment from homologous segments (neighbors). For example, find in a database some number of the sequences closest to a subsequence defined by a window around the central residue, then use max(N_α, N_β, N_c) to assign the SS.

[Figure: window RSTEVRASRQLAKEKVN (window size marked) with its homologous sequences; states ECCHHCCECCHHCC C]

Key parameters:
1. How to define similarity?
2. What size window of sequence should be examined?
3. How many close sequences should be selected?
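A toy version of the nearest-neighbor idea makes the three key parameters concrete. In this sketch similarity is plain residue identity, the database is a hand-made list of (segment, central state) pairs, and k is the number of neighbors that vote. All three are simplifying assumptions for illustration; real methods use substitution matrices or profiles and large structure databases.

```python
from collections import Counter


def identity(a, b):
    """Similarity measure (an assumption): number of identical aligned positions."""
    return sum(x == y for x, y in zip(a, b))


def predict_central(window, database, k=3):
    """Assign the central residue the most common state among the k closest segments.

    database is a list of (segment, state_of_central_residue) pairs whose
    segments have the same length as the query window.
    """
    ranked = sorted(database, key=lambda item: identity(window, item[0]), reverse=True)
    votes = Counter(state for _, state in ranked[:k])
    # Equivalent to taking the largest of the per-state neighbor counts.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical five-residue segments with known central-residue states.
    toy_db = [("AEELL", "H"), ("AEELK", "H"), ("VTVTV", "E"),
              ("GPGSG", "C"), ("AEKLL", "H")]
    print(predict_central("AEELL", toy_db, k=3))
```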

The Devil is in the details…

Psi-Pred Method

D. Jones, J. Mol. Biol. 292, 195 (1999).

Method: neural network
Input data: PSSM generated by PSI-BLAST
Bigger and better sequence database: combining several databases and data filtering
Training and test set preparation:
-- SS prediction only makes sense for proteins with no homologous structure.
-- No sequence or structural homologues between training and test sets, checked by CATH and PSI-BLAST (mimicking a realistic situation).

Psi-Pred Method: Neural Network

Window size = 15
Two networks

First network (sequence-to-structure):
-- 315 = (20 + 1) × 15 inputs; the extra unit indicates where the window spans either the N or C terminus
-- Data are scaled to the [0-1] range by using 1/[1+exp(-x)]
-- 75 hidden units
-- 3 outputs (H, E, L)

Second network (structure-to-structure):
-- Structural correlation between adjacent sequences
-- 60 = (3 + 1) × 15 inputs
-- 60 hidden units
-- 3 outputs

Accuracy ~76%
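The 315-element input of the first network can be illustrated as follows: for each of the 15 window positions, 20 profile scores squashed by the logistic function plus one extra unit that flags positions lying beyond a terminus. This is a sketch of the encoding as the slide describes it, not PSIPRED's actual code; in particular, filling the 20 score slots with zeros for off-terminus positions is an assumption.

```python
import math

WINDOW = 15   # window size from the slide
N_AA = 20     # PSSM columns per position


def logistic(x):
    """Scale a raw profile score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def encode_window(pssm, center):
    """Build the (20 + 1) * 15 = 315 inputs for the residue at index `center`.

    pssm is a list of rows, one per residue, each holding 20 scores.
    """
    half = WINDOW // 2
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            features.extend(logistic(score) for score in pssm[pos])
            features.append(0.0)              # inside the chain
        else:
            features.extend([0.0] * N_AA)     # assumption: zero-fill off-terminus slots
            features.append(1.0)              # extra unit: window spans a terminus
    return features


if __name__ == "__main__":
    toy_pssm = [[0] * N_AA for _ in range(30)]   # hypothetical all-zero profile
    print(len(encode_window(toy_pssm, 0)))
```

The second network would be encoded the same way, with the three outputs of the first network replacing the 20 profile scores, giving (3 + 1) × 15 = 60 inputs.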

Sample Psi-Pred Output

Conf: Confidence (0=low, 9=high) (very important!)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
AA: Target sequence

# PSIPRED HFORMAT (PSIPRED V2.3 by David Jones)

Conf:
Pred: CCHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHCCCCCEEECCCCEEEEEEECCCCCC
  AA: MMWEQFKKEKLRGYLEAKNQRKVDFDIVELLDLINSFDDFVTLSSCSGRIAVVDLEKPGD

Conf:
Pred: CCCCEEEEEECCCCCHHHHHHHHHCCCCCEEEEECCCEEEEECCCHHHHHHHHHHHHHCC
  AA: KASSLFLGKWHEGVEVSEVAEAALRSRKVAWLIQYPPIIHVACRNIGAAKLLMNAANTAG

Conf:
Pred: CCCCCCEECCCEEEEEECCCEEEEEECCCCCEEECHHHHHHHHHHHHHHHHHHHHHHHHH
  AA: FRRSGVISLSNYVVEIASLERIELPVAEKGLMLVDDAYLSYVVRWANEKLLKGKEKLGRL

***Compare the prediction for residues 9 and 17***
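Because the confidence line matters as much as the prediction itself, it can help to read the three line types programmatically. The sketch below assumes only what the sample shows: lines beginning with "Conf:", "Pred:" or "AA:" in repeating blocks. The function names, the toy input, and the threshold of 5 for "low confidence" are hypothetical.

```python
def parse_horiz(text):
    """Concatenate the Conf, Pred and AA lines of a PSIPRED-style horizontal output."""
    fields = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in fields:
            fields[key].append(value.strip())
    return {key: "".join(parts) for key, parts in fields.items()}


def low_confidence_positions(parsed, threshold=5):
    """1-based residue positions whose confidence digit is below the threshold."""
    return [i + 1 for i, c in enumerate(parsed["Conf"])
            if c.isdigit() and int(c) < threshold]


if __name__ == "__main__":
    # Hypothetical miniature output; the digits are invented for illustration.
    toy = "# PSIPRED HFORMAT\n\nConf: 9752\nPred: CHHC\n  AA: MKVL\n"
    parsed = parse_horiz(toy)
    print(parsed["Pred"], low_confidence_positions(parsed))
```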

Sample Psi-Pred Output-II

Again, voting-rule methods tend to be best

ATKAVCVLKGDGPVQGTIHFEAKGDTVVVTGSITGLTEGDHGFHVHQFGDNTQGCTSAGP 2SOD
CCCCCCCCCCCCCCCCEEHCCHHECEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCC BPS
CCHEEEEECCCCCCCCEEEHHHCCCEEEEEEEEECECCCCCCEEEECCCCCCCCCCCCCC D_R
CCCEEEEEECCCCCEEEEEEEECCCEEEEEEEEEEEECCCCCEEEEECCCCCCCCCCCCC DSC
CCCEEEEECCCCCCCEEEEEECCCCEEEEEEEEECCCCCCCCEEEEEECCCCCCCCCCCC GGR
HHHCEEEECCCCCCCEEEEEECCCCEEEEEECEEEEEECCCCEEEEECCCCCCEEECCCC GOR
CCCCEEEECCCCCCCCCEEECCCCCCEEEEECEEECCCCCCCEEEECCCCCCCCEEECCC H_K
CCCCEEEEECCCCCCCCCEEECCCCCEEEECCCCCCCCCCCEEEEEEEECCCCCCCCCCC K_S
CCCCEEEECCCCCCCCEEEEECCCCEEEEEEEEEEECCCCCCEEEEECCCCCCCCCCCCC JOI
---EEEEE------EEEEEEEEE--EEEEEEEEE-----EEEEEEEE SOD

HFNPLSKKHGGPKDEERHVGDLGNVTADKNGVAIVDIVDPLISLSGEYSIIGRTMVVHEK 2SOD
CCCCCCCCCCCCCCCCCCCCCCECCCCCCHEECCCCCCCCCECCEECEEEEEEEEEEECC BPS
CCCCCCCCCCCCCCCHHCECCCCCECCCCCCEEEEEEECCEEEECCCEEEEEEEEEEECC D_R
CCCCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEECCCCCCCCCCEEEEEEEEEEECC DSC
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCCEEEEECCCCCCCCCCEEEECEEEEEECC GGR
CCCCCCCCCCCCCCHHEEECCCCCCCCCCCCEEEEEEECCEEECCCCEEEEEEEEEECCC GOR
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEECCCCCCCCCCCCCCHHHHHHEECCC H_K
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCCEEEEEEEEEEEEECCCEEECCEEEEEEE K_S
CCCCCCCCCCCCCCCCEEECCCCCCCCCCCCEEEEEECCCCECCCCCEEEEEEEEEEECC JOI
EEEEEE------EEEEEEE EEEEE-- 2SOD
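A per-residue majority vote over several predictions is the simplest form of such a voting rule. The sketch below is a generic illustration, not the specific rule used by any of the methods in the table; the function name and the choice to fall back to coil on ties are assumptions.

```python
from collections import Counter


def consensus(predictions, tie_break="C"):
    """Column-wise majority vote over equal-length prediction strings.

    When two states tie for the most votes, the residue is assigned
    `tie_break` (coil by default; an assumption).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        top_state, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        result.append(tie_break if tied else top_state)
    return "".join(result)


if __name__ == "__main__":
    # Toy predictions from three hypothetical methods.
    print(consensus(["HHEC", "HHEE", "CHEC"]))
```

Applied to the rows of the table, each input string would be one method's 60-residue prediction for the same block of sequence.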

Prediction Accuracy (EVA)

EVA: Automatic evaluation of prediction servers

How Far Can We Go?

* Currently ~76%
* Proteins with more than 100 homologues: 80%
* Assignment is ambiguous (5-15%). Recall DSSP vs STRIDE.
-- Non-unique protein structures (dynamic), H-bond cutoff, etc.
* Different secondary structures between homologues (~12%).
* Non-locality. Secondary structure is influenced by long-range interactions.
-- Some segments can have multiple structure types (chameleon sequences).

Solvent accessibility

* Conceptually similar problem to SS prediction: Buried vs. Exposed.
* Weighted Ensemble Solvent Accessibility predictor:

E E E E E E B B B B B B

Why bother?

* To provide structural context for putative mutations that one wants to characterize biochemically or biophysically.

Transmembrane Segment Prediction

* Again, conceptually similar problem to SS prediction: TM vs. Not.