Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies,

Slides:



Advertisements
Similar presentations
Structural Classification and Prediction of Reentrant Regions in Alpha-Helical Transmembrane Proteins: Application to Complete Genomes Håkan Viklunda,
Advertisements

Mining customer ratings for product recommendation using the support vector machine and the latent class model William K. Cheung, James T. Kwok, Martin.
(SubLoc) Support vector machine approach for protein subcelluar localization prediction (SubLoc) Kim Hye Jin Intelligent Multimedia Lab
Secondary structure prediction from amino acid sequence.
Functional Site Prediction Selects Correct Protein Models Vijayalakshmi Chelliah Division of Mathematical Biology National Institute.
PROTEOMICS 3D Structure Prediction. Contents Protein 3D structure. –Basics –PDB –Prediction approaches Protein classification.
CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.
Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc.
Protein Backbone Angle Prediction with Machine Learning Approaches by R Kang, C Leslie, & A Yang in Bioinformatics, 1 July 2004, vol 20 nbr 10 pp
Three-Stage Prediction of Protein Beta-Sheets Using Neural Networks, Alignments, and Graph Algorithms Jianlin Cheng and Pierre Baldi Institute for Genomics.
Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.
1 Profile Hidden Markov Models For Protein Structure Prediction Colin Cherry
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Structural bioinformatics
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Protein Secondary Structures
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
Learning to Align Polyphonic Music. Slide 1 Learning to Align Polyphonic Music Shai Shalev-Shwartz Hebrew University, Jerusalem Joint work with Yoram.
Comparative ab initio prediction of gene structures using pair HMMs
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
Structure Prediction in 1D
Protein structure determination & prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray.
Carnegie Mellon School of Computer Science Copyright © 2003, Carnegie Mellon. All Rights Reserved. Biological Language Modeling Project TXTpred: A New.
CISC667, F05, Lec27, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) Review Session.
Handwritten Character Recognition using Hidden Markov Models Quantifying the marginal benefit of exploiting correlations between adjacent characters and.
Artificial Neural Networks for Secondary Structure Prediction CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (slides by J. Burg)
Template-based Prediction of Protein 8-state Secondary Structures June 12 th 2013 Ashraf Yaseen and Yaohang Li DEPARTMENT OF COMPUTER SCIENCE OLD DOMINION.
Protein Tertiary Structure Prediction
Masquerade Detection Mark Stamp 1Masquerade Detection.
Rising accuracy of protein secondary structure prediction Burkhard Rost
Proteins Secondary Structure Predictions Structural Bioinformatics.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Protein Structure Prediction. Historical Perspective Protein Folding: From the Levinthal Paradox to Structure Prediction, Barry Honig, 1999 A personal.
Protein Secondary Structure Prediction with inclusion of Hydrophobicity information Tzu-Cheng Chuang, Okan K. Ersoy and Saul B. Gelfand School of Electrical.
Intelligent Systems for Bioinformatics Michael J. Watts
Protein Secondary Structure Prediction. Input: protein sequence Output: for each residue its associated Secondary structure (SS): alpha-helix, beta-strand,
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Secondary structure prediction
2 o structure, TM regions, and solvent accessibility Topic 13 Chapter 29, Du and Bourne “Structural Bioinformatics”
Web Servers for Predicting Protein Secondary Structure (Regular and Irregular) Dr. G.P.S. Raghava, F.N.A. Sc. Bioinformatics Centre Institute of Microbial.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Protein secondary structure Prediction Why 2 nd Structure prediction? The problem Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC.
Artificial Intelligence Research Laboratory Bioinformatics and Computational Biology Program Computational Intelligence, Learning, and Discovery Program.
Protein Secondary Structure Prediction G P S Raghava.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
1 Protein Structure Prediction (Lecture for CS397-CXZ Algorithms in Bioinformatics) April 23, 2004 ChengXiang Zhai Department of Computer Science University.
Meng-Han Yang September 9, 2009 A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins.
New Strategies for Protein Folding Joseph F. Danzer, Derek A. Debe, Matt J. Carlson, William A. Goddard III Materials and Process Simulation Center California.
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
. Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more.
A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction Li Lihong (Anna Lee) Cumputer science 22th,Apr.
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.
HMMs and SVMs for Secondary Structure Prediction
Protein Structure and Bioinformatics. Chapter 2 What is protein structure? What are proteins made of? What forces determines protein structure? What is.
Final Report (30% final score) Bin Liu, PhD, Associate Professor.
Protein Prediction with Neural Networks! Chris Alvino CS152 Fall ’06 Prof. Keller.
Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Proteins Structure Predictions Structural Bioinformatics.
Graphical Models for Segmenting and Labeling Sequence Data Manoj Kumar Chinnakotla NLP-AI Seminar.
Dan Roth University of Illinois, Urbana-Champaign 7 Sequential Models Tutorial on Machine Learning in Natural.
Predicting Structural Features Chapter 12. Structural Features Phosphorylation sites Transmembrane helices Protein flexibility.
Improved Protein Secondary Structure Prediction. Secondary Structure Prediction Given a protein sequence a 1 a 2 …a N, secondary structure prediction.
Introduction to Bioinformatics II
Protein Structures.
Deep Learning in Bioinformatics
Protein structure prediction
Presentation transcript:

Predicting Secondary Structure of All-Helical Proteins Using Hidden Markov Support Vector Machines Blaise Gassend, Charles W. O'Donnell, William Thies, Andrew Lee, Marten van Dijk, and Srinivas Devadas Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Workshop on Pattern Recognition in Bioinformatics – August 20, 2006

Protein Structure Prediction Classical problem: given sequence, predict structure High-level approaches 1. Energy-minimization (ab-initio) techniques - Elegant, but often lack correct parameters 2. Homology-based techniques - Useful, but hard to predict new proteins Sequence Structure Our approach: Use energy minimization, but learn parameters from existing proteins

Our Framework (Training) Amino - acid Sequence Prediction Algorithm Correct structure Protein Data Bank Energy Parameters Predicted structure correctincorrect Done! Constraints energy(incorrect) > energy(correct) Learning Algorithm

Our Framework (Testing) Energy Parameters Predicted structure Prediction Algorithm Amino - acid Sequence

Initial Focus: Secondary Structure Classify each residue as alpha helix, beta strand, coil –In this paper, restrict to all-alpha proteins Applications: –Informing tertiary structure predictors –Identification of homologous proteins –Identification of active sites (coils)

Secondary Structure Predictors

Statistical Methods HMMs Sequence Only Sequence + Alignment Statistical Methods Sequence Only Sequence + Alignment

Secondary Structure Predictors Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment Statistical Methods Neural Networks Sequence Only Sequence + Alignment

Secondary Structure Predictors Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment Statistical Methods Neural Networks Sequence Only Sequence + Alignment SVMs

Secondary Structure Predictors Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment SVMs

Secondary Structure Predictors Exploits biochemical models Offers biological insight Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment SVMs parameters 680 MB of support vectors 471 parameters

Secondary Structure Predictors 302 params Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment Statistical Methods Neural Networks HMMs Sequence Only Sequence + Alignment SVMs parameters 471 parameters Exploits biochemical models Offers biological insight 680 MB of support vectors

Our Framework Applied to Helix Prediction Amino - acid Sequence Correct structure Protein Data Bank Energy Parameters Predicted structure correctincorrect Done! Constraints energy(incorrect) > energy(correct) Learning Algorithm Prediction Algorithm Hidden Markov Model Support Vector Machines Alpha Helices MNIFEMLRIDEGL HHHHHHHHH

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or Total

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or 22 Example: Sequence: MNIFELRIDEGL Structure: HHHHHH Energy = 302 Total

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or 22 Example: Sequence: MNIFELRIDEGL Structure: HHHHHH Energy = H F + H E + H L + H R + H I + H D (Helix) 302 Total

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or 22 Example: Sequence: MNIFELRIDEGL Structure: HHHHHH Energy = H F + H E + H L + H R + H I + H D (Helix) + N M,-3 + N N,-2 + N I,-1 + N F,0 + N E,1 + N L,2 + N R,3 (N-cap) 302 Total

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or 22 Example: Sequence: MNIFELRIDEGL Structure: HHHHHH Energy = H F + H E + H L + H R + H I + H D (Helix) + N M,-3 + N N,-2 + N I,-1 + N F,0 + N E,1 + N L,2 + N R,3 (N-cap) + C L,-3 + C R,-2 + C I,-1 + C D,0 + C E,1 + C G,2 + C L,3 (C-cap) 302 Total

Energy Parameters Description of Energy Parameters Number of Parameters Name Energy of residue R in a helix20HRHR Energy of residue R at offset i (-3…3) from N-cap140N R,i Energy of residue R at offset i (-3…3) from C-cap140C R,i Penalty for coils of length 1 or 22 Example: Sequence: MNIFELRIDEGL Structure: HHHHHH Energy = H F + H E + H L + H R + H I + H D (Helix) + N M,-3 + N N,-2 + N I,-1 + N F,0 + N E,1 + N L,2 + N R,3 (N-cap) + C L,-3 + C R,-2 + C I,-1 + C D,0 + C E,1 + C G,2 + C L,3 (C-cap) 302 Total

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Correct structure Energy ( ) = H A *A + H G *G = w ¢ [A G] Highest energy in direction of energy parameters w Feature Space where w represents the energy parameters [H A H G ]

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Feature Space w Legal structure Correct structure Energy ( ) = H A *A + H G *G = w ¢ [A G] Highest energy in direction of energy parameters w where w represents the energy parameters [H A H G ]

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure 1. Predict stucture

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure Separating Hyperplane 1. Predict stucture 2. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w Separating Hyperplane 1. Predict stucture 2. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure 1. Predict stucture 2. Refine parameters 3. Predict structure

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w Structure already predicted 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters 7. Predict structure

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w Structure already predicted 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters 7. Predict structure 8. Terminate

Learning the Parameters A: # of Alanines in Helices G: # of Glycines in Helices Legal structure Feature Space Correct structure Predicted structure w Structure already predicted 1. Predict stucture 2. Refine parameters 3. Predict structure 4. Refine parameters 5. Predict structure 6. Refine parameters 7. Predict structure 8. Terminate Details in paper: - How to converge faster - Early termination condition - [Tsochantaridis et al., ICML’02]

Experimental Methodology Data set: 300 non-homologous all-alpha proteins –From EVA’s sequence-unique subset of the PDB, July 2005 –Only consider alpha helices (“H” symbol in DSSP) Randomly split into 150 training, 150 test proteins

Results Comparison to others –Best HMM method to date that does not utilize alignment info Offers 3.5% (Q  ), 0.2% (SOV  ) over previous best –Lags behind neural networks; e.g., Porter overall SOV = 76.6% –However, we could likely gain 6-8% from alignment profiles Caveats –Moving beyond all-alpha proteins, we could suffer 3% –By considering 3/10 helices, we could decrease 2% MetricValueExplanation QQ 77.6%percent of residues correctly predicted SOV  73.4%segment overlap measure [Zemla’99] [Nguyen02] [Rost93] [Jones99]

Conclusions Represents first step toward learning biophysical parameters for energy minimization techniques –Iterative, demand-driven learning process using SVMs Promising results on alpha-helix prediction –77.6% among best Q  for methods without alignment info Future work: super-secondary structure –Will predict full “contact maps” rather than 3-state labels –For beta sheets, replace HMMs by multi-tape grammars

Extra Slides

Prediction Algorithm Parameters represent energetic benefit of a given feature in a protein structure –Features are fixed, chosen by designer –Example features: Number of prolines in an alpha helix Number of coils shorter than 2 residues Energy (structure) =  features 2 structure Energy (feature) Minimal-energy structure found with dynamic prog. –Idea: consider all structures, exploiting overlapping problems –Implemented as HMM using Viterbi algorithm Structure with Minimal Energy

Learning Algorithm Constraints have form: For all incorrectly predicted structures S i, in future selection of the parameters w: Energy w (S i ) > Energy w (correct structure) Constraints are linear in the energy parameters. If feasible, could solve with linear programming In general, solve with Support Vector Machines (SVMs) –Energy(S i ) ¸ Energy (correct structure)  i (  i ¸ 0) –Find parameters w minimizing ½ ||w|| 2 + C/n  i=1  i n Provides general solution using soft-margin criterion