Machine Learning Methods of Protein Secondary Structure Prediction Presented by Chao Wang
What is secondary structure? How to evaluate secondary structure prediction? How does secondary structure prediction affect the accuracy of tertiary structure prediction? Our perspective: "elite"
What is secondary structure?
Hydrogen bond: a non-covalent bond. A hydrogen bond is identified if E in the following equation is less than -0.5 kcal/mol
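The equation itself did not survive in the slide text. Assuming it is the standard DSSP electrostatic criterion, E = q1 q2 (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) f with q1 = 0.42e, q2 = 0.20e and f = 332 (distances in Å, E in kcal/mol), a minimal sketch of the test could look like this. The function names and the example coordinates are hypothetical and chosen only for illustration.

```python
import math

# Constants of the DSSP electrostatic model (assumed to be the equation meant on the slide).
Q1 = 0.42      # partial charge on the C=O group, in units of e
Q2 = 0.20      # partial charge on the N-H group, in units of e
F = 332.0      # dimensional factor so that E is in kcal/mol for distances in angstroms
CUTOFF = -0.5  # kcal/mol, threshold quoted on the slide


def hbond_energy(c, o, n, h):
    """Electrostatic energy between a C=O group (atoms c, o) and an N-H group (atoms n, h).

    Each argument is an (x, y, z) coordinate tuple in angstroms.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn) * F


def is_hbond(c, o, n, h):
    """True if the energy is below the -0.5 kcal/mol cutoff."""
    return hbond_energy(c, o, n, h) < CUTOFF


# Hypothetical, idealized linear geometry: C=O ... H-N along the x axis.
c, o = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
h, n = (3.13, 0.0, 0.0), (4.13, 0.0, 0.0)
print(round(hbond_energy(c, o, n, h), 2), is_hbond(c, o, n, h))
```

With the N-H group moved about 10 Å further away, the energy rises toward zero and the pair is no longer counted as a hydrogen bond.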
8-state annotation by DSSP
Prediction Early methods of secondary-structure prediction were restricted to predicting the three predominant states: helix, sheet, or random coil. These methods were based on the helix- or sheet-forming propensities of individual amino acids, sometimes coupled with rules for estimating the free energy of forming secondary structure elements. Such methods were typically ~60% accurate in predicting which of the three states (helix/sheet/coil) a residue adopts.
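The accuracy figures quoted here are per-residue three-state accuracies, usually called Q3. The sketch below shows how such a number could be computed from an 8-state DSSP string. The 8-to-3 mapping used (H, G, I to helix; E, B to sheet; everything else to coil) is one common convention and is an assumption here, since the slides do not state which reduction was used. The function names are hypothetical.

```python
# One common 8-state to 3-state reduction (an assumption; other conventions exist).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "C": "C", "-": "C",  # turns, bends, and coil
}


def reduce_to_3(ss8):
    """Map an 8-state DSSP string to a 3-state helix/sheet/coil string."""
    return "".join(DSSP_TO_3[state] for state in ss8)


def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


observed = reduce_to_3("HHHGTEEB-S")  # becomes "HHHHCEEECC"
predicted = "HHHCCEEEEC"              # hypothetical 3-state prediction
print(q3(predicted, observed))        # 8 of 10 residues agree
```

The same function works for 8-state accuracy (Q8) if both strings are left unreduced.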
A significant increase in accuracy (to nearly 80%) was made by exploiting multiple sequence alignment; knowing the full distribution of amino acids that occur at a position (and in its vicinity, typically ~7 residues on either side) throughout evolution provides a much better picture of the structural tendencies near that position. For illustration, a given protein might have a glycine at a given position, which by itself might suggest a random coil there. However, multiple sequence alignment might reveal that helix-favoring amino acids occur at that position (and nearby positions) in 95% of homologous proteins spanning nearly a billion years of evolution. Moreover, by examining the average hydrophobicity at that and nearby positions, the same alignment might also suggest a pattern of residue solvent accessibility consistent with an α-helix. Taken together, these factors would suggest that the glycine of the original protein adopts α-helical structure, rather than random coil. Several types of methods are used to combine all the available data to form a 3-state prediction, including neural networks, hidden Markov models and support vector machines. Modern prediction methods also provide a confidence score for their predictions at every position.
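To make the idea of "the distribution of amino acids at a position and in its vicinity" concrete, the following sketch builds a per-column frequency profile from a toy alignment and then concatenates the columns within 7 residues on either side of a position into one feature vector. It is an illustration under simple assumptions (raw frequencies, no sequence weighting or pseudocounts, zero padding at the termini), not the feature pipeline of any particular predictor. All names and the toy alignment are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment):
    """Per-column amino-acid frequencies; gaps and unknown letters are ignored."""
    profile = []
    for i in range(len(alignment[0])):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            if seq[i] in counts:
                counts[seq[i]] += 1
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def window_features(profile, position, half_width=7):
    """Concatenate profile columns from position - half_width to position + half_width.

    Positions that fall outside the sequence are padded with zero vectors.
    """
    zero = [0.0] * len(AMINO_ACIDS)
    features = []
    for i in range(position - half_width, position + half_width + 1):
        features.extend(profile[i] if 0 <= i < len(profile) else zero)
    return features


# Toy alignment: the query (first row) has glycine at column 0,
# but most homologs have alanine there.
alignment = ["GAL", "AAL", "A-L", "AAV"]
profile = column_profile(alignment)
features = window_features(profile, position=0)
print(len(features))  # 15 window positions x 20 amino acids
```

A classifier such as a neural network would take one such vector per residue as input, so the glycine in the query is seen together with the alanine-rich column it belongs to.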
Outline
CNF model by Jinbo
Multi-step learning model by Yaoqi
Iterative deep learning model by Yaoqi
Our perspective: Elite.
–A new experiment to detect how elite affects secondary structure prediction.
Methods
–How to model the probability
–Feature selection
Results
–vs. other methods
–Improvement
Protein 8-class secondary structure prediction using conditional neural fields Zhiyong Wang, Feng Zhao, Jian Peng, and Jinbo Xu Proteomics. 2011
Model
Training & Prediction
Features
Training/testing set
Results The CNF model outperforms SSpro8 on each state
Effect of the regularization factor: accuracy is insensitive to it; the optimum is reached when the factor is set to 9.
Effect of Neff: for SS prediction, using evolutionary information from as many homologs as possible may not be the best strategy. Instead, when many sequence homologs are available, we should build the sequence profile from a subset of them.
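The slide does not define Neff. Assuming the entropy-based definition often used for sequence profiles (the exponential of the Shannon entropy averaged over alignment columns, which ranges from 1 to 20), a rough sketch of how it could be computed is given below. The function name and the toy alignments are hypothetical, and real pipelines typically add sequence weighting that is omitted here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def neff(alignment):
    """Exponential of the mean per-column Shannon entropy (assumed definition of Neff).

    Gaps and non-standard letters are ignored; columns with no residues are skipped.
    """
    entropies = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa in AMINO_ACIDS]
        if not residues:
            continue
        entropy = 0.0
        for aa in set(residues):
            p = residues.count(aa) / len(residues)
            entropy -= p * math.log(p)
        entropies.append(entropy)
    if not entropies:
        raise ValueError("alignment has no usable columns")
    return math.exp(sum(entropies) / len(entropies))


# Identical sequences carry no extra evolutionary information: Neff is 1.
print(neff(["ACDE", "ACDE", "ACDE"]))
# A more diverse alignment gives a larger Neff.
print(neff(["ACDE", "GCDE", "ACNE", "TCDQ"]))
```

Under this definition, selecting a subset of homologs changes the column entropies and therefore Neff, which is one way to quantify how much evolutionary information a given profile carries.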
J Comput Chem. 2012