Protein Backbone Angle Prediction with Machine Learning Approaches by R Kuang, C Leslie, & A Yang, Bioinformatics, 1 July 2004, vol. 20, no. 10, with additional background information from Brad Morantz. Presented by Brad Morantz, 15 November 2004
Index ● Introduction ● Brief background of authors ● Why is this paper important? ● Introduction and background ● Quick overview of artificial neural networks ● Quick overview of support vector machines ● Hypothesis ● Methodology ● Results ● Conclusions ● Additional references
Brief Background of Authors ● Kuang is in Computer Science ● Leslie is from Pharmacology ● Yang is in Pharmacology, the Columbia Genome Center, and Computational Biology and Bioinformatics ● Research funded by NIH and PhRMA
Importance of this Paper ● The protein backbone torsion angle provides more information than the conventional three-state (alpha, beta, & coil) predictions ● More information will contribute to better modeling of the local structure of protein sequence segments ● Structure and function are highly correlated ● Hence, this will allow better prediction of protein function
Introduction & Background ● Protein backbone torsion angles are highly correlated with protein secondary structures ● Loop residues in the protein chain structurally determine the arrangement of regular secondary structure elements, which leads to a specific protein folding topology ● Loops are also involved in enzymatic activities and protein-protein interactions, such as antibody-antigen binding
The Key Concept ● The analysis of the protein sequence-structure-function relationship is significantly facilitated by local structure information from predictive algorithms
More Background ● Conformational variability is high, causing problems in molecular modeling ● Three-state (alpha, beta, & coil) structure modeling does not distinguish loop structure ● Backbone torsion angle modeling helps in modeling loop regions ● Little attention has been paid to this area
Literature Review ● de Brevern et al. (2000): study of predictability ● Bystroff et al. (2000): first backbone torsion angle work, using an HMM ● Karchin (2003): fold recognition, not prediction ● Yang & Wang (2003): database prediction using RMS residual
Overview of ANNs (Artificial Neural Networks) ● In supervised mode – A data-driven general function approximator – A teacher shows it examples and what each one is; the ANN is trained on this information – Now show it something new and it will tell you what it is – Functions like regression, but not constrained to linear functions – Black-box performance – no understanding available from the internal parameters
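To make the supervised mode concrete, here is a minimal sketch using scikit-learn (my choice of library, not anything from the paper); the training data is invented purely for illustration:

```python
# A minimal sketch of supervised ANN training, assuming scikit-learn is
# available; the data here is made up purely for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

# The "teacher" provides examples (inputs) with known answers (targets).
X_train = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 4.0], [5.0, 7.0]])
y_train = np.array([5.0, 34.0, 32.0, 74.0])  # some nonlinear target

# Train the network: a data-driven general function approximator.
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# Now show it something new and it tells you what it (probably) is.
# The learned weights in net.coefs_ exist but are not interpretable:
# black-box performance.
print(net.predict(np.array([[2.0, 3.0]])))
```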
Non-Linear Regression ● Training: (1, 2) → 421; (3, 5) → 238; (4, 4) → 668; (5, 7) → 799; (7, 7) → ?; (?, 10) → ? ● Testing: (1, 2) → 530; (4, 4) → 448; (5, 6) → 477; (8, 8) → ?; (?, 9) → ? ● Note: not tested on the same data as training ● Concept: sum of squares of the inputs, but the total never quite reaches 100
Pattern Recognition ● [Slide showed example images: an apple and a flower]
Support Vector Machine ● A linear separator ● Draws a line to divide the groups ● Attempts to maximize the width of the line – Creates fences to maximize separation ● Trade-off: maximizing the width allows more violations ● Concept extends to N-dimensional space ● PSVM (polynomial-kernel SVM) – allows a curved line/plane to separate
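A minimal sketch of these ideas, again with scikit-learn and invented toy data; the C parameter controls the width-versus-violations trade-off described above, and the polynomial kernel gives the curved separator:

```python
# A minimal SVM sketch (scikit-learn); the toy data is invented.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

# Linear separator: small C widens the margin but tolerates more
# violations; large C narrows the margin to reduce violations.
wide = SVC(kernel="linear", C=0.1).fit(X, y)
narrow = SVC(kernel="linear", C=100.0).fit(X, y)

# Polynomial kernel: allows a curved surface to separate the groups.
poly = SVC(kernel="poly", degree=2).fit(X, y)

print(wide.predict([[2, 2]]), poly.predict([[2, 2]]))
```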
Example SVM ● [Scatter plot: white blood count vs. systolic blood pressure, with a separator dividing MI cases from angina cases] ● Example from Cohen & Hudson
Goal ● Predict backbone conformational state of each residue in protein chains ● Based on 4 (A, B, G, E) or 3 (A, B, G/E) conformational states
Hypothesis ● The two methods (SVM & ANN) are more accurate at predicting backbone torsion angle than previously published methods
Methodology ● Protein backbone torsion angles mapped onto the phi-psi (Ramachandran) plot ● Divide the map into the 4 conformational states (A, B, G, & E) ● Use a PSSM (position-specific score matrix) ● Use nine-residue sequence segments from non-redundant protein structures
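For illustration, a sketch of the kind of mapping involved; the region boundaries below are rough assumptions of mine about the phi-psi map, not the paper's exact partition:

```python
# A sketch of mapping backbone torsion angles onto conformational states.
# The region boundaries are illustrative assumptions; the paper defines
# its own exact division of the phi-psi map into A, B, G, and E.
def conformational_state(phi: float, psi: float) -> str:
    if phi < 0 and -120 <= psi <= 45:
        return "A"   # right-handed alpha region
    if phi < 0:
        return "B"   # beta / extended region
    if phi >= 0 and -90 <= psi <= 90:
        return "G"   # left-handed (glycine-favoured) region
    return "E"       # remaining, sparsely populated region

print(conformational_state(-60.0, -45.0))  # -> "A"
```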
The ANN Predictor ● 216 input nodes – 9 groups of 24 (1 group for each residue) ● Categorical inputs for the 20 amino acids ● A flag for residue positions outside the C or N terminus of the protein chain ● 3 for the backbone torsion angle prediction from the LSBSP1 database ● 50 nodes in the hidden layer ● 3 output nodes – A, B, & G/E ● E has only 1.7% of training cases, so it is grouped with G
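A sketch of a network with the described shape (216 inputs, 50 hidden nodes, 3 outputs), built with scikit-learn on placeholder data; the training details are my assumptions, not the authors' exact setup:

```python
# A sketch of the described network shape: 216 inputs, 50 hidden nodes,
# 3 outputs (A, B, G/E). Data and training details are placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

# 9 residues x 24 inputs each (20 amino acids + terminus flag +
# 3 torsion-angle predictions from the LSBSP1 database) = 216 inputs.
rng = np.random.default_rng(0)
X = rng.random((100, 216))           # placeholder feature vectors
y = rng.integers(0, 3, size=100)     # placeholder labels: A, B, G/E

net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
net.fit(X, y)
print(net.predict(X[:5]))
```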
Methodology Cont'd ● Output from the ANN converted to a PSSM (position-specific score matrix) using a long formula ● ANN trained on-line – I suspect back-propagation – Not related to the testing set – Training terminated when accuracy attenuated – Wrong way to do it! – Indicative of other mistakes ● 10-fold jack-knife cross-validation process – 10 runs, each with a different 10% testing set – Average of the runs used as the predictive accuracy
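The cross-validation scheme, sketched with placeholder data and an assumed model: 10 runs, each holding out a different 10% as the test set, with the accuracies averaged at the end:

```python
# A sketch of 10-fold cross-validation as described; data and model
# are placeholders, not the paper's.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 216))
y = rng.integers(0, 3, size=200)

scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True,
                                 random_state=0).split(X):
    model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300)
    model.fit(X[train_idx], y[train_idx])        # train on 90%
    scores.append(model.score(X[test_idx], y[test_idx]))  # test on 10%

print("average accuracy:", np.mean(scores))
```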
Accuracy Calculation ● Compare the true backbone conformational state with the predicted one ● Prediction = the output node with the largest value ● Then run the trained ANN on an entirely new set of proteins ● Cross-validation produced an average accuracy of 78.2%
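The prediction rule itself is simple; a sketch with invented output values:

```python
# A sketch of the accuracy rule: the predicted state is the output node
# with the largest value, compared against the true state. The output
# values here are invented for illustration.
import numpy as np

outputs = np.array([[0.7, 0.2, 0.1],    # network outputs per residue
                    [0.1, 0.8, 0.1],
                    [0.3, 0.3, 0.4]])
true_states = np.array([0, 1, 1])        # 0=A, 1=B, 2=G/E

predicted = outputs.argmax(axis=1)       # node with the largest value
accuracy = (predicted == true_states).mean()
print(predicted, accuracy)               # [0 1 2] 0.666...
```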
Comparing Input Contribution ● Amino acids alone gave 61.5% ● Torsion angle prediction from the database gave 67.8% ● Both together gave 78.2% ● Not surprising – Standard result: the more information supplied, the better the expected result
SVM Prediction ● Classification into 3 or 4 conformational states ● Inputs: – Amino acid sequences – Profiles from PSI-BLAST – Secondary structures from PSI-PRED ● Input vector of 189 dimensions – 9 sequence positions – 21 values to code the categorical data for each ● Choose as the prediction the class that gives the biggest margin for each example ● Use the publicly available SVM-light package
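A sketch of the 189-dimensional encoding (9 positions x 21 indicators each) and of choosing the class with the biggest margin; this uses scikit-learn rather than SVM-light, and the training windows and the 21st "outside chain" indicator are my assumptions:

```python
# A sketch of a 189-dimensional window encoding and of picking the class
# with the largest margin; uses scikit-learn, not the paper's SVM-light.
import numpy as np
from sklearn.svm import LinearSVC

AA = "ACDEFGHIKLMNPQRSTVWY"  # 20 amino acids; index 20 = outside chain

def encode_window(window: str) -> np.ndarray:
    """One-hot encode a 9-residue window: 9 x 21 = 189 dimensions."""
    vec = np.zeros(9 * 21)
    for i, ch in enumerate(window):
        idx = AA.index(ch) if ch in AA else 20
        vec[i * 21 + idx] = 1.0
    return vec

# Placeholder training windows and states (0=A, 1=B, 2=G/E).
X = np.array([encode_window(w)
              for w in ["AAAAAAAAA", "VVVVVVVVV", "GGGGGGGGG"]])
y = np.array([0, 1, 2])

svm = LinearSVC().fit(X, y)   # one-vs-rest multiclass by default
margins = svm.decision_function([encode_window("AAAAVAAAA")])
print(margins.argmax())       # prediction = class with the biggest margin
```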
Testing the SVM ● Dunbrack-culled PDB dataset – Benchmark testing to compare to the literature ● LSBSP1 dataset – 10-fold cross-validation ● Results about the same for both methods ● Dunbrack-in-SCOP dataset – 3-fold cross-validation to match the other test
SVM ● Binary mapping: worse ● Profile mapping: 6% better ● Secondary-feature mapping: 3% better ● Profile and secondary features are good in the alpha and beta regions, but less so in the loops ● No mention of repeatability (precision), ANOVA, or t-tests (which means these close margins prove absolutely nothing) ● Polynomial kernel tried, but only a 1% improvement (see the above note)
Results: ANN and LSBSP1 ● 81.5% of A backbone correctly predicted ● 76.6% of B backbone correctly predicted ● 46.5% of G/E backbone correctly predicted ● 77% correct overall ● Large enough sample size to be a valid test
SVM vs ANN ● SVM better than ANN – On all residues: 78.7% vs 78.2% – On loop residues: 65.1% vs 63.5% ● No proper statistics supplied in support of this conclusion – A 1% difference could easily be statistically insignificant with a large enough variance
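To back up this critique, a quick two-proportion z-test on the reported overall accuracies; the test-set size n is an assumption of mine, and the point is only that a gap this small needs a very large n before it reaches significance:

```python
# A sketch supporting the statistics critique: a two-proportion z-test
# on the reported 78.7% vs 78.2% accuracies. The sample size n = 10000
# is an assumption, not a figure from the paper.
from math import sqrt, erf

def two_proportion_p(p1: float, p2: float, n1: int, n2: int) -> float:
    p = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # standard error
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided p-value

print(two_proportion_p(0.787, 0.782, 10000, 10000))  # ~0.39: not significant
```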
Conclusions ● Optimally combining information to improve prediction is standard in decision science, but is “a difficult challenge in knowledge-based protein structure prediction procedures” ● Nearing the limit of prediction accuracy for protein structures ● Notice that they did not address the original hypothesis, nor did they do statistical testing to compare against other published work
Additional References ● IEEE Computational Intelligence Society ● Author's web page ● Brad's web page ● Technical Library ● AAAI (American Association for Artificial Intelligence)