Protein Secondary Structure Prediction with Inclusion of Hydrophobicity Information. Tzu-Cheng Chuang, Okan K. Ersoy and Saul B. Gelfand, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.


1. Introduction
Proteins are usually described as having four levels of structure: primary, secondary, tertiary and quaternary. In this research, we predict the secondary structure from the amino acid sequence.

2. Support Vector Machine
The SVM was introduced by Vapnik in the 1990s. The algorithm finds the separating hyperplane with the largest margin width and the smallest training error. In the non-separable case, the hyperplane is determined by solving the soft-margin problem

    minimize   (1/2)‖w‖² + C Σᵢ ξᵢ
    subject to yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0  for all i,

where C trades off margin width against training errors.

3. Dataset
The protein sequence and protein structure data for membrane photosystem proteins were downloaded from the database of secondary structure assignments (DSSP) and the Protein Data Bank (PDB). Membrane proteins were identified using the information provided by the Stephen White laboratory at UC Irvine. The PDB IDs of these membrane photosystem proteins are 2PPS, 1JB0, 1FE1, 1S5L, 2AXT, 1IZL, 1VF5 and 1Q90.

4. Hydrophobicity
Hydrophobicity and hydrophobic moment have previously been used to classify membrane and surface protein sequences. The Eisenberg hydrophobicity scale is normalized to zero mean and unit standard deviation. The hydrophobic moment is calculated as

    μ_H = ( [ Σₙ₌₁..ₙ Hₙ sin(δn) ]² + [ Σₙ₌₁..ₙ Hₙ cos(δn) ]² )^(1/2),

where μ_H is the hydrophobic moment, Hₙ is the hydrophobicity of amino acid number n, N is the window length used (7 here), and δ is the angle of the turn between two successive amino acids. For the classical helices discussed here, δ was set to the standard helical turn angle.

5. Method
Amino acids differ in hydrophobicity and hydrophobic moment. We first segment the input data into two groups based on hydrophobicity or hydrophobic moment; the data in each group are then classified by a separate SVM.

Acknowledgement
This research was supported by NSF Grant MCB and partly by a second NSF grant.
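The hydrophobic moment computation described above can be sketched in Python. The scale entries below are illustrative excerpts rather than the full published table, and δ = 100° is the turn angle commonly used for α-helices:

```python
import math

# Eisenberg-normalized hydrophobicity values for a few residues
# (illustrative excerpt; use the full published scale in practice).
EISENBERG = {"I": 0.73, "F": 0.61, "L": 0.53, "A": 0.25, "G": 0.16,
             "S": -0.26, "D": -0.72, "K": -1.10, "R": -1.76}

def hydrophobic_moment(seq, delta_deg=100.0):
    """Magnitude of the hydrophobic moment of a window of residues.

    delta_deg is the turn angle between consecutive residues;
    100 degrees is the value commonly used for alpha-helices.
    """
    delta = math.radians(delta_deg)
    h = [EISENBERG[aa] for aa in seq]
    sin_sum = sum(hn * math.sin(delta * n) for n, hn in enumerate(h, start=1))
    cos_sum = sum(hn * math.cos(delta * n) for n, hn in enumerate(h, start=1))
    return math.hypot(sin_sum, cos_sum)

# 7-residue window, matching the window length used in the poster
mu_h = hydrophobic_moment("IFLAGSD")
```

A window whose hydrophobic residues cluster on one face of the helix yields a large μ_H; a uniformly hydrophobic or hydrophilic window yields a small one.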
A threshold is picked by maximizing the difference in structural composition between the two groups. Input patterns with hydrophobicity greater than the threshold are sent to SVM1; those with hydrophobicity smaller than the threshold are sent to SVM2. By segmenting the original dataset into two groups in this way, each SVM is trained on a similar composition of structural types, and the overall classification accuracy is expected to be higher.

6. Experiment Results
In each experiment, we randomly picked some proteins for training and the rest for testing, so that we could check whether this method improves accuracy. The hydrophobicity threshold was set to a fixed value, and the hydrophobic moment threshold was set to 1.7.

Table 1. Testing classification accuracy for H / not H with one SVM only, and with two SVMs segmented by hydrophobicity.

Experiment       1        2        3        4        5
Only 1 SVM       73.49%   81.19%   58.08%   71.69%   62.55%
SVM1             86.78%   88.63%   74.89%   82.61%   73.59%
SVM2             76.57%   83.52%   67.41%   81.22%   65.39%
2 SVMs overall   80.07%   85.29%   70.06%   81.72%   68.40%

Table 2. Testing classification accuracy for H / not H with one SVM only, and with two SVMs segmented by hydrophobic moment.

Experiment       1        2        3        4        5
Only 1 SVM       73.49%   81.19%   71.69%   62.55%   72.81%
SVM1             72.99%   82.19%   75.42%   60.57%   72.74%
SVM2             79.46%   83.79%   76.87%   60.64%   83.04%
2 SVMs overall   75.75%   82.87%   76.05%   60.60%   77.20%

Table 3. Comparison of Q_total with one SVM only, and with two SVMs.

Experiment       1        2        3        4        5
Only 1 SVM       76.02%   54.09%   60.35%   54.09%   69.05%
2 SVMs overall   80.66%   63.04%   63.74%   63.04%   76.95%

7. Conclusions
This method is most useful when the hydrophobicity distributions of the three classes are well separated. In the hydrophobic moment experiments (Table 2), one experiment actually gives a worse result than a single SVM; the separability of the three structural classes is therefore critical to improving classification accuracy. In these experiments, nearly half of the training vectors became support vectors, which indicates that the patterns are genuinely difficult to differentiate.
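The threshold-and-route scheme described above can be sketched with scikit-learn. The feature matrix, per-sample hydrophobicity values and threshold passed in are illustrative placeholders, not the poster's actual data or settings:

```python
import numpy as np
from sklearn.svm import SVC

def fit_two_svms(X, y, hydro, threshold):
    """Train one SVM per hydrophobicity group.

    X: feature matrix; y: class labels; hydro: per-sample hydrophobicity
    (or hydrophobic moment); threshold: the segmentation cut-off.
    """
    high = hydro > threshold
    svm1 = SVC(kernel="rbf").fit(X[high], y[high])    # hydrophobicity > threshold
    svm2 = SVC(kernel="rbf").fit(X[~high], y[~high])  # hydrophobicity <= threshold
    return svm1, svm2

def predict_two_svms(svm1, svm2, X, hydro, threshold):
    """Route each test pattern to the SVM that matches its group."""
    high = hydro > threshold
    pred = np.empty(len(X), dtype=int)
    if high.any():
        pred[high] = svm1.predict(X[high])
    if (~high).any():
        pred[~high] = svm2.predict(X[~high])
    return pred
```

The same routing rule must be applied at both training and test time, so that each SVM only ever sees patterns from its own hydrophobicity group.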
Including hydrophobicity and/or hydrophobic moment information improves classification accuracy on these membrane proteins. In the future, we plan to search for other natural discriminating features of the alpha-helix, beta-sheet and coil classes to further increase classification accuracy.

References
B. E. Boser, I. M. Guyon and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in D. Haussler (ed.), Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory (COLT), Pittsburgh, PA, ACM Press.
D. Eisenberg, E. Schwarz, M. Komaromy and R. Wall, "Analysis of membrane and surface protein sequences with the hydrophobic moment plot," Journal of Molecular Biology, vol. 179, no. 1, 15 October 1984.
J. Y. Yang, M. Q. Yang and O. Ersoy, "Data mining and knowledge discovery from membrane proteins," Proceedings of the IEEE Intelligence: Methods and Applications Conference, Istanbul, June.

Input Coding Method
With a window length of 7, each window includes the 3 positions before and the 3 positions after the central amino acid. Each amino acid is represented by a 1x20 one-of-m vector (a single 1 with the remaining 19 entries 0), so the pattern within a 7-residue window is a 1x140 vector.
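The one-of-m window encoding described above can be sketched as follows; the alphabetical residue ordering in AMINO_ACIDS is an assumption, since the poster does not specify one:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, order assumed
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_window(window):
    """Encode a residue window as a concatenated one-of-m vector.

    Each residue contributes a 1x20 one-hot slice, so a 7-residue
    window (center residue plus 3 on each side) yields a 1x140 vector.
    """
    vec = np.zeros(len(window) * 20)
    for pos, aa in enumerate(window):
        vec[pos * 20 + AA_INDEX[aa]] = 1.0
    return vec

x = encode_window("ACDEFGH")  # 7 residues -> 140-dimensional pattern
```

Each window produces exactly 7 nonzero entries, one per residue, which is the sparse input representation the SVMs are trained on.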