Protein Folding Recognition with Committee Machine
Mika Takata



Outline
- Background
- System outline
- Experiment
- Experimental results
- References

Background
- Computation combined with biology, chemistry, medicine, and related fields is significantly important.
- Structural Classification of Proteins (SCOP) database
- Fold-level classification: remote homology
- Better fold recognition leads to better tertiary structure prediction.
[Figure: SCOP hierarchy. Class level: all alpha, all beta, a/b, a+b, ... Fold level: globin-like, cytochrome c, cupredoxins, (TIM)-barrel, β-grasp, ...]

1. Chemical approach parameter (i)
   i. Six types of chemical features
   ii. String-window N-grams
   iii. Protein molecular weight
   iv. Protein sequence length

1. Chemical approach parameter (ii): global parameters
- Symbol C: frequencies of the 20 amino acid symbols in a protein sequence
- Symbols S, H, V, P, Z: each described by 21 dimensions (3 for composition, 3 for transition, 3×5 for distribution)
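
The global parameters above can be illustrated with a minimal sketch. It assumes a three-group split of the amino acids (the hydrophobicity-style grouping below is an assumption chosen for illustration, not taken from the slides), and the function names `composition_20` and `ctd_features` are hypothetical. The distribution part records where the first, 25%, 50%, 75%, and last occurrence of each group falls along the sequence, which is one common reading of a "3×5" distribution descriptor.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group split (illustration only, not from the slides).
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}


def composition_20(seq):
    """Frequencies of the 20 amino acid symbols (the 'C' parameter)."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def ctd_features(seq, groups=GROUPS):
    """Return 3 composition + 3 transition + 15 distribution values."""
    lookup = {aa: g for g, members in groups.items() for aa in members}
    encoded = [lookup[aa] for aa in seq if aa in lookup]
    if not encoded:
        raise ValueError("no recognizable residues")
    n = len(encoded)
    labels = sorted(groups)

    # Composition: fraction of residues falling in each group.
    comp = [encoded.count(g) / n for g in labels]

    # Transition: fraction of adjacent pairs that switch between two groups.
    pairs = list(zip(encoded, encoded[1:]))
    denom = max(len(pairs), 1)
    trans = []
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            switches = sum(1 for x, y in pairs if {x, y} == {a, b})
            trans.append(switches / denom)

    # Distribution: relative position of the first, 25%, 50%, 75%, and
    # last occurrence of each group along the sequence.
    dist = []
    for g in labels:
        positions = [i + 1 for i, x in enumerate(encoded) if x == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                dist.append(0.0)
                continue
            idx = max(0, math.ceil(frac * len(positions)) - 1)
            dist.append(positions[idx] / n)

    return comp + trans + dist
```

Calling `ctd_features` once per property (S, H, V, P, Z), each with its own grouping, would give five 21-dimensional vectors alongside the 20-dimensional composition vector.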

1. Chemical approach parameter (iii)
- Protein molecular weight
  - Sum of the molecular weights of the amino acids in the sequence
  - The molecular weight is used as a feature
- Protein sequence length
  - The sequence length is used as a feature
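
As a rough sketch of these two scalar features: the weight table below holds approximate average molecular weights of the free amino acids in g/mol (rounded reference values, included here as an assumption), and the function names are hypothetical. The sum follows the slide's description literally and does not subtract the water lost at each peptide bond.

```python
# Approximate average molecular weights of free amino acids (g/mol).
# Rounded reference values, used here for illustration only.
AA_WEIGHT = {
    "G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
    "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
    "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
    "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23,
}


def molecular_weight_feature(seq):
    """Sum of per-residue weights; unknown symbols raise KeyError."""
    return sum(AA_WEIGHT[aa] for aa in seq)


def length_feature(seq):
    """Number of residues in the sequence."""
    return len(seq)
```

Both values are on very different scales from the frequency features, so some normalization before feeding them to an SVM would likely be needed; the slides do not say which one was used.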

2. Feature parameter based on sliding-window N-grams
- Proteomic fragment similarity
- (*) string length = 2
- Example sequence: … NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA …
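
The sliding-window idea can be sketched as follows, assuming the similarity is computed as a spectrum-kernel-style inner product of N-gram counts (an assumption; the slide only names "proteomic fragment similarity"). The function names are hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n=2):
    """Count every length-n substring seen by a window sliding one residue at a time."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def spectrum_kernel(seq_a, seq_b, n=2):
    """Inner product of the two N-gram count vectors."""
    counts_a = ngram_counts(seq_a, n)
    counts_b = ngram_counts(seq_b, n)
    return sum(c * counts_b[gram] for gram, c in counts_a.items())
```

With string length 2 there are at most 20 × 20 = 400 distinct N-grams, so the explicit count vector stays small; with length 3 it grows to 8000 dimensions.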

3. Feature parameter based on HMM
[Fig. 1: Feature parameter flow based on HMM]

[Figure: system outline. Components shown: training data and test data; Model I, Model II, Model III; features C, S, H, V, P, Z, Seq-Length, Mol-Weight; Spectrum Kernel; HMM; Committee SVM_1 ... Committee SVM_27; decision; Step 1 and Step 2.]

Evaluation measure
- "Accuracy Q" shows how correctly the data in class i are recognized.
- The number of data items differs from class to class.
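
The slide's formula did not survive extraction. A minimal sketch, assuming the usual per-class definition Q_i = c_i / n_i (correctly recognized items of class i divided by the number of items in class i), could look like this. The function names are hypothetical.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Assumed definition: Q_i = (correct in class i) / (items in class i)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(y_true, y_pred):
    """Fraction of all items recognized correctly, so larger classes weigh more."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)
```

Because class sizes differ, the overall figure and the unweighted mean of the per-class values can diverge, which is why reporting Q per class is informative.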

Experiment
- Parameters
  i. Chemical approach parameter
  ii. Feature parameter based on sliding-window kernel (string length = 2 and 3)
  iii. Feature parameter based on HMM
- Classification methods
  i. Independent SVM
  ii. Committee SVM array
- Multi-class recognition approaches
  i. One-vs-others
  ii. All-vs-all
- Data set
  - Training data: 341, test data: 353 (total: 694)
- Cross-validation: 10 times
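
The two multi-class approaches differ only in how binary decisions are combined. The sketch below shows the decision rules alone, assuming the binary classifiers already exist: `scores` and `pairwise_winner` are hypothetical stand-ins for trained SVM outputs, and simple vote counting is an assumption rather than the rule used in the presentation.

```python
from collections import Counter
from itertools import combinations


def one_vs_others(scores):
    """scores maps each class to the decision value of its 'class vs rest' classifier.

    The class whose classifier is most confident is selected.
    """
    return max(scores, key=scores.get)


def all_vs_all(classes, pairwise_winner):
    """pairwise_winner(a, b) returns whichever of a or b the pairwise classifier prefers.

    Every pair of classes votes once; the class with the most votes is selected.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]
```

For K classes, one-vs-others needs K binary classifiers while all-vs-all needs K(K-1)/2, so with 27 classes that is 27 versus 351.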

Result (1): Independent SVM, Model I

Result (2): CM, Model I

Result (3): CM, Model II

Result (3): Model I & II

Result (4): Model I & III

Result (5): Model I & II & III

Conclusion
- Using all models in the committee machine improved recognition.
- The spectrum kernel worked when used with a string length of 2.
- Advantages
  - Makes use of sporadic data (e.g., chemical-based and HMM features)
  - Reduces computational cost

References (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science, No. 5507, pp.
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S.-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700–
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S.-H.: Recognition of a Protein Fold in the Context of the SCOP Classification. Proteins: Structure, Function, and Genetics 35 (1999) 401–407

References (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinfo. 17 (2001) 349–
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001)
3. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–
4. Leslie, C., Eskin, E., Noble, W.S.: The Spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566–
5. Tabrez, M., Shamim, A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinfo. 23 (2007) 3320–
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. J. of Machine Learning Research 2 (2002) 419–444.