LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877

1 LSM3241: Bioinformatics and Biocomputing, Lecture 3: Machine learning method for protein function prediction. Prof. Chen Yu Zong, Room 07-24, Level 7, SOC1, National University of Singapore

2 Protein Function and Functional Family: Proteins with similar functional characteristics can be grouped into a family.

5 Functional Classification of Proteins by SVM: A protein is classified as either belonging (+) or not belonging (-) to a functional family. By screening the protein against all families, its function can be identified (example: SVMProt). [Figure: the protein is submitted to the Family-1 SVM, the Family-2 SVM, the Family-3 SVM and so on; each SVM reports whether the protein belongs to its family.]
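The screening step can be sketched as one binary classifier per family, with the protein reported as a member of every family whose classifier says "+". This is a minimal sketch: the family names, the linear scoring rules and the threshold of 0 are hypothetical placeholders, not SVMProt's trained classifiers.

```python
from typing import Callable, Dict, List, Sequence

# A per-family classifier maps a feature vector to a real-valued score.
# Assumption: score > 0 means "belongs" (+), otherwise "does not belong" (-).
FamilyClassifier = Callable[[Sequence[float]], float]


def screen_families(
    features: Sequence[float],
    classifiers: Dict[str, FamilyClassifier],
) -> List[str]:
    """Run every family's binary classifier and collect the families that say '+'."""
    hits = []
    for family, classify in classifiers.items():
        if classify(features) > 0:
            hits.append(family)
    return hits


def linear_rule(w: Sequence[float], b: float) -> FamilyClassifier:
    """Hypothetical stand-in for a trained SVM: the linear score w . x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical families and weights, for illustration only.
classifiers = {
    "Family-1": linear_rule([1.0, 0.0, 0.0], -0.5),
    "Family-2": linear_rule([0.0, 1.0, 0.0], -0.5),
    "Family-3": linear_rule([0.0, 0.0, -1.0], 0.5),
}

print(screen_families([1.0, 0.0, 1.0], classifiers))  # ['Family-1']
```

Because each family has its own independent classifier, a protein may be assigned to several families, or to none.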

6 Functional Classification of Proteins by SVM: What is SVM? Support vector machines (SVMs) are a machine learning method that learns from examples (statistical learning) and classifies objects into one of two classes.
Advantages of SVM:
– Tolerates diversity among class members.
– Uses sequence-derived physico-chemical features as the basis for classification.
– Suitable for functional classification of novel proteins (distantly related proteins, and homologous proteins with different functions).

7 Machine Learning Method: Inductive learning, i.e. example-based learning. Each object is encoded by descriptors, and the method learns from positive examples and negative examples.

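8 Machine Learning Method: The descriptors of each example are converted into a feature vector:
A = (1, 1, 1), B = (0, 1, 1), C = (1, 1, 1), D = (0, 1, 1), E = (0, 0, 0), F = (1, 0, 1)
[Figure: descriptors converted into feature vectors for the positive and negative examples.]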

9 SVM Method: Feature vectors in input space: A = (1, 1, 1), B = (0, 1, 1), C = (1, 1, 1), D = (0, 1, 1), E = (0, 0, 0), F = (1, 0, 1). [Figure: feature vectors A, B, E and F plotted as points in a three-dimensional input space with axes X, Y and Z.]

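10 SVM Method: When protein family members and nonmembers cannot be separated by a border in the input space, they are projected into a higher-dimensional space, where a new border can separate them. [Figure: the original border and the new border after projection, separating protein family members from nonmembers.]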
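The projection idea can be checked on a toy case. In the hypothetical data below (an XOR pattern, not protein data), members and nonmembers sit on opposite diagonals of a 2-D input space, so no straight line separates them; adding the product x1*x2 as a third coordinate makes the flat plane x1*x2 = 0 a valid border. The map and the data are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical toy data: members (+1) on one diagonal, nonmembers (-1) on the
# other. No single straight line in 2-D separates the two groups.
data: List[Tuple[Point, int]] = [
    ((1.0, 1.0), +1),
    ((-1.0, -1.0), +1),
    ((1.0, -1.0), -1),
    ((-1.0, 1.0), -1),
]


def project(p: Point) -> Tuple[float, float, float]:
    """Map (x1, x2) into 3-D by adding the product x1*x2 as a new axis."""
    x1, x2 = p
    return (x1, x2, x1 * x2)


def side_of_border(z: Tuple[float, float, float]) -> int:
    """Linear border in 3-D: the plane w . z + b = 0 with w = (0, 0, 1), b = 0."""
    w, b = (0.0, 0.0, 1.0), 0.0
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return +1 if score > 0 else -1


for point, label in data:
    z = project(point)
    print(point, "->", z, "predicted", side_of_border(z), "true", label)
```

An SVM chooses the border automatically rather than by hand as done here; the point is only that a border that is impossible in the input space can become a flat plane after projection.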

11 SVM Method: [Figure (slides 11 and 12): a new border separates protein family members from nonmembers; the members and nonmembers lying closest to the border are the support vectors.]
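The slides show the border and support vectors only as figures. For reference, the standard textbook (hard-margin) SVM formulation makes both precise; it is given here as general background, not taken from the slides. Let $y_i = +1$ for family members and $y_i = -1$ for nonmembers, with feature vectors $\mathbf{x}_i$:

```latex
\min_{\mathbf{w},\,b}\; \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n

\text{Border: } \mathbf{w}\cdot\mathbf{x} + b = 0,
\qquad \text{margin width: } \frac{2}{\lVert \mathbf{w} \rVert}

\text{Support vectors: the } \mathbf{x}_i \text{ with } y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) = 1
```

Maximizing the margin places the border as far as possible from both groups, and only the support vectors, the examples lying on the margin, determine where it ends up.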

13 SVM Method: The border line between family members and nonmembers can be nonlinear.

14 SVM Method: Nonlinear transformation through the use of a kernel function.

15 SVM Method: Nonlinear transformation.
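A kernel function computes the dot product of two vectors after the nonlinear transformation without ever forming the transformed vectors. The sketch below checks this for the degree-2 polynomial kernel, whose explicit map to 3-D is known, and also defines the Gaussian (RBF) kernel. Which kernel SVMProt uses is not stated in the slides; both kernels here are general examples.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x: Tuple[float, float]) -> Tuple[float, float, float]:
    """Explicit 2-D -> 3-D map whose dot product equals the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """K(x, z) = (x . z)^2, computed in the original space."""
    return dot(x, z) ** 2


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, z))   # kernel value in the original space
print(dot(phi(x), phi(z)))  # same value (up to rounding) via the explicit map
print(rbf_kernel(x, x))     # 1.0: a point is maximally similar to itself
```

The RBF kernel corresponds to an infinite-dimensional transformation, which is why working through the kernel, rather than the explicit map, matters in practice.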

16 SVM Method

17 SVM Method

18 SVM Method

19 SVM Method
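Once trained, a kernel SVM classifies a new feature vector with the standard decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, where the sum runs over the support vectors and the sign of f gives the class. The sketch below evaluates that function; the support vectors, weights alpha_i, bias b and gamma are hypothetical values chosen for illustration, since the training step that would produce them is not shown.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def rbf(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel; gamma = 0.5 is an arbitrary illustrative choice."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def decision(
    x: Sequence[float],
    support_vectors: List[Sequence[float]],
    alphas: List[float],
    labels: List[int],
    b: float,
    kernel: Kernel = rbf,
) -> float:
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; f > 0 -> member, f < 0 -> nonmember."""
    return sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, labels)) + b


# Hypothetical trained model: one member and one nonmember support vector.
sv = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [+1, -1]
b = 0.0

print(decision((0.0, 0.0), sv, alphas, labels, b))  # positive: member side
print(decision((2.0, 2.0), sv, alphas, labels, b))  # negative: nonmember side
```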

20 SVM for Classification of Proteins: How to represent a protein? Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
– amino acid composition
– hydrophobicity
– normalized van der Waals volume
– polarity
– polarizability
– charge
– surface tension
– secondary structure
– solvent accessibility
Three descriptors, composition (C), transition (T) and distribution (D), are used to describe the global composition of each of these properties. Nucleic Acids Res., 31:
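The composition, transition and distribution descriptors can be sketched for a single property. This minimal sketch assumes the three-class hydrophobicity grouping commonly used for CTD descriptors (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW); other properties use their own groupings. The rounding rule for the distribution percentiles is one common convention, and published implementations differ in such details.

```python
from itertools import combinations
from math import ceil
from typing import List

# Assumed hydrophobicity classes: '1' polar, '2' neutral, '3' hydrophobic.
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def encode(seq: str) -> str:
    """Replace each residue by its class label (raises KeyError on non-standard residues)."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())


def composition(code: str) -> List[float]:
    """C: fraction of residues in each class (3 values)."""
    return [code.count(g) / len(code) for g in "123"]


def transition(code: str) -> List[float]:
    """T: fraction of adjacent pairs switching between classes 1-2, 1-3, 2-3 (3 values)."""
    pairs = list(zip(code, code[1:]))
    return [
        sum(1 for x, y in pairs if {x, y} == {a, b}) / len(pairs)
        for a, b in combinations("123", 2)
    ]


def distribution(code: str) -> List[float]:
    """D: for each class, the position (as % of sequence length) of its first,
    25%, 50%, 75% and 100% occurrence (15 values)."""
    result: List[float] = []
    for g in "123":
        positions = [i + 1 for i, c in enumerate(code) if c == g]  # 1-based
        if not positions:
            result.extend([0.0] * 5)
            continue
        n = len(positions)
        picks = [1] + [max(1, ceil(f * n)) for f in (0.25, 0.5, 0.75)] + [n]
        result.extend(100.0 * positions[k - 1] / len(code) for k in picks)
    return result


def ctd(seq: str) -> List[float]:
    """21 CTD values for the hydrophobicity property of one sequence."""
    code = encode(seq)
    if len(code) < 2:
        raise ValueError("transition needs at least two residues")
    return composition(code) + transition(code) + distribution(code)


print(ctd("RKGALV"))
```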

21 SVM for Classification of Proteins How to represent a protein?

22 SVM for Classification of Proteins: How to represent a protein? From the protein sequence to the feature vector: (C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, … ) Nucleic Acids Res., 31:
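Assembling the feature vector amounts to computing each descriptor block and concatenating the blocks in a fixed order, so that every protein's vector has the same layout. The sketch below computes the 20-value amino acid composition and, as a second block, the hydrophobicity composition under an assumed three-class grouping; the block names and ordering are illustrative, not SVMProt's exact layout.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order

# Assumed hydrophobicity grouping, as commonly used for CTD descriptors.
HYDRO = {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"}


def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence (20 values)."""
    seq = seq.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def hydro_composition(seq: str) -> List[float]:
    """C descriptor for hydrophobicity: fraction of polar, neutral, hydrophobic residues."""
    seq = seq.upper()
    return [
        sum(seq.count(aa) for aa in HYDRO[k]) / len(seq)
        for k in ("polar", "neutral", "hydrophobic")
    ]


def assemble(blocks: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    """Concatenate named descriptor blocks, in a fixed order, into one feature vector."""
    vector: List[float] = []
    for name in order:
        vector.extend(blocks[name])
    return vector


seq = "MKTAYIAK"  # arbitrary example sequence
vector = assemble(
    {"C_aac": aa_composition(seq), "C_hydrophobicity": hydro_composition(seq)},
    order=["C_aac", "C_hydrophobicity"],
)
print(len(vector), vector[20:])  # 23 [0.25, 0.5, 0.25]
```

Fixing the block order matters because the SVM compares vectors position by position; the remaining T and D blocks for each property would be appended in the same way.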

23 Protein function prediction software: SVMProt. Useful for functional prediction of novel proteins, distantly related proteins, and homologous proteins with different functions. [Figure: workflow. Your protein sequence is input either on a local machine or through the internet (Options 1 and 2) and sent to a computer loaded with SVMProt, which contains a support vector machine classifier for every protein functional family. The output is the set of identified functional families, which gives indications of protein function. Question posed: which functional families does your protein belong to?] Nucl. Acids Res. 31, (2003)

24 Protein function prediction software: SVMProt. Useful for functional prediction of novel proteins, distantly related proteins, and homologous proteins with different functions. Protein families covered: 46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families and 2 regulator/factor families. SVMProt web version at: Nucl. Acids Res. 31, (2003)

25 Protein function prediction software: SVMProt. Nucl. Acids Res. 31, (2003) [Screenshot of the SVMProt web interface, annotated: check covered protein families here; input sequence here; check format here.]

26 Protein function prediction software: SVMProt. Nucl. Acids Res. 31, (2003) [Screenshot of SVMProt results, annotated: prediction score; probability of correct prediction.]

27 Summary of today's lecture:
– Machine learning method for protein function prediction.
– Use of SVMProt for probing protein function.