LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877


1 LSM3241: Bioinformatics and Biocomputing Lecture 3: Machine learning method for protein function prediction Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore

2 Protein Function and Functional Family Proteins of similar functional characteristics can be grouped into a family.


5 Functional Classification of Proteins by SVM A protein is classified as either belonging (+) or not belonging (-) to a functional family. By screening against all families, the function of this protein can be identified (example: SVMProt). [Diagram: a protein is screened by the Family-1 SVM, Family-2 SVM, and Family-3 SVM; the Family-3 SVM returns + and the others return -, so the protein belongs to Family-3.]
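The screening idea on this slide can be sketched as one binary classifier per family, each answering + or - for the same protein. This is a minimal illustration only: the family models below are made-up linear decision functions (hypothetical weights and bias), not trained SVMProt models, and the three-number feature vector is a stand-in for a real protein descriptor.

```python
from typing import Dict, List, Tuple

# Hypothetical per-family linear classifiers: (weights, bias).
# Real SVMProt classifiers are trained SVMs; these numbers are invented
# purely to illustrate screening one protein against every family.
FAMILY_MODELS: Dict[str, Tuple[List[float], float]] = {
    "Family-1": ([1.0, -1.0, 0.0], -0.5),
    "Family-2": ([-1.0, -1.0, 1.0], -0.5),
    "Family-3": ([0.0, 1.0, 1.0], -1.5),
}


def decision_value(weights: List[float], bias: float, features: List[float]) -> float:
    """Signed score of one family classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def screen_families(features: List[float]) -> Dict[str, str]:
    """Ask every family classifier whether the protein belongs (+) or not (-)."""
    return {
        name: "+" if decision_value(w, b, features) > 0 else "-"
        for name, (w, b) in FAMILY_MODELS.items()
    }


# Hypothetical feature vector for the query protein.
protein = [0.0, 1.0, 1.0]
calls = screen_families(protein)
predicted = [name for name, sign in calls.items() if sign == "+"]
print(calls)
print(predicted)
```

Because every family is tested independently, a protein may be assigned to several families or to none.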

6 Functional Classification of Proteins by SVM What is SVM? Support vector machines are a machine learning method (learning by examples, statistical learning) that classifies objects into one of two classes. Advantages of SVM: Accommodates diversity among class members. Uses sequence-derived physico-chemical features as the basis for classification. Suitable for functional classification of novel proteins (distantly-related proteins, homologous proteins of different functions).

7 Machine Learning Method Inductive learning: example-based learning. [Figure: positive examples and negative examples, each described by a descriptor]

8 Machine Learning Method Feature vectors: A=(1, 1, 1) B=(0, 1, 1) C=(1, 1, 1) D=(0, 1, 1) E=(0, 0, 0) F=(1, 0, 1) [Figure: the descriptor of each positive and negative example is converted to a feature vector]

9 SVM Method Feature vectors in input space: A=(1, 1, 1) B=(0, 1, 1) C=(1, 1, 1) D=(0, 1, 1) E=(0, 0, 0) F=(1, 0, 1) [Figure: feature vectors A, B, E, F plotted in the X-Y-Z input space]
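To make "learning by examples" concrete, the six feature vectors can be fed to a very simple linear learner. Two assumptions are made here: the transcript does not say which examples are positive, so A to D are taken as positive and E, F as negative; and a perceptron is used in place of an SVM because it fits in a few lines. It finds a separating border, but not the maximum-margin one an SVM would choose.

```python
# Feature vectors from the slide. The +1 / -1 labels are an assumption:
# the transcript does not state which examples are positive.
EXAMPLES = {
    "A": ((1, 1, 1), +1),
    "B": ((0, 1, 1), +1),
    "C": ((1, 1, 1), +1),
    "D": ((0, 1, 1), +1),
    "E": ((0, 0, 0), -1),
    "F": ((1, 0, 1), -1),
}


def train_perceptron(examples, max_epochs=100):
    """Perceptron rule: nudge the border whenever an example is misclassified."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # wrong side of (or exactly on) the border
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every example is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 for the positive side of the border, -1 otherwise."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


w, b = train_perceptron(list(EXAMPLES.values()))
predictions = {name: predict(w, b, x) for name, (x, _) in EXAMPLES.items()}
print(w, b)
print(predictions)
```

Under the assumed labels the two groups differ only in their second coordinate, so a flat border in the input space is enough to split them.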

10 SVM Method Project to a higher dimensional space. [Figure: protein family members and nonmembers, shown with the border in the input space and with the new border after projection]
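A toy illustration of why projecting to a higher dimensional space helps, using made-up one-dimensional data (not from the lecture): members sit in the middle of the line and nonmembers on both sides, so no single threshold on x separates them. After the hypothetical mapping x -> (x, x^2), a threshold on the new coordinate does.

```python
# Hypothetical 1-D data: members in the middle, nonmembers on both sides.
members = [-1.0, 0.0, 1.0]
nonmembers = [-3.0, -2.0, 2.0, 3.0]


def threshold_separable(group_a, group_b):
    """True if a single threshold on one coordinate splits the two groups."""
    return max(group_a) < min(group_b) or max(group_b) < min(group_a)


def project(x):
    """Map a 1-D point into 2-D by adding its square as a second coordinate."""
    return (x, x * x)


# In the original space no threshold on x works.
separable_in_input = threshold_separable(members, nonmembers)

# In the projected space the second coordinate (x^2) separates the groups.
members_sq = [project(x)[1] for x in members]
nonmembers_sq = [project(x)[1] for x in nonmembers]
separable_after_projection = threshold_separable(members_sq, nonmembers_sq)

print(separable_in_input, separable_after_projection)
```

A straight border in the projected space (for example x^2 = 2.5) corresponds to a border in the original space that a single straight cut could not express.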

11 SVM method [Figure: new border between protein family members and nonmembers, with the support vectors marked]

12 SVM Method [Figure: new border between protein family members and nonmembers, with the support vectors marked]
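The role of support vectors can be sketched numerically: given a border, the support vectors are the examples lying closest to it, and their distance is the margin. The points and the border (w, b) below are hypothetical and chosen by hand; a real SVM would find the border by optimization rather than take it as given.

```python
import math

# Hypothetical 2-D feature vectors for family members and nonmembers.
members = [(2.0, 2.0), (3.0, 3.0), (3.0, 1.0)]
nonmembers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hand-picked border w . x + b = 0 (an assumption, not a trained model).
w, b = (1.0, 1.0), -2.5


def distance_to_border(point):
    """Perpendicular distance from a point to the line w . x + b = 0."""
    return abs(w[0] * point[0] + w[1] * point[1] + b) / math.hypot(*w)


distances = {p: distance_to_border(p) for p in members + nonmembers}

# The margin is the distance of the closest example to the border;
# the examples at that distance are the support vectors.
margin = min(distances.values())
support_vectors = sorted(p for p, d in distances.items() if math.isclose(d, margin))

print(margin)
print(support_vectors)
```

Only the closest examples constrain the border: moving (3, 3) or (0, 0) further away would leave the margin unchanged.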

13 SVM Method Border line is nonlinear

14 SVM method Non-linear transformation: use of kernel function

15 SVM method Non-linear transformation
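A kernel function lets the classifier work in the transformed space without computing the transformation. A small check with the degree-2 polynomial kernel K(x, z) = (x . z)^2, chosen here only as an example since the slides do not name a specific kernel: its value equals the ordinary dot product of the explicitly transformed vectors phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit non-linear transformation from 2-D into 3-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, evaluated entirely in the 2-D input space."""
    return dot(x, z) ** 2


# Two arbitrary example points.
x, z = (1.0, 2.0), (3.0, -1.0)

explicit = dot(phi(x), phi(z))  # transform first, then take the dot product
implicit = poly_kernel(x, z)    # kernel shortcut, no transformation needed

print(explicit, implicit)
```

Both routes give the same number, which is why the border can be nonlinear in the input space while the computation stays in the input space.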

16 SVM Method

17 SVM Method

18 SVM Method

19 SVM Method

20 SVM for Classification of Proteins How to represent a protein? Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties: – amino acid composition – hydrophobicity – normalized van der Waals volume – polarity – polarizability – charge – surface tension – secondary structure – solvent accessibility. Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties. Nucleic Acids Res., 31: 3692-3697
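A minimal sketch of the C, T, and D descriptors for one property, hydrophobicity. Several things here are assumptions rather than content from the slides: the three-class grouping of residues (polar, neutral, hydrophobic) follows a commonly used scheme, the rule for picking the 25%, 50%, 75%, and 100% residues in D uses a ceiling convention that may differ from SVMProt's, and the six-residue sequence is invented. The sequence must have at least two residues.

```python
import math
from itertools import combinations

# Assumed three-class hydrophobicity grouping:
# 1 = polar, 2 = neutral, 3 = hydrophobic.
HYDROPHOBICITY_CLASSES = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
RESIDUE_TO_CLASS = {
    aa: label for label, group in HYDROPHOBICITY_CLASSES.items() for aa in group
}


def encode(sequence):
    """Replace every residue by its class label, e.g. 'MK' -> '31'."""
    return "".join(RESIDUE_TO_CLASS[aa] for aa in sequence)


def composition(encoded):
    """C: fraction of residues falling in each class."""
    return {c: encoded.count(c) / len(encoded) for c in "123"}


def transition(encoded):
    """T: fraction of adjacent residue pairs that switch between two classes."""
    pairs = list(zip(encoded, encoded[1:]))
    result = {}
    for a, b in combinations("123", 2):
        changes = sum(1 for p in pairs if p in ((a, b), (b, a)))
        result[a + b] = changes / len(pairs)
    return result


def distribution(encoded):
    """D: chain position (% of length) of the first, 25%, 50%, 75% and last
    residue of each class. The ceiling rule below is an assumed convention."""
    result = {}
    for c in "123":
        positions = [i + 1 for i, ch in enumerate(encoded) if ch == c]
        if not positions:
            result[c] = [0.0] * 5
            continue
        picks = [positions[0]]
        for frac in (0.25, 0.5, 0.75, 1.0):
            k = max(1, math.ceil(frac * len(positions)))
            picks.append(positions[k - 1])
        result[c] = [100.0 * p / len(encoded) for p in picks]
    return result


encoded = encode("MKVLAE")  # hypothetical sequence
C, T, D = composition(encoded), transition(encoded), distribution(encoded)
print(encoded)
print(C, T, D, sep="\n")
```

Repeating the same three steps for each of the other listed properties, with that property's own residue grouping, yields the remaining entries of the feature vector.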

21 SVM for Classification of Proteins How to represent a protein?

22 SVM for Classification of Proteins How to represent a protein? From protein sequence to feature vector: (C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, …) Nucleic Acids Res., 31: 3692-3697
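As a small illustration of going from sequence to numbers, the plain amino acid composition can be computed as the fraction of each of the 20 standard residues. This is a sketch only: the sequence is hypothetical, the residue ordering is an arbitrary choice, and the full SVMProt vector contains many more entries than these 20.

```python
# The 20 standard amino acids in a fixed (alphabetical) order, so that every
# protein yields a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element vector: fraction of the sequence made of each residue."""
    sequence = sequence.upper()
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


# Hypothetical six-residue sequence, for illustration only.
vector = amino_acid_composition("MKVLAE")
print(vector)
```

Sequences of any length map to a vector of the same size, which is what a classifier such as an SVM needs as input.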

23 Protein function prediction software SVMProt Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions. Which functional families does your protein belong to? [Workflow diagram: your protein sequence is input either through the internet (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi) or on a local machine (computer loaded with SVMProt); the sequence is sent to the support vector machine classifier for every protein functional family; the output is the identified functional families and protein functional indications.] Nucl. Acids Res. 31, 3692-3697 (2003)

24 Protein function prediction software SVMProt Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions. Protein families covered: 46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families, 2 regulator/factor families. SVMProt web-version at: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi Nucl. Acids Res. 31, 3692-3697 (2003)

25 Protein function prediction software SVMProt [Screenshot of the SVMProt web page, annotated: check covered protein families here; input sequence here; check format here] Nucl. Acids Res. 31, 3692-3697 (2003)

26 Protein function prediction software SVMProt [Screenshot of the SVMProt output, annotated: probability of correct prediction; prediction score] Nucl. Acids Res. 31, 3692-3697 (2003)

27 Summary of Today’s lecture Machine learning method for protein function prediction. Use of SVMProt for probing protein function.

