Published by Anissa Miller. Modified over 9 years ago.
1
LSM3241: Bioinformatics and Biocomputing. Lecture 3: Machine learning method for protein function prediction. Prof. Chen Yu Zong, Tel: 6516-6877, Email: csccyz@nus.edu.sg, http://bidd.nus.edu.sg, Room 07-24, level 7, SOC1, National University of Singapore
2
Protein Function and Functional Family: Proteins with similar functional characteristics can be grouped into a family.
5
Functional Classification of Proteins by SVM: A protein is classified as either belonging (+) or not belonging (-) to a functional family. By screening the protein against all families, its function can be identified (example: SVMProt). [Figure: the protein is passed to the Family-1 SVM, Family-2 SVM, Family-3 SVM, and so on; the outputs are - - + - -, so the protein belongs to Family-3.]
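The screening idea above is a one-vs-rest scheme: one binary classifier per family, and the protein is assigned to every family whose classifier answers (+). A minimal sketch, assuming each trained family SVM can be wrapped as a function that returns a real-valued score (score > 0 read as "belongs"); the family names and scoring lambdas below are hypothetical stand-ins, not SVMProt's models.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: one binary classifier per functional family. It returns a
# real-valued score; score > 0 is read as "belongs" (+), otherwise "does not" (-).
FamilyClassifier = Callable[[Sequence[float]], float]


def screen_families(
    features: Sequence[float],
    classifiers: Dict[str, FamilyClassifier],
) -> Tuple[Dict[str, str], List[str]]:
    """Screen one protein's feature vector against every family classifier."""
    calls: Dict[str, str] = {}
    for family, classify in classifiers.items():
        calls[family] = "+" if classify(features) > 0 else "-"
    members = [family for family, call in calls.items() if call == "+"]
    return calls, members


if __name__ == "__main__":
    # Toy stand-ins for trained family SVMs (hypothetical, not real models).
    classifiers = {
        "Family-1": lambda x: x[0] - 0.5,
        "Family-2": lambda x: x[1] - 1.0,
        "Family-3": lambda x: x[2] + 0.2,
    }
    calls, members = screen_families([0.1, 0.5, 0.3], classifiers)
    print(calls)    # {'Family-1': '-', 'Family-2': '-', 'Family-3': '+'}
    print(members)  # ['Family-3']
```

Because each family has its own independent classifier, a protein may be called (+) by several families or by none; the sketch simply reports all positive calls.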
6
Functional Classification of Proteins by SVM: What is SVM? Support vector machines (SVMs) are a machine learning method based on statistical learning: they learn by example to classify objects into one of two classes. Advantages of SVM: Members of a class can be diverse; they need not closely resemble one another. Classification is based on sequence-derived physico-chemical features. SVMs are suitable for the functional classification of novel proteins (distantly related proteins, and homologous proteins with different functions).
7
Machine Learning Method: Inductive learning, i.e. example-based learning. [Figure: positive and negative examples, each described by a descriptor.]
8
Machine Learning Method: Each example is described by a descriptor as a feature vector: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1). [Figure: positive and negative examples with their feature vectors.]
9
SVM Method: Feature vectors in input space: A=(1, 1, 1), B=(0, 1, 1), C=(1, 1, 1), D=(0, 1, 1), E=(0, 0, 0), F=(1, 0, 1). [Figure: points A, B, E and F plotted in the three-dimensional input space with axes X, Y and Z.]
10
SVM Method: [Figure: protein family members and nonmembers separated by a border; after projection to a higher-dimensional space, a new border separates the two groups.]
11
SVM Method: [Figure: protein family members, nonmembers, the new border between them, and the support vectors.]
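The slides show the border and support vectors only as a figure. For reference, this is the standard hard-margin SVM formulation in textbook notation (it is not given on the slides, and SVMProt may use a soft-margin variant): with feature vectors $\mathbf{x}_i$ and labels $y_i = +1$ for family members and $y_i = -1$ for nonmembers, the border is the hyperplane that maximizes the margin $2/\lVert\mathbf{w}\rVert$.

```latex
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,N

f(\mathbf{x}) = \operatorname{sign}\!\Big(\sum_{i=1}^{N} \alpha_i\, y_i\, K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad \alpha_i \ge 0
```

The support vectors are the training points with $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) = 1$, i.e. those lying on the margin; only they receive $\alpha_i > 0$, so only they determine the border. With the linear kernel $K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i \cdot \mathbf{x}$ this reduces to $\operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b)$.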
13
SVM Method: The border line is nonlinear.
14
SVM Method: Nonlinear transformation using a kernel function
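A kernel function computes the inner product of two points as if they had already been projected into the higher-dimensional space, without building that projection explicitly. A minimal sketch, assuming a 2-D input space for readability: for the degree-2 polynomial kernel the projection `phi` can be written out, so the two computations can be compared; the RBF kernel and its `gamma=0.5` are illustrative choices, not values taken from SVMProt.

```python
import math
from typing import Sequence, Tuple


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit map from a 2-D input space to a 3-D feature space such that
    phi(x) . phi(y) == poly2_kernel(x, y)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); its feature space is
    infinite-dimensional, so it is only ever evaluated as a kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, -1.0)
    explicit = sum(a * b for a, b in zip(phi(x), phi(y)))
    print(poly2_kernel(x, y), explicit)  # both equal 1, up to floating-point rounding
    print(rbf_kernel(x, x))              # 1.0: a point is maximally similar to itself
```

This is why a nonlinear border can become linear after the transformation: with `phi` above, the circular border x1² + x2² = r² in input space becomes the plane z1 + z3 = r² in the 3-D feature space.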
15
SVM Method: Nonlinear transformation
16
SVM Method
20
SVM for Classification of Proteins: How to represent a protein? Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties: amino acid composition; hydrophobicity; normalized van der Waals volume; polarity; polarizability; charge; surface tension; secondary structure; solvent accessibility. Three descriptors, composition (C), transition (T) and distribution (D), are used to describe the global composition of each of these properties. Nucleic Acids Res., 31: 3692-3697
21
SVM for Classification of Proteins: How to represent a protein?
22
SVM for Classification of Proteins: How to represent a protein? From the protein sequence to the feature vector: (C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, … ) Nucleic Acids Res., 31: 3692-3697
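The composition, transition and distribution descriptors described above can be sketched for a single property. A minimal sketch, assuming the 20 amino acids are split into three groups per property; the hydrophobicity grouping below (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) is a split commonly used in the CTD literature and is an assumption here, since the slides do not list SVMProt's exact groups. The rounding rule used for D is likewise one of several conventions.

```python
import math
from typing import Dict, List, Set

# Assumed three-group split for hydrophobicity (not confirmed for SVMProt).
HYDROPHOBICITY_GROUPS: Dict[int, Set[str]] = {
    1: set("RKEDQN"),   # polar
    2: set("GASTPHY"),  # neutral
    3: set("CLVIMFW"),  # hydrophobic
}


def encode(sequence: str, groups: Dict[int, Set[str]]) -> List[int]:
    """Replace each residue by the label (1, 2 or 3) of the group it belongs to."""
    labels = []
    for residue in sequence.upper():
        for label, members in groups.items():
            if residue in members:
                labels.append(label)
                break
        else:
            raise ValueError(f"residue {residue!r} is not in any group")
    return labels


def composition(labels: List[int]) -> List[float]:
    """C: percentage of residues falling in each group."""
    return [100.0 * labels.count(g) / len(labels) for g in (1, 2, 3)]


def transition(labels: List[int]) -> List[float]:
    """T: percentage of adjacent residue pairs that switch between two groups
    (1<->2, 1<->3, 2<->3), counting both directions."""
    pairs = list(zip(labels, labels[1:]))
    return [
        100.0 * sum(1 for a, b in pairs if {a, b} == {g, h}) / len(pairs)
        for g, h in ((1, 2), (1, 3), (2, 3))
    ]


def distribution(labels: List[int]) -> List[float]:
    """D: for each group, the position (as % of sequence length) of its first
    residue and of the residues at 25%, 50%, 75% and 100% of that group's
    occurrences (ceil is used to pick the residue; conventions vary)."""
    n = len(labels)
    result = []
    for g in (1, 2, 3):
        positions = [i + 1 for i, label in enumerate(labels) if label == g]
        for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                result.append(0.0)  # group absent from the sequence
                continue
            k = max(1, math.ceil(fraction * len(positions)))
            result.append(100.0 * positions[k - 1] / n)
    return result


def ctd(sequence: str, groups: Dict[int, Set[str]] = HYDROPHOBICITY_GROUPS) -> List[float]:
    """21 numbers for one property: 3 (C) + 3 (T) + 15 (D)."""
    labels = encode(sequence, groups)
    if len(labels) < 2:
        raise ValueError("need at least two residues to compute transitions")
    return composition(labels) + transition(labels) + distribution(labels)


if __name__ == "__main__":
    vector = ctd("RGCRGC")
    print(len(vector))  # 21
    print(vector[3:6])  # T: [40.0, 20.0, 40.0]
```

Calling `ctd` once per property, each with its own three-group split (groupings for van der Waals volume, polarity and the other properties are not shown on the slides), and concatenating the results gives a feature vector of the shape listed above.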
23
Protein function prediction software: SVMProt. Useful for the functional prediction of novel proteins, distantly related proteins, and homologous proteins with different functions. [Diagram: your protein sequence is sent to a computer loaded with SVMProt, which has a support vector machine classifier for every protein functional family; the output is the identified functional families and protein functional indications. The sequence can be input on a local machine or through the internet at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.] Question answered: which functional families does your protein belong to? Nucl. Acids Res. 31, 3692-3697 (2003)
24
Protein function prediction software: SVMProt. Useful for the functional prediction of novel proteins, distantly related proteins, and homologous proteins with different functions. Protein families covered (69 in total): 46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families, and 2 regulator/factor families. SVMProt web version at: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi Nucl. Acids Res. 31, 3692-3697 (2003)
25
Protein function prediction software: SVMProt. Nucl. Acids Res. 31, 3692-3697 (2003) [Screenshot of the SVMProt web interface, annotated: check covered protein families here; input sequence here; check format here.]
26
Protein function prediction software: SVMProt. Nucl. Acids Res. 31, 3692-3697 (2003) [Screenshot of SVMProt output, annotated: prediction score; probability of correct prediction.]
27
Summary of today's lecture: Machine learning method for protein function prediction. Use of SVMProt for probing protein function.