Published by Paula Norris. Modified over 9 years ago.
1
Protein Folding Recognition with Committee Machine
Mika Takata
2
Outline
  Background
  System Outline
  Experiment
  Experimental result
  Reference
3
Background
  Computation + biology + chemistry + medicine + ... = significantly important
  Structural Classification of Proteins (SCOP) database
  Fold-level classification: remote homology
  Better fold recognition leads to better tertiary structure prediction
  [Figure: SCOP hierarchy. Class level: all alpha, all beta, a/b, a+b. Fold level: globin-like, cytochrome c, cupredoxins, (TIM)-barrel, β-grasp, ...]
4
1. Chemical approach parameters (i)
  i. 6 types of chemical features
  ii. String-window N-grams
  iii. Protein molecular weight value
  iv. Protein sequence length value
5
1. Chemical approach parameters (ii): Global parameters
  Symbol C: frequencies of the 20 amino acid symbols in a protein sequence
  Symbols S, H, V, P, Z: 21 dimensions each (3-dim composition, 3-dim transition, 3×5-dim distribution)
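The global parameters listed on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the presenter's implementation: the function names, the three-group partition of the amino acids (a hydrophobicity-style grouping) and the way the five distribution positions are picked are all assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: an illustrative three-group partition of the 20 amino acids
# (hydrophobicity-style). The slide does not state which groupings were used.
GROUP_MEMBERS = ("RKEDQN", "GASTPHY", "CLVIMFW")
GROUP_OF = {aa: g for g, members in enumerate(GROUP_MEMBERS) for aa in members}


def composition_c(seq):
    """Symbol C: relative frequency of each of the 20 amino acid symbols."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def ctd_descriptor(seq, group_of=GROUP_OF):
    """21-dim vector: 3 composition + 3 transition + 3x5 distribution values.

    Assumes the sequence has at least two residues.
    """
    labels = [group_of[aa] for aa in seq]
    n = len(labels)

    # Composition: fraction of residues falling in each of the three groups.
    comp = [labels.count(g) / n for g in range(3)]

    # Transition: fraction of adjacent pairs that switch between two groups.
    pairs = list(zip(labels, labels[1:]))
    trans = [
        sum(1 for x, y in pairs if {x, y} == {a, b}) / (n - 1)
        for a, b in ((0, 1), (0, 2), (1, 2))
    ]

    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # residue of each group along the sequence (0.0 if the group is absent).
    dist = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if positions:
                k = max(1, int(frac * len(positions)))
                dist.append(positions[k - 1] / n)
            else:
                dist.append(0.0)
    return comp + trans + dist
```

Calling `composition_c(seq)` gives the 20-dimensional vector for symbol C, and `ctd_descriptor(seq)` gives one 21-dimensional vector. In this sketch, each of the symbols S, H, V, P and Z would use its own grouping passed as `group_of`.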
6
1. Chemical approach parameters (iii)
  Protein molecular weight value: the sum of the molecular weights of the protein's amino acids, used as a feature
  Protein sequence length value: the length of the amino acid sequence, used as a feature
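The two scalar features on this slide could be computed as in the sketch below. The mass table holds approximate average residue masses in daltons, rounded to two decimals. Adding one water molecule for the free chain ends is an assumption, since the slide only says the weights are summed. The function names are hypothetical.

```python
# ASSUMPTION: approximate average residue masses (Da), rounded to 2 decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # ASSUMPTION: one water added for the chain termini


def molecular_weight(seq):
    """Sum the residue masses of the sequence and add one water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def scalar_features(seq):
    """Return [molecular weight, sequence length] as a two-element feature list."""
    return [molecular_weight(seq), float(len(seq))]
```

Both values are on very different scales from the frequency-based features, so in practice they would usually be normalized before being passed to an SVM.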
7
2. Feature parameter based on sliding-window N-grams
  Proteomic fragment similarity (*)
  String length = 2
  Example sequence fragment: … NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA …
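A sliding window of length 2 over a sequence produces overlapping 2-grams. A spectrum-style kernel then compares two sequences through the counts of the N-grams they share. The sketch below shows that idea in its simplest form; the function names are hypothetical, and the presenter's exact similarity measure may differ.

```python
from collections import Counter


def spectrum(seq, k=2):
    """Count every length-k substring seen by a window sliding one residue at a time."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=2):
    """Inner product of the two k-gram count vectors."""
    spec_a, spec_b = spectrum(seq_a, k), spectrum(seq_b, k)
    # Counter returns 0 for k-grams that are absent from the other sequence.
    return sum(count * spec_b[gram] for gram, count in spec_a.items())
```

For example, `spectrum("ABAB")` counts `AB` twice and `BA` once, and `spectrum_kernel("ABAB", "BABA")` evaluates to 2·1 + 1·2 = 4.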
8
3. Feature parameter based on HMM
  [Fig. 1: Feature parameter flow based on HMM]
9
[Figure: system diagram. Elements shown: Training data; Test data; Model Ⅰ (C, S, H, V, P, Z, Seq-Length, Mol-Weight); Model Ⅱ (Spectrum Kernel); Model Ⅲ (HMM); Committee SVM_1, ..., Committee SVM_27; decision; stages labeled Step 1 and Step 2.]
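One way to read the diagram is that each of the committees SVM_1 to SVM_27 collects votes from member SVMs trained on different feature sets, and a final decision stage picks the class whose committee is most confident. The slide does not say how votes are combined, so the fraction-of-positive-votes rule below is an assumption for illustration, as are the function names.

```python
def committee_scores(votes_by_class):
    """Score each class committee by the fraction of member SVMs voting 'yes'.

    votes_by_class maps a class label to a non-empty list of 0/1 votes,
    one vote per member SVM (e.g. one per feature set).
    """
    return {label: sum(votes) / len(votes) for label, votes in votes_by_class.items()}


def decide(votes_by_class):
    """Final decision: the class whose committee has the highest score.

    Ties go to the class that appears first in the mapping.
    """
    scores = committee_scores(votes_by_class)
    return max(scores, key=scores.get)
```

With votes such as `{"globin-like": [1, 0, 0], "TIM-barrel": [1, 1, 0]}`, `decide` returns `"TIM-barrel"` because two of its three members voted for it.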
10
Evaluation measurement: "Accuracy Q"
  Q_i shows how correctly the data in class i are recognized.
  The number of data items differs from class to class.
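The formula for Q is not preserved in this transcript. The sketch below assumes a definition that is common in fold-recognition work: the per-class accuracy is Q_i = c_i / n_i, where c_i is the number of correctly recognized items of class i and n_i is the number of items in class i, and the overall accuracy is Q = Σ c_i / N. The function name is hypothetical.

```python
from collections import Counter


def accuracy_q(true_labels, predicted_labels):
    """Return (per-class accuracy Q_i, overall accuracy Q).

    ASSUMPTION: Q_i = c_i / n_i and Q = sum(c_i) / N, as described in the lead-in.
    """
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_class = {label: correct[label] / n for label, n in totals.items()}
    overall = sum(correct.values()) / len(true_labels)
    return per_class, overall
```

Because class sizes differ, the overall Q is a size-weighted average of the Q_i values, so a large class influences it more than a small one.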
11
Experiment
Parameters
  i. Chemical approach parameters
  ii. Feature parameter based on sliding-window kernel (string length = 2 and 3)
  iii. Feature parameter based on HMM
Classification methods
  i. Independent SVM
  ii. Committee SVM array
Multi-class recognition approaches
  i. One-vs-others
  ii. All-vs-all
Data set
  Training data: 341, test data: 353 (total: 694)
  http://www.nersc.gov/~cding/protein
Cross-validation: 10 times
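The two multi-class approaches differ in how many binary SVM problems they create. The sketch below enumerates those problems for a given list of classes. The function names are hypothetical, and the figure of 27 classes used in the usage note is an assumption taken from the "SVM_27" label in the system diagram.

```python
from itertools import combinations


def one_vs_others_tasks(classes):
    """One binary task per class: that class against all remaining classes."""
    classes = list(classes)
    return [(c, [o for o in classes if o != c]) for c in classes]


def all_vs_all_tasks(classes):
    """One binary task per unordered pair of classes."""
    return list(combinations(classes, 2))
```

For 27 classes, `one_vs_others_tasks(range(27))` yields 27 binary problems, while `all_vs_all_tasks(range(27))` yields 27·26/2 = 351. This difference in count is one reason the choice of approach affects training cost.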
12
Result (1): Independent SVM, Model I
13
Result (2): CM, Model I
14
Result (3): CM, Model II
15
Result (4): Model I & II
16
Result (5): Model I & III
17
Result (6): Model I & II & III
18
Conclusion
  Recognition improved when all models were used in the Committee Machine.
  The spectrum kernel worked when used with a string length of 2.
  Advantages
    Takes advantage of sporadic data (e.g., chemical-based and HMM features)
    Reduces computational cost
19
Reference (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science, No. 5507, pp. 369–377, 2009.
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05.
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H06.
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700–8704.
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S-H.: Recognition of a Protein Fold in the Context of the SCOP Classification. Proteins: Structure, Function, and Genetics 35 (1999) 401–407.
20
Reference (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358.
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001).
3. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540.
4. Leslie, C., Eskin, E., Noble, W.S.: The Spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566–575.
5. Shamim, M.T.A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 23 (2007) 3320–3327.
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. J. of Machine Learning Research 2 (2002) 419–444.