Protein Folding recognition with Committee Machine Mika Takata.


1 Protein Folding Recognition with Committee Machine
Mika Takata

2 Outline
- Background
- System outline
- Experiment
- Experimental results
- References

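3 Background
- Computation + biology + chemistry + medicine + …: protein fold recognition is an important interdisciplinary problem
- Structural Classification of Proteins (SCOP) database
- Fold-level classification targets remote homology, i.e., proteins that share a fold but have little sequence similarity
- Better fold recognition leads to better tertiary structure prediction
[Figure: SCOP hierarchy. Class level: all alpha, all beta, α/β, α+β. Fold level (examples): globin-like, cytochrome c, cupredoxins, TIM barrel, β-grasp, …]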

4 1. Chemical-approach parameters (i)
 i. Six types of chemical features
 ii. Sliding-window string N-grams
 iii. Protein molecular weight
 iv. Protein sequence length

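5 1. Chemical-approach parameters (ii): global parameters
- Symbol C: frequencies of the 20 amino acids in a protein sequence (20-dim)
- Symbols S, H, V, P, Z (predicted secondary structure, hydrophobicity, normalized van der Waals volume, polarity, polarizability): each encoded as 3-dim composition + 3-dim transition + 3×5-dim distribution (21-dim)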
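The global parameters follow a composition/transition/distribution (CTD) scheme. The sketch below computes symbol C and the 21-dim CTD descriptor for one property. The three-group hydrophobicity split (polar / neutral / hydrophobic) and the percentile rule used for the distribution part are assumptions for illustration, not values taken from the slides; the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group split for hydrophobicity (polar / neutral / hydrophobic),
# as commonly used for CTD descriptors. S, V, P and Z would each need their own
# grouping, which the slides do not give.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def composition_c(seq):
    """Symbol C: relative frequency of each of the 20 amino acids (20-dim)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def ctd(seq, groups):
    """3 composition + 3 transition + 3x5 distribution values (21-dim)."""
    # Relabel each residue with the index (0, 1, 2) of the group containing it.
    lookup = {aa: g for g, members in enumerate(groups) for aa in members}
    labels = [lookup[aa] for aa in seq if aa in lookup]
    n = len(labels)

    # Composition: fraction of residues that fall into each group.
    comp = [labels.count(g) / n for g in range(3)]

    # Transition: fraction of neighboring pairs that switch between two groups
    # (in either direction), for the group pairs 1-2, 1-3 and 2-3.
    pairs = list(zip(labels, labels[1:]))
    n_pairs = max(len(pairs), 1)
    trans = [sum(1 for a, b in pairs if {a, b} == {i, j}) / n_pairs
             for i, j in ((0, 1), (0, 2), (1, 2))]

    # Distribution: relative chain position of the first, 25%, 50%, 75% and
    # 100% occurrence of each group's residues (assumed percentile rule).
    dist = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                dist.append(0.0)
                continue
            k = max(1, math.ceil(frac * len(positions)))
            dist.append(positions[k - 1] / n)
    return comp + trans + dist
```

Concatenating C (20 values) with one 21-dim CTD block per property gives a fixed-length vector regardless of sequence length, which is what an SVM needs.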

6 1. Chemical-approach parameters (iii)
- Protein molecular weight: the sum of the molecular weights of the amino acids in the sequence, used as a feature
- Protein sequence length: the number of residues in the sequence, used as a feature
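A minimal sketch of the two scalar parameters, assuming standard average residue masses (the slides do not give the mass table). Summing the weights of free amino acids would overcount by one water per peptide bond, so this sketch sums residue masses and adds a single water; either convention yields a usable feature. The names `RESIDUE_MASS`, `molecular_weight` and `scalar_features` are hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Assumed standard values; not taken from the slides.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def molecular_weight(seq):
    """Sum of residue masses plus one water for the free chain termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def scalar_features(seq):
    """The two scalar parameters: molecular weight and sequence length."""
    return [molecular_weight(seq), float(len(seq))]
```

For example, `molecular_weight("G")` gives about 75.07 Da, the mass of free glycine.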

7 2. Feature parameters based on sliding-window N-grams
- Similarity of proteomic fragments (*)
- Example with string length = 2: …… NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA …
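Sliding-window N-gram features of this kind are what the spectrum kernel (Leslie et al.) uses: count every length-k substring and compare sequences by the inner product of those counts. The sketch below is an illustration under that assumption; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(seq, k=2):
    """Slide a window of length k along the sequence and count each substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_vector(seq, k=2):
    """Fixed-length feature vector: one count per possible k-mer (20**k dims)."""
    counts = kmer_counts(seq, k)
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


def spectrum_kernel(s, t, k=2):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    a, b = kmer_counts(s, k), kmer_counts(t, k)
    return sum(count * b[kmer] for kmer, count in a.items())
```

The vector length grows as 20^k (400 for k = 2, 8,000 for k = 3), while a sequence of length L contains only L − k + 1 windows, so larger k produces sparser vectors.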

8 3. Feature parameters based on HMM
[Fig. 1: Feature-parameter flow based on HMM]
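The slide shows only the flow diagram, so the following is an assumption rather than the author's method: one common way to turn HMMs into feature parameters is to score each sequence against one HMM per fold class and use the per-residue log-likelihoods as features. The sketch implements the forward algorithm in log space for a discrete-emission HMM; model training is omitted and all names are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(seq, start, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i]     : dict mapping an amino-acid letter to its emission probability
    """
    n_states = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(seq[0], 0.0))
             for i in range(n_states)]
    for symbol in seq[1:]:
        alpha = [logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n_states)])
                 + safe_log(emit[j].get(symbol, 0.0))
                 for j in range(n_states)]
    return logsumexp(alpha)


def hmm_features(seq, class_models):
    """Hypothetical feature vector: per-residue log-likelihood under each class HMM."""
    return [forward_log_likelihood(seq, *model) / len(seq) for model in class_models]
```

Dividing by sequence length keeps scores comparable between short and long proteins.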

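9 [Figure: System outline. Training data and test data are converted into three feature models: Model I (C, S, H, V, P, Z, sequence length, molecular weight), Model II (spectrum kernel) and Model III (HMM). The features feed an array of committee SVMs (SVM_1, …, SVM_27) followed by a decision stage, in two steps (Step 1, Step 2).]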
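The figure does not spell out how the committee combines its members. As an assumption, the sketch below shows two simple combination rules for one-vs-others decision values: a weighted average of scores and a majority vote. The member scores would come from SVMs trained on the different feature models; training is omitted, and all names are hypothetical.

```python
from collections import Counter


def committee_scores(member_scores, weights=None):
    """Weighted average of decision values.

    member_scores[m][c] is the one-vs-others decision value that committee
    member m (e.g. an SVM trained on one feature model) gives to class c.
    """
    n_members = len(member_scores)
    if weights is None:
        weights = [1.0 / n_members] * n_members
    n_classes = len(member_scores[0])
    return [sum(w * scores[c] for w, scores in zip(weights, member_scores))
            for c in range(n_classes)]


def predict_by_average(member_scores, weights=None):
    """Pick the class with the highest averaged committee score."""
    combined = committee_scores(member_scores, weights)
    return max(range(len(combined)), key=combined.__getitem__)


def predict_by_vote(member_scores):
    """Each member votes for its own top class; the most common vote wins."""
    votes = Counter(max(range(len(s)), key=s.__getitem__) for s in member_scores)
    return votes.most_common(1)[0][0]
```

With scores `[[2.0, -1.0], [-0.1, 0.1], [-0.2, 0.2]]`, averaging picks class 0 (one confident member dominates) while voting picks class 1 (two weak members outvote it), so the choice of combination rule matters.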

10 Evaluation measurement: accuracy Q
- Accuracy Q indicates how many proteins in class i are correctly recognized
- The number of proteins differs from class to class
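The slide's formula did not survive in the transcript. A definition often used in fold-recognition work, assumed here, is the per-class accuracy Q_i = c_i / n_i (c_i correctly recognized proteins out of n_i test proteins in class i) and the overall accuracy Q = Σ c_i / N over all N test proteins. The function name is hypothetical.

```python
from collections import Counter


def accuracy_q(y_true, y_pred):
    """Per-class accuracy Q_i = c_i / n_i and overall accuracy Q = sum(c_i) / N."""
    n = Counter(y_true)                                          # n_i per class
    c = Counter(t for t, p in zip(y_true, y_pred) if t == p)     # c_i per class
    per_class = {cls: c[cls] / n[cls] for cls in n}
    overall = sum(c.values()) / len(y_true)
    return per_class, overall
```

For `y_true = [0, 0, 0, 1]` and `y_pred = [0, 0, 1, 1]`, Q_0 = 2/3, Q_1 = 1.0 and Q = 0.75, whereas the unweighted mean of the Q_i would be about 0.83. This gap is why unequal class sizes matter.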

11 Experiment
- Parameters
  i. Chemical-approach parameters
  ii. Feature parameters based on the sliding-window kernel (string length = 2 and 3)
  iii. Feature parameters based on HMM
- Classification methods
  i. Independent SVM
  ii. Committee SVM array
- Multi-class recognition approaches
  i. One-vs-others
  ii. All-vs-all
- Data set
  Training data: 341, test data: 353 (total: 694)
  http://www.nersc.gov/~cding/protein
- Cross-validation: 10 times
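The two multi-class approaches differ in how many binary classifiers they need: one-vs-others trains one per class, all-vs-all one per pair of classes. If the 27 committee SVMs in the system figure correspond to 27 fold classes (an assumption), that is 27 versus 27 × 26 / 2 = 351 binary problems. A sketch with hypothetical names, where `pairwise_decide` stands in for a trained pairwise SVM:

```python
from collections import Counter
from itertools import combinations


def one_vs_others_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, "others") for c in classes]


def all_vs_all_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def all_vs_all_predict(x, classes, pairwise_decide):
    """Run every pairwise classifier and return the class with the most votes.

    pairwise_decide(x, a, b) is a hypothetical callback that returns a or b.
    """
    votes = Counter(pairwise_decide(x, a, b) for a, b in all_vs_all_tasks(classes))
    return max(classes, key=lambda c: votes[c])
```

All-vs-all trains many more classifiers, but each one sees only the data of two classes, so individual training runs are smaller.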

12 Result (1): Independent SVM, Model I

13 Result (2): Committee machine (CM), Model I

14 Result (3): CM, Model II

15 Result (4): Models I & II

16 Result (5): Models I & III

17 Result (6): Models I, II & III

18 Conclusion
- Combining all models (I, II and III) in the committee machine improved recognition
- The spectrum kernel worked when used with a string length of 2
- Advantages
  - Makes use of heterogeneous data (e.g., chemistry-based and HMM-based features)
  - Reduces computational cost

19 References (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science 5507 (2009) 369–377
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H06
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700–8704
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S-H.: Recognition of a Protein Fold in the Context of the SCOP Classification. Proteins: Structure, Function, and Genetics 35 (1999) 401–407

20 References (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 (2001) 349–358
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001)
3. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540
4. Leslie, C., Eskin, E., Noble, W.S.: The spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566–575
5. Shamim, M.T.A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinformatics 23 (2007) 3320–3327
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. J. Machine Learning Research 2 (2002) 419–444

