Training and applying hidden Markov models and support vector machines for prediction of T-cell epitopes. Van Hai Van, Cao Thi Ngoc Phuong, Tran Linh Thuoc. Faculty of Biology, University of Natural Sciences, VNU-HCMC, Vietnam. Sixth International Conference on Bioinformatics (InCoB2007).
Epitope prediction. “An epitope is the portion of an antigen that is recognized by the antigen receptor on lymphocytes” (Molecular Biology). Epitope prediction: computational methods aid the development of epitope-based vaccines against human pathogens for which no vaccines currently exist.
T-cell epitope prediction. T-cell epitopes are a subset of MHC-binding peptides, so predicting which peptides bind to MHC (here HLA-A0201) is essential for the design of peptide-based vaccines. Sequence-based prediction methods: binding motifs, quantitative matrices, decision trees, artificial neural networks, hidden Markov models, support vector machines. (Molecular Biology)
HMMs & SVMs. HMMs (hidden Markov models): statistical models that can capture complex relationships in data sets. SVMs (support vector machines): learning machines that find the optimal separating hyperplane.
Epitope prediction for dengue virus. Dengue is a tropical disease: dengue fever, dengue hemorrhagic fever, dengue shock syndrome. Hypotheses of pathogenesis: antibody-dependent enhancement; virus virulence. No dengue vaccine is available. In our research: develop a procedure for automatically building T-cell epitope prediction models; find candidates in silico for multivalent vaccines against the 4 types of dengue virus.
Building models for predicting T-cell epitopes & applying these models on dengue virus
Building effective prediction models? The predictive ability of HMM and SVM models depends on: the experimentally determined peptides binding to MHC molecules; the partition of the peptides into a training set and a testing set; the encoding method. Goal: a system that quickly and easily finds the best prediction model when the type of MHC molecule or the number of binding peptides changes.
Processing MHC-binding experimental peptides
Create training and testing sets
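A minimal sketch of one way to create training and testing sets once peptides have been assigned to homologous groups. The slides do not state how groups are built or what fraction is held out, so the `groups` dictionary, the `test_fraction` value, the per-group sampling, and the peptide strings are all assumptions made for illustration.

```python
import random

def split_by_group(groups, test_fraction=0.2, seed=0):
    """Hold out a fraction of every homologous group for testing.

    groups: dict mapping a group id to a list of peptide strings
            (grouping is assumed to be done beforehand).
    test_fraction and seed are illustrative choices, not values from the slides.
    """
    rng = random.Random(seed)
    train, test = [], []
    for group_id in sorted(groups):
        members = list(groups[group_id])
        rng.shuffle(members)
        n_test = int(round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Placeholder peptides, not taken from MHCBN, MHCPEP or IEDB.
example_groups = {
    "g1": ["AAAAAAAAL", "AAAAAAAAV", "AAAAAAAAI", "AAAAAAAAM", "AAAAAAAAF"],
    "g2": ["KLLLLLLLV", "KLLLLLLLI"],
}
train_set, test_set = split_by_group(example_groups)
```

Repeating the call with different `seed` values gives different partitions, which is one way repeated training runs could be organized.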
Training & testing procedure. HMMs (HMMer); SVMs (SVM_light).
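SVM_light reads examples from a text file with one example per line, in the form `<target> <index>:<value> ...`, with 1-based feature indices in increasing order. The sketch below formats one encoded peptide as such a line; the function name `to_svmlight_line` is hypothetical, and omitting zero-valued features is a common sparse-format convention rather than something stated in the slides.

```python
def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...'.

    label: positive for binding, otherwise non-binding (assumed convention).
    features: sequence of numbers; indices start at 1 and zeros are skipped.
    """
    parts = ["+1" if label > 0 else "-1"]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)

line = to_svmlight_line(1, [0, 1, 0, 0.5])
```

Lines produced this way can be written to a file and passed to the `svm_learn` and `svm_classify` programs that ship with SVM_light.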
Experiment 1
Method: HMMs | SVMs
Databases: MHCBN, MHCPEP
Homology: 7-amino acid
No. homologous groups: binding seq.: 11, non-binding seq.: 3
Kind of peptide: Binding, Non-binding | Binding, Non-binding
No. peptides:
Training set:
Testing set:
Training times: 200
Parameters: E-value = 0 ÷ 10 | Linear kernel, c = 0; encoding: binary, Blosum-62, physicochemical method
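As an illustration of the binary encoding listed among the SVM parameters, the sketch below maps each residue to a 20-dimensional indicator vector, so a 9-mer becomes 180 features. The residue ordering in `AMINO_ACIDS` and the function name are assumptions; the slides do not specify the exact layout used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues

def binary_encode(peptide):
    """Concatenate one indicator vector of length 20 per residue.

    Raises ValueError for characters outside the 20 standard amino acids.
    """
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector

encoded = binary_encode("KTVWFVPSI")
```

A Blosum-62 encoding would follow the same pattern, replacing each indicator vector with the residue's row of the substitution matrix.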
Results of training with HMMs. HMM.7.136: A_ROC (area under the ROC curve) = 0.914. Parameters chosen from HMM.7.136, at the point E-value E = 3.4, score S = -8.5: SE (sensitivity) = 0.91, SP (specificity) = 0.86, A_ROC = 0.885.
Results of training with SVMs. Binary encoding: A_ROC = 0.42 ÷ 0.77. Blosum-62 encoding: A_ROC = 0.47 ÷ 0.87. Physicochemical encoding: A_ROC = 0.41 ÷ 0.71. With Blosum-62 encoding, data set SVM.7.blo62.46: SE = 0.83, SP = 0.90, A_ROC = 0.87.
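The SE, SP and A_ROC values reported above can be computed from a list of prediction scores and known labels. The sketch below assumes that a higher score means "more likely to bind" (for E-values, where lower is better, the sign would need to be flipped first); the function names and the toy numbers are illustrative only.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (SE, SP) when scores >= threshold are called binders.

    labels: 1 for binding, 0 for non-binding.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_area(scores, labels):
    """Area under the ROC curve as the fraction of (binder, non-binder)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, not results from the experiments.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0]
se, sp = sensitivity_specificity(toy_scores, toy_labels, threshold=0.35)
auc = roc_area(toy_scores, toy_labels)
```

Sweeping `threshold` over the observed scores and picking the point with the preferred SE and SP trade-off mirrors the "at point" selection reported on the results slides.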
Experiment 2
Method: HMMs | SVMs
Databases: MHCBN, MHCPEP, IEDB
Homology: 7-amino acid, 6-amino acid, 5-amino acid
Training times:
Parameters: E-value = 40 ÷ 80 | Linear kernel, c = 0; encoding: binary, Blosum-62, binary-Blosum-62 method
Results of training with HMMs
Homology: 5-amino acid | 6-amino acid | 7-amino acid
Kind of peptide: Binding
No. homologous groups:
No. sequences in homologous groups:
Total peptides:
Training set:
Testing set:
A_ROC: 0.832÷ ÷ ÷0.876
The best HMM profile: HMM.6.78
Training in 6-amino acid homologous groups. HMM.6.78: A_ROC = 0.883. Parameters of HMM.6.78, at the point E = 42, S = -9.2: SE = 0.91, SP = 0.84, A_ROC = 0.875.
Results of training with SVMs
Homology: 5-amino acid | 6-amino acid | 7-amino acid
Kind of peptide: Binding, Non-binding | Binding, Non-binding | Binding, Non-binding
Total homologous groups:
Sequences in homologous groups:
Total sequences:
Training set:
Testing set:
A_ROC, binary encoding (1): 0.847÷ ÷ ÷0.882
A_ROC, Blosum-62 encoding (2): 0.843÷ ÷ ÷0.894
A_ROC, binary-Blosum-62 encoding (3): 0.849÷ ÷ ÷0.891
Chosen set: SVM.blo
Training in 7-amino acid homologous groups. At the selected SVM model: SE = 0.93, SP = 0.86, A_ROC = 0.894. Plot legend: binary encoding; Blosum-62 encoding; binary-Blosum-62 encoding.
Epitope prediction procedure for dengue virus
1. Perform multiple sequence alignment.
2. Extract consensus sequences of 9 or more amino acids.
3. Create overlapping 9-mer sequences.
4. Predict peptides binding to MHC with the HMM profile or the SVM model.
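Step 3 of the procedure can be sketched as a sliding window over each consensus sequence. The function name and the 1-based start positions are illustrative choices; the example sequence is one of the 11-residue epitopes listed in the prediction results.

```python
def overlapping_kmers(sequence, k=9):
    """Return (1-based start, k-mer) for every window of length k.

    Sequences shorter than k yield an empty list.
    """
    return [(start + 1, sequence[start:start + k])
            for start in range(len(sequence) - k + 1)]

windows = overlapping_kmers("LMRRGDLPVWL")
```

Each 9-mer can then be encoded and scored by the HMM profile or SVM model, as in step 4.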
Result of epitope prediction (prediction of peptides binding to HLA-A0201): overlapping 9-amino acid peptides predicted to bind HLA-A0201 molecules are joined.

Experiment 1
Proteins (1,2,3,4) | Epitope sequences | Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS3 | LMRRGDLPVWL | HMMs, SVMs
763NS5, 764NS5, 515NS5, 765NS5 | LMYFHRRDLRL | HMMs, SVMs
358NS3, 357NS3, 2HELICc, 357NS3 | KTVWFVPSI | SVMs
658NS5, 659NS5, 410NS5, 660NS5 | AISGDDCVV | SVMs
472NS5, 473NS5, 223NS5, 473NS5 | AIWYMWLGA | SVMs
101E, 99E, 99glycoprot, 99E | RGWGNGCGL | SVMs
194NS1, 194NS1, 193NS1, 194NS1 | VHADMGYWI | SVMs
352NS5, 353NS5, 103NS5, 353NS5 | RVFKEKVDT | SVMs
13NS1, 13NS1, 12NS1, 13NS1 | LKCGSGIFV | SVMs
26NS1, 26NS1, 25NS1, 26NS1 | HTWTEQYKF | SVMs
230NS1, 230NS1, 229NS1, 230NS1 | TLWSNGVLES | SVMs
327NS1, 327NS1, 326NS1, 327NS1 | DGCWYGMEIRP | SVMs
148NS3, 148NS3, 142Pep_S7, 148NS3 | GLYGNGVVT | SVMs
256NS3, 255NS3, 67DEXHc, 255NS3 | EIVDLMCHA | SVMs
297NS3, 296NS3, 108DEXHc, 296NS3 | ARGYISTRV | SVMs
410NS3, 409NS3, 54HELICc, 409NS3 | DISEMGANF | SVMs
36NS4B, 35NS4B, 35NS4B, 32NS4B | ASAWTLYAV | SVMs
118NS4B, 117NS4B, 117NS4B, 114NS4B | HYAIIGPGLQA | SVMs
142NS4B, 141NS4B, 141NS4B, 138NS4B | IMKNPTVDGI | SVMs
224NS4B, 223NS4B, 223NS4B, 220NS4B | NIFRGSYLAGA | SVMs
81NS5, 81NS5, 27FtsJ, 81NS5 | GCGRGGWSY | SVMs
529NS5, 530NS5, 280NS5, 530NS5 | MYADDTAGW | SVMs
602NS5, 603NS5, 353NS5, 603NS5 | QVGTYGLNT | SVMs
606NS5, 607NS5, 357NS5, 607NS5 | YGLNTFTNM | SVMs
682NS5, 683NS5, 434NS5, 684NS5 | DMGKVRKDI | SVMs
745NS5, 746NS5, 497NS5, 747NS5 | WSLRETACLG | SVMs
788NS5, 789NS5, 540NS5, 790NS5 | PTSRTTWSI | SVMs

Experiment 2
Proteins (1,2,3,4) | Epitope sequences | Methods
537NS3, 536NS3, 2010DV3_gp1, 536NS5 | LMRRGDLPV | HMMs
763NS5, 764NS5, 515NS5, 765NS5 | LMYFHRRDLRL | HMMs
358NS3, 357NS3, 2HELICc, 357NS3 | KTVWFVPSI | HMMs
658NS5, 659NS5, 410NS5, 660NS5 | AISGDDCVV | HMMs
469NS5, 470NS5, 220NS5, 470NS5 | GSRAIWYMWLGAR | HMMs
103E, 101E, 101DV3_gp1, 101E | WGNGCGLFG | SVMs
193NS1, 193NS1, 192NS1, 193NS1 | AVHADMGYWIES | SVMs
348NS5, 349NS5, 99NS5, 349NS5 | FGQQRVFKE | SVMs
568NS5, 569NS5, 319NS5, 569NS5 | FKLTYQNKV | HMMs
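Joining overlapping 9-amino acid peptides that were predicted to bind can be sketched as an interval merge over window start positions. The rule used here (two windows are joined when they share at least one residue) and the function name are assumptions, and the start positions in the example are made up; the slides do not list which individual 9-mers were predicted.

```python
def join_overlapping(sequence, binder_starts, k=9):
    """Merge predicted k-mer windows that overlap into longer segments.

    binder_starts: 0-based start positions of k-mers predicted to bind.
    Returns the merged subsequences in order of position.
    """
    segments = []
    for start in sorted(binder_starts):
        end = start + k  # exclusive end of this window
        if segments and start < segments[-1][1]:
            # Window overlaps the previous segment: extend it.
            segments[-1][1] = max(segments[-1][1], end)
        else:
            segments.append([start, end])
    return [sequence[s:e] for s, e in segments]

# Hypothetical predictions: all three 9-mers of an 11-residue consensus bind.
joined = join_overlapping("LMRRGDLPVWL", [0, 1, 2])
```

With these made-up inputs the three windows collapse into a single 11-residue segment, which is how an epitope longer than 9 residues can appear in the results table.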
Result of prediction. The HMM profile is stable, and its predictive ability increases when additional data sets are added. The SVM model performs well, but its predictive ability decreases as the amount of training data increases.
Conclusion. We successfully built a system for training hidden Markov models and support vector machines. Generating training and testing data by separating the data set into homologous groups gives good results. The system can predict consensus epitopes for the 4 types of dengue virus based on data for peptides binding to HLA-A0201.
Future plans. Try other kernels for the SVM method. Survey other encoding methods for sequences of variable length. Survey other methods for classifying MHC data into homologous groups. Automate the procedure for collecting and updating MHC-binding peptide data from databases.
Thank you very much!