Published by Dortha Cross. Modified over 9 years ago.
1
Predicting immune response from peptide microarrays
Mitja Luštrek 1 (2), Peter Lorenz 2, Felix Steinbeck 2, Georg Füllen 2, Hans-Jürgen Thiesen 2
1 Department of Intelligent Systems, Jožef Stefan Institute
2 University of Rostock
2
1. Introduction
2. Immune response prediction
3. Interpretation
3
1. Introduction
4
Peptide = part of protein = short sequence of amino acids
SNDIVLT = string of letters from a 20-letter alphabet (1 letter = 1 amino acid, 20 standard amino acids)
Image taken from the EMBL website
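The "string over a 20-letter alphabet" view can be made concrete with a few lines of Python. This is only an illustrative sketch; the names `AMINO_ACIDS` and `is_peptide` are made up for the example and do not come from the slides.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def is_peptide(sequence: str) -> bool:
    """Return True if the string is non-empty and uses only the 20 standard letters."""
    return len(sequence) > 0 and all(letter in AMINO_ACIDS for letter in sequence)

print(is_peptide("SNDIVLT"))  # True: every letter is a standard amino acid
print(is_peptide("SNDIVLX"))  # False: X is not one of the 20 letters
```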
6
[Diagram sequence, slides 6 to 14. Labels: antigen protein, epitope, peptide, antibody, antibody binding. An epitope is the part of an antigen protein that an antibody binds.]
15
Peptide arrays
Peptide array: peptides (15 amino acids) on a glass slide
IVIg antibody mixture
Red = epitopes (bind antibodies)
Black = non-epitopes
18
Peptide arrays
Red = epitopes (bind antibodies)
Black = non-epitopes
Layers on the glass slide: peptide, antibody, antibody against antibody + dye
19
Peptide arrays
Red = epitopes (bind antibodies)
Black = non-epitopes

Peptide            Class
PGIGFPGPPGPKGDQ    non-ep.
PNMVFIGGINCANGK    non-ep.
DGIGGAMHKAMLMAQ    non-ep.
REDNLTLDISKLKEQ    non-ep.
TPLAGRGLAERASQQ    non-ep.
DQVHPVDPYDLPPAG    non-ep.
...
RRMISRMPIFYLMSG    epitope
LPPGFKRFTCLSIPR    epitope
EFSQMESYPEDYFPI    epitope
...
20
2. Immune response prediction
21
Our task

Peptide
RRKGGLEEPQPPAEQ
SEDLENALKAVINDK
EDHVKLVNEVTEFAK
GEKIIQEFLSKVKQM
ILVSRSLKMRGQAFV
YTCQCRAGYQSTLTR
...

-> Machine learning ->

Peptide            Class
RRKGGLEEPQPPAEQ    non-ep.
SEDLENALKAVINDK    non-ep.
EDHVKLVNEVTEFAK    non-ep.
GEKIIQEFLSKVKQM    non-ep.
ILVSRSLKMRGQAFV    epitope
YTCQCRAGYQSTLTR    epitope
...

Training set: 13,638 peptides (3,420 epitopes)
Test set: 13,640 peptides (3,421 epitopes)
Balanced until the final testing
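The slides say the data were "balanced until the final testing" but do not say how. One common way, shown here purely as an assumption, is to undersample the larger class until both classes are equally frequent. The function name `balance` and the fixed seed are illustrative.

```python
import random

def balance(examples, seed=0):
    """Undersample the majority class (assumed method, not confirmed by the slides).

    examples: list of (peptide, label) pairs, label is "epitope" or "non-ep.".
    """
    rng = random.Random(seed)
    epitopes = [e for e in examples if e[1] == "epitope"]
    non_epitopes = [e for e in examples if e[1] == "non-ep."]
    n = min(len(epitopes), len(non_epitopes))
    balanced = rng.sample(epitopes, n) + rng.sample(non_epitopes, n)
    rng.shuffle(balanced)
    return balanced

# The six labeled peptides shown on the slide.
EXAMPLES = [
    ("RRKGGLEEPQPPAEQ", "non-ep."),
    ("SEDLENALKAVINDK", "non-ep."),
    ("EDHVKLVNEVTEFAK", "non-ep."),
    ("GEKIIQEFLSKVKQM", "non-ep."),
    ("ILVSRSLKMRGQAFV", "epitope"),
    ("YTCQCRAGYQSTLTR", "epitope"),
]

balanced = balance(EXAMPLES)
labels = [label for _, label in balanced]
print(labels.count("epitope"), labels.count("non-ep."))  # 2 2

# Non-epitopes per epitope in the full training set (13,638 peptides, 3,420 epitopes).
print(round((13638 - 3420) / 3420, 2))  # 2.99, i.e. roughly 1 : 3
```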
24
Machine learning

Peptide            Class
PGIGFPGPPGPKGDQ    non-ep. / epitope

Attribute representation:
Attribute 1    Attribute 2    ...    Class
value 1        value 2        ...    non-ep. / epitope

Attribute representation -> ML -> Classifier -> Probability for epitope p
28
Machine learning

Peptide            Class
PGIGFPGPPGPKGDQ    non-ep. / epitope

Attribute representation 1 ... Attribute representation 8 -> ML -> Classifier 1 ... Classifier 8

Probabilities for epitope          Class
p1 p2 p3 p4 p5 p6 p7 p8            non-ep. / epitope

Probabilities for epitope -> ML -> Meta classifier -> Final probability for epitope p

Learning algorithms: SVM (SMO), logistic regression, linear regression
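The prediction-time data flow of this two-level scheme (often called stacking) can be sketched as below. Everything concrete here is an assumption made for illustration: the two toy base classifiers, the hand-set weights standing in for a trained meta classifier, and all function names. The slides list SVM (SMO), logistic regression and linear regression, but the transcript does not say which learner is used at which level.

```python
def aromatic_fraction(peptide: str) -> float:
    """Toy base classifier (hypothetical): share of F, W and Y residues."""
    return sum(residue in "FWY" for residue in peptide) / len(peptide)

def arginine_fraction(peptide: str) -> float:
    """Toy base classifier (hypothetical): share of R residues."""
    return peptide.count("R") / len(peptide)

def stacked_probability(peptide, base_classifiers, weights, bias=0.0):
    """Feed the base-classifier outputs p1..pk into a linear meta model.

    The weights and bias would normally be learned; here they are hand-set.
    The result is clipped to [0, 1] so it can be read as a probability.
    """
    base_probabilities = [classifier(peptide) for classifier in base_classifiers]
    score = bias + sum(w * p for w, p in zip(weights, base_probabilities))
    return min(1.0, max(0.0, score))

p = stacked_probability(
    "RRMISRMPIFYLMSG",
    [aromatic_fraction, arginine_fraction],
    weights=[0.5, 0.5],
)
print(p)
```

In the deck there are eight base classifiers, one per attribute representation; the sketch uses two only to stay short.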
31
Attribute representation 1
Amino-acid counts
Peptide: RRMISRMPIFYLMSG

Count of:  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y
Value:     0  0  0  0  1  1  0  2  0  1  3  0  1  0  3  2  0  0  0  1
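A minimal sketch of this representation, assuming the attributes are simply the number of occurrences of each of the 20 standard amino acids, in alphabetical order of their one-letter codes. The function name is illustrative.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_counts(peptide: str) -> list:
    """Return one count per standard amino acid, in the order of AMINO_ACIDS."""
    counts = Counter(peptide)
    return [counts[amino_acid] for amino_acid in AMINO_ACIDS]

vector = amino_acid_counts("RRMISRMPIFYLMSG")
print(dict(zip(AMINO_ACIDS, vector)))
```

For the slide's peptide the non-zero entries are F 1, G 1, I 2, L 1, M 3, P 1, R 3, S 2, Y 1, which matches the digits shown on the slide.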
32
Attribute representation 2
Amino-acid count differences
Peptide: RRMISRMPIFYLMSG

Difference in counts of:  F–G  F–I  F–L  F–M  F–P  F–R  F–S  F–Y  G–F  G–I  ...
Value:                    0    –1   0    –2   0    –2   –1   0    0    –1
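A sketch of the count-difference attributes, assuming one attribute per ordered pair of distinct amino acids (the slide lists both F–G and G–F). The function name is illustrative.

```python
from collections import Counter
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def count_differences(peptide: str) -> dict:
    """Map each ordered pair (a, b) of distinct amino acids to count(a) - count(b)."""
    counts = Counter(peptide)
    return {(a, b): counts[a] - counts[b] for a, b in permutations(AMINO_ACIDS, 2)}

differences = count_differences("RRMISRMPIFYLMSG")
print(differences[("F", "G")], differences[("F", "I")], differences[("F", "M")])  # 0 -1 -2
```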
33
Attribute representation 3
Subsequence counts
Peptide: RRMISRMPIFYLMSG

Count of:  RR  RM  MI  ...  RRM  RMI  MIS  ...  ACDE  ...  ACDEF  ...
Value:     1   2   1        1    1    1         0          0
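A sketch of subsequence counting, assuming "subsequence" means a contiguous run of residues and that lengths 2 to 5 are used (the slide shows examples of lengths 2, 3, 4 and 5; the exact range is an assumption).

```python
from collections import Counter

def subsequence_counts(peptide: str, min_len: int = 2, max_len: int = 5) -> Counter:
    """Count every contiguous substring with length between min_len and max_len."""
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for start in range(len(peptide) - length + 1):
            counts[peptide[start:start + length]] += 1
    return counts

counts = subsequence_counts("RRMISRMPIFYLMSG")
print(counts["RR"], counts["RM"], counts["MI"], counts["RRM"], counts["ACDE"])  # 1 2 1 1 0
```

A `Counter` returns 0 for substrings that never occur, which corresponds to the zero entries for ACDE and ACDEF on the slide.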
34
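Attribute representation 4
Amino-acid class counts

Size class:    l l l l t l l s l l l l l t t
Peptide:       R R M I S R M P I F Y L M S G
Charge class:  b b n n n b n n n n n n n n n
(t = tiny, s = small, l = large, b = basic, n = neutral)

Count of:  tiny  small  large  basic  acidic  neutral  ...
Value:     3     1      11     3      0       12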
35
Attribute representation 5
Amino-acid class subsequence counts

Size class:    l l l l t l l s l l l l l t t
Peptide:       R R M I S R M P I F Y L M S G
Charge class:  b b n n n b n n n n n n n n n

Count of:  ll  lt  tl  ls  sl  tt  ...  bb  bn  nb  nn  ...
Value:     8   2   1   1   1   1        1   2   1   10
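Representations 4 and 5 can be sketched together: first translate the peptide into a string of class letters, then count single letters (representation 4) or adjacent letter pairs (representation 5). The class tables below cover only the nine residues whose classes can be read off the slide; the classes of the other amino acids are not given there, so the tables are deliberately partial. Function names are illustrative.

```python
from collections import Counter

# Partial mappings read off the slide (t = tiny, s = small, l = large;
# b = basic, n = neutral). Other residues are not covered.
SIZE_CLASS = {"R": "l", "M": "l", "I": "l", "F": "l", "Y": "l", "L": "l",
              "S": "t", "G": "t", "P": "s"}
CHARGE_CLASS = {"R": "b", "M": "n", "I": "n", "S": "n", "P": "n",
                "F": "n", "Y": "n", "L": "n", "G": "n"}

def to_classes(peptide: str, mapping: dict) -> str:
    """Replace each residue by its class letter."""
    return "".join(mapping[residue] for residue in peptide)

def adjacent_pair_counts(class_string: str) -> Counter:
    """Count overlapping pairs of neighbouring class letters."""
    return Counter(class_string[i:i + 2] for i in range(len(class_string) - 1))

PEPTIDE = "RRMISRMPIFYLMSG"
size_string = to_classes(PEPTIDE, SIZE_CLASS)      # "lllltllsllllltt"
charge_string = to_classes(PEPTIDE, CHARGE_CLASS)  # "bbnnnbnnnnnnnnn"

class_counts = Counter(size_string) + Counter(charge_string)
size_pairs = adjacent_pair_counts(size_string)
charge_pairs = adjacent_pair_counts(charge_string)
print(class_counts["t"], class_counts["s"], class_counts["l"])  # 3 1 11
print(size_pairs["ll"], charge_pairs["nn"])  # 8 10
```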
36
Attribute representation 6
Amino-acid pair counts
Rationale: antibodies may bind in two places due to their two-chain structure.
Peptide: RRMISRMPIFYLMSG

Count of pairs at distance:  (R,R) at 1  (R,M) at 2  (R,I) at 3  ...  (A,C) at 1  (A,C) at 2  ...
Value:                       1           1           2                0           0

[Diagram: antibody binding a peptide at two positions; distances 1, 2 and 3 marked]
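A sketch of the pair-at-distance counts, assuming an attribute counts how often amino acid `a` is followed by amino acid `b` exactly `d` positions later. The maximum distance of 3 is an assumption taken from the examples visible on the slide.

```python
from collections import Counter

def pair_counts_at_distance(peptide: str, max_distance: int = 3) -> Counter:
    """Count (first, second, distance) triples for distances 1..max_distance."""
    counts = Counter()
    for i, first in enumerate(peptide):
        for distance in range(1, max_distance + 1):
            if i + distance < len(peptide):
                counts[(first, peptide[i + distance], distance)] += 1
    return counts

pairs = pair_counts_at_distance("RRMISRMPIFYLMSG")
print(pairs[("R", "R", 1)], pairs[("R", "M", 2)], pairs[("R", "I", 3)])  # 1 1 2
```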
38
Attribute representation 7
Amino acids at distances from the first + first amino acid
Rationale: antibodies may bind in two places; the first amino acid is the most accessible on the peptide array.
Peptide: R RMISRMPIFYLMSG

Count of at distance:  ...  R at 1  ...  M at 2  ...  A at 3  C at 3  ...  First
Value:                      1            1            0       0            R

[Diagram: antibody binding a peptide]
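A sketch of this representation, assuming one attribute per (amino acid, distance from the first residue) plus one attribute holding the first residue itself. Attributes that do not occur are treated as 0. Names are illustrative.

```python
def first_position_attributes(peptide: str) -> dict:
    """Record which amino acid sits at each distance from the first residue."""
    attributes = {"first": peptide[0]}
    for distance in range(1, len(peptide)):
        attributes[(peptide[distance], distance)] = 1
    return attributes

attributes = first_position_attributes("RRMISRMPIFYLMSG")
print(attributes["first"])            # R
print(attributes[("R", 1)])           # 1
print(attributes.get(("A", 3), 0))    # 0, since the residue at distance 3 is I
```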
40
Attribute representation 8
Average amino-acid properties
Peptide: RRMISRMPIFYLMSG

Property        Average value
Hydrophobicity  0.448
Size            0.596
Polarity        0.306
Flexibility     0.231
Accessibility   0.376
...
41
Attribute representation 9 (not used)
Amino-acid counts with a difference
RRMISRMPIFYLMSG
RRMISRMPIWYLMSG
Equivalent for epitope prediction?
Count F as: 1 F, 0.8 W, 0.4 Y, ...
Count W as: 1 W, 0.7 F, 0.3 Y, ...
43
Attribute representation 9 (not used)
Amino-acid substitution matrix

     A    C    D    ...  F    W    Y
A    1
C         1
D              1
...
F                        1    0.8  0.4
W                        0.7  1    0.3
Y                                  1

Optimize with a genetic algorithm to maximize classification accuracy
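The weighted counting behind "Count F as: 1 F, 0.8 W, 0.4 Y" can be sketched as follows. Only the F and W rows are filled in, using the weights visible on the slides; every other amino acid is assumed to count only itself with weight 1. The genetic-algorithm optimization of the matrix is not shown. Names are illustrative.

```python
# Rows of the substitution matrix that are visible on the slides.
SUBSTITUTION_WEIGHTS = {
    "F": {"F": 1.0, "W": 0.8, "Y": 0.4},
    "W": {"W": 1.0, "F": 0.7, "Y": 0.3},
}

def weighted_count(peptide: str, target: str, weights: dict) -> float:
    """Count `target`, letting similar residues contribute fractional counts."""
    row = weights.get(target, {target: 1.0})  # default: identity row (assumption)
    return sum(row.get(residue, 0.0) for residue in peptide)

original = "RRMISRMPIFYLMSG"
variant = "RRMISRMPIWYLMSG"  # same peptide with F replaced by W

for peptide in (original, variant):
    print(peptide,
          round(weighted_count(peptide, "F", SUBSTITUTION_WEIGHTS), 2),
          round(weighted_count(peptide, "W", SUBSTITUTION_WEIGHTS), 2))
```

With plain counts the two peptides differ by a full unit in both the F and the W attribute; with these weights the F attribute changes only from 1.4 to 1.2, which is the sense in which the two peptides become nearly equivalent.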
45
Results – training set

Attribute representation                   AUC     Accuracy
Amino-acid counts                          0.870   80.7 %
Amino-acid count differences               0.868   80.3 %
Subsequence counts                         0.867   80.5 %
Amino-acid class counts                    0.873   81.2 %
Amino-acid class subsequence counts        0.866   80.5 %
Amino-acid pair counts                     0.865   80.6 %
Amino acids at distances from the first    0.873   81.2 %
Average amino-acid properties              0.863   80.3 %
Combined                                   0.881   83.3 %
47
Results – test set

Attribute representation / dataset         AUC     Accuracy
Best single / training set (balanced)      0.873   81.2 %
Combined / training set (balanced)         0.881   83.3 %
Combined / test set (balanced)             0.883   83.7 %
Combined / test set (original)             0.884   85.9 %
EL-Manzalawy / test set (balanced)         0.868   82.0 %
EL-Manzalawy / test set (original)         0.874   83.9 %

Balanced: epitope : non-epitope = 1 : 1
Original: epitope : non-epitope = 1 : 3
State of the art: SVM + string kernel (EL-Manzalawy et al., 2008), trained and tested on our data.
50
Results – test set (AUC / accuracy)

Our results
Balanced: 0.883 / 83.7 %
Original: 0.884 / 85.9 %

EL-Manzalawy
Balanced: 0.868 / 82.0 %
Original: 0.874 / 83.9 %
51
3. Interpretation
52
Rules
Interpretable classifier:
  Interpretable attributes (frequencies, properties of amino acids)
  RIPPER (JRip) to induce rules

Property                  Low/high   Applies to peptides
Aromaticity               High       53.8 %
Polarity                  Low        27.7 %
Frequency of tyrosine     High       26.2 %
Hydrophobicity            Low        22.5 %
Frequency of arginine     High       19.7 %
Summary factor 2          High       16.7 %
Acidity                   Low        11.4 %
Preference for β-sheets   Low        4.3 %
Summary factor 5          High       3.0 %

How to read the first row: if a peptide has high aromaticity, it binds antibodies. This applies to 53.8 % of peptides that bind antibodies. (Aromaticity is the percentage of aromatic amino acids in the peptide.)
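To illustrate how a rule and its "applies to" percentage fit together, the sketch below computes aromaticity and the share of epitopes covered by a "high aromaticity" rule. Two things are assumptions: the aromatic set is taken to be F, W and Y (histidine is sometimes included elsewhere), and the 20 % threshold is invented, because the slides do not give the cut-off that RIPPER learned. The three peptides are the epitope examples from the peptide-array slide.

```python
AROMATIC = set("FWY")  # assumption: phenylalanine, tryptophan, tyrosine

def aromaticity(peptide: str) -> float:
    """Percentage of aromatic amino acids in the peptide."""
    return 100.0 * sum(residue in AROMATIC for residue in peptide) / len(peptide)

def rule_coverage(epitopes, threshold: float) -> float:
    """Percentage of epitope peptides for which 'aromaticity >= threshold' fires."""
    hits = sum(aromaticity(peptide) >= threshold for peptide in epitopes)
    return 100.0 * hits / len(epitopes)

EPITOPES = ["RRMISRMPIFYLMSG", "LPPGFKRFTCLSIPR", "EFSQMESYPEDYFPI"]

for peptide in EPITOPES:
    print(peptide, round(aromaticity(peptide), 1))

# Hypothetical threshold, for illustration only.
coverage = rule_coverage(EPITOPES, threshold=20.0)
print(round(coverage, 1))
```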
55
Epitope propensity
Frequency in peptides with epitopes, divided by frequency in peptides without epitopes
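The ratio can be sketched per amino acid, assuming "frequency" means the relative frequency of an amino acid among all residues of a group of peptides. The example uses the nine labeled peptides shown earlier in the deck, which is far too few for meaningful values. An amino acid that never occurs in the non-epitope group has no defined ratio, so the sketch returns `None` for it.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def frequencies(peptides) -> dict:
    """Relative frequency of each amino acid over all residues in the group."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {a: counts[a] / total for a in AMINO_ACIDS}

def epitope_propensity(epitopes, non_epitopes) -> dict:
    """Frequency in epitope peptides divided by frequency in non-epitope peptides."""
    in_epitopes = frequencies(epitopes)
    in_non_epitopes = frequencies(non_epitopes)
    return {
        a: (in_epitopes[a] / in_non_epitopes[a]) if in_non_epitopes[a] > 0 else None
        for a in AMINO_ACIDS
    }

EPITOPES = ["RRMISRMPIFYLMSG", "LPPGFKRFTCLSIPR", "EFSQMESYPEDYFPI"]
NON_EPITOPES = ["PGIGFPGPPGPKGDQ", "PNMVFIGGINCANGK", "DGIGGAMHKAMLMAQ",
                "REDNLTLDISKLKEQ", "TPLAGRGLAERASQQ", "DQVHPVDPYDLPPAG"]

propensity = epitope_propensity(EPITOPES, NON_EPITOPES)
print(propensity["Y"], propensity["G"])
```

A value above 1 means the amino acid is more frequent in epitopes than in non-epitopes; a value below 1 means the opposite.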
56
Epitope propensity
[Chart shown three times, highlighting in turn: aromatic, non-polar, tyrosine]
59
(Un)classifiable peptides
Simplified classifier:
  Interpretable attributes (frequencies, properties of amino acids)
  Logistic regression to train the classifier
Classifiable = classified correctly
Unclassifiable = classified incorrectly

Peptides          AUC     Accuracy
All               0.860   83.0 %
Classifiable      0.999   98.8 %    (expected)
Unclassifiable    0.956   91.5 %    (strange?)
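The split into classifiable and unclassifiable peptides can be sketched as below. The slides use logistic regression over interpretable attributes; the threshold rule here is only a hypothetical stand-in so that the example runs without a trained model. How the per-group AUC and accuracy were then obtained is not stated in the transcript and is not reproduced.

```python
def toy_classifier(peptide: str) -> str:
    """Stand-in classifier (hypothetical): at least 20 % F, W or Y means epitope."""
    aromatic_share = sum(residue in "FWY" for residue in peptide) / len(peptide)
    return "epitope" if aromatic_share >= 0.20 else "non-ep."

def split_by_correctness(examples, classify):
    """Separate examples the classifier gets right from those it gets wrong."""
    classifiable, unclassifiable = [], []
    for peptide, label in examples:
        if classify(peptide) == label:
            classifiable.append((peptide, label))
        else:
            unclassifiable.append((peptide, label))
    return classifiable, unclassifiable

EXAMPLES = [
    ("PGIGFPGPPGPKGDQ", "non-ep."),
    ("PNMVFIGGINCANGK", "non-ep."),
    ("DGIGGAMHKAMLMAQ", "non-ep."),
    ("RRMISRMPIFYLMSG", "epitope"),
    ("LPPGFKRFTCLSIPR", "epitope"),
    ("EFSQMESYPEDYFPI", "epitope"),
]

classifiable, unclassifiable = split_by_correctness(EXAMPLES, toy_classifier)
print(len(classifiable), len(unclassifiable))  # 4 2
```

Applying the same split again to the unclassifiable group alone gives the "2nd-degree" groups discussed later in the deck.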
62
(Un)classifiable – rules

                           Classifiable          Unclassifiable
Attribute                  L/h    Applies        L/h    Applies
Aromaticity                High   74.3 %         Low    53.3 %
Polarity                   Low    58.7 %         High   27.5 %
Frequency of arginine      High   31.5 %         Low    34.0 %
Frequency of tyrosine      High   20.7 %         Low    16.9 %
Summary factor 5           High   15.1 %         Low    15.2 %
Antigenicity               High   7.3 %          Low    8.7 %
Hydrophobicity             Low    4.7 %          High   6.5 %

Rows with an entry for one group only (the group is not recoverable from the flattened table):
Frequency of histidine         Low    3.9 %
Frequency of cysteine          Low    10.4 %
Preference for reverse turns   High   10.4 %
Occurrence in turns            Low    10.4 %
Frequency of alanine           High   8.7 %

For comparison, all peptides: aromaticity 53.8 %, polarity 27.7 %
64
(Un)classifiable – epitope propensity
65
(Un)classifiable peptides
Simplified classifier:
  Interpretable attributes (frequencies, properties of amino acids)
  Logistic regression to train the classifier

Peptides          AUC     Accuracy
All               0.860   83.0 %
Classifiable      0.999   98.8 %
Unclassifiable    0.956   91.5 %

Strange? Not really!
Inevitable, or does it mean something?
66
2nd-degree (un)classifiable peptides
Unclassifiable peptides only
Simplified classifier
Classifiable unclassifiable = classified correctly
Unclassifiable unclassifiable = classified incorrectly

Peptides                         AUC     Accuracy
All unclassifiable               0.956   91.5 %
Classifiable unclassifiable      0.992   97.8 %
Unclassifiable unclassifiable    0.683   65.0 %
69
2nd-degree (un)classifiable peptides

Peptides                         AUC     Accuracy
All unclassifiable               0.956   91.5 %
Classifiable unclassifiable      0.992   97.8 %
Unclassifiable unclassifiable    0.683   65.0 %

(Un)classifiable peptides

Peptides                         AUC     Accuracy
All                              0.860   83.0 %
Classifiable                     0.999   98.8 %
Unclassifiable                   0.956   91.5 %

Inevitable, or does it mean something? Not inevitable!
70
2nd-degree (un)classifiable – epitope propensity
71
Conclusions
Epitopes have common characteristics
– Epitopes are parts of antigens that bind antibodies
  Our peptides mostly did not come from known antigens
  Probably partly general and partly antibody-specific binding
Epitope characteristics are not unexpected
Two groups of epitopes:
– around 80 % “typical” (classifiable)
– around 20 % “atypical” (unclassifiable)
  Mostly general-purpose antibodies? Mostly antigen-specific antibodies?