Bioinformatics of Disease: immune epitope prediction

Shoba Ranganathan
Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia
Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Visiting: Institute for Infocomm Research (I2R), Singapore

Bioinformatics is …
Bioinformatics is the study of living systems through computation

Data in Bioinformatics (in the main) and their management and analysis
Sequences; Structures; Genomes; Transcriptomes; Genetics and populations; Networks, pathways and systems; Databases, ontologies; Data & text mining; Algorithms; Maths/Stats; Physics/Chemistry; Evolution and phylogenetics

Overview of my research
Genome analysis Transcriptome analysis Protein/Proteome analysis Systems Biology Immunoinformatics Genome-phenome mapping Biodiversity Informatics

5. What is Immunoinformatics?
Using Bioinformatics to address problems in Immunology Application of bioinformatics to accelerate immune system research has the potential to deliver vaccines and address immunotherapeutics. Computational systems biology of immune response

Immunoinformatics Immunology Computer Science Biology
Structural immunoinformatics is the convergence of three disciplines (immunology, computer science and structural biology) to facilitate research in immunology. Research in immunology provides insights into how the body responds to infection and disease, while structural biology provides knowledge of how the various receptor-ligand systems interact. Computer science, in turn, provides an effective means to store and analyze large volumes of complex data. Combining the three fields increases the efficiency of biological research and offers the potential for major advances in the study of biological systems. This research focuses on an important component of the immune system: the T cell-mediated adaptive immune response.

[Figure: IMMUNOINFORMATICS at the centre, surrounded by Basic immunology; Clinical immunology; -omics; Networks, pathways, and systems; Artificial intelligence; Cell biology; Physics/Chemistry; Databases; Algorithms; Maths/Stats]

Disease alleviation Genome screening - marker detection
Proteomics/genomics of diseased state Sequence analysis of antigens/markers Structure analysis of antigens T cell epitope analysis Antibody epitope analysis Vaccine design

Outline
Introduction; Structural Immunoinformatic Database development; Data Analysis; Computational models; Applications; Summary
Related areas: Basic immunology; Clinical immunology; -omics; Genetics and populations; Networks, pathways and systems

The immune system Composed of many interdependent cell types, organs, and tissues to protect the body from infections (bacterial, parasitic, fungal, or viral) and arrest abnormal growth and differentiation Inappropriate immune responses lead to allergies and autoimmunity 2nd most complex system in the human body

Genomics vs. Immunomics
Genomics: solving the genome puzzle; 10^4 genes coding for 10^6 products. Immunomics: understanding immune response genes leading to >10^12 products. Enormous diversity in immunomics has implications for immune function and modulation

It is a numbers game….
>10^13 MHC class I haplotypes (IMGT-HLA); T cell receptors (Arstila et al., 1999); >10^9 combinatorial antibodies (Jerne, 1993); 10^12 B cell clonotypes (Jerne, 1993); 10^11 linear epitopes composed of nine amino acids; >>10^11 conformational epitopes
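As a quick check on the order of magnitude quoted for linear epitopes of nine amino acids, the count can be worked out directly. This is only a sketch, and it assumes the 20 standard amino acids with no restriction on which residues may occur at each position.

```python
# Count all possible linear peptides of nine residues.
# Assumption: 20 standard amino acids, every position unrestricted.
AMINO_ACIDS = 20
PEPTIDE_LENGTH = 9

n_linear_nonamers = AMINO_ACIDS ** PEPTIDE_LENGTH

# Scientific notation makes the order of magnitude (10^11) easy to read.
print(f"{n_linear_nonamers:.2e}")
```

The result, 20^9 = 5.12 x 10^11, agrees with the order of magnitude given on the slide.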

T cell mediated adaptive immune response
Specific peptide residues critical for stimulating cellular immune responses Major histocompatibility complex (MHC) molecules (Human Leukocyte Antigen or HLA in humans) bind and present short antigenic peptides to T cell receptors, for inspection Antigen presentation is by two classes of MHC (class I and class II) Those peptides that bind to specific MHC and trigger T cell recognition (T cell epitopes) are targets for vaccine and immunotherapy development

How to generate a T cell-mediated immune response
3. T cell receptor 2. MHC 1. Epitope

Major histocompatibility complex
Gene structure of the human MHC 3D structure of the human MHC MHC Class II MHC Class I

MHC Class I for endogenous peptides
Figure by Eric A.J. Reits

MHC class II for exogenous peptides
Figure by Eric A.J. Reits

Yewdell et al. Ann. Rev Immunol (1999)
Antigen processing pathway: peptides, MHC, T-cells
Degradation of antigen (20% processed); Peptide binding to MHC (0.5% bind MHC); Recognition of peptide-MHC complex by T-cells (50% CTL response); overall, a 0.05% chance of immunogenicity. Yewdell et al. Ann. Rev Immunol (1999)
The major histocompatibility complex (MHC) is a series of genes on chromosome 6 that code for cell surface proteins which control the adaptive immune response. There are two major groups, the class I and class II MHC genes, which encode the human leukocyte antigens (HLA) and perform antigen presentation. This diagram illustrates the antigen processing pathway. Of particular importance are the binding of antigenic peptides to MHC and the presentation of MHC-peptide complexes to T-cells. On recognizing an MHC-peptide complex, T-cells secrete cytokines, which stimulate the proliferation of T-cells, B-cells and macrophages and the production of antibodies by B-cells. How well a peptide binds to the MHC is determined by its binding affinity, represented by the dissociation constant $K_d$. Many experimental techniques are used to estimate $K_d$, such as the competitive binding assay (IC50) and the peptide binding stabilization assay (BL50). In addition to the binding affinity, the stability of the complex is reflected by the Gibbs free energy, which is related to the dissociation constant by $\Delta G = RT \ln K_d$, where $R$ is the gas constant (the Boltzmann constant $k$ per mole, $R = N_A k$) and $T$ is the absolute temperature. The Boltzmann constant, named after Ludwig Boltzmann, is the physical constant relating temperature to energy; its value is $k \approx 1.38 \times 10^{-23}$ J K$^{-1}$.
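The two quantitative points on this slide can be sketched in a few lines: converting a dissociation constant into a binding free energy, and multiplying the stage-wise fractions of the processing pathway. The temperature (298.15 K), the example Kd of 500 nM and the function name are assumptions chosen for illustration, and the free energy relation uses the molar gas constant.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1 (molar gas constant)

def delta_g_kj_per_mol(kd_molar, temperature_k=298.15):
    """Binding free energy from a dissociation constant: dG = RT ln(Kd).

    kd_molar is Kd in mol/L; the default temperature is an assumption.
    """
    return GAS_CONSTANT * temperature_k * math.log(kd_molar) / 1000.0

# Illustrative Kd of 500 nM; tighter binding (smaller Kd) gives a more negative dG.
print(round(delta_g_kj_per_mol(500e-9), 1))

# Stage-wise fractions from the pathway figure, treated here as successive
# filters: processed, then bound to MHC, then eliciting a CTL response.
p_processed, p_binds_mhc, p_ctl_response = 0.20, 0.005, 0.50
p_immunogenic = p_processed * p_binds_mhc * p_ctl_response
print(f"{p_immunogenic:.2%}")
```

Treating the three percentages as successive filters reproduces the overall 0.05% figure quoted on the slide.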

Physico-chemical properties affect MHC-peptide binding
Many factors affect MHC-peptide binding, and some of them are shown in this diagram. Physically, five or more factors affect the MHC-peptide interaction: residue size, residue position, the orientation of side-chains, peptide length and peptide backbone conformation. Chemically, four or more factors affect binding: the chemical properties of the amino acids, the overall chemistry of the binding groove, the overall chemistry of the peptide and the chemistry of the environment. Other factors include the presence or absence of anchor residues and uncertainty about which peptide residues form the core within the binding groove.

Epitope prediction ≡ “Fishing”

Computational models can help identify T cell epitopes
Suggest candidate epitopes by in silico screening of entire proteins and even proteomes with specificity at: the allele level the supertype level disease-implicated alleles alone. Minimize the number of wet-lab experiments Cut down the lead time involved in epitope discovery and vaccine design

Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8: 96-108
Predicting MHC-binding peptides. Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8: 96-108
Sequence-based approach: pattern recognition techniques (binding motifs, matrices, ANN, HMM, SVM). Main limitations: require large amounts of data for training; preclude data with limited sequence conservation
Structure-based approach: rigid backbone modeling techniques; flexible docking techniques. Main advantage: large training datasets unnecessary

Our aim: Structure-based prediction of MHC-binding peptides

Why structure?
Great potential to: generate biologically meaningful data for analysis; predict candidate peptides for alleles that have not been widely studied, where sequence-based approaches fail or are not attempted; predict binding affinity of peptides; predict non-contiguous epitopes. Structure determination through experimental methods is both expensive and time-consuming. Structure-based prediction has not been extensively studied due to high computational costs and development complexity.
There are several reasons why structure-based prediction is adopted in this research. First, I am interested in understanding the molecular basis of what constitutes a ‘binding peptide’ and how it differs from non-binding peptides; in particular, what the selection mechanism is and how it differs among alleles. This approach can generate biologically meaningful data for analysis, such as the interacting residues and the important bonds and contacts. To date, structure-based approaches have not been extensively studied because the techniques are complex to develop and computationally slow, so one of my aims is to investigate how well structure-based prediction can be applied in this context. Second, structure-based prediction offers the potential to predict candidate peptides for alleles that have not been widely studied, which is particularly attractive when too little data is available to train machine-learning techniques. Third, structure-based prediction offers the potential to predict the absolute binding affinity of peptides. Lastly, determining crystal structures experimentally is very expensive and time-consuming, and structure-based prediction can provide a reliable estimate of the structure when a high-quality template is available.

Existing Structure-based Prediction Techniques
Protein Threading [Altuvia et al. 1995; Schueler-Furman et al. 2000]; Homology Modeling [Michielin et al. 2000]; Rigid/Flexible Docking [Rosenfeld et al. 1993; Sezerman et al. 1996; Rognan et al. 1999; Desmet et al. 2000; Michielin et al. 2003]
In general, structure-based techniques can be broadly classified into three categories: protein threading, homology modeling and docking. (1) Protein threading substitutes the amino acid sequence of a known peptide bound to a given MHC with that of the target peptide while retaining the backbone conformation. (2) In homology modeling, the amino acid sequence of a peptide is adapted to the structure of a homologous protein with known 3D structure. (3) Docking attempts to find the best-fit conformation of the peptide within the binding groove. There are two types of docking technique: rigid docking does not consider the flexibility of the ligand and receptor, while flexible docking considers the flexibility of the ligand, the receptor, or both.

Will existing structure-based techniques suffice?
Quality of predicted structures: Protein Threading, Homology Modeling and Rigid Docking cannot handle peptide flexibility; available flexible docking techniques have poor accuracy and are too slow. Usability of models to predict binding: existing free energy scoring functions have been tested only on small datasets and correlate poorly with experimental data.
Two issues must be addressed for structure-based prediction: the quality of the generated models, and whether these models can be successfully applied to predict binding. Existing structure-based prediction techniques have difficulty with both. On the first issue, the main problem faced by protein threading, homology modeling and rigid docking is that they cannot handle peptide flexibility. This problem is critical because MHC-bound peptides are generally short, so an inaccurate structure has a large impact on the usability of the modeled structure. In addition, existing flexible docking techniques are mostly too slow and too inaccurate for practical application. On the second issue, the majority of existing free energy scoring functions have been tested only on a small set of up to 5 MHC-peptide complexes, and their applicability to modeled structures is hypothetical. In addition, none of them correlated well with experimental binding data.

Hypothesis for epitope selection
Peptides bound to MHC alleles are similar to substrates bound to enzymes “Lock-and-key” mechanism for peptide selection Shape Size Electrostatic characteristics

Structural Immunoinformatic Database development Data Analysis
Databases, ontologies Introduction Structural Immunoinformatic Database development Data Analysis Computational models Applications Sequences Structures Genetics and populations Basic immunology

RDB of 82 curated pMHC complexes (Class I: 64 & Class II:18)
MPID: MHC-Peptide Interaction Database. Govindarajan et al. (2003) Bioinformatics, 19:
RDB of 82 curated pMHC complexes (Class I: 64 & Class II: 18)

Peptide/MHC interaction characteristics
Length; Gap volume; Interface area; Interacting residues; Intermolecular hydrogen bonds; Gap index:
$\text{Gap index} = \dfrac{\text{Gap volume}}{\text{Interface area}}$
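The gap index is the ratio of the gap volume to the interface area, so it can be checked with a one-line function. The sketch below uses the HLA-A2 superfamily means reported later in this deck (gap volume 799.8 Å^3, interface area 846.3 Å^2); the function name is an assumption made for illustration. Note that a ratio of means need not equal the mean of per-complex ratios, so this only approximates the tabulated value.

```python
def gap_index(gap_volume_a3, interface_area_a2):
    """Gap index (in Å) = gap volume (Å^3) / interface area (Å^2)."""
    if interface_area_a2 <= 0:
        raise ValueError("interface area must be positive")
    return gap_volume_a3 / interface_area_a2

# HLA-A2 superfamily means taken from the superfamily comparison in this deck.
a2_gap_index = gap_index(799.8, 846.3)
print(round(a2_gap_index, 2))
```

A smaller gap index indicates a tighter fit between peptide and groove for a given interface area.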

MPID-T: MHC-Peptide-T Cell Receptor Interaction Database
Tong et al. (2006) Applied Bioinformatics, 5:
187 curated pMHC complexes, 16 with TCR; Human: 110, Murine: 74 and Rat: 3; Alleles: 40; Interaction parameters: interface area, H bonds, gap volume and gap index

Distribution of MHC by allele
101 new entries 187 entries (Human: 110; Murine: 74; Rat: 3) 134 non-redundant entries (class I: 100; class II: 34) 121 class I and 41 class II entries 26 HLA alleles (class I: 18; class II: 8) 14 rodent alleles (class I: 8; class II: 6) 16 TCR/peptide/MHC complexes

Peptide/MHC binding motifs
Polar Amide Basic Acidic Hydrophobic Conserved peptide properties in solution structures Classified according to Alleles Peptide length

How to obtain structures of experimentally unsolved alleles?
In 2006 there were crystal structures for only 36 unique MHC alleles, compared with the unique MHC alleles identified in the IMGT/HLA database. Structure determination through experimental methods is both expensive and time-consuming. Solution: homology model building for alleles with no structural data!

Structural Immunoinformatic Database development
Structures Introduction Structural Immunoinformatic Database development Data Analysis of pMHC Class I complexes Computational models Applications Data & text mining Maths/Stats

MHC Class I superfamilies have different interaction characteristics
Single linkage cluster analysis of 68 pMHC Class I complexes from 13 alleles (all available A and B)

Superfamily          HLA-A2 (36 entries)                            HLA-B7 (12 entries)           HLA-B27 (18 entries)
Interface area (Å²)  846.3±48.9                                     876.7±72.4                    934.0±136.0
Gap volume (Å³)      799.8±195.2                                    870.2±198.0                   985.1±101.5
Gap index            0.9±0.2                                        1.0±0.1                       1.0±0.3
Hydrogen bonds       11.1±1.9 (concentrated at pockets A, B, F)     14.3±2.3 (well distributed)   17.9±2.8

Data: 68 peptide–HLA complexes spanning 13 class I alleles from MPID-T.
Hierarchical clustering: agglomerative algorithm. Distances between structures were computed by the single-linkage method (MATLAB version 7.0), based on the separation between each pair of data points. Nearest neighbors were merged into clusters; smaller clusters were then merged into larger clusters based on inter-cluster distances, until all structures were combined. The last 3 levels were considered for defining HLA class I supertypes.
Interaction parameters significant for the characterization of the peptide/MHC interface: intermolecular hydrogen bonds, pMHC interface area, gap volume and gap index. The binding characteristics of the HLA supertypes were then analyzed.
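The agglomerative, single-linkage procedure described here can be sketched in plain Python. This is not the MATLAB analysis used in the study: the descriptor vectors below are made-up toy points, the Euclidean metric is an assumption, and the function name is hypothetical.

```python
import math

def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two clusters
    whose closest members are nearest, until n_clusters remain.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)

# Toy descriptor vectors (hypothetical, not data from MPID-T).
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(single_linkage(toy, 3))
```

In practice the descriptors (hydrogen bonds, interface area, gap volume, gap index) sit on very different scales, so some normalization would likely be needed before computing distances; that choice is not stated in the source.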

Do the Class I alleles aggregate into “superfamilies” using receptor-ligand interaction patterns?
[Figure: cluster tree with groups labeled B27, B44, B7, B62 and B8.] Legend: B7, black; B27, green; B44, orange; B62, blue; Outlier/B8, red.

MHC Class I superfamilies from receptor-ligand interactions
80 HLA class I complexes; 13 class I alleles; five descriptors; hierarchical clustering using nearest neighbor algorithm; 77% consensus with data from other groups. Supertype definition: receptor structure, ligand binding motifs, or receptor-ligand interaction patterns. Tong, Tan and Ranganathan (2007) Bioinformatics, 23:
[Figure: cluster tree with groups labeled B27, B44, B7, B62 and B8.] Legend: B7, black; B27, green; B44, orange; B62, blue; Outlier/B8, red.

Structural Immunoinformatic Database development Data Analysis
Sequences Structures Introduction Structural Immunoinformatic Database development Data Analysis Computational models Applications Physics/ Chemistry Maths/Stats

Two-step approach to predict MHC-binding peptides
Finding the best fit conformation (docking) of peptides within the MHC binding groove Screening potential binders from the background

>10^10 possible conformations for a 10-residue peptide
Docking is a computationally exhaustive procedure. Large number of possible peptide conformations: 3 global translational degrees of freedom; 3 global rotational degrees of freedom; 1 conformational degree of freedom for each rotatable bond. >10^10 possible conformations for a 10-residue peptide.
[Figure: peptide backbone (N, Cα, C, O, side chain R) in an x, y, z frame, showing the φ and ψ torsion angles.] φ, ψ defines a space of (2k)^n possible conformations for a protein with n amino acids (assuming each phi and psi pair is allowed to assume k distinct values).

Conservation of nonamer peptide backbone conformation
Class I peptides N-termini residues 0.02 – 0.29 Å C-termini residues 0.00 – 0.25 Å Class II binding registers Only 9 residues fit in the binding groove 0.01 – 0.22 Å 0.02 – 0.27 Å

Rapid docking of peptide to MHC
Tong, Tan & Ranganathan (2004) Protein Sci. 13:
1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking)
2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints)
3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains)

Benchmarking with experimental structural data
Docking peptides from 40 non-redundant complexes back into the original allele structure. 85% (34/40) of peptides within RMSD of 1.00 Å from experimental structure; min 0.09 Å, max 1.53 Å
[Figure: original peptide docked into original allele]
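The benchmark is reported as a backbone RMSD between docked and experimental peptides. A minimal sketch of that measure follows; it assumes both coordinate sets are already in the same reference frame with atoms listed in the same order, and the coordinates and function name are invented for illustration.

```python
import math

def backbone_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equally long lists of
    (x, y, z) atom coordinates, with no superposition step."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: every atom of the docked model is displaced by 1 Å along x.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in experimental]
print(backbone_rmsd(experimental, docked))

# Fraction of benchmark peptides within the 1.00 Å cutoff (34 of 40).
print(34 / 40)
```

A uniform 1 Å shift gives an RMSD of exactly 1 Å, which sits right at the cutoff used in the benchmark.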

Benchmarking using a single template
Docking peptides from 13 non-redundant complexes into a single template structure. 77% (10/13) of peptides within RMSD of 1.00 Å from experimental structure; min 0.38 Å, max 1.48 Å
[Figure: original peptide docked into a single template]

Benchmarking with existing techniques
Columns: Author | Technique | Peptide | RMSDa | RMSDb (values per peptide listed in source order)
Rognan et al. | Simulated Annealing
  TLTSCNTSV: 1.04, 0.46
  FLPSDFFPSV: 1.59, 1.10
  GILGFVFTL: 0.32
  ILKEPVHGV: 0.87
  LLFGYPVYV: 0.78, 0.33
Desmet et al. | Combinatorial Buildup Algorithm
  RGYVYQGL: 0.56
Rosenfeld et al. | Multiple Copy Algorithm
  FAPGNYPAL: 2.70, 0.40, 1.40
Sezerman et al.
  ILKGPVHGV: 1.30, 1.60, 2.20
a RMSD of peptide backbone obtained from respective authors.
b RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.

Quantitative separation of binders from non-binders: empirical free energy scoring function
Gbind = αGH + βGS + GEL + C Gbind = binding free energy GH = hydrophobic term GS = decrease in side chain entropy GEL = electrostatic term C = entropy change in system due to external factors α, β, γ optimized by least-square multivariate regression with experimental binding affinities (IC50) of MHC-peptides in training dataset (Rognan et al., 1999)

Test case: MHC Class II DQ8
DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases: celiac disease; insulin-dependent diabetes mellitus (IDDM); IDDM-associated periodontal disease; autoimmune polyendocrine syndrome type II

Data used Structure: 1JK8 - DQ3.2β–insulin B9-23 complex
Dataset I: 127 peptides with experimentally determined IC50 values [70 high-affinity (IC50 < 500 nM), 13 medium-affinity (500 nM < IC50 < 1500 nM) and 23 low-affinity (1500 nM < IC50 < 5000 nM) binders and 21 non-binders (IC50 > 5000 nM)] derived from biochemical studies; 87 with known binding registers.
Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, with 7 peptides eliciting DQ3.2β-restricted T-cell proliferation.
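The affinity classes used for Dataset I follow fixed IC50 cutoffs, so the binning can be sketched directly. The treatment of values exactly on a boundary (assigned to the stronger class) is an assumption, as is the function name.

```python
def classify_ic50(ic50_nm):
    """Assign an affinity class from an IC50 value in nM.

    Cutoffs: high <= 500, medium <= 1500, low <= 5000, otherwise non-binder.
    Boundary values go to the stronger class (an assumption).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nm <= 500:
        return "high"
    if ic50_nm <= 1500:
        return "medium"
    if ic50_nm <= 5000:
        return "low"
    return "non-binder"

# Illustrative IC50 values (not taken from the dataset).
for value in (120, 900, 3200, 20000):
    print(value, classify_ic50(value))

# The class sizes reported for Dataset I add up to the 127 peptides.
print(70 + 13 + 23 + 21)
```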

Scoring: Training & testing datasets
Training: 56 binding conformations with known registers; 30 non-binding conformations from 3 non-binders
Testing: Test set 1: 68 peptides from biochemical studies (16 strong; 13 medium; 21 weak; 18 non-binders). Test set 2: 12 peptides from functional studies (7 elicit T-cell proliferation)

Screening class II binding register: a sliding window approach
E285B peptide: Y Q T I E E N I K I F E E D A

Core sequence   Binding energy
YQTIEENIK       -23.12
QTIEENIKI       -21.34
TIEENIKIF       -25.32
IEENIKIFE       -29.53
EENIKIFEE       -32.27
ENIKIFEED       -21.72
NIKIFEEDA       -22.95
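The sliding-window screen over the E285B peptide can be reproduced in a few lines: enumerate every 9-residue core, attach its binding energy, and pick the most favourable register. The energies are the values listed on the slide; treating the lowest (most negative) energy as the preferred register is an assumption, and the function name is hypothetical.

```python
def binding_registers(peptide, core_length=9):
    """All contiguous cores of core_length residues, from N- to C-terminus."""
    return [peptide[i:i + core_length]
            for i in range(len(peptide) - core_length + 1)]

E285B = "YQTIEENIKIFEEDA"
cores = binding_registers(E285B)

# Binding energies as listed on the slide, in the same N-to-C order.
energies = [-23.12, -21.34, -25.32, -29.53, -32.27, -21.72, -22.95]
scored = dict(zip(cores, energies))

# Assumption: the most negative energy marks the preferred register.
best_core = min(scored, key=scored.get)
print(len(cores), best_core, scored[best_core])
```

A 15-residue peptide yields seven candidate registers, matching the seven rows on the slide.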

Training and test sets Training of the DQ3.2β prediction model was performed by sampling the bound conformations of binding peptides with experimentally determined registers that can be recognized by MHC, and the best conformations of non-binding peptides without any preferred register in the binding groove. Dataset I was divided into training and test datasets. Training set: 59 peptides with 56 binding conformations with known registers and 30 non-binding conformations generated from the 3 non-binding peptides without any binding registers. Test set 1: 68 peptides (the rest of Dataset I) with experimental IC50 values (16 high-affinity binders, 13 medium affinity binders, 21 low affinity binders and 18 non-binders) from biochemical studies (with 31 binding registers) and Test set 2: all 12 peptides from Dataset II, with known T-cell proliferation values.

Binding energy determination
ICM software (Abagyan and Totrov, 1999)
Hydrophobic energy: computed as the product of solvent accessible surface area
Entropic contribution from the protein side-chains: computed from the maximal burial entropies for each type of amino acid and their relative accessibilities
Electrostatic term: composed of receptor-ligand coulombic interactions and the desolvation of partial charges transferred from an aqueous medium to a protein core environment; numeric solution of the Poisson equation using an implementation of the boundary element algorithm
Constant term: entropy change in the system due to the decrease of free molecular concentration and the loss of rotational/translational degrees of freedom upon binding

4-step docking protocol used
A. Anchoring root fragments (probes) to reduce search space
B. Loop modeling
C. Refinement of binding register
D. Extension of flanking residues for MHC Class II

Parameters optimized
Default ICM coefficients (α=β=γ=1; C=0) resulted in poor correlation (r² = 0.43, s = 2.91 kJ/mol). The optimal scoring function, after 10-fold cross-validation (q² = 0.85, s_press = 2.20 kJ/mol):

Accuracy estimates
Sensitivity (SE), specificity (SP) and receiver operating characteristic (ROC) analysis. Fraction of binders correctly predicted: SE = TP/(TP+FN); fraction of non-binders correctly predicted: SP = TN/(TN+FP). The ROC curve is generated by plotting SE as a function of (1-SP) for various classification thresholds. The area under the ROC curve (AROC) provides a measure of overall prediction accuracy: AROC<70% is poor, AROC>80% good and AROC>90% excellent. We consider values of SP≥80% useful in practice and assessed SE for three values of SP (80%, 90% and 95%).
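The accuracy measures defined here can be sketched as follows. Sensitivity and specificity come straight from the confusion counts; the area under the ROC curve is computed with the rank (Mann-Whitney) formulation, which equals the area obtained by sweeping a threshold. The scores are invented for illustration, and the convention that a lower (more negative) score means a predicted binder is an assumption.

```python
def sensitivity(tp, fn):
    """SE = TP / (TP + FN): fraction of true binders predicted as binders."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """SP = TN / (TN + FP): fraction of non-binders predicted as non-binders."""
    return tn / (tn + fp)

def roc_auc(binder_scores, nonbinder_scores):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (binder, non-binder) pairs in which the binder
    has the lower score; ties count as half.
    """
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b < n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))

# Hypothetical predicted binding energies (illustrative only).
binders = [-30.0, -28.0, -20.0]
nonbinders = [-25.0, -15.0]
print(sensitivity(8, 2), specificity(9, 1), roc_auc(binders, nonbinders))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders.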

Accuracy estimates
Sensitivity (SE) = fraction of binders correctly predicted = TP/AP, where AP = TP+FN. Specificity (SP) = fraction of non-binders correctly predicted = TN/AN, where AN = TN+FP. Area under the ROC (receiver operating characteristic) curve: >90% excellent; >80% good

Results for Training set
High SE (good for most predictions) Very few FPs, but also fewer predictions

Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set I
Group LMH MH H AROC 0.88 0.93
Classification of binding peptides: High-affinity binders (H): IC50 ≤ 500 nM; Medium-affinity binders (M): 500 nM < IC50 ≤ 1500 nM; Low-affinity binders (L): 1500 nM < IC50 ≤ 5000 nM

Test Set 1: Improved detection of binders lacking position specific binding motifs

Binding registers T-cell proliferation 20/23 (87%) binding registers
Only register (aa 4-12) from Test Set 2 (Der p 2: 1-20) (SE=0.80; SP(LMH)=0.90) Top 5 predictions are experimental positives at very stringent threshold criteria (SE=0.95; SP(H)=0.63) T-cell proliferation

Multiple registers (SP=0.95, SE(LMHP =0.81): 58% of Test Set 1)
Mainly for medium and high binders Experimental support: Sinha et al. for DRB1*0402 Is this why binding motifs are unsuccessful?

Introduction Structural Immunoinformatic Database development Data Analysis Computational models developed Applications

Pemphigus vulgaris (PV)
adam.about.com Autoimmune blistering skin disorder Characterized by autoantibodies targeting desmoglein-3 (Dsg3) Strong association with DR4 and DR6 alleles

Who are the major players in PV?
DR4 PV implicated alleles (for Semitic) DRB1*0401 DRB1*0402 DRB1*0404 DRB1*0406 DR6 PV implicated alleles (for Caucasians) DRB1*1401 DRB1*1404 DRB1*1405 DQB1*0503

What is known about DR4? DR4 PV implicated alleles (DRB1*0401, *0402, *0404, *0406) High sequence conservation 97.9 – 99.0% identity 98.4 – 99.5% similarity High structural conservation Cα RMSD <0.22 Å for all key binding pockets 7 polymorphic residues within binding cleft Pocket 1 (β86), Pocket 4 (β70, 71, 74) Pocket 6 (β11) Pocket 7 (β71) Pocket 9 (β37)

What is known about DR6? DR6 PV implicated alleles
(DRB1*1401, *1404, *1405, DQB1*0503) High sequence conservation 85.8 – 94.1% identity 83.2 – 97.3% similarity High structural conservation Cα RMSD <0.22 Å for all key binding pockets 14 polymorphic residues within binding clefts Pocket 1 (β86) Pocket 4 (β13, 70, 71, 74, 78) Pocket 6 (β11) Pocket 7 (β28, 30, 67, 71) Pocket 9 (β9, 37, 57, 60)

Clues… 9 stimulatory Dsg3 peptides tested on PV patients possessing DR4 and DR6 PV implicated alleles Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4, DR6) Dsg (DR4) Dsg (DR4)

Disease associated alleles vs. innocent bystanders
DR4 PV: 8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402; atomic clashes with all other investigated DR4 subtypes. DR6 PV: 6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503; atomic clashes with all other investigated DR6 subtypes. HLA association in DR6 PV more likely to be at DQ than DR locus. Consistent with experimental work done by Sinha et al. (2002, 2005, 2006). Tong et al. (2006) Immunome Research, 2: 1

Whither sequence motifs (again!)?
1/9 investigated Dsg3 peptides fits existing binding motifs Flanking residues – clashes in fitting binding register Register-shift for Peptide V (Dsg ) Detected binding register: Dsg Binding motifs: Dsg (Veldman et al., 2003) : Dsg (Sinha et al., 2006) Veldman P1 hydrophobic P4 positive P6 small, hydrophobic or hydrophilic Sinha P4 positive, neutral P6 large, small

Large-scale screening of Dsg3 peptides
Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7
Docking of 15-mer Dsg3 peptides generated using a sliding window of size 15 across the entire Dsg3 glycoprotein
[Figure: Dsg3 peptide (sliding window width 15), N to C, containing a binding register (sliding window width 9) with flanking residues]
Training set: 8 peptides each, with exp. IC50 values and known binding registers (5 binders and 3 non-binders)

Large-scale screening of Dsg3 peptides

Common epitopes possibly responsible for inducing disease in DR4 & DR6 patients
Significant level of cross reactivity observed between DRB1*0402 and DQB1*0503 (AROC=0.93). 57% of peptides investigated in this study predicted to bind to both alleles with high affinity. 90% of known Dsg3 peptides predicted to bind to both alleles. 12/20 top predicted DQB1*0503-specific Dsg3 peptides from transmembrane region. All top predicted DRB1*0402-specific Dsg3 peptides from extracellular regions. Disease initiation implications: DR4 from ECD; DR6 from TM

Multiple binding registers revisited
76% (410/539) predicted high-affinity binders to DRB1*0402 possess > 2 binding registers 57% (384/673) predicted high-affinity binders to DQB1*0503 possess > 2 binding registers 66% (354/539) bind both alleles at different registers Similar proportion (70%) detected in known binders to both alleles Both alleles bind similar peptides via different binding registers

What next? We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values. The model yields excellent results for test data (AROC=0.93). Application to determine immunological hot spots for HIV-1 p24gag and gp160gag glycoproteins shows binding energies similar to HLA-A and –B.

Conclusions Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data. While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.

1. Genome analysis Approaches EST analysis
Annotation pipeline using workflow strategies Applications Parasitic nematodes Cancer EST data Outcomes Comprehensive annotation at the gene and protein levels Novel &/or pathogen-specific genes Immune response evasion strategies

2. Transcriptome analysis
Approaches Graph formalism for alternative splicing Genome-wide analysis Applications Drosophila genome Chicken compared to human and mouse Kallikrein variants as markers Outcomes New mRNA-gDNA alignment method, MGAlign & MGAlignIt First splicing graph database, DEDB Web server for splicing graphs, ASGS Sub-graph elements for alternative splicing Multi-species splicing graph database, GraphDB

3. Protein/Proteome research: Origin and evolution of structural domains
Approaches Intron mapping to domain boundary All eukaryotic proteins analyzed Applications Domain prediction in EST/genome data Effect of splice variants on domains Outcomes New database of protein coding genes, XPro Visualization of intronic locations on protein structural domains, XDomView Analysis tool, Go Module Viewer

3. Protein/Proteome research: Small disulfide-rich proteins
<100 aa per domain; ≥ 2 SS bonds Approaches Multiple structure alignment and hierarchical classification Comparative modeling rules Sequence, structure and evolutionary analysis of Potato II inhibitor family Outcomes New database, DSFD Server for model building, SDPMOD Understanding of wound-induced protease inhibitor folding Applications Design of protease inhibitors, channel modulators, growth regulators

3. Protein/Proteome research: Protease cleavage site prediction
Approaches Detailed structural modeling and docking of signal peptide moiety to signal peptidase I SVM for caspases Applications Enhanced production of therapeutic and commercial heterologous proteins Apoptosis initiation Outcomes New databases, SPdb, CasBase Server for caspase cleavage prediction, CASVM Signal peptide cleavage prediction (under development)

4. Systems Biology Approaches
Holistic computational, molecular biology and FRET study to locate secretion roadblocks EST analysis of host-parasite interactions Applications Trichoderma reesei as fungal bioreactor Parasites that lead to: liver cancer - food borne trematode (Opisthorchis viverrini) and bladder cancer (Schistosoma haematobium). Outcomes Improved heterologous protein production using filamentous fungi Understanding of how parasites evade host immune activation

6. Genome-Phenome mapping
Approaches Mutation data for non-laboratory animals Mapping to OMIM Mapping to structure Applications OMIA-OMIM mapping to structure Correlation between genotype and disease phenotype Outcomes OMIA database, with links to OMIM (courtesy NCBI) Mutations linked to severity of disease for α-D-mannosidosis Predictions of new human disease mutations from known mutation sites in cow, cat and guinea pig

7. Biodiversity Informatics: Customary medicinal plants
Approaches Integrating, visualizing and analyzing ethnobotanical, phytochemical and pharmacological data on customary medicinal plants Data from Australian aboriginal elders and Indian Siddha doctors Applications Novel antimicrobial, anti-inflammatory and anti-cancer lead compounds Outcomes CMkb, an integrated knowledgebase

Dedications Prof. Bernard Pullman Mme. Alberte Pullman
My brother, a CML survivor

Acknowledgements Dr. (Victor) J.C. Tong, NUS&I2R, Singapore
A/Prof. Tin Wee Tan, NUS Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN C). All of you!
