Structural Modelling and Bioinformatics in Drug Discovery and Infectious Disease Shoba Ranganathan Professor and Chair – Bioinformatics Dept. of Chemistry.

Presentation transcript:

Structural Modelling and Bioinformatics in Drug Discovery and Infectious Disease. Shoba Ranganathan. Professor and Chair – Bioinformatics, Dept. of Chemistry and Biomolecular Sciences & Biotechnology Research Institute, Macquarie University, Sydney, Australia. Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore. Visiting: Institute for Infocomm Research (I2R), Singapore.

Bioinformatics is… the study of living systems through computation.

Data in Bioinformatics (in the main) and their management and analysis: sequences; genomes; transcriptomes; structures; networks, pathways and systems; databases, ontologies; data & text mining; evolution and phylogenetics; genetics and populations; maths/stats; algorithms; physics/chemistry.

What is Immunoinformatics?  Using Bioinformatics to address problems in Immunology  Application of bioinformatics to accelerate immune system research has the potential to deliver vaccines and address immunotherapeutics.  Computational systems biology of immune response

Immunoinformatics Immunology Computer Science Biology

Summary: Introduction; Structural immunoinformatic database development; Data analysis; Computational models; Applications. Topic areas: networks, pathways and systems; genetics and populations; -omics; basic immunology; clinical immunology.

The immune system: composed of many interdependent cell types, organs and tissues; 2nd most complex system in the human body. Two types: 1. Innate immune system; 2. Adaptive immune system. (Figure by Dr. Standley LJ.)

It is a numbers game…: >10^13 MHC class I haplotypes (IMGT-HLA); T cell receptors (Arstila et al., 1999); >10^9 combinatorial antibodies (Jerne, 1993); B cell clonotypes (Jerne, 1993); linear epitopes composed of nine amino acids; >>10^11 conformational epitopes.

Adaptive immune system: Major Histocompatibility Complex (MHC Class I and II); Human Leukocyte Antigen (HLA in human); peptide binding to MHC; recognition of the pMHC complex by the TCR; activation of T cells; MHC Class I – CD8+ cytotoxic T cells; MHC Class II – CD4+ helper T cells.

How to generate a T cell-mediated immune response: 1. Epitope; 2. MHC; 3. T cell receptor.

Antigen processing pathway: peptides, MHC, T-cells. 1. Degradation of antigen (20% processed); 2. Peptide binding to MHC (0.5% bind MHC); 3. Recognition of peptide-MHC complex by T-cells (50% CTL response). Overall: 0.05% chance of immunogenicity. Yewdell et al. Ann. Rev Immunol (1999).
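The 0.05% figure is consistent with multiplying the three stage efficiencies quoted on the slide (20% × 0.5% × 50%). A minimal sketch of that arithmetic, assuming the stages can be treated as independent; the function name is illustrative and not from the talk:

```python
def immunogenicity_chance(p_processed, p_binds_mhc, p_ctl_response):
    """Chance that a peptide is immunogenic, assuming independent stages.

    Each argument is a probability in [0, 1] for one stage of the
    antigen processing pathway.
    """
    return p_processed * p_binds_mhc * p_ctl_response


# Stage efficiencies quoted on the slide: 20%, 0.5% and 50%.
chance = immunogenicity_chance(0.20, 0.005, 0.50)
print(f"{chance:.2%}")  # formats the probability as a percentage
```

The small product is the motivation for in silico pre-screening: most candidate peptides fail at the MHC-binding stage.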

Physico-chemical properties affect MHC-peptide binding

Computational models can help identify T cell epitopes. Suggest candidate epitopes by in silico screening of entire proteins and even proteomes, with specificity at the allele level, the supertype level, or for disease-implicated alleles alone. Minimize the number of wet-lab experiments. Cut down the lead time involved in epitope discovery and vaccine design.

Predicting MHC-binding peptides (Tong, Tan and Ranganathan (2007) Briefings in Bioinformatics 8:). 1. Sequence-based approach: pattern recognition techniques (binding motifs, matrices, ANN, HMM, SVM). Main limitations: require a large amount of data for training; preclude data with limited sequence conservation. 2. Structure-based approach: rigid backbone modeling techniques; flexible docking techniques. Main advantage: large training datasets unnecessary.

Our aim: Structure-based prediction of MHC-binding peptides

Why structure? Great potential to: generate biologically meaningful data for analysis; predict candidate peptides for alleles that have not been widely studied, where sequence-based approaches fail or are not attempted; predict binding affinity of peptides; predict non-contiguous epitopes. However, structure determination through experimental methods is both expensive and time-consuming, and structure-based prediction has not been extensively studied due to high computational costs and development complexity.

Existing structure-based prediction techniques: protein threading [Altuvia et al. 1995; Schueler-Furman et al. 2000]; homology modeling [Michielin et al. 2000]; rigid/flexible docking [Rosenfeld et al. 1993; Sezerman et al. 1996; Rognan et al. 1999; Desmet et al. 2000; Michielin et al. 2003].

Will existing structure-based techniques suffice? 1. Quality of predicted structures: protein threading, homology modeling and rigid docking cannot handle peptide flexibility; available flexible docking techniques have poor accuracy and are too slow. 2. Usability of models to predict binding: existing free energy scoring functions have been tested only on small datasets and correlate poorly with experimental data.

Hypothesis for epitope selection  Peptides bound to MHC alleles are similar to substrates bound to enzymes  “Lock-and-key” mechanism for peptide selection  Shape  Size  Electrostatic characteristics

Outline: Introduction; Structural immunoinformatic database development; Data analysis; Computational models; Applications. Topic areas: sequences; databases, ontologies; basic immunology; genetics and populations; structures.

MPID: MHC-Peptide Interaction Database (Govindarajan et al. (2003) Bioinformatics, 19:). Relational database of 82 curated pMHC complexes (Class I: 64; Class II: 18).

Peptide/MHC interaction characteristics: intermolecular hydrogen bonds; interface area; gap volume; gap index = gap volume / interface area; interacting residues; peptide length.
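As an illustration of the gap index, here is a minimal sketch that assumes it is the ratio of gap volume to interface area, which is what the slide layout suggests. The function name, units and example values are hypothetical and not taken from MPID:

```python
def gap_index(gap_volume_A3, interface_area_A2):
    """Return gap volume divided by interface area (assumed definition).

    gap_volume_A3: volume of the gap between peptide and MHC, in cubic angstroms.
    interface_area_A2: buried interface area, in square angstroms.
    """
    if interface_area_A2 <= 0:
        raise ValueError("interface area must be positive")
    return gap_volume_A3 / interface_area_A2


# Hypothetical complex: 900 cubic angstroms of gap over 600 square angstroms.
example = gap_index(900.0, 600.0)
print(example)
```

Under this definition a smaller value indicates a tighter fit between the peptide and the binding groove.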

MPID-T: MHC-Peptide-T Cell Receptor Interaction Database (Tong et al. (2006) Applied Bioinformatics, 5:). 187 curated pMHC complexes; 16 with TCR; Human: 110, Murine: 74 and Rat: 3; Alleles: 40 (interface area, H bonds, gap volume and gap index).

Distribution of MHC by allele: 101 new entries; 187 entries (Human: 110; Murine: 74; Rat: 3); 134 non-redundant entries (class I: 100; class II: 34); 121 class I and 41 class II entries; 26 HLA alleles (class I: 18; class II: 8); 14 rodent alleles (class I: 8; class II: 6); 16 TCR/peptide/MHC complexes.

Peptide/MHC binding motifs: conserved peptide properties in solution structures, classified according to alleles and peptide length. Residue classes shown: polar, amide, basic, acidic, hydrophobic.

How to obtain structures of experimentally unsolved alleles? 1. There were only 36 crystal structures of unique MHC alleles (2006) vs the unique MHC alleles identified in the IMGT/HLA database. 2. Structure determination through experimental methods is both expensive and time-consuming. 3. Homology model building for alleles with no structural data!

Outline: Introduction; Structural immunoinformatic database development; Data analysis of pMHC Class I complexes; Computational models; Applications. Topic areas: data & text mining; maths/stats; structures.

Conservation of nonamer peptide backbone conformation. Class I peptides: N-termini residues 0.02 – 0.29 Å; C-termini residues 0.00 – 0.25 Å. Class II binding registers (only 9 residues fit in the binding groove): N-termini residues 0.01 – 0.22 Å; C-termini residues 0.02 – 0.27 Å.

Outline: Introduction; Structural immunoinformatic database development; Data analysis; Computational models; Applications. Topic areas: maths/stats; structures; sequences; physics/chemistry.

Two-step approach to predict MHC-binding peptides: 1. Finding the best fit conformation (docking) of peptides within the MHC binding groove; 2. Screening potential binders from the background.

Docking is a computationally exhaustive procedure. Large number of possible peptide conformations: 3 global translational degrees of freedom; 3 global rotational degrees of freedom; 1 conformational degree of freedom for each rotatable bond; >10^10 possible conformations for a 10-residue peptide. (Figure: x, y, z axes and peptide backbone atoms.)
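To see how quickly the torsional search space grows, here is a rough counting sketch. It assumes each rotatable bond is sampled at 3 discrete states and that a 10-residue backbone contributes 2 torsions (phi and psi) per residue; both numbers are illustrative assumptions, not values from the talk:

```python
def torsional_conformations(n_rotatable_bonds, states_per_bond=3):
    """Count discrete conformations when every rotatable bond is
    sampled independently at `states_per_bond` states."""
    return states_per_bond ** n_rotatable_bonds


# Backbone only: 10 residues x 2 torsions (phi, psi) = 20 rotatable bonds.
backbone_only = torsional_conformations(20)

# Adding a single side-chain torsion already exceeds 10**10.
with_one_side_chain = torsional_conformations(21)

print(backbone_only, with_one_side_chain)
```

The six rigid-body degrees of freedom (translation and rotation) multiply this count further, which is why reducing the search space matters for docking speed.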

Rapid docking of peptide to MHC (Tong, Tan & Ranganathan (2004) Protein Sci. 13:). 1. Anchoring root fragments to reduce search space (pseudo-Brownian rigid body docking); 2. Loop modeling (loop closure of central backbone by satisfaction of spatial restraints); 3. Ligand backbone and side-chain refinement (entire backbone and interacting side-chains).

Benchmarking with existing techniques
Author | Technique | Peptide | RMSD a | RMSD b
Rognan et al. | Simulated Annealing | TLTSCNTSV; FLPSDFFPSV; GILGFVFTL; ILKEPVHGV 0.87; LLFGYPVYV
Desmet et al. | Combinatorial Buildup Algorithm | RGYVYQGL
Rosenfeld et al. | Multiple Copy Algorithm | FAPGNYPAL; GILGFVFTL
Sezerman et al. | Combinatorial Buildup Algorithm | LLFGYPVYV; ILKGPVHGV; GILGFVFTL; TLTSCNTSV
a RMSD of peptide backbone obtained from respective authors. b RMSD of peptide backbone obtained in our work from redocking bound complexes and single template respectively.

Quantitative separation of binders from non-binders: empirical free energy scoring function. $\Delta G_{bind} = \alpha \Delta G_{H} + \beta \Delta G_{S} + \gamma \Delta G_{EL} + C$, where $\Delta G_{bind}$ = binding free energy; $\Delta G_{H}$ = hydrophobic term; $\Delta G_{S}$ = decrease in side chain entropy; $\Delta G_{EL}$ = electrostatic term; $C$ = entropy change in system due to external factors. $\alpha$, $\beta$, $\gamma$ optimized by least-squares multivariate regression with experimental binding affinities (IC50) of MHC-peptides in training dataset (Rognan et al., 1999).
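The scoring function is a weighted sum of three energy terms plus a constant. A minimal sketch of evaluating it follows; the class and function names are hypothetical, the default coefficients are placeholders, and the example energies are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """Per-complex energy terms (all in the same energy unit)."""
    hydrophobic: float         # Delta G_H
    side_chain_entropy: float  # Delta G_S
    electrostatic: float       # Delta G_EL


def binding_free_energy(terms, alpha=1.0, beta=1.0, gamma=1.0, c=0.0):
    """Delta G_bind = alpha*G_H + beta*G_S + gamma*G_EL + C.

    alpha, beta, gamma and c would normally be fitted by regression
    against experimental affinities; the defaults are placeholders.
    """
    return (alpha * terms.hydrophobic
            + beta * terms.side_chain_entropy
            + gamma * terms.electrostatic
            + c)


# Made-up terms for a single peptide/MHC complex.
terms = EnergyTerms(hydrophobic=-10.0, side_chain_entropy=4.0, electrostatic=-2.0)
unweighted = binding_free_energy(terms)
reweighted = binding_free_energy(terms, alpha=0.5, beta=2.0, gamma=1.0, c=1.0)
print(unweighted, reweighted)
```

Keeping the coefficients as arguments makes it easy to compare an unweighted score against a regression-fitted one on the same docked conformations.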

Test case: MHC Class II DQ8. DQ3.2β (DQA1*0301/DQB1*0302) is involved in several autoimmune diseases: celiac disease; insulin-dependent diabetes mellitus (IDDM); IDDM-associated periodontal disease; autoimmune polyendocrine syndrome type II.

Data used. Structure: 1JK8, the DQ3.2β–insulin B9-23 complex. Dataset I: 127 peptides with experimentally determined IC50 values [70 high-affinity (IC50 < 500 nM), 13 medium-affinity (500 nM < IC50 < 1500 nM) and 23 low-affinity (1500 nM < IC50 < 5000 nM) binders, and 21 non-binders (IC50 > 5000 nM)] derived from biochemical studies; 87 with known binding registers. Dataset II: 12 Dermatophagoides pteronyssinus (Der p 2) peptides with experimental T-cell proliferation values from functional studies, with 7 peptides eliciting DQ3.2β-restricted T-cell proliferation.
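The affinity classes can be expressed as a simple threshold function on IC50. This sketch assumes the upper bound of each class is inclusive, since the slide does not say how values exactly at 500, 1500 or 5000 nM are treated; the function name is illustrative:

```python
def classify_ic50(ic50_nM):
    """Map an IC50 value (nM) to an affinity class.

    Thresholds follow the slide: 500, 1500 and 5000 nM.
    Boundary handling (inclusive upper bounds) is an assumption.
    """
    if ic50_nM <= 0:
        raise ValueError("IC50 must be positive")
    if ic50_nM <= 500:
        return "high"
    if ic50_nM <= 1500:
        return "medium"
    if ic50_nM <= 5000:
        return "low"
    return "non-binder"


# Hypothetical measurements, one per class.
labels = [classify_ic50(v) for v in (50, 900, 3000, 20000)]
print(labels)
```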

Scoring: training & testing datasets. Training: 56 binding conformations with known registers; 30 non-binding conformations from 3 non-binders. Testing: Test set 1, 68 peptides from biochemical studies (16 strong; 13 medium; 21 weak; 18 non-binders); Test set 2, 12 peptides from functional studies (7 elicit T-cell proliferation).

Screening class II binding register: a sliding window approach. E285B peptide: YQTIEENIKIFEEDA. Table columns: core sequence, binding energy. Core sequences: YQTIEENIK; QTIEENIKI; TIEENIKIF; IEENIKIFE; EENIKIFEE; ENIKIFEED; NIKIFEEDA.
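The sliding window over the E285B peptide can be sketched in a few lines. The function name is illustrative; in the actual workflow each core would then be docked and scored, which is not shown here:

```python
def binding_registers(peptide, core_length=9):
    """Return every contiguous core of `core_length` residues,
    scanning the peptide from N- to C-terminus."""
    if len(peptide) < core_length:
        return []
    return [peptide[i:i + core_length]
            for i in range(len(peptide) - core_length + 1)]


# The 15-residue E285B peptide yields 15 - 9 + 1 = 7 candidate registers.
cores = binding_registers("YQTIEENIKIFEEDA")
for core in cores:
    print(core)
```

A peptide of length n gives n - 8 nonamer cores, so the cost of register screening grows linearly with peptide length.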

Training and test sets. Training of the DQ3.2β prediction model was performed by sampling (1) the bound conformations of binding peptides with experimentally determined registers that can be recognized by MHC, and (2) the best conformations of non-binding peptides without any preferred register in the binding groove. Dataset I was divided into training and test datasets. 1. Training set: 59 peptides, with 56 binding conformations with known registers and 30 non-binding conformations generated from the 3 non-binding peptides without any binding registers. 2. Test set 1: 68 peptides (the rest of Dataset I) with experimental IC50 values (16 high-affinity binders, 13 medium-affinity binders, 21 low-affinity binders and 18 non-binders) from biochemical studies (with 31 binding registers). 3. Test set 2: all 12 peptides from Dataset II, with known T-cell proliferation values.

Binding energy determination: ICM software (Abagyan and Totrov, 1999); hydrophobic energy computed as the product of solvent accessible surface area; entropic contribution from the protein side-chains computed from the maximal burial entropies for each type of amino acid and their relative accessibilities; electrostatic term composed of receptor-ligand coulombic interactions and the desolvation of partial charges transferred from an aqueous medium to a protein core environment (numeric solution of the Poisson equation using an implementation of the boundary element algorithm); entropy change in the system due to the decrease of free molecular concentration and the loss of rotational/translational degrees of freedom upon binding.

Docking: 4-step protocol used. A. Anchoring root fragments (probes) to reduce search space; B. Loop modeling; C. Refinement of binding register; D. Extension of flanking residues for MHC Class II.

Parameters optimized. Default ICM coefficients ($\alpha = \beta = \gamma = 1$; $C = 0$) resulted in poor correlation ($r^2 = 0.43$, $s = 2.91$ kJ/mol). The optimal scoring function, after 10-fold cross-validation ($q^2 = 0.85$, $s_{press} = 2.20$ kJ/mol):

Accuracy estimates: sensitivity (SE), specificity (SP) and receiver operating characteristic (ROC) analysis. Percentage of correctly predicted binders: SE = TP/(TP+FN); of correctly predicted non-binders: SP = TN/(TN+FP). The ROC curve is generated by plotting SE as a function of (1 − SP) for various classification thresholds. The area under the ROC curve (A_ROC) provides a measure of overall prediction accuracy: A_ROC < 70% for poor, A_ROC > 80% for good and A_ROC > 90% for excellent predictions. We consider values of SP ≥ 80% useful in practice and assessed SE for three values of SP (80%, 90% and 95%).
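These metrics can be computed directly from counts and scores. The sketch below uses the standard pairwise-ranking formulation of the area under the ROC curve and assumes that a higher score means a stronger predicted binder; if binding energies are used, where lower is better, negate them first. Function names and example data are illustrative:

```python
def sensitivity(tp, fn):
    """SE = TP / (TP + FN): fraction of true binders recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """SP = TN / (TN + FP): fraction of non-binders rejected."""
    return tn / (tn + fp)


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen binder (label 1)
    scores higher than a randomly chosen non-binder (label 0),
    counting ties as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: two binders and two non-binders.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(sensitivity(8, 2), specificity(9, 1), auc)
```

The pairwise form is quadratic in the number of peptides, which is fine for test sets of a few dozen peptides.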

Accuracy estimates. Sensitivity (SE) = fraction of binders correctly predicted = TP/AP, where AP = TP + FN. Specificity (SP) = fraction of non-binders correctly predicted = TN/AN, where AN = TN + FP. Area under the ROC (receiver operating characteristic) curve: >90% excellent; >80% good.

Results for Training set  High SE (good for most predictions)  Very few FPs, but also fewer predictions

Screening class II binding register: HLA-DQ8 prediction accuracy for Test Set 1. Table columns: group (LMH, MH, H) and A_ROC. Classification of binding peptides: high-affinity binders (H), IC50 ≤ 500 nM; medium-affinity binders (M), 500 nM < IC50 ≤ 1500 nM; low-affinity binders (L), 1500 nM < IC50 ≤ 5000 nM.

Test Set 1: Improved detection of binders lacking position specific binding motifs

Binding registers: 20/23 (87%) binding registers; only register (aa 4-12) from Test Set 2 (Der p 2: 1-20) (SE=0.80; SP(LMH)=0.90). T-cell proliferation: top 5 predictions are experimental positives at very stringent threshold criteria (SE=0.95; SP(H)=0.63).

Multiple registers (SP=0.95, SE(LMH)=0.81): 58% of Test Set 1. Mainly for medium and high binders. Experimental support: Sinha et al. for DRB1*0402. Is this why binding motifs are unsuccessful?

 Introduction  Structural Immunoinformatic Database development  Data Analysis  Computational models developed  Applications

Pemphigus vulgaris (PV): autoimmune blistering skin disorder; characterized by autoantibodies targeting desmoglein-3 (Dsg3); strong association with DR4 and DR6 alleles. (Image: adam.about.com)

Who are the major players in PV?  DR4 PV implicated alleles (for Semitic)  DRB1*0401  DRB1*0402  DRB1*0404  DRB1*0406  DR6 PV implicated alleles (for Caucasians)  DRB1*1401  DRB1*1404  DRB1*1405  DQB1*0503

What is known about DR4? DR4 PV implicated alleles (DRB1*0401, *0402, *0404, *0406). High sequence conservation: 97.9 – 99.0% identity; 98.4 – 99.5% similarity. High structural conservation: Cα RMSD <0.22 Å for all key binding pockets. 7 polymorphic residues within binding cleft: Pocket 1 (β86); Pocket 4 (β70, 71, 74); Pocket 6 (β11); Pocket 7 (β71); Pocket 9 (β37).

What is known about DR6? DR6 PV implicated alleles (DRB1*1401, *1404, *1405, DQB1*0503). High sequence conservation: 85.8 – 94.1% identity; 83.2 – 97.3% similarity. High structural conservation: Cα RMSD <0.22 Å for all key binding pockets. 14 polymorphic residues within binding clefts: Pocket 1 (β86); Pocket 4 (β13, 70, 71, 74, 78); Pocket 6 (β11); Pocket 7 (β28, 30, 67, 71); Pocket 9 (β9, 37, 57, 60).

Clues… 9 stimulatory Dsg3 peptides tested on PV patients possessing DR4 and DR6 PV implicated alleles: 1. Dsg (DR4, DR6); 2. Dsg (DR4, DR6); 3. Dsg (DR4, DR6); 4. Dsg (DR4, DR6); 5. Dsg (DR4, DR6); 6. Dsg (DR4, DR6); 7. Dsg (DR4, DR6); 8. Dsg (DR4); 9. Dsg (DR4).

Disease associated alleles vs. innocent bystanders (Tong et al. (2006) Immunome Research, 2: 1). DR4 PV: 8/9 investigated Dsg3 peptides fit perfectly into DRB1*0402; atomic clashes with all other investigated DR4 subtypes. DR6 PV: 6/9 investigated Dsg3 peptides fit perfectly into DQB1*0503; atomic clashes with all other investigated DR6 subtypes; HLA association in DR6 PV more likely to be at the DQ than the DR locus. Consistent with experimental work done by Sinha et al. (2002, 2005, 2006).

Whither sequence motifs (again!)? 1/9 investigated Dsg3 peptides fits existing binding motifs. Flanking residues: clashes in fitting binding register. Register-shift for Peptide V (Dsg ). Detected binding register: Dsg . Binding motifs: Dsg (Veldman et al., 2003); Dsg (Sinha et al., 2006).

Large-scale screening of Dsg3 peptides (Tong et al. (2006) BMC Bioinformatics, 7(Suppl 5): S7). Docking of 15-mer Dsg3 peptides generated using a sliding window of size 15 across the entire Dsg3 glycoprotein. Within each Dsg3 peptide (sliding window width 15, N- to C-terminus), a binding register is located with a second sliding window of width 9; the remaining positions are flanking residues. Training set: 8 peptides each, with experimental IC50 values and known binding registers (5 binders and 3 non-binders).

Large-scale screening of Dsg3 peptides

Common epitopes possibly responsible for inducing disease in DR4 & DR6 patients. Significant level of cross reactivity observed between DRB1*0402 and DQB1*0503 (A_ROC = 0.93): 57% of peptides investigated in this study predicted to bind to both alleles with high affinity; 90% of known Dsg3 peptides predicted to bind to both alleles; 12/20 top predicted DQB1*0503-specific Dsg3 peptides from the transmembrane region; all top predicted DRB1*0402-specific Dsg3 peptides from extracellular regions. Disease initiation implications: DR4 from ECD; DR6 from TM.

Multiple binding registers revisited  76% (410/539) predicted high-affinity binders to DRB1*0402 possess > 2 binding registers  57% (384/673) predicted high-affinity binders to DQB1*0503 possess > 2 binding registers  66% (354/539) bind both alleles at different registers  Similar proportion (70%) detected in known binders to both alleles  Both alleles bind similar peptides via different binding registers

What next? We have developed a predictive model for HLA-C (Cw*0401) with very limited (only six) experimental binding values. The model yields excellent results for test data (A_ROC = 0.93). Application to determine immunological hot spots for the HIV-1 p24 (gag) protein and gp160 (env) glycoprotein shows binding energies similar to HLA-A and -B.

Conclusions  Computational models for immunogenic epitope prediction can be successfully developed, even for alleles with limited experimental data.  While computations can never completely replace “wet-lab” experiments, in silico predictions can significantly cut down the development time of therapeutic vaccines.

Acknowledgements  Dr. (Victor) J.C. Tong, I2R, Singapore  A/Prof. Tin Wee Tan, NUS  Dr. Animesh Sinha, Weill Medical College of Cornell University & Michigan State University, USA  Drs. J. Tom August (JHU) and Vladimir Brusic (DFCI) (NIAID-NIH Grant #5 U19 AI56541 & Contract #HHSN C).  All of you!