Immunological Bioinformatics Introduction to the immune system.

Vaccination
Administration of a substance to a person with the purpose of preventing a disease
– Traditionally composed of a killed or weakened microorganism
– Vaccination works by creating a type of immune response that enables memory cells to respond later to a similar organism before it can cause disease

Figure 1-20

Effectiveness of vaccines: 1958, start of the smallpox eradication program

The immune system
– The innate immune system
– The adaptive immune system

The innate immune system
– Unspecific
– Antigen independent
– Immediate response
– No training/selection, hence no memory
– Pathogen independent (but the response might depend on the pathogen type)

The adaptive immune system
Pathogen specific
– Humoral
– Cellular
(Figure: bacteria, virus, parasite)

Adaptive immune response
Signal induced
– Pathogens, antigens
– Epitopes
(Figure: B cell, T cell)

Diversity is a hallmark of the (adaptive) immune system
Diversity of lymphocytes
– Huge diversity within a host
– At least 10^8 different T and B cell clones
Receptors are made by recombination and N-additions, and by somatic mutation during the immune response
Repertoires are (partly) random
– Randomness requires self tolerance

Figure 1-14

The role of lymphocytes

Humoral immunity (cartoon by Eric Reits)

Antibody-antigen interaction
(Figure labels: Fab, antigen, epitope, paratope, antibody)
The antibody recognizes structural properties of the surface of the antigen

Antibody effect
Neutralizing antibodies (figure: virus or toxin)

Cellular immune response (cartoon by Eric Reits)

MHC-I molecules present peptides on the surface of most cells

CTL response
(Figure: healthy cell, virus-infected cell, MHC-I)

CTL response
(Figure: virus-infected cell, MHC-I)

The death of an infected cell

Polymorphism of MHC
Within a host: a limited number of loci (genes)
– Only 6 different class I molecules (two each of HLA-A, -B and -C)
– Only 12 different class II molecules
Within a population: > 100 alleles per locus

More MHC molecules: more diversity in the presented peptides
– Roughly 1% probability that a given MHC molecule presents a given peptide
– Different hosts sample different peptides from the same pathogen
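Here is a quick back-of-the-envelope check of why more MHC molecules mean more presented peptides. It assumes that each of a host's six class I molecules presents a given peptide independently, with probability 0.01. That independence is a simplification not stated on the slide, and the function name `prob_presented` is illustrative only.

```python
def prob_presented(p_single: float = 0.01, n_molecules: int = 6) -> float:
    """Probability that at least one of n MHC molecules presents a given peptide.

    Hypothetical model: molecules act independently, each presenting the
    peptide with probability p_single.
    """
    # P(at least one) = 1 - P(none of them presents it)
    return 1.0 - (1.0 - p_single) ** n_molecules


for n in (1, 2, 6):
    print(n, round(prob_presented(n_molecules=n), 4))
```

Under this assumption, six molecules raise the chance from 1% to roughly 5.9%. Real MHC specificities overlap, so the independence model likely overstates the gain.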

Immunological benefits of MHC polymorphism
Heterozygote advantage
– Heterozygotes have a selective advantage because they can present more peptides (Hughes.n88)
Coevolution
– Pathogens avoid presentation on common MHC alleles (HIV)
– Frequency dependent selection

Figure 5-13

Heterozygote disadvantage! (for vaccine design)
Few human beings will share the same set of HLA alleles
– Different persons will react to a pathogen infection in different ways
A CTL-based vaccine must include epitopes specific for each HLA allele in a population
– A CTL-based vaccine must consist of ~800 HLA class I epitopes and ~400 class II epitopes

HLA specificity clustering (figure: A0201, A0101, A6802, B0702)

HLA polymorphism: supertypes
Each HLA molecule within a supertype binds essentially the same peptides
Nine major HLA class I supertypes have been defined
– HLA-A1, A2, A3, A24, B7, B27, B44, B58 and B62
And maybe add three more
– HLA-A26, HLA-B8 and HLA-B39
=> A CTL-based vaccine must consist of 9-12 HLA class I epitopes
Sette et al, Immunogenetics (1999) 50:

Summary
The adaptive immune system is extremely diverse
– An immune response can be raised against anything foreign!
Antibodies define the humoral response
– Antibodies recognize structural properties on the surface of extracellular antigens
T cells define the cellular response
– CTLs kill cells that present MHC molecules bound with foreign peptides of intracellular origin

Anchor positions (figure: MHC class I with peptide)

What makes a peptide a potential and effective epitope?
Part of a pathogen protein
Successful processing
– Proteasome cleavage
– TAP binding
Binds to MHC molecule
Protein function and expression
– Early in replication
– Highly expressed proteins are more likely to generate immunogens
Sequence conservation in evolution

Prediction of HLA binding specificity: historical overview
Simple motifs
– Allowed/non-allowed amino acids
Extended motifs
– Amino acid preferences (SYFPEITHI)
– Anchor/preferred/other amino acids
Hidden Markov models
– Peptide statistics from sequence alignment
Neural networks
– Can take sequence correlations into account

SYFPEITHI predictions
Extended motifs based on peptides from the literature and peptides eluted from cells expressing specific HLAs (i.e., binding peptides)
– The scoring scheme is not readily accessible
– Positions defined as anchor or auxiliary anchor positions are weighted differently (higher)
– The final score is the sum of the scores at each position
– Predictions can be made for several HLA-A, -B and -DRB1 alleles, as well as some mouse K, D and L alleles

BIMAS
Matrix made from peptides with a measured T1/2 (half-time of dissociation) for the MHC-peptide complex
– The matrices are available on the website
– The final score is the product of the scores of each position in the matrix, multiplied by a constant (different for each MHC) to give a prediction of the T1/2
– Predictions can be obtained for several HLA-A, -B and -C alleles, mouse K, D and L alleles, and a single cattle MHC
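The two tools differ mainly in how per-position scores are combined. The sketch below contrasts an additive (SYFPEITHI-style) score with a multiplicative (BIMAS-style) one. All per-position values and the constant are made up for illustration; they are not taken from either tool's matrices.

```python
import math


def additive_score(position_scores: list[float]) -> float:
    """SYFPEITHI-style: the final score is the sum of the per-position scores."""
    return sum(position_scores)


def multiplicative_score(position_factors: list[float], constant: float) -> float:
    """BIMAS-style: product of per-position factors times an allele-specific constant,
    read as a predicted T1/2 of the MHC-peptide complex."""
    return constant * math.prod(position_factors)


# Hypothetical values for a 9-mer; anchor positions P2 and P9 get the largest entries.
scores = [1, 10, 1, 0, 0, 1, 0, 1, 10]
factors = [1.0, 20.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0]
const = 0.5

print(additive_score(scores))
print(multiplicative_score(factors, const))

# A product of positive factors equals exp(sum of logs): both are additive in log space.
log_sum = sum(math.log(f) for f in factors) + math.log(const)
assert math.isclose(math.exp(log_sum), multiplicative_score(factors, const))
```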

SLLPAIVEL YLLPAIVHI TLWVDPYEV GLVPFLVSV KLLEPVLLL LLDVPTAAV LLDVPTAAV LLDVPTAAV LLDVPTAAV VLFRGGPRG MVDGTLLLL YMNGTMSQV MLLSVPLLL SLLGLLVEV ALLPPINIL TLIKIQHTL HLIDYLVTS ILAPPVVKL ALFPQLVIL GILGFVFTL STNRQSGRQ GLDVLTAKV RILGAVAKV QVCERIPTI ILFGHENRV ILMEHIHKL ILDQKINEV SLAGGIIGV LLIENVASL FLLWATAEA SLPDFGISY KKREEAPSL LERPGGNEI ALSNLEVKL ALNELLQHV DLERKVESL FLGENISNF ALSDHHIYL GLSEFTEYL STAPPAHGV PLDGEYFTL GVLVGVALI RTLDKVLEV HLSTAFARV RLDSYVRSL YMNGTMSQV GILGFVFTL ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CLGGLLTMV FIAGNSAYE KLGEFYNQM KLVALGINA DLMGYIPLV RLVTLKDIV MLLAVLYCL AAGIGILTV YLEPGPVTA LLDGTATLR ITDQVPFSV KTWGQYWQV TITDQVPFS AFHHVAREL YLNKIQNSL MMRKLAILS AIMDKNIIL IMDKNIILK SMVGNWAKV SLLAPGAKQ KIFGSLAFL ELVSEFSRM KLTPLCVTL VLYRYGSFS YIGEVLVSV CINGVCWTV VMNILLQYV ILTVILGVL KVLEYVIKV FLWGPRALV GLSRYVARL FLLTRILTI HLGNVKYLV GIAGGLALL GLQDCTMLV TGAPVTYST VIYQYMDDL VLPDVFIRC VLPDVFIRC AVGIGIAVV LVVLGLLAV ALGLGLLPV GIGIGVLAA GAGIGVAVL IAGIGILAI LIVIGILIL LAGIGLIAA VDGIGILTI GAGIGVLTA AAGIGIIQI QAGIGILLA KARDPHSGH KACDPHSGH ACDPHSGHF SLYNTVATL RGPGRAFVT NLVPMVATV GLHCYEQLV PLKQHFQIV AVFDRKSDA LLDFVRFMG VLVKSPNHV GLAPPQHLI LLGRNSFEV PLTFGWCYK VLEWRFDSR TLNAWVKVV GLCTLVAML FIDSYICQV IISAVVGIL VMAGVGSPY LLWTLVVLL SVRDRLARL LLMDCSGSI CLTSTVQLV VLHDDLLEA LMWITQCFL SLLMWITQC QLSLLMWIT LLGATCMFV RLTRFLSRV YMDGTMSQV FLTPKKLQC ISNDVCAQV VKTDGNPPE SVYDFFVWL FLYGALLLA VLFSSDFRI LMWAKIGPV SLLLELEEV SLSRFSWGA YTAFTIPSI RLMKQDFSV RLPRIFCSC FLWGPRAYA RLLQETELV SLFEGIDFY SLDQSVVEL RLNMFTPYI NMFTPYIGV LMIIPLINV TLFIGSHVV SLVIVTTFV VLQWASLAV ILAKFLHWL STAPPHVNV LLLLTVLTV VVLGVVFGI ILHNGAYSL MIMVKCWMI MLGTHTMEV MLGTHTMEV SLADTNSLA LLWAARPRL GVALQTMKQ GLYDGMEHL KMVELVHFL YLQLVFGIE MLMAQEALA LMAQEALAF VYDGREHTV YLSGANLNL RMFPNAPYL EAAGIGILT TLDSQVMSL STPPPGTRV KVAELVHFL IMIGVLVGV ALCRWGLLL LLFAGVQCQ VLLCESTAV YLSTAFARV YLLEMLWRL SLDDYNHLV RTLDKVLEV GLPVEYLQV KLIANNTRV FIYAGSLSA KLVANNTRL FLDEFMEGV ALQPGTALL VLDGLDVLL SLYSFPEPE ALYVDSLFF SLLQHLIGL ELTLGEFLK MINAYLDKL 
AAGIGILTV FLPSDFFPS SVRDRLARL SLREWLLRI LLSAWILTA AAGIGILTV AVPDEIPPL FAYDGKDYI AAGIGILTV FLPSDFFPS AAGIGILTV FLPSDFFPS AAGIGILTV FLWGPRALV ETVSEQSNV ITLWQRPLV
Sequence information

Sequence information
Calculate $p_a$ at each position
Entropy
$$S = -\sum_a p_a \log p_a$$
Information content
$$I = \log(20) - S$$
Conserved positions
– $p_V = 1$, $p_{a \neq V} = 0 \Rightarrow S = 0$, $I = \log(20)$
Mutable positions
– $p_a = 1/20 \Rightarrow S = \log(20)$, $I = 0$
Say that a peptide must have L at P2 in order to bind, and that A, F, W and Y are found at P1. Which position has the most information? How many yes/no questions do I need to ask to tell if a peptide binds, looking only at P1 or only at P2?
– P1: 4 questions (at most)
– P2: 1 question (L or not)
– P2 has the most information
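Here is a minimal sketch of the per-position calculation. It computes $S = -\sum_a p_a \log_2 p_a$ and $I = \log_2(20) - S$ from raw frequencies. It assumes base-2 logarithms (bits), a uniform background, and no sequence weighting or pseudo counts. The 10 peptides are the example set used later in this lecture.

```python
import math
from collections import Counter

PEPTIDES = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]


def column_entropy(column: str) -> float:
    """S = -sum_a p_a * log2(p_a), with p_a the raw frequency of a in the column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def information_content(column: str, alphabet_size: int = 20) -> float:
    """I = log2(alphabet_size) - S (uniform background)."""
    return math.log2(alphabet_size) - column_entropy(column)


for pos in range(len(PEPTIDES[0])):
    column = "".join(p[pos] for p in PEPTIDES)
    print(f"P{pos + 1}: S = {column_entropy(column):.2f} bits, "
          f"I = {information_content(column):.2f} bits")
```

A fully conserved column gives $S = 0$ and the maximal $I = \log_2 20 \approx 4.32$ bits. A column using all 20 amino acids equally gives $I = 0$.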

Information content (table: per-position frequencies of A R N D C Q E G H I L K M F P S T W Y V, with entropy S and information content I)

Sequence logos
– Height of a column equal to I
– Relative height of a letter is p
– Highly useful tool to visualize sequence motifs
(Figure: HLA-A0201 logo, high information positions)
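Letter heights in a logo column follow directly from the two rules on the slide. The column height is I, and each letter takes the fraction $p_a$ of it. This sketch assumes base-2 logarithms, a uniform background, and raw frequencies; the function name `logo_heights` is hypothetical.

```python
import math
from collections import Counter


def logo_heights(column: str, alphabet_size: int = 20) -> dict[str, float]:
    """Letter heights for one logo column: height of letter a = p_a * I.

    I = log2(alphabet_size) - S is the column height (uniform background assumed).
    """
    n = len(column)
    freqs = {aa: c / n for aa, c in Counter(column).items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(alphabet_size) - entropy
    heights = {aa: p * info for aa, p in freqs.items()}
    # Tallest letter first, the usual stacking order within a column
    return dict(sorted(heights.items(), key=lambda item: item[1], reverse=True))


# P2 column of the 10 example peptides: seven L and one each of M, I and V
print(logo_heights("LLLLLMILLV"))
```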

Characterizing a binding motif from small data sets
10 MHC restricted peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
What can we learn?
1. A at P1 favors binding?
2. I is not allowed at P9?
3. K at P4 favors binding?
4. Which positions are important for binding?

Simple motifs
Yes/No rules
10 MHC restricted peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
– Only 11 of 212 peptides identified!
– Need more flexible rules
– If a peptide does not fit P1 but fits P2, then it is ok
– Not all positions are equally important
– We know that P2 and P9 determine binding more than other positions
– Cannot discriminate between good and very good binders

Simple motifs
Yes/No rules, example
10 MHC restricted peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
The first two peptides below will not fit the motif, yet all three are good binders (affinity < 500 nM):
RLLDDTPEV 84 nM
GLLGNVSTV 23 nM
ALAKAAAAL 309 nM
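Here is a sketch of the simplest yes/no motif. A residue is allowed at a position only if it was observed there among the 10 MHC restricted peptides; deriving the rules this way is an assumption for illustration. It reproduces the point of the slide: the first two binders fail the motif.

```python
PEPTIDES = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]


def allowed_sets(peptides: list[str]) -> list[set[str]]:
    """Yes/no motif: at each position, allow exactly the residues that were observed."""
    return [set(column) for column in zip(*peptides)]


def fits_motif(peptide: str, motif: list[set[str]]) -> bool:
    """A peptide fits only if every residue is allowed at its position."""
    return len(peptide) == len(motif) and all(
        aa in allowed for aa, allowed in zip(peptide, motif))


motif = allowed_sets(PEPTIDES)
for pep, nm in [("RLLDDTPEV", 84), ("GLLGNVSTV", 23), ("ALAKAAAAL", 309)]:
    print(pep, f"{nm} nM", "fits" if fits_motif(pep, motif) else "does not fit")
```

RLLDDTPEV fails because R was never seen at P1. GLLGNVSTV fails because N was never seen at P5, even though it is the strongest binder of the three.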

Extended motifs
Fitness of an amino acid at each position given by P(aa)
Example, P1: P(A) = 6/10, P(G) = 2/10, P(T) = P(K) = 1/10, P(C) = P(D) = … = P(V) = 0
Problems
– Few data
– Data redundancy/duplication
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
Binders: RLLDDTPEV 84 nM, GLLGNVSTV 23 nM, ALAKAAAAL 309 nM
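Here is P(aa) at P1 computed by raw counting. The sketch uses exact fractions so the result can be compared with the slide. Note that `Fraction` reduces 6/10 to 3/5.

```python
from collections import Counter
from fractions import Fraction

PEPTIDES = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]


def position_frequencies(peptides: list[str], position: int) -> dict[str, Fraction]:
    """Raw P(aa) at a 0-based position: count of aa divided by the number of peptides."""
    counts = Counter(p[position] for p in peptides)
    return {aa: Fraction(c, len(peptides)) for aa, c in counts.items()}


p1 = position_frequencies(PEPTIDES, 0)
print({aa: str(f) for aa, f in sorted(p1.items())})
```

Amino acids that never occur, such as C, D or V at P1, are simply absent from the result, which is the same as P(aa) = 0.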

Sequence information
Raw sequence counting
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV

Sequence weighting
Poor or biased sampling of sequence space
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
The five similar sequences (ALAKAAAAM, ALAKAAAAN, ALAKAAAAR, ALAKAAAAT, ALAKAAAAV) each get weight 1/5
Example, P1: P(A) = 2/6, P(G) = 2/6, P(T) = P(K) = 1/6, P(C) = P(D) = … = P(V) = 0
Binders: RLLDDTPEV 84 nM, GLLGNVSTV 23 nM, ALAKAAAAL 309 nM
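Here is a sketch of the weighting in the P1 example. The clustering rule is a hypothetical simplification chosen to reproduce the slide's grouping: peptides sharing their first eight residues count as one cluster of similar sequences. Each peptide then gets weight 1/(cluster size), so the total weight drops from 10 to 6.

```python
from collections import Counter, defaultdict
from fractions import Fraction

PEPTIDES = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]


def cluster_weights(peptides: list[str], prefix_len: int = 8) -> list[Fraction]:
    """Weight each peptide by 1 / (size of its cluster).

    Hypothetical clustering rule: peptides sharing the same first `prefix_len`
    residues form one cluster of similar sequences.
    """
    sizes = Counter(p[:prefix_len] for p in peptides)
    return [Fraction(1, sizes[p[:prefix_len]]) for p in peptides]


def weighted_frequencies(peptides: list[str], weights: list[Fraction],
                         position: int) -> dict[str, Fraction]:
    """Weighted P(aa) at a 0-based position: summed weights of aa / total weight."""
    totals: dict[str, Fraction] = defaultdict(Fraction)
    for pep, w in zip(peptides, weights):
        totals[pep[position]] += w
    total_weight = sum(weights, Fraction(0))
    return {aa: w / total_weight for aa, w in totals.items()}


weights = cluster_weights(PEPTIDES)  # the five ALAKAAAA* peptides get 1/5 each
p1 = weighted_frequencies(PEPTIDES, weights, 0)
print({aa: str(f) for aa, f in sorted(p1.items())})
```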

Sequence weighting
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV

Pseudo counts
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
I is not found at position P9. Does this mean that I is forbidden (P(I) = 0)?
No! Use the Blosum substitution matrix to estimate a pseudo frequency of I at P9

The Blosum matrix
(Table: 20 x 20 substitution scores for A R N D C Q E G H I L K M F P S T W Y V)
Some amino acids are highly conserved (e.g. C), while others have a high chance of mutation (e.g. I)

What is a pseudo count?
(Table: Blosum conditional probabilities, rows and columns A R N D C …. Y V)
Say I observe V at P1. Knowing that V at P1 binds, what is the probability that a peptide could have I at P1?
P(I|V) = 0.16

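Pseudo count estimation
Calculate the observed amino acid frequencies $f_a$
Pseudo frequency for amino acid b, where $q(b \mid a)$ is the Blosum conditional probability of b given a:
$$g_b = \sum_a f_a \, q(b \mid a)$$
Example peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV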
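Here is a sketch of the pseudo-frequency formula $g_b = \sum_a f_a \, q(b \mid a)$ on a toy three-letter alphabet (I, L, V). Only q(I|V) = 0.16 comes from the lecture. Every other conditional probability in `Q` is a hypothetical placeholder; real use would take the full 20 x 20 Blosum conditional-probability table.

```python
# Hypothetical conditional probabilities q(b|a): chance of b given that a was observed.
# Only q(I|V) = 0.16 comes from the lecture; the remaining values are placeholders.
# Each row sums to 1.
Q = {
    "V": {"V": 0.60, "I": 0.16, "L": 0.24},
    "I": {"V": 0.20, "I": 0.55, "L": 0.25},
    "L": {"V": 0.20, "I": 0.20, "L": 0.60},
}


def pseudo_frequencies(f: dict[str, float],
                       q: dict[str, dict[str, float]]) -> dict[str, float]:
    """g_b = sum over a of f_a * q(b|a)."""
    alphabet = list(q)
    return {b: sum(f.get(a, 0.0) * q[a][b] for a in alphabet) for b in alphabet}


# Observed frequencies at one position: V three times, L once, I never.
f = {"V": 0.75, "L": 0.25}
g = pseudo_frequencies(f, Q)
print({b: round(x, 3) for b, x in g.items()})
```

Even though I was never observed, it receives a non-zero pseudo frequency because V and L are similar to I.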

Weight on pseudo count
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
$$p_a = \frac{\alpha f_a + \beta g_a}{\alpha + \beta}$$
Pseudo counts are important when only limited data is available
With large data sets, only "true" observations should count
α is the effective number of sequences (N-1), β is the weight on the prior

Weight on pseudo count: example
If α is large, p ≈ f and only the observed data define the motif
If α is small, p ≈ g and the pseudo counts (or prior) define the motif
β is normally in the range [50-200]
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV
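Here is how observed and pseudo frequencies are mixed with $p_a = (\alpha f_a + \beta g_a)/(\alpha + \beta)$. The frequencies below are illustrative. α = 9 corresponds to N - 1 for ten peptides if sequence weighting is ignored (an assumption), and β = 50 sits at the low end of the usual range.

```python
def combine(f: dict[str, float], g: dict[str, float],
            alpha: float, beta: float) -> dict[str, float]:
    """p_a = (alpha * f_a + beta * g_a) / (alpha + beta) for every letter in f or g."""
    letters = set(f) | set(g)
    return {a: (alpha * f.get(a, 0.0) + beta * g.get(a, 0.0)) / (alpha + beta)
            for a in letters}


f = {"V": 0.75, "L": 0.25}             # observed: I never seen
g = {"V": 0.50, "I": 0.17, "L": 0.33}  # illustrative pseudo frequencies

few_data = combine(f, g, alpha=9, beta=50)     # small alpha: p is pulled toward g
lots_data = combine(f, g, alpha=1e6, beta=50)  # large alpha: p is close to f
print(round(few_data["I"], 3), round(lots_data["I"], 6))
```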

Sequence weighting and pseudo counts
Binders: RLLDDTPEV 84 nM, GLLGNVSTV 23 nM, ALAKAAAAL 309 nM
With sequence weighting and pseudo counts, P7(P) > 0 and P7(S) > 0
Peptides: ALAKAAAAM ALAKAAAAN ALAKAAAAR ALAKAAAAT ALAKAAAAV GMNERPILT GILGFVFTM TLNAWVKVV KLNEPVLLL AVVPFIVSV

Position specific weighting
We know that positions 2 and 9 are anchor positions for most MHC binding motifs
– Increase weight on high information positions
(Figure: motif found on large data set)

Weight matrices
Estimate amino acid frequencies from the alignment, including sequence weighting and pseudo counts
What do the numbers mean?
– P2(V) > P2(M). Does this mean that V enables binding more than M?
– In nature not all amino acids are found equally often
– q(M) = 0.025, q(V) = …
– Finding 7% V is hence not significant, but 2% M highly significant
In nature V is found more often than M, so we must somehow rescale with the background
(Table: frequencies of A R N D C Q E G H I L K M F P S T W Y V at each position)

Weight matrices
A weight matrix is given as
$$W_{ij} = \log\frac{p_{ij}}{q_j}$$
where i is a position in the motif and j an amino acid, and $q_j$ is the background frequency for amino acid j.
W is an L x 20 matrix, where L is the motif length
(Table columns: A R N D C Q E G H I L K M F P S T W Y V)
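Here is how frequencies become one row of a weight matrix with $W_{ij} = \log(p_{ij}/q_j)$. The sketch uses base-2 logarithms, an assumption that only rescales the scores. The p and q values are hypothetical except q_M = 0.025, which is quoted in the lecture. p must be non-zero for the logarithm, which is one practical reason pseudo counts matter.

```python
import math


def log_odds_row(p: dict[str, float], q: dict[str, float]) -> dict[str, float]:
    """One row of a weight matrix: W_ij = log2(p_ij / q_j) for each amino acid j."""
    return {aa: math.log2(p[aa] / q[aa]) for aa in p}


# q["M"] = 0.025 is quoted in the lecture; all other numbers are hypothetical.
q = {"M": 0.025, "L": 0.10, "W": 0.01}
p = {"M": 0.10, "L": 0.10, "W": 0.0025}
print(log_odds_row(p, q))  # > 0 enriched, 0 at background level, < 0 depleted
```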

Scoring a sequence to a weight matrix
Score sequences to the weight matrix by looking up and adding L values from the matrix
(Table columns: A R N D C Q E G H I L K M F P S T W Y V)
Peptides: RLLDDTPEV, GLLGNVSTV (23 nM), ALAKAAAAL (309 nM)
Which peptide is most likely to bind? Which peptide second?
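Here is an end-to-end toy version of scoring. It builds an L x 20 log-odds matrix from the 10 example peptides and sums the L looked-up values for each candidate. To keep it short, it replaces sequence weighting and Blosum pseudo counts with a flat pseudo count of 1 and a uniform background of 1/20. Both are simplifying assumptions, not the lecture's recipe.

```python
import math
from collections import Counter

PEPTIDES = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
            "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV"]
ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def build_matrix(peptides: list[str], pseudo: float = 1.0) -> list[dict[str, float]]:
    """Toy weight matrix W[i][j] = log2(p_ij / q_j).

    Simplifications (assumptions): p_ij = (count + pseudo) / (N + 20 * pseudo)
    instead of weighted counts plus Blosum pseudo frequencies, and q_j = 1/20.
    """
    q = 1.0 / len(ALPHABET)
    n = len(peptides)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)
        row = {}
        for aa in ALPHABET:
            p = (counts[aa] + pseudo) / (n + pseudo * len(ALPHABET))
            row[aa] = math.log2(p / q)
        matrix.append(row)
    return matrix


def score(peptide: str, matrix: list[dict[str, float]]) -> float:
    """Look up and add the L matrix values, one per position."""
    return sum(row[aa] for row, aa in zip(matrix, peptide))


W = build_matrix(PEPTIDES)
scores = {pep: score(pep, W) for pep in ("RLLDDTPEV", "GLLGNVSTV", "ALAKAAAAL")}
for pep, s in scores.items():
    print(pep, round(s, 2))
```

This toy matrix ranks ALAKAAAAL (about 17.1) well above GLLGNVSTV (about 6.3) and RLLDDTPEV (about 0.7), even though GLLGNVSTV has the best measured affinity (23 nM). The five near-identical ALAKAAAA peptides dominate the raw counts. This illustrates why sequence weighting and Blosum-based pseudo counts are used.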