Published by Sandra Anastasia Hoover. Modified over 9 years ago.
1
Identifying Biologically Relevant Amino Acids in Immunogenetic Studies
Richard M. Single
Department of Mathematics and Statistics, University of Vermont
2
Outline
HLA background and nomenclature
Asymmetric Linkage Disequilibrium (ALD)
–Motivation, Definition & Example
Amino acid level analyses of HLA disease associations
–SFVT Analysis & Pairwise allele level analyses
–Conditional Haplotype analyses & ALD
Identifying units of selection
–ALD as a tool
3
An extremely gene-rich region.
Klein J. et al., New Eng J Med, 2000; 343:702-709
4
HLA molecules are cell-surface proteins that present peptide fragments to T-cells.
HLA molecules bind specific sets of peptides (based on structure).
Any given HLA allele encodes a molecule that presents only a subset of the available peptides to T-cells.
5
HLA Allele Nomenclature
Example: HLA-A*24:02:01:02L
Locus: HLA-A
Field 1 (2-Digit): Serological level (where possible)
Field 2 (4-Digit): Peptide level (amino acid difference)
Field 3 (6-Digit): Nucleotide level [silent] (synonymous substitutions)
Field 4 (8-Digit): Intron level (3’ or 5’ polymorphism)
Expression suffix: N = null, L = low, S = soluble, …
For most analyses, we want to distinguish among unique peptide sequences, i.e., the 2-field (“4-digit”) level.
This level of resolution treats alleles with the same peptide sequence for exons 2 & 3 (class I) or exon 2 (class II) as being equivalent [“binning” alleles].
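The field structure above can be illustrated with a small parser. This is a minimal sketch, not an official nomenclature tool: the function names `parse_allele` and `to_two_fields` are hypothetical, the pattern assumes colon-delimited names, and dropping the expression suffix when truncating is a simplifying assumption (some analyses keep it, e.g. for null alleles).

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches names such as "HLA-A*24:02:01:02L": optional "HLA-" prefix, locus,
# one to four colon-delimited fields, and an optional one-letter expression suffix.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})(?P<suffix>[A-Z])?$"
)


class HlaAllele(NamedTuple):
    locus: str
    fields: Tuple[str, ...]
    suffix: Optional[str]


def parse_allele(name: str) -> HlaAllele:
    """Split an allele name into locus, fields, and expression suffix."""
    match = _ALLELE_RE.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    return HlaAllele(match["locus"], tuple(match["fields"].split(":")), match["suffix"])


def to_two_fields(name: str) -> str:
    """Reduce an allele name to the 2-field ("4-digit") peptide level.

    Assumption: the expression suffix is dropped during truncation.
    """
    allele = parse_allele(name)
    return f"{allele.locus}*{':'.join(allele.fields[:2])}"


print(to_two_fields("HLA-A*24:02:01:02L"))  # A*24:02
```

Alleles that differ only in fields 3 and 4 collapse to the same 2-field name, which is the "binning" described on the slide.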
6
HLA Nomenclature and why it matters
Challenges for HLA data management and analysis
–The HLA genes are very polymorphic;
–HLA nomenclature is complicated;
–There are multiple ways to generate HLA data;
–All common typing systems generate ambiguous data;
–There are multiple ways to report alleles and ambiguities.
These issues make meta-analyses of HLA data from different sources very difficult.
7
Extending STREGA to Immunogenomic Studies
The STrengthening the REporting of Genetic Association studies (STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies.
The IDAWG (immunogenomics.org) has proposed an extension of STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS).
8
From STREGA to STREIS
Extensions to the STREGA guidelines for immunogenomic data include:
Describing the system(s) used to store, manage, and validate genotype and allele data
Documenting all methods applied to resolve ambiguity
Defining any codes used to represent ambiguities
-e.g., NMDP codes
-A*0201/0209/0266 = A*02AJEY
-A*0201/0209/0266/0275/0289 = A*02BSFJ
Describing any binning or combining of alleles into common categories
-e.g., G-codes
-A*0201/0209/0243N/0266/0275/0283N/0289 = “A020101g”
Avoiding the use of subjective terms (e.g., high-resolution typing) that may change over time
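To make the ambiguity-code point concrete, the sketch below expands a code into the alleles it stands for. The lookup table holds only the two NMDP codes quoted on the slide; a real analysis would load the full code list from its maintainers, and the names `NMDP_CODES` and `expand_code` are hypothetical.

```python
# Only the two codes shown on the slide; not a complete NMDP code table.
NMDP_CODES = {
    "A*02AJEY": ["0201", "0209", "0266"],
    "A*02BSFJ": ["0201", "0209", "0266", "0275", "0289"],
}


def expand_code(code: str) -> list:
    """Return the explicit allele list represented by an ambiguity code."""
    locus = code.split("*")[0]
    return [f"{locus}*{allele}" for allele in NMDP_CODES[code]]


def shared_alleles(code_a: str, code_b: str) -> list:
    """Alleles compatible with both ambiguity codes, in the order of code_a."""
    in_b = set(expand_code(code_b))
    return [allele for allele in expand_code(code_a) if allele in in_b]


print(expand_code("A*02AJEY"))
print(shared_alleles("A*02AJEY", "A*02BSFJ"))
```

Documenting the table used for this expansion is the kind of detail the STREIS extension asks authors to report.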
9
Resources for HLA Data Validation & Analysis
Immunology Database and Analysis Portal (www.ImmPort.org)
Developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
–Data validation pipeline
–Analysis tools
–Standardized ambiguity reduction tools
–Data from a large number of immunogenomic studies
ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org) (www.IgDAWG.org)
An international collaborative group working to …
–facilitate the sharing of immunogenomic data (HLA, KIR, etc.) and
–foster consistent analysis and interpretation of immunogenomic data
11
Asymmetric Linkage Disequilibrium (ALD)
-Standard LD measures give an incomplete description of the correlation of genetic variation at two loci when there are different numbers of alleles at the loci.
-We developed a pair of conditional asymmetric LD (ALD) measures that more accurately capture this information.
-For disease association studies, the ALD can help to identify when stratification analyses can be applied to detect primary disease predisposing genes.
-For evolutionary studies, the ALD can be informative for the study of forces such as selection acting on individual amino acids, or other loci in high LD.
-For SNP studies, ALD measures can be used for analyses of LD between haplotype blocks, for SNP–gene LD, and for haplotype block–gene LD.
12
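Linkage Disequilibrium (LD) Measures
The two most common measures of the strength of LD are:
(1) the normalized measure of the individual LD values, namely $D'_{ij} = D_{ij} / D_{max}$ (Lewontin 1964); and
(2) the correlation coefficient $r$ for bi-allelic data, which is most often reported as $r^2 = D^2 / (p_{A1}\, p_{A2}\, p_{B1}\, p_{B2})$.
$r = 1$ only when the allelic variations at the two loci show 100% correlation.
Their multi-allelic extensions are the overall $D'$ and $W_n$, for loci with $k$ and $l$ alleles:
$D' = \sum_i \sum_j p_{Ai}\, p_{Bj}\, |D'_{ij}|$
$W_n = \left[ \sum_i \sum_j \frac{D_{ij}^2}{p_{Ai}\, p_{Bj}} \Big/ (\min(k, l) - 1) \right]^{1/2}$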
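For the bi-allelic case, the two measures can be computed directly from the four haplotype frequencies. This is a minimal sketch under stated assumptions: the function name `biallelic_ld` is hypothetical, both loci are assumed to be polymorphic, and $D'$ is returned with its sign (many reports use the absolute value).

```python
def biallelic_ld(f11: float, f12: float, f21: float, f22: float):
    """Return (D, D', r^2) from haplotype frequencies f(A_i B_j) at two bi-allelic loci."""
    if abs(f11 + f12 + f21 + f22 - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a1, p_a2 = f11 + f12, f21 + f22
    p_b1, p_b2 = f11 + f21, f12 + f22
    if min(p_a1, p_a2, p_b1, p_b2) <= 0:
        raise ValueError("both loci must be polymorphic")
    d = f11 - p_a1 * p_b1
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max
    r_squared = d * d / (p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r_squared


d, d_prime, r_squared = biallelic_ld(0.4, 0.1, 0.1, 0.4)
print(round(d, 3), round(d_prime, 3), round(r_squared, 3))  # 0.15 0.6 0.36
```

When only two haplotypes are present, for example frequencies 0.3, 0, 0, 0.7, both $D'$ and $r^2$ equal 1, which is the 100% correlation case mentioned on the slide.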
13
Asymmetric LD measures: $W_{A/B}$ and $W_{B/A}$
When there are different numbers of alleles at two loci, the direct correlation property for the $r$ measure is not retained.
The asymmetric LD (ALD) measures more accurately reflect covariation at two loci.
-$W_{A/B}$ and $W_{B/A}$ each describe the variation observed at the first-named locus conditioned on the second.
Example (two and three alleles at the A and B loci): $f(A_1B_1) = 0.3$, $f(A_2B_2) = 0.5$, $f(A_2B_3) = 0.2$ gives $W_n = 1$, $W_{A/B} = 1$ and $W_{B/A} = 0.73$.
There is variation at the B locus on haplotypes containing the $A_2$ allele, so there is not 100% correlation.
-ALD measures indicate that, with appropriate sample size, stratification analyses could be carried out for some comparisons.
-$W_n = 1$ could result in passing over these data for conditional analyses.
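The worked example can be checked numerically. The sketch below assumes the ALD definition from Thomson and Single (2014), $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$, where $F_A$ is the homozygosity at locus A and $F_{A/B}$ is the haplotype-specific homozygosity of A given the B alleles, and it takes $W_n$ in its usual Cramér's V form. The function name `ld_measures` and the dictionary layout are hypothetical, and both loci are assumed to be polymorphic.

```python
from math import sqrt


def ld_measures(hap_freqs: dict) -> dict:
    """Compute W_n, W_A/B and W_B/A from {(a_allele, b_allele): frequency}."""
    p_a, p_b = {}, {}
    for (a, b), f in hap_freqs.items():
        p_a[a] = p_a.get(a, 0.0) + f
        p_b[b] = p_b.get(b, 0.0) + f

    # Overall homozygosity at each locus.
    f_a = sum(p * p for p in p_a.values())
    f_b = sum(p * p for p in p_b.values())

    # Haplotype-specific homozygosity of one locus conditional on the other.
    f_a_given_b = sum(f * f / p_b[b] for (a, b), f in hap_freqs.items() if f > 0)
    f_b_given_a = sum(f * f / p_a[a] for (a, b), f in hap_freqs.items() if f > 0)

    w_a_given_b = sqrt(max(0.0, (f_a_given_b - f_a) / (1.0 - f_a)))
    w_b_given_a = sqrt(max(0.0, (f_b_given_a - f_b) / (1.0 - f_b)))

    # Symmetric W_n: sum of D_ij^2 / (p_Ai p_Bj) over all allele pairs.
    chi = 0.0
    for a, pa in p_a.items():
        for b, pb in p_b.items():
            d_ij = hap_freqs.get((a, b), 0.0) - pa * pb
            chi += d_ij * d_ij / (pa * pb)
    w_n = sqrt(chi / (min(len(p_a), len(p_b)) - 1))

    return {"W_n": w_n, "W_A/B": w_a_given_b, "W_B/A": w_b_given_a}


example = {("A1", "B1"): 0.3, ("A2", "B2"): 0.5, ("A2", "B3"): 0.2}
for name, value in ld_measures(example).items():
    print(name, round(value, 2))  # W_n 1.0, W_A/B 1.0, W_B/A 0.73
```

The asymmetry comes from the two conditional homozygosities: every B allele determines its A allele here, but the $A_2$ allele leaves B undetermined.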
14
Standard LD measures D’ and Wn
Standard LD measures (overall D’ & Wn) assume/force symmetry, even though with >2 alleles per locus that is not the case.
Data Source: ImmPort Study #SDY26: Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine
15
Asymmetric Linkage Disequilibrium (ALD)
ALD: row gene conditional on column gene
Interpretation:
ALD for HLA-DRB1 conditioning on HLA-DQA1: $W_{DRB1/DQA1} = 0.58$
ALD for HLA-DQA1 conditioning on HLA-DRB1: $W_{DQA1/DRB1} = 0.95$
The overall variation for DRB1 is relatively high given specific DQA1 alleles.
The overall variation for DQA1 is relatively low given specific DRB1 alleles.
16
Asymmetric Linkage Disequilibrium (ALD)
Thomson and Single (2014) Genetics
18
Other Conditional Measures of LD
Other conditional measures of LD have been proposed (Nei and Li, 1980; Chakravarti et al., 1984; Hudson, 1985; Kaplan and Weir, 1992; Guo SW, 1997).
-They measure association between alleles at a marker locus (locus B) and alleles at a disease locus (locus A).
-They were developed to account for study designs in which individuals are not randomly sampled from a single population, but where sampling intensity varies within disease categories.
-They are equivalent to Somers' D statistic defined on the contingency table relating two categorical variables.
In contrast, our statistic is a population-based measure that does not depend on a specific patient sampling scheme.
19
ALD & tag-SNPs in the HLA region
de Bakker et al. (2006) identified tag-SNPs based on $r^2$ between SNPs and recoded HLA alleles (each HLA allele recoded as presence/absence of that specific allele).
de Bakker et al. (2006) Nature Genetics
20
ALD & tag-SNPs in the HLA region
Thomson and Single (2014) Genetics
25
Juvenile Idiopathic Arthritis oligoarticular persistent (JIA-OP)
Common HLA-DRB1 alleles
Risk Category: I I II III
DRB1      patients  controls  OR
*08:01    102       13        6.9
*11:04    57        11        4.3
*13:01    90        38        1.9
*11:01    60        36        1.3
*01:01    74        50        1.2
*03:01    89        61        1.1
*13:02    28        23        0.9
*04:04    7         16        0.3
*15:01    38        80        0.3
*07:01    30        65        0.3
*04:01    21        47        0.3
sum       596       440
total     708       546
Overall p-value < 2.6E-27
AA 86 implicated via pairwise within serogroup analysis
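The odds ratios in this table are consistent with comparing each allele against all other alleles, using the totals of 708 patient and 546 control alleles. The sketch below recomputes them under that assumption; the variable and function names are hypothetical.

```python
# Allele counts from the slide: allele -> (patients, controls).
COUNTS = {
    "*08:01": (102, 13), "*11:04": (57, 11), "*13:01": (90, 38),
    "*11:01": (60, 36), "*01:01": (74, 50), "*03:01": (89, 61),
    "*13:02": (28, 23), "*04:04": (7, 16), "*15:01": (38, 80),
    "*07:01": (30, 65), "*04:01": (21, 47),
}
TOTAL_PATIENTS = 708
TOTAL_CONTROLS = 546


def odds_ratio(patients: int, controls: int) -> float:
    """Odds of carrying the allele (vs. any other allele) in patients relative to controls."""
    patient_odds = patients / (TOTAL_PATIENTS - patients)
    control_odds = controls / (TOTAL_CONTROLS - controls)
    return patient_odds / control_odds


ODDS_RATIOS = {allele: odds_ratio(p, c) for allele, (p, c) in COUNTS.items()}
for allele, value in ODDS_RATIOS.items():
    print(f"DRB1{allele}: OR = {value:.1f}")
```

Rounded to one decimal place, the computed values agree with the slide, including 0.3 for each of the four lowest-risk alleles.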
26
Sequence Feature Variant Type (SFVT) Analysis - Overview
An exploratory approach for genetic association studies that uses combinations of amino acid (AA) residues as the unit of analysis.
Goal:
–To identify biologically relevant amino acid (AA) residues that account for the major disease risk attributable to HLA
Genes/proteins are sub-divided into biologically relevant units affecting gene expression and/or protein function (i.e., Sequence Features)
–Polymorphic AAs (single AA sites)
–Structural features (e.g., beta 1 domain, alpha-helix 2, …)
–Functional features (e.g., peptide binding, T-cell interacting, …)
–Combinational (e.g., alpha-helix 2 & peptide binding, …)
27
www.immport.org
28
Summary of SFVT Analysis
HLA Typing (Allele-level)
Group HLA alleles based on structural/functional sequence motifs (Sequence Features)
Perform disease association tests based on sequence motifs (Sequence Feature-level)
Choose the top Sequence Features associated with disease risk for further study
Identify individual AAs & combinations of AAs directly involved in disease risk
ORs & p-values; LD patterns; Conditional/Stratification analyses
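The grouping step of this workflow can be sketched with data shown elsewhere in the talk: the patient/control counts for the 11 common DRB1 alleles in JIA-OP and their residues at AA positions 13, 67, 74, 86, 37 and 57. This is only an illustration, assuming a sequence feature can be treated as a tuple of positions; the function names are hypothetical, and because rarer alleles are left out the results will not match the published SFVT table.

```python
from collections import defaultdict

# Residues are listed in this position order.
POSITIONS = (13, 67, 74, 86, 37, 57)

# allele -> (residues at POSITIONS, patient count, control count)
ALLELES = {
    "DRB1*0801": ("GFLGYS", 102, 13), "DRB1*1104": ("SFAVYD", 57, 11),
    "DRB1*1301": ("SIAVND", 90, 38), "DRB1*1101": ("SFAGYD", 60, 36),
    "DRB1*0101": ("FLAGSD", 74, 50), "DRB1*0301": ("SLRVND", 89, 61),
    "DRB1*1302": ("SIAGND", 28, 23), "DRB1*0404": ("HLAVYD", 7, 16),
    "DRB1*1501": ("RIAVSD", 38, 80), "DRB1*0701": ("YIQGFV", 30, 65),
    "DRB1*0401": ("HLAGYD", 21, 47),
}
TOTAL_PATIENTS, TOTAL_CONTROLS = 708, 546


def variant_type_counts(feature_positions):
    """Pool allele counts by the residue motif at the given positions."""
    index = [POSITIONS.index(p) for p in feature_positions]
    pooled = defaultdict(lambda: [0, 0])
    for residues, patients, controls in ALLELES.values():
        motif = "".join(residues[i] for i in index)
        pooled[motif][0] += patients
        pooled[motif][1] += controls
    return dict(pooled)


def variant_type_odds_ratio(patients, controls):
    """Odds ratio of one variant type against all other alleles."""
    return (patients / (TOTAL_PATIENTS - patients)) / (controls / (TOTAL_CONTROLS - controls))


for motif, (patients, controls) in variant_type_counts((13,)).items():
    print(motif, patients, controls, round(variant_type_odds_ratio(patients, controls), 2))
```

Passing a different tuple, for example `(37, 57)`, pools the same alleles by a two-residue motif instead of a single site.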
29
Representative Sequence Features: HLA-DRB1 Table from Karp et al. (2010) Hum Molec Genet
30
Variant Types for HLA-DRB1_SF153 “beta-strand 2_peptide antigen binding”
… 5 of 11 Variant Types (VTs) for Sequence Feature 153 (SF153)
DRB1_SF153_VT1 (LEC): DRB1*0101, 0102, 0103, 0104, 0105, …
DRB1_SF153_VT2 (FEL): DRB1*0113, 0701, 0703, 0704, 0705, …
DRB1_SF153_VT3 (YDY): DRB1*0301, 0304, 0305, 0306, 0308, …
Karp et al. (2010) Hum Molec Genet
31
SFVT analysis DRB1 summary for JIA-OP
DRB1: AAs 13, 67, 37, 57, 74, 86 in binding pockets 6, 4, 7, and 9
DRB1               Amino Acids                  p-value   ORmax  ORmin
AA position 13     13                           2.00E-28  4.9    0.33
Pocket 6           11, 13, 30                   4.00E-28  7.1    0.31
Pocket 4           13, 26, 28, 70, 71, 74, 78   6.00E-28  6.8    0.28
DRB1 allele        9 … 86                       1.00E-27  9.4    0.28
Pocket 7           28, 30, 47, 61, 67, 71       9.00E-27  9.4    0.28
AA positions X-LD  [11, 12, 10, 16]             9.00E-25  3.2    0.33
AA position 67     67                           3.00E-17  3.4    0.54
Pocket 9           9, 37, 57                    4.00E-16  3.9    0.33
AA position 74     74                           4.00E-16  6.8    0.33
AA position 37     37                           4.00E-13  1.8    0.34
AA position 57     57                           6.00E-13  3.9    0.44
…                  …
AA position 86     86                           ns        1.1    0.9
AAs underlined have a potential effect on disease risk; the effect of those in italics may be explained by LD with AA 13.
Note that AA 86 is NS by SFVT analysis.
32
SFVT Analysis - Summary
An exploratory approach for identifying biologically relevant AAs in HLA association studies
Pros
–Utilizes information about the inter-relationships among HLA alleles
–Covers more extended protein regions than single amino acid-based analyses
Cons
–Care is needed to address complex patterns of LD among AAs and SFs in order to identify AAs directly involved in disease
–Due to multiple comparisons with highly correlated SFs, appropriate p-value adjustments are necessary
–The effects of some amino acids (or combinations) may be missed, so complementary analyses are useful
33
Conditional Haplotype Analysis of JIA-OP
DRB1 Amino Acids 13 and 67
13 - 67   patients  controls  OR
G - F     108       14        6.8
S - F     130       49        2.3
S - I     131       71        1.5
G - I     13        8         1.3
S - L     102       80        1.0
R - I     44        91        0.2
others    270       233
p < 8E-9: AA 13 involved or an AA in LD
overall p < 2E-28
34
Conditional Haplotype Analysis of JIA-OP
DRB1 Amino Acids 13 and 67
13 - 67   patients  controls  OR
G - F     108       14        6.8
S - F     130       49        2.3
S - I     131       71        1.5
G - I     13        8         1.3
S - L     102       80        1.0
R - I     44        91        0.2
others    270       233
p < 0.002: AA 67 involved or an AA in LD
p < 0.001: AA 67 involved or an AA in LD
An extensive set of CH analyses is required, as well as consideration of LD patterns.
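The logic of a conditional haplotype comparison can be sketched as follows: hold one site fixed (for example AA 67 = F) and test whether haplotypes that differ only at the other site have different patient/control distributions. This sketch uses a plain Pearson chi-square on the counts in the table; it is an assumption that this matches the test behind the slide's p-values, so the numbers it prints are illustrative and need not reproduce them. All names are hypothetical.

```python
from math import erfc, exp, sqrt

# (AA 13, AA 67) -> (patients, controls), taken from the table above.
HAPLOTYPES = {
    ("G", "F"): (108, 14), ("S", "F"): (130, 49), ("S", "I"): (131, 71),
    ("G", "I"): (13, 8), ("S", "L"): (102, 80), ("R", "I"): (44, 91),
}


def chi_square(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x 2 table."""
    row_totals = [p + c for p, c in rows]
    col_totals = (sum(p for p, _ in rows), sum(c for _, c in rows))
    n = sum(row_totals)
    stat = 0.0
    for (p, c), row_total in zip(rows, row_totals):
        for observed, col_total in ((p, col_totals[0]), (c, col_totals[1])):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(rows) - 1


def p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return erfc(sqrt(stat / 2.0))
    if df == 2:
        return exp(-stat / 2.0)
    raise NotImplementedError("only df = 1 or 2 are handled in this sketch")


def conditional_test(fixed_site, fixed_residue):
    """Compare haplotypes sharing fixed_residue at fixed_site (0 = AA 13, 1 = AA 67)."""
    rows = [counts for hap, counts in HAPLOTYPES.items() if hap[fixed_site] == fixed_residue]
    stat, df = chi_square(rows)
    return stat, df, p_value(stat, df)


# Effect of AA 13 (G vs. S) with AA 67 fixed at F.
print(conditional_test(1, "F"))
# Effect of AA 67 (F vs. I) with AA 13 fixed at G.
print(conditional_test(0, "G"))
```

A significant result with one site held fixed points to the varying site, or to another AA in LD with it, which is why the slide pairs these tests with LD patterns.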
36
LD for DRB1 AAs (JIA controls)
[Figure: two panels. Asymmetric LD (ALD), row gene conditional on column gene; Wn (symmetric)]
37
Conditional Haplotype Analysis of JIA-OP
38
Common DRB1 Alleles & AAs in JIA-OP
                  AA position
OR   Allele       13  67  74  86  37  57
6.9  DRB1*0801    G   F   L   G   Y   S
4.3  DRB1*1104    S   F   A   V   Y   D
1.9  DRB1*1301    S   I   A   V   N   D
1.3  DRB1*1101    S   F   A   G   Y   D
1.2  DRB1*0101    F   L   A   G   S   D
1.1  DRB1*0301    S   L   R   V   N   D
0.9  DRB1*1302    S   I   A   G   N   D
0.3  DRB1*0404    H   L   A   V   Y   D
0.3  DRB1*1501    R   I   A   V   S   D
0.3  DRB1*0701    Y   I   Q   G   F   V
0.3  DRB1*0401    H   L   A   G   Y   D
These alleles show the strongest evidence for direct involvement in JIA-OP disease risk.
The 6 identified AA sites uniquely define each allele, preventing further stratification analyses.
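The statement that the six sites uniquely define each allele can be checked directly from the table. The sketch below is a simple illustration with hypothetical function names: it tests whether the residue motifs at a chosen set of positions are all distinct, and lists the alleles that still share a motif when fewer sites are used.

```python
POSITIONS = (13, 67, 74, 86, 37, 57)

# Residues at POSITIONS for the 11 common alleles, copied from the table above.
MOTIFS = {
    "DRB1*0801": "GFLGYS", "DRB1*1104": "SFAVYD", "DRB1*1301": "SIAVND",
    "DRB1*1101": "SFAGYD", "DRB1*0101": "FLAGSD", "DRB1*0301": "SLRVND",
    "DRB1*1302": "SIAGND", "DRB1*0404": "HLAVYD", "DRB1*1501": "RIAVSD",
    "DRB1*0701": "YIQGFV", "DRB1*0401": "HLAGYD",
}


def motifs_at(positions):
    """Map each allele to its residue motif restricted to the given positions."""
    index = [POSITIONS.index(p) for p in positions]
    return {allele: "".join(m[i] for i in index) for allele, m in MOTIFS.items()}


def is_unique(positions):
    """True if no two alleles share a motif at the given positions."""
    motifs = list(motifs_at(positions).values())
    return len(set(motifs)) == len(motifs)


def shared_motifs(positions):
    """Motifs carried by more than one allele, with the alleles that carry them."""
    groups = {}
    for allele, motif in motifs_at(positions).items():
        groups.setdefault(motif, []).append(allele)
    return {motif: alleles for motif, alleles in groups.items() if len(alleles) > 1}


print(is_unique(POSITIONS))     # True: all six sites separate every allele
print(shared_motifs((13, 67)))  # alleles still pooled when only AAs 13 and 67 are used
```

Once every motif corresponds to a single allele, no haplotypes remain that differ at one site while matching at the others, so there is nothing left to stratify on.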
40
Balancing Selection Operates at Most HLA Loci
Balancing selection can result from:
- Overdominance/Heterozygote advantage
- Frequency-dependent selection
- Selective regimes that change over time/space
For HLA, the common factor in these models is rare allele advantage, which is consistent with a pathogen-directed frequency-dependent selection model.
At the Amino Acid (AA) level we see
-High AA variability at antigen recognition sites (ARS)
-Relatively even AA frequencies at ARS sites
-Higher rates of non-synonymous vs. synonymous changes at ARS
Meyer & Mack, 2008
41
Homozygosity (F) and the Normalized Deviate (Fnd)
$F_{nd} = (F_{OBS} - F_{EQ}) / SD(F_{EQ})$
Neutrality: $F_{OBS} \approx F_{EQ}$, $F_{nd} \approx 0$
Directional Selection: $F_{OBS} > F_{EQ}$, $F_{nd} > 0$
Balancing Selection: $F_{OBS} < F_{EQ}$, $F_{nd} < 0$
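A minimal sketch of the calculation follows. It assumes that $F_{OBS}$ is the sum of squared allele frequencies and that the neutral expectation $F_{EQ}$ and its standard deviation are supplied from elsewhere (for example, from simulation under neutrality for the same sample size and number of alleles). The numeric inputs in the example are hypothetical, as are the function names.

```python
def homozygosity(freqs):
    """Expected homozygosity: sum of squared allele (or residue) frequencies."""
    return sum(p * p for p in freqs)


def normalized_deviate(f_obs, f_eq, sd_f_eq):
    """F_nd = (F_OBS - F_EQ) / SD(F_EQ)."""
    return (f_obs - f_eq) / sd_f_eq


def direction(f_nd):
    """Sign-based reading of F_nd, as on the slide (no significance test)."""
    if f_nd < 0:
        return "toward balancing selection"
    if f_nd > 0:
        return "toward directional selection"
    return "neutral expectation"


# Hypothetical example: four equally frequent variants, with assumed
# neutral expectation F_EQ = 0.40 and SD(F_EQ) = 0.10.
f_obs = homozygosity([0.25, 0.25, 0.25, 0.25])
f_nd = normalized_deviate(f_obs, f_eq=0.40, sd_f_eq=0.10)
print(round(f_obs, 2), round(f_nd, 2), direction(f_nd))  # 0.25 -1.5 toward balancing selection
```

Even frequencies lower $F_{OBS}$ relative to the neutral expectation, which is why negative values are read as a signal of balancing selection.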
42
Fnd for DRB1 AA sites in JIA Controls
Fnd << 0 gives evidence of possible balancing selection.
Fnd >> 0 gives evidence of possible directional selection.
43
Fnd for DRB1 AA sites (Meta-Analysis)
Fnd for all polymorphic sites in a meta-analysis of 57 populations
Fnd << 0 gives evidence of possible balancing selection.
Fnd >> 0 gives evidence of possible directional selection.
44
LD for DRB1 AAs
[Figure: two panels. Asymmetric LD (ALD): JIA – Controls (row gene conditional on column gene); Wn (symmetric): JIA – Controls]
45
Acknowledgements
University of Sao Paulo: Diogo Meyer
University of Graz: Wolfgang Helmberg
Cincinnati Children’s Hospital: Susan Thompson, David Glass
University of Texas: Nishanth Marthandan, Paula Guidry, David Karp, Richard Scheuermann
Children's Hospital Oakland Research Inst.: Steven J. Mack, Jill A. Hollenbach
Harvard Medical School: Alex Lancaster
UC Berkeley: Glenys Thomson
UC San Francisco: Owen Solberg
Roche Molecular Systems: Henry A. Erlich
Anthony Nolan Research Inst.: Steven G.E. Marsh, Matthew Waller
NCBI/NIH: Mike Feolo
NGIT: Jeff Wiser, Patrick Dunn, Tom Smith
46
Distributions of Fnd values
Results from a meta-analysis of 497 HLA population studies in ten geographic regions
47
Distributions of Fnd values
Solberg et al., 2008
48
Evidence of Balancing Selection at HLA-DPB1
Cano & Fernandez-Vina (2009) described two sequence dimorphisms that define the primary immunodominant serological epitopes for HLA-DPB1.
All DPB1 alleles can be divided into four serologic categories (DP1, DP2, DP3, and DP4).
49
Global Distribution of DP serological categories
50
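Fnd for DPB1 Alleles & DP Serological Categories
[Figure: Fnd plotted for DPB1 alleles and for DP serological categories; the legend symbols were not preserved]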
51
Evidence of Balancing Selection at HLA-DPB1
We constructed a randomization test (“random binning” of alleles into 4 categories) to ensure that the effect was not driven by differences in the observed number of variants at the allele level vs. the serotype level.
Randomization tests confirmed the results more often for European populations than for other geographic regions.
-A possible ascertainment bias? (many common alleles were first identified in European populations)
-Could natural selection favoring DPB1 diversity at the serologic level be greater in Europe?
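The idea behind the random binning test can be sketched as follows. Alleles are assigned at random to four non-empty categories, the category-level homozygosity is recomputed each time, and the observed serological binning is compared with that null distribution. This is a simplified sketch, not the authors' procedure: it compares homozygosity F rather than Fnd (with the sample size and the number of categories held fixed, the neutral expectation and its SD do not change, so the two orderings agree), and the allele labels, frequencies, and category assignments are hypothetical toy values.

```python
import random


def homozygosity(freqs):
    """Sum of squared frequencies."""
    return sum(p * p for p in freqs)


def bin_frequencies(allele_freqs, assignment):
    """Pool allele frequencies into the categories given by `assignment`."""
    binned = {}
    for allele, freq in allele_freqs.items():
        category = assignment[allele]
        binned[category] = binned.get(category, 0.0) + freq
    return list(binned.values())


def random_assignment(alleles, n_bins, rng):
    """Assign alleles to bins at random, retrying until every bin is used."""
    while True:
        assignment = {allele: rng.randrange(n_bins) for allele in alleles}
        if len(set(assignment.values())) == n_bins:
            return assignment


def random_binning_test(allele_freqs, observed_assignment, n_rep=2000, seed=1):
    """Return observed category-level F and the share of random binnings with F at least as low."""
    rng = random.Random(seed)
    alleles = list(allele_freqs)
    n_bins = len(set(observed_assignment.values()))
    f_obs = homozygosity(bin_frequencies(allele_freqs, observed_assignment))
    hits = 0
    for _ in range(n_rep):
        f_rand = homozygosity(
            bin_frequencies(allele_freqs, random_assignment(alleles, n_bins, rng))
        )
        if f_rand <= f_obs + 1e-12:
            hits += 1
    return f_obs, hits / n_rep


# Hypothetical toy data: eight alleles whose serological categories are equally frequent.
TOY_FREQS = {
    "allele_1": 0.20, "allele_2": 0.05, "allele_3": 0.15, "allele_4": 0.10,
    "allele_5": 0.125, "allele_6": 0.125, "allele_7": 0.22, "allele_8": 0.03,
}
TOY_SEROTYPES = {
    "allele_1": "DP1", "allele_2": "DP1", "allele_3": "DP2", "allele_4": "DP2",
    "allele_5": "DP3", "allele_6": "DP3", "allele_7": "DP4", "allele_8": "DP4",
}

f_obs, p = random_binning_test(TOY_FREQS, TOY_SEROTYPES)
print(f_obs, p)
```

A small proportion means that few random groupings are as even as the serological one, so the evenness is unlikely to be a by-product of reducing many alleles to four categories.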
52
Evidence of Balancing Selection at HLA-DPB1 Supplementary Figure S1. Mean F nd values for trios of variant DPB1 Exon 2 amino acid positions