
Evolution as a Confounding Factor in Genetic Association Studies 14 December 2011 Richard H. Scheuermann, Ph.D. Department of Pathology U.T. Southwestern.


1 Evolution as a Confounding Factor in Genetic Association Studies 14 December 2011 Richard H. Scheuermann, Ph.D. Department of Pathology U.T. Southwestern Medical Center

2 Current projects

3 Outline
Hypothesis that evolution should be considered a confounding factor in genetic association studies
HLA-mediated autoimmune disease predisposition analysis using SFVT
Identifying genetic determinants of influenza species jump events based on convergent evolution
Novel general strategies for formally controlling for evolution as a confounding factor in genetic association studies

4 EVOLUTION AS A CONFOUNDING FACTOR IN GENETIC ASSOCIATION STUDIES

5 Population-based genetic association
Many diseases exhibit evidence of genetic predispositions
Genotype-phenotype association studies
  ◦ Diagnostic biomarker
  ◦ Molecular underpinnings of disease pathology
GWAS and linkage disequilibrium
  ◦ Co-inheritance of “linked” genetic markers
  ◦ Advantage of using SNPs to detect causal variants
NGS could obviate the need for using linked SNPs

6 Statistical assumptions
Independence (confounding)
Random sampling (bias)
  ◦ Population has reached equilibrium
  ◦ Test sample represents a random sampling of the equilibrium population

7 HLA-MEDIATED AUTOIMMUNE DISEASE PREDISPOSITION ANALYSIS USING SFVT

8 Class I and II Peptide Sources

9 HLA and autoimmune disease
Disease                        HLA Allele    Relative Risk
Ankylosing spondylitis         B27           87.4
Postgonococcal arthritis       B27           14.0
Acute anterior uveitis         B27           14.6
Rheumatoid arthritis           DR4           5.8
Chronic active hepatitis       DR3           13.9
Sjogren syndrome               DR3           9.7
Insulin-dependent diabetes     DR3/DR4       14.3
21-Hydroxylase deficiency      BW47          15.0
Source: Robbins Pathologic Basis of Disease, 6th Edition (1999)

10 HLA and infectious disease
Correlation between HLA genotype and HIV viral burden and progression to AIDS
Source: M Dean, M Carrington and SJ O'Brien, Annual Review of Genomics and Human Genetics Vol. 3: 263-292 (2002)

11 HLA and drug sensitivity
HLA allele    Drug sensitivity            Association         Prevalence
B*1502        carbamazepine (epilepsy)    p = 3 x 10^-27      high in Chinese; absent in Caucasians
B*5701        abacavir (HIV)              p = 5 x 10^-20      high in Caucasians; absent in Africans, Hispanics
B*5801        allopurinol (gout)          p = 5 x 10^-24      high in Chinese
Source: P. Parham

12 Number of HLA Alleles (IMGT HLA, October 2008)
Locus         Alleles (serologically defined antigens)
HLA-A         697 (24)
HLA-B         1109 (49)
HLA-C         381 (9)
HLA-DRB       690 (20)
HLA-DQA1      34
HLA-DQB1      95 (7)
HLA-DPA1      27
HLA-DPB1      131
MICA          65
MICB          30
TAP           11
Figures in parentheses indicate the number of serologically defined antigens at each locus.
500 new submissions each year.

13 HLA Allele Nomenclature
Example: HLA-A*24020101
  ◦ HLA-A: Locus
  ◦ *: Asterisk
  ◦ 24: Allele family (serological where possible)
  ◦ 02: Amino acid difference
  ◦ 01: Non-coding (silent) polymorphism
  ◦ 01: Intron, 3’ or 5’ polymorphism
Optional expression suffix (example: HLA-A*24020102L)
  ◦ N = null, L = low, S = secreted, A = aberrant, Q = questionable
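The field layout on this slide can be illustrated with a small parser. This is only a minimal sketch, assuming the compact two-digits-per-field naming shown on the slide (no colon delimiters); the function `parse_hla_allele`, the `HlaAllele` type and its field names are hypothetical and not part of any HLA tool.

```python
import re
from typing import NamedTuple, Optional


class HlaAllele(NamedTuple):
    """Fields of a compact (two digits per field) HLA allele name."""
    locus: str
    allele_family: str
    amino_acid_variant: Optional[str]
    silent_variant: Optional[str]
    noncoding_variant: Optional[str]
    expression_suffix: Optional[str]


# Assumed layout: HLA-<locus>*<family><aa><silent><noncoding><suffix>
_PATTERN = re.compile(
    r"^HLA-(?P<locus>[A-Z0-9]+)\*"
    r"(?P<family>\d{2})(?P<aa>\d{2})?(?P<silent>\d{2})?(?P<noncoding>\d{2})?"
    r"(?P<suffix>[NLSAQ])?$"
)


def parse_hla_allele(name: str) -> HlaAllele:
    """Split an allele name into its fields; spaces are ignored."""
    compact = name.replace(" ", "")
    match = _PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a recognised compact HLA allele name: {name!r}")
    return HlaAllele(
        locus=match["locus"],
        allele_family=match["family"],
        amino_acid_variant=match["aa"],
        silent_variant=match["silent"],
        noncoding_variant=match["noncoding"],
        expression_suffix=match["suffix"],
    )


# The two names from the slide, written with the slide's spacing.
full = parse_hla_allele("HLA - A * 24 02 01 01")
low = parse_hla_allele("HLA - A * 24 02 01 02 L")
```

A 4-digit typing result such as `HLA-DRB1*1104` parses with the last three fields set to `None`, which is the resolution used in the association analysis later in the talk.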

14 DRB1 phylogeny DRB1*07 DRB1*09 DRB1*10 DRB1*04 DRB1*16 DRB1*15

15 DRB1 phylogeny DRB1*13

16 DRB1 phylogeny DRB1*07 DRB1*09 DRB1*10 DRB1*04 DRB1*16 DRB1*15

17 DRB1 alignment 07/15 07/09 09/15

18 Limitations with traditional HLA allele-based association studies
Treats entire allele as a single unit and therefore includes both causative and passenger variations
Doesn’t take into account structural relationships between alleles
  ◦ Syntax of the HLA nomenclature was designed to capture some of the structural relationships between alleles, but there are several exceptions

19 HLA-mediated disease predisposition
Hypothesis:
  ◦ While the allelic/haplotypic structures reflect evolutionary history of the locus, it is the focused regions in the HLA genes/proteins that affect gene expression, protein structure and/or protein function that are responsible for enhanced disease risk

20 An alternative approach
DAIT-Data Interoperability Steering Committee/HLA Working Group members
HLA Nomenclature: WHO / IMGT-HLA / Anthony Nolan Research Institute
NCBI - dbMHC
Biomedical ontology people

21 Summary of SFVT approach
Define individual sequence features (SF) in HLA proteins (genes)
Determine the extent of polymorphism for each sequence feature by defining the observed variant types (VT)
Re-annotate HLA typing information with complete list of VT for each SF
Examine the association between every sequence feature variant type and disease or other phenotype
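The first three steps of the SFVT approach can be sketched in a few lines. This is a toy illustration, not the ImmPort implementation: the allele names, sequences and feature positions are invented, and the rule of numbering variant types in order of first appearance is an assumption made for the example.

```python
from typing import Dict, List

# Hypothetical aligned protein sequences for three alleles (toy data).
allele_seqs: Dict[str, str] = {
    "ALLELE_1": "MKTAYIAK",
    "ALLELE_2": "MKTGYIAK",
    "ALLELE_3": "MRTAYIAR",
}

# Step 1: each sequence feature (SF) is a list of 1-based positions (toy data).
features: Dict[str, List[int]] = {"SF1": [2, 4], "SF2": [8]}


def motif(seq: str, positions: List[int]) -> str:
    """Residues of one sequence at the positions that make up a feature."""
    return "".join(seq[p - 1] for p in positions)


def define_variant_types(
    seqs: Dict[str, str], sfs: Dict[str, List[int]]
) -> Dict[str, Dict[str, int]]:
    """Step 2: number the distinct motifs seen for each SF (assumed ordering)."""
    tables: Dict[str, Dict[str, int]] = {}
    for sf_name, positions in sfs.items():
        table: Dict[str, int] = {}
        for seq in seqs.values():
            m = motif(seq, positions)
            if m not in table:
                table[m] = len(table) + 1
        tables[sf_name] = table
    return tables


def annotate_allele(
    seq: str, sfs: Dict[str, List[int]], tables: Dict[str, Dict[str, int]]
) -> Dict[str, str]:
    """Step 3: replace an allele by its variant type for every SF."""
    return {
        sf_name: f"VT{tables[sf_name][motif(seq, positions)]}"
        for sf_name, positions in sfs.items()
    }


vt_tables = define_variant_types(allele_seqs, features)
annotation = annotate_allele(allele_seqs["ALLELE_3"], features, vt_tables)
```

The result is a feature vector per allele, so two alleles with different names can share a variant type at one feature while differing at another.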

22 Representative Sequence Features

23 A*0201 - ‘peptide binding’ SF

24 A*0201 - ‘peptide binding pocket B’

25 A*0201 - ‘CD8 binding’ & ‘TCR binding’ SF (figure labels: CD8 Binding, TCR Binding)

26 Summary of SFs defined 1775 total

27 Variant Types for Hsa_HLA-DRB1_beta-strand 2_peptide antigen binding

28 Representative Sequence Features Variant Types

29 HLA SFVT Association with Systemic Sclerosis
Summary of data set
  ◦ Systemic sclerosis (SSc, scleroderma) is a chronic condition characterized by altered immune reactivity, thickened skin, endothelial dysfunction, interstitial fibrosis, gangrene, pulmonary hypertension, gastrointestinal tract dysmotility, and renal arteriolar dysfunction.
  ◦ A large cohort of ~1300 SSc patients and ~1000 healthy controls has been assembled by Drs. Frank C. Arnett, John Reveille and colleagues at the University of Texas Health Science Center at Houston.
  ◦ Information on autoantibody reactivity for over 15 nuclear antigens is available.
  ◦ 4-digit typing has been done for DRB1, DQA1, and DQB1 in all individuals.
Initial re-annotation of 4-digit DRB1 typing data
  ◦ DRB1*1104 => SF1_VT43; SF2_VT4; SF3_VT12 ………
Statistical analysis
  ◦ Split data set into two pseudo-replicates
  ◦ 2 x n contingency table for every SF (286), where n = number of VT
  ◦ Chi-squared or Fisher’s Exact Test analysis
  ◦ Select SF with adjusted p-value <0.01 (83/286)
  ◦ 2 x 2 contingency table (type vs non-type) for every VT (418 total)
  ◦ Merge results of pseudo-replicates
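The 2 x 2 "type vs non-type" step can be illustrated with a standard-library implementation of Fisher's exact test. This is a minimal sketch, not the analysis code used in the study: the function names are hypothetical, the counts are invented, and the p-value adjustment mentioned on the slide is not shown because the slide does not name the method.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)  # tolerance for float ties
    )
    return min(1.0, p_value)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite when b*c is zero."""
    return (a * d) / (b * c) if b * c else float("inf")


# Invented counts for one variant type:
# rows = patients / controls, columns = carries the VT / does not.
patients_with, patients_without = 120, 1180
controls_with, controls_without = 50, 950

p = fisher_exact_2x2(patients_with, patients_without, controls_with, controls_without)
ratio = odds_ratio(patients_with, patients_without, controls_with, controls_without)
```

Running the same test on each pseudo-replicate and keeping only variant types that are significant in both is one way to read the "merge results" step, though the slide does not spell out the merge rule.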

30 DRB1*0101 Visualization

31 Composite SF - Risk and Protective Variants

32 DRB1*0101 Visualization
Residue labels: 67F 70D 71R 86V 26F 37Y 30Y 28D
Residue labels: 67I 70D 71R 86G 26F 37F 30L 28E
Legend: protective, risk

33 Publication

34 ImmPort HLA SFVT Workflow
Table of subject vs. HLA 4-digit typing data
Table of subject vs. SFVT feature vector
Table of p-values, adj. p-values, odds ratio, confidence intervals
Figure labels: CD8 Binding, TCR Binding

35 Summary
SFVT Approach
  ◦ Proposed a novel approach for HLA disease associations based on sequence feature variant type (SFVT) analysis
  ◦ Defined structural and functional protein sequence features (SF) for all classical human MHC class I and II proteins
  ◦ Determined variant types (VT) for all SF in known alleles
  ◦ Available in ImmPort (www.immport.org), IMGT-HLA and dbMHC
Systemic Sclerosis Analysis
  ◦ Based on the SFVT approach, identified a region of the HLA-DRB1 protein centered around peptide-binding pocket 7 that appears to be associated with disease risk
  ◦ Sequences found in HLA-DRB1*1104 at positions 28, 30, 37, 67 and 86, especially with aromatic amino acids, were associated with increased disease risk
  ◦ Sequences found in this region of HLA-DRB1*0302 appear to be protective
  ◦ Different alleles are associated with altered risk in different racial/ethnic populations, but they share common SFVTs
  ◦ SFVTs associated with risk of developing SSc are different in patients with anti-topoisomerase (anti-topo) versus anti-centromere (anti-cent) antibodies, supporting the idea that these are distinct diseases
  ◦ However, the risk-associated SFVTs are from the same SFs, suggesting a common mechanism of disease pathogenesis

36 IRD Overview www.fludb.org

37 Influenza A Sequence Features as of 18JUL2011 4128 SFs total

38 SF8 (nuclear export signal)

39 VT for SF8 (nuclear export signal)

40 VT-1 strains

41 VT distribution by host

42 GENETIC DETERMINANTS OF INFLUENZA SPECIES JUMP EVENTS BASED ON CONVERGENT EVOLUTION

43 Flu pandemics of the 20th and 21st centuries initiated by species jump events
1918 flu pandemic (Spanish flu)
  ◦ subtype H1N1 (avian origin)
  ◦ estimated to have claimed between 2.5% and 5.0% of the world’s population (20 to 100 million deaths)
Asian flu (1957 – 1958)
  ◦ subtype H2N2 (avian origin)
  ◦ 1 - 1.5 million deaths
Hong Kong flu (1968 – 1969)
  ◦ subtype H3N2 (avian origin)
  ◦ between 750,000 and 1 million deaths
2009 H1N1
  ◦ subtype H1N1 (swine origin)
  ◦ ~ 16,000 deaths as of March 2010

44 Pandemic stages Adaptive drivers

45 Basic reproductive number ($R_0$)
Total number of secondary cases per case
Reasonable surrogate of fitness
Characteristics of pandemic viruses:
  ◦ $R_0^{H} > 1$, and
  ◦ In genetic neighborhood of viruses with $R_0^{R} > 1$ and $R_0^{H} < 1$
Figure labels: Adaptive drivers (A1, A2); Pandemic viruses ($R_0^{H} > 1$); Stuttering viruses ($R_0^{R} > 1$ and $R_0^{H} < 1$); Reservoir virus ($R_0^{R} > 1$ and $R_0^{H} \ll 1$)
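The three categories on this slide can be written as a small decision rule. This is a sketch under stated assumptions: the superscripts are read as H for the human host and R for the reservoir host, the numeric cutoff used for "much less than 1" is arbitrary, and the function names are hypothetical. The expected chain size uses the standard branching-process result for a subcritical chain, 1 / (1 - R0), counting the index case.

```python
def classify_virus(r0_reservoir: float, r0_human: float, negligible: float = 0.1) -> str:
    """Label a virus by its reproductive numbers in the reservoir and in humans.

    `negligible` is an assumed cutoff standing in for "R0 in humans << 1".
    """
    if r0_human > 1:
        return "pandemic"
    if r0_reservoir > 1 and negligible <= r0_human < 1:
        return "stuttering"
    if r0_reservoir > 1 and r0_human < negligible:
        return "reservoir"
    return "unclassified"


def expected_chain_size(r0_human: float) -> float:
    """Expected total cases per introduction when R0 < 1 (index case included)."""
    if not 0 <= r0_human < 1:
        raise ValueError("expected chain size is finite only for 0 <= R0 < 1")
    return 1.0 / (1.0 - r0_human)


labels = [
    classify_virus(2.0, 1.5),   # sustained human-to-human spread
    classify_virus(2.0, 0.5),   # short chains that die out
    classify_virus(2.0, 0.01),  # essentially no onward spread in humans
]
```

With a human reproductive number of 0.5, each introduction is expected to produce two cases in total, which is why stuttering chains leave enough human isolates to look for repeated mutations.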

46 Adaptive drivers
Pepin KM et al. (2010) “Identifying genetic markers of adaptation for surveillance of viral host jumps” Nature Reviews Microbiology 8: 802-814.

47 Stuttering transmission and adaptive drivers
Stuttering transmission can reveal adaptive drivers by evidence of convergent evolution
  ◦ The odds of finding the same neutral mutation by chance in multiple species jumps are low
  ◦ Therefore, finding the same mutation in multiple independent species jump events is strong evidence for an adaptive driver
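The "low odds" argument can be made concrete with a binomial tail. This is a back-of-the-envelope sketch, assuming each species jump is independent and that a given neutral mutation appears in any one jump with the same small probability; the probability and counts used here are invented for illustration, not estimates from influenza data.

```python
from math import comb


def prob_at_least_k(n_jumps: int, k: int, p_per_jump: float) -> float:
    """P(a specific mutation appears in at least k of n independent jumps)."""
    return sum(
        comb(n_jumps, i) * p_per_jump**i * (1 - p_per_jump) ** (n_jumps - i)
        for i in range(k, n_jumps + 1)
    )


# Hypothetical inputs: 50 independent jumps, per-jump chance of 1 in 10,000.
chance_once = prob_at_least_k(50, 1, 1e-4)
chance_three_times = prob_at_least_k(50, 3, 1e-4)
```

Under these assumptions the chance of seeing the same neutral change three times is orders of magnitude smaller than seeing it once. The calculation is per mutation, so a scan across many positions still needs a correction for the number of positions examined.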

48 Genetic convergence during species jump
Virus isolate groups from IRD
  ◦ Avian H5N1 (PB2) from Southeast Asia* up to 2003 (260 records): reservoirs of source viruses
  ◦ Human H5N1 (PB2) from Southeast Asia 2003-present (165 records): many examples of independent species jumps
Align amino acid sequences and calculate conservation score
Identify highly conserved positions in avian records (≤1/260 variants) (557 of 759 positions): functionally restricted in reservoir
Select subset in which two or more human isolates contained the same sequence variant: either due to human-human transmission or convergent evolution
*China, Hong Kong, Indonesia, Thailand, Viet Nam
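The filtering steps on this slide can be sketched as follows. This is not the IRD analysis tool: the slide does not define the conservation score, so the count of avian records that differ from the consensus residue is used as a stand-in, the sequences are toy data assumed to be aligned, gap-free and of equal length, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def conserved_positions(avian_seqs: List[str], max_variants: int = 1) -> Dict[int, str]:
    """Map each conserved 1-based position to its consensus residue.

    A position counts as conserved when at most `max_variants` avian
    records differ from the most common residue.
    """
    conserved: Dict[int, str] = {}
    for position, column in enumerate(zip(*avian_seqs), start=1):
        residue, count = Counter(column).most_common(1)[0]
        if len(column) - count <= max_variants:
            conserved[position] = residue
    return conserved


def convergent_candidates(
    avian_seqs: List[str],
    human_seqs: List[str],
    min_human: int = 2,
    max_variants: int = 1,
) -> List[Tuple[int, str, str, int]]:
    """Return (position, avian residue, human residue, human count) tuples.

    Keeps variants at avian-conserved positions that occur in at least
    `min_human` human isolates.
    """
    candidates = []
    for position, consensus in conserved_positions(avian_seqs, max_variants).items():
        variants = Counter(
            seq[position - 1] for seq in human_seqs if seq[position - 1] != consensus
        )
        for residue, count in variants.items():
            if count >= min_human:
                candidates.append((position, consensus, residue, count))
    return candidates


# Toy alignment: position 2 varies in birds; position 3 changes K to R in two human isolates.
avian = ["MEKLA", "MEKLA", "MDKLA", "MDKLA"]
human = ["MEKLA", "MERLA", "MERLA", "MDKLS"]
hits = convergent_candidates(avian, human)
```

As the slide notes, a repeated variant can also come from human-to-human transmission of one lineage, so each hit still has to be checked against the phylogeny before it is treated as convergent.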

49 Strain Search – PB2 avian H5N1 Southeast Asia up to 2003

50 260 PB2 records

51 Sequence variation analysis

52 Position order

53 Order by conservation score

54 My Workbench

55 Convergent evolution candidates

56 Surface exposed (figure labels: PB2_A/MEXICO/INDRE4487/2009(H1N1); Conservation score; All convergent evolution candidates; 586, 591, 627, 629)

57 Convergent evolution candidates

58 E627K

59 E627K and species jump

60 P465S

61 K660R

62 Global analysis of human H5 HA (clades: 1, 2.1.2, 2.3.4, 2.1.3)

63 Founder effects

64 Convergent evolution HK2010 Indo2006 VN2008 China2005 HK2010 VN2007 VN2005 VN2004 Indo2006

65 Summary
Human influenza pandemics are initiated by species jump events followed by sustained human-to-human transmission ($R_0^{H} > 1$)
Multiple independent occurrences of the same mutation during stuttering transmission are evidence of convergent evolution of adaptive drivers, which provide hypotheses for experimental testing
Surveillance for adaptive drivers in reservoir species could help anticipate the next pandemic
N01AI40041

66 TOWARD A GENERAL STRATEGY FOR CONTROLLING FOR EVOLUTION AS A CONFOUNDING FACTOR IN GENETIC ASSOCIATION STUDIES

67

68 HLA SFVT Acknowledgements
BISC ImmPort Team: David Karp (UTSW), Nishanth Marthandan (UTSW), Paula Guidry (UTSW), Frank C. Arnett (UTH), John Reveille (UTH), Chul Ahn (UTSW), Glenys Thompson (Berkeley), Tom Smith (NG), Jeff Wiser (NG)
DAIT HLA Working Group: David DeLuca (Hannover), Raymond Dunivin (NCBI), Michael Feolo (NCBI), Wolfgang Helmberg (Graz), Steven G. E. Marsh (ANRI), David Parrish (ITN), Bjoern Peters (LIAI), Effie Petersdorf (FHCRC), Matthew J. Waller (ANRI)
Sequence Ontology WG: Michael Ashburner (Cambridge), Lindsay Cowell (Duke), Alexander D. Diehl (Jackson), Karen Eilbeck (Utah), Suzanna Lewis (LBNL), Chris Mungall (LBNL), Darren A. Natale (Georgetown), Barry Smith (Buffalo)
With support from NIAID N01AI40076

69 Acknowledgments
U.T. Southwestern: Richard Scheuermann (PI), Burke Squires, Jyothi Noronha, Victoria Hunt, Shubhada Godbole, Brett Pickett, Yun Zhang, Haizhou Liu
MSSM: Adolfo Garcia-Sastre, Eric Bortz, Gina Conenello, Peter Palese
Vecna: Chris Larsen, Al Ramsey
LANL: Catherine Macken, Mira Dimitrijevic
U.C. Davis: Nicole Baumgarth
Northrop Grumman: Ed Klem, Mike Atassi, Kevin Biersack, Jon Dietrich, Wenjie Hua, Wei Jen, Sanjeev Kumar, Xiaomei Li, Zaigang Liu, Jason Lucas, Michelle Lu, Bruce Quesenberry, Barbara Rotchford, Hongbo Su, Bryan Walters, Jianjun Wang, Sam Zaremba, Liwei Zhou
IRD SWG: Gillian Air (OMRF), Carol Cardona (Univ. Minnesota), Adolfo Garcia-Sastre (Mt Sinai), Elodie Ghedin (Univ. Pittsburgh), Martha Nelson (Fogarty), Daniel Perez (Univ. Maryland), Gavin Smith (Duke Singapore), David Spiro (JCVI), Dave Stallknecht (Univ. Georgia), David Topham (Rochester), Richard Webby (St Jude)
USDA: David Suarez
Sage Analytica: Robert Taylor, Lone Simonsen
CEIRS Centers
N01AI40041

