Evolution as a Confounding Factor in Genetic Association Studies 14 December 2011 Richard H. Scheuermann, Ph.D. Department of Pathology U.T. Southwestern.

Presentation transcript:

Evolution as a Confounding Factor in Genetic Association Studies 14 December 2011 Richard H. Scheuermann, Ph.D. Department of Pathology U.T. Southwestern Medical Center

Current projects

Outline
Hypothesis that evolution should be considered a confounding factor in genetic association studies
HLA-mediated autoimmune disease predisposition analysis using SFVT
Identifying genetic determinants of influenza species jump events based on convergent evolution
Novel general strategies for formally controlling for evolution as a confounding factor in genetic association studies

EVOLUTION AS A CONFOUNDING FACTOR IN GENETIC ASSOCIATION STUDIES

Population-based genetic association
Many diseases exhibit evidence of genetic predispositions
Genotype-phenotype association studies
  ◦ Diagnostic biomarker
  ◦ Molecular underpinnings of disease pathology
GWAS and linkage disequilibrium
  ◦ Co-inheritance of “linked” genetic markers
  ◦ Advantage of using SNPs to detect causal variants
NGS could obviate the need for using linked SNPs

Statistical assumptions
Independence (confounding)
Random sampling (bias)
  ◦ Population has reached equilibrium
  ◦ Test sample represents a random sampling of the equilibrium population

HLA-MEDIATED AUTOIMMUNE DISEASE PREDISPOSITION ANALYSIS USING SFVT

Class I and II Peptide Sources

HLA and autoimmune disease
Disease                       HLA Allele   Relative Risk
Ankylosing spondylitis        B
Postgonococcal arthritis      B
Acute anterior uveitis        B
Rheumatoid arthritis          DR4          5.8
Chronic active hepatitis      DR3          13.9
Sjogren syndrome              DR3          9.7
Insulin-dependent diabetes    DR3/DR
Hydroxylase deficiency        BW
Source: Robbins Pathologic Basis of Disease, 6th Edition (1999)

HLA and infectious disease
Correlation between HLA genotype and HIV viral burden and progression to AIDS
M Dean, M Carrington and SJ O'Brien, Annual Review of Genomics and Human Genetics Vol. 3: (2002)

HLA and drug sensitivity
HLA allele   drug sensitivity           association   prevalence
B*1502       carbamazepine (epilepsy)   p = 3 x       high Chinese, absent Caucasians
B*5701       abacavir (HIV)             p = 5 x       high Caucasians, absent in Africans, Hispanics
B*5801       allopurinol (gout)         p = 5 x       high Chinese
Source: P. Parham

Number of HLA Alleles (IMGT HLA, October 2008)
HLA-A      HLA-B       HLA-C
697 (24)   1109 (49)   381 (9)
HLA-DRB    HLA-DQA1   HLA-DQB1   HLA-DPA1   HLA-DPB1
690 (20)   34         95 (7)     27         131
MICA   MICB   TAP
Figures in parentheses indicate the number of serologically defined antigens at each locus.
500 new submissions each year.

HLA Allele Nomenclature
Example: HLA - A * L
Fields of the allele name, in order:
  ◦ Locus
  ◦ Asterisk
  ◦ Allele family (serological where possible)
  ◦ Amino acid difference
  ◦ Non-coding (silent) polymorphism
  ◦ Intron, 3’ or 5’ polymorphism
  ◦ Expression suffix: N = null, L = low, S = secreted, A = aberrant, Q = questionable

DRB1 phylogeny (figure labels: DRB1*07, DRB1*09, DRB1*10, DRB1*04, DRB1*16, DRB1*15)

DRB1 phylogeny (figure label: DRB1*13)

DRB1 phylogeny (figure labels: DRB1*07, DRB1*09, DRB1*10, DRB1*04, DRB1*16, DRB1*15)

DRB1 alignment (figure panels: 07/15, 07/09, 09/15)

Limitations with traditional HLA allele-based association studies
Treats the entire allele as a single unit and therefore includes both causative and passenger variations
Doesn’t take into account structural relationships between alleles
  ◦ Syntax of the HLA nomenclature was designed to capture some of the structural relationships between alleles, but there are several exceptions

HLA-mediated disease predisposition
Hypothesis:
  ◦ While the allelic/haplotypic structures reflect the evolutionary history of the locus, it is the focused regions in the HLA genes/proteins that affect gene expression, protein structure and/or protein function that are responsible for enhanced disease risk

An alternative approach
DAIT Data Interoperability Steering Committee / HLA Working Group members
  ◦ HLA Nomenclature: WHO / IMGT-HLA / Anthony Nolan Research Institute
  ◦ NCBI: dbMHC
  ◦ Biomedical ontology people

Summary of SFVT approach
Define individual sequence features (SF) in HLA proteins (genes)
Determine the extent of polymorphism for each sequence feature by defining the observed variant types (VT)
Re-annotate HLA typing information with the complete list of VT for each SF
Examine the association between every sequence feature variant type and disease or other phenotype
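The re-annotation step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the ImmPort implementation: the feature names, positions, and allele sequences below are invented toy data, and `build_variant_types` and `annotate` are hypothetical helper names.

```python
# Hypothetical sequence features: feature name -> 1-based residue positions.
SEQUENCE_FEATURES = {
    "SF_pocket_example": [2, 4, 5],
    "SF_strand_example": [1, 3],
}

# Toy allele sequences (invented; not real HLA protein sequences).
ALLELES = {
    "allele_A": "MKTFYLR",
    "allele_B": "MKTFHLR",
    "allele_C": "MRTFYLR",
}


def residues_at(sequence, positions):
    """Concatenate the residues found at the given 1-based positions."""
    return "".join(sequence[p - 1] for p in positions)


def build_variant_types(features, alleles):
    """Number the distinct residue strings observed for each SF (VT1, VT2, ...)."""
    vt_tables = {}
    for sf_name, positions in features.items():
        table = {}
        for allele in sorted(alleles):
            residues = residues_at(alleles[allele], positions)
            if residues not in table:
                table[residues] = len(table) + 1
        vt_tables[sf_name] = table
    return vt_tables


def annotate(allele, features, alleles, vt_tables):
    """Re-annotate one allele as a mapping of SF name -> VT label."""
    return {
        sf_name: "VT%d" % vt_tables[sf_name][residues_at(alleles[allele], positions)]
        for sf_name, positions in features.items()
    }


vt_tables = build_variant_types(SEQUENCE_FEATURES, ALLELES)
for name in sorted(ALLELES):
    print(name, annotate(name, SEQUENCE_FEATURES, ALLELES, vt_tables))
```

In this toy data all three alleles share the same variant type for `SF_strand_example` while differing at `SF_pocket_example`, which is the property the approach relies on: distinct alleles can be grouped by the variant they carry at one focused region.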

Representative Sequence Features

A* ‘peptide binding’ SF

A* ‘peptide binding pocket B’

A* ‘CD8 binding’ & ‘TCR binding’ SF (figure labels: CD8 Binding, TCR Binding)

Summary of SFs defined 1775 total

Variant Types for Hsa_HLA-DRB1_beta-strand 2_peptide antigen binding

Representative Sequence Features Variant Types

HLA SFVT Association with Systemic Sclerosis
Summary of data set
  ◦ Systemic sclerosis (SSc, scleroderma) is a chronic condition characterized by altered immune reactivity, thickened skin, endothelial dysfunction, interstitial fibrosis, gangrene, pulmonary hypertension, gastrointestinal tract dysmotility, and renal arteriolar dysfunction.
  ◦ A large cohort of ~1300 SSc patients and ~1000 healthy controls has been assembled by Drs. Frank C. Arnett, John Reveille and colleagues at the University of Texas Health Science Center at Houston.
  ◦ Information on autoantibody reactivity for over 15 nuclear antigens is available.
  ◦ 4-digit typing has been done for DRB1, DQA1, and DQB1 in all individuals.
Initial re-annotation of 4-digit DRB1 typing data
  ◦ DRB1*1104 => SF1_VT43; SF2_VT4; SF3_VT12 ...
Statistical analysis
  ◦ Split data set into two pseudo-replicates
  ◦ 2 x n contingency table for every SF (286), where n = number of VT
  ◦ Chi-squared or Fisher’s Exact Test analysis
  ◦ Select SF with adjusted p-value < 0.01 (83/286)
  ◦ 2 x 2 contingency table (type vs non-type) for every VT (418 total)
  ◦ Merge results of pseudo-replicates
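The 2 x 2 (type vs non-type) step can be sketched with the standard library alone. This is a minimal illustration, not the analysis code used in the study: the counts are invented, the variant-type labels are placeholders, and Bonferroni is assumed as the adjustment method because the slides do not say which correction was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls; columns are carries-VT / does-not-carry-VT.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {
        x: comb(row1, x) * comb(row2, col1 - x) / total_tables
        for x in range(lo, hi + 1)
    }
    p_obs = probs[a]
    # Sum every table that is no more likely than the observed one.
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def odds_ratio(a, b, c, d):
    """Odds of carrying the VT in cases relative to controls."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def bonferroni(p_values):
    """Bonferroni adjustment (an assumed choice of correction)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented counts: (cases with VT, cases without, controls with, controls without).
vt_counts = {
    "SF1_VT1": (30, 70, 10, 90),
    "SF1_VT2": (50, 50, 50, 50),
}
names = sorted(vt_counts)
raw = [fisher_exact_two_sided(*vt_counts[n]) for n in names]
adjusted = bonferroni(raw)
for n, p, q in zip(names, raw, adjusted):
    print(n, "OR=%.2f" % odds_ratio(*vt_counts[n]), "p=%.3g" % p, "adj_p=%.3g" % q)
```

An odds ratio above 1 with a small adjusted p-value would mark a candidate risk variant and one below 1 a candidate protective variant; running the same test on each pseudo-replicate and keeping only variants significant in both is one way to read the "merge" step.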

DRB1*0101 Visualization

Composite SF - Risk and Protective Variants

DRB1*0101 Visualization
Residue labels, group 1: 67F 70D 71R 86V 26F 37Y 30Y 28D
Residue labels, group 2: 67I 70D 71R 86G 26F 37F 30L 28E
Panel labels: protective, risk

Publication

ImmPort HLA SFVT Workflow
Table of subject vs. HLA 4-digit typing data
Table of subject vs. SFVT feature vector
Table of p-values, adj. p-values, odds ratio, confidence intervals
(figure labels: CD8 Binding, TCR Binding)

Summary
SFVT Approach
  ◦ Proposed a novel approach for HLA disease associations based on sequence feature variant type analysis (SFVT)
  ◦ Defined structural and functional protein sequence features (SF) for all classical human MHC class I and II proteins
  ◦ Determined variant types (VT) for all SF in known alleles
  ◦ Available in ImmPort IMGT-HLA and dbMHC www.immport.org
Systemic Sclerosis Analysis
  ◦ Based on the SFVT approach, identified a region of the HLA-DRB1 protein centered around peptide-binding pocket 7 that appears to be associated with disease risk
  ◦ Sequences found in HLA-DRB1*1104 at positions 28, 30, 37, 67 and 86, especially with aromatic amino acids, were associated with increased disease risk
  ◦ Sequences found in this region of HLA-DRB1*0302 appear to be protective
  ◦ Different alleles are associated with altered risk in different racial/ethnic populations, but they share common SFVTs
  ◦ SFVTs associated with risk of developing SSc are different in patients with anti-topo versus anti-cent antibodies, supporting the idea that these are distinct diseases
  ◦ However, the risk-associated SFVTs are from the same SFs, suggesting a common mechanism of disease pathogenesis

IRD Overview

Influenza A Sequence Features as of 18JUL SFs total

SF8 (nuclear export signal)

VT for SF8 (nuclear export signal)

VT-1 strains

VT distribution by host

GENETIC DETERMINANTS OF INFLUENZA SPECIES JUMP EVENTS BASED ON CONVERGENT EVOLUTION

Flu pandemics of the 20th and 21st centuries initiated by species jump events
1918 flu pandemic (Spanish flu)
  ◦ subtype H1N1 (avian origin)
  ◦ estimated to have claimed between 2.5% and 5.0% of the world’s population (20 to 100 million deaths)
Asian flu (1957 – 1958)
  ◦ subtype H2N2 (avian origin)
  ◦ million deaths
Hong Kong flu (1968 – 1969)
  ◦ subtype H3N2 (avian origin)
  ◦ between 750,000 and 1 million deaths
2009 H1N1
  ◦ subtype H1N1 (swine origin)
  ◦ ~ 16,000 deaths as of March 2010

Pandemic stages Adaptive drivers

Basic reproductive number (R0)
Total number of secondary cases per case
Reasonable surrogate of fitness
Characteristics of pandemic viruses:
  ◦ R0H > 1, and
  ◦ In genetic neighborhood of viruses with R0R > 1 and R0H < 1
Figure labels:
  ◦ Reservoir virus (R0R > 1 and R0H << 1)
  ◦ Stuttering viruses (R0R > 1 and R0H < 1)
  ◦ Pandemic viruses (R0H > 1)
  ◦ Adaptive drivers A1, A2
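Why an R0 below 1 in humans produces short, self-limiting "stuttering" chains can be seen with a small calculation. The sketch below uses a textbook branching-process result that is not stated on the slides: if each case causes R0 secondary cases on average, a chain started by one spillover has an expected total size of 1 + R0 + R0^2 + ... = 1 / (1 - R0) when R0 < 1. The function name is hypothetical.

```python
def expected_chain_size(r0_human):
    """Expected total cases in a transmission chain started by one spillover case.

    Assumes a simple branching process; for r0_human >= 1 the expectation is
    unbounded in this idealized model, so infinity is returned.
    """
    if r0_human < 0:
        raise ValueError("R0 cannot be negative")
    if r0_human >= 1:
        return float("inf")
    return 1.0 / (1.0 - r0_human)


for r0 in (0.0, 0.5, 0.9, 1.2):
    print(r0, expected_chain_size(r0))
```

Under this simplification, chains stay short until R0 in humans approaches 1 and then grow rapidly, which is consistent with the slide's point that pandemic viruses sit in the genetic neighborhood of stuttering ones.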

Adaptive drivers
Pepin KM et al. (2010) “Identifying genetic markers of adaptation for surveillance of viral host jumps” Nature Reviews Microbiology 8:

Stuttering transmission and adaptive drivers
Stuttering transmission can reveal adaptive drivers by evidence of convergent evolution
  ◦ The odds of finding the same neutral mutation by chance in multiple species jumps are low
  ◦ Therefore, finding the same mutation in multiple independent species jump events is strong evidence for an adaptive driver

Genetic convergence during species jump
Virus isolate groups from IRD
  ◦ Avian H5N1 (PB2) from Southeast Asia* up to 2003 (260 records): reservoirs of source viruses
  ◦ Human H5N1 (PB2) from Southeast Asia, 2003 to present (165 records): many examples of independent species jumps
Align amino acid sequences and calculate conservation scores
Identify highly conserved positions in avian records (≤1/260 variants) (557 of 759 positions): functionally restricted in reservoir
Select the subset in which two or more human isolates contained the same sequence variant: either due to human-human transmission or convergent evolution
*China, Hong Kong, Indonesia, Thailand, Viet Nam
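The filtering logic described above can be sketched as follows. This is an illustrative outline, not the IRD tool: the sequences are short invented strings, the input is assumed to be pre-aligned and of equal length, and the function names and default thresholds are hypothetical (the defaults mirror the "at most 1 variant" and "two or more human isolates" criteria).

```python
from collections import Counter


def conserved_positions(avian_seqs, max_variants=1):
    """Return {1-based position: consensus residue} for highly conserved columns.

    A column counts as conserved when at most `max_variants` avian sequences
    differ from the most common residue.
    """
    conserved = {}
    for i in range(len(avian_seqs[0])):
        counts = Counter(seq[i] for seq in avian_seqs)
        consensus, n_consensus = counts.most_common(1)[0]
        if len(avian_seqs) - n_consensus <= max_variants:
            conserved[i + 1] = consensus
    return conserved


def convergence_candidates(avian_seqs, human_seqs, min_human=2, max_variants=1):
    """List (position, avian residue, human residue, count) candidate substitutions."""
    candidates = []
    conserved = conserved_positions(avian_seqs, max_variants)
    for pos, consensus in sorted(conserved.items()):
        counts = Counter(seq[pos - 1] for seq in human_seqs)
        for residue, n in sorted(counts.items()):
            if residue != consensus and n >= min_human:
                candidates.append((pos, consensus, residue, n))
    return candidates


# Invented toy alignments (not real PB2 sequences).
avian = ["MEKL", "MEKL", "MEKL", "MEQL"]
human = ["MKKL", "MKKL", "MEKL", "MEKV"]

for pos, old, new, n in convergence_candidates(avian, human):
    print("%s%d%s seen in %d human isolates" % (old, pos, new, n))
```

A candidate produced this way still has to be checked against the phylogeny, because, as the slide notes, a shared variant can reflect human-to-human transmission of a single jump rather than independent convergent events.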

Strain Search – PB2 avian H5N1 Southeast Asia up to 2003

260 PB2 records

Sequence variation analysis

Position order

Order by conservation score

My Workbench

Convergent evolution candidates

Surface exposed PB2_A/MEXICO/INDRE4487/2009(H1N1) Conservation score All convergent evolution candidates 586, 591, 627, 629

Convergent evolution candidates

E627K

E627K and species jump

P465S

K660R

Global analysis of human H5 HA clades

Founder effects

Convergent evolution HK2010 Indo2006 VN2008 China2005 HK2010 VN2007 VN2005 VN2004 Indo2006

Summary
Human influenza pandemics are initiated by species jump events followed by sustained human-to-human transmission (R0H > 1)
Multiple independent occurrences of the same mutation during stuttering transmission are evidence of convergent evolution of adaptive drivers: hypotheses for experimental testing
Surveillance for adaptive drivers in reservoir species could help anticipate the next pandemic
N01AI40041

TOWARD A GENERAL STRATEGY FOR CONTROLLING FOR EVOLUTION AS A CONFOUNDING FACTOR IN GENETIC ASSOCIATION STUDIES

HLA SFVT Acknowledgements
BISC ImmPort Team: David Karp (UTSW), Nishanth Marthandan (UTSW), Paula Guidry (UTSW), Frank C. Arnett (UTH), John Reveille (UTH), Chul Ahn (UTSW), Glenys Thompson (Berkeley), Tom Smith (NG), Jeff Wiser (NG)
DAIT HLA Working Group: David DeLuca (Hannover), Raymond Dunivin (NCBI), Michael Feolo (NCBI), Wolfgang Helmberg (Graz), Steven G. E. Marsh (ANRI), David Parrish (ITN), Bjoern Peters (LIAI), Effie Petersdorf (FHCRC), Matthew J. Waller (ANRI)
Sequence Ontology WG: Michael Ashburner (Cambridge), Lindsay Cowell (Duke), Alexander D. Diehl (Jackson), Karen Eilbeck (Utah), Suzanna Lewis (LBNL), Chris Mungall (LBNL), Darren A. Natale (Georgetown), Barry Smith (Buffalo)
With support from NIAID N01AI40076

Acknowledgments
U.T. Southwestern: Richard Scheuermann (PI), Burke Squires, Jyothi Noronha, Victoria Hunt, Shubhada Godbole, Brett Pickett, Yun Zhang, Haizhou Liu
MSSM: Adolfo Garcia-Sastre, Eric Bortz, Gina Conenello, Peter Palese
Vecna: Chris Larsen, Al Ramsey
LANL: Catherine Macken, Mira Dimitrijevic
U.C. Davis: Nicole Baumgarth
Northrop Grumman: Ed Klem, Mike Atassi, Kevin Biersack, Jon Dietrich, Wenjie Hua, Wei Jen, Sanjeev Kumar, Xiaomei Li, Zaigang Liu, Jason Lucas, Michelle Lu, Bruce Quesenberry, Barbara Rotchford, Hongbo Su, Bryan Walters, Jianjun Wang, Sam Zaremba, Liwei Zhou
IRD SWG: Gillian Air (OMRF), Carol Cardona (Univ. Minnesota), Adolfo Garcia-Sastre (Mt Sinai), Elodie Ghedin (Univ. Pittsburgh), Martha Nelson (Fogarty), Daniel Perez (Univ. Maryland), Gavin Smith (Duke Singapore), David Spiro (JCVI), Dave Stallknecht (Univ. Georgia), David Topham (Rochester), Richard Webby (St Jude)
USDA: David Suarez
Sage Analytica: Robert Taylor, Lone Simonsen
CEIRS Centers
N01AI40041