Laboratory of Molecular Pathology Retreat - 10 MAR 2011


Sequence Feature Variant Type (SFVT) Method:
HLA Associations with Systemic Sclerosis
Genetic Determinants of Influenza Virus Host Range Restriction
Y. Megan Kong, Nishanth Marthandan, Paula Guidry, Jyothi Noronha, R. Burke Squires, Elizabeth McClellan, Mengya Liu, Yu Qian, David Dougall, Jie Huang, Diane Xiang, Brett Pickett, Victoria Hunt, Young Kim, Jeff Wiser, Thomas Smith, Jonathan Dietrich, Edward Klem, Lindsay Cowell, Nancy Monson, David Karp, Richard H. Scheuermann

Abstracts & Posters – Immunology
- HLA Research Data, Reference Data, Visualization Tools and Analysis Tools in ImmPort
  Paula A. Guidry, Nishanth Marthandan, Thomas Smith, Patrick Dunn, Steven J. Mack, Glenys Thomson, Jeffrey Wiser, David R. Karp, Richard H. Scheuermann
- Creating a Cell Detail Page for Hematopoietic Cells in ImmPort
  David S. Dougall, Shai Shen-Orr, John Campbell, Yue Liu, Patrick Dunn, Y. Megan Kong, Mark M. Davis, Richard H. Scheuermann
- Minimum Information about a Genotyping Experiment
  Jie Huang, Nishanth Marthandan, Alexander Pertsemlidis, LiangHao Ding, Julia Kozlitina, Joseph Maher, Nancy Olsen, Jonathan Rios, Michael Story, Chao Xing, Richard H. Scheuermann
- Translational Research in ImmPort
  Y. Megan Kong, Carl Dalke, Diane Xiang, Max Y. Qian, David Dougall, David Karp, Richard H. Scheuermann
- Potential of a Unique Antibody Gene Signature to Predict Conversion to Clinically Definite Multiple Sclerosis
  A.J. Ligocki, L. Lovato, D. Xiang, P. Guidry, R.H. Scheuermann, S.N. Willis, S. Almendinger, M.K. Racke, E.M. Frohman, D.A. Hafler, K.C. O'Connor, N.L. Monson
- Analysis of DRB1 Sequence Feature Variant Type Associations with Systemic Sclerosis Autoantibodies Types and Racial Groups
  Nishanth Marthandan, Paula Guidry, Glenys Thomson, Frank Arnett, David R. Karp, Richard H. Scheuermann
- An automated analysis and visualization pipeline for identification and comparison of cell populations in high-dimensional flow cytometry data
  Yu Qian, David Dougall, Megan Kong, Paula Guidry, and Richard H. Scheuermann

Abstracts & Posters – Infectious Diseases
- Influenza Research Database (IRD): A Web-based Resource for Influenza Virus Data & Analysis
  Victoria Hunt, R. Burke Squires, Jyothi Noronha, Ed Klem, Jon Dietrich, Chris Larsen, Richard H. Scheuermann
- Tool for Identifying Sequence Variations that Correlate with Virus Phenotypic Characteristics
  Brett Pickett, Prabakaran Ponraj, Victoria Hunt, Mengya Liu, Liwei Zhou, Sanjeev Kumar, Jonathan Dietrich, Sam Zaremba, Chris Larsen, Edward B. Klem, Richard H. Scheuermann
- Conserved Epitope Regions (CER): Elucidation of Evolutionarily Stable, Immunologically Reactive Regions of Human H1N1 Influenza Viruses
  R. Burke Squires, Brett Pickett, Jyothi Noronha, Victoria Hunt, Richard H. Scheuermann
- Influenza NS1-dependent Host Range Restriction Demonstrated By Sequence Feature Variant Type Analysis
  Jyothi M. Noronha, R. Burke Squires, Mengya Liu, Victoria Hunt, Brett Pickett and Richard H. Scheuermann

MHC-mediated antigen presentation

HLA allele counts
IMGT HLA – March 2011
Locus      Alleles (unique proteins)
HLA-A      1519 (1119)
HLA-B      2069 (1601)
HLA-C      1016 (750)
HLA-DRB    966 (738)
HLA-DQA1   35 (26)
HLA-DQB1   144 (103)
HLA-DPA1   28 (16)
HLA-DPB1   145 (127)
MICA       73 (60)
MICB       31 (20)
TAP        11 (9)
Figures in parentheses indicate the number of unique proteins encoded by the various alleles at each locus. 1634 new alleles were described in 2010 alone.

HLA and autoimmune disease
Disease                       HLA Allele   Relative Risk
Ankylosing spondylitis        B27          87.4
Postgonococcal arthritis                   14.0
Acute anterior uveitis                     14.6
Rheumatoid arthritis          DR4          5.8
Chronic active hepatitis      DR3          13.9
Sjogren syndrome                           9.7
Insulin-dependent diabetes    DR3/DR4      14.3
21-Hydroxylase deficiency     BW47         15.0
Source: Robbins Pathologic Basis of Disease, 6th Edition (1999)

HLA and infectious disease
Correlation between HLA genotype and HIV viral burden and progression to AIDS
M Dean, M Carrington and SJ O'Brien, Annual Review of Genomics and Human Genetics Vol. 3: 263-292 (2002)
Figure 4: Associations of HLA class I types with progression to AIDS. Relative hazard (RH) values were determined for each of the 54 HLA types used. Bars represent 95% confidence intervals (CI) for each RH. Class I types with CI that cross the line representing a RH of 1 are not significant, whereas all others are significant at p < 0.05. All B*35 alleles combined are significantly associated with progression to AIDS, as is B*53. However, B*35PY, a subset of the B*35 group of alleles, is not significant, whereas the B*35Px subgroup is strongly associated with AIDS progression.

HLA and adverse drug reaction
HLA allele   Drug sensitivity            Association        Prevalence
B*1502       carbamazepine (epilepsy)    p = 3 x 10^-27     high Chinese; absent Caucasians
B*5701       abacavir (HIV)              p = 5 x 10^-20     high Caucasians; absent Africans, Hispanics
B*5801       allopurinol (gout)          p = 5 x 10^-24
Source: P. Parham

HLA Allele Nomenclature
Fields of an allele name, in order:
- Locus
- Asterisk
- Allele family (serological where possible)
- Amino acid difference
- Non-coding (silent) polymorphism
- Intron, 3’ or 5’ polymorphism
- Suffix: N = null, L = low, S = secreted, A = aberrant, Q = questionable
The hierarchical relationships between alleles that are implied in the allele nomenclature only partially capture the complex sequence relationships between HLA alleles. For example, although all subtypes of the A*24 family share some level of sequence similarity, the relationships between subtypes are not further captured in the four-digit nomenclature.
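
The field layout above can be illustrated with a small parser. This is a minimal sketch that assumes the digit-pair form of allele names used throughout this deck (for example DRB1*1104), with two digits per field and no colon separators. The names `HlaAllele` and `parse_allele` are hypothetical helpers, not part of ImmPort or IMGT/HLA.

```python
import re
from typing import NamedTuple, Optional


class HlaAllele(NamedTuple):
    locus: str                 # e.g. "DRB1"
    family: str                # allele family (first two digits)
    protein: Optional[str]     # amino acid difference
    silent: Optional[str]      # non-coding (silent) polymorphism
    noncoding: Optional[str]   # intron, 3' or 5' polymorphism
    suffix: Optional[str]      # N, L, S, A or Q


# Optional "HLA-" prefix, locus, asterisk, 1-4 two-digit fields, optional suffix.
_PATTERN = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*((?:\d{2}){1,4})([NLSAQ])?$")


def parse_allele(name: str) -> HlaAllele:
    """Split a digit-pair allele name into its nomenclature fields."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a digit-pair HLA allele name: {name!r}")
    locus, digits, suffix = match.groups()
    fields = [digits[i:i + 2] for i in range(0, len(digits), 2)]
    fields += [None] * (4 - len(fields))  # pad fields that are absent
    return HlaAllele(locus, fields[0], fields[1], fields[2], fields[3], suffix)


allele = parse_allele("DRB1*1104")
print(allele.locus, allele.family, allele.protein)
```

Two alleles with the same `locus` and `family` belong to the same family, but as the slide notes, the parsed fields say nothing further about how similar their sequences are.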

DRB1 phylogeny [figure; labeled allele families: DRB1*15, DRB1*16, DRB1*04, DRB1*10, DRB1*09, DRB1*07]

DRB1 phylogeny [figure; DRB1*13 labeled at six positions]

DRB1 phylogeny [figure; labeled allele families: DRB1*15, DRB1*16, DRB1*04, DRB1*10, DRB1*09, DRB1*07]

DRB1 alignment [figure; pairwise comparisons: 07/15, 07/09, 09/15]

HLA–mediated disease predisposition
Hypothesis: While the allelic/haplotypic structures reflect the evolutionary history of the locus, it is the focused regions in the HLA genes/proteins that affect gene expression, protein structure and/or protein function that are responsible for enhanced disease risk.

Summary of SFVT approach
- Define individual sequence features (SF) in HLA proteins (genes)
- Determine the extent of polymorphism for each sequence feature by defining the observed variant types (VT)
- Re-annotate HLA typing information with complete list of VT for each SF
- Examine the association between every sequence feature variant type and disease or other phenotype
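
The first three steps above can be sketched in a few lines. This is an illustrative sketch only: the allele names, sequences and sequence feature positions are invented toy data, VT numbers are assigned in order of first appearance (the numbering rule used in ImmPort is not stated in this deck), and `variant_types` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def variant_types(
    alleles: Dict[str, str], features: Dict[str, List[int]]
) -> Tuple[Dict[str, Dict[str, int]], Dict[str, Dict[str, str]]]:
    """Define the VTs observed for each SF and re-annotate every allele.

    alleles  maps allele name -> aligned protein sequence
    features maps SF name -> 1-based positions that make up the feature
    """
    vt_table: Dict[str, Dict[str, int]] = {}
    annotation: Dict[str, Dict[str, str]] = {name: {} for name in alleles}
    for sf, positions in features.items():
        seen: Dict[str, int] = {}  # residue motif -> VT number
        for name, seq in alleles.items():
            motif = "".join(seq[p - 1] for p in positions)
            vt = seen.setdefault(motif, len(seen) + 1)
            annotation[name][sf] = f"{sf}_VT{vt}"
        vt_table[sf] = seen
    return vt_table, annotation


# Toy data, not real HLA sequences or feature definitions.
alleles = {"allele_a": "MKTLYE", "allele_b": "MKSLYE", "allele_c": "MRTLYD"}
features = {"SF1": [2, 3], "SF2": [6], "SF3": [1, 4, 5]}

vt_table, annotation = variant_types(alleles, features)
for name, sfvts in annotation.items():
    print(name, "=>", "; ".join(sfvts.values()))
```

The fourth step, association testing, then operates on the re-annotated labels rather than on allele names, so alleles that share a variant type at a feature are pooled for that feature.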

Representative Sequence Features

A*0201 - ‘peptide binding’ SF

A*0201 - ‘peptide binding pocket B’ SF

A*0201 - ‘CD8 binding’ & ‘TCR binding’ SF

Summary of SFs defined 1775 total

Variant Types for Hsa_HLA-DRB1_beta-strand 2_peptide antigen binding

Representative Sequence Features Variant Types

HLA SFVT Association with Systemic Sclerosis
Summary of data set
- Systemic sclerosis (SSc, scleroderma) is a chronic condition characterized by altered immune reactivity, thickened skin, endothelial dysfunction, interstitial fibrosis, gangrene, pulmonary hypertension, gastrointestinal tract dysmotility, and renal arteriolar dysfunction.
- A large cohort of ~1300 SSc patients and ~1000 healthy controls has been assembled by Drs. Frank C. Arnett, John Reveille and colleagues at the University of Texas Health Science Center at Houston.
- Information on autoantibody reactivity for over 15 nuclear antigens is available.
- 4-digit typing has been done for DRB1, DQA1, and DQB1 in all individuals.
Initial re-annotation of 4-digit DRB1 typing data
- DRB1*1104 => SF1_VT43; SF2_VT4; SF3_VT12 ………
Statistical analysis
- Split data set into two pseudo-replicates
- 2 x n contingency table for every SF (286), where n = number of VT
- Chi-squared or Fisher’s Exact Test analysis
- Select SF with adjusted p-value < 0.01 (83/286)
- 2 x 2 contingency table (type vs non-type) for every VT (418 total)
- Merge results of pseudo-replicates
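
The contingency-table steps above can be sketched with the standard library alone. The counts below are invented for illustration, the function names are hypothetical, and the p-value adjustment is omitted because the deck does not say which correction was used. The chi-squared function returns only the statistic and degrees of freedom; turning that into a p-value needs a chi-squared distribution function, which is left out here.

```python
from math import comb
from typing import List, Sequence, Tuple


def chi_squared(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x n table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


def collapse(table: Sequence[Sequence[int]], vt: int) -> List[List[int]]:
    """Reduce a 2 x n table to 2 x 2: variant type `vt` vs all other types."""
    return [[row[vt], sum(row) - row[vt]] for row in table]


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:  # hypergeometric probability of cell a == x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical allele counts: rows = cases, controls; columns = VT1..VT3 of one SF.
sf_table = [[10, 20, 30], [20, 20, 20]]
stat, dof = chi_squared(sf_table)
(a, b), (c, d) = collapse(sf_table, 0)
print(f"chi2={stat:.3f} dof={dof} VT1 Fisher p={fisher_exact_2x2(a, b, c, d):.4f}")
```

Running the 2 x n test first and the per-VT 2 x 2 tests only for features that pass the threshold mirrors the order of the steps on the slide.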

DRB1*0101 Visualization

Composite SF- Risk and Protective Variants

DRB1*0101 Visualization [figure; residues labeled as protective or risk: 67F, 70D, 71R, 86V, 26F, 37Y, 30Y, 86G, 37F, 30L, 28E]

Publication

Limitations to initial study
- Did not take into account difference in allele frequency distributions in different racial populations
- Treated SSc as a single disease
  - Limited cutaneous involvement: associated with pulmonary hypertension; 60-70% are anti-centromere positive
  - Diffuse cutaneous involvement: associated with more interstitial lung disease and kidney involvement; 30% are anti-topo positive
  - The two antibodies tend to be mutually exclusive

Auto-antibody SFVT associations
- Separated SSc participants based on presence of anti-topoisomerase or anti-centromere auto-antibody (cases only)
  - 231 anti-topoisomerase
  - 318 anti-centromere
  - 3 both
  - 752 neither
- Comparisons
  - SSc with anti-topo vs SSc without anti-topo
  - SSc with anti-cent vs SSc without anti-cent

Overlap of top 100 SFVTs
[Figure: Venn diagram of anti-centromere SFVTs and anti-topoisomerase SFVTs; counts 72, 28, 75]
Purpose of this slide:
- Compare the SFVT analysis results for SSc patients based on presence/absence of different auto-antibodies: anti-centromere vs anti-topoisomerase.
- Introduce the data considered (the top 100 ranked SFVTs) when comparing the significant SFVTs between the anti-centromere and anti-topoisomerase results. The number of SFVTs differs between the two sets because of tied rankings.
Caveats:
- The SFVT counts do not account for duplicate SFVTs (SFVTs that share exactly the same variant positions in the dataset). On a cursory look, removing duplicates does not seem to change the distribution, because it lowers the counts across the board for both groups. Deduplicated counts were not yet available.
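
The effect of tied rankings on the size of a "top N" set can be shown with a short sketch. It assumes SFVTs are ranked by p-value, which the slide does not state, and uses invented SFVT names and values; `top_n_with_ties` is a hypothetical helper.

```python
from typing import Dict, Set


def top_n_with_ties(pvalues: Dict[str, float], n: int) -> Set[str]:
    """Return the n best-ranked names, plus any names tied with the n-th."""
    if n <= 0 or not pvalues:
        return set()
    ranked = sorted(pvalues.values())
    cutoff = ranked[min(n, len(ranked)) - 1]
    return {name for name, p in pvalues.items() if p <= cutoff}


# Hypothetical p-values for two auto-antibody analyses.
anti_cent = {"SFVT_A": 1e-9, "SFVT_B": 1e-6, "SFVT_C": 1e-6, "SFVT_D": 0.2}
anti_topo = {"SFVT_B": 1e-8, "SFVT_C": 0.5, "SFVT_E": 1e-7, "SFVT_F": 1e-3}

top_cent = top_n_with_ties(anti_cent, 2)   # three names: B and C are tied
top_topo = top_n_with_ties(anti_topo, 2)
shared = top_cent & top_topo
print(len(top_cent), len(top_topo), sorted(shared))
```

Because ties are kept, the two "top N" sets can have different sizes, which is why the Venn diagram counts on each side need not sum to the same total.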

28 common SFVTs
[Figure: risky and protective counts among the 28 shared SFVTs for anti-centromere and anti-topoisomerase; values 10 and 18 in one group, 18 and 10 in the other]
Purpose of this slide:
- Show the distribution of the 28 SFVTs that overlap between the anti-centromere and anti-topoisomerase data, split into risky and protective SFVTs.
  - Risky SFVTs are defined by odds ratio > 1.0
  - Protective SFVTs are defined by odds ratio < 1.0
- Show that the significant SFVTs overlap across the two datasets, but none of the overlapping SFVTs has the same direction in both: each is risky in one dataset and protective in the other.
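
The direction comparison described above amounts to classifying each shared SFVT by its odds ratio in both analyses and tallying the pairs. The sketch below uses invented SFVT names and odds ratios; `direction` and `compare_directions` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict


def direction(odds_ratio: float) -> str:
    """Risky if odds ratio > 1.0, protective if < 1.0, as defined on the slide."""
    if odds_ratio > 1.0:
        return "risky"
    if odds_ratio < 1.0:
        return "protective"
    return "neutral"


def compare_directions(or_cent: Dict[str, float], or_topo: Dict[str, float]) -> Counter:
    """Tally (anti-centromere, anti-topoisomerase) directions for shared SFVTs."""
    shared = or_cent.keys() & or_topo.keys()
    return Counter((direction(or_cent[k]), direction(or_topo[k])) for k in shared)


# Hypothetical odds ratios for illustration only.
or_cent = {"SF_x_VT1": 2.5, "SF_y_VT2": 0.4, "SF_z_VT1": 1.8}
or_topo = {"SF_x_VT1": 0.5, "SF_y_VT2": 2.2, "SF_w_VT3": 3.0}

tally = compare_directions(or_cent, or_topo)
print(dict(tally))
```

A tally with no ("risky", "risky") or ("protective", "protective") entries corresponds to the pattern the slide reports for the 28 shared SFVTs.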

[Figure: four Venn diagrams comparing anti-centromere and anti-topoisomerase SFVTs. Panels: Risky vs Risky; Risky vs Protective; Protective vs Risky; Protective vs Protective. Counts as extracted: 39 40 21 18 22 2 10 30 12 40]
Purpose of this slide:
- Compare risky and protective SFVTs in the overlapping sequence features.
- Illustrate again the overlap of SFVTs between the datasets, and the direction of risk and protection of those SFVTs in the two auto-antibody datasets.
- Show that although there is significant overlap of the SFs, as shown in slide 7 and slide 8 (divided by risky and protective SFVTs), different SFVTs belonging to the same SFs are significantly associated with risk (and protection) in the different autoantibody SFVT results.

Anti-centromere: 9W_28E_30C_47Y_67L; anti-topoisomerase: 9E_28D_30Y_47F_67F
Table 7. Some of the SFVTs significantly associated with the presence of anti-centromere autoantibody
SFVT or allele            Variant Type Definition                Odds ratio   Case / control alleles   Corrected p-value
DRB1*0101                                                        2.96         100 / 116                4.55e-13
DRB1*0401                                                        2.09         62 / 96                  4.91e-04
DRB1*0801                                                        2.64         30 / 36                  3.29e-04
Hsa_HLA-DRB1_SF163_VT1    67L_70Q_71R                            2.52         194 / 296                1.28e-17
Hsa_HLA-DRB1_SF137_VT1    28E_30C_47Y_61W_67L_71R                             120 / 145                6.76e-16
Hsa_HLA-DRB1_SF142_VT1    9W_56P_57D_60Y_61W_67L                 2.87         124 / 155                1.10e-15
Hsa_HLA-DRB1_SF130_VT1    60Y_67L_70Q_71R_77T_78Y_81H_82N_85V    2.44         174 / 267                4.38e-15
Hsa_HLA-DRB1_SF98_VT1     67L                                    2.06         343 / 727                1.16e-14
Table 8. Some of the SFVTs significantly associated with presence of anti-topoisomerase autoantibody
SFVT or allele            Variant Type Definition                Odds ratio   Case / control alleles   Corrected p-value
DRB1*1501                                                        1.78         65 / 180                 8.17e-03
DRB1*1104                                                        3.70         74 / 105                 1.67e-06
Hsa_HLA-DRB1_SF163_VT8    67F_70D_71R                            2.22         149 / 375                1.72e-11
Hsa_HLA-DRB1_SF137_VT25   28D_30Y_47F_61W_67F_71R                2.71         208                      2.56e-13
Hsa_HLA-DRB1_SF142_VT11   9E_56P_57D_60Y_61W_67F                 2.72         137 / 285                1.85e-16
Hsa_HLA-DRB1_SF130_VT15   60Y_67F_70D_71R_77T_78Y_81H_82N_85V    2.26         370                      9.93e-12
Hsa_HLA-DRB1_SF98_VT3     67F                                    2.11         156 / 413                3.72e-11
Hsa_HLA-DRB1_SF137_VT25 (all SSc): odds ratio 1.85, corrected p-value 1.38e-07
Purpose of this slide:
- Show examples of some of the alleles and SFVTs that are significantly associated with risk.
- The p-values of the SFVTs are much lower than those of the alleles.

ImmPort HLA SFVT Workflow
- Table of subject vs. HLA 4-digit typing data
- Table of subject vs. SFVT feature vector
- Table of p-values, adj. p-values, odds ratio, confidence intervals
[Figure labels: CD8 Binding, TCR Binding]

Summary
SFVT Approach
- Proposed a novel approach for HLA disease associations based on sequence feature variant type analysis (SFVT)
- Defined structural and functional protein sequence features (SF) for all classical human MHC class I and II proteins
- Determined variant types (VT) for all SF in known alleles
- Available in ImmPort www.immport.org, IMGT-HLA and dbMHC
Systemic Sclerosis Analysis
- Based on the SFVT approach, identified a region of the HLA-DRB1 protein centered around peptide-binding pocket 7 that appears to be associated with disease risk
- Sequences found in HLA-DRB1*1104 at positions 28, 30, 37, 67 and 86, especially with aromatic amino acids, were associated with increased disease risk
- Sequences found in this region of HLA-DRB1*0302 appear to be protective
- Different alleles are associated with altered risk in different racial/ethnic populations, but they share common SFVTs
- SFVTs associated with risk of developing SSc are different in patients with anti-topo versus anti-cent antibodies, supporting the idea that these are distinct diseases
- However, the risk-associated SFVTs are from the same SFs, suggesting a common mechanism of disease pathogenesis

Public Health Impact of Influenza
- Seasonal flu epidemics occur yearly during the fall/winter months and result in 3-5 million cases of severe illness worldwide.
- More than 200,000 people are hospitalized each year with seasonal flu-related complications in the U.S.
- Approximately 36,000 deaths occur due to seasonal flu each year in the U.S.
- Populations at highest risk are children under age 2, adults age 65 and older, and groups with other comorbidities.
Source: World Health Organization - http://www.who.int/mediacentre/factsheets/fs211/en/index.html

Flu pandemics of the 20th and 21st centuries
- 1918 flu pandemic (Spanish flu)
  - H1N1 subtype
  - The most severe pandemic
  - Estimated to claim 2.5% - 5% of world’s population (20 – 100 million deaths)
- Asian flu (1957 – 1958)
  - H2N2 subtype
  - 1 – 1.5 million deaths
- Hong Kong flu (1968 – 1969)
  - H3N2 subtype
  - 750,000 - 1 million deaths
- 2009 pandemic H1N1
  - >16,000 deaths as of March 2010

Influenza Virus
- Orthomyxoviridae family
- Negative-strand RNA
- Segmented
- Enveloped
- 8 RNA segments encode 11 proteins
- Classified based on serology of HA and NA

SFVT approach
[Figure: Influenza A NS1 protein (PDB 2GX9) crystal structure. The nuclear export signal sequence feature, Influenza A_NS1_nuclear-export-signal_137(10), is highlighted in red; the alpha-helix sequence feature, Influenza A_NS1_alpha-helix_171(17), is highlighted in green.]
Amino acid alignment showing variation within the nuclear export signal region:
VT-1    I F D R L E T L I L
VT-2    I F N R L E T L I L
VT-3    I F D R L E T I V L
VT-4    L F D Q L E T L V S
VT-5    I F D R L E N L T L
VT-6    I F N R L E A L I L
VT-7    I Y D R L E T L I L
VT-8    I F D R L E T L V L
VT-9    I F D R L E N I V L
VT-10   I F E R L E T L I L
VT-11   L F D Q M E T L V S
- Each sequence with 1+ substitutions comprises a unique fingerprint or “Variant Type” (VT)
- A set of unique sequence substitutions existing within any defined region is a sequence feature variant type (SFVT)
- Statistical analyses on SFVTs can identify genotype-phenotype relationships
Steps:
- Identify regions of protein/gene with known structural or functional properties – Sequence Features (SF)
  - an alpha-helical region, the binding site for another protein, an enzyme active site, an immune epitope
- Determine the extent of sequence variation for each SF by defining each unique sequence as a Variant Type (VT)
- High-level, comprehensive grouping of all virus strains by VT membership for each SF independently
- Genotype-phenotype association statistical analysis (virulence, pathogenesis, host range, immune evasion, drug resistance)
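
The variant types listed on this slide can be compared residue by residue. The sketch below reports each VT's substitutions relative to VT-1. It assumes that "137(10)" in the feature name means start position 137 and length 10, and that VT-1 is a sensible reference; both are assumptions, and `substitutions` is a hypothetical helper.

```python
from typing import Dict, List

NES_START = 137  # assumption: "137(10)" = start position 137, length 10

# Variant types of the NS1 nuclear export signal, as listed on the slide.
VARIANT_TYPES: Dict[str, str] = {
    "VT-1": "IFDRLETLIL", "VT-2": "IFNRLETLIL", "VT-3": "IFDRLETIVL",
    "VT-4": "LFDQLETLVS", "VT-5": "IFDRLENLTL", "VT-6": "IFNRLEALIL",
    "VT-7": "IYDRLETLIL", "VT-8": "IFDRLETLVL", "VT-9": "IFDRLENIVL",
    "VT-10": "IFERLETLIL", "VT-11": "LFDQMETLVS",
}


def substitutions(reference: str, variant: str, start: int) -> List[str]:
    """List substitutions as <ref residue><position><variant residue>."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{start + i}{var}"
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]


reference = VARIANT_TYPES["VT-1"]
for name, seq in VARIANT_TYPES.items():
    diffs = substitutions(reference, seq, NES_START)
    print(f"{name:6s} {seq}  {', '.join(diffs) or '(reference)'}")
```

Listing substitutions this way makes it easy to see which variant types differ by a single residue, for example VT-4 and VT-11.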

Influenza A Sequence Features as of January 2011
Columns: Protein, Subtype, Functional, Structural, Immune Epitopes, Total Count
Values as extracted, one row per protein or subtype (some cells are absent):
PB2 (-)    7  10  564  585
PB1-F2     2  6
PB1        5  733  744
PA         1  29  534  565
NS2        3  78  83
NS1        21  15  458  494
NP         25  472  512
NA N1      26  113  153
NA N2      9  59  106  180
M2         4  96  116
M1         12  14  286  312
HA H1      37  335  376
HA H2      20  34
HA H3      390  481
HA H5      40  65
HA H7
Total      97  319  4227  4709

NS1 Sequence Features

VT for SF8 (nuclear export signal)

VT-1 strains

DO VARIATIONS IN NS1 SEQUENCE FEATURES INFLUENCE INFLUENZA VIRUS HOST RANGE?

VT for SF8 (nuclear export signal)

Causes of apparent NS1 VT-associated host range restriction
Virus spread = capability + opportunity
- Phenotypic property of the virus – limited capacity
- Restricted founder effect – limited opportunity
  - Restricted spatial-temporal distribution
- Sampling bias – assumption of random sampling
  - Oversampling – avian H5N1 in Asia; 2009 H1N1
  - Undersampling – large and domestic cats
- Linkage to causative variant
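
One simple way to probe the sampling-bias and founder-effect explanations above is to tabulate host counts per variant type, then repeat the tabulation within strata such as sampling region. The sketch below uses invented strain records and hypothetical helper names; it is not the multivariate analysis used for IRD.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def host_distribution(
    records: Iterable[Dict[str, str]], by: Sequence[str] = ("vt",)
) -> Dict[Tuple[str, ...], Counter]:
    """Count hosts for each combination of the fields named in `by`."""
    table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[field] for field in by)
        table[key][rec["host"]] += 1
    return dict(table)


def dominant_host(counts: Counter) -> Tuple[str, float]:
    """Most common host and the fraction of strains it accounts for."""
    host, n = counts.most_common(1)[0]
    return host, n / sum(counts.values())


# Hypothetical strain records (variant type, host, sampling region).
records = (
    [{"vt": "VT-A", "host": "avian", "region": "Asia"}] * 3
    + [{"vt": "VT-A", "host": "human", "region": "Europe"}]
    + [{"vt": "VT-B", "host": "equine", "region": "Europe"}] * 2
    + [{"vt": "VT-B", "host": "equine", "region": "Americas"}] * 2
)

overall = host_distribution(records)
stratified = host_distribution(records, by=("vt", "region"))
for key, counts in sorted(stratified.items()):
    print(key, dominant_host(counts))
```

In this toy data the avian skew of VT-A comes entirely from one region, which is what oversampling would look like, whereas VT-B is equine in every region sampled.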

VT-10 strains

VT for SF8 (nuclear export signal)

VT lineages

VT-10 lineage

VT-4 lineage

VT-4 strains

VT-4 lineage = B allele/group

VT-15 & VT-8 lineages

VT-5 strains

Summary
- Compiling list of all known influenza protein sequence features (SFs) in IRD
- Observed dramatic skewing in NS1 SFVT host distributions
- In some cases, attributable to sampling biases
  - VT-1 and Avian H5N1 due to Asian sampling in mid-2000's
  - VT-2 and human due to 2009 pandemic H1N1
  - VT-11 and Other (Environment) in Delaware Bay
  - Performing multivariate statistical analysis to control for confounding variables
- In other cases, attributable to founder effects
  - VT-13 and -14 and Viet Nam 2003
- However, in other cases these explanations do not appear to be consistent with the data, suggesting that these may indeed be NS1-mediated host range restrictions
  - Equine VT-10 lineage
  - Avian VT-4 lineage (B allele/group)
  - Human VT-8 lineage
  - Human VT-15 lineage
- Nuclear export vs linkage disequilibrium?

HLA SFVT Acknowledgements
BISC ImmPort Team
  David Karp (UTSW), Nishanth Marthandan (UTSW), Paula Guidry (UTSW), Frank C. Arnett (UTH), John Reveille (UTH), Chul Ahn (UTSW), Glenys Thomson (Berkeley), Tom Smith (NG), Jeff Wiser (NG)
DAIT HLA Working Group
  David DeLuca (Hannover), Raymond Dunivin (NCBI), Michael Feolo (NCBI), Wolfgang Helmberg (Graz), Steven G. E. Marsh (ANRI), David Parrish (ITN), Bjoern Peters (LIAI), Effie Petersdorf (FHCRC), Matthew J. Waller (ANRI)
Sequence Ontology WG
  Michael Ashburner (Cambridge), Lindsay Cowell (UTSW), Alexander D. Diehl (Buffalo), Karen Eilbeck (Utah), Suzanna Lewis (LBNL), Chris Mungall (LBNL), Darren A. Natale (Georgetown), Barry Smith (Buffalo)
With support from NIAID N01AI40076

Influenza SFVT Acknowledgments
U.T. Southwestern
  Richard Scheuermann, Burke Squires, Jyothi Noronha, Mengya Liu, Victoria Hunt, Shubhada Godbole, Brett Pickett, Ayman Al-Rawashdeh
MSSM
  Adolfo Garcia-Sastre, Eric Bortz, Gina Conenello, Peter Palese
Vecna
  Chris Larsen, Al Ramsey
LANL
  Catherine Macken, Mira Dimitrijevic
U.C. Davis
  Nicole Baumgarth
Northrop Grumman
  Ed Klem, Mike Atassi, Kevin Biersack, Jon Dietrich, Wenjie Hua, Wei Jen, Sanjeev Kumar, Xiaomei Li, Zaigang Liu, Jason Lucas, Michelle Lu, Bruce Quesenberry, Barbara Rotchford, Hongbo Su, Bryan Walters, Jianjun Wang, Sam Zaremba, Liwei Zhou
IRD SWG
  Gillian Air, OMRF; Carol Cardona, Univ. Minnesota; Adolfo Garcia-Sastre, Mt Sinai; Elodie Ghedin, Univ. Pittsburgh; Martha Nelson, Fogarty; Daniel Perez, Univ. Maryland; Gavin Smith, Duke Singapore; David Spiro, JCVI; Dave Stallknecht, Univ. Georgia; David Topham, Rochester; Richard Webby, St Jude
SFVT experts
  Toru Takimoto, Rochester; Summer Galloway, Emory; Robert Lamb, Northwestern; Benjamin Hale, Mt. Sinai
USDA
  David Suarez
Sage Analytica
  Robert Taylor, Lone Simonsen
CEIRS Centers