Sequence Variation Identification and Functional/Structural Inference in the Influenza Research Database (IRD) and Virus Pathogen Resource (ViPR)
Yun Zhang, J. Craig Venter Institute
June 23, 2014
Challenges for Sequence Analysis
– Sequence Variation Identification
  Challenges: manual analysis, data amount, subjective
  Approach: novel analysis tool, statistical
– Functional/Structural Inference
  Challenges: genotype-phenotype correlation, buried in literature
  Approach: Sequence Feature (SF), curated from literature, public archives, direct submission; 2,747 influenza SFs, 543 Dengue SFs, 301 HCV SFs, 296 Vaccinia SFs
  Example SF: Influenza virus A_NS1_nuclear-export-signal_137(11)
Clin Infect Dis. (2013) 57 (4):
Sequence Variation Analysis Workflow
1. Search for sequences (metadata/BLAST)
2. Run statistical analysis: Meta-CATS / SNP
3. Verify results in sequence alignment
4. Determine if positions of interest are located in Sequence Features
5. Visualize positions of interest on protein structure
Example Sequence Feature: Influenza A_H3_experimentally-determined-epitope_156(7)
Use Case: Influenza H7N9 Virus
2013 Influenza virus A H7N9 outbreak
– H7 viruses have historically circulated in birds and horses
– H7N9 human cases: first human case reported in March 2013; 410 human cases as of April 2014; fatality rate 22%
– Which sequence variations are involved in human adaptation?
Sequence Search – H7N9 HA & Similar Sequences
Search | Meta-CATS | Alignment | Sequence Features | Protein Structure
1. H7N9 HA complete sequences
2. HA sequences highly similar to a typical H7N9 human strain
Meta-CATS Analysis – Grouping
Groups: similar older H7 sequences; H7N9 human HA sequences
Comparison: H7N9 outbreak HA sequences vs. similar older H7 sequences
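The two-group comparison above can be illustrated with a small per-column test. The slides describe the analysis only as statistical, so the following is a minimal sketch that assumes a chi-square test of independence between residue and group at each alignment column. The function name `chi_square_by_position` and the toy sequences are hypothetical, and this is not the Meta-CATS implementation.

```python
from collections import Counter


def chi_square_by_position(group_a, group_b):
    """Chi-square statistic per variable alignment column.

    Assumption: both groups are lists of equal-length, pre-aligned
    sequences. Returns (1-based column, statistic, degrees of freedom)
    for every column that shows more than one residue.
    """
    group_a, group_b = list(group_a), list(group_b)
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("sequences must be aligned to the same length")

    n_a, n_b = len(group_a), len(group_b)
    total = n_a + n_b
    results = []
    for col in range(length):
        counts_a = Counter(seq[col] for seq in group_a)
        counts_b = Counter(seq[col] for seq in group_b)
        residues = sorted(set(counts_a) | set(counts_b))
        if len(residues) < 2:
            continue  # invariant column, nothing to test
        stat = 0.0
        for residue in residues:
            row_total = counts_a[residue] + counts_b[residue]
            for observed, group_size in ((counts_a[residue], n_a),
                                         (counts_b[residue], n_b)):
                expected = row_total * group_size / total
                stat += (observed - expected) ** 2 / expected
        # (rows - 1) * (columns - 1) with two groups as columns
        dof = len(residues) - 1
        results.append((col + 1, stat, dof))
    return results


# Toy data only: four "older" and four "outbreak" sequences of length 3.
older = ["ACL", "ACL", "ACL", "ACL"]
outbreak = ["ACI", "ACI", "ACI", "ACI"]
print(chi_square_by_position(older, outbreak))
```

Columns with a large statistic relative to their degrees of freedom are candidates for the alignment check in the next step; converting the statistic to a p-value would need a chi-square distribution function, which this sketch leaves out.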
Meta-CATS Analysis Results
Verify Results on Alignment
Alignment groups: older H7 avian strains; H7N9 human strains
Highlighted position: 235 (L/I)
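Checking a flagged position against the alignment amounts to tallying the residues each group carries in that column. A minimal sketch follows, assuming pre-aligned sequences and a 1-based column index; `residues_at` and the toy sequences are hypothetical, and the column index of a real alignment need not match the protein numbering used on the slide.

```python
from collections import Counter


def residues_at(groups, position):
    """Tally residues per group at a 1-based alignment column.

    `groups` maps a group label to a list of aligned sequences.
    """
    if position < 1:
        raise ValueError("position is 1-based")
    return {
        label: Counter(seq[position - 1] for seq in seqs)
        for label, seqs in groups.items()
    }


# Toy data only: column 2 differs between the two groups.
toy = {
    "group 1": ["AL", "AL", "AI"],
    "group 2": ["AI", "AI", "AI"],
}
for label, counts in residues_at(toy, 2).items():
    print(label, dict(counts))
```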
Meta-CATS Analysis Results
Variant Position Mapped to Sequence Features
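Mapping a variant position to Sequence Features can be sketched as an interval lookup. The example assumes that the trailing part of an SF name, such as `137(11)`, encodes a start position followed by a length in parentheses; the slides do not state this, and discontinuous features would not fit this simple model. The helper names `sf_span` and `features_containing` are hypothetical.

```python
import re

# Assumed naming pattern: <label>_<start>(<length>)
SF_NAME = re.compile(r"^(?P<label>.+)_(?P<start>\d+)\((?P<length>\d+)\)$")


def sf_span(name):
    """Return (first, last) positions, inclusive, under the assumed pattern."""
    match = SF_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized SF name: {name!r}")
    start = int(match["start"])
    length = int(match["length"])
    return start, start + length - 1


def features_containing(position, sf_names):
    """List the SF names whose assumed span covers `position`."""
    hits = []
    for name in sf_names:
        first, last = sf_span(name)
        if first <= position <= last:
            hits.append(name)
    return hits


# SF names taken from the slides; a real lookup would first restrict
# the list to features defined on the protein being analyzed.
names = [
    "Influenza virus A_NS1_nuclear-export-signal_137(11)",
    "Influenza A_H3_experimentally-determined-epitope_156(7)",
]
print(sf_span(names[0]))
print(features_containing(160, names))
```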
Variant Positions Visualized on Protein Structure
Structure 4BSC: H7N9 HA in complex with 6'-SLN
Highlighted: variant position 235; ligands
Summary A novel sequence variation identification and functional/structural inference workflow
Acknowledgments
NIAID HHSN C
J. Craig Venter Institute: Richard Scheuermann (PI); Brian Aevermann, M.S.; Douglas Greer, Ph.D.; Brett Pickett, Ph.D.; Rick Stanton, M.S.E.E; Lucy Stewart, MBA; Yun Zhang, M.Sc.
Vecna: Chris Larsen, Ph.D.; Al Ramsey, Ph.D.; Guangyu Sun, Ph.D.
LANL: Catherine Macken, Ph.D.; Mira Dimitrijevic
Southern Methodist Univ: Monnie McGee, Ph.D.; Mengya Liu, Ph.D.
Northrop Grumman: Scott Stuart, Program Manager; Ed Klem, Ph.D., Project Manager; Zhiping Gu, Ph.D.; Sherry He; Wenjie Hua; Wei Jen; Sanjeev Kumar; Xiaomei Li, Ph.D.; Jason Lucas; Bruce Quesenberry; Barbara Rotchford; Tom Smith, Ph.D.; Hongbo Su, Ph.D.; Bryan Walters; Sam Zaremba, Ph.D.; Hongtao Zhao, Ph.D.; Liwei Zhou, Ph.D.
NIAID / DMID: Alison Yao, Ph.D., Contracting Officer Representative, Microbial Genomics & Advanced Technologies; Andrei Gabrielian, Ph.D., Office of Cyber Infrastructure and Computational Biology; Diane Post, Ph.D., CEIRS Project Officer