Centers of Excellence for Influenza Research and Surveillance (CEIRS) 6th Annual Meeting, Aug 1, 2012: Status of Influenza Research Database (IRD) Development
Session Topics: Current CEIRS data in IRD – Surveillance – Serology – Immunology & ImmPort. IRD enhancements over the past year – Search improvements – Surveillance data from map – Support for serology data – 3D movies – Phylogenetic tree decoration – Metadata-driven comparative genomics analysis – Sequence feature submission tool – Host factor data. Publications. Plans for future development.
Current CEIRS Data in IRD
CEIRS Surveillance Samples: 94% avian, 5.8% non-human mammalian, 0.2% human
Surveillance Sample Stats (as of May 1, 2012; each % is relative to the row above)
Category | Avian records | Avian % | Non-human mammalian records | Mammalian %
Total | 197,207 | | 14,098 |
Tested | 175,746 | 89% | 12,973 | 92%
Flu-positive | 10,136 | 5.8% | 510 | 3.9%
Linked to sequence | 772 | 7.6% | 11 | 2.2%
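The percentages in the surveillance table are consistent with each stage being a share of the stage before it: tested as a share of total, flu-positive as a share of tested, and linked-to-sequence as a share of flu-positive. The short Python sketch below recomputes them from the published counts; the names `COUNTS` and `stage_percentages` are illustrative only.

```python
# Counts from the Surveillance Sample Stats table (as of May 1, 2012).
# Each percentage is computed relative to the preceding stage.
COUNTS = {
    "avian": [197_207, 175_746, 10_136, 772],
    "non-human mammalian": [14_098, 12_973, 510, 11],
}
STAGES = ["Total", "Tested", "Flu-positive", "Linked to sequence"]


def stage_percentages(counts):
    """Return each stage's count as a percentage of the preceding stage."""
    return [round(100 * cur / prev, 1) for prev, cur in zip(counts, counts[1:])]


for group, counts in COUNTS.items():
    for stage, pct in zip(STAGES[1:], stage_percentages(counts)):
        print(f"{group:>20}  {stage:<19} {pct:5.1f}%")
```

For avian records this gives 89.1%, 5.8% and 7.6%; for non-human mammalian records, 92.0%, 3.9% and 2.2%, matching the rounded values in the table.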
Serology Samples (sample counts by species category and submission year): avian, human, non-human mammalian; total 2,675 samples
Influenza Serology Data
CEIRS Immunology Data in IRD
Introduction to ImmPort: Immunology Database and Analysis Portal (ImmPort) – Bioinformatics Integration Support Contract (BISC). Purpose: – Warehouse for storing immunology experiment data – Integrate data with analysis and visualization tools – Provide access to the research community. Projects: – Population Genetics Analysis Program – HLA Region Genetics in Immune-mediated Diseases – Modeling Immunity for Biodefense – Others
Additional ImmPort Capabilities: Integrate data from multiple resources – OMIM, GO, synonyms, protein-protein interactions, etc. Suite of data analysis and visualization tools – Microarray – Flow cytometry – Other “-omics” platforms
IRD Enhancements Over Past Year
Sequence Search Page Enhancements
Quick Text Search
Surveillance Data from Map
Spinning 3D Protein Structure Movie
Phylogenetic Tree Decoration: Decorate by: – Host species (avian hosts grouped or separated) – Country – Year – HA subtype – NA subtype – HA & NA subtype – Geographic region – Flu season – SFVT. Manual decoration is also available.
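Decorating a tree by metadata amounts to mapping each leaf to a category label and each label to a color, with an option such as "avian grouped/separated" controlling how labels are formed. The Python sketch below assumes each leaf is a dict with a `name` plus metadata fields; the field names (`host`, `host_class`), the palette, and the function names are hypothetical and do not describe IRD's implementation.

```python
# Hypothetical palette; any list of distinct colors would work.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"]


def decoration_label(leaf, attribute, group_avian=True):
    """Return the category label used to color one leaf."""
    if attribute == "host" and group_avian and leaf.get("host_class") == "Avian":
        return "Avian"  # collapse all bird species into a single category
    return leaf.get(attribute, "Unknown")


def decorate(leaves, attribute, group_avian=True):
    """Assign one color per distinct label; return leaf->color and label->color."""
    label_colors = {}
    leaf_colors = {}
    for leaf in leaves:
        label = decoration_label(leaf, attribute, group_avian)
        if label not in label_colors:
            # Colors repeat if there are more labels than palette entries.
            label_colors[label] = PALETTE[len(label_colors) % len(PALETTE)]
        leaf_colors[leaf["name"]] = label_colors[label]
    return leaf_colors, label_colors
```

Labels receive colors in order of first appearance, so the legend (`label_colors`) and the per-leaf colors always agree.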
Metadata-driven Comparative Analysis Tool
Sequence Feature Variant Type (SFVT)
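In the SFVT approach, a sequence feature (SF) is a defined set of protein positions, and each distinct combination of residues observed at those positions is a variant type (VT). The minimal Python sketch below assumes pre-aligned protein sequences, 1-based positions, and VT labels numbered by order of first appearance. That numbering rule and the function name `variant_types` are assumptions; IRD's own labeling may differ.

```python
def variant_types(sequences, positions):
    """Assign a variant type (VT-1, VT-2, ...) to each sequence for one feature.

    sequences: mapping of sequence id -> aligned protein sequence (str)
    positions: 1-based alignment positions that make up the sequence feature
    """
    types = {}       # residue string at the feature -> VT label
    assignment = {}  # sequence id -> VT label
    for seq_id, seq in sequences.items():
        signature = "".join(seq[p - 1] for p in positions)
        if signature not in types:
            types[signature] = f"VT-{len(types) + 1}"
        assignment[seq_id] = types[signature]
    return assignment, types
```

Because only the feature's positions are compared, two sequences that differ elsewhere still share a VT, which is what makes VTs useful for grouping strains by a specific functional region.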
Sequence Feature Submission Tool
DMID Systems Biology Program
Host Factor Data
IRD/ViPR Publications 2012
Future Development Plans
User Support and Outreach: Data – Evaluate feasibility of supporting antigenic cartography – Prepare packages of correctly formatted data to export to external tools. Outreach – Perform on-site outreach at CEIRS centers – Continue developing tutorials for existing tools and features
Search Query Capabilities: – Ability to search for highly pathogenic and/or low-pathogenicity strains (using sequence biomarkers)
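One well-known sequence biomarker of high pathogenicity in avian H5 and H7 viruses is a multibasic HA cleavage site, where several arginine (R) or lysine (K) residues sit immediately upstream of the HA2 fusion peptide. The Python sketch below illustrates that idea only. The fusion-peptide anchor `GLFGAI`, the six-residue window, and the threshold of four basic residues are illustrative assumptions rather than an IRD or WHO rule, and real pathogenicity calls need more than this single check.

```python
FUSION_PEPTIDE_START = "GLFGAI"  # conserved HA2 start in many influenza A HA subtypes
BASIC_RESIDUES = set("RK")
WINDOW = 6      # residues inspected upstream of the fusion peptide (assumption)
MIN_BASIC = 4   # basic residues needed to call a site "multibasic" (assumption)


def cleavage_site(ha_protein):
    """Return the WINDOW residues just upstream of the HA2 fusion peptide, or None."""
    idx = ha_protein.upper().find(FUSION_PEPTIDE_START)
    if idx < WINDOW:  # motif missing (-1) or too close to the N-terminus
        return None
    return ha_protein[idx - WINDOW:idx].upper()


def has_multibasic_site(ha_protein):
    """True/False for a multibasic cleavage site; None if no site was located."""
    site = cleavage_site(ha_protein)
    if site is None:
        return None
    return sum(res in BASIC_RESIDUES for res in site) >= MIN_BASIC
```

Returning `None` when no site is found keeps "no biomarker detected" separate from "could not evaluate", which matters for search filters over incomplete sequences.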
Comparative Genomics: Develop a PCR primer design tool (excluding orthologs). Increase sequence feature (SF) definitions for virulence, host specificity, replication, etc. Provide a new tool to assign (or convert between) sequence coordinate schemes.
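Converting between sequence coordinate schemes, for example between the numbering of a query HA and that of a reference HA, usually comes down to walking a pairwise alignment column by column. The Python sketch below assumes the two sequences are already aligned with `-` as the gap character; the function name `build_position_map` is hypothetical, and the planned IRD tool may work differently.

```python
def build_position_map(aligned_a, aligned_b):
    """Map 1-based residue positions in sequence A to positions in sequence B.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Positions in A that align to a gap in B map to None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_b != "-":
            pos_b += 1
        if res_a != "-":
            pos_a += 1
            mapping[pos_a] = pos_b if res_b != "-" else None
    return mapping
```

Mapping gap-aligned positions to `None` rather than to a neighboring residue makes insertions visible instead of silently shifting numbering.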
Annotation and Host Factor Data: Ensure sequence submissions are appropriately prepared (e.g., no primer sequences). Increase the number of host factor datasets. Develop a method to handle the different statistical methods used by various “-omics” platforms (e.g., microarray, proteomics).
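Checking that a submission carries no primer sequence can start with a simple end check. A forward primer appears at the 5' end as-is, while a primer from the opposite strand appears at the 3' end as its reverse complement. The Python sketch below finds exact matches only and assumes the submitter's primer list is available. The function names and example primers are hypothetical, and the check is not an IRD submission rule.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_at_ends(sequence, primers):
    """Return primers whose sequence appears at either end of `sequence`.

    A primer read from the opposite strand shows up at the 3' end as its
    reverse complement, so both orientations are checked. Exact matches only.
    """
    seq = sequence.upper()
    hits = []
    for primer in primers:
        p = primer.upper()
        if seq.startswith(p) or seq.endswith(reverse_complement(p)):
            hits.append(primer)
    return hits
```

A production check would also need to tolerate mismatches and partial primer overlap, which this exact-match sketch ignores.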
Surveillance: Identify NIAID-funded human surveillance studies and solicit deposition into IRD. Develop additional use cases to identify other helpful data types.
Immunology: Epitopes – Add search options such as CD4, CD8, host. Serology – Solicit feedback from the community on use cases – Identify volunteers for data submission
PA-X Prediction for All Strains: Build on the analysis performed earlier this year by Jagger et al. Science 2012 Jul 13;337(6091): – Identified a new protein on segment 3 using ~1000 sequences; a frameshift at codon 190 of the PA protein results in a new C-terminus. IRD will extend this analysis across all segment 3 sequences in the resource – Add PA-X annotation to existing IRD sequence records – Allow users to search for PA-X protein sequences – Provide data that can assist in downstream comparative genomics analyses
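A PA-X prediction can be sketched as a two-frame translation: the N-terminal PA codons are read in the original frame, then a +1 ribosomal frameshift produces the new C-terminus. The Python sketch below assumes the shift happens immediately after codon 190 (the hypothetical parameter `shift_after_codon`, taking its default from the slide), a DNA coding sequence starting at the PA start codon, and no special handling of ambiguous bases beyond translating them as `X`. It is not IRD's annotation pipeline.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt, start=0, max_codons=None):
    """Translate DNA from index `start` until a stop codon, the end, or `max_codons`."""
    protein = []
    for i in range(start, len(nt) - 2, 3):
        if max_codons is not None and len(protein) >= max_codons:
            break
        aa = CODON_TABLE.get(nt[i:i + 3].upper(), "X")  # "X" for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def predict_pa_x(pa_cds, shift_after_codon=190):
    """Frame-0 N-terminus, then a +1 frameshift that yields a new C-terminus."""
    n_term = translate(pa_cds, 0, max_codons=shift_after_codon)
    if len(n_term) < shift_after_codon:
        return n_term  # stop codon (or end of sequence) before the shift site
    # +1 frameshift: skip one nucleotide after the last frame-0 codon.
    c_term = translate(pa_cds, 3 * shift_after_codon + 1)
    return n_term + c_term
```

For a toy input, `translate("ATGGCCATTTTAA")` reads `MAIL` in frame 0, while `predict_pa_x("ATGGCCATTTTAA", shift_after_codon=2)` gives `MAF`: after two codons, one nucleotide is skipped and translation continues in the +1 frame until the `TAA` stop.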
H5 Clade Annotation Tool: Automated clade determination for any query HA sequence – Match WHO clade definitions
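WHO H5 clade definitions are phylogenetic, so an automated tool has to reproduce those assignments for new sequences. A much simpler stand-in, shown here only to illustrate the input and output of such a tool, is nearest-reference classification: assign the query the clade of its most similar reference. The Python sketch below assumes the query and references are already aligned to the same length. The reference list, the clade labels, and the `min_identity` cutoff are all hypothetical, and this is neither the WHO procedure nor IRD's tool.

```python
def percent_identity(a, b):
    """Identity over aligned, non-gap columns of two equal-length sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign_clade(query, references, min_identity=95.0):
    """Return (clade, identity) of the closest reference, or (None, identity).

    references: list of (clade label, aligned sequence) pairs (hypothetical).
    """
    best_clade, best_id = None, 0.0
    for clade, ref_seq in references:
        ident = percent_identity(query, ref_seq)
        if ident > best_id:
            best_clade, best_id = clade, ident
    if best_id < min_identity:
        return None, best_id  # too distant from every reference to call
    return best_clade, best_id
```

Returning no call below the cutoff avoids forcing a divergent query into the nearest clade, which a phylogeny-based method would handle more rigorously.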
NGS Deep Sequencing Data: Primary data in the NCBI Sequence Read Archive (SRA). Derived data in IRD – Positions with sequence variation – Proportion of reads with a particular sequence variation. Metadata to understand the context
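The derived NGS data described for IRD (positions with variation and the proportion of reads carrying each variant) can be computed from per-position base counts. The Python sketch below assumes reads have already been aligned to a reference and summarized as one dict of base to read count per reference position. The thresholds `min_fraction` and `min_depth` and the function name are hypothetical, and IRD's processing may differ.

```python
def variant_positions(reference, pileup, min_fraction=0.01, min_depth=100):
    """Report positions where a non-reference base reaches `min_fraction` of reads.

    reference: reference nucleotide sequence (str)
    pileup: list with one dict of base -> read count per reference position
    Returns (1-based position, ref base, alt base, proportion of reads) tuples.
    """
    calls = []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup), start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a proportion reliably
        for base, n in counts.items():
            if base != ref_base and n / depth >= min_fraction:
                calls.append((pos, ref_base, base, n / depth))
    return calls
```

Keeping the proportion in each call, rather than a yes/no flag, preserves the quantitative information the slide lists as derived data.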