
Presentation transcript:

Richard H. Scheuermann, Ph.D.
Department of Pathology, Division of Biomedical Informatics
U.T. Southwestern Medical Center
Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects and their Implementation in NIAID Bioinformatics Resource Centers
N01AI N01AI40041

Richard H. Scheuermann, Ph.D.
Director of Informatics, J. Craig Venter Institute
Standardizing Metadata Associated with NIAID Genome Sequencing Center Projects and their Implementation in NIAID Bioinformatics Resource Centers
N01AI N01AI40041

Genome Sequencing Centers for Infectious Disease (GSCID)
Bioinformatics Resource Centers (BRC)

High Throughput Sequencing
Enabling technology
  – Epidemiology of outbreaks
  – Pathogen evolution
  – Host range restriction
  – Genetic determinants of virulence and pathogenicity
Metadata requirements
  – Temporal-spatial information about isolates
  – Selective pressures
  – Host species of specimen source
  – Disease severity and clinical manifestations

Metadata Submission Spreadsheets

Complex Query Interface

Metadata Inconsistencies
Each project was providing different types of metadata
No consistent nomenclature being used
Impossible to perform reliable comparative genomics analysis
Required extensive custom bioinformatics system development

GSC-BRC Metadata Standards Working Group
NIAID assembled a group of representatives from its three Genome Sequencing Centers for Infectious Diseases (Broad, JCVI, UMD) and five Bioinformatics Resource Centers (EuPathDB, IRD, PATRIC, VectorBase, ViPR)
Develop metadata standards for pathogen isolate sequencing projects
Bottom-up approach
Assemble into a semantic framework

GSC-BRC Metadata Working Groups

Metadata Standards Process
Divide into pathogen subgroups: viruses, bacteria, eukaryotic pathogens and vectors
Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS)
Identify data fields that appear to be common across projects within a pathogen subgroup (core) and data fields that appear to be project-specific
For each data field, provide a common set of attributes, including definitions, synonyms, allowed value sets (preferably using controlled vocabularies), and expected syntax
Merge subgroup core elements into a common set of core metadata fields and attributes
Assemble a set of pathogen-specific and project-specific metadata fields to be used in conjunction with the core fields
Compare, harmonize, and map to other relevant initiatives, including OBI, MIGS, MIxS, BioProjects, BioSamples (ongoing)
Assemble all metadata fields into a semantic network (ongoing)
Harmonize the semantic network with the Ontology for Biomedical Investigations (OBI)
Draft data submission spreadsheets to be used for all white paper and BRC-associated projects
Finalize version 1.0 metadata standard and version 1.0 data submission spreadsheet
Beta test version 1.0 standard with new white paper projects, collecting feedback
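The step that separates core fields from project-specific fields can be sketched as a counting exercise over the collected example metadata sets. This is only an illustration, not the working group's actual procedure: the project names, the field names, and the rule that a field counts as core when it appears in every project of a subgroup (`min_fraction=1.0`) are all assumptions.

```python
from collections import Counter


def split_core_fields(projects, min_fraction=1.0):
    """Split field names into a core set and per-project leftovers.

    projects: dict mapping project name -> iterable of field names.
    min_fraction: share of projects that must use a field for it to be
        treated as core (assumed threshold, not from the slides).
    """
    if not projects:
        return set(), {}
    # Count each field once per project, even if a project lists it twice.
    counts = Counter(f for fields in projects.values() for f in set(fields))
    needed = min_fraction * len(projects)
    core = {f for f, n in counts.items() if n >= needed}
    specific = {name: set(fields) - core for name, fields in projects.items()}
    return core, specific


# Hypothetical example metadata sets for one pathogen subgroup.
virus_projects = {
    "project_a": ["specimen_id", "collection_date", "host_species", "passage_history"],
    "project_b": ["specimen_id", "collection_date", "host_species", "vaccination_status"],
    "project_c": ["specimen_id", "collection_date", "host_species"],
}

core, specific = split_core_fields(virus_projects)
```

Lowering `min_fraction` would promote fields that most, but not all, projects share; which threshold is appropriate is a judgment the slides leave to the subgroups.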

Data Fields: Core Project; Core Sample; Attributes
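The per-field attributes named in the process (definition, synonyms, allowed value set, expected syntax) map naturally onto a small record type with a validation method. The sketch below is hypothetical: the two example fields, their allowed values, and the date pattern are assumptions chosen for illustration and are not taken from the version 1.0 standard.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    definition: str
    synonyms: tuple = ()
    allowed_values: Optional[frozenset] = None  # None means free text
    syntax: Optional[str] = None  # regular expression the whole value must match

    def validate(self, value):
        """Return a list of error messages; an empty list means the value is valid."""
        errors = []
        if self.allowed_values is not None and value not in self.allowed_values:
            errors.append(f"{self.name}: {value!r} is not an allowed value")
        if self.syntax is not None and re.fullmatch(self.syntax, value) is None:
            errors.append(f"{self.name}: {value!r} does not match the expected syntax")
        return errors


# Hypothetical field definitions, for illustration only.
SPECS = {
    "collection_date": FieldSpec(
        name="collection_date",
        definition="Date on which the specimen was collected",
        synonyms=("isolation date",),
        syntax=r"\d{4}-\d{2}-\d{2}",
    ),
    "gender": FieldSpec(
        name="gender",
        definition="Gender of the specimen source organism",
        allowed_values=frozenset({"male", "female", "unknown"}),
    ),
}


def validate_record(record, specs=SPECS):
    """Check one spreadsheet row (a dict) against every field specification."""
    errors = []
    for name, spec in specs.items():
        if name not in record:
            errors.append(f"{name}: missing")
        else:
            errors.extend(spec.validate(record[name]))
    return errors
```

For example, `validate_record({"collection_date": "15/09/2011", "gender": "F"})` reports one error per field, which is the kind of feedback a submission spreadsheet check could return to a collaborator.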

[Figure: semantic network for the Specimen Isolation step, with nodes annotated by core sample fields CS1 to CS18. Entities shown: organism, organism part, environmental material, environment, equipment, person, affiliation, specimen, specimen type, specimen isolation procedure, specimen isolation procedure type, isolation protocol, hypothesis, IRB/IACUC approval, temporal-spatial region, spatial region, temporal interval, geographic location, GPS location, date/time, name, ID, pathogenic disposition, and the qualities gender, age, and health status. Roles shown: specimen source role, specimen capture role, specimen collector role. Relations shown: has_input, has_output, plays, has_specification, has_part, denotes, located_in, has_affiliation, instance_of, is_about, has_authorization, has_quality, has disposition.]

[Figure: Metadata Processes. Semantic network spanning the process panels Investigation, Specimen Isolation, Material Processing, Sequencing Assay, Quality Assessment, and Data Processing, each located_in a temporal-spatial region. Entities shown: specimen source (organism or environmental), specimen collector, specimen, specimen isolation process, isolation protocol, sample processing, input sample, reagents, technician, equipment, microorganism enriched NA sample, microorganism genomic NA, sequencing assay, quality assessment assay, primary data, sequence data, data transformations (image processing, assembly, variant detection, serotype marker detection, gene detection), genotype/serotype/gene data, data archiving process, sequence data record, GenBank ID, type, ID, and qualities. Relations shown: has_input, has_output, has_specification, has_part, is_about, denotes, located_in, has_quality, instance_of.]
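The relations named on this slide (has_input, has_output, has_specification, instance_of) can be recorded as subject-predicate-object triples, which makes the chain from a sequence record back to its specimen source traversable. The following is a minimal sketch under stated assumptions: the instance identifiers are invented, and the direction of each arc is inferred from the relation names because the flattened diagram does not preserve it.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical instances; relation names come from the slide.
TRIPLES = [
    Triple("specimen_isolation_1", "instance_of", "specimen isolation process"),
    Triple("specimen_isolation_1", "has_input", "specimen_source_1"),
    Triple("specimen_isolation_1", "has_output", "specimen_1"),
    Triple("specimen_isolation_1", "has_specification", "isolation_protocol_1"),
    Triple("sample_processing_1", "has_input", "specimen_1"),
    Triple("sample_processing_1", "has_output", "genomic_na_sample_1"),
    Triple("sequencing_assay_1", "has_input", "genomic_na_sample_1"),
    Triple("sequencing_assay_1", "has_output", "sequence_data_1"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def trace_to_source(entity, triples=TRIPLES):
    """Follow has_output/has_input links backwards from an entity to its origin.

    Takes the first producing process and its first input at each step,
    which is enough for a linear chain like the one above.
    """
    chain = [entity]
    seen = {entity}
    current = entity
    while True:
        producers = [t.subject for t in triples
                     if t.predicate == "has_output" and t.obj == current]
        if not producers:
            return chain
        inputs = objects(producers[0], "has_input", triples)
        if not inputs or inputs[0] in seen:  # stop on dead ends and cycles
            return chain
        current = inputs[0]
        seen.add(current)
        chain.append(current)


lineage = trace_to_source("sequence_data_1")
```

A production system would more likely store such statements in RDF and query them with SPARQL; plain tuples are used here only to keep the sketch self-contained.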

Outcome of Metadata Standards WG
Consistent metadata captured across GSCID
Guidance to collaborators regarding metadata expectations for sequencing and analysis services
Support more standardized BRC interface development
Harmonization with related stakeholders: Genomic Standards Consortium MIxS, OBO Foundry OBI, and NCBI BioSample
Represented in the context of an extensible semantic framework

Conclusions
Metadata standards for microorganism sequencing projects
Bottom-up approach focuses the standard on important features
Harmonizing with related standards from the Genomic Standards Consortium, OBO Foundry and NCBI
Being beta-tested by GSCIDs for adoption by all NIAID-sponsored sequencing projects
Utility of semantic representation
  – Identified gaps in data field list (e.g. temporal components)
  – Includes logical structure for other, project-specific data fields (extensible)
  – Identified gaps in ontology data standards (use case-driven standard development)
  – Identified commonalities in data structures (reusable)
  – Support for semantic queries and inferential analysis in future
Ontology-based framework is extensible
  – Sequencing => “omics”
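One simple instance of the "semantic queries and inferential analysis" mentioned above is answering a regional query from facts that name only a city. The sketch below assumes that located_in is treated as a transitive relation; the place names and the isolation identifier are hypothetical.

```python
def located_in_closure(facts):
    """Return all (x, y) pairs implied by treating located_in as transitive."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Join every pair (a, b) with every pair (b, d) to derive (a, d).
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def isolations_in(region, facts):
    """List specimen isolation events located in a region, directly or by inference."""
    return sorted(
        x for x, y in located_in_closure(facts)
        if y == region and x.startswith("specimen_isolation")
    )


# Hypothetical asserted facts: only the city is recorded for the isolation event.
facts = {
    ("specimen_isolation_1", "Dallas"),
    ("Dallas", "Texas"),
    ("Texas", "USA"),
}

closure = located_in_closure(facts)
```

With the closure computed, `isolations_in("USA", facts)` finds the isolation event even though no submitted record states the country, which is the practical benefit of placing fields in a semantic network rather than a flat list.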