GSC-BRC Metadata Standards
Richard H. Scheuermann, U.T. Southwestern Medical Center

Metadata Inconsistencies
– Each project was providing different types of metadata.
– No consistent nomenclature was being used.
– As a result, reliable comparative genomics analysis was impossible.

Dengue Clinical Metadata

Virus Isolate Information

Complex Query Interface

Additional Clinical Characteristics

GSC-BRC Metadata Standards Working Group
NIAID assembled representatives from its three Genome Sequencing Centers for Infectious Diseases (Broad, JCVI, UMD) and its five Bioinformatics Resource Centers (EuPathDB, IRD, PATRIC, VectorBase, ViPR) to develop metadata standards for pathogen isolate sequencing projects.

Metadata Standards Process
1. Divide into pathogen subgroups: viruses, bacteria, eukaryotic pathogens and vectors.
2. Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS).
3. Identify data fields that appear to be common across projects within a pathogen subgroup (core) and data fields that appear to be project specific.
4. For each data field, provide a definition, synonyms, an allowed value set (preferably using controlled vocabularies), expected syntax, examples, data category and data provider.
5. Merge the subgroup core elements into a common set of core metadata fields and attributes.
6. Assemble the metadata fields into a semantic network.
7. Harmonize the semantic network with the Ontology for Biomedical Investigations (OBI).
8. Compare, harmonize and map to other relevant initiatives, including MIGS, MIMS, BioProjects and BioSamples.
9. Develop data submission spreadsheets to be used for all white paper and BRC-associated projects.
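Step 4 attaches a fixed set of attributes to every data field. A minimal sketch of how such a field specification could be represented and used to check submitted values is shown below. The class name `FieldDefinition`, the field names `collection_date` and `host_gender`, their synonyms, the date pattern and the allowed values are all hypothetical illustrations, not the working group's actual definitions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldDefinition:
    """One row of a hypothetical metadata field specification (step 4 attributes)."""
    name: str
    definition: str
    category: str
    synonyms: list[str] = field(default_factory=list)
    allowed_values: Optional[set[str]] = None  # controlled vocabulary, if any
    syntax: Optional[str] = None               # regular expression for free-text values
    examples: list[str] = field(default_factory=list)

    def matches_name(self, label: str) -> bool:
        """True if a submitted column label is this field's name or one of its synonyms."""
        key = label.strip().lower()
        return key == self.name.lower() or key in (s.lower() for s in self.synonyms)

    def validate(self, value: str) -> list[str]:
        """Return a list of problems with a submitted value (an empty list means valid)."""
        problems = []
        if self.allowed_values is not None and value not in self.allowed_values:
            problems.append(f"{self.name}: {value!r} not in allowed value set")
        if self.syntax is not None and not re.fullmatch(self.syntax, value):
            problems.append(f"{self.name}: {value!r} does not match {self.syntax}")
        return problems


# Hypothetical example fields; names, synonyms and patterns are illustrative only.
collection_date = FieldDefinition(
    name="collection_date",
    definition="Date on which the specimen was collected",
    category="Specimen Isolation",
    synonyms=["date collected", "isolation date"],
    syntax=r"\d{4}-\d{2}-\d{2}",
    examples=["2009-04-27"],
)
host_gender = FieldDefinition(
    name="host_gender",
    definition="Gender of the host organism",
    category="Host/Source Characterization",
    allowed_values={"male", "female", "unknown"},
)
```

Usage: `collection_date.matches_name("Date Collected")` resolves a project-specific column label to the standard field through its synonyms, while `host_gender.validate("M")` reports one problem because "M" is outside the controlled vocabulary. Keeping synonyms on the field definition is one way to address the inconsistent nomenclature across projects.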

GSC-BRC Metadata Working Groups

Example Metadata

Virus Core Metadata Sheet

Metadata Merge
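The merge described in steps 3 and 5 can be sketched in two stages: find the fields that recur across the projects of one pathogen subgroup, then combine the subgroup cores and record which subgroups contribute each field. This assumes field labels have already been normalized to canonical names (for example through synonyms). The threshold parameter, the function names and the example project field sets are hypothetical.

```python
from collections import Counter


def subgroup_core(projects: dict[str, set[str]], threshold: float = 1.0) -> set[str]:
    """Fields present in at least `threshold` (a fraction) of a subgroup's projects."""
    counts = Counter(f for fields in projects.values() for f in fields)
    needed = threshold * len(projects)
    return {f for f, n in counts.items() if n >= needed}


def merge_cores(cores: dict[str, set[str]]) -> dict[str, set[str]]:
    """Merge subgroup cores, mapping each field to the subgroups that list it as core."""
    merged: dict[str, set[str]] = {}
    for subgroup, fields in cores.items():
        for f in fields:
            merged.setdefault(f, set()).add(subgroup)
    return merged


def shared_by_all(merged: dict[str, set[str]], n_subgroups: int) -> list[str]:
    """Fields that every subgroup contributed to the merged core, sorted by name."""
    return sorted(f for f, groups in merged.items() if len(groups) == n_subgroups)


# Hypothetical example metadata sets collected from two projects per subgroup.
virus_projects = {
    "project_A": {"strain", "collection_date", "host", "country"},
    "project_B": {"strain", "collection_date", "host", "passage_history"},
}
bacteria_projects = {
    "project_C": {"strain", "collection_date", "host", "antibiotic_sensitivity"},
    "project_D": {"strain", "collection_date", "serovar"},
}

merged = merge_cores({
    "virus": subgroup_core(virus_projects),
    "bacteria": subgroup_core(bacteria_projects),
})
```

With the default threshold of 1.0 a field must appear in every project of a subgroup, so `shared_by_all(merged, 2)` gives `["collection_date", "strain"]`, while `host` is core only for the virus subgroup. Lowering the threshold is one way to express "appear to be common" more loosely.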

[Figure: Network Overview. Semantic network linking the specimen source (organism or environmental), the specimen isolation process and protocol, the specimen, sample processing, the enriched nucleic acid (NA) sample containing microorganism genomic NA, the sequencing assay (input sample, reagents, technician, equipment), primary data, data transformations (image processing, assembly, variant detection, serotype marker detection, gene detection), sequence and genotype/serotype/gene data, and the data archiving process that yields a sequence data record denoted by a GenBank ID. Relations: has_input, has_output, has_specification, has_part, has_quality, is_about, denotes, instance_of, located_in. Legend: independent continuant, dependent continuant, occurrent, temporal-spatial region; relations in italics.]
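Because the Network Overview connects processes and materials through has_input and has_output relations, the metadata for one sequence can be stored as subject-relation-object triples and traced back from the archived record to its source. The sketch below is a hypothetical illustration: the instance names are invented, the chain is simplified to one path through the network, and the traversal is plain Python rather than an ontology tool.

```python
from collections import defaultdict

# Hypothetical instance-level triples (subject, relation, object) using the
# relation names from the network diagram.
TRIPLES = [
    ("specimen isolation process 1", "has_input", "host organism 1"),
    ("specimen isolation process 1", "has_output", "specimen 1"),
    ("sample processing 1", "has_input", "specimen 1"),
    ("sample processing 1", "has_output", "enriched NA sample 1"),
    ("sequencing assay 1", "has_input", "enriched NA sample 1"),
    ("sequencing assay 1", "has_output", "primary data 1"),
    ("assembly 1", "has_input", "primary data 1"),
    ("assembly 1", "has_output", "sequence data 1"),
    ("data archiving process 1", "has_input", "sequence data 1"),
    ("data archiving process 1", "has_output", "sequence data record 1"),
    ("GenBank ID 1", "denotes", "sequence data record 1"),
]


def upstream(entity: str, triples) -> list[str]:
    """Walk has_output/has_input links backwards from an entity to its origins."""
    # If several processes output the same entity, the last triple wins here.
    produced_by = {o: s for s, r, o in triples if r == "has_output"}
    inputs = defaultdict(list)
    for s, r, o in triples:
        if r == "has_input":
            inputs[s].append(o)
    chain, frontier, seen = [], [entity], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        chain.append(current)
        process = produced_by.get(current)
        if process is not None and process not in seen:
            seen.add(process)
            chain.append(process)
            frontier.extend(inputs[process])
    return chain


# Resolve the record that the accession denotes, then trace its provenance.
record = next(o for s, r, o in TRIPLES if s == "GenBank ID 1" and r == "denotes")
provenance = upstream(record, TRIPLES)
```

Here `provenance` runs from `"sequence data record 1"` back through the archiving, assembly, sequencing and processing steps to `"host organism 1"`, which is the kind of question (which specimen and host produced this GenBank sequence?) the network is meant to make answerable.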

[Figure: Network Overview, with the same semantic network grouped into modules: Investigation, Specimen Isolation, Material Processing, Sequencing Assay and Data Processing.]

Metadata Categories
– Investigation
– Host/Source Characterization
– Specimen Isolation
– Pathogen Detection
– Pathogen Isolation
– Pathogen Characterization
– Specimen Processing
– Sample Shipment
– Sequencing Sample Preparation
– Sequencing Assay
– Data Transformation
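The metadata categories give a natural column order for a data submission spreadsheet. The sketch below builds a two-row CSV template (a category row above a field-name row) from a field-to-category mapping. The field names and their category assignments are hypothetical; only the category list comes from the slide.

```python
import csv
import io

# Category order taken from the Metadata Categories slide.
CATEGORIES = [
    "Investigation",
    "Host/Source Characterization",
    "Specimen Isolation",
    "Pathogen Detection",
    "Pathogen Isolation",
    "Pathogen Characterization",
    "Specimen Processing",
    "Sample Shipment",
    "Sequencing Sample Preparation",
    "Sequencing Assay",
    "Data Transformation",
]

# Hypothetical field-to-category assignments, for illustration only.
FIELDS = {
    "collection_date": "Specimen Isolation",
    "host_species": "Host/Source Characterization",
    "project_id": "Investigation",
    "sequencing_platform": "Sequencing Assay",
    "assembly_method": "Data Transformation",
}


def template_header(fields: dict[str, str]) -> list[str]:
    """Order columns by metadata category, then alphabetically within a category."""
    rank = {c: i for i, c in enumerate(CATEGORIES)}
    return sorted(fields, key=lambda f: (rank[fields[f]], f))


def write_template(fields: dict[str, str]) -> str:
    """Return a CSV template: a category row above the field-name row."""
    header = template_header(fields)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([fields[f] for f in header])
    writer.writerow(header)
    return buf.getvalue()
```

Ordering columns by category means a submitter fills in the sheet roughly in the order the work happened, from investigation through sequencing to data transformation; an unknown category raises a `KeyError`, which surfaces mis-assigned fields early.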

[Figure: Host/Source Characterization. An organism or environmental material plays the specimen source role and is the input to a specimen isolation procedure. Attributes: species/strain (instance_of), organism ID, common name, age, gender and symptom (has_quality), and a temporal-spatial region (GPS location, date/time, geographic location). Nodes carry spreadsheet row references (vX = row X in the virus sheet): v10 to v13, b14 to b20.]

[Figure: Specimen Isolation. A specimen isolation procedure (has_specification isolation protocol) takes as input an organism, organism part or environmental material in the specimen source role, equipment in the specimen capture role and a person (name, affiliation) in the specimen collector role, and outputs a specimen of a given specimen type. Also linked: the temporal-spatial region (GPS location, date/time, geographic location), environment qualities, a hypothesis (is_about), IRB/IACUC approval (has_authorization) and comments.]

[Figure: Pathogen Detection. A pathogen detection process (instance of a pathogen detection method, has_specification pathogen detection protocol) takes as input a specimen (ID, type, amount) that has part microorganism X (species/strain) and outputs data about pathogen presence; the specimen and process are located in a temporal-spatial region (GPS location, date/time, geographic location).]

[Figure: Pathogen Isolation. A pathogen isolation process (instance of a pathogen isolation method, has_specification pathogen isolation protocol) takes as input a specimen (ID, type, amount) that has part microorganism X and outputs pathogen isolate X (ID, pathogen type, amount); each is located in a temporal-spatial region (GPS location, date/time, geographic location).]

[Figure: Pathogen Characterization. Extends the pathogen isolation network with characteristic assays that take the pathogen isolate as input: biological, antigenic, pathologic, genetic, chromosome/plasmid, antibiotic sensitivity and genus/species/strain determination assays, whose outputs are about biovar, serovar, pathovar, genotype, chromosome/plasmid, antibody sensitivity and genus/species/strain characteristics.]

[Figure: Specimen Processing. Aliquoting processes (has_specification aliquoting protocol) divide specimens into aliquots; a sample set assembly process (has_specification sample set assembly protocol) combines aliquots into sample set X; a repository deposition process places a specimen in a specimen repository together with an information record. Each material carries an ID, type and amount and is located in a temporal-spatial region.]

[Figure: Sample Shipment. A sample shipment process (has_specification sample shipment protocol) moves sample set X from its origin, through transit, to the GSC, where a sample receipt process (has_specification sample receipt protocol) yields sample X at the GSC. Each stage has an ID, type and amount and is located in its own temporal-spatial region.]

[Figure: Sequencing Sample Preparation. A specimen aliquot undergoes nucleic acid (NA) enrichment (NA enrichment protocol), yielding an enriched NA sample that has part microorganism genomic NA, followed by NA amplification (NA amplification protocol), yielding an NA-amplified sample; a library construction protocol is also shown. Each step's material has an ID, type and amount and is located in a temporal-spatial region.]

[Figure: Sequencing Assay. A sequencing assay (run ID, sequencing assay type, has_specification sequencing protocol) takes as input sample material in the template role, reagents identified by lot # in the reagent role, a person in the sequencing technician role and equipment identified by serial # in the signal detection role, and outputs primary data. Objectives: coverage, genome type targeted, finishing.]

[Figure: Data Transformations. Primary data pass through image processing and assembly (algorithm, software, run ID, a person in the bioinformatics technician role) to produce sequence data, which feed variant detection (genotype data), serotype marker detection (serotype data) and gene detection (gene data). A data archiving process (has_specification data transfer protocol) produces a sequence data record denoted by a GenBank ID; finishing status is recorded as a quality.]

[Figure: Generic Assay. An assay (run ID, assay type, has_specification assay protocol, objectives) takes as input sample material in the target role, reagents identified by lot # in the reagent role, a person in the technician role and equipment identified by serial # in the signal detection role, and outputs primary data that is about the input sample material (its analyte and qualities); the assay is located in a temporal-spatial region.]
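The Generic Assay pattern is a template that specific assays, such as the sequencing assay, fill in with their own roles. A minimal sketch follows, assuming a hypothetical `AssayInstance` class whose slots mirror the diagram labels and whose `triples()` method expands one instance into subject-relation-object statements. The instance identifiers are invented; the role names "template role" and "sequencing tech. role" come from the Sequencing Assay slide.

```python
from dataclasses import dataclass


@dataclass
class AssayInstance:
    """Slots of the generic assay pattern; slot names follow the diagram labels."""
    assay_id: str
    assay_type: str
    protocol: str
    input_sample: str
    technician: str
    equipment: str
    reagent_lot: str
    output_data: str
    location: str
    sample_role: str = "target role"       # generic default
    person_role: str = "technician role"   # generic default

    def triples(self) -> list[tuple[str, str, str]]:
        """Expand the instance into (subject, relation, object) statements."""
        a = self.assay_id
        return [
            (a, "instance_of", self.assay_type),
            (a, "has_specification", self.protocol),
            (a, "has_input", self.input_sample),
            (a, "has_input", self.technician),
            (a, "has_input", self.equipment),
            (a, "has_input", self.reagent_lot),
            (a, "has_output", self.output_data),
            (a, "located_in", self.location),
            (self.output_data, "is_about", self.input_sample),
            (self.input_sample, "plays", self.sample_role),
            (self.technician, "plays", self.person_role),
            (self.equipment, "plays", "signal detection role"),
            (self.reagent_lot, "plays", "reagent role"),
        ]


# Hypothetical sequencing assay expressed as a specialization of the generic pattern.
seq = AssayInstance(
    assay_id="sequencing assay 1",
    assay_type="sequencing assay",
    protocol="sequencing protocol 1",
    input_sample="sample 7",
    technician="person 3",
    equipment="sequencer 2",
    reagent_lot="lot 42",
    output_data="primary data 1",
    location="temporal-spatial region 1",
    sample_role="template role",
    person_role="sequencing tech. role",
)
statements = seq.triples()
```

Only the role names change between the generic and the sequencing version, which illustrates why a single generic assay (and, analogously, generic material and data transformation) pattern can cover many of the category-specific networks.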

[Figure: Generic Material Transformation. Same pattern as the generic assay (run ID, material transformation type, has_specification material transformation protocol; sample material, reagents, person and equipment as inputs in their roles), except that the output is a material (sample ID, material type, qualities) rather than data.]

[Figure: Generic Data Transformation. A data transformation (run ID, data transformation type, has_specification algorithm) takes input data and software as input, with a person in the data analyst role, and produces output data that is about material X; the transformation is located in a temporal-spatial region.]

[Figure: Generic Material (IC). Material X (ID, material type, qualities) has parts material Y and material Z, each with its own qualities; every material is located in a temporal-spatial region (GPS location, date/time, geographic location).]

[Figure: OBI specimen creation. A specimen creation process (realizes a specimen creation protocol, achieves_planned_objective specimen creation objective) has_specified_input an organism (e.g. a human being, for 'collecting specimen from an organism') or a material entity (for 'environmental material collection') and has_specified_output a specimen, which may be an anatomical entity ('portion of body substance' or 'portion of tissue') and is about an infectious agent. Also linked: identifiers (CRID symbol, written name, synonym), geographic location, time measurement datum, organizations (supplier, membership), growth environment, treatment, genetic characteristics information, and row references e14 to e50.]

Status
– Core metadata merge process nearly complete
– Comprehensive semantic networks developed
– OBI harmonization process begun
– MIGS/MIMS harmonization process begun
– Still need to:
  – Compare, harmonize and map with BioProjects and BioSamples
  – Decide what to do about metadata fields that appear to be project specific
  – Develop metadata submission templates
  – Report the process and results