GSC-BRC Metadata Standards Richard H. Scheuermann U.T. Southwestern Medical Center.


1 GSC-BRC Metadata Standards Richard H. Scheuermann U.T. Southwestern Medical Center

2 Metadata Inconsistencies
- Each project was providing different types of metadata
- No consistent nomenclature was being used
- Impossible to perform reliable comparative genomics analysis

3 Dengue Clinical Metadata

4 Virus Isolate Information

5 Complex Query Interface

6 Additional Clinical Characteristics

7 GSC-BRC Metadata Standards Working Group
- NIAID assembled a group of representatives from its three Genome Sequencing Centers for Infectious Diseases (Broad, JCVI, UMD) and five Bioinformatics Resource Centers (EuPathDB, IRD, PATRIC, VectorBase, ViPR)
- Charge: develop metadata standards for pathogen isolate sequencing projects

8 Metadata Standards Process
- Divide into pathogen subgroups: viruses, bacteria, eukaryotic pathogens and vectors
- Collect example metadata sets from sequencing project white papers and other project sources (e.g. CEIRS)
- Identify data fields that appear to be common across projects within a pathogen subgroup (core) and data fields that appear to be project specific
- For each data field, provide definitions, synonyms, allowed value sets (preferably using controlled vocabularies), expected syntax, examples, data categories and data providers
- Merge subgroup core elements into a common set of core metadata fields and attributes
- Assemble metadata fields into a semantic network
- Harmonize the semantic network with the Ontology for Biomedical Investigations (OBI)
- Compare, harmonize and map to other relevant initiatives, including MIGS, MIMS, BioProjects and BioSamples
- Develop data submission spreadsheets to be used for all white paper and BRC-associated projects
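The field-definition and merge steps of this process can be sketched in code. This is a minimal illustration, not the working group's actual tooling. It assumes that "core" means a field present in every project of a subgroup and that the merge is a union of the subgroup cores; the project names, field names and vocabulary values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MetadataField:
    """One data field with the attributes listed in the process description."""
    name: str
    definition: str
    synonyms: list = field(default_factory=list)
    allowed_values: list = field(default_factory=list)  # empty means free text
    syntax: str = ""
    example: str = ""
    category: str = ""
    provider: str = ""

    def accepts(self, value):
        """Check a value against the controlled vocabulary, if one is set."""
        return not self.allowed_values or value in self.allowed_values


def subgroup_core(projects):
    """Fields present in every project of one subgroup (assumed meaning of 'core')."""
    field_sets = [set(names) for names in projects.values()]
    return set.intersection(*field_sets) if field_sets else set()


def merge_cores(subgroups):
    """Union of the subgroup cores, mapping each field to its contributing subgroups."""
    merged = defaultdict(set)
    for subgroup, projects in subgroups.items():
        for name in subgroup_core(projects):
            merged[name].add(subgroup)
    return dict(merged)


# Hypothetical example data, not taken from any real white paper.
subgroups = {
    "viruses": {
        "project_a": ["specimen ID", "collection date", "host species", "serotype"],
        "project_b": ["specimen ID", "collection date", "host species"],
    },
    "bacteria": {
        "project_c": ["specimen ID", "collection date", "biovar"],
        "project_d": ["specimen ID", "collection date", "biovar", "plasmid"],
    },
}
merged = merge_cores(subgroups)

specimen_type = MetadataField(
    name="specimen type",
    definition="Kind of material collected from the source",
    allowed_values=["serum", "whole blood"],  # hypothetical vocabulary
    category="Specimen Isolation",
)
```

Fields that drop out of a subgroup core (here "serotype" and "plasmid") are the project-specific candidates that the process sets aside.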

9 GSC-BRC Metadata Working Groups

10 Example Metadata

11 Virus Core Metadata Sheet

12 Metadata Merge

13 Network Overview
[Figure: semantic network overview. Nodes: specimen source (organism or environmental), specimen collector, specimen isolation process, isolation protocol, specimen, microorganism, sample processing, enriched NA sample, microorganism genomic NA, input sample, reagents, technician, equipment, sequencing assay, primary data, data transformations (image processing, assembly; variant detection, serotype marker detection, gene detection), sequence data, genotype/serotype/gene data, data archiving process, sequence data record, GenBank ID, temporal-spatial region; nodes carry type, ID and qualities. Relations: has_input, has_output, has_specification, has_part, has_quality, is_about, denotes, located_in, instance_of. Legend: independent continuant, dependent continuant, occurrent, temporal-spatial region; relations in italics.]
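A network like this one can be held as subject-relation-object triples and walked to recover how a piece of data was produced. The sketch below is an assumed reading of the overview diagram: the relation names and node labels come from the slide, but which node connects to which is reconstructed for illustration and may not match the original figure edge for edge.

```python
# Assumed edges; only the node and relation names are taken from the slide.
TRIPLES = [
    ("specimen isolation process", "has_input", "specimen source"),
    ("specimen isolation process", "has_output", "specimen"),
    ("sample processing", "has_input", "specimen"),
    ("sample processing", "has_output", "input sample"),
    ("sequencing assay", "has_input", "input sample"),
    ("sequencing assay", "has_output", "primary data"),
    ("data transformations", "has_input", "primary data"),
    ("data transformations", "has_output", "sequence data"),
    ("GenBank ID", "denotes", "sequence data record"),
]


def provenance(entity, triples):
    """List the processes upstream of an entity, nearest first."""
    produced_by = {obj: subj for subj, rel, obj in triples if rel == "has_output"}
    inputs = {}
    for subj, rel, obj in triples:
        if rel == "has_input":
            inputs.setdefault(subj, []).append(obj)

    chain, seen, frontier = [], set(), [entity]
    while frontier:
        current = frontier.pop()
        process = produced_by.get(current)
        if process is None or process in seen:
            continue  # a raw source, or a process already visited
        seen.add(process)
        chain.append(process)
        frontier.extend(inputs.get(process, []))
    return chain


chain = provenance("sequence data", TRIPLES)
```

Keeping the relations as plain data makes it straightforward to add or correct edges as the network is harmonized with other vocabularies.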

14 [Figure: the network overview diagram from slide 13, with the labels Investigation, Specimen Isolation, Material Processing, Sequencing Assay and Data Processing added.]

15 Metadata Categories
- Investigation
- Host/Source Characterization
- Specimen Isolation
- Pathogen Detection
- Pathogen Isolation
- Pathogen Characterization
- Specimen Processing
- Sample Shipment
- Sequencing Sample Preparation
- Sequencing Assay
- Data Transformation

16 Host/Source Characterization
[Figure: semantic network. Nodes: organism, environmental material, specimen source role, species/strain, organism ID, common name, age, gender, symptom, specimen isolation procedure X, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, plays, denotes, has_quality, instance_of, has_part, located_in. Legend: vX = row X in virus sheet; independent continuant, dependent continuant, occurrent, temporal-spatial region; relations in italics. Field references: v10-v13, b14-b17, b19, b20.]

17 Specimen Isolation
[Figure: semantic network. Nodes: organism, organism part, environmental material, environment, equipment, person, name, affiliation, ID, specimen source role, specimen capture role, specimen collector role, specimen X, specimen type, specimen isolation procedure X, specimen isolation procedure type, isolation protocol, hypothesis, IRB/IACUC approval, "Comments ????", temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, plays, has_specification, has_part, has_quality, has_affiliation, has_authorization, is_about, denotes, located_in, instance_of. Field references: v2, v3-4, v5-6, v7-v9, v15-v19, b18, b22-b30.]

18 Pathogen Detection
[Figure: semantic network. Nodes: specimen X, ID, specimen type, amount, microorganism X, species/strain, pathogen detection process X, pathogen detection method, pathogen detection protocol, data about pathogen presence, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, has_quality, is_about, denotes, located_in, instance_of. Field references: v15, v16, v27, v28, b21.]

19 Pathogen Isolation
[Figure: semantic network. Nodes: specimen X, ID, specimen type, amount, microorganism X, species/strain, pathogen isolation process X, pathogen isolation method, pathogen isolation protocol, pathogen isolate X, pathogen type, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, has_quality, denotes, located_in, instance_of. Field references: v15, v16, v26, v34.]

20 Pathogen Characterization
[Figure: semantic network extending the Pathogen Isolation diagram. Nodes: specimen X, microorganism X, species/strain, pathogen isolation process X, pathogen isolation method, pathogen isolation protocol, pathogen isolate X, ID, pathogen type, amount. Assays: biological characteristic assay X, antigenic characteristic assay X, pathologic characteristic assay X, genetic characteristic assay X, chromosome/plasmid assay X, antibiotic sensitivity assay X, genus/species/strain determination assay X. Characteristics: biovar, serovar, pathovar, genotype, chromosome/plasmid, antibody sensitivity, genus/species/strain. Relations: has_input, has_output, has_specification, has_part, has_quality, is_about, denotes, located_in, instance_of. Field references: v15, v16, v27, v29-v32, v34, b2-b13.]

21 Specimen Processing
[Figure: semantic network. Nodes: specimen X, microorganism X, species/strain, aliquoting process X, aliquoting protocol, specimen X aliquot Y, sample set X, sample set assembly process X, sample set assembly protocol, sample set members (specimen A aliquot B, specimen M aliquot N, specimen T aliquot U), repository specimen X, repository deposition process X, specimen repository, information record; ID, specimen type and amount attached to specimens; temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, has_quality, denotes, located_in, instance_of. Field references: v15, v16, v20, v22, v23, v27, b40-b43.]

22 Sample Shipment
[Figure: semantic network. Nodes: sample set X, sample set X in transit, sample set X at GSC, sample X at GSC, sample shipment process X, sample shipment protocol, sample receipt process X, sample receipt protocol; ID, sample set type, sample type and amount attached to samples; temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, has_quality, denotes, located_in, instance_of. Field references: v21, v24, v25.]

23 Sequencing Sample Preparation
[Figure: semantic network. Nodes: specimen X, microorganism X, microorganism genomic NA, species/strain, aliquoting process X, aliquoting protocol, specimen aliquot X, NA enrichment process X, NA enrichment protocol, enriched NA sample X, NA amplification process X, NA amplification protocol, NA amplified sample X, library construction protocol; ID, specimen type and amount attached to samples; temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, has_quality, denotes, located_in, instance_of. Field references: v15, v16, v27, v33, v35-v39, b31-b33.]

24 Sequencing Assay
[Figure: semantic network. Nodes: sequencing assay X, sequencing assay type, run ID, sequencing protocol, sample material X, sample ID, sample type, person X, name, species, equipment X, serial #, equipment type, lot #, reagent type, primary data, objectives (coverage, genome type targeted, finishing), temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Roles: reagent role, template role, sequencing tech. role, signal detection role. Relations: has_input, has_output, has_specification, has_part, plays, denotes, located_in, instance_of. Field references: v14, v40, v41, b34, b38.]

25 Data Transformations
[Figure: semantic network. Nodes: primary data, data transformations (image processing, assembly X; variant detection; serotype marker detection; gene detection), sequence data, genotype data, serotype data, gene data, microorganism X, microorganism genomic NA, species/strain, algorithm, software, data archiving process, data transfer protocol, sequence data record, GenBank ID, finishing status, person X, name, species, bioinformatics tech. role, run ID, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, part_of, has_quality, is_about, plays, denotes, located_in, instance_of. Field references: v29-v32, v42-v47, b35-b37, b39.]

26 Generic Assay
[Figure: generic pattern. Nodes: assay X, assay type, run ID, assay protocol, input sample material X, sample material X, sample ID, sample type, analyte X, quality x, person X, name, species, equipment X, serial #, equipment type, lot #, reagent type, primary data, objectives, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Roles: reagent role, target role, technician role, signal detection role. Relations: has_input, has_output, has_specification, has_part, has_quality, is_about, plays, denotes, located_in, instance_of.]

27 Generic Material Transformation
[Figure: generic pattern. Nodes: material transformation X, material transformation type, run ID, material transformation protocol, sample material X, sample ID, sample type, output material X, material type, quality x, person X, name, species, equipment X, serial #, equipment type, lot #, reagent type, objectives, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Roles: reagent role, target role, technician role, signal detection role. Relations: has_input, has_output, has_specification, has_part, has_quality, plays, denotes, located_in, instance_of.]

28 Generic Data Transformation
[Figure: generic pattern. Nodes: data transformation X, data transformation type, run ID, input data, output data, material X, algorithm, software, person X, name, data analyst role, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_input, has_output, has_specification, has_part, is_about, plays, denotes, located_in, instance_of.]
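The three generic patterns (assay, material transformation, data transformation) share one shape: a typed process with a run ID, inputs, outputs, a protocol, a person in a role, and a place and time. A minimal sketch of that shared shape as a record type follows. The class name, the attribute names and the choice of which slots count as required are assumptions made for illustration; they are not defined in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessRecord:
    """Hypothetical record mirroring the relations in the generic patterns."""
    process_type: str                             # instance_of
    run_id: str = ""                              # run ID denotes the process
    inputs: list = field(default_factory=list)    # has_input
    outputs: list = field(default_factory=list)   # has_output
    protocol: str = ""                            # has_specification
    performer: str = ""                           # person X, who plays a role
    performer_role: str = ""                      # e.g. technician role
    location: str = ""                            # located_in: spatial region
    date_time: str = ""                           # located_in: temporal interval

    def missing_slots(self):
        """Names of the slots assumed to be required that are still empty."""
        required = ("run_id", "inputs", "outputs", "protocol", "location", "date_time")
        return [name for name in required if not getattr(self, name)]


assay = ProcessRecord(
    process_type="sequencing assay",
    run_id="run-001",  # hypothetical identifier
    inputs=["sample material X"],
    outputs=["primary data"],
    protocol="sequencing protocol",
)
gaps = assay.missing_slots()
```

A check of this kind could run over a submission spreadsheet to flag rows whose process description is incomplete.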

29 Generic Material (IC)
[Figure: generic pattern for a material (independent continuant). Nodes: material X, ID, material type, quality x, material Y, material Z, quality y, temporal-spatial region (spatial region, temporal interval, GPS location, date/time, geographic location). Relations: has_part, has_quality, denotes, located_in, instance_of.]

30 OBI
[Figure: OBI-based network centered on specimen creation. Nodes: specimen creation, organism (for 'collecting specimen from an organism'), material entity (for 'environmental material collection'), human being, organization, specimen, infectious agent, anatomical entity ('portion of body substance' or 'portion of tissue'), specimen creation protocol, specimen creation objective, geographic location, time measurement datum, measurement datum, quality, growth environment, treatment, genetic characteristics information, individual organism identifier, synonym, written name, CRID symbol, textual entity, document, information content entity. Relations: has_specified_input, has_specified_output, realizes, unfolds_in, achieves_planned_objective, has_participant, has_agent, has_supplier, is_member_of_organization, has_quality, is_quality_measured_as, is_duration_of, is_about, is_a, denotes, located_in. Field references: e14-e27, e29-e33, e35-e47, e50.]

31 Status
- Core metadata merge process nearly complete
- Comprehensive semantic networks developed
- Begun the OBI harmonization process
- Begun the MIGS/MIMS harmonization process
- Still need to:
  - Compare, harmonize, map with BioProjects and BioSamples
  - Decide what to do about metadata fields that appear to be project specific
  - Develop metadata submission templates
  - Report process and results

