Ontologies, Databases, Knowledgebases: How should they interoperate?


1 Ontologies, Databases, Knowledgebases: How should they interoperate?
Judith Blake, Ph.D.
The Jackson Laboratory
Database: organized body of related information; a structured collection of records or data
Knowledgebase: facts + inferences

2 Thesis
The Mouse Genome Informatics (MGI) system provides a model for interoperability that
incorporates the use of ontologies,
depends upon the interconnection among databases, and
supports integration of data from multiple data sources.
This may provide a model for PRO objectives to support connections between PRO and disease representations.

3 Mouse Genome Informatics (MGI)
MGI’s primary mission is to facilitate the use of mouse as a model for human biology by providing integrated access to data on the genetics, genomics, and biology of the laboratory mouse.
Information content spans from sequence to phenotype/disease.
[Figure: MGI data types. Labels: sequence; genome location; variants & polymorphisms; strain genealogy; expression; gene function; tumors; mouse/human orthologs & maps; mouse model & human phenotype; Hermansky-Pudlak syndrome]

4 Automated (mostly) Data Integration (Loads)
[Figure: automated data loads into MGI. Diagram labels, in source order: Clones, EG mouse, RPCI, GO, UniProt, MGC, MP, Associations, Vocabularies, DFCI, Anatomy, DoTS, Interpro, NIA, OMIM, Unigene, PIRSF, TreeFam, Annotation, Gene traps, MGI, GenBank, EG chimp, EG dog, RefSeq, Sequences, EG rat, UniProt, EG human, DFCIseq, HCOP, DoTSseq, Homologene, NIAseq, Non-mouse, dbSNP, NCBI, VEGA, SNP db, UniSTS, Gene models and coordinates, Ensembl, microRNAs]
Slide notes: Add: Vega? Homologene Gene trap Entrezgene chimp & dog. Remove: Riken. Change TIGR to DFSR (or whatever it is).

5 Mouse Genome Informatics Controlled vocabularies and ontologies
GO - Gene Ontology
PRO - Protein Ontology
MP - Mammalian Phenotype Ontology
MA - Mouse Anatomy (GXD)
CL - Cell Type Ontology
Mouse gene and strain nomenclature
SO - Sequence Ontology
RO - Relations Ontology
ECO - Evidence Code Ontology
Integration: Controlled Vocabularies and Ontologies

6 MGI Operating Principles
Data integration is key to comprehensive access to mouse genome, functional, mouse model, and comparative data
- Allows the data to be evaluated in new contexts
- Supports robust access to comprehensive information
- Permits efficient access to related resources
Standards are key to data integration
- Nomenclature: standardized gene nomenclature, keywords, etc.
- Knowledge representation: Gene Ontology (GO), Mammalian Phenotype Ontology
Integration of Multi-Source Data
- Depends on consistent entity tagging
- Requires improvement of data storage structures
- Necessitates ontology updates for data categories and context

7 Mouse Phenotypes and Disease Models
Connects mouse and human phenotypes in studies of human disease processes.
Mouse: Crebbp<tm1Sis>/Crebbp<+> mutants showing skeletal formation defects.
Human: Rubinstein-Taybi Syndrome 1 (OMIM:180849), caused by CREBBP mutation.
° mental retardation
° postnatal growth deficiency
° microcephaly
° broad thumbs & halluces
° dysmorphic facial features (beaked nose, high arched palate, characteristic grimacing)
° increased tumor risk

8 Diseases and Phenotypes
Diseases are described by signs and symptoms.
Signs: things you can measure
Symptoms: things the patient notices
Signs are phenotypes.
Diseases are characterized by phenotypes, including the order, severity and duration with which they occur.
A full model of disease takes into account dimensions of anatomy, time, severity, therapeutic responsiveness, outcomes, etc.
There is also a probabilistic element to an instance of the disease and a probabilistic association between phenotypic elements in one instance.
Diseases are not phenotypes (although predisposition may be considered as such), but single-phenotype diseases may be viewed as phenotypes, e.g. osteoarthritis.
Paul Schofield, 2013

9 Status of Phenotype & Disease Data
                                          May 2012     May 2013   May 2014   change this yr.
Phenotype terms in MP ontology               8,775        9,034     10,190   +1,156
Mutant alleles cataloged:
  total                              743,[garbled]      748,960    754,256   +5,296
  in mice                            [garbled],299       33,659     39,241   +5,582
  number of genes represented               20,937       21,442     21,786   +344
  targeted alleles                          46,822       51,119     55,640   [garbled]
  number of genes targeted                  15,488       16,221     16,358   [garbled]
Alleles w/ phenotype (MP) annotation        29,064       32,095     34,625   +2,530
Genotypes with MP annotation                43,579       47,790     51,720   +3,930
Total MP annotations                       223,125      249,460    268,577   +19,117
Mouse genotypes modeling human disease       3,687        4,084      4,365   +281
Human diseases w/ ≥1 mouse model(s)          1,153        1,239      1,310   +71
QTLs                                         4,696        4,715      4,835   +120
(The transcript shows only "+14," for the last two "change" cells of the mutant-allele block.)

10 Objective
…make phenotype and disease model data robust and accessible to researchers and computational biologists
semantic consistency to enable complete data retrieval
integrated access to all phenotypic variation sources (single-gene and genomic mutations, engineered mutations, QTLs, strains)
data on human disease correlation
access to mouse models from various approaches
- Genetic
- Phenotypic
- Genomic localization
- Computational

11 Annotating Disease to Genotype
Different alleles of a gene on the same background may or may not be disease models.
The same alleles of a gene on different genetic backgrounds may or may not be disease models.
Disease models are attached to genotype “objects”.
A disease annotation consists of an OMIM term, the data reference/source, and an association type.
Example:
OMIM term: Crouzon Syndrome
genotype: Fgfr2<tm1Schl>/Fgfr2<+>, 129S1/Sv
association type: phenotypic similarity to human disease associated with ortholog
source: Eswarakumar VP et al., PNAS USA 2006;103:
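The annotation model on this slide can be sketched as a small data structure. This is only an illustration, not MGI's schema: the class names, field names, the `annotate` and `is_disease_model` helpers, and the second background label are all hypothetical. It shows why the annotation hangs off the genotype rather than the gene or allele: the same allele pair on another background is a separate object with its own (possibly empty) set of disease annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    """A genotype 'object': an allele pair plus its genetic background."""
    allele_pair: str
    background: str


@dataclass(frozen=True)
class DiseaseAnnotation:
    """The three parts named on the slide: term, association type, source."""
    omim_term: str
    association_type: str
    source: str


# Hypothetical in-memory store, keyed by genotype (not by gene or allele).
annotations: dict[Genotype, list[DiseaseAnnotation]] = {}


def annotate(genotype: Genotype, annotation: DiseaseAnnotation) -> None:
    annotations.setdefault(genotype, []).append(annotation)


def is_disease_model(genotype: Genotype) -> bool:
    return bool(annotations.get(genotype))


# The example from the slide (citation left truncated, as in the source).
crouzon_model = Genotype("Fgfr2<tm1Schl>/Fgfr2<+>", "129S1/Sv")
annotate(
    crouzon_model,
    DiseaseAnnotation(
        omim_term="Crouzon Syndrome",
        association_type=(
            "phenotypic similarity to human disease associated with ortholog"
        ),
        source="Eswarakumar VP et al., PNAS USA 2006;103:",
    ),
)

# Same alleles, different (made-up) background: a distinct genotype object
# that carries no disease annotation unless one is curated for it.
other_background = Genotype("Fgfr2<tm1Schl>/Fgfr2<+>", "hypothetical-background")

print(is_disease_model(crouzon_model))
print(is_disease_model(other_background))
```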

12 MGI
4,084 MGI mouse models (associated with) 1,239 OMIM diseases

13 Note chicken and zebrafish
Each associated human disease links to a Human Disease and Mouse Model Detail Page.
Survival of motor neuron 1

14 Mouse Genome Informatics: Integrate Sequence with Biology
Biological knowledge and attributes in MGI
[Figure labels: Nucleotide Sequences; Protein Sequences; Gene predictions; Genome Features; Genome variation; Nomenclature; Genome location; Strains; Polymorphisms; Orthology; Expression; Alleles; Mutant phenotypes; Function of gene products; Literature]

15 Disease Cell Anatomy
Adapted from Schriml and Kibbe: ICBO submission 2013

16 Now with annotation extensions
[Diagram: sty1 and pap1 annotations linked by extension relations]
GO terms shown:
positive regulation of transcription from pol II promoter in response to oxidative stress [GO: ]
protein localization to nucleus [GO: ]
cellular response to oxidative stress [GO: ]
Relations shown: happens during; has input; has regulation target
Each extended annotation corresponds to an <anonymous description> box.
Key point: an extended annotation is logically equivalent to an annotation to a term in the <anon desc> box, with the same links out.

DB       Object               Term                                 Ev   Ref       Extension
PomBase  sty1 SPAC24B11.06c   GO: protein localization to nucleus  IMP  PMID: ..  happens_during(GO: ), has_input(SPAC c)
         pap1 SPAC c          GO:                                                 has_regulation_target(…)
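The "key point" above can be illustrated with a short sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, term labels stand in for the GO identifiers that are elided in the transcript, and the pairing of relations with targets is one reading of the flattened diagram. The sketch treats an annotation plus its extensions as a single composite, class-like expression, which is the sense in which it behaves like an annotation to an unnamed term.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    relation: str  # e.g. happens_during, has_input
    target: str    # label of the target term or gene product


@dataclass(frozen=True)
class Annotation:
    db: str
    obj: str
    term: str
    evidence: str
    extensions: tuple[Extension, ...] = ()


def anonymous_description(a: Annotation) -> str:
    """Render the term plus its extensions as one composite expression."""
    parts = [f"'{a.term}'"]
    parts += [f"({e.relation} some '{e.target}')" for e in a.extensions]
    return " and ".join(parts)


# One reading of the sty1 row; labels replace the elided identifiers.
sty1 = Annotation(
    db="PomBase",
    obj="sty1",
    term="protein localization to nucleus",
    evidence="IMP",
    extensions=(
        Extension("happens_during", "cellular response to oxidative stress"),
        Extension("has_input", "pap1"),
    ),
)

print(anonymous_description(sty1))
```

An annotation without extensions renders as just its term, so plain and extended annotations can be handled by the same code path.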

17 Annotation Extensions

18 MGI Modular Annotation Example
Xirp1 (xin actin-binding repeat containing 1) is involved in the organization of the sarcomere in a cardiac muscle cell (CL: ) of the myocardium (MA: ).
Total number of MGI modular annotation units to proteins: 22,866
This does not include annotations to permanent cell lines.

19 Summary of MGI Modular Annotations
part_of                                             9013
occurs_in                                           6298
regulates_o_occurs_in                               2884
regulates_o_acts_on_population_of                   1017
regulates_o_results_in_acquisition_of_features_of    967
results_in_acquisition_of_features_of                723
regulates_o_has_agent                                537
regulates_o_has_participant                          275
acts_on_population_of                                259
results_in_movement_of                               232
results_in_development_of                            173
regulates_o_results_in_movement_of                   156
results_in_specification_of                           96
results_in_maturation_of                              61
results_in_morphogenesis_of                           48
has_agent                                             34
results_in_commitment_to                              32
results_in_division_of                                21
regulates_o_results_in_commitment_to                  14
results_in_determination_of                           13
regulates_o_results_in_specification_of                7
has_output_o_axis_of                                   5
regulates_o_results_in_development_of                  1
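As a quick consistency check, the per-relation counts can be tallied in a few lines. The counts are copied from the slide; the variable names are illustrative, and reading the `regulates_o_` prefix as "regulates composed with another relation" is an assumption about the naming convention rather than something the slide states.

```python
# Per-relation counts of MGI modular annotation units, copied from the slide.
relation_counts = {
    "part_of": 9013,
    "occurs_in": 6298,
    "regulates_o_occurs_in": 2884,
    "regulates_o_acts_on_population_of": 1017,
    "regulates_o_results_in_acquisition_of_features_of": 967,
    "results_in_acquisition_of_features_of": 723,
    "regulates_o_has_agent": 537,
    "regulates_o_has_participant": 275,
    "acts_on_population_of": 259,
    "results_in_movement_of": 232,
    "results_in_development_of": 173,
    "regulates_o_results_in_movement_of": 156,
    "results_in_specification_of": 96,
    "results_in_maturation_of": 61,
    "results_in_morphogenesis_of": 48,
    "has_agent": 34,
    "results_in_commitment_to": 32,
    "results_in_division_of": 21,
    "regulates_o_results_in_commitment_to": 14,
    "results_in_determination_of": 13,
    "regulates_o_results_in_specification_of": 7,
    "has_output_o_axis_of": 5,
    "regulates_o_results_in_development_of": 1,
}

total = sum(relation_counts.values())

# Units whose relation name starts with the "regulates_o_" prefix.
regulates_total = sum(
    n for rel, n in relation_counts.items() if rel.startswith("regulates_o_")
)

print(f"total units: {total}")
print(f"regulates_o_* units: {regulates_total} ({regulates_total / total:.1%})")
```

The total matches the 22,866 modular annotation units reported on the previous slide.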

20 Interaction Data in MGI …from catalog to context
Relationships among markers project
- Explicit representation of relationships among genome features
- Interaction explorer
Project initially focused on microRNAs
- microRNA cluster membership
- Predicted and validated microRNA targets
Curation of interaction data from the literature (Gene Ontology) and from specialized external informatics resources

21 Mouse_CCO is an application ontology built on experimental evidence-based annotations.
The data drives the structure, allowing a user to ‘discover’ connections. This diagram illustrates the generic template for the ontology.
[Diagram: generic template. Node types (sources): Gene_mouse (MGI); Protein_mouse (PRO, UniProtKB); Allele_mouse; Genotype_mouse; Gene_human (NCBI); Phenotype_mouse; Disease_human (OMIM, DO); CCO_human (BioPortal); Pathway_mouse (MouseCyc); Process (GO BP); Component (GO CC); Function (GO MF); Anatomy_mouse (GXD, EMAP); Cell_type (CL). Relations: encodes, has_variant, part_of, orthologous_to, associated_with, described_by, participates_in, located_in, expressed_in, effects]
Mary Dolan

22 Mouse_CCO is populated using 1017 mouse genes annotated to GO ‘cell cycle’, along with all their annotations from MGI and several additional data resources.
Here we show how the generic template is populated for Brca1.
[Diagram: template populated for Brca1. Nodes: Mouse gene: Brca1 (breast cancer 1); Gene_human: BRCA1; Allele: Brca1<tm1Thl>; Genotype: Brca1<tm1Thl>/Brca1<tm1Thl> Wap<tm1(cre)Arge>/0, 129S1/Sv * C57BL/6J; id: CCO:B, name: BRCA1_HUMAN; Protein_mouse: VEGA model OTTMUSP; Process: DNA repair; Function: damaged DNA binding; Component: BRCA1-BARD1 complex; mammary adenocarcinoma; OMIM: Breast Cancer; TS28: mammary gland. Relations: orthologous_to, has_variant, part_of, described_by, encodes, participates_in, associated_with, located_in, expressed_in, effects]
Mary Dolan
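The idea that the data drives the structure and lets a user "discover" connections can be sketched as a walk over subject-relation-object triples. This is a rough illustration under stated assumptions: the transcript lists the node and relation labels but not which edge joins which nodes, so the endpoints below are one plausible reading of the diagram, only a subset of the labels is used, and the node strings and the `reachable` helper are hypothetical.

```python
from collections import defaultdict, deque

# Assumed edges for the Brca1 example; endpoints are a reading of the
# flattened diagram, not a confirmed part of Mouse_CCO.
triples = [
    ("gene:Brca1", "orthologous_to", "gene_human:BRCA1"),
    ("gene:Brca1", "has_variant", "allele:Brca1<tm1Thl>"),
    ("allele:Brca1<tm1Thl>", "part_of", "genotype:Brca1<tm1Thl>/Brca1<tm1Thl>"),
    ("genotype:Brca1<tm1Thl>/Brca1<tm1Thl>", "associated_with",
     "phenotype:mammary adenocarcinoma"),
    ("genotype:Brca1<tm1Thl>/Brca1<tm1Thl>", "associated_with",
     "disease:OMIM Breast Cancer"),
    ("gene:Brca1", "encodes", "protein:Brca1"),
    ("protein:Brca1", "participates_in", "process:DNA repair"),
    ("protein:Brca1", "located_in", "component:BRCA1-BARD1 complex"),
    ("gene:Brca1", "expressed_in", "anatomy:TS28 mammary gland"),
]

# Adjacency list: subject -> [(relation, object), ...]
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))


def reachable(start: str) -> dict[str, list[str]]:
    """Breadth-first walk; maps each reachable node to its relation path."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, nxt in graph[node]:
            if nxt not in paths:
                paths[nxt] = paths[node] + [relation]
                queue.append(nxt)
    return paths


for node, path in reachable("gene:Brca1").items():
    if path:
        print(" -> ".join(path), "=>", node)
```

Starting from the gene, the walk reaches the human disease through the allele and genotype, which mirrors the slide's point that disease connections are made at the genotype level.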

23 Keys to Interoperability: self-help mantras
Start where you are: from silos to networks
Identify shared interests: educated self promotion
Develop shared processes/applications
Discuss the ideal, implement the practical

24 Acknowledgements
Gene Ontology: Mike Cherry, Suzi Lewis, Paul Sternberg, Paul Thomas
Funding: NIH_NHGRI
Mouse Genome Informatics: Carol Bult, Janan Eppig, Jim Kadin, Joel Richardson, Martin Ringwald
MGI-GO-PRO team: Karen Christie, Mary Dolan, Harold Drabkin, David Hill, Li Ni, Dmitry Sitnikov

