Alignment of Ontologies for Biological Research Judith A. Blake, Ph.D. Bioinformatics and Computational Biology The Jackson Laboratory.

Presentation transcript:

Alignment of Ontologies for Biological Research Judith A. Blake, Ph.D. Bioinformatics and Computational Biology The Jackson Laboratory

Dagstuhl
What is my perspective?
- Biological data is voluminous and complex
- Data integration is hard work
- Bio-ontologies provide semantic structure and standards that aid in data analysis and hypothesis generation
- There are many challenges to the effective use of bio-ontologies (in addition to challenges to the development of ontologies)

What is my approach?
- Goal is to facilitate 'translational research' through effective integration of experimental data from mouse models of human conditions with human clinical data from disease studies
- Bio-ontologies provide a mechanism to support comprehensive data integration and analysis

Interesting…
- Refine Relations Ontology (RO)
- Identify critical datasets
- Focus on bottlenecks
- Create views

Overview of Mouse Genome Informatics
- Phenotype: mutant allele definitions; QTL; strain characteristics; phenotype vocabularies; disease models (human); comparative phenotypes
- Genes & Gene Products: nomenclature; gene characterization; transcripts, proteins, gene products; functional annotation; orthologs & paralogs
- Sequences & Maps: sequence representation; C57BL/6J genomic sequence; SNPs and strain variants; adding biological context to computational gene models
- Gene Expression: mouse anatomy; time, tissue, level of expression; range of assays & results; emphasis on embryonic stages
- Tumor Biology: tumor classifications & descriptions; strain incidence; histopathology images; tumor genetics

Data acquisition is constant
Load program: summary of data loaded
- Mouse EntrezGene: EntrezGene IDs for mouse markers, plus marker-to-sequence associations from EntrezGene not already in MGD
- Human/Rat EntrezGene: Nomenclature, map position and other data regarding human and rat genes. OMIM associations for human.
- GenBank Seq: Mouse sequence records from GenBank
- RefSeq Seq: Mouse sequence records from RefSeq
- UniProt/TrEMBL Seq: Mouse sequence records from UniProt and TrEMBL
- TIGR/DoTS/NIA Seq: Mouse consensus sequence records from TIGR/DoTS/NIA clusters
- TIGR/DoTS/NIA Association: Associations between TIGR/DoTS/NIA cluster sequences and markers
- Ensembl Gene Model: Ensembl gene model sequences, coordinates, & associations between these & markers
- NCBI Gene Model: NCBI gene model sequences, coordinates, & associations between these & markers
- UniProt Association: UniProt/TrEMBL IDs and additional GenBank IDs for mouse markers, plus GO and InterPro annotations
- UniGene Association: UniGene cluster IDs for mouse markers
- EST cDNA Clone: Mouse IMAGE, NIA, MGC, Riken, cDNAs and EST sequence associations
- MGC Association: MGC IDs and associations between MGC full length sequences and MGC cDNAs
- RPCI Clone: RPCI 23/24 BAC clones and sequence associations
- GO Vocabulary: Updated Gene Ontology (GO) vocabularies from the central GO site
- OMIM Vocabulary: Updated OMIM disease terms
- MP Vocabulary: Updated MP vocabulary (from OBO-Edit)
- Anatomy: Updated adult mouse anatomy ontology (from OBO-Edit)
- Mapping panel: JAX, EUCIB, Copeland-Jenkins and many others
- PIRSF: Mouse PIR superfamily terms and associations to markers
- SNPs: Mouse SNPs from dbSNP and associations between SNPs & markers

Snapshot of MGI data content
MGI data statistics, March 2007
- Number of genes with sequence data: 28,292
- Number of genes (incl. unmapped mutants): 35,733
- Number of markers (including genes): 69,639
- Number of markers mapped: 65,345
- Number of genes with protein sequence information: 24,293
- Number of genes with GO annotations: 17,664
- Number of mouse/human orthologies: 16,127
- Number of mouse/rat orthologies: 15,802
- Number of genes with one or more phenotypic alleles: 6,979
- Number of cataloged phenotypic alleles: 17,494
- Number of references: 113,508
- Number of integrated mouse nucleotide sequences (+ ESTs): 8,3574,701

Build 36: Ensembl and NCBI Unification (Exon Overlap Detection)
[Figure: gene models grouped as unique to Ensembl, unique to NCBI, or equivalent; equivalent models are split into 1:1, 1:n, n:1 and n:m correspondences]
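The slide names exon overlap detection as the basis for unifying Ensembl and NCBI gene models but does not show the procedure. The following is a minimal sketch of one way such a comparison could be done, not MGI's actual pipeline. The function names, the exon tuple layout `(chromosome, strand, start, end)`, and the reading of the ratio as Ensembl:NCBI are all assumptions made for illustration.

```python
from collections import defaultdict


def exons_overlap(a, b):
    """Return True if two exons (chrom, strand, start, end) share a base."""
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def label(n_ensembl, n_ncbi):
    """Name a group of linked models by how many come from each source."""
    if n_ncbi == 0:
        return "unique to Ensembl"
    if n_ensembl == 0:
        return "unique to NCBI"
    left = "1" if n_ensembl == 1 else "n"
    right = "1" if n_ncbi == 1 else ("m" if n_ensembl > 1 else "n")
    return f"{left}:{right}"  # assumed order: Ensembl:NCBI


def classify_models(ensembl, ncbi):
    """Group gene models connected by exon overlap and label each group.

    ensembl, ncbi: dict mapping a model ID to a list of exon tuples.
    Returns a list of (label, sorted members) pairs.
    """
    # Link every Ensembl model to every NCBI model it overlaps.
    links = defaultdict(set)
    for e_id, e_exons in ensembl.items():
        for n_id, n_exons in ncbi.items():
            if any(exons_overlap(x, y) for x in e_exons for y in n_exons):
                links[("E", e_id)].add(("N", n_id))
                links[("N", n_id)].add(("E", e_id))

    # Collect connected components with a depth-first search.
    nodes = [("E", e) for e in ensembl] + [("N", n) for n in ncbi]
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(links[node] - component)
        seen |= component
        n_ensembl = sum(1 for source, _ in component if source == "E")
        groups.append((label(n_ensembl, len(component) - n_ensembl),
                       sorted(component)))
    return groups
```

Treating the models as a graph lets a single pass produce both the unique sets and the 1:1, 1:n, n:1 and n:m groups. A real comparison would likely also require a minimum overlap length, which this sketch omits.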

Who is the authority?
Mouse data for which MGI serves as the authoritative source (data type: working relationship)
- Gene Symbol/Name: MGI makes primary assignment; coordination with HGNC, RGNC
- Allele Symbol/Name: MGI makes primary assignment
- Strain Designations: MGI makes primary assignment
- Gene-to-nucleotide sequence association: Co-curation with NCBI
- Gene-to-protein sequence association: Co-curation with UniProt
- Gene Ontology (GO) annotations: MGI provides primary curation
- Gene homology data between mouse and other species: MGI curates orthology relationships
- Mammalian Phenotype Ontology: MGI develops vocabulary
- Genotype-to-phenotype data: MGI provides primary curation
- Mouse model-to-human disease (OMIM): MGI provides primary curation

Having the data, we want to ask complex questions

Multiple Controlled Vocabularies in MGI
- Gene Nomenclature
- Gene/Marker Type
- Allele Type
- Developmental and Adult Anatomies
- Assay Type: Expression, Mapping
- Molecular Mutation
- Inheritance Mode
- Gene Ontology
- Mammalian Phenotype Ontology
- Tissue Types
- Cell Types
- Cell Lines
- Units: Cytogenetic, Molecular
- ES Cell Line
- Strain Nomenclature

Vocabularies in MGI: GO Example
[Diagram: a vocabulary of terms (DAGs, definitions, synonyms; e.g. GO:54321; transcription factor, DNA binding, protein binding, ligand binding or carrier) is linked through annotations (J:65378 TAS, J:62648 IDA, J:60000 IEA) to genes (Ahr, Edr2; name, synonyms, MGI:105043)]
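The structure on this slide (vocabulary terms arranged in a DAG, genes, and annotations that join the two with a reference and an evidence code) can be sketched as a small data model. This is an illustrative sketch only, not MGI's schema. The class and function names are hypothetical, the parent/child arrangement of the terms is assumed, and the pairing of each reference with a particular gene and term is invented for the example, since the slide does not show which goes with which.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str       # gene symbol, e.g. "Ahr"
    term: str       # vocabulary term name
    reference: str  # MGI reference ID (J number)
    evidence: str   # evidence code, e.g. "IDA"


# Assumed fragment of the DAG: child term -> set of parent terms.
PARENTS = {
    "transcription factor": {"DNA binding"},
    "DNA binding": set(),
    "protein binding": set(),
}

# Illustrative pairings of the genes, references and codes on the slide.
ANNOTATIONS = [
    Annotation("Ahr", "transcription factor", "J:65378", "TAS"),
    Annotation("Ahr", "protein binding", "J:62648", "IDA"),
    Annotation("Edr2", "DNA binding", "J:60000", "IEA"),
]


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def genes_annotated_to(term, annotations, parents):
    """Genes annotated to `term` directly or to any of its descendants."""
    return sorted({
        a.gene for a in annotations
        if a.term == term or term in ancestors(a.term, parents)
    })
```

Because the vocabulary is a DAG, a query on a general term can also return genes annotated to its more specific descendants, which is what `genes_annotated_to` does by walking up from each annotated term.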

Mammalian Phenotype Ontology
- Compositional terms
- 'Working' ontology
- Projected xref to 'core' ontologies: Anatomy, GO
- Built with attention to ontological principles but with primary goal of supporting annotation of diverse experimental results from many research groups and perspectives

We are exploring ontological representations that relate human clinical data with mouse phenotypes
- Create compositional view for annotation of mouse models and human clinical data
- Provide xref / RO back to core ontologies
- Support both annotation and ontology alignment efforts
- Develop tools to support complex queries

We modeled gangliosidoses as a test case. Two types of gangliosidoses are Sandhoff and Tay-Sachs diseases.

Curators use controlled terms from structured vocabularies (ontologies) to curate complex biological systems described in the literature.
The knowledge is in the details.

The knowledge is in the details

Including the relationship to human disease

More mouse models: Tay-Sachs

Different core ontologies need to be combined to describe complex biological systems
- Chemical Ontology: Dopamine (CHEBI:18243)
- Cell Type Ontology: Dopaminergic Neuron (CL:)
- Biological Process: Synaptic transmission (GO:)
- Anatomical Dictionary: Brain (MA:)

Dilemma: No formal links currently exist between the separate ontologies
Solution?
1. Generate cross-products (compositional terms) as necessary for annotations of characteristics of disease cases and disease models
2. Annotate specific instances of human cases and mouse models
3. Visualize and mine co-annotated data
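The three steps above can be sketched in a few lines: build compositional terms from core-ontology terms, attach them to individual cases and models, then look for annotations the two share. This is a minimal sketch under stated assumptions, not the representation used by the authors. The class names, the quality labels, and the two annotated subjects are hypothetical, and term identifiers are left out because the slides do not give them in full.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoreTerm:
    ontology: str  # name of the core ontology the term comes from
    name: str      # term label


@dataclass(frozen=True)
class CompositionalTerm:
    """A cross-product: a quality applied to an entity, optionally located."""
    quality: str
    entity: CoreTerm
    location: Optional[CoreTerm] = None


# Step 1: generate cross-products from core-ontology terms.
neuron = CoreTerm("Cell Type Ontology", "dopaminergic neuron")
brain = CoreTerm("Anatomical Dictionary", "brain")
transmission = CoreTerm("Biological Process", "synaptic transmission")

abnormal_neuron = CompositionalTerm("abnormal morphology", neuron, brain)
abnormal_transmission = CompositionalTerm("abnormal", transmission, brain)

# Step 2: annotate specific instances (both subjects are hypothetical).
ANNOTATED = {
    "human case": {abnormal_neuron, abnormal_transmission},
    "mouse model": {abnormal_neuron},
}


# Step 3: mine the co-annotated data.
def shared_terms(annotated, first, second):
    """Compositional terms that two subjects have in common."""
    return annotated[first] & annotated[second]


def subjects_with(annotated, term):
    """Subjects annotated with a given compositional term."""
    return sorted(s for s, terms in annotated.items() if term in terms)
```

Making the terms frozen dataclasses keeps them hashable, so co-annotation reduces to set intersection. A fuller version would also match terms through the relations between core ontologies rather than by exact equality.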

Abnormal neuron morphology

Next Steps
Perspective (views): Lung Cancer
- Provide Disease Ontology
- Build compositional view
Mouse Data
- Curate comprehensive annotations for genes implicated in lung phenotypes
Human Data
- Curate clinical data for ontology annotation
Data Analysis
- Use ontological structures to facilitate data exploration and hypothesis generation

Next conference?
“Enabling technologies for ontological access to clinical and animal model data”
A hands-on problem-solving workshop organized around a problem use case

Mouse Genome Informatics (Bar Harbor, Maine, USA): MGI projects are supported by NIH [NHGRI, NICH, and NCI].
Gene Ontology: GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme.