The National Center for Biomedical Ontology Stanford – Berkeley Mayo – Victoria – Buffalo UCSF – Oregon – Cambridge
Ontologies are essential to make sense of biomedical data
A biological ontology is: a machine-interpretable representation of some aspect of biological reality. What kinds of things exist? What are the relationships between these things? (Figure: eye, ommatidium, sense organ, eye disc; relations is_a, part_of, develops_from.)
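A machine-interpretable representation of this kind can be sketched in Python as a set of typed edges. The slide names the terms and the relation types only; which term is linked to which, and the helper names `build_index` and `ancestors`, are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical edges: the terms and relation names come from the slide,
# but the exact pairing of terms is assumed here for illustration.
EDGES = [
    ("eye", "is_a", "sense organ"),
    ("ommatidium", "part_of", "eye"),
    ("eye", "develops_from", "eye disc"),
]

def build_index(edges):
    """Index edges as subject -> relation -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in edges:
        index[subject][relation].add(obj)
    return index

def ancestors(index, term, relations=("is_a", "part_of")):
    """Follow the given relations transitively, starting from `term`."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation in relations:
            for parent in index[current][relation]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return seen

index = build_index(EDGES)
print(sorted(ancestors(index, "ommatidium")))
```

Under these assumed edges, a query for "ommatidium" reaches "eye" through part_of and then "sense organ" through is_a, which is the kind of inference a flat term list cannot support.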
The Foundational Model of Anatomy
Knowledge workers seem trapped in a pre-industrial age. Most ontologies are of relatively small scale and built by small groups working arduously in isolation. Success rests heavily on the particular talents of individual artisans, rather than on SOPs and best practices. There are few technologies available to make this process “faster, better, cheaper”.
A Portion of the OBO Library
National Center for Biomedical Ontology: Open Biomedical Ontologies (OBO); Open Biomedical Data (OBD); BioPortal. Capture and index experimental results; revise biomedical understanding; relate experimental data to results from other sources.
Stanford: tools for ontology alignment, indexing, and management (Cores 1, 4–7: Mark Musen). Lawrence Berkeley National Laboratory: tools to use ontologies for data annotation (Cores 2, 5–7: Suzanna Lewis). Mayo Clinic: tools for access to large controlled terminologies (Core 1: Chris Chute). Victoria: tools for ontology and data visualization (Cores 1 and 2: Margaret-Anne Storey). University at Buffalo: dissemination of best practices for ontology engineering (Core 6: Barry Smith).
cBio Driving Biological Projects. Trial Bank: UCSF, Ida Sim. FlyBase: Cambridge, Michael Ashburner. ZFIN: Oregon, Monte Westerfield.
The National Center for Biomedical Ontology Core 3: Driving Biological Projects Monte Westerfield
Animal disease models. Animal models: mutant gene → mutant or missing protein → mutant phenotype.
Animal disease models. Humans: mutant gene → mutant or missing protein → mutant phenotype (disease). Animal models: mutant gene → mutant or missing protein → mutant phenotype (disease model).
SHH -/+ SHH -/- shh -/+ shh -/-
Phenotype (clinical sign) = entity + attribute. P1 = eye + hypoteloric; P2 = midface + hypoplastic; P3 = kidney + hypertrophied. Entities from ZFIN: eye, midface, kidney. Attributes from PATO: hypoteloric, hypoplastic, hypertrophied.
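The entity + attribute composition can be sketched as a small Python data structure that checks each half against its source vocabulary. The term sets below contain only the three examples from the slide, and the class name `Phenotype` and its validation behavior are assumptions for illustration, not the Center's actual tooling.

```python
from dataclasses import dataclass

# Term lists copied from the slide; a real tool would load full ontologies.
ZFIN_ENTITIES = {"eye", "midface", "kidney"}
PATO_ATTRIBUTES = {"hypoteloric", "hypoplastic", "hypertrophied"}

@dataclass(frozen=True)
class Phenotype:
    entity: str      # what is affected (anatomy term)
    attribute: str   # how it is affected (PATO term)

    def __post_init__(self):
        # Reject terms that are not in the controlled vocabularies.
        if self.entity not in ZFIN_ENTITIES:
            raise ValueError(f"unknown entity term: {self.entity!r}")
        if self.attribute not in PATO_ATTRIBUTES:
            raise ValueError(f"unknown attribute term: {self.attribute!r}")

    def __str__(self):
        return f"{self.entity} + {self.attribute}"

p1 = Phenotype("eye", "hypoteloric")
p2 = Phenotype("midface", "hypoplastic")
p3 = Phenotype("kidney", "hypertrophied")
print(p1, p2, p3, sep="; ")
```

Keeping the two halves as separate controlled terms, rather than one free-text string, is what lets the same attribute be reused across any entity.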
Phenotype (clinical sign) = entity + attribute. Entity: anatomy ontology; cell & tissue ontology; developmental ontology; Gene Ontology (biological process, molecular function, cellular component). Attribute: PATO (Phenotype and Trait Ontology).
Phenotype (clinical sign) = entity + attribute. P1 = eye + hypoteloric; P2 = midface + hypoplastic; P3 = kidney + hypertrophied. Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly.
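Treating a syndrome as a set of phenotypes makes comparison a matter of set operations. The following is a minimal sketch: the `overlap` function and its scoring rule are assumptions chosen for simplicity, and the observations are invented placeholders.

```python
# Each phenotype is an (entity, attribute) pair, as on the slide.
HOLOPROSENCEPHALY = frozenset({
    ("eye", "hypoteloric"),
    ("midface", "hypoplastic"),
    ("kidney", "hypertrophied"),
})

def overlap(observed, syndrome):
    """Fraction of the syndrome's phenotypes present in the observations."""
    observed = set(observed)
    return len(observed & syndrome) / len(syndrome)

# Hypothetical observations for one individual.
observed = [("eye", "hypoteloric"), ("midface", "hypoplastic")]
score = overlap(observed, HOLOPROSENCEPHALY)
print(f"{score:.2f}")
```

The same comparison could be run between a human syndrome and the phenotype set of an animal mutant, provided both are annotated with shared terms.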
Human holoprosencephaly; zebrafish shh; zebrafish oep
OMIM genes ZFIN mutant genes FlyBase mutant genes
OMIM gene | ZFIN gene | FlyBase gene | FlyBase mut pub, ZFIN mut pub, mouse, rat, SNOMED | OMIM disease
LAMB1 | lamb1 | LanB | |
FECH | fech | Ferrochelatase | 25229 | Protoporphyria, Erythropoietic
GLI2 | gli2a | ci | |
SLC4A1 | slc4a1 | CG | | Renal Tubular Acidosis, RTADR
MYO7A | myo7a | ck | | Deafness; DFNB2; DFNA11
ALAS2 | alas2 | Alas | 1714 | Anemia, Sideroblastic, X-Linked
KCNH2 | kcnh2 | sei | |
MYH6 | myh6 | Mhc | | Cardiomyopathy, Familial Hypertrophic; CMH
TP53 | tp53 | p | | Breast Cancer
ATP2A1 | atp2a1 | Ca-P60A | 326111 | Brody Myopathy
EYA1 | eya1 | eya | 251546 | Branchiootorenal Dysplasia
SOX10 | sox10 | Sox100B | 11744 | Waardenburg-Shah Syndrome
The National Center for Biomedical Ontology Core 2: Bioinformatics Suzanna Lewis
cBio Bioinformatics Goals. 1. Apply ontologies: software toolkit for annotation. 2. Manage data: databases and interfaces to store and view annotations. 3. Investigate and compare: linking human diseases to genetic models. 4. Maintain: ongoing reconciliation of ontologies with annotations.
Elicitation of Requirements for Annotation Tools. Applications pull from pioneer users in Core 3: ZFIN, FlyBase, Trial Bank. Study how these groups currently annotate data. Determine how our Core 2 tools can integrate with existing data flows and databases. Evaluate the commonalities and differences among approaches.
Development of Data-Annotation Tool. Develop plug-in architecture: default user interface for generic data-annotation tasks; custom-tailored interfaces for particular biomedical domains. Enable interoperability with existing ontology-management platforms. Integrate ontology-annotation tool with BioPortal: access ontologies for data annotation from OBO; store data annotations in OBD.
Phenotype as an observation: context (environment, genetic); the class of thing observed (ontology); evidence (publication, figures, assay, sequence ID).
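One way to read this slide is as a record with three parts: what was observed, in what context, and on what evidence. The sketch below assumes that grouping; every class name, field name, and value is hypothetical and does not describe the OBD schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Context:
    environment: Optional[str] = None   # e.g. a term describing the environment
    genotype: Optional[str] = None      # the genetic context of the observation

@dataclass
class Evidence:
    publication: Optional[str] = None
    figures: List[str] = field(default_factory=list)
    assay: Optional[str] = None
    sequence_id: Optional[str] = None

@dataclass
class PhenotypeObservation:
    observed_class: str                 # ontology term for the class of thing observed
    context: Context = field(default_factory=Context)
    evidence: Evidence = field(default_factory=Evidence)

# All values below are placeholders, not real annotations.
obs = PhenotypeObservation(
    observed_class="eye",
    context=Context(environment="standard conditions", genotype="shh -/-"),
    evidence=Evidence(publication="placeholder citation", figures=["Fig. 1"]),
)
print(obs.context.genotype, obs.evidence.figures)
```

Separating context and evidence from the observed class means the same ontology term can be annotated many times without losing where each observation came from.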
Phenotype from published evidence
Ontologies enable users to describe assays
Ontologies enable users to describe environments
Ontologies enable users to describe genotypes
Phenotypes as collections. Coincidence: same organism, same time. Relative: reduced, enhanced. Same focus of observation: all left hands. Differing levels of scale: molecular, cellular, organismal. Recognizable patterns: set of observations that describe a disease.
The National Center for Biomedical Ontology Core 1: Computer Science Mark Musen
E-science needs technologies: to help build and extend ontologies; to locate ontologies and to relate them to one another; to visualize relationships and to aid understanding; to facilitate evaluation and annotation of ontologies.
We need to relate ontologies to one another. We keep reinventing the wheel. We don’t even know what’s out there! We need to make comparisons between ontologies automatically. We need to keep track of ontology history and to compare versions.
We need to compute both similarities and differences. Similarities: merging ontologies, mapping ontologies. Differences: versioning.
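Both computations can be illustrated with a minimal Python sketch over ontologies reduced to `{term_id: label}` dictionaries. This is a deliberately naive stand-in, not the Center's alignment or versioning tools: the function names, the label-matching heuristic, and all identifiers are assumptions.

```python
def diff_versions(old, new):
    """Differences: compare two versions given as {term_id: label} dicts."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    relabeled = sorted(t for t in set(old) & set(new) if old[t] != new[t])
    return {"added": added, "removed": removed, "relabeled": relabeled}

def map_by_label(source, target):
    """Similarities: propose mappings between terms whose labels match
    after case and whitespace normalization (a naive heuristic)."""
    def norm(label):
        return " ".join(label.lower().split())

    by_label = {}
    for term_id, label in target.items():
        by_label.setdefault(norm(label), []).append(term_id)
    return {
        term_id: sorted(by_label[norm(label)])
        for term_id, label in source.items()
        if norm(label) in by_label
    }

# Hypothetical term identifiers and labels.
v1 = {"A:1": "eye", "A:2": "kidney", "A:3": "mid face"}
v2 = {"A:1": "eye", "A:3": "midface", "A:4": "sense organ"}
other = {"B:9": "Eye", "B:7": "Sense  Organ"}

print(diff_versions(v1, v2))
print(map_by_label(v2, other))
```

Real ontologies also differ in their relationships, not only their labels, so a production tool would compare structure as well.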
Ontology engineering requires management of complexity. How can we keep track of hundreds of relationships? Understand the implications of changes to a large ontology? Know where ontologies are underspecified, and where they are overconstrained?
Core 1 Components
Core 1 Contributors. Stanford: tools for ontology management, alignment, versioning, metadata management, automated critiquing, and peer review. Mayo: LexGrid technology for access to large controlled terminologies, ontology indexing, Soundex, search. Victoria: technology for ontology visualization.
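Soundex search, mentioned above, matches terms by how they sound rather than how they are spelled. The sketch below implements the standard American Soundex code and a toy index over term labels; it is an illustration of the general technique, not LexGrid's implementation, and the `build_index` helper and the sample labels are assumptions.

```python
# Standard American Soundex digit groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(word):
    """Return the four-character Soundex code of `word`."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(letter, "")  # vowels and Y carry no code
        if code and code != previous:
            result += code
        previous = code
    return (result + "000")[:4]

def build_index(labels):
    """Group labels by Soundex code for approximate lookup."""
    index = {}
    for label in labels:
        index.setdefault(soundex(label), []).append(label)
    return index

index = build_index(["eye", "kidney", "midface"])
print(index.get(soundex("kidny"), []))
```

A misspelled query such as "kidny" shares its code with "kidney", so the lookup still finds the intended term.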
Core 4: Infrastructure. Builds on existing IT infrastructure at Stanford and at our collaborating institutions. Adds online resources and technical support for the user community, and collaboration tools to link all participating sites.
Core 5: Education and Training. Builds on existing, strong informatics training programs at Stanford, Berkeley, UCSF, Mayo/Minnesota, and Buffalo. New postdoctoral positions at Stanford, Berkeley, and Buffalo. New visiting scholars program.
Core 6: Dissemination. Active relationships with relevant professional societies and agencies (e.g., HL7, IEEE, WHO, NIH). Internet-based resources for discussing, critiquing, and annotating ontologies in OBO. Cooperation with other NCBCs to offer a library of open-source software tools. Training workshops to aid biomedical scientists in ontology development.
Upcoming cBio Dissemination Workshops. Image Ontology Workshop: Stanford, CA, March 24–25, 2006. Training in Biomedical Ontology: Schloss Dagstuhl, May 21–24, 2006. Training in Biomedical Ontology: Baltimore, November 6–8, 2006 (in association with FOIS and AMIA conferences).
Core 7: Administration. Project management shared between Stanford and Berkeley. Executive committee (PI, co-PI, Center director, and Center associate director) provides day-to-day management and oversight. Council (all site PIs, including PIs of DBPs) provides guidance and coordination of work plans. Each Core has a designated “lead” selected from the Council.
cBiO Organization Chart
Ontologies are essential to make sense of biomedical data