What is an ontology and why should you care?
Barry Smith, with thanks to Jane Lomax, Gene Ontology Consortium 1
You’re interested in which genes control heart muscle development: 17,536 results 2
Microarray data show changed expression of thousands of genes. How will you spot the patterns?
[Figure: microarray results; axis labels: attacked, time, control; cluster labels: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism] 3
Ontologies provide a way to capture and represent all this knowledge in a computable form 4
Uses of ‘ontology’ in PubMed abstracts 5
6 By far the most successful: The Gene Ontology
Definitions 8
Gene products involved in cardiac muscle development in humans 9
Term Search Results 10
Hierarchical view representing relations between represented types 11
How GO can be used to help analyse microarray data:
Treat samples → Collect mRNA → Label → Hybridize → Scan → Normalize → Select differentially regulated genes → Understand the biological phenomena involved 12
Traditional analysis operates via literature search for each successive gene:
Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …
Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …
Gene 100: Positive control of cell proliferation, Mitosis, Oncogenesis, Glucose transport, … 13
But with GO annotations, this work has already been done: GO: : apoptosis 14
GO allows grouping by process:
Apoptosis: Gene 1, Gene 53
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …
Positive control of cell proliferation: Gene 7, Gene 3, Gene 12, …
Growth: Gene 5, Gene 2, Gene 6, …
Glucose transport: Gene 7, Gene 3, Gene 6, …
This allows us to ask meaningful questions of microarray data, e.g. which genes are involved in the same process, with the same or different expression patterns? 15
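The regrouping from a per-gene view to a per-process view can be sketched in a few lines of Python. This is only an illustration: the `annotations` dictionary is toy data typed in from the literature-search slide, and `group_by_process` is a hypothetical helper name, not part of any GO tool. Real annotations would be read from a GO annotation file.

```python
from collections import defaultdict

# Toy gene -> process annotations (assumption: typed in by hand from the
# literature-search slide; real data would come from GO annotation files).
annotations = {
    "Gene 1": ["Apoptosis", "Cell-cell signaling", "Protein phosphorylation", "Mitosis"],
    "Gene 2": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 3": ["Growth control", "Mitosis", "Oncogenesis", "Protein phosphorylation"],
    "Gene 4": ["Nervous system", "Pregnancy", "Oncogenesis", "Mitosis"],
}


def group_by_process(gene_to_terms):
    """Invert a gene -> terms mapping into a term -> genes mapping."""
    groups = defaultdict(list)
    for gene, terms in gene_to_terms.items():
        for term in terms:
            groups[term].append(gene)
    return dict(groups)


by_process = group_by_process(annotations)
print(by_process["Mitosis"])    # ['Gene 1', 'Gene 2', 'Gene 3', 'Gene 4']
print(by_process["Apoptosis"])  # ['Gene 1']
```

Once genes are grouped this way, each group can be compared against the expression data, for example to see whether the genes sharing a process move up or down together.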
How does the Gene Ontology work? 16
1. It provides a controlled vocabulary contributing to the cumulativity of scientific results achieved by distinct research communities (if we all use kilograms, meters, seconds …, our results are calibrated) 17
18 2. It provides a tool for algorithmic reasoning
Hierarchical view representing relations between represented types 19
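One way to picture algorithmic reasoning over such a hierarchy: a gene annotated to a specific type is implicitly annotated to every more general type above it. The sketch below assumes a toy set of is_a edges. The term names, the edges, and the genes are illustrative and are not copied from the actual GO, and `ancestors` and `genes_for` are hypothetical helper names.

```python
# Illustrative is_a edges (child -> parents). Assumption: invented for this
# sketch, NOT taken from the real Gene Ontology.
IS_A = {
    "cardiac muscle development": ["striated muscle development"],
    "striated muscle development": ["muscle development"],
    "smooth muscle development": ["muscle development"],
    "muscle development": ["developmental process"],
    "developmental process": ["biological process"],
}

# Toy direct annotations (gene -> most specific terms); also invented.
DIRECT = {
    "Gene A": {"cardiac muscle development"},
    "Gene B": {"smooth muscle development"},
    "Gene C": {"developmental process"},
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for(term, direct=DIRECT):
    """Genes annotated to `term` directly or to any of its descendants."""
    hits = []
    for gene, terms in direct.items():
        if any(t == term or term in ancestors(t) for t in terms):
            hits.append(gene)
    return sorted(hits)


print(genes_for("muscle development"))  # ['Gene A', 'Gene B']
```

A query for the general term returns genes annotated only to its more specific subtypes, which a plain keyword search over the direct annotations would miss.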
The massive quantities of annotations to gene products in terms of the GO allow a new kind of research 20
Uses of GO in studies of:
– pathways associated with heart failure development correlated with cardiac remodeling (PMID )
– sex-specific pathways in early cardiac response to pressure overload in mice (PMID )
– molecular signature of cardiomyocyte clusters derived from human embryonic stem cells (PMID )
– contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism (PMID )
– immune system involvement in abdominal aortic aneurysms in humans (PMID )
– … 21
But GO covers only three sorts of biological entities:
– cellular components
– molecular functions
– biological processes
and does not provide representations of disease-related phenomena 22
23 How can we extend the GO to
– help integrate complex representations of reality
– help human beings find things in complex representations of reality
– help computers reason with complex representations of reality
in other areas of biomedicine?
24 The Open Biomedical Ontologies (OBO) Foundry
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT / DEPENDENT; OCCURRENT) by GRANULARITY
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
25 Initial OBO Foundry coverage
Axes: RELATION TO TIME (CONTINUANT: INDEPENDENT / DEPENDENT; OCCURRENT) by GRANULARITY
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)
26 CRITERIA
– openness
– common formal language
– collaborative development
– evidence-based maintenance
– identifiers
– versioning
– textual and formal definitions
27 CRITERIA
– COMMON ARCHITECTURE: The ontology uses common formal relations
– ORTHOGONALITY: One ontology for each domain
28 LEADERSHIP
– Michael Ashburner, Suzanna Lewis, Chris Mungall (GO Consortium)
– Alan Ruttenberg (Science Commons, OWL Working Group, HCLS/Semantic Web)
– Richard Scheuermann (ImmPort, CTSA)
– Barry Smith
OBO Foundry provides
– tested guidelines enabling new groups to develop the ontologies they need in ways which counteract forking and dispersion of effort
– an incremental bottom-up approach to evidence-based terminology practices in medicine that is rooted in basic biology
– automatic web-based linkage between medical terminologies and biological knowledge resources 29