What is an ontology and Why should you care? Barry Smith 1.

1 What is an ontology and Why should you care? Barry Smith http://ontology.buffalo.edu/smith 1

2 What I do: Gene Ontology (NHGRI) (Scientific Advisor); National Center for Biomedical Ontology (NHGRI); Protein Ontology (NIGMS); Infectious Disease Ontology (NIAID); Biometrics Ontology (US Army); Ontology for Integration of Cross-Border Emergency Data (European Union)

3 Uses of ‘ontology’ in PubMed abstracts 3

4 4 By far the most successful: GO (Gene Ontology)

5 You’re interested in which genes control heart muscle development: 17,536 results

6 Microarray data shows changed expression of thousands of genes. How will you spot the patterns? [Figure labels: attacked, time, control; Puparial adhesion; Molting cycle; hemocyanin; Defense response; Immune response; Response to stimulus; Toll regulated genes; JAK-STAT regulated genes; Amino acid catabolism; Lipid metabolism; Peptidase activity; Protein catabolism]

7 7 You’re interested in which of your hospital’s patient data is relevant to understanding how genes control heart muscle development

8 Lab / pathology data; EHR data; Clinical trial data; Family history data; Medical imaging; Microarray data; Model organism data; Flow cytometry; Mass spec; Genotype / SNP data. How will you spot the patterns? How will you find the data you need?

9 How does the Gene Ontology work? 9 with thanks to Jane Lomax, Gene Ontology Consortium

10 1. GO provides a controlled system of representations for use in annotating data: multi-species, multi-disciplinary, open source; contributing to the cumulativity of scientific results achieved by distinct research communities (compare the use of kilograms, meters, seconds … in formulating experimental results)

11 11

12 Definitions 12

13 13 Gene products involved in cardiac muscle development in humans

14 14 http://wiki.geneontology.org/index.php/Priority_Cardiovascular_genes

15 Questions for annotation. Where is a particular gene product involved: in what type of cell or cell part? in what part of the normal body? in what anatomical abnormality? When is a particular gene product involved: in the course of normal development? in the process leading to abnormality? With what functions is the gene product associated in other biological processes?

16 16 2. GO provides a tool for algorithmic reasoning

17 17 Hierarchical view representing relations between represented types
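A minimal sketch of the kind of reasoning a hierarchy of types makes possible: an annotation to a specific term also counts for every more general term above it. The term names, gene names and is_a edges below are toy assumptions for illustration, not the actual GO graph.

```python
# Toy is_a edges (child -> parents). Illustrative only, not real GO content.
IS_A = {
    "cardiac muscle development": ["muscle development", "heart development"],
    "muscle development": ["developmental process"],
    "heart development": ["developmental process"],
    "developmental process": [],
}

# Hypothetical direct annotations: gene product -> terms it is annotated to.
ANNOTATIONS = {
    "geneA": ["cardiac muscle development"],
    "geneB": ["heart development"],
}


def ancestors(term, is_a=IS_A):
    """Return the term itself plus every term reachable by following is_a upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def genes_for(term, annotations=ANNOTATIONS, is_a=IS_A):
    """Gene products annotated to the term or to any of its subtypes."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(term in ancestors(t, is_a) for t in terms)
    )


print(genes_for("heart development"))
print(genes_for("muscle development"))
```

A query for the general term picks up gene products annotated only to its subtypes, which a flat term list cannot do.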

18 3. GO allows a new kind of clinical research, based on analysis of the massive quantities of annotations linking GO terms to gene products 18
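One common way to analyze large sets of annotations is an over-representation test: given a list of genes with changed expression, ask whether a term turns up more often than chance would predict. The slides do not name a specific method, so the hypergeometric tail below is an assumption, and the gene names and counts are invented.

```python
from math import comb


def term_counts(genes, annotations):
    """Count how many of the given genes are annotated to each term."""
    counts = {}
    for gene in genes:
        for term in annotations.get(gene, ()):
            counts[term] = counts.get(term, 0) + 1
    return counts


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of at least `hits` annotated genes in a random sample.

    population: number of genes in the background set
    annotated:  background genes annotated to the term
    sample:     size of the gene list under study
    hits:       genes in that list annotated to the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Invented data: 10 background genes, 3 of them annotated to "immune response".
ANNOTATIONS = {
    "gene0": ["immune response"],
    "gene1": ["immune response"],
    "gene2": ["immune response"],
    "gene3": ["lipid metabolism"],
    "gene4": ["lipid metabolism"],
}
changed = ["gene0", "gene1", "gene2"]

counts = term_counts(changed, ANNOTATIONS)
p = enrichment_p_value(population=10, annotated=3, sample=3,
                       hits=counts["immune response"])
print(counts, p)
```

In practice such tests are run over every term and corrected for multiple comparisons; that step is left out of this sketch.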

19 Uses of GO in studies of: pathways associated with heart failure development correlated with cardiac remodeling (PMID 18780759); molecular signature of cardiomyocyte clusters derived from human embryonic stem cells (PMID 18436862); contrast between cardiac left ventricle and diaphragm muscle in expression of genes involved in carbohydrate and lipid metabolism (PMID 18207466); immune system involvement in abdominal aortic aneurysms in humans (PMID 17634102)

20 GO is amazingly successful, but it covers only generic biological entities of three sorts: cellular components, molecular functions, biological processes; and it does not provide representations of disease-related phenomena

21 Extending the GO methodology to other domains of biology and of clinical and translational medicine 21

22 The Open Biomedical Ontologies (OBO) Foundry
Columns give RELATION TO TIME: CONTINUANT (split into INDEPENDENT and DEPENDENT) and OCCURRENT. Rows give GRANULARITY.
ORGAN AND ORGANISM: independent continuant: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO); dependent continuant: Organ Function (FMP, CPRO), Phenotypic Quality (PaTO); occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT: independent continuant: Cell (CL), Cellular Component (FMA, GO); dependent continuant: Cellular Function (GO)
MOLECULE: independent continuant: Molecule (ChEBI, SO, RnaO, PrO); dependent continuant: Molecular Function (GO); occurrent: Molecular Process (GO)

23 Foundational Model of Anatomy 23

24 An A is_a B: all instances of A are instances of B. What are types? What are instances? (Buckets, thresholds)

25 Definitions. Cell =Def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane. Anatomical structure =Def. a material anatomical entity which is generated by coordinated expression of the organism’s own genes. In general: An A =Def. a B which Cs.
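The schema "An A =Def. a B which Cs" can be held as data, so that the is_a parent of each term falls out of its definition. The sketch below uses the two definitions from this slide; the class and function names are hypothetical, and it assumes the definitions contain no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str         # the A being defined
    genus: str        # the parent type B
    differentia: str  # the "which Cs" clause


DEFINITIONS = {
    "cell": Definition(
        "cell",
        "anatomical structure",
        "consists of cytoplasm surrounded by a plasma membrane",
    ),
    "anatomical structure": Definition(
        "anatomical structure",
        "material anatomical entity",
        "is generated by coordinated expression of the organism's own genes",
    ),
}


def is_a_chain(term, definitions=DEFINITIONS):
    """Follow each definition's genus upward until an undefined term is reached."""
    chain = [term]
    while term in definitions:
        term = definitions[term].genus
        chain.append(term)
    return chain


def unfold(term, definitions=DEFINITIONS):
    """Expand a definition by substituting the definition of its genus."""
    if term not in definitions:
        return term
    d = definitions[term]
    return f"{unfold(d.genus, definitions)} which {d.differentia}"


print(is_a_chain("cell"))
print(unfold("cell"))
```

Because every definition names its parent, the hierarchy and the definitions cannot drift apart: changing a genus changes the is_a chain with it.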

26 [Figure: graph with part_of and is_a edges over the nodes: Pleural Cavity; Interlobar recess; Mesothelium of Pleura; Pleura (Wall of Sac); Visceral Pleura; Pleural Sac; Parietal Pleura; Anatomical Space; Organ Cavity; Serous Sac Cavity; Anatomical Structure; Organ; Serous Sac; Mediastinal Pleura; Tissue; Organ Part; Organ Subdivision; Organ Component; Organ Cavity Subdivision; Serous Sac Cavity Subdivision]

27 Heterotaxy =Def. the abnormal arrangement of organs or viscera across the left-right axis differing from “complete situs solitus” and “complete situs inversus”. Left isomerism =Def. a subset of heterotaxy where some paired structures on opposite sides of the left-right axis of the body are symmetrical mirror images of each other, and have the morphology of the normal left-sided structures. (Jacobs, et al., 2007)

28 OBO Foundry recognized by NIH as framework to address mandates for re-usability of data collected through Federally funded research; see NIH PAR-07-425: Data Ontologies for Biomedical Research (R01)

29 Analysis of outcomes for congenital cardiac disease: can we do better? (Jeffrey P. Jacobs, et al., 2007): improving methodologies for verification of data; clarifying the relationship between administrative databases [such as ICD] and clinical databases; establishing links between databases; moving beyond geographical barriers; moving beyond sub-specialty barriers

30 OBO Foundry provides: tested guidelines enabling new groups to develop the ontologies they need in ways which counteract forking and dispersion of effort; an incremental bottom-up approach to evidence-based terminology practices in medicine that is rooted in basic biology; automatic web-based linkage between medical terminologies and biological knowledge resources (massive integration of databases across species and biological systems)

31 A good solution to the silo problem must be: modular; incremental; bottom-up; based on consistent, intuitive structure; evidence-based and thus revisable; and must incorporate a strategy for motivating potential developers and users

32 An ontology is not a database. New databases for each new kind of data; new databases for each new project. Ontologies like the GO are a solution to the silo problems databases cause.

33 An ontology is not a terminology. Existing term lists: built to serve specific data-processing needs in ad hoc ways. Ontologies: designed from the start to ensure integratability and reusability of data by incorporating a common logical structure.

34 Example: Addison’s disease. Representation in several medical vocabularies: SNOMED International; Medical Subject Headings (MeSH); Read Codes (CTV3); International Classification of Diseases. Combined representation in the UMLS Metathesaurus. With thanks to Olivier Bodenreider and Anita Burgun

35 Diseases of the endocrine system Diseases of the Adrenal Glands Addison’s Disease Diseases/Diagnoses SNOMED International with thanks to Olivier Bodenreider and Anita Burgun

36 Endocrine Diseases Adrenal Gland Diseases Addison’s Disease Diseases MeSH Adrenal Gland Hypofunction with thanks to Olivier Bodenreider and Anita Burgun

37 Endocrine disorder Disorder of adrenal gland Hypoadrenalism Adrenal Hypofunction Corticoadrenal insufficiency Addison’s Disease Read Codes with thanks to Olivier Bodenreider and Anita Burgun

38 Primary adrenocortical insufficiency Other disorders of adrenal gland Disorders of other endocrine gland ICD-10

39 Endocrine Diseases Adrenal Gland Diseases Adrenal Cortex Diseases Adrenal Cortex Dysfunction Hypoadrenalism Adrenal Gland Hypofunction Adrenal cortical hypofunction Addison’s Disease Addison’s disease due to autoimmunity Adrenal Dysfunction Adrenal Glands Adrenal Cortex Secondary hypocortisolism Endocrine System Endocrine Glands Abdominal organ Other disorders of adrenal gland Disorders of other endocrine gland Diseases C0494313 C0014133 C0001625 C0014136 C0446633 C0012674 C0014130 C0549609 C0001621 C0348453 C0001614 C0235454 C0549149 C0001613 C0001623 C0405580 C0271738 C0001403 C0271737 UMLS Metathesaurus

40 Unified Medical Language System Metathesaurus: local usage respected, cross-framework consistency not important; no concern to establish consistency with basic science; massively useful for information retrieval and information integration; different grades of formal rigor, different degrees of completeness, different update policies, capricious policies for empirical testing

41 Ontology and terminology Currently, where data deriving from different sources are annotated using different term lists, cross-validation of these data, for example as concerns representativeness and reliability, must be carried out manually and in ad hoc ways. OBO Foundry ontologies provide a framework for such cross-validation that is reusable and in significant degree automatic. 41

42 Ontology and terminology. Where different term lists are used, there is no way to mount combined queries against the data. Ontology provides a resource for integration, whereby each term list needs to be mapped into the ontology only once. Thus the ontology is not an alternative to existing data systems but a supplement thereto.
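A small sketch of the "map once" idea, using the Addison’s disease labels from the earlier slides. The ontology term identifiers, the record layout, and the decision to treat these labels as equivalent are all assumptions made for illustration.

```python
# Each term list is mapped into the shared ontology once.
# The "ONT:" identifiers are invented; they are not real ontology IDs.
TO_ONTOLOGY = {
    ("MeSH", "Addison's Disease"): "ONT:addisons_disease",
    ("Read Codes", "Addison's Disease"): "ONT:addisons_disease",
    ("ICD-10", "Primary adrenocortical insufficiency"): "ONT:addisons_disease",
    ("MeSH", "Adrenal Gland Hypofunction"): "ONT:adrenal_hypofunction",
}

# Hypothetical records, each coded with the term list of its source system.
RECORDS = [
    {"id": 1, "vocabulary": "MeSH", "label": "Addison's Disease"},
    {"id": 2, "vocabulary": "ICD-10", "label": "Primary adrenocortical insufficiency"},
    {"id": 3, "vocabulary": "MeSH", "label": "Adrenal Gland Hypofunction"},
    {"id": 4, "vocabulary": "Read Codes", "label": "Addison's Disease"},
]


def query(ontology_term, records=RECORDS, mapping=TO_ONTOLOGY):
    """Return ids of records whose local code maps to the given ontology term."""
    return [
        record["id"]
        for record in records
        if mapping.get((record["vocabulary"], record["label"])) == ontology_term
    ]


print(query("ONT:addisons_disease"))
```

With n term lists this needs n mappings into the ontology rather than up to n(n-1)/2 pairwise mappings between lists, and the source records stay in their original systems.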

43 An ontology is not a list of Common Data Elements (CDEs). CDEs will provide a way to capture and represent some of this knowledge in a form that is usable by clinicians and researchers. CDEs allow you to translate in some way data from one format into another, but not necessarily bidirectionally.

44 Can existing CHD terminologies serve as ontologies? An ontology is a representation of the types of entities in a given domain of reality and of the relations between types. What happens if we apply evidence-based rules for ontology construction?

45 Rule Every node in the ontology must represent some type of entity in reality 45

46 46 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality CardioAccess Tree View

47 47 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 1. Syntactic Consequences CardioAccess Tree View

48 48 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 2. No ‘Other’, No ‘Miscellaneous’, No ‘NOS’ CardioAccess Tree View
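The "No ‘Other’, No ‘Miscellaneous’, No ‘NOS’" rule lends itself to an automatic check over term labels. The following is a rough sketch: the function name is hypothetical, the label "Cardiac anomaly NOS" is invented, and a word-level match like this will also flag harmless phrases such as "each other".

```python
import re

# Catch-all words that the rule excludes from term labels (whole words only).
CATCH_ALL = re.compile(r"\b(other|miscellaneous|nos)\b", re.IGNORECASE)


def catch_all_terms(labels):
    """Return the labels that contain a catch-all word."""
    return [label for label in labels if CATCH_ALL.search(label)]


labels = [
    "Addison's Disease",
    "Other disorders of adrenal gland",
    "Disorders of other endocrine gland",
    "Cardiac anomaly NOS",
    "Mother cell",  # contains "other" only inside a longer word
]
print(catch_all_terms(labels))
```

Flagged labels name no type of entity in reality; they only record that something did not fit elsewhere in the list.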

49 49 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 3. Hierarchical organization of types and subtypes CardioAccess Tree View

50 50 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 3. Hierarchical organization of types and subtypes

51 51 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 5. Non-redundancy

52 52 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 5. Non-redundancy

53 53 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 6. An instance of a process type is never an instance of a thing type

54 54 Rule: Each term in an ontology represents a type of biological entity instantiated in biological reality 7. Consistent principles for classification not applied

55 Strategy for building a CHD ontology within the OBO Foundry. A good solution to the silo problem must be modular, incremental, bottom-up, evidence-based and revisable; it must incorporate a strategy for motivating potential developers and users, and work well with other ontologies for neighboring domains

56 OBO Foundry principle of modularity: one ontology for each domain; once you’ve annotated existing data, there is no need for mappings (which are in any case too expensive, too fragile, too difficult to keep up-to-date as mapped ontologies change); everyone knows where to look to find out how to annotate each kind of data

57 Modularity fosters division of labor and allows distributed development, but only if there is a well-tested, principles-based structure in place to ensure that the separate modules work well together

58 Modularity is indispensable The trees built combine anatomy (normal and abnormal), disease (the physiologic derangements), procedures performed, problems occurring with the repairs done, re-procedures, and other medical conditions (including chromosomal and genetic abnormalities) that accompany the defects. 58

59 Extending the OBO Foundry to other domains of biology and of clinical and translational medicine 59

60 Congenital Heart Disease Ontology Modules. RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT), OCCURRENT. GRANULARITY: ORGAN AND ORGANISM: Normal Anatomical Entity, Normal Organ Function, Disease, Developmental Process, Embryology, Morphology, Abnormal Anatomical Entity, Abnormal Organ Function, Surgical Processes. CELL AND CELLULAR COMPONENT: Cellular Component, Cellular Function (GO). MOLECULE: Genes and Gene Products, Genetic Predispositions, Molecular Function, Molecular Process

