1 Who am I? Director, US National Center for Ontological Research – leader on ontology projects for US Defense Dept.; Key Scientist, US National Center for Biomedical Ontology; Consultant to German Federal Health Ministry on cross-border transmission of emergency health information; Consultant to EU epSOS (European patients Smart Open Services) project; Member of ARGOS consortium on EU-US health information standardization

2 Co-Principal Investigator ◦ Protein Ontology ◦ Infectious Disease Ontology Scientific Advisor ◦ Gene Ontology (world’s most successful ontology) ◦ Cleveland Clinic Semantic Database in Cardiothoracic Surgery ◦ Ontology of Human Expertise, Resource Repository Project of NIH National Center for Research Resources, collaboration with Know-Soft 2 Barry Smith

3 Work funded by European Union, US, Austrian and Swiss National Science Foundations, Volkswagen Foundation, Humboldt Foundation. Winner of $3 million Wolfgang Paul Prize from German Government. National Center for Biomedical Ontology: collaboration of Stanford University, the Mayo Clinic and Smith’s group at University at Buffalo

4 Large-scale health IT projects and their problems

5 Relational databases and their problems

6 A brief history of the Semantic Web: HTML demonstrated the power of the Web to allow sharing of information. Can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information, based on XML, RDF and above all OWL (Web Ontology Language)? Can we use RDF and OWL to break down silos, and create useful integration of on-line data and information?

7 RDF triple stores and their problems

8 people tried, but the more they were successful, the more they failed. OWL breaks down data silos via controlled vocabularies for the formulation of data dictionaries. Unfortunately the very success of this approach led to the creation of multiple new silos, because multiple ontologies are being created in ad hoc ways

9 two factors: Tim Berners-Lee mentality (modelled on the success of HTML): ◦ let a million ‘lite ontologies bloom’, and somehow intelligence will be created ◦ ‘links’ can mean anything; shrink-wrapped software mentality – you will not get paid for reusing old and good ontologies. “Linked Open Data”

10 Ontology success stories, and some reasons for failure A fragment of the Linked Open Data in the biomedical domain 10

11 What you get with ‘mappings’ All in Human Phenotype Ontology (= all phenotypes: excess hair loss, splayed feet...) mapped to all organisms in NCBI organism classification allose in ChEBI chemistry ontology Acute Lymphoblastic Leukemia (A.L.L.) in National Cancer Institute Thesaurus 11

12 What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar) Acute Lymphoblastic Leukemia (A.L.L.) 12

13 Mappings are hard They are fragile, and expensive to maintain The goal should be to minimize the need for mappings Invest resources in ontology modules which work well together 13
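A minimal Python sketch of why automatically generated mappings are fragile, assuming a matcher that compares normalized labels only. The labels are the ones named on the two slides above; the `normalize` and `naive_mappings` helpers are hypothetical and do not correspond to any real mapping tool.

```python
# Labels taken from the slide examples; the source keys are just tags.
terms = {
    "HPO": "All",        # root of the Human Phenotype Ontology (all phenotypes)
    "NCBI": "all",       # all organisms in the NCBI organism classification
    "ChEBI": "allose",   # a form of sugar
    "NCIT": "A.L.L.",    # Acute Lymphoblastic Leukemia
}


def normalize(label: str) -> str:
    """Lowercase and drop punctuation, as a naive string matcher might."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def naive_mappings(terms: dict) -> list:
    """Pair sources whose normalized labels are equal or prefix-related."""
    sources = sorted(terms)
    pairs = []
    for i, a in enumerate(sources):
        for b in sources[i + 1:]:
            na, nb = normalize(terms[a]), normalize(terms[b])
            if na.startswith(nb) or nb.startswith(na):
                pairs.append((a, b))
    return pairs


# Every pair of sources is "mapped", although the four terms denote
# phenotypes, organisms, a sugar and a leukemia respectively.
mappings = naive_mappings(terms)
```

Under these assumptions all six possible pairs are reported as matches, which is the kind of result that then needs ad hoc repair.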

14 Why should you care? You need to create systems for data mining and text processing which will yield useful digitally coded output; if the codes you use are constantly in need of ad hoc repair, huge resources will be wasted

15 How to do it right? How to create an incremental, evolutionary process, where what is good survives, and what is bad fails; create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

16 Uses of ‘ontology’ in PubMed abstracts 16

17 By far the most successful: GO (Gene Ontology) 17

18 GO provides a controlled system of terms for use in annotating (describing, tagging) data multi-species, multi-disciplinary, open source contributing to the cumulativity of scientific results obtained by distinct research communities compare use of kilograms, meters, seconds … in formulating experimental results 18
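The idea of a controlled system of terms can be sketched as follows. This is an illustration only: the three identifiers are the GO root terms, while the `annotate` helper, the database names and the gene product names are hypothetical placeholders, not part of any GO tooling.

```python
# The three GO root terms; a real system would load the full ontology.
GO_TERMS = {
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
    "GO:0008150": "biological_process",
}


def annotate(store: dict, source: str, gene_product: str, term_id: str) -> None:
    """Record an annotation only if the term is in the controlled vocabulary."""
    if term_id not in GO_TERMS:
        raise ValueError(f"unknown GO term: {term_id}")
    store.setdefault(term_id, []).append((source, gene_product))


# Two hypothetical databases annotate different gene products with the
# same term, so their results accumulate under one shared identifier.
store = {}
annotate(store, "database_A", "gene_product_1", "GO:0008150")
annotate(store, "database_B", "gene_product_2", "GO:0008150")
```

Because free-text labels are rejected, results from distinct communities stay comparable, much as measurements stay comparable when everyone reports them in kilograms, meters and seconds.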

19 Hierarchical view representing relations between represented types 19

20 US $100 million invested in literature and data curation using GO; over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO; experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO

21 GO is amazingly successful in overcoming balkanization problem but it covers only generic biological entities of three sorts: ◦ cellular components ◦ molecular functions ◦ biological processes and it does not provide representations of diseases, symptoms, … 21

22 Original OBO Foundry ontologies (Gene Ontology in yellow), arranged by relation to time (continuant, independent or dependent; occurrent) and by granularity. ORGAN AND ORGANISM: Organism (NCBI Taxonomy), Anatomical Entity (FMA, CARO), Organ Function (FMP, CPRO), Phenotypic Quality (PaTO), Biological Process (GO). CELL AND CELLULAR COMPONENT: Cell (CL), Cellular Component (FMA, GO), Cellular Function (GO). MOLECULE: Molecule (ChEBI, SO, RnaO, PrO), Molecular Function (GO), Molecular Process (GO).

23 The OBO Foundry: a step-by-step, evidence-based approach to expand the GO ◦ Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology ◦ and agree in advance to collaborate with developers of ontologies in adjacent domains. http://obofoundry.org

24 OBO Foundry Principles ◦ Common governance (coordinating editors) ◦ Common training ◦ Common architecture to overcome Tim Berners-Lee-ism: simple shared top level ontology; shared Relation Ontology: www.obofoundry.org/ro

25 Open Biomedical Ontologies Foundry Seeks to create high quality, validated terminology modules across all of the life sciences which will be non-redundant close to language use of experts evidence-based incorporate a strategy for motivating potential developers and users revisable as science advances 25

26 Pistoia Alliance: Open standards for data and technology interfaces in the life science research industry ◦ consortium of major pharmaceutical companies working to address the data silo problems created by multiplicity of proprietary terminologies ◦ declare terminology ‘pre-competitive’ ◦ require shared use of OBO Foundry ontologies in presentation of information http://pistoiaalliance.org/

27 OBO Foundry (example ontologies) GO Gene Ontology CL Cell Ontology SO Sequence Ontology ChEBI Chemical Ontology PATO Phenotype (Quality) Ontology FMA Foundational Model of Anatomy Ontology ChEBI Chemical Entities of Biological Interest PRO Protein Ontology Plant Ontology Environment Ontology Ontology for Biomedical Investigations RNA Ontology 27

28 Example Ontologies: Human Phenotype Ontology (HPO) for genetic diseases, codifying OMIM (Online Mendelian Inheritance in Man) database. Clinical Diagnostics in Human Genetics with Semantic Similarity Searches in Ontologies. American Journal of Human Genetics, Vol. 85

29 Infectious Disease Ontology (IDO): general template with extensions ◦ HIV Ontology ◦ Influenza Ontology (InfluenzO) ◦ Malaria Ontology (IDO-MAL) ◦ Staph. aureus Ontology...

30 How OBO Foundry can help The problem: ◦ General: data silos ◦ Particular: continuity of care 30

31 the problem of continuity of care: patients move around (with thanks to http://dbmotion.com)

32 synchronic and diachronic problems of semantic interoperability (across space and across time)

33 The Data Model That Nearly Killed Me by Joe Bugajski http://tiny.cc/S1HWo “If data cannot be made reliably available across silos in a single EHR, then this data cannot be made reliably available to a huge, heterogeneous collection of networked systems.”

34 EPIC, etc. will provide a way to capture and represent some of what is needed in a form that is usable by computers (somewhat) by you yourself but not by other clinics, hospitals and researchers... 34

35 how can we link EHR 1 to EHR 2 in a reliable, trustworthy, useful way, which both systems can understand? [diagram labels: EHR 1, EHR 2]

36 the ideal solution: WHO International Classification of Diseases [diagram labels: ICD, EHR 1, EHR 2]

37 ICD PRO: De facto US billing standard Multilanguage CON: De facto US billing standard (corrupts data) No definitions of terms, and so difficult to judge accuracy of hierarchy and of coding Inconsistent hierarchies Hard to reason with results Hence few secondary uses 37

38 the ideal solution: a single universal clinical vocabulary [diagram labels: SNOMED-CT, EHR 1, EHR 2]

39 SNOMED CT: Systematized Nomenclature of Medicine-Clinical Terms PRO: International standard (sort of) Centerpiece of UK national program Huge resources Free for member countries Multi-language (including Spanish) 39

40 SNOMED CT CON Huge (but redundant... and gappy) Still in need of work ◦ Lacks a coherent representation of the medical domain ◦ No consistent interpretation of relations ◦ Many erroneous relation assertions ◦ Many idiosyncratic relations ◦ Mixes ontology with epistemology ◦ It contains numerous compound terms (e.g., test for X) without the constituent terms (here: X), even where the latter are of obvious salience ( 40

41 http://snob.eggbird.eu/screenshot/spanish.html list of free SNOMED browsers: http://www.connectingforhealth.nhs.uk/systemsandservices/data/snomed/browser

42 http://terminology.vetmed.vt.edu/SCT/menu.cfm

43 SNOMED CT (with thanks to Bill Hogan): Contains many examples of false synonymy. Contains terms that are arbitrary logical combinations of their parent terms, but represent nothing in reality. Includes non-standard logical constructs. Subclasses that lack differentiating criteria to distinguish them from their direct superclass

44 SNOMED CT: Coding with SNOMED-CT is unreliable and inconsistent. Multi-stage committee process for adding terms that follows intuitive rules and not formal principles. Does there exist a strategy for evolutionary improvement?

45 above all: SNOMED CT cannot solve the problem of continuity of care because it has too much redundancy [diagram labels: EHR 1, EHR 2, SNOMED-CT]

46 SNOMED redundancy (examples) SNOMED: Abscess (disorder); SNOMED: Abscess (morphologic abnormality); SNOMED: Solitary leiomyoma (clinical finding); SNOMED: Leiomyoma, no ICD-O subtype (body structure)

47 SNCT 40613008: Open fracture of nasal bones (disorder) is_a (subtype of) Fractured nasal bones (disorder) Open fracture of facial bones (disorder) Open fracture of skull (disorder) Open wound of nose (disorder) 47

48 SNOMED CT has: Open fracture of nasal bones (disorder) is_a Fractured nasal bones (disorder) But nasal bones are not a fracture (nasal bone is a subtype of bone, not a subtype of fracture) 48

49 How to remove (some of) the redundancy from SNOMED-CT By using an ontological approach, to ensure consistency of classifications emanating from different (UK, US,...) sources: the ontology provides the benchmark see already: Ceusters W, Smith B et al. Ontology-based error detection in SNOMED-CT. Proc. Medinfo 2004. 49

50 link EHR 1 to EHR 2 through a messaging standard (cf. air traffic control English) [diagram labels: HL7 Messaging Standard, EHR 1, EHR 2]

51 HL7 critical blog: http://hl7-watch.blogspot.com/ HL7 will in any case provide only the messaging format – it will still need content from SNOMED CT or elsewhere

52 link EHR 1 to EHR 2 through a snapshot of the patient’s condition which both systems can understand [diagram labels: snapshot of patient’s condition, EHR 1, EHR 2]

53 but how to formulate this snapshot? US: Continuity of Care Document (CCD), a merger of the Continuity of Care Record (CCR) (XML-format message types) with the HL7 Clinical Document Architecture (CDA) [diagram labels: snapshot of patient’s condition, EHR 1, EHR 2]

54 CCD is able to solve the problem at best on a case by case basis; XML still provides only an algorithmically inaccessible blob; HL7 problems remain [diagram labels: snapshot of patient’s condition, EHR 1, EHR 2]

55 CCD hard to use, hard to build the needed mappings, and no clear strategy to ensure general validity [diagram labels: snapshot of patient’s condition, EHR 1, EHR 2]

56 in any case CDA/CCD will require content provided through (something like) SNOMED CT codes [diagram labels: snapshot of patient’s condition, EHR 1, EHR 2]

57 Ontologies in the OBO Foundry designed to be usable by clinicians and researchers by EHR developers and users and by computers to provide semantic interoperability between data silos to provide support for data and text mining to provide stable targets for incremental evolutionary improvement 57

58 The OBO Foundry is a collective experiment involving many biological and clinical communities attempting to create terminology resources which will support the goal of modularity one ontology for each domain No need for ‘mappings’ 58

59 An example of OBO Foundry ontology content Question: What is a disease? SNOMED: Disease is_a Clinical Finding is_a SNOMED CT Concept 59

60 SNOMED Glossary 2010 Concept: An ambiguous term. Depending on the context, it may refer to: A clinical idea to which a unique ConceptID has been assigned. The ConceptID itself. The real-world referent(s) of the ConceptID

61 Definitions of ‘disease’ A state of ill-health A state or process of a person’s body or mind that tends to cause ill health in the bearer Disease is a state of a person which issues in abnormal behavior Failing to do what one ordinarily does because of obstruction or opposition 61

62 OGMS Ontology for General Medical Science http://code.google.com/p/ogms 62

63 Basic Formal Ontology continuant occurrent independent continuant dependent continuant organism 63 http://www.ifomis.org/bfo

64 Basic Formal Ontology (BFO) PharmaOntology (W3C HCLS SIG) MediCognos / Microsoft Healthvault Major Histocompatibility Complex (MHC) Ontology (NIAID) Neuroscience Information Framework Standard (NIFSTD) and Constituent Ontologies Interdisciplinary Prostate Ontology (IPO) Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research Neural Electromagnetic Ontologies (NEMO) ChemAxiom – Ontology for Chemistry http://www.ifomis.org/bfo 64

65 Users of BFO Ontology for Risks Against Patient Safety (RAPS/REMINE) Interdisciplinary Prostate Ontology (IPO) Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research Neural Electromagnetic Ontologies (NEMO) ChemAxiom – Ontology for Chemistry Ontology for Risks Against Patient Safety (RAPS/REMINE) (EU FP7) IDO Infectious Disease Ontology (NIAID) National Cancer Institute Biomedical Grid Terminology (BiomedGT) US Army Biometrics Ontology US Army Command and Control Ontology 65

66 Basic Formal Ontology continuant occurrent independent continuant dependent continuant organism 66

67 Continuants continue to exist through time, preserving their identity while undergoing different sorts of changes independent continuants – objects, things,... dependent continuants – qualities, attributes, shapes, potentialities... 67

68 Occurrents processes, events, happenings ◦ your life ◦ this process of accelerated cell division 68

69 Qualities temperature blood pressure mass... are continuants they exist through time while undergoing changes 69

70 Qualities temperature / blood pressure / mass... are dimensions of variation within the structure of the entity a quality is something which can change while its bearer remains one and the same 70

71 A Chart representing how John’s temperature changes 71

72 A Chart representing how John’s temperature changes 72

73 John’s temperature, the temperature he has throughout his entire life, cycles through different determinate temperatures from one time to the next John’s temperature is a physiology variable which, in thus changing, exerts an influence on other physiology variables through time 73

74 BFO: The Very Top continuant independent continuant dependent continuant quality occurrent temperature 74

75 Clear division of types and instances independent continuant dependent continuant quality temperature types instances organism John John’s temperature 75

76 Dependence temperature types instances organism John John’s temperature. 76

77 types: temperature; instances: John’s temperature, which instantiates a different determinate temperature (37ºC, 37.1ºC, 37.5ºC, 37.2ºC, 37.3ºC, 37.4ºC) at each of t1, t2, t3, t4, t5, t6
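The distinction between one persisting quality instance and the determinate values it instantiates over time can be sketched in Python. The `Quality` class is a hypothetical illustration, not BFO's formal definition, and pairing the listed values with t1 to t6 in the order shown is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Quality:
    """A dependent continuant: it exists only as long as its bearer does."""
    type_name: str   # the type it instantiates, e.g. "temperature"
    bearer: str      # the independent continuant it depends on
    # time index -> determinate value instantiated at that time
    determinates: dict = field(default_factory=dict)

    def instantiate(self, t: int, value: float) -> None:
        self.determinates[t] = value

    def value_at(self, t: int) -> float:
        return self.determinates[t]


# One and the same instance (John's temperature) persists through time;
# only the determinate value it instantiates changes.
johns_temperature = Quality("temperature", bearer="John")
readings = [37.0, 37.1, 37.5, 37.2, 37.3, 37.4]  # order assumed
for t, celsius in enumerate(readings, start=1):
    johns_temperature.instantiate(t, celsius)
```

The design choice mirrors the slides: the readings are not six separate temperatures belonging to John but six states of a single quality, which is what allows a chart of "John’s temperature" to be drawn at all.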

78 types: human; instances: John, who instantiates a different development-stage subtype (embryo, fetus, neonate, infant, child, adult) at each of t1, t2, t3, t4, t5, t6

79 Temperature subtypes Development-stage subtypes are threshold divisions (hence we do not have sharp boundaries, and we have a certain degree of choice, e.g. in how many subtypes to distinguish, though not in their ordering) 79

80 independent continuant dependent continuant quality temperature types instances organism John John’s temperature 80

81 independent continuant dependent continuant quality temperature organism John John’s temperature occurrent process course of temperature changes John’s temperature history 81

82 quality temperature organism John John’s temperature process life of an organism John’s life 82

83 BFO: The Very Top continuant occurrent independent continuant dependent continuant quality disposition

84 Disposition - of a glass vase, to shatter if dropped - of a human, to eat - of a banana, to ripen - of John, to lose hair 84

85 Disposition if it ceases to exist, then its bearer is physically changed its realization occurs when its bearer is in some special physical circumstances its realization is what it is in virtue of the bearer’s physical make-up 85

86 Function - of liver: to store glycogen - of birth canal: to enable transport - of eye: to see - of mitochondrion: to produce ATP functions are dispositions which are designed or selected for 86

87 independent continuant dependent continuant function to see eye John’s eye function of John’s eye: to see occurrent process process of seeing John seeing 87

88 88 Physical Disorder

89 an independent continuant (part of the extended organism) A causally linked combination of physical components that is clinically abnormal. 89

90 Clinically abnormal ◦ (1) not part of the life plan for an organism of the relevant type (unlike aging or pregnancy), ◦ (2) causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and ◦ (3) such that the elevated risk exceeds a certain threshold level.* *Compare: baldness 90
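The three criteria can be read as a conjunctive test. The sketch below is one possible encoding, assuming that risk can be summarized as a single number; the `BodilyFeature` class, the threshold and all risk values are hypothetical and chosen only to illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class BodilyFeature:
    name: str
    part_of_life_plan: bool  # e.g. aging or pregnancy
    risk_increase: float     # hypothetical added risk of pain, illness, death or dysfunction


RISK_THRESHOLD = 0.05  # hypothetical; the slides do not fix a value


def clinically_abnormal(feature: BodilyFeature, threshold: float = RISK_THRESHOLD) -> bool:
    not_in_life_plan = not feature.part_of_life_plan     # criterion (1)
    raises_risk = feature.risk_increase > 0              # criterion (2)
    above_threshold = feature.risk_increase > threshold  # criterion (3)
    return not_in_life_plan and raises_risk and above_threshold


pregnancy = BodilyFeature("pregnancy", part_of_life_plan=True, risk_increase=0.2)
baldness = BodilyFeature("baldness", part_of_life_plan=False, risk_increase=0.01)
necrotic_liver = BodilyFeature("necrotic liver", part_of_life_plan=False, risk_increase=0.6)
```

With these illustrative numbers, pregnancy fails criterion (1), baldness fails criterion (3), and only the necrotic liver counts as clinically abnormal.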

92 Pathological Process =def. A bodily process that is a manifestation of a disorder and is clinically abnormal. Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism. 92

93 Cirrhosis - environmental exposure Etiological process - phenobarbital-induced hepatic cell death ◦ produces Disorder - necrotic liver ◦ bears Disposition (disease) - cirrhosis ◦ realized_in Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death ◦ produces Abnormal bodily features ◦ recognized_as Symptoms - fatigue, anorexia Signs - jaundice, enlarged spleen

94 Influenza - infectious Etiological process - infection of airway epithelial cells with influenza virus ◦ produces Disorder - viable cells with influenza virus ◦ bears Disposition (disease) - flu ◦ realized_in Pathological process - acute inflammation ◦ produces Abnormal bodily features ◦ recognized_as Symptoms - weakness, dizziness Signs - fever 94
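The chain of relations used in the cirrhosis and influenza slides can be written down as subject, relation, object triples. The following Python sketch assumes a flat record per disease; the `DiseaseModel` class and its `chain` method are hypothetical names introduced here, not OGMS identifiers.

```python
from dataclasses import dataclass


@dataclass
class DiseaseModel:
    etiological_process: str
    disorder: str
    disposition: str          # the disease itself, on the definition above
    pathological_process: str
    symptoms: list
    signs: list

    def chain(self) -> list:
        """Return the causal chain as (subject, relation, object) triples."""
        features = "abnormal bodily features"
        return [
            (self.etiological_process, "produces", self.disorder),
            (self.disorder, "bears", self.disposition),
            (self.disposition, "realized_in", self.pathological_process),
            (self.pathological_process, "produces", features),
            (features, "recognized_as", "symptoms and signs"),
        ]


# Values copied from the influenza slide.
influenza = DiseaseModel(
    etiological_process="infection of airway epithelial cells with influenza virus",
    disorder="viable cells with influenza virus",
    disposition="flu",
    pathological_process="acute inflammation",
    symptoms=["weakness", "dizziness"],
    signs=["fever"],
)
triples = influenza.chain()
```

Keeping the disorder, the disposition and the pathological process as separate fields reflects the slides' point that a disease is a disposition, distinct from both the physical disorder that bears it and the processes in which it is realized.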

95 Dispositions and Predispositions All diseases are dispositions; not all dispositions are diseases. Predisposition to Disease =def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing some disease. 95

96 Huntington’s Disease – genetic (sure-fire) Etiological process - inheritance of >39 CAG repeats in the HTT gene ◦ produces Disorder - chromosome 4 with abnormal mHTT ◦ bears Disposition (disease) - Huntington’s disease ◦ realized_in Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum ◦ produces Abnormal bodily features ◦ recognized_as Symptoms - anxiety, depression Signs - difficulties in speaking and swallowing 96

97 HNPCC - genetic pre-disposition Etiological process - inheritance of a mutant mismatch repair gene ◦ produces Disorder - chromosome 3 with abnormal hMLH1 ◦ bears Disposition (disease) - Lynch syndrome ◦ realized_in Pathological process - abnormal repair of DNA mismatches ◦ produces Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2) ◦ bears Disposition (disease) - non-polyposis colon cancer ◦ realized in Symptoms (including pain) 97

100 http://code.google.com/p/ogms Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism. Disease course =def. – The aggregate of processes in which a disease disposition is realized. 100

101 coronary heart disease John’s coronary heart disease 101 asymptomatic (‘silent’) infarction early lesions and small fibrous plaques stable angina surface disruption of plaque unstable angina instantiates at t 1 instantiates at t 2 instantiates at t 3 instantiates at t 4 instantiates at t 5 time

102 independent continuant dependent continuant disposition disease disorder John’s disordered heart John’s coronary heart disease occurrent process course of disease course of John’s disease 102

103 The problem of continuity of care 103 Patients move around

104 The opportunity of continuity of care 104 Patients move around

105 EHR – a new approach Epic, Allscripts, Eclipsys... SNOMED CT ICD OpenEHR / CEN 13606 perhaps it doesn’t matter which one you choose – the key is to exploit the fact that patients move around

106 A good solution to the silo problem must be: modular incremental bottom-up evidence-based revisable incorporate a strategy for motivating potential developers and users 106

