Who am I
Gianluca Correndo
PhD student (end of PhD)
Work in the medical informatics group (Paolo Terenziani)
PhD thesis on contextualization techniques for clinical guidelines
Mostly devoted to flexible integration of ontologies in information systems
Ontologies in Computer Science
An ontology is a shared understanding of some domain of interest. [Uschold, Gruninger 96]
Defines
– A common vocabulary of terms
– Some specification (more or less formal) of the meaning of the terms
– A shared understanding for people and machines
Why develop an ontology?
To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
A community reference for applications (standards)
To share a consistent understanding of what information means
Communication
Syntax is not enough for machine communication, e.g. B2B
Bestellinformation (German for “order information”): Daimler 500 SLK
Order information: Car Daimler 500 SLK $
Ontologies in Medical Domain
1. Clinical structured data capture and presentation: letting physicians enter, store, and review data in a more structured way than free-text notes.
2. Information integration, indexing and retrieval: linking clinical records, decision support, quality assurance, and other information (Data Mart).
3. Messaging between software systems: linking laboratory and Hospital Information Systems, giving fixed semantics to the terms used in the message context.
4. Reporting: providing the official returns in whichever coding system is required.
… mostly a terminological use of the term “ontology”
Heterogeneity of Data Sources in Medical Domain
Guideline condition: If “ECG” is altered … / If ECG is altered for patient then …
Electronic Medical Record
– Different data schema
– Different data dictionary
Same concept, different labels: ECG, EKG, electrocardiogram
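The label mismatch on this slide can be illustrated with a minimal sketch, assuming a hand-written data dictionary that maps each local label to one shared concept. The concept name `Electrocardiogram`, the value `"altered"`, and the record layouts are hypothetical and only serve the illustration.

```python
# Hypothetical data dictionary: each local label maps to one shared concept.
# The labels come from the slide; the concept identifier is an assumption.
SYNONYMS = {
    "ecg": "Electrocardiogram",
    "ekg": "Electrocardiogram",
    "electrocardiogram": "Electrocardiogram",
}


def to_concept(label):
    """Return the shared concept for a local label, or None if it is unknown."""
    return SYNONYMS.get(label.strip().lower())


def is_altered(record, concept):
    """True if any field of the record that denotes `concept` is flagged 'altered'."""
    return any(
        to_concept(field) == concept and value == "altered"
        for field, value in record.items()
    )


# Three hypothetical medical records that name the same exam differently.
record_a = {"ECG": "altered", "age": "61"}
record_b = {"EKG": "normal"}
record_c = {"electrocardiogram": "altered"}
```

With this mapping, the guideline condition "if ECG is altered" can be checked against all three records through the shared concept, regardless of the label each record uses.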
Virtual Integration Architecture
Distributed Information Systems
[Figure: three arrangements of ontologies over data sources: a single Global Ontology over all data sources; one Local Ontology per data source; a Global Ontology over per-source Local Ontologies. Standard Ontology #1 and Standard Ontology #2 also appear.]
Harmonize heterogeneous domain conceptualizations
Multi Standard Architecture for Guideline Managers
[Figure: three medical records, each with its own Local Ontology, together with the standard ontologies UMLS and LOINC and a Guideline Manager.]
GL described in UMLS terms
GL described in LOINC terms
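One way to read this architecture is as a two-step lookup: a guideline term expressed in a standard vocabulary is translated into the local field name of each medical record. The sketch below assumes hand-written mapping tables; the identifiers `umls:ecg` and `loinc:ecg` and the record names are placeholders, not real UMLS or LOINC codes.

```python
# Hypothetical mappings from standard terms to local field names.
# None of these identifiers are real UMLS or LOINC codes.
STANDARD_TO_LOCAL = {
    "UMLS": {
        "record_1": {"umls:ecg": "ECG"},
        "record_2": {"umls:ecg": "EKG"},
    },
    "LOINC": {
        "record_2": {"loinc:ecg": "EKG"},
        "record_3": {"loinc:ecg": "electrocardiogram"},
    },
}


def translate(standard, term, record_id):
    """Translate a standard term into the local field name of one record.

    Raises KeyError when the record has no mapping for that standard or term.
    """
    return STANDARD_TO_LOCAL[standard][record_id][term]


def evaluate(standard, term, expected, record_id, record):
    """Check a guideline condition `term == expected` against one local record."""
    local_field = translate(standard, term, record_id)
    return record.get(local_field) == expected
```

A guideline written in either standard can then be evaluated on any record that declares a mapping to that standard; a record with no mapping fails loudly with `KeyError` instead of silently returning a wrong answer.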
Automated Ontology Extraction Local Ontology Medical Record
Multi Standard in Bioinformatics
[Figure: Gene databases 1 to 6, each with its own Local Ontology; Gene Ontology; Guideline Manager.]
Query described in gene ontology
EcoCyc
Gene Ontology
“a dynamic controlled vocabulary that can be applied to all eukaryotes”
Built by the community for the community.
Three organising principles: Molecular function, Biological process, Cellular component
Is-a and part-of taxonomy, but not good!
~10,000 concepts
Lightweight ontology, poor semantic rigour.
– OK when small and used for annotation.
– Obstacle when large, evolving and used for mining.
GO, OBO
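To make the is-a / part-of taxonomy concrete, here is a minimal sketch of how annotations can be retrieved through both relations. The terms and gene names form a toy hierarchy invented for illustration; they are not taken from the actual Gene Ontology.

```python
from collections import deque

# Toy hierarchy: each term lists its (relation, parent) edges.
# These terms are illustrative only, not real Gene Ontology content.
EDGES = {
    "nuclear membrane": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle"), ("part_of", "cell")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, edges=EDGES):
    """Return every term reachable from `term` by following is_a or part_of edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def annotated_with(annotations, term):
    """Return the sorted gene products annotated with `term` or any of its descendants."""
    return sorted(
        gene
        for gene, annotated_term in annotations.items()
        if annotated_term == term or term in ancestors(annotated_term)
    )


# Hypothetical annotations: gene product -> most specific term.
annotations = {"geneA": "nuclear membrane", "geneB": "organelle"}
```

The sketch treats is_a and part_of identically when propagating annotations, which is a simplification. It shows why loosely defined relations are tolerable for annotation but become an obstacle for mining: every query result depends on what the edges are taken to mean.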
Tools for Ontology Engineering
Editing: Protégé, OilEd …
Access API: Jena, legacy
Languages: XMI, OWL, RDF(S)