MedAT: Medical Resources Annotation Tool
Monika Žáková *, Olga Štěpánková *, Taťána Maříková *
Department of Cybernetics, CTU Prague
Institute of Biology and Medical Genetics, Prague
KEG seminar / 20
Outline
1. Motivation
2. System Description
3. Creating Annotations
4. Additional Functionalities
5. Knowledge Representation
6. Ontologies
   - Task Ontologies
   - Domain Ontologies
7. Results and Conclusion
Motivation
- Patients' records are a valuable source of information
- The records are stored in semi-structured text files
- Sharing and data mining require a structured format such as an ontology or a relational database
- Currently known text-mining methods are not applicable, because:
  - the records are heterogeneous (type of examination, personality of the doctor)
  - abbreviations are used, some of them non-standard
  - gazetteers are not available in Czech
Motivation II
- Grant "Relational ML for analysis of biomedical data" of the Czech national research program Information Society, in cooperation with the Institute of Biology and Medical Genetics, 2nd Medical Faculty of Charles University
- Relational data mining using the subgroup discovery methodology
- The data must be transformed from text files into a form suitable for relational data mining, i.e. a relational database and rules
System overview
[Diagram components: Ontologies, Medical record, Forms generator, Knowledge base, Relational database]
System description
- Creates semantic annotations of medical records
- Based on the Dynamic Narrative Authoring Tool
- Modular architecture
- Export to a knowledge base in OCML and OWL
- Export to a relational database
- Visualization: genealogical tree
MedAT GUI
Creating Annotations
- Dynamically generated forms
  - one form corresponds to one major class in the ontology and to one master table in the database, e.g. Patient, Examination
- Abbreviations and aliases can be added to the ontology
- Forms are filled:
  - automatically by parsing
  - by drag and drop from records in text format
  - manually when OCR is not effective
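The link between ontology classes, generated forms and automatic filling can be illustrated with a minimal sketch. It assumes a hypothetical, much simplified class representation (`OntologyClass`, `generate_form`, `fill_form` are illustrative names, not MedAT's API) and assumes records contain `label: value` lines; aliases and abbreviations stored with the ontology map record labels to slots.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical, simplified ontology class: a name and its slot names."""
    name: str
    slots: list[str]
    # Abbreviations and aliases per slot, as added to the ontology by users.
    aliases: dict[str, list[str]] = field(default_factory=dict)


def generate_form(cls: OntologyClass) -> dict[str, str | None]:
    """One form per major class: one initially empty field per slot."""
    return {slot: None for slot in cls.slots}


def fill_form(cls: OntologyClass, record_text: str) -> dict[str, str | None]:
    """Fill a form by parsing 'label: value' lines of a plain-text record.

    A label matches a slot if it equals the slot name or one of its aliases
    (case-insensitively). Unmatched lines are left for manual annotation.
    """
    lookup = {}
    for slot in cls.slots:
        lookup[slot.lower()] = slot
        for alias in cls.aliases.get(slot, []):
            lookup[alias.lower()] = slot

    form = generate_form(cls)
    for line in record_text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match is None:
            continue
        slot = lookup.get(match.group(1).lower())
        if slot is not None and form[slot] is None:  # keep the first value
            form[slot] = match.group(2)
    return form


examination = OntologyClass(
    name="Examination",
    slots=["date", "doctor", "diagnosis"],
    aliases={"doctor": ["dr", "physician"], "diagnosis": ["dg"]},
)
record = "Date: 12.3.2004\nDr: Novak\nDg: NF1\nNote: follow-up in 6 months"
print(fill_form(examination, record))
# {'date': '12.3.2004', 'doctor': 'Novak', 'diagnosis': 'NF1'}
```

Adding an alias to the ontology immediately widens what the parser recognizes, which matches the iterative workflow on the slide; lines it cannot match would still need drag and drop or manual entry.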
Creating Annotations II
Additional functionalities
- Exploration of data stored in the relational database
  - pre-defined SQL queries (no knowledge of SQL required)
  - queries written directly in SQL
- Visualization
  - genealogy tree
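The two ways of exploring the database can be sketched as a catalogue of named, parameterized queries plus a raw-SQL escape hatch. This is a hypothetical illustration: the query names and the `patient`/`examination` schema are assumptions, and Python's built-in `sqlite3` stands in for the PostgreSQL database MedAT uses.

```python
import sqlite3

# Hypothetical catalogue of pre-defined queries: the user picks a query by
# name and supplies parameter values, so no SQL knowledge is needed.
# Table and column names are assumptions for this sketch.
PREDEFINED_QUERIES = {
    "patients_with_diagnosis": (
        "SELECT DISTINCT p.name FROM patient p "
        "JOIN examination e ON e.patient_id = p.id "
        "WHERE e.diagnosis = ? ORDER BY p.name"
    ),
    "examination_count": (
        "SELECT COUNT(*) FROM examination WHERE patient_id = ?"
    ),
}


def run_predefined(conn, query_name, *params):
    """Run a named query; values are bound as parameters, not concatenated."""
    return conn.execute(PREDEFINED_QUERIES[query_name], params).fetchall()


def run_sql(conn, sql):
    """For users who prefer to write the SQL themselves."""
    return conn.execute(sql).fetchall()


# Toy data standing in for an exported database (sqlite3 instead of PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE examination (
        id INTEGER PRIMARY KEY, patient_id INTEGER, diagnosis TEXT);
    INSERT INTO patient VALUES (1, 'Patient A'), (2, 'Patient B');
    INSERT INTO examination VALUES (1, 1, 'NF1'), (2, 2, 'other'), (3, 1, 'NF1');
""")
print(run_predefined(conn, "patients_with_diagnosis", "NF1"))  # [('Patient A',)]
print(run_predefined(conn, "examination_count", 1))            # [(2,)]
```

Binding values as parameters keeps the pre-defined queries safe to expose to users who only type a diagnosis or patient id.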
Knowledge representation
- Core formalism: Apollet
- Apollet
  - frame-based formalism based on OCML
  - formalism used by the Apollo ontology editor, so the I/O modules of Apollo can be reused
  - export to Lisp available
  - inference engine available
  - disadvantage: rules are very often just Lisp functions
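Export to Lisp amounts to serializing frames as s-expressions. The sketch below renders a class and an instance in an OCML-like `def-class` / `def-instance` shape; it is an assumption-laden simplification (no documentation strings, no other slot options) and does not reproduce the actual Apollet export format. `Sym`, `sexpr`, `def_class` and `def_instance` are hypothetical helpers.

```python
class Sym(str):
    """A Lisp symbol (printed bare), as opposed to a string (printed quoted)."""


def sexpr(value):
    """Render nested Python values as a Lisp s-expression."""
    if isinstance(value, Sym):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(sexpr(v) for v in value) + ")"
    raise TypeError(f"cannot render {value!r}")


def def_class(name, superclasses, slots):
    """OCML-like class definition; slots maps slot name -> type name."""
    slot_specs = [[Sym(s), Sym(":type"), Sym(t)] for s, t in slots.items()]
    return sexpr([Sym("def-class"), Sym(name),
                  [Sym(c) for c in superclasses], slot_specs])


def def_instance(name, class_name, values):
    """OCML-like instance definition; values maps slot name -> filler."""
    pairs = [[Sym(s), v] for s, v in values.items()]
    return sexpr([Sym("def-instance"), Sym(name), Sym(class_name), pairs])


print(def_class("examination", [], {"date": "date", "doctor": "doctor"}))
# (def-class examination () ((date :type date) (doctor :type doctor)))
print(def_instance("exam-1", "examination",
                   {"date": "2004-03-12", "doctor": Sym("doctor-1")}))
# (def-instance exam-1 examination ((date "2004-03-12") (doctor doctor-1)))
```

Distinguishing symbols from strings is the one design decision that matters here: instance references such as `doctor-1` must print bare so that a Lisp reader resolves them, while literal values stay quoted.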
Relational database
- Tables of the relational database are generated automatically from the ontology
- The ontology provides the semantic description of the database
- Export is done in batch for a particular version of the ontology and knowledge base
- Each export is intended for a data mining experiment
- PostgreSQL is currently used
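Generating tables from the ontology can be sketched as one `CREATE TABLE` per class, with slots whose type is another class turned into foreign keys. The type mapping, the `<slot>_id` naming convention and the toy ontology are assumptions for illustration, not MedAT's actual schema; the generated DDL is executed in `sqlite3` as a stand-in for PostgreSQL.

```python
import sqlite3

# Hypothetical mapping from ontology slot types to SQL column types.
SQL_TYPES = {"string": "TEXT", "integer": "INTEGER", "date": "DATE"}


def create_table_sql(class_name, slots, class_names):
    """Derive a CREATE TABLE statement for one ontology class.

    slots maps slot name -> type. A slot whose type is another ontology
    class becomes a foreign-key column '<slot>_id' referencing that table.
    """
    columns = ["id INTEGER PRIMARY KEY"]
    for slot, slot_type in slots.items():
        if slot_type in class_names:
            columns.append(
                f"{slot}_id INTEGER REFERENCES {slot_type.lower()}(id)")
        else:
            columns.append(f"{slot} {SQL_TYPES[slot_type]}")
    return (f"CREATE TABLE {class_name.lower()} (\n  "
            + ",\n  ".join(columns) + "\n)")


def create_schema(ontology):
    """One table per class; referenced classes should be listed first."""
    names = set(ontology)
    return [create_table_sql(name, slots, names)
            for name, slots in ontology.items()]


ontology = {
    "Patient": {"name": "string", "birth_date": "date"},
    "Examination": {"date": "date", "doctor": "string", "patient": "Patient"},
}
conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
for ddl in create_schema(ontology):
    print(ddl)
    conn.execute(ddl)
```

Because the schema is a pure function of the ontology, re-running the export for a new ontology version regenerates a consistent set of tables, which fits the batch, per-experiment export described above.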
Ontologies
MedAT relies on ontologies at two levels:
- Task ontologies describe the structure of different medical records
- Domain ontologies formalize knowledge about a specific domain, e.g. diseases, family relations, time points
Task Ontologies
- Developed from the procedures and structure of medical records, in cooperation with medical doctors
- Hierarchy induced by the part-whole relationship
  - OCML: slots with facets
  - OWL: hasPart, partOf (W3C Working Draft)
- Serve as the basis for generating forms and tables in the relational database
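The part-whole hierarchy can be sketched with a `hasPart` map from which `partOf` is derived as the inverse, and a nested outline that could serve as the skeleton of nested forms. The record structure below is a hypothetical example, and the sketch assumes the hierarchy is acyclic.

```python
from collections import defaultdict

# Hypothetical part-whole structure of a medical record (must be acyclic).
HAS_PART = {
    "MedicalRecord": ["Patient", "Examination"],
    "Examination": ["Diagnosis", "Therapy"],
}


def invert(has_part):
    """Derive partOf as the inverse of hasPart."""
    part_of = defaultdict(list)
    for whole, parts in has_part.items():
        for part in parts:
            part_of[part].append(whole)
    return dict(part_of)


def transitive_parts(has_part, whole):
    """All direct and indirect parts of 'whole' (depth-first, pre-order)."""
    result = []
    for part in has_part.get(whole, []):
        result.append(part)
        result.extend(transitive_parts(has_part, part))
    return result


def outline(has_part, root, depth=0):
    """Indented outline of the hierarchy, e.g. as a skeleton for nested forms."""
    lines = ["  " * depth + root]
    for part in has_part.get(root, []):
        lines.extend(outline(has_part, part, depth + 1))
    return lines


print("\n".join(outline(HAS_PART, "MedicalRecord")))
print(invert(HAS_PART)["Therapy"])  # ['Examination']
```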
Task Ontologies: Example
- Classes: elements of medical records, e.g. object of examination, therapy
- Slots: describe the composition of medical records, e.g. class examination has slots date, doctor, has_therapy
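The `examination` class can be sketched as a frame whose slots carry two facets, a value type and a cardinality, loosely modelled on OCML's `:type` and `:cardinality`. The facet values chosen below (exactly one date and one doctor, any number of therapies) are assumptions, and `Slot`, `Therapy` and `validate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Slot:
    """A slot with two facets, loosely modelled on OCML :type / :cardinality."""
    name: str
    value_type: type
    min_card: int = 0
    max_card: int | None = None  # None means unbounded


@dataclass
class Therapy:
    """Hypothetical class for the filler of has_therapy."""
    name: str


# The 'examination' class from the slide; facet values are assumptions.
EXAMINATION = [
    Slot("date", date, 1, 1),
    Slot("doctor", str, 1, 1),
    Slot("has_therapy", Therapy),
]


def validate(instance: dict[str, list], slots: list[Slot]) -> list[str]:
    """Check an instance (slot name -> list of values) against slot facets."""
    errors = []
    for slot in slots:
        values = instance.get(slot.name, [])
        if len(values) < slot.min_card:
            errors.append(f"{slot.name}: needs at least {slot.min_card} value(s)")
        if slot.max_card is not None and len(values) > slot.max_card:
            errors.append(f"{slot.name}: allows at most {slot.max_card} value(s)")
        for value in values:
            if not isinstance(value, slot.value_type):
                errors.append(
                    f"{slot.name}: {value!r} is not a {slot.value_type.__name__}")
    known = {slot.name for slot in slots}
    errors.extend(f"{name}: unknown slot" for name in sorted(set(instance) - known))
    return errors


ok = {"date": [date(2004, 3, 12)], "doctor": ["Dr X"],
      "has_therapy": [Therapy("t1"), Therapy("t2")]}
bad = {"doctor": ["A", "B"], "has_therapy": ["aspirin"], "room": ["12"]}
print(validate(ok, EXAMINATION))   # []
print(validate(bad, EXAMINATION))
```

The same facets that drive validation could also drive form generation, for instance a single-value field for `date` and a repeatable sub-form for `has_therapy`.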
Domain ontologies
- Use of third-party ontologies, e.g. GALEN, Gene Ontology
- Ontology of family relations
  - rules are needed, e.g. hasHalfBrother(x,y)
  - OWL: no standardized rule language yet (ORL is only a proposal)
  - OCML: rules as Lisp functions
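A rule such as hasHalfBrother(x,y) is easy to state procedurally, which is also why OCML rules often end up as Lisp functions. The sketch below writes it as a Python function over toy `(child, parent)` facts; the facts and names are hypothetical, and the definition used (y is male and shares exactly one parent with x) is an assumption about how the relation is meant.

```python
# Toy facts (hypothetical): (child, parent) pairs and the set of males.
PARENT = {
    ("anna", "eva"), ("anna", "jan"),
    ("petr", "eva"), ("petr", "karel"),
    ("tomas", "eva"), ("tomas", "jan"),
}
MALES = {"jan", "karel", "petr", "tomas"}


def parents_of(person, parent_facts=PARENT):
    """All recorded parents of a person."""
    return {parent for child, parent in parent_facts if child == person}


def has_half_brother(x, y, parent_facts=PARENT, males=MALES):
    """hasHalfBrother(x, y): y is male and shares exactly one parent with x.

    Returns False unless both parents of x and of y are recorded, because
    with a missing parent a full brother cannot be told from a half brother.
    """
    if x == y or y not in males:
        return False
    px, py = parents_of(x, parent_facts), parents_of(y, parent_facts)
    return len(px) == 2 and len(py) == 2 and len(px & py) == 1


def half_brothers(x, parent_facts=PARENT, males=MALES):
    """All y with hasHalfBrother(x, y), sorted by name."""
    people = {child for child, _ in parent_facts}
    return sorted(y for y in people if has_half_brother(x, y, parent_facts, males))


print(half_brothers("anna"))  # ['petr']
```

In the toy data, tomas shares both parents with anna and is therefore a full brother, not a half brother.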
Time ontology
- Originally developed for historical narratives
- Based on Allen's interval algebra
- Uncertain time points and intervals:
  - uncertain temporal position
  - uncertain granularity
- Extended to cover time events specific to the medical domain, e.g. before surgery, during infancy
- Available in OCML
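Two ideas from this slide can be sketched in a few lines: classifying two intervals with exact endpoints into one of Allen's 13 relations, and comparing uncertain time points. Here an uncertain point is modelled as a range `[earliest, latest]`, with coarse granularity (only the year or month known) widening the range. This is an illustrative assumption, not the OCML time ontology itself; `UncertainPoint`, `from_year`, `from_month` and `point_before` are hypothetical names.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def allen(a_start, a_end, b_start, b_end):
    """Allen relation of interval A to interval B (exact endpoints, start < end)."""
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if (a_start, a_end) == (b_start, b_end):
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


@dataclass(frozen=True)
class UncertainPoint:
    """A time point known only to lie somewhere in [earliest, latest]."""
    earliest: date
    latest: date


def from_year(year):
    """Coarse granularity: only the year is known."""
    return UncertainPoint(date(year, 1, 1), date(year, 12, 31))


def from_month(year, month):
    """Month granularity: the day is unknown."""
    last_day = calendar.monthrange(year, month)[1]
    return UncertainPoint(date(year, month, 1), date(year, month, last_day))


def point_before(p, q):
    """'yes' if p is certainly before q, 'no' if certainly not, else 'maybe'."""
    if p.latest < q.earliest:
        return "yes"
    if p.earliest >= q.latest:
        return "no"
    return "maybe"


surgery = from_month(2003, 5)
print(point_before(from_year(2002), surgery))  # yes
print(point_before(from_year(2003), surgery))  # maybe
print(allen(1, 4, 2, 6))                       # overlaps
```

The three-valued answer is what makes a query such as "symptoms before surgery" usable on records where some dates are known only to the year.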
Results
- Information from medical reports is easily transferred to dynamically generated forms
- Data from the forms is saved to a knowledge base and a relational database
- Ontologies are extended iteratively, adding aliases and abbreviations
- The tool is currently being tested at the Institute of Biology and Medical Genetics on records of patients with neurofibromatosis type 1
Future work
- Text mining methods for semi-automatic annotation
- A tool for semantic search, retrieval of a relevant subset of data, and visualization of the retrieved data
- Use of the annotated data together with genotype information for data mining with the subgroup discovery methodology
Questions
Thank you for your attention
Questions?