HISA ltd. Biography proforma MEDINFO 2007 413 Lygon Street, Brunswick East 3057 Australia Presenter Name: Stefan Schulz Country:1. Germany, 2. Brazil Qualification(s):


HISA ltd. Biography proforma, MEDINFO 2007
413 Lygon Street, Brunswick East 3057, Australia
Presenter Name: Stefan Schulz
Country: 1. Germany, 2. Brazil
Qualification(s): MD (Doctor in Theoretical Medicine); Vocational Training in Medical Informatics; Postdoctoral Habilitation degree in Medical Informatics
Position: Associate Professor
Department/Organisation: 1. Medical Informatics, Freiburg University Medical Center, Freiburg, Germany; 2. Master Program of Health Technology, Catholic University of Paraná, Curitiba, Brazil
Major Achievement(s): Research in the fields of Medical Terminology, Biomedical Ontologies, Medical Language Processing, Text Mining, and Document Retrieval.

Medical Thesaurus Anomaly Detection by User Action Monitoring
Jeferson L. Bitencourt (a,b), Pindaro S. Cancian (a,b), Edson Pacheco (a,b), Percy Nohama (a,b), Stefan Schulz (b,c)
a Paraná University of Technology (UTFPR), Curitiba, Brazil
b Pontifical Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil
c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany

Introduction Methods Results Discussion Conclusion

Thesaurus
Controlled vocabulary for document indexing and retrieval
Assigns semantic descriptors (concepts) to (quasi-)synonymous terms
Contains additional semantic relations (e.g. hyperonym / hyponym)
Examples: MeSH, UMLS, WordNet
Multilingual thesaurus: contains translations (cross-language synonymy links)

Multilingual Thesaurus Management
International team of lexicon curators
React to new terms and senses
Decide which terms are synonymous / translations
Decide which senses of a term have to be accounted for in the domain
Requires quality assurance measures

Case study: Morphosaurus
Medical subword thesaurus
Organizes subwords (meaningful word fragments) in multilingual equivalence classes:
#derma = { derm, cutis, skin, haut, kutis, pele, cutis, piel, … }
#inflamm = { inflamm, -itic, -itis, phlog, entzuend, -itis, -itisch, inflam, flog, inflam, flog, ... }
Maintained at two locations: Freiburg (Germany), Curitiba (Brazil)
Lexicon curators: frequently changing team of medical students

Morphosaurus Structure
Segmentation:
Myo | kard | itis
Herz | muskel | entzünd | ung
Inflamm | ation of the heart muscle
Equivalence classes and their subwords:
MUSCLE: muscle, myo, muskel, muscul
INFLAMM: inflamm, -itis, inflam, entzünd
HEART: herz, heart, card, corazon, card
Indexation:
#muscle #heart #inflamm
#heart #muscle #inflamm
#inflamm #heart #muscle
Thesaurus: ~ equivalence classes
Lexicon entries: English: ~ German: ~ Portuguese: ~ Spanish: ~ French: ~ Swedish: ~
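The segmentation and indexation steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Morphosaurus engine: the toy lexicon contains just the subwords visible on the slide (with "kard" taken from the segmentation line and "-itis" stored as "itis"), and the greedy longest-match strategy as well as the names LEXICON, segment and index_term are hypothetical.

```python
# Toy lexicon: subword -> equivalence class identifier.
# Entries come from the slide; "kard" is assumed from the segmentation example.
LEXICON = {
    "myo": "#muscle", "muskel": "#muscle", "muscle": "#muscle", "muscul": "#muscle",
    "kard": "#heart", "card": "#heart", "herz": "#heart", "heart": "#heart",
    "corazon": "#heart",
    "inflamm": "#inflamm", "inflam": "#inflamm", "itis": "#inflamm",
    "entzünd": "#inflamm",
}


def segment(word, lexicon=LEXICON):
    """Greedy longest-match segmentation (an assumed strategy).

    Characters that do not start any known subword are skipped.
    """
    word = word.lower()
    subwords = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                subwords.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no subword starts here, move on
    return subwords


def index_term(text, lexicon=LEXICON):
    """Map a (possibly multi-word) term to its sequence of class identifiers."""
    return [lexicon[s] for token in text.split() for s in segment(token, lexicon)]


for term in ("Myokarditis", "Herzmuskelentzündung", "Inflammation of the heart muscle"):
    print(term, "->", " ".join(index_term(term)))
```

With this toy lexicon the three terms yield the same three class sequences as listed under "Indexation", which is the point of the normalization: terms from different languages become comparable once they are reduced to class identifiers.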

Morphosemantic Normalization

Morphosaurus: 2 Semantic Relations
Specialization (Has_sense): classes HEAD, CAPUT, CHIEF
Subwords: boss, jef, chef, card, chief, cabec, kopf, caput, cabez, head
Composition (Has_word_part): classes MYALG, MUSCLE, PAIN
Subwords: mialg, myalg; muscle, muscul, mio, myo; pain, dor, alg
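One way to picture the two relations is as lookups that rewrite a sequence of class identifiers. The sketch below is an interpretation of the flattened diagram, not a documented Morphosaurus procedure: it assumes that HEAD is the ambiguous class whose senses are CAPUT and CHIEF, that MYALG is the composite class whose parts are MUSCLE and PAIN, and that composites are replaced by their parts while ambiguous classes produce alternative readings. The names HAS_SENSE, HAS_WORD_PART and expand are hypothetical.

```python
# Assumed reading of the slide: composite class -> its parts.
HAS_WORD_PART = {"MYALG": ("MUSCLE", "PAIN")}
# Assumed reading of the slide: ambiguous class -> its possible senses.
HAS_SENSE = {"HEAD": ("CAPUT", "CHIEF")}


def expand(class_ids):
    """Return every alternative reading of a sequence of class identifiers.

    Composite classes are replaced by all of their parts; ambiguous classes
    fork the reading, one branch per sense.
    """
    readings = [[]]
    for cid in class_ids:
        if cid in HAS_WORD_PART:
            readings = [r + list(HAS_WORD_PART[cid]) for r in readings]
        elif cid in HAS_SENSE:
            readings = [r + [sense] for r in readings for sense in HAS_SENSE[cid]]
        else:
            readings = [r + [cid] for r in readings]
    return readings


print(expand(["MYALG"]))
print(expand(["HEAD", "PAIN"]))
```

Under these assumptions a query containing the composite class would match documents indexed with its parts, and an ambiguous class stays open until context selects one sense.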

Morphosaurus Building Pragmatics
Properly delimit subword entries so that they are correctly extracted from complex words:
nephrotomy -> nephr | oto | my
nephrotomy -> nephro | tomy
kidney: { nephr } OR kidney: { nephro, nephr }
Create consensus about the scope of synonymy classes, especially with regard to highly ambiguous words:
{ hyper, poly, multi, many, highgrade } OR highgrade: { hyper, highgrade } and count: { poly, multi, many }
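The delimitation problem can be reproduced with a small experiment: the same word is segmented differently depending on whether the lexicon holds only "nephr" or also "nephro". The sketch assumes a greedy longest-match segmenter, which may differ from the real segmentation engine, and the two toy lexicons and the function name segment are hypothetical.

```python
def segment(word, lexicon):
    """Greedy longest-match segmentation over a set of subwords (assumed strategy)."""
    word = word.lower()
    parts = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            i += 1  # skip a character that starts no known subword
    return parts


# Lexicon with the short stem only: "otomy" is then split as "oto" + "my".
LEXICON_A = {"nephr", "oto", "my", "tomy"}
# Same lexicon plus the longer variant "nephro".
LEXICON_B = LEXICON_A | {"nephro"}

print(" | ".join(segment("nephrotomy", LEXICON_A)))
print(" | ".join(segment("nephrotomy", LEXICON_B)))
```

The first call yields the unwanted split nephr | oto | my, the second the intended nephro | tomy, which illustrates why curators have to agree on how entries are delimited.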

Morphosaurus Quality Assurance
Content quality: identify errors in the thesaurus content (see Andrade et al., MEDINFO 2007)
Process quality: detect and prevent user action anomalies
User action anomalies: actions that consume effort without any positive impact:
uncoordinated edit / update / delete
“do undo” transactions done by different lexicographers

Identification of Editing Anomalies
Analysis of data log patterns: 86 thesaurus backups covering 9 months
Assessing the relevance of anomaly patterns by comparing the thesaurus descriptors affected with those debated in a Morphosaurus editor online forum
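A do-undo pattern can be found by comparing consecutive backups and looking for entries that return to an earlier state. The following sketch assumes each backup can be reduced to a mapping from lexicon entry to class identifier; the real backup format is not described in the slides, and the function name do_undo_counts, the sample data and the class "#skin" are hypothetical.

```python
def do_undo_counts(snapshots):
    """Count, per entry, how often a change was later reverted.

    snapshots: chronologically ordered list of dicts (entry -> class identifier).
    An entry missing from a snapshot is treated as state None, so a
    delete followed by a re-creation also counts as a do-undo.
    """
    keys = set().union(*snapshots)
    counts = {}
    for key in keys:
        # Sequence of distinct consecutive states of this entry.
        states = []
        for snap in snapshots:
            state = snap.get(key)
            if not states or states[-1] != state:
                states.append(state)
        # A do-undo is a state that equals the state two steps earlier.
        undos = sum(1 for i in range(2, len(states)) if states[i] == states[i - 2])
        if undos:
            counts[key] = undos
    return counts


# Hypothetical backups: "cutis" is reassigned and restored, then deleted and re-created.
backups = [
    {"derm": "#derma", "cutis": "#derma"},
    {"derm": "#derma", "cutis": "#skin"},
    {"derm": "#derma", "cutis": "#derma"},
    {"derm": "#derma"},
    {"derm": "#derma", "cutis": "#derma"},
]
print(do_undo_counts(backups))
```

Entries with a high count are candidates for comparison with the forum discussions, since repeated reversals suggest that curators disagree about them.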

Anomalies: Typology
1. Relationship anomaly
2. Type anomaly
3. Delimitation anomaly
4. Permanence anomaly

Example of Morphosaurus forum entry
EqClass spotted by corpus-based content quality analysis, cf. Andrade et al., MEDINFO 2007

Results
Anomaly Type | Occurrences | Discussed in Forum
Relationship anomaly | 76 | 28
Type anomaly | 1 | 8
Delimitation anomaly | 0 | 0
Permanence anomaly | 5 | 4

Relationship anomalies: multiple changes (chart: occurrences and forum discussions by number of do-undo actions)

Problems found by Log Analysis
Problem Type | Occurrences | Found by Log Analysis
An expected relation relating ambiguous or expansible semantic identifiers (has_sense type or has_word_part type)
Entries assigned to one semantic identifier did not cover all languages. | 80 | 6
The same sense is represented by two unrelated semantic identifiers. | 70 | 8
Lexicon entries assigned to one semantic identifier diverge in meaning. | 11 | 1
Language-specific entries do not translate to other languages. | 11 | 0
Orthographic errors. | 16 | 1
Similar senses are represented by two unrelated semantic identifiers, one of them of the type “excluded from indexing”. | 31 | 0
Errors caused by incorrect subword delimitation. | 16 | 0
Errors caused by incorrect functioning of the segmentation engine. | 4 | 0

Discussion of Results
Assignment of semantic relations: main cause of do-undo anomalies (up to seven do-undos)
Nearly half of editing anomalies concern semantic identifiers also identified as problematic by corpus analysis
Problems discussed in the forum exceed those identifiable by log analysis
Surprising: no anomaly of string delimitation found

Anomaly detection
Detects waste of resources by “do-undo” actions in thesaurus management
Helps create consensus in borderline decisions
Useful to discover common anomalies
To be complemented by other techniques
Higher process effectiveness by integration of quality assessment routines in the thesaurus management tools: user alert at runtime

Anomaly detection at runtime
Anomaly Found
You are undoing a change performed by user koppe on May 14. Please contact this user and create consensus or discuss the problem at the MorphoSaurus forum!
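A runtime alert of this kind could be driven by a check against the edit log before a change is committed. The sketch below is a guess at how such a check might look, not the implementation of the thesaurus management tool: the Change record, the undo_alert function, the sample entry "cutis" and the class identifiers are all hypothetical, and only the user name and the message wording are taken from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Change:
    """One logged edit (hypothetical record layout)."""
    user: str
    when: str            # display date, kept as text in this sketch
    entry: str           # lexicon entry that was edited
    old: Optional[str]   # class before the edit (None = entry did not exist)
    new: Optional[str]   # class after the edit (None = entry deleted)


def undo_alert(log: List[Change], user: str, entry: str,
               new_value: Optional[str]) -> Optional[str]:
    """Return a warning if the pending edit reverts another user's latest change."""
    for change in reversed(log):
        if change.entry != entry:
            continue
        # Only the most recent change to this entry is inspected.
        reverts = change.old == new_value and change.new != new_value
        if reverts and change.user != user:
            return (
                "Anomaly found: you are undoing a change performed by user "
                f"{change.user} on {change.when}. Please contact this user and "
                "create consensus or discuss the problem at the MorphoSaurus forum!"
            )
        return None
    return None


log = [Change("koppe", "May 14", "cutis", "#derma", "#skin")]
print(undo_alert(log, "another_editor", "cutis", "#derma"))
```

Users reverting their own change get no warning in this sketch, on the assumption that only uncoordinated edits by different lexicographers are of interest.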