Download presentation
Presentation is loading. Please wait.
Published byCordelia Dean Modified over 9 years ago
1
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus HISA ltd. Biography proforma MEDINFO 2007 413 Lygon Street, Brunswick East 3057 Australia Presenter Name: Stefan Schulz Country:1. Germany, 2. Brazil Qualification(s): MD (Doctor in Theoretical Medicine) Vocational Training in Medical Informatics Postdoctoral Habililtation degree in Medical Informatics Position: Associate Professor Department/ Organisation : 1.Medical Informatics, Freiburg University Medical Center, Freiburg, Germany 2.Master Program of Health Technology, Catholic University of Paraná, Curitiba, Brazil Major Achievement(s): Research in the fields of Medical Terminology, Biomedical Ontologies, Medical Language Processing, Text Mining, and Document Retrieval.
2
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Roosewelt L. Andrade a,b, Edson Pacheco a,b, Pindaro S. Cancian a,b, Percy Nohama a,b, Stefan Schulz b,c a Paraná University of Technology (UTFPR), Curitiba, Brazil Corpus-based Error Detection in a Multilingual Medical Thesaurus b Pontificial Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany
3
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion
4
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Controlled Vocabulary for document indexing and retrieval Assigns semantic descriptors (concepts) to (quasi-)synonymous terms Contains additional semantic relations (e.g. hyperonym / hyponym) Examples: MeSH, UMLS, WordNet Multilingual thesaurus: contains translations (cross-language synonymy links) Thesaurus Introduction Methods Results Discussion Conclusion
5
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus International team of curators React to new terms and senses Decide which terms are synonymous / translations Decide which senses of a term have to be accounted for in the domain Requires quality assurance measures Introduction Methods Results Discussion Conclusion Multilingual Thesaurus Management
6
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Case study: Morphosaurus Medical subword thesaurus Organizes subwords (meaningful word fragments) in multilingual equivalence classes: –#derma = { derm, cutis, skin, haut, kutis, pele, cutis, piel, … } –#inflamm = { inflamm, -itic, -itis, phlog, entzuend, -itis, -itisch, inflam, flog, inflam, flog,... } Maintained at two locations: Freiburg (Germany), Curitiba (Brazil) Lexicon curators: frequently changing team of medical students Introduction Methods Results Discussion Conclusion
7
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Segmentation: Myo | kard | itis Herz | muskel | entzünd |ung Inflamm |ation of the heart muscle muscle myo muskel muscul inflamm -itis inflam entzünd Eq Class subword herz heart card corazon card INFLAMM MUSCLE HEART Morphosaurus Structure Thesaurus: ~21.000 equivalence classes (MIDs) Lexicon entries: –English:~23.000 –German:~24.000 –Portuguese:~15.000 –Spanish:~11.000 –French:~ 8.000 –Swedish:~10.000 Introduction Methods Results Discussion Conclusion Indexation: #muscle #heart #inflamm #heart #muscle #inflamm #inflamm #heart #muscle
8
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Morphosemantic Normalization Introduction Methods Results Discussion Conclusion
9
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Additional Challenges for Morphosaurus Properly delimit subword entries so that they are correctly extracted from complex words Create consensus about the scope of synonymy classes, especially with regard to highly ambiguous lexicon entries Introduction Methods Results Discussion Conclusion
10
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion
11
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Morphosaurus Quality Assurance Content quality: Identify content errors in the thesaurus content –Implement a quality process to fix these errors –Show positive impact Process quality: Detect and prevent user action anomalies –actions that consume effort without any positive impact : uncoordinated edit / update / delete “do undo” transactions done by different people) –see Paper Bitencourt et al., Session 089 Introduction Methods Results Discussion Conclusion
12
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Apply morphosemantic indexing to the Merck manual in English, Spanish, Portuguese, German Hypothesis: nearly identical frequency distribution of Morphosaurus identifiers (MIDs) for each language Problematic MIDs can be spotted by comparing MID frequencies for each language pair Testbed: Parallel Medical Corpora Introduction Methods Results Discussion Conclusion
13
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Testbed: Parallel Medical Corpora Introduction Methods Results Discussion Conclusion
14
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus degree of imbalance Scoring of MID Imbalance frequencies of MIDs in language 1 and 2 Score used for MID ranking per language pair Introduction Methods Results Discussion Conclusion mean relative frequency
15
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion
16
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus For each l 1,l 2 : MIDs ranked by Score Introduction Methods Results Discussion Conclusion
17
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Experimental Modification of Workflow Introduction Methods Results Discussion Conclusion Discussion of 100 most highly ranked MID imbalances for each language pair Classification of Problems Documentation Correction in consensus
18
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Reason for MID high score Portuguese / English German / English Spanish / English Ambiguous lexemes0.230.380.14 Missing or dispensable MID0.490.180.53 Same Sense in Different MIDs 0.060.120.19 One MID with Different Senses 0.040.050.06 No problem0.110.100.04 Unclassified problem0.070.170.04 Problems identified Introduction Methods Results Discussion Conclusion
19
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Modified Workflow Introduction Methods Results Discussion Conclusion Discussion of highly ranked MID imbalances
20
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Did the targeted error spotting and resolution have an impact on the “in vivo” performance of the thesaurus ? Benchmark: OHSUMED document collection (user queries with relevant MEDLINE abstract assigned), queries translated to German, Spanish, Portuguese Target: monitoring of Timelines of Eleven Point Average measure (precision values at recall 0, 0.1, 0.2,…0.9, 1.0) Summative Evaluation Introduction Methods Results Discussion Conclusion
21
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus English Portuguese German Spanish Weeks Eleven- Point- Average Precision OHSUMED benchmark Summative Evaluation Results Introduction Methods Results Discussion Conclusion
22
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion
23
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion Discussion of Results Most problems detected corresponded to real errors (approx. 90%) Consensus could be found in most cases “In vivo” evaluation showed a clear increase in IR performance for Spanish, at the time the less consolidated language in Morphosaurus
24
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Introduction Methods Results Discussion Conclusion
25
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Error detection by MID imbalance score proved useful New workflow productive for error elimination Recommendations –Record MID score over time avg for each language pair avg for each language for each MID –Generate alerts for every MIDs which exhibits an increase in imbalance above a tolerance interval –Continue monitoring AvgP11 values –Include into Morphosaurus editing environment Conclusions / Recommendations Introduction Methods Results Discussion Conclusion
26
Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus Jeferson L. Bitencourt a,b, Pindaro S. Cancian a,b, Edson Pacheco a,b, Percy Nohama a,b, Stefan Schulz b,c a Paraná University of Technology (UTFPR), Curitiba, Brazil Medical Thesaurus Anomaly Detection by User Action Monitoring Session 089 b Pontificial Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.