Andrade et al. Corpus-based Error Detection in a Multilingual Medical Thesaurus HISA ltd. Biography proforma MEDINFO 2007 413 Lygon Street, Brunswick East.

Presentation transcript:

Presenter biography (HISA Ltd. biography proforma, MEDINFO 2007, 413 Lygon Street, Brunswick East 3057, Australia)
Presenter name: Stefan Schulz
Country: 1. Germany, 2. Brazil
Qualifications: MD (Doctor in Theoretical Medicine); vocational training in Medical Informatics; postdoctoral Habilitation degree in Medical Informatics
Position: Associate Professor
Department / organisation:
1. Medical Informatics, Freiburg University Medical Center, Freiburg, Germany
2. Master Program of Health Technology, Catholic University of Paraná, Curitiba, Brazil
Major achievements: Research in the fields of Medical Terminology, Biomedical Ontologies, Medical Language Processing, Text Mining, and Document Retrieval.

Corpus-based Error Detection in a Multilingual Medical Thesaurus
Roosewelt L. Andrade (a,b), Edson Pacheco (a,b), Pindaro S. Cancian (a,b), Percy Nohama (a,b), Stefan Schulz (b,c)
a Paraná University of Technology (UTFPR), Curitiba, Brazil
b Pontifical Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil
c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany

Outline: Introduction, Methods, Results, Discussion, Conclusion

Thesaurus
- Controlled vocabulary for document indexing and retrieval
- Assigns semantic descriptors (concepts) to (quasi-)synonymous terms
- Contains additional semantic relations (e.g. hyperonym / hyponym)
- Examples: MeSH, UMLS, WordNet
- Multilingual thesaurus: contains translations (cross-language synonymy links)

Multilingual Thesaurus Management
- International team of curators
- React to new terms and senses
- Decide which terms are synonymous / translations
- Decide which senses of a term have to be accounted for in the domain
- Requires quality assurance measures

Case study: Morphosaurus
- Medical subword thesaurus
- Organizes subwords (meaningful word fragments) in multilingual equivalence classes:
  - #derma = { derm, cutis, skin, haut, kutis, pele, cutis, piel, … }
  - #inflamm = { inflamm, -itic, -itis, phlog, entzuend, -itis, -itisch, inflam, flog, inflam, flog, ... }
- Maintained at two locations: Freiburg (Germany), Curitiba (Brazil)
- Lexicon curators: frequently changing team of medical students

Morphosaurus Structure
Segmentation:
- Myo | kard | itis
- Herz | muskel | entzünd | ung
- Inflamm | ation of the heart muscle
Equivalence classes (Eq Class) and their subwords:
- MUSCLE: muscle, myo, muskel, muscul
- INFLAMM: inflamm, -itis, inflam, entzünd
- HEART: herz, heart, card, corazon, card
Indexation:
- #muscle #heart #inflamm
- #heart #muscle #inflamm
- #inflamm #heart #muscle
Thesaurus: ~ equivalence classes (MIDs)
Lexicon entries:
- English: ~
- German: ~
- Portuguese: ~
- Spanish: ~
- French: ~
- Swedish: ~
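
The segmentation and indexation steps on this slide can be illustrated with a minimal sketch. It assumes a greedy longest-match segmenter over a toy lexicon built only from the slide's examples; the actual Morphosaurus segmentation algorithm is not described in the slides, and the names `LEXICON`, `segment`, and `index_text` are hypothetical. Treating "ung" and "ation" as suffixes without their own MID is also an assumption.

```python
# Toy subword lexicon: subword -> MID (None marks a subword without its own MID).
# Entries come from the slide's example; treating "ung"/"ation" as MID-less
# suffixes is an assumption made for this sketch.
LEXICON = {
    "myo": "#muscle", "muskel": "#muscle", "muscle": "#muscle", "muscul": "#muscle",
    "kard": "#heart", "card": "#heart", "herz": "#heart", "heart": "#heart",
    "corazon": "#heart",
    "itis": "#inflamm", "inflamm": "#inflamm", "inflam": "#inflamm",
    "entzünd": "#inflamm",
    "ung": None, "ation": None,
}
MAX_LEN = max(len(subword) for subword in LEXICON)


def segment(word):
    """Split a word into lexicon subwords using greedy longest match."""
    word = word.lower()
    pieces = []
    pos = 0
    while pos < len(word):
        # Try the longest candidate first, then shrink.
        for end in range(min(len(word), pos + MAX_LEN), pos, -1):
            if word[pos:end] in LEXICON:
                pieces.append(word[pos:end])
                pos = end
                break
        else:
            pos += 1  # no lexicon entry starts here; skip one character
    return pieces


def index_text(text):
    """Map a text to its sequence of MIDs, dropping subwords without a MID."""
    mids = []
    for word in text.split():
        for piece in segment(word):
            mid = LEXICON[piece]
            if mid is not None:
                mids.append(mid)
    return mids


print(index_text("Myokarditis"))
print(index_text("Herzmuskelentzündung"))
print(index_text("Inflammation of the heart muscle"))
```

All three inputs map to the same three MIDs, differing only in order, which is the language-independent representation the slide illustrates.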

Morphosemantic Normalization [figure]

Additional Challenges for Morphosaurus
- Properly delimit subword entries so that they are correctly extracted from complex words
- Create consensus about the scope of synonymy classes, especially with regard to highly ambiguous lexicon entries


Morphosaurus Quality Assurance
- Content quality: identify content errors in the thesaurus
  - Implement a quality process to fix these errors
  - Show positive impact
- Process quality: detect and prevent user action anomalies
  - Actions that consume effort without any positive impact (uncoordinated edit / update / delete “do undo” transactions done by different people)
  - See the paper by Bitencourt et al., Session 089

Testbed: Parallel Medical Corpora
- Apply morphosemantic indexing to the Merck Manual in English, Spanish, Portuguese, German
- Hypothesis: nearly identical frequency distribution of Morphosaurus identifiers (MIDs) for each language
- Problematic MIDs can be spotted by comparing MID frequencies for each language pair


Scoring of MID Imbalance
- Input: frequencies of MIDs in language 1 and 2
- Formula (not preserved in the transcript), with annotated terms: degree of imbalance, mean relative frequency
- Score used for MID ranking per language pair


For each language pair (l1, l2): MIDs ranked by score
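
The scoring formula itself did not survive in the transcript; only its two annotated terms (degree of imbalance, mean relative frequency) did. The sketch below is therefore an assumption, not the authors' formula: it takes the degree of imbalance as |f1 - f2| / (f1 + f2), the mean relative frequency as (f1 + f2) / 2, and multiplies the two. The function names are hypothetical.

```python
def relative_frequencies(counts):
    """Convert raw MID counts for one language into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mid: n / total for mid, n in counts.items()}


def imbalance_scores(counts_l1, counts_l2):
    """Score each MID for one language pair.

    ASSUMPTION: score = degree of imbalance * mean relative frequency,
    with both factors defined as in the lead-in. The slide's real formula
    is not recoverable from the transcript.
    """
    f1 = relative_frequencies(counts_l1)
    f2 = relative_frequencies(counts_l2)
    scores = {}
    for mid in set(f1) | set(f2):
        a, b = f1.get(mid, 0.0), f2.get(mid, 0.0)
        if a + b == 0:
            continue  # MID never occurs in either corpus
        imbalance = abs(a - b) / (a + b)   # 0 = balanced, 1 = one language only
        mean_rel_freq = (a + b) / 2        # down-weights rare MIDs
        scores[mid] = imbalance * mean_rel_freq
    return scores


def rank_mids(counts_l1, counts_l2):
    """Return (MID, score) pairs, most imbalanced first (ties broken by name)."""
    scores = imbalance_scores(counts_l1, counts_l2)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up counts for illustration only.
english = {"#heart": 60, "#muscle": 30, "#skin": 10}
spanish = {"#heart": 55, "#muscle": 30, "#inflamm": 15}
for mid, score in rank_mids(english, spanish):
    print(mid, round(score, 4))
```

With these particular definitions the product reduces to half the absolute difference of the relative frequencies; the two factors are kept separate so that either can be replaced by a different definition.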

Experimental Modification of Workflow
- Discussion of the 100 most highly ranked MID imbalances for each language pair
- Classification of problems
- Documentation
- Correction in consensus

Problems identified
Table columns: Reason for MID high score; Portuguese / English; German / English; Spanish / English
Table rows (values not preserved in the transcript):
- Ambiguous lexemes
- Missing or dispensable MID
- Same sense in different MIDs
- One MID with different senses
- No problem
- Unclassified problem

Modified Workflow [figure]
- Discussion of highly ranked MID imbalances

Summative Evaluation
- Question: did the targeted error spotting and resolution have an impact on the “in vivo” performance of the thesaurus?
- Benchmark: OHSUMED document collection (user queries with relevant MEDLINE abstracts assigned), queries translated to German, Spanish, Portuguese
- Target: monitoring of timelines of the Eleven Point Average measure (precision values at recall 0, 0.1, 0.2, …, 0.9, 1.0)
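
The eleven-point average named above can be sketched as follows. This uses the common textbook definition with interpolated precision (the best precision reached at any recall at or above each of the eleven levels); the slides do not say whether interpolation was used in the study, so that is an assumption, and the function name is hypothetical.

```python
def eleven_point_average(ranked_docs, relevant_docs):
    """Eleven-point average precision for a single query.

    ranked_docs: document ids in the order the system returned them.
    relevant_docs: set of ids judged relevant for the query.
    ASSUMPTION: precision is interpolated, i.e. at each recall level we take
    the maximum precision observed at that recall or higher.
    """
    if not relevant_docs:
        return 0.0

    # (recall, precision) measured at each rank where a relevant doc appears.
    points = []
    hits = 0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            points.append((hits / len(relevant_docs), hits / rank))

    total = 0.0
    for step in range(11):
        level = step / 10  # recall levels 0.0, 0.1, ..., 1.0
        reachable = [p for r, p in points if r >= level - 1e-12]
        total += max(reachable, default=0.0)
    return total / 11


# Made-up ranking: relevant documents at ranks 1 and 3.
print(eleven_point_average(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Averaging this value over all benchmark queries, and recording it week by week, would give the kind of timeline the slide describes.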

Summative Evaluation Results
Figure: eleven-point average precision on the OHSUMED benchmark over time (x-axis: weeks), one curve each for English, Portuguese, German, and Spanish.


Discussion of Results
- Most problems detected corresponded to real errors (approx. 90%)
- Consensus could be found in most cases
- “In vivo” evaluation showed a clear increase in IR performance for Spanish, at the time the least consolidated language in Morphosaurus


Conclusions / Recommendations
- Error detection by MID imbalance score proved useful
- New workflow productive for error elimination
- Recommendations:
  - Record MID score over time: average for each language pair, average for each language, and for each MID
  - Generate alerts for every MID which exhibits an increase in imbalance above a tolerance interval
  - Continue monitoring AvgP11 values
  - Include into the Morphosaurus editing environment
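
The alerting recommendation could look roughly like the sketch below. The slides give no definition of the tolerance interval, so this assumes a simple rule: compare each MID's latest imbalance score with the mean of its earlier scores and raise an alert when the increase exceeds a fixed tolerance. The function name, the history layout, and the threshold value are all hypothetical.

```python
def imbalance_alerts(history, tolerance):
    """Flag MIDs whose latest imbalance score rose by more than `tolerance`.

    history: MID -> list of imbalance scores in chronological order.
    ASSUMPTION: the baseline is the mean of all earlier scores; the slides
    do not specify how the tolerance interval is defined.
    """
    alerts = []
    for mid, scores in sorted(history.items()):
        if len(scores) < 2:
            continue  # nothing to compare against yet
        earlier = scores[:-1]
        baseline = sum(earlier) / len(earlier)
        increase = scores[-1] - baseline
        if increase > tolerance:
            alerts.append((mid, increase))
    return alerts


# Made-up score histories for illustration only.
history = {
    "#heart": [0.020, 0.020, 0.021],
    "#skin": [0.010, 0.010, 0.050],
    "#new": [0.300],
}
for mid, increase in imbalance_alerts(history, tolerance=0.01):
    print(f"ALERT {mid}: imbalance rose by {increase:.3f}")
```

A rule like this would run after each lexicon update, so that an edit which unbalances a MID is noticed shortly after it is made.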

Related paper (Session 089): Medical Thesaurus Anomaly Detection by User Action Monitoring
Jeferson L. Bitencourt (a,b), Pindaro S. Cancian (a,b), Edson Pacheco (a,b), Percy Nohama (a,b), Stefan Schulz (b,c)
a Paraná University of Technology (UTFPR), Curitiba, Brazil
b Pontifical Catholic University of Paraná, Master Program of Health Technology, Curitiba, Brazil
c University Medical Center Freiburg, Medical Informatics, Freiburg, Germany