MuchMore Project Review, April 19th, 2002
Multilingual Concept Hierarchies for Medical Information Organization and Retrieval (MUCHMORE)

Project Overview
Application
- Addressing a Real-Life Medical Scenario for Cross-Lingual Information Retrieval
Research & Development
- Developing Novel, Hybrid (Corpus-/Concept-Based) Methods for Handling this Scenario
Evaluation
- Evaluating the Technical Performance of (Combinations of) Existing and Novel Methods

User Perspective (ZInfo)
Vision: BAIK Model
MuchMore
- Provide Relevant Medical Information
- … for a Specific Patient Problem
- … Automatically, from the Web
- … Independent of Language

April 19 th,2002 MuchMore Project Review  Automatic Query Generation (and Expansion), Identifying the Exact Problem of the Patient  Retrieval and Relevance Ranking of Evidence Based Medical Literature, Language Independent  Summarization and Filtering of Results According to a User Profile User Requirements User Perspective (ZInfo)

User Perspective (ZInfo): User Evaluation
Use for Medical Cases
- Part of Postgraduate Course in Medical Informatics
Evaluate Usefulness
- Query Generation
- Relevance for Decisions in Diagnostics and Treatment
Problematic Issues
- Different medical profiles, schools, experience, speciality
- What is relevant for one user may mean little or nothing to another
- Evidence based medicine criteria exist only for a small fraction of medicine

MuchMore Prototype
- Overview of Prototype Functionality
- Relation between Functionality and User Requirements
- Issues Addressed by Research and Development within MuchMore

R&D in MuchMore: Semantic Annotation Based CLIR
Corpus Annotation (DFKI, ZInfo)
- PoS, Morphology, Phrases, Grammatical Functions
- Term and Relation Tagging
Term Extraction (XRCE, EIT, CMU, CSLI)
- Bilingual Lexicon Extraction, Extension of Semantic Resources
Relation Extraction (DFKI, CSLI)
- Grammatical Function Tagging
- Extracting Semantic Relation Indicators
- Extracting Novel Semantic Relations
Sense Disambiguation (CSLI, DFKI)
- Tuning and Extension of Semantic Resources
- Combining Sense Disambiguation Methods
Semantic Indexing/Retrieval (EIT, DFKI)

R&D in MuchMore: Additional Approaches in CLIR
Corpus Based CLIR
- Bilingual Lexicon Extraction (XRCE, EIT, CMU, CSLI)
- Pseudo Relevance Feedback: PRF (CMU)
- Generalized Vector Space Model: GVSM (CMU)
Summarization (CMU)
- Query, Genre Specific
Text Classification Based CLIR (CMU)
- Hierarchical/Flat kNN with MeSH

Corpus Annotation: Annotation Evaluation
Corpus
- ~9000 English and German Medical Abstracts from 41 Journals, Springer LINK Web Site, ~1 M Tokens for each Language
PoS
- Lexicon Update, Remaining Error Rate ~1.5% (EN)
- Example: "Histologically, we found a subepidermal blister formation and a predominantly neutrophilic infiltrate." pos=VB > pos_correct=NN
Morphology
- German Nouns (MMorph): Recall, Incorrect and Error-Rate for test-dvlp and test-final
- Incorrect, e.g.: Chorionzottenbiopsie > Chor + Ion + Zotte + Biopsie
Term and Relation Tagging
- Evaluation of 8 DE/EN Parallel Abstracts, Relevant for a Query

Term Extraction: XRCE (Aims and Resources)
Aim
- Bilingual Lexicon Extraction
  From Comparable Corpora at Word Level
  From Parallel Corpora at Word and Term (Multi-Word) Level
- Bilingual Extension of Semantic Resource (MeSH)
  verbesserter transabdomineller Techniken = improved transabdominal techniques
  Prognose des Frühcarcinoms = prognosis of early gastric cancer
  Verletzungen des Gehirns = intracranial injuries
  Lebensqualitaet = quality of life
Resources
- Optimal Combination of Existing Resources (Corpus, General Dictionary, Thesaurus: MeSH)
- Corpus Specific German Decompounding (Improves Recall by 25% at Equal Precision)

Term Extraction: XRCE (Results of Best Method)
Optimal Combination of Resources
- Retaining only 10 best Translations for each Candidate
  1. word-to-word, comparable corpora: F1 =
  a. word-to-word, parallel corpora: F1 =
  b. term-to-term, parallel corpora: F1 = 0.85
Evaluating Separately with Individual Resources (F1)
- Corpus: 0.62; MeSH: 0.51; General Dictionary:
MeSH Extension
- 1453 new multi-word terms added (synonyms or new term entries), extracted from the Springer corpus

April 19 th,2002 MuchMore Project Review Method  Extract Most Frequent Terms (Single Word) by Comparison of Term Frequencies in a General Corpus (German: SDA, English: LA Times) vs. Medical Corpus Term Extraction EIT (Similarity Thesauri) Results  Single Word Terms (Springer Abstracts) German-English:104,904 / English-German: 49,454  Multiword Terms (Phrase Lexicon Generated from ICD10) German Phrases: 354 / English Phrases: 665 Bilingual Phrasal Entries Generated: German - English: 225 / English - German: 246

April 19 th,2002 MuchMore Project Review Method  For each word in one language, accumulate counts of the number of times the translations of the sentences containing that word include each word of the other language. These co-occurrence counts may be restricted using word-alignment techniques.  Apply a variable threshold to filter out uncommon co-occurrences which are unlikely to be translations. The result is a lexicon listing candidate translations and their relative frequencies. Results  ~ Bilingual Term Pairs (PubMed Parallel Abstracts) (Estimated Error Rate: < 10%) Term Extraction CMU (EBT Bilingual Lexicon)

Term Extraction: CSLI (Infomap System)
Represent English and German Words as Vectors that are Produced by Recording the Number of Co-Occurrences of the Word in Question with each of a Set of Content-Bearing Words. Use (Cosine) Similarity Measure on these Rows to Find "Nearest Neighbours".

[Figure: co-occurrence matrix with 1,000 (English) content-bearing words as columns; rows for English words (e.g. ligament) and German words (e.g. Kreuzband, Kniegelenk); parallel English/German text example: ligament, knee joint]

Term (EN)        SIM    Term (DE)            SIM
bone             1.00   knochen              0.82
cancellous       0.70   knochens             0.71
osteoinductive   0.67   knochenneubildung    0.67
demineralized    0.65   spongiosa            0.64
trabeculae       0.64   knochenresorption    0.60
formation        0.60   allogenen            0.60
periosteum       0.56   knöcherne            0.59
…                …      …                    …
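A toy version of the vector construction and nearest-neighbour lookup might look like the sketch below. It is not the Infomap system: it assumes that each input "sentence" is an aligned English/German unit concatenated into one string, that co-occurrence is counted at sentence level, and that no dimensionality reduction is applied. All function names are hypothetical.

```python
import math


def build_vectors(sentences, content_words):
    """Map each word to its co-occurrence counts with the content-bearing words."""
    index = {w: i for i, w in enumerate(content_words)}
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        present = [index[t] for t in tokens if t in index]
        for token in set(tokens):
            vec = vectors.setdefault(token, [0] * len(content_words))
            for i in present:
                if content_words[i] != token:  # skip self co-occurrence
                    vec[i] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=3):
    """Return the k words whose vectors are most similar to that of `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

Because German and English words share the same content-word dimensions, a German query word can retrieve English neighbours and vice versa.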

WSD: Terms, Senses (Semantic Resource Extension and Tuning)
Tuning (CSLI, DFKI)
- Aligning Clusters with Senses
  C |GER|P|L |PF|S |Frauen|3|
  C |ENG|P|L |PF|S |Human adult females|0|
Extension (DFKI)
- Morphological Analysis (Decomposition)
  Entzündungsgewebe (infection tissue) HYPONYM
    Gewebe, Körpergewebe (body tissue)
    Gewebe, Stoff, Textilstoff (textile)
- Semantic Similarity (Co-Occurrence Patterns)
  Karzinom (carcinoma), Metastase (metastasis) SYNONYM
    Geschwulst, Tumor, ....

WSD: Algorithm, Combination of Methods (Task, Domain, General)
Bilingual Sense Selection (CSLI)
- 1 Sense in L1 vs. >1 Sense in L2
  English: blood vessel (C ) vs. vessel (polysaccharide) (C )
  German: Blutgefaesse = blood vessel (C )
Collocations and Senses (CSLI)
- For an ambiguous single word term that is part of several unambiguous multiword terms, choose the sense of the most frequent multiword term.
  single word term: abortion
    1) a natural process C (T047)
    2) a medical procedure C (T061)
  multiword term:
    recurrent abortion C (T047) => sense 1
    induced abortion C (T061) => sense 2
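The collocation heuristic can be sketched in a few lines. This is an illustration under assumptions, not the CSLI code: the input is taken to be a list of multiword term occurrences already identified in the corpus, senses are represented by the semantic type labels from the example (T047, T061), and the function name is hypothetical.

```python
from collections import Counter


def sense_from_collocations(ambiguous_word, multiword_senses, corpus_terms):
    """Pick a sense for `ambiguous_word` from its unambiguous multiword terms.

    multiword_senses: {multiword term: sense label}, each term unambiguous.
    corpus_terms: iterable of multiword term occurrences found in the corpus.
    Returns the sense of the most frequent matching term, or None.
    """
    freq = Counter(
        term for term in corpus_terms
        if term in multiword_senses and ambiguous_word in term.split()
    )
    if not freq:
        return None  # no evidence in the corpus
    most_frequent_term, _ = freq.most_common(1)[0]
    return multiword_senses[most_frequent_term]
```

With the slide's example, a corpus in which "induced abortion" occurs more often than "recurrent abortion" would assign the medical-procedure sense (T061) to the single word "abortion".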

WSD: Algorithm, Combination of Methods (Task, Domain, General)
Domain Specific Senses (DFKI)
- Concept Relevance in Domain Corpus
  Mineral
    : Mineralstoff, Eisen, Ferrum, Fluor, Kalzium, Magnesium
    E-5: Allanit, Alumogel, ..., Axionit, Beryll, ... Wurtzit, Zirkon
Instance-Based Learning (DFKI)
- Unsupervised Context Models (n-grams)
  Training (Learn Class Models): He drank …
  Application (Apply Class Models): He drank …

April 19 th,2002 MuchMore Project Review  Ambiguous: MeSH EN: 847 (2.5), DE: 780 (2.1); EWN EN: 6300 (2.8) DE: 4059 (1.5)  Evaluation (Nouns): GermaNet (40), English MeSH (59), German MeSH (28) WSD: Evaluation Lexical Sample Evaluation Corpora (Medical) Band (tape, strap. ligament) Fall (drop, case, instance) Gefäss (jar, vessel) Operation (operation, surgery) Prüfung (survey, tryout, checkup) Verletzung (injury, trauma) Wahl (ballot, choice, option) Lage (site, status, position, layer) Gewicht (weight, importance) ……

April 19 th,2002 MuchMore Project Review  Robust, Shallow Grammatical Function Tagger  EM Model (Trained on Frankfurter Rundschau: 35M Tokens, Adaptation on Medical Corpora Under Development) 1.5M ‘Types’: Verb, Voice, Function, Nom-Head-Argument abarbeiten ACT SUBJ Politiker  Use of PoS Information, Use of Chunk Information Planned  Tags for SUBJ, OBJ, IOBJ, ACT/PAS  German Available, English under Development Untersucht wurden 30 Patienten, die sich einer elektiven aortokoronaren Bypassoperation unterziehen mussten. Relation Extraction Grammatical Function Tagging (DFKI)

Relation Extraction: Semantic Relation Indicators (DFKI, CSLI), Novel Semantic Relations (DFKI, CSLI)
Cluster 1
  T047/T060 (Diagnoses)
  T060/T101 (Affects)
  T060/T
Cluster 2
  T101/T169
  T101/T184
  T101/T
Cluster 3
  T047/T121 (Treats, Causes)
  T061/T121 (Uses)
  T121/T184 (Treats)
  ...
Indicator verbs: differentiate, conclude, discriminate, diagnose, illustrate, suffer, demonstrate, progress, develop, die, reduce, treat, follow, diagnose, cure
Semantic types
  T047: Disease
  T048: Mental Dysfunction
  T060: Diagnostic Procedure
  T101: Patient
  T121: Pharm. Substance
  T169: Funct. Concept (Syndrome)
  T184: Sign or Symptom

Summarization (CMU): Extractive Summarization
Maximal Marginal Relevance (MMR)
- Find passages most relevant to query
- Maximize information novelty (minimize passage redundancy)
- Assemble extracted passages for summary

$$\operatorname*{Arg\,max}^{k}_{d_i \in C} \left[ \lambda\, S(Q, d_i) - (1-\lambda) \max_{d_j \in R} S(d_i, d_j) \right]$$

  Q = query, d = document, S = similarity function
  λ = tradeoff factor between relevance & novelty
  k = number of passages to include in summary
Applications
- Re-ranking retrieved documents from IR Engine
- Ranking passages from a document for inclusion in summaries
- Ranking passages from topically-related document cluster for cluster summary
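A greedy selection loop is one common way to apply the MMR criterion. The sketch below is a minimal illustration, not the CMU summarizer: it assumes that C is the pool of candidate passages and R the set of passages already selected, and it uses a simple Jaccard word overlap as a stand-in for the similarity function S. The names `mmr_select` and `jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity, used here only as a stand-in for S."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(query, passages, similarity=jaccard, k=3, lam=0.7):
    """Greedily pick up to k passages by Maximal Marginal Relevance.

    lam = 1.0 ranks by relevance only; lower values penalise redundancy
    with passages that were already selected.
    """
    selected = []
    candidates = list(passages)
    while candidates and len(selected) < k:
        def score(passage):
            # Redundancy is 0 while nothing has been selected yet
            redundancy = max(
                (similarity(passage, s) for s in selected), default=0.0
            )
            return lam * similarity(query, passage) - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low λ the second pick favours a passage that differs from the first one even if it is less similar to the query, which is the novelty behaviour the slide describes.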

April 19 th,2002 MuchMore Project Review  MMR applies to English and German –Genre-based specialization (e.g. include conclusions for scientific articles) –Linguistic specialization possible  Summarization should apply when retrieving FULL articles  query-driven summaries instead of generic abstracts MuchMore Application TaskQuery-Relevant (focused)Query-Free (generic) INDICATIVE, for Filtering (Do I read further?) To filter search engine resultsShort abstracts CONTENTFUL, for reading in lieu of full doc. To solve problems for busy professionals Executive summaries  INDICATIVE and QUERY-RELEVANT Summarization (CMU)

April 19 th,2002 MuchMore Project Review  Test Collection: Springer Abstracts (German and English)  Query Set: 25 of 126 Selected by ZInfo  Relevance Assessments Assumption : Documents Retrieved by all Runs for one Query (Intersection) are Relevant Pool Size : 500 Documents Based on 18 Runs Done by CMU, CSLI and EIT German (ZInfo): 959 Relevant Documents English (CMU): 500 Relevant Documents (1 judge) 964 Relevant Documents (3 judges) Technical Evaluation Test Data

April 19 th,2002 MuchMore Project Review  Corpus BasedSimilarity Thesaurus (EIT) Example-based Translation (CMU) Pseudo Relevance Feedback (CMU) Generalized Vector Space Model (CMU)  Hybrid Classification (CMU) H ierarchical: kNN, Rocchio Flat: kNN, Rocchio-style Classifier Semantic Annotation + Extraction (DFKI, XRCE) UMLS / XRCE Terms & Semantic Relations EuroWordNet Terms Semantic Annotation + Similarity Thesaurus Technical Evaluation Methods Evaluated

Technical Evaluation: TREC-Style Performance Measurements
Overall Performance
- 11-Point Average Precision (Interpolated)
Performance in the High-Precision Area
Assumption: User Wants to Get Most Relevant Documents Top-Ranked within the Result List
- Average Interpolated Precision at Recall of 0.1
- Exact Precision after 10 Retrieved Documents
Applied to Experiments Evaluating Semantic Annotations
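The three measures can be computed for a single query as in the sketch below. It follows the standard TREC-style definitions (interpolated precision at recall level r is the highest precision at any rank where recall is at least r); the function names and the per-query input format are assumptions, and averaging over queries is left out.

```python
def precision_at_k(ranked, relevant, k=10):
    """Exact precision after k retrieved documents."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def interpolated_precision(ranked, relevant, recall_level):
    """Highest precision at any rank where recall >= recall_level."""
    if not relevant:
        return 0.0
    best = 0.0
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            # Precision can only peak at ranks holding a relevant document
            if hits / len(relevant) >= recall_level:
                best = max(best, hits / rank)
    return best


def eleven_point_average_precision(ranked, relevant):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return sum(
        interpolated_precision(ranked, relevant, level) for level in levels
    ) / 11
```

For a ranking `["d1", "d2", "d3", "d4"]` with relevant documents `{"d1", "d3"}`, interpolated precision is 1.0 up to recall 0.5 and 2/3 above it, so the 11-point average is 28/33.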

Technical Evaluation: Results, Corpus Based Methods
Data Sets
- EIT: The Springer Parallel Corpus, i.e. Documents for English, and 9640 Documents for German
- CMU: Half of the Corpus, i.e. a Test Set with 4820 Documents in each Language

System (columns: Eng-Eng, Ger-Ger, Ger-Eng, Eng-Ger)
  Monolingual EIT: lnu.ltn               N/A
  Crosslingual EIT: SimThes & lnu.ltn    N/A
  Monolingual PRF                        N/A
  Crosslingual PRF                       N/A
  EBT: chi-squared                       N/A
  Crosslingual GVSM                      (first evaluation to be completed in July, 2002)

Technical Evaluation: Results, Hybrid Methods
Categorization (Preliminary Results)
- Reuters-21578: 10,000+ documents, 90 categories
- Reuters Corpus Volume 1, TREC-10 version (RCV1): 783,484 documents, 84 categories
- Reuters Koller & Sahami subsets (ICML'98): 138 to 939 documents, 6-11 categories in a set
- OHSUMED: 233,445 documents, 14,321 categories

System     Data Set           Macro-avg F1     Micro-avg F1
kNN        Reuters
Rocchio    Reuters
kNN        RCV1.TREC-10       (F0.5 = .44)     (F0.5 = .55)
Rocchio    RCV1.TREC-10       (F0.5 = .39)     (F0.5 = .49)
kNN        R-KS Subsets (3)   .85, .81, .97    .89, .80, .94
HkNN       R-KS Subsets (3)   .85, .80, .98    .86, .82, .99
Rocchio    R-KS Subsets (3)   .80, .75, .96    .82, .83, .96
HRocchio   R-KS Subsets (3)   .83, .81, .98    .78, .84, .99
kNN        OHSUMED
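The difference between the two F1 columns can be made concrete with a small sketch. It uses the standard definitions (macro-averaging takes the mean of per-category F1 scores, micro-averaging pools the counts over all categories first); the input format of per-category (true positive, false positive, false negative) counts and the function names are assumptions for illustration.

```python
def f1(tp, fp, fn):
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(counts):
    """counts: {category: (tp, fp, fn)}. Returns (macro_f1, micro_f1)."""
    # Macro: every category counts equally, however rare it is
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the decisions, so frequent categories dominate
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return macro, f1(tp, fp, fn)
```

A classifier that does well on a frequent category and poorly on a rare one scores noticeably higher on micro-averaged than on macro-averaged F1, which matters for collections with thousands of sparsely populated categories such as OHSUMED.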

Technical Evaluation: Results, Hybrid Methods
Semantic Annotation + Extraction
- Data Set: Full Springer Corpus
- Weighting Scheme: Coordination Level Matching (CLM)
  1. Pass: Documents Preferred Containing Matching Terms or Semantic Relations
  2. Pass: All Features Using lnu.ltn
- Rel. Assessments: German

System                          11pt AvPrec          Prec at Recall of 0.1   Prec at 10 Docs Retr
                                SemA-v3   SemA-v4    SemA-v3   SemA-v4       SemA-v3   SemA-v4
EN2DE: Morph & EWN
EN2DE: Morph & UMLS
EN2DE: Morph & UMLS & XRCE
DE2EN: Morph & EWN
DE2EN: Morph & UMLS

Technical Evaluation: Results, Hybrid Methods
Semantic Annotation + Similarity Thesaurus
- Data Set: Full Springer Corpus
- Weighting Scheme: Coordination Level Matching (CLM)
- Rel. Assessments: German

System                                      11pt AvPrec   Prec at Recall of 0.1   Prec at 10 Docs Retr
EN2DE: transl. Morphology & EWN
EN2DE: transl. Morphology & UMLS
EN2DE: transl. Morphology & UMLS & XRCE
DE2EN: transl. Morphology & EWN
DE2EN: transl. Morphology & UMLS

Technical Evaluation: Summary of the Results
Assumption: CLIR achieves up to 75% of Monolingual Baseline (11pt Average Precision)
- Corpus-based Methods (Compared to Monolingual PRF)
  German - English: PRF: 81%, EBT: 77%, EIT: 66%
  English - German: PRF: 113%, EBT: 106%, EIT: 60%
- Hybrid Methods (Compared to Monolingual EIT)
  German - English: 73% (UMLS Terms & SemRels)
  English - German: 50% (UMLS Terms & SemRels)
  English - German: 80% (UMLS Terms & SemRels & XRCE Terms)
  German - English: 74% (SimThes & UMLS Terms & SemRels)
  English - German: 80% (SimThes & UMLS Terms & SemRels)
  English - German: 92% (SimThes & UMLS Terms & SemRels & XRCE Terms)

Management: Deviations from the Work Plan
Corpus Collection
- Comparable Medical Document Corpora are Very Difficult to Obtain, Anonymization Must be Validated by Hospital CIO
- Work with "Shuffled" Parallel Corpus
- Radiology Reports (~ ) Available in German, to be Obtained for English
Corpus Annotation
- More Efforts on Improving PoS Tagging and Morphological Analysis (English and German Medical Specialist Lexicon)
Relation Extraction
- More Efforts on Grammatical Function Tagging as Preprocessing for Semantic Relation Tagging and Extraction

Management: Future Prospects and Activities
R&D Topics
- Ontology Development
  Combining Axes in AGK-Thesaurus (ZInfo) with Cluster Methods (CSLI, DFKI)
- Semantic Web
  Semantic Annotation of Medical Documents with Metadata (UMLS in Protégé)
Related Projects and Workshops
- Project Proposal IKAR/OS on KM & Visualization in Life Sciences
- OntoWeb SIG on LT in Ontology Development and Use
- MuchMore Workshop with Invited Experts in Medical Information Access, CLIR and Semantic Annotation (September 2002)
- ZInfo/MuchMore Workshop on Electronic Patient Records (Spring 2003)