Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval
Paul Clough and Mark Stevenson


Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval
Paul Clough (1) and Mark Stevenson (2)
(1) Department of Information Studies
(2) Department of Computer Science
University of Sheffield, UK

GWC th January 2004
Outline
Introduction
Word sense disambiguation
Experimental setup
CLIR evaluation
WSD evaluation
Discussion and conclusion

Introduction
CLIR: searching for documents written in one language (target) with queries written in another (source)
Approaches: translate the query, the documents, or both
Translation methods: e.g. machine translation (MT), machine-readable dictionaries (MRDs), parallel corpora, controlled vocabulary
Problems: e.g. lexical coverage, ambiguity, small context, proper names, compound words
WSD: identifies the correct sense of a word during translation
Experiments: carried out with EuroWordNet and “standard” IR test collection resources

Example translation
Source query (Number: CL1): Caso Waldenheim
EuroWordNet entries:
caso#1 --> [case#9:grammatical case#1:]( ) "nouns or pronouns or adjectives (often marked by inflection) related in some way to other words in a sentence"
caso#2 --> [case#12:instance#2:]( ) "an occurrence of something; "it was a case of bad judgment""
caso#3 --> [case#16:event#2:]( ) "a special set of circumstances; "in that event, the first possibility is excluded""
Disambiguation needed?
Target query: Case (event) Waldenheim

Word sense disambiguation
Each Spanish noun can be associated with multiple synsets, and each of these can in turn be mapped to multiple synsets in the ILI (English WN)
WSD is used to automatically identify the EuroWordNet synset appropriate to the query
Adapted Resnik’s algorithm for disambiguating groups of nouns:
– Treats EuroWordNet as a hierarchy and identifies the most likely synsets based on distance in WordNet and corpus information
– Treats the query as a “bag of words”
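The slide does not spell out how Resnik's algorithm was adapted, so the following is only a toy sketch in the spirit of noun-group disambiguation: each pair of query nouns credits the senses that sit under their most informative shared ancestor. The hierarchy `PARENT`, the information-content values `IC`, and the sense inventory `SENSES` are all invented for illustration and are not taken from EuroWordNet.

```python
from itertools import combinations

# Hypothetical toy hierarchy: synset -> parent (None marks the root).
PARENT = {
    "entity": None,
    "event": "entity",
    "legal_case": "event",
    "trial": "event",
    "container": "entity",
    "box_case": "container",
}

# Hypothetical information-content values (higher = more specific).
IC = {
    "entity": 0.0, "event": 2.0, "legal_case": 5.0,
    "trial": 5.0, "container": 3.0, "box_case": 6.0,
}

# Hypothetical sense inventory: word -> candidate synsets.
SENSES = {"case": ["legal_case", "box_case"], "trial": ["trial"]}


def ancestors(synset):
    """Return the synset together with everything above it in the hierarchy."""
    found = set()
    while synset is not None:
        found.add(synset)
        synset = PARENT[synset]
    return found


def disambiguate(words):
    """Score each sense of each word (words are assumed to be distinct)."""
    support = {w: {s: 0.0 for s in SENSES[w]} for w in words}
    norm = {w: 0.0 for w in words}
    for w1, w2 in combinations(words, 2):
        anc1 = set().union(*(ancestors(s) for s in SENSES[w1]))
        anc2 = set().union(*(ancestors(s) for s in SENSES[w2]))
        shared = anc1 & anc2
        if not shared:
            continue
        # Most informative subsumer of the word pair.
        best = max(shared, key=IC.get)
        value = IC[best]
        for w in (w1, w2):
            for s in SENSES[w]:
                if best in ancestors(s):
                    support[w][s] += value
            norm[w] += value
    scores = {}
    for w in words:
        if norm[w] > 0:
            scores[w] = {s: support[w][s] / norm[w] for s in SENSES[w]}
        else:
            # No evidence from other words: fall back to a uniform score.
            scores[w] = {s: 1.0 / len(SENSES[w]) for s in SENSES[w]}
    return scores


result = disambiguate(["case", "trial"])
```

With this toy data, "trial" pulls "case" towards the sense that shares the "event" ancestor. A single-word query receives uniform scores, which matches the point made later in the slides that short queries give the algorithm little context to work with.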

Experimental setup
TREC 6 collection (242,918 documents and 25 queries)
Spanish used for cross-language (CL) retrieval, with English as the monolingual baseline
Query translation process: term identification --> term translation (EWN) --> retrieval
EWN transformed into a kind of MRD for translation
Focused on translation of nouns and adjectives
Synset selection: manual, first, all, or WSD algorithm
Synset member selection: head (first) or all
Experimented with short queries (title) and longer queries (title + description)

Example translation
Source query (Number: CL1): Caso Waldenheim
EuroWordNet entries:
caso#1 --> [case#9:grammatical case#1:]( ) "nouns or pronouns or adjectives (often marked by inflection) related in some way to other words in a sentence"
caso#2 --> [case#12:instance#2:]( ) "an occurrence of something; "it was a case of bad judgment""
caso#3 --> [case#16:event#2:]( ) "a special set of circumstances; "in that event, the first possibility is excluded""
Disambiguation needed?
Translations by selection strategy:
1st sense, head: case Waldenheim
1st sense, all words: case grammatical case Waldenheim
all senses, head: case Waldenheim
all senses, all words: case grammatical case instance event Waldenheim
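The four translation variants in this example can be reproduced with a small sketch. The synset members for "caso" are copied from the slide; the function names, the pass-through of terms missing from the lexicon (such as the proper name), and the removal of duplicate translations are assumptions inferred from the example output, not a description of the authors' implementation.

```python
# Synsets for Spanish "caso" as listed on the slide; each synset is an
# ordered list of English members, with the head word first.
EWN = {
    "caso": [
        ["case", "grammatical case"],
        ["case", "instance"],
        ["case", "event"],
    ],
}


def translate_term(term, synsets="first", members="head"):
    """Translate one term; synsets is 'first' or 'all', members is 'head' or 'all'."""
    candidates = EWN.get(term.lower())
    if candidates is None:
        # Assumption: terms not in the lexicon are passed through unchanged.
        return [term]
    chosen = candidates[:1] if synsets == "first" else candidates
    translations = []
    for synset in chosen:
        words = synset[:1] if members == "head" else synset
        for word in words:
            if word not in translations:  # assumption: drop duplicates
                translations.append(word)
    return translations


def translate_query(query, synsets="first", members="head"):
    """Translate a whitespace-tokenised query term by term."""
    result = []
    for term in query.split():
        result.extend(translate_term(term, synsets, members))
    return result


first_head = translate_query("Caso Waldenheim", "first", "head")
first_all = translate_query("Caso Waldenheim", "first", "all")
all_head = translate_query("Caso Waldenheim", "all", "head")
all_all = translate_query("Caso Waldenheim", "all", "all")
```

Manual (gold) selection and WSD-based selection would slot in as further ways of choosing `chosen`; they are omitted here because the slide does not give the data needed to illustrate them.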

CLIR evaluation (title & description)
Measured MAP and relevant retrieved using trec_eval
Baseline: MAP = , relevant retrieved = 979
Table columns: Synset selection | Synset members | Relevant retrieved | MAP
Table rows: synset selection GOLD, All, 1st and WSD, each with synset members All and 1st [cell values not recoverable]
Annotations: % monolingual; Highest (72% monolingual)
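The two reported measures can be illustrated with a short sketch. This is not trec_eval itself, only a minimal re-implementation of the standard definitions of average precision, MAP, and relevant retrieved, plus the "% monolingual" ratio used in the annotations. The document and query identifiers are invented.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: mean of precision at each relevant hit,
    divided by the total number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP: mean of per-query average precision over all judged queries."""
    total = sum(average_precision(runs.get(q, []), qrels[q]) for q in qrels)
    return total / len(qrels)


def relevant_retrieved(runs, qrels):
    """Total number of relevant documents retrieved across all queries."""
    return sum(len(set(runs.get(q, [])) & set(qrels[q])) for q in qrels)


def percent_of_monolingual(clir_map, monolingual_map):
    """Cross-language MAP expressed as a percentage of the monolingual baseline."""
    return 100.0 * clir_map / monolingual_map


# Hypothetical rankings and relevance judgements for two queries.
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d9", "d8"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d8"}}

map_score = mean_average_precision(runs, qrels)
rel_ret = relevant_retrieved(runs, qrels)
```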

CLIR evaluation (title only)
Baseline: MAP = , relevant retrieved = 977
Table columns: Synset selection | Synset members | Relevant retrieved | MAP
Table rows: synset selection GOLD, All, 1st and WSD, each with synset members All and 1st [cell values not recoverable]
Annotations: % monolingual; Highest (76% monolingual)

WSD evaluation
Manual annotation identifies a single correct sense for each noun; the WSD algorithm can return multiple senses
Calculated two evaluation metrics:
– Relaxed: score 1 if the correct sense is identified; corresponds to the proportion of words for which the correct sense is included
– Strict: score 1/m if the correct sense is among the m senses returned; indicates how many incorrect senses are returned
“Choose first synset” used as naïve baseline
(Diagram: m returned senses, with the correct sense marked)
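The two metrics defined above can be sketched as follows. The scoring rules follow the slide (1 under relaxed scoring and 1/m under strict scoring when the correct sense is among the m returned); averaging over words, the function names, and the sense labels in the example are assumptions made for illustration.

```python
def score_word(returned_senses, correct_sense):
    """Return (strict, relaxed) scores for a single word."""
    senses = set(returned_senses)
    if correct_sense not in senses:
        return 0.0, 0.0
    # Strict divides the credit among all m senses returned.
    return 1.0 / len(senses), 1.0


def evaluate(predictions, gold):
    """Average strict and relaxed scores over every annotated word."""
    strict_total = 0.0
    relaxed_total = 0.0
    for word, correct_sense in gold.items():
        strict, relaxed = score_word(predictions.get(word, []), correct_sense)
        strict_total += strict
        relaxed_total += relaxed
    return strict_total / len(gold), relaxed_total / len(gold)


# Hypothetical system output and manual annotation for three nouns.
predictions = {
    "caso": ["case#16", "case#12"],  # correct sense among two returned
    "banco": ["bank#1"],             # single correct sense
    "planta": ["plant#2"],           # wrong sense
}
gold = {"caso": "case#16", "banco": "bank#1", "planta": "plant#1"}

strict_score, relaxed_score = evaluate(predictions, gold)
```

A system that always returns exactly one sense, such as the "choose first synset" baseline, gets identical strict and relaxed scores under these definitions.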

WSD evaluation
Language | Method | Strict | Relaxed
English | WSD | |
English | first | 0.47
Spanish | WSD | |
Spanish | first | 0.48
WSD results are disappointing compared to the state of the art
The limited context of queries seems to make disambiguation difficult
BUT this does not seem to affect CLIR results!

Discussion and conclusions
Disagreement over the usefulness of WSD for monolingual retrieval:
– WSD algorithms have to be accurate to be useful for retrieval
– the IR algorithm performs a kind of disambiguation anyway
Our results suggest that some WSD is better than none for CLIR when EWN is the translation resource, even with poor WSD performance
The WSD algorithm is well suited to CLIR in that it selects senses only when there is sufficient context
Experiments highlight limitations of EWN for CLIR: many types of useful semantic information are missing, and lexical coverage is limited

Future work
Experiment with other languages supported by EWN to see whether the results generalise
Experiment with different datasets (e.g. CLEF) and further bilingual pairs, e.g. English --> Spanish
Use advanced query construction techniques, e.g. the “synonym” operator to combine synset members
Combine various WSD algorithms to improve on their individual effectiveness
Improve the EWN-based translation process, e.g. by identifying phrases