SENSEVAL: Evaluating WSD Systems


SENSEVAL: Evaluating WSD Systems Jason Blind & Lisa Norman College of Computer and Information Science Northeastern University Boston, MA 02115 January 25, 2006

What is SENSEVAL?

- Mission: To organize and run evaluations that test the strengths and weaknesses of WSD systems.
- Underlying goal: To further human understanding of lexical semantics and polysemy.
- History: Began as a workshop in April 1997, organized by ACL-SIGLEX.
- When: 1998, 2001, 2004, 2007?

What is WSD?

- Machine Translation: English "drug" translates into French as either "drogue" or "médicament".
- Information Retrieval: If a user queries for documents about drugs, do they want documents about illegal narcotics or about medicine?

How do people disambiguate word senses?

- Grammatical context: "AIDS drug" (modified by a proper name).
- Lexical context: if "drug" is followed by {addict, trafficker, etc.}, the proper translation is most likely "drogue".
- Domain-based context: if the text/document/conversation is about {disease, medicare, etc.}, then "médicament" is most likely the correct translation.
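The lexical- and domain-context cues above can be sketched as a toy rule. The cue-word sets and the fallback choice below are illustrative assumptions, not drawn from any real lexicon:

```python
# Toy illustration of the slide's lexical/domain context cues for
# translating English "drug" into French. The cue-word sets are
# illustrative assumptions, not from any real lexicon.

NARCOTIC_CUES = {"addict", "trafficker", "illegal", "abuse"}
MEDICINE_CUES = {"disease", "medicare", "doctor", "prescription"}

def translate_drug(context_words):
    """Choose a French translation of 'drug' from surrounding words."""
    words = {w.lower() for w in context_words}
    if words & NARCOTIC_CUES:
        return "drogue"
    if words & MEDICINE_CUES:
        return "médicament"
    return "drogue"  # fall back to one reading when no cue fires (assumption)

print(translate_drug(["the", "drug", "trafficker"]))  # → drogue
```

Real systems replace the hand-picked cue sets with features learned from tagged corpora, but the decision being made is the same one the slide describes.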

Evaluating WSD Systems

1. Definition of the task
2. Selection of the data to be used for the evaluation
3. Production of correct answers for the evaluation data
4. Distribution of the data to the participants
5. Participants use their programs to tag the data
6. Administrators score the participants' tagging
7. Participants and administrators meet to compare notes and learn lessons
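The scoring step above can be sketched with the metrics SENSEVAL reports: precision over the instances a system attempted, recall over all instances. This is a simplified sketch that assumes exactly one sense per answer; the real SENSEVAL scorer also handles weighted multiple answers and fine- vs. coarse-grained matching.

```python
def score_run(system_answers, gold):
    """Score a WSD run: precision over attempted items, recall over all items.

    Simplified sketch: assumes one sense per answer; the real SENSEVAL
    scorer also supports weighted multiple answers.
    """
    attempted = [i for i in gold if i in system_answers]
    correct = sum(1 for i in attempted if system_answers[i] == gold[i])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    coverage = len(attempted) / len(gold) if gold else 0.0
    return precision, recall, coverage

gold = {"art.n.1": "s1", "art.n.2": "s2", "art.n.3": "s1"}
answers = {"art.n.1": "s1", "art.n.2": "s1"}  # one wrong, one unattempted
print(score_run(answers, gold))  # precision 0.5, recall ~0.33, coverage ~0.67
```

Separating precision from recall matters because a system may abstain on hard instances; coverage makes that trade-off visible.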

Evaluation Tasks

- All-Words
- Lexical Sample
- Multilingual Lexical Sample
- Translation
- Automatic Sub-categorization Acquisition (?)
- WSD of WordNet Glosses
- Semantic Roles (FrameNet)
- Logic Forms (FOPC)

SENSEVAL-1

Comprised WSD tasks for English, French, and Italian. 17 systems were evaluated.

Timetable:
1. A plan for selecting evaluation materials was agreed.
2. Human annotators generated the 'gold standard' set of correct answers.
3. The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their answer sets to the organizers.
4. The organizers scored the returned answer sets, and scores were announced and discussed at the workshop.

SENSEVAL-1 : Tasks

Lexical Sample: first, a sample of words is carefully selected from the lexicon (based upon BNC frequency and WordNet polysemy levels); systems must then tag several corpus instances of the sample words in short extracts of text.

Advantages over an all-words sample:
- More efficient human tagging.
- The all-words task requires access to a full dictionary.
- Many systems needed either sense-tagged training data or some manual input for each dictionary entry, so all-words would be infeasible.

Task breakdown:
- 15 noun tasks
- 13 verb tasks
- 8 adjective tasks
- 5 indeterminate tasks

SENSEVAL-1 : Dictionary & Corpus

Hector: a joint Oxford University Press/Digital project that developed a database linking a dictionary to a 17M-word corpus. It was chosen, at a time when SENSEVAL was unsure whether it would have funding to pay humans to sense-tag text, because it was already sense-tagged. One disadvantage is that the OUP-delivered corpus instances came with very little context (usually 1-2 sentences).

SENSEVAL-1 : Data

- Dry-run distribution
- Training-data distribution: 20,000+ instances of 38 words.
- Evaluation distribution: a set of corpus instances for each task. Each instance had been tagged by at least 3 humans (these tags were obviously not part of the distribution :^) There were 8,448 corpus instances in total; most tasks had between 80 and 400 instances.

SENSEVAL-1 : Baselines

Lesk's algorithm:
- Dictionary-based
- Corpus-based, unsupervised
- Corpus-based, supervised

SENSEVAL-1 : Results (English)

- State of the art, where training data is available, is 75%-80%.
- When training data is available, systems that use it perform substantially better than those that do not.
- A well-implemented simple Lesk algorithm is hard to beat.
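That "simple Lesk" baseline can be sketched as gloss-overlap counting: pick the sense whose dictionary gloss shares the most words with the target word's context. The two-sense mini-dictionary below is an illustrative assumption, not an entry from Hector or WordNet:

```python
# Simplified Lesk: choose the sense whose gloss overlaps most with the
# context. The two glosses below are illustrative assumptions, not
# taken from any real dictionary.

GLOSSES = {
    "drug/narcotic": "an illegal substance taken for its effects, often causing addiction",
    "drug/medicine": "a substance used as a medicine in the treatment of disease",
}

def simple_lesk(context, glosses=GLOSSES):
    ctx = set(context.lower().split())
    def overlap(sense):
        # Count shared word tokens between context and gloss.
        return len(ctx & set(glosses[sense].lower().split()))
    return max(glosses, key=overlap)

print(simple_lesk("the doctor prescribed a drug for the disease"))  # → drug/medicine
```

Stronger variants stem and stop-list the words and extend glosses with example sentences, but even this bare version captures why the baseline is competitive.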

SENSEVAL-2 : Tasks

- All-Words: systems must tag almost all of the content words in a sample of running text.
- Lexical Sample: first, a sample of words is carefully selected from the lexicon; systems must then tag several instances of the sample words in short extracts of text. 73 words = 29 nouns + 15 adjectives + 29 verbs.
- Translation (Japanese only): a task in which word sense is defined according to translation distinctions. (By contrast, SENSEVAL-1 evaluated systems only on lexical sample tasks, in English, French, and Italian.)

SENSEVAL-2 : Dictionary & Corpus

- Sense dictionary: WordNet 1.7
- Corpus: Penn Treebank II Wall Street Journal articles; British National Corpus (BNC)

SENSEVAL-2 : Data

- Dry-run distribution
- Training-data distribution: 12,000+ instances of 73 words.
- Evaluation distribution: a set of corpus instances for each task. Each instance had been tagged by at least 3 humans (these tags were obviously not part of the distribution :^) There were ? corpus instances in total.

SENSEVAL-2 : Results

34 teams, 93 systems.

Language   Task  Submissions  Teams  IAA   Baseline  Best System
Czech      AW    1            -      -     -         .94
Basque     LS    3            2      .75   .65       .76
Estonian   AW    -            -      .72   .85       .67
Italian    LS    -            -      -     -         .39
Korean     LS    -            -      -     .71       .74
Spanish    LS    12           5      .64   .48       -
Swedish    LS    8            -      .95   -         .70
Japanese   LS    7            -      .86   -         .78
Japanese   TL    9            -      .81   .37       .79
English    AW    21           -      -     .57       .69
English    LS    26           15     -     .51/.16   .64/.40

SENSEVAL-3 : Tasks

- All-Words: English, Italian
- Lexical Sample: Basque, Catalan, Chinese, English, Italian, Romanian, Spanish, Swedish
- Multilingual Lexical Sample
- Automatic Sub-categorization Acquisition (?)
- WSD of WordNet Glosses
- Semantic Roles (FrameNet)
- Logic Forms (FOPC)

SENSEVAL-3 : Dictionary & Corpus

- Sense dictionaries: WordNet 2.0, eXtended WordNet, EuroWordNet, ItalWordNet 1.7, MiniDir-Cat, FrameNet
- Corpora: British National Corpus (BNC); Penn Treebank; Los Angeles Times; Open Mind Common Sense; SI-TAL (Integrated System for the Automatic Treatment of Language); MiniCors-Cat; etc.

SENSEVAL-3 : Data

- Dry-run distribution
- Training-data distribution: 12,000+ instances of 57 words.
- Evaluation distribution: a set of corpus instances for each task. Each instance had been tagged by at least 3 humans (these tags were obviously not part of the distribution :^)

Approaches to WSD

- Lesk-based methods
- Most Common Sense Heuristic
- Domain Relevance Estimation
- Latent Semantic Analysis (LSA)
- Kernel methods
- EM-based Clustering
- Ensemble Classification
- Maximum Entropy
- Naïve Bayes
- SVM
- Boosting
- KPCA
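Of the supervised approaches listed, Naïve Bayes is the easiest to sketch: score each sense by its prior probability times smoothed per-word likelihoods estimated from sense-tagged contexts, and pick the highest-scoring sense. The tiny training set below is an illustrative assumption:

```python
import math
from collections import Counter, defaultdict

# Toy Naive Bayes WSD over bag-of-words contexts. The four training
# examples are an illustrative assumption, not real annotated data.

def train(examples):
    """examples: list of (context_words, sense) pairs."""
    sense_counts = Counter(sense for _, sense in examples)
    word_counts = defaultdict(Counter)
    for words, sense in examples:
        word_counts[sense].update(words)
    return sense_counts, word_counts

def classify(context, sense_counts, word_counts):
    total = sum(sense_counts.values())
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for sense, n in sense_counts.items():
        lp = math.log(n / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            lp += math.log((word_counts[sense][w] + 1) / denom)  # add-one smoothing
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

data = [(["addict", "illegal"], "drogue"),
        (["trafficker"], "drogue"),
        (["disease", "doctor"], "médicament"),
        (["medicare"], "médicament")]
model = train(data)
print(classify(["illegal", "trafficker"], *model))  # → drogue
```

The kernel, ensemble, and boosting entries in the list above are essentially stronger replacements for this same classify-the-context step.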

References

- A. Kilgarriff. "An Exercise in Evaluating Word Sense Disambiguation Programs", 1998.
- A. Kilgarriff. "Gold Standard Datasets for Evaluating Word Sense Disambiguation Programs", 1998.
- R. Mihalcea, T. Chklovski, and A. Kilgarriff. "The SENSEVAL-3 English Lexical Sample Task", 2004.
- J. Rosenzweig and A. Kilgarriff. "English SENSEVAL: Report and Results", 1998.
- P. Edmonds. "The Evaluation of Word Sense Disambiguation Systems". ELRA Newsletter, Vol. 7, No. 3, 2002.
- M. Carpuat, W. Su, and D. Wu. "Augmenting Ensemble Classification for Word Sense Disambiguation with a Kernel PCA Model". SENSEVAL-3, 2004.