Coarse-grained Word Sense Disambiguation


Coarse-grained Word Sense Disambiguation
Jinying Chen, Martha Palmer
March 25th, 2003

Outline
- Motivation
- Supervised Verb Sense Grouping
- Unsupervised Verb Frameset Tagging
- Future Work

Motivation
- Fine-grained WSD is difficult for both humans and machines; well-defined sense groups can alleviate this problem (Palmer, Dang, and Fellbaum, 2002)
- Potential application in Machine Translation
- When building a WSD corpus, the sense hierarchy can (hopefully) improve annotators' tagging speed and accuracy

Outline
- Motivation
- Supervised Verb Sense Grouping
  - What's VSG?
  - Using Semantic Features for VSG
  - Building a Decision Tree for VSG
  - Experimental Results
- Unsupervised Verb Frameset Tagging
- Future Work

What's VSG?
[Diagram: the WordNet senses of a verb (WN1-WN20) are aggregated into verb sense groups, which in turn map onto PropBank framesets (Frameset1, Frameset2).]

Using Semantic Features for VSG
- PropBank
  - Each verb is defined by several framesets
  - All verb instances belonging to the same frameset share a common set of roles
  - Roles can be ARGn (n = 0, 1, ...) and ARGM-f
- Framesets are consistent with verb sense groups
- Frameset tags and roles serve as semantic features for VSG
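To make the feature design concrete, here is a hedged sketch of how one PropBank-annotated verb instance might be held in memory; the field names and values are invented for illustration and are not PropBank's actual file format:

```python
# Hypothetical in-memory view of one PropBank-annotated verb instance.
# Field names are invented for exposition; real PropBank releases use
# their own column-based annotation format.
instance = {
    "verb": "develop",
    "frameset": "01",                  # frameset tag for this instance
    "voice": "ACT",                    # ACT(ive) or PAS(sive)
    "args": {"ARG0": "the company",    # numbered arguments ARGn
             "ARG1": "a new product"},
    "argm": {"TMP": "last year"},      # functional adjuncts ARGM-f
}
```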

Building a Decision Tree for VSG
- Use C5.0 decision trees
- 3 feature sets; SF (the Simple Feature set) works best:
  - VOICE: PAS, ACT
  - FRAMESET: 01, 02, ...
  - ARGn (n = 0, 1, 2, ...): 0 (does not occur), 1 (occurs)
  - CoreFrame: 01-ARG0-ARG1, 02-ARG0-ARG2, ...
  - ARGM: 0 (no ARGM), 1 (has ARGM)
  - ARGM-f (f = DIS, ADV, ...): i (occurs i times)
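A minimal training sketch, assuming scikit-learn's DecisionTreeClassifier as a stand-in for the C5.0 package the slides actually used, with toy feature values rather than the original data:

```python
# Sketch: train a decision tree on the simple feature set (SF).
# scikit-learn stands in for C5.0; the two instances are toy data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

instances = [
    {"VOICE": "ACT", "FRAMESET": "01", "ARG0": 1, "ARG1": 1, "ARGM": 0},
    {"VOICE": "PAS", "FRAMESET": "02", "ARG0": 0, "ARG1": 1, "ARGM": 1},
]
labels = ["group1", "group2"]          # gold-standard sense groups

vec = DictVectorizer(sparse=False)     # one-hot encodes the string features
X = vec.fit_transform(instances)
clf = DecisionTreeClassifier().fit(X, labels)

test = {"VOICE": "ACT", "FRAMESET": "01", "ARG0": 1, "ARG1": 1, "ARGM": 0}
print(clf.predict(vec.transform([test])))   # -> ['group1']
```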

Experimental Results
Table 2: Error rate of the decision tree on five verbs

Discussion
- A simple feature set and a simple DT algorithm work well
- Potential sparse-data problem
  - Complicated DT algorithms (e.g., with boosting) tend to overfit the data
  - Complex features are not utilized by the model
  - Solution: use a large corpus, e.g., the parsed BNC corpus without frameset annotation

Outline
- Task Description
- Methodology
- Unsupervised Verb Frameset Tagging
  - EM Clustering for Frameset Tagging
  - Features
  - Preliminary Experimental Results
- Future Work

EM Clustering for Frameset Tagging
We treat a set of features extracted from the parsed sentences as observed variables and assume they are independent given a hidden variable c (the cluster):

P(c, f_1, \ldots, f_m) = P(c) \prod_{i=1}^{m} P(f_i \mid c)    (1)

[Figure: graphical model with a hidden cluster node c and observed feature nodes f_1, f_2, ..., f_m.]

In the expectation step, we compute the probability of c conditioned on the set of observed features:

P(c \mid f_1, \ldots, f_m) = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)}{\sum_{c'} P(c') \prod_{i=1}^{m} P(f_i \mid c')}    (2)

In the maximization step, we re-compute P(c) and P(f_i \mid c) by maximizing the log-likelihood of all the observed data. The expectation and maximization steps are repeated for a fixed number of rounds or until the change in the probability parameters P(c) and P(f_i \mid c) falls below a threshold.
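A compact sketch of this loop for binary features, assuming NumPy arrays and Laplace smoothing in the M-step (the smoothing is an assumption; the slides do not specify the estimator details):

```python
import numpy as np

def em_cluster(X, k, n_iter=50, seed=0):
    """Naive-Bayes-style EM clustering of binary feature vectors X (n x m).

    Returns P(c), P(f_i = 1 | c), and the soft assignments per instance.
    Laplace smoothing in the M-step is an assumption, not from the slides.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    R = rng.dirichlet(np.ones(k), size=n)        # R[j, c] ~ P(c | instance j)
    for _ in range(n_iter):
        # M-step: re-estimate P(c) and P(f_i | c) from the soft counts.
        pc = R.sum(axis=0) / n                                 # shape (k,)
        pf = (R.T @ X + 1) / (R.sum(axis=0)[:, None] + 2)      # shape (k, m)
        # E-step: recompute P(c | f_1..f_m) as in eq. (2), in log space.
        logp = (np.log(pc)[None, :]
                + X @ np.log(pf).T
                + (1 - X) @ np.log(1 - pf).T)
        logp -= logp.max(axis=1, keepdims=True)  # numerical stabilization
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
    return pc, pf, R
```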

To do clustering, we compute P(c \mid f_1, \ldots, f_m) for each verb instance with the same formula as in (2) and assign the instance to the cluster with the maximal posterior probability. To evaluate, we take the majority of the instances in each cluster, i.e., those sharing the same gold-standard frameset, as correctly classified; the remaining instances in the cluster are treated as misclassified.
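The evaluation just described is cluster purity; a minimal sketch, assuming hard cluster assignments and gold frameset labels given as parallel lists:

```python
from collections import Counter

def purity(assignments, gold):
    """Fraction of instances that fall in the majority gold frameset of
    their cluster; instances outside the majority count as misclassified."""
    clusters = {}
    for c, g in zip(assignments, gold):
        clusters.setdefault(c, []).append(g)
    correct = sum(Counter(golds).most_common(1)[0][1]
                  for golds in clusters.values())
    return correct / len(gold)

print(purity([0, 0, 1, 1, 1], ["01", "01", "02", "02", "01"]))  # -> 0.8
```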

Features
- WordNet classes for the Subject: Person, Animate, State, Event, ...
- WordNet classes for the Object
- Passivization: 0, 1
- Transitivity: 0, 1
- PPs as adjuncts: location, direction, beneficiary, ...
- Double objects: 0, 1
- Clausal complements: 0, 1
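For the WordNet-class features, a hedged illustration of mapping a head noun to a coarse class by walking its hypernym paths, using NLTK's WordNet interface (the original work predates NLTK and used its own WordNet lookup; the target class names here are hypothetical):

```python
from nltk.corpus import wordnet as wn

# Hypothetical coarse classes; actual WordNet synset names may differ
# (e.g. "animate thing" is lemmatized as living_thing in WordNet).
COARSE = {"person", "state", "event", "living_thing"}

def wordnet_class(noun):
    """Return the most specific coarse class on any hypernym path."""
    for syn in wn.synsets(noun, pos=wn.NOUN):
        # hypernym_paths() run root-to-leaf; reverse to check the most
        # specific synsets first.
        for path in syn.hypernym_paths():
            for hyp in reversed(path):
                name = hyp.name().split(".")[0]
                if name in COARSE:
                    return name
    return "other"

print(wordnet_class("teacher"))  # -> "person"
```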

Preliminary Experimental Results
Table 3: Accuracy of EM clustering on five verbs

Outline
- Task Description
- Methodology
- Unsupervised Verb Frameset Tagging
- Future Work

Future Work
- Improve the current model by:
  - Refining subcategorization extraction
  - Using more features
    Example:
    a. He has to live with this programming work. (live 02: endure)
    b. He lived with his relatives. (live 01: inhabit)
- Cluster nouns automatically instead of using WordNet to group nouns

Thanks!

Table 4: Lower bound on the decision tree error rate

Table 5: Error rate of the DT with different feature sets

Table 6: Accuracy of EM clustering on five verbs

What's VSG?
- Aggregate the senses of a verb into several groups according to their similarities
- Example: learn
  GROUP 1: WN1, WN3 (acquire a skill)
  GROUP 2: WN2, WN6 (find out)
  SINGLETON: WN4 (be a student)
  SINGLETON: WN5 (teach)
- WordNet meanings (simplified):
  1. acquire or gain knowledge or skills ("She learned dancing")
  2. hear, find out ("I learned that she has two grown-up children")
  3. memorize, con (commit to memory; learn by heart)
  4. study, read, take (be a student of a certain subject; "She is learning for the bar exam")
  5. teach, learn, instruct ("I learned them French")
  6. determine, find out ("I want to learn whether she speaks French")

Groups   Senses       Portuguese       German
G1       WN1, WN2     desenvolver      entwickeln
G2                                     bilden
G3       WN8, WN13                     ausbilden
G4       WN5, WN10    desenvolver-se   sich bilden

Table 7: Portuguese and German translations of develop