Corpus-Based Approaches to Word Sense Disambiguation


Corpus-Based Approaches to Word Sense Disambiguation Gina-Anne Levow April 17, 1996

Word Sense Disambiguation
Which sense of "plant"? Plant1, Plant2, Plant3
"Many plants and animals live in the rainforest."
"The manufacturing plant produced widgets."

Ambiguity: The Problem
Mappings among sound, symbol, and "sense" are not 1-to-1:
Sound: Mandarin "shi" can mean stone, to be, job, or time
Symbol: "record" is ré-cord (NOUN) vs. re-córd (VERB)
Sense: "plant" can be to plant a seed, a living plant, or a manufacturing plant; what does "it" refer to?

Applications: Dictation, Speech Synthesis, Information Retrieval, Text Understanding, Machine Translation
English to French: "sentence" is peine (legal) vs. phrase (grammatical)
"make" (a decision) is prendre, the same verb as "take" (a car)

Roadmap
Introduction: Questioning Assumptions
Introduction to 3 Corpus-Based Approaches
Example
Critique of the Approaches: Context, Surface Statistics, New Words & Similarity
Conclusion

The Problems
Corpus-based disambiguation accepts simple answers to key questions:
Context = windows of word co-occurrence
Everything is inferable from surface statistics
No definition of sense independent of the approach
Fundamental limits: these methods build disambiguators, and no more

Method Sampler
Schütze's "Word Space": context vector representations
Resnik's cluster labelling: WordNet semantic hierarchy + corpus-based "informativeness"
Yarowsky's trained decision lists: 1 sense per discourse, 1 sense per collocation

Example: "Plant" Disambiguation: Label the First Use of "Plant"
Biological example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.
Industrial example: The Paulus company was founded in 1938. Since those days the product range has been the subject of constant expansions and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our Product Range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the…

Sense Selection in "Word Space"
Build a context vector: 1,001-character window (the whole article)
Compare vector distances to sense clusters
Only 3 content words in common across the two examples, so the context vectors are distant
Clusters: built automatically, labelled manually
Result: 2 different, correct senses; 92% on pair-wise tasks

Sense Labeling Under WordNet
Use local content words as clusters
Biology: plants, animals, rainforests, species…
Industry: company, products, range, systems…
Find common ancestors in WordNet
Biology: plant & animal isa living thing
Industry: product & plant isa artifact isa entity
Use the most informative ancestor
Result: correct selection
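The common-ancestor lookup sketched above can be made concrete on a toy IS-A fragment. The node names and edges below are hypothetical stand-ins for WordNet's hierarchy, not real WordNet data:

```python
# Toy IS-A fragment (hypothetical; real WordNet has many more nodes and levels)
ISA = {
    "plant#living": "living_thing", "animal": "living_thing",
    "plant#factory": "artifact", "product": "artifact",
    "living_thing": "entity", "artifact": "entity",
}

def ancestors(node):
    """Walk the IS-A chain from a node up to the root."""
    chain = [node]
    while node in ISA:
        node = ISA[node]
        chain.append(node)
    return chain

def lowest_common_ancestor(a, b):
    """First node on b's chain that also appears on a's chain."""
    path_a = set(ancestors(a))
    for node in ancestors(b):
        if node in path_a:
            return node
    return None

lowest_common_ancestor("plant#living", "animal")    # "living_thing"
lowest_common_ancestor("plant#factory", "product")  # "artifact"
```

Biology words meet at "living_thing", industry words at "artifact"; the less general (more informative) ancestor selects the sense.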

Sense Choice With Collocational Decision Lists
Use the initial decision list, rules ordered by informativeness
Check nearby word groups (collocations):
Biology: "animal" within ±2-10 words
Industry: "manufacturing" within ±2-10 words
Result: correct selection; 95% on pair-wise tasks

The Question of Context
Shared intuition: context determines sense
Area of disagreement: what is context? Wide vs. narrow windows of word co-occurrence

Taxonomy of Contextual Information
Topical content
Word associations
Syntactic constraints
Selectional preferences
World knowledge & inference

A Trivial Definition of Context
All words within X words of the target
Many words: Schütze uses 1,000 characters, several sentences
Unordered "bag of words"
Information captured: topic & word association
Limits on applicability: nouns vs. verbs & adjectives
Schütze: nouns 92%; "train" as a verb, 69%
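This trivial definition can be sketched in a few lines: collect everything within X tokens of the target and discard order. The helper name and window size are illustrative:

```python
from collections import Counter

def bag_of_words_context(tokens, target_index, window=50):
    """Unordered bag of words within +/- window tokens of the target."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    context = tokens[lo:target_index] + tokens[target_index + 1:hi]
    return Counter(context)

tokens = "many plants and animals live in the rainforest".split()
ctx = bag_of_words_context(tokens, tokens.index("plants"), window=3)
# order and adjacency are discarded; only co-occurrence counts remain
```

As the slide notes, this captures topic and word association but nothing about order or adjacency, which is exactly what the "narrow context" critique below targets.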

Limits of Wide Context
Comparison of wide-context techniques (LTV '93): neural net, context vector, Bayesian classifier, simulated annealing
Results: 2 senses, 90+%; 3+ senses, ~70%
People: full sentences ~100%; bag of words ~70%
Wide context alone is inadequate; narrow context is needed
Local constraints override; retain order and adjacency

Learning from Large Corpora
Hand-coded approaches are limited; corpus + automatic training has succeeded in face recognition, speech recognition, and part-of-speech tagging (Sung & Poggio 1995, Rabiner & Juang 1993, Brill et al. 1991)
Cautionary note: WSD is a "big" problem compared with these successes:

Task            | Features       | Influences  | Training Data     | Results
ASR             | 625            | 3 phones    | 5,000 sentences   | 95%
Part-of-speech  | 60 tags        | 2 POS       | 1.5 million words | 97%
WSD             | 74,000 senses  | 100+ words  | ???               | ???

Surface Regularities = Useful Disambiguators? Not necessarily!
"Scratching her nose" vs. "kicking the bucket" (deMarcken 1995)
Right for the wrong reason: burglar, rob… thieves, stray, crate, chase, lookout
Learning the corpus, not the sense: the "Ste." cluster: dry, oyster, whisky, hot, float, ice
Learning nothing useful, wrong question: "keeping": bring, hoping, wiping, could, should, some, them, rest

Interactions Below the Surface
Constraints are not all created equal: "The astronomer married the star": selectional restrictions override topic
No surface regularities: "The emigration/immigration bill guaranteed passports to all Soviet citizens"
No substitute for understanding

What is Similar?
Ad-hoc definitions of sense: cluster in "word space", WordNet sense, "seed sense": circular
Schütze: vector distance in word space
Resnik: informativeness of WordNet subsumer + cluster; relation holds in the cluster, not the WordNet is-a hierarchy
Yarowsky: no similarity, only difference; one decision list per sense pair; find discriminants

New Words: Spotting & Adding
Dependent on similarity
Resnik: closed representation; assumes the cluster is coherent; senses defined only in WordNet
Yarowsky: build a new decision list! But there must be a seed

Future Directions
Can't just "put them together"
Evaluation: assessing the problem: broad tests vs. 10 word pairs; task-based: is WSD the problem to solve?
What parts of WSD are most important? Are all senses, and all numbers of senses, equally hard? How do ambiguities interact?

Future Directions
Relation-based similarity: similar words & similar relations
Mutual information & argument structure
Topic and task constraints

Note on Baselines
Many evaluations are weak: 2-way ambiguous words, small isolated sets
Baseline, human: 2-way forced choice > 90%; many-way as low as 60-70%
Baseline, corpus: 28% of words have one sense, correct for free; guessing on multi-sense words 28%; frequency & co-occurrence 58%; overall 70%
Note: 70% on ambiguous words corresponds to ~90% overall
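The frequency baseline mentioned here (always guess each word's most frequent sense) is easy to compute from a sense-tagged corpus. This sketch uses made-up counts, not the figures from the slide:

```python
from collections import Counter

def mfs_baseline_accuracy(labelled):
    """Accuracy of always guessing the most frequent sense per word.

    labelled: list of (word, sense) occurrences from a sense-tagged corpus.
    """
    by_word = {}
    for word, sense in labelled:
        by_word.setdefault(word, Counter())[sense] += 1
    # Each word's most common sense is guessed for all its occurrences
    correct = sum(c.most_common(1)[0][1] for c in by_word.values())
    return correct / len(labelled)

data = [("plant", "living")] * 7 + [("plant", "factory")] * 3
mfs_baseline_accuracy(data)  # 0.7
```

Any proposed disambiguator has to beat this number to demonstrate it is doing more than counting.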

Schütze's Vector Space: Detail
Build a co-occurrence matrix
Restrict the vocabulary to 4-letter sequences (4-grams)
Exclude very frequent items: articles, affixes
Entries form a 5,000 x 5,000 matrix
Word context: 4-grams within 1,001 characters
Sum & normalize the vectors for each 4-gram
Distances between vectors by dot product
Reduce to 97 real values

Schütze's Vector Space: Continued
Word sense disambiguation:
Compute context vectors for all instances of the word
Automatically cluster the context vectors
Hand-label the clusters with sense tags
Tag each new instance with the nearest cluster
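The nearest-cluster tagging step can be sketched as follows. This is a hand-rolled toy, not Schütze's system: the vocabulary, centroids, and sense labels are invented, and real context vectors would come from the 4-gram co-occurrence matrix described above rather than raw word counts:

```python
import math
from collections import Counter

def context_vector(context_words, vocab):
    """Count vector of a context over a fixed vocabulary."""
    counts = Counter(w for w in context_words if w in vocab)
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_cluster(vec, centroids):
    """centroids: {sense_label: centroid vector}, hand-labelled after clustering."""
    return max(centroids, key=lambda s: cosine(vec, centroids[s]))

vocab = ["animal", "species", "rainforest", "company", "manufacturing", "system"]
centroids = {
    "plant/living":  context_vector(["animal", "species", "rainforest"], vocab),
    "plant/factory": context_vector(["company", "manufacturing", "system"], vocab),
}
new = context_vector(["rainforest", "species", "discovered"], vocab)
nearest_cluster(new, centroids)  # "plant/living"
```

The new instance shares content words with the biology centroid and none with the industry centroid, so cosine similarity selects the living-plant sense.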

Resnik's WordNet Labeling: Detail
Assume a source of clusters, a KB (word senses in the WordNet IS-A hierarchy), and a text corpus
Calculate informativeness: for each KB node, sum the occurrences of it and all its children
Disambiguate w.r.t. the cluster & WordNet:
Find the most informative subsumer for each pair, with informativeness I
For each subsumed sense, vote += I
Select the sense with the highest vote
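The count propagation and informativeness computation can be sketched on a toy hierarchy. The counts, node names, and edges below are hypothetical; Resnik's actual measure is computed over WordNet with real corpus frequencies:

```python
import math

# Hypothetical hierarchy fragment and raw corpus counts per node
CHILDREN = {
    "entity": ["living_thing", "artifact"],
    "living_thing": ["plant#living", "animal"],
    "artifact": ["plant#factory", "product"],
}
COUNTS = {"plant#living": 30, "animal": 50, "plant#factory": 10, "product": 10,
          "living_thing": 0, "artifact": 0, "entity": 0}

def subtree_count(node):
    """A node's count is its own occurrences plus those of all descendants."""
    return COUNTS[node] + sum(subtree_count(c) for c in CHILDREN.get(node, []))

def informativeness(node, total):
    # Rarer subsumers are more informative: I(node) = -log p(node)
    return -math.log(subtree_count(node) / total)

total = subtree_count("entity")
informativeness("living_thing", total) > informativeness("entity", total)  # True
```

The root subsumes everything, so its probability is 1 and its informativeness 0; a deep subsumer like "living_thing" carries real information, which is why votes weighted by I favor specific common ancestors.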

Yarowsky's Decision Lists: Detail
One sense per discourse: within a document, a word keeps its majority sense
One sense per collocation: nearby occurrences of the same words signal the same sense

Yarowsky's Decision Lists: Detail
Training decision lists:
1. Pick seed instances & tag them
2. Find collocations: word to the left, word to the right, word ±K
(A) Calculate informativeness on the tagged set and order the rules
(B) Tag new instances with the rules
(C)* Apply 1 sense per discourse
(D) If instances remain unlabeled, go to 2
3. Apply 1 sense per discourse
Disambiguation: first rule matched
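The "first rule matched" disambiguation step can be sketched as below. The rules shown are invented examples of collocational tests; in the real algorithm the list is ordered by log-likelihood computed on the tagged training set, and training grows the list through the bootstrapping loop above:

```python
def apply_decision_list(rules, context):
    """Disambiguate with the first (highest-ranked) rule whose collocation matches.

    rules: ordered list of (collocate, sense), most informative first.
    context: set of words near the target occurrence.
    """
    for collocate, sense in rules:
        if collocate in context:
            return sense
    return None  # no rule fired; fall back to e.g. most frequent sense

# Hypothetical ordered rules, as might be learned from the "plant" seeds
rules = [
    ("manufacturing", "plant/factory"),
    ("animal", "plant/living"),
    ("species", "plant/living"),
]
apply_decision_list(rules, {"manufacturing", "world-wide"})  # "plant/factory"
apply_decision_list(rules, {"species", "rainforest"})        # "plant/living"
```

Because only the single best-matching rule decides, the method sidesteps combining weak evidence; that is the "only difference, no similarity" property noted earlier.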