
Chapter 20: Computational Lexical Semantics

Supervised Word-Sense Disambiguation (WSD)
Methods that learn a classifier from manually sense-tagged text using machine learning techniques.
–Classifier: a machine learning model for classifying instances into one of a fixed set of classes
Treats WSD as a classification problem: a target word is assigned the most likely sense (from a given sense inventory) based on the context in which the word appears.

Supervised Learning for WSD
Assume the POS of the target word is already determined.
Encode the context using a set of features to be used for disambiguation.
Given labeled training data, encode it using these features and train a machine learning algorithm. The result is a classifier.
Use the trained classifier to disambiguate future instances of the target word (test data), given their contextual features (the same features).

Sense-Tagged Text
Bonnie and Clyde are two really famous criminals; I think they were bank/1 robbers.
My bank/1 charges too much for an overdraft.
I went to the bank/1 to deposit my check and get a new ATM card.
The University of Minnesota has an East and a West Bank/2 campus right on the Mississippi River.
My grandfather planted his pole in the bank/2 and got a great big catfish!
The bank/2 is pretty muddy; I can’t walk there.
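
A minimal sketch of how such sense-tagged instances could be represented for the methods that follow; the tuple layout and the simplified sentences are illustrative assumptions, not a standard corpus format:

```python
# Each training instance: (tokenized context, index of the target word, sense label).
# The labels "bank/1" (financial) and "bank/2" (river) mirror the tags on this slide.
training_data = [
    ("My bank charges too much for an overdraft .".split(), 1, "bank/1"),
    ("I went to the bank to deposit my check .".split(), 4, "bank/1"),
    ("My grandfather planted his pole in the bank and got a catfish !".split(), 7, "bank/2"),
    ("The bank is pretty muddy , I can not walk there .".split(), 1, "bank/2"),
]

for tokens, idx, sense in training_data:
    # Sanity check: the target word really is at the recorded position.
    assert tokens[idx].lower() == "bank"
```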

Feature Engineering
The success of machine learning requires instances to be represented using an effective set of features that are correlated with the categories of interest.
Feature engineering can be a laborious process that requires substantial human expertise and knowledge of the domain.
In NLP it is common to extract many (even thousands of) potential features and use a learning algorithm that works well with many relevant and irrelevant features.

Contextual Features
Surrounding bag of words
POS of neighboring words
Local collocations
Syntactic relations
Experimental evaluations indicate that all of these features are useful; the best results come from integrating all of these cues in the disambiguation process.

Surrounding Bag of Words
Unordered individual words near the ambiguous word (their exact positions are ignored).
To create the features:
–Let BOW be an empty hash table.
–For each sentence in the training data, and for each word W within +/-N words of the target word: if W is not in BOW, set BOW[W] = 0; then BOW[W] += 1.
–Let Fs be a list of the K most frequent words in BOW, excluding “stop words” (pronouns, numbers, conjunctions, and other “function” words; standard lists of stop words are available).
–Define K features for each sentence, one for each of the K words: feature i is the number of times Fs[i] appears within +/-N of the target word.
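
A short Python sketch of the procedure above, assuming tokenized sentences with known target-word positions; the stop-word list and the default window size N and vocabulary size K are illustrative assumptions:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "my", "."}  # illustrative only

def collect_vocabulary(sentences, target_positions, n=10, k=12):
    """Count words within +/- n of the target across the training sentences and
    return the k most frequent non-stop words (the feature vocabulary Fs)."""
    bow = Counter()
    for tokens, idx in zip(sentences, target_positions):
        window = tokens[max(0, idx - n): idx] + tokens[idx + 1: idx + 1 + n]
        bow.update(w.lower() for w in window if w.lower() not in STOP_WORDS)
    return [w for w, _ in bow.most_common(k)]

def bow_features(tokens, idx, vocabulary, n=10):
    """Feature i = number of times vocabulary[i] occurs within +/- n of the target word."""
    window = [w.lower() for w in tokens[max(0, idx - n): idx] + tokens[idx + 1: idx + 1 + n]]
    return [window.count(word) for word in vocabulary]
```

With the 12-word vocabulary from the bass.n example on the next slide, bow_features would produce the 12-dimensional count vector shown there.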

Surrounding Bag of Words Features: Example
Example: disambiguating bass.n. The 12 most frequent content words from a collection of bass.n sentences from the WSJ (J&M p. 641):
–[fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band]
“An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.”
Features for that sentence: [0,0,0,1,0,0,0,0,0,0,1,0]
–In an arff file, these would be the values in 12 of the feature (attribute) columns.

Surrounding Bag of Words
The idea: these words are general topical cues of the context (“global” features).

POS of Neighboring Words
Use the part-of-speech tags of the immediately neighboring words. These provide evidence of local syntactic context.
P_-i is the POS of the word i positions to the left of the target word; P_i is the POS of the word i positions to the right.
Typical to include features for: P_-3, P_-2, P_-1, P_1, P_2, P_3

POS of Neighboring Words
“An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.”
Features for the sentence:
–[JJ, NN, CC, NN, VB, IN]
–6 more feature/attribute columns in the arff file
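
A sketch of the P_-3 … P_3 features, assuming the sentence has already been POS-tagged into (word, tag) pairs; the padding value used beyond the sentence boundary is an assumption:

```python
def pos_neighbor_features(tagged_tokens, idx, window=3, pad="NONE"):
    """Return [P_-window, ..., P_-1, P_1, ..., P_window]: the POS tags of the words
    to the left and right of the target word at position idx. Positions outside
    the sentence get the padding tag."""
    tags = [tag for _, tag in tagged_tokens]
    left = [tags[idx + offset] if idx + offset >= 0 else pad
            for offset in range(-window, 0)]
    right = [tags[idx + offset] if idx + offset < len(tags) else pad
             for offset in range(1, window + 1)]
    return left + right

# Target word "bass" at position 4 in the example sentence:
tagged = [("An", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB"), ("off", "IN")]
print(pos_neighbor_features(tagged, 4))   # ['JJ', 'NN', 'CC', 'NN', 'VB', 'IN']
```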

Local Collocations
Specific lexical context immediately adjacent to the word.
For example, to determine whether the noun “interest” refers to “readiness to give attention” or “money paid for the use of money”, the following collocations are useful:
–“in the interest of”
–“an interest in”
–“interest rate”
–“accrued interest”
C_{i,j} is a feature whose value is the sequence of words from position i to position j relative to the target word.
–C_{-2,1} for “in the interest of” is “in the of”
Typical to include:
–Single-word context: C_{-1,-1}, C_{1,1}, C_{-2,-2}, C_{2,2}
–Two-word context: C_{-2,-1}, C_{-1,1}, C_{1,2}
–Three-word context: C_{-3,-1}, C_{-2,1}, C_{-1,2}, C_{1,3}

Local Collocations
Typical to include:
–Single-word context: C_{-1,-1}, C_{1,1}, C_{-2,-2}, C_{2,2}
–Two-word context: C_{-2,-1}, C_{-1,1}, C_{1,2}
–Three-word context: C_{-3,-1}, C_{-2,1}, C_{-1,2}, C_{1,3}
“An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.”
Features for this sentence: [and, player, guitar, stand, guitar and, and player, player stand, electric guitar and, guitar and player, and player stand, player stand off] (11 more columns in the arff file)
What is the difference from the bag-of-words features? These features reflect position and are n-grams (fixed sequences), so they capture the local context of the target word more richly. Bag-of-words features, in contrast, are more general clues about the topic.
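
A sketch of the C_{i,j} collocation features; the offset list mirrors the slide, while the out-of-range padding value is an assumption:

```python
def collocation_feature(tokens, idx, i, j):
    """C_{i,j}: the words from position idx+i to idx+j (inclusive), with the
    target word itself removed, joined into a single string feature value."""
    lo, hi = idx + i, idx + j
    if lo < 0 or hi >= len(tokens):
        return "NONE"                      # assumed padding value for out-of-range spans
    span = [tokens[p] for p in range(lo, hi + 1) if p != idx]
    return " ".join(span)

OFFSETS = [(-1, -1), (1, 1), (-2, -2), (2, 2),          # single-word contexts
           (-2, -1), (-1, 1), (1, 2),                    # two-word contexts
           (-3, -1), (-2, 1), (-1, 2), (1, 3)]           # three-word contexts

tokens = "An electric guitar and bass player stand off to one side".split()
features = [collocation_feature(tokens, 4, i, j) for i, j in OFFSETS]
# ['and', 'player', 'guitar', 'stand', 'guitar and', 'and player', 'player stand',
#  'electric guitar and', 'guitar and player', 'and player stand', 'player stand off']
```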

Syntactic Relations (Ambiguous Verbs)
For an ambiguous verb, it is very useful to know its direct object.
1. “played the game”
2. “played the guitar”
3. “played the risky and long-lasting card game”
4. “played the beautiful and expensive guitar”
5. “played the big brass tuba at the football game”
6. “played the game listening to the drums and the tubas”
It may also be useful to know its subject:
1. “The game was played while the band played.”
2. “The game that included a drum and a tuba was played on Friday.”

Syntactic Relations (Ambiguous Nouns)
For an ambiguous noun, it is useful to know what verb it is the object of:
–“played the piano and the horn”
–“poached the rhinoceros’ horn”
It may also be useful to know what verb it is the subject of:
–“the bank near the river loaned him $100”
–“the bank is eroding and the bank has given the city the money to repair it”

Syntactic Relations (Ambiguous Adjectives)
For an ambiguous adjective, it is useful to know the noun it modifies.
1. “a brilliant young man”
2. “a brilliant yellow light”
3. “a wooden writing desk”
4. “a wooden acting performance”

Using Syntax in WSD (per-word classifiers)
Produce a parse tree for the sentence using a syntactic parser.
For an ambiguous verb, use the head word of its direct object and of its subject as features.
For an ambiguous noun, use the verbs for which it is the object or the subject as features.
For an ambiguous adjective, use the head word (noun) of its NP as a feature.
Example parse: (S (NP (ProperN John)) (VP (V played) (NP (DET the) (N piano))))

Syntactic Relations (Ambiguous Verbs)
Feature: head of the direct object (special value null if none)
1. “played the game” → game
2. “played the guitar” → guitar
3. “played the risky and long-lasting card game” → game
4. “played the beautiful and expensive guitar” → guitar
5. “played the big brass tuba at the football game” → tuba
6. “played the game listening to the drums and the tubas” → game
Feature: head of the subject (special value null if none)
1. “The game was played while the band played.” → game, band (two instances of “played” in one sentence)
2. “The game that included a drum and a tuba was played on Friday.” → game

Syntactic Relations (Ambiguous Nouns)
Feature: head verb that the target is the object of
–“played the piano and the horn” → played
–“poached the rhinoceros’ horn” → poached
Feature: head verb that the target is the subject of
–“the bank near the river loaned him $100” → loaned
–“the bank is eroding and the bank has given the city the money to repair it” → eroding, given (two instances of “bank”)

Syntactic Relations (Ambiguous Adjectives)
Feature: noun the adjective modifies
1. “a brilliant young man” → man
2. “a brilliant yellow light” → light
3. “a wooden writing desk” → desk
4. “a wooden acting performance” → performance
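
In practice these head-word features could be read off a parse; the sketch below uses a dependency parse from spaCy purely as an illustration (the model name, the dependency-label checks, and the decision to stop at the first matching token are assumptions, and a real system would handle passives and conjunctions more carefully):

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this small English model is installed

def syntactic_features(sentence, target):
    """Head of direct object / subject for an ambiguous verb, governing verb for an
    ambiguous noun, and modified noun for an ambiguous adjective."""
    feats = {"dobj_head": "null", "subj_head": "null",
             "governing_verb": "null", "modified_noun": "null"}
    for tok in nlp(sentence):
        if tok.text.lower() != target:
            continue
        if tok.pos_ == "VERB":
            for child in tok.children:
                if child.dep_ == "dobj":
                    feats["dobj_head"] = child.text
                elif child.dep_ in ("nsubj", "nsubjpass"):
                    feats["subj_head"] = child.text
        elif tok.pos_ == "NOUN" and tok.dep_ in ("dobj", "nsubj", "nsubjpass"):
            feats["governing_verb"] = tok.head.text
        elif tok.pos_ == "ADJ":
            feats["modified_noun"] = tok.head.text
        break
    return feats

print(syntactic_features("She played the beautiful and expensive guitar", "played"))
# expected (roughly): dobj_head = 'guitar', subj_head = 'She'
```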

Summary: Supervised Methodology
Create a sample of training data in which a given target word is manually annotated with a sense from a predetermined set of possibilities.
–One tagged word per instance
Select a set of features with which to represent the context.
–co-occurrences, collocations, POS tags, verb-object relations, etc.
Convert the sense-tagged training instances to feature vectors.
Apply a machine learning algorithm to induce a classifier.
–Form: the structure or relation among features
–Parameters: the strength of feature interactions
Convert a held-out sample of test data into feature vectors.
–“Correct” sense tags are known but not used.
Apply the classifier to the test instances to assign a sense tag.

Supervised Learning Algorithms
Once the data is converted to feature-vector form, any supervised learning algorithm can be used. Many have been applied to WSD with good results:
–Support Vector Machines
–Nearest Neighbor Classifiers
–Decision Trees
–Decision Lists
–Naïve Bayesian Classifiers
–Perceptrons
–Neural Networks
–Graphical Models
–Log-Linear Models
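
As a hedged illustration of this step: once each instance is a dictionary of the features described earlier, an off-the-shelf learner such as scikit-learn's Naïve Bayes can be trained in a few lines; the toy feature dictionaries below are invented for the bank example:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: one feature dict per sense-tagged instance of "bank".
X_train = [
    {"bow_money": 1, "C_-1,-1": "the", "governing_verb": "charges"},
    {"bow_deposit": 1, "C_-1,-1": "the", "governing_verb": "went"},
    {"bow_river": 1, "C_-1,-1": "the", "governing_verb": "eroding"},
    {"bow_muddy": 1, "C_-1,-1": "The", "governing_verb": "is"},
]
y_train = ["bank/1", "bank/1", "bank/2", "bank/2"]

# DictVectorizer one-hot encodes string-valued features and keeps counts as-is.
classifier = make_pipeline(DictVectorizer(), MultinomialNB())
classifier.fit(X_train, y_train)

print(classifier.predict([{"bow_river": 1, "C_-1,-1": "the", "governing_verb": "eroding"}]))
```

Any of the learners listed above could be swapped in for MultinomialNB with the same feature vectors.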

Summary: Supervised WSD with Individual Classifiers
Many supervised machine learning algorithms have been applied to word sense disambiguation, and most work reasonably well.
–(Witten and Frank, 2000) is a great introduction to supervised learning.
The features tend to differentiate among methods more than the learning algorithms do.
Good sets of features tend to include:
–Co-occurrences or keywords
–Collocations
–Bigrams and trigrams
–Part of speech
–Syntactic features

Convergence of Results
The accuracy of different systems applied to the same data tends to converge on a particular value; no one system is shockingly better than another.
–Senseval-1: a number of systems in the range of …% accuracy on the English Lexical Sample task (a small number of words, so it is feasible to develop one classifier per word)
–Senseval-2: a number of systems in the range of …% accuracy on the English Lexical Sample task
–Senseval-3: a number of systems in the range of …% accuracy on the English Lexical Sample task…

Evaluation of WSD
“In vitro”:
–A corpus is developed in which one or more ambiguous words are labeled with explicit sense tags according to some sense inventory.
–The corpus is used for training and testing WSD, evaluated using accuracy (the percentage of labeled words correctly disambiguated).
–Use most-frequent-sense selection as a baseline.
“In vivo”:
–Incorporate the WSD system into some larger application, such as machine translation, information retrieval, or question answering.
–Evaluate the relative contribution of different WSD methods by measuring their impact on the performance of the overall system on the final task (accuracy of the MT, IR, or QA results).
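
A minimal sketch of the in-vitro accuracy computation alongside the most-frequent-sense baseline, reusing the classifier interface from the earlier training sketch; the function signature is an assumption:

```python
def evaluate(classifier, test_instances, gold_senses, mfs_sense):
    """Accuracy of the classifier on held-out test data, compared with the
    most-frequent-sense baseline (mfs_sense is the majority sense in the training data)."""
    predictions = classifier.predict(test_instances)
    accuracy = sum(p == g for p, g in zip(predictions, gold_senses)) / len(gold_senses)
    baseline = sum(g == mfs_sense for g in gold_senses) / len(gold_senses)
    return accuracy, baseline
```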