Unsupervised Word Sense Disambiguation Rivaling Supervised Methods 1998. 12. 10. Oh-Woog Kwon KLE Lab. CSE POSTECH.

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods Oh-Woog Kwon KLE Lab. CSE POSTECH

Introduction
- An unsupervised algorithm for WSD
  - Avoids the need for costly hand-tagged training data
  - Exploits two powerful properties of human language:
    1. One sense per collocation (using the dictionary definition of "collocation"): nearby words give strong, consistent clues to a word's sense. ex) for the Korean homograph 눈 ("eye" / "snow"): 동물의 눈은 물체를 보는 기관이다 ("An animal's eye is the organ for seeing objects") - the context selects the "eye" sense.
    2. One sense per discourse: within a single text (e.g., Text 101), all occurrences of a word share the same sense - only one sense ("eye"), not two senses ("eye" or "snow").

One Sense Per Discourse
- A test of the one-sense-per-discourse property
  - Table on p. 189 (using 37,232 hand-tagged examples)
  - Accuracy: is the same word used with the same sense throughout a discourse? (99.8%)
  - Applicability: does the word appear more than once in a discourse? (50.1%)
- Advantage of one sense per discourse
  - Can be combined with separate models of local context for each word
  - ex) Text 101: "… bank … bank … bank …" - the local context of bank is the combination of the contexts of all three occurrences
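The two statistics above can be computed from hand-tagged instances grouped by document. A minimal sketch (the tuple format here is a hypothetical illustration, not the paper's data format):

```python
from collections import defaultdict

def discourse_stats(tagged):
    """tagged: list of (discourse_id, word, sense) tuples (hand-labelled).

    Returns (accuracy, applicability) for the one-sense-per-discourse test:
      applicability - fraction of instances whose word occurs more than once
                      in its discourse
      accuracy      - among those, fraction agreeing with the discourse-majority
                      sense of that word
    """
    groups = defaultdict(list)  # (discourse, word) -> list of observed senses
    for doc, word, sense in tagged:
        groups[(doc, word)].append(sense)

    applicable = agree = total = 0
    for senses in groups.values():
        total += len(senses)
        if len(senses) > 1:  # word occurs >1 time in this discourse
            applicable += len(senses)
            majority = max(set(senses), key=senses.count)
            agree += sum(1 for s in senses if s == majority)

    accuracy = agree / applicable if applicable else 0.0
    applicability = applicable / total if total else 0.0
    return accuracy, applicability
```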

One Sense Per Collocation
- Types of collocation, by predictive power:
  - Immediately adjacent collocations > collocations at a distance
  - At equal distance, predicate-argument relationships > arbitrary associations
  - Collocations with content words > collocations with function words
  - => adjacent content words can disambiguate word sense
- A supervised algorithm based on this property
  - Decision list algorithm [Yarowsky, ACL-94]
  - Originally applied to accent restoration in Spanish and French
  - Used as a component of the proposed unsupervised algorithm

Decision List Algorithm
Step 1: Identify the ambiguities of the target word
  ex) 눈: "eye" vs. "snow"
Step 2: Collect training contexts for each sense
  ex) eye: … 사람의 눈은 좋은 … ("a person's eyes are good …"), … 곤충의 눈은 머리에 … ("an insect's eyes are on its head …")
      snow: … 하늘에서 눈이 내리고 … ("snow falls from the sky …"), … 어제 눈이 내려 … ("snow fell yesterday …")
Step 3: Measure the collocational distributions
  ex) [사람 ("person") at position -1]: eye (1,000), snow (0)
      [하늘 ("sky") within +-k words]: eye (2), snow (10,000)
Step 4: Sort by log-likelihood ratio, abs(log(P(sense1|collocation) / P(sense2|collocation))), into a decision list
Step 5: Optional pruning and interpolation
Step 6: Train decision lists for general classes of ambiguity
Step 7: Classification: use only the single most reliable collocation matched in the target context
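The training and classification steps above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes a binary ambiguity, uses a reduced feature set (adjacent words and a +-k window), and substitutes add-alpha smoothing for the pruning and interpolation of Step 5.

```python
import math
from collections import Counter, defaultdict

def train_decision_list(examples, k=2, alpha=0.1):
    """examples: list of (tokens, target_index, sense); assumes two senses.
    Returns rules sorted by abs log-likelihood ratio (strongest first)."""
    counts = defaultdict(Counter)  # feature -> Counter of sense frequencies
    for tokens, i, sense in examples:
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i:
                counts[("word-in-window", tokens[j])][sense] += 1
        if i > 0:
            counts[("word-left", tokens[i - 1])][sense] += 1
        if i + 1 < len(tokens):
            counts[("word-right", tokens[i + 1])][sense] += 1

    senses = sorted({s for _, _, s in examples})  # exactly two senses assumed
    rules = []
    for feat, c in counts.items():
        # smoothed log-likelihood ratio between the two senses
        p = [c[s] + alpha for s in senses]
        ll = abs(math.log(p[0] / p[1]))
        rules.append((ll, feat, senses[0] if p[0] > p[1] else senses[1]))
    rules.sort(reverse=True)
    return rules

def classify(rules, tokens, i, k=2, default=None):
    """Apply only the single highest-ranked rule whose feature matches (Step 7)."""
    feats = {("word-in-window", tokens[j])
             for j in range(max(0, i - k), min(len(tokens), i + k + 1)) if j != i}
    if i > 0:
        feats.add(("word-left", tokens[i - 1]))
    if i + 1 < len(tokens):
        feats.add(("word-right", tokens[i + 1]))
    for ll, feat, sense in rules:
        if feat in feats:
            return sense
    return default
```

Classifying by the single best-matching rule (rather than combining evidence) is what makes the decision list robust: each rule is a self-contained, interpretable collocation test.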

Unsupervised Learning Algorithm - 1
- Illustrated by the disambiguation of 7,538 instances of "plant"
- STEP 1: Collect all contexts of the ambiguous word in the untagged training set (right column of p. 190)
- STEP 2:
  a) Choose a small number of seed collocations for each sense
  b) Tag all training examples containing the seed collocates with the seed's sense label => two seed sets (left column of p. 191; Figure 1)
  - Options for choosing training seeds:
    - Use words from dictionary definitions
    - Use a single defining collocate for each class (e.g., from a thesaurus or WordNet)
    - Label salient corpus collocates (not fully automatic): collect words that co-occur with the target word, and let a human judge decide which indicate which sense

Unsupervised Learning Algorithm - 2
- STEP 3: (p. 192, Figure 2)
  a) Train the supervised classification algorithm (the decision list) on the two seed sets
  b) Classify the entire sample set with the resulting classifier; add examples whose classification probability exceeds a threshold to the seed sets
  c) Optionally apply the one-sense-per-discourse constraint:
    - Detect the dominant sense of each discourse (using a threshold)
    - Augmentation: if a dominant sense exists, add the previously untagged contexts in that discourse to the seed set of the dominant sense
    - Filtering: otherwise (substantial disagreement about the dominant sense), return all instances in the discourse to the residual set
  d) Repeat Step 3
    - Can escape from initial misclassifications
    - Two techniques to avoid local minima: incrementally increase the width of the context window, and periodically perturb the class-inclusion threshold at random, similar to simulated annealing
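The bootstrapping loop of Steps 2-3 can be sketched as below. The instance format and the `seed_label` / `train` / `classify` interfaces are hypothetical placeholders (any supervised learner, such as a decision list, can be plugged in), and the discourse option is simplified relative to the paper (e.g., filtering here returns seed-tagged instances to the residual set as well):

```python
from collections import defaultdict

def yarowsky_bootstrap(instances, seed_label, train, classify,
                       threshold=0.9, max_iters=10, use_discourse=True):
    """instances: list of dicts with keys "id", "discourse", "context".
    seed_label(inst) -> sense or None   (seed-collocation tagging, Step 2)
    train(labelled)  -> model           (supervised learner, Step 3a)
    classify(model, inst) -> (sense, prob)
    """
    labels = {i["id"]: seed_label(i) for i in instances}   # Step 2: tag seeds
    for _ in range(max_iters):
        labelled = [(i, labels[i["id"]]) for i in instances if labels[i["id"]]]
        model = train(labelled)                            # Step 3a
        new_labels = dict(labels)
        for inst in instances:                             # Step 3b
            sense, prob = classify(model, inst)
            if prob >= threshold:                          # confident -> seed set
                new_labels[inst["id"]] = sense
        if use_discourse:                                  # Step 3c (optional)
            by_disc = defaultdict(list)
            for inst in instances:
                by_disc[inst["discourse"]].append(inst)
            for insts in by_disc.values():
                senses = [new_labels[i["id"]] for i in insts if new_labels[i["id"]]]
                if not senses:
                    continue
                majority = max(set(senses), key=senses.count)
                if senses.count(majority) / len(senses) >= threshold:
                    for i in insts:                        # augmentation
                        new_labels[i["id"]] = majority
                else:
                    for i in insts:                        # filtering
                        new_labels[i["id"]] = None
        if new_labels == labels:                           # Step 4: converged
            break
        labels = new_labels                                # Step 3d: repeat
    return labels
```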

Unsupervised Learning Algorithm - 3
- STEP 4: Stop when the algorithm converges on a stable residual set
- STEP 5: Classify new data with the final decision list
  - For error correction, optionally apply the one-sense-per-discourse constraint

Evaluation
- Test data
  - Extracted from a 460-million-word corpus
  - Text types: news articles, scientific abstracts, spoken transcripts, and novels, as used in previous research
- Systems compared (see the table on p. 194):
  (5): the supervised (decision list) algorithm
  (6): using only two words as seeds
  (7): using the salient words of a dictionary definition as seeds
  (8): using quick hand tagging of a list of algorithmically identified salient collocates as seeds
  (9): (7) + one-sense-per-discourse used only in the classification procedure
  (10): (9) + one-sense-per-discourse also used during learning

Conclusion
- Unsupervised word sense disambiguation can rival supervised methods: bootstrapping a decision list classifier from a small set of seed collocations, while exploiting the one-sense-per-collocation and one-sense-per-discourse properties, avoids hand-tagged training data yet achieves comparable accuracy