A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge Ping Chen University of Houston-Downtown Wei Ding University of Massachusetts-Boston.

Presentation transcript:

A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge. Ping Chen, University of Houston-Downtown; Wei Ding, University of Massachusetts-Boston; Chris Bowes and David Brown, University of Houston-Downtown. Human Language Technologies - North American Chapter of the ACL (HLT-NAACL) 2009.

Introduction (1/7)
In many natural languages, a word can represent multiple meanings/senses; such a word is called a homograph. Word sense disambiguation (WSD) is the process of determining which sense of a homograph is used in a given context. WSD is a long-standing problem in Computational Linguistics and has a significant impact on many real-world applications, including machine translation, information extraction, and information retrieval.

Introduction (2/7)
Generally, WSD methods use the context of a word for its sense disambiguation, and the context information can come from either annotated/unannotated text or other knowledge resources, such as WordNet, SemCor, Open Mind Word Expert, eXtended WordNet, or Wikipedia. WSD can be viewed as a classification task: word senses are the classes, and an automatic classification method is used to assign each occurrence of a word to one or more classes based on evidence from the context and from external knowledge sources.

Introduction (3/7)
We can broadly distinguish two main approaches to WSD:
–supervised WSD: these approaches use machine-learning techniques to learn a classifier from labeled training sets, that is, sets of examples encoded in terms of a number of features together with their appropriate sense label (or class)
–unsupervised WSD: these methods are based on unlabeled corpora and do not exploit any manually sense-tagged corpus to provide a sense choice for a word in context

Introduction (4/7)
Disambiguation of a limited number of words is not hard: the necessary context information can be carefully collected and hand-crafted to achieve high disambiguation accuracy. However, such approaches suffer a significant performance drop in practice when the domain or vocabulary is not limited. Such a "cliff-style" performance collapse is called brittleness; it is caused by insufficient knowledge and is shared by many techniques in Artificial Intelligence.

Introduction (5/7)
The main challenge for a WSD system is how to overcome the knowledge acquisition bottleneck and efficiently collect the huge amount of context knowledge. More precisely, a practical WSD system needs to figure out how to create and maintain a comprehensive, dynamic, and up-to-date context knowledge base in a highly automatic manner. The context knowledge required in WSD has the following properties:
–The context knowledge needs to cover a large number of words and their usage.
–Natural language is not a static phenomenon; such dynamics require timely maintenance and updating of the context knowledge base.

Introduction (6/7)
Inspired by the human WSD process, we choose an electronic dictionary and unannotated text samples of word instances as the context knowledge sources for our WSD system. Both sources can be accessed automatically, provide excellent coverage of word meanings and usage, and are actively updated to reflect the current state of languages. In this paper we present a fully unsupervised word sense disambiguation method that requires only a dictionary and unannotated text as input.

Introduction (7/7)
Such an automatic approach overcomes the brittleness suffered by many existing methods and makes broad-coverage word sense disambiguation feasible in practice. We evaluated our approach on SemEval 2007; our system significantly outperformed the best unsupervised system participating in SemEval 2007 and achieved performance approaching that of the top-performing supervised systems.

Knowledge Base Acquisition (1/4)
[Figure-only slide: overview of the knowledge base acquisition framework]

Knowledge Base Acquisition (2/4)
The goal of this step is to collect as many valid sample sentences as possible that contain instances of the to-be-disambiguated words. Two possible text sources:
–Electronic text collections, e.g., the Gutenberg project. Such collections often include thousands of books, which are usually written by professionals and provide many valid and accurate usages of a large number of words.
–Web documents. Each word that needs to be disambiguated is submitted to a Web search engine as a query. Sentences that contain instances of a specific word are extracted and saved into a local repository.
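A minimal Python sketch of the sentence-collection step, not the authors' code: fetch_pages is a hypothetical stand-in for the search-and-download step (the authors used the Google API), and the sentence splitter is deliberately naive.

import re
from collections import defaultdict

def fetch_pages(word):
    """Hypothetical stand-in for a web search plus page download."""
    return ["Many companies hire computer programmers. Computer programmers write software."]

def collect_sentences(words):
    repository = defaultdict(list)  # word -> sample sentences containing an instance of it
    for word in words:
        for page in fetch_pages(word):
            # Naive sentence splitting; a real pipeline would strip HTML and clean the text first.
            for sentence in re.split(r"(?<=[.!?])\s+", page):
                if re.search(rf"\b{re.escape(word)}\b", sentence, re.IGNORECASE):
                    repository[word].append(sentence)
    return repository

print(collect_sentences(["programmers"]))  # both example sentences match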

Knowledge Base Acquisition (3/4)
Sentences organized by word are sent to a dependency parser, MiniPar. After parsing, the sentences are converted to parse trees and saved in files. A dependency relation includes one head word/node and one dependent word/node. Nodes from different dependency relations are merged into one as long as they represent the same word.

Knowledge Base Acquisition (4/4)
Example sentences: "Many companies hire computer programmers." and "Computer programmers write software."
[Figure: merged dependency graph for the two sentences; each edge links a head word to a dependent word and is labeled with the number of occurrences of the dependency relation]
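A rough illustration of the merging step (a sketch, not the authors' implementation): the knowledge base is keyed on words, so nodes for the same word merge automatically, and each edge accumulates a count of how often the relation was observed. dependency_pairs is a hypothetical stand-in for MiniPar output, hard-coded for the slide's two sentences.

from collections import defaultdict

def dependency_pairs(sentence):
    """Hypothetical stand-in for MiniPar: (head word, dependent word) relations."""
    examples = {
        "Many companies hire computer programmers.":
            [("hire", "company"), ("hire", "programmer"),
             ("programmer", "computer"), ("company", "many")],
        "Computer programmers write software.":
            [("write", "programmer"), ("write", "software"),
             ("programmer", "computer")],
    }
    return examples.get(sentence, [])

def build_knowledge_base(sentences):
    kb = defaultdict(lambda: defaultdict(int))  # head word -> dependent word -> count
    for s in sentences:
        for head, dep in dependency_pairs(s):
            kb[head][dep] += 1  # merged node: same word, accumulated occurrence count
    return kb

kb = build_knowledge_base(["Many companies hire computer programmers.",
                           "Computer programmers write software."])
print(dict(kb["programmer"]))  # {'computer': 2} -- the relation occurs in both sentences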

Stop for clarification
An arrow from A to B denotes a dependency relation in which A is the head word and B is the dependent word.
[Figure: A → B, with A labeled "head word" and B labeled "dependent word"]

MiniPar
Input: A polite person never interrupts others while they are discussing important matters.
Output: [Figure: MiniPar dependency parse tree of the input sentence]
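MiniPar is hard to obtain today; as an illustrative stand-in (our assumption, not the authors' setup), a modern dependency parser such as spaCy produces the same kind of head/dependent pairs:

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("A polite person never interrupts others while they are discussing important matters.")
for token in doc:
    if token.head is not token:  # the root token is its own head; skip it
        print(f"{token.head.text} -> {token.text} ({token.dep_})")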

WSD Algorithm (1/8)
Our WSD approach is based on the following insight: assuming the text to be disambiguated is semantically valid, if we replace a word with its glosses one by one, the correct sense should be the one that maximizes semantic coherence within the word's context. The gloss most semantically coherent with the original sentence is chosen as the correct sense. If a word is semantically coherent with its context, then at least one sense of this word is semantically coherent with its context.

WSD Algorithm (2/8)
We measure semantic coherence based on the following hypotheses (assume word1 is the to-be-disambiguated word):
–In a sentence, if word1 is dependent on word2 and we denote the gloss of the correct sense of word1 as g1i, then g1i contains the most semantically coherent words that are dependent on word2;
–In a sentence, if a set of words DEP1 are dependent on word1 and we denote the gloss of the correct sense of word1 as g1i, then g1i contains the most semantically coherent words that DEP1 are dependent on.
[Figure: example "A large software company hires many computer programmers.", with word1 = company, word2 = hire, and DEP1 including large; candidate gloss words such as institution and unit are matched against the knowledge base]

WSD Algorithm (3/8)
[Figure-only slide]

WSD Algorithm (4/8)
Input: glosses from WordNet; S: the sentence to be disambiguated; G: the knowledge base generated in Section 2.
1. Input a sentence S, W = {w | w's part of speech is noun, verb, adjective, or adverb, w ∈ S};
2. Parse S with a dependency parser, generate parsing tree T_S;
3. For each w ∈ W {
4.   Input all w's glosses from WordNet;
5.   For each gloss w_i {
6.     Parse w_i, get a parsing tree T_wi;
7.     score = TreeMatching(T_S, T_wi);
     }
8.   If the highest score is larger than a preset threshold, choose the sense with the highest score as the correct sense;
9.   Otherwise, choose the first sense.
10. }
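A minimal Python sketch of this outer loop, assuming NLTK's WordNet for glosses; parse_tree is an assumed helper, tree_matching is sketched below after the TreeMatching listing, and the threshold value is an assumption, since the slide only says "preset threshold".

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def disambiguate(word, sentence, kb, parse_tree, tree_matching, threshold=0.5):
    t_s = parse_tree(sentence)              # step 2: dependency-parse the sentence
    senses = wn.synsets(word)               # step 4: all of w's glosses from WordNet
    scores = []
    for sense in senses:                    # steps 5-7: parse and score each gloss
        t_g = parse_tree(sense.definition())
        scores.append(tree_matching(t_s, t_g, word, kb))
    if scores and max(scores) > threshold:  # step 8: best sense above the threshold
        return senses[scores.index(max(scores))]
    return senses[0] if senses else None    # step 9: otherwise fall back to the first sense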

WSD Algorithm (5/8)
TreeMatching(T_S, T_wi)
11. For each node n_Si ∈ T_S {
12.   Assign weight w_Si = 1/l_Si, where l_Si is the length between n_Si and w_i in T_S;
13. }
14. For each node n_wi ∈ T_wi {
15.   Load its dependent words D_wi from G;
16.   Assign weight w_wi = 1/l_wi, where l_wi is the level number of n_wi in T_wi;
17.   For each n_Sj {
18.     If n_Sj ∈ D_wi
19.       calculate connection strength s_ji between n_Sj and n_wi;
20.       score = score + w_Si × w_wi × s_ji;
21.   }
22. }
23. Return score;
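A matching sketch of TreeMatching under the same assumptions: trees are reduced to word-to-distance maps, and the connection-strength normalization is our assumption, since the slide does not define how s_ji is computed from the knowledge base.

def tree_matching(t_s, t_g, word, kb):
    # t_s: {sentence word: dependency-path length to the to-be-disambiguated word}
    # t_g: {gloss word: level number in the gloss parse tree}
    score = 0.0
    for n_g, level in t_g.items():
        dependents = kb.get(n_g, {})           # line 15: load n_g's dependents from G
        for n_s, length in t_s.items():
            if n_s in dependents:              # line 18: sentence word found among dependents
                w_s = 1.0 / max(length, 1)     # line 12: weight 1/l_Si
                w_g = 1.0 / max(level, 1)      # line 16: weight 1/l_wi
                s = min(1.0, dependents[n_s] / 10.0)  # line 19: assumed strength normalization
                score += w_s * w_g * s         # line 20: accumulate weighted match
    return score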

WSD Algorithm (6/8)
[Figure-only slide]

WSD Algorithm (7/8)
[Figure-only slide]

WSD Algorithm (8/8)
Original sentence: A large company hires many computer programmers.
Disambiguated word: company
Two glosses from WordNet (company, noun):
–an institution created to conduct business (sense 1)
–small military unit (sense 2)
Scores from the WSD algorithm:
–sense 1: 1.0×1.0×…×0.25×…×0.25×0.9 = 1.125
–sense 2: 1.0×1.0×0.8 = 0.8
The correct sense is therefore sense 1.

Matching Strategies
–S1, dependency matching: TreeMatching.
–S2, backward matching: S1 will not work if a to-be-disambiguated word has no dependent words at all; therefore, we match the head words that the to-be-disambiguated word is dependent on (e.g., "Companies hire computer programmers."). This is especially helpful for disambiguating adjectives and adverbs.
–S3, synonym matching: besides exact word matches, synonyms are also considered as matches.
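For S3, WordNet synsets give a straightforward synonym test; a small sketch using NLTK (our choice of library, not necessarily the authors'):

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def synonyms(word):
    """All WordNet synonyms of a word, across its senses."""
    return {lemma.name().replace("_", " ")
            for s in wn.synsets(word) for lemma in s.lemmas()}

# 'employ' and 'hire' share a synset, so they count as a match under S3.
print("hire" in synonyms("employ"))  # True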

Experiment (1/3)
We evaluated our method on the SemEval-2007 Task 07 (Coarse-grained English All-words Task) test set. The task organizers provide a coarse-grained sense inventory created with the SSI algorithm (Navigli and Velardi, 2005), training data, and test data. Since our method does not need any training or special tuning, neither the coarse-grained sense inventory nor the training data was used.
Roberto Navigli and Paola Velardi. 2005. Structural semantic interconnections: a knowledge-based approach to word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 27(7):1075-1086.

Experiment (2/3)
Test data: we followed the WSD process described in Sections 2 and 3, using the WordNet 2.1 sense repository adopted by SemEval-2007 Task 07. Among the 2269 to-be-disambiguated words in the five test documents, 1112 words are unique and were submitted to the Google API as queries.
[Table: per-document totals of words and of disambiguated words]

Experiment (3/3)
The retrieved Web pages were cleaned, and relevant sentences were extracted. The Web page retrieval step took 3 days, and the cleaning step took 2 days. Parsing was very time-consuming and took 11 days. The merging step took 3 days. Disambiguation of the 2269 words in the 5 test articles took 4 hours.

Impact of different matching strategies
To test the effectiveness of the matching strategies discussed in Section 3, we performed additional experiments. Table 2 shows the disambiguation results for each individual document with the following 5 matching strategies:
–Dependency matching only.
–Dependency and backward matching.
–Dependency and synonym backward matching.
–Dependency and synonym dependency matching.
–Dependency, backward, synonym backward, and synonym dependency matching.

Experiment Results
As expected, combining more matching strategies results in higher disambiguation quality. Backward matching is especially useful for disambiguating adjectives and adverbs. Since synonyms are semantically equivalent, it is reasonable that synonym matching also improves disambiguation performance.

Impact of knowledge base size
To test the impact of knowledge base size on disambiguation quality, we randomly selected about two thirds of all sentences from our text collection and built a smaller knowledge base.

Conclusion
Broad coverage and disambiguation quality are critical for WSD techniques to be adopted in practice. This paper proposed a fully unsupervised WSD method and evaluated it with the SemEval-2007 data set. Future work includes:
–Continuing to build the knowledge base, enlarging its coverage, and improving system performance.
–Developing a smarter and more selective matching strategy.