A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations Peter D. Turney Institute for Information Technology National Research Council of Canada Coling 2008

Outline Introduction Classifying Analogous Word Pairs Experiments – SAT Analogies – TOEFL Synonyms – Synonyms and Antonyms – Similar, Associated, and Both Discussion Limitations and Future Work Conclusion

Introduction (1/5) A pair of words (petrify:stone) is analogous to another pair (vaporize:gas) when the semantic relations between the words in the first pair are highly similar to the relations in the second pair. Two words (levied and imposed) are synonymous in a context (levied a tax) when they can be interchanged (imposed a tax); they are antonymous when they have opposite meanings (black and white); they are associated when they tend to co-occur (doctor and hospital).

Introduction (2/5) On the surface, it appears that these are four distinct semantic classes, requiring distinct NLP algorithms, but we propose a uniform approach to all four. We subsume synonyms, antonyms, and associations under analogies. We say that X and Y are: – synonyms when the pair X:Y is analogous to the pair levied:imposed – antonyms when they are analogous to the pair black:white – associated when they are analogous to the pair doctor:hospital

Introduction (3/5) There is past work on recognizing analogies, synonyms, antonyms, and associations, but each of these four tasks has been examined separately, in isolation from the others. As far as we know, the algorithm proposed here is the first attempt to deal with all four tasks using a uniform approach.

Introduction (4/5) We believe that it is important to seek NLP algorithms that can handle a broad range of semantic phenomena, because developing a specialized algorithm for each phenomenon is a very inefficient research strategy. It might seem that a lexicon, such as WordNet, contains all the information we need to handle these four tasks.

Introduction (5/5) However, we prefer to take a corpus-based approach to semantics. – Veale (2004) used WordNet to answer 374 multiple-choice SAT analogy questions, achieving an accuracy of 43%, but the best corpus-based approach attains an accuracy of 56% (Turney, 2006). – Another reason to prefer a corpus-based approach to a lexicon-based approach is that the former requires less human labour, and thus it is easier to extend to other languages.

Classifying Analogous Word Pairs (1/11) An analogy, A:B::C:D, asserts that A is to B as C is to D. – for example, traffic:street::water:riverbed asserts that traffic is to street as water is to riverbed. – that is, the semantic relations between traffic and street are highly similar to the semantic relations between water and riverbed. We may view the task of recognizing word analogies as a problem of classifying word pairs.

Classifying Analogous Word Pairs (2/11) [table slide; content not transcribed]

Classifying Analogous Word Pairs (3/11) We approach this as a standard classification problem for supervised machine learning. The algorithm takes as input a training set of word pairs with class labels and a testing set of word pairs without labels. Each word pair is represented as a vector in a feature space and a supervised learning algorithm is used to classify the feature vectors.

Classifying Analogous Word Pairs (4/11) The elements in the feature vectors are based on the frequencies of automatically defined patterns in a large corpus. The output of the algorithm is an assignment of labels to the word pairs in the testing set. For some of the experiments, we select a unique label for each word pair; for other experiments, we assign probabilities to each possible label for each word pair.

Classifying Analogous Word Pairs (5/11) For a given word pair, such as mason:stone, the first step is to generate morphological variations, such as masons:stones. The second step is to search in a large corpus for all phrases of the following form: [0 to 1 words] X [0 to 3 words] Y [0 to 1 words] In this template, X:Y consists of morphological variations of the given word pair, in either order. – Ex: mason:stone, stone:mason, masons:stones, and so on.
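This template can be illustrated as a regular expression. A minimal Python sketch, not the authors' implementation: it assumes whitespace tokenization and takes the list of morphological variants as given.

```python
import re

def phrase_regex(x, y):
    """Regex for the template '[0 to 1 words] X [0 to 3 words] Y [0 to 1 words]',
    where a 'word' is any run of non-space characters."""
    w = r"\S+"
    return re.compile(
        rf"(?:{w} )?\b{re.escape(x)}\b(?: {w}){{0,3}} \b{re.escape(y)}\b(?: {w})?"
    )

def matching_phrases(text, variants):
    """Collect phrases for each ordered variation of the pair, e.g.
    [('mason', 'stone'), ('stone', 'mason'), ('masons', 'stones'), ...]."""
    phrases = []
    for x, y in variants:
        phrases += phrase_regex(x, y).findall(text)
    return phrases

print(matching_phrases("the mason cut the stone with a chisel",
                       [("mason", "stone"), ("stone", "mason")]))
# ['the mason cut the stone with']
```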

Classifying Analogous Word Pairs (6/11) In the following experiments, we search in a corpus of 5 × 10^10 words (about 280 GB of plain text), consisting of web pages gathered by a web crawler. The next step is to generate patterns from all of the phrases that were found for all of the input word pairs (from both the training and testing sets). To generate patterns from a phrase, we replace the given word pair with variables, X and Y, and we replace the remaining words with a wild card symbol (an asterisk) or leave them as they are.

Classifying Analogous Word Pairs (7/11) For example, the phrase "the mason cut the stone with" yields the patterns "the X cut * Y with", "* X * Y *", and so on. – If a phrase contains n words, then it yields 2^(n-2) patterns. Each pattern corresponds to a feature in the feature vectors that we will generate. Since a typical input set of word pairs yields millions of patterns, we need to use feature selection to reduce the number of patterns to a manageable quantity.
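A minimal sketch of this pattern enumeration, assuming the phrase is already tokenized and that X and Y each occur exactly once:

```python
from itertools import product

def patterns_from_phrase(words, x, y):
    """X and Y become variables; every other word is either kept or replaced
    by the wild card '*', yielding 2**(n-2) patterns for an n-word phrase."""
    context = [i for i, w in enumerate(words) if w not in (x, y)]
    for keep in product((True, False), repeat=len(context)):
        kept = dict(zip(context, keep))
        yield " ".join("X" if w == x else "Y" if w == y
                       else (w if kept[i] else "*")
                       for i, w in enumerate(words))

pats = list(patterns_from_phrase("the mason cut the stone with".split(),
                                 "mason", "stone"))
assert "the X cut * Y with" in pats and "* X * Y *" in pats
assert len(pats) == 2 ** (6 - 2)  # 16 patterns for a six-word phrase
```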

Classifying Analogous Word Pairs (8/11) For each pattern, we count the number of input word pairs that generated the pattern. – Ex: "* X cut * Y" is generated by both mason:stone and carpenter:wood. We then sort the patterns in descending order of the number of word pairs that generated them. If there are N input word pairs, then we select the top kN patterns and drop the remainder. – In the following experiments, k is set to 20. – The algorithm is not sensitive to the precise value of k.
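In sketch form, assuming a mapping patterns_by_pair from each input word pair to the set of patterns it generated:

```python
from collections import Counter

def select_patterns(patterns_by_pair, k=20):
    """Rank patterns by the number of input word pairs that generated them,
    then keep the top k*N, where N is the number of input pairs."""
    support = Counter()
    for pats in patterns_by_pair.values():
        support.update(set(pats))  # one vote per word pair, not per phrase
    return [p for p, _ in support.most_common(k * len(patterns_by_pair))]
```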

Classifying Analogous Word Pairs (9/11) The next step is to generate feature vectors, one vector for each input word pair. Each of the N feature vectors has kN elements, one element for each selected pattern. The value of an element in a vector is given by the logarithm of the frequency in the corpus of the corresponding pattern for the given word pair. – If f phrases match the pattern, then the value of this element in the feature vector is log(f+1).
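The vector construction then reduces to a one-liner; pattern_freq below is a hypothetical lookup that returns the corpus frequency of a pattern instantiated with the given pair:

```python
import math

def feature_vector(pair, selected_patterns, pattern_freq):
    """Element i is log(f + 1), where f is the number of corpus phrases that
    match pattern i with X:Y instantiated as the given word pair."""
    return [math.log(pattern_freq(pair, p) + 1) for p in selected_patterns]
```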

Classifying Analogous Word Pairs (10/11) Each feature vector is normalized to unit length. – The normalization ensures that features in vectors for high-frequency word pairs (traffic:street) are comparable to features in vectors for low-frequency word pairs (water:riverbed). Now that we have a feature vector for each input word pair, we can apply a standard supervised learning algorithm. In the following experiments, we use a sequential minimal optimization (SMO) support vector machine (SVM) with a radial basis function (RBF) kernel.
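A sketch of the normalization step, assuming the feature vectors are rows of a NumPy array:

```python
import numpy as np

def unit_normalize(X):
    """Scale each feature vector (row) to unit Euclidean length; zero rows
    are left unchanged to avoid division by zero."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)
```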

Classifying Analogous Word Pairs (11/11) The algorithm generates probability estimates for each class by fitting logistic regression models to the output of the SVM. We chose the SMO RBF algorithm because it is fast and robust, and it easily handles large numbers of features. For convenience, we will refer to the above algorithm as PairClass. – In the following experiments, PairClass is applied to each of the four problems with no adjustments or tuning to the specific problems.
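A rough stand-in using scikit-learn; this is an assumption on our part (SVC is not the SMO implementation used in the paper), but probability=True likewise fits a logistic model to the SVM outputs:

```python
from sklearn.svm import SVC

# RBF kernel plus Platt-style probability estimates, approximating the
# SMO RBF setup described above (not the authors' exact implementation).
clf = SVC(kernel="rbf", probability=True)
# clf.fit(X_train, y_train)          # unit-normalized feature vectors
# probs = clf.predict_proba(X_test)  # per-class probability estimates
```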

SAT Analogies (1/5) In this section, we apply PairClass to the task of recognizing analogies. To evaluate the performance, we use a set of 374 multiple-choice questions from the SAT college entrance exam. Each question consists of a target pair, called the stem, and five choice pairs; the task is to select the choice pair that is most analogous to the stem pair.

SAT Analogies (2/5) [Table 2: an example SAT analogy question; not transcribed]

SAT Analogies (3/5) We may view Table 2 as a binary classification problem, in which mason:stone and carpenter:wood are positive examples and the remaining word pairs are negative examples. – That is, the training set consists of one positive example (the stem pair) and the testing set consists of five unlabeled examples (the five choice pairs). To make this task more tractable, we randomly choose a stem pair from one of the 373 other SAT analogy questions, and we assume that this new stem pair is a negative example.

SAT Analogies (4/5) [table slide; content not transcribed]

SAT Analogies (5/5) To answer the SAT question, we use PairClass to estimate the probability that each testing example is positive, and we guess the testing example with the highest probability. To increase the stability, we repeat the learning process 10 times, using a different randomly chosen negative training example each time. For each testing word pair, the 10 probability estimates are averaged together.
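In sketch form, with train_and_score as a hypothetical helper standing in for the PairClass pipeline (train on labeled pairs, return a probability of being positive for each test pair):

```python
import random
import numpy as np

def answer_sat(stem, choices, other_stems, train_and_score, runs=10):
    """Train `runs` models, each on the stem pair (positive) plus one stem
    randomly drawn from another question (negative); average the probability
    that each choice pair is positive and guess the argmax."""
    scores = np.zeros(len(choices))
    for _ in range(runs):
        negative = random.choice(other_stems)
        scores += np.asarray(train_and_score([stem, negative], [1, 0], choices))
    return int(np.argmax(scores / runs))
```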

TOEFL Synonyms (1/4) Now we apply PairClass to the task of recognizing synonyms, using a set of 80 multiple-choice synonym questions from the TOEFL (Test of English as a Foreign Language). The task is to select the choice word that is most similar to the stem word. We may view Table 4 as a binary classification problem, in which the pair levied:imposed is a positive example of the class synonyms and the other possible pairs are negative examples.

TOEFL Synonyms (2/4) [Table 4: an example TOEFL synonym question; not transcribed]

TOEFL Synonyms (3/4) [table slide; content not transcribed]

TOEFL Synonyms (4/4) The 80 TOEFL questions yield 320 (80 × 4) word pairs, 80 labeled positive and 240 labeled negative. We apply PairClass to the word pairs using ten-fold cross-validation. – In each random fold, 90% of the pairs are used for training and 10% are used for testing. – For each fold, the model that is learned from the training set is used to assign probabilities to the pairs in the testing set. – Our guess for each TOEFL question is the choice with the highest probability of being positive, when paired with the corresponding stem word.
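A sketch of this evaluation loop, assuming X (feature vectors) and y (binary labels) are NumPy arrays over all 320 pairs and question_of lists each pair's question index; all names are illustrative:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def toefl_accuracy(X, y, question_of):
    """Score every pair out-of-fold with ten-fold CV, then answer each
    question with its highest-probability choice pair."""
    proba = np.zeros(len(y))
    for train, test in StratifiedKFold(n_splits=10, shuffle=True).split(X, y):
        clf = SVC(kernel="rbf", probability=True).fit(X[train], y[train])
        proba[test] = clf.predict_proba(X[test])[:, 1]
    correct = 0
    for q in set(question_of):
        idx = [i for i, qi in enumerate(question_of) if qi == q]
        correct += int(y[max(idx, key=lambda i: proba[i])] == 1)
    return correct / len(set(question_of))
```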

Synonyms and Antonyms (1/2) The task of classifying word pairs as either synonyms or antonyms fits readily into the framework of supervised classification of word pairs. Table 6 shows some examples from a set of 136 ESL (English as a second language) practice questions that we collected from various ESL websites. Ten-fold cross-validation is used.

Synonyms and Antonyms (2/2) [Table 6: example ESL synonym/antonym questions; not transcribed]

Similar, Associated, and Both (1/3) A common criticism of corpus-based measures of word similarity (as opposed to lexicon-based measures) is that they are merely detecting associations (co-occurrence), rather than actual semantic similarity. To address this criticism, Lund et al. (1995) evaluated their algorithm for measuring word similarity with word pairs that were labeled similar, associated, or both. Table 7 shows some examples from this collection of 144 word pairs (48 pairs in each of the three classes).

Similar, Associated, and Both (2/3) [Table 7: example similar, associated, and both word pairs; not transcribed]

Similar, Associated, and Both (3/3) Lund et al. (1995) did not measure the accuracy of their algorithm on this three-class classification problem. Instead, following standard practice in cognitive psychology, they showed that their algorithm’s similarity scores for the 144 word pairs were correlated with the response times of human subjects in priming tests. – In a typical priming test, a human subject reads a priming word (cradle) and is then asked to complete a partial word (e.g., completing bab as baby). Ten-fold cross-validation is used.

Discussion (1/2) [results table; content not transcribed]

Discussion (2/2) As far as we know, this is the first time a standard supervised learning algorithm has been applied to any of these four problems. The advantage of being able to cast these problems in the framework of standard supervised learning is that we can now exploit the huge literature on supervised learning.

Limitations and Future Work (1/2) The main limitation of PairClass is the need for a large corpus. – Phrases that contain a pair of words tend to be rarer than phrases that contain either member of the pair; thus a large corpus is needed to ensure that sufficient numbers of phrases are found for each input word pair. – The size of the corpus has a cost in terms of disk space and processing time. – There may be ways to improve the algorithm, so that a smaller corpus is sufficient.

Limitations and Future Work (2/2) Another area for future work is to apply PairClass to more tasks. – WordNet includes more than a dozen semantic relations (e.g., synonyms, hyponyms, hypernyms, meronyms, holonyms, and antonyms). – PairClass should be applicable to all of these relations. Potential applications include any task that involves semantic relations, such as word sense disambiguation, information retrieval, information extraction, and metaphor interpretation.

Conclusion (1/2) In this paper, we have described a uniform approach to analogies, synonyms, antonyms, and associations, in which all of these phenomena are subsumed by analogies. We view the problem of recognizing analogies as the classification of semantic relations between words. We believe that most of our lexical knowledge is relational, not attributional. – That is, meaning is largely about relations among words, rather than properties of individual words, considered in isolation.

Conclusion (2/2) The idea of subsuming a broad range of semantic phenomena under analogies has been suggested by many researchers. In NLP, analogical algorithms have been applied to machine translation, morphology, and semantic relations. Analogy provides a framework that has the potential to unify the field of semantics; this paper is a small step towards that goal.