Distributional Clustering of English Words
Authors: Fernando Pereira, Naftali Tishby, Lillian Lee
Presenter: Marian Olteanu

Introduction
- A method for automatic clustering of words according to their distribution in particular syntactic contexts
- Deterministic annealing is used to find the lowest-distortion sets of clusters
- As the annealing parameter increases, clusters subdivide, yielding a hierarchical "soft" clustering
- The clusters serve as class models for word co-occurrence

Introduction
- Simple tabulation of frequencies suffers from data sparseness
- Hindle proposed smoothing based on clustering: estimate the likelihood of unseen events from the frequencies of "similar" events that have been seen
- Example: estimate the likelihood of a particular direct object for a verb from the likelihood of that direct object for similar verbs

Introduction
- Hindle's proposal: words are similar if there is strong statistical evidence that they tend to participate in the same events
- This paper: factor word association tendencies into associations of words with certain hidden classes and associations between the classes themselves, deriving the classes directly from the data

Introduction
- Classes are probabilistic concepts or clusters c, with a membership probability p(c|w) for each word w
- Different from classical "hard" Boolean classes, which makes the method more robust: it is not strongly affected by errors in frequency counts
- Problem studied in this paper: two word classes, V and N, and the relation between a transitive main verb and the head noun of its direct object

Problem
- Raw knowledge: f_vn, the frequency of occurrence of a particular pair (v, n) in the training corpus
- Unsmoothed conditional density: p_n(v) = f_vn / Σ_v' f_v'n, i.e., p(v|n)
- Problem: how to use the p_n to classify the nouns n ∈ N
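To make the setup concrete, here is a minimal Python sketch of this counting step; the pairs list and all names are hypothetical, standing in for verb-object pairs extracted from a parsed corpus:

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (verb, direct-object noun) pairs; the paper
# extracts such pairs from parsed text.
pairs = [("drink", "water"), ("drink", "wine"), ("eat", "bread"),
         ("eat", "water"), ("drink", "water"), ("eat", "bread")]

f = Counter(pairs)                          # f[(v, n)] = f_vn
noun_totals = Counter(n for _, n in pairs)  # sum_v' f_v'n for each noun n

# p_n(v) = f_vn / sum_v' f_v'n, the unsmoothed conditional density p(v|n)
p = defaultdict(dict)
for (v, n), count in f.items():
    p[n][v] = count / noun_totals[n]

print(p["water"])  # {'drink': 0.666..., 'eat': 0.333...}
```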

Methodology
- Measure of similarity between distributions: the Kullback-Leibler divergence, D(p || q) = Σ_v p(v) log(p(v)/q(v))
- This problem is unsupervised learning: learn the underlying distribution of the data
- The objects have no internal structure; the only information available is statistics about their joint appearance (in that respect, a kind of supervised learning)
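A minimal sketch of the divergence computation on dict-valued distributions; the two example distributions are made up, and the docstring notes why unsmoothed estimates make this measure fragile:

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_v p(v) * log(p(v) / q(v)).

    Finite only when q(v) > 0 wherever p(v) > 0 -- with raw counts,
    unseen events make q(v) = 0, which is one motivation for smoothing
    via clusters.
    """
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

# Made-up verb distributions for two nouns
p_water = {"drink": 2 / 3, "eat": 1 / 3}
p_wine = {"drink": 0.9, "eat": 0.1}
print(kl_divergence(p_water, p_wine))
```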

Distributional Clustering
- Goal: find clusters such that p_n(v) is approximated by the class-based mixture p̂_n(v) = Σ_c p(c|n) p_c(v)
- Solved by EM
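Assuming the cluster memberships p(c|n) and centroid distributions p_c(v) have already been estimated, the approximation itself is a simple mixture; the membership and centroids values below are hypothetical toy numbers:

```python
def approx_p(v, n, membership, centroids):
    """Class-based estimate p_hat_n(v) = sum_c p(c|n) * p_c(v)."""
    return sum(w * centroids[c].get(v, 0.0)
               for c, w in membership[n].items())

# Hypothetical two-cluster model
membership = {"water": {0: 0.8, 1: 0.2}}     # p(c|n)
centroids = {0: {"drink": 0.7, "eat": 0.3},  # p_c(v)
             1: {"drink": 0.1, "eat": 0.9}}
print(approx_p("drink", "water", membership, centroids))  # 0.8*0.7 + 0.2*0.1 = 0.58
```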

Hierarchical Clustering
- Deterministic annealing: a sequence of phase transitions driven by increasing the parameter β
- β controls the locality of the influence of each noun on the definition of the centroids; as β grows, clusters subdivide
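A simplified sketch of one annealed-EM round: memberships proportional to exp(-β·KL) and centroids as membership-weighted averages. This follows the shape of the procedure but is not the paper's exact free-energy formulation, and it assumes centroids are initialized with nonzero mass on every verb:

```python
import math

def kl(p, q):
    # D(p || q); assumes q(v) > 0 wherever p(v) > 0
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

def anneal_step(p_n, centroids, beta):
    """One annealed-EM round (simplified sketch).

    E-step: memberships p(c|n) proportional to exp(-beta * D(p_n || p_c)).
    M-step: each centroid p_c becomes the membership-weighted average
    of the noun distributions.
    """
    membership = {}
    for n, dist in p_n.items():
        w = {c: math.exp(-beta * kl(dist, pc)) for c, pc in centroids.items()}
        z = sum(w.values())
        membership[n] = {c: wi / z for c, wi in w.items()}

    verbs = {v for dist in p_n.values() for v in dist}
    new_centroids = {}
    for c in centroids:
        mass = sum(membership[n][c] for n in p_n)
        new_centroids[c] = {v: sum(membership[n][c] * p_n[n].get(v, 0.0)
                                   for n in p_n) / mass
                            for v in verbs}
    return membership, new_centroids

# Toy data; centroids start with nonzero mass on every verb
p_n = {"water": {"drink": 2 / 3, "eat": 1 / 3},
       "wine": {"drink": 0.9, "eat": 0.1},
       "bread": {"drink": 0.05, "eat": 0.95}}
centroids = {0: {"drink": 0.6, "eat": 0.4},
             1: {"drink": 0.4, "eat": 0.6}}
for beta in (1.0, 2.0, 4.0):  # raising beta sharpens memberships and splits clusters
    membership, centroids = anneal_step(p_n, centroids, beta)
print(membership)
```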

Results

Evaluation
- Relative entropy D(t_n || p̂_n) between held-out data and the learned model, averaged over test-set nouns
- t_n is the relative frequency distribution of verbs taking n as direct object in the test set
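A sketch of this evaluation, assuming a hypothetical table p_hat[n][v] of smoothed model estimates and a list of held-out (verb, noun) pairs:

```python
import math
from collections import Counter

def avg_relative_entropy(test_pairs, p_hat):
    """Average D(t_n || p_hat_n) over test-set nouns, where t_n is the
    relative frequency of verbs taking n as direct object in the test
    set. Assumes p_hat[n][v] > 0 for every test pair (v, n)."""
    noun_totals = Counter(n for _, n in test_pairs)
    divergences = {}
    for (v, n), c in Counter(test_pairs).items():
        t_nv = c / noun_totals[n]  # t_n(v)
        divergences[n] = divergences.get(n, 0.0) \
            + t_nv * math.log(t_nv / p_hat[n][v])
    return sum(divergences.values()) / len(divergences)

# Hypothetical held-out pairs and model table
test_pairs = [("drink", "water"), ("eat", "water"), ("drink", "water")]
p_hat = {"water": {"drink": 0.58, "eat": 0.42}}
print(avg_relative_entropy(test_pairs, p_hat))
```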

Evaluation
- Verb decision task: check whether the model can disambiguate between two verbs, v and v', i.e., decide which of the two is more likely to take a given noun as its direct object
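A one-function sketch of the decision rule, again assuming the hypothetical p_hat table of smoothed estimates p̂_n(v):

```python
def choose_verb(n, v1, v2, p_hat):
    """Pick whichever of v1, v2 the model rates more likely to take n
    as its direct object, i.e., compare p_hat_n(v1) with p_hat_n(v2)."""
    return v1 if p_hat[n].get(v1, 0.0) >= p_hat[n].get(v2, 0.0) else v2

print(choose_verb("water", "drink", "eat",
                  {"water": {"drink": 0.58, "eat": 0.42}}))  # drink
```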