Statistical NLP: Lecture 10

Lexical Acquisition (January 12, 2000)

Goal of Lexical Acquisition
Goal: to develop algorithms and statistical techniques for filling the holes in existing machine-readable dictionaries by looking at the occurrence patterns of words in large text corpora. Acquiring collocations and word sense disambiguation are examples of lexical acquisition, but there are many other types. Examples of lexical acquisition problems: selectional preferences, subcategorization frames, and semantic categorization.

Why is Lexical Acquisition Necessary?
Language evolves: new words and new uses of old words are constantly invented. Traditional dictionaries were written for the needs of human users, whereas lexicons are dictionaries formatted for computers. Beyond the difference in format, lexicons are most useful if they also contain quantitative information, and lexical acquisition can provide such information. Traditional dictionaries also draw a sharp boundary between lexical and non-lexical information; it may be useful to erase this distinction.

Lecture Overview
- Methodological Issues: Evaluation Measures
- Verb Subcategorization
- Attachment Ambiguity
- Selectional Preferences
- Semantic Similarity

Evaluation Measures
- Precision and Recall
- F Measure
- Precision and Recall versus Accuracy and Error
- Fallout
- Receiver Operating Characteristic (ROC) Curve
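
As a quick, hedged illustration of the precision, recall, and F measures, here is a minimal Python sketch that computes all three from raw counts; the counts are invented for illustration.

```python
def precision_recall_f(tp, fp, fn, alpha=0.5):
    """Precision, recall, and the weighted F measure from raw counts.

    alpha weights precision against recall; alpha = 0.5 gives the
    balanced F1 measure, the harmonic mean of precision and recall.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f

# Toy counts: 80 correct detections, 20 false alarms, 40 misses.
p, r, f1 = precision_recall_f(tp=80, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```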

Verb Subcategorization I
Verbs express their semantic arguments using different syntactic means. A particular set of syntactic categories that a verb can appear with is called a subcategorization frame. Most dictionaries do not contain information on subcategorization frames. Brent's (1993) subcategorization frame learner tries to decide, based on corpus evidence, whether a verb v takes a frame f. It works in two steps.

Verb Subcategorization II
Brent's Lerner system:
- Cues: define a regular pattern of words and syntactic categories which indicates the presence of the frame with high certainty. For a particular cue cj we define a probability of error εj that indicates how likely we are to make a mistake if we assign frame f to verb v based on cue cj.
- Hypothesis testing: define the null hypothesis H0 as "the frame is not appropriate for the verb". Reject this hypothesis if the cues indicate with high probability that H0 is wrong.
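
The hypothesis-testing step can be sketched as a one-sided binomial test: under H0 the cue fires only by error, so seeing many cue occurrences is evidence that the verb does take the frame. The code below is an illustration in that spirit rather than Brent's exact procedure; the counts, error rate, and significance level are invented.

```python
from math import comb

def takes_frame(n, m, eps, alpha=0.02):
    """Cue-based hypothesis test for a (verb, frame) pair.

    n     : total number of occurrences of verb v in the corpus
    m     : number of those occurrences that display cue c_j for frame f
    eps   : error rate of the cue (probability it fires although v does
            not actually take frame f)
    alpha : significance level for rejecting H0

    Under H0 ("the frame is not appropriate for the verb"), the number of
    cue firings is binomial(n, eps). We compute the probability of seeing
    m or more firings by chance; if it is below alpha, we reject H0 and
    conclude that v takes frame f.
    """
    p_value = sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
                  for k in range(m, n + 1))
    return p_value < alpha, p_value

# Invented numbers: the verb occurs 200 times, the cue fires 11 times,
# and the cue's error rate is 2%.
decision, p = takes_frame(n=200, m=11, eps=0.02)
print(decision, round(p, 4))
```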

Verb Subcategorization III
Brent's system does well on precision, but not on recall. Manning's (1993) system addresses this problem by running the cue detection on the output of a part-of-speech tagger. Manning's method can learn a large number of subcategorization frames, even those that have only low-reliability cues. Manning's results are still fairly low, and one way to improve them is to use prior knowledge.

Attachment Ambiguity I
When we try to determine the syntactic structure of a sentence, there are often phrases that can be attached to two or more different nodes in the tree. Which one is correct? A simple model for this problem computes the likelihood ratio
λ(v, n, p) = log ( P(p | v) / P(p | n) )
where P(p | v) is the probability of seeing a PP headed by p after the verb v and P(p | n) is the probability of seeing a PP headed by p after the noun n. Weakness of this model: it ignores the fact that, other things being equal, there is a preference for attaching phrases "low" in the parse tree.
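
A tiny worked example with invented probability estimates: if the preposition is more likely to follow the verb than the noun, the ratio is positive and the PP is attached to the verb.

```python
from math import log2

# Invented estimates for an ambiguity like "send the letter with ...":
p_p_given_v = 0.05   # P(p | v): PP headed by p after the verb
p_p_given_n = 0.02   # P(p | n): PP headed by p after the noun

lam = log2(p_p_given_v / p_p_given_n)
print(f"lambda = {lam:.2f}; attach PP to the {'verb' if lam > 0 else 'noun'}")
```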

Attachment Ambiguity II
The preference for attaching phrases low in the parse tree is formalized by Hindle and Rooth (1993). The model asks the following questions:
- VAp: is there a PP headed by p and following the verb v which attaches to v (VAp = 1) or not (VAp = 0)?
- NAp: is there a PP headed by p and following the noun n which attaches to n (NAp = 1) or not (NAp = 0)?
We compute P(Attach(p) = n | v, n) = P(NAp = 1 | n) and P(Attach(p) = v | v, n) = P(VAp = 1 | v) P(NAp = 0 | n).

Attachment Ambiguity III
P(Attach(p) = v) and P(Attach(p) = n) can be compared via a likelihood ratio λ, where
λ(v, n, p) = log [ P(VAp = 1 | v) P(NAp = 0 | n) / P(NAp = 1 | n) ]
We estimate the necessary probabilities using maximum likelihood estimates:
P(VAp = 1 | v) = C(v, p) / C(v)
P(NAp = 1 | n) = C(n, p) / C(n)
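
Putting the two slides together, here is a minimal sketch of the Hindle and Rooth decision from co-occurrence counts; the counts below are invented, and a real system would also need smoothing for sparse counts.

```python
from math import log2

def hindle_rooth_lambda(c_vp, c_v, c_np, c_n):
    """Likelihood ratio lambda(v, n, p) from maximum likelihood estimates.

    c_vp : count of verb v occurring with a PP headed by p
    c_v  : total count of verb v
    c_np : count of noun n occurring with a PP headed by p
    c_n  : total count of noun n
    """
    p_vap = c_vp / c_v            # P(VAp = 1 | v)
    p_nap = c_np / c_n            # P(NAp = 1 | n)
    return log2(p_vap * (1 - p_nap) / p_nap)

# Invented counts:
lam = hindle_rooth_lambda(c_vp=30, c_v=1000, c_np=8, c_n=2000)
print(f"lambda = {lam:.2f} -> attach PP to the {'verb' if lam > 0 else 'noun'}")
```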

General Remarks on PP Attachment
There are some limitations to the method of Hindle and Rooth:
- Sometimes information other than v, n, and p is useful.
- There are other types of PP attachment than the basic case of a PP immediately after an NP object.
- There are other types of attachment altogether, such as N N N or V N P. The Hindle and Rooth formalism is more difficult to apply in these cases because of data sparsity.
- In certain cases, there is attachment indeterminacy.

Selectional Preferences I
Most verbs prefer arguments of a particular type (e.g., the things that bark are dogs). Such regularities are called selectional preferences or selectional restrictions. Selectional preferences are useful for a couple of reasons:
- If a word is missing from our machine-readable dictionary, aspects of its meaning can be inferred from its selectional restrictions.
- Selectional preferences can be used to rank different parses of a sentence.

Selectional Preferences II
Resnik's (1993, 1996) approach to selectional preferences uses the notions of selectional preference strength and selectional association. We look at the <Verb, Direct Object> problem. The selectional preference strength S(v) measures how strongly the verb constrains its direct object. S(v) is defined as the KL divergence between the prior distribution of direct objects (for verbs in general) and the distribution of direct objects of the verb we are trying to characterize. We make two assumptions in this model: (1) only the head noun of the object is considered; (2) rather than dealing with individual nouns, we look at classes of nouns.
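
Under these two assumptions, S(v) can be computed directly from the two class distributions. The sketch below uses invented noun classes and probabilities purely for illustration.

```python
from math import log2

def preference_strength(p_c_given_v, p_c):
    """Selectional preference strength S(v) as a KL divergence.

    p_c_given_v : dict mapping noun class -> P(c | v), the distribution of
                  direct-object classes for the verb v
    p_c         : dict mapping noun class -> P(c), the prior distribution
                  of direct-object classes over all verbs
    """
    return sum(p * log2(p / p_c[c]) for c, p in p_c_given_v.items() if p > 0)

# Invented distributions for a strongly selecting verb such as "eat":
prior = {"food": 0.1, "people": 0.4, "artifact": 0.5}
eat   = {"food": 0.9, "people": 0.05, "artifact": 0.05}
print(round(preference_strength(eat, prior), 2))   # large S(v): strong preference
```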

Selectional Preferences III
The selectional association between a verb and a noun class is defined as the proportion of the overall preference strength S(v) that is contributed by that class. There is also a rule for assigning association strengths to nouns as opposed to noun classes: if a noun is in a single class, then its association strength is that of its class; if it belongs to several classes, then its association strength is that of the class with the highest association strength. Finally, there is a rule for estimating the probability that a direct object in noun class c occurs given a verb v.
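
Continuing the previous sketch with the same invented distributions, the selectional association of a class is its share of S(v), and a noun inherits the highest association among the classes it belongs to; the class memberships used here are again hypothetical.

```python
from math import log2

def selectional_association(c, p_c_given_v, p_c):
    """A(v, c): the proportion of S(v) contributed by noun class c."""
    s_v = sum(p * log2(p / p_c[k]) for k, p in p_c_given_v.items() if p > 0)
    contribution = p_c_given_v[c] * log2(p_c_given_v[c] / p_c[c])
    return contribution / s_v

def noun_association(noun_classes, p_c_given_v, p_c):
    """Association strength of a noun: the maximum over its classes."""
    return max(selectional_association(c, p_c_given_v, p_c) for c in noun_classes)

prior = {"food": 0.1, "people": 0.4, "artifact": 0.5}
eat   = {"food": 0.9, "people": 0.05, "artifact": 0.05}
print(round(selectional_association("food", eat, prior), 2))
# A noun belonging to both "food" and "people" (say, "chicken"):
print(round(noun_association({"food", "people"}, eat, prior), 2))
```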

Semantic Similarity I
Text understanding and information retrieval could benefit greatly from a system able to acquire meaning. Meaning acquisition is not possible at this point, so people focus instead on assessing the semantic similarity between a new word and already known words. Semantic similarity is not as intuitive and clear a notion as we may first think: synonymy? the same semantic domain? contextual interchangeability? Measures fall into two families: vector space measures and probabilistic measures.

Semantic Similarity II: Vector Space Measures
Words can be represented in different spaces: document space, word space, and modifier space. Similarity measures for binary vectors include the matching coefficient, Dice coefficient, Jaccard (or Tanimoto) coefficient, Overlap coefficient, and cosine. Similarity measures for real-valued vector spaces include the cosine, Euclidean distance, and the normalized correlation coefficient.
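
A compact sketch of a few of these measures, with the binary vectors represented as Python sets of contexts and the real-valued vectors as plain lists; the toy data are invented.

```python
from math import sqrt

def dice(x, y):
    """Dice coefficient between two binary vectors represented as sets."""
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x, y):
    """Jaccard (Tanimoto) coefficient: intersection over union."""
    return len(x & y) / len(x | y)

def cosine(u, v):
    """Cosine similarity between two real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Invented binary context sets for two words:
a = {"bank", "money", "loan", "interest"}
b = {"bank", "river", "water", "interest"}
print(round(dice(a, b), 2), round(jaccard(a, b), 2))

# Invented real-valued co-occurrence counts in three contexts:
print(round(cosine([3, 0, 1], [2, 1, 1]), 3))
```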

Semantic Similarity III: Probabilistic Measures
The problem with vector-space measures is that, aside from the cosine, they operate on binary data. The cosine, on the other hand, assumes a Euclidean space, which is not well motivated when dealing with word counts. A better way of viewing word counts is to represent them as probability distributions. Then we can compare two probability distributions using measures such as the KL divergence, the information radius (IRad), and the L1 norm.
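
A minimal sketch of these three measures over two discrete distributions given as aligned probability lists; the distributions are invented. The information radius is taken here as D(p || m) + D(q || m) with m the average of the two distributions, which makes it symmetric and always finite, unlike the raw KL divergence.

```python
from math import log2

def kl(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def irad(p, q):
    """Information radius: D(p || m) + D(q || m), m the average distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

def l1(p, q):
    """L1 norm: total absolute difference between the two distributions."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# Invented context distributions for two words over four context types:
p = [0.5, 0.3, 0.1, 0.1]
q = [0.4, 0.3, 0.2, 0.1]
print(round(kl(p, q), 3), round(irad(p, q), 3), round(l1(p, q), 3))
```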