
Albert Gatt Corpora and Statistical Methods Lecture 5

Application 3: Verb selectional restrictions

Observation
Some verbs place strong restrictions on the semantic category of the NPs they take as arguments. (Assumption: we're focusing attention on direct objects (DOs) only.)
e.g. eat selects for FOOD DOs:
- eat cake
- eat some fresh vegetables
grow selects for LEGUME DOs:
- grow potatoes

Not all verbs are equally constraining
Some verbs seem to place fewer restrictions than others. see doesn't seem too restrictive:
- see John
- see the potato
- see the fresh vegetables
- …

Problem definition
For a given verb and a potential set of arguments (nouns), we want to learn to what extent the verb selects for those arguments.
Rather than individual nouns, we're better off using noun classes (FOOD etc.), since these allow us to generalise more.
We can obtain these classes from a standard resource, e.g. WordNet.

A short detour: Kullback-Leibler divergence

Kullback-Leibler divergence
We are often in a position where we estimate a probability distribution from (incomplete) data; this problem is inherent in sampling. We end up with an estimated distribution Q, which is intended as a model of the true distribution P. How good is Q as a model? Kullback-Leibler divergence tells us how well our model matches the actual distribution.

Motivating example
Suppose I'm interested in the semantic type or class to which a noun belongs, e.g.:
- cake, meat, cauliflower are types of FOOD (among other things)
- potato, carrot are types of LEGUME (among other things)
How do I infer this? It helps if I know that certain predicates, like grow, select for some types of DO and not others:
- *grow meat, *grow cake
- grow potatoes, grow carrots

Motivating example (cont'd)
Ingredients:
- C: the class of interest (e.g. LEGUME)
- v: the verb of interest (e.g. grow)
- P(C): the probability of class C, i.e. the prior probability of finding some element of C as the DO of any verb
- P(C|v): the probability of C given that we know a noun is the DO of grow; this is the posterior probability
A more precise way of asking the question: does the probability distribution of C change given the information about v?

Ingredients for KL divergence
- some prior distribution P
- some posterior distribution Q
Intuition: KL divergence measures how much information we gain about P, given that we know Q; if it is 0, we gain no information.
Given two probability distributions P and Q, with probability mass functions p(x) and q(x), the KL divergence is denoted D(p||q).

Calculating KL divergence
The divergence between the prior and posterior probability distributions:
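The formula itself did not survive the transcript; in standard notation, matching the p(x), q(x) definitions above, it is:

```latex
D(p \,\|\, q) \;=\; \sum_{x} p(x) \, \log \frac{p(x)}{q(x)}
```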

More on the interpretation of KL divergence
If the probability distribution P is interpreted as "the truth" and the distribution Q is my approximation, then D(p||q) tells me how much extra information I need to add to Q to get to the actual truth.

Back to our problem: Applying KL-divergence to selectional restrictions

Resnik's model (Resnik 1996)
Two main ingredients:
1. Selectional preference strength (S): how strongly a verb constrains its direct object (a global estimate)
2. Selectional association (A): how much a verb v is associated with a given noun class (a specific estimate for a given class)

Notation
- v: a verb of interest
- S(v): the selectional preference strength of v
- c: a noun class
- C: the set of all noun classes
- A(v,c): the selectional association between v and class c

Selectional preference strength
S(v) is the KL divergence between:
- the overall prior distribution of all noun classes
- the posterior distribution of noun classes in the direct-object position of v
i.e. how much information we gain from knowing the probability that members of a class occur as the DO of v. It works as a global estimate of how much v constrains its arguments semantically: the more it constrains them, the more information we stand to gain from knowing that an argument occurs as the DO of v.

S(grow): prior vs. posterior
(Figure: Resnik 1996, p. 135)

Calculating S(v)
This quantifies the extent to which our prior and posterior probability estimates diverge, i.e. how much information we gain about C by knowing that a noun is the object of v (see the formula below).
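The slide's formula is missing from the transcript; in Resnik's formulation, with c ranging over the noun classes in C, it is:

```latex
S(v) \;=\; D\!\left(P(C \mid v) \,\|\, P(C)\right) \;=\; \sum_{c \in C} P(c \mid v) \, \log \frac{P(c \mid v)}{P(c)}
```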

Some more examples
[Table: the prior P(c) and the posteriors P(c|eat), P(c|see), P(c|find) for the classes people, furniture, food and action, with the resulting SPS value S(v) per verb; the numeric values did not survive the transcript. Source: Manning and Schütze 1999, p. 290]
How much information do we gain if we know what a noun is the DO of?
→ quite a lot if it's an argument of eat
→ not much if it's an argument of find
→ none if it's an argument of see
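A minimal sketch of the S(v) computation in Python. The priors and posteriors below are made-up toy values (the original table's numbers were lost), chosen only to reproduce the qualitative pattern above:

```python
import math

def sps(prior, posterior):
    """Selectional preference strength S(v):
    KL divergence D(P(C|v) || P(C)) over noun classes, in bits."""
    return sum(
        p_cv * math.log2(p_cv / prior[c])
        for c, p_cv in posterior.items()
        if p_cv > 0  # terms with P(c|v) = 0 contribute nothing
    )

# Uniform prior over four noun classes (toy values).
prior = {"people": 0.25, "furniture": 0.25, "food": 0.25, "action": 0.25}

# Hypothetical posteriors P(c|v) for three verbs.
posteriors = {
    "eat":  {"people": 0.01, "furniture": 0.01, "food": 0.97, "action": 0.01},
    "see":  {"people": 0.25, "furniture": 0.25, "food": 0.25, "action": 0.25},
    "find": {"people": 0.35, "furniture": 0.30, "food": 0.30, "action": 0.05},
}

for verb, post in posteriors.items():
    print(f"S({verb}) = {sps(prior, post):.2f}")
# eat diverges strongly from the prior (high S), see not at all (S = 0),
# find only mildly -- mirroring the qualitative pattern on the slide.
```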

Selectional association
This is estimated based on selectional preference strength: it tells us how much a verb is associated with a specific class, given the extent to which the verb constrains its arguments. Given a class c, A(v,c) tells us how much of S(v) is contributed by c.

Calculating A(v,c)
One term of the summation for S(v), divided by S(v), gives the proportion of S(v) that is contributed by class c (see the formula below).
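Written out, with the numerator being the class-c term of the S(v) sum (this is Resnik's definition):

```latex
A(v, c) \;=\; \frac{P(c \mid v) \,\log \dfrac{P(c \mid v)}{P(c)}}{S(v)}
```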

From A(v,c) to A(v,n)
We know how to estimate the association strength of a class with v. Problem: some nouns can occur in more than one class. Let classes(n) be the set of classes to which noun n belongs; A(v,n) is then defined over those classes as shown below.
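The formula is missing from the transcript; Resnik's definition takes the maximum over the noun's classes (consistent with the chair example on the next slide):

```latex
A(v, n) \;=\; \max_{c \,\in\, \mathrm{classes}(n)} A(v, c)
```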

Example
Susan interrupted the chair.
- chair is in class FURNITURE
- chair is in class PEOPLE
- A(interrupt, PEOPLE) > A(interrupt, FURNITURE)
- so A(interrupt, chair) = A(interrupt, PEOPLE)
Note that this is a kind of word-sense disambiguation!
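A sketch of the max-over-classes step in Python; the association scores for interrupt are hypothetical (the slides give no numbers for this verb):

```python
# Hypothetical association scores A(v, c) for v = "interrupt".
assoc_interrupt = {"PEOPLE": 1.43, "FURNITURE": 0.02}  # made-up values

def assoc_noun(noun_classes, assoc):
    """A(v, n): pick the class of n with the highest A(v, c).
    Returns (best_class, score) -- implicitly disambiguating n."""
    best = max(noun_classes, key=lambda c: assoc[c])
    return best, assoc[best]

print(assoc_noun({"PEOPLE", "FURNITURE"}, assoc_interrupt))
# ('PEOPLE', 1.43) -- "chair" is read as a person, not as furniture.
```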

Some results from Resnik 1996

Verb (v)   Noun (n)   Class (c)       A(v,n)
answer     request    speech act      4.49
answer     tragedy    communication   3.88
hear       story      communication   1.89
hear       issue      communication   1.89

There are some fairly atypical examples; these are due to the disambiguation method. E.g. tragedy can be in the COMMUNICATION class, and so is assigned A(answer, COMMUNICATION) as its A(v,n).

Overall evaluation
Resnik's results were shown to correlate very well with results from a psycholinguistic study. The method is promising:
- it seems to mirror human intuitions
- it may have some psychological validity
- possibly an alternative, data-driven account of the semantic bootstrapping hypothesis of Pinker (1989)?