Statistical NLP Winter 2009 Lecture 12: Computational Psycholinguistics Roger Levy.

NLP techniques, human parsing
Our "parsing" so far has been Treebank parsing; now for a bit about human parsing! Techniques from NLP are still the foundation.
We'll focus on rational models of human sentence processing [rational = using all available information to make inferences] and on incremental inference: understanding of, and response to, a partial utterance.

Incrementality and Rationality
Online sentence comprehension is hard, but lots of information sources can usefully be brought to bear to help with the task. It would therefore be rational for people to use all the information available, whenever possible; this is what incrementality is.
We have lots of evidence that people do this often: "Put the apple on the towel in the box." (Tanenhaus et al., 1995)

Anatomy of ye olde garden path sentence
The horse raced past the barn fell.
It's weird: people fail to understand it most of the time, and are more likely to misunderstand it than to understand it properly ("What's a barn fell?").
Intended reading: The horse that raced past the barn fell.
Common misreading: The horse raced past the barn and fell.
Today I'm going to talk about three outstanding puzzles involving garden-path sentences.

Garden paths: What we do understand
We have decent models of how this sentence fails to be understood:
- Incremental probabilistic parsing with beam search (Jurafsky, 1996)
- Surprisal (Hale, 2001; Levy, 2008): the disambiguating word "fell" is extremely low probability, so an alarm signals "this doesn't make sense" to the parser
These models are based on rational use of evidential information (data-driven probabilistic inference). They are also compatible with gradations in garden-path difficulty (Garnsey et al., 1997; McRae et al., 1998).

Hale, 2001; Levy, 2008; Smith & Levy, 2008: surprisal
Let the difficulty of a word be its surprisal given its context:
  difficulty(w_i) = -log2 P(w_i | w_1…w_{i-1}, CONTEXT)
This captures the expectation intuition: the more we expect an event, the easier it is to process. Many probabilistic formalisms, including probabilistic context-free grammars, can give us word surprisals.
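As a minimal sketch (with toy probabilities, not estimated from any corpus), surprisal is just a negative log conditional probability:

```python
import math

def surprisal(p):
    """Surprisal in bits of an event with conditional probability p."""
    return -math.log2(p)

# Hypothetical next-word distribution after "the horse raced past the barn ..."
next_word = {"and": 0.5, "door": 0.3, "quickly": 0.19, "fell": 0.01}

print(round(surprisal(next_word["and"]), 2))   # expected continuation: low surprisal
print(round(surprisal(next_word["fell"]), 2))  # garden-path disambiguator: high surprisal
```

Low-probability continuations like the disambiguating "fell" carry many bits; high-probability ones carry few.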

a man arrived yesterday
Toy grammar fragment (rule probabilities):
  0.7  S → NP VP
  0.3  S → S CC S
  0.35 NP → DT NN
  0.15 VP → VBD ADVP
  0.4  ADVP → RB
plus lexical rewrite probabilities (0.3, 0.03, 0.02, 0.07 for the four words).
Total probability: 0.7 × 0.35 × 0.15 × 0.3 × 0.03 × 0.02 × 0.4 × 0.07 ≈ 1.85 × 10^-7
Algorithms by Lafferty and Jelinek (1992) and Stolcke (1995) give us P(w_i | context) from a PCFG.
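The total can be checked by multiplying the eight rule probabilities in the derivation (which lexical item carries which of the small probabilities is not recoverable from the transcript, but the product is the same either way):

```python
import math

# Rule probabilities in the derivation of "a man arrived yesterday":
# syntactic rules plus four lexical rewrites from the toy grammar.
probs = [0.7, 0.35, 0.15, 0.3, 0.03, 0.02, 0.4, 0.07]
print(f"{math.prod(probs):.3e}")  # 1.852e-07
```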

Surprisal and garden paths: theory
Revisiting "the horse raced past the barn fell": after "the horse raced past the barn", assume two parses (main clause vs. reduced relative clause).
Jurafsky (1996) estimated the probability ratio of these parses as 82:1. The surprisal differential of "fell" in the reduced versus unreduced conditions should thus be log2 83 ≈ 6.4 bits* (*assuming independence between RC reduction and the main verb).
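The 6.4-bit figure follows directly from the 82:1 ratio, since the reduced-relative parse then has conditional probability 1/83:

```python
import math

ratio = 82  # main-clause : reduced-relative parse probability, per Jurafsky (1996)
differential = math.log2(ratio + 1)  # extra bits of surprisal at "fell" when reduced
print(round(differential, 1))  # 6.4
```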

Surprisal and garden paths: practice
An unlexicalized PCFG (trained on the Brown corpus) gets the right monotonicity of surprisals at the disambiguating word "fell": this is the key comparison, and the difference is small but in the right direction.
(Aside: the absolute surprisal values are way too high, but that's because the grammar is crude.)

Garden Paths: What we don't understand so well
How do people arrive at the misinterpretations they come up with? What factors induce them to be more or less likely to come up with such a misinterpretation?

Outstanding puzzle: length effects
Try to read this: Tom heard the gossip about the neighbors wasn't true.
Compare it with this: Tom heard the gossip wasn't true.
Likewise:
While the man hunted the deer that was brown and graceful ran into the woods.
While the man hunted the deer ran into the woods.
The longer the ambiguous region, the harder it is to recover (Frazier & Rayner, 1987; Tabor & Hutchins, 2004).
This is also problematic for rational models: an effect of irrelevant information.

Memory constraints in human parsing
Sentence meaning is structured, and the number of logically possible analyses for a sentence is at best exponential in sentence length. So we must be entertaining some limited subset of analyses at all times.*
*("Dynamic programming", you say? Ask later.)
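To make the growth concrete: the number of distinct binary-branching trees over n words is the Catalan number C_{n-1}, which grows roughly like 4^n. A quick sketch:

```python
from math import comb

def n_binary_trees(n_words):
    """Catalan number C_{n-1}: distinct binary-branching trees over n words."""
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)

for n in (5, 10, 20):
    print(n, n_binary_trees(n))  # 5 -> 14, 10 -> 4862, 20 -> 1767263190
```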

Dynamic programming
Exact probabilistic inference with context-free grammars can be done efficiently in O(n^3). But this inference requires strict probabilistic locality, and human parsing is linear, i.e. O(n), anyway.
Here, we'll explore an approach from the machine-learning literature: the particle filter.

The particle filter: general picture
Sequential Monte Carlo for incremental observations. Let x_i be observed data and z_i be unobserved states; for parsing, the x_i are words and the z_i are structural analyses.
Suppose that after n-1 observations we have the distribution over interpretations P(z_{n-1} | x_{1…n-1}). After obtaining the next word x_n, represent the next distribution P(z_n | x_{1…n}) inductively:
  P(z_n | x_{1…n}) ∝ P(x_n | z_n) Σ_{z_{n-1}} P(z_n | z_{n-1}) P(z_{n-1} | x_{1…n-1})
Representing P(z_i | x_{1…i}) by samples makes it a Monte Carlo method.
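One step of this recursion can be sketched generically; the transition and likelihood functions below are hypothetical placeholders, not a parser:

```python
import random

def particle_filter_step(particles, transition, likelihood, obs, rng=random):
    """One step of sequential Monte Carlo: propagate, reweight, resample.

    particles approximate P(z_{n-1} | x_{1..n-1}); the returned sample
    approximates P(z_n | x_{1..n}).
    """
    # 1. Propagate each particle by sampling from P(z_n | z_{n-1}).
    proposed = [transition(z, rng) for z in particles]
    # 2. Weight each proposal by the likelihood P(x_n | z_n) of the new observation.
    weights = [likelihood(z, obs) for z in proposed]
    if sum(weights) == 0:
        raise ValueError("all particles died: no analysis explains the input")
    # 3. Resample in proportion to the weights.
    return rng.choices(proposed, weights=weights, k=len(particles))
```

Resampling concentrates particles on analyses consistent with the input; if no particle's analysis can accommodate the next word, the filter fails, which is the model's analogue of a garden-path breakdown.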

Particle filter with probabilistic grammars
Toy grammar (rule probabilities):
  S → NP VP 1.0        V → raced 0.4
  NP → N 0.8           V → broke 0.3
  NP → N RRC 0.2       V → tired 0.3
  RRC → Part Adv 1.0   Part → raced 0.1
  VP → V Adv 1.0       Part → broken 0.5
  N → horses 1.0       Part → tired 0.4
  Adv → quickly 1.0
[Slide figure: incremental parses of "horses raced quickly", one with "raced" as the main verb (NP VP), the other with "raced quickly" as a reduced relative clause (N RRC) continued by "tired" as the main verb.]
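Under this grammar, a particle hearing "horses raced" must commit to "raced" as a main verb or as a participle opening a reduced relative clause; a continuation that requires the relative-clause analysis kills the main-verb particles. A hypothetical simulation (the two-way choice and its probabilities are a simplification of the grammar above):

```python
import random

# Relative weight of the two analyses of "raced" after "horses raced"
# (probabilities from the toy grammar; continuation probabilities ignored).
P_MAIN = 0.8 * 0.4   # NP -> N, then V -> raced    (main-verb analysis)
P_RRC = 0.2 * 0.1    # NP -> N RRC, Part -> raced  (reduced-relative analysis)

def simulate(n_particles, seed=0):
    """Fraction of particles surviving a continuation that needs the RRC parse."""
    rng = random.Random(seed)
    p_main = P_MAIN / (P_MAIN + P_RRC)
    particles = ["main" if rng.random() < p_main else "rrc"
                 for _ in range(n_particles)]
    # At the disambiguating word, only the reduced-relative analysis survives.
    return sum(p == "rrc" for p in particles) / n_particles

print(simulate(10_000))  # close to P_RRC / (P_MAIN + P_RRC), about 0.059
```

On this account, runs in which no particle holds the surviving analysis correspond to parse failures, so a small particle set (limited memory) makes garden-path breakdown likely.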

Returning to the puzzle
  A-S: Tom heard the gossip wasn't true.
  A-L: Tom heard the gossip about the neighbors wasn't true.
  U-S: Tom heard that the gossip wasn't true.
  U-L: Tom heard that the gossip about the neighbors wasn't true.
Previous empirical finding: ambiguity induces difficulty… but so does the length of the ambiguous region (Frazier & Rayner, 1982; Tabor & Hutchins, 2004).
Our linking hypothesis: the proportion of parse failures at the disambiguating region should be monotonically related to the difficulty of the sentence.

Model Results
Ambiguity matters… but the length of the ambiguous region also matters!

Human results (offline rating study)

Rational comprehension's other successes
- Global disambiguation preferences (Jurafsky, 1996): The women discussed the dogs on the beach
- Basic garden-path sentences (Hale, 2001): The horse raced past the barn fell
- Garden-path gradience (Narayanan & Jurafsky, 2002): The crook (that was) arrested by the detective was guilty
- Predictability in unambiguous contexts (Levy, 2008): The children went outside to… (play/chat)
- Grounding in optimality/rational analysis (Norris, 2006; Smith & Levy, 2008)

Behavioral correlates (Tabor et al., 2004)
(Key result: "tossed" is read as harder than "thrown".)
Also, Konieczny (2006, 2007) found compatible results in stops-making-sense and visual-world paradigms.
These results are problematic for theories requiring global contextual consistency (Frazier, 1987; Gibson, 1991, 1998; Jurafsky, 1996; Hale, 2001, 2006).