Day 2: Pruning continued; begin competition models


Day 2: Pruning continued; begin competition models
Roger Levy, University of Edinburgh & University of California – San Diego

Today
- Concept from probability theory: marginalization
- Complete Jurafsky 1996: modeling online data
- Begin competition models

Marginalization
- In many cases, a joint probability distribution will be more "basic" than the raw distribution of any member variable
- Imagine two coins attached with a weak spring: no independence → the joint distribution is more basic
- Example joint distribution:

                Coin1 = H   Coin1 = T
    Coin2 = H      1/3         1/8
    Coin2 = T      5/12

- The resulting distribution over Y is known as the marginal distribution
- Calculating P(Y) is called marginalizing over X
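In symbols (a standard definition, not reproduced from the slide), marginalizing sums the joint distribution over the other variable; with the joint table above, the marginal probability that Coin2 comes up heads is:

```latex
P(Y = y) = \sum_{x} P(X = x,\, Y = y)
\qquad\text{e.g.}\qquad
P(\mathrm{Coin2} = H) = \tfrac{1}{3} + \tfrac{1}{8} = \tfrac{11}{24}
```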

Today
- Concept from probability theory: marginalization
- Complete Jurafsky 1996: modeling online data
- Begin competition models

Modeling online parsing
- Does this sentence make sense? "The complex houses married and single students and their families."
- How about this one? "The warehouse fires a dozen employees each year."
- And this one? "The warehouse fires destroyed all the buildings."
- fires can be either a noun or a verb. So can houses: [NP The complex] [VP houses married and single students…]
- These are garden path sentences
- Originally taken as some of the strongest evidence for serial processing by the human parser (Frazier and Rayner 1987)

Limited parallel parsing
- Full-serial: keep only one incremental interpretation
- Full-parallel: keep all incremental interpretations
- Limited parallel: keep some but not all interpretations
- In a limited parallel model, garden-path effects can arise from the discarding of a needed interpretation:
    [S [NP The complex] [VP houses…] …]   — discarded
    [S [NP The complex houses …] …]       — kept

Modeling online parsing: garden paths
- Pruning strategy for limited ranked-parallel processing:
  - Each incremental analysis is ranked
  - Analyses falling below a threshold are discarded
- In this framework, a model must characterize:
  - The incremental analyses
  - The threshold for pruning
- Jurafsky 1996: partial context-free parses as analyses; probability ratio as pruning threshold
  - Ratio defined as P(I) : P(I_best)
  - (Gibson 1991: complexity ratio for pruning threshold)
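A minimal sketch of this pruning rule (illustrative only: the analysis labels, probabilities, and the specific threshold value are placeholders, not Jurafsky's implementation):

```python
def prune(analyses, threshold_ratio=5.0):
    """Keep only incremental analyses whose probability is within a fixed
    ratio of the best analysis; the rest are discarded (garden-path risk).

    analyses: dict mapping an analysis label to its (prefix) probability.
    threshold_ratio: beam width expressed as a ratio, e.g. 5.0 means any
        analysis more than 5:1 less probable than the best is pruned.
    """
    best = max(analyses.values())
    return {a: p for a, p in analyses.items() if best / p <= threshold_ratio}

# Toy example in the spirit of "the complex houses ...": the noun analysis is
# far more probable, so the verb analysis falls outside the beam and is pruned.
analyses = {"houses-as-noun": 2.5e-4, "houses-as-verb": 1.0e-6}
print(prune(analyses))  # -> {'houses-as-noun': 0.00025}
```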

Garden path models 1: N/V ambiguity
- Each analysis is a partial PCFG tree
- Tree prefix probability used for ranking of analyses
- Partial rule probabilities marginalize over rule completions (these nodes are actually still undergoing expansion)
- (add in example of marginalization, and show its equivalence to a grammar transform)
- *Implications for granularity of structural analysis
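As a concrete illustration (standard PCFG reasoning; the rule inventory is made up rather than taken from the slide): the probability assigned to a partial rule whose right-hand side is still being expanded is the sum over all of its possible completions:

```latex
P(\mathrm{VP} \to \mathrm{V}\;\cdots) \;=\; \sum_{\beta} P(\mathrm{VP} \to \mathrm{V}\,\beta)
\;=\; P(\mathrm{VP} \to \mathrm{V}) + P(\mathrm{VP} \to \mathrm{V\,NP}) + P(\mathrm{VP} \to \mathrm{V\,NP\,PP}) + \cdots
```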

N/V ambiguity (2)
- Partial CF tree analysis of "the complex houses…"
- The analysis of houses as a verb has much lower probability than the analysis as a noun (> 250 : 1)
- Hypothesis: the low-ranking alternative is discarded

N/V ambiguity (3)
- Note that top-down vs. bottom-up questions are immediately implicated, in theory
- Jurafsky includes the cost of generating the initial NP under the S
  - Of course, it's a small cost, as P(S → NP …) = 0.92
- If parsing were bottom-up, that cost would not have been explicitly calculated yet

Garden path models II
- The most famous garden paths: reduced relative clauses (RRCs) versus main clauses (MCs)
  - The horse (that was) raced past the barn fell.
- From the valence + simple-constituency perspective, the MC and RRC analyses differ in two places:
  - Constituency: the MC continuation has probability p ≈ 1, the RRC continuation p = 0.14
  - Valence of raced: best intransitive reading p = 0.92 (MC), transitive valence p = 0.08 (RRC)
- (We'll come back to RRCs twice: once in competition models, once in surprisal)
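Putting the two factors together (using the probabilities on this slide; pairing the constituency and valence probabilities with the MC and RRC analyses is a reconstruction, not spelled out in the transcript):

```latex
\frac{P(\text{MC analysis})}{P(\text{RRC analysis})}
\;\approx\; \frac{1 \times 0.92}{0.14 \times 0.08}
\;\approx\; 82 : 1
```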

Garden path models II (2)
- The 82 : 1 probability ratio means that the lower-probability (RRC) analysis is discarded
- In contrast, some RRCs do not induce garden paths:
  - The bird found in the room died.
  - Here, found is preferentially transitive (0.62)
  - As a result, the probability ratio is much closer (≈ 4 : 1)
- Conclusion within pruning theory: the beam threshold is between 4 : 1 and 82 : 1
- (Granularity issue: when exactly does the probability cost of valence get paid? cf. the complex houses)
- *Note also that Jurafsky does not treat found as having POS ambiguity
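The closer ratio follows from the same kind of calculation; the figure below assumes that found's remaining (non-transitive) valence probability is 1 − 0.62 = 0.38, which is an assumption of this reconstruction rather than a number from the slide:

```latex
\frac{P(\text{MC analysis})}{P(\text{RRC analysis})}
\;\approx\; \frac{1 \times 0.38}{0.14 \times 0.62}
\;\approx\; 4 : 1
```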

Notes on the probabilistic model
- Jurafsky 1996 is a product-of-experts (PoE) model
  - Expert 1: the constituency model
  - Expert 2: the valence model
- PoEs are flexible and easy to define, but…
- The Jurafsky 1996 model is actually deficient (loses probability mass), due to relative frequency estimation
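Schematically (a standard formulation, not reproduced from the slide), a product of experts multiplies the component models and, to be a proper distribution, needs a global normalizing constant; when each expert is estimated by relative frequency and the product is left unnormalized, probability mass can be lost, i.e. the model is deficient:

```latex
P(\text{analysis}) \;=\; \frac{1}{Z}\, P_{\text{constituency}}(\text{analysis}) \times P_{\text{valence}}(\text{analysis}),
\qquad Z \;=\; \sum_{\text{analyses}} P_{\text{constituency}} \times P_{\text{valence}}
```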

Notes on the probabilistic model (2)
- Jurafsky 1996 predated most work on lexicalized parsers (Collins 1999, Charniak 1997)
- In a generative lexicalized parser, valence and constituency are often combined through decomposition & Markov assumptions (with the full conditional sometimes approximated by further independence assumptions)
- The use of decomposition makes it easy to learn non-deficient models
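As an illustration of the kind of decomposition meant here (a generic head-driven factorization in the style of Collins-type parsers; this is not the equation that appeared on the original slide, which was not preserved in the transcript):

```latex
P(\mathrm{rule} \mid \mathrm{parent}, \mathrm{head}) \;\approx\;
P(\mathrm{head\ child} \mid \mathrm{parent}, \mathrm{head})
\;\times\; \prod_{i} P(\mathrm{sister}_i \mid \mathrm{parent}, \mathrm{head})
```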

Jurafsky 1996 & pruning: main points
- Syntactic comprehension is probabilistic
- Offline preferences explained by syntactic + valence probabilities
- Online garden-path results explained by the same model, when beam search/pruning is assumed

General issues
- What is the granularity of incremental analysis?
  - In [NP the complex houses], complex could be an adjective (= the houses are complex)
  - complex could also be a noun (= the houses of the complex)
  - Should these be distinguished, or combined?
- When does the valence probability cost get paid?
- What is the criterion for abandoning an analysis?
- Should the number of maintained analyses affect processing difficulty as well?

Today
- Concept from probability theory: marginalization
- Complete Jurafsky 1996: modeling online data
- Begin competition models

General idea
- Disambiguation: when different syntactic alternatives are available for a given partial input, each alternative receives support from multiple probabilistic information sources
- Competition: the different alternatives compete with each other until one wins, and the duration of competition determines processing difficulty

Origins of competition models
- Parallel competition models of syntactic processing have their roots in lexical access research
- Initial question: in the process of word recognition,
  - are all meanings of a word simultaneously accessed?
  - or are only some (or one) meanings accessed?
- A parallel vs. serial question, for lexical access

Origins of competition models (2)
- Testing access models: priming studies show that subordinate (= less frequent) meanings are accessed as well as dominant (= more frequent) meanings
- Also, lexical decision studies show that more frequent meanings are accessed more quickly

Origins of competition models (3)
- Lexical ambiguity in reading: does the amount of time spent on a word reflect its degree of ambiguity?
- Readers spend more time reading equibiased ambiguous words than non-equibiased ambiguous words (eye-tracking studies)
- Different meanings compete with each other
  - "Of course the pitcher was often forgotten…" (the ball player? the container?)
- Rayner and Duffy (1986); Duffy, Morris, and Rayner (1988)

Competition in syntactic processing
- Can this idea of competition be applied to online syntactic comprehension?
- If so, then multiple interpretations of a partial input should compete with one another and slow down reading
  - (does this mean increased difficulty of comprehension? compare with other types of difficulty, e.g., memory overload)

Constraint types
- Configurational bias: MV vs. RR
- Thematic fit (initial NP to the verb's roles), i.e., Plaus(verb, noun), ranging from …
- Bias of verb: simple past vs. past participle, i.e., P(past | verb)*
- Support of by, i.e., P(MV | <verb, by>) [not conditioned on the specific verb]
- That these factors can affect processing in the MV/RR ambiguity is motivated by a variety of previous studies (MacDonald et al. 1993, Burgess et al. 1993, Trueswell et al. 1994, cf. Ferreira & Clifton 1986, Trueswell 1996)
- by-support defined as P(MV | <verb, by>) but not sensitive to the specific verb
- *Technically not calculated this way, but this would be the rational reconstruction
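To make the competition idea concrete, here is a minimal sketch of a normalized-recurrence-style competition process in the spirit of constraint-based competition models (e.g., McRae et al. 1998). The constraint values, weights, and the exact update rule below are illustrative assumptions, not the parameters or equations of any published model; the number of cycles to settle stands in for processing difficulty.

```python
import numpy as np

def compete(supports, weights, criterion=0.9, max_cycles=100):
    """Run a simple rich-get-richer competition between interpretations.

    supports: (n_constraints, n_alternatives) array; each row is one
        probabilistic constraint's support for each alternative
        (e.g., columns = [MV, RR]); rows sum to 1.
    weights:  (n_constraints,) array of constraint weights, summing to 1.
    Returns (winning alternative index, cycles taken); more cycles =
    more competition = more predicted processing difficulty.
    """
    supports = np.asarray(supports, dtype=float).copy()
    weights = np.asarray(weights, dtype=float)
    for cycle in range(1, max_cycles + 1):
        # Interpretation activations: weighted average of constraint supports
        act = weights @ supports
        act /= act.sum()
        if act.max() >= criterion:
            return int(act.argmax()), cycle
        # Feedback: each constraint's support is nudged toward the current
        # activations and renormalized, so the leading alternative gains ground
        supports *= (1.0 + act)
        supports /= supports.sum(axis=1, keepdims=True)
    return int(act.argmax()), max_cycles

# Hypothetical constraint values for an MV/RR ambiguity (columns: [MV, RR]):
# configurational bias, thematic fit, tense/participle bias, by-support.
supports = [[0.92, 0.08],   # configurational bias strongly favors MV
            [0.30, 0.70],   # thematic fit favors RR
            [0.60, 0.40],   # past-tense bias mildly favors MV
            [0.45, 0.55]]   # by-support mildly favors RR
weights = [0.25, 0.25, 0.25, 0.25]
winner, cycles = compete(supports, weights)
print(winner, cycles)  # conflicting constraints -> more cycles before settling
```

When the constraints all point the same way, the criterion is reached in very few cycles (little competition, little predicted difficulty); when they conflict, as above, settling takes longer, which is the competition account of ambiguity-related slowdowns.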