600.465 - Intro to NLP - J. Eisner: Probabilistic CKY

Presentation transcript:

Intro to NLP - J. Eisner, slide 1: Probabilistic CKY

Intro to NLP - J. Eisner, slides 2-3: Our bane: Ambiguity

- John saw Mary / Typhoid Mary / Phillips screwdriver Mary
  (note how rare rules interact)
- I see a bird
  (is this 4 nouns, parsed like "city park scavenger bird"? rare parts of speech, plus systematic ambiguity in noun sequences)
- Time | flies like an arrow  (NP VP)
- Fruit flies | like a banana  (NP VP)
- Time | reactions like this one  (V[stem] NP)
- Time reactions | like a chemist  (S PP)
  … or is it just an NP?

Intro to NLP - J. Eisner, slide 4: How to solve this combinatorial explosion of ambiguity?

1. First try parsing without any weird rules, throwing them in only if needed.
2. Better: every rule has a weight. A tree's weight is the total weight of all its rules. Pick the overall lightest parse of the sentence.
3. Can we pick the weights automatically? We'll get to this later …

Slides 5-27 (chart-filling animation): CKY parses "time flies like an arrow" bottom-up, adding one chart entry per frame. Every frame shows the same weighted grammar (lower weight = better):

  1  S  → NP VP
  6  S  → Vst NP
  2  S  → S PP
  1  VP → V NP
  2  VP → VP PP
  1  NP → Det N
  2  NP → NP PP
  3  NP → NP NP
  0  PP → P NP

The chart is seeded with weighted lexical entries, and each wider entry combines two narrower entries with a rule, adding the three weights. The completed chart, listed by span (the slides draw this as an upper-triangular grid with rows = start position and columns = end position):

  (0,1) time:                 NP 3, Vst 3
  (1,2) flies:                NP 4, VP 4
  (2,3) like:                 P 2, V 5
  (3,4) an:                   Det 1
  (4,5) arrow:                N 8
  (0,2) time flies:           NP 10, S 8, S 13
  (3,5) an arrow:             NP 10
  (2,5) like an arrow:        PP 12, VP 16
  (1,5) flies like an arrow:  NP 18, S 21, VP 18
  (0,5) whole sentence:       NP 24, S 22, S 27, NP 24, S 27, S 22, S 27

Duplicated categories in the (0,5) cell are the same nonterminal derived via different split points and rules.
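To check the arithmetic on a few entries: S 8 in (0,2) is rule S → NP VP (weight 1) + NP 3 ("time") + VP 4 ("flies") = 8; NP 10 in (0,2) is NP → NP NP (3) + 3 + 4 = 10; S 13 in (0,2) is S → Vst NP (6) + Vst 3 + NP 4 = 13. One level up, S 22 in (0,5) is S → S PP (2) + S 8 in (0,2) + PP 12 in (2,5) = 22.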

Slides 28-32 (backpointer trace): Follow backpointers … Starting from the best S over the whole sentence, successive frames expand S → NP VP, then VP → VP PP, then PP → P NP, then NP → Det N, recovering the parse

  (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))

whose total weight is 1 + 3 + 2 + 4 + 0 + 2 + 1 + 1 + 8 = 22.

Slides 33-38 (pruning): Which entries do we need? An entry like S 13 in cell (0,2) is not worth keeping, since it just breeds worse options: any larger constituent built from S 13 weighs 5 more than the same constituent built from S 8. Keep only best-in-class! (and backpointers, so you can recover the parse). The pruned chart keeps one entry per category per cell:

  (0,1) time:                 NP 3, Vst 3
  (1,2) flies:                NP 4, VP 4
  (2,3) like:                 P 2, V 5
  (3,4) an:                   Det 1
  (4,5) arrow:                N 8
  (0,2) time flies:           NP 10, S 8
  (3,5) an arrow:             NP 10
  (2,5) like an arrow:        PP 12, VP 16
  (1,5) flies like an arrow:  NP 18, S 21, VP 18
  (0,5) whole sentence:       NP 24, S 22
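Concretely, the two ways of building S → S PP over (0,5) cost 2 + 8 + 12 = 22 via S 8 but 2 + 13 + 12 = 27 via S 13; whatever S 13 can build, S 8 builds more cheaply, so discarding S 13 early loses nothing.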

Intro to NLP - J. Eisner, slide 39: Probabilistic Trees

- Instead of the lightest-weight tree, take the highest-probability tree.
- Given any tree, your assignment 1 generator would have some probability of producing it!
- Just like using n-grams to choose among strings …
- What is the probability of this tree?

  (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))

Intro to NLP - J. Eisner, slide 40: Probabilistic Trees (continued)

Same tree, same question: what is p(tree | S)? You rolled a lot of independent dice to generate it …

Intro to NLP - J. Eisner, slide 41: Chain rule: One word at a time

p(time flies like an arrow)
  = p(time)
  * p(flies | time)
  * p(like | time flies)
  * p(an | time flies like)
  * p(arrow | time flies like an)

Intro to NLP - J. Eisner, slide 42: Chain rule + backoff (to get trigram model)

Back off by truncating each conditioning context to the two most recent words:

p(time flies like an arrow)
  ≈ p(time)
  * p(flies | time)
  * p(like | time flies)
  * p(an | flies like)
  * p(arrow | like an)
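For concreteness, a minimal sketch of scoring a sentence under this trigram factorization; the lookup p3 is a hypothetical stand-in for a smoothed estimate of p(w | u v), and the "BOS" padding for missing context at the start is an assumption, not something from the slides:

```python
import math

def trigram_logprob(sentence, p3):
    """Sum of log p(w | two preceding words), as on the slide.

    p3(w, u, v) is a hypothetical smoothed estimate of p(w | u v);
    "BOS" pads the missing context at the start of the sentence.
    """
    words = sentence.split()
    total = 0.0
    for i, w in enumerate(words):
        u = words[i - 2] if i >= 2 else "BOS"
        v = words[i - 1] if i >= 1 else "BOS"
        total += math.log(p3(w, u, v))
    return total
```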

Intro to NLP - J. Eisner, slide 43: Chain rule – written differently

p(time flies like an arrow)
  = p(time)
  * p(time flies | time)
  * p(time flies like | time flies)
  * p(time flies like an | time flies like)
  * p(time flies like an arrow | time flies like an)

Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Intro to NLP - J. Eisner, slide 44: Chain rule + backoff

The same factorization as the previous slide, with each conditioning context again backed off to its two most recent words.

Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Intro to NLP - J. Eisner, slide 45: Chain rule: One node at a time

The tree's probability is built up one node expansion at a time, each partial tree conditioned on the previous one:

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  = p( (S NP VP) | S )
  * p( (S (NP time) VP) | (S NP VP) )
  * p( (S (NP time) (VP VP PP)) | (S (NP time) VP) )
  * p( (S (NP time) (VP (VP flies) PP)) | (S (NP time) (VP VP PP)) )
  * …

Intro to NLP - J. Eisner, slide 46: Chain rule + backoff

Same factorization, but back off so that each expansion depends only on the nonterminal being expanded, not on the rest of the partial tree built so far. This is the model you used in homework 1! (called a "PCFG")

Intro to NLP - J. Eisner, slide 47: Simplified notation

p( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  = p( S → NP VP | S )
  * p( NP → time | NP )
  * p( VP → VP PP | VP )
  * p( VP → flies | VP )
  * …

model you used in homework 1! (called a "PCFG")

Intro to NLP - J. Eisner, slide 48: Already have a CKY alg for weights …

w( (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))) | S )
  = w( S → NP VP )
  + w( NP → time )
  + w( VP → VP PP )
  + w( VP → flies )
  + …

Just let w( X → Y Z ) = -log p( X → Y Z | X ). Then the lightest tree has the highest probability.
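A minimal sketch of this weighted CKY, under assumptions not spelled out in the slides: the grammar is given as binary rules only, the chart is pre-seeded with the weighted lexical entries (as in the example), and ties are broken arbitrarily. Each cell keeps only the best-in-class weight per nonterminal, plus a backpointer:

```python
from collections import defaultdict

def weighted_cky(n, lexical, binary_rules):
    """Viterbi CKY over weights w = -log p (lower is better).

    lexical: {(i, i+1): {nonterminal: weight}},
             e.g. {(0, 1): {"NP": 3, "Vst": 3}, ...} for "time".
    binary_rules: list of (weight, parent, left, right) tuples,
             e.g. (1, "S", "NP", "VP").
    Returns best[(i, j)]: {nonterminal: (weight, backpointer)}.
    """
    best = defaultdict(dict)
    for span, entries in lexical.items():
        for nt, w in entries.items():
            best[span][nt] = (w, None)              # leaf: no backpointer
    for width in range(2, n + 1):                   # build wider spans from narrower
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):               # split point
                for w, parent, left, right in binary_rules:
                    if left in best[(i, k)] and right in best[(k, j)]:
                        total = w + best[(i, k)][left][0] + best[(k, j)][right][0]
                        # keep only the best-in-class entry per nonterminal
                        if parent not in best[(i, j)] or total < best[(i, j)][parent][0]:
                            best[(i, j)][parent] = (total, (k, left, right))
    return best
```

Seeded with the lexical weights and the nine rules above, best[(0, 5)]["S"] should come out to weight 22, matching the hand trace.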

Slides 49-50 (chart frames, captioned "multiply to get" and "Need only best-in-class to get best parse"): since w(X → Y Z) = -log p(X → Y Z | X), the rule probabilities along the best tree multiply to give its probability, and keeping only the best-in-class entry per cell still suffices to find the best parse.

Intro to NLP - J. Eisner, slide 51: Why probabilities not weights?

- We just saw that probabilities are really just a special case of weights …
- … but we can estimate them from training data by counting and smoothing! Yay!
- Warning: what kind of training corpus do we need?
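The counting step might look like the sketch below, assuming the training data is a treebank whose trees are flattened into lists of rules (which is the point of the warning: the counts require parsed sentences, not raw text); smoothing would be layered on top of these counts:

```python
from collections import Counter

def estimate_pcfg(treebank):
    """Relative-frequency estimates p(X -> rhs | X) = count(X -> rhs) / count(X).

    treebank: iterable of trees, each flattened to rules like ("S", ("NP", "VP")).
    """
    rule_count = Counter()
    lhs_count = Counter()
    for tree in treebank:
        for lhs, rhs in tree:
            rule_count[(lhs, rhs)] += 1
            lhs_count[lhs] += 1
    return {(lhs, rhs): c / lhs_count[lhs] for (lhs, rhs), c in rule_count.items()}
```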

Intro to NLP - J. Eisner, slide 52: A slightly different task

- Been asking: what is the probability of generating a given tree with your homework 1 generator? Picking the tree with the highest probability is useful in parsing.
- But we could also ask: what is the probability of generating a given string with the generator (i.e., with the -t option turned off)? Picking the string with the highest probability is useful in speech recognition, as a substitute for an n-gram model. ("Put the file in the folder" vs. "Put the file and the folder")
- To get the probability of generating a string, we must add up the probabilities of all trees for the string …

Slide 53 (chart frame): Could just add up the parse probabilities … oops, back to finding exponentially many parses.

Slides 54-56 (chart frames): Any more efficient way? Add as we go … (the "inside algorithm"): instead of keeping only the best entry per category in each cell, let each entry accumulate the total probability of all ways of building that category over that span. Duplicate entries within a cell are summed into one, and the full-sentence S entry ends up holding the total probability of the string.
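A minimal sketch of that inside algorithm, reusing the chart shape from the weighted CKY sketch above (same assumptions: binary rules, pre-seeded lexical entries); entries now hold probabilities and are summed over all rules and split points rather than minimized:

```python
from collections import defaultdict

def inside(n, lexical, binary_rules):
    """Inside probabilities: total probability of all parses of each span.

    lexical: {(i, i+1): {nonterminal: prob}};
    binary_rules: list of (prob, parent, left, right) tuples.
    """
    beta = defaultdict(lambda: defaultdict(float))
    for span, entries in lexical.items():
        for nt, p in entries.items():
            beta[span][nt] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for p, parent, left, right in binary_rules:
                    if left in beta[(i, k)] and right in beta[(k, j)]:
                        # sum, don't take the best: accumulate every derivation
                        beta[(i, j)][parent] += p * beta[(i, k)][left] * beta[(k, j)][right]
    return beta
```

beta[(0, n)]["S"] is then the probability of the whole string, summing over exponentially many parses in O(n^3) time.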