Probabilistic CKY Roger Levy [thanks to Jason Eisner]


Managing Ambiguity
John saw Mary / Typhoid Mary / Phillips screwdriver Mary (note how rare rules interact)
I see a bird: is this 4 nouns, parsed like "city park scavenger bird"? Rare parts of speech, plus systematic ambiguity in noun sequences.
Time flies like an arrow / Fruit flies like a banana / Time reactions like this one / Time reactions like a chemist (or is it just an NP?)

Our bane: Ambiguity
John saw Mary / Typhoid Mary / Phillips screwdriver Mary (note how rare rules interact)
I see a bird: is this 4 nouns, parsed like "city park scavenger bird"? Rare parts of speech, plus systematic ambiguity in noun sequences.
Time | flies like an arrow          (NP | VP)
Fruit flies | like a banana         (NP | VP)
Time | reactions like this one      (V[stem] | NP)
Time reactions | like a chemist     (S | PP)  ... or is it just an NP?

How to solve this combinatorial explosion of ambiguity?
1. First try parsing without any weird rules, throwing them in only if needed.
2. Better: every rule has a weight. A tree's weight is the total weight of all its rules. Pick the overall lightest parse of the sentence.
3. Can we pick the weights automatically? We'll get to this later …

The plan for the rest of parsing
There's probability (inference) and then there's statistics (model estimation). These two problems are logically separate, yet interrelated in practice.
We'll start with the problem of efficient inference (probabilistic bottom-up parsing).
Then we'll move to the estimation of good (generative) models that permit efficient bottom-up parsing.
Finally, we'll mention state-of-the-art work that doesn't permit exact inference ("reranking" approaches).

The critical recursion
We'll do bottom-up parsing (weighted CKY). When combining constituents into a larger constituent:
- the weight of the new constituent is the sum of the weights of the combined subconstituents …
- … plus the weight of the rule used to combine the subconstituents.
(A small code sketch of this recursion follows.)
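Here is a minimal sketch of that recursion in Python. The chart layout (a dictionary of cells keyed by span) and names such as weighted_cky, lexicon, and rules are illustrative choices, not something fixed by the slides.

```python
from collections import defaultdict

def weighted_cky(words, lexicon, rules):
    """Bottom-up weighted CKY: find the lightest derivation of each constituent.

    lexicon: dict mapping a word to a list of (nonterminal, weight) pairs.
    rules:   list of (parent, left_child, right_child, rule_weight) binary rules.
    Returns chart[(i, j)] = {nonterminal: lightest weight over words i..j-1}.
    """
    n = len(words)
    chart = defaultdict(dict)

    # Lexical entries on the diagonal.
    for i, word in enumerate(words):
        for nt, wt in lexicon[word]:
            if wt < chart[(i, i + 1)].get(nt, float("inf")):
                chart[(i, i + 1)][nt] = wt

    # Wider spans: combine two adjacent subconstituents with a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                      # split point
                for parent, left, right, rule_wt in rules:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        # New weight = left subconstituent + right subconstituent + rule.
                        total = chart[(i, k)][left] + chart[(k, j)][right] + rule_wt
                        if total < chart[(i, j)].get(parent, float("inf")):
                            chart[(i, j)][parent] = total  # keep only best-in-class
                            # (A real parser would also store a backpointer
                            #  (k, left, right) here to recover the best tree.)
    return chart
```

For the toy grammar on the next slide, rules would contain entries such as ("S", "NP", "VP", 1) and ("PP", "P", "NP", 0), and the best full parse is the lightest S in chart[(0, n)].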

time 1 flies 2 like 3 an 4 arrow 5

Grammar (weight, then rule):
  1  S → NP VP
  6  S → Vst NP
  2  S → S PP
  1  VP → V NP
  2  VP → VP PP
  1  NP → Det N
  2  NP → NP PP
  3  NP → NP NP
  0  PP → P NP

The chart is filled bottom-up, one entry at a time. Cell [i,j] covers words i+1 through j; each entry is a nonterminal with the weight of one derivation (the weights of the combined subconstituents plus the weight of the rule). The completed chart:

  [0,1]  NP 3, Vst 3
  [1,2]  NP 4, VP 4
  [2,3]  P 2, V 5
  [3,4]  Det 1
  [4,5]  N 8
  [0,2]  NP 10, S 8, S 13
  [3,5]  NP 10
  [2,5]  PP 12, VP 16
  [1,5]  NP 18, S 21, VP 18
  [0,5]  NP 24, S 22, S 27, NP 24, S 27, S 22, S 27

Follow backpointers from the best S (weight 22) in cell [0,5] to recover the parse: S → NP VP, then the VP → VP PP, the PP → P NP, and the NP → Det N.

Which entries do we need? An entry whose nonterminal already appears in the same cell with a lower weight is not worth keeping, since it just breeds worse options ("inferior stock"). Keep only best-in-class (and backpointers, so you can recover the parse):

  [0,1]  NP 3, Vst 3
  [1,2]  NP 4, VP 4
  [2,3]  P 2, V 5
  [3,4]  Det 1
  [4,5]  N 8
  [0,2]  NP 10, S 8
  [3,5]  NP 10
  [2,5]  PP 12, VP 16
  [1,5]  NP 18, S 21, VP 18
  [0,5]  NP 24, S 22

Probabilistic Trees
Instead of the lightest-weight tree, take the highest-probability tree.
Given any tree, your assignment 1 generator would have some probability of producing it. Just like using n-grams to choose among strings …
What is the probability of this tree? You rolled a lot of independent dice …

  (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))

We want p(tree | S).

Chain rule: One word at a time
p(time flies like an arrow) = p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)

Chain rule + backoff (to get trigram model)
p(time flies like an arrow) ≈ p(time) * p(flies | time) * p(like | time flies) * p(an | flies like) * p(arrow | like an)
(Backoff truncates each conditioning context to just the previous two words.)

Chain rule – written differently
p(time flies like an arrow) = p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an)
Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Chain rule + backoff
p(time flies like an arrow) = p(time) * p(time flies | time) * p(time flies like | time flies) * p(time flies like an | time flies like) * p(time flies like an arrow | time flies like an)
(As before, the long conditioning contexts get backed off.)
Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)

Chain rule: One node at a time
p(tree | S) = p(S expands to NP VP | S)
            * p(the NP expands to "time" | partial tree so far)
            * p(the VP expands to VP PP | partial tree so far)
            * p(the lower VP expands to "flies" | partial tree so far)
            * …
Each factor conditions on the entire partial tree generated so far.

Chain rule + backoff
As with n-grams, back off the conditioning contexts: condition each expansion only on the nonterminal being rewritten, not on the rest of the partial tree built so far.

Simplified notation
p(tree | S) = p(S → NP VP | S) * p(NP → time | NP) * p(VP → VP PP | VP) * p(VP → flies | VP) * …
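This product is easy to compute directly. A small sketch in Python (the nested-tuple tree encoding and the rule_prob dictionary are illustrative assumptions, not notation from the slides):

```python
def tree_prob(tree, rule_prob):
    """p(tree | root) = product over nodes of p(rule | parent nonterminal).

    tree:      a (label, child, child, ...) tuple, with terminal words as plain strings.
    rule_prob: dict mapping (parent, (child_label, ...)) -> probability.
    """
    if isinstance(tree, str):        # a terminal word adds no extra factor of its own
        return 1.0
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = rule_prob[(label, child_labels)]
    for child in children:
        prob *= tree_prob(child, rule_prob)
    return prob

# The tree from the previous slides:
tree = ("S", ("NP", "time"),
             ("VP", ("VP", "flies"),
                    ("PP", ("P", "like"),
                           ("NP", ("Det", "an"), ("N", "arrow")))))
```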

Already have a CKY alg for weights …
w(tree | S) = w(S → NP VP) + w(NP → time) + w(VP → VP PP) + w(VP → flies) + …
Just let w(X → Y Z) = −log p(X → Y Z | X). Then the lightest tree has the highest probability.
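Continuing the hypothetical rule_prob table from the sketch above, the conversion is one line; the weighted-CKY sketch from earlier then finds the most probable parse by finding the lightest one.

```python
import math

def probs_to_weights(rule_prob):
    """w(X -> Y Z) = -log p(X -> Y Z | X): minimizing weight maximizes probability."""
    return {rule: -math.log(p) for rule, p in rule_prob.items()}
```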

The same chart works with probabilities on the rules: multiply to get each new entry's probability (the probability of the rule times the probabilities of the combined subconstituents). As before, we need only the best-in-class entry in each cell to get the best parse.

Why probabilities, not weights?
We just saw that probabilities are really just a special case of weights …
… but we can estimate them from training data by counting and smoothing!
Warning: what kind of training data do we need for this type of estimation?
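A minimal sketch of the counting, assuming the training data is a treebank of parsed trees in the nested-tuple form used above (smoothing is omitted): the relative-frequency estimate is p(X → α | X) = count(X → α) / count(X).

```python
from collections import Counter

def estimate_pcfg(treebank):
    """Relative-frequency (maximum-likelihood) rule probabilities from parsed trees."""
    rule_counts, parent_counts = Counter(), Counter()

    def collect(tree):
        if isinstance(tree, str):                  # terminal word: nothing to count
            return
        label, *children = tree
        child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, child_labels)] += 1    # count(X -> alpha)
        parent_counts[label] += 1                  # count(X)
        for child in children:
            collect(child)

    for t in treebank:
        collect(t)
    return {rule: n / parent_counts[rule[0]] for rule, n in rule_counts.items()}
```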

A slightly different task
One task: what is the probability of generating a given tree with a PCFG generator? Picking the tree with the highest probability is useful in parsing.
But we could also ask: what is the probability of generating a given string with the generator? Picking the string with the highest probability is useful in speech recognition, as a substitute for an n-gram model ("Put the file in the folder" vs. "Put the file and the folder").
To get the probability of generating a string, we must add up the probabilities of all trees for the string …

Could we just add up the parse probabilities one by one? Oops: that is back to enumerating exponentially many parses.
Any more efficient way? Yes: add as we go while filling the chart (the "inside algorithm"). Instead of keeping only the best entry for each nonterminal in a cell, each entry accumulates the total probability of all ways of building that nonterminal over that span.
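A sketch of that change, reusing the chart layout and hypothetical names from the weighted-CKY sketch above: the min over derivations becomes a running sum of probabilities.

```python
from collections import defaultdict

def inside(words, lexicon, rules, root="S"):
    """Inside algorithm: total probability of all parses of the sentence.

    lexicon: dict mapping a word to a list of (nonterminal, probability) pairs.
    rules:   list of (parent, left_child, right_child, rule_probability) binary rules.
    """
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))

    for i, word in enumerate(words):
        for nt, p in lexicon[word]:
            chart[(i, i + 1)][nt] += p

    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right, rule_p in rules:
                    # Sum (rather than min/max) over every way of building this constituent.
                    chart[(i, j)][parent] += rule_p * chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][root]
```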

PCFGs and HMMs
There is a natural connection between:
- the inside algorithm for filling out a probabilistic CKY chart, and
- the forward algorithm for filling out an HMM trellis.
Do you see the relationship?
However, bottom-up for PCFGs and left-to-right for HMMs are not the only orders in which to run these dynamic programs:
- you can go outside-in for PCFGs,
- we'll see left-to-right parsing (the Earley algorithm) for PCFGs,
- and you can go right-to-left for HMMs.
(A compact forward-algorithm sketch follows, to make the parallel concrete.)
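For comparison, a minimal forward-algorithm sketch for an HMM (the dictionary-based parameter tables start_p, trans_p, emit_p are assumptions for illustration): like the inside algorithm, each trellis cell sums the probabilities of all ways of reaching it.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """HMM forward algorithm: total probability of the observation sequence."""
    # alpha[s] = total probability of all state paths ending in state s so far.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {t: emit_p[t][obs] * sum(alpha[s] * trans_p[s][t] for s in states)
                 for t in states}
    return sum(alpha.values())
```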

Charts and lattices
You can equivalently represent a parse chart as a lattice constructed over some initial arcs. This will also set the stage for Earley parsing later.
[Lattice figure: the sentence "salt flies scratch", with arcs for N, NP, V, VP, and S analyses over its spans, built with rules such as S → NP VP, VP → V NP, VP → V, NP → N NP, NP → N.]

(Speech) Lattices
There was nothing magical about words spanning exactly one position. When working with speech, we generally don't know how many words there are, or where they break. We can represent the possibilities as a lattice and parse these just as easily.
[Speech lattice figure: competing word hypotheses such as I, eyes, Ivan, saw, awe, 've, a, an, of, and van over overlapping stretches of the audio.]