Parsing Tricks 600.465 - Intro to NLP - J. Eisner

Left-Corner Parsing: a technique for 1 word of lookahead in algorithms like Earley’s (can also do multi-word lookahead, but it’s harder).

Basic Earley’s Algorithm (chart for input "0 Papa 1"; operation: attach)
Column 0: 0 ROOT → . S | 0 S → . NP VP | 0 NP → . Det N | 0 NP → . NP PP | 0 NP → . Papa | 0 Det → . the | 0 Det → . a
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP
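A minimal sketch of how such chart items can be represented and how the attach step advances a waiting "customer" item. The data structures are illustrative assumptions, not the course's reference implementation:

from collections import namedtuple

# An Earley item: (start column, dotted rule, dot position).
# Item(0, ("S", ("NP", "VP")), 1) stands for "0 S -> NP . VP".
Item = namedtuple("Item", ["start", "rule", "dot"])

def next_symbol(item):
    """Return the symbol just after the dot, or None if the item is complete."""
    lhs, rhs = item.rule
    return rhs[item.dot] if item.dot < len(rhs) else None

def attach(complete_item, columns, current_column):
    """Attach (complete): a finished constituent of category lhs lets every
    customer item in its start column that was waiting for lhs advance its dot."""
    lhs, _ = complete_item.rule
    for customer in list(columns[complete_item.start]):
        if next_symbol(customer) == lhs:
            columns[current_column].append(
                Item(customer.start, customer.rule, customer.dot + 1))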

predict (chart for input "0 Papa 1")
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP

predict (chart for input "0 Papa 1")
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP | 1 PP → . P NP

.V makes us add all the verbs in the vocabulary! Slow – we’d like a shortcut.
predict (chart for input "0 Papa 1")
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP | 1 PP → . P NP | 1 V → . ate | 1 V → . drank | 1 V → . snorted

Every .VP adds all VP → … rules again. Before adding a rule, check it’s not a duplicate. Slow if there are > 700 VP → … rules, so what will you do in Homework 4?
(Chart as on the previous slide.)
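One possible shortcut, sketched here under assumed data structures and reusing the Item representation from the earlier sketch (this is not the assignment's required answer): remember which categories have already been predicted in each column, so a category's 700+ rules are expanded at most once per column.

def predict(category, j, columns, predicted, rules_by_lhs):
    """Add dotted rules "j category -> . rhs" to column j, but only the first
    time `category` is predicted there; later predictions are no-ops, so the
    700+ VP rules are expanded at most once per column."""
    if category in predicted[j]:          # predicted[j] is a set of categories
        return
    predicted[j].add(category)
    for rhs in rules_by_lhs.get(category, ()):
        columns[j].append(Item(j, (category, rhs), 0))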

.P makes us add all the prepositions …
predict (chart for input "0 Papa 1")
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP | 1 PP → . P NP | 1 V → . ate | 1 V → . drank | 1 V → . snorted | 1 P → . with

1-word lookahead would help
(Chart as on the previous slide; the next input word is ate.)
No point in adding words other than ate.

1-word lookahead would help
(Chart as on the previous slide; the next input word is ate.)
No point in adding words other than ate; in fact, no point in adding any constituent that can’t start with ate. Don’t bother adding PP, P, etc.
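A sketch of using the 1-word lookahead for terminal rules: index the lexicon by word, so the predictor only adds preterminal rules compatible with the next input word (here ate). The rule format is an assumption:

from collections import defaultdict

def build_word_index(lexical_rules):
    """Map each word to the preterminals that can produce it, e.g. after
    indexing rules like ('V', 'ate') we get index['ate'] == {'V'}."""
    index = defaultdict(set)
    for preterminal, word in lexical_rules:
        index[word].add(preterminal)
    return index

def useful_preterminal_rules(next_word, word_index):
    """With 1-word lookahead, only preterminal rules for the next word are
    worth adding: e.g. 1 V -> . ate, but not 1 V -> . drank."""
    return [(preterminal, next_word) for preterminal in word_index.get(next_word, ())]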

With Left-Corner Filter (chart for input "0 Papa 1"; next input word: ate; operation: attach)
Column 0: 0 ROOT → . S | 0 S → . NP VP | 0 NP → . Det N | 0 NP → . NP PP | 0 NP → . Papa | 0 Det → . the | 0 Det → . a
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP
PP can’t start with ate. Birth control – now we won’t predict 1 PP → . P NP or 1 P → . with either! Need to know that ate can’t start PP: take the closure of all categories that it does start …
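A sketch of precomputing the table this filter needs: for each word, the closure of all categories that can start with it. The rule and lexicon formats are assumptions:

from collections import defaultdict

def left_corner_table(rules_by_lhs, lexicon):
    """For each word w, compute the closure of categories X that can start
    with w (X =>* w ...).  rules_by_lhs maps a nonterminal to a list of
    right-hand sides; lexicon maps a preterminal to the words it covers."""
    parents = defaultdict(set)              # leftmost RHS symbol -> possible parents
    for lhs, rhss in rules_by_lhs.items():
        for rhs in rhss:
            if rhs:
                parents[rhs[0]].add(lhs)

    table = defaultdict(set)                # word -> categories that can start with it
    for preterminal, words in lexicon.items():
        for w in words:
            table[w].add(preterminal)

    changed = True
    while changed:                          # take the closure upward through the rules
        changed = False
        for cats in table.values():
            for c in list(cats):
                new = parents.get(c, set()) - cats
                if new:
                    cats.update(new)
                    changed = True
    return table

With this table, the predictor simply skips any category that is not in table[next_word], so PP and P are never predicted when the next word is ate.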

predict (next input word: ate)
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP

predict (next input word: ate)
Column 0: unchanged from the previous chart
Column 1: 0 NP → Papa . | 0 S → NP . VP | 0 NP → NP . PP | 1 VP → . V NP | 1 VP → . VP PP | 1 V → . ate | 1 V → . drank | 1 V → . snorted

Merging Right-Hand Sides
Grammar might have rules X → A G H P and X → B G H P.
Could end up with both of these in chart: (2, X → A . G H P) in column 5 and (2, X → B . G H P) in column 5.
But these are now interchangeable: if one produces X then so will the other.
To avoid this redundancy, can always use dotted rules of this form: X → ... G H P

Merging Right-Hand Sides
Similarly, grammar might have rules X → A G H P and X → A G H Q.
Could end up with both of these in chart: (2, X → A . G H P) in column 5 and (2, X → A . G H Q) in column 5.
Not interchangeable, but we’ll be processing them in parallel for a while …
Solution: write grammar as X → A G H (P|Q)

Merging Right-Hand Sides
Combining the two previous cases: X → A G H P, X → A G H Q, X → B G H P, X → B G H Q becomes X → (A | B) G H (P | Q).
And often nice to write stuff like NP → (Det | ε) Adj* N

Merging Right-Hand Sides
X → (A | B) G H (P | Q)
NP → (Det | ε) Adj* N
These are regular expressions! Build their minimal DFAs.
[Figure: DFAs for X → (A | B) G H (P | Q), with arcs A, B, G, H, P, Q, and for NP → (Det | ε) Adj* N, with arcs Det, Adj, N]
Automaton states replace dotted rules (X → A G . H P).
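A sketch of the prefix-merging step, using a trie rather than a fully minimized DFA (true minimization would also merge the shared G H P / G H Q suffixes). The state representation is an assumption:

def merge_right_hand_sides(rhss):
    """Merge right-hand sides that share prefixes into a trie of states.
    Each state is a dict of outgoing arcs; the special key '<final>' marks
    states where a complete right-hand side ends (so attach is allowed)."""
    root = {}
    for rhs in rhss:                        # e.g. ('A', 'G', 'H', 'P') and ('B', 'G', 'H', 'P')
        state = root
        for symbol in rhs:
            state = state.setdefault(symbol, {})
        state["<final>"] = True
    return root

A dotted rule like X → A G . H P then becomes a pair (X, state-after-A-G), and two rules that have reached the same state are stored in the chart only once.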

Merging Right-Hand Sides
Indeed, all NP → … rules can be unioned into a single DFA!
NP → ADJP ADJP JJ JJ NN NNS
NP → ADJP DT NN
NP → ADJP JJ NN
NP → ADJP JJ NN NNS
NP → ADJP JJ NNS
NP → ADJP NN
NP → ADJP NN NN
NP → ADJP NN NNS
NP → ADJP NNS
NP → ADJP NPR
NP → ADJP NPRS
NP → DT
NP → DT ADJP
NP → DT ADJP , JJ NN
NP → DT ADJP ADJP NN
NP → DT ADJP JJ JJ NN
NP → DT ADJP JJ NN
NP → DT ADJP JJ NN NN
etc.

Merging Right-Hand Sides
Indeed, all NP → … rules can be unioned into a single DFA!
NP → ADJP ADJP JJ JJ NN NNS | ADJP DT NN | ADJP JJ NN | ADJP JJ NN NNS | ADJP JJ NNS | ADJP NN | ADJP NN NN | ADJP NN NNS | ADJP NNS | ADJP NPR | ADJP NPRS | DT | DT ADJP | DT ADJP , JJ NN | DT ADJP ADJP NN | DT ADJP JJ JJ NN | DT ADJP JJ NN | DT ADJP JJ NN NN | etc.
[Figure: the NP regular expression compiled into a DFA, with arcs such as DT, ADJP, …; the ADJP → … rules get their own DFA in the same way]

Earley’s Algorithm on DFAs
What does Earley’s algorithm now look like?
[Figure: Column 4 of the chart contains an item (2, state), where the second component is a state of the VP → … PP NP … automaton; operation: predict]

Earley’s Algorithm on DFAs
What does Earley’s algorithm now look like?
[Figure: Column 4 contains (2, a VP → … PP NP … automaton state) and, after predict, (4, a start state); the predicted automata shown include NP (arcs Det, Adj, N, PP) and PP → …]

Earley’s Algorithm on DFAs
What does Earley’s algorithm now look like?
[Figure: Columns 4, 5, …, 7 of the chart. Column 4 holds (2, …) and (4, …); Column 7 holds (4, …), an NP-automaton state reached after more words. Predict or attach?]

Earley’s Algorithm on DFAs
What does Earley’s algorithm now look like?
[Figure: as on the previous slide, but Column 7 now holds both (4, …) and a new (7, …) item. Predict or attach? Both!]
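A sketch of that decision at a single automaton state, using the trie-style states from the earlier sketch (an assumed representation): a final state licenses attach, outgoing nonterminal arcs license predict, and one state can license both.

def applicable_operations(state, nonterminals):
    """Given the automaton state of a chart item (start, state), decide which
    Earley operations apply there.  A final state allows attach; outgoing
    nonterminal arcs call for predict; both may apply at once ("Both!")."""
    ops = []
    if state.get("<final>"):
        ops.append("attach")
    if any(arc in nonterminals for arc in state if arc != "<final>"):
        ops.append("predict")
    return ops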

Pruning and Prioritization
Heuristically throw away constituents that probably won’t make it into the best complete parse. Use probabilities to decide which ones. So probs are useful for speed as well as accuracy! Both safe and unsafe methods exist.
Iterative deepening: Throw x away if p(x) < 10^-200 (and lower this threshold if we don’t get a parse).
Heuristic pruning: Throw x away if p(x) < 0.01 * p(y) for some y that spans the same set of words (for example).
Prioritization: If p(x) is low, don’t throw x away; just postpone using it until you need it (hopefully you won’t).
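A sketch of the heuristic-pruning rule above; the input format (a list of (label, probability) pairs for one span) is an assumption:

def prune_span(constituents, beam=0.01):
    """Heuristic pruning for one span: keep only constituents whose probability
    is within a factor `beam` of the best constituent over the same words.
    Unsafe (the best full parse might use a pruned piece) but fast."""
    best = max(prob for _, prob in constituents)
    return [(label, prob) for label, prob in constituents if prob >= beam * best]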

Prioritization continued: Agenda-Based Parsing
Prioritization: If p(x) is low, don’t throw x away; just postpone using it until you need it. In other words, explore best options first. Should get some good parses early on; then stop!
[Chart for "time 1 flies 2 like 3 an 4 arrow 5" with constituent weights: row 0: NP 3, Vst 3, NP 10, S 8, NP 24, S 22; row 1: NP 4, VP 4, NP 18, S 21, VP 18; row 2: P 2, V 5, PP 12, VP 16; row 3: Det 1; row 4: N 8]

Prioritization continued: Agenda-Based Parsing
until we pop a parse 0S5, or fail with an empty agenda:
  pop top element iYj from agenda into chart
  for each right neighbor jZk, for each rule X → Y Z in grammar: put iXk onto the agenda
  for each left neighbor hZi, for each rule X → Z Y: put hXj onto the agenda
Chart of good constituents; prioritized agenda of pending constituents (ordered by p(x), say).
[Chart for "time 1 flies 2 like 3 an 4 arrow 5": row 0: NP 3, Vst 3, S 8; row 1: NP 4, VP 4; row 2: P 2, V 5; row 3: Det 1; row 4: N 8. Agenda: 3NP5 10, 0NP2 10, …]

Prioritization continued: Agenda-Based Parsing
until we pop a parse 0S5, or fail with an empty agenda:
  pop top element iYj from agenda into chart
  for each right neighbor jZk, for each rule X → Y Z in grammar: put iXk onto the agenda
  for each left neighbor hZi, for each rule X → Z Y: put hXj onto the agenda
Chart of good constituents; prioritized agenda of pending constituents (ordered by p(x), say).
[Chart: row 0: NP 3, Vst 3, S 8; row 1: NP 4, VP 4; row 2: P 2, V 5; row 3: Det 1, NP 10; row 4: N 8. Agenda: 0NP2 10, …, 2VP5 16]

Prioritization continued: Agenda-Based Parsing
Always finds the best parse! Analogous to Dijkstra’s shortest-path algorithm.
until we pop a parse 0S5, or fail with an empty agenda:
  pop top element iYj from agenda into chart
  for each right neighbor jZk, for each rule X → Y Z in grammar: put iXk onto the agenda
  for each left neighbor hZi, for each rule X → Z Y: put hXj onto the agenda
Chart of good constituents; prioritized agenda of pending constituents (ordered by p(x), say).
[Chart: row 0: NP 3, Vst 3, S 8; row 1: NP 4, VP 4; row 2: P 2, V 5; row 3: Det 1, NP 10; row 4: N 8. Agenda: 0NP2 10, 2PP5 12, …, 2VP5 16; these will eventually both pop and combine]
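A sketch of the agenda loop above for a binary (CNF-style) grammar, with a heap keyed on -p(x) so the most probable pending constituent pops first, as in Dijkstra's algorithm. The grammar and lexicon formats are assumptions:

import heapq
from collections import defaultdict

def agenda_parse(words, lexicon, binary_rules, goal="S"):
    """lexicon: {word: [(category, prob), ...]}
       binary_rules: {(Y, Z): [(X, rule_prob), ...]}  for rules X -> Y Z
       Returns the probability of the best goal constituent covering all words."""
    n = len(words)
    agenda = []                                   # max-heap via negated probabilities
    best = {}                                     # (i, X, j) -> best prob already popped
    starts_at = defaultdict(list)                 # i -> chart constituents starting at i
    ends_at = defaultdict(list)                   # j -> chart constituents ending at j

    for i, w in enumerate(words):
        for cat, p in lexicon.get(w, []):
            heapq.heappush(agenda, (-p, i, cat, i + 1))

    while agenda:
        neg_p, i, Y, j = heapq.heappop(agenda)
        p = -neg_p
        if (i, Y, j) in best:
            continue                              # already popped with a better or equal prob
        best[(i, Y, j)] = p
        if (i, Y, j) == (0, goal, n):
            return p                              # first goal popped is the best parse
        starts_at[i].append((Y, j, p))
        ends_at[j].append((Y, i, p))
        for Z, k, q in starts_at[j]:              # right neighbors jZk
            for X, rp in binary_rules.get((Y, Z), []):
                heapq.heappush(agenda, (-(p * q * rp), i, X, k))
        for Z, h, q in ends_at[i]:                # left neighbors hZi
            for X, rp in binary_rules.get((Z, Y), []):
                heapq.heappush(agenda, (-(q * p * rp), h, X, j))
    return None                                   # empty agenda: no parse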

Outside Estimates for better Pruning and Prioritization
Iterative deepening: Throw x away if p(x)*q(x) < 10^-200 (lower this threshold if we don’t get a parse).
Heuristic pruning: Throw x away if p(x)*q(x) < 0.01*p(y)*q(y) for some y that spans the same set of words.
Prioritized agenda: Priority of x on agenda is p(x)*q(x); stop at first parse.
In general, the "inside prob" p(x) will be higher for smaller constituents (not many rule probabilities inside them). The "outside prob" q(x) is intended to correct for this: it estimates the prob of all the rest of the rules needed to build x into a full parse. So p(x)*q(x) estimates the prob of the best parse that contains x.
If we take q(x) to be the best estimate we can get, the methods may no longer be safe (but may be fast!); the prioritized agenda is then called a "best-first algorithm".
But if we take q(x)=1, that’s just the methods from the previous slides, and iterative deepening and prioritization were safe there.
If we take q(x) to be an "optimistic estimate" (always ≥ true prob), we are still safe! The prioritized agenda is then an example of an "A* algorithm".
Terminology warning: Here "inside" and "outside" mean the probability of the best partial parse inside or outside x. But traditionally, they mean the total prob of all such partial parses (as in the "inside algorithm" that we saw in the previous lecture).
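A sketch of how the agenda priority changes once an outside estimate q(x) is available. Where q comes from is not specified on these slides; it might, for example, be precomputed on a simplified grammar, which is an assumption here:

import math

def astar_priority(inside_prob, outside_estimate):
    """Agenda priority for a constituent x: p(x) * q(x), expressed in log space
    for a min-heap.  If q is an optimistic estimate (always >= the true outside
    prob), the first full parse popped is still the best one (A*); with q = 1
    this reduces to the plain prioritized agenda above.  Assumes both inputs > 0."""
    return -(math.log(inside_prob) + math.log(outside_estimate))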

Preprocessing
First "tag" the input with parts of speech: guess the correct preterminal for each word, using faster methods we’ll learn later. Now only allow one part of speech per word. This eliminates a lot of crazy constituents! But if you tagged wrong you could be hosed.
Raise the stakes: What if the tag says not just "verb" but "transitive verb"? Or "verb with a direct object and 2 PPs attached"? ("supertagging")
Safer to allow a few possible tags per word, not just one …
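A sketch of the safer option: keep a few possible tags per word from a tagger rather than exactly one. The tagger's output format here is a hypothetical interface:

def allowed_preterminals(words, tag_distributions, k=3):
    """For each word, keep only its k most probable preterminals according to a
    tagger; the parser then refuses to build any other preterminal over that word.
    tag_distributions maps a word to [(tag, prob), ...] (a hypothetical interface)."""
    allowed = []
    for w in words:
        tags = sorted(tag_distributions.get(w, []), key=lambda tp: tp[1], reverse=True)
        allowed.append({tag for tag, _ in tags[:k]})
    return allowed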

Center-Embedding
STATEMENT → if EXPR then STATEMENT endif
STATEMENT → if EXPR then STATEMENT else STATEMENT endif
But not: STATEMENT → if EXPR then STATEMENT
(Example fragment from the slide: if x then if y … if a then b endif else b …)

Center-Embedding
This is the rat that ate the malt.
This is the malt that the rat ate.
This is the cat that bit the rat that ate the malt.
This is the malt that the rat that the cat bit ate.
This is the dog that chased the cat that bit the rat that ate the malt.
This is the malt that [the rat that [the cat that [the dog chased] bit] ate].

More Center-Embedding
[What did you disguise [those handshakes that you greeted [the people we bought [the bench [Billy was read to] on] with] for]?
[Which mantelpiece did you put [the idol I sacrificed [the fellow we sold [the bridge you threw [the bench [Billy was read to] on] off] to] on]?
Take that, English teachers!

Center Recursion vs. Tail Recursion
[What did you disguise [those handshakes that you greeted [the people we bought [the bench [Billy was read to] on] with] for]?
[For what did you disguise [those handshakes with which you greeted [the people with which we bought [the bench on which [Billy was read to]?
"pied piping" – the NP moves leftward and the preposition follows along.

Disallow Center-Embedding?
Center-embedding seems to be in the grammar, but people have trouble processing more than 1 level of it.
You can limit the # of levels of center-embedding via features: e.g., S[S_DEPTH=n+1] → A S[S_DEPTH=n] B.
If a CFG limits the # of levels of embedding, then it can be compiled into a finite-state machine – we don’t need a stack at all! Finite-state recognizers run in linear time.
However, it’s tricky to turn them into parsers for the original CFG from which the recognizer was compiled. And compiling a small grammar into a much larger FSA may be a net loss – structure sharing in the parse chart is expanded out to duplicate structure all over the FSA.
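A sketch of compiling the depth feature out of the grammar: expand the schema S[S_DEPTH=n+1] → A S[S_DEPTH=n] B into ordinary rules with a bounded depth index. The base-case rule S_0 → C is an illustrative assumption, not part of the slide's grammar:

def depth_bounded_rules(max_depth):
    """Expand the schema S[S_DEPTH=n+1] -> A S[S_DEPTH=n] B into ordinary CFG
    rules with a bounded depth index.  S_0 -> C is an illustrative base case
    that stops the recursion; with depth bounded, no unbounded stack is needed."""
    rules = [("S_0", ("C",))]
    for n in range(max_depth):
        rules.append((f"S_{n + 1}", ("A", f"S_{n}", "B")))
    return rules

# depth_bounded_rules(2) ==
# [('S_0', ('C',)), ('S_1', ('A', 'S_0', 'B')), ('S_2', ('A', 'S_1', 'B'))]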

Parsing Algs for non-CFG
If you’re going to make up a new kind of grammar, you should also describe how to parse it. Such algorithms exist, e.g.:
for TAG (where the grammar specifies not just rules but larger tree fragments, which can be combined by "substitution" and "adjunction" operations);
for CCG (where the grammar only specifies preterminal rules, and there are generic operations to combine slashed nonterminals like X/Y or (X/Z)/(Y\W)).