Statistical NLP Spring 2010 Lecture 12: Parsing I Dan Klein – UC Berkeley.


Competitive Parsing

[figure: competitive-parsing bracket over groups G1–G9]

Competitive Parsing

- Winners
  - Best grammaticality rate: Group 1
  - Best perplexity on class output: Group 9
  - Best perplexity on examples: Group 6
- Examples
  - that land has no sovereign .
  - each husk spoke ; which five ants again have carried ?
  - tropical yellow hard eight halves must know precisely ?
  - the they suggest neither precisely nor already across story to suggest
  - it is Patsy what migrated !
  - another chalice cover that castle .

Competitive Parsing

- More examples
  - True: Saxons know Arthur !
  - Poetic: snows are neither south nor again .
  - Weird: who does neither neither every castle nor the quest nor any quest so every coconut being halves rides another castle a coconut no coconut ?
  - Long: each home grow... either any defeater has been covering – every coconut suggests, and England was a defeater... nor areas should are for this land : but snows drink at that home below this chalice : either no king will spoke every winter of no fruit... or that master does each king : England above a quest been carrying !

Parse Trees

  The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market

Phrase Structure Parsing

- Phrase structure parsing organizes syntax into constituents or brackets
- In general, this involves nested trees
- Linguists can, and do, argue about details
- Lots of ambiguity
- Not the only kind of syntax…

[figure: parse tree (S, NP, VP, N', PP) over "new art critics write reviews with computers"]

Constituency Tests

- How do we know what nodes go in the tree?
- Classic constituency tests:
  - Substitution by proform
  - Question answers
  - Semantic grounds
    - Coherence
    - Reference
    - Idioms
  - Dislocation
  - Conjunction
- Cross-linguistic arguments, too

Conflicting Tests

- Constituency isn't always clear
- Units of transfer:
  - think about ~ penser à
  - talk about ~ hablar de
- Phonological reduction:
  - I will go → I'll go
  - I want to go → I wanna go
  - à le centre → au centre
- Coordination
  - He went to and came from the store.

[image caption: "La vélocité des ondes sismiques" (the velocity of seismic waves)]

Classical NLP: Parsing

- Write symbolic or logical rules:

    Grammar (CFG):
      ROOT → S
      S → NP VP
      NP → DT NN
      NP → NN NNS
      NP → NP PP
      VP → VBP NP
      VP → VBP NP PP
      PP → IN NP
    Lexicon:
      NN → interest
      NNS → raises
      VBP → interest
      VBZ → raises
      …

- Use deduction systems to prove parses from words
- Minimal grammar on "Fed raises" sentence: 36 parses
- Simple 10-rule grammar: 592 parses
- Real-size grammar: many millions of parses
- This scaled very badly and didn't yield broad-coverage tools

Ambiguities: PP Attachment

Attachments

- I cleaned the dishes from dinner
- I cleaned the dishes with detergent
- I cleaned the dishes in my pajamas
- I cleaned the dishes in the sink

Syntactic Ambiguities I

- Prepositional phrases:
  They cooked the beans in the pot on the stove with handles.
- Particle vs. preposition:
  The puppy tore up the staircase.
- Complement structures:
  The tourists objected to the guide that they couldn't hear.
  She knows you like the back of her hand.
- Gerund vs. participial adjective:
  Visiting relatives can be boring.
  Changing schedules frequently confused passengers.

Syntactic Ambiguities II

- Modifier scope within NPs:
  impractical design requirements
  plastic cup holder
- Multiple gap constructions:
  The chicken is ready to eat.
  The contractors are rich enough to sue.
- Coordination scope:
  Small rats and mice can squeeze into holes or cracks in the wall.

Probabilistic Context-Free Grammars

- A context-free grammar is a tuple <N, T, S, R>
  - N: the set of non-terminals
    - Phrasal categories: S, NP, VP, ADJP, etc.
    - Parts-of-speech (pre-terminals): NN, JJ, DT, VB
  - T: the set of terminals (the words)
  - S: the start symbol
    - Often written as ROOT or TOP
    - Not usually the sentence non-terminal S
  - R: the set of rules
    - Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
    - Examples: S → NP VP, VP → VP CC VP
    - Also called rewrites, productions, or local trees
- A PCFG adds:
  - A top-down production probability per rule, P(Y1 Y2 … Yk | X)
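The tuple definition above can be sketched as plain data. This is a minimal toy PCFG in Python (the grammar and probabilities are invented for illustration, not the lecture's grammar), with a check that each non-terminal's rule probabilities sum to one, as the conditional P(RHS | X) requires:

```python
from collections import defaultdict

# Toy PCFG: (LHS, RHS-tuple) -> P(RHS | LHS). Grammar and probabilities
# are invented for illustration.
rules = {
    ("S",  ("NP", "VP")):  1.0,
    ("NP", ("DT", "NN")):  0.6,
    ("NP", ("NNS",)):      0.4,
    ("VP", ("VBP", "NP")): 1.0,
}

def lhs_totals(rules):
    """Sum rule probabilities per LHS; a proper PCFG totals 1.0 for each."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in rules.items():
        totals[lhs] += p
    return dict(totals)

print(lhs_totals(rules))
```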

Treebank Sentences

Treebank Grammars

- Need a PCFG for broad-coverage parsing.
- Can take a grammar right off the trees (doesn't work well):

    ROOT → S  1
    S → NP VP .  1
    NP → PRP  1
    VP → VBD ADJP  1
    …

- Better results by enriching the grammar (e.g., lexicalization).
- Can also get reasonable parsers without lexicalization.

Treebank Grammar Scale

- Treebank grammars can be enormous
- As FSAs, the raw grammar has ~10K states, excluding the lexicon
- Better parsers usually make the grammars larger, not smaller

[figure: FSA fragment for NP rewrites, with arcs labeled DET, ADJ, NOUN, PLURAL NOUN, CONJ, NP, PP]

Chomsky Normal Form

- Chomsky normal form:
  - All rules of the form X → Y Z or X → w
  - In principle, this is no limitation on the space of (P)CFGs
    - N-ary rules introduce new non-terminals
    - Unaries / empties are "promoted"
- In practice it's kind of a pain:
  - Reconstructing n-aries is easy
  - Reconstructing unaries is trickier
  - The straightforward transformations don't preserve tree scores
- Makes parsing algorithms simpler!

[figure: binarizing VP → VBD NP PP PP with intermediate symbols [VP → VBD …] and [VP → VBD NP …]]
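The n-ary-to-binary step sketched on this slide can be written down directly. A minimal sketch, where the intermediate-symbol naming is my guess at the slide's bracketed notation:

```python
def binarize(rule):
    """Right-binarize an n-ary rule X -> Y1 ... Yk into binary rules,
    introducing intermediate symbols modeled on the slide's bracketed
    notation (the exact naming scheme is an assumption)."""
    lhs, rhs = rule
    if len(rhs) <= 2:
        return [rule]            # already unary or binary
    out = []
    prev = lhs
    for i in range(len(rhs) - 2):
        # Intermediate symbol records the children consumed so far,
        # so the original n-ary rule can be reconstructed afterwards.
        mid = "[%s->%s...]" % (lhs, ".".join(rhs[: i + 1]))
        out.append((prev, (rhs[i], mid)))
        prev = mid
    out.append((prev, (rhs[-2], rhs[-1])))
    return out

print(binarize(("VP", ("VBD", "NP", "PP", "PP"))))
```

Each intermediate symbol has exactly one parent rule, which is why "reconstructing n-aries is easy": collapsing the chain of bracketed symbols recovers the original rule.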

A Recursive Parser

- Will this parser work?
- Why or why not?
- Memory requirements?

    bestScore(X, i, j, s)
      if (j == i+1)
        return tagScore(X, s[i])
      else
        return max over rules X → Y Z and splits k:
          score(X → Y Z) * bestScore(Y, i, k) * bestScore(Z, k, j)

A Memoized Parser

- One small change:

    bestScore(X, i, j, s)
      if (scores[X][i][j] == null)
        if (j == i+1)
          score = tagScore(X, s[i])
        else
          score = max over rules X → Y Z and splits k:
            score(X → Y Z) * bestScore(Y, i, k) * bestScore(Z, k, j)
        scores[X][i][j] = score
      return scores[X][i][j]
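The memoized pseudocode above maps almost line-for-line onto Python's `functools.lru_cache`. A toy sketch with an invented two-rule grammar and sentence (not from the lecture):

```python
from functools import lru_cache

# Toy CNF grammar and sentence, invented for illustration.
binary_rules = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 1.0)],
    "VP": [(("VBD", "NP"), 1.0)],
}
tag_score = {
    ("DT", "the"): 1.0, ("NN", "dog"): 0.5,
    ("NN", "cat"): 0.5, ("VBD", "saw"): 1.0,
}
sentence = ["the", "dog", "saw", "the", "cat"]

@lru_cache(maxsize=None)   # the memo table: one entry per (X, i, j)
def best_score(X, i, j):
    if j == i + 1:                        # base case: tag a single word
        return tag_score.get((X, sentence[i]), 0.0)
    best = 0.0
    for (Y, Z), p in binary_rules.get(X, []):
        for k in range(i + 1, j):         # try every split point
            best = max(best, p * best_score(Y, i, k) * best_score(Z, k, j))
    return best

print(best_score("S", 0, len(sentence)))  # 0.25
```

Without the cache decorator this is the exponential recursive parser from the previous slide; with it, each (X, i, j) cell is computed once.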

A Bottom-Up Parser (CKY)

- Can also organize things bottom-up

    bestScore(s)
      for (i : [0, n-1])
        for (X : tags[s[i]])
          score[X][i][i+1] = tagScore(X, s[i])
      for (diff : [2, n])
        for (i : [0, n-diff])
          j = i + diff
          for (X → Y Z : rule)
            for (k : [i+1, j-1])
              score[X][i][j] = max(score[X][i][j],
                score(X → Y Z) * score[Y][i][k] * score[Z][k][j])

[diagram: X over span (i, j) built from Y over (i, k) and Z over (k, j)]
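The loop structure on this slide transcribes directly into Python. A self-contained sketch over an invented toy grammar (the grammar and scores are illustrative assumptions, not the treebank grammar):

```python
# Toy CNF grammar (invented): rules as (X, Y, Z, prob), plus word tags.
rules = [("S", "NP", "VP", 1.0), ("NP", "DT", "NN", 1.0), ("VP", "VBD", "NP", 1.0)]
tags = {"the": [("DT", 1.0)], "dog": [("NN", 0.5)],
        "saw": [("VBD", 1.0)], "cat": [("NN", 0.5)]}

def cky(sent):
    n = len(sent)
    score = {}                                  # (X, i, j) -> best inside score
    for i, w in enumerate(sent):                # width-1 spans: tag the words
        for X, p in tags.get(w, []):
            score[(X, i, i + 1)] = p
    for diff in range(2, n + 1):                # widths 2..n, bottom-up
        for i in range(0, n - diff + 1):
            j = i + diff
            for X, Y, Z, p in rules:
                for k in range(i + 1, j):       # split point
                    s = p * score.get((Y, i, k), 0.0) * score.get((Z, k, j), 0.0)
                    if s > score.get((X, i, j), 0.0):
                        score[(X, i, j)] = s
    return score

chart = cky(["the", "dog", "saw", "the", "cat"])
print(chart[("S", 0, 5)])  # 0.25
```

Because spans are filled in order of increasing width, every `score[Y][i][k]` and `score[Z][k][j]` a rule needs is already final when it is read, which is the point of organizing the recursion bottom-up.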

Unary Rules

- Unary rules?

    bestScore(X, i, j, s)
      if (j == i+1)
        return tagScore(X, s[i])
      else
        return max of
          max over rules X → Y Z and splits k:
            score(X → Y Z) * bestScore(Y, i, k) * bestScore(Z, k, j)
          max over rules X → Y:
            score(X → Y) * bestScore(Y, i, j)

CNF + Unary Closure

- We need unaries to be non-cyclic
- Can address by pre-calculating the unary closure
  - Rather than having zero or more unaries, always have exactly one
- Alternate unary and binary layers
- Reconstruct unary chains afterwards

[figure: trees for NP → DT NN, VP → VBD NP, and S → SBAR → VP, before and after collapsing unary chains]
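Pre-calculating the unary closure can be sketched as a simple relaxation: for every pair of symbols, keep the best score of any unary chain between them. The unary probabilities below are invented for illustration:

```python
# Invented unary rule probabilities, for illustration only.
unary = {("S", "VP"): 0.1, ("VP", "VBD"): 0.2, ("NP", "NN"): 0.5}

def unary_closure(unary, symbols):
    """best[(X, Y)] = best score of any unary chain X ->* Y."""
    best = {(s, s): 1.0 for s in symbols}   # the empty chain scores 1.0
    best.update(unary)
    changed = True
    while changed:                           # relax until no chain improves
        changed = False
        for (x, y), p in list(best.items()):
            for (y2, z), q in unary.items():
                if y2 == y and best.get((x, z), 0.0) < p * q:
                    best[(x, z)] = p * q     # extend chain X ->* Y -> Z
                    changed = True
    return best

closure = unary_closure(unary, {"S", "VP", "VBD", "NP", "NN"})
print(closure[("S", "VBD")])  # chain S -> VP -> VBD, score 0.1 * 0.2
```

With the closure in hand, the parser applies exactly one "effective unary" per layer and reconstructs the actual chain (here S → VP → VBD) afterwards, as the slide describes.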

Alternating Layers

    bestScoreU(X, i, j, s)
      if (j == i+1)
        return tagScore(X, s[i])
      else
        return max over rules X → Y:
          score(X → Y) * bestScoreB(Y, i, j)

    bestScoreB(X, i, j, s)
      return max over rules X → Y Z and splits k:
        score(X → Y Z) * bestScoreU(Y, i, k) * bestScoreU(Z, k, j)

Memory

- How much memory does this require?
  - Have to store the score cache
  - Cache size: |symbols| * n² doubles
  - For the plain treebank grammar: X ~ 20K, n = 40, double ~ 8 bytes → ~256 MB
  - Big, but workable.
- Pruning: Beams
  - score[X][i][j] can get too large (when?)
  - Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i, j]
- Pruning: Coarse-to-Fine
  - Use a smaller grammar to rule out most X[i,j]
  - Much more on this later…
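The ~256 MB figure follows directly from the sizes the slide states; a quick back-of-the-envelope check:

```python
# Check of the slide's cache-size estimate: |symbols| * n^2 doubles.
symbols = 20_000        # X ~ 20K grammar symbols
n = 40                  # words in the sentence
bytes_per_double = 8
cache_bytes = symbols * n * n * bytes_per_double
print(cache_bytes)      # 256000000 bytes, i.e. the slide's "~256 MB"
```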

Time: Theory

- How much time will it take to parse?
  - For each diff (≤ n)
    - For each i (≤ n)
      - For each rule X → Y Z
        - For each split point k: do constant work
- Total time: |rules| * n³
- Something like 5 sec for an unoptimized parse of a 20-word sentence

Time: Practice

- Parsing with the vanilla treebank grammar:

  [plot: parse time vs. sentence length; ~20K rules (not an optimized parser!); observed exponent: 3.6]

- Why's it worse in practice?
  - Longer sentences "unlock" more of the grammar
  - All kinds of systems issues don't scale

Efficient CKY

- Lots of tricks to make CKY efficient
- Most of them are little engineering details:
  - E.g., first choose k, then enumerate through the Y:[i,k] which are non-zero, then loop through rules by left child.
  - Optimal layout of the dynamic program depends on grammar, input, even system details.
- Another kind is more critical:
  - Many X:[i,j] can be suppressed on the basis of the input string
  - We'll see this next class as figures-of-merit or A* heuristics

Same-Span Reachability

[figure: same-span reachability graph over categories ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP, TOP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, PRT]

Rule State Reachability

- Many states are more likely to match larger spans!

[figure: alignments of dotted rule states over a length-n span — the state NP CC • has 1 alignment, while NP CC • NP has n alignments]

Non-Local Phenomena

- Dislocation / gapping
  - Why did the postman think that the neighbors were home?
  - A debate arose which continued until the election.
- Binding
  - Reference: The IRS audits itself
- Control
  - I want to go
  - I want you to go

Regularity of Rules

- Argumentation
- Adjunction
- Coordination
- X' Theory

Human Processing

- Garden pathing
- Ambiguity maintenance

PP Attachment

An Example

[figure: chart cell (6, 7) in a parse of "new art critics write reviews with computers", with labels S, NP, VP, PP, JJ, NNS, VBP]

Unaries in Grammars

[figure: five treatments of the tree for "Atone" (TOP, S-HLN, NP-SUBJ, VP, VB, -NONE- ε): PTB Tree, NoTransform, NoEmpties, NoUnaries, High/Low]

Dark Ambiguities

- Dark ambiguities: most analyses are shockingly bad (meaning, they don't have an interpretation you can get your mind around)
- Unknown words and new usages
- Solution: We need mechanisms to focus attention on the best ones; probabilistic techniques do this

[figure caption: This analysis corresponds to the correct parse of "This will panic buyers !"]

The Parsing Problem

[figure: parse chart over "new art critics write reviews with computers" with labels S, NP, VP, PP and cell (6, 7)]