Parsing I: Context-free grammars and issues for parsers


2/45 Bibliography
More or less all books on CL or NLP will have chapters on parsing, and some may be all or mostly about parsing. Many are written for computer scientists:
–They explain linguistic things like POSs and PS grammars
–They go into detail about implementation (e.g. talk of Earley's algorithm, shift-reduce parsers)
D Jurafsky & JH Martin, Speech and Language Processing. Upper Saddle River, NJ (2000): Prentice Hall. Chs 9 & 10
RM Kaplan, 'Syntax', Ch 4 of R Mitkov (ed), The Oxford Handbook of Computational Linguistics. Oxford (2003): OUP
J Allen, Natural Language Understanding (2nd ed) (1994): Addison Wesley

3/45 Parsing
Bedrock of (almost) all NLP
Familiar from linguistics
But issues for NLP are practicalities rather than universality:
–Implementation
–Grammar writing
–Interplay with lexicons
–Suitability of representation (what do the trees show?)

4/45 Basic issues
Top-down vs. bottom-up
Handling ambiguity
–Lexical ambiguity
–Structural ambiguity
Breadth-first vs. depth-first
Handling recursive rules
Handling empty rules

5/45 Some terminology
Rules written A → B c
–Terminal vs. non-terminal symbols
–Left-hand side (head): always non-terminal
–Right-hand side (body): can be a mix of terminal and non-terminal symbols, any number of them
–Unique start symbol (usually S)
–'→' reads "rewrites as", but is not directional (an "=" sign would be better)
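This rule formalism can be written down directly as data. A minimal Python sketch, using the toy grammar of the next slides (the names GRAMMAR, LEXICON and is_terminal are my own):

```python
# Each non-terminal maps to a list of alternative right-hand sides;
# preterminal categories map to the words they cover.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v"], ["v", "NP"]],
}
LEXICON = {
    "det": {"an", "the"},
    "n":   {"elephant", "man"},
    "v":   {"shot"},
}

def is_terminal(symbol):
    """Here the 'terminals' are the preterminal categories in the lexicon."""
    return symbol in LEXICON
```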

6/45 1. Top-down with simple grammar
Grammar: S → NP VP; NP → det n; VP → v; VP → v NP
Lexicon: det → {an, the}; n → {elephant, man}; v → shot
Input: the man shot an elephant
Expanding S → NP VP, then NP → det n matches the and man, and VP → v matches shot.
No more rules, but the input is not completely accounted for… So we must backtrack, and try the other VP rule.

7/45 1. Top-down with simple grammar (continued)
This time VP → v NP matches shot, and NP → det n matches an and elephant.
No more rules, and the input is completely accounted for.
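The procedure just traced can be sketched as a few lines of Python (a recognizer only, no trees; the grammar encoding and function name are my own):

```python
# A minimal backtracking top-down recognizer for the toy grammar.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v"], ["v", "NP"]],   # two VP rules: the source of backtracking
}
LEXICON = {
    "det": {"an", "the"},
    "n":   {"elephant", "man"},
    "v":   {"shot"},
}

def parse_topdown(symbols, words):
    """Rewrite `symbols` left to right so that they exactly cover `words`.
    Backtracking is implicit in the recursion: when VP -> v succeeds locally
    but leaves input unconsumed, that branch fails and VP -> v NP is tried."""
    if not symbols:
        return not words                      # success iff input exhausted
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                      # preterminal: match next word
        return bool(words) and words[0] in LEXICON[first] and \
            parse_topdown(rest, words[1:])
    return any(parse_topdown(rhs + rest, words) for rhs in GRAMMAR[first])
```

For example, parse_topdown(["S"], "the man shot an elephant".split()) succeeds only after backtracking from VP → v to VP → v NP, exactly as on the slide.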

8/45 Breadth-first vs depth-first (1)
When we came to the VP rule we were faced with a choice of two rules.
"Depth-first" means following the first choice through to the end.
"Breadth-first" means keeping all your options open.
We'll see this distinction more clearly later, and also see that it is quite significant.

9/45 2. Bottom-up with simple grammar
Grammar and lexicon as before. Input: the man shot an elephant
Working up from the words (det n v det n), NP → det n covers the man and VP → v covers shot, so S → NP VP reaches the top; but the input is not completely accounted for… So we must backtrack, and try the other VP rule.
With VP → v NP covering shot an elephant, we've reached the top, and the input is completely accounted for.
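The bottom-up search can be sketched as a naive exhaustive shift-reduce recognizer; this is an illustrative toy (the encoding and names are my own), not an efficient algorithm:

```python
# Rules as (lhs, rhs) pairs; the lexicon maps words to their category.
GRAMMAR = [
    ("S",  ["NP", "VP"]),
    ("NP", ["det", "n"]),
    ("VP", ["v"]),
    ("VP", ["v", "NP"]),
]
LEXICON = {"the": "det", "an": "det", "man": "n", "elephant": "n", "shot": "v"}

def parse_bottom_up(words):
    """Explore every shift/reduce choice; succeed if some sequence of moves
    leaves a single S on the stack with all input consumed."""
    def step(stack, remaining):
        if stack == ["S"] and not remaining:
            return True
        # try every reduction whose RHS matches the top of the stack
        for lhs, rhs in GRAMMAR:
            if stack[-len(rhs):] == rhs:
                if step(stack[:-len(rhs)] + [lhs], remaining):
                    return True
        # otherwise shift the next word (as its lexical category)
        if remaining:
            return step(stack + [LEXICON[remaining[0]]], remaining[1:])
        return False
    return step([], list(words))
```

Note how the dead end on the slide shows up here: reducing VP → v early leads to ["S"] with input left over, and the search backtracks to the VP → v NP reduction.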

10/45 Same again but with lexical ambiguity
Lexicon: det → {an, the}; n → {elephant, man, shot}; v → shot
shot can be v or n

11/45 3. Top-down with lexical ambiguity
The parse proceeds exactly as before: at the point where shot is reached, we are looking for a v, and shot fits the bill; the n reading never comes into play.

12/45 4. Bottom-up with lexical ambiguity
Bottom-up, both lexical readings of shot (v and n) are entered into the chart before any larger constituent is built.
Terminology: graph, nodes, arcs (edges)

13–16/45 4. Bottom-up with lexical ambiguity (continued)
Let's get rid of all the unused arcs… and then let's clear away the remaining arcs, leaving only the completed parse.

17/45 Breadth-first vs depth-first (2)
In chart parsing, the distinction is more clear-cut. At any point there may be a choice of things to do: which arcs to develop. Breadth-first vs. depth-first can be seen as the order in which they are done:
Queue (FIFO = breadth-first) vs. stack (LIFO = depth-first)
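The queue-vs-stack point can be made concrete with a single agenda-driven search routine in which only the pop end differs (a sketch; the names are my own — in a parser the states would be partial analyses and `expand` would apply grammar rules):

```python
from collections import deque

def search(start, expand, is_goal, strategy="depth"):
    """Agenda-driven search: pop from the right end (stack, LIFO) for
    depth-first, from the left end (queue, FIFO) for breadth-first."""
    agenda = deque([start])
    while agenda:
        state = agenda.pop() if strategy == "depth" else agenda.popleft()
        if is_goal(state):
            return state
        agenda.extend(expand(state))
    return None
```

Both strategies visit the same states; what changes is the order, and therefore which solution is found first.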

18/45 Same again but with structural ambiguity
Grammar: S → NP VP; NP → det n; NP → det n PP; VP → v; VP → v NP; VP → v NP PP; PP → prep NP
Lexicon: det → {an, the, his}; n → {elephant, man, shot, pyjamas}; v → shot; prep → in
Input: the man shot an elephant in his pyjamas
We introduce a PP rule in two places.


20/45 5. Top-down with structural ambiguity
After NP → det n has matched the man, depending on our strategy (breadth-first vs. depth-first) we may consider the NP complete and look for the VP, or we may try the second NP rule, NP → det n PP. Let's see what happens in the latter case.
The next word, shot, isn't a prep, so the PP rule simply fails.

21/45 5. Top-down with structural ambiguity (continued)
As before, the first VP rule (VP → v) works, but does not account for all the input. Similarly, if we try the second VP rule (VP → v NP) with the first NP rule, an elephant is covered but in his pyjamas is not…

22/45 5. Top-down with structural ambiguity (continued)
So what do we try next: the untried NP rule (NP → det n PP), or the untried VP rule (VP → v NP PP)?
Depth-first: it's a stack, LIFO.
Breadth-first: it's a queue, FIFO.

23/45 5. Top-down with structural ambiguity (depth-first)
Depth-first takes the most recent choice point first: NP → det n PP, so the PP is attached inside the object NP: an elephant in his pyjamas.

24/45 5. Top-down with structural ambiguity (breadth-first)
Breadth-first takes the older choice point first: VP → v NP PP, so the PP is attached to the VP: shot an elephant + in his pyjamas.

25/45 Recognizing ambiguity
Notice how the choice of strategy determines which result we get (first).
In both strategies, there are often rules left untried, on the list (whether queue or stack).
If we want to know whether our input is ambiguous, at some point we do have to follow these through.
As you will see later, trying out alternative paths can be quite computationally intensive.

26/45 6. Bottom-up with structural ambiguity
Bottom-up, arcs for both PP attachments are built in the same chart: the PP in his pyjamas combines both with the NP (via NP → det n PP) and with the VP (via VP → v NP PP), yielding two S analyses.

27/45 6. Bottom-up with structural ambiguity (continued)
After clearing away the unused arcs, both complete analyses remain in the chart.

28/45 Recursive rules
"Recursive" rules call themselves. We already have a recursive rule pair:
NP → det n PP
PP → prep NP
Rules can also be immediately recursive:
AdjG → adj AdjG
(the) big fat ugly (man)

29/45 Recursive rules
Left-recursive: AdjG → AdjG adj; AdjG → adj
Right-recursive: AdjG → adj AdjG; AdjG → adj
Both derive big fat rich old, but with mirror-image trees.

30/45 7. Top-down with left recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → AdjG adj; AdjG → adj
Input: the big fat rich old man
Expanding AdjG → AdjG adj makes AdjG the leftmost symbol again, so the parser loops without consuming any input: you can't have left-recursive rules with a top-down parser, even if the non-recursive rule is first.
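The loop can be seen in a few lines: top-down expansion of a left-recursive rule never reaches the input, whereas the right-recursive variant does. A sketch with a depth bound standing in for the infinite loop (the names and bound are my own):

```python
# First rule listed is the one a naive top-down parser expands on backtracking.
LEFT  = {"AdjG": [["AdjG", "adj"], ["adj"]]}   # left-recursive
RIGHT = {"AdjG": [["adj", "AdjG"], ["adj"]]}   # right-recursive

def expand(symbol, grammar, depth=0, limit=50):
    """Follow the leftmost symbol of the first matching rule, top-down.
    With LEFT the leftmost symbol is always AdjG again; with RIGHT it is
    the terminal adj after one step."""
    if depth >= limit:
        return "loop detected"
    if symbol not in grammar:                  # a terminal: input reachable
        return "reached a terminal"
    return expand(grammar[symbol][0][0], grammar, depth + 1, limit)
```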

31/45 7. Top-down with right recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → adj AdjG; AdjG → adj
With AdjG → adj AdjG, each expansion consumes an adjective (big, fat, rich), and AdjG → adj covers old; n then matches man.

32/45 8. Bottom-up with left and right recursion
Grammar: NP → det n; NP → det AdjG n; AdjG → AdvG adj AdjG; AdjG → adj; AdvG → AdvG adv; AdvG → adv
The AdjG rule is right-recursive, the AdvG rule is left-recursive.
Input: the very very fat ugly man
Quite a few useless paths, but overall no difficulty.


34/45 Empty rules
For example:
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε
Equivalent to:
NP → det AdjG n
NP → det n
AdjG → adj
AdjG → adj AdjG
Or:
NP → det (AdjG) n
AdjG → adj (AdjG)
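The "equivalent" grammar can be computed mechanically: delete the ε-rule and add a variant of every rule with each nullable symbol omitted. A sketch (assuming, for simplicity, that only directly empty categories are nullable; the names are my own):

```python
def drop_epsilon(rules):
    """Eliminate empty rules from a list of (lhs, rhs) pairs by emitting,
    for each rule, every variant with nullable symbols present or absent."""
    nullable = {lhs for lhs, rhs in rules if rhs == []}
    out = []
    for lhs, rhs in rules:
        if rhs == []:
            continue                      # the ε-rule itself is dropped
        variants = [[]]
        for sym in rhs:
            if sym in nullable:           # branch: keep or omit the symbol
                variants = [v + [sym] for v in variants] + variants
            else:
                variants = [v + [sym] for v in variants]
        out.extend((lhs, v) for v in variants if v)
    return out
```

Applied to the slide's three rules, this produces exactly the four-rule equivalent listed above.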

35/45 7. Top-down with empty rules
For the man, AdjG → ε lets the AdjG node disappear, leaving det n.
For the big fat man, AdjG → adj AdjG applies for big and fat before AdjG → ε closes off the recursion.

36/45 8. Bottom-up with empty rules
Input: the fat man
Lots of useless paths, especially in a long sentence, since an empty AdjG can be hypothesized at every position, but otherwise no difficulty.

37/45 Some additions to the formalism
Recursive rules build unattractive tree structures: you'd rather have flat trees with an unrestricted number of daughters.
Kleene star: AdjG → adj* (zero or more adjectives), so NP → det adj* n gives a flat tree for the big fat old ugly man.
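Matching a Kleene-star rule needs no recursion at all; a sketch of NP → det adj* n over a list of POS tags (the function name is my own):

```python
def match_np(tags):
    """True iff tags fit NP -> det adj* n: a det, then zero or more
    adjectives, then a noun -- a flat NP with any number of daughters."""
    if len(tags) < 2 or tags[0] != "det" or tags[-1] != "n":
        return False
    return all(t == "adj" for t in tags[1:-1])
```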

38/45 Some additions to the formalism
As grammars grow, the rule combinations multiply and it gets clumsy:
NP → det n; NP → det AdjG n; NP → det n PP; NP → det AdjG n PP; NP → n; NP → AdjG n; NP → n PP; NP → AdjG n PP
All eight collapse into one rule with optional (bracketed) symbols:
NP → (det) (AdjG) n (PP)

39/45 Processing implications
Parsing with Kleene star neatly combines empty rules and recursive rules.

40/45 9. Top-down with Kleene star
With NP → det adj* n: for the man, adj* = ε; for the big fat man, adj* = adj adj* consumes big and fat before adj* = ε.

41/45 9. Bottom-up with Kleene star
Bottom-up with NP → det adj* n, an adjective sequence of any length (the fat man, the fat ugly man) is absorbed directly into the flat NP.

42/45 Processing implications
Parsing with bracketed symbols:
–The parser has to expand the rules either in a single pass beforehand or (better) on the fly (as it comes to them)
–So the bracketing convention is just a convenience for rule-writers
NP → (det) (AdjG) n (PP) expands to NP → det (AdjG) n (PP), NP → (AdjG) n (PP), and so on.
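The expansion the parser performs can be sketched as a recursion over the right-hand side, doubling the rule set for each optional symbol (the encoding and name are my own):

```python
def expand_optionals(rhs):
    """Expand a RHS with optional (parenthesized) symbols into all the
    plain rules it abbreviates."""
    if not rhs:
        return [[]]
    head, *rest = rhs
    tails = expand_optionals(rest)
    if head.startswith("(") and head.endswith(")"):
        sym = head[1:-1]
        # each tail occurs twice: once with the symbol, once without
        return [[sym] + t for t in tails] + tails
    return [[head] + t for t in tails]
```

The slide's NP rule with three optional symbols expands to the eight plain rules listed on slide 38.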

43/45 Top-down vs. bottom-up
Bottom-up builds many useless trees.
Top-down can propose false trails, sometimes quite long, which are only abandoned when they reach the word level
–Especially a problem if breadth-first
Bottom-up is very inefficient with empty rules.
Top-down CANNOT handle left recursion.
Top-down cannot do partial parsing
–Especially useful for speech
Wouldn't it be nice to combine them to get the advantages of both?

44/45 Left-corner parsing
The "left corner" of a rule is the first symbol after the rewrite arrow, e.g. in S → NP VP, the left corner is NP.
Left-corner parsing starts bottom-up, taking the first item off the input and finding a rule for which it is the left corner.
This provides a top-down prediction, but we continue working bottom-up until the prediction is fulfilled.
When a rule is completed, apply the left-corner principle again: is that completed constituent itself a left corner?
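The rule lookup this needs is a table from each category to the rules for which it is the left corner; a minimal sketch over the toy grammar (the encoding and names are my own):

```python
RULES = [
    ("S",  ["NP", "VP"]),
    ("NP", ["det", "n"]),
    ("VP", ["v"]),
    ("VP", ["v", "NP"]),
]

def left_corners(rules):
    """Index every rule under its left corner (first RHS symbol), so a
    word's category or a completed constituent can be looked up directly."""
    table = {}
    for lhs, rhs in rules:
        table.setdefault(rhs[0], []).append((lhs, rhs))
    return table
```

So a det immediately predicts an NP, and a completed NP immediately predicts an S, which is the mixed bottom-up/top-down behaviour described above.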

45/45 9. Left-corner with simple grammar
the is a det, the left corner of NP → det n; the prediction is fulfilled when man matches n, completing the NP.
NP is the left corner of S → NP VP, predicting a VP.
shot is a v, the left corner of VP → v; this completes the VP and hence the S, but the text is not all accounted for, so try VP → v NP instead.
an is a det, the left corner of NP → det n, fulfilled by elephant: the parse succeeds.