CS 388: Natural Language Processing: Syntactic Parsing


CS 388: Natural Language Processing: Syntactic Parsing
Raymond J. Mooney, University of Texas at Austin

Syntactic Parsing Produce the correct syntactic parse tree for a sentence.

Context Free Grammars (CFG) A CFG consists of:
N, a set of non-terminal symbols (or variables)
Σ, a set of terminal symbols (disjoint from N)
R, a set of productions or rules of the form A → β, where A is a non-terminal and β is a string of symbols from (Σ ∪ N)*
S, a designated non-terminal called the start symbol

Simple CFG for ATIS English

Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP

Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does
Prep → from | to | on | near | through
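The grammar and lexicon above can be written down directly as data; a minimal sketch in plain Python (the dictionary representation and the helper `pos_tags` are illustrative choices, not from the slides):

```python
# Toy ATIS grammar from the slides as plain Python data.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"], ["VP", "PP"]],
    "PP": [["Prep", "NP"]],
}

LEXICON = {
    "Det": {"the", "a", "that", "this"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Pronoun": {"I", "he", "she", "me"},
    "Proper-Noun": {"Houston", "NWA"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on", "near", "through"},
}

def pos_tags(word):
    """Return every preterminal that can rewrite to this word."""
    return {tag for tag, words in LEXICON.items() if word in words}
```

Note that "book" is ambiguous between Noun and Verb, which is exactly what drives the search examples below.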

Sentence Generation Sentences are generated by recursively rewriting the start symbol using the productions until only terminal symbols remain. The resulting derivation (parse) tree for "book the flight through Houston":

[S [VP [Verb book]
       [NP [Det the]
           [Nominal [Nominal [Noun flight]]
                    [PP [Prep through]
                        [NP [Proper-Noun Houston]]]]]]]

Parsing Given a string of terminals and a CFG, determine if the string can be generated by the CFG. Also return a parse tree for the string (or all possible parse trees). Must search the space of derivations for one that derives the given string. Top-Down Parsing: search the space of derivations starting from the start symbol. Bottom-Up Parsing: search the space of reverse derivations starting from the terminal symbols in the string.

Parsing Example The parse of "book that flight":

[S [VP [Verb book]
       [NP [Det that]
           [Nominal [Noun flight]]]]]

Top Down Parsing A trace of the top-down search for "book that flight" (X marks a dead end):

1. S → NP VP, NP → Pronoun: "book" is not a Pronoun. X
2. S → NP VP, NP → ProperNoun: "book" is not a ProperNoun. X
3. S → NP VP, NP → Det Nominal: "book" is not a Det. X
4. S → Aux NP VP: "book" is not an Aux. X
5. S → VP, VP → Verb: "book" matches Verb, but "that flight" is left unaccounted for. X
6. S → VP, VP → Verb NP: Verb matches "book"; NP → Pronoun: "that" is not a Pronoun. X
7. S → VP, VP → Verb NP: NP → ProperNoun: "that" is not a ProperNoun. X
8. S → VP, VP → Verb NP: NP → Det Nominal: Det matches "that", Nominal → Noun, and Noun matches "flight". Success.
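The top-down search can be sketched as a naive recursive recognizer; an illustrative Python sketch, not from the slides (function names are assumptions, and the left-recursive rules Nominal → Nominal Noun and VP → VP PP are omitted here because naive top-down search loops forever on them):

```python
# Naive top-down (recursive-descent) recognition for a left-recursion-free
# subset of the toy grammar.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "Det": {"the", "a", "that", "this"},
    "Noun": {"book", "flight", "meal", "money"},
    "Verb": {"book", "include", "prefer"},
    "Pronoun": {"I", "he", "she", "me"},
    "Proper-Noun": {"Houston", "NWA"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on", "near", "through"},
}

def expand(symbol, words, i):
    """Yield every position j such that symbol derives words[i:j]."""
    if symbol in LEXICON:                      # preterminal: match one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR.get(symbol, []):        # non-terminal: try each rule
        positions = {i}
        for sym in rhs:
            positions = {j for p in positions for j in expand(sym, words, p)}
        yield from positions

def recognize(words):
    """True iff the whole string is derivable from S."""
    return len(words) in set(expand("S", words, 0))
```

The dead ends in the trace above correspond to rule expansions whose position sets come back empty, e.g. NP → Pronoun over "that".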

Bottom Up Parsing A trace of the bottom-up search for "book that flight" (X marks a dead end):

1. Tag "book" as Noun and build Nominal → Noun.
2. Try Nominal → Nominal Noun: "that" is not a Noun. X
3. Try Nominal → Nominal PP: tag "that" as Det, build NP → Det Nominal over "that flight", and try S → NP VP, but no PP or VP can be built, so the Nominal analysis of "book" fails. X
4. Tag "book" as Verb and build VP → Verb: then S → VP leaves "that flight" unattached. X
5. Try VP → VP PP: no PP follows. X
6. Build VP → Verb NP over the whole string, then S → VP. Success.

Top Down vs. Bottom Up Top down never explores options that will not lead to a full parse, but can explore many options that never connect to the actual sentence. Bottom up never explores options that do not connect to the actual sentence but can explore options that can never lead to a full parse. Relative amounts of wasted search depend on how much the grammar branches in each direction.

Dynamic Programming Parsing To avoid extensive repeated work, we must cache intermediate results, i.e. completed phrases. Caching (memoizing) is critical to obtaining a polynomial-time parsing (recognition) algorithm for CFGs. Dynamic programming algorithms based on both top-down and bottom-up search can achieve O(n³) recognition time, where n is the length of the input string.

Dynamic Programming Parsing Methods The CKY (Cocke-Kasami-Younger) algorithm is based on bottom-up parsing and requires first normalizing the grammar. The Earley parser is based on top-down parsing and does not require normalizing the grammar, but is more complex. More generally, chart parsers retain completed phrases in a chart and can combine top-down and bottom-up search.

CKY First, the grammar must be converted to Chomsky Normal Form (CNF), in which productions have either exactly two non-terminal symbols on the RHS or one terminal symbol (lexicon rules). Then parse bottom-up, storing phrases formed from all substrings in a triangular table (chart).

ATIS English Grammar Conversion

Original Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP

Chomsky Normal Form:
S → NP VP
S → X1 VP
X1 → Aux NP
S → book | include | prefer
S → Verb NP
S → VP PP
NP → I | he | she | me
NP → Houston | NWA
NP → Det Nominal
Nominal → book | flight | meal | money
Nominal → Nominal Noun
Nominal → Nominal PP
VP → book | include | prefer
VP → Verb NP
VP → VP PP
PP → Prep NP

CKY Parser For the sentence "Book the flight through Houston", the chart is a triangular table with rows i = 0…4 and columns j = 1…5; Cell[i,j] contains all constituents (non-terminals) covering words i+1 through j.

CKY Parser Filling the chart for "Book the flight through Houston" bottom-up, shortest spans first, yields:

Cell[0,1] Book: S, VP, Verb, Nominal, Noun
Cell[1,2] the: Det
Cell[2,3] flight: Nominal, Noun
Cell[3,4] through: Prep
Cell[4,5] Houston: NP, ProperNoun
Cell[1,3] the flight: NP
Cell[3,5] through Houston: PP
Cell[0,3] Book the flight: S, VP
Cell[2,5] flight through Houston: Nominal
Cell[1,5] the flight through Houston: NP
Cell[0,5] whole sentence: S, VP
All other cells: None

The S in Cell[0,5] can be built in two ways, giving two parse trees: Parse Tree #1 attaches the PP "through Houston" to the Nominal "flight" (via NP → Det Nominal and VP → Verb NP); Parse Tree #2 attaches it to the VP (via VP → VP PP).
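The chart-filling procedure can be coded directly from the CNF grammar above; an illustrative Python recognizer, not from the slides (the data layout and function names are assumptions):

```python
from collections import defaultdict

# CKY recognition with the CNF version of the ATIS toy grammar.
BINARY = [                       # (LHS, left child, right child)
    ("S", "NP", "VP"), ("S", "X1", "VP"), ("X1", "Aux", "NP"),
    ("S", "Verb", "NP"), ("S", "VP", "PP"),
    ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("PP", "Prep", "NP"),
]
LEXICAL = {                      # word -> labels from the CNF lexicon rules
    "book": {"S", "VP", "Verb", "Nominal", "Noun"},
    "include": {"S", "VP", "Verb"}, "prefer": {"S", "VP", "Verb"},
    "the": {"Det"}, "a": {"Det"}, "that": {"Det"}, "this": {"Det"},
    "flight": {"Nominal", "Noun"}, "meal": {"Nominal", "Noun"},
    "money": {"Nominal", "Noun"},
    "I": {"NP"}, "he": {"NP"}, "she": {"NP"}, "me": {"NP"},
    "Houston": {"NP"}, "NWA": {"NP"},
    "does": {"Aux"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "near": {"Prep"}, "through": {"Prep"},
}

def cky(words):
    """Fill the triangular chart bottom-up; chart[i, j] covers words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                 # length-1 spans: lexicon
        chart[i, i + 1] = LEXICAL.get(w, set()) | LEXICAL.get(w.lower(), set())
    for span in range(2, n + 1):                  # longer spans, shortest first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # every split point
                for lhs, left, right in BINARY:
                    if left in chart[i, k] and right in chart[k, j]:
                        chart[i, j].add(lhs)
    return chart
```

Running `cky("Book the flight through Houston".split())` reproduces the chart above, with S in Cell[0,5].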

Complexity of CKY (recognition) There are n(n+1)/2 = O(n²) cells. Filling each cell requires considering every possible split point between the two constituents needed to introduce a new phrase; there are O(n) possible split points. Total time complexity is therefore O(n³).

Complexity of CKY (all parses) The previous analysis assumes the number of phrase labels in each cell is bounded by the size of the grammar. If we compute all derivations for each non-terminal, the number of cell entries can grow combinatorially. Since the number of parses can be exponential in the sentence length, so is the complexity of finding all parse trees.

Effect of CNF on Parse Trees The parse trees produced are for the CNF grammar, not the original grammar. A post-process can repair a CNF parse tree into a parse tree for the original grammar.

Syntactic Ambiguity CKY just produces all possible parse trees; it does not address the important issue of ambiguity resolution.

Issues with CFGs Addressing some grammatical constraints requires complex CFGs that do not compactly encode the given regularities. Some aspects of natural language syntax may not be captured at all by CFGs and require context-sensitivity (productions with more than one symbol on the LHS).

Agreement Subjects must agree with their verbs in person and number.
I am cold. You are cold. He is cold.
*I are cold. *You is cold. *He am cold.
A plain CFG requires separate productions for each combination:
S → NP1stPersonSing VP1stPersonSing
S → NP2ndPersonSing VP2ndPersonSing
NP1stPersonSing → …
VP1stPersonSing → …
NP2ndPersonSing → …
VP2ndPersonSing → …
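A common alternative to multiplying out productions, not covered on the slides, is to attach agreement features to categories and check them with a single predicate; a hypothetical minimal sketch (all names are illustrative):

```python
# Represent person/number as feature dictionaries and check agreement with
# one predicate instead of one production per combination.
def agrees(subj, verb):
    """Subject and verb must match in person and number."""
    return (subj["person"] == verb["person"]
            and subj["number"] == verb["number"])

I_FEATS = {"person": 1, "number": "sg"}    # features for "I"
AM_FEATS = {"person": 1, "number": "sg"}   # features for "am"
IS_FEATS = {"person": 3, "number": "sg"}   # features for "is"
```

With this, "I am" passes the check while "*I is" fails, without any duplicated S productions.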

Other Agreement Issues Pronouns have case (e.g. nominative, accusative) that must agree with their syntactic position.
I gave him the book. *I gave he the book.
He gave me the book. *Him gave me the book.
Many languages have gender agreement.
Los Angeles *Las Angeles
Las Vegas *Los Vegas

Subcategorization Specific verbs take some types of arguments but not others.
Transitive verb: "found" requires a direct object. John found the ring. *John found.
Intransitive verb: "disappeared" cannot take a direct object. John disappeared. *John disappeared the ring.
"gave" takes both a direct and an indirect object. John gave Mary the ring. *John gave Mary. *John gave the ring.
"want" takes an NP, a non-finite VP, or an S. John wants a car. John wants to buy a car. John wants Mary to take the ring. *John wants.
Subcategorization frames specify the range of argument types that a given verb can take.
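Subcategorization frames can be represented directly as data; a hypothetical sketch (the frame names and checker are illustrative, not from the slides):

```python
# Each verb lists the argument patterns it licenses; () is the
# intransitive (no-argument) frame, "VPto" a non-finite VP complement.
SUBCAT = {
    "found": [("NP",)],                       # transitive only
    "disappeared": [()],                      # intransitive only
    "gave": [("NP", "NP"), ("NP", "PP")],     # ditransitive
    "want": [("NP",), ("VPto",), ("NP", "VPto")],
}

def licenses(verb, args):
    """True iff one of the verb's subcat frames allows this argument pattern."""
    return tuple(args) in SUBCAT.get(verb, [])
```

This captures the contrasts above: "found" licenses ["NP"] but not [], and "disappeared" the reverse.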

Conclusions Syntactic parse trees specify the syntactic structure of a sentence, which helps determine its meaning. John ate the spaghetti with meatballs with chopsticks. How did John eat the spaghetti? What did John eat? CFGs can be used to define the grammar of a natural language. Dynamic programming algorithms allow computing a single parse tree in cubic time, or all parse trees in exponential time.

Partial/Shallow Parsing Full parsing (using a context-free grammar) is too slow, and also brittle. Give up full parsing and do a simplified task: finding groups, called "chunks", instead of full phrases/constituents.

Phrase Chunking Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence. [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs]. [NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ]

Phrase Chunking as Sequence Labeling Tag individual words with one of three tags:
B (Begin): word starts a new target phrase
I (Inside): word is part of a target phrase but not its first word
O (Other): word is not part of any target phrase
Sample for NP chunking: He reckons the current account deficit will narrow to only # 1.8 billion in September.

He/B_NP reckons/O the/B_NP current/I_NP account/I_NP deficit/I_NP will/O narrow/O to/O only/B_NP #/I_NP 1.8/I_NP billion/I_NP in/O September/B_NP
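Recovering chunks from such a tag sequence is a simple left-to-right scan; an illustrative sketch, not from the slides (the half-open (start, end) span convention is an assumption):

```python
def bio_to_chunks(tags):
    """Convert a B/I/O tag sequence to (start, end) chunk spans, end exclusive."""
    chunks, start = [], None
    for i, tag in enumerate(tags):
        if tag == "O" or tag.startswith("B"):
            if start is not None:          # close the chunk in progress
                chunks.append((start, i))
                start = None
        if tag.startswith("B"):
            start = i                      # open a new chunk
        elif tag.startswith("I") and start is None:
            start = i                      # tolerate ill-formed I after O
    if start is not None:                  # close a chunk ending the sentence
        chunks.append((start, len(tags)))
    return chunks
```

On the example sentence this recovers the four NPs: "He", "the current account deficit", "only # 1.8 billion", and "September".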

Evaluating Chunking Per-token accuracy does not evaluate finding correct full chunks. Instead use precision (the fraction of predicted chunks that are correct) and recall (the fraction of true chunks that are found), and take their harmonic mean to produce a single evaluation metric called the F measure.
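A minimal sketch of chunk-level evaluation under these definitions, scoring exact-match spans (the (start, end, label) representation and names are illustrative assumptions):

```python
def chunk_f1(gold, predicted):
    """Precision, recall, and F1 over exact-match chunks.

    Each chunk is a (start, end, label) tuple; a predicted chunk counts as
    correct only if it matches a gold chunk exactly.
    """
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Note that a chunk with one wrong boundary word scores zero, which is exactly why per-token accuracy and chunk F1 can diverge sharply.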

Current Chunking Results The best systems for NP chunking achieve F1 = 96%. Typical results for finding the full range of chunk types (CoNLL 2000 shared task: NP, VP, PP, ADV, SBAR, ADJP) are F1 = 92-94%.