CSA3180: Sentence Parsing Algorithms 1


CSA350: NLP Algorithms
Sentence Parsing I
- The Parsing Problem
- Parsing as Search
- Top Down / Bottom Up Parsing Strategies
October 2008

References
This lecture is largely based on material found in Jurafsky & Martin, Chapter 13.

Handling Sentences
- Sentence boundary detection.
- Finite-state techniques are fine for certain kinds of analysis:
  - named entity recognition
  - NP chunking
- But FS techniques are of limited use when trying to compute grammatical relationships between parts of sentences. We need these relationships to get at meanings.

Grammatical Relationships: e.g. subject
Wikipedia definition: "The subject has the grammatical function in a sentence of relating its constituent (a noun phrase) by means of the verb to any other elements present in the sentence, i.e. objects, complements and adverbials."

Grammatical Relationships: e.g. subject
- The dictionary helps me find words.
- Ice cream appeared on the table.
- The man that is sitting over there told me that he just bought a ticket to Tahiti.
- Nothing else is good enough.
- That nothing else is good enough shouldn't come as a surprise.
- To eat six different kinds of vegetables a day is healthy.

Why not use FS techniques for describing NL sentences?
Descriptive adequacy
- Some NL phenomena cannot be described within a FS framework.
- Example: central embedding.
Notational efficiency
- The notation does not facilitate 'factoring out' the similarities.
- To describe sentences of the form subject-verb-object using a FSA, we must describe possible subjects and possible objects separately, even though almost all phrases that can appear as one can equally appear as the other.

Central Embedding
The following sentences (the indices mark the nested noun-verb dependencies):
- The cat spat (1 ... 1)
- The cat the boy saw spat (1 2 ... 2 1)
- The cat the boy the girl liked saw spat (1 2 3 ... 3 2 1)
require at least a grammar generating the pattern AⁿBⁿ (n nouns followed by n matching verbs), which is beyond finite-state power.
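Not from the original slides: a minimal DCG sketch for the pattern aⁿbⁿ (the nonterminal name anbn is illustrative), showing that two context-free rules suffice where no finite-state description exists.

% a^n b^n, n >= 1: each 'a' is paired with one 'b' around a nested core
anbn --> [a], [b].
anbn --> [a], anbn, [b].

?- phrase(anbn, [a,a,a,b,b,b]).   % succeeds
?- phrase(anbn, [a,a,b]).         % fails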

DCG-style Grammar/Lexicon

% GRAMMAR
s --> np, vp.
s --> aux, np, vp.
s --> vp.
np --> det, nom.
np --> pn.
nom --> noun.
nom --> noun, nom.
nom --> nom, pp.
pp --> prep, np.
vp --> v.
vp --> v, np.

% LEXICON
det --> [that] ; [this] ; [a].
noun --> [book] ; [flight] ; [meal] ; [money].
v --> [book] ; [include] ; [prefer].
aux --> [does].
prep --> [from] ; [to] ; [on].
pn --> ['Houston'] ; ['TWA'].
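A quick way to try the grammar out (a usage sketch, not from the slides; it assumes the rules above have been loaded into a Prolog system with DCG support, queried through the standard phrase/2 interface):

?- phrase(s, [book, that, flight]).
true.

?- phrase(np, [a, meal]).
true.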

Definite Clause Grammars
- Prolog based.
- LHS --> RHS1, RHS2, ..., {code}.
- Example: s(s(NP,VP)) --> np(NP), vp(VP), {mk_subj(NP)}.
- Rules are translated into an executable Prolog program.
- No clear distinction between rules for grammar and lexicon.
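Roughly (a sketch of the standard difference-list translation; the exact clauses produced vary between Prolog systems), the rule s --> np, vp. is compiled into an ordinary clause, and the decorated rule above threads the tree argument and the {code} goal through in the same way:

s(S0, S) :-              % S0 = input list, S = what remains after an s
    np(S0, S1),
    vp(S1, S).

s(s(NP,VP), S0, S) :-    % version carrying a parse-tree argument
    np(NP, S0, S1),
    vp(VP, S1, S),
    mk_subj(NP).         % the {code} goal, called as plain Prolog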

Parsing Problem
Given grammar G and sentence A, discover all valid parse trees for G that exactly cover A.
(figure: parse tree for "book that flight")
  (S (VP (V book) (NP (Det that) (Nom (N flight)))))

The elephant is in the trousers
(figure: parse of "I shot an elephant in my trousers" with the PP attached inside the object NP)
  (S (NP I) (VP (V shot) (NP (NP an elephant) (PP in my trousers))))

I was wearing the trousers
(figure: parse of "I shot an elephant in my trousers" with the PP attached to the VP)
  (S (NP I) (VP (V shot) (NP an elephant) (PP in my trousers)))

Parsing as Search
Search within a space defined by:
- Start state
- Goal state
- State-to-state transformations
Two distinct parsing strategies:
- Top down
- Bottom up
Different parsing strategy, different state space, different problem.
N.B. Parsing strategy ≠ search strategy.

Top Down
Each state comprises:
- a tree
- an open node
- an input pointer
Together these encode the current state of the parse.
A top-down parser tries to build from the root node S down to the leaves by replacing nodes with non-terminal labels by the RHS of the corresponding grammar rules. Nodes with pre-terminal (word class) labels are compared to input words.

Top Down Search Space
(figure: the top-down search space, expanding from the start node to the goal node)

Bottom Up
- Each state is a forest of trees.
- The start node is a forest of nodes labelled with pre-terminal categories (word classes derived from the lexicon).
- Transformations look for places where the RHS of a rule can fit; any such place is replaced with a node labelled with the LHS of the rule.

Bottom Up Search Space
(figure: the bottom-up search space, including a failed BU derivation)

Top Down vs Bottom Up Search Spaces
Top down
- For: the space excludes trees that cannot be derived from S.
- Against: the space includes trees that are not consistent with the input.
Bottom up
- For: the space excludes states containing trees that cannot lead to input text segments.
- Against: the space includes states containing subtrees that can never lead to an S node.

Top Down Parsing - Remarks
- Top-down parsers do well if there is useful grammar-driven control: search can be directed by the grammar.
  - Not too many different rules for the same category.
  - Not too much distance between non-terminal and terminal categories.
- Top-down is unsuitable for rewriting parts of speech (pre-terminals) with words (terminals). In practice that is always done bottom-up, as lexical lookup.

Bottom Up Parsing - Remarks
- It is data-directed: it attempts to parse the words that are there.
- Does well, e.g., for lexical lookup.
- Does badly if there are many rules with similar RHS categories.
- Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
- Empty categories: termination problem unless rewriting of empty constituents is somehow restricted (but then the parser is generally incomplete).

Basic Parsing Algorithms
- Top Down
- Bottom Up
See Jurafsky & Martin, Ch. 10.

Top Down Algorithm

Recoding the Grammar/Lexicon

% Grammar
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(vp, [v]).
rule(vp, [v, np]).

% Lexicon
word(d, the).
word(n, dog).   word(n, dogs).
word(n, cat).   word(n, cats).
word(v, chase). word(v, chases).
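These facts can be queried directly (an illustrative check, assuming the clauses above have been consulted); the recogniser on the next slide looks them up in exactly this way:

?- rule(vp, RHS).
RHS = [v] ;
RHS = [v, np].

?- word(C, chases).
C = v.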

Top Down Depth First Recognition in Prolog

parse(C, [Word|S], S) :-
    word(C, Word).            % e.g. word(n, cat)
parse(C, S1, S) :-
    rule(C, Cs),              % e.g. rule(s, [np, vp])
    parse_list(Cs, S1, S).

parse_list([], S, S).
parse_list([C|Cs], S1, S) :-
    parse(C, S1, S2),
    parse_list(Cs, S2, S).
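A usage sketch, assuming the rule/2 and word/2 facts above are loaded. The call parse(C, Sentence, Rest) succeeds when a phrase of category C spans a prefix of Sentence, leaving Rest; a whole sentence is recognised when Rest is []. Note this is a recogniser: it answers yes or no rather than returning a tree.

?- parse(s, [the, dog, chases, the, cat], []).
true.

?- parse(s, [dog, the, chases], []).
false.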

Derivation: top down, left-to-right, depth first
(figure: the step-by-step derivation of an example sentence)

Bottom Up Shift/Reduce Algorithm
Two data structures:
- input string
- stack
Repeat until the input is exhausted:
- Shift a word onto the stack.
- Reduce the stack using the grammar and lexicon until no further reductions are possible.
Unlike top down, the algorithm does not require the category to be specified in advance. It simply finds all possible trees.

Shift/Reduce Operation

Step  Action    Stack        Input
0     (start)                the dog barked
1     shift     the          dog barked
2     reduce    d            dog barked
3     shift     dog d        barked
4     reduce    n d          barked
5     reduce    np           barked
6     shift     barked np
7     reduce    v np
8     reduce    vp np
9     reduce    s

Shift/Reduce Implementation

parse(S, Res) :-
    sr(S, [], Res).

sr(S, Stk, Res) :-
    shift(Stk, S, NewStk, S1),
    reduce(NewStk, RedStk),
    sr(S1, RedStk, Res).
sr([], Res, Res).

% shift(Stack, Sent, NewStack, NewSent)
shift(X, [H|Y], [H|X], Y).

reduce(Stk, RedStk) :-
    brule(Stk, Stk2),
    reduce(Stk2, RedStk).
reduce(Stk, Stk).

% grammar
brule([vp,np|X], [s|X]).
brule([n,d|X],   [np|X]).
brule([np,v|X],  [vp|X]).
brule([v|X],     [vp|X]).
% interface to lexicon
brule([Word|X], [C|X]) :- word(C, Word).
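A usage sketch, assuming the word/2 lexicon from the earlier slide is also loaded. For "the dogs chase" the first solution already reduces the whole input to a single s node:

?- parse([the, dogs, chase], Res).
Res = [s] .

For "the dog chases the cat", by contrast, the eager reduce step builds a vp from "chases" alone (via vp → v) before the object has been shifted, so the first solution still contains more than one node; the correct analysis only turns up on backtracking. This is exactly the kind of shift/reduce conflict discussed on the later slides.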

Shift/Reduce Operation
- Words are shifted to the beginning of the stack, which therefore ends up in reverse order.
- The reduce step is simplified if we also store the rules backwards, so that the rule s → np vp is stored as the fact brule([vp,np|X],[s|X]).
- The term [a,b|X] matches any list whose first and second elements are a and b respectively.
- The first argument directly matches the stack to which the rule applies; the second argument is what the stack becomes after reduction.
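To make the matching concrete (illustrative queries, assuming the brule/2 and word/2 definitions above): a stack whose top two elements are n and d reduces to one with np on top, and a bare word on top is reduced through the lexicon clause.

?- brule([n, d], NewStack).
NewStack = [np].

?- brule([dog, d], NewStack).
NewStack = [n, d].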

Shift Reduce Parser
- Standard implementations do not perform backtracking (e.g. NLTK).
- Only one result is returned even when the sentence is ambiguous.
- The parser may fail even when the sentence is grammatical:
  - Shift/Reduce conflict
  - Reduce/Reduce conflict

Handling Conflicts
Shift-reduce parsers may employ policies for resolving such conflicts, e.g.:
- For shift/reduce conflicts: prefer shift, or prefer reduce.
- For reduce/reduce conflicts: choose the reduction which removes the most elements from the stack (see the sketch below).
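Not from the original slides: a minimal sketch of the reduce/reduce policy in the brule/2 setting, assuming the definitions shown earlier (the predicate names reduce_greedy/2 and shortest_list/2 are illustrative). At each step it collects every one-step reduction and keeps the one whose result stack is shortest, i.e. the rule that consumed the most stack elements.

reduce_greedy(Stk, RedStk) :-
    findall(S2, brule(Stk, S2), [R|Rs]),   % every one-step reduction; clause fails if there are none
    shortest_list([R|Rs], Best),           % shortest result = most elements removed
    reduce_greedy(Best, RedStk).
reduce_greedy(Stk, Stk).                   % no rule applies: stop reducing

shortest_list([X], X).
shortest_list([X, Y | Rest], Best) :-
    length(X, LX), length(Y, LY),
    (  LX =< LY
    -> shortest_list([X | Rest], Best)
    ;  shortest_list([Y | Rest], Best)
    ).

It can be dropped into the parser above by calling reduce_greedy/2 in place of reduce/2 in sr/3.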