CSA2050 Introduction to Computational Linguistics


1 CSA2050 Introduction to Computational Linguistics
Parsing II

2 Problems with Recursive Descent Parsing
Left Recursion
Inefficiency
Repeated Work
apr 2008 CSA2050 Parsing II

3 Left Recursion
A grammar is left recursive if it contains at least one non-terminal A for which A ⇒* Aα for some α (n.b. ⇒* is the reflexive transitive closure of ⇒).
Intuitive idea: a derivation of that category includes itself along its leftmost branch.
NP → NP PP
NP → NP and NP
NP → DetP Nominal
DetP → NP 's

4 Left Recursion
Left recursion can lead to an infinite loop (nltk demo).

5 Dealing with Left Recursion
Use a different parsing strategy, or reformulate the grammar to eliminate LR:
A → Aβ | α is rewritten as
A → α A'
A' → β A' | ε

6 Rewriting the Grammar
NP → NP 'and' NP
NP → D N | D N PP

7 Rewriting the Grammar
NP → NP 'and' NP    (β = 'and' NP)
NP → D N | D N PP   (α)

8 Rewriting the Grammar
NP → NP 'and' NP    (β = 'and' NP)
NP → D N | D N PP   (α)
New Grammar
NP → α NP1
NP1 → β NP1 | ε

9 Rewriting the Grammar
NP → NP 'and' NP    (β = 'and' NP)
NP → D N | D N PP   (α)
New Grammar
NP → α NP1
NP1 → β NP1 | ε
α → D N | D N PP
β → 'and' NP
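The rewrite on this slide can be mechanized. A minimal sketch (the function name and the list encoding of rules are our own) of immediate-left-recursion elimination, applied to the NP grammar above:

```python
def eliminate_left_recursion(lhs, rules):
    """Remove immediate left recursion for nonterminal `lhs`.

    Rules of the form  A -> A beta  and  A -> alpha  become
    A -> alpha A'  and  A' -> beta A' | epsilon  ([] = epsilon)."""
    new = lhs + "1"                                           # fresh nonterminal, e.g. NP1
    recursive = [r[1:] for r in rules if r and r[0] == lhs]   # the beta parts
    others = [r for r in rules if not r or r[0] != lhs]       # the alpha parts
    if not recursive:
        return {lhs: rules}                                   # nothing to do
    return {
        lhs: [alpha + [new] for alpha in others],
        new: [beta + [new] for beta in recursive] + [[]],
    }

# The NP grammar from the slide: NP -> NP 'and' NP | D N | D N PP
np_rules = [["NP", "'and'", "NP"], ["D", "N"], ["D", "N", "PP"]]
print(eliminate_left_recursion("NP", np_rules))
```

Running it reproduces the new grammar of the slide: NP → D N NP1 | D N PP NP1 and NP1 → 'and' NP NP1 | ε.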

10 New Parse Tree
[Tree: NP → α NP1, with α → D N covering "the cat" and NP1 → ε]

11 Rewriting the Grammar
The rewritten grammar yields a different parse tree. An unnatural parse tree?

12 Problems with Recursive Descent Parsing
Left Recursion
Inefficiency
Repeated Work

13 Inefficiency
The top-down strategy uses the grammar to predict the input. Recursive descent cannot confirm a structure until it looks at the input. Consequently it wastes a lot of time building structures that may be inconsistent with the input.

14 Prediction can be inefficient
N → apple
N → ant
N → alloy
.
N → zebra
N → zoo
[Tree: NP → D N over "the zoo"; top-down prediction tries every N rule in turn before reaching "zoo"]

15 Prediction can be inefficient
1. VP → V NP
2. VP → V NP PP
The first NP constituent is built by the first rule, which fails. The same constituents are rebuilt when the parser backtracks to the second rule.
[Tree: VP → V NP over "saw the man", with "with the dog" left unattached]

16 Problems with Recursive Descent Parsing
Left Recursion
Inefficiency
Repeated Work

17 Repeated Parsing of Subtrees
a flight (4) from Indianapolis (3) to Houston (2) on TWA (1)
(the numbers indicate how many times each subtree is re-parsed)
a flight from Indianapolis
a flight from Indianapolis to Houston
a flight from Indianapolis to Houston on TWA

18 Bottom Up Shift/Reduce Algorithm
Two data structures: an input string and a stack.
Repeat until the input is exhausted:
Shift a word onto the stack.
Reduce the stack using the grammar and lexicon until no further reductions are possible.
Unlike top down, the algorithm does not require the category to be specified in advance. It simply finds all possible trees.

19 Shift/Reduce Operation
→| Step Action Stack Input 0 (start) the dog barked 1 shift the dog barked 2 reduce d dog barked 3 shift dog d barked 4 reduce n d barked 5 reduce np barked 6 shift barked np 7 reduce v np 8 reduce vp np 9 reduce s >>>from nltk.draw.srparser import demo >>>srparser.demo() apr 2008 CSA Parsing II
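The trace above can be reproduced in a few lines. A minimal greedy shift-reduce recognizer, a sketch only (the rule encoding and names are ours; real shift-reduce parsers also handle conflicts and ambiguity):

```python
# Rules: (lhs, rhs) pairs; rhs is matched against the top of the stack.
# Lexical rules (d -> the, etc.) are treated like any other rule.
RULES = [
    ("d", ["the"]), ("n", ["dog"]), ("v", ["barked"]),
    ("np", ["d", "n"]), ("vp", ["v"]), ("s", ["np", "vp"]),
]

def try_reduce(stack):
    """Apply the first rule whose rhs matches the top of the stack."""
    for lhs, rhs in RULES:
        if stack[len(stack) - len(rhs):] == rhs:
            del stack[len(stack) - len(rhs):]
            stack.append(lhs)
            return True
    return False

def shift_reduce(words):
    stack, buffer = [], list(words)
    while True:
        if try_reduce(stack):       # reduce as long as possible...
            continue
        if buffer:
            stack.append(buffer.pop(0))   # ...otherwise shift the next word
        else:
            break
    return stack

print(shift_reduce(["the", "dog", "barked"]))   # -> ['s']
```

This version always prefers reduce over shift; the next slides discuss why such a fixed policy can go wrong.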

20 Shift Reduce Parser
Standard implementations (e.g. NLTK) do not perform backtracking.
Only one result is returned even when the sentence is ambiguous.
May fail even when the sentence is grammatical:
Shift/Reduce conflict
Reduce/Reduce conflict

21 Handling Conflicts
Shift-reduce parsers may employ policies for resolving such conflicts, e.g.
For Shift/Reduce conflicts: prefer shift, or prefer reduce.
For Reduce/Reduce conflicts: choose the reduction which removes the most elements from the stack.

22 Top Down vs Bottom Up
Top down
For: never wastes time exploring trees that cannot be derived from S.
Against: can generate trees that are not consistent with the input.
Bottom up
For: never wastes time building trees that cannot lead to input text segments.
Against: can generate subtrees that can never lead to an S node.

23 Top Down Parsing - Remarks
Top-down parsers do well if there is useful grammar-driven control: search can be directed by the grammar.
Not too many different rules for the same category.
Not too much distance between non-terminal and terminal categories.
Top-down is unsuitable for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up, as lexical lookup.

24 Bottom Up Parsing - Remarks
It is data-directed: it attempts to parse the words that are there.
Does well, e.g., for lexical lookup.
Does badly if there are many rules with similar RHS categories.
Inefficient when there is great lexical ambiguity (grammar-driven control might help here).
Empty categories: termination problem unless rewriting of empty constituents is somehow restricted (but then it's generally incomplete).

25 Left Corner Parsing
S → NP VP
NP → D N
NP → D N PP
NP → PN
... more rules
D → the
D → a
PN → John
"John saw the dog"
There are three NP rules. If you were parsing top down, which NP rule will be used first? Is this the best?

26 Left Corner Parsing
We know that the parser has to expand NP in such a way that NP derives "John". There is only one rule which does this. The basic idea behind a Left Corner parser is to use the input to determine which rule is most relevant.

27 Bottom Up Filtering
We know the current input word must serve as the first word in the derivation of the unexpanded node the parser is currently processing. Therefore the parser should not consider grammar rules for which the current word cannot serve as the "left corner".

28 Left Corner
The node marked Verb is a left corner of VP.

29 Left Corner Definition
X is a direct left corner of a nonterminal A if there is an A-production with X as the left-most symbol on the right-hand side.
The left-corner relation is the reflexive transitive closure of the direct-left-corner relation.
The proper-left-corner relation is the transitive closure of the direct-left-corner relation.
Proper left corners of all non-terminal categories can be determined in advance and placed in a table.

30 DCG-style Grammar/Lexicon
s --> np, vp.
s --> aux, np, vp.
s --> vp.
np --> det, nom.
np --> pn.
nom --> noun.
nom --> noun, nom.
nom --> nom, pp.
pp --> prep, np.
vp --> v.
vp --> v, np.
What are the left corners of S? What are the proper left corners of S?

31 DCG-style Grammar/Lexicon
s --> np, vp.
s --> aux, np, vp.
s --> vp.
np --> det, nom.
np --> pn.
nom --> noun.
nom --> noun, nom.
nom --> nom, pp.
pp --> prep, np.
vp --> v.
vp --> v, np.
What are the left corners of S? s, np, aux, vp, det, pn, noun, v (the relation is reflexive, so s is included)
What are the proper left corners of S? np, aux, vp, det, pn, noun, v

32 Example of Left Corner Table
Category  Proper Left Corners
s         np, aux, vp, det, pn, noun, v
np        pn, det
nom       noun
vp        v

33 How to use the Left Corner Table
If attempting to parse category A, only consider rules A → Bα for which category(current input) ∈ LeftCorners(B).
s → np vp
s → aux np vp
s → vp
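The table can be precomputed by taking the transitive closure of the direct-left-corner relation. A sketch in Python (the rule encoding is ours), using the DCG grammar from the earlier slides:

```python
from collections import defaultdict

# Grammar from the DCG slide; terminal categories (det, pn, ...) have no rules here.
GRAMMAR = [
    ("s", ["np", "vp"]), ("s", ["aux", "np", "vp"]), ("s", ["vp"]),
    ("np", ["det", "nom"]), ("np", ["pn"]),
    ("nom", ["noun"]), ("nom", ["noun", "nom"]), ("nom", ["nom", "pp"]),
    ("pp", ["prep", "np"]),
    ("vp", ["v"]), ("vp", ["v", "np"]),
]

def left_corner_table(grammar):
    """Proper left corners: transitive closure of the direct relation."""
    table = defaultdict(set)
    for lhs, rhs in grammar:
        table[lhs].add(rhs[0])          # direct left corners
    changed = True
    while changed:                      # closure: add corners of corners
        changed = False
        for a in table:
            for b in list(table[a]):
                for c in table.get(b, ()):
                    if c not in table[a]:
                        table[a].add(c)
                        changed = True
    return dict(table)

table = left_corner_table(GRAMMAR)
print(sorted(table["np"]))   # -> ['det', 'pn']
```

The closure loop simply keeps adding left-corners-of-left-corners until nothing changes; for a fixed grammar this is done once, offline, and the table is then consulted at parse time as described above.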

34 Left Corner Parsing Algorithm
Key idea: accept a word, identify the constituent it marks the beginning of, and parse the rest of the constituent top down.
Main advantages:
Like a bottom-up parser, it can handle left recursion without looping, since it starts each constituent by accepting a word from the input string.
Like a top-down parser, it is always expecting a particular category, for which only a few of the grammar rules are relevant. It is therefore more efficient than a plain shift-reduce algorithm.

35 Left Corner Algorithm
define parse(C) {            // parse a constituent of type C
  W = readnextword()
  K = category(W)
  complete(K, C)
}
define complete(K, C) {
  if K = C, exit with success
  else foreach rule (CC → K α) ∈ Grammar {
    parselist(α)
    complete(CC, C)
  }
}
define parselist(L) {
  if empty(L), succeed
  else {
    parse(head(L))
    parselist(tail(L))
  }
}
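The pseudocode above can be turned into a small recognizer. This sketch (all names are ours) uses the grammar and lexicon of the worked example on the following slides, and adds backtracking by having each call return the set of input positions it can reach:

```python
GRAMMAR = [("s", ["np", "vp"]), ("np", ["d", "n"]),
           ("np", ["pn"]), ("vp", ["iv"])]
LEXICON = {"the": "d", "robber": "n", "vincent": "pn", "slept": "iv"}

def parse(goal, pos, words):
    """Parse one constituent of category `goal` starting at `pos`."""
    if pos >= len(words) or words[pos] not in LEXICON:
        return set()
    # Accept a word bottom-up, then complete upward toward `goal`.
    return complete(LEXICON[words[pos]], goal, pos + 1, words)

def complete(k, goal, pos, words):
    ends = {pos} if k == goal else set()
    for lhs, rhs in GRAMMAR:
        if rhs[0] == k:                      # k is a direct left corner of lhs
            for p in parse_list(rhs[1:], pos, words):   # rest of rhs, top down
                ends |= complete(lhs, goal, p, words)
    return ends

def parse_list(cats, pos, words):
    if not cats:
        return {pos}
    ends = set()
    for p in parse(cats[0], pos, words):
        ends |= parse_list(cats[1:], p, words)
    return ends

def accepts(sentence):
    words = sentence.split()
    return len(words) in parse("s", 0, words)

print(accepts("vincent slept"), accepts("the robber slept"))  # True True
```

Note how every constituent begins by consuming a word (in `parse`), which is exactly why left recursion causes no loop here.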

36 Left Corner Example
Input: Vincent slept; parse s (top down).
s → np vp
np → d n
np → pn
vp → iv
d → the
n → robber
pn → vincent
iv → slept

37 Left Corner Example
Category of next input word = pn (bottom up).
[Tree: pn over "Vincent"; "slept" not yet consumed]

38 Left Corner Example
Select rule with direct left corner = pn: np → pn. Parse remainder of rhs (nothing remains).
[Tree: np → pn over "Vincent"]

39 Left Corner Example
Select rule with direct left corner = np: s → np vp. Parse remainder of rhs = [vp].
[Tree: np → pn over "Vincent"]

40 Left Corner Example
Category of next input word = iv (bottom up).
[Tree: np over "Vincent"; iv over "slept"]

41 Left Corner Example
Select rule with direct left corner = iv: vp → iv. Parse remainder of rhs: nothing left, so vp is complete.
[Tree: np over "Vincent"; vp → iv over "slept"]

42 Left Corner Example
With vp complete, nothing is left on the rhs of the s rule: s is complete.
[Tree: s → np vp, np → pn over "Vincent", vp → iv over "slept"]

