1
PARSING WITH CONTEXT-FREE GRAMMARS
cc437
2
PARSING
Parsing is the process of recognizing and assigning STRUCTURE
Parsing a string with a CFG: finding a derivation of the string consistent with the grammar
The derivation gives us a PARSE TREE
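To make "finding a derivation" concrete, here is a minimal sketch of a leftmost derivation. The toy grammar, the sentence, and the function name `leftmost_derivation` are illustrative assumptions, not taken from the slides.

```python
# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is an index into GRAMMAR[nonterminal]. Returns every
    sentential form visited, beginning with [start].
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        # Position of the leftmost symbol that is still a nonterminal.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        rhs = GRAMMAR[form[pos]][choice]
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(list(form))
    return steps


# S => NP VP => Det N VP => the N VP => the dog VP => ... => the dog chased the cat
steps = leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 0, 1])
```

Each rule application corresponds to one internal node of the parse tree, so the sequence of choices determines the tree.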
3
EXAMPLE (CFR LAST WEEK)
4
PARSING AS SEARCH
Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS
There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered
5
TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES
The search has to be guided by the INPUT and the GRAMMAR
TOP-DOWN search: the parse tree has to be rooted in the start symbol S
  EXPECTATION-DRIVEN parsing
BOTTOM-UP search: the parse tree must be an analysis of the input
  DATA-DRIVEN parsing
6
AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)
7
AN EXAMPLE OF BOTTOM-UP SEARCH
8
NON-PARALLEL SEARCH
If it's not possible to examine all alternatives in parallel, it's necessary to make further decisions:
  Which node in the current search space to expand first (breadth-first or depth-first)
  Which of the applicable grammar rules to expand first
  Which leaf node in a parse tree to expand next (e.g., leftmost)
9
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT
10
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)
11
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)
12
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)
13
A T-D, D-F, L-R PARSER (Compare with ND-recognize)
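The slide's parser figure did not survive extraction. As a stand-in, here is a minimal sketch of a top-down, depth-first, left-to-right recognizer with backtracking. The grammar fragment, the lexicon, and the names `expand` and `recognize` are assumptions for illustration and may differ from the original pseudocode.

```python
# Hypothetical grammar fragment; rules are tried in the order listed.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
# Hypothetical lexicon: word -> set of parts of speech.
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def expand(symbols, words, pos):
    """Yield every input position reachable after matching `symbols`
    against words[pos:]. The leftmost symbol is always expanded first,
    and alternatives are explored depth-first via the generator stack."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Choice point: try each expansion of the leftmost nonterminal.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, pos)
    elif pos < len(words) and first in LEXICON.get(words[pos], ()):
        # Part of speech matches the current word: consume it.
        yield from expand(rest, words, pos + 1)


def recognize(words):
    """True if some derivation from S covers the whole input."""
    return any(end == len(words) for end in expand(["S"], words, 0))
```

For example, `recognize("book that flight".split())` succeeds only after the `S -> NP VP` hypothesis has failed and the search has backed up to `S -> VP`. This sketch would recurse without end on a left-recursive grammar.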
14
TOP-DOWN vs BOTTOM-UP
TOP-DOWN:
  Only searches among grammatical answers
  BUT: suggests hypotheses that may not be consistent with data
  Problem: left-recursion
BOTTOM-UP:
  Only forms hypotheses consistent with data
  BUT: may suggest hypotheses that make no sense globally
15
LEFT-RECURSION
A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return
Examples of left-recursive rules:
  NP → NP PP
  S → S and S
But also (indirect left-recursion):
  NP → Det Nom
  Det → NP 's
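Left-recursion, including the indirect kind, can be detected mechanically. A minimal sketch follows: it computes which nonterminals can appear leftmost under each nonterminal and checks whether any nonterminal reaches itself. The grammar fragment and the function name are illustrative assumptions, and the sketch assumes no rule has an empty right-hand side.

```python
def left_recursive_nonterminals(grammar):
    """Return every nonterminal A that can derive a string beginning with A."""
    # reach[A] = nonterminals that can be the first symbol of an expansion of A.
    reach = {a: {rhs[0] for rhs in rules if rhs[0] in grammar}
             for a, rules in grammar.items()}
    changed = True
    while changed:  # transitive closure by repeated propagation
        changed = False
        for a in reach:
            new = set()
            for b in reach[a]:
                new |= reach[b]
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    return {a for a in reach if a in reach[a]}


# Hypothetical fragment containing the rules from the slide.
GRAMMAR = {
    "S": [["S", "and", "S"], ["NP", "VP"]],
    "NP": [["NP", "PP"], ["Det", "Nom"]],
    "Det": [["NP", "'s"], ["the"]],
    "Nom": [["Noun"]],
    "VP": [["Verb", "NP"]],
    "PP": [["Prep", "NP"]],
}

found = left_recursive_nonterminals(GRAMMAR)
```

Here `Det` is reported even though no `Det` rule starts with `Det`: the cycle runs through `NP → Det Nom` and `Det → NP 's`.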
16
THE PROBLEM WITH LEFT-RECURSION
17
LEFT-RECURSION: POOR SOLUTIONS
Rewrite the grammar to a weakly equivalent one
  Problem: may not get correct parse tree
Limit the depth during search
  Problem: limit is arbitrary
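The first option can be illustrated with the standard textbook rewrite for direct left-recursion, which turns A → A α | β into A → β A' and A' → α A' | (empty). This is a sketch only: the function name is hypothetical, the fresh nonterminal is assumed to be the original name plus an apostrophe, and the `NP` rules are chosen for illustration.

```python
def remove_direct_left_recursion(nonterminal, rules):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | (empty).

    `rules` is a list of right-hand sides (lists of symbols). Returns a
    dict holding the new rules for A and, if needed, for the fresh A'.
    """
    # Tails of the left-recursive rules (everything after the leading A).
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nonterminal]
    others = [rhs for rhs in rules if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: rules}
    fresh = nonterminal + "'"  # assumed not to clash with an existing symbol
    return {
        nonterminal: [rhs + [fresh] for rhs in others],
        fresh: [tail + [fresh] for tail in recursive] + [[]],
    }


new_rules = remove_direct_left_recursion(
    "NP", [["NP", "PP"], ["Det", "Nominal"]]
)
```

The rewritten grammar accepts the same strings, but the PPs now hang off a right-branching chain of `NP'` nodes rather than the original left-branching `NP` structure, which is exactly the "may not get correct parse tree" problem.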
18
LEFT-CORNER PARSING
A hybrid of top-down and bottom-up parsing
Strategy: don’t consider any expansion unless the current input can serve as the LEFT-CORNER of that expansion
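One common way to apply this strategy is to precompute a left-corner table and use it to filter expansions. The sketch below assumes a small grammar fragment and a hypothetical helper `viable_rules`; it illustrates the filtering idea rather than a full left-corner parser.

```python
def left_corner_table(grammar):
    """Map each nonterminal to every symbol that can begin one of its
    expansions, directly or through further expansions."""
    table = {a: {rhs[0] for rhs in rules} for a, rules in grammar.items()}
    changed = True
    while changed:
        changed = False
        for a in table:
            for b in list(table[a]):
                extra = table.get(b, set()) - table[a]
                if extra:
                    table[a] |= extra
                    changed = True
    return table


# Hypothetical grammar fragment.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def viable_rules(category, word_category, grammar, table):
    """Keep only expansions of `category` whose first symbol is, or can
    start with, the part of speech of the current input word."""
    return [rhs for rhs in grammar[category]
            if rhs[0] == word_category
            or word_category in table.get(rhs[0], set())]


TABLE = left_corner_table(GRAMMAR)
# With a verb as the next word, only S -> VP survives the filter.
s_rules_for_verb = viable_rules("S", "Verb", GRAMMAR, TABLE)
```

The table is computed once per grammar, so the check at each choice point is a set lookup.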
19
FURTHER PROBLEMS IN PARSING
Ambiguity
  Church and Patil (1982): the number of attachment ambiguities grows like the Catalan numbers
  C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430, C(9) = 4862
Avoiding reparsing
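The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1), which makes the growth easy to check. A short sketch, with the function name `catalan` chosen for illustration:

```python
from math import comb


def catalan(n):
    """n-th Catalan number: C(n) = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


# Values for n = 2 .. 9.
values = [catalan(n) for n in range(2, 10)]
```

The sequence roughly quadruples at each step for large n, so enumerating all attachment analyses quickly becomes impractical.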
20
COMMON STRUCTURAL AMBIGUITIES
COORDINATION ambiguity
  OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
ATTACHMENT ambiguity:
  Gerundive VP attachment ambiguity
    I saw the Eiffel Tower flying to Paris
  PP attachment ambiguity
    I shot an elephant in my pajamas
21
PP ATTACHMENT AMBIGUITY
22
AMBIGUITY: SOLUTIONS
Use a PROBABILISTIC GRAMMAR (not covered in this module)
Use semantics
23
AVOID RECOMPUTING INVARIANTS
Consider parsing with a top-down parser the NP:
  A flight from Indianapolis to Houston on TWA
With the grammar rules:
  NP → Det Nominal
  NP → NP PP
  NP → ProperNoun
24
INVARIANTS AND TOP-DOWN PARSING
25
THE EARLEY ALGORITHM
26
DYNAMIC PROGRAMMING
A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way
A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work
The Earley algorithm also
  Does not suffer from the left-recursion problem
  Solves an exponential problem in O(n^3)
27
THE CHART
The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input
  Table entries sit in the 'gaps' between words
Each entry in the chart is a list of
  Completed constituents
  In-progress constituents
  Predicted constituents
All three types of objects are represented in the same way as STATES
28
THE CHART: GRAPHICAL REPRESENTATION
29
STATES
A state encodes two types of information:
  How much of a certain rule has been encountered in the input
  Which positions are covered
  A → α • β, [X,Y]
DOTTED RULES
  VP → V NP •
  NP → Det • Nominal
  S → • VP
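A state can be represented as a small record holding a rule, a dot position, and the span covered so far. The sketch below is one possible encoding; the class name, field names, and the example spans (chosen for a three-word input) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # number of rhs symbols already recognized
    start: int    # chart position where the constituent begins
    end: int      # chart position reached so far

    def is_complete(self):
        """True when the dot has moved past the last rhs symbol."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol immediately after the dot, or None if complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


predicted = State("S", ("VP",), 0, 0, 0)              # nothing seen yet
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)  # Det seen
completed = State("VP", ("V", "NP"), 2, 0, 3)         # whole rule seen
```

Because the dataclass is frozen, states are hashable and can be compared cheaply when checking whether a chart entry already contains them.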
30
EXAMPLES
31
SUCCESS
The parser has succeeded if entry N+1 of the chart (the last entry) contains the state S → α •, [0,N]
32
THE ALGORITHM
The algorithm loops through the input without backtracking, at each step performing three operations:
  PREDICTOR: add predictions to the chart
  COMPLETER: move the dot to the right when the looked-for constituent is found
  SCANNER: read in the next input word
33
THE ALGORITHM: CENTRAL LOOP
34
EARLEY ALGORITHM: THE THREE OPERATORS
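The slide's pseudocode did not survive extraction. As a stand-in, here is a minimal sketch of an Earley recognizer with the three operators marked in comments. The grammar fragment, lexicon, tuple encoding of states, and the dummy start rule `GAMMA → S` are assumptions for illustration, and the sketch assumes no rule has an empty right-hand side.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    n = len(words)
    # chart[i] lists the states ending at position i.
    # A state is (lhs, rhs, dot, origin).
    chart = [[] for _ in range(n + 1)]

    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)

    add(("GAMMA", (start,), 0, 0), 0)  # dummy start state
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # PREDICTOR: expect every expansion of nxt from here.
                    for expansion in grammar[nxt]:
                        add((nxt, expansion, 0, i), i)
                elif i < n and nxt in lexicon.get(words[i], ()):
                    # SCANNER: the next word has the expected part of speech.
                    add((nxt, (words[i],), 1, i), i + 1)
            else:
                # COMPLETER: advance every state that was waiting for lhs.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add((plhs, prhs, pdot + 1, porigin), i)
    # Success: the dummy rule is complete and spans the whole input.
    return ("GAMMA", (start,), 1, 0) in chart[n]
```

Calling `earley_recognize("book that flight".split())` fills four chart entries, one per gap between words. Because a state is added to an entry at most once, repeated predictions of a left-recursive rule such as `NP → NP PP` are deduplicated and the loop still terminates.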
35
EXAMPLE, AGAIN
36
EXAMPLE: BOOK THAT FLIGHT
37
EXAMPLE: BOOK THAT FLIGHT (II)
38
EXAMPLE: BOOK THAT FLIGHT (III)
39
EXAMPLE: BOOK THAT FLIGHT (IV)
40
READINGS Jurafsky and Martin, chapter