1
Ambiguity
Parsing algorithms
CSCI 3130: Automata theory and formal languages
Andrej Bogdanov
The Chinese University of Hong Kong, Fall 2011
2
Ambiguity
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
The string 1+2*2 has two parse trees: one groups it as 1+(2*2) = 5, the other as (1+2)*2 = 6.
A CFG is ambiguous if some string has more than one parse tree.
3
Example
Is S → SS | x ambiguous?
Yes, because the string xxx has two parse trees: one groups it as (xx)x, the other as x(xx).
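To make the ambiguity of S → SS | x concrete, the following minimal sketch counts the parse trees of the string consisting of n copies of x. The function name `count_parse_trees` is an illustrative choice, not something from the slides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(n: int) -> int:
    """Number of parse trees of the string x...x (n copies) under S -> SS | x."""
    if n == 1:
        return 1  # the only tree is S -> x
    # S -> SS: the left S derives the first k symbols, the right S the remaining n - k
    return sum(count_parse_trees(k) * count_parse_trees(n - k) for k in range(1, n))


print(count_parse_trees(3))  # 2
```

Any count above 1 witnesses ambiguity; for xxx the two trees correspond to the groupings (xx)x and x(xx).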
4
Disambiguation
The ambiguous grammar S → SS | x can be rewritten as S → Sx | x.
Sometimes we can rewrite the grammar to remove ambiguity.
5
Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
In this grammar + and * have the same precedence!
Divide the expression into terms (T) and factors (F), for example in 2 * (1 + 2 * 2).
6
Disambiguation
E → E + E | E * E | (E) | N
N → 1N | 2N | 1 | 2
An expression is a sum of one or more terms: E → T | E + T
Each term is a product of one or more factors: T → F | T * F
Each factor is a parenthesized expression or a number: F → (E) | 1 | 2
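As an illustration of how the term/factor grammar fixes precedence, here is a small evaluator sketch that follows the three rules E → T | E + T, T → F | T * F, F → (E) | 1 | 2. The left-recursive rules are implemented as loops, which is a common implementation choice and not something the slides prescribe; the name `evaluate` and its helpers are assumptions made for this example.

```python
def evaluate(expr: str) -> int:
    """Evaluate an expression over 1, 2, +, *, ( ) using the unambiguous grammar."""
    tokens = [c for c in expr if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def parse_e():  # E -> T | E + T  (a sum of one or more terms)
        value = parse_t()
        while peek() == "+":
            take("+")
            value += parse_t()
        return value

    def parse_t():  # T -> F | T * F  (a product of one or more factors)
        value = parse_f()
        while peek() == "*":
            take("*")
            value *= parse_f()
        return value

    def parse_f():  # F -> (E) | 1 | 2
        tok = take()
        if tok == "(":
            value = parse_e()
            take(")")
            return value
        if tok in ("1", "2"):
            return int(tok)
        raise SyntaxError(f"unexpected {tok!r}")

    result = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(evaluate("1+2*2"))  # 5
```

Because a product can only appear inside a term, 1+2*2 is always read as 1+(2*2); the reading (1+2)*2 requires explicit parentheses.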
7
Parsing example
[Figure: step-by-step parse tree of an expression such as 2 * (1 + 2 * 2), built with the rules E → T | E + T, T → F | T * F, F → (E) | 1 | 2]
8
Disambiguation
Disambiguation is not always possible because:
There exist inherently ambiguous languages
There is no general procedure for disambiguation
In programming languages, ambiguity comes from precedence rules, and we can resolve it as in the example
In English, ambiguity is sometimes a problem: He ate the cookies on the floor
9
Ambiguity in English He ate the cookies on the floor
10
Parsing
S → 0S1 | 1S0S | T
T → S | ε
input: 0011
How would we program the computer to build a parse tree for us?
11
Parsing
S → 0S1 | 1S0S | T
T → S | ε
input: 0011
First idea: Try all derivations
S ⇒ 0S1, 1S0S, T
0S1 ⇒ 00S11, 01S0S1, 0T1
00S11 ⇒ 000S111, 00T11, ...
00T11 ⇒ 00S11, 0011 ✔
1S0S ⇒ 10S10S, ...
12
Problems Trying all derivations may take a very long time
If input is not in the language, parsing will never stop
13
When to stop
S → 0S1 | 1S0S | T
T → S | ε
Idea 2: Stop when |derived string| > |input|
Problems:
Derived strings may shrink because of "ε-productions": S ⇒ 0S1 ⇒ 0T1 ⇒ 01 (length 1, then 3, then 2)
Derivation may loop because of "unit productions": S ⇒ T ⇒ S ⇒ T ⇒ …
Task: remove ε and unit productions
14
Removal of ε-productions
A variable N is nullable if it derives the empty string: N ⇒* ε
Identify all nullable variables N
Remove nullable variables carefully
If start variable S is nullable:
Add a new start variable S'
Add special productions S' → S | ε
15
Example
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
Identify all nullable variables. Repeat the following:
If X → ε, mark X as nullable.
If X → YZ…W, with Y, Z, …, W all marked nullable, mark X as nullable also.
nullable variables: B, C, D
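The marking procedure for nullable variables can be sketched as a fixed-point loop. The grammar encoding is an assumption made for this example: a dict from variable to a list of production bodies, each body a string of one-character symbols, with the empty string standing for ε.

```python
def nullable_variables(grammar):
    """Return the set of variables that derive the empty string."""
    nullable = set()
    changed = True
    while changed:  # repeat until no new variable gets marked
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # An empty body is X -> epsilon; otherwise every symbol must already be marked.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


example = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
print(sorted(nullable_variables(example)))  # ['B', 'C', 'D']
```

S is not marked because A is never nullable, which matches the slide's result B, C, D.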
16
Eliminating ε-productions
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable: B, C, D
Remove nullable variables carefully. For every nullable N:
If you see X → αNβ, add X → αβ.
If you see N → ε, remove it.
Productions added along the way: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E
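A compact way to obtain the same result is to generate, for every production, all variants in which some of the nullable symbols are dropped, and to discard empty bodies. This is a sketch under stated assumptions: bodies are strings of one-character symbols, the empty string stands for ε, the set of nullable variables is passed in, and the special case of a nullable start variable (adding S' → S | ε) is left out. The name `eliminate_epsilon` is illustrative.

```python
from itertools import product


def eliminate_epsilon(grammar, nullable):
    """Return a grammar without epsilon-productions (start-variable case not handled)."""
    new_grammar = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # A nullable symbol may be kept or dropped; every other symbol is kept.
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for pick in product(*choices):
                candidate = "".join(pick)
                if candidate:  # never emit an epsilon-production
                    new_bodies.add(candidate)
        new_grammar[var] = new_bodies
    return new_grammar


example = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
result = eliminate_epsilon(example, nullable={"B", "C", "D"})
print(sorted(result["S"]))  # ['A', 'AC', 'ACD', 'AD']
```

On the example grammar this yields S → ACD | AD | AC | A, C → ED | E and D → BC | B | C | b, while B is left with no productions at all.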
17
Eliminating unit productions
A unit production is a production of the form A → B
grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR
unit productions graph: S → T, T → S, T → R
18
Removal of unit productions
If there is a cycle of unit productions A → B → ... → C → A, delete it and replace everything with A.
grammar:
S → 0S1 | 1S0S | T
T → S | R | ε
R → 0SR
The cycle is S → T → S, so replace T by S:
S → 0S1 | 1S0S
S → R | ε
R → 0SR
19
Removal of unit productions
Replace every chain A → B → ... → C → α by A → α, B → α, ..., C → α.
grammar:
S → 0S1 | 1S0S | R | ε
R → 0SR
The chain S → R → 0SR is replaced by S → 0SR, R → 0SR:
S → 0S1 | 1S0S | 0SR | ε
R → 0SR
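The slides remove unit productions in two steps (collapse cycles, then shortcut chains). The sketch below uses a different but equivalent formulation, named plainly here: for each variable it collects everything reachable through unit productions alone and copies over the non-unit bodies. The encoding (bodies as strings of one-character symbols, empty string for ε) and the name `eliminate_unit_productions` are assumptions made for this example.

```python
def eliminate_unit_productions(grammar):
    """Return a grammar with the same language and no productions of the form A -> B."""

    def is_unit(body):
        return len(body) == 1 and body in grammar

    result = {}
    for var in grammar:
        # Variables reachable from var by following unit productions only.
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # var inherits every non-unit body of every reachable variable.
        result[var] = {
            body for other in reachable for body in grammar[other] if not is_unit(body)
        }
    return result


example = {
    "S": ["0S1", "1S0S", "T"],
    "T": ["S", "R", ""],
    "R": ["0SR"],
}
cleaned = eliminate_unit_productions(example)
print(sorted(cleaned["S"]))  # ['', '0S1', '0SR', '1S0S']
```

For S this gives 0S1 | 1S0S | 0SR | ε, the same productions as on the slide. Unlike the slide's construction, T is kept as a separate variable with the same bodies as S, but it no longer occurs in any body and can be dropped.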
20
Recap
Problem: If input is not in the language, parsing will never stop
Solution:
Eliminate ε-productions
Eliminate unit productions
(important to do in this order)
Try all possible derivations but stop parsing when |derived string| > |input|
21
Example
S → 0S1 | 0S0S | T
T → S | 0
input: 0011
S ⇒ 0S1 ⇒ 001 ✘, 00S11 (too long), 00S0S1 (too long)
S ⇒ 0S0S ⇒ 000S, 00S10S (too long), 00S0S0S (too long)
000S ⇒ 0000 ✘, all other derived strings are too long
conclusion: 0011 ∉ L
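The "try all derivations, but stop when the derived string is longer than the input" strategy can be sketched as a breadth-first search over leftmost derivations. The sketch assumes ε and unit productions have already been removed, so the example grammar is encoded as S → 0S1 | 0S0S | 0; the `seen` set and the name `derives` are implementation choices for this illustration, not part of the slides.

```python
from collections import deque


def derives(grammar, start, target):
    """Decide whether `start` derives `target`, assuming no epsilon or unit productions."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Expand the leftmost variable, if there is one.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            continue  # all terminals, but not the target
        for body in grammar[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            # Without epsilon-productions strings never shrink, so longer forms are hopeless.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return False


grammar = {"S": ["0S1", "0S0S", "0"]}
print(derives(grammar, "S", "0011"))  # False
```

The search always terminates because only finitely many strings are no longer than the input, but the number of such strings can still grow exponentially with the input length.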
22
Problems Trying all derivations may take a very long time
If input is not in the language, parsing will never stop
23
Preparations
A faster way to parse: the Cocke-Younger-Kasami algorithm
To use it we must prepare the CFG:
Eliminate ε-productions
Eliminate unit productions
Convert CFG to Chomsky Normal Form
24
Chomsky Normal Form
A CFG is in Chomsky Normal Form if every production* has the form A → BC or A → a
* Exception: We allow S → ε for start variable only
Convert to Chomsky Normal Form:
A → BcDE
replace terminals with new variables: A → BCDE, C → c
break up sequences with new variables: A → BX, X → CY, Y → DE, C → c
[Photo: Noam Chomsky]
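The two conversion steps can be sketched as follows, assuming ε and unit productions are already gone. The encoding is an assumption for this example: each body is a list of symbols, and a symbol counts as a variable exactly when it is a key of the grammar. The generated names `T_c`, `X1`, `X2` are hypothetical stand-ins for the slide's C, X, Y and are assumed not to clash with existing variables.

```python
def to_cnf(grammar):
    """Convert a grammar without epsilon/unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    result = {var: [] for var in grammar}
    terminal_vars = {}  # terminal -> fresh variable that derives it
    fresh = 0
    for var, bodies in grammar.items():
        for body in bodies:
            symbols = list(body)
            if len(symbols) >= 2:
                # Step 1: replace terminals with new variables.
                for i, sym in enumerate(symbols):
                    if sym not in variables:
                        if sym not in terminal_vars:
                            terminal_vars[sym] = f"T_{sym}"
                            result[terminal_vars[sym]] = [(sym,)]
                        symbols[i] = terminal_vars[sym]
            # Step 2: break up long sequences with new variables.
            head = var
            while len(symbols) > 2:
                fresh += 1
                new_var = f"X{fresh}"
                result[head].append((symbols[0], new_var))
                result[new_var] = []
                head = new_var
                symbols = symbols[1:]
            result[head].append(tuple(symbols))
    return result


example = {
    "A": [["B", "c", "D", "E"]],
    "B": [["b"]],
    "D": [["d"]],
    "E": [["e"]],
}
cnf = to_cnf(example)
print(cnf["A"], cnf["X1"], cnf["X2"], cnf["T_c"])
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → DE and T_c → c, mirroring the slide's A → BX, X → CY, Y → DE, C → c.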
25
Cocke-Younger-Kasami algorithm
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Idea: We generate each substring of x bottom up
Table (top row: the whole string; bottom row: single symbols; - marks an empty cell):
SAC
-  SAC
-  B  B
SA  B  SC  SA
B  AC  AC  B  AC
b  a  a  b  a
26
Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
x = baaba
Tracing back the derivations in the table, we obtain the parse tree
27
Cocke-Younger-Kasami algorithm
Input: Grammar without ε and unit productions in Chomsky Normal Form; input string x = x1…xk
Table cells: 11, 22, …, kk in the last row; 12, 23, … in the row above; and so on up to 1k at the top
For all cells in last row:
If there is a production A → xi, put A in table cell ii
For cells st in other rows:
If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, put A in cell st
Cell ij remembers all possible derivations of substring xi…xj
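The table-filling procedure can be sketched directly. Assumptions made for this example: the grammar is in Chomsky Normal Form and encoded as a dict from variable to a list of bodies (two variables or one terminal, written as strings), cells are indexed from 0 rather than 1, and `cyk_table` is an illustrative name.

```python
def cyk_table(grammar, x):
    """Fill the CYK table: table[(s, t)] holds the variables deriving x[s..t] (0-based)."""
    k = len(x)
    table = {}
    # Last row: cell ii gets every A with a production A -> x_i.
    for i in range(k):
        table[(i, i)] = {A for A, bodies in grammar.items() if x[i] in bodies}
    # Other rows, from shorter substrings to longer ones.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            cell = set()
            for j in range(s, t):  # split x[s..t] into x[s..j] and x[j+1..t]
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(A)
            table[(s, t)] = cell
    return table


grammar = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}
table = cyk_table(grammar, "baaba")
print(sorted(table[(0, 4)]))  # ['A', 'C', 'S']
```

The input is in the language exactly when the start variable appears in the top cell, here `table[(0, 4)]`. The three nested loops over s, t and j give a running time cubic in the length of x, times the size of the grammar.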