Published by Alisha Brown. Modified over 8 years ago.
1
Structure and Ambiguity Removing Ambiguity Chomsky Normal Form Pushdown Automata Intro (who is he foolin', thinking that there will be time to get to this?) MA/CSSE 474 Theory of Computation
2
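Structure
Context-free languages: we care about structure.
[Figure: parse tree of an arithmetic expression, with E nodes, the operators + and *, and id leaves for the values 3, 5, and 7.]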
3
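Derivations and parse trees
Parse trees capture essential structure. Two derivations of (())() that use the same six rule applications, numbered 1 to 6, in different orders:
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ (())S ⇒ (())(S) ⇒ (())()   (order 1 2 3 4 5 6)
S ⇒ SS ⇒ (S)S ⇒ ((S))S ⇒ ((S))(S) ⇒ (())(S) ⇒ (())()   (order 1 2 3 5 4 6)
[Figure: the single parse tree shared by both derivations.]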
4
Parse Trees
A parse tree, derived from a grammar G = (V, Σ, R, S), is a rooted, ordered tree in which:
● Every leaf node is labeled with an element of Σ ∪ {ε},
● The root node is labeled S,
● Every other node is labeled with an element of N (the nonterminals, V - Σ), and
● If m is a non-leaf node labeled X and the (ordered) children of m are labeled x_1, x_2, …, x_n, then R contains the rule X → x_1 x_2 … x_n.
5
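Structure in English
[Figure: parse tree for "the smart cat smells chocolate". S has children NP and VP; the subject NP is "the" plus a Nominal made of Adjs (Adj: smart) and N (cat); the VP is V (smells) plus an NP whose Nominal is the N chocolate.]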
6
Generative Capacity Because parse trees matter, it makes sense, given a grammar G, to distinguish between: ● G’s weak generative capacity, defined to be the set of strings, L(G), that G generates, and ● G’s strong generative capacity, defined to be the set of parse trees that G generates.
7
Algorithms Care How We Search
Algorithms for generation and recognition must be systematic. They typically use either the leftmost derivation or the rightmost derivation.
[Figure: parse tree fragment with S at the root above (S)(S).]
8
Derivations of The Smart Cat
A leftmost derivation is:
S ⇒ NP VP ⇒ the Nominal VP ⇒ the Adjs N VP ⇒ the Adj N VP ⇒ the smart N VP ⇒ the smart cat VP ⇒ the smart cat V NP ⇒ the smart cat smells NP ⇒ the smart cat smells Nominal ⇒ the smart cat smells N ⇒ the smart cat smells chocolate
A rightmost derivation is:
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Nominal ⇒ NP V N ⇒ NP V chocolate ⇒ NP smells chocolate ⇒ the Nominal smells chocolate ⇒ the Adjs N smells chocolate ⇒ the Adjs cat smells chocolate ⇒ the Adj cat smells chocolate ⇒ the smart cat smells chocolate
9
Ambiguity A grammar is ambiguous iff there is at least one string in L(G) for which G produces more than one parse tree *. For many applications of context-free grammars, this is a problem. Example: A programming language. If there can be two different structures for a string in the language, there can be two different meanings. Not good! * Equivalently, more than one leftmost derivation, or more than one rightmost derivation.
10
An Arithmetic Expression Grammar
E → E + E
E → E * E
E → (E)
E → id
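To see why two parse trees can mean two different things, the sketch below enumerates every parse tree of a parenthesis-free token list under E → E + E | E * E | id and evaluates each one. The function name `all_values` and the use of integers in place of id are assumptions for illustration; the (E) rule is left out to keep the sketch short.

```python
def all_values(tokens):
    """Values of every parse tree of tokens under E -> E + E | E * E | id."""
    if len(tokens) == 1:
        return [tokens[0]]
    values = []
    # Each operator position can serve as the root of a parse tree.
    for k in range(1, len(tokens), 2):
        op = tokens[k]
        for left in all_values(tokens[:k]):
            for right in all_values(tokens[k + 1:]):
                values.append(left + right if op == "+" else left * right)
    return values

results = all_values([3, "+", 5, "*", 7])
print(results)  # one value per parse tree
```

The two trees group the string as 3 + (5 * 7) and (3 + 5) * 7, so an ambiguous grammar leaves the value of the expression undetermined.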
11
Inherent Ambiguity
Some CF languages have the property that every grammar for them is ambiguous. We call such languages inherently ambiguous.
Example: L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.
12
Inherent Ambiguity
L = {a^n b^n c^m : n, m ≥ 0} ∪ {a^n b^m c^m : n, m ≥ 0}.
One grammar for L has the rules:
S → S_1 | S_2
S_1 → S_1 c | A   /* Generate all strings in {a^n b^n c^m}. */
A → a A b | ε
S_2 → a S_2 | B   /* Generate all strings in {a^n b^m c^m}. */
B → b B c | ε
Consider any string of the form a^n b^n c^n. It belongs to both sets, so this grammar derives it in two ways, once through S_1 and once through S_2. It turns out that L is inherently ambiguous.
13
Ambiguity and undecidability Both of the following problems are undecidable*: Given a context-free grammar G, is G ambiguous? Given a context-free language L, is L inherently ambiguous? Informal definition of undecidable for the first problem: There is no algorithm (procedure that is guaranteed to always halt) that, given a grammar G, determines whether G is ambiguous.
14
But We Can Often Reduce Ambiguity
We can get rid of:
● some ε-rules, like S → ε,
● rules with symmetric right-hand sides, e.g., S → SS and E → E + E,
● rule sets that lead to ambiguous attachment of optional postfixes, such as if … else ….
15
A Highly Ambiguous Grammar
S → ε
S → SS
S → (S)
16
Resolving the Ambiguity with a Different Grammar
The biggest problem is the ε rule. A different grammar for the language of balanced parentheses:
S* → ε
S* → S
S → SS
S → (S)
S → ()
We'd like to have an algorithm for removing all ε-productions, except for the case where ε is actually in the language; then we introduce a new start symbol and have one ε-production whose left side is that symbol.
17
Nullable Nonterminals
Examples:
S → aTa, T → ε
S → aTa, T → AB, A → ε, B → ε
A nonterminal X is nullable iff either: (1) there is a rule X → ε, or (2) there is a rule X → PQR… and P, Q, R, … are all nullable.
18
Nullable Nonterminals
A nonterminal X is nullable iff either: (1) there is a rule X → ε, or (2) there is a rule X → PQR… and P, Q, R, … are all nullable.
So compute N, the set of nullable nonterminals, as follows:
1. Set N to the set of nonterminals that satisfy (1).
2. Repeat until an entire pass is made without adding anything to N: evaluate all other nonterminals with respect to (2). If any nonterminal satisfies (2) and is not in N, insert it.
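The two-step computation of N can be sketched as a fixed-point loop. The rule encoding (pairs of a left-hand side and a tuple of right-hand-side symbols, with the empty tuple standing for ε) and the function name are assumptions chosen for this illustration.

```python
def nullable_nonterminals(rules):
    """rules: iterable of (lhs, rhs) pairs; rhs is a tuple, () means epsilon."""
    rules = list(rules)
    # Step 1: nonterminals with a direct rule X -> epsilon.
    nullable = {lhs for lhs, rhs in rules if rhs == ()}
    # Step 2: repeat until a full pass adds nothing.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and rhs and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

# S -> aTa, T -> AB, A -> epsilon, B -> epsilon
example = [("S", ("a", "T", "a")), ("T", ("A", "B")), ("A", ()), ("B", ())]
print(nullable_nonterminals(example))
```

S stays out of N here because its only right-hand side contains the terminal a, which can never be nullable.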
19
A General Technique for Getting Rid of ε-Rules
Definition: a rule is modifiable iff it is of the form P → αQβ, for some nullable Q.
removeEps(G: cfg) =
1. Let G′ = G.
2. Find the set N of nullable nonterminals in G′.
3. Repeat until G′ contains no modifiable rules that haven't been processed: given the rule P → αQβ, where Q ∈ N, add the rule P → αβ if it is not already present and if αβ ≠ ε and if P ≠ αβ.
4. Delete from G′ all rules of the form X → ε.
5. Return G′.
L(G′) = L(G) - {ε}
20
An Example
G = ({S, T, A, B, C, a, b, c}, {a, b, c}, R, S),
R = { S → aTa
      T → ABC
      A → aA | C
      B → Bb | C
      C → c | ε }
Apply removeEps (steps 1 to 5 above) to G.
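The example can be worked through mechanically. What follows is a rough sketch of removeEps under an assumed encoding: each rule is a (lhs, rhs) pair, rhs is a tuple of one-character symbols, and () stands for ε. The helper `rules_for` and the worklist organisation are illustrative choices, not taken from the slides.

```python
def rules_for(lhs, *alternatives):
    """Build rules from strings, one character per symbol; '' is epsilon."""
    return {(lhs, tuple(alt)) for alt in alternatives}

def nullable(rules):
    n = {lhs for lhs, rhs in rules if rhs == ()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in n and rhs and all(s in n for s in rhs):
                n.add(lhs)
                changed = True
    return n

def remove_eps(rules):
    rules = set(rules)
    n = nullable(rules)
    worklist = list(rules)           # rules not yet processed
    while worklist:
        lhs, rhs = worklist.pop()
        for i, sym in enumerate(rhs):
            if sym in n:             # the rule is modifiable at position i
                new_rhs = rhs[:i] + rhs[i + 1:]
                new_rule = (lhs, new_rhs)
                # Skip epsilon, the useless rule P -> P, and duplicates.
                if new_rhs and new_rhs != (lhs,) and new_rule not in rules:
                    rules.add(new_rule)
                    worklist.append(new_rule)
    return {(l, r) for l, r in rules if r != ()}

R = (rules_for("S", "aTa") | rules_for("T", "ABC") | rules_for("A", "aA", "C")
     | rules_for("B", "Bb", "C") | rules_for("C", "c", ""))
result = remove_eps(R)
for lhs, rhs in sorted(result):
    print(lhs, "->", "".join(rhs))
```

For this grammar the nullable set is {A, B, C, T}, so T picks up every non-empty way of dropping symbols from ABC.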
21
What If ε ∈ L?
atmostoneEps(G: cfg) =
1. G″ = removeEps(G).
2. If S_G is nullable then /* i.e., ε ∈ L(G) */
   2.1 Create in G″ a new start symbol S*.
   2.2 Add to R_G″ the two rules: S* → ε and S* → S_G.
3. Return G″.
22
But There Can Still Be Ambiguity
S* → ε
S* → S
S → SS
S → (S)
S → ()
What about ()()()?
23
Eliminating Symmetric Recursive Rules
S* → ε
S* → S
S → SS
S → (S)
S → ()
Replace S → SS with one of:
S → S S_1   /* force branching to the left */
S → S_1 S   /* force branching to the right */
So we get:
S* → ε
S* → S
S → S S_1
S → S_1
S_1 → (S)
S_1 → ()
24
Eliminating Symmetric Recursive Rules
So we get:
S* → ε
S* → S
S → S S_1
S → S_1
S_1 → (S)
S_1 → ()
[Figure: the left-branching parse tree for ()()() under this grammar.]
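One way to check the effect of this rewrite is to count parse trees for ()()() under both versions of the S rules. The counter below is a sketch that assumes the grammar has no ε-rules and no cycles of unit rules (true for both grammars once S* is set aside); the function name, the dictionary encoding, and the spelling S1 for S_1 are assumptions for illustration.

```python
from functools import lru_cache

def count_parse_trees(rules, start, s):
    """rules: dict nonterminal -> list of RHS tuples.
    Symbols that are not keys of rules are treated as terminals."""
    @lru_cache(maxsize=None)
    def sym(x, i, j):
        # Number of parse trees rooted at x that yield s[i:j].
        if x not in rules:
            return 1 if j - i == 1 and s[i] == x else 0
        return sum(seq(rhs, i, j) for rhs in rules[x])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # The first symbol covers s[i:k]; every later symbol needs >= 1 char.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(s))

symmetric = {"S": [("S", "S"), ("(", "S", ")"), ("(", ")")]}
left_branching = {"S": [("S", "S1"), ("S1",)],
                  "S1": [("(", "S", ")"), ("(", ")")]}

print(count_parse_trees(symmetric, "S", "()()()"))
print(count_parse_trees(left_branching, "S", "()()()"))
```

With S → SS the three groups can be paired up from the left or from the right; forcing S → S S_1 leaves only the left-branching tree.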
25
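Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 1: Associativity
[Figure: two parse trees for the same three-id string with a repeated operator, one grouping to the left and one to the right.]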
26
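Arithmetic Expressions
E → E + E
E → E * E
E → (E)
E → id
Problem 2: Precedence
[Figure: two parse trees for a three-id string that mixes the two operators, differing in which operator is applied first.]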
27
Arithmetic Expressions - A Better Way
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id
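The layered E, T, F grammar fixes both precedence and associativity. The sketch below evaluates token lists by following that grammar, with one swapped-in technique stated plainly: the left-recursive rules E → E + T and T → T * F are read as loops (T followed by any number of "+ T", and likewise for F), which keeps the left-to-right grouping. Integer tokens stand in for id, and the function names are assumptions.

```python
def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():              # E -> E + T | T, read as T (+ T)*
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    def term():              # T -> T * F | F, read as F (* F)*
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def factor():            # F -> (E) | id
        nonlocal pos
        tok = peek()
        pos += 1
        if tok == "(":
            value = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return value
        if isinstance(tok, int):
            return tok
        raise ValueError(f"unexpected token {tok!r}")

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result

print(evaluate([3, "+", 5, "*", 7]))
print(evaluate(["(", 3, "+", 5, ")", "*", 7]))
```

Because * can only appear inside a T, it always binds tighter than +, and parentheses are the only way to override that.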
28
Ambiguous Attachment
The dangling else problem:
<stmt> ::= if <cond> then <stmt>
<stmt> ::= if <cond> then <stmt> else <stmt>
Consider: if cond_1 then if cond_2 then st_1 else st_2
29
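The Java Fix
[Grammar excerpt; the nonterminal names were lost in extraction. Surviving skeleton:]
… ::= … | … | …
… ::= … | … | …
… ::= if ( … ) … else …
… ::= if ( … ) … else …
[Figure: if (cond) … else …]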
30
Hidden: Going Too Far
S → NP VP
NP → the Nominal | Nominal | ProperNoun | NP PP
Nominal → N | Adjs N
N → cat | girl | dogs | ball | chocolate | bat
ProperNoun → Chris | Fluffy
Adjs → Adj Adjs | Adj
Adj → young | older | smart
VP → V | V NP | VP PP
V → like | likes | thinks | hits
PP → Prep NP
Prep → with
● Chris likes the girl with the cat.
● Chris shot the bear with a rifle.
31
● Chris likes the girl with the cat. ● Chris shot the bear with a rifle. Hidden: Going Too Far
32
Hidden: Comparing Regular and Context-Free Languages
Regular Languages | Context-Free Languages
● regular exprs. or regular grammars | ● context-free grammars
● recognize | ● parse
33
Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid objects, with the following two properties: ● For every element c of C, except possibly a finite set of special cases, there exists some element f of F such that f is equivalent to c with respect to some set of tasks. ● F is simpler than the original form in which the elements of C are written. By “simpler” we mean that at least some tasks are easier to perform on elements of F than they would be on elements of C.
34
Normal Form Examples
● Disjunctive normal form for database queries so that they can be entered in a query-by-example grid.
● Jordan normal form for a square matrix, in which the matrix is almost diagonal in the sense that its only non-zero entries lie on the diagonal and the superdiagonal.
● Various normal forms for grammars to support specific parsing techniques.
35
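Normal Forms for Grammars
Chomsky Normal Form, in which all rules are of one of the following two forms:
● X → a, where a ∈ Σ, or
● X → BC, where B and C are elements of V - Σ.
Advantages:
● Parsers can use binary trees.
● Exact length of derivations is known: a string of length n ≥ 1 takes n - 1 applications of rules X → BC and n applications of rules X → a, so 2n - 1 steps in all.
[Figure: a binary parse tree for aab, with S → AB at the root.]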
36
Normal Forms for Grammars
Greibach Normal Form, in which all rules are of the following form:
● X → aβ, where a ∈ Σ and β ∈ (V - Σ)*.
Advantages:
● Every derivation of a string s contains |s| rule applications.
● Greibach normal form grammars can easily be converted to pushdown automata with no ε-transitions. This is useful because such PDAs are guaranteed to halt.
37
Theorems: Normal Forms Exist
Theorem: Given a CFG G, there exists an equivalent Chomsky normal form grammar G_C such that L(G_C) = L(G) - {ε}.
Proof: The proof is by construction.
Theorem: Given a CFG G, there exists an equivalent Greibach normal form grammar G_G such that L(G_G) = L(G) - {ε}.
Proof: The proof is also by construction.
Details of Chomsky conversion are complex but straightforward; I leave them for you to read in Chapter 11 and/or in the next 16 slides. Details of Greibach conversion are more complex but still straightforward; I leave them for you to read in Appendix D if you wish (not req'd).
38
Hidden: Converting to a Normal Form 1. Apply some transformation to G to get rid of undesirable property 1. Show that the language generated by G is unchanged. 2. Apply another transformation to G to get rid of undesirable property 2. Show that the language generated by G is unchanged and that undesirable property 1 has not been reintroduced. 3. Continue until the grammar is in the desired form.
39
Hidden: Rule Substitution
X → aYc
Y → b
Y → ZZ
We can replace the X rule with the rules:
X → abc
X → aZZc
For example, X ⇒ aYc ⇒ aZZc becomes the single step X ⇒ aZZc.
40
Hidden: Rule Substitution
Theorem: Let G contain the rules X → αYβ and Y → γ_1 | γ_2 | … | γ_n. Replace X → αYβ by: X → αγ_1β, X → αγ_2β, …, X → αγ_nβ. The new grammar G' will be equivalent to G.
41
Hidden: Rule Substitution
Replace X → αYβ by: X → αγ_1β, X → αγ_2β, …, X → αγ_nβ.
Proof:
● Every string in L(G) is also in L(G'): If X → αYβ is not used, then use the same derivation. If it is used, then one derivation is:
S ⇒ … ⇒ δXφ ⇒ δαYβφ ⇒ δαγ_kβφ ⇒ … ⇒ w
Use this one instead:
S ⇒ … ⇒ δXφ ⇒ δαγ_kβφ ⇒ … ⇒ w
● Every string in L(G') is also in L(G): Every new rule can be simulated by old rules.
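Rule substitution is easy to sketch on the earlier example X → aYc, Y → b | ZZ. The function below is an illustration under an assumed encoding (rules as (lhs, rhs) pairs with tuple right-hand sides); its name and the explicit position argument are hypothetical.

```python
def substitute(rules, rule, i):
    """Replace `rule` by one new rule per expansion of the symbol at rhs[i]."""
    lhs, rhs = rule
    y = rhs[i]
    expansions = [r for l, r in rules if l == y]   # gamma_1 ... gamma_n
    kept = [r for r in rules if r != rule]
    new = [(lhs, rhs[:i] + gamma + rhs[i + 1:]) for gamma in expansions]
    return kept + new

grammar = [("X", ("a", "Y", "c")), ("Y", ("b",)), ("Y", ("Z", "Z"))]
new_grammar = substitute(grammar, grammar[0], 1)
for lhs, rhs in new_grammar:
    print(lhs, "->", "".join(rhs))
```

The Y rules stay in place, since Y may still be reachable from other rules in a larger grammar.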
42
Hidden: Convert to Chomsky Normal Form
1. Remove all ε-rules, using the algorithm removeEps.
2. Remove all unit productions (rules of the form A → B).
3. Remove all rules whose right-hand sides have length greater than 1 and include a terminal (e.g., A → aB or A → BaC).
4. Remove all rules whose right-hand sides have length greater than 2 (e.g., A → BCDE).
43
Hidden: Recap: Removing ε-Productions
Remove all ε-productions:
(1) If there is a rule P → αQβ and Q is nullable, then add the rule P → αβ.
(2) Delete all rules Q → ε.
44
Hidden: Removing ε-Productions
Example:
S → aA
A → B | CDC
B → ε
B → a
C → BD
D → b
D → ε
45
Hidden: Unit Productions
A unit production is a rule whose right-hand side consists of a single nonterminal symbol.
Example:
S → XY
X → A
A → B | a
B → b
Y → T
T → Y | c
46
Hidden: Removing Unit Productions
removeUnits(G) =
1. Let G' = G.
2. Until no unit productions remain in G' do:
   2.1 Choose some unit production X → Y.
   2.2 Remove it from G'.
   2.3 Consider only rules that still remain. For every rule Y → β, where β ∈ V*, do: add to G' the rule X → β unless it is a rule that has already been removed once.
3. Return G'.
After removing epsilon productions and unit productions, all rules whose right-hand sides have length 1 are in Chomsky Normal Form.
47
Hidden: Removing Unit Productions
Apply removeUnits (steps 1 to 3 above) to the example:
S → XY
X → A
A → B | a
B → b
Y → T
T → Y | c
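The removeUnits loop can be tried on this example directly. The sketch below assumes rules are (lhs, rhs) pairs with tuple right-hand sides and that every nonterminal appears on some left-hand side; picking the unit production in sorted order is an arbitrary choice made only so the run is repeatable.

```python
def remove_units(rules):
    rules = set(rules)
    nonterminals = {lhs for lhs, _ in rules}   # assumption: all appear as an lhs
    removed = set()                            # unit rules removed so far
    while True:
        units = sorted((l, r) for l, r in rules
                       if len(r) == 1 and r[0] in nonterminals)
        if not units:
            return rules
        unit = units[0]                        # 2.1 choose some X -> Y
        x, (y,) = unit
        rules.remove(unit)                     # 2.2 remove it
        removed.add(unit)
        for lhs, rhs in list(rules):           # 2.3 copy Y's remaining rules to X
            if lhs == y and (x, rhs) not in removed:
                rules.add((x, rhs))

G = {("S", ("X", "Y")), ("X", ("A",)), ("A", ("B",)), ("A", ("a",)),
     ("B", ("b",)), ("Y", ("T",)), ("T", ("Y",)), ("T", ("c",))}
unit_free = remove_units(G)
for lhs, rhs in sorted(unit_free):
    print(lhs, "->", "".join(rhs))
```

The `removed` set is what stops the Y → T, T → Y cycle from regenerating the same unit rules forever.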
48
Hidden: Mixed Rules
removeMixed(G) =
1. Let G′ = G.
2. Create a new nonterminal T_a for each terminal a in Σ.
3. Modify each rule whose right-hand side has length greater than 1 and that contains a terminal symbol by substituting T_a for each occurrence of the terminal a.
4. Add to G′, for each T_a, the rule T_a → a.
5. Return G′.
Example:
A → a
A → aB
A → BaC
A → BbC
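A short sketch of removeMixed on the example above, assuming rules are (lhs, rhs) pairs with tuple right-hand sides. The function name and the spelling "T_a" for the new nonterminals are illustrative choices.

```python
def remove_mixed(rules, terminals):
    new_rules = set()
    for lhs, rhs in rules:
        if len(rhs) > 1:
            # Swap each terminal a for its stand-in nonterminal T_a.
            rhs = tuple(f"T_{s}" if s in terminals else s for s in rhs)
        new_rules.add((lhs, rhs))
    for a in terminals:
        new_rules.add((f"T_{a}", (a,)))        # T_a -> a
    return new_rules

mixed = {("A", ("a",)), ("A", ("a", "B")),
         ("A", ("B", "a", "C")), ("A", ("B", "b", "C"))}
unmixed = remove_mixed(mixed, {"a", "b"})
for lhs, rhs in sorted(unmixed):
    print(lhs, "->", " ".join(rhs))
```

The rule A → a is left alone because a right-hand side of length 1 holding a terminal is already in Chomsky normal form.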
49
Hidden: Long Rules
removeLong(G) =
1. Let G′ = G.
2. For each rule r of the form A → N_1 N_2 N_3 N_4 … N_n, n > 2, create new nonterminals M_2, M_3, …, M_{n-1}.
3. Replace r with the rule A → N_1 M_2.
4. Add the rules: M_2 → N_2 M_3, M_3 → N_3 M_4, …, M_{n-1} → N_{n-1} N_n.
5. Return G′.
Example: A → BCDEF
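removeLong can be sketched as peeling one symbol at a time off the front of a long right-hand side. In this illustration the fresh nonterminals are named M1, M2, … from a single counter (so the numbering differs from the slide's M_2 … M_{n-1}), and the code assumes those names do not already occur in the grammar.

```python
from itertools import count

def remove_long(rules):
    fresh = count(1)                 # source of new nonterminal names
    new_rules = set()
    for lhs, rhs in rules:
        while len(rhs) > 2:
            m = f"M{next(fresh)}"
            new_rules.add((lhs, (rhs[0], m)))   # A -> N_1 M
            lhs, rhs = m, rhs[1:]               # continue with M -> N_2 ...
        new_rules.add((lhs, rhs))
    return new_rules

short = remove_long([("A", tuple("BCDEF"))])
for lhs, rhs in sorted(short):
    print(lhs, "->", " ".join(rhs))
```

A rule with n right-hand-side symbols turns into n - 1 rules of length 2, which keeps the grammar size linear in the original.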
50
Hidden: An Example
S → aACa
A → B | a
B → C | c
C → cC | ε
removeEps returns:
S → aACa | aAa | aCa | aa
A → B | a
B → C | c
C → cC | c
51
Hidden: An Example
Input (from removeEps):
S → aACa | aAa | aCa | aa
A → B | a
B → C | c
C → cC | c
Next we apply removeUnits:
Remove A → B. Add A → C | c.
Remove B → C. Add B → cC (B → c, already there).
Remove A → C. Add A → cC (A → c, already there).
So removeUnits returns:
S → aACa | aAa | aCa | aa
A → a | c | cC
B → c | cC
C → cC | c
52
Hidden: An Example
S → aACa | aAa | aCa | aa
A → a | c | cC
B → c | cC
C → cC | c
Next we apply removeMixed, which returns:
S → T_a A C T_a | T_a A T_a | T_a C T_a | T_a T_a
A → a | c | T_c C
B → c | T_c C
C → T_c C | c
T_a → a
T_c → c
53
Hidden: An Example
S → T_a A C T_a | T_a A T_a | T_a C T_a | T_a T_a
A → a | c | T_c C
B → c | T_c C
C → T_c C | c
T_a → a
T_c → c
Finally, we apply removeLong, which returns:
S → T_a S_1          S_1 → A S_2
S → T_a S_3          S_2 → C T_a
S → T_a S_4          S_3 → A T_a
S → T_a T_a          S_4 → C T_a
A → a | c | T_c C
B → c | T_c C
C → T_c C | c
T_a → a
T_c → c
54
The Price of Normal Forms
E → E + E
E → (E)
E → id
Converting to Chomsky normal form:
E → E E′
E′ → P E
E → L E″
E″ → E R
E → id
L → (
R → )
P → +
Conversion doesn't change weak generative capacity but it may change strong generative capacity.