Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence)

Slides:



Advertisements
Similar presentations
Parsing II : Top-down Parsing
Advertisements

Grammar vs Recursive Descent Parser
CPSC Compiler Tutorial 4 Midterm Review. Deterministic Finite Automata (DFA) Q: finite set of states Σ: finite set of “letters” (input alphabet)
Exercise: Balanced Parentheses
1 Parsing The scanner recognizes words The parser recognizes syntactic units Parser operations: Check and verify syntax based on specified syntax rules.
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
About Grammars CS 130 Theory of Computation HMU Textbook: Sec 7.1, 6.3, 5.4.
Prof. Busch - LSU1 Simplifications of Context-Free Grammars.
Lecture # 11 Grammar Problems.
CS5371 Theory of Computation
Courtesy Costas Buch - RPI1 Simplifications of Context-Free Grammars.
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule Substitute Equivalent grammar.
1 Compilers. 2 Compiler Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
CS 536 Spring Ambiguity Lecture 8. CS 536 Spring Announcement Reading Assignment –“Context-Free Grammars” (Sections 4.1, 4.2) Programming.
Normal forms for Context-Free Grammars
January 15, 2014CS21 Lecture 61 CS21 Decidability and Tractability Lecture 6 January 16, 2015.
Cs466(Prasad)L8Norm1 Normal Forms Chomsky Normal Form Griebach Normal Form.
Fall 2004COMP 3351 Context-Free Languages. Fall 2004COMP 3352 Regular Languages.
Parsing II : Top-down Parsing Lecture 7 CS 4318/5331 Apan Qasem Texas State University Spring 2015 *some slides adopted from Cooper and Torczon.
CPSC Compiler Tutorial 3 Parser. Parsing The syntax of most programming languages can be specified by a Context-free Grammar (CGF) Parsing: Given.
INHERENT LIMITATIONS OF COMPUTER PROGRAMS CSci 4011.
CSE 413 Programming Languages & Implementation Hal Perkins Autumn 2012 Context-Free Grammars and Parsing 1.
EECS 6083 Intro to Parsing Context Free Grammars
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
Chapter 4 Context-Free Languages Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 5 Mälardalen University 2005.
CPSC 388 – Compiler Design and Construction Parsers – Context Free Grammars.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Context-Free Grammars Normal Forms Chapter 11. Normal Forms A normal form F for a set C of data objects is a form, i.e., a set of syntactically valid.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Introduction to Parsing Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Exercise 1 Consider a language with the following tokens and token classes: ident ::= letter (letter|digit)* LT ::= " " shiftL ::= " >" dot ::= "." LP.
PART I: overview material
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CYK Algorithm for Parsing General Context-Free Grammars
A Rule of While Language Syntax // Where things work very nicely for recursive descent! statmt ::= println ( stringConst, ident ) | ident = expr | if (
Exercise 1 A ::= B EOF B ::=  | B B | (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first.
Compiler verification for fun and profit a week from now, 8:55am in Salle Polyvalente Xavier LeroyInria Paris–Rocquencourt …Compilers are, however, vulnerable.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Chapter 3 Context-Free Grammars and Parsing. The Parsing Process sequence of tokens syntax tree parser Duties of parser: Determine correct syntax Build.
Overview of Previous Lesson(s) Over View  In our compiler model, the parser obtains a string of tokens from the lexical analyzer & verifies that the.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
1 Chapter 6 Simplification of CFGs and Normal Forms.
11 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 7 School of Innovation, Design and Engineering Mälardalen University 2012.
CYK Algorithm for Parsing General Context-Free Grammars.
Computing if a token can follow first(B 1... B p ) = {a  | B 1...B p ...  aw } follow(X) = {a  | S ... ...Xa... } There exists a derivation from.
Syntax Analyzer (Parser)
Copyright © 2006 The McGraw-Hill Companies, Inc. Programming Languages 2nd edition Tucker and Noonan Chapter 2 Syntax A language that is simple to parse.
Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start.
Exercises on Chomsky Normal Form and CYK parsing
Lecture # 10 Grammar Problems. Problems with grammar Ambiguity Left Recursion Left Factoring Removal of Useless Symbols These can create problems for.
About Grammars Hopcroft, Motawi, Ullman, Chap 7.1, 6.3, 5.4.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Syntax(1). 2 Syntax  The syntax of a programming language is a precise description of all its grammatically correct programs.  Levels of syntax Lexical.
Introduction to Parsing
Parsing & Context-Free Grammars
CS510 Compiler Lecture 4.
Syntax Specification and Analysis
Simplifications of Context-Free Grammars
Simplifications of Context-Free Grammars
Lecture 3: Introduction to Syntax (Cont’)
Lecture 7: Introduction to Parsing (Syntax Analysis)
Introduction to Parsing
Introduction to Parsing
Answer Questions about Exam2 problems
Faculty of Computer Science and Information System
Context-Free Languages
Presentation transcript:

Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence) and find unambiguous grammar for the same language. B ::=  | ( B ) | B B

Remark The same parse tree can be derived using two different derivations, e.g. B -> (B) -> (BB) -> ((B)B) -> ((B)) -> (()) B -> (B) -> (BB) -> ((B)B) -> (()B) -> (()) this correspond to different orders in which nodes in the tree are expanded Ambiguity refers to the fact that there are actually multiple parse trees, not just multiple derivations.

Towards Solution (Note that we must preserve precisely the set of strings that can be derived) This grammar: B ::=  | A A ::= ( ) | A A | (A) solves the problem with multiple  symbols generating different trees, but it is still ambiguous: string ( ) ( ) ( ) has two different parse trees

Solution Proposed solution: B ::=  | B (B) this is very smart! How to come up with it? Clearly, rule B::= B B generates any sequence of B's. We can also encode it like this: B ::= C* C ::= (B) Now we express sequence using recursive rule that does not create ambiguity: B ::=  | C B C ::= (B) but now, look, we "inline" C back into the rules for so we get exactly the rule B ::=  | B (B) This grammar is not ambiguous and is the solution. We did not prove this fact (we only tried to find ambiguous trees but did not find any).

Exercise 2: Dangling Else The dangling-else problem happens when the conditional statements are parsed using the following grammar. S ::= S ; S S ::= id := E S ::= if E then S S ::= if E then S else S Find an unambiguous grammar that accepts the same conditional statements and matches the else statement with the nearest unmatched if.

Discussion of Dangling Else if (x > 0) then if (y > 0) then z = x + y else x = - x This is a real problem languages like C, Java –resolved by saying else binds to innermost if Can we design grammar that allows all programs as before, but only allows parse trees where else binds to innermost if?

Sources of Ambiguity in this Example Ambiguity arises in this grammar here due to: –dangling else –binary rule for sequence (;) as for parentheses –priority between if-then-else and semicolon (;) if (x > 0) if (y > 0) z = x + y; u = z + 1 // last assignment is not inside if Wrong parse tree -> wrong generated code

How we Solved It We identified a wrong tree and tried to refine the grammar to prevent it, by making a copy of the rules. Also, we changed some rules to disallow sequences inside if-then-else and make sequence rule non-ambiguous. The end result is something like this: S::=  |A S// a way to write S::=A* A ::= id := E A ::= if E then A A ::= if E then A' else A A' ::= id := E A' ::= if E then A' else A' At some point we had a useless rule, so we deleted it. We also looked at what a practical grammar would have to allow sequences inside if-then-else. It would add a case for blocks, like this: A ::= { S } A' ::= { S } We could factor out some common definitions (e.g. define A in terms of A'), but that is not important for this problem.

Transforming Grammars into Chomsky Normal Form Steps: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start nullable symbols) 4.remove single non-terminal productions X::=Y 5.transform productions w/ more than 3 on RHS 6.make terminals occur alone on right-hand side

1) Unproductive non-terminals What is funny about this grammar: stmt ::= identifier := identifier | while (expr) stmt | if (expr) stmt else stmt expr ::= term + term | term – term term ::= factor * factor factor ::= ( expr ) There is no derivation of a sequence of tokens from expr Why? In every step will have at least one expr, term, or factor If it cannot derive sequence of tokens we call it unproductive How to compute them?

stmt ::= identifier := identifier | while (expr) stmt | if (expr) stmt else stmt expr ::= term + term | term – term term ::= factor * factor factor ::= ( expr ) program ::= stmt | stmt program 1) Unproductive non-terminals Productive symbols are obtained using these two rules (what remains is unproductive) –Terminals (tokens) are productive –If X::= s 1 s 2 … s n is rule and each s i is productive then X is productive Delete unproductive symbols. Will the meaning of top-level symbol (program) change?

What is funny about this grammar with starting terminal ‘program’ program ::= stmt | stmt program stmt ::= assignment | whileStmt assignment ::= expr = expr ifStmt ::= if (expr) stmt else stmt whileStmt ::= while (expr) stmt expr ::= identifier 2) Unreachable non-terminals No way to reach symbol ‘ifStmt’ from ‘program’

2) Computing unreachable non-terminals What is the general algorithm? What is funny about this grammar with starting terminal ‘program’ program ::= stmt | stmt program stmt ::= assignment | whileStmt assignment ::= expr = expr ifStmt ::= if (expr) stmt else stmt whileStmt ::= while (expr) stmt expr ::= identifier

2) Unreachable non-terminals Reachable terminals are obtained using the following rules (the rest are unreachable) –starting non-terminal is reachable (program) –If X::= s 1 s 2 … s n is rule and X is reachable then each non-terminal among s 1 s 2 … s n is reachable Delete unreachable symbols. Will the meaning of top-level symbol (program) change?

3) Removing Empty Strings Ensure only top-level symbol can be nullable program ::= stmtSeq stmtSeq ::= stmt | stmt ; stmtSeq stmt ::= “” | assignment | whileStmt | blockStmt blockStmt ::= { stmtSeq } assignment ::= expr = expr whileStmt ::= while (expr) stmt expr ::= identifier How to do it in this example?

3) Removing Empty Strings - Result program ::=  | stmtSeq stmtSeq ::= stmt| stmt ; stmtSeq | | ; stmtSeq | stmt ; | ; stmt ::= assignment | whileStmt | blockStmt blockStmt ::= { stmtSeq } | { } assignment ::= expr = expr whileStmt ::= while (expr) stmt whileStmt ::= while (expr) expr ::= identifier

3) Removing Empty Strings - Algorithm Compute the set of nullable non-terminals Add extra rules –If X::= s 1 s 2 … s n is rule then add new rules of form X::= r 1 r 2 … r n where r i is either s i or, if Remove all empty right-hand sides If starting symbol S was nullable, then introduce a new start symbol S’ instead, and add rule S’ ::= S |  s i is nullable then r i can also be the empty string (so it disappears)

3) Removing Empty Strings Since stmtSeq is nullable, the rule blockStmt ::= { stmtSeq } gives blockStmt ::= { stmtSeq } | { } Since stmtSeq and stmt are nullable, the rule stmtSeq ::= stmt | stmt ; stmtSeq gives stmtSeq ::= stmt | stmt ; stmtSeq | ; stmtSeq | stmt ; | ;