Computing if a token can follow first(B 1... B p ) = {a  | B 1...B p ...  aw } follow(X) = {a  | S ... ...Xa... } There exists a derivation from.

Slides:



Advertisements
Similar presentations
Parsing 4 Dr William Harrison Fall 2008
Advertisements

Grammar vs Recursive Descent Parser
Exercise: Balanced Parentheses
Exercise 1: Balanced Parentheses Show that the following balanced parentheses grammar is ambiguous (by finding two parse trees for some input sequence)
Chap. 5, Top-Down Parsing J. H. Wang Mar. 29, 2011.
YANGYANG 1 Chap 5 LL(1) Parsing LL(1) left-to-right scanning leftmost derivation 1-token lookahead parser generator: Parsing becomes the easiest! Modifying.
Courtesy Costas Buch - RPI1 Simplifications of Context-Free Grammars.
By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
Parsing Compiler Baojian Hua Front End source code abstract syntax tree lexical analyzer parser tokens IR semantic analyzer.
Parsing III (Eliminating left recursion, recursive descent parsing)
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
Costas Busch - RPI1 Context-Free Languages. Costas Busch - RPI2 Regular Languages.
1 Simplifications of Context-Free Grammars. 2 A Substitution Rule Substitute Equivalent grammar.
1 Compilers. 2 Compiler Program v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,0 cmp v,5 jmplt ELSE THEN: add x, 12,v.
Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)
1 The Parser Its job: –Check and verify syntax based on specified syntax rules –Report errors –Build IR Good news –the process can be automated.
COS 320 Compilers David Walker. last time context free grammars (Appel 3.1) –terminals, non-terminals, rules –derivations & parse trees –ambiguous grammars.
Top-Down Parsing.
Fall 2004COMP 3351 Context-Free Languages. Fall 2004COMP 3352 Regular Languages.
CPSC 388 – Compiler Design and Construction
Efficiency in Parsing Arbitrary Grammars. Parsing using CYK Algorithm 1) Transform any grammar to Chomsky Form, in this order, to ensure: 1.terminals.
CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
1 Chapter 5 LL (1) Grammars and Parsers. 2 Naming of parsing techniques The way to parse token sequence L: Leftmost R: Righmost Top-down  LL Bottom-up.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 5 Mälardalen University 2010.
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CYK Algorithm for Parsing General Context-Free Grammars
Syntax and Semantics Structure of programming languages.
Top Down Parsing - Part I Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.
Exercise 1 A ::= B EOF B ::=  | B B | (B) Tokens: EOF, (, ) Generate constraints and compute nullable and first for this grammar. Check whether first.
Compiler verification for fun and profit a week from now, 8:55am in Salle Polyvalente Xavier LeroyInria Paris–Rocquencourt …Compilers are, however, vulnerable.
LL(k) Parsing Compiler Baojian Hua
Parsing Top-Down.
Compiler Construction 2011 CYK Algorithm for Parsing General Context-Free Grammars
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.
Top-down Parsing Recursive Descent & LL(1) Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
Bc. Jozef Lang (xlangj01) Bc. Zoltán Zemko (xzemko01) Increasing power of LL(k) parsers.
11 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 7 School of Innovation, Design and Engineering Mälardalen University 2012.
CYK Algorithm for Parsing General Context-Free Grammars.
Context-Free Languages. Regular Languages Context-Free Languages.
Top-Down Parsing.
CSE 5317/4305 L3: Parsing #11 Parsing #1 Leonidas Fegaras.
Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start.
Top-Down Predictive Parsing We will look at two different ways to implement a non- backtracking top-down parser called a predictive parser. A predictive.
Earley’s Algorithm J. Earley, "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970."An.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
Costas Busch - LSU1 Parsing. Costas Busch - LSU2 Compiler Program File v = 5; if (v>5) x = 12 + v; while (x !=3) { x = x - 3; v = 10; } Add v,v,5.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
Bernd Fischer RW713: Compiler and Software Language Engineering.
Parsing III (Top-down parsing: recursive descent & LL(1) )
Exercises on Chomsky Normal Form and CYK parsing
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
Compilers: Bottom-up/6 1 Compiler Structures Objective – –describe bottom-up (LR) parsing using shift- reduce and parse tables – –explain how LR.
Bottom-up parsing. Bottom-up parsing builds a parse tree from the leaves (terminals) to the start symbol int E T * TE+ T (4) (2) (3) (5) (1) int*+ E 
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.
G. Pullaiah College of Engineering and Technology
CS510 Compiler Lecture 4.
Simplifications of Context-Free Grammars
Top-Down Parsing CS 671 January 29, 2008.
Parsing Costas Busch - LSU.
5. Bottom-Up Parsing Chih-Hung Wang
Kanat Bolazar February 16, 2010
Presentation transcript:

Computing if a token can follow first(B 1... B p ) = {a  | B 1...B p ...  aw } follow(X) = {a  | S ... ...Xa... } There exists a derivation from the start symbol that produces a sequence of terminals and nonterminals of the form...Xa... (the token a follows the non-terminal X)

Rule for Computing Follow Given X ::= YZ(for reachable X) then first(Z)  follow(Y) and follow(X)  follow(Z) Now take care of nullable ones as well: For each ruleX ::= Y 1... Y p... Y q... Y r follow(Y p ) should contain: first(Y p+1 Y p+2...Y r ) also follow(X) if nullable(Y p+1 Y p+2 Y r ) S ::= Xa X ::= YZ Y ::= b Z ::= c S => Xa => YZa => Yba

Compute nullable, first, follow stmtList ::=  | stmt stmtList stmt ::= assign | block assign ::= ID = ID ; block ::= beginof ID stmtList ID ends Compute follow (for that we need nullable,first)

Conclusion of the Solution The grammar is not LL(1) because we have nullable(stmtList) first(stmt)  follow(stmtList) = {ID} If a recursive-descent parser sees ID, it does not know if it should –finish parsing stmtList or –parse another stmt

LL(1) Grammar - good for building recursive descent parsers Grammar is LL(1) if for each nonterminal X –first sets of different alternatives of X are disjoint –if nullable(X), first(X) must be disjoint from follow(X) For each LL(1) grammar we can build recursive-descent parser Each LL(1) grammar is unambiguous If a grammar is not LL(1), we can sometimes transform it into equivalent LL(1) grammar

Table for LL(1) Parser: Example S ::= B EOF (1) B ::=  | B (B) (1) (2) EOF() S{1} {} B{1}{1,2}{1} nullable: B first(S) = { ( } follow(S) = {} first(B) = { ( } follow(B) = { ), (, EOF } Parsing table: parse conflict - choice ambiguity: grammar not LL(1) empty entry: when parsing S, if we see ), report error 1 is in entry because ( is in follow(B) 2 is in entry because ( is in first(B(B))

Table for LL(1) Parsing Tells which alternative to take, given current token: choice : Nonterminal x Token -> Set[Int] A ::= (1) B 1... B p | (2) C 1... C q | (3) D 1... D r For example, when parsing A and seeing token t choice(A,t) = {2} means: parse alternative 2 (C 1... C q ) choice(A,t) = {1} means: parse alternative 3 (D 1... D r ) choice(A,t) = {} means: report syntax error choice(A,t) = {2,3} : not LL(1) grammar if t  first(C 1... C q ) add 2 to choice(A,t) if t  follow(A) add K to choice(A,t) where K is nullable alternative

Transform Grammar for LL(1) S ::= B EOF B ::=  | B (B) (1) (2) EOF() S{1} {} B{1}{1,2}{1} Transform the grammar so that parsing table has no conflicts. Old parsing table: conflict - choice ambiguity: grammar not LL(1) 1 is in entry because ( is in follow(B) 2 is in entry because ( is in first(B(B)) EOF() S B S ::= B EOF B ::=  | (B) B (1) (2) Left recursion is bad for LL(1) choice(A,t)

Parse Table is Code for Generic Parser var stack : Stack[GrammarSymbol] // terminal or non-terminal stack.push(EOF); stack.push(StartNonterminal); var lex = new Lexer(inputFile) while (true) { X = stack.pop t = lex.curent if (isTerminal(X)) if (t==X) if (X==EOF) return success else lex.next // eat token t else parseError("Expected " + X) else { // non-terminal cs = choice(X)(t) // look up parsing table cs match { // result is a set case {i} => { // exactly one choice rhs = p(X,i) // choose correct right-hand side stack.push(reverse(rhs)) } case {} => parseError("Parser expected an element of " + unionOfAll(choice(X))) case _ => crash(“parse table with conflicts - grammar was not LL(1)") } }

What if we cannot transform the grammar into LL(1)? 1) Redesign your language 2) Use a more powerful parsing technique

regular Languages semi-decidable decidable context-sensitive context-free unambiguous deterministic = LR(1) LL(1) LALR(1) SLR LR(0)

Remark: Grammars and Languages Language S is a set of words For each language S, there can be multiple possible grammars G such that S=L(G) Language S is –Non-ambiguous if there exists a non-ambiguous grammar for it –LL(1) if there is an LL(1) grammar for it Even if a language has ambiguous grammar, it can still be non-ambiguous if it also has a non-ambiguous grammar

Parsing General Grammars: Why Can be difficult or impossible to make grammar unambiguous Some inputs are more complex than simple programming languages –mathematical formulas: x = y /\ z? (x=y) /\ z x = (y /\ z) –future programming languages –natural language: I saw the man with the telescope.

Ambiguity I saw the man with the telescope. 1) 2)

CYK Parsing Algorithm C: John CockeJohn Cocke and Jacob T. Schwartz (1970). Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University.Courant Institute of Mathematical SciencesNew York University Y: Daniel H. Younger (1967). Recognition and parsing of context-free languages in time n 3. Information and Control 10(2): 189–208. K: T. KasamiT. Kasami (1965). An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL , Air Force Cambridge Research Lab, Bedford, MA.Bedford, MA

Two Steps in the Algorithm 1)Transform grammar to normal form called Chomsky Normal Form (Noam Chomsky, mathematical linguist) 2)Parse input using transformed grammar dynamic programming algorithm “a method for solving complex problems by breaking them down into simpler steps. It is applicable to problems exhibiting the properties of overlapping subproblems” (>WP)

Chomsky Normal Form Essentially, only binary rules Concretely, these kinds of rules: X ::= Y Zbinary rule X,Y,Z - non-terminals X ::= anon-terminal as a name for token S ::=  only for top-level symbol S

Balanced Parentheses Grammar Original grammar G S   | ( S ) | S S Modified grammar in Chomsky Normal Form: S   | S’ S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) Terminals: ( ) Nonterminals: S S’ N S) N ) N (

Idea How We Obtained the Grammar S  ( S ) S’  N ( N S) | N ( N ) N (  ( N S)  S’ N ) N )  ) Chomsky Normal Form transformation can be done fully mechanically

Transforming Grammars into Chomsky Normal Form Steps: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start nullable symbols) 4.remove single non-terminal productions X::=Y 5.transform productions w/ more than 3 on RHS 6.make terminals occur alone on right-hand side

4) Eliminating single productions Single production is of the form X ::=Y where X,Y are non-terminals program ::= stmtSeq stmtSeq ::= stmt | stmt ; stmtSeq stmt ::= assignment | whileStmt assignment ::= expr = expr whileStmt ::= while (expr) stmt

4) Eliminate single productions - Result Generalizes removal of epsilon transitions from non-deterministic automata program ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmtSeq ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmt ::= expr = expr | while (expr) stmt assignment ::= expr = expr whileStmt ::= while (expr) stmt

4) “Single Production Terminator” If there is single production X ::=Y put an edge (X,Y) into graph If there is a path from X to Z in the graph, and there is rule Z ::= s 1 s 2 … s n then add rule program ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmtSeq ::= expr = expr | while (expr) stmt | stmt ; stmtSeq stmt ::= expr = expr | while (expr) stmt X ::= s 1 s 2 … s n At the end, remove all single productions.

5) No more than 2 symbols on RHS stmt ::= while (expr) stmt becomes stmt ::= while stmt 1 stmt 1 ::= ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= ) stmt

6) A non-terminal for each terminal stmt ::= while (expr) stmt becomes stmt ::= N while stmt 1 stmt 1 ::= N ( stmt 2 stmt 2 ::= expr stmt 3 stmt 3 ::= N ) stmt N while ::= while N ( ::= ( N ) ::= )

Parsing using CYK Algorithm Transform grammar into Chomsky Form: 1.remove unproductive symbols 2.remove unreachable symbols 3.remove epsilons (no non-start nullable symbols) 4.remove single non-terminal productions X::=Y 5.transform productions of arity more than two 6.make terminals occur alone on right-hand side Have only rules X ::= Y Z, X ::= t, and possibly S ::= “” Apply CYK dynamic programming algorithm

Dynamic Programming to Parse Input Assume Chomsky Normal Form, 3 types of rules: S   | S’ (only for the start non-terminal) N j  t(names for terminals) N i  N j N k (just 2 non-terminals on RHS) Decomposing long input: find all ways to parse substrings of length 1,2,3,… ((()())())(()) NiNi NjNj NkNk

Parsing an Input S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) N(N( N(N( N)N) N(N( N)N) N(N( N)N) N)N) ambiguity (()()())

Algorithm Idea S’  S’ S’ w pq – substring from p to q d pq – all non-terminals that could expand to w pq Initially d pp has N w(p,p) key step of the algorithm: if X  Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q-p) w pq – substring from p to q d pq – all non-terminals that could expand to w pq Initially d pp has N w(p,p) key step of the algorithm: if X  Y Z is a rule, Y is in d p r, and Z is in d (r+1)q then put X into d pq (p r < q), in increasing value of (q-p) N(N( N(N( N)N) N(N( N)N) N(N( N)N) N)N) (()()())

Algorithm INPUT: grammar G in Chomsky normal form word w to parse using G OUTPUT: true iff (w in L(G))true N = |w| varvar d : Array[N][N] for p = 1 to N { d(p)(p) = {X | G contains X->w(p)} for q in {p N} d(p)(q) = {} }for for k = 2 to N // substring length for p = 0 to N-k // initial position for j = 1 to k-1 // length of first halffor val r = p+j-1; val q = p+k-1; for (X::=Y Z) in G if Y in d(p)(r) and Z in d(r+1)(q) d(p)(q) = d(p)(q) union {X} return S in d(0)(N-1)forif return (()()()) What is the running time as a function of grammar size and the size of input? O( ) What is the running time as a function of grammar size and the size of input? O( )

Parsing another Input S’  N ( N S) | N ( N ) | S’ S’ N S)  S’ N ) N (  ( N )  ) ()()()() N(N( N)N) N(N( N)N) N(N( N)N) N(N( N)N)

Number of Parse Trees Let w denote word ()()() –it has two parse trees Give a lower bound on number of parse trees of the word w n (n is positive integer) w 5 is the word ()()() ()()() ()()() ()()() ()()() CYK represents all parse trees compactly –can re-run algorithm to extract first parse tree, or enumerate parse trees one by one

Earley’s Algorithm also parser arbitrary grammars J. Earley, "An efficient context-free parsing algorithm", Communications of the Association for Computing Machinery, 13:2:94-102, 1970."An efficient context-free parsing algorithm"

Z ::= X YZ parses w pq CYK: if d pr parses X and d (r+1)q parses Y, then in d pq stores symbol Z Earley’s parser: in set S q stores item (Z ::= XY., p) Move forward, similar to top-down parsers Use dotted rules to avoid binary rules CYK vs Earley’s Parser Comparison (()()())

Dotted Rules Like Nonterminals X ::= Y 1 Y 2 Y 3 Chomsky transformation is (a simplification of) this: X ::= W 123 W 123 ::= W 12 Y 3 W 12 ::= W 1 Y 2 W 1 ::= W  Y 1 W  ::=  Early parser: dotted RHS as names of fresh non-terminals: X ::= (Y 1 Y 2 Y 3.) (Y 1 Y 2 Y 3.) ::= (Y 1 Y 2.Y 3 ) Y 3 (Y 1 Y 2.Y 3 ) ::= (Y 1.Y 2 Y 3 ) Y 2 (Y 1.Y 2 Y 3 ) ::= (.Y 1 Y 2 Y 3 ) Y 3 (.Y 1 Y 2 Y 3 ) ::= 

Example: expressions D ::= e EOF e ::= ID | e – e | e == e Rules with a dot inside D ::=. e EOF | e. EOF | e EOF. e ::=. ID | ID. |. e – e | e. – e | e –. e | e – e. |. e == e | e. == e | e ==. e | e == e.

ID- ==IDEOF IDID-ID-IDID-ID==ID-ID==ID ID --ID-ID==-ID==ID - IDID==ID==ID ID ====ID == ID EOF e ::=. ID | ID. |. e – e | e. – e | e –. e | e – e. |. e == e | e. == e | e ==. e | e == e. S ::=. e EOF | e. EOF | e EOF.

ID- ==IDEOF IDID-ID-IDID-ID==ID-ID==ID ID --ID-ID==-ID==ID - IDID==ID==ID ID ====ID == ID EOF e ::=. ID | ID. |. e – e | e. – e | e –. e | e – e. |. e == e | e. == e | e ==. e | e == e. S ::=. e EOF | e. EOF | e EOF.