Chapter 4 Chang Chi-Chung 2007.5.17.

The Role of the Parser
[Diagram: Source Program → Lexical Analyzer → (token / getNextToken) → Parser → parse tree → Rest of Front End → intermediate representation; both the Lexical Analyzer and the Parser consult the Symbol Table.]

The Types of Parsers for Grammars
Universal (any CFG): Cocke-Younger-Kasami, Earley.
Top-down (CFG with restrictions): build parse trees from the root to the leaves; recursive descent (predictive parsing); LL (Left-to-right scan, Leftmost derivation) methods.
Bottom-up (CFG with restrictions): build parse trees from the leaves to the root; operator-precedence parsing; LR (Left-to-right scan, Rightmost derivation) methods: SLR, canonical LR, LALR.

Representative Grammars
Non-left-recursive: E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id
Left-recursive: E → E + T | T, T → T * F | F, F → ( E ) | id
Ambiguous: E → E + E | E * E | ( E ) | id

Error Handling
A good compiler should assist in identifying and locating errors.
Lexical errors: important; the compiler can easily recover and continue.
Syntax errors: most important for the compiler; it can almost always recover.
Static semantic errors: important; the compiler can sometimes recover.
Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are required.
Logical errors: hard or impossible to detect.

Error Recovery Strategies
Panic mode: discard input until a token in a set of designated synchronizing tokens is found.
Phrase-level recovery: perform local correction on the input to repair the error.
Error productions: augment the grammar with productions for erroneous constructs.
Global correction: choose a minimal sequence of changes to obtain a global least-cost correction.

Context-Free Grammar
A context-free grammar is a 4-tuple G = (T, N, P, S) where
T is a finite set of tokens (terminal symbols),
N is a finite set of nonterminals,
P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N ∪ T)*,
S ∈ N is a designated start symbol.
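The 4-tuple above can be written down directly as plain data. A minimal sketch in Python, assuming an illustrative encoding (a dict of alternative right-hand sides for P); the variable names mirror the definition but are otherwise a free choice:

```python
T = {"+", "*", "(", ")", "id"}          # terminal symbols
N = {"E", "T", "F"}                     # nonterminals
P = {                                   # productions A -> alpha
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
S = "E"                                 # designated start symbol

# sanity checks: S is a nonterminal, and every right-hand-side
# symbol is drawn from N ∪ T
assert S in N
assert all(x in N | T for alts in P.values() for rhs in alts for x in rhs)
```

This is the left-recursive expression grammar used later in the chapter; the same encoding is reused in the sketches below.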

Notational Conventions
Terminals: a, b, c, … ∈ T; examples: 0, 1, +, *, id, if.
Nonterminals: A, B, C, … ∈ N; examples: expr, term, stmt.
Grammar symbols: X, Y, Z ∈ (N ∪ T).
Strings of terminals: u, v, w, x, y, z ∈ T*.
Strings of grammar symbols (sentential forms): α, β, γ ∈ (N ∪ T)*.
The head of the first production is the start symbol, unless stated otherwise.

Derivations
The one-step derivation ⇒ is defined by α A β ⇒ α γ β, where A → γ is a production in the grammar.
In addition, we define:
⇒ is leftmost (⇒lm) if α does not contain a nonterminal.
⇒ is rightmost (⇒rm) if β does not contain a nonterminal.
Transitive closure: ⇒* (zero or more steps).
Positive closure: ⇒+ (one or more steps).

Sentential Forms, Sentences, and Languages
If S ⇒* α in the grammar G, then α is a sentential form of G.
A sentence of G is a sentential form with no nonterminals.
The language generated by G is its set of sentences: L(G) = { w ∈ T* | S ⇒* w }.
A language that can be generated by a grammar is said to be a context-free language.
If two grammars generate the same language, the grammars are said to be equivalent.

Example
G = (T, N, P, S) with
T = { +, *, (, ), -, id }
N = { E }
P = { E → E + E, E → E * E, E → ( E ), E → - E, E → id }
S = E
A leftmost derivation: E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Ambiguity
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
Example: id + id * id with the grammar E → E + E | E * E | ( E ) | id has two leftmost derivations:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Grammar Classification
A grammar G is said to be:
Regular (Type 3): right-linear, A → a B or A → a; or left-linear, A → B a or A → a.
Context-free (Type 2): A → α, where A ∈ N and α ∈ (N ∪ T)*.
Context-sensitive (Type 1): α A β → α γ β, where A ∈ N, α, β, γ ∈ (N ∪ T)*, |γ| > 0.
Unrestricted (Type 0): α → β, where α, β ∈ (N ∪ T)*, α ≠ ε.

Language Classification
The set of all languages generated by grammars G of type T is L(T) = { L(G) | G is of type T }, and
L(regular) ⊂ L(context-free) ⊂ L(context-sensitive) ⊂ L(unrestricted)

Example
The regular expression ( a | b )* a b b corresponds to the right-linear grammar
A0 → a A0 | b A0 | a A1
A1 → b A2
A2 → b A3
A3 → ε

Example
L = { a^n b^n | n ≥ 1 } is context-free but not regular.
[Figure: a DFA D with start state S0; a path labeled a^i reaches state Si, and since D has finitely many states, some longer path labeled a^j (j ≠ i) reaches the same state, so a^j b^i is accepted by D although a^j b^i is not in L.]
Finite automata cannot count. A CFG can count two items, but not three.

Writing a Predictive Parsing Grammar
Eliminate ambiguity.
Eliminate left recursion.
Left factoring (eliminate common left factors).
Compute FIRST and FOLLOW.
Two variants: recursive (recursive calls) and non-recursive (table-driven).

Eliminating Ambiguity
Dangling-else grammar:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Example statement: if E1 then S1 else if E2 then S2 else S3

Eliminating Ambiguity(2)
The string if E1 then if E2 then S1 else S2 has two parse trees under this grammar: the else can attach to either if.

Eliminating Ambiguity(3)
Rewrite the dangling-else grammar:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt

Elimination of Left Recursion
Productions of the form A → A α | β are left recursive. They can be replaced by the non-left-recursive
A → β A'
A' → α A' | ε
When one of the productions in a grammar is left recursive, a predictive parser loops forever on certain inputs.

Immediate Left-Recursion Elimination
Group the A-productions as
A → A α1 | A α2 | … | A αm | β1 | β2 | … | βn
where no βi begins with an A. Replace the A-productions by
A → β1 A' | β2 A' | … | βn A'
A' → α1 A' | α2 A' | … | αm A' | ε
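The grouping-and-rewriting step above is mechanical enough to automate. A sketch in Python, assuming an illustrative grammar encoding (a dict of alternative right-hand sides, [] standing for ε, and A' spelled by appending an apostrophe):

```python
def eliminate_immediate_left_recursion(a, prods):
    """Split the A-productions into A -> A alpha_i (left-recursive)
    and A -> beta_j, then emit A -> beta_j A' and A' -> alpha_i A' | eps."""
    a_new = a + "'"
    alphas = [rhs[1:] for rhs in prods[a] if rhs and rhs[0] == a]
    betas = [rhs for rhs in prods[a] if not (rhs and rhs[0] == a)]
    if not alphas:                       # no immediate left recursion
        return dict(prods)
    out = dict(prods)
    out[a] = [beta + [a_new] for beta in betas]
    out[a_new] = [alpha + [a_new] for alpha in alphas] + [[]]
    return out

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
out = eliminate_immediate_left_recursion("E", {"E": [["E", "+", "T"], ["T"]]})
```

The transformation preserves the generated language while making the grammar suitable for top-down parsing.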

Example
The left-recursive grammar
A → A α | β | γ | A δ
is transformed into the right-recursive
A → β AR | γ AR
AR → α AR | δ AR | ε

Non-Immediate Left Recursion
The grammar
S → A a | b
A → A c | S d | ε
The nonterminal S is left recursive, because S ⇒ A a ⇒ S d a, but S is not immediately left recursive.

Elimination of Left Recursion
Eliminating left recursion, in general: arrange the nonterminals in some order A1, A2, …, An.
for (each i from 1 to n) {
  for (each j from 1 to i-1) {
    replace each production Ai → Aj γ with Ai → δ1 γ | δ2 γ | … | δk γ, where Aj → δ1 | δ2 | … | δk are the current Aj-productions
  }
  eliminate the immediate left recursion in the Ai-productions
}

Example
A → B C | a
B → C A | A b
C → A B | C C | a
i = 1: nothing to do.
i = 2, j = 1: B → C A | A b  becomes  B → C A | B C b | a b
  (immediate) B → C A BR | a b BR;  BR → C b BR | ε
i = 3, j = 1: C → A B | C C | a  becomes  C → B C B | a B | C C | a
i = 3, j = 2: C → C A BR C B | a b BR C B | a B | C C | a
  (immediate) C → a b BR C B CR | a B CR | a CR;  CR → A BR C B CR | C CR | ε

Exercise
Eliminate left recursion from the grammar
S → A a | b
A → A c | S d | ε
Answer: substituting S into A → S d gives A → A c | A a d | b d | ε; eliminating the immediate left recursion then yields
A → b d A' | A'
A' → c A' | a d A' | ε

Left Factoring
Left factoring is a grammar transformation useful for predictive (top-down) parsing.
Replace productions
A → α β1 | α β2 | … | α βn | γ
with
A → α AR | γ
AR → β1 | β2 | … | βn
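A single left-factoring step can also be sketched in code. The version below, in Python, uses the same illustrative grammar encoding as earlier ([] standing for ε) and only factors alternatives that share their first symbol; the fresh-nonterminal naming (appending an apostrophe) is an assumption, not from the slides:

```python
from collections import defaultdict

def left_factor(a, alts):
    """One left-factoring step: group alternatives by first symbol,
    pull out the group's longest common prefix into a fresh nonterminal."""
    groups = defaultdict(list)
    for rhs in alts:
        groups[rhs[0] if rhs else None].append(rhs)
    new_prods, fresh = {a: []}, 0
    for head, group in groups.items():
        if head is None or len(group) == 1:
            new_prods[a].extend(group)   # nothing shared: keep as-is
            continue
        prefix = []                      # longest common prefix of the group
        for syms in zip(*group):
            if len(set(syms)) == 1:
                prefix.append(syms[0])
            else:
                break
        fresh += 1
        a2 = a + "'" * fresh
        new_prods[a].append(prefix + [a2])
        new_prods[a2] = [rhs[len(prefix):] for rhs in group]  # [] plays eps
    return new_prods

# stmt -> if E then S | if E then S else S   becomes
# stmt -> if E then S stmt',  stmt' -> eps | else S
out = left_factor("stmt",
                  [["if", "E", "then", "S"],
                   ["if", "E", "then", "S", "else", "S"]])
```

A full left-factoring pass would repeat this step until no alternatives share a prefix.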

Example
The grammar
stmt → if expr then stmt | if expr then stmt else stmt
is replaced with
stmt → if expr then stmt stmts
stmts → else stmt | ε

Exercise
Left-factor the following grammar:
S → i E t S | i E t S e S | a
E → b
Answer:
S → i E t S S' | a
S' → e S | ε

Non-Context-Free Grammar Constructs
A few syntactic constructs found in typical programming languages cannot be specified using grammars alone, for example checking that identifiers are declared before they are used in a program. The abstract language is L1 = { wcw | w is in (a|b)* }; aabcaab is in L1, and L1 is not context-free. C/C++ and Java do not distinguish among identifiers that are different character strings: all identifiers are represented by a token such as id in the grammar, and the semantic-analysis phase checks that identifiers are declared before they are used.

Top-Down Parsing
LL methods and recursive-descent parsing: Left-to-right scan, Leftmost derivation, creating the nodes of the parse tree in preorder (depth-first).
Grammar:
E → T + T
T → ( E )
T → - E
T → id
Leftmost derivation: E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: snapshots of the parse tree growing from the root E downward, one production at a time.]

Recursive-Descent Parsing
Given a grammar G:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
LL recursive-descent parsing without backtracking is called predictive parsing.

FIRST and FOLLOW
[Figure: in a derivation S ⇒* α A γ with A ⇒* c β, the terminal c is in FIRST(A); in a derivation S ⇒* α A a β, the terminal a is in FOLLOW(A).]

FIRST and FOLLOW
The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G. During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply. During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.

FIRST FIRST() The set of terminals that begin all strings derived from  FIRST(a) = { a } if a  T FIRST() = {  } FIRST(A) = A FIRST () for A  P FIRST(X1X2…Xk) = if   FIRST (Xj) for all j = 1, …, i-1 then add non- in FIRST(Xi) to FIRST(X1X2…Xk) if   FIRST (Xj) for all j = 1, …, k then add  to FIRST (X1X2…Xk)

FIRST(1)
By the definition of FIRST, we can compute FIRST(X) as follows:
If X ∈ T, then FIRST(X) = { X }.
If X ∈ N and X → ε, then add ε to FIRST(X).
If X ∈ N and X → Y1 Y2 … Yn, then add all non-ε elements of FIRST(Y1) to FIRST(X); if ε ∈ FIRST(Y1), add all non-ε elements of FIRST(Y2) to FIRST(X); …; if ε ∈ FIRST(Yn), add ε to FIRST(X).
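These rules can be iterated to a fixed point. A sketch in Python against the chapter's expression grammar, with the string "eps" standing for ε (an illustrative encoding choice):

```python
EPS = "eps"                              # stands for the empty string
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],        # [] encodes the eps-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
TERMS = {"+", "*", "(", ")", "id"}

def compute_first(g, terms):
    """Apply the FIRST rules above until no set grows any further."""
    first = {t: {t} for t in terms}      # rule 1: FIRST(a) = {a}
    first.update({n: set() for n in g})
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for rhs in alts:
                n0 = len(first[a])
                for x in rhs:            # rule 3: walk Y1 Y2 ... Yn
                    first[a] |= first[x] - {EPS}
                    if EPS not in first[x]:
                        break
                else:                    # every Yi nullable (or rhs empty)
                    first[a].add(EPS)    # rule 2
                changed = changed or len(first[a]) != n0
    return first

first = compute_first(G, TERMS)
```

On this grammar the result matches the FIRST sets tabulated later in the chapter.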

FOLLOW
FOLLOW(A) is the set of terminals that can immediately follow nonterminal A:
for all productions B → α A β in P, add FIRST(β) − {ε} to FOLLOW(A)
for all productions B → α A β in P with ε ∈ FIRST(β), add FOLLOW(B) to FOLLOW(A)
for all productions B → α A in P, add FOLLOW(B) to FOLLOW(A)
if A is the start symbol S, then add $ to FOLLOW(A)

FOLLOW(1)
By the definition of FOLLOW, we can compute FOLLOW(X) as follows:
Put $ into FOLLOW(S).
For each A → α B β, add all non-ε elements of FIRST(β) to FOLLOW(B).
For each A → α B, or A → α B β where ε ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
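FOLLOW can be computed the same way, by iterating the three rules to a fixed point. A sketch in Python; the compact FIRST computation is repeated here so the block is self-contained, and "eps" / "$" are illustrative encodings of ε and the end marker:

```python
EPS = "eps"
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
TERMS, START = {"+", "*", "(", ")", "id"}, "E"

def first_sets(g, terms):
    first = {t: {t} for t in terms}
    first.update({n: set() for n in g})
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for rhs in alts:
                n0 = len(first[a])
                for x in rhs:
                    first[a] |= first[x] - {EPS}
                    if EPS not in first[x]:
                        break
                else:
                    first[a].add(EPS)
                changed = changed or len(first[a]) != n0
    return first

def first_of_string(seq, first):
    out = set()
    for x in seq:
        out |= first[x] - {EPS}
        if EPS not in first[x]:
            return out
    return out | {EPS}

def follow_sets(g, first, start):
    follow = {n: set() for n in g}
    follow[start].add("$")               # rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for rhs in alts:
                for i, b in enumerate(rhs):
                    if b not in g:       # FOLLOW is defined for nonterminals
                        continue
                    tail = first_of_string(rhs[i + 1:], first)
                    add = tail - {EPS}                       # rule 2
                    if EPS in tail:
                        add |= follow[a]                     # rule 3
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

follow = follow_sets(G, first_sets(G, TERMS), START)
```

The result reproduces the FOLLOW sets in the example that follows.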

Example
For the grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }

Recursive Descent Parsing
Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal's syntactic category of input tokens. When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information.

Procedure in Recursive-Descent Parsing
void A() {
  Choose an A-production A → X1 X2 … Xk;
  for (i = 1 to k) {
    if (Xi is a nonterminal)
      call procedure Xi();
    else if (Xi equals the current input symbol a)
      advance the input to the next symbol;
    else
      /* an error has occurred */
  }
}
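Instantiating that template for the chapter's expression grammar gives a complete parser. A sketch in Python (class and method names are illustrative): each nonterminal becomes one method, and the ε-branches check the lookahead against the nonterminal's FOLLOW set:

```python
class ParseError(Exception):
    pass

class Parser:
    """Recursive-descent parser for E -> T E', E' -> + T E' | eps,
    T -> F T', T' -> * F T' | eps, F -> ( E ) | id."""

    def __init__(self, tokens):
        self.toks = tokens + ["$"]       # "$" marks end of input
        self.pos = 0

    def look(self):
        return self.toks[self.pos]

    def match(self, t):
        if self.look() != t:
            raise ParseError(f"expected {t}, got {self.look()}")
        self.pos += 1

    def E(self):
        self.T(); self.Ep()

    def Ep(self):                        # E' -> + T E' | eps
        if self.look() == "+":
            self.match("+"); self.T(); self.Ep()
        elif self.look() not in (")", "$"):   # FOLLOW(E') = { ), $ }
            raise ParseError(f"unexpected {self.look()}")

    def T(self):
        self.F(); self.Tp()

    def Tp(self):                        # T' -> * F T' | eps
        if self.look() == "*":
            self.match("*"); self.F(); self.Tp()
        elif self.look() not in ("+", ")", "$"):  # FOLLOW(T')
            raise ParseError(f"unexpected {self.look()}")

    def F(self):                         # F -> ( E ) | id
        if self.look() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

    def parse(self):
        self.E(); self.match("$")
        return True
```

No backtracking is ever needed: the grammar is LL(1), so one lookahead token always selects the branch.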

Using FIRST and FOLLOW to Write a Recursive Descent Parser
Grammar:
expr → term rest
rest → + term rest | - term rest | ε
term → id
FIRST(+ term rest) = { + }, FIRST(- term rest) = { - }, FOLLOW(rest) = { $ }
rest() {
  if (lookahead in FIRST(+ term rest)) { match('+'); term(); rest(); }
  else if (lookahead in FIRST(- term rest)) { match('-'); term(); rest(); }
  else if (lookahead in FOLLOW(rest)) return;
  else error();
}

LL(1)

LL(1) Grammars
Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
The first "L" means the input is scanned from left to right.
The second "L" means a leftmost derivation is produced.
The "1" means one input symbol of lookahead is used at each step to make parsing-action decisions.
An LL(1) grammar is neither left-recursive nor ambiguous.

LL(1)
A grammar G is LL(1) if it is not left recursive and, for each collection of productions A → α1 | α2 | … | αn for nonterminal A, the following hold:
FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j. (What happens if the intersection is not empty?)
If αi ⇒* ε, then αj cannot also derive ε, for all j ≠ i.
If αi ⇒* ε, then FIRST(αj) ∩ FOLLOW(A) = ∅ for all j ≠ i.

Example
Grammar / Not LL(1) because:
S → S a | a — left recursive.
S → a S | a — FIRST(a S) ∩ FIRST(a) = { a } ≠ ∅.
S → a R | ε; R → S | ε — for R, both alternatives derive ε: S ⇒* ε and ε ⇒* ε.
S → a R a; R → S | ε — for R, FIRST(S) ∩ FOLLOW(R) ≠ ∅.

Non-Recursive Predictive Parsing
Table-driven parsing: given an LL(1) grammar G = (N, T, P, S), construct a table M[A, a] for A ∈ N, a ∈ T, and use a driver program with a stack.
[Figure: input buffer "a + b $", a stack holding "X Y Z $", the predictive parsing program (driver), the parsing table M, and the output.]

Predictive Parsing Table Algorithm
For each production A → α in the grammar:
  For each terminal a in FIRST(α), add A → α to M[A, a].
  If ε ∈ FIRST(α), add A → α to M[A, b] for each symbol b in FOLLOW(A) (including $ when $ ∈ FOLLOW(A)).
Each undefined entry of M is an error.
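The table construction can be written directly from FIRST and FOLLOW. A self-contained sketch in Python (same illustrative encoding as in the earlier sketches: "eps" for ε, [] for an ε-production); a duplicate entry signals that the grammar is not LL(1):

```python
EPS = "eps"
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
TERMS, START = {"+", "*", "(", ")", "id"}, "E"

def first_sets(g, terms):
    first = {t: {t} for t in terms}
    first.update({n: set() for n in g})
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for rhs in alts:
                n0 = len(first[a])
                for x in rhs:
                    first[a] |= first[x] - {EPS}
                    if EPS not in first[x]:
                        break
                else:
                    first[a].add(EPS)
                changed = changed or len(first[a]) != n0
    return first

def str_first(seq, first):
    out = set()
    for x in seq:
        out |= first[x] - {EPS}
        if EPS not in first[x]:
            return out
    return out | {EPS}

def follow_sets(g, first, start):
    follow = {n: set() for n in g}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, alts in g.items():
            for rhs in alts:
                for i, b in enumerate(rhs):
                    if b not in g:
                        continue
                    tail = str_first(rhs[i + 1:], first)
                    add = (tail - {EPS}) | (follow[a] if EPS in tail else set())
                    if not add <= follow[b]:
                        follow[b] |= add
                        changed = True
    return follow

def build_ll1_table(g, terms, start):
    """For A -> alpha: enter the production under each a in FIRST(alpha);
    if alpha can derive eps, also under each b in FOLLOW(A)."""
    first = first_sets(g, terms)
    follow = follow_sets(g, first, start)
    m = {}
    for a, alts in g.items():
        for rhs in alts:
            f = str_first(rhs, first)
            keys = (f - {EPS}) | (follow[a] if EPS in f else set())
            for t in keys:
                if (a, t) in m:          # conflict: grammar is not LL(1)
                    raise ValueError(f"not LL(1): two entries for M[{a}, {t}]")
                m[a, t] = rhs
    return m

M = build_ll1_table(G, TERMS, START)
```

Running it on the expression grammar reproduces the parsing table shown in the example that follows.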

Example
For the grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

A → α          FIRST(α)   FOLLOW(A)
E → T E'       ( id       $ )
E' → + T E'    +          $ )
E' → ε         ε          $ )
T → F T'       ( id       + $ )
T' → * F T'    *          + $ )
T' → ε         ε          + $ )
F → ( E )      (          * + $ )
F → id         id         * + $ )

Parsing table M:
       id          +             *             (           )         $
E    E → T E'                              E → T E'
E'               E' → + T E'                           E' → ε    E' → ε
T    T → F T'                              T → F T'
T'               T' → ε       T' → * F T'              T' → ε    T' → ε
F    F → id                                F → ( E )

Exercise
Given the grammar G below:
S → i E t S S' | a
S' → e S | ε
E → b
Calculate FIRST and FOLLOW, and create a predictive parsing table.

Answer
For the ambiguous grammar
S → i E t S S' | a
S' → e S | ε
E → b

A → α             FIRST(α)   FOLLOW(A)
S → i E t S S'    i          e $
S → a             a          e $
S' → e S          e          e $
S' → ε            ε          e $
E → b             b          t

Parsing table (error: duplicate table entry at M[S', e], so the grammar is not LL(1)):
       a        b        e                     i                 t    $
S    S → a                              S → i E t S S'
S'                    S' → ε, S' → e S                               S' → ε
E             E → b

Predictive Parsing Algorithm
Initially, w$ is in the input buffer and S is on top of the stack.
set ip to point to the first symbol of w;
set X to the top stack symbol;
while (X ≠ $) {
  let a be the symbol pointed to by ip;
  if (X = a) { pop the stack and advance ip; }
  else if (X is a terminal) error();
  else if (M[X, a] is an error entry) error();
  else if (M[X, a] = X → Y1 Y2 … Yk) {
    output the production X → Y1 Y2 … Yk;
    pop the stack;
    push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
  }
  set X to the top stack symbol;
}
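The loop above can be exercised directly. A sketch in Python: the table M hardcodes the entries built for the example grammar ([] standing for ε), and the driver returns the sequence of productions output, i.e. the leftmost derivation:

```python
# parsing table M[A, a] = right-hand side, from the earlier example
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens):
    """Stack-based driver: match terminals, expand nonterminals via M,
    stop when both the stack and the input reach $."""
    stack = ["$", "E"]                   # start symbol on top of $
    toks = tokens + ["$"]
    i, output = 0, []
    while stack[-1] != "$":
        x, a = stack[-1], toks[i]
        if x == a:                       # terminal on top matches input
            stack.pop(); i += 1
        elif x not in NONTERMS:          # terminal mismatch
            raise SyntaxError(f"expected {x}, got {a}")
        elif (x, a) not in M:            # undefined entry
            raise SyntaxError(f"no entry M[{x}, {a}]")
        else:
            rhs = M[x, a]
            output.append((x, rhs))      # output the production X -> Y1...Yk
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk ... Y1, with Y1 on top
    if toks[i] != "$":
        raise SyntaxError("trailing input")
    return output

derivation = predictive_parse(["id", "+", "id", "*", "id"])
```

On id + id * id this emits the eleven productions of the trace in the next example, in the same order.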

Example (table-driven parsing of id + id * id)
STACK           INPUT            ACTION
E $             id + id * id $
T E' $          id + id * id $   output E → T E'
F T' E' $       id + id * id $   output T → F T'
id T' E' $      id + id * id $   output F → id
T' E' $         + id * id $      match id
E' $            + id * id $      output T' → ε
+ T E' $        + id * id $      output E' → + T E'
T E' $          id * id $        match +
F T' E' $       id * id $        output T → F T'
id T' E' $      id * id $        output F → id
T' E' $         * id $           match id
* F T' E' $     * id $           output T' → * F T'
F T' E' $       id $             match *
id T' E' $      id $             output F → id
T' E' $         $                match id
E' $            $                output T' → ε
$               $                output E' → ε

Panic Mode Recovery
Add synchronizing actions to undefined entries based on FOLLOW:
FOLLOW(E) = { ), $ }
FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }
FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
Pro: can be automated. Con: error messages are needed.
       id          +             *             (           )        $
E    E → T E'                              E → T E'    synch     synch
E'               E' → + T E'                           E' → ε    E' → ε
T    T → F T'    synch                     T → F T'    synch     synch
T'               T' → ε       T' → * F T'              T' → ε    T' → ε
F    F → id      synch        synch        F → ( E )   synch     synch
synch: the driver pops the current nonterminal A and skips input until a synchronizing token is found, or skips input until a token in FIRST(A) is found.

Example (panic-mode recovery on input ) id * + id)
STACK          INPUT           ACTION
E $            ) id * + id $   error, skip )
E $            id * + id $     id is in FIRST(E)
T E' $         id * + id $
F T' E' $      id * + id $
id T' E' $     id * + id $
T' E' $        * + id $
* F T' E' $    * + id $
F T' E' $      + id $          error, M[F, +] = synch
T' E' $        + id $          F has been popped
E' $           + id $
+ T E' $       + id $
T E' $         id $
F T' E' $      id $
id T' E' $     id $
T' E' $        $
E' $           $
$              $

Phrase-Level Recovery
Change the input stream by inserting missing tokens; for example, id id is changed into id * id.
Pro: can be automated. Con: recovery is not always intuitive.
In the parsing table, the entry M[T', id] becomes "insert *": the driver inserts the missing * and retries the production T' → * F T', so parsing can continue from there.

Error Productions
Add an "error production" T' → F T' to ignore a missing *, e.g. in id id:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | F T' | ε
F → ( E ) | id
Pro: powerful recovery method. Con: cannot be automated.
In the parsing table, the entry M[T', id] now holds T' → F T'.