Syntax Analysis Part I Chapter 4

Slides:



Advertisements
Similar presentations
Lecture # 7 Chapter 4: Syntax Analysis. What is the job of Syntax Analysis? Syntax Analysis is also called Parsing or Hierarchical Analysis. A Parser.
Advertisements

By Neng-Fa Zhou Syntax Analysis lexical analyzer syntax analyzer semantic analyzer source program tokens parse tree parser tree.
1 Predictive parsing Recall the main idea of top-down parsing: Start at the root, grow towards leaves Pick a production and try to match input May need.
1 Chapter 4: Top-Down Parsing. 2 Objectives of Top-Down Parsing an attempt to find a leftmost derivation for an input string. an attempt to construct.
Top-Down Parsing.
Chapter 4 Chang Chi-Chung
Chapter 3 Chang Chi-Chung Parse tree intermediate representation The Role of the Parser Lexical Analyzer Parser Source Program Token Symbol.
COP4020 Programming Languages Computing LL(1) parsing table Prof. Xin Yuan.
Chapter 9 Syntax Analysis Winter 2007 SEG2101 Chapter 9.
1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
1 October 2, October 2, 2015October 2, 2015October 2, 2015 Azusa, CA Sheldon X. Liang Ph. D. Computer Science at Azusa Pacific University Azusa.
BİL 744 Derleyici Gerçekleştirimi (Compiler Design)1 Syntax Analyzer Syntax Analyzer creates the syntactic structure of the given source program. This.
1 Compiler Construction Syntax Analysis Top-down parsing.
Lesson 5 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CSI 3120, Syntactic analysis, page 1 Syntactic Analysis and Parsing Based on A. V. Aho, R. Sethi and J. D. Ullman Compilers: Principles, Techniques and.
Joey Paquet, 2000, Lecture 5 Error Recovery Techniques in Top-Down Predictive Syntactic Analysis.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
Unit-3 Parsing Theory (Syntax Analyzer) PREPARED BY: PROF. HARISH I RATHOD COMPUTER ENGINEERING DEPARTMENT GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE.
Syntax Analyzer (Parser)
Parsing methods: –Top-down parsing –Bottom-up parsing –Universal.
1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )
COMP 3438 – Part II-Lecture 6 Syntax Analysis III Dr. Zili Shao Department of Computing The Hong Kong Polytechnic Univ.
1 Syntax Analysis Part II Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Syntax Analysis Or Parsing. A.K.A. Syntax Analysis –Recognize sentences in a language. –Discover the structure of a document/program. –Construct (implicitly.
Parsing COMP 3002 School of Computer Science. 2 The Structure of a Compiler syntactic analyzer code generator program text interm. rep. machine code tokenizer.
Error recovery in predictive parsing An error is detected during the predictive parsing when the terminal on top of the stack does not match the next input.
Problems with Top Down Parsing
Parsing #1 Leonidas Fegaras.
lec02-parserCFG May 8, 2018 Syntax Analyzer
Compiler Construction
Programming Languages Translator
CS510 Compiler Lecture 4.
Bottom-Up Parsing.
Lexical and Syntax Analysis
Compiler Baojian Hua LR Parsing Compiler Baojian Hua
The role of parser Lexical Analyzer Parser Rest of Front End Symbol
Lecture #12 Parsing Types.
Introduction to Parsing (adapted from CS 164 at Berkeley)
Table-driven parsing Parsing performed by a finite state machine.
Parsing.
Introduction to Top Down Parser
Top-down parsing cannot be performed on left recursive grammars.
Syntax Analysis Chapter 4.
SYNTAX ANALYSIS (PARSING).
CS 404 Introduction to Compiler Design
Syntax Analysis Part II
4 (c) parsing.
Syntax Analysis Sections :.
Subject Name:COMPILER DESIGN Subject Code:10CS63
Lexical and Syntax Analysis
Top-Down Parsing CS 671 January 29, 2008.
Subject Name:COMPILER DESIGN Subject Code:10CS63
Lecture 7 Predictive Parsing
CS 540 George Mason University
Syntax Analysis source program lexical analyzer tokens syntax analyzer
Compiler Design 7. Top-Down Table-Driven Parsing
Top-Down Parsing The parse tree is created top to bottom.
Lecture 7: Introduction to Parsing (Syntax Analysis)
Chapter 4 Top Down Parser.
Bottom Up Parsing.
R.Rajkumar Asst.Professor CSE
Designing a Predictive Parser
Computing Follow(A) : All Non-Terminals
Syntax Analysis - Parsing
Lecture 7 Predictive Parsing
Chapter 3 Syntactic Analysis I.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Predictive Parsing Program
lec02-parserCFG May 27, 2019 Syntax Analyzer
Presentation transcript:

Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005

Position of a Parser in the Compiler Model Token, tokenval Lexical Analyzer Parser and rest of front-end Source Program Intermediate representation Get next token Lexical error Syntax error Semantic error Symbol Table

The Parser The task of the parser is to check syntax The syntax-directed translation stage in the compiler’s front-end checks static semantics and produces an intermediate representation (IR) of the source program Abstract syntax trees (ASTs) Control-flow graphs (CFGs) with triples, three-address code, or register transfer lists WHIRL (SGI Pro64 compiler) has 5 IR levels!

Error Handling A good compiler should assist in identifying and locating errors Lexical errors: important, compiler can easily recover and continue Syntax errors: most important for compiler, can almost always recover Static semantic errors: important, can sometimes recover Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required Logical errors: hard or impossible to detect

Viable-Prefix Property The viable-prefix property of LL/LR parsers allows early detection of syntax errors Goal: detection of an error as soon as possible without consuming unnecessary input How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language Error is detected here Error is detected here … for (;) … … DO 10 I = 1;0 … Prefix Prefix

Error Recovery Strategies Panic mode Discard input until a token in a set of designated synchronizing tokens is found Phrase-level recovery Perform local correction on the input to repair the error Error productions Augment grammar with productions for erroneous constructs Global correction Choose a minimal sequence of changes to obtain a global least-cost correction

Grammars (Recap) Context-free grammar is a 4-tuple G=(N,T,P,S) where T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form    where   (NT)* N (NT)* and   (NT)* S is a designated start symbol S  N

Notational Conventions Used Terminals a,b,c,…  T specific terminals: 0, 1, id, + Nonterminals A,B,C,…  N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z  (NT) Strings of terminals u,v,w,x,y,z  T* Strings of grammar symbols ,,  (NT)*

Derivations (Recap) The one-step derivation is defined by  A      where A   is a production in the grammar In addition, we define  is leftmost lm if  does not contain a nonterminal  is rightmost rm if  does not contain a nonterminal Transitive closure * (zero or more steps) Positive closure + (one or more steps) The language generated by G is defined by L(G) = {w | S + w}

Derivation (Example) E  E + E E  E * E E  ( E ) E  - E E  id E rm E + E rm E + id rm id + id E * E E + id * id + id

Chomsky Hierarchy: Language Classification A grammar G is said to be Regular if it is right linear where each production is of the form A  w B or A  w or left linear where each production is of the form A  B w or A  w Context free if each production is of the form A   where A  N and   (NT)* Context sensitive if each production is of the form  A      where A  N, ,,  (NT)*, || > 0 Unrestricted

Chomsky Hierarchy L(regular)  L(context free)  L(context sensitive)  L(unrestricted) Where L(T) = { L(G) | G is of type T } That is, the set of all languages generated by grammars G of type T Examples: Every finite language is regular L1 = { anbn | n  1 } is context free L2 = { anbncn | n  1 } is context sensitive

Parsing Universal (any C-F grammar) Cocke-Younger-Kasimi Earley Top-down (C-F grammar with restrictions) Recursive descent (predictive parsing) LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) Operator precedence parsing LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR

Top-Down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing Grammar: E  T + T T  ( E ) T  - E T  id Leftmost derivation: E lm T + T lm id + T lm id + id E E E E T T T T T T + id + id + id

Left Recursion (Recap) Productions of the form A  A  |  |  are left recursive When one of the productions in a grammar is left recursive then a predictive parser may loop forever

General Left Recursion Elimination Arrange the nonterminals in some order A1, A2, …, An for i = 1, …, n do for j = 1, …, i-1 do replace each Ai  Aj  with Ai  1  | 2  | … | k  where Aj  1 | 2 | … | k enddo eliminate the immediate left recursion in Ai enddo

Immediate Left-Recursion Elimination Rewrite every left-recursive production A  A  |  |  | A  into a right-recursive production: A   AR |  AR AR   AR |  AR | 

Example Left Rec. Elimination A  B C | a B  C A | A b C  A B | C C | a Choose arrangement: A, B, C i = 1: nothing to do i = 2, j = 1: B  C A | A b  B  C A | B C b | a b (imm) B  C A BR | a b BR BR  C b BR |  i = 3, j = 1: C  A B | C C | a  C  B C B | a B | C C | a i = 3, j = 2: C  B C B | a B | C C | a  C  C A BR C B | a b BR C B | a B | C C | a (imm) C  a b BR C B CR | a B CR | a CR CR  A BR C B CR | C CR | 

Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A   1 |  2 | … |  n |  with A   AR |  AR  1 | 2 | … | n

Predictive Parsing Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: Recursive (recursive calls) Non-recursive (table-driven)

FIRST (Revisited) FIRST() = the set of terminals that begin all strings derived from  FIRST(a) = {a} if a  T FIRST() = {} FIRST(A) = A FIRST() for A  P FIRST(X1X2…Xk) = if for all j = 1, …, i-1 :   FIRST(Xj) then add non- in FIRST(Xi) to FIRST(X1X2…Xk) if for all j = 1, …, k :   FIRST(Xj) then add  to FIRST(X1X2…Xk)

FOLLOW FOLLOW(A) = the set of terminals that can immediately follow nonterminal A FOLLOW(A) = for all (B   A )  P do add FIRST()\{} to FOLLOW(A) for all (B   A )  P and   FIRST() do add FOLLOW(B) to FOLLOW(A) for all (B   A)  P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add $ to FOLLOW(A)

LL(1) Grammar A grammar G is LL(1) if for each collections of productions A  1 | 2 | … | n for nonterminal A the following holds: 1. FIRST(i)  FIRST(j) =  for all i  j 2. if i *  then 2.a. j *  for all i  j 2.b. FIRST(j)  FOLLOW(A) =  for all i  j

Non-LL(1) Examples Grammar Not LL(1) because S  S a | a Left recursive S  a S | a FIRST(a S)  FIRST(a)   S  a R |  R  S |  For R: S *  and  *  S  a R a R  S |  For R: FIRST(S)  FOLLOW(R)  

Recursive Descent Parsing Grammar must be LL(1) Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information

Using FIRST and FOLLOW to Write a Recursive Descent Parser procedure rest(); begin if lookahead in FIRST(+ term rest) then match(‘+’); term(); rest() else if lookahead in FIRST(- term rest) then match(‘-’); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; expr  term rest rest  + term rest | - term rest |  term  id FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ }

Non-Recursive Predictive Parsing Given an LL(1) grammar G=(N,T,P,S) construct a table M[A,a] for A  N, a  T and use a driver program with a stack input a + b $ stack Predictive parsing program (driver) output X Y Z $ Parsing table M

Constructing a Predictive Parsing Table for each production A   do for each a  FIRST() do add A   to M[A,a] enddo if   FIRST() then for each b  FOLLOW(A) do add A   to M[A,b] enddo endif enddo Mark each undefined entry in M error

Example Table A   FIRST() FOLLOW(A) E  T ER ( id $ ) ER  + T ER + ER    T  F TR + $ ) TR  * F TR * TR   F  ( E ) ( * + $ ) F  id id E  T ER ER  + T ER |  T  F TR TR  * F TR |  F  ( E ) | id id + * ( ) $ E E  T ER ER ER  + T ER ER   T T  F TR TR TR   TR  * F TR F F  id F  ( E )

LL(1) Grammars are Unambiguous Ambiguous grammar S  i E t S SR | a SR  e S |  E  b A   FIRST() FOLLOW(A) S  i E t S SR i e $ S  a a SR  e S e SR    E  b b t Error: duplicate table entry a b e i t $ S S  a S  i E t S SR SR SR   SR  e S SR   E E  b

Predictive Parsing Program (Driver) push($) push(S) a := lookahead repeat X := pop() if X is a terminal or X = $ then match(X) // move to next token, a := lookahead else if M[X,a] = X  Y1Y2…Yk then push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top produce output and/or invoke actions else error() endif until X = $

Example Table-Driven Parsing Stack $E $ERT $ERTRF $ERTRid $ERTR $ER $ERT+ $ERT $ERTRF $ERTRid $ERTR $ERTRF* $ERTRF $ERTRid $ERTR $ER $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E  T ER T  F TR F  id TR   ER  + T ER T  F TR F  id TR  * F TR F  id TR   ER  

Panic Mode Recovery FOLLOW(E) = { $ ) } FOLLOW(ER) = { $ ) } FOLLOW(T) = { + $ ) } FOLLOW(TR) = { + $ ) } FOLLOW(F) = { * + $ ) } Add synchronizing actions to undefined entries based on FOLLOW id + * ( ) $ E E  T ER synch ER ER  + T ER ER   T T  F TR TR TR   TR  * F TR F F  id F  ( E ) synch: pop A and skip input till synch token or skip until FIRST(A) found

Phrase-Level Recovery Change input stream by inserting missing * For example: id id is changed into id * id id + * ( ) $ E E  T ER synch ER ER  + T ER ER   T T  F TR TR insert * TR   TR  * F TR F F  id F  ( E ) insert *: insert missing * and redo the production

Error Productions E  T ER ER  + T ER |  T  F TR TR  * F TR |  F  ( E ) | id Add error production: TR  F TR to ignore missing *, e.g.: id id id + * ( ) $ E E  T ER synch ER ER  + T ER ER   T T  F TR TR TR  F TR TR   TR  * F TR F F  id F  ( E )