Chapter 3 Chang Chi-Chung 2015.05.18.

Presentation transcript:

Chapter 3 Chang Chi-Chung

The Role of the Parser
[Figure: Source Program → Lexical Analyzer → (Token) → Parser → (Parse tree) → Rest of Front End → (intermediate representation). The Parser requests each token with getNextToken; the phases share the Symbol Table.]

How do we represent the grammar of a programming language?
- Use a context-free grammar (CFG).
- A CFG is a more powerful notation than a regular expression (RE).

Context-Free Grammar
- A context-free grammar is a 4-tuple G = (T, N, P, S) where
  - T is a finite set of tokens (terminal symbols)
  - N is a finite set of nonterminals
  - P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N ∪ T)*
  - S ∈ N is a designated start symbol
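The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not code from the slides: the class names `Production` and `Grammar` and the helper `listGrammar()` are hypothetical, and the sample grammar is the list/digit grammar used in the derivation example of this chapter. The constructor only checks the side conditions stated in the definition.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One production A -> alpha; an empty body stands for epsilon. (Hypothetical helper type.) */
final class Production {
    final String head;
    final List<String> body;

    Production(String head, List<String> body) {
        this.head = head;
        this.body = List.copyOf(body);
    }
}

/** A context-free grammar G = (T, N, P, S). (Hypothetical helper type.) */
final class Grammar {
    final Set<String> terminals;
    final Set<String> nonterminals;
    final List<Production> productions;
    final String start;

    Grammar(Set<String> terminals, Set<String> nonterminals,
            List<Production> productions, String start) {
        for (String t : terminals) {
            if (nonterminals.contains(t)) {
                throw new IllegalArgumentException("T and N must be disjoint: " + t);
            }
        }
        if (!nonterminals.contains(start)) {
            throw new IllegalArgumentException("start symbol must be in N: " + start);
        }
        for (Production p : productions) {
            if (!nonterminals.contains(p.head)) {
                throw new IllegalArgumentException("head must be in N: " + p.head);
            }
            for (String symbol : p.body) {
                if (!terminals.contains(symbol) && !nonterminals.contains(symbol)) {
                    throw new IllegalArgumentException("unknown symbol: " + symbol);
                }
            }
        }
        this.terminals = Set.copyOf(terminals);
        this.nonterminals = Set.copyOf(nonterminals);
        this.productions = List.copyOf(productions);
        this.start = start;
    }

    /** list -> list + digit | list - digit | digit ; digit -> 0 | 1 | ... | 9 */
    static Grammar listGrammar() {
        Set<String> t = new HashSet<>(List.of("+", "-"));
        List<Production> p = new ArrayList<>();
        p.add(new Production("list", List.of("list", "+", "digit")));
        p.add(new Production("list", List.of("list", "-", "digit")));
        p.add(new Production("list", List.of("digit")));
        for (int d = 0; d <= 9; d++) {
            String digit = Integer.toString(d);
            t.add(digit);
            p.add(new Production("digit", List.of(digit)));
        }
        return new Grammar(t, Set.of("list", "digit"), p, "list");
    }
}
```

Keeping each alternative as its own `Production` object mirrors the set P in the definition; a map from nonterminal to alternatives would work equally well.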

Derivations
- The one-step derivation is defined by α A β ⇒ α γ β, where A → γ is a production in the grammar.
- In addition, we define
  - ⇒ is leftmost (⇒lm) if α does not contain a nonterminal
  - ⇒ is rightmost (⇒rm) if β does not contain a nonterminal
  - Reflexive-transitive closure ⇒* (zero or more steps)
  - Transitive closure ⇒+ (one or more steps)

Example of the Derivations
- A leftmost derivation replaces the leftmost nonterminal in each step.
- A rightmost derivation replaces the rightmost nonterminal in each step.
Productions:
  list → list + digit
  list → list - digit
  list → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Leftmost derivation:
  list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit ⇒ 9 - digit + digit ⇒ …
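To make the leftmost derivation concrete, here is a small sketch that prints every sentential form for an input of single digits separated by + and -. The class name `LeftmostDerivation` and the sample input "9-5+2" are assumptions chosen for illustration; the grammar is the list/digit grammar above. Because the grammar is left-recursive, the leftmost nonterminal `list` is expanded from the last operator back to the first, and the digits are then replaced left to right.

```java
import java.util.ArrayList;
import java.util.List;

final class LeftmostDerivation {
    /** Returns the sentential forms of the leftmost derivation of input, e.g. "9-5+2". */
    static List<String> derive(String input) {
        String s = input.replace(" ", "");
        if (s.length() % 2 == 0) {
            throw new IllegalArgumentException("not a list: " + input);
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (i % 2 == 0) ? (c >= '0' && c <= '9') : (c == '+' || c == '-');
            if (!ok) {
                throw new IllegalArgumentException("not a list: " + input);
            }
        }
        List<String> steps = new ArrayList<>();
        steps.add("list");
        String tail = "";
        // list -> list op digit, applied once per operator, last operator first.
        for (int i = s.length() - 2; i >= 1; i -= 2) {
            tail = " " + s.charAt(i) + " digit" + tail;
            steps.add("list" + tail);
        }
        // list -> digit
        steps.add("digit" + tail);
        // digit -> 0 | ... | 9, always replacing the leftmost remaining nonterminal.
        String[] symbols = ("digit" + tail).split(" ");
        for (int i = 0; i < symbols.length; i += 2) {
            symbols[i] = String.valueOf(s.charAt(i));
            steps.add(String.join(" ", symbols));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" => ", derive("9-5+2")));
    }
}
```

For the assumed input "9-5+2" the first five forms coincide with the derivation shown on the slide.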

Example of the Parse Tree
- Parse tree of the string using grammar G
  [Figure: parse tree with list and digit nodes]
- The sequence of leaves is called the yield of the parse tree.

Sentence and Language
- Sentential form
  - If S ⇒* α in the grammar G, then α is a sentential form of G.
- Sentence
  - A sentential form of G that has no nonterminals.
- Language
  - The language generated by G is its set of sentences.
  - The language generated by G is defined by L(G) = { w ∈ T* | S ⇒* w }.
  - A language that can be generated by a context-free grammar is said to be a context-free language.
  - If two grammars generate the same language, the grammars are said to be equivalent.

An Example
Grammar:
  Expr → ( Expr ) | Expr Op name | name
  Op → + | - | x | /
Derivation of (a + b) x c:
  Expr ⇒ Expr Op c ⇒ ( Expr ) Op c ⇒ ( Expr Op b ) Op c ⇒ ( a Op b ) Op c ⇒ ( a + b ) Op c ⇒ ( a + b ) x c

Ambiguity
- A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
- Example: id + id * id under the grammar
  E → E + E | E * E | ( E ) | id
  has two leftmost derivations:
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
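The ambiguity can be checked mechanically by counting parse trees. The sketch below is an illustration only (the class name `ParseTreeCounter` and the whitespace-separated token format are assumptions): it counts the parse trees of a token sequence under E → E + E | E * E | ( E ) | id by trying every operator as the root of the tree and memoizing the result per token span.

```java
import java.util.HashMap;
import java.util.Map;

final class ParseTreeCounter {
    private final String[] tokens;
    private final Map<Long, Long> memo = new HashMap<>();

    private ParseTreeCounter(String[] tokens) {
        this.tokens = tokens;
    }

    /** Counts parse trees for a whitespace-separated sentence such as "id + id * id". */
    static long countTrees(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        String[] tokens = trimmed.split("\\s+");
        return new ParseTreeCounter(tokens).count(0, tokens.length);
    }

    /** Number of ways E derives tokens[i..j), with j exclusive. */
    private long count(int i, int j) {
        if (i >= j) {
            return 0;
        }
        long key = (long) i * (tokens.length + 1) + j;
        Long cached = memo.get(key);
        if (cached != null) {
            return cached;
        }
        long total = 0;
        // E -> id
        if (j - i == 1 && tokens[i].equals("id")) {
            total += 1;
        }
        // E -> ( E )
        if (j - i >= 3 && tokens[i].equals("(") && tokens[j - 1].equals(")")) {
            total += count(i + 1, j - 1);
        }
        // E -> E + E | E * E, with the operator at position k as the root
        for (int k = i + 1; k < j - 1; k++) {
            if (tokens[k].equals("+") || tokens[k].equals("*")) {
                total += count(i, k) * count(k + 1, j);
            }
        }
        memo.put(key, total);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countTrees("id + id * id")); // prints 2
    }
}
```

A count above 1 for any sentence shows that the grammar is ambiguous; a count of 1 for one sentence proves nothing about the grammar as a whole.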

Example
- Consider the following context-free grammar G = (T, N, P, S) with T = { +, -, 0, 1, …, 9 }, N = { string }, S = string, and
  P = string → string + string | string - string | 0 | 1 | … | 9
- This grammar is ambiguous, because some strings have more than one parse tree.

Example
[Figure: parse trees rooted at string]

Ambiguity
- Dangling-else grammar
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
- Example: if E1 then S1 else if E2 then S2 else S3

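Eliminating Ambiguity(2)
- if E1 then if E2 then S1 else S2
- Under the dangling-else grammar this statement has two parse trees, because the else can attach to either if.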

Parsing
- The process of determining if a string of terminals (tokens) can be generated by a grammar.
- Time complexity:
  - For any CFG there is a parser that takes at most O(n^3) time to parse a string of n terminals.
  - Linear algorithms suffice to parse essentially all languages that arise in practice.
- Two kinds of methods
  - Top-down: constructs a parse tree from root to leaves
  - Bottom-up: constructs a parse tree from leaves to root

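Two Approaches to Syntax Analysis
- Top-down parsing
  - Leftmost derivation
  - No left recursion
  - No common left factors (the grammar must be left-factored)
  - Unambiguous grammar
- Bottom-up parsing
  - Rightmost derivation (constructed in reverse)
  - Left and right recursion are both allowed
  - No factoring is required
  - Unambiguous grammar
[Figure: grammar classes CFG, LR(1), LL(1), RG]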

Notational Conventions
- Terminals: a, b, c, … ∈ T (examples: 0, 1, +, *, id, if)
- Nonterminals: A, B, C, … ∈ N (examples: expr, term, stmt)
- Grammar symbols: X, Y, Z ∈ (N ∪ T)
- Strings of terminals: u, v, w, x, y, z ∈ T*
- Strings of grammar symbols (sentential forms): α, β, γ ∈ (N ∪ T)*
- The head of the first production is the start symbol, unless stated otherwise.

Top-down Parsing
- Recursive-descent parsing
- LL(1): Left-to-right scan, Leftmost derivation
- Creates the nodes of the parse tree in preorder (depth-first)
Grammar:
  E → T + T
  T → ( E )
  T → - E
  T → id
Leftmost derivation:
  E ⇒lm T + T ⇒lm id + T ⇒lm id + id
[Figure: the parse tree for id + id grown step by step from the root E]

Recursive Descent Parsing
- Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal's syntactic category of input tokens.
- When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information.

Recursive Descent Parsing

    void A() {
        Choose an A-production, A → X1 X2 … Xk;
        for (i = 1 to k) {
            if (Xi is a nonterminal)
                call procedure Xi();
            else if (Xi = current input symbol a)
                advance the input to the next symbol;
            else
                /* an error has occurred */;
        }
    }
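The procedure template above can be instantiated for the small grammar E → T + T, T → ( E ) | - E | id from the top-down parsing slide. This is a sketch under assumptions, not the slides' code: the class name `TinyRecognizer`, the whitespace-separated token format, and the end marker "$" are all illustrative. Each nonterminal gets one method, and `parseT` picks its production from a single token of lookahead.

```java
final class TinyRecognizer {
    private final String[] tokens;
    private int pos = 0;

    private TinyRecognizer(String[] tokens) {
        this.tokens = tokens;
    }

    /** Returns true if the whitespace-separated input is derivable from E. */
    static boolean accepts(String input) {
        String trimmed = input.trim();
        String[] toks = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        TinyRecognizer parser = new TinyRecognizer(toks);
        try {
            parser.parseE();
            return parser.pos == toks.length; // all input must be consumed
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    /** Current input symbol, or "$" once the input is exhausted. */
    private String lookahead() {
        return pos < tokens.length ? tokens[pos] : "$";
    }

    /** Terminal case of the template: compare with the input and advance. */
    private void match(String expected) {
        if (!lookahead().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but saw " + lookahead());
        }
        pos++;
    }

    // E -> T + T
    private void parseE() {
        parseT();
        match("+");
        parseT();
    }

    // T -> ( E ) | - E | id, selected by one token of lookahead
    private void parseT() {
        switch (lookahead()) {
            case "(":
                match("(");
                parseE();
                match(")");
                break;
            case "-":
                match("-");
                parseE();
                break;
            case "id":
                match("id");
                break;
            default:
                throw new IllegalStateException("unexpected " + lookahead());
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("( id + id ) + id")); // prints true
    }
}
```

No backtracking is needed here because the three T-productions start with distinct terminals.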

Conclusion: Parsing and Translation Scheme (Complete)

    import java.io.*;

    class Parser {
        static int lookahead;

        public Parser() throws IOException {
            lookahead = System.in.read();   // read the first input character
        }

        // expr -> term { (+ | -) term }, emitting each operator after its operands
        void expr() throws IOException {
            term();
            while (true) {
                if (lookahead == '+') {
                    match('+'); term(); System.out.write('+');
                } else if (lookahead == '-') {
                    match('-'); term(); System.out.write('-');
                } else {
                    return;
                }
            }
        }

        // term -> digit, echoing the digit to the output
        void term() throws IOException {
            if (Character.isDigit((char) lookahead)) {
                System.out.write((char) lookahead);
                match(lookahead);
            } else {
                throw new Error("syntax error");
            }
        }

        // consume the expected character and read the next one
        void match(int t) throws IOException {
            if (lookahead == t) lookahead = System.in.read();
            else throw new Error("syntax error");
        }
    }

LL(1)

LL(1) Grammar
- Predictive parsers, that is, recursive-descent parsers needing no backtracking, can be constructed for a class of grammars called LL(1).
  - The first "L" means the input is scanned from left to right.
  - The second "L" means a leftmost derivation is produced.
  - The "1" stands for using one input symbol of lookahead at each step to make parsing action decisions.
- An LL(1) grammar is not left-recursive.
- An LL(1) grammar is not ambiguous.

FIRST and FOLLOW
[Figure: a parse tree for S ⇒* α A a β in which A derives c γ]
- c is in FIRST(A)
- a is in FOLLOW(A)

FIRST and FOLLOW
- The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G.
- During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply.
- During panic-mode error recovery, sets of tokens produced by FOLLOW can be used as synchronizing tokens.

FIRST
- FIRST(α) is the set of terminals that begin the strings derived from α (plus ε if α can derive ε).
  FIRST(a) = { a } if a ∈ T
  FIRST(ε) = { ε }
  FIRST(A) = ∪ FIRST(α) over all productions A → α ∈ P
  FIRST(X1 X2 … Xk):
    for each i, if ε ∈ FIRST(Xj) for all j = 1, …, i-1, then add the non-ε symbols of FIRST(Xi) to FIRST(X1 X2 … Xk)
    if ε ∈ FIRST(Xj) for all j = 1, …, k, then add ε to FIRST(X1 X2 … Xk)

FIRST(1)
- From the definition of FIRST, we can compute FIRST(X) as follows:
  - If X ∈ T, then FIRST(X) = { X }.
  - If X ∈ N and X → ε, then add ε to FIRST(X).
  - If X ∈ N and X → Y1 Y2 ... Yn, then add all non-ε elements of FIRST(Y1) to FIRST(X); if ε ∈ FIRST(Y1), then add all non-ε elements of FIRST(Y2) to FIRST(X); ...; if ε ∈ FIRST(Yi) for every i up to n, then add ε to FIRST(X).
- Repeat these rules until no FIRST set changes.
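The rules above amount to a fixed-point iteration: keep applying them until no FIRST set grows. The following sketch assumes a grammar stored as a map from each nonterminal to its alternatives, with an empty right-hand side standing for ε; the class name `FirstSets` and this representation are illustrative choices. The sample grammar is the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id from this chapter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FirstSets {
    static final String EPSILON = "\u03b5"; // the symbol epsilon

    /** The chapter's expression grammar; an empty right-hand side stands for epsilon. */
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E", List.of(List.of("T", "E'")));
        g.put("E'", List.of(List.of("+", "T", "E'"), List.<String>of()));
        g.put("T", List.of(List.of("F", "T'")));
        g.put("T'", List.of(List.of("*", "F", "T'"), List.<String>of()));
        g.put("F", List.of(List.of("(", "E", ")"), List.of("id")));
        return g;
    }

    /** Computes FIRST for every nonterminal; symbols that are not map keys are terminals. */
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nonterminal : grammar.keySet()) {
            first.put(nonterminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) { // iterate until no set grows
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = first.get(rule.getKey());
                for (List<String> rhs : rule.getValue()) {
                    boolean allNullable = true; // stays true for an empty rhs
                    for (String symbol : rhs) {
                        if (!grammar.containsKey(symbol)) {
                            // Terminal: FIRST(symbol) = { symbol }, and the scan stops.
                            changed |= target.add(symbol);
                            allNullable = false;
                            break;
                        }
                        // Nonterminal: copy its non-epsilon FIRST symbols.
                        for (String t : List.copyOf(first.get(symbol))) {
                            if (!t.equals(EPSILON)) {
                                changed |= target.add(t);
                            }
                        }
                        if (!first.get(symbol).contains(EPSILON)) {
                            allNullable = false;
                            break;
                        }
                    }
                    if (allNullable) {
                        changed |= target.add(EPSILON);
                    }
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        compute(exampleGrammar()).forEach((nt, set) -> System.out.println(nt + " : " + set));
    }
}
```

For the sample grammar this yields FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(E') = { +, ε } and FIRST(T') = { *, ε }.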

FOLLOW
- FOLLOW(A) is the set of terminals that can immediately follow nonterminal A.
- FOLLOW(A) is built as follows:
    for all (B → α A β) ∈ P do
        add FIRST(β) - { ε } to FOLLOW(A)
    for all (B → α A β) ∈ P with ε ∈ FIRST(β) do
        add FOLLOW(B) to FOLLOW(A)
    for all (B → α A) ∈ P do
        add FOLLOW(B) to FOLLOW(A)
    if A is the start symbol S then
        add $ to FOLLOW(A)

FOLLOW(1)
- From the definition of FOLLOW, we can compute FOLLOW(X) as follows:
  - Put $ into FOLLOW(S).
  - For each A → α B β, add all non-ε elements of FIRST(β) to FOLLOW(B).
  - For each A → α B, or A → α B β where ε ∈ FIRST(β), add all of FOLLOW(A) to FOLLOW(B).
- Repeat these rules until no FOLLOW set changes.
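The three FOLLOW rules can be iterated to a fixed point in the same way as FIRST. The sketch below is illustrative only: the class name `FollowSets`, the map-based grammar representation, and the choice to pass the FIRST sets in as precomputed input (here copied from the chapter's FIRST/FOLLOW table) are assumptions made to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class FollowSets {
    static final String EPSILON = "\u03b5"; // the symbol epsilon
    static final String END = "$";          // end-of-input marker

    /** The chapter's expression grammar; an empty right-hand side stands for epsilon. */
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("E", List.of(List.of("T", "E'")));
        g.put("E'", List.of(List.of("+", "T", "E'"), List.<String>of()));
        g.put("T", List.of(List.of("F", "T'")));
        g.put("T'", List.of(List.of("*", "F", "T'"), List.<String>of()));
        g.put("F", List.of(List.of("(", "E", ")"), List.of("id")));
        return g;
    }

    /** FIRST sets of the nonterminals, taken as given from the chapter's table. */
    static Map<String, Set<String>> exampleFirst() {
        Map<String, Set<String>> f = new LinkedHashMap<>();
        f.put("E", Set.of("(", "id"));
        f.put("E'", Set.of("+", EPSILON));
        f.put("T", Set.of("(", "id"));
        f.put("T'", Set.of("*", EPSILON));
        f.put("F", Set.of("(", "id"));
        return f;
    }

    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar,
                                            Map<String, Set<String>> first,
                                            String start) {
        Map<String, Set<String>> follow = new LinkedHashMap<>();
        for (String nonterminal : grammar.keySet()) {
            follow.put(nonterminal, new TreeSet<>());
        }
        follow.get(start).add(END); // rule 1: $ is in FOLLOW(S)
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                String head = rule.getKey();
                for (List<String> rhs : rule.getValue()) {
                    for (int i = 0; i < rhs.size(); i++) {
                        String b = rhs.get(i);
                        if (!grammar.containsKey(b)) {
                            continue; // FOLLOW is defined for nonterminals only
                        }
                        Set<String> target = follow.get(b);
                        // Rule 2: add FIRST(beta) minus epsilon, scanning beta left to right.
                        boolean betaNullable = true;
                        for (int j = i + 1; j < rhs.size() && betaNullable; j++) {
                            String symbol = rhs.get(j);
                            if (!grammar.containsKey(symbol)) {
                                changed |= target.add(symbol);
                                betaNullable = false;
                            } else {
                                for (String t : first.get(symbol)) {
                                    if (!t.equals(EPSILON)) {
                                        changed |= target.add(t);
                                    }
                                }
                                betaNullable = first.get(symbol).contains(EPSILON);
                            }
                        }
                        // Rule 3: beta is empty or nullable, so FOLLOW(head) flows into FOLLOW(b).
                        if (betaNullable) {
                            changed |= target.addAll(List.copyOf(follow.get(head)));
                        }
                    }
                }
            }
        }
        return follow;
    }

    public static void main(String[] args) {
        compute(exampleGrammar(), exampleFirst(), "E")
            .forEach((nt, set) -> System.out.println(nt + " : " + set));
    }
}
```

For the sample grammar the result is FOLLOW(E) = FOLLOW(E') = { $, ) }, FOLLOW(T) = FOLLOW(T') = { +, $, ) } and FOLLOW(F) = { *, +, $, ) }.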

Example
- Given a grammar G:
    E  → T E'
    E' → + T E' | ε
    T  → F T'
    T' → * F T' | ε
    F  → ( E ) | id

    Nonterminal   FIRST    FOLLOW
    E             ( id     $ )
    E'            + ε      $ )
    T             ( id     + $ )
    T'            * ε      + $ )
    F             ( id     * + $ )

Using FIRST and FOLLOW to Write a Recursive Descent Parser
Grammar:
    expr → term rest
    rest → + term rest | - term rest | ε
    term → id
Sets:
    FIRST(+ term rest) = { + }
    FIRST(- term rest) = { - }
    FOLLOW(rest) = { $ }
Procedure for rest:

    rest() {
        if (lookahead in FIRST(+ term rest)) {
            match('+'); term(); rest();
        } else if (lookahead in FIRST(- term rest)) {
            match('-'); term(); rest();
        } else if (lookahead in FOLLOW(rest)) {
            return;    /* rest → ε */
        } else {
            error();
        }
    }
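A runnable rendering of this procedure, offered as a sketch rather than the slides' own code: the class name `RestParser`, the whitespace-separated token format, and the use of "$" as the end marker are assumptions. The three sets are written out as constants so that each branch of `rest()` reads the same way as the pseudocode.

```java
import java.util.Set;

final class RestParser {
    // Sets from the slide, hard-coded for this grammar.
    static final Set<String> FIRST_PLUS_ALT = Set.of("+");  // FIRST(+ term rest)
    static final Set<String> FIRST_MINUS_ALT = Set.of("-"); // FIRST(- term rest)
    static final Set<String> FOLLOW_REST = Set.of("$");     // FOLLOW(rest)

    private final String[] tokens;
    private int pos = 0;

    private RestParser(String[] tokens) {
        this.tokens = tokens;
    }

    /** Returns true if the whitespace-separated input is derivable from expr. */
    static boolean accepts(String input) {
        String trimmed = input.trim();
        String[] toks = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        RestParser parser = new RestParser(toks);
        try {
            parser.expr();
            return parser.pos == toks.length; // all input must be consumed
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    /** Current input symbol, or "$" once the input is exhausted. */
    private String lookahead() {
        return pos < tokens.length ? tokens[pos] : "$";
    }

    private void match(String expected) {
        if (!lookahead().equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but saw " + lookahead());
        }
        pos++;
    }

    // expr -> term rest
    private void expr() {
        term();
        rest();
    }

    // term -> id
    private void term() {
        match("id");
    }

    // rest -> + term rest | - term rest | epsilon
    private void rest() {
        if (FIRST_PLUS_ALT.contains(lookahead())) {
            match("+");
            term();
            rest();
        } else if (FIRST_MINUS_ALT.contains(lookahead())) {
            match("-");
            term();
            rest();
        } else if (FOLLOW_REST.contains(lookahead())) {
            return; // rest -> epsilon, chosen because the lookahead is in FOLLOW(rest)
        } else {
            throw new IllegalStateException("unexpected " + lookahead());
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("id + id - id")); // prints true
    }
}
```

The ε-production is selected only when the lookahead is in FOLLOW(rest), which is why an input such as "id id" is rejected inside `rest()` instead of being accepted prematurely.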