4. Formal Grammars and Parsing and Top-down Parsing
Chih-Hung Wang
Compilers


References
1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons.
3. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley (2nd Ed. 2006).

Introduction
Context-free grammar: the syntax of programming language constructs can be described by a context-free grammar.
Important aspects:
A grammar imposes a structure on the linear sequence of tokens that makes up the program.
Using techniques from the field of formal languages, a parser can be constructed automatically from a grammar.
Grammars help programmers write syntactically correct programs and provide answers to detailed questions about the syntax.

The role of the parser

Context-Free Grammars
A context-free grammar (CFG) is a compact, finite representation of a language, defined by the following four components:
A finite terminal alphabet Σ
A finite non-terminal alphabet N
A start symbol S ∈ N
A finite set of productions P

A Simple Expression Grammar

Leftmost Derivations
A sentential form produced via a leftmost derivation is called a left sentential form.
The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation. Hence, these parsers are said to produce a leftmost parse.
Example: f(V+V)
E ⇒lm Prefix(E) ⇒lm f(E) ⇒lm f(V Tail) ⇒lm f(V+E) ⇒lm f(V+V Tail) ⇒lm f(V+V)
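The derivation of f(V+V) can be replayed mechanically. The following Python sketch assumes a grammar inferred from the derivation steps themselves (the slide's grammar figure is not part of the transcript), and the names GRAMMAR, CHOICES and leftmost_derivation are illustrative only.

```python
# Grammar inferred from the derivation steps (an assumption, not the slide's figure).
GRAMMAR = {
    "E": [["Prefix", "(", "E", ")"], ["V", "Tail"]],
    "Prefix": [["f"]],
    "Tail": [["+", "E"], []],  # [] is the empty alternative
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per (non-terminal, alternative) choice."""
    form = [start]
    steps = [" ".join(form)]
    for nonterminal, alt in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        if form[pos] != nonterminal:
            raise ValueError("choice does not match the leftmost non-terminal")
        form[pos:pos + 1] = GRAMMAR[nonterminal][alt]
        steps.append(" ".join(form))
    return steps


# The production sequence a top-down parser would discover for f(V+V).
CHOICES = [("E", 0), ("Prefix", 0), ("E", 1), ("Tail", 0), ("E", 1), ("Tail", 1)]
STEPS = leftmost_derivation("E", CHOICES)
for step in STEPS:
    print(step)
```

Each printed line is one left sentential form; the last one is the token sequence f ( V + V ).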

Rightmost Derivations
As a bottom-up parser discovers the productions that derive a given token sequence, it traces a rightmost derivation, but the productions are applied in reverse order.
The result is called a rightmost or canonical parse.
Example: f(V+V)
E ⇒rm Prefix(E) ⇒rm Prefix(V Tail) ⇒rm Prefix(V+E) ⇒rm Prefix(V+V Tail) ⇒rm Prefix(V+V) ⇒rm f(V+V)

Parse Tree
It is rooted by the start symbol S.
Each node is either a grammar symbol or the empty string ε.

Properties of CFGs
The grammar may include useless symbols.
The grammar may allow multiple, distinct derivations (parse trees) for some input string.
The grammar may generate strings that do not belong to the language, or it may fail to generate strings that are in the language.

Ambiguity (1)
Some grammars allow a derived string to have two or more different parse trees (and thus a non-unique structure).
Example:
1. Expr → Expr - Expr
2.      | id
This grammar allows two different parse trees for id - id - id.
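To make the ambiguity concrete, the sketch below counts the parse trees that Expr → Expr - Expr | id admits for a chain of subtractions and evaluates each grouping. It is an illustration in Python; the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(num_ids):
    """Number of parse trees for id - id - ... - id with num_ids operands."""
    if num_ids == 1:
        return 1  # Expr -> id
    # Expr -> Expr - Expr: split the operands between the two sub-expressions.
    return sum(count_parse_trees(left) * count_parse_trees(num_ids - left)
               for left in range(1, num_ids))


def evaluate_groupings(values):
    """One result per parse tree of values[0] - values[1] - ..."""
    if len(values) == 1:
        return [values[0]]
    results = []
    for split in range(1, len(values)):
        for left_value in evaluate_groupings(values[:split]):
            for right_value in evaluate_groupings(values[split:]):
                results.append(left_value - right_value)
    return results


print(count_parse_trees(3))           # two trees for id - id - id
print(evaluate_groupings([9, 3, 2]))  # 9 - (3 - 2) and (9 - 3) - 2
```

The two trees give different values for 9 - 3 - 2 (8 and 4), which is why the non-unique structure matters in practice.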

Ambiguity (2)

Parsers and Recognizers
Two approaches:
A parser is considered top-down if it generates a parse tree by starting at the root of the tree and expanding it by applying productions in a depth-first manner.
A bottom-up parser generates a parse tree by starting at the tree's leaves and working toward its root.

Two approaches to parsing
Deterministic left-to-right top-down: the LL method
Deterministic left-to-right bottom-up: the LR method
Left-to-right: the sequence of tokens is processed from left to right.
Deterministic: no searching is involved; each token brings the parser one step closer to the goal of constructing the syntax tree.

Parsers (top-down)

Parsers (bottom-up)

Pre-order and post-order (1)
The top-down method constructs the syntax tree in pre-order.
The bottom-up method constructs the syntax tree in post-order.

Pre-order and post-order (2)

Principles of top-down parsing
The main task of a top-down parser is to choose the correct alternatives for known non-terminals.

Principles of bottom-up parsing
The main task of a bottom-up parser is to repeatedly find the first node all of whose children have already been constructed.

Creating a top-down parser manually
Recursive descent parsing: the simplest way, but it has its limitations.

Recursive descent parsing program (1)

Recursive descent parsing program (2)

Drawbacks
Three drawbacks:
There is still some searching through the alternatives.
The method often fails to produce a correct parser.
Error handling leaves much to be desired.

The second problem (1)
Example 1: for the input IDENTIFIER '[', index_element will never be tried.

The second problem (2)
Example 2: the recognizer will not recognize ab.

The second problem (3)
Example 3: recursive descent parsers cannot handle left-recursive grammars.

Creating a top-down parser automatically
The principles of constructing a top-down parser automatically derive from those of writing one by hand, by applying precomputation.
Grammars that allow such a top-down parser to be constructed are called LL(1) grammars.

LL(1) parsing
FIRST set: the set of first tokens produced by each alternative in the grammar.
We have to precompute the FIRST sets of all non-terminals.
The FIRST sets of the terminals are obvious.
Finding FIRST(α) is trivial when α starts with a terminal.
FIRST(N) is the union of the FIRST sets of its alternatives.
First(α) = {a ∈ Σ | α ⇒* aβ}
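The precomputation can be sketched as a fixed-point (closure) loop. This Python sketch is not the slides' own algorithm listing; it assumes the small expression grammar that the later slides use for rest_expression, and it records ε in a FIRST set when the alternative can derive the empty string.

```python
EPS = "ε"  # marker for the empty string

# Illustrative grammar (an assumption); each alternative is a list of symbols.
GRAMMAR = {
    "expression": [["term", "rest_expression"]],
    "term": [["IDENTIFIER"], ["(", "expression", ")"]],
    "rest_expression": [["+", "expression"], []],
}


def first_of_sequence(symbols, first):
    """FIRST of a symbol sequence, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal starts with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish, or the sequence is empty
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # closure: repeat until no FIRST set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for name, tokens in FIRST.items():
    print(name, sorted(tokens))
```

The loop stops after a few passes because the sets can only grow and the token vocabulary is finite.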

Predictive recursive descent parser
The FIRST sets can be used to construct a predictive parser, so called because it predicts the presence of a given alternative without trying to find out whether it is there.

Closure algorithm for computing the FIRST set (1): data definitions

Closure algorithm for computing the FIRST set (2): initializations

Closure algorithm for computing the FIRST set (3): inference rules

FIRST sets example (1): grammar

FIRST sets example (2): the initial FIRST sets

FIRST sets example (3): the final FIRST sets

Another Example of First Set

Another Example of First Set (II)

Algorithm for Computing First(α)

The predictive parser (1)

The predictive parser (2)

Practice
Find the FIRST sets of all alternatives of the following grammar.
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

Nullable alternatives
A complication arises with the case label for the empty alternative (e.g. rest_expression). Since it does not itself start with any token, how can we decide whether it is the correct alternative?

FOLLOW sets
The FOLLOW set of a non-terminal N is the set of tokens that can immediately follow N.
LL(1) parser:
'LL' because the parser works from Left to right, identifying the nodes in what is called Leftmost derivation order.
'(1)' because all choices are based on a one-token look-ahead.
Follow(A) = {b ∈ Σ | S ⇒+ αAbβ}
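FOLLOW sets can be computed with the same kind of closure loop as FIRST sets. The Python sketch below is an illustration under two assumptions: it uses the small expression grammar of the rest_expression example, and it seeds the FOLLOW set of the start symbol with EOF.

```python
EPS, EOF = "ε", "EOF"

# Illustrative grammar (an assumption); [] is the empty alternative.
GRAMMAR = {
    "expression": [["term", "rest_expression"]],
    "term": [["IDENTIFIER"], ["(", "expression", ")"]],
    "rest_expression": [["+", "expression"], []],
}


def first_of_sequence(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)  # assumption: EOF follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only non-terminals have FOLLOW sets
                    tail_first = first_of_sequence(alt[i + 1:], first)
                    new = tail_first - {EPS}
                    if EPS in tail_first:
                        new |= follow[lhs]  # the tail can vanish: inherit FOLLOW(lhs)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "expression", FIRST)
for name, tokens in FOLLOW.items():
    print(name, sorted(tokens))
```

Under these assumptions FOLLOW(rest_expression) comes out as {EOF, ')'}, the set a predictive parser needs in order to choose the empty alternative.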

Closure algorithm for computing the FOLLOW sets

The first and follow sets

Another Example of Follow Set

Another Example of Follow Set (II)

Algorithm for Computing Follow(A)

Recall the predictive parser
rest_expression → '+' expression | ε
FIRST(rest_expr) = {'+', ε}
FOLLOW(rest_expr) = {EOF, ')'}

    void rest_expression(void) {
        switch (Token.class) {
        /* FIRST of the non-empty alternative */
        case '+': token('+'); expression(); break;
        /* FOLLOW(rest_expr): choose the empty alternative */
        case EOF: case ')': break;
        default: error();
        }
    }
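The routine above covers only rest_expression. A complete predictive recognizer for the same grammar might look like the following Python sketch. The rules for expression and term are assumptions taken from the grammar of the later PDA example, and the Parser class and recognize function are hypothetical names.

```python
class Parser:
    """Predictive recursive descent recognizer; one method per non-terminal."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def look(self):
        return self.tokens[self.pos]  # one-token look-ahead

    def match(self, expected):
        if self.look != expected:
            raise SyntaxError(f"expected {expected}, found {self.look}")
        self.pos += 1

    def expression(self):
        # expression -> term rest_expression
        self.term()
        self.rest_expression()

    def term(self):
        # term -> IDENTIFIER | '(' expression ')'
        if self.look == "IDENTIFIER":
            self.match("IDENTIFIER")
        elif self.look == "(":
            self.match("(")
            self.expression()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.look} at start of term")

    def rest_expression(self):
        # rest_expression -> '+' expression | ε
        if self.look == "+":  # FIRST of the non-empty alternative
            self.match("+")
            self.expression()
        elif self.look in (")", "EOF"):  # FOLLOW(rest_expression): take ε
            pass
        else:
            raise SyntaxError(f"unexpected {self.look} after term")


def recognize(tokens):
    parser = Parser(list(tokens) + ["EOF"])
    parser.expression()
    parser.match("EOF")
    return True


print(recognize(["IDENTIFIER", "+", "(", "IDENTIFIER", "+", "IDENTIFIER", ")"]))
```

Every decision is made from the look-ahead token alone, so no alternative is ever tried and abandoned.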

LL(1) conflicts
Example: the code

LL(1) conflicts: FIRST/FIRST conflict
term → IDENTIFIER | IDENTIFIER '[' expression ']' | '(' expression ')'

LL(1) conflicts: FIRST/FOLLOW conflict
                   FIRST set     FOLLOW set
S → A 'a' 'b'      { 'a' }       { }
A → 'a' | ε        { 'a', ε }    { 'a' }

LL(1) conflicts: left recursion
expression → expression '-' term | term

Look-ahead token
The LL(1) method predicts the alternative A_k for a non-terminal N when the look-ahead token is in
FIRST(A_k) ∪ (if A_k is nullable then FOLLOW(N))

LL(1) grammar
No FIRST/FIRST conflicts
No FIRST/FOLLOW conflicts
No multiple nullable alternatives: no non-terminal can have more than one nullable alternative.
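The three LL(1) conditions can be checked per non-terminal once the FIRST sets of its alternatives and its FOLLOW set are known. The Python sketch below takes those sets as given (here copied from the conflict examples in these slides); ll1_conflicts is a hypothetical helper, not part of any parser generator.

```python
EPS = "ε"


def ll1_conflicts(alt_firsts, follow):
    """Report LL(1) conflicts among the alternatives of one non-terminal.

    alt_firsts: one FIRST set per alternative (ε marks a nullable alternative).
    follow:     the FOLLOW set of the non-terminal.
    """
    conflicts = []
    nullable = [i for i, first in enumerate(alt_firsts) if EPS in first]
    if len(nullable) > 1:
        conflicts.append(("multiple nullable alternatives", tuple(nullable), set()))
    # FIRST/FIRST: two alternatives can start with the same token.
    for i in range(len(alt_firsts)):
        for j in range(i + 1, len(alt_firsts)):
            shared = (alt_firsts[i] & alt_firsts[j]) - {EPS}
            if shared:
                conflicts.append(("FIRST/FIRST", (i, j), shared))
    # FIRST/FOLLOW: a token starts one alternative and may also follow a nullable one.
    for i in nullable:
        for j, first in enumerate(alt_firsts):
            if j != i:
                shared = (first - {EPS}) & follow
                if shared:
                    conflicts.append(("FIRST/FOLLOW", (j, i), shared))
    return conflicts


# term -> IDENTIFIER | IDENTIFIER '[' expression ']' | '(' expression ')'
print(ll1_conflicts([{"IDENTIFIER"}, {"IDENTIFIER"}, {"("}], set()))
# A -> 'a' | ε, with FOLLOW(A) = {'a'}
print(ll1_conflicts([{"a"}, {EPS}], {"a"}))
```

An empty result for every non-terminal means the prediction sets are disjoint, which is the LL(1) condition stated above.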

Solving the LL(1) conflicts
Two options:
Use a stronger parser
Make the grammar LL(1)

Making a grammar LL(1)
Manual labour:
rewrite the grammar
adjust the semantic actions
Three rewrite methods:
left factoring
substitution
left-recursion removal

Left-factoring
term → IDENTIFIER | IDENTIFIER '[' expression ']'
Factor out the common prefix:
term → IDENTIFIER after_identifier
after_identifier → ε | '[' expression ']'
'[' ∉ FOLLOW(after_identifier)
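Left factoring can be expressed as a small grammar transformation. The Python sketch below is a simplified illustration: it factors the longest common prefix out of the alternatives that share a first symbol and assumes only one such group exists. The function name and the list-of-symbols representation are assumptions.

```python
def left_factor(name, alternatives, new_name):
    """Factor the longest common prefix out of alternatives sharing a first symbol."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    rules = {name: []}
    for head, group in groups.items():
        if head is None or len(group) == 1:
            rules[name].extend(group)  # nothing to factor
            continue
        if new_name in rules:
            raise ValueError("this sketch handles a single common prefix only")
        prefix = []
        for column in zip(*group):  # walk the alternatives symbol by symbol
            if len(set(column)) != 1:
                break
            prefix.append(column[0])
        rules[name].append(prefix + [new_name])
        rules[new_name] = [alt[len(prefix):] for alt in group]  # [] is ε
    return rules


RULES = left_factor(
    "term",
    [["IDENTIFIER"], ["IDENTIFIER", "[", "expression", "]"]],
    "after_identifier",
)
for lhs, alts in RULES.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

The output reproduces the factored rules for term and after_identifier.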

Substitution
A → a | B c | ε
S → p A q
Replace the non-terminal by its alternatives:
S → p a q | p B c q | p q
Example:
S → A 'a' 'b'
A → 'a' | ε
Replace the non-terminal by its alternatives:
S → 'a' 'a' 'b' | 'a' 'b'

Left-recursion removal
Three types of left recursion:
Direct left recursion: N → N α | …
Indirect left recursion (chain structure): N → A …, A → B …, …, Z → N …
Hidden left recursion: N → α N | … (α can produce ε)

Left-recursion removal
N → N α | β
N generates β α α α ...
Replace by:
N → β M
M → α M | ε
Example:
expression → expression '-' term | term
becomes
expression → term expression_tail_option
expression_tail_option → '-' term expression_tail_option | ε
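The rewrite rule for direct left recursion is mechanical enough to code directly. The Python sketch below covers only the direct case (not indirect or hidden left recursion); the function name and grammar representation are assumptions for illustration.

```python
def remove_direct_left_recursion(name, alternatives, tail_name):
    """Rewrite N -> N α | β as N -> β M, M -> α M | ε (direct left recursion only)."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    betas = [alt for alt in alternatives if not alt or alt[0] != name]
    if not alphas:
        return {name: alternatives}  # nothing to remove
    return {
        name: [beta + [tail_name] for beta in betas],
        tail_name: [alpha + [tail_name] for alpha in alphas] + [[]],  # [] is ε
    }


RULES = remove_direct_left_recursion(
    "expression",
    [["expression", "-", "term"], ["term"]],
    "expression_tail_option",
)
for lhs, alts in RULES.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Applied to the expression rule, it yields the same two rules as the example above.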

Practice
Make the following grammar LL(1):
expression → expression '+' term | expression '-' term | term
term → term '*' factor | term '/' factor | factor
factor → '(' expression ')' | func-call | identifier | constant
func-call → identifier '(' expr-list? ')'
expr-list → expression ( ',' expression )*

Answers
Substitution:
F → '(' E ')' | ID '(' expr-list? ')' | ID | constant
Left factoring:
E → E ( '+' | '-' ) T | T
T → T ( '*' | '/' ) F | F
F → '(' E ')' | ID ( '(' expr-list? ')' )? | constant
Left-recursion removal:
E → T ( ( '+' | '-' ) T )*
T → F ( ( '*' | '/' ) F )*

Undoing the semantic effects of grammar transformations
While it is often possible to transform our grammar into a new grammar that is acceptable to a parser generator and that generates the same language, the new grammar usually assigns a different structure to strings in the language than our original grammar did.
Fortunately, in many cases we are not really interested in the structure but rather in the semantics implied by it.
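A small experiment shows what is at stake. In the Python sketch below, both functions accept the same flat token list (numbers separated by '+' or '-', an assumed input format), but one folds values as a loop for E → T (('+' | '-') T)* would, while the other follows the shape of a right-recursive tail rule naively.

```python
def evaluate_iterative(tokens):
    """E -> T (('+' | '-') T)*: fold each operand into the running value."""
    value = tokens[0]
    pos = 1
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        operand = tokens[pos + 1]
        value = value + operand if tokens[pos] == "+" else value - operand
        pos += 2
    return value


def evaluate_right_recursive(tokens):
    """Naive evaluation along the right-recursive tree: the tail is computed first."""
    if len(tokens) == 1:
        return tokens[0]
    rest = evaluate_right_recursive(tokens[2:])
    return tokens[0] + rest if tokens[1] == "+" else tokens[0] - rest


TOKENS = [9, "-", 3, "-", 2]
print(evaluate_iterative(TOKENS))        # (9 - 3) - 2
print(evaluate_right_recursive(TOKENS))  # 9 - (3 - 2)
```

The loop keeps the left-associative meaning of the original left-recursive grammar (4), whereas the naive right-recursive evaluation changes it (8). Carrying the running value along is one way to undo the semantic effect of the transformation.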

Semantics: non-left-recursive equivalent

Automatic conflict resolution (1)
There are two ways in which LL parsers can be strengthened:
By increasing the look-ahead. Distinguishing alternatives not by their first token but by their first two tokens is called LL(2). Disadvantage: the parser code can get much bigger.
By allowing dynamic conflict resolvers. When a conflict arises during parsing, some conditions are evaluated to resolve it. The parser generator LLgen requires a conflict resolver to be placed on the first of two conflicting alternatives.

Automatic conflict resolution (2)
If-else statement in C
else_tail_option: both its FIRST set and its FOLLOW set contain the token 'else'.
Conflict resolver

The LL(1) push-down automaton
Transition table for an LL(1) parser

Push-down automaton (PDA)
Types of moves:
Prediction move: the top of the prediction stack is a non-terminal N. N is removed from the stack, the prediction table is consulted, and the chosen alternative of N is pushed onto the prediction stack.
Match move: the top of the prediction stack is a terminal. It must equal the current input token; both are then consumed.
Termination: parsing terminates when the prediction stack is exhausted.
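The two kinds of moves translate into a short table-driven loop. The Python sketch below is an illustration: the transition table is reconstructed from the expression grammar of the PDA example that follows (its exact column layout is an assumption), and ll1_parse is a hypothetical name.

```python
# (non-terminal, look-ahead) -> alternative to push; [] is the empty alternative.
TABLE = {
    ("input", "IDENT"): ["expression", "EOF"],
    ("input", "("): ["expression", "EOF"],
    ("expression", "IDENT"): ["term", "rest-expr"],
    ("expression", "("): ["term", "rest-expr"],
    ("term", "IDENT"): ["IDENT"],
    ("term", "("): ["(", "expression", ")"],
    ("rest-expr", "+"): ["+", "expression"],
    ("rest-expr", ")"): [],
    ("rest-expr", "EOF"): [],
}
NONTERMINALS = {"input", "expression", "term", "rest-expr"}


def ll1_parse(tokens):
    """Run the LL(1) push-down automaton; return the list of moves made."""
    stack = ["input"]  # top of the prediction stack is the end of the list
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        look = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            # Prediction move: replace the non-terminal by its table entry.
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no prediction for {top} on {look}")
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
            moves.append(("predict", top, tuple(rhs)))
        else:
            # Match move: the terminal on the stack must equal the input token.
            if top != look:
                raise SyntaxError(f"expected {top}, found {look}")
            pos += 1
            moves.append(("match", top))
    if pos != len(tokens):
        raise SyntaxError("input left over after the stack was exhausted")
    return moves


# aap + ( noot + mies ) EOF, with identifiers already classified as IDENT
MOVES = ll1_parse(["IDENT", "+", "(", "IDENT", "+", "IDENT", ")", "EOF"])
for move in MOVES:
    print(move)
```

The parser itself is language-independent; only TABLE changes from one grammar to the next.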

Prediction move in an LL(1) PDA

Match move in an LL(1) PDA

Predictive parsing with an LL(1) PDA

PDA example
Input: aap + ( noot + mies ) EOF

Transition table (rows: state on top of the prediction stack; columns: look-ahead token; "-" marks an empty entry):

state        IDENT            +              (                )    EOF
input        expression EOF   -              expression EOF   -    -
expression   term rest-expr   -              term rest-expr   -    -
term         IDENT            -              ( expression )   -    -
rest-expr    -                + expression   -                ε    ε

PDA example (1), (2): prediction stack: input. Replace the non-terminal by its transition entry.
PDA example (3), (4): prediction stack: expression EOF. Replace the non-terminal by its transition entry.
PDA example (5), (6): prediction stack: term rest-expr EOF. Replace the non-terminal by its transition entry.
PDA example (7): please continue!

Example of parsing (i+i)+i

Another Example (1)

Another Example (2)

LL Parser Table

Trace of an LL(1) Parse

Obtaining LL(1) Grammars
Most LL(1) prediction conflicts can be grouped into two categories: common prefixes and left recursion.

Common Prefixes
Factoring method

Algorithm for Factoring

Left Recursion

Algorithm for Eliminating Left Recursion

LLgen
LLgen is part of the Amsterdam Compiler Kit.
It takes an LL(1) grammar plus semantic actions in C and generates a recursive descent parser.
The non-terminals in the grammar can have parameters, and rules can have local variables, both again expressed in C.
LLgen features:
repetition operators
advanced error handling
parameter passing
control over semantic actions
dynamic conflict resolvers

LLgen
Start from an LR(1) grammar.
Make the grammar LL(1); use repetition operators:

    %token DIGIT;
    main : [line]+ ;
    line : expr '\n' ;
    expr : term [ '+' term ]* ;
    term : factor [ '*' factor ]* ;
    factor : '(' expr ')' | DIGIT ;

Add semantic actions:
attach parameters to grammar rules
insert C code between the symbols

Minimal non-left-recursive grammar for expressions

LLgen code for a parser: grammar and semantics

LLgen code for a parser
The code from the previous slide resides in a file called parser.g. LLgen converts the file to one called parser.c, which contains a recursive descent parser.

LLgen interface to the lexical analyzer

LLgen interface to the back end
LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens.
LLmessage() is invoked to notify the lexical analyzer.