COMPILER CONSTRUCTION

Slides:



Advertisements
Similar presentations
Parsing 4 Dr William Harrison Fall 2008
Advertisements

A question from last class: construct the predictive parsing table for this grammar: S->i E t S e S | i E t S | a E -> B.
Chapter 2 Syntax. Syntax The syntax of a programming language specifies the structure of the language The lexical structure specifies how words can be.
Session 14 (DM62) / 15 (DM63) Recursive Descendent Parsing.
1 Contents Introduction A Simple Compiler Scanning – Theory and Practice Grammars and Parsing LL(1) Parsing LR Parsing Lex and yacc Semantic Processing.
Context-Free Grammars Lecture 7
Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.
Syntax and Semantics Structure of programming languages.
Parsing Chapter 4 Parsing2 Outline Top-down v.s. Bottom-up Top-down parsing Recursive-descent parsing LL(1) parsing LL(1) parsing algorithm First.
Parsing Jaruloj Chongstitvatana Department of Mathematics and Computer Science Chulalongkorn University.
Context-Free Grammars and Parsing
PART I: overview material
Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.
Lexical and Syntax Analysis
Review 1.Lexical Analysis 2.Syntax Analysis 3.Semantic Analysis 4.Code Generation 5.Code Optimization.
Syntax and Semantics Structure of programming languages.
COP4020 Programming Languages Parsing Prof. Xin Yuan.
Chapter 4 Top-Down Parsing Recursive-Descent Gang S. Liu College of Computer Science & Technology Harbin Engineering University.
Compiler Principle and Technology Prof. Dongming LU Mar. 27th, 2015.
Muhammad Idrees, Lecturer University of Lahore 1 Top-Down Parsing Top down parsing can be viewed as an attempt to find a leftmost derivation for an input.
Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.
Chapter 3 Context-Free Grammars and Parsing. The Parsing Process sequence of tokens syntax tree parser Duties of parser: Determine correct syntax Build.
Compiler Principle and Technology Prof. Dongming LU Mar. 12th, 2014.
Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Top-Down Parsing.
CS 330 Programming Languages 09 / 25 / 2007 Instructor: Michael Eckmann.
COMPILER CONSTRUCTION
Syntax and Semantics Structure of programming languages.
CSE 3302 Programming Languages
Chapter 3 – Describing Syntax
COMPILER CONSTRUCTION
4.1 Introduction - Language implementation systems must analyze
Chapter 4 - Parsing CSCE 343.
Programming Languages Translator
CS510 Compiler Lecture 4.
Chapter 4 Lexical and Syntax Analysis.
Lexical and Syntax Analysis
Lecture #12 Parsing Types.
Chapter 3 Context-Free Grammar and Parsing
Syntax Specification and Analysis
Parsing.
Top-down parsing cannot be performed on left recursive grammars.
Syntax Analysis Chapter 4.
CS 404 Introduction to Compiler Design
Chapter 4 Top-Down Parsing Part-1 September 8, 2018
Compiler Construction (CS-636)
4 (c) parsing.
Lexical and Syntax Analysis
Top-Down Parsing CS 671 January 29, 2008.
Lexical and Syntactic Analysis
CSC 4181 Compiler Construction Parsing
CSE 3302 Programming Languages
Lecture 8 Bottom Up Parsing
R.Rajkumar Asst.Professor CSE
LL and Recursive-Descent Parsing Hal Perkins Autumn 2011
Designing a Predictive Parser
LL and Recursive-Descent Parsing
Chapter 4: Lexical and Syntax Analysis Sangho Ha
Syntax Analysis - Parsing
Lexical and Syntax Analysis
LL and Recursive-Descent Parsing Hal Perkins Autumn 2009
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Lexical and Syntax Analysis
LL and Recursive-Descent Parsing Hal Perkins Winter 2008
Lexical and Syntax Analysis
COMPILER CONSTRUCTION
4.1 Introduction - Language implementation systems must analyze
Parsing CSCI 432 Computer Science Theory
Presentation transcript:

COMPILER CONSTRUCTION Principles and Practice Kenneth C. Louden

4. Top-Dowm Parsing PART ONE

The outline of this chapter

Concept of Top-Down Parsing(1) It parses an input string of tokens by tracing out the steps in a leftmost derivation. And the implied traversal of the parse tree is a preorder traversal and, thus, occurs from the root to the leaves. The example: number + number, and corresponds to the parse tree exp op + number

Concept of Top-Down Parsing(2) The example: number + number, and corresponds to the parse tree The above parse tree is corresponds to the leftmost derivations: (1) exp => exp op exp (2) => number op exp (3) => number + exp (4) => number + number exp op + number

Two forms of Top-Down Parsers Predictive parsers: attempts to predict the next construction in the input string using one or more look-ahead tokens Backtracking parsers: try different possibilities for a parse of the input, backing up an arbitrary amount in the input if one possibility fails. It is more powerful but much slower, unsuitable for practical compilers.

Two kinds of Top-Down parsing algorithms Recursive-descent parsing: is quite versatile and suitable for a handwritten parser. LL(1) parsing: The first “L” refers to the fact that it processes the input from left to right; The second “L” refers to the fact that it traces out a leftmost derivation for the input string; The number “1” means that it uses only one symbol of input to predict the direction of the parse.

Other Contents Look-Ahead Sets A TINY Parser Error recovery methods First and Follow sets: are required by both recursive-descent parsing and LL(1) parsing. A TINY Parser It is constructed by recursive-descent parsing algorithm. Error recovery methods The error recovery methods used in Top-Down parsing will be described.

Contents PART ONE 4.1 Top-Down Parsing by Recursive-Descent [More] 4.2 LL(1) Parsing [More] PART TWO 4.3 First and Follow Sets 4.4 A Recursive-Descent Parser for the TINY Language 4.5 Error Recovery in Top-Down Parsers

4.1 Top-Down Parsing by Recursive-Descent

4.1.1 The Basic Method of Recursive-Descent

The idea of Recursive-Descent Parsing Viewing the grammar rule for a non-terminal A as a definition for a procedure to recognize an A The right-hand side of the grammar for A specifies the structure of the code for this procedure The Expression Grammar: exp → exp addop term∣term addop → + ∣- term → term mulop factor ∣ factor mulop →* factor →(exp) ∣ number

A recursive-descent procedure that recognizes a factor procedure factor begin case token of ( : match( ( ); exp; match( )); number: match (number); else error; end case; end factor The token keeps the current next token in the input (one symbol of look-ahead) The Match procedure matches the current next token with its parameters, advances the input if it succeeds, and declares error if it does not

Match Procedure The Match procedure matches the current next token with its parameters, advances the input if it succeeds, and declares error if it does not procedure match( expectedToken); begin if token = expectedToken then getToken; else error; end if; end match

Requiring the Use of EBNF The corresponding EBNF is exp  term { addop term } addop + | - term  factor { mulop factor } mulop * factor  ( exp ) | numberr Writing recursive-decent procedure for the remaining rules in the expression grammar is not as easy for factor

The corresponding syntax diagrams addop + - exp term addop term factor mulop * mulop factor ( exp ) number

4.1.2 Repetition and Choice: Using EBNF

An Example The grammar rule for an if-statement: procedure ifstmt; begin match( if ); match( ( ); exp; match( ) ); statement; if token = else then match (else); end if; end ifstmt; The grammar rule for an if-statement: If-stmt → if ( exp ) statement ∣ if ( exp ) statement else statement Could not immediately distinguish the two choices because the both start with the token if Put off the decision until we see the token else in the input

The EBNF of the if-statement If-stmt → if ( exp ) statement [ else statement] Square brackets of the EBNF are translated into a test in the code for ifstmt. if token = else then match (else); statement; endif; Notes EBNF notation is designed to mirror closely the actual code of a recursive-descent parser, So a grammar should always be translated into EBNF if recursive-descent is to be used. It is natural to write a parser that matches each else token as soon as it is encountered in the input

EBNF for Simple Arithmetic Grammar(1) The EBNF rule for exp → exp addop term∣term exp → term {addop term} Where, the curly bracket expressing repetition can be translated into the code for a loop: procedure exp; begin term; while token = + or token = - do match(token); end while; end exp;

EBNF for Simple Arithmetic Grammar(2) The EBNF rule for term: term → factor {mulop factor} Becomes the code procedure term; begin factor; while token = * do match(token); end while; end exp;

Left associatively implied by the curly bracket The left associatively implied by the curly bracket (and explicit in the original BNF) can still be maintained within this code function exp: integer; var temp: integer; begin temp:=term; while token=+ or token = - do case token of + : match(+); temp:=temp+term; -:match(-); temp:=temp-term; end case; end while; return temp; end exp;

A working simple calculator in C code(1) /*Simple integer arithmetic calculator according to the EBNF; <exp> → <term> { <addop> <term>} <addop> → + ∣ - <term>→ <factor> { <mulop> <factor> } <mulop> → * <factor> → ( <exp> ) ∣ Number inputs a line of text from stdin outputs “error” or the result. */

A working simple calculator in C code(2) #include <stdio.h> char token; /* global token variable */ /*function prototype for recursive calls*/ int exp(void); int term(void); int factor(void); void error(void) {fprint(stderr, “error\n”); exit(1); }

A working simple calculator in C code(3) void match(char expectedToken) {if (token==expectedToken) token=getchar(); else error(); } main() { int result; token=getchar();/*load token with first character for lookahead*/ result=exp(); if (token==’\n’) /*check for end of line*/ printf(“Result = %d\n”, result); else error(); /*extraneous chars on line*/ return 0;

A working simple calculator in C code(4) int exp(void) { int temp =term(); while ((token==’+’) || token==’-‘)) switch (token) { case ‘+’: match (‘+’); temp+=term(); break; case ‘-‘: match (‘-‘); temp-=term(); } return temp;

A working simple calculator in C code(5) int term(void) {int temp=factor(); while (token==’*’){ match(‘*’); temp*=factor(); } return temp;

A working simple calculator in C code(5) int factor(void) { int temp; if (token==’(‘) { match (‘(‘); temp = exp(); match(‘)’); } else if (isdigit(token)){ ungetc(token,stdin); scanf(“%d”,&temp); token = getchar(); else error(); return temp;

Some Notes The method of turning grammar rule in EBNF into code is quite powerful. There are a few pitfalls, and care must be taken in scheduling the actions within the code. In the previous pseudo-code for exp: (1) The match of operation should be before repeated calls to term; (2) The global token variable must be set before the parse begins; (3) The getToken must be called just after a successful test of a token

Construction of the syntax tree The expression: 3+4+5 + 5 3 4

The pseudo-code for constructing the syntax tree(1) function exp : syntaxTree; var temp, newtemp: syntaxTree; begin temp:=term; while token=+ or token = - do case token of + : match(+); newtemp:=makeOpNode(+); leftChild(newtemp):=temp; rightChild(newtemp):=term; temp=newtemp;

The pseudo-code for constructing the syntax tree(2) -:match(-); newtemp:=makeOpNode(-); leftChild(newtemp):=temp; rightChild(newtemp):=term; temp=newtemp; end case; end while; return temp; end exp;

A simpler one function exp : syntaxTree; var temp, newtemp: syntaxTree; begin temp:=term; while token=+ or token = - do newtemp:=makeOpNode(token); match(token); leftChild(newtemp):=temp; rightChild(newtemp):=term; temp=newtemp; end while; return temp; end exp;

The pseudo-code for the if-statement procedure (1) function ifstatement: syntaxTree; var temp:syntaxTree; begin match(if); match((); temp:= makeStmtNode(if); testChild(temp):=exp; match()); thenChild(temp):=statement;

The pseudo-code for the if-statement procedure (2) if token= else then match(else); elseChild(temp):=statement; else ElseChild(temp):=nil; end if; end ifstatement

4.1.3 Further Decision Problems

More formal methods to deal with complex situation (1) It may be difficult to convert a grammar in BNF into EBNF form; (2) It is difficult to decide when to use the choice A →αand the choice A →β; if both α andβ begin with non-terminals. Such a decision problem requires the computation of the First Sets.

More formal methods to deal with complex situation (3) It may be necessary to know what token legally coming from the non-terminal A, in writing the code for an ε-production: A→ε. Such tokens indicate A may disappear at this point in the parse. This set is called the Follow Set of A. (4) It requires computing the First and Follow sets in order to detect the errors as early as possible. Such as “)3-2)”, the parse will descend from exp to term to factor before an error is reported.

4.2 LL(1) PARSING

4.2.1 The Basic Method of LL(1) Parsing

Main idea LL(1) Parsing uses an explicit stack rather than recursive calls to perform a parse An example: a simple grammar for the strings of balanced parentheses: S→(S) S∣ε The following table shows the actions of a top-down parser given this grammar and the string ( )

Table of Actions Steps Parsing Stack Input Action 1 $S ( ) $ S→(S) S 2 match 3 $S)S )$ S→ε 4 $S) 5 $ 6 accept

General Schematic A top-down parser begins by pushing the start symbol onto the stack It accepts an input string if, after a series of actions, the stack and the input become empty A general schematic for a successful top-down parse: $ StartSymbol Inputstring$ … … //one of the two actions … … //one of the two actions $ $ accept

Two Actions The two actions Generate: Replace a non-terminal A at the top of the stack by a string α(in reverse) using a grammar rule A →α, and Match: Match a token on top of the stack with the next input token. The list of generating actions in the above table: S => (S)S [S→(S) S] => ( )S [S→ε] => ( ) [S→ε] Which corresponds precisely to the steps in a leftmost derivation of string ( ). This is the characteristic of top-down parsing.

4.2.2 The LL(1) Parsing Table and Algorithm

4.2.3 Left Recursion Removal and Left Factoring

4.2.4 Syntax Tree Construction in LL(1) Parsing

End of Part One THANKS