Compiler Design (BMZ), Chapter 4: Syntax Analysis

Presentation transcript:

Slide 1: Chapter 4: Syntax Analysis

Slide 2: Syntax Analysis
Compiler phases, in order: Source Program, Lexical Analyser, Syntax Analyser, Semantic Analyser, Intermediate Code Generator, Code Optimiser, Code Generator, Target Program.
The Symbol Table Manager and the Error Handler are shown alongside the phases.

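Slide 3: Where is Syntax Analysis?
Lexical Analysis or Scanner: the source text "if (b == 0) a = b;" becomes the token stream if ( b == 0 ) a = b ;
Syntax Analysis or Parsing: the token stream becomes an abstract syntax tree or parse tree.
[Figure: tree with root "if", whose children are "==" (over b and 0) and "=" (over a and b)]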

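Slide 4: The Role of the Parser (p. 160)
- The Lexical Analyzer reads the source program and returns a token each time the Parser issues "get next token".
- The Parser passes a syntax tree to the Semantic Analyzer, which produces the intermediate representation.
- A symbol table is shared with the Lexical Analyzer and the Parser.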

Slide 5: Declaration Section
User types: as in flex, these go in a section bracketed by "%{" and "%}".
Tokens (terminal symbols of the grammar):
- %token terminal1 terminal2 ...
  Token values are assigned sequentially, after all ASCII characters.
- or %token terminal1 val1 terminal2 val2 ...
Tip: use the '-d' option in bison to get foo.tab.h, which contains the token definitions and can be included in the flex file.

Slide 6: Declaration continued
Start symbol:
- %start non-terminal
Associativity (left, right or none):
- %left TK_PLUS
- %right TK_EXPONENT
- %nonassoc TK_LESSTHAN
Precedence:
- The order of the directives specifies precedence: tokens declared in later directives have higher precedence.
- %prec changes the precedence of a rule.

Slide 7: Declaration continued
Attribute values: information associated with every terminal/non-terminal symbol, passed from the lexer.
- %union { int ival; char *name; double dval; }
- The union becomes YYSTYPE.
Symbol attributes (types of non-terminals):
- %type <member> non_terminal, where member is a field of the %union
- e.g. %type <ival> IntNumber

Slide 8: Values Used by yyparse()
Error function:
- yyerror(char *s);
Last token value:
- yylval, of type YYSTYPE (the %union declaration)
Setting yylval in flex:
- [a-z] { yylval.ival = yytext[0] - 'a'; return TK_NAME; }
yylval is then available in bison, but in a strange way: actions reach it through the $n variables rather than by name.

Slide 9: Rules Section
Every name that appears without having been declared is a non-terminal.
Productions:
- non-terminal : first_production | second_production | ... ;
- An ε production has the form non-terminal : ;
  Thus you can write foo : production1 | /* nothing */ ;
- Adding actions: non-terminal : RHS {action routine} ;
  The action is called before the LHS is pushed on the parse stack.

Slide 10: Attribute Values (aka $ vars)
Each terminal/non-terminal has one, denoted by $n, where n is its position in the rule, starting at 1.
- $$ = LHS
- $1 = first symbol of the RHS
- $2 = second symbol, etc.
- Note: semantic actions have values too! In A : B {...} C {...} ; the first action occupies position 2, so C's value is denoted by $3.

Slide 11: Example
    %union {
        int value;
        char *symbol;
    }
    %type <value> exp term factor
    %type <symbol> ident
    ...
    exp : exp '+' term { $$ = $1 + $3; } ;    /* Note, $1 and $3 are ints here */
    factor : ident { $$ = lookup(symbolTable, $1); } ;    /* Note, $1 is a char* here */
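
The effect of an action such as {$$ = $1 + $3;} can be pictured with a value stack: on a reduction the values of the right-hand-side symbols are popped and the action's result is pushed in their place. The following is a minimal Python sketch of that idea only; `reduce_rule` and the sample stack contents are illustrative assumptions, not bison's actual implementation.

```python
def reduce_rule(value_stack, rhs_len, action):
    """Pop rhs_len values ($1..$n), run the action, push its result ($$)."""
    split = len(value_stack) - rhs_len
    args = value_stack[split:]       # $1 .. $n, in rule order
    del value_stack[split:]
    value_stack.append(action(*args))


# Values for the right-hand side of  exp : exp '+' term  are already stacked.
stack = [10, "+", 32]
reduce_rule(stack, 3, lambda d1, d2, d3: d1 + d3)   # mirrors {$$ = $1 + $3;}
print(stack)  # [42]
```

The lambda's parameters play the role of $1, $2 and $3; the '+' token has a value slot too, which the action simply ignores.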

Slide 12: Conflict
Bison reports the number of shift/reduce and reduce/reduce conflicts found.
Shift/reduce conflicts:
- Occur when there are 2 possible parses for an input string: one parse completes a rule (reduce) and one does not (shift).
- Example: e : 'X' | e '+' e ;
  "X+X+X" has 2 possible parses: "(X+X)+X" or "X+(X+X)".

Slide 13: Conflict continued
A reduce/reduce conflict occurs when the same token could complete 2 different rules.
- Example: prog : proga | progb ; proga : 'X' ; progb : 'X' ;
  "X" can be either a proga or a progb.
- Ambiguous grammar!

Slide 14: Parsing Analogy
Syntax analysis for natural languages:
- Recognize whether a sentence is grammatically correct.
- Identify the function of each word.
Example: "I gave him the book"
- sentence = subject + verb + indirect object + object
- subject = I, verb = gave, indirect object = him
- object = noun phrase = article (the) + noun (book)

Slide 15: Overview
Goal: determine whether the input token stream satisfies the syntax of the program.
What do we need to do this?
- An expressive way to describe the syntax
- A mechanism that determines whether the input token stream satisfies the syntax description
For lexical analysis:
- Regular expressions describe tokens
- Finite automata = mechanisms to generate tokens from the input stream

Slide 16: Use Regular Expressions?
REs can expressively describe tokens.
- Easy to implement via DFAs
So why not use them to describe the syntax of a programming language?
- No: they do not have enough power to express any non-trivial syntax.
- Example: nested constructs (blocks, expressions, statements). Detecting balanced braces such as {{} {} {{} { }}} or { {{{{ }}}}} requires counting to arbitrary depth, which a finite automaton cannot do.
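
To make the limitation concrete, here is a small Python sketch of a balanced-brace check. The function name `balanced` is an illustrative assumption. The point is the `depth` counter: it can grow without bound, which is exactly the memory that a DFA with a fixed number of states does not have.

```python
def balanced(text: str) -> bool:
    """Return True if the braces in text are properly nested and matched."""
    depth = 0  # unbounded counter: the state a finite automaton cannot keep
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' with no matching '{' before it
                return False
    return depth == 0


print(balanced("{{} {} {{} { }}}"))  # True
print(balanced("{{}"))               # False
```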

Slide 17: Context-Free Grammars
Consist of 4 components:
- Terminal symbols = tokens or ε
- Nonterminal symbols = syntactic variables
- Start symbol S = special non-terminal
- Productions of the form LHS → RHS
  LHS = single non-terminal
  RHS = string of terminals and non-terminals
  Productions specify how non-terminals may be expanded.
The language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions.
- L(G) = language generated by grammar G
Example:
  S → a S a
  S → T
  T → b T b
  T → ε
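
As a rough illustration of "repeatedly applying the productions", the Python sketch below stores the example grammar as a dictionary and enumerates the short strings of L(G). The encoding (uppercase letters as nonterminals, the empty string for ε) and the name `sentences` are assumptions of this sketch; the pruning step relies on the fact that no production here deletes a terminal.

```python
from collections import deque

# Example grammar: S -> a S a | T ;  T -> b T b | epsilon
# Keys are nonterminals; "" stands for epsilon (sketch-only encoding).
GRAMMAR = {
    "S": ["aSa", "T"],
    "T": ["bTb", ""],
}


def sentences(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal, if any remains.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            found.add(form)  # only terminals left: a sentence
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so too many of them is a dead end.
            terminals = sum(1 for c in new if c not in grammar)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))


print(sentences(GRAMMAR, "S", 4))
# ['', 'aa', 'bb', 'aaaa', 'abba', 'bbbb']
```

Every generated string has the shape a^n b^2m a^n, which is the language this grammar describes.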

Slide 18: Context-Free Grammars continued
- A set of terminals: basic symbols from which sentences are formed
- A set of nonterminals: syntactic categories denoting sets of sentences
- A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences
- The start symbol: a distinguished nonterminal denoting the language

Slide 19: Notational Conventions
To avoid always having to state that "these are the terminals", "these are the nonterminals", and so on, we shall employ the following notational conventions with regard to grammars throughout the remainder of this subject.
Written out in full, a grammar looks like this:
- Terminals: id, +, -, *, /, (, )
- Nonterminals: expr, op
- Productions:
  expr → expr op expr
  expr → ( expr )
  expr → - expr
  expr → id
  op → + | - | * | /
- The start symbol: expr

Slide 20: Notational Conventions continued
1. These symbols are terminals:
   i) Lower-case letters early in the alphabet, such as a, b, c.
   ii) Operator symbols such as +, -, etc.
   iii) Punctuation symbols such as parentheses, comma, etc.
   iv) The digits 0, 1, ..., 9.
   v) Boldface strings such as id or if.
2. These symbols are nonterminals:
   i) Upper-case letters early in the alphabet, such as A, B, C.
   ii) The letter S, which, when it appears, is usually the start symbol.
   iii) Lower-case italic names such as expr or stmt.

Slide 21: Notational Conventions continued
3. Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either nonterminals or terminals.
4. Lower-case letters late in the alphabet, chiefly u, v, ..., z, represent strings of terminals.
5. Lower-case Greek letters, α, β, γ, ..., represent strings of grammar symbols. Thus, a generic production could be written as A → α, indicating that there is a single nonterminal A on the left of the arrow (the left side of the production) and a string of grammar symbols α to the right of the arrow (the right side of the production).

Slide 22: Notational Conventions continued
6. If A → α1, A → α2, ..., A → αk are all productions with A on the left (we call them A-productions), we may write A → α1 | α2 | ... | αk. We call α1, α2, ..., αk the alternatives for A.
7. The left side of the first production is the start symbol.
Example:
  E → E A E | ( E ) | - E | id
  A → + | - | * | /

Slide 23: Derivations
Grammar:
  E → E A E | ( E ) | - E | id
  A → + | - | * | /
A derivation step is an application of a production as a rewriting rule: E ⇒ - E
A sequence of derivation steps E ⇒ - E ⇒ - ( E ) ⇒ - ( id ) is called a derivation of "- ( id )" from E.
- The symbol ⇒* denotes: derives in zero or more steps, e.g. E ⇒* - ( id )
- The symbol ⇒+ denotes: derives in one or more steps, e.g. E ⇒+ - ( id )
Properties of ⇒*:
1. α ⇒* α for any string α
2. If α ⇒* β and β ⇒ γ, then α ⇒* γ

Slide 24: Example
Grammar for the balanced-parentheses language:
- S → ( S ) S
- S → ε
1 non-terminal: S. 2 terminals: "(", ")". Start symbol: S. 2 productions.
If the grammar accepts a string, there is a derivation of that string using the productions:
- "(())"
- S ⇒ (S) ε ⇒ ((S) S) ε ⇒ ((ε) ε) ε = (())
Question: why is the final S required?
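
A recursive-descent recognizer follows the two productions almost literally. The Python sketch below is one possible rendering; the names `parse_s` and `accepts` are assumptions made for illustration. It also suggests one answer to the slide's question: the trailing S is what lets groups sit side by side, as in "()()"; with S → ( S ) | ε alone only purely nested strings such as "(())" would be derivable.

```python
def parse_s(text: str, pos: int = 0) -> int:
    """Match S -> ( S ) S | epsilon from pos; return the position after S."""
    if pos < len(text) and text[pos] == "(":
        pos = parse_s(text, pos + 1)            # inner S
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"expected ')' at position {pos}")
        return parse_s(text, pos + 1)           # trailing S
    return pos                                  # S -> epsilon


def accepts(text: str) -> bool:
    """True if the whole input can be derived from S."""
    try:
        return parse_s(text) == len(text)
    except SyntaxError:
        return False


print(accepts("(())"))  # True
print(accepts("()()"))  # True, thanks to the trailing S
print(accepts("(()"))   # False
```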

Slide 25: More on CFGs
Shorthand notation: a vertical bar separates multiple productions for the same non-terminal.
  S → a S a | T
  T → b T b | ε
CFGs are powerful enough to express the syntax of most programming languages.
Derivation = successive application of productions starting from S.
Acceptance = determining whether there is a derivation for an input token stream.

Slide 26: RE is a Subset of CFG
A grammar can be built inductively for each RE:
  RE          Grammar
  ε           S → ε
  a           S → a
  R1 R2       S → S1 S2
  R1 | R2     S → S1 | S2
  R1*         S → S1 S | ε
where G1 = grammar for R1, with start symbol S1, and G2 = grammar for R2, with start symbol S2.
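
The inductive construction can be written as a short recursive function. In this Python sketch the regular expression is given as a nested tuple; that encoding, the name `re_to_cfg`, and the generated nonterminal names S1, S2, ... are all assumptions for illustration. Each case mirrors one row of the construction.

```python
from itertools import count


def re_to_cfg(regex):
    """Translate a regex AST into (start_symbol, productions).

    AST nodes (sketch-only encoding): ("eps",), ("sym", "a"),
    ("cat", r1, r2), ("alt", r1, r2), ("star", r1).
    productions maps a nonterminal to a list of right-hand sides,
    each a list of symbols; [] is the epsilon right-hand side.
    """
    productions = {}
    fresh = count(1)

    def build(node):
        s = f"S{next(fresh)}"               # fresh start symbol for this node
        kind = node[0]
        if kind == "eps":
            productions[s] = [[]]            # S -> epsilon
        elif kind == "sym":
            productions[s] = [[node[1]]]     # S -> a
        elif kind == "cat":
            s1, s2 = build(node[1]), build(node[2])
            productions[s] = [[s1, s2]]      # S -> S1 S2
        elif kind == "alt":
            s1, s2 = build(node[1]), build(node[2])
            productions[s] = [[s1], [s2]]    # S -> S1 | S2
        elif kind == "star":
            s1 = build(node[1])
            productions[s] = [[s1, s], []]   # S -> S1 S | epsilon
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return s

    return build(regex), productions


# (a|b)* written as an AST
start, prods = re_to_cfg(("star", ("alt", ("sym", "a"), ("sym", "b"))))
print(start)
for lhs, alternatives in prods.items():
    print(lhs, "->", alternatives)
```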

Slide 27: Context-Free Languages
- A context-free language L(G) is the language defined by a context-free grammar G.
- A string of terminals w is in L(G) if and only if S ⇒+ w; w is called a sentence of G.
- If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G.
  Example: E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
- G1 is equivalent to G2 if L(G1) = L(G2).

Slide 28: Parser
A parser takes two inputs:
- a context-free grammar, G
- a token stream, s (from the lexer)
and answers: yes, if s is in L(G); no, otherwise, with error messages.
Syntax analyzers (parsers) = CFG acceptors which also output the corresponding derivation when the token stream is accepted.
Various kinds: LL(k), LR(k), SLR, LALR

Slide 29: Left- & Right-most Derivations
Each derivation step needs to choose:
- a nonterminal to rewrite
- a production to apply
A leftmost derivation always chooses the leftmost nonterminal to rewrite:
  E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the rightmost nonterminal to rewrite:
  E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
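
The two derivations differ only in which nonterminal is picked at each step, so one small function can replay both. This Python sketch uses the production E → E + E directly, as the derivations above do; the names `derive`, `NEG`, `PAREN`, `PLUS` and `ID` are illustrative assumptions.

```python
NONTERMINALS = {"E"}


def derive(start, steps, leftmost=True):
    """Apply each production to the leftmost (or rightmost) nonterminal.

    A sentential form is a list of symbols; each step is (lhs, rhs_symbols).
    Returns every sentential form visited, rendered as a string.
    """
    form = [start]
    trace = [" ".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"cannot apply {lhs} -> {rhs} at position {i}")
        form[i:i + 1] = rhs          # rewrite the chosen nonterminal in place
        trace.append(" ".join(form))
    return trace


NEG = ("E", ["-", "E"])
PAREN = ("E", ["(", "E", ")"])
PLUS = ("E", ["E", "+", "E"])
ID = ("E", ["id"])
STEPS = [NEG, PAREN, PLUS, ID, ID]

print(" => ".join(derive("E", STEPS, leftmost=True)))
print(" => ".join(derive("E", STEPS, leftmost=False)))
```

Both calls use the same productions and end in the same sentence; only the second-to-last sentential form differs ("- ( id + E )" versus "- ( E + id )").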

Slide 30: Parse Trees
- A parse tree is a graphical representation for a derivation that filters out the order of choosing nonterminals for rewriting.
- Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation.
  E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
  E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
[Figure: parse tree for - ( id + id ). The root E has children - and E; that E has children (, E, ); that E has children E, +, E; each of those derives id.]

Slide 31: Parse Tree vs Abstract Syntax Tree
Grammar:
  S → E + S | E
  E → number | ( S )
Derive: (1 + 2 + (3 + 4)) + 5
Parse tree = tree representation of the derivation
- Leaves of the tree are terminals
- Internal nodes are non-terminals
- No information about the order of the derivation steps
AST discards (abstracts) unneeded information: a more compact format.
[Figure: parse tree and AST for (1 + 2 + (3 + 4)) + 5]
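
A small recursive-descent parser for this grammar shows what "discarding unneeded information" means in practice. The Python sketch below is an illustration under assumptions: the names `tokenize` and `parse` and the tuple encoding of the AST are not from the slides, and characters other than digits, "+", "(" and ")" are silently skipped. Parentheses and the S/E nonterminals leave no trace in the result.

```python
import re


def tokenize(text):
    """Split input into numbers and the symbols + ( ); other text is skipped."""
    return re.findall(r"\d+|[+()]", text)


def parse(text):
    """Parse with S -> E + S | E and E -> number | ( S ); return an AST.

    The AST is an int for a number, or ("+", left, right) for a sum.
    """
    tokens = tokenize(text)
    pos = 0

    def parse_s():
        nonlocal pos
        left = parse_e()
        if pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            return ("+", left, parse_s())    # S -> E + S (right-recursive)
        return left                           # S -> E

    def parse_e():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_s()                 # E -> ( S )
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner                      # the parentheses are dropped
        if tok.isdigit():
            return int(tok)                   # E -> number
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_s()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse("(1 + 2 + (3 + 4)) + 5"))
# ('+', ('+', 1, ('+', 2, ('+', 3, 4))), 5)
```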

Slide 32: Derivation Order
Productions can be applied in any order: select a non-terminal and substitute the RHS of a production.
Two standard orders: leftmost and rightmost.
Leftmost derivation:
- In the string, find the leftmost non-terminal and apply a production to it
- E + S ⇒ 1 + S
Rightmost derivation:
- Same, but find the rightmost non-terminal
- E + S ⇒ E + E + S

Slide 33: Leftmost & Rightmost Derivation
Grammar:
  S → E + S | E
  E → number | ( S )
Leftmost derivation of (1 + 2 + (3 + 4)) + 5:
  S ⇒ E+S ⇒ (S)+S ⇒ (E+S)+S ⇒ (1+S)+S ⇒ (1+E+S)+S ⇒ (1+2+S)+S ⇒ (1+2+E)+S ⇒ (1+2+(S))+S ⇒ (1+2+(E+S))+S ⇒ (1+2+(3+S))+S ⇒ (1+2+(3+E))+S ⇒ (1+2+(3+4))+S ⇒ (1+2+(3+4))+E ⇒ (1+2+(3+4))+5
Rightmost derivation of (1 + 2 + (3 + 4)) + 5:
  S ⇒ E+S ⇒ E+E ⇒ E+5 ⇒ (S)+5 ⇒ (E+S)+5 ⇒ (E+E+S)+5 ⇒ (E+E+E)+5 ⇒ (E+E+(S))+5 ⇒ (E+E+(E+S))+5 ⇒ (E+E+(E+E))+5 ⇒ (E+E+(E+4))+5 ⇒ (E+E+(3+4))+5 ⇒ (E+2+(3+4))+5 ⇒ (1+2+(3+4))+5
Result: same parse tree, same productions chosen, but in a different order.

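Slide 34: Ambiguous Grammar
- A grammar is ambiguous if it produces more than one parse tree for some sentence.
Two derivations of id + id * id that give different parse trees:
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[Figure: two parse trees, one with + at the root, grouping id + (id * id), and one with * at the root, grouping (id + id) * id]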
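
Ambiguity can be checked mechanically for a given sentence by counting its parse trees. The Python sketch below does this for the grammar E → E + E | E * E | id, which is the grammar the two derivations above use; the function name `count_parse_trees` is an assumption, and the approach (try every operator as the root, then recurse on both sides) is specific to this grammar rather than a general parser.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for E -> E + E | E * E | id over a token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # Operator at k is the root of the tree: E -> E op E.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.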

Slide 35: Resolving Ambiguity
- Use disambiguating rules to throw away undesirable parse trees.
- Rewrite grammars by incorporating disambiguating rules into the grammars.

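Slide 36: Example
The dangling-else grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other
Sentence with two parse trees: if E1 then if E2 then S1 else S2
[Figure: two parse trees. In one, "else S2" attaches to the inner "if E2 then S1"; in the other, it attaches to the outer "if E1 then ...".]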

Slide 37: Disambiguating Rules
- Rule: match each else with the closest previous unmatched then.
- Remove undesired state transitions in the pushdown automaton.

Slide 38: Grammar Rewriting
  stmt → m_stmt | unm_stmt
  m_stmt → if expr then m_stmt else m_stmt
         | other
  unm_stmt → if expr then stmt
           | if expr then m_stmt else unm_stmt