Syntax Analysis (Chapter 4)


Course Overview

PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  8 Interpretation
  9 Review

Systematic Development of a Recursive Descent Parser

(1) Express the grammar in EBNF.
(2) Apply grammar transformations: left factorization and left recursion elimination.
(3) Create a parser class with
    – a private variable currentToken
    – methods to call the scanner: accept and acceptIt
(4) Implement a public method for the main function to call:
    – a public parse method that
      fetches the first token from the scanner,
      calls parseS (where S is the start symbol of the grammar),
      verifies that the scanner next produces the end-of-file token
(5) Implement private parsing methods:
    – add a private parseN method for each non-terminal N

Developing an RD Parser for Mini Triangle

Before we begin: the following non-terminals are recognized by the scanner. They will be returned as tokens by the scanner.

    Identifier      ::= Letter (Letter | Digit)*
    Integer-Literal ::= Digit Digit*
    Operator        ::= + | - | * | / | < | > | =
    Comment         ::= ! Graphic* eol

Assume the scanner returns instances of this class:

    public class Token {
        byte kind;
        String spelling;
        final static byte IDENTIFIER = 0, INTLITERAL = 1; ...
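To make the scanner's role concrete, here is a minimal sketch of a scanner for the four lexical rules above. It is an illustration only, not the course's scanner: the class name MiniScanner, the OPERATOR and EOT kinds, and the exact operator set are assumptions, and comments are simply skipped rather than returned as tokens.

```java
// Token kinds IDENTIFIER and INTLITERAL come from the slide; OPERATOR and EOT are assumed.
class Token {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2, EOT = 3;
    final byte kind;
    final String spelling;
    Token(byte kind, String spelling) { this.kind = kind; this.spelling = spelling; }
}

// Hypothetical scanner: each call to scan() returns the next token of the source text.
class MiniScanner {
    private final String src;
    private int pos = 0;

    MiniScanner(String src) { this.src = src; }

    Token scan() {
        skipSeparators();
        if (pos >= src.length()) return new Token(Token.EOT, "");
        char c = src.charAt(pos);
        int start = pos;
        if (Character.isLetter(c)) {             // Identifier ::= Letter (Letter | Digit)*
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            return new Token(Token.IDENTIFIER, src.substring(start, pos));
        }
        if (Character.isDigit(c)) {              // Integer-Literal ::= Digit Digit*
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new Token(Token.INTLITERAL, src.substring(start, pos));
        }
        if ("+-*/<>=".indexOf(c) >= 0) {         // Operator (assumed single-character set)
            pos++;
            return new Token(Token.OPERATOR, String.valueOf(c));
        }
        throw new IllegalArgumentException("unexpected character '" + c + "' at position " + pos);
    }

    // Skips white space and comments (Comment ::= ! Graphic* eol).
    private void skipSeparators() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '!') {
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else {
                break;
            }
        }
    }
}
```

Calling scan() repeatedly on a source text yields one token per call and keeps returning an EOT token once the input is exhausted, which is the behaviour the parser's accept and acceptIt methods rely on.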

(1)&(2) Developing an RD Parser for Mini Triangle

    Program        ::= single-Command
    Command        ::= single-Command
                     | Command ; single-Command
    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | if Expression then single-Command else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end
    V-name         ::= Identifier
    ...

Command: left recursion elimination needed.
single-Command: left factorization needed.

(1)&(2) Express grammar in EBNF and transform

After factorization etc. we get:

    Program        ::= single-Command
    Command        ::= single-Command (; single-Command)*
    single-Command ::= Identifier ( := Expression | ( Expression ) )
                     | if Expression then single-Command else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end
    V-name         ::= Identifier
    ...

(1)&(2) Developing an RD Parser for Mini Triangle

    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )
    Declaration        ::= single-Declaration
                         | Declaration ; single-Declaration
    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter
    Type-denoter       ::= Identifier

Expression and Declaration: left recursion elimination needed.

(1)&(2) Express grammar in EBNF and transform

After factorization and recursion elimination:

    Expression         ::= primary-Expression ( Operator primary-Expression )*
    primary-Expression ::= Integer-Literal
                         | Identifier
                         | Operator primary-Expression
                         | ( Expression )
    Declaration        ::= single-Declaration (; single-Declaration)*
    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter
    Type-denoter       ::= Identifier

(3)&(4) Create a parser class and public parse method

    public class Parser {

        private Token currentToken;

        private void accept(byte expectedKind) {
            if (currentToken.kind == expectedKind)
                currentToken = scanner.scan();
            else
                report syntax error
        }

        private void acceptIt() {
            currentToken = scanner.scan();
        }

        public void parse() {
            acceptIt();        // get the first token
            parseProgram();    // Program is the start symbol
            if (currentToken.kind != Token.EOT)
                report syntax error
        }
        ...
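The parser skeleton above can be turned into something runnable with a few stand-ins. The sketch below is one possible reading, not the course's code: ArrayScanner, SyntaxError, the value of EOT and the accepts helper are all assumptions, and parseProgram is a placeholder that accepts a single identifier so that the example stays short.

```java
// Token kinds follow the slide's Token class; the value chosen for EOT is an assumption.
class Token {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, EOT = 2;
    final byte kind;
    final String spelling;
    Token(byte kind, String spelling) { this.kind = kind; this.spelling = spelling; }
}

// Hypothetical error type standing in for "report syntax error".
class SyntaxError extends RuntimeException {
    SyntaxError(String message) { super(message); }
}

// Stand-in scanner: replays prepared tokens, then returns EOT forever.
class ArrayScanner {
    private final Token[] tokens;
    private int next = 0;
    ArrayScanner(Token... tokens) { this.tokens = tokens; }
    Token scan() {
        return next < tokens.length ? tokens[next++] : new Token(Token.EOT, "");
    }
}

class Parser {
    private final ArrayScanner scanner;
    private Token currentToken;

    Parser(ArrayScanner scanner) { this.scanner = scanner; }

    // Checks that the current token has the expected kind, then fetches the next one.
    private void accept(byte expectedKind) {
        if (currentToken.kind == expectedKind)
            currentToken = scanner.scan();
        else
            throw new SyntaxError("expected kind " + expectedKind
                    + " but found '" + currentToken.spelling + "'");
    }

    // Fetches the next token without checking; used when the kind was already tested.
    private void acceptIt() {
        currentToken = scanner.scan();
    }

    public void parse() {
        acceptIt();                        // get the first token
        parseProgram();                    // Program is the start symbol
        if (currentToken.kind != Token.EOT)
            throw new SyntaxError("unexpected '" + currentToken.spelling + "' after program");
    }

    // Placeholder for the real parseProgram: accepts exactly one identifier.
    private void parseProgram() {
        accept(Token.IDENTIFIER);
    }

    // Convenience helper (hypothetical): true if the token sequence parses.
    static boolean accepts(Token... tokens) {
        try {
            new Parser(new ArrayScanner(tokens)).parse();
            return true;
        } catch (SyntaxError e) {
            return false;
        }
    }
}
```

Throwing an exception on the first error is a design choice made for brevity; a production parser would more likely record the error and attempt to recover.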

(5) Implement private parsing methods

    Program ::= single-Command

    private void parseProgram() {
        parseSingleCommand();
    }

(5) Implement private parsing methods

    single-Command ::= Identifier ( := Expression | ( Expression ) )
                     | if Expression then single-Command else single-Command
                     | ... other alternatives ...

    private void parseSingleCommand() {
        switch (currentToken.kind) {
            case Token.IDENTIFIER:
                ...
            case Token.IF:
                ...
            ... other cases ...
            default:
                report a syntax error
        }
    }

(5) Implement private parsing methods

    single-Command ::= Identifier ( := Expression | ( Expression ) )
                     | if Expression then single-Command else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end

From the above we can straightforwardly derive the entire implementation of parseSingleCommand (much as we did in the microEnglish example).
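As an illustration of that derivation, the following sketch is a complete recognizer for the transformed Mini Triangle grammar. Several simplifications are assumptions made to keep it short: tokens are whitespace-separated strings rather than Token objects, the empty string marks the end of the input, syntax errors are reported by throwing IllegalStateException, and the class name MiniTriangleRecognizer and the accepts helper are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class MiniTriangleRecognizer {
    private static final List<String> KEYWORDS = Arrays.asList(
            "if", "then", "else", "while", "do", "let", "in", "begin", "end", "const", "var");
    private static final String EOT = "";   // no real token is empty, so "" can mark the end

    private final String[] tokens;
    private int pos = 0;
    private String currentToken;

    MiniTriangleRecognizer(String source) {
        String trimmed = source.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    private String scan() { return pos < tokens.length ? tokens[pos++] : EOT; }
    private void acceptIt() { currentToken = scan(); }
    private void accept(String expected) {
        if (currentToken.equals(expected)) acceptIt();
        else throw new IllegalStateException("expected '" + expected + "' but found '" + currentToken + "'");
    }

    private boolean isIdentifier() {
        return currentToken.matches("[A-Za-z][A-Za-z0-9]*") && !KEYWORDS.contains(currentToken);
    }
    private boolean isIntLiteral() { return currentToken.matches("[0-9]+"); }
    private boolean isOperator() { return currentToken.matches("[-+*/<>=]"); }

    void parse() { acceptIt(); parseSingleCommand(); accept(EOT); }

    private void parseSingleCommand() {
        if (isIdentifier()) {               // Identifier ( := Expression | ( Expression ) )
            acceptIt();
            if (currentToken.equals(":=")) { acceptIt(); parseExpression(); }
            else { accept("("); parseExpression(); accept(")"); }
            return;
        }
        switch (currentToken) {
            case "if":
                acceptIt(); parseExpression(); accept("then"); parseSingleCommand();
                accept("else"); parseSingleCommand();
                break;
            case "while":
                acceptIt(); parseExpression(); accept("do"); parseSingleCommand();
                break;
            case "let":
                acceptIt(); parseDeclaration(); accept("in"); parseSingleCommand();
                break;
            case "begin":
                acceptIt(); parseCommand(); accept("end");
                break;
            default:
                throw new IllegalStateException("unexpected token '" + currentToken + "'");
        }
    }

    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.equals(";")) { acceptIt(); parseSingleCommand(); }
    }

    private void parseExpression() {
        parsePrimaryExpression();
        while (isOperator()) { acceptIt(); parsePrimaryExpression(); }
    }

    private void parsePrimaryExpression() {
        if (isIntLiteral() || isIdentifier()) acceptIt();
        else if (isOperator()) { acceptIt(); parsePrimaryExpression(); }
        else { accept("("); parseExpression(); accept(")"); }
    }

    private void parseDeclaration() {
        parseSingleDeclaration();
        while (currentToken.equals(";")) { acceptIt(); parseSingleDeclaration(); }
    }

    private void parseSingleDeclaration() {
        if (currentToken.equals("const")) { acceptIt(); parseIdentifier(); accept("~"); parseExpression(); }
        else { accept("var"); parseIdentifier(); accept(":"); parseIdentifier(); }
    }

    private void parseIdentifier() {
        if (isIdentifier()) acceptIt();
        else throw new IllegalStateException("identifier expected but found '" + currentToken + "'");
    }

    // Hypothetical helper: true if the source text is a syntactically valid program.
    static boolean accepts(String source) {
        try { new MiniTriangleRecognizer(source).parse(); return true; }
        catch (IllegalStateException e) { return false; }
    }
}
```

For example, MiniTriangleRecognizer.accepts("while x < 10 do begin x := x + 1 ; putint ( x ) end") succeeds, while an if command with no else part is rejected because the grammar requires both branches.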

Algorithm to convert EBNF into an RD parser

The conversion of an EBNF specification into a Java or C++ implementation of a recursive descent parser is so "mechanical" that it could easily be automated (such tools exist, but we won't use them in this course).

We can describe the algorithm by a set of mechanical rewrite rules:

    N ::= α

    private void parseN() {
        parse α    // as explained on next two slides
    }

Algorithm to convert EBNF into an RD parser

    EBNF fragment                              Generated code
    -------------                              --------------
    parse ε                                    // a dummy statement
    parse N   (where N is a non-terminal)      parseN();
    parse t   (where t is a terminal)          accept(t);
    parse X Y                                  parse X
                                               parse Y

Algorithm to convert EBNF into an RD parser

parse X* becomes:

    while (currentToken.kind is in starters[X]) {
        parse X
    }

parse X | Y becomes:

    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            if neither X nor Y generates ε then report syntax error
    }

Example: "Generation" of parseCommand

    Command ::= single-Command ( ; single-Command )*

Start from the rule and apply the rewrite rules one step at a time:

    private void parseCommand() {
        parse single-Command ( ; single-Command )*
    }

    private void parseCommand() {
        parse single-Command
        parse ( ; single-Command )*
    }

    private void parseCommand() {
        parseSingleCommand();
        parse ( ; single-Command )*
    }

    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.kind == Token.SEMICOLON) {
            parse ; single-Command
        }
    }

    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.kind == Token.SEMICOLON) {
            parse ;
            parse single-Command
        }
    }

    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.kind == Token.SEMICOLON) {
            acceptIt();    // because SEMICOLON has just been checked
            parseSingleCommand();
        }
    }

Example: Generation of parseSingleDeclaration

    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter

Again, apply the rewrite rules one step at a time. Each case ends with break, as the rule for X | Y requires; without it, control would fall through into the next case.

    private void parseSingleDeclaration() {
        parse const Identifier ~ Expression
            | var Identifier : Type-denoter
    }

    private void parseSingleDeclaration() {
        switch (currentToken.kind) {
            case Token.CONST:
                parse const Identifier ~ Expression
                break;
            case Token.VAR:
                parse var Identifier : Type-denoter
                break;
            default:
                report syntax error
        }
    }

    private void parseSingleDeclaration() {
        switch (currentToken.kind) {
            case Token.CONST:
                parse const
                parse Identifier
                parse ~
                parse Expression
                break;
            case Token.VAR:
                parse var Identifier : Type-denoter
                break;
            default:
                report syntax error
        }
    }

    private void parseSingleDeclaration() {
        switch (currentToken.kind) {
            case Token.CONST:
                acceptIt();
                parseIdentifier();
                accept(Token.IS);
                parseExpression();
                break;
            case Token.VAR:
                parse var Identifier : Type-denoter
                break;
            default:
                report syntax error
        }
    }

    private void parseSingleDeclaration() {
        switch (currentToken.kind) {
            case Token.CONST:
                acceptIt();
                parseIdentifier();
                accept(Token.IS);
                parseExpression();
                break;
            case Token.VAR:
                acceptIt();
                parseIdentifier();
                accept(Token.COLON);
                parseTypeDenoter();
                break;
            default:
                report syntax error
        }
    }

LL(1) Grammars

The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so-called "LL(1)" grammars.

Basically, an LL(1) grammar is a grammar that can be parsed by a top-down parser with a lookahead (in the input stream of tokens) of one token.

Which grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)?
=> We can deduce the necessary conditions from the parser generation algorithm.

LL(1) Grammars

parse X* becomes:

    while (currentToken.kind is in starters[X]) {
        parse X
    }

Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*.

parse X | Y becomes:

    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            if neither X nor Y generates ε then report syntax error
    }

Conditions: starters[X] and starters[Y] must be disjoint sets, and if either X or Y generates ε then they must also be disjoint from the set of tokens that can immediately follow X | Y.
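The two conditions are plain set-disjointness tests, so they are easy to check once the starter and follower sets are known. The sketch below assumes those sets have already been computed by hand and are given as sets of token-kind names; the class Ll1Check and its method names are hypothetical, and the ε case is read conservatively (both starter sets must avoid the follower set).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class Ll1Check {

    // Small helper to build a set of token-kind names.
    static Set<String> set(String... kinds) {
        return new HashSet<>(Arrays.asList(kinds));
    }

    // Condition for X | Y: the starter sets must be disjoint; if either alternative
    // can generate the empty string, both must also be disjoint from the followers.
    static boolean alternativesOk(Set<String> startersX, boolean xGeneratesEmpty,
                                  Set<String> startersY, boolean yGeneratesEmpty,
                                  Set<String> followers) {
        if (!Collections.disjoint(startersX, startersY)) return false;
        if (xGeneratesEmpty || yGeneratesEmpty) {
            return Collections.disjoint(startersX, followers)
                    && Collections.disjoint(startersY, followers);
        }
        return true;
    }

    // Condition for X*: starters[X] must be disjoint from the tokens that can follow X*.
    static boolean repetitionOk(Set<String> startersX, Set<String> followers) {
        return Collections.disjoint(startersX, followers);
    }
}
```

For instance, the two alternatives of single-Declaration start with const and var respectively, so alternativesOk(set("CONST"), false, set("VAR"), false, set("IN", "SEMICOLON")) holds, whereas two alternatives that both start with an identifier fail the test.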

LL(1) grammars and left factorization

The original Mini Triangle grammar is not LL(1). For example:

    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | ...
    V-name         ::= Identifier

    Starters[ V-name := Expression ]    = Starters[ V-name ] = Starters[ Identifier ]
    Starters[ Identifier ( Expression ) ] = Starters[ Identifier ]

NOT DISJOINT!

LL(1) grammars: left factorization

What happens when we generate an RD parser from a non-LL(1) grammar?

    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | ...

    private void parseSingleCommand() {
        switch (currentToken.kind) {
            case Token.IDENTIFIER:
                parse V-name := Expression
            case Token.IDENTIFIER:
                parse Identifier ( Expression )
            ... other cases ...
            default:
                report syntax error
        }
    }

Wrong: overlapping cases.

LL(1) grammars: left factorization

    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | ...

Left factorization (and substitution of V-name) gives:

    single-Command ::= Identifier ( := Expression | ( Expression ) )
                     | ...

LL(1) grammars: left recursion elimination

What happens if we don't perform left recursion elimination?

    Command ::= single-Command
              | Command ; single-Command

    public void parseCommand() {
        switch (currentToken.kind) {
            case in starters[single-Command]:
                parseSingleCommand();
            case in starters[Command]:
                parseCommand();
                accept(Token.SEMICOLON);
                parseSingleCommand();
            default:
                report syntax error
        }
    }

Wrong: overlapping cases, because every Command starts with a single-Command and so the two starter sets coincide. In addition, the second alternative calls parseCommand before consuming any token, so that recursion would never terminate.

LL(1) grammars: left recursion elimination

    Command ::= single-Command
              | Command ; single-Command

Left recursion elimination gives:

    Command ::= single-Command (; single-Command)*
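The difference between the two forms of the Command rule can be observed directly. The following sketch is an illustration under simplifying assumptions: tokens are whitespace-separated strings, any token other than ";" counts as a single-Command, and the class LeftRecursionDemo and its method names are hypothetical. A method transcribed from the left-recursive alternative calls itself before consuming any input, so it exhausts the call stack, while the version derived from the transformed rule is a simple loop.

```java
class LeftRecursionDemo {
    private final String[] tokens;
    private int pos = 0;

    LeftRecursionDemo(String source) { tokens = source.trim().split("\\s+"); }

    // Current token, or "" once the input is exhausted.
    private String current() { return pos < tokens.length ? tokens[pos] : ""; }

    private void accept(String expected) {
        if (!current().equals(expected))
            throw new IllegalStateException("expected '" + expected + "'");
        pos++;
    }

    // Stand-in for single-Command: any one token other than ";".
    private void parseSingleCommand() {
        if (current().isEmpty() || current().equals(";"))
            throw new IllegalStateException("command expected");
        pos++;
    }

    // Direct transcription of  Command ::= Command ; single-Command
    private void parseCommandLeftRecursive() {
        parseCommandLeftRecursive();     // recursion before any token is consumed
        accept(";");
        parseSingleCommand();
    }

    // Derived from  Command ::= single-Command (; single-Command)*
    private void parseCommandIterative() {
        parseSingleCommand();
        while (current().equals(";")) {
            pos++;                       // the ";" has just been checked
            parseSingleCommand();
        }
    }

    // True if the left-recursive method runs out of stack space.
    static boolean leftRecursiveOverflows(String source) {
        try {
            new LeftRecursionDemo(source).parseCommandLeftRecursive();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // True if the loop-based method consumes the whole input.
    static boolean iterativeAccepts(String source) {
        LeftRecursionDemo parser = new LeftRecursionDemo(source);
        try {
            parser.parseCommandIterative();
            return parser.current().isEmpty();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Catching StackOverflowError is done here only to make the failure visible in a demonstration; it is not something a real parser should rely on.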

Abstract Syntax Trees

So far we have talked about how to build a recursive descent parser that recognizes a given language described by an LL(1) EBNF grammar.

Next we will look at
    – how to represent ASTs as data structures
    – how to modify the parser to construct an AST data structure

We make heavy use of object-oriented programming (classes, inheritance, dynamic method binding).