1 Languages and Compilers (SProg og Oversættere) Lecture 3 recap Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to.

Slides:

Advertisements

Similar presentations

Lexical and Syntactic Analysis Here, we look at two of the tasks involved in the compilation process –Given source code, we need to first break it into.

Advertisements

1 Languages and Compilers (SProg og Oversættere) Lecture 3 Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 531 Compiler Construction Ch.4: Syntactic Analysis Spring 2007 Marco Valtorta.

6/12/2015Prof. Hilfinger CS164 Lecture 111 Bottom-Up Parsing Lecture (From slides by G. Necula & R. Bodik)

Top-Down Parsing.

Languages and Compilers (SProg og Oversættere) Lecture 4

CS Summer 2005 Top-down and Bottom-up Parsing - a whirlwind tour June 20, 2005 Slide acknowledgment: Radu Rugina, CS 412.

Context-Free Grammars Lecture 7

Parsing III (Eliminating left recursion, recursive descent parsing)

ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.

Slide 1 Chapter 2-b Syntax, Semantics. Slide 2 Syntax, Semantics - Definition The syntax of a programming language is the form of its expressions, statements.

Chapter 3 Program translation1 Chapt. 3 Language Translation Syntax and Semantics Translation phases Formal translation models.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 330 Programming Language Structures Chapter 3: Lexical and Syntactic Analysis.

Parsing — Part II (Ambiguity, Top-down parsing, Left-recursion Removal)

Professor Yihjia Tsai Tamkang University

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 531 Compiler Construction Ch.1 Spring 2010 Marco Valtorta

Syntax Analysis (Chapter 4) 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture.

CSC3315 (Spring 2009)1 CSC 3315 Lexical and Syntax Analysis Hamid Harroud School of Science and Engineering, Akhawayn University

Grammars and Parsing. Sentence  Noun Verb Noun Noun  boys Noun  girls Noun  dogs Verb  like Verb  see Grammars Grammar: set of rules for generating.

Syntax Analysis (Chapter 4) 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture.

1 Languages and Compilers (SProg og Oversættere) Parsing.

Syntax Analysis (Chapter 4) 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture.

Top-Down Parsing - recursive descent - predictive parsing

1 Top Down Parsing. CS 412/413 Spring 2008Introduction to Compilers2 Outline Top-down parsing SLL(1) grammars Transforming a grammar into SLL(1) form.

Profs. Necula CS 164 Lecture Top-Down Parsing ICOM 4036 Lecture 5.

Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.

COP 4620 / 5625 Programming Language Translation / Compiler Writing Fall 2003 Lecture 3, 09/11/2003 Prof. Roy Levow.

1 Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson.

1 Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson.

UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering CSCE 531 Compiler Construction Ch.4: Syntactic Analysis Spring 2013 Marco Valtorta.

1 Languages and Compilers (SProg og Oversættere) Lexical analysis.

Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University.

1 Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson.

CPS 506 Comparative Programming Languages Syntax Specification.

Comp 311 Principles of Programming Languages Lecture 3 Parsing Corky Cartwright August 28, 2009.

Language Translation A programming language processor is any system that manipulates programs expressed in a PL A source program in some source language.

Syntax The Structure of a Language. Lexical Structure The structure of the tokens of a programming language The scanner takes a sequence of characters.

CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 3: Introduction to Syntactic Analysis.

1 Languages and Compilers (SProg og Oversættere) Lecture 4 Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm.

1 Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson.

CS 153: Concepts of Compiler Design October 21 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak

Top-down Parsing lecture slides from C OMP 412 Rice University Houston, Texas, Fall 2001.

Parsing — Part II (Top-down parsing, left-recursion removal) Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students.

Syntax Analysis (Chapter 4) 1 Course Overview PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture.

Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.

Top-down Parsing. 2 Parsing Techniques Top-down parsers (LL(1), recursive descent) Start at the root of the parse tree and grow toward leaves Pick a production.

1 A Simple Syntax-Directed Translator CS308 Compiler Theory.

 Fall Chart 2  Translators and Compilers  Textbook o Programming Language Processors in Java, Authors: David A. Watts & Deryck F. Brown, 2000,

Top-Down Parsing.

Top-Down Predictive Parsing We will look at two different ways to implement a non- backtracking top-down parser called a predictive parser. A predictive.

1 Topic #4: Syntactic Analysis (Parsing) CSC 338 – Compiler Design and implementation Dr. Mohamed Ben Othman ( )

1 Languages and Compilers (SProg og Oversættere) Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson.

 Fall Chart 2  Sub-phases of Syntactic Analysis  Grammars Revisited  Parsing  Abstract Syntax Trees  Scanning  Case Study: Syntactic Analysis.

CMSC 330: Organization of Programming Languages Pushdown Automata Parsing.

CS 614: Theory and Construction of Compilers Lecture 4 Fall 2002 Department of Computer Science University of Alabama Joel Jones.

Comp 411 Principles of Programming Languages Lecture 3 Parsing

Chapter 3 – Describing Syntax

Languages and Compilers (SProg og Oversættere)

Programming Languages Translator

CS510 Compiler Lecture 4.

Introduction to Parsing (adapted from CS 164 at Berkeley)

CS 363 Comparative Programming Languages

Top-Down Parsing CS 671 January 29, 2008.

R.Rajkumar Asst.Professor CSE

The Recursive Descent Algorithm

Syntactic sugar causes cancer of the semicolon.

Compilers Principles, Techniques, & Tools Taught by Jing Zhang

Course Overview PART I: overview material PART II: inside a compiler

COMPILER CONSTRUCTION

Presentation transcript:

1 Languages and Compilers (SProg og Oversættere) Lecture 3 recap Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson whose slides this lecture is based on.

2 Syntax Analysis: Parser Scanner Source Program Abstract Syntax Tree Error Reports Parser Stream of “Tokens” Stream of Characters Error Reports Dataflow chart

3 Look-Ahead Derivation LL-Analyse (Top-Down) Left-to-Right Left Derivative Look-Ahead Reduction LR-Analyse (Bottom-Up) Left-to-Right Right Derivative Top-Down vs. Bottom-Up parsing

4 Recursive Descent Parsing Recursive descent parsing is a straightforward top-down parsing algorithm. We will now look at how to develop a recursive descent parser from an EBNF specification. Idea: the parse tree structure corresponds to the “call graph” structure of parsing procedures that call each other recursively.

5 Recursive Descent Parsing Sentence ::= Subject Verb Object. Subject ::= I | a Noun | the Noun Object::= me | a Noun | the Noun Noun::= cat | mat | rat Verb::= like | is | see | sees Sentence ::= Subject Verb Object. Subject ::= I | a Noun | the Noun Object::= me | a Noun | the Noun Noun::= cat | mat | rat Verb::= like | is | see | sees Define a procedure parseN for each non-terminal N private void parseSentence() ; private void parseSubject(); private void parseObject(); private void parseNoun(); private void parseVerb(); private void parseSentence() ; private void parseSubject(); private void parseObject(); private void parseNoun(); private void parseVerb();

6 Recursive Descent Parsing: Auxiliary Methods public class MicroEnglishParser { private TerminalSymbol currentTerminal private void accept(TerminalSymbol expected) { if (currentTerminal matches expected) currentTerminal = next input terminal ; else report a syntax error }... } public class MicroEnglishParser { private TerminalSymbol currentTerminal private void accept(TerminalSymbol expected) { if (currentTerminal matches expected) currentTerminal = next input terminal ; else report a syntax error }... }

7 Recursive Descent Parsing: Parsing Methods private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept(‘.’); } private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept(‘.’); } Sentence ::= Subject Verb Object.

8 Recursive Descent Parsing: Parsing Methods private void parseSubject() { if (currentTerminal matches ‘ I ’) accept(‘ I ’); else if (currentTerminal matches ‘ a ’) { accept(‘ a ’); parseNoun(); } else if (currentTerminal matches ‘ the ’) { accept(‘ the ’); parseNoun(); } else report a syntax error } private void parseSubject() { if (currentTerminal matches ‘ I ’) accept(‘ I ’); else if (currentTerminal matches ‘ a ’) { accept(‘ a ’); parseNoun(); } else if (currentTerminal matches ‘ the ’) { accept(‘ the ’); parseNoun(); } else report a syntax error } Subject ::= I | a Noun | the Noun

9 Recursive Descent Parsing: Parsing Methods private void parseNoun() { if (currentTerminal matches ‘ cat ’) accept(‘ cat ’); else if (currentTerminal matches ‘ mat ’) accept(‘ mat ’); else if (currentTerminal matches ‘ rat ’) accept(‘ rat ’); else report a syntax error } private void parseNoun() { if (currentTerminal matches ‘ cat ’) accept(‘ cat ’); else if (currentTerminal matches ‘ mat ’) accept(‘ mat ’); else if (currentTerminal matches ‘ rat ’) accept(‘ rat ’); else report a syntax error } Noun::= cat | mat | rat

10 Top-down parsing Thecatseesarat.Thecatseesrat. The parse tree is constructed starting at the top (root). Sentence SubjectVerbObject. Sentence Noun Subject The Noun cat Verb seesa Noun Object Noun rat.

11 A little bit of useful theory We will now look at a few useful bits of theory. These will be necessary later when we implement parsers. –Grammar transformations A grammar can be transformed in a number of ways without changing the meaning (i.e. the set of strings that it defines) –The definition and computation of “starter sets”

12 1) Grammar Transformations Left factorization single-Command ::= V-name := Expression | if Expression then single-Command | if Expression then single-Command else single-Command single-Command ::= V-name := Expression | if Expression then single-Command | if Expression then single-Command else single-Command single-Command ::= V-name := Expression | if Expression then single-Command (  | else single-Command) single-Command ::= V-name := Expression | if Expression then single-Command (  | else single-Command) X Y | X Z X ( Y | Z ) Example: X Y=  Z

13 1) Grammar Transformations (ctd) Elimination of Left Recursion N ::= X | N Y Identifier ::= Letter | Identifier Letter | Identifier Digit Identifier ::= Letter | Identifier Letter | Identifier Digit N ::= X Y * Example: Identifier ::= Letter | Identifier (Letter|Digit) Identifier ::= Letter | Identifier (Letter|Digit) Identifier ::= Letter (Letter|Digit)*

14 1) Grammar Transformations (ctd) Substitution of non-terminal symbols N ::= X M ::=  N  single-Command ::= for contrVar := Expression to-or-dt Expression do single-Command to-or-dt ::= to | downto single-Command ::= for contrVar := Expression to-or-dt Expression do single-Command to-or-dt ::= to | downto Example: N ::= X M ::=  X  single-Command ::= for contrVar := Expression (to|downto) Expression do single-Command single-Command ::= for contrVar := Expression (to|downto) Expression do single-Command

15 Developing RD Parser for Mini Triangle Identifier := Letter (Letter|Digit)* Integer-Literal ::= Digit Digit* Operator ::= + | - | * | / | | = Comment ::= ! Graphic* eol Identifier := Letter (Letter|Digit)* Integer-Literal ::= Digit Digit* Operator ::= + | - | * | / | | = Comment ::= ! Graphic* eol Before we begin: The following non-terminals are recognized by the scanner They will be returned as tokens by the scanner Assume scanner produces instances of: public class Token { byte kind; String spelling; final static byte IDENTIFIER = 0, INTLITERAL = 1;... public class Token { byte kind; String spelling; final static byte IDENTIFIER = 0, INTLITERAL = 1;...

16 (1+2) Express grammar in EBNF and factorize... Program ::= single-Command Command ::= single-Command | Command ; single-Command single-Command ::= V-name := Expression | Identifier ( Expression ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end V-name ::= Identifier... Program ::= single-Command Command ::= single-Command | Command ; single-Command single-Command ::= V-name := Expression | Identifier ( Expression ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end V-name ::= Identifier... Left factorization needed Left recursion elimination needed

17 (1+2)Express grammar in EBNF and factorize... Program ::= single-Command Command ::= single-Command (;single-Command)* single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end V-name ::= Identifier... Program ::= single-Command Command ::= single-Command (;single-Command)* single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end V-name ::= Identifier... After factorization etc. we get:

18 Developing RD Parser for Mini Triangle Expression ::= primary-Expression | Expression Operator primary-Expression primary-Expression ::= Integer-Literal | V-name | Operator primary-Expression | ( Expression ) Declaration ::= single-Declaration | Declaration ; single-Declaration single-Declaration ::= const Identifier ~ Expression | var Identifier : Type-denoter Type-denoter ::= Identifier Expression ::= primary-Expression | Expression Operator primary-Expression primary-Expression ::= Integer-Literal | V-name | Operator primary-Expression | ( Expression ) Declaration ::= single-Declaration | Declaration ; single-Declaration single-Declaration ::= const Identifier ~ Expression | var Identifier : Type-denoter Type-denoter ::= Identifier Left recursion elimination needed

19 (1+2) Express grammar in EBNF and factorize... Expression ::= primary-Expression ( Operator primary-Expression )* primary-Expression ::= Integer-Literal | Identifier | Operator primary-Expression | ( Expression ) Declaration ::= single-Declaration (;single-Declaration)* single-Declaration ::= const Identifier ~ Expression | var Identifier : Type-denoter Type-denoter ::= Identifier Expression ::= primary-Expression ( Operator primary-Expression )* primary-Expression ::= Integer-Literal | Identifier | Operator primary-Expression | ( Expression ) Declaration ::= single-Declaration (;single-Declaration)* single-Declaration ::= const Identifier ~ Expression | var Identifier : Type-denoter Type-denoter ::= Identifier After factorization and recursion elimination :

20 (3)Create a parser class with... public class Parser { private Token currentToken; private void accept(byte expectedKind) { if (currentToken.kind == expectedKind) currentToken = scanner.scan(); else report syntax error } private void acceptIt() { currentToken = scanner.scan(); } public void parse() { acceptIt(); //Get the first token parseProgram(); if (currentToken.kind != Token.EOT) report syntax error }... public class Parser { private Token currentToken; private void accept(byte expectedKind) { if (currentToken.kind == expectedKind) currentToken = scanner.scan(); else report syntax error } private void acceptIt() { currentToken = scanner.scan(); } public void parse() { acceptIt(); //Get the first token parseProgram(); if (currentToken.kind != Token.EOT) report syntax error }...

21 (4) Implement private parsing methods: private void parseProgram() { parseSingleCommand(); } private void parseProgram() { parseSingleCommand(); } Program ::= single-Command

22 (4) Implement private parsing methods: single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command |... more alternatives... single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command |... more alternatives... private void parseSingleCommand() { switch (currentToken.kind) { case Token.IDENTIFIER :... case Token.IF : more cases... default: report a syntax error } private void parseSingleCommand() { switch (currentToken.kind) { case Token.IDENTIFIER :... case Token.IF : more cases... default: report a syntax error }

23 (4) Implement private parsing methods: single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end single-Command ::= Identifier ( := Expression | ( Expression ) ) | if Expression then single-Command else single-Command | while Expression do single-Command | let Declaration in single-Command | begin Command end From the above we can straightforwardly derive the entire implementation of parseSingleCommand (much as we did in the microEnglish example)

24 Algorithm to convert EBNF into a RD parser private void parseN() { parse X } private void parseN() { parse X } N ::= X The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated! => JavaCC “Java Compiler Compiler” We can describe the algorithm by a set of mechanical rewrite rules

25 Algorithm to convert EBNF into a RD parser // a dummy statement parse  parse N where N is a non-terminal parseN(); parse t where t is a terminal accept(t); parse XY parse X parse Y parse X parse Y

26 Algorithm to convert EBNF into a RD parser parse X* while (currentToken.kind is in starters[X]) { parse X } while (currentToken.kind is in starters[X]) { parse X } parse X|Y switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error }

27 private void parseCommand() { parse single-Command ( ; single-Command )* } private void parseCommand() { parse single-Command ( ; single-Command )* } Example: “Generation” of parseCommand Command ::= single-Command ( ; single-Command )* private void parseCommand() { parse single-Command parse ( ; single-Command )* } private void parseCommand() { parse single-Command parse ( ; single-Command )* } private void parseCommand() { parseSingleCommand(); parse ( ; single-Command )* } private void parseCommand() { parseSingleCommand(); parse ( ; single-Command )* } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { parse ; single-Command } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { parse ; single-Command } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { parse ; parse single-Command } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { parse ; parse single-Command } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { acceptIt(); parseSingleCommand(); } private void parseCommand() { parseSingleCommand(); while (currentToken.kind==Token.SEMICOLON) { acceptIt(); parseSingleCommand(); }

28 Example: Generation of parseSingleDeclaration single-Declaration ::= const Identifier ~ Type-denoter | var Identifier : Expression single-Declaration ::= const Identifier ~ Type-denoter | var Identifier : Expression private void parseSingleDeclaration() { parse const Identifier ~ Type-denoter | var Identifier : Expression } private void parseSingleDeclaration() { parse const Identifier ~ Type-denoter | var Identifier : Expression } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: parse const Identifier ~ Type-denoter case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: parse const Identifier ~ Type-denoter case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: parse const parse Identifier parse ~ parse Type-denoter case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: parse const parse Identifier parse ~ parse Type-denoter case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: acceptIt(); parseIdentifier(); acceptIt(Token.IS); parseTypeDenoter(); case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: acceptIt(); parseIdentifier(); acceptIt(Token.IS); parseTypeDenoter(); case Token.VAR: parse var Identifier : Expression default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: acceptIt(); parseIdentifier(); acceptIt(Token.IS); parseTypeDenoter(); case Token.VAR: acceptIt(); parseIdentifier(); acceptIt(Token.COLON); parseExpression(); default: report syntax error } private void parseSingleDeclaration() { switch (currentToken.kind) { case Token.CONST: acceptIt(); parseIdentifier(); acceptIt(Token.IS); parseTypeDenoter(); case Token.VAR: acceptIt(); parseIdentifier(); acceptIt(Token.COLON); parseExpression(); default: report syntax error }

29 LL(1) Grammars The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so called LL(1) grammars. What grammars are LL(1)? Basically, an LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token. How can we recognize that a grammar is (or is not) LL(1)?  There is a formal definition which we will skip for now  We can deduce the necessary conditions from the parser generation algorithm.

30 LL(1) Grammars parse X* while (currentToken.kind is in starters[X]) { parse X } while (currentToken.kind is in starters[X]) { parse X } parse X|Y switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } Condition: starters[X] and starters[Y] must be disjoint sets. Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X *

31 2) Starter Sets Informal Definition: The starter set of a RE X is the set of terminal symbols that can occur as the start of any string generated by X Example : starters[ (+|-|  )(0|1| … |9)* ] = { +, -, 0, 1,…, 9 } Formal Definition: starters[  ={} starters[t  ={t}(where t is a terminal symbol) starters[X Y  = starters[X  starters[Y  if X generates  ) starters[X Y  = starters[X  if not X generates  ) starters[X | Y  = starters[X  starters[Y  starters[X*  = starters[X 

32 2) Starter Sets (ctd) Informal Definition: The starter set of RE can be generalized to extended BNF Formal Definition: starters[N  = starters[X  for production rules N ::= X) Example : starters[Expression] = starters[PrimaryExp (Operator PrimaryExp)*] = starters[PrimaryExp] = starters[Identifiers]  starters[(Expression)] = starters[a | b | c |... |z]  {(} = {a, b, c,…, z, (}

33 LL(1) grammars and left factorisation single-Command ::= V-name := Expression | Identifier ( Expression ) |... V-name ::= Identifier single-Command ::= V-name := Expression | Identifier ( Expression ) |... V-name ::= Identifier The original Mini Triangle grammar is not LL(1): For example: Starters[ V-name := Expression ] = Starters[ V-name ] = Starters[ Identifier ] Starters[ Identifier ( Expression ) ] = Starters[ Identifier ] NOT DISJOINT!

34 LL(1) grammars: left factorisation private void parseSingleCommand() { switch (currentToken.kind) { case Token.IDENTIFIER: parse V-name := Expression case Token.IDENTIFIER: parse Identifier ( Expression )... other cases... default: report syntax error } private void parseSingleCommand() { switch (currentToken.kind) { case Token.IDENTIFIER: parse V-name := Expression case Token.IDENTIFIER: parse Identifier ( Expression )... other cases... default: report syntax error } single-Command ::= V-name := Expression | Identifier ( Expression ) |... single-Command ::= V-name := Expression | Identifier ( Expression ) |... What happens when we generate a RD parser from a non LL(1) grammar? wrong: overlapping cases

35 LL(1) grammars: left factorisation single-Command ::= V-name := Expression | Identifier ( Expression ) |... single-Command ::= V-name := Expression | Identifier ( Expression ) |... Left factorisation (and substitution of V-name) single-Command ::= Identifier ( := Expression | ( Expression ) ) |... single-Command ::= Identifier ( := Expression | ( Expression ) ) |...

36 LL1 Grammars: left recursion elimination Command ::= single-Command | Command ; single-Command Command ::= single-Command | Command ; single-Command public void parseCommand() { switch (currentToken.kind) { case in starters[single-Command] parseSingleCommand(); case in starters[Command] parseCommand(); accept(Token.SEMICOLON); parseSingleCommand(); default: report syntax error } public void parseCommand() { switch (currentToken.kind) { case in starters[single-Command] parseSingleCommand(); case in starters[Command] parseCommand(); accept(Token.SEMICOLON); parseSingleCommand(); default: report syntax error } What happens if we don’t perform left-recursion elimination? wrong: overlapping cases

37 LL1 Grammars: left recursion elimination Command ::= single-Command | Command ; single-Command Command ::= single-Command | Command ; single-Command Left recursion elimination Command ::= single-Command (; single-Command)* Command ::= single-Command (; single-Command)*

38 Systematic Development of RD Parser (1)Express grammar in EBNF (2)Grammar Transformations: Left factorization and Left recursion elimination (3)Create a parser class with –private variable currentToken –methods to call the scanner: accept and acceptIt (4)Implement private parsing methods: –add private parse N method for each non terminal N –public parse method that gets the first token form the scanner calls parse S (S is the start symbol of the grammar)

39 Algorithm to convert EBNF into a RD parser private void parseN() { parse X } private void parseN() { parse X } N ::= X The conversion of an EBNF specification into a Java implementation for a recursive descent parser is straightforward We can describe the algorithm by a set of mechanical rewrite rules

40 Algorithm to convert EBNF into a RD parser // a dummy statement parse  parse N where N is a non-terminal parseN(); parse t where t is a terminal accept(t); parse XY parse X parse Y parse X parse Y

41 Algorithm to convert EBNF into a RD parser parse X* while (currentToken.kind is in starters[X]) { parse X } while (currentToken.kind is in starters[X]) { parse X } parse X|Y switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error }

42 LL(1) Grammars The presented algorithm to convert EBNF into a parser does not work for all possible grammars. It only works for so called “LL(1)” grammars. What grammars are LL(1)? Basically, an LL1 grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token. How can we recognize that a grammar is (or is not) LL1?  We can deduce the necessary conditions from the parser generation algorithm.  There is a formal definition we can use.

43 LL 1 Grammars parse X* while (currentToken.kind is in starters[X]) { parse X } while (currentToken.kind is in starters[X]) { parse X } parse X|Y switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } switch (currentToken.kind) { cases in starters[X]: parse X break; cases in starters[Y]: parse Y break; default: report syntax error } Condition: starters[X] and starters[Y] must be disjoint sets. Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X *

44 Formal definition of LL(1) A grammar G is LL(1) iff for each set of productions M ::= X 1 | X 2 | … | X n : 1.starters[X 1 ], starters[X 2 ], …, starters[X n ] are all pairwise disjoint 2.If X i =>* ε then starters[X j ]∩ follow[X]=Ø, for 1≤j≤ n.i≠j If G is ε-free then 1 is sufficient

45 Derivation What does X i =>* ε mean? It means a derivation from X i leading to the empty production What is a derivation? –A grammar has a derivation:  A  =>  iff A   P (Sometimes A ::=  ) =>* is the transitive closure of => Example: –G = ({E}, {a,+,*,(,)}, P, E) where P = {E  E+E, E  E*E, E  a, E  (E)} –E => E+E => E+E*E => a+E*E => a+E*a => a+a*a –E =>* a+a*a

46 Follow Sets Follow(A) is the set of prefixes of strings of terminals that can follow any derivation of A in G –$  follow(S) (sometimes  follow(S)) –if (B  A  )  P, then – first(  )  follow(B)  follow(A) The definition of follow usually results in recursive set definitions. In order to solve them, you need to do several iterations on the equations.

47 A few provable facts about LL(1) grammars No left-recursive grammar is LL(1) No ambiguous grammar is LL(1) Some languages have no LL(1) grammar A ε-free grammar, where each alternative X j for N ::= X j begins with a distinct terminal, is a simple LL(1) grammar

48 Converting EBNF into RD parsers The conversion of an EBNF specification into a Java implementation for a recursive descent parser is so “mechanical” that it can easily be automated! => JavaCC “Java Compiler Compiler”

49 JavaCC JavaCC is a parser generator JavaCC can be thought of as “Lex and Yacc” for implementing parsers in Java JavaCC is based on LL(k) grammars JavaCC transforms an EBNF grammar into an LL(k) parser The lookahead can be change by writing LOOKAHEAD(…) The JavaCC can have action code written in Java embedded in the grammar JavaCC has a companion called JJTree which can be used to generate an abstract syntax tree

50 JavaCC and JJTree JavaCC is a parser generator –Inputs a set of token definitions, grammar and actions –Outputs a Java program which performs lexical analysis Finding tokens Parses the tokens according to the grammar Executes actions JJTree is a preprocessor for JavaCC –Inputs a grammar file –Inserts tree building actions –Outputs JavaCC grammar file From this you can add code to traverse the tree to do static analysis, code generation or interpretation.

51 JavaCC and JJTree

52 JavaCC input format One file with extension.jj containing –Header –Token specifications –Grammar Example: TOKEN: { } void StatementListReturn() : {} { (Statement())* “return” Expression() “;” }

53 JavaCC token specifications use regular expressions Characters and strings must be quoted –“;”, “int”, “while” Character lists […] is shorthand for | –[“a”-”z”] matches “a” | “b” | “c” | … | “z” –[“a”,”e”,”i”,”o”,u”] matches any vowel –[“a”-”z”,”A”-”Z”] matches any letter Repetition shorthand with * and + –[“a”-”z”,”A”-”Z”]* matches zero or more letters –[“a”-”z”,”A”-”Z”]+ matches one or more letters Shorthand with ? provides for optionals: –(“+”|”-”)?[“0”-”9”]+ matches signed and unsigned integers Tokens can be named –TOKEN : { ( | )*>} –TOKEN : { | } –Now can be used in defining syntax

54 A bigger example options { LOOKAHEAD=2; } PARSER_BEGIN(Arithmetic) public class Arithmetic { } PARSER_END(Arithmetic) SKIP : { " " | "\r" | "\t" } TOKEN: { )+ ( "." ( )+ )? > | } double expr(): { } { term() ( "+" expr() | "-" expr() )* } double term(): { } { unary() ( "*" term() | "/" term() )* } double unary(): { } { "-" element() | element() } double element(): { } { | "(" expr() ")" }

55 Generating a parser with JavaCC javacc filename.jj –generates a parser with specified name –Lots of.java files javac *.java –Compile all the.java files Note the parser doesn’t do anything on its own. You have to either –Add actions to grammar by hand –Use JJTree to generate actions for building AST –Use JBT to generate AST and visitors

56 Adding Actions by hand options { LOOKAHEAD=2; } PARSER_BEGIN(Calculator) public class Calculator { public static void main(String args[]) throws ParseException { Calculator parser = new Calculator(System.in); while (true) { parser.parseOneLine(); } PARSER_END(Calculator) SKIP : { " " | "\r" | "\t" } TOKEN: { )+ ( "." ( )+ )? > | } void parseOneLine(): { double a; } { a=expr() { System.out.println(a); } | | { System.exit(-1); } }

57 Adding Actions by hand (ctd.) double expr(): { double a; double b; } { a=term() ( "+" b=expr() { a += b; } | "-" b=expr() { a -= b; } )* { return a; } } double term(): { double a; double b; } { a=unary() ( "*" b=term() { a *= b; } | "/" b=term() { a /= b; } )* { return a; } } double unary(): { double a; } { "-" a=element() { return -a; } | a=element() { return a; } } double element(): { Token t; double a; } { t= { return Double.parseDouble(t.toString()); } | "(" a=expr() ")" { return a; } }

58 Using JJTree JJTree is a preprocessor for JavaCC JTree transforms a bare JavaCC grammar into a grammar with embedded Java code for building an AST –Classes Node and SimpleNode are generated –Can also generate classes for each type of node All AST nodes implement interface Node –Useful methods provided include: Public void jjtGetNumChildren() - returns the number of children Public void jjtGetChild(int i) - returns the i’th child –The “state” is in a parser field called jjtree The root is at Node rootNode() You can display the tree with ((SimpleNode)parser.jjtree.rootNode()).dump(“ “); JJTree supports the building of abstract syntax trees which can be traversed using the visitor design pattern

59 JBT JBT – Java Tree Builder is an alternative to JJTree It takes a plain JavaCC grammar file as input and automatically generates the following: –A set of syntax tree classes based on the productions in the grammar, utilizing the Visitor design pattern. –Two interfaces: Visitor and ObjectVisitor. Two depth-first visitors: DepthFirstVisitor and ObjectDepthFirst, whose default methods simply visit the children of the current node. –A JavaCC grammar with the proper annotations to build the syntax tree during parsing. New visitors, which subclass DepthFirstVisitor or ObjectDepthFirst, can then override the default methods and perform various operations on and manipulate the generated syntax tree.

60 The Visitor Pattern For object-oriented programming the visitor pattern enables the definition of a new operator on an object structure without changing the classes of the objects When using visitor pattern The set of classes must be fixed in advance Each class must have an accept method Each accept method takes a visitor as argument The purpose of the accept method is to invoke the visitor which can handle the current object. A visitor contains a visit method for each class (overloading) A method for class C takes an argument of type C The advantage of Visitors: New methods without recompilation!

61 Summary: Parser Generator Tools JavaCC is a Parser Generator Tool It can be thought of as Lex and Yacc for Java There are several other Parser Generator tools for Java –We shall look at some of them later Beware! To use Parser Generator Tools efficiently you need to understand what they do and what the code they produce does Note, there can be bugs in the tools and/or in the code they generate!

62 So what can I do with this in my project? Language Design –What type of language are you designing? C like, Java Like, Scripting language, Functional.. Does it fit with existing paradigms/generations? Which language design criteria to emphasise? –Readability, writability, orthogonality, … –What is the syntax? Informal description CFG in EBNF (Human readable, i.e. may be ambiguous) LL(1) or (LALR(1)) grammar Grammar transformations –left-factorization, left-recursion elimination, substitution Compiler Implementation –Recursive Decent Parser –(Tools generated Parser)