Exercise: Build Lexical Analyzer Part
Build the lexical analyzer part for these two tokens, using longest match, where the first has priority:
binaryToken ::= (z|1)*
ternaryToken ::= (0|1|2)*
Input to tokenize: 1111z1021z1

Lexical Analyzer
binaryToken ::= (z|1)*
ternaryToken ::= (0|1|2)*
Input: 1111z1021z1
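To make the longest-match rule concrete, here is a minimal Scala sketch of such a tokenizer (not from the slides; the token class and error handling names are invented for illustration). Because both token classes are plain character runs, the longest match at each position is simply the longest run over the allowed characters:

sealed trait Token
case class BinaryTok(text: String) extends Token
case class TernaryTok(text: String) extends Token

def tokenize(input: String): List[Token] = {
  val out = scala.collection.mutable.ListBuffer[Token]()
  var i = 0
  while (i < input.length) {
    val rest = input.substring(i)
    val binLen = rest.takeWhile(c => c == 'z' || c == '1').length              // binaryToken ::= (z|1)*
    val terLen = rest.takeWhile(c => c == '0' || c == '1' || c == '2').length  // ternaryToken ::= (0|1|2)*
    if (binLen == 0 && terLen == 0)
      sys.error("no token matches at position " + i)  // both classes match only "" here; reject to avoid looping forever
    if (binLen >= terLen) { out += BinaryTok(rest.take(binLen)); i += binLen } // longest match wins; ties go to binaryToken
    else { out += TernaryTok(rest.take(terLen)); i += terLen }
  }
  out.toList
}

// tokenize("1111z1021z1") == List(BinaryTok("1111z1"), TernaryTok("021"), BinaryTok("z1"))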

Exercise: Realistic Integer Literals
Integer literals come in three forms in Scala: decimal, hexadecimal and octal. The compiler distinguishes the classes by how they begin.
– Decimal integers start with a non-zero digit.
– Hexadecimal numbers begin with 0x or 0X and may then contain the digits 0 through 9 as well as upper- or lowercase letters A to F.
– If the integer starts with zero, it is in octal representation, so it can contain only the digits 0 through 7.
– l or L at the end of the literal indicates that the number is a Long.
Draw a single DFA that accepts all the allowable integer literals. Write the corresponding regular expression.
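One possible answer for the regular-expression part (a sketch derived from the rules above; the slide expects you to work out both the DFA and the expression yourself):

intLiteral ::= ( [1-9][0-9]* | 0[xX][0-9a-fA-F]+ | 0[0-7]* ) [lL]?

Here 0[0-7]* also covers the literal 0 itself.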

Exercise
Let L be the language of strings over A = {<, =} of length at least 2. Construct a DFA that accepts L.
Describe how the lexical analyzer will tokenize the following inputs:
1) <=====
2) ==<==<==<==<==
3) <=====<

More Questions
Find an automaton or regular expression for:
– a sequence of open and closed parentheses of even length?
– as many digits before as after the decimal point?
– a sequence of balanced parentheses: ( ( () ) ()) is balanced, ( ) ) ( ( ) is not balanced
– a comment: a sequence of space, LF, TAB, and comments from // until LF
– nested comments like /* ... /* */ ... */

Automaton that Claims to Recognize { a^n b^n | n >= 0 }
Make the automaton deterministic.
Let the resulting DFA have K states, |Q| = K.
Feed it a, aa, aaa, ...
Let q_i be the state after reading a^i: q_0, q_1, q_2, ..., q_K.
This sequence has length K+1, so a state must repeat: q_i = q_{i+p} for some p > 0.
Then the automaton should accept a^{i+p} b^{i+p}. But then it must also accept a^i b^{i+p}, because it is in the same state after reading a^i as after a^{i+p}. So it does not accept the given language.

Limitations of Regular Languages
– Every automaton can be made deterministic.
– An automaton has finite memory; it cannot count.
– A deterministic automaton always behaves the same from a given state.
– If a string is too long, a deterministic automaton will repeat its behavior.

Pumping Lemma
Each finite language is regular (why?).
To prove that an infinite language L is not regular:
– suppose it is regular
– let the automaton recognizing it have K states
– long words will make the automaton loop
– the shortest cycle has length K or less
– if adding or removing a loop changes whether w is in L, we have a contradiction, e.g. uvw in L, uw not in L
The pumping lemma is a way to do proofs like the one above.

Pumping Lemma
If L is a regular language, then there exists a positive integer p (the pumping length) such that every string s ∈ L with |s| ≥ p can be partitioned into three pieces, s = x y z, such that
– |y| > 0
– |xy| ≤ p
– ∀ i ≥ 0. x y^i z ∈ L
Let's try again: { a^n b^n | n >= 0 }
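A sketch of how the lemma applies here (not spelled out on the slide): suppose { a^n b^n | n >= 0 } were regular with pumping length p, and take s = a^p b^p. Since |xy| ≤ p, the piece y consists only of a's, say y = a^k with k > 0. Pumping with i = 2 gives x y^2 z = a^(p+k) b^p, which has more a's than b's and so is not in the language, a contradiction.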

Context-Free Grammars
– Σ: the terminals
– symbols with recursive definitions: the non-terminals
– rules are of the form N ::= v, where v is a sequence of terminals and non-terminals
– a derivation starts from the starting symbol and repeatedly replaces a non-terminal with a right-hand side (a sequence of terminals and non-terminals)

Context-Free Grammars
S ::= "" | a S b    (for a^n b^n)
Example of a derivation:
S => aSb => a aSb b => aa aSb bb => aaabbb
The corresponding derivation tree has the result at its leaves.

Grammars for Natural Language
Statement ::= Sentence "."
Sentence ::= Simple | Belief
Simple ::= Person liking Person
liking ::= "likes" | "does" "not" "like"
Person ::= "Barack" | "Helga" | "John" | "Snoopy"
Belief ::= Person believing "that" Sentence but
believing ::= "believes" | "does" "not" "believe"
but ::= "" | "," "but" Sentence
Exercise: draw the derivation tree for: John does not believe that Barack believes that Helga likes Snoopy, but Snoopy believes that Helga likes Barack.
Such grammars can also be used to automatically generate essays.

Balanced Parentheses Grammar
A sequence of balanced parentheses:
( ( () ) ()) – balanced
( ) ) ( ( ) – not balanced
Exercise: give the grammar and an example derivation.

Balanced Parentheses Grammar
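The slide leaves the grammar to be worked out; one possible answer (a sketch, not given on the slide):
B ::= "" | B ( B )
Example derivation of ( ( ) ) ( ):
B => B ( B ) => B ( B ) ( B ) => ( B ) ( B ) => ( B ( B ) ) ( B ) => ( ( B ) ) ( B ) => ( ( ) ) ( B ) => ( ( ) ) ( )
An equivalent form is B ::= "" | ( B ) B.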

Remember While Syntax
program ::= statmt*
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }
expr ::= intLiteral | ident
       | expr (&& | < | == | + | - | * | / | %) expr
       | ! expr | - expr

Eliminating Additional Notation
– Grouping alternatives: s ::= P | Q  instead of  s ::= P  and  s ::= Q
– Parenthesis notation: expr (&& | < | == | + | - | * | / | %) expr
– Kleene star within grammars: { statmt* }
– Optional parts: if ( expr ) statmt (else statmt)?
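As an illustration of how such notation can be eliminated (a sketch, not from the slides), the optional else part can be rewritten into plain rules by introducing a fresh non-terminal:
ifStatmt ::= if ( expr ) statmt elseOpt
elseOpt ::= "" | else statmt
Similarly, statmt* can become statmtList ::= "" | statmt statmtList.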

[Figure: the compiler pipeline. Source code such as
  id3 = 0 while (id3 < 10) { println("", id3); id3 = id3 + 1 }
enters as a stream of characters; the lexer turns characters into words (tokens) such as id3 = 0 while ( id3 < 10 ); the parser turns tokens into trees (assign and while nodes).]

Recursive Descent Parsing

Recursive Descent is Decent
descent = a movement downward
decent = adequate, good enough
Recursive descent is a decent parsing technique:
– it can easily be implemented manually based on the grammar (which may require transformation)
– it is efficient (linear) in the size of the token sequence
Correspondence between grammar and code:
– concatenation → ;
– alternative (|) → if
– repetition (*) → while
– non-terminal → recursive procedure

A Rule of While Language Syntax
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }

Parser for the statmt (rule -> code)

def skip(t : Token) =
  if (lexer.token == t) lexer.next
  else error("Expected " + t)

// statmt ::=
def statmt = {
  // println ( stringConst, ident )
  if (lexer.token == Println) { lexer.next
    skip(openParen); skip(stringConst); skip(comma); skip(identifier); skip(closedParen)
  // | ident = expr
  } else if (lexer.token == Ident) { lexer.next
    skip(equality); expr
  // | if ( expr ) statmt (else statmt)?
  } else if (lexer.token == ifKeyword) { lexer.next
    skip(openParen); expr; skip(closedParen)
    statmt
    if (lexer.token == elseKeyword) { lexer.next; statmt }
  // | while ( expr ) statmt

Continuing Parser for the Rule

  // | while ( expr ) statmt
  } else if (lexer.token == whileKeyword) { lexer.next
    skip(openParen); expr; skip(closedParen)
    statmt
  // | { statmt* }
  } else if (lexer.token == openBrace) { lexer.next
    while (isFirstOfStatmt) { statmt }
    skip(closedBrace)
  } else {
    error("Unknown statement, found token " + lexer.token)
  }

First Symbols for Non-terminals
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }
Consider a grammar G and a non-terminal N.
L_G(N) = the set of strings that N can derive; e.g. L(statmt) is the set of all statements of the While language.
first(N) = { a | aw in L_G(N), a a terminal, w a string of terminals }
first(statmt) = { println, ident, if, while, { }
(We will see how to compute first in general.)

[The compiler pipeline figure is repeated: characters → lexer → words (tokens) → parser → trees.]

Trees for Statements
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }

abstract class Statmt
case class PrintlnS(msg : String, id : Identifier) extends Statmt
case class Assignment(left : Identifier, right : Expr) extends Statmt
case class If(cond : Expr, trueBr : Statmt, falseBr : Option[Statmt]) extends Statmt
case class While(cond : Expr, body : Statmt) extends Statmt
case class Block(sts : List[Statmt]) extends Statmt

Our Parser Produced Nothing

def skip(t : Token) : Unit =
  if (lexer.token == t) lexer.next
  else error("Expected " + t)

// statmt ::=
def statmt : Unit = {
  // println ( stringConst, ident )
  if (lexer.token == Println) { lexer.next
    skip(openParen); skip(stringConst); skip(comma); skip(identifier); skip(closedParen)
  // | ident = expr
  } else if (lexer.token == Ident) { lexer.next
    skip(equality); expr

Parser Returning a Tree

def expect(t : Token) : Token =
  if (lexer.token == t) { lexer.next; t }
  else error("Expected " + t)

// statmt ::=
def statmt : Statmt = {
  // println ( stringConst, ident )
  if (lexer.token == Println) { lexer.next
    skip(openParen)
    val s = getString(expect(stringConst)); skip(comma)
    val id = getIdent(expect(identifier)); skip(closedParen)
    PrintlnS(s, id)
  // | ident = expr
  } else if (lexer.token.class == Ident) {
    val lhs = getIdent(lexer.token)
    lexer.next; skip(equality)
    val e = expr
    Assignment(lhs, e)

Constructing Tree for 'if'

def expr : Expr = { … }

// statmt ::=
def statmt : Statmt = {
  …
  // if ( expr ) statmt (else statmt)?
  // case class If(cond : Expr, trueBr : Statmt, falseBr : Option[Statmt])
  } else if (lexer.token == ifKeyword) { lexer.next
    skip(openParen); val c = expr; skip(closedParen)
    val trueBr = statmt
    val elseBr = if (lexer.token == elseKeyword) { lexer.next; Some(statmt) }
                 else None
    If(c, trueBr, elseBr)   // made a tree node
  }

Task: Constructing Tree for 'while'

def expr : Expr = { … }

// statmt ::=
def statmt : Statmt = {
  …
  // while ( expr ) statmt
  // case class While(cond : Expr, body : Statmt) extends Statmt
  } else if (lexer.token == whileKeyword) {

  } else
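One way to fill in the task (a sketch, not the official solution), mirroring the 'if' case above:

  } else if (lexer.token == whileKeyword) { lexer.next
    skip(openParen); val c = expr; skip(closedParen)
    val b = statmt
    While(c, b)   // made a tree node
  } else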

Here each alternative started with a different token
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }
What if this is not the case?

Left Factoring Example: Function Calls
statmt ::= println ( stringConst, ident )
         | ident = expr
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }
         | ident ( expr (, expr)* )
Code to parse the grammar:
  } else if (lexer.token.class == Ident) {
    ???
  }
Both of these begin with an ident:
  foo = 42 + x
  foo ( u, v )

Left Factoring Example: Function Calls
statmt ::= println ( stringConst, ident )
         | ident assignmentOrCall
         | if ( expr ) statmt (else statmt)?
         | while ( expr ) statmt
         | { statmt* }
assignmentOrCall ::= "=" expr | ( expr (, expr)* )
Code to parse the grammar:
  } else if (lexer.token.class == Ident) {
    val id = getIdentifier(lexer.token); lexer.next
    assignmentOrCall(id)
  }
// Factoring pulls common parts from alternatives
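A possible body for assignmentOrCall (a sketch, not shown on the slides; it assumes a FunCall tree node and reuses the comma/openParen/closedParen token names from earlier):

def assignmentOrCall(id : Identifier) : Statmt = {
  // "=" expr
  if (lexer.token == equality) { lexer.next
    Assignment(id, expr)
  // ( expr (, expr)* )
  } else {
    skip(openParen)
    val args = scala.collection.mutable.ListBuffer(expr)
    while (lexer.token == comma) { lexer.next; args += expr }
    skip(closedParen)
    FunCall(id, args.toList)   // hypothetical tree node for calls
  }
}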