COS 320 Compilers David Walker

The Front End
Lexical Analysis: create a sequence of tokens from characters (Chap 2)
Syntax Analysis: create an abstract syntax tree from the sequence of tokens (Chap 3)
Type Checking: check the program for well-formedness constraints

stream of characters --> Lexer --> stream of tokens --> Parser --> abstract syntax --> Type Checker

Parsing with CFGs
Context-free grammars are (often) given by BNF expressions (Backus-Naur Form)
– Appel Chap 3.1
More powerful than regular expressions
– matching parens
– nested comments (wait, we could do nested comments with ML-LEX!)
CFGs are good for describing the overall syntactic structure of programs.

Context-Free Grammars
Context-free grammars consist of:
– a set of symbols:
  terminals, which denote token types
  non-terminals, each of which denotes a set of strings
– a start symbol
– rules of the form: symbol ::= symbol symbol ... symbol
  left-hand side: a non-terminal
  right-hand side: terminals and/or non-terminals
  rules explain how to rewrite non-terminals (beginning with the start symbol) into terminals

Context-Free Grammars
A string is in the language of a CFG if and only if it can be derived by the following non-deterministic procedure:
1. begin with the start symbol
2. while any non-terminals remain, pick a non-terminal and rewrite it using a rule
3. stop when only terminals are left (and check that you arrived at the string you were hoping for)
Parsing is the process of checking that a string is in the language of the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
S ::= S ; S
S ::= ID := E
S ::= PRINT ( Elist )
E ::= ID
E ::= NUM
E ::= E + E
E ::= ( S, Elist )
Elist ::= E
Elist ::= Elist, E
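The "derive me" procedure on the following slides can be made concrete with a brute-force search over leftmost derivations, pruning sentential forms that are already too long or whose terminal prefix disagrees with the target. This is my own illustrative Python sketch (the names and the search strategy are not from the slides), workable only for tiny grammars and strings:

```python
from collections import deque

# the grammar above; a symbol is a non-terminal iff it is a key of RULES
RULES = {
    "S": [["S", ";", "S"], ["ID", ":=", "E"], ["PRINT", "(", "Elist", ")"]],
    "E": [["ID"], ["NUM"], ["E", "+", "E"], ["(", "S", ",", "Elist", ")"]],
    "Elist": [["E"], ["Elist", ",", "E"]],
}

def derivable(target, start="S"):
    seen, queue = set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        # position of the leftmost non-terminal, if any
        i = next((k for k, sym in enumerate(form) if sym in RULES), None)
        if i is None:                  # all terminals: compare with the target
            if list(form) == target:
                return True
            continue
        # terminals before the leftmost non-terminal must match the target
        if list(form[:i]) != target[:i]:
            continue
        for rhs in RULES[form[i]]:     # rewrite using each rule in turn
            new = form[:i] + tuple(rhs) + form[i + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False

print(derivable("ID := NUM ; PRINT ( NUM )".split()))
```

Because no rule here has an empty right-hand side, sentential forms never shrink, so pruning forms longer than the target is safe and the search terminates.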

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S
ID := E

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S
ID := E
oops, can't make progress

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S
S ; S

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S
S ; S
ID := E ; S

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Derive me: ID := NUM ; PRINT ( NUM )

S
S ; S
ID := E ; S
ID := NUM ; S
ID := NUM ; PRINT ( Elist )
ID := NUM ; PRINT ( E )
ID := NUM ; PRINT ( NUM )

non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S, Elist )
8. Elist ::= E
9. Elist ::= Elist, E

Another way to derive the same string:

left-most derivation:          right-most derivation:
S                              S
S ; S                          S ; S
ID := E ; S                    S ; PRINT ( Elist )
ID := NUM ; S                  S ; PRINT ( E )
ID := NUM ; PRINT ( Elist )    S ; PRINT ( NUM )
ID := NUM ; PRINT ( E )        ID := E ; PRINT ( NUM )
ID := NUM ; PRINT ( NUM )      ID := NUM ; PRINT ( NUM )

Parse Trees
Representing derivations as trees
– useful in compilers: parse trees correspond quite closely (but not exactly) to the abstract syntax trees we're trying to generate
– difference: abstract syntax vs concrete (parse) syntax
each internal node is labeled with a non-terminal
each leaf node is labeled with a terminal
each use of a rule in a derivation explains how to generate children in the parse tree from the parent

Parse Trees
Example:

S
├── S
│   ├── ID
│   ├── :=
│   └── E
│       └── NUM
├── ;
└── S
    ├── PRINT
    ├── (
    ├── Elist
    │   └── E
    │       └── NUM
    └── )

S
S ; S
ID := E ; S
ID := NUM ; S
ID := NUM ; PRINT ( Elist )
ID := NUM ; PRINT ( E )
ID := NUM ; PRINT ( NUM )

Parse Trees
Example: 2 derivations, but 1 tree

S
├── S
│   ├── ID
│   ├── :=
│   └── E
│       └── NUM
├── ;
└── S
    ├── PRINT
    ├── (
    ├── Elist
    │   └── E
    │       └── NUM
    └── )

left-most derivation:          right-most derivation:
S                              S
S ; S                          S ; S
ID := E ; S                    S ; PRINT ( Elist )
ID := NUM ; S                  S ; PRINT ( E )
ID := NUM ; PRINT ( Elist )    S ; PRINT ( NUM )
ID := NUM ; PRINT ( E )        ID := E ; PRINT ( NUM )
ID := NUM ; PRINT ( NUM )      ID := NUM ; PRINT ( NUM )

Parse Trees
parse trees have meaning.
– the order of children and the nesting of subtrees is significant
[figure: two parse trees over the same symbols whose subtrees are ordered and nested differently]

Ambiguous Grammars
a grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees

Ambiguous Grammars
non-terminals: E
terminals: ID NUM PLUS MULT
E ::= ID | NUM | E + E | E * E
(I like using this notation, where I avoid repeating E ::=)

characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)

E
├── E
│   ├── E
│   │   └── NUM(4)
│   ├── +
│   └── E
│       └── NUM(5)
├── *
└── E
    └── NUM(6)

Ambiguous Grammars
non-terminals: E
terminals: ID NUM PLUS MULT
E ::= ID | NUM | E + E | E * E

characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)

E                       E
├── E                   ├── E
│   ├── E               │   └── NUM(4)
│   │   └── NUM(4)      ├── +
│   ├── +               └── E
│   └── E                   ├── E
│       └── NUM(5)          │   └── NUM(5)
├── *                       ├── *
└── E                       └── E
    └── NUM(6)                  └── NUM(6)

Ambiguous Grammars
problem: compilers use parse trees to interpret the meaning of parsed expressions
– different parse trees have different meanings
– e.g.: (4 + 5) * 6 is not 4 + (5 * 6)
– languages with ambiguous grammars are DISASTROUS: the meaning of programs isn't well-defined, so you can't tell what your program might do!
solution: rewrite the grammar to eliminate the ambiguity
– fold precedence rules into the grammar to disambiguate
– fold associativity rules into the grammar to disambiguate
– other tricks as well
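For the expression grammar above, the usual way to fold in precedence and left-associativity is one non-terminal per precedence level. A standard rewrite (a sketch, not taken from these slides) is:

```
E ::= E + T | T        (+ is left-associative, lowest precedence)
T ::= T * F | F        (* binds tighter than +)
F ::= ID | NUM
```

Under this grammar, 4 + 5 * 6 has exactly one parse tree, the one meaning 4 + (5 * 6). Note that E and T are left-recursive, which matters for the top-down parsers discussed later.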

Building Parsers
In theory classes, you might have learned about general mechanisms for parsing all CFGs
– algorithms for parsing all CFGs are expensive
– to compile 1/10/100 million-line applications, compilers must be fast
– even for 10 thousand-line apps, speed is nice; sometimes 1/3 of compilation time is spent in parsing
compiler writers have developed specialized algorithms for parsing the kinds of CFGs that you need to build effective programming languages
– LL(k) and LR(k) grammars can be parsed efficiently

Recursive Descent Parsing
Recursive descent parsing (Appel Chap 3.2):
– aka: predictive parsing; top-down parsing
– simple, efficient
– can be coded by hand in ML quickly
– parses many, but not all, CFGs: it parses the LL(1) grammars
  (Left-to-right parse; Leftmost derivation; 1 symbol of lookahead)
– key ideas:
  one recursive function for each non-terminal
  each production becomes one clause in the function

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
1. S ::= IF E THEN S ELSE S
2.   | BEGIN S L
3.   | PRINT E
4. L ::= END
5.   | ; S L
6. E ::= NUM = NUM

non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
1. S ::= IF E THEN S ELSE S
2.   | BEGIN S L
3.   | PRINT E
4. L ::= END
5.   | ; S L
6. E ::= NUM = NUM

Step 1: represent the tokens

  datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

Step 2: build infrastructure for reading tokens from the lexing stream

  val tok = ref (getToken ())
  fun advance () = tok := getToken ()
  fun eat t = if (!tok = t) then advance () else error ()


non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
1. S ::= IF E THEN S ELSE S
2.   | BEGIN S L
3.   | PRINT E
4. L ::= END
5.   | ; S L
6. E ::= NUM = NUM

Step 3: write the parser: one function per non-terminal; one clause per rule

  datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

  val tok = ref (getToken ())
  fun advance () = tok := getToken ()
  fun eat t = if (!tok = t) then advance () else error ()

  fun S () = case !tok of
      IF => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
    | BEGIN => (eat BEGIN; S (); L ())
    | PRINT => (eat PRINT; E ())
  and L () = case !tok of
      END => eat END
    | SEMI => (eat SEMI; S (); L ())
  and E () = (eat NUM; eat EQ; eat NUM)
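The same parser transcribes almost mechanically into other languages. Here is my own Python sketch of it (not course code; the mutable token reference is replaced by an index into a token list):

```python
class ParseError(Exception):
    pass

def parse(tokens):
    """Recursive-descent parser for: S ::= IF E THEN S ELSE S | BEGIN S L | PRINT E,
    L ::= END | SEMI S L,  E ::= NUM EQ NUM."""
    state = {"i": 0}                      # position in the token stream
    def tok():
        return tokens[state["i"]] if state["i"] < len(tokens) else "EOF"
    def eat(t):
        if tok() == t:
            state["i"] += 1
        else:
            raise ParseError(f"expected {t}, saw {tok()}")
    def S():                              # one clause per S production
        if tok() == "IF":
            eat("IF"); E(); eat("THEN"); S(); eat("ELSE"); S()
        elif tok() == "BEGIN":
            eat("BEGIN"); S(); L()
        elif tok() == "PRINT":
            eat("PRINT"); E()
        else:
            raise ParseError(f"no S rule starts with {tok()}")
    def L():
        if tok() == "END":
            eat("END")
        elif tok() == "SEMI":
            eat("SEMI"); S(); L()
        else:
            raise ParseError(f"no L rule starts with {tok()}")
    def E():
        eat("NUM"); eat("EQ"); eat("NUM")
    S()
    if tok() != "EOF":
        raise ParseError("trailing tokens")
    return True
```

For example, parse("BEGIN PRINT NUM EQ NUM SEMI PRINT NUM EQ NUM END".split()) succeeds, while dropping the END raises a ParseError.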

non-terminals: A, S, E, L
rules:
1. A ::= S EOF
2. S ::= ID := E
3.   | PRINT ( L )
4. E ::= ID
5.   | NUM
6. L ::= E
7.   | L , E

  fun A () = (S (); eat EOF)
  and S () = case !tok of
      ID => (eat ID; eat ASSIGN; E ())
    | PRINT => (eat PRINT; eat LPAREN; L (); eat RPAREN)
  and E () = case !tok of
      ID => eat ID
    | NUM => eat NUM
  and L () = case !tok of
      ID => ???
    | NUM => ???

problem
predictive parsing only works for grammars where the first terminal symbol of each sub-expression provides enough information to choose which production to use
– such grammars are called LL(1)
if !tok = ID, the parser cannot determine which production to use:
6. L ::= E      (E could be ID)
7.   | L , E    (L could be E, which could be ID)

solution
eliminate the left-recursion: rewrite the grammar so it parses the same language, but with different rules:

old:            new:
L ::= E         L ::= E M
  | L , E       M ::= , E M
                  |

(the remaining rules are unchanged in both versions:)
A ::= S EOF
S ::= ID := E
  | PRINT ( L )
E ::= ID
  | NUM

eliminating left-recursion in general

original grammar form:       transformed grammar:
X ::= base                   X ::= base Xnew
X ::= X repeat               Xnew ::= repeat Xnew
                             Xnew ::=

strings: base repeat repeat ...
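Applied to the list grammar from the previous slide (base = E, repeat = ", E"), the transformed rules become directly codable: each call of the new non-terminal consumes one ", E". A minimal Python sketch (my own; each function takes a token index and returns the index just past what it parsed):

```python
def parse_E(toks, i):
    # E ::= ID | NUM
    if i < len(toks) and toks[i] in ("ID", "NUM"):
        return i + 1
    raise SyntaxError(f"expected ID or NUM at position {i}")

def parse_L(toks, i):
    # L ::= E M        (was the left-recursive: L ::= E | L , E)
    return parse_M(toks, parse_E(toks, i))

def parse_M(toks, i):
    # M ::= , E M | (empty)
    if i < len(toks) and toks[i] == ",":
        return parse_M(toks, parse_E(toks, i + 1))
    return i

print(parse_L("ID , NUM , ID".split(), 0))
```

A left-recursive parse_L would call itself immediately without consuming a token and loop forever; the rewritten M only recurses after consuming a comma.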

Recursive Descent Parsing
Unfortunately, transformations like these (left factoring, left-recursion removal) don't always work.
Questions:
– how do we know when we can parse grammars using recursive descent?
– is there an algorithm for generating such parsers automatically?

Constructing RD Parsers
To construct an RD parser, we need to know what rule to apply when
– we have seen a non-terminal X, and
– we see the next terminal a in the input
We apply the rule X ::= s when
– a is the first symbol that can be generated by the string s, OR
– s reduces to the empty string (is nullable) and a is the first symbol in any string that can follow X


Constructing Predictive Parsers
1. Y ::=
2.   | bb
3. X ::= c
4.   | Y Z
5. Z ::= d

non-terminal seen   next terminal   rule
X                   c
X                   b
X                   d

Constructing Predictive Parsers
1. Y ::=
2.   | bb
3. X ::= c
4.   | Y Z
5. Z ::= d

non-terminal seen   next terminal   rule
X                   c               3
X                   b
X                   d

Constructing Predictive Parsers
1. Y ::=
2.   | bb
3. X ::= c
4.   | Y Z
5. Z ::= d

non-terminal seen   next terminal   rule
X                   c               3
X                   b               4
X                   d

Constructing Predictive Parsers
1. Y ::=
2.   | bb
3. X ::= c
4.   | Y Z
5. Z ::= d

non-terminal seen   next terminal   rule
X                   c               3
X                   b               4
X                   d               4

Constructing Predictive Parsers
in general, we must compute:
– for each production X ::= s, whether s can derive the empty string; if yes, X ∈ Nullable
– for each production X ::= s, the set of all first terminals Q derivable from s: Q ⊆ First(X)
– for each non-terminal X, the set of all terminal symbols Q that can immediately follow X: Q ⊆ Follow(X)

Iterative Analysis
Many compiler algorithms are iterative techniques. Iterative analysis applies when:
– we must compute a set of objects with some property P
– P is defined inductively, i.e., there are:
  base cases: objects o1, o2 "obviously" have property P
  inductive cases: if certain objects (o3, o4) have property P, this implies other objects (f o3, f o4) have property P
– the number of objects in the set is finite, or we can represent infinite collections using some finite notation and can find effective termination conditions

Iterative Analysis
general form:
– initialize the set S with the base cases
– apply the inductive rules over and over until you reach a fixed point
  (a fixed point is a set that does not change when you apply an inductive rule)
Nullable, First and Follow sets can be determined through iteration
many program optimizations use iteration, too
worst-case complexity is bad, but average-case complexity is good: iteration "usually" terminates in a couple of rounds
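The general form is short enough to write down once and reuse. A tiny illustration (my own example, not from the slides): graph reachability computed as a fixed point, the same shape the Nullable/First/Follow computations take:

```python
def fixed_point(sets, step):
    """Repeatedly apply `step` (which may grow the sets in place)
    until one application changes nothing."""
    while True:
        before = {k: set(v) for k, v in sets.items()}
        step(sets)
        if sets == before:
            return sets

# base case: each node reaches its direct successors
edges = {"a": {"b"}, "b": {"c"}, "c": set(), "d": set()}

def step(reach):
    # inductive rule: if n reaches m and m reaches p, then n reaches p
    for n in reach:
        for m in set(reach[n]):
            reach[n] |= reach[m]

reach = fixed_point({n: set(s) for n, s in edges.items()}, step)
print(reach["a"])
```

Each round can only add elements to finitely many finite sets, so the iteration must terminate.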

Computing Nullable Sets
non-terminal X is Nullable only if the following constraints are satisfied (computed using iterative analysis):
– base case: if (X ::= ) then X is Nullable
– inductive case: if (X ::= A B C ...) and A, B, C, ... are all Nullable, then X is Nullable

Computing First Sets
First(X) is computed iteratively:
– base case: if T is a terminal symbol, then First(T) = {T}
– inductive case: if X is a non-terminal and (X ::= A B C ...), then
  First(X) = First(X) ∪ First(A B C ...)
  where First(A B C ...) = F1 ∪ F2 ∪ F3 ∪ ... and
    F1 = First(A)
    F2 = First(B), if A is Nullable
    F3 = First(C), if A is Nullable and B is Nullable
    ...

Computing Follow Sets
Follow(X) is computed iteratively:
– base case: initially, we assume nothing in particular follows X (Follow(X) is initially { })
– inductive cases: if (Y ::= s1 X s2) for any strings s1, s2, then
  Follow(X) = First(s2) ∪ Follow(X)
  and, if s2 is Nullable, also
  Follow(X) = Follow(Y) ∪ Follow(X)

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first   follow
Z
Y
X

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first   follow
Z   no
Y   yes
X   no

base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first   follow
Z   no
Y   yes
X   no

after one round of induction, we realize we have reached a fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first    follow
Z   no         d
Y   yes        c
X   no         a, b

base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first      follow
Z   no         d, a, b
Y   yes        c
X   no         a, b

after one round of induction, no fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first      follow
Z   no         d, a, b
Y   yes        c
X   no         a, b

after two rounds of induction, no more changes ==> fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          { }
X   no         a, b       { }

base case

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

after one round of induction, no fixed point

building a predictive parser
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

after two rounds of induction, fixed point (but notice, computing Follow(X) before Follow(Y) would have required a 3rd round)
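The three iterations traced above can be run in a single loop. This is my own Python sketch of the textbook fixed-point computation (not course code), applied to the example grammar:

```python
GRAMMAR = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]),           ("Y", []),
    ("X", ["a"]),           ("X", ["b", "Y", "e"]),
]
NONTERMS = {"X", "Y", "Z"}

def analyze(grammar, nonterms):
    nullable = set()
    first = {n: set() for n in nonterms}
    follow = {n: set() for n in nonterms}
    def first_of(sym):
        return first[sym] if sym in nonterms else {sym}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for lhs, rhs in grammar:
            # Nullable: every symbol on the right-hand side is nullable
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            # First: union the firsts of symbols while the prefix is nullable
            for s in rhs:
                if not first_of(s) <= first[lhs]:
                    first[lhs] |= first_of(s)
                    changed = True
                if s not in nullable:
                    break
            # Follow: for each non-terminal occurrence, look at what trails it
            for i, s in enumerate(rhs):
                if s not in nonterms:
                    continue
                tail = rhs[i + 1:]
                for t in tail:
                    if not first_of(t) <= follow[s]:
                        follow[s] |= first_of(t)
                        changed = True
                    if t not in nullable:
                        break
                if all(t in nullable for t in tail) and not follow[lhs] <= follow[s]:
                    follow[s] |= follow[lhs]
                    changed = True
    return nullable, first, follow

nullable, first, follow = analyze(GRAMMAR, NONTERMS)
print(nullable, first, follow)
```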

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a   b   c   d   e
Z
Y
X

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a            b            c   d   e
Z   Z ::= XYZ    Z ::= XYZ
Y
X

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a            b            c   d         e
Z   Z ::= XYZ    Z ::= XYZ        Z ::= d
Y
X

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a            b            c         d         e
Z   Z ::= XYZ    Z ::= XYZ              Z ::= d
Y                             Y ::= c
X

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a            b            c         d         e
Z   Z ::= XYZ    Z ::= XYZ              Z ::= d
Y   Y ::=        Y ::=        Y ::= c   Y ::=     Y ::=
X

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Build a parsing table where row X, col T tells the parser which clause to execute in function X with next-token T:

    a            b              c         d         e
Z   Z ::= XYZ    Z ::= XYZ                Z ::= d
Y   Y ::=        Y ::=          Y ::= c   Y ::=     Y ::=
X   X ::= a      X ::= b Y e

if T ∈ First(s), then enter (X ::= s) in row X, col T
if s is Nullable and T ∈ Follow(X), enter (X ::= s) in row X, col T

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

What are the blanks?

    a            b              c         d         e
Z   Z ::= XYZ    Z ::= XYZ                Z ::= d
Y   Y ::=        Y ::=          Y ::= c   Y ::=     Y ::=
X   X ::= a      X ::= b Y e

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

What are the blanks? --> syntax errors

    a            b              c         d         e
Z   Z ::= XYZ    Z ::= XYZ                Z ::= d
Y   Y ::=        Y ::=          Y ::= c   Y ::=     Y ::=
X   X ::= a      X ::= b Y e

Grammar:
Z ::= X Y Z
Z ::= d
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Is it possible to put 2 grammar rules in the same box?

    a            b              c         d         e
Z   Z ::= XYZ    Z ::= XYZ                Z ::= d
Y   Y ::=        Y ::=          Y ::= c   Y ::=     Y ::=
X   X ::= a      X ::= b Y e

Grammar:
Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e

Computed sets:
    nullable   first      follow
Z   no         d, a, b    { }
Y   yes        c          e, d, a, b
X   no         a, b       c, e, d, a, b

Is it possible to put 2 grammar rules in the same box?

    a            b              c         d           e
Z   Z ::= XYZ    Z ::= XYZ                Z ::= d
                                          Z ::= d e
Y   Y ::=        Y ::=          Y ::= c   Y ::=       Y ::=
X   X ::= a      X ::= b Y e
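The two table-filling rules can be coded directly, and running them answers both questions: blanks are simply missing entries, and the extra production Z ::= d e really does put two rules in one box. My own Python sketch (not course code), with the nullable/first/follow values from the earlier slides hard-coded:

```python
NONTERMS = {"X", "Y", "Z"}
NULLABLE = {"Y"}
FIRST = {"Z": {"d", "a", "b"}, "Y": {"c"}, "X": {"a", "b"}}
FOLLOW = {"Z": set(), "Y": {"e", "d", "a", "b"}, "X": {"c", "e", "d", "a", "b"}}

def first_and_nullable(s):
    """First set of a string of symbols, and whether the whole string is nullable."""
    out = set()
    for sym in s:
        out |= FIRST[sym] if sym in NONTERMS else {sym}
        if sym not in NULLABLE:
            return out, False
    return out, True

def make_table(grammar):
    table = {}
    for lhs, rhs in grammar:
        fs, nul = first_and_nullable(rhs)
        cols = set(fs)                # rule 1: T in First(s)
        if nul:
            cols |= FOLLOW[lhs]       # rule 2: s nullable, T in Follow(lhs)
        for t in cols:
            table.setdefault((lhs, t), []).append(rhs)
    return table

GRAMMAR = [
    ("Z", ["X", "Y", "Z"]), ("Z", ["d"]),
    ("Y", ["c"]),           ("Y", []),
    ("X", ["a"]),           ("X", ["b", "Y", "e"]),
]
table = make_table(GRAMMAR)
assert all(len(rules) == 1 for rules in table.values())   # no duplicates: LL(1)

# with Z ::= d e added, box (Z, d) holds two rules: not LL(1)
conflicted = make_table(GRAMMAR + [("Z", ["d", "e"])])
print(conflicted[("Z", "d")])
```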

predictive parsing tables
if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
– Left-to-right parse, Left-most derivation, 1 symbol of lookahead
if not, the grammar is not LL(1)
in an LL(k) parsing table, columns include every k-length sequence of terminals: aa, ab, ba, bb, ac, ca, ...

another trick
Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases
the example non-LL(1) grammar we just saw:

Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e

how do we fix it?

another trick
Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases
the example non-LL(1) grammar we just saw:

Z ::= X Y Z
Z ::= d
Z ::= d e
Y ::= c
Y ::=
X ::= a
X ::= b Y e

the solution here is left-factoring:

Z ::= X Y Z
Z ::= d W
W ::=
W ::= e
Y ::= c
Y ::=
X ::= a
X ::= b Y e
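After left-factoring, one token of lookahead is again enough: the d-versus-de decision is postponed into W, which only has to look at the single next token. A small Python sketch of the resulting predictive parser (my own function names; each function returns the index just past what it parsed):

```python
def parse_Z(toks, i):
    if i < len(toks) and toks[i] in ("a", "b"):   # First(X Y Z) = {a, b}
        return parse_Z(toks, parse_Y(toks, parse_X(toks, i)))
    if i < len(toks) and toks[i] == "d":          # Z ::= d W
        return parse_W(toks, i + 1)
    raise SyntaxError("expected a, b or d")

def parse_W(toks, i):
    if i < len(toks) and toks[i] == "e":          # W ::= e
        return i + 1
    return i                                      # W ::= (empty)

def parse_Y(toks, i):
    if i < len(toks) and toks[i] == "c":          # Y ::= c
        return i + 1
    return i                                      # Y ::= (empty)

def parse_X(toks, i):
    if i < len(toks) and toks[i] == "a":          # X ::= a
        return i + 1
    if i < len(toks) and toks[i] == "b":          # X ::= b Y e
        j = parse_Y(toks, i + 1)
        if j < len(toks) and toks[j] == "e":
            return j + 1
        raise SyntaxError("expected e")
    raise SyntaxError("expected a or b")

def accepts(s):
    toks = s.split()
    try:
        return parse_Z(toks, 0) == len(toks)
    except SyntaxError:
        return False
```

Both accepts("d") and accepts("d e") hold, which is exactly the pair of inputs that one token of lookahead could not separate before the rewrite.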

summary
CFGs are good at specifying programming language structure
parsing general CFGs is expensive, so we define parsers for simple classes of CFG
– LL(k), LR(k)
we can build a recursive descent parser for LL(k) grammars by:
– computing nullable, first and follow sets
– constructing a parse table from the sets
– checking for duplicate entries, which would indicate failure
– creating an ML program from the parse table
if parser construction fails, we can:
– rewrite the grammar (left factoring, eliminating left recursion) and try again
– try to build a parser using some other method