Presentation Outline Review of Lexical analysis

Presentation Outline Review of Lexical analysis Introduction to Syntax Analysis Context Free Grammar Parsing Grammar Ambiguity Top Down Parser Bottom Up Parser

Introduction to Syntax Analysis Every programming language has precise rules that prescribe the syntactic structure of well-formed programs. The Syntax Analysis phase of a compiler has two major goals: Check the input program to determine whether it is syntactically correct. Produce either a complete parse tree or at least trace the structure of the complete parse tree for syntactically correct input.

Some Basic Definitions syntax: the way in which words are put together to form phrases, clauses, or sentences. The rules governing the formation of statements in a programming language. syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax. parsing: To break a sentence down into its component parts of speech with an explanation of the form, function, and syntactical relationship of each part.

Some Basic Definitions Syntactic structure: the syntactic structure of programming languages can be informally expressed by the following hierarchy, from the largest unit to the smallest: Program, Block(s), Statement(s), Expression(s), Token(s).

Context-free Grammars

Context-free Grammars Definition CFG = (V, T, P, S) ❚ V : Finite set of variables/non-terminals ❚ T : Alphabet/Finite set of terminals ❚ P : Finite set of rules/productions ❚ S : Start symbol, S ∈ V. V ∩ T = ∅. Rule: A → α, where A ∈ V and α ∈ (V ∪ T)*

Context-free Grammars Definition ❚ Context-freeness: An A-rule can be applied whenever A occurs in a string, irrespective of the context (that is, the non-terminals and terminals around A).

Context-free Grammars Derivation ❚ One-step derivation: uAv ⇒ uαv if A → α is a rule. ❚ w is derivable from v in a CFG if there is a finite sequence of rule applications such that v ⇒ w1 ⇒ ... ⇒ wn ⇒ w. In this case we write the derivation as v ⇒* w.

Context-free Grammars Derivation A derivation v ⇒* w is called: Leftmost derivation: if in every step the leftmost variable is selected for replacement. Rightmost derivation: if in every step the rightmost variable is selected for replacement.

Context-free Grammars Example 1 Let G = ({S}, {a, b}, P, S) with P: ❚ S → aSa, S → bSb, and S → λ. ❚ Some derivations from this grammar: ❚ S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa ❚ S ⇒ bSb ⇒ baSab ⇒ baab, and so on. ❚ In general S ⇒* ww^R for w ∈ {a, b}*, where w^R is the reverse of w. L(G) = {ww^R : w ∈ {a, b}*}
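The derivations of Example 1 can be replayed mechanically. The following is a minimal sketch, assuming every grammar symbol is a single character and that the empty string stands for λ; the function name `leftmost_derivation` and the dictionary layout are illustrative choices, not part of the slides.

```python
def leftmost_derivation(grammar, start, choices):
    """Replay a leftmost derivation.

    grammar: dict mapping each non-terminal to its right-hand sides (strings).
    choices: the right-hand side to apply at each step, in order.
    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = ["".join(form)]
    for rhs in choices:
        # Locate the leftmost non-terminal of the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        if rhs not in grammar[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs!r} is not a production")
        form[i:i + 1] = list(rhs)  # "" (lambda) simply deletes the symbol
        steps.append("".join(form))
    return steps


GRAMMAR = {"S": ["aSa", "bSb", ""]}  # S -> aSa | bSb | lambda
steps = leftmost_derivation(GRAMMAR, "S", ["aSa", "aSa", "bSb", ""])
print(" => ".join(steps))
```

Because this grammar has a single non-terminal, leftmost and rightmost derivations coincide here; the distinction only matters for grammars such as the one in Example 2.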

Context-free Grammars Example 2 G = ({S, A, B}, {a, b}, {S → AB, A → aA | λ, B → Bb | λ}, S) L(G) = L(a*b*) Leftmost derivation: S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab Rightmost derivation: S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab

Context-free Grammars Example 4 Consider the CFG: G = ({S}, {a, b}, {S → λ, S → aSb}, S) ❚ The derivation of aabb is S ⇒ aSb ⇒ aaSbb ⇒ aabb

Context-free Grammars Example 5 Consider the CFG G: S → aSa | aBa B → bB | b L(B) = {b^m | m > 0} L(S) = {a^n b^m a^n | n > 0 ∧ m > 0} L(G) = L(S)

Context-free Grammars Example 6 Consider the CFG G1: S → aSa | B B → bB | λ The language generated by G1 is: L(G1) = {a^n b^m a^n | n ≥ 0 ∧ m ≥ 0} Consider the CFG G2: S → abSc | λ The language generated by G2 is: L(G2) = {(ab)^n c^n | n ≥ 0}

Context-free Grammars Derivation Tree: Example 1 Consider the CFG G = ({S}, {a, b}, {S → λ, S → aSb}, S) ❚ The derivation of aabb is: S ⇒ aSb ⇒ aaSbb ⇒ aabb ❚ The derivation tree, in bracket notation, is S( a S( a S( λ ) b ) b )

Context-free Grammars Derivation Tree: Example 2 A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11 The derivation tree, in bracket notation, is A( 0 A( 0 A( B( # ) ) 1 ) 1 )

Context-free Grammars Derivation Tree: Example 3 <EXPR> → <EXPR> + <EXPR> <EXPR> → <EXPR> * <EXPR> <EXPR> → ( <EXPR> ) <EXPR> → a Build a parse tree for a + a * a [Figure: parse tree for a + a * a]

CFG: Parsing Recognition of strings in a language

CFG: Parsing Generative aspect of CFG: By now it should be clear how, from a CFG G, you can derive strings w ∈ L(G). Analytical aspect: Given a CFG G and a string w, how do you decide whether w ∈ L(G) and, if so, how do you determine the derivation tree or the sequence of production rules that produce w? This is called the problem of parsing.

CFG: Parsing Parser: a program that determines whether a string w ∈ L(G) by constructing a derivation. Equivalently, it searches the graph of G. Top-down parsers: construct the derivation tree from root to leaves, producing a leftmost derivation. Bottom-up parsers: construct the derivation tree from leaves to root, producing a rightmost derivation in reverse.

CFG: Parsing Parse trees (= derivation trees) A parse tree is a graphical representation of a derivation sequence of a sentential form. Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

CFG: Parsing Parse Tree: Example Given the following grammar: E → E + E | E * E | ( E ) | - E | id Is the string -(id + id) a sentence in this grammar? Yes, because there is the following derivation: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

CFG: Parsing Parse Tree: Example 1 E → E + E | E * E | ( E ) | - E | id Let's examine this derivation: E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id) This is a top-down derivation because we start building the parse tree at the top. [Figure: the parse tree grown step by step from the root E down to the leaves - ( id + id )]

CFG: Parsing Parse Tree: Example 2 S → SS | a | b, ab ∈ L(S) Leftmost derivation: S ⇒ SS ⇒ aS ⇒ ab [Figure: derivation trees after each step of the leftmost derivation]

CFG: Parsing Parse Tree: Example 2 Rightmost derivation: S ⇒ SS ⇒ Sb ⇒ ab Read backwards (ab, Sb, SS, S), the same steps give the rightmost derivation in reverse. [Figure: derivation trees after each step of the rightmost derivation, and the same trees built in reverse order]

CFG: Parsing Example 3 Consider the CFG G: S → A A → T | A + T T → b | ( A ) Show that (b)+b ∈ L(G). One derivation is S ⇒ A ⇒ A + T ⇒ T + T ⇒ (A) + T ⇒ (T) + T ⇒ (b) + T ⇒ (b) + b. [Figure: step-by-step construction of the parse tree for (b)+b]

CFG: Parsing Practical Parsers Language/grammar designed to enable deterministic (directed and backtrack-free) searches. Top-down parsers: LL(k) languages, e.g., Pascal, Ada, etc. Better error diagnosis and recovery. Bottom-up parsers: LALR(1), LR(k) languages, e.g., C/C++, Java, etc. Handle left recursion in the grammar. Backtracking parsers: e.g., the Prolog interpreter.

Grammar Ambiguity Definition: a string is derived ambiguously in a context-free grammar if it has two or more different parse trees. Definition: a grammar is ambiguous if it generates some string ambiguously.

Grammar Ambiguity A string w ∈ L(G) is derived ambiguously if it has more than one derivation tree (or equivalently: if it has more than one leftmost derivation, or more than one rightmost derivation). A grammar is ambiguous if some strings are derived ambiguously. Typical example: rule S → 0 | 1 | S+S | S×S S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1 versus S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1

Grammar Ambiguity The ambiguity of 0×1+1 is shown by the two different parse trees: [Figure: one tree with + at the root and 0×1 as its left subtree; the other with × at the root and 1+1 as its right subtree]

Grammar Ambiguity Note that the two different derivations S ⇒ S+S ⇒ 0+S ⇒ 0+1 and S ⇒ S+S ⇒ S+1 ⇒ 0+1 do not make 0+1 an ambiguous string, as they have the same parse tree. [Figure: the single parse tree for 0+1] Ambiguity causes trouble when trying to interpret strings like: “She likes men who love women who don't smoke.” Solutions: use parentheses, or use precedence rules such as a+(b×c) = a+b×c ≠ (a+b)×c.

Grammar Ambiguity Example <EXPR> → <EXPR> + <EXPR> <EXPR> → <EXPR> * <EXPR> <EXPR> → ( <EXPR> ) <EXPR> → a Build a parse tree for a + a * a [Figure: two different parse trees for a + a * a, one grouping a + (a * a) and the other (a + a) * a]

Grammar Ambiguity Example E → E + E | E * E | ( E ) | - E | id Find a derivation for the expression: id + id * id [Figure: two derivation trees, one with + at the root, grouping id + (id * id), and one with * at the root, grouping (id + id) * id] Which derivation tree is correct?

Grammar Ambiguity Example E → E + E | E * E | ( E ) | - E | id Find a derivation for the expression: id + id * id According to the grammar, both are correct. A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar. [Figure: the same two parse trees for id + id * id]
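The claim that id + id * id has two parse trees can be checked by counting. The sketch below is an illustration only: it assumes the reduced grammar E → E + E | E * E | id (parentheses and unary minus are left out), and the name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any string is enough to show that the grammar is ambiguous; a count of one for a particular string says nothing about other strings.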

Grammar Ambiguity One way to resolve ambiguity is to associate precedence to the operators. Example: * has precedence over +, so 1 + 2 * 3 = 1 + (2 * 3) and 1 + 2 * 3 ≠ (1 + 2) * 3. Associativity and precedence information is typically used to disambiguate non-fully parenthesized expressions containing unary prefix/postfix operators or binary infix operators.
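One common way to apply precedence and associativity while parsing is precedence climbing, a technique the slides do not name. The following is a minimal sketch under the assumption that operands are integer tokens and that only + and * occur; the table `PRECEDENCE` and the function `evaluate` are illustrative names.

```python
PRECEDENCE = {"+": 1, "*": 2}  # assumption: a larger number binds tighter


def evaluate(tokens):
    """Evaluate a flat token list such as ['1', '+', '2', '*', '3']."""
    pos = 0

    def parse_expr(min_prec):
        nonlocal pos
        value = int(tokens[pos])  # every operand is an integer literal
        pos += 1
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Using prec + 1 for the right operand makes the operator left-associative.
            rhs = parse_expr(PRECEDENCE[op] + 1)
            value = value + rhs if op == "+" else value * rhs
        return value

    return parse_expr(1)


print(evaluate("1 + 2 * 3".split()))
```

With these precedences the expression is grouped as 1 + (2 * 3), matching the example above.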

Grammar Ambiguity Example Grammar: stm → if expr then stm | if expr then stm else stm Ambiguity: if B1 then if B2 then S1 else S2 can be read as if B1 then (if B2 then S1 else S2) vs if B1 then (if B2 then S1) else S2

Grammar Ambiguity Quiz 1 Is the following grammar ambiguous? S → PC | AQ P → aPb | λ C → cC | λ Q → bQc | λ A → aA | λ Yes: consider the string abc, which can be derived both from PC (P ⇒* ab, C ⇒* c) and from AQ (A ⇒* a, Q ⇒* bc).

Grammar Ambiguity Quiz 2 Is the following grammar ambiguous? S → aS | Sb | ab | λ Yes: consider ab, which has the derivations S ⇒ ab and S ⇒ aS ⇒ aSb ⇒ ab.

Grammar Ambiguity Quiz 3 Is the following grammar ambiguous? S → SS | λ Yes. Cyclic structure: S ⇒ SS ⇒ SSS ⇒ ... (Illustrates an ambiguous grammar with cycles.)

Grammar Applications Programming Languages Programming languages are often defined as Context Free Grammars in Backus-Naur Form (BNF). Example: <if_statement> ::= IF <expression><then_clause><else_clause> <expression> ::= <term> | <expression>+<term> <term> ::= <factor>|<term>*<factor> Variables are indicated by <a variable name>. The arrow → is replaced by ::=. Here, IF, + and * are terminals. “Syntax checking” is checking whether a program is an element of the language generated by the CFG of the programming language.

Grammar Applications Compiler Syntax Analysis Compiler: Source Program, Scanner, Parser, Semantic Analy., Inter. Code Gen., Optimizer, Code Generation, Target Program. The parser is the part of the compiler that uses the grammar.

[Figure: interaction between the lexical analyzer and the syntax analyzer. 1. The lexical analyzer reads the source program through "get next char"; it uses regular expressions to define tokens and finite automata to recognize tokens. 2. The syntax analyzer requests tokens through "get next token"; it uses top-down parsing or bottom-up parsing to construct a parse tree. Both use the symbol table, which contains a record for each identifier.]

Syntax errors Parsing errors include: 1. misspelling of identifier, keyword, or operator 2. arithmetic expression with unbalanced parentheses 3. punctuation errors such as using comma in place of semicolon 4. missing brackets, semicolons, etc.

Error recovery The error handler in a parser has the following jobs: 1. report the presence of errors clearly and accurately 2. recover from each error quickly 3. not slow down the processing of correct programs

Example: The following C code shows some examples of syntax errors: #include<stdio.h> int max(int I int j) { if(i>j) return(i) return(j); } void main() int x, y scanf("%d %d", x, y); printf("%d", max(x,y) ; ; ,

Example: A typical compilation of this erroneous program gives the following list of errors:
1. error C2235: ';' in formal parameter list
2. error C2059: syntax error : ')'
3. error C2239: unexpected token 'f' following declaration
4. error C2078: too many initializers of 'j'
5. error C2660: 'max' : function does not take 2 parameters
6. error C2143: syntax error : missing ')' before ';'

Example: The correct version of this program is
#include <stdio.h>
int max(int i, int j)
{
    if (i > j) return (i);
    return (j);
}
int main(void)
{
    int x, y;
    scanf("%d %d", &x, &y);  /* scanf needs the addresses of x and y */
    printf("%d", max(x, y));
    return 0;
}

Significance of Context-Free Grammars Grammars offer several significant advantages: 1. Easy to understand and construct programs 2. Easy parsing 3. Easy error detection and handling 4. Easy language extension

Parsing Top Down Parsing: Predictive Parsing, LL(k) Parsing, Left Recursion, Left Factoring. Bottom Up Parsing: Shift-reduce Parsing, LR(k) Parsing.

Parsing Top Down Parsing Top-down parsers start constructing the parse tree at the top (root) of the tree and move down towards the leaves. They are easy to implement by hand, but work only with restricted grammars. Example: predictive parsers

Left Recursion Consider the grammar: E → E + T | T T → T * F | F F → ( E ) | id A top-down parser might loop forever when parsing an expression E using this grammar. [Figure: E expanded to E + T again and again without consuming any input]

Left Recursion Consider the grammar: E → E + T | T T → T * F | F F → ( E ) | id A grammar that has at least one production of the form A → Aα is a left-recursive grammar. Top-down parsers do not work with left-recursive grammars. Left recursion can often be eliminated by rewriting the grammar.

Left Recursion This left-recursive grammar: E → E + T | T T → T * F | F F → ( E ) | id can be rewritten to eliminate the immediate left recursion: E → TE’ E’ → +TE’ | λ T → FT’ T’ → *FT’ | λ F → ( E ) | id
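The rewriting follows the usual pattern: A → Aα | β becomes A → βA’ and A’ → αA’ | λ. The following is a minimal sketch of that transformation for a single non-terminal, assuming right-hand sides are lists of symbols and that an empty list stands for λ; the function name is an illustrative choice, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | lambda.

    productions: list of right-hand sides, each a list of symbols.
    Returns a dict mapping non-terminals to their new right-hand sides.
    """
    recursive = [p[1:] for p in productions if p and p[0] == nt]   # the alphas
    others = [p for p in productions if not p or p[0] != nt]       # the betas
    if not recursive:
        return {nt: productions}  # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is lambda
    }


print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```

Applied to E and T, this sketch reproduces the rewritten productions of the slide; F has no left-recursive production and is returned unchanged.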

Predictive Parsing Consider the grammar: stmt → if expr then stmt else stmt | while expr do stmt | begin stmt_list end A parser for this simple grammar can be written with the structure: switch(gettoken()) { case if: …. break; case while: …. break; case begin: …. break; default: reject input; } Based only on the first token, the parser knows which rule to use to derive a statement. Therefore this is called a predictive parser.
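The same dispatch on the first token can be written as a small recursive recognizer. This is only a sketch: the slide does not define expr or stmt_list, so the code assumes (hypothetically) that an expr is a single identifier token and that stmt_list is zero or more statements up to `end`. The names `parse_stmt` and `KEYWORDS` are likewise illustrative.

```python
KEYWORDS = {"if", "then", "else", "while", "do", "begin", "end"}


def parse_stmt(tokens, pos=0):
    """Recognize one stmt starting at tokens[pos]; return the position after it."""

    def expect(p, tok):
        if p >= len(tokens) or tokens[p] != tok:
            raise SyntaxError(f"expected {tok!r} at position {p}")
        return p + 1

    def parse_expr(p):
        # Assumption: an expr is a single identifier that is not a keyword.
        if p >= len(tokens) or tokens[p] in KEYWORDS or not tokens[p].isidentifier():
            raise SyntaxError(f"expected an expression at position {p}")
        return p + 1

    first = tokens[pos] if pos < len(tokens) else None
    if first == "if":        # stmt -> if expr then stmt else stmt
        p = expect(parse_expr(pos + 1), "then")
        p = expect(parse_stmt(tokens, p), "else")
        return parse_stmt(tokens, p)
    if first == "while":     # stmt -> while expr do stmt
        p = expect(parse_expr(pos + 1), "do")
        return parse_stmt(tokens, p)
    if first == "begin":     # stmt -> begin stmt_list end
        p = pos + 1
        while p < len(tokens) and tokens[p] != "end":
            p = parse_stmt(tokens, p)
        return expect(p, "end")
    raise SyntaxError("reject input")  # the default case of the switch


print(parse_stmt("while x do begin end".split()))
```

Each branch is chosen by looking at one token only, which is what makes the recognizer predictive: it never has to back up and try another alternative.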

Left Factoring The following grammar: stmt → if expr then stmt else stmt | if expr then stmt cannot be parsed by a predictive parser that looks one element ahead. But the grammar can be rewritten: stmt → if expr then stmt stmt’ stmt’ → else stmt | λ where λ is the empty string. Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.

Left Factoring The basic idea is, in general, as follows:
1. Let A → αβ1 | αβ2 be two production rules for the nonterminal symbol A.
2. If the input begins with a nonempty string derived from α,
3. and we do not know whether to expand A to αβ1 or αβ2,
4. then we may defer the decision by expanding A to αA'.
5. After seeing the input derived from α, we expand A' to β1 or to β2.
6. This means that, left-factored, the original productions become A → αA' and A' → β1 | β2.
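The steps above can be sketched as a single factoring pass. The code assumes right-hand sides are lists of symbols with an empty list for λ, groups alternatives by their first symbol, and factors out the longest prefix common to each group. The names `common_prefix` and `left_factor` are illustrative, and a full implementation would repeat the pass on the newly created non-terminals.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, productions):
    """One pass of left factoring for the non-terminal nt."""
    groups = {}
    for p in productions:
        groups.setdefault(p[0] if p else None, []).append(p)
    result = {nt: []}
    fresh = nt
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[nt].extend(group)          # nothing to factor
            continue
        alpha = common_prefix(group)
        fresh += "'"                          # new non-terminal A', A'', ...
        result[nt].append(alpha + [fresh])    # A -> alpha A'
        result[fresh] = [p[len(alpha):] for p in group]  # A' -> beta1 | beta2
    return result


stmt = [["if", "expr", "then", "stmt", "else", "stmt"],
        ["if", "expr", "then", "stmt"]]
print(left_factor("stmt", stmt))
```

For the if-then-else grammar this yields stmt → if expr then stmt stmt' and stmt' → else stmt | λ, the same result as the hand rewriting.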

A Predictive Parser How does it work? 1. Construct the parsing table from the given grammar 2. Apply the predictive parsing algorithm to construct the parse tree

A Predictive Parser 1. Construct the parsing table from the given grammar. The following algorithm shows how we can construct the parsing table: Input: a grammar G. Output: the corresponding parsing table M. Method: for each production A → α of the grammar, do the following steps: 1. For each terminal a in FIRST(α), add A → α to M[A, a]. 2. If λ ∈ FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). 3. If λ ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]. How do we construct the FIRST and FOLLOW sets?

The Parsing Table Example Given this grammar: E → TE’ E’ → +TE’ | ε T → FT’ T’ → *FT’ | ε F → ( E ) | id Here ε = λ = empty string. How is this parsing table built?
PARSING TABLE (rows: non-terminals; columns: input symbols):
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |

FIRST and FOLLOW We need to build a FIRST set and a FOLLOW set for each symbol in the grammar. The elements of FIRST and FOLLOW are terminal symbols (in addition, FIRST may contain ε and FOLLOW may contain the end marker $). FIRST(α) is the set of terminal symbols that can begin any string derived from α. FOLLOW(α) is the set of terminal symbols that can follow α: t ∈ FOLLOW(α) ↔ ∃ derivation containing αt

Rules to Create FIRST
GRAMMAR: E → TE’ E’ → +TE’ | ε T → FT’ T’ → *FT’ | ε F → ( E ) | id
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X → ε, then ε ∈ FIRST(X)
3. If X → Y1Y2 ••• Yk and Y1 ••• Yi-1 ⇒* ε and a ∈ FIRST(Yi), then a ∈ FIRST(X)
SETS: FIRST(id) = {id} FIRST(*) = {*} FIRST(+) = {+} FIRST(() = {(} FIRST()) = {)} FIRST(E’) = {+, ε} FIRST(T’) = {*, ε} FIRST(F) = {(, id} FIRST(T) = FIRST(F) = {(, id} FIRST(E) = FIRST(T) = {(, id}
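The three FIRST rules can be applied repeatedly until no set changes. The following is a minimal fixed-point sketch, assuming the grammar is stored as a dictionary of symbol lists with an empty list for ε and E' written with an ASCII apostrophe; `first_of_string` and `compute_first` are illustrative names.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] stands for ε
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string Y1 Y2 ... Yk, given the FIRST sets computed so far."""
    out = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # rule 1: terminals
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out              # Yi cannot vanish, so stop here (rule 3)
    out.add(EPS)                    # every Yi can derive ε (rule 2 when k = 0)
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                  # iterate until a fixed point is reached
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

The iteration is needed because FIRST(E) depends on FIRST(T), which in turn depends on FIRST(F); a single pass in an unlucky order would miss some symbols.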

Rules to Create FOLLOW
GRAMMAR: E → TE’ E’ → +TE’ | ε T → FT’ T’ → *FT’ | ε F → ( E ) | id
FIRST SETS: FIRST(E’) = {+, ε} FIRST(T’) = {*, ε} FIRST(F) = {(, id} FIRST(T) = {(, id} FIRST(E) = {(, id}
FOLLOW rules:
1. If S is the start symbol, then $ ∈ FOLLOW(S)
2. If A → αBβ, and a ∈ FIRST(β) and a ≠ ε, then a ∈ FOLLOW(B)
3. If A → αB and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
3a. If A → αBβ and β ⇒* ε and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
A and B are non-terminals, α and β are strings of grammar symbols.
SETS, built step by step:
FOLLOW(E) = {$} by rule 1, then {), $} by rule 2 applied to F → ( E )
FOLLOW(E’) = {), $} by rule 3 applied to E → TE’
FOLLOW(T) = {), $} by rule 3a applied to E → TE’, then {+, ), $} by rule 2 since + ∈ FIRST(E’)
FOLLOW(T’) = {+, ), $} by rule 3 applied to T → FT’
FOLLOW(F) = {+, ), $} by rule 3a applied to T → FT’, then {+, *, ), $} by rule 2 since * ∈ FIRST(T’)
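The FOLLOW rules lend themselves to the same kind of fixed-point iteration. The sketch below takes the FIRST sets of the slides as given (they are typed in by hand rather than computed), uses an empty list for ε, and writes E' with an ASCII apostrophe; `compute_follow` is an illustrative name.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets copied from the slides.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}


def first_of_string(symbols):
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal is its own FIRST set
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)                            # the whole string can derive ε
    return out


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                              # rule 1
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                        # only non-terminals
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}            # rule 2
                    if EPS in beta_first:               # rules 3 and 3a
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

Rule 3 is covered by the same branch as rule 3a, because an empty β trivially derives ε.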

Rules to Build Parsing Table
GRAMMAR: E → TE’ E’ → +TE’ | ε T → FT’ T’ → *FT’ | ε F → ( E ) | id
FIRST SETS: FIRST(E’) = {+, ε} FIRST(T’) = {*, ε} FIRST(F) = {(, id} FIRST(T) = {(, id} FIRST(E) = {(, id}
FOLLOW SETS: FOLLOW(E) = {), $} FOLLOW(E’) = {), $} FOLLOW(T) = {+, ), $} FOLLOW(T’) = {+, ), $} FOLLOW(F) = {+, *, ), $}
1. If A → α: if a ∈ FIRST(α), add A → α to M[A, a]
2. If A → α: if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A)
3. If A → α: if ε ∈ FIRST(α), and $ ∈ FOLLOW(A), add A → α to M[A, $]
PARSING TABLE M (rows: non-terminals; columns: input symbols):
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |
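The three table-building rules translate almost line by line into code. In the sketch below the FIRST and FOLLOW sets are copied from the slides rather than computed, an empty list stands for ε, and each table cell holds a list of right-hand sides so that multiply-defined entries would be visible. The name `build_table` is illustrative.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW sets copied from the slides.
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_string(symbols):
    out = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out


def build_table(grammar):
    table = {}
    for a, prods in grammar.items():
        for rhs in prods:
            rhs_first = first_of_string(rhs)
            targets = rhs_first - {EPS}          # rule 1
            if EPS in rhs_first:
                targets |= FOLLOW[a]             # rules 2 and 3 ($ is in FOLLOW)
            for terminal in targets:
                table.setdefault((a, terminal), []).append(rhs)
    return table


M = build_table(GRAMMAR)
for (nt, terminal), entries in sorted(M.items()):
    print(f"M[{nt}, {terminal}] =", entries)
```

Since $ is stored in the FOLLOW sets like any other symbol, rule 3 needs no separate branch in this representation.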

A Predictive Parser 2. Apply the predictive parsing algorithm to construct the parse tree. The following algorithm shows how the parser makes its moves for an input string w$ with respect to the parsing table M of a given grammar G. Initially the stack holds the start symbol on top of $.
set ip to point to the first symbol of the input string w$
repeat
  let X = Top(stack) and a = Current-Input(ip)
  if X is a terminal or $ then
    if X = a then Pop(stack) and advance ip
    else error
  else if M[X, a] = X → Y1Y2…Yk then
    begin
      Pop(stack);
      Push Yk, …, Y2, Y1 onto the stack, with Y1 on top;
      Output the production X → Y1Y2…Yk
    end
  else error
until Top(stack) = $ (i.e. the stack becomes empty)

A Predictive Parser 2. Apply the predictive parsing algorithm to construct the parse tree. Example
Grammar: E → TE’ E’ → +TE’ | ε T → FT’ T’ → *FT’ | ε F → ( E ) | id
Parsing Table (rows: non-terminals; columns: input symbols):
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |

if Top(stack) = Current-Input(ip) then Set ip to point to the first symbol of the input string w$ repeat if Top(stack) is a terminal or $ then if Top(stack) = Current-Input(ip) then else else null be Push Y1; Y2;… ; Yk onto the stack, with Y1 on top; else null Set ip to point to the first symbol of the input string w$ if Top(stack) is a terminal or $ then Pop(stack) and advance ip else if M[X,a]= X! Y1Y2 …Yk then if M[X,a]= X! Y1Y2 …Yk then gin Pop(stack); Pop(stack); Push Y1; Y2;… ; Yk onto the stack, with Y1 on top; Output the production X! Y1Y2 …Yk Output the production Y1; Y2;… ; Yk ; end until Top(stack)=$ Top(stack) = $ (i.e. the stack become empty) id + id * id $ OUTPUT: INPUT: E T E’ ip Predictive Parsing Program STACK: T E $ E’ $ PARSING TABLE: NON- TERMINAL INPUT SYMBOL id + * ( ) $ E E → TE’ E → TE’ E’ E’ → +TE’ E’ → ε E’ → ε T T → FT’ T → FT’ T’ T’→ ε T’ → *FT’ T’ → ε T’ → ε F F → id F → (E)

A Predictive Parser
INPUT: id + id * id $ (ip at the first id)
STACK: T E’ $ becomes F T’ E’ $, using M[T, id] = T → FT’
OUTPUT: E → TE’, T → FT’

A Predictive Parser
INPUT: id + id * id $ (ip at the first id)
STACK: F T’ E’ $ becomes id T’ E’ $, using M[F, id] = F → id
OUTPUT: E → TE’, T → FT’, F → id

A Predictive Parser
Action when Top(Stack) = input ≠ $: Pop stack, advance input.
INPUT: id + id * id $ (ip moves from the first id to +)
STACK: id T’ E’ $ becomes T’ E’ $
OUTPUT: E → TE’, T → FT’, F → id

A Predictive Parser
INPUT: id + id * id $ (ip at +)
STACK: T’ E’ $ becomes E’ $, using M[T’, +] = T’ → ε
OUTPUT: E → TE’, T → FT’, F → id, T’ → ε

A Predictive Parser
The predictive parser proceeds in this fashion, emitting the following productions:
E’ → +TE’
T → FT’
F → id
T’ → *FT’
F → id
T’ → ε
E’ → ε
When Top(Stack) = input = $, the parser halts and accepts the input string.

LL(k) Parser
This parser parses from left to right and produces a leftmost derivation. It looks 1 symbol ahead to choose its next action. Therefore, it is known as an LL(1) parser. An LL(k) parser looks k symbols ahead to decide its action.
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
LL(1) grammars enjoy several nice properties: for example, they are not ambiguous and not left recursive.

LL(k) Parser
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
Example 1
The grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
whose parsing table is

NON-       INPUT SYMBOL
TERMINAL   id          +            *            (           )          $
E          E → TE’                               E → TE’
E’                     E’ → +TE’                             E’ → ε     E’ → ε
T          T → FT’                               T → FT’
T’                     T’ → ε       T’ → *FT’                T’ → ε     T’ → ε
F          F → id                                F → (E)

is an LL(1) grammar.

LL(k) Parser
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
Example 2
The grammar
S → iEtSS’ | a
S’ → eS | ε
E → b
whose parsing table is

NON-       INPUT SYMBOL
TERMINAL   a         b         e                   i              t     $
S          S → a                                   S → iEtSS’
S’                             S’ → ε, S’ → eS                          S’ → ε
E                    E → b

is NOT an LL(1) grammar: the entry M[S’, e] holds two productions.
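The "no multiply-defined entries" test is mechanical once the table entries are known. The sketch below is only an illustration: the function name `find_conflicts` and the triple representation of the entries are assumptions made for this example, and the entries of the Example 2 table are typed in by hand rather than computed from the grammar.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Group (nonterminal, terminal, production) triples by table cell and
    return only the cells that hold more than one production."""
    cells = defaultdict(list)
    for nonterminal, terminal, production in entries:
        cells[(nonterminal, terminal)].append(production)
    return {cell: prods for cell, prods in cells.items() if len(prods) > 1}


# Entries of the Example 2 parsing table, typed in by hand.
example2 = [
    ("S", "a", "S -> a"),
    ("S", "i", "S -> i E t S S'"),
    ("S'", "e", "S' -> ε"),
    ("S'", "e", "S' -> e S"),
    ("S'", "$", "S' -> ε"),
    ("E", "b", "E -> b"),
]

conflicts = find_conflicts(example2)
is_ll1 = not conflicts        # LL(1) exactly when no cell is multiply defined
print(conflicts, is_ll1)
```

Here the only conflicting cell is M[S', e], which is the familiar dangling-else situation.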

Parsing
• Top Down Parsing
  – Predictive Parsing
  – LL(k) Parsing
  – Left Recursion
  – Left Factoring
• Bottom Up Parsing
  – Shift-reduce Parsing
  – LR(k) Parsing

Bottom-Up Parsers Bottom-up parsers: build the nodes on the bottom of the parse tree first. Suitable for automatic parser generation, handle a larger class of grammars. Examples: shift-reduce parser (or LR(k) parsers)

Bottom-up Parsing
• No problem with left-recursion
• Widely used in practice
• LR(1), SLR(1), LALR(1)

Grammar Hierarchy Non-ambiguous CFG CLR(1) LALR(1) LL(1) SLR(1)

Bottom-up Parsing
• Works from tokens to start-symbol
• Repeat:
  – identify handle - reducible sequence:
    • non-terminal is not constructed but
    • all its children have been constructed
  – reduce - construct non-terminal and update stack
• Until reducing to start-symbol

Bottom-up Parsing
Grammar: E → E + (E) | i, where i = 0, 1, 2, …, 9
Input: 1 + (2) + (3)
[Figure: the input is reduced step by step to the start symbol: 1 + (2) + (3), E + (2) + (3), E + (E) + (3), E + (3), E + (E), E]

Bottom-up Parsing
• Is the following grammar LL(1)?
  E → E + (E) | i
  Example input: 1 + (2) + (3)
  ❚ NO
  ❚ But this is a useful grammar

Bottom-Up Parser
A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree. The reduction steps trace a rightmost derivation in reverse.
Consider the Grammar:
S → aABe
A → Abc | b
B → d
We want to parse the input string abbcde.

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a b b c d e $
OUTPUT: (nothing yet)

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a A b c d e $
OUTPUT (tree built so far): A(b)

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a A b c d e $
OUTPUT (tree built so far): A(b)
We are not reducing the second b here in this example. A parser would reduce, get stuck and then backtrack!

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a A d e $
OUTPUT (tree built so far): A(A(b) b c)

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a A B e $
OUTPUT (trees built so far): A(A(b) b c), B(d)

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: S $
OUTPUT (parse tree): S(a A(A(b) b c) B(d) e)
This parser is known as an LR Parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.
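The reductions of this example can be replayed in a few lines of Python to see that, read backwards, they form a rightmost derivation. This is only a sketch under simple assumptions: every grammar symbol is a single character, the handle positions are chosen by hand (finding them is exactly what a real parser has to do), and the names `reduce_at`, `REDUCTIONS` and `replay` are invented for the illustration.

```python
def reduce_at(form, lhs, rhs, pos):
    """Replace the handle rhs found at index pos of the sentential form by lhs."""
    if form[pos:pos + len(rhs)] != rhs:
        raise ValueError(f"{rhs!r} does not occur at position {pos} in {form!r}")
    return form[:pos] + lhs + form[pos + len(rhs):]


# (left-hand side, right-hand side, position of the handle), in reduction order.
# The positions are picked by hand for the input abbcde.
REDUCTIONS = [
    ("A", "b", 1),      # a b b c d e  ->  a A b c d e
    ("A", "Abc", 1),    # a A b c d e  ->  a A d e
    ("B", "d", 2),      # a A d e      ->  a A B e
    ("S", "aABe", 0),   # a A B e      ->  S
]


def replay(sentence):
    """Return every sentential form seen while reducing sentence to S."""
    forms = [sentence]
    for lhs, rhs, pos in REDUCTIONS:
        forms.append(reduce_at(forms[-1], lhs, rhs, pos))
    return forms


forms = replay("abbcde")
print(" <= ".join(reversed(forms)))
```

Reading the list from the last form back to the first gives S, aABe, aAde, aAbcde, abbcde, where each step rewrites the rightmost nonterminal.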

Bottom-Up Parser Example
The scanning of productions for matching with handles in the input string, together with the backtracking, makes the method used in the previous example very inefficient. Can we do better?

LR Parser Example
[Figure: the LR Parsing Program reads the Input, maintains a Stack, consults the action and goto tables, and produces the Output.]

Shift reduce parser 1. Construct the action-goto table from the given grammar 2. Apply the shift-reduce parsing algorithm to construct the parse tree

Shift reduce parser
1. Construct the action-goto table from the given grammar
This is what makes the difference between the different types of shift-reduce parsing, such as SLR, CLR and LALR.
In this course, due to shortage of time, we will not study how to construct the action-goto table.

Shift reduce parser
2. Apply the shift-reduce parsing algorithm to construct the parse tree
The following algorithm shows how we can construct the table of parsing moves for an input string w$ with respect to a given grammar G.

  set ip to point to the first symbol of the input string w$
  repeat forever
  begin
    if action[top(stack), current-input(ip)] = shift(s) then
    begin
      push current-input(ip) then s on top of the stack;
      advance ip to the next input symbol
    end
    else if action[top(stack), current-input(ip)] = reduce A → β then
    begin
      pop 2*|β| symbols off the stack;
      push A then goto[top(stack), A] on top of the stack;
      output the production A → β
    end
    else if action[top(stack), current-input(ip)] = accept then
      return
    else error()
  end

LR Parser Example
The following grammar:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
can be parsed with this action and goto table:

State   action                                goto
        id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                  8    2    3
5             r6    r6          r6    r6
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

s represents shift
r represents reduce
acc represents accept
empty represents error
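The shift-reduce algorithm and the table above can be combined into a small driver. The Python sketch below is an illustration under stated assumptions, not the program from the slides: the names `PRODUCTIONS`, `ACTION`, `GOTO` and `lr_parse` are made up here, and the stack holds states only, so a reduction by A → β pops |β| entries instead of the 2*|β| popped when grammar symbols are stacked alongside the states.

```python
# Productions, numbered as in the grammar: (left-hand side, length of right-hand side, text).
PRODUCTIONS = {
    1: ("E", 3, "E -> E + T"),
    2: ("E", 1, "E -> T"),
    3: ("T", 3, "T -> T * F"),
    4: ("T", 1, "T -> F"),
    5: ("F", 3, "F -> ( E )"),
    6: ("F", 1, "F -> id"),
}

# action[state][terminal]; a missing key is an error entry.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

# goto[state][nonterminal]
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the productions used, in the order the reductions happen."""
    tokens = list(tokens) + ["$"]
    stack = [0]                      # states only (assumption of this sketch)
    ip = 0
    output = []
    while True:
        act = ACTION[stack[-1]].get(tokens[ip])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[ip]!r} in state {stack[-1]}")
        if act == "acc":
            return output
        if act[0] == "s":            # shift: push the new state, advance ip
            stack.append(int(act[1:]))
            ip += 1
        else:                        # reduce by production number act[1:]
            lhs, length, text = PRODUCTIONS[int(act[1:])]
            del stack[-length:]      # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            output.append(text)


reductions = lr_parse(["id", "*", "id", "+", "id"])
for r in reductions:
    print(r)
```

Keeping only states on the stack is a common simplification, since each state already determines the grammar symbol that led to it; the symbols are stacked on the slides mainly to make the trace easier to read.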

LR Parser Example
GRAMMAR:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
INPUT: id * id + id $
The LR Parsing Program makes the following moves with the action and goto table of this grammar (the top of the stack is on the right):

STACK               INPUT             ACTION
0                   id * id + id $    s5
0 id 5              * id + id $       r6: F → id, goto[0, F] = 3
0 F 3               * id + id $       r4: T → F, goto[0, T] = 2
0 T 2               * id + id $       s7
0 T 2 * 7           id + id $         s5
0 T 2 * 7 id 5      + id $            r6: F → id, goto[7, F] = 10
0 T 2 * 7 F 10      + id $            r3: T → T * F, goto[0, T] = 2
0 T 2               + id $            r2: E → T, goto[0, E] = 1
0 E 1               + id $            s6
0 E 1 + 6           id $              s5
0 E 1 + 6 id 5      $                 r6: F → id, goto[6, F] = 3
0 E 1 + 6 F 3       $                 r4: T → F, goto[6, T] = 9
0 E 1 + 6 T 9       $                 r1: E → E + T, goto[0, E] = 1
0 E 1               $                 acc

OUTPUT (parse tree): E(E(T(T(F(id)) * F(id))) + T(F(id)))

Constructing Parsing Tables
All LR parsers use the same parsing program that we demonstrated in the previous slides. What differentiates the LR parsers are the action and the goto tables:
• Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement.
• Canonical LR: succeeds for the most grammars, but is the hardest to implement. It splits states when necessary to prevent reductions that would get the parser stuck.
• Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR.

Grammar Hierarchy Non-ambiguous CFG CLR(1) LALR(1) LL(1) SLR(1)

Parsing
How does a parser work?
• Top Down Parsing
  – Predictive Parsing
  – LL(k) Parsing
  – Left Recursion
  – Left Factoring
• Bottom Up Parsing
  – Shift-reduce Parsing
  – LR(k) Parsing
How to write a parser?

[Figure: the Source Program feeds the lexical analyzer ("get next char" / "next char"); the lexical analyzer feeds the Syntax analyzer ("get next token" / "next token"); both use the symbol table, which contains a record for each identifier.]
1. Lexical analyzer:
• Uses Regular Expressions to define tokens
• Uses Finite Automata to recognize tokens
2. Syntax analyzer:
• Uses Top-down parsing or Bottom-up parsing
• To construct a Parse tree

How to write a parser? Yacc

Yacc
[Figure: compiler pipeline. Source program, then lexical analysis (built by Lex from the token description), then syntax analysis (built by Yacc from the language grammar), then intermediate representation, then code generation, then target program.]

How to write an LR parser?
General approach: the construction is done automatically by a tool such as the Unix program yacc.
• Use the grammar of the source language to write a simple yacc program and save it in a file named name.y
• Use the Unix program yacc to compile name.y, resulting in a C (parser) program named y.tab.c
• Compile and link the C program y.tab.c in the normal way, resulting in the required parser.

LR parser generators
• Yacc: Yet another compiler compiler
• Automatically generates LALR parsers
• Created by S.C. Johnson in the 1970s

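Using Yacc
[Figure: the Yacc source program filename.y goes through the Yacc compiler to give y.tab.c; y.tab.c goes through the C compiler to give a.out; a.out (the Parser) reads the Input tokens and produces the Parse tree.]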

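Yacc
[Figure: the lexer spec goes through LEX to a .c file and through the C compiler to give the Lexical analyzer; the Parser spec goes through Yacc to a .c file and through the C compiler to give the Parser. The Lexical analyzer turns the Source program into tokens for the Parser.]

Yacc Example
[Figure: same pipeline as on the previous slide. The source program "tomatoes + potatoes + carrots" is turned by the Lexical analyzer into the tokens id1, PLUS, id2, PLUS, id3, and the Parser builds a tree whose root + has the children + (over id1 and id2) and id3.]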

How to write parser
[Figure: the source program is read by the Scanner (lex.yy.c, generated by Lex from the Lex spec (.l)), which passes each token to the Parser (y.tab.c, generated by Yacc from the yacc spec (.y)); both share the symbol table.]

How to write a yacc program
myfile.y

  %{
  < C global variables, prototypes, comments >
  %}
  [DEFINITION SECTION]
  %%
  [PRODUCTION RULES SECTION]
  %%
  < C auxiliary subroutines >

• %{ … %}: this part will be embedded into myfile.tab.c
• DEFINITION SECTION: contains token declarations. Tokens are recognized in the lexer.
• PRODUCTION RULES SECTION: defines how to “understand” the input language, and what actions to take for each “sentence”.
• C auxiliary subroutines: any user code. For example, a main function to call the parser function yyparse()

Example: PRODUCTION RULES SECTION
Grammar:
  statement → expression
  expression → expression + expression
             | expression - expression
             | expression * expression
             | expression / expression
             | NUMBER

Yacc rules:
  statement : expression { printf(" = %g\n", $1); }
            ;
  expression : expression '+' expression { $$ = $1 + $3; }
             | expression '-' expression { $$ = $1 - $3; }
             | expression '*' expression { $$ = $1 * $3; }
             | expression '/' expression { $$ = $1 / $3; }
             | NUMBER { $$ = $1; }
             ;
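The meaning of $$, $1 and $3 in these actions can be pictured with a value stack. The Python sketch below only illustrates the idea and does not describe how yacc is implemented: the names `OPS` and `reduce_binary` are invented, and for simplicity the operator token itself is kept on the stack in the position of $2.

```python
import operator

# Semantic action for each operator, mirroring the bodies of the yacc rules.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def reduce_binary(values):
    """Mimic `expression : expression OP expression { $$ = $1 OP $3; }`.

    The last three stack entries play the roles of $1, $2 and $3;
    they are replaced by the single value $$.
    """
    d1, op, d3 = values[-3:]
    del values[-3:]
    values.append(OPS[op](d1, d3))
    return values


values = [2.0, "+", 3.0]      # values of the symbols on top of the stack
reduce_binary(values)         # reduce by expression '+' expression
print(values)
```

A reduction by `NUMBER { $$ = $1; }` would leave the stack unchanged in this picture, because the value of the token simply becomes the value of the expression.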