Presentation Outline

Review of Lexical Analysis
Introduction to Syntax Analysis
Context-Free Grammar
Parsing
Grammar Ambiguity
Top-Down Parser
Bottom-Up Parser

Introduction to Syntax Analysis
Every programming language has precise rules that prescribe the syntactic structure of well-formed programs. The Syntax Analysis phase of a compiler has two major goals:
1. Check the input program to determine whether it is syntactically correct.
2. Produce either a complete parse tree, or at least trace the structure of the complete parse tree, for syntactically correct input.

Some Basic Definitions
syntax: the way in which words are put together to form phrases, clauses, or sentences. In a programming language, the rules governing the formation of statements.
syntax analysis: the task concerned with fitting a sequence of tokens into a specified syntax.
parsing: breaking a sentence down into its component parts of speech, with an explanation of the form, function, and syntactical relationship of each part.

Some Basic Definitions
Syntactic structure: the syntactic structure of programming languages can be informally expressed by the following diagram.
(Diagram, from outermost to innermost: Program, Block(s), Statement(s), Expression(s), Token(s))

Context-free Grammars
Definition
CFG = (V, T, P, S)
❚ V : Finite set of variables/non-terminals
❚ T : Alphabet/Finite set of terminals
❚ P : Finite set of rules/productions
❚ S : Start symbol, S ∈ V
V ∩ T = ∅
Rule : A → α, where A ∈ V and α ∈ (V ∪ T)*

Definition
❚ Context-freeness: An A-rule can be applied whenever A occurs in a string, irrespective of the context (that is, the non-terminals and terminals around A).

Context-free Grammars
Derivation
❚ One-step derivation: uAv ⇒ uαv if A → α is a rule in P.
❚ w is derivable from v in a CFG if there is a finite sequence of rule applications such that: v ⇒ w1 ⇒ ... ⇒ wn ⇒ w. In this case we write the derivation as v ⇒* w.

Derivation
The derivation v ⇒* w is called:
❚ Leftmost derivation: if in every step the leftmost variable is selected for rewriting.
❚ Rightmost derivation: if in every step the rightmost variable is selected for rewriting.

Example 1
Let G = ({S}, {a, b}, P, S) with P:
❚ S → aSa, S → bSb, and S → λ.
❚ Some derivations from this grammar:
❚ S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
❚ S ⇒ bSb ⇒ baSab ⇒ baab, and so on.
❚ In general, S ⇒ ... ⇒ ww^R for w ∈ {a, b}*.
L(G) = {ww^R : w ∈ {a, b}*}
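The derivations of Example 1 can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `leftmost_derivation` and the convention of choosing productions by index are assumptions made for illustration, and the empty string stands for λ.

```python
# Productions of the grammar from Example 1; "" stands for the empty string λ.
PRODUCTIONS = {"S": ["aSa", "bSb", ""]}


def leftmost_derivation(start, choices):
    """Apply one production per step to the leftmost variable.

    `choices` lists, for each step, the index of the production to use.
    Returns every sentential form, beginning with the start symbol.
    """
    forms = [start]
    current = start
    for index in choices:
        # The leftmost variable is the first symbol that has productions.
        position = next(i for i, symbol in enumerate(current) if symbol in PRODUCTIONS)
        variable = current[position]
        current = current[:position] + PRODUCTIONS[variable][index] + current[position + 1:]
        forms.append(current)
    return forms


# S => aSa => aaSaa => aabSbaa => aabbaa, as on the slide.
steps = leftmost_derivation("S", [0, 0, 1, 2])
print(" => ".join(form or "λ" for form in steps))
```

Every string produced this way is a palindrome of even length, which matches L(G) = {ww^R}.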

Context-free Grammars
Example 2
G = ({S, A, B}, {a, b}, {S → AB, A → aA | λ, B → Bb | λ}, S)
L(G) = L(a*b*)
Leftmost derivation: S ⇒ AB ⇒ aAB ⇒ aB ⇒ aBb ⇒ ab
Rightmost derivation: S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aAb ⇒ ab

Example 4
Consider the CFG: G = ({S}, {a, b}, {S → λ, S → aSb}, S)
❚ Derivation of aabb is S ⇒ aSb ⇒ aaSbb ⇒ aabb

Context-free Grammars
Example 5
Consider the CFG G:
S → aSa | aBa
B → bB | b
L(B) = {b^m | m > 0}
L(S) = {a^n b^m a^n | n > 0 ∧ m > 0}
L(G) = L(S)

Context-free Grammars
Example 6
Consider the CFG G1:
S → aSa | B
B → bB | λ
The language generated by G1 is: L(G1) = {a^n b^m a^n | n ≥ 0 ∧ m ≥ 0}
Consider the CFG G2:
S → abSc | λ
The language generated by G2 is: L(G2) = {(ab)^n c^n | n ≥ 0}

Example 1
Consider the CFG G = ({S}, {a, b}, {S → λ, S → aSb}, S)
❚ The derivation of aabb is: S ⇒ aSb ⇒ aaSbb ⇒ aabb
❚ Derivation tree:
        S
      / | \
     a  S  b
      / | \
     a  S  b
        |
        λ

Example 2
A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
Derivation tree:
        A
      / | \
     0  A  1
      / | \
     0  A  1
        |
        B
        |
        #

Example 3
<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> * <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a
Build a parse tree for a + a * a.
(Figure: parse tree for a + a * a.)

CFG: Parsing
Recognition of strings in a language

CFG: Parsing
Generative aspect of CFG: By now it should be clear how, from a CFG G, you can derive strings w ∈ L(G).
Analytical aspect: Given a CFG G and a string w, how do you decide if w ∈ L(G) and, if so, how do you determine the derivation tree or the sequence of production rules that produce w? This is called the problem of parsing.

CFG: Parsing
Parser: a program that determines if a string w ∈ L(G) by constructing a derivation. Equivalently, it searches the graph of G.
Top-down parsers: construct the derivation tree from root to leaves. Leftmost derivation.
Bottom-up parsers: construct the derivation tree from leaves to root. Rightmost derivation in reverse.

CFG: Parsing
Parse trees (= Derivation Trees)
A parse tree is a graphical representation of a derivation sequence of a sentential form. Tree nodes represent symbols of the grammar (nonterminals or terminals) and tree edges represent derivation steps.

CFG: Parsing
Parse Tree: Example
Given the following grammar:
E → E + E | E * E | ( E ) | - E | id
Is the string -(id + id) a sentence in this grammar?
Yes, because there is the following derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)

CFG: Parsing
Parse Tree: Example 1
E → E + E | E * E | ( E ) | - E | id
Let's examine this derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + id)
The parse tree grows by one level with each derivation step and ends as:
        E
       / \
      -   E
        / | \
       (  E  )
        / | \
       E  +  E
       |     |
       id    id
This is a top-down derivation because we start building the parse tree at the top.

CFG: Parsing
Parse Tree: Example 2
S → SS | a | b
Show that ab ∈ L(S).
Leftmost derivation: S ⇒ SS ⇒ aS ⇒ ab
Rightmost derivation: S ⇒ SS ⇒ Sb ⇒ ab
Both derivations build the same derivation tree:
        S
       / \
      S   S
      |   |
      a   b
Reading the rightmost derivation in reverse gives the order in which the tree is built from the leaves up.

CFG: Parsing
Example 3
Consider the CFG G:
S → A
A → T | A + T
T → b | ( A )
Show that (b)+b ∈ L(G).
One derivation is: S ⇒ A ⇒ A + T ⇒ T + T ⇒ (A) + T ⇒ (T) + T ⇒ (b) + T ⇒ (b) + b
(Figure: step-by-step construction of the derivation tree for (b)+b.)

CFG: Parsing
Practical Parsers
Language/grammar designed to enable deterministic (directed and backtrack-free) searches.
Top-down parsers: LL(k) languages. E.g., Pascal, Ada, etc. Better error diagnosis and recovery.
Bottom-up parsers: LALR(1), LR(k) languages. E.g., C/C++, Java, etc. Handles left recursion in the grammar.
Backtracking parsers: e.g., Prolog interpreter.

Grammar Ambiguity
Definition: a string w ∈ L(G) is derived ambiguously in a context-free grammar if it has two or more different parse trees (or equivalently: if it has more than one leftmost derivation, or more than one rightmost derivation).
Definition: a grammar is ambiguous if it generates some string ambiguously.
Typical example: rule S → 0 | 1 | S+S | S×S
S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
versus
S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1

Grammar Ambiguity
The ambiguity of 0×1+1 is shown by the two different parse trees:
          S                    S
        / | \                / | \
       S  +  S              S  ×  S
     / | \   |              |   / | \
    S  ×  S  1              0  S  +  S
    |     |                    |     |
    0     1                    1     1
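The number of parse trees of a string can also be counted mechanically. The sketch below is an illustration rather than part of the slides: it writes the grammar's second binary operator as `*` (an assumption made for this example) and counts the trees for S → 0 | 1 | S+S | S*S by trying every operator position as the root of the tree. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

OPERATORS = "+*"  # assumption: '*' stands in for the grammar's second binary operator


def count_parse_trees(text):
    """Count the parse trees of `text` under S -> 0 | 1 | S+S | S*S."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives the substring text[i:j].
        if j - i == 1:
            return 1 if text[i] in "01" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if text[k] in OPERATORS:
                # text[k] is the operator at the root; both sides must derive from S.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(text))


print(count_parse_trees("0+1"))
print(count_parse_trees("0*1+1"))
```

A count greater than one for any string is enough to show that the grammar is ambiguous.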

Grammar Ambiguity
Note that the two different derivations:
S ⇒ S+S ⇒ 0+S ⇒ 0+1 and S ⇒ S+S ⇒ S+1 ⇒ 0+1
do not make the string 0+1 ambiguous, as they have the same parse tree:
        S
      / | \
     S  +  S
     |     |
     0     1
Ambiguity causes trouble when trying to interpret strings like: “She likes men who love women who don't smoke.”
Solutions: use parentheses, or use precedence rules such as a+(b×c) = a+b×c ≠ (a+b)×c.

Grammar Ambiguity
Example
<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> * <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a
Build a parse tree for a + a * a.
(Figure: two different parse trees for a + a * a, one grouping the input as a + (a * a) and the other as (a + a) * a.)

Grammar Ambiguity
Example
E → E + E | E * E | ( E ) | - E | id
Find a derivation for the expression: id + id * id
Two derivation trees are possible:
        E                      E
      / | \                  / | \
     E  +  E                E  *  E
     |   / | \            / | \   |
     id E  *  E          E  +  E  id
        |     |          |     |
        id    id         id    id
Which derivation tree is correct? According to the grammar, both are correct.
A grammar that produces more than one parse tree for some input sentence is said to be an ambiguous grammar.

Grammar Ambiguity
One way to resolve ambiguity is to associate precedence to the operators.
Example: * has precedence over +
1 + 2 * 3 = 1 + (2 * 3)
1 + 2 * 3 ≠ (1 + 2) * 3
Associativity and precedence information is typically used to disambiguate non-fully parenthesized expressions containing unary prefix/postfix operators or binary infix operators.

Grammar Ambiguity
Example
Grammar: stm → if expr then stm | if expr then stm else stm
Ambiguity: the statement
if B1 then if B2 then S1 else S2
can be read as
if B1 then (if B2 then S1 else S2)
vs
if B1 then (if B2 then S1) else S2

Grammar Ambiguity
Quiz
Is the following grammar ambiguous?
S → aS | Sb | ab | λ
Yes: consider ab. It can be derived as S ⇒ ab, and also as S ⇒ aS ⇒ aSb ⇒ ab.

Grammar Ambiguity
Quiz
Is the following grammar ambiguous?
S → SS | λ
Yes. Cyclic structure: S ⇒ SS ⇒ SSS ⇒ ...
(Illustrates an ambiguous grammar with cycles.)

Grammar Applications
Programming Languages
Programming languages are often defined as Context Free Grammars in Backus-Naur Form (BNF).
Example:
<if_statement> ::= IF <expression><then_clause><else_clause>
<expression> ::= <term> | <expression>+<term>
<term> ::= <factor> | <term>*<factor>
The variables are indicated by <a variable name>.
The arrow → is replaced by ::=
Here, IF, + and * are terminals.
“Syntax Checking” is checking whether a program is an element of the language generated by the CFG of the programming language.

Grammar Applications
Compiler Syntax Analysis
Compiler: Source Program → Scanner → Parser → Semantic Analy. → Inter. Code Gen. → Optimizer → Code Generation → Target Program
The Parser is the part of the compiler that uses the grammar.

(Figure: the Source Program feeds the lexical analyzer, which answers "get next char" with the next char; the lexical analyzer feeds the syntax analyzer, which answers "get next token" with the next token. Both use the symbol table, which contains a record for each identifier.)
1. Lexical analyzer: uses Regular Expressions to define tokens; uses Finite Automata to recognize tokens.
2. Syntax analyzer: uses Top-down parsing or Bottom-up parsing to construct a Parse tree.

Syntax errors
Parsing errors include:
1. misspelling of an identifier, keyword, or operator
2. arithmetic expression with unbalanced parentheses
3. punctuation errors such as using a comma in place of a semicolon
4. missing brackets, semicolons, etc.

Error recovery
The error handler in a parser has the following jobs:
1. report the presence of errors clearly and accurately
2. recover from each error quickly
3. not slow down the processing of correct programs

Example: The following C code shows some examples of syntax errors:
    #include<stdio.h>
    int max(int i int j)
    {
        if(i>j) return(i)
        return(j);
    }
    void main()
        int x, y
        scanf("%d %d", x, y);
        printf("%d", max(x,y) ;
(Callouts on the slide mark the missing symbols: ; ; ,)

Example: A typical compilation of this erroneous program gives the following list of errors:
1. error C2235: ';' in formal parameter list
2. error C2059: syntax error : ')'
3. error C2239: unexpected token 'f' following declaration
4. error C2078: too many initializers of 'j'
5. error C2660: 'max' : function does not take 2 parameters
6. error C2143: syntax error : missing ')' before ';'

Example: The correct version of this program is
    #include<stdio.h>
    int max(int i, int j)
    {
        if(i>j) return(i);
        return(j);
    }
    void main()
    {
        int x, y;
        scanf("%d %d", &x, &y);
        printf("%d", max(x,y));
    }

Significance of Context-Free Grammars
Grammars offer several significant advantages:
1. Easy to understand and construct
2. Easy program parsing
3. Easy error detection and handling
4. Easy language extension

Parsing
Bottom Up Parsing: Shift-reduce Parsing, LR(k) Parsing
Top Down Parsing: Predictive Parsing, LL(k) Parsing, Left Recursion, Left Factoring

Top Down Parsing
Top-down parsers start constructing the parse tree at the top (root) of the tree and move down towards the leaves. They are easy to implement by hand, but work only with restricted grammars.
Example: predictive parsers

Left Recursion
Consider the grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
A top-down parser might loop forever when parsing an expression E using this grammar: expanding E with E → E + T puts E at the leftmost position again, so the parser keeps expanding E ⇒ E + T ⇒ E + T + T ⇒ ... without ever consuming input.

Left Recursion
A grammar that has at least one production of the form A → Aα is a left-recursive grammar.
Top-down parsers do not work with left-recursive grammars.
Left recursion can often be eliminated by rewriting the grammar.
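One standard rewrite replaces A → Aα | β with A → βA’ and A’ → αA’ | ε. The following Python sketch shows that rewrite for immediate left recursion only; it is an illustration under stated assumptions (grammars as dictionaries of symbol lists, new nonterminals named by appending an apostrophe, function name `eliminate_immediate_left_recursion` hypothetical), not a general algorithm for indirect left recursion.

```python
def eliminate_immediate_left_recursion(grammar):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | ε for every nonterminal A."""
    result = {}
    for head, bodies in grammar.items():
        # Tails of the left-recursive alternatives (the α in A -> A α).
        recursive = [body[1:] for body in bodies if body and body[0] == head]
        # Alternatives that do not start with A (the β in A -> β).
        others = [body for body in bodies if not body or body[0] != head]
        if not recursive:
            result[head] = [list(body) for body in bodies]
            continue
        new_head = head + "'"  # hypothetical naming scheme for the new nonterminal
        result[head] = [body + [new_head] for body in others]
        result[new_head] = [tail + [new_head] for tail in recursive] + [["ε"]]
    return result


grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
rewritten = eliminate_immediate_left_recursion(grammar)
for head, bodies in rewritten.items():
    print(head, "->", " | ".join(" ".join(body) for body in bodies))
```

Applied to the expression grammar, this produces E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, with F unchanged.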

Predictive Parsing
Consider the grammar:
stmt → if expr then stmt else stmt
     | while expr do stmt
     | begin stmt_list end
A parser for this simple grammar can be written with the following structure:
    switch(gettoken()) {
        case if:    …. break;
        case while: …. break;
        case begin: …. break;
        default:    reject input;
    }
Based only on the first token, the parser knows which rule to use to derive a statement. Therefore this is called a predictive parser.

Left Factoring
The following grammar:
stmt → if expr then stmt else stmt
     | if expr then stmt
cannot be parsed by a predictive parser that looks one element ahead.
But the grammar can be re-written:
stmt → if expr then stmt stmt’
stmt’ → else stmt | λ
where λ is the empty string.
Rewriting a grammar to eliminate multiple productions starting with the same token is called left factoring.

Left Factoring
The basic idea is, in general, as follows:
1. let A → αβ1 | αβ2 be two production rules for the nonterminal symbol A
2. if the input begins with a nonempty string derived from α
3. and we do not know whether to expand A to αβ1 or αβ2
4. then we may defer the decision by expanding A to αA'
5. after seeing the input derived from α, we expand A' to β1 or to β2
6. this means, left-factored, the original productions become
A → αA'
A' → β1 | β2
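The idea can be sketched in a few lines of Python. This is a simplified illustration, not a complete left-factoring algorithm: it only factors the longest prefix shared by all alternatives of one nonterminal, and the names `common_prefix` and `left_factor` are hypothetical.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, bodies):
    """Factor A -> αβ1 | αβ2 into A -> αA', A' -> β1 | β2."""
    alpha = common_prefix(bodies)
    if not alpha or len(bodies) < 2:
        return {head: bodies}  # nothing to factor
    new_head = head + "'"
    # An alternative that equals α leaves an empty tail, written here as λ.
    tails = [body[len(alpha):] or ["λ"] for body in bodies]
    return {head: [alpha + [new_head]], new_head: tails}


stmt = [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
]
factored = left_factor("stmt", stmt)
for head, bodies in factored.items():
    print(head, "->", " | ".join(" ".join(body) for body in bodies))
```

For the if-statement grammar this yields stmt → if expr then stmt stmt’ and stmt’ → else stmt | λ.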

A Predictive Parser
How does it work?
1. Construct the parsing table from the given grammar.
2. Apply the predictive parsing algorithm to construct the parse tree.

A Predictive Parser
1. Construct the parsing table from the given grammar
The following algorithm shows how we can construct the parsing table:
Input: a grammar G
Output: the corresponding parsing table M
Method: For each production A → α of the grammar, do the following steps:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If λ is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A).
3. If λ is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
How are FIRST and FOLLOW constructed?

The Parsing Table
How are FIRST and FOLLOW constructed?
Example. Given this grammar (here ε = λ = empty string):
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
How is this parsing table built?
PARSING TABLE (rows: non-terminals, columns: input symbols):
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |

FIRST and FOLLOW
We need to build a FIRST set and a FOLLOW set for each symbol in the grammar.
The elements of FIRST and FOLLOW are terminal symbols (FIRST may also contain ε, and FOLLOW may contain the end marker $).
FIRST(α) is the set of terminal symbols that can begin any string derived from α.
FOLLOW(α) is the set of terminal symbols that can follow α:
t ∈ FOLLOW(α) ↔ ∃ derivation containing αt

Rules to Create FIRST
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST rules:
1. If X is a terminal, FIRST(X) = {X}
2. If X → ε, then ε ∈ FIRST(X)
3. If X → Y1Y2 ••• Yk and Y1 ••• Yi-1 ⇒* ε and a ∈ FIRST(Yi), then a ∈ FIRST(X)
SETS:
FIRST(id) = {id}
FIRST(*) = {*}
FIRST(+) = {+}
FIRST(() = {(}
FIRST()) = {)}
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = FIRST(F) = {(, id}
FIRST(E) = FIRST(T) = {(, id}
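The three FIRST rules can be applied repeatedly until no set changes. The following Python sketch does this for the example grammar; the dictionary representation, the use of E' and T' for E’ and T’, and the function name `compute_first` are assumptions made for illustration.

```python
EPSILON = "ε"

# The example grammar; each alternative is a list of grammar symbols.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}


def compute_first(grammar):
    """Compute FIRST for every nonterminal by iterating until nothing changes."""
    first = {head: set() for head in grammar}

    def first_of(symbol):
        # Rule 1: the FIRST set of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                if body == [EPSILON]:
                    first[head].add(EPSILON)  # Rule 2
                else:
                    for symbol in body:  # Rule 3
                        symbol_first = first_of(symbol)
                        first[head] |= symbol_first - {EPSILON}
                        if EPSILON not in symbol_first:
                            break
                    else:
                        # Every Yi can derive ε, so X can derive ε as well.
                        first[head].add(EPSILON)
                changed |= len(first[head]) != before
    return first


FIRST = compute_first(GRAMMAR)
for head, symbols in FIRST.items():
    print(f"FIRST({head}) = {sorted(symbols)}")
```

The fixed-point loop is needed because FIRST(E) depends on FIRST(T), which in turn depends on FIRST(F).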

Rules to Create FOLLOW
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST SETS:
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FOLLOW rules (A and B are non-terminals, α and β are strings of grammar symbols):
1. If S is the start symbol, then $ ∈ FOLLOW(S)
2. If A → αBβ, and a ∈ FIRST(β) and a ≠ ε, then a ∈ FOLLOW(B)
3. If A → αB and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
3a. If A → αBβ and β ⇒* ε and a ∈ FOLLOW(A), then a ∈ FOLLOW(B)
SETS:
FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
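The FOLLOW rules can be applied in the same iterate-until-stable fashion. The sketch below takes the FIRST sets listed on the slide as given input; the data layout and the names `first_of_string` and `compute_follow` are assumptions made for illustration, with E' and T' standing for E’ and T’.

```python
EPSILON, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, copied from the slide.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}


def first_of_string(symbols):
    """FIRST of a string of grammar symbols; contains ε if the whole string can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})  # a terminal is its own FIRST set
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)
    return result


def compute_follow(grammar, start):
    follow = {head: set() for head in grammar}
    follow[start].add(END)  # Rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue  # FOLLOW is computed for nonterminals only
                    beta_first = first_of_string(body[i + 1:])
                    new = beta_first - {EPSILON}  # Rule 2
                    if EPSILON in beta_first:  # Rules 3 and 3a
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
for head, symbols in FOLLOW.items():
    print(f"FOLLOW({head}) = {sorted(symbols)}")
```

Rule 3 is covered by the same branch as rule 3a, because an empty β trivially derives ε.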

Rules to Build Parsing Table
GRAMMAR:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
FIRST SETS:
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FIRST(F) = {(, id}
FIRST(T) = {(, id}
FIRST(E) = {(, id}
FOLLOW SETS:
FOLLOW(E) = {), $}
FOLLOW(E’) = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Rules:
1. If A → α: if a ∈ FIRST(α), add A → α to M[A, a]
2. If A → α: if ε ∈ FIRST(α), add A → α to M[A, b] for each terminal b ∈ FOLLOW(A)
3. If A → α: if ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]
PARSING TABLE M:
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |
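The table-building rules translate almost directly into code. The following Python sketch fills M from the FIRST and FOLLOW sets given on the slide; the dictionary layout and the names `first_of_string` and `build_table` are assumptions made for illustration. Each table cell holds a list so that a multiply-defined entry would be visible.

```python
EPSILON = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW sets copied from the slide.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a production body; contains ε if the whole body can vanish."""
    result = set()
    for symbol in symbols:
        symbol_first = FIRST.get(symbol, {symbol})  # a terminal is its own FIRST set
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)
    return result


def build_table(grammar):
    """Map (nonterminal, terminal) to the list of production bodies placed there."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body)
            targets = body_first - {EPSILON}  # Rule 1
            if EPSILON in body_first:
                targets |= FOLLOW[head]  # Rules 2 and 3 ($ is stored in FOLLOW)
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table


TABLE = build_table(GRAMMAR)
for (head, terminal), bodies in sorted(TABLE.items()):
    print(f"M[{head}, {terminal}] =", " ; ".join(" ".join(body) for body in bodies))
```

A cell containing more than one body would signal a conflict; for this grammar every filled cell holds exactly one production.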

A Predictive Parser
2. Apply the predictive parsing algorithm to construct the parse tree
The following algorithm shows how we can construct the sequence of parsing moves for an input string w$ with respect to a given grammar G. Initially the stack holds $ with the start symbol on top.
    set ip to point to the first symbol of the input string w$
    repeat
        let X = Top(stack) and a = Current-Input(ip)
        if Top(stack) is a terminal or $ then
            if Top(stack) = Current-Input(ip) then
                Pop(stack) and advance ip
            else error()
        else
            if M[X,a] = X → Y1Y2 …Yk then
            begin
                Pop(stack);
                Push Yk; …; Y2; Y1 onto the stack, with Y1 on top;
                Output the production X → Y1Y2 …Yk
            end
            else error()
    until Top(stack) = $ (i.e. the stack becomes empty)

A Predictive Parser
2. Apply the predictive parsing algorithm to construct the parse tree
Example
Grammar:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
Parsing Table:
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |
INPUT: id + id * id $
Action when Top(Stack) = input ≠ $ : Pop stack, advance input.
The predictive parser proceeds in this fashion, emitting the following productions:
STACK (top at left) | INPUT            | OUTPUT
E $                 | id + id * id $   |
T E’ $              | id + id * id $   | E → TE’
F T’ E’ $           | id + id * id $   | T → FT’
id T’ E’ $          | id + id * id $   | F → id
T’ E’ $             | + id * id $      |
E’ $                | + id * id $      | T’ → ε
+ T E’ $            | + id * id $      | E’ → +TE’
T E’ $              | id * id $        |
F T’ E’ $           | id * id $        | T → FT’
id T’ E’ $          | id * id $        | F → id
T’ E’ $             | * id $           |
* F T’ E’ $         | * id $           | T’ → *FT’
F T’ E’ $           | id $             |
id T’ E’ $          | id $             | F → id
T’ E’ $             | $                |
E’ $                | $                | T’ → ε
$                   | $                | E’ → ε
When Top(Stack) = input = $ the parser halts and accepts the input string.
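The stack-driven loop can be written compactly in Python. This is a minimal sketch under stated assumptions: the parsing table M for the example grammar is hard-coded, an empty list stands for an ε body, E' and T' stand for E’ and T’, and the function name `ll1_parse` is hypothetical. Unlike the slide's pseudocode, it raises an error when a table entry is missing or a terminal does not match.

```python
# Parsing table M for the example grammar; an empty list is the ε production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError on invalid input."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    position = 0
    output = []
    while stack:
        top = stack.pop()
        current = tokens[position]
        if top not in NONTERMINALS:  # terminal or $
            if top != current:
                raise SyntaxError(f"expected {top!r}, found {current!r}")
            position += 1  # matched: advance the input pointer
        else:
            body = TABLE.get((top, current))
            if body is None:
                raise SyntaxError(f"no rule for {top} on {current!r}")
            output.append(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))  # push Yk ... Y1 so that Y1 ends on top
    return output


productions = ll1_parse("id + id * id".split())
for production in productions:
    print(production)
```

The printed productions form a leftmost derivation of the input, in the order the parser applies them.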

LL(k) Parser
This parser parses from left to right, and does a leftmost derivation. It looks 1 symbol ahead to choose its next action. Therefore, it is known as an LL(1) parser.
An LL(k) parser looks k symbols ahead to decide its action.
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
LL(1) grammars enjoy several nice properties: for example, they are not ambiguous and not left recursive.

LL(k) Parser
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
Example 1
The grammar
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id
whose PARSING TABLE is
NON-TERMINAL | id      | +         | *         | (       | )      | $
E            | E → TE’ |           |           | E → TE’ |        |
E’           |         | E’ → +TE’ |           |         | E’ → ε | E’ → ε
T            | T → FT’ |           |           | T → FT’ |        |
T’           |         | T’ → ε    | T’ → *FT’ |         | T’ → ε | T’ → ε
F            | F → id  |           |           | F → (E) |        |
is an LL(1) grammar.

LL(k) Parser
LL(1) grammar: a grammar whose parsing table has no multiply-defined entries.
Example 2
The grammar
S → iEtSS’ | a
S’ → eS | ε
E → b
whose PARSING TABLE is
NON-TERMINAL | a     | b     | e               | i          | t | $
S            | S → a |       |                 | S → iEtSS’ |   |
S’           |       |       | S’ → ε, S’ → eS |            |   | S’ → ε
E            |       | E → b |                 |            |   |
is NOT an LL(1) grammar, because the entry M[S’, e] is multiply defined.

Parsing
Top Down Parsing: Predictive Parsing, LL(k) Parsing, Left Recursion, Left Factoring
Bottom Up Parsing: Shift-reduce Parsing, LR(k) Parsing

Bottom-Up Parsers
Bottom-up parsers build the nodes at the bottom of the parse tree first. They are suitable for automatic parser generation and handle a larger class of grammars.
Examples: shift-reduce parsers (or LR(k) parsers)

Bottom-up Parsing
•  No problem with left-recursion
•  Widely used in practice
•  LR(1), SLR(1), LALR(1)

Grammar Hierarchy
(Figure: nested grammar classes inside the non-ambiguous CFGs: CLR(1), LALR(1), SLR(1), and LL(1).)

Bottom-up Parsing
•  Works from tokens to start-symbol
•  Repeat:
   -  identify handle - reducible sequence:
      •  non-terminal is not constructed but
      •  all its children have been constructed
   -  reduce - construct non-terminal and update stack
•  Until reducing to start-symbol

Bottom-up Parsing
Grammar:
E → E + (E)
E → i, where i = 0, 1, 2, …, 9
Input: 1 + (2) + (3)
Reductions:
1 + (2) + (3)
→ E + (2) + (3)
→ E + (E) + (3)
→ E + (3)
→ E + (E)
→ E

Bottom-up Parsing
•  Is the following grammar LL(1)?
E → E + (E)
E → i
❚  NO: it is left recursive, and the input 1 + (2) + (3) cannot be parsed by an LL(1) parser with it.
❚  But this is a useful grammar

Bottom-Up Parser
A bottom-up parser, or a shift-reduce parser, begins at the leaves and works up to the top of the tree.
The reduction steps trace a rightmost derivation in reverse.
Consider the grammar:
S → aABe
A → Abc | b
B → d
We want to parse the input string abbcde.

Bottom-Up Parser Example
Productions: S → aABe, A → Abc, A → b, B → d
INPUT: a b b c d e $
Step | Sentential form | Reduction applied
1    | a b b c d e     | A → b (the first b)
2    | a A b c d e     | A → Abc
3    | a A d e         | B → d
4    | a A B e         | S → aABe
5    | S               | accept
In step 2 the b that follows A is not reduced by A → b. A parser that did reduce it would get stuck and then have to backtrack!
OUTPUT (parse tree): S has children a, A, B, e; A has children A, b, c, where the inner A derives b; B derives d.
This parser is known as an LR Parser because it scans the input from Left to right, and it constructs a Rightmost derivation in reverse order.

Bottom-Up Parser Example
Scanning the productions to match handles in the input string, combined with backtracking, makes the method used in the previous example very inefficient.
Can we do better?

LR Parser Example
(Figure: an LR parser. The LR Parsing Program reads the Input, maintains a Stack, consults the action and goto tables, and produces the Output.)

Shift reduce parser
1. Construct the action-goto table from the given grammar.
2. Apply the shift-reduce parsing algorithm to construct the parse tree.

Shift reduce parser
1. Construct the action-goto table from the given grammar
This step is what makes the difference between the different types of shift-reduce parsing, such as SLR, CLR, and LALR.
In this course, due to shortage of time, we will not study how to construct the action-goto table.

Shift reduce parser
2. Apply the shift-reduce parsing algorithm to construct the parse tree
The following algorithm shows how we can construct the sequence of parsing moves for an input string w$ with respect to a given grammar G.
    set ip to point to the first symbol of the input string w$
    repeat forever
    begin
        if action[top(stack), current-input(ip)] = shift(s) then
        begin
            push current-input(ip) then s on top of the stack;
            advance ip to the next input symbol
        end
        else if action[top(stack), current-input(ip)] = reduce A → β then
        begin
            pop 2*|β| symbols off the stack;
            push A then goto[top(stack), A] on top of the stack;
            output the production A → β
        end
        else if action[top(stack), current-input(ip)] = accept then
            return
        else error()
    end

LR Parser Example
The following grammar:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
can be parsed with this action and goto table:
        action                         goto
State | id   +    *    (    )    $   | E    T    F
0     | s5             s4            | 1    2    3
1     |      s6                  acc |
2     |      r2   s7        r2   r2  |
3     |      r4   r4        r4   r4  |
4     | s5             s4            | 8    2    3
5     |      r6   r6        r6   r6  |
6     | s5             s4            |      9    3
7     | s5             s4            |           10
8     |      s6             s11      |
9     |      r1   s7        r1   r1  |
10    |      r3   r3        r3   r3  |
11    |      r5   r5        r5   r5  |
s represents shift, r represents reduce, acc represents accept, empty represents error.
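With the action and goto tables in hand, the shift-reduce loop is short. The following Python sketch hard-codes the tables for the six-production expression grammar and follows the slide's convention of pushing a symbol and then a state, so that a reduction by A → β pops 2*|β| entries. The data layout and the function name `lr_parse` are assumptions made for illustration.

```python
# Production number -> (head, length of body), for the six productions of the grammar.
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return the numbers of the productions used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # alternates state, symbol, state, ...; the top is the end of the list
    position = 0
    output = []
    while True:
        action = ACTION[stack[-1]].get(tokens[position])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[position]!r} in state {stack[-1]}")
        if action == "acc":
            return output
        number = int(action[1:])
        if action[0] == "s":
            stack += [tokens[position], number]  # push the symbol, then the new state
            position += 1
        else:
            head, length = PRODUCTIONS[number]
            del stack[-2 * length:]  # pop 2*|β| entries
            stack += [head, GOTO[stack[-1]][head]]
            output.append(number)


reductions = lr_parse("id * id + id".split())
print(reductions)
```

Read backwards, the list of reductions is a rightmost derivation of the input.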

LR Parser Example
GRAMMAR:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id

INPUT: id * id + id $

The LR parsing program makes the following moves. The stack is written with its top on the right, and each pushed symbol is followed by the state pushed with it.

Step | STACK            | INPUT           | ACTION
  1  | 0                | id * id + id $  | s5: shift id, go to state 5
  2  | 0 id 5           | * id + id $     | r6: reduce by (6) F → id, goto[0, F] = 3
  3  | 0 F 3            | * id + id $     | r4: reduce by (4) T → F, goto[0, T] = 2
  4  | 0 T 2            | * id + id $     | s7: shift *, go to state 7
  5  | 0 T 2 * 7        | id + id $       | s5: shift id, go to state 5
  6  | 0 T 2 * 7 id 5   | + id $          | r6: reduce by (6) F → id, goto[7, F] = 10
  7  | 0 T 2 * 7 F 10   | + id $          | r3: reduce by (3) T → T * F, goto[0, T] = 2
  8  | 0 T 2            | + id $          | r2: reduce by (2) E → T, goto[0, E] = 1
  9  | 0 E 1            | + id $          | s6: shift +, go to state 6
 10  | 0 E 1 + 6        | id $            | s5: shift id, go to state 5
 11  | 0 E 1 + 6 id 5   | $               | r6: reduce by (6) F → id, goto[6, F] = 3
 12  | 0 E 1 + 6 F 3    | $               | r4: reduce by (4) T → F, goto[6, T] = 9
 13  | 0 E 1 + 6 T 9    | $               | r1: reduce by (1) E → E + T, goto[0, E] = 1
 14  | 0 E 1            | $               | acc: accept

OUTPUT (productions in the order they are applied): F → id, T → F, F → id, T → T * F, E → T, F → id, T → F, E → E + T

Read bottom-up, these reductions build the parse tree: the root E has children E, + and T; the left E derives T, which derives T * F (with T → F → id and F → id); the right T derives F → id.

Constructing Parsing Tables
All LR parsers use the same parsing program that we demonstrated in the previous slides. What differentiates the LR parsers is the action and goto tables:
•  Simple LR (SLR): succeeds for the fewest grammars, but is the easiest to implement.
•  Canonical LR: succeeds for the most grammars, but is the hardest to implement. It splits states when necessary to prevent reductions that would get the parser stuck.
•  Lookahead LR (LALR): succeeds for most common syntactic constructions used in programming languages, but produces LR tables much smaller than canonical LR.
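The slides do not spell out how the tables are built. As a hedged illustration of the first step shared by these methods, the sketch below computes the canonical collection of LR(0) item sets for the expression grammar used earlier. The augmented production E' → E and all function names (`closure`, `goto`, `canonical_collection`) are assumptions introduced for this example, and the order in which states are discovered need not match the state numbers in the table above.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Production 0 (E' -> E) is the usual augmentation, added here as an assumption.
PRODS: List[Tuple[str, Tuple[str, ...]]] = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {"E'", "E", "T", "F"}

Item = Tuple[int, int]  # (production index, position of the dot)


def closure(items: Iterable[Item]) -> FrozenSet[Item]:
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result = set(items)
    work = list(result)
    while work:
        prod, dot = work.pop()
        rhs = PRODS[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(PRODS):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(items: FrozenSet[Item], symbol: str) -> FrozenSet[Item]:
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {
        (prod, dot + 1)
        for prod, dot in items
        if dot < len(PRODS[prod][1]) and PRODS[prod][1][dot] == symbol
    }
    return closure(moved) if moved else frozenset()


def canonical_collection() -> List[FrozenSet[Item]]:
    """Collect every item set reachable from the start item E' -> .E"""
    symbols = sorted({s for _, rhs in PRODS for s in rhs})
    states = [closure({(0, 0)})]
    for state in states:  # the list grows while it is being traversed
        for symbol in symbols:
            target = goto(state, symbol)
            if target and target not in states:
                states.append(target)
    return states
```

For this grammar the collection has twelve item sets, one per row of the action and goto table. SLR, LALR and canonical LR differ in how lookahead information is attached when reduce entries are filled in, which this sketch does not attempt.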

Grammar Hierarchy
Non-ambiguous CFG ⊃ CLR(1) ⊃ LALR(1) ⊃ SLR(1)
LL(1) ⊂ CLR(1)

Parsing
How does a parser work?
Top Down Parsing
•  Predictive Parsing
•  LL(k) Parsing
•  Left Recursion
•  Left Factoring
Bottom Up Parsing
•  Shift-reduce Parsing
•  LR(k) Parsing
How to write a parser?

Source Program → (next char / get next char) → lexical analyzer → (next token / get next token) → Syntax analyzer
Both phases use the symbol table (contains a record for each identifier).
1. Lexical analyzer: uses Regular Expressions to define tokens and Finite Automata to recognize tokens.
2. Syntax analyzer: uses Top-down parsing or Bottom-up parsing to construct a Parse tree.
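The "get next token" interaction can be sketched as a generator that the syntax analyzer pulls from on demand. This is an illustrative Python sketch only: the token names (`ID`, `PLUS`, `TIMES`, `LPAREN`, `RPAREN`, `EOF`) and the function `tokens` are assumptions, and Python's `re` module stands in for the finite automaton that a generated lexer would use.

```python
import re
from typing import Iterator, Tuple

# Each token class is defined by a regular expression (names are illustrative).
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_]\w*"),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokens(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (token class, lexeme) pairs one at a time, then an EOF marker."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":  # whitespace is consumed but not reported
            yield match.lastgroup, match.group()
    yield "EOF", "$"
```

A parser would call `next(stream)` on `stream = tokens("a * b + c")` each time it needs another token, so characters are only scanned as far as the parser has asked for.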

How to write a parser? Yacc

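Yacc
Compiler phases: Source program → lexical analysis → syntax analysis → Inter. representation → code generation → Target program
•  Lex generates the lexical analysis phase from a token description.
•  Yacc generates the syntax analysis phase from the language grammar.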

How to write an LR parser?
General approach: The construction is done automatically by a tool such as the Unix program yacc.
1. Use the grammar of the source language to write a simple yacc program and save it in a file named name.y.
2. Use the Unix program yacc to compile name.y, producing a C (parser) program named y.tab.c.
3. Compile and link the C program y.tab.c in the normal way to obtain the required parser.

LR parser generators
Yacc: Yet Another Compiler-Compiler
•  Automatically generates LALR parsers
•  Created by S.C. Johnson in the 1970s

Using Yacc
Yacc source program (filename.y) → Yacc compiler → y.tab.c
y.tab.c → C compiler → a.out
Input tokens → a.out (Parser) → Parse tree

Yacc
lexer spec → LEX → .c → C compiler → Lexical analyzer
Parser spec → Yacc → .c → C compiler → Parser
Source program → Lexical analyzer → tokens → Parser

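Yacc Example
lexer spec → LEX → .c → C compiler → Lexical analyzer
Parser spec → Yacc → .c → C compiler → Parser
Source program: tomatoes + potatoes + carrots
Tokens produced by the Lexical analyzer: id1, PLUS, id2, PLUS, id3
Parse tree produced by the Parser: + ( + ( id1, id2 ), id3 )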
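To make the last step of this example concrete, the sketch below turns the token stream id1, PLUS, id2, PLUS, id3 into a nested tree. It is a hand-written illustration rather than what Yacc generates: the function name `parse_sum` is hypothetical, and treating `+` as left-associative is an assumption based on how the tree on the slide appears to be drawn.

```python
from typing import List, Tuple, Union

# A tree is either a leaf name or a ("+", left subtree, right subtree) node.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def parse_sum(tokens: List[str]) -> Tree:
    """Parse `id (PLUS id)*`, grouping to the left."""
    if not tokens or tokens[0] == "PLUS":
        raise SyntaxError("expected an identifier")
    tree: Tree = tokens[0]
    i = 1
    while i < len(tokens):
        if tokens[i] != "PLUS":
            raise SyntaxError(f"expected PLUS, found {tokens[i]!r}")
        if i + 1 >= len(tokens) or tokens[i + 1] == "PLUS":
            raise SyntaxError("expected an identifier after PLUS")
        # The tree built so far becomes the left operand of the new node.
        tree = ("+", tree, tokens[i + 1])
        i += 2
    return tree
```

With the token list from the example, the result has `+` at the root, a nested `+` over id1 and id2 on the left, and id3 on the right.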

How to write a parser
Lex spec (.l) → Lex → lex.yy.c (Scanner)
yacc spec (.y) → Yacc → y.tab.c (Parser)
source program → Scanner → token → Parser
The Scanner and the Parser both use the symbol table.


How to write a yacc program
myfile.y

%{
< C global variables, prototypes, comments >
%}
[DEFINITION SECTION]
%%
[PRODUCTION RULES SECTION]
%%
< C auxiliary subroutines >

•  %{ ... %}: this part will be embedded into myfile.tab.c.
•  DEFINITION SECTION: contains token declarations. Tokens are recognized in the lexer.
•  PRODUCTION RULES SECTION: defines how to "understand" the input language, and what actions to take for each "sentence".
•  C auxiliary subroutines: any user code, for example a main function that calls the parser function yyparse().

Example: PRODUCTION RULES SECTION
Grammar:
statement → expression
expression → expression + expression | expression - expression | expression * expression | expression / expression | NUMBER

Yacc rules:
statement  : expression                 { printf(" = %g\n", $1); }
           ;
expression : expression '+' expression  { $$ = $1 + $3; }
           | expression '-' expression  { $$ = $1 - $3; }
           | expression '*' expression  { $$ = $1 * $3; }
           | expression '/' expression  { $$ = $1 / $3; }
           | NUMBER                     { $$ = $1; }
           ;

As written this grammar is ambiguous, so the definition section would normally also declare operator precedence and associativity (for example %left '+' '-' followed by %left '*' '/').
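The `$$`, `$1` and `$3` in these actions refer to a value stack that the generated parser keeps alongside its parse stack: on a reduction, the values of the right-hand side symbols are popped and the value of the left-hand side is pushed. The Python sketch below imitates that mechanism by hand. The helper names (`reduce_with_action`, `demo`) and the placeholder value stored for operator tokens are assumptions made for illustration, and the reduction order is hard-coded rather than driven by a parsing table.

```python
from typing import Callable, List


def reduce_with_action(values: List[float], rhs_len: int,
                       action: Callable[[List[float]], float]) -> None:
    """Pop the right-hand side values ($1..$n) and push the result ($$)."""
    start = len(values) - rhs_len
    rhs_values = values[start:]      # rhs_values[0] plays the role of $1
    del values[start:]
    values.append(action(rhs_values))


# Semantic actions mirroring the rules above: index 0 is $1, index 2 is $3.
def add(v: List[float]) -> float:
    return v[0] + v[2]


def mul(v: List[float]) -> float:
    return v[0] * v[2]


def number(v: List[float]) -> float:
    return v[0]


def demo() -> float:
    """Evaluate 2 + 3 * 4, reducing the multiplication before the addition."""
    values: List[float] = []
    values.append(2.0)                      # shift NUMBER
    reduce_with_action(values, 1, number)   # expression : NUMBER
    values.append(0.0)                      # shift '+' (placeholder value)
    values.append(3.0)                      # shift NUMBER
    reduce_with_action(values, 1, number)
    values.append(0.0)                      # shift '*' (placeholder value)
    values.append(4.0)                      # shift NUMBER
    reduce_with_action(values, 1, number)
    reduce_with_action(values, 3, mul)      # expression '*' expression
    reduce_with_action(values, 3, add)      # expression '+' expression
    return values[0]
```

In a real yacc parser the order of the two final reductions would be decided by the parsing table together with the precedence declarations, not chosen by hand as it is here.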
