Parsers and control structures Grammar checking
Outline Control Structures Parsers Expressions Selection statements Iterative statements Context-free grammar Parsers Recursive-descent parsing LL(1) parsing SLR(1) parsing 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Control Structures Expressions Selection statements Iterative statements 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Expressions Control flow in an expression is determined by Operator precedence Operators with higher precedence are executed before. Operator associativity If an operator is left/right associative, then in the expression … a …, a is used by the operator on the left/ right first. When is left-associative, a b c d means (((a b) c) d) . When is right-associative, a b c d means (a (b (c d))). 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Selection Statements Two-way selections: If statements Nested ifs Explicit: using blocks, such as {…} or begin…end Implicit: using semantic rule “else go with the nearest if” Multiway selections: switch-case statements Type of control expression Selectable segment Execution flow Unrepresented expression 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Type of Control Expression Real number FORTRAN Use 3-way selector : if (arithmeticExpression) N1, N2,N3 Ordinal (int, char, boolean, enum) Pascal, Ada Integer only C, C++, JAVA Why control expression must be ordinal Jump tables are used for implementation 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Execution flow FORTRAN PASCAL C case … of const1: … const2: otherwise: end IF (…) N1, N2, N3 N1 … GOTO OUT N2 … N3 … OUT switch (…) { case const1: … case const2: break; default: } 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Selectable segments Single statement Compound statement Allowed in most languages Compound statement C, C++, JAVA 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Unrepresented Expression Unspecified Error occurs during execution Some Pascal Can be specified, but not required Use default, else, otherwise as keywords C, other Pascal Require lists to be exhaustive Ada 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Iteration Statements Counter-controlled loops Logically-controlled loops Should both be combined into one construct ? Example: for loop in C 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Loop Variables in Counter-controlled Loops Type Only integer in FORTRAN Integer and real in ALGOL Ordinal in Pascal Discrete range in Ada scope only in loop in Pascal, Ada As defined Value at loop termination Can be changed in loop body ? FORTRAN, Pascal: Allowed, but not effect loop control ALGOL: Allowed and effect loop control Ada: not allowed Evaluation of loop parameters Once in FORTRAN, Pascal, Ada Every iteration in ALGOL ALGOL allows var := list 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Logically-controlled Loops Pre-test or post-test ? While or until ? Pascal has while-do and repeat- until. C, C++ and Java have while-do and do-while. Ada has a pretest version, but no posttest. FORTRAN 77 and 90 have none. Perl has two pretest loops, while and until, but no posttest loop. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
User-located Loop Control Break or exit Bad for readability Allow in most languages 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Iteration Based on Data Structures Foreach in Perl: for arrays 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Context-free Grammar 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Grammar Rules V is a finite set of nonterminals, containing S, T is a finite set of terminals, P is a set of production rules in the form of aβ where is in V and β is in (V UT )*, and S is the start symbol Terminology sentential form Derivation Leftmost derivation Rightmost derivation 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Left/Right Recursive A grammar is a left recursive if its production rules can generate a derivation of the form A * A X. Examples: E F + id | (E) | id F E * id | id E F + id E * id + id A grammar is a right recursive if its production rules can generate a derivation of the form A * X A. Examples: E id + F | (E) | id F id * E | id E id + F id + id * E 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Parse Tree A labeled tree in which the interior nodes are labeled by nonterminals leaf nodes are labeled by terminals the children of an interior node represent a replacement of the associated nonterminal in a derivation corresponding to a derivation 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Abstract Syntax Tree + + * * Representation of actual source tokens Interior nodes represent operators. Leaf nodes represent operands. + E * id3 id2 id1 + id1 id3 * id2 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Abstract Syntax Tree for If Statement ) exp true ( if else elsePart return if true st return 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Ambiguous Grammar A grammar is ambiguous if it can generate two different parse trees for one string. Ambiguous grammars can cause inconsistency in parsing. E E + E | E – E | E * E | E / E | id * E id2 + id1 id3 + E * id3 id2 id1 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Ambiguity in Expressions Which operation is to be done first? solved by precedence An operator with higher precedence is done before one with lower precedence. An operator with higher precedence is placed in a rule (logically) further from the start symbol. solved by associativity Right-associated : W + (X + (Y + Z)) Left-associated : ((W + X) + Y) + Z 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Parsers Recursive-descent parsers LL(1) parsers SLR(1) parsers 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Parsing Parsing is a process that constructs a syntactic structure (i.e. parse tree) from the stream of tokens. We already learn how to describe the syntactic structure of a language using (context-free) grammar. Stream of tokens Parser Parse tree Context-free grammar 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Top-down Parsing Bottom-up Parsing A parse tree is created from leaves to root Tracing rightmost derivation id + id * id E + id * id E + E * id E + E * E E E + E A parse tree is created from root to leaves Tracing leftmost derivation E E + E id + E id + E * E id + id * E id + id * id id E * + 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Recursive-Descent Parsing Write one procedure for each set of productions with the same nonterminal in the LHS Each procedure recognizes a structure described by a nonterminal. A procedure calls other procedures if it need to recognize other structures. A procedure calls match procedure if it need to recognize a terminal. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Recursive-Descent: Example E E O F | F O + | - F ( E ) | id procedure F { switch token { case ‘(‘: match(‘(‘); E; match(‘)’); case id: match(id); default: error; } } We cannot decide which rule to use for E If we choose E E O F, it leads to infinitely recursive loops. Rewrite the grammar into E F { O F } procedure E { F; while (token==‘+’ or token==‘-’) { O; F; } } 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Problems in Recursive-Descent Difficult to convert grammars into EBNF Cannot decide which production to use at each point Cannot decide when to use -production A 2301380 Chapter 4 Parsers and Control Structures 24/11/61
LL(1) Parsers 2301380 Chapter 4 Parsers and Control Structures 24/11/61
LL(1) Parsing LL(1) Use stack to simulate leftmost derivation Read input from (L) left to right Simulate (L) leftmost derivation 1 lookahead symbol Use stack to simulate leftmost derivation Part of sentential form produced in the leftmost derivation is stored in the stack. Top of stack is the leftmost nonterminal symbol in the fragment of sentential form. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
How LL(1) Parsing Works Keep part of sentential form in the stack. If the symbol on the top of stack is a terminal, try to match it with the next input token and pop it out of stack. If the symbol on the top of stack is a nonterminal X, and there is a production rule X Y, then replace X with Y. Which production will be chosen, if there are both X Y and X Z ? 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: LL(1) Parsing F n ( T N E X ) A + F F n Finished N ( T T N M E TX FNX (E)NX (TX)NX (FNX)NX (nNX)NX (nX)NX (nATX)NX (n+TX)NX (n+FNX)NX (n+(E)NX)NX (n+(TX)NX)NX (n+(FNX)NX)NX (n+(nNX)NX)NX (n+(nX)NX)NX (n+(n)NX)NX (n+(n)X)NX (n+(n))NX (n+(n))MFNX (n+(n))*FNX (n+(n))*nNX (n+(n))*nX (n+(n))*n E T X X A T X | A + | - T F N N M F N | M * F ( E ) | n ) A + F F n Finished N ( T T N M X * E X n ) F F N N T X E ( n + ) * $ $ 2301380 Chapter 4 Parsers and Control Structures 24/11/61
(LL(1) Parsing Algorithm Push the start symbol into the stack WHILE stack is not empty ($ is not on top of stack) and the stream of tokens is not empty (the next input token is not $) SWITCH (Top of stack, next token) CASE (terminal a, a): Pop stack; Get next token CASE (nonterminal A, terminal a): IF the parsing table entry M[A, a] is not empty THEN Get A X1 X2 ... Xn from the parsing table entry M[A, a] Pop stack; Push Xn ... X2 X1 into stack in that order ELSE Error CASE ($,$): Accept OTHER: Error 2301380 Chapter 4 Parsers and Control Structures 24/11/61
What is in LL(1) Parsing Table Which production to use if nonterminal N is on the top of stack and the next token is t ? Choose a production such that Left handside is N Right handside X FINALLY produces t X * tY or X * and S * WNtY 2301380 Chapter 4 Parsers and Control Structures 24/11/61
First Let X be or be in V or T. First(X ) is the set of the first terminal in any sentential form derived from X. If X is a terminal or , then First(X)={X}. If X is a nonterminal and there is X X1 X2 ... Xn First(X1) -{} is a subset of First(X) First(Xi )-{} is a subset of First(X) if for all j<i First(Xj) contains {} is in First(X) if for all j≤n First(Xj)contains 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: First exp exp addop term | term addop + | - term term mulop factor|factor mulop * factor (exp) | num First(addop) = {+, -} First(mulop) = {*} First(factor) = {(, num} First(term) = {(, num} First(exp) = {(, num} st ifst | other ifst if ( exp ) st elsepart elsepart else st | exp 0 | 1 First(exp) = {0, 1} First(elsepart) = {else, } First(ifst) = {if} First(st) = {if, other} 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Algorithm for finding First FOR ALL terminals a, First(a) = {a} FOR ALL nonterminals A, First(A) := {} WHILE there is any change to any First(A) FOR ALL rule A X1 X2 ... Xn FOR ALL Xi in {X1, X2, …, Xn } IF FOR ALL j<i First(Xj) contains , THEN add First(Xi)-{} to First(A) IF is in First(X1), First(X2), ..., and First(Xn) THEN add to First(A) 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: Finding First exp term exp’ exp’ addop term exp’ | addop + | - term factor term’ term’ mulop factor term’| mulop * factor ( exp ) | num exp ( num exp’ + - addop + - term term’ * mulop factor 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Follow Let $ denote the end of input tokens If A is the start symbol, then $ is in Follow(A). If there is a rule B X A Y, then First(Y) - {} is in Follow(A). If there is production B X A Y and is in First(Y), then Follow(A) contains Follow(B). 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Algorithm for Finding Follow(A) Follow(S) = {$} FOR ALL A in V-{S} Follow(A) = {} WHILE change is made to some Follow sets FOR each production A X1 X2 ... Xn, FOR ALL nonterminal Xi Add First(Xi+1 Xi+2...Xn) – {} into Follow(Xi). (NOTE: If i=n, Xi+1 Xi+2...Xn= ) IF is in First(Xi+1 Xi+2...Xn) THEN Add Follow(A) to Follow(Xi). 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: Finding Follow Set exp term exp’ exp’ addop term exp’ | addop + | - term factor term’ term’mulop factor term’ | mulop * factor ( exp ) | num First Follow exp ( num $ ) exp’ + - addop + - term + - $ ) term’ * mulop * factor 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Constructing LL(1) Parsing Tables FOR each nonterminal A and a production A X FOR each token a in First(X) A X is in M(A, a) IF is in First(X) THEN FOR each element a in Follow(A) Add A X to M(A, a) 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: Constructing LL(1) Parsing Table First Follow exp {(, num} {$,)} exp’ {+,-, } {$,)} addop {+,-} {(,num} term {(,num} {+,-, ),$} term’ {*, } {+,-, ),$} mulop {*} {(,num} factor {(, num} {+,-,*,),$} 1 exp term exp’ 2 exp’ addop term exp’ 3 exp’ 4 addop + 5 addop - 6 term factor term’ 7 term’ mulop factor term’ 8 term’ 9 mulop * 10 factor ( exp ) 11 factor num ( ) + - * n $ exp exp’ addop term term’ mulop factor 1 1 3 2 2 3 4 5 6 6 8 8 8 7 8 9 10 11 2301380 Chapter 4 Parsers and Control Structures 24/11/61 43
LL(1) Grammar A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry. Non-LL(1) grammar More than 1 production in at least one entry of the table. Causes: Left-recursion Left-factor 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Non-LL(1) Grammar 1 exp exp addop term 2 exp term 3 term term mulop factor 4 term factor 5 factor ( exp ) 6 factor num 7 addop + 8 addop - 9 mulop * First(exp) = { (, num } First(term) = { (, num } First(factor) = { (, num } First(addop) = { +, - } First(mulop) = { * } 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Left Recursion Immediate left recursion General left recursion A A X | Y (A=Y X*) A A X1 | A X2 |…| A Xn | Y1 | Y2 |... | Ym A=(Y1,Y2,..,Ym) (X1,X2,…,Xn)* General left recursion A => X =>* A Y Can be removed easily A Y A’, A’ X A’| A Y1 A’ | Y2 A’ |...| Ym A’, A’ X1 A’| X2 A’|…| Xn A’| Can be removed when there is no empty-string production and no cycle in the grammar 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Removal of Immediate Left Recursion exp exp + term | exp - term | term term term * factor | factor factor ( exp ) | num Remove left recursion exp term exp’ exp’ + term exp’ | - term exp’ | term factor term’ term’ * factor term’ | 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Left Factoring Left factor causes non-LL(1) A X Y | X Z Given A X Y | X Z. Both A X Y and A X Z can be chosen when A is on top of stack and a token in First(X) is the next token. A X Y | X Z can be left-factored as A X A’ and A’ Y | Z 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example: Left-factor ifSt if ( exp ) st else st | if ( exp ) st can be left-factored as ifSt if ( exp ) st elsePart elsePart else st | seq st ; seq | st seq st seq’ seq’ ; seq | 2301380 Chapter 4 Parsers and Control Structures 24/11/61
SLR(1) Parsers 2301380 Chapter 4 Parsers and Control Structures 24/11/61
SLR(1) Parsing Bottom-up parsing Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing More powerful than top-down parsing Left recursion does not cause problem Two actions Shift: take next input token into the stack Reduce: replace a string B on top of stack by a nonterminal A, given a production A B 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example of Shift-reduce Parsing Grammar S’ S S (S)S | Parsing actions Stack Input Action $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S $ ( ( S ) S ) $ reduce S ( S )S $ ( S ) $ shift $ ( S ) $ reduce S $ ( S ) S $ reduce S ( S )S $ S $ accept Reverse of rightmost derivation from left to right 1 ( ( ) ) 2 ( ( ) ) 3 ( ( ) ) 4 ( ( S ) ) 5 ( ( S ) ) 6 ( ( S ) S ) 7 ( S ) 8 ( S ) 9 ( S ) S 10 S’ S 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Terminologies $ ( ( ) ) $ shift $ ( ( ) ) $ shift $ ( ( ) ) $ reduce S $ ( ( S ) ) $ shift $ ( ( S ) ) $ reduce S $ ( ( S ) S ) $ reduce S ( S )S $ ( S ) $ shift $ ( S ) $ reduce S $ ( S ) S $ reduce S ( S )S $ S $ accept 1 ( ( ) ) 2 ( ( ) ) 3 ( ( ) ) 4 ( ( S ) ) 5 ( ( S ) ) 6 ( ( S ) S ) 7 ( S ) 8 ( S ) 9 ( S ) S 10 S’ S Right sentential form: sentential form in a rightmost derivation Viable prefix: sequence of symbols on the parsing stack Handle: right sentential form + position where reduction can be performed + production used for reduction 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Items Production with distinguished position in its RHS Keep track of what is left to be done in the parsing process by using finite automata of items An item A w . B y means: A w B y might be used for the reduction in the future, at the time, we know we already construct w in the parsing process, if B is constructed next, we get the new item A w B . Y 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Types of LR(0) Items Initial Item Complete Item Closure Item of x Item with the distinguished position on the leftmost of the production Complete Item Item with the distinguished position on the rightmost of the production Closure Item of x Item x together with items which can be reached from x via -transition Kernel Item Original item, not including closure items 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Finite Automata of Items Grammar: S’ S S (S)S S Items: S’ .S S’ S. S .(S)S S (.S)S S (S.)S S (S).S S (S)S. S . S S’ .S S’ S. S .(S)S S . ( S (.S)S S S (S.)S ) S (S).S S (S)S. S 2301380 Chapter 4 Parsers and Control Structures 24/11/61
DFA of LR(0) Items S’ .S S’ S. S .(S)S S . S (.S)S S (S.)S S .(S)S S . ( ( S (S.)S S S .(S)S S (.S)S S . S (.S)S S ) S (S).S S .(S)S S . S (S.)S ( ( ) S (S).S S S S (S)S. S (S)S. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
LR(0) Parsing Algorithm Let B be a terminal. Item in state token Action A x.By B shift B and push state s containing A xB.y not B error A x. - reduce with A x (i.e. pop x, backup to the state s on top of stack) and push A with new state d(s,A) S’ S. none accept any 2301380 Chapter 4 Parsers and Control Structures 24/11/61
LR(0) Parsing Table State Action Rule ( a ) A shift 3 2 1 reduce shift 3 2 1 reduce A’ A A a 4 5 A (A) A’ .A A .(A) A .a A A’ A. 1 a ( a A a. 2 A (.A) A .(A) A .a A (A.) 4 3 A ) ( A (A). 5 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Example of LR(0) Parsing Stack Input Action $0 ( ( a ) ) $ shift $0(3 ( a ) ) $ shift $0(3(3 a ) ) $ shift $0(3(3a2 ) ) $ reduce $0(3(3A4 ) ) $ shift $0(3(3A4)5 ) $ reduce $0(3A4 ) $ shift $0(3A4)5 $ reduce $0A1 $ accept State Action Rule ( a ) A shift 3 2 1 reduce A’ A A a 4 5 A (A) 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Non-LR(0) Grammar Conflict Shift-reduce conflict A state contains a complete item A x. and a shift item A x.By Reduce-reduce conflict A state contains more than one complete items. A grammar is a LR(0) grammar if there is no conflict in the grammar S’ .S S .(S)S S . S S’ S. ( S (S.)S S S .(S)S S (.S)S S . ) S (S).S S .(S)S S . ( ( S S (S)S. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
SLR(1) Parsing Simple LR with 1 lookahead symbol Examine the next token before deciding to shift or reduce If the next token is the token expected in an item, then it can be shifted into the stack. If a complete item A x. is constructed and the next token is in Follow(A), then reduction can be done using A x. Otherwise, error occurs. Can avoid some conflict. 2301380 Chapter 4 Parsers and Control Structures 24/11/61
SLR(1) parsing algorithm Let B be a terminal. Item in state token Action A x.By B shift B & push state s containing AxB.y not B error A x. Follow(A) reduce with Ax (pop x, backup to the state s on top of stack) & push A with state d(s,A) Follow(A) S’ S. none accept any 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Constructing SLR(1) Parsing Table A (A) | a State ( a ) $ A S3 S2 1 AC 2 R2 3 4 S5 5 R1 A A’ A. 1 A’ .A A .(A) A .a 0 a A a 2 ( a A (.A) A .(A) A .a 3 A (A.) 4 A ) ( A (A). 5 2301380 Chapter 4 Parsers and Control Structures 24/11/61
SLR(1), but not LR(0), Grammar S (S)S | S’ .S S .(S)S S . 0 S S’ S. 1 State ( ) $ S S2 R2 1 AC 2 3 S4 4 5 R1 ( S (S.)S 3 S S .(S)S S (.S)S S . 2 ) S (S).S S .(S)S S . 4 ( ( S S (S)S. 5 2301380 Chapter 4 Parsers and Control Structures
Disambiguating Rules for Parsing Conflict Shift-reduce conflict Prefer shift over reduce In case of nested if statements, preferring shift over reduce implies most closely nested rule for dangling else Reduce-reduce conflict Error in design 2301380 Chapter 4 Parsers and Control Structures 24/11/61
Dangling Else S’ S. 1 S’ .S 0 S .I S .other I .if S state if else other $ S I S4 S3 1 2 ACC R1 3 R2 4 5 S6 R3 6 7 R4 S’ S. 1 S S’ .S 0 S .I S .other I .if S I .if S else S I S I. 2 I I if S else .S 6 S .I S .other I .if S I .if S else S other I if other S other. 3 if I if .S 4 I if .S else S S .I S .other I .if S I .if S else S I .if S else S 7 S other else S I if S. 5 I if S. else S if 2301380 Chapter 4 Parsers and Control Structures 24/11/61