Lesson 4 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg
Outline
– Recursive descent parsers
– Left recursion
– Left factoring
RECURSIVE DESCENT PARSING
Writing a recursive descent parser
Straightforward once the grammar is written in an appropriate form:
– For each nonterminal, create a function. It represents the expectation of that nonterminal in the input.
– Each such function should choose a grammar production, i.e., a right-hand side (RHS), based on the lookahead token.
– It should then process the chosen RHS:
  – Terminals are “matched”: match(IF); match(LEFT_PARENTHESIS); … match(RIGHT_PARENTHESIS); …
  – For nonterminals, their corresponding “expectation functions” are called.
The function match()
Helper function to consume terminals (assumes tokens are represented as ints):

    void match(int expected_lookahead) {
        if (lookahead == expected_lookahead)
            lookahead = nextToken();
        else
            error();
    }
Recursive descent example
Grammar for a subset of the language “types in Pascal”:

    type   → ^ id | array [ simple ] of type | simple
    simple → integer | char | num dotdot num

Examples of “programs”:

    ^ my_type
    array [ ] of Integer
    array [ Char ] of
Recursive descent example

    void type() {
        switch (lookahead) {
        case '^':
            match('^'); match(ID);
            break;
        case ARRAY:
            match(ARRAY); match('['); simple(); match(']'); match(OF); type();
            break;
        default:
            simple();
        }
    }

    void simple() {
        switch (lookahead) {
        case INTEGER:
            match(INTEGER);
            break;
        case CHAR:
            match(CHAR);
            break;
        case NUM:
            match(NUM); match(DOTDOT); match(NUM);
            break;
        default:
            error();
        }
    }
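The two functions above can be put together with match() into a small recognizer that compiles on its own. The following is only a sketch under stated assumptions: the token codes, the END marker, the array-based nextToken(), the failed flag set by error(), and the wrapper parse_type() are not from the slides and are introduced here purely for illustration.

```c
/* Hypothetical token codes; '^', '[' and ']' stand for themselves. */
enum { ID = 256, ARRAY, OF, INTEGER, CHAR, NUM, DOTDOT, END };

static const int *input;  /* assumption: tokens come from an array ending in END */
static int lookahead;     /* the current token */
static int failed;        /* assumption: error() records the failure in a flag */

static int nextToken(void) {
    if (*input != END)
        input++;          /* never advance past the END marker */
    return *input;
}

static void error(void) { failed = 1; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead)
        lookahead = nextToken();
    else
        error();
}

static void simple(void);

/* type -> ^ id | array [ simple ] of type | simple */
static void type(void) {
    switch (lookahead) {
    case '^':
        match('^'); match(ID);
        break;
    case ARRAY:
        match(ARRAY); match('['); simple(); match(']'); match(OF); type();
        break;
    default:
        simple();
    }
}

/* simple -> integer | char | num dotdot num */
static void simple(void) {
    switch (lookahead) {
    case INTEGER: match(INTEGER); break;
    case CHAR:    match(CHAR);    break;
    case NUM:     match(NUM); match(DOTDOT); match(NUM); break;
    default:      error();
    }
}

/* Hypothetical entry point: returns 1 if the tokens form exactly one type. */
int parse_type(const int *tokens) {
    input = tokens;
    lookahead = *input;
    failed = 0;
    type();
    if (lookahead != END)  /* leftover input is also an error */
        error();
    return !failed;
}
```

With these assumptions, parse_type() accepts the token sequence ARRAY '[' NUM DOTDOT NUM ']' OF INTEGER END and rejects a sequence in which the tokens between '[' and ']' are missing. A flag is used instead of aborting only so that the result can be inspected by the caller.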
Exercise (1)
List the calls made by the previous recursive descent parser on the input string

    array [ num dotdot num ] of integer

To get you started:

    type()
    match(ARRAY)
    match('[')
    simple()
    ...
[Figure, animated over several slides: step-by-step construction of the parse tree for the input array [ num dotdot num ] of integer, rooted at type and using the grammar
    type   → ^ id | array [ simple ] of type | simple
    simple → integer | char | num dotdot num]
LEFT RECURSION
The problem with left recursion
Left-recursive grammar: A → A α | β
Problematic for recursive descent parsing:
– Infinite recursion: the function for A calls itself before consuming any input
The problem with left recursion
The left-recursive expression grammar:

    expr → expr + num | expr - num | num

Parser code:

    void expr() {
        if (lookahead != NUM)
            expr();
        match('+');
        …
Eliminating left recursion
Left-recursive grammar:

    A → A α | β

Rewritten grammar:

    A → β M
    M → α M | ε
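As an illustration, the same rewriting can be applied to the expression grammar expr → expr + num | expr - num | num. Treating "+ num" and "- num" as two alternatives for α and num as β (a straightforward generalization of the scheme above to several left-recursive alternatives), one obtains expr → num rest and rest → + num rest | - num rest | ε. The sketch below is one possible recognizer for that grammar; the nonterminal name rest, the token codes, the END marker, the failed flag, and parse_expr() are assumptions made for this example only.

```c
/* Hypothetical token codes; '+' and '-' stand for themselves. */
enum { NUM = 256, END };

static const int *input;  /* assumption: tokens come from an array ending in END */
static int lookahead;
static int failed;        /* assumption: error() records the failure in a flag */

static int nextToken(void) {
    if (*input != END)
        input++;
    return *input;
}

static void error(void) { failed = 1; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead)
        lookahead = nextToken();
    else
        error();
}

/* rest -> + num rest | - num rest | epsilon */
static void rest(void) {
    switch (lookahead) {
    case '+':
        match('+'); match(NUM); rest();
        break;
    case '-':
        match('-'); match(NUM); rest();
        break;
    default:
        break;            /* epsilon: consume nothing */
    }
}

/* expr -> num rest  (no call to expr() before a token is consumed) */
static void expr(void) {
    match(NUM);
    rest();
}

/* Hypothetical entry point: returns 1 if the tokens form one expression. */
int parse_expr(const int *tokens) {
    input = tokens;
    lookahead = *input;
    failed = 0;
    expr();
    if (lookahead != END)
        error();
    return !failed;
}
```

Every recursive call to rest() is preceded by at least one successful match(), so the recursion depth is bounded by the length of the input, which is exactly what the left-recursive version could not guarantee.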
Exercise (2)
Remove the left recursion from the following grammar for formal parameter lists in C:

    list → par | list , par
    par  → int id

int and id are tokens that represent the keyword int and identifiers, respectively.
Hint: what is α and what is β in this case?
LEFT FACTORING
The problem
Recall: how does a predictive parser choose a production body?
What if the lookahead token matches more than one such production body?
The problem
Problematic grammar:

    list → num | num , list

If lookahead = num, what to expect?
Left factoring
The previous grammar,

    list → num | num , list

becomes

    list  → num list’
    list’ → ε | , list
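After left factoring, a single lookahead token is enough again: list always starts with num, and list’ is decided by whether the next token is a comma. The following sketch shows how a recognizer for the factored grammar might look. The function name list_tail (standing in for list’), the token codes, the END marker, the failed flag, and parse_list() are assumptions introduced for this example.

```c
/* Hypothetical token codes; ',' stands for itself. */
enum { NUM = 256, END };

static const int *input;  /* assumption: tokens come from an array ending in END */
static int lookahead;
static int failed;        /* assumption: error() records the failure in a flag */

static int nextToken(void) {
    if (*input != END)
        input++;
    return *input;
}

static void error(void) { failed = 1; }

static void match(int expected_lookahead) {
    if (lookahead == expected_lookahead)
        lookahead = nextToken();
    else
        error();
}

static void list(void);

/* list' -> epsilon | , list */
static void list_tail(void) {
    if (lookahead == ',') {
        match(',');
        list();
    }
    /* otherwise epsilon: consume nothing */
}

/* list -> num list' */
static void list(void) {
    match(NUM);
    list_tail();
}

/* Hypothetical entry point: returns 1 if the tokens form one list. */
int parse_list(const int *tokens) {
    input = tokens;
    lookahead = *input;
    failed = 0;
    list();
    if (lookahead != END)
        error();
    return !failed;
}
```

Under these assumptions, NUM ',' NUM ',' NUM END is accepted, while a trailing comma (NUM ',' END) or a missing comma (NUM NUM END) is rejected.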
Exercise (3)
Perform left factoring on the following grammar for declarations of variables and functions in C:

    decl → int id ; | int id ( pars ) ;
    pars → ...
Conclusion
– Recursive descent parsers
– Left recursion
– Left factoring
Next time
– The sets FIRST and FOLLOW
– Defining LL(1) grammars
– Non-recursive top-down parser
– Handling syntax errors