Download presentation
Presentation is loading. Please wait.
1
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands
2
Summary of lecture 2 lexical analyzer generator description FSA FSA construction dotted items character moves moves program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description
3
Quiz 2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number? 2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.
4
syntax analysis: tokens AST AST construction by hand: recursive descent automatic: top-down (LLgen), bottom-up (yacc) Overview program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar
5
Syntax analysis parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. result: parse tree ‘b’ identifier expression term factor term ‘b’ factor identifier ‘4’ constant term factor term ‘a’ factor identifier term factor identifier expression ‘c’‘c’ ‘*’‘*’ ‘-’‘-’ ‘*’‘*’ ‘*’‘*’ syn-tax: the way in which words are put together to form phrases, clauses, or sentences. Webster’s Dictionary
6
Syntax analysis parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. result: parse tree ‘b’ identifier expression term factor term ‘b’ factor identifier ‘4’ constant term factor term ‘a’ factor identifier term factor identifier expression ‘c’‘c’ ‘*’‘*’ ‘-’‘-’ ‘*’‘*’ ‘*’‘*’
7
Context free grammar G = (V N, V T, S, P) V N : set of Non-terminal symbols V T : set of Terminal symbols S : Start symbol (S V N ) P : set of Production rules {N } V N V T = P = {N | N V N (V N V T )*}
8
Top-down parsing process tokens left to right expression grammar input expression EOF expression term rest_expression term IDENTIFIER | ‘(’ expression ‘)’ rest_expression ‘+’ expression | example expression input expressionEOFtermrest_expressionIDENTIFIER aap + ( noot + mies )
9
rest_expressionexpression rest_expr term Bottom-up parsing process tokens left to right expression grammar input expression EOF expression term rest_expression term IDENTIFIER | ‘(’ expression ‘)’ rest_expression ‘+’ expression | IDENT IDENT IDENT aap + ( noot + mies )
10
Comparison top-downbottom-up node creationpre-orderpost-order alternative selectionfirst tokenlast token grammar typerestricted LL(1) relaxed LR(1) implementationmanual + automatic automatic 5 1 4 2 3 1 2 3 4 5
11
Recursive descent parsing each rule N translates to a boolean function return true if a terminal production of N was matched return false otherwise (without consuming any token) try alternatives of N in turn a terminal symbol must match the current token a non-terminal is matched by calling its routine input expression EOF int input(void) { return expression() && require(token(EOF)); }
12
Recursive descent parsing expression term rest_expression int expression(void) { return term() && require(rest_expression()); } term IDENTIFIER | ‘(’ expression ‘)’ int term(void) { return token(IDENTIFIER) || token('(') && require(expression()) && require(token(')')); }
13
Recursive descent parsing auxiliary functions consume matched tokens report syntax errors int token(int tk) { if (Token.class != tk) return 0; get_next_token(); return 1; } int require(int found) { if (!found) error(); return 1; } rest_expression ‘+’ expression | int rest_expression(void) { return token('+') && require(expression()) || 1; }
14
Automatic top-down parsing follow recursive descent scheme, but avoid interpretation overhead for each rule and alternative determine the tokens it can start with: FIRST set parsing scheme for rule N A 1 | A 2 | … if token FIRST(N) then ERROR if token FIRST(A 1 ) then parse A 1 if token FIRST(A 2 ) then parse A 2 …
15
Exercise (7 min.) design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar. hint: consider the types of rules alternatives composition empty productions input expression EOF expression term rest_expression term IDENTIFIER | ‘(’ expression ‘)’ rest_expression ‘+’ expression |
16
Answers
17
Answers (Fig 2.58, page 122) N w (w V T ) FIRST(N) = {w} N A 1 | A 2 | … FIRST(N) = FIRST(A i ) N FIRST(N) = { } N A FIRST(N) = FIRST(A ), FIRST(A) FIRST(N) = FIRST(A ) \ { } FIRST( ), otherwise closure algorithm
18
Break
19
Predictive parsing similar to recursive descent, but no back-tracking functions “know” what they are doing input expression EOF FIRST(expression) = {IDENT, ‘(‘} void input(void) { switch (Token.class) { case IDENT: case '(': expression(); token(EOF); break; default: error(); } void token(int tk) { if (Token.class != tk) error(); get_next_token(); }
20
Predictive parsing expression term rest_expression FIRST(term) = {IDENT, ‘(‘} void expression(void) { switch (Token.class) { case IDENT: case '(': term(); rest_expression(); break; default: error(); } term IDENTIFIER | ‘(’ expression ‘)’ void term(void) { switch (Token.class) { case IDENT: token(IDENT); break; case '(': token('('); expression(); token(')'); break; default: error(); }
21
Predictive parsing FIRST( ) = { } check nothing? NO: token FOLLOW(rest_expr) rest_expression ‘+’ expression | FIRST(rest_expr) = {‘+’, } void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error(); } FOLLOW(rest_expr) = {EOF, ‘)’}
22
Limitations of LL(1) parsers FIRST/FIRST conflict term IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’ FIRST/FOLLOW conflict S A ‘a’ ‘b’ A ‘a’ | left recursion expression expression ‘-’ term | term FIRST(A) = { ‘a’ } = FOLLOW(A)
23
Making grammars LL(1) manual labour rewrite grammar adjust semantic actions three rewrite methods left factoring substitution left-recursion removal
24
Left factoring term IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ factor out common prefix term IDENTIFIER after_identifier after_identifier | ‘[‘ expression ‘]’ ‘[’ FOLLOW(after_identifier)
25
Substitution S A ‘a’ ‘b’ A ‘a’ | replace non-terminal by its alternative S ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’
26
Left-recursion removal N N | replace by N M M M | example expression expression ‘-’ term | term ... expression term tail tail ‘-’ term tail | N
27
Exercise (7 min.) make the following grammar LL(1) expression expression ‘+’ term | expression ‘-’ term | term term term ‘*’ factor | term ‘/’ factor | factor factor ‘(‘ expression ‘)’ | func-call | identifier | constant func-call identifier ‘(‘ expr-list? ‘)’ expr-list expression (‘,’ expression)* and what about S if E then S (else S)?
28
Answers
29
substitution F ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant left factoring E E ( ‘+’ | ‘-’ ) T | T T T ( ‘*’ | ‘/’ ) F | F F ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant left recursion removal E T (( ‘+’ | ‘-’ ) T )* T F (( ‘*’ | ‘/’ ) F )* if-then-else grammar is ambiguous
30
program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar automatic generation LL(1) push-down automaton
31
LL(1) push-down automaton stack right-hand side of production transition table state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
32
LL(1) push-down automaton aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
33
LL(1) push-down automaton aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
34
LL(1) push-down automaton aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
35
LL(1) push-down automaton aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
36
LL(1) push-down automaton aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
37
LL(1) push-down automaton aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression replace non-terminal by transition entry
38
LL(1) push-down automaton aap + ( noot + mies ) EOF IDENT rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
39
LL(1) push-down automaton aap + ( noot + mies ) EOF IDENT rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression pop matching token
40
LL(1) push-down automaton + ( noot + mies ) EOF rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
41
LL(1) push-down automaton + ( noot + mies ) EOF rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
42
LL(1) push-down automaton + ( noot + mies ) EOF + expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression
43
LL(1) push-down automaton + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression + expression EOF
44
LL(1) push-down automaton ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression expression EOF
45
LL(1) push-down automaton ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression expression EOF
46
LLgen top-down parser generator to be used in assignment #1 discussed in lecture 5
47
syntax analysis: tokens AST top-down parsing recursive descent push-down automaton making grammars LL(1) Summary program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar
48
Homework study sections: 1.10 closure algorithm 2.2.4.6 error handling in LL(1) parsers print handout for next week [blackboard] find a partner for the “practicum” register your group send e-mail to koen@pds.twi.tudelft.nl
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.