Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands.

Similar presentations


Presentation on theme: "Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands."— Presentation transcript:

1 Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands

2 Summary of lecture 2 lexical analyzer generator description  FSA FSA construction dotted items character moves  moves program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description

3 Quiz 2.24 Will the function identify() (to access a symbol table) still work if the hash function maps all identifiers onto the same number? 2.26 Tutor X insists that macro processing must be implemented as a separate phase between reading the program and the lexical analysis. Show mr. X wrong.

4 syntax analysis: tokens  AST AST construction by hand: recursive descent automatic: top-down (LLgen), bottom-up (yacc) Overview program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar

5 Syntax analysis parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. result: parse tree ‘b’ identifier expression term factor term ‘b’ factor identifier ‘4’ constant term factor term ‘a’ factor identifier term factor identifier expression ‘c’‘c’ ‘*’‘*’ ‘-’‘-’ ‘*’‘*’ ‘*’‘*’ syn-tax: the way in which words are put together to form phrases, clauses, or sentences. Webster’s Dictionary

6 Syntax analysis parsing: given a context-free grammar and a stream of tokens, find the derivation that ties them together. result: parse tree ‘b’ identifier expression term factor term ‘b’ factor identifier ‘4’ constant term factor term ‘a’ factor identifier term factor identifier expression ‘c’‘c’ ‘*’‘*’ ‘-’‘-’ ‘*’‘*’ ‘*’‘*’

7 Context free grammar G = (V N, V T, S, P) V N : set of Non-terminal symbols V T : set of Terminal symbols S : Start symbol (S  V N ) P : set of Production rules {N   } V N  V T =  P = {N   | N  V N    (V N  V T )*}

8 Top-down parsing process tokens left to right expression grammar input  expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression |  example expression input expressionEOFtermrest_expressionIDENTIFIER aap + ( noot + mies )

9 rest_expressionexpression rest_expr term Bottom-up parsing process tokens left to right expression grammar input  expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression |  IDENT IDENT IDENT aap + ( noot + mies ) 

10 Comparison top-downbottom-up node creationpre-orderpost-order alternative selectionfirst tokenlast token grammar typerestricted LL(1) relaxed LR(1) implementationmanual + automatic automatic 5 1 4 2 3 1 2 3 4 5

11 Recursive descent parsing each rule N translates to a boolean function return true if a terminal production of N was matched return false otherwise (without consuming any token) try alternatives of N in turn a terminal symbol must match the current token a non-terminal is matched by calling its routine input  expression EOF int input(void) { return expression() && require(token(EOF)); }

12 Recursive descent parsing expression  term rest_expression int expression(void) { return term() && require(rest_expression()); } term  IDENTIFIER | ‘(’ expression ‘)’ int term(void) { return token(IDENTIFIER) || token('(') && require(expression()) && require(token(')')); }

13 Recursive descent parsing auxiliary functions consume matched tokens report syntax errors int token(int tk) { if (Token.class != tk) return 0; get_next_token(); return 1; } int require(int found) { if (!found) error(); return 1; } rest_expression  ‘+’ expression |  int rest_expression(void) { return token('+') && require(expression()) || 1; }

14 Automatic top-down parsing follow recursive descent scheme, but avoid interpretation overhead for each rule and alternative determine the tokens it can start with: FIRST set parsing scheme for rule N  A 1 | A 2 | … if token  FIRST(N) then ERROR if token  FIRST(A 1 ) then parse A 1 if token  FIRST(A 2 ) then parse A 2 …

15 Exercise (7 min.) design an algorithm to compute the FIRST sets of all non-terminals in a context free grammar. hint: consider the types of rules alternatives composition empty productions input  expression EOF expression  term rest_expression term  IDENTIFIER | ‘(’ expression ‘)’ rest_expression  ‘+’ expression | 

16 Answers

17 Answers (Fig 2.58, page 122) N  w (w  V T ) FIRST(N) = {w} N  A 1 | A 2 | … FIRST(N) =  FIRST(A i ) N   FIRST(N) = {  } N  A  FIRST(N) = FIRST(A ),   FIRST(A) FIRST(N) = FIRST(A ) \ {  }  FIRST(  ), otherwise closure algorithm

18 Break

19 Predictive parsing similar to recursive descent, but no back-tracking functions “know” what they are doing input  expression EOF FIRST(expression) = {IDENT, ‘(‘} void input(void) { switch (Token.class) { case IDENT: case '(': expression(); token(EOF); break; default: error(); } void token(int tk) { if (Token.class != tk) error(); get_next_token(); }

20 Predictive parsing expression  term rest_expression FIRST(term) = {IDENT, ‘(‘} void expression(void) { switch (Token.class) { case IDENT: case '(': term(); rest_expression(); break; default: error(); } term  IDENTIFIER | ‘(’ expression ‘)’ void term(void) { switch (Token.class) { case IDENT: token(IDENT); break; case '(': token('('); expression(); token(')'); break; default: error(); }

21 Predictive parsing FIRST(  ) = {  } check nothing? NO: token  FOLLOW(rest_expr) rest_expression  ‘+’ expression |  FIRST(rest_expr) = {‘+’,  } void rest_expression(void) { switch (Token.class) { case '+': token('+'); expression(); break; case EOF: case ')': break; default: error(); } FOLLOW(rest_expr) = {EOF, ‘)’}

22 Limitations of LL(1) parsers FIRST/FIRST conflict term  IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ | ‘(’ expression ‘)’ FIRST/FOLLOW conflict S  A ‘a’ ‘b’ A  ‘a’ |  left recursion expression  expression ‘-’ term | term FIRST(A) = { ‘a’ } = FOLLOW(A)

23 Making grammars LL(1) manual labour rewrite grammar adjust semantic actions three rewrite methods left factoring substitution left-recursion removal

24 Left factoring term  IDENTIFIER | IDENTIFIER ‘[‘ expression ‘]’ factor out common prefix term  IDENTIFIER after_identifier after_identifier   | ‘[‘ expression ‘]’ ‘[’  FOLLOW(after_identifier)

25 Substitution S  A ‘a’ ‘b’ A  ‘a’ |  replace non-terminal by its alternative S  ‘a’ ‘a’ ‘b’ | ‘a’ ‘b’

26 Left-recursion removal N  N  |  replace by N   M M   M |  example expression  expression ‘-’ term | term          ... expression  term tail tail  ‘-’ term tail |  N 

27 Exercise (7 min.) make the following grammar LL(1) expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor  ‘(‘ expression ‘)’ | func-call | identifier | constant func-call  identifier ‘(‘ expr-list? ‘)’ expr-list  expression (‘,’ expression)* and what about S  if E then S (else S)?

28 Answers

29 substitution F  ‘(‘ E ‘)’ | ID ‘(‘ expr-list? ‘)’ | ID | constant left factoring E  E ( ‘+’ | ‘-’ ) T | T T  T ( ‘*’ | ‘/’ ) F | F F  ‘(‘ E ‘)’ | ID ( ‘(‘ expr-list? ‘)’ )? | constant left recursion removal E  T (( ‘+’ | ‘-’ ) T )* T  F (( ‘*’ | ‘/’ ) F )* if-then-else grammar is ambiguous

30 program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar automatic generation LL(1) push-down automaton

31 LL(1) push-down automaton stack right-hand side of production transition table state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

32 LL(1) push-down automaton aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

33 LL(1) push-down automaton aap + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  replace non-terminal by transition entry

34 LL(1) push-down automaton aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

35 LL(1) push-down automaton aap + ( noot + mies ) EOF expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  replace non-terminal by transition entry

36 LL(1) push-down automaton aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

37 LL(1) push-down automaton aap + ( noot + mies ) EOF term rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  replace non-terminal by transition entry

38 LL(1) push-down automaton aap + ( noot + mies ) EOF IDENT rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

39 LL(1) push-down automaton aap + ( noot + mies ) EOF IDENT rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  pop matching token

40 LL(1) push-down automaton + ( noot + mies ) EOF rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

41 LL(1) push-down automaton + ( noot + mies ) EOF rest-expr EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

42 LL(1) push-down automaton + ( noot + mies ) EOF + expression EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression 

43 LL(1) push-down automaton + ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  + expression EOF

44 LL(1) push-down automaton ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  expression EOF

45 LL(1) push-down automaton ( noot + mies ) EOF input prediction stack state (top of stack) look-ahead token IDENT+()EOF input expression EOF expression term rest-expr term IDENT( expression ) rest-expr + expression  expression EOF

46 LLgen top-down parser generator to be used in assignment #1 discussed in lecture 5

47 syntax analysis: tokens  AST top-down parsing recursive descent push-down automaton making grammars LL(1) Summary program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar

48 Homework study sections: 1.10 closure algorithm 2.2.4.6 error handling in LL(1) parsers print handout for next week [blackboard] find a partner for the “practicum” register your group send e-mail to koen@pds.twi.tudelft.nl


Download ppt "Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands."

Similar presentations


Ads by Google