Presentation is loading. Please wait.

Presentation is loading. Please wait.

Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.

Similar presentations


Presentation on theme: "Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands."— Presentation transcript:

1 Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands

2 Summary of lecture 1 compiler is a structured toolbox front-end: program text  annotated AST back-end: annotated AST  executable code lexical analysis: program text  tokens token specifications implementation by hand program in some source language front-end analysis semantic represen- tation executable code for target machine back-end synthesis compiler

3 Quiz 2.7 What does the regular expression a?* mean? And a** ? Are these expressions erroneous? Are they ambiguous?

4 Overview Generating a lexical analyzer generic methods specific tool lex program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description

5 Token description (f)lex: scanner generator for UNIX token description  C code format of the lex input file: definitions % rules % user code regular descriptions regular expressions + actions auxiliary C-code

6 Lex description to recognize integers an integer is a non-zero sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal). base  [bo] integer  digit+ base? rule = expr + action {} signal application of a description %{ #include "lex.h" %} base [bo] digit [0-9] % {digit}+ {base}? {return INTEGER;} %

7 Lex resulting C-code char yytext[]; /* token representation */ int yylex(void); /* returns type of next token */ wrapper function to add token attributes % \n {line_number++;} % void get_next_token(void) { Token.class = yylex(); if (Token.class == 0) { Token.class = EOF; Token.repr = " "; return; } Token.pos.line_number = line_number; Token.repr = strdup(yytext); }

8 automatic generation program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description finite state automaton S0 ‘.’ digit S2 digit S3 digit S1 ‘.’

9 Finite-state automaton Recognize input character by character Transfer between states FSA Initial state S0 set of accepting states transition function: State x Char  State S0 ‘i’‘i’ S1S1S2S2 ‘f’‘f’

10 FSA examples integral_number  [0-9]+ fixed_point_number  [0-9]* ‘.’ [0-9]+ digit S0 ‘.’ digit S2 digit S3 digit S0 digit S1

11 Concurrent recognition integral_number  [0-9]+ fixed_point_number  [0-9]* ‘.’ [0-9]+ recognize both tokens in one pass digit S0 ‘.’ digit S2 digit S3 digit S0 digit S1

12 Concurrent recognition integral_number  [0-9]+ fixed_point_number  [0-9]* ‘.’ [0-9]+ naïve approach: merge initial states digit S0 ‘.’ digit S2 digit S3 digit S1

13 Concurrent recognition integral_number  [0-9]+ fixed_point_number  [0-9]* ‘.’ [0-9]+ correct approach: share common prefix transitions S0 ‘.’ digit S2 digit S3 digit S1 ‘.’

14 FSA implementation: transition table concurrent recognition of integers and fixed point numbers state character recognized token digitdotother S0 S1S2- S1 S2-integer S2 S3-- --fixed point S0 ‘.’ digit S2 digit S3 digit S1 ‘.’

15 FSA exercise (6 min.) draw an FSA to recognize integers base  [bo] integer  digit+ base? draw an FSA to recognize the regular expression (a|b)*bab

16 Answers

17 integer (a|b)*bab S2S3 S0S1 a b b b b a a a digit S0 digit S1 S2S2 [bo]

18 Break

19 Automatic generation: description  FSA start with initial set (S0) of all token descriptions to be recognized for each character (ch) find the set (S ch ) of descriptions that can start with ch extend the FSA with transition (S0,ch, S ch ) repeat adding transitions (to S ch ) until no new set is generated

20 Dotted items keeping track of matched characters in a token description: T  R regular expression  input already matchedstill to be matched T    

21 Types of dotted items shift item: dot in front of a basic pattern if   ‘i’ ‘f’ if  ‘i’  ‘f’ identifier   [a-z] [a-z0-9]* reduce item: dot at the end if  ‘i’ ‘f’  identifier  [a-z] [a-z0-9]*  non-basic item: dot in front of repeated pattern or parenthesis identifier  [a-z]  [a-z0-9]*

22 Character moves input T    c  c input c T   c  

23 Character moves T    c  T    [class]  T   .  input T    c  c input c T   c   c c  class T   c   T  .   T   [class]  

24  moves T    (R)?   T   (R)?   T   (  R)?  T    (R)*   T   (R)*   T   (  R)*  T   (R  )*   T   (R)*   T   (  R)*  T   (R  )?   T   (R)?  

25  moves T    (R)+   T   (  R)+  T   (R  )+   T   (R)+   T    (R 1 |R 2 |…)   T   (  R 1 |R 2 |…)  T   (R 1 |  R 2 |…)  … T   (R 1  |R 2 |…)   T   (R 1 |R 2 |…)   …… … T   (  R)+ 

26 FSA construction a state corresponds to a set of basic items a character move yields a new set expand non-basic items into basic items using  moves see if the resulting set was produced before, if not introduce a new state add transition

27 Example FSA construction tokens integer: I  (D)+ fixed-point: F  (D)* ‘.’ (D)+ initial state I   (D)+ F   (D)* ‘.’ (D)+ I  (  D)+ F  (  D)* ‘.’ (D)+ F  (D)*  ‘.’ (D)+ S0  moves

28 F  (D)* ‘.’ (D  )+ F  (D)* ‘.’ (D)+  F  (D)* ‘.’ (  D)+ S3 D I  (D  )+ F  (D  )* ‘.’ (D)+ Example FSA construction character moves I  (  D)+ F  (  D)* ‘.’ (D)+ F  (D)*  ‘.’ (D)+ S0 F  (D)* ‘.’  (D)+F  (D)* ‘.’ (  D)+ I  (D)+  I  (  D)+ F  (D  )* ‘.’ (D)+ I  (D)+  I  (  D)+ F  (D)*  ‘.’ (D)+ F  (  D)* ‘.’ (D)+ S1 D ‘.’ S2 ‘.’ D D

29 Exercise FSA construction (7 min.) draw the FSA (with item sets) for recognizing an identifier: identifier  letter (letter_or_digit_or_und* letter_or_digit+)? extend the above FSA to recognize the keyword ‘if’ as well. if  ‘i’ ‘f’

30 Answers

31 ID  L ((  LDU)* (LD)+)? ID  L ((LDU)* (  LD)+)? S2 ID   L ((LDU)* (LD)+)? S0 ID  L ((LDU)* (LD)+)?  ID  L ((  LDU)* (LD)+)? ID  L ((LDU)* (  LD)+)? S1 L LD U U ‘i’ LD ‘f’ LD U U S3 S4 accepting states S1 and S4

32 Transition table compression state character recognized token ‘i’‘f’LDU S0S3S1 -- S2identifier S2S1 S2 S3S1S4S1 S2 S4S1 S2keyword if

33 Transition table compression redundant rows empty transitions state character recognized token ‘i’‘f’LDU S0S3S1 -- S2identifier S2S1 S2 S3S1S4S1 S2 S4S1 S2keyword if S1S4S1 S2S1 S2S1 S3 S0 S1 S2 S4 S3 row displacement

34 Summary: generating a lexical analyzer tool: lex token descriptions + actions wrapper interface FSA construction dotted items character moves  moves program text lexical analysis syntax analysis context handling annotated AST tokens AST scanner generator token description

35 Homework study sections 2.1.10 – 2.1.12 lexical identification of tokens symbol tables macro processing print handout lecture 3 [blackboard] find a partner for the “practicum” register your group send e-mail to koen@pds.twi.tudelft.nl


Download ppt "Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands."

Similar presentations


Ads by Google