Published by Antoine Yearby. Modified over 9 years ago.
1
Compiler construction in4020 – lecture 2
Koen Langendoen
Delft University of Technology, The Netherlands
2
Summary of lecture 1
compiler is a structured toolbox
  front-end: program text → annotated AST
  back-end: annotated AST → executable code
lexical analysis: program text → tokens
  token specifications
  implementation by hand
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine; front-end and back-end together form the compiler]
3
Quiz 2.7
What does the regular expression a?* mean? And a**? Are these expressions erroneous? Are they ambiguous?
4
Overview
Generating a lexical analyzer
  generic methods
  specific tool: lex
[Diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]
5
Token description
(f)lex: scanner generator for UNIX
  token description → C code
format of the lex input file:

definitions      (regular descriptions)
%%
rules            (regular expressions + actions)
%%
user code        (auxiliary C-code)
6
Lex description to recognize integers
an integer is a non-empty sequence of digits, optionally followed by a letter denoting the base class (b for binary and o for octal)
  base → [bo]
  integer → digit+ base?
rule = expr + action
{} signal application of a description

%{
#include "lex.h"
%}
base    [bo]
digit   [0-9]
%%
{digit}+{base}?    {return INTEGER;}
%%
7
Lex resulting C-code
char yytext[];     /* token representation */
int yylex(void);   /* returns type of next token */

wrapper function to add token attributes

%%
\n    {line_number++;}
%%
void get_next_token(void) {
    Token.class = yylex();
    if (Token.class == 0) {      /* yylex() returns 0 at end of input */
        Token.class = EOF;
        Token.repr = " ";
        return;
    }
    Token.pos.line_number = line_number;
    Token.repr = strdup(yytext); /* copy: yytext is overwritten by the next call */
}
8
Automatic generation
[Diagram: the compiler pipeline from the overview; the scanner generator turns the token description into a finite state automaton]
[Figure: FSA with states S0 to S3 and transitions on digit and ‘.’]
9
Finite-state automaton
recognize input character by character
transfer between states
FSA:
  initial state S0
  set of accepting states
  transition function: State × Char → State
[Figure: S0 → S1 on ‘i’; S1 → S2 on ‘f’]
10
FSA examples
integral_number → [0-9]+
  [Figure: S0 → S1 on digit; S1 → S1 on digit]
fixed_point_number → [0-9]* ‘.’ [0-9]+
  [Figure: S0 → S0 on digit; S0 → S2 on ‘.’; S2 → S3 on digit; S3 → S3 on digit]
11
Concurrent recognition
integral_number → [0-9]+
fixed_point_number → [0-9]* ‘.’ [0-9]+
recognize both tokens in one pass
[Figure: the two separate FSAs for integral_number and fixed_point_number]
12
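Concurrent recognition
integral_number → [0-9]+
fixed_point_number → [0-9]* ‘.’ [0-9]+
naïve approach: merge initial states
[Figure: both FSAs joined at a single initial state S0, which then has two different digit transitions (a loop on S0 and an edge to S1)]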
13
Concurrent recognition
integral_number → [0-9]+
fixed_point_number → [0-9]* ‘.’ [0-9]+
correct approach: share common prefix transitions
[Figure: S0 → S1 on digit; S0 → S2 on ‘.’; S1 → S1 on digit; S1 → S2 on ‘.’; S2 → S3 on digit; S3 → S3 on digit]
14
FSA implementation: transition table
concurrent recognition of integers and fixed point numbers

state    character                recognized token
         digit    dot    other
S0       S1       S2     -
S1       S1       S2     -        integer
S2       S3       -      -
S3       S3       -      -        fixed point

[Figure: S0 → S1 on digit; S0 → S2 on ‘.’; S1 → S1 on digit; S1 → S2 on ‘.’; S2 → S3 on digit; S3 → S3 on digit]
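The transition table above can be driven by a few lines of C. The following is a minimal sketch, not the code lex generates: the names `scan_number`, `char_class`, `T_INTEGER` and `T_FIXED_POINT` are made up for illustration, and the scanner is assumed to return the longest token that matches at the start of the string.

```c
#include <ctype.h>
#include <stddef.h>

/* Character classes: the columns of the transition table. */
enum { C_DIGIT, C_DOT, C_OTHER, N_CLASSES };
/* States S0..S3; ERR stands for the "-" (no transition) entries. */
enum { S0, S1, S2, S3, N_STATES, ERR = -1 };
/* Token classes; hypothetical names, not produced by lex. */
enum { T_NONE, T_INTEGER, T_FIXED_POINT };

static const int next_state[N_STATES][N_CLASSES] = {
    /*         digit  dot  other */
    /* S0 */ { S1,    S2,  ERR },
    /* S1 */ { S1,    S2,  ERR },
    /* S2 */ { S3,    ERR, ERR },
    /* S3 */ { S3,    ERR, ERR },
};

/* Token recognized in each state (the last column of the table). */
static const int accepts[N_STATES] = { T_NONE, T_INTEGER, T_NONE, T_FIXED_POINT };

static int char_class(char c) {
    if (isdigit((unsigned char)c)) return C_DIGIT;
    if (c == '.') return C_DOT;
    return C_OTHER;
}

/* Scans the longest token at the start of s. Stores its length in *len
 * and returns its class, or T_NONE (with *len == 0) if nothing matches. */
int scan_number(const char *s, size_t *len) {
    int state = S0, token = T_NONE;
    size_t i = 0;
    *len = 0;
    while (s[i] != '\0') {
        int next = next_state[state][char_class(s[i])];
        if (next == ERR) break;              /* no transition: stop scanning */
        state = next;
        i++;
        if (accepts[state] != T_NONE) {      /* remember the last accepting state */
            token = accepts[state];
            *len = i;
        }
    }
    return token;
}
```

Remembering the last accepting state is what makes an input such as `12.` come back as the integer `12`: the dot leads to S2, which is not accepting, so the scanner falls back to the two characters it had accepted in S1.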
15
FSA exercise (6 min.)
draw an FSA to recognize integers
  base → [bo]
  integer → digit+ base?
draw an FSA to recognize the regular expression (a|b)*bab
16
Answers
17
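integer
  [Figure: S0 → S1 on digit; S1 → S1 on digit; S1 → S2 on [bo]; accepting states S1 and S2]
(a|b)*bab
  [Figure: FSA with states S0, S1, S2, S3 and transitions labelled a and b]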
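One way to check an answer for (a|b)*bab is to code the four-state automaton directly. The sketch below assumes the usual deterministic solution, in which state k records how many characters of "bab" are currently matched at the end of the input; this numbering may differ from the figure on the slide, and `matches_bab` is a hypothetical name.

```c
#include <stdbool.h>

/* Returns true if s consists only of 'a' and 'b' and ends in "bab". */
bool matches_bab(const char *s) {
    /* State k = length of the longest prefix of "bab" that is a suffix
     * of the input read so far: 0 = "", 1 = "b", 2 = "ba", 3 = "bab". */
    static const int on_a[4] = { 0, 2, 0, 2 };
    static const int on_b[4] = { 1, 1, 3, 1 };
    int state = 0;
    for (; *s != '\0'; s++) {
        if (*s == 'a')      state = on_a[state];
        else if (*s == 'b') state = on_b[state];
        else                return false;   /* character outside the alphabet */
    }
    return state == 3;                      /* state 3 is the only accepting state */
}
```

Each state has exactly one transition on a and one on b, so the automaton has four a edges and four b edges in total.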
18
Break
19
Automatic generation: description → FSA
start with the initial set (S0) of all token descriptions to be recognized
for each character (ch), find the set (S_ch) of descriptions that can start with ch
extend the FSA with the transition (S0, ch, S_ch)
repeat adding transitions (to S_ch) until no new set is generated
20
Dotted items
keeping track of matched characters in a token description T → R (R is a regular expression)
dotted item: T → α • β
  α: the part of the regular expression already matched by the input
  β: the part still to be matched
21
Types of dotted items
shift item: dot in front of a basic pattern
  if → • ‘i’ ‘f’
  if → ‘i’ • ‘f’
  identifier → • [a-z] [a-z0-9]*
reduce item: dot at the end
  if → ‘i’ ‘f’ •
  identifier → [a-z] [a-z0-9]* •
non-basic item: dot in front of repeated pattern or parenthesis
  identifier → [a-z] • [a-z0-9]*
22
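Character moves
a shift item with the dot in front of the character c moves the dot over c when c is the next input character
  T → α • c β   becomes   T → α c • β   on input c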
23
Character moves
  T → α • c β         becomes   T → α c • β         on input c
  T → α • [class] β   becomes   T → α [class] • β   on input c, if c is in class
  T → α • . β         becomes   T → α . • β         on any input c
24
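ε moves
  T → α • (R)* β   ⇒   T → α (R)* • β   and   T → α (• R)* β
  T → α (R •)* β   ⇒   T → α (R)* • β   and   T → α (• R)* β
  T → α • (R)? β   ⇒   T → α (R)? • β   and   T → α (• R)? β
  T → α (R •)? β   ⇒   T → α (R)? • β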
25
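ε moves
  T → α • (R)+ β            ⇒   T → α (• R)+ β
  T → α (R •)+ β            ⇒   T → α (R)+ • β   and   T → α (• R)+ β
  T → α • (R1|R2|…) β       ⇒   T → α (• R1|R2|…) β,   T → α (R1| • R2|…) β,   …
  T → α (R1 • |R2|…) β      ⇒   T → α (R1|R2|…) • β
  T → α (R1|R2 • |…) β      ⇒   T → α (R1|R2|…) • β
  …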
26
FSA construction
a state corresponds to a set of basic items
a character move yields a new set
expand non-basic items into basic items using ε moves
see if the resulting set was produced before; if not, introduce a new state
add the transition
27
Example FSA construction
tokens
  integer:       I → (D)+
  fixed-point:   F → (D)* ‘.’ (D)+
initial state
  I → • (D)+
  F → • (D)* ‘.’ (D)+
after ε moves: state S0
  I → (• D)+
  F → (• D)* ‘.’ (D)+
  F → (D)* • ‘.’ (D)+
28
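Example FSA construction: character moves
S0:  I → (• D)+
     F → (• D)* ‘.’ (D)+
     F → (D)* • ‘.’ (D)+
S1:  I → (D)+ •
     I → (• D)+
     F → (• D)* ‘.’ (D)+
     F → (D)* • ‘.’ (D)+
S2:  F → (D)* ‘.’ (• D)+
S3:  F → (D)* ‘.’ (D)+ •
     F → (D)* ‘.’ (• D)+
transitions: S0 → S1 on D; S0 → S2 on ‘.’; S1 → S1 on D; S1 → S2 on ‘.’; S2 → S3 on D; S3 → S3 on D
intermediate (non-basic) items, expanded by ε moves:
  D move from S0 or S1:  I → (D •)+  and  F → (D •)* ‘.’ (D)+
  ‘.’ move from S0 or S1:  F → (D)* ‘.’ • (D)+
  D move from S2 or S3:  F → (D)* ‘.’ (D •)+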
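The construction loop for this example can be sketched in C by giving each basic item one bit, so that a state is simply a bit set. This is an illustration under strong assumptions rather than a general generator: the six items and the `after` table (the result of a character move followed by all ε moves) are worked out by hand for the two tokens above, and every name in the block is hypothetical.

```c
enum { D, DOT, N_CLASSES };                 /* input classes: digit and '.' */
enum { N_ITEMS = 6, MAX_STATES = 64, NONE = -1 };

/* Basic items, one bit each (hand-numbered for this sketch):
 *   0: I -> (.D)+              1: I -> (D)+ .             [reduce]
 *   2: F -> (.D)* '.' (D)+     3: F -> (D)* . '.' (D)+
 *   4: F -> (D)* '.' (.D)+     5: F -> (D)* '.' (D)+ .    [reduce]
 * after[i][c] is the set of basic items reached from item i by a
 * character move over class c followed by all epsilon moves. */
static const unsigned after[N_ITEMS][N_CLASSES] = {
    { 0x03, 0 }, { 0, 0 }, { 0x0C, 0 }, { 0, 0x10 }, { 0x30, 0 }, { 0, 0 },
};
static const unsigned initial_set = 0x0D;   /* items 0, 2 and 3: state S0 */

unsigned item_sets[MAX_STATES];             /* state -> set of basic items */
int transitions[MAX_STATES][N_CLASSES];     /* state x class -> state, or NONE */

/* Builds the FSA and returns the number of states found. */
int build_fsa(void) {
    int n_states = 1;
    item_sets[0] = initial_set;
    for (int s = 0; s < n_states; s++) {    /* n_states grows as new sets appear */
        for (int c = 0; c < N_CLASSES; c++) {
            unsigned target = 0;
            for (int i = 0; i < N_ITEMS; i++)
                if (item_sets[s] & (1u << i)) target |= after[i][c];
            transitions[s][c] = NONE;
            if (target == 0) continue;      /* no item can shift over c */
            int t = 0;                      /* was this set produced before? */
            while (t < n_states && item_sets[t] != target) t++;
            if (t == n_states) item_sets[n_states++] = target;  /* new state */
            transitions[s][c] = t;
        }
    }
    return n_states;
}
```

For these two tokens `build_fsa()` finds four item sets, and because digits are tried before the dot they are discovered in the order S0, S1, S2, S3 used on the slide. With six items there are at most 64 distinct sets, which is why `MAX_STATES` is 64.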
29
Exercise FSA construction (7 min.)
draw the FSA (with item sets) for recognizing an identifier:
  identifier → letter (letter_or_digit_or_und* letter_or_digit+)?
extend the above FSA to recognize the keyword ‘if’ as well:
  if → ‘i’ ‘f’
30
Answers
31
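ID → L ((LDU)* (LD)+)?     (L = letter, LD = letter_or_digit, LDU = letter_or_digit_or_und)
S0:  ID → • L ((LDU)* (LD)+)?
S1:  ID → L ((LDU)* (LD)+)? •
     ID → L ((• LDU)* (LD)+)?
     ID → L ((LDU)* (• LD)+)?
S2:  ID → L ((• LDU)* (LD)+)?
     ID → L ((LDU)* (• LD)+)?
transitions for identifier: S0 → S1 on L; S1 → S1 on LD; S1 → S2 on U; S2 → S1 on LD; S2 → S2 on U
extension for keyword if: S0 → S3 on ‘i’; S3 → S4 on ‘f’; otherwise S3 and S4 behave like S1 (to S1 on LD, to S2 on U)
accepting states: S1 (identifier) and S4 (keyword if); S3 also contains the identifier reduce item, so a lone i is accepted as an identifier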
32
Transition table compression

state    character                          recognized token
         ‘i’    ‘f’    L     D     U
S0       S3     S1     S1    -     -
S1       S1     S1     S1    S1    S2       identifier
S2       S1     S1     S1    S1    S2
S3       S1     S4     S1    S1    S2       identifier
S4       S1     S1     S1    S1    S2       keyword if
33
Transition table compression
redundant rows
empty transitions
row displacement
[Figure: the rows of the transition table for identifier and keyword if, shifted and overlaid into a single array]
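Row displacement can be sketched as follows: every row is shifted by its own displacement and all rows are overlaid in one array, so that the empty transitions of one row hold the entries of another. The code below is one possible first-fit packing, not necessarily the scheme shown on the slide; the `owner` array used to tell rows apart and all function and variable names are assumptions made for illustration. It packs the identifier/if table with columns ‘i’, ‘f’, L, D, U.

```c
enum { N_STATES = 5, N_CLASSES = 5, NONE = -1,
       MAX_PACKED = N_STATES * N_CLASSES };

/* Uncompressed transition table; columns: 'i', 'f', L, D, U. */
static const int table[N_STATES][N_CLASSES] = {
    { 3, 1, 1, NONE, NONE },   /* S0 */
    { 1, 1, 1, 1,    2    },   /* S1 */
    { 1, 1, 1, 1,    2    },   /* S2 */
    { 1, 4, 1, 1,    2    },   /* S3 */
    { 1, 1, 1, 1,    2    },   /* S4 */
};

int displacement[N_STATES];    /* where each row starts in the packed array */
int entry[MAX_PACKED];         /* packed next states */
int owner[MAX_PACKED];         /* state that owns each slot, NONE if free */

/* Packs the rows first-fit and returns the number of slots used. */
int pack_rows(void) {
    int used = 0;
    for (int k = 0; k < MAX_PACKED; k++) owner[k] = NONE;
    for (int s = 0; s < N_STATES; s++) {
        int d = 0;
        for (;; d++) {                       /* try displacements 0, 1, 2, ... */
            int c = 0;
            while (c < N_CLASSES &&
                   (table[s][c] == NONE || owner[d + c] == NONE)) c++;
            if (c == N_CLASSES) break;       /* no collision at displacement d */
        }
        displacement[s] = d;
        for (int c = 0; c < N_CLASSES; c++) {
            if (table[s][c] == NONE) continue;
            entry[d + c] = table[s][c];
            owner[d + c] = s;
            if (d + c + 1 > used) used = d + c + 1;
        }
    }
    return used;
}

/* Looks up a transition in the packed table. */
int next_state(int s, int c) {
    int k = displacement[s] + c;
    return owner[k] == s ? entry[k] : NONE;  /* a foreign slot means "no transition" */
}
```

On this table the gain is small, 23 slots instead of 25, because only S0 has empty transitions; the rows of S1, S2 and S4 are identical, which is the kind of redundancy the "redundant rows" point addresses separately.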
34
Summary: generating a lexical analyzer
tool: lex
  token descriptions + actions
  wrapper interface
FSA construction
  dotted items
  character moves
  ε moves
[Diagram: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator produces the lexical analyzer from a token description]
35
Homework
study sections 2.1.10 – 2.1.12
  lexical identification of tokens
  symbol tables
  macro processing
print handout lecture 3 [blackboard]
find a partner for the “practicum”
register your group: send e-mail to koen@pds.twi.tudelft.nl