0
Compiler construction in4020 – course 2001/2002
week 1
Koen Langendoen
Delft University of Technology, The Netherlands
1
Compiler construction 2002
Goals
- understand the structure of a compiler
- understand how the components operate
- understand the tools involved: scanner generator, parser generator, etc.
- understanding means
  - [theory] be able to read source code
  - [practice] be able to adapt/write source code
2
Format: “werkcollege” (interactive lectures) + practicum (lab assignment)
- 14 x 2 hours of interactive lectures sp
  - book “Modern Compiler Design”
  - schedule: see blackboard
  - handouts: see blackboard
- assignment sp
  - groups of 2 students
  - modify reference compiler
- oral exam sp
3
Homework
- find a partner for the “practicum”
- register your group: send to
4
What is a compiler?
[Diagram: program in some source language → compiler → executable code for target machine]
Speaker note: ask the audience first.
5
What is a compiler?
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine; the front-end, semantic representation, and back-end together form the compiler]
6
Why study compiler construction?
- curiosity
- better understanding of programming language concepts
- wide applicability
  - transforming “data” is very common
  - many useful data structures and algorithms
- practical application of “theory”
Speaker note: ask the audience first.
7
Overview lecture 1
- [introduction]
- compiler structure
- exercise
- min. break
- lexical analysis
- exercise
8
Compiler structure
[Diagram: several front-ends (analysis), one per source language, and several back-ends (synthesis), one per target machine, all connected through a shared semantic representation]
- L+M modules = LxM compilers: L front-ends and M back-ends combine into L x M compilers
Speaker note: ask the audience about disadvantages BEFORE the next slide.
9
Limitations of modular approach
- performance
  - generic vs specific
  - loss of information
- variations must be small
  - same programming paradigm
  - similar processor architecture
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]
10
Semantic representation
[Diagram: program in some source language → front-end (analysis) → semantic representation → back-end (synthesis) → executable code for target machine]
- heart of the compiler
- intermediate code
  - linked lists of pseudo instructions
  - abstract syntax tree (AST)
11
AST example
- expression grammar
    expression → expression ‘+’ term | expression ‘-’ term | term
    term → term ‘*’ factor | term ‘/’ factor | factor
    factor → identifier | constant | ‘(’ expression ‘)’
- example expression
    b*b – 4*a*c
12
parse tree: b*b – 4*a*c
expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    term
      term
        factor
          constant ‘4’
      ‘*’
      factor
        identifier ‘a’
    ‘*’
    factor
      identifier ‘c’
Speaker note: ask if they notice anything redundant in the parse tree (what stands out?).
13
AST: b*b – 4*a*c
‘-’
  ‘*’
    ‘b’
    ‘b’
  ‘*’
    ‘*’
      ‘4’
      ‘a’
    ‘c’
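The AST above can be sketched as a small C data structure. This is only an illustration under stated assumptions: the names `Node`, `NodeKind`, `build_example`, `count_nodes`, and `evaluate` are hypothetical and are not taken from the course's reference compiler, and identifiers are restricted to single lowercase letters to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout; a real compiler would carry more attributes. */
typedef enum { NODE_IDENTIFIER, NODE_CONSTANT, NODE_OPERATOR } NodeKind;

typedef struct Node {
    NodeKind kind;
    char symbol;               /* operator character or one-letter identifier */
    double value;              /* only meaningful for NODE_CONSTANT */
    struct Node *left, *right; /* only meaningful for NODE_OPERATOR */
} Node;

static Node *new_node(NodeKind kind, char symbol, double value,
                      Node *left, Node *right) {
    Node *node = malloc(sizeof *node);
    assert(node != NULL);
    node->kind = kind;
    node->symbol = symbol;
    node->value = value;
    node->left = left;
    node->right = right;
    return node;
}

Node *identifier(char name)  { return new_node(NODE_IDENTIFIER, name, 0.0, NULL, NULL); }
Node *constant(double value) { return new_node(NODE_CONSTANT, 0, value, NULL, NULL); }
Node *operator_node(char op, Node *left, Node *right) {
    return new_node(NODE_OPERATOR, op, 0.0, left, right);
}

/* Build the AST for b*b - 4*a*c; '*' is left-associative, so 4*a*c is (4*a)*c. */
Node *build_example(void) {
    return operator_node('-',
        operator_node('*', identifier('b'), identifier('b')),
        operator_node('*',
            operator_node('*', constant(4.0), identifier('a')),
            identifier('c')));
}

/* Count all nodes in the tree. */
int count_nodes(const Node *node) {
    if (node == NULL) return 0;
    return 1 + count_nodes(node->left) + count_nodes(node->right);
}

/* Evaluate the tree; variables[0] holds 'a', variables[1] holds 'b', and so on. */
double evaluate(const Node *node, const double variables[26]) {
    switch (node->kind) {
    case NODE_IDENTIFIER: return variables[node->symbol - 'a'];
    case NODE_CONSTANT:   return node->value;
    default:              break;
    }
    double left = evaluate(node->left, variables);
    double right = evaluate(node->right, variables);
    switch (node->symbol) {
    case '+': return left + right;
    case '-': return left - right;
    case '*': return left * right;
    default:  return left / right;   /* '/' */
    }
}
```

The AST needs 9 nodes, whereas the parse tree for the same expression also carries all the intermediate expression, term, and factor nodes.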
14
annotated AST: b*b – 4*a*c
[Diagram: the AST with every node annotated with a type and a location; colors denote the node kinds identifier, constant, term, and expression]
‘-’  type: real, loc: reg1
  ‘*’  type: real, loc: reg1
    ‘b’  type: real, loc: sp+16
    ‘b’  type: real, loc: sp+16
  ‘*’  type: real, loc: reg2
    ‘*’  type: real, loc: reg2
      ‘4’  type: real, loc: const
      ‘a’  type: real, loc: sp+8
    ‘c’  type: real, loc: sp+24
15
AST exercise (5 min.)
- expression grammar
    expression → expression ‘+’ term | expression ‘-’ term | term
    term → term ‘*’ factor | term ‘/’ factor | factor
    factor → identifier | constant | ‘(’ expression ‘)’
- example expression
    b*b – (4*a*c)
- draw parse tree and AST
16
Answers
17
answer parse tree: b*b – 4*a*c
[Parse tree for b*b – 4*a*c, identical to the parse tree on slide 12]
18
answer parse tree: b*b – (4*a*c)
expression
  expression
    term
      term
        factor
          identifier ‘b’
      ‘*’
      factor
        identifier ‘b’
  ‘-’
  term
    factor
      ‘(’
      expression (subtree for ‘4*a*c’, abbreviated on the slide)
      ‘)’
19
Break
20
front-end: from program text to AST
[Diagram of the front-end: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST]
21
front-end: from program text to AST
[Diagram of the front-end: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST]
- a scanner generator turns a token description into the lexical analysis phase
- a parser generator turns a language grammar into the syntax analysis phase
22
Lexical analysis
- convert stream of characters to stream of tokens
- what is a token?
  - sequence of characters with a semantic notion, see language definition
  - rule of thumb: two characters belong to the same token if inserting white space between them changes the meaning.
      digit = *ptr++ - '0';
      digit = *ptr+ + - '0';
- lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction (Webster’s Dictionary)
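The rule of thumb can be checked directly in C, because both statements from the slide are valid C. In the sketch below the function names are hypothetical. With `++` as one token the pointer is advanced; with `+ +` as two tokens the expression parses as `*ptr + (+(-'0'))`, which yields the same digit but leaves the pointer alone.

```c
#include <assert.h>
#include <stddef.h>

/* "++" is a single token: read the digit, then advance the pointer. */
int digit_with_increment(const char **position) {
    assert(position != NULL && *position != NULL);
    const char *ptr = *position;
    int digit = *ptr++ - '0';
    *position = ptr;
    return digit;
}

/* "+ +" is two tokens: a binary plus followed by a unary plus,
 * so the pointer is never incremented. */
int digit_without_increment(const char **position) {
    assert(position != NULL && *position != NULL);
    const char *ptr = *position;
    int digit = *ptr+ + - '0';
    *position = ptr;
    return digit;
}
```

Both functions return the same digit value; only the first one moves the caller's pointer to the next character, which is exactly the change in meaning the rule of thumb refers to.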
24
Tokens
- attributes
  - type
  - lexeme / value
  - file position
    typedef struct {
        int class;
        char *repr;
        file_pos position;
    } Token_Type;
- examples
    type          lexeme
    IDENTIFIER    foo, t3, ptr
    NUMBER        15, 082, 666
    REAL          1.2, .002, 1e6
    IF            if
25
Non-tokens
- white spaces: spaces, tabs, newlines
- comments
    /* a C-style comment */
    // a C++ comment
- preprocessor directives
    #include "lex.h"
    #define is_digit(d) ('0' <= (d) && (d) <= '9')
Q: what is special about the newline character?
A: its representation depends on the operating system!
26
Regular expressions
Basic patterns          Matching
  x                     the character x
  .                     any character, usually except a newline
  [abcA-Z]              any of the characters a, b, c and the range A-Z
Repetition operators
  R?                    an R or nothing (= optionally an R)
  R*                    zero or more occurrences of R
  R+                    one or more occurrences of R
Composition operators
  R1 R2                 an R1 followed by an R2
  R1 | R2               either an R1 or an R2
Grouping
  ( R )                 R itself
27
Examples of regular expressions
- an integer is a sequence of digits:
    [0-9]+
- an identifier is a sequence of letters and digits; the first character must be a letter:
    [a-z][a-z0-9]*
28
Regular descriptions
- structuring regular expressions by introducing named sub expressions
    letter → [a-zA-Z]
    digit → [0-9]
    letter_or_digit → letter | digit
    identifier → letter letter_or_digit*
- define before use
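One way to try out a regular description is to expand the named sub expressions with C string-literal concatenation and hand the result to the POSIX `regcomp`/`regexec` functions. This is a sketch under assumptions: `<regex.h>` is POSIX rather than ISO C, and the macro names and the helper `matches_whole` are made up for this illustration. It is not how a scanner generator works internally.

```c
#include <assert.h>
#include <regex.h>   /* POSIX regular expressions, not part of ISO C */
#include <stdio.h>

/* Named sub expressions, defined before use as on the slide. */
#define LETTER          "[a-zA-Z]"
#define DIGIT           "[0-9]"
#define LETTER_OR_DIGIT "(" LETTER "|" DIGIT ")"
#define IDENTIFIER      LETTER LETTER_OR_DIGIT "*"

/* Returns 1 if the whole text matches the description, 0 if it does not,
 * and -1 if the pattern cannot be compiled. */
int matches_whole(const char *description, const char *text) {
    char pattern[256];
    regex_t compiled;

    /* Anchor the pattern so that it must cover the complete text. */
    int length = snprintf(pattern, sizeof pattern, "^(%s)$", description);
    assert(length > 0 && (size_t)length < sizeof pattern);

    if (regcomp(&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    int found = (regexec(&compiled, text, 0, NULL, 0) == 0);
    regfree(&compiled);
    return found;
}
```

For example, `matches_whole(IDENTIFIER, "t3")` accepts the identifier, while `matches_whole(IDENTIFIER, "3t")` rejects it because the first character is not a letter.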
29
Exercise (5 min.)
write down regular descriptions for the following:
- an integral number is a non-empty sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal).
- a fixed-point number is an (optional) sequence of digits followed by a dot ('.') followed by a sequence of digits.
- an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.
30
Answers
31
Answers
    base → [bo]
    integral_number → digit+ base?

    dot → \.
    fixed_point_number → digit* dot digit+

    letter → [a-zA-Z]
    digit → [0-9]
    underscore → _
    letter_or_digit → letter | digit
    letter_or_digit_or_und → letter_or_digit | underscore
    identifier → letter (letter_or_digit_or_und* letter_or_digit+)?
Gotcha: the dot character must be escaped by a backslash.
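The identifier answer can also be written as a hand-coded check, which makes the two underscore restrictions explicit. This is a minimal sketch with hypothetical function names. It accepts the same strings as the regular description above: a letter first, then letters, digits, or underscores, with no underscore in the last position.

```c
#include <assert.h>
#include <stddef.h>

static int is_plain_letter(char ch) {
    return ('A' <= ch && ch <= 'Z') || ('a' <= ch && ch <= 'z');
}

static int is_plain_digit(char ch) {
    return '0' <= ch && ch <= '9';
}

/* Returns 1 if text is an identifier in the sense of the exercise, else 0. */
int is_exercise_identifier(const char *text) {
    assert(text != NULL);

    /* The first character must be a real letter; '_' is excluded here. */
    if (!is_plain_letter(text[0])) return 0;

    /* The rest may mix letters, digits, and underscores. */
    size_t i = 1;
    while (is_plain_letter(text[i]) || is_plain_digit(text[i]) || text[i] == '_') {
        i++;
    }

    /* Any other character makes the string illegal. */
    if (text[i] != '\0') return 0;

    /* The last character may not be an underscore. */
    return text[i - 1] != '_';
}
```

A single letter such as `a` is accepted because the optional group in the regular description may be absent; `a_` and `_a` are both rejected.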
32
Lexical analysis
- convert stream of characters to stream of tokens
- tokens are defined by a regular description
- tokens are demanded one-by-one by the syntax analyzer
[Diagram: program text → lexical analyzer → tokens → syntax analyzer → AST; the syntax analyzer requests each token through get_next_token()]
33
interface
    extern Token_Type Token;
    /* Global variable that holds the current token. */

    void start_lex(void);
    /* Must be called before the first call to
     * get_next_token(). */

    void get_next_token(void);
    /* Load the next token into the global
     * variable Token. */
Q: why a global variable?
A: syntax analyzer tries multiple alternatives + backwards compatibility (C could not return structs)
34
lexical analysis by hand
- read complete program text into memory
  - for simplicity
  - avoids buffering and arbitrary limits
  - variable length tokens
- get_next_token() dispatches on the next character
[Diagram: the index dot marks the current position in the input: main() { printf("hello world\n");}]
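Reading the complete program text into memory might look like the sketch below. The function name `read_all`, its signature, and the initial buffer size are assumptions made for illustration; the slides do not show this routine. The buffer grows as needed, so there is no fixed limit on the input size, and the result is NUL-terminated so that a `'\0'` test can detect the end of input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything from fp into a freshly allocated, NUL-terminated buffer.
 * Stores the number of characters read in *length.
 * Returns NULL if memory runs out; the caller frees the result. */
char *read_all(FILE *fp, size_t *length) {
    assert(fp != NULL && length != NULL);

    size_t capacity = 64;   /* arbitrary starting size, doubled on demand */
    size_t used = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL) return NULL;

    for (;;) {
        /* Keep one byte free for the terminating NUL. */
        if (used + 1 == capacity) {
            size_t new_capacity = capacity * 2;
            char *bigger = realloc(buffer, new_capacity);
            if (bigger == NULL) {
                free(buffer);
                return NULL;
            }
            buffer = bigger;
            capacity = new_capacity;
        }
        size_t count = fread(buffer + used, 1, capacity - used - 1, fp);
        if (count == 0) break;   /* end of file or read error */
        used += count;
    }

    buffer[used] = '\0';
    *length = used;
    return buffer;
}
```

Once the text is in memory, a token's lexeme is simply a start index plus a length into this buffer, which is what makes variable-length tokens easy to handle.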
35
get_next_token()
    void get_next_token(void) {
        int start_dot;

        skip_layout_and_comment();
        /* now we are at the start of a token or at end-of-file, so: */
        note_token_position();

        /* split on first character of the token */
        start_dot = dot;
        if (is_end_of_input(input_char)) {
            Token.class = EoF; Token.repr = "<EoF>"; return;
        }
        if (is_letter(input_char)) {recognize_identifier();}
        else if (is_digit(input_char)) {recognize_integer();}
        else if (is_operator(input_char) || is_separator(input_char)) {
            Token.class = input_char; next_char();
        }
        else {Token.class = ERRONEOUS; next_char();}

        Token.repr = input_to_zstring(start_dot, dot-start_dot);
    }
Q: why must next_char() be invoked on an erroneous token?
A: to avoid an endless loop.
36
Character classification & token recognition
    #include <string.h>   /* strchr() */

    #define is_end_of_input(ch)     ((ch) == '\0')
    #define is_layout(ch)           (!is_end_of_input(ch) && (ch) <= ' ')

    #define is_uc_letter(ch)        ('A' <= (ch) && (ch) <= 'Z')
    #define is_lc_letter(ch)        ('a' <= (ch) && (ch) <= 'z')
    #define is_letter(ch)           (is_uc_letter(ch) || is_lc_letter(ch))
    #define is_digit(ch)            ('0' <= (ch) && (ch) <= '9')
    #define is_letter_or_digit(ch)  (is_letter(ch) || is_digit(ch))
    #define is_underscore(ch)       ((ch) == '_')

    #define is_operator(ch)         (strchr("+-*/", (ch)) != NULL)
    #define is_separator(ch)        (strchr(";,(){}", (ch)) != NULL)

    void recognize_integer(void) {
        Token.class = INTEGER; next_char();
        while (is_digit(input_char)) {next_char();}
    }
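The pieces from the last few slides can be put together into one compilable file. The sketch below fills in parts the slides do not show, and those parts are assumptions: the definitions of `input`, `dot`, `input_char`, and `next_char()`, the body of `recognize_identifier()`, the token class numbering, and a `start_lex()` that takes the program text as an argument instead of reading a file. Comment skipping and position tracking are left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token classes; single-character tokens use their own character code. */
enum { EoF = 256, IDENTIFIER, INTEGER, ERRONEOUS };

/* The slides use char *repr; const is used here so "<EoF>" can be stored safely. */
typedef struct { int class; const char *repr; } Token_Type;
Token_Type Token;                 /* global variable holding the current token */

static const char *input;         /* complete, NUL-terminated program text */
static int dot;                   /* index of the current character */
#define input_char   (input[dot])
#define next_char()  (dot++)

#define is_end_of_input(ch) ((ch) == '\0')
#define is_layout(ch)       (!is_end_of_input(ch) && (ch) <= ' ')
#define is_letter(ch)       (('A' <= (ch) && (ch) <= 'Z') || ('a' <= (ch) && (ch) <= 'z'))
#define is_digit(ch)        ('0' <= (ch) && (ch) <= '9')
#define is_operator(ch)     (strchr("+-*/", (ch)) != NULL)
#define is_separator(ch)    (strchr(";,(){}", (ch)) != NULL)

/* Assumption: the text is passed in directly rather than read from a file. */
void start_lex(const char *text) { input = text; dot = 0; }

/* Copy length characters starting at start into a new NUL-terminated string. */
static const char *input_to_zstring(int start, int length) {
    char *copy = malloc((size_t)length + 1);
    assert(copy != NULL);
    memcpy(copy, input + start, (size_t)length);
    copy[length] = '\0';
    return copy;                  /* never freed in this sketch */
}

static void recognize_identifier(void) {
    Token.class = IDENTIFIER; next_char();
    while (is_letter(input_char) || is_digit(input_char)) {next_char();}
}

static void recognize_integer(void) {
    Token.class = INTEGER; next_char();
    while (is_digit(input_char)) {next_char();}
}

void get_next_token(void) {
    int start_dot;

    while (is_layout(input_char)) {next_char();}   /* layout only, no comments */

    start_dot = dot;
    if (is_end_of_input(input_char)) {
        Token.class = EoF; Token.repr = "<EoF>"; return;
    }
    if (is_letter(input_char)) {recognize_identifier();}
    else if (is_digit(input_char)) {recognize_integer();}
    else if (is_operator(input_char) || is_separator(input_char)) {
        Token.class = input_char; next_char();
    }
    else {Token.class = ERRONEOUS; next_char();}

    Token.repr = input_to_zstring(start_dot, dot - start_dot);
}
```

After `start_lex("foo + 42;")`, successive calls to `get_next_token()` leave an IDENTIFIER, a `'+'`, an INTEGER, a `';'`, and finally EoF in the global `Token`.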
37
Summary
- compiler is a structured toolbox
  - front-end: program text → annotated AST
  - back-end: annotated AST → executable code
- lexical analysis: program text → tokens
  - token specifications
  - implementation by hand
- exercises
  - AST
  - regular descriptions
38
Next week
- Generating a lexical analyzer
  - generic methods
  - specific tool lex
[Diagram of the front-end: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a scanner generator turns a token description into the lexical analysis phase]
39
Homework
- find a partner for the “practicum”
- register your group: send to
- print handout lecture 2 [blackboard]