Compiler construction in4020 – lecture 5 Koen Langendoen Delft University of Technology The Netherlands.

Slides:



Advertisements
Similar presentations
Symbol Table.
Advertisements

Compiler construction in4020 – lecture 2 Koen Langendoen Delft University of Technology The Netherlands.
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
Compiler construction in4020 – lecture 4 Koen Langendoen Delft University of Technology The Netherlands.
Chapter 3 Syntax Analysis
Compiler construction in4020 – lecture 10 Koen Langendoen Delft University of Technology The Netherlands.
Compilation (Semester A, 2013/14) Lecture 6b: Context Analysis (aka Semantic Analysis) Noam Rinetzky 1 Slides credit: Mooly Sagiv and Eran Yahav.
Joey Paquet, 2000, 2002, 2008, Lecture 7 Bottom-Up Parsing II.
Mooly Sagiv and Roman Manevich School of Computer Science
Recap Mooly Sagiv. Outline Subjects Studied Questions & Answers.
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Design Chapter
Attribute Grammars Professor Yihjia Tsai Tamkang University.
Semantic analysis Enforce context-dependent language rules that are not reflected in the BNF, e.g.a function must have a return statement. Decorate AST.
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Compiler construction 2002
Compiler Summary Mooly Sagiv html://
Bottom-Up Syntax Analysis Mooly Sagiv & Greta Yorsh Textbook:Modern Compiler Design Chapter (modified)
Bottom-Up Syntax Analysis Mooly Sagiv html:// Textbook:Modern Compiler Implementation in C Chapter 3.
Table-driven parsing Parsing performed by a finite state machine. Parsing algorithm is language-independent. FSM driven by table (s) generated automatically.
Compiler construction in4020 – lecture 3 Koen Langendoen Delft University of Technology The Netherlands.
Parser construction tools: YACC
Syntax Analysis – Part II Quick Look at Using Bison Top-Down Parsers EECS 483 – Lecture 5 University of Michigan Wednesday, September 20, 2006.
Compilers: Yacc/7 1 Compiler Structures Objective – –describe yacc (actually bison) – –give simple examples of its use , Semester 1,
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
LEX and YACC work as a team
LR Parsing Compiler Baojian Hua
1 Semantic Analysis Aaron Bloomfield CS 415 Fall 2005.
Using the LALR Parser Generator yacc By J. H. Wang May 10, 2011.
Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Miscellaneous 컴파일러 입문.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 8: Semantic Analysis and Symbol Tables.
Basic Semantics Associating meaning with language entities.
–Writing a parser with YACC (Yet Another Compiler Compiler). Automatically generate a parser for a context free grammar (LALR parser) –Allows syntax direct.
4. Formal Grammars and Parsing and Top-down Parsing Chih-Hung Wang Compilers References 1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
1 Using Yacc. 2 Introduction Grammar –CFG –Recursive Rules Shift/Reduce Parsing –See Figure 3-2. –LALR(1) –What Yacc Cannot Parse It cannot deal with.
Compiler Principle and Technology Prof. Dongming LU Mar. 26th, 2014.
YACC. Introduction What is YACC ? a tool for automatically generating a parser given a grammar written in a yacc specification (.y file) YACC (Yet Another.
Compiler Principles Fall Compiler Principles Lecture 6: Parsing part 5 Roman Manevich Ben-Gurion University.
Top-Down Parsing CS 671 January 29, CS 671 – Spring Where Are We? Source code: if (b==0) a = “Hi”; Token Stream: if (b == 0) a = “Hi”; Abstract.
PL&C Lab, DongGuk University Compiler Lecture Note, MiscellaneousPage 1 Yet Another Compiler-Compiler Stephen C. Johnson July 31, 1978 YACC.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
4. Bottom-up Parsing Chih-Hung Wang
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
CS412/413 Introduction to Compilers Radu Rugina Lecture 11: Symbol Tables 13 Feb 02.
LECTURE 11 Semantic Analysis and Yacc. REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely.
More yacc. What is yacc – Tool to produce a parser given a grammar – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar.
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
YACC Primer CS 671 January 29, CS 671 – Spring Yacc Yet Another Compiler Compiler Automatically constructs an LALR(1) parsing table from.
YACC (Yet Another Compiler-Compiler) Chung-Ju Wu
Parser Generation Tools (Yacc and Bison) CS 471 September 24, 2007.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture Ahmed Ezzat.
YACC SUNG-DONG KIM, DEPT. OF COMPUTER ENGINEERING, HANSUNG UNIVERSITY.
Syntax error handling –Errors can occur at many levels lexical: unknown operator syntactic: unbalanced parentheses semantic: variable never declared runtime:
Announcements/Reading
Compiler Construction
Chapter 4 Syntax Analysis.
Context-free Languages
Bottom-Up Syntax Analysis
Bison: Parser Generator
Top-Down Parsing CS 671 January 29, 2008.
Bison Marcin Zubrowski.
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
Subject Name:Sysytem Software Subject Code: 10SCS52
Syntax-Directed Translation
5. Bottom-Up Parsing Chih-Hung Wang
Compiler Lecture Note, Miscellaneous
Compiler Structures 7. Yacc Objectives , Semester 2,
Compiler Design Yacc Example "Yet Another Compiler Compiler"
Presentation transcript:

Compiler construction in4020 – lecture 5 Koen Langendoen Delft University of Technology The Netherlands

syntax analysis: tokens  AST bottom-up parsing push-down automaton ACTION/GOTO tables LR(0)NO look-ahead SLR(1)one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts LR(1)SLR(1), but FOLLOW set per item LALR(1)LR(1), but “equal” states are merged Summary of lecture 4

Quiz 2.50 Is the following grammar LR(0), SLR(1), LALR(1), or LR(1) ? (a) S  x S x | y (b) S  x S x | x

semantic analysis identification – symbol tables type checking assignment yacc LLgen Overview program text lexical analysis syntax analysis context handling annotated AST tokens AST parser generator language grammar

Semantic analysis information is scattered throughout the program identifiers serve as connectors find defining occurrence of each applied occurrence of an identifier in the AST undefined identifiers  error unused identifiers  warning check rules in the language definition type checking control flow (dead code) se-man-tic: of or relating to meaning in language. Webster’s Dictionary

Semantic analysis information is scattered throughout the program identifiers serve as connectors find defining occurrence of each applied occurrence of an identifier in the AST undefined identifiers  error unused identifiers  warning check rules in the language definition type checking control flow (dead code)

Symbol table global storage used by all compiler phases holds information about identifiers: type location size program text lexical analysis syntax analysis context handling annotated AST optimizations code generation executable symboltablesymboltable

Symbol table implementation extensible string-indexable array linear list tree hash table next name type … next name type … next name type … ”mies””noot” ”aap” bucket 0 bucket 1 bucket 2 bucket 3 hash function: string  int

Identification different kinds of identifiers variables type names field selectors name spaces scopes typedef int i; int j; void foo(int j) { struct i {i i;} i; i: i.i = 3; printf( "%d\n", i.i); }

Identification different kinds of identifiers variables type names field selectors name spaces scopes typedef int i; int j; void foo(int j) { struct i {i i;} i; i: i.i = 3; printf( "%d\n", i.i); }

Handling scopes stack of scope elements when entering a scope a new element is pushed on the stack declared identifiers are entered in the top scope element applied identifiers are looked up in the scope elements from top to bottom the top element is removed upon scope exit

A scoped hash-based symbol table ”mies” decl … bucket 0 bucket 1 bucket 2 bucket 3 ”aap” decl … ”noot” decl … aap( int noot) { int mies, aap;.... } prop level scope stack hash table

Identification: complications overloading operators: N*2 prijs* functions: PUT(s:STRING) PUT(i:INTEGER) solution: yield set of possibilities (to be constrained by type checking) imported scopes C++ scope resolution operator x:: Modula FROM module IMPORT... solution: stack (or merge) the new scope

Type checking operators and functions impose restrictions on the types of the arguments types basic types structured types type names typedef struct { double re; double im; } complex;

Forward declarations recursive data structures type information must be stored type table type information must be resolved undefined types circularities TYPE Tree = POINTER TO Node; Type Node = RECORD element : Integer; left, right : Tree; END RECORD;

Type equivalence name equivalence [all types get a unique name] VAR a : ARRAY [Integer 1..10] OF Real; VAR b : ARRAY [Integer 1..10] OF Real; structural equivalence [difficult to check] TYPE c = RECORD i : Integer; p : POINTER TO c; END RECORD; TYPE d = RECORD i : Integer; p : POINTER TO RECORD i : Integer; p : POINTER to c; END RECORD;

Coercions implicit data and type conversion to match operand (argument) type coercions complicate identification (ambiguity) two phase approach expand a type to a set by applying coercions reduce type sets based on constraints imposed by (overloaded) operators and language semantics VAR a : Real;... a := 5;

Variable: value or location? two usages of variables rvalue: value lvalue: location insert coercion to dereference variable checking rules: VAR p : Real; VAR q : Real;... p := q; found expected lvaluervalue lvalue-deref rvalueERROR- := (location of) p deref (location of) q

Exercise (5 min.) complete the table expression construct result kind (lvalue/rvalue) constantrvalue identifier &lvalue *rvalue V[rvalue] V.selector rvalue + rvalue lvalue = rvalue V stands for lvalue or rvalue

Answers complete table expression construct result kind (lvalue/rvalue) constantrvalue identifier (variable)lvalue identifier (otherwise)rvalue &lvaluervalue *rvaluelvalue V[rvalue]V V.selectorV rvalue + rvaluervalue lvalue = rvaluervalue V stands for lvalue or rvalue

Break

Assignment (practicum) Asterix compiler 1) replace yacc by LLgen 2) make Asterix object-oriented classes and objects inheritance and dynamic binding C-code token description Asterix grammar yacc lex lexical analysis syntax analysis context handling code generation Asterix program

Yet another compiler compiler yacc (bison): parser generator for UNIX LALR(1) grammar  C code format of the yacc input file: definitions % rules % user code tokens + properties grammar rules + actions auxiliary C-code

Yacc-based expression interpreter input file yacc maintains a stack of “values” that may be referenced ($i) in the semantic actions %token DIGIT % line : expr '\n' { printf("%d\n", $1);} ; expr : expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | DIGIT ; % grammarsemantics

Yacc interface to lexical analyzer yacc invokes yylex() to get the next token the “value” of a token must be stored in the global variable yylval the default value type is int, but can be changed % yylex() { int c; c = getchar(); if (isdigit(c)) { yylval = c - '0'; return DIGIT; } return c; }

Yacc interface to back-end yacc generates a function named yyparse() syntax errors are reported by invoking a callback function yyerror() % yylex() {... } main() { yyparse(); } yyerror() { printf("syntax error\n"); exit(1); }

Yacc-based expression interpreter input file (desk0) run yacc % line : expr '\n' { printf("%d\n", $1);} ; expr : expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | DIGIT ; % > make desk0 bison -v desk0.y desk0.y contains 4 shift/reduce conflicts. gcc -o desk0 desk0.tab.c >

Conflict resolution in Yacc shift-reduce: prefer shift reduce-reduce: prefer the rule that comes first

Conflict resolution in Yacc shift-reduce: prefer shift reduce-reduce: prefer the rule that comes first >cat desk0.output State 9 contains 2 shift/reduce conflicts. State 10 contains 2 shift/reduce conflicts. Grammar Number, Line, Rule 1 9 line -> expr '\n' 2 11 expr -> expr '+' expr 3 12 expr -> expr '*' expr 4 13 expr -> '(' expr ')' 5 14 expr -> DIGIT

Conflict resolution in Yacc shift-reduce: prefer shift reduce-reduce: prefer the rule that comes first state 9 expr -> expr. '+' expr (rule 2) expr -> expr '+' expr. (rule 2) expr -> expr. '*' expr (rule 3) '+' shift, and go to state 6 '*' shift, and go to state 7 '+' [reduce using rule 2 (expr)] '*' [reduce using rule 2 (expr)] $default reduce using rule 2 (expr)

Yacc-based expression interpreter input file (desk0) run yacc run desk0, is it correct? % line : expr '\n' { printf("%d\n", $1);} ; expr : expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | DIGIT ; % > desk0 2* NO

Operator precedence in Yacc priority from top (low) to bottom (high) %token DIGIT %left '+' %left '*' % line : expr '\n' { printf("%d\n", $1);} ; expr : expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | DIGIT ; %

Exercise (7 min.) multiple lines: % lines: line | lines line ; line : expr '\n' { printf("%d\n", $1);} ; expr : expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | DIGIT ; % Extend the interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

Answers

%{ int reg[26]; %} %token DIGIT %token REG %right '=' %left '+' %left '*' % expr : REG '=' expr { $$ = reg[$1] = $3;} | expr '+' expr { $$ = $1 + $3;} | expr '*' expr { $$ = $1 * $3;} | '(' expr ')' { $$ = $2;} | REG { $$ = reg[$1];} | DIGIT ; %

Answers % yylex() { int c = getchar(); if (isdigit(c)) { yylval = c - '0'; return DIGIT; } else if ('a' <= c && c <= 'z') { yylval = c - 'a'; return REG; } return c; }

LLgen: LL(1) parser generator LLgen is part of the Amsterdam Compiler Kit takes LL(1) grammar + semantic actions in C and generates a recursive descent parser LLgen features: repetition operators advanced error handling parameter passing control over semantic actions dynamic conflict resolvers

LLgen example: expression interpreter start from LR(1) grammar make grammar LL(1) use repetition operators lines : line | lines line ; line : expr '\n' ; expr : expr '+' expr | expr '*' expr | '(' expr ')‘ | DIGIT ; yacc left recursion operator precedence

LLgen example: expression interpreter start from LR(1) grammar make grammar LL(1) use repetition operators left recursion operator precedence %token DIGIT; main : [line]+ ; line : expr '\n' ; expr : term [ '+' term ]* ; term : factor [ '*' factor ]* ; factor : '(' expr ')‘ | DIGIT ; LLgen add semantic actions attach parameters to grammar rules insert C-code between the symbols

main : [line]+ ; line {int e;} : expr(&e) '\n' { printf("%d\n", e);} ; expr(int *e) {int t;} : term(e) [ '+' term(&t) { *e += t;} ]* ; term(int *t) {int f;} : factor(t) [ '*' factor(&f) { *t *= f;} ]* ; factor(int *f) : '(' expr(f) ')' | DIGIT { *f = yylval;} ; grammar semantics

main : [line]+ ; line {int e;} : expr(&e) '\n' { printf("%d\n", e);} ; expr(int *e) {int t;} : term(e) [ '+' term(&t) { *e += t;} ]* ; term(int *t) {int f;} : factor(t) [ '*' factor(&f) { *t *= f;} ]* ; factor(int *f) : '(' expr(f) ')' | DIGIT { *f = yylval;} ; values/results passed as parameters grammar semantics

main : [line]+ ; line {int e;} : expr(&e) '\n' { printf("%d\n", e);} ; expr(int *e) {int t;} : term(e) [ '+' term(&t) { *e += t;} ]* ; term(int *t) {int f;} : factor(t) [ '*' factor(&f) { *t *= f;} ]* ; factor(int *f) : '(' expr(f) ')' | DIGIT { *f = yylval;} ; semantic actions: C code between {} grammar semantics

LLgen interface to lexical analyzer by default LLgen invokes yylex() to get the next token the “value” of a token can be stored in any global variable (yylval) of any type (int) yylex() { int c; c = getchar(); if (isdigit(c)) { yylval = c - '0'; return DIGIT; } return c; }

LLgen interface to back-end LLgen generates a user-named function (parse) LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens LLmessage() is invoked to notify the lexical analyzer %start parse, main; LLmessage(int class) { switch (class) { case -1: printf("expecting EOF, "); case 0: printf("deleting token (%d)\n",LLsymb); break; default: /* push back token LLsymb */ printf("inserting token (%d)\n",class); break; }

Exercise (5 min.) extend LLgen-based interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

Answers

%token REG; expr(int *e) {int r,t;} : %if (ahead() == '=') reg(&r) '=' expr(e) { reg[r] = *e;} | term(e) [ '+' term(&t) { *e += t;} ]* ; factor(int *f) {int r;} : '(' expr(f) ')' | DIGIT { *f = yylval;} | reg(&r) { *f = reg[r];} ; reg(int *r) : REG { *r = yylval;} ;

Answers dynamic conflict resolution %token REG; expr(int *e) {int r,t;} : %if (ahead() == '=') reg(&r) '=' expr(e) { reg[r] = *e;} | term(e) [ '+' term(&t) { *e += t;} ]* ; factor(int *f) {int r;} : '(' expr(f) ')' | DIGIT { *f = yylval;} | reg(&r) { *f = reg[r];} ; reg(int *r) : REG { *r = yylval;} ;

Homework study sections: LLgen yacc assignment 1: replace yacc with LLgen deadline April 9 08:59 print handout for next week [blackboard]