CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.

Slides:



Advertisements
Similar presentations
CH4.1 Type Checking Md. Fahim Computer Engineering Department Jamia Millia Islamia (A Central University) New Delhi –
Advertisements

CPSC 388 – Compiler Design and Construction
Semantic Analysis and Symbol Tables
Intermediate Code Generation
Chapter 6 Type Checking. The compiler should report an error if an operator is applied to an incompatible operand. Type checking can be performed without.
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
Programming Languages and Paradigms The C Programming Language.
1 Compiler Construction Intermediate Code Generation.
1 Error detection in LR parsing Errors are discovered when a slot in the action table is blank. Canonical LR(1) parsers detect and report the error as.
Compiler Construction
Compiler Principle and Technology Prof. Dongming LU Mar. 28th, 2014.
1 Beyond syntax analysis An identifier named x has been recognized. Is x a scalar, array or function? How big is x? If x is a function, how many and what.
Semantic analysis Parsing only verifies that the program consists of tokens arranged in a syntactically-valid combination, we now move on to semantic analysis,
1 Pertemuan 18 & 19 Semantic Analyzer (type checking) Matakuliah: T0174 / Teknik Kompilasi Tahun: 2005 Versi: 1/6.
1 Semantic Processing. 2 Contents Introduction Introduction A Simple Compiler A Simple Compiler Scanning – Theory and Practice Scanning – Theory and Practice.
Semantic analysis Enforce context-dependent language rules that are not reflected in the BNF, e.g.a function must have a return statement. Decorate AST.
1 Type Type system for a programming language = –set of types AND – rules that specify how a typed program is allowed to behave Why? –to generate better.
Chapter 2 A Simple Compiler
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
Symbol Table (  ) Contents Map identifiers to the symbol with relevant information about the identifier All information is derived from syntax tree -
Static checking and symbol table Chapter 6, Chapter 7.6 and Chapter 8.2 Static checking: check whether the program follows both the syntactic and semantic.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.
1 Abstract Syntax Tree--motivation The parse tree –contains too much detail e.g. unnecessary terminals such as parentheses –depends heavily on the structure.
Saumya Debray The University of Arizona Tucson, AZ 85721
Semantic Analysis Legality checks –Check that program obey all rules of the language that are not described by a context-free grammar Disambiguation –Name.
Semantic Analysis CS 671 February 5, CS 671 – Spring The Compiler So Far Lexical analysis Detects inputs with illegal tokens –e.g.: main$
CSc 453 Intermediate Code Generation Saumya Debray The University of Arizona Tucson.
COMPILERS Semantic Analysis hussein suleman uct csc3005h 2006.
1 Semantic Analysis Aaron Bloomfield CS 415 Fall 2005.
Semantic Analysis1 Checking what parsers cannot.
COP4020 Programming Languages Semantics Prof. Xin Yuan.
Lesson 11 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
COMPILERS Symbol Tables hussein suleman uct csc3003s 2007.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
410/510 1 of 18 Week 5 – Lecture 1 Semantic Analysis Compiler Construction.
Programming Languages by Ravi Sethi Chapter 6: Groupings of Data and Operations.
CS 153: Concepts of Compiler Design October 5 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak
CS308 Compiler Principles Introduction to Yacc Fan Wu Department of Computer Science and Engineering Shanghai Jiao Tong University.
Chapter 2. Design of a Simple Compiler J. H. Wang Sep. 21, 2015.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
Review: Syntax directed translation. –Translation is done according to the parse tree. Each production (when used in the parsing) is a sub- structure of.
1 Static Checking and Type Systems Chapter 6 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Syntax-Directed Definitions and Attribute Evaluation Compiler Design Lecture (02/18/98) Computer Science Rensselaer Polytechnic.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
LECTURE 3 Compiler Phases. COMPILER PHASES Compilation of a program proceeds through a fixed series of phases.  Each phase uses an (intermediate) form.
CS412/413 Introduction to Compilers Radu Rugina Lecture 11: Symbol Tables 13 Feb 02.
6. Semantic Analysis. Semantic Analysis Phase – Purpose: compute additional information needed for compilation that is beyond the capabilities of Context-
Compiler Design Lecture 10 Semantic Analysis. int aintegers int a[2][3]array(2, array(3, integer)) int f(int, float, char) int x float x char  int Int.
CSE 420 Lecture Program is lexically well-formed: ▫Identifiers have valid names. ▫Strings are properly terminated. ▫No stray characters. Program.
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
LECTURE 10 Semantic Analysis. REVIEW So far, we’ve covered the following: Compilation methods: compilation vs. interpretation. The overall compilation.
1 Static Checking and Type Systems Chapter 6 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 9 Ahmed Ezzat Semantic Analysis.
Lecture 9 Symbol Table and Attributed Grammars
Context-Sensitive Analysis
Constructing Precedence Table
CMPE 152: Compiler Design October 4 Class Meeting
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
CSE401 Introduction to Compiler Construction
Compiler Construction
Symbol Table 薛智文 (textbook ch#2.7 and 6.5) 薛智文 96 Spring.
Saumya Debray The University of Arizona Tucson, AZ 85721
Compiler Construction
Presentation transcript:

CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson

CSc 453: Semantic Analysis2 Need for Semantic Analysis Not all program properties can be represented using context-free grammars. E.g.: “ variables must be declared before use ” is not a context-free property. Parsing context-sensitive grammars is expensive. As a pragmatic measure, compilers combine context-free and context-sensitive checking: ― Context-free parsing used to check “code shape;” ― Additional rules used to check context-sensitive aspects.

CSc 453: Semantic Analysis3 Syntax-Directed Translation Basic Idea: ― Associate information with grammar symbols using attributes. An attribute can represent any reasonable aspect of a program, e.g., character string, numerical value, type, memory location, etc. ― Use semantic rules associated with grammar productions to compute attribute values. A parse tree showing attribute values at each node is called an annotated parse tree. Implementation: Add code to parser to compute and propagate attribute values.

CSc 453: Semantic Analysis4 Example: Attributes for an Identifier name: character string (from scanner) scope: global, local, … ― if local: whether or not a formal parameter type: ― integer ― array: □ no. of dimensions □ upper and lower bound for each dimension □ type of elements ― struct: □ name and type of each field ― function: □ number and type of arguments (in order) □ type of returned value □ entry point in memory □ size of stack frame …

CSc 453: Semantic Analysis5 Types of Attributes Inherited attributes: An attribute is inherited at a parse tree node if its value is computed at a parent or sibling node. Synthesized attributes: An attribute is synthesized at a parse tree node if its value is computed at that node or one of its children.

CSc 453: Semantic Analysis6 Example: A Simple Calculator ProductionSemantic Rule E  E 1 + E 2 E. val = E 1. val  E 2. val E  E 1 * E 2 E. val = E 1. val  E 2. val E  (E 1 ) E. val = E 1. val E  intcon E. val = intcon. val

Issues in Semantic Analysis Maintaining (creating, accessing, updating) semantic information ― symbol tables Integrating semantic checking into the parser ― syntax trees Reasoning about semantic information ― type expressions CSc 453: Semantic Analysis7

8 MAINTAINING SEMANTIC INFORMATION

CSc 453: Semantic Analysis9 Symbol Tables Purpose: To hold information (i.e., attribute values) about identifiers that get computed at one point and used later. E.g.: type information: □ computed during parsing; □ used during type checking, code generation. Operations: □ create, delete a symbol table; □ insert, lookup an identifier Typical implementations: linked list, hash table.

Symbol Tables What does a symbol table entry look like? CSc 453: Semantic Analysis10 name type info info related to code generation (e.g., location) next/prev pointer(s) “xyz” e.g.: int array: element-type = char no. of elements = 10 function: argument types: … return type: …

Symbol Tables Information is filled in as it becomes available CSc 453: Semantic Analysis11 name type info info related to code generation (e.g., location) next/prev pointer(s) “xyz” filled in when processing declarations (during semantic checking) filled in during code generation

Symbol Tables What does a symbol table look like? CSc 453: Semantic Analysis12 symtab_hd “baz”“bas” “bar” “foo” Note: Any data structure that allows insertion and lookup based on the symbol name will do. E.g.: linked list, hash table, binary search tree, …

CSc 453: Semantic Analysis13 Managing Symbol Tables When looking up a name in a symbol table, we need to find the “appropriate” declaration. ― The scope rules of the language determine what is “appropriate.” ― Often, we want the most deeply nested declaration for a name. Implementation: for each new scope: push a new symbol table on entry; pop on exit (stack). ― implement symbol table stack as a linked list of symbol tables; newly declared identifiers go into the topmost symbol table. ― lookup: search the symbol table stack from the top downwards.

Managing Symbol Tables int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis14 SCOPE 0 push symbol table for scope 0 symtbltop NULL

Managing Symbol Tables int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis15 SCOPE 0 SCOPE 1 push symbol table for scope 1 symtbltop NULL

Managing Symbol Tables int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis16 SCOPE 0 SCOPE 2 push symbol table for scope 2 symtbltop NULL symtbltop SCOPE 1

Managing Symbol Tables int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis17 SCOPE 0 SCOPE 1 SCOPE 2 symtbltop NULL pop symbol table for scope 2 symtbltop

Managing Symbol Tables int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis18 SCOPE 0 SCOPE 1 SCOPE 2 pop symbol table for scope 1 NULL symtbltop

Symbol Table Lookups In statically scoped languages (C, Java, …), each use of an identifier refers to the most deeply nested declaration enclosing that use At a use of an identifier, the symbol table is looked up to find its declaration: ― start at the symbol table most deeply nested scope (i.e., at the top of the symbol table stack) ― while not found: work down the symbol table stack, searching each symbol table in the stack CSc 453: Semantic Analysis19

Symbol Table Lookups int x, y; x = … y = … if (x > y) { float x = y ; y = 2.0 * x – 1.0; } else { x = y+1; } CSc 453: Semantic Analysis20

Symbol Table Lookups int foo, bar, bas; foo = … bar = … if (foo > bar) { int foo, bar, bas; bar = … bas = … while (bar != bas) { int foo, bar, bas; … } CSc 453: Semantic Analysis21 SCOPE 0 SCOPE 1 SCOPE 2 NULL

CSc 453: Semantic Analysis 22 INTEGRATING SEMANTIC CHECKING INTO THE PARSER

Digression: Syntax trees Syntax trees are an internal representation of the program ― created by the parser ― used by the code generator to generate code Each internal node represents an operation ― its children represent what it operates on

Syntax tree: Example Source code if (x > y+z) { y = x+1; } else { x = y*z; } Syntax tree CSc 453: Semantic Analysis24

Syntax tree: Example Source code if (x > y+z) { y = x+1; } else { x = y*z; } Syntax tree CSc 453: Semantic Analysis25

Syntax tree: Example Source code if (x > y+z) { y = x+1; } else { x = y*z; } Syntax tree CSc 453: Semantic Analysis26

Syntax tree: Example Source code if (x > y+z) { y = x+1; } else { x = y*z; } Syntax tree CSc 453: Semantic Analysis27

Syntax Trees What does a syntax tree node look like? CSc 453: Semantic Analysis28 + for if op2 op1 cond then else body incr cond init different types of nodes can have different numbers of children

Syntax Trees What does a syntax tree node look like? CSc 453: Semantic Analysis29 + for if op2 op1 cond then else body incr cond init node type child 0 child 1 … struct tn { enum nodetype ntype; struct tn *child0; struct tn *child1; … } nodetype  { Plus, If, While, For, … }

Syntax Trees Constructing syntax trees: (ignoring type checking for now) CSc 453: Semantic Analysis30 stmt:IF ‘(‘ exp ‘)’ stmt ELSE stmt{ $$ = mkTree(If, $3, $5, $7); } |WHILE ‘(‘ exp ‘)’ stmt{ $$ = mkTree(While, $3, $5, NULL); } … exp:exp ‘+’ exp{ $$ = mkTree(Plus, $1, $3, NULL); } |‘!’ expr{ $$ = mkTree(Not, $2, NULL, NULL); } … mkTree(): mallocs space for the node and fills in its fields

Integrating Semantic Checking Integrating type checking into syntax trees: CSc 453: Semantic Analysis31 + for if op2 op1 cond then else body incr cond init node type value type child 0 child 1 … struct tn { enum nodetype ntype; enum valuetype valtype; struct tn *child0; struct tn *child1; … } specifies the type of the value computed at this node at runtime ( t_None if the node has no value)

Integrating Semantic Checking exp:‘!’ exp{ node = mkTree(Not, $2, NULL, NULL); if ($2->valtype == t_Bool) { node->valtype = t_Bool; } else { ErrMsg(“line %d: illegal type …”); node->valtype = t_Error; } $$ = node; } CSc 453: Semantic Analysis32

Integrating Semantic Checking exp:exp ‘+’ exp{ node = mkTree(Plus, $1, $3, NULL); } if ($1->valtype == t_Int && $3->valtype == t_Int) { node->valtype = INT; } else if ($1->valtype == t_Int && $3->valtype == t_Float) { node->valtype = t_Float; } … else if ($1->valtype == t_Error || $3->valtype == t_Error) { node->valtype = t_Error; /* omit redundant error msg */ } else { ErrMsg(“line %d: illegal type …”) node->valtype = t_Error; } $$ = node; } CSc 453: Semantic Analysis33

Integrating Semantic Checking stmt:while ‘(‘ exp ‘)’ stmt{ node = mkTree(While, $3, $5, NULL); } if ($3->valtype == t_Bool) { node->valtype = t_None; } else if ($3->valtype == t_Error) { node->valtype = t_Error; /* omit error msg */ } else { ErrMsg(“line %d: illegal type …”) node->valtype = t_Error; } $$ = node; } CSc 453: Semantic Analysis34

CSc 453: Semantic Analysis 35 REASONING ABOUT SEMANTIC INFORMATION

CSc 453: Semantic Analysis36 Static Checking Static checking aims to ensure, at compile time, that syntactic and semantic constraints of the source language are obeyed. E.g.: ― Type checks : operators and operands must have compatible types. ― Flow-of-control checks : control transfer statements must have legitimate targets (e.g., break/continue statements). ― Uniqueness checks : a language may dictate unique occurrences in some situations, e.g., variable declarations, case labels in switch statements. These checks can often be integrated with parsing.

CSc 453: Semantic Analysis37 Data Types and Type Checking A data type is a set of values together with a set of operations that can be performed on them. Type checking aims to verify that operations in a program code are, in fact, permissible on their operand values. Reasoning about types: ― The language provides a set of base types and a set of type constructors; ― The compiler uses type expressions to represent types definable by the language.

CSc 453: Semantic Analysis38 Type Constructors and Type Expressions A type expression denotes (i.e., is a syntactic representation of) the type of a program entity: ― A base type is a type expression (e.g., boolean, char, int, float); ― A type name is a type expression; ― A type constructor applied to type expressions is a type expression, e.g.: □ arrays : if T is a type expression then so is array(T); □ records : if T 1, …, T n are type expressions and f 1, …, f n is a list of (unique) identifiers, then record(f 1 :T 1, …, f n :T n ) is a type expression; □ pointers : if T is a type expression then so is ptr(T); □ functions : if T, T 1, …, T n are type expressions, then so is (T 1, …, T n )  T.

CSc 453: Semantic Analysis39 Why use Type Expressions? char *X; char **f() { X = “GOTCHA!” return &X; } main() { printf(“%c\n”, (*f())[2]); /* legal??? */ } Program Code Type Expression Rule f ()  ptr(ptr(char)) symbol table lookup f() ptr(ptr(char)) if e : T 1  T 2 and e 1 : T 1 then e(e 1 ) : T 2 *f() ptr(char) if e : ptr(T) then *e : T *f() array(char) if e : ptr(T) then e : array(T) (*f()) array(char) if e : T then (e) : T 2 int base type (*f())[2] char if e 1 : array(T) and e 2 : int then e 1 [e 2 ] : T What about: qsort((void **)lptr,0,k,(int (*)(void*,void*))(num ? ncmp : strcmp));

CSc 453: Semantic Analysis40 Representing Type Expressions Type graphs: A graph-structured representation of type expressions: ― Basic types are given predefined “internal values”; ― Named types can be represented via pointers into a hash table. ― A composite type expression f ( T 1,…,T n ) is represented as a node identifying the constructor f and with pointers to the nodes for T 1, …, T n. E.g.: int x[10][20]:

CSc 453: Semantic Analysis41 Type Checking Expressions ProductionSemantic RuleYacc Code E → idE. type = id. type { $$ = symtab_lookup(id_name); } E → intconE. type = INTEGER{ $$ = INTEGER; } E → E 1 + E 2 E. type = result_type(E 1. type, E 2. type ){ $$ = result_type($1, $3); } /* arithmetic type conversions */ Type result_type(Type t1, Type t2) { if (t1 == error || t2 == error ) return error ; if (t1 == t2) return t1; if (t1 == double || t2 == double ) return double ; if (t1 == float || t2 == float ) return float ; … } Return types: currently: the type of the expression down the road: type location code to evaluate the expression

CSc 453: Semantic Analysis42 Operator Overloading Overloading refers to the use of the same syntax to refer to different operations, depending on the operand types. E.g.: in Java, ‘+’ can refer to integer addition, floating point addition, or string concatenation. The compiler uses operand type information to resolve the overloading, i.e., figure out which operation is actually referred to. If there is insufficient information to resolve overloading, the compiler may give an error.

Project-specific stuff CSc 453: Semantic Analysis43

CSc 453: Semantic Analysis44 Semantic Actions in Yacc Semantic actions are embedded in RHS of rules. An action consists of one or more C statements, enclosed in braces { … }. Examples: ident_decl : ID { symtbl_install( id_name ); } type_decl : type { tval = … } id_list;

CSc 453: Semantic Analysis45 Semantic Actions in Yacc: cont’d Each nonterminal can return a value. ― The value returned by the i th symbol on the RHS is denoted by $i. ― An action that occurs in the middle of a rule counts as a “symbol” for this. ― To set the value to be returned by a rule, assign to $$. By default, the value returned by a rule is the value of the first RHS symbol, i.e., $1.

CSc 453: Semantic Analysis46 Yacc: Declaring Return Value Types Declare the various kinds of values that may be returned: %union { symtab_ptr st_ptr; idlist_ptr idents; tree_node tn_ptr; int val; } Default return value for symbols is int. We may want other types of return values, e.g., symbol table pointers, syntax tree nodes. Specify return type for each grammar symbol: /* tokens: */ %token INTCON; /* nonterminals: */ %type ident; %type expr;

CSc 453: Semantic Analysis47 Semantic Actions in Yacc: Example 1 func : 1 type 2 { ret_type = $1; } 3 ID 4 { this_fn = symtbl_install(id_name); } 5 ′(′ 6 { scope = LOCAL; } 7 formals 8 ′)′ 9 ′{′ 10 decls 11 stmt 12 { this_fn  body = $11; } 13 ′}′ 14 { scope = GLOBAL; }

CSc 453: Semantic Analysis48 Semantic Actions in Yacc: Example 2 A simple calculator in Yacc: E : E ‘+’ E { $$ = $1 + $3; } E : E ‘*’ E { $$ = $1 * $3; } E :‘(‘ E ‘)’ { $$ = $2; } E : intcon { $$ = $1.val; }

CSc 453: Semantic Analysis49 Processing Declarations ProductionSemantic Rule decl  type varlist ‘;’ varlist. tval = type. tval ; varlist  var ‘,’ varlist 1 var. type = varlist. type ; varlist 1. type = varlist. type ; xxx : inherited yyy : synthesized

CSc 453: Semantic Analysis50 Processing Declarations: cont’d decl : type { tval = $1; } varlist ; varlist : var varlist | var ; var : ID opt_subscript { symtbl_insert($1, $2); } ;

CSc 453: Semantic Analysis51 Type Checking Expressions: cont’d Arrays: E → id [ E 1 ] { t 1 = id. type ; if ( t 1 == ARRAY  E 1. type == INTEGER) E. type = id. element_type ; else E. type = error ; }

CSc 453: Semantic Analysis52 Type Checking Expressions: cont’d Function calls: E → id ‘ ( ‘ expr_list ‘ ) ’ { if ( id. return_type == VOID) E. type = error ; else if ( chk_arg_types( id, expr_list) ) /* actuals match formals in number, type */ E. type = id. return_type ; else E. type = error ; }

CSc 453: Semantic Analysis53 Type Checking Statements Different kinds of statements have different type requirements. E.g.: ― if, while statements may require boolean conditiona; ― LHS of an assignment must be an “l-value”, i.e., something that can be assigned. ― LHS and RHS of an assignment must have “compatible” types. If they are of different types, conversion will be necessary.