Theory of Compilation 236360 Erez Petrank Lecture 5: Semantic Analysis: 1.

Slides:



Advertisements
Similar presentations
CH4.1 Type Checking Md. Fahim Computer Engineering Department Jamia Millia Islamia (A Central University) New Delhi –
Advertisements

CPSC 388 – Compiler Design and Construction
Semantic Analysis and Symbol Tables
Semantics Static semantics Dynamic semantics attribute grammars
Attribute Grammars Prabhaker Mateti ACK: Assembled from many sources.
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
Chapter 5 Syntax-Directed Translation. Translation of languages guided by context-free grammars. Attach attributes to the grammar symbols. Values of the.
1 Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University Modern Compiler Design.
Lecture 08a – Backpatching & Recap Eran Yahav 1 Reference: Dragon 6.2,6.3,6.4,6.6.
1 Compiler Construction Intermediate Code Generation.
Chapter 5 Syntax Directed Translation. Outline Syntax Directed Definitions Evaluation Orders of SDD’s Applications of Syntax Directed Translation Syntax.
Compilation (Semester A, 2013/14) Lecture 6b: Context Analysis (aka Semantic Analysis) Noam Rinetzky 1 Slides credit: Mooly Sagiv and Eran Yahav.
Winter Compiler Construction T7 – semantic analysis part II type-checking Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv.
Lecture 05 – Syntax analysis & Semantic Analysis Eran Yahav 1.
Compiler Construction
Semantic analysis Parsing only verifies that the program consists of tokens arranged in a syntactically-valid combination, we now move on to semantic analysis,
Lecture 06 – Semantic Analysis Eran Yahav 1. 2 You are here Executable code exe Source text txt Compiler Lexical Analysis Syntax Analysis Parsing Semantic.
Theory of Compilation Erez Petrank Lecture 4: Semantic Analysis: 1.
Semantic analysis Enforce context-dependent language rules that are not reflected in the BNF, e.g.a function must have a return statement. Decorate AST.
Syntax-Directed Translation
Chapter 2 A Simple Compiler
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
Abstract Syntax Trees Lecture 14 Wed, Mar 3, 2004.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
Chapter 5 Syntax-Directed Translation Section 0 Approaches to implement Syntax-Directed Translation 1、Basic idea Guided by context-free grammar (Translating.
CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.
Syntax-Directed Translation
1 Semantic Analysis Aaron Bloomfield CS 415 Fall 2005.
COP4020 Programming Languages Semantics Prof. Xin Yuan.
Lesson 11 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
Lesson 3 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 8: Semantic Analysis and Symbol Tables.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
410/510 1 of 18 Week 5 – Lecture 1 Semantic Analysis Compiler Construction.
CS 363 Comparative Programming Languages Semantics.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
Review: Syntax directed translation. –Translation is done according to the parse tree. Each production (when used in the parsing) is a sub- structure of.
1 Syntax-Directed Translation Part I Chapter 5 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007.
Semantic Analysis II. Messages Please check lecturer notices in the Moodle Appeals  Legitimate: “I don’t have the bug you mentioned…”  Illegitimate:
Syntax Directed Definition and Syntax directed Translation
UNIT – 5 SYNTAX-DIRECTED TRANSLATION
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Syntax-Directed Definitions and Attribute Evaluation Compiler Design Lecture (02/18/98) Computer Science Rensselaer Polytechnic.
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
Compilation Lecture 4: Syntax Analysis Bottom Up parsing Noam Rinetzky 1.
1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator.
Lecture 05 – Semantic Analysis Eran Yahav 1. 2 You are here Executable code exe Source text txt Compiler Lexical Analysis Syntax Analysis Parsing Semantic.
CSE 420 Lecture Program is lexically well-formed: ▫Identifiers have valid names. ▫Strings are properly terminated. ▫No stray characters. Program.
LECTURE 10 Semantic Analysis. REVIEW So far, we’ve covered the following: Compilation methods: compilation vs. interpretation. The overall compilation.
Chapter4 Syntax-Directed Translation Introduction : 1.In the lexical analysis step, each token has its attribute , e.g., the attribute of an id is a pointer.
Lecture 9 Symbol Table and Attributed Grammars
Compilation /15a Lecture 6
Context-Sensitive Analysis
A Simple Syntax-Directed Translator
Constructing Precedence Table
Syntax Analysis: Bottom-Up Parsing
Abstract Syntax Trees Lecture 14 Mon, Feb 28, 2005.
Ch. 4 – Semantic Analysis Errors can arise in syntax, static semantics, dynamic semantics Some PL features are impossible or infeasible to specify in grammar.
Syntax-Directed Translation Part I
Syntax-Directed Translation Part I
Syntax-Directed Translation Part I
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
Syntax-Directed Translation Part I
SYNTAX DIRECTED DEFINITION
Compiler Construction
Syntax-Directed Translation Part I
Compiler Construction
Presentation transcript:

Theory of Compilation Erez Petrank Lecture 5: Semantic Analysis: 1

This talk slides have dark background 2 (image source:

3

4

You are here 5 Executable code Executable code exe Source text Source text txt Compiler Lexical Analysis Syntax Analysis Parsing Semantic Analysis Inter. Rep. (IR) Inter. Rep. (IR) Code Gen. characterstokens AST ( Abstract Syntax Tree) Annotated AST

What we want 6 Lexical analyzer Potato potato; Carrot carrot; x = tomato + potato + carrot,,,,,EOF Parser tomato is undefined, potato used before initialized, cannot add Potato and Carrot. symbolkindtypepropert ies xvar? tomatovar? potatovarPotato carrotvarCarrot Location. id=tomato Add Location. Id=potato Location. Id=carrot

Syntax vs. Semantics  Syntax  Program structure  Formally described via context free grammars  Semantics  Program meaning  Formally defined as various forms of semantics (e.g., operational, denotational)  It is actually NOT what “semantic analysis” phase does  Better name – “contextual analysis” 7

Contextual Analysis  Often called “Semantic analysis”  Properties that cannot be formulated via CFG  Declare before use  Even identifying the same word “w” re-appearing {wbw} is not possible with a CFG.  Initialization  Type checking  etc…  Properties that are hard to formulate via CFG  e.g., “break” only appears inside a loop  Processing of the AST 8

Contextual Analysis  Identification  Gather information about each named item in the program  e.g., what is the declaration for each usage  Context checking  Type checking  e.g., the condition in an if-statement is a Boolean 9

Identification  What is month_name?  Forward references?  Languages that don’t require declarations? 10 month : integer RANGE [1..12]; … month := 1; while (month <= 12) { print(month_name[month]); month : = month + 1; }

Symbol table  A table containing information about identifiers in the program  Single entry for each named item 11 namepostype… month1RANGE[1..12] month_name…… … month : integer RANGE [1..12]; … month := 1; while (month <= 12) { print(month_name[month]); month : = month + 1; }

Not so fast… 12 struct one_int { int i; } i; main() { i.i = 42; int t = i.i; printf(“%d”,t); { int i = 73; printf(“%d”,i); } A struct field named i A struct variable named i Assignment to the “i” field of struct “i” int variable named “i” Reading the “i” field of struct “i”

Scopes  Typically stack structured scopes  Scope entry  push new empty scope element  Scope exit  pop scope element and discard its content  Identifier declaration  identifier created inside top scope  Identifier Lookup  Search for identifier top-down in scope stack 13

Scope-structured symbol table 14 Scope stack 0 3 P “so” P “long” // 2 P “and” P “thanks” 1 P “x” P “all” // P “the” P “fish” P “thanks” // P “x” // { int the=1; int fish=2; Int thanks=3; { int x = 42; int all = 73; { … }

Scope and symbol table  Scope x Identifier -> properties  Expensive lookup  A better solution  hash table over identifiers 15

Hash-table based Symbol Table 16 name macro decl 2P “x” 1P // name macro decl 2P “thanks” 0P // name macro decl 3P “so” // Id.info

Keep also info by scope 17 Scope stack 0 3 Id.info(“so”)Id.info(“long”) // 2 Id.info(“and”)Id.info(“thanks”) 1 Id.info(“x”)Id.info(“all”) // Id.info(“the”)Id.info(“fish”) Id.info(“thanks”) // Id.info(“x”) // (now just pointers to the corresponding record in the symbol table)

Remember lexing/parsing?  How did we know to always map an identifier to the same symbol table entry?  Actually, the lexical-analyzer/parser already creates the symbol-table according to scopes. 18

19 Semantic Checks  Scope rules  Use symbol table to check that  Identifiers defined before use  No multiple definition of same identifier  Program conforms to scope rules  Type checking  Check that types in the program are consistent  How?

Types  What is a type?  Simplest answer: a set of values  Integers, real numbers, booleans, …  Why do we care?  Safety  Guarantee that certain errors cannot occur at runtime  Abstraction  Hide implementation details (e.g., of the type on the machine)  Documentation  Optimization 20

Type System (textbook definition) “A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute” Types and Programming Languages / Benjamin C. Pierce

Type System  A type system of a programming language is a way to define a “good” program  Good programs = well-typed programs  Bad programs = not well typed  Type checking  Static typing – type checking at compile time  Dynamic typing – (most) type checking at runtime  Type inference  Automatically infer types for a program (or show that there is no valid typing) 22

Static typing vs. dynamic typing  Static type checking is conservative  Any program that is determined to be well-typed is free from certain kinds of errors  May reject programs that cannot be statically determined as well typed  Why?  Dynamic type checking  May accept more programs as valid (runtime info)  Errors not caught at compile time  Runtime cost 23

24 Type Checking  Type rules specify  which types can be combined with certain operator  Assignment of expression to variable  Formal and actual parameters of a method call  Examples “drive” + “drink” 42 + “the answer” string intstring ERROR

Type Checking Rules  Basic Types  Building blocks for the type system (type rules)  e.g., int, boolean, string  Specify for each operator  Types of operands  Type of result  Type Expressions  Array types  Function types  Record types / Classes 25

26 Typing Rules If E1 has type int and E2 has type int, then E1 + E2 has type int E1 : intE2 : int E1 + E2 : int

27 More Typing Rules (examples) true : boolean E1 : int E2 : int E1 op E2 : int false : boolean int-literal : int string-literal : string op  { +, -, /, *, %} E1 : int E2 : int E1 rop E2 : boolean rop  {, >=} E1 : T E2 : T E1 rop E2 : boolean rop  { ==,!=}

28 And Even More Typing Rules E1 : boolean E2 : boolean E1 lop E2 : boolean lop  { &&,|| } E1 : int - E1 : int E1 : boolean ! E1 : boolean E1 : T[] E1.length : int E1 : T[]E2 : int E1[E2] : T E1 : int new T[E1] : T[]

Type Checking  Traverse AST and assign types for AST nodes  Use typing rules to compute node types  Alternative: type-check during parsing  More complicated alternative  But also more efficient 29

30 Example 45 > 32 && !false BinopExprUnopExpr BinopExpr … op=AND op=NEG op=GT intLiteral val=45 intLiteral val=32 boolLiteral val=false : int : boolean false : boolean int-literal : int E1 : int E2 : int E1 rop E2 : boolean rop  {, >=} E1 : boolean E2 : boolean E1 lop E2 : boolean lop  { &&,|| } E1 : boolean !E1 : boolean

Type Declarations  So far, we ignored the fact that types can also be declared 31 TYPE Int_Array = ARRAY [Integer 1..42] OF Integer; Var a : ARRAY [Integer 1..42] OF Real; (explicitly) (anonymously)

32 Var a : ARRAY [Integer 1..42] OF Real; TYPE #type01_in_line_73 = ARRAY [Integer 1..42] OF Real; Var a : #type01_in_line_73; Type Declarations

Strongly vs. weakly typed  Coercion  Strongly typed  C, C++, Java  Weakly typed  Perl, PHP 33 $a=31; $b="42x"; $c=$a+$b; print $c; main() { int a=31; char b[3]="42x"; int c=a+b; } public class… { public static void main() { int a=31; String b ="42x"; int c=a+b; } warning: initialization makes integer from pointer without a cast Output: 73 error: Incompatible type for declaration. Can't convert java.lang.String to int perl C Java

Coercions ( כפיה, אילוץ )  If we expect a value of type T1 at some point in the program, and find a value of type T2, is that acceptable? 34 float x = 3.141; int y = x;

Forward References  Forward references must be resolved  A forward reference is added to the symbol table as a forward reference, and later updated when type declaration is met  At the end of scope, must check that all forward references have been resolved 35 TYPE Ptr_List_Entry = POINTER TO List_Entry; TYPE List_Entry = RECORD Element : Integer; Next : Ptr_List_Entry; END RECORD;

Type Table  All types in a compilation unit are collected in a type table  For each type, its table entry contains:  Type constructor: basic, record, array, pointer,…  Size and alignment requirements  to be used later in code generation  Types of components (if applicable)  e.g., types of record fields 36

Type Equivalence: Name Equivalence (Used in C) t1 not (name) equivalence to t2 37 Typedef t1 = ARRAY[Integer] OF Integer; Typedef t2 = ARRAY[Integer] OF Integer; t3 equivalent to t4 Typedef t3 = ARRAY[Integer] OF Integer; Typedef t4 = t3

t5, t6, t7 are all (structurally) equivalent 38 Type Equivalence: Structural Equivalence Type t5 = RECORD c: Integer; p: POINTER TO t5; END RECORD; Type t6 = RECORD c: Integer; p: POINTER TO t6; END RECORD; Type t7 = RECORD c: Integer; p: POINTER TO RECORD c: Integer; p: POINTER to t5; END RECORD; END RECORD;

In practice  Almost all modern languages use name equivalence for efficiency. 39

l-values and r-values  What is dst? What is src?  dst is a memory location where the value should be stored (called l-value).  src is a value (called r-value). 40 dst := src

l-values and r-values (example) 41 x:= y x x47 x y … … … 17 0x x47 x y … … …

42 lvaluervalue lvalue-deref rvalueerror- expected found l-values and r-values

So far…  Identification matches applied occurrences of identifier to its defining occurrence  The symbol table maintains this information  Type checking checks which type combinations are legal  Each node of an expression in the syntax tree represents either an l-value (location) or an r- value (value) 43

Abstract Syntax Tree (AST)  Abstract away some syntactic details of the source language 44 S ➞ if E then S else S | … if (x>0) then { y = 42} else { y = 73 }

Parse tree (concrete syntax tree) 45 S 0 Sif x E > { S } id num the n E ( ) = else S { S } id num =

Abstract Syntax Tree (AST) 46 if 0x Assign idnum Rel-op op: > idnum Assign

Syntax Directed Translation  The parse tree is used to drive the translation.  Don’t only decide if the string is in the language, but also perform some actions with each rule discovered.  (already mentioned when we did recursive descent)  Semantic attributes  Attributes attached to grammar symbols  Semantic actions  How to update the attributes  Attribute grammars 47

Attribute grammars  Attributes  Every grammar symbol has attached attributes  Example: Expr.type  Semantic actions  Every production rule can define how to assign values to attributes  Example: Expr ➞ Expr + Term Expr.type = Expr.type when (Expr.type == Term.type) Error otherwise 48

Indexed symbols  Add indexes to distinguish repeated grammar symbols  Does not affect grammar (indices do not “count”)  Used with semantic actions  Expr ➞ Expr + Term Becomes Expr ➞ Expr 1 + Term  Now we can specify the actions clearly: Expr ➞ Expr 1 + Term Expr.type = Expr 1.type when (Expr 1.type == Term.type) Error otherwise 49

Example 50 ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L 1, idL 1.type = L.type addType(id.entry,L.type) L ➞ idaddType(id.entry,L.type) D float L L id1 T L id2 id3 float x,y,z float

Attribute Evaluation  Build the AST  Fill attributes of terminals with values derived from their representation  Execute evaluation rules of the nodes to assign values until no new values can be assigned  In the right order such that  No attribute value is used before its available  Each attribute will get a value only once 51

Dependencies  A semantic equation a = b1,…,bm requires computation of b1,…,bm to determine the value of a  The value of a depends on b1,…,bm  We write a  bi 52

Cycles  Cycle in the dependence graph  May not be able to compute attribute values 53 T E E.S = T.i T.i = E.s + 1 T.i E.s ASTDependence graph

Attribute Evaluation  Build the AST  Build dependency graph  Compute evaluation order using topological ordering  Execute evaluation rules based on topological ordering  Works as long as there are no cycles 54

Building Dependency Graph  All semantic equations take the form attr1 = func1(attr1.1, attr1.2,…) attr2 = func2(attr2.1, attr2.2,…)  Build a directed dependency graph G  For every attribute a of a node n in the AST create a node n.a  For every node n in the AST and a semantic action of the form b = f(c1,c2,…ck) add edges of the form (ci,b)  Actions with side effects use a dummy attribute 55

Example 56 ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L1, idL1.type = L.type addType(id.entry,L.type) L ➞ idaddType(id.entry,L.type) ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L1, idL1.type = L.type L.dmy = addType(id.entry,L.type) L ➞ idL.dmy = addType(id.entry,L.type) Convention: Add dummy variables for side effects.

Example 57 D float L L id1 T L id2 id3 float x,y,z type dmy entry type dm y ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L1, idL1.type = L.type addType(id.entry,L.type) L ➞ idaddType(id.entry,L.type)

Example 58 D float L L id1 T L id2 id3 float x,y,z type dmy entry type dmy ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L1, idL1.type = L.type L.dmy = addType(id.entry,L.type) L ➞ idL.dmy = addType(id.entry,L.type)

Topological Order  For a graph G=(V,E), |V|=k  Ordering of the nodes v1,v2,…vk such that for every edge (vi,vj)  E, i < j Example topological orders: ,

Back to example 60 float x,y,z type dmy entry type dmy float ent1 ent2 ent3 float

But what about cycles?  For a given attribute grammar, it is hard to detect if it has cyclic dependencies  Exponential cost  Special classes of attribute grammars  Our “usual trick”: sacrifice generality for efficient computation 61

Inherited vs. Synthesized Attributes  Synthesized attributes  Computed from children of a node  Inherited attributes  Computed from parents and siblings of a node  Attributes of tokens are technically considered as synthesized attributes  In general for a production rule: if an attribute is set in the parent, it is synthesized; if it is set in the child, it is inherited. 62

Example 63 D float L L id1 T L id2 id3 float x,y,z float inherited synthesized ProductionSemantic Rule D ➞ T LL.type = T.type T ➞ intT.type = integer T ➞ floatT.type = float L ➞ L1, idL1.type = L.type L.dmy = addType(id.entry,L.type) L ➞ idL.dmy = addType(id.entry,L.type)

S-attributed Grammars  Special class of attribute grammars  Only uses synthesized attributes (S-attributed)  No use of inherited attributes  Can be computed by any bottom-up parser during parsing  Attributes can be stored on the parsing stack  Reduce operation computes the (synthesized) attribute from attributes of children 64

S-attributed Grammar: example 65 ProductionSemantic Rule S ➞ E ; print(E.val) E ➞ E1 + T E.val = E1.val + T.val E ➞ T E.val = T.val T ➞ T1 * F T.val = T1.val * F.val T ➞ F T.val = F.val F ➞ (E) F.val = E.val F ➞ digit F.val = digit.lexval Value obtained from lexical analyzer (A simple example: values are not always obtainable during compile time.)

example 66 3 F T E 4 F * 7 F T S Lexval=3 Lexval= 4 Lexval=7 val=7 val=4 val=28 val=3 val=31 print(31) T +E val=28

L-attributed grammars  L-attributed attribute grammar when every attribute in a production A ➞ X 1 …X n is  A synthesized attribute, or  An inherited attribute of X j, 1 <= j <=n that only depends on  Attributes of X 1 …X j-1 to the left of X j, or  Inherited attributes of A  Such semantic attributes can be computed during a single DFS walk on the tree.  When first visiting a node, we have finished traversing all its left siblings. (Can compute inherited.)  Upon completing the traversal of a node, we have completed visiting all its descendants. (Can compute synthesized.) 67

Example: typesetting  Each box is built from smaller boxes from which it gets the height and depth, and to which it sets the point size.  pointsize (ps) – size of letters in a box. Subscript text has smaller point size of o.7p.  height (ht) – distance from top of the box to the baseline  depth (dp) – distance from baseline to the bottom of the box. 68

Example: typesetting 69 productionsemantic rules S ➞ B B.ps = 10 B ➞ B1 B2 B1.ps = B.ps B2.ps = B.ps B.ht = max(B1.ht,B2.ht) B.dp = max(B1.dp,B2.dp) B ➞ B1 sub B2 B1.ps = B.ps B2.ps = 0.7*B.ps B.ht = max(B1.ht,B2.ht – 0.25*B.ps) B.dp = max(B1.dp,B2.dp– 0.25*B.ps) B ➞ text B.ht = getHt(B.ps,text.lexval) B.dp = getDp(B.ps,text.lexval)

Computing the attributes from left to right during a DFS traversal procedure dfvisit (n: node); begin for each child m of n, from left to right begin evaluate inherited attributes of m; dfvisit (m) end; evaluate synthesized attributes of n end

Summary  Identification, synbol table  Type checking  Contextual analysis can move information between nodes in the AST  Even when they are not “local”, or context free.  Attribute grammars  Attach attributes to variables and semantic actions to derivation rules.  Attribute evaluation  Build dependency graph, topological sort, evaluate  Can sometimes be done during the parsing.  Special classes with pre-determined evaluation order: S- attributed, L-attributed 71