Lecture 8 Semantic Analysis.

The Compiler So Far
• Lexical analysis: detects inputs with illegal tokens
• Parsing: detects inputs with ill-formed parse trees
• Semantic analysis: last “front end” phase; catches all remaining errors

What’s Wrong?
• Example 1: String y = “abc”; y++;
• int y = x + 3;

Why a Separate Semantic Analysis?
• Parsing cannot catch some errors
• Some language constructs are not context-free
  - Example: all used variables must have been declared (i.e., scoping)
    ex: { int x { .. { .. x ..} ..} ..}
  - Example: a method must be invoked with arguments of the proper type (i.e., typing)
    ex: int f(int, int) {…} called by f(‘a’, 2.3, 1)

More problems require semantic analysis
• Is x a scalar, an array, or a function?
• Is x declared before it is used?
• Is x defined before it is used?
• Are any names declared but not used?
• Which declaration of x does this reference?
• Is an expression type-consistent?
• Does the dimension of a reference match the declaration?
• Where can x be stored? (heap, stack, ...)
• Does *p reference the result of a malloc()?
• Is an array reference in bounds?
• Does function foo produce a constant value?

Why is semantic analysis hard?
• need non-local information
• answers depend on values, not on syntax
• answers may involve computation

How can we answer these questions?
1. Use context-sensitive grammars (CSG)
   - the general problem is PSPACE-complete
2. Use attribute grammars (AG)
   - augment context-free grammar with rules
   - calculate attributes for grammar symbols
3. Use ad hoc techniques
   - augment grammar with arbitrary code
   - execute code at the corresponding reduction
   - store information in attributes, symbol tables

Attribute Grammars
• A generalization of CFG
• An attribute grammar is a context-free grammar with associated attributes and semantic rules
• Each grammar symbol is associated with a set of attributes
• Each production is associated with a set of semantic rules for computing attributes
• Also called a syntax-directed definition in the Dragon book

Dependences between attributes
• Attribute values are computed from constants & other attributes
  - synthesized attribute: value computed from children [& constants]
  - inherited attribute: value computed from siblings & parent [& constants]
• These rules induce a dependency graph among the attributes of parse tree nodes

Synthesized Attributes

Inherited Attributes

Example attribute grammar
A grammar to evaluate signed binary numbers

    Production            Evaluation rules
1   NUM → SIGN LIST       LIST.pos = 0
                          NUM.val = SIGN.neg ? -LIST.val : LIST.val
2   SIGN → +              SIGN.neg = false
3   SIGN → -              SIGN.neg = true
4   LIST → BIT            BIT.pos = LIST.pos
                          LIST.val = BIT.val
5   LIST0 → LIST1 BIT     LIST1.pos = LIST0.pos + 1
                          BIT.pos = LIST0.pos
                          LIST0.val = LIST1.val + BIT.val
6   BIT → 0               BIT.val = 0
7   BIT → 1               BIT.val = 2^BIT.pos

Annotated Parse Tree

Annotated Parse Tree and Dependency Graph
• val and neg are synthesized attributes
• pos is an inherited attribute
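The signed-binary-number grammar can be evaluated directly by recursive functions. The following is a minimal sketch, not the lecture's own code: the class name `SignedBinary` and the choice of a `String` as input are assumptions made for illustration. The inherited attribute pos travels down as a parameter, while the synthesized attributes val and neg come back as return values.

```java
// Hypothetical evaluator for the signed-binary-number attribute grammar.
// Inherited attribute pos is passed down as a parameter;
// synthesized attributes val and neg are returned.
class SignedBinary {
    // NUM -> SIGN LIST
    static int num(String s) {
        boolean neg = sign(s.charAt(0));           // SIGN.neg
        int val = list(s.substring(1), 0);         // LIST.pos = 0
        return neg ? -val : val;                   // NUM.val
    }

    // SIGN -> + | -
    static boolean sign(char c) {
        if (c == '+') return false;
        if (c == '-') return true;
        throw new IllegalArgumentException("expected sign, got " + c);
    }

    // LIST -> BIT | LIST BIT   (the last character plays the role of BIT)
    static int list(String bits, int pos) {
        if (bits.isEmpty()) throw new IllegalArgumentException("expected at least one bit");
        int bitVal = bit(bits.charAt(bits.length() - 1), pos);   // BIT.pos = LIST.pos
        if (bits.length() == 1) return bitVal;                   // LIST.val = BIT.val
        String rest = bits.substring(0, bits.length() - 1);
        return list(rest, pos + 1) + bitVal;                     // LIST1.pos = LIST0.pos + 1
    }

    // BIT -> 0 | 1
    static int bit(char c, int pos) {
        if (c == '0') return 0;
        if (c == '1') return 1 << pos;             // 2^BIT.pos
        throw new IllegalArgumentException("expected bit, got " + c);
    }
}
```

For example, `SignedBinary.num("-101")` evaluates the bits at positions 2, 1, 0 as 4 + 0 + 1 and then negates the sum.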

Two Notations for AGs
• Syntax-Directed Definition
• Syntax-directed Translation [Scheme]

Syntax-directed definition
Each grammar production A → α is associated with a set of semantic rules of the form b := f(c1, c2, …, ck), where f is a function and
1. b is a synthesized attribute of A and c1, c2, …, ck are attributes of grammar symbols in α, or
2. b is an inherited attribute of one of the grammar symbols in α and c1, c2, …, ck are attributes of A or grammar symbols in α

Dependencies of Attributes
• In the semantic rule b := f(c1, c2, …, ck) we say b depends on c1, c2, …, ck
• The semantic rule for b must be evaluated after the semantic rules for c1, c2, …, ck
• The dependencies of attributes can be represented by a directed graph called the dependency graph

Dependency Graph

Evaluation Order
Apply topological sort on the dependency graph:
a1 := float
a2 := a1
addtype(a3, a2) /* a4 */
a5 := a2
addtype(a6, a5) /* a7 */
a8 := a5
addtype(a9, a8) /* a10 */
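A topological sort of the dependency graph can be sketched with Kahn's algorithm. This is an illustrative sketch under stated assumptions: the class name `DependencyOrder` is hypothetical, and the graph is given as a map from each attribute to the attributes that depend on it. A cycle means the attribute grammar is circular, so the sketch reports it as an error.

```java
import java.util.*;

class DependencyOrder {
    // Returns the nodes in an order where every node comes after all nodes it depends on.
    // edges maps a node to the list of nodes that depend on it.
    static List<String> topoSort(Map<String, List<String>> edges) {
        // Count incoming edges (unevaluated dependencies) for every node.
        Map<String, Integer> indegree = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : edges.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (String to : e.getValue()) indegree.merge(to, 1, Integer::sum);
        }
        // Nodes with no dependencies can be evaluated immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.remove();
            order.add(n);
            for (String to : edges.getOrDefault(n, Collections.emptyList())) {
                // One dependency of 'to' is now evaluated; release it when none remain.
                if (indegree.merge(to, -1, Integer::sum) == 0) ready.add(to);
            }
        }
        if (order.size() != indegree.size())
            throw new IllegalStateException("circular attribute dependency");
        return order;
    }
}
```

Any order the function returns is a valid evaluation order; several different orders may be valid for the same graph.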

S-attributed Grammar
• S-attributed grammar: all attributes are synthesized attributes
• can be evaluated in one pass
• useful in many contexts (e.g., a calculator)
• a good match for LR parsing

Example

L-attributed Grammar
A syntax-directed definition is L-attributed if each attribute in each semantic rule for each production A → X1 X2 … Xn is a synthesized attribute, or an inherited attribute of Xj, 1 ≤ j ≤ n, depending only on
1. the attributes of X1, X2, …, Xj-1 and/or
2. the inherited attributes of A

Example

Counter Example

Pros and cons of AGs
• Advantages:
  - clean formalism
  - automatic generation of evaluator
  - high-level specification
• Disadvantages:
  - efficiency determined by evaluation strategy
  - increased space requirements
  - circularity testing
  - handling non-local information

Syntax-directed Translation [scheme]
• allows arbitrary actions
• can have global data structures
• can place actions within productions

The definition
• A translation scheme is an attribute grammar in which semantic rules are enclosed between braces { and } and are inserted within the right sides of productions
• The value of an attribute must be available when a semantic rule refers to it

Examples
• YACC: A ::= B C { $$ = concat($1,$2); }
• CUP: A ::= B:m C:p { RESULT = concat(m,p); }
Typical uses
• build abstract syntax tree & symbol table
• perform error/type checking

Example
D → T {L.in := T.type} L
T → int {T.type := integer}
T → float {T.type := float}
L → {L1.in := L.in} L1 ‘,’ id {addtype(id.entry, L.in)}
L → id {addtype(id.entry, L.in)}
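One way to run a translation scheme like this one is a recursive-descent parser with the actions embedded at the matching points. The sketch below is an illustration only: it replaces the left-recursive production for L with a loop (a substitution, since recursive descent cannot handle left recursion directly), takes pre-split tokens as input, and uses the hypothetical names `DeclTranslator` and `symtab`. It does not check that identifiers are well formed.

```java
import java.util.*;

class DeclTranslator {
    private final List<String> tokens;
    private int pos = 0;
    // Stands in for the symbol table updated by addtype: id -> type.
    final Map<String, String> symtab = new LinkedHashMap<>();

    DeclTranslator(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // D -> T { L.in := T.type } L
    void decl() {
        String type = type();       // synthesized T.type
        list(type);                 // passed down as inherited L.in
        if (pos != tokens.size())
            throw new IllegalStateException("unexpected token " + tokens.get(pos));
    }

    // T -> int { T.type := integer } | float { T.type := float }
    private String type() {
        String t = next();
        if (t.equals("int")) return "integer";
        if (t.equals("float")) return "float";
        throw new IllegalStateException("expected type, got " + t);
    }

    // L -> id { addtype(id.entry, L.in) } ( ',' id { addtype(id.entry, L.in) } )*
    private void list(String in) {
        symtab.put(next(), in);
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos++;                  // consume ','
            symtab.put(next(), in);
        }
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        return tokens.get(pos++);
    }
}
```

Calling `new DeclTranslator("float", "a", ",", "b").decl()` records both `a` and `b` with type `float`, because the type is known before the identifier list is processed.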

Example

Restrictions on translation schemes
1. An inherited attribute for a symbol on the right side must be computed in a semantic rule before that symbol
2. A semantic rule must not refer to a synthesized attribute for a symbol to its right
3. A synthesized attribute for the symbol on the left can only be computed after all attributes it depends on have been computed

From L-Attributed Definitions to Translation Schemes
S → {B.ps := 10} B {S.ht := B.ht}
B → {B1.ps := B.ps} B1 {B2.ps := B.ps} B2 {B.ht := max(B1.ht, B2.ht)}
B → {B1.ps := B.ps} B1 sub {B2.ps := shrink(B.ps)} B2 {B.ht := disp(B1.ht, B2.ht)}
B → text {B.ht := text.ht × B.ps}

Abstract syntax tree
• An abstract syntax tree is a condensed form of parse tree useful for representing constructs
• It is usually a parse tree with the nodes for most non-terminal symbols removed
• Example: a tree representing “x - 2 * y”
• The tree can also be written in a linearized (operator) form: x 2 y * - in postfix form
• A popular intermediate representation
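The relationship between the tree and its postfix form can be shown with a small sketch. The class names `Ast`, `Leaf`, and `BinOp` are hypothetical; a post-order walk (left operand, right operand, then operator) produces the linearized form.

```java
// A minimal abstract syntax tree for binary expressions.
abstract class Ast {
    // Linearize the tree in postfix (operand operand operator) order.
    abstract String postfix();
}

// A leaf holds an identifier or a constant.
class Leaf extends Ast {
    final String name;
    Leaf(String name) { this.name = name; }
    @Override String postfix() { return name; }
}

// An interior node holds an operator and its two operands.
class BinOp extends Ast {
    final char op;
    final Ast left, right;
    BinOp(char op, Ast left, Ast right) {
        this.op = op; this.left = left; this.right = right;
    }
    @Override String postfix() {
        return left.postfix() + " " + right.postfix() + " " + op;
    }
}
```

Building `new BinOp('-', new Leaf("x"), new BinOp('*', new Leaf("2"), new Leaf("y")))` and calling `postfix()` yields `x 2 y * -`. Note that the tree needs no parentheses or precedence information: the nesting already encodes that the multiplication binds tighter.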

Scope
• Matching identifier declarations with uses
• Important static analysis step in most languages

Scope (Cont.)
• The scope of an identifier is the portion of a program in which that identifier is accessible
• The same identifier may refer to different things in different parts of the program
  - Different scopes for the same name don’t overlap
• An identifier may have restricted scope

Static vs. Dynamic Scope
• Most languages have static scope
  - Scope depends only on the program text, not run-time behavior
  - C and Java have static scope
• A few languages are dynamically scoped
  - Lisp, SNOBOL
  - Lisp has changed to mostly static scoping
  - Scope depends on execution of the program

Static Scoping Example { int x = 0; x++; { int x = 1; x++; } }

Static Scoping Example (Cont.)
{ int x = 0; x++; { int x = 1; x++; } }
Uses of x refer to the closest enclosing definition

Dynamic Scope
• A dynamically-scoped variable refers to the closest enclosing binding in the execution of the program
• Example:
    (defun g (y) (let ((a 4)) (f 3)))
    (defun f (x) a)
    (g 5) => 4
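Java itself is statically scoped, but dynamic scoping can be simulated to show why `(g 5)` yields 4. In this sketch (all names are hypothetical), each variable name has a stack of bindings; entering a function or a `let` pushes a binding, leaving it pops the binding, and a lookup always reads the most recent binding that is still active at run time.

```java
import java.util.*;

class DynamicScopeDemo {
    // Each name maps to a stack of bindings; the top is the most recent active binding.
    static final Map<String, Deque<Integer>> bindings = new HashMap<>();

    static void bind(String name, int value) {
        bindings.computeIfAbsent(name, k -> new ArrayDeque<>()).push(value);
    }

    static void unbind(String name) {
        bindings.get(name).pop();
    }

    static int lookup(String name) {
        Deque<Integer> stack = bindings.get(name);
        if (stack == null || stack.isEmpty())
            throw new NoSuchElementException("unbound variable: " + name);
        return stack.peek();
    }

    // (defun f (x) a): a is not bound in f, so it is found in the caller's bindings.
    static int f(int x) {
        bind("x", x);
        try { return lookup("a"); } finally { unbind("x"); }
    }

    // (defun g (y) (let ((a 4)) (f 3)))
    static int g(int y) {
        bind("y", y);
        bind("a", 4);
        try { return f(3); } finally { unbind("a"); unbind("y"); }
    }
}
```

`DynamicScopeDemo.g(5)` returns 4 because the binding of `a` made inside `g` is still on the stack while `f` runs. Under static scoping the body of `f` would have no visible declaration of `a` at all.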

Symbol Tables
• Consider the block B: { int x = 0; Es }
• Idea:
  - Before processing Es, add the definition of x to the current definitions, overriding any other definition of x
  - After processing Es, remove the definition of x and restore the old definition of x
• A symbol table is a data structure that tracks the current bindings of identifiers

Symbol tables
• A symbol table associates values or attributes (e.g., types and values) with names
• What should be in a symbol table?
  - variable and procedure names
  - literal constants and strings

Symbol Tables
What information might the compiler need?
• textual name
• data type
• declaring procedure
• lexical level of declaration
• if array, number and size of dimensions
• if procedure, number and type of parameters

Symbol Tables Implementation
• usually implemented as hash tables
• How to handle nested lexical scoping?
  - when we ask about a name, we want the closest lexical declaration
• One solution
  - use one symbol table per scope
  - tables chained to enclosing scopes
  - insert names in the table for the current scope
  - name lookup starts in the current table; if needed, checks enclosing scopes in order
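The "one table per scope, chained to enclosing scopes" solution can be sketched in a few lines. The class name `Scope` and the use of strings for names and types are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class Scope {
    private final Map<String, String> table = new HashMap<>();   // name -> type
    private final Scope parent;   // enclosing scope, or null for the outermost scope

    Scope(Scope parent) { this.parent = parent; }

    // Insert into the table of this scope only.
    // Returns false if the name is already declared in this scope.
    boolean declare(String name, String type) {
        return table.putIfAbsent(name, type) == null;
    }

    // Start in the current table, then check enclosing scopes in order,
    // so the closest lexical declaration wins.
    String lookup(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            String type = s.table.get(name);
            if (type != null) return type;
        }
        return null;   // not declared in any enclosing scope
    }
}
```

Leaving a scope needs no cleanup in this design: the compiler simply goes back to using the parent `Scope` object, and the inner table can be kept for later phases or discarded.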

[Figure: nested scopes A, B1, B2, C and their chained symbol tables TA, TB1, TB2, TC]

Imperative-style Symbol Table
• beginScope(): start a new nested scope
• endScope(): exit current scope
• addSymbol(x, value): add a symbol x to the table
• Object lookup(x): finds current x (or null)
• checkScope(x): true if x is defined in the current scope (needed for checking duplicate declarations in the same scope)

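class SymbolTable
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

// Symbol is the compiler's identifier class (not shown on the slide).
class SymbolTable {
    static class Entry {
        Symbol id; Object value; int scope;
        Entry outer;   // binding of the same name shadowed by this one, or null
        Entry(Symbol id, Object value, int scope, Entry outer) {
            this.id = id; this.value = value; this.scope = scope; this.outer = outer;
        }
    }
    private Map<Symbol, Entry> map = new HashMap<>();
    private int curScope;
    private Stack<Symbol> stack = new Stack<>();   // declared names; null marks a scope start

    Object lookup(Symbol s) {
        Entry e = map.get(s);
        return (e == null) ? null : e.value;
    }
    boolean checkScope(Symbol x) {
        Entry e = map.get(x);
        return e != null && e.scope == curScope;
    }
    void beginScope() { curScope++; stack.push(null); }
    void endScope() {
        for (Symbol s = stack.pop(); s != null; s = stack.pop()) {
            Entry outer = map.get(s).outer;
            if (outer == null) map.remove(s); else map.put(s, outer);   // restore old definition
        }
        curScope--;
    }
    void addSymbol(Symbol x, Object v) {
        assert !checkScope(x);   // no duplicate declaration in the current scope
        stack.push(x);
        map.put(x, new Entry(x, v, curScope, map.get(x)));
    }
    // …
}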

Types
• What is a type?
  - The notion varies from language to language
• Consensus
  - A set of values
  - A set of operations on those values
• Classes are one instantiation of the modern notion of type

Why Do We Need Type Systems?
• Consider the assembly language fragment
    addi r1, r2, r3
• What are the types of r1, r2, r3?

Types and Operations
• Certain operations are legal for values of each type
  - It doesn’t make sense to add a function pointer and an integer in C
  - It does make sense to add two integers
  - But both have the same assembly language implementation!

Type Systems
• A language’s type system specifies which operations are valid for which types
• The goal of type checking is to ensure that operations are used with the correct types
  - Enforces the intended interpretation of values, because nothing else will!
• Type systems provide a concise formalization of the semantic checking rules

What Can Types Do For Us?
• Can detect certain kinds of errors:
  - “abc”++; x = ar[“abc”]; int x = “abc”;
• Memory errors:
  - Reading from an invalid pointer, etc.
  - int x[50]; x[50] = 3;
• Expressiveness (overloading, polymorphism)
  - Help determine which methods/constructors are invoked
  - Ex: given add(Complex, Complex), add(int, int), add(String, String), ..., the call add(23, 14) invokes add(int, int)
• Provide information for code generation
  - ex: memory size

Type Checking Overview
Three kinds of languages:
• Statically typed: all or almost all checking of types is done as part of compilation (C, Java, Cool)
• Dynamically typed: almost all checking of types is done as part of program execution (Scheme)
• Untyped: no type checking (machine code)

Pros and cons
• Static typing:
  - catches many programming errors at compile time
  - avoids overhead of runtime type checks
• Dynamic typing:
  - static type systems are restrictive
  - rapid prototyping is easier in a dynamic type system

The practice
• Most code is written in statically typed languages with an “escape” mechanism
  - Unsafe casts in C, Java
    (Person *) malloc(1024)                 // C
    Object x = …; Person p = (Person) x;    // Java
• It’s debatable whether this compromise represents the best or worst of both worlds

Type systems
• Types
  - A type is a set of values that share a set of common properties, e.g., int, InputStream, String, …
  - Defined by the language (built-in types) and/or the programmer (user-defined types)
  - Types are represented by type expressions
• Type system
  1. a set of types in a programming language, and
  2. a collection of rules for assigning types to the various parts of a program
• A type checker enforces a type system on users’ programs

Type Systems (continued)
Example type rules:
• E1 and E2 are of the type int ==> E1+E2 is of the type int
• x = E ==> x and E must have the same type

Type checking
• Type checker
  - enforces rules of the type system
  - may be strong/weak, static/dynamic
• Static type checking
  - performed at compile time
  - early detection, no run-time overhead
  - not always possible (e.g., A[i])
• Dynamic type checking
  - performed at run time
  - more flexible, rapid prototyping
  - overhead to check run-time type tags

Type expressions
• Type expressions (vs. value expressions)
  - used to represent the type of a language construct
  - describe both language and programmer types
• Examples
  - basic types: int, float, char, ...
  - constructed types: arrays, records, pointers, functions, ...

Type Expressions
• A basic type is a type expression
  - boolean, char, integer, real, void, typeError
• A type constructor applied to type expressions is a type expression
  - array: array(I, T)
  - record: T1 × T2 × … × Tn
  - pointer: pointer(T)
  - function: D → R

Type Expressions
Constructing new types: if T, T1, …, Tn are type expressions and I is a non-negative integer, then
  array(I, T)                  // array type
  T1 × T2 × … × Tn             // record type
  pointer(T)                   // pointer type
  T1 × T2 × … × Tn → T         // function type
are type expressions

A simple type checker
Using a synthesized attribute grammar, we will describe a type checker for arrays, pointers, statements, and functions.
Grammar for source language:
P → D ; S
D → D ; D | id : T
T → char | integer | array [num] of T | * T
E → literal | num | id | E mod E | E [ E ] | * E
S → id = E; | if E then S | while E do S | S ; S

Basic types
• char, integer, typeError
• assume all arrays start at 1, e.g., array [256] of char results in the type expression array(256, char)
• * builds a pointer type, so * integer results in the type expression pointer(integer)

Type Declarations
D → id : T                   { addType(id.entry, T.type) }
T → char                     { T.type = char }
T → integer                  { T.type = integer }
T → * T1                     { T.type = pointer(T1.type) }
T → array [num] of T1        { T.type = array(num.val, T1.type) }

Type checking of expressions
E → literal        { E.type = char }
E → num            { E.type = integer }
E → id             { E.type = lookup(id.entry) }
E → E1 mod E2      { E.type = (E1.type == integer and E2.type == integer) ? integer : typeError }
E → E1 [ E2 ]      { E.type = (E1.type == array(s, t) and E2.type == integer) ? t : typeError }
E → * E1           { E.type = (E1.type == pointer(t)) ? t : typeError }
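These expression rules translate almost directly into a tree walk that computes a synthesized type for every node. The sketch below is one possible rendering, not the lecture's implementation: all class names are hypothetical, the symbol table is a plain `Map`, and treating an undeclared identifier as `typeError` is an assumption the slides do not state.

```java
import java.util.Map;

// Type expressions: basic types plus the constructors array(I, T) and pointer(T).
abstract class Type {}
final class Basic extends Type {
    static final Basic CHAR = new Basic("char"), INTEGER = new Basic("integer"),
                       TYPE_ERROR = new Basic("typeError");
    private final String name;
    private Basic(String name) { this.name = name; }
    @Override public String toString() { return name; }
}
final class ArrayType extends Type {
    final int size; final Type elem;
    ArrayType(int size, Type elem) { this.size = size; this.elem = elem; }
    @Override public String toString() { return "array(" + size + ", " + elem + ")"; }
}
final class PointerType extends Type {
    final Type target;
    PointerType(Type target) { this.target = target; }
    @Override public String toString() { return "pointer(" + target + ")"; }
}

// Expression nodes; type(...) computes the synthesized attribute E.type.
abstract class Expr {
    abstract Type type(Map<String, Type> symtab);
}
final class Literal extends Expr {            // E -> literal
    Type type(Map<String, Type> s) { return Basic.CHAR; }
}
final class Num extends Expr {                // E -> num
    Type type(Map<String, Type> s) { return Basic.INTEGER; }
}
final class Id extends Expr {                 // E -> id
    final String name;
    Id(String name) { this.name = name; }
    // Assumption: an undeclared name yields typeError.
    Type type(Map<String, Type> s) { return s.getOrDefault(name, Basic.TYPE_ERROR); }
}
final class Mod extends Expr {                // E -> E1 mod E2
    final Expr left, right;
    Mod(Expr left, Expr right) { this.left = left; this.right = right; }
    Type type(Map<String, Type> s) {
        return left.type(s) == Basic.INTEGER && right.type(s) == Basic.INTEGER
                ? Basic.INTEGER : Basic.TYPE_ERROR;
    }
}
final class Index extends Expr {              // E -> E1 [ E2 ]
    final Expr array, index;
    Index(Expr array, Expr index) { this.array = array; this.index = index; }
    Type type(Map<String, Type> s) {
        Type a = array.type(s);
        return a instanceof ArrayType && index.type(s) == Basic.INTEGER
                ? ((ArrayType) a).elem : Basic.TYPE_ERROR;
    }
}
final class Deref extends Expr {              // E -> * E1
    final Expr target;
    Deref(Expr target) { this.target = target; }
    Type type(Map<String, Type> s) {
        Type t = target.type(s);
        return t instanceof PointerType ? ((PointerType) t).target : Basic.TYPE_ERROR;
    }
}
```

With `buf` declared as `array(256, char)`, the expression `buf[num]` gets type `char`, while `buf[literal]` gets `typeError` because the index is not an integer. Basic types are compared by identity here, which works only because each basic type is a single shared object.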

Type checking of Statements
P → D “;” S
S → id “=” E         { S.type = (lookup(id.entry) == E.type) ? void : typeError }
S → if E then S1     { S.type = (E.type == boolean) ? S1.type : typeError }
S → while E do S1    { S.type = (E.type == boolean) ? S1.type : typeError }
S → S1 “;” S2        { S.type = (S1.type == void and S2.type == void) ? void : typeError }

Type Checking of Functions
T → T1 ‘→’ T2     { T.type := T1.type → T2.type }
E → E1 ( E2 )     { E.type := (E1.type == s → t and E2.type == s) ? t : typeError }