Semantic Analysis Having figured out the program’s structure, now figure out what it means.

Slides:



Advertisements
Similar presentations
CPSC 388 – Compiler Design and Construction
Advertisements

Semantic Analysis and Symbol Tables
Symbol Table.
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Intermediate Code Generation
Semantic Analysis Chapter 6. Two Flavors  Static (done during compile time) –C –Ada  Dynamic (done during run time) –LISP –Smalltalk  Optimization.
Programming Languages and Paradigms
Programming Languages and Paradigms The C Programming Language.
1 Compiler Construction Intermediate Code Generation.
Winter Compiler Construction T7 – semantic analysis part II type-checking Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv.
Overview of Previous Lesson(s) Over View  Front end analyzes a source program and creates an intermediate representation from which the back end generates.
INF 212 ANALYSIS OF PROG. LANGS Type Systems Instructors: Crista Lopes Copyright © Instructors.
CSE 425: Semantics II Implementing Scopes A symbol table is in essence a dictionary –I.e., every name appears in it, with the info known about it –Usually.
Type Checking.
Compiler Construction
Objects and Classes David Walker CS 320. Advanced Languages advanced programming features –ML data types, exceptions, modules, objects, concurrency,...
Elaboration or: Semantic Analysis Compiler Baojian Hua
Run time vs. Compile time
1 Type Type system for a programming language = –set of types AND – rules that specify how a typed program is allowed to behave Why? –to generate better.
Guide To UNIX Using Linux Third Edition
Building An Interpreter After having done all of the analysis, it’s possible to run the program directly rather than compile it … and it may be worth it.
Programming Languages 2nd edition Tucker and Noonan Chapter 6 Types I was eventually persuaded of the need to design programming notations so as to maximize.
Cs164 Prof. Bodik, Fall Symbol Tables and Static Checks Lecture 14.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
2.2 A Simple Syntax-Directed Translator Syntax-Directed Translation 2.4 Parsing 2.5 A Translator for Simple Expressions 2.6 Lexical Analysis.
CSc 453 Semantic Analysis Saumya Debray The University of Arizona Tucson.
Semantic Analysis Legality checks –Check that program obey all rules of the language that are not described by a context-free grammar Disambiguation –Name.
OOPs Object oriented programming. Based on ADT principles  Representation of type and operations in a single unit  Available for other units to create.
Semantic Analysis CS 671 February 5, CS 671 – Spring The Compiler So Far Lexical analysis Detects inputs with illegal tokens –e.g.: main$
Data Structures Using C++ 2E
COP4020 Programming Languages
CSC3315 (Spring 2009)1 CSC 3315 Programming Languages Hamid Harroud School of Science and Engineering, Akhawayn University
COMPILERS Semantic Analysis hussein suleman uct csc3005h 2006.
1 Semantic Analysis Aaron Bloomfield CS 415 Fall 2005.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Tevfik Bultan Lecture 12: Pointers continued, C strings.
APCS Java AB 2004 Review of CS1 and CS2 Review for AP test #1 Sources: 2003 Workshop notes from Chris Nevison (Colgate University) AP Study Guide to go.
COMPILERS Symbol Tables hussein suleman uct csc3003s 2007.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 8: Semantic Analysis and Symbol Tables.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Interpretation Environments and Evaluation. CS 354 Spring Translation Stages Lexical analysis (scanning) Parsing –Recognizing –Building parse tree.
410/510 1 of 18 Week 5 – Lecture 1 Semantic Analysis Compiler Construction.
CSE 425: Data Types I Data and Data Types Data may be more abstract than their representation –E.g., integer (unbounded) vs. 64-bit int (bounded) A language.
Compiler Construction Dr. Noam Rinetzky and Orr Tamir School of Computer Science Tel Aviv University
CPS 506 Comparative Programming Languages Syntax Specification.
Programming Languages 2nd edition Tucker and Noonan Chapter 6 Type Systems I was eventually persuaded of the need to design programming notations so as.
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
Overview of Previous Lesson(s) Over View  A program must be translated into a form in which it can be executed by a computer.  The software systems.
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
Chapter 1 Introduction Major Data Structures in Compiler
OOPs Object oriented programming. Abstract data types  Representationof type and operations in a single unit  Available for other units to create variables.
CS536 Types 1. Roadmap Back from our LR Parsing Detour Name analysis – Static v dynamic – Scope Today – Type checking 2 Scanner Parser Tokens Semantic.
SOEN 343 Software Design Section H Fall 2006 Dr Greg Butler
1 A Simple Syntax-Directed Translator CS308 Compiler Theory.
Semantic Analysis II Type Checking EECS 483 – Lecture 12 University of Michigan Wednesday, October 18, 2006.
1 Structure of Compilers Lexical Analyzer (scanner) Modified Source Program Parser Tokens Semantic Analysis Syntactic Structure Optimizer Code Generator.
C H A P T E R T W O Linking Syntax And Semantics Programming Languages – Principles and Paradigms by Allen Tucker, Robert Noonan.
LECTURE 3 Compiler Phases. COMPILER PHASES Compilation of a program proceeds through a fixed series of phases.  Each phase uses an (intermediate) form.
CS412/413 Introduction to Compilers Radu Rugina Lecture 11: Symbol Tables 13 Feb 02.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 11: Pointers.
Lecture 9 Symbol Table and Attributed Grammars
A Simple Syntax-Directed Translator
Constructing Precedence Table
Semantic Analysis with Emphasis on Name Analysis
CS 536 / Fall 2017 Introduction to programming languages and compilers
Names, Binding, and Scope
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
CSE401 Introduction to Compiler Construction
Compiler Construction
Compiler Construction
Presentation transcript:

Semantic Analysis Having figured out the program’s structure, now figure out what it means

Compiler Passes Analysis of input program (front-end) character stream Lexical Analysis Code Generation Optimization Intermediate Code Generation Semantic Analysis Syntactic Analysis annotated AST abstract syntax tree token stream target language intermediate form intermediate form Synthesis of output program (back-end)

Semantic Analysis/Checking Semantic analysis: the final part of the analysis half of compilation –afterwards comes the synthesis half of compilation Purposes: perform final checking of legality of input program, “missed” by lexical and syntactic checking name resolution, type checking,break stmt in loop,... “understand” program well enough to do synthesis Typical goal: relate assignments to & references of particular variable

Symbol Tables Key data structure during semantic analysis, code generation Stores info about the names used in program –a map (table) from names to info about them –each symbol table entry is a binding –a declaration adds a binding to the map –a use of a name looks up binding in the map –report a type error if none found

An Example class C { int x; boolean y; int f(C c) { int z;......z...c...new C()...x...f(..)... }

ZPL Compiler ZPL is a research programming language for high performance parallel computers –Developed at UW over past decade –Language design + compiler effort Language design goals –Parallel programming is difficult, implies new abstractions are needed; invent program constructs –Help programmers produce fast, portable, convenient programs Compiler goals –Parallel computers are different and complex, a tough target –Figure out how to compile new abstractions –Optimize for new set of constraints Check out:

Regions: A New Construct Regions are a new idea for parallelism A region is a set of indices -- like an array declaration without any data region R = [1..n, 1..n]; -- {(1,1), (1,2),…,(n,n)} region BigR = [0..n+1, 0..n+1]; Region V = [1..13]; Regions are used to declare array variables var A, B, C : [R] double; D : [BigR] double; Clubs : [V] byte;

Regions [continued] Regions are also used to control the parallel execution of statements [1..n,1..n] A := B*C; -- Array op on n^2 data [BigR] D := 0; -- Array op on (n+2)^2 data [R] D := A; -- Replace interior of D [2..10] Clubs := Index1; -- Constant array [1] Clubs := 'A'; -- Assign single element x := 11; [x] Clubs := ’J';

Region Syntax Problems Regions are a new idea … must work out the syntactic forms –For region R = [1..n,1..n] –Define regDecl ::= region id = regDef regDef ::= [ regionSpec ] regionSpec ::= dimSpec | dimSpec, regionSpec dimSpec ::= range | single | … range ::= bound.. bound bound ::= int | var | …

Using Region Definitions For declaration arrayDecl ::= var idList : regDef dataType var A, B, C : [1..n,1..n] double; For statement control stmt ::= [regDef] stmtChoice ; stmtChoice ::= assign | if | … [R] A := B * C;

Two Syntax Problems Literal region specifications [ [1..n,1..n] ] A := B*C; Singleton region specifications vs region names [R] A := B; [x] Clubs := 'J';

More on ZPL In the future …

A Bigger Example class C { int x; boolean y; int f(C c) { int z;... { boolean x; C z; int f;...z...c...new C()...x...f(...)... }...z...c...new C()...x...f(...)... }

Nested Scopes Can have same name declared in different scopes Want references to use closest textually-enclosing declaration static/lexical scoping, block structure closer declaration shadows declaration of enclosing scope Simple solution: –one symbol table per scope –each scope’s symbol table refers to its lexically enclosing scope’s symbol table –root is the global scope’s symbol table –look up declaration of name starting with nearest symbol table, proceed to enclosing symbol tables if not found locally All scopes in program form a tree

Name Spaces Sometimes we can have same name refer to different things, but still unambiguously. Example: class F { int F(F F) { // 3 different F’s are available here!... new F() F = this.F(...)... } In MiniJava: threename spaces classes, methods, and variables We always know which we mean for each name reference, based on its syntactic position Simple solution: symbol table stores a separate map for each name space

Information About Names Different kinds of declarations store different information about their names –must store enough information to be able to check later references to the name A variable declaration: its type whether it’s final, etc. whether it’s public, etc. (maybe) whether it’s a local variable, an instance variable, a global variable, or...

Information About Names (Continued) A method declaration: its argument and result types whether it’s static, etc. whether it’s public, etc. A class declaration: its class variable declarations its method and constructor declarations its superclass

Generic Type Checking Algorithm To do semantic analysis & checking on a program, recursively type check each of the nodes in the program’s AST,each in the context of the symbol table for its enclosing scope going down, create any nested symbol tables & context needed recursively type check child subtrees on the way back up, check that the children are legal in the context of their parents Each AST node class defines its own type check method, which fills in the specifics of this recursive algorithm Generally: declaration AST nodes add bindings to the current symbol table statement AST nodes check their subtrees expression AST nodes check their subtrees and return a result type

MiniJava Type Check Implementation In the SymbolTable subdirectory: Various SymbolTable classes, organized into a hierarchy: SymbolTable GlobalSymbolTable NestedSymbolTable ClassSymbolTable CodeSymbolTable Support the following operations (and more): declareClass, lookupClass declareInstanceVariable, declareLocalVariable, lookupVariable declareMethod, lookupMethod

Class, Variable and Method Information lookupClass returns a ClassSymbolTable –includes all the information about the class’s interface lookupVariable returns a VarInterface –stores the variable’s type A hierarchy of implementations: VarInterface LocalVarInterface InstanceVarInterface lookupMethod returns a MethodInterface –stores the method’s argument and result types

Key AST Type Check Operations void Program.typecheck() throws TypecheckCompilerExn; –typecheck the whole program void Stmt.typecheck(CodeSymbolTable) throws TypecheckCompilerExn; –Type check a statement in the context of the given symbol table ResolvedType Expr.typecheck(CodeSymbolTable) throws TypecheckCompilerExn; –type check an expression in the context of the given symbol table, returning the type of the result

Forward References Typechecking class declarations is tricky: need to allow for forward references from the bodies of earlier classes to the declarations of later classes class First { Second next; // must allow this forward ref int f() {... next.g()... // and this forward ref } class Second { First prev; int g() {... prev.f()... }

Supporting Forward References Simple solution: type check a program’s class declarations in multiple passes first pass: remember all class declarations { First --> class{?}, Second -->class{?}} second pass: compute interface to each class, checking class types in headers { First --> class{ next:Second }, Second -->class{ prev:First }} third pass: check method bodies, given interfaces

Supporting Forward References [continued] void ClassDecl.declareClass(GlobalSymbolTable) throws TypecheckCompilerExn; declare the class in the global symbol table void ClassDecl.computeClassInterface() throws TypecheckCompilerExn; fill out the class’s interface, given the declared classes void ClassDecl.typecheckClass() throws TypecheckCompilerExn; type check the body of the class, given all classes’ interfaces

Example Type Checking Operation class VarDeclStmt { String name; Type type; void typecheck(CodeSymbolTable st) throws TypecheckCompilerException { st.declareLocalVar(type.resolve(st), name); } resolve checks that a syntactic type expression is a legal type, and returns the corresponding resolved type declareLocalVar checks for duplicate variable declaration in this scope

Example Type Checking Operation class AssignStmt { String lhs; Expr rhs; void typecheck(CodeSymbolTable st) throws TypecheckCompilerException { VarInterface lhs_iface = st.lookupVar(lhs); ResolvedType lhs_type = lhs_iface.getType(); ResolvedType rhs_type = rhs.typecheck(st); rhs_type.checkIsAssignableTo(lhs_type); } lookupVar checks that the name is declared as a var checkIsAssignableTo verifies that an expression yielding the rhs type can be assigned to a variable declared to be of lhs type initially, rhs type is equal to or a subclass of lhs type

Example Type Checking Operation class IfStmt { Expr test; Stmt then_stmt; Stmt else_stmt; void typecheck(CodeSymbolTable st) throws TypecheckCompilerException { ResolvedType test_type = test.typecheck(st); test_type.checkIsBoolean(); then_stmt.typecheck(st); else_stmt.typecheck(st); } checkIsBoolean checks that the type is a boolean

Example Type Checking Operation class BlockStmt { List stmts; void typecheck(CodeSymbolTable st) throws TypecheckCompilerException { CodeSymbolTable nested_st = new CodeSymbolTable(st); foreach Stmt stmt in stmts { stmt.typecheck(nested_st); } } (Garbage collection will reclaim nested_st when done)

Example Type Checking Operation class IntLiteralExpr extends Expr { int value; ResolvedType typecheck(CodeSymbolTable st) throws TypecheckCompilerException { return ResolvedType.intType(); } ResolvedType.intType() returns the resolved int type

Example Type Checking Operation class VarExpr extends Expr { String name; ResolvedType typecheck(CodeSymbolTable st) throws TypecheckCompilerException { VarInterface iface = st.lookupVar(name); return iface.getType(); }

Example Type Checking Operation class AddExpr extends Expr { Expr arg1; Expr arg2; ResolvedType typecheck(CodeSymbolTable st) throws TypecheckCompilerException { ResolvedType arg1_type = arg1.typecheck(st); ResolvedType arg2_type = arg2.typecheck(st); arg1_type.checkIsInt(); arg2_type.checkIsInt(); return ResolvedType.intType(); }

Polymorphism and Overloading Some operations are defined on multiple types Example: assignment statement: lhs = rhs; works over any lhs & rhs types, as long as they’re compatible works the same way for all such types Assignment is a polymorphic operation Another example: equals expression: expr1 == expr2 works if both exprs are ints or both are booleans (but nothing else, in MiniJava) compares integer values if both are ints, compares boolean values if both are booleans works differently for different argument types Equality testing is an overloaded operation

Polymorphism and Overloading [continued] Full Java allows methods & constructors to be overloaded, too different methods can have same name but different argument types Java 1.5 supports (parametric) polymorphism via generics: parameterized classes and methods

An Example Overloaded Type Check class EqualExpr extends Expr { Expr arg1; Expr arg2; ResolvedType typecheck(CodeSymbolTable st) throws TypecheckCompilerException { ResolvedType arg1_type = arg1.typecheck(st); ResolvedType arg2_type = arg2.typecheck(st); if (arg1_type.isIntType() && arg2_type.isIntType()) { // resolved overloading to int version return ResolvedType.booleanType(); } else if (arg1_type.isBooleanType() && arg2_type.isBooleanType()) { // resolved overloading to boolean version return ResolvedType.booleanType(); } else { throw new TypecheckCompilerException("bad overload"); }}}

Type Checking Extensions in Project [1] Add resolved type for double Add resolved type for arrays –parameterized by element type Questions: –when are two array types equal? –when is one a subtype of another? –when is one assignable to another? Add symbol table support for static class variable declarations –StaticVarInterface class –declareStaticVariable method

Type Checking Extensions in Project [2] Implement type checking for new statements and expressions: IfStmt else stmt is optional ForStmt loop index variable must be declared to be an int initializer & increment expressions must be ints test expression must be a boolean BreakStmt must be nested in a loop DoubleLiteralExpr result is double OrExpr like AndExpr

Type Checking Extensions in Project [3] ArrayAssignStmt array expr must be an array index expr must be an int rhs expr must be assignable to array’s element type ArrayLookupExpr array expr must be an array index expr must be an int result is array’s element type ArrayLengthExpr array expr must be an array result is an int ArrayNewExpr length expr must be an int element type must be a legal type result is array of given element type

Type Checking Extensions in Project [4] Extend existing operations on ints to also work on doubles Allow unary operations taking ints ( NegateExpr ) to be overloaded on doubles Allow binary operations taking ints ( AddExpr, SubExpr, MulExpr, DivExpr, LessThanExpr,LessEqualExpr, GreaterEqualExpr, GreaterThanExpr,EqualExpr, NotEqualExpr ) to be overloaded on doubles –also allow mixed arithmetic: if operator invoked on an int and a double, then implicitly coerce the int to a double and then use the double version Extend isAssignableTo to allow ints to be assigned/passed/ returned to doubles, via an implicit coercion

Type Checking Terminology Static vs. dynamic typing static: checking done prior to execution (e.g. compile-time) dynamic: checking during execution Strong vs. weak typing strong: guarantees no illegal operations performed weak: can’t make guarantees Caveats: Hybrids common Mistaken usage also common “untyped,” “typeless” could mean dynamic or weak staticdynamic strong weak

Type Equivalence When is one type equal to another? implemented in MiniJava with ResolvedType.equals(ResolvedType) method “Obvious” for atomic types like int, boolean, class types What about type "constructors" like arrays? int[] a1; int[] a2; int[][] a3; boolean[] a4; Rectangle[] a5; Rectangle[] a6;

Type Equivalence Parameterized types in Java 1.5: List l1; List l2; List >l3; In C: int* p1; int* p2; struct {int x;} s1; struct {int x;} s2; typedef struct {int x;} S; S s3; S s4;

Name vs Structural Equivalence Name equivalence: two types are equal iff they came from the same textual occurrence of a type constructor implement with pointer equality of ResolvedType instances special case: type synonyms (e.g. typedef) don’t define new types e.g. class types, struct types in C, datatypes in ML Structural equivalence: two types are equal iff they have same structure –if atomic types, then obvious –if type constructors: same constructor recursively, equivalent arguments to constructor –implement with recursive implementation of equals, or by canonicalization of types when types created then use pointer equality –e.g. atomic types, array types, record types in ML

Type Conversions and Coercions In Java, can explicitly convert an object of type double to one of type int –can represent as unary operator –typecheck, codegen normally In Java, can implicitly coerce an object of type int to one of type double –compiler must insert unary conversion operators, based on result of type checking

Type Casts In C and Java, can explicitly cast an object of one type to another sometimes cast means a conversion (casts between numeric types) sometimes cast means just a change of static type without doing any computation (casts between pointer types or pointer and numeric types) In C: safety/correctness of casts not checked allows writing low-level code that’s type-unsafe more often used to work around limitations in C’s static type system In Java: downcasts from superclass to subclass include run-time type check to preserve type safety static typechecker allows the cast codegen introduces run-time check Java’s main form of dynamic type checking