Overview of Semantic Analysis


Overview of Semantic Analysis Lecture 12 Prof. Fateman CS 164 Lecture 12

Outline
The role of semantic analysis in a compiler
  A laundry list of tasks
  Errors difficult or impossible to detect earlier
  Tied more to the “meaning” of a program than to its appearance
Scope, OO inheritance
  Implementation: symbol tables
Types (of data, but also of functions and methods)
  Type of a function or method: includes the number and types of parameters and the return value
  Type of an array: base type, index set [how many subscripts?]
  Type of other objects: a collection of names and types

The Compiler So Far
Lexical analysis
  Detects inputs with illegal sequences (non-tokens).
  Collects names, keywords, operators, numbers, etc.
Parsing
  Allows inputs with well-formed parse trees, rejects others.
  Builds an abstract syntax tree.
Semantic analysis
  Last “front end” phase
  Catches all static type errors in a strictly typed language
  May generate more kinds of errors and warnings
  Any program errors detected here can save time in debugging
  Some errors cannot be found even at this stage (duplicate library entry points?)

Why a Separate Semantic Analysis?
Parsing cannot catch some errors
Some language constructs are not context-free
Examples: identifier declaration and use; parameter / argument count agreement.
  [declare foo:int …… foo=3 …..]       // good, foo is defined
  [declare bar:int …… foo=4 …..]       // bad, assuming foo is not defined
  [foo(a,b):= … x=foo(1,2,3)…. ]       // bad, argument count disagrees
An “abstract version” of the variable declaration and use problem is to return “valid sentence” for items in this language:
  { wcw | w ∈ (a + b)* }
The 1st w represents the declaration, the 2nd w represents a use, and c is the intervening program text. This language is not context-free.
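
The three bracketed examples above can be checked with a small table of declarations, which is exactly the kind of memory a context-free parser lacks. The following is a minimal Python sketch, not the course's implementation; the tuple-based statement format and the function name `check` are assumptions made for illustration.

```python
# Hypothetical mini-program representation: a list of statement tuples.
#   ("declare", name)           declares a variable
#   ("assign", name)            uses a variable
#   ("defun", name, n_params)   declares a function with n_params parameters
#   ("call", name, n_args)      calls a function with n_args arguments

def check(program):
    """Return error messages for undeclared names and argument-count mismatches."""
    variables = set()
    functions = {}   # function name -> declared parameter count
    errors = []
    for stmt in program:
        kind = stmt[0]
        if kind == "declare":
            variables.add(stmt[1])
        elif kind == "assign":
            if stmt[1] not in variables:
                errors.append(f"undeclared variable {stmt[1]}")
        elif kind == "defun":
            functions[stmt[1]] = stmt[2]
        elif kind == "call":
            name, n_args = stmt[1], stmt[2]
            if name not in functions:
                errors.append(f"undeclared function {name}")
            elif functions[name] != n_args:
                errors.append(f"{name} expects {functions[name]} arguments, got {n_args}")
    return errors

good = [("declare", "foo"), ("assign", "foo")]
bad_use = [("declare", "bar"), ("assign", "foo")]
bad_call = [("defun", "foo", 2), ("call", "foo", 3)]
print(check(good))      # []
print(check(bad_use))   # ['undeclared variable foo']
print(check(bad_call))  # ['foo expects 2 arguments, got 3']
```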

Why not CF?
An abstract version of the problem is to return “valid sentence” for items in this language:
  { wcw | w ∈ (a + b)* }
Also note that [declare foo:int; declare foo:bool …] is not valid…
Another famous non-context-free language is
  { a^n b^n c^n | n > 0 }
That is, a CFG cannot count 3 things; recall that a DFA cannot count 2…
The proof is based on the “pumping lemma for CF languages,” similar to the proof that a^n b^n is a problem for a DFA.
The texts first prove “Ogden’s Lemma”: if the language of a CFG is infinite, then every sufficiently long string in it can be written as uvwxy (v and x not both empty) with S ⇒* uvwxy, but also S ⇒* u v^i w x^i y.
So if S ⇒* u a^i b^i c^i y, then it also means S ⇒* u a^(i+1) b^i c^(i+1) y, whose counts no longer match. [beyond scope of CS164, but not hard]
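
Although { wcw } is not context-free, recognizing it is trivial once the recognizer may remember w and compare. A short Python sketch follows; the function name `is_wcw` is an assumption for illustration.

```python
def is_wcw(s):
    """True if s has the form w + "c" + w, where w uses only the letters a and b."""
    left, sep, right = s.partition("c")   # split at the first "c"
    return sep == "c" and left == right and set(left) <= {"a", "b"}

print(is_wcw("abcab"))   # True
print(is_wcw("abcba"))   # False
```

The remembered `left` plays the role of a symbol table entry: the "declaration" is stored and the later "use" is compared against it.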

What Does Semantic Analysis Do?
Checks of many kinds . . .
An MJ typechecking program should check:
  All identifiers are declared
  Types and numbers of arguments and return values agree with use (declarations, method bodies)
  Reserved built-in functions (e.g. Print) are not misused
  Arithmetic is performed only on numeric values
  Boolean expressions are used for conditionals
  Arrays are instantiated properly
  Declared identifiers are used (otherwise suspicious unused names!)
BIG WARNING: “Same type” is subtle. A subclass object is an instance of its superclass too.
Other items can also be checked, e.g.
  No two method parameters have the same name
  No two local variables/methods have the same name
  No built-in functions are redefined
  . . .
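
A few of the checks listed above (declared identifiers, numeric arithmetic, boolean conditions) can be sketched as a recursive function over expression trees. This is a minimal Python sketch under assumed conventions, not the MJ typechecker: the tuple expression forms and the name `type_of` are hypothetical.

```python
def type_of(expr, env):
    """Return "int" or "boolean" for expr, or raise TypeError.

    Hypothetical expression forms: Python int/bool literals, variable names
    (str), and tuples ("+", a, b), ("<", a, b), ("if", cond, then, else).
    env maps declared variable names to their types.
    """
    if isinstance(expr, bool):        # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):
        if expr not in env:
            raise TypeError(f"undeclared identifier {expr}")
        return env[expr]
    op = expr[0]
    if op in ("+", "<"):
        for operand in expr[1:]:
            if type_of(operand, env) != "int":
                raise TypeError(f"{op} needs int operands")
        return "int" if op == "+" else "boolean"
    if op == "if":
        if type_of(expr[1], env) != "boolean":
            raise TypeError("condition must be boolean")
        then_type, else_type = type_of(expr[2], env), type_of(expr[3], env)
        if then_type != else_type:
            raise TypeError("branches of if must have the same type")
        return then_type
    raise TypeError(f"unknown operator {op}")

print(type_of(("if", ("<", "n", 1), 1, ("+", "n", 2)), {"n": "int"}))   # int
```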

What Does Semantic Analysis Do?
Each programming language processor will check some items but maybe not others on that list.
Which items can, in principle, be checked depends on the language definition.
Can Lisp do semantic analysis?
  It must do some if “good” compiling is done
  It can do some analysis even when not necessary… when processing “defun” even if not compiling (Analyzing MCE from CS61a)
  If interpreting, it sometimes (must?) wait.
  (defun foo (x) (foo x x x x x))         ; wrong. When is it noticed?
  (defun foo (x) (let ((y 1) (y 2)) y))   ; what is returned?

Scope
Matching identifier declarations with uses
  Important static analysis step in most languages
  Including MJ (even Lisp)
In principle, the same name can be used repeatedly, though it might not be great “software engineering” to reuse names often.
  (name × scope) → (declaration)            // compile time
  (name × scope) → (location [or value])    // run time

What’s Wrong?
Example 1
  class Fac {
    public int ComputeFac(int num){
      int num_aux ;
      int foo;
      if (num < 1)
        num_aux = 1 ;
      else
        num_aux = num * (this.ComputeFac(num-1)) ;
      return num_aux ;
    }

Scope (Cont.)
The scope of an identifier is the portion of a program in which that identifier is “accessible,” that is, where its binding to a value can be used
The same identifier may refer to different things in different parts of the program
  That is, if the different scopes for the same name don’t overlap
An identifier may have restricted scope
Names and their values and types may be shadowed in different ways: binding by method/function call, local vars

Static vs. Dynamic Scope
Most languages these days have static scope
  Scope depends only on the program text, not run-time behavior, and can be checked at compile time (or type-checking time)
  Java, Common Lisp and Scheme have static scope
A few languages are dynamically scoped
  Lisp (“special” variables only), SNOBOL
  Access to values in dynamic scope depends on the execution history of the program

Static Scoping Example (in Common Lisp)
  (let ((x 1)          ;; x bound to 1
        (y 2))         ;; y bound to 2
    (+ y
       (let ((x 3))
         x)))          ;; x bound to 3, returned from let

Static vs Dynamic Scoping Example in CL
  (defun foo (x) (bar))      ;; regular, static scope
  (defun bar () (print x))
  (foo 3)  → error: attempt to take value of unbound variable x

  (defun foo (x) (declare (special x)) (bar))
  (defun bar () (declare (special x)) (print x))
  (foo 3)  → 3 3             ;; 3 is printed, then 3 is returned
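
The difference between the two versions of foo/bar comes down to which environment bar's body is evaluated in. The Python sketch below models that with explicit environment chains; the class `Env` and the functions `foo_static` and `foo_dynamic` are hypothetical names for illustration, not part of any Lisp implementation.

```python
class Env:
    """A frame of bindings plus a link to the enclosing environment."""
    def __init__(self, bindings, parent=None):
        self.bindings = bindings
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.bindings:
                return env.bindings[name]
            env = env.parent
        raise NameError(f"unbound variable {name}")

GLOBAL = Env({})

def bar(env):
    # Body of bar: look up x in whatever environment the evaluator supplies.
    return env.lookup("x")

def foo_static(x):
    # Static scope: bar's body runs in the environment where bar was
    # defined (the global one), so foo's parameter x is invisible to it.
    foo_env = Env({"x": x}, GLOBAL)   # foo's own frame, never seen by bar
    return bar(GLOBAL)

def foo_dynamic(x):
    # Dynamic scope: bar's body runs in the caller's environment.
    foo_env = Env({"x": x}, GLOBAL)
    return bar(foo_env)

print(foo_dynamic(3))   # 3
try:
    foo_static(3)
except NameError as err:
    print(err)          # unbound variable x
```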

Functional Scoping Example in CL
  (setf r (let ((x 5))
            #'(lambda () (format t "x=~s,y=~s" x y))))
  (setf x 3)
  (setf y 'toplevel)
  (funcall r)                       → x=5, y=toplevel
  (let ((y 'middle)) (funcall r))   → x=5, y=toplevel
Environments!
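
For readers more familiar with Python, a rough analog of the closure example is sketched below. It is only an analogy: Python's module-level variables stand in for the top-level `setf` variables, and the names `make_r` and `middle` are made up for the sketch.

```python
def make_r():
    x = 5                              # captured by the closure, like (let ((x 5)) ...)
    return lambda: f"x={x},y={y}"      # y is free here and resolves to the module-level y

r = make_r()
x = 3                                  # a different, module-level x; the closure ignores it
y = "toplevel"

def middle():
    y = "middle"                       # a local y; r's body never sees it
    return r()

print(r())        # x=5,y=toplevel
print(middle())   # x=5,y=toplevel
```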

Scope in MiniJava
MiniJava identifier bindings are introduced by
  Variable type declarations (these produce default bindings)
  Formal parameter ids
Classes, methods and variable names are in the same “namespace.”
Can we redefine boolean or int as a Class, or are they in a separate namespace?

“Scope” in CL is more complex
Generally lexical scope for names.
Some declarations (e.g. fixnum, special) can be file-wide
“Defstructs” are global
Some names cannot be rebound (e.g. car, cdr) without major effort
Packages provide another scoping/hiding mechanism: you can redefine car and cdr and use them in a package to shadow the regular programs.
Details here are not important except to be aware that you haven’t seen all the possibilities!
Contrary to Scheme, there are separate spaces for names of functions and variables. So
  (setf car '(3 4))
  (car car)             ; is ok
  (setf list #'+)
  (apply list car)      → 7
  (apply #'list car)    → (3 4)

“Scope” in C is almost unused
Variables declared in blocks, rarely used
Local scope for parameters and locals
Static, external, global

Implementing the Most-Closely Nested Rule
Much of semantic analysis can be expressed as a recursive descent of an AST
  Process an AST node n
  Process the children of n
  Finish processing the AST node n
When performing semantic analysis on a portion of the AST, we need to know which identifiers are defined

Implementing . . . (Cont.)
Example: the scope of variable bindings is one subtree
  Class foo{int x; E ;}
x can be used in subtree E.
In a classic lexical scope situation, we can hide x in the subtree E by defining a sub-block like
  {block int x; ….}
In a class/inheritance setup, we hide any x in parent by
  Class foo extends parent{ int x; E;}

Symbol Tables
Consider again: {int x; E;}
Idea:
  Before processing (e.g. typechecking, evaluating) E, add the (type, default value) definition of x to the current definitions, overriding any other definition of x
  After processing E, remove the definition of x and restore the old definition of x
A symbol table is a data structure that tracks the current bindings of identifiers, e.g. the name → type correspondence
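
The add-then-restore idea can be sketched in a few lines of Python using a dictionary as the table. This is an illustrative sketch only; the helper name `with_binding` and the use of a callback for "processing E" are assumptions.

```python
def with_binding(table, name, info, process):
    """Bind name -> info while process(table) runs, then restore the old entry."""
    missing = object()                 # sentinel: name had no previous definition
    old = table.get(name, missing)
    table[name] = info                 # override any other definition of name
    try:
        return process(table)          # "process E" with the new binding visible
    finally:
        if old is missing:
            del table[name]            # there was nothing to restore
        else:
            table[name] = old          # restore the old definition

table = {"x": "boolean"}
print(with_binding(table, "x", "int", lambda t: t["x"]))   # int
print(table)                                               # {'x': 'boolean'}
```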

Runtime analogs of Symbol Tables can hold values.
  ;; Because we know so much about them, we can
  ;; use very simple structures: a stack to do Common Lisp variable bindings
  (let ((a 3))
    (let ((a 1)        ; parallel: b sees the outer a
          (b a))
      (+ a b)))        ; => 4
  (let ((a 3))
    (let* ((a 1)       ; sequential: b sees the new a
           (b a))
      (+ a b)))        ; => 2

A Simple Symbol Table Implementation, C-style
Structure is a stack
Operations
  add_symbol(x): push x and associated info, such as x’s type, on the stack
  find_symbol(x): search the stack, starting from the top, for x. Return the first x found, or NULL if none found
  remove_symbol(): pop the stack
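
The three operations can be sketched directly with a Python list used as the stack. The class name `SymbolTable` is an assumption; `None` plays the role of NULL.

```python
class SymbolTable:
    """Stack of (name, info) pairs; the most recent binding is on top."""
    def __init__(self):
        self._stack = []

    def add_symbol(self, name, info):
        self._stack.append((name, info))          # push

    def find_symbol(self, name):
        for entry_name, info in reversed(self._stack):   # search from the top
            if entry_name == name:
                return info
        return None                               # not found

    def remove_symbol(self):
        self._stack.pop()                         # pop the most recent binding

st = SymbolTable()
st.add_symbol("x", "int")
st.add_symbol("x", "boolean")     # inner x shadows outer x
print(st.find_symbol("x"))        # boolean
st.remove_symbol()
print(st.find_symbol("x"))        # int
```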

A Simple Lisp Symbol Table Implementation
Structure is the stack used by Lisp to call and return functions.
Storage is by (rebinding) symbol table ST to (cons newbinding ST)

A Simple Lisp Symbol Table Implementation
  (setf globals '((glob1 . integer) (glob2 . string)))

  (defun dotypechecking (expression symbol-table)
    ;; POINT A
    ;; Look in expression. If there is a reference to x in expression,
    ;; look up x in symbol-table to find its type, value or other property
    ;; ... by (assoc x symbol-table). Check that it is being used correctly.
    ;; If the expression is a ClassDecl in MJ and defines a new variable x, say
    ;;   (VarDecls (VarDecl IntType (id x lc))) (Statements ….)
    ;; then REBIND this way:
    (dotypechecking STATEMENTS (cons (cons 'x stuff) symbol-table)) ; rebind symbol-table
    ;; After you return, if you continue processing more expressions,
    ;; symbol-table is the same as at POINT A.
    ;; etc.
    ;; If you are finished, return typechecked results.
    ;; In the process of returning, the argument symbol-table is
    ;; popped off the Lisp run-time stack.
    )

A Simple Lisp Symbol Table Implementation
Thus add_symbol is essentially cons.
find_symbol is assoc.
remove_symbol is unnecessary! Bindings are removed by return.
If there are several x's on symbol-table, no sweat: assoc gets the right one.
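
The same "cons to add, return to remove" pattern can be sketched in Python with an immutable linked list of pairs. This is an analogy to the Lisp version rather than a translation of it; the expression forms and the names `assoc` and `check_expr` are assumptions.

```python
def assoc(name, table):
    """Search a linked list of ((name, info), rest) cells; None is the empty table."""
    while table is not None:
        (entry_name, info), table = table
        if entry_name == name:
            return info                # the first match is the innermost binding
    return None

def check_expr(expr, table):
    """Hypothetical expression forms: a name (str) or ("let", name, type, body)."""
    if isinstance(expr, str):
        info = assoc(expr, table)
        if info is None:
            raise NameError(f"undeclared identifier {expr}")
        return info
    _, name, type_, body = expr
    # "Rebinding": the extended table exists only for this recursive call,
    # so returning from the call removes the binding automatically.
    return check_expr(body, ((name, type_), table))

globals_table = (("glob1", "integer"), (("glob2", "string"), None))
nested = ("let", "x", "int", ("let", "x", "boolean", "x"))
print(check_expr(nested, globals_table))    # boolean
print(check_expr("glob2", globals_table))   # string
```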

Limitations
The simple symbol table works for let
  Symbols added one at a time
  Declarations are perfectly nested
Could add a bunch of bindings all at once:
  (dotypechecking expr (append list_of_bindings symboltab))
What if your language allows the same name for a type, variable, function… distinguished by usage? Need fancier lookup.
All names are handled lexically
Other problems?

A Fancier Symbol Table
  find_typename(x) → find_symbol(x, kind='type)
  find_varname(x)  → find_symbol(x, kind='var)
  find_funname(x)  → find_symbol(x, kind='fun)
  Rebinding        → make_ST_entry(x, kind=type)
These turn out to be trivial Lisp programs; if assoc returns the wrong kind of entry, keep looking.
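
The "keep looking if the kind is wrong" lookup can be sketched as follows in Python. The entry layout (name, kind, info) and the list-based table are assumptions for illustration.

```python
def find_symbol(name, kind, table):
    """table: list of (name, kind, info) entries, most recent first."""
    for entry_name, entry_kind, info in table:
        if entry_name == name and entry_kind == kind:
            return info               # right name and right kind
        # wrong name or wrong kind: keep looking
    return None

def find_varname(name, table):
    return find_symbol(name, "var", table)

def find_funname(name, table):
    return find_symbol(name, "fun", table)

# The same name used both as a variable and as a function.
table = [("list", "var", "int[]"), ("list", "fun", "(any...) -> list")]
print(find_varname("list", table))   # int[]
print(find_funname("list", table))   # (any...) -> list
```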

Function (method) Definitions
Method names can be used in the MJ text of a program before being defined (mutually recursive, even).
So we can’t check use-definition agreement using a sequential stack-based symbol table, or even in one pass
Solution
  Pass 1: Gather all function names, arg-types, return types
  Pass 2: Do the checking of all the bodies
Semantic analysis can require multiple passes
  Usual design requires 1, maybe 2. Type inference, badly designed, can use many… (Ada)
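
The two-pass solution can be sketched as below. This Python sketch checks only argument counts, and the method representation (name, parameter count, list of calls) is a simplifying assumption rather than the MJ AST.

```python
def check_program(methods):
    """methods: list of (name, n_params, calls); calls is a list of (callee, n_args)."""
    # Pass 1: gather every method signature before looking at any body.
    signatures = {name: n_params for name, n_params, _ in methods}
    # Pass 2: check the calls in each body against the gathered signatures.
    errors = []
    for name, _, calls in methods:
        for callee, n_args in calls:
            if callee not in signatures:
                errors.append(f"{name}: call to undefined method {callee}")
            elif signatures[callee] != n_args:
                errors.append(
                    f"{name}: {callee} expects {signatures[callee]} arguments, got {n_args}")
    return errors

# Mutually recursive methods: is_even uses is_odd before is_odd is defined.
program = [
    ("is_even", 1, [("is_odd", 1)]),
    ("is_odd", 1, [("is_even", 2)]),   # wrong argument count
]
print(check_program(program))   # ['is_odd: is_even expects 1 arguments, got 2']
```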

Why extra passes?
Consider a language in which a:=b+c means, if a has type double-float and b and c are single-float: convert b to double, convert c to double, add as doubles, and store in a.
If we parse bottom-up, we don’t know what the type of a is until “too late”
Synthesized vs. inherited attributes.
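
One way to picture the inherited attribute is a second walk that pushes the target's type down into the expression and inserts conversions at the leaves. The Python sketch below assumes a tiny tuple-based tree and hypothetical names (`coerce`, `check_assign`, the "convert" node); it illustrates the idea and is not a description of any particular compiler.

```python
def coerce(expr, wanted, types):
    """Push the wanted type down the tree, wrapping mismatched leaves in a conversion.

    expr is a variable name (str) or a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return expr if types[expr] == wanted else ("convert", wanted, expr)
    op, left, right = expr
    return (op, coerce(left, wanted, types), coerce(right, wanted, types))

def check_assign(target, expr, types):
    # The type of the target is known only once the whole statement has been
    # seen; it is then inherited by every node of the right-hand side.
    return ("assign", target, coerce(expr, types[target], types))

types = {"a": "double", "b": "single", "c": "single"}
print(check_assign("a", ("+", "b", "c"), types))
# ('assign', 'a', ('+', ('convert', 'double', 'b'), ('convert', 'double', 'c')))
```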

Not everyone likes a stack
Assoc takes time O(n) for n items on the stack.
What if you have thousands of identifiers? (AWA page 110)?

Thousands of identifiers?
Well, how likely is THAT?
It happens in C code with piles of #include foo.h, ..
But it doesn’t happen in Lisp; a cost of checking (pro/con)..
A hash table is typically used then: every time one enters/leaves the scope of bindings, they are inserted into/deleted from the hash table, and the OLD bindings, if any, are restored.
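
A hash table with save-and-restore on scope entry and exit might look like the following Python sketch. The class name `ScopedTable` and its method names are assumptions; a Python dict serves as the hash table.

```python
class ScopedTable:
    """Hash-table symbol table that restores old bindings when a scope is left."""
    _MISSING = object()               # marks "no previous binding"

    def __init__(self):
        self._table = {}
        self._undo = []               # one list of (name, old entry) per open scope

    def enter_scope(self):
        self._undo.append([])

    def add_symbol(self, name, info):
        # Assumes enter_scope() has been called at least once.
        old = self._table.get(name, self._MISSING)
        self._undo[-1].append((name, old))    # remember what to restore
        self._table[name] = info

    def find_symbol(self, name):
        return self._table.get(name)          # expected O(1), no stack search

    def leave_scope(self):
        for name, old in reversed(self._undo.pop()):
            if old is self._MISSING:
                del self._table[name]
            else:
                self._table[name] = old

st2 = ScopedTable()
st2.enter_scope()
st2.add_symbol("x", "int")
st2.enter_scope()
st2.add_symbol("x", "boolean")
print(st2.find_symbol("x"))   # boolean
st2.leave_scope()
print(st2.find_symbol("x"))   # int
```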

Still maybe you are in a hurry
What’s your hurry? If you are using this as a RUN TIME mechanism for finding the values of variables, a stack implemented as a list is not good; a stack as an array may be ok.
E.g. X:=X+1 becomes:
  Find X on the stack: O(n), or O(1), or memoized
  Change its value
What if X is very far away on the stack??
Advantage: changing scope is a trivial push/pop.
Sometimes O(n) is used in dynamic scope languages; this is called deep binding: cheap binding, expensive access.

Not everyone likes a stack (continued)
Another runtime stack tactic
You can always find the value for anything named X in a fixed location in a hash table or array, say location 1234
  Changing its value is easy.
  When a “new X” is needed in some scope, store the contents of 1234, i.e. the “old X,” on a stack. Initialize location 1234 to the value for the new X. Any reference to the new X goes to 1234. When the process exits the scope of the new X, pop the value of the old X off the stack and put it in 1234.
This is called shallow binding, used in dynamic scope languages (old Lisp, also “special” variables in CL). Cheap access, expensive binding.
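
Shallow binding can be sketched in Python with a dict of value cells (the "fixed locations") plus a stack of saved old values. The class and method names below are hypothetical, chosen for the sketch.

```python
class ShallowBinding:
    """Each name has one fixed cell; old values are saved on a stack."""
    _UNBOUND = object()

    def __init__(self):
        self.cells = {}       # name -> current value (the fixed location)
        self.saved = []       # stack of (name, old value)

    def bind(self, name, value):
        # Entering a scope: save the old contents, then overwrite the cell.
        self.saved.append((name, self.cells.get(name, self._UNBOUND)))
        self.cells[name] = value

    def unbind(self):
        # Leaving a scope: put the old value back into the cell.
        name, old = self.saved.pop()
        if old is self._UNBOUND:
            del self.cells[name]
        else:
            self.cells[name] = old

    def get(self, name):
        return self.cells[name]       # cheap access: one table lookup

    def set(self, name, value):
        self.cells[name] = value

rt = ShallowBinding()
rt.bind("X", 10)
rt.bind("X", 1)                    # a "new X" in an inner scope
rt.set("X", rt.get("X") + 1)       # X := X + 1 touches only the cell
print(rt.get("X"))                 # 2
rt.unbind()
print(rt.get("X"))                 # 10
```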

Block structured symbol tables
Here’s another mechanism
Put all variables/bindings at a given lexical level L in an array, vector or hashtable
Make a collection of pointers to each lexical level L, L-1 … 1, 0=global.
Now to look up the value/type/etc. of X, look for X in lexical level L, then L-1, until you find X or give up…
So you look in L hash tables, rather than searching item by item through the stack. When you leave a scope, you abandon the whole level L hashtable.
If you can actually pre-calculate that X is the 4th item in level 3, then you may be able to access its value in one indexed memory access, e.g. if you have level 1, 2, 3’s… address in a register. This is the usual situation for lexically scoped languages.
For historical reasons this mechanism is sometimes called a DISPLAY
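
A per-level table can be sketched in Python as a list of dicts, one per lexical level, searched from the innermost level outward. The class name `BlockTable` and its methods are assumptions; the precomputed (level, index) addressing mentioned above is not modeled here.

```python
class BlockTable:
    """One hash table per lexical level; levels[0] is the global level."""
    def __init__(self):
        self.levels = [{}]

    def enter_scope(self):
        self.levels.append({})            # new, empty level L

    def leave_scope(self):
        self.levels.pop()                 # abandon the whole level at once

    def add_symbol(self, name, info):
        self.levels[-1][name] = info

    def find_symbol(self, name):
        for level in reversed(self.levels):   # level L, then L-1, ..., 0
            if name in level:
                return level[name]
        return None                           # give up

bt = BlockTable()
bt.add_symbol("g", "int")
bt.enter_scope()
bt.add_symbol("x", "boolean")
print(bt.find_symbol("x"), bt.find_symbol("g"))   # boolean int
bt.leave_scope()
print(bt.find_symbol("x"))                        # None
```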

Summary: basic static semantic analysis
Information generated by a pass over the AST can be used
  For type analysis
  Checking other details of proper usage
Must be coordinated with analysis of scope
  Keep track of name-attribute correspondences