CMSC 330: Organization of Programming Languages


CMSC 330: Organization of Programming Languages Type Systems Implementation Topics

Review: Lambda Calculus
A lambda calculus expression is defined as
  e ::= x        variable
      | λx.e     function
      | e e      function application
λx.e is like (fun x -> e) in OCaml.
CMSC 330
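As a concrete tie-in (an illustrative sketch, not from the slides), the OCaml analogue of abstraction and application:

  (* λx.x written as an OCaml function, then applied *)
  let id = fun x -> x      (* λx.x *)
  let _ = id 42            (* application: (λx.x) 42 → 42 *)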

Review: Beta-Reduction
Whenever we do a step of beta reduction
  (λx.e1) e2 → e1[x/e2]
...alpha-convert variables as necessary. Examples:
  (λx.x (λx.x)) z = (λx.x (λy.y)) z → z (λy.y)
  (λx.λy.x y) y = (λx.λz.x z) y → λz.y z
CMSC 330

The Need for Types
Consider the untyped lambda calculus, where
  false = λx.λy.y
  0 = λx.λy.y
Since everything is encoded as a function, we can easily misuse terms:
  false 0 → λy.y
  if 0 then ...
Everything evaluates to some function. The same thing happens in assembly language: everything is a machine word (a bunch of bits), and all operations take machine words to machine words.
CMSC 330

What is a Type System?
A type system is some mechanism for distinguishing good programs from bad:
  Good = well typed
  Bad = ill typed or not typable; has a type error
Examples:
  0 + 1     // well typed
  false 0   // ill-typed; can’t apply a boolean
CMSC 330

Static versus Dynamic Typing In a static type system, we guarantee at compile time that all program executions will be free of type errors OCaml and C have static type systems In a dynamic type system, we wait until runtime, and halt a program (or raise an exception) if we detect a type error Ruby has a dynamic type system CMSC 330

Simply-Typed Lambda Calculus
  e ::= n | x | λx:t.e | e e
We’ve added integers n as primitives: without at least two distinct types (integer and function), we can’t have any type errors. Functions now include the type of their argument.
  t ::= int | t → t
int is the type of integers. t1 → t2 is the type of a function that takes arguments of type t1 and returns a result of type t2; t1 is the domain and t2 is the range. Notice this is a recursive definition, so that we can give types to higher-order functions.
CMSC 330
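A minimal OCaml rendering of this grammar may help when writing the checker later; the constructor names below are my own choice, not part of the slides:

  (* AST for the simply-typed lambda calculus above *)
  type typ =
    | TInt                         (* int *)
    | TArrow of typ * typ          (* t1 -> t2 *)

  type exp =
    | Num of int                   (* n *)
    | Var of string                (* x *)
    | Fun of string * typ * exp    (* λx:t.e *)
    | App of exp * exp             (* e1 e2 *)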

Type Judgments We will construct a type system that proves judgments of the form A ⊢ e : t “In type environment A, expression e has type t” If for a program e we can prove that it has some type, then the program type checks Otherwise the program has a type error, and we’ll reject the program as bad CMSC 330

Type Environments A type environment is a map from variables names to their types ∅ is the empty type environment A, x:t is just like A, except x now has type t When we see a variable in the program, we’ll look up its type in the environment CMSC 330
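One simple way to realize this in OCaml, assuming the typ type sketched above and an association-list representation (an illustrative sketch, not the course's required data structure):

  type env = (string * typ) list

  let empty_env : env = []                         (* ∅ *)
  let extend (a : env) x t : env = (x, t) :: a     (* A, x:t *)
  let lookup (a : env) x = List.assoc_opt x a      (* x's type, if bound *)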

Type Rules
  e ::= n | x | λx:t.e | e e
  TInt:  A ⊢ n : int
  TVar:  if x:t ∊ A, then A ⊢ x : t
  TFun:  if A, x:t ⊢ e : t', then A ⊢ λx:t.e : t → t'
  TApp:  if A ⊢ e : t → t' and A ⊢ e' : t, then A ⊢ e e' : t'
CMSC 330

Example
A = { + : int → int → int } and B = A, x : int.
  From B ⊢ + : int → int → int and B ⊢ x : int, we get B ⊢ + x : int → int.
  From that and B ⊢ 3 : int, we get B ⊢ + x 3 : int.
  Therefore A ⊢ (λx:int.+ x 3) : int → int, and with A ⊢ 4 : int,
  A ⊢ (λx:int.+ x 3) 4 : int.
CMSC 330

Discussion The type rules provide a way to reason about programs (i.e., a formal logic). The tree of judgments we just saw is a kind of proof in this logic that the program has a valid type. So the type checking problem is like solving a jigsaw puzzle: can we apply the rules to a program in such a way as to produce a typing proof? We can do this automatically. CMSC 330

An Algorithm for Type Checking (Write this in OCaml!)
  TypeCheck : type env × expression → type
  TypeCheck(A, n) = int
  TypeCheck(A, x) = if x in A then A(x) else fail
  TypeCheck(A, λx:t.e) = let t' = TypeCheck((A, x:t), e) in t → t'
  TypeCheck(A, e1 e2) = let t1 = TypeCheck(A, e1) in
                        let t2 = TypeCheck(A, e2) in
                        if dom(t1) = t2 then range(t1) else fail
CMSC 330
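Taking up the slide's invitation, here is one possible OCaml version of TypeCheck, assuming the exp, typ, and env sketches from earlier; reporting failure with an exception is my choice, not something the slides prescribe:

  exception Type_error of string

  let rec type_check (a : env) (e : exp) : typ =
    match e with
    | Num _ -> TInt                                  (* TypeCheck(A, n) = int *)
    | Var x ->                                       (* look x up in A *)
        (match lookup a x with
         | Some t -> t
         | None -> raise (Type_error ("unbound variable " ^ x)))
    | Fun (x, t, body) ->                            (* λx:t.e has type t -> t' *)
        TArrow (t, type_check (extend a x t) body)
    | App (e1, e2) ->                                (* dom(t1) must equal t2 *)
        (match type_check a e1, type_check a e2 with
         | TArrow (dom, rng), t2 when dom = t2 -> rng
         | _ -> raise (Type_error "ill-typed application"))

For example, checking the term from the earlier derivation,
  type_check [("+", TArrow (TInt, TArrow (TInt, TInt)))]
    (App (Fun ("x", TInt, App (App (Var "+", Var "x"), Num 3)), Num 4))
returns TInt.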

Type Inference We could extend the rules to allow the type checker to deduce the types of every expression in a program even without the annotations This is called type inference Not covered in this class CMSC 330

Practice
Reduce the following:
  (λx.λy.x y y) (λa.a) b
  (or true) (and true false)
  (* 1 2)     (where * m n = λM.λN.λx.(M (N x)))
Derive and prove the type of:
  A ⊢ (λf:int->int.λn:int.f n) (λx:int. + 3 x) 6
  λx:int->int->int. λy:int->int. λz:int.x z (y z)
  where A = { + : int → int → int }
CMSC 330

Review of CMSC 330
  Syntax: regular expressions, finite automata, context-free grammars
  Semantics: operational semantics, lambda calculus
  Implementation: names and scope, evaluation order, concurrency, generics, exceptions, garbage collection
CMSC 330

Implementation: Names and Scope CMSC 330

Names and Binding Programs use names to refer to things E.g., in x = x + 1, x refers to a variable A binding is an association between a name and what it refers to int x; /* x is bound to a stack location containing an int */ int f (int) { ... } /* f is bound to a function */ class C { ... } /* C is bound to a class */ let x = e1 in e2 (* x is bound to e1 *) CMSC 330

Name Restrictions Languages often have various restrictions on names to make lexing and parsing easier Names cannot be the same as keywords in the language Sometimes case is restricted Names generally cannot include special characters like ; , : etc Usually names are upper- and lowercase letters, digits, and _ (where the first character can’t be a digit) CMSC 330

Names and Scopes Good names are a precious commodity They help document your code They make it easy to remember what names correspond to what entities We want to be able to reuse names in different, non-overlapping regions of the code CMSC 330

Names and Scopes (cont’d) A scope is the region of a program where a binding is active The same name in a different scope can have a different binding A name is in scope if it's bound to something within the particular scope we’re referring to Two names bound to the same object are aliases CMSC 330

Example
  void w(int i) { ... }
  void x(float j) {
    void y(float i) {
      void z(void) {
        int j;
        char *i;
        ...
      }
    }
  }
i is in scope in the body of w, the body of y, and after its declaration in z, but all those i’s are different. j is in scope in the body of x and z.
CMSC 330

Ordering of Bindings
Languages make various choices for when declarations of things are in scope. Generally, all declarations are in scope from the declaration onward. What about function calls?
  int x = 0;
  int f() { return x; }
  int g() { int x = 1; return f(); }
What is the result of calling g()?
CMSC 330

Static Scope
In static scoping, a name refers to its closest binding, going from inner to outer scope in the program text. Languages like C, C++, Java, Ruby, and OCaml are statically scoped.
  int i;
  {
    int j;
    float i;
    j = (int) i;
  }
CMSC 330

Dynamic Scope In a language with dynamic scoping, a name refers to its closest binding at runtime CMSC 330

Ordering of Bindings
Back to the example:
  int x = 0;
  int f() { return x; }
  int g() { int x = 1; return f(); }
What is the result of calling g()
  ... with static scoping?
  ... with dynamic scoping?
CMSC 330
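For intuition, here is the same experiment in OCaml, which is statically scoped (an illustrative sketch, not from the slides):

  (* the x inside f is resolved where f is defined, not where it is called *)
  let x = 0
  let f () = x
  let g () =
    let x = 1 in        (* this local x does not affect f *)
    ignore x;
    f ()
  (* g () evaluates to 0 under static scoping; a dynamically scoped
     language would give 1, because f would see g's most recent x *)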

Static vs. Dynamic Scope
Static scoping: local understanding of function behavior; we know at compile time what each name refers to; a bit trickier to implement.
Dynamic scoping: can be hard to understand the behavior of functions; requires finding name bindings at runtime; easier to implement (just keep a global table of stacks of variable/value bindings, as sketched below).
CMSC 330
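A sketch of that "global table of stacks" idea in OCaml; the names and the int-only values are assumptions made for illustration, not part of the slides:

  (* each name maps to a stack of values: entering a scope pushes a
     binding, leaving it pops, and lookup takes the most recent
     (dynamically closest) binding *)
  let bindings : (string, int list) Hashtbl.t = Hashtbl.create 16

  let push name v =
    let old = match Hashtbl.find_opt bindings name with
      | Some l -> l
      | None -> [] in
    Hashtbl.replace bindings name (v :: old)

  let pop name =
    match Hashtbl.find_opt bindings name with
    | Some (_ :: rest) -> Hashtbl.replace bindings name rest
    | _ -> ()

  let lookup_dyn name =
    match Hashtbl.find_opt bindings name with
    | Some (v :: _) -> Some v
    | _ -> None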

Namespaces
Languages have a “top-level” or outermost scope. Many things go in this scope; hard to control collisions. Common solution: add a hierarchy.
  OCaml: modules, e.g., List.hd, String.length; open to add names into the current scope
  Java: packages, e.g., java.lang.String, java.awt.Point; import to add names into the current scope
  C++: namespaces, e.g., namespace f { class g { ... } }, f::g b; using namespace to add names to the current scope
CMSC 330

Free and Bound Variables
The bound variables of a scope are those names that are declared in it. If a variable is not bound in a scope, it is free. In static scoping, the bindings of variables which are free in a scope are "inherited" from declarations of those variables in outer scopes.
  { /* 1 */
    int j;
    { /* 2 */
      float i;
      j = (int) i;
    }
  }
j is bound in scope 1; i is bound in scope 2; j is free in scope 2.
CMSC 330

Implementation: Evaluation Order CMSC 330

Call-by-Value (CBV) In call-by-value (eager evaluation), arguments to functions are fully evaluated before the function is invoked This is the standard evaluation order that we're used to from C, C++, and Java Also in OCaml, in let x = e1 in e2, the expression e1 is fully evaluated before e2 is evaluated CMSC 330
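A tiny OCaml illustration of that last point (a sketch, not from the slides):

  (* e1 is fully evaluated, printing its message, before e2 runs *)
  let () =
    let x = (print_endline "evaluating e1"; 21) in
    print_endline "evaluating e2";
    Printf.printf "%d\n" (x * 2)     (* prints 42 *)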

Call-by-Value in Imperative Languages
In C, C++, and Java, call-by-value has another feature. What does this program print?
  void f(int x) { x = 3; }
  int main() {
    int x = 0;
    f(x);
    printf("%d\n", x);
  }
It prints 0.
CMSC 330

Call-by-Value in Imperative Languages, con't.
The actual parameter is copied to the stack location of the formal parameter. Modification of the formal parameter is not reflected in the actual parameter!
  void f(int x) { x = 3; }
  int main() {
    int x = 0;
    f(x);
    printf("%d\n", x);
  }
CMSC 330

Call-by-Reference (CBR)
Alternative idea: implicitly pass a pointer or reference to the actual parameter. If the function writes to it, the actual parameter is modified.
  void f(int x) { x = 3; }
  int main() {
    int x = 0;
    f(x);
    printf("%d\n", x);
  }
CMSC 330
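OCaml has no built-in call-by-reference, but passing a ref cell (itself by value) gives the same observable effect; a minimal sketch, not the C mechanism shown above:

  let f (x : int ref) = x := 3       (* write through the reference *)

  let () =
    let x = ref 0 in
    f x;
    Printf.printf "%d\n" !x          (* prints 3: the caller sees the update *)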

Call-by-Reference (cont’d) Advantages The entire argument doesn't have to be copied to the called function It's more efficient if you’re passing a large (multi-word) argument Allows easy multiple return values Disadvantages Can you pass a non-variable (e.g., constant, function result) by reference? It may be hard to tell if a function modifies an argument What if you have aliasing? CMSC 330

Aliasing
We say that two names are aliased if they refer to the same object in memory. C examples (this is what makes optimizing C hard):
  int x;
  int *p, *q;   /* note that C uses pointers to simulate call by reference */
  p = &x;       /* *p and x are aliased */
  q = p;        /* *q, *p, and x are aliased */

  struct list { int x; struct list *next; };
  struct list *p, *q;
  ...
  q = p;        /* *q and *p are aliased */
                /* so are p->x and q->x */
                /* and p->next->x and q->next->x... */
CMSC 330

Call-by-Name (CBN)
Call-by-name (lazy evaluation). The theory is simple: given a function
  let add x y = x + y
and a call
  add (a*b) (c*d)
each use of x and y in the function definition is just a literal substitution of the actual arguments, (a*b) and (c*d), respectively. The implementation is difficult: highly complex, inefficient, and it provides little improvement over other mechanisms, as later slides demonstrate.
CMSC 330

Call-by-Name (cont’d)
In call-by-name, arguments to functions are evaluated at the last possible moment, just before they're needed.
OCaml (cbv): arguments evaluated at the call:
  let add x y = x + y
  let z = add (add 3 1) (add 4 1)
Haskell (cbn): arguments evaluated where they are used inside add:
  add x y = x + y
  z = add (add 3 1) (add 4 1)
CMSC 330

Call-by-Name (cont’d)
What would be an example where this difference matters?
OCaml (eager): infinite recursion at the call:
  let cond p x y = if p then x else y
  let rec loop n = loop n
  let z = cond true 42 (loop 0)
Haskell (lazy): (loop 0) is never evaluated because the parameter is never used:
  cond p x y = if p then x else y
  loop n = loop n
  z = cond True 42 (loop 0)
CMSC 330
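In an eager language you can recover the lazy behavior by hand with thunks; a sketch in OCaml (the names are mine):

  (* wrapping each argument in (fun () -> ...) delays its evaluation *)
  let cond p x y = if p then x () else y ()
  let rec loop n = loop n

  let z = cond true (fun () -> 42) (fun () -> loop 0)
  (* terminates with z = 42: the (loop 0) thunk is never forced *)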

Three-Way Comparison
Consider the following program under the three calling conventions. For each, determine i's value and which a[i] (if any) is modified.
  int i = 1;
  void p(int f, int g) {
    g++;
    f = 5 * i;
  }
  int main() {
    int a[] = {0, 1, 2};
    p(a[i], i);
    printf("%d %d %d %d\n", i, a[0], a[1], a[2]);
  }
CMSC 330

Example: Call-by-Value
Same program as above. Under call-by-value, f and g are local copies: f = a[1] = 1 and g = i = 1. Then g++ makes the local g 2, and f = 5*i makes the local f 5; neither change is visible in main. Output: 1 0 1 2. CMSC 330

Example: Call-by-Reference
Under call-by-reference, g aliases i and f aliases a[1]. Then g++ makes i 2, and f = 5*i stores 10 into a[1]. Output: 2 0 10 2. CMSC 330

Example: Call-by-Name
Under call-by-name, the body of p behaves like i++; a[i] = 5*i;. The expression a[i] isn't evaluated until needed, in this case after i has changed, so i becomes 2 and a[2] becomes 10. Output: 2 0 1 10. CMSC 330

Evaluation Order in Lambda Calculus (λx.x) ((λy.y y) z) CMSC 330

Evaluation Order in Lambda Calculus
(λx.x) ((λy.y y) z)
Eager (evaluate the argument first):
  (λx.x) ((λy.y y) z) → (λx.x) (z z) → z z
Lazy (substitute the argument unevaluated):
  (λx.x) ((λy.y y) z) → (λy.y y) z → z z
CMSC 330

CBV versus CBN
CBN is more flexible: strictly more programs terminate. E.g., where we might have an infinite loop with CBV, we might avoid it with CBN by waiting to evaluate. However, the order of evaluation is really hard to see in CBN, and call-by-name doesn't mix well with side effects (assignments, print statements, etc.). Call-by-name is also more expensive, since functions have to be passed around, and if you use a parameter twice in a function body, its thunk (the unevaluated argument) will be called twice. Haskell actually uses call-by-need (each formal parameter is evaluated only once, where it's first used in the function).
CMSC 330
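OCaml's standard Lazy module gives one way to experiment with the call-by-need behavior mentioned above; a small sketch:

  (* the suspension is forced at most once; its result is memoized *)
  let arg = lazy (print_endline "evaluated once"; 5)

  let double l = Lazy.force l + Lazy.force l

  let () = Printf.printf "%d\n" (double arg)
  (* prints "evaluated once" a single time, then 10 *)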

Review
Evaluation strategies:
  Names and bindings: free vs. bound
  Scope: static vs. dynamic
  Reduction order: eager evaluation, lazy evaluation
  Parameter passing (calling convention): call-by-value, call-by-reference, call-by-name, and others...
CMSC 330

Implementation: Function Calls CMSC 330

How Function Calls Really Work Function calls are so important they usually have direct instruction support on the hardware We won’t go into the details of assembly language programming See CMSC 212, 311, 412, or 430 But we will discuss just enough to know how functions are called CMSC 330

Machine Model (x86)
The CPU has a fixed number of registers; think of these as memory that's really fast to access. For a 32-bit machine, each can hold a 32-bit word. Important x86 registers:
  eax   generic register for computing values
  esp   pointer to the top of the stack
  ebp   pointer to the start of the current stack frame
  eip   the program counter (points to the next instruction in the text segment to execute)
CMSC 330

The x86 Stack Frame/Activation Record
[Figure, based on Fig. 6-1 in the Intel IA-32 manual: the stack just after f transfers control to g. From the frames of f's callers toward the top of the stack: previous frames; f's return instruction ptr and the saved ebp of f's caller at f's frame boundary; f's locals, saves, and the parameters for g; then the return instr ptr (eip) pushed by the call, with ebp marking the current frame base and esp the top of the stack.]
CMSC 330

x86 Calling Convention
To call a function:
  Push parameters for the function onto the stack.
  Invoke the CALL instruction to push the current value of eip onto the stack (i.e., save the program counter) and start executing the code of the called function.
  The callee pushes ebp onto the stack to save it.
When a function returns:
  Put the return value in eax.
  Invoke LEAVE to pop the stack frame: set esp to ebp, then restore the ebp that was saved on the stack and pop it off the stack.
  Invoke the RET instruction to load the return address into eip, i.e., start executing code where we left off at the call.
CMSC 330

Example (gcc -S a.c)
C source:
  int f(int a, int b) { return a + b; }
  int main(void) {
    int x;
    x = f(3, 4);
  }
Generated assembly:
  f:  pushl %ebp
      movl %esp, %ebp
      movl 12(%ebp), %eax
      addl 8(%ebp), %eax
      leave
      ret
  main: ...
      subl $8, %esp
      pushl $4
      pushl $3
      call f
  l:  addl $16, %esp
      movl %eax, -4(%ebp)
CMSC 330

Lots More Details There’s a whole lot more to say about calling functions Local variables are allocated on stack by the callee as needed This is usually the first thing a called function does Saving registers If the callee is going to use eax itself, you’d better save it to the stack before you call Passing parameters in registers More efficient than pushing/popping from the stack Etc... See other courses for more details CMSC 330

Tail Calls
A tail call is a function call that is the last thing a function does before it returns.
  let add x y = x + y
  let f z = add z z                  (* tail call *)

  let rec length = function
      [] -> 0
    | (_::t) -> 1 + (length t)       (* not a tail call *)

  let rec length a = function
      [] -> a
    | (_::t) -> length (a + 1) t     (* tail call *)
CMSC 330

Tail Recursion Recall that in OCaml, all looping is via recursion. This seems very inefficient, since it appears to need one stack frame per recursive call. A function is tail recursive if it is recursive and the recursive call is a tail call. CMSC 330

Tail Recursion (cont’d)
  let rec length l = match l with
      [] -> 0
    | (_::t) -> 1 + (length t)
Evaluating length [1; 2] pushes a new frame for each call (l is [1;2], then [2], then []), and the partial results 0, 1, and 2 come back through eax as the frames return. However, if the program is tail recursive, we can instead reuse the stack frame for each recursive call.
CMSC 330

Tail Recursion (cont’d)
  let rec length a l = match l with
      [] -> a
    | (_::t) -> (length (a + 1) t)
Evaluating length 0 [1; 2]: l is [1;2], then [2], then []; the accumulator a counts up to 2, and the final result 2 is returned in eax. The same stack frame is reused for the next call, since we’d just pop it off and return anyway.
CMSC 330

Tail Recursion (cont'd)
Corollary: any tail-recursive function can be rewritten using an iterative loop.
  int fact2 (int n, int a) {
    if (n <= 1) return a;
    else return fact2(n - 1, a * n);
  }
  int fact (int n) { return fact2(n, 1); }
is equivalent to
  int fact (int n) {
    int v = 1;
    while (n >= 1) v *= n--;
    return v;
  }
CMSC 330
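For comparison, the same accumulator idea in OCaml (a sketch; the compiler can reuse the frame for the tail call, matching the loop version above):

  let rec fact2 n a = if n <= 1 then a else fact2 (n - 1) (a * n)   (* tail call *)
  let fact n = fact2 n 1

  let () = Printf.printf "%d\n" (fact 5)   (* prints 120 *)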