Theory of Compilation 236360, Erez Petrank, Lecture 10: Runtime

1 Theory of Compilation 236360 Erez Petrank Lecture 10: Runtime. 1

2 Runtime Environment Code generated by the compiler to handle things that the programmer does not wish to handle explicitly. For example: file handling, memory management, synchronization (creating threads, implementing locks, etc.), the runtime stack (activation records), etc. You can think of all of those as library functions, but they are always there by the language's definition. – No need to link them in. The more complex items nowadays are dynamic memory management and management of parallelism. We will talk about activation records and present an introduction to memory management. – Threads and synchronization are discussed in the operating systems course.

3 Activation Records and Function Call 3

4 Motivation Functions are invoked frequently; it is important to understand what goes on during execution. Handling functions has a significant impact on efficiency. 4

5 Supporting Procedures new computing environment – at least temporary memory for local variables passing information into the new environment – parameters transfer of control to/from procedure handling return values 5

6 Design Decisions scoping rules – static scoping vs. dynamic scoping caller/callee conventions – parameters – who saves register values? allocating space for local variables 6

7 Static (lexical) Scoping
main ( ) {                        /* B0 */
  int a = 0 ;
  int b = 0 ;
  {                               /* B1 */
    int b = 1 ;
    {                             /* B2 */
      int a = 2 ;
      printf ("%d %d\n", a, b) ;
    }
    {                             /* B3 */
      int b = 3 ;
      printf ("%d %d\n", a, b) ;
    }
    printf ("%d %d\n", a, b) ;
  }
  printf ("%d %d\n", a, b) ;
}
Declaration / scopes where it is visible: a=0: B0, B1, B3; b=0: B0; b=1: B1, B2; a=2: B2; b=3: B3.
In most modern languages (C, Java), a name refers to its (closest) enclosing scope, known at compile time.

8 Dynamic Scoping Each identifier is associated with its latest definition in the run. To find the relevant definition, search in the local function first, then search in the function that called the local function, then search in the function that called that function, and so on. 8

9 Dynamic Scoping Implementation Each identifier is associated with a global stack of bindings. When entering a scope where the identifier is declared – push the declaration on the identifier's stack. When exiting a scope where the identifier is declared – pop the identifier's stack. Evaluating the identifier in any context binds to the current top of the stack. Determined at runtime. It is not always possible to type-check or discover accesses to non-initialized variables at compile time. A minimal C sketch of this scheme appears below.
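Below is that sketch (not from the slides; the names and the fixed-size array are illustrative): one binding stack per identifier, pushed on scope entry and popped on exit, with lookup returning the top binding.

#include <stdio.h>

#define MAX_DEPTH 64

typedef struct {
    int values[MAX_DEPTH];   /* binding stack for one identifier */
    int top;                 /* number of live bindings          */
} BindingStack;

static void enter_scope(BindingStack *s, int value) { s->values[s->top++] = value; }
static void exit_scope(BindingStack *s)             { s->top--; }
static int  lookup(const BindingStack *s)           { return s->values[s->top - 1]; }

int main(void) {
    BindingStack x = { {0}, 0 };
    enter_scope(&x, 42);          /* global:  int x = 42                */
    enter_scope(&x, 1);           /* g():     int x = 1                 */
    printf("%d\n", lookup(&x));   /* f() under dynamic scoping sees 1   */
    exit_scope(&x);
    exit_scope(&x);
    return 0;
}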

10 Example What value is returned from main? Under static scoping? Under dynamic scoping? int x = 42; int f() { return x; } int g() { int x = 1; return f(); } int main() { return g(); }

11 Static Scoping Properties Java, C++, etc. Identifier binding is known at compile time Address of the variable is known at compile time Assigning addresses to variables is part of code generation No runtime errors of “access to undefined variable” Can check types of variables 11

12 A Naïve Implementation Using a Table
var x: int;
int foo(): int { return x; }
int bar(): int { var x: int; x = 1; return foo(); }
int main() { x = 0; print bar(); }
Identifier / address: x (global) at 432234; x (in bar) at 432238.

13 A problem with the naïve implementation: Recursion (direct or indirect). What is the address of 'a' (of fib) in the following example?
procedure fib(n: int)
{ var a: int;
  var b: int;
  if (n == 2) return 1;
  if (n ≤ 1) return 0;
  a = fib(n – 1);
  b = fib(n – 2);
  return a + b;
}
procedure main() { print fib(5); }
Note that this problem does not exist for dynamic scoping.

14 Activation Record (frame) A separate space for each procedure invocation Managed at runtime – code for managing it generated by the compiler Desired properties – efficient allocation and deallocation procedures are called frequently – variable size different procedures may require different memory sizes 14

15 Memory Layout (figure, from low addresses to high: code, static data, heap, …, stack) The stack grows down (towards lower addresses); the heap grows up (towards higher addresses).

16 Activation Record (frame) Contents, from high addresses to low (the stack grows down): incoming parameters (parameter k, …, parameter 1); the administrative part (access link, return information, dynamic link, saved registers & misc); local variables; temporaries; the next frame would be below. The frame pointer marks the base of the current frame and the stack pointer its low end. A rough C-struct picture of this layout follows.
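This is only an illustration: a real frame is laid out by the compiler, not declared in source, and the field names, types and counts here are assumptions.

/* Illustrative sketch of the frame layout above, from high to low addresses. */
struct frame {
    /* incoming parameters, pushed by the caller (higher addresses) */
    int   param_k;             /* parameter k                              */
    int   param_1;             /* parameter 1                              */
    /* administrative part */
    void *access_link;         /* frame of the statically enclosing routine */
    void *return_address;      /* return information                       */
    void *dynamic_link;        /* saved frame pointer of the caller        */
    long  saved_registers[4];  /* callee-saved registers & misc            */
    /* local data (lower addresses, towards the stack pointer) */
    int   locals[2];
    int   temporaries[2];
};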

17 Runtime Stack Stack of activation records Call = push new activation record Return = pop activation record Only one “active” activation record – top of stack Recursion handled implicitly. 17

18 Runtime Stack SP – stack pointer – top of current frame. FP – frame pointer – base of current frame – sometimes called BP (base pointer). (figure: the previous frame sits above the current frame; FP marks the current frame's base, SP its top; the stack grows down)

19 Pentium Runtime Stack
Pentium stack registers: ESP – stack pointer; EBP – base pointer.
Pentium stack and call/ret instructions: push, pusha, … – push on the runtime stack; pop, popa, … – pop from it; call – transfer control to the called routine; ret – transfer control back to the caller.

20 Call Sequences The processor does not save the content of registers on procedure calls So who will? – Caller saves and restores registers – Callee saves and restores registers – But can also have both save/restore some registers 20

21 Call Sequences
Caller push code: push caller-saved registers; push actual parameters (in reverse order); push the return address; jump to the call address.
Callee push code (prologue): push the current base pointer; bp = sp; push (allocate) local variables; push callee-saved registers.
Function code.
Callee pop code (epilogue): pop callee-saved registers; pop the callee activation record; pop the old base pointer; pop the return address and jump to it.
Caller pop code: pop the parameters; pop caller-saved registers.

22 Call Sequences – the same sequence on a Pentium:
Caller push code:
  push %ecx        # save a caller-saved register
  push $21         # push actual parameters (in reverse order)
  push $42
  call _foo        # push the return address and jump to _foo
Callee push code (prologue):
  push %ebp        # save the current base pointer
  mov %esp, %ebp   # bp = sp
  sub $8, %esp     # allocate local variables
  push %ebx        # save a callee-saved register
  (function code)
Callee pop code (epilogue):
  pop %ebx         # restore the callee-saved register
  mov %ebp, %esp   # pop the callee activation record
  pop %ebp         # restore the old base pointer
  ret              # pop the return address and jump to it
Caller pop code:
  add $8, %esp     # pop the parameters
  pop %ecx         # restore the caller-saved register

23 “To Callee-save or to Caller-save?” Which of them implies more efficiency? Callee-saved registers need only be saved when the callee actually modifies their value. Caller saving: – when calling a recursive routine, we want to save before the call; – it assumes nothing about the callee (library, unknown code). Typically, heuristics and conventions are followed.

24 Accessing Stack Variables Use offsets from FP (%ebp). Remember – the stack grows downwards. Above FP = parameters; below FP = locals. Examples – %ebp + 4 = return address; %ebp + 8 = first parameter; %ebp – 4 = first local. (figure: parameters at FP+8 and above, the return address and previous FP at the frame base, locals from FP-4 down to SP)

25 Factorial – fact(int n)
fact:
  pushl %ebp           # save frame pointer
  movl %esp,%ebp       # ebp = esp
  pushl %ebx           # save ebx
  movl 8(%ebp),%ebx    # ebx = n
  cmpl $1,%ebx         # n <= 1 ?
  jle .lresult         # then done
  leal -1(%ebx),%eax   # eax = n-1
  pushl %eax           # push the argument
  call fact            # fact(n-1)
  imull %ebx,%eax      # eax = retv*n
  jmp .lreturn
.lresult:
  movl $1,%eax         # retv = 1
.lreturn:
  movl -4(%ebp),%ebx   # restore ebx
  movl %ebp,%esp       # restore esp
  popl %ebp            # restore ebp
  ret                  # return; result in %eax
(stack at an intermediate point: n at %ebp+8, return address, old %ebp, old %ebx at %ebp-4)
(disclaimer: a real compiler can do better than that)

26 Windows Exploit(s): Buffer Overflow
void foo (char *x) { char buf[2]; strcpy(buf, x); }
int main (int argc, char *argv[]) { foo(argv[1]); }
$ ./a.out abracadabra
Segmentation fault
(figure: memory around foo's frame – previous frame, return address, saved FP, char* x, buf[2]; copying "abracadabra" into buf[2] overruns it and overwrites the saved FP and return address – YMMV)

27 Nested Procedures For example – Pascal, Javascript any routine can have sub-routines any sub-routine can access anything that is defined in its containing scopes or inside the sub-routine itself 27

28 Example: Nested Procedures
program p;
  var x: Integer;
  procedure a;
    var y: Integer;
    procedure b;
    begin … b … end;
    function c;
      var z: Integer;
      procedure d;
      begin … d … end;
    begin … c … end;
  begin … a … end;
begin … p … end.
Possible call sequence: p → a → a → c → b → c → d. What is the address of variable "y" in procedure d?

29 nested procedures Can call a sibling, ancestor, and ancestor’s siblings. But can use variables only of ancestors! B can call C as it is defined in the ancestor A. But if C updates y, it updates A’s y. 29 Procedure A; var y: int; Procedure B; var y: real; begin …. end; Procedure C; begin y:=5; end;

30 Nested procedures Can call a sibling, an ancestor, and an ancestor's siblings. But can use variables only of ancestors! When "c" uses variables from "a", which "a" is it? How do we find the right activation record at runtime? Goal: find the closest activation record of a given nesting level. If a routine of level k uses variables of the same level, it uses its own variables. If it uses variables of level j < k then it must be the last routine called at level j. If a procedure is last at level j on the stack, then it must be an ancestor of the current routine.

31 Finding the Relevant Variable problem: a routine may need to access variables of another routine that contains it statically solution: the access link in the activation record The access link points to the last activation record of the nesting level above it – in our example, access link of d points to activation records of c Access links are created at runtime number of links to be traversed is known at compile time Cost while accessing a variable: traversing pointers from one nesting level to the other. Cost while entering a routine: walk the stack to find the closest routine with one lower nesting level. 31
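A minimal C-style sketch of the access-link walk described above (the Frame layout and names are illustrative, not from the slides): the number of links to traverse is a compile-time constant, so a non-local access compiles to a fixed number of pointer dereferences plus an offset load.

typedef struct Frame {
    struct Frame *access_link;   /* frame of the statically enclosing routine */
    int           locals[8];     /* this routine's variables                  */
} Frame;

/* Access a variable declared `hops` nesting levels above the current routine,
 * at a known offset inside that frame. */
static int load_nonlocal(Frame *current, int hops, int offset) {
    Frame *f = current;
    for (int i = 0; i < hops; i++)   /* hops is known at compile time          */
        f = f->access_link;
    return f->locals[offset];        /* offset is also known at compile time   */
}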

32 Access links (figure): for the program above and the possible call sequence p → a → a → c → b → c → d, the stack holds activation records for a (y), a (y), c (z), b, c (z), d; each record's access link points to the record of its statically enclosing procedure – d's to the most recent c, b's and c's to the most recent a, a's to p.

33 Efficient management of the access links We maintain an array: the display array. Size: max nesting level, Content: D[i] holds a pointer to the closest activation record with nesting level i. When we need to access a variable in a containing method, we can access it easily using the display array. 33
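A minimal sketch in C, assuming a global display array and a saved-display slot in each activation record (names are illustrative):

#define MAX_NESTING 16

typedef struct DFrame {
    struct DFrame *saved_display;   /* previous D[level], kept for restoration */
    int            locals[8];
} DFrame;

static DFrame *display[MAX_NESTING];   /* D[i]: closest frame of nesting level i */

static void on_entry(DFrame *frame, int level) {
    frame->saved_display = display[level];   /* save old D[level] in the new frame  */
    display[level] = frame;                  /* D[level] now points at the new frame */
}

static void on_exit(DFrame *frame, int level) {
    display[level] = frame->saved_display;   /* restore the previous value on return */
}

/* Accessing a variable of an enclosing routine at nesting level j is then
 * simply display[j]->locals[offset], a constant-time operation. */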

34 Managing the display array When a routine of nesting level i is called: save the value of D[i] in the new activation record (for restoration on exit) and update D[i] to point at the new activation record. On returning from the routine, restore the previous value of D[i]. (Slides 35-37 repeat this rule while stepping through an example call sequence s, q(1, 9), q(1, 3), p(1, 3), e(1, 3): each new record saves the display entry of its own nesting level, and that entry is restored on return.)

38 Cost of finding a Variable Without the Display: – Cost while accessing a variable: traversing pointers from one nesting level to the other. – Cost while entering a routine: walk the stack to find the closest routine with one lower nesting level. Using the Display: – Cost while accessing a variable: constant. (check the Display.) – Cost while entering/exiting a routine: constant. (update the Display.) Runtime costs ! 38

39 Activation Records: Summary compile time memory management for procedure data works well for data with well-scoped lifetime – deallocation when procedure returns 39

40 Dynamic Memory Management: Introduction There is a course about this topic: 236780 “Algorithms for dynamic memory management” 40

41 Static and Dynamic Variables Statically allocated variables (e.g., the locals defined in a method) are placed in the activation record on the runtime stack, as explained in the first part of this lecture. Sometimes there is a need for allocation during the run. – E.g., when managing a linked list whose size is not predetermined. This is dynamic allocation. In C, "malloc" allocates a space and "free" says that the program will not use this space anymore. Ptr = malloc(256); /* use Ptr */ free(Ptr);

42 Dynamic Memory Allocation In Java, "new" allocates an object of a given class. – President obama = new President(); But there is no instruction for manually deleting the object. It is automatically reclaimed by a garbage collector when the program "does not need it" anymore. course c = new course(236360); c.class = "TAUB 2"; Faculty.add(c);

43 Manual Vs. Automatic Memory Management Manual memory management lets the programmer decide when objects are deleted. A memory manager in which a garbage collector deletes objects is called automatic. Manual memory management creates severe debugging problems – memory leaks, dangling pointers. In large projects where objects are shared between various components, it is sometimes difficult to tell when an object is not needed anymore. Considered the BIG debugging problem of the 80's. What is the main debugging problem today?

44 Automatic Memory Reclamation When the system "knows" that an object will not be used anymore, it reclaims its space. Telling whether an object will be used after a given line of code is undecidable. Therefore, a conservative approximation is used. An object is reclaimed when the program has "no way of accessing it". Formally, when it is unreachable by a path of pointers from the "root" pointers, to which the program has direct access. – Local variables, pointers on the stack, global (class) pointers, JNI pointers, etc. It is also possible to use code analysis to be more accurate sometimes.

45 What’s good about automatic “garbage collection”? © Erez Petrank45 Software engineering: – Relieves users of the book-keeping burden. – Stronger reliability, fewer bugs, faster debugging. – Code understandable and reliable. (Less interaction between modules.) Security (Java): – Program never gets a pointer to “play with”.

46 Importance Memory is the bottleneck in modern computation. – Time & energy (and space). Optimal allocation (even if all accesses are known in advance to the allocator) is NP-complete, and hard even to approximate. Must be done right for a program to run efficiently. Must be done right to ensure reliability.

47 GC and languages © Erez Petrank47 Sometimes it’s built in: – LISP, Java, C#. – The user cannot free an object. Sometimes it’s an added feature: – C, C++. – User can choose to free objects or not. The collector frees all objects not freed by the user. Most modern languages are supported by garbage collection.

48 Most modern languages rely on GC © Erez Petrank Source: "The Garbage Collection Handbook" by Richard Jones, Anthony Hosking, and Eliot Moss.

49 What's bad about automatic "garbage collection"? © Erez Petrank It has a cost: – Old Lisp systems: around 40%. – Today's Java programs (if the collection is done "right"): 5-15%. Considered a major factor determining program efficiency. Techniques have evolved since the 60's. We will only survey basic techniques.

50 Garbage Collection Efficiency Overall collection time (percentage of running time). Pauses in program run. Space overhead. Cache Locality (efficiency and energy). 50

51 Three classical algorithms Reference counting Mark and sweep (and mark-compact) Copying. The last two are also called tracing algorithms because they go over (trace) all reachable objects. 51

52 Reference counting [Collins 1960] © Erez Petrank52 Recall that we would like to know if an object is reachable from the roots. Associate a reference count field with each object: how many pointers reference this object. When nothing points to an object, it can be deleted. Very simple, used in many systems.

53 Basic Reference Counting © Erez Petrank Each object has an RC field; new objects get o.RC := 1. When p, which points to o1, is modified to point to o2, we execute: o1.RC--, o2.RC++. If o1.RC == 0 then: – delete o1; – decrement o.RC for every "child" o of o1; – recursively delete objects whose RC is decremented to 0. A hedged sketch of this pointer update in C follows.
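In the sketch below, the object layout and names are illustrative; a production collector would avoid deep recursion and add synchronization for parallel programs.

typedef struct Object {
    int            rc;            /* reference count                          */
    struct Object *children[2];   /* outgoing pointers (illustrative layout)  */
} Object;

static void rc_delete(Object *o);

static void rc_dec(Object *o) {
    if (o != NULL && --o->rc == 0)
        rc_delete(o);
}

static void rc_delete(Object *o) {
    for (int i = 0; i < 2; i++)   /* decrement the RC of all children, recursively */
        rc_dec(o->children[i]);
    /* here the object's space would be returned to the allocator */
}

/* Executed for the pointer update p = o2, where p previously pointed to o1. */
static void write_pointer(Object **p, Object *o2) {
    Object *o1 = *p;
    if (o2 != NULL) o2->rc++;     /* o2.RC++ first, so the case o1 == o2 is safe */
    *p = o2;
    rc_dec(o1);                   /* o1.RC--; reclaim recursively if it reaches 0 */
}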

54 A Problem: Cycles © Erez Petrank54 The Reference counting algorithm does not reclaim cycles! Solution 1: ignore cycles, they do not appear frequently in modern programs. Solution 2: run tracing algorithms (that can reclaim cycles) infrequently. Solution 3: designated algorithms for cycle collection. Another problem for the naïve algorithm: requires a lot of synchronization in parallel programs. Advanced versions solve that.

55 The Mark-and-Sweep Algorithm [McCarthy 1960] © Erez Petrank55 Mark phase: – Start from roots and traverse all objects reachable by a path of pointers. – Mark all traversed objects. Sweep phase: – Go over all objects in the heap. – Reclaim objects that are not marked.

56 The Mark-Sweep algorithm © Erez Petrank Traverse the live objects & mark them black. White objects can be reclaimed. (figure: an object graph traversed from the roots/registers. Note! This is not the heap data structure!)

57 Triggering © Erez Petrank Garbage collection is triggered by allocation.
New(A) =
  if free_list is empty
    mark_sweep()
  if free_list is empty
    return ("out-of-memory")
  pointer = allocate(A)
  return (pointer)
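The same triggering logic as a small C sketch; allocate() and mark_sweep() are assumed helpers, not defined here, and the retry-once structure slightly rephrases the pseudocode above.

#include <stddef.h>

extern void *allocate(size_t size);   /* take a chunk off the free list, or NULL */
extern void  mark_sweep(void);        /* the collector from the next slide       */

void *gc_new(size_t size) {
    void *p = allocate(size);
    if (p == NULL) {          /* free list exhausted: collect, then retry once */
        mark_sweep();
        p = allocate(size);
    }
    return p;                 /* NULL here means "out of memory" */
}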

58 Basic Algorithm © Erez Petrank
mark_sweep() =
  for Ptr in Roots
    mark(Ptr)
  sweep()

mark(Obj) =
  if mark_bit(Obj) == unmarked
    mark_bit(Obj) = marked
    for C in Children(Obj)
      mark(C)

sweep() =
  p = Heap_bottom
  while (p < Heap_top)
    if (mark_bit(p) == unmarked) then free(p)
    else mark_bit(p) = unmarked
    p = p + size(p)

59 Properties of Mark & Sweep © Erez Petrank Most popular method today (in a more advanced form). Simple. Does not move objects, so the heap may fragment. Complexity: mark phase – proportional to the live objects (the dominant phase); sweep phase – proportional to the heap size. Termination: each pointer is traversed once. Various engineering tricks are used to improve performance.

60 Mark-Compact During the run objects are allocated and reclaimed. Gradually, the heap gets fragmented. When space is too fragmented to allocate, a compaction algorithm is used. Move all live objects to the beginning of the heap and update all pointers to reference the new locations. Compaction is considered very costly and we usually attempt to run it infrequently, or only partially.

61 An Example: The Compressor A simplistic presentation of the Compressor: Go over the heap and compute for each live object where it moves to – To the address that is the sum of live space before it in the heap. – Save the new locations in a separate table. Go over the heap and for each object: – Move it to its new location – Update all its pointers. Why can’t we do it all in a single heap pass? (In the full algorithm: succinct table, execute the first pass quickly, and parallelization.) 61
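A simplified C sketch of the first pass (names are illustrative; this is not the real Compressor, which uses a succinct table and runs the pass in parallel): each live object's new address is the sum of the live space that precedes it in the heap.

#include <stddef.h>

typedef struct {
    int    marked;     /* set by the preceding mark phase               */
    size_t size;       /* object size in bytes                          */
    size_t new_addr;   /* output: the object's offset after compaction  */
} ObjHeader;

static void compute_new_addresses(ObjHeader *heap[], size_t n_objects) {
    size_t live_before = 0;                 /* running sum of live space */
    for (size_t i = 0; i < n_objects; i++) {
        if (heap[i]->marked) {
            heap[i]->new_addr = live_before;
            live_before += heap[i]->size;
        }
    }
}

/* A second pass then moves each object to new_addr and rewrites its pointers.
 * A single pass is not enough because a pointer may reference an object
 * further ahead in the heap whose new location has not been computed yet. */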

62 Mark Compact Important parameters of a compaction algorithm: – Keep order of objects? – Use extra space for compactor data structures? – How many heap passes? – Can it run in parallel on a multi-processor? We do not elaborate in this intro. 62

63 Copying garbage collection © Erez Petrank63 Heap partitioned into two. Part 1 takes all allocations. Part 2 is reserved. During GC, the collector traces all reachable objects and copies them to the reserved part. After copying the parts roles are reversed: Allocation activity goes to part 2, which was previously reserved. Part 1, which was active, is reserved till next collection. 12
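A hedged C sketch of the copy step (illustrative object layout; written as a recursive, depth-first copy rather than the breadth-first queue a real copying collector would use). The forwarding pointer stored in the old copy ensures every reachable object is copied exactly once and that all references end up pointing at the new copy.

#include <stddef.h>
#include <string.h>

typedef struct Obj {
    struct Obj *forward;      /* non-NULL once the object has been copied  */
    size_t      size;         /* total object size in bytes                */
    struct Obj *fields[2];    /* outgoing pointers (illustrative layout)   */
} Obj;

static char *free_ptr;        /* next free byte in the reserved part       */

static Obj *copy_object(Obj *o) {
    if (o == NULL)
        return NULL;
    if (o->forward == NULL) {                /* not copied yet              */
        Obj *copy = (Obj *)free_ptr;
        memcpy(copy, o, o->size);            /* copy into the reserved part */
        free_ptr += o->size;
        o->forward = copy;                   /* leave a forwarding pointer  */
        for (int i = 0; i < 2; i++)          /* then copy what it reaches   */
            copy->fields[i] = copy_object(o->fields[i]);
    }
    return o->forward;                       /* roots are updated to this   */
}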

64 Copying garbage collection © Erez Petrank (figure: Part I contains objects A, B, C, D, E, with the roots pointing into Part I; Part II is reserved)

65 The collection copies… © Erez Petrank (figure: the reachable objects A and C are copied from Part I to Part II)

66 Roots are updated; Part I reclaimed. © Erez Petrank (figure: the roots now reference the copies of A and C in Part II)

67 Properties of Copying Collection © Erez Petrank67 Compaction for free Major disadvantage: half of the heap is not used. “Touch” only the live objects – Good when most objects are dead. – Usually most new objects are dead, and so there are methods that use a small space for young objects and collect this space using copying garbage collection.

68 A very simplistic comparison (Copying / Mark & Sweep / Reference Counting)
Complexity: live objects / size of heap (live objects) / pointer updates + dead objects
Space overhead: half heap wasted / bit per object + stack for DFS / count per object + stack for DFS
Compaction: for free / additional work / –
Pause time: long / long / mostly short
More issues: – / – / cycle collection

69 Modern Memory Management Considers standard program properties. Handle parallelism: – Stop the program and collect in parallel on all available processors. – Run collection concurrently with the program run. Cache consciousness. Real-time. 69

70 Some terms to be remembered © Erez Petrank70 Heap, objects Allocate, free (deallocate, delete, reclaim) Reachable, live, dead, unreachable Roots Reference counting, mark and sweep, copying, compaction, tracing algorithms Fragmentation

71 Recap Lexical analysis – regular expressions identify tokens (“words”) Syntax analysis – context-free grammars identify the structure of the program (“sentences”) Contextual (semantic) analysis – type checking defined via typing judgements – can be encoded via attribute grammars – Syntax directed translation Intermediate representation – many possible IRs; generation of intermediate representation; 3AC; backpatching Runtime: – services that are always there: function calls, memory management, threads, etc. 71

72 Journey inside a compiler (pipeline: Lexical Analysis → Syntax Analysis → Sem. Analysis → Inter. Rep. → Code Gen.) Source program: float position; float initial; float rate; position = initial + rate * 60. The lexical analyzer produces the token stream.

73 Journey inside a compiler (Syntax Analysis) Grammar: S → ID = E; E → ID | E + E | E * E | NUM. (figure: the AST for the assignment, with nodes =, +, * and 60) Symbol table (id, symbol, type, data): 1, position, float, …; 2, initial, float, …; 3, rate, float, …

74 Journey inside a compiler (Sem. Analysis) (figure: the AST before and after an inttofloat node is inserted above 60) Coercion: automatic conversion from int to float inserted by the compiler. Symbol table (id, symbol, type): 1, position, float; 2, initial, float; 3, rate, float.

75 Journey inside a compiler (Inter. Rep.)
3AC: t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3
Production / semantic rule:
S → id = E : S.code := E.code || gen(id.var ':=' E.var)
E → E1 op E2 : E.var := freshVar(); E.code := E1.code || E2.code || gen(E.var ':=' E1.var 'op' E2.var)
E → inttofloat(num) : E.var := freshVar(); E.code := gen(E.var ':=' inttofloat(num))
E → id : E.var := id.var; E.code := ''
(for brevity, the bubbles in the figure show only the code generated by each node and not the whole accumulated "code" attribute)
Note the structure: translate E1, translate E2, handle the operator.

76 Journey inside a compiler (Inter. Rep.) 3AC: t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3. Optimized 3AC: t1 = id3 * 60.0; id1 = id2 + t1. The value 60 is known at compile time, so code can be generated with the converted value; the temporary t3 is eliminated.

77 Journey inside a compiler (Code Gen.) Optimized 3AC: t1 = id3 * 60.0; id1 = id2 + t1. Generated code:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1

78 Problem 3.8 from [Appel] A simple left-recursive grammar: E → E + id; E → id. A simple right-recursive grammar accepting the same language: E → id + E; E → id. Which has better behavior for shift-reduce parsing?

79 Answer (left recursive: E → E + id, E → id; input: id+id+id+id+id) The stack never has more than three items on it. In general, with LR-parsing of left-recursive grammars, an input string of length O(n) requires only O(1) space on the stack. Stack contents during the parse:
id (reduce)
E
E +
E + id (reduce)
E
E +
E + id (reduce)
E
E +
E + id (reduce)
E
E +
E + id (reduce)
E

80 Answer (right recursive: E → id + E, E → id; input: id+id+id+id+id) The stack grows as large as the input string. In general, with LR-parsing of right-recursive grammars, an input string of length O(n) requires O(n) space on the stack. Stack contents during the parse:
id
id +
id + id
id + id +
id + id + id
id + id + id +
id + id + id + id
id + id + id + id +
id + id + id + id + id (reduce)
id + id + id + id + E (reduce)
id + id + id + E (reduce)
id + id + E (reduce)
id + E (reduce)
E

