Implementing Subprograms
What actions must take place when subprograms are called and when they terminate?
- Calling a subprogram has several associated actions:
  - the calling subprogram’s environment must be saved: its local variables, execution status, and return location
  - handle parameter passing to the called subprogram
  - allocation and storage of the called subprogram’s local variables
  - transfer control to the subprogram
- Subprogram termination then has the following actions:
  - return parameters that are out-mode
  - deallocation of local variables
  - return value from the subprogram (if it is a function)
  - restore the calling subprogram’s environment
  - transfer control to the calling subprogram at the proper place
- Generically, this is known as subprogram linkage; in most languages it is performed through the run-time stack by using activation record instances
FORTRAN I-77
- We examine FORTRAN first since it is the simplest
  - early FORTRAN had no recursion, and all memory allocation was set up at compile time, so the memory locations of all variables and parameters are known at compile time
- Subprogram call
  - save execution status of the current program unit
  - carry out parameter passing
    - pass by copy required copying parameters
    - pass by reference required substituting an address for the value
  - pass the return address to the called subprogram
  - start execution of the subprogram
- Subprogram return
  - if there are pass-by-result parameters, copy their current values to the corresponding actual parameters
  - if the subprogram is a function, its value is moved to a place accessible by the caller
  - execution status of the caller is restored
  - control is transferred back to the caller
Activation Records
- The FORTRAN compiler generates an AR for each subprogram, which stores
  - local variables
  - parameters
  - return address
  - functional value (for functions)
- With each AR generated, the compiler is able to determine the amount of storage space needed for each subprogram
  - parameter passing is a matter of copying values from one area of memory to another
  - transferring control is merely a matter of changing the PC based on subprogram start addresses or calling subprogram return addresses
  - memory access is efficient because all addresses are known at compile time
ALGOL-like Activation Records
- ALGOL used the run-time stack for ARs
  - this permits recursion, unlike in FORTRAN where there was only a single AR for each subprogram
  - the run-time stack was used in FORTRAN only to denote where to return to after subprograms terminate
- The ALGOL approach was to have the compiler generate an AR template for every subprogram
  - upon a subprogram call, an instance of the template is generated by the run-time environment, pushed onto the run-time stack, and the stack pointer adjusted to point at the new top of stack
  - an activation record instance (ARI) will contain
    - local variables
    - parameters
    - return location
    - return value (if a function)
    - links to connect to the rest of the run-time stack
- Why does the FORTRAN approach not permit recursion? Consider the following C code:
    int fact(int x) {
        if (x > 1) return x * fact(x - 1);
        else return 1;
    }
- If we had only one activation record for x, that is, only static memory that stores x, and we called fact(3), we would have this situation:
  - x = 3, perform x * fact(2)
  - x becomes 2, perform x * fact(1)
  - x becomes 1, return 1
  - each pending multiplication now computes x * 1 with x = 1
  - Thus, fact(3) returns 1 instead of 6
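The failure described above can be reproduced in C by forcing the parameter into one static location. This is only a minimal sketch: the names `x_slot`, `fact_static`, and `fact_stack` are hypothetical and do not appear in the original, and the temporary `r` still lives on C's own stack, standing in for the pending multiplication.

```c
/* Hypothetical illustration: x_slot plays the role of the single,
   statically allocated activation record slot for the parameter x. */
static int x_slot;

/* fact with one static slot for x: every call overwrites the same memory. */
int fact_static(int x) {
    x_slot = x;                           /* "pass" the parameter into the one slot */
    if (x_slot > 1) {
        int r = fact_static(x_slot - 1);  /* the nested call overwrites x_slot */
        return x_slot * r;                /* x_slot now holds 1, not this call's x */
    }
    return 1;
}

/* fact with ordinary stack-dynamic parameters: one x per activation. */
int fact_stack(int x) {
    if (x > 1) return x * fact_stack(x - 1);
    return 1;
}
```

Calling `fact_static(3)` yields 1, while `fact_stack(3)` yields 6, because only the second version gets a fresh `x` for each activation.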
ALGOL ARI
- Static Link
  - points to the bottom of the ARI for the static parent
  - used for non-local variable access
- Dynamic Link
  - points to the top of the ARI of the caller
  - used to destroy the current ARI at the end of the subprogram
- Parameters
  - store space for every parameter (whether a value or a pointer)
- Local variables
  - store space for each local variable
  - stack-dynamic storage allocated at run-time but compile-time type checked
  - recursion available by creating an instance for each recursive call
- ARI layout, from top to bottom: Local variables, Parameters, Dynamic Link, Static Link, Return Address
- NOTE: In C-languages, the static link will point to main’s data since there are no other static parents
Example of ARI for C function
    void sub(float total, int part) {
        int list[4];
        float sum;
        …
    }
- 2 parameters (a float and an int)
- 2 variables, a 4-element int array and a float
- Return address (where in the code to return to when sub terminates)
- Dynamic link points to the next ARI on the stack so that this ARI can be popped off
- Static link points to the static parent (main) for non-local variable references
- void function, so no space is needed for a return value
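One way to picture this ARI is as a C struct whose fields are listed from the bottom of the record upward. This is a hedged sketch only: the field names, the use of `void *` for the three bookkeeping entries, and the helper `sum_offset` are assumptions for illustration, and a real compiler chooses its own layout and padding.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical layout of one ARI for sub(), bottom of the record first. */
struct sub_ari {
    void  *return_address;  /* where to resume in the caller */
    void  *static_link;     /* ARI of the static parent (main) */
    void  *dynamic_link;    /* top of the caller's ARI */
    float  total;           /* parameter */
    int    part;            /* parameter */
    int    list[4];         /* local variable */
    float  sum;             /* local variable */
};

/* Byte offset of the local variable sum from the bottom of the record;
   a compiler would compute such an offset once, at compile time. */
size_t sum_offset(void) {
    return offsetof(struct sub_ari, sum);
}
```

There is no field for a return value because `sub` is a `void` function.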
Example Without Recursion
    void A(int x) {
        int y;
        ...
        C(y);
    }
    void B(float r) {
        int s, t;
        A(s);
    }
    void C(int q) {
        ...
    }
    void main() {
        float p;
        B(p);
    }
The run-time stack starts with just main. When B is called, its ARI is pushed onto the stack with its dynamic link pointing to the previous top of the run-time stack. The CPU’s stack pointer register is reset to point at the top of B’s ARI. The dynamic link makes it easy to pop B’s ARI off the stack: we just reset the stack pointer to equal B’s dynamic link. We will discuss the static link later in the chapter.
When B calls A, A’s ARI is pushed onto the run-time stack and its dynamic link points to the previous top of the stack. We reset the stack pointer to point at the top of A’s ARI. The stack is similarly affected when C is called.
When C terminates, we reset the stack pointer from Top to C’s dynamic link; C’s ARI is then no longer accessible, and we are again referencing A’s ARI. When A terminates, we reset the stack pointer to A’s dynamic link (the top of B’s ARI). When B terminates, we reset the stack pointer to B’s dynamic link, which is the top of main’s ARI.
Figures: the stack after main calls B, after B calls A, and after A calls C
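The push and pop steps described above can be sketched with a small array-based stack. This is an illustration under a simplifying assumption, not how any particular compiler does it: every ARI occupies exactly one array element, so the "top of an ARI" is just its index. The names `call`, `ret`, and `replay_example` are hypothetical.

```c
#include <assert.h>

#define MAX_ARIS 16

struct ari {
    const char *name;   /* subprogram this instance belongs to */
    int dynamic_link;   /* index of the caller's ARI, -1 if none */
};

static struct ari stack[MAX_ARIS];
static int top = -1;    /* stack pointer: index of the current ARI */

/* Call: push a new ARI whose dynamic link records the previous top. */
void call(const char *name) {
    assert(top + 1 < MAX_ARIS);
    int previous_top = top;
    top = top + 1;
    stack[top].name = name;
    stack[top].dynamic_link = previous_top;
}

/* Return: pop by resetting the stack pointer to the dynamic link. */
void ret(void) {
    assert(top >= 0);
    top = stack[top].dynamic_link;
}

/* Replays main -> B -> A -> C, lets C terminate, and reports whose
   ARI is on top afterwards. */
const char *replay_example(void) {
    top = -1;
    call("main");
    call("B");
    call("A");
    call("C");
    ret();                      /* C terminates */
    return stack[top].name;     /* A's ARI is current again */
}
```

After `replay_example()`, further calls to `ret()` walk back through B's ARI to main's, exactly following the dynamic links.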
Example With Recursion (part I)
    int factorial(int n) {
        if (n <= 1) return 1;
        else return n * factorial(n - 1);
    }
    void main() {
        int value;
        value = factorial(3);
    }
Point 1, Point 2, and Point 3 mark positions in the code that the stack figures refer to.
Here is an example to demonstrate how recursion works with the run-time stack. Notice in this example how the parameter, n, changes from instance to instance.
Figure: stack contents at point 1 during each recursive call
Example With Recursion (part II)
Figure: stack contents at point 2 as each recursive call completes
Here we complete the example. You can also see the return value from each function call set into place; this value is passed back and used in the return statement.
Figure: stack contents at point 3
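Since the stack figures are not reproduced here, the same behaviour can be observed by instrumenting the function. The sketch below is an assumption-laden illustration: the arrays `n_on_entry` and `value_on_return` and the counters are hypothetical additions that record what each activation sees, and they are not part of the original example.

```c
#include <assert.h>

#define MAX_DEPTH 16

static int n_on_entry[MAX_DEPTH];       /* parameter n seen by each activation */
static int value_on_return[MAX_DEPTH];  /* value each activation hands back */
static int calls = 0;
static int returns = 0;

int factorial(int n) {
    assert(calls < MAX_DEPTH);
    n_on_entry[calls++] = n;            /* a new ARI with its own n exists here */
    int result = (n <= 1) ? 1 : n * factorial(n - 1);
    value_on_return[returns++] = result; /* return value set into place */
    return result;
}
```

For `factorial(3)` the recorded parameters are 3, 2, 1 (one per ARI pushed), and the recorded return values are 1, 2, 6 (one per ARI popped).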
Non-Local Variable References
- Assume in some subprogram, a reference is made to a non-local variable
  - how do we determine what is being referenced?
  - non-local variables will be stored either in static memory (if the variable is global or declared static) or on the run-time stack
  - if on the run-time stack, which ARI do we check?
    - a top-down search through the ARIs would be inefficient; instead, the compiler can determine where the variable is stored using scope rules, and set up a pointer directly
  - in C/C++/Java, subprograms are not nested, so non-local references would be global variables stored in static memory
  - in non-C languages (notably Ada/Pascal-like languages), subprograms can be nested, and the nestedness of the subprograms provides the information needed to find the non-local variable
- Two methods: Static Chains, Displays
Why use non-local variables in a subprogram? There is NO GOOD REASON to do this, but it does not stop programmers from doing this. By referencing a variable that is not local and not passed as a parameter, the reference effectively makes the variable global. It may not be known throughout all subprograms, but it is known in more than one place. It is dangerous to do this as it creates an alias, and it also greatly harms readability and reliability. Here’s an example of how this might work in a Pascal-like subprogram:
    Procedure outer(a : integer)
    var b : integer;
      Procedure inner(c : integer)
      var a : integer;
      begin
        …
        a := c * b;   // a refers to inner’s local variable
        …             // c refers to inner’s parameter
      end;            // b refers to outer’s local variable and is thus a non-local reference
    begin
    end;
Static Chains
- The compiler can determine for any given subprogram which subprogram is its static parent
  - the static link in the ARI points to this static parent
- The compiler can then determine how many static links must be taken to track down a given reference
  - for instance, assume
    - Main contains SubA, so SubA has a static link to Main
    - SubA contains SubB, so SubB has a static link to SubA
    - Main has declared x and neither SubA nor SubB has declared x
  - if SubB references x, then x is found by following 2 static links (from SubB to SubA and from SubA to Main) to reach Main’s ARI
- A static chain is the information needed to resolve the reference and consists of:
  - chain offset: the number of static links to follow (determined by static scope and nestedness)
  - local offset: the position in the subprogram’s ARI of this variable (starting from the bottom of this ARI)
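Resolving a (chain offset, local offset) pair amounts to following that many static links and then indexing into the ARI reached. The following is a minimal sketch of that idea; the `struct ari` representation, the function names, and the choice of local offset 3 for x are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define ARI_SLOTS 8

struct ari {
    struct ari *static_link;   /* ARI of the static parent, NULL for Main */
    int slots[ARI_SLOTS];      /* storage addressed by local offset */
};

/* Follow chain_offset static links, then index by local_offset. */
int *resolve(struct ari *current, int chain_offset, int local_offset) {
    assert(local_offset >= 0 && local_offset < ARI_SLOTS);
    while (chain_offset > 0) {
        assert(current->static_link != NULL);
        current = current->static_link;
        chain_offset--;
    }
    return &current->slots[local_offset];
}

/* Main declares x; SubA is nested in Main and SubB in SubA.
   SubB reads x through the static chain (2, 3). */
int read_x_from_subb(int x_value) {
    struct ari main_ari = { NULL, {0} };
    struct ari suba_ari = { &main_ari, {0} };
    struct ari subb_ari = { &suba_ari, {0} };
    main_ari.slots[3] = x_value;        /* assumed local offset of x */
    return *resolve(&subb_ari, 2, 3);
}
```

The cost of each access grows with the chain offset, which is the inefficiency that displays are meant to avoid.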
Ada Example
    program Main_2;
      var X : integer;
      procedure Bigsub;
        var A, B, C : integer;
        procedure Sub1;
          var A, D : integer;
          begin { Sub1 }
            A := B + C;     <----------1
          end; { Sub1 }
        procedure Sub2(X : integer);
          var B, E : integer;
          procedure Sub3;
            var C, E : integer;
            begin { Sub3 }
              Sub1;
              E := B + A;   <--------2
            end; { Sub3 }
          begin { Sub2 }
            Sub3;
            A := D + E;     <----------3
          end; { Sub2 }
        begin { Bigsub }
          Sub2(7);
        end; { Bigsub }
      begin
        Bigsub;
      end. { Main_2 }
Figure: the stack at position 1
Static chains:
- Position 1: A = (0, 3), B = (1, 4), C = (1, 5)
- Position 2: E = (0, 4), B = (1, 4), A = (2, 3)
- Position 3: A = (1, 3), D = error, E = (0, 5)
Position 1 is in Sub1. The reference to A is to Sub1’s local variable A, which we can access through 0 static links, so it has a chain offset of 0. The references to B and C are to Bigsub’s local variables B and C. Since Sub1 is nested inside of Bigsub, we need to follow 1 static link to get to Bigsub and thus to B and C, so they have chain offsets of 1. The local offsets are equal to their relative positions in their ARIs (starting at the bottom of the ARI), so A in Sub1 has a local offset of 3, and B and C in Bigsub have local offsets of 4 and 5 respectively.
Position 2 is in Sub3, which is nested inside Sub2, which is nested inside Bigsub. E is local to Sub3, so it has a chain offset of 0 and a local offset of 4. B is in Sub2, so it has a chain offset of 1 and a local offset of 4. A is in Bigsub, so it has a chain offset of 2 and a local offset of 3.
Position 3 is in Sub2, which is nested inside of Bigsub. A is in Bigsub, so it has a chain offset of 1 and a local offset of 3. E is in Sub2, so it has a chain offset of 0 and a local offset of 5. D, however, is declared only in Sub1, and Sub2 is not nested inside of Sub1 at all, so this is an erroneous reference and will cause a compile-time error.
NOTE: Main_2 calls Bigsub which calls Sub2 which calls Sub3 which calls Sub1
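The chain offsets listed for positions 1 to 3 equal the difference between the static nesting depth of the referencing subprogram and that of the declaring subprogram. The sketch below checks this; the depth constants are assumptions read off the program's nesting (Main_2 at depth 0), and the formula applies only when the declaring subprogram is a static ancestor of, or the same as, the referencing one.

```c
/* Static nesting depths for the example program (assumed: Main_2 is 0). */
enum {
    DEPTH_MAIN_2 = 0,
    DEPTH_BIGSUB = 1,
    DEPTH_SUB1   = 2,
    DEPTH_SUB2   = 2,
    DEPTH_SUB3   = 3
};

/* Chain offset = depth of the referencing subprogram
   minus depth of the declaring subprogram. */
int chain_offset(int referencing_depth, int declaring_depth) {
    return referencing_depth - declaring_depth;
}
```

For example, the reference to A at position 2 is made in Sub3 and declared in Bigsub, so `chain_offset(DEPTH_SUB3, DEPTH_BIGSUB)` gives 2, matching A = (2, 3).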
Blocks and Efficiency
- Blocks can have their own local variables
  - a good compiler can optimize the AR based on the scope of the local variables declared in blocks
  - in the example code below, a/b and g/f are used in different blocks and so can share the same stack space
    int main() {
        int x, y, z;
        while() {
            int c, d, e;
            int a, b;
            …
        }
        int g, f;
    }
Displays
- Static chains are easy to generate at compile-time but they are inefficient at run-time
  - the number of static links that might need to be followed to access a variable depends on the degree of nestedness of the subprogram
  - this could be any arbitrary amount in a language like Ada
- An alternative approach is to use displays: collect all static links into an array
  - at any time, the contents of this array are the addresses of the accessible ARIs on the stack
  - a display offset value is used to link to the correct ARI and then a local offset is used to find the location of the variable
  - every subprogram call and return requires modification of the display to reflect the new static scope situation
  - this approach is also costly at run-time, but it only requires modification when a subprogram is called or terminates
  - non-local references can be performed by following only one link
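A common way to maintain a display, sketched here under stated assumptions, is to index it by static depth: on entry a subprogram saves the entry at its own depth and installs its ARI there, and on exit it restores the saved entry. The names `enter`, `leave`, `access_var`, and `demo_display`, the depth-indexed scheme, and the slot numbers are illustrative choices, not taken from the original.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATIC_DEPTH 8
#define ARI_SLOTS 8

struct ari {
    int static_depth;          /* nesting depth of the subprogram */
    struct ari *saved_entry;   /* display entry this activation replaced */
    int slots[ARI_SLOTS];      /* storage addressed by local offset */
};

static struct ari *display[MAX_STATIC_DEPTH];

/* Call: save the old entry at this depth, then install the new ARI. */
void enter(struct ari *a) {
    assert(a->static_depth >= 0 && a->static_depth < MAX_STATIC_DEPTH);
    a->saved_entry = display[a->static_depth];
    display[a->static_depth] = a;
}

/* Return: restore the entry that was replaced on entry. */
void leave(struct ari *a) {
    display[a->static_depth] = a->saved_entry;
}

/* Any variable is reached with one display lookup plus a local offset. */
int *access_var(int static_depth, int local_offset) {
    assert(display[static_depth] != NULL);
    return &display[static_depth]->slots[local_offset];
}

/* Main_2 (depth 0) -> Bigsub (1) -> Sub2 (2) -> Sub1 (2). */
int demo_display(void) {
    struct ari main2 = { 0, NULL, {0} }, bigsub = { 1, NULL, {0} };
    struct ari sub2 = { 2, NULL, {0} }, sub1 = { 2, NULL, {0} };
    enter(&main2);
    enter(&bigsub);
    enter(&sub2);
    bigsub.slots[4] = 10;               /* B in Bigsub, local offset 4 */
    enter(&sub1);                       /* Sub1 takes over display[2], hiding Sub2 */
    int hidden = (display[2] == &sub1);
    int b = *access_var(1, 4);          /* Sub1 reads Bigsub's B in one step */
    leave(&sub1);
    int restored = (display[2] == &sub2);
    leave(&sub2);
    leave(&bigsub);
    leave(&main2);
    return (hidden && restored) ? b : -1;
}
```

The work moves from each variable access to each call and return, which is the trade-off the bullets above describe.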
Display Example
- Using our previous code, we see how the Display and run-time stack change during the execution of the program
  - Main_2 calls Bigsub, which calls Sub2
  - Sub2 calls Sub1, which “hides” Sub2
  - Return to Sub2, then Sub2 calls Sub3
- From point 1, A = B + C
  - A (0, 3)
  - B (2, 3) (down 2 in Display)
  - C (1, 3) (down 1 in Display)
- Now, assume that Sub1 contains a nested subprogram, Sub4, where Main_2 calls Bigsub calls Sub2 calls Sub3 calls Sub1 calls Sub4
  - Figures: before Sub1 calls Sub4 and after Sub1 calls Sub4
  - Dotted line means pointer currently inactive (unavailable)
Implementing Dynamic Scoping
- Reference to non-local variables is determined based on the order of subprogram calls, not their spatial relationship as in static scoping
- Two implementation methods
  - Deep Access
    - similar to static chains except dynamic links are followed
    - there are no static links, and the distance traversed cannot be determined at compile time
  - Shallow Access
    - in dynamic scoping, if variables share the same name, only the most recently declared one is currently active
    - shallow access uses a separate stack for each variable name; the given variable’s stack is modified after each function call/return, and access to the variable is always from the top of its stack
- Don’t confuse deep/shallow access and deep/shallow binding
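Shallow access can be sketched as one small stack per variable name, pushed when a subprogram declaring that name is entered and popped when it returns. This is only an illustration: the type `name_stack`, the functions `push_decl`, `pop_decl`, and `visible_owner`, and the "caller"/"callee" scenario are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE 8

/* One stack per variable name; each entry records which activation
   declared the instance. The top entry is the visible one. */
struct name_stack {
    const char *owners[MAX_ACTIVE];
    int count;
};

/* On entry to a subprogram that declares this name. */
void push_decl(struct name_stack *s, const char *owner) {
    assert(s->count < MAX_ACTIVE);
    s->owners[s->count++] = owner;
}

/* On return from that subprogram. */
void pop_decl(struct name_stack *s) {
    assert(s->count > 0);
    s->count--;
}

/* Access never searches: it always reads the top of the name's stack. */
const char *visible_owner(const struct name_stack *s) {
    return s->count > 0 ? s->owners[s->count - 1] : NULL;
}

/* A caller and its callee both declare v; returns 1 if the visible v
   is the callee's while it runs and the caller's after it returns. */
int demo_shallow(void) {
    static const char caller[] = "caller", callee[] = "callee";
    struct name_stack v = { {NULL}, 0 };
    push_decl(&v, caller);
    push_decl(&v, callee);
    int callee_visible = (visible_owner(&v) == callee);
    pop_decl(&v);
    int caller_visible = (visible_owner(&v) == caller);
    return callee_visible && caller_visible;
}
```

Compared with deep access, references are cheap, while every call and return pays for updating the stacks of the names it declares.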
Dynamic Scope Example
    void sub3( ) {
        int x, z;
        x = u + v;
        …
    }
    void sub2( ) {
        int w, x;
        …
    }
    void sub1( ) {
        int v, w;
        …
    }
    void main( ) {
        int v, u;
        …
    }
Assume main calls sub2 which calls sub1 which calls sub2 which calls sub3.
The run-time stack using deep access is shown to the right, whereas the shallow access is shown below, at the point where sub3 is active.
We see that v is from sub1 (the most recent sub1) and w is from sub2.
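With the stack figures missing, deep access for this call sequence can be sketched as a search down the dynamic links from the top ARI. The representation below (each ARI carrying the names of its two locals) and the function names `owner_of` and `demo_deep` are assumptions for illustration; a real implementation would store values, not just names.

```c
#include <stddef.h>
#include <string.h>

#define LOCALS 2

struct ari {
    const char *owner;          /* subprogram this ARI belongs to */
    struct ari *dynamic_link;   /* caller's ARI, NULL for main */
    const char *locals[LOCALS]; /* names declared by this activation */
};

/* Deep access: walk the dynamic chain until some ARI declares the name. */
const char *owner_of(const struct ari *top, const char *name) {
    for (const struct ari *a = top; a != NULL; a = a->dynamic_link)
        for (int i = 0; i < LOCALS; i++)
            if (strcmp(a->locals[i], name) == 0)
                return a->owner;
    return NULL;                /* name not declared anywhere on the stack */
}

/* Builds the stack for main -> sub2 -> sub1 -> sub2 -> sub3 and
   resolves a name as seen from sub3. */
const char *demo_deep(const char *name) {
    struct ari main_ari = { "main", NULL,      { "v", "u" } };
    struct ari sub2_a   = { "sub2", &main_ari, { "w", "x" } };
    struct ari sub1_ari = { "sub1", &sub2_a,   { "v", "w" } };
    struct ari sub2_b   = { "sub2", &sub1_ari, { "w", "x" } };
    struct ari sub3_ari = { "sub3", &sub2_b,   { "x", "z" } };
    return owner_of(&sub3_ari, name);
}
```

From sub3, `demo_deep("v")` finds sub1's v and `demo_deep("w")` finds the most recent sub2's w, in line with the slide; `demo_deep("u")` has to search all the way down to main.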