CS 3304 Comparative Languages Lecture 13: Subroutines and Control Abstraction 28 February 2012
Introduction Abstraction: a process by which the programmer can associate a name with a potentially complicated program fragment that can be thought of in terms of its purpose, rather than in terms of its implementation: Control abstraction: performs a well-defined operation. Data abstraction: representation of information. Subroutines are the principal mechanism for control abstraction: Mostly parameterized: Actual parameters: arguments passed into a subroutine. Formal parameters: parameters in the subroutine definition. Function: a subroutine that returns a value. Procedure: a subroutine that does not return a value. Subroutines are usually declared before being used.
Subroutine Call Stack Each routine, as it is called, is given a new stack frame (activation record) at the top of the stack. A frame may contain arguments and/or return values, bookkeeping information (including the return address and saved registers), local variables, and/or temporaries. When a subroutine returns, its frame is popped from the stack. Stack pointer register: contains the address of either the last used location at the top of the stack, or the first unused location, depending on convention. Frame pointer register: contains an address within the frame. 5. What are the purposes of the stack pointer and frame pointer registers? Why does a subroutine often need both?
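A minimal sketch of the push/pop discipline described on this slide, using a Python list as a toy stack. The names call, ret, and the frame fields are hypothetical and chosen only for illustration; a real implementation manipulates the stack pointer and frame pointer registers directly.

```python
# Toy model of a subroutine call stack (all names are illustrative).
stack = []  # the last element is the top of the stack

def call(name, args):
    """Push a new frame (activation record) for the called routine."""
    frame = {
        "routine": name,
        "args": args,
        "locals": {},
        # Index of the caller's frame, or -1 if there is no caller.
        "dynamic_link": len(stack) - 1,
    }
    stack.append(frame)
    return frame

def ret():
    """Pop the frame of the routine that is returning."""
    return stack.pop()

call("main", {})
call("f", {"x": 1})
call("g", {"y": 2})
depth_at_g = len(stack)            # three routines are active
top_routine = stack[-1]["routine"]
ret()  # g returns
ret()  # f returns
remaining = [fr["routine"] for fr in stack]
```

Frames are created and destroyed strictly in last-in, first-out order, which is why a stack suffices.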
Allocation Strategies Static: Code. Globals. Own variables. Explicit constants (including strings, sets, other aggregates). Small scalars may be stored in the instructions themselves. Stack: Parameters. Local variables. Temporaries. Bookkeeping information: return program counter, dynamic link (saved frame pointer), saved registers, line number, saved display entries, static link. Heap: Dynamic allocation.
Subroutine Nesting Static chains: languages with nested subroutines and static scoping (Pascal, Ada). Objects that lie in lexically surrounding subroutines: neither local nor global. Each stack frame contains a reference to the frame of the lexically surrounding subroutine. Dynamic link: the saved value of the frame pointer, which is restored on subroutine return. The lexically surrounding routine is always active! 3. Describe how to maintain the static chain during a subroutine call.
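A small sketch of a static chain lookup, assuming three routines nested outer > middle > inner. The Frame class and the lookup helper are hypothetical. In a real compiler the hop count is known statically and the links are machine addresses.

```python
class Frame:
    """Toy stack frame; static_link refers to the lexically surrounding frame."""
    def __init__(self, routine, static_link, local_vars):
        self.routine = routine
        self.static_link = static_link
        self.local_vars = local_vars

def lookup(frame, name, hops):
    """Dereference the static chain `hops` times, then read `name` there."""
    for _ in range(hops):
        frame = frame.static_link
    return frame.local_vars[name]

# outer declares x; middle is nested in outer; inner is nested in middle.
outer = Frame("outer", None, {"x": 10})
middle = Frame("middle", outer, {"y": 20})
inner = Frame("inner", middle, {"z": 30})

# From inner, x is neither local nor global: it lives two scopes out.
x_seen_from_inner = lookup(inner, "x", 2)
```

The cost of an access grows with the nesting distance, which is what motivates the display.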
Calling Sequence Maintenance of the stack is the responsibility of the calling sequence and the subroutine prologue and epilogue: Space is saved by putting as much in the prologue (code executed at the beginning) and epilogue (code executed at the end) as possible. Time may be saved by putting stuff in the caller instead, where more information may be known: E.g., there may be fewer registers in use at the point of call than are used somewhere in the callee. Common strategy is to divide registers into caller-saves and callee-saves sets: The caller uses the “callee-saves” registers first, and the “caller-saves” registers only if necessary. Local variables and arguments are assigned fixed offsets from the stack pointer or frame pointer at compile time: Some storage layouts use a separate arguments pointer. 1. What is a subroutine calling sequence? What does it do? What is meant by subroutine prologue and epilogue? 6. Why do RISC machines typically pass subroutine parameters in registers rather than on the stack?
Calling Sequence: Caller Saves any caller-saves registers whose values will be needed after the call. Computes the values of arguments and moves them into the stack or registers. Computes the static link (if this is a language with nested subroutines), and passes it as an extra, hidden argument. Uses a special subroutine call instruction to jump to the subroutine, simultaneously passing the return address on the stack or in a register. 7. Why do subroutine calling conventions often give the caller responsibility for saving half the registers and the callee responsibility for saving the other half?
Calling Sequence: Callee Prologue: Allocates a frame by subtracting an appropriate constant from the stack pointer. Saves the old frame pointer into the stack, and assigns it an appropriate new value. Saves any callee-saves registers that may be overwritten by the current routine (including the static link and return address, if they were passed in registers). Epilogue: Moves the return value (if any) into a register or a reserved location in the stack. Restores callee-saves registers if needed. Restores the frame pointer and the stack pointer. Jumps back to the return address. After the call returns, the caller: Moves the return value to wherever it is needed. Restores caller-saves registers if needed. 8. If work can be done in either the caller or the callee, why do we typically prefer to do it in the callee?
Typical Stack Frame The stack usually grows downward toward lower addresses. Arguments are accessed as positive offsets from the frame pointer. Local variables and temporaries are accessed at negative offsets from the frame pointer. Arguments to be passed to called routines are assembled at the top of the frame using positive offsets from the stack pointer. 9. Why do compilers typically allocate space for arguments in the stack, even when they pass them in registers?
Special-Case Optimizations Many parts of the calling sequence, prologue, and/or epilogue can be omitted in common cases. Particularly leaf routines (those that don't call other routines): Leaving things out saves time. Simple leaf routines don't use the stack - don't even use memory - and are exceptionally fast. Display: With static chains, access to an object in a scope k levels out requires that the static chain be dereferenced k times. This number can be reduced to a constant by use of a display, a small array that replaces the static chain: The j-th element of the display contains a reference to the frame of the most recently active subroutine at lexical nesting level j. From a routine at nesting level i, an object k levels out can be found using the address stored in element j = i - k of the display. 4. What is a display? How does it differ from a static chain? 10. List the optimizations that can be made to the subroutine calling sequence in important special cases (e.g., leaf routines).
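A sketch of display maintenance and lookup, assuming routines at nesting levels 1, 2, and 3 and frames modeled as dictionaries. The helpers enter, leave, and access are hypothetical names. The point is only that an access k levels out costs one indexed load, whatever the value of k.

```python
# display[j] holds the frame of the most recently active routine at level j.
display = [None] * 4   # levels 0..3; the size is arbitrary for this sketch

def enter(level, frame):
    """Install `frame` at its nesting level; return the entry it displaces."""
    saved = display[level]
    display[level] = frame
    return saved

def leave(level, saved):
    """Restore the display entry that was saved on entry."""
    display[level] = saved

def access(current_level, k, name):
    """Read `name` k levels out from a routine at `current_level`."""
    return display[current_level - k][name]

saved1 = enter(1, {"x": 10})   # outer, level 1
saved2 = enter(2, {"y": 20})   # middle, level 2
saved3 = enter(3, {"z": 30})   # inner, level 3

x_seen_from_inner = access(3, 2, "x")   # element j = 3 - 2 = 1
```

Saving the displaced entry on entry and restoring it on exit corresponds to the "saved display entries" kept in the frame's bookkeeping information.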
CISC versus RISC Compilers for CISC machines tend to pass arguments on the stack; compilers for RISC machines tend to pass arguments in registers. Compilers for CISC machines usually dedicate a register to the frame pointer; compilers for RISC machines often do not. Compilers for CISC machines often rely on special-purpose instructions to implement parts of the calling sequence; available instructions on a RISC machine are typically much simpler. 2. How do calling sequences typically differ in CISC and RISC compilers?
Other Improvements Register windows - a hardware mechanism, an alternative to saving and restoring registers on subroutine calls: Map the instruction set architecture (ISA) limited set of register names onto some subset (window) of a much larger collection. Old and new mappings may overlap: this supports argument passing. In-Line Expansion: a copy of the “called” routine becomes a part of the “caller” - no actual subroutine call occurs. Avoids overheads such as space allocation, branch delays (call and return), maintaining static chain/display, saving/restoring registers. Allows the compiler to perform code improvement such as global register allocation, instruction scheduling, etc. Usually the compiler chooses which subroutines to expand in-line. A programmer can suggest (C++, C99). It is semantically neutral: no effect on the meaning of the program. Increases the code size. 11. How does an in-line subroutine differ from a macro? 12. Under what circumstances is it desirable to expand a subroutine in-line?
Parameter Passing Most subroutines are parameterized. Most languages use a prefix notation for calls to user-defined subroutines - the subroutine name followed by a parenthesized argument list: Lisp - the function name goes inside the parentheses: (max a b). ML - names can be defined as infix operators. Lisp/Smalltalk - user-defined subroutines use the same style of syntax as built-in operators. Examples: Pascal: if a > b then max := a else max := b; Lisp: (if (> a b) (setf max a) (setf max b)) Smalltalk: (a > b) ifTrue: [max <- a] ifFalse: [max <- b]. 13. What is the difference between formal and actual parameters?
Parameter Modes Parameter-passing mode and related semantic details are heavily influenced by implementation issues. The two most common parameter-passing modes (mostly for languages with a value model of variables): Call-by-value: each actual parameter is assigned into the corresponding formal parameter when a subroutine is called and then the two are independent. Call-by-reference: each formal parameter introduces, within the body of the subroutine, a new name for the corresponding actual parameter. Aliases: arise if the actual parameter is also visible within the subroutine under its original name. The distinction between value and reference parameters is fundamentally an implementation issue.
Value and Reference Parameters Call-by-value/result: Copies the actual parameters into the corresponding formal parameters at the beginning of subroutine execution. Copies the formal parameters back to the corresponding actual parameters when the subroutine returns. Pascal: parameters are passed by value by default. Reference parameters are preceded by the keyword var. C: always passed by value. Fortran: always passed by reference.
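The three modes can be told apart when the actual parameter is also visible under its original name. The sketch below simulates a routine foo(y) that executes y := 3 and then inspects the global x, called as foo(x) with x = 2. The env dictionary stands in for the global variable, and all function names are hypothetical. Python itself offers none of these modes directly, so each is modeled by hand.

```python
def foo_by_value(env):
    y = env["x"]        # copy the actual into the formal
    y = 3               # changes only the local copy
    return env["x"]     # what foo would see if it looked at x

def foo_by_reference(env):
    # y is an alias for x, so assigning to y assigns to x.
    env["x"] = 3
    return env["x"]

def foo_by_value_result(env):
    y = env["x"]        # copy in
    y = 3
    seen = env["x"]     # x is still unchanged inside foo
    env["x"] = y        # copy out on return
    return seen

def run(foo):
    """Return (x as seen inside foo, x after foo returns)."""
    env = {"x": 2}
    seen_inside = foo(env)
    return seen_inside, env["x"]

results = {
    "value": run(foo_by_value),
    "reference": run(foo_by_reference),
    "value/result": run(foo_by_value_result),
}
```

Only the aliasing makes value/result observably different from reference here. Without the second name for x, the two would produce the same final state.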
Call-by-Sharing Call-by-value and call-by-reference don’t make much sense in a language with a reference model of variables. Pass the reference itself and let the actual and formal parameters refer to the same object: Different from call-by-value: although the actual parameter is copied to the formal parameter, the referenced object can be modified. Different from call-by-reference: while the object can be changed, the identity of that object cannot change. Java uses call-by-value for built-in (primitive) types and call-by-sharing for user-defined class types. C# can provide passing by reference by labeling a formal parameter and each corresponding argument with the ref or out keyword. 16. What parameter mode is typically used in languages with a reference model of variables?
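Python has a reference model of variables and passes arguments by sharing, so the two differences listed on this slide can be observed directly. The function names below are hypothetical.

```python
def append_item(items):
    # Modifies the shared object: the caller sees the change.
    items.append(99)

def rebind(items):
    # Rebinds only the formal parameter: the caller's variable
    # still refers to the original object.
    items = [0]
    return items

a = [1, 2]
append_item(a)

b = [1, 2]
rebind(b)
```

After these calls, a has a third element but b is unchanged: the callee can change the object, but not which object the caller's variable names.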
Call-by-Reference Some languages (Pascal, Modula) provide both call-by-reference and call-by-value: Call-by-reference: If the called subroutine should change the value of an actual parameter. Requires an extra level of indirection. Can be used to pass large arguments: could introduce bugs. Call-by-value: To ensure that the called subroutine does not change the actual parameter. Requires copying actuals to formals, a potentially time-consuming operation when arguments are large. 14. Describe four common parameter-passing modes. How does a programmer choose which one to use when?
Read-Only Parameters Modula-3 provides a READONLY parameter mode. Any formal parameter whose declaration is preceded by READONLY cannot be changed by the called routine: Cannot be on the left-hand side of an assignment statement. Cannot be read from a file. Cannot be passed by reference to any other subroutine. C provides const. Tends to confuse the key pragmatic issue (does the implementation pass a value or a reference?) with two semantic issues: Is the callee allowed to change the formal parameter? If so, will the changes be reflected in the actual parameter? 15. Explain the rationale for READONLY parameters in Modula-3.
Parameter Modes in Ada Three parameter-passing modes: in, out and in out. in parameters pass information from the caller to the callee: they can be read by the callee but not written. out parameters pass information from the callee to the caller. in out parameters pass information in both directions: they can be both read and written. For scalar and access (pointer) parameter types all three modes are implemented by copying values: in: call-by-value. in out: call-by-value/result. out: call-by-result. Erroneous program: one that can tell the difference between value and address-based implementations of (nonscalar, nonpointer) in out. Euclid outlaws the creation of aliases to hide the distinction between reference and value/result. 17. Describe the parameter modes of Ada. How do they differ from the modes of other modern languages? 18. What does it mean for an Ada program to be erroneous?
References in C++ C++ improves on C by introducing an explicit notion of a reference. Reference parameters are specified by preceding their name with an ampersand in the header of the function. References in C++ see their principal use as parameters. Another important use is for function returns, especially for objects that do not support a copy operation (e.g., file buffer). The object-oriented features of C++, and its operator overloading make reference returns particularly useful. 19. Give an example in which it is useful to return a reference from a function in C++.
Closures as Parameters A closure is a reference to a subroutine together with its referencing environment. It may be passed as a parameter. A closure needs to include both a code address and a referencing environment. Subroutines are routinely passed as parameters (and returned as results) in functional languages. Object closure: in object-oriented language a method is packaged with its environment within an explicit object. C# delegates: provide type safety without the restrictions of inheritance. 20. List three reasons why a language implementation might implement a parameter as a closure.
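A brief sketch of a closure passed as a parameter. make_counter and apply_n_times are hypothetical names. The inner function carries its referencing environment (the variable count) with it, which is why it keeps working after make_counter has returned.

```python
def make_counter(start):
    count = start
    def next_value():
        nonlocal count      # refers to count in the enclosing environment
        count += 1
        return count
    return next_value       # code plus referencing environment

def apply_n_times(f, n):
    """Call the subroutine passed as a parameter n times."""
    return [f() for _ in range(n)]

first = apply_n_times(make_counter(10), 3)

# Each closure has its own environment, so counters are independent.
c1 = make_counter(0)
c2 = make_counter(100)
c1()
independent = (c1(), c2())
```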
Call-by-Name A call-by-name parameter is re-evaluated in the caller’s referencing environment every time it is used. The effect is as if the called routine had been textually expanded at the point of call, with the actual parameter (which may be a complicated expression) replacing every occurrence of the formal parameter. Label parameters: Both Algol 60 and Algol 68 allow a label to be passed as a parameter. If a called routine performs a goto to such a label, control will usually need to escape the local context, unwinding the subroutine call stack. Both call-by-name and label parameters lead to code that is difficult to understand.
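Python has no call-by-name mode, but the effect can be approximated by passing parameterless functions (thunks) that are evaluated in the caller's environment on every use. The sketch below is a version of the classic summation trick (Jensen's device). The names sum_by_name, set_i, and term are hypothetical, and a dictionary stands in for the caller's variable i.

```python
def sum_by_name(set_i, expr, lo, hi):
    """Sum expr() while stepping the caller's index variable from lo to hi."""
    total = 0
    for k in range(lo, hi + 1):
        set_i(k)            # assign to the caller's variable
        total += expr()     # re-evaluated in the caller's environment
    return total

env = {"i": 0}              # the caller's variable i

def set_i(value):
    env["i"] = value

def term():
    return env["i"] ** 2    # the "actual parameter" expression i * i

sum_of_squares = sum_by_name(set_i, term, 1, 4)
```

The same expression yields a different value on each use because the index variable changes between evaluations. This also illustrates why such code is hard to follow.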
Special-Purpose Parameters Conformant arrays: A formal array parameter whose shape is finalized at run time. Default (optional) parameters: one that need not necessarily be provided by the caller. If it is missing, then a preestablished default value will be used instead. Named parameters: instead of positional, some languages allow parameters to be named (keyword parameters). Their order does not matter: put(item => 37, base => 8); Variable numbers of arguments - e.g., printf/scanf in C: int printf(const char *format, ...) 21. What is a conformant (open) array? 22. What are default parameters? How are they implemented? 23. What are named (keyword) parameters? Why are they useful? 24. Explain the value of variable-length argument lists. What distinguishes such lists in Java and C# from their counterparts in C and C++?
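Default, named, and variable-length parameters can all be seen in a few lines of Python. The put function below is a hypothetical stand-in for the Ada-style put call on this slide (it returns the formatted text rather than printing it), and total is likewise illustrative.

```python
def put(item, base=10):
    """Format a non-negative integer in the given base (2 to 16)."""
    digits = "0123456789abcdef"
    if item == 0:
        return "0"
    text = ""
    while item > 0:
        text = digits[item % base] + text
        item //= base
    return text

def total(*values):
    """Accept any number of positional arguments."""
    return sum(values)

default_base = put(37)                  # base falls back to 10
named = put(item=37, base=8)            # keyword arguments
reordered = put(base=8, item=37)        # order does not matter
variadic = total(1, 2, 3)
```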
Function Returns The syntax varies greatly. Early imperative languages: an assignment statement whose left-hand side is the name of the function. More recent: an explicit return statement. Some languages allow the result of the function to have a name in its own right: procedure A_max(ref A[1:*]: int) returns rtn : int Many languages place restrictions on the types of objects that can be returned from a function; others are more flexible about what a function may return: C, Pascal: a composite type. ML, Python: a tuple of values. Modula-3, Ada 95: a subroutine implemented as a closure. 25. Describe three common mechanisms for specifying the return value of a function. What are their relative strengths and drawbacks?
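As a small illustration of returning a tuple of values with an explicit return statement, here is a sketch in Python. min_max is a hypothetical name.

```python
def min_max(values):
    """Return the smallest and largest element of a non-empty list."""
    lowest = highest = values[0]
    for v in values[1:]:
        if v < lowest:
            lowest = v
        if v > highest:
            highest = v
    return lowest, highest      # a single tuple holding two values

smallest, largest = min_max([3, 9, 1, 7])
```

The caller unpacks the tuple into separate names, so no output parameters are needed.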
Generic Subroutines and Modules Performing the same operation for a variety of different object types. Provide an explicitly polymorphic generic facility that allows a collection of similar subroutines or modules (with different types) to be created from a single copy of the source code. Generic modules (classes): very useful for creating containers - data abstractions that hold a collection of objects. Generic subroutines (methods): needed in generic modules. Generic parameter: Java, C#: only types. Ada, C++: more general, including ordinary values, subroutines, and classes as well as types. 26. What is the principal purpose of generics? In what sense do generics serve a broader purpose in C++ and Ada than they do in Java and C#? 27. How does a generic subroutine differ from a macro?
Implementation Options Ada, C++: Purely static - all the work done at compile time. A compiler creates a separate copy of the code for every instance. Java: All instances of a given generic will share the same code at run time. If T is a generic type parameter in Java, then objects of class T are (automatically) treated as instances of Object. C#: Creates specialized implementations of a generic for different built-in or value types (like C++). The generic code must be typesafe, independent of the arguments provided in any particular instantiation (like Java). 28. Under what circumstances can a language implementation share code among separate instances of a generic?
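A sketch of a generic container using Python's typing module, offered as a loose analogue of the shared-code option. Python's type parameters are hints for external type checkers only, and every instance runs the same code at run time. The Queue class and its capacity parameter are hypothetical.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")    # generic type parameter

class Queue(Generic[T]):
    """Bounded FIFO queue holding items of type T."""
    def __init__(self, capacity: int) -> None:
        self._items: List[T] = []
        self._capacity = capacity

    def enqueue(self, item: T) -> None:
        if len(self._items) >= self._capacity:
            raise OverflowError("queue is full")
        self._items.append(item)

    def dequeue(self) -> T:
        return self._items.pop(0)

int_queue = Queue[int](2)
int_queue.enqueue(1)
int_queue.enqueue(2)
first_out = int_queue.dequeue()

str_queue = Queue[str](2)
str_queue.enqueue("a")
```

Both queues are instances of the same class, so no per-type copy of the code is created, in contrast to the purely static strategy of Ada and C++.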
Generic Parameter Constraints Because a generic is an abstraction, it is important that its interface provide all the information that must be known by a user of the abstraction. Constraining generic parameters: the operations permitted on a generic parameter type must be explicitly declared. Java, C#: require that a generic parameter support a particular set of methods. C++, Modula-3: no explicit constraints but check how parameters are used.
Implicit Instantiation Before the generic can be used, an instance of a generic class must be created (e.g., C++): queue<int, 50> *my_queue = new queue<int, 50>(); The same for subroutines (e.g., Ada): procedure int_sort is new sort(integer, int_array, "<"); … int_sort(my_array); Other languages (C++, Java, C#) instantiate generic subroutines implicitly, treating them as a form of overloading.
Generics in C++, Java, and C# C++: Most ambitious. Templates are intended for almost any programming task that requires substantially similar but not identical copies of an abstraction. Java/C#: Provide generics purely for the sake of polymorphism. Java: Design influenced by the desire for backward compatibility with existing versions of the language and the existing virtual machines and libraries. C#: Generics were planned from the very beginning, although they first shipped in version 2.0 of the language.
Summary Subroutines allow the programmer to encapsulate code behind a narrow interface. Subroutine call stack contains stack frames (activation records) for currently active subroutines. There are several parameter-passing modes, all of which are implemented by passing values, references, or closures. Generics allow a control abstraction to be parameterized (at compile time) in terms of the types of its parameters, rather than just their values.