Procedures and Functions

Procedures and functions (subprograms) are named fragments of program.
They can be called from numerous places:
  – within a main program
  – within the body of other subprograms
  – within themselves (if the language allows recursion)
They have formal parameters (identifiers), which are associated with argument expressions at the place where the subprogram is called.
A function returns a result, which can be used in, or as, some expression whose value is stored somewhere or passed along somewhere.
A procedure returns no result, so it must cause some side effect if it is to be useful:
  – perform some I/O
  – modify some external data structure, or the value of some argument expression

Compiler Construction
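The distinction between a function and a procedure can be sketched in a few lines. The names `square`, `append_square` and `factorial` are hypothetical, chosen only for illustration.

```python
def square(n):
    # A function: returns a result that can be used inside an expression.
    return n * n


def append_square(log, n):
    # A procedure: returns no result; it is useful only through its
    # side effect on the external data structure `log`.
    log.append(n * n)


def factorial(n):
    # A subprogram called from within itself (recursion).
    return 1 if n <= 1 else n * factorial(n - 1)


total = square(3) + 1        # the result is used in a larger expression
history = []
append_square(history, 4)    # side effect: `history` is modified
```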
Parameter passing mechanisms

Different languages support a variety of parameter-passing mechanisms.
  procedure P(int x, int y, int z)    x, y, z are formal parameters (FPs)
  call P(var, i+j, 4)                 var, i+j, 4 are argument expressions (ARGEXPRs)
Call-by-value: space is created for the FP; the actual parameter (sometimes called the argument), that is, the Rvalue of the ARGEXPR, is calculated and used to initialise the FP. The value is copied, so on return the ARGEXPR is unaffected.
Call-by-value-result: space is created for the FP; the actual parameter is calculated and used to initialise the FP. On return, the Rvalue of the FP is stored back into the Lvalue of the ARGEXPR.
Call-by-result: space is created for the FP, but it is not initialised and the Rvalue of the ARGEXPR is not calculated. On return, the Rvalue of the FP is stored into the Lvalue of the ARGEXPR.
Call-by-reference: the Lvalue of the ARGEXPR is used as the Lvalue of the FP; no extra space is allocated for the FP. Changes to the FP directly affect the ARGEXPR.
Call-by-name: the FP is treated as fully synonymous with the ARGEXPR, so the ARGEXPR is re-evaluated at each use of the FP.
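The five mechanisms can be simulated in a small sketch. Everything here is an assumption made for illustration: a `Cell` object stands for a storage location (its identity plays the role of the Lvalue, its contents the Rvalue), and `increment` and `set_seven` are hypothetical subprogram bodies that operate on the FP's cell.

```python
class Cell:
    """A storage location: the cell is the Lvalue, its contents the Rvalue."""
    def __init__(self, value=None):
        self.value = value


def increment(x):          # hypothetical body: x := x + 1
    x.value = x.value + 1


def set_seven(x):          # hypothetical body: x := 7 (never reads x)
    x.value = 7


def call_by_value(body, arg):
    fp = Cell(arg.value)   # new space, initialised with the Rvalue
    body(fp)               # on return the ARGEXPR is unaffected


def call_by_value_result(body, arg):
    fp = Cell(arg.value)   # initialised as in call-by-value
    body(fp)
    arg.value = fp.value   # copy back into the ARGEXPR's Lvalue


def call_by_result(body, arg):
    fp = Cell()            # new space, not initialised
    body(fp)
    arg.value = fp.value   # only the copy-back happens


def call_by_reference(body, arg):
    body(arg)              # the FP shares the ARGEXPR's Lvalue


def call_by_name_demo():
    # Hypothetical call P(a[i]) with body "i := i + 1; x := 0".
    # The FP x is a thunk that re-evaluates a[i] at every use.
    env = {"i": 0, "a": [10, 20]}

    def set_x(v):
        env["a"][env["i"]] = v

    env["i"] += 1          # body: i := i + 1
    set_x(0)               # body: x := 0, which now means a[1] := 0
    return env["a"]
```

With a variable holding 5, `increment` leaves it at 5 under call-by-value and changes it to 6 under call-by-value-result and call-by-reference. In the call-by-name demo the assignment lands in `a[1]` rather than `a[0]`, because `i` changed before the FP was used.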
Some of the issues in parameter passing

1. Some kinds of argument expression do not have Lvalues: constants, function calls, expressions with operators.
  – call-by-reference needs a temporary location to be created for them
  – call-by-(value-)result can perhaps ignore the copy-back in these cases
2. Some kinds of actual parameter are large: arrays, strings, class instances.
  – call-by-value(-result) involves making copies of such objects, which is expensive
  – passing pointers to such objects is cheap; the pointers may be dereferenced within the subprogram body. This is effectively what call-by-reference does.
  – programmers must distinguish between making changes to a copy of an object and making changes to the original, dereferenced object
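Both issues can be illustrated with a short sketch. The `Cell` class, the bodies `increment` and `clear_first`, and the variable names are hypothetical. The first part shows a compiler-style temporary giving `i+j` an Lvalue for call-by-reference; the second contrasts changing a copy of a large object with changing the original through a pointer.

```python
import copy


class Cell:
    """A storage location standing in for an Lvalue."""
    def __init__(self, value=None):
        self.value = value


def increment(x):              # hypothetical body using call-by-reference
    x.value = x.value + 1


# Issue 1: i + j has no Lvalue, so a temporary location is created for it.
i, j = 2, 3
temp = Cell(i + j)             # compiler-generated temporary
increment(temp)                # the update lands in temp; i and j are untouched


def clear_first(xs):           # hypothetical body that modifies its argument
    xs[0] = 0


# Issue 2: copying a large object versus passing a pointer to it.
original = [1, 2, 3]
clear_first(copy.deepcopy(original))   # value-like: the callee changes a copy
after_copy_call = list(original)       # still [1, 2, 3]
clear_first(original)                  # pointer-like: the original is changed
```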
Calling sequence requirements

A compiler must arrange for certain things to happen upon call and return.
On call:
  – Allocate space, in registers or on the control stack:
      for the FPs, in call-by-value, call-by-result, call-by-value-result
      for ARGEXPR Lvalues, in call-by-reference, call-by-result, call-by-value-result
  – Calculate Rvalues and/or Lvalues, as appropriate, and store them in this space
  – Push a return address onto the control stack, and jump to the callee's entry point
  – Execute spill code for saving registers, if their contents are needed after return
  – Reserve stack space for local variables and compiler-generated temporaries
  – Adjust the top-of-stack
On return:
  – Restore the top-of-stack, and return control to the saved point of call
  – (If a function: place its return value in a register)
  – Copy FPs to ARGEXPR Lvalues, in call-by-result, call-by-value-result
  – Execute spill code to restore registers
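The sequence above can be sketched as a simulation for the call-by-value case only (no copy-back step). The control stack is a Python list, the register file is a dictionary, and the names `call`, `callee`, `r0`, `r1` and the frame layout are all assumptions made for illustration; a real compiler may order or divide these steps differently.

```python
stack = []                                    # control stack; top is the end
registers = {"r0": 0, "r1": 11, "r2": 22}     # hypothetical register file


def callee(frame_base):
    # Hypothetical body: adds its two FPs, clobbers r1 while working,
    # and places the function's return value in r0.
    registers["r1"] = -1
    registers["r0"] = stack[frame_base] + stack[frame_base + 1]


def call(target, arg_values, return_address, live_registers, n_locals):
    # On call
    frame_base = len(stack)
    stack.extend(arg_values)                  # FP space, initialised with Rvalues
    stack.append(return_address)              # push the return address
    stack.append({r: registers[r] for r in live_registers})  # spill
    stack.extend([None] * n_locals)           # locals and temporaries
    target(frame_base)                        # "jump" to the entry point

    # On return
    del stack[len(stack) - n_locals:]         # discard locals
    registers.update(stack.pop())             # restore spilled registers
    resume_at = stack.pop()                   # saved point of call
    del stack[frame_base:]                    # restore the top-of-stack
    return resume_at


resume = call(callee, [3, 4], 100, ["r1"], 2)
```

After the call, `r0` holds the result, `r1` has its pre-call contents again, and the stack is back to its height before the call.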
Dividing responsibilities between caller and callee

In a single-language environment, a compiler-writer has some discretion over how to arrange a calling sequence.
  – In a multi-language environment, conventions must be adhered to.
The caller 'knows' which registers hold values that are required after return of control, so it can provide spill code for exactly those registers.
At many points of call, few, if any, registers need to be saved and then restored.
  – faster code, avoiding recalculation and pointless dumps/restores
The callee does not know which registers are important to the caller after return.
At any point of call, it is possible that every register is important to the caller, so the callee provides spill code for saving and restoring all registers.
  – smaller code overall
  – each call is slower and may be doing work pointlessly
Optional arguments, dynamic local arrays etc. may have implications for where, and how many times, the top-of-stack should be adjusted.
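The trade-off can be made concrete with a small count. The register names and the three call sites below are invented for illustration; the sketch only counts how many register saves are executed under each policy.

```python
ALL_REGISTERS = ["r1", "r2", "r3", "r4"]      # hypothetical register set


def caller_saves(live_after_call):
    # The caller knows which registers it still needs after the call,
    # so it spills exactly those.
    return [r for r in ALL_REGISTERS if r in live_after_call]


def callee_saves():
    # The callee cannot know, so it conservatively spills every register.
    return list(ALL_REGISTERS)


# Registers live across each of three hypothetical call sites.
call_sites = [{"r1"}, set(), {"r2", "r3"}]

caller_total = sum(len(caller_saves(live)) for live in call_sites)
callee_total = len(callee_saves()) * len(call_sites)
```

Here the caller-saves policy executes 3 saves across the three calls and the callee-saves policy executes 12. In exchange, callee-saves spill code appears once, in the callee, instead of at every call site.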
Inlining

An optimisation technique, applicable to small non-recursive subprograms. At a call point:
  – take the (abstract) parse tree for the function/procedure, and systematically rename all its FPs to new names
  – replace the (abstract) parse tree node for the call to the subprogram with a constructed parse tree corresponding to a block of code that
      initialises the (renamed) FPs as if they were local variables,
      performs the (modified) body of the subprogram,
      places the function value in a register, and/or modifies ARGEXPR values
  – compile the code for this modified part of the (abstract) parse tree
Eliminates much call-sequence overhead:
  – no transfers of control, no push/pop of a return address, no unnecessary spill code
  – no restriction on register use: each call is treated individually
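A minimal sketch of the tree rewriting, under stated assumptions: trees are nested tuples with node kinds `num`, `var`, `add`, `mul`, `call` and `let`, the FPs are passed by value, and the fresh names produced by the counter are assumed not to clash with any name in the program. None of this representation comes from the slides.

```python
import itertools

_fresh = itertools.count()


def rename(tree, mapping):
    # Systematically rename the FPs inside a subprogram body.
    kind = tree[0]
    if kind == "num":
        return tree
    if kind == "var":
        return ("var", mapping.get(tree[1], tree[1]))
    return (kind, rename(tree[1], mapping), rename(tree[2], mapping))


def inline(call, functions):
    # Replace a call node by a block that initialises the renamed FPs
    # as local variables ("let") and then performs the renamed body.
    _, name, args = call
    params, body = functions[name]
    mapping = {p: f"{p}_{next(_fresh)}" for p in params}
    result = rename(body, mapping)
    for p, arg in reversed(list(zip(params, args))):
        result = ("let", mapping[p], arg, result)
    return result


def evaluate(tree, env):
    # Tiny evaluator, used only to check that the inlined tree behaves
    # like the call it replaced.
    kind = tree[0]
    if kind == "num":
        return tree[1]
    if kind == "var":
        return env[tree[1]]
    if kind == "let":
        _, name, init, body = tree
        return evaluate(body, {**env, name: evaluate(init, env)})
    left, right = evaluate(tree[1], env), evaluate(tree[2], env)
    return left + right if kind == "add" else left * right


# Hypothetical function f(x, y) = x * x + y, called as f(i + j, 4).
functions = {"f": (["x", "y"],
                   ("add", ("mul", ("var", "x"), ("var", "x")), ("var", "y")))}
call_node = ("call", "f", [("add", ("var", "i"), ("var", "j")), ("num", 4)])
inlined = inline(call_node, functions)
value = evaluate(inlined, {"i": 1, "j": 2})
```

The renaming step matters when an argument expression mentions a variable with the same name as an FP: without fresh names, the `let` bindings would capture it.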
Execution strategies

An interpreter contains routines that mimic the actions called for by a source program. A parser and lexer produce an (abstract) parse tree; the interpreter then walks this tree, carrying out actions on its own memory structures that mirror the desired actions of the program being interpreted.
A compile-and-go compiler compiles a main program, possibly accessing subprograms from a subprogram library, initialises memory with the program code it has generated, and transfers control to its start address.
Instructions for a Virtual Machine (such as the JVM, with its bytecode) are produced, typically based on stack manipulations. An interpreter for this virtual machine executes these instructions. (Java bytecode is also the target for some non-Java languages.)
  – Machine-specific compilation of these VM instructions may be performed
Code for subprograms may be put into, and later retrieved from, a subprogram library.
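Two of these strategies can be contrasted on a toy arithmetic language. The tuple tree format and the instruction names `push`, `add` and `mul` are assumptions for this sketch and are not JVM bytecode. `walk` interprets the tree directly; `compile_to_stack_code` emits instructions for a hypothetical stack machine, which `run` then interprets.

```python
def apply(op, left, right):
    return left + right if op == "add" else left * right


def walk(tree):
    # Tree-walking interpreter: carries out each node's action directly.
    if tree[0] == "num":
        return tree[1]
    return apply(tree[0], walk(tree[1]), walk(tree[2]))


def compile_to_stack_code(tree):
    # Compiler for a stack-based virtual machine: operands first,
    # then the operator (postfix order).
    if tree[0] == "num":
        return [("push", tree[1])]
    return (compile_to_stack_code(tree[1])
            + compile_to_stack_code(tree[2])
            + [(tree[0],)])


def run(code):
    # Interpreter for the virtual machine's instructions.
    stack = []
    for instruction in code:
        if instruction[0] == "push":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(apply(instruction[0], left, right))
    return stack.pop()


# 2 + 3 * 4
tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
code = compile_to_stack_code(tree)
```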
Subprogram Libraries

Compilers often offer the possibility of separate compilation.
Header files provide the compiler with information about the signatures of other subprograms mentioned in the subprogram(s) about to be compiled.
The compiler then produces so-called object code, which is written to a file.
The compiler can do this for a main routine as well.
A collection of object code files constitutes a subprogram library.
A main routine, the subprograms it depends on, the subprograms they in turn depend on, and so on, then need to be linked (by a system linker program) to form a stored executable file, or to be immediately loaded and run.
The object files for subprograms may be linked with a variety of main routines.
Subprograms need not all be written in the same high-level language.
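The dependency-following part of linking can be sketched as a transitive closure over the library. The dictionary layout, the `calls` field and the routine names are hypothetical; a real linker works on symbol tables inside object files.

```python
def link(main, library):
    # Collect the main routine, the subprograms it depends on,
    # the subprograms they depend on, and so on.
    needed, linked = [main], []
    while needed:
        name = needed.pop()
        if name in linked:
            continue
        if name not in library:
            raise KeyError(f"undefined symbol: {name}")
        linked.append(name)
        needed.extend(library[name]["calls"])
    return linked


# A hypothetical library of separately compiled routines.
library = {
    "main":    {"calls": ["sort", "print"]},
    "sort":    {"calls": ["compare"]},
    "compare": {"calls": []},
    "print":   {"calls": []},
    "unused":  {"calls": []},
}
linked = link("main", library)
```

Only routines reachable from `main` end up in the result, so `unused` is left out, and a reference to a routine missing from the library is reported as an undefined symbol.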
Code Relocation

At the time of separate compilation, it is not known where in memory the code for a routine will be when it is executed:
  – jumps to instructions need to know an address to jump to
  – references to statically-allocated memory need to know an address to refer to
Object code therefore consists of at least two parts:
  – the subprogram code, written as if it were going to be loaded into machine memory starting at address zero
  – a relocation map, indicating those instructions that contain an address field
When loading, or producing an executable, these instructions are modified by adding the subprogram's actual starting address to their address field.
If an executable file is being produced, it is still not known where in memory it will reside: a second round of relocation is needed when loading and then running it.
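Relocation as described above can be sketched as follows. The instruction format (an opcode paired with one operand), the opcode names and the addresses are assumptions for illustration. The relocation map lists the indices of the instructions whose operand is an address; immediate operands are left alone.

```python
def relocate(code, relocation_map, start_address):
    # Add the actual starting address to the address field of every
    # instruction listed in the relocation map.
    loaded = list(code)
    for index in relocation_map:
        opcode, address = loaded[index]
        loaded[index] = (opcode, address + start_address)
    return loaded


# Object code written as if loaded at address zero.
object_code = [
    ("loadi", 7),     # immediate operand: not an address, not relocated
    ("jump", 3),      # address field: must be relocated
    ("loadi", 0),
    ("halt", None),
]
relocation_map = [1]

# First round: the linker places the routine at offset 1000 in the executable.
in_executable = relocate(object_code, relocation_map, 1000)

# Second round: the loader places the executable at address 5000 in memory.
in_memory = relocate(in_executable, relocation_map, 5000)
```

The second round needs the relocation map as well, which is why the map has to travel with the executable and not be discarded after linking.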