Accomplishing Executables
  How is this done?
  Copyright © 2014-2017 Curt Hill
Introduction
  An executable program is a series of bits that is perfectly understandable to a machine
  People, however, need to write most programs in a textual or graphical form
  This presentation considers how the transformation from one to the other occurs
  Some of you may already be familiar with this process
Types of Transformers
  There are two basic classes of program that accomplish this transformation:
    Interpreters
    Compilers
  There are many combinations of these two
  We shall look at several examples
Definitions
  Machine language is the only language a CPU can execute
    It is completely different for different types of CPUs
  An interpreter takes in a source language program
    It executes it as if it were a machine language program
  A compiler translates the source language from one form to another
    Most often the result is machine language
    Translating from a new or rare language to a common one has also been done
Pure Interpreters
  Before the Apple there were a number of BASICs for very small machines
  Very low memory footprint
    Usually less than 16K
    Some as small as 2K
    This is for the editor and interpreter in one piece of code
Form of BASIC
  The original form of BASIC was very simple
    This made interpretation easier
  The format was:
    line cmd parms
  Where
    line is a required line number
    cmd is the type of statement
    parms is any other information the statement needs
BASIC Statements
  At a minimum the following statements were required
    REM – a comment
    LET – an assignment
    PRINT – display on screen
    INPUT – get from keyboard
    IF – single line conditional
    GOTO – jump to a required line number
    GOSUB – almost a procedure call
    STOP – end program
A Simple BASIC Program
  10 REM ECHO NAME
  20 PRINT "WHAT IS YOUR NAME?"
  30 INPUT A$
  40 PRINT "HELLO "; A$
  50 STOP
Interaction
  The user interacted with the interpreter in one of two ways
  Type in a command without a line number
    This is immediately executed
  Type in a command with a line number
    It is placed in the program at the position determined by the line number
    If that line already exists, it is replaced
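The two kinds of input can be sketched in a few lines of Python. This is only an illustration: the names enter, listing and immediate are hypothetical, and a real interpreter would execute an immediate command rather than just record it.

```python
def enter(program, typed, immediate):
    """Handle one line typed at the prompt.

    program   -- dict mapping line number -> statement text (the stored program)
    immediate -- list standing in for "execute this right now"
    """
    typed = typed.strip()
    number, _, rest = typed.partition(" ")
    if number.isdigit():
        # Has a line number: insert it, or replace the line if it already exists.
        program[int(number)] = rest
    else:
        # No line number: an immediate command.
        immediate.append(typed)


def listing(program):
    """Return the stored program in line-number order."""
    return [f"{n} {program[n]}" for n in sorted(program)]
```

Storing the lines in a dictionary keyed by line number means that typing a line again simply overwrites the old one, and sorting the keys gives the program order regardless of the order in which the lines were typed.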
Execution
  When the user types RUN the program is executed
  Start at the lowest line number and go from there
  When the STOP is found, return to the prompt
Internals
  The program was stored in text format with no change
    Or with keywords converted to upper case
  The interpreter would parse each line each time it was executed
    No translation at all
  These kinds of programs were small, so the overhead of interpretation was not a problem
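A minimal sketch of such a pure interpreter, written in Python rather than the assembly language these interpreters actually used. It assumes a tiny subset (REM, LET, PRINT, INPUT, GOTO, STOP), a semicolon between PRINT items, and input supplied as a list instead of a keyboard; run and evaluate are hypothetical names. The point to notice is that every line is split apart again each time it is reached.

```python
def evaluate(token, variables):
    """Evaluate a string literal, an integer literal, or a variable name."""
    token = token.strip()
    if token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    if token.lstrip("-").isdigit():
        return int(token)
    return variables[token]


def run(program, inputs=()):
    """Interpret {line_number: text}; return the list of lines printed."""
    inputs = list(inputs)
    variables, output = {}, []
    order = sorted(program)              # start at the lowest line number
    pc = 0
    while pc < len(order):
        text = program[order[pc]]        # the line is still plain source text
        cmd, _, parms = text.strip().partition(" ")   # parsed on every visit
        cmd = cmd.upper()
        pc += 1
        if cmd == "REM":
            continue
        elif cmd == "LET":
            name, _, expr = parms.partition("=")
            variables[name.strip()] = evaluate(expr, variables)
        elif cmd == "PRINT":
            # Naive split: a semicolon inside a string literal is not handled.
            items = [str(evaluate(p, variables)) for p in parms.split(";")]
            output.append("".join(items))
        elif cmd == "INPUT":
            variables[parms.strip()] = inputs.pop(0)
        elif cmd == "GOTO":
            pc = order.index(int(parms))
        elif cmd == "STOP":
            break                        # back to the prompt
        else:
            raise SyntaxError(f"unknown statement: {text}")
    return output


HELLO = {
    10: "REM ECHO NAME",
    20: 'PRINT "WHAT IS YOUR NAME?"',
    30: "INPUT A$",
    40: 'PRINT "HELLO "; A$',
    50: "STOP",
}

if __name__ == "__main__":
    for line in run(HELLO, ["ADA"]):
        print(line)
```

A loop that passes through the same line a thousand times would pay the cost of the partition and the keyword comparisons a thousand times, which is the overhead the later slides set out to reduce.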
Similarly
  Some scripting languages work the same way
  A DOS batch file or UNIX shell script uses the same approach
    Certain commands like IF are handled by the script processor
    All others are presumed to be OS commands and passed on
Simple Interpreter
  [Diagram with boxes labeled: BASIC Source Statements, BASIC Interpreter]
Somewhat Better
  As machines get faster or languages get more complicated we see some changes
  The source program now resides in a file
  Some form of transformation is applied to the source before it is executed
BASIC Again
  Slightly more sophisticated editing and file manipulation
  Pre-compilation processing
    The reserved words are transformed into subroutine addresses
      The call to the subroutine is much quicker
    Variables or parameters are similarly made into addresses
    Each line is parsed just once
  Overhead of interpretation reduced
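The same idea can be sketched in Python, with function references standing in for subroutine addresses and list indices standing in for variable addresses. Everything here is an assumption made for illustration: the subset (REM, LET with an optional +, PRINT, IF a < b THEN line, GOTO, STOP), the names convert and execute, and the tuple layout of the internal form.

```python
class State:
    def __init__(self, nslots):
        self.memory = [0] * nslots   # one slot per variable
        self.output = []
        self.pc = 0
        self.running = True


def value(state, operand):
    kind, v = operand
    return v if kind == "const" else state.memory[v]

# One handler per reserved word; the internal form refers to these directly.
def op_let(state, target, left, right):
    state.memory[target] = value(state, left) + value(state, right)

def op_print(state, what):
    state.output.append(value(state, what))

def op_if_less(state, left, right, target):
    if value(state, left) < value(state, right):
        state.pc = target

def op_goto(state, target):
    state.pc = target

def op_stop(state):
    state.running = False

def op_nop(state):
    pass


def convert(program):
    """Parse every line exactly once into (handler, operands) pairs."""
    order = sorted(program)
    index_of = {line: i for i, line in enumerate(order)}   # line number -> index
    slots = {}                                             # variable -> slot

    def operand(text):
        text = text.strip()
        if text.lstrip("-").isdigit():
            return ("const", int(text))
        return ("slot", slots.setdefault(text, len(slots)))

    code = []
    for line in order:
        cmd, _, parms = program[line].strip().upper().partition(" ")
        if cmd == "LET":
            name, _, expr = parms.partition("=")
            left, _, right = expr.partition("+")
            code.append((op_let, (operand(name)[1], operand(left), operand(right or "0"))))
        elif cmd == "PRINT":
            code.append((op_print, (operand(parms),)))
        elif cmd == "IF":
            cond, _, target = parms.partition("THEN")
            left, _, right = cond.partition("<")
            code.append((op_if_less, (operand(left), operand(right), index_of[int(target)])))
        elif cmd == "GOTO":
            code.append((op_goto, (index_of[int(parms)],)))
        elif cmd == "STOP":
            code.append((op_stop, ()))
        elif cmd == "REM":
            code.append((op_nop, ()))
        else:
            raise SyntaxError(f"unknown statement on line {line}")
    return code, len(slots)


def execute(code, nslots):
    """Run the internal form; no text is examined here."""
    state = State(nslots)
    while state.running and state.pc < len(code):
        handler, operands = code[state.pc]
        state.pc += 1
        handler(state, *operands)
    return state.output


COUNT = {
    10: "LET I = 1",
    20: "PRINT I",
    30: "LET I = I + 1",
    40: "IF I < 4 THEN 20",
    50: "STOP",
}
```

Lines 20 through 40 of COUNT are executed three times, but convert looks at their text only once; execute never touches a string.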
Most Interpreters
  [Diagram with boxes labeled: Editor, Source, Converter, Internal form, Interpreter]
SNOBOL4
  A very powerful pattern matching language
  It converted the source language into an internal form and then executed it
  SNOBOL4 allowed self-modifying programs, so it was important that the transformation routines and the internal form were present at the same time
Compilation
  Interpretation is like a butler
    You tell the butler to do something and he does it immediately
  Compilation is like a translator
    It converts a program in one language to that of another
    Usually the result is machine language
Three Step
  Take the source language and convert it into object code
    Object, in this context, means machine language that is not yet ready to execute
    In other contexts it usually means a variable that is an instance of a class
  Take several object files and libraries and link them together into an executable
  Load the executable and run it
Compilation
  [Diagram with boxes labeled: Editor, Source, Compiler, Object, Library, Linker, Executable, Loader]
Compile Cycle
  The compiler translates the source into object code
    Object code is machine language
    Not yet executable
  The linker takes one or more object files and libraries and creates the executable
  The loader loads the executable and runs it
project.cpp
  #include <iostream>
  #include <vector>
  #include "MyClass.h"
  using namespace std;
  …
  void doit(int k){…}
  int main(void){
    cout << "Enter a value";
    int a, b;
    cin >> a >> b;
    MyClass x(a,b);
    char * st = x.ToString();
  }
Source and Object
  Inside that code were several kinds of routines
  The main function must be externally declared
  The doit function only needed to be internally declared
  cin, cout and the vector types are well known externals
  MyClass was external, but only used here
Object File
  When the C++ compiler runs, it compiles project.cpp and produces an object file
    .OBJ on Windows and .o on UNIX
  This object file is mostly machine language
  It also contains a relocation dictionary and an external symbol dictionary
External Symbol Dictionary
  The external symbol dictionary is a list of all external symbols
  In this code the following external symbols were seen:
    main – needed as the entry point of the program
    The name of the cin >> function
    The name of the cout << function
    The MyClass ToString function
  The doit function cannot be referenced externally, so it is not present
Calls
  The compiler places the doit and main functions into the object file
    Thus the call to doit may be fully formed
  The call to MyClass.ToString cannot be properly generated
    Where is that code in relation to the others?
    That is for the linker to decide
    With the help of the relocation dictionary
Relocation Dictionary
  For each call to an external routine there is an address that cannot be determined
  The relocation dictionary records all of these for later processing
  It also records where main is
    The main function does not have to be first
  Any function labeled with extern or declared in an include is listed here
Relocation Dictionary
  Some addresses, such as the address of doit, are relative to the beginning of the module
  The beginning of the module is usually assumed to be zero
  If the module is loaded elsewhere in memory, that address needs to be adjusted to the correct beginning point in memory
  This is also a relocation dictionary item
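The adjustment itself is simple arithmetic, sketched below in Python. The opcodes CALL, RET and HALT, the word layout, and the function name relocate are all hypothetical; real object formats are considerably more involved.

```python
CALL, RET, HALT = 1, 2, 3   # made-up opcodes for illustration


def relocate(image, relocations, base):
    """Return a copy of image adjusted for loading at address base.

    image       -- list of words assembled as if the module began at address 0
    relocations -- offsets of the words that hold module-relative addresses
    """
    loaded = list(image)
    for offset in relocations:
        loaded[offset] += base
    return loaded


# main: CALL doit; HALT      doit (at offset 3): RET
module = [CALL, 3, HALT, RET]
dictionary = [1]            # word 1 holds the address of doit
```

Loading this module at address 1000 turns the operand 3 into 1003 and leaves the opcodes alone, because only word 1 is listed in the dictionary.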
Linker
  Takes the object files and creates an executable
  Reads in the objects from project.cpp and MyClass.cpp and arranges them in a file
  Finds in a library all of the routines that it needs for this program and places them in the file
    The cin >> function, the cout << function and any vector methods
Linker Again
  Besides placing the object files into a new executable, the linker has to process the two dictionaries
  Each address of an external needs to be filled in
  The executable must also have a smaller relocation dictionary
  It must also be in a suitable format for the loader
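A toy two-pass linker in Python may make the bookkeeping concrete. The ObjectModule layout, the opcodes, and the shape of the returned executable are assumptions invented for this sketch and do not correspond to any real object file format.

```python
from dataclasses import dataclass, field

CALL, RET, HALT = 1, 2, 3   # made-up opcodes for illustration


@dataclass
class ObjectModule:
    name: str
    code: list                                        # words, assembled at address 0
    defined: dict = field(default_factory=dict)       # external symbol -> offset
    undefined: dict = field(default_factory=dict)     # offset of a hole -> symbol
    relocations: list = field(default_factory=list)   # offsets of module-relative addresses


def link(modules, entry="main"):
    # Pass 1: place the modules one after another and note where each symbol lands.
    bases, symbols, address = {}, {}, 0
    for m in modules:
        bases[m.name] = address
        for symbol, offset in m.defined.items():
            if symbol in symbols:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbols[symbol] = address + offset
        address += len(m.code)

    # Pass 2: copy the code, adjust internal addresses, fill in external ones.
    image, relocations = [], []
    for m in modules:
        base = bases[m.name]
        words = list(m.code)
        for offset in m.relocations:
            words[offset] += base
            relocations.append(base + offset)
        for offset, symbol in m.undefined.items():
            if symbol not in symbols:
                raise LookupError(f"unresolved external: {symbol}")
            words[offset] = symbols[symbol]
            relocations.append(base + offset)
        image.extend(words)

    # The executable keeps only addresses to adjust at load time; the names are gone.
    return {"image": image, "entry": symbols[entry], "relocations": sorted(relocations)}


project = ObjectModule(
    name="project",
    # main: CALL <MyClass.ToString>; CALL doit; HALT     doit (offset 5): RET
    code=[CALL, 0, CALL, 5, HALT, RET],
    defined={"main": 0},
    undefined={1: "MyClass.ToString"},
    relocations=[3],
)
myclass = ObjectModule(name="myclass", code=[RET], defined={"MyClass.ToString": 0})
```

Linking with link([myclass, project]) places main at address 1 rather than 0, which shows that the entry point does not have to come first; linking project on its own fails with an unresolved external.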
Loader
  Usually invisible to most of us
  Part of the OS
  Takes an executable:
    Relocates it, if needed
    Allocates memory for it
    Creates a process to start it
    Starts it
    Cleans up when complete
Linking
  [Diagram: Project.OBJ, MyClass.OBJ and Iostream.lib feeding Project.EXE. Labels: main, MyClass(int,int), MyClass.ToString, cin, cout, Loc(cin >>), Loc(cout <<), Loc(MyClass.ToString), Addr(cin >>), Addr(cout <<), Addr(MyClass.ToString)]
Names
  In the normal compilation scheme, the point where a name becomes an address needs to be noted
  This is static linking
  The compiler converts all internal names to addresses
    Variables, constants, internal functions
  The linker converts all external names to addresses
  The executable has no names at all
    Almost
Static and Dynamic
  In static linking the executable ends up with all the code it needs
  Windows also has dynamic link libraries (DLLs)
  A DLL may be shared by multiple processes at the same time
    To each it appears to be part of its own address space
  This saves memory for very commonly used subroutines
DLLs
  Called in a completely different way
    Specify the file name and a function name
  First call loads the DLL into memory
  When done it is released
    It does not leave memory until there are no processes using it
  UNIX and most other systems have a similar feature for frequently used routines
Java
  Neither quite the compilation system described nor an interpreter
  The output of the compiler is a .class file
    This is machine language for the Java Virtual Machine (JVM)
    The JVM is defined so that the overhead of interpretation is very low
  Calls are a little different as well
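To give a feel for what "machine language for a virtual machine" means, here is a tiny stack machine in Python. The four instructions and their numbering are invented for this sketch and are not real JVM bytecode, although the JVM is likewise stack based.

```python
PUSH, ADD, MUL, HALT = range(4)   # hypothetical instruction set


def run_vm(code):
    """Interpret a flat list of instructions and return the top of the stack."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])   # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")


# What a compiler might emit for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
```

Because the compiler has already done the parsing, the interpreter's inner loop is only a fetch and a dispatch on a small integer, which is why the overhead can be kept low.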
Java Function Calls
  Every function call is dynamic rather than static
  Each function call in the JVM has the actual method name
    Quite unlike the machine language of any other machine
  When a method is called the JVM checks whether it is present in memory and then executes it
    If not, it loads it and then executes it
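The check-then-load behavior can be imitated in Python, using modules from the standard library as a stand-in for class files. This is an analogy only: the names call and loaded are hypothetical, and the JVM's actual class loading and method resolution are far more elaborate.

```python
import importlib

loaded = {}   # code already present in memory, keyed by name


def call(module_name, function_name, *args):
    """Call a function identified only by names, loading its module on first use."""
    module = loaded.get(module_name)
    if module is None:
        # Not present yet: load it, then remember it for later calls.
        module = importlib.import_module(module_name)
        loaded[module_name] = module
    # The name is looked up at the moment of the call, not at link time.
    return getattr(module, function_name)(*args)
```

For example, call("math", "sqrt", 16) loads math on the first call and reuses it afterwards. A misspelled function name is not detected until the call is actually made, when getattr raises AttributeError.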
Differences
  No need for a link step in Java
  Functions are always called dynamically
    This turns some link errors into run-time errors
    It also allows a long running program to get a refreshed method without stopping the program
  The .NET system operates similarly to the Java system
Some History
  The first compiler was FORTRAN – 1957-1959
    It followed the pattern of assemblers
  The first interpreter was LISP – 1959
    It only does dynamic calls
  In the late 1960s the SNOBOL4 interpreter was written in a macro language
    To implement it, just code the macros for your machine
    This led to similar approaches
More History
  In the 1970s Pascal gained popularity without corporate support
  If you wanted to implement it you received the compiler source
    It compiled into P-code
  A compiler or interpreter for P-code was then devised
  You had a working system
  The JVM is an extension of this approach
Finally
  There have been many variations on the themes given here
  Any interactive program can be considered an interpreter of sorts
  In most compiled languages we mostly use static linking because of its speed
  The need for speed has greatly reduced
    One of the reasons for the popularity of scripting languages as well as Java