Course Overview PART I: overview material PART II: inside a compiler

PART I: overview material
  1 Introduction
  2 Language processors (tombstone diagrams, bootstrapping)
  3 Architecture of a compiler
PART II: inside a compiler
  4 Syntax analysis
  5 Contextual analysis
  6 Runtime organization
  7 Code generation
PART III: conclusion
  8 Interpretation
  9 Review

Arrays
An array is a composite data type: an array value consists of multiple values of the same type. Arrays are in some sense like records, except that their elements all have the same type. The elements of an array are typically indexed by an integer value (in some languages, for example Pascal, other “ordinal” types can also be used to index arrays).
Two kinds of arrays (with different runtime representation schemes):
  static arrays: their size (number of elements) is known at compile time.
  dynamic arrays: their size cannot be known at compile time because the number of elements may vary at run time.
Q: Which are the “cheapest” arrays? Why?

Static Arrays
Example:
  type Name = array 6 of Char;
  var me: Name;
  var names: array 2 of Name
Layout in memory (figure): each Name occupies six consecutive Chars.
  me:        me[0] = 'K'  me[1] = 'r'  me[2] = 'i'  me[3] = 's'  me[4] = ' '  me[5] = ' '
  names[0]:  names[0][0] = 'J'  names[0][1] = 'o'  names[0][2] = 'h'  names[0][3] = 'n'  names[0][4] = ' '  names[0][5] = ' '
  names[1]:  names[1][0] = 'S'  names[1][1] = 'o'  names[1][2] = 'p'  names[1][3] = 'h'  names[1][4] = 'i'  names[1][5] = 'a'

Static Arrays
Example:
  type Coding = record
                  Char c,
                  Integer n
                end
  var code: array 3 of Coding
Layout in memory (figure): each Coding occupies one Char followed by one Integer.
  code[0]:  code[0].c = 'K'  code[0].n = 5
  code[1]:  code[1].c = 'i'  code[1].n = 22
  code[2]:  code[2].c = 'd'  code[2].n = 4

Static Arrays
  type T = array n of TE;
  var a : T;
  size[T] = n * size[TE]
  address[a[0]] = address[a]
  address[a[1]] = address[a] + size[TE]
  address[a[2]] = address[a] + 2*size[TE]
  …
  address[a[k]] = address[a] + k*size[TE]
Layout (figure): a[0], a[1], a[2], …, a[n-1] are stored contiguously, starting at address[a].
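The address formulas above can be sketched as a small calculation. This is only an illustration: the class and method names, and the concrete base address and element size, are assumptions and not part of the course material.

```java
final class StaticArrayLayout {
    /** size[T] = n * size[TE] */
    static long sizeOfArray(long n, long elemSize) {
        return n * elemSize;
    }

    /** address[a[k]] = address[a] + k * size[TE], for 0 <= k < n */
    static long addressOfElement(long base, long k, long elemSize) {
        return base + k * elemSize;
    }

    public static void main(String[] args) {
        long base = 1000;   // assumed address[a]
        long elemSize = 4;  // assumed size[TE]
        System.out.println(sizeOfArray(10, elemSize));           // prints 40
        System.out.println(addressOfElement(base, 3, elemSize)); // prints 1012
    }
}
```

Because n and size[TE] are compile-time constants for a static array, a compiler can fold most of this arithmetic away when k is also a constant.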

Static Arrays with different lower bound
Example: in Pascal
  type T = array [4..10] of TE;
  var a : T;
  size[T] = 7 * size[TE]
  address[a[4]] = address[a]
  address[a[5]] = address[a] + size[TE]
  address[a[6]] = address[a] + 2*size[TE]
  …
  address[a[k]] = address[a] + (k-4)*size[TE]
Layout (figure): a[4], a[5], a[6], …, a[10] are stored contiguously, starting at address[a].

Example: in Pascal
  type T = array [l..u] of TE;
  var a : T;
  size[T] = (u-l+1) * size[TE]
  address[a[k]] = address[a] + (k-l)*size[TE]
                = address[a] + k*size[TE] - l*size[TE]
                = (address[a] - l*size[TE]) + k*size[TE]
The “origin” of the array corresponds to address[a[0]]:
  origin[a] = address[a] - l*size[TE] = address[a[0]]
  address[a[k]] = origin[a] + k*size[TE]

Note: The origin of the array (which corresponds to a[0]) is an address that may lie outside the array!
Example:
  type T = array [3..7] of TE;
  var a : T;
Figure: the memory cells where a[-1], a[0] (the origin), a[1] and a[2] would be precede the array; the array bounds enclose a[3], a[4], a[5], a[6], a[7].
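A minimal sketch of the origin calculation for the [3..7] example, assuming a base address of 100 and an element size of 2 (both numbers, and all names, are made up for illustration):

```java
final class ArrayOrigin {
    /** origin[a] = address[a] - l * size[TE] */
    static long origin(long base, long lower, long elemSize) {
        return base - lower * elemSize;
    }

    /** address[a[k]] = origin[a] + k * size[TE] */
    static long addressOfElement(long origin, long k, long elemSize) {
        return origin + k * elemSize;
    }

    public static void main(String[] args) {
        long base = 100;     // assumed address[a], i.e. the address of a[3]
        long elemSize = 2;   // assumed size[TE]
        long org = origin(base, 3, elemSize);
        System.out.println(org);                                // prints 94, below the array
        System.out.println(addressOfElement(org, 3, elemSize)); // prints 100
        System.out.println(addressOfElement(org, 7, elemSize)); // prints 108
    }
}
```

The origin (94) lies before the first real element (100), which is harmless as long as it is only used as the constant part of the address computation and never dereferenced.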

Dynamic Arrays
Dynamic arrays are arrays whose size is not known until run time.
Example 1: Java arrays (all arrays in Java are dynamic)
  char[] buffer;
  buffer = new char[buffersize];
  ...
  for (int i = 0; i < buffer.length; i++)
      buffer[i] = ' ';
  Dynamic array: no size is given in the declaration.
  Array creation at run time determines the size.
  The size of an array can be queried at run time.
Q: How could we represent Java arrays?

Dynamic Arrays: Java Arrays
  char[] buffer;
  buffer = new char[len];
A possible representation for Java arrays (figure): the variable buffer holds two fields, buffer.length = 7 and buffer.origin, a pointer (•) to the separately stored elements:
  buffer[0] = 'C'  buffer[1] = 'o'  buffer[2] = 'm'  buffer[3] = 'p'  buffer[4] = 'i'  buffer[5] = 'l'  buffer[6] = 'e'

Dynamic Arrays: Java Arrays
  char[] buffer;
  buffer = new char[len];
Another possible representation for Java arrays (figure): the variable buffer is a single pointer (•) to a block that holds buffer.length = 7 followed directly by the elements:
  buffer[0] = 'C'  buffer[1] = 'o'  buffer[2] = 'm'  buffer[3] = 'p'  buffer[4] = 'i'  buffer[5] = 'l'  buffer[6] = 'e'
Note: In reality Java also stores a type in its representation for arrays, because Java arrays are objects (instances of classes).
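The second representation (a pointer to a block that holds the length followed by the elements) can be simulated over a flat array standing in for memory. This is a sketch under assumptions: the class `ArrayHeapSketch`, its 64-word memory, and the bump allocator are hypothetical, and a real JVM stores more (for example the type mentioned in the note).

```java
final class ArrayHeapSketch {
    private final int[] memory = new int[64]; // simulated word-addressed memory (assumed size)
    private int next = 0;                     // next free word; no reuse, no overflow handling

    /** Allocates the block [length | elem 0 | elem 1 | ...] and returns its address. */
    int newArray(int length) {
        int handle = next;
        memory[handle] = length;
        next += 1 + length;
        return handle;
    }

    /** buffer.length: read the header word of the block. */
    int length(int handle) {
        return memory[handle];
    }

    int load(int handle, int i) {
        checkIndex(handle, i);
        return memory[handle + 1 + i];
    }

    void store(int handle, int i, int value) {
        checkIndex(handle, i);
        memory[handle + 1 + i] = value;
    }

    /** Runtime index check: 0 <= i < length. */
    private void checkIndex(int handle, int i) {
        if (i < 0 || i >= memory[handle]) {
            throw new ArrayIndexOutOfBoundsException(i);
        }
    }

    public static void main(String[] args) {
        ArrayHeapSketch heap = new ArrayHeapSketch();
        int buffer = heap.newArray(7);            // buffer = new char[7]
        heap.store(buffer, 0, 'C');
        System.out.println(heap.length(buffer));  // prints 7
    }
}
```

Keeping the length inside the block means the variable itself is a single address, so assigning one array variable to another copies only that address.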

Dynamic Arrays
Example 2: Dynamic arrays in Ada
Different from Java because the lower bound can also be determined dynamically.
  type String is array (Integer range <>) of Character;
  d: String (1 .. k);
  s: String (m .. n - 1);
k, m and n are Integer variables, so their values are not known at compile time.
Variables d and s are both of type String. Concatenation and lexicographic comparison are allowed on these arrays even if they have different ranges. Assignment copies the contents of one array into the other, but only works if both have the same number of elements; otherwise a runtime error occurs.

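Dynamic Arrays
Example 2: Dynamic arrays in Ada
  type String is array (Integer range <>) of Character;
  d: String (1 .. k);
A possible representation for Ada arrays (figure): the variable d holds three fields, origin (a pointer, •), lower = 1 and upper = k. The elements are stored separately: d[1] = 'a', d[2] = 'b', d[3] = 'c', …, d[k] = 'z'. The origin points to where d[0] would be, just before d[1].
Note: remember that the origin can be outside the actual array.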

Dynamic Arrays
Example 2: Dynamic arrays in Ada
  type T is array (Integer range <>) of TE;
  a: T (l .. u);
Representation (figure): the variable a holds the fields origin (•), lower (= l) and upper (= u).
The formulas:
  address[a[k]] = content[address[a]] + k*size[TE]
A runtime index check is needed: l ≤ k ≤ u, where
  l = content[address[a] + address-size]
  u = content[address[a] + address-size + size[Integer]]
(Note: Similar runtime index checks are also performed in Java.)
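The descriptor with origin, lower and upper, together with the runtime index check, might look as follows. This is a sketch only: the class `AdaArrayDescriptor` and the concrete numbers are assumptions, and the descriptor fields are ordinary Java fields rather than words at address[a].

```java
final class AdaArrayDescriptor {
    final int origin;   // address of the (possibly virtual) element a[0]
    final int lower;    // l, known only at run time
    final int upper;    // u, known only at run time
    final int elemSize; // size[TE]

    /** base is the address of the first real element a[lower]. */
    AdaArrayDescriptor(int base, int lower, int upper, int elemSize) {
        this.origin = base - lower * elemSize;
        this.lower = lower;
        this.upper = upper;
        this.elemSize = elemSize;
    }

    /** address[a[k]] = origin + k * size[TE], after checking l <= k <= u. */
    int addressOfElement(int k) {
        if (k < lower || k > upper) {
            throw new IndexOutOfBoundsException("index " + k + " not in " + lower + ".." + upper);
        }
        return origin + k * elemSize;
    }

    public static void main(String[] args) {
        // d: String (1 .. 5), stored from the assumed address 200, one unit per Character
        AdaArrayDescriptor d = new AdaArrayDescriptor(200, 1, 5, 1);
        System.out.println(d.origin);              // prints 199
        System.out.println(d.addressOfElement(5)); // prints 204
    }
}
```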

Recursive Types
A recursive type is a type that is defined in terms of itself. In other words, a component type of the recursive type is the type itself.
Example 1: In Pascal
  type List = ^Node;
       Node = record
                head : Integer;
                tail : List;
              end;
The definition is recursive: List refers to Node, and Node refers back to List.
In Pascal, recursive types are only allowed when the type is a pointer type (denoted by ^). Why only for pointer types? All pointers have the same size, so the size of a Node record is known even though it contains a List.

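Recursive Types
Example 1: In Pascal
  type List = ^Node;
       Node = record
                head : Integer;
                tail : List;
              end;
  var primes : List;
Figure: primes is a pointer (•) to a Node with head 2; its tail is a pointer (•) to a Node with head 3, whose tail is a pointer (•) to a Node with head 5, whose tail is again a pointer (•).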
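For comparison, here is a sketch of the same list in Java, where every field of class type is a reference (a pointer), so the recursive definition has a known size for the same reason. The class name `IntList` and the use of `null` for the end of the list are assumptions made for this example.

```java
final class IntList {
    final int head;
    final IntList tail; // reference to the next node, or null at the end of the list

    IntList(int head, IntList tail) {
        this.head = head;
        this.tail = tail;
    }

    /** Follows the tail pointers and counts the nodes. */
    static int length(IntList list) {
        int n = 0;
        for (IntList p = list; p != null; p = p.tail) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        IntList primes = new IntList(2, new IntList(3, new IntList(5, null)));
        System.out.println(length(primes)); // prints 3
    }
}
```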

Recursive Types
Example 2: In Haskell
  data Tree = Nil
            | Node Int Tree Tree
  t = Node 0 (Node 1 Nil Nil) (Node 3 (Node 4 Nil Nil) (Node 5 Nil Nil))
In Haskell there are no explicit pointer types, but recursive types are allowed in disjoint unions (Haskell “data” types). The representation of the above Tree type of course uses a pointer structure, similar to the list on the previous page.

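Recursive Types
Example 2: In Haskell
  data Tree = Nil
            | Node Int Tree Tree
  t = Node 0 (Node 1 Nil Nil) (Node 2 (Node 3 Nil Nil) (Node 4 Nil Nil))
Figure: a possible representation. t is a pointer (•) to a cell. Every cell starts with a tag (Node or Nil); a Node cell also holds its Int and two pointers (•) to the cells of its subtrees, while a Nil cell holds only the tag.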
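The tagged-cell representation can be imitated in Java with an explicit tag field. This is a sketch, not how a Haskell implementation necessarily lays out data: the class `TreeCell`, the `Tag` enum and the single shared NIL cell are all assumptions made for illustration.

```java
final class TreeCell {
    enum Tag { NIL, NODE }

    final Tag tag;        // which alternative of the disjoint union this cell holds
    final int value;      // meaningful only when tag == NODE
    final TreeCell left;  // pointers to the subtree cells (null in a NIL cell)
    final TreeCell right;

    /** One shared cell represents Nil. */
    static final TreeCell NIL = new TreeCell(Tag.NIL, 0, null, null);

    private TreeCell(Tag tag, int value, TreeCell left, TreeCell right) {
        this.tag = tag;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    static TreeCell node(int value, TreeCell left, TreeCell right) {
        return new TreeCell(Tag.NODE, value, left, right);
    }

    /** Counts the Node cells by inspecting the tag before touching other fields. */
    static int countNodes(TreeCell t) {
        return t.tag == Tag.NIL ? 0 : 1 + countNodes(t.left) + countNodes(t.right);
    }

    public static void main(String[] args) {
        TreeCell t = node(0, node(1, NIL, NIL),
                             node(2, node(3, NIL, NIL), node(4, NIL, NIL)));
        System.out.println(countNodes(t)); // prints 5
    }
}
```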

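Recursive Types
Example 2: In Haskell
  data Tree = Nil
            | Node Int Tree Tree
Another possible representation (figure): the tag (Node or Nil) is kept together with each pointer (•) instead of inside the cell it points to.
This representation may be more memory efficient if we can “steal” a few bits from a pointer to represent a small tag.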

Expression Evaluation
  Data Representation: how to represent values of the source language on the target machine
  Expression Evaluation: how to organize computing the values of expressions (taking care of intermediate results)
  Stack Storage Allocation: how to organize storage for variables (considering lifetimes of global and local variables)
  Routines: how to implement procedures and functions (and how to pass their parameters and return values)
  Heap Storage Allocation: how to organize storage for variables (considering lifetimes of heap variables)
  Object Orientation: runtime organization for OO languages (how to handle classes, objects, methods)

What is the problem?
Computing the value of something like this:
  (a * b) + (1 - (c * 2))
on a low-level machine.
A low-level machine has instructions for multiplication, addition, subtraction, etc. Each instruction operates on two values at a time.
Problem: how to use these simple instructions to compute complex expressions. More specifically: how to manage intermediate results.

Intermediate results
A sequence of machine instructions that computes the value of an expression E1 op E2:
  1. Instructions that compute the value of E1. Produces the value of E1.
  2. Instructions that compute the value of E2. Produces the value of E2 (while this is executing, the value of E1 must be saved somewhere).
  3. Do operation op. Needs the values of E1 and E2.

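Intermediate results
Figure: expression tree for a*b + (1-c*2). The root + has two children: * (applied to a and b) and - (applied to 1 and *, which is applied to c and 2). The intermediate results are a*b, c*2 and 1-c*2.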

Expression Evaluation on a Register Machine (RM)
A register machine has a number of “registers” R1, R2, R3, … which can be used to store intermediate values.
Typical instructions:
  STORE Ri a
  LOAD  Ri x
  MULT  Ri x
  SUB   Ri x
  ADD   Ri x
  where x = Register | #number | address | ...
RM code is efficient, but compilation to an RM is rather complex:
  Must assign a specific register to each intermediate result.
  Must manage allocation of registers (try to reuse registers and minimize their number).
  The machine only has a fixed number of registers, so what if this is not enough?

Expression Evaluation on a Register Machine
Example: Computing (a * b) + (1 - (c * 2)) on a register machine.
  LOAD R1 a     // R1: a
  MULT R1 b     // R1: a * b
  LOAD R2 #1    // R2: 1
  LOAD R3 c     // R3: c
  MULT R3 #2    // R3: c * 2
  SUB  R2 R3    // R2: 1 - (c * 2)
  ADD  R1 R2    // R1: (a * b) + (1 - (c * 2))
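To trace the register code, the instruction sequence can be replayed on a tiny simulated register file. This is a sketch under assumptions: the class `RegisterMachineSketch` is hypothetical, operands are passed as already-fetched integer values, and STORE is left out because the example does not use it.

```java
final class RegisterMachineSketch {
    final int[] r = new int[4]; // R1..R3 live in r[1]..r[3]; r[0] is unused

    void load(int i, int x) { r[i] = x; }  // LOAD Ri x
    void mult(int i, int x) { r[i] *= x; } // MULT Ri x
    void sub(int i, int x)  { r[i] -= x; } // SUB  Ri x
    void add(int i, int x)  { r[i] += x; } // ADD  Ri x

    /** Replays the seven instructions for (a * b) + (1 - (c * 2)). */
    static int evaluate(int a, int b, int c) {
        RegisterMachineSketch m = new RegisterMachineSketch();
        m.load(1, a);       // LOAD R1 a
        m.mult(1, b);       // MULT R1 b
        m.load(2, 1);       // LOAD R2 #1
        m.load(3, c);       // LOAD R3 c
        m.mult(3, 2);       // MULT R3 #2
        m.sub(2, m.r[3]);   // SUB  R2 R3
        m.add(1, m.r[2]);   // ADD  R1 R2
        return m.r[1];
    }

    public static void main(String[] args) {
        System.out.println(evaluate(3, 4, 5)); // prints 3, i.e. 12 + (1 - 10)
    }
}
```

Three registers are live at the same time here (a*b in R1, 1 in R2, c*2 in R3), which is the allocation problem the previous slide describes.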

Expression Evaluation on a Stack Machine (SM)
On a stack machine, the intermediate results are stored on a stack. Operations take their arguments from the top of the stack and put the result back on the stack.
Typical instructions:
  STORE a
  LOAD  x
  MULT
  SUB
  ADD
Stack machine:
  Very natural for expression evaluation (see the examples on the next two pages).
  Requires more instructions for the same expression, but the instructions are simpler.

Expression Evaluation on a Stack Machine
Example 1: Computing (a * b) + (1 - (c * 2)) on a stack machine.
  LOAD a     // stack: a
  LOAD b     // stack: a b
  MULT       // stack: (a*b)
  LOAD #1    // stack: (a*b) 1
  LOAD c     // stack: (a*b) 1 c
  LOAD #2    // stack: (a*b) 1 c 2
  MULT       // stack: (a*b) 1 (c*2)
  SUB        // stack: (a*b) (1-(c*2))
  ADD        // stack: (a*b)+(1-(c*2))
Note the correspondence between the instructions and the expression written in postfix notation:
  a b * 1 c 2 * - +
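The correspondence with postfix notation suggests a simple code generation scheme: emit code for the left operand, then for the right operand, then the operator. The sketch below assumes a hypothetical `StackCodeGen` class with a minimal expression interface; it is an illustration, not the code generator used in the course.

```java
import java.util.ArrayList;
import java.util.List;

final class StackCodeGen {
    /** An expression that can append its stack-machine code to a list. */
    interface Expr {
        void emit(List<String> out);
    }

    /** A variable or a literal such as "#1": a single LOAD. */
    static Expr leaf(String operand) {
        return out -> out.add("LOAD " + operand);
    }

    /** E1 op E2: code for E1, then code for E2, then the operation (post-order). */
    static Expr binary(String opcode, Expr left, Expr right) {
        return out -> {
            left.emit(out);
            right.emit(out);
            out.add(opcode);
        };
    }

    static List<String> compile(Expr e) {
        List<String> out = new ArrayList<>();
        e.emit(out);
        return out;
    }

    public static void main(String[] args) {
        // (a * b) + (1 - (c * 2))
        Expr e = binary("ADD",
                binary("MULT", leaf("a"), leaf("b")),
                binary("SUB", leaf("#1"), binary("MULT", leaf("c"), leaf("#2"))));
        for (String instruction : compile(e)) {
            System.out.println(instruction);
        }
    }
}
```

No register assignment is needed: the stack discipline keeps the value of E1 safe while E2 is being computed.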

Expression Evaluation on a Stack Machine
Example 2: Computing (0 < n) && odd(n) on a stack machine.
  LOAD #0     // stack: 0
  LOAD n      // stack: 0 n
  LT          // stack: (0<n)
  LOAD n      // stack: (0<n) n
  CALL odd    // stack: (0<n) odd(n)
  AND         // stack: (0<n)&&odd(n)
This example illustrates that calling functions or procedures fits in just as naturally with the stack machine evaluation model as operations that correspond to machine instructions. In register machines this is much more complicated, because a stack must be created in memory for managing subroutine calls and returns.
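A minimal interpreter makes the stack behaviour of both examples concrete. It is a sketch under several assumptions: the class `StackMachineSketch` is hypothetical, truth values are represented as the integers 0 and 1, `odd` is the only routine that CALL understands, and STORE is omitted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

final class StackMachineSketch {
    /** Executes the instructions and returns the value left on top of the stack. */
    static int run(String[] code, Map<String, Integer> vars) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instruction : code) {
            String[] parts = instruction.split(" ");
            switch (parts[0]) {
                case "LOAD": {
                    String x = parts[1]; // "#number" or a variable name
                    int value = x.startsWith("#") ? Integer.parseInt(x.substring(1)) : vars.get(x);
                    stack.push(value);
                    break;
                }
                case "CALL": {
                    // Only the built-in "odd" is supported in this sketch.
                    if (!parts[1].equals("odd")) {
                        throw new IllegalArgumentException(instruction);
                    }
                    int n = stack.pop();
                    stack.push(Math.floorMod(n, 2)); // argument replaced by result
                    break;
                }
                default: {
                    int right = stack.pop(); // the right operand was pushed last
                    int left = stack.pop();
                    stack.push(apply(parts[0], left, right));
                }
            }
        }
        return stack.pop();
    }

    private static int apply(String op, int left, int right) {
        switch (op) {
            case "ADD":  return left + right;
            case "SUB":  return left - right;
            case "MULT": return left * right;
            case "LT":   return left < right ? 1 : 0;
            case "AND":  return (left != 0 && right != 0) ? 1 : 0;
            default: throw new IllegalArgumentException(op);
        }
    }
}
```

Running the six instructions of Example 2 with n = 3 leaves 1 on the stack, and with n = 4 leaves 0; the nine instructions of Example 1 with a = 3, b = 4, c = 5 leave 3.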
