Intermediate Code Representations
Conceptual phases of a compiler:
- Lexical analysis (scanner)  ->  sequence of tokens
- Syntax analysis (parser)    ->  intermediate code IR1
- Semantic analysis           ->  intermediate code IR2
- Code optimization           ->  optimized code
- Code generation             ->  target code
Front end: machine independent, language dependent. Middle: optimization. Back end: machine dependent, language independent.
Why use an IR?
1. Separates the machine-independent and machine-dependent parts of the compiler - both front end and back end become retargetable.
2. Easier to perform machine-independent optimizations than at the machine-code level. Example: common sub-expression elimination.
3. Simplifies code generation.
IR – Encodes the Compiler's Knowledge of the Program
Thus, some desirable IR properties:
- Ease of generation
- Ease of manipulation
- Size
- Freedom of expression
- Level of abstraction
Selecting an IR is a critical design decision.
3 Categories of IRs
1. Structural/graphical: AST and concrete syntax tree, call graph, program dependence graph (PDG)
2. Linear: three-address code, abstract stack machine code
3. Hybrid: control flow graph (CFG)
Each category has its own advantages, disadvantages, and typical uses.
Level of Abstraction
Consider the reference A[i,j]. With 1-based indices and rows of length 10, its address is @A + (j - 1) * 10 + (i - 1).

High level: an AST subscript node [ ] with children A, i, j.

Low level: three-address / assembly-like code:

    loadi 1, r1
    sub   rj, r1, r2
    loadi 10, r3
    mult  r2, r3, r4
    sub   ri, r1, r5
    add   r4, r5, r6
    loadi @A, r7
    add   r7, r6, r8
    load  r8, rAij

What is the construct being represented? Array subscripting of A[i,j].
- High-level AST: good for memory disambiguation, easy to generate, but some optimizations are harder to express.
- Low-level three-address code: exposes the address arithmetic, so a different set of optimizations becomes possible.
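The low-level sequence above can be mimicked with a short sketch (assuming 1-based indices, word addressing, and rows of length 10, as in the example; the function name is illustrative):

```python
def address_of(base, i, j, row_len=10):
    """Word address of A[i,j], mirroring the low-level code:
    @A + (j - 1) * row_len + (i - 1)."""
    return base + (j - 1) * row_len + (i - 1)

# With @A = 1000, A[1,1] sits at the base address itself.
print(address_of(1000, 1, 1))  # 1000
print(address_of(1000, 3, 2))  # 1000 + 1*10 + 2 = 1012
```

This is exactly the arithmetic that the AST hides and the three-address code makes explicit.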
Some Design Issues for IRs
Questions to ponder:
1. What is the minimum needed in the IR's set of operators?
2. What is the advantage of a small set of operators?
3. What is the concern with designing operations close to actual machine operations?
4. What is the potential problem of having a small set of IR operations?

In brief:
1. The IR must be able to express everything in the source languages.
2. A small set of operators is easier to implement.
3. If the IR is too close to a particular machine, the compiler loses portability.
4. A small operator set can lead to long instruction sequences, which requires more work during the optimization phase.
High-Level Graphical Representations
Consider the grammar:
    A -> V := E
    E -> E + E | E * E | - E | id
String: a := b * - c + b * - c
Exercise: draw the concrete syntax tree, the AST, and the DAG.
- AST: more compact than the concrete syntax tree; easier to generate code from.
- DAG: a unique node for each value, so even more compact; shows redundant expressions explicitly (encodes redundancy) and is easy to generate during parsing.
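One common way to build the DAG is hash-consing: before creating a node, look it up by (operator, children) and reuse the existing node if that value was already built. A minimal sketch (the class and encoding are my own, not from the slides):

```python
class DagBuilder:
    """Build a DAG bottom-up, sharing identical subexpressions."""
    def __init__(self):
        self.index = {}   # (op, child ids...) -> node id
        self.nodes = []   # node id -> (op, child ids...)

    def node(self, op, *children):
        key = (op,) + children
        if key not in self.index:        # first time we see this value
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]           # otherwise reuse the old node

# a := b * - c + b * - c
d = DagBuilder()
c    = d.node('id', 'c')
neg  = d.node('neg', c)
b    = d.node('id', 'b')
mul1 = d.node('*', b, neg)
mul2 = d.node('*', b, neg)   # identical key: same node comes back
root = d.node('+', mul1, mul2)
print(mul1 == mul2)          # True: the redundant b * -c is shared
print(len(d.nodes))          # 5 nodes, not 7
```

The lookup makes the redundancy explicit: the second b * -c never creates a node.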
Linear IRs: Three-Address Code
A sequence of instructions of the form
    x := y op z
where x, y, and z are variable names, constants, or compiler-generated variables ("temporaries"). Only one operator is permitted on the RHS; larger expressions are computed using temporaries.
Simple Linear IRs
Write the three-address code for: a := b * - c + b * - c
    t1 = - c
    t2 = b * t1
    ...
Complete the code from the AST; then from the DAG. Compiler-generated temporary variables (temps) are needed to hold the intermediate values of the internal nodes of the AST.

Code from the AST:
    t1 = - c
    t2 = b * t1
    t3 = - c
    t4 = b * t3
    t5 = t2 + t4
    a  = t5

Versus from the DAG, where the shared subexpression is computed once:
    t1 = - c
    t2 = b * t1
    t3 = t2 + t2
    a  = t3
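The AST translation above follows a simple post-order walk: emit one instruction per interior node, inventing a fresh temp for each result. A sketch, with the AST encoded as nested tuples (the encoding and names are illustrative):

```python
def gen(ast, code, counter):
    """Post-order walk emitting x := y op z with fresh temporaries."""
    if isinstance(ast, str):           # leaf: a variable name
        return ast
    op, *kids = ast
    args = [gen(k, code, counter) for k in kids]
    counter[0] += 1
    t = f"t{counter[0]}"
    rhs = f"{args[0]} {op} {args[1]}" if len(args) == 2 else f"{op} {args[0]}"
    code.append(f"{t} = {rhs}")
    return t

# a := b * - c + b * - c, as an AST (so - c appears twice)
ast = ('+', ('*', 'b', ('-', 'c')), ('*', 'b', ('-', 'c')))
code, counter = [], [0]
code.append(f"a = {gen(ast, code, counter)}")
for line in code:
    print(line)
# t1 = - c
# t2 = b * t1
# t3 = - c
# t4 = b * t3
# t5 = t2 + t4
# a = t5
```

Running the same walk over the DAG instead of the AST would visit the shared b * -c node once, giving the shorter sequence.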
Exercise
Give the three-address code for: z := x * y + a[j] / sum(b)
More Simple Linear IRs
Stack machine code: push, pop, and arithmetic ops.
Consider: x - 2 * y
    push x
    push 2
    push y
    mult
    sub
Advantages? Compact; temp names are implicit, so temporaries take up no extra space. Simple to generate and execute, and useful when code is transmitted over slow communication links (e.g., the Internet).
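The sequence above can be checked with a tiny evaluator (a sketch; the instruction names follow the slide, everything else is illustrative):

```python
def run(program, env):
    """Evaluate stack-machine code: push <name|const>, mult, sub."""
    stack = []
    for instr in program:
        op, *arg = instr.split()
        if op == 'push':
            a = arg[0]
            stack.append(env[a] if a in env else int(a))
        elif op == 'mult':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == 'sub':
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
    return stack[-1]

# x - 2 * y with x = 10, y = 3  ->  10 - 6 = 4
print(run(['push x', 'push 2', 'push y', 'mult', 'sub'],
          {'x': 10, 'y': 3}))  # 4
```

Note that no temporary names appear anywhere: the operand positions on the stack encode them.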
Hybrid IRs
Exercise – Construct the CFG Where are the leaders? Basic blocks? Edges?
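The standard leader rules (the first instruction, any jump target, and any instruction immediately following a jump) can be sketched directly; jumps are written here as tuples ending in a target index, a hypothetical encoding chosen for the sketch:

```python
def basic_blocks(instrs):
    """Partition three-address instructions into basic blocks.
    Leaders: instruction 0, every jump target, and every
    instruction that immediately follows a jump."""
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] in ('goto', 'if'):
            leaders.add(ins[-1])        # the jump target
            if i + 1 < len(instrs):
                leaders.add(i + 1)      # fall-through after the jump
    ls = sorted(leaders)
    return [instrs[a:b] for a, b in zip(ls, ls[1:] + [len(instrs)])]

prog = [('assign', 'i', '0'),      # 0: leader (first instruction)
        ('assign', 't', 'i'),      # 1: leader (target of the if at 3)
        ('assign', 'i', 'i+1'),    # 2
        ('if', 'i<10', 1),         # 3
        ('assign', 'x', 't')]      # 4: leader (follows a jump)
blocks = basic_blocks(prog)
print(len(blocks))  # 3
```

Edges would then connect each block to its jump targets and its fall-through successor.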
Call Graph Representation Node = function or method Edge from A to B : A has a call site where B is potentially called
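Given a map from each function to the functions its call sites may invoke, the edge set follows directly from that definition (a sketch; the function names are made up):

```python
def call_graph(call_sites):
    """Edge (A, B) whenever A has a call site where B is potentially called."""
    return {(caller, callee)
            for caller, callees in call_sites.items()
            for callee in callees}

edges = call_graph({'main': ['parse', 'emit'],
                    'parse': ['scan'],
                    'scan': [],
                    'emit': []})
print(('main', 'parse') in edges)  # True
```

With indirect calls or virtual dispatch, the "potentially called" set per call site grows, which is why the edge is defined conservatively.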
Exercise: Construct a call graph
Multiple IRs: WHIRL
Key Highlights of IRs