Slide 1
CSE P501 - Compiler Construction (Spring 2014, Jim Hogg, UW CSE)
Compiler Backend Organization:
- Instruction Selection
- Instruction Scheduling
- Register Allocation
- Peephole Optimization
- Peephole Instruction Selection
Slide 2
A Compiler
Front End: Scan (chars -> tokens), Parse (tokens -> AST), Convert (AST -> IR)
'Middle End': Optimize (IR -> IR)
Back End: Select Instructions, Allocate Registers, Emit Machine Code (IR -> target)
AST = Abstract Syntax Tree; IR = Intermediate Representation
Instruction Selection: processors don't support IR; we need to convert IR into real machine code.
Slide 3
The Big Picture
Compiler = lots of fast stuff, followed by hard problems:
- Scanner: O(n)
- Parser: O(n)
- Analysis & Optimization: ~O(n log n)
- Instruction selection: fast or NP-complete
- Instruction scheduling: NP-complete
- Register allocation: NP-complete
Recall: P501 covers the backend at a 'survey' level; a deeper dive would require a further full, 10-week course.
Slide 4
Compiler Backend: 3 Parts
- Select Instructions. eg: what is the best x86 instruction to implement vr17 = 12(arp)?
- Schedule Instructions. eg: given instructions a,b,c,d,e,f,g, would the program run faster if they were issued in a different order, such as d,c,a,b,g,e,f?
- Allocate Registers. eg: which variables to store in registers? which to store in memory?
Slide 5
Instruction Selection is...
Converting mid-level IR into low-level IR by selecting real instructions.
Mid-level IR: TAC (Three-Address Code), t1 = a op b; tree or linear form; unlimited supply of temps; array address calculations explicit; optimizations all done; storage locations decided.
Low-level IR: specific to one chip/ISA; uses the chip's addressing modes; but registers still not decided.
Slide 6
Instruction Scheduling is...
Executing instructions in program order (a,b,c,d,e,f,g,h) gives the correct answer; the scheduler issues them in a new order that is faster overall but still gives the correct answer. Why reorder? eg: memory fetch is slow; divide is slow.
Originally devised for super-computers. Now used everywhere:
- in-order processors: Atom, older ARM
- out-of-order processors: newer x86
The compiler does the 'heavy lifting', which reduces chip power.
Slide 7
Register Allocation is...
Mapping virtual registers, or temps (unlimited supply), onto real machine registers (eg: EAX, EBP - a very finite supply!). When real registers run out, values are spilled to memory (push R1 ... pop R1): "enregister" or spill.
After allocating registers, the compiler may do a second pass of scheduling to improve the speed of the spill code.
Slide 8
Instruction Selection
Instruction selection chooses which target instructions (eg: for x86, for ARM) to use for each IR instruction.
Selecting the best instructions is massively difficult because modern ISAs provide:
- a huge choice of instructions
- a wide choice of addressing modes
Eg: Intel's x64 Instruction Set Reference Manual = 1422 pages.
Slide 9
Choices, choices...
Most chip ISAs provide many ways to do the same thing. eg: setting eax to 0 on x86 has several alternatives:
    mov  eax, 0
    xor  eax, eax
    sub  eax, eax
    imul eax, 0
Many machine instructions do several things at once - eg: register arithmetic and effective address calculation. Recall:
    lea rdst, [rbase + rindex*scale + offset]
Slide 10
Overview: Instruction Selection
- Map IR into near-assembly code. For MiniJava, we emitted textual assembly code; commercial compilers emit binary code directly.
- Assume known storage layout and code shape, ie: the optimization phases have already done their thing.
- Combine low-level IR operations into machine instructions (take advantage of addressing modes, etc).
Slide 11
Criteria for Instruction Selection
Several possibilities:
- Fastest
- Smallest
- Minimal power (eg: don't use a function-unit if leaving it powered-down is a win)
Sometimes the choice is not obvious. eg: if one of the function-units in the processor is idle and we can select an instruction that uses that unit, it effectively executes for free, even if that instruction wouldn't be chosen normally. (Some interaction with scheduling here...)
Slide 12
Instruction Selection: Approaches
Two main techniques:
1. Tree-based matching (eg: the "maximal munch" algorithm)
2. Peephole-based generation
Note that we select instructions from the target ISA, but we do not decide which physical registers to use - that comes later, during Register Allocation. We have a few generic registers: ARP, StackPointer, ResultReg.
Slide 13
Tree-Based Instruction Selection
How to generate target code for a simple expression tree, such as e * f (a multiply node with two ID leaves)?
We could use a template approach - similar to converting an AST into IR:
    case ID:
        t1  = offset(node)
        t2  = base(node)
        reg = nextReg()
        emit(loadAO, t1, t2, reg)
The resulting codegen is correct, but dumb. It doesn't even cover: call-by-value vs call-by-reference, enregistered variables, different data types, etc.
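The template scheme above can be sketched as a recursive tree walk. This is a minimal illustrative version, not the course's actual code: the `Node` class, `next_reg`, and the emitted text format are all made up for the sketch, and the ID case emits the "ideal" loadAI form directly rather than the naive two-instruction form.

```python
# Sketch of template-based code generation: one case per node kind,
# emitting ILOC-like text. All names here are illustrative.

class Node:
    def __init__(self, kind, children=(), offset=None, base=None):
        self.kind = kind              # "ID" or "MUL"
        self.children = list(children)
        self.offset = offset          # byte offset for an ID
        self.base = base              # base register name, eg "rarp"

code = []                             # emitted instructions
_counter = [0]                        # virtual-register counter

def next_reg():
    _counter[0] += 1
    return f"r{_counter[0]}"

def gen(node):
    """Return the register holding node's value, emitting code."""
    if node.kind == "ID":
        # Template from the slide: load from base + offset.
        reg = next_reg()
        code.append(f"loadAI {node.base}, {node.offset} => {reg}")
        return reg
    if node.kind == "MUL":
        r1 = gen(node.children[0])
        r2 = gen(node.children[1])
        reg = next_reg()
        code.append(f"mult {r1}, {r2} => {reg}")
        return reg
    raise ValueError(node.kind)

# e * f, with e at 4(arp) and f at 8(arp), as on the next slide:
tree = Node("MUL", [Node("ID", offset=4, base="rarp"),
                    Node("ID", offset=8, base="rarp")])
gen(tree)
for line in code:
    print(line)
```

Each call returns the register holding its subtree's value, so the parent node can stitch results together without knowing how they were produced.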
Slide 14
Template Code Generation
For the tree e * f, with e at 4(arp) and f at 8(arp), the template (case ID, as on the previous slide) yields the naive code; a smarter selector produces the ideal code:

Naive:
    loadI  4        => r5
    loadAO rarp, r5 => r6
    loadI  8        => r7
    loadAO rarp, r7 => r8
    mult   r6, r8   => r9

Ideal:
    loadAI rarp, 4 => r5
    loadAI rarp, 8 => r6
    mult   r5, r6  => r7
Slide 15
Rules for Tree-to-Target Conversion (prefix notation)
Each rule matches a tree-let (a small pattern tree) in the IR's three-address code and gives the ILOC template that implements it (◆ = memory dereference):

#    Production              ILOC Template
5    ...
6    Reg -> Lab1             loadI  l1     => rn
8    Reg -> Num1             loadI  n1     => rn
9    Reg -> ◆ Reg1           load   r1     => rn
10   Reg -> ◆ + Reg1 Reg2    loadAO r1, r2 => rn
11   Reg -> ◆ + Reg1 Num2    loadAI r1, n2 => rn
14   Reg -> ◆ + Lab1 Reg2    loadAI r2, l1 => rn
15   Reg -> + Reg1 Reg2      add    r1, r2 => rn
16   Reg -> + Reg1 Num2      addI   r1, n2 => rn
19   Reg -> + Lab1 Reg2      addI   r2, l1 => rn
20   ...
Slide 16
Example: Tiling a Tiny 4-Node Tree
The tree ◆ (+ (Lab @G) (Num 12)) - a load of the variable c, stored at offset 12 bytes from label @G.
How many ways can we tile this tree into equivalent ILOC code?
Slide 17
Potential Matches
Eight tilings cover the tree, listed here by the rule numbers used:
1. rules 6, 11:       loadI @G => ri;  loadAI ri, 12 => rj
2. rules 8, 14:       loadI 12 => ri;  loadAI ri, @G => rj
3. rules 6, 8, 10:    loadI @G => ri;  loadI 12 => rj;  loadAO ri, rj => rk
4. rules 8, 6, 10:    loadI 12 => ri;  loadI @G => rj;  loadAO ri, rj => rk
5. rules 6, 16, 9
6. rules 8, 19, 9
7. rules 6, 8, 15, 9
8. rules 8, 6, 15, 9
Tilings 5-8 form the address with add/addI (rules 15, 16, 19), then load through it with a plain load (rule 9).
Slide 18
Tree-Pattern Matching
A given tiling "implements" a tree if it:
- covers every node in the tree, and
- overlap between any two tiles (trees) is limited to a single node
If a tile's root node is in the tiling, that node is also covered by a leaf in another operation tree in the tiling - unless it is the root of the whole tree.
Where two operation trees meet, they must be compatible (ie: expect the same value in the same location).
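The two tiling conditions can be made executable in a few lines. This is a sketch under an assumed representation (nodes as plain ids, tiles as node sets), not anything from the course materials:

```python
# Check the two conditions from the slide: a tiling "implements" a tree
# iff it covers every node, and any two tiles overlap in at most one
# node (the point where one tile's root is another tile's leaf).

def is_valid_tiling(tree_nodes, tiles):
    covered = set()
    for t in tiles:
        covered |= t
    if covered != set(tree_nodes):
        return False                  # some node left uncovered
    for i, a in enumerate(tiles):
        for b in tiles[i + 1:]:
            if len(a & b) > 1:
                return False          # tiles overlap in more than one node
    return True

# The 4-node tree from Slide 16: a dereference of (+ (Lab @G) (Num 12)).
tree = {"deref", "plus", "lab", "num"}
# Rule 6 (loadI for the label) plus the big rule-11 tile, meeting only
# at the "lab" node, is a legal tiling:
ok = is_valid_tiling(tree, [{"lab"}, {"lab", "plus", "num", "deref"}])
# Two tiles sharing both "plus" and "deref" are not:
bad = is_valid_tiling(tree, [{"lab"}, {"num", "plus", "deref"},
                             {"plus", "deref"}])
```

The compatibility condition (both tiles expecting the same value in the same location) is a separate, semantic check that this purely structural test does not capture.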
Slide 19
IR AST for a Simple Expression
    a = b - 2 * c
where:
- a: local variable, offset 4 from arp
- b: call-by-reference parameter, offset -16 from arp
- c: variable, offset 12 from label @G
The IR tree has = at the root, the address + Val(arp) Num(4) of a on the left, and the - expression on the right. Prefix form - same info as the IR tree:
    = + Val1 Num1 - ◆ ◆ + Val2 Num2 × Num3 ◆ + Lab1 Num4
(◆ = memory dereference; b, being call-by-reference, needs two of them.)
Slide 20
IR AST as Prefix Text String
    = + Val1 Num1 - ◆ ◆ + Val2 Num2 × Num3 ◆ + Lab1 Num4
No parentheses? We don't need them: evaluate this expression from right to left, using a simple stack machine.
Rewrite Rules - another BNF, but including ambiguity! (The same 'Rules for Tree-to-Target Conversion' table as before, plus rule 7: Reg -> Val1, with an empty template - the value, eg arp, is already in a register.)
Slide 21
Select Instructions with an LR Parser
    = + Val1 Num1 - ◆ ◆ + Val2 Num2 × Num3 ◆ + Lab1 Num4
Use an LR parser for the rewrite-rule grammar to parse the prefix expression.
The grammar is ambiguous, so there are lots of parse-action conflicts; resolve them with tie-breaker rules:
- Lowest cost (need to augment the grammar productions with costs)
- Favor large reductions over smaller ones ("maximal munch"):
  - reduce-reduce conflict: choose the longer reduction
  - shift-reduce conflict: choose the shift
  => the largest number of prefix operators is translated into each machine instruction
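The "maximal munch" idea can be shown without a full LR parser: at each position in the token string, try the rules longest-first. This is a toy sketch - the rule set is a tiny invented subset of the table, the matcher scans left to right rather than driving a real parser, and the emitted strings are just labels:

```python
# A greedy "maximal munch" tile matcher over a prefix-form token list.
# Longest patterns are listed first, so the matcher prefers the larger
# reduction, mirroring how the tie-breaker rules resolve conflicts.

RULES = [
    (("+", "Lab", "Num"), "one combined instruction (big tile)"),
    (("Lab",),            "loadI label"),
    (("Num",),            "loadI number"),
    (("+",),              "add"),
]

def select(tokens):
    """Emit one template per matched tile, longest match first."""
    out, i = [], 0
    while i < len(tokens):
        for pattern, template in RULES:
            if tuple(tokens[i:i + len(pattern)]) == pattern:
                out.append(template)
                i += len(pattern)
                break
        else:
            raise ValueError(f"no rule matches at {tokens[i:]}")
    return out

# The address @G + 12 in prefix form: the big tile wins, giving one
# instruction instead of three.
print(select(["+", "Lab", "Num"]))
# When no big tile applies, the matcher falls back to small tiles:
print(select(["+", "Lab", "Lab"]))
```

The real selector also needs costs on each rule, so that "biggest tile" can lose to "cheapest cover" when the two disagree.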
Slide 22
Peephole Optimization
Originally devised as the last optimization pass of a compiler: examine a small, sliding window of target code (a few adjacent instructions) and optimize it. (Notation: Cooper & Torczon "ILOC".)

Original:
    storeAI r1      => rarp, 8    // Memory[rarp + 8] = r1
    loadAI  rarp, 8 => r15        // r15 = Memory[rarp + 8]
Optimized:
    storeAI r1 => rarp, 8         // Memory[rarp + 8] = r1
    i2i     r1 => r15             // r15 = r1
Slide 23
Peeps, 2
Original:
    addI r2, 0  => r7             // r7 = r2 + 0
    mult r4, r7 => r10            // r10 = r4 * r7
Optimized:
    mult r4, r2 => r10            // r10 = r4 * r2

Original:
    jumpI L10
    ...
    L10: jumpI L20
Optimized:
    jumpI L20
    ...
    L10: jumpI L20
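A sliding-window peephole pass for two of the rewrites on this slide and the previous one can be sketched as follows. The tuple encoding of instructions is invented for the sketch, not real ILOC syntax, and the copy-forwarding step is safe only under the slides' never-reuse-a-virtual-register assumption:

```python
# Toy peephole pass: reload-after-store becomes a register copy, and
# "add 0" is folded into its consumer.

def peephole(code):
    out = list(code)
    i = 0
    while i + 1 < len(out):
        a, b = out[i], out[i + 1]
        if a[0] == "storeAI" and b[0] == "loadAI" and a[2] == b[1]:
            # storeAI r1 => addr ; loadAI addr => r15
            #   --> keep the store, turn the reload into a copy
            out[i + 1] = ("i2i", a[1], b[2])
            i += 1
        elif a[0] == "addI" and a[2] == 0:
            # r7 = r2 + 0 is just a copy: forward r2 into the next
            # instruction and drop the addI. (Safe here only because
            # the example never re-uses a virtual register.)
            src, dst = a[1], a[3]
            out[i:i + 2] = [tuple(src if x == dst else x for x in b)]
        else:
            i += 1
    return out

before = [("storeAI", "r1", ("rarp", 8)),   # M[rarp+8] = r1
          ("loadAI", ("rarp", 8), "r15"),   # r15 = M[rarp+8]
          ("addI", "r2", 0, "r7"),          # r7 = r2 + 0
          ("mult", "r4", "r7", "r10")]      # r10 = r4 * r7
after = peephole(before)
```

After the pass, the reload has become an i2i copy and the multiply reads r2 directly, matching the "Optimized" columns above. The jump-chain rewrite would be a third pattern of the same shape, keyed on the label table rather than the adjacent instruction.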
Slide 24
Modern Peephole Optimizer
    IR -> Expander -> Simplifier -> Matcher -> target code
Modern ISAs are enormous, so a linear search for a matching pattern is no longer fast enough. The result is like a miniature compiler in its own right, used both for peeps and for instruction selection.
Slide 25
Instruction Selection via Peeps
Storage layout:
    a   4(arp)    local variable
    b  -16(arp)   call-by-reference parameter
    c   12(@G)    variable

IR:
    t1 = 2 * c
    a = b - t1

LIR (14 instructions):
    r10 = 2
    r11 = @G
    r12 = 12
    r13 = r11 + r12
    r14 = M[r13]
    r15 = r10 * r14
    r16 = -16
    r17 = rarp + r16
    r18 = M[r17]
    r19 = M[r18]
    r20 = r19 - r15
    r21 = 4
    r22 = rarp + r21
    M[r22] = r20

Simplified LIR (8 instructions):
    r10 = 2
    r11 = @G
    r14 = M[r11 + 12]
    r15 = r10 * r14
    r18 = M[rarp - 16]
    r19 = M[r18]
    r20 = r19 - r15
    M[rarp + 4] = r20

Matched ILOC:
    loadI   2         => r10
    loadI   @G        => r11
    loadAI  r11, 12   => r14
    mult    r10, r14  => r15
    loadAI  rarp, -16 => r18
    load    r18       => r19
    sub     r19, r15  => r20
    storeAI r20       => rarp, 4
Slide 26
The Simplifier
The simplifier rolls a 3-instruction window over the LIR, doing constant propagation and dead-code elimination (DCE) as it goes. For example:
    r11 = @G
    r12 = 12
    r13 = r11 + r12
becomes
    r11 = @G
    r13 = r11 + 12
then r14 = M[r13] becomes r14 = M[r11 + 12], and so on, window by window, until the 14 LIR instructions shrink to 8.
This example is simplified - it never re-uses a virtual register, so we can delete the defining instruction after a constant propagation. In general, we would have to generate liveness info for each virtual register to perform safe DCE.
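The constant-propagation-plus-DCE step above can be sketched in a few lines. This is an illustrative two-pass version under the same single-use assumption as the slide (each virtual register defined and used once), not the windowed implementation the slide describes, and the tuple instruction forms are invented:

```python
# Sketch of the Simplifier's fold-and-delete step: a "loadI const => r"
# feeding an add is folded into an addI, and the now-dead loadI is
# removed. Safe only because no virtual register is re-used.

def simplify(code):
    # code: list of ("loadI", const, dst) and ("add", r1, r2, dst)
    consts = {i[2]: i[1] for i in code if i[0] == "loadI"}
    folded = set()
    out = []
    for instr in code:
        if instr[0] == "add" and instr[2] in consts:
            # r13 = r11 + r12, with r12 a known constant,
            # becomes r13 = r11 + 12 (an addI).
            out.append(("addI", instr[1], consts[instr[2]], instr[3]))
            folded.add(instr[2])
        else:
            out.append(instr)
    # DCE: a loadI whose only consumer was folded away is dead.
    return [i for i in out if not (i[0] == "loadI" and i[2] in folded)]

before = [("loadI", 12, "r12"),
          ("add", "r11", "r12", "r13")]
print(simplify(before))   # [('addI', 'r11', 12, 'r13')]
```

With register reuse allowed, the final filter would instead need per-register liveness information, exactly as the slide warns.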
Slide 27
Next
- Instruction Scheduling
- Register Allocation
- And more...