Peephole Optimization
Final pass over generated code: examine a few consecutive instructions (2 to 4) and see if an obvious replacement is possible.
Store/load pairs:
    MOV %eax => mema
    MOV mema => %eax
The second instruction can be eliminated without needing any global knowledge of mema.
Use algebraic identities.
Special-case individual instructions.
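The store/load rule can be sketched as a two-instruction window sliding over the code. This is a minimal illustration, assuming a hypothetical instruction form of `(op, src, dst)` tuples that mirrors the `MOV src => dst` notation; the function name is made up for the example and does not come from any real compiler.

```python
def eliminate_store_load(code):
    """Drop a load that immediately re-reads what the previous instruction stored.

    Instructions are hypothetical (op, src, dst) tuples, mirroring MOV src => dst.
    """
    out = []
    for ins in code:
        if out:
            prev = out[-1]
            # Window of two: MOV r => m followed directly by MOV m => r.
            if (prev[0] == "mov" and ins[0] == "mov"
                    and ins[1] == prev[2] and ins[2] == prev[1]):
                continue  # the register already holds the stored value
        out.append(ins)
    return out


code = [("mov", "%eax", "mema"), ("mov", "mema", "%eax"), ("add", "%ebx", "%eax")]
optimized = eliminate_store_load(code)
```

The sketch assumes the load is not a jump target and that `mema` is ordinary memory; a real pass would have to check both before deleting the load.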

Algebraic identities
Worth recognizing: single instructions with a constant operand:
    A * 2 = A + A
    A * 1 = A
    A * 0 = 0
    A / 1 = A
More delicate with floating-point.
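A rough sketch of how these identities might be applied to one integer instruction with a constant operand. The tuple forms `("add", a, b)`, `("copy", a)` and `("const", k)` are assumptions made for this example. The last line shows why floating-point needs more care: `A * 0 = 0` does not hold when `A` is infinite.

```python
def simplify(op, operand, const):
    """Rewrite `operand op const` for integer operands.

    Returns a hypothetical tuple expression, or the original
    (op, operand, const) when no identity applies.
    """
    if op == "mul":
        if const == 0:
            return ("const", 0)               # A * 0 = 0
        if const == 1:
            return ("copy", operand)          # A * 1 = A
        if const == 2:
            return ("add", operand, operand)  # A * 2 = A + A
    if op == "div" and const == 1:
        return ("copy", operand)              # A / 1 = A
    return (op, operand, const)


# Floating-point counterexample: infinity times zero is NaN, not zero.
inf_times_zero = float("inf") * 0.0
```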

Is this ever helpful?
Why would anyone write X * 1? Why bother to correct such obvious junk code?
In fact one might write:
    #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;
Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

Replace Multiply by Shift
    A := A * 4;
Can be replaced by a 2-bit left shift (signed or unsigned), but must worry about overflow if the language does.
    A := A / 4;
If unsigned, can be replaced with a shift right.
For signed values, arithmetic shift right is a well-known problem.
The language may allow it anyway (traditional C).
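To make the overflow concern concrete, here is a small sketch that models 32-bit unsigned registers with Python integers. The width, the mask, and the function names are assumptions for the example.

```python
BITS = 32
MASK = (1 << BITS) - 1


def mul4_wrapping(a):
    """A * 4 as a left shift that silently wraps, like unchecked unsigned arithmetic."""
    return (a << 2) & MASK


def mul4_checked(a):
    """A * 4 as a left shift for a language that must report overflow."""
    if a >> (BITS - 2):  # either of the top two bits set: the result will not fit
        raise OverflowError("A * 4 does not fit in 32 bits")
    return a << 2


def div4_unsigned(a):
    """A / 4 for unsigned A is exactly a 2-bit right shift."""
    return a >> 2
```

The checked version shows the extra test a compiler has to emit (or prove unnecessary) before the shift is a legal replacement in a language with overflow checking.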

Addition chains for multiplication
If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
    X * 125 = X * 128 - X * 4 + X
Two shifts, one subtract and one add, which may be faster than one multiply.
Note the similarity with the efficient exponentiation method.
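The decomposition for 125 can be checked directly. The sketch below also includes a plain binary decomposition for an arbitrary non-negative constant, for comparison; that general helper is an illustration, not something the slides prescribe. Python integers do not overflow, so fixed-width wraparound is not modeled here.

```python
def times_125(x):
    """x * 125 using 125 = 128 - 4 + 1: two shifts, one subtract, one add."""
    return (x << 7) - (x << 2) + x


def shift_add_terms(c):
    """Shift amounts k for which bit k of the constant c is set."""
    return [k for k in range(c.bit_length()) if (c >> k) & 1]


def multiply_by_constant(x, c):
    """x * c as a sum of shifted copies of x, one per set bit of c."""
    return sum(x << k for k in shift_add_terms(c))
```

For 125 the plain binary form needs six terms (bits 0, 2, 3, 4, 5, 6), which is why allowing a subtraction pays off here.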

The Right Shift problem
Arithmetic right shift: shift right and use the sign bit to fill the most significant bits.
    -5     111111...1111111011
    SAR    111111...1111111101
which is -3, not -2. In most languages -5/2 = -2.
Prior to C99, implementations were allowed to truncate towards or away from zero if either operand was negative.
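Python's `>>` on integers behaves like an arithmetic shift, so the mismatch can be reproduced directly. The second function sketches one common repair, adding a bias of 2**k - 1 to negative values before shifting; the 32-bit width and the function name are assumptions for the example.

```python
import math

shifted = -5 >> 1                  # arithmetic shift rounds toward negative infinity
truncated = math.trunc(-5 / 2)     # division that rounds toward zero


def div_pow2_trunc(x, k, bits=32):
    """Divide signed x by 2**k, rounding toward zero, using shifts and an add.

    Assumes x fits in a signed integer of `bits` bits.
    """
    sign_mask = x >> (bits - 1)            # -1 if x is negative, 0 otherwise
    bias = sign_mask & ((1 << k) - 1)      # 2**k - 1 for negative x, else 0
    return (x + bias) >> k
```

With the bias, `div_pow2_trunc(-5, 1)` agrees with truncating division, at the cost of a few extra instructions compared with a bare shift.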

Folding Jumps to Jumps
A jump to an unconditional jump can copy the target address:
          JNE lab1
          ...
    lab1: JMP lab2
Can be replaced by:
          JNE lab2
As a result, lab1 may become dead (unreferenced).
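A minimal sketch of this rule, assuming a hypothetical encoding in which labels are `("label", name)` tuples and jumps are `(opcode, target)` tuples. The set of jump opcodes is limited to the ones that appear in these slides.

```python
def fold_jumps(code):
    """Retarget jumps whose destination label is immediately followed by `jmp`."""
    # Map each label to the target of an unconditional jump right after it.
    forward = {}
    for i, ins in enumerate(code[:-1]):
        nxt = code[i + 1]
        if ins[0] == "label" and nxt[0] == "jmp":
            forward[ins[1]] = nxt[1]

    def resolve(target):
        seen = set()
        while target in forward and target not in seen:  # guard against jmp cycles
            seen.add(target)
            target = forward[target]
        return target

    jumps = {"jmp", "jne", "jnl"}
    return [(ins[0], resolve(ins[1])) if ins[0] in jumps else ins for ins in code]


code = [("jne", "lab1"), ("nop",), ("label", "lab1"), ("jmp", "lab2"),
        ("label", "lab2"), ("ret",)]
folded = fold_jumps(code)
```

After folding, nothing refers to `lab1` any more, so a later pass is free to treat the label as dead.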

Jump to Return
A jump to a return can be replaced by a return:
          JMP lab1
          ...
    lab1: RET
Can be replaced by:
          RET
lab1 may become dead code.

Tail Recursion Elimination
A subprogram is tail-recursive if the last computation is a call to itself:
    function last (lis : list_type) return list_type is
    begin
       if lis.next = null then
          return lis;
       else
          return last (lis.next);
       end if;
    end;
The recursive call can be replaced with:
    lis := lis.next;
    goto start;   -- added label
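The same transformation written out by hand in Python, as an illustration only. The `Node` class and `make_list` helper are assumptions added so the example is self-contained. CPython does not eliminate tail calls itself, so the recursive form uses one stack frame per list element while the loop uses constant stack space.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def make_list(n):
    """Build the list 1 -> 2 -> ... -> n and return its head."""
    head = None
    for v in range(n, 0, -1):
        head = Node(v, head)
    return head


def last_recursive(lis):
    if lis.next is None:
        return lis
    return last_recursive(lis.next)   # tail call: nothing left to do afterwards


def last_iterative(lis):
    while True:                       # plays the role of the added `start` label
        if lis.next is None:
            return lis
        lis = lis.next                # lis := lis.next; goto start
```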

Advantages of tail recursion elimination
Saves time: an assignment and a jump are faster than a call with one parameter.
Saves stack space: converts linear stack usage to constant usage.
In languages with no loops, this may be a required optimization: it is specified in the Scheme standard.

Tail-recursion elimination at the Instruction Level
Consider the sequence on the x86:
    CALL func
    RET
CALL pushes the return point on the stack, RET in the body of func removes it, RET in the caller returns.
Can generate instead:
    JMP func
Now RET in func returns to the original caller, because there is a single return address on the stack.
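As a peephole rule this is a two-instruction window. The sketch below assumes the same hypothetical tuple encoding as a simple instruction list, `("call", name)` and `("ret",)`, and is deliberately conservative: it only fires when the RET directly follows the CALL with nothing in between.

```python
def call_ret_to_jmp(code):
    """Replace CALL f immediately followed by RET with JMP f."""
    out = []
    for ins in code:
        if ins == ("ret",) and out and out[-1][0] == "call":
            # The callee's own RET will now return to our caller.
            out[-1] = ("jmp", out[-1][1])
            continue
        out.append(ins)
    return out


before = [("push", "%ebx"), ("call", "func"), ("ret",)]
after = call_ret_to_jmp(before)
```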

Peephole optimization in the REALIA COBOL compiler
Full compiler for Standard COBOL, targeted to the IBM PC. Now distributed by Computer Associates.
Runs in 150K bytes, but must be able to handle very large programs that run on mainframes.
No global optimization possible: multiple linear passes over code, no global data structures, no flow graph.
Multiple peephole optimizations; the compiler iterates until the code is stable.
Each pass scans the code backwards to minimize address recomputations.

Typical COBOL code: control structures and perform blocks.
    Process-Balance.
        if Balance is negative then
            perform Send-Bill
        else
            perform Record-Credit
        end-if.
    Send-Bill.
        ...
    Record-Credit.
        ...

Simple Assembly: perform equivalent to call
    Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

Fold jump to return statement
    Pb: cmp balance, 0
        jnl L1
        call Sb
        jmp L2          -- jump to return
    L1: call Rc
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

Corresponding Assembly
    Pb: cmp balance, 0
        jnl L1          -- jump to unconditional jump
        call Sb
        ret             -- folded
    L1: jmp Rc          -- will become useless
    L2: ret
    Sb: ...
        ret
    Rc: ...
        ret

Code following a jump is unreachable
    Pb: cmp balance, 0
        jnl Rc          -- folded
        jmp Sb
        ret             -- unreachable
    L1: jmp Rc          -- unreachable
    Sb: ...
        ret
    Rc: ...
        ret

Jump to following instruction is a noop
    Pb: cmp balance, 0
        jnl Rc
        jmp Sb          -- jump to next instruction
    Sb: ...
        ret
    Rc: ...
        ret

Final code
    Pb: cmp balance, 0
        jnl Rc
    Sb: ...
        ret
    Rc: ...
        ret
Final code is as efficient as inlining.
All transformations are local.
Each optimization may yield further optimization opportunities.
Iterate until no further change.
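The whole sequence can be replayed with a toy rewriter that keeps sweeping the code until a sweep changes nothing. This is only a sketch under stated assumptions: instructions are hypothetical tuples, `("body", name)` stands in for the elided paragraph bodies, the paragraph labels `Pb`, `Sb` and `Rc` are treated as always live, and the sweep runs forwards for simplicity. It is not a description of how the REALIA compiler is implemented.

```python
JUMPS = {"jmp", "jnl"}


def first_real(code, label):
    """First non-label instruction after `label`, or None if there is none."""
    if ("label", label) not in code:
        return None
    i = code.index(("label", label)) + 1
    while i < len(code) and code[i][0] == "label":
        i += 1
    return code[i] if i < len(code) else None


def one_pass(code, entries):
    out = []
    for i, ins in enumerate(code):
        nxt = next((c for c in code[i + 1:] if c[0] != "label"), None)
        if ins[0] in JUMPS:
            dest = first_real(code, ins[1])
            if ins[0] == "jmp" and dest == ("ret",):
                ins = ("ret",)                  # jump to return
            elif dest is not None and dest[0] == "jmp":
                ins = (ins[0], dest[1])         # jump to unconditional jump
        elif ins[0] == "call" and nxt == ("ret",):
            ins = ("jmp", ins[1])               # call followed by return
        out.append(ins)

    used = set(entries) | {c[1] for c in out if c[0] in JUMPS or c[0] == "call"}
    kept, reachable = [], True
    for ins in out:
        if ins[0] == "label":
            if ins[1] not in used:
                continue                        # dead (unreferenced) label
            reachable = True
        if reachable:
            kept.append(ins)
        if ins[0] in ("jmp", "ret"):
            reachable = False                   # nothing falls through

    final = []
    for i, ins in enumerate(kept):
        if ins[0] == "jmp" and i + 1 < len(kept) and kept[i + 1] == ("label", ins[1]):
            continue                            # jump to the next instruction
        final.append(ins)
    return final


def peephole(code, entries):
    while True:                                 # iterate until the code is stable
        new = one_pass(code, entries)
        if new == code:
            return code
        code = new


program = [
    ("label", "Pb"), ("cmp", "balance", "0"), ("jnl", "L1"), ("call", "Sb"),
    ("jmp", "L2"), ("label", "L1"), ("call", "Rc"), ("label", "L2"), ("ret",),
    ("label", "Sb"), ("body", "Sb"), ("ret",),
    ("label", "Rc"), ("body", "Rc"), ("ret",),
]
result = peephole(program, entries={"Pb", "Sb", "Rc"})
```

On this input the result matches the final code on the slide: the compare, `jnl Rc`, and then the two paragraph bodies, with `L1`, `L2` and every jump except the conditional one gone.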

Arcane tricks
Consider a typical maximum computation:
    if A >= B then
       C := A;
    else
       C := B;
    end if;
For simplicity assume all unsigned, and all in registers.

Eliminating max jump on x86
Simple-minded asm code:
        CMP A, B
        JNAE L1
        MOV A=>C
        JMP L2
    L1: MOV B=>C
    L2:
One jump in either case.

Computing max without jumps on x86
Architecture-specific trick: use the subtract-with-borrow instruction and the carry flag.
    CMP A, B          ; CF = 1 if B > A, CF = 0 if A >= B
    SBB %eax,%eax     ; all 1's if B > A, all 0's if A >= B
    MOV %eax, C
    NOT C             ; all 0's if B > A, all 1's if A >= B
    AND B=>%eax       ; B if B > A, 0 if A >= B
    AND A=>C          ; 0 if B > A, A if A >= B
    OR %eax=>C        ; B if B > A, A if A >= B
More instructions, but NO JUMPS.
Supercompiler: exhaustive search of instruction patterns to uncover similar tricks.
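The mask logic can be followed step by step by simulating 32-bit unsigned registers with Python integers. This is a sketch for checking the reasoning, not generated code; the variable names mirror the registers in the listing and the 32-bit width is an assumption.

```python
MASK = 0xFFFFFFFF


def max_branchless(a, b):
    """Unsigned 32-bit max of a and b, mirroring the CMP/SBB/AND/OR sequence."""
    cf = ((a - b) >> 32) & 1   # CMP A, B: borrow (CF) is 1 exactly when B > A
    eax = (0 - cf) & MASK      # SBB %eax,%eax: all ones if B > A, else zero
    c = ~eax & MASK            # MOV %eax, C ; NOT C
    eax &= b                   # AND B=>%eax: B if B > A, else 0
    c &= a                     # AND A=>C: A if A >= B, else 0
    return c | eax             # OR %eax=>C
```

Exactly one of the two masked values is non-zero-capable at a time, so the final OR simply selects whichever operand survived its mask.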
