Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.

IA-32 Primer
Registers: EAX, EBX, ECX, EDX, ESP, EBP, EIP, ESI, EDI
Flags: CF, OF, ZF, SF, PF, …
Example instructions: push eax; mov ebx, eax; mov [esp], …; lea esp, [esp+4]; add eax, ebx
[Figure: register and stack diagram; values shown are ESP = 1000, EAX = 20, EBX = 20, and 40]

Superoptimization
Input: an instruction sequence I (plus a timeout t)
- Instructions belong to an ISA
- No loops
Output: an instruction sequence I’
- Instructions belong to the same ISA
- I’ is equivalent to I: on all inputs, I and I’ produce the same results
- I’ is the optimal implementation of I (with a timeout: an improved implementation of I)
- "Better" can mean shorter, faster, or lower energy consumption

Superoptimization: example
Input:
  sub eax, ecx
  neg eax
  sub eax, 1
Output of the IA-32 superoptimizer:
  not eax
  add eax, ecx
Both sequences compute EAX ← ECX – EAX – 1.
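The claim that both sequences compute ECX – EAX – 1 can be spot-checked by simulating them. The sketch below is an illustration only: the function names are hypothetical, only the value of EAX is modeled (flags are ignored), and random testing gives evidence rather than a proof of equivalence.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic

def original(eax, ecx):
    """sub eax, ecx ; neg eax ; sub eax, 1"""
    eax = (eax - ecx) & MASK
    eax = (-eax) & MASK
    eax = (eax - 1) & MASK
    return eax

def optimized(eax, ecx):
    """not eax ; add eax, ecx"""
    eax = (~eax) & MASK
    eax = (eax + ecx) & MASK
    return eax

def agree_on_random_inputs(trials=10_000, seed=0):
    """Compare the final EAX of both sequences on random 32-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        eax, ecx = rng.getrandbits(32), rng.getrandbits(32)
        if original(eax, ecx) != optimized(eax, ecx):
            return False
    return True
```

For instance, with EAX = 5 and ECX = 20 both functions return 14.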

Superoptimization: example
Input:
  add ebp, 4
  mov [ebp], eax
  add ebp, 4
  mov [ebp], ebx
Output of the IA-32 superoptimizer:
  mov [ebp+4], eax
  mov [ebp+8], ebx
  add ebp, 8

Why do we need a superoptimizer?

© Alvin Cheung, CSE 501

Peephole Optimization
- Done at low-level IR (closer to assembly code)
- Sliding window: the "peephole"
- Peephole rules: rewrite rules of the form "LHS → RHS"
- If the current peephole matches the LHS of some rule, replace it with the RHS

Peephole Rules (rule category: LHS → RHS)
Eliminating redundant loads and stores:
  mov reg, [addr]   // Load contents of [addr] in reg
  mov [addr], reg   // Store contents of reg in [addr]
  → mov reg, [addr]
  (Both instructions must be in the same basic block)
Control-flow optimizations:
  goto L1 … L1: goto L2  →  goto L2 … L1: goto L2
Algebraic rewrites:
  add oprnd, 0   →  (none)
  imul oprnd, 2  →  shl oprnd, 1
Machine idioms:
  add oprnd, 1   →  inc oprnd
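A sliding-window rewriter over such rules might look like the sketch below. It is an illustration under simplifying assumptions: instructions are plain strings, rules are regular expressions over a window joined with " ; ", the program is a single basic block, and the output is not re-scanned. The names RULES and peephole are hypothetical.

```python
import re

# (window size, regex over the joined window, replacement template)
RULES = [
    (1, r"add (\w+), 0", ""),                                     # algebraic: drop
    (1, r"imul (\w+), 2", r"shl \1, 1"),                          # algebraic
    (1, r"add (\w+), 1", r"inc \1"),                              # machine idiom
    (2, r"mov (\w+), (\[\w+\]) ; mov \2, \1", r"mov \1, \2"),     # redundant store
]

def peephole(program, rules=RULES):
    """Single left-to-right pass; the first matching rule wins at each position."""
    out, i = [], 0
    while i < len(program):
        for size, pattern, replacement in rules:
            window = program[i:i + size]
            match = re.fullmatch(pattern, " ; ".join(window)) if len(window) == size else None
            if match:
                rewritten = match.expand(replacement)
                out.extend(rewritten.split(" ; ") if rewritten else [])
                i += size
                break
        else:
            out.append(program[i])   # no rule applies: keep the instruction
            i += 1
    return out
```

Calling peephole(["mov eax, [esp]", "mov [esp], eax", "add ebx, 0", "imul ecx, 2", "add edx, 1"]) yields ["mov eax, [esp]", "shl ecx, 1", "inc edx"].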

Problem with Peephole Rules Manually written – Cumbersome – Error-prone

Peephole Superoptimizer
Input program:
  …
  mov eax, [esp]
  mov [esp], eax
  …
  mov eax, [esp]
  add eax, 1
  mov [esp], eax
  …
The superoptimizer rewrites the first window:
  mov eax, [esp]; mov [esp], eax  →  mov eax, [esp]

Peephole Superoptimizer
The superoptimizer rewrites the second window:
  mov eax, [esp]; add eax, 1; mov [esp], eax  →  add [esp], 1; mov eax, [esp]
Resulting program:
  …
  mov eax, [esp]
  …
  add [esp], 1
  mov eax, [esp]
  …

Peephole Superoptimizer Superoptimization is a sloooooooooowwwww process

Optimization Database
Training programs → instruction sequences → superoptimizer → optimization database

Peephole Superoptimizer (using the optimization database)
The optimization database takes the place of the superoptimizer when rewriting each window:
  mov eax, [esp]; mov [esp], eax  →  mov eax, [esp]
  mov eax, [esp]; add eax, 1; mov [esp], eax  →  add [esp], 1; mov eax, [esp]

Outline
- Problem statement
- Motivation
  - Peephole optimization
  - Speeding up critical inner loops
- Equivalence of two instruction-sequences
- A naïve superoptimizer
- Massalin/Bansal-Aiken superoptimizer
  - Canonicalization
  - Test cases
  - Counterexamples as tests
  - Pruning away sub-optimal candidates
- Stochastic superoptimization

Checking equivalence of two instruction-sequences Key primitive in superoptimizers and related tools “Do two loop-free instruction sequences I and I’ produce the same outputs for all possible inputs?”

Checking equivalence of two instruction-sequences
1. Encode I as a logical formula ϕ
2. Encode I’ as a logical formula ψ
3. Use an SMT solver to check if ϕ ⟺ ψ is logically valid
Two questions remain:
- How to encode an instruction sequence as a logical formula?
- How to use an SMT solver to check if two formulas are equivalent?

How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on the identity (Id) state to obtain the post-state σ
2. Convert σ into a logical formula

Concrete Evaluation IA-32 state IA-32 interpreter – Concrete operational semantics of IA-32 instructions

IA-32 State
⟨RegMap, FlagMap, MemMap⟩
  RegMap : Register ↦ 32-bit value
  FlagMap : Flag ↦ Boolean value
  MemMap : 32-bit address ↦ 8-bit value (the examples here use 32-bit address ↦ 32-bit value)
Full state:
  ⟨ [EAX ↦ 0, EBX ↦ 8, ECX ↦ 0, …, ESP ↦ 1000, EBP ↦ 0, …, EDI ↦ 0 ], [SF ↦ false, CF ↦ false, …, OF ↦ false ], [0 ↦ 0, 4 ↦ 0, 8 ↦ 0, …, 1000 ↦ 2, 1004 ↦ 0, …, … ↦ 0]⟩
Abbreviated form, omitting zero and false entries:
  ⟨ [EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩

Concrete Evaluation
Instruction sequence: mov eax, [esp]; add eax, 4; sub ebx, 4; mov [esp], ebx
  Initial state:         ⟨ [EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩
  After mov eax, [esp]:  ⟨ [EAX ↦ 2, EBX ↦ 8, ESP ↦ 1000 ], [ ], [1000 ↦ 2]⟩
  After add eax, 4:      ⟨ [EAX ↦ 6, EBX ↦ 8, ESP ↦ 1000 ], [PF ↦ true ], [1000 ↦ 2]⟩
  After sub ebx, 4:      ⟨ [EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000 ], [PF ↦ false ], [1000 ↦ 2]⟩
  After mov [esp], ebx:  ⟨ [EAX ↦ 6, EBX ↦ 4, ESP ↦ 1000 ], [PF ↦ false ], [1000 ↦ 4]⟩
PF is set when the low byte of the result has an even number of 1 bits: 6 = 110b gives PF ↦ true, and 4 = 100b gives PF ↦ false.
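The concrete evaluation above can be reproduced with a very small interpreter. The following is a sketch, not the lecture's interpreter: the tuple encoding of instructions, the function names, and the choice to model only mov/add/sub and only the ZF and SF flags are all assumptions made to keep it short.

```python
MASK = 0xFFFFFFFF  # registers and memory cells hold 32-bit values

def read(state, operand):
    regs, _flags, mem = state
    if isinstance(operand, int):            # immediate operand
        return operand & MASK
    if isinstance(operand, tuple):          # ("mem", "esp") stands for [esp]
        return mem.get(regs.get(operand[1], 0), 0)
    return regs.get(operand, 0)             # register name

def write(state, operand, value):
    regs, _flags, mem = state
    if isinstance(operand, tuple):
        mem[regs.get(operand[1], 0)] = value
    else:
        regs[operand] = value

def step(state, instr):
    """Apply one instruction (opcode, dst, src) to the state in place."""
    opcode, dst, src = instr
    if opcode == "mov":                     # mov does not touch the flags
        write(state, dst, read(state, src))
        return
    if opcode not in ("add", "sub"):
        raise ValueError(f"unsupported opcode: {opcode}")
    a, b = read(state, dst), read(state, src)
    result = (a + b if opcode == "add" else a - b) & MASK
    write(state, dst, result)
    flags = state[1]
    flags["ZF"] = result == 0
    flags["SF"] = (result >> 31) == 1

def run(state, program):
    for instr in program:
        step(state, instr)
    return state

PROGRAM = [
    ("mov", "eax", ("mem", "esp")),
    ("add", "eax", 4),
    ("sub", "ebx", 4),
    ("mov", ("mem", "esp"), "ebx"),
]
regs, flags, mem = run(({"ebx": 8, "esp": 1000}, {}, {1000: 2}), PROGRAM)
```

After the run, regs holds EAX = 6, EBX = 4, ESP = 1000, and memory cell 1000 holds 4, matching the final state of the trace.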

Symbolic Evaluation Symbolic IA-32 state Symbolic IA-32 interpreter – Symbolic transformers for instructions – Obtained from concrete operational semantics of IA-32 instructions

Symbolic IA-32 State
⟨RegMap, FlagMap, MemMap⟩
  RegMap : Symbolic register-constant ↦ term
  FlagMap : Symbolic flag-constant ↦ formula
  MemMap : Function symbol ↦ function-update expression
Logic: QFBV + Arrays (quantifier-free bit vectors with arrays)
Id (identity) state:
  ⟨ [EAX’ ↦ EAX, EBX’ ↦ EBX, …, ESP’ ↦ ESP, EBP’ ↦ EBP, …, EDI’ ↦ EDI ], [SF’ ↦ SF, CF’ ↦ CF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩

Symbolic Evaluation
Instruction sequence: mov eax, [esp]; add eax, 4; sub ebx, 4; mov [esp], ebx
  Initial (Id) state:
    ⟨ [EAX’ ↦ EAX, EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ SF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩
  After mov eax, [esp]:
    ⟨ [EAX’ ↦ Mem(ESP), EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ SF, …, ZF’ ↦ ZF ], [Mem’ ↦ Mem]⟩
  After add eax, 4:
    ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((Mem(ESP) + 4) < 0), …, ZF’ ↦ ((Mem(ESP) + 4) = 0) ], [Mem’ ↦ Mem]⟩
  After sub ebx, 4:
    ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem]⟩
  After mov [esp], ebx:
    ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩

Convert a symbolic state into a logical formula
Symbolic state:
  ⟨ [EAX’ ↦ Mem(ESP) + 4, EBX’ ↦ EBX – 4, …, ESP’ ↦ ESP, … ], [SF’ ↦ ((EBX – 4) < 0), …, ZF’ ↦ ((EBX – 4) = 0) ], [Mem’ ↦ Mem[ESP ↦ EBX – 4]]⟩
Formula (one equality per map entry, joined by ∧):
  EAX’ = Mem(ESP) + 4 ∧ EBX’ = EBX – 4 ∧ … ∧ ESP’ = ESP ∧ … ∧ SF’ = ((EBX – 4) < 0) ∧ … ∧ ZF’ = ((EBX – 4) = 0) ∧ Mem’ = Mem[ESP ↦ EBX – 4]
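Symbolic evaluation followed by the conversion to a formula can be sketched with terms represented as strings. This is a toy illustration, not the lecture's implementation: the function names are hypothetical, only mov/add/sub and the SF and ZF flags are handled, every arithmetic term is fully parenthesized, "and" stands for ∧, and Mem[a := v] stands for the function update Mem[a ↦ v].

```python
def id_state(registers=("EAX", "EBX", "ESP")):
    """Identity state: every register, flag, and the memory map to themselves."""
    return {
        "regs": {r: r for r in registers},
        "flags": {"SF": "SF", "ZF": "ZF"},
        "mem": "Mem",
    }

def sym_read(state, operand):
    if isinstance(operand, int):                   # immediate operand
        return str(operand)
    if isinstance(operand, tuple):                 # ("mem", "ESP") stands for [esp]
        return f"{state['mem']}({state['regs'][operand[1]]})"
    return state["regs"][operand]

def sym_step(state, instr):
    """Apply the symbolic transformer of one instruction (opcode, dst, src)."""
    opcode, dst, src = instr
    if opcode == "mov":
        value = sym_read(state, src)
    elif opcode in ("add", "sub"):
        sign = "+" if opcode == "add" else "-"
        value = f"({sym_read(state, dst)} {sign} {sym_read(state, src)})"
        state["flags"]["SF"] = f"({value} < 0)"
        state["flags"]["ZF"] = f"({value} = 0)"
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    if isinstance(dst, tuple):                     # store: function update on Mem
        address = state["regs"][dst[1]]
        state["mem"] = f"{state['mem']}[{address} := {value}]"
    else:
        state["regs"][dst] = value

def to_formula(state):
    """One equality per map entry, joined by conjunction."""
    parts = [f"{r}' = {t}" for r, t in state["regs"].items()]
    parts += [f"{f}' = {t}" for f, t in state["flags"].items()]
    parts.append(f"Mem' = {state['mem']}")
    return " and ".join(parts)

state = id_state()
for instr in [("mov", "EAX", ("mem", "ESP")), ("add", "EAX", 4),
              ("sub", "EBX", 4), ("mov", ("mem", "ESP"), "EBX")]:
    sym_step(state, instr)
formula = to_formula(state)
```

A real encoder would build typed bit-vector and array terms for an SMT solver rather than strings; strings are used here only so that the result can be read next to the formula in the text.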

How to encode an instruction sequence I as a logical formula ϕ?
1. Symbolically evaluate I on the identity (Id) state to obtain the post-state σ
2. Convert σ into a logical formula
Example: push eax
  EAX’ = EAX ∧ EBX’ = EBX ∧ … ∧ ESP’ = ESP – 4 ∧ … ∧ SF’ = SF ∧ … ∧ ZF’ = ZF ∧ Mem’ = Mem[ESP – 4 ↦ EAX]
Abbreviated, omitting the identity conjuncts:
  ESP’ = ESP – 4 ∧ Mem’ = Mem[ESP – 4 ↦ EAX]

Exercise
Convert the following instruction sequence into a QFBV formula:
  lea esp, [esp - 4]
  mov [esp], 1
  push ebp
  mov ebp, esp

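How to use an SMT solver to check equivalence?
Conventionally used for "satisfiability"
- e.g., symbolic execution for test-case generation
  EAX > 0 ∧ EBX = EAX + 4 ∧ EBX > 0  → SMT solver →  SAT [EAX ↦ 1, EBX ↦ 5 ]
  EAX > 0 ∧ EBX = EAX + 4 ∧ EBX ≤ 0  → SMT solver →  UNSAT
(The second query is UNSAT when EAX and EBX are read as unbounded integers; under 32-bit signed wraparound, EAX = 0x7FFFFFFC would satisfy it.)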

How to use an SMT solver to check equivalence?
Ask the solver whether ¬(ϕ ⟺ ψ) is satisfiable:
- UNSAT: the two formulas are equivalent
- SAT: the model is a counterexample, e.g. [EAX ↦ 1, EBX ↦ 1 ]

Checking equivalence of two instruction sequences
  sub eax, ecx; neg eax; sub eax, 1
    EAX’ = -1 * (EAX + (-1 * ECX)) - 1 ∧ SF’ = (-1 * (EAX + (-1 * ECX)) - 1) < 0
  not eax; add eax, ecx
    EAX’ = -1 * EAX + ECX - 1 ∧ SF’ = (-1 * EAX + ECX - 1) < 0
  SMT solver: UNSAT (the two sequences are equivalent)

Checking equivalence of two instruction sequences
  mov eax, [esp]; add eax, ebx
    EAX’ = Mem(ESP) + EBX ∧ SF’ = (Mem(ESP) + EBX) < 0
  mov eax, [esp]; sub eax, ebx
    EAX’ = Mem(ESP) - EBX ∧ SF’ = (Mem(ESP) - EBX) < 0
  SMT solver: SAT [EBX ↦ 1, ESP ↦ 1000, Mem[1000 ↦ 1 ]] (counterexample: the first sequence yields EAX’ = 2, the second EAX’ = 0)
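The UNSAT / SAT-with-counterexample behaviour can be imitated without a solver. The sketch below swaps the SMT query for an exhaustive search over 8-bit words, which is an assumption made purely so the example runs with the standard library; agreement at 8 bits is not a proof for 32 bits. All function names are hypothetical, and only EAX’ and SF’ are compared.

```python
from itertools import product

WIDTH = 8                      # assumption: small words keep the search cheap
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)

def post_state(result):
    """Return the observable post-state (EAX', SF') for a result value."""
    result &= MASK
    return result, bool(result & SIGN)

def load_add(mem_at_esp, ebx):     # mov eax, [esp] ; add eax, ebx
    return post_state(mem_at_esp + ebx)

def load_sub(mem_at_esp, ebx):     # mov eax, [esp] ; sub eax, ebx
    return post_state(mem_at_esp - ebx)

def sub_neg_sub(eax, ecx):         # sub eax, ecx ; neg eax ; sub eax, 1
    return post_state(-(eax - ecx) - 1)

def not_add(eax, ecx):             # not eax ; add eax, ecx
    return post_state(~eax + ecx)

def find_counterexample(f, g):
    """Play the role of the solver: None means 'UNSAT' (no disagreement found)."""
    for inputs in product(range(MASK + 1), repeat=2):
        if f(*inputs) != g(*inputs):
            return inputs          # 'SAT': a model on which f and g differ
    return None
```

Here find_counterexample(sub_neg_sub, not_add) returns None, while find_counterexample(load_add, load_sub) returns (0, 1), that is, Mem(ESP) = 0 and EBX = 1.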

A Naïve Superoptimizer
[Flow diagram: I → Instruction-sequence enumerator → Equivalence check → Optimality check → I’, with edges labeled "No" and "Timeout expired"]
Cost: thousands of unique IA-32 instruction schemas, combined with the exponential cost of enumeration
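The enumerate-and-check loop can be sketched on a toy scale. Everything below is an assumption made for illustration: a six-instruction ISA over EAX and ECX, 8-bit words, an exhaustive equivalence check standing in for the SMT solver, sequence length as the only cost measure, and no timeout. The names ISA, run, equivalent, and superoptimize are hypothetical.

```python
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1

# Toy ISA: each instruction maps (eax, ecx) to the new value of eax.
ISA = {
    "not eax":      lambda eax, ecx: ~eax,
    "neg eax":      lambda eax, ecx: -eax,
    "add eax, ecx": lambda eax, ecx: eax + ecx,
    "sub eax, ecx": lambda eax, ecx: eax - ecx,
    "add eax, 1":   lambda eax, ecx: eax + 1,
    "sub eax, 1":   lambda eax, ecx: eax - 1,
}

def run(seq, eax, ecx):
    for instr in seq:
        eax = ISA[instr](eax, ecx) & MASK
    return eax

def equivalent(seq_a, seq_b):
    """Exhaustive check over all 8-bit inputs (stand-in for the SMT query)."""
    return all(run(seq_a, a, c) == run(seq_b, a, c)
               for a, c in product(range(MASK + 1), repeat=2))

def superoptimize(target, max_len):
    """Enumerate candidates by increasing length; the first hit is a shortest one."""
    for length in range(1, max_len + 1):
        for candidate in product(ISA, repeat=length):
            if equivalent(candidate, target):
                return candidate
    return None

best = superoptimize(("sub eax, ecx", "neg eax", "sub eax, 1"), max_len=2)
```

With this instruction order, best is ("not eax", "add eax, ecx"). The number of candidates of length n is 6 to the power n even for this toy ISA, which is the exponential blow-up noted above.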

Canonicalization
Without canonicalization, the IA-32 superoptimizer handles each of these separately:
  mov ebp, esp; mov esp, ebp  →  mov ebp, esp
  mov ebp, eax; mov eax, ebp  →  mov ebp, eax
  mov esi, esp; mov esp, esi  →  mov esi, esp
All three are instances of one canonical rewrite:
  mov reg1, reg2; mov reg2, reg1  →  mov reg1, reg2

Canonicalization
  add ebp, 20; add ebp, 30  →  add ebp, 50
is an instance of
  add reg1, c1; add reg1, c2  →  add reg1, c1+c2

Canonicalization pipeline:
  add ebp, 20; add ebp, 30
  → Canonicalizer (reg1 ↦ ebp, c1 ↦ 20, c2 ↦ 30) →  add reg1, c1; add reg1, c2
  → IA-32 Superoptimizer →  add reg1, c1+c2
  → Uncanonicalizer (same bindings) →  add ebp, 50
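A canonicalizer and uncanonicalizer along these lines can be sketched with token renaming. This is an illustration under assumptions, not the Bansal-Aiken implementation: registers and decimal constants are renamed in order of first appearance, equal constants share one name, and the only constant expression folded on the way back is a sum such as c1+c2. The function names are hypothetical.

```python
import re

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+")

def canonicalize(seq):
    """Rename registers to reg1, reg2, ... and constants to c1, c2, ...

    Returns the canonical sequence and the binding needed to undo the renaming.
    """
    reg_map, const_map = {}, {}

    def rename(match):
        token = match.group(0)
        if token in REGISTERS:
            return reg_map.setdefault(token, f"reg{len(reg_map) + 1}")
        if token.isdigit():
            return const_map.setdefault(token, f"c{len(const_map) + 1}")
        return token                       # opcode: leave unchanged

    canonical = [TOKEN.sub(rename, instr) for instr in seq]
    binding = {name: concrete for concrete, name in {**reg_map, **const_map}.items()}
    return canonical, binding

def uncanonicalize(seq, binding):
    """Substitute concrete names back and fold constant sums like 20+30."""
    restored = [TOKEN.sub(lambda m: binding.get(m.group(0), m.group(0)), instr)
                for instr in seq]
    fold = lambda m: str(int(m.group(1)) + int(m.group(2)))
    return [re.sub(r"(\d+)\+(\d+)", fold, instr) for instr in restored]

canonical, binding = canonicalize(["add ebp, 20", "add ebp, 30"])
restored = uncanonicalize(["add reg1, c1+c2"], binding)
```

Here canonical is ["add reg1, c1", "add reg1, c2"], binding maps reg1 to ebp, c1 to 20 and c2 to 30, and restored is ["add ebp, 50"].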

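A Naïve Superoptimizer + Canonicalization
[Flow diagram: the naïve superoptimizer loop (instruction-sequence enumerator, equivalence check, optimality check, edges labeled "No" and "Timeout expired") with a Canonicalizer applied to the input I and an Uncanonicalizer relating I’’ and the output I’]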

Test Cases
Sequences to compare:
  (a) mov reg1, [reg2]; add reg1, reg3
  (b) mov reg1, [reg2]; sub reg1, reg3
Test 1, input [REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]:
  (a) [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
  (b) [REG1 ↦ 1, REG3 ↦ 0, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
  The results agree, so this test does not tell the sequences apart.
Test 2, input [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]:
  (a) [REG1 ↦ 2, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
  (b) [REG1 ↦ 0, REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]]
  The results differ, so the sequences are not equivalent.

Test Cases
[Flow diagram: Canonicalizer, instruction-sequence enumerator, a test-cases stage (edge labeled "Fail"), equivalence check, optimality check (edges labeled "No" and "Timeout expired"), Uncanonicalizer; labels I, I’’, I’]

Checking equivalence of two instruction sequences
  mov reg1, [reg2]; add reg1, reg3
    REG1’ = Mem(REG2) + REG3 ∧ SF’ = (Mem(REG2) + REG3) < 0
  mov reg1, [reg2]; sub reg1, reg3
    REG1’ = Mem(REG2) – REG3 ∧ SF’ = (Mem(REG2) – REG3) < 0
  SMT solver: SAT [REG3 ↦ 1, REG2 ↦ 1000, Mem[1000 ↦ 1 ]] (counterexample)

Counterexamples as tests: counterexample-guided inductive synthesis (CEGIS)
[Flow diagram: as on the Test Cases slide, with an added edge labeled "Counter-example" from the equivalence check back to the test cases; other edges labeled "Fail", "No", and "Timeout expired"; labels I, I’’, I’]
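The interplay of test cases and counterexamples can be sketched as follows. This is a toy under stated assumptions: 8-bit words, candidates modeled as Python functions of (Mem(REG2), REG3) returning REG1’, a hand-picked candidate list instead of an enumerator, and an exhaustive search standing in for the SMT solver. All names are hypothetical.

```python
from itertools import product

WIDTH = 8
MASK = (1 << WIDTH) - 1

def spec(mem, reg3):                      # mov reg1, [reg2] ; add reg1, reg3
    return (mem + reg3) & MASK

CANDIDATES = {
    "mov reg1, [reg2] ; sub reg1, reg3": lambda mem, reg3: (mem - reg3) & MASK,
    "mov reg1, [reg2] ; xor reg1, reg3": lambda mem, reg3: (mem ^ reg3) & MASK,
    "mov reg1, [reg2] ; or reg1, reg3":  lambda mem, reg3: (mem | reg3) & MASK,
    "mov reg1, reg3 ; add reg1, [reg2]": lambda mem, reg3: (reg3 + mem) & MASK,
}

def find_counterexample(f, g):
    """Expensive check (stand-in for the solver): first input where f and g differ."""
    for inputs in product(range(MASK + 1), repeat=2):
        if f(*inputs) != g(*inputs):
            return inputs
    return None

def search(spec, candidates, tests):
    """Cheap tests filter candidates; each counterexample becomes a new test."""
    stats = {"rejected_by_tests": 0, "solver_calls": 0}
    for name, candidate in candidates.items():
        if any(candidate(*t) != spec(*t) for t in tests):
            stats["rejected_by_tests"] += 1
            continue
        stats["solver_calls"] += 1
        counterexample = find_counterexample(spec, candidate)
        if counterexample is None:
            return name, stats            # passed the full equivalence check
        tests.append(counterexample)      # sharpen the test suite
    return None, stats

tests = [(1, 0)]                          # Mem(REG2) = 1, REG3 = 0
found, stats = search(spec, CANDIDATES, tests)
```

In this run the sub and xor candidates each pass the current tests and cost one expensive check, and the two counterexamples they produce are enough for the or candidate to be rejected by testing alone.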

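Pruning sub-optimal candidates
[Figure: enumerated candidates grouped into one-instruction, two-instruction, and three-instruction sequences. Entries shown: "add ebp, c1+c2" among the one-instruction sequences, and "add ebp, c1; add ebp, c…" among the longer ones]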

Superoptimizer
[Flow diagram: Canonicalizer, instruction-sequence enumerator + pruning, test cases (edge labeled "Fail"), equivalence check (edge labeled "Counter-example"), optimality check (edges labeled "No" and "Timeout expired"); labels I and I’]

Results: search-space pruning
Length | Original search space | After canonicalization | After pruning | Reduction factor
…      | … million             | 2.49 million           | 1.2 million   | …
…      | … billion             | 8.6 billion            | 3.11 billion  | 52.1
[Chart legend: Original search space, After canonicalization, After pruning, After test-cases]

But still …
Superoptimization is still a sloooooooooowwwww process.
After canonicalization, pruning, and testing, checking all candidates of length up to 3 instructions takes several hours.

© Eric Schkufza, ASPLOS 2013

Montgomery multiplication kernel in the OpenSSL library (116 instructions): STOKE’s result is 16 instructions shorter and 1.6 times faster than the result of gcc -O3
