Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken

Stratified Synthesis: Automatically Learning the x86-64 Instruction Set
PLDI, Santa Barbara, June 16, 2016

Motivation: Automatically Reason about Programs
- Symbolic execution
- Program verification: program ≡ 𝜙
- Program equivalence: program ≡ program
- …

I would like to start with some motivation. A lot of work in our community has been about building tools to automatically reason about programs. Symbolic execution has been used to explore paths through programs, for example to find inputs that cause some bad behavior (or to prove that no such inputs exist). In program verification, we try to prove a program equivalent to some formal specification. Similarly, we might want to prove the equivalence of two programs, maybe because one is much faster while we have more confidence in the other. And there are many more.

Automatically reasoning about programs requires Formal Semantics

All of these examples have one thing in common: they require a formal semantics of the programming language so that tools can reason about programs automatically.

x86-64

    testq %rdi, %rdi
    je .L1
    xorq %rax, %rax
.L0:
    movq %rdi, %rdx
    andq $0x1, %rdx
    addq %rdx, %rax
    shrq $0x1, %rdi
    jne .L0
    cltq
    retq
.L1:

In our setting, we are interested in x86-64, the most widely used instruction set on desktops and servers. So, for us, programs look something like this. To be able to reason about such programs automatically, we need a formal specification of all instructions. The advantage of dealing with x86 directly is that our tools can work with almost any binary, regardless of its origin or source language.
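Read as a whole, the listing above appears to count the set bits of %rdi into %rax. The following is a minimal Python sketch of that reading, assuming 64-bit unsigned values; the function name is illustrative and not from the talk. It also returns 0 for a zero input, whereas the listing jumps to .L1 without touching %rax in that case.

```python
MASK64 = (1 << 64) - 1

def count_set_bits(rdi: int) -> int:
    """Mirror the loop between .L0 and jne: add the low bit, then shift right."""
    rdi &= MASK64
    rax = 0                # xorq %rax, %rax
    while rdi != 0:        # testq/je on entry, jne .L0 after each iteration
        rdx = rdi & 0x1    # movq %rdi, %rdx ; andq $0x1, %rdx
        rax += rdx         # addq %rdx, %rax
        rdi >>= 1          # shrq $0x1, %rdi
    return rax
```

Even for a loop this small, the sketch needs the meaning of six different instructions, which is the kind of knowledge a formal specification has to supply.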

x86-64 Semantics

addq $0x1, %rax
    rax ← rax +₆₄ 1₆₄
    (+₆₄ is 64-bit bit-vector addition, 1₆₄ is a 64-bit constant, and rax on the right-hand side is the previous value of rax)

Register layout: rax (64 bits) contains eax (low 32 bits), which contains ax (low 16 bits), which is split into ah and al (8 bits each).

addb $0x1, %al
    al ← al +₈ 1₈
    rax ← rax[63:8] ∘ (rax[7:0] +₈ 1₈)

addw $0x1, %ax
    rax ← rax[63:16] ∘ (rax[15:0] +₁₆ 1₁₆)

addl $0x1, %eax
    rax ← 0₃₂ ∘ (rax[31:0] +₃₂ 1₃₂)
    zf ← 0₃₂ = (eax +₃₂ 1₃₂)
    cf ← ((0₁ ∘ eax) +₃₃ 1₃₃)[32:32]
    sf ← (eax +₃₂ 1₃₂)[31:31]
    of ← ¬eax[31:31] ∧ (eax +₃₂ 1₃₂)[31:31]
    pf ← (eax +₃₂ 1₃₂)[0:0] ⊕ (eax +₃₂ 1₃₂)[1:1] ⊕
         (eax +₃₂ 1₃₂)[2:2] ⊕ (eax +₃₂ 1₃₂)[3:3] ⊕
         (eax +₃₂ 1₃₂)[4:4] ⊕ (eax +₃₂ 1₃₂)[5:5] ⊕
         (eax +₃₂ 1₃₂)[6:6] ⊕ (eax +₃₂ 1₃₂)[7:7]

OK, that's not so bad: the specification we are looking for is this. rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64-bit constant. We can easily get specifications for the related instructions as well, for 8, 16 and 32 bits. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different: the upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions set all 5 status flags, so really the specification looks like this. Not something we necessarily want to write by hand, or can easily get right.
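The width-specific register updates described above can be sketched in Python on plain integers. This is a minimal illustration, not the authors' formula representation: the function name and the flag dictionary are assumptions made for the example, and the parity flag is left out.

```python
MASK64 = (1 << 64) - 1

def add_one(rax: int, width: int) -> tuple[int, dict[str, int]]:
    """Model add{b,w,l,q} $0x1 acting on the low `width` bits of rax."""
    if width not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64")
    mask = (1 << width) - 1
    old = rax & mask
    wide = old + 1                  # (width+1)-bit sum, keeps the carry bit
    new = wide & mask
    if width == 32:
        new_rax = new               # 32-bit case: upper 32 bits are zeroed
    else:
        new_rax = (rax & MASK64 & ~mask) | new   # upper bits are preserved
    sign = width - 1
    flags = {
        "zf": int(new == 0),
        "cf": (wide >> width) & 1,
        "sf": (new >> sign) & 1,
        # overflow when adding 1: old sign bit clear, new sign bit set
        "of": (~old >> sign) & 1 & (new >> sign),
    }
    return new_rax, flags
```

For example, `add_one(0x12FF, 8)` leaves the upper byte alone and wraps al to zero, while `add_one(0xFFFFFFFFFFFFFFFF, 32)` clears the whole register.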

Related Work
- Manual partial specifications: CompCert [CACM’09], BAP [CAV’11], BitBlaze [ICISS’08], Codesurfer/x86 [ETAPS’05], McVeto [CAV’10], STOKE [ASPLOS’13], Jakstab [CAV’08], many others
- Taly/Godefroid [PLDI’12]: automatically synthesize specification from templates; only 534 instructions

Automatically Learn a Specification for the x86-64 ISA
Bit-vector formulas of input-output behavior

Strategy: Split Instruction Set
All instructions are split into two parts:
- Base set: specify manually
- Remaining instructions: learn specification automatically

Strategy: Using Program Synthesis

Instruction 𝑖 → (synthesize) → Program 𝑝 → (combine base formulas) → Formula 𝜙

1. How do we synthesize programs?
- Randomized search
- Guided by cost function
- Based on test-cases
- Using STOKE [ASPLOS’13]

2. Formal guarantee? Is 𝑖 ≡ 𝜙?
- We have 𝑝 ≡ 𝜙, so the question is whether 𝑖 ≡ 𝑝 ≡ 𝜙
- 𝜙 is therefore only a candidate formula
- Synthesize further programs 𝑝′, … to obtain further candidate formulas 𝜙′, 𝜙′′
- Check 𝜙 ≡ 𝜙′?
  yes ✔: increase confidence
  no: add counter-example, remove wrong program(s)
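The checking loop on this slide can be sketched as follows. This is a toy model under stated assumptions: programs are Python functions over 8-bit values, the "instruction" is a black box we can only execute, and an exhaustive comparison stands in for the solver query. The names `find_difference` and `refine` are hypothetical.

```python
from typing import Callable, List, Optional

Prog = Callable[[int], int]
WIDTH = 8                      # toy width so equivalence can be checked exhaustively
MASK = (1 << WIDTH) - 1

def find_difference(p: Prog, q: Prog) -> Optional[int]:
    """Stand-in for the solver: an input where p and q differ, or None."""
    for x in range(1 << WIDTH):
        if p(x) != q(x):
            return x
    return None

def refine(instruction: Prog, candidates: List[Prog], tests: List[int]) -> List[Prog]:
    """Keep candidates that agree with the instruction, learning from counter-examples."""
    kept = [p for p in candidates if all(p(t) == instruction(t) for t in tests)]
    changed = True
    while changed and len(kept) > 1:
        changed = False
        for q in kept[1:]:
            cex = find_difference(kept[0], q)
            if cex is not None:
                tests.append(cex)    # add counter-example to the test cases
                # remove every program that is wrong on the counter-example
                kept = [p for p in kept if p(cex) == instruction(cex)]
                changed = True
                break
    return kept
```

Each counter-example separates two candidates, so at least one of them disagrees with the instruction on it and is removed; the loop therefore terminates. Candidates that survive are pairwise equivalent, which increases confidence without proving 𝑖 ≡ 𝜙.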

Solver Imprecision
Checking 𝜙 ≡ 𝜙′ has three possible outcomes:
- yes: increase confidence
- no: remove incorrect program(s)
- unknown / timeout: no information about equivalence; the programs stay in separate equivalence classes (equivalence class 1, equivalence class 2, …)

Picking a Program
Given equivalence classes 1, 2, 3, …, prefer programs whose formulas are
- Precise (fewest uninterpreted functions)
- Fast (fewest non-linear arithmetic operations)
- Simple (fewest nodes)
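One way to read the three preferences is as a lexicographic ranking. The sketch below assumes exactly that ordering (precision first, then speed, then size) and assumes the choice is made across all equivalence classes; both are assumptions of this example, and `FormulaStats`, `rank` and `pick` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class FormulaStats:
    name: str
    uninterpreted: int   # uninterpreted functions (fewer = more precise)
    nonlinear: int       # non-linear arithmetic operations (fewer = faster)
    nodes: int           # formula size in nodes (fewer = simpler)

def rank(f: FormulaStats) -> Tuple[int, int, int]:
    """Lexicographic key: precise, then fast, then simple."""
    return (f.uninterpreted, f.nonlinear, f.nodes)

def pick(classes: List[List[FormulaStats]]) -> FormulaStats:
    """Take the best formula of each equivalence class, then the best overall."""
    best_per_class = [min(c, key=rank) for c in classes if c]
    return min(best_per_class, key=rank)
```

With this key, a formula without uninterpreted functions always wins over one that has any, regardless of size.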

Problem: Synthesis Limitations
synthesize

Solution: Stratified Search

Generalizing Formulas
Learn:   addw %ax, %dx       dx ← dx +₁₆ ax
Rename:  addw %cx, %bx       bx ← bx +₁₆ cx       ✔
         addw (%rsp), %dx    dx ← dx +₁₆ M[rsp]   ✔
         addw $0x5, %dx      dx ← dx +₁₆ 5₁₆      ✔

Generalization Summary
- Learn formula for register-only instructions
- Generalize formulas to other types of operands
- Check on test inputs
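The three steps can be illustrated with a small sketch in which the learned addw formula is abstracted over how its source operand is read. Everything here is a simplification made for the example: registers are modeled as independent 16-bit entries of a dictionary, memory "M" maps an address to a 16-bit word, and `reg`, `imm`, `mem`, `addw_formula` and `agrees_on_tests` are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Operand = Callable[[State], int]
MASK16 = 0xFFFF

def reg(name: str) -> Operand:
    return lambda s: s[name] & MASK16            # register operand

def imm(value: int) -> Operand:
    return lambda s: value & MASK16              # immediate operand

def mem(addr_reg: str) -> Operand:
    return lambda s: s["M"][s[addr_reg]] & MASK16   # memory operand

def addw_formula(src: Operand, dst: str) -> Callable[[State], State]:
    """Formula learned for `addw %ax, %dx`, with the operands left open."""
    def apply(state: State) -> State:
        out = dict(state)
        out[dst] = (state[dst] + src(state)) & MASK16
        return out
    return apply

def agrees_on_tests(formula, reference, tests: List[State]) -> bool:
    """Check a generalized formula against a reference on test inputs."""
    return all(formula(t) == reference(t) for t in tests)
```

Renaming corresponds to `addw_formula(reg("cx"), "bx")`, and the memory and immediate variants only swap the operand reader. In the real system the reference would be the instruction itself executed on test inputs.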

What if Generalization Is Impossible?
    shufps $0xb3, %xmm0, %xmm1
Problem: no corresponding register-only variant
Solution: brute force a formula for every constant
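To see why every constant needs its own formula, consider how the 8-bit immediate of shufps selects lanes. The sketch below follows the lane selection as documented for SHUFPS (two result lanes picked from the destination, two from the source); the tuple encoding, the lane notation in the generated strings and the function names are assumptions of this example, and it derives the formulas from known semantics rather than learning them.

```python
from typing import Dict, List, Tuple

Lanes = Tuple[int, int, int, int]   # four 32-bit lanes, index 0 is the lowest

def selectors(imm8: int) -> List[int]:
    """Split the immediate into four 2-bit lane selectors."""
    return [(imm8 >> (2 * k)) & 0b11 for k in range(4)]

def shufps(imm8: int, src: Lanes, dst: Lanes) -> Lanes:
    """shufps $imm8, src, dst in AT&T operand order; returns the new dst."""
    s = selectors(imm8)
    return (dst[s[0]], dst[s[1]], src[s[2]], src[s[3]])

def formula_for_constant(imm8: int) -> str:
    """A fixed lane permutation for one particular constant."""
    s = selectors(imm8)
    return (f"xmm1 ← xmm0.lane{s[3]} ∘ xmm0.lane{s[2]} ∘ "
            f"xmm1.lane{s[1]} ∘ xmm1.lane{s[0]}")

# One formula per possible 8-bit constant.
all_formulas: Dict[int, str] = {c: formula_for_constant(c) for c in range(256)}
```

Since the immediate is part of the instruction encoding rather than a register value, there is no register-only variant to learn from, hence one formula per constant.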

Experiment
Base set (51 instructions)
- Integer, bitwise and float operations
- Data movement (including conditional move)
- Conversion operations
Pseudo instructions (11 templates)
- Split and combine registers
- Changing status flags

Goal
Total instructions: 3,684
Out-of-scope:
- System instructions
- Crypto instructions
- Deprecated instructions
- String instructions
(examples: invpcid, jle, aeskeygenassist, fadd, scasq)
Goal instructions: 2,918

Results
Base set: 51
Pseudo instructions: 11
Register-only instructions learned: 692
Generalized
8-bit constant instructions learned
Total formulas learned: 1,795.42

Evaluation: Are the Formulas Correct?
Compare with handwritten formulas (from STOKE)
Available for comparison: 1,431.91
- Automatically proven equivalent: 1,377.91
- Equivalent with additional lemma: 4 (lemma: fadd(𝑎, 𝑏) = fadd(𝑏, 𝑎))
- Semantically different: 50
  In all 50 of these cases the learned formula is the correct one.

Evaluation: Is Stratification Necessary?
stratum(𝑖) = 0                                      if 𝑖 ∈ baseset
stratum(𝑖) = 1 + max { stratum(𝑖′) : 𝑖′ ∈ 𝑀(𝑖) }    otherwise

[Figure: instructions grouped by stratum (stratum 0 = base set, stratum 1, stratum 2, stratum 3)]
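The stratum definition can be computed directly by recursion. The sketch below assumes that 𝑀(𝑖) is the set of instructions used by the program learned for 𝑖 and that base-set instructions have stratum 0; the instruction names in the usage note are placeholders, not real results.

```python
from typing import Dict, Set

def strata(baseset: Set[str], M: Dict[str, Set[str]]) -> Dict[str, int]:
    """Compute stratum(i) for every instruction in the base set or in M."""
    memo: Dict[str, int] = {}

    def stratum(i: str) -> int:
        if i in baseset:
            return 0                      # base instructions form stratum 0
        if i not in memo:
            # one more than the highest stratum among the instructions it uses
            memo[i] = 1 + max(stratum(j) for j in M[i])
        return memo[i]

    return {i: stratum(i) for i in set(M) | baseset}
```

For instance, with `baseset = {"a", "b"}` and `M = {"c": {"a", "b"}, "d": {"c", "a"}, "e": {"d"}}`, instruction "c" lands in stratum 1, "d" in stratum 2 and "e" in stratum 3. The recursion is well-founded because an instruction can only be learned from instructions that already have formulas.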

Evaluation: Size of Learned Formulas
Fully inlined: 3526 instructions
[Figure: number of nodes in learned formula relative to number of nodes in handwritten formula]

Conclusions
- Automatically learned 1,795 formulas
- Stratification is key to scaling program synthesis
- Compared to the hand-written specification: more correct, equally precise, same size
- Source code, formulas, experimental results

Backup Slides

Limitations
Missing base instructions
- Some integer and floating point operations are missing
Program synthesis limits
- Shortest known program is long and outside of reach (e.g., byte-vectorized operations)
Cost function limitation
- For one bit of output, the cost function does not give enough signal
Crazy instructions

SMT solver usage (Z3)
Total decisions: 7,075
- Equivalent: 6,669 (94.26%)
- New equivalence class: 356 (5.03%)
- Counter-examples: 50 (0.71%)
Timeouts (45 seconds): 3
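As a quick sanity check on the reported percentages, the three outcome counts can be recomputed against the total. The variable names are illustrative; the numbers are the ones on the slide.

```python
total = 7075
outcomes = {
    "equivalent": 6669,
    "new equivalence class": 356,
    "counter-examples": 50,
}

# The three outcomes account for every solver decision.
accounted_for = sum(outcomes.values())

# Share of each outcome in percent, rounded to two decimals as on the slide.
shares = {name: round(100 * count / total, 2) for name, count in outcomes.items()}
```

The three outcomes add up to the total number of decisions, so the 3 timeouts are evidently reported separately.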

Experiment Details
Intel Xeon E5-2697 (28 cores) at 2.6 GHz
hours (register-only)
hours (8-bit constants)
Total of 11, core hours

Test Cases
- Random inputs (random machine state)
- “Interesting” bit-patterns: 0, 1, −1, 2ⁿ, NaN, Infinity
- Test cases learned from counter-examples
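A minimal sketch of how such test inputs might be generated, assuming 64-bit registers and IEEE 754 doubles for the floating-point patterns. The function names and the choice of registers are hypothetical, and a real machine state would include many more registers, flags and memory.

```python
import random
import struct
from typing import Dict, List, Tuple

def interesting_patterns(width: int = 64) -> List[int]:
    """0, 1, -1 (all ones) and every power of two 2^n that fits in `width` bits."""
    mask = (1 << width) - 1
    patterns = {0, 1, mask}
    patterns.update(1 << n for n in range(width))
    return sorted(patterns)

def float_patterns() -> List[int]:
    """Bit patterns of NaN and Infinity as 64-bit doubles."""
    return [struct.unpack("<Q", struct.pack("<d", v))[0]
            for v in (float("nan"), float("inf"))]

def random_state(rng: random.Random,
                 registers: Tuple[str, ...] = ("rax", "rcx", "rdx")) -> Dict[str, int]:
    """A random machine state restricted to a few general-purpose registers."""
    return {r: rng.getrandbits(64) for r in registers}
```

Counter-examples returned by the solver would simply be appended to the list of states built from these generators.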

Evaluation: Simplification
Formulas are simplified
- Constant propagation
- Move bit-selection over concatenation, e.g. (0₆₄ ∘ rax)[63:0] ≡ rax

Evaluation: Formula Precision
Formula precision (number of uninterpreted functions)
- Learned formulas equally precise in all but 4 cases
Formula quality (number of non-linear operations)
- Learned formulas contain the same number of non-linear operations, except for 11 cases
