Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken

Slides:



Advertisements
Similar presentations
1/1/ / faculty of Electrical Engineering eindhoven university of technology Speeding it up Part 3: Out-Of-Order and SuperScalar execution dr.ir. A.C. Verschueren.
Advertisements

Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.
Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Web:
There are two types of addressing schemes:
Timed Automata.
Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.
CS2422 Assembly Language & System Programming September 19, 2006.
The CPU Revision Typical machine code instructions Using op-codes and operands Symbolic addressing. Conditional and unconditional branches.
1 HOIST: A System for Automatically Deriving Static Analyzers for Embedded Systems John Regehr Alastair Reid School of Computing, University of Utah.
1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.
Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 251 Introduction to Computer Organization.
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
1 ICS 51 Introductory Computer Organization Fall 2009.
1 Carnegie Mellon Assembly and Bomb Lab : Introduction to Computer Systems Recitation 4, Sept. 17, 2012.
Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.
1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Carnegie Mellon Machine-Level Programming II: Control Carnegie Mellon.
Computer and Information Sciences College / Computer Science Department CS 206 D Computer Organization and Assembly Language.
Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken.
Conditionally Correct Superoptimization Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken (Stanford University)
Spring 2016Assembly Review Roadmap 1 car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Car c = new Car(); c.setMiles(100);
CMPT 438 Algorithms.
Instruction Set Architecture
Machine-Level Programming I: Basics
Homework Reading Lab with your assigned section starts next week
Credits and Disclaimers
ECE 353 Lab 3 Pipeline Simulator
Part of the Assembler Language Programmers Toolbox
Control Unit Lecture 6.
Y86-64 Instructions 8-byte integer operations (no floating point!)  “word” (it is considered a long word for historical reasons, because the original.
A Closer Look at Instruction Set Architectures: Expanding Opcodes
Homework Reading Continue work on mp1
CS-401 Assembly Language Programming
Symbolic Instruction and Addressing
Machine-Level Programming II: Control Flow
Introduction to Assembly Language
x86-64 Programming II CSE 351 Winter 2018
Processor Architecture: The Y86-64 Instruction Set Architecture
Georg Hofferek, Ashutosh Gupta, Bettina Könighofer, Jie-Hong Roland Jiang and Roderick Bloem Synthesizing Multiple Boolean Functions using Interpolation.
LPSAT: A Unified Approach to RTL Satisfiability
Objective of This Course
Encodings and reducibility
Morgan Kaufmann Publishers Computer Organization and Assembly Language
Symbolic Instruction and Addressing
Processor Architecture: The Y86-64 Instruction Set Architecture
Programming in JavaScript
Shift & Rotate Instructions)
COMP 1321 Digital Infrastructure
COMP 1321 Digital Infrastructure
Symbolic Instruction and Addressing
Shift & Rotate Instructions)
Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.
Computer Architecture CST 250
Microprocessor and Assembly Language
Boolean Expressions to Make Comparisons
Lecture 6 Architecture Algorithm Defin ition. Algorithm 1stDefinition: Sequence of steps that can be taken to solve a problem 2ndDefinition: The step.
ECE 352 Digital System Fundamentals
IEEE Floating Point Adder Verification
Recording Synthesis History for Sequential Verification
Reseeding-based Test Set Embedding with Reduced Test Sequences
Chapter 6 –Symbolic Instruction and Addressing
Other Processors Having learnt MIPS, we can learn other major processors. Not going to be able to cover everything; will pick on the interesting aspects.
Carnegie Mellon Ithaca College
Carnegie Mellon Ithaca College
Automatic Abstraction of Microprocessors for Verification
CS201- Lecture 8 IA32 Flow Control
Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.
Credits and Disclaimers
Computer Architecture Assembly Language
Software Development Techniques
Part I Data Representation and 8086 Microprocessors
Presentation transcript:

Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken Stratified Synthesis: Automatically Learning the x86-64 Instruction Set Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken PLDI, Santa Barbara, June 16, 2016

Automatically Reason about Programs Motivation Symbolic Execution Automatically Reason about Programs Program Verification 𝜙 ≡ I would like to start with some motivation. A lot of work in our community has been about building tools to automatically reason about programs. Symbolic execution has been used to explore paths through programs, for example to find inputs that cause some bad behavior (or prove that no such inputs exist). In program verification, we try to prove a program equivalent to some formal specification. Or similarly, we might want to prove the equivalence between two programs, maybe because one is much faster while we have more confidence in the other. And there are many more. Program Equivalence ≡ …

Automatically reasoning about programs requires Formal Semantics All of these example have one thing in common: they require a formal semantics of the programming language to allow tools to automatically reason about programs.

x86-64 testq %rdi, %rdi je .L1 xorq %rax, %rax .L0: movq %rdi, %rdx andq $0x1, %rdx addq %rdx, %rax shrq $0x1, %rdi jne .L0 cltq retq .L1: In our setting, we are interested in x86-64, the most widely used instruction set on desktops and servers. So, for us, programs look something like this. To be able to reason about such programs automatically, we need a formal specification of all instructions. The advantage of dealing with x86 directly is that our tools can work with almost any binary, regardless of its origin or source language.

64-bit bit-vector addition x86-64 Semantics 64-bit bit-vector addition addq $0x1, %rax rax←rax + 64 1 64 64-bit constant previous value of rax Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 rax 64 bits Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right. eax 32 bits ax 16 bits ah al 8 bits 8 bits

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 rax←rax 63:8 ∘ rax 7:0 + 8 1 8 rax 64 bits Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right. eax 32 bits ax 16 bits ah al 8 bits 8 bits

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax←rax[63:32] ∘(rax[31:0] + 32 1 32 ) Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax← 0 32 ∘(rax[31:0] + 32 1 32 ) Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax← 0 32 ∘(rax[31:0] + 32 1 32 ) zf← 0 32 =(eax + 32 1 32 ) cf← 0 1 ∘eax + 33 1 33 [32,32] sf← eax + 32 1 32 [31,31] of←¬eax 31,31 ∧(eax + 32 1 32 )[31,31] pf←(eax + 32 1 32 )[0,0]⊕(eax + 32 1 32 )[1,1]⊕ (eax + 32 1 32 )[2,2]⊕(eax + 32 1 32 )[3,3]⊕ (eax + 32 1 32 )[4,4]⊕(eax + 32 1 32 )[5,5]⊕ (eax + 32 1 32 )[6,6]⊕(eax + 32 1 32 )[7,7] Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

Related Work Manual partial specifications Taly/Godefroid [PLDI’12] CompCert [CACM’09], BAP [CAV’11], BitBlaze [ICISS’08], Codesurfer/x86 [ETAPS’05], McVeto [CAV’10], STOKE [ASPLOS’13], Jakstab [CAV’08], many others Taly/Godefroid [PLDI’12] Automatically synthesize specification from templates Only 534 instructions

Automatically Learn a Specification for the x86-64 ISA Bit-vector formulas of input-output behavior

Strategy: Split Instruction Set All instructions Base set Specify manually Remaining Instructions Learn specification automatically

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 How do we synthesize programs? Formal guarantee? 𝑖≡𝜙 1 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 How do we synthesize programs? Randomized search Guided by cost function Based on test-cases Using STOKE [ASPLOS’13] 1

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑖≡𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Candidate formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑖≡𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Candidate formula 𝜙 Candidate formula 𝜙′ Program 𝑝′ … Candidate formula 𝜙′′ yes ✔ increase confidence 𝜙 ? 𝜙′ no Add counter example, remove wrong program(s)

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence Equivalence class 1 Equivalence class 2

Picking a Program Prefer programs whose formulas are Equivalence class 1 Equivalence class 2 Equivalence class 3 Prefer programs whose formulas are Precise (fewest uninterpreted functions) Fast (fewest non-linear arithmetic operations) Simple (fewest nodes)

Picking a Program Prefer programs whose formulas are Equivalence class 1 Equivalence class 2 Equivalence class 3 Prefer programs whose formulas are Precise (fewest uninterpreted functions) Fast (fewest non-linear arithmetic operations) Simple (fewest nodes)

Problem: Synthesis Limitations synthesize

Solution: Stratified Search

Generalizing Formulas Learn addw %ax, %dx dx←dx + 16 ax Rename addw %cx, %bx bx←bx + 16 cx ✔ addw (%rsp), %dx dx←dx + 16 M rsp ✔ addw $0x5, %dx dx←dx + 16 5 16 ✔

Generalization Summary Learn formula for register-only instructions Generalize formulas To other types of operands Check on test inputs

What if Generalization Impossible? shufps $0xb3, %xmm0, %xmm1 Problem: No corresponding register-only variant Solution: Brute force a formula for every constant

Experiment Base set (51 instructions) Integer, bitwise and float operations Data movement (including conditional move) Conversion operations Pseudo instructions (11 templates) Split and combine registers Changing status flags

Goal Total instructions 3,684 Out-of-scope Goal instructions 2,918 System instructions 302 Crypto instructions 35 Deprecated instructions 332 String instructions 97 Goal instructions 2,918 invpcid, jle aeskeygenassist fadd scasq

Results Base set 51 Pseudo instructions 11 Register-only instructions learned 692 Generalized 984 8-bit constant instructions learned 119.42 Total formulas learned 1,795.42

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 fadd 𝑎,𝑏 =fadd 𝑏,𝑎 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4 Semantically different 50 Handwritten formula correct Learned formula correct 50

Evaluation: Is Stratification Necessary? Stratum 0 Stratum 1 Stratum 2 Stratum 3 base set stratum 𝑖 = 0 if 𝑖∈baseset 1+ max 𝑖 ′ ∈𝑀(𝑖) stratum i ′ otherwise

Evaluation: Is Stratification Necessary? stratum 𝑖 = 0 if 𝑖∈baseset 1+ max 𝑖 ′ ∈𝑀(𝑖) stratum i ′ otherwise

Evaluation: Is Stratification Necessary?

Evaluation: Size of Learned Formulas Fully inlined: 3526 instructions number of nodes in learned formula number of nodes in handwritten formula

Conclusions Automatically learned 1,795 formulas Stratification key to scale program synthesis Compare to hand-written specification More correct, equally precise, same size Source code, formulas, experimental results https://github.com/StanfordPL/strata/

Backup Slides

Limitations Missing base instructions Program synthesis limits Some integer and floating point operations are missing Program synthesis limits Shortest known program is long and outside of reach e.g., byte-vectorized operation Cost function limitation For one bit of output, the cost function does not give enough signal Crazy instructions

SMT solver usage (Z3) Total decisions 7,075 Equivalent 6,669 (94.26%) New equivalence class 356 (5.03%) Counter-examples 50 (0.71%) Timeouts (45 seconds): 3

Experiment Details Intel Xeon E5-2697 (28 cores) at 2.6 GHz 268.86 hours (register-only) 159.12 hours (8-bit constants) Total of 11,983.37 core hours

Test Cases Random inputs (random machine state) “Interesting” bit-patterns 0, 1, −1, 2 𝑛 , NaN, Infinity Test cases learned from counter-examples

Evaluation: Simplifcation Formulas are simplified Constant propagation Move bit-selection over concatenation 2 64 ∗ 64 4 64 ≡ 8 64 0 64 ∘rax 63,0 ≡ rax

Evaluation: Formula Precision Formula precision (number of uninterpreted functions) Learned formulas equally precise in all but 4 cases Formula quality (number of non-linear operations) Learned formulas contain same number of non-linear operations, except for 11 cases