Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken

Slides:

Advertisements

Similar presentations

1/1/ / faculty of Electrical Engineering eindhoven university of technology Speeding it up Part 3: Out-Of-Order and SuperScalar execution dr.ir. A.C. Verschueren.

Advertisements

Introducing Formal Methods, Module 1, Version 1.1, Oct., Formal Specification and Analytical Verification L 5.

Machine/Assembler Language Putting It All Together Noah Mendelsohn Tufts University Web:

There are two types of addressing schemes:

Timed Automata.

Linear Obfuscation to Combat Symbolic Execution Zhi Wang 1, Jiang Ming 2, Chunfu Jia 1 and Debin Gao 3 1 Nankai University 2 Pennsylvania State University.

CS2422 Assembly Language & System Programming September 19, 2006.

The CPU Revision Typical machine code instructions Using op-codes and operands Symbolic addressing. Conditional and unconditional branches.

1 HOIST: A System for Automatically Deriving Static Analyzers for Embedded Systems John Regehr Alastair Reid School of Computing, University of Utah.

1 ICS 51 Introductory Computer Organization Fall 2006 updated: Oct. 2, 2006.

Riyadh Philanthropic Society For Science Prince Sultan College For Woman Dept. of Computer & Information Sciences CS 251 Introduction to Computer Organization.

DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.

1 ICS 51 Introductory Computer Organization Fall 2009.

1 Carnegie Mellon Assembly and Bomb Lab : Introduction to Computer Systems Recitation 4, Sept. 17, 2012.

Superoptimization Venkatesh Karthik Srinivasan Guest Lecture in CS 701, Nov. 10, 2015.

1 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Carnegie Mellon Machine-Level Programming II: Control Carnegie Mellon.

Computer and Information Sciences College / Computer Science Department CS 206 D Computer Organization and Assembly Language.

Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken.

Conditionally Correct Superoptimization Rahul Sharma, Eric Schkufza, Berkeley Churchill, Alex Aiken (Stanford University)

Spring 2016Assembly Review Roadmap 1 car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Car c = new Car(); c.setMiles(100);

CMPT 438 Algorithms.

Instruction Set Architecture

Machine-Level Programming I: Basics

Homework Reading Lab with your assigned section starts next week

Credits and Disclaimers

ECE 353 Lab 3 Pipeline Simulator

Part of the Assembler Language Programmers Toolbox

Control Unit Lecture 6.

Y86-64 Instructions 8-byte integer operations (no floating point!)  “word” (it is considered a long word for historical reasons, because the original.

A Closer Look at Instruction Set Architectures: Expanding Opcodes

Homework Reading Continue work on mp1

CS-401 Assembly Language Programming

Symbolic Instruction and Addressing

Machine-Level Programming II: Control Flow

Introduction to Assembly Language

x86-64 Programming II CSE 351 Winter 2018

Processor Architecture: The Y86-64 Instruction Set Architecture

Georg Hofferek, Ashutosh Gupta, Bettina Könighofer, Jie-Hong Roland Jiang and Roderick Bloem Synthesizing Multiple Boolean Functions using Interpolation.

LPSAT: A Unified Approach to RTL Satisfiability

Objective of This Course

Encodings and reducibility

Morgan Kaufmann Publishers Computer Organization and Assembly Language

Symbolic Instruction and Addressing

Processor Architecture: The Y86-64 Instruction Set Architecture

Programming in JavaScript

Shift & Rotate Instructions)

COMP 1321 Digital Infrastructure

COMP 1321 Digital Infrastructure

Symbolic Instruction and Addressing

Shift & Rotate Instructions)

Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.

Computer Architecture CST 250

Microprocessor and Assembly Language

Boolean Expressions to Make Comparisons

Lecture 6 Architecture Algorithm Defin ition. Algorithm 1stDefinition: Sequence of steps that can be taken to solve a problem 2ndDefinition: The step.

ECE 352 Digital System Fundamentals

IEEE Floating Point Adder Verification

Recording Synthesis History for Sequential Verification

Reseeding-based Test Set Embedding with Reduced Test Sequences

Chapter 6 –Symbolic Instruction and Addressing

Other Processors Having learnt MIPS, we can learn other major processors. Not going to be able to cover everything; will pick on the interesting aspects.

Carnegie Mellon Ithaca College

Carnegie Mellon Ithaca College

Automatic Abstraction of Microprocessors for Verification

CS201- Lecture 8 IA32 Flow Control

Getting Started Download the tarball for this session. It will include the following files: driver 64-bit executable driver.c C driver source bomb.h declaration.

Credits and Disclaimers

Computer Architecture Assembly Language

Software Development Techniques

Part I Data Representation and 8086 Microprocessors

Presentation transcript:

Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken Stratified Synthesis: Automatically Learning the x86-64 Instruction Set Stefan Heule, Eric Schkufza, Rahul Sharma, Alex Aiken PLDI, Santa Barbara, June 16, 2016

Automatically Reason about Programs Motivation Symbolic Execution Automatically Reason about Programs Program Verification 𝜙 ≡ I would like to start with some motivation. A lot of work in our community has been about building tools to automatically reason about programs. Symbolic execution has been used to explore paths through programs, for example to find inputs that cause some bad behavior (or prove that no such inputs exist). In program verification, we try to prove a program equivalent to some formal specification. Or similarly, we might want to prove the equivalence between two programs, maybe because one is much faster while we have more confidence in the other. And there are many more. Program Equivalence ≡ …

Automatically reasoning about programs requires Formal Semantics All of these example have one thing in common: they require a formal semantics of the programming language to allow tools to automatically reason about programs.

x86-64 testq %rdi, %rdi je .L1 xorq %rax, %rax .L0: movq %rdi, %rdx andq $0x1, %rdx addq %rdx, %rax shrq $0x1, %rdi jne .L0 cltq retq .L1: In our setting, we are interested in x86-64, the most widely used instruction set on desktops and servers. So, for us, programs look something like this. To be able to reason about such programs automatically, we need a formal specification of all instructions. The advantage of dealing with x86 directly is that our tools can work with almost any binary, regardless of its origin or source language.

64-bit bit-vector addition x86-64 Semantics 64-bit bit-vector addition addq $0x1, %rax rax←rax + 64 1 64 64-bit constant previous value of rax Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 rax 64 bits Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right. eax 32 bits ax 16 bits ah al 8 bits 8 bits

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al al ←al + 8 1 8 rax←rax 63:8 ∘ rax 7:0 + 8 1 8 rax 64 bits Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right. eax 32 bits ax 16 bits ah al 8 bits 8 bits

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax←rax[63:32] ∘(rax[31:0] + 32 1 32 ) Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax← 0 32 ∘(rax[31:0] + 32 1 32 ) Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

x86-64 Semantics addq $0x1, %rax rax←rax + 64 1 64 addb $0x1, %al addw $0x1, %ax rax←rax 63:16 ∘ rax 15:0 + 16 1 16 addl $0x1, %eax rax← 0 32 ∘(rax[31:0] + 32 1 32 ) zf← 0 32 =(eax + 32 1 32 ) cf← 0 1 ∘eax + 33 1 33 [32,32] sf← eax + 32 1 32 [31,31] of←¬eax 31,31 ∧(eax + 32 1 32 )[31,31] pf←(eax + 32 1 32 )[0,0]⊕(eax + 32 1 32 )[1,1]⊕ (eax + 32 1 32 )[2,2]⊕(eax + 32 1 32 )[3,3]⊕ (eax + 32 1 32 )[4,4]⊕(eax + 32 1 32 )[5,5]⊕ (eax + 32 1 32 )[6,6]⊕(eax + 32 1 32 )[7,7] Ok, that’s not so bad, so the specification we are looking for is this. Rax is updated with rax plus 1. Here, the plus is addition over 64-bit wide bit vectors, and 1 is a 64bit constant. We can easily also get a specification for related instructions. For 8 bit, 16 and 32. Actually, it turns out x86 is weird, and for 32 bits the semantics are slightly different. The upper 32 bits get zeroed out. Here, the little circle is concatenation. Also, all of these instructions also set all 5 status flags, so really the specification looks like this. Not really something we necessarily want to write by hand, or can easily get right.

Related Work Manual partial specifications Taly/Godefroid [PLDI’12] CompCert [CACM’09], BAP [CAV’11], BitBlaze [ICISS’08], Codesurfer/x86 [ETAPS’05], McVeto [CAV’10], STOKE [ASPLOS’13], Jakstab [CAV’08], many others Taly/Godefroid [PLDI’12] Automatically synthesize specification from templates Only 534 instructions

Automatically Learn a Specification for the x86-64 ISA Bit-vector formulas of input-output behavior

Strategy: Split Instruction Set All instructions Base set Specify manually Remaining Instructions Learn specification automatically

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 How do we synthesize programs? Formal guarantee? 𝑖≡𝜙 1 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 How do we synthesize programs? Randomized search Guided by cost function Based on test-cases Using STOKE [ASPLOS’13] 1

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑖≡𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Candidate formula 𝜙 Formal guarantee? 𝑖≡𝜙 𝑖≡𝑝≡𝜙 2

Strategy: Using Program Synthesis combine base formulas Program 𝑝 Instruction 𝑖 synthesize Candidate formula 𝜙 Candidate formula 𝜙′ Program 𝑝′ … Candidate formula 𝜙′′ yes ✔ increase confidence 𝜙 ? 𝜙′ no Add counter example, remove wrong program(s)

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence

Solver Imprecision Increase confidence yes no Remove incorrect program(s) 𝜙 ? 𝜙′ unknown timeout No information about equivalence Equivalence class 1 Equivalence class 2

Picking a Program Prefer programs whose formulas are Equivalence class 1 Equivalence class 2 Equivalence class 3 Prefer programs whose formulas are Precise (fewest uninterpreted functions) Fast (fewest non-linear arithmetic operations) Simple (fewest nodes)

Picking a Program Prefer programs whose formulas are Equivalence class 1 Equivalence class 2 Equivalence class 3 Prefer programs whose formulas are Precise (fewest uninterpreted functions) Fast (fewest non-linear arithmetic operations) Simple (fewest nodes)

Problem: Synthesis Limitations synthesize

Solution: Stratified Search

Generalizing Formulas Learn addw %ax, %dx dx←dx + 16 ax Rename addw %cx, %bx bx←bx + 16 cx ✔ addw (%rsp), %dx dx←dx + 16 M rsp ✔ addw $0x5, %dx dx←dx + 16 5 16 ✔

Generalization Summary Learn formula for register-only instructions Generalize formulas To other types of operands Check on test inputs

What if Generalization Impossible? shufps $0xb3, %xmm0, %xmm1 Problem: No corresponding register-only variant Solution: Brute force a formula for every constant

Experiment Base set (51 instructions) Integer, bitwise and float operations Data movement (including conditional move) Conversion operations Pseudo instructions (11 templates) Split and combine registers Changing status flags

Goal Total instructions 3,684 Out-of-scope Goal instructions 2,918 System instructions 302 Crypto instructions 35 Deprecated instructions 332 String instructions 97 Goal instructions 2,918 invpcid, jle aeskeygenassist fadd scasq

Results Base set 51 Pseudo instructions 11 Register-only instructions learned 692 Generalized 984 8-bit constant instructions learned 119.42 Total formulas learned 1,795.42

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 fadd 𝑎,𝑏 =fadd 𝑏,𝑎 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4

Evaluation: Are the Formulas Correct? Compare with handwritten formulas (from STOKE) Available for comparison 1,431.91 Automatically proven equivalent 1,377.91 Equivalent with additional lemma 4 Semantically different 50 Handwritten formula correct Learned formula correct 50

Evaluation: Is Stratification Necessary? Stratum 0 Stratum 1 Stratum 2 Stratum 3 base set stratum 𝑖 = 0 if 𝑖∈baseset 1+ max 𝑖 ′ ∈𝑀(𝑖) stratum i ′ otherwise

Evaluation: Is Stratification Necessary? stratum 𝑖 = 0 if 𝑖∈baseset 1+ max 𝑖 ′ ∈𝑀(𝑖) stratum i ′ otherwise

Evaluation: Is Stratification Necessary?

Evaluation: Size of Learned Formulas Fully inlined: 3526 instructions number of nodes in learned formula number of nodes in handwritten formula

Conclusions Automatically learned 1,795 formulas Stratification key to scale program synthesis Compare to hand-written specification More correct, equally precise, same size Source code, formulas, experimental results https://github.com/StanfordPL/strata/

Backup Slides

Limitations Missing base instructions Program synthesis limits Some integer and floating point operations are missing Program synthesis limits Shortest known program is long and outside of reach e.g., byte-vectorized operation Cost function limitation For one bit of output, the cost function does not give enough signal Crazy instructions

SMT solver usage (Z3) Total decisions 7,075 Equivalent 6,669 (94.26%) New equivalence class 356 (5.03%) Counter-examples 50 (0.71%) Timeouts (45 seconds): 3

Experiment Details Intel Xeon E5-2697 (28 cores) at 2.6 GHz 268.86 hours (register-only) 159.12 hours (8-bit constants) Total of 11,983.37 core hours

Test Cases Random inputs (random machine state) “Interesting” bit-patterns 0, 1, −1, 2 𝑛 , NaN, Infinity Test cases learned from counter-examples

Evaluation: Simplifcation Formulas are simplified Constant propagation Move bit-selection over concatenation 2 64 ∗ 64 4 64 ≡ 8 64 0 64 ∘rax 63,0 ≡ rax

Evaluation: Formula Precision Formula precision (number of uninterpreted functions) Learned formulas equally precise in all but 4 cases Formula quality (number of non-linear operations) Learned formulas contain same number of non-linear operations, except for 11 cases