Reverse-Engineering Instruction Encodings Wilson Hsieh, University of Utah Dawson Engler, Stanford University Godmar Back, University of Utah.



Reverse-Engineering Instruction Encodings, USENIX '01

What's the Problem?

• Dynamic code generation, JIT compilation
  - Emit instructions quickly
  - Therefore, avoid the assembler
• Need to know how to produce binary instructions
• Want to express instructions in assembly
  "Generate add %l1, %l2, %l1 for SPARC"

What Do We Do?

• How can I get the following mapping?
  assembly instruction → binary format
• That mapping exists in the assembler already!
  assembly instruction → assembler → binary instruction
• So let's reverse-engineer it out of the assembler.

DERIVE Tool Chain

[Figure: tool-chain diagram with boxes labeled instruction description, DERIVE, assembler, encoding description, code emitter generator, code emitter, JIT compiler, debugger, and disassembler.]

Instruction Descriptions

    /* SPARC fragment */
    iregs = ( %g0, %g1, %g2, ..., %i6, %i7 );

    and, andcc, andn, ... → &op& r_1:iregs, r_2:iregs, r_dest:iregs
                          | &op& r_1:iregs, imm, r_dest:iregs ;

    ba, bn, bne, … → &op& &label&
                   | &op&",a" &label& ;

Encoding Descriptions

    /* MIPS breakpoint instruction */
    { "break", "&op& imm",
      1,                        /* operand */
      4,                        /* bytes */
      ...
      { 0xd, 0x0, 0x0, 0x0, },  /* opcode information */
      {                         /* operand information */
        { "imm",                /* name */
          IMMED,                /* an immediate */
          IDENT,                /* encoded value = input value */
          0,                    /* lowest value */
          10,                   /* length */
          ...
          16,                   /* bit offset */
          I_UNSIGNED,           /* unsigned field */
          ...
        },
      }
    }

Code Emitters

    /* x86 addl instruction */
    #define E_addl_rr_1(_code, rf, rt) do {\
        register unsigned short _0 = (0xc001\
            | (((rf)) << 11)\
            | (((rt)) << 8));\
        *(unsigned short*)((char*) _code) = _0;\
        _code = (void *)((char *) _code + 2);\
    } while (0)

    /* emit "addl %ecx, %ebx" in code_buffer */
    E_addl_rr_1(code_buffer, REGecx, REGebx);
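As a cross-check on the macro above, here is a minimal function-style sketch that produces the same two bytes. The names emit_addl_rr, REG_ECX and REG_EBX are illustrative and are not DERIVE output; the register numbers (1 for %ecx, 3 for %ebx) are the standard x86 ones. Writing the bytes one at a time is a choice made for this sketch so that it does not depend on host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 register numbers (illustrative names, not DERIVE's). */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/* Emit "addl %rf, %rt": opcode byte 0x01 followed by a ModRM byte of
 * the form 11 rf rt. Returns the advanced code pointer. */
static uint8_t *emit_addl_rr(uint8_t *code, unsigned rf, unsigned rt)
{
    assert(rf < 8 && rt < 8);
    uint16_t insn = (uint16_t)(0xc001u | (rf << 11) | (rt << 8));
    code[0] = (uint8_t)(insn & 0xff);  /* opcode byte */
    code[1] = (uint8_t)(insn >> 8);    /* ModRM byte  */
    return code + 2;
}
```

Calling emit_addl_rr(buf, REG_ECX, REG_EBX) leaves the bytes 0x01 0xcb in buf, which is the usual encoding of "addl %ecx, %ebx".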

Instruction Model

• Opcode
• Registers (names)
  - Register sets
  - Cache prefetch hints on MIPS
  - Address scale on x86
• Immediates (integers)
  - Not registers
• Labels (jump targets)
  - Absolute jumps
  - Relative jumps

[Figure: a 32-bit instruction word, bits 0 to 31, divided into OPCODE, ARG1, ARG2, and ARG3 fields.]

Overall Strategy

• Solve for one field at a time
  - Hold other fields fixed and vary the desired field
  - Use randomization when necessary to find legal values
• Anything that is not in a field is the opcode
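The last rule can be sketched in a few lines of C: once the operand fields are known, the opcode is whatever remains of a sample encoding after those fields are cleared. The struct and function names are hypothetical, and the field positions used in the usage note are the standard SPARC ones supplied by hand, not values produced by DERIVE.

```c
#include <assert.h>
#include <stdint.h>

/* A solved operand field: `width` bits starting at bit `offset`. */
struct field { unsigned offset, width; };

/* Bit mask covering one field. */
static uint32_t field_mask(struct field f)
{
    assert(f.width >= 1 && f.offset + f.width <= 32);
    uint32_t ones = (f.width == 32) ? 0xffffffffu : ((1u << f.width) - 1u);
    return ones << f.offset;
}

/* "Anything that is not in a field is the opcode": clear every solved
 * field in a sample encoding and keep what is left. */
static uint32_t opcode_bits(uint32_t sample, const struct field *fields, int n)
{
    uint32_t covered = 0;
    for (int i = 0; i < n; i++)
        covered |= field_mask(fields[i]);
    return sample & ~covered;
}
```

For the SPARC instruction "and %g7, %g6, %g0" (0x8009c006), with 5-bit register fields at bits 25, 14 and 0, opcode_bits returns 0x80080000.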

Intuition Behind DERIVE

Assembly instruction    Binary encoding
and %g7, %g6, %g0       0x8009 0xc006
and %g7, %g6, %g1       0x8209 0xc006
and %g7, %g6, %g2       0x8409 0xc006
and %g7, %g6, %g3       0x8609 0xc006
and %g7, %g6, %g4       0x8809 0xc006
and %g7, %g6, %g5       0x8a09 0xc006
and %g7, %g6, %g6       0x8c09 0xc006
and %g7, %g6, %g7       0x8e09 0xc006
and %g7, %g6, %o0       0x9009 0xc006
and %g7, %g6, %o1       0x9209 0xc006
and %g7, %g6, %o2       0x9409 0xc006
and %g7, %g6, %o3       0x9609 0xc006
and %g7, %g6, %o4       0x9809 0xc006
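The pattern in the listing above can be checked mechanically: XOR each encoding against the first and OR the results together to see which bits move when only the destination register changes. This is a minimal sketch, not DERIVE's code. The helper names are made up, and each pair of halfwords is assumed to join into one 32-bit word with the first halfword as the high half.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that differ among encodings that vary only one operand. */
static uint32_t varying_bits(const uint32_t *enc, int n)
{
    assert(n >= 1);
    uint32_t mask = 0;
    for (int i = 1; i < n; i++)
        mask |= enc[0] ^ enc[i];
    return mask;
}

/* Index of the lowest set bit (mask must be nonzero). */
static unsigned lowest_bit(uint32_t mask)
{
    assert(mask != 0);
    unsigned bit = 0;
    while (((mask >> bit) & 1u) == 0)
        bit++;
    return bit;
}

/* The thirteen encodings from the listing, %g0 through %o4. */
static const uint32_t and_rd_encodings[13] = {
    0x8009c006, 0x8209c006, 0x8409c006, 0x8609c006, 0x8809c006,
    0x8a09c006, 0x8c09c006, 0x8e09c006, 0x9009c006, 0x9209c006,
    0x9409c006, 0x9609c006, 0x9809c006,
};
```

On these thirteen samples the varying bits are 0x1e000000, so the field starts at bit 25. Only four bits show up because the listing stops at %o4; enumerating all 32 registers would expose the fifth.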

DERIVE Structure

Field Type                Solver
register fields           register solver
absolute jump targets     immediate solver
immediate fields          immediate solver
relative jump targets     jump solver

Register Solver

• Primary assumptions (for purposes of the talk):
  - Register fields are independent
  - All register values are legal
• Enumerate registers for one field at a time
  - Hold other fields constant
  - Solve each field separately
• Example: 3 register fields, 5 bits per field
  - 2^5 * 3 = 32 * 3 = 96 combinations
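The enumeration can be sketched as a pair of nested loops. The real tool runs the system assembler; here mock_assemble is a stand-in that hard-codes the SPARC layout of "and" so that the sketch is self-contained. All names are hypothetical. The run counter shows where the figure of 96 comes from; enumerating the three fields jointly would take 32^3 = 32768 runs instead.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FIELDS 3
#define NUM_REGS   32

static int assembler_runs = 0;  /* counts probes, for illustration */

/* Stand-in for running the real assembler on "and %rs1, %rs2, %rd".
 * Uses the SPARC layout: rd at bit 25, rs1 at bit 14, rs2 at bit 0. */
static uint32_t mock_assemble(const unsigned regs[NUM_FIELDS])
{
    assert(regs[0] < NUM_REGS && regs[1] < NUM_REGS && regs[2] < NUM_REGS);
    assembler_runs++;
    return 0x80080000u | (regs[2] << 25) | (regs[0] << 14) | regs[1];
}

/* Solve each register field separately: vary one operand over all
 * registers, hold the others at register 0, and OR together the bits
 * that change. masks[f] receives the bit mask of operand f. */
static void solve_register_fields(uint32_t masks[NUM_FIELDS])
{
    for (int f = 0; f < NUM_FIELDS; f++) {
        unsigned regs[NUM_FIELDS] = {0, 0, 0};
        uint32_t base = 0;
        masks[f] = 0;
        for (unsigned r = 0; r < NUM_REGS; r++) {
            regs[f] = r;
            uint32_t enc = mock_assemble(regs);
            if (r == 0)
                base = enc;   /* reference encoding for this field */
            masks[f] |= enc ^ base;
        }
    }
}
```

After solve_register_fields returns, assembler_runs is 96 and the three masks are 0x0007c000, 0x0000001f and 0x3e000000.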

Immediate Solver

• Primary assumptions:
  - Immediate field is a single range of bits in the instruction
• Explore each bit size to find the encoding of one field
  - Values of 1, 2, 4, 8, 16, ...
  - Again, hold other fields constant
• Example: 10-bit immediate field
  - 10 combinations
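A minimal sketch of the power-of-two probing, under stated assumptions: mock_assemble_break stands in for the real assembler and models the MIPS "break" encoding from the earlier slide (opcode 0xd, a 10-bit unsigned immediate at bit 16), with the four opcode bytes read as a little-endian word. The function and struct names are hypothetical. The sketch also assembles the value 0 once as a reference, which the count of 10 does not include.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the assembler on MIPS "break imm".
 * Returns 0 on success, -1 if imm does not fit and is rejected. */
static int mock_assemble_break(uint32_t imm, uint32_t *out)
{
    assert(out != 0);
    if (imm > 0x3ffu)
        return -1;              /* out of range */
    *out = 0xdu | (imm << 16);
    return 0;
}

struct imm_field { unsigned offset, width; int probes; };

/* Probe 1, 2, 4, 8, ... until the assembler rejects the value. Each
 * accepted probe sets exactly one bit of the field. */
static struct imm_field solve_immediate(void)
{
    struct imm_field f = {0, 0, 0};
    uint32_t base = 0, enc = 0;
    int ok = mock_assemble_break(0, &base);   /* reference encoding */
    assert(ok == 0);
    (void)ok;
    for (unsigned k = 0; k < 32; k++) {
        if (mock_assemble_break(1u << k, &enc) != 0)
            break;
        f.probes++;
        uint32_t diff = enc ^ base;
        assert(diff != 0);
        unsigned bit = 0;
        while (((diff >> bit) & 1u) == 0)
            bit++;
        if (k == 0)
            f.offset = bit;     /* where the value 1 lands */
        f.width = bit - f.offset + 1;
    }
    return f;
}
```

With this mock assembler the solver reports an offset of 16 and a width of 10 after 10 accepted probes, matching the slide's encoding description.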

Jump Solver

• Primary assumptions:
  - Label field is a single range of bits
• Emit jumps to different offsets
  - Find where label goes for encoding of "0"
  - Find smallest jump size
  - Find high bit by emitting a negative-valued jump
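The three steps can be sketched against a mock assembler. Everything here is an assumption made for illustration: mock_assemble_branch models a MIPS-style relative branch whose 16-bit field holds (delta - 4) / 4, and the scan range of -64 to 64 bytes is arbitrary. The talk does not describe DERIVE's actual search, so this is one plausible reading of the bullets rather than the tool's algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 4, so negative offsets round toward minus infinity. */
static int32_t floor_div4(int32_t x)
{
    return (x >= 0) ? x / 4 : -((-x + 3) / 4);
}

/* Stand-in for assembling a relative branch to (pc + delta) bytes.
 * Hypothetical layout: 16-bit field at bit 0 holding (delta - 4) / 4,
 * so the two low bits of the byte offset are dropped. */
static uint32_t mock_assemble_branch(int32_t delta)
{
    assert(delta >= -1024 && delta <= 1024);
    return 0x10000000u | ((uint32_t)floor_div4(delta - 4) & 0xffffu);
}

struct jump_field {
    uint32_t mask;      /* bits occupied by the label field          */
    int32_t  zero_at;   /* smallest delta whose field encodes 0      */
    int32_t  step;      /* smallest jump size that changes the field */
    unsigned high_bit;  /* top bit of the field                      */
};

static struct jump_field solve_jump(void)
{
    struct jump_field f = {0, 0, 0, 0};
    uint32_t base = mock_assemble_branch(0);

    /* Forward and backward jumps together toggle every field bit;
     * negative-valued jumps are what set the high bits. */
    for (int32_t d = -64; d <= 64; d++)
        f.mask |= mock_assemble_branch(d) ^ base;

    /* Find where the label must go for the field to encode 0. */
    for (int32_t d = -64; d <= 64; d++) {
        if ((mock_assemble_branch(d) & f.mask) == 0) {
            f.zero_at = d;
            break;
        }
    }

    /* Smallest jump size: first step that changes the encoding. */
    for (f.step = 1; f.step <= 64; f.step++)
        if (mock_assemble_branch(f.zero_at + f.step) !=
            mock_assemble_branch(f.zero_at))
            break;

    for (unsigned b = 0; b < 32; b++)
        if ((f.mask >> b) & 1u)
            f.high_bit = b;
    return f;
}
```

For this mock layout the solver finds a field mask of 0x0000ffff, a zero point at delta 4, a step of 4 bytes, and bit 15 as the high bit.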

Solving Time

Processor    Run Time (minutes)    Description (lines)
Alpha
ARM          ~
MIPS         2.5                   81
PowerPC
SPARC        4.8                   97
x86          ~
x86-kaffe    4.9                   106

Instruction Emitter Generator

• Reads in DERIVE-generated specifications
• Produces C macros
  - Can generate runtime checks
  - Debugging support
  - Handles multiple instruction encodings
  - "Linkage" macros for backpatching
• Used to retarget Kaffe (publicly available JVM) on x86
  - Reduced backend description from 2084 to 1267 lines (about 40% smaller)

Extensions

• Can handle instructions that take a subset of registers
  - SPARC double-word loads
• Special encodings that are register-dependent
  - %eax on x86
• Can handle simple transformations
  - Low bits dropped off of jump offsets
• User can specify transformations
  - Address scaling on x86
• User can specify registers that are dependent
  - PowerPC post-increment instructions

Future Work

• Extending DERIVE
  - Fields that are broken up into multiple bit ranges
  - Memoization of computations
• ATOM-like tools
  - Reverse-engineering linkers

Related Work

• Instruction encoding munging
  - NJ Toolkit [Ramsey & Fernández, USENIX 1995]
• Testing assemblers
  - NJ Toolkit [Fernández and Ramsey, ICSE 1997]
• Reverse engineering compiler technology
  - Retarget back-end generators [Collberg, PLDI 1997]

Summary

• DERIVE is a cool hack, but it isn't just a hack.
  - It is a useful tool.
  - It is a good proof of concept.
  - We did some clever tricks to build it.