Reverse-Engineering Instruction Encodings Wilson Hsieh, University of Utah Dawson Engler, Stanford University Godmar Back, University of Utah.


1 Reverse-Engineering Instruction Encodings
Wilson Hsieh, University of Utah
Dawson Engler, Stanford University
Godmar Back, University of Utah

2 Reverse-Engineering Instruction Encodings (USENIX '01): What’s the Problem?
• Dynamic code generation, JIT compilation
  – Emit instructions quickly
  – Therefore, avoid the assembler
• Need to know how to produce binary instructions
• Want to express instructions in assembly: "Generate add %l1, %l2, %l1 for SPARC"

3 What Do We Do?
• How can I get the following mapping: assembly instruction → binary format
• That mapping exists in the assembler already!
  [Diagram: assembly instruction → assembler → binary instruction]
• So let’s reverse-engineer it out of the assembler.

4 DERIVE Tool Chain
[Diagram: instruction description → DERIVE (which drives the assembler) → encoding description → code emitter generator → code emitter → JIT compiler. Other consumers shown: debugger, disassembler. Highlighted stage: instruction description.]

5 Instruction Descriptions
    /* SPARC fragment */
    iregs = ( %g0, %g1, %g2, ..., %i6, %i7 );
    and, andcc, andn, ... → &op& r_1:iregs, r_2:iregs, r_dest:iregs
                          | &op& r_1:iregs, imm, r_dest:iregs ;
    ba, bn, bne, ... → &op& &label&
                     | &op&",a" &label& ;

6 DERIVE Tool Chain
[Diagram repeated from slide 4.]

7 Encoding Descriptions
    /* MIPS breakpoint instruction */
    { "break", "&op& imm",
      1, /* operand */
      4, /* bytes */
      ...
      { 0xd, 0x0, 0x0, 0x0, }, /* opcode information */
      { /* operand information */
        { "imm",      /* name */
          IMMED,      /* an immediate */
          IDENT,      /* encoded value = input value */
          0,          /* lowest value */
          10,         /* length */
          ...
          16,         /* bit offset */
          I_UNSIGNED, /* unsigned field */
          ...
        },
      }
    }
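
As an illustration of how an emitter might consume such a description, here is a minimal sketch. The names `operand_desc`, `encode_operand`, and `encode_break` are hypothetical and are not DERIVE's actual types; only the field values (bit offset 16, length 10, opcode bytes 0xd, 0, 0, 0) come from the slide, and reading those bytes as the word 0x0000000d is an assumption about byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified operand description (not DERIVE's real type). */
struct operand_desc {
    unsigned offset;  /* bit offset of the field within the word */
    unsigned length;  /* field width in bits */
};

/* Place an unsigned immediate into its field on top of the opcode bits. */
static uint32_t encode_operand(uint32_t opcode_bits,
                               struct operand_desc d, uint32_t value)
{
    uint32_t max = (1u << d.length) - 1u;   /* assumes length < 32 */
    assert(value <= max);                   /* runtime range check */
    return opcode_bits | (value << d.offset);
}

/* Values from the slide: "break imm" with a 10-bit field at bit 16.
   Treating the opcode bytes {0xd,0,0,0} as 0x0000000d is an assumption. */
static uint32_t encode_break(uint32_t imm)
{
    struct operand_desc imm_field = { 16, 10 };
    return encode_operand(0x0000000du, imm_field, imm);
}
```

Under these assumptions, `encode_break(5)` yields the word 0x0005000d.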

8 DERIVE Tool Chain
[Diagram repeated from slide 4.]

9 Code Emitters
    /* x86 addl instruction */
    #define E_addl_rr_1(_code, rf, rt) do {\
        register unsigned short _0 = (0xc001\
            | ((rf) << 11)\
            | ((rt) << 8));\
        *(unsigned short*)((char*) _code) = _0;\
        _code = (void *)((char *) _code + 2);\
    } while (0)

    /* emit "addl %ecx, %ebx" in code_buffer */
    E_addl_rr_1(code_buffer, REGecx, REGebx);
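
The macro on this slide stores a 16-bit value directly, which relies on a little-endian host that tolerates unaligned stores. The following is a hedged sketch of the same encoding written as a function with byte stores and an operand check. The function name `emit_addl_rr` and the enum are illustrative, not generated by DERIVE; the register numbers are the standard x86 encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Standard x86 register numbers for the first four 32-bit registers. */
enum { REGeax = 0, REGecx = 1, REGedx = 2, REGebx = 3 };

/* Emit "addl rf, rt" (opcode 0x01, ModRM = 11 rf rt) and return the
   advanced code pointer. Byte stores avoid unaligned 16-bit writes. */
static uint8_t *emit_addl_rr(uint8_t *code, unsigned rf, unsigned rt)
{
    assert(rf < 8 && rt < 8);              /* runtime operand check */
    uint16_t insn = (uint16_t)(0xc001u | (rf << 11) | (rt << 8));
    code[0] = (uint8_t)(insn & 0xffu);     /* low byte first: opcode */
    code[1] = (uint8_t)(insn >> 8);        /* high byte: ModRM */
    return code + 2;
}
```

Calling `emit_addl_rr(buf, REGecx, REGebx)` writes the bytes 0x01 0xcb, the same value the macro computes (0xcb01 as a little-endian short).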

10 Instruction Model
• Opcode
• Registers (names)
  – Register sets
  – Cache prefetch hints on MIPS
  – Address scale on x86
• Immediates (integers)
  – Not registers
• Labels (jump targets)
  – Absolute jumps
  – Relative jumps
[Diagram: a 32-bit instruction word, bits 0 to 31, split into OPCODE, ARG1, ARG2, and ARG3 fields]

11 Overall Strategy
• Solve for one field at a time
  – Hold other fields fixed and vary the desired field
  – Use randomization when necessary to find legal values
• Anything that is not in a field is the opcode
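
The last bullet can be sketched in a few lines: once every field mask is known, the opcode is whatever remains of a sample encoding. The function name `opcode_bits` is hypothetical. The example masks in the note below are the standard SPARC positions of rd, rs1, and rs2, supplied here as an assumption rather than as DERIVE output.

```c
#include <assert.h>
#include <stdint.h>

/* Given the masks of all solved fields, treat the remaining bits of a
   sample encoding as the opcode. */
static uint32_t opcode_bits(uint32_t sample, const uint32_t *field_masks, int n)
{
    assert(n >= 0);
    uint32_t fields = 0;
    for (int i = 0; i < n; i++)
        fields |= field_masks[i];      /* union of all operand fields */
    return sample & ~fields;           /* everything else is opcode */
}
```

With the masks 0x3e000000, 0x0007c000, and 0x0000001f and the sample 0x8009c006 (the encoding of and %g7, %g6, %g0), the result is 0x80080000.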

12 Intuition Behind DERIVE
Assembly instruction      Binary encoding
and %g7, %g6, %g0         0x8009 0xc006
and %g7, %g6, %g1         0x8209 0xc006
and %g7, %g6, %g2         0x8409 0xc006
and %g7, %g6, %g3         0x8609 0xc006
and %g7, %g6, %g4         0x8809 0xc006
and %g7, %g6, %g5         0x8a09 0xc006
and %g7, %g6, %g6         0x8c09 0xc006
and %g7, %g6, %g7         0x8e09 0xc006
and %g7, %g6, %o0         0x9009 0xc006
and %g7, %g6, %o1         0x9209 0xc006
and %g7, %g6, %o2         0x9409 0xc006
and %g7, %g6, %o3         0x9609 0xc006
and %g7, %g6, %o4         0x9809 0xc006
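
The pattern in this table can be checked mechanically. The sketch below takes the thirteen encodings listed and ORs together their differences from the first one; the bits that change mark the destination-register field. The helper names `varying_bits` and `lowest_set_bit` are hypothetical, and this is an illustration of the intuition, not DERIVE's actual solver.

```c
#include <assert.h>
#include <stdint.h>

/* Encodings of "and %g7, %g6, rd" for rd = %g0 .. %o4, from the table. */
static const uint32_t and_rd_samples[13] = {
    0x8009c006u, 0x8209c006u, 0x8409c006u, 0x8609c006u, 0x8809c006u,
    0x8a09c006u, 0x8c09c006u, 0x8e09c006u, 0x9009c006u, 0x9209c006u,
    0x9409c006u, 0x9609c006u, 0x9809c006u
};

/* Every bit that differs from the first sample belongs to the varied field. */
static uint32_t varying_bits(const uint32_t *enc, int n)
{
    assert(n > 0);
    uint32_t mask = 0;
    for (int i = 1; i < n; i++)
        mask |= enc[i] ^ enc[0];
    return mask;
}

/* Position of the lowest set bit (returns 0 for an empty mask). */
static int lowest_set_bit(uint32_t mask)
{
    int pos = 0;
    while (mask != 0 && (mask & 1u) == 0) {
        mask >>= 1;
        pos++;
    }
    return pos;
}
```

On these thirteen samples the mask comes out as 0x1e000000, starting at bit 25. Only four bits show up because the table stops at %o4 (register 12); enumerating all 32 registers would expose the fifth bit as well.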

13 DERIVE Structure
Field Type                Solver
register fields           register solver
absolute jump targets     immediate solver
immediate fields          immediate solver
relative jump targets     jump solver

14 Register Solver
• Primary assumptions (for purposes of the talk):
  – Register fields are independent
  – All register values are legal
• Enumerate registers for one field at a time
  – Hold other fields constant
  – Solve each field separately
• Example: 3 register fields, 5 bits per field
  – 2^5 * 3 = 32 * 3 = 96 combinations
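
A minimal sketch of this enumeration, under the slide's two assumptions. A mock function stands in for the real assembler: `mock_assemble_and` encodes the SPARC register form of `and` directly, whereas an actual run would invoke the system assembler and read back the bytes. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int assembler_calls = 0;   /* counts mock assembler invocations */

/* Mock assembler for SPARC "and rs1, rs2, rd" (register form). */
static uint32_t mock_assemble_and(unsigned rs1, unsigned rs2, unsigned rd)
{
    assembler_calls++;
    return 0x80080000u | (rd << 25) | (rs1 << 14) | rs2;
}

/* Solve one register field: vary operand `which` (0 = rs1, 1 = rs2,
   2 = rd) over all 32 registers while the other two stay at register 0. */
static uint32_t solve_register_field(int which)
{
    assert(which >= 0 && which < 3);
    unsigned ops[3] = { 0, 0, 0 };
    uint32_t base = 0, mask = 0;
    for (unsigned r = 0; r < 32; r++) {
        ops[which] = r;
        uint32_t enc = mock_assemble_and(ops[0], ops[1], ops[2]);
        if (r == 0)
            base = enc;            /* reference encoding for this field */
        mask |= enc ^ base;        /* accumulate bits that ever change */
    }
    return mask;
}
```

Solving all three fields this way costs 96 assembler calls, matching the slide's arithmetic, where trying every register combination would take 32^3 = 32768.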

15 Intuition Behind DERIVE
[Table repeated from slide 12.]

16 Immediate Solver
• Primary assumptions:
  – Immediate field is a single range of bits in instruction
• Explore each bit size to find encoding of one field
  – Values of 1, 2, 4, 8, 16, ...
  – Again, hold other fields constant
• Example: 10-bit immediate field
  – 10 combinations
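
The probing described here can be sketched as follows. The mock assembler models the MIPS `break imm` instruction with the 10-bit field at bit 16 given on the Encoding Descriptions slide; it stands in for a real assembler, and the names `mock_assemble_break` and `solve_immediate` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mock assembler for MIPS "break imm": 10-bit unsigned field at bit 16.
   Returns 0 when the value does not fit (an assembler error). */
static int mock_assemble_break(uint32_t imm, uint32_t *out)
{
    if (imm > 1023u)
        return 0;
    *out = 0x0000000du | (imm << 16);
    return 1;
}

/* Probe imm = 1, 2, 4, ...: record where bit 0 of the immediate lands
   and how many powers of two the assembler accepts. */
static void solve_immediate(unsigned *offset, unsigned *length)
{
    assert(offset != 0 && length != 0);
    uint32_t base = 0, enc = 0;
    mock_assemble_break(0, &base);          /* encoding with field zeroed */
    *offset = 0;
    *length = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (!mock_assemble_break(1u << bit, &enc))
            break;                          /* value no longer fits */
        if (bit == 0) {
            uint32_t diff = enc ^ base;     /* the single bit set by imm = 1 */
            while (diff != 0 && (diff & 1u) == 0) {
                diff >>= 1;
                (*offset)++;
            }
        }
        (*length)++;
    }
}
```

Against this mock the sketch reports offset 16 and length 10. It makes ten accepted probes, as on the slide, plus one zero probe and one rejected probe that are artifacts of this particular sketch.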

17 Jump Solver
• Primary assumptions:
  – Label field is a single range of bits
• Emit jumps to different offsets
  – Find where label goes for encoding of "0"
  – Find smallest jump size
  – Find high bit by emitting a negative-valued jump
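
A hedged sketch of these three steps against a mock assembler. The mock models a SPARC-style `ba` with a 22-bit signed word displacement in the low bits; the two's-complement field and all names (`mock_assemble_ba`, `jump_field`, `solve_jump`) are assumptions made for illustration, not DERIVE's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Mock assembler for "ba" to pc + byte_offset: signed 22-bit word
   displacement in bits 0..21. Returns 0 if the jump is rejected. */
static int mock_assemble_ba(int32_t byte_offset, uint32_t *out)
{
    if (byte_offset % 4 != 0)
        return 0;                                  /* misaligned target */
    int32_t words = byte_offset / 4;
    if (words < -(1 << 21) || words >= (1 << 21))
        return 0;                                  /* out of range */
    *out = 0x10800000u | ((uint32_t)words & 0x3fffffu);
    return 1;
}

struct jump_field {
    uint32_t mask;   /* bits occupied by the label field */
    int32_t step;    /* smallest jump (in bytes) that changes the encoding */
};

static struct jump_field solve_jump(void)
{
    struct jump_field f = { 0, 0 };
    uint32_t zero = 0, enc = 0;
    int ok = mock_assemble_ba(0, &zero);           /* encoding of offset 0 */
    assert(ok);
    /* Smallest forward jump that is accepted and changes the encoding. */
    for (int32_t off = 1; off <= 16; off *= 2) {
        if (mock_assemble_ba(off, &enc) && enc != zero) {
            f.step = off;
            break;
        }
    }
    /* The smallest negative jump sets every bit of a two's-complement
       field, including the high (sign) bit. */
    if (f.step != 0 && mock_assemble_ba(-f.step, &enc))
        f.mask = enc ^ zero;
    return f;
}
```

For this mock the smallest step is 4 bytes, so the two low bits of the offset are dropped, and the negative jump reveals the full field mask 0x003fffff.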

18 Solving Time
Processor     Run Time (minutes)   Description (lines)
Alpha         6.3                  104
ARM           ~43.227
MIPS          2.5                  81
PowerPC       4.8                  186
SPARC         4.8                  97
x86           ~240.221
x86-kaffe     4.9                  106

19 Instruction Emitter Generator
• Reads in DERIVE-generated specifications
• Produces C macros
  – Can generate runtime checks
  – Debugging support
  – Handles multiple instruction encodings
  – "Linkage" macros for backpatching
• Used to retarget Kaffe (publicly available JVM) on x86
  – Reduced backend description from 2084 to 1267 lines (40%)
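
To make the backpatching bullet concrete, here is a small sketch of the idea: emit a branch with an empty displacement, remember where it is, and fill in the target once it is known. It uses the SPARC `ba` layout (22-bit word displacement relative to the branch). The function names are hypothetical and do not reproduce the generated "linkage" macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit a SPARC "ba" with a zero displacement and return its index so the
   target can be filled in later. */
static size_t emit_ba_placeholder(uint32_t *code, size_t *n)
{
    size_t at = *n;
    code[(*n)++] = 0x10800000u;                    /* ba, disp22 = 0 */
    return at;
}

/* Patch the branch at index `at` to jump to instruction index `target`. */
static void patch_ba(uint32_t *code, size_t at, size_t target)
{
    int32_t disp = (int32_t)target - (int32_t)at;  /* in words */
    assert(disp >= -(1 << 21) && disp < (1 << 21)); /* fits in 22 bits */
    code[at] = (code[at] & ~0x3fffffu) | ((uint32_t)disp & 0x3fffffu);
}
```

A forward branch from index 0 to index 3 becomes 0x10800003; a backward branch from index 5 to index 3 becomes 0x10bffffe.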

20 Extensions
• Can handle instructions that take a subset of registers
  – SPARC double-word loads
• Special encodings that are register-dependent
  – %eax on x86
• Can handle simple transformations
  – Low bits dropped off of jump offsets
• User can specify transformations
  – Address scaling on x86
• User can specify registers that are dependent
  – PowerPC post-increment instructions

21 Future Work
• Extending DERIVE
  – Fields that are broken up into multiple bit ranges
  – Memoization of computations
• ATOM-like tools
  – Reverse-engineering linkers

22 Related Work
• Instruction encoding munging
  – NJ Toolkit [Ramsey & Fernández, USENIX 1995]
• Testing assemblers
  – NJ Toolkit [Fernández and Ramsey, ICSE 1997]
• Reverse engineering compiler technology
  – Retarget back-end generators [Collberg, PLDI 1997]

23 Summary
• DERIVE is a cool hack, but it isn’t just a hack.
  – It is a useful tool.
  – It is a good proof of concept.
  – We did some clever tricks to build it.
• http://www.cs.utah.edu/~wilson/derive.tar.gz

