1
Reverse-Engineering Instruction Encodings
Wilson Hsieh, University of Utah
Dawson Engler, Stanford University
Godmar Back, University of Utah
2
Reverse-Engineering Instruction Encodings, USENIX ’01
What’s the Problem?
Dynamic code generation, JIT compilation
  – Emit instructions quickly
  – Therefore, avoid the assembler
Need to know how to produce binary instructions
Want to express instructions in assembly
“Generate add %l1, %l2, %l1 for SPARC”
3
What Do We Do?
How can I get the following mapping?
  assembly instruction → binary format
That mapping already exists in the assembler!
  assembly instruction → assembler → binary instruction
So let’s reverse-engineer it out of the assembler.
4
DERIVE Tool Chain
[Diagram; box labels: instruction description, DERIVE, assembler, encoding description, code emitter generator, code emitter, JIT compiler, debugger, disassembler]
5
Instruction Descriptions
  /* SPARC fragment */
  iregs = ( %g0, %g1, %g2, ..., %i6, %i7 );
  and, andcc, andn, ...
      &op& r_1:iregs, r_2:iregs, r_dest:iregs
    | &op& r_1:iregs, imm, r_dest:iregs ;
  ba, bn, bne, …
      &op& &label&
    | &op&",a" &label& ;
7
Encoding Descriptions
  /* MIPS breakpoint instruction */
  { "break", "&op& imm",
    1, /* operand */
    4, /* bytes */
    ...
    { 0xd, 0x0, 0x0, 0x0, }, /* opcode information */
    { /* operand information */
      { "imm",      /* name */
        IMMED,      /* an immediate */
        IDENT,      /* encoded value = input value */
        0,          /* lowest value */
        10,         /* length */
        ...
        16,         /* bit offset */
        I_UNSIGNED, /* unsigned field */
        ...
      },
    }
  }
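To make the role of the operand record concrete, here is a minimal sketch of how a consumer might turn such a description into a binary word. The struct layout, the names `operand_field`, `insn_encoding`, and `encode_one_operand`, the reading of the opcode bytes as a little-endian word, and the reading of "lowest value" as the smallest legal input are all assumptions made for illustration; only the numbers (opcode byte 0xd, length 10, bit offset 16) come from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the operand record shown on the slide. */
struct operand_field {
    const char *name;   /* operand name, e.g. "imm" */
    uint32_t lowest;    /* assumed meaning: smallest legal input value */
    unsigned length;    /* field width in bits */
    unsigned offset;    /* bit position of the field's lowest bit */
};

struct insn_encoding {
    const char *mnemonic;
    uint32_t opcode_bits;          /* every bit that belongs to no operand */
    struct operand_field operand;  /* one operand only, for brevity */
};

/* Numbers from the slide; { 0xd, 0x0, 0x0, 0x0 } is read here as the
   little-endian bytes of the word 0x0000000d (an assumption). */
static const struct insn_encoding mips_break = {
    "break", 0x0000000du, { "imm", 0, 10, 16 }
};

/* Stores the encoded word in *out and returns 0, or returns -1 when the
   value does not fit in the operand field. */
static int encode_one_operand(const struct insn_encoding *e,
                              uint32_t value, uint32_t *out)
{
    uint32_t limit = (uint32_t)1 << e->operand.length;

    if (value < e->operand.lowest || value >= limit)
        return -1;
    *out = e->opcode_bits | (value << e->operand.offset);
    return 0;
}
```

Under these assumptions, encoding `break 5` places 5 in bits 16 to 25 and leaves the opcode bits untouched.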
9
Code Emitters
  /* x86 addl instruction */
  #define E_addl_rr_1(_code, rf, rt) do {\
      register unsigned short _0 = (0xc001\
          | (((rf)) << 11)\
          | (((rt)) << 8));\
      *(unsigned short*)((char*) _code) = _0;\
      _code = (void *)((char *) _code + 2);\
  } while (0)
  /* emit "addl %ecx, %ebx" in code_buffer */
  E_addl_rr_1(code_buffer, REGecx, REGebx);
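The macro stores a 16-bit value through a pointer cast, so its byte order depends on a little-endian host. As a sketch of the same encoding written byte by byte, the function below is a hypothetical equivalent, not generated DERIVE output. The name `emit_addl_rr` and the `REG*` values are assumptions; the values are the standard x86 register numbers, which the slide does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register numbers (the usual x86 encodings). */
enum { REGeax = 0, REGecx = 1, REGedx = 2, REGebx = 3 };

/* Writes the two bytes of "addl %rf, %rt" at code and returns the
   position just past them. Byte 0 is the low byte of 0xc001; byte 1 is
   the high byte with rf in bits 3..5 and rt in bits 0..2, matching the
   shifts by 11 and 8 in the macro. */
static uint8_t *emit_addl_rr(uint8_t *code, unsigned rf, unsigned rt)
{
    assert(rf < 8 && rt < 8);
    code[0] = 0x01;
    code[1] = (uint8_t)(0xc0u | (rf << 3) | rt);
    return code + 2;
}
```

Calling `emit_addl_rr(buf, REGecx, REGebx)` writes the bytes 0x01 0xcb, the same two bytes the macro produces on a little-endian machine.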
10
Instruction Model
Opcode
Registers (names)
  – Register sets
  – Cache prefetch hints on MIPS
  – Address scale on x86
Immediates (integers)
  – Not registers
Labels (jump targets)
  – Absolute jumps
  – Relative jumps
[Diagram: instruction word, bits 0 to 31, divided into OPCODE, ARG1, ARG2, ARG3]
11
Overall Strategy
Solve for one field at a time
  – Hold other fields fixed and vary the desired field
  – Use randomization when necessary to find legal values
Anything that is not in a field is the opcode
12
Intuition Behind DERIVE
Assembly instruction     Binary encoding
and %g7, %g6, %g0      ; 0x8009 0xc006
and %g7, %g6, %g1      ; 0x8209 0xc006
and %g7, %g6, %g2      ; 0x8409 0xc006
and %g7, %g6, %g3      ; 0x8609 0xc006
and %g7, %g6, %g4      ; 0x8809 0xc006
and %g7, %g6, %g5      ; 0x8a09 0xc006
and %g7, %g6, %g6      ; 0x8c09 0xc006
and %g7, %g6, %g7      ; 0x8e09 0xc006
and %g7, %g6, %o0      ; 0x9009 0xc006
and %g7, %g6, %o1      ; 0x9209 0xc006
and %g7, %g6, %o2      ; 0x9409 0xc006
and %g7, %g6, %o3      ; 0x9609 0xc006
and %g7, %g6, %o4      ; 0x9809 0xc006
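The pattern in the table can be checked mechanically: XOR each encoding with the first one and OR the results together, and the bits that remain set are the ones the destination register controls. The sketch below does this on the 13 rows above, assuming each pair of 16-bit halves is joined into one 32-bit word; the function names are illustrative, not part of DERIVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 13 encodings from the table, halves joined as (high << 16) | low. */
static const uint32_t and_rd_samples[] = {
    0x8009c006u, 0x8209c006u, 0x8409c006u, 0x8609c006u, 0x8809c006u,
    0x8a09c006u, 0x8c09c006u, 0x8e09c006u, 0x9009c006u, 0x9209c006u,
    0x9409c006u, 0x9609c006u, 0x9809c006u
};

/* Returns a mask of every bit that differs from the first sample in at
   least one other sample. */
static uint32_t varying_bits(const uint32_t *enc, size_t n)
{
    uint32_t mask = 0;

    for (size_t i = 1; i < n; i++)
        mask |= enc[i] ^ enc[0];
    return mask;
}

/* Returns the index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(uint32_t mask)
{
    unsigned pos = 0;

    assert(mask != 0);
    while ((mask & 1u) == 0) {
        mask >>= 1;
        pos++;
    }
    return pos;
}
```

On these 13 samples the mask is 0x1e000000, so the field starts at bit 25. Only four bits show up because the table stops at %o4; enumerating all 32 registers would also expose bit 29.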
13
DERIVE Structure
Field Type               Solver
register fields          register solver
absolute jump targets    immediate solver
immediate fields         immediate solver
relative jump targets    jump solver
14
Register Solver
Primary assumptions (for purposes of the talk):
  – Register fields are independent
  – All register values are legal
Enumerate registers for one field at a time
  – Hold other fields constant
  – Solve each field separately
Example: 3 register fields, 5 bits per field
  – 2^5 * 3 = 32 * 3 = 96 combinations
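The enumeration can be sketched as follows, under the two assumptions listed on the slide. `mock_assemble_and` is a stand-in for invoking the real assembler; it hard-codes the SPARC `and` layout so the example is self-contained. All names here are hypothetical, and this is an illustration of the idea rather than DERIVE's actual solver.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_FIELDS = 3, NUM_REGS = 32 };

static unsigned assembler_calls;  /* invocations of the stand-in assembler */

/* Stand-in for assembling "and %rs1, %rs2, %rd". regs[] follows the
   assembly operand order: regs[0] = rs1, regs[1] = rs2, regs[2] = rd. */
static uint32_t mock_assemble_and(const unsigned regs[NUM_FIELDS])
{
    assembler_calls++;
    return 0x80080000u | ((uint32_t)regs[2] << 25)
                       | ((uint32_t)regs[0] << 14)
                       | (uint32_t)regs[1];
}

/* For each operand position, holds the other operands fixed, enumerates
   all registers in that position, and records which bits ever change. */
static void solve_register_fields(uint32_t masks[NUM_FIELDS])
{
    for (int f = 0; f < NUM_FIELDS; f++) {
        unsigned regs[NUM_FIELDS] = { 7, 6, 0 };  /* arbitrary fixed values */
        uint32_t base;

        regs[f] = 0;
        base = mock_assemble_and(regs);
        masks[f] = 0;
        for (unsigned r = 1; r < NUM_REGS; r++) {
            regs[f] = r;
            masks[f] |= mock_assemble_and(regs) ^ base;
        }
    }
}
```

Each field costs 32 assembler runs (one baseline plus 31 variations), so three fields cost 96, which matches the count on the slide. Enumerating all three fields jointly would instead take 32^3 = 32768 runs, which is why the independence assumption matters.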
16
Immediate Solver
Primary assumptions:
  – Immediate field is a single range of bits in instruction
Explore each bit size to find encoding of one field
  – Values of 1, 2, 4, 8, 16, ...
  – Again, hold other fields constant
Example: 10-bit immediate field
  – 10 combinations
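A minimal sketch of the power-of-two probing, assuming the field is one contiguous bit range as the slide states. `mock_assemble_break` stands in for the real assembler and models a 10-bit unsigned immediate at bit offset 16, the layout of the MIPS `break` example in the encoding description; the names and the rejection behaviour for oversized values are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for assembling "break imm". Returns 0 and stores the word,
   or -1 when the immediate is rejected as too large. */
static int mock_assemble_break(uint32_t imm, uint32_t *out)
{
    if (imm >= (1u << 10))
        return -1;
    *out = 0x0000000du | (imm << 16);
    return 0;
}

struct imm_field {
    unsigned offset;  /* bit position of the field's lowest bit */
    unsigned length;  /* number of bits in the field */
};

/* Probes the immediates 1, 2, 4, ... and compares each encoding with
   the encoding of 0. The first probe locates the field's low bit; the
   number of accepted probes gives its width. */
static struct imm_field solve_immediate(void)
{
    struct imm_field f = { 0, 0 };
    uint32_t base = 0, enc = 0;
    int ok = mock_assemble_break(0, &base);

    assert(ok == 0);
    (void)ok;
    for (unsigned k = 0; k < 32; k++) {
        uint32_t diff;

        if (mock_assemble_break((uint32_t)1 << k, &enc) != 0)
            break;                 /* value no longer fits in the field */
        diff = enc ^ base;
        if (diff == 0)
            break;                 /* bit was silently dropped */
        if (k == 0) {
            while ((diff & 1u) == 0) {
                diff >>= 1;
                f.offset++;
            }
        }
        f.length = k + 1;
    }
    return f;
}
```

For the mock above, the solver accepts ten probes (1 through 512) and reports offset 16 and length 10, so the cost grows with the field width rather than with the number of representable values.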
17
Jump Solver
Primary assumptions:
  – Label field is a single range of bits
Emit jumps to different offsets
  – Find where label goes for encoding of “0”
  – Find smallest jump size
  – Find high bit by emitting a negative-valued jump
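The three probing steps can be sketched like this. `mock_assemble_ba` is a stand-in for assembling a relative branch to a label at a given byte offset; it models a SPARC-style `ba` with a 22-bit signed word displacement in the low bits. That layout, the rejection of misaligned targets, and every name below are assumptions for illustration, not DERIVE's code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in assembler: branch to (address of branch + offset bytes).
   Returns 0 and stores the word, or -1 for an illegal target. */
static int mock_assemble_ba(int32_t offset, uint32_t *out)
{
    int32_t words;

    if (offset % 4 != 0)
        return -1;                       /* targets are word aligned */
    words = offset / 4;
    if (words < -(1 << 21) || words >= (1 << 21))
        return -1;                       /* does not fit in 22 bits */
    *out = 0x10800000u | ((uint32_t)words & 0x3fffffu);
    return 0;
}

struct jump_field {
    int32_t scale;      /* smallest jump distance in bytes */
    unsigned low_bit;   /* lowest bit of the label field */
    unsigned high_bit;  /* highest bit of the label field */
};

static struct jump_field solve_jump(void)
{
    struct jump_field f = { 0, 0, 0 };
    uint32_t base = 0, enc = 0, diff = 0;
    int ok = mock_assemble_ba(0, &base);     /* encoding of "0" */

    assert(ok == 0);
    (void)ok;
    /* Smallest forward jump that is accepted and changes the encoding. */
    for (int32_t step = 1; step <= (1 << 16); step <<= 1) {
        if (mock_assemble_ba(step, &enc) == 0 && enc != base) {
            f.scale = step;
            diff = enc ^ base;
            break;
        }
    }
    assert(f.scale != 0);
    while ((diff & 1u) == 0) {
        diff >>= 1;
        f.low_bit++;
    }
    /* A negative jump sets every bit of a two's-complement field. */
    ok = mock_assemble_ba(-f.scale, &enc);
    assert(ok == 0);
    diff = enc ^ base;
    while (diff >>= 1)
        f.high_bit++;
    return f;
}
```

For this mock the solver reports a smallest jump of 4 bytes and a field spanning bits 0 to 21. The gap between the byte offset and the encoded value is one case of low bits being dropped from a jump offset.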
18
Solving Time
Processor    Run Time (minutes)    Description (lines)
Alpha        6.3                   104
ARM          ~43.2                 27
MIPS         2.5                   81
PowerPC      4.8                   186
SPARC        4.8                   97
x86          ~240.2                21
x86-kaffe    4.9                   106
19
Instruction Emitter Generator
Reads in DERIVE-generated specifications
Produces C macros
  – Can generate runtime checks
  – Debugging support
  – Handles multiple instruction encodings
  – “Linkage” macros for backpatching
Used to retarget Kaffe (publicly available JVM) on x86
  – Reduced backend description from 2084 to 1267 lines (40%)
20
Extensions
Can handle instructions that take a subset of registers
  – SPARC double-word loads
Special encodings that are register-dependent
  – %eax on x86
Can handle simple transformations
  – Low bits dropped off of jump offsets
User can specify transformations
  – Address scaling on x86
User can specify registers that are dependent
  – PowerPC post-increment instructions
21
Future Work
Extending DERIVE
  – Fields that are broken up into multiple bit ranges
  – Memoization of computations
ATOM-like tools
  – Reverse-engineering linkers
22
Related Work
Instruction encoding munging
  – NJ Toolkit [Ramsey & Fernández, USENIX 1995]
Testing assemblers
  – NJ Toolkit [Fernández and Ramsey, ICSE 1997]
Reverse engineering compiler technology
  – Retarget back-end generators [Collberg, PLDI 1997]
23
Summary
DERIVE is a cool hack, but it isn’t just a hack.
  – It is a useful tool.
  – It is a good proof of concept.
  – We did some clever tricks to build it.
http://www.cs.utah.edu/~wilson/derive.tar.gz